# Problem 4 (optional)

This is an optional task for more advanced students. 

Download your own data (daily summaries for the years **1959-2018 August**) for **Sodankyla Lokka** (note that the place name must be written without the letter `ä`) from [NOAA Climate Data Online Search](https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND).
Make sure to click on the starting day (and the ending day) in the date selection panel after changing the year!
After searching, click “Add to cart” for the selected station, then go to the cart. Select the ``Custom GHCN-Daily Text`` format for the output file and click continue.

- From ``Station Detail & Data Flag Options`` choose two of the following attributes: Station Name, Geographic Location. **Notice:** Do **NOT** include data flags because it makes the data difficult to read. Use **Standard** units.

- Also select Precipitation and Temperature, which are under a separate button below.
- On the next page, enter your own email address; the weather data will be sent there after a short while.

Write your code into a separate `weather_comparisons.py` file.

After you have downloaded the data, you should first:

- Calculate the average temperature using columns `TMAX` and `TMIN` and insert those values into a new column called `TAVG`.

Next, use the approaches learned this week, together with the approaches from Problem 3, to do the following:

- Calculate the temperature anomalies in Sodankyla, i.e. the difference between referenceTemps and the average temperature for each month (see Problem 3). 
- Calculate the monthly temperature differences between the Sodankyla and Helsinki stations.
- How different have the summer temperatures (June, July, August) been between the Helsinki station (used in Problems 1-3) and the Sodankyla station?
    - Calculate the monthly differences into a DataFrame and save it (as `CSV` file) into your own Exercise repository for this week
    - What were the summer mean temperatures for both of these stations?
    - What were the summer standard deviations for both of these stations?
- Upload your script and data to GitHub.

In [78]:
"""
This script calculates the temperature difference between the Sodankyla and Helsinki stations.
It also calculates the mean value and standard deviation of summer temperatures for both stations.

@author: Gustavo Colmenares
"""

import pandas as pd


# From monthly data of Helsinki station
from temperature_anomalies import monthlyData as monthlyDataHel, months_dict

Index(['STATION', 'ELEVATION', 'LATITUDE', 'LONGITUDE', 'DATE', 'PRCP', 'TAVG',
       'TMAX', 'TMIN'],
      dtype='object')
(23716, 9)
             STATION  ELEVATION  LATITUDE  LONGITUDE      DATE  PRCP  TAVG  \
0  GHCND:FIE00142080         51   60.3269    24.9603  19520101  0.31  37.0   

   TMAX  TMIN  
0  39.0  34.0  
3308
365
23716
19520101
20171004
41.32408859270874
nan
0    195201
1    195201
2    195201
3    195201
4    195201
Name: YM, dtype: object

Data type of the column YM is:
<class 'str'>
YM               object
monthNumber      object
TAVG_Celsius    float64
avgTempsC       float64
month            object
Diff            float64
dtype: object

 avgTempsC      float64
monthNumber     object
month           object
dtype: object


In [59]:
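# Read the data file; the station name spans three whitespace-separated fields,
# so two extra name columns are defined here and merged later
column_names = ['STATION', 'STATION_NAME', 'STATION_NAME_1', 'STATION_NAME_2',
                'ELEVATION', 'LATITUDE', 'LONGITUDE', 'DATE', 'PRCP', 'TMAX', 'TMIN']

# A raw string is used for the separator so that '\s' is not treated as an escape sequence
data = pd.read_csv(r'data/lokka.txt', sep=r'\s+', names=column_names,
                   skiprows=2, na_values=[-9999])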


The raw data spreads the station name over 3 columns, because the name `SODANKYLA LOKKA FI` contains spaces and the file is split on whitespace. For this reason, it was necessary to define 2 more columns when reading the data, `'STATION_NAME_1'` and `'STATION_NAME_2'`, as you can see in the next cell.

- We will combine those columns into one single column called `'STATION_NAME'`.
- Then we will use the `drop()` function to drop the extra columns created when the data was first read.
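
As a rough illustration of why the extra columns are needed, the sketch below reads two rows from an in-memory string instead of `data/lokka.txt`. The row values are taken from the output shown in this notebook, but the exact raw layout of the file is an assumption, and the names `raw`, `names`, and `sample` are only used for this example.

```python
import io

import pandas as pd

# Two rows in an ASSUMED whitespace-delimited layout (values copied from the notebook output)
raw = io.StringIO(
    "GHCND:FIE00146538 SODANKYLA LOKKA FI 240 67.8206 27.7503 19590101 0.03 -9999 9\n"
    "GHCND:FIE00146538 SODANKYLA LOKKA FI 240 67.8206 27.7503 19590102 0.00 -9999 6\n"
)

names = ['STATION', 'STATION_NAME', 'STATION_NAME_1', 'STATION_NAME_2',
         'ELEVATION', 'LATITUDE', 'LONGITUDE', 'DATE', 'PRCP', 'TMAX', 'TMIN']

# Splitting on whitespace turns the three-word station name into three fields
sample = pd.read_csv(raw, sep=r'\s+', names=names, na_values=[-9999])

# Join the three pieces back together and drop the helper columns
sample['STATION_NAME'] = (sample['STATION_NAME'] + ' '
                          + sample['STATION_NAME_1'] + ' '
                          + sample['STATION_NAME_2'])
sample = sample.drop(columns=['STATION_NAME_1', 'STATION_NAME_2'])

print(sample[['STATION_NAME', 'DATE', 'TMAX', 'TMIN']])
```

With 11 whitespace-separated fields per row, only 11 column names make the fields line up with their headers; with fewer names, pandas would shift the leading fields into the index.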

In [60]:
data.head()

Unnamed: 0,STATION,STATION_NAME,STATION_NAME_1,STATION_NAME_2,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN
0,GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590101,0.03,,9.0
1,GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590102,0.0,,6.0
2,GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590103,0.02,,-9.0
3,GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590104,0.08,,10.0
4,GHCND:FIE00146538,SODANKYLA,LOKKA,FI,240,67.8206,27.7503,19590105,0.09,,13.0


In [61]:
# Merge station names to single column
data['STATION_NAME']= data['STATION_NAME']+ ' '+ data['STATION_NAME_1']+ ' '+ data['STATION_NAME_2']

# Drop unnecessary columns
data = data.drop(['STATION_NAME_1', 'STATION_NAME_2'], axis=1)

In [62]:
data.head()

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN
0,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590101,0.03,,9.0
1,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590102,0.0,,6.0
2,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590103,0.02,,-9.0
3,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590104,0.08,,10.0
4,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590105,0.09,,13.0


* Calculate the average temperature using the columns `'TMAX'` and `'TMIN'`.
* Then insert those values into a new column called `'TAVG'`.

In [63]:
# Calculate average temperature; skipna=False leaves TAVG as NaN when TMAX or TMIN is missing
data['TAVG'] = data[['TMAX', 'TMIN']].mean(axis=1, skipna=False)
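
The effect of `skipna=False` can be seen on a small made-up table. This is only a sketch with invented values (the names `toy`, `strict`, and `lenient` are not part of the exercise); it shows why days with a missing `TMAX` end up with a missing `TAVG` rather than with `TMIN` alone.

```python
import numpy as np
import pandas as pd

# Invented values: the second row lacks TMAX, the third lacks TMIN
toy = pd.DataFrame({'TMAX': [50.0, np.nan, 40.0],
                    'TMIN': [30.0, 9.0, np.nan]})

# skipna=False: any missing value in the row makes the row mean NaN
strict = toy[['TMAX', 'TMIN']].mean(axis=1, skipna=False)

# Default (skipna=True): the mean is taken over the values that are present
lenient = toy[['TMAX', 'TMIN']].mean(axis=1)

print(pd.DataFrame({'strict': strict, 'lenient': lenient}))
```

The strict variant avoids reporting a daily "average" that is really just the minimum or the maximum.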

In [64]:
# Create a function to convert temperatures from Fahrenheit to Celsius
def fahrenheitToCelsius(fahr_temp):
    convert_temp = (fahr_temp - 32) / 1.8
    return convert_temp
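
As a quick sanity check of the conversion formula, the sketch below applies the same arithmetic to a few values. The function is redefined under a different, hypothetical name (`fahrenheit_to_celsius`) so that the example runs on its own; the two Series values are the `TAVG` readings 59.5 and 60.0 that appear later in this notebook.

```python
import pandas as pd


def fahrenheit_to_celsius(fahr_temp):
    """Same formula as in the cell above: C = (F - 32) / 1.8."""
    return (fahr_temp - 32) / 1.8


# Freezing and boiling points of water as reference values
freezing = fahrenheit_to_celsius(32)
boiling = fahrenheit_to_celsius(212)

# The function also works element-wise on a pandas Series
converted = fahrenheit_to_celsius(pd.Series([59.5, 60.0]))

print(freezing, round(boiling, 6))
print(converted.round(6).tolist())
```

Because the formula uses only arithmetic operators, it can be applied directly to a whole column without a loop.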

In [65]:
# Create year-month, month columns
data['YM'] = (data['DATE'].astype(str)).str.slice(start=0, stop= 6)
data['monthNumber'] = (data['DATE'].astype(str)).str.slice(start=4, stop= 6)
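
An alternative to slicing the date strings is to parse them as dates first. The following is a minimal sketch on two invented rows, assuming `DATE` holds integers in `YYYYMMDD` form as in the output above; the names `toy`, `as_text`, and `parsed` are illustrative only.

```python
import pandas as pd

# Two example dates in the YYYYMMDD integer form used by the data
toy = pd.DataFrame({'DATE': [19590101, 20200806]})

# Parse the integers as dates
as_text = toy['DATE'].astype(str)
parsed = pd.to_datetime(as_text, format='%Y%m%d')

# Format back to the same text keys the slicing approach produces
toy['YM'] = parsed.dt.strftime('%Y%m')
toy['monthNumber'] = parsed.dt.strftime('%m')

print(toy)
```

Both approaches give zero-padded strings such as `'01'`, so later comparisons against `'06'`, `'07'`, and `'08'` work the same way.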

In [66]:
# Convert temperature to Celsius
data['TAVG_Celsius'] = fahrenheitToCelsius(data['TAVG'])

In [67]:
# Create empty dataframe
monthlyData = pd.DataFrame()

__Group the data by year-month `'YM'` column__ 

In [68]:
grouped_month = data.groupby('YM')

__Aggregate the data__

In [69]:
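# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
# and build the DataFrame once at the end
monthly_rows = []
for key, group in grouped_month:
    monthly_rows.append({
        'TAVG_Celsius': group['TAVG_Celsius'].mean(),  # monthly mean temperature
        'YM': key,                                     # year-month key, e.g. '195901'
        'monthNumber': key[4:6],                       # month part of the key, e.g. '01'
    })
monthlyData = pd.DataFrame(monthly_rows)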
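
The same monthly aggregation can also be written without an explicit loop. The sketch below uses a tiny invented table (`toy`) rather than the real data, and the result name `monthly` is hypothetical; it is meant as a comparison, not as the required solution.

```python
import pandas as pd

# Invented daily values for two months
toy = pd.DataFrame({
    'YM': ['195905', '195905', '195906', '195906'],
    'TAVG_Celsius': [4.0, 6.0, 10.0, 12.0],
})

# One mean per year-month; as_index=False keeps 'YM' as a regular column
monthly = toy.groupby('YM', as_index=False)['TAVG_Celsius'].mean()

# Recover the month number from the year-month key
monthly['monthNumber'] = monthly['YM'].str.slice(start=4, stop=6)

print(monthly)
```

The loop version makes each step visible, while the single `groupby` call is shorter and usually faster on large tables.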

In [70]:
# Re-order columns
monthlyData = monthlyData[['YM', 'monthNumber', 'TAVG_Celsius']]

# print monthlyData sample
monthlyData.head()

Unnamed: 0,YM,monthNumber,TAVG_Celsius
0,195901,1,
1,195902,2,
2,195903,3,
3,195904,4,
4,195905,5,5.259857


In [71]:
# Create empty dataframe
reference_temp = pd.DataFrame()

__Group the data by month__

In [72]:
# Group by month
grouped_data = data.groupby('monthNumber')

__Aggregate the data__

In [75]:
# Iterate over the grouped_data
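# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
reference_rows = []
for key, group in grouped_data:
    reference_rows.append({
        'TAVG_Celsius': group['TAVG_Celsius'].mean(),  # long-term mean for this month
        'monthNumber': key,
    })
reference_temp = pd.DataFrame(reference_rows)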

In [76]:
# Rename the columns 
reference_temp = reference_temp.rename(columns={'TAVG_Celsius': 'avgTempC'})

In [79]:
# Merge with dictionary
reference_temp = reference_temp.merge(months_dict, how='left', on='monthNumber')

# Join monthlyData and reference_temp
monthlyData = monthlyData.merge(reference_temp, how='left', on='monthNumber', sort=False)

In [85]:
reference_temp.head()

Unnamed: 0,avgTempC,monthNumber,month
0,-14.645439,1,January
1,-14.055386,2,February
2,-9.510641,3,March
3,-3.091635,4,April
4,3.865184,5,May


__Compare Temperatures__

In [80]:
monthlyData['Diff ']= monthlyData['TAVG_Celsius'] - monthlyData['avgTempC']
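
To make the anomaly calculation concrete, here is a small sketch with invented monthly means. As a simplification it derives the reference temperatures from the monthly values themselves, whereas the notebook averages the daily values; the names `monthly`, `reference`, and `merged` are illustrative only.

```python
import pandas as pd

# Invented monthly means for two Junes and two Julys
monthly = pd.DataFrame({
    'YM': ['195906', '196006', '195907', '196007'],
    'monthNumber': ['06', '06', '07', '07'],
    'TAVG_Celsius': [11.0, 13.0, 12.0, 16.0],
})

# Reference temperature: the mean over all years for each calendar month
reference = (monthly.groupby('monthNumber', as_index=False)['TAVG_Celsius']
             .mean()
             .rename(columns={'TAVG_Celsius': 'avgTempC'}))

# Attach the reference to every month and subtract to get the anomaly
merged = monthly.merge(reference, how='left', on='monthNumber')
merged['Diff'] = merged['TAVG_Celsius'] - merged['avgTempC']

print(merged)
```

A positive `Diff` means the month was warmer than its long-term average, a negative one that it was colder.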

In [82]:
# Merge monthly temperatures in Sodankyla Lokka and Helsinki;
# columns with the same name get the suffixes _x (Sodankyla) and _y (Helsinki)
monthlyData = monthlyData.merge(monthlyDataHel, how='inner', on='YM', sort=False)
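
When two tables share column names, `merge` appends `_x` and `_y` by default, which is where names such as `TAVG_Celsius_x` come from. The sketch below, on invented values, shows the optional `suffixes` parameter as one way to get more readable names; the labels `_Sod` and `_Hel` and the table names are assumptions made for this example.

```python
import pandas as pd

# Invented monthly means for the two stations
sod = pd.DataFrame({'YM': ['195906', '195907'],
                    'TAVG_Celsius': [11.0, 13.0]})
hel = pd.DataFrame({'YM': ['195906', '195907', '195908'],
                    'TAVG_Celsius': [15.0, 18.0, 17.0]})

# The inner join keeps only the months present in both tables
both = sod.merge(hel, how='inner', on='YM', suffixes=('_Sod', '_Hel'))

# Station difference, Sodankyla minus Helsinki
both['Diff_SodHel'] = both['TAVG_Celsius_Sod'] - both['TAVG_Celsius_Hel']

print(both)
```

Note that the rest of this notebook relies on the default `_x` and `_y` names, so switching suffixes there would require renaming the later column references as well.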


__Calculate Difference__

In [83]:
monthlyData['Diff_SodHel'] = monthlyData['TAVG_Celsius_x']- monthlyData['TAVG_Celsius_y']

In [84]:
monthlyData.head()

Unnamed: 0,YM,monthNumber_x,TAVG_Celsius_x,avgTempC,month_x,Diff,monthNumber_y,TAVG_Celsius_y,avgTempsC,month_y,Diff.1,Diff_SodHel
0,195901,1,,-14.645439,January,,1,-5.148148,-5.877342,January,0.729194,
1,195902,2,,-14.055386,February,,2,-2.361111,-6.990482,February,4.629371,
2,195903,3,,-9.510641,March,,3,0.322581,-3.84127,March,4.16385,
3,195904,4,,-3.091635,April,,4,3.87037,2.427875,April,1.442495,
4,195905,5,5.259857,3.865184,May,1.394673,5,9.695341,9.522613,May,0.172727,-4.435484
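
The task also asks for the monthly differences to be saved as a `CSV` file. A minimal sketch of that step is shown below on two rows copied from the outputs in this notebook; the file name `sodankyla_helsinki_monthly_diff.csv` and the temporary output folder are assumptions, so adjust them to your own Exercise repository.

```python
import os
import tempfile

import pandas as pd

# Two rows copied from the notebook outputs, standing in for the full table
diffs = pd.DataFrame({'YM': ['195905', '195906'],
                      'Diff_SodHel': [-4.435484, -3.916667]})

# Hypothetical output location; a temporary folder keeps the example self-contained
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'sodankyla_helsinki_monthly_diff.csv')

# index=False leaves out the row numbers, which carry no information here
diffs.to_csv(out_path, index=False)

# Read the file back to confirm what was written; keep YM as text
check = pd.read_csv(out_path, dtype={'YM': str})
print(check)
```

Reading `YM` back as text keeps it comparable with the string keys used elsewhere in the notebook.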


__Choose summer temperatures__
- Months are: `June`, `July`, `August`

In [87]:
# Keep only June, July and August (month numbers are zero-padded strings)
summer = monthlyData.loc[monthlyData['monthNumber_x'].isin(['06', '07', '08'])]

In [88]:
summer.head()

Unnamed: 0,YM,monthNumber_x,TAVG_Celsius_x,avgTempC,month_x,Diff,monthNumber_y,TAVG_Celsius_y,avgTempsC,month_y,Diff.1,Diff_SodHel
5,195906,6,11.157407,10.437809,June,0.719599,6,15.074074,14.711898,June,0.362176,-3.916667
6,195907,7,12.75,13.529078,July,-0.779078,7,17.831541,16.498881,July,1.33266,-5.081541
7,195908,8,11.424731,10.999253,August,0.425478,8,16.965812,15.022075,August,1.943737,-5.541081
17,196006,6,11.185185,10.437809,June,0.747377,6,16.277778,14.711898,June,1.56588,-5.092593
18,196007,7,16.962366,13.529078,July,3.433288,7,18.065134,16.498881,July,1.566253,-1.102769


__Calculate the mean summer temperature (`mean()`) for the Sodankyla Lokka station__

In [95]:
meanSummerSod = summer['TAVG_Celsius_x'].mean()

meanSummerSod

11.60569808631495

__Calculate the mean summer temperature (`mean()`) for the Helsinki station__

In [96]:
meanSummerHel = summer['TAVG_Celsius_y'].mean()

meanSummerHel

15.932024853245196

__Calculate the standard deviation (`std()`) of summer temperatures for the Sodankyla Lokka station__

In [98]:
stdSummerSod = summer['TAVG_Celsius_x'].std()

stdSummerSod

2.0697287769514627

__Calculate the standard deviation (`std()`) of summer temperatures for the Helsinki station__

In [99]:
stdSummerHel = summer['TAVG_Celsius_y'].std()

stdSummerHel

1.9683521407209792
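
The four summary values above can also be produced in a single call with `agg`. This is a sketch on invented numbers, not the real summer table; the column names follow the `_x` (Sodankyla) and `_y` (Helsinki) convention used above, and `stats` is a hypothetical name.

```python
import pandas as pd

# Invented summer means for three months at both stations
summer = pd.DataFrame({'TAVG_Celsius_x': [11.0, 13.0, 12.0],   # Sodankyla
                       'TAVG_Celsius_y': [15.0, 18.0, 17.0]})  # Helsinki

# Rows of the result are the statistics, columns are the stations
stats = summer[['TAVG_Celsius_x', 'TAVG_Celsius_y']].agg(['mean', 'std'])

print(stats)
```

Keep in mind that pandas' `std()` computes the sample standard deviation (`ddof=1`) by default, which differs slightly from the population value.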

In [100]:
grouped_month.head(10)

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN,TAVG,YM,monthNumber,TAVG_Celsius
0,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590101,0.03,,9.0,,195901,01,
1,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590102,0.00,,6.0,,195901,01,
2,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590103,0.02,,-9.0,,195901,01,
3,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590104,0.08,,10.0,,195901,01,
4,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590105,0.09,,13.0,,195901,01,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
21617,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20200806,0.00,69.0,50.0,59.5,202008,08,15.277778
21618,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20200807,0.00,69.0,51.0,60.0,202008,08,15.555556
21619,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20200808,0.00,73.0,42.0,57.5,202008,08,14.166667
21620,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20200809,0.01,71.0,54.0,62.5,202008,08,16.944444


In [103]:
grouped_data.head(10)

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN,TAVG,YM,monthNumber,TAVG_Celsius
0,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590101,0.03,,9.0,,195901,01,
1,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590102,0.00,,6.0,,195901,01,
2,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590103,0.02,,-9.0,,195901,01,
3,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590104,0.08,,10.0,,195901,01,
4,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590105,0.09,,13.0,,195901,01,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
339,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19591206,0.00,-2.0,-24.0,-13.0,195912,12,-25.000000
340,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19591207,0.00,9.0,-3.0,3.0,195912,12,-16.111111
341,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19591208,0.00,4.0,-8.0,-2.0,195912,12,-18.888889
342,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19591209,0.00,4.0,-20.0,-8.0,195912,12,-22.222222
