# Problem 4 (optional)

In this optional task you calculate weather anomalies for another location, starting from scratch by downloading the data yourself from NOAA.

## What to do

1. Start by downloading your own data (daily summaries for **January 1959 to August 2018**) for **Sodankyla Lokka** (note that the place name must be typed without the letter `ä`) from the [NOAA Climate Data Online Search](https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND). After changing the year in the date selection panel, make sure to also select the starting and ending days! Once the search has finished, click “Add to cart” for the selected station, then go to the cart. Select the ``Custom GHCN-Daily Text`` format for the output file and click continue.

    - From the `Station Detail & Data Flag Options`, select these two attributes: Station Name and Geographic Location. **Notice:** Do **NOT** include data flags because they make the data difficult to read. Use **Standard** units.
    - Also select Precipitation and Temperature, which are under a separate button below.
    - On the next page, enter your own email address; the weather data will be sent there after a short while.

2. After you have downloaded the data, you should first:

    - Calculate the average temperature using columns `TMAX` and `TMIN` and insert those values into a new column called `TAVG`.

3. Next, you should use the approaches learned during this week and used in Problem 3 to answer / do the following:

    - Calculate the temperature anomalies in Sodankylä, i.e., the difference between `reference_temps` and the average temperature for each month (see Problem 3).
    - Calculate the monthly temperature differences between the Sodankylä and Helsinki stations.
        - How different are the summer temperatures (June, July, August) between Helsinki (used in Problems 1-3) and Sodankylä?
        - What were the summer mean temperatures for both of these stations?
        - What were the summer standard deviations for both of these stations?
    - Calculate the monthly differences in a DataFrame and save it (as a `CSV` file) into your own Exercise repository for this week.
4. Upload your notebook and data to GitHub.

In [1]:
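import pandas as pd
# (start, end) character positions of each column in the fixed-width file
colspecs = [(0, 17), (17, 68), (68, 79), (79, 90), (90, 101), (101, 110), (110, 119), (119, 128), (128, 137)]
# Skip the separator row under the header and treat -9999 as missing
data = pd.read_fwf('data/data.txt', colspecs=colspecs, skiprows=[1], na_values=[-9999])
data.shape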

(20912, 9)
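
To see how `colspecs`, `skiprows` and `na_values` interact, here is a minimal sketch on a tiny in-memory table. The station ID, the column widths and the values are made up for illustration and do not come from the NOAA file; only the `pd.read_fwf` call mirrors the cell above.

```python
import io
import pandas as pd

# Hypothetical rows: header, separator line, then two days of data.
rows = [
    ("STATION", "DATE", "TMAX", "TMIN"),
    ("-" * 8, "-" * 8, "-" * 5, "-" * 5),
    ("AAA00001", "19590101", "-9999", "9"),
    ("AAA00001", "19590102", "20", "6"),
]
# Pad every field to a fixed width so that each column starts at a known position.
text = "\n".join(f"{a:<8} {b:<8} {c:>5} {d:>5}" for a, b, c, d in rows)

sample = pd.read_fwf(
    io.StringIO(text),
    colspecs=[(0, 8), (9, 17), (18, 23), (24, 29)],  # (start, end) per column
    skiprows=[1],        # drop the separator line under the header
    na_values=[-9999],   # the missing-value code becomes NaN
)
print(sample)
```

The `-9999` in the first data row is read as `NaN`, which is why the early `TMAX` values in the real data show up as empty cells.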

In [2]:
data.head()

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN
0,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590101,0.03,,9.0
1,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590102,0.0,,6.0
2,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590103,0.02,,-9.0
3,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590104,0.08,,10.0
4,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590105,0.09,,13.0


In [3]:
# Daily mean temperature (°F); the result is NaN wherever TMAX or TMIN is missing
data['TAVG'] = (data['TMAX'] + data['TMIN']) / 2

In [4]:
data.head()

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN,TAVG
0,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590101,0.03,,9.0,
1,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590102,0.0,,6.0,
2,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590103,0.02,,-9.0,
3,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590104,0.08,,10.0,
4,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590105,0.09,,13.0,


In [5]:
data.tail()

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN,TAVG
20907,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20180827,0.04,55.0,43.0,49.0
20908,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20180828,0.0,59.0,31.0,45.0
20909,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20180829,0.0,65.0,32.0,48.5
20910,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20180830,0.02,65.0,48.0,56.5
20911,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,20180831,0.0,59.0,46.0,52.5


In [6]:
from temp_functions import fahr_to_celsius

In [7]:
data['temp_celsius'] = data['TAVG'].apply(fahr_to_celsius)
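
The `temp_functions` module imported above is not shown in this notebook. Below is a minimal sketch of what such a conversion function could look like, using the standard Fahrenheit-to-Celsius formula. The name `fahr_to_celsius_sketch` is hypothetical, and the actual module may be written differently.

```python
import pandas as pd


def fahr_to_celsius_sketch(temp_fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_fahrenheit - 32) * 5 / 9


# Works on a single number ...
freezing = fahr_to_celsius_sketch(32.0)

# ... and element by element on a Series; missing values stay missing.
converted = pd.Series([32.0, 50.0, None]).apply(fahr_to_celsius_sketch)
print(freezing)
print(converted)
```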

In [8]:
data.head()

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN,TAVG,temp_celsius
0,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590101,0.03,,9.0,,
1,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590102,0.0,,6.0,,
2,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590103,0.02,,-9.0,,
3,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590104,0.08,,10.0,,
4,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590105,0.09,,13.0,,


In [9]:
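# Note: this is the mean of the daily maximum temperatures (TMAX, °F) for summer 1969;
# select 'TAVG' or 'temp_celsius' instead to get the mean temperature
avg_temp_1969_sodankyla = data.loc[(data['DATE'] >= 19690601) & (data['DATE'] <= 19690831), 'TMAX'].mean()
round(avg_temp_1969_sodankyla,2)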

60.66

In [10]:
data['DATE_STR']= data['DATE'].astype(str)
data['YEAR_MONTH']= data['DATE_STR'].str.slice(start=0, stop=6)
data.head()

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN,TAVG,temp_celsius,DATE_STR,YEAR_MONTH
0,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590101,0.03,,9.0,,,19590101,195901
1,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590102,0.0,,6.0,,,19590102,195901
2,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590103,0.02,,-9.0,,,19590103,195901
3,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590104,0.08,,10.0,,,19590104,195901
4,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590105,0.09,,13.0,,,19590105,195901


In [11]:
data['month'] = data['YEAR_MONTH'].str.slice(start=4, stop=6).astype(float)
data.head()

Unnamed: 0,STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN,TAVG,temp_celsius,DATE_STR,YEAR_MONTH,month
0,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590101,0.03,,9.0,,,19590101,195901,1.0
1,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590102,0.0,,6.0,,,19590102,195901,1.0
2,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590103,0.02,,-9.0,,,19590103,195901,1.0
3,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590104,0.08,,10.0,,,19590104,195901,1.0
4,GHCND:FIE00146538,SODANKYLA LOKKA FI,240,67.8206,27.7503,19590105,0.09,,13.0,,,19590105,195901,1.0
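
As an alternative to slicing the date string, the same two columns can be derived by parsing `DATE` into real timestamps. This is only a sketch on three made-up dates; note that `month` comes out as an integer here rather than the float used in the cells above.

```python
import pandas as pd

# Hypothetical sample of integer dates in the same YYYYMMDD layout as the DATE column.
dates = pd.DataFrame({"DATE": [19590101, 19590215, 20180831]})

# Parse the integers as timestamps, then derive the grouping keys from them.
parsed = pd.to_datetime(dates["DATE"].astype(str), format="%Y%m%d")
dates["YEAR_MONTH"] = parsed.dt.strftime("%Y%m")
dates["month"] = parsed.dt.month
print(dates)
```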


In [12]:
# Monthly means; a list of columns must be selected with double brackets
monthly_data = data.groupby(['YEAR_MONTH'], as_index=False)[['TAVG', 'TMAX', 'TMIN', 'temp_celsius', 'month']].mean()

# Reference temperature: mean over all years for each calendar month
data['ref_temp'] = data['temp_celsius']
reference_temp = data.groupby(['month'], as_index=False)['ref_temp'].mean()


In [13]:
monthly_data = monthly_data.merge(reference_temp, on='month', how='outer')
monthly_data.tail()

Unnamed: 0,YEAR_MONTH,TAVG,TMAX,TMIN,temp_celsius,month,ref_temp
682,201312,14.451613,22.935484,5.967742,-9.749104,12.0,-12.353487
683,201412,14.741935,20.225806,9.258065,-9.587814,12.0,-12.353487
684,201512,12.983871,22.483871,3.483871,-10.564516,12.0,-12.353487
685,201612,13.33871,22.935484,3.741935,-10.367384,12.0,-12.353487
686,201712,12.387097,20.354839,4.419355,-10.896057,12.0,-12.353487


In [14]:
monthly_data['diff'] = monthly_data['temp_celsius'] - monthly_data['ref_temp']

In [15]:
monthly_data.sort_values(by='YEAR_MONTH', ascending=True)

Unnamed: 0,YEAR_MONTH,TAVG,TMAX,TMIN,temp_celsius,month,ref_temp,diff
0,195901,,,-5.000000,,1.0,-14.682860,
57,195902,,,12.000000,,2.0,-14.127360,
114,195903,,,15.000000,,3.0,-9.549372,
171,195904,,,10.833333,,4.0,-3.118221,
228,195905,41.467742,49.935484,33.000000,5.259857,5.0,3.883698,1.376158
...,...,...,...,...,...,...,...,...
227,201804,28.733333,37.633333,19.833333,-1.814815,4.0,-3.118221,1.303406
284,201805,46.822581,58.000000,35.645161,8.234767,5.0,3.883698,4.351069
342,201806,49.733333,57.433333,42.033333,9.851852,6.0,10.388250,-0.536398
400,201807,63.870968,76.709677,51.032258,17.706093,7.0,13.543869,4.162224


In [16]:
helsinki_data = pd.read_csv('helsinki_temp.csv')
del helsinki_data['Unnamed: 0']
helsinki_data.tail()

Unnamed: 0,YEAR_MONTH,TAVG,TMAX,TMIN,temp_celsius,month,ref_temp,diff
785,201212,20.064516,24.096774,15.548387,-6.630824,12.0,-3.203887,-3.426938
786,201312,34.451613,37.935484,30.16129,1.362007,12.0,-3.203887,4.565894
787,201412,29.935484,33.580645,23.83871,-1.146953,12.0,-3.203887,2.056933
788,201512,35.967742,40.225806,30.322581,2.204301,12.0,-3.203887,5.408188
789,201612,30.580645,34.548387,25.83871,-0.78853,12.0,-3.203887,2.415356


In [17]:
helsinki_sorted = helsinki_data.sort_values(by='YEAR_MONTH', ascending=True)

In [18]:
helsinki_sorted = helsinki_sorted.reset_index(drop=True)

In [None]:
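# Mean summer (June-August) 1969 temperature in Helsinki; assumes YEAR_MONTH was read as integers
summer_1969 = (helsinki_data['YEAR_MONTH'] >= 196906) & (helsinki_data['YEAR_MONTH'] <= 196908)
avg_temp_helsinki = helsinki_data.loc[summer_1969, 'temp_celsius'].mean()
avg_temp_helsinki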

In [None]:
sodankyla_data = monthly_data.sort_values(by='YEAR_MONTH', ascending=True)


In [None]:
sodankyla_data= sodankyla_data.reset_index(drop=True)

In [None]:
sodankyla_data.head()
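
The remaining parts of step 3 (monthly differences between the stations, plus summer means and standard deviations) could be approached roughly as follows. This is a sketch on made-up numbers, not the real station data; the frame names, the `station_diff` column and the suffixes are hypothetical choices.

```python
import pandas as pd

# Hypothetical monthly mean temperatures (°C) for May-August of one year.
sodankyla = pd.DataFrame({
    "YEAR_MONTH": ["200005", "200006", "200007", "200008"],
    "month": [5, 6, 7, 8],
    "temp_celsius": [4.0, 10.0, 14.0, 12.0],
})
helsinki = pd.DataFrame({
    "YEAR_MONTH": ["200005", "200006", "200007", "200008"],
    "temp_celsius": [10.0, 15.0, 18.0, 16.0],
})

# Align the two stations month by month; only months present in both are kept.
merged = sodankyla.merge(helsinki, on="YEAR_MONTH", suffixes=("_sodankyla", "_helsinki"))
merged["station_diff"] = merged["temp_celsius_helsinki"] - merged["temp_celsius_sodankyla"]

# Summer (June, July, August) mean and standard deviation for each station.
summer = merged[merged["month"].isin([6, 7, 8])]
summer_stats = summer[["temp_celsius_sodankyla", "temp_celsius_helsinki"]].agg(["mean", "std"])
print(merged)
print(summer_stats)
```

Saving the result with `merged.to_csv(path, index=False)` leaves the index out of the file, which avoids the extra `Unnamed: 0` column that had to be deleted after reading `helsinki_temp.csv`.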