<a href="https://colab.research.google.com/github/GelResende/Exercise-2/blob/master/Exercise-6-problem-4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Problem 4 (optional)

Calculating weather anomalies for another location. In this optional task you get to start from scratch and download the data yourself from NOAA.

## What to do

1. Start by downloading your own data (daily summaries from **1959 through August 2018**) for **Sodankyla Lokka** (note that the place name must be typed without the letter `ä`) from the [NOAA Climate Data Online Search](https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND). After changing the year in the date selection panel, make sure to also click the starting day (and ending day). After you have searched, click “Add to cart” for the selected station, then go to the cart. Select the ``Custom GHCN-Daily Text`` format for the output file and click Continue.

    - From the `Station Detail & Data Flag Options`, select these two attributes: Station Name and Geographic Location. **Notice:** Do **NOT** include data flags, because they make the data difficult to read. Use **Standard** units.
    - Also select Precipitation and Temperature, which are under a separate button below.
    - On the next page, enter your own email address; the weather data will be sent there after a short moment.

2. After you have downloaded the data, you should first:

    - Calculate the average temperature using columns `TMAX` and `TMIN` and insert those values into a new column called `TAVG`.

3. Next, you should use the approaches learned during this week and used in Problem 3 to answer / do the following:

    - Calculate the temperature anomalies in Sodankyla, i.e. the difference between `reference_temps` and the average temperature for each month (see Problem 3).
    - Calculate the monthly temperature differences between the Sodankyla and Helsinki stations:
        - How different are the summer temperatures (June, July, August) between Helsinki (used in Problems 1-3) and Sodankyla station?
        - What were the summer mean temperatures for both of these stations?
        - What were the summer standard deviations for both of these stations?
    - Calculate the monthly differences in a DataFrame and save it (as a `CSV` file) in your own Exercise repository for this week.
4. Upload your script and data to GitHub.
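
Steps 2 and 3 can be tried on a tiny, made-up table before running them on the real download. The following is a minimal sketch, assuming temperatures that are already in Celsius and using invented values. The column names `TMAX`, `TMIN`, `TAVG` and the name `reference_temps` follow the task text, while `toy`, `ref_temp` and `anomaly` are hypothetical names chosen for illustration. Here the reference is simply the mean over all years in the toy table, which may differ from the reference period used in Problem 3.

```python
import pandas as pd

# Invented daily values for two years of one month (not real NOAA data)
toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2000-07-01", "2000-07-02", "2001-07-01", "2001-07-02"]),
    "TMAX": [20.0, 22.0, 24.0, 26.0],
    "TMIN": [10.0, 12.0, 14.0, 16.0],
})

# Step 2: daily average of TMAX and TMIN
toy["TAVG"] = toy[["TMAX", "TMIN"]].mean(axis=1)

# Step 3: monthly means per year, then one reference value per calendar month
toy["YEAR"] = toy["DATE"].dt.year
toy["month"] = toy["DATE"].dt.month
monthly = toy.groupby(["YEAR", "month"], as_index=False)["TAVG"].mean()
reference_temps = (
    monthly.groupby("month", as_index=False)["TAVG"].mean()
    .rename(columns={"TAVG": "ref_temp"})
)

# Signed anomaly: positive means warmer than the reference
monthly = monthly.merge(reference_temps, on="month", how="left")
monthly["anomaly"] = monthly["TAVG"] - monthly["ref_temp"]
print(monthly)
```

Keeping the sign of the anomaly shows whether a month was warmer or colder than the reference; taking the absolute value would keep only the size of the deviation.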

In [41]:
def fahr_to_celsius(temp_fahrenheit):
    """Function to convert Fahrenheit temperature into Celsius.

    Parameters
    ----------

    temp_fahrenheit: int | float
        Input temperature in Fahrenheit (should be a number)
        
    Returns
    -------
    
    Temperature in Celsius (float)
    """

    # Convert the Fahrenheit into Celsius
    converted_temp = (temp_fahrenheit - 32) / 1.8
    
    return converted_temp
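
As a quick check, the conversion can be called on plain numbers. The sketch below repeats the definition so that it runs on its own; the test values are the freezing and boiling points of water and the temperature at which the two scales coincide.

```python
def fahr_to_celsius(temp_fahrenheit):
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_fahrenheit - 32) / 1.8


# Freezing point of water, boiling point of water, and the point where both scales agree
for temp_f in (32, 212, -40):
    print(temp_f, "F =", round(fahr_to_celsius(temp_f), 2), "C")
```

Because the function uses only arithmetic operators, it also works element-wise on a pandas Series, which is how it is applied to the `TAVG` column in the next cell.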

In [42]:
# Import the pandas package as pd
import pandas as pd

# Relative path to the Sodankyla data file
fp = r'data/2580888.txt'

# Read the data using a varying number of spaces as the separator,
# treating -9999 as NoData and skipping the second row
data = pd.read_csv(fp, sep=r'\s+', na_values=['-9999'], skiprows=[1])

# Daily average of TMAX and TMIN (Fahrenheit), converted to Celsius
data['TAVG'] = data[['TMAX', 'TMIN']].mean(axis=1)
data['TAVG'] = fahr_to_celsius(data['TAVG'])

# Parse the dates and extract the year and month
data['DATE'] = pd.to_datetime(data['DATE'], format='%Y%m%d', exact=False)
data['YEAR'] = data['DATE'].dt.year
data['month'] = data['DATE'].dt.month

# Monthly means for each year (numeric_only=True skips non-numeric columns)
monthly_data = data.groupby(['YEAR', 'month'], as_index=False).mean(numeric_only=True)

# Reference temperature for each calendar month, averaged over the whole record
reference_temps = data.groupby('month', as_index=False).mean(numeric_only=True)
reference_temps = pd.DataFrame(reference_temps[['month', 'TAVG']])
new_names = {'month': 'month', 'TAVG': 'ref_temp'}
reference_temps = reference_temps.rename(columns=new_names)

# Attach the reference temperature to every monthly row
monthly_data = monthly_data.merge(reference_temps, on='month', how='outer')

# Absolute difference between each monthly mean and its reference
# (the sign of the anomaly is dropped)
monthly_data['diff'] = abs(monthly_data['TAVG'] - monthly_data['ref_temp'])

print(len(monthly_data))

print(monthly_data.head())

# Summary statistics (displayed only if this is the last expression in the cell)
monthly_data[["TAVG", "ref_temp", "diff"]].describe()

# Mean and standard deviation of the monthly values for each calendar month
monthly_Sodankyla = monthly_data.groupby('month', as_index=False).mean()
monthly_Sodankyla['SD'] = monthly_data.groupby('month', as_index=False).std()['TAVG']
print(monthly_Sodankyla)

# Calendar month of the row with the largest absolute difference
max_Sodankyla = monthly_data.loc[monthly_data['diff'] == monthly_data['diff'].max(), 'month'].iloc[0]

248
     YEAR  month  ELEVATION  ...       TAVG   ref_temp      diff
0  1959.0    8.0        240  ...  11.424731  10.603505  0.821226
1  1960.0    8.0        240  ...  11.998208  10.603505  1.394703
2  1961.0    8.0        240  ...  10.752688  10.603505  0.149183
3  1962.0    8.0        240  ...   8.566308  10.603505  2.037197
4  1963.0    8.0        240  ...  11.362007  10.603505  0.758502

[5 rows x 11 columns]
    month         YEAR  ELEVATION  ...   ref_temp      diff        SD
0     1.0  1970.550000        240  ... -16.244176  2.945654  3.587734
1     2.0  1970.550000        240  ... -15.907931  2.697506  3.448854
2     3.0  1970.550000        240  ... -11.252240  3.265457  3.973790
3     4.0  1970.550000        240  ...  -4.109722  1.564352  1.874112
4     5.0  1970.550000        240  ...   3.323477  1.528495  2.238401
5     6.0  1970.761905        240  ...   9.913580  1.670782  2.073982
6     7.0  1970.761905        240  ...  13.016300  1.337950  1.773694
7     8.0  1970.227273 

In [43]:
# Import the pandas package as pd
import pandas as pd

# Relative path to the Helsinki data file
fp = r'data/2580956.txt'

# Read the data using a varying number of spaces as the separator,
# treating -9999 as NoData and skipping the second row
data = pd.read_csv(fp, sep=r'\s+', na_values=['-9999'], skiprows=[1])

# Daily average of TMAX and TMIN (Fahrenheit), converted to Celsius
data['TAVG'] = data[['TMAX', 'TMIN']].mean(axis=1)
data['TAVG'] = fahr_to_celsius(data['TAVG'])

# Parse the dates and extract the year and month
data['DATE'] = pd.to_datetime(data['DATE'], format='%Y%m%d', exact=False)
data['YEAR'] = data['DATE'].dt.year
data['month'] = data['DATE'].dt.month

# Monthly means for each year (numeric_only=True skips non-numeric columns)
monthly_data = data.groupby(['YEAR', 'month'], as_index=False).mean(numeric_only=True)

# Reference temperature for each calendar month, averaged over the whole record
reference_temps = data.groupby('month', as_index=False).mean(numeric_only=True)
reference_temps = pd.DataFrame(reference_temps[['month', 'TAVG']])
new_names = {'month': 'month', 'TAVG': 'ref_temp'}
reference_temps = reference_temps.rename(columns=new_names)

# Attach the reference temperature to every monthly row
monthly_data = monthly_data.merge(reference_temps, on='month', how='outer')

# Absolute difference between each monthly mean and its reference
# (the sign of the anomaly is dropped)
monthly_data['diff'] = abs(monthly_data['TAVG'] - monthly_data['ref_temp'])

print(len(monthly_data))

print(monthly_data.head())

# Summary statistics (displayed only if this is the last expression in the cell)
monthly_data[["TAVG", "ref_temp", "diff"]].describe()

# Mean and standard deviation of the monthly values for each calendar month
monthly_Helsinki = monthly_data.groupby('month', as_index=False).mean()
monthly_Helsinki['SD'] = monthly_data.groupby('month', as_index=False).std()['TAVG']
print(monthly_Helsinki)

# Calendar month of the row with the largest absolute difference
max_Helsinki = monthly_data.loc[monthly_data['diff'] == monthly_data['diff'].max(), 'month'].iloc[0]

153
   YEAR  month  ELEVATION  LATITUDE  ...       TMIN      TAVG  ref_temp      diff
0  2005     12         24   60.2028  ...  20.526316 -4.093567 -0.469373  3.624194
1  2006     12         24   60.2028  ...  34.935484  3.494624 -0.469373  3.963997
2  2007     12         24   60.2028  ...  32.612903  1.980287 -0.469373  2.449660
3  2008     12         24   60.2028  ...  31.064516  0.931900 -0.469373  1.401273
4  2009     12         24   60.2028  ...  20.766667 -3.833333 -0.469373  3.363960

[5 rows x 11 columns]
    month    YEAR  ELEVATION  ...   ref_temp      diff        SD
0       1  2012.0       24.0  ...  -4.156198  2.417933  3.194134
1       2  2012.0       24.0  ...  -4.290039  3.533237  3.955479
2       3  2012.0       24.0  ...  -0.545191  2.321645  2.876519
3       4  2012.0       24.0  ...   5.133903  0.901381  1.116906
4       5  2012.0       24.0  ...  11.701130  1.190537  1.582917
5       6  2012.0       24.0  ...  15.302119  1.082873  1.335892
6       7  2012.0       24
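
The Sodankyla and Helsinki cells repeat the same steps, so they could be wrapped in one function. The following is a sketch only: `monthly_summary` is a hypothetical helper name, it takes an already-loaded DataFrame with `DATE`, `TMAX` and `TMIN` columns (Fahrenheit, as in the files above), and the demo values are invented so that the block runs without the data files.

```python
import pandas as pd


def fahr_to_celsius(temp_fahrenheit):
    """Convert Fahrenheit to Celsius (same formula as the function defined earlier)."""
    return (temp_fahrenheit - 32) / 1.8


def monthly_summary(data):
    """Hypothetical helper: mean and standard deviation of monthly TAVG (Celsius) per calendar month."""
    data = data.copy()
    data["TAVG"] = fahr_to_celsius(data[["TMAX", "TMIN"]].mean(axis=1))
    data["DATE"] = pd.to_datetime(data["DATE"], format="%Y%m%d")
    data["YEAR"] = data["DATE"].dt.year
    data["month"] = data["DATE"].dt.month
    # One mean value per (year, month)
    monthly = data.groupby(["YEAR", "month"], as_index=False)["TAVG"].mean()
    # Mean and spread of those monthly means across years
    return monthly.groupby("month")["TAVG"].agg(TAVG="mean", SD="std").reset_index()


# Invented demo data: two Julys with constant daily temperatures
demo = pd.DataFrame({
    "DATE": [20000701, 20000702, 20010701, 20010702],
    "TMAX": [68.0, 68.0, 77.0, 77.0],
    "TMIN": [50.0, 50.0, 59.0, 59.0],
})
summary = monthly_summary(demo)
print(summary)
```

With such a helper, each station would need only a `pd.read_csv` call followed by one function call, and both stations would be guaranteed to go through identical processing.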

In [45]:
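# Join the monthly means and standard deviations of both stations on the calendar month
comparing = monthly_Helsinki[['month', 'TAVG', 'SD']].merge(monthly_Sodankyla[['month', 'TAVG', 'SD']],
                                                         on='month', how='outer')

# The merge suffixes _x (left, Helsinki) and _y (right, Sodankyla) are renamed to station names
new_names = {'month': 'month', 'TAVG_x': 'TAVG_Helsinki', 'TAVG_y': 'TAVG_Sodankyla', 'SD_x': 'SD_Helsinki', 'SD_y': 'SD_Sodankyla'}
comparing = comparing.rename(columns=new_names)

# Absolute monthly temperature difference between the two stations
comparing['diff'] = abs(comparing['TAVG_Helsinki'] - comparing['TAVG_Sodankyla'])

# Save the comparison table as a CSV file
comparing.to_csv('problem_4_output.csv')

comparing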

|   | month | TAVG_Helsinki | SD_Helsinki | TAVG_Sodankyla | SD_Sodankyla | diff |
|---|---|---|---|---|---|---|
| 0 | 1 | -4.19923 | 3.194134 | -16.244176 | 3.587734 | 12.044946 |
| 1 | 2 | -4.307555 | 3.955479 | -15.907345 | 3.448854 | 11.599789 |
| 2 | 3 | -0.540805 | 2.876519 | -11.25224 | 3.97379 | 10.711435 |
| 3 | 4 | 5.133903 | 1.116906 | -4.109722 | 1.874112 | 9.243625 |
| 4 | 5 | 11.70113 | 1.582917 | 3.323477 | 2.238401 | 8.377654 |
| 5 | 6 | 15.303124 | 1.335892 | 9.91358 | 2.073982 | 5.389544 |
| 6 | 7 | 18.819088 | 1.861487 | 13.0163 | 1.773694 | 5.802789 |
| 7 | 8 | 17.490498 | 1.075956 | 10.602026 | 1.124511 | 6.888471 |
| 8 | 9 | 12.784669 | 1.071033 | 5.123898 | 1.403264 | 7.660771 |
| 9 | 10 | 6.679633 | 1.413434 | -1.798515 | 2.757865 | 8.478148 |
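
The summer questions in the task (June, July, August) can be answered from the June to August rows of the table above. The following is a minimal sketch that copies those three rows by hand; `summer`, `summer_means` and `summer_gap` are hypothetical names, and taking the unweighted mean of three monthly means is an assumption that ignores the differing month lengths. A summer standard deviation would have to be computed from the underlying monthly values rather than from this table.

```python
import pandas as pd

# June-August rows copied from the comparison table above
summer = pd.DataFrame({
    "month": [6, 7, 8],
    "TAVG_Helsinki": [15.303124, 18.819088, 17.490498],
    "TAVG_Sodankyla": [9.91358, 13.0163, 10.602026],
})

# Unweighted summer mean per station, and the gap between the two stations
summer_means = summer[["TAVG_Helsinki", "TAVG_Sodankyla"]].mean()
summer_gap = summer_means["TAVG_Helsinki"] - summer_means["TAVG_Sodankyla"]
print(summer_means.round(2))
print("Helsinki minus Sodankyla:", round(summer_gap, 2))
```

On the full `comparing` DataFrame, the same rows could be selected with `comparing[comparing['month'].isin([6, 7, 8])]` instead of retyping the values.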
