# Problem 4 (optional)

Calculating weather anomalies for another location. In this optional task you get to start from scratch and download the data yourself from NOAA.

## What to do

1. Start by downloading your own data (daily summaries from **1959 through August 2018**) for **Sodankyla Lokka** (note that the place name must be typed without the letter `ä`) from the [NOAA Climate Data Online Search](https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND). Make sure to select the starting day (and ending day) in the date selection panel after changing the year! After you have searched, click “Add to cart” for the selected station, then go to the cart. Select the ``Custom GHCN-Daily Text`` format for the resulting output file and hit continue.

    - From the `Station Detail & Data Flag Options`, select both of these attributes: Station Name and Geographic Location. **Notice:** Do **NOT** include data flags, because they make the data difficult to read. Use **Standard** units.
    - Also select Precipitation and Temperature, which are under a separate button below.
    - On the next page, enter the email address where the weather data should be sent. It should arrive after a short moment.

2. After you have downloaded the data, you should first:

    - Calculate the average temperature using columns `TMAX` and `TMIN` and insert those values into a new column called `TAVG`.

3. Next, you should use the approaches learned during this week and used in Problem 3 to answer / do the following:

    - Calculate the temperature anomalies in Sodankylä, i.e., the difference between `reference_temps` and the average temperature for each month (see Problem 3).
    - Calculate the monthly temperature differences between the Sodankylä and Helsinki stations.
        - How different are the summer temperatures (June, July, August) between Helsinki (used in Problems 1-3) and Sodankylä?
        - What were the summer mean temperatures for both of these stations?
        - What were the summer standard deviations for both of these stations?
    - Calculate the monthly differences in a DataFrame and save it (as a `CSV` file) into your own Exercise repository for this week.
4. Upload your notebook and data to GitHub.

In [3]:
import pandas as pd

In [64]:
# Read the whitespace-delimited file: skip the second line (the row of dashes under the header) and treat -9999 as missing
data_sodankyla = pd.read_csv('data/3332584.txt',skiprows=[1],na_values=[-9999],sep=r'\s+')
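
The read step can be tried without downloading anything. The sketch below parses a small made-up sample laid out like a GHCN-Daily text export (header row, row of dashes, whitespace-separated values). The station ID and all values are hypothetical, and `sep=r"\s+"` is used to split fields on runs of whitespace.

```python
import io
import pandas as pd

# Hypothetical three-day sample: header, a row of dashes, then the data rows.
sample = """STATION DATE PRCP TMAX TMIN
------- ---- ---- ---- ----
GHCND:XX000000001 19590101 0.00 10 -5
GHCND:XX000000001 19590102 -9999 12 -9999
GHCND:XX000000001 19590103 0.10 -9999 -9999
"""

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",         # split on any run of whitespace
    skiprows=[1],       # drop the row of dashes under the header
    na_values=[-9999],  # -9999 marks a missing observation
)

print(df)
print(df.isna().sum())  # number of missing values per column
```

Counting the missing values right after reading is a quick way to confirm that the `-9999` placeholders were recognised.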

In [65]:
# Keep the relevant columns. reset_index() turns the index levels into regular columns; level_0 holds the station ID and is renamed to 'Station' for readability
data_sodankyla=data_sodankyla.reset_index()[['level_0','STATION','ELEVATION','LATITUDE','LONGITUDE','DATE','PRCP','TMAX','TMIN']].rename(columns={'level_0':'Station'})

In [71]:
# Calculate the average temperature from `TMAX` and `TMIN` and store it in a new column called `TAVG` (still in Fahrenheit)
data_sodankyla['TAVG']=(data_sodankyla['TMAX']+data_sodankyla['TMIN'])/2
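
One detail of this formula is worth seeing on a small example: if either `TMAX` or `TMIN` is missing, the sum is `NaN`, so `TAVG` is missing for that day. The sketch below uses made-up values and compares the formula with a row-wise mean, which by default ignores missing values. The column name `TAVG_row_mean` is only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical observations with one missing TMIN and one missing TMAX.
obs = pd.DataFrame({"TMAX": [30.0, 20.0, np.nan], "TMIN": [10.0, np.nan, 5.0]})

# Same formula as in the notebook: NaN in either column gives NaN.
obs["TAVG"] = (obs["TMAX"] + obs["TMIN"]) / 2

# Row-wise mean skips NaN, so it falls back to the single available value.
obs["TAVG_row_mean"] = obs[["TMAX", "TMIN"]].mean(axis=1)

print(obs)
```

For a daily average, the first behaviour is usually the safer one, because a "mean" built from only the daily maximum or minimum would be biased.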

In [72]:
# Slice the DATE column (converted to a string) to extract the year and month into their own columns
data_sodankyla["YEAR"] = data_sodankyla["DATE"].astype('str').str.slice(start=0, stop=4)
data_sodankyla["MONTH"] = data_sodankyla["DATE"].astype('str').str.slice(start=4, stop=6)
# Let's see what we have
data_sodankyla.tail()

Unnamed: 0,Station,STATION,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TMAX,TMIN,YEAR,MONTH,TAVG
22580,GHCND:FIE00146538,LOKKA,240,67.8206,27.7503,20230327,0.0,23.0,-21.0,2023,3,1.0
22581,GHCND:FIE00146538,LOKKA,240,67.8206,27.7503,20230328,0.0,20.0,-19.0,2023,3,0.5
22582,GHCND:FIE00146538,LOKKA,240,67.8206,27.7503,20230329,0.2,22.0,-7.0,2023,3,7.5
22583,GHCND:FIE00146538,LOKKA,240,67.8206,27.7503,20230330,0.02,24.0,12.0,2023,3,18.0
22584,GHCND:FIE00146538,LOKKA,240,67.8206,27.7503,20230331,0.0,26.0,5.0,2023,3,15.5
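
As an alternative to string slicing, the dates can be parsed with `pd.to_datetime`. This is a minimal sketch on made-up dates, not the approach used in this notebook. It yields integer years and months, so later year comparisons would not need `astype('int')`.

```python
import pandas as pd

# Hypothetical DATE values in the same YYYYMMDD integer form as the data file.
dates = pd.DataFrame({"DATE": [19590101, 19901231, 20180831]})

# Parse the eight-digit dates, then read year and month from the result.
parsed = pd.to_datetime(dates["DATE"].astype(str), format="%Y%m%d")
dates["YEAR"] = parsed.dt.year
dates["MONTH"] = parsed.dt.month

print(dates)
```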


In [73]:
def fahr_to_celsius(temp_fahrenheit):
    """Convert a temperature from Fahrenheit to Celsius.

    Takes a temperature in degrees Fahrenheit and returns it in degrees Celsius.
    """

    # Convert the Fahrenheit into Celsius
    converted_temp = (temp_fahrenheit - 32) / 1.8

    return converted_temp

In [74]:
# Apply fahr_to_celsius to the TAVG column and store the converted values in a new celsius column
data_sodankyla['celsius']=data_sodankyla['TAVG'].apply(fahr_to_celsius)
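
Because the conversion uses only arithmetic, the function can also be called on the whole column at once instead of going through `.apply`. The sketch below repeats the function so that it runs on its own, uses made-up Fahrenheit values, and checks that both routes agree.

```python
import pandas as pd

def fahr_to_celsius(temp_fahrenheit):
    """Convert Fahrenheit to Celsius (same formula as in the notebook)."""
    return (temp_fahrenheit - 32) / 1.8

# Hypothetical Fahrenheit values, including a missing one.
temps_f = pd.Series([32.0, 50.0, -40.0, float("nan")])

via_apply = temps_f.apply(fahr_to_celsius)  # element by element
vectorized = fahr_to_celsius(temps_f)       # whole Series in one operation

print(vectorized)
```

On a column of some 20,000 daily values the vectorized call is typically faster, and missing values pass through as `NaN` in both versions.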

In [90]:
# Group by year and month and take the mean of the celsius column
monthly_mean_temp_sodankyla=data_sodankyla.groupby(['YEAR','MONTH'])['celsius'].agg(['mean'])

In [91]:
# Create a new DataFrame, monthly_data_sodankyla, and store the monthly mean temperatures in its temp_celsius column
monthly_data_sodankyla=pd.DataFrame()
monthly_data_sodankyla['temp_celsius']=monthly_mean_temp_sodankyla['mean']

In [92]:
monthly_data_sodankyla=monthly_data_sodankyla.reset_index()

In [94]:
# Reference temperatures: mean of the daily Celsius values for each calendar month over the 1952-1980 reference period (see Problem 3)
ref_temp_sodankyla=pd.DataFrame({'ref_temp':data_sodankyla[(data_sodankyla['YEAR'].astype('int') >=  1952) & (data_sodankyla['YEAR'].astype('int') <= 1980)].groupby('MONTH')['celsius'].mean()})

In [95]:
ref_temp_sodankyla=ref_temp_sodankyla.reset_index()

In [100]:
# ref_temp_sodankyla now contains the mean temperature of each calendar month over the 1952-1980 reference period
ref_temp_sodankyla

Unnamed: 0,MONTH,ref_temp
0,1,-16.153425
1,2,-16.216231
2,3,-11.184289
3,4,-4.104938
4,5,3.423411
5,6,10.291667
6,7,12.93529
7,8,10.635753
8,9,5.119444
9,10,-1.918459


In [96]:
# merging ref_temp and monthly_data dataframe based on month column
monthly_data_sodankyla = monthly_data_sodankyla.merge(ref_temp_sodankyla, on='MONTH')

In [97]:
# Check the monthly data:
monthly_data_sodankyla.head()

Unnamed: 0,YEAR,MONTH,temp_celsius,ref_temp
0,1959,1,,-16.153425
1,1960,1,-19.121864,-16.153425
2,1961,1,-11.182796,-16.153425
3,1962,1,-15.421147,-16.153425
4,1963,1,-18.145161,-16.153425


In [98]:
# Create a new column, diff, as the difference between temp_celsius and ref_temp (the temperature anomaly)
monthly_data_sodankyla['diff']=monthly_data_sodankyla['temp_celsius']-monthly_data_sodankyla['ref_temp']
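
The reference-and-anomaly logic can be checked by hand on a tiny example. The sketch below uses made-up monthly means for two calendar months. As a simplification, it builds the reference from monthly means rather than from daily values as in the notebook. The names `in_reference`, `ref` and `anomalies` are chosen for illustration.

```python
import pandas as pd

# Hypothetical monthly means (degrees Celsius) for January and July in three years.
monthly = pd.DataFrame({
    "YEAR":  [1960, 1960, 1970, 1970, 1990, 1990],
    "MONTH": [1, 7, 1, 7, 1, 7],
    "temp_celsius": [-18.0, 12.0, -14.0, 14.0, -10.0, 15.0],
})

# Reference value per calendar month, using only years inside the reference period.
in_reference = monthly["YEAR"].between(1952, 1980)
ref = (
    monthly[in_reference]
    .groupby("MONTH")["temp_celsius"]
    .mean()
    .rename("ref_temp")
    .reset_index()
)

# Attach the reference to every row of the same calendar month and subtract.
anomalies = monthly.merge(ref, on="MONTH", how="left")
anomalies["diff"] = anomalies["temp_celsius"] - anomalies["ref_temp"]

print(anomalies)
```

Here the January reference is -16 and the July reference is 13, so January 1990 gets an anomaly of +6 degrees: a positive `diff` means the month was warmer than the reference.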

In [99]:
# Print out descriptive statistics for the relevant columns:
monthly_data_sodankyla[["temp_celsius", "ref_temp", "diff"]].describe()

Unnamed: 0,temp_celsius,ref_temp,diff
count,738.0,742.0,738.0
mean,-1.331566,-2.364368,0.981039
std,10.218366,10.280315,2.931099
min,-24.543651,-16.216231,-8.428704
25%,-9.997685,-12.656623,-0.745139
50%,-1.521207,-3.011699,0.882572
75%,8.227001,8.998611,2.686014
max,17.706093,12.93529,12.158691


In [103]:
# Sort the rows by the absolute value of the anomaly, largest first, to find the largest anomaly in the observed period
anomaly_temp_sodankyla=monthly_data_sodankyla.reindex(monthly_data_sodankyla['diff'].abs().sort_values(ascending=False).index)

In [104]:
anomaly_temp_sodankyla.head(1)

Unnamed: 0,YEAR,MONTH,temp_celsius,ref_temp,diff
90,1990,2,-4.05754,-16.216231,12.158691


In [105]:
# Print the month with the largest temperature anomaly (column 1 of the sorted DataFrame is MONTH)
print(f'Month with the largest temperature anomaly:{anomaly_temp_sodankyla.iloc[0,1]}')

Month with the largest temperature anomaly:02


In [106]:
# Print the largest positive and negative temperature anomalies
print(f"The largest positive temperature anomaly during the observed time period was {monthly_data_sodankyla['diff'].max():.2f} degrees Celsius.")
print(f"The largest negative temperature anomaly during the observed time period was {monthly_data_sodankyla['diff'].min():.2f} degrees Celsius.")

The largest positive temperature anomaly during the observed time period was 12.16 degrees Celsius.
The largest negative temperature anomaly during the observed time period was -8.43 degrees Celsius.
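
The year and month of the largest anomaly can also be looked up directly with `idxmax`, without sorting the whole table. This is a small sketch on made-up anomaly values. The names `anoms` and `largest` are only for illustration.

```python
import pandas as pd

# Hypothetical anomalies (degrees Celsius) for three months.
anoms = pd.DataFrame({
    "YEAR": [1966, 1974, 2003],
    "MONTH": ["01", "02", "07"],
    "diff": [-5.0, 9.5, 3.1],
})

# idxmax on the absolute values returns the row label of the largest anomaly.
largest = anoms.loc[anoms["diff"].abs().idxmax()]

print(f"Largest anomaly: {largest['diff']:.1f} degrees in {largest['YEAR']}-{largest['MONTH']}")
```

Using `idxmin` on the `diff` column in the same way would return the row of the largest negative anomaly.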


## How different are the summer temperatures (June, July, August) between Helsinki (used in Problems 1-3) and Sodankylä?

In [145]:
# Monthly reference means (1952-1980) for Sodankylä, used below to pick out the summer months
summer_temp_sodankyla=pd.DataFrame({'ref_temp_sodankyla':data_sodankyla[(data_sodankyla['YEAR'].astype('int') >=  1952) & (data_sodankyla['YEAR'].astype('int') <= 1980)].groupby('MONTH')['celsius'].mean()})

In [146]:
# Keep the June, July and August averages for Sodankylä (row positions 5-7 of the twelve sorted months)
summer_temp_sodankyla=summer_temp_sodankyla.reset_index().iloc[[5,6,7]]

In [147]:
summer_temp_sodankyla

Unnamed: 0,MONTH,ref_temp_sodankyla
5,6,10.291667
6,7,12.93529
7,8,10.635753


In [148]:
# Read the Helsinki data the same way: whitespace-delimited, second line skipped, -9999 treated as missing
data_helsinki = pd.read_csv('data/1091402.txt',skiprows=[1],na_values=[-9999],sep=r'\s+')

In [149]:
# Slice the DATE column (converted to a string) to extract the year and month into their own columns
data_helsinki["YEAR"] = data_helsinki["DATE"].astype('str').str.slice(start=0, stop=4)
data_helsinki["MONTH"] = data_helsinki["DATE"].astype('str').str.slice(start=4, stop=6)
# Let's see what we have
data_helsinki.tail()

Unnamed: 0,STATION,ELEVATION,LATITUDE,LONGITUDE,DATE,PRCP,TAVG,TMAX,TMIN,YEAR,MONTH
23711,GHCND:FIE00142080,51,60.3269,24.9603,20170930,,47.0,49.0,44.0,2017,9
23712,GHCND:FIE00142080,51,60.3269,24.9603,20171001,0.04,47.0,48.0,45.0,2017,10
23713,GHCND:FIE00142080,51,60.3269,24.9603,20171002,,47.0,49.0,46.0,2017,10
23714,GHCND:FIE00142080,51,60.3269,24.9603,20171003,0.94,47.0,,44.0,2017,10
23715,GHCND:FIE00142080,51,60.3269,24.9603,20171004,0.51,52.0,56.0,,2017,10


In [150]:
# Apply fahr_to_celsius to the TAVG column and store the converted values in a new celsius column
data_helsinki['celsius']=data_helsinki['TAVG'].apply(fahr_to_celsius)

In [175]:
# Monthly reference means (1952-1980) for Helsinki, used below to pick out the summer months
ref_temp_helsinki=pd.DataFrame({'ref_temp_helsinki':data_helsinki[(data_helsinki['YEAR'].astype('int') >=  1952) & (data_helsinki['YEAR'].astype('int') <= 1980)].groupby('MONTH')['celsius'].mean()})

In [176]:
# Keep the June, July and August averages for Helsinki (row positions 5-7 of the twelve sorted months)
summer_temp_helsinki=ref_temp_helsinki.reset_index().iloc[[5,6,7]]

In [153]:
# Summer temperatures in Helsinki
summer_temp_helsinki

Unnamed: 0,MONTH,ref_temp_helsinki
5,6,14.711898
6,7,16.498881
7,8,15.022075


In [154]:
# creating dataframe for summer statistics
summer_stats=pd.DataFrame()

In [155]:
# Merge summer_temp_helsinki and summer_temp_sodankyla on the MONTH column
summer_stats = summer_temp_helsinki.merge(summer_temp_sodankyla, on='MONTH')

In [156]:
# Calculate the summer temperature difference for each month (Helsinki minus Sodankylä)
summer_stats['diff_summer']=summer_stats['ref_temp_helsinki']-summer_stats['ref_temp_sodankyla']

In [157]:
summer_stats

Unnamed: 0,MONTH,ref_temp_helsinki,ref_temp_sodankyla,diff_summer
0,6,14.711898,10.291667,4.420231
1,7,16.498881,12.93529,3.563592
2,8,15.022075,10.635753,4.386322


- What were the summer mean temperatures for both of these stations?
- What were the summer standard deviations for both of these stations?

In [164]:
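# Mean and standard deviation across the three monthly reference means (June, July, August) for Helsinki
print(f'mean and standard deviation in temperature in helsinki respectively {summer_stats.ref_temp_helsinki.mean()} and {summer_stats.ref_temp_helsinki.std()}')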

mean and standard deviation in temperature in helsinki respectively 15.410951408467128 and 0.9548540692973404


In [165]:
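# Mean and standard deviation across the three monthly reference means (June, July, August) for Sodankylä
print(f'mean and standard deviation in temperature in sodankyla respectively {summer_stats.ref_temp_sodankyla.mean()} and {summer_stats.ref_temp_sodankyla.std()}')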

mean and standard deviation in temperature in sodankyla respectively 11.287569749644376 and 1.437301309179359
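
The standard deviations above describe the spread between the three monthly reference means. Another reading of the question, offered here only as a sketch on made-up data, is how much the summer mean varies from year to year. That can be estimated by averaging June to August within each year first and then taking the statistics across the years.

```python
import pandas as pd

# Hypothetical monthly mean temperatures (degrees Celsius) for two summers.
monthly = pd.DataFrame({
    "YEAR":  [2000, 2000, 2000, 2001, 2001, 2001],
    "MONTH": [6, 7, 8, 6, 7, 8],
    "temp_celsius": [10.0, 14.0, 12.0, 12.0, 16.0, 14.0],
})

# Keep only the summer months, then compute one mean per summer.
summer = monthly[monthly["MONTH"].isin([6, 7, 8])]
per_summer = summer.groupby("YEAR")["temp_celsius"].mean()

# Statistics across the summers (pandas uses the sample standard deviation, ddof=1).
print(per_summer.mean(), per_summer.std())
```

Which definition is appropriate depends on how the question is interpreted, so it helps to state the choice next to the reported numbers.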


## Monthly difference

In [200]:
# Turn the MONTH index of the Helsinki monthly reference temperatures into a regular column
ref_temp_helsinki=ref_temp_helsinki.reset_index()

In [201]:
# Merge the Helsinki and Sodankylä reference temperatures on the MONTH column so each row holds one month for both stations
monthly_diff_temp = ref_temp_helsinki.merge(ref_temp_sodankyla, on='MONTH')

In [204]:
# Calculate the monthly difference between Helsinki and Sodankylä
monthly_diff_temp['monthly_diff']=monthly_diff_temp['ref_temp_helsinki']-monthly_diff_temp['ref_temp']

In [210]:
# Save the month and the difference to a CSV file, rounded to two decimals
monthly_diff_temp[['MONTH','monthly_diff']].to_csv('data/monthly_difference_helsinki_sodankyla.csv',index=False,float_format='%.2f')
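
A quick way to check what ends up in the saved file is a write-and-read round trip. The sketch below uses made-up differences and an in-memory buffer in place of the real file path. It assumes the saved table has a `MONTH` column next to `monthly_diff`.

```python
import io
import pandas as pd

# Hypothetical monthly differences with zero-padded month strings.
diff = pd.DataFrame({"MONTH": ["01", "02"], "monthly_diff": [9.87654, 10.12345]})

# Write with two decimals, as in the notebook, but into memory instead of a file.
buffer = io.StringIO()
diff.to_csv(buffer, index=False, float_format="%.2f")

# Read it back; dtype=str keeps the leading zero of the month.
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"MONTH": str})

print(restored)
```

Note that `float_format='%.2f'` rounds the values in the file itself, so the full precision is only available in the notebook.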