## Problem 4 (*optional*) - Parsing daily temperatures

**This is an optional task for those who want more practice.** This problem is more challenging because we provide only minimal instructions for completing the given tasks. You will need to search the Pandas documentation (and other resources) for help. We will cover data aggregation in more detail in the week 6 lesson, so this is a good opportunity to get a head start for next week!

In this problem, the aim is to aggregate the hourly temperature data for the Helsinki Kumpula and Rovaniemi weather stations to a daily level. Currently, there are (at most) 3 measurements per hour in the data, as you can see from the `YR--MODAHRMN` column (Year-Month-Day-Hour-Minute in Greenwich Mean Time (GMT)):

```

    USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  28450  201705010000  31.0  NaN  NaN       -1
1  28450  201705010020  30.0  NaN  NaN       -1
2  28450  201705010050  30.0  NaN  NaN       -1
3  28450  201705010100  31.0  NaN  NaN       -1
4  28450  201705010120  30.0  NaN  NaN       -1

```
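
To illustrate how the `YR--MODAHRMN` values encode time, here is a minimal sketch that parses them into timestamps and counts the observations per hour. It assumes the column holds integers in `YYYYMMDDHHMM` form, as in the sample above. The names `sample`, `TIME` and `obs_per_hour` are illustrative and not part of the exercise data.

```python
import pandas as pd

# Rows copied from the sample table above (only the columns needed here)
sample = pd.DataFrame({
    "USAF": [28450, 28450, 28450, 28450, 28450],
    "YR--MODAHRMN": [201705010000, 201705010020, 201705010050,
                     201705010100, 201705010120],
    "Celsius": [-1, -1, -1, -1, -1],
})

# Convert the integer timestamps to strings, then parse them as YYYYMMDDHHMM
sample["TIME"] = pd.to_datetime(
    sample["YR--MODAHRMN"].astype(str), format="%Y%m%d%H%M"
)

# Count how many observations fall within each hour of the day
obs_per_hour = sample.groupby(sample["TIME"].dt.hour).size()
print(obs_per_hour)
```

In this small sample, hour 0 has three observations and hour 1 has two, which matches the "at most 3 measurements per hour" description.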

The output should contain mean, max and min Celsius temperatures for each day (for example, one mean temperature value for 1st of May and so on).


### What to do

- Your task is to summarize the information for each day by aggregating (grouping) the dataframe.
- The output should be a new DataFrame where you have calculated mean, max and min Celsius temperatures for each day separately based on hourly values.
- Repeat the task for the two data sets you created in Problem 2 (May-August temperatures from Rovaniemi and Kumpula).

Don't forget to:

- Include useful comments in your code
- Push your solution to GitHub

### Hint

You can find help in the [official Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) and on Google. Don't hesitate to ask for tips in Slack!

In [4]:

# 0. Import the pandas library
import pandas as pd

# 1. Read a separate file for each station
path_kumpula = r'C:/Users/David/Documents/Notebooks/Helsinski_course/Ejercicios/Github/exercise-5-dvdov-david/data/Kumpula_temps_May_Aug_2017.csv'
path_rovaniemi = r'C:/Users/David/Documents/Notebooks/Helsinski_course/Ejercicios/Github/exercise-5-dvdov-david/data/Rovaniemi_temps_May_Aug_2017.csv'
kumpula = pd.read_csv(path_kumpula, sep=',')
rovaniemi = pd.read_csv(path_rovaniemi, sep=',')

In [7]:
#2. Checking data

print('Kumpula data: \n:',kumpula.head())
print('')
print('Rovaniemi data\n:',rovaniemi.head())

Kumpula data: 
:     USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  29980  201705010000  37.0  NaN  NaN        3
1  29980  201705010100  37.0  NaN  NaN        3
2  29980  201705010200  37.0  NaN  NaN        3
3  29980  201705010300  37.0  NaN  NaN        3
4  29980  201705010400  39.0  NaN  NaN        4

Rovaniemi data
:     USAF  YR--MODAHRMN  TEMP  MAX  MIN  Celsius
0  28450  201705010000  31.0  NaN  NaN       -1
1  28450  201705010020  30.0  NaN  NaN       -1
2  28450  201705010050  30.0  NaN  NaN       -1
3  28450  201705010100  31.0  NaN  NaN       -1
4  28450  201705010120  30.0  NaN  NaN       -1


In [51]:
#3. Grouping celsius temperature by day 

#kumpula.groupby(by='Celsius').describe() -- just testing: interesting results

# 3.1 Create a new field holding the day as a string (YYYYMMDD) to use as the groupby column
kumpula['day'] = kumpula['YR--MODAHRMN'].astype(str).str[:8]

rovaniemi['day'] = rovaniemi['YR--MODAHRMN'].astype(str).str[:8]

# Check the results
print('kumpula day field:\n',kumpula['day'],'\n')
print('rovaniemi day field:\n',rovaniemi['day'],'\n')


kumpula day field:
 0       20170501
1       20170501
2       20170501
3       20170501
4       20170501
          ...   
2919    20170831
2920    20170831
2921    20170831
2922    20170831
2923    20170831
Name: day, Length: 2924, dtype: object 

rovaniemi day field:
 0       20170501
1       20170501
2       20170501
3       20170501
4       20170501
          ...   
8762    20170831
8763    20170831
8764    20170831
8765    20170831
8766    20170831
Name: day, Length: 8767, dtype: object 
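
As a possible alternative to slicing strings, the day key could also be derived from parsed timestamps. The sketch below uses a small hypothetical sample (`df`, with made-up values) and assumes the `YYYYMMDDHHMM` format seen in the data. It checks that both approaches assign rows to the same days.

```python
import pandas as pd

# Hypothetical sample standing in for the station data
df = pd.DataFrame({
    "YR--MODAHRMN": [201705010000, 201705012350, 201705020000, 201705020100],
    "Celsius": [3, 5, 2, 4],
})

# String-slicing approach: the first 8 characters are YYYYMMDD
df["day"] = df["YR--MODAHRMN"].astype(str).str[:8]

# Alternative: parse the timestamps and drop the time-of-day part
df["date"] = pd.to_datetime(
    df["YR--MODAHRMN"].astype(str), format="%Y%m%d%H%M"
).dt.normalize()

# Both keys should identify the same day for every row
same_days = (df["date"].dt.strftime("%Y%m%d") == df["day"]).all()
print(same_days)
```

A datetime key sorts chronologically and works with time-based tools such as resampling, while the string key is simpler when only grouping is needed.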



In [54]:
#3.2 selecting summary statistics required from aggregated column
kumpula_aggreg = kumpula.groupby(by='day')['Celsius'].describe()[['mean','min','max']]
rovaniemi_aggreg = rovaniemi.groupby(by='day')['Celsius'].describe()[['mean','min','max']]
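
The same statistics can also be requested directly with `agg`, which avoids computing the extra columns that `describe` produces. The sketch below wraps this in a small helper so it can be reused for both stations. The function name `daily_stats` and the sample values are hypothetical, not part of the original solution.

```python
import pandas as pd


def daily_stats(df, day_col="day", value_col="Celsius"):
    """Return the mean, min and max of value_col for each value of day_col."""
    return df.groupby(day_col)[value_col].agg(["mean", "min", "max"])


# Hypothetical sample with a precomputed 'day' column
sample = pd.DataFrame({
    "day": ["20170501", "20170501", "20170502", "20170502"],
    "Celsius": [3, 5, 2, 4],
})

stats = daily_stats(sample)
print(stats)
```

With the real data this would be called as `daily_stats(kumpula)` and `daily_stats(rovaniemi)`. One difference to be aware of: `describe` returns `min` and `max` as floats, whereas `agg` may keep the integer type of the `Celsius` column, so the printed values can look slightly different.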

In [55]:
# Check the results

print('mean, min and max celsius temperature for every day in kumpula station: \n',kumpula_aggreg)

print('mean, min and max celsius temperature for every day in rovaniemi station: \n',rovaniemi_aggreg )

mean, min and max celsius temperature for every day in kumpula station: 
                mean   min   max
day                            
20170501   7.625000   3.0  12.0
20170502   9.750000   2.0  16.0
20170503   9.208333   4.0  13.0
20170504   6.666667   3.0  11.0
20170505  10.250000   2.0  17.0
...             ...   ...   ...
20170827  10.625000   6.0  14.0
20170828  11.826087   9.0  16.0
20170829  14.500000   8.0  17.0
20170830  16.833333  15.0  19.0
20170831  17.250000  16.0  19.0

[123 rows x 3 columns]
mean, min and max celsius temperature for every day in rovaniemi station: 
                mean  min   max
day                           
20170501   2.180556 -1.0   7.0
20170502   3.402778  1.0   7.0
20170503   2.112676 -1.0   4.0
20170504   4.388889 -1.0   9.0
20170505   6.916667  1.0  12.0
...             ...  ...   ...
20170827   7.690141  5.0  10.0
20170828   9.138889  3.0  13.0
20170829  10.722222  8.0  12.0
20170830  11.291667  9.0  14.0
20170831  12.000000  8.0  17.0

[123 row