## Problem 3 - Data analysis (4 points)

In this problem we will explore our temperature data by comparing spring temperatures between Helsinki Kumpula and Rovaniemi. To do this we'll use some conditions to extract subsets of our data and then analyse these subsets using basic pandas functions. Please perform the tasks below by writing your code into the code cells in each section.

### Tips for completing this problem

- Use **exactly** the same variable names as in the instructions because your answers will be automatically graded, and the tests that grade your answers rely on following the same formatting or variable naming as in the instructions.
- **Please do not**:

    - **Change the file names**. Do all of your editing in the provided `Exercise-5-problem-3.ipynb` file (this file).
    - **Copy/paste cells in this notebook**. We use an automated grading system that will fail if there are copies of code cells.
    - **Change the existing cell types**. You can add cells, but changing the cell types for existing cells (from code to markdown, for example) will also cause the automated grader to fail. 

### Scores for this problem

**Your score on this problem will be based on following criteria:**

- Calculating the median temperatures for Helsinki Kumpula and Rovaniemi for the summer of 2017
- Selecting temperatures for May and June 2017 in separate dataframes for each location
- Printing out some summary values for each month (May, June) and location (Kumpula, Rovaniemi)
- Including comments that explain what most lines in the code do
- Answering a couple questions at the end of the problem
- Uploading your notebook and data files to your GitHub repository for this week's exercise

## AI tool usage agreement

**Enter your name (and that of your partner) in the cell below** to confirm that you have followed the [course guidelines on the use of AI tools](https://geo-python-site.readthedocs.io/en/latest/course-info/ai-tools.html) and understand that misuse of AI tools is considered cheating.

YOUR ANSWER HERE

### Part 1 (0 points)

First, you need to load the data from Problem 2.

- Read in the csv files generated in Problem 2 to the variables `kumpula` and `rovaniemi`

In [1]:
import pandas as pd
from data import * 
kumpula = pd.read_csv(
    'data/Kumpula_temps_May_Aug_2017.csv',
    sep=',',
    na_values=['*', '**', '***', '****', '*****', '******'],
    )
rovaniemi = pd.read_csv(
    'data/Rovaniemi_temps_May_Aug_2017.csv',
    sep=',',
    na_values=['*', '**', '***', '****', '*****', '******'],
    )

In [2]:
print(kumpula.head())
print("")
print(rovaniemi.head())

    USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius
0  29980  201705010600  44.0  44.0  35.0     6.67
1  29980  201705011800  48.0  54.0  43.0     8.89
2  29980  201705020600  50.0  50.0  34.0    10.00
3  29980  201705021800  53.0  61.0  49.0    11.67
4  29980  201705030600  47.0  53.0  38.0     8.33

    USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius
0  28450  201705010600  31.0  34.0  31.0    -0.56
1  28450  201705011800  42.0  44.0  30.0     5.56
2  28450  201705020600  41.0  42.0  35.0     5.00
3  28450  201705021800  37.0  45.0  36.0     2.78
4  28450  201705030600  37.0  37.0  33.0     2.78


### Part 2 (1 points)

Next you can find the *median temperatures* for the period the data covers.

- What was the median Celsius temperature during the observed period in:
    - Helsinki Kumpula? (store the answer in a variable `kumpula_median`)
    - Rovaniemi? (store the answer in a variable `rovaniemi_median`)

In [3]:
kumpula_median = kumpula['Celsius'].mean()
rovaniemi_median = rovaniemi['Celsius'].mean()

In [4]:
# Prints the median temperatures
print(f"Kumpula median: {kumpula_median}")
print(f"Rovaniemi median: {rovaniemi_median}")


Kumpula median: 14.15843621399177
Rovaniemi median: 10.706818181818182


### Part 3 (2 points)

The median temperatures above consider data from the entire summer (May-Aug), hence the differences might not be so clear. Let's now find the *mean temperatures* from May and June 2017 in Kumpula and Rovaniemi.

- From the `kumpula` and `rovaniemi` DataFrames, select the rows where values of the `YR--MODAHRMN` column are from May 2017
    - Assign these selected rows to the variables `kumpula_may` and `rovaniemi_may` (you can check the [hints](https://geo-python.github.io/site/lessons/L5/exercise-5.html) for help!)
- Repeat the procedure for the month of June and assign those values to variables to `kumpula_june` and `rovaniemi_june`

In [5]:
# Select the subset of the Kumpula and Rovaniemi data for the 5th and 6th month
def decode_date(date_string): # сразу сделал сортировку для всего
    year = str(date_string)[:4]
    month = str(date_string)[4:6]
    day = str(date_string)[6:8]
    time = f'{str(date_string)[8:10]}:{str(date_string)[10:]}'
    return year, month, day, time,
kumpula[['Year', 'Month', 'Day', 'TIME']] = kumpula['YR--MODAHRMN'].apply(lambda x: pd.Series(decode_date(x)))
rovaniemi[['Year', 'Month', 'Day', 'TIME']] = rovaniemi['YR--MODAHRMN'].apply(lambda x: pd.Series(decode_date(x)))

kumpula_may = kumpula[kumpula['Month'] == '05']
rovaniemi_may = rovaniemi[rovaniemi['Month'] == '05']

kumpula_june = kumpula[kumpula['Month'] == '06']
rovaniemi_june = rovaniemi[rovaniemi['Month'] == '06']

Check that the subsets look ok:

In [6]:
print(f"First values in May, Kumpula:\n{kumpula_may.head()}\n")
print(f"Last values in May, Kumpula:\n{kumpula_may.tail()}")

First values in May, Kumpula:
    USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius  Year Month Day   TIME
0  29980  201705010600  44.0  44.0  35.0     6.67  2017    05  01  06:00
1  29980  201705011800  48.0  54.0  43.0     8.89  2017    05  01  18:00
2  29980  201705020600  50.0  50.0  34.0    10.00  2017    05  02  06:00
3  29980  201705021800  53.0  61.0  49.0    11.67  2017    05  02  18:00
4  29980  201705030600  47.0  53.0  38.0     8.33  2017    05  03  06:00

Last values in May, Kumpula:
     USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius  Year Month Day   TIME
57  29980  201705291800  55.0  67.0  55.0    12.78  2017    05  29  18:00
58  29980  201705300600  52.0  55.0  40.0    11.11  2017    05  30  06:00
59  29980  201705301800  48.0  56.0  48.0     8.89  2017    05  30  18:00
60  29980  201705310600  49.0  49.0  47.0     9.44  2017    05  31  06:00
61  29980  201705311800  53.0  61.0  49.0    11.67  2017    05  31  18:00


In [7]:
print(f"First values in June, Kumpula:\n{kumpula_june.head()}\n")
print(f"Last values in June, Kumpula:\n{kumpula_june.tail()}")

First values in June, Kumpula:
     USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius  Year Month Day   TIME
62  29980  201706010600  48.0  53.0  40.0     8.89  2017    06  01  06:00
63  29980  201706011800  44.0  52.0  42.0     6.67  2017    06  01  18:00
64  29980  201706020600  44.0  44.0  36.0     6.67  2017    06  02  06:00
65  29980  201706021800  47.0  49.0  43.0     8.33  2017    06  02  18:00
66  29980  201706030600  48.0  48.0  38.0     8.89  2017    06  03  06:00

Last values in June, Kumpula:
      USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius  Year Month Day   TIME
116  29980  201706281800  60.0  68.0  60.0    15.56  2017    06  28  18:00
117  29980  201706290600  61.0  61.0  47.0    16.11  2017    06  29  06:00
118  29980  201706291800  65.0  69.0  61.0    18.33  2017    06  29  18:00
119  29980  201706300600  61.0  65.0  58.0    16.11  2017    06  30  06:00
120  29980  201706301800  65.0  68.0  61.0    18.33  2017    06  30  18:00


In [8]:
print(f"First values in May, Rovaniemi:\n{rovaniemi_may.head()}\n")
print(f"Last values in May, Rovaniemi:\n{rovaniemi_may.tail()}")

First values in May, Rovaniemi:
    USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius  Year Month Day   TIME
0  28450  201705010600  31.0  34.0  31.0    -0.56  2017    05  01  06:00
1  28450  201705011800  42.0  44.0  30.0     5.56  2017    05  01  18:00
2  28450  201705020600  41.0  42.0  35.0     5.00  2017    05  02  06:00
3  28450  201705021800  37.0  45.0  36.0     2.78  2017    05  02  18:00
4  28450  201705030600  37.0  37.0  33.0     2.78  2017    05  03  06:00

Last values in May, Rovaniemi:
     USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius  Year Month Day   TIME
57  28450  201705291800  42.0  44.0  39.0     5.56  2017    05  29  18:00
58  28450  201705300600  41.0  42.0  31.0     5.00  2017    05  30  06:00
59  28450  201705301800  46.0  49.0  40.0     7.78  2017    05  30  18:00
60  28450  201705310600  36.0  45.0  33.0     2.22  2017    05  31  06:00
61  28450  201705311800  42.0  46.0  36.0     5.56  2017    05  31  18:00


In [9]:
print(f"First values in June, Rovaniemi:\n{rovaniemi_june.head()}\n")
print(f"Last values in June, Rovaniemi:\n{rovaniemi_june.tail()}")

First values in June, Rovaniemi:
     USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius  Year Month Day   TIME
62  28450  201706010600  34.0  42.0  30.0     1.11  2017    06  01  06:00
63  28450  201706011800  36.0  44.0  34.0     2.22  2017    06  01  18:00
64  28450  201706020600  36.0  36.0  32.0     2.22  2017    06  02  06:00
65  28450  201706021800  38.0  43.0  36.0     3.33  2017    06  02  18:00
66  28450  201706030600  36.0  38.0  35.0     2.22  2017    06  03  06:00

Last values in June, Rovaniemi:
      USAF  YR--MODAHRMN  TEMP   MAX   MIN  Celsius  Year Month Day   TIME
116  28450  201706281800  55.0  59.0  48.0    12.78  2017    06  28  18:00
117  28450  201706290600  53.0  55.0  50.0    11.67  2017    06  29  06:00
118  28450  201706291800  58.0  65.0  53.0    14.44  2017    06  29  18:00
119  28450  201706300600  62.0  62.0  51.0    16.67  2017    06  30  06:00
120  28450  201706301800  67.0  71.0  61.0    19.44  2017    06  30  18:00


### Part 4 (1 point)

Now you can make your temperature data from both locations and months easier to compare by printing out a few useful values.

- Use the `print()` function to show the mean, min and max Celsius temperatures for both places in May and June using the new subset dataframes (`kumpula_may`, `rovaniemi_may`, `kumpula_june`, and `rovaniemi_june`).

In [10]:
print("Kumpula in May:")
print("Mean Temperature:", "%.2f" %kumpula_may['Celsius'].mean())
print("Min Temperature:", kumpula_may['Celsius'].min())
print("Max Temperature:", kumpula_may['Celsius'].max())

print("\nRovaniemi in May:")
print("Mean Temperature:", "%.2f" %rovaniemi_may['Celsius'].mean())
print("Min Temperature:", rovaniemi_may['Celsius'].min())
print("Max Temperature:", rovaniemi_may['Celsius'].max())

print("\nKumpula in June:")
print("Mean Temperature:", "%.2f" %kumpula_june['Celsius'].mean())
print("Min Temperature:", kumpula_june['Celsius'].min())
print("Max Temperature:", kumpula_june['Celsius'].max())

Kumpula in May:
Mean Temperature: 10.07
Min Temperature: 2.22
Max Temperature: 17.78

Rovaniemi in May:
Mean Temperature: 3.49
Min Temperature: -2.78
Max Temperature: 12.22

Kumpula in June:
Mean Temperature: 14.23
Min Temperature: 6.67
Max Temperature: 21.11


### Problem 3 summary

In the spirit of [knowledge discovery](http://researcher.ibm.com/researcher/view_group.php?id=144), let's briefly interpret the results of the data analysis in Problem 3. Please answer the following questions based on the data analysis results from this problem:

- Does there seem to be a large difference in temperatures between the months?
- Is Rovaniemi a much colder place than Kumpula?

Also, be sure to:

- Check that your code includes informative comments explaining what your code does
- Commit and push your changes to your GitHub repository for Exercise 5

YOUR ANSWER HERE

### On to Problem 4 (*optional*)

Now you can continue to the *optional* [Problem 4: Data analysis](Exercise-5-problem-4.ipynb)