# Variance in Weather

You are planning a trip to Basel and want to get a sense of the best time of the year to visit. Luckily, you got your hands on a dataset starting from 2015 that contains over 52,000 data points about weather conditions in Basel. Surely, with this much information, you can discover something useful about when to make your trip!

In [1]:
import pandas as pd
import numpy as np

basel_data = pd.read_csv('basel_weather_dataexport_20210212T141159.csv', skiprows=9, delimiter=",")
basel_data['timestamp'] = pd.to_datetime(basel_data.timestamp)

## Explore the Data

1. All of the weather data is stored in a variable named `basel_data`.

    Print the first few rows of the dataset by calling `print(basel_data.head())`.

    Take a look at the output to see the columns of this dataset. Here are two questions to ask yourself:

    * How often were measurements taken?
    * Which columns might be the most useful when thinking about planning a trip?
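One way to answer the first question is to look at the gaps between consecutive timestamps. The sketch below uses a small synthetic frame (the name `sample` and its five rows are assumptions made for illustration, not the real dataset); the same two lines should work on `basel_data["timestamp"]` once it has been parsed with `pd.to_datetime`.

```python
import pandas as pd

# Synthetic stand-in for basel_data: five readings spaced one hour apart.
sample = pd.DataFrame({
    "timestamp": pd.date_range("2015-02-01 00:00", periods=5, freq=pd.Timedelta(hours=1)),
})

# Time elapsed between consecutive rows; the first row has no predecessor, so drop its NaT.
gaps = sample["timestamp"].diff().dropna()

# Count how often each gap length occurs.
gap_counts = gaps.value_counts()
print(gap_counts)
```

If a single gap length dominates the counts, that is the measurement interval; any other gap lengths point to missing or duplicated rows.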

In [2]:
basel_data.head()

Unnamed: 0,timestamp,Basel Temperature [2 m elevation corrected],Basel Relative Humidity [2 m],Basel Wind Speed [10 m],Basel Wind Direction [10 m],Basel Wind Speed [500 mb],Basel Wind Direction [500 mb],Basel Mean Sea Level Pressure [MSL]
0,2015-02-01 00:00:00,0.670529,78.0,16.299694,226.78992,53.630333,276.55255,986.7
1,2015-02-01 01:00:00,2.120528,72.0,17.610588,229.14462,53.050858,262.19983,986.3
2,2015-02-01 02:00:00,1.090529,91.0,30.070238,238.21275,50.81486,277.32642,986.4
3,2015-02-01 03:00:00,1.020529,89.0,31.667723,261.50146,60.05529,295.95065,986.1
4,2015-02-01 04:00:00,0.430529,90.0,31.394394,266.0548,70.2987,302.87628,986.7


In [3]:
basel_data.tail()

Unnamed: 0,timestamp,Basel Temperature [2 m elevation corrected],Basel Relative Humidity [2 m],Basel Wind Speed [10 m],Basel Wind Direction [10 m],Basel Wind Speed [500 mb],Basel Wind Direction [500 mb],Basel Mean Sea Level Pressure [MSL]
52891,2021-02-12 19:00:00,-1.619471,60.0,29.381382,97.03793,66.79431,309.09387,1025.0
52892,2021-02-12 20:00:00,-2.839471,60.0,32.283787,97.04577,67.07409,308.89996,1025.3
52893,2021-02-12 21:00:00,-3.459471,60.0,35.543507,96.98105,67.63594,308.5169,1026.5
52894,2021-02-12 22:00:00,-4.029471,61.0,36.973244,96.70984,66.00873,309.24542,1026.8
52895,2021-02-12 23:00:00,-4.479472,62.0,36.93291,96.15517,65.835724,311.0091,1027.5


2. Let us also take a look at how many data points we have. Print `len(basel_data)`.

In [4]:
len(basel_data)

52896

In [5]:
basel_data["Year"] = basel_data.timestamp.dt.year
basel_data_full_years = basel_data[basel_data["Year"] < 2021]
basel_data_full_years

Unnamed: 0,timestamp,Basel Temperature [2 m elevation corrected],Basel Relative Humidity [2 m],Basel Wind Speed [10 m],Basel Wind Direction [10 m],Basel Wind Speed [500 mb],Basel Wind Direction [500 mb],Basel Mean Sea Level Pressure [MSL],Year
0,2015-02-01 00:00:00,0.670529,78.0,16.299694,226.78992,53.630333,276.55255,986.7,2015
1,2015-02-01 01:00:00,2.120528,72.0,17.610588,229.14462,53.050858,262.19983,986.3,2015
2,2015-02-01 02:00:00,1.090529,91.0,30.070238,238.21275,50.814860,277.32642,986.4,2015
3,2015-02-01 03:00:00,1.020529,89.0,31.667723,261.50146,60.055290,295.95065,986.1,2015
4,2015-02-01 04:00:00,0.430529,90.0,31.394394,266.05480,70.298700,302.87628,986.7,2015
...,...,...,...,...,...,...,...,...,...
51859,2020-12-31 19:00:00,4.990529,74.0,3.259938,186.34020,92.671860,252.12814,1001.1,2020
51860,2020-12-31 20:00:00,4.360529,84.0,10.587917,252.18112,87.935110,255.05347,1001.5,2020
51861,2020-12-31 21:00:00,3.820529,88.0,8.435069,219.80557,78.100090,244.61615,1002.5,2020
51862,2020-12-31 22:00:00,3.170529,89.0,6.792466,212.00539,84.240770,241.68266,1002.8,2020
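Dropping 2021 removes the partial year at the end, but the first timestamp in the data is 2015-02-01, so 2015 is also missing a month. Counting rows per year makes that visible. The sketch below is a minimal illustration on synthetic hourly timestamps (the frame `sample` is an assumption, built only to mirror the start date seen above).

```python
import pandas as pd

# Synthetic hourly timestamps from 2015-02-01 through the end of 2016.
timestamps = pd.date_range(
    "2015-02-01 00:00", "2016-12-31 23:00", freq=pd.Timedelta(hours=1)
)
sample = pd.DataFrame({"timestamp": timestamps})
sample["Year"] = sample["timestamp"].dt.year

# Number of hourly rows available for each year.
rows_per_year = sample.groupby("Year").size()
print(rows_per_year)
```

A complete year of hourly data has 8,760 rows (8,784 in a leap year), so a noticeably smaller count flags a year whose statistics cover only part of the calendar.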


## Looking At Temperature

3. Now that we have seen what the data looks like, let us dive into one of the more promising columns: "Basel Temperature [2 m elevation corrected]". This column stores the temperature in degrees Celsius.

    To get a single column from a DataFrame, you can use this syntax:

        one_column = basel_data["column_name"]

    Create a variable named `temp` and set it equal to the "Basel Temperature" column of `basel_data`.

In [6]:
temps = basel_data_full_years.groupby("Year")["Basel Temperature [2 m elevation corrected]"].mean().reset_index()
temps.rename(columns = {"Year": "year", "Basel Temperature [2 m elevation corrected]": "avg_temp"}, inplace=True)
temps

Unnamed: 0,year,avg_temp
0,2015,13.435288
1,2016,11.954561
2,2017,12.38527
3,2018,13.408479
4,2019,12.695016
5,2020,13.111039


In [7]:
# Pull the single 2015 value out of the yearly table as a plain float.
average_temp = temps.loc[temps.year == 2015, "avg_temp"].item()

average_temp

13.435288213615856

5. Calculate the variance of the temperature column and store the results in the variable `temperature_var`. Print the results.

In [8]:
variances = basel_data_full_years.groupby("Year")["Basel Temperature [2 m elevation corrected]"].var().reset_index()
variances.rename(columns = {"Year": "year", "Basel Temperature [2 m elevation corrected]": "temp_var"}, inplace=True)
variances

Unnamed: 0,year,temp_var
0,2015,55.440864
1,2016,54.30404
2,2017,65.461749
3,2018,68.60755
4,2019,62.055132
5,2020,54.22504


In [9]:
# Pull the single 2015 value out of the yearly table as a plain float.
temperature_var = variances.loc[variances.year == 2015, "temp_var"].item()

temperature_var

55.44086390079679
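To make the variance less of a black box, here is a minimal sketch that computes it by hand on five readings (the first five temperatures from the `head()` output, rounded to two decimals; the variable names are assumptions for illustration). It also shows that pandas and NumPy use different defaults: pandas `.var()` divides by n - 1 (`ddof=1`), while `np.var` divides by n (`ddof=0`).

```python
import numpy as np
import pandas as pd

# First five temperature readings from the head() output, rounded.
temps_sample = pd.Series([0.67, 2.12, 1.09, 1.02, 0.43])

# Variance is built from squared deviations around the mean.
deviations = temps_sample - temps_sample.mean()
squared_sum = (deviations ** 2).sum()

sample_var = squared_sum / (len(temps_sample) - 1)   # divide by n - 1
population_var = squared_sum / len(temps_sample)     # divide by n

print(sample_var, temps_sample.var())                    # pandas default: ddof=1
print(population_var, np.var(temps_sample.to_numpy()))   # NumPy default: ddof=0
```

With thousands of hourly readings per year the two conventions give nearly identical numbers, but it is worth knowing which one a given function uses.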

6. Calculate the standard deviation of the temperature column and store it in a variable named `temperature_standard_deviation`. Print this variable.

    How would the variance and standard deviation help you plan a trip?

In [10]:
standard_deviations = basel_data_full_years.groupby("Year")["Basel Temperature [2 m elevation corrected]"].std().reset_index()
standard_deviations.rename(columns = {"Year": "year", "Basel Temperature [2 m elevation corrected]": "temp_std"}, inplace=True)
standard_deviations

Unnamed: 0,year,temp_std
0,2015,7.445862
1,2016,7.369128
2,2017,8.090844
3,2018,8.282967
4,2019,7.877508
5,2020,7.363765


In [11]:
# Pull the single 2015 value out of the yearly table as a plain float.
temperature_standard_deviation = standard_deviations.loc[standard_deviations.year == 2015, "temp_std"].item()

temperature_standard_deviation

7.445862199960243
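The standard deviation is the square root of the variance, which puts it back in the same unit as the data (degrees Celsius rather than degrees squared). The short check below plugs in the two 2015 values printed in this notebook; the variable names are chosen here for illustration.

```python
import math

temperature_var_2015 = 55.44086390079679   # 2015 variance printed in this notebook
temperature_std_2015 = 7.445862199960243   # 2015 standard deviation printed in this notebook

# Taking the square root of the variance should reproduce the standard deviation.
recomputed_std = math.sqrt(temperature_var_2015)
print(recomputed_std)
print(math.isclose(recomputed_std, temperature_std_2015, rel_tol=1e-9))
```

Because it is expressed in degrees, the standard deviation is usually the easier number to reason about when planning: it describes a typical distance of an hourly reading from the yearly mean.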

## Filtering By Month

7. The statistics we just calculated are not very helpful when trying to plan a vacation since they describe the weather throughout an entire year.

    If we could find a way to use the rows from only a certain month, that might help us find the best month to plan our trip.

In [12]:
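# Derive the month from the filtered frame itself; assign() returns a new frame,
# which avoids pandas' SettingWithCopyWarning.
basel_data_full_years = basel_data_full_years.assign(Month=basel_data_full_years["timestamp"].dt.month)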
basel_data_full_years.head()

Unnamed: 0,timestamp,Basel Temperature [2 m elevation corrected],Basel Relative Humidity [2 m],Basel Wind Speed [10 m],Basel Wind Direction [10 m],Basel Wind Speed [500 mb],Basel Wind Direction [500 mb],Basel Mean Sea Level Pressure [MSL],Year,Month
0,2015-02-01 00:00:00,0.670529,78.0,16.299694,226.78992,53.630333,276.55255,986.7,2015,2
1,2015-02-01 01:00:00,2.120528,72.0,17.610588,229.14462,53.050858,262.19983,986.3,2015,2
2,2015-02-01 02:00:00,1.090529,91.0,30.070238,238.21275,50.81486,277.32642,986.4,2015,2
3,2015-02-01 03:00:00,1.020529,89.0,31.667723,261.50146,60.05529,295.95065,986.1,2015,2
4,2015-02-01 04:00:00,0.430529,90.0,31.394394,266.0548,70.2987,302.87628,986.7,2015,2


8. We want to filter by the "Month" column! The following line of code creates a variable that holds the temperature from the rows where "Month" is 6. These are all of the rows from the month of June.

    `june = basel_data_full_years.loc[basel_data_full_years["Month"] == 6]["Basel Temperature [2 m elevation corrected]"]`

    Create this variable for June 2015.

In [13]:
june_2015 = basel_data_full_years.loc[(basel_data_full_years["Month"] == 6) & (basel_data_full_years["Year"] == 2015)]["Basel Temperature [2 m elevation corrected]"]
june_2015

2880    18.560530
2881    17.250528
2882    16.540530
2883    16.270529
2884    15.410529
          ...    
3595    29.050530
3596    27.740528
3597    26.500528
3598    24.390530
3599    22.920528
Name: Basel Temperature [2 m elevation corrected], Length: 720, dtype: float64
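The length of 720 matches 30 days of 24 hourly readings. The same selection can also be written by building the boolean mask first and passing both the mask and the column name to a single `.loc` call, which avoids the chained `[...][...]` indexing. The sketch below is a minimal illustration on a synthetic frame (the frame `sample` and its temperatures are made up).

```python
import pandas as pd

# Synthetic stand-in for basel_data_full_years (temperatures are made up).
sample = pd.DataFrame({
    "Year":  [2015, 2015, 2015, 2016],
    "Month": [6, 6, 7, 6],
    "Basel Temperature [2 m elevation corrected]": [18.5, 17.2, 22.4, 19.0],
})

# Combine the two conditions with & (each wrapped in parentheses).
is_june_2015 = (sample["Month"] == 6) & (sample["Year"] == 2015)

# Select rows and column in one step.
june_2015_sample = sample.loc[is_june_2015, "Basel Temperature [2 m elevation corrected]"]
print(june_2015_sample.tolist())
```

Only the two rows that satisfy both conditions are kept; the July 2015 row and the June 2016 row are filtered out.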

9. Create a variable named `july_2015` that contains all of the temperature readings from July 2015. The code to do this should look very similar to your code that created the June variable. This time, we are interested in month `7`.

In [14]:
july_2015 = basel_data_full_years.loc[(basel_data_full_years["Month"] == 7) & (basel_data_full_years["Year"] == 2015)]["Basel Temperature [2 m elevation corrected]"]
july_2015

3600    22.370530
3601    21.440529
3602    20.720530
3603    20.440529
3604    19.950530
          ...    
4339    23.140530
4340    22.110529
4341    20.770529
4342    19.370530
4343    18.620530
Name: Basel Temperature [2 m elevation corrected], Length: 744, dtype: float64

10. Calculate and print the mean temperature in Basel for both June and July using the `np.mean()` function.

    What do these numbers tell you? If you wanted to visit Basel in the month that was, on average, cooler, which month would you pick?

In [15]:
print(np.mean(june_2015))
print(np.mean(july_2015))

19.295056941666726
23.313768432123755


On average, June was the cooler month in 2015 (about 19.3 °C versus 23.3 °C in July).

11. Calculate and print the standard deviation of temperature in Basel for both June and July. Remember, the function you should use is `np.std()`.

    What do these numbers tell you? How might the standard deviation change your decision on when to visit Basel?

In [16]:
print(np.std(june_2015))
print(np.std(july_2015))

4.627485275240127
5.620416183662221


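In 2015, the temperature spread in June was smaller than in July (a standard deviation of about 4.6 °C versus 5.6 °C), so June was both cooler and somewhat more predictable.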
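One detail to keep in mind when comparing these numbers with the grouped `.std()` results elsewhere in this notebook: NumPy's `np.std` defaults to `ddof=0` (divide by n), while pandas' `.std()` defaults to `ddof=1` (divide by n - 1). The sketch below shows the difference on a handful of made-up temperatures; the values and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([18.5, 17.2, 22.4, 19.0, 25.1])  # made-up temperatures

population_std = np.std(values.to_numpy())        # NumPy default: ddof=0
sample_std = values.std()                         # pandas default: ddof=1
matched_std = np.std(values.to_numpy(), ddof=1)   # NumPy, told to divide by n - 1

print(population_std, sample_std, matched_std)
```

With only five values the gap is noticeable; with the 720 or 744 hourly readings of a full month it shrinks to a fraction of a percent, so it does not change the June versus July comparison.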

12. Calculate the mean and standard deviation of the temperature for every month, across all years.

In [18]:
basel_data_full_years.groupby('Month')["Basel Temperature [2 m elevation corrected]"].mean()

Month
1      4.181037
2      5.326100
3      8.077005
4     12.111804
5     15.178502
6     19.539939
7     21.971495
8     21.837763
9     17.409050
10    12.698412
11     8.140121
12     5.506355
Name: Basel Temperature [2 m elevation corrected], dtype: float64

In [17]:
basel_data_full_years.groupby('Month')["Basel Temperature [2 m elevation corrected]"].std()

Month
1     3.992862
2     4.362341
3     4.401967
4     5.444015
5     5.135352
6     4.959667
7     4.937764
8     5.075603
9     4.980971
10    4.392005
11    4.487989
12    3.687319
Name: Basel Temperature [2 m elevation corrected], dtype: float64
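The two monthly tables are easier to compare side by side. A possible way to get both statistics in one table is to pass a list of aggregation names to `.agg()`. The sketch below uses a small synthetic frame (six made-up readings; `sample` and `monthly_stats` are names chosen for illustration).

```python
import pandas as pd

# Synthetic stand-in for basel_data_full_years (temperatures are made up).
sample = pd.DataFrame({
    "Month": [6, 6, 6, 7, 7, 7],
    "Basel Temperature [2 m elevation corrected]": [18.0, 20.0, 22.0, 21.0, 24.0, 27.0],
})

# One row per month, one column per statistic.
monthly_stats = (
    sample.groupby("Month")["Basel Temperature [2 m elevation corrected]"]
    .agg(["mean", "std"])
)
print(monthly_stats)
```

Sorting the result, for example with `monthly_stats.sort_values("std")`, would then rank the months from most to least stable.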

## Optional

13. By looking at the mean and standard deviation of the temperature in Basel during each month of the year, we can get a sense of the best time to visit.

    The spread of the data is an important statistic to consider if you are particularly sensitive to extreme days. For example, if you pick a month with a large standard deviation, you might have one day that is relatively cold while the following day is very hot.

    Take some time to see if you can find more insights in this dataset. Here are some ideas we have for you:

    * Look at columns other than temperature. Can you find something interesting about the humidity or the air pressure? Can you find the rainiest month?
    * Filter based on the hour of the day, which you can derive from `timestamp` the same way `Month` was derived. Similar to how you filtered based on the month, are there certain hours that have higher variance than others?
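As a starting point for the last idea, the sketch below derives an hour column from the timestamps and computes the variance per hour. It runs on two synthetic days of hourly data in which the afternoon hours were deliberately made more variable (the frame `sample`, the column name `temperature`, and all values are assumptions for illustration).

```python
import pandas as pd

# Two synthetic days of hourly timestamps.
timestamps = pd.date_range("2015-06-01 00:00", periods=48, freq=pd.Timedelta(hours=1))
sample = pd.DataFrame({"timestamp": timestamps})
sample["Hour"] = sample["timestamp"].dt.hour

# Made-up temperatures: on the second day, afternoon hours (12 to 17) are 4 degrees
# warmer than on the first day, all other hours only 1 degree warmer.
base = 10 + 0.5 * sample["Hour"]
is_second_day = sample["timestamp"].dt.day == 2
is_afternoon = sample["Hour"].between(12, 17)
offset = is_second_day * (1 + 3 * is_afternoon)
sample["temperature"] = base + offset

# Variance of the temperature for each hour of the day.
hourly_var = sample.groupby("Hour")["temperature"].var()
print(hourly_var)
```

On the real data, replacing `sample` with `basel_data_full_years` and `temperature` with the full temperature column name should give one variance per hour across all days.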