# Variance in Weather

You’re planning a trip to London and want to get a sense of the best time of the year to visit. Luckily, you got your hands on a dataset from 2015 that contains over 39,000 data points about weather conditions in London. Surely, with this much information, you can discover something useful about when to make your trip!

In [3]:
import pandas as pd
import numpy as np

## Explore the Data

1. All of the weather data is stored in a variable named london_data.

   Print the first few rows of the dataset by calling print(london_data.head()).

    Take a look at the browser to see the columns of this dataset. Here are two questions to ask yourself:

      - How often were measurements taken?
      - Which columns might be the most useful when thinking about planning a trip?

   If you want to see different rows of the data, you can try something like this:

       print(london_data.iloc[100:200])

   This will print rows 100 through 199.

In [3]:
# import pickle
# london_data = pickle.load( open( "weather.p", "rb" ) )
# from weather_data import london_data

# I had to download the data directly, so I saved it in a csv and used read_csv.
# I also looked at it in Excel, which was easier.

In [4]:
london_data = pd.read_csv('london_data.csv')

# print(london_data.iloc[100:200])
london_data.head()

Unnamed: 0.1,Unnamed: 0,Time,TemperatureC,DewpointC,PressurehPa,WindDirection,WindDirectionDegrees,WindSpeedKMH,WindSpeedGustKMH,Humidity,HourlyPrecipMM,Conditions,Clouds,dailyrainMM,SoftwareType,DateUTC,station,hour,month
0,0,2015-01-01 00:00:00,4.6,2.9,1031.7,West,273,0.0,1.6,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:00:00,ILONDONL28,0,1
1,1,2015-01-01 00:12:00,4.5,2.8,1031.4,WNW,291,0.0,1.6,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:12:00,ILONDONL28,0,1
2,2,2015-01-01 00:27:00,4.5,2.8,1031.0,SW,229,0.0,1.6,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:27:00,ILONDONL28,0,1
3,3,2015-01-01 00:42:00,4.8,3.2,1031.7,West,281,0.0,4.8,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:42:00,ILONDONL28,0,1
4,4,2015-01-01 00:57:00,5.2,3.5,1031.4,NW,309,0.0,1.6,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:57:00,ILONDONL28,0,1


*Measurements are taken about every 12 to 15 minutes, so roughly four to five times per hour.*

*`Time`, `TemperatureC`, `WindSpeedKMH`, `Humidity`, and `HourlyPrecipMM` (or `dailyrainMM`) would all be useful for planning a trip around your weather preferences.*
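
The measurement interval can also be checked in code rather than by eye. The sketch below is only an illustration: `sample` is a hypothetical frame holding the five timestamps shown in the output above, not the full dataset.

```python
import pandas as pd

# Hypothetical sample: the five timestamps visible in the head() output.
sample = pd.DataFrame({
    "Time": [
        "2015-01-01 00:00:00",
        "2015-01-01 00:12:00",
        "2015-01-01 00:27:00",
        "2015-01-01 00:42:00",
        "2015-01-01 00:57:00",
    ]
})

# Parse the strings into timestamps, then take the gap between consecutive rows.
times = pd.to_datetime(sample["Time"])
gaps = times.diff().dropna()

# Median gap between readings, expressed in minutes.
median_gap_minutes = gaps.median().total_seconds() / 60
print(median_gap_minutes)
```

On the real data, the same steps applied to `london_data["Time"]` should give a more reliable picture than five rows can.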

2. Let’s also take a look at how many data points we have. Print `len(london_data)`.

In [5]:
print(len(london_data))

39106


## Looking At Temperature

3. Now that we’ve seen what the data looks like, let’s dive into one of the more promising columns — `"TemperatureC"`. This column stores the temperature in Celsius.

    To get a single column from a DataFrame, you can use this syntax:

        one_column = london_data["column_name"]

    Create a variable named `temp` and set it equal to the `"TemperatureC"` column of `london_data`.

In [6]:
temp = london_data['TemperatureC']

4. We can now calculate descriptive statistics about this column. To begin, find the average temperature in London in 2015. Store it in a variable named `average_temp`.

In [7]:
average_temp = np.mean(temp)
average_temp

12.081969518743934

5. Calculate the variance of the temperature column and store the results in the variable `temperature_var`. Print the results.

In [8]:
temperature_var = np.var(temp)
temperature_var

29.715642528199353

6. Calculate the standard deviation of the temperature column and store it in a variable named `temperature_standard_deviation`. Print this variable.

    How would the variance and standard deviation help you plan a trip?

In [9]:
temperature_standard_deviation = np.std(temp)
temperature_standard_deviation

5.4512056031853495
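
The standard deviation is the square root of the variance, which is why 5.45 squared comes out at about 29.7. The sketch below shows that relationship on a small hypothetical array (the values are made up, not taken from the dataset). It also shows the `ddof` argument, which switches NumPy from the population formula to the sample formula.

```python
import numpy as np

# Hypothetical temperatures in Celsius, not taken from the dataset.
temps = np.array([4.0, 6.0, 8.0, 10.0, 12.0])

population_var = np.var(temps)       # default ddof=0: divide by n
sample_var = np.var(temps, ddof=1)   # ddof=1: divide by n - 1
population_std = np.std(temps)       # square root of population_var

print(population_var, sample_var, population_std)
```

With roughly 39,000 readings the two formulas give nearly the same answer, so the default is a reasonable choice here.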

## Filtering By Month

7. The statistics we just calculated aren’t very helpful when trying to plan a vacation since they describe the weather throughout an entire year.

    If we could find a way to use the rows from only a certain month, that might help us find the best month to plan our trip.

    Once again, print `london_data.head()` to see the first few rows of our DataFrame. Which column will help us get only the data points from January? In the browser you can scroll to the right to see more columns.

In [10]:
london_data.head()
# the 'month' column will help us get only the data points from January

Unnamed: 0.1,Unnamed: 0,Time,TemperatureC,DewpointC,PressurehPa,WindDirection,WindDirectionDegrees,WindSpeedKMH,WindSpeedGustKMH,Humidity,HourlyPrecipMM,Conditions,Clouds,dailyrainMM,SoftwareType,DateUTC,station,hour,month
0,0,2015-01-01 00:00:00,4.6,2.9,1031.7,West,273,0.0,1.6,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:00:00,ILONDONL28,0,1
1,1,2015-01-01 00:12:00,4.5,2.8,1031.4,WNW,291,0.0,1.6,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:12:00,ILONDONL28,0,1
2,2,2015-01-01 00:27:00,4.5,2.8,1031.0,SW,229,0.0,1.6,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:27:00,ILONDONL28,0,1
3,3,2015-01-01 00:42:00,4.8,3.2,1031.7,West,281,0.0,4.8,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:42:00,ILONDONL28,0,1
4,4,2015-01-01 00:57:00,5.2,3.5,1031.4,NW,309,0.0,1.6,89,0.0,,,0.0,WeatherCatV1.24B11,2015-01-01 00:57:00,ILONDONL28,0,1


8. We want to filter by the "month" column! The following line of code will create a variable that gets the temperature from the rows where "month" is 6. These will be all of the rows from the month of June.

        june = london_data.loc[london_data["month"] == 6]["TemperatureC"]

    Create this variable for June.

In [11]:
june = london_data.loc[london_data["month"] == 6]["TemperatureC"]

9. Create a variable named `july` that contains all of the data points from July. The code to do this should look very similar to your code that created the June variable. This time, we’re interested in month 7.

In [12]:
july = london_data.loc[london_data["month"] == 7]["TemperatureC"]
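
The filter used for June and July works in two steps: the comparison builds a Boolean Series, and `.loc` keeps only the rows where it is `True`. The sketch below spells that out on a hypothetical miniature frame named `toy`. It also passes the column name to the same `.loc` call, which is an equivalent alternative to the chained form used above.

```python
import pandas as pd

# Hypothetical miniature version of london_data with only the columns used here.
toy = pd.DataFrame({
    "month": [6, 6, 7, 7, 8],
    "TemperatureC": [16.0, 18.0, 19.0, 21.0, 17.5],
})

# Boolean Series: True where the month is 6, False elsewhere.
is_june = toy["month"] == 6

# Select the matching rows and the temperature column in a single .loc call.
june_temps = toy.loc[is_june, "TemperatureC"]

print(is_june.tolist())
print(june_temps.tolist())
```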

10. Calculate and print the mean temperature in London for both June and July using the `np.mean()` function.

    What do these numbers tell you? If you wanted to visit London in the month that was, on average, cooler, which month would you pick?

In [13]:
print("The mean temperature in June is " + str(np.mean(june)) + ".\n")
print("The mean temperature in July is " + str(np.mean(july)) + ".\n")

# June is the cooler month on average.

The mean temperature in June is 17.047289251018462.

The mean temperature in July is 18.775608907446074.



11. Calculate and print the standard deviation of temperature in London for both June and July. Remember, the function you should use is `np.std()`.

    What do these numbers tell you? How might the standard deviation change your decision on when to visit London? Click on the hint to see our thoughts.

In [14]:
print("The standard deviation of temperature in June is " + str(np.std(june)) + ".\n")
print("The standard deviation of temperature in July is " + str(np.std(july)) + ".\n")

The standard deviation of temperature in June is 4.597909204651791.

The standard deviation of temperature in July is 4.136377318662126.



*July's temperatures are more consistent (a standard deviation of about 4.1 °C versus 4.6 °C for June). June's are more spread out and its mean is lower, so you have a better chance of getting a cooler day in June.*
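
One rough way to read these figures together is to look at the band one standard deviation either side of each mean. The sketch below uses the June and July values printed above, rounded to two decimals. The band is only a rule of thumb for where readings commonly fall, and it makes no claim about the exact shape of the distribution.

```python
# Means and standard deviations copied from the output above, rounded to two decimals.
stats = {
    "June": (17.05, 4.60),
    "July": (18.78, 4.14),
}

bands = {}
for name, (mean, std) in stats.items():
    # One standard deviation either side of the mean.
    bands[name] = (round(mean - std, 2), round(mean + std, 2))
    print(name, bands[name])
```

June's band starts lower and is wider, which matches the reading that June offers more cool days but less predictability.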

12. If you want to quickly see the mean and standard deviation of every month, use this block of code.

        for i in range(1, 13):
          month = london_data.loc[london_data["month"] == i]["TemperatureC"]
          print("The mean temperature in month "+str(i) +" is "+ str(np.mean(month)))
          print("The standard deviation of temperature in month "+str(i) +" is "+ str(np.std(month)) +"\n")

    During which month would you most like to visit? If you wanted to pick the month with the least variable temperature, which one would you pick?

In [24]:
for i in range(1, 13):
    month = london_data.loc[london_data["month"] == i]["TemperatureC"]
    print("The mean temperature in month "+str(i) +" is "+ str(round(np.mean(month),1)))
    print("The standard deviation of temperature in month "+str(i) +" is "
          + str(round(np.std(month),1)) +"\n")

The mean temperature in month 1 is 5.7
The standard deviation of temperature in month 1 is 3.6

The mean temperature in month 2 is 5.1
The standard deviation of temperature in month 2 is 2.7

The mean temperature in month 3 is 7.9
The standard deviation of temperature in month 3 is 2.7

The mean temperature in month 4 is 10.6
The standard deviation of temperature in month 4 is 4.1

The mean temperature in month 5 is 13.6
The standard deviation of temperature in month 5 is 3.5

The mean temperature in month 6 is 17.0
The standard deviation of temperature in month 6 is 4.6

The mean temperature in month 7 is 18.8
The standard deviation of temperature in month 7 is 4.1

The mean temperature in month 8 is 18.0
The standard deviation of temperature in month 8 is 3.5

The mean temperature in month 9 is 13.8
The standard deviation of temperature in month 9 is 3.0

The mean temperature in month 10 is 12.0
The standard deviation of temperature in month 10 is 3.1

The mean temperature in month 1
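
The loop above can also be written with `groupby`, which returns one table that is easy to sort or search. This is a sketch on a hypothetical frame named `toy`, not the approach the exercise asks for. Note that pandas' own `.std()` defaults to `ddof=1`, so `ddof=0` is passed explicitly to match `np.std`.

```python
import pandas as pd

# Hypothetical stand-in for london_data; the real frame has the same two columns.
toy = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2],
    "TemperatureC": [4.0, 6.0, 8.0, 5.0, 5.0, 5.0],
})

# One row per month, with the mean and the population standard deviation.
monthly = toy.groupby("month")["TemperatureC"].agg(
    mean="mean",
    std=lambda s: s.std(ddof=0),
)

print(monthly.round(1))

# Month with the least variable temperature.
least_variable_month = monthly["std"].idxmin()
print(least_variable_month)
```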

## Explore on Your Own

13. By looking at the mean and standard deviation of the temperature in London during each month of the year, we can get a sense of the best time to visit.

    The spread of the data is important to consider if you are particularly sensitive to extreme days. For example, if you pick a month with a large standard deviation, you might have one day that is relatively cold while the following day is very hot.

    Take some time to see if you can find more insights in this dataset. Here are some ideas we have for you:

    -    Look at columns other than "TemperatureC". Can you find something interesting about the humidity or the air pressure? Can you find the rainiest month? London is notoriously rainy!
    -    Filter based on "hour". Similar to how you filtered based on the month, are there certain hours that have higher variance than others?

In [16]:
london_data.columns

Index(['Unnamed: 0', 'Time', 'TemperatureC', 'DewpointC', 'PressurehPa',
       'WindDirection', 'WindDirectionDegrees', 'WindSpeedKMH',
       'WindSpeedGustKMH', 'Humidity', 'HourlyPrecipMM', 'Conditions',
       'Clouds', 'dailyrainMM', 'SoftwareType', 'DateUTC', 'station', 'hour',
       'month'],
      dtype='object')

In [25]:
for i in range(1, 13):
    month = london_data.loc[london_data["month"] == i]['dailyrainMM']
    print("The mean precipitation in month "+str(i) +" is "+ str(round(np.mean(month),2)) + " mm.")
    print("The standard deviation of precipitation in month "+str(i) +" is "
          + str(round(np.std(month),2)) + " mm." + "\n")

# These numbers seem low. It would be nice to verify the units.

The mean precipitation in month 1 is 0.77 mm.
The standard deviation of precipitation in month 1 is 3.09 mm.

The mean precipitation in month 2 is 0.04 mm.
The standard deviation of precipitation in month 2 is 0.22 mm.

The mean precipitation in month 3 is 0.24 mm.
The standard deviation of precipitation in month 3 is 1.1 mm.

The mean precipitation in month 4 is 0.34 mm.
The standard deviation of precipitation in month 4 is 1.02 mm.

The mean precipitation in month 5 is 0.4 mm.
The standard deviation of precipitation in month 5 is 1.57 mm.

The mean precipitation in month 6 is 0.29 mm.
The standard deviation of precipitation in month 6 is 1.24 mm.

The mean precipitation in month 7 is 0.76 mm.
The standard deviation of precipitation in month 7 is 3.16 mm.

The mean precipitation in month 8 is 1.24 mm.
The standard deviation of precipitation in month 8 is 4.57 mm.

The mean precipitation in month 9 is 1.14 mm.
The standard deviation of precipitation in month 9 is 3.91 mm.

The mean pre
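
One possible reason the means look low: if `dailyrainMM` is a running total that resets each day, then averaging every reading mixes early-morning zeros with end-of-day totals. That reading of the column is an assumption and has not been verified against the data source. Under it, a monthly total could be sketched as below, using a hypothetical frame named `toy`.

```python
import pandas as pd

# Hypothetical readings; ASSUMES dailyrainMM is a running total that resets each day.
toy = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-01-01 06:00", "2015-01-01 18:00",
        "2015-01-02 06:00", "2015-01-02 18:00",
        "2015-02-01 06:00", "2015-02-01 18:00",
    ]),
    "dailyrainMM": [0.0, 2.0, 1.0, 3.0, 0.0, 0.5],
})

# Largest running total per calendar day = that day's rainfall (under the assumption).
daily_total = toy.groupby(toy["Time"].dt.normalize())["dailyrainMM"].max()

# Add up the daily totals within each month.
monthly_total = daily_total.groupby(daily_total.index.month).sum()

print(monthly_total)
```

If the column turns out to mean something else, the daily `max()` step would need to change accordingly.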

In [27]:
for i in range(1, 13):
    month = london_data.loc[london_data["month"] == i]['Humidity']
    print("The mean humidity in month "+str(i) +" is "+ str(round(np.mean(month),2)) + "%.")
    print("The standard deviation of humidity in month "+str(i) +" is "+ str(round(np.std(month),2))
          + "\n")
    

The mean humidity in month 1 is 83.35%.
The standard deviation of humidity in month 1 is 8.87

The mean humidity in month 2 is 82.58%.
The standard deviation of humidity in month 2 is 9.62

The mean humidity in month 3 is 73.09%.
The standard deviation of humidity in month 3 is 13.78

The mean humidity in month 4 is 70.25%.
The standard deviation of humidity in month 4 is 15.99

The mean humidity in month 5 is 66.75%.
The standard deviation of humidity in month 5 is 14.85

The mean humidity in month 6 is 65.55%.
The standard deviation of humidity in month 6 is 15.88

The mean humidity in month 7 is 67.96%.
The standard deviation of humidity in month 7 is 16.61

The mean humidity in month 8 is 76.03%.
The standard deviation of humidity in month 8 is 15.81

The mean humidity in month 9 is 79.83%.
The standard deviation of humidity in month 9 is 13.17

The mean humidity in month 10 is 86.31%.
The standard deviation of humidity in month 10 is 10.05

The mean humidity in month 11 is 86.8%.


In [29]:
# Note: range(1, 23) covers hours 1 through 22 only; range(24) would include hours 0 and 23.
for i in range(1, 23):
    hour = london_data.loc[london_data['hour'] == i]["TemperatureC"]
    print("The mean temperature in hour "+str(i) +" is "+ str(round(np.mean(hour),1)))
    print("The standard deviation of temperature in hour "+str(i) +" is "
          + str(round(np.std(hour),1)) +"\n")

The mean temperature in hour 1 is 10.0
The standard deviation of temperature in hour 1 is 4.3

The mean temperature in hour 2 is 9.7
The standard deviation of temperature in hour 2 is 4.3

The mean temperature in hour 3 is 9.6
The standard deviation of temperature in hour 3 is 4.3

The mean temperature in hour 4 is 9.5
The standard deviation of temperature in hour 4 is 4.3

The mean temperature in hour 5 is 9.6
The standard deviation of temperature in hour 5 is 4.5

The mean temperature in hour 6 is 10.0
The standard deviation of temperature in hour 6 is 4.9

The mean temperature in hour 7 is 10.8
The standard deviation of temperature in hour 7 is 5.2

The mean temperature in hour 8 is 11.7
The standard deviation of temperature in hour 8 is 5.3

The mean temperature in hour 9 is 12.6
The standard deviation of temperature in hour 9 is 5.5

The mean temperature in hour 10 is 13.4
The standard deviation of temperature in hour 10 is 5.6

The mean temperature in hour 11 is 14.1
The standard
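
To compare variance across every hour of the day, including hours 0 and 23, a `groupby` on the `"hour"` column is one option. The sketch below uses a hypothetical frame named `toy` with made-up readings, and passes `ddof=0` so the result matches `np.var`'s default.

```python
import pandas as pd

# Hypothetical stand-in for london_data with readings at three different hours.
toy = pd.DataFrame({
    "hour": [0, 0, 12, 12, 23, 23],
    "TemperatureC": [9.0, 11.0, 10.0, 18.0, 10.0, 12.0],
})

# Population variance (ddof=0) of temperature for each hour present in the data.
hourly_var = toy.groupby("hour")["TemperatureC"].var(ddof=0)

print(hourly_var)

# Hour with the highest variance.
most_variable_hour = hourly_var.idxmax()
print(most_variable_hour)
```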