The data for this assignment comes from a subset of The National Centers for Environmental Information (NCEI) Daily Global Historical Climatology Network (GHCN-Daily). The GHCN-Daily is comprised of daily climate records from thousands of land surface stations across the globe.

Each row in the assignment datafile corresponds to a single observation.

The following variables are provided to you:

id : station identification code
date : date in YYYY-MM-DD format (e.g. 2012-01-24 = January 24, 2012)
element : indicator of element type
TMAX : Maximum temperature (tenths of degrees C)
TMIN : Minimum temperature (tenths of degrees C)
value : data value for element (tenths of degrees C)
For this assignment, you must:

Read the documentation and familiarize yourself with the dataset, then write some python code which returns a line graph of the record high and record low temperatures by day of the year over the period 2005-2014. The area between the record high and record low temperatures for each day should be shaded.
Overlay a scatter of the 2015 data for any points (highs and lows) for which the ten year record (2005-2014) record high or record low was broken in 2015.
Watch out for leap days (i.e. February 29th), it is reasonable to remove these points from the dataset for the purpose of this visualization.
Make the visual nice! Leverage principles from the first module in this course when developing your solution. Consider issues such as legends, labels, and chart junk.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [12]:
#importing datatset
temperature = pd.read_csv("temperature.csv")
temperature.head()

Unnamed: 0,ID,Date,Element,Data_Value
0,CHM00053646,2005-11-14,TMIN,-87
1,CHM00054102,2014-09-20,TMAX,175
2,CHM00050727,2011-08-05,TMAX,271
3,CHM00053646,2014-08-24,TMAX,249
4,RSM00031913,2009-07-12,TMAX,230


In [14]:
#Adding two more cols
temperature["Day-Month"] = temperature["Date"].apply(lambda x : x[6:])
temperature["Year"] = temperature["Date"].apply(lambda x : x[:4])
temperature.head()

Unnamed: 0,ID,Date,Element,Data_Value,Day-Month,Year
0,CHM00053646,2005-11-14,TMIN,-87,1-14,2005
1,CHM00054102,2014-09-20,TMAX,175,9-20,2014
2,CHM00050727,2011-08-05,TMAX,271,8-05,2011
3,CHM00053646,2014-08-24,TMAX,249,8-24,2014
4,RSM00031913,2009-07-12,TMAX,230,7-12,2009


In [15]:
#Dropping 29th Feb data as 2015 doesnt have 29th of Feb
temp = temperature[temperature["Day-Month"] != "29-02"]
temp.head()

Unnamed: 0,ID,Date,Element,Data_Value,Day-Month,Year
0,CHM00053646,2005-11-14,TMIN,-87,1-14,2005
1,CHM00054102,2014-09-20,TMAX,175,9-20,2014
2,CHM00050727,2011-08-05,TMAX,271,8-05,2011
3,CHM00053646,2014-08-24,TMAX,249,8-24,2014
4,RSM00031913,2009-07-12,TMAX,230,7-12,2009


In [17]:
#Sorting values wrt to ID and then Date
temperature.sort_values(["ID", "Date"], inplace=True)
temperature.head()

Unnamed: 0,ID,Date,Element,Data_Value,Day-Month,Year
72648,CHM00050557,2005-01-01,TMAX,-134,1-01,2005
72673,CHM00050557,2005-01-01,TMIN,-237,1-01,2005
237405,CHM00050557,2005-01-02,TMAX,-169,1-02,2005
237413,CHM00050557,2005-01-02,TMIN,-261,1-02,2005
193998,CHM00050557,2005-01-03,TMIN,-233,1-03,2005


In [18]:
#Creating 2015 dataset
temp_2015 = temperature[temperature.Date.str.contains("2015")]
temp_2015.head()

Unnamed: 0,ID,Date,Element,Data_Value,Day-Month,Year
649327,CHM00050557,2015-01-01,TMAX,-129,1-01,2015
649335,CHM00050557,2015-01-01,TMIN,-230,1-01,2015
565003,CHM00050557,2015-01-02,TMAX,-141,1-02,2015
573643,CHM00050557,2015-01-03,TMAX,-179,1-03,2015
304729,CHM00050557,2015-01-04,TMIN,-273,1-04,2015


In [19]:
#Creating Min and Max Data set

In [21]:
temp_2015_min = temp_2015[temp_2015["Element"] == "TMIN"]
temp_2015_min.head()

Unnamed: 0,ID,Date,Element,Data_Value,Day-Month,Year
649335,CHM00050557,2015-01-01,TMIN,-230,1-01,2015
304729,CHM00050557,2015-01-04,TMIN,-273,1-04,2015
265059,CHM00050557,2015-01-05,TMIN,-204,1-05,2015
426594,CHM00050557,2015-01-08,TMIN,-257,1-08,2015
360857,CHM00050557,2015-01-09,TMIN,-267,1-09,2015


In [22]:
temp_2015_max = temp_2015[temp_2015["Element"] == "TMAX"]
temp_2015_max.head()

Unnamed: 0,ID,Date,Element,Data_Value,Day-Month,Year
649327,CHM00050557,2015-01-01,TMAX,-129,1-01,2015
565003,CHM00050557,2015-01-02,TMAX,-141,1-02,2015
573643,CHM00050557,2015-01-03,TMAX,-179,1-03,2015
304732,CHM00050557,2015-01-04,TMAX,-157,1-04,2015
265062,CHM00050557,2015-01-05,TMAX,-120,1-05,2015


In [26]:
#Creating 2005 to 2014 dataset
temp_rest = temperature[~temperature.Date.str.contains("2015")]
temp_rest.head()

Unnamed: 0,ID,Date,Element,Data_Value,Day-Month,Year
72648,CHM00050557,2005-01-01,TMAX,-134,1-01,2005
72673,CHM00050557,2005-01-01,TMIN,-237,1-01,2005
237405,CHM00050557,2005-01-02,TMAX,-169,1-02,2005
237413,CHM00050557,2005-01-02,TMIN,-261,1-02,2005
193998,CHM00050557,2005-01-03,TMIN,-233,1-03,2005


In [27]:
#Creating Min and Max Data set 

In [39]:
temp_rest_min = temp_rest[temp_rest["Element"] == "TMIN"]
temp_rest_min.groupby("Day-Month").agg({"Data_Value" : max}).tail()

Unnamed: 0_level_0,Data_Value
Day-Month,Unnamed: 1_level_1
9-26,202
9-27,185
9-28,188
9-29,186
9-30,192


In [29]:
temp_rest_max = temp_rest[temp_rest["Element"] == "TMAX"]
temp_rest_max.head()

Unnamed: 0,ID,Date,Element,Data_Value,Day-Month,Year
72648,CHM00050557,2005-01-01,TMAX,-134,1-01,2005
237405,CHM00050557,2005-01-02,TMAX,-169,1-02,2005
194089,CHM00050557,2005-01-03,TMAX,-138,1-03,2005
142924,CHM00050557,2005-01-04,TMAX,-139,1-04,2005
74089,CHM00050557,2005-01-05,TMAX,-156,1-05,2005


Unnamed: 0_level_0,ID,Date,Element,Data_Value,Year
Day-Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0-01,RSM00032088,2014-10-01,TMIN,307,2014
0-02,RSM00032088,2014-10-02,TMIN,317,2014
0-03,RSM00032088,2014-10-03,TMIN,298,2014
0-04,RSM00032088,2014-10-04,TMIN,310,2014
0-05,RSM00032088,2014-10-05,TMIN,298,2014
...,...,...,...,...,...
9-26,RSM00032088,2014-09-26,TMIN,303,2014
9-27,RSM00032088,2014-09-27,TMIN,288,2014
9-28,RSM00032088,2014-09-28,TMIN,301,2014
9-29,RSM00032088,2014-09-29,TMIN,287,2014
