### Assignment 2

Before working on this assignment, please read these instructions fully. In the submission area, you will notice that you can click the link to **Preview the Grading** for each step of the assignment. These are the criteria that will be used for peer grading. Please familiarize yourself with the criteria before beginning the assignment.

An NOAA dataset has been stored in the file `data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv`. This is the dataset to use for this assignment. Note: The data for this assignment come from a subset of the National Centers for Environmental Information (NCEI) [Daily Global Historical Climatology Network](https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt) (GHCN-Daily). GHCN-Daily comprises daily climate records from thousands of land surface stations across the globe.

Each row in the assignment datafile corresponds to a single observation.

The following variables are provided to you:

* **ID** : station identification code
* **Date** : date in YYYY-MM-DD format (e.g. 2012-01-24 = January 24, 2012)
* **Element** : indicator of element type
    * TMAX : Maximum temperature (tenths of degrees C)
    * TMIN : Minimum temperature (tenths of degrees C)
* **Data_Value** : data value for element (tenths of degrees C)
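
Because the values are stored in tenths of degrees C, it can help to convert them before reading them off a chart. The following is a minimal sketch on a hand-made sample; the station name and the `temp_c` column are hypothetical, and the column names `Element` and `Data_Value` are taken from the file as it loads in the cells below.

```python
import pandas as pd

# Tiny hand-made sample in the same layout as the assignment file (illustrative values only).
sample = pd.DataFrame({
    "ID": ["STATION_A", "STATION_A"],
    "Date": ["2012-01-24", "2012-01-24"],
    "Element": ["TMAX", "TMIN"],
    "Data_Value": [22, -106],
})

# Data_Value is in tenths of degrees C, so dividing by 10 gives degrees C.
sample["temp_c"] = sample["Data_Value"] / 10.0
print(sample[["Element", "Data_Value", "temp_c"]])
```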

For this assignment, you must:

1. Read the documentation and familiarize yourself with the dataset, then write Python code that returns a line graph of the record high and record low temperatures by day of the year over the period 2005-2014. The area between the record high and record low temperatures for each day should be shaded.
2. Overlay a scatter of the 2015 data for any points (highs and lows) for which the ten-year (2005-2014) record high or record low was broken in 2015.
3. Watch out for leap days (i.e. February 29th); it is reasonable to remove these points from the dataset for the purpose of this visualization.
4. Make the visual nice! Leverage principles from the first module in this course when developing your solution. Consider issues such as legends, labels, and chart junk.
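
One way to read steps 1 to 3 is sketched below: drop leap days, take the minimum of TMIN and the maximum of TMAX per calendar day for 2005-2014, then keep only the 2015 values that fall outside those records. This is a sketch under assumptions, not the required solution; the function name `record_extremes`, the `month_day` column, and the sample rows are hypothetical.

```python
import pandas as pd

def record_extremes(df):
    """Return 2005-2014 per-day records and the 2015 values that broke them."""
    df = df.copy()
    df["year"] = df["Date"].str[:4]
    df["month_day"] = df["Date"].str[5:]
    df = df[df["month_day"] != "02-29"]  # step 3: remove leap days

    def extremes(frame):
        # Lowest TMIN and highest TMAX for each calendar day.
        low = frame[frame["Element"] == "TMIN"].groupby("month_day")["Data_Value"].min()
        high = frame[frame["Element"] == "TMAX"].groupby("month_day")["Data_Value"].max()
        return pd.DataFrame({"low": low, "high": high})

    record = extremes(df[df["year"].between("2005", "2014")])
    current = extremes(df[df["year"] == "2015"]).reindex(record.index)

    # Step 2: keep only 2015 days that went beyond the ten-year record.
    broken_low = current["low"][current["low"] < record["low"]]
    broken_high = current["high"][current["high"] > record["high"]]
    return record, broken_low, broken_high

# Illustrative rows only; real data comes from the assignment CSV.
sample = pd.DataFrame({
    "Date": ["2005-01-01", "2010-01-01", "2005-01-01", "2010-01-01",
             "2015-01-01", "2015-01-01", "2008-02-29"],
    "Element": ["TMIN", "TMIN", "TMAX", "TMAX", "TMIN", "TMAX", "TMAX"],
    "Data_Value": [-28, -50, 67, 100, -60, 90, 999],
})
record, broken_low, broken_high = record_extremes(sample)
```

In this sample the 2015 low of -60 breaks the record low of -50, while the 2015 high of 90 stays below the record high of 100, so only `broken_low` has an entry.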

The data you have been given is near **Ann Arbor, Michigan, United States**, and the stations the data comes from are shown on the map below.

In [1]:
import matplotlib.pyplot as plt
import mplleaflet
import pandas as pd

def leaflet_plot_stations(binsize, hashid):

    df = pd.read_csv('data/C2A2_data/BinSize_d{}.csv'.format(binsize))

    station_locations_by_hash = df[df['hash'] == hashid]

    lons = station_locations_by_hash['LONGITUDE'].tolist()
    lats = station_locations_by_hash['LATITUDE'].tolist()

    plt.figure(figsize=(8,8))

    plt.scatter(lons, lats, c='r', alpha=0.7, s=200)

    return mplleaflet.display()

leaflet_plot_stations(400,'fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89')

In [3]:
#load Data
df = pd.read_csv('data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv')
df

Unnamed: 0,ID,Date,Element,Data_Value
0,USW00094889,2014-11-12,TMAX,22
1,USC00208972,2009-04-29,TMIN,56
2,USC00200032,2008-05-26,TMAX,278
3,USC00205563,2005-11-11,TMAX,139
4,USC00200230,2014-02-27,TMAX,-106
5,USW00014833,2010-10-01,TMAX,194
6,USC00207308,2010-06-29,TMIN,144
7,USC00203712,2005-10-04,TMAX,289
8,USW00004848,2007-12-14,TMIN,-16
9,USC00200220,2011-04-21,TMAX,72


In [7]:
# DataFrame.sort was removed from pandas; sort_values is the replacement
df = df.sort_values(['ID', 'Date'])
df.head()


Unnamed: 0,ID,Date,Element,Data_Value
55067,USC00200032,2005-01-01,TMIN,-28
55102,USC00200032,2005-01-01,TMAX,67
112671,USC00200032,2005-01-02,TMAX,122
112708,USC00200032,2005-01-02,TMIN,-6
104159,USC00200032,2005-01-03,TMIN,11


In [9]:
# Year as a four-character string, taken from the YYYY-MM-DD date
df['year'] = df['Date'].str[:4]
df.year

55067     2005
55102     2005
112671    2005
112708    2005
104159    2005
104196    2005
3408      2005
3447      2005
8161      2005
16564     2005
107265    2005
107270    2005
81708     2005
81728     2005
116059    2005
116063    2005
123466    2005
123469    2005
28270     2005
28271     2005
17878     2005
17879     2005
31112     2005
31140     2005
139342    2005
139345    2005
68151     2005
68163     2005
66068     2005
66092     2005
          ... 
40780     2015
40805     2015
48648     2015
48652     2015
126707    2015
126708    2015
81792     2015
81800     2015
110738    2015
110745    2015
116256    2015
116257    2015
135236    2015
135238    2015
77239     2015
77245     2015
6489      2015
6707      2015
128642    2015
128643    2015
94085     2015
94106     2015
95483     2015
95509     2015
83961     2015
84041     2015
50750     2015
50751     2015
61120     2015
61135     2015
Name: year, dtype: object

In [10]:
# Calendar day as an 'MM-DD' string, used to group the same day across years
df['Month-Day'] = df['Date'].str[5:]
df['Month-Day']

55067     01-01
55102     01-01
112671    01-02
112708    01-02
104159    01-03
104196    01-03
3408      01-04
3447      01-04
8161      01-05
16564     01-05
107265    01-06
107270    01-06
81708     01-07
81728     01-07
116059    01-08
116063    01-08
123466    01-09
123469    01-09
28270     01-10
28271     01-10
17878     01-11
17879     01-11
31112     01-12
31140     01-12
139342    01-13
139345    01-13
68151     01-14
68163     01-14
66068     01-15
66092     01-15
          ...  
40780     12-14
40805     12-14
48648     12-15
48652     12-15
126707    12-16
126708    12-16
81792     12-17
81800     12-17
110738    12-18
110745    12-18
116256    12-19
116257    12-19
135236    12-20
135238    12-20
77239     12-21
77245     12-21
6489      12-25
6707      12-25
128642    12-26
128643    12-26
94085     12-27
94106     12-27
95483     12-28
95509     12-28
83961     12-29
84041     12-29
50750     12-30
50751     12-30
61120     12-31
61135     12-31
Name: Month-Day, dtype: object
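
String slicing works because every date is in YYYY-MM-DD form. An alternative, sketched here on a hypothetical three-row series, is to parse the dates with `pd.to_datetime` and use the `.dt` accessor, which also makes leap days easy to flag. The variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample dates in the same YYYY-MM-DD format as the Date column.
dates = pd.Series(["2005-01-01", "2008-02-29", "2015-12-31"])
parsed = pd.to_datetime(dates, format="%Y-%m-%d")

year = parsed.dt.year                      # integer years
month_day = parsed.dt.strftime("%m-%d")    # 'MM-DD' strings
is_leap_day = (parsed.dt.month == 2) & (parsed.dt.day == 29)

# Keep everything except 29 February.
print(month_day[~is_leap_day].tolist())
```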

In [14]:
df_min = df[(df['Element'] == 'TMIN')]
df_max = df[(df['Element'] == 'TMAX')]


In [17]:
df_temp_min = df[(df['Element'] == 'TMIN') & (df['year'] != '2015')]
df_temp_max = df[(df['Element'] == 'TMAX') & (df['year'] != '2015')]

In [18]:
df_temp_min

Unnamed: 0,ID,Date,Element,Data_Value,year,Month-Day
55067,USC00200032,2005-01-01,TMIN,-28,2005,01-01
112708,USC00200032,2005-01-02,TMIN,-6,2005,01-02
104159,USC00200032,2005-01-03,TMIN,11,2005,01-03
3408,USC00200032,2005-01-04,TMIN,6,2005,01-04
8161,USC00200032,2005-01-05,TMIN,-44,2005,01-05
107265,USC00200032,2005-01-06,TMIN,-56,2005,01-06
81728,USC00200032,2005-01-07,TMIN,-72,2005,01-07
116063,USC00200032,2005-01-08,TMIN,-33,2005,01-08
123469,USC00200032,2005-01-09,TMIN,-22,2005,01-09
28271,USC00200032,2005-01-10,TMIN,-6,2005,01-10


In [20]:
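import numpy as np
# Average over all stations and years for each calendar day (a mean, not a record extreme).
# Named aggregation replaces the dict-renaming form of agg, which newer pandas rejects.
temp_min = df_temp_min.groupby('Month-Day')['Data_Value'].agg(temp_min_mean='mean')
temp_max = df_temp_max.groupby('Month-Day')['Data_Value'].agg(temp_max_mean='mean')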


In [21]:
temp_min.head()

Month-Day,temp_min_mean
01-01,-47.623116
01-02,-77.870813
01-03,-95.569378
01-04,-72.925743
01-05,-56.687805
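
The values above are means across stations and years. A record low is the minimum instead, and the two can differ considerably. The sketch below contrasts them on made-up numbers using pandas named aggregation; the column names `mean_low` and `record_low` are hypothetical.

```python
import pandas as pd

# Made-up TMIN readings (tenths of degrees C) for two calendar days.
tmin = pd.DataFrame({
    "Month-Day": ["01-01", "01-01", "01-01", "01-02", "01-02"],
    "Data_Value": [-28, -50, -12, -6, -80],
})

# One groupby, two aggregates: the mean and the record (minimum) per day.
summary = tmin.groupby("Month-Day")["Data_Value"].agg(
    mean_low="mean",
    record_low="min",
)
print(summary)
```

For `01-01` the mean is -30.0 while the record low is -50, which is why the choice of aggregate matters for a chart of record temperatures.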


In [23]:
# 2015 rows only
temp_min_15_tmp = df_min[df_min['year'] == '2015']
temp_max_15_tmp = df_max[df_max['year'] == '2015']

# Per-day mean across stations for 2015, using named aggregation
temp_min_15 = temp_min_15_tmp.groupby('Month-Day')['Data_Value'].agg(temp_min_15_mean='mean')
temp_max_15 = temp_max_15_tmp.groupby('Month-Day')['Data_Value'].agg(temp_max_15_mean='mean')



In [24]:
# Reset Index
temp_min = temp_min.reset_index()
temp_max = temp_max.reset_index()

temp_min_15 = temp_min_15.reset_index()
temp_max_15 = temp_max_15.reset_index()


In [26]:
temp_min.head()

Unnamed: 0,Month-Day,temp_min_mean
0,01-01,-47.623116
1,01-02,-77.870813
2,01-03,-95.569378
3,01-04,-72.925743
4,01-05,-56.687805


In [33]:

%matplotlib notebook
import matplotlib.pyplot as plt


In [34]:
plt.figure()

# The plotted series are per-day means, so the legend labels say so
plt.plot(temp_min['temp_min_mean'], 'y', alpha = 0.75, label = 'Mean Daily Low')
plt.plot(temp_max['temp_max_mean'], 'r', alpha = 0.5, label = 'Mean Daily High')

plt.xlabel('Month')
plt.ylabel('Temperature (Tenths of Degrees C)')
plt.title('Extreme Temperatures of 2015 against 2005-2014\n Ann Arbor, Michigan')

plt.gca().fill_between(range(len(temp_min)), 
                       temp_min['temp_min_mean'], temp_max['temp_max_mean'], 
                       facecolor='grey', 
                       alpha=0.2)

plt.gca().axis([-5, 370, -400, 400])
plt.legend(frameon = False)

plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

a = [0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330]
b = [i+15 for i in a]

Month_name = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
plt.xticks(b, Month_name)

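
The tick positions in the plotting cell assume 30-day months, so the labels drift by a few days toward the end of the year. As a sketch, exact positions for a 365-day axis can be derived from the standard-library `calendar` module; the names `month_starts`, `month_mids`, and `month_names` are hypothetical.

```python
import calendar

# Days per month in a non-leap year (2015), matching an axis with 29 February removed.
days_in_month = [calendar.monthrange(2015, m)[1] for m in range(1, 13)]

# Zero-based day-of-year index on which each month starts.
month_starts = []
offset = 0
for n in days_in_month:
    month_starts.append(offset)
    offset += n

# Midpoint of each month, a natural place for the month label.
month_mids = [start + n / 2 for start, n in zip(month_starts, days_in_month)]
month_names = list(calendar.month_abbr)[1:]  # abbreviations depend on the active locale
```

These lists could then be passed as `plt.xticks(month_mids, month_names)` in place of the hand-written `a` and `b` lists.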