
A NOAA dataset has been stored in the file `data/C2A2_data/BinnedCsvs_d400/6c8d642f28d9321421519c91b4ae6955a5796edb65a8b14e2257a994.csv`. The data for this assignment comes from a subset of the National Centers for Environmental Information (NCEI) [Daily Global Historical Climatology Network](https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt) (GHCN-Daily). GHCN-Daily comprises daily climate records from thousands of land surface stations across the globe.

Each row in the assignment datafile corresponds to a single observation.

The following variables are provided to you:

* **id** : station identification code (column `ID` in the file)
* **date** : date in YYYY-MM-DD format (e.g. 2012-01-24 = January 24, 2012) (column `Date`)
* **element** : indicator of element type (column `Element`)
    * TMAX : Maximum temperature (tenths of degrees C)
    * TMIN : Minimum temperature (tenths of degrees C)
* **value** : data value for element (tenths of degrees C) (column `Data_Value`)
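
Because values are stored in tenths of a degree, dividing by 10 gives degrees Celsius. The following is a minimal sketch on a made-up three-row frame; it assumes the `Element` and `Data_Value` column names seen in the data file, and the `Celsius` column is an illustrative name that is not part of the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration.
sample = pd.DataFrame({
    'Element': ['TMAX', 'TMIN', 'TMAX'],
    'Data_Value': [44, -17, 150],
})

# Data_Value is in tenths of degrees C, so divide by 10 to get degrees C.
sample['Celsius'] = sample['Data_Value'] / 10
print(sample)
```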


The data you have been given was recorded near **Vancouver, British Columbia, Canada**; the stations it comes from are shown on the map below.

In [2]:
import matplotlib.pyplot as plt
import mplleaflet
import pandas as pd

def leaflet_plot_stations(binsize, hashid):

    df = pd.read_csv('data/C2A2_data/BinSize_d{}.csv'.format(binsize))

    station_locations_by_hash = df[df['hash'] == hashid]

    lons = station_locations_by_hash['LONGITUDE'].tolist()
    lats = station_locations_by_hash['LATITUDE'].tolist()

    plt.figure(figsize=(8,8))

    plt.scatter(lons, lats, c='r', alpha=0.7, s=200)
    
    return mplleaflet.display()

leaflet_plot_stations(400,'6c8d642f28d9321421519c91b4ae6955a5796edb65a8b14e2257a994')

In [3]:
%matplotlib notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def fetchdata():
    # Load the binned GHCN-Daily observations for this region.
    df = pd.read_csv('data/C2A2_data/BinnedCsvs_d400/6c8d642f28d9321421519c91b4ae6955a5796edb65a8b14e2257a994.csv')
    # Order the observations chronologically.
    df = df.sort_values('Date')
    return df

fetchdata()



Unnamed: 0,ID,Date,Element,Data_Value
114358,CA001107H79,2005-01-01,TMIN,-10
147008,USC00450566,2005-01-01,TMIN,-17
162284,USC00458715,2005-01-01,TMAX,44
163093,CA001106CL2,2005-01-01,TMIN,0
190158,CA001108824,2005-01-01,TMAX,49
162069,CA00102BFHH,2005-01-01,TMAX,40
152543,CA001108395,2005-01-01,TMIN,-5
169067,CA001101158,2005-01-01,TMIN,-20
146872,CA001105658,2005-01-01,TMIN,-40
172013,USC00451484,2005-01-01,TMAX,17


In [5]:
def clean():
    df = fetchdata()
    # Collapse all stations and elements into one daily maximum and minimum.
    df = df.groupby('Date')['Data_Value'].agg(['max', 'min'])
    return df

clean()



Date,max,min
2005-01-01,74,-40
2005-01-02,72,-53
2005-01-03,57,-70
2005-01-04,63,-81
2005-01-05,60,-80
2005-01-06,43,-70
2005-01-07,42,-70
2005-01-08,32,-71
2005-01-09,40,-95
2005-01-10,43,-92


Separate 2005-2014 data and 2015 data
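
One way to express this split, sketched here on invented data, is to parse the dates and select on the year, dropping leap days along the way. `split_years` is a hypothetical helper, and it assumes a frame indexed by `YYYY-MM-DD` strings like the output of `clean()`.

```python
import pandas as pd

def split_years(daily, year=2015):
    """Split a date-indexed frame into (rows before `year`, rows in `year`)."""
    dates = pd.to_datetime(daily.index)
    # Leap days have no counterpart in a non-leap year, so drop them first.
    is_leap_day = (dates.month == 2) & (dates.day == 29)
    daily = daily[~is_leap_day]
    dates = dates[~is_leap_day]
    return daily[dates.year < year], daily[dates.year == year]

# Invented daily extremes, in tenths of degrees C.
sample = pd.DataFrame(
    {'max': [80, 90, 60, 70], 'min': [-10, 0, -30, -20]},
    index=['2012-02-28', '2012-02-29', '2014-12-31', '2015-01-01'],
)
before, in_2015 = split_years(sample)
print(before)
print(in_2015)
```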

In [9]:
df = clean()

# Split at the first 2015 row; .copy() gives independent frames to modify.
split = df.index.get_loc('2015-01-01')
big = df.iloc[:split].copy()    # 2005-2014
small = df.iloc[split:].copy()  # 2015

# Reduce each date to its 'MM-DD' part so years can be compared day by day.
big['Date'] = big.index.str[5:]
small['Date'] = small.index.str[5:]

# Leap days have no counterpart in 2015, so drop them.
big = big[big['Date'] != '02-29']

small = small.rename(columns={'min': '2015min', 'max': '2015max'})

# Record high and low for each calendar day.
big = big.groupby('Date')[['min', 'max']].agg({'max': 'max', 'min': 'min'})
small = small.groupby('Date')[['2015min', '2015max']].agg({'2015max': 'max', '2015min': 'min'})

# Replace the 'MM-DD' labels with a 0-based day-of-year index.
big.index = np.arange(len(big))
small.index = np.arange(len(small))



In [10]:
plt.figure()

# One row per calendar day: 2005-2014 records next to the 2015 extremes.
m = pd.merge(big, small, how='inner', left_index=True, right_index=True)
days = range(len(m))  # 0-based day of year, used for every x coordinate

plt.plot(days, m['max'], color='b', label='Record high 05-14', alpha=0.25)
plt.plot(days, m['min'], color='orange', label='Record low 05-14', alpha=0.25)

highdays = []
highrecord = []

lowdays = []
lowrecord = []
# Collect the days on which 2015 broke the 2005-2014 record.
for i in days:
    if m.iloc[i]['2015max'] > m.iloc[i]['max']:
        highdays.append(i)
        highrecord.append(m.iloc[i]['2015max'])
    if m.iloc[i]['2015min'] < m.iloc[i]['min']:
        lowdays.append(i)
        lowrecord.append(m.iloc[i]['2015min'])

plt.scatter(highdays, highrecord, color='green', label='2015 new high', s=10, alpha=0.75)
plt.scatter(lowdays, lowrecord, color='red', label='2015 new low', s=10, alpha=0.75)

plt.title('Record temperatures near Vancouver, British Columbia, Canada\n2005-2014 records and the days 2015 broke them', alpha=0.8)
plt.ylabel('Temperature (tenths of degrees C)')
plt.xlabel('Time of year')
# First day of each month in a non-leap year, as a 0-based day of year.
plt.xticks([0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334],
           ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'], rotation=45)

plt.gca().fill_between(days, m['min'], m['max'], facecolor='gray', alpha=0.25)
plt.legend(loc=8, frameon=False)
plt.subplots_adjust(bottom=0.25)
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.show()
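
The row-by-row loop above can also be written as a vectorized comparison. The following is a minimal sketch on invented data, assuming a merged frame with the same column names as `m`; `record_breakers` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd

def record_breakers(m):
    """Return (highdays, highrecord, lowdays, lowrecord) as NumPy arrays."""
    days = np.arange(len(m))  # 0-based position of each calendar day
    # Boolean masks marking the days on which 2015 beat the earlier record.
    high = (m['2015max'] > m['max']).to_numpy()
    low = (m['2015min'] < m['min']).to_numpy()
    return (days[high], m['2015max'].to_numpy()[high],
            days[low], m['2015min'].to_numpy()[low])

# Invented three-day example, in tenths of degrees C.
sample = pd.DataFrame({
    'max': [100, 120, 90],
    'min': [-50, -40, -60],
    '2015max': [110, 115, 95],
    '2015min': [-45, -55, -60],
})
highdays, highrecord, lowdays, lowrecord = record_breakers(sample)
print(highdays, highrecord, lowdays, lowrecord)
```

The returned arrays can be passed to `plt.scatter` in place of the lists built by the loop. Ties do not count as new records because the comparisons are strict.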
