## Cleaning daily temperature data for the Chicago Botanic Garden, 2000-2019

### Data source

NOAA Climate Data Online search: https://www.ncdc.noaa.gov/cdo-web/search

### Data in the final dataset

- DATE - Observation date
- TOBS - Temperature at the time of observation
- TMAX - Daily maximum temperature
- TMIN - Daily minimum temperature
- AVG - Mean of the daily maximum and minimum, (TMAX + TMIN) / 2
- LOC - Location label (`garden` for every row)

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('../data/temp_data/Botanical_Garden_FULL_2000_2019_temps.csv')

In [3]:
df.shape

(7046, 13)

In [4]:
df.isnull().sum()

STATION         0
NAME            0
LATITUDE        0
LONGITUDE       0
ELEVATION       0
DATE            0
MDPR         7045
PRCP           25
SNOW          312
SNWD          322
TMAX            9
TMIN           10
TOBS           14
dtype: int64
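The counts above show that MDPR is empty in all but one row, while the temperature columns are missing only a handful of values. A minimal sketch of turning such counts into fractions and listing the affected rows follows. It uses a small made-up frame in place of the real file, so the values are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the station data; the values are invented for illustration.
toy = pd.DataFrame({
    'DATE': ['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'],
    'TMAX': [44.0, np.nan, 62.0, 39.0],
    'TMIN': [28.0, 32.0, np.nan, 25.0],
    'TOBS': [37.0, 48.0, 37.0, np.nan],
})

temp_cols = ['TMAX', 'TMIN', 'TOBS']

# Fraction of missing values per temperature column.
missing_share = toy[temp_cols].isnull().mean()

# Rows where at least one temperature reading is missing.
incomplete = toy[toy[temp_cols].isnull().any(axis=1)]
```

On the real frame, the same two expressions applied to `df` would show which days need attention before any averages are computed.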

In [5]:
df.head()

| | STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | DATE | MDPR | PRCP | SNOW | SNWD | TMAX | TMIN | TOBS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USC00111497 | CHICAGO BOTANIC GARDEN, IL US | 42.13987 | -87.78537 | 192.0 | 2000-01-01 | NaN | 0.0 | 0.0 | 0.0 | 44.0 | 28.0 | 37.0 |
| 1 | USC00111497 | CHICAGO BOTANIC GARDEN, IL US | 42.13987 | -87.78537 | 192.0 | 2000-01-02 | NaN | 0.02 | 0.0 | 0.0 | 49.0 | 32.0 | 48.0 |
| 2 | USC00111497 | CHICAGO BOTANIC GARDEN, IL US | 42.13987 | -87.78537 | 192.0 | 2000-01-03 | NaN | 0.0 | 0.0 | 0.0 | 62.0 | 35.0 | 37.0 |
| 3 | USC00111497 | CHICAGO BOTANIC GARDEN, IL US | 42.13987 | -87.78537 | 192.0 | 2000-01-04 | NaN | 0.26 | 0.4 | 0.0 | 39.0 | 25.0 | 29.0 |
| 4 | USC00111497 | CHICAGO BOTANIC GARDEN, IL US | 42.13987 | -87.78537 | 192.0 | 2000-01-05 | NaN | 0.0 | 0.0 | 0.0 | 30.0 | 14.0 | 19.0 |
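The frame has 7046 rows, while the calendar from 2000-01-01 through 2019-12-31 contains 7305 days, so if the file spans that whole range some dates are absent. A sketch of how the missing dates could be listed is below. The `find_missing_dates` helper is hypothetical, and the start and end dates are assumptions based on the title.

```python
import pandas as pd

def find_missing_dates(dates, start, end):
    """Return the calendar days in [start, end] that do not appear in `dates`."""
    observed = pd.DatetimeIndex(pd.to_datetime(dates))
    expected = pd.date_range(start=start, end=end, freq='D')
    return expected.difference(observed)

# Toy example: 2000-01-03 is absent from the observed dates.
sample = ['2000-01-01', '2000-01-02', '2000-01-04', '2000-01-05']
missing = find_missing_dates(sample, '2000-01-01', '2000-01-05')

# On the real frame, under the assumptions stated above, the call would be:
# find_missing_dates(df['DATE'], '2000-01-01', '2019-12-31')
```

Knowing which days are absent matters later if the series is resampled or merged with another location by date.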


In [6]:
# keep only the date and the three temperature columns
df_gar = df[['DATE', 'TOBS', 'TMAX', 'TMIN']]
df_gar.head()

| | DATE | TOBS | TMAX | TMIN |
|---|---|---|---|---|
| 0 | 2000-01-01 | 37.0 | 44.0 | 28.0 |
| 1 | 2000-01-02 | 48.0 | 49.0 | 32.0 |
| 2 | 2000-01-03 | 37.0 | 62.0 | 35.0 |
| 3 | 2000-01-04 | 29.0 | 39.0 | 25.0 |
| 4 | 2000-01-05 | 19.0 | 30.0 | 14.0 |


In [7]:
# assign() returns a new DataFrame, so no SettingWithCopyWarning is raised
# when adding a column to a frame that was sliced from df
df_gar = df_gar.assign(AVG=(df_gar['TMAX'] + df_gar['TMIN']) * 0.5)


In [8]:
df_gar.head()

| | DATE | TOBS | TMAX | TMIN | AVG |
|---|---|---|---|---|---|
| 0 | 2000-01-01 | 37.0 | 44.0 | 28.0 | 36.0 |
| 1 | 2000-01-02 | 48.0 | 49.0 | 32.0 | 40.5 |
| 2 | 2000-01-03 | 37.0 | 62.0 | 35.0 | 48.5 |
| 3 | 2000-01-04 | 29.0 | 39.0 | 25.0 | 32.0 |
| 4 | 2000-01-05 | 19.0 | 30.0 | 14.0 | 22.0 |
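AVG is the midpoint of the daily extremes, so the first row gives (44.0 + 28.0) * 0.5 = 36.0. Because TMAX and TMIN each have a few missing values, the sum is NaN on those days and AVG inherits the gap. A small sketch of this, reproducing the first three rows shown above plus one invented row with a missing TMIN:

```python
import numpy as np
import pandas as pd

rows = pd.DataFrame({
    'TMAX': [44.0, 49.0, 62.0, 50.0],    # last row is invented for illustration
    'TMIN': [28.0, 32.0, 35.0, np.nan],  # missing minimum on the invented row
})

# Same midpoint formula as in the notebook cell.
rows = rows.assign(AVG=(rows['TMAX'] + rows['TMIN']) * 0.5)

# Count the days for which no average could be computed.
n_missing_avg = int(rows['AVG'].isnull().sum())
```

Whether such days should be dropped or left as NaN depends on the later analysis; the notebook as written leaves them in place.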


In [9]:
# label every row with its location; assign() avoids SettingWithCopyWarning
df_gar = df_gar.assign(LOC='garden')
df_gar.head()


| | DATE | TOBS | TMAX | TMIN | AVG | LOC |
|---|---|---|---|---|---|---|
| 0 | 2000-01-01 | 37.0 | 44.0 | 28.0 | 36.0 | garden |
| 1 | 2000-01-02 | 48.0 | 49.0 | 32.0 | 40.5 | garden |
| 2 | 2000-01-03 | 37.0 | 62.0 | 35.0 | 48.5 | garden |
| 3 | 2000-01-04 | 29.0 | 39.0 | 25.0 | 32.0 | garden |
| 4 | 2000-01-05 | 19.0 | 30.0 | 14.0 | 22.0 | garden |


In [10]:
# save the cleaned dataframe; index=False keeps the row index out of the file
df_gar.to_csv('../data/temp_data/garden_temp_2000_2019.csv', index=False)
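A quick way to check the export is to read the file back and compare it with the frame in memory. The sketch below uses a temporary directory and a toy frame rather than the real path (the file name is made up); `pd.testing.assert_frame_equal` raises an `AssertionError` if the two frames differ.

```python
import os
import tempfile

import pandas as pd

# Toy frame with the same columns as the final dataset.
toy = pd.DataFrame({
    'DATE': ['2000-01-01', '2000-01-02'],
    'TOBS': [37.0, 48.0],
    'TMAX': [44.0, 49.0],
    'TMIN': [28.0, 32.0],
    'AVG': [36.0, 40.5],
    'LOC': ['garden', 'garden'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'garden_temp_sample.csv')
    toy.to_csv(path, index=False)   # same call as in the notebook cell
    restored = pd.read_csv(path)    # read the file back while it still exists

# Raises if columns, dtypes, or values differ after the round trip.
pd.testing.assert_frame_equal(toy, restored)
```

Note that DATE comes back as plain strings here, just as it does in the notebook, since neither read parses it as a datetime.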