# GHCND Inventory Data Prep
#### Description:
This notebook preps the data contained in ghcnd-inventory.txt. Use noaa-daily-retrieve-files-ftp.ipynb to download this file, along with the other metadata files used in the NOAA Daily Weather Project.

System of Origin Data Source: https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-inventory.txt

#### Data Prep Operations:
- define English column names
- save to .csv file (tab separated)
- join with ghcnd-stations-cleansed.csv and save the combined result (tab separated)

#### Created by:
Nate Muth <br>
nmuth87@gmail.com

#### Created on:
8/5/2018

#### Changelog:
8/5/2018 - Initial Create Date<br>

In [1]:
import pandas as pd

In [2]:
# read ghcnd-inventory.txt (fixed-width text, no header row) into a DataFrame;
# the widths include the single-space separators between fields
inventoryDF = pd.read_fwf('ghcnd-inventory.txt', header=None,
                          widths=[12, 9, 10, 5, 5, 5],
                          names=['StationID', 'Latitude', 'Longitude', 'Element',
                                 'FirstYear', 'LastYear'])
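The `widths` above should line up with the fixed-width layout described in NOAA's GHCN-Daily readme.txt for this file (assumed here: ID in columns 1-11, LATITUDE 13-20, LONGITUDE 22-30, ELEMENT 32-35, FIRSTYEAR 37-40, LASTYEAR 42-45). A minimal sketch, assuming that layout, checks the equivalence by building two lines from values shown in this notebook and parsing them once with `widths` and once with explicit `colspecs`:

```python
import io

import pandas as pd

NAMES = ['StationID', 'Latitude', 'Longitude', 'Element', 'FirstYear', 'LastYear']

# Two sample records formatted to the assumed readme layout (1-based columns):
# ID 1-11, LATITUDE 13-20, LONGITUDE 22-30, ELEMENT 32-35, FIRSTYEAR 37-40, LASTYEAR 42-45
records = [
    ('ACW00011604', 17.1167, -61.7833, 'TMAX', 1949, 1949),
    ('USW00014942', 41.3103, -95.8992, 'TMAX', 1948, 2018),
]
text = '\n'.join(
    f'{sid:<11} {lat:>8.4f} {lon:>9.4f} {elem:<4} {first:4d} {last:4d}'
    for sid, lat, lon, elem, first, last in records
) + '\n'

# Widths as used in the notebook: each field plus the blank that follows it
by_widths = pd.read_fwf(io.StringIO(text), header=None,
                        widths=[12, 9, 10, 5, 5, 5], names=NAMES)

# The same layout as explicit 0-based, half-open (start, end) column pairs
by_colspecs = pd.read_fwf(io.StringIO(text), header=None,
                          colspecs=[(0, 11), (12, 20), (21, 30), (31, 35), (36, 40), (41, 45)],
                          names=NAMES)

print(by_colspecs)
```

`colspecs` is more verbose but maps one-to-one onto the column ranges in the readme, which makes a misaligned field easier to spot.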

In [3]:
inventoryDF[inventoryDF['StationID']=='USW00014942'].head()

| | StationID | Latitude | Longitude | Element | FirstYear | LastYear |
|---|---|---|---|---|---|---|
| 583588 | USW00014942 | 41.3103 | -95.8992 | TMAX | 1948 | 2018 |
| 583589 | USW00014942 | 41.3103 | -95.8992 | TMIN | 1948 | 2018 |
| 583590 | USW00014942 | 41.3103 | -95.8992 | PRCP | 1948 | 2018 |
| 583591 | USW00014942 | 41.3103 | -95.8992 | SNOW | 1948 | 2018 |
| 583592 | USW00014942 | 41.3103 | -95.8992 | SNWD | 1948 | 2018 |
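Beyond looking up a single station, the FirstYear/LastYear columns make it easy to ask which stations reported an element across a whole period. A minimal sketch: the helper name `stations_covering` and the small sample frame (rows copied from output in this notebook) are illustrative only. Note that an inventory span says when an element was reported at all, not necessarily that every year in between has data.

```python
import pandas as pd

# Small stand-in for inventoryDF, using rows that appear in this notebook's output
inventory = pd.DataFrame({
    'StationID': ['USW00014942', 'USW00014942', 'ACW00011604'],
    'Element':   ['TMAX', 'PRCP', 'TMAX'],
    'FirstYear': [1948, 1948, 1949],
    'LastYear':  [2018, 2018, 1949],
})

def stations_covering(df, element, start, end):
    """Hypothetical helper: sorted IDs of stations whose `element` record spans start..end."""
    mask = (
        (df['Element'] == element)
        & (df['FirstYear'] <= start)
        & (df['LastYear'] >= end)
    )
    return sorted(df.loc[mask, 'StationID'].unique())

print(stations_covering(inventory, 'TMAX', 1950, 2010))  # ['USW00014942']
```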


In [14]:
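# write the cleansed inventory to the local directory (tab separated);
# index=False keeps the row index from coming back as an 'Unnamed: 0' column on re-read
inventoryDF.to_csv('ghcnd-inventory-cleansed.csv', sep='\t', index=False)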

If we're pulling in this data, we'll almost certainly join it to ghcnd-stations-cleansed.csv so that we can see station metadata and inventory data together.

In [8]:
# read ghcnd-stations-cleansed.csv into DataFrame
stationsDF = pd.read_csv('ghcnd-stations-cleansed.csv',sep='\t')

stationsDF.head()

| | Unnamed: 0 | StationID | Latitude | Longitude | Elevation | State | StationName | GSN_Flag | HCN_CRN_Flag | WMO_ID | CountryCode | CountryName | StateName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | ACW00011604 | 17.1167 | -61.7833 | 10.1 | | ST JOHNS COOLIDGE FLD | | | | AC | Antigua and Barbuda | |
| 1 | 1 | ACW00011647 | 17.1333 | -61.7833 | 19.2 | | ST JOHNS | | | | AC | Antigua and Barbuda | |
| 2 | 2 | AE000041196 | 25.333 | 55.517 | 34.0 | | SHARJAH INTER. AIRP | GSN | | 41196.0 | AE | United Arab Emirates | |
| 3 | 3 | AEM00041194 | 25.255 | 55.364 | 10.4 | | DUBAI INTL | | | 41194.0 | AE | United Arab Emirates | |
| 4 | 4 | AEM00041217 | 24.433 | 54.651 | 26.8 | | ABU DHABI INTL | | | 41217.0 | AE | United Arab Emirates | |
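The `Unnamed: 0` column in `stationsDF` is a leftover row index: `DataFrame.to_csv` writes the index as an unlabeled first column by default, and `read_csv` names an unlabeled header `Unnamed: 0`. A minimal sketch of the round trip, using an in-memory string in place of the real file and a two-row stand-in for the stations data, shows two ways to avoid it: pass `index=False` when writing, or `index_col=0` when reading a file that already contains the extra column.

```python
import io

import pandas as pd

stations = pd.DataFrame({'StationID': ['ACW00011604', 'ACW00011647'],
                         'Elevation': [10.1, 19.2]})

# Default to_csv writes the RangeIndex as an unnamed first column
with_index = stations.to_csv(sep='\t')
print(pd.read_csv(io.StringIO(with_index), sep='\t').columns.tolist())
# ['Unnamed: 0', 'StationID', 'Elevation']

# Option 1: don't write the index at all
no_index = stations.to_csv(sep='\t', index=False)
print(pd.read_csv(io.StringIO(no_index), sep='\t').columns.tolist())
# ['StationID', 'Elevation']

# Option 2: the file already has the index column, so read it back as the index
restored = pd.read_csv(io.StringIO(with_index), sep='\t', index_col=0)
print(restored.columns.tolist())
# ['StationID', 'Elevation']
```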


In [10]:
# drop Latitude and Longitude, which stationsDF already has, so the merge
# doesn't produce duplicate _x/_y columns
inventoryDF = inventoryDF.filter(['StationID', 'Element', 'FirstYear', 'LastYear'])
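Dropping `Latitude` and `Longitude` matters because `pd.merge` keeps both copies of any non-key column that appears in both frames, disambiguating them with the default suffixes `_x` and `_y`. A small sketch with one-row stand-in frames (values taken from the station shown earlier):

```python
import pandas as pd

stations = pd.DataFrame({'StationID': ['USW00014942'],
                         'Latitude': [41.3103], 'Longitude': [-95.8992]})
inventory = pd.DataFrame({'StationID': ['USW00014942'],
                          'Latitude': [41.3103], 'Longitude': [-95.8992],
                          'Element': ['TMAX']})

# Overlapping non-key columns are duplicated with _x / _y suffixes
clashing = pd.merge(stations, inventory, on='StationID', how='left')
print(clashing.columns.tolist())
# ['StationID', 'Latitude_x', 'Longitude_x', 'Latitude_y', 'Longitude_y', 'Element']

# Keeping only the inventory-specific columns avoids the duplicates
trimmed = inventory.filter(['StationID', 'Element'])
clean = pd.merge(stations, trimmed, on='StationID', how='left')
print(clean.columns.tolist())
# ['StationID', 'Latitude', 'Longitude', 'Element']
```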

In [18]:
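# left-join the inventory onto the stations: one row per station/element pair;
# stations with no inventory rows keep NaN in Element, FirstYear and LastYear
stations_inventory_DF = pd.merge(stationsDF, inventoryDF, on='StationID', how='left')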

stations_inventory_DF.head()

| | Unnamed: 0 | StationID | Latitude | Longitude | Elevation | State | StationName | GSN_Flag | HCN_CRN_Flag | WMO_ID | CountryCode | CountryName | StateName | Element | FirstYear | LastYear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | ACW00011604 | 17.1167 | -61.7833 | 10.1 | | ST JOHNS COOLIDGE FLD | | | | AC | Antigua and Barbuda | | TMAX | 1949.0 | 1949.0 |
| 1 | 0 | ACW00011604 | 17.1167 | -61.7833 | 10.1 | | ST JOHNS COOLIDGE FLD | | | | AC | Antigua and Barbuda | | TMIN | 1949.0 | 1949.0 |
| 2 | 0 | ACW00011604 | 17.1167 | -61.7833 | 10.1 | | ST JOHNS COOLIDGE FLD | | | | AC | Antigua and Barbuda | | PRCP | 1949.0 | 1949.0 |
| 3 | 0 | ACW00011604 | 17.1167 | -61.7833 | 10.1 | | ST JOHNS COOLIDGE FLD | | | | AC | Antigua and Barbuda | | SNOW | 1949.0 | 1949.0 |
| 4 | 0 | ACW00011604 | 17.1167 | -61.7833 | 10.1 | | ST JOHNS COOLIDGE FLD | | | | AC | Antigua and Barbuda | | SNWD | 1949.0 | 1949.0 |
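`FirstYear` and `LastYear` come back as floats (`1949.0`) after the merge. That is expected with `how='left'`: any station with no inventory rows gets `NaN` in the inventory columns, and a plain NumPy integer column cannot hold `NaN`, so pandas upcasts it to float. A sketch using a hypothetical station ID (`XXX00000000`) that has no inventory shows `indicator=True` flagging unmatched stations, and a cast to pandas' nullable `Int64` dtype restoring integer years:

```python
import pandas as pd

# 'XXX00000000' is a hypothetical station with no inventory records
stations = pd.DataFrame({'StationID': ['ACW00011604', 'XXX00000000']})
inventory = pd.DataFrame({'StationID': ['ACW00011604', 'ACW00011604'],
                          'Element': ['TMAX', 'TMIN'],
                          'FirstYear': [1949, 1949],
                          'LastYear': [1949, 1949]})

merged = pd.merge(stations, inventory, on='StationID', how='left', indicator=True)
print(merged['FirstYear'].dtype)  # float64, because the unmatched row holds NaN

# Stations with no inventory records at all
print(merged.loc[merged['_merge'] == 'left_only', 'StationID'].tolist())  # ['XXX00000000']

# Nullable integer dtype keeps whole-number years and shows missing values as <NA>
merged[['FirstYear', 'LastYear']] = merged[['FirstYear', 'LastYear']].astype('Int64')
print(merged.drop(columns='_merge'))
```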


In [12]:
# write stations_inventory_DF to a tab-separated .csv file in the local directory,
# without writing the row index as an extra column
stations_inventory_DF.to_csv('stations_inventory_DF.csv', sep='\t', index=False)