# Enhancing the Data with Census FIPS and Ecosystem Data
### Purpose
This notebook adds columns to the working data set that contain 1) the Census block FIPS and county FIPS codes and 2) the ecosystem that the USGS assigns to each CBC location.



### Author: 
Ren C'deBaca
### Date: 
2020-04-21
### Update Date: 
2020-04-21

### Inputs 
1.0-rec-initial-data-cleaning.txt - Tab-separated file of cleaned Christmas Bird Count events. Each row represents a single count in a given year. The data dictionary can be found here: http://www.audubon.org/sites/default/files/documents/cbc_report_field_definitions_2013.pdf

np-circles-to-ecosys_data.csv - Comma-separated file from Nathan Pavlovic (nathan.pavlovic@gmail.com). This file was produced by first passing Nathan a file of the approximately 4000 unique lat/lon pairs present in the clean data file.

Nathan then used the 2008 USGS raster ecosystem dataset. Info here: https://rmgsc.cr.usgs.gov/outgoing/ecosystems/USdata/

He used the Extract Values to Points tool in ArcGIS to find the raster value at each point. 



### Output Files
1.3-rec-connecting-fips-data.csv -- CSV file of the unique lat/lon pairs present in the CBC data. Each lat/lon pair is matched to a block FIPS and a county FIPS. (This is the file that was shared with Nathan.)

1.3-rec-connecting-fips-ecosystem-data.csv -- CSV file of the cleaned CBC data with added columns for FIPS and ecosystem data


## Steps or Procedures in the notebook 
1. Load the cleaned data 
2. Identify the unique lat/lon pairs present in the CBC circle locations 
3. Get the block and county FIPS for each unique lat/lon pair:
    OPTION 1: Load the saved census FIPS data
    OPTION 2: Send the unique lat/lon pairs through a census API to find the block and county FIPS 
4. Create a key from lat and lon, then merge the census data with the cleaned data 
5. Load the ecological data from Nathan
6. Create the same key on the ecological data, then merge it with the cleaned data 


## Where the Data will Be Saved 
The raw ecosystem data and the output data will be saved in the Google Drive Folder
https://drive.google.com/drive/folders/1Nlj9Nq-_dPFTDbrSDf94XMritWYG6E2I

The path should look like this: 
audubon-cbc/data/Cloud_Data/<DATA FILE>

## Reference
    https://geo.fcc.gov/api/census/#!/block/get_block_find


In [144]:
# Imports
import os
from datetime import datetime
# Version .24.0
from google.cloud import bigquery
import pandas as pd
import requests
import time
import numpy as np

In [145]:
# ALL File Paths should be declared at the TOP of the notebook
PATH_TO_CLEAN_CBC_DATA = "../data/Cloud_Data/1.0-rec-initial-data-cleaning.txt"
PATH_TO_ECO_DATA = "../data/np-circles-to-ecosys_data.csv"

In [146]:
clean_data = pd.read_csv(PATH_TO_CLEAN_CBC_DATA, encoding = "ISO-8859-1", sep="\t")


In [147]:
clean_data.shape

(89568, 48)

In [148]:
clean_data.head()

Unnamed: 0.1,Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,...,max_snow_metric,max_snow_imperial,min_temp_imperial,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial
0,2,Pacific Grove,US-CA,36.6167,-121.9167,1901,12/25/00,1.0,,,...,,,,,,,,,,
1,3,Pueblo,US-CO,38.175251,-104.519575,1901,12/25/00,1.0,,,...,,,,,,,,,,
2,4,Bristol,US-CT,41.6718,-72.9495,1901,12/25/00,2.0,,,...,,,,,,,,,,
3,5,Norwalk,US-CT,41.1167,-73.4,1901,12/25/00,1.0,,,...,,,,,,,,,,
4,6,Glen Ellyn,US-IL,41.8833,-88.0667,1901,12/25/00,1.0,,,...,,,,,,,,,,


In [149]:
clean_data['temp_key_str'] = clean_data['lat'].astype(str) + clean_data['lon'].astype(str)

In [150]:
clean_data['temp_key_str'].nunique()

4531
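Concatenating the two coordinate strings with no separator gives unique keys for this data, but in general two different coordinate pairs can produce the same string, especially where the longitude is positive and carries no minus sign. A minimal sketch of a more defensive key follows; the helper name `make_latlon_key`, the underscore separator, and the six-decimal precision are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd


def make_latlon_key(df, lat_col="lat", lon_col="lon", decimals=6):
    """Hypothetical helper: build a 'lat_lon' string key at fixed precision."""
    lat = df[lat_col].round(decimals).map(lambda v: f"{v:.{decimals}f}")
    lon = df[lon_col].round(decimals).map(lambda v: f"{v:.{decimals}f}")
    return lat + "_" + lon


# Illustrative points only: without a separator both rows give "1.234.5".
demo = pd.DataFrame({"lat": [1.2, 1.23], "lon": [34.5, 4.5]})
naive_key = demo["lat"].astype(str) + demo["lon"].astype(str)
safe_key = make_latlon_key(demo)

print(naive_key.nunique())  # the naive keys collide
print(safe_key.nunique())   # the separated keys stay distinct
```

If a key like this were adopted, the same helper would have to be applied to every table that is merged on it.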

## Census Data 
There are two options here:
OPTION 1: Load in the saved census FIPS data
OPTION 2: Send the unique lat/lon pairs through a census API to find the block and county FIPS 

## Option 1: Load in the saved census FIPS data 

In [None]:
## Option: Uncomment the next section to load data from file
# smol = pd.read_csv("1.3-rec-connecting-fips-data.csv")
# smol = smol[["lat", "lon", "block_fips", "county_fips"]]
# smol['temp_key_str'] = smol['lat'].astype(str) + smol['lon'].astype(str)
# print(clean_data.shape)
# smol.head()
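FIPS codes are identifiers rather than quantities, and codes for states numbered below 10 start with a zero (the test API call in this notebook returns county `02016`). With default type inference pandas reads such columns as numbers, which is presumably why the merged output later in this notebook shows values like `6053.0`. A sketch of reading the columns as strings follows, using an in-memory stand-in for the saved file; the single row comes from the test API call, and the column layout is an assumption.

```python
import io

import pandas as pd

# Stand-in for 1.3-rec-connecting-fips-data.csv (layout assumed for illustration).
saved_csv = io.StringIO(
    "lat,lon,block_fips,county_fips\n"
    "51.409713,179.284881,020160001001519,02016\n"
)

# Force both FIPS columns to str so leading zeros survive the round trip.
fips = pd.read_csv(saved_csv, dtype={"block_fips": str, "county_fips": str})

print(fips["county_fips"].iloc[0])
print(fips.dtypes)
```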

## Option 2: Run the data through the census API (Note: Takes a few hours) 

In [151]:
# Create a small dataframe of lat/lon locations to use with census data (deduplicated below)
# .copy() lets new columns be added later without a SettingWithCopyWarning
smol = clean_data[['temp_key_str', 'lat', 'lon']].copy()

In [152]:
smol.shape

(89568, 3)

In [57]:
# Drop duplicate rows 
smol = smol.drop_duplicates(subset=['lat', 'lon'], keep= 'first') 

In [58]:
smol.shape

(4531, 3)

### Create a test call to the API to see how the data comes back 

In [59]:
# Test Lat and Lon
lat = 51.409713
lon = 179.284881

BASE_URL = "https://geo.fcc.gov/api/census/block/find?format=json&latitude=%s&longitude=%s"
url = BASE_URL % (lat, lon)

payload = {}
headers= {}

response = requests.request("GET", url, headers=headers, data = payload)

print(response.text.encode('utf8'))

b'{"Block":{"FIPS":"020160001001519","bbox":[178.616868,51.348718,179.467581,51.661935]},"County":{"FIPS":"02016","name":"Aleutians West"},"State":{"FIPS":"02","code":"AK","name":"Alaska"},"status":"OK","executionTime":"0"}'
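The response nests the codes under `Block` and `County`. A small sketch of pulling both out of a parsed response follows, using the JSON printed above as input. `extract_fips` is a hypothetical helper name, and returning `None` for missing fields is a design assumption rather than something the notebook does.

```python
import json

# The response body printed by the test call above.
SAMPLE_RESPONSE = (
    '{"Block":{"FIPS":"020160001001519",'
    '"bbox":[178.616868,51.348718,179.467581,51.661935]},'
    '"County":{"FIPS":"02016","name":"Aleutians West"},'
    '"State":{"FIPS":"02","code":"AK","name":"Alaska"},'
    '"status":"OK","executionTime":"0"}'
)


def extract_fips(payload):
    """Hypothetical helper: return (block_fips, county_fips), None where absent."""
    block = (payload.get("Block") or {}).get("FIPS")
    county = (payload.get("County") or {}).get("FIPS")
    return block, county


block_fips, county_fips = extract_fips(json.loads(SAMPLE_RESPONSE))
print(block_fips, county_fips)
# An empty payload yields (None, None) instead of raising.
print(extract_fips({}))
```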


### Loop over the locations and collect the block and county FIPS codes from the census API

In [None]:
result_list = []
county_result_list = []

BASE_URL = "https://geo.fcc.gov/api/census/block/find?format=json&latitude=%s&longitude=%s"

# Seconds to wait between requests
TIME_DELAY = 2

for index, row in smol.iterrows():
    # Default to empty strings so a failed lookup still appends a value
    block_fips = ''
    county_fips = ''

    lat = row['lat']
    lon = row['lon']

    url = BASE_URL % (lat, lon)

    try:
        response = requests.get(url)
        # Parse the JSON body once and pull out both codes
        result = response.json()
        block_fips = result['Block']['FIPS']
        county_fips = result['County']['FIPS']
    except (requests.RequestException, ValueError, KeyError, TypeError):
        print("Could not get FIPS for", lat, lon)

    result_list.append(block_fips)
    county_result_list.append(county_fips)

    time.sleep(TIME_DELAY)
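With 4,531 unique locations and a two-second pause per request, the loop above needs at least 4531 × 2 s ≈ 2.5 hours. The sketch below shows the same loop with the HTTP call passed in as a function, so the failure handling can be exercised without the network. `build_url`, `lookup_all`, the `fetch` parameter, and `fake_fetch` are hypothetical names that do not appear in the original notebook.

```python
import time
from urllib.parse import urlencode

API_ENDPOINT = "https://geo.fcc.gov/api/census/block/find"


def build_url(lat, lon):
    """Build the lookup URL with encoded query parameters."""
    query = urlencode({"format": "json", "latitude": lat, "longitude": lon})
    return f"{API_ENDPOINT}?{query}"


def lookup_all(points, fetch, delay=2.0):
    """Call fetch(url) -> dict for each (lat, lon); record None on failure."""
    rows = []
    for lat, lon in points:
        block_fips = county_fips = None
        try:
            payload = fetch(build_url(lat, lon))
            block_fips = payload["Block"]["FIPS"]
            county_fips = payload["County"]["FIPS"]
        except Exception as err:  # broad on purpose: fetch is caller-supplied
            print(f"Could not get FIPS for ({lat}, {lon}): {err}")
        rows.append((lat, lon, block_fips, county_fips))
        time.sleep(delay)
    return rows


# Stand-in fetcher for illustration: succeeds for one point, fails otherwise.
def fake_fetch(url):
    if "latitude=51.409713" in url:
        return {"Block": {"FIPS": "020160001001519"}, "County": {"FIPS": "02016"}}
    raise ValueError("simulated failure")


results = lookup_all([(51.409713, 179.284881), (0.0, 0.0)], fake_fetch, delay=0)
print(results)
```

In the notebook itself, `fetch` could be something like `lambda url: requests.get(url, timeout=30).json()`; the 30-second timeout is an assumed value.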


In [None]:
print(len(result_list))
print(len(county_result_list))


In [None]:
# Turn the result lists into pandas Series
result_arry = pd.Series(result_list)
county_array = pd.Series(county_result_list)

In [None]:
# Add the series into the data frame 
smol['block_fips'] = result_arry.values
smol['county_fips'] = county_array.values

In [None]:
smol.head()

### Choose to save the file, or load the data from file

In [None]:
## Save the data to a file 
#smol.to_csv('1.3-rec-connecting-fips-data.csv')

### Merge in the FIPS data with the clean data

In [124]:
# Merge in the FIPS data with the clean data
clean_data = pd.merge(clean_data, smol[["temp_key_str", "block_fips", "county_fips"]], how="left", on="temp_key_str")



In [125]:
print(clean_data.shape)
clean_data.head()

(89568, 51)


Unnamed: 0.1,Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,...,max_temp_imperial,min_temp_metric,max_temp_metric,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,temp_key_str,block_fips,county_fips
0,2,Pacific Grove,US-CA,36.6167,-121.9167,1901,12/25/00,1.0,,,...,,,,,,,,36.6167-121.9167,60530120000000.0,6053.0
1,3,Pueblo,US-CO,38.175251,-104.519575,1901,12/25/00,1.0,,,...,,,,,,,,38.175251-104.519575,81010030000000.0,8101.0
2,4,Bristol,US-CT,41.6718,-72.9495,1901,12/25/00,2.0,,,...,,,,,,,,41.6718-72.9495,90034060000000.0,9003.0
3,5,Norwalk,US-CT,41.1167,-73.4,1901,12/25/00,1.0,,,...,,,,,,,,41.1167-73.4,90010440000000.0,9001.0
4,6,Glen Ellyn,US-IL,41.8833,-88.0667,1901,12/25/00,1.0,,,...,,,,,,,,41.8833-88.0667,170438400000000.0,17043.0
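A left merge leaves the row count unchanged only if the lookup table has exactly one row per key; duplicate keys on the right would silently multiply rows. A sketch of guarding against that with the `validate` argument of `pd.merge` follows, using a few made-up rows in place of `clean_data` and `smol`.

```python
import pandas as pd

# Illustrative stand-ins for clean_data and smol (values are made up).
counts = pd.DataFrame({
    "temp_key_str": ["a", "a", "b"],
    "count_year": [1901, 1902, 1901],
})
lookup = pd.DataFrame({
    "temp_key_str": ["a", "b"],
    "county_fips": ["06053", "08101"],
})

# validate="many_to_one" raises MergeError if lookup has duplicate keys.
merged = pd.merge(counts, lookup, how="left", on="temp_key_str",
                  validate="many_to_one")
assert len(merged) == len(counts)

# A lookup with a repeated key is rejected instead of duplicating rows.
bad_lookup = pd.concat([lookup, lookup.iloc[[0]]], ignore_index=True)
try:
    pd.merge(counts, bad_lookup, how="left", on="temp_key_str",
             validate="many_to_one")
    rejected = False
except pd.errors.MergeError:
    rejected = True
print(rejected)
```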


# Add Ecosystem Data to the Working Dataset

### Notes
The file 1.3-rec-connecting-fips-data.csv is the file I passed to Nathan for ecosystem processing. He then returned a dataset with the ecosystem data added as columns. The next section adds that ecosystem data to the working dataset.

## Load in Ecosystem data 

In [126]:
eco_data = pd.read_csv(PATH_TO_ECO_DATA)

In [127]:
eco_data.shape

(4531, 15)

In [128]:
# Take the Columns we Need
eco_data = eco_data[["lat","lon","Ecosys", "Usgsid_sys", "Nlcd_code", "Nlcd"]]

In [129]:
eco_data.head()

Unnamed: 0,lat,lon,Ecosys,Usgsid_sys,Nlcd_code,Nlcd
0,36.6167,-121.9167,66.0,66_California Coastal Live Oak Woodland and Sa...,3.0,Steppe/Savanna
1,38.175251,-104.519575,274.0,274_Western Great Plains Shortgrass Prairie,4.0,Herbaceous
2,41.6718,-72.9495,300.0,300_Appalachian (Hemlock)-Northern Hardwood Fo...,1.0,Forest and Woodland
3,41.1167,-73.4,487.0,487_Northern Atlantic Coastal Plain Pitch Pine...,1.0,Forest and Woodland
4,41.8833,-88.0667,254.0,254_North-Central Interior Beech-Maple Forest,1.0,Forest and Woodland


In [130]:
# Create a temporary key to merge on
eco_data['temp_key_str'] = eco_data['lat'].astype(str) + eco_data['lon'].astype(str)


In [131]:
eco_data.head()

Unnamed: 0,lat,lon,Ecosys,Usgsid_sys,Nlcd_code,Nlcd,temp_key_str
0,36.6167,-121.9167,66.0,66_California Coastal Live Oak Woodland and Sa...,3.0,Steppe/Savanna,36.6167-121.9167
1,38.175251,-104.519575,274.0,274_Western Great Plains Shortgrass Prairie,4.0,Herbaceous,38.175251-104.519575
2,41.6718,-72.9495,300.0,300_Appalachian (Hemlock)-Northern Hardwood Fo...,1.0,Forest and Woodland,41.6718-72.9495
3,41.1167,-73.4,487.0,487_Northern Atlantic Coastal Plain Pitch Pine...,1.0,Forest and Woodland,41.1167-73.4
4,41.8833,-88.0667,254.0,254_North-Central Interior Beech-Maple Forest,1.0,Forest and Woodland,41.8833-88.0667


In [139]:
clean_data = pd.merge(clean_data, eco_data[["temp_key_str", "Ecosys", "Usgsid_sys", "Nlcd_code", "Nlcd"]], how="left", on="temp_key_str")


In [140]:
clean_data.shape

(89568, 55)

In [141]:
# Drop the key 
clean_data = clean_data.drop("temp_key_str",axis=1)

In [142]:
clean_data

Unnamed: 0.1,Unnamed: 0,circle_name,country_state,lat,lon,count_year,count_date,n_field_counters,n_feeder_counters,min_field_parties,...,min_wind_metric,max_wind_metric,min_wind_imperial,max_wind_imperial,block_fips,county_fips,Ecosys,Usgsid_sys,Nlcd_code,Nlcd
0,2,Pacific Grove,US-CA,36.616700,-121.916700,1901,12/25/00,1.0,,,...,,,,,6.053012e+13,6053.0,66.0,66_California Coastal Live Oak Woodland and Sa...,3.0,Steppe/Savanna
1,3,Pueblo,US-CO,38.175251,-104.519575,1901,12/25/00,1.0,,,...,,,,,8.101003e+13,8101.0,274.0,274_Western Great Plains Shortgrass Prairie,4.0,Herbaceous
2,4,Bristol,US-CT,41.671800,-72.949500,1901,12/25/00,2.0,,,...,,,,,9.003406e+13,9003.0,300.0,300_Appalachian (Hemlock)-Northern Hardwood Fo...,1.0,Forest and Woodland
3,5,Norwalk,US-CT,41.116700,-73.400000,1901,12/25/00,1.0,,,...,,,,,9.001044e+13,9001.0,487.0,487_Northern Atlantic Coastal Plain Pitch Pine...,1.0,Forest and Woodland
4,6,Glen Ellyn,US-IL,41.883300,-88.066700,1901,12/25/00,1.0,,,...,,,,,1.704384e+14,17043.0,254.0,254_North-Central Interior Beech-Maple Forest,1.0,Forest and Woodland
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89563,106920,Pinedale,US-WY,42.866698,-109.860301,2018,12/31/17,24.0,5.0,2.0,...,0.000000,16.092694,0.0,10.0,5.603500e+14,56035.0,191.0,191_Inter-Mountain Basins Big Sagebrush Shrubland,2.0,Shrubland
89564,106921,Riverton,US-WY,43.024622,-108.380601,2018,12/28/17,14.0,1.0,4.0,...,0.000000,0.000000,0.0,0.0,,,,,,
89565,106922,Sheridan,US-WY,44.808634,-106.975791,2018,12/17/17,32.0,5.0,9.0,...,0.000000,8.046347,0.0,5.0,,,,,,
89566,106923,Story-Big Horn,US-WY,44.588955,-106.941551,2018,12/30/17,26.0,11.0,23.0,...,0.000000,24.139041,0.0,15.0,5.603300e+14,56033.0,168.0,168_Northern Rocky Mountain Mesic Montane Mixe...,1.0,Forest and Woodland
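The preview above includes rows (Riverton, Sheridan) where the FIPS and ecosystem columns are all empty. One way to tell whether a key failed to match, as opposed to matching a row that itself has empty values, is the `indicator` argument of `pd.merge`. The sketch below uses made-up frames; the real check would use `clean_data` and `eco_data`.

```python
import pandas as pd

# Illustrative stand-ins (keys and values are made up for the example).
left = pd.DataFrame({"temp_key_str": ["k1", "k2", "k3"],
                     "circle_name": ["A", "B", "C"]})
right = pd.DataFrame({"temp_key_str": ["k1", "k3"],
                      "Nlcd": ["Shrubland", None]})

checked = pd.merge(left, right, how="left", on="temp_key_str", indicator=True)

# "left_only" marks keys absent from the right table.
unmatched = checked.loc[checked["_merge"] == "left_only", "temp_key_str"].tolist()
# Rows that matched but carry no value are a different case.
matched_but_empty = checked.loc[
    (checked["_merge"] == "both") & checked["Nlcd"].isna(), "temp_key_str"
].tolist()

print(unmatched)
print(matched_but_empty)
```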


In [153]:
# Save the data
clean_data.to_csv('1.3-rec-connecting-fips-ecosystem-data.csv')