# Get Census Tract Information
This Jupyter Notebook explores how to extract Census information for latitude-longitude and state pairings from the Red Cross Disaster Cases file. This notebook translates the method used in get_geocodes.R to Python (3.6).

The notebook reverse geocodes latitude-longitude coordinates to find the census tract for each case.
The overall approach finds the tract polygon that contains each coordinate and extracts census tract and block information from the state tract shapefiles. Coordinates that could not be matched to a tract in a state shapefile were reverse geocoded with the Phase 1 API technique.
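The core operation, finding which tract polygon contains a coordinate, is a point-in-polygon test. The notebook relies on the state shapefiles for this; the sketch below is a dependency-free illustration using the even-odd (ray casting) rule, assuming each tract is a single ring of `(lon, lat)` vertices with no holes. Real TIGER/Line geometries (multipolygons, holes) are better handled by a GIS library. The tract IDs and square geometries are placeholders, and `point_in_polygon` / `find_tract` are hypothetical helper names.

```python
# Minimal point-in-polygon sketch using the even-odd (ray casting) rule.
# Assumption: each polygon is a simple ring of (lon, lat) vertices with no holes.

def point_in_polygon(lon, lat, ring):
    """Return True if (lon, lat) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through `lat` can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray cast from the point toward +lon
            if lon < x_cross:
                inside = not inside
    return inside


def find_tract(lon, lat, tracts):
    """Return the ID of the first tract whose ring contains the point, else None."""
    for geoid, ring in tracts.items():
        if point_in_polygon(lon, lat, ring):
            return geoid
    return None


# Placeholder tracts: two unit squares side by side (not real Census geometry)
tracts = {
    "01001020100": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "01001020200": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(find_tract(0.5, 0.5, tracts))  # 01001020100
print(find_tract(1.5, 0.5, tracts))  # 01001020200
print(find_tract(3.0, 0.5, tracts))  # None
```

Note that shapefile coordinates are ordered (x, y), i.e. longitude first, so lat-long pairs from the case file must be swapped before testing.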

### Procedure:
1. Download the state tract shapefiles from Census.gov and store them in a folder
2. Intersect each lat-long coordinate with the appropriate state shapefile to find its Census tract polygon
3. Extract Census information from the matching tract
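For step 3, much of the tract identity is encoded in the tract's 11-character GEOID: a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code whose last two digits follow an implied decimal point. The sketch below splits a GEOID into those parts; `split_tract_geoid` is a hypothetical helper, not code from the notebook, and the example GEOIDs are illustrative.

```python
# Sketch: split an 11-character 2010 census tract GEOID into its parts.
# GEOID = state FIPS (2 digits) + county FIPS (3) + tract code (6).

def split_tract_geoid(geoid):
    # Restore leading zeros that are lost when the GEOID is parsed as a number
    geoid = str(geoid).zfill(11)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("expected an 11-digit tract GEOID, got %r" % geoid)
    tract = geoid[5:]
    # Tract codes carry an implied decimal point before the last two digits
    base, suffix = int(tract[:4]), tract[4:]
    tract_name = str(base) if suffix == "00" else "%d.%s" % (base, suffix)
    return {"state": geoid[:2], "county": geoid[2:5],
            "tract": tract, "tract_name": tract_name}


print(split_tract_geoid("06037207400"))
# {'state': '06', 'county': '037', 'tract': '207400', 'tract_name': '2074'}
```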

### Inputs:
- TIGER/Line 2010 state census tract shapefiles (via FTP): ftp://ftp2.census.gov/geo/tiger/TIGER2010/TRACT/2010/ (downloaded by the code below)
- State FIPS codes, state_FIPs_codes.txt: https://www.census.gov/geo/reference/ansi_statetables.html (under the "National FIPS and GNIS Codes File" tab; downloaded manually and saved as a TXT file)
- 2009-2014_RedCross_DisasterCases.csv: Downloaded from Phase 1 data folder in the DKDC RC Google Drive
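The FIPS file is pipe-delimited, which is why it is read with `sep="|"` below. The sketch parses an abbreviated sample in that layout into a postal-abbreviation-to-FIPS mapping using only the standard library; the sample rows are illustrative, so check them against the downloaded file. Reading with `csv` keeps codes such as `01` as strings, whereas `pd.read_csv` parses the `STATE` column as integers and drops the leading zero, which is why the URL-building code zero-pads the code again.

```python
import csv
import io

# Abbreviated sample in the pipe-delimited layout of state_FIPs_codes.txt (illustrative)
sample = """STATE|STUSAB|STATE_NAME|STATENS
01|AL|Alabama|01779775
06|CA|California|01779778
"""


def load_state_fips(text):
    """Map two-letter postal abbreviation -> two-digit FIPS code (as a string)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return {row["STUSAB"]: row["STATE"] for row in reader}


fips = load_state_fips(sample)
print(fips)  # {'AL': '01', 'CA': '06'}
```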

### Outputs:
- 2009_2014_RedCross_DisasterCases_with_census_data.csv: The original case data (2009-2014_RedCross_DisasterCases.csv) with census data as additional columns.

In [73]:
# Import modules
import os
import pandas as pd
#import urllib.request  # Python 3 replacement for Python 2's urllib/urllib2 (file downloads)

# Set directories
root_folder = os.path.abspath("../")
data_folder = os.path.abspath("../../") + '/data'
output_folder = root_folder + '/phase2/output'

In [26]:
# Load Red Cross data into dataframe
# Skip malformed rows (pandas < 1.3 used error_bad_lines=False instead of on_bad_lines)
redcross_disaster_cases = pd.read_csv(data_folder + '/2009-2014_RedCross_DisasterCases.csv',
                                      encoding="ISO-8859-1",
                                      on_bad_lines="skip")

# Load State FIPs codes
state_fips = pd.read_csv(data_folder + '/state_FIPs_codes.txt', sep="|")

# Make list of unique elements in 'esri_state' column
dataset_state_list = list(redcross_disaster_cases['esri_state'].unique())


# Download Shapefiles
Download the state tract shapefiles from Census.gov. First, a URL for each state is built from the state's FIPS code. Next, the shapefiles are downloaded from Census.gov and stored in the folder 'shapefiles_tract'.
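The URL-building step can be sketched as a pair of small functions, which makes the zero-padding easy to test in isolation. The function names are hypothetical, not part of the notebook, and the lookup is assumed to be a plain dict from postal abbreviation to FIPS code (integer or string); abbreviations missing from the lookup are skipped, as in the notebook loop.

```python
BASE_URL = "ftp://ftp2.census.gov/geo/tiger/TIGER2010/TRACT/2010/"


def tract_shapefile_name(state_fips):
    """TIGER/Line 2010 tract archive name for a state FIPS code, zero-padded to 2 digits."""
    return "tl_2010_%02d_tract10.zip" % int(state_fips)


def build_tract_urls(state_abbrevs, abbrev_to_fips):
    """Return (archive names, URLs) for abbreviations found in the lookup."""
    names, urls = [], []
    for abbrev in state_abbrevs:
        if abbrev in abbrev_to_fips:
            name = tract_shapefile_name(abbrev_to_fips[abbrev])
            names.append(name)
            urls.append(BASE_URL + name)
    return names, urls


names, urls = build_tract_urls(["CA", "AL", "XX"], {"AL": 1, "CA": "06"})
print(names)  # ['tl_2010_06_tract10.zip', 'tl_2010_01_tract10.zip']
```

A dict lookup also avoids rescanning the whole `STUSAB` column for every state.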

In [65]:
# Build URLs to FTP shapefiles from Census.gov
base_url = "ftp://ftp2.census.gov/geo/tiger/TIGER2010/TRACT/2010/"
sf_names = []
url = []
known_states = set(state_fips['STUSAB'])
for abbrev in dataset_state_list:
    if abbrev in known_states:
        # Look up the numeric state FIPS code as a scalar (not a one-element Series)
        fips_temp = int(state_fips.loc[state_fips['STUSAB'] == abbrev, 'STATE'].iloc[0])
        # Zero-pad to two digits to match the TIGER/Line file naming
        sf_names.append("tl_2010_%02d_tract10.zip" % fips_temp)
        url.append(base_url + sf_names[-1])

In [70]:
# Check to see if shapefiles folder exists in the data folder, create if missing
if not os.path.exists(data_folder + '/shapefiles_tract'):
    os.makedirs(data_folder + '/shapefiles_tract')

In [None]:
"""
Shapefiles are stored in the RCP2 Google Drive folder. To download
again from the original source see below:

Download shapefiles for each state. There is some hang time/ timeouts
occassionally. This occurs when loading in the browser also timeouts,
so this is a FTP/ web hosting problem and not a code problem. Just
restart loop every time it crashes...which happens often

TO DO: find more stable links on Census.gov- these often time out
"""

# Original R download loop (not yet translated to Python)
#for (i in 1:length(url)){
#    if (sf_names[i] %!in% dir(paste(data_folder,'/shapefiles_tract',sep=''))){
#      download.file(url[i],paste(data_folder,'/shapefiles_tract/',sf_names[i],sep = ''),mode = "wb")
#      unzip(paste(data_folder,'/shapefiles_tract/',sf_names[i],sep=''),
#            exdir = paste(data_folder,'/shapefiles_tract',sep=''))
#    }
#}
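A possible Python translation of the R loop above, assuming the `url` and `sf_names` lists built earlier. It uses only the standard library (`urllib.request`, `zipfile`) and adds a per-file timeout and retry, one way to cope with the FTP timeouts described in the docstring instead of restarting the loop by hand. The function name `download_and_unzip` and its `retries`, `timeout`, and `wait` parameters are hypothetical, not part of the original workflow.

```python
import os
import time
import urllib.request
import zipfile


def download_and_unzip(urls, names, dest_dir, retries=3, timeout=60, wait=5):
    """Download each archive not already in dest_dir, then extract it there.

    Each file is retried up to `retries` times to ride out FTP timeouts.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for file_url, name in zip(urls, names):
        zip_path = os.path.join(dest_dir, name)
        if os.path.exists(zip_path):
            continue  # already downloaded on an earlier run, as in the R loop
        for attempt in range(1, retries + 1):
            try:
                with urllib.request.urlopen(file_url, timeout=timeout) as resp:
                    data = resp.read()
                break
            except OSError:  # URLError and socket timeouts are OSError subclasses
                if attempt == retries:
                    raise
                time.sleep(wait)
        # Write only after a complete read, so a failed transfer leaves no
        # truncated archive that a rerun would mistakenly skip
        with open(zip_path, "wb") as fh:
            fh.write(data)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest_dir)
```

In the notebook it could be called as `download_and_unzip(url, sf_names, data_folder + '/shapefiles_tract')`.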