# Census geocoder

In one of our proposed use cases, we want to look up data on the census tracts that a list of points falls in.  These points are commonly given as addresses.  To get demographic information for the tracts containing them, we first need to identify those tracts, so we will use geocoding to convert addresses into tract identifiers.

Here we will try out the [python-omgeo](https://github.com/azavea/python-omgeo/) module.

In [6]:
from omgeo import Geocoder

g = Geocoder([['omgeo.services.USCensus',{}]])
result = g.geocode('1600 Pennsylvania Ave NW, Washington DC')

In [7]:
print(result)

{'upstream_response_info': [<USCensus 559ms>], 'candidates': [<1600 PENNSYLVANIA AVE NW, WASHINGTON, DC, 20502 (-77.03535, 38.898754) USCensus>]}


It looks like the omgeo module returns coordinates, but not the tract number we need.

Another possible source is the [Census Geocoding Services](http://www.census.gov/data/developers/data-sets/Geocoding-services.html).  The API documentation is [here](http://geocoding.geo.census.gov/geocoder/Geocoding_Services_API.pdf).

In [48]:
import requests, json

street = "1600 NW Pennsylvania Ave"
city = "Washington DC"
state = "DC"

payload = {'street': street, 'city': city, 'state': state, 'benchmark': 'Public_AR_Current', 'vintage':'ACS2013_Current', 'format':'json'}

base_url = "http://geocoding.geo.census.gov/geocoder/geographies/address"

r = requests.get(base_url, params=payload)
json_data = json.loads(r.text)

print(r.url)
print(r.text)

http://geocoding.geo.census.gov/geocoder/geographies/address?city=Washington+DC&format=json&vintage=ACS2013_Current&benchmark=Public_AR_Current&state=DC&street=1600+NW+Pennsylvania+Ave
{"result":{"input":{"address":{"street":"1600 NW Pennsylvania Ave","city":"Washington DC","state":"DC"},"benchmark":{"id":"4","benchmarkName":"Public_AR_Current","benchmarkDescription":"Public Address Ranges - Current Benchmark","isDefault":false},"vintage":{"id":"413","vintageName":"ACS2013_Current","vintageDescription":"ACS2013 Vintage - Current Benchmark","isDefault":false}},"addressMatches":[{"matchedAddress":"1600 PENNSYLVANIA AVE NW, WASHINGTON, DC, 20502","coordinates":{"x":-77.03535,"y":38.898754},"tigerLine":{"tigerLineId":"76225813","side":"L"},"addressComponents":{"fromAddress":"1600","toAddress":"1698","preQualifier":"","preDirection":"","preType":"","streetName":"PENNSYLVANIA","suffixType":"AVE","suffixDirection":"NW","suffixQualifier":"","city":"WASHINGTON","state":"DC","zip":"20502"},"geog

In [49]:
json_data['result']

{u'addressMatches': [{u'addressComponents': {u'city': u'WASHINGTON',
    u'fromAddress': u'1600',
    u'preDirection': u'',
    u'preQualifier': u'',
    u'preType': u'',
    u'state': u'DC',
    u'streetName': u'PENNSYLVANIA',
    u'suffixDirection': u'NW',
    u'suffixQualifier': u'',
    u'suffixType': u'AVE',
    u'toAddress': u'1698',
    u'zip': u'20502'},
   u'coordinates': {u'x': -77.03535, u'y': 38.898754},
   u'geographies': {u'Census Tracts': [{u'AREALAND': 6586010,
      u'AREAWATER': 4982522,
      u'BASENAME': u'62.02',
      u'CENTLAT': u'+38.8802246',
      u'CENTLON': u'-077.0353387',
      u'COUNTY': u'001',
      u'FUNCSTAT': u'S',
      u'GEOID': u'11001006202',
      u'INTPTLAT': u'+38.8809933',
      u'INTPTLON': u'-077.0363219',
      u'LSADC': u'CT',
      u'MTFCC': u'G5020',
      u'NAME': u'Census Tract 62.02',
      u'OBJECTID': 30454,
      u'OID': 20753331304119L,
      u'STATE': u'11',
      u'TRACT': u'006202'}],
    u'Counties': [{u'AREALAND': 158364924,

In [50]:
json_data['result']['addressMatches'][0]['geographies']['Census Tracts'][0]['GEOID']

u'11001006202'
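
The GEOID returned here is the concatenation of the `STATE`, `COUNTY`, and `TRACT` fields shown in the response above (`11` + `001` + `006202`). As a minimal sketch, assuming tract GEOIDs always follow this 2 + 3 + 6 digit layout, a hypothetical helper `split_tract_geoid` (not part of the original notebook) could break one apart:

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into state, county, and tract codes.

    Assumes the 2 + 3 + 6 digit layout seen in the response above.
    """
    geoid = str(geoid)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("expected an 11-digit tract GEOID, got %r" % geoid)
    return {
        "state": geoid[:2],    # state FIPS code
        "county": geoid[2:5],  # county code within the state
        "tract": geoid[5:],    # tract code within the county
    }


parts = split_tract_geoid("11001006202")
```

Keeping the codes as strings preserves the leading zeros, which matters when joining against other census tables.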

## Getting single-address FIPS with a function

Now that we have roughly figured out how to make a request for a single address, we will wrap that request in a function.

In [89]:
def get_fips_from_address(street, city, state):
    """Return the census tract GEOID (FIPS code) for one address as a string."""
    payload = {'street': street, 'city': city, 'state': state,
               'benchmark': 'Public_AR_Current', 'vintage': 'ACS2013_Current',
               'format': 'json'}

    base_url = "http://geocoding.geo.census.gov/geocoder/geographies/address"

    r = requests.get(base_url, params=payload)
    json_data = json.loads(r.text)

    # Take the first address match and the first tract listed for it;
    # indexing [0] raises IndexError if addressMatches is empty.
    fips = json_data['result']['addressMatches'][0]['geographies']['Census Tracts'][0]['GEOID']

    return str(fips)

In [90]:
print(get_fips_from_address('1600 Pennsylvania Ave NW', 'Washington DC', 'DC'))

11001006202
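
The function above assumes every address produces at least one match. Below is a minimal sketch of a more defensive lookup, assuming the response keeps the `result` / `addressMatches` / `geographies` / `Census Tracts` layout shown earlier and that an unmatched address comes back with an empty `addressMatches` list. `extract_tract_geoid` is a hypothetical helper name, and the sample dictionaries are trimmed stand-ins rather than real responses:

```python
def extract_tract_geoid(json_data):
    """Return the tract GEOID from a parsed geocoder response, or None.

    Assumes the response layout shown earlier in this notebook.
    """
    matches = json_data.get("result", {}).get("addressMatches", [])
    if not matches:
        return None  # no matching address in the response
    tracts = matches[0].get("geographies", {}).get("Census Tracts", [])
    if not tracts:
        return None  # matched, but no tract geography was included
    return str(tracts[0]["GEOID"])


# Trimmed stand-ins for a matched and an unmatched response.
matched = {"result": {"addressMatches": [
    {"geographies": {"Census Tracts": [{"GEOID": "11001006202"}]}}]}}
unmatched = {"result": {"addressMatches": []}}

found = extract_tract_geoid(matched)
missing = extract_tract_geoid(unmatched)
```

Returning `None` instead of raising lets the caller decide whether to skip, log, or retry an address that could not be geocoded.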


## Getting FIPS by batch processing

Geocoding one address at a time is very slow.  Ideally we would pass a whole batch of addresses at once.

According to the documentation, batch processing requires submitting a .CSV file with the following columns:

`Unique ID, Street address, City, State, ZIP`

Our existing Whole Foods addresses aren't quite in that format, so we will need to import them, create a new CSV with the proper column ordering, and then submit the batch request.
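
As a rough sketch of the reordering step only (submitting the file is not covered here), the batch file could be assembled with the standard `csv` module. This assumes Python 3 and input rows carrying `street`, `city`, `state`, and optionally `zip` fields; the `zip` field, the use of the row position as the unique ID, and the helper name `build_batch_csv` are all assumptions rather than anything taken from the original data:

```python
import csv
import io


def build_batch_csv(rows):
    """Return CSV text with one line per address in the batch column order.

    Column order: Unique ID, Street address, City, State, ZIP.
    `rows` is a list of dicts; the row position is used as the unique ID.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for unique_id, row in enumerate(rows):
        writer.writerow([
            unique_id,
            row["street"],
            row["city"],
            row["state"],
            row.get("zip", ""),  # left blank when no ZIP is available
        ])
    return buf.getvalue()


# Illustrative input only; with pandas, address_df.to_dict("records")
# would produce rows in this shape.
sample_rows = [
    {"street": "1600 Pennsylvania Ave NW", "city": "Washington",
     "state": "DC", "zip": "20502"},
]
batch_text = build_batch_csv(sample_rows)
```

Using `csv.writer` rather than joining strings by hand means any street address containing a comma gets quoted correctly.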

## Converting all Whole Foods addresses

In [62]:
import pandas as pd
input_file = './data/wholefoods_addresses.csv'

address_df = pd.read_csv(input_file)



In [None]:
fips_codes = []

# Geocode one address at a time with the single-address function.
for _, row in address_df.iterrows():
    try:
        fips = get_fips_from_address(row["street"], row["city"], row["state"])
    except Exception:
        # Keep a placeholder so fips_codes stays aligned with address_df rows.
        fips = None

    fips_codes.append(fips)



In [88]:
import csv

outfile = "./data/wholefoods_tracts.csv"
with open(outfile, 'w') as f:
    writer = csv.writer(f, lineterminator='\n')
    for val in fips_codes:
        writer.writerow([val])