# Application - Geocoding

In this application lesson, we are going to go over how to quickly and easily <b>geocode</b> addresses using Python/Pandas. 

The Department of City Planning (DCP) maintains a geocoding application called Geosupport Desktop Edition that you can download and use to query and geocode addresses, BBLs (borough-block-lot numbers), or BINs (building identification numbers). It can be found at the following link:
http://www1.nyc.gov/site/planning/data-maps/open-data/dwn-gde-home.page. DCP also has an online interface that lets you query addresses one by one:
http://a030-goat.nyc.gov/goat/Default.aspx.

In addition, the Department of Information Technology and Telecommunications (DoITT) has built a Geoclient API that lets you query the geocoding service programmatically. To use this API, you must first register for an account and request an API key. This can be done at the following link:
https://developer.cityofnewyork.us/api/geoclient-api.

To make things easier, John Krauss over at CARTO wrote Python bindings for DoITT’s Geoclient API that allow for very easy querying using Python. You still need to sign up for an API key using the link above. The GitHub repository for John Krauss’ Geoclient Python bindings can be found here:
https://github.com/talos/nyc-geoclient.

Finally, Deena Patel at MODA has written a few functions for both the DCP Geosupport Desktop Edition and John Krauss’ Geoclient bindings that allow for easy usage and querying with a Pandas DataFrame. Her GitHub repository can be found here:
https://github.com/deenapatel/geocode.

For today's lesson we are going to use the two GitHub repositories listed above. We will first look at John Krauss' Python bindings, and then see how Deena's functions can be used to make the geocoding process even easier.

First, let's import Pandas.

In [1]:
import pandas as pd

Next, we'll read in some test data.

In [2]:
data = pd.read_excel('test_geo.xlsx')

Let's look at our new data.

In [3]:
data

| Unnamed: 0 | Boro ID | Boro Full Name | House Number | Street Name |
|---|---|---|---|---|
| 0 | 1 | Manhattan | 175 | Eldridge Street |
| 1 | 1 | Manhattan | 241 | West 26th Street |
| 2 | 4 | Queens | 39-39 | Crescent Street |
| 3 | 3 | Brooklyn | 333 | Willoughby Avenue |
| 4 | 2 | Bronx | 1214 | Sheridan Avenue |


Note that our data contains the fields Boro ID, Boro Full Name, House Number, and Street Name, but no BBL or BIN. Let's change that!
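
If `test_geo.xlsx` is not at hand, an equivalent DataFrame can be built directly from the five rows shown above. This is only a sketch: it assumes the house numbers should be stored as strings (so that the hyphenated Queens address `39-39` is preserved), and it leaves out the leading `Unnamed: 0` column, which just repeats the row numbers.

```python
import pandas as pd

# Rows copied from the lesson's test data. House numbers are kept as
# strings (an assumption) so that '39-39' is not mangled.
data = pd.DataFrame(
    {
        'Boro ID': [1, 1, 4, 3, 2],
        'Boro Full Name': ['Manhattan', 'Manhattan', 'Queens', 'Brooklyn', 'Bronx'],
        'House Number': ['175', '241', '39-39', '333', '1214'],
        'Street Name': ['Eldridge Street', 'West 26th Street', 'Crescent Street',
                        'Willoughby Avenue', 'Sheridan Avenue'],
    },
    # Fix the column order explicitly rather than relying on dict ordering
    columns=['Boro ID', 'Boro Full Name', 'House Number', 'Street Name'],
)
```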

First, let's import the nyc_geoclient package that we downloaded from John Krauss' GitHub repo.

In [4]:
from nyc_geoclient import Geoclient

Next, make sure to input your geoclient API name and key. To make things easy, let's just use mine.

In [5]:
g = Geoclient('a812bc2d', 'geoclient')

The nyc_geoclient package has now stored our information and can use it to query the online API.
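
Because an API key is a credential, you may prefer not to type it into the notebook. One possible approach, sketched here, reads the two values from environment variables. The helper `load_geoclient_credentials` and the variable names `GEOCLIENT_APP_ID` and `GEOCLIENT_APP_KEY` are hypothetical; they are not part of the nyc_geoclient package.

```python
import os

def load_geoclient_credentials(id_var='GEOCLIENT_APP_ID', key_var='GEOCLIENT_APP_KEY'):
    """Return the two credential values read from environment variables.

    The variable names are hypothetical; use whatever names you exported.
    """
    app_id = os.environ.get(id_var)
    app_key = os.environ.get(key_var)
    if not app_id or not app_key:
        raise ValueError('Set {0} and {1} before running.'.format(id_var, key_var))
    return app_id, app_key

# Usage (assumes the Geoclient import from the lesson, with the two
# values passed in the same order as in the lesson's call):
# first, second = load_geoclient_credentials()
# g = Geoclient(first, second)
```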

To use this package, all we have to do is pass in the house number, street name, and borough of a particular address. Let's try this on one of the entries in our DataFrame: 333 Willoughby Avenue in Brooklyn (row 3).

In [6]:
g.address('333', 'Willoughby Avenue', 'Brooklyn')

{u'assemblyDistrict': u'57',
 u'bbl': u'3019130101',
 u'bblBoroughCode': u'3',
 u'bblTaxBlock': u'01913',
 u'bblTaxLot': u'0101',
 u'bikeLane': u'2',
 u'boardOfElectionsPreferredLgc': u'1',
 u'boePreferredStreetName': u'WILLOUGHBY AVENUE',
 u'boePreferredstreetCode': u'39323001',
 u'boroughCode1In': u'3',
 u'buildingIdentificationNumber': u'3394434',
 u'censusBlock2000': u'1001',
 u'censusBlock2010': u'2001',
 u'censusTract1990': u' 235  ',
 u'censusTract2000': u' 235  ',
 u'censusTract2010': u' 235  ',
 u'cityCouncilDistrict': u'33',
 u'civilCourtDistrict': u'03',
 u'coincidentSegmentCount': u'1',
 u'communityDistrict': u'303',
 u'communityDistrictBoroughCode': u'3',
 u'communityDistrictNumber': u'03',
 u'communitySchoolDistrict': u'14',
 u'condominiumBillingBbl': u'0000000000',
 u'congressionalDistrict': u'08',
 u'cooperativeIdNumber': u'0000',
 u'cornerCode': u'NE',
 u'crossStreetNamesFlagIn': u'E',
 u'dcpPreferredLgc': u'01',
 u'dotStreetLightContractorArea': u'3',
 u'dynamicBlock'

As you can see, the function returns a LOT of information. The information is returned in the form of a <b>dictionary</b>.

<b>Dictionaries</b> are somewhat similar to DataFrames in that they store values that you access via <b>keys</b> (much as you access DataFrame columns by name).

For example, the first <b>key</b> shown in the output above is 'assemblyDistrict', and the associated value is '57'.

If I were to store the function call above in a variable:

In [7]:
geocode_variable = g.address('333', 'Willoughby Avenue', 'Brooklyn')

I could then search for the value of a particular key:

In [8]:
geocode_variable['assemblyDistrict']

u'57'
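
A few other common dictionary operations are sketched below. To keep the example runnable without an API key, it uses a small hand-built dictionary, `sample`, containing four of the key/value pairs from the output above; `'someMissingKey'` is a made-up key used only to show what happens when a key is absent.

```python
# A small subset of the keys and values shown in the geocoding output.
sample = {
    'assemblyDistrict': '57',
    'bbl': '3019130101',
    'buildingIdentificationNumber': '3394434',
    'cityCouncilDistrict': '33',
}

# Look up a single value by key
bbl = sample['bbl']

# .get() returns a default (here None) instead of raising KeyError
# when the key is not present
missing = sample.get('someMissingKey', None)

# Pull several keys at once into a smaller dictionary
wanted = ['bbl', 'buildingIdentificationNumber']
subset = {key: sample[key] for key in wanted}

# List every available key, sorted alphabetically
all_keys = sorted(sample.keys())
```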

We can already see that this function is quite good at geocoding individual addresses, but what about an entire DataFrame of addresses? We could try throwing everything into a loop and geocoding things one by one, but then reformatting the output might be a hassle to deal with. Fortunately for us, Deena has already written a very efficient function to do all of this work for us.
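
For comparison, the manual loop described above might look roughly like the sketch below. It is not Deena's implementation. To run without an API key it calls `fake_address`, a hypothetical stand-in for `g.address` that returns canned values copied from this lesson's output. The `BORO_NAMES` mapping from Boro ID to borough name is likewise an assumption about how the IDs would be translated.

```python
import pandas as pd

# Borough codes as used in the lesson's 'Boro ID' column
BORO_NAMES = {1: 'Manhattan', 2: 'Bronx', 3: 'Brooklyn', 4: 'Queens', 5: 'Staten Island'}

# Canned responses copied from the lesson's output. 241 West 26th Street
# is left out on purpose to show the None fallback for a failed lookup.
CANNED = {
    ('333', 'Willoughby Avenue', 'Brooklyn'):
        {'bbl': '3019130101', 'buildingIdentificationNumber': '3394434'},
    ('175', 'Eldridge Street', 'Manhattan'):
        {'bbl': '1004200062', 'buildingIdentificationNumber': '1005602'},
}

def fake_address(house_number, street, borough):
    """Hypothetical stand-in for g.address; returns {} when nothing matches."""
    return CANNED.get((house_number, street, borough), {})

data = pd.DataFrame({
    'Boro ID': [3, 1, 1],
    'House Number': ['333', '175', '241'],
    'Street Name': ['Willoughby Avenue', 'Eldridge Street', 'West 26th Street'],
})

# Geocode one row at a time, collecting the two fields we care about
bbls, bins = [], []
for _, row in data.iterrows():
    response = fake_address(str(row['House Number']), row['Street Name'],
                            BORO_NAMES[row['Boro ID']])
    bbls.append(response.get('bbl'))
    bins.append(response.get('buildingIdentificationNumber'))

# Attach the results as new columns on a copy of the input
result = data.copy()
result['geocodedBBL'] = bbls
result['geocodedBIN'] = bins
```

Even in this small form, the loop has to handle borough translation, missing results, and column assembly by hand, which is the bookkeeping a batch function takes care of.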

Let's import her function from the repository we downloaded.

In [9]:
from geoclient import geoclientBatch

To use it, all we have to do is pass in our DataFrame, along with the names of the columns that correspond to "House Number", "Street Name", and "Boro ID".

In [10]:
result = geoclientBatch(data, houseNo='House Number', street='Street Name', boro='Boro ID')

In [11]:
result

| Unnamed: 0 | Boro ID | Boro Full Name | House Number | Street Name | geocodedBBL | geocodedBIN |
|---|---|---|---|---|---|---|
| 0 | 1 | Manhattan | 175 | Eldridge Street | 1004200062 | 1005602 |
| 1 | 1 | Manhattan | 241 | West 26th Street | 1007760012 | 1014230 |
| 2 | 4 | Queens | 39-39 | Crescent Street | 4003960005 | 4004786 |
| 3 | 3 | Brooklyn | 333 | Willoughby Avenue | 3019130101 | 3394434 |
| 4 | 2 | Bronx | 1214 | Sheridan Avenue | 2024530003 | 2002684 |


There you have it! A fully geocoded DataFrame in only a couple of seconds. 

These functions are actually much more flexible and powerful than we've seen here. Deena's function is currently written only to accept a street address and return the BBL and BIN, but it can be modified to return any geographic information based on any set of input addresses, BBLs, or BINs.
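
As a rough illustration of that flexibility, the sketch below adds one column for each requested response key. It is an assumption about how such a modification could look, not Deena's code. The names `add_geocoded_fields`, `stub_lookup`, and the `geocoded_` column prefix are all hypothetical, and `stub_lookup` returns canned values copied from the dictionary output earlier in the lesson so that the example runs without an API key.

```python
import pandas as pd

def add_geocoded_fields(df, lookup, fields, house_col, street_col, boro_col):
    """Return a copy of df with one 'geocoded_<field>' column per requested key.

    lookup is any callable(house_number, street, borough) -> dict,
    for example a bound method of a geocoding client.
    """
    out = df.copy()
    # Query once per row, then reuse each response for every field
    responses = [lookup(row[house_col], row[street_col], row[boro_col])
                 for _, row in df.iterrows()]
    for field in fields:
        out['geocoded_' + field] = [resp.get(field) for resp in responses]
    return out

def stub_lookup(house_number, street, borough):
    """Hypothetical stand-in for a real geocoding call."""
    if (house_number, street, borough) == ('333', 'Willoughby Avenue', 'Brooklyn'):
        return {'assemblyDistrict': '57', 'cityCouncilDistrict': '33',
                'communityDistrict': '303'}
    return {}

addresses = pd.DataFrame({
    'House Number': ['333'],
    'Street Name': ['Willoughby Avenue'],
    'Boro Full Name': ['Brooklyn'],
})

enriched = add_geocoded_fields(
    addresses, stub_lookup,
    fields=['assemblyDistrict', 'cityCouncilDistrict'],
    house_col='House Number', street_col='Street Name', boro_col='Boro Full Name',
)
```

Passing the lookup in as a parameter keeps the DataFrame handling separate from the service being queried, so the same helper could in principle wrap a BBL or BIN lookup instead of an address lookup.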