# Using NYC Geocoder in Python

The Department of City Planning (DCP) maintains **GeoSupport**, a geocoding package for processing NYC-specific geographic information. The desktop version is updated a few times a year and is available on their website: http://www1.nyc.gov/site/planning/data-maps/open-data/dwn-gde-home.page

The Department of Information Technology and Telecommunications' GIS team (DoITT GIS) maintains **GeoClient**. The GeoClient API is a RESTful web service interface to DCP's GeoSupport: https://developer.cityofnewyork.us/api/geoclient-api


## Using GeoClient

To access GeoClient, I'm using a Python wrapper: https://github.com/talos/nyc-geoclient
You also have to register on DoITT's website to get an App ID and key before you can use the GeoClient API.

In [1]:
from nyc_geoclient import Geoclient

myAppID = 'fb9ad04a'
myKey = '051f93e4125df4bae4f7c57517e62344'

g = Geoclient(myAppID,myKey)

help(g.address)

Help on method address in module nyc_geoclient.api:

address(self, houseNumber, street, borough) method of nyc_geoclient.api.Geoclient instance
    Given a valid address, provides blockface-level, property-level, and
    political information.
    
    :param houseNumber:
        The house number to look up.
    :param street:
        The name of the street to look up.
    :param borough:
        The borough to look within.  Must be 'Bronx', 'Brooklyn',
        'Manhattan', 'Queens', or 'Staten Island' (case-insensitive).
    
    :returns: A dict with blockface-level, property-level, and political
        information.



In [2]:
# Address and boro as input:
g.address(253,'Broadway','manhattan')

{u'alleyCrossStreetsFlag': u'X',
 u'assemblyDistrict': u'66',
 u'bbl': u'1001347501',
 u'bblBoroughCode': u'1',
 u'bblTaxBlock': u'00134',
 u'bblTaxLot': u'7501',
 u'blockfaceId': u'0212261726',
 u'boardOfElectionsPreferredLgc': u'1',
 u'boePreferredStreetName': u'BROADWAY',
 u'boePreferredstreetCode': u'11361001',
 u'boroughCode1In': u'1',
 u'buildingIdentificationNumber': u'1082757',
 u'censusBlock2000': u'1010',
 u'censusBlock2010': u'1004',
 u'censusTract1990': u'  21  ',
 u'censusTract2000': u'  21  ',
 u'censusTract2010': u'  21  ',
 u'cityCouncilDistrict': u'01',
 u'civilCourtDistrict': u'01',
 u'coincidentSegmentCount': u'1',
 u'communityDistrict': u'101',
 u'communityDistrictBoroughCode': u'1',
 u'communityDistrictNumber': u'01',
 u'communitySchoolDistrict': u'02',
 u'condominiumBillingBbl': u'1001347501',
 u'condominiumFlag': u'C',
 u'congressionalDistrict': u'10',
 u'continuousParityIndicator1e': u'L',
 u'cooperativeIdNumber': u'0000',
 u'crossStreetNamesFlagIn': u'E',
 u'dc
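
The response is an ordinary dict, so the fields of interest can be pulled out by key. The sketch below keeps only the BBL and BIN; `pick_fields` is a hypothetical helper written for this example, and the sample values are copied from the output above.

```python
def pick_fields(result, fields=("bbl", "buildingIdentificationNumber")):
    """Return only the requested keys from a Geoclient-style result dict.

    Missing keys map to None instead of raising KeyError.
    """
    return {name: result.get(name) for name in fields}


# Values copied from the g.address(253, 'Broadway', 'manhattan') output.
sample = {
    "bbl": "1001347501",
    "buildingIdentificationNumber": "1082757",
    "communityDistrict": "101",
}
print(pick_fields(sample))
```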

## Batch GeoClient
This section uses GeoClient to geocode multiple addresses. In this case we keep only the BIN (building identification number) and BBL (borough, block, and lot) from the GeoClient output.

In [3]:
# let's download PLUTO http://www1.nyc.gov/site/planning/data-maps/open-data.page
# use as an example of how to geocode multiple addresses
# obviously you could use any dataframe with addresses, but I work with PLUTO often.

import pandas as pd
pd.set_option('display.max_columns', 100)  # full option name; the bare 'max_columns' shorthand is ambiguous in newer pandas

# download PLUTO to use as an example list of addresses
# read in PLUTO from open data, only including a few columns and rows
url = 'https://data.cityofnewyork.us/resource/64uk-42ks.json'
filters = "?$select=bbl,borocode,address,bldgclass,ownername,numbldgs,lotarea\
&$limit=1000"
pluto = pd.read_json(url+filters)

# PLUTO doesn't store the house number and street name separately, so split them out.
# Note: addresses with no house number (e.g. 'THOMAS STREET') lose their first word here.
pluto['houseNo'] = pluto['address'].str.extract(r'(^[0-9-]*)', expand=False)
pluto['street'] = pluto['address'].str.extract(r'(\s.+$)', expand=False)

pluto.head()


| | address | bbl | bldgclass | borocode | lotarea | numbldgs | ownername | houseNo | street |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2222 HOLLERS AVENUE | 2052700001 | Z9 | 2 | 5000.0 | 1.0 | GRILO ENTERPRISES INC | 2222 | HOLLERS AVENUE |
| 1 | 3512 HEATHCOTE AVENUE | 2052860040 | Z9 | 2 | 15000.0 | 1.0 | PEARTREE AUTO WRECKERS INC | 3512 | HEATHCOTE AVENUE |
| 2 | 161 STREET | 4097640087 | G7 | 4 | 8763.0 | 0.0 | HILLSIDE AVENUE DEVELOPMENT, LLC | 161 | STREET |
| 3 | 117-37 LINCOLN STREET | 4116950026 | G0 | 4 | 1855.0 | 0.0 | ABRAHAM TONY | 117-37 | LINCOLN STREET |
| 4 | THOMAS STREET | 3028010032 | G7 | 3 | 5525.0 | 0.0 | PEERLESS EQUITIES LLC | | STREET |
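
Rows 2 and 4 show where the simple split goes wrong: "161 STREET" appears to be a street name with no house number, and "THOMAS STREET" loses its first word. The sketch below is one possible heuristic, not the parsing used in this notebook. `split_address` and the `STREET_TYPES` set are assumptions made up for this example, and the list of street types is illustrative rather than official.

```python
import re

# Street-type words that, on their own, are not a usable street name.
# This set is an illustrative assumption, not an official list.
STREET_TYPES = {"STREET", "AVENUE", "ROAD", "PLACE", "BOULEVARD", "DRIVE", "LANE"}

# Leading house number such as "2222" or the Queens-style "117-37".
HOUSE_NUMBER = re.compile(r"^(\d+(?:-\d+)?)\s+(.+)$")


def split_address(address):
    """Return (house_number, street); house_number is '' when none is found."""
    address = address.strip()
    match = HOUSE_NUMBER.match(address)
    if match and match.group(2).upper() not in STREET_TYPES:
        return match.group(1), match.group(2)
    # No house number, or the number is part of the street name ("161 STREET").
    return "", address
```

On a dataframe this could be applied with `pluto['address'].apply(split_address)`. Rows without a house number would still need a different lookup than the address function, since that one expects a house number.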


In [4]:
# geoclientBatch() takes a dataframe of addresses and returns the BIN and BBL fields from the geoclient output

from geoclient import geoclientBatch

# Note: the PLUTO column is lowercase 'borocode'. Passing boro='BoroCode' is the likely
# cause of the KeyError in every row of the output; boro='borocode' should avoid it.
df = geoclientBatch(pluto[:20], houseNo='houseNo',street='street',boro='BoroCode')

# bbl comes from the original PLUTO file; geocodedBBL and geocodedBIN are the new fields
df[['houseNo','street','borocode','bbl','geocodedBBL','geocodedBIN']]

| | houseNo | street | borocode | bbl | geocodedBBL | geocodedBIN |
|---|---|---|---|---|---|---|
| 0 | 2222 | HOLLERS AVENUE | 2 | 2052700001 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 1 | 3512 | HEATHCOTE AVENUE | 2 | 2052860040 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 2 | 161 | STREET | 4 | 4097640087 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 3 | 117-37 | LINCOLN STREET | 4 | 4116950026 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 4 | | STREET | 3 | 3028010032 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 5 | 615 | 2 AVENUE | 1 | 1009140032 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 6 | 510 | 5 AVENUE | 1 | 1012589040 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 7 | 160 | STREET | 4 | 4045790022 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 8 | 338 | EAST 150 STREET | 2 | 2023310036 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
| 9 | 2451 | EAST 26 STREET | 3 | 3074220754 | `Error: <type 'exceptions.KeyError'>` | `Error: <type 'exceptions.KeyError'>` |
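
The source of `geoclientBatch` isn't shown here, so the following is only a sketch of what a batch loop of this kind might look like, not the actual implementation. `geocode_rows` is a hypothetical function, and `lookup` stands in for any callable with the same arguments as `g.address`. Because `g.address` expects a borough name while PLUTO stores a borough code, the sketch maps the code to a name first.

```python
# Borough codes as used in the first digit of a BBL.
BORO_NAMES = {1: "Manhattan", 2: "Bronx", 3: "Brooklyn", 4: "Queens", 5: "Staten Island"}


def geocode_rows(rows, lookup):
    """Geocode a list of dicts with 'houseNo', 'street' and 'borocode' keys.

    `lookup(house_number, street, borough)` is any callable returning a dict,
    for example the g.address method shown earlier.
    Each row gains 'geocodedBBL' and 'geocodedBIN'; failures are recorded as
    'Error: ...' strings so one bad address does not stop the batch.
    """
    results = []
    for row in rows:
        out = dict(row)  # copy so the input rows are left untouched
        try:
            borough = BORO_NAMES[int(row["borocode"])]
            hit = lookup(row["houseNo"], row["street"], borough)
            out["geocodedBBL"] = hit["bbl"]
            out["geocodedBIN"] = hit["buildingIdentificationNumber"]
        except Exception as exc:
            out["geocodedBBL"] = out["geocodedBIN"] = "Error: %s" % type(exc).__name__
        results.append(out)
    return results
```

Recording the exception type per row keeps the batch running, but it also hides the cause, so it is worth checking a few failed rows by hand.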


In [5]:
# Because this sends each address to the geoclient individually over the network,
# geocoding more than a few thousand addresses can take a long time.
# Let's time 100 records.
# (This call passes the same boro='BoroCode' argument as the cell above, so if the lookups
# fail with KeyError again, the timing measures failed lookups rather than real requests.)
%timeit df = geoclientBatch(pluto[:100], houseNo='houseNo',street='street',boro='BoroCode')

10 loops, best of 3: 149 ms per loop
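
Since each address is a separate network request, one way to cut the waiting time is to send several requests at once. This is a minimal sketch using a thread pool, assuming the API tolerates a handful of concurrent requests, which I have not verified against its rate limits. `geocode_concurrently` is a hypothetical helper, and `max_workers=8` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def geocode_concurrently(addresses, lookup, max_workers=8):
    """Call lookup(house_number, street, borough) for each address tuple.

    Results come back in input order. A failed lookup yields the exception
    object instead of a dict so the caller can inspect it.
    """
    def safe_lookup(address):
        try:
            return lookup(*address)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input addresses.
        return list(pool.map(safe_lookup, addresses))
```

For example, `geocode_concurrently([(253, 'Broadway', 'manhattan')], g.address)` would return a one-element list holding the same dict as the single call shown earlier.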


## GeoSupport
For geocoding more than a few thousand addresses, it is better to download and install DCP's desktop GeoSupport. Luckily, they released a Linux version, which is great (thank you, DCP!), but it comes with an interactive interface, which makes it a little tricky to call from Python. So I created a Python interface that takes a single address and returns the GeoSupport output.

In [6]:
from geosupport import geosupport, geosupportBatch

geosupport(1,100,'gold street')

["*****  Enter 'X' for Extended Work Area:  \n",
 '\n',
 'Function 1A GRC = 00\n',
 'Error Message =                                                                                 \n',
 '\n',
 '\n',
 '[  0]: ---------0---------1---------2---------3---------4---------5---------6---------7\n',
 '[  1]: 12345678901234567890123456789012345678901234567890123456789012345678901234567890\n',
 '[  2]:                                                                                 \n',
 '[  3]: Access Key                       \t 011213502800102000AA \n',
 '[  4]: Continuous Parity Indicator      \t  \n',
 '[  5]: Low Housenum Of Key              \t 000096000AA\n',
 '[  6]: Bbl                              \t 1000940025\n',
 '[  7]:    Borough Code                  \t 1\n',
 '[  8]:    Tax Block                     \t 00094\n',
 '[  9]:    Tax Lot                       \t 0025\n',
 '[ 10]: Tax Lot Version Number (Nyi)     \t  \n',
 '[ 11]: Rpad Self-Check Code (Scc) - Bbl \t 0\n',
 '[ 12]: Fill
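
The output is a list of text lines, and most fields follow the pattern `[ n]: Label \t value`. As a sketch of how such lines could be turned into a dict, the hypothetical helper below (`parse_geosupport_lines`, not part of the geosupport module used here) matches that pattern and skips everything else. The sample lines are copied from the output above.

```python
import re

# Matches lines like '[  6]: Bbl                              \t 1000940025\n'
FIELD_LINE = re.compile(r"^\[\s*\d+\]:\s*(.*?)\s*\t\s*(.*?)\s*$")


def parse_geosupport_lines(lines):
    """Turn '[ n]: Label <tab> value' lines into a {label: value} dict."""
    fields = {}
    for line in lines:
        match = FIELD_LINE.match(line)
        if match:  # header, ruler and blank lines have no tab and are skipped
            fields[match.group(1)] = match.group(2)
    return fields


sample = [
    'Function 1A GRC = 00\n',
    '[  4]: Continuous Parity Indicator      \t  \n',
    '[  6]: Bbl                              \t 1000940025\n',
    '[  7]:    Borough Code                  \t 1\n',
]
print(parse_geosupport_lines(sample)["Bbl"])  # prints 1000940025
```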

You can see it returns a ton of information.

I'm mainly interested in the BIN and BBL, and I usually want to geocode multiple addresses, so I created geosupportBatch() for that purpose.

In [7]:
df = geosupportBatch(pluto, houseNo='houseNo',street='street',boro='borocode')
df[['houseNo','street','borocode','bbl','geocodedBBL','geocodedBIN']].head()

| | houseNo | street | borocode | bbl | geocodedBBL | geocodedBIN |
|---|---|---|---|---|---|---|
| 0 | 2222 | HOLLERS AVENUE | 2 | 2052700001 | 2052700001 | 2093880 |
| 1 | 3512 | HEATHCOTE AVENUE | 2 | 2052860040 | 2052860040 | 2129539 |
| 2 | 161 | STREET | 4 | 4097640087 | Error Message = ' STREET' NOT RECOGNIZED. THER... | Error Message = ' STREET' NOT RECOGNIZED. THER... |
| 3 | 117-37 | LINCOLN STREET | 4 | 4116950026 | 4116950026 | 4000000 |
| 4 | | STREET | 3 | 3028010032 | Error Message = ' STREET' NOT RECOGNIZED. THER... | Error Message = ' STREET' NOT RECOGNIZED. THER... |
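
Because PLUTO already carries a BBL, the geocoded BBL can be checked against it. A minimal sketch, assuming the rows are available as a list of dicts (for a dataframe, `df.to_dict('records')` produces that shape); `summarize_matches` is a hypothetical helper for this example.

```python
def summarize_matches(rows):
    """Count rows whose geocodedBBL matches, differs from, or lacks a BBL."""
    counts = {"match": 0, "mismatch": 0, "error": 0}
    for row in rows:
        geocoded = str(row["geocodedBBL"])
        if geocoded.startswith("Error"):
            counts["error"] += 1
        elif geocoded == str(row["bbl"]):
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# The five rows shown above, reduced to the two columns being compared.
rows = [
    {"bbl": 2052700001, "geocodedBBL": "2052700001"},
    {"bbl": 2052860040, "geocodedBBL": "2052860040"},
    {"bbl": 4097640087, "geocodedBBL": "Error Message = ' STREET' NOT RECOGNIZED. THER..."},
    {"bbl": 4116950026, "geocodedBBL": "4116950026"},
    {"bbl": 3028010032, "geocodedBBL": "Error Message = ' STREET' NOT RECOGNIZED. THER..."},
]
print(summarize_matches(rows))
```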


In [9]:
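# It works like geoclientBatch but runs locally, with no network request per address.
# Note: the 149 ms geoclientBatch timing above most likely came from lookups that failed
# with KeyError, so the two timings are not directly comparable.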
%timeit df = geosupportBatch(pluto[:100], houseNo='houseNo',street='street',boro='borocode')

1 loop, best of 3: 1.37 s per loop
