# Using NYC Geocoder in Python

The Department of City Planning (DCP) maintains **GeoSupport**, a geocoding package for processing NYC-specific geographic information. The desktop version is updated a few times a year and is available on their website: http://www1.nyc.gov/site/planning/data-maps/open-data/dwn-gde-home.page

The Department of Information Technology and Telecommunications' GIS team (DoITT GIS) maintains **GeoClient**. The GeoClient API is a RESTful web service interface to DCP's GeoSupport: https://developer.cityofnewyork.us/api/geoclient-api


## Using GeoClient

To access GeoClient, I'm using a Python wrapper: https://github.com/talos/nyc-geoclient
You also have to register on DoITT's website to get an App ID and key before you can use the GeoClient API.
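
Since the App ID and key are personal credentials, one option is to keep them out of the notebook itself. The following is a minimal sketch of reading them from environment variables. The variable names `GEOCLIENT_APP_ID` and `GEOCLIENT_APP_KEY` and the helper `load_geoclient_credentials` are hypothetical names made up for this example; they are not defined by DoITT or by nyc-geoclient.

```python
import os


def load_geoclient_credentials(environ=os.environ):
    """Return (app_id, app_key) read from environment variables.

    GEOCLIENT_APP_ID and GEOCLIENT_APP_KEY are hypothetical names chosen
    for this sketch; set them in your shell before starting the notebook.
    """
    try:
        return environ["GEOCLIENT_APP_ID"], environ["GEOCLIENT_APP_KEY"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError("Set %s before creating the Geoclient" % missing)
```

The returned pair can then be passed straight to the wrapper, for example `Geoclient(*load_geoclient_credentials())`.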

In [1]:
from nyc_geoclient import Geoclient

myAppID = 'fb9ad04a'
myKey = '051f93e4125df4bae4f7c57517e62344'

g = Geoclient(myAppID,myKey)

help(g.address)

Help on method address in module nyc_geoclient.api:

address(self, houseNumber, street, borough) method of nyc_geoclient.api.Geoclient instance
    Given a valid address, provides blockface-level, property-level, and
    political information.
    
    :param houseNumber:
        The house number to look up.
    :param street:
        The name of the street to look up.
    :param borough:
        The borough to look within.  Must be 'Bronx', 'Brooklyn',
        'Manhattan', 'Queens', or 'Staten Island' (case-insensitive).
    
    :returns: A dict with blockface-level, property-level, and political
        information.



In [2]:
# Address and boro as input:
g.address(253,'Broadway','manhattan')



{u'alleyCrossStreetsFlag': u'X',
 u'assemblyDistrict': u'66',
 u'bbl': u'1001347501',
 u'bblBoroughCode': u'1',
 u'bblTaxBlock': u'00134',
 u'bblTaxLot': u'7501',
 u'boardOfElectionsPreferredLgc': u'1',
 u'boePreferredStreetName': u'BROADWAY',
 u'boePreferredstreetCode': u'11361001',
 u'boroughCode1In': u'1',
 u'buildingIdentificationNumber': u'1082757',
 u'censusBlock2000': u'1010',
 u'censusBlock2010': u'1004',
 u'censusTract1990': u'  21  ',
 u'censusTract2000': u'  21  ',
 u'censusTract2010': u'  21  ',
 u'cityCouncilDistrict': u'01',
 u'civilCourtDistrict': u'01',
 u'coincidentSegmentCount': u'1',
 u'communityDistrict': u'101',
 u'communityDistrictBoroughCode': u'1',
 u'communityDistrictNumber': u'01',
 u'communitySchoolDistrict': u'02',
 u'condominiumBillingBbl': u'1001347501',
 u'condominiumFlag': u'C',
 u'congressionalDistrict': u'10',
 u'continuousParityIndicator1a': u'L',
 u'continuousParityIndicator1e': u'L',
 u'cooperativeIdNumber': u'0000',
 u'crossStreetNamesFlagIn': u'E'
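
The response is a plain dict, so individual fields can be pulled out by key. Below is a small sketch that uses a hand-copied subset of the output above in place of a live API call. The helper name `summarize` is made up for this example, and using `.get()` assumes that an unmatched address may come back without these keys.

```python
# A few fields copied from the response shown above; the real dict has many more.
response = {
    "bbl": "1001347501",
    "bblBoroughCode": "1",
    "bblTaxBlock": "00134",
    "bblTaxLot": "7501",
    "buildingIdentificationNumber": "1082757",
}


def summarize(result):
    """Return (BBL, BIN) from a Geoclient-style dict, or None for missing keys."""
    return result.get("bbl"), result.get("buildingIdentificationNumber")


bbl, bin_ = summarize(response)

# In this response the BBL is the borough code, tax block, and tax lot joined together.
parts = response["bblBoroughCode"] + response["bblTaxBlock"] + response["bblTaxLot"]
print(bbl, bin_, bbl == parts)
```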

## Batch GeoClient
Here we use GeoClient to geocode multiple addresses, keeping only the BIN (building identification number) and BBL (borough, block, and lot) from the GeoClient output.

In [3]:
# let's download PLUTO http://www1.nyc.gov/site/planning/data-maps/open-data.page
# use as an example of how to geocode multiple addresses
# obviously you could use any dataframe with addresses, but I work with PLUTO often.

import pandas as pd
pd.set_option('display.max_columns', 100)  # full option name; the short 'max_columns' can be ambiguous in newer pandas

pluto = pd.read_csv('/home/deena/Documents/data_munge/pluto/nyc_pluto_16v1/MN.csv')
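# PLUTO stores the house number and street name together in 'Address', so let's break them out.
# House number: leading digits and hyphens (a '|' inside [...] is a literal pipe, so it is dropped here).
pluto['houseNo'] = pluto['Address'].str.extract(r'(^[0-9-]*)', expand=False)
# Street: everything from the first whitespace onward (raw string so \s reaches the regex engine unchanged).
pluto['street'] = pluto['Address'].str.extract(r'(\s.+$)', expand=False)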

pluto.head()



Unnamed: 0,Borough,Block,Lot,CD,CT2010,CB2010,SchoolDist,Council,ZipCode,FireComp,PolicePrct,HealthArea,SanitBoro,SanitDistrict,SanitSub,Address,ZoneDist1,ZoneDist2,ZoneDist3,ZoneDist4,Overlay1,Overlay2,SPDist1,SPDist2,LtdHeight,AllZoning1,AllZoning2,SplitZone,BldgClass,LandUse,Easements,OwnerType,OwnerName,LotArea,BldgArea,ComArea,ResArea,OfficeArea,RetailArea,GarageArea,StrgeArea,FactryArea,OtherArea,AreaSource,NumBldgs,NumFloors,UnitsRes,UnitsTotal,LotFront,LotDepth,BldgFront,BldgDepth,Ext,ProxCode,IrrLotCode,LotType,BsmtCode,AssessLand,AssessTot,ExemptLand,ExemptTot,YearBuilt,BuiltCode,YearAlter1,YearAlter2,HistDist,Landmark,BuiltFAR,ResidFAR,CommFAR,FacilFAR,BoroCode,BBL,CondoNo,Tract2010,XCoord,YCoord,ZoneMap,ZMCode,Sanborn,TaxMap,EDesigNum,APPBBL,APPDate,PLUTOMapID,Version,houseNo,street
0,MN,1,10,101,5.0,1018.0,2.0,1.0,10004.0,E007,1.0,8100.0,1.0,1.0,,1 GOVERNORS ISLAND,R3-2,,,,,,GI,,,R3-2/GI,,N,Y4,8.0,0,P,GOVERNORS ISLAND CORP,7736692,2725731,2725731,0,0,0,0,0,0,2725731,2,158,0.0,0,0,0.0,0.0,0.0,0.0,,0.0,N,5.0,5.0,104445450,156510900,104445450,156510900,1900,E,0,0,Governors Island Historic District,THE GOVERNOR'S ISLAND,0.35,0.6,0.0,1.0,1,1000010010,0,5,979071.0,190225.0,16a,Y,199 999,10101.0,,0.0,,1,16v1,1.0,GOVERNORS ISLAND
1,MN,1,101,101,1.0,1001.0,2.0,1.0,10004.0,E007,1.0,8200.0,1.0,1.0,,1 LIBERTY ISLAND,R3-2,,,,,,,,,R3-2,,N,P7,8.0,0,X,U S GOVT LAND & BLDGS,541886,541886,541886,0,0,0,0,0,0,541886,2,10,0.0,0,0,500.0,1046.0,0.0,0.0,,0.0,Y,5.0,5.0,4225950,12197250,4225950,12197250,1900,E,0,0,,STATUE OF LIBERTY NATIONAL MONUMENT,1.0,0.6,0.0,1.0,1,1000010101,0,1,971677.0,190636.0,16a,Y,199 999,10101.0,,0.0,,1,16v1,1.0,LIBERTY ISLAND
2,MN,1,201,101,1.0,1000.0,2.0,1.0,10004.0,E007,1.0,8200.0,1.0,1.0,,1 ELLIS ISLAND,R3-2,,,,,,,,,R3-2,,N,Z9,,0,X,U S GOVT LAND & BLDGS,2764190,603130,603130,0,0,0,0,0,0,603130,2,7,0.0,0,0,0.0,0.0,0.0,0.0,,0.0,N,5.0,5.0,14972400,108450450,14972400,108450450,1900,E,0,0,Ellis Island Historic District,,0.22,0.6,0.0,1.0,1,1000010201,0,1,972790.0,193648.0,12b,,199 999,10101.0,,0.0,,1,16v1,1.0,ELLIS ISLAND
3,MN,1,301,101,,,2.0,1.0,10004.0,E007,1.0,,1.0,1.0,,JOE DIMAGGIO HIGHWAY,ZNA,,,,,,,,,ZNA,,N,U0,7.0,0,,,0,0,0,0,0,0,0,0,0,0,7,0,0.0,0,0,0.0,0.0,0.0,0.0,,0.0,N,0.0,5.0,0,0,0,0,0,,0,0,,,0.0,0.0,0.0,0.0,1,1000010301,0,0,,,12b,,199 999,10101.0,,0.0,,4,16v1,,JOE DIMAGGIO HIGHWAY
4,MN,1,401,101,,,2.0,1.0,10004.0,E007,1.0,,1.0,1.0,,JOE DIMAGGIO HIGHWAY,ZNA,,,,,,,,,ZNA,,N,U0,7.0,0,,,0,0,0,0,0,0,0,0,0,0,7,0,0.0,0,0,0.0,0.0,0.0,0.0,,0.0,N,0.0,5.0,0,0,0,0,0,,0,0,,,0.0,0.0,0.0,0.0,1,1000010401,0,0,,,12b,,1 99 999,10101.0,,0.0,,4,16v1,,JOE DIMAGGIO HIGHWAY
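
The address split in cell `In [3]` can be tried out on a few strings without loading PLUTO. The sketch below uses the standard `re` module and a single-pattern variant rather than the exact two expressions from the cell, so that an address with no house number keeps its full street name. The function name `split_address` and the hyphenated sample address are made up for illustration.

```python
import re

# Leading digits/hyphens are the house number; the rest is the street.
ADDRESS = re.compile(r'^([0-9-]*)\s*(.*)$')


def split_address(address):
    """Split 'NUMBER STREET NAME' into (house_no, street).

    house_no is None when the address does not start with a number.
    """
    house_no, street = ADDRESS.match(address).groups()
    return (house_no or None), street


for sample in ["1 GOVERNORS ISLAND", "JOE DIMAGGIO HIGHWAY", "37-10 EXAMPLE AVENUE"]:
    print(split_address(sample))
```

This sketch does not handle house numbers with letter suffixes: the letter would end up at the start of the street name.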


In [4]:
#geoclientBatch() returns the BIN, BBL fields from the geoclient output, from a dataframe of addresses

from geoclient import geoclientBatch

df = geoclientBatch(pluto[1000:1020], houseNo='houseNo',street='street',boro='BoroCode')

# BBL comes from the original PLUTO file; geocodedBBL and geocodedBIN are the new fields
df[['houseNo','street','BoroCode','BBL','geocodedBBL','geocodedBIN']]

Unnamed: 0,houseNo,street,BoroCode,BBL,geocodedBBL,geocodedBIN
1000,7,CHATHAM SQUARE,1,1001627501,1001627501,1001726
1001,10,PELL STREET,1,1001630001,1001630001,1001776
1002,12,PELL STREET,1,1001630002,1001630002,1001777
1003,16,PELL STREET,1,1001630004,1001630004,1001778
1004,18,PELL STREET,1,1001630005,1001630005,1001779
1005,20,PELL STREET,1,1001630006,1001630006,1001780
1006,24,PELL STREET,1,1001630008,1001630008,1001781
1007,26,PELL STREET,1,1001630009,1001630009,1001782
1008,30,PELL STREET,1,1001630011,1001630011,1001783
1009,34,PELL STREET,1,1001630013,1001630013,1001784
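
The source of `geoclientBatch()` isn't shown here, so the following is only a rough sketch of how a batch helper of this kind could work, not the actual implementation. It assumes rows arrive as dicts (for example from `df.to_dict('records')`) and that `lookup` is any callable shaped like `g.address(houseNumber, street, borough)`. The names `batch_lookup` and `BOROUGHS` are made up for this example; the `bbl` and `buildingIdentificationNumber` keys are taken from the GeoClient output shown earlier.

```python
# NYC borough codes as used in PLUTO's BoroCode column.
BOROUGHS = {1: "Manhattan", 2: "Bronx", 3: "Brooklyn", 4: "Queens", 5: "Staten Island"}


def batch_lookup(rows, lookup, house_no="houseNo", street="street", boro="BoroCode"):
    """Geocode each row with `lookup` and attach geocodedBBL / geocodedBIN."""
    out = []
    for row in rows:
        try:
            result = lookup(row[house_no], row[street], BOROUGHS[int(row[boro])])
        except Exception:
            # Treat a failed request as "no match" so one bad row does not stop the batch.
            result = {}
        geocoded = dict(row)
        geocoded["geocodedBBL"] = result.get("bbl")
        geocoded["geocodedBIN"] = result.get("buildingIdentificationNumber")
        out.append(geocoded)
    return out


def stub_lookup(house_number, street_name, borough):
    """Stand-in for g.address so the sketch runs without network access."""
    return {"bbl": "1001630001", "buildingIdentificationNumber": "1001776"}


rows = [{"houseNo": "10", "street": "PELL STREET", "BoroCode": 1}]
print(batch_lookup(rows, stub_lookup))
```

With a real client, `lookup` would be `g.address`, and the result list can be turned back into a dataframe with `pd.DataFrame(...)`.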


In [5]:
# Because this works by sending each address individually to the geoclient over the network,
# if you are trying to do more than a few thousand addresses, it can take a long time.
# Let's see how long to do 100 records:
%timeit df = geoclientBatch(pluto[1000:1100], houseNo='houseNo',street='street',boro='BoroCode')

1 loops, best of 3: 16.4 s per loop


## GeoSupport
For geocoding more than a few thousand addresses, it is better to download and install DCP's desktop GeoSupport. Luckily, DCP released a Linux version, which is great (thank you, DCP!). It comes with an interactive interface, though, which makes it a little tricky to drive from Python. So I created a Python interface that takes a single address as input and returns the GeoSupport output.

In [6]:
from geosupport import geosupport, geosupportBatch

geosupport(1,100,'gold street')

["*****  Enter 'X' for Extended Work Area:  \n",
 '\n',
 'Function 1A GRC = 00\n',
 'Error Message =                                                                                 \n',
 '\n',
 '\n',
 '[  0]: ---------0---------1---------2---------3---------4---------5---------6---------7\n',
 '[  1]: 12345678901234567890123456789012345678901234567890123456789012345678901234567890\n',
 '[  2]:                                                                                 \n',
 '[  3]: ACCESS KEY                       \t 011213502800102000AA \n',
 '[  4]: CONTINUOUS PARITY INDICATOR      \t  \n',
 '[  5]: LOW HOUSENUM OF KEY              \t 000096000AA\n',
 '[  6]: BBL                              \t 1000940025\n',
 '[  7]:    BOROUGH CODE                  \t 1\n',
 '[  8]:    TAX BLOCK                     \t 00094\n',
 '[  9]:    TAX LOT                       \t 0025\n',
 '[ 10]: TAX LOT VERSION NUMBER (NYI)     \t  \n',
 '[ 11]: RPAD SELF-CHECK CODE (SCC) - BBL \t 0\n',
 '[ 12]: FILL

You can see it returns a ton of information.
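
Each data line in that output follows the same `[ n]: FIELD NAME <tab> value` layout, so the fields can be collected into a dict. Here is a minimal sketch, assuming the layout seen in the lines above holds for the rest of the output. `parse_fields` is a name made up for this example and is not part of the `geosupport` module.

```python
import re

# "[  6]: BBL                              \t 1000940025\n" -> ("BBL", "1000940025")
FIELD = re.compile(r'^\[\s*\d+\]:\s*(.*?)\s*\t\s*(.*?)\s*$')


def parse_fields(lines):
    """Map field names to values; lines without a tab (prompts, rulers) are skipped."""
    fields = {}
    for line in lines:
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Lines copied from the output shown above.
sample = [
    'Function 1A GRC = 00\n',
    '[  1]: 12345678901234567890123456789012345678901234567890123456789012345678901234567890\n',
    '[  3]: ACCESS KEY                       \t 011213502800102000AA \n',
    '[  4]: CONTINUOUS PARITY INDICATOR      \t  \n',
    '[  6]: BBL                              \t 1000940025\n',
    '[  7]:    BOROUGH CODE                  \t 1\n',
]
print(parse_fields(sample))
```

The indentation of sub-fields such as `BOROUGH CODE` is discarded, and a field name that appears twice would overwrite the earlier value, so this is only a starting point.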

I'm mainly interested in the BIN and BBL, and I usually want to geocode multiple addresses, so I created geosupportBatch() for that purpose.

In [7]:
df = geosupportBatch(pluto[1000:1020], houseNo='houseNo',street='street',boro='BoroCode')
df[['houseNo','street','BoroCode','BBL','geocodedBBL','geocodedBIN']]

Unnamed: 0,houseNo,street,BoroCode,BBL,geocodedBBL,geocodedBIN
1000,7,CHATHAM SQUARE,1,1001627501,1001627501,1001726
1001,10,PELL STREET,1,1001630001,1001630001,1001776
1002,12,PELL STREET,1,1001630002,1001630002,1001777
1003,16,PELL STREET,1,1001630004,1001630004,1001778
1004,18,PELL STREET,1,1001630005,1001630005,1001779
1005,20,PELL STREET,1,1001630006,1001630006,1001780
1006,24,PELL STREET,1,1001630008,1001630008,1001781
1007,26,PELL STREET,1,1001630009,1001630009,1001782
1008,30,PELL STREET,1,1001630011,1001630011,1001783
1009,34,PELL STREET,1,1001630013,1001630013,1001784


In [8]:
# it works similarly to geoclientBatch, but is so much faster!
%timeit df = geosupportBatch(pluto[1000:1100], houseNo='houseNo',street='street',boro='BoroCode')

The slowest run took 4.25 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 464 ms per loop
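
To put the two timings side by side, here is a quick back-of-the-envelope calculation using the `%timeit` numbers above. The batch size of 100,000 addresses is a hypothetical figure picked for illustration, and the extrapolation assumes run time grows linearly with the number of records, which a real run may not match.

```python
RECORDS = 100            # both timings above were taken on 100 records
geoclient_s = 16.4       # geoclientBatch: 16.4 s per loop
geosupport_s = 0.464     # geosupportBatch: 464 ms per loop

speedup = geoclient_s / geosupport_s

n = 100000               # hypothetical batch size
geoclient_hours = geoclient_s / RECORDS * n / 3600.0
geosupport_minutes = geosupport_s / RECORDS * n / 60.0

print("speedup: about %.0fx" % speedup)
print("geoclient:  %.0f ms per record, about %.1f hours for %d records"
      % (geoclient_s / RECORDS * 1000, geoclient_hours, n))
print("geosupport: %.2f ms per record, about %.1f minutes for %d records"
      % (geosupport_s / RECORDS * 1000, geosupport_minutes, n))
```

On these numbers the desktop GeoSupport route is roughly 35 times faster per record.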
