# NYC GeoClient tutorial

In this lesson, we are going to go over how to geocode addresses using Python and NYC's GeoClient.

[Presentation on geocoding](https://docs.google.com/presentation/d/1LyM9f6icWiee1HE5ai_H73_IZ4C65YAzX52cR-flifo/edit?usp=sharing)

The Department of City Planning (DCP) maintains the official NYC geocoding application called GeoSupport. There are multiple ways of accessing this application. A web interface ([GOAT](http://a030-goat.nyc.gov/goat/Default.aspx.)) lets you query addresses one-by-one.

Another way of accessing the online version of GeoSupport is through an API maintained by the Department of Information Technology and Telecommunications (DoITT). In order to use this API, you need to [register for an account and request an API key](https://developer.cityofnewyork.us/api/geoclient-api). For now you can use the keys provided here.

To make things easier, [John Krauss](https://github.com/talos/nyc-geoclient) wrote Python bindings for DoITT's Geoclient API that allow for querying it from Python. Documentation is here: [nyc_geoclient](https://nyc-geoclient.readthedocs.io/en/latest/geoclient.html).

Install this package from the command line:

> pip install nyc_geoclient

For this tutorial we will be using pandas, so if you haven't already, install that as well:

> pip install pandas

## Part 1 - geocoding single addresses

We can query GeoClient directly from a browser. This is the query for 253 Broadway in Manhattan. Try it in your browser:

https://api.cityofnewyork.us/geoclient/v1/address.json?houseNumber=253&street=broadway&borough=manhattan&app_id=fb9ad04a&app_key=051f93e4125df4bae4f7c57517e62344

The query is a bit cumbersome because you have to include the app_id and app_key (this is what identifies you once you register on DoITT's website). Also, the output is not easy to deal with in a browser. But you get the idea of how it works.
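To make the structure of that URL explicit, here is a minimal sketch that assembles it from its parts. The endpoint and parameter names are copied from the URL above; `BASE_URL`, `build_address_url`, and the placeholder credentials are names made up for this illustration, and the sketch only builds the string without sending a request.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# endpoint copied from the browser example above
BASE_URL = 'https://api.cityofnewyork.us/geoclient/v1/address.json'


def build_address_url(house_number, street, borough, app_id, app_key):
    """Return the address query URL (hypothetical helper, for illustration only)."""
    # a list of pairs keeps the parameters in a fixed order
    params = [
        ('houseNumber', house_number),
        ('street', street),
        ('borough', borough),
        ('app_id', app_id),
        ('app_key', app_key),
    ]
    # urlencode escapes spaces and other special characters for us
    return BASE_URL + '?' + urlencode(params)


# placeholder credentials: substitute your own app ID and key
url = build_address_url(253, 'broadway', 'manhattan', 'YOUR_APP_ID', 'YOUR_APP_KEY')
print(url)
```

Building the query string with `urlencode` rather than by hand means street names containing spaces are escaped automatically.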

### python bindings
Now let's try it using the Python bindings, nyc_geoclient:

In [1]:
# import the package
from nyc_geoclient import Geoclient

# set up the app ID and key (you can get your own from DoITT's website)
myAppID = 'fb9ad04a'
myKey = '051f93e4125df4bae4f7c57517e62344'

# create a client object that remembers the credentials
g = Geoclient(myAppID, myKey)

The Geoclient object now stores our credentials and uses them to query the online API. We don't need to worry about the credentials after this; they are all stored in the variable g.

The address function needs a house number, street name, and either borough or zipcode. Try it a few times to see what you get back.

In [2]:
g.address(253,'Broadway','manhattan')

{u'alleyCrossStreetsFlag': u'X',
 u'assemblyDistrict': u'66',
 u'bbl': u'1001347501',
 u'bblBoroughCode': u'1',
 u'bblTaxBlock': u'00134',
 u'bblTaxLot': u'7501',
 u'boardOfElectionsPreferredLgc': u'1',
 u'boePreferredStreetName': u'BROADWAY',
 u'boePreferredstreetCode': u'11361001',
 u'boroughCode1In': u'1',
 u'buildingIdentificationNumber': u'1082757',
 u'censusBlock2000': u'1010',
 u'censusBlock2010': u'1004',
 u'censusTract1990': u'  21  ',
 u'censusTract2000': u'  21  ',
 u'censusTract2010': u'  21  ',
 u'cityCouncilDistrict': u'01',
 u'civilCourtDistrict': u'01',
 u'coincidentSegmentCount': u'1',
 u'communityDistrict': u'101',
 u'communityDistrictBoroughCode': u'1',
 u'communityDistrictNumber': u'01',
 u'communitySchoolDistrict': u'02',
 u'condominiumBillingBbl': u'1001347501',
 u'condominiumFlag': u'C',
 u'congressionalDistrict': u'10',
 u'continuousParityIndicator1a': u'L',
 u'continuousParityIndicator1e': u'L',
 u'cooperativeIdNumber': u'0000',
 u'crossStreetNamesFlagIn': u'E'

In [3]:
g.address(253,'Broadway','10007')

{u'alleyCrossStreetsFlag': u'X',
 u'assemblyDistrict': u'66',
 u'bbl': u'1001347501',
 u'bblBoroughCode': u'1',
 u'bblTaxBlock': u'00134',
 u'bblTaxLot': u'7501',
 u'boardOfElectionsPreferredLgc': u'1',
 u'boePreferredStreetName': u'BROADWAY',
 u'boePreferredstreetCode': u'11361001',
 u'boroughCode1In': u'1',
 u'buildingIdentificationNumber': u'1082757',
 u'censusBlock2000': u'1010',
 u'censusBlock2010': u'1004',
 u'censusTract1990': u'  21  ',
 u'censusTract2000': u'  21  ',
 u'censusTract2010': u'  21  ',
 u'cityCouncilDistrict': u'01',
 u'civilCourtDistrict': u'01',
 u'coincidentSegmentCount': u'1',
 u'communityDistrict': u'101',
 u'communityDistrictBoroughCode': u'1',
 u'communityDistrictNumber': u'01',
 u'communitySchoolDistrict': u'02',
 u'condominiumBillingBbl': u'1001347501',
 u'condominiumFlag': u'C',
 u'congressionalDistrict': u'10',
 u'continuousParityIndicator1a': u'L',
 u'continuousParityIndicator1e': u'L',
 u'cooperativeIdNumber': u'0000',
 u'crossStreetNamesFlagIn': u'E'

As you can see, the function returns a LOT of information. The information is returned in the form of a **dictionary**.

In this example, one **key** of the dictionary above is 'assemblyDistrict', and the associated value is '66'.
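As a small illustration of how keys and values work, the sketch below uses a hand-copied subset of the result above rather than a live API call. The variable names and the key `'notARealKey'` are made up for the example.

```python
# a hand-copied subset of the dictionary returned above
result = {
    'assemblyDistrict': '66',
    'cityCouncilDistrict': '01',
    'communityDistrict': '101',
}

# square brackets look up one key; a missing key raises a KeyError
council = result['cityCouncilDistrict']

# .get() returns a default value instead of raising when the key is absent
missing = result.get('notARealKey', 'not found')

print(council)
print(missing)
```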

Questions: 

How would you return *only* 'assemblyDistrict' or 'bbl', for instance?

What is the BIN and BBL for 100 Gold Street?

## Part 2 - geocoding a dataframe
This is great, but it only allows us to do one address at a time. What if we had a dataframe of addresses to geocode?

For this I have written a [geoclientBatch](https://github.com/deenapatel/geocode/blob/master/geoclient.py) function that loops through a dataframe, geocoding each row using Geoclient.
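The actual geoclientBatch code lives in the linked file and is not reproduced here. As a rough, hypothetical sketch of the general idea (loop over the rows, geocode each one, and record an error message instead of stopping when a lookup fails), the example below uses a plain list of dictionaries instead of a dataframe and a stand-in `fake_geocode` function instead of the real API, so it runs without pandas or a network connection. The names `geocode_rows`, `fake_geocode`, and the `geocoded_` prefix are assumptions made for this illustration.

```python
def geocode_rows(rows, geocode, fields=('bbl', 'buildingIdentificationNumber')):
    """Return copies of rows with one 'geocoded_<field>' entry per requested field."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        try:
            result = geocode(row['houseNo'], row['street'], row['borough'])
            for field in fields:
                new_row['geocoded_' + field] = result[field]
        except Exception as exc:
            # record the error type and keep going with the next row
            for field in fields:
                new_row['geocoded_' + field] = 'Error: ' + type(exc).__name__
        out.append(new_row)
    return out


def fake_geocode(house_no, street, borough):
    """Stand-in for g.address: knows one address, returns {} for anything else."""
    known = {
        ('253', 'BROADWAY', 'MANHATTAN'): {
            'bbl': '1001347501',
            'buildingIdentificationNumber': '1082757',
        },
    }
    return known.get((house_no, street, borough), {})


rows = [
    {'houseNo': '253', 'street': 'BROADWAY', 'borough': 'MANHATTAN'},
    {'houseNo': '1', 'street': 'NOWHERE ST', 'borough': 'MANHATTAN'},
]
for geocoded in geocode_rows(rows, fake_geocode):
    print(geocoded['geocoded_bbl'])
```

With a dataframe, the same loop could be driven by `DataFrame.iterrows()` or `DataFrame.apply(..., axis=1)`; catching the exception per row is what lets one bad address fail without aborting the whole batch.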

### setting up the data
First let's get some data to work with. Let's say we are interested in all the microbreweries in NYC.

The NY State Open Data portal has a [listing of active liquor licenses](https://data.ny.gov/Economic-Development/Liquor-Authority-Quarterly-List-of-Active-Licenses/hrvs-fxs2/data).

I downloaded all of the 'Micro Brewer' license types in NYC (filtering on County Names= NEW YORK, BRONX, BROOKLYN, QUEENS, RICHMOND) and saved it in the [data folder](https://github.com/deenapatel/geocode/tree/master/data).

Let's read this into a dataframe:

In [4]:
# import pandas and set the options to display more rows and columns than the default
import pandas as pd
pd.options.display.max_rows = 100
pd.options.display.max_columns = 100

mblic = pd.read_csv('data/Liquor_Authority_Quarterly_List_of_Active_Licenses2018-07-30.csv')
# print(...) with parentheses works in both Python 2 and Python 3
print(mblic.shape)
mblic.head()

(28, 21)


Unnamed: 0,License Serial Number,License Type Name,License Class Code,License Type Code,Agency Zone Office Name,Agency Zone Office Number,County Name (Licensee),Premises Name,Doing Business As (DBA),Actual Address of Premises (Address1),Additional Address Information (Address2),City,State,Zip,License Certificate Number,License Original Issue Date,License Effective Date,License Expiration Date,Latitude,Longitude,Location
0,1188563,MICRO BREWER,101,MI,NewYork City,1,KINGS,"MAD SCIENTISTS BREWING PARTNERS,LLC","SIXPOINT CRAFT ALES,SIXPOINT BREWERY",40 VAN DYKE ST,DWIGHT ST & RICHARDS ST,BROOKLYN,NY,11231,895158,11/6/2007,11/1/2017,10/31/2018,40.673854,-74.012045,"(40.6738540679, -74.0120446585)"
1,1263405,MICRO BREWER,101,MI,NewYork City,1,QUEENS,RICHARD J CASTAGNA,BRIDGE AND TUNNEL BREWERY,61 02 60TH AVE,,MASPETH,NY,11378,902893,9/12/2012,9/1/2018,8/31/2019,40.716857,-73.902744,"(40.7168571254, -73.9027436481)"
2,1264215,MICRO BREWER,101,MI,NewYork City,1,QUEENS,SINGLECUT BEERSMITHS LLC,,1933 37TH ST,,ASTORIA,NY,11105,894086,12/7/2012,12/1/2017,11/30/2018,40.778086,-73.902321,"(40.7780857434, -73.9023209259)"
3,1266867,MICRO BREWER,101,MI,NewYork City,1,KINGS,OTHER HALF BREWING COMPANY INC,,195 CENTRE ST,AKA 191-197 CENTRE ST ETAL,BROOKLYN,NY,11231,999999,5/2/2013,5/1/2018,4/30/2021,40.673665,-73.999091,"(40.6736651375, -73.9990913469)"
4,1268251,MICRO BREWER,101,MI,NewYork City,1,KINGS,7 N 15TH ST CORP,GREENPOINT BEER & ALE CO,7 N 15TH ST,,BROOKLYN,NY,11222,901201,8/7/2013,8/1/2018,7/31/2019,40.724951,-73.95731,"(40.7249512177, -73.9573100115)"


Notice that this dataframe has the address as a single column. We'll need to separate this into a house number column and a street column before using geoclient.

Pandas lets you use [regular expressions](https://en.wikipedia.org/wiki/Regular_expression) via the [pandas.Series.str.extract](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.extract.html) function.

This won't work for every address, but it's pretty close.


In [5]:
# the address column name is a bit cumbersome, so let's store it as a variable
addressCol = 'Actual Address of Premises (Address1)'

# extract the house number: any run of digits or hyphens at the start
# (Queens house numbers contain hyphens)
mblic['houseNo'] = mblic[addressCol].str.extract(r'(^[0-9-]*)', expand=False)
# extract everything after the first space as the street
mblic['street'] = mblic[addressCol].str.extract(r'\s(.+$)', expand=False)

# copy the county column into a new borough column
mblic['borough'] = mblic['County Name (Licensee)']
# let's see how it looks
mblic[[addressCol, 'houseNo', 'street', 'borough']]

Unnamed: 0,Actual Address of Premises (Address1),houseNo,street,borough
0,40 VAN DYKE ST,40,VAN DYKE ST,KINGS
1,61 02 60TH AVE,61,02 60TH AVE,QUEENS
2,1933 37TH ST,1933,37TH ST,QUEENS
3,195 CENTRE ST,195,CENTRE ST,KINGS
4,7 N 15TH ST,7,N 15TH ST,KINGS
5,76-01 77TH AVE,76-01,77TH AVE,QUEENS
6,333 339 DOUGLAS ST,333,339 DOUGLAS ST,KINGS
7,856 E 136TH ST,856,E 136TH ST,BRONX
8,53-02 11TH ST,53-02,11TH ST,QUEENS
9,38-40 MINTHORNE ST,38-40,MINTHORNE ST,RICHMOND
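To experiment with the split without loading the data, here is a minimal sketch of the same idea using Python's built-in `re` module on a few addresses copied from the table above. The helper name `split_address` and the pattern constants are made up for this illustration; the patterns are intended to behave like the ones in the cell above, except that the leading space is left out of the street.

```python
import re

# house number: a run of digits or hyphens at the start of the string
HOUSE_RE = re.compile(r'^[0-9-]*')
# street: everything after the first whitespace character
STREET_RE = re.compile(r'\s(.+)$')


def split_address(address):
    """Split an address string into (house number, street)."""
    house_no = HOUSE_RE.match(address).group(0)
    street_match = STREET_RE.search(address)
    street = street_match.group(1) if street_match else None
    return house_no, street


samples = [
    '40 VAN DYKE ST',
    '61 02 60TH AVE',
    '76-01 77TH AVE',
    '333 339 DOUGLAS ST',
]
for address in samples:
    print('{0} -> {1}'.format(address, split_address(address)))
```

Rows such as `61 02 60TH AVE` and `333 339 DOUGLAS ST` show the limits of the pattern: the second number ends up at the front of the street, which probably does not match the real street name.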


### running geoclient batch
Now we are ready to geocode the dataframe.

Make sure geoclient.py (the file linked above) is in the current folder.

In [6]:
from geoclient import geoclientBatch

In [7]:
mblic = geoclientBatch(mblic, houseNo='houseNo', street='street', boro='borough')
mblic

Unnamed: 0,License Serial Number,License Type Name,License Class Code,License Type Code,Agency Zone Office Name,Agency Zone Office Number,County Name (Licensee),Premises Name,Doing Business As (DBA),Actual Address of Premises (Address1),Additional Address Information (Address2),City,State,Zip,License Certificate Number,License Original Issue Date,License Effective Date,License Expiration Date,Latitude,Longitude,Location,houseNo,street,borough,geocodedBBL,geocodedBIN
0,1188563,MICRO BREWER,101,MI,NewYork City,1,KINGS,"MAD SCIENTISTS BREWING PARTNERS,LLC","SIXPOINT CRAFT ALES,SIXPOINT BREWERY",40 VAN DYKE ST,DWIGHT ST & RICHARDS ST,BROOKLYN,NY,11231,895158,11/6/2007,11/1/2017,10/31/2018,40.673854,-74.012045,"(40.6738540679, -74.0120446585)",40,VAN DYKE ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
1,1263405,MICRO BREWER,101,MI,NewYork City,1,QUEENS,RICHARD J CASTAGNA,BRIDGE AND TUNNEL BREWERY,61 02 60TH AVE,,MASPETH,NY,11378,902893,9/12/2012,9/1/2018,8/31/2019,40.716857,-73.902744,"(40.7168571254, -73.9027436481)",61,02 60TH AVE,QUEENS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
2,1264215,MICRO BREWER,101,MI,NewYork City,1,QUEENS,SINGLECUT BEERSMITHS LLC,,1933 37TH ST,,ASTORIA,NY,11105,894086,12/7/2012,12/1/2017,11/30/2018,40.778086,-73.902321,"(40.7780857434, -73.9023209259)",1933,37TH ST,QUEENS,4008120031,4016130
3,1266867,MICRO BREWER,101,MI,NewYork City,1,KINGS,OTHER HALF BREWING COMPANY INC,,195 CENTRE ST,AKA 191-197 CENTRE ST ETAL,BROOKLYN,NY,11231,999999,5/2/2013,5/1/2018,4/30/2021,40.673665,-73.999091,"(40.6736651375, -73.9990913469)",195,CENTRE ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
4,1268251,MICRO BREWER,101,MI,NewYork City,1,KINGS,7 N 15TH ST CORP,GREENPOINT BEER & ALE CO,7 N 15TH ST,,BROOKLYN,NY,11222,901201,8/7/2013,8/1/2018,7/31/2019,40.724951,-73.95731,"(40.7249512177, -73.9573100115)",7,N 15TH ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
5,1272032,MICRO BREWER,101,MI,NewYork City,1,QUEENS,NARWHAL LLC,FINBACK,76-01 77TH AVE,,GLENDALE,NY,11385,893276,11/21/2013,11/1/2017,10/31/2018,40.706542,-73.873405,"(40.7065415837, -73.8734050233)",76-01,77TH AVE,QUEENS,4038030092,4092400
6,1272700,MICRO BREWER,101,MI,NewYork City,1,KINGS,THREEFOLD HOLDINGS LLC,THREES BREWING,333 339 DOUGLAS ST,,BROOKLYN,NY,11217,894071,11/6/2014,11/1/2017,10/31/2018,40.679592,-73.982195,"(40.6795920957, -73.9821945527)",333,339 DOUGLAS ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
7,1273115,MICRO BREWER,101,MI,NewYork City,1,BRONX,"BRONX BREWERY LLC, THE",,856 E 136TH ST,,BRONX,NY,10454,894176,1/30/2014,1/1/2018,12/31/2018,40.802246,-73.910719,"(40.8022455516, -73.9107190432)",856,E 136TH ST,BRONX,2025870030,2003992
8,1275294,MICRO BREWER,101,MI,NewYork City,1,QUEENS,TRANSMITTER BREWING LLC,,53-02 11TH ST,SUITE A,LONG ISLAND CITY,NY,11101,896800,3/21/2014,3/1/2018,2/28/2019,40.74003,-73.952538,"(40.7400302908, -73.9525379009)",53-02,11TH ST,QUEENS,4000380016,4436599
9,1275343,MICRO BREWER,101,MI,NewYork City,1,RICHMOND,GORDON JAMES LLC,THE FLAGSHIP BREWING COMPANY,38-40 MINTHORNE ST,,STATEN ISLAND,NY,10301,896254,3/14/2014,3/1/2018,2/28/2019,40.636751,-74.075579,"(40.6367511506, -74.0755786862)",38-40,MINTHORNE ST,RICHMOND,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>


Did they all geocode? If not, why?

What do you need to do to get most of them to geocode?

In [10]:
mblic[mblic.geocodedBBL.str.contains('Error')]

Unnamed: 0,License Serial Number,License Type Name,License Class Code,License Type Code,Agency Zone Office Name,Agency Zone Office Number,County Name (Licensee),Premises Name,Doing Business As (DBA),Actual Address of Premises (Address1),Additional Address Information (Address2),City,State,Zip,License Certificate Number,License Original Issue Date,License Effective Date,License Expiration Date,Latitude,Longitude,Location,houseNo,street,borough,geocodedBBL,geocodedBIN
0,1188563,MICRO BREWER,101,MI,NewYork City,1,KINGS,"MAD SCIENTISTS BREWING PARTNERS,LLC","SIXPOINT CRAFT ALES,SIXPOINT BREWERY",40 VAN DYKE ST,DWIGHT ST & RICHARDS ST,BROOKLYN,NY,11231,895158,11/6/2007,11/1/2017,10/31/2018,40.673854,-74.012045,"(40.6738540679, -74.0120446585)",40,VAN DYKE ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
1,1263405,MICRO BREWER,101,MI,NewYork City,1,QUEENS,RICHARD J CASTAGNA,BRIDGE AND TUNNEL BREWERY,61 02 60TH AVE,,MASPETH,NY,11378,902893,9/12/2012,9/1/2018,8/31/2019,40.716857,-73.902744,"(40.7168571254, -73.9027436481)",61,02 60TH AVE,QUEENS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
3,1266867,MICRO BREWER,101,MI,NewYork City,1,KINGS,OTHER HALF BREWING COMPANY INC,,195 CENTRE ST,AKA 191-197 CENTRE ST ETAL,BROOKLYN,NY,11231,999999,5/2/2013,5/1/2018,4/30/2021,40.673665,-73.999091,"(40.6736651375, -73.9990913469)",195,CENTRE ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
4,1268251,MICRO BREWER,101,MI,NewYork City,1,KINGS,7 N 15TH ST CORP,GREENPOINT BEER & ALE CO,7 N 15TH ST,,BROOKLYN,NY,11222,901201,8/7/2013,8/1/2018,7/31/2019,40.724951,-73.95731,"(40.7249512177, -73.9573100115)",7,N 15TH ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
6,1272700,MICRO BREWER,101,MI,NewYork City,1,KINGS,THREEFOLD HOLDINGS LLC,THREES BREWING,333 339 DOUGLAS ST,,BROOKLYN,NY,11217,894071,11/6/2014,11/1/2017,10/31/2018,40.679592,-73.982195,"(40.6795920957, -73.9821945527)",333,339 DOUGLAS ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
9,1275343,MICRO BREWER,101,MI,NewYork City,1,RICHMOND,GORDON JAMES LLC,THE FLAGSHIP BREWING COMPANY,38-40 MINTHORNE ST,,STATEN ISLAND,NY,10301,896254,3/14/2014,3/1/2018,2/28/2019,40.636751,-74.075579,"(40.6367511506, -74.0755786862)",38-40,MINTHORNE ST,RICHMOND,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
13,1276638,MICRO BREWER,101,MI,NewYork City,1,KINGS,FOLKSBIER BRAUEREI LLC,FOLKSBIER,103 LUQUER ST,,BROOKLYN,NY,11231,899013,5/5/2014,5/1/2018,4/30/2019,40.6779,-74.00099,"(40.6779, -74.00099)",103,LUQUER ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
16,1290462,MICRO BREWER,101,MI,NewYork City,1,KINGS,KINGS COUNTY BREWERS COLLECTIVE LLC,,381 TROUTMAN ST,,BROOKLYN,NY,11237,863910,6/9/2016,8/1/2017,7/31/2018,40.70601,-73.92358,"(40.70601, -73.92358)",381,TROUTMAN ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
17,1290993,MICRO BREWER,101,MI,NewYork City,1,KINGS,DISRUPTION GRAIN LLC,INTERBORO SPIRITS & ALES,942 GRAND ST,,BROOKLYN,NY,11211,901639,7/28/2016,7/1/2018,6/30/2019,40.71272,-73.93697,"(40.71272, -73.93697)",942,GRAND ST,KINGS,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>
18,1292743,MICRO BREWER,101,MI,NewYork City,1,RICHMOND,STATEN ISLAND BEER COMPANY INC,,20 KINSEY PL,WAREHOUSE C,STATEN ISLAND,NY,10303,897399,4/6/2016,4/1/2018,3/31/2019,40.62851,-74.16954,"(40.62851, -74.16954)",20,KINSEY PL,RICHMOND,Error: <type 'exceptions.KeyError'>,Error: <type 'exceptions.KeyError'>


#### followup exercises
1. What type of buildings are these microbreweries located in?
   DCP's PLUTO (Primary Land Use Tax Lot Output) has BBL as well as info on building class (and much more). Download this dataset and match it using BBL.

2. What neighborhoods are these breweries located in?
   How would you modify geoclientBatch to include 'nta' or 'ntaName'? NTA stands for Neighborhood Tabulation Area.