# GeoClient tutorial

In this lesson, we will go over how to geocode addresses using Python and NYC's Geoclient API.

The Department of City Planning (DCP) maintains a geocoding application called Geosupport Desktop Edition that you can download and use to query and geocode addresses, BBLs, or BINs. It can be found at the following link: http://www1.nyc.gov/site/planning/data-maps/open-data/dwn-gde-home.page. DCP also offers an online interface that lets you query addresses one at a time: http://a030-goat.nyc.gov/goat/Default.aspx.

In addition, the Department of Information Technology and Telecommunications (DoITT) has built a Geoclient API that allows you to query the geocoding service without having to download and install the desktop edition. In order to use this API, you must first register for an account and request an API key. This can be done at the following link: https://developer.cityofnewyork.us/api/geoclient-api. 

To make things easier, John Krauss wrote Python bindings for DoITT’s Geoclient API that allow for easy querying from Python. To install this package, run the following from the command line: pip install nyc_geoclient

This tutorial also uses pandas, so if you haven't already, install that as well from the command line: pip install pandas

## geocoding single addresses


In [None]:
# read in the package
from nyc_geoclient import Geoclient

# set up the app ID and key (you can get your own from DoITT's website)

myAppID = 'fb9ad04a'
myKey = '051f93e4125df4bae4f7c57517e62344'

g = Geoclient(myAppID,myKey)

The nyc_geoclient package has now stored our credentials and can use them to query the online API.

The address function needs a house number, street name, and either borough or zipcode. Try it a few times to see what you get back.

In [None]:
g.address(253,'Broadway','manhattan')

In [None]:
g.address(253,'Broadway','10007')

As you can see, the function returns a LOT of information. The information is returned in the form of a **dictionary**.

Dictionaries are somewhat similar to DataFrames in that they store values that you access via **keys** (much like DataFrame column names).

For example, the first **key** of the dictionary above is 'assemblyDistrict', and the associated value is '66'.
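
As a minimal sketch of how keys work, the cell below builds a small made-up dictionary rather than calling the API. Only the 'assemblyDistrict' entry comes from the output described above; the other key is hypothetical and stands in for the many fields a real result contains.

```python
# Made-up stand-in for a geocoding result; a real result has many more keys
result = {
    'assemblyDistrict': '66',       # key and value quoted in the text above
    'exampleKey': 'exampleValue',   # hypothetical entry for illustration
}

# Look up a single value by its key
district = result['assemblyDistrict']
print(district)

# List every key the dictionary contains
print(sorted(result.keys()))

# .get() returns a fallback instead of raising KeyError when a key is absent
missing = result.get('notAKey', 'not found')
print(missing)
```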

Questions: 

How would you return *only* 'assemblyDistrict' or 'BBL' for instance?

What is the BIN and BBL for 100 Gold Street?

## geocoding a dataframe
This is great, but it only allows us to do one address at a time. What if we had a dataframe of addresses to geocode?

For this I have written a geoclientBatch function that loops through a dataframe, geocoding each row using Geoclient.
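
To give a sense of what a row-by-row geocoding loop can look like, here is a rough sketch. It is not the actual geoclientBatch code: the function name batch_geocode_sketch, the stand-in geocoder, and the 'bbl' result key are all assumptions made for illustration. The stand-in geocoder lets the cell run without an API key or network access.

```python
import pandas as pd


def batch_geocode_sketch(df, geocode, houseNo, street, boro):
    """Return a copy of df with a geoBBL column filled in row by row.

    geocode is any callable taking (house number, street, borough) and
    returning a dictionary, for example the address method of a Geoclient
    object. The 'bbl' key is an assumption about that dictionary.
    """
    out = df.copy()
    bbls = []
    for _, row in out.iterrows():
        try:
            result = geocode(row[houseNo], row[street], row[boro])
            bbls.append(result.get('bbl', ''))
        except Exception:
            # Leave the field empty so failed rows are easy to find later
            bbls.append('')
    out['geoBBL'] = bbls
    return out


def fake_geocode(house_number, street_name, borough):
    """Hypothetical stand-in geocoder that fails on a missing house number."""
    if not house_number:
        raise ValueError('missing house number')
    return {'bbl': '1000000001'}  # made-up value


sample = pd.DataFrame({
    'houseNo': ['253', ''],
    'street': ['Broadway', 'Nowhere Street'],
    'borough': ['manhattan', 'queens'],
})
geocoded = batch_geocode_sketch(sample, fake_geocode,
                                houseNo='houseNo', street='street', boro='borough')

# Rows that failed to geocode have an empty geoBBL
print(geocoded[geocoded.geoBBL == ''])
```

Catching the exception and recording an empty string keeps one bad address from stopping the whole batch, at the cost of hiding the reason for the failure.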

First, let's get some data to work with. The NY State Open Data portal has a listing of active liquor licenses: https://data.ny.gov/Economic-Development/Liquor-Authority-Quarterly-List-of-Active-Licenses/hrvs-fxs2/data.

I downloaded all of the 'Micro Brewer' license types in NYC (filtering on County Names= NEW YORK, BRONX, BROOKLYN, QUEENS, RICHMOND).

Let's read this into a dataframe

In [None]:
# import pandas and set the options to display more rows and columns than the default
import pandas as pd
pd.options.display.max_rows = 100
pd.options.display.max_columns = 100

mblic = pd.read_csv('data/Liquor_Authority_Quarterly_List_of_Active_Licenses2018-07-30.csv')
print(mblic.shape)
mblic.head()

Notice that this dataframe has the address as a single column. We'll need to separate it into a house number column and a street column before using geoclient.

In [None]:
addressCol = 'Actual Address of Premises (Address1)'
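# house number: leading digits and hyphens (e.g. hyphenated house numbers)
mblic['houseNo'] = mblic[addressCol].str.extract(r'(^[0-9-]*)',expand=False)
# street: everything from the first whitespace to the end of the string
mblic['street'] = mblic[addressCol].str.extract(r'(\s.+$)',expand=False)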
mblic['borough'] = mblic['County Name (Licensee)']

mblic[[addressCol,'houseNo','street','borough']]
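
To see what these patterns capture, the sketch below applies essentially the same regular expressions to a few strings using Python's built-in re module. The helper name split_address and the sample addresses are made up for illustration, and the sketch strips the leading space from the street, which the cell above does not do.

```python
import re

# Leading digits and hyphens form the house number
house_pattern = re.compile(r'(^[0-9-]*)')
# Everything from the first whitespace to the end is the street
street_pattern = re.compile(r'(\s.+$)')


def split_address(address):
    """Split an address string into (house number, street)."""
    house = house_pattern.search(address).group(1)
    street_match = street_pattern.search(address)
    # The captured street starts with a space, so strip it here
    street = street_match.group(1).strip() if street_match else ''
    return house, street


# Made-up sample addresses, including one with a hyphenated house number
for address in ['253 Broadway', '47-01 111th Street', 'Broadway']:
    print(address, '->', split_address(address))
```

An address with no house number comes back with an empty house number, which is one reason a row might later fail to geocode.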

In [None]:
# make sure geoclient.py is in the current folder and then import to use it here
from geoclient import geoclientBatch

In [None]:
mblic = geoclientBatch(mblic, houseNo='houseNo', street='street', boro='borough')
mblic

Did they all geocode? If not, why?

In [None]:
mblic[mblic.geoBBL=='']