# Geocoding Bond Issuing Entities

Our bond issue data has quite a few variables, but location is not one of them (not even the state). To associate these bond issues with the right counties, we need the coordinates of each issuer. In this notebook, we will use the [Google Maps API for Python](https://developers.google.com/api-client-library/python/apis/mapsengine/v1?hl=en) to capture lat-long info. To do so, we need a new project in the [Google Developers Console](https://console.developers.google.com/project); once it is created, we can enable any Google API we need. In this case, we need the *Google Maps Geocoding API*.

To get set up, enable the geocoding API, and then acquire an API key (under credentials).  There are four types of API keys available, and this particular API requires a *server key*.
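Once a key exists, one option is to keep it out of the notebook itself and read it from the environment. The sketch below assumes the key has been stored in an environment variable; the variable name `GOOGLE_MAPS_API_KEY` and the helper `get_api_key` are hypothetical and not part of any Google library.

```python
import os

def get_api_key(env_var='GOOGLE_MAPS_API_KEY'):
    '''Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.'''
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a readable message instead of sending a keyless request
        raise RuntimeError('Set the %s environment variable first.' % env_var)
    return key
```

The returned value could then be passed as the `'key'` entry of the request parameters instead of a literal string.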

In [1]:
import time
import sys
import requests
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

The first thing we are going to do is build a function that can leverage the API call and return the relevant latitude and longitude information.

In [2]:
def latlong(point_of_interest):
    '''Function returns the latitude and longitude of a given point of interest.'''
    #Set base URL
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    #Set parameters for call to API (which are appended to the base)
    params = {'sensor': 'false',
              'address': point_of_interest,
              'key':'AIzaSyDNBURbQuMN82m3Hq0KTMXPyOmsdQP5mSA'}
    #Make call to API
    r = requests.get(url, params=params)
    #Capture results
    results = r.json()['results']
    print results
    #Capture lat-long from results
    location = results[0]['geometry']['location']
    return location['lat'], location['lng']

Let's run a test case using a simple, unambiguous query: *'Chicago'*.

In [3]:
latlong('Chicago')

[{u'geometry': {u'location_type': u'APPROXIMATE', u'bounds': {u'northeast': {u'lat': 42.023131, u'lng': -87.52404399999999}, u'southwest': {u'lat': 41.6443349, u'lng': -87.9402669}}, u'viewport': {u'northeast': {u'lat': 42.023131, u'lng': -87.52404399999999}, u'southwest': {u'lat': 41.6443349, u'lng': -87.9402669}}, u'location': {u'lat': 41.8781136, u'lng': -87.6297982}}, u'address_components': [{u'long_name': u'Chicago', u'types': [u'locality', u'political'], u'short_name': u'Chgo'}, {u'long_name': u'Cook County', u'types': [u'administrative_area_level_2', u'political'], u'short_name': u'Cook County'}, {u'long_name': u'Illinois', u'types': [u'administrative_area_level_1', u'political'], u'short_name': u'IL'}, {u'long_name': u'United States', u'types': [u'country', u'political'], u'short_name': u'US'}], u'place_id': u'ChIJ7cv00DwsDogRAMDACa2m4K8', u'formatted_address': u'Chicago, IL, USA', u'types': [u'locality', u'political']}]


(41.8781136, -87.6297982)
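The Geocoding API response also carries a top-level `status` field (for example `OK`, `ZERO_RESULTS`, or `OVER_QUERY_LIMIT`) alongside `results`. As a minimal sketch, the parsing step could check that field before indexing into the results. The helper name `parse_latlong` is hypothetical, and the sample payload is a trimmed copy of the Chicago response above.

```python
def parse_latlong(payload):
    '''Return (lat, lng) from a decoded Geocoding API response.

    Returns None when the API found no hits, and raises for any
    other non-OK status (e.g. a quota or key problem).'''
    status = payload.get('status')
    if status == 'ZERO_RESULTS':
        return None
    if status != 'OK':
        raise RuntimeError('Geocoding request failed with status %s' % status)
    # Take the coordinates of the first (best) hit
    location = payload['results'][0]['geometry']['location']
    return location['lat'], location['lng']

# Trimmed-down version of the Chicago response shown above
sample = {'status': 'OK',
          'results': [{'geometry': {'location': {'lat': 41.8781136,
                                                 'lng': -87.6297982}}}]}
print(parse_latlong(sample))  # (41.8781136, -87.6297982)
```

Separating "no hit" from "request refused" makes it easier to tell a hard-to-geocode issuer apart from an exhausted quota.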

A similar function that extracts the county of the search term would also be useful.

In [73]:
def county_id(point_of_interest,state=None):
    '''Function returns the county of a given point of interest.

    If a state abbreviation (e.g. 'MI') is provided, the county is taken from
    the first hit in that state (None if no hit matches).'''
    #Set base URL
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    #Set parameters for call to API (which are appended to the base)
    params = {'sensor': 'false',
              'address': point_of_interest,
              'key':'AIzaSyDNBURbQuMN82m3Hq0KTMXPyOmsdQP5mSA'}
    #Make call to API
    r = requests.get(url, params=params)
    #Capture results
    results = r.json()['results']
#     print results
#     print len(results)
    #If a state is provided...
    if state is not None:
        #...for each hit...
        for res in results:
            #...capture the state and county...
            res_st=[comp['short_name'] for comp in res['address_components'] \
                    if comp['types'][0]=='administrative_area_level_1']
            res_co=[comp['short_name'] for comp in res['address_components'] \
                    if comp['types'][0]=='administrative_area_level_2']
            #...if the state matches (and the hit has a county)...
            if state in res_st and res_co:
                #...return the county...
                return res_co[0]
        #...no hit matched the requested state
        return None
    else:
        #Capture county from the first hit
        res_co=[comp['short_name'] for comp in results[0]['address_components'] \
                    if comp['types'][0]=='administrative_area_level_2']
        return res_co[0]

In [75]:
county_id('Gennessee')

[{u'geometry': {u'location_type': u'APPROXIMATE', u'bounds': {u'northeast': {u'lat': 43.136531, u'lng': -83.5765409}, u'southwest': {u'lat': 43.047334, u'lng': -83.69458100000001}}, u'viewport': {u'northeast': {u'lat': 43.136531, u'lng': -83.5765409}, u'southwest': {u'lat': 43.047334, u'lng': -83.69458100000001}}, u'location': {u'lat': 43.0935147, u'lng': -83.6135572}}, u'formatted_address': u'Genesee Charter Township, MI, USA', u'place_id': u'ChIJ8egdcd-GI4gRWafOJaS_6K0', u'address_components': [{u'long_name': u'Genesee charter Township', u'types': [u'locality', u'political'], u'short_name': u'Genesee Charter Township'}, {u'long_name': u'Genesee County', u'types': [u'administrative_area_level_2', u'political'], u'short_name': u'Genesee County'}, {u'long_name': u'Michigan', u'types': [u'administrative_area_level_1', u'political'], u'short_name': u'MI'}, {u'long_name': u'United States', u'types': [u'country', u'political'], u'short_name': u'US'}], u'partial_match': True, u'types': [u'lo

u'Genesee County'

In [65]:
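#Set base URL (it is otherwise only defined inside the functions above)
url = 'https://maps.googleapis.com/maps/api/geocode/json'
#Set parameters for call to API
params = {'sensor': 'false',
          'address': 'Central Michigan University',
          'key':'AIzaSyDNBURbQuMN82m3Hq0KTMXPyOmsdQP5mSA'}
#Make call to API
r = requests.get(url, params=params)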
#Capture results
results = r.json()['results']
for comp in results[0]['address_components']:
    if comp['types'][0]=='administrative_area_level_2':
        print comp['long_name']
    elif comp['types'][0]=='administrative_area_level_1':
        print comp['long_name']
        
[comp['long_name'] for comp in results[0]['address_components'] if comp['types'][0]=='administrative_area_level_2']

Isabella County
Michigan


[u'Isabella County']
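The lookups above compare only the first entry of each component's `types` list. A slightly more defensive variant, sketched here under the assumption that the relevant type may appear anywhere in that list, uses a membership test instead. The helper name `component_name` is hypothetical, and the sample data is copied from the Central Michigan University response.

```python
def component_name(result, component_type, field='short_name'):
    '''Return the requested field of the first address component whose
    types include component_type, or None if there is no such component.'''
    for comp in result['address_components']:
        if component_type in comp['types']:
            return comp[field]
    return None

# Subset of the address components returned for Central Michigan University
sample_result = {'address_components': [
    {'long_name': 'Mount Pleasant', 'short_name': 'Mt Pleasant',
     'types': ['locality', 'political']},
    {'long_name': 'Isabella County', 'short_name': 'Isabella County',
     'types': ['administrative_area_level_2', 'political']},
    {'long_name': 'Michigan', 'short_name': 'MI',
     'types': ['administrative_area_level_1', 'political']}]}

county = component_name(sample_result, 'administrative_area_level_2')
state = component_name(sample_result, 'administrative_area_level_1')
print(county)  # Isabella County
print(state)   # MI
```

Returning `None` rather than indexing into an empty list avoids an `IndexError` for hits that carry no county at all.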

In [66]:
results

[{u'address_components': [{u'long_name': u'Central Michigan University',
    u'short_name': u'Central Michigan University',
    u'types': [u'point_of_interest', u'establishment']},
   {u'long_name': u'1200',
    u'short_name': u'1200',
    u'types': [u'street_number']},
   {u'long_name': u'South Franklin Street',
    u'short_name': u'S Franklin St',
    u'types': [u'route']},
   {u'long_name': u'Mount Pleasant',
    u'short_name': u'Mt Pleasant',
    u'types': [u'locality', u'political']},
   {u'long_name': u'Isabella County',
    u'short_name': u'Isabella County',
    u'types': [u'administrative_area_level_2', u'political']},
   {u'long_name': u'Michigan',
    u'short_name': u'MI',
    u'types': [u'administrative_area_level_1', u'political']},
   {u'long_name': u'United States',
    u'short_name': u'US',
    u'types': [u'country', u'political']},
   {u'long_name': u'48859',
    u'short_name': u'48859',
    u'types': [u'postal_code']}],
  u'formatted_address': u'Central Michigan Univer

Now let's read in our data and capture a list of locations.

**UPDATE:** We need to first read in the file with the locations we already have.  We will use this processed list to restrict the issuer list so that we do not waste our API calls on locations we already have.  Once we get our new coordinates, we will append them to the end of the file and write it back to disk.

In [None]:
#Read in data
bonds=pd.read_csv('bonds.csv',index_col=['DomicileNationName'])

#Subset to US
bonds_us=bonds.loc[['-','United States']]

#Read in processed list
exist_coords=pd.read_csv('current_issue_geocode_list.csv')
print exist_coords.head()

#Capture list of issuers
issuers=list(bonds_us['Issuer'].values)

len(issuers),issuers
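The restriction described in the update, skipping issuers whose coordinates are already on disk, could look something like the sketch below. It works on plain Python lists so that it does not depend on how the processed file is laid out; the function name `remaining_issuers` and the example issuer lists are illustrative only.

```python
def remaining_issuers(issuers, already_geocoded):
    '''Return the issuers that still need a lookup, in their original order.

    Repeated issuers are returned only once, so each costs one API call.'''
    done = set(already_geocoded)
    seen = set()
    todo = []
    for iss in issuers:
        if iss not in done and iss not in seen:
            seen.add(iss)
            todo.append(iss)
    return todo

# Illustrative input: one issuer appears twice, one is already geocoded
all_issuers = ['Dallas Texas', 'Erie Sewer Authority', 'Dallas Texas', 'New York']
have_coords = ['Erie Sewer Authority']
print(remaining_issuers(all_issuers, have_coords))  # ['Dallas Texas', 'New York']
```

Dropping repeats matters here because the issuer list is built from bond issues, so the same issuer can appear many times.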

Now, let's roll through the issuers and capture lat-long info.  Let's see how many we can get.

In [6]:
#Generate dict to pair issuer and lat-long
issue_ll={}

#Generate dict to capture bad matches
bad_matches={}

#Count consecutive failures (must be defined before the first failure)
bad_batch_cnt=0

#For each issuer...
for i,iss in enumerate(issuers):
    try:
        #...capture latlong...
        issue_ll.update({iss:latlong(iss)})
        print i,'|',iss, 'location captured'
        time.sleep(.1)
        #...and reset the count of consecutive failures
        bad_batch_cnt=0
    except Exception as e:
        print i,'| ***',iss, 'proved problematic***'
        #...log the reason it didn't work
        bad_matches.update({iss:e})
        bad_batch_cnt+=1
        #Stop after more than 100 failures in a row
        if bad_batch_cnt>100:
            break

0 | Province of Quebec location captured
1 | Corrections Corp of America location captured
2 | Miami-Dade Co Indus Dev Auth location captured
3 | *** Redford Brownfield Redv Auth proved problematic***
4 | New York City-New York location captured
5 | Erie Sewer Authority location captured
6 | Dallas Texas location captured
7 | Portsmouth City-Virginia location captured
8 | Iowa State Board of Regents location captured
9 | *** Northwest Alabama Gas Dt proved problematic***
10 | Macomb Co-Michigan location captured
11 | RiversideCoPublicFinAuthority location captured
12 | Edinburg Loc Gov Fin Corp location captured
13 | Edinburg Econ Dev Corp location captured
14 | San Bernardino Co (Rialto) USD location captured
15 | Avon Lake City-Ohio location captured
16 | New York location captured
17 | Lancaster Redevelopment Agency location captured
18 | New Jersey Economic Dev Auth location captured
19 | Hugoton City-Kansas location captured
20 | West Des Moines City-Iowa location captured
21 | Il

How many bad matches do we have?  (Note that we stopped requesting lat-long info once we hit more than 100 bad requests in a row, which suggests we reached our request limit.)

In [7]:
len(set(bad_matches.keys()))

436

And how many did we actually capture?

In [8]:
len(set(issue_ll.keys()))

1413

How many issuers are there total?

In [9]:
len(issuers)

20318

## Saving the Captured Locations

As seen above, we have a limited number of requests, so covering all applicable records will take multiple days.  To avoid requesting locations we already have, we need to remove them from the request list before each new run, which means storing the issuer/lat-long pairs.  We can do this by wrapping the dictionary in a Series and writing it to disk.  The next time, we can read in the most current list and drop the issuers we already have.

In [12]:
Series(issue_ll).to_csv('current_issue_geocode_list.csv')
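When the stored list is read back on a later day, the new coordinates have to be combined with the old ones before writing. The sketch below makes two assumptions that should be checked against the actual file: that each lat-long tuple was written as its string form, such as `"(41.8781136, -87.6297982)"`, and that newer lookups should win over older ones. Both helper names are hypothetical.

```python
import ast

def parse_stored_coord(text):
    '''Turn a stored string such as "(41.8781136, -87.6297982)"
    back into a (lat, lng) tuple of floats.'''
    lat, lng = ast.literal_eval(text)
    return float(lat), float(lng)

def merge_coords(existing, new):
    '''Combine previously stored coordinates with newly captured ones.

    Entries in new overwrite entries in existing with the same issuer.'''
    merged = dict(existing)
    merged.update(new)
    return merged

# Illustrative data: one pair read back from disk, one captured today
stored = {'Issuer A': parse_stored_coord('(41.8781136, -87.6297982)')}
captured = {'Issuer B': (43.0935147, -83.6135572)}
merged = merge_coords(stored, captured)
print(len(merged))  # 2
```

The merged dictionary can then be wrapped in a `Series` and written to disk in the same way as the cell above.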