# Geocoding with APIs

Excellent data analysis is only as good as the quality of data you use. As a Data Scientist, I find myself spend most of my time finding quality data sources and filling gaps in my datasets. This is especially true with Geospatial analysis. Often times you'll find yourself left with a dataset that only has an address column, missing key information necessary for spatial analysis – latitude and longitude coordinates. In this case you’ll need to do some __Geocoding__.

__Geocoding is the computational process of transforming a physical address description to a location on the Earth’s surface (spatial representation in numerical coordinates)__.

This sounds like a daunting process, but it can easily be done with the help of Python and an **API**. API stands for *"Application Programming Interface*." It is a set of rules and protocols that allows different software applications to communicate and interact with each other. APIs enable different systems to exchange data and can be extremely useful for filling gaps in datasets. We'll be using the **Bing Maps API** to geocode our addresses. 

Before diving in you need to get a Bing Maps API Key, which you can do by following these [simple steps](https://www.microsoft.com/en-us/maps/create-a-bing-maps-key "click this to get your API Key"). 

Next, import the libraries and packages needed which are the geocoder library and the pandas and re packages. 

In [1]:
!pip install geocoder

In [3]:
import geocoder
import pandas as pd
import re
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## Testing the Bing Maps API

Now that we have all our tools, test out the Geocoder package and the API Key to make sure they work.

We’ll define our geocoder as **g** and make a request to Bing to geocode a specific address. The Bing Maps API needs the **address line**(Street Address), **locality**(City/Town), and **adminDistrict**(State) to geocode the addresses. You also need to pass the developer key you retrieved from Bing through the geocoder.

After running this, your output should be the latitudinal and longitudinal coordinates of the address we used for testing.

In [5]:
APIKey= 'AhrrKF6BHmGIS3d71YtAb1AbGhXx5dBGhNgcWoS4vdd1apFoj1Spapvdq6r34UhC' 

In [6]:
g = geocoder.bing(None, addressline = '2831 Henderson Rd, Redding, CA 96002', locality='Redding', adminDistrict='California', method='details', key=APIKey)
g.json
for result in  g:
    print(result.latlng)

[40.57463837, -122.38108826]


Pretty cool right? We can also reverse geocode coordinates to retrieve specific address locations for any place in the world. Lets test this out below.

In [5]:
g = geocoder.bing([[26.351670, 127.769400], [48.845580, 2.321807]], method='batch_reverse', key=APIKey)

for result in g:
    print(result.address, result.city, result.postal, result.state, result.country)

Kadena Town, Nakagami County, Okinawa, Japan Kadena Town  Okinawa Japan
114 Rue de Vaugirard, 75006 Paris, France Paris 75006 Île-de-France France


## Data Preprocessing

Now that its been confirmed that the geocoder tool is working, lets read in the excel file of the addresseses we'd like to geocode. Head over to my [github page](https://github.com/AnnetteTamakloe/Geocoding/tree/master) for to download a copy of the file. Make sure to save it in the same folder as the current notebook you are using. 

In [7]:
#Read in your file 
df = pd.read_csv('addresses.csv')

In [12]:
#Take a look at the file
df.head(5)

Unnamed: 0.1,Unnamed: 0,address
0,0,"Whitlow Plan, Brookland Grove, Washington, DC ..."
1,1,"Unit 02 Plan, Aria Reserve Miami, Miami, FL 33137"
2,2,"Unit 01 A Plan, Aria Reserve Miami, Miami, FL ..."
3,3,"The Buchanan Plan, The Townhomes At Michigan P..."
4,4,"Plan 3 Plan, Hudson At Belterra, Austin, TX 78737"


In [11]:
#Lets check the number of addresses in the file
print(len(df))

196


Now lets transform our data to make it easy for our Bing API to do its job.  Our data frame must have columns for the __address line__(Street Address), __locality__(City/Town), and  __adminDistrict__(State). 
We'll use __Pandas__ and the __re__ packages to clean the data. 

In [14]:
#Create an empty list to store our data
geo_data = [] 

#Create a forloop that goes through each address and splices it by street name, city, country, name, zip
for i in df.address:
    s = i.split(',', 2)
    l = i.split()
    address = i
    street = s[0].strip()
    city = s[1].strip()
    country = 'U.S.'
    state = l[-2].strip()
    zipcode = l[-1].strip()
    
#Define our data by passing it through a dictionary to define our columns and put it into a list
    geo_data.append({'address': address, 'street': street, 'city': city, 'state': state,
                     'zip': zipcode, 'country': country})

#Create a DataFrame from the list of dictionaries
geo = pd.DataFrame(geo_data)

We also need to create columns for the latitude and longitude coordinates the geocoder will retrieve from the the Bing API.

In [18]:
geo['lat'] = ' '
geo['long'] = ' '

This is how the our dataset should look like after our data preprocessing session

In [17]:
geo.head()

Unnamed: 0,address,street,city,state,zip,country
0,"Whitlow Plan, Brookland Grove, Washington, DC ...",Whitlow Plan,Brookland Grove,DC,20017,U.S.
1,"Unit 02 Plan, Aria Reserve Miami, Miami, FL 33137",Unit 02 Plan,Aria Reserve Miami,FL,33137,U.S.
2,"Unit 01 A Plan, Aria Reserve Miami, Miami, FL ...",Unit 01 A Plan,Aria Reserve Miami,FL,33137,U.S.
3,"The Buchanan Plan, The Townhomes At Michigan P...",The Buchanan Plan,The Townhomes At Michigan Park,DC,20017,U.S.
4,"Plan 3 Plan, Hudson At Belterra, Austin, TX 78737",Plan 3 Plan,Hudson At Belterra,TX,78737,U.S.


## Geocoding The Addresses

Now its time to geocode. We will create a for loop to pass each address through the geocoder we created a couple steps before, except this time, we’ll be passing the our columns that contain our data through it. Just as before, we'll have to use the API Developer key to make the request to Bing.

In [18]:
for i in geo.index:
    try:
        g = geocoder.bing(None, addressline = geo['street'][i], locality= geo['city'][i], adminDistrict= geo['state'][i], PostalCode= geo['zip'][i], Countryregion= geo['country'][i], method='details', key='AqkhJ4a6kTQ6c4rm5vicHt7BsTav-KsWdaaKCnP794yFpzMDWPa1XCp14WUliC20')
        g.json
        geo.loc[i,'lat']=g.lat
        geo.loc[i, 'long']=g.lng
    except:
        geo.loc[i,'lat']=' '
        geo.loc[i,'long']=' ' 

Now lets take a look out our dataframe geo. The lat and long columns should now be populated with the latitudinal and longitudinal coordinates for each of our addresses.

In [15]:
geo

Unnamed: 0,address,street,city,state,zip,country,lat,long
0,"Whitlow Plan, Brookland Grove, Washington, DC ...",Whitlow Plan,Brookland Grove,DC,20017,U.S.,38.904778,-77.016289
1,"Unit 02 Plan, Aria Reserve Miami, Miami, FL 33137",Unit 02 Plan,Aria Reserve Miami,FL,33137,U.S.,28.595512,-82.487343
2,"Unit 01 A Plan, Aria Reserve Miami, Miami, FL ...",Unit 01 A Plan,Aria Reserve Miami,FL,33137,U.S.,28.595512,-82.487343
3,"The Buchanan Plan, The Townhomes At Michigan P...",The Buchanan Plan,The Townhomes At Michigan Park,DC,20017,U.S.,38.904778,-77.016289
4,"Plan 3 Plan, Hudson At Belterra, Austin, TX 78737",Plan 3 Plan,Hudson At Belterra,TX,78737,U.S.,31.463848,-99.333298
...,...,...,...,...,...,...,...,...
191,"065-620-017, Redding, CA 96001",065-620-017,Redding,CA,96001,U.S.,40.574638,-122.381088
192,"(Undisclosed Address), Washington, DC 20020",(Undisclosed Address),Washington,DC,20020,U.S.,38.892063,-77.019913
193,"(Undisclosed Address), Miami, FL 33177",(Undisclosed Address),Miami,FL,33177,U.S.,25.775084,-80.194702
194,"(Undisclosed Address), Miami, FL 33125",(Undisclosed Address),Miami,FL,33125,U.S.,25.775084,-80.194702


And with that, we have completed the geocoding process. With this dataset, you should be able to make rich visualizations filled with detail and meaningful data analysis in tools like Tableau, Python or R. 