# Geocoding in Python

Maps can be a powerful way to tell stories about your data, yet great datasets for geospatial analysis are rarely complete. At times you may have a dataset that only has an address column, missing key information you’ll need for spatial analysis – Latitude and Longitude coordinates. In this case you’ll need to do some __Geocoding__.

__Geocoding is the computational process of transforming a physical address description to a location on the Earth’s surface (spatial representation in numerical coordinates)__.

This sounds like a daunting process, but it can easily be done with the help of Python and the Bing Maps API. Before diving in, for any of this to work you will first have to acquire a Bing Maps API Key, which you can do by following these [simple steps](https://www.microsoft.com/en-us/maps/create-a-bing-maps-key "click this to get your API Key"). 

## Installing Necessary Libraries and Tools

Next, import the libraries and packages needed which are the geocoder library and the pandas and re packages. The Geocoder package is the perfect candidate to solve this problem since it already deals with multiple geocoding services such as Google, Bing, OSM. Using this package and the Bing API Key, we’ll be able to use Bings data for our geocoding needs.

### Install Libraries

In [1]:
!pip install geocoder





### Install Packages

In [2]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [3]:
import geocoder
import pandas as pd
import re

## Data Preprocessing

### Testing The Geocoder

Now test out the Geocoder package and our API Key to make sure they work. 

We’ll define our geocoder as g and make a request to Bing to geocode a specific address. The Bing API usually needs the address line(Street Address), locality(City/Town), and adminDistrict(State) to geocode the addresses. You also need to pass the developer key you retrieved from Bing through the geocoder.

After running this, your output should be the latitudinal and longitudinal coordinates of the address we used for testing.

In [4]:
## This piece of code is hidden in our HTML Script using tags. 
## This is done by going to 'view' then selecting tags and entering remove-input
## This can be done for outputs by clicking "remove-output", you can hide it as well instead of just removing
APIKey= 'AiBnj44fEKq_ivctmWJQMIngyMz7QZFIXAXtSuWQ6VKz2AMSiPmqHjQf7yEIzvZP'

In [5]:
g = geocoder.bing(None, addressline = '2831 Henderson Rd, Redding, CA 96002', 
                  locality='Redding', adminDistrict='California', 
                  method='details', key= APIKey)
g.json
for result in  g:
    print(result.latlng)

Status code 403 from http://dev.virtualearth.net/REST/v1/Locations: ERROR - 403 Client Error: Forbidden for url: http://dev.virtualearth.net/REST/v1/Locations?adminDistrict=California&locality=Redding&o=json&inclnb=1&key=AiBnj44fEKq_ivctmWJQMIngyMz7QZFIXAXtSuWQ6VKz2AMSiPmqHjQf7yEIzvZP&maxResults=1


As can be seen below, the same can be done to reverse geocode coordinates to retrieve specific address locations for any place in the world. 

In [6]:
g = geocoder.bing([[26.351670, 127.769400], [48.845580, 2.321807]], method='batch_reverse', key= APIKey)

for result in g:
    print(result.address, result.city, result.postal, result.state, result.country)

Status code 403 from http://spatial.virtualearth.net/REST/v1/Dataflows/Geocode: ERROR - 403 Client Error: Forbidden for url: http://spatial.virtualearth.net/REST/v1/Dataflows/Geocode?input=csv&key=AiBnj44fEKq_ivctmWJQMIngyMz7QZFIXAXtSuWQ6VKz2AMSiPmqHjQf7yEIzvZP


### Reading In and Transforming The Data

Now that its been confirmed that the geocoder tool is working, lets read in the excel file of the addresseses we'd like to geocode. 

In [7]:
df = pd.read_excel('addresses.xlsx')

FileNotFoundError: [Errno 2] No such file or directory: 'addresses.xlsx'

In [39]:
df

Unnamed: 0,address
0,"Whitlow Plan, Brookland Grove, Washington, DC ..."
1,"Unit 02 Plan, Aria Reserve Miami, Miami, FL 33137"
2,"Unit 01 A Plan, Aria Reserve Miami, Miami, FL ..."
3,"The Buchanan Plan, The Townhomes At Michigan P..."
4,"Plan 3 Plan, Hudson At Belterra, Austin, TX 78737"
...,...
191,"065-620-017, Redding, CA 96001"
192,"(Undisclosed Address), Washington, DC 20020"
193,"(Undisclosed Address), Miami, FL 33177"
194,"(Undisclosed Address), Miami, FL 33125"


Since the addresses are not formatted in the way necessary for the Bing API to do its work, some data transformation techniques will have to be employed.  Our data frame must have columns for the __address line__(Street Address), __locality__(City/Town), and  __adminDistrict__(State). 
We'll follow the steps below and use __Pandas__ as well as the __re__ packages to clean the data. 
1. Create and define a new Data frame outside of the forloop 
2. Use the re function to split our addresses by commas, which allow us to define the address, street, and city using their location each address string. 
3. Use the re function to split the addresses by spaces, which will allow us to extract the state and zip code using their location in the string
4. After defining all the and cleaning the key data needed, append the newly cleaned observations into the geo Data frame.

In [28]:
geo = pd.DataFrame(columns = ['address','street','city','state','zip','country'])

for i in df.address:
    s= i.split(',',2)
    l = i.split()
    address = i
    street = s[0]
    #print(i)
    city = s[1]
    country = 'U.S.'
    state = l[-2]
    zipcode = l[-1]    
    
    geo = geo.append({'address':i,'street':street,'city':city,'state':state,
                      'zip':zipcode,'country':country}, ignore_index=True)

Columns for the latitude and longitude coordinates the geocoder will retrieve from the the Bing API also needed to be created and added to the dataframe

In [29]:
geo['lat'] = ' '
geo['long'] = ' '

This is how the geo data frame should look like after our mini preprocessing session:

In [14]:
geo

Unnamed: 0,address,street,city,state,zip,country,lat,long
0,"Whitlow Plan, Brookland Grove, Washington, DC ...",Whitlow Plan,Brookland Grove,DC,20017,U.S.,,
1,"Unit 02 Plan, Aria Reserve Miami, Miami, FL 33137",Unit 02 Plan,Aria Reserve Miami,FL,33137,U.S.,,
2,"Unit 01 A Plan, Aria Reserve Miami, Miami, FL ...",Unit 01 A Plan,Aria Reserve Miami,FL,33137,U.S.,,
3,"The Buchanan Plan, The Townhomes At Michigan P...",The Buchanan Plan,The Townhomes At Michigan Park,DC,20017,U.S.,,
4,"Plan 3 Plan, Hudson At Belterra, Austin, TX 78737",Plan 3 Plan,Hudson At Belterra,TX,78737,U.S.,,
...,...,...,...,...,...,...,...,...
191,"065-620-017, Redding, CA 96001",065-620-017,Redding,CA,96001,U.S.,,
192,"(Undisclosed Address), Washington, DC 20020",(Undisclosed Address),Washington,DC,20020,U.S.,,
193,"(Undisclosed Address), Miami, FL 33177",(Undisclosed Address),Miami,FL,33177,U.S.,,
194,"(Undisclosed Address), Miami, FL 33125",(Undisclosed Address),Miami,FL,33125,U.S.,,


## Geocoding The Addresses

Now its time to geocode. We will create a for loop to pass each address through the geocoder object created a couple steps above, except this time, we’ll be passing the column names for each iteration through it. Same as before, the Bing API Developer key will also have to be passed through the geocoder. We will then us the g.lat and g.long functions to request the latitudinal and longitudinal coordinates for all of our addresses.

In [15]:
for i in geo.index:
    try:
        g = geocoder.bing(None, addressline = geo['street'][i], locality= geo['city'][i], adminDistrict= geo['state'][i], PostalCode= geo['zip'][i], Countryregion= geo['country'][i], method='details', key='AqkhJ4a6kTQ6c4rm5vicHt7BsTav-KsWdaaKCnP794yFpzMDWPa1XCp14WUliC20')
        g.json
        geo.loc[i,'lat']=g.lat
        geo.loc[i, 'long']=g.lng
    except:
        geo.loc[i,'lat']=''
        geo.loc[i,'long']=''
    

Now lets take a look at the geo data frame

In [16]:
geo

Unnamed: 0,address,street,city,state,zip,country,lat,long
0,"Whitlow Plan, Brookland Grove, Washington, DC ...",Whitlow Plan,Brookland Grove,DC,20017,U.S.,38.904778,-77.016289
1,"Unit 02 Plan, Aria Reserve Miami, Miami, FL 33137",Unit 02 Plan,Aria Reserve Miami,FL,33137,U.S.,28.595512,-82.487343
2,"Unit 01 A Plan, Aria Reserve Miami, Miami, FL ...",Unit 01 A Plan,Aria Reserve Miami,FL,33137,U.S.,28.595512,-82.487343
3,"The Buchanan Plan, The Townhomes At Michigan P...",The Buchanan Plan,The Townhomes At Michigan Park,DC,20017,U.S.,38.904778,-77.016289
4,"Plan 3 Plan, Hudson At Belterra, Austin, TX 78737",Plan 3 Plan,Hudson At Belterra,TX,78737,U.S.,31.463848,-99.333298
...,...,...,...,...,...,...,...,...
191,"065-620-017, Redding, CA 96001",065-620-017,Redding,CA,96001,U.S.,40.574638,-122.381088
192,"(Undisclosed Address), Washington, DC 20020",(Undisclosed Address),Washington,DC,20020,U.S.,38.892063,-77.019913
193,"(Undisclosed Address), Miami, FL 33177",(Undisclosed Address),Miami,FL,33177,U.S.,25.775084,-80.194702
194,"(Undisclosed Address), Miami, FL 33125",(Undisclosed Address),Miami,FL,33125,U.S.,25.775084,-80.194702


We now have the coordinates for every single one of our addresses and have completed the geocoding process