## Geocoding addresses in Python with geopy
This notebook demonstrates a simple Python workflow for geocoding addresses and mapping the results.

### Install the geocoding and mapping libraries

- [geopy](https://geopy.readthedocs.io/), which provides a consistent interface to different geocoding APIs
- [folium](https://github.com/python-visualization/folium), a Python library for making interactive maps

In [2]:
!pip install geopy
!pip install folium

Collecting geopy
  Downloading geopy-2.1.0-py3-none-any.whl (112 kB)
Collecting geographiclib<2,>=1.49
  Downloading geographiclib-1.50-py3-none-any.whl (38 kB)
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-2.1.0


## Import the libraries

In [11]:
import pandas as pd
from geopy.geocoders import Nominatim
import folium
import requests

## Create a geocoder that uses the OpenStreetMap Nominatim API

In [9]:
geolocator = Nominatim(timeout=10, user_agent="denironyx@gmail.com")

## Test the geocoder with one address
location = geolocator.geocode('Barrows Hall, Berkeley CA')
location

Location(Social Sciences Building, Eshleman Road, Southside, Berkeley, Alameda County, California, 94720-1076, United States, (37.87010975, -122.25792906636786, 0.0))
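
The object returned by `geocode` exposes the matched address and coordinates as the attributes `address`, `latitude`, and `longitude`, and `geocode` returns `None` when nothing matches. A minimal sketch of reading those fields safely follows; the helper name `summarize_location` is hypothetical and not part of geopy.

```python
def summarize_location(location):
    """Return (address, latitude, longitude), or None when there was no match.

    `location` is expected to look like a geopy result: an object with
    `address`, `latitude`, and `longitude` attributes, or None.
    """
    if location is None:
        return None
    return (location.address, location.latitude, location.longitude)
```

In this notebook it could be called as `summarize_location(location)` right after the test query.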

## Download a CSV file of addresses to geocode
Fetch the data to geocode with `requests` and save it to a local file.

In [6]:
url = "https://raw.githubusercontent.com/dlab-geo/geocoding/master/address_data/SFLandmarks.csv"
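r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop early if the download failed
# Write the downloaded bytes to a local file and close it when done
with open('SFLandmarks.csv', 'wb') as f:
    f.write(r.content)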

In [7]:
ls

 Volume in drive C is Windows
 Volume Serial Number is 5CCB-DD86

 Directory of c:\Users\Dee\root\Projects\personal_real_projects\geocoding\geocoderpy

06/11/2021  01:43 PM    <DIR>          .
06/11/2021  01:43 PM    <DIR>          ..
06/11/2021  01:08 PM    <DIR>          Include
06/11/2021  01:08 PM    <DIR>          Lib
06/11/2021  01:45 PM             4,978 nominatim_geocode.ipynb
06/11/2021  01:08 PM                75 pyvenv.cfg
06/11/2021  01:09 PM    <DIR>          Scripts
06/11/2021  01:43 PM               156 SFLandmarks.csv
               3 File(s)          5,209 bytes
               5 Dir(s)  33,762,004,992 bytes free


### Read in the file with `pandas`

In [8]:
df = pd.read_csv('SFLandmarks.csv')
print(df)

   ID          Landmark           City State
0   1      Union Square  San Francisco    CA
1   2        Coit Tower  San Francisco    CA
2   3  Golden Gate Park  San Francisco    CA
3   4        Twin Peaks  San Francisco    CA


### Geocode the addresses in the pandas DataFrame

The next cell does the following:

- Iterates over all rows in the DataFrame `df`.
- Joins the values in the columns `Landmark`, `City`, and `State` into one string (the full address).
- Submits that string as the address to be geocoded.
- Saves the results to the `geocodes` list.

In [11]:
# Build "Landmark, City, State" for each row and geocode it
geocodes = [
    geolocator.geocode(', '.join([row['Landmark'], row['City'], row['State']]))
    for _, row in df.iterrows()
]
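
The public Nominatim service asks clients to send no more than one request per second, and a plain list comprehension issues its requests back to back. The sketch below shows one way to pause between requests. The names `geocode_addresses` and `delay_seconds` are hypothetical, and the geocoder is passed in as a callable so that any function taking an address string can be used.

```python
import time

def geocode_addresses(rows, geocode, delay_seconds=1.0):
    """Geocode an iterable of (landmark, city, state) tuples, one at a time.

    `geocode` is any callable that takes an address string and returns a
    result (or None), for example `geolocator.geocode`.
    """
    results = []
    for i, parts in enumerate(rows):
        if i > 0:
            # Pause between consecutive requests, not before the first one
            time.sleep(delay_seconds)
        address = ', '.join(str(part) for part in parts)
        results.append(geocode(address))
    return results
```

With the DataFrame from this notebook, a call might look like `geocode_addresses(df[['Landmark', 'City', 'State']].itertuples(index=False), geolocator.geocode)`. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that serves a similar purpose.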

Geocoding output:

In [12]:
geocodes

[Location(Union Square, San Francisco, San Francisco City and County, San Francisco, California, United States, (37.7879363, -122.40751740318035, 0.0)),
 Location(Coit Tower, Telegraph Hill Boulevard, Telegraph Hill, San Francisco, San Francisco City and County, San Francisco, California, 94113, United States, (37.80237905, -122.40583435461313, 0.0)),
 Location(Golden Gate Park, San Francisco City and County, San Francisco, California, 94118-4504, United States, (37.769368099999994, -122.48218371117709, 0.0)),
 Location(Twin Peaks, c, Christmas Tree Point Road, San Francisco, San Francisco City and County, San Francisco, California, 94114-1818, United States, (37.75464, -122.44648, 0.0))]

Add the `latitude` and `longitude` values from the `geocodes` list to the `df` DataFrame.

In [13]:
df['lat'] = [g.latitude for g in geocodes]
df['lon'] = [g.longitude for g in geocodes]
df

   ID          Landmark           City State        lat         lon
0   1      Union Square  San Francisco    CA  37.787936 -122.407517
1   2        Coit Tower  San Francisco    CA  37.802379 -122.405834
2   3  Golden Gate Park  San Francisco    CA  37.769368 -122.482184
3   4        Twin Peaks  San Francisco    CA  37.754640 -122.446480
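
All four landmarks matched here, but an address with no match comes back as `None`, and reading `.latitude` on `None` raises an `AttributeError`. The sketch below is one possible way to keep the columns aligned with the rows by filling unmatched entries with NaN; the helper name `coordinates_or_nan` is hypothetical.

```python
import math

def coordinates_or_nan(results):
    """Split geocoding results into parallel latitude and longitude lists.

    Unmatched entries (None) become NaN so both lists stay aligned with
    the rows they came from.
    """
    lats, lons = [], []
    for result in results:
        if result is None:
            lats.append(math.nan)
            lons.append(math.nan)
        else:
            lats.append(result.latitude)
            lons.append(result.longitude)
    return lats, lons
```

In the notebook this could be used as `df['lat'], df['lon'] = coordinates_or_nan(geocodes)`, after which `df[df['lat'].isna()]` would list the rows that failed to geocode.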


## Map the output
Make a map centered on San Francisco, then add a marker for each geocoded landmark.

In [18]:
map1 = folium.Map(location=(37.754640, -122.446480), zoom_start=12)
for index, row in df.iterrows():
    # Add the geocoded locations to the map
    folium.Marker(location=(row['lat'], row['lon']), popup=row['Landmark']).add_to(map1)

display(map1)
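
The map center above is hard-coded to the Twin Peaks coordinates. For other datasets, one option is to derive the center from the geocoded points themselves. The sketch below averages the valid coordinates; `map_center` is a hypothetical helper, and a simple mean is only a reasonable assumption for points clustered in a small area, as they are here.

```python
import math
from statistics import mean

def map_center(lats, lons):
    """Return the mean (lat, lon) of the valid coordinate pairs.

    Pairs containing None or NaN are skipped.
    """
    pairs = [
        (lat, lon)
        for lat, lon in zip(lats, lons)
        if lat is not None and lon is not None
        and not math.isnan(lat) and not math.isnan(lon)
    ]
    if not pairs:
        raise ValueError("no valid coordinates to center on")
    return (mean(p[0] for p in pairs), mean(p[1] for p in pairs))
```

The result could then be passed to the map constructor, for example `folium.Map(location=map_center(df['lat'], df['lon']), zoom_start=12)`.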