
## Geocoding addresses in Python with Geopy

This notebook demonstrates a simple Python workflow for geocoding addresses and mapping the results.

### Install the geocoding and mapping libraries we will use.

- [geopy](https://geopy.readthedocs.io/), which provides a consistent interface to many different geocoding APIs
- [folium](https://github.com/python-visualization/folium), a Python library for making interactive maps

In [None]:
# Geocoding library 
!pip install geopy



In [None]:
# Install folium
!pip install folium




### Import the libraries we just installed.

In [None]:
import pandas as pd
from geopy.geocoders import Nominatim
import folium

### Create a geocoder that uses the [OpenStreetMap Nominatim API](https://wiki.openstreetmap.org/wiki/Nominatim/FAQ)

In [None]:
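# timeout is in seconds; user_agent should identify your application,
# as the Nominatim usage policy requires
geolocator = Nominatim(timeout=10, user_agent="dlab.berkeley.edu-workshop")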

Test the geocoder with one address.

In [None]:
location = geolocator.geocode('Barrows Hall, Berkeley CA')
location

Location(Barrows Hall, Eshleman Road, Southside, Berkeley, Alameda County, California, 94720, United States of America, (37.87010715, -122.25792696947403, 0.0))

### Download a CSV file of addresses to geocode

Fetch the data to geocode with `wget`.
- Alternatively, you can download this file [here](https://raw.githubusercontent.com/dlab-geo/geocoding/master/address_data/SFLandmarks.csv)

In [None]:
!wget https://raw.githubusercontent.com/dlab-geo/geocoding/master/address_data/SFLandmarks.csv

--2020-07-08 07:08:58--  https://raw.githubusercontent.com/dlab-geo/geocoding/master/address_data/SFLandmarks.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 156 [text/plain]
Saving to: ‘SFLandmarks.csv.1’


2020-07-08 07:08:59 (8.75 MB/s) - ‘SFLandmarks.csv.1’ saved [156/156]



In [None]:
!ls

oak_liquor_stores.csv  sample_data  SFLandmarks.csv  SFLandmarks.csv.1


### Read in the file with `pandas`


In [None]:
df = pd.read_csv('SFLandmarks.csv')
print(df)

   ID          Landmark           City State
0   1      Union Square  San Francisco    CA
1   2        Coit Tower  San Francisco    CA
2   3  Golden Gate Park  San Francisco    CA
3   4        Twin Peaks  San Francisco    CA


### Geocode the addresses in the pandas DataFrame

The next cell does the following:

- Iterates over all rows in the DataFrame `df`.
- Joins the values in the `Landmark`, `City`, and `State` columns into one comma-separated string (the full address).
- Submits that string to the geocoder.
- Saves the results in the list `geocodes`.

In [None]:
geocodes = [geolocator.geocode(', '.join([landmark, city, state]))
            for landmark, city, state in zip(df['Landmark'], df['City'], df['State'])]


Take a look at the output.

In [None]:
geocodes

[Location(Union Square, San Francisco, San Francisco City and County, California, United States of America, (37.7879363, -122.40751740318035, 0.0)),
 Location(Coit Tower, Telegraph Hill Boulevard, Telegraph Hill, San Francisco, San Francisco City and County, California, 94113, United States of America, (37.80237905, -122.40583435461313, 0.0)),
 Location(Golden Gate Park, Richmond District, San Francisco, San Francisco City and County, California, 94118-4504, United States of America, (37.769368099999994, -122.48218371117709, 0.0)),
 Location(Twin Peaks, Christmas Tree Point Road, Cole Valley, San Francisco, San Francisco City and County, California, 94114-1818, United States of America, (37.75464, -122.44648, 0.0))]

Add the `latitude` and `longitude` values from the `geocodes` list to the `df` DataFrame as new columns.

In [None]:
df['lat'] = [g.latitude for g in geocodes]
df['lon'] = [g.longitude for g in geocodes]
df

| | ID | Landmark | City | State | lat | lon |
|---|---|---|---|---|---|---|
| 0 | 1 | Union Square | San Francisco | CA | 37.787936 | -122.407517 |
| 1 | 2 | Coit Tower | San Francisco | CA | 37.802379 | -122.405834 |
| 2 | 3 | Golden Gate Park | San Francisco | CA | 37.769368 | -122.482184 |
| 3 | 4 | Twin Peaks | San Francisco | CA | 37.75464 | -122.44648 |
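If Nominatim finds no match for a query, `geolocator.geocode` returns `None` instead of raising an error, so `g.latitude` in the cell above would fail with an `AttributeError` for that row. A minimal sketch of a more defensive version, using a hypothetical helper `coords_or_none`, keeps a failed lookup as `(None, None)` so every row stays aligned with `df`:

```python
from types import SimpleNamespace


def coords_or_none(location):
    """Return (latitude, longitude) for a geopy Location, or (None, None) for None."""
    # geopy's geocode() returns None when the query has no match
    if location is None:
        return None, None
    return location.latitude, location.longitude


# In the notebook, with the real `geocodes` list:
# df['lat'], df['lon'] = zip(*[coords_or_none(g) for g in geocodes])

# Offline demo: SimpleNamespace stands in for a geopy Location object
demo = [SimpleNamespace(latitude=37.75464, longitude=-122.44648), None]
print([coords_or_none(g) for g in demo])
```

The helper only relies on the `latitude` and `longitude` attributes, which is why a `SimpleNamespace` works as a stand-in for testing.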


### Map the output

First, make a map centered on San Francisco, then add a marker for each geocoded landmark. Tip: don't name your map `map`, because that would shadow Python's built-in `map` function.

In [None]:
# Center the map on San Francisco (these are the Twin Peaks coordinates)
map1 = folium.Map(location=(37.754640, -122.446480), zoom_start=12)

# Add a marker for each geocoded location, using its name as the popup text
for _, row in df.iterrows():
    folium.Marker(location=(row['lat'], row['lon']), popup=row['Landmark']).add_to(map1)

display(map1)
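The map above is centered on hard-coded coordinates. As an alternative, folium's `Map.fit_bounds` can zoom the map so that all markers are visible; it takes the southwest and northeast corners as `[[south, west], [north, east]]`. A minimal sketch of computing that box from the geocoded points, with a hypothetical helper `bounding_box` that skips missing coordinates (it assumes the points do not straddle the 180th meridian):

```python
import math


def bounding_box(points):
    """Return [[south, west], [north, east]] for an iterable of (lat, lon) pairs.

    Pairs with a missing coordinate (None or NaN) are skipped.
    """
    valid = [
        (lat, lon) for lat, lon in points
        if lat is not None and lon is not None
        and not (math.isnan(lat) or math.isnan(lon))
    ]
    if not valid:
        raise ValueError("no valid coordinates to bound")
    lats = [lat for lat, _ in valid]
    lons = [lon for _, lon in valid]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]
```

For example, after adding the markers you could call `map1.fit_bounds(bounding_box(zip(df['lat'], df['lon'])))`.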

Click on the markers above to view the contents of each popup.

## Next steps - Geocode street addresses

You can use this basic workflow to geocode named places, ZIP codes, or street addresses, depending on what the API you use supports. The `geopy` documentation lists the geocoding services it supports. Read the documentation for the service you choose carefully, because most require an API key and impose usage limits. Nominatim, for example, does not need a key, but its usage policy allows at most one request per second.
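Loops like the ones in this notebook send requests back to back, so against Nominatim they should pause between calls. geopy ships a helper for this, `geopy.extra.rate_limiter.RateLimiter` (for example `RateLimiter(geolocator.geocode, min_delay_seconds=1)`). The sketch below shows the same idea in plain Python with a hypothetical `geocode_all` function that accepts any geocoding callable, so it can be tried without network access:

```python
import time


def geocode_all(queries, geocode, min_delay_seconds=1.0):
    """Call geocode(q) for each query, waiting at least min_delay_seconds between calls."""
    results = []
    last_call = None
    for q in queries:
        if last_call is not None:
            # Sleep only for the part of the delay that has not already elapsed
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.append(geocode(q))
    return results


# Usage in the notebook (at most one request per second to Nominatim):
# geocodes = geocode_all(df['addr'], geolocator.geocode)
```

Measuring from the start of the previous call, rather than sleeping a fixed second after each one, avoids adding extra delay when the geocoder itself is slow.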

So, let's fetch some street address data.

In [None]:
!wget https://raw.githubusercontent.com/dlab-geo/geocoding/master/address_data/oak_liquor_stores.csv

--2020-07-08 07:09:02--  https://raw.githubusercontent.com/dlab-geo/geocoding/master/address_data/oak_liquor_stores.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1163 (1.1K) [text/plain]
Saving to: ‘oak_liquor_stores.csv.1’


2020-07-08 07:09:03 (65.2 MB/s) - ‘oak_liquor_stores.csv.1’ saved [1163/1163]



In [None]:
df = pd.read_csv('oak_liquor_stores.csv')

In [None]:
df

| | id | name | street | city | state | zip | type |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Wah Fay Liquors | 2101 8th Ave | Oakland | CA | 94606 | p |
| 1 | 2 | Vision Liquor | 1615 Macarthur Blvd | Oakland | CA | 94602 | p |
| 2 | 3 | Souza's Liquors | 394 12th St | Oakland | CA | 94607 | p |
| 3 | 4 | Tk Liquors | 1500 23th Ave | Oakland | CA | 94606 | p |
| 4 | 5 | Quadriga Wines Inc | 6193 Ridgemont Dr | Oakland | CA | 94619 | p |
| 5 | 6 | Bev Mo | 525 Embarcadero W | Oakland | CA | 94607 | c |
| 6 | 7 | Fairfax Liquor | 5403 Foothill Blvd | Oakland | CA | 94601 | p |
| 7 | 8 | Saleen Market | 1200 78th Ave | Oakland | CA | 94621 | m |
| 8 | 9 | Park Liquors | 828 Franklin St | Oakland | CA | 94607 | p |
| 9 | 10 | Los Camellos | 5913 International Blvd | Oakland | CA | 94621 | p |


In [None]:
# Combine street, city, and state into one address string, e.g. "2101 8th Ave Oakland, CA"
df['addr'] = (df['street'] + ' ' + df['city'] + ', ' + df['state']).str.strip()

In [None]:
df

| | id | name | street | city | state | zip | type | addr |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wah Fay Liquors | 2101 8th Ave | Oakland | CA | 94606 | p | 2101 8th Ave Oakland, CA |
| 1 | 2 | Vision Liquor | 1615 Macarthur Blvd | Oakland | CA | 94602 | p | 1615 Macarthur Blvd Oakland, CA |
| 2 | 3 | Souza's Liquors | 394 12th St | Oakland | CA | 94607 | p | 394 12th St Oakland, CA |
| 3 | 4 | Tk Liquors | 1500 23th Ave | Oakland | CA | 94606 | p | 1500 23th Ave Oakland, CA |
| 4 | 5 | Quadriga Wines Inc | 6193 Ridgemont Dr | Oakland | CA | 94619 | p | 6193 Ridgemont Dr Oakland, CA |
| 5 | 6 | Bev Mo | 525 Embarcadero W | Oakland | CA | 94607 | c | 525 Embarcadero W Oakland, CA |
| 6 | 7 | Fairfax Liquor | 5403 Foothill Blvd | Oakland | CA | 94601 | p | 5403 Foothill Blvd Oakland, CA |
| 7 | 8 | Saleen Market | 1200 78th Ave | Oakland | CA | 94621 | m | 1200 78th Ave Oakland, CA |
| 8 | 9 | Park Liquors | 828 Franklin St | Oakland | CA | 94607 | p | 828 Franklin St Oakland, CA |
| 9 | 10 | Los Camellos | 5913 International Blvd | Oakland | CA | 94621 | p | 5913 International Blvd Oakland, CA |
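The `addr` string above omits the `zip` column and is built by plain concatenation, which can leave stray spaces (the geocoding log further down shows `525 Embarcadero W  Oakland, CA` with a doubled space). geopy's `Nominatim.geocode` also accepts a structured query: a dict with keys such as `street`, `city`, `state`, and `postalcode`, which sends each field to Nominatim separately. A minimal sketch with a hypothetical helper `structured_query`, assuming no fields are missing; whether it improves match rates for this particular data is untested:

```python
def structured_query(street, city, state, zip_code=None):
    """Build a Nominatim structured-query dict, collapsing runs of whitespace."""
    def clean(value):
        # " 525  Embarcadero W " -> "525 Embarcadero W"
        return " ".join(str(value).split())

    query = {"street": clean(street), "city": clean(city), "state": clean(state)}
    if zip_code is not None:
        query["postalcode"] = clean(zip_code)
    return query


# Usage in the notebook (includes the zip column that 'addr' leaves out):
# queries = [structured_query(*row)
#            for row in zip(df['street'], df['city'], df['state'], df['zip'])]
# results = [geolocator.geocode(q) for q in queries]
print(structured_query("525 Embarcadero W ", "Oakland", "CA", 94607))
```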


In [None]:
def geocode_my_address(addr):
    """Geocode one address and return (lon, lat), or (None, None) if it fails.

    Handling failures here keeps one bad address from stopping the whole run.
    """
    print('geocoding:', addr)
    try:
        x = geolocator.geocode(addr)
    except Exception:  # e.g. a timeout or a service error
        x = None
    if x is None:  # no match was found, or the request failed
        print("problem with address:", addr)
        return None, None
    return x.longitude, x.latitude



In [None]:

# Geocode each address, then split the (lon, lat) pairs into two new columns
df['lon'], df['lat'] = zip(*df['addr'].apply(geocode_my_address))


geocoding: 2101 8th Ave Oakland, CA
geocoding: 1615 Macarthur Blvd Oakland, CA
geocoding: 394 12th St Oakland, CA
geocoding: 1500 23th Ave Oakland, CA
problem with address: 1500 23th Ave Oakland, CA
geocoding: 6193 Ridgemont Dr Oakland, CA
geocoding: 525 Embarcadero W  Oakland, CA
geocoding: 5403 Foothill Blvd Oakland, CA
geocoding: 1200 78th Ave Oakland, CA
geocoding: 828 Franklin St Oakland, CA
geocoding: 5913 International Blvd Oakland, CA
geocoding: 3210 Harrison St Oakland, CA
geocoding: 1460 7th St Oakland, CA
geocoding: 1333 Peralta St Oakland, CA
geocoding: 3710 Telegraph Ave Oakland, CA
geocoding: 3293 Lakeshore Ave Oakland, CA
geocoding: 1647 8th St Oakland, CA
geocoding: 3849 Martin Luther King Jr Way Oakland, CA
problem with address: 3849 Martin Luther King Jr Way Oakland, CA
geocoding: 3900 Grand Ave Oakland, CA
geocoding: 7305 Edgewater Dr #D Oakland, CA
problem with address: 7305 Edgewater Dr #D Oakland, CA
geocoding: 350 E 18th St Oakland, CA
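The `zip(*...)` call in the cell above transposes the sequence of `(lon, lat)` pairs returned by `geocode_my_address` into one tuple of longitudes and one tuple of latitudes, which are then assigned to the two new columns. A small illustration with plain tuples (values taken from the output table, with a failed lookup in the middle):

```python
pairs = [(-122.245, 37.7983), (None, None), (-122.271, 37.8025)]

# zip(*pairs) is zip((-122.245, 37.7983), (None, None), (-122.271, 37.8025)),
# which groups all first elements together and all second elements together
lons, lats = zip(*pairs)
print(lons)  # (-122.245, None, -122.271)
print(lats)  # (37.7983, None, 37.8025)
```

Because a failed lookup still returns a pair, it keeps its position, and the new columns stay aligned with the rows of `df`.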


In [None]:
df

| | id | name | street | city | state | zip | type | addr | lon | lat |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wah Fay Liquors | 2101 8th Ave | Oakland | CA | 94606 | p | 2101 8th Ave Oakland, CA | -122.245 | 37.7983 |
| 1 | 2 | Vision Liquor | 1615 Macarthur Blvd | Oakland | CA | 94602 | p | 1615 Macarthur Blvd Oakland, CA | -122.224 | 37.8003 |
| 2 | 3 | Souza's Liquors | 394 12th St | Oakland | CA | 94607 | p | 394 12th St Oakland, CA | -122.271 | 37.8025 |
| 3 | 4 | Tk Liquors | 1500 23th Ave | Oakland | CA | 94606 | p | 1500 23th Ave Oakland, CA | | |
| 4 | 5 | Quadriga Wines Inc | 6193 Ridgemont Dr | Oakland | CA | 94619 | p | 6193 Ridgemont Dr Oakland, CA | -122.168 | 37.7846 |
| 5 | 6 | Bev Mo | 525 Embarcadero W | Oakland | CA | 94607 | c | 525 Embarcadero W Oakland, CA | -122.279 | 37.7959 |
| 6 | 7 | Fairfax Liquor | 5403 Foothill Blvd | Oakland | CA | 94601 | p | 5403 Foothill Blvd Oakland, CA | -122.172 | 37.7693 |
| 7 | 8 | Saleen Market | 1200 78th Ave | Oakland | CA | 94621 | m | 1200 78th Ave Oakland, CA | -122.186 | 37.7556 |
| 8 | 9 | Park Liquors | 828 Franklin St | Oakland | CA | 94607 | p | 828 Franklin St Oakland, CA | -122.272 | 37.7999 |
| 9 | 10 | Los Camellos | 5913 International Blvd | Oakland | CA | 94621 | p | 5913 International Blvd Oakland, CA | -122.218 | 37.7738 |
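To map the Oakland results the same way as the landmarks, rows whose lookup failed need to be skipped, because a folium marker needs numeric coordinates. Row 3 is one example; its street, `1500 23th Ave`, is probably a typo for `23rd Ave`, which may be why it did not match. A minimal sketch with a hypothetical helper `has_coords` that treats both `None` and `NaN` as missing:

```python
import math


def has_coords(lat, lon):
    """Return True if both coordinates are present (neither None nor NaN)."""
    return all(v is not None and not math.isnan(v) for v in (lat, lon))


# In the notebook (assumes the 'lat', 'lon', and 'name' columns shown above;
# pandas' mean() skips NaN by default):
# oak_map = folium.Map(location=(df['lat'].mean(), df['lon'].mean()), zoom_start=12)
# for _, row in df.iterrows():
#     if has_coords(row['lat'], row['lon']):
#         folium.Marker(location=(row['lat'], row['lon']), popup=row['name']).add_to(oak_map)
# display(oak_map)

print([has_coords(*p) for p in [(37.7983, -122.245), (None, None), (float('nan'), -122.2)]])
```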


Last Updated on July 8, 2020 by Patty Frontiera, pfrontiera@berkeley.edu 