# Practice Geocoding

## 1. Installs and Imports

In [3]:
# Start writing code here...\
!pip install geopy orca

You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.[0m


In [21]:
# non-standard lib imports
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import ArcGIS
from geopy import distance
import geopy
import pandas as pd
import plotly.express as px

# standard lib
import math

## 2. Start Geocoding

In [8]:
# initialize geocoder using ArcGIS mapping systems
geocoder = ArcGIS()

Geocoded addresses return the geocoded address, and a tuple of lat, long, alt.

Sample geocoding solution:

In [11]:
str_loc1: str = '2419 Ashbury Circle Cape Coral, FL 33991'
coded1: geopy.location.Location = geocoder.geocode(str_loc1)
coded1

Location(2419 Ashbury Cir, Cape Coral, Florida, 33991, (26.62169000442219, -82.02573496942675, 0.0))

In [12]:
str_loc2: str = '2855 Gulf to Bay Blvd Clearwater FL 33759'
coded2: geopy.location.Location = geocoder.geocode(str_loc2)
coded2

Location(2855 Gulf To Bay Blvd, Clearwater, Florida, 33759, (27.95835600139496, -82.720330964178, 0.0))

Calculate distance between two addresses.

**\*Important**: geopy.distance.lonlat takes longitude first, *then* latitude.

Steps:
- Convert a geopy.location.Location into a geopy.point.Point using geopy.distance.lonlat
- Use geopy.distance.distance to calculate distance between Points

In [15]:
p1 = distance.lonlat(coded1.longitude, coded1.latitude)
p2 = distance.lonlat(coded2.longitude, coded1.latitude)
d = distance.distance(p1, p2)
d

Distance(69.17116695821906)

Returned Distance is a **straight line** and defaults to kilometers (km) but can be converted to miles.

In [16]:
print(f"Kilometers: {d.km}")
print(f"Miles: {d.miles}")

Kilometers: 69.17116695821906
Miles: 42.98097048127626


## 3. Geocode Addresses from Data File

In [18]:
target_cols = ('Location Address', 'City', 'State Code', 'Zip')
df = pd.read_csv('../data/wastewater_plants.csv', usecols=target_cols)
df.dropna(inplace=True, how='all') # remove null rows
df.sample(3)

Unnamed: 0,NPDES ID,Permit Name,Permit Status Desc,Primary Permit SIC Code,Primary Permit SIC Desc,DMR Cognizant Official,DMR Cognizant Offcl Telephone,Location Address,City,State Code,Zip,County Name,Cont. Email Address,Total Actual Average Flow (MGD),Total App. Design Flow (MGD)
44,KY0021296,Providence WWTP,Effective,4952.0,Sewerage Systems,Terry Rice,270-836-6162,625 CEDAR ST,PROVIDENCE,KY,42450.0,Webster,terrylrice3827@icloud.com,0.724,0.629
114,KY0027227,Lake City STP,Terminated,4952.0,Sewerage Systems,Marion Deweese,270-362-8272,713 MARSHALL RD,GRAND RIVERS,KY,42045.0,Livingston,jdws@grandriverswireless.com,,0.3
204,KY0066583,South Hopkins Regional WWTP,Effective,4952.0,Sewerage Systems,Alan Todd,270-824-2171,410 OLD WHITE PLAINS RD,NORTONVILLE,KY,42442.0,Hopkins,atodd@madisonvillegov.com,0.306,0.75


Combine address fields into one column.

In [19]:
def combine_address_fields(row) -> str:
    """Concatenates address fields into one column/string value.
    
    Args:
        row: pandas dataframe row with 'Location Address', 'City', 'State Code', and 'Zip' columns

    Returns:
        str: full address

    """
    return f"{row['Location Address']} {row['City']} {row['State Code']} {row['Zip']}"

In [20]:
# apply function
df['full_address'] = df.apply(lambda row: combine_address_fields(row), axis=1)
df.full_address.sample(3)

333    JCT OF KY 80 & KY 7 NICHOLASVILLE KY 41630.0
28                      HARLIN LN HARDIN KY 42048.0
58               5512 HITT LN LOUISVILLE KY 40285.0
Name: full_address, dtype: object

Geocode file using new column and geopy's RateLimiter, takes about 10 minutes.

In [43]:
# initialize new geocoder
geolocator = ArcGIS()
# pass locator to RateLimiter with delay
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# apply rate-limited geocoder to the full_address column, save result in location column
df['location'] = df['full_address'].apply(geocode)

Parse out location column into latitude and longitude columns.

In [44]:
df['latitude'] = df.location.apply(lambda x: x.latitude)
df['longitude'] = df.location.apply(lambda x: x.longitude)
df.drop(['location'], axis=1, inplace=True) # drop location column

## 4. Map Geocoded Addresses (Interactive)

In [51]:
# utilize plotly_express scatter_mapbox and dataframe columns
fig = px.scatter_mapbox(df, lat="latitude", lon="longitude", hover_name="City" ,
                        color_discrete_sequence=["fuchsia"], zoom=5, height=800)
fig.update_layout(mapbox_style="open-street-map") # include streets
fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0}) # remove margins to make figure larger
fig.show()

<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=41871a82-6b9f-4a18-a04e-ef9382306e39' target="_blank">
 </img>
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>