# Geocoding Intersections

As an intermediate step in processing social media text, I will use geopy and gmaps to convert strings of road intersections into latitude and longitude coordinates for use in our render_closures function.

In [1]:
import pandas as pd
import geopy
import gmaps
from python_scripts.render_closures import render_closures
import python_scripts.config as config
gmaps.configure(api_key=config.gmaps_key)

## Fake Data

A small set of made-up closures for testing that our map populates.

In [2]:
fake_closures = {
    'start':[
        'rainier ave s & s edmunds st',
        's oregon st & 38th ave s',
        's genesee st & 42nd ave s'
    ],
    'end':[
        '35th ave s & s edmunds st',
        's oregon st & rainier ave s',
        's alaska st & 42nd ave s'
    ],
    'city':[
        'seattle',
        'seattle',
        'seattle'
    ],
    'text':['Fallen Power Line','Construction','Collision']
}
fake = pd.DataFrame(fake_closures)
fake.head()

| | start | end | city | text |
|---|---|---|---|---|
| 0 | rainier ave s & s edmunds st | 35th ave s & s edmunds st | seattle | Fallen Power Line |
| 1 | s oregon st & 38th ave s | s oregon st & rainier ave s | seattle | Construction |
| 2 | s genesee st & 42nd ave s | s alaska st & 42nd ave s | seattle | Collision |


## Functionality

Exploring and establishing the code for our function: it takes the intersection-based DataFrame and returns latitude and longitude start and end points for use in the render_closures function.

In [3]:
# instantiate a google maps geocoder
geolocator = geopy.geocoders.GoogleV3(api_key=config.gmaps_key)
# geocode the first start intersection
location = geolocator.geocode(fake.start[0])
# print it to check
print((location.latitude, location.longitude))


(47.55866109999999, -122.2854374)


That works so far, but that may only be because my IP address is *within* the city in question. Let's try adding in the city, just to be sure.

In [4]:
# instantiate a google maps geocoder
geolocator = geopy.geocoders.GoogleV3(api_key=config.gmaps_key)
# geocode the same intersection, this time with the city appended
location = geolocator.geocode(fake.start[0] +',' + fake.city[0])
# print it to check
print(location)

Rainier Ave S & S Edmunds St, Seattle, WA 98118, USA
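
The query passed to `geocode` is just the intersection and the city joined by a comma. A minimal sketch of that string-building step as a standalone helper might look like the following. The name `build_query` is hypothetical (it is not part of this notebook or of geopy), and the whitespace trimming and the space after the comma are added assumptions.

```python
def build_query(intersection, city):
    """Join an intersection and a city into one geocoder query string.

    Hypothetical helper, not part of the original notebook: it only builds
    the string and makes no network calls.
    """
    # strip stray whitespace so 'seattle ' and 'seattle' give the same query
    return f"{intersection.strip()}, {city.strip()}"


query = build_query('rainier ave s & s edmunds st', 'seattle')
print(query)
```

With a helper like this, the lookup would read `geolocator.geocode(build_query(fake.start[0], fake.city[0]))`.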


## Building a DataFrame

That seems to work, so now we can iterate through our intersection-based DataFrame and output a shiny new lat/long-based DataFrame. Once that works, we can write a function to do just that.

In [5]:
# instantiate a google maps geocoder
geolocator = geopy.geocoders.GoogleV3(api_key=config.gmaps_key)
# build lists to put into our new DataFrame
starts = []
ends = []
texts = []

# iterate over rows
for index,point in fake.iterrows():
    # geocode the start and end points
    new_start = geolocator.geocode(fake.start[index] +',' + fake.city[index])
    new_end = geolocator.geocode(fake.end[index] +',' + fake.city[index])
    # put their latitudes and longitudes into our lists
    starts.append([new_start.latitude,new_start.longitude])
    ends.append([new_end.latitude,new_end.longitude])
    texts.append(fake.text[index])

# put all that together!
new_fake = {'start':starts,'end':ends,'text':texts}
new_df = pd.DataFrame(new_fake)
new_df

| | start | end | text |
|---|---|---|---|
| 0 | [47.55891039999999, -122.2854841] | [47.5586649, -122.2888154] | Fallen Power Line |
| 1 | [47.5628326, -122.2849706] | [47.5628256, -122.2878209] | Construction |
| 2 | [47.564117, -122.2807269] | [47.5605458, -122.280724] | Collision |
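
One caveat with the loop in this section: geopy's `geocode` returns `None` when the service finds no match, so `new_start.latitude` would raise an `AttributeError` for an intersection it cannot resolve. The sketch below shows one possible guard. The helper name `to_lat_lng` and the stub geocoder are hypothetical, and the stub's coordinates are rounded placeholder values; the stub exists only so the example runs without an API key. In the notebook, `geolocator.geocode` would be passed in instead.

```python
from types import SimpleNamespace


def to_lat_lng(geocode, query):
    """Return [latitude, longitude] for query, or None if nothing was found.

    `geocode` is any callable with the same contract as geopy's
    `geolocator.geocode`: it returns an object with .latitude and
    .longitude attributes, or None when there is no match.
    """
    location = geocode(query)
    if location is None:
        return None
    return [location.latitude, location.longitude]


def stub_geocode(query):
    """Stand-in geocoder (an assumption for this example): knows one query."""
    known = {
        'rainier ave s & s edmunds st,seattle':
            SimpleNamespace(latitude=47.5589, longitude=-122.2855),
    }
    return known.get(query)


found = to_lat_lng(stub_geocode, 'rainier ave s & s edmunds st,seattle')
missing = to_lat_lng(stub_geocode, 'no such st & nowhere ave,seattle')
print(found)    # coordinates supplied by the stub
print(missing)  # None rather than an AttributeError
```

Rows that come back as `None` could then be skipped or logged rather than stopping the whole loop.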


In [6]:
# for reference:
fake

| | start | end | city | text |
|---|---|---|---|---|
| 0 | rainier ave s & s edmunds st | 35th ave s & s edmunds st | seattle | Fallen Power Line |
| 1 | s oregon st & 38th ave s | s oregon st & rainier ave s | seattle | Construction |
| 2 | s genesee st & 42nd ave s | s alaska st & 42nd ave s | seattle | Collision |


In [7]:
# check that it renders:
render_closures(new_df)

Figure(layout=FigureLayout(height='400px', margin='0 auto 0 auto', padding='1px', width='700px'))

## Defining a function

Putting our previous work into a function so that it can be saved to an importable Python script.

In [8]:
def intersect_to_coords(data,exdata=None):
    import pandas as pd
    import geopy
    # instantiate a google maps geocoder
    geolocator = geopy.geocoders.GoogleV3(api_key=config.gmaps_key)
    # build lists to put into our new DataFrame
    starts = []
    ends = []
    texts = []

    # iterate over rows
    for index,point in data.iterrows():
        # geocode the start and end points
        new_start = geolocator.geocode(data.start[index] +',' + data.city[index])
        new_end = geolocator.geocode(data.end[index] +',' + data.city[index])
        # put their latitudes and longitudes into our lists
        starts.append([new_start.latitude,new_start.longitude])
        ends.append([new_end.latitude,new_end.longitude])
        texts.append(data.text[index])

    # put all that together!
    new_dict = {'start':starts,'end':ends,'text':texts}
    # build from new_dict, not the notebook-level new_fake, so the result reflects `data`
    new_df = pd.DataFrame(new_dict)
    
    if exdata is not None:
        # append the previously geocoded closures after the new ones
        new_df = pd.concat([new_df,exdata],ignore_index=True)
    return new_df

### Testing the function with our fabricated data frame

In [9]:
df3 = intersect_to_coords(fake)
render_closures(df3)

Figure(layout=FigureLayout(height='400px', margin='0 auto 0 auto', padding='1px', width='700px'))

### Testing the function with a fabricated data frame built in another notebook

In [10]:
from ast import literal_eval

coords = pd.read_csv('./coords-df.csv',
                     converters={"start": literal_eval,'end':literal_eval})
coords

| | start | end | text |
|---|---|---|---|
| 0 | [47.608222, -122.334897] | [47.610146, -122.336646] | Fallen Power Line |
| 1 | [47.607477, -122.33421] | [47.606116, -122.337564] | Construction |
| 2 | [47.60973, -122.33782] | [47.610663, -122.335567] | Collision |
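
The `converters` argument matters here because a CSV file stores each coordinate pair as text. Without it, `start` and `end` would load as strings such as `'[47.608222, -122.334897]'` rather than as lists. A small sketch of what `literal_eval` does to one such cell, using a value from the first row of `coords`:

```python
from ast import literal_eval

# a coordinate cell as it is stored in the CSV file: plain text
raw_cell = '[47.608222, -122.334897]'

# literal_eval parses Python literals (lists, numbers, strings, ...)
# without executing arbitrary code, unlike eval
parsed = literal_eval(raw_cell)

print(type(raw_cell).__name__)  # str
print(type(parsed).__name__)    # list
print(parsed[0], parsed[1])
```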


In [11]:
df = intersect_to_coords(fake,coords)
df

| | start | end | text |
|---|---|---|---|
| 0 | [47.55891039999999, -122.2854841] | [47.5586649, -122.2888154] | Fallen Power Line |
| 1 | [47.5628326, -122.2849706] | [47.5628256, -122.2878209] | Construction |
| 2 | [47.564117, -122.2807269] | [47.5605458, -122.280724] | Collision |
| 3 | [47.608222, -122.334897] | [47.610146, -122.336646] | Fallen Power Line |
| 4 | [47.607477, -122.33421] | [47.606116, -122.337564] | Construction |
| 5 | [47.60973, -122.33782] | [47.610663, -122.335567] | Collision |


In [12]:
# Misc. cell for checking formatting and datatypes of the new dataframe 
df.start[3]
# for index,_ in df.iterrows():
#     print(df.start[index])
#     print(type(df.start[index]))
#     print('---')
#     print(df.end[index])
#     print(type(df.end[index]))
#     print('---')
#     print(df.text[index])
#     print(type(df.text[index]))
#     print('*****')

[47.608222, -122.334897]

In [13]:
# and the final test:
render_closures(df)

Figure(layout=FigureLayout(height='400px', margin='0 auto 0 auto', padding='1px', width='700px'))

## Mission accomplished!

I manually saved the above function into another file in python_scripts for future use.