DSI BOS 11 (May 2020) Project 5

Alex Golden, Jungmoon Ham, Luke Podsiadlo, Zach Tretter

Workbook 6 - Mapping

----------

## Geocode and Map Addresses from Police Radio Speech-to-Text

Relevance to Problem Statement: _"The tool will flag neighborhoods or specific streets where the police and first-respondents were called to provide assistance related to the event."_

#### Workflow Steps

1. Import the dataframe. Addresses are the minimum required data; additional context is included where available

2. Find the latitude/longitude of each address ("geocode") via the Google Geocoding API so it can be plotted on a map

3. Visualize on map using [Folium](https://python-visualization.github.io/folium/)

#### API Key (Google Geocoding API)

In [1]:
# Text String from Google
my_key = 
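
One way to keep the key out of the notebook is to read it from an environment variable. This is only a minimal sketch: the helper `load_api_key` and the variable name `GOOGLE_GEOCODING_API_KEY` are hypothetical and not part of the original workbook.

```python
import os

def load_api_key(env_var="GOOGLE_GEOCODING_API_KEY"):
    """Return the API key stored in the given environment variable, or None."""
    # env_var is a hypothetical name; use whichever variable you actually export
    return os.environ.get(env_var)

my_key = load_api_key()
```

If the variable is not set, `my_key` is `None`, and the geocoding function below then sends the request without a key.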

_____________

## Imports

In [2]:
import pandas as pd
import numpy as np
import requests

# !pip install folium
import folium

### Import Dataframe

In [7]:
## Import
df = pd.read_csv("./Datasets/transcript_with_addresses_v2.csv")
# df_e = pd.read_csv("./Datasets/transcribed_audio/enhanced-test-df.csv")

In [8]:
# Drop columns as desired
df = df.drop(columns = ["Unnamed: 0",
                        'tokens',
                        'streets',
                        'numbers'])

In [9]:
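def str_to_list(x):
    # Convert a stringified list read from the CSV back into a Python list
    x = x.replace("[","")
    x = x.replace("]","")
    x = x.replace("'","")
    # Note: this strips only the ends of the whole string, so elements after
    # the first keep a leading space (e.g. ' 816 Black')
    x = x.strip()
    x = x.split(",")
    return x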

In [10]:
df['full_streets']=df['full_streets'].map(str_to_list)
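
The string-replacement approach above leaves a leading space on every element after the first, which is visible in the address lists printed later. As an alternative sketch, `ast.literal_eval` can parse the cell directly, assuming each cell holds a valid Python list literal such as `"['816 Memorial', '816 Black']"`. The name `parse_street_list` is hypothetical.

```python
import ast

def parse_street_list(cell):
    """Parse a stringified list and strip whitespace from each element."""
    # Assumes the cell is a valid Python list literal of strings
    items = ast.literal_eval(cell)
    return [item.strip() for item in items]

print(parse_street_list("['816 Memorial', ' 816 Black']"))
# ['816 Memorial', '816 Black']
```

Unlike `str_to_list`, this returns an empty list for `"[]"` rather than a list containing one empty string.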

In [11]:
# Rearrange columns as needed
df = df[['confidence',
         'file_name',
         'full_streets',
         'transcript']]

In [12]:
df_boston = df[df['file_name']!='watertown_manhunt'].copy()

df_watertown = df[df['file_name']=='watertown_manhunt'].copy()

### Function that Finds Latitude and Longitude from Address

Function code adapted from [article on geocoding written by Shane Lynn](https://www.shanelynn.ie/batch-geocoding-in-python-with-google-geocoding-api/)

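Given an address, this function builds a URL to query the Google Geocoding API.

It returns a dictionary with the following keys:

| Key                 | Value                       | Information                                                                                        |
|---------------------|-----------------------------|----------------------------------------------------------------------------------------------------|
| lat_long            | tuple (lat, long), or False | Geocode of the first result returned by Google; False if there were no results                     |
| full_output_address | Text String                 | Full address corresponding to the geocode returned by Google (only present when a result is found) |
| input_string        | Text String                 | String that was searched on (from dataframe)                                                       |
| number_of_results   | Integer                     | Number of results returned                                                                         |
| status              | Text String                 | 'OK' if request successful<br>Another status code (e.g. 'ZERO_RESULTS') otherwise                  |
| response            | Dictionary                  | Full JSON response (only present when return_full_response is True)                                |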

In [13]:
def get_google_latlong(address,
                       api_key=my_key,
                       return_full_response=False):
    
    # Create the URL for geocoding
    geocode_url = "https://maps.googleapis.com/maps/api/geocode/json?address={}".format(address)
    
    # Add the API key
    if api_key is not None:
        geocode_url = geocode_url + "&key={}".format(api_key)
        
    # Query Google for the results
    results = requests.get(geocode_url)

    # Parse the JSON response into a dictionary
    results = results.json()

    # If there are no results, flag lat_long as False
    if len(results['results']) == 0:
        output = {
            "lat_long": False
        }
        
    else:    
        answer = results['results'][0]
        output = { "lat_long" : (
            answer.get('geometry').get('location').get('lat'),
            answer.get('geometry').get('location').get('lng')
        )
                 }
        
        output['full_output_address'] = answer.get('formatted_address')
        
    # Append some other details:    
    output['input_string'] = address
    output['number_of_results'] = len(results['results'])
    output['status'] = results.get('status')

    if return_full_response is True:
        output['response'] = results
    
    return output
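
The function above formats whatever it receives straight into the URL, and the cells below pass it a two-element list such as `['816 Memorial', 'Watertown MA']`. As a sketch of a more explicit alternative, the parts can be joined into a single address string and URL-encoded before the request is made. The helper `build_geocode_url` is hypothetical, and `"YOUR_KEY"` is a placeholder.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address_parts, api_key=None):
    """Join address parts into one string and URL-encode the query."""
    if isinstance(address_parts, str):
        address = address_parts.strip()
    else:
        # Strip stray whitespace from each part before joining
        address = ", ".join(part.strip() for part in address_parts)
    params = {"address": address}
    if api_key is not None:
        params["key"] = api_key
    return GEOCODE_ENDPOINT + "?" + urlencode(params)

print(build_geocode_url([" 94 Spruce", "Watertown MA"], api_key="YOUR_KEY"))
# https://maps.googleapis.com/maps/api/geocode/json?address=94+Spruce%2C+Watertown+MA&key=YOUR_KEY
```

Results for a joined, cleaned string may differ from those shown in this workbook, which were produced with the list passed as-is.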

## Folium Map - Watertown Manhunt Transcript

In [14]:
df_watertown.head(3)

|     | confidence | file_name         | full_streets                                   | transcript                                        |
|-----|------------|-------------------|------------------------------------------------|---------------------------------------------------|
| 418 | manual     | watertown_manhunt | [816 Memorial, 816 Black, 816 Drive, 816 Ea... | One 5'7", the second with darker skin, both su... |
| 419 | manual     | watertown_manhunt | [609 The]                                      | Is there an officer driving the 609 right now?    |
| 420 | manual     | watertown_manhunt | [94 Long, 94 Spruce]                           | Shots Fired! Shots Fired! Officers pinned down... |


In [15]:
# Build a list of addresses
watertown_address_list = []
for row in df_watertown['full_streets']:
    for i in row:
        watertown_address_list += [[i, 'Watertown MA']]

# Display first 5 elements
watertown_address_list[:5]

[['816 Memorial', 'Watertown MA'],
 [' 816 Black', 'Watertown MA'],
 [' 816 Drive', 'Watertown MA'],
 [' 816 Eastern', 'Watertown MA'],
 [' 816 The', 'Watertown MA']]

--------

### Note - The below cell costs money to run!

In [16]:
# Build a list of geocoded addresses
watertown_lat_long_list = []
for i in watertown_address_list:
    j = get_google_latlong(i)['lat_long']
    print(i,j)
    if j:
        watertown_lat_long_list += [j]

['816 Memorial', 'Watertown MA'] (42.3613221, -71.11567699999999)
[' 816 Black', 'Watertown MA'] (42.364476, -71.181772)
[' 816 Drive', 'Watertown MA'] (42.3709299, -71.1828321)
[' 816 Eastern', 'Watertown MA'] (42.3709299, -71.1828321)
[' 816 The', 'Watertown MA'] (42.3709299, -71.1828321)
[' 816 Cambridge', 'Watertown MA'] (42.3709299, -71.1828321)
[' 816 Middle', 'Watertown MA'] (42.3693591, -71.1908705)
[' 816 Station', 'Watertown MA'] (42.3709299, -71.1828321)
['609 The', 'Watertown MA'] (42.3709299, -71.1828321)
['94 Long', 'Watertown MA'] (42.3709299, -71.1828321)
[' 94 Spruce', 'Watertown MA'] (42.3668022, -71.1701717)
['111 Dexter', 'Watertown MA'] (42.36681919999999, -71.16280909999999)
[' 111 Hazel', 'Watertown MA'] (42.3675637, -71.1649596)
['94 Long', 'Watertown MA'] (42.3709299, -71.1828321)
[' 94 Spruce', 'Watertown MA'] (42.3668022, -71.1701717)
['982', 'Watertown MA'] (42.3709299, -71.1828321)
[' Watertown', 'Watertown MA'] (42.3709299, -71.1828321)
['1181 Lincoln', 'W

### Note - The above cell costs money to run!
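
Because each request is billed and some addresses repeat in the list (for example `'94 Long'` appears twice in the output above), caching results per distinct address can reduce the number of calls. This is a minimal sketch; `make_cached_geocoder` is a hypothetical helper, and a stand-in function replaces the real API call so the example costs nothing to run.

```python
def make_cached_geocoder(geocode_fn):
    """Wrap a geocoding function so each distinct address is looked up once."""
    cache = {}

    def cached(address_parts):
        # Lists are unhashable, so use a tuple of stripped parts as the key
        key = tuple(part.strip() for part in address_parts)
        if key not in cache:
            cache[key] = geocode_fn(address_parts)
        return cache[key]

    return cached

# Demonstration with a stand-in function instead of a billable API call
calls = []

def fake_geocoder(address_parts):
    calls.append(address_parts)
    return {"lat_long": (0.0, 0.0)}

cached_geocoder = make_cached_geocoder(fake_geocoder)
for address in [["94 Long", "Watertown MA"], ["94 Long", "Watertown MA"]]:
    cached_geocoder(address)

print(len(calls))  # 1
```

In the notebook, `make_cached_geocoder(get_google_latlong)` would wrap the real function in the same way.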

-------------

In [42]:
pd.DataFrame(watertown_lat_long_list).to_csv("watertown_geocodes.csv")

In [17]:
# Create a Folium Object (Centered at Harvard Square)
m = folium.Map(
    location = [42.3736, -71.1190],
    tiles = 'Stamen Terrain',
    zoom_start = 12,
    control_scale=False
)

# Iterate through our list of latitudes and longitudes
for i in watertown_lat_long_list:
    folium.CircleMarker(
        radius = 10,
        location = i,
        color ='red'
    ).add_to(m)

# Display the map
m
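
The map above is centered on a hard-coded point with a fixed zoom. As an optional sketch, the extent of the plotted points can be computed and handed to Folium's `fit_bounds`, which takes a south-west and a north-east corner. The helper `bounding_box` is hypothetical, and the sample points are taken from the geocoding output above.

```python
def bounding_box(lat_long_list):
    """Return [[south, west], [north, east]] for a list of (lat, lng) pairs."""
    lats = [lat for lat, _ in lat_long_list]
    lngs = [lng for _, lng in lat_long_list]
    return [[min(lats), min(lngs)], [max(lats), max(lngs)]]

points = [
    (42.3613221, -71.115677),
    (42.364476, -71.181772),
    (42.3668022, -71.1701717),
]
print(bounding_box(points))
# [[42.3613221, -71.181772], [42.3668022, -71.115677]]
```

With the map object from the cell above, `m.fit_bounds(bounding_box(watertown_lat_long_list))` would zoom the view to the plotted markers.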

-----------


## Folium Map - Boston Metro Police 01 May 2020 to 02 May 2020

In [18]:
df_boston.head(3)

|   | confidence  | file_name                          | full_streets                           | transcript                                        |
|---|-------------|------------------------------------|----------------------------------------|---------------------------------------------------|
| 0 | 0.692421913 | sample92-25818-20200501-0941.wav   | [19219 South]                          | having a t show jokes to 19219 South Street       |
| 1 | 0.659736931 | sample52-25818-20200501-1240.wav   | [495 1st, 495 North, 495 The, 495 2nd] | 495 North despite a 13 sat on the 1st and 2nd ... |
| 2 | 0.726749361 | sample1232-25818-20200501-1310.wav | [373 Avenue, 373 Highland]             | Island, 47th Street. Our address is 7173 Highl... |


In [19]:
# Build a list of addresses
boston_address_list = []
for row in df_boston['full_streets']:
    for i in row:
        boston_address_list += [[i, 'Massachusetts']]

len(boston_address_list)

1481

-----------

### Note - The below cell costs money to run!

In [20]:
# Build a list of geocoded addresses
boston_lat_long_list = []
for index,i in enumerate(boston_address_list):
    j = get_google_latlong(i)['lat_long']
    if j:
        boston_lat_long_list += [j]
    if index % 10 == 0 :
        print(i,j)

['19219 South', 'Massachusetts'] (42.4072107, -71.3824374)
[' 3831 Ellis', 'Massachusetts'] (42.4211198, -71.0676361)
[' 342 Hanover', 'Massachusetts'] (42.3649691, -71.0535143)
[' 1 Lyman', 'Massachusetts'] (42.3198318, -71.61289169999999)
['81 Essex', 'Massachusetts'] (42.3524171, -71.060316)
[' 120 Newton', 'Massachusetts'] (42.3313461, -71.2463017)
[' 1929 Avenue', 'Massachusetts'] (42.4072107, -71.3824374)
['on Hatch', 'Massachusetts'] (42.1276832, -71.5201393)
[' 269 Cross', 'Massachusetts'] (42.4664118, -71.1360065)
[' 87 East', 'Massachusetts'] (42.1132377, -71.6547507)
[' 1190 The', 'Massachusetts'] (42.3240969, -71.0641998)
[' 1792 Warren', 'Massachusetts'] (42.2125423, -72.19118499999999)
[' 780 Albany', 'Massachusetts'] (42.3337186, -71.0725726)
[' 11279 Father', 'Massachusetts'] (42.4072107, -71.3824374)
[' 1311 Boston', 'Massachusetts'] (42.359595, -71.062193)
['2126 Albany', 'Massachusetts'] (42.6525793, -73.7562317)
['9 Still', 'Massachusetts'] (41.9046305, -70.2408325)

[' 21', 'Massachusetts'] (42.4072107, -71.3824374)
[' 3150 The', 'Massachusetts'] (42.275308, -71.748187)
['5350 Off', 'Massachusetts'] (42.3055488, -71.3609194)
[' 1537 Red', 'Massachusetts'] (42.4072107, -71.3824374)
['on Dale', 'Massachusetts'] (42.3351874, -73.2463113)
['1179 Fulton', 'Massachusetts'] (42.3623975, -71.0531638)
[' 59 Audubon', 'Massachusetts'] (42.4093993, -71.3317636)
[' 51 Red', 'Massachusetts'] (42.4072107, -71.3824374)
[' 3467 Off', 'Massachusetts'] (42.3311111, -71.08)
[' 255 Northampton', 'Massachusetts'] (42.3356669, -72.6739083)
['32032 Hyde', 'Massachusetts'] (42.2565289, -71.1240559)
[' 104 1st', 'Massachusetts'] (42.4072107, -71.3824374)


### Note - The above cell costs money to run!

--------

In [41]:
pd.DataFrame(boston_lat_long_list).to_csv("boston_geocodes.csv")
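
In the output above, many unrelated inputs (for example `' 1929 Avenue'`, `' 21'` and `' 51 Red'`) share the coordinates `(42.4072107, -71.3824374)`. This looks like a fallback to a single point for the region string rather than a street-level match, though that reading is an assumption and is not confirmed by the workbook. Under that assumption, a small filter can drop such points before mapping. The helper `drop_fallback_points` is hypothetical.

```python
def drop_fallback_points(lat_long_list, fallback_points):
    """Return the coordinates that do not match any known fallback point."""
    fallback = set(fallback_points)
    return [point for point in lat_long_list if tuple(point) not in fallback]

sample = [
    (42.4072107, -71.3824374),
    (42.3649691, -71.0535143),
    (42.4072107, -71.3824374),
    (42.3524171, -71.060316),
]
kept = drop_fallback_points(sample, [(42.4072107, -71.3824374)])
print(kept)
# [(42.3649691, -71.0535143), (42.3524171, -71.060316)]
```

The same filter could be applied to the Watertown list with `(42.3709299, -71.1828321)`, which repeats in that output in the same way.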

In [40]:
# Create a Folium Object (Centered at Harvard Square)
m = folium.Map(
    location = [42.3736, -71.1190],
    tiles = 'Stamen Terrain',
    zoom_start = 12,
    control_scale=False
)

# Iterate through our list of latitudes and longitudes
for i in boston_lat_long_list:
    folium.CircleMarker(
        radius = 2,
        location = i,
        color ='red'
    ).add_to(m)

# Display the map
m