# 06. Coordinate Acquisition
---

The purpose of this notebook is to use the `geocoder` library to acquire latitude and longitude coordinates for the locations referenced in the collected and cleaned tweets.  This step runs after the previous notebook has extracted the locations identified in each tweet posted by one of the NJ 511 Twitter accounts.

---
## Table of Contents
---

- [Import Resources](#Import-Resources)
- [DataFrame Series Setup](#DataFrame-Series-Setup)
- [Organize and Save](#Organize-and-Save)

---
### Import Resources
---

Here we import the libraries needed for geocoding and read in the cleaned `closure_locations.csv` data.

In [1]:
import pandas as pd
import geocoder

Read in the CSV and apply light cleaning

In [2]:
locs_df = pd.read_csv("../datasets/closure_locations.csv")

# Drop unnecessary column
locs_df.drop(columns = "Unnamed: 0", inplace = True)

# Fill blank cells with ''
locs_df.fillna('', inplace = True)

locs_df.head()

Unnamed: 0,username,tweet,date_posted,expanded_tweet,highway,streets,exits,directions,locs,lanes,relocs,town
0,511nji295,Crash on I-295 southbound South of Exit 29 - U...,2019-11-06 23:56:56+00:00,Crash on Interstate 295 southbound South of Ex...,Interstate 295,,Exit 29 - US-30,southbound,,right lane,South of,Haddon Heights
1,511njace,"Construction, bridge painting on Atlantic City...",2019-11-06 23:52:57+00:00,"Construction, bridge painting on Atlantic City...",Atlantic City Expressway,,"Exit 12 - US-40 - Wrangleboro Road, Exit 12 - ...",westbound,,right lane,"between East of, West of","Hamilton Township, Hamilton Township"
2,511njtpk,Roadwork on New Jersey Turnpike inner roadway ...,2019-11-06 23:41:56+00:00,Roadwork on New Jersey Turnpike inner roadway ...,"New Jersey Turnpike, Interchange 6 - Pennsylva...",,,northbound,,,from,
3,511nji76,Crash on I-76 eastbound at Exit 2 - I-676 (Cam...,2019-11-06 23:41:56+00:00,Crash on Interstate 76 eastbound at Exit 2 - I...,Interstate 76,,Exit 2 - Interstate 676,eastbound,,right shoulder,,Camden
4,511njtpk,Roadwork on New Jersey Turnpike inner roadway ...,2019-11-06 23:27:56+00:00,Roadwork on New Jersey Turnpike inner roadway ...,"New Jersey Turnpike, Interchange 14 - Intersta...",,,Southbound,,all lanes,"between North of, and South of",Newark


---
### DataFrame Series Setup
---

Setting up table for acquiring coordinates

Creating `state` variable to add to location

In [3]:
state = "NJ"

Iterating through each row and building a `location_lookup` value from the first non-empty location field (`town`, then `exits`, `streets`, `highway`), falling back to the state alone when all four are empty.

In [4]:
# creating state variable to add to location_lookup
state = 'NJ'

# creating blank location_lookup column to fill via below for loop
locs_df['location_lookup'] = ''

# loop to populate location_lookup column
# .loc[row, column] assigns in place; the chained form df[col].loc[i] = ...
# can write to a temporary copy and raise SettingWithCopyWarning
for i in range(locs_df.shape[0]):
    if locs_df.loc[i, 'town'] != '':
        locs_df.loc[i, 'location_lookup'] = locs_df.loc[i, 'town'] + ', ' + state

    elif locs_df.loc[i, 'exits'] != '':
        locs_df.loc[i, 'location_lookup'] = locs_df.loc[i, 'exits'] + ', ' + state

    elif locs_df.loc[i, 'streets'] != '':
        locs_df.loc[i, 'location_lookup'] = locs_df.loc[i, 'streets'] + ', ' + state

    elif locs_df.loc[i, 'highway'] != '':
        locs_df.loc[i, 'location_lookup'] = locs_df.loc[i, 'highway'] + ', ' + state

    else:
        locs_df.loc[i, 'location_lookup'] = state
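
The priority rule in the loop above (town, then exits, streets, highway, else the bare state) can also be expressed as a small helper function. The following is a minimal sketch, not the notebook's own code: `build_lookup` and the sample rows are hypothetical names and values made up for illustration.

```python
def build_lookup(town, exits, streets, highway, state="NJ"):
    """Return '<first non-empty field>, <state>', or just the state if all are empty."""
    # Same priority as the if/elif chain: town, then exits, streets, highway
    for value in (town, exits, streets, highway):
        if value != "":
            return f"{value}, {state}"
    return state


# Made-up rows shaped like (town, exits, streets, highway) in locs_df
rows = [
    ("Camden", "Exit 2 - Interstate 676", "", "Interstate 76"),
    ("", "", "", "New Jersey Turnpike"),
    ("", "", "", ""),
]
lookups = [build_lookup(*row) for row in rows]
print(lookups)
```

Applied to the DataFrame, the helper could be called once per row over `zip(locs_df['town'], locs_df['exits'], locs_df['streets'], locs_df['highway'])`, which avoids per-cell assignment entirely.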

In [5]:
locs_df.head()

Unnamed: 0,username,tweet,date_posted,expanded_tweet,highway,streets,exits,directions,locs,lanes,relocs,town,location_lookup
0,511nji295,Crash on I-295 southbound South of Exit 29 - U...,2019-11-06 23:56:56+00:00,Crash on Interstate 295 southbound South of Ex...,Interstate 295,,Exit 29 - US-30,southbound,,right lane,South of,Haddon Heights,"Haddon Heights, NJ"
1,511njace,"Construction, bridge painting on Atlantic City...",2019-11-06 23:52:57+00:00,"Construction, bridge painting on Atlantic City...",Atlantic City Expressway,,"Exit 12 - US-40 - Wrangleboro Road, Exit 12 - ...",westbound,,right lane,"between East of, West of","Hamilton Township, Hamilton Township","Hamilton Township, Hamilton Township, NJ"
2,511njtpk,Roadwork on New Jersey Turnpike inner roadway ...,2019-11-06 23:41:56+00:00,Roadwork on New Jersey Turnpike inner roadway ...,"New Jersey Turnpike, Interchange 6 - Pennsylva...",,,northbound,,,from,,"New Jersey Turnpike, Interchange 6 - Pennsylva..."
3,511nji76,Crash on I-76 eastbound at Exit 2 - I-676 (Cam...,2019-11-06 23:41:56+00:00,Crash on Interstate 76 eastbound at Exit 2 - I...,Interstate 76,,Exit 2 - Interstate 676,eastbound,,right shoulder,,Camden,"Camden, NJ"
4,511njtpk,Roadwork on New Jersey Turnpike inner roadway ...,2019-11-06 23:27:56+00:00,Roadwork on New Jersey Turnpike inner roadway ...,"New Jersey Turnpike, Interchange 14 - Intersta...",,,Southbound,,all lanes,"between North of, and South of",Newark,"Newark, NJ"


In [6]:
# Test the geocoder on the first lookup string
geocoder.arcgis(locs_df['location_lookup'][0]).latlng

[39.88187000000005, -75.05929999999995]

Acquiring coordinates - ***this will take about 30 minutes, given the number of locations to iterate through***

In [7]:
locs_df['coordinates'] = [geocoder.arcgis(lookup).latlng for lookup in locs_df['location_lookup']]
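
Because many tweets may share the same lookup string (for example `"Camden, NJ"`), one way to reduce the number of requests is to geocode each distinct string once and reuse the result. This is a sketch under that assumption, not what the notebook does: `geocode_unique` and `fake_geocode` are hypothetical names, and the stand-in geocoder returns dummy values so the example runs without network access.

```python
def geocode_unique(lookups, geocode):
    """Geocode each distinct lookup string once and map results back to every row."""
    cache = {}
    coordinates = []
    for lookup in lookups:
        if lookup not in cache:
            # Only the first occurrence of a string triggers a geocoder call
            cache[lookup] = geocode(lookup)
        coordinates.append(cache[lookup])
    return coordinates


# Stand-in geocoder for illustration only: records its calls, returns dummy values
calls = []

def fake_geocode(lookup):
    calls.append(lookup)
    return [0.0, 0.0]


coords = geocode_unique(["Camden, NJ", "Newark, NJ", "Camden, NJ"], fake_geocode)
print(len(coords), "rows geocoded with", len(calls), "requests")
```

With the real service, the `geocode` argument could be `lambda q: geocoder.arcgis(q).latlng`, the same call used in the test cell. How much time this saves depends on how often lookup strings repeat in the data.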

In [8]:
locs_df.head()

Unnamed: 0,username,tweet,date_posted,expanded_tweet,highway,streets,exits,directions,locs,lanes,relocs,town,location_lookup,coordinates
0,511nji295,Crash on I-295 southbound South of Exit 29 - U...,2019-11-06 23:56:56+00:00,Crash on Interstate 295 southbound South of Ex...,Interstate 295,,Exit 29 - US-30,southbound,,right lane,South of,Haddon Heights,"Haddon Heights, NJ","[39.88187000000005, -75.05929999999995]"
1,511njace,"Construction, bridge painting on Atlantic City...",2019-11-06 23:52:57+00:00,"Construction, bridge painting on Atlantic City...",Atlantic City Expressway,,"Exit 12 - US-40 - Wrangleboro Road, Exit 12 - ...",westbound,,right lane,"between East of, West of","Hamilton Township, Hamilton Township","Hamilton Township, Hamilton Township, NJ","[40.230360000000076, -74.72385999999995]"
2,511njtpk,Roadwork on New Jersey Turnpike inner roadway ...,2019-11-06 23:41:56+00:00,Roadwork on New Jersey Turnpike inner roadway ...,"New Jersey Turnpike, Interchange 6 - Pennsylva...",,,northbound,,,from,,"New Jersey Turnpike, Interchange 6 - Pennsylva...","[41.02283671772758, -78.44427748447676]"
3,511nji76,Crash on I-76 eastbound at Exit 2 - I-676 (Cam...,2019-11-06 23:41:56+00:00,Crash on Interstate 76 eastbound at Exit 2 - I...,Interstate 76,,Exit 2 - Interstate 676,eastbound,,right shoulder,,Camden,"Camden, NJ","[39.945250000000044, -75.11912999999998]"
4,511njtpk,Roadwork on New Jersey Turnpike inner roadway ...,2019-11-06 23:27:56+00:00,Roadwork on New Jersey Turnpike inner roadway ...,"New Jersey Turnpike, Interchange 14 - Intersta...",,,Southbound,,all lanes,"between North of, and South of",Newark,"Newark, NJ","[40.73197000000005, -74.17420999999996]"


---
### Organize and Save
---

We will organize the geocoder results into a usable DataFrame and save it as a CSV for use by other notebooks.

Splitting latitude and longitude into separate Series in the DataFrame

In [9]:
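# Create latitude column (None where the geocoder returned no coordinates)
locs_df['lat'] = [coords[0] if coords else None for coords in locs_df['coordinates']]

# Create longitude column (None where the geocoder returned no coordinates)
locs_df['lon'] = [coords[1] if coords else None for coords in locs_df['coordinates']]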

# Drop coordinates column
locs_df.drop(columns = "coordinates", inplace = True)

In [10]:
# Take a look at results from above separation
locs_df.head()

Unnamed: 0,username,tweet,date_posted,expanded_tweet,highway,streets,exits,directions,locs,lanes,relocs,town,location_lookup,lat,lon
0,511nji295,Crash on I-295 southbound South of Exit 29 - U...,2019-11-06 23:56:56+00:00,Crash on Interstate 295 southbound South of Ex...,Interstate 295,,Exit 29 - US-30,southbound,,right lane,South of,Haddon Heights,"Haddon Heights, NJ",39.88187,-75.0593
1,511njace,"Construction, bridge painting on Atlantic City...",2019-11-06 23:52:57+00:00,"Construction, bridge painting on Atlantic City...",Atlantic City Expressway,,"Exit 12 - US-40 - Wrangleboro Road, Exit 12 - ...",westbound,,right lane,"between East of, West of","Hamilton Township, Hamilton Township","Hamilton Township, Hamilton Township, NJ",40.23036,-74.72386
2,511njtpk,Roadwork on New Jersey Turnpike inner roadway ...,2019-11-06 23:41:56+00:00,Roadwork on New Jersey Turnpike inner roadway ...,"New Jersey Turnpike, Interchange 6 - Pennsylva...",,,northbound,,,from,,"New Jersey Turnpike, Interchange 6 - Pennsylva...",41.022837,-78.444277
3,511nji76,Crash on I-76 eastbound at Exit 2 - I-676 (Cam...,2019-11-06 23:41:56+00:00,Crash on Interstate 76 eastbound at Exit 2 - I...,Interstate 76,,Exit 2 - Interstate 676,eastbound,,right shoulder,,Camden,"Camden, NJ",39.94525,-75.11913
4,511njtpk,Roadwork on New Jersey Turnpike inner roadway ...,2019-11-06 23:27:56+00:00,Roadwork on New Jersey Turnpike inner roadway ...,"New Jersey Turnpike, Interchange 14 - Intersta...",,,Southbound,,all lanes,"between North of, and South of",Newark,"Newark, NJ",40.73197,-74.17421
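
Row 2 in the output above resolves to roughly (41.02, -78.44), which lies well west of New Jersey, so a simple sanity check on the coordinates may be worthwhile before saving. The sketch below flags points outside an approximate bounding box for the state. The box limits are assumed, outward-rounded values and `in_new_jersey_box` is a hypothetical helper, not part of the notebook.

```python
# Approximate bounding box for New Jersey (assumed values, rounded outward)
NJ_LAT = (38.8, 41.4)
NJ_LON = (-75.6, -73.8)


def in_new_jersey_box(lat, lon):
    """Return True when (lat, lon) falls inside the approximate NJ bounding box."""
    if lat is None or lon is None:
        return False
    return NJ_LAT[0] <= lat <= NJ_LAT[1] and NJ_LON[0] <= lon <= NJ_LON[1]


# Coordinates copied from rows 0 and 2 of the output above
points = {
    "row 0": (39.88187, -75.0593),
    "row 2": (41.022837, -78.444277),
}
flags = {name: in_new_jersey_box(lat, lon) for name, (lat, lon) in points.items()}
print(flags)
```

A bounding box is only a coarse filter: it also admits nearby parts of neighboring states, so flagged rows are candidates for review rather than confirmed errors.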


Saving the resulting DataFrame as a CSV file for the next notebook to use

In [11]:
locs_df.to_csv('../datasets/geodata.csv')

---