# Assessing Seattle Mobility With Google Maps Distance Matrix API

**Daniel White**

**DATA 512 Final Project**

**December 9, 2018**

## Introduction

Transportation and mobility are vital issues for all city residents. Transportation planners face the difficult task of designing a fair, equitable system that serves all citizens, regardless of socioeconomic status. In reality, no system is perfect. Gaps in the transit network can develop due to various political and economic factors. These gaps can have a large impact on property values, crime rates, and economic development in the city, with far-reaching consequences for inequality between neighborhoods.

For my project, I investigated the makeup of the Seattle transportation network using the Google Maps Distance Matrix API. This API works very similarly to the directions feature on the Google Maps platform. Users input an origin and destination and receive a travel time estimate for various modes of transportation. To assess mobility, I compare driving and public transit travel times for trips between origins and destinations around the Seattle metropolitan region. Areas in which travel times via public transit are significantly longer are deemed less accessible. This is a very human-centered approach to assessing mobility because mode choice often begins with an inquiry to Google Maps. People make their choices based on the travel times and convenience of each mode. Using the Google Maps Distance Matrix API will hopefully directly reflect the choice people make when selecting their mode of travel.

## Background / Related Work

The idea of using the Google Maps Distance Matrix API was inspired by a Data Science for Social Good project at the University of Washington in Summer 2018. The project created a mobility index and also generated predictions of mode choice on a Census tract level within neighborhoods. Further detail on this project is available here (Seattle Mobility Index Project): https://escience.washington.edu/2018-data-science-for-social-good-projects/

My project uses a similar approach, but with a reduced scope due to project constraints. With my analysis, I hope to assess the effectiveness of this approach and provide a high-level glimpse into mobility between neighborhoods in Seattle.

### Research Questions

**1\. Can the Google Maps Distance Matrix API be used to effectively assess mobility?**

I intend to assess the advantages and disadvantages of using this approach and any potential bias that may exist.

**2\. Which neighborhoods in Seattle are underserved by the public transit network?**

Using the results, I will assess which areas may be underserved by public transit in the Seattle region.

**3\. Which neighborhoods in Seattle suffer from the worst traffic congestion on the morning commute?**

Using travel times from peak and non-peak hours, I will assess which neighborhoods suffer from the worst traffic congestion on the morning commute.


## Methods

This section details the data sources used, data collection, and data processing to conduct the analysis.

### Data Sources

#### Google Maps API

The Google Maps API provides time and distance data between specified origins and destinations entered by the user. Users can also alter various parameters including mode of travel, time of day, etc. to customize their results. The documentation of the Google Maps Distance Matrix API is available here:

https://developers.google.com/maps/documentation/distance-matrix/start

Users can request an API key in order to make calls and receive responses from the API. Usage of the API is subject to the terms of service specified by Google. Under these terms, Google grants the Customer a non-exclusive, non-transferable, non-sublicensable license to use the Services in Customer Application(s), which may be: (a) fee-based or non-fee-based; (b) public/external or private/internal; (c) business-to-business or business-to-consumer; or (d) asset tracking. The complete terms of service are laid out in additional detail here:

https://cloud.google.com/maps-platform/terms/

The key clause outlining restrictions to use of the API is provided below:

*3.2.2 General Restrictions*

*Unless Google specifically agrees in writing, Customer will not: (a) copy, modify, create a derivative work of, reverse engineer, decompile, translate, disassemble, or otherwise attempt to extract any or all of the source code (except to the extent such restriction is expressly prohibited by applicable law); (b) sublicense, transfer, or distribute any of the Services; (c) sell, resell, or otherwise make the Services available as a commercial offering to a third party; or (d) access or use the Services: (i) for High Risk Activities; (ii) in a manner intended to avoid incurring Fees; (iii) for activities that are subject to the International Traffic in Arms Regulations (ITAR) maintained by the United States Department of State; (iv) on behalf of or for the benefit of any entity or person who is legally prohibited from using the Services; or (v) to transmit, store, or process Protected Health Information (as defined in and subject to HIPAA)*

#### Transit Communities - Seattle Open Data

To avoid any self-inflicted bias in the analysis, I used a pre-determined list of neighborhoods in the Seattle area. Using this list also makes the analysis easier to replicate. The list of neighborhoods comes from the "Transit Communities" dataset on the Seattle Open Data portal. This dataset is considered public domain, without requirements for registration or license. The dataset is available at the link below:

https://data.seattle.gov/Transportation/Transit-communities/ndi9-2pye/data 

### Data Collection
This section details the steps to collect the data using the Google Maps API. Note that the API is subject to usage limits. Rerunning this code will only collect a portion of the data before the daily limit of 2,500 is reached. For my analysis, I collected data over the course of multiple days. The JSON files from this analysis are available in the /json folder in the GitHub repository.

Alternatively, I recommend that users load the CSV file with the fully processed dataset to reproduce the findings.
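The quota arithmetic behind the multi-day collection can be sketched as follows. This is an illustration only: the neighborhood count of 42 and the two runs (one driving, one transit) are assumptions, the helper name `days_needed` is hypothetical, and the 2,500 figure is the daily limit quoted above, counted here as one unit per origin-destination pair.

```python
import math

def days_needed(n_places, n_runs, daily_limit=2500):
    """Estimate the days required when each origin-destination pair counts as one element."""
    elements = n_places * n_places * n_runs
    return elements, math.ceil(elements / daily_limit)

# Assumed inputs: 42 neighborhoods, 2 runs (one driving, one transit)
elements, days = days_needed(n_places=42, n_runs=2)
print(elements, days)
```

Under these assumptions a full collection exceeds a single day's allowance, which is consistent with spreading the calls across several days.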

#### Notebook Setup

In [1]:
#Change notebook settings
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#Import necessary libraries
import pandas as pd
import numpy as np
import requests
from datetime import datetime, timedelta
import json
import folium

#### Transit Community Data Processing

The code block below imports the Transit communities dataset and performs basic data cleaning operations. A sample of the dataset is provided as an output of the code block. 

The changes to neighborhood names are designed to produce the most accurate response possible from Google Maps API. The string " Seattle, WA" is added to improve location precision. More specific neighborhood name changes were implemented after trial and error with responses from the API.

In [2]:
#Import raw transit community data with neighborhood names
raw_transit_communities = pd.read_csv('data/Transit_communities.csv')
raw_transit_communities.head()

#Clean the neighborhood name data
hood_list = list(raw_transit_communities['Name'])

#Clean neighborhood names to be more clear
idx = hood_list.index('Campus Parkway - combined with University District?')
hood_list[idx] = 'Campus Parkway'
idx = hood_list.index('12th Ave? (First Hill)')
hood_list[idx] = 'First Hill'

#Clean neighborhood for improved accuracy based off API responses
idx = hood_list.index('23rd & Jackson')
hood_list[idx] = '23rd Ave S and S Jackson St'
idx = hood_list.index('Denny')
hood_list[idx] = 'Denny Park'

#Add Seattle, WA for better location accuracy
city_string = " Seattle, WA"
hood_list = [x + city_string for x in hood_list]

Unnamed: 0,Name,Typology,Current Conditions,Future Needs & Priorities,District,Neighborhood Plan,Status Check,City Planning,Major Transit project?
0,University District,Mixed Use Center future,Has a core employment population,,Northeast,http://www.seattle.gov/neighborhoods/npi/plans...,http://www.seattle.gov/planningcommission/docs...,,http://projects.soundtransit.org/Projects-Home...
1,Ballard,Mixed use Center local,"High population, S edge is industrial",Ballard is a “new urban center” Combine distri...,Ballard,http://www.seattle.gov/neighborhoods/npi/plans...,http://www.seattle.gov/planningcommission/docs...,http://www.seattle.gov/dpd/Planning/BallardURV...,http://www.kingcounty.gov/transportation/kcdot...
2,Fremont,Mixed Use Center local,24 hr neighborhood,,Lake Union,http://www.seattle.gov/neighborhoods/npi/plans...,http://www.seattle.gov/planningcommission/docs...,http://www.seattle.gov/dpd/Planning/FremontUrb...,
3,Campus Parkway - combined with University Dist...,Mixed Use Center regional,Major gateway to UW with many admin buildings ...,Focus on this district being a transit node wi...,Northeast,,,,
4,Northgate,Mixed Use Center regional,Regional draw; medical and retail employment c...,,North,http://www.seattle.gov/neighborhoods/npi/plans...,,http://www.seattle.gov/DPD/Planning/Northgate_...,http://projects.soundtransit.org/Projects-Home...


#### Google Maps Geocoding API Data Collection

In this code block, I ran the list of neighborhoods through the Google Maps Geocoding API. This is a basic quality check to make sure none of the neighborhood locations are egregiously wrong. Users can request an API key and find more detailed documentation at the link below:

https://developers.google.com/maps/documentation/geocoding/start

Alternatively, the code block below the API calls allows users to load the geocoding results from the CSV file on GitHub.

In [21]:
#Define get_geodata as API call function
geo_key = ''
#Users enter your API key above

def get_geodata(locations, key = geo_key):
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    #Pass the address via params so requests URL-encodes spaces and special characters
    response = requests.get(url, params = {'address': locations, 'key': key}).json()
    return response

In [18]:
#Run neighborhood list through the geocoding API, save name input, returned API name, latitude and longitude.
geo_names = []
geo_lat = []
geo_long = []
for hood in hood_list:
    geo_json = get_geodata(hood)
    geo_names.append(geo_json['results'][0]['address_components'][0]['long_name'])
    geo_lat.append(geo_json['results'][0]['geometry']['location']['lat'])
    geo_long.append(geo_json['results'][0]['geometry']['location']['lng'])
geo_df = pd.DataFrame(data = {'name': geo_names,
                              'name_input' : hood_list,
                             'lat' : geo_lat,
                             'long' : geo_long})
geo_df.head()

#Save geolocations as CSV file
geo_df.to_csv('data/hood_geocodes.csv', index = False)

Unnamed: 0,lat,long,name,name_input
0,47.662777,-122.313877,University District,"University District Seattle, WA"
1,47.679217,-122.386031,Ballard,"Ballard Seattle, WA"
2,47.654177,-122.35,Fremont,"Fremont Seattle, WA"
3,47.656011,-122.315588,Northeast Campus Parkway,"Campus Parkway Seattle, WA"
4,47.70859,-122.323235,Northgate,"Northgate Seattle, WA"


In [3]:
#Upload geocoding from CSV file
geo_df = pd.read_csv('data/hood_geocodes.csv')

In the step below, the geocoding results are mapped using the folium package. It appears all the neighborhoods entered were located in the approximately correct region.

In [4]:
locations = geo_df[['lat', 'long']]
locationlist = locations.values.tolist()
f_map = folium.Map(location=[47.6479021,-122.3549198], zoom_start=11)
folium.TileLayer('Mapbox Bright').add_to(f_map)
for point in range(0, len(locationlist)):
    folium.Marker(locationlist[point], popup=geo_df['name_input'][point]).add_to(f_map)
f_map
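The visual check on the map can be complemented with a simple automated one. The sketch below flags geocoded points that fall outside a rough bounding box around Seattle. The bounds are approximate values chosen for illustration rather than an official city boundary, the helper `flag_outside_bounds` is hypothetical, and the third sample row is a made-up bad match.

```python
import pandas as pd

# Rough, assumed bounds for the Seattle city area (not an official boundary)
LAT_MIN, LAT_MAX = 47.48, 47.75
LONG_MIN, LONG_MAX = -122.46, -122.22

def flag_outside_bounds(geo_df):
    """Return rows whose coordinates fall outside the assumed bounding box."""
    inside = (geo_df['lat'].between(LAT_MIN, LAT_MAX)
              & geo_df['long'].between(LONG_MIN, LONG_MAX))
    return geo_df[~inside]

# Two rows copied from the geocoding output above, plus one made-up bad result
sample = pd.DataFrame({
    'name_input': ['Ballard Seattle, WA', 'Northgate Seattle, WA', 'Hypothetical bad match'],
    'lat': [47.679217, 47.70859, 39.0997],
    'long': [-122.386031, -122.323235, -94.5786],
})
suspect = flag_outside_bounds(sample)
print(suspect['name_input'].tolist())
```

A check like this only catches results far outside the city; a point placed in the wrong part of Seattle would still need manual review.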

#### Google Maps Distance Matrix Data

The code blocks below collect the data from the Google Maps Distance Matrix API. 

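The API limits the size of each request to 100 elements, where the number of elements is the number of origins multiplied by the number of destinations. Each call here therefore uses at most 10 origins and 10 destinations. Thus, the list of neighborhoods is split into 5 separate parts and saved as a dictionary.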

In [67]:
#Split into lists of size 10 for inputting into API
list_size = 10
#Round up so a final partial list is included without creating an empty one
list_num = -(-len(hood_list)//list_size)
hood_dict = {}
for i in range(0,list_num):
    #Slicing past the end of a list is safe, so no bounds check is needed
    hood_dict['hood{0}'.format(i)] = hood_list[list_size*i:list_size*(i+1)]

The function get_data calls the Distance Matrix API and returns the json response. The arguments required for the function include the list of origins, destinations, mode, and departure time.

In [24]:
dm_key = ''
#Users enter their Distance Matrix API key above

def get_data(origins, destinations, mode, departure_time, key = dm_key):
    url = "https://maps.googleapis.com/maps/api/distancematrix/json"
    #Pass the query parameters via params so requests URL-encodes them
    params = {'origins': origins,
              'destinations': destinations,
              'mode': mode,
              'departure_time': departure_time,
              'language': 'en-EN',
              'key': key}
    response = requests.get(url, params = params).json()
    return response
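To make the shape of the JSON response easier to follow, the sketch below uses a small hand-written dictionary with the same top-level fields the parsers in this notebook read (`origin_addresses`, `destination_addresses`, `rows`, `elements`). All values are made up, and the helper `element_seconds` is hypothetical and not part of the original analysis.

```python
# Hand-written sample in the shape of a Distance Matrix response; all values are made up
sample_response = {
    'origin_addresses': ['Ballard, Seattle, WA, USA'],
    'destination_addresses': ['Fremont, Seattle, WA, USA', 'Northgate, Seattle, WA, USA'],
    'rows': [
        {'elements': [
            {'status': 'OK',
             'distance': {'text': '3.9 km', 'value': 3900},
             'duration': {'text': '10 mins', 'value': 600},
             'duration_in_traffic': {'text': '12 mins', 'value': 720}},
            {'status': 'ZERO_RESULTS'},
        ]},
    ],
    'status': 'OK',
}

def element_seconds(response, i, j, field='duration'):
    """Return the travel time in seconds for origin i and destination j, or None if unavailable."""
    element = response['rows'][i]['elements'][j]
    if element.get('status') != 'OK':
        return None
    return element[field]['value']

print(element_seconds(sample_response, 0, 0))
print(element_seconds(sample_response, 0, 0, field='duration_in_traffic'))
print(element_seconds(sample_response, 0, 1))
```

Each row corresponds to one origin and each element within it to one destination, so the element at position (i, j) describes the trip from origin i to destination j.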

The code block below calls the Distance Matrix API to collect data with driving set as the mode. The output files are saved as JSON files.

In [33]:
#Set the departure time, enter a date by filling in the variables below with integers.
#Note that results will differ depending on the day of the week
year = 2018
month = 12
day = 10

#Departure time saved here, set for 8:30am
departure_time = str(int(datetime(year, month, day,8,30,0).timestamp()))

json_dict = {}
for i in range(list_num):
    for j in range(list_num):
        #Join each list of neighborhoods into one string separated by '|'
        #(joining leaves the lists in hood_dict unchanged between iterations)
        list1_str = '|'.join(hood_dict['hood{0}'.format(i)])
        list2_str = '|'.join(hood_dict['hood{0}'.format(j)])

        #Call Distance Matrix API
        response = get_data(list1_str, list2_str, mode = 'driving', departure_time = departure_time)
        json_dict['json_drive{0}{1}'.format(i,j)] = response
        with open('json/json_drive_peak{0}{1}.json'.format(i,j), 'w') as outfile:
            json.dump(response, outfile)
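The departure time in the cell above is converted to a Unix timestamp using the local time zone of the machine running the notebook, so the same code run elsewhere would request a different moment. A minimal sketch of pinning the time zone explicitly is shown below. It assumes Pacific Standard Time (UTC-8) applies on the chosen date, which holds in December but not during daylight saving time, and the helper name `departure_epoch` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Standard Time (UTC-8) is in effect, as it is in December
PST = timezone(timedelta(hours=-8))

def departure_epoch(year, month, day, hour=8, minute=30, tz=PST):
    """Return the departure time as a string of whole seconds since the Unix epoch."""
    return str(int(datetime(year, month, day, hour, minute, tzinfo=tz).timestamp()))

departure_time = departure_epoch(2018, 12, 10)
print(departure_time)
```

The returned string can be passed as the `departure_time` argument in the same way as the value computed above.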

The drive_parser function below parses the responses from the driving API calls and returns them as a dataframe for analysis.

In [77]:
def drive_parser(response):
    """This function takes a json response from the Google Maps Distance Matrix API
    and parses the data, returning a dataframe of the drive times and drive distances"""
    #Keep only the text before the first comma, which drops the city and state
    origins = [address.split(',')[0] for address in response['origin_addresses']]
    destinations = [address.split(',')[0] for address in response['destination_addresses']]
    o_size = len(origins)
    d_size = len(destinations)
    #Repeat each origin once per destination, then repeat the destination sequence once per origin
    origins_list = list(np.repeat(origins, d_size))
    destinations_list = destinations*o_size
    drive_dur = []
    drive_dist = []
    drive_dur_traf = []
    for i in range(o_size):
        for j in range(d_size):
            #Each element holds the result for one origin-destination pair
            element = response['rows'][i]['elements'][j]
            drive_dur.append(element['duration']['value'])
            drive_dist.append(element['distance']['value'])
            drive_dur_traf.append(element['duration_in_traffic']['value'])
    distance_matrix = pd.DataFrame(data = {'orig': origins_list,
                                           'dest': destinations_list,
                                           'd_time': drive_dur,
                                           'd_dist': drive_dist,
                                           'd_dur_traf': drive_dur_traf})
    return distance_matrix

In [None]:
drive_df_all = pd.concat([drive_parser(json_dict[key]) for key in json_dict.keys()],ignore_index=True)

The code block below calls the Distance Matrix API to collect data with transit set as the mode. The output files are saved as JSON files.

In [22]:
#Set the departure time, enter a date by filling in the variables below with integers.
#Note that results will differ depending on the day of the week
year = 2018
month = 12
day = 10

departure_time = str(int(datetime(year, month, day,8,30,0).timestamp()))
#Collect transit data, run with the departure time above for 8:30 peak
json_dict = {}
for i in range(list_num):
    for j in range(list_num):
        #Join each list of neighborhoods into one string separated by '|'
        #(joining leaves the lists in hood_dict unchanged between iterations)
        list1_str = '|'.join(hood_dict['hood{0}'.format(i)])
        list2_str = '|'.join(hood_dict['hood{0}'.format(j)])
        response = get_data(list1_str, list2_str, mode = 'transit', departure_time = departure_time)
        json_dict['json_transit{0}{1}'.format(i,j)] = response
        with open('json_transit{0}{1}.json'.format(i,j), 'w') as outfile:
            json.dump(response, outfile)

The transit_parser function below parses the responses from the transit API calls and returns them as a dataframe for analysis.

In [None]:
def transit_parser(response):
    """This function takes a json response from the Google Maps Distance Matrix API
    and parses the data, returning a dataframe of the transit times, distances, and fares"""
    #Keep only the text before the first comma, which drops the city and state
    origins = [address.split(',')[0] for address in response['origin_addresses']]
    destinations = [address.split(',')[0] for address in response['destination_addresses']]
    o_size = len(origins)
    d_size = len(destinations)
    #Repeat each origin once per destination, then repeat the destination sequence once per origin
    origins_list = list(np.repeat(origins, d_size))
    destinations_list = destinations*o_size
    transit_dur = []
    transit_dist = []
    transit_fare = []
    for i in range(o_size):
        for j in range(d_size):
            element = response['rows'][i]['elements'][j]
            if element['status'] == 'OK':
                transit_dur.append(element['duration']['value'])
                transit_dist.append(element['distance']['value'])
                #The fare field is not always returned; record 0 when it is absent
                try:
                    transit_fare.append(element['fare']['value'])
                except KeyError:
                    transit_fare.append(0)
            else:
                #No route was returned for this pair, so record missing values
                transit_dur.append(np.nan)
                transit_dist.append(np.nan)
                transit_fare.append(np.nan)
    distance_matrix = pd.DataFrame(data = {'orig': origins_list,
                                           'dest': destinations_list,
                                           't_time': transit_dur,
                                           't_dist': transit_dist,
                                           't_fare': transit_fare})
    return distance_matrix

Next, the JSON files from the transit API calls are parsed and combined into one dataframe.

In [89]:
transit_df_all = pd.concat([transit_parser(json_dict[key]) for key in json_dict.keys()],ignore_index=True)

The API returns the place names as resolved by Google Maps rather than the names that were entered. The original input names are added back to the dataframe using the for loops below.

In [83]:
#Remake neighborhood list without city strings
hood_list = list(raw_transit_communities['Name'])
idx = hood_list.index('Campus Parkway - combined with University District?')
hood_list[idx] = 'Campus Parkway'
idx = hood_list.index('12th Ave? (First Hill)')
hood_list[idx] = 'First Hill'

list_size = 10
#Round up so a final partial list is included without creating an empty one
list_num = -(-len(hood_list)//list_size)
hood_dict = {}
for i in range(0,list_num):
    hood_dict['hood{0}'.format(i)] = hood_list[list_size*i:list_size*(i+1)]

#Create list of input names into API to add to dataframe
origins = []
destinations = []
for o_list in hood_dict.keys():
    for d_list in hood_dict.keys():
        for o in hood_dict[o_list]:
            for d in hood_dict[d_list]:
                origins.append(o)
                destinations.append(d)

Finally, the drive and transit dataframes are combined and saved as the Seattle Distance Matrix csv file.

In [89]:
df = pd.merge(drive_df_all, transit_df_all, how = 'inner', on = ['orig', 'dest'])
df['orig_name'] = origins
df['dest_name'] = destinations

#Reorder columns in dataset
df = df[['orig_name', 'dest_name', 'd_time', 
         'd_dur_traf', 'd_dist', 't_time', 
         't_dist', 't_fare']]

df = df[df['orig_name'] != '23rd & Jackson']
df = df[df['dest_name'] != '23rd & Jackson']
df = df[df['d_time'] != 0]

df.to_csv('Seattle_Distance_Matrix.csv', index = False)
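Assigning the input names by position after the merge relies on each origin-destination pair appearing exactly once in both dataframes. One way to guard against that assumption failing is the `validate` argument of `pd.merge`, sketched below on toy data. The frames and values are made up for illustration, and this check was not part of the original analysis.

```python
import pandas as pd

# Toy frames standing in for the parsed driving and transit results (values are made up)
drive_toy = pd.DataFrame({'orig': ['A', 'A', 'B'], 'dest': ['B', 'C', 'C'],
                          'd_time': [600, 900, 700]})
transit_toy = pd.DataFrame({'orig': ['A', 'A', 'B'], 'dest': ['B', 'C', 'C'],
                            't_time': [1200, 1500, 2100]})

# validate='one_to_one' raises pandas.errors.MergeError if a key pair is duplicated on either side
merged = pd.merge(drive_toy, transit_toy, how='inner', on=['orig', 'dest'],
                  validate='one_to_one')

# Show that a duplicated origin-destination pair is caught instead of silently adding rows
dup = pd.concat([transit_toy, transit_toy.iloc[[0]]], ignore_index=True)
try:
    pd.merge(drive_toy, dup, how='inner', on=['orig', 'dest'], validate='one_to_one')
    duplicate_caught = False
except pd.errors.MergeError:
    duplicate_caught = True

print(len(merged), duplicate_caught)
```

If two input neighborhoods resolve to the same place name in the API response, a check like this would surface the problem before the name columns are attached.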

### Findings

Alternatively, users can load the Seattle Distance Matrix file for their own analysis. The code block below loads the CSV file and creates the key metrics used to assess mobility. These metrics include:

* Percent Difference between Transit and Driving Times 

*(Transit Time - Drive Time) / Drive Time*

* Percent Difference between Driving Time in Peak vs Non-Peak 

*(Peak Time - Non-Peak Time) / Non-Peak Time*

Summary statistics for each variable are shown so users can compare their own API calls with the data collected from this analysis. 

In the limited scope of this analysis, I assessed travel times of trips originating in neighborhoods during the morning commute hours. I chose to assess origins to be more indicative of access during the time of day when most are commuting to their places of employment. Neighborhoods with longer travel times via public transit and higher traffic congestion have lower mobility scores and could be identified as higher priority areas in terms of transit expansion.

The findings of the analysis are summarized and discussed below. For each metric, tables of the top 10 and bottom 10 neighborhoods are presented. Additional interactive map visualizations are provided in a Tableau Public workbook available here:

https://public.tableau.com/profile/daniel.white8128#!/vizhome/Seattle_Mobility_Map_Visualization_09Dec/RegionMap


In [98]:
#Read in Seattle Distance Matrix
df = pd.read_csv('Seattle_Distance_Matrix.csv')

#Create variables for analysis

#Traffic in peak vs. non-peak
df['traffic_diff'] = df['d_dur_traf'] - df['d_time']
df['traffic_diff_percent'] = df['traffic_diff'] / df['d_time']

#Time difference between transit and driving
df['time_diff'] = (df['t_time'] - df['d_time'])
df['time_diff_percent'] = df['time_diff'] / df['d_time']

df.describe()

Unnamed: 0,d_time,d_dur_traf,d_dist,t_time,t_dist,t_fare,traffic_diff,traffic_diff_percent,time_diff,time_diff_percent
count,1640.0,1640.0,1640.0,1640.0,1640.0,1640.0,1640.0,1640.0,1640.0,1640.0
mean,980.787195,1181.662805,10976.111585,2544.093902,11527.753049,3.355335,200.87561,0.185336,1563.306707,1.591514
std,361.504035,510.311551,6392.536966,1112.171506,6397.724499,1.351872,220.049427,0.190762,832.783376,0.698797
min,58.0,58.0,140.0,113.0,140.0,0.0,-65.0,-0.122024,-164.0,-0.316703
25%,725.0,795.5,6056.75,1708.5,6300.5,2.75,32.75,0.042476,926.5,1.139933
50%,968.0,1124.5,10148.5,2490.0,11044.0,2.75,105.0,0.115562,1505.5,1.536654
75%,1224.0,1522.25,15150.75,3348.0,15771.0,2.75,314.25,0.289416,2114.0,1.968207
max,2049.0,2841.0,32535.0,7102.0,32235.0,8.25,1015.0,1.053801,5336.0,4.248175
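To make the two metrics concrete, here is a small worked example on a single made-up trip. The helper `percent_difference` is hypothetical, and the travel times are illustrative values rather than results from the dataset.

```python
def percent_difference(value, baseline):
    """Return (value - baseline) / baseline."""
    return (value - baseline) / baseline

# Made-up trip: 10 minutes driving without traffic, 13 minutes in traffic, 20 minutes by transit
drive_s, drive_traffic_s, transit_s = 600, 780, 1200

time_diff_percent = percent_difference(transit_s, drive_s)
traffic_diff_percent = percent_difference(drive_traffic_s, drive_s)
print(time_diff_percent, traffic_diff_percent)
```

A transit trip that takes twice as long as driving gives a transit metric of 1.0, and a 30 percent slowdown in traffic gives a traffic metric of 0.3.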


#### Transit vs. Driving Time

The code block below shows the 10 neighborhoods with the lowest and the 10 with the highest percent difference between transit and driving time. Neighborhoods with a lower percent difference are more accessible, as the gap in travel time between public transit and driving is smaller. A value of 1.0 indicates that traveling via public transit takes about twice as long as driving when leaving from that neighborhood. A map visualization of this data is available here:

https://public.tableau.com/profile/daniel.white8128#!/vizhome/Seattle_Mobility_Map_Visualization_09Dec/Drivingvs_Transit

As the tables and map visualizations show, neighborhoods with the lowest mobility are highly concentrated in North Seattle, where public transit options are relatively limited. However, this area already has planned Link expansions opening in 2021 and 2024. These should be vital in better connecting these neighborhoods to the rest of the Seattle metropolitan region.

The impact of Link access is readily apparent in South Seattle. Othello and Columbia City were amongst the top neighborhoods in public transit access. However, this is likely due to some bias in the neighborhood selection. These neighborhoods were located at their respective Link train station, which likely underestimates the true public transit travel time for the whole neighborhood. Leaving directly from the train station will result in faster travel times via public transit.

Despite geographic constraints, neighborhoods in West Seattle were roughly average in terms of their transit access. However, Admiral was one of the worst performing neighborhoods in terms of transit access, which could identify a need for additional bus lines, etc. to better connect this neighborhood.

Most neighborhoods located downtown scored very favorably in terms of transit access, with the exception of South Lake Union. Public transit trips from South Lake Union took approximately 3 times as long as driving. This is likely due to South Lake Union's location between the major highway arteries SR 99 and I-5, which lowers the travel time via driving to other locales. In addition, most public transit trips from South Lake Union require multiple bus trips, often first to downtown, then another bus towards the desired destination. There is also a great deal of construction in this area, which could impede public transit access.

In [95]:
orig_time = pd.DataFrame(df.groupby('orig_name', as_index = False)['time_diff_percent'].mean())
orig_time.sort_values(by='time_diff_percent').head(10)
orig_time.sort_values(by='time_diff_percent', ascending=False).head(10)

Unnamed: 0,orig_name,time_diff_percent
3,Belltown,0.918217
9,Columbia City Station,0.990443
8,Colman Dock,1.071515
28,Othello Station,1.112243
22,Mt. Baker Station,1.126777
17,King Street Station,1.148791
23,North Beacon Hill,1.176454
2,Ballard,1.232921
34,Stadium,1.255004
5,Campus Parkway,1.317407


Unnamed: 0,orig_name,time_diff_percent
4,Broadview,2.655346
1,Admiral,2.4116
33,South Lake Union,2.186488
31,Roosevelt,2.100943
30,Rainier Beach,2.005881
14,Greenwood,1.974648
24,North Green Lake,1.948391
26,Northgate,1.927443
25,North Greenwood,1.912899
27,Oaktree,1.850523


#### Driving Time, Peak vs. Non-Peak
The code block below shows the 10 neighborhoods with the lowest and the 10 with the highest percent difference between drive time during peak and non-peak hours. Neighborhoods with a lower percent difference have fewer problems with traffic congestion during the morning commute. A map visualization of the peak vs. non-peak drive times is available here:

https://public.tableau.com/profile/daniel.white8128#!/vizhome/Seattle_Mobility_Map_Visualization_09Dec/Peakvs_Non-Peak

The neighborhoods with the most traffic congestion delays during the morning commute were located in West Seattle and South Seattle. For neighborhoods in West Seattle, this is likely due to geographic and infrastructure constraints, as there is only one bridge to access central Seattle, where a significant portion of the destination neighborhoods are located. Also, there could be some bias in that the neighborhoods with the worst traffic congestion also have to travel the greatest distance to other neighborhoods. Traffic delays can compound the longer you are driving, so the percent difference metric may not fully capture this. This could be addressed by controlling for distance in future studies.

However, this issue did not impact neighborhoods located in North Seattle, which had roughly average traffic congestion across all neighborhoods. This is likely because the I-5 express lanes run in the southbound direction during the morning commute. It was very interesting to see this tradeoff between North and South Seattle. North Seattle has lower transit access but benefits from express lanes during peak commuting hours, while South Seattle benefits from Link access but has less accommodation in terms of express lanes.

Overall, the neighborhoods with the least traffic congestion were located centrally in the downtown area. This makes sense, as there is less traffic when leaving downtown during the morning commute. Therefore, one shouldn't draw any significant conclusions from this finding.

In [97]:
orig_traf_time = pd.DataFrame(df.groupby('orig_name', as_index = False)['traffic_diff_percent'].mean())
orig_traf_time.sort_values(by='traffic_diff_percent').head(10)
orig_traf_time[(orig_traf_time['traffic_diff_percent'] != np.inf)].sort_values(by='traffic_diff_percent', ascending = False).head(10)

Unnamed: 0,orig_name,traffic_diff_percent
29,Pike/Pine,0.017295
8,Colman Dock,0.035994
3,Belltown,0.036758
20,Madison,0.057096
6,Capitol Hill,0.062909
34,Stadium,0.065452
12,First Hill,0.066195
0,15th Ave,0.073265
40,Yesler Terrace,0.082032
33,South Lake Union,0.088574


Unnamed: 0,orig_name,traffic_diff_percent
1,Admiral,0.552148
37,West Seattle Junction/Triangle,0.505638
28,Othello Station,0.446349
30,Rainier Beach,0.424069
21,Morgan Junction,0.411806
32,Roxbury,0.369485
9,Columbia City Station,0.268712
26,Northgate,0.24654
11,Denny,0.237333
31,Roosevelt,0.234394
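The discussion above suggests controlling for distance in future studies. One possible normalization, shown here only as a sketch and not as the method used in this analysis, is to express the peak-hour delay in seconds per kilometer driven. The rows below are made up, but the column names match the distance matrix, with times in seconds and distances in meters.

```python
import pandas as pd

# Made-up rows using the same column names as the distance matrix (seconds and meters)
toy = pd.DataFrame({
    'orig_name': ['Near', 'Near', 'Far', 'Far'],
    'd_time': [300, 600, 1200, 1500],
    'd_dur_traf': [360, 720, 1500, 1800],
    'd_dist': [2000, 4000, 15000, 20000],
})

# Seconds of peak-hour delay per kilometer driven
toy['delay_per_km'] = (toy['d_dur_traf'] - toy['d_time']) / (toy['d_dist'] / 1000)
by_origin = toy.groupby('orig_name', as_index=False)['delay_per_km'].mean()
print(by_origin)
```

In this toy data the origin with longer trips has the larger absolute delay but the smaller delay per kilometer, which is the kind of difference a distance-adjusted metric is meant to expose.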


An additional visualization on how the peak vs. non-peak and transit vs. driving metrics interact is available here:

https://public.tableau.com/profile/daniel.white8128#!/vizhome/Seattle_Mobility_Map_Visualization_09Dec/InteractionBetweenMetrics

A map of how regions were assigned is available here:

https://public.tableau.com/profile/daniel.white8128#!/vizhome/Seattle_Mobility_Map_Visualization_09Dec/RegionMap

Overall, neighborhoods in West Seattle performed the worst in terms of these two metrics, highlighting a need for better access in spite of the geographic constraints.

### Discussion / Implications

Overall, the Distance Matrix API provided some interesting insights into the transit network. The challenges of connecting West Seattle are evident given the geographic constraints. The tradeoff between transit access and express lanes was interesting to see between North and South Seattle. However, there are significant limitations that should be discussed.

**Location Accuracy**

The location accuracy of the Google Maps Distance Matrix API was very inconsistent. Some neighborhoods were excluded from my analysis; 23rd and Jackson, for example, was dropped because the API located it in Kansas City, despite the 'Seattle, WA' designation. Additionally, the Denny neighborhood was resolved to an actual Denny's diner in the Industrial District instead of the Denny neighborhood near downtown. These are only two of the issues I was able to diagnose, as checking the accuracy of the API's results is very difficult without time-consuming trial and error. If conducting this analysis again, I would enter the neighborhood locations as geocodes (latitude/longitude coordinates) to ensure precision.
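
A request that passes coordinates instead of place names might look like the sketch below. It only builds the request URL and does not call the API. The coordinates are approximate placeholders rather than verified neighborhood centers, and the helper names (`format_points`, `build_request_url`), the timestamp, and the API key are hypothetical.

```python
from urllib.parse import urlencode

# Public JSON endpoint of the Distance Matrix API.
DISTANCE_MATRIX_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"

# Approximate placeholder coordinates, NOT verified neighborhood centers.
NEIGHBORHOOD_COORDS = {
    "23rd and Jackson": (47.5993, -122.3026),
    "Denny": (47.6185, -122.3380),
}


def format_points(points):
    """Join (lat, lng) pairs into a pipe-separated 'lat,lng|lat,lng' string."""
    return "|".join(f"{lat:.6f},{lng:.6f}" for lat, lng in points)


def build_request_url(origins, destinations, mode, departure_time, api_key):
    """Build a Distance Matrix request URL from coordinate pairs."""
    params = {
        "origins": format_points(origins),
        "destinations": format_points(destinations),
        "mode": mode,  # e.g. "driving", "transit", or "bicycling"
        "departure_time": departure_time,  # seconds since the Unix epoch
        "key": api_key,
    }
    return f"{DISTANCE_MATRIX_URL}?{urlencode(params)}"


url = build_request_url(
    origins=[NEIGHBORHOOD_COORDS["23rd and Jackson"]],
    destinations=[NEIGHBORHOOD_COORDS["Denny"]],
    mode="transit",
    departure_time=1893513600,  # placeholder timestamp
    api_key="YOUR_API_KEY",  # placeholder key
)
```

Because coordinates are unambiguous, this would avoid the name-resolution errors described above, at the cost of choosing a point for each neighborhood by hand.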

Also, the actual location of each neighborhood is somewhat arbitrary. The Distance Matrix API calculates travel times from a single point, and it is impossible to determine which point is most representative of a neighborhood as a whole. For large neighborhoods, a single point may not be an appropriate means of assessing access for the entire population.

**Neighborhood Selection**

The transit communities dataset was selected in the hopes of avoiding bias. However, bias is still present in the set of neighborhoods included, and it greatly impacts the results. For instance, many of the neighborhoods in South Seattle are located at Link stations. This likely overstated the access of these areas, since public transit trips started from train stations. This was not the case for other neighborhoods that also have Link stations but were not located precisely at the train station.

Also, the neighborhood sample may not represent all regions evenly. If more neighborhoods from North Seattle are included, for example, the average distance that other neighborhoods have to travel increases by comparison. It is difficult to assess the extent of this bias in this analysis, but I acknowledge that it may exist.

**Future Research**

Future research could build on this study by correcting some of the location accuracy issues and reducing bias in the neighborhood selection. I would recommend using geocodes to determine exact neighborhood locations and developing a systematic way of identifying the activity centers that are most representative of mobility around Seattle. Developing a representative sample of neighborhoods could be a challenge in itself. I was limited by the number of destinations I could input into the API on the free tier. It would be interesting to see how the results change with a larger dataset, which could also help reduce bias in the sample.

I only assessed neighborhoods in terms of trip origin. Trip destinations could also be assessed with this dataset. Additionally, users could enter their own API calls varying the time of day to see how that impacts mobility. I would expect to see very different results for traffic congestion during evening hours compared to the morning commute.
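
Varying the time of day comes down to changing the `departure_time` value sent with each request, which the API takes as seconds since the Unix epoch. The sketch below generates timestamps for a morning and an evening departure. The helper name `departure_times`, the sample date, and the chosen hours are hypothetical, and it assumes a fixed UTC-8 offset (Pacific Standard Time), so daylight saving time is ignored.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: fixed UTC-8 offset (Pacific Standard Time); daylight saving is ignored.
PACIFIC_STANDARD = timezone(timedelta(hours=-8))


def departure_times(day, hours):
    """Map each local hour on `day` to a Unix timestamp in whole seconds."""
    return {
        hour: int(
            datetime(day.year, day.month, day.day, hour, tzinfo=PACIFIC_STANDARD).timestamp()
        )
        for hour in hours
    }


# Hypothetical future weekday; 8:00 and 17:00 are illustrative peak hours.
sample_day = date(2030, 1, 15)
times = departure_times(sample_day, hours=[8, 17])
```

Each timestamp could then be passed as `departure_time` in otherwise identical requests, so that morning and evening travel times are compared for the same origin and destination pairs.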

It would also be interesting to see how this approach could be extrapolated to different cities. Some of the same issues with bias would arise, but this would be valuable in assessing whether the Google Maps Distance Matrix API can be used to assess mobility.

### Conclusion

The answers to my initial research questions are summarized below.

**1\. Can the Google Maps Distance Matrix API be used to effectively assess mobility?**

The Google Maps Distance Matrix API provided a decent overall picture of the Seattle transit network. There are issues with location accuracy, but the API provides flexible parameters that allow users to assess drive times, public transit times, and even biking times if they are so inclined. This could be a useful tool for researchers conducting their own analyses. It can give a good high-level overview of mobility and is much cheaper and less analytically intensive than the traffic demand models generated by transit officials.

**2\. Which neighborhoods in Seattle are underserved by the public transit network?**

According to the results, South Lake Union stood out as a neighborhood that was underserved compared to its neighboring counterparts. Most neighborhoods located in North Seattle also appear to be underserved, but should benefit from Link access as soon as 2021. The neighborhood of Admiral in West Seattle had particularly bad transit access. 

**3\. Which neighborhoods in Seattle suffer from the worst traffic congestion on the morning commute?**

During the morning commute hours, West Seattle and South Seattle suffered from the worst traffic congestion delays. For West Seattle, this is likely due to geographic constraints and limited access roads to central Seattle. Neighborhoods in North Seattle had lower traffic congestion delays than South Seattle, which is likely due to the express lanes on I-5 running southbound during the morning commute.

### References
1. Tableau Public - Seattle Mobility (https://public.tableau.com/profile/daniel.white8128#!/vizhome/Seattle_Mobility_Map_Visualization_09Dec/RegionMap)
2. Seattle Mobility Index Project (https://escience.washington.edu/2018-data-science-for-social-good-projects/)
3. Seattle Mobility Index Project Presentation (http://escience.washington.edu/wp-content/uploads/2018/09/Seattle-Mobility-Index-Project-final-presentation.pdf)
4. Google Maps Distance Matrix API (https://developers.google.com/maps/documentation/distance-matrix/start)
5. Google Maps Geocoding API (https://developers.google.com/maps/documentation/geocoding/start)
6. Google Maps Terms of Service Agreement (https://cloud.google.com/maps-platform/terms/)
7. Transit Communities - Seattle Open Data (https://data.seattle.gov/Transportation/Transit-communities/ndi9-2pye/data)