# CP255: Using Pandana and UrbanAccess to Measure Multi-Modal Accessibility with Pedestrian and Transit Networks

Created by Sam Blanchard (blanchard@urbansim.com) (11/6/18)

This notebook provides a brief overview of the basic functionality of Pandana and UrbanAccess.

Topics that will be covered are the structure of Pandana graph networks, the acquisition of General Transit Feed Specification (GTFS) and OpenStreetMap (OSM) street network data, and network analysis using nearest feature and cumulative accessibility queries on pedestrian and transit Pandana networks.

**Pandana**

https://github.com/UDST/pandana

Pandana is a Python and C++ network analysis tool that can compute network accessibility using: 1) shortest path queries between origins and destinations (ODs) for any number of nodes within a search radius; and 2) aggregation queries using a cumulative opportunities accessibility method.  

A variety of statistics can be used, including sum, average, standard deviation, and count, along with a number of distance decay functions such as linear and exponential. Pandana requires: 1) a set of OD node coordinates (e.g. based on addresses or Census block centroids) between which accessibility will be computed, which can carry variables of interest, such as socioeconomic data or business establishments, that can be queried or aggregated; and 2) a network of nodes and weighted edges used for network routing. The OD nodes are connected to the nearest node in the graph network.

Pandana calculates the shortest path (i.e. the lowest cost path) between ODs over a hierarchical network using the contraction hierarchies algorithm.

`Fletcher Foti and Paul Waddell. 2014. "A Generalized Computational Framework for Accessibility: From the Pedestrian to the Metropolitan Scale"`

**UrbanAccess**

https://github.com/UDST/urbanaccess

UrbanAccess is a Python General Transit Feed Specification (GTFS) data acquisition, processing, and Pandana network creation tool designed to be used in tandem with Pandana for accessibility queries. UrbanAccess includes tools to: 1) connect to and search GTFS data APIs; 2) validate GTFS data; 3) create individual agency or metropolitan scale transit networks; 4) compute headways; and 5) penalize network impedance by transit mode.

`Samuel D. Blanchard and Paul Waddell, 2017, "UrbanAccess: Generalized Methodology for Measuring Regional Accessibility with an Integrated Pedestrian and Transit Network" Transportation Research Record: Journal of the Transportation Research Board, 2653: 35–44.`

**Notes**
    
- GTFS feeds and OSM data are constantly updated. The data in this notebook may change over time which may result in slight differences in results.
- This notebook uses Pandana v0.4.1 and UrbanAccess v0.2.0

## Installation:

In [None]:
!pip install pandana==0.4.1
!pip install urbanaccess==0.2.0

In [None]:
import os
import pandas as pd
import numpy as np

import pandana as pdna
from pandana.loaders import osm
from pandana.utils import reindex

import urbanaccess as ua
from urbanaccess.config import settings
from urbanaccess.gtfsfeeds import feeds
from urbanaccess import gtfsfeeds
from urbanaccess.gtfs.gtfsfeeds_dataframe import gtfsfeeds_dfs
from urbanaccess.network import ua_network, load_network

import matplotlib.pyplot as plt
import matplotlib
from mpl_toolkits.basemap import Basemap

# Note: pyproj v1.9.5.1 Windows builds have an issue where the pyproj_datadir cannot be found.
# Suggest downgrading if possible; otherwise you must edit the package to give it your env Library\\share path

# Note: tables v3.4.4 for py36 on Windows is missing the conda dependency snappy and will fail if snappy is not installed

%matplotlib inline

In [None]:
# For Pandana <= v0.3.0, set the number of unique Pandana networks that will be generated in this session.
# For versions >= v0.4.0 this is not required.
# pdna.network.reserve_num_graphs(5)

In [None]:
# Pandana currently uses deprecated parameters in matplotlib; this hides the warning until it's fixed
import warnings
import matplotlib.cbook
warnings.filterwarnings("ignore",category=matplotlib.cbook.mplDeprecation)

In [None]:
data_dir = 'data'

# #1 Pandana network basics

## Nodes

Pandana nodes consist of a unique "id" with spatial coordinates: `x` (longitude) and `y` (latitude). Nodes are the vertices of a graph network and represent street intersections.

In [None]:
nodes = pd.DataFrame({'id':[1,2,3,4],
                      'x':[-122.302578,-122.177008,-122.181374,-122.184170],
                      'y':[37.560184,37.481747,37.483689,37.484536]})
nodes

Plot the nodes

In [None]:
nodes.plot(x='x',y='y',kind='scatter')

## Edges

Pandana edges consist of a "from" node id column and a "to" node id column, which together denote direction, plus an impedance (weight) column representing the friction of travel between the two nodes. Here the impedance is distance in meters, but it can also be travel time or a utility value. Edges are the connections between nodes and represent streets and pathways.

In [None]:
edges = pd.DataFrame({'from':[1,2,3],
                      'to':[2,3,4],
                      'distance_m':[3000,6000,8000]})
edges

Pandana edges can be either "one way" or "two way". Let's convert the one way edge table above to a two way edge table.

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat works across pandas versions
edges = pd.concat([edges, edges.rename(columns={'from': 'to', 'to': 'from'})]).reset_index(drop=True)
edges
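
To make the difference between one way and two way edge tables concrete, here is a minimal pure-Python sketch that routes over the toy network above. It uses plain Dijkstra rather than the contraction hierarchies algorithm Pandana uses, and the helper names (`build_adjacency`, `shortest_path_costs`) are hypothetical, not part of Pandana.

```python
import heapq

# Toy edge rows from above: (from, to, distance_m), listed in one direction only.
one_way_edges = [(1, 2, 3000), (2, 3, 6000), (3, 4, 8000)]

def build_adjacency(edge_rows, twoway=True):
    """Return {node: [(neighbor, weight), ...]} for a list of edge rows."""
    adjacency = {}
    for u, v, w in edge_rows:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, [])
        if twoway:
            # Add the reverse direction with the same weight.
            adjacency[v].append((u, w))
    return adjacency

def shortest_path_costs(adjacency, source):
    """Plain Dijkstra: lowest total weight from source to every reachable node."""
    costs = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in adjacency[node]:
            new_cost = cost + weight
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs

# Route outward from node 4 under both interpretations of the edge table.
two_way_costs = shortest_path_costs(build_adjacency(one_way_edges, twoway=True), source=4)
one_way_costs = shortest_path_costs(build_adjacency(one_way_edges, twoway=False), source=4)
print(two_way_costs)
print(one_way_costs)
```

With the one way interpretation, node 4 has no outgoing edges, so nothing is reachable from it. This is why the direction convention of an edge table matters when the network is handed to a routing tool.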

# #2  Using UrbanAccess and Pandana to analyze transit+pedestrian networks

# UrbanAccess

## The feeds object

The GTFS `feeds` object is a global `urbanaccess_gtfsfeeds` object that allows you to save and manage information needed to download multiple GTFS feeds. This object is a dictionary of the names of GTFS feeds or agencies and the URLs to use to download the corresponding feeds.

In [None]:
feeds.to_dict()

### Searching for GTFS feeds

You can use the search function to find feeds on the GTFS Data Exchange (Note: the GTFS Data Exchange is no longer being maintained as of Summer 2016 so feeds here may be out of date)

Let's search for feeds for transit agencies in the GTFS Data Exchange that we know serve Oakland, CA: 

1) Bay Area Rapid Transit District (BART) which runs the metro rail service and 

2) AC Transit which runs bus services.

Let's start by finding the feed for the Bay Area Rapid Transit District (BART) by using the search term: `Bay Area Rapid Transit`

In [None]:
gtfsfeeds.search(search_text='Bay Area Rapid Transit',
                 search_field=None,
                 match='contains')

Now that we have seen what can be found on the GTFS Data Exchange, let's run the search again, this time adding the feed from the search to the feed download list.

In [None]:
gtfsfeeds.search(search_text='Bay Area Rapid Transit',
                 search_field=None,
                 match='contains',
                 add_feed=True)

If you know of a GTFS feed located elsewhere or one that is more up to date, you can add additional feeds located at custom URLs by adding a dictionary with the key as the name of the service/agency and the value as the URL.

Let's do this for AC Transit which also operates in Oakland, CA.

The link to their feed is here: http://www.actransit.org/planning-focus/data-resource-center/ and let's get the latest version as of Fall 2018

In [None]:
feeds.add_feed(add_dict={'ac transit': 'http://www.actransit.org/wp-content/uploads/GTFSFall18.zip'})

Note the two GTFS feeds now in your feeds object ready to download

In [None]:
feeds.to_dict()

# Downloading GTFS data

Use the download function to download all the feeds in your feeds object at once. If no parameters are specified the existing feeds object will be used to acquire the data.

By default, your data will be downloaded into the directory of this notebook in the folder: `data`

In [None]:
gtfsfeeds.download()

For the purposes of this demo, the data had already been downloaded using the command above.

# Load GTFS data into an UrbanAccess transit data object

Now that we have downloaded our data let's load our individual GTFS feeds (currently a series of text files stored on disk) into a combined network of Pandas DataFrames.

- You can specify one feed or multiple feeds that are inside a root folder using the `gtfsfeed_path` parameter. If you want to aggregate multiple transit networks together, all the GTFS feeds you want to aggregate must be inside of a single root folder.
- Turn on `validation` and set a bounding box with the `remove_stops_outsidebbox` parameter turned on to ensure all your GTFS feed data are within a specified area.

Let's specify a bounding box of coordinates for the City of Oakland to subset the GTFS data to. You can generate a bounding box by going to http://boundingbox.klokantech.com/ and selecting the CSV format which should be: `(-122.355881,37.632226,-122.114775,37.884725)`

In [None]:
validation = True
verbose = True
# bbox for City of Oakland
bbox = (-122.355881,37.632226,-122.114775,37.884725)
remove_stops_outsidebbox = True
append_definitions = True

loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(gtfsfeed_path=None,
                                           validation=validation,
                                           verbose=verbose,
                                           bbox=bbox,
                                           remove_stops_outsidebbox=remove_stops_outsidebbox,
                                           append_definitions=append_definitions)
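
As a rough illustration of what subsetting stops to a bounding box involves, the sketch below filters a few stop records in plain Python. The box values are the ones used above, ordered west, south, east, north. The stop records and the `in_bbox` helper are hypothetical, and treating the box edges as inclusive is an assumption rather than a statement about how UrbanAccess does it.

```python
# Bounding box ordered (west, south, east, north), same values as above.
oakland_bbox = (-122.355881, 37.632226, -122.114775, 37.884725)

def in_bbox(lng, lat, box):
    """True when the point lies inside the box (edges treated as inclusive here)."""
    west, south, east, north = box
    return west <= lng <= east and south <= lat <= north

# Hypothetical stop records using GTFS-style column names.
sample_stops = [
    {"stop_id": "inside_a", "stop_lon": -122.2712, "stop_lat": 37.8044},
    {"stop_id": "outside_b", "stop_lon": -122.4194, "stop_lat": 37.7749},
]

kept_stop_ids = [s["stop_id"] for s in sample_stops
                 if in_bbox(s["stop_lon"], s["stop_lat"], oakland_bbox)]
print(kept_stop_ids)
```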

### The transit data object

The output is a global `urbanaccess_gtfs_df` object that can be accessed with the specified variable `loaded_feeds`. This object holds all the individual GTFS feed files aggregated together with each GTFS feed file type in separate Pandas DataFrames to represent all the loaded transit feeds in a metropolitan area. 

In [None]:
loaded_feeds.stops.head()

Note the two transit services we have aggregated into one regional table

In [None]:
loaded_feeds.stops.unique_agency_id.unique()

Quickly view the transit stop locations

In [None]:
loaded_feeds.stops.plot(kind='scatter', x='stop_lon', y='stop_lat', s=0.1)

In [None]:
loaded_feeds.routes.head()

In [None]:
loaded_feeds.stop_times.head()

In [None]:
loaded_feeds.trips.head()

In [None]:
loaded_feeds.calendar.head()

# Create a transit network

Now that we have loaded and standardized our GTFS data, let's create a travel time weighted graph from the GTFS feeds we have loaded.

Create a network for weekday `monday` service between 7 am and 8 am (`['07:00:00', '08:00:00']`) to represent travel times during the AM Peak period.

Assumptions: We are using the service ids in the `calendar` file to subset the day of week, however if your feed uses the `calendar_dates` file and not the `calendar` file then you can use the `calendar_dates_lookup` parameter. This is not required for AC Transit and BART.

In [None]:
ua.gtfs.network.create_transit_net(gtfsfeeds_dfs=loaded_feeds,
                                   day='monday',
                                   timerange=['07:00:00', '08:00:00'],
                                   calendar_dates_lookup=None)
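
The idea of a travel time weighted transit graph can be sketched in plain Python: keep the stop events that fall inside the time range, then connect consecutive stops on the same trip with an edge weighted by the minutes between them. The stop_times rows and the helpers `to_seconds` and `transit_edges` below are hypothetical, and filtering on departure time is an assumption; UrbanAccess may select and weight stop events differently.

```python
def to_seconds(hhmmss):
    """Convert an 'HH:MM:SS' string to seconds past midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Hypothetical stop_times rows: (trip_id, stop_sequence, stop_id, departure_time)
sample_stop_times = [
    ("trip_1", 1, "A", "06:50:00"),
    ("trip_1", 2, "B", "07:05:00"),
    ("trip_1", 3, "C", "07:17:00"),
    ("trip_1", 4, "D", "08:10:00"),
]

def transit_edges(rows, timerange):
    """Build (from_stop, to_stop, minutes) edges from stop events inside timerange."""
    start, end = (to_seconds(t) for t in timerange)
    in_range = sorted(
        (r for r in rows if start <= to_seconds(r[3]) <= end),
        key=lambda r: (r[0], r[1]),  # order by trip, then stop sequence
    )
    result = []
    for prev, nxt in zip(in_range, in_range[1:]):
        if prev[0] != nxt[0]:
            continue  # never connect stops across different trips
        minutes = (to_seconds(nxt[3]) - to_seconds(prev[3])) / 60
        result.append((prev[2], nxt[2], minutes))
    return result

edges_am = transit_edges(sample_stop_times, ["07:00:00", "08:00:00"])
print(edges_am)
```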

### The UrbanAccess network object

The output is a global `urbanaccess_network` object. This object holds the resulting graph comprised of nodes and edges for the processed GTFS network data for services operating at the day and time you specified inside of `transit_edges` and `transit_nodes`.

Let's set the global network object to a variable called `urbanaccess_net` that we can then inspect:

In [None]:
urbanaccess_net = ua.network.ua_network

In [None]:
urbanaccess_net.transit_edges.head()

In [None]:
urbanaccess_net.transit_nodes.head()

In [None]:
urbanaccess_net.transit_nodes.plot(kind='scatter', x='x', y='y', s=0.1)

# Download OSM data

Now let's download OpenStreetMap (OSM) pedestrian street network data to produce a graph network of nodes and edges for Oakland, CA. We will use the same bounding box as before.

In [None]:
# nodes, edges = ua.osm.load.ua_network_from_bbox(bbox=bbox,
#                                                 remove_lcn=True)

Let's load previously saved data from step above

In [None]:
h5file = os.path.join(data_dir, 'osm_walk_2way_oakland.h5')
# read_hdf opens and closes the file for each read, so no HDFStore handle is left open
nodes = pd.read_hdf(h5file, 'nodes')
edges = pd.read_hdf(h5file, 'edges')

Inspect the nodes and edges, note the initial distance based weight in meters.

In [None]:
nodes.head()

In [None]:
edges.head()

# Create a pedestrian network

Now that we have our pedestrian network data let's create a travel time weighted graph from the pedestrian network we have loaded and add it to our existing UrbanAccess network object. We will assume a pedestrian travels on average at 3 mph.

The resulting weighted network will be added to your UrbanAccess network object inside `osm_nodes` and `osm_edges`

In [None]:
ua.osm.network.create_osm_net(osm_edges=edges,
                              osm_nodes=nodes,
                              travel_speed_mph=3)
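
The conversion from a distance weight in meters to a travel time weight in minutes is simple arithmetic. The sketch below mirrors the formula used later in this notebook for the walk-only network (meters to miles, divided by speed, times 60); the helper name `walk_minutes` is hypothetical.

```python
METERS_PER_MILE = 1609.34

def walk_minutes(distance_m, travel_speed_mph=3):
    """Minutes needed to walk distance_m meters at travel_speed_mph."""
    return distance_m / METERS_PER_MILE / travel_speed_mph * 60

one_mile_minutes = walk_minutes(METERS_PER_MILE)   # one mile at 3 mph
short_block_minutes = walk_minutes(400)            # a 400 m edge
print(round(one_mile_minutes, 2), round(short_block_minutes, 2))
```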

Let's inspect the results which we can access inside of the existing `urbanaccess_net` variable:

In [None]:
urbanaccess_net.osm_nodes.head()

In [None]:
urbanaccess_net.osm_edges.head()

In [None]:
urbanaccess_net.osm_nodes.plot(kind='scatter', x='x', y='y', s=0.1)

# Create an integrated transit and pedestrian network

Now let's integrate the two networks together. The resulting graph will be added to your existing UrbanAccess network object. After running this step, your network will be ready to be used with Pandana.

The resulting integrated network will be added to your UrbanAccess network object inside `net_nodes` and `net_edges`

In [None]:
ua.network.integrate_network(urbanaccess_network=urbanaccess_net,
                             headways=False)

Let's inspect the results which we can access inside of the existing `urbanaccess_net` variable:

In [None]:
urbanaccess_net.net_nodes.head()

In [None]:
urbanaccess_net.net_edges.head()

In [None]:
urbanaccess_net.net_edges[urbanaccess_net.net_edges['net_type'] == 'transit'].head()

# Save the network to disk

You can save the final processed integrated network `net_nodes` and `net_edges` to disk inside of an HDF5 file. By default the file will be saved to the directory of this notebook in the folder `data`.

In [None]:
ua.network.save_network(urbanaccess_network=urbanaccess_net,
                        filename='final_net.h5',
                        overwrite_key = True)

# Load saved network from disk

You can load an existing processed integrated network HDF5 file from disk into an UrbanAccess network object.

In [None]:
urbanaccess_net = ua.network.load_network(filename='final_net.h5')

# Visualize the network

You can visualize the network you just created using basic UrbanAccess plot functions

### Integrated network

In [None]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges,
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=1.1, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

### Integrated network by travel time

Use the `col_colors` function to color edges by travel time. In this case, the darker the red, the higher the travel time.

Note that you can see AC Transit's major arterial bus routes (in darker red), transfer locations, and the BART rail network on top of the underlying pedestrian network. Rail stations are visible where multiple bus connections converge at junctions in the network, most clearly in downtown Oakland at the 19th Street, 12th Street, and Lake Merritt stations, and at the Fruitvale and Coliseum stations. Downtown Oakland is located near the white cutout in the northeast middle section of the network, which represents Lake Merritt.

In [None]:
edgecolor = ua.plot.col_colors(df=urbanaccess_net.net_edges, col='weight', cmap='gist_heat_r', num_bins=5)
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges,
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color=edgecolor, edge_linewidth=1, edge_alpha=0.7,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Let's zoom in closer to downtown Oakland using a new smaller extent bbox. Note the bus routes on the major arterials and the BART routes from station to station.

In [None]:
edgecolor = ua.plot.col_colors(df=urbanaccess_net.net_edges, col='weight', cmap='gist_heat_r', num_bins=5)
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges,
                 bbox=(-122.282295, 37.795, -122.258434, 37.816022),
                 fig_height=30, margin=0.02,
                 edge_color=edgecolor, edge_linewidth=1, edge_alpha=0.7,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

### Transit network

You can also slice the network by network type

In [None]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[urbanaccess_net.net_edges['net_type']=='transit'],
                 bbox=None,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

### Pedestrian network

In [None]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[urbanaccess_net.net_edges['net_type']=='walk'],
                 bbox=None,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

You can slice the network using any attribute in edges. In this case let's examine bridges in the walk network.

In [None]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[(urbanaccess_net.net_edges['net_type']=='walk') & 
                                                 (urbanaccess_net.net_edges['bridge']=='yes')],
                 bbox=None,
                 fig_height=30, margin=0.02,
                 edge_color='red', edge_linewidth=2, edge_alpha=1,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Search for other attributes of interest:

In [None]:
urbanaccess_net.net_edges.columns

In [None]:
urbanaccess_net.net_edges['highway'].unique()

In this case let's examine all explicitly pedestrian network elements:

In [None]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[(urbanaccess_net.net_edges['net_type']=='walk') & 
                                                 (urbanaccess_net.net_edges['highway'].isin(['cycleway',
                                                                                             'footway',
                                                                                             'steps',
                                                                                             'path', 
                                                                                             'pedestrian']))],
                 bbox=None,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

### Transit network: AC Transit Route 51A

Let's do the same with the transit network. In this case let's examine a single route: AC Transit route 51A.

Looking at what routes are in the network for 51A we see route id: `51A_ac_transit`

In [None]:
urbanaccess_net.net_edges['unique_route_id'].unique()

In [None]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[urbanaccess_net.net_edges['unique_route_id']=='51A_ac_transit'],
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=25, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

### Transit network: BART network

We can also slice the data by agency. In this case let's view all BART routes and make the station nodes larger on our plot.

Looking at what agencies are in the network for BART we see agency id: `bay_area_rapid_transit`

In [None]:
urbanaccess_net.net_edges['unique_agency_id'].unique()

In [None]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[urbanaccess_net.net_edges['unique_agency_id']=='bay_area_rapid_transit'],
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=100, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

# Add average headways to network travel time

### Calculate route stop level headways

The network we have generated so far contains only pure travel times. UrbanAccess allows route stop level average headways to be calculated and added to the network. These are used as a proxy for passenger wait times at stops and stations. The route stop level average headways are added to the pedestrian to transit connector edges.

Let's calculate headways for the same AM Peak time period. Statistics on route stop level headways will be added to your GTFS transit data object inside of `headways`

In [None]:
ua.gtfs.headways.headways(gtfsfeeds_df=loaded_feeds,
                          headway_timerange=['07:00:00','08:00:00'])

In [None]:
loaded_feeds.headways.head()
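
For intuition, a headway is the gap between consecutive departures of the same route at the same stop, and the average headway is the mean of those gaps over the time range. The sketch below computes it in plain Python for a single route stop. The departure times and the helper names are hypothetical, and it does not reproduce the full set of statistics that UrbanAccess stores in `headways`.

```python
from statistics import mean

def to_minutes(hhmmss):
    """Convert an 'HH:MM:SS' string to minutes past midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 60 + minutes + seconds / 60

def mean_headway(departure_times):
    """Mean gap in minutes between consecutive departures, or None if fewer than two."""
    minutes = sorted(to_minutes(t) for t in departure_times)
    gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
    return mean(gaps) if gaps else None

# Hypothetical departures for one route at one stop during the AM Peak hour.
sample_departures = ["07:00:00", "07:15:00", "07:30:00", "07:50:00"]
average_headway = mean_headway(sample_departures)
print(average_headway)
```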

### Add the route stop level average headways to your integrated network

Now that headways have been calculated and added to your GTFS transit feed object, you can use them to generate a new integrated network that incorporates the headways within the pedestrian to transit connector edge travel times.

In [None]:
ua.network.integrate_network(urbanaccess_network=urbanaccess_net,
                             headways=True,
                             urbanaccess_gtfsfeeds_df=loaded_feeds,
                             headway_statistic='mean')

### Integrated network by travel time with average headways

In [None]:
edgecolor = ua.plot.col_colors(df=urbanaccess_net.net_edges, col='weight', cmap='gist_heat_r', num_bins=5)
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges,
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color=edgecolor, edge_linewidth=1, edge_alpha=0.7,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

# Using an UrbanAccess network with Pandana

Pandana (Pandas Network Analysis) is a tool to compute network accessibility metrics.

Now that we have an integrated transit and pedestrian network that has been formatted for use with Pandana, we can now use Pandana right away to compute accessibility metrics.

There are a couple of things to remember about UrbanAccess and Pandana:
- UrbanAccess generates by default a one way network. One way means there is an explicit edge for each direction in the edge table. Where applicable, it is important to set any Pandana `twoway` parameters to `False` (they are `True` by default) to indicate that the network is a one way network.
- As of Pandana v0.4.1, `node ids` and `from` and `to` columns in your network must be integer type and not string. UrbanAccess automatically generates both string and integer types so use the `from_int` and `to_int` columns in edges and the index in nodes `id_int`.
- UrbanAccess by default will generate edge weights that represent travel time in units of minutes.

For more on Pandana see the:

**Pandana repo:** https://github.com/UDST/pandana 

**Pandana documentation:** http://udst.github.io/pandana/

## Load Census block data

Let's load 2010 Census block data for the 9 county Bay Area. Note: These data have been processed from original Census and LEHD data.

In [None]:
h5file = os.path.join(data_dir, 'bay_area_demo_data.h5')
blocks = pd.read_hdf(h5file,'blocks')
# remove blocks that contain all water
blocks = blocks[blocks['square_meters_land'] != 0]
print ('Total number of blocks: {:,}'.format(len(blocks)))
blocks.head()

Let's subset the Census data to just be the bounding box for Oakland

In [None]:
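# bbox is ordered (west, south, east, north). Note that `lng_max` here holds the western
# (most negative) longitude and `lng_min` the eastern one; the comparisons below rely on that.
lng_max, lat_min, lng_min, lat_max = bbox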
outside_bbox = blocks.loc[~(((lng_max < blocks["x"]) & (blocks["x"] < lng_min)) & ((lat_min < blocks["y"]) & (blocks["y"] < lat_max)))]
blocks_subset = blocks.drop(outside_bbox.index)
print ('Total number of subset blocks: {:,}'.format(len(blocks_subset)))

In [None]:
blocks_subset.plot(kind='scatter', x='x', y='y', s=0.1)

## Initialize the Transit+Pedestrian Pandana network

Let's initialize our Pandana network object using the transit and pedestrian network we created. Note the use of `from_int` and `to_int`, as well as `twoway=False`, which denotes that this is an explicit one way network.

In [None]:
%%time
transit_ped_net = pdna.Network(urbanaccess_net.net_nodes["x"],
                               urbanaccess_net.net_nodes["y"],
                               urbanaccess_net.net_edges["from_int"],
                               urbanaccess_net.net_edges["to_int"],
                               urbanaccess_net.net_edges[["weight"]], 
                               twoway=False)

Now let's attach the node ids of the network to our blocks

In [None]:
blocks_subset['node_id_transit'] = transit_ped_net.get_node_ids(blocks_subset['x'], blocks_subset['y'])
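
Attaching a block to the network means finding the network node closest to the block's coordinates. The brute-force sketch below shows the idea using great-circle distance and the four toy nodes from the start of this notebook. The query point and the helper names are hypothetical, and this is only an illustration of the concept, not how Pandana performs the lookup internally.

```python
import math

def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    radius_m = 6371000
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Node id -> (x, y) taken from the toy node table at the start of this notebook.
toy_nodes = {
    1: (-122.302578, 37.560184),
    2: (-122.177008, 37.481747),
    3: (-122.181374, 37.483689),
    4: (-122.184170, 37.484536),
}

def nearest_node_id(lng, lat, node_coords):
    """Return the id of the node closest to (lng, lat) by checking every node."""
    return min(node_coords, key=lambda nid: haversine_m(lng, lat, *node_coords[nid]))

# Hypothetical block centroid.
snapped_node = nearest_node_id(-122.1800, 37.4830, toy_nodes)
print(snapped_node)
```

Because every point is assigned to some node regardless of how far away it is, it is worth subsetting the input points to the network extent first, as done above with `blocks_subset`.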

### Calculate cumulative accessibility

Now let's compute an accessibility metric, in this case a cumulative accessibility metric.

Let's set the block variables we want to use as our accessibility metric on the Pandana network. In this case let's use `jobs`.

In [None]:
transit_ped_net.set(blocks_subset.node_id_transit, variable = blocks_subset.jobs, name='jobs')

Now let's run a cumulative accessibility query using our network and the jobs variable for three different travel time thresholds: 15, 30, and 45 minutes.

Note: Depending on network size, radius threshold, computer processing power, and whether or not you are using multiple cores the compute process may take some time.

In [None]:
%%time
jobs_45 = transit_ped_net.aggregate(45, type='sum', decay='linear', name='jobs')
jobs_30 = transit_ped_net.aggregate(30, type='sum', decay='linear', name='jobs')
jobs_15 = transit_ped_net.aggregate(15, type='sum', decay='linear', name='jobs')
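
A cumulative opportunities measure with linear decay can be sketched as follows: for one origin node, sum the jobs at every node reachable within the threshold, weighting each by how close it is. The weight `1 - t / threshold` used here is an assumption about what "linear" means, and the travel times, job counts, and function name are hypothetical; Pandana's exact weighting may differ.

```python
def cumulative_access(travel_minutes, opportunities, threshold, decay="linear"):
    """Sum opportunities reachable within threshold minutes of a single origin."""
    total = 0.0
    for node, minutes in travel_minutes.items():
        if minutes > threshold:
            continue  # beyond the travel time budget
        # Assumed linear decay: full weight at 0 minutes, zero weight at the threshold.
        weight = 1.0 - minutes / threshold if decay == "linear" else 1.0
        total += weight * opportunities.get(node, 0)
    return total

# Hypothetical shortest-path travel times (minutes) from one origin, and jobs per node.
minutes_from_origin = {"a": 0, "b": 10, "c": 20, "d": 40}
jobs_at_node = {"a": 100, "b": 200, "c": 300, "d": 400}

access_15 = cumulative_access(minutes_from_origin, jobs_at_node, 15)
access_30 = cumulative_access(minutes_from_origin, jobs_at_node, 30)
access_45 = cumulative_access(minutes_from_origin, jobs_at_node, 45)
print(access_15, access_30, access_45)
```

Under this weighting a larger threshold both admits more nodes and gives each reachable node a higher weight, so the totals grow with the threshold.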

Quickly visualize the accessibility query results. As expected, note that a travel time of 15 minutes results in a lower number of jobs accessible at each network node.

In [None]:
print (jobs_45.head())
print (jobs_30.head())
print (jobs_15.head())

In [None]:
results = pd.DataFrame({'jobs_45':reindex(jobs_45, blocks_subset.node_id_transit),
                        'jobs_30':reindex(jobs_30, blocks_subset.node_id_transit),
                        'jobs_15':reindex(jobs_15, blocks_subset.node_id_transit)})
results.head()

In [None]:
results.describe()

### Jobs accessible within 15 minutes

Note how the radius of the number of jobs accessible expands as the time threshold increases where high accessibility is indicated in dark red. You can easily see downtown Oakland has the highest accessibility due to a convergence of transit routes and because downtown is where the majority of jobs in the area are located. Other high accessibility areas are visible elsewhere directly adjacent to BART metro rail stations of West Oakland, Fruitvale, and Coliseum and AC Transit bus routes on the main arterial road corridors.

In [None]:
%%time
transit_ped_net.plot(jobs_15, 
                    plot_type='scatter',
                    fig_kwargs={'figsize':[20,20]},
                    bmap_kwargs={'epsg':'26943','resolution':'h'},
                    plot_kwargs={'cmap':'gist_heat_r','s':4,'edgecolor':'none'})

### Jobs accessible within 30 minutes

In [None]:
%%time
transit_ped_net.plot(jobs_30, 
                    plot_type='scatter',
                    fig_kwargs={'figsize':[20,20]},
                    bmap_kwargs={'epsg':'26943','resolution':'h'},
                    plot_kwargs={'cmap':'gist_heat_r','s':4,'edgecolor':'none'})

### Jobs accessible within 45 minutes

In [None]:
%%time
transit_ped_net.plot(jobs_45, 
                    plot_type='scatter',
                    fig_kwargs={'figsize':[20,20]},
                    bmap_kwargs={'epsg':'26943','resolution':'h'},
                    plot_kwargs={'cmap':'gist_heat_r','s':4,'edgecolor':'none'})

### Calculate nearest features using POIs

Now let's compute another accessibility metric, in this case a nearest feature query.

Load preprocessed POI data representing hospitals (via CA OSHPD: https://www.oshpd.ca.gov/documents/HWDD/GIS/HealthcareFacilities201210.zip) and protected areas in California (via GreenInfo Network: http://www.atlas.ca.gov/download.html#/casil/planning/Land_Ownership/CPAD/CPAD-2016b-December2016) and subset to our bounding box for Oakland. 

In [None]:
parks = pd.read_csv(os.path.join(data_dir, 'CPAD_2016_Units.csv') ,encoding='utf-8',index_col=0)
hospitals = pd.read_csv(os.path.join(data_dir, 'OSHPD_2012.csv'),encoding='utf-8',index_col=0)
print('Loaded {:,} parks and {:,} hospitals'.format(len(parks),len(hospitals)))

In [None]:
outside_bbox = parks.loc[~(((lng_max < parks["x"]) & (parks["x"] < lng_min)) & ((lat_min < parks["y"]) & (parks["y"] < lat_max)))]
parks_subset = parks.drop(outside_bbox.index)
parks_subset = parks_subset[parks_subset['ACCESS_TYP'] == 'Open Access']
print ('Total number of subset parks: {:,}'.format(len(parks_subset)))

outside_bbox = hospitals.loc[~(((lng_max < hospitals["x"]) & (hospitals["x"] < lng_min)) & ((lat_min < hospitals["y"]) & (hospitals["y"] < lat_max)))]
hospitals_subset = hospitals.drop(outside_bbox.index)
hospitals_subset = hospitals_subset[hospitals_subset['TYPE'] == 'Hospital']
hospitals_subset = hospitals_subset[hospitals_subset['FAC_STATUS'] == 'Open']
print ('Total number of subset hospitals: {:,}'.format(len(hospitals_subset)))

In [None]:
hospitals_subset.head()

In [None]:
hospitals_subset.plot(kind='scatter', x='x', y='y', s=5)

In [None]:
parks_subset.head()

In [None]:
parks_subset.plot(kind='scatter', x='x', y='y', s=2)

In Pandana v0.4.0 and later, we can set the POIs directly on the network.

In [None]:
# If using Pandana v0.3.0, we must first initialize the POIs and parameters we want to use in the analysis 
# and then set the POIs on to the network. 
# transit_ped_net.init_pois(num_categories=2, max_dist=45, max_pois=2)

In [None]:
transit_ped_net.set_pois(category="parks", x_col=parks_subset['x'], y_col=parks_subset['y'], maxdist=45, maxitems=2)
transit_ped_net.set_pois(category="hospitals", x_col=hospitals_subset['x'], y_col=hospitals_subset['y'], maxdist=45, maxitems=2)

Now let's run a nearest feature query using our network and the two POI categories to calculate the distance from each network node to the 2 nearest parks and hospitals up to 30 minutes of travel time. Anything past 30 minutes will be given a value of 0.

Note: Depending on network size, radius, computer processing power, and whether or not you are using multiple cores the compute process may take some time.

In [None]:
%%time
nearest_parks = transit_ped_net.nearest_pois(distance=30, category="parks", 
                                             num_pois=2,max_distance=0, include_poi_ids=True)
nearest_hospitals = transit_ped_net.nearest_pois(distance=30, category="hospitals", 
                                                 num_pois=2,max_distance=0,  include_poi_ids=True)
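
The shape of a nearest feature result can be sketched for a single node: sort the travel times to all POIs, keep the closest `num_pois` that fall within `distance`, and pad any missing slots with the `max_distance` fill value. The function name and travel times below are hypothetical; this only mirrors the behavior described above, where anything past 30 minutes is given a value of 0.

```python
def nearest_pois_sketch(minutes_to_pois, distance, num_pois, max_distance):
    """Return the num_pois smallest travel times within distance, padded with max_distance."""
    reachable = sorted(m for m in minutes_to_pois if m <= distance)[:num_pois]
    return reachable + [max_distance] * (num_pois - len(reachable))

# Hypothetical travel times (minutes) from one network node to each hospital.
row_two_within = nearest_pois_sketch([42.0, 12.5, 27.0], distance=30, num_pois=2, max_distance=0)
row_one_within = nearest_pois_sketch([42.0, 12.5, 35.0], distance=30, num_pois=2, max_distance=0)
print(row_two_within)
print(row_one_within)
```

With a fill value of 0, a zero in the result means "no POI within the threshold" rather than "a POI at zero travel time", which is why the cells below filter on values greater than 0.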

Quickly view the results

In [None]:
nearest_hospitals[nearest_hospitals[1]>0].head()

In [None]:
hospitals_subset[hospitals_subset.index == 7421]

In [None]:
nearest_hospitals[1].describe()

In [None]:
nearest_parks[nearest_parks[1]>0].head()

In [None]:
parks_subset[parks_subset.index == 5658]

In [None]:
nearest_parks[1].describe()

### Accessibility to hospitals within 30 minutes

Now let's plot the results

You can see nodes that are closer to hospitals are colored bright yellow and ones further away are dark red. Ones that are not within the 30 minute travel time are not displayed and are white.

In [None]:
%%time
transit_ped_net.plot(nearest_hospitals[1], 
                    plot_type='scatter',
                    fig_kwargs={'figsize':[20,20]},
                    bmap_kwargs={'epsg':'26943','resolution':'h'},
                    plot_kwargs={'cmap':'gist_heat_r','s':6,'edgecolor':'none'})

### Accessibility to parks within 30 minutes

You can see nodes that are closer to parks are colored bright yellow and ones further away are dark red. Ones that are not within the 30 minute travel time are not displayed and are white.

In [None]:
%%time
transit_ped_net.plot(nearest_parks[1], 
                    plot_type='scatter',
                    fig_kwargs={'figsize':[20,20]},
                    bmap_kwargs={'epsg':'26943','resolution':'h'},
                    plot_kwargs={'cmap':'gist_heat_r','s':6,'edgecolor':'none'})

# #3  Examples of a simple analysis

## Difference in access by mode

In [None]:
results_transit = results.copy()

h5file = os.path.join(data_dir, 'osm_walk_2way_oakland.h5')
# read_hdf opens and closes the file for each read, so no HDFStore handle is left open
osm_walk_nodes = pd.read_hdf(h5file, 'nodes')
osm_walk_edges = pd.read_hdf(h5file, 'edges')
print ('Loaded {} nodes {} edges'.format(str(len(osm_walk_nodes)), str(len(osm_walk_edges))))

SPEED_MPH = 3
osm_walk_edges['travel_time_min'] = (osm_walk_edges['distance']/1609.34) / SPEED_MPH * 60
print ('Converted edge weight')

walk_net = pdna.Network(osm_walk_nodes["x"], 
                   osm_walk_nodes["y"], 
                   osm_walk_edges["from"], 
                   osm_walk_edges["to"],
                   osm_walk_edges[["travel_time_min"]],twoway=True)

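# Use the Oakland subset so blocks outside the network extent are not snapped to its edge nodes,
# and so the walk results cover the same blocks as the transit results
blocks_subset['node_id'] = walk_net.get_node_ids(blocks_subset['x'], blocks_subset['y'])
walk_net.set(blocks_subset.node_id, variable=blocks_subset.jobs, name='jobs')
walk_jobs_30 = walk_net.aggregate(30, 
                              type='sum', 
                              decay='linear', 
                              name = 'jobs')
print ('Aggregation completed')

results_walk = pd.DataFrame({'walk_jobs_30':reindex(walk_jobs_30, blocks_subset.node_id)})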
results_combined = results_transit.join(results_walk, how='left', sort=False)
results_combined = results_combined.join(blocks[['x','y']], how='left', sort=False)
results_combined['access_diff'] = results_combined['walk_jobs_30']-results_combined['jobs_30']
results_combined[['access_diff']].tail()

In [None]:
results_combined[results_combined['access_diff'] < 100000].hist(column='access_diff', bins=30)

## Relationship between access and other variables

You can combine census data with accessibility metrics in order to investigate patterns of low or high access neighborhoods and socioeconomics.

Let's see what the relationship is between average household income and transit accessibility (using a 30 minute travel time).

In [None]:
results_combined = results_transit.join(blocks, how='left', sort=False)
results_combined_subset = results_combined[results_combined['income']<200000]
results_combined_subset.plot.scatter(x='jobs_30',y='income',s=2)