# Transit Matrix

In [1]:
import sys, os

import pandas as pd       # used below as pd
import geopandas as gpd   # used below as gpd

os.chdir('scripts')
from p2p import *

In [2]:
%matplotlib inline

--- 
<h1><center>DEMO</center></h1>  


**View structure of data example: Health Facilities in Chicago.**  
Health Facilities Data: http://makosak.github.io/chihealthaccess/index.html

In [2]:
df = pd.read_csv('data/DEST/health_chicago.csv')
df.head()

Unnamed: 0,ID,Facility,lat,lon,Type,target,category,community
0,1,"American Indian Health Service of Chicago, Inc.",41.956676,-87.651879,5,127000,Other Health Providers,3
1,2,Hamdard Center for Health and Human Services,41.997852,-87.669535,5,190000,Other Health Providers,77
2,3,Infant Welfare Society of Chicago,41.924904,-87.71727,5,137000,Other Health Providers,22
3,4,Mercy Family - Henry Booth House Family Health...,41.841694,-87.62479,5,159000,Other Health Providers,35
4,6,Cook County - Dr. Jorge Prieto Health Center,41.847143,-87.724975,5,166000,Other Health Providers,30


### Distance Matrices  

<span style="color:LimeGreen"> **Specifications for the asymmetric and symmetric distance matrices:** </span>  

- network_type (drive or walk)
- epsilon=0.05 (can change default)  
- primary_input  
- secondary_input  
- output_type='csv'  
- n_best_matches=4 (for simulations)
- read_from_file=None  
- write_to_file (set as True if user wants to save results)   
- load_to_mem=True (default; set it to False when running a computationally intensive process)

**Please make sure latitude and longitude are correct if using X and Y.**
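
One way to catch reversed coordinates before a long run is to test every row against the bounding box you expect. The sketch below is only an illustration: the helper `rows_outside_bbox` is hypothetical (it is not part of p2p), and the bounding box is copied from the Overpass query that appears in the log output of this demo, so it should be adjusted for other study areas.

```python
import csv
import io

# Bounding box (south, west, north, east) copied from the Overpass query in
# this demo's log output. This is an assumption; change it for other areas.
CHICAGO_BBOX = (41.55758, -87.8544885, 42.114303, -87.5804964)


def rows_outside_bbox(rows, bbox, lat_key="lat", lon_key="lon"):
    """Return the rows whose (lat, lon) is outside bbox or cannot be parsed."""
    south, west, north, east = bbox
    bad = []
    for row in rows:
        try:
            lat, lon = float(row[lat_key]), float(row[lon_key])
        except (KeyError, TypeError, ValueError):
            bad.append(row)
            continue
        if not (south <= lat <= north and west <= lon <= east):
            bad.append(row)
    return bad


# Two sample rows: the first uses coordinates from the facilities table above,
# the second has latitude and longitude deliberately swapped.
sample = io.StringIO(
    "ID,lat,lon\n"
    "1,41.956676,-87.651879\n"
    "2,-87.669535,41.997852\n"
)
suspect = rows_outside_bbox(csv.DictReader(sample), CHICAGO_BBOX)
print([row["ID"] for row in suspect])  # IDs of rows that need a closer look
```

A plain range check (latitude in [-90, 90], longitude in [-180, 180]) would not flag swapped Chicago coordinates, because both values remain in range after the swap; that is why the sketch compares against a study-area box instead.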


## Model 1: Asymmetric Matrix  
---
The first model directly creates an asymmetric matrix from destination points to the centroids of the area of analysis. In the runs logged below, the walking matrix took about 3 minutes and the driving matrix about 1.5 minutes. This approach is most effective when you only need to calculate the distance matrix, or a particular distance score, once.
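
To make the shape of such a matrix concrete, here is a minimal sketch with one row per origin and one column per destination. It assumes straight-line (haversine) distance as a stand-in for the network shortest paths that p2p computes, and the origin centroids are hypothetical; only the two destination coordinates are taken from the facilities table above.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical origin centroids (illustrative only, not from tracts2010.csv).
origins = {"A": (41.90, -87.65), "B": (41.85, -87.70), "C": (41.95, -87.68)}
# Two destinations taken from the facilities table above (IDs 1 and 3).
destinations = {1: (41.956676, -87.651879), 3: (41.924904, -87.71727)}

# One row per origin, one column per destination: 3 x 2, so not square.
matrix = {
    o: {d: round(haversine_m(*o_xy, *d_xy)) for d, d_xy in destinations.items()}
    for o, o_xy in origins.items()
}
for origin, row in matrix.items():
    print(origin, row)
```

With the demo inputs, the same layout would have 801 origin rows and 199 destination columns.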

In [4]:
# Calculate asymmetric distance matrix for walking (takes ~3 minutes to run) 

w_asym_mat = TransitMatrix(network_type='walk',
                           primary_input='data/ORIG/tracts2010.csv',
                           secondary_input='data/DEST/health_chicago.csv', 
                           write_to_file=True)

w_asym_mat.process()

#The output is walk_asym_health_tracts.csv (used in the calculation of the metrics)

INFO:p2p:Processing network (walk) in format: csv with epsilon: 0.05


The variables in your data set are:
>  geoid10
>  lon
>  lat
>  Pop2014
>  Pov14
>  community


INFO:p2p:Total number of rows in the dataset: 801
INFO:p2p:Complete number of rows for computing the matrix: 801
INFO:p2p:Total number of rows dropped due to missing latitude or longitude values: 0


The variables in your data set are:
>  agency_id
>  facility
>  lat
>  lon
>  cat_num
>  target
>  category


INFO:p2p:Total number of rows in the dataset: 199
INFO:p2p:Complete number of rows for computing the matrix: 199
INFO:p2p:Total number of rows dropped due to missing latitude or longitude values: 0
INFO:osmnet:Requesting network data within bounding box from Overpass API in 1 request(s)
INFO:osmnet:Posting to http://www.overpass-api.de/api/interpreter with timeout=180, "{'data': '[out:json][timeout:180];(way["highway"]["highway"!~"motor|proposed|construction|abandoned|platform|raceway"]["foot"!~"no"]["pedestrians"!~"no"](41.55758000,-87.85448850,42.11430300,-87.58049640);>;);out;'}"


INFO:osmnet:Downloaded 74,131.9KB from www.overpass-api.de in 13.74 seconds
INFO:osmnet:Downloaded OSM network data within bounding box from Overpass API in 1 request(s) and 15.03 seconds


INFO:osmnet:Returning OSM data with 442,330 nodes and 117,327 ways...


INFO:osmnet:Edge node pairs completed. Took 144.87 seconds


INFO:osmnet:Returning processed graph with 197,545 nodes and 302,432 edges...
INFO:osmnet:Completed OSM data download and Pandana node and edge table creation in 172.84 seconds


INFO:p2p:Prepared raw network in 1.23 seconds and wrote to: data/results/raw_network_0.csv
  node_array = pd.DataFrame.as_matrix(nodes)
INFO:p2p:Nearest Neighbor matching completed in 0.28 seconds
INFO:p2p:Nearest Neighbor matching completed in 0.11 seconds
INFO:p2p:Writing to file: data/results/walk_full_results_0.csv
INFO:p2p:Shortest path matrix computed in 8.42 seconds
INFO:p2p:All operations completed in 182.91 seconds


Cleaned up calculation artifacts


In [2]:
# Calculate asymmetric distance matrix for driving (takes ~1.5 minutes to run) 

d_asym_mat = TransitMatrix(network_type='drive',
                           primary_input='data/ORIG/tracts2010.csv',
                           secondary_input='data/DEST/health_chicago.csv', 
                           write_to_file=True)

d_asym_mat.process(speed_limit_filename='data/speed_limit.csv')

#The output is drive_asym_health_tracts.csv (used in the calculation of the metrics)

INFO:p2p:Processing network (drive) in format: csv with epsilon: 0.05


The variables in your data set are:
>  geoid10
>  lon
>  lat
>  Pop2014
>  Pov14
>  community


INFO:p2p:Total number of rows in the dataset: 801
INFO:p2p:Complete number of rows for computing the matrix: 801
INFO:p2p:Total number of rows dropped due to missing latitude or longitude values: 0


The variables in your data set are:
>  agency_id
>  facility
>  lat
>  lon
>  cat_num
>  target
>  category


INFO:p2p:Total number of rows in the dataset: 199
INFO:p2p:Complete number of rows for computing the matrix: 199
INFO:p2p:Total number of rows dropped due to missing latitude or longitude values: 0


The variable names in your speed limit data set are:
>  Unnamed: 0
>  FULLSTNA
>  SPDLIMIT


INFO:osmnet:Requesting network data within bounding box from Overpass API in 1 request(s)
INFO:osmnet:Posting to http://www.overpass-api.de/api/interpreter with timeout=180, "{'data': '[out:json][timeout:180];(way["highway"]["highway"!~"cycleway|footway|path|pedestrian|steps|track|proposed|construction|bridleway|abandoned|platform|raceway|service"]["motor_vehicle"!~"no"]["motorcar"!~"no"]["service"!~"parking|parking_aisle|driveway|emergency_access"](41.55758000,-87.85448850,42.11430300,-87.58049640);>;);out;'}"


INFO:osmnet:Downloaded 37,481.3KB from www.overpass-api.de in 7.18 seconds
INFO:osmnet:Downloaded OSM network data within bounding box from Overpass API in 1 request(s) and 7.78 seconds


INFO:osmnet:Returning OSM data with 194,268 nodes and 43,369 ways...


INFO:osmnet:Edge node pairs completed. Took 50.80 seconds
INFO:osmnet:Returning processed graph with 63,841 nodes and 97,483 edges...
INFO:osmnet:Completed OSM data download and Pandana node and edge table creation in 64.15 seconds


INFO:p2p:Matching street network completed in 
            12.23 seconds: 5150 perfect matches, 136 near perfect matches,
            748 good matches and 113 non matches
INFO:p2p:Prepared raw network in 0.41 seconds and wrote to: data/matrices/raw_network_0.csv
  node_array = pd.DataFrame.as_matrix(nodes)
INFO:p2p:Nearest Neighbor matching completed in 0.24 seconds
INFO:p2p:Nearest Neighbor matching completed in 0.07 seconds
INFO:p2p:Writing to file: data/matrices/drive_full_results_0.csv
INFO:p2p:Shortest path matrix computed in 1.47 seconds
INFO:p2p:All operations completed in 78.58 seconds


Cleaned up calculation artifacts



## Model 2: Symmetric Matrix 
---
The second model creates a symmetric travel-distance matrix from block to block (an 801 x 801 matrix, one row and one column per unit in the origin file). We then snap the destination points to the area of analysis (blocks), which gives the distance between the destinations and every block in the dataset. 
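
As an illustration of why a block-to-block matrix comes out square and mirrored across its diagonal, the sketch below runs Dijkstra's algorithm from every node of a tiny undirected network. The four-node graph and its weights are hypothetical, and the choice of Dijkstra is an assumption for the example rather than a statement about how p2p computes its shortest paths. The symmetry also relies on every edge being traversable in both directions, which may not hold for a driving network with one-way streets.

```python
import heapq


def shortest_paths_from(graph, source):
    """Dijkstra's algorithm: shortest distance from source to every node."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry, a shorter path was already found
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def add_edge(graph, a, b, weight):
    """Add an undirected edge, so the cost is the same in both directions."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


# Hypothetical four-node network; weights stand in for metres of street.
graph = {}
add_edge(graph, "t1", "t2", 400)
add_edge(graph, "t2", "t3", 300)
add_edge(graph, "t1", "t3", 900)
add_edge(graph, "t3", "t4", 250)

nodes = sorted(graph)
matrix = {src: shortest_paths_from(graph, src) for src in nodes}

# Square, zero on the diagonal, and mirrored across it.
is_symmetric = all(matrix[a][b] == matrix[b][a] for a in nodes for b in nodes)
print(is_symmetric)
print(matrix["t1"])  # {'t1': 0, 't2': 400, 't3': 700, 't4': 950}
```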


In [3]:
# Specify walking distance matrix (takes ~3 min to run) 
w_sym_mat = TransitMatrix(network_type='walk',
                          primary_input='data/ORIG/tracts2010.csv',
                          write_to_file=True,
                          load_to_mem=True)

# Run process
w_sym_mat.process()

# Saved as walk_sym_health_tracts.csv

INFO:p2p:Processing network (walk) in format: csv with epsilon: 0.05


The variables in your data set are:
>  geoid10
>  lon
>  lat
>  Pop2014
>  Pov14
>  community


INFO:p2p:Total number of rows in the dataset: 801
INFO:p2p:Complete number of rows for computing the matrix: 801
INFO:p2p:Total number of rows dropped due to missing latitude or longitude values: 0
INFO:osmnet:Requesting network data within bounding box from Overpass API in 1 request(s)
INFO:osmnet:Posting to http://www.overpass-api.de/api/interpreter with timeout=180, "{'data': '[out:json][timeout:180];(way["highway"]["highway"!~"motor|proposed|construction|abandoned|platform|raceway"]["foot"!~"no"]["pedestrians"!~"no"](41.60021990,-87.85448850,42.07126140,-87.58049640);>;);out;'}"


ERROR:p2p:Error trying to download OSM network. 
            Did you reverse lat/long? 
            Is your network connection functional?
            


SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


In [5]:
# Specify driving distance matrix (takes ~1.5 minutes to run) 
d_sym_mat = TransitMatrix(network_type='drive',
                          primary_input='data/ORIG/tracts2010.csv',
                          write_to_file=True,
                          load_to_mem=True)

# Run process. For driving, p2p queries OSM to fetch the street network and then outputs the shortest-path transit matrix
d_sym_mat.process(speed_limit_filename='data/speed_limit.csv')

# Saved as drive_sym_health_tracts.csv

INFO:p2p:Processing network (drive) in format: csv with epsilon: 0.05


The variables in your data set are:
>  geoid10
>  lon
>  lat
>  Pop2014
>  Pov14
>  community


INFO:p2p:Total number of rows in the dataset: 801
INFO:p2p:Complete number of rows for computing the matrix: 801
INFO:p2p:Total number of rows dropped due to missing latitude or longitude values: 0


The variable names in your speed limit data set are:
>  Unnamed: 0
>  FULLSTNA
>  SPDLIMIT


INFO:osmnet:Requesting network data within bounding box from Overpass API in 1 request(s)
INFO:osmnet:Posting to http://www.overpass-api.de/api/interpreter with timeout=180, "{'data': '[out:json][timeout:180];(way["highway"]["highway"!~"cycleway|footway|path|pedestrian|steps|track|proposed|construction|bridleway|abandoned|platform|raceway|service"]["motor_vehicle"!~"no"]["motorcar"!~"no"]["service"!~"parking|parking_aisle|driveway|emergency_access"](41.60021990,-87.85448850,42.07126140,-87.58049640);>;);out;'}"


INFO:osmnet:Downloaded 33,501.4KB from www.overpass-api.de in 7.87 seconds
INFO:osmnet:Downloaded OSM network data within bounding box from Overpass API in 1 request(s) and 8.25 seconds


INFO:osmnet:Returning OSM data with 170,563 nodes and 38,972 ways...


INFO:osmnet:Edge node pairs completed. Took 46.40 seconds
INFO:osmnet:Returning processed graph with 57,572 nodes and 88,601 edges...
INFO:osmnet:Completed OSM data download and Pandana node and edge table creation in 59.66 seconds


INFO:p2p:Matching street network completed in 
            10.65 seconds: 4121 perfect matches, 118 near perfect matches,
            649 good matches and 99 non matches
INFO:p2p:Prepared raw network in 0.37 seconds and wrote to: data/raw_network_0.csv
  node_array = pd.DataFrame.as_matrix(nodes)
INFO:p2p:Nearest Neighbor matching completed in 0.24 seconds
INFO:p2p:Writing to file: data/drive_full_results_0.csv
INFO:p2p:Shortest path matrix computed in 1.27 seconds
INFO:p2p:All operations completed in 72.20 seconds


Cleaned up calculation artifacts


Now, snap the points to the units of analysis. Snapping the destination points is not always straightforward: deciding which points (lying on the network) are assigned to each area of analysis can be arbitrary, so it is important to scrutinize the structure of the data before any further processing. If the destinations fall within the units of analysis, the best option is to run a within function that assigns each destination to its unit of analysis and then join on the area IDs.
The following image shows that, in this case, we can safely run a function that assigns each point to the area of analysis of interest. 

<img src="scripts/data/figures/snap.png" width="500" title="Optional title">
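
The idea behind a "within" test can be sketched in plain Python with a ray-casting point-in-polygon check. This is only an illustration of the concept: the two rectangular areas and point 99 are hypothetical, real tract boundaries are irregular polygons, and in practice a spatial library does this work.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon (list of (x, y))."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_to_area(points, areas):
    """Map each point ID to the first area that contains it, or None."""
    return {
        pid: next(
            (aid for aid, poly in areas.items() if point_in_polygon(x, y, poly)),
            None,
        )
        for pid, (x, y) in points.items()
    }


# Hypothetical rectangular areas, vertices in (lon, lat) order.
areas = {
    "west": [(-87.72, 41.90), (-87.68, 41.90), (-87.68, 41.98), (-87.72, 41.98)],
    "east": [(-87.68, 41.90), (-87.64, 41.90), (-87.64, 41.98), (-87.68, 41.98)],
}
# Facilities 1 and 3 from the table above, plus a made-up point outside both.
points = {1: (-87.651879, 41.956676), 3: (-87.71727, 41.924904), 99: (-87.50, 41.80)}

assignment = assign_to_area(points, areas)
print(assignment)  # {1: 'east', 3: 'west', 99: None}
```

Points that map to `None` correspond to destinations that an inner spatial join would drop.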

**Spatial join of health facilities and area of analysis**

Finally, in order to get the matrix of origins to destinations, we need to join the health facilities by block with the distance matrix previously generated. This will generate an asymmetric matrix with all the distances from destinations to all the units of analysis in Chicago.

In [25]:
# Read the destination file to join with the boundaries
health_gdf = gpd.read_file('data/DEST/health_chicago.shp')

# Use the symmetric matrix calculated above, or read previously saved results
sym_walk = pd.read_csv('data/matrices/walk_sym_health_tracts.csv')

# Read the boundaries file
boundaries_gdf = gpd.read_file('data/ORIG/tracts2010.shp')

# Rename the ID column so that both data frames share the same key
sym_walk = sym_walk.rename(index=str, columns={"Unnamed: 0": "geoid10"})

# Spatial join of amenities within each area of analysis.
# Points outside the tracts shapefile are dropped (from 199 to 182 data points).
s_join = gpd.sjoin(health_gdf, boundaries_gdf, how='inner', op='within')

# Convert the GeoDataFrame to a non-spatial DataFrame for the join
jb_df = pd.DataFrame(s_join)

# Make sure the ID has the same data type in both data frames
# (inspect with sym_walk.dtypes and jb_df.dtypes)
jb_df['geoid10'] = jb_df['geoid10'].astype(int)
jb_df = jb_df[['geoid10']]

# Join the symmetric matrix with the spatially joined data on geoid10
j_asym = pd.merge(sym_walk, jb_df, on='geoid10', how='right')

j_asym.to_csv('data/matrices/walk_asym_health_tracts_join.csv')

In [26]:
# Check that the output is correct
j_asym.head()

Unnamed: 0,geoid10,17031842400,17031840300,17031841100,17031841200,17031838200,17031650301,17031530503,17031760803,17031540102,...,17031620100,17031620200,17031070200,17031070400,17031070500,17031071000,17031071200,17031130300,17031292200,17031630900
0,17031031501,18586,12034,9778,10187,8879,17799,24351,11661,25486,...,15606,16228,3343,3331,3596,3934,3862,4856,11392,16023
1,17031031502,18246,11694,9438,9847,8539,17459,24011,11793,25272,...,15266,15888,3003,2996,3261,3599,3522,4988,11052,15683
2,17031031800,18898,11352,10090,9505,8197,17117,24601,10601,25953,...,14924,15546,3655,3415,3331,4052,4174,3796,10710,15341
3,17031063400,15827,9560,7019,7713,6405,15325,21877,13006,22882,...,13132,13754,689,1095,1402,1679,1335,6713,8918,13549
4,17031031300,19002,12564,10194,10717,9409,18329,24881,11930,25861,...,16136,16758,3873,4042,4307,4645,4392,4797,11922,16553


In [31]:
j_asym.shape

(182, 802)
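
The shape above is consistent with one row per facility that falls inside a tract (182, per the comment in the join cell) and one column for geoid10 plus one per tract (801 + 1). A minimal plain-Python sketch of what the right join does, using a hypothetical 3 x 3 matrix and made-up facility IDs:

```python
# Hypothetical 3 x 3 symmetric matrix keyed by area ID (values stand in for
# distances; they are not taken from the demo output).
sym = {
    "a1": {"a1": 0, "a2": 500, "a3": 800},
    "a2": {"a1": 500, "a2": 0, "a3": 300},
    "a3": {"a1": 800, "a2": 300, "a3": 0},
}
# Result of the spatial join: the area each facility falls in. A facility
# outside every area would already have been dropped by the inner join.
facility_area = [("f1", "a2"), ("f2", "a2"), ("f4", "a3")]


def rows_for_facilities(sym, facility_area):
    """Right join: one output row per facility, copied from its area's row."""
    return [
        {"facility": fid, "area": aid, **sym.get(aid, {})}
        for fid, aid in facility_area
    ]


joined = rows_for_facilities(sym, facility_area)
for row in joined:
    print(row)

# Rows: one per retained facility. Columns: area ID plus one per area.
# (The "facility" key is kept here only to make the rows readable.)
print(len(joined), len(sym) + 1)  # 3 4
```

Two facilities in the same area receive identical distance rows, which is the arbitrariness that the snapping discussion above warns about.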

Now that we have the origin-destination matrices, we can proceed to estimate metrics. For the purposes of this demo, we will use only drive_asym_health_tracts.csv and walk_asym_health_tracts.csv to run the metrics.