# Simple Demo for Testing

Authors: [Irene Farah](https://www.linkedin.com/in/imfarah/),  [Julia Koschinsky](https://www.linkedin.com/in/julia-koschinsky-657599b1/), [Logan Noel](https://www.linkedin.com/in/lmnoel/).   
Contact: [Julia Koschinsky](mailto:jkoschinsky@uchicago.edu)  

Research assistance of [Shiv Agrawal](http://simonlab.uchicago.edu/people/ShivAgrawal.html), [Caitlyn Tien](https://www.linkedin.com/in/caitlyn-tien-0b784b161/) and [Richard Lu](https://www.linkedin.com/in/richard-lu-576874155/) is gratefully acknowledged.

Center for Spatial Data Science  
University of Chicago  

July 30, 2019

----

This notebook lets you test the core spatial access metrics on a toy dataset before running them on your own data. It uses two stored CSV files:  

1)  **hyde_park_tracts.csv** contains 12 points of origin (tract centroids in Hyde Park, Chicago, with a population field).  
2)  **hyde_park_dests.csv** contains 7 amenities in three categories (museums, restaurants and supermarkets) and one target field, i.e. an attribute of the amenity such as number of employees, size or revenue.  

You will first create a matrix of walking times (in seconds) from these points of origin to the 7 destinations (the matrix will have 12 rows and 7 columns). 

Then, the demo runs through a basic version of each spatial access and coverage metric for illustration purposes. The functionality of each spatial access metric is explained in more detail in the following notebooks.

```python
# Check which version of the spatial-access package you are using
!pip3 show spatial-access
```

```python
# Move two directories up so that the relative ./data/... paths used below resolve
%cd ../..
```

## Creating the Travel Time Matrix

This generates a matrix of walking times (in seconds) from the 12 origins to the 7 destinations (12 rows x 7 columns). 
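To see the shape of the result before querying OpenStreetMap, you can build a rough stand-in for part of the matrix from straight-line (haversine) distances, using three origins and three destinations copied from the data tables below and an assumed walking speed of 1.4 m/s. This is a swapped-in approximation for illustration only: `TransitMatrix` routes along the OpenStreetMap walking network, so its times will be longer than these straight-line estimates. The helper `haversine_m`, the variable names and the walking speed are hypothetical, not part of spatial_access.

```python
import math

import pandas as pd

# Hypothetical walking speed in metres per second (an assumption, not a spatial_access setting)
WALK_SPEED_MPS = 1.4


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# A few origins and destinations copied from the demo data as (lat, lon)
origins = {
    17031836300: (41.801532, -87.601757),
    17031836200: (41.790469, -87.601284),
    17031410100: (41.801497, -87.579323),
}
destinations = {
    "Museum of Science and Industry": (41.790883, -87.583131),
    "DuSable Museum": (41.791985, -87.607132),
    "Whole Foods": (41.801978, -87.587949),
}

# Rows = origins, columns = destinations, values = straight-line walking time in seconds
toy_matrix = pd.DataFrame(
    {
        name: [haversine_m(*o, *d) / WALK_SPEED_MPS for o in origins.values()]
        for name, d in destinations.items()
    },
    index=list(origins),
)
print(toy_matrix.round(0))
```

The full demo matrix has the same layout, with one row per origin and one column per destination.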

```python
from spatial_access.p2p import *
```

---

Read in the stored source and destination csv files:

```python
import pandas as pd

sources_df = pd.read_csv('./data/input_data/sources/hyde_park_tracts.csv')
dests_df = pd.read_csv('./data/input_data/destinations/hyde_park_dests.csv')
```


View the source data (12 tract centroids):

```python
sources_df
```

|   | geoid10 | lon | lat | Pop2014 | Pov14 | community |
|--:|--:|--:|--:|--:|--:|--:|
| 0 | 17031836300 | -87.601757 | 41.801532 | 6465 | 234 | 41 |
| 1 | 17031836200 | -87.601284 | 41.790469 | 1329 | 47 | 41 |
| 2 | 17031410100 | -87.579323 | 41.801497 | 1956 | 551 | 41 |
| 3 | 17031410200 | -87.594269 | 41.801668 | 1248 | 362 | 41 |
| 4 | 17031410500 | -87.603745 | 41.797827 | 2630 | 717 | 41 |
| 5 | 17031410600 | -87.598946 | 41.797971 | 2365 | 703 | 41 |
| 6 | 17031411100 | -87.589702 | 41.790449 | 2246 | 154 | 41 |
| 7 | 17031410700 | -87.594198 | 41.79804 | 1959 | 453 | 41 |
| 8 | 17031410800 | -87.589626 | 41.79796 | 3201 | 741 | 41 |
| 9 | 17031410900 | -87.576659 | 41.797874 | 2923 | 607 | 41 |


View the destination data (7 amenities):

```python
dests_df
```

|   | name | lon | lat | category | target |
|--:|:--|--:|--:|:--|--:|
| 0 | Museum of Science and Industry | -87.583131 | 41.790883 | Museum | 400 |
| 1 | Medici | -87.593738 | 41.791438 | Restaurant | 50 |
| 2 | Valois | -87.588328 | 41.799663 | Restaurant | 30 |
| 3 | DuSable Museum | -87.607132 | 41.791985 | Museum | 100 |
| 4 | Whole Foods | -87.587949 | 41.801978 | Supermarket | 50 |
| 5 | Hyde Park Produce | -87.595524 | 41.799942 | Supermarket | 35 |
| 6 | Jewel Osco | -87.607225 | 41.78458 | Supermarket | 70 |


Specify travel mode, variable names, and file locations:

```python
# Asymmetric matrix: different source and destination files
matrix = TransitMatrix(network_type='walk',
                       primary_hints={'idx': 'geoid10', 'population': 'skip', 'lat': 'lat', 'lon': 'lon'},
                       secondary_hints={'idx': 'name', 'capacity': 'skip', 'category': 'category', 'lat': 'lat', 'lon': 'lon'},
                       primary_input='./data/input_data/sources/hyde_park_tracts.csv',
                       secondary_input='./data/input_data/destinations/hyde_park_dests.csv')
```


Get the travel times by querying OpenStreetMap data for the spatial extent of your source and destination coordinates:

```python
matrix.process()
```

Save the travel time matrix in CSV and/or TMX format (access metrics run faster when the matrix is read from TMX):

```python
matrix.write_csv('./data/output_data/matrices/simple_demo_matrix.csv')
```

```python
matrix.write_tmx('./data/output_data/matrices/simple_demo_matrix.tmx')
```

## Access Metrics (Attributes of the Origin File)

Next, the travel time matrix serves as the input for the calculation of several spatial access metrics. We first calculate spatial access measures that are attributes of the point of origin (12 tract centroids). After that, we calculate so-called coverage metrics that are attributes of the destination points (7 amenities).

```python
from spatial_access.Models import *
```

### Access Model

The Access Model below reads the travel time matrix generated above.  
If you specify **`transit_matrix_filename=None`**, the matrix is computed on the fly instead.  

The Access Model generates an access score that measures how accessible a location is to multiple amenities within a given travel time (e.g. 20 minutes of walking). You can specify three types of weights for this score: 

1) **distance decay**, where closer amenities get more weight (default = linear)  
2) **relative importance of an amenity type** (e.g. a greater weight for supermarkets than for museums)  
3) **penalty for same types**, where each additional amenity of the same type gets less weight  

You can estimate the score with or without normalization.  
AccessModel does not require population or target variables.
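The weighting logic can be sketched in plain Python. This is a simplified re-implementation under stated assumptions, not the spatial_access source: it assumes linear decay of the form `1 - t / upper_threshold`, that the weights in a category's list are applied to that category's in-range amenities from closest to farthest, and that amenities beyond the length of the list add nothing. The function names and travel times are hypothetical.

```python
def linear_decay(t, upper_threshold):
    """Assumed linear decay: weight 1 at t = 0, falling to 0 at the threshold."""
    return max(0.0, 1 - t / upper_threshold)


def access_score(times_by_category, category_weights, upper_threshold=1800):
    """Hypothetical sketch of a weighted access score for one origin.

    times_by_category: {category: [travel times in seconds to each amenity]}
    category_weights:  {category: [weight for closest, 2nd closest, ...]}
    """
    scores = {}
    for category, times in times_by_category.items():
        # Keep amenities inside the catchment area, closest first
        in_range = sorted(t for t in times if t <= upper_threshold)
        weights = category_weights.get(category, [])
        # zip() stops at the shorter list, so amenities beyond the weight list add nothing
        scores[category] = sum(
            w * linear_decay(t, upper_threshold) for t, w in zip(in_range, weights)
        )
    # Total across categories, computed before the key is added
    scores["all_categories"] = sum(scores.values())
    return scores


weights_by_category = {"Museum": [5, 5, 3], "Restaurant": [10, 10], "Supermarket": [10, 7, 5]}
# Hypothetical travel times (seconds) from one origin
times = {"Museum": [1041, 2400], "Restaurant": [600, 900], "Supermarket": [300, 700, 1500]}
scores = access_score(times, weights_by_category)
print(scores)
```

Here the museum at 2,400 seconds falls outside the 1,800-second threshold, so the museum term is only 5 × (1 − 1041/1800) ≈ 2.108.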


---

Specify travel mode, file names, variable names and the distance decay function:

```python
access = AccessModel(network_type='walk',
                     transit_matrix_filename='./data/output_data/matrices/simple_demo_matrix.csv',
                     sources_filename='./data/input_data/sources/hyde_park_tracts.csv',
                     destinations_filename='./data/input_data/destinations/hyde_park_dests.csv',
                     source_column_names={'idx': 'geoid10', 'population': 'skip', 'lat': 'lat', 'lon': 'lon'},
                     dest_column_names={'idx': 'name', 'capacity': 'skip', 'category': 'category', 'lat': 'lat', 'lon': 'lon'},
                     decay_function='linear')
```

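Specify the weights for relative importance and same types. For each amenity type, the list holds the weights of the closest, second-closest, third-closest, ... amenity of that type: larger values make a type more important, and declining values give each additional amenity of the same type less weight:

```python
# One list of weights per category, applied to successive amenities of that type
category_dict = {
    "Museum": [5, 5, 3],
    "Restaurant": [10, 10],
    "Supermarket": [10, 7, 5]
}
```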

Specify the travel time threshold in seconds (e.g. 1,800 seconds = 30 minutes), whether or not to normalize the score, and the importance/variety weights:

```python
access.calculate(upper_threshold=1800,
                 normalize=False,
                 category_weight_dict=category_dict)
```

View the first 5 records of the access score results by category:

```python
access.model_results.head()
```

|   | all_categories_score | Museum_score | Supermarket_score | Restaurant_score |
|--:|--:|--:|--:|--:|
| 17031836300 | 19.318333 | 2.108333 | 9.826667 | 7.383333 |
| 17031836200 | 22.553333 | 4.983333 | 8.342222 | 9.227778 |
| 17031410100 | 18.303889 | 2.011111 | 8.398333 | 7.894444 |
| 17031410200 | 25.826667 | 1.647222 | 12.301667 | 11.877778 |
| 17031410500 | 21.282778 | 3.519444 | 9.713333 | 8.05 |


```python
access.model_results.to_csv('./data/output_data/models/simple_demo_accessMod.csv')
```

### Access Time: Time to Closest Destination

Next, you will calculate the time it takes to reach the closest destination from each point of origin.  
As before, you define the AccessTime model using the travel time matrix and the source and destination CSV files.  
AccessTime does not require population or target variables.
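Conceptually, this reduces each row of the travel time matrix to its minimum, per category and overall. The pandas sketch below shows that reduction on a small, made-up matrix; the travel times and the `time_to_nearest_` column prefix are hypothetical and need not match the column names AccessTime produces.

```python
import pandas as pd

# Hypothetical travel times in seconds: rows = origins, columns = destinations
toy_matrix = pd.DataFrame(
    {"Medici": [900, 400], "Valois": [700, 1300], "Whole Foods": [650, 1600]},
    index=["tract_a", "tract_b"],
)
category = {"Medici": "Restaurant", "Valois": "Restaurant", "Whole Foods": "Supermarket"}

# Closest destination of each category: group the destinations by category, take the min per origin
by_category = toy_matrix.T.groupby(category).min().T.add_prefix("time_to_nearest_")
# Closest destination of any category
by_category["time_to_nearest_all_categories"] = toy_matrix.min(axis=1)
print(by_category)
```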

```python
accessT = AccessTime(network_type='walk',
                     transit_matrix_filename='./data/output_data/matrices/simple_demo_matrix.csv',
                     sources_filename='./data/input_data/sources/hyde_park_tracts.csv',
                     destinations_filename='./data/input_data/destinations/hyde_park_dests.csv',
                     source_column_names={'idx': 'geoid10', 'population': 'skip', 'lat': 'lat', 'lon': 'lon'},
                     dest_column_names={'idx': 'name', 'capacity': 'skip', 'category': 'category', 'lat': 'lat', 'lon': 'lon'})
```

```python
accessT.calculate()
```

```python
accessT.model_results.head()
```

```python
accessT.model_results.to_csv('./data/output_data/models/simple_demo_accessT.csv')
```

### Access Count: Number of Destinations within a Catchment Area

Access Count measures the number of destinations within a given travel time.  
In this case, the catchment area is 1,800 seconds (30 minutes) of walking from a point of origin.  
It does not require population or target variables.
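The count amounts to flagging every matrix entry at or below the threshold and counting the flags per origin, per category and overall. A minimal pandas sketch on a made-up matrix (the travel times are hypothetical; the `count_in_range_` prefix follows the output columns shown below):

```python
import pandas as pd

UPPER_THRESHOLD = 1800  # seconds (30 minutes)

# Hypothetical travel times in seconds: rows = origins, columns = destinations
toy_matrix = pd.DataFrame(
    {"Medici": [900, 400], "Valois": [2000, 1300], "Whole Foods": [650, 1900]},
    index=["tract_a", "tract_b"],
)
category = {"Medici": "Restaurant", "Valois": "Restaurant", "Whole Foods": "Supermarket"}

# True where a destination lies inside the catchment area of the origin
in_range = toy_matrix <= UPPER_THRESHOLD
# Count the in-range destinations of each category, then across all categories
counts = in_range.T.groupby(category).sum().T.add_prefix("count_in_range_")
counts["count_in_range_all_categories"] = in_range.sum(axis=1)
print(counts)
```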

```python
accessC = AccessCount(network_type='walk',
                      transit_matrix_filename='./data/output_data/matrices/simple_demo_matrix.csv',
                      sources_filename='./data/input_data/sources/hyde_park_tracts.csv',
                      destinations_filename='./data/input_data/destinations/hyde_park_dests.csv',
                      source_column_names={'idx': 'geoid10', 'population': 'skip', 'lat': 'lat', 'lon': 'lon'},
                      dest_column_names={'idx': 'name', 'capacity': 'skip', 'category': 'category', 'lat': 'lat', 'lon': 'lon'})
```

```python
accessC.calculate(upper_threshold=1800)
```

```python
accessC.model_results.head()
```

|   | count_in_range_Museum | count_in_range_Supermarket | count_in_range_Restaurant | count_in_range_all_categories |
|--:|--:|--:|--:|--:|
| 17031836300 | 1 | 3 | 2 | 6 |
| 17031836200 | 2 | 3 | 2 | 7 |
| 17031410100 | 1 | 2 | 2 | 5 |
| 17031410200 | 2 | 2 | 2 | 6 |
| 17031410500 | 2 | 3 | 2 | 7 |


```python
accessC.model_results.to_csv('./data/output_data/models/simple_demo_accessC.csv')
```

### Access Sum: Sum of a Destination Attribute within a Given Travel Time

Access Sum sums an attribute of the destinations within a catchment area, e.g. the size of supermarkets within a 30-minute walk from a point of origin. It requires a target variable.
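In matrix terms, each origin keeps the target value of every destination within the threshold and adds them up. The sketch below does this for supermarkets only, using the demo's target values and a made-up travel time matrix; the travel times and variable names are hypothetical.

```python
import pandas as pd

UPPER_THRESHOLD = 1800  # seconds (30 minutes)

# Hypothetical travel times in seconds: rows = origins, columns = destinations
toy_matrix = pd.DataFrame(
    {"Whole Foods": [650, 1900], "Hyde Park Produce": [500, 1200], "Jewel Osco": [1700, 800]},
    index=["tract_a", "tract_b"],
)
# Target values from the demo destination file
target = pd.Series({"Whole Foods": 50, "Hyde Park Produce": 35, "Jewel Osco": 70})

# Keep the target of each in-range destination (out-of-range ones become 0), then sum per origin
in_range_target = (toy_matrix <= UPPER_THRESHOLD).mul(target, axis=1)
sum_in_range = in_range_target.sum(axis=1)
print(sum_in_range)
```

With these times, `tract_a` reaches all three supermarkets (50 + 35 + 70 = 155), while `tract_b` is too far from Whole Foods (35 + 70 = 105).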

```python
accessS = AccessSum(network_type='walk',
                    transit_matrix_filename='./data/output_data/matrices/simple_demo_matrix.csv',
                    sources_filename='./data/input_data/sources/hyde_park_tracts.csv',
                    destinations_filename='./data/input_data/destinations/hyde_park_dests.csv',
                    source_column_names={'idx': 'geoid10', 'population': 'skip', 'lat': 'lat', 'lon': 'lon'},
                    dest_column_names={'idx': 'name', 'capacity': 'target', 'category': 'category', 'lat': 'lat', 'lon': 'lon'})
```

```python
accessS.calculate(upper_threshold=1800)
```

```python
accessS.model_results.head()
```

|   | sum_in_range_Museum | sum_in_range_Supermarket | sum_in_range_Restaurant | sum_in_range_all_categories |
|--:|--:|--:|--:|--:|
| 17031836300 | 100 | 155 | 80 | 335 |
| 17031836200 | 500 | 155 | 80 | 735 |
| 17031410100 | 400 | 85 | 80 | 565 |
| 17031410200 | 500 | 85 | 80 | 665 |
| 17031410500 | 500 | 155 | 80 | 735 |


```python
accessS.model_results.to_csv('./data/output_data/simple_demo_accessS.csv')
```

### Destination Sum: Sum of a Provider Characteristic by Area

**Destination Sum** sums an attribute of the destinations located within each geographic boundary (e.g. a community area) and also reports this sum per capita for each area.  
This so-called container approach differs from Access Sum in that it sums point attributes within areas without using travel times. It requires population and target variables.
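A minimal pandas sketch of the container approach: each destination is assigned to the area it lies in, targets are summed per area and category, and the totals are divided by the area's population. The area assignments below are hard-coded to match the output further down (e.g. DuSable Museum in Washington Park, Jewel Osco in Woodlawn); in a real run they would come from matching each point to a boundary polygon. The population figure is a made-up placeholder, and the code is an illustration rather than the DestSum implementation.

```python
import pandas as pd

dests = pd.DataFrame({
    "name": ["Museum of Science and Industry", "DuSable Museum", "Medici", "Valois",
             "Whole Foods", "Hyde Park Produce", "Jewel Osco"],
    "category": ["Museum", "Museum", "Restaurant", "Restaurant",
                 "Supermarket", "Supermarket", "Supermarket"],
    "target": [400, 100, 50, 30, 50, 35, 70],
    # Hard-coded area of each destination (in practice, a point-in-polygon lookup)
    "area": ["HYDE PARK", "WASHINGTON PARK", "HYDE PARK", "HYDE PARK",
             "HYDE PARK", "HYDE PARK", "WOODLAWN"],
})
# Hypothetical population per area; areas without population data stay missing
population = pd.Series({"HYDE PARK": 25_000})

# Sum the target per area and category (no travel times involved)
sums = dests.pivot_table(index="area", columns="category", values="target",
                         aggfunc="sum", fill_value=0)
sums["all_categories"] = sums.sum(axis=1)
# Per capita values; reindex yields NaN where the population is unknown
per_capita = sums.div(population.reindex(sums.index), axis=0).add_suffix("_per_capita")
result = pd.concat([sums, per_capita], axis=1)
print(result)
```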

```python
d_sum = DestSum(network_type='walk',
                sources_filename='./data/input_data/sources/hyde_park_tracts.csv',
                destinations_filename='./data/input_data/destinations/hyde_park_dests.csv',
                source_column_names={'idx': 'geoid10', 'population': 'Pop2014', 'lat': 'lat', 'lon': 'lon'},
                dest_column_names={'idx': 'name', 'capacity': 'target', 'category': 'category', 'lat': 'lat', 'lon': 'lon'})
```

```python
d_sum.calculate()
```

| spatial_index | Museum | Supermarket | Restaurant | all_categories | Museum_per_capita | Supermarket_per_capita | Restaurant_per_capita | all_categories_per_capita |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| HYDE PARK | 400.0 | 85.0 | 80.0 | 565.0 | 44.444444 | 9.444444 | 8.888889 | 62.777778 |
| WASHINGTON PARK | 100.0 | 0.0 | 0.0 | 100.0 | NaN | NaN | NaN | NaN |
| WOODLAWN | 0.0 | 70.0 | 0.0 | 70.0 | NaN | NaN | NaN | NaN |


```python
d_sum.aggregated_results.head()
```

| spatial_index | Museum | Supermarket | Restaurant | all_categories | Museum_per_capita | Supermarket_per_capita | Restaurant_per_capita | all_categories_per_capita |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| HYDE PARK | 400.0 | 85.0 | 80.0 | 565.0 | 0.017291 | 0.003674 | 0.003458 | 0.024423 |
| WASHINGTON PARK | 100.0 | 0.0 | 0.0 | 100.0 | NaN | NaN | NaN | NaN |
| WOODLAWN | 0.0 | 70.0 | 0.0 | 70.0 | NaN | NaN | NaN | NaN |


```python
d_sum.aggregated_results.to_csv('./data/output_data/simple_demo_destsum.csv')
```

## Coverage Metrics (Attributes of Destinations)

The metrics above were attributes of the origin points, i.e. they considered spatial access from the perspective of someone accessing amenities. In contrast, the following metrics are attributes of the destination, i.e. they consider spatial access from the perspective of the service provider. In addition to a capacity field, these metrics also require a population variable.

### Coverage

Coverage adds two variables to the destination file: the number of people within the catchment area of a provider, and a provider attribute divided by this nearby population count. For example, you can use this to calculate the funding a service provider receives per person within its catchment area (such as a 30-minute walk to the provider).
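Both columns can be sketched directly from the travel time matrix: for each provider, add up the population of all origins within the threshold (`service_pop`), then divide the provider's target by that total (`percap_spending`). The column names follow the output shown below; the matrix and populations are made up, and this is an illustration, not the Coverage implementation.

```python
import pandas as pd

UPPER_THRESHOLD = 1800  # seconds (30 minutes)

# Hypothetical travel times in seconds: rows = origins, columns = providers
toy_matrix = pd.DataFrame(
    {"Whole Foods": [650, 1900, 1200], "Jewel Osco": [1700, 800, 2500]},
    index=["tract_a", "tract_b", "tract_c"],
)
population = pd.Series({"tract_a": 6000, "tract_b": 2000, "tract_c": 3000})  # made-up counts
target = pd.Series({"Whole Foods": 50, "Jewel Osco": 70})

in_range = toy_matrix <= UPPER_THRESHOLD
# People living within reach of each provider
service_pop = in_range.mul(population, axis=0).sum(axis=0)
# Provider attribute per person in its catchment area
coverage = pd.DataFrame({
    "service_pop": service_pop,
    "percap_spending": target / service_pop,
})
print(coverage)
```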

```python
cov = Coverage(network_type='walk',
               transit_matrix_filename='./data/output_data/matrices/simple_demo_matrix.csv',
               sources_filename='./data/input_data/sources/hyde_park_tracts.csv',
               destinations_filename='./data/input_data/destinations/hyde_park_dests.csv',
               source_column_names={'idx': 'geoid10', 'population': 'Pop2014', 'lat': 'lat', 'lon': 'lon'},
               dest_column_names={'idx': 'name', 'capacity': 'target', 'category': 'category', 'lat': 'lat', 'lon': 'lon'})
```

```python
# Note: the capacity (target) values are not real; they are for demo purposes only
cov.calculate(upper_threshold=1800)
```

|   | service_pop | percap_spending | category |
|:--|--:|--:|:--|
| Museum of Science and Industry | 24861 | 0.016089 | Museum |
| DuSable Museum | 23134 | 0.004323 | Museum |
| Whole Foods | 31326 | 0.001596 | Supermarket |
| Hyde Park Produce | 31326 | 0.001117 | Supermarket |
| Jewel Osco | 16726 | 0.004185 | Supermarket |
| Medici | 31326 | 0.001596 | Restaurant |
| Valois | 31326 | 0.000958 | Restaurant |


```python
cov.model_results.to_csv('./data/output_data/models/simple_demo_cov.csv')
```

### Two-Stage Floating Catchment Area (TSFCA)

TSFCA models are a type of gravity model popularized by Luo and Wang (2003) to estimate spatial access gaps in primary care. They are calculated in two stages (using the primary care example). In the first stage, the ratio of doctors to the nearby population is calculated for every provider. In the second stage, for every point of origin (such as a tract centroid), the ratios of all providers within the travel threshold are summed. In other words, the ratio of doctors to people is first calculated for the catchment areas of doctors (first stage) and then summed over the catchment area around a home or work location (second stage). The field names below are for a case that calculates per capita spending.
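The two stages map onto two reductions of the travel time matrix. The sketch below uses a made-up example with three tracts and two providers and a binary catchment (a provider counts fully if it is within the threshold, otherwise not), i.e. the basic two-stage form; it is an illustration, not the spatial_access TSFCA code, and all numbers are hypothetical.

```python
import pandas as pd

UPPER_THRESHOLD = 1800  # seconds (30 minutes)

# Hypothetical travel times in seconds: rows = origins, columns = providers
toy_matrix = pd.DataFrame(
    {"Whole Foods": [650, 1900, 1200], "Jewel Osco": [1700, 800, 2500]},
    index=["tract_a", "tract_b", "tract_c"],
)
population = pd.Series({"tract_a": 6000, "tract_b": 2000, "tract_c": 3000})  # made-up counts
target = pd.Series({"Whole Foods": 50, "Jewel Osco": 70})

in_range = toy_matrix <= UPPER_THRESHOLD

# Stage 1: provider-to-population ratio within each provider's catchment area
ratio = target / in_range.mul(population, axis=0).sum(axis=0)

# Stage 2: for each origin, sum the ratios of all providers it can reach
tsfca_scores = in_range.mul(ratio, axis=1).sum(axis=1)
print(tsfca_scores)
```

The demo output below follows the same pattern: for tract 17031836300, the Museum value 0.004323 equals the DuSable Museum ratio in the Coverage table above.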

```python
tsfca = TSFCA(network_type='walk',
              transit_matrix_filename='./data/output_data/matrices/simple_demo_matrix.csv',
              sources_filename='./data/input_data/sources/hyde_park_tracts.csv',
              destinations_filename='./data/input_data/destinations/hyde_park_dests.csv',
              source_column_names={'idx': 'geoid10', 'population': 'Pop2014', 'lat': 'lat', 'lon': 'lon'},
              dest_column_names={'idx': 'name', 'capacity': 'target', 'category': 'category', 'lat': 'lat', 'lon': 'lon'})
```

```python
# Note: the capacity (target) values are not real; they are for demo purposes only
tsfca.calculate(upper_threshold=1800)
```

|   | percap_spend_Museum | percap_spend_Supermarket | percap_spend_Restaurant | percap_spend_all_categories |
|--:|--:|--:|--:|--:|
| 17031836300 | 0.004323 | 0.006899 | 0.002554 | 0.013775 |
| 17031836200 | 0.020412 | 0.006899 | 0.002554 | 0.029864 |
| 17031410100 | 0.016089 | 0.002713 | 0.002554 | 0.021357 |
| 17031410200 | 0.020412 | 0.002713 | 0.002554 | 0.025679 |
| 17031410500 | 0.020412 | 0.006899 | 0.002554 | 0.029864 |
| 17031410600 | 0.020412 | 0.006899 | 0.002554 | 0.029864 |
| 17031411100 | 0.020412 | 0.006899 | 0.002554 | 0.029864 |
| 17031410700 | 0.020412 | 0.002713 | 0.002554 | 0.025679 |
| 17031410800 | 0.020412 | 0.002713 | 0.002554 | 0.025679 |
| 17031410900 | 0.016089 | 0.002713 | 0.002554 | 0.021357 |


```python
tsfca.model_results.to_csv('./data/output_data/models/simple_demo_tsfca.csv')
```