# OD Flow Analysis

This notebook uses the [PSRC Household Travel Survey (HTS)](https://www.psrc.org/our-work/household-travel-survey-program) to build an origin-destination (OD) matrix at the census tract level.

The resulting flows will then be visualized.

The analysis uses trip-level data ([download here](https://household-travel-survey-psregcncl.hub.arcgis.com/datasets/22d91ae217be41f58ebac0844ac5d60d_0/explore)).

In [43]:
# libraries
import numpy as np
import pandas as pd
import geopandas as gpd  # used by get_census_tract_geom below
import censusdata

In [11]:
# read in data
trips_df = pd.read_csv("../land-use-travel-patterns/data/Household_Travel_Survey_Trips.csv")
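
The tract columns used later (`o_tract10`, `d_tract10`) hold 11-digit tract IDs, and pandas can parse such columns as floats (for example when some values are missing), which is how they end up displayed as `5.303300e+10`. A minimal sketch of one way around this, assuming it is acceptable to treat the IDs as text, is to pass `dtype` to `pd.read_csv`. The inline CSV below is made-up sample data standing in for the real file.

```python
import io
import pandas as pd

# Made-up sample rows standing in for Household_Travel_Survey_Trips.csv
sample_csv = io.StringIO(
    "trip_id,o_tract10,d_tract10\n"
    "1,53033005302,53033005301\n"
    "2,53033005302,\n"
    "3,,53061052100\n"
)

# Read the tract IDs as strings so they are not coerced to floats
sample_trips = pd.read_csv(
    sample_csv,
    dtype={"o_tract10": "string", "d_tract10": "string"},
)

print(sample_trips.dtypes)
print(sample_trips)
```

Keeping the IDs as strings also preserves the state and county prefixes, which makes it easier to match against Census tract identifiers later.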


## Determine census tracts in PSRC extent (King, Kitsap, Pierce, Snohomish)

In [41]:
# borrowing code from labs...thanks eric :D

def get_census_data(tables, state, county, year=2019):

    # Download the data
    data = censusdata.download('acs5', year,  # ACS 5-year estimates for `year` (default 2019)
                               censusdata.censusgeo([('state', state), ('county', county), ('tract', '*')]),
                               list(tables.keys()))

    # Rename the columns using the `tables` mapping
    data.rename(columns=tables, inplace=True)

    # Extract information from the censusgeo index
    data['Name'] = data.index.to_series().apply(lambda x: x.name)
    data['SummaryLevel'] = data.index.to_series().apply(lambda x: x.sumlevel())
    data['State'] = data.index.to_series().apply(lambda x: x.geo[0][1])
    data['County'] = data.index.to_series().apply(lambda x: x.geo[1][1])
    data['Tract'] = data.index.to_series().apply(lambda x: x.geo[2][1])
    data.reset_index(drop=True, inplace=True)
    data = data[['Tract','Name']+list(tables.values())].set_index('Tract')
    return data

def get_census_tract_geom(state_fips, county_fips):

    # find state and county fips here: https://www.census.gov/geographies/reference-files/2017/demo/popest/2017-fips.html
    
    # Download the census tract shapefiles
    tracts = gpd.read_file(f'https://www2.census.gov/geo/tiger/TIGER2019/TRACT/tl_2019_{state_fips}_tract.zip')

    # set index as tract
    tracts = tracts.rename(columns={'TRACTCE':'Tract'}).set_index('Tract')

    # Filter to the requested county
    tracts = tracts[tracts['COUNTYFP'] == county_fips]
    tracts = tracts[['geometry']]

    return tracts
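
Since the PSRC extent covers four counties, the helpers above would need to be called once per county. The sketch below is one possible way to hold the county codes and to check whether an 11-digit tract ID falls inside the region. The names `WA_STATE_FIPS`, `PSRC_COUNTY_FIPS`, and `in_psrc_extent` are hypothetical and not part of the original notebook.

```python
# Washington state FIPS code and county FIPS codes for the PSRC region
WA_STATE_FIPS = "53"
PSRC_COUNTY_FIPS = {
    "King": "033",
    "Kitsap": "035",
    "Pierce": "053",
    "Snohomish": "061",
}


def in_psrc_extent(geoid: str) -> bool:
    """Return True if an 11-digit tract ID (state + county + tract) is in a PSRC county."""
    return geoid[:2] == WA_STATE_FIPS and geoid[2:5] in PSRC_COUNTY_FIPS.values()


print(in_psrc_extent("53033005302"))  # county code 033 (King)
print(in_psrc_extent("53001000100"))  # county code 001, not in the list
```

With this mapping, the geometries could be gathered with something like `pd.concat([get_census_tract_geom(WA_STATE_FIPS, c) for c in PSRC_COUNTY_FIPS.values()])`. Note that `TRACTCE` is only unique within a county, so the `Tract` index of the combined frame may contain repeated values.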

## Create OD Matrix

- consider using 1 year of data, then applying weights?
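
A rough sketch of what the note above might look like in practice: filter to a single survey year, then sum a per-trip expansion weight instead of counting rows. The column names `survey_year` and `trip_weight` are hypothetical placeholders, as is the sample data; the real file may name these fields differently.

```python
import pandas as pd

# Made-up rows; `survey_year` and `trip_weight` are hypothetical column names
sample = pd.DataFrame({
    "trip_id": [1, 2, 3, 4],
    "survey_year": [2017, 2019, 2019, 2019],
    "o_tract10": ["A", "A", "A", "B"],
    "d_tract10": ["B", "B", "C", "A"],
    "trip_weight": [10.0, 20.0, 5.0, 7.5],
})

# Keep a single survey year
one_year = sample[sample["survey_year"] == 2019]

# Sum the weights per origin-destination pair instead of counting trips
weighted_flows = (
    one_year.groupby(["o_tract10", "d_tract10"])["trip_weight"]
    .sum()
    .reset_index(name="weighted_trips")
)
print(weighted_flows)
```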

In [16]:
# relevant columns
trips_df_tracts = trips_df[["trip_id", "o_tract10", "d_tract10"]]

In [37]:
# groupby for origin trips, destination trips
origin_counts = trips_df_tracts.groupby("o_tract10").count()["trip_id"].reset_index().rename(columns={'trip_id':'o_count', 'o_tract10':'tractid'})
dest_counts = trips_df_tracts.groupby("d_tract10").count()["trip_id"].reset_index().rename(columns={'trip_id':'d_count', 'd_tract10':'tractid'})

In [39]:
# join on tractid with an outer join so every tract is kept, even one that appears only as an origin or only as a destination (its missing count comes back as NaN)
od_counts = origin_counts.merge(dest_counts, how='outer', on='tractid')
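
An outer merge leaves `NaN` wherever a tract appears on only one side. If zero counts are preferred, one option is to fill and cast back to integers, sketched here on made-up frames (`origin_sample` and `dest_sample` are illustrative stand-ins, not the notebook's data):

```python
import pandas as pd

# Made-up counts: tract 1 has no destinations, tract 3 has no origins
origin_sample = pd.DataFrame({"tractid": [1, 2], "o_count": [5, 3]})
dest_sample = pd.DataFrame({"tractid": [2, 3], "d_count": [4, 7]})

merged = origin_sample.merge(dest_sample, how="outer", on="tractid")

# Replace the NaNs introduced by the outer join with 0 and restore integer counts
count_cols = ["o_count", "d_count"]
merged[count_cols] = merged[count_cols].fillna(0).astype(int)
print(merged)
```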

In [40]:
od_counts

| | tractid | o_count | d_count |
|---|---|---|---|
| 0 | 5.303300e+10 | 359 | 362 |
| 1 | 5.303300e+10 | 166 | 164 |
| 2 | 5.303300e+10 | 48 | 48 |
| 3 | 5.303300e+10 | 708 | 711 |
| 4 | 5.303300e+10 | 119 | 117 |
| ... | ... | ... | ... |
| 769 | 5.306105e+10 | 10 | 10 |
| 770 | 5.306105e+10 | 8 | 9 |
| 771 | 5.306194e+10 | 92 | 91 |
| 772 | 5.306194e+10 | 1 | 1 |
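
The table above holds per-tract totals of trip origins and trip destinations rather than pairwise flows. If a full origin-by-destination matrix is wanted, one possible sketch uses `pd.crosstab` on the two tract columns and then reindexes so rows and columns cover the same set of tracts. The sample data and the names `sample_trips`, `od_matrix`, and `all_tracts` are illustrative only.

```python
import pandas as pd

# Made-up trips between three placeholder tracts
sample_trips = pd.DataFrame({
    "trip_id": [1, 2, 3, 4, 5],
    "o_tract10": ["A", "A", "B", "C", "A"],
    "d_tract10": ["B", "B", "A", "A", "C"],
})

# Rows are origins, columns are destinations, cells are trip counts
od_matrix = pd.crosstab(sample_trips["o_tract10"], sample_trips["d_tract10"])

# Make the matrix square: every tract appears as both a row and a column
all_tracts = sorted(set(sample_trips["o_tract10"]) | set(sample_trips["d_tract10"]))
od_matrix = od_matrix.reindex(index=all_tracts, columns=all_tracts, fill_value=0)
print(od_matrix)
```

Row sums of such a matrix give the origin counts and column sums give the destination counts, so it contains the information in `od_counts` plus the pairing between tracts.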
