# Equity Analysis at the TAZ Level
This Jupyter notebook takes AequilibraE runs (with and without resilience investment) and outputs an HTML file that reports changes in metrics by equity category. 

The default assumption is that the user runs the equity overlay analysis first (the `run_equity_overlay.bat` file in C:\GitHub\RDR\helper_tools\equity_analysis) and then uses its output as an input to this TAZ metrics analysis. Alternatively, the user may directly provide a CSV file from another source that assigns an equity variable value to each TAZ. User-provided equity data must be numeric and either binary or ordinal; data in any other form will produce unexpected results. Future versions of this tool may accept more diverse types of equity data.

To present a distilled summary of results by equity category, the tool assumes that the data are ordinal. It assigns each TAZ pair the higher of the equity category values of its origin TAZ and its destination TAZ, then groups all TAZ pairs by equity category and summarizes the results for each category. Consult the RDR User Guide in C:\GitHub\RDR\documentation for more information on how to use and understand this tool.
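
The pair-level assignment rule described above can be illustrated with a minimal sketch. The function name and the use of `None` for a TAZ with no equity record are assumptions made for this illustration only; the notebook itself works on pandas columns.

```python
def pair_equity_category(origin_value, destination_value):
    """Return the higher of two equity values for a TAZ pair.

    None stands in for a TAZ with no equity record (an assumption of this
    sketch). A pair with no known value on either end falls back to 0.
    """
    known = [v for v in (origin_value, destination_value) if v is not None]
    return float(max(known)) if known else 0.0


# Hypothetical ordinal equity values for (origin, destination) pairs
examples = [(0, 1), (2, 1), (None, 1), (None, None)]
categories = [pair_equity_category(o, d) for o, d in examples]
print(categories)
```

Because the rule takes a maximum, it is only meaningful when larger values represent a "higher" equity category, which is why the data must be binary or ordinal.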

The purpose is to help the user examine and understand differential impacts of (A) a resilience investment intended to mitigate effects of (B) a disruption, comparing different equity categories of interest. The analysis displays variables to help illuminate the following questions from various angles.

Questions driving this analysis include:
- What is the baseline magnitude of trips/hours/miles for each equity category?
- How relevant is the disruption for each equity category?
- What is the projected impact of the resilience investment overall and for each equity category, i.e., are the benefits equitably distributed?

The `equity_metrics.config` configuration file allows the user to specify the following:
- `path_to_RDR_config_file` – This should identify the location of the configuration file pertinent to the existing RDR Metamodel run and corresponding AequilibraE runs that will be used for this equity analysis. The analysis will use this configuration file to identify where to access the OMX files from those runs.
- `resil` - Resilience project.
- `hazard`, `recovery`, `socio`, `proj_group`, `elasticity`, `run_type` - AequilibraE scenario dimensions.
- *Note*: As described above, the default assumption is that the user will use the equity overlay analysis first, and then use the output from that as an input to this TAZ metrics analysis. If the user will instead directly provide the equity data then the user should update the `equity_metrics.config` file to specify the name of the user-provided file in `output_name` (without the CSV file extension). The equity data must be numeric and either binary or ordinal.
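
For a user-provided equity file, a quick pre-check along the following lines may help catch non-numeric values before running the notebook. This is a sketch only: the function name and the `equity_flag` column are hypothetical, and the only column name taken from this notebook is `TAZ`.

```python
import csv
import io


def check_equity_rows(csv_text, equity_column):
    """Return {TAZ: equity value}, raising ValueError on bad input."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {'TAZ', equity_column} - set(reader.fieldnames or [])
    if missing:
        raise ValueError('Missing column(s): {}'.format(sorted(missing)))
    values = {}
    for row in reader:
        # int() and float() raise ValueError if a cell is not numeric
        values[int(row['TAZ'])] = float(row[equity_column])
    return values


# Hypothetical file contents with a binary equity variable
sample = "TAZ,equity_flag\n1,0\n2,1\n3,1\n"
equity_by_taz = check_equity_rows(sample, 'equity_flag')
print(equity_by_taz)
```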

## Questions and corresponding variables
### Question 1: What is the baseline magnitude of trips/hours/miles for each equity category?
- Variables 1T/1H/1M: Overall sum of trips/hours/miles for each category absent disruption
    - `trips_base`
    - `hours_base`
    - `miles_base`
    
### Question 2: How relevant is the disruption for each equity category?
- Variables 2aT/2aH/2aM: Percent change from baseline trips/hours/miles due to disruption (without resilience investment)
    - `trips_percent_change_noresil`
    - `hours_percent_change_noresil`
    - `miles_percent_change_noresil` 
- Variables 2bT/2bH/2bM: Percent of TAZ pairs with a change in trips/hours/miles due to disruption (without resilience investment)
    - `trips_percent_pairs_relevant`
    - `hours_percent_pairs_relevant`
    - `miles_percent_pairs_relevant`
    
### Question 3: What is the projected impact of the resilience investment for each equity category?
- Variables 3aT/3aH/3aM: Overall impact of resilience investment in trips/hours/miles (i.e., trips/hours/miles in the "resilience" case minus trips/hours/miles in the "no resilience" case)
    - `trips_delta_absolute`
    - `hours_delta_absolute`
    - `miles_delta_absolute`
- Variables 3bT/3bH/3bM: Same as the above set, except divided by the trips/hours/miles in the "no resilience" case and multiplied by 100 to show percent change relative to "no resilience" case
    - `trips_delta_relative`
    - `hours_delta_relative`
    - `miles_delta_relative`
- Variables 3cT/3cH/3cM: Average difference in trips/hours/miles due to resilience investment for all __relevant__ TAZ pairs (i.e., among the subset of TAZ pairs where there was a disruption impact in the "no resilience" case)
    - `trips_mean_delta_for_relevant_pairs`
    - `hours_mean_delta_for_relevant_pairs`
    - `miles_mean_delta_for_relevant_pairs`
- Variables 3dT/3dH/3dM: Average difference in trips/hours/miles due to resilience investment for all TAZ pairs __with non-zero delta due to resilience__ (i.e., among the even smaller subset of TAZ pairs where the "resilience" case was different from the "no resilience" case)
    - `trips_mean_delta_for_pairs_with_non-zero_delta`
    - `hours_mean_delta_for_pairs_with_non-zero_delta`
    - `miles_mean_delta_for_pairs_with_non-zero_delta`
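
As a worked example of the arithmetic behind variables 2a, 3a, and 3b, the sketch below applies the definitions above to made-up category totals. The function name and the input numbers are illustrative assumptions, not outputs of any RDR run.

```python
def resilience_metrics(base, disrupt_noresil, disrupt_resil):
    """Compute variables 2a, 3a, and 3b for one equity category."""
    # 2a: percent change from baseline due to disruption (no resilience)
    percent_change_noresil = (disrupt_noresil - base) * 100 / base
    # 3a: "resilience" case minus "no resilience" case
    delta_absolute = disrupt_resil - disrupt_noresil
    # 3b: 3a as a percent of the "no resilience" case
    delta_relative = delta_absolute * 100 / disrupt_noresil
    return {
        'percent_change_noresil': percent_change_noresil,
        'delta_absolute': delta_absolute,
        'delta_relative': delta_relative,
    }


# Made-up totals: 1000 trips at baseline, 800 under disruption, 900 with the project
metrics = resilience_metrics(base=1000.0, disrupt_noresil=800.0, disrupt_resil=900.0)
print(metrics)
```

With these inputs the disruption removes 20 percent of baseline trips, and the resilience investment restores 100 trips, a 12.5 percent gain relative to the "no resilience" case.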

In [None]:
# Import statements
import os
import sys
import numpy as np
import pandas as pd
import openmatrix as omx
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import equity_config_reader

sys.path.insert(0, os.path.abspath('../../metamodel_py'))
import rdr_setup

In [None]:
# The equity configuration filepath is now passed into this notebook from its parent, TAZ_metrics.py, via the temporary text file,
# assuming this notebook and TAZ_metrics.py are both in the same folder.
with open('temp.txt', 'r') as f:
    config_filepath = f.read()
equity_cfg = equity_config_reader.read_equity_config_file(config_filepath)

rdr_cfg_path = equity_cfg['path_to_RDR_config_file']

# Directory of equity helper tool files
equity_dir = equity_cfg['equity_analysis_dir']

cfg = rdr_setup.read_config_file(rdr_cfg_path)

RDR_run_id = cfg['run_id']

# Name of equity variable
category_name = equity_cfg['equity_feature']

# Name of CSV file with equity category value for each TAZ (either output from run_equity_overlay.bat OR user-provided)
category_filename = equity_cfg['output_name']

# Look to see if the equity overlay data exists
if not os.path.exists(os.path.join(equity_dir, category_filename + '.csv')):
    print('{}.csv not found in {}. Please run the equity_overlay first or generate your own file and specify the filename for it as output_name in the equity_metrics.config file'.format(category_filename, equity_dir))

In [None]:
# Utility method for reading OMX files
def readOMX(filename, selectedMatrix, debug_mode):
    """Open an OMX file and return (selected matrix, matrix shape, open file handle).

    The caller is responsible for closing the returned file handle.
    """
    f = omx.open_file(filename)
    matrix_size = f.shape()
    if debug_mode:
        print('Shape: ', matrix_size)
        print('Number of tables: ', len(f))
        print('Table names: ', f.list_matrices())
        print('Attributes: ', f.list_all_attributes())
    omx_df = f[selectedMatrix]
    if debug_mode:
        print('Sum of matrix elements: ', '{:.9}'.format(np.sum(omx_df)))
        print('Percentiles: ', np.percentile(omx_df, (1, 10, 30, 50, 70, 90, 99)))
        print('Maximum: ', np.amax(omx_df))
    return omx_df, matrix_size, f

In [None]:
# Define inputs for comparison - these come from equity_cfg, the equity_metrics.config file
resil = equity_cfg['resil']
baseline = equity_cfg['baseline']
hazard = equity_cfg['hazard']
recovery = equity_cfg['recovery']
socio = equity_cfg['socio']
projgroup = equity_cfg['projgroup']
elasticity = equity_cfg['elasticity']
elasname = str(int(10 * -elasticity))
run_type = equity_cfg['run_type']
largeval = int(equity_cfg['largeval'])

if run_type == 'SP':
    hours_name = 'free_flow_time'
    miles_name = 'distance'
else:
    hours_name = 'time_final'
    miles_name = 'distance_blended'

In [None]:
# Location of the OMX files for "base" and for "disruption with resilience investment"
omx_file_path = os.path.join(equity_dir, "aeq_runs", "disrupt", RDR_run_id,
                             socio + projgroup + '_' + resil + '_' + elasname + '_' + hazard + '_' + recovery,
                             "matrix", "matrices")

# Location of the OMX files for "disruption WITHOUT resilience investment"
omx_file_path_noresil = os.path.join(equity_dir, "aeq_runs", "disrupt", RDR_run_id,
                                     socio + projgroup + '_' + baseline + '_' + elasname + '_' + hazard + '_' + recovery,
                                     "matrix", "matrices")

# Read the base OMX trip table
base_matrix_filename = os.path.join(omx_file_path, 'base_demand_summed.omx')
base_dem, matrix_size, base_trip_omx_file = readOMX(base_matrix_filename, 'matrix', 0)
df_base_trips = pd.DataFrame(data=base_dem)

# Read the new OMX trip table in the disruption with resilience case
newdisruptresil_matrix_filename = os.path.join(omx_file_path, 'new_demand_summed.omx')
newdisruptresil_dem, matrix_size, newdisruptresil_trip_omx_file = readOMX(newdisruptresil_matrix_filename, 'matrix', 0)
df_resil_trips = pd.DataFrame(data=newdisruptresil_dem)

# Read the new OMX trip table in the disruption WITHOUT resilience case
newdisruptNOresil_matrix_filename = os.path.join(omx_file_path_noresil, 'new_demand_summed.omx')
newdisruptNOresil_dem, matrix_size, newdisruptNOresil_trip_omx_file = readOMX(newdisruptNOresil_matrix_filename, 'matrix', 0)
df_NOresil_trips = pd.DataFrame(data=newdisruptNOresil_dem)

In [None]:
# Create filename strings
baseskims_filename = run_type + '_' + socio + projgroup
baseskims_folder = os.path.join(omx_file_path, baseskims_filename + '.omx')
disruptskims_noresil_filename = run_type + '_disrupt_' + socio + projgroup + '_' + baseline + '_' + elasname + '_' + hazard + '_' + recovery
disruptskims_noresil_folder = os.path.join(omx_file_path_noresil, disruptskims_noresil_filename + '.omx')
disruptskims_resil_filename = run_type + '_disrupt_' + socio + projgroup + '_' + resil + '_' + elasname + '_' + hazard + '_' + recovery
disruptskims_resil_folder = os.path.join(omx_file_path, disruptskims_resil_filename + '.omx')
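
The naming convention used for the scenario folders and skims files above can be sketched as a small helper. The function name and all argument values below are hypothetical; only the concatenation pattern and the `elasname` conversion are taken from the cells above.

```python
def scenario_name(socio, projgroup, project, elasticity, hazard, recovery):
    """Build the scenario string used in folder and skims file names."""
    # Same conversion as elasname above: an elasticity of -0.5 becomes '5'
    elasname = str(int(10 * -elasticity))
    return socio + projgroup + '_' + project + '_' + elasname + '_' + hazard + '_' + recovery


# Hypothetical scenario dimensions, for illustration only
name = scenario_name('base', '04', 'L2-7', -1, 'haz1', '2')
skims_file = 'SP' + '_disrupt_' + name + '.omx'
print(name)
print(skims_file)
```

Passing the resilience project as `project` gives the "with resilience" names, while passing the baseline value gives the "no resilience" names.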

In [None]:
# Read the base skims OMX
base_hours, matrix_size, base_skims_omx_file = readOMX(baseskims_folder, hours_name, 0)
df_base_hours = pd.DataFrame(data=base_hours)
# Reuse the already-open skims file so that no second, unclosed file handle is created
base_miles = base_skims_omx_file[miles_name]
df_base_miles = pd.DataFrame(data=base_miles)

In [None]:
# Base times and distances by origin TAZ
# Convert O-D matrix to tall table indexed by origin and destination TAZ
bool_base_hours = df_base_hours < largeval
a = np.repeat(bool_base_hours.columns, len(bool_base_hours.index))
b = np.tile(bool_base_hours.index, len(bool_base_hours.columns))

# Keep demand only where the travel time is below largeval; all other O-D pairs are set to 0
base_cumtripcount = (df_base_trips.where(bool_base_hours, other=0))
base_cumtime = (base_cumtripcount*df_base_hours)/60
base_cumdist = (base_cumtripcount*df_base_miles)
c1 = base_cumtripcount.values.ravel()
c2 = base_cumtime.values.ravel()
c3 = base_cumdist.values.ravel()
base_df = pd.DataFrame({'from':a, 'to':b, 'trips':c1, 'hours':c2, 'miles':c3})
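
The conversion from O-D matrices to a tall table can be seen on a tiny two-zone example. This is a sketch under stated assumptions: the matrices and the `LARGEVAL` sentinel are made up, and the skim times are taken to be in minutes, as the division by 60 above implies.

```python
import numpy as np
import pandas as pd

LARGEVAL = 99999  # hypothetical sentinel marking an unreachable O-D pair

# Made-up 2x2 matrices: rows are origins, columns are destinations
trips = pd.DataFrame([[0.0, 10.0], [4.0, 0.0]])
minutes = pd.DataFrame([[0.0, 30.0], [float(LARGEVAL), 0.0]])
miles = pd.DataFrame([[0.0, 12.0], [50.0, 0.0]])

# Zero out demand for pairs whose travel time is at or above the sentinel
reachable = minutes < LARGEVAL
kept = trips.where(reachable, other=0)

# ravel() walks the matrix row by row, so origins repeat and destinations cycle
n = len(trips.index)
tall = pd.DataFrame({
    'from': np.repeat(trips.index.to_numpy(), n),
    'to': np.tile(trips.columns.to_numpy(), n),
    'trips': kept.values.ravel(),
    'hours': (kept * minutes / 60).values.ravel(),
    'miles': (kept * miles).values.ravel(),
})
print(tall)
```

The pair from zone 1 to zone 0 carries 4 trips in the demand matrix but contributes nothing to the totals, because its travel time equals the sentinel.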

In [None]:
# Read the disrupt skims OMX - no resilience project
disrupt_noresil_hours, matrix_size, disrupt_noresil_skims_omx_file = readOMX(disruptskims_noresil_folder, hours_name, 0)
df_disrupt_noresil_hours = pd.DataFrame(data=disrupt_noresil_hours)
# Reuse the already-open skims file so that no second, unclosed file handle is created
disrupt_noresil_miles = disrupt_noresil_skims_omx_file[miles_name]
df_disrupt_noresil_miles = pd.DataFrame(data=disrupt_noresil_miles)

In [None]:
# Disrupt times and distances for no resilience project by origin TAZ
# Convert O-D matrix to tall table indexed by origin and destination TAZ
bool_disrupt_noresil_hours = df_disrupt_noresil_hours < largeval
a = np.repeat(bool_disrupt_noresil_hours.columns, len(bool_disrupt_noresil_hours.index))
b = np.tile(bool_disrupt_noresil_hours.index, len(bool_disrupt_noresil_hours.columns))

# Keep demand only where the travel time is below largeval; all other O-D pairs are set to 0
disrupt_noresil_cumtripcount = (df_NOresil_trips.where(bool_disrupt_noresil_hours, other=0))
disrupt_noresil_cumtime = (disrupt_noresil_cumtripcount*df_disrupt_noresil_hours)/60
disrupt_noresil_cumdist = (disrupt_noresil_cumtripcount*df_disrupt_noresil_miles)
c1 = disrupt_noresil_cumtripcount.values.ravel()
c2 = disrupt_noresil_cumtime.values.ravel()
c3 = disrupt_noresil_cumdist.values.ravel()
disrupt_noresil_df = pd.DataFrame({'from':a, 'to':b, 'trips':c1, 'hours':c2, 'miles':c3})

In [None]:
# Read the disrupt skims OMX - with resilience project
disrupt_resil_hours, matrix_size, disrupt_resil_skims_omx_file = readOMX(disruptskims_resil_folder, hours_name, 0)
df_disrupt_resil_hours = pd.DataFrame(data=disrupt_resil_hours)
# Reuse the already-open skims file so that no second, unclosed file handle is created
disrupt_resil_miles = disrupt_resil_skims_omx_file[miles_name]
df_disrupt_resil_miles = pd.DataFrame(data=disrupt_resil_miles)

In [None]:
# Disrupt times and distances for resilience project by origin TAZ
# Convert O-D matrix to tall table indexed by origin and destination TAZ
bool_disrupt_resil_hours = df_disrupt_resil_hours < largeval
a = np.repeat(bool_disrupt_resil_hours.columns, len(bool_disrupt_resil_hours.index))
b = np.tile(bool_disrupt_resil_hours.index, len(bool_disrupt_resil_hours.columns))

# Keep demand only where the travel time is below largeval; all other O-D pairs are set to 0
disrupt_resil_cumtripcount = (df_resil_trips.where(bool_disrupt_resil_hours, other=0))
disrupt_resil_cumtime = (disrupt_resil_cumtripcount*df_disrupt_resil_hours)/60
disrupt_resil_cumdist = (disrupt_resil_cumtripcount*df_disrupt_resil_miles)
c1 = disrupt_resil_cumtripcount.values.ravel()
c2 = disrupt_resil_cumtime.values.ravel()
c3 = disrupt_resil_cumdist.values.ravel()
disrupt_resil_df = pd.DataFrame({'from':a, 'to':b, 'trips':c1, 'hours':c2, 'miles':c3})

In [None]:
# Create data frame of skim results
merged_df = pd.merge(base_df, disrupt_noresil_df, how='inner', on=['from', 'to'], suffixes=("_base", None))
taz_pair_skims = pd.merge(merged_df, disrupt_resil_df, how='inner', on=['from', 'to'], suffixes=("_disrupt_noresil", "_disrupt_resil"))
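
The two-step merge relies on the `suffixes` argument to produce the `_base`, `_disrupt_noresil`, and `_disrupt_resil` column names used later. A minimal sketch with one made-up O-D pair and a single metric shows how the names arise:

```python
import pandas as pd

# One made-up O-D pair, trips only, for each of the three cases
base = pd.DataFrame({'from': [0], 'to': [1], 'trips': [10.0]})
noresil = pd.DataFrame({'from': [0], 'to': [1], 'trips': [8.0]})
resil = pd.DataFrame({'from': [0], 'to': [1], 'trips': [9.0]})

# Step 1: the left column gets '_base'; None leaves the right column as 'trips'
merged = pd.merge(base, noresil, how='inner', on=['from', 'to'],
                  suffixes=('_base', None))

# Step 2: the remaining 'trips' (no resilience) collides with the resilience
# 'trips', so each receives its own suffix
pairs = pd.merge(merged, resil, how='inner', on=['from', 'to'],
                 suffixes=('_disrupt_noresil', '_disrupt_resil'))
print(list(pairs.columns))
```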

In [None]:
# Read in equity category label by TAZ
taz_equity = pd.read_csv(os.path.join(equity_dir, category_filename + '.csv'),
                         usecols=['TAZ', category_name],
                         converters={'TAZ': int, category_name: float})

In [None]:
# Join by from TAZ and to TAZ
taz_stats = taz_pair_skims.merge(taz_equity, how='left', left_on='from', right_on='TAZ').merge(taz_equity, how='left', left_on='to', right_on='TAZ', suffixes=('_from', '_to'))

# Add label to show the highest value for equity indicator for each TAZ pair, regardless of whether that value is for the origin or destination.
# Assign the equity indicator value from the origin or destination (whichever is higher).
# If both origin and destination have a value that is not a number (nan), assign 0. For example, 
# this would be true if the origin and destination are both nodes that do not exist in the TAZ file (external).
# This approach relies on the user providing equity data that are numeric, and it assumes that the data are ordinal.
conditiona = (pd.isna(taz_stats[category_name + '_from'])) & (pd.isna(taz_stats[category_name + '_to']))
conditionb = (pd.isna(taz_stats[category_name + '_to']))
conditionc = (pd.isna(taz_stats[category_name + '_from']))
conditiond = (taz_stats[category_name + '_from'] >= taz_stats[category_name + '_to'])
conditione = (taz_stats[category_name + '_from'] < taz_stats[category_name + '_to'])
taz_stats[category_name + '_high_value'] = (
    np.select(
        condlist=[conditiona,conditionb,conditionc,conditiond,conditione],
        choicelist=[float(0),
                    taz_stats[category_name + '_from'].astype(float),
                    taz_stats[category_name + '_to'].astype(float),
                    taz_stats[category_name + '_from'].astype(float),
                    taz_stats[category_name + '_to'].astype(float)],
        default=0))

# Replace NaN values with 'external'. These are for nodes which do not exist in the TAZ file, and therefore do not have any equity attributes. 
# They are nodes which are outside the MPO boundaries and are needed for travel demand modeling purposes, but do not have shapes associated with them. 
# They are not omitted because the totals for hours, miles, and trips should be the same at the MPO level as what is reported to users.
taz_stats[['TAZ_from', category_name + '_from','TAZ_to', category_name + '_to']] = taz_stats[['TAZ_from', category_name + '_from','TAZ_to', category_name + '_to']].fillna('external')

# Add 'O-D_categorylabel' to taz_stats data frame to compile the equity category of the "to" and "from" TAZ in one attribute
taz_stats['O-D_categorylabel'] = taz_stats[category_name + '_from'].astype(str) + " to " + taz_stats[category_name + '_to'].astype(str)

# Calculate the absolute change in trips/hours/miles for each TAZ pair ("resilience" case minus "no resilience" case)
taz_stats['trips_delta'] = (taz_stats['trips_disrupt_resil'] - taz_stats['trips_disrupt_noresil'])
taz_stats['hours_delta'] = (taz_stats['hours_disrupt_resil'] - taz_stats['hours_disrupt_noresil'])
taz_stats['miles_delta'] = (taz_stats['miles_disrupt_resil'] - taz_stats['miles_disrupt_noresil'])

In [None]:
# Create three variables to flag whether the disruption is relevant for the TAZ pair (for trips/miles/hours)
taz_stats['trips_disruption_relevant'] = taz_stats['trips_base'] != taz_stats['trips_disrupt_noresil']
taz_stats['hours_disruption_relevant'] = taz_stats['hours_base'] != taz_stats['hours_disrupt_noresil']
taz_stats['miles_disruption_relevant'] = taz_stats['miles_base'] != taz_stats['miles_disrupt_noresil']

# Create three variables to capture the nature of impact for each TAZ pair (for trips/miles/hours), 
# with and without resilience investment

# for trips
condition1 = (taz_stats['trips_disrupt_noresil'] == taz_stats['trips_base'])
condition2 = (taz_stats['trips_disrupt_resil'] == taz_stats['trips_disrupt_noresil'])
condition3 = taz_stats['trips_disrupt_noresil'].notna() & taz_stats['trips_disrupt_resil'].notna()
taz_stats['trips_delta_category'] = (
    np.select(
        condlist=[condition1,condition2,condition3], 
        choicelist=["no_change",
                   "same_change",
                   "different_change"], 
        default="null_value"))

# for miles
condition1 = (taz_stats['miles_disrupt_noresil'] == taz_stats['miles_base'])
condition2 = (taz_stats['miles_disrupt_resil'] == taz_stats['miles_disrupt_noresil'])
condition3 = taz_stats['miles_disrupt_noresil'].notna() & taz_stats['miles_disrupt_resil'].notna()
taz_stats['miles_delta_category'] = (
    np.select(
        condlist=[condition1,condition2,condition3], 
        choicelist=["no_change",
                   "same_change",
                   "different_change"], 
        default="null_value"))

# for hours
condition1 = (taz_stats['hours_disrupt_noresil'] == taz_stats['hours_base'])
condition2 = (taz_stats['hours_disrupt_resil'] == taz_stats['hours_disrupt_noresil'])
condition3 = taz_stats['hours_disrupt_noresil'].notna() & taz_stats['hours_disrupt_resil'].notna()
taz_stats['hours_delta_category'] = (
    np.select(
        condlist=[condition1,condition2,condition3], 
        choicelist=["no_change",
                   "same_change",
                   "different_change"], 
        default="null_value"))
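
The classification depends on `np.select` returning the choice for the first condition that is true, so the order of the conditions matters. The sketch below uses made-up values and checks for missing data with `notna()`, which is an assumption about the intent of the third condition.

```python
import numpy as np
import pandas as pd

# Made-up values for four TAZ pairs
df = pd.DataFrame({
    'base':    [10.0, 10.0, 10.0, 10.0],
    'noresil': [10.0, 8.0, 8.0, np.nan],
    'resil':   [10.0, 8.0, 9.0, 9.0],
})

conditions = [
    df['noresil'] == df['base'],                 # disruption had no effect
    df['resil'] == df['noresil'],                # project did not change the effect
    df['noresil'].notna() & df['resil'].notna(), # both cases have data
]
labels = ["no_change", "same_change", "different_change"]

# The first matching condition wins; rows matching none get the default
category = np.select(conditions, labels, default="null_value").tolist()
print(category)
```

The last row has a missing "no resilience" value, so every comparison involving it is false and it falls through to the default label.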

In [None]:
# Create a function to produce a summary table given a metric of interest (i.e., trips, miles, or hours) 
# and an index of interest (i.e., some grouping of TAZ pairs based on equity category of origin/destination).

# First create helper function for use in the main function
def countnonzeros(x):
    return x.astype(bool).sum(axis=0)

# Now create main function
def createsummary(index,metric):
    # Aggregate the base metric for each equity category (Variable 1T, 1H, or 1M, depending on the metric)
    category_base = pd.pivot_table(taz_stats, index=index, values=taz_stats.columns.to_list(),
                                    aggfunc={metric+'_base':np.sum}, 
                                    fill_value=0)
    # Create variables 2aT/2aH/2aM: Percent change from baseline trips/hours/miles due to disruption (without resilience investment) and merge with the prior into one dataframe
    Q_TwoA = pd.pivot_table(taz_stats, index=index, values=taz_stats.columns.to_list(),
                                    aggfunc={metric+'_disrupt_noresil':np.sum}, 
                                    fill_value=0)
    category_stats = pd.merge(category_base,Q_TwoA,on=None,left_index=True, right_index=True)
    category_stats[metric+'_percent_change_noresil'] = ((category_stats[metric+'_disrupt_noresil'] - category_stats[metric+'_base'])*100)/category_stats[metric+'_base']

    # Create variables 2bT/2bH/2bM: Percent of TAZ pairs with a change in trips/hours/miles due to disruption (without resilience investment)
    Q_TwoB = pd.pivot_table(taz_stats, index=index, values=taz_stats.columns.to_list(),
                                    aggfunc={metric+'_disruption_relevant':[countnonzeros,len]}, 
                                    fill_value=0)
    # Flatten multi index hierarchy in column headers
    Q_TwoB = pd.DataFrame(Q_TwoB.to_records())
    Q_TwoB[metric+'_percent_pairs_relevant'] = (Q_TwoB["('"+metric+"_disruption_relevant', 'countnonzeros')"]*100)/Q_TwoB["('"+metric+"_disruption_relevant', 'len')"]
    # Merge into category_stats dataframe
    category_stats = pd.merge(category_stats,Q_TwoB,on=index)

    # Create variables 3aT/3aH/3aM: Overall impact of resilience investment in trips/hours/miles (i.e., trips/hours/miles in the "resilience" case minus trips/hours/miles in the "no resilience" case)
    Q_ThreeA = pd.pivot_table(taz_stats, index=index, values=taz_stats.columns.to_list(),
                                    aggfunc={metric+'_disrupt_resil':np.sum}, 
                                    fill_value=0)
    category_stats = pd.merge(category_stats,Q_ThreeA,on=index)
    category_stats[metric+"_delta_absolute"] = category_stats[metric+'_disrupt_resil'] - category_stats[metric+'_disrupt_noresil']
    # Create variables 3bT/3bH/3bM: Same as the above set, except divided by the trips/hours/miles in the "no resilience" case and multiplied by 100 to show percent change relative to "no resilience" case
    category_stats[metric+'_delta_relative'] = (category_stats[metric+'_delta_absolute']*100)/category_stats[metric+'_disrupt_noresil']

    # Create variables 3cT/3cH/3cM: Average difference in trips/hours/miles due to resilience investment for all relevant TAZ pairs (i.e., among the subset of TAZ pairs where there was a disruption impact in the "no resilience" case)
    # First filter for relevant TAZ pairs
    relevant_filter = taz_stats[metric+'_disrupt_noresil'] != taz_stats[metric+'_base'] 
    relevant_set = taz_stats[relevant_filter]
    Q_ThreeC = pd.pivot_table(relevant_set, index=index, values=relevant_set.columns.to_list(),
                                    aggfunc={metric+'_delta':np.mean}, 
                                    fill_value=0)
    Q_ThreeC.rename(columns = {metric+'_delta':metric+'_mean_delta_for_relevant_pairs'}, inplace = True)
    if metric+'_mean_delta_for_relevant_pairs' in Q_ThreeC.columns.to_list():
        category_stats = pd.merge(category_stats,Q_ThreeC,on=index)

    # Create variables 3dT/3dH/3dM: Average difference in trips/hours/miles due to resilience investment for all TAZ pairs with non-zero delta due to resilience (i.e., among the even smaller subset of TAZ pairs where the "resilience" case was different from the "no resilience" case)
    nonzerodelta_filter = taz_stats[metric+'_disrupt_noresil'] != taz_stats[metric+'_disrupt_resil']
    nonzerodelta_set = taz_stats[nonzerodelta_filter]
    Q_ThreeD = pd.pivot_table(nonzerodelta_set, index=index, values=nonzerodelta_set.columns.to_list(),
                                aggfunc={metric+'_delta':np.mean}, 
                                fill_value=0)
    Q_ThreeD.rename(columns = {metric+'_delta':metric+'_mean_delta_for_pairs_with_non-zero_delta'}, inplace = True)
    if metric+'_mean_delta_for_pairs_with_non-zero_delta' in Q_ThreeD.columns.to_list():
        category_stats = pd.merge(category_stats,Q_ThreeD,on=index)

    # Subset the columns of interest
    category_stats = category_stats.filter([index,
                                            metric+'_base', 
                                            metric+'_percent_change_noresil', 
                                            metric+'_percent_pairs_relevant',
                                            metric+'_delta_absolute', 
                                            metric+'_delta_relative',
                                            metric+'_mean_delta_for_relevant_pairs', 
                                            metric+'_mean_delta_for_pairs_with_non-zero_delta'],
                                       axis=1)
    # Convert the equity category to string for better rendering in the charts that follow.
    category_stats[index] = category_stats[index].astype(str)
    return category_stats
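
The difference between variables 2b, 3c, and 3d is easiest to see on a small table. The sketch below uses `groupby` rather than the pivot tables in `createsummary`, and its column names and values are made up for illustration.

```python
import pandas as pd

# Made-up TAZ pairs: equity category, disruption relevance flag, resilience delta
pairs = pd.DataFrame({
    'category': [0, 0, 1, 1, 1],
    'relevant': [True, False, True, True, False],
    'delta':    [2.0, 0.0, 4.0, 0.0, 0.0],
})

# 2b: percent of pairs in each category affected by the disruption
percent_relevant = pairs.groupby('category')['relevant'].mean() * 100

# 3c: mean delta among pairs affected by the disruption
mean_delta_relevant = pairs[pairs['relevant']].groupby('category')['delta'].mean()

# 3d: mean delta among pairs where the project actually changed the outcome
mean_delta_nonzero = pairs[pairs['delta'] != 0].groupby('category')['delta'].mean()

print(percent_relevant, mean_delta_relevant, mean_delta_nonzero, sep='\n')
```

For category 1, variable 3c averages over two relevant pairs (one with no change), while 3d averages only over the single pair with a non-zero delta, so 3d is larger in magnitude.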

### Summary of Indicators for Trips by Equity Category

In [None]:
# Produce distilled output table for trips
trips_summary_distilled = createsummary(category_name + '_high_value','trips')
trips_summary_distilled

### Summary of Indicators for Hours by Equity Category

In [None]:
hours_summary_distilled = createsummary(category_name + '_high_value','hours')
hours_summary_distilled

### Summary of Indicators for Miles by Equity Category

In [None]:
miles_summary_distilled = createsummary(category_name + '_high_value','miles')
miles_summary_distilled

### Question 1: What is the baseline magnitude of trips/hours/miles for each category?
- Variables 1T/1H/1M: Overall sum of trips/hours/miles for each category absent disruption
    - `trips_base`
    - `hours_base`
    - `miles_base`

In [None]:
# Create function to generate bar charts for each question of interest
def makebarcharts(variabletype,axislabel,title):
    if "trips_"+variabletype in trips_summary_distilled.columns.to_list():
        if (trips_summary_distilled['trips_'+variabletype] == 0).all() and (hours_summary_distilled['hours_'+variabletype] == 0).all() and (miles_summary_distilled['miles_'+variabletype] == 0).all():
            print("The '___{}' variable was zero for all categories for trips, hours, and miles, so no chart was produced.".format(variabletype))
        else:
            fig = make_subplots(rows=1, cols=3)
            fig.add_trace(go.Bar(
                x= trips_summary_distilled["trips_"+variabletype],
                y= trips_summary_distilled[category_name + '_high_value'],
                orientation='h'),
                row=1, col=1)
            fig.add_trace(go.Bar(
                x= hours_summary_distilled["hours_"+variabletype],
                y= hours_summary_distilled[category_name + '_high_value'],
                orientation='h'),
                row=1, col=2)
            fig.add_trace(go.Bar(
                x= miles_summary_distilled["miles_"+variabletype],
                y= miles_summary_distilled[category_name + '_high_value'],
                orientation='h'),
                row=1, col=3)

            # Update xaxis properties
            fig.update_xaxes(title_text=axislabel+"Trips", row=1, col=1)
            fig.update_xaxes(title_text=axislabel+"Hours", row=1, col=2)
            fig.update_xaxes(title_text=axislabel+"Miles", row=1, col=3)

            # Update yaxis properties
            fig.update_yaxes(title_text="TAZ Pairs Grouped by their Highest Equity Indicator Value (in Origin or Destination)", row=1, col=1)

            # Update title and height
            fig.update_layout(title_text=title, height=700, showlegend=False)

            fig.show()

In [None]:
variabletype = "base"
axislabel = ""
title = "Baseline Magnitude of Trips/Hours/Miles for Each Equity Indicator Category, Absent Disruption"
makebarcharts(variabletype=variabletype,axislabel=axislabel,title=title)

### Question 2: How relevant is the disruption for each category?
- Variables 2aT/2aH/2aM: Percent change from baseline trips/hours/miles due to disruption (without resilience investment)
    - `trips_percent_change_noresil`
    - `hours_percent_change_noresil`
    - `miles_percent_change_noresil` 
- Variables 2bT/2bH/2bM: Percent of TAZ pairs with a change in trips/hours/miles due to disruption (without resilience investment)
    - `trips_percent_pairs_relevant`
    - `hours_percent_pairs_relevant`
    - `miles_percent_pairs_relevant`

In [None]:
variabletype = "percent_change_noresil"
axislabel = "Percent Change in "
title = "Percent Change from Baseline Due to Disruption (without Resilience)"
makebarcharts(variabletype=variabletype,axislabel=axislabel,title=title)

In [None]:
variabletype = "percent_pairs_relevant"
axislabel = "Percent of Pairs: "
title = "Percent of TAZ Pairs with Potential Impacts from Disruption"
makebarcharts(variabletype=variabletype,axislabel=axislabel,title=title)

### Question 3: What is the projected impact of the resilience investment for each equity category?
- Variables 3aT/3aH/3aM: Overall impact of resilience investment in trips/hours/miles (i.e., trips/hours/miles in the "resilience" case minus trips/hours/miles in the "no resilience" case)
    - `trips_delta_absolute`
    - `hours_delta_absolute`
    - `miles_delta_absolute`
- Variables 3bT/3bH/3bM: Same as the above set, except divided by the trips/hours/miles in the "no resilience" case and multiplied by 100 to show percent change relative to "no resilience" case
    - `trips_delta_relative`
    - `hours_delta_relative`
    - `miles_delta_relative`
- Variables 3cT/3cH/3cM: Average difference in trips/hours/miles due to resilience investment for all __relevant__ TAZ pairs (i.e., among the subset of TAZ pairs where there was a disruption impact in the "no resilience" case)
    - `trips_mean_delta_for_relevant_pairs`
    - `hours_mean_delta_for_relevant_pairs`
    - `miles_mean_delta_for_relevant_pairs`
- Variables 3dT/3dH/3dM: Average difference in trips/hours/miles due to resilience investment for all TAZ pairs __with non-zero delta due to resilience__ (i.e., among the even smaller subset of TAZ pairs where the "resilience" case was different from the "no resilience" case)
    - `trips_mean_delta_for_pairs_with_non-zero_delta`
    - `hours_mean_delta_for_pairs_with_non-zero_delta`
    - `miles_mean_delta_for_pairs_with_non-zero_delta`

In [None]:
variabletype = "delta_absolute"
axislabel = "Change in "
title = 'Overall Impact of Resilience Investment as Compared to the "No Resilience" Case, for All TAZ Pairs'
makebarcharts(variabletype=variabletype,axislabel=axislabel,title=title)

In [None]:
variabletype = "delta_relative"
axislabel = "Percent Change in "
title = 'Relative Impact of Resilience Investment as Compared to the "No Resilience" Case, for All TAZ Pairs'
makebarcharts(variabletype=variabletype,axislabel=axislabel,title=title)

In [None]:
variabletype = "mean_delta_for_relevant_pairs"
axislabel = "Average Change in "
title = 'Average Difference Due to Resilience Investment for Relevant TAZ Pairs (Relevant = Those with Disruption Impact in the "No Resilience" Case)'
makebarcharts(variabletype=variabletype,axislabel=axislabel,title=title)

In [None]:
variabletype = "mean_delta_for_pairs_with_non-zero_delta"
axislabel = "Average Change in "
title = 'Average Difference Due to Resilience Investment Among the Smaller Subset of TAZ Pairs Where the "Resilience" Case Was Different from the "No Resilience" Case'
makebarcharts(variabletype=variabletype,axislabel=axislabel,title=title)

In [None]:
# Conversion to HTML has moved to TAZ_metrics.py
# !jupyter nbconvert MetricsByTAZ.ipynb --to html --no-input

In [None]:
base_trip_omx_file.close()
newdisruptresil_trip_omx_file.close()
newdisruptNOresil_trip_omx_file.close()

base_skims_omx_file.close()
disrupt_noresil_skims_omx_file.close()
disrupt_resil_skims_omx_file.close()