# Equity Analysis at the TAZ Level
This Jupyter notebook takes AequilibraE runs (with and without resilience investment) and outputs an HTML file that reports changes in metrics by equity category. 

The default assumption is that the user will run the equity overlay analysis (`run_equity_overlay.bat` file in C:\GitHub\RDR\helper_tools\equity_analysis) as a first step, and then use the output from that as an input to this TAZ metrics analysis. However, the user may also directly provide data in a CSV file assigning an equity variable value to each TAZ from another source, rather than running the equity overlay analysis. If providing other data, the equity data must be numeric. Consult the RDR User Guide in C:\GitHub\RDR\documentation for more information on how to use and understand this tool.

The purpose is to help the user examine and understand differential impacts of a resilience investment intended to mitigate effects of a disruption, comparing different equity categories of interest. The analysis displays variables to help illuminate the following questions from various angles.

Questions driving this analysis include:
- What is the baseline magnitude of trips, minutes per trip, and miles per trip for each equity category?
- How relevant is the disruption for each equity category?
- What is the projected impact of the resilience investment overall and for each equity category, i.e., are the benefits equitably distributed?

The `equity_metrics.config` configuration file allows the user to specify the following:
- `path_to_RDR_config_file` – This should identify the location of the configuration file pertinent to the existing RDR Metamodel run and corresponding AequilibraE runs that will be used for this equity analysis. The analysis will use this configuration file to identify where to access the OMX files from those runs.
- `resil` – Resilience project to evaluate.
- `baseline` – Project that represents the "no resilience" case, against which `resil` is compared.
- `hazard`, `recovery`, `socio`, `projgroup`, `elasticity`, `run_type` – AequilibraE scenario dimensions.
- *Note*: As described above, the default assumption is that the user will run the equity overlay analysis first and then use its output as an input to this TAZ metrics analysis. If the user will instead directly provide the equity data, the user should update the `equity_metrics.config` file (or the renamed config file referenced in `run_TAZ_metrics.bat`, if applicable) to specify the name of the user-provided file in `output_name` (without the CSV file extension). The equity data must be numeric.
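
If you provide your own equity file instead of running the overlay, the notebook reads two columns from `<output_name>.csv` in `equity_analysis_dir`: `TAZ` (converted to integer) and the column named by the `equity_feature` config parameter (converted to float). The sketch below is one way to pre-check such a file with pandas before running the notebook. The helper `check_equity_csv` and the column name `pct_low_income` are hypothetical, and the duplicate-TAZ check is an extra assumption rather than something the notebook enforces.

```python
import io

import pandas as pd


def check_equity_csv(source, equity_feature):
    """Hypothetical pre-check: return a TAZ/equity table or raise ValueError.

    Mirrors the two columns the notebook reads: 'TAZ' and the equity_feature column.
    """
    df = pd.read_csv(source)
    missing = {"TAZ", equity_feature} - set(df.columns)
    if missing:
        raise ValueError(f"Missing required column(s): {sorted(missing)}")
    # Coerce to numbers; non-numeric or empty cells become NaN so they can be reported
    values = pd.to_numeric(df[equity_feature], errors="coerce")
    bad = df.loc[values.isna(), "TAZ"].tolist()
    if bad:
        raise ValueError(f"Non-numeric or empty {equity_feature} for TAZ {bad}")
    # Assumption: each TAZ should carry exactly one equity value
    if df["TAZ"].duplicated().any():
        raise ValueError("Each TAZ should appear only once")
    return pd.DataFrame({"TAZ": df["TAZ"].astype(int), equity_feature: values.astype(float)})


# Hypothetical example file with an equity column called pct_low_income
sample = io.StringIO("TAZ,pct_low_income\n1,12.5\n2,40\n3,7.25\n")
checked = check_equity_csv(sample, "pct_low_income")
print(checked)
```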
    
## Check output directory for CSV file outputs with underlying data results. 
This is in the same location as this HTML file, and is the directory specified in the `equity_analysis_dir` parameter in the equity metrics config file.

## Scroll down in this HTML file for charts and statistical analysis.

In [None]:
# Import statements
import os
import sys
import numpy as np
import pandas as pd
import openmatrix as omx
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy.stats import linregress

import equity_config_reader

sys.path.insert(0, os.path.abspath('../../metamodel_py'))
import rdr_setup

In [None]:
# The equity configuration filepath is now passed into this notebook from its parent, TAZ_metrics.py, via the temporary text file,
# assuming this notebook and TAZ_metrics.py are both in the same folder.
# To run the notebook in isolation, rather than by executing the run_TAZ_metrics.bat file, comment out the below three lines and 
# uncomment the subsequent two lines.
with open('temp.txt', 'r') as f:
    config_filepath = f.read()
equity_cfg = equity_config_reader.read_equity_config_file(config_filepath)

#config_filepath = r"C:\GitHub\RDR\helper_tools\equity_analysis\equity_metrics.config"
#equity_cfg = equity_config_reader.read_equity_config_file(config_filepath)

rdr_cfg_path = equity_cfg['path_to_RDR_config_file']

# Directory of equity helper tool files
equity_dir = equity_cfg['equity_analysis_dir']

cfg = rdr_setup.read_config_file(rdr_cfg_path)

RDR_run_id = cfg['run_id']

# Name of equity variable
category_name = equity_cfg['equity_feature']

# Name of CSV file with equity category value for each TAZ (either output from run_equity_overlay.bat OR user-provided)
category_filename = equity_cfg['output_name']

# P-value for use in statistical tests
pval = float(equity_cfg['pval'])

# Stop early if the equity overlay output (or the user-provided equity file) is missing
if not os.path.exists(os.path.join(equity_dir, category_filename + '.csv')):
    raise FileNotFoundError('{}.csv not found in {}. Please run the equity overlay first, or generate your own file and specify its name (without the .csv extension) as output_name in the equity_metrics.config file'.format(category_filename, equity_dir))

In [None]:
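# Utility method for reading OMX files
# Reads the selected matrix fully into memory (as a NumPy array) and closes the file before returning,
# so every file handle is released even when the same file is read more than once
def readOMX(filename, selectedMatrix, debug_mode):
    with omx.open_file(filename, 'r') as f:
        matrix_size = f.shape()
        if debug_mode:
            print('Shape: ', f.shape())
            print('Number of tables: ', len(f))
            print('Table names: ', f.list_matrices())
            print('Attributes: ', f.list_all_attributes())
        omx_df = f[selectedMatrix][:]
    if debug_mode:
        print('Sum of matrix elements: ', '{:.9}'.format(np.sum(omx_df)))
        print('Percentiles: ', np.percentile(omx_df, (1, 10, 30, 50, 70, 90, 99)))
        print('Maximum: ', np.amax(omx_df))
    # The returned file object is already closed; calling close() on it again is harmless
    return omx_df, matrix_size, f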

In [None]:
# Define inputs for comparison - these come from equity_cfg, the equity_metrics.config file
resil = equity_cfg['resil']
baseline = equity_cfg['baseline']
hazard = equity_cfg['hazard']
recovery = equity_cfg['recovery']
socio = equity_cfg['socio']
projgroup = equity_cfg['projgroup']
elasticity = equity_cfg['elasticity']
elasname = str(int(10 * -elasticity))
run_type = equity_cfg['run_type']
largeval = float(equity_cfg['largeval'])

hours_name = 'free_flow_time'
miles_name = 'distance'

In [None]:
# Location of the "matrix" OMX files for "base"
matrix_omx_folder_path_base = os.path.join(equity_dir, "aeq_runs", "base", RDR_run_id,
                                           socio + projgroup, "matrix", "matrices")

# Location of the "nocar" OMX files for "base"
nocar_omx_folder_path_base = os.path.join(equity_dir, "aeq_runs", "base", RDR_run_id,
                                          socio + projgroup, "nocar", "matrices")

# Location of the "matrix" OMX files for "disruption with resilience investment"
matrix_omx_folder_path = os.path.join(equity_dir, "aeq_runs", "disrupt", RDR_run_id,
                             socio + projgroup + '_' + resil + '_' + elasname + '_' + hazard + '_' + recovery,
                             "matrix", "matrices")

# Location of the "matrix" OMX files for "disruption WITHOUT resilience investment"
matrix_omx_folder_path_noresil = os.path.join(equity_dir, "aeq_runs", "disrupt", RDR_run_id,
                                     socio + projgroup + '_' + baseline + '_' + elasname + '_' + hazard + '_' + recovery,
                                     "matrix", "matrices")

# Location of the "nocar" OMX files for "disruption with resilience investment"
nocar_omx_folder_path = os.path.join(equity_dir, "aeq_runs", "disrupt", RDR_run_id,
                             socio + projgroup + '_' + resil + '_' + elasname + '_' + hazard + '_' + recovery,
                             "nocar", "matrices")

# Location of the "nocar" OMX files for "disruption WITHOUT resilience investment"
nocar_omx_folder_path_noresil = os.path.join(equity_dir, "aeq_runs", "disrupt", RDR_run_id,
                                     socio + projgroup + '_' + baseline + '_' + elasname + '_' + hazard + '_' + recovery,
                                     "nocar", "matrices")

# READING THE TABLES FOR "MATRIX"

# Read the base OMX trip table
matrix_base_matrix_filename = os.path.join(matrix_omx_folder_path_base, 'base_demand_summed.omx')
matrix_base_dem, matrix_base_trips_matrix_size, matrix_base_trip_omx_file = readOMX(matrix_base_matrix_filename, 'matrix', 0)
df_matrix_base_trips = pd.DataFrame(data=matrix_base_dem)

# Read the new OMX trip table in the disruption with resilience case
matrix_newdisruptresil_matrix_filename = os.path.join(matrix_omx_folder_path, 'new_demand_summed.omx')
matrix_newdisruptresil_dem, matrix_resil_trips_matrix_size, matrix_newdisruptresil_trip_omx_file = readOMX(matrix_newdisruptresil_matrix_filename, 'matrix', 0)
df_matrix_resil_trips = pd.DataFrame(data=matrix_newdisruptresil_dem)

# Read the new OMX trip table in the disruption WITHOUT resilience case
matrix_newdisruptNOresil_matrix_filename = os.path.join(matrix_omx_folder_path_noresil, 'new_demand_summed.omx')
matrix_newdisruptNOresil_dem, matrix_noresil_trips_matrix_size, matrix_newdisruptNOresil_trip_omx_file = readOMX(matrix_newdisruptNOresil_matrix_filename, 'matrix', 0)
df_matrix_NOresil_trips = pd.DataFrame(data=matrix_newdisruptNOresil_dem)

# READING THE TABLES FOR "NOCAR," if applicable
if os.path.exists(nocar_omx_folder_path):

    # Read the base OMX trip table
    nocar_base_matrix_filename = os.path.join(nocar_omx_folder_path_base, 'base_demand_summed.omx')
    nocar_base_dem, nocar_base_trips_matrix_size, nocar_base_trip_omx_file = readOMX(nocar_base_matrix_filename, 'nocar', 0)
    df_nocar_base_trips = pd.DataFrame(data=nocar_base_dem)

    # Read the new OMX trip table in the disruption with resilience case
    nocar_newdisruptresil_matrix_filename = os.path.join(nocar_omx_folder_path, 'new_demand_summed.omx')
    nocar_newdisruptresil_dem, nocar_resil_trips_matrix_size, nocar_newdisruptresil_trip_omx_file = readOMX(nocar_newdisruptresil_matrix_filename, 'matrix', 0)
    df_nocar_resil_trips = pd.DataFrame(data=nocar_newdisruptresil_dem)

    # Read the new OMX trip table in the disruption WITHOUT resilience case
    nocar_newdisruptNOresil_matrix_filename = os.path.join(nocar_omx_folder_path_noresil, 'new_demand_summed.omx')
    nocar_newdisruptNOresil_dem, nocar_noresil_trips_matrix_size, nocar_newdisruptNOresil_trip_omx_file = readOMX(nocar_newdisruptNOresil_matrix_filename, 'matrix', 0)
    df_nocar_NOresil_trips = pd.DataFrame(data=nocar_newdisruptNOresil_dem)

In [None]:
# Names of the skim files (without the .omx extension)
baseskims_filename = run_type + '_' + socio + projgroup
disruptskims_noresil_filename = run_type + '_disrupt_' + socio + projgroup + '_' + baseline + '_' + elasname + '_' + hazard + '_' + recovery
disruptskims_resil_filename = run_type + '_disrupt_' + socio + projgroup + '_' + resil + '_' + elasname + '_' + hazard + '_' + recovery

# Create filepath strings for "matrix" tables
matrix_baseskims_folder = os.path.join(matrix_omx_folder_path_base, baseskims_filename + '.omx')
matrix_disruptskims_noresil_folder = os.path.join(matrix_omx_folder_path_noresil, disruptskims_noresil_filename + '.omx')
matrix_disruptskims_resil_folder = os.path.join(matrix_omx_folder_path, disruptskims_resil_filename + '.omx')

# Create filepath strings for "nocar" tables
nocar_baseskims_folder = os.path.join(nocar_omx_folder_path_base, baseskims_filename + '.omx')
nocar_disruptskims_noresil_folder = os.path.join(nocar_omx_folder_path_noresil, disruptskims_noresil_filename + '.omx')
nocar_disruptskims_resil_folder = os.path.join(nocar_omx_folder_path, disruptskims_resil_filename + '.omx')

In [None]:
# READING THE BASE SKIMS FOR "MATRIX"
# Read the base skims OMX for "matrix"
matrix_base_hours, matrix_base_hours_matrix_size, base_skims_omx_file = readOMX(matrix_baseskims_folder, hours_name, 0)
df_matrix_base_hours = pd.DataFrame(data=matrix_base_hours)
matrix_base_miles, matrix_base_miles_matrix_size, base_skims_omx_file = readOMX(matrix_baseskims_folder, miles_name, 0)
df_matrix_base_miles = pd.DataFrame(data=matrix_base_miles)

# READING THE BASE SKIMS FOR "NOCAR," if applicable
if os.path.exists(nocar_omx_folder_path):
    # Read the base skims OMX for "nocar"
    nocar_base_hours, nocar_base_hours_matrix_size, base_skims_omx_file = readOMX(nocar_baseskims_folder, hours_name, 0)
    df_nocar_base_hours = pd.DataFrame(data=nocar_base_hours)
    nocar_base_miles, nocar_base_miles_matrix_size, base_skims_omx_file = readOMX(nocar_baseskims_folder, miles_name, 0)
    df_nocar_base_miles = pd.DataFrame(data=nocar_base_miles)

In [None]:
# READING THE DISRUPTION WITH NO RESILIENCE SKIMS FOR "MATRIX"
# Read the disrupt skims OMX - no resilience project
matrix_disrupt_noresil_hours, matrix_size, disrupt_noresil_skims_omx_file = readOMX(matrix_disruptskims_noresil_folder, hours_name, 0)
df_matrix_disrupt_noresil_hours = pd.DataFrame(data=matrix_disrupt_noresil_hours)
matrix_disrupt_noresil_miles, matrix_size, disrupt_noresil_skims_omx_file = readOMX(matrix_disruptskims_noresil_folder, miles_name, 0)
df_matrix_disrupt_noresil_miles = pd.DataFrame(data=matrix_disrupt_noresil_miles)

# READING THE DISRUPTION WITH NO RESILIENCE SKIMS FOR "NOCAR," if applicable
if os.path.exists(nocar_omx_folder_path):
    # Read the disrupt skims OMX - no resilience project
    nocar_disrupt_noresil_hours, matrix_size, disrupt_noresil_skims_omx_file = readOMX(nocar_disruptskims_noresil_folder, hours_name, 0)
    df_nocar_disrupt_noresil_hours = pd.DataFrame(data=nocar_disrupt_noresil_hours)
    nocar_disrupt_noresil_miles, matrix_size, disrupt_noresil_skims_omx_file = readOMX(nocar_disruptskims_noresil_folder, miles_name, 0)
    df_nocar_disrupt_noresil_miles = pd.DataFrame(data=nocar_disrupt_noresil_miles)

In [None]:
# READING THE DISRUPTION WITH RESILIENCE SKIMS FOR "MATRIX"
# Read the disrupt skims OMX - with resilience project
matrix_disrupt_resil_hours, matrix_size, disrupt_resil_skims_omx_file = readOMX(matrix_disruptskims_resil_folder, hours_name, 0)
df_matrix_disrupt_resil_hours = pd.DataFrame(data=matrix_disrupt_resil_hours)
matrix_disrupt_resil_miles, matrix_size, disrupt_resil_skims_omx_file = readOMX(matrix_disruptskims_resil_folder, miles_name, 0)
df_matrix_disrupt_resil_miles = pd.DataFrame(data=matrix_disrupt_resil_miles)

# READING THE DISRUPTION WITH RESILIENCE SKIMS FOR "NOCAR," if applicable
if os.path.exists(nocar_omx_folder_path):
    # Read the disrupt skims OMX - with resilience project
    nocar_disrupt_resil_hours, matrix_size, disrupt_resil_skims_omx_file = readOMX(nocar_disruptskims_resil_folder, hours_name, 0)
    df_nocar_disrupt_resil_hours = pd.DataFrame(data=nocar_disrupt_resil_hours)
    nocar_disrupt_resil_miles, matrix_size, disrupt_resil_skims_omx_file = readOMX(nocar_disruptskims_resil_folder, miles_name, 0)
    df_nocar_disrupt_resil_miles = pd.DataFrame(data=nocar_disrupt_resil_miles)

In [None]:
# Function to create dataframe based on skim results
# Converts O-D matrices (rows = origin TAZ, columns = destination TAZ) into a tall table with one row per TAZ pair
def makeskimresult_df(hours_df, trips_df, miles_df):
    # Flag O-D pairs whose travel time is below largeval (pairs at or above largeval are treated as unreachable)
    reachable = hours_df < largeval
    # Row-major flattening: each origin label repeats across its row, destination labels cycle through the columns
    origins = np.repeat(hours_df.index, len(hours_df.columns))
    destinations = np.tile(hours_df.columns, len(hours_df.index))

    # Zero out demand for unreachable pairs, then compute trip-weighted hours and miles
    cumtripcount = trips_df.where(reachable, other=0)
    cumtime = (cumtripcount * hours_df) / 60  # skim times are in minutes; divide by 60 for hours
    cumdist = cumtripcount * miles_df
    df = pd.DataFrame({'from': origins, 'to': destinations,
                       'trips': cumtripcount.values.ravel(),
                       'hours': cumtime.values.ravel(),
                       'miles': cumdist.values.ravel()})
    return df

In [None]:
# Make dataframes for "matrix"
matrix_base_df = makeskimresult_df(df_matrix_base_hours,df_matrix_base_trips,df_matrix_base_miles)
matrix_disrupt_noresil_df = makeskimresult_df(df_matrix_disrupt_noresil_hours,df_matrix_NOresil_trips,df_matrix_disrupt_noresil_miles)
matrix_disrupt_resil_df = makeskimresult_df(df_matrix_disrupt_resil_hours,df_matrix_resil_trips,df_matrix_disrupt_resil_miles)

if os.path.exists(nocar_omx_folder_path):
    # Make dataframes for "nocar," if applicable
    nocar_base_df = makeskimresult_df(df_nocar_base_hours,df_nocar_base_trips,df_nocar_base_miles)
    nocar_disrupt_noresil_df = makeskimresult_df(df_nocar_disrupt_noresil_hours,df_nocar_NOresil_trips,df_nocar_disrupt_noresil_miles)
    nocar_disrupt_resil_df = makeskimresult_df(df_nocar_disrupt_resil_hours,df_nocar_resil_trips,df_nocar_disrupt_resil_miles)

if os.path.exists(nocar_omx_folder_path):
    # If "nocar" tables exist, combine "matrix" and "nocar" dataframes for overall depiction of results
    base_df = matrix_base_df.add(nocar_base_df)
    disrupt_noresil_df = matrix_disrupt_noresil_df.add(nocar_disrupt_noresil_df)
    disrupt_resil_df = matrix_disrupt_resil_df.add(nocar_disrupt_resil_df)
else:
    # Otherwise if "nocar" tables do not exist, the overall results are just those from the "matrix" folders
    base_df = matrix_base_df
    disrupt_noresil_df = matrix_disrupt_noresil_df
    disrupt_resil_df = matrix_disrupt_resil_df  

In [None]:
matrix_base_trip_omx_file.close()
matrix_newdisruptresil_trip_omx_file.close()
matrix_newdisruptNOresil_trip_omx_file.close()

if os.path.exists(nocar_omx_folder_path):
    nocar_base_trip_omx_file.close()
    nocar_newdisruptresil_trip_omx_file.close()
    nocar_newdisruptNOresil_trip_omx_file.close()

base_skims_omx_file.close()
disrupt_noresil_skims_omx_file.close()
disrupt_resil_skims_omx_file.close()

In [None]:
# Create data frame of skim results
merged_df = pd.merge(base_df, disrupt_noresil_df, how='inner', on=['from', 'to'], suffixes=("_base", None))
taz_pair_skims = pd.merge(merged_df, disrupt_resil_df, how='inner', on=['from', 'to'], suffixes=("_disrupt_noresil", "_disrupt_resil"))

In [None]:
# Replace NaN values with 'external'. These are for nodes which do not exist in the TAZ file, and therefore do not have any equity attributes. 
# They are nodes which are outside the MPO boundaries and are needed for travel demand modeling purposes, but do not have shapes associated with them. 
# They are not omitted because the totals for hours, miles, and trips should be the same at the MPO level as what is reported to users.
taz_pair_skims[['from', 'to']] = taz_pair_skims[['from', 'to']].fillna('external')

# Calculate the absolute change in trips/hours/miles for each TAZ pair due to the resilience investment (resilience case minus no-resilience case)
taz_pair_skims['trips_delta'] = (taz_pair_skims['trips_disrupt_resil'] - taz_pair_skims['trips_disrupt_noresil'])
taz_pair_skims['hours_delta'] = (taz_pair_skims['hours_disrupt_resil'] - taz_pair_skims['hours_disrupt_noresil'])
taz_pair_skims['miles_delta'] = (taz_pair_skims['miles_disrupt_resil'] - taz_pair_skims['miles_disrupt_noresil'])

In [None]:
# Create three variables to flag whether the disruption is relevant for the TAZ pair (for trips/miles/hours)
taz_pair_skims['trips_disruption_relevant'] = taz_pair_skims['trips_base'] != taz_pair_skims['trips_disrupt_noresil']
taz_pair_skims['hours_disruption_relevant'] = taz_pair_skims['hours_base'] != taz_pair_skims['hours_disrupt_noresil']
taz_pair_skims['miles_disruption_relevant'] = taz_pair_skims['miles_base'] != taz_pair_skims['miles_disrupt_noresil']

In [None]:
# Read in equity category label by TAZ
taz_equity = pd.read_csv(os.path.join(equity_dir, category_filename + '.csv'),
                         usecols=['TAZ', category_name],
                         converters={'TAZ': int, category_name: float})

In [None]:
# Function to aggregate and calculate metrics by TAZ of origin or destination
def aggregate(from_or_to):
    # Sum trips, hours, and miles across all TAZ pairs sharing the same origin ('from') or destination ('to') TAZ
    sum_cols = ['trips_base', 'trips_disrupt_noresil', 'trips_disrupt_resil',
                'hours_base', 'hours_disrupt_noresil', 'hours_disrupt_resil',
                'miles_base', 'miles_disrupt_noresil', 'miles_disrupt_resil']
    summary = taz_pair_skims.groupby(from_or_to, as_index=False)[sum_cols].sum()

    # MINUTES PER TRIP calculations (trip-weighted: total minutes divided by total trips)
    summary['minutespertrip_base'] = (summary['hours_base']*60)/summary['trips_base']
    summary['minutespertrip_disrupt_noresil'] = (summary['hours_disrupt_noresil']*60)/summary['trips_disrupt_noresil']
    summary['minutespertrip_disrupt_resil'] = (summary['hours_disrupt_resil']*60)/summary['trips_disrupt_resil']
    # MILES PER TRIP calculations
    summary['milespertrip_base'] = summary['miles_base']/summary['trips_base']
    summary['milespertrip_disrupt_noresil'] = summary['miles_disrupt_noresil']/summary['trips_disrupt_noresil']
    summary['milespertrip_disrupt_resil'] = summary['miles_disrupt_resil']/summary['trips_disrupt_resil']
    # Change metrics for trips, minutes per trip, and miles per trip
    for metric in ['trips', 'minutespertrip', 'milespertrip']:
        # Percent change from baseline due to the disruption (without resilience investment)
        summary[metric+'_percent_change_noresil'] = ((summary[metric+'_disrupt_noresil'] - summary[metric+'_base'])*100)/summary[metric+'_base']
        # Absolute and relative impact of the resilience investment compared to the "no resilience" case
        summary[metric+'_delta_absolute'] = summary[metric+'_disrupt_resil'] - summary[metric+'_disrupt_noresil']
        summary[metric+'_delta_relative'] = (summary[metric+'_delta_absolute']*100)/summary[metric+'_disrupt_noresil']

    # Join by 'from' TAZ or 'to' TAZ (as the case may be)
    summary = summary.merge(taz_equity, how='left', left_on=from_or_to, right_on='TAZ')

    return summary


In [None]:
TAZ_origin_stats = aggregate('from')
TAZ_destination_stats = aggregate('to')

In [None]:
origin_csv_summary_filepath = os.path.join(equity_dir,"MetricsByTAZ_summary_{}_byTAZofOrigin.csv".format(equity_cfg['run_id']))
# Produce a summary CSV file
TAZ_origin_stats.to_csv(origin_csv_summary_filepath)

destination_csv_summary_filepath = os.path.join(equity_dir,"MetricsByTAZ_summary_{}_byTAZofDestination.csv".format(equity_cfg['run_id']))
# Produce a summary CSV file
TAZ_destination_stats.to_csv(destination_csv_summary_filepath)

In [None]:
# Create prerequisite dictionaries and list to use the "make_plots" function and the "runstats" function
ylabel_dict = {"percent_change_noresil":"Percent Change in ",
"delta_absolute":"Change in ",
"delta_relative":"Percent Change in ",
"base":""}

title_dict = {"percent_change_noresil":"Percent Change from Baseline Due to Disruption (without Resilience) (i.e., Computing Difference in Metric Compared to Base Value, Then Dividing by Base Value)",
"delta_absolute":"Overall Impact of Resilience Investment as Compared to 'No Resilience' Case, for All TAZ (i.e., Metric in 'Resilience' Case Minus Metric in 'No Resilience' Case)",
"delta_relative":"Relative Impact of Resilience Investment as Compared to 'No Resilience' Case, for All TAZ (i.e., Overall Impact Divided by Value of Metric in 'No Resilience' Case)",
"base":"Baseline Magnitude of Metrics Absent Disruption"}

color_dict = {"percent_change_noresil":'#024a70',
"delta_absolute":"#990000",
"delta_relative":'#833C0C',
"base":"#548235"}

metrics_list = ["base",
"percent_change_noresil",
"delta_absolute",
"delta_relative"]

metricsubtype = ['trips_','minutespertrip_','milespertrip_']

y_hoverformat_dict = {"percent_change_noresil":"%{y:.3}%",
"delta_absolute":"%{y:,.5}",
"delta_relative":"%{y:.3}%",
"base":"%{y:,.7}"}

In [None]:
def runstats(metricsubtype=metricsubtype, metrics_list=metrics_list):
    results = {}
    for subtype in metricsubtype:
        for metric in metrics_list:
            # Keep only TAZs with a finite equity value and a finite metric
            # (a zero denominator in the per-trip or percent-change metrics yields inf or NaN)
            baseregress_set = TAZ_origin_stats.filter([category_name, subtype+metric])
            baseregress_set = baseregress_set[np.isfinite(baseregress_set).all(axis=1)]
            result = linregress(baseregress_set[category_name], baseregress_set[subtype+metric])
            results[subtype+metric] = [*result]
    df = pd.DataFrame.from_dict(results, orient='index', columns=['slope', 'intercept', 'r_value', 'p_value', 'stderr'])
    # The significance threshold pval is read from the configuration file; the default is 0.05
    # Flag a result as significant when the regression's p-value (two-sided test that the slope is zero) is below pval
    df['stat_significance'] = df['p_value'] < pval
    df['r_squared'] = df['r_value']**2
    df = df.reset_index()
    df = df.rename(columns={'index': 'dependent_variable'})
    roworder = ['trips_base', 'minutespertrip_base', 'milespertrip_base',
                'trips_percent_change_noresil', 'minutespertrip_percent_change_noresil', 'milespertrip_percent_change_noresil',
                'trips_delta_absolute', 'minutespertrip_delta_absolute', 'milespertrip_delta_absolute',
                'trips_delta_relative', 'minutespertrip_delta_relative', 'milespertrip_delta_relative'
               ]
    # Convert the first column to the 'Categorical' data type with a custom order
    df['dependent_variable'] = pd.Categorical(df['dependent_variable'], categories=roworder, ordered=True)
    # Sort the dataframe by the 'dependent_variable' column
    df = df.sort_values(by='dependent_variable')
    df = df.filter(['dependent_variable', 'stat_significance', 'p_value',
                    'slope', 'intercept', 'r_value', 'r_squared', 'stderr'],
                   axis=1)
    return df

In [None]:
# Function to generate plots
def make_plots(dataframe,variable_name):
    fig = make_subplots(rows=1, cols=3, subplot_titles=('Trips',  'Minutes per Trip', 'Miles per Trip'))
    fig.add_trace(go.Scatter( 
        x=dataframe[category_name], 
        y=dataframe["trips_"+variable_name],
        mode='markers',
        marker=dict(color=color_dict[variable_name],opacity=0.3),
        text=dataframe['TAZ'],
        hovertemplate=
        "<b>TAZ ID: %{text}</b><br>" +
        "Equity Category of Origin TAZ ("+category_name+")"+": %{x:.0f}<br>" +
        ylabel_dict[variable_name]+"Trips: "+y_hoverformat_dict[variable_name]+"<br>" +
        #"TAZ: %{marker.size:,}" +
        "<extra></extra>"),
        row=1,col=1)
    fig.add_trace(go.Scatter(
        x=dataframe[category_name], 
        y=dataframe["minutespertrip_"+variable_name],
        mode='markers',
        marker=dict(color=color_dict[variable_name],opacity=0.3),
        text=dataframe['TAZ'],
        hovertemplate=
        "<b>TAZ ID: %{text}</b><br>" +
        "Equity Category of Origin TAZ ("+category_name+")"+": %{x:.0f}<br>" +
        ylabel_dict[variable_name]+"Minutes per Trip: "+y_hoverformat_dict[variable_name]+"<br>" +
        #"TAZ: %{marker.size:,}" +
        "<extra></extra>"),
        row=1,col=2)
    fig.add_trace(go.Scatter( 
        x=dataframe[category_name], 
        y=dataframe["milespertrip_"+variable_name],
        mode='markers',
        marker=dict(color=color_dict[variable_name],opacity=0.3),
        text=dataframe['TAZ'],
        hovertemplate=
        "<b>TAZ ID: %{text}</b><br>" +
        "Equity Category of Origin TAZ ("+category_name+")"+": %{x:.0f}<br>" +
        ylabel_dict[variable_name]+"Miles per Trip: "+y_hoverformat_dict[variable_name]+"<br>" +
        #"TAZ: %{marker.size:,}" +
        "<extra></extra>"),
        row=1,col=3)
    # edit axis labels
    fig['layout']['xaxis']['title']="Equity Category of Origin TAZ ("+category_name+")"
    fig['layout']['xaxis2']['title']="Equity Category of Origin TAZ ("+category_name+")"
    fig['layout']['xaxis3']['title']="Equity Category of Origin TAZ ("+category_name+")"    
    fig['layout']['yaxis']['title']=ylabel_dict[variable_name]+"Trips"
    fig['layout']['yaxis2']['title']=ylabel_dict[variable_name]+"Minutes per Trip"
    fig['layout']['yaxis3']['title']=ylabel_dict[variable_name]+"Miles per Trip"
    fig.update_layout(showlegend=False, title_text=title_dict[variable_name])
    fig.show()

## Questions and corresponding variables
### Question 1: What is the baseline magnitude of the metric?
- Variables 1T/1H/1M: Overall magnitude of metric absent disruption
    - `trips_base`
    - `minutespertrip_base`
    - `milespertrip_base`
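
The baseline per-trip metrics are computed per origin TAZ as ratios of summed totals, so they are trip-weighted averages across destinations rather than simple averages of the skim values. A minimal sketch of that aggregation on a made-up two-zone matrix, assuming skim times in minutes and an arbitrary stand-in value of 1e6 for `largeval`:

```python
import numpy as np
import pandas as pd

largeval = 1e6  # hypothetical "unreachable" threshold, standing in for the config's largeval

# Hypothetical two-zone O-D matrices: rows are origin TAZs, columns are destination TAZs
trips = np.array([[10.0, 30.0],
                  [20.0, 5.0]])
skim_minutes = np.array([[5.0, 20.0],
                         [15.0, 1e9]])   # origin 1 -> destination 1 is unreachable
skim_miles = np.array([[1.0, 8.0],
                       [6.0, 0.5]])

# Drop demand on unreachable pairs, then build trip-weighted hour and mile totals
od_trips = np.where(skim_minutes < largeval, trips, 0.0)
od_hours = od_trips * skim_minutes / 60
od_miles = od_trips * skim_miles

# Sum across destinations (axis=1) to get totals by origin TAZ
by_origin = pd.DataFrame({
    "trips_base": od_trips.sum(axis=1),
    "hours_base": od_hours.sum(axis=1),
    "miles_base": od_miles.sum(axis=1),
})
# Per-trip metrics are ratios of totals, i.e. trip-weighted averages
by_origin["minutespertrip_base"] = by_origin["hours_base"] * 60 / by_origin["trips_base"]
by_origin["milespertrip_base"] = by_origin["miles_base"] / by_origin["trips_base"]
print(by_origin.round(2))
```

Origin 0 averages 16.25 minutes per trip ((10 * 5 + 30 * 20) / 40), whereas a simple mean of its two skim times would be 12.5. The unreachable pair contributes nothing to origin 1's totals.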

In [None]:
# Generate baseline plots based on TAZ of origin (question 1)
make_plots(TAZ_origin_stats, metrics_list[0])

### Question 2: How relevant is the disruption?
- Variables 2aT/2aH/2aM: Percent change from baseline metric due to disruption (without resilience investment)
    - `trips_percent_change_noresil`
    - `minutespertrip_percent_change_noresil`
    - `milespertrip_percent_change_noresil` 
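
The percent change due to the disruption is computed from the same per-TAZ totals. One edge case worth knowing about: a TAZ with zero baseline trips divides by zero, which pandas reports as `inf` (or `NaN` for 0/0) rather than raising an error. A sketch on hypothetical numbers, assuming you prefer to treat such values as undefined:

```python
import numpy as np
import pandas as pd

# Hypothetical origin-level totals for three TAZs
summary = pd.DataFrame({
    "trips_base": [100.0, 50.0, 0.0],
    "trips_disrupt_noresil": [90.0, 50.0, 5.0],
})

# Same formula as the notebook: (disrupted - base) * 100 / base
summary["trips_percent_change_noresil"] = (
    (summary["trips_disrupt_noresil"] - summary["trips_base"]) * 100 / summary["trips_base"]
)
# A zero baseline yields +/-inf; mark those entries as undefined (NaN) instead
summary["trips_percent_change_noresil"] = summary["trips_percent_change_noresil"].replace(
    [np.inf, -np.inf], np.nan
)
print(summary)
```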

In [None]:
# Generate relative change plots based on TAZ of origin (question 2)
make_plots(TAZ_origin_stats, metrics_list[1])

### Question 3: What is the projected impact of the resilience investment?
#### Question 3A: What was the absolute impact (change in metric)?
- Variables 3aT/3aH/3aM: Overall impact of resilience investment on metrics (i.e., magnitude in the "resilience" case minus magnitude in the "no resilience" case)
    - `trips_delta_absolute`
    - `minutespertrip_delta_absolute`
    - `milespertrip_delta_absolute`
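
The absolute impact is the resilience-case value minus the no-resilience value, so its sign carries the interpretation: for trips, a positive value means more trips are made with the investment; for minutes or miles per trip, a negative value means shorter trips. A small sketch with hypothetical TAZ IDs and values:

```python
import pandas as pd

# Hypothetical per-trip results for two origin TAZs
summary = pd.DataFrame({
    "TAZ": [101, 102],
    "minutespertrip_disrupt_noresil": [24.0, 18.0],
    "minutespertrip_disrupt_resil": [21.0, 18.5],
})

# Same definition as the notebook: resilience case minus no-resilience case
summary["minutespertrip_delta_absolute"] = (
    summary["minutespertrip_disrupt_resil"] - summary["minutespertrip_disrupt_noresil"]
)
# For travel time, a negative delta means trips got shorter with the investment
summary["time_saved"] = summary["minutespertrip_delta_absolute"] < 0
print(summary[["TAZ", "minutespertrip_delta_absolute", "time_saved"]])
```

The second TAZ shows a positive change. Because minutes per trip is a ratio of totals, this can happen when the investment restores longer trips that were dropped during the disruption, so a positive value is not necessarily a worse outcome.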

In [None]:
# Generate absolute resilience impact plots based on TAZ of origin (question 3a)
make_plots(TAZ_origin_stats, metrics_list[2])

#### Question 3B: What was the relative impact (change in metric expressed as a percentage of the "no resilience" magnitude)?
- Variables 3bT/3bH/3bM: Same as the above set, except divided by the metric in the "no resilience" case and multiplied by 100 to show percent change relative to "no resilience" case
    - `trips_delta_relative`
    - `minutespertrip_delta_relative`
    - `milespertrip_delta_relative`
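
The relative impact divides the absolute change by the no-resilience value, whereas the Question 2 percent change divides by the baseline. A worked example with hypothetical trip totals for one origin TAZ shows how the two denominators differ:

```python
# Hypothetical trip totals for one origin TAZ in each scenario
trips_base = 200.0
trips_disrupt_noresil = 160.0
trips_disrupt_resil = 180.0

# Question 2: percent change from baseline due to the disruption
percent_change_noresil = (trips_disrupt_noresil - trips_base) * 100 / trips_base
# Question 3A: absolute impact of the resilience investment
delta_absolute = trips_disrupt_resil - trips_disrupt_noresil
# Question 3B: relative impact, measured against the no-resilience case
delta_relative = delta_absolute * 100 / trips_disrupt_noresil

print(percent_change_noresil, delta_absolute, delta_relative)
```

Here the investment restores 20 of the 40 trips lost to the disruption, yet the relative impact is 12.5 percent because it is measured against the 160 trips in the no-resilience case, not against the loss.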

In [None]:
# Generate relative resilience impact plots based on TAZ of origin (question 3b)
make_plots(TAZ_origin_stats, metrics_list[3])

# Statistical Analysis
## Context
The charts above show one point per TAZ, plotting the value of each metric (for trips originating from that TAZ) against the equity indicator value for that TAZ. Looking at the collection of all points, one can visually start to infer possible relationships between each metric and the equity indicator value. If there is truly a relationship, there may be implications for equity. To quantify the strength of the relationship and the power of the equity indicator value as a predictor of the metric, we use a linear regression analysis to find the line of best fit, the r-value, and the r-squared value, and we use the p-value to test whether the slope of the line of best fit differs from zero.
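
The regression table below is computed with SciPy's `linregress`, with the equity indicator value as x and the metric as y (one point per origin TAZ). A minimal illustration on five hypothetical TAZs:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical equity indicator values (x) and minutes per trip (y) for five TAZs
equity_value = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
minutes_per_trip = np.array([12.1, 13.8, 16.3, 17.9, 20.2])

result = linregress(equity_value, minutes_per_trip)
print(f"slope={result.slope:.3f}, intercept={result.intercept:.3f}, "
      f"r={result.rvalue:.3f}, p={result.pvalue:.3g}")
```

The slope (about 0.2 minutes per unit of the indicator) and intercept define the line of best fit; the notebook's regression table stores the same five fields (slope, intercept, r-value, p-value, standard error) for each metric.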

The `pval` parameter in `equity_metrics.config` sets the significance threshold for the test (0.05 by default). The p-value reported for each regression is the probability of observing a slope at least as far from zero as the one estimated if there were in fact no linear relationship between the equity indicator and the metric. A small p-value therefore means the observed pattern would be unlikely to arise by chance alone; it is not the probability that the relationship is real. A result is marked statistically "significant" when the p-value from the test is lower than the threshold supplied by the user. For example, if `pval` is 0.05 and the test results in a p-value of 0.02, the result is statistically significant.
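
`linregress` reports a two-sided p-value for the null hypothesis that the slope is zero, based on a t statistic with n - 2 degrees of freedom. The sketch below, on randomly generated hypothetical data (the seed, slope, and noise level are arbitrary), recomputes that p-value by hand and then applies the same threshold comparison the notebook uses:

```python
import numpy as np
from scipy import stats

pval = 0.05  # significance threshold, standing in for pval in equity_metrics.config

# Hypothetical data for 30 TAZs
rng = np.random.default_rng(seed=0)
equity_value = rng.uniform(0, 100, size=30)
metric = 0.05 * equity_value + rng.normal(0, 2, size=30)

result = stats.linregress(equity_value, metric)

# Recompute the two-sided p-value for H0: slope == 0 from a t statistic on n - 2 degrees of freedom
t_stat = result.slope / result.stderr
manual_p = 2 * stats.t.sf(abs(t_stat), df=len(equity_value) - 2)

# Same comparison the notebook uses to fill stat_significance
stat_significance = result.pvalue < pval
print(f"p-value={result.pvalue:.3g}, recomputed={manual_p:.3g}, significant={stat_significance}")
```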

The r-value (correlation coefficient) measures the strength and direction of the linear association between two variables. It ranges from -1 (perfectly negatively correlated) to 1 (perfectly positively correlated); a value of 0 means there is no linear correlation. Squaring the r-value gives r-squared, a related but different metric that indicates the share of the variation in the metric explained by the linear relationship with the equity indicator variable. R-squared ranges from 0 to 1: an r-squared of 0 means the regression line explains none of the variation, and an r-squared of 1 means all the points fall exactly on the regression line. The threshold for an r-squared value that indicates useful or meaningful predictive power depends on the context, but in general lower values indicate that the model is less meaningful and higher values indicate that it is more meaningful.
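
A quick check of the r versus r-squared distinction on hypothetical data in which the metric falls as the indicator rises: r is negative, r-squared is not, and r-squared matches the share of variance explained by the fitted line.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical data: the metric tends to fall as the equity indicator rises
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([9.0, 7.5, 7.0, 5.0, 4.5, 2.0])

res = linregress(x, y)
r_squared = res.rvalue ** 2   # computed the same way as the notebook's r_squared column

# For a least-squares line with an intercept, r-squared equals 1 - SS_residual / SS_total
fitted = res.intercept + res.slope * x
explained_share = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"r={res.rvalue:.3f}, r_squared={r_squared:.3f}, explained_share={explained_share:.3f}")
```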

## Results Specific to this Analysis
In the table below, a number with "e" represents scientific notation. For example, 2.29e-02 means 2.29x10^-2, or 0.0229.
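
In Python, the same e-notation can be parsed and produced directly, which may help when reading the table:

```python
# Parse e-notation into an ordinary float: 2.29e-02 is 2.29 x 10^-2
value = float("2.29e-02")
print(value)

# Format a float back into two-decimal e-notation
print(f"{value:.2e}")
```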
As a reminder, these are the meanings of the metrics (in the first column of the table):
- Overall magnitude of metric absent disruption
    - `trips_base`
    - `minutespertrip_base`
    - `milespertrip_base`
- Percent change from baseline metric due to disruption (without resilience investment)
    - `trips_percent_change_noresil`
    - `minutespertrip_percent_change_noresil`
    - `milespertrip_percent_change_noresil` 
- Overall impact of resilience investment on metrics (i.e., magnitude in the "resilience" case minus magnitude in the "no resilience" case)
    - `trips_delta_absolute`
    - `minutespertrip_delta_absolute`
    - `milespertrip_delta_absolute`
- Same as the above set, except divided by the metric in the "no resilience" case and multiplied by 100 to show percent change relative to "no resilience" case
    - `trips_delta_relative`
    - `minutespertrip_delta_relative`
    - `milespertrip_delta_relative`

In [None]:
regressiontable = runstats(metricsubtype, metrics_list)
regressiontable

In [None]:
# Generate plots based on TAZ of origin
#for metric in metrics_list:
#    make_plots(TAZ_origin_stats, metric)

In [None]:
# Conversion to HTML has moved to TAZ_metrics.py
# !jupyter nbconvert MetricsByTAZ.ipynb --to html --no-input