## Assessment 1: UrbanVal Parameter

In [1]:
import os
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly as pys
import ipywidgets as widgets
from ipywidgets import Layout

import _global_scripts as gs

## Purpose

In this assessment, we adjust the `UrbanVal` parameter to see its impact on CRT PNR values. The analysis is organized into the following subtasks:
 
 - **2.1.1** Set UrbanVal=0 for all TAZ (analyze with and without recalibration)
 - **2.1.2** Set UrbanVal=LN(UrbanVal) (analyze with and without recalibration)
 - **2.1.3** Set CRT constants (asc_CRT, asc_dCRT, asc_wCRT) = 0 (analyze without recalibration)
 - **2.1.4** Apply 2.1.3 and set UrbanVal=0 (analyze without recalibration)

We apply these steps to the `BY_2019` model, then compare the results to the previous TDM results and the 2019 On-Board travel survey (see 1-StationSummary.ipynb).
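
As a rough illustration of what subtasks 2.1.1 and 2.1.2 do to the parameter itself, the sketch below applies both adjustments to a small made-up TAZ table. The table, its column names, and the choice to leave non-positive values at 0 before taking the log are all assumptions for illustration; the notebook does not show how the model inputs were actually edited.

```python
import numpy as np
import pandas as pd

# Hypothetical TAZ table; column names and values are illustrative only.
taz = pd.DataFrame({"TAZID": [1, 2, 3, 4], "UrbanVal": [0.0, 1.0, 5.0, 20.0]})

# 2.1.1: zero out UrbanVal for every TAZ.
taz["UrbanVal_zero"] = 0.0

# 2.1.2: natural log of UrbanVal. ln is undefined at 0, so this sketch
# leaves non-positive values at 0 (an assumption, not the model's rule).
positive = taz["UrbanVal"] > 0
taz["UrbanVal_ln"] = 0.0
taz.loc[positive, "UrbanVal_ln"] = np.log(taz.loc[positive, "UrbanVal"])

print(taz)
```

The log transform compresses the upper end of the range: in this toy table a value of 20 becomes about 3.0, while a value of 1 becomes 0.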


## Inputs

In [2]:
# set whether you want PA or OD analysis
pa_od = 'PA'
pa_od_function = 'pa'

In [3]:
df_tdm_obs = pd.read_csv(f"_data/base_observed_summary_{pa_od_function}.csv").sort_values(by=['Source','station'], ascending=True).reset_index(drop=True)
if pa_od == 'PA':
    # pass the column names as a list (drop expects a label or list-like)
    df_tdm_obs = df_tdm_obs.drop(columns=['Alt_PA','Alt_Direct_PA','Alt_Transfer_PA'])

# read in results for e2.1.1 before recalibration
path_brding_summary_211_b4 = r"_data/E2.1/E2.1.1_node_b4calib.csv"
path_rider_summary_211_b4  = r"_data/E2.1/E2.1.1_link_b4calib.csv"

# read in results for e2.1.1 after recalibration
path_brding_summary_211 = r"_data/E2.1/E2.1.1_node.csv"
path_rider_summary_211  = r"_data/E2.1/E2.1.1_link.csv"

# read in results for e2.1.2 before recalibration
path_brding_summary_212_b4 = r"_data/E2.1/E2.1.2_node_b4calib.csv"
path_rider_summary_212_b4  = r"_data/E2.1/E2.1.2_link_b4calib.csv"

# read in results for e2.1.2 after recalibration
path_brding_summary_212 = r"_data/E2.1/E2.1.2_node.csv"
path_rider_summary_212  = r"_data/E2.1/E2.1.2_link.csv"

# read in results for e2.1.3 (no recalibration)
path_brding_summary_213 = r"_data/E2.1/E2.1.3_node.csv"
path_rider_summary_213  = r"_data/E2.1/E2.1.3_link.csv"

# read in results for e2.1.4 (no recalibration)
path_brding_summary_214 = r"_data/E2.1/E2.1.4_node.csv"
path_rider_summary_214  = r"_data/E2.1/E2.1.4_link.csv"
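
The path variables above all follow one naming pattern, so they could also be generated rather than typed out. The sketch below is one possible way to do that; `scenario_paths` and `DATA_DIR` are hypothetical names that are not part of `_global_scripts`, and the pattern is inferred only from the file names listed above.

```python
from pathlib import Path

# Folder taken from the paths above.
DATA_DIR = Path("_data") / "E2.1"

def scenario_paths(scenario, before_calibration=False):
    """Return (node_csv, link_csv) for a scenario label such as 'E2.1.1'."""
    suffix = "_b4calib" if before_calibration else ""
    node = DATA_DIR / f"{scenario}_node{suffix}.csv"
    link = DATA_DIR / f"{scenario}_link{suffix}.csv"
    return node, link

node_path, link_path = scenario_paths("E2.1.1", before_calibration=True)
print(node_path.as_posix())
print(link_path.as_posix())
```

Keeping the pattern in one function makes it harder for a node file from one scenario to be paired with a link file from another.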

## Compare Results

### Plotly Graph Boarding Comparisons by Station

In [4]:
# read in stations and summarize tdm results
df_stations1 = gs.df_stations[['station','N']]
df_tdm_211_b4 = gs.summarize_tdm_stats(path_brding_summary_211_b4,path_rider_summary_211_b4, df_stations1, 'TDM_2.1.1_b4', pa_od_function)
df_tdm_211 = gs.summarize_tdm_stats(path_brding_summary_211,path_rider_summary_211, df_stations1, 'TDM_2.1.1', pa_od_function)
df_tdm_212_b4 = gs.summarize_tdm_stats(path_brding_summary_212_b4,path_rider_summary_212_b4, df_stations1, 'TDM_2.1.2_b4', pa_od_function)
df_tdm_212 = gs.summarize_tdm_stats(path_brding_summary_212,path_rider_summary_212, df_stations1, 'TDM_2.1.2', pa_od_function)
df_tdm_213 = gs.summarize_tdm_stats(path_brding_summary_213,path_rider_summary_213, df_stations1, 'TDM_2.1.3', pa_od_function)
df_tdm_214 = gs.summarize_tdm_stats(path_brding_summary_214,path_rider_summary_214, df_stations1, 'TDM_2.1.4', pa_od_function)

# concat dataframes
df_tdm_obs_new = pd.concat([
        df_tdm_obs,
        df_tdm_211_b4,
        df_tdm_211,
        df_tdm_212_b4,
        df_tdm_212,
        df_tdm_213,
        df_tdm_214
    ]).reset_index()
df_tdm_obs_new = df_tdm_obs_new.round(3).fillna(0)
df_tdm_obs_new = df_tdm_obs_new.drop(columns='index')
df_tdm_obs_new = df_tdm_obs_new.loc[:, ~df_tdm_obs_new.columns.str.contains('^Unnamed')]

# rename
sumStats = df_tdm_obs_new.copy()
display(df_tdm_obs_new)



Unnamed: 0,Source,station,AccessMode,Brd_PA,Brd_Direct_PA,Brd_Transfer_PA
0,OBS,01-PROVO CENTRAL STATION,drive,2283.379,0.00,0.00
1,OBS,01-PROVO CENTRAL STATION,walk,945.908,0.00,0.00
2,OBS,02-OREM CENTRAL STATION,drive,1475.024,0.00,0.00
3,OBS,02-OREM CENTRAL STATION,walk,371.804,0.00,0.00
4,OBS,03-AMERICAN FORK STATION,drive,1423.768,0.00,0.00
...,...,...,...,...,...,...
235,TDM_2.1.4,11-FARMINGTON STATION,walk,14.420,9.60,4.82
236,TDM_2.1.4,12-LAYTON STATION,walk,43.860,21.67,22.19
237,TDM_2.1.4,13-CLEARFIELD STATION,walk,99.800,91.29,8.51
238,TDM_2.1.4,14-ROY STATION,walk,53.090,49.38,3.71


In [6]:
# add a few more columns regarding percentage of boardings in relation to total boardings
# sum by source and station 
station_sum = sumStats.groupby(["Source", "station"], as_index=False).agg({
    f"Brd_{pa_od}": "sum",
    f"Brd_Direct_{pa_od}": "sum",
    f"Brd_Transfer_{pa_od}": "sum"
})

# add All accessMode
station_sum["AccessMode"] = "All"
sumStats2 = pd.concat([sumStats, station_sum], ignore_index=True) 

accessmode_sum = sumStats2.groupby(["Source", "AccessMode"], as_index=False).agg({
    f"Brd_{pa_od}": "sum",
    f"Brd_Direct_{pa_od}": "sum",
    f"Brd_Transfer_{pa_od}": "sum"
})

accessmode_sum.rename(columns={
    f"Brd_{pa_od}": f"Source_Brd_{pa_od}", 
    f"Brd_Direct_{pa_od}": f"Source_Brd_Direct_{pa_od}", 
    f"Brd_Transfer_{pa_od}": f"Source_Brd_Transfer_{pa_od}"}, inplace=True)
sumStatsP = sumStats2.merge(accessmode_sum, on=["Source", "AccessMode"], how="left")

sumStatsP[f"Brd_{pa_od}_Perc"]          = sumStatsP[f"Brd_{pa_od}"] / sumStatsP[f"Source_Brd_{pa_od}"]
sumStatsP[f"Brd_Direct_{pa_od}_Perc"]   = sumStatsP[f"Brd_Direct_{pa_od}"] / sumStatsP[f"Source_Brd_Direct_{pa_od}"]
sumStatsP[f"Brd_Transfer_{pa_od}_Perc"] = sumStatsP[f"Brd_Transfer_{pa_od}"] / sumStatsP[f"Source_Brd_Transfer_{pa_od}"]
sumStatsP


Unnamed: 0,Source,station,AccessMode,Brd_PA,Brd_Direct_PA,Brd_Transfer_PA,Source_Brd_PA,Source_Brd_Direct_PA,Source_Brd_Transfer_PA,Brd_PA_Perc,Brd_Direct_PA_Perc,Brd_Transfer_PA_Perc
0,OBS,01-PROVO CENTRAL STATION,drive,2283.379,0.00,0.00,14955.118,0.0,0.00,0.152682,,
1,OBS,01-PROVO CENTRAL STATION,walk,945.908,0.00,0.00,5672.892,0.0,0.00,0.166742,,
2,OBS,02-OREM CENTRAL STATION,drive,1475.024,0.00,0.00,14955.118,0.0,0.00,0.098630,,
3,OBS,02-OREM CENTRAL STATION,walk,371.804,0.00,0.00,5672.892,0.0,0.00,0.065540,,
4,OBS,03-AMERICAN FORK STATION,drive,1423.768,0.00,0.00,14955.118,0.0,0.00,0.095203,,
...,...,...,...,...,...,...,...,...,...,...,...,...
355,TDM_2.1.4,11-FARMINGTON STATION,All,91.650,86.82,4.83,2699.530,2201.5,498.03,0.033950,0.039437,0.009698
356,TDM_2.1.4,12-LAYTON STATION,All,174.630,152.38,22.25,2699.530,2201.5,498.03,0.064689,0.069216,0.044676
357,TDM_2.1.4,13-CLEARFIELD STATION,All,232.480,223.98,8.50,2699.530,2201.5,498.03,0.086119,0.101740,0.017067
358,TDM_2.1.4,14-ROY STATION,All,229.980,226.12,3.86,2699.530,2201.5,498.03,0.085193,0.102712,0.007751
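
The share columns above are each station's boardings divided by the total for the same source and access mode. A more compact way to get the same ratio is `groupby(...).transform("sum")`, sketched below on made-up numbers (the `toy` frame is not the notebook's data). It also shows why the OBS rows have empty direct and transfer shares: their totals are 0, so the ratio is undefined.

```python
import numpy as np
import pandas as pd

# Toy data shaped like sumStats2; the values are made up.
toy = pd.DataFrame({
    "Source": ["OBS", "OBS", "TDM", "TDM"],
    "station": ["A", "B", "A", "B"],
    "AccessMode": ["All"] * 4,
    "Brd_PA": [300.0, 100.0, 150.0, 50.0],
    "Brd_Direct_PA": [0.0, 0.0, 120.0, 30.0],
})

group_cols = ["Source", "AccessMode"]
for col in ["Brd_PA", "Brd_Direct_PA"]:
    # Total for each (Source, AccessMode) group, aligned to the original rows.
    total = toy.groupby(group_cols)[col].transform("sum")
    # Treat a zero total as missing so the share is explicitly NaN.
    toy[f"{col}_Perc"] = toy[col] / total.replace(0, np.nan)

print(toy)
```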


In [7]:
def plotit(variable, access_mode):
    output.clear_output()  # Clear previous output before displaying new content
    global firstTime
    if firstTime:
    
        filtered_data = sumStatsP[sumStatsP['AccessMode'] == access_mode]
            
        # Create histogram
        fig = px.histogram(
            filtered_data, 
            x="station", 
            y=variable, 
            text_auto='.2s',
            color='Source', 
            barmode='group',
            height=400
        )
        fig.update_layout(
            xaxis_title="Station Name",
            yaxis_title=str(variable),
            legend_title="Model Version"
        )
        
        # Display the plot
        fig.show()
    
    else:
        firstTime = True

In [8]:
lstValues = list([
    f'Brd_{pa_od}',
    f'Brd_Direct_{pa_od}',
    f'Brd_Transfer_{pa_od}',
    f'Brd_{pa_od}_Perc',
    f'Brd_Direct_{pa_od}_Perc',
    f'Brd_Transfer_{pa_od}_Perc'
])
accessModeOptions = ['drive', 'walk', 'All']

selectValues = widgets.Select(options=lstValues, value=(f'Brd_{pa_od}' ), description = 'Select Variable')
selectAccessMode = widgets.Dropdown(options=accessModeOptions, value='All', description='Access Mode')

# Set up a global variable to track whether the widgets have been changed
firstTime = False

# create output widget to display filtered DataFrame
output = widgets.Output()
hbox = widgets.HBox([selectValues, selectAccessMode])

# create interactive widget
interactive_output = widgets.interactive_output(plotit, {'variable':selectValues, 'access_mode': selectAccessMode})

display(hbox)
display(interactive_output)
display(output)


### RMSE and Absolute Difference Table by Station

In [9]:
def calculate_metric(df, access_mode, calculation_type):
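    # Fail early on an unknown metric instead of hitting an undefined final_df later
    if calculation_type not in ("rmse", "abs"):
        raise ValueError("calculation_type must be 'rmse' or 'abs'")

    # Filter based on access_mode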
    if access_mode in ["walk", "drive"]:
        df = df[df["AccessMode"] == access_mode]
    elif access_mode == "all":
        # Group by station and source, summing Brd_{pa_od} across walk + drive
        df = (
            df.groupby(["station", "Source"], as_index=False)
            .agg({f"Brd_{pa_od}": "sum"})  # AccessMode is irrelevant in 'all'
        )

    # Separate observed values (OBS) and other sources
    obs_df = df[df['Source'] == 'OBS'][['station', f'Brd_{pa_od}']].rename(columns={f'Brd_{pa_od}': 'OBS'})
    sources_df = df[df['Source'] != 'OBS']

    # Merge observed values with other sources
    merged = sources_df.merge(obs_df, on=['station'])
    
    if calculation_type=='rmse':
        # Calculate RMSE for each source and station
        merged['squared_error'] = (merged[f'Brd_{pa_od}'] - merged['OBS'])**2
        final_df = merged.groupby(['station', 'Source'])['squared_error'].sum().reset_index()
        
        # Add RMSE values per source and station
        final_df['RMSE'] = np.sqrt(final_df['squared_error'])
        final_df = final_df.pivot(index='station', columns='Source', values='RMSE')
        
        # Calculate total RMSE for all stations
        total_df = (
            merged.groupby('Source')['squared_error'].sum()
            .apply(np.sqrt)
            .rename('Total')
        )
    
    if calculation_type == 'abs':
        # Calculate absolute difference for each source and station
        merged['abs_diff'] = abs(merged[f'Brd_{pa_od}'] - merged['OBS'])
        final_df = merged.groupby(['station', 'Source'])['abs_diff'].mean().reset_index()
    
        # Add absolute difference values per source and station
        final_df = final_df.pivot(index='station', columns='Source', values='abs_diff')
    
        # Calculate total absolute difference for all stations
        total_df = (
            merged.groupby('Source')['abs_diff'].sum()
            .rename('Total')
        )
    
    # Append Total row with source values as headers
    final_df.loc['Total'] = total_df
    
    # Reset column names for clarity
    final_df.columns.name = None
    final_df.reset_index(inplace=True)
    
    return final_df

In [10]:
# Separate observed values (OBS) and other sources
obs_df = sumStats[sumStats['Source'] == 'OBS'][['station', f'Brd_{pa_od}']].rename(columns={f'Brd_{pa_od}': 'OBS'})
sources_df = sumStats[sumStats['Source'] != 'OBS']

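# Merge observed values with other sources
# NOTE: sumStats has one row per access mode, so merging on 'station' alone pairs
# every modeled row with both the drive and walk observed rows for that station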
merged = sources_df.merge(obs_df, on='station')

# Calculate RMSE for each source and station
merged['squared_error'] = (merged[f'Brd_{pa_od}'] - merged['OBS'])**2
rmse_table = merged.groupby(['station', 'Source'])['squared_error'].sum().reset_index()

# Add RMSE values per source and station
rmse_table['RMSE'] = np.sqrt(rmse_table['squared_error'])
rmse_table = rmse_table.pivot(index='station', columns='Source', values='RMSE')

# Calculate total RMSE for all stations
total_rmse = (
    merged.groupby('Source')['squared_error'].sum()
    .apply(np.sqrt)
    .rename('Total')
)

# Append Total row with source values as headers
rmse_table.loc['Total'] = total_rmse

# Reset column names for clarity
rmse_table.columns.name = None
rmse_table.reset_index(inplace=True)

display(rmse_table)

Unnamed: 0,station,TDM,TDM_2.1.1,TDM_2.1.1_b4,TDM_2.1.2,TDM_2.1.2_b4,TDM_2.1.3,TDM_2.1.4
0,01-PROVO CENTRAL STATION,2458.348307,2461.093137,2582.523168,2461.812677,2451.733123,3097.144079,3164.566565
1,02-OREM CENTRAL STATION,1500.113989,1504.246556,1538.452794,1504.344453,1505.163149,1856.512481,1938.180524
2,03-AMERICAN FORK STATION,1930.638146,1971.986615,1714.830466,1970.84689,1996.600694,1785.66923,1829.952686
3,04-LEHI STATION,1292.228929,1182.071548,1060.974839,1182.825115,1201.502742,1418.051893,1501.583466
4,05-DRAPER STATION,1239.191364,1298.879939,1011.686147,1299.886917,1324.921567,1188.578972,1213.318835
5,06-SOUTH JORDAN STATION,1381.086008,1360.941152,970.347594,1363.728305,1400.080085,1289.389379,1320.45978
6,07-MURRAY CENTRAL STATION,428.07656,540.691767,894.399608,538.367341,532.725419,1673.796194,1696.55124
7,08-SALT LAKE CENTRAL STATION,499.604912,466.699178,503.391297,466.480755,470.950622,809.085617,817.568768
8,09-NORTH TEMPLE STATION,780.783166,708.817075,805.975547,709.9553,703.9818,992.481237,996.344882
9,10-WOODS CROSS STATION,902.39497,1019.078948,476.400321,1023.189296,1063.861812,695.742496,766.641601
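
One thing worth checking in the cell above: `sumStats` holds separate drive and walk rows for each station, and the merge keys on `station` only. The toy example below (made-up numbers, one station) sketches how that pairing behaves compared with a merge keyed on both `station` and `AccessMode`. It is an illustration of the pandas behavior, not a statement about which comparison was intended.

```python
import pandas as pd

# Toy data: one station, two access modes, one observed and one modeled source.
toy = pd.DataFrame({
    "Source": ["OBS", "OBS", "TDM", "TDM"],
    "station": ["A", "A", "A", "A"],
    "AccessMode": ["drive", "walk", "drive", "walk"],
    "Brd_PA": [100.0, 40.0, 90.0, 50.0],
})

obs = toy[toy["Source"] == "OBS"].rename(columns={"Brd_PA": "OBS"})
model = toy[toy["Source"] != "OBS"]

# Key on station only: each modeled row meets both observed rows (2 x 2 = 4 rows).
by_station = model.merge(obs[["station", "OBS"]], on="station")

# Key on station and access mode: one observed row per modeled row (2 rows).
by_mode = model.merge(
    obs[["station", "AccessMode", "OBS"]], on=["station", "AccessMode"]
)

sse_station = ((by_station["Brd_PA"] - by_station["OBS"]) ** 2).sum()
sse_mode = ((by_mode["Brd_PA"] - by_mode["OBS"]) ** 2).sum()
print(len(by_station), len(by_mode))  # 4 2
print(sse_station, sse_mode)          # 5200.0 200.0
```

In the station-only merge, drive boardings are also compared against walk observations and vice versa, which inflates the squared error. `calculate_metric` avoids this by filtering or aggregating over access mode before merging.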


In [11]:
# Example usage
result_df = calculate_metric(
    sumStats, 
    access_mode="all",      # all, walk, drive
    calculation_type="abs"  # rmse, abs
)
display(result_df)

# TODO: fix Total row -- should it be the sum of the absolute differences?

Unnamed: 0,station,TDM,TDM_2.1.1,TDM_2.1.1_b4,TDM_2.1.2,TDM_2.1.2_b4,TDM_2.1.3,TDM_2.1.4
0,01-PROVO CENTRAL STATION,1802.437,1860.437,2040.717,1861.437,1843.207,2782.757,2860.127
1,02-OREM CENTRAL STATION,492.948,564.628,893.898,564.768,539.718,1466.158,1582.408
2,03-AMERICAN FORK STATION,39.295,83.815,360.595,82.425,116.915,1190.165,1274.135
3,04-LEHI STATION,229.769,8.019,456.401,9.529,45.869,1129.071,1242.581
4,05-DRAPER STATION,208.494,254.294,310.226,255.704,287.604,872.656,909.236
5,06-SOUTH JORDAN STATION,479.173,415.163,307.947,418.773,460.773,1023.197,1063.307
6,07-MURRAY CENTRAL STATION,353.307,295.807,864.287,296.147,260.087,1657.217,1680.747
7,08-SALT LAKE CENTRAL STATION,150.422,2.822,235.922,4.792,12.208,791.602,800.432
8,09-NORTH TEMPLE STATION,692.45,611.8,723.29,613.06,605.92,937.16,941.28
9,10-WOODS CROSS STATION,565.46,655.97,140.6,660.18,698.69,545.72,638.56
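
On the open question about the Total row, it may help to see the candidate summaries side by side. The sketch below uses three made-up stations. Note that the "RMSE" computed in `calculate_metric` is the square root of the summed squared errors (no mean), which is listed here as `root_sum_sq` next to the conventional RMSE.

```python
import numpy as np

# Made-up observed and modeled boardings for three stations.
obs = np.array([100.0, 200.0, 300.0])
model = np.array([110.0, 180.0, 300.0])
err = model - obs  # [10, -20, 0]

sum_abs = np.abs(err).sum()              # total absolute difference
mae = np.abs(err).mean()                 # mean absolute error
abs_of_sum = abs(err.sum())              # net difference; over and under cancel
root_sum_sq = np.sqrt((err ** 2).sum())  # what the notebook labels RMSE
rmse = np.sqrt((err ** 2).mean())        # conventional RMSE

print(sum_abs, mae, abs_of_sum)  # 30.0 10.0 10.0
print(root_sum_sq, rmse)
```

The sum of absolute differences grows with the number of stations, while the mean-based versions stay comparable across station sets of different sizes. The net difference hides offsetting errors, so it answers a different question (is total ridership right?) than the others.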


## Conclusions

- UrbanVal plays a large role in mode choice
    - Removing UrbanVal takes a large amount of ridership with it, although after recalibration a constant makes up the missing ridership
    - UrbanVal does not contribute strongly to the utility equation in absolute terms, but it is large relative to the other constants
- UrbanVal doesn't affect the boarding pattern
    - Although removing UrbanVal without recalibrating shows significant ridership loss, the pattern of low boardings at Provo remains
- UrbanVal doesn't solve any of the major problems, especially the low boardings at Provo Station
- Setting the CRT constants to 0 significantly decreases overall CRT ridership, but the proportion at Provo Station increases. This suggests that adjusting the utility equation could be an effective way to fix the Provo Station error
    - We can conclude that the CRT constants are very dominant in the utility equation. Will any other investigations matter if the constants have so much sway?
    - Why does the model not see trips on commuter rail without a huge constant to balance things out?
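
To make the point about dominant constants concrete, here is a minimal sketch of how an alternative-specific constant shifts shares in a simple multinomial logit. The two-mode setup, the utilities, and the constant are made-up numbers, and the plain logit form is an assumption; the TDM's actual mode choice structure and coefficients are not shown in this notebook.

```python
import numpy as np

def logit_shares(utilities):
    """Multinomial logit shares: exp(U_i) / sum_j exp(U_j)."""
    u = np.asarray(utilities, dtype=float)
    e = np.exp(u - u.max())  # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical systematic utilities (without constants) for [auto, CRT].
base_utility = np.array([-1.0, -3.0])
asc_crt = 2.0  # hypothetical CRT constant

with_constant = logit_shares(base_utility + np.array([0.0, asc_crt]))
without_constant = logit_shares(base_utility)

print(f"CRT share with constant:    {with_constant[1]:.3f}")
print(f"CRT share without constant: {without_constant[1]:.3f}")
```

In this toy case the constant alone moves the CRT share from about 0.12 to 0.50. When a constant is that large relative to the rest of the utility, changes to the other terms have comparatively little effect on the result.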

