## Research Task - Incorporate feedback into Transit Performance Metrics Portfolio #1514

via Juan Matute
> If you're taking requests, I'd like to see the Table 8.1 performance metrics on a statewide basis, along with a list for each performance metric of which individual transit agency-mode of service combinations are in the bottom 5% (approximately two standard deviations from the mean) for each.
>
> This would be illustrative for discussion purposes.
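
The request treats "bottom 5%" and "two standard deviations from the mean" as roughly the same cutoff. The sketch below uses Python's standard-library `statistics.NormalDist` to check this under the assumption that a metric is normally distributed. The skewed summary statistics later in this notebook suggest the metrics are not normal, so this is only a rough guide.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

# z-score below which the lowest 5% of a normal distribution falls
z_5th_percentile = std_normal.inv_cdf(0.05)

# Share of a normal distribution lying more than two SDs below the mean
share_below_2sd = std_normal.cdf(-2.0)

print(f"5th percentile: {z_5th_percentile:.3f} SD from the mean")  # -1.645
print(f"Below mean - 2 SD: {share_below_2sd:.1%}")                 # 2.3%
```

If the data were normal, a 5% cutoff would sit about 1.645 SD below the mean. A two-SD cutoff would flag only about 2.3% of combinations.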


## Table 8.1
![image.png](attachment:e9b88e50-8bf8-4285-a1d7-08b2cbf4bd3b.png)

In [1]:
import pandas as pd
import numpy as np
from new_transit_metrics_utils import GCS_FILE_PATH, sum_by_group, make_long

In [2]:
df = pd.read_parquet(f"{GCS_FILE_PATH}raw_transit_performance_metrics_data.parquet")

In [20]:
df_agg = df.groupby(["agency_name","mode"]).agg(
    {"upt":"sum",
    "vrh":"sum",
    "vrm":"sum",
    "opexp_total":"sum"}
).reset_index()

In [21]:
calc_dict = {
    "opex_per_vrh": ("opexp_total", "vrh"),
    "opex_per_vrm": ("opexp_total", "vrm"),
    "upt_per_vrh": ("upt", "vrh"),
    "upt_per_vrm": ("upt", "vrm"),
    "opex_per_upt": ("opexp_total", "upt"),
    }

for new_col, (num, dem) in calc_dict.items():
    df_agg[new_col] = (
        df_agg[num] / df_agg[dem]
    ).round(2)

In [22]:
df_agg.head()

| | agency_name | mode | upt | vrh | vrm | opexp_total | opex_per_vrh | opex_per_vrm | upt_per_vrh | upt_per_vrm | opex_per_upt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Access Services (AS) | Demand Response | 16695038 | 9426628 | 159008226 | 833736052 | 88.44 | 5.24 | 1.77 | 0.1 | 49.94 |
| 1 | Access Services (AS) | Demand Response Taxi | 4405674 | 1647003 | 36337091 | 158530393 | 96.25 | 4.36 | 2.67 | 0.12 | 35.98 |
| 2 | Alameda-Contra Costa Transit District | Bus | 216639730 | 10439319 | 106099533 | 2457288244 | 235.39 | 23.16 | 20.75 | 2.04 | 11.34 |
| 3 | Alameda-Contra Costa Transit District | Bus Rapid Transit | 10012230 | 229273 | 1981875 | 57224517 | 249.59 | 28.87 | 43.67 | 5.05 | 5.72 |
| 4 | Alameda-Contra Costa Transit District | Commuter Bus | 8398451 | 363100 | 5777682 | 116802450 | 321.68 | 20.22 | 23.13 | 1.45 | 13.91 |
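
As a spot check of the ratio loop, the sketch below recomputes the five metrics for row 0 of `df_agg` (Access Services (AS), Demand Response). The totals are copied by hand from the output above. It applies the same `calc_dict` pattern to a one-row DataFrame.

```python
import pandas as pd

# Totals for row 0 of df_agg (Access Services (AS), Demand Response), copied from the head() output
row = pd.DataFrame({
    "upt": [16695038],
    "vrh": [9426628],
    "vrm": [159008226],
    "opexp_total": [833736052],
})

# Same numerator/denominator mapping as calc_dict above
calc_dict = {
    "opex_per_vrh": ("opexp_total", "vrh"),
    "opex_per_vrm": ("opexp_total", "vrm"),
    "upt_per_vrh": ("upt", "vrh"),
    "upt_per_vrm": ("upt", "vrm"),
    "opex_per_upt": ("opexp_total", "upt"),
}

for new_col, (num, dem) in calc_dict.items():
    row[new_col] = (row[num] / row[dem]).round(2)

# Expected to match the head() output: 88.44, 5.24, 1.77, 0.1, 49.94
print(row[list(calc_dict)].iloc[0].tolist())
```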


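## Dealing with NaN and inf values
Some metric calculations produce inf or NaN values because of division by zero. A nonzero numerator over a zero denominator gives inf, and 0/0 gives NaN. These values distort the standard deviation calculation: a single inf makes the result NaN, and NaN values are silently skipped.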
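
A small sketch on hypothetical totals shows both cases. The `toy` frame is illustrative and not taken from the dataset. pandas returns `inf` for a nonzero value divided by zero and `NaN` for 0/0. `count()` drops only the NaN, and the remaining `inf` turns the standard deviation into `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical agency-mode totals, including zero-denominator cases
toy = pd.DataFrame({
    "opexp_total": [1000, 500, 0],
    "upt": [10, 0, 0],
})

# 1000/10 -> 100.0, 500/0 -> inf, 0/0 -> NaN
ratio = toy["opexp_total"] / toy["upt"]

print(ratio.tolist())  # [100.0, inf, nan]
print(ratio.count())   # 2: NaN is excluded, inf is not
print(ratio.std())     # nan: the inf propagates through the variance
```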

We compare the summary statistics, including the standard deviation, under two ways of handling these cases:
1. Remove rows where any of upt, vrh, vrm, or opexp_total is zero.
2. Keep all rows and replace the problem values:
    1. NaN (0/0) values in upt_per_vrh/upt_per_vrm are replaced with zero. Zero riders per revenue hour or mile still makes sense, because buses can run and pick up zero passengers. This highlights service inefficiencies.
    2. inf (nonzero/0) values in opex_per_upt/opex_per_vrm/opex_per_vrh are replaced with the row's opexp_total. Operating costs still exist even if nobody rides the bus. This highlights cost inefficiencies.
    3. NaN (0/0) values in opex_per_upt/opex_per_vrm/opex_per_vrh are replaced with zero. Not running the bus means no passengers are picked up and no operating cost is incurred.
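
The two approaches can be sketched on a hypothetical three-row frame. It has one ordinary row, one row with service but no riders, and one row with no service at all. In the second row, `upt_per_vrh` is already 0 (0/5). The NaN replacement in item 1 therefore only affects rows with zero revenue hours.

```python
import numpy as np
import pandas as pd

# Hypothetical totals: normal service, service with no riders, no service at all
toy = pd.DataFrame({
    "upt": [100, 0, 0],
    "vrh": [10, 5, 0],
    "opexp_total": [2000, 800, 0],
})
toy["upt_per_vrh"] = toy["upt"] / toy["vrh"]           # 10.0, 0.0, NaN
toy["opex_per_upt"] = toy["opexp_total"] / toy["upt"]  # 20.0, inf, NaN

# Approach 1: drop any row with a zero total
no_zero = toy[(toy[["upt", "vrh", "opexp_total"]] != 0).all(axis=1)]

# Approach 2: NaN (0/0) -> 0, then cost metrics that are inf (x/0) -> opexp_total
replaced = toy.fillna(0)
replaced["opex_per_upt"] = replaced["opex_per_upt"].where(
    replaced["opex_per_upt"] != np.inf, replaced["opexp_total"]
)

print(no_zero)
print(replaced)
```

Approach 1 keeps only the first row. Approach 2 keeps all three, and sets the no-rider row's cost per trip to its full operating expense (800). That kind of replacement is why the cost metrics' `max` and `std` grow so large in the second summary table below.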

In [28]:
# keep only rows where upt, vrh, vrm and opexp_total are all nonzero
no_zero_rows = df_agg[
    (df_agg["upt"] != 0) 
    & (df_agg["vrh"] != 0)
    & (df_agg["vrm"] != 0)
    & (df_agg["opexp_total"] != 0)
]


In [25]:
# NaN (0/0) ratios become 0
replaced_values = df_agg.fillna(0)

# cost metrics that are inf (nonzero/0) are replaced with the row's total operating expense
col_list = [
    "opex_per_vrh",
    "opex_per_vrm",
    "opex_per_upt"
]

for col in col_list:
    replaced_values[col] = replaced_values[col].where(
        replaced_values[col] != np.inf, replaced_values["opexp_total"]
    )

In [30]:
display(
    no_zero_rows.describe(),
    replaced_values.describe()
)

no_zero_rows.describe():

| | upt | vrh | vrm | opexp_total | opex_per_vrh | opex_per_vrm | upt_per_vrh | upt_per_vrm | opex_per_upt |
|---|---|---|---|---|---|---|---|---|---|
| count | 352.0 | 352.0 | 352.0 | 352.0 | 352.0 | 352.0 | 352.0 | 352.0 | 352.0 |
| mean | 15805240.0 | 690077.3 | 10423940.0 | 132068500.0 | 152.105455 | 12.022585 | 10.053494 | 0.817472 | 28.689886 |
| std | 83973620.0 | 2525418.0 | 36336960.0 | 530703100.0 | 211.948732 | 21.986489 | 14.329247 | 1.609922 | 23.689859 |
| min | 3646.0 | 1125.0 | 13752.0 | 58055.0 | 21.93 | 0.52 | 0.62 | 0.05 | 1.86 |
| 25% | 119541.0 | 33492.75 | 373068.5 | 3169014.0 | 83.9775 | 6.0375 | 2.59 | 0.19 | 10.1575 |
| 50% | 443313.0 | 93763.5 | 1420240.0 | 10139650.0 | 114.285 | 8.675 | 5.355 | 0.375 | 18.705 |
| 75% | 3039104.0 | 363465.2 | 5522744.0 | 49878800.0 | 149.925 | 11.9625 | 11.2125 | 0.7825 | 43.8475 |
| max | 1311211000.0 | 37618060.0 | 435132900.0 | 7272747000.0 | 2740.98 | 327.64 | 122.01 | 18.51 | 119.07 |


replaced_values.describe():

| | upt | vrh | vrm | opexp_total | opex_per_vrh | opex_per_vrm | upt_per_vrh | upt_per_vrm | opex_per_upt |
|---|---|---|---|---|---|---|---|---|---|
| count | 361.0 | 361.0 | 361.0 | 361.0 | 361.0 | 361.0 | 361.0 | 361.0 | 361.0 |
| mean | 15411200.0 | 672873.1 | 10164060.0 | 128784400.0 | 8633.119 | 8496.529 | 9.802853 | 0.797091 | 8512.781 |
| std | 82954020.0 | 2495977.0 | 35916770.0 | 524430800.0 | 129140.7 | 129149.6 | 14.235798 | 1.594786 | 129148.5 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 110958.0 | 30176.0 | 330090.0 | 2926879.0 | 83.18 | 5.97 | 2.49 | 0.18 | 10.1 |
| 50% | 420382.0 | 88710.0 | 1357521.0 | 9127496.0 | 114.5 | 8.69 | 5.24 | 0.37 | 18.8 |
| 75% | 2999062.0 | 353327.0 | 5283896.0 | 44245040.0 | 151.41 | 12.12 | 11.06 | 0.77 | 45.02 |
| max | 1311211000.0 | 37618060.0 | 435132900.0 | 7272747000.0 | 2396004.0 | 2396004.0 | 122.01 | 18.51 | 2396004.0 |


In [31]:
# agency-mode combinations in the bottom 5% of each metric
bottom_metrics = {
    "bottom_opex_vrh":"opex_per_vrh",
    "bottom_opex_vrm":"opex_per_vrm",
    "bottom_opex_upt":"opex_per_upt",
    "bottom_upt_vrh":"upt_per_vrh",
    "bottom_upt_vrm":"upt_per_vrm"
}
bottom_5 = {}
for k,v in bottom_metrics.items():
    bottom_5[k] = no_zero_rows[no_zero_rows[v] <= no_zero_rows[v].quantile(0.05)]
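
The cell above uses `quantile(0.05)` and keeps the lowest values. For the `opex_per_*` metrics, the lowest values are the cheapest combinations, which are mostly vanpools in the output below. If "bottom 5%" is meant as worst performing, the cost metrics would need the upper tail instead. The request also mentions a two-standard-deviation rule. The sketch below supports both rules through a hypothetical helper, `flag_extreme`, with assumed `higher_is_worse` and `method` parameters. It runs on a small made-up, right-skewed series.

```python
import pandas as pd


def flag_extreme(df, metric, higher_is_worse=False, method="quantile"):
    """Hypothetical helper: return the rows in the 'worse' tail of `metric`.

    method="quantile" uses the 5th (or 95th) percentile as the cutoff;
    method="zscore" uses mean -/+ 2 standard deviations.
    """
    values = df[metric]
    if method == "quantile":
        cutoff = values.quantile(0.95 if higher_is_worse else 0.05)
    elif method == "zscore":
        offset = 2 * values.std()
        cutoff = values.mean() + offset if higher_is_worse else values.mean() - offset
    else:
        raise ValueError(f"unknown method: {method!r}")
    # For cost metrics the worst values are the highest; for ridership, the lowest
    mask = values >= cutoff if higher_is_worse else values <= cutoff
    return df[mask]


# Hypothetical right-skewed cost metric: 1.0 .. 19.0 plus one outlier at 200.0
toy = pd.DataFrame({"opex_per_upt": [float(x) for x in range(1, 20)] + [200.0]})

print(len(flag_extreme(toy, "opex_per_upt", higher_is_worse=True)))                   # 1 (the 200.0 row)
print(len(flag_extreme(toy, "opex_per_upt", higher_is_worse=True, method="zscore")))  # 1
print(len(flag_extreme(toy, "opex_per_upt")))                                         # 1 (the 1.0 row)
print(len(flag_extreme(toy, "opex_per_upt", method="zscore")))                        # 0
```

On skewed data, the mean minus two SDs can fall below zero and flag nothing. The percentile cutoff always flags roughly 5% of rows. Which rule better fits the request is a judgment call.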

In [63]:
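# display each metric's bottom-5% rows, sorted from lowest to highest value
for k, df_bottom in bottom_5.items():
    metric = bottom_metrics[k]
    print(f"\nDataset: {k}")
    display(df_bottom[["agency_name", "mode", metric]].sort_values(by=metric))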
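
For discussion, it may help to stack the per-metric lists into one long table. You can then count how many metrics flag each agency-mode combination. The sketch below uses small hypothetical frames in place of `bottom_5` and `bottom_metrics`. The agency names are placeholders.

```python
import pandas as pd

# Hypothetical stand-ins for bottom_metrics and bottom_5 from the cells above
bottom_metrics = {"bottom_upt_vrh": "upt_per_vrh", "bottom_upt_vrm": "upt_per_vrm"}
bottom_5 = {
    "bottom_upt_vrh": pd.DataFrame({
        "agency_name": ["Agency A", "Agency B"],
        "mode": ["Bus", "Demand Response"],
        "upt_per_vrh": [1.11, 1.38],
    }),
    "bottom_upt_vrm": pd.DataFrame({
        "agency_name": ["Agency A"],
        "mode": ["Bus"],
        "upt_per_vrm": [0.05],
    }),
}

# Stack every per-metric list into one long table: agency_name, mode, value, metric
frames = [
    df_bottom[["agency_name", "mode", bottom_metrics[key]]]
    .rename(columns={bottom_metrics[key]: "value"})
    .assign(metric=bottom_metrics[key])
    for key, df_bottom in bottom_5.items()
]
combined = pd.concat(frames, ignore_index=True)

# Number of metrics on which each agency-mode combination is flagged
flag_counts = (
    combined.groupby(["agency_name", "mode"]).size().sort_values(ascending=False)
)
print(combined)
print(flag_counts)
```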


Dataset: bottom_opex_vrh

| | agency_name | mode | opex_per_vrh |
|---|---|---|---|
| 289 | San Diego Association of Governments (SANDAG) ... | Vanpool | 21.93 |
| 243 | Los Angeles County Metropolitan Transportation... | Vanpool | 23.2 |
| 301 | San Joaquin Council (SJCOG) | Vanpool | 25.23 |
| 248 | Metropolitan Transportation Commission (MTC) -... | Vanpool | 25.88 |
| 328 | Stanislaus Council of Governments (StanCOG) - ... | Vanpool | 26.15 |
| 288 | San Bernardino County Transportation Authority... | Vanpool | 30.96 |
| 352 | Victor Valley Transit Authority (VVTA) | Vanpool | 33.79 |
| 279 | Riverside County Transportation Commission (RCTC) | Vanpool | 34.09 |
| 115 | City of Malibu - Community Services Department | Demand Response Taxi | 42.6 |
| 194 | County of Placer (PCT/TART) - Department of Pu... | Vanpool | 56.91 |



Dataset: bottom_opex_vrm

| | agency_name | mode | opex_per_vrm |
|---|---|---|---|
| 289 | San Diego Association of Governments (SANDAG) ... | Vanpool | 0.52 |
| 328 | Stanislaus Council of Governments (StanCOG) - ... | Vanpool | 0.54 |
| 248 | Metropolitan Transportation Commission (MTC) -... | Vanpool | 0.56 |
| 243 | Los Angeles County Metropolitan Transportation... | Vanpool | 0.58 |
| 301 | San Joaquin Council (SJCOG) | Vanpool | 0.59 |
| 352 | Victor Valley Transit Authority (VVTA) | Vanpool | 0.69 |
| 288 | San Bernardino County Transportation Authority... | Vanpool | 0.77 |
| 279 | Riverside County Transportation Commission (RCTC) | Vanpool | 0.86 |
| 194 | County of Placer (PCT/TART) - Department of Pu... | Vanpool | 1.29 |
| 337 | SunLine Transit Agency | Vanpool | 1.33 |



Dataset: bottom_opex_upt

| | agency_name | mode | opex_per_upt |
|---|---|---|---|
| 289 | San Diego Association of Governments (SANDAG) ... | Vanpool | 5.21 |
| 243 | Los Angeles County Metropolitan Transportation... | Vanpool | 5.75 |
| 248 | Metropolitan Transportation Commission (MTC) -... | Vanpool | 6.12 |
| 301 | San Joaquin Council (SJCOG) | Vanpool | 6.14 |
| 352 | Victor Valley Transit Authority (VVTA) | Vanpool | 6.9 |
| 328 | Stanislaus Council of Governments (StanCOG) - ... | Vanpool | 7.4 |
| 288 | San Bernardino County Transportation Authority... | Vanpool | 8.07 |
| 279 | Riverside County Transportation Commission (RCTC) | Vanpool | 8.18 |
| 194 | County of Placer (PCT/TART) - Department of Pu... | Vanpool | 13.13 |
| 337 | SunLine Transit Agency | Vanpool | 16.24 |



Dataset: bottom_upt_vrh

| | agency_name | mode | upt_per_vrh |
|---|---|---|---|
| 76 | City of Escalon - Transit Services | Bus | 1.11 |
| 357 | Yolo County Transportation District (YCTD) | Demand Response | 1.38 |
| 195 | County of Sacramento Municipal Services Agency... | Bus | 1.42 |
| 315 | Santa Clara Valley Transportation Authority (VTA) | Demand Response | 1.44 |
| 213 | Golden Gate Bridge, Highway and Transportation... | Demand Response | 1.62 |
| 258 | North County Transit District (NCTD) | Demand Response | 1.73 |
| 0 | Access Services (AS) | Demand Response | 1.77 |
| 216 | Imperial County Transportation Commission (ICTC) | Demand Response | 1.8 |
| 283 | Riverside Transit Agency (RTA) | Demand Response Taxi | 1.81 |
| 115 | City of Malibu - Community Services Department | Demand Response Taxi | 1.84 |



Dataset: bottom_upt_vrm

| | agency_name | mode | upt_per_vrm |
|---|---|---|---|
| 76 | City of Escalon - Transit Services | Bus | 0.05 |
| 195 | County of Sacramento Municipal Services Agency... | Bus | 0.05 |
| 328 | Stanislaus Council of Governments (StanCOG) - ... | Vanpool | 0.07 |
| 283 | Riverside Transit Agency (RTA) | Demand Response Taxi | 0.07 |
| 357 | Yolo County Transportation District (YCTD) | Demand Response | 0.08 |
| 337 | SunLine Transit Agency | Vanpool | 0.08 |
| 184 | City of Visalia (VT) - Transportation | Commuter Bus | 0.08 |
| 216 | Imperial County Transportation Commission (ICTC) | Demand Response | 0.08 |
| 213 | Golden Gate Bridge, Highway and Transportation... | Demand Response | 0.09 |
| 244 | Madera County - Public Works Department | Commuter Bus | 0.09 |
