# [Research Task - Create visuals for PUC 99314.11 leg report](https://github.com/cal-itp/data-analyses/issues/1656)
1. line graph of each metric (UPT, VRM, PMT) by agency
- x-axis is year
- y-axis is metric
- each line is an agency
- dotted line is average metric for all agencies in the year

2. line graph of each metric, by district
- similar to above
- each line is a district
- dotted line is average metric for all districts in the year

3. line graph of each metric, by mode
- similar to above
- each line is a mode
- dotted line is average metric for all modes in the year

Maybe try a box plot to show min/max/median for each metric?

## NTD Policy Manual for collecting UPT and PMT

### NTD Full Reporting Policy Manual 
However, FTA recognizes that certain statistics are challenging to collect and can drastically increase the reporting burden for transit agencies. To assist reporters who would find conducting 100 percent count burdensome, `transit agencies may estimate Unlinked Passenger Trips (UPT) and PMT through sampling`. The NTD provides a sampling method and sampling guidance on the NTD website.

### NTD Full Reporting Policy Manual & NTD Reduced Reporting Policy Manual
**Collecting Service Consumed Data**

Transit agencies must report actual data on the Annual Report for all service data except UPT and PMT. `Only Full Reporters report PMT data to the NTD.` For these two data points, agencies may provide an estimate but only if the actual 100 percent data are not reliably collected and routinely processed.

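Since only Full Reporters submit PMT, a zero or missing `total_pmt` for any other reporter type probably means "not reported" rather than "no passenger miles". A minimal sketch, assuming `reporter_type` uses the exact label `"Full Reporter"` (as in the data loaded below), that blanks out PMT for every other reporter type before averaging. `mask_unreported_pmt` is a hypothetical helper and the toy rows are made up:

```python
import numpy as np
import pandas as pd


def mask_unreported_pmt(df: pd.DataFrame, pmt_col: str = "total_pmt") -> pd.DataFrame:
    """Return a copy of df with PMT set to NaN for rows not filed by a Full Reporter."""
    out = df.copy()
    # assumption: "Full Reporter" is the exact reporter_type label, as in this notebook's data
    not_full = out["reporter_type"] != "Full Reporter"
    out.loc[not_full, pmt_col] = np.nan
    return out


# Toy data with made-up values, not real NTD records
toy = pd.DataFrame({
    "reporter_type": ["Full Reporter", "Reduced Reporter", "Full Reporter"],
    "total_pmt": [1000.0, 0.0, 500.0],
})
masked = mask_unreported_pmt(toy)
print(masked["total_pmt"].mean())  # 750.0: the non-Full Reporter zero no longer drags the mean down
```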


In [1]:
from functools import cache

import altair as alt
import pandas as pd
from calitp_data_analysis.gcs_pandas import GCSPandas
from calitp_data_analysis.sql import get_engine, to_snakecase, query_sql


@cache
def gcs_pandas():
    # create the GCSPandas reader once and reuse it on every call
    return GCSPandas()

pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
pd.options.display.float_format = '{:,.2f}'.format

## Data querying, comparing, cleaning

### warehouse query

In [None]:
# metric_list = [
#     "pmt",
#     "upt",
#     "vrh",
#     # "opexp_total" # not needed for this project
# ]

# # empty list for appending DFs
# df_list = []

# for metric in metric_list:
#         query = f"""
#         SELECT
#           ntd_id,
#           source_agency,
#           agency_status,
#           primary_uza_name,
#           uza_population,
#           uza_area_sq_miles,
#           year,
#           mode,
#           type_of_service,
#           reporter_type,
#           SUM({metric}) AS total_{metric},
#         FROM
#           `cal-itp-data-infra.mart_ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_{metric}`
#         WHERE
#           source_state = "CA"
#           AND year BETWEEN 2018 AND 2023
#         GROUP BY
#           ntd_id,
#           source_agency,
#           agency_status,
#           primary_uza_name,
#           uza_population,
#           uza_area_sq_miles,
#           year,
#           mode,
#           type_of_service,
#           reporter_type
#         """
#         # create df (named metric_df so it does not overwrite the loop variable)
#         metric_df = query_sql(query, as_df=True)

#         # append df to list
#         df_list.append(metric_df)

# # unpack list into separate DFs
# ntd_pmt, ntd_upt, ntd_vrh = df_list

In [None]:
# get the Caltrans district for each NTD ID
# the query does not depend on the metric, so it only needs to run once
# note: the _is_current filters on the joined tables in WHERE make the LEFT JOINs behave like inner joins

# query = """
# SELECT
#   `mart_transit_database.dim_organizations`.`key` AS `key`,
#   `mart_transit_database.dim_organizations`.`source_record_id` AS `source_record_id`,
#   `mart_transit_database.dim_organizations`.`name` AS `name`,
#   `mart_transit_database.dim_organizations`.`ntd_id_2022` AS `ntd_id_2022`,
#   `Bridge_Organizations_X_Headquarters_County_Geography___Key`.`county_geography_name` AS `county`,
#   `Dim_County_Geography___County_Geography_Key`.`caltrans_district` AS `caltrans_district`
# FROM
#   `mart_transit_database.dim_organizations`
# LEFT JOIN `mart_transit_database.bridge_organizations_x_headquarters_county_geography` AS `Bridge_Organizations_X_Headquarters_County_Geography___Key`
#   ON `mart_transit_database.dim_organizations`.`key` = `Bridge_Organizations_X_Headquarters_County_Geography___Key`.`organization_key`
# LEFT JOIN `mart_transit_database.dim_county_geography` AS `Dim_County_Geography___County_Geography_Key`
#   ON `Bridge_Organizations_X_Headquarters_County_Geography___Key`.`county_geography_key` = `Dim_County_Geography___County_Geography_Key`.`key`
# WHERE
#   `mart_transit_database.dim_organizations`.`_is_current` = TRUE
#   AND `mart_transit_database.dim_organizations`.`ntd_id_2022` IS NOT NULL
#   AND `mart_transit_database.dim_organizations`.`ntd_id_2022` <> ''
#   AND `Bridge_Organizations_X_Headquarters_County_Geography___Key`.`_is_current` = TRUE
#   AND `Dim_County_Geography___County_Geography_Key`.`_is_current` = TRUE
# """
# # create df
# ntd_id_x_district = query_sql(query, as_df=True)

# ntd_id_x_district["caltrans_district"] = ntd_id_x_district["caltrans_district"].astype("str")

In [None]:
# merge_on_col = [
#     "ntd_id",
#     "year",
#     "source_agency",
#     "agency_status",
#     "primary_uza_name",
#     "uza_population",
#     "uza_area_sq_miles",
#     "mode",
#     "type_of_service",
#     "reporter_type",
# ]

# merge_1 = ntd_vrh.merge(ntd_upt, on=merge_on_col, how="inner")
# # merge_2 = merge_1.merge(ntd_vrh, on=merge_on_col, how = "inner")

# ntd_metrics_merge = merge_1.merge(ntd_pmt, on=merge_on_col, how="inner")
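The chained merges above can be written for any number of per-metric frames with `functools.reduce`. A sketch with made-up toy frames standing in for `ntd_vrh`, `ntd_upt` and `ntd_pmt`; `merge_metric_frames` is a hypothetical name:

```python
from functools import reduce

import pandas as pd


def merge_metric_frames(frames, keys, how="inner"):
    """Merge a list of per-metric DataFrames on shared key columns, left to right."""
    return reduce(lambda left, right: left.merge(right, on=keys, how=how), frames)


# Toy frames (made-up values) standing in for ntd_vrh, ntd_upt, ntd_pmt
keys = ["ntd_id", "year"]
vrh = pd.DataFrame({"ntd_id": ["1", "2"], "year": [2019, 2019], "total_vrh": [10.0, 20.0]})
upt = pd.DataFrame({"ntd_id": ["1", "2"], "year": [2019, 2019], "total_upt": [100.0, 200.0]})
pmt = pd.DataFrame({"ntd_id": ["1"], "year": [2019], "total_pmt": [900.0]})

merged = merge_metric_frames([vrh, upt, pmt], keys)
print(merged)
```

With `how="inner"`, agency `2` drops out because it has no PMT row. Since only Full Reporters report PMT, an inner merge against the PMT frame could silently drop other reporters if the PMT table omits them, so comparing row counts against a `how="left"` merge may be a useful check.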

### data from other report

In [None]:
# gcs_path = "gs://calitp-analytics-data/data-analyses/ntd/"
# ntd_name = "ntd_operator_data_18_23.parquet"

# ntd_all_metrics = pd.read_parquet(f"{gcs_path}{ntd_name}")

### compare datasets

In [None]:
# display(
#     ntd_all_metrics.info(), ntd_metrics_merge.info()  # mode/service is aggregated up
# )

In [None]:
# display(
#     ntd_all_metrics["ntd_id"].nunique()
#     == ntd_metrics_merge["ntd_id"].nunique(),  # TRUE, same count of unique values
#     set(ntd_all_metrics["ntd_id"].unique())
#     == set(ntd_metrics_merge["ntd_id"].unique()),  # TRUE, same unique NTD_IDs
# )

In [None]:
# display(
#     ntd_all_metrics["ntd_id"].nunique(),
#     ntd_metrics_merge["ntd_id"].nunique()
# )

In [None]:
# metric_cols = ["total_upt", "total_vrh", "total_pmt"]

# for metric in metric_cols:
#     print(
#         ntd_all_metrics[metric].sum() == ntd_metrics_merge[metric].sum()
#     )  # should print True for each metric if the two datasets match

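Comparing floating-point sums with `==` can return `False` for differences caused only by summation order (for example, `0.1 + 0.2 == 0.3` is `False` in Python). A sketch of a tolerance-based version of the check above using `numpy.isclose`; `compare_metric_sums` and the toy frames are hypothetical:

```python
import numpy as np
import pandas as pd


def compare_metric_sums(a: pd.DataFrame, b: pd.DataFrame, cols) -> dict:
    """Return {column: bool} telling whether each column's total matches within tolerance."""
    return {col: bool(np.isclose(a[col].sum(), b[col].sum())) for col in cols}


# Toy frames with made-up values
a = pd.DataFrame({"total_upt": [0.1, 0.2], "total_pmt": [1.0, 2.0]})
b = pd.DataFrame({"total_upt": [0.3], "total_pmt": [3.5]})
print(compare_metric_sums(a, b, ["total_upt", "total_pmt"]))
# {'total_upt': True, 'total_pmt': False}
```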
### merge in the district numbers to ntd_metric_merge

In [None]:
# ntd_metrics_merge = ntd_metrics_merge.merge(
#     ntd_id_x_district[["ntd_id_2022","county","caltrans_district"]],
#     left_on = "ntd_id",
#     right_on = "ntd_id_2022",
#     how="inner",
#     indicator=True
# )

In [None]:
# ntd_metrics_merge[ntd_metrics_merge["caltrans_district"].isna()].head()
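Because the merge above uses `how="inner"`, every surviving row has `_merge == "both"`, so the indicator cannot show which NTD IDs failed to find a district, and the `isna()` check only catches districts that are blank in the lookup itself. A sketch using a left merge to list unmatched IDs, with made-up toy frames standing in for `ntd_metrics_merge` and `ntd_id_x_district`:

```python
import pandas as pd

# Toy frames with made-up values
metrics = pd.DataFrame({"ntd_id": ["90003", "90004", "99999"], "total_upt": [1.0, 2.0, 3.0]})
districts = pd.DataFrame({"ntd_id_2022": ["90003", "90004"], "caltrans_district": ["4", "4"]})

checked = metrics.merge(
    districts,
    left_on="ntd_id",
    right_on="ntd_id_2022",
    how="left",
    indicator=True,
)

# "left_only" rows have no district match and would be dropped by an inner merge
unmatched = checked.loc[checked["_merge"] == "left_only", "ntd_id"].tolist()
print(unmatched)  # ['99999']
```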

## save out data

In [2]:
gcs_path = "gs://calitp-analytics-data/data-analyses/ntd/"
# ntd_metrics_merge.to_parquet(f"{gcs_path}puc_analysis_data_2025_12_9.parquet")

### read in cleaned data

In [None]:
# ntd_metrics_merge = gcs_pandas().read_parquet(f"{gcs_path}puc_analysis_data.parquet") #puc_analysis_data.parquet is initial analysis data
# ntd_metrics_merge.info()

**Everything matches; moving forward with `ntd_metrics_merge` since it has the mode/service columns.**

In [None]:
# cohort_merge_filename = "ntd_cohort_data_2026-01-26.parquet"
# ntd_cohort_merge = gcs_pandas().read_parquet(f"{gcs_path}{cohort_merge_filename}")

In [3]:
yes_no_data = gcs_pandas().read_parquet(f"{gcs_path}ntd_yes_no_data_2026-01-29.parquet")

yes_no_data.columns = yes_no_data.columns.str.lower()
# yes_no_data[["year","ntd_id"]] = yes_no_data[["year","ntd_id"]].astype("str")
# yes_no_data = yes_no_data.rename(columns={"requirement_flag":"requirement_met_flag"})
display(
    yes_no_data.info(),
    yes_no_data.head(3)
)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4965 entries, 0 to 4964
Data columns (total 25 columns):
 #   Column                Non-Null Count  Dtype   
---  ------                --------------  -----   
 0   ntd_id                4965 non-null   object  
 1   source_agency         4965 non-null   object  
 2   agency_status         4965 non-null   object  
 3   primary_uza_name      3704 non-null   object  
 4   uza_population        4965 non-null   int64   
 5   uza_area_sq_miles     4965 non-null   float64 
 6   year                  4965 non-null   object  
 7   mode                  4965 non-null   object  
 8   type_of_service       4965 non-null   object  
 9   reporter_type         4965 non-null   object  
 10  total_vrh             3964 non-null   float64 
 11  total_upt             3964 non-null   float64 
 12  total_pmt             2432 non-null   float64 
 13  ntd_id_2022           4965 non-null   object  
 14  county                4965 non-null   object  
 15  calt

None

|  | ntd_id | source_agency | agency_status | primary_uza_name | uza_population | uza_area_sq_miles | year | mode | type_of_service | reporter_type | total_vrh | total_upt | total_pmt | ntd_id_2022 | county | caltrans_district | ntd_entity_name | area_type | metric | quartile | metric_short | metric_value | requirement | requirement_met_flag | _merge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90003 | San Francisco Bay Area Rapid Transit District (BART) | Active | San Francisco--Oakland, CA | 3515933 | 513.8 | 2019 | HR | DO | Full Reporter | 2225056.0 | 125105460.0 | 1756364558.0 | 90003 | San Francisco | 4 | San Francisco Bay Area Rapid Transit District | Urban | Farebox Recovery Ratio | Top 25% | FBR | 63.14 | Met FBR Min | True | both |
| 1 | 90003 | San Francisco Bay Area Rapid Transit District (BART) | Active | San Francisco--Oakland, CA | 3515933 | 513.8 | 2019 | HR | DO | Full Reporter | 2225056.0 | 125105460.0 | 1756364558.0 | 90003 | San Francisco | 4 | San Francisco Bay Area Rapid Transit District | Urban | Local Funding % Change vs 2019 | Middle 50% | Pct_Change_vs_2019 | 0.01 | Maintained_or_Increased_vs_2019 | True | both |
| 2 | 90003 | San Francisco Bay Area Rapid Transit District (BART) | Active | San Francisco--Oakland, CA | 3515933 | 513.8 | 2020 | MB | PT | Full Reporter |  |  |  | 90003 | San Francisco | 4 | San Francisco Bay Area Rapid Transit District | Urban | Farebox Recovery Ratio | Top 25% | FBR | 42.2 | Met FBR Min | True | both |


In [None]:
### save cleaned data to csv
# ntd_metrics_merge.to_csv(f"{gcs_path}puc_analysis_data_2025_12_9.csv")

## Group aggregations

In [None]:
# melt the wide DF so the three metric columns stack into one "metric" / "metric_value" pair
group_list_melt = [
    "source_agency",
    "year",
    "ntd_id",
    "reporter_type",
    "caltrans_district",
    "mode",
    "type_of_service",
]

value_cols = ["total_upt", "total_vrh", "total_pmt"]

melt = pd.melt(
    ntd_metrics_merge,
    id_vars=group_list_melt,
    value_vars=value_cols,
    var_name="metric",
    value_name="metric_value",
    ignore_index=True,
)

In [None]:
### save melted data to csv
# melt.to_csv(f"{gcs_path}puc_analysis_data_melt_2025_12_9.csv")

In [None]:
# What does grouping/aggregating the melted DF look like?
group_list_agg = [
    "source_agency",
    "year",
    "ntd_id",
    "reporter_type",
    "caltrans_district",
]
vrh_total = (
    melt[melt["metric"] == "total_vrh"]
    .groupby(group_list_agg)["metric_value"]
    .sum()
    .reset_index()
).rename(columns={"metric_value": "total_vrh"})

upt_total = (
    melt[melt["metric"] == "total_upt"]
    .groupby(group_list_agg)["metric_value"]
    .sum()
    .reset_index()
).rename(columns={"metric_value": "total_upt"})

passenger_total = (
    melt[melt["metric"] == "total_pmt"]
    .groupby(group_list_agg)["metric_value"]
    .sum()
    .reset_index()
).rename(columns={"metric_value": "total_pmt"})

yearly_totals = (
    ntd_metrics_merge.groupby(["year"])
    .agg({"total_upt": "sum", "total_vrh": "sum", "total_pmt": "sum"})
    .reset_index()
) 

agency_totals = (
    ntd_metrics_merge.groupby(["year","source_agency"])
    .agg({"total_upt": "sum", "total_vrh": "sum", "total_pmt": "sum"})
    .reset_index()
)

district_totals = (
    ntd_metrics_merge.groupby(["caltrans_district","year"])
    .agg({"total_upt": "sum", "total_vrh": "sum", "total_pmt": "sum"})
    .reset_index()
)

mode_totals = (
    ntd_metrics_merge.groupby(["mode","year"])
    .agg({"total_upt": "sum", "total_vrh": "sum", "total_pmt": "sum"})
    .reset_index()
)
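The three near-identical `groupby` blocks on `melt` can be collapsed into one pass by also grouping on `metric` and unstacking it back into columns. A sketch on a made-up long frame with the same column names; one difference from the separate blocks is that an agency missing a metric gets `NaN` in that column instead of a missing row:

```python
import pandas as pd

# Toy long frame (made-up values) shaped like `melt`
melt_toy = pd.DataFrame({
    "source_agency": ["A", "A", "A", "B", "B", "B"],
    "year": ["2019"] * 6,
    "metric": ["total_upt", "total_vrh", "total_pmt"] * 2,
    "metric_value": [10.0, 1.0, 100.0, 20.0, 2.0, 200.0],
})

keys = ["source_agency", "year"]
wide_totals = (
    melt_toy.groupby(keys + ["metric"])["metric_value"]
    .sum()
    .unstack("metric")  # one column per metric
    .reset_index()
)
wide_totals.columns.name = None  # drop the leftover "metric" axis label
print(wide_totals)
```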

In [None]:
# how many rows have zero PMT?
len(passenger_total[passenger_total["total_pmt"] == 0])
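Some of these zeros may be an aggregation artifact rather than reported zeros: `groupby(...).sum()` returns `0` for a group whose values are all `NaN`, and `melt` keeps the `NaN` PMT values of agencies that do not report PMT. Passing `min_count=1` keeps those groups as `NaN`. A toy illustration with made-up values:

```python
import numpy as np
import pandas as pd

# Toy data: agency A never reported PMT, agency B reported a zero and a five
toy = pd.DataFrame({
    "source_agency": ["A", "A", "B", "B"],
    "total_pmt": [np.nan, np.nan, 0.0, 5.0],
})

default_sum = toy.groupby("source_agency")["total_pmt"].sum()            # all-NaN group becomes 0
strict_sum = toy.groupby("source_agency")["total_pmt"].sum(min_count=1)  # all-NaN group stays NaN

print(default_sum["A"], strict_sum["A"])  # 0.0 nan
```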

### chart function (mean line currently commented out)

In [None]:
# scratch check of the tooltip-building logic used in make_chart
y_col = "1"
x_col = "2"
color_col = "3red"
tooltip = [y_col, x_col]
if color_col:
    tooltip.append(color_col)
tooltip

In [None]:
type(tooltip)

In [13]:
def make_chart(data, x_col, y_col, title, color_col=None):
    """Interactive line chart of y_col over x_col, with one line per color_col group if given."""
    tooltip_list = [y_col, x_col]
    if color_col:
        tooltip_list.append(color_col)

    chart = (
        alt.Chart(data)
        .mark_line(point=True)
        .encode(
            x=alt.X(x_col, axis=alt.Axis(labelFontSize=15, titleFontSize=15)),
            y=alt.Y(f"{y_col}:Q", title=y_col, axis=alt.Axis(labelFontSize=15, titleFontSize=15)),
            tooltip=tooltip_list,
            # one colored line per group when color_col is given, a single line otherwise
            color=(
                alt.Color(color_col, legend=alt.Legend(labelFontSize=15, titleFontSize=15))
                if color_col
                else alt.Undefined
            ),
        )
        .properties(title=title, height=500, width=700)
        .interactive()
    )

    # line for average
    # baseline= pd.DataFrame({
    # "baseline":[data[y_col].mean()]
    # })
    
    # line = (
    #     alt.Chart(baseline)
    #     .mark_rule(color = "red", strokeWidth=5, strokeDash=[10, 5], point=True,)
    #     .encode(
    #         y=alt.Y(f"baseline:Q",axis=alt.Axis(format=",.0f", orient="left")),
    #         tooltip=[alt.Tooltip(f"baseline")],
    #          color=alt.Color("baseline", 
    #                          legend = alt.Legend(title="baseline", labelFontSize=15, titleFontSize=15)
    #                         )
    #         )
        
    # )


    combo = alt.layer(
        chart, 
        # line
    ).resolve_scale(y="shared")

    return display(combo)
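The planning notes at the top ask for a dotted line showing each year's average across agencies (or districts, or modes), but the baseline layer in `make_chart` is commented out, and as written it would draw one overall mean (`data[y_col].mean()`) rather than one value per year. A pandas-only sketch that appends the per-year average as its own series; `add_yearly_average`, the `is_average` flag and the `"All (mean)"` label are hypothetical names, and plain column names (not Altair shorthand such as `year:N`) are assumed:

```python
import pandas as pd


def add_yearly_average(df, x_col, y_col, color_col, label="All (mean)"):
    """Append one extra series holding the mean of y_col for each x value.

    The hypothetical is_average column marks the appended rows so a chart
    could style them differently (e.g. dashed).
    """
    avg = df.groupby(x_col, as_index=False)[y_col].mean()
    avg[color_col] = label
    return pd.concat(
        [df.assign(is_average=False), avg.assign(is_average=True)],
        ignore_index=True,
    )


# Toy data with made-up values
agency_toy = pd.DataFrame({
    "year": ["2019", "2019", "2020", "2020"],
    "source_agency": ["A", "B", "A", "B"],
    "total_upt": [10.0, 30.0, 4.0, 8.0],
})
with_avg = add_yearly_average(agency_toy, "year", "total_upt", "source_agency")
print(with_avg[with_avg["is_average"]])
```

Passing the result to `make_chart` would draw the average as one more colored line. Making it dashed would need an extra encoding, for example mapping `is_average` to Altair's `strokeDash` channel if the installed version supports it, which `make_chart` does not currently set.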

## Overall Totals

### Metric grand total per year

In [None]:
for col in yearly_totals.columns[1:]:
    yearly_avg = format(yearly_totals[col].mean(), ",.2f")

    print(f"\nAverage {col} per year: {yearly_avg}")
    make_chart(
        data = yearly_totals, 
        y_col = col,
        x_col = "year:N",
        title = f"Grand Total {col} per year",
    )


#### Box plot of agency totals for each metric, per year

In [None]:
all_totals_dict = {
    "total_vrh": vrh_total,
    "total_upt": upt_total,
    "total_pmt": passenger_total,
}

# Boxplot
# removing zero-values to see what happens
for col, df in all_totals_dict.items():
    box_plot = (
        alt.Chart(df[df[col] != 0])
        .mark_boxplot(extent="min-max")
        .encode(
            x="year:N",
            y=alt.Y(col, axis=alt.Axis(format=",.0f", labelFontSize=15, titleFontSize=15)),
            # row = "reporter_type",
            tooltip=["source_agency", alt.Tooltip(col, format=",.0f"), "year"],
        )
        .interactive()
        .properties(title=col, height=800, width="container")
    )

    display(
        f"Number of Agencies that reported zero {col}: {df[df[col]==0].ntd_id.nunique()}",
        box_plot.resolve_scale(y="independent"),
    )
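Since the box plots use `extent="min-max"`, the whiskers are the minimum and maximum. A sketch that computes the same five numbers per year with pandas (again dropping zeros), which can be handy for quoting values in the report. `five_number_summary` and the toy data are hypothetical, and Vega-Lite's quartile method may not match pandas' default linear interpolation exactly:

```python
import pandas as pd


def five_number_summary(df, value_col, by="year"):
    """Min, quartiles, median and max of value_col per group, ignoring zeros."""
    nonzero = df[df[value_col] != 0]
    return nonzero.groupby(by)[value_col].describe()[["min", "25%", "50%", "75%", "max"]]


# Toy data with made-up values
toy = pd.DataFrame({
    "year": ["2019"] * 5 + ["2020"] * 3,
    "total_upt": [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0],
})
summary = five_number_summary(toy, "total_upt")
print(summary)
```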

### Metrics grand total by district, per year

In [None]:
for col in district_totals.columns[2:]:
    district_avg = format(district_totals[col].mean(), ",.2f")

    print(f"\nAverage {col} per district per year: {district_avg}")
    make_chart(
        data = district_totals.astype({"caltrans_district":"int"}),
        y_col = col,
        x_col = "year:N",
        color_col = "caltrans_district:N",
        title = f"{col} by district per year"
    )

#### Box Plot of metric per district

In [None]:
# Boxplot
# removing zero-values to see what happens
for col in district_totals.columns[2:]:
    box_plot = (
        alt.Chart(district_totals[district_totals[col] != 0])
        .mark_boxplot(extent="min-max")
        .encode(
            x="caltrans_district:N",
            y=col,
            # row = "reporter_type",
            tooltip=[col, "year"],
        )
        .interactive()
        .properties(title=f"Box Plot of {col} per district", height=200, width=1000)
    )

    display(
        f"\nNumber of rows that reported zero {col}: {district_totals[district_totals[col]==0][col].count()}",
        box_plot.resolve_scale(y="independent"),
    )

### Metrics grand total by agency, per year

In [None]:
for col in agency_totals.columns[2:]:
    agency_avg = format(agency_totals[col].mean(), ",.2f")

    print(f"\nAverage {col} per agency per year: {agency_avg}")
    make_chart(
        data = agency_totals,
        y_col = col,
        x_col = "year:N",
        color_col = "source_agency:N",
        title = f"{col} per agency by year"
    )

### Metrics grand total by mode, per year

In [None]:
for col in mode_totals.columns[2:]:
    mode_avg = format(mode_totals[col].mean(),",.2f")
    
    print(f"\nAverage {col} per mode per year: {mode_avg}")
    make_chart(
        data = mode_totals,
        y_col = col,
        x_col = 'year:N',
        color_col = "mode:N",
        title = f"{col} per Mode by year",
    )
    

## Additional Comments

> “The pandemic had the most severe effects in more `urbanized Caltrans districts` (e.g., District 4: Bay Area and District 7: Los Angeles and Ventura Counties), where `unlinked passenger trips` and `passenger miles traveled` fell dramatically due to reduced commuting and widespread office closures. In `smaller districts`, ridership remained steadier, reflecting a customer base more reliant on transit for essential travel rather than commuting. 
>
>Recovery since 2021 has been uneven across the state. Although all districts have seen ridership and passenger miles rise from their pandemic lows, none have returned to `FY 2018–2019` highs. Caltrans District 7 (Los Angeles) and District 4 (Bay Area) have experienced the steepest declines and slowest recovery. Overall, `urbanized districts` drive the statewide totals, with their ridership swings dominating the overall trend. `Rural and small-agency districts`, however, exhibit much less volatility, underscoring the role of transit in those regions as an essential service rather than one tied primarily to commuting downtown cores.”
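The commentary's claim that no district has returned to its 2018-2019 highs can be checked against `district_totals`. A sketch, assuming one row per district per year and `year` values that are integers or 4-digit strings; the baseline is taken as the larger of the 2018 and 2019 values, and `recovery_vs_prepandemic` plus the toy numbers are hypothetical:

```python
import pandas as pd


def recovery_vs_prepandemic(df, value_col, group_col="caltrans_district",
                            baseline_years=("2018", "2019")):
    """Latest-year value as a percent of each group's 2018-2019 peak."""
    df = df.assign(year=df["year"].astype(str))
    baseline = (
        df[df["year"].isin(baseline_years)]
        .groupby(group_col)[value_col]
        .max()
        .rename("baseline_peak")
    )
    latest_year = df["year"].max()  # string max is fine for 4-digit years
    latest = (
        df[df["year"] == latest_year]
        .set_index(group_col)[value_col]
        .rename("latest")
    )
    out = pd.concat([baseline, latest], axis=1)
    out.index.name = group_col
    out["pct_of_peak"] = 100 * out["latest"] / out["baseline_peak"]
    return out.reset_index()


# Toy data with made-up values
toy = pd.DataFrame({
    "caltrans_district": ["4", "4", "4", "7", "7", "7"],
    "year": [2018, 2019, 2023, 2018, 2019, 2023],
    "total_upt": [100.0, 120.0, 90.0, 50.0, 40.0, 45.0],
})
result = recovery_vs_prepandemic(toy, "total_upt")
print(result)
```

Applied as `recovery_vs_prepandemic(district_totals, "total_upt")`, any `pct_of_peak` at or above 100 would contradict the "none have returned" statement.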


### District 4 and District 7 UPT / PMT 

In [None]:
for col in ["total_upt","total_pmt"]:
    make_chart(
        data = district_totals[
            district_totals["caltrans_district"].isin(["4","7"])
            ].astype({"caltrans_district":"int"}),
        y_col = col,
        x_col = "year:N",
        color_col = "caltrans_district:N",
        title = f"{col} for 'Urbanized District' (4 and 7) per year"
    )

### Remaining District UPT / PMT

In [None]:
for col in ["total_upt","total_pmt"]:
    make_chart(
        data = district_totals[~district_totals["caltrans_district"].isin(["4","7"])].astype({"caltrans_district":"int"}),
        y_col = col,
        x_col = "year:N",
        color_col = "caltrans_district:N",
        title = f"{col} for 'Rural/Smaller' Districts (1 to 12, excluding 4 and 7) per year"
    )

# Farebox / Local Funding Requirements Analysis
- Note: individual operators can move between cohorts from year to year.

In [4]:
merge_farebox = yes_no_data[yes_no_data["metric"]=="Farebox Recovery Ratio"]
merge_funding = yes_no_data[yes_no_data["metric"]=="Local Funding % Change vs 2019"]

group_list_melt = [
    "source_agency",
    "year",
    "ntd_id",
    "caltrans_district",
    "mode",
    "type_of_service",
    "area_type",
    "reporter_type",
    "quartile",
    "metric",
    "metric_value",
    "requirement",
    "requirement_met_flag"
]

value_cols = ["total_upt", "total_vrh", "total_pmt"]

melt_farebox = pd.melt(
    merge_farebox,
    id_vars=group_list_melt,
    value_vars=value_cols,
    var_name="ntd_metric",
    value_name="ntd_metric_value",
    ignore_index=True,
)

melt_funding = pd.melt(
    merge_funding,
    id_vars=group_list_melt,
    value_vars=value_cols,
    var_name="ntd_metric",
    value_name="ntd_metric_value",
    ignore_index=True,
)



In [5]:
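# NTD report years included in the comparison
years = [
    "2020",
    "2021",
    "2022",
    "2023",
    "2024"
]

# bus modes: MB = bus, CB = commuter bus, RB = bus rapid transit, TB = trolleybus
modes = [
    "MB",
    "CB",
    "RB",
    "TB"
]

# type of service: PT = purchased transportation, DO = directly operated
tos = [
    "PT",
    "DO"
]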

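The same four-part filter (area type, bus modes, years, type of service) is repeated in every pivot and chart cell below. A sketch of a helper that builds the mask once; `bus_subset` is a hypothetical name and the toy rows are made up:

```python
import pandas as pd


def bus_subset(df, area_type, modes, years, tos, ntd_metric=None):
    """Rows for one area type, restricted to the given modes, years and service types."""
    mask = (
        (df["area_type"] == area_type)
        & df["mode"].isin(modes)
        & df["year"].isin(years)
        & df["type_of_service"].isin(tos)
    )
    if ntd_metric is not None:
        mask &= df["ntd_metric"] == ntd_metric
    return df[mask]


# Toy rows with made-up values
toy = pd.DataFrame({
    "area_type": ["Urban", "Urban", "Rural"],
    "mode": ["MB", "HR", "MB"],
    "year": ["2020", "2020", "2020"],
    "type_of_service": ["DO", "DO", "PT"],
    "ntd_metric": ["total_upt"] * 3,
})
print(bus_subset(toy, "Urban", ["MB", "CB", "RB", "TB"], ["2020"], ["PT", "DO"]))
```

For example, `bus_subset(melt_farebox, "Urban", modes, years, tos)` would reproduce the filter used in the urban farebox pivots.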
## Farebox

### Urban 

In [6]:
import numpy as np

In [18]:
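# pivot tables; rows are split by requirement_met_flag, so operators that met and did not meet the requirement both appear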
urban_farebox_median_pivot_true = pd.pivot_table(
    melt_farebox[
    # (melt_farebox["requirement_met_flag"]==True)
     (melt_farebox["area_type"]=="Urban")
    & (melt_farebox["mode"].isin(modes))
    & (melt_farebox["year"].isin(years))
    & (melt_farebox["type_of_service"].isin(tos))
],
    values = "ntd_metric_value",
    index = ["metric","area_type","requirement_met_flag","ntd_metric"],
    columns =  "year",
    aggfunc = "median"
)

# urban_farebox_median_pivot_false = pd.pivot_table(
#     melt_farebox[
#     (melt_farebox["requirement_met_flag"]==False)
#     & (melt_farebox["area_type"]=="Urban")
#     & (melt_farebox["mode"].isin(modes))
#     & (melt_farebox["year"].isin(years))
#     & (melt_farebox["type_of_service"].isin(tos))
# ],
#     values = "ntd_metric_value",
#     index = ["area_type","requirement_met_flag","ntd_metric"],
#     columns =  "year",
#     aggfunc = ["median"]
# )

urban_farebox_mean_pivot_true = pd.pivot_table(
    melt_farebox[
    # (melt_farebox["requirement_met_flag"]==True)
     (melt_farebox["area_type"]=="Urban")
    & (melt_farebox["mode"].isin(modes))
    & (melt_farebox["year"].isin(years))
    & (melt_farebox["type_of_service"].isin(tos))
],
    values = "ntd_metric_value",
    index = ["metric","area_type","requirement_met_flag","ntd_metric"],
    columns =  "year",
    aggfunc = ["mean"]
)

# urban_farebox_mean_pivot_false = pd.pivot_table(
#     melt_farebox[
#     (melt_farebox["requirement_met_flag"]==False)
#     & (melt_farebox["area_type"]=="Urban")
#     & (melt_farebox["mode"].isin(modes))
#     & (melt_farebox["year"].isin(years))
#     & (melt_farebox["type_of_service"].isin(tos))
# ],
#     values = "ntd_metric_value",
#     index = ["area_type","requirement_met_flag","ntd_metric"],
#     columns =  "year",
#     aggfunc = ["mean"]
# )

display(
    """Yearly Median values for Urban Operators that met or did not meet Farebox Recovery Ratio Requirements""",
    urban_farebox_median_pivot_true,
    """
    """,
    # """Yearly Totals for Urban Operators that DID NOT MEET Farebox Recovery Ratio Requirements""",
    # urban_farebox_median_pivot_false,

    """Yearly Mean values for Urban Operators that met or did not meet Farebox Recovery Ratio Requirements""",
    urban_farebox_mean_pivot_true,
    """
    """,
    # """Yearly Averages for Urban Operators that DID NOT MEET Farebox Recovery Ratio Requirements""",
    # urban_farebox_mean_pivot_false,
)

'Yearly Median values for Urban Operators for Farebox Recovery Ratio Requirements'

| metric | area_type | requirement_met_flag | ntd_metric | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|
| Farebox Recovery Ratio | Urban | False | total_pmt | 6124791.0 | 2718913.5 | 3584092.0 | 5319410.5 | 6631894.5 |
| Farebox Recovery Ratio | Urban | False | total_upt | 537873.5 | 222843.0 | 310051.0 | 459614.0 | 519608.5 |
| Farebox Recovery Ratio | Urban | False | total_vrh | 42076.0 | 38479.0 | 41326.0 | 45071.0 | 54635.0 |
| Farebox Recovery Ratio | Urban | True | total_pmt | 10515437.0 | 7723232.0 | 11735541.0 | 10184477.5 | 11133340.0 |
| Farebox Recovery Ratio | Urban | True | total_upt | 977044.0 | 1119309.0 | 2180106.0 | 1705719.0 | 1449251.5 |
| Farebox Recovery Ratio | Urban | True | total_vrh | 71837.0 | 192663.0 | 183510.0 | 134656.0 | 115623.5 |



'Yearly Mean values for Urban Operators for Farebox Recovery Ratio Requirements'

| metric | area_type | requirement_met_flag | ntd_metric | 2020 (mean) | 2021 (mean) | 2022 (mean) | 2023 (mean) | 2024 (mean) |
|---|---|---|---|---|---|---|---|---|
| Farebox Recovery Ratio | Urban | False | total_pmt | 29597783.07 | 13994405.09 | 18956832.96 | 23349608.11 | 27239778.86 |
| Farebox Recovery Ratio | Urban | False | total_upt | 6167727.71 | 3044784.94 | 4124936.36 | 4784655.02 | 5638209.88 |
| Farebox Recovery Ratio | Urban | False | total_vrh | 232214.93 | 205616.43 | 210117.96 | 218608.7 | 246396.08 |
| Farebox Recovery Ratio | Urban | True | total_pmt | 21650859.15 | 13934291.85 | 20829574.59 | 23993587.3 | 24077104.22 |
| Farebox Recovery Ratio | Urban | True | total_upt | 4345957.69 | 5690707.85 | 6524942.04 | 6599855.32 | 7133312.23 |
| Farebox Recovery Ratio | Urban | True | total_vrh | 235200.61 | 328835.85 | 332464.22 | 288938.74 | 273955.73 |

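In the urban tables above, means run far above medians (2020 `total_upt` for operators that did not meet the requirement: a mean of about 6.17 million against a median of about 538 thousand), which suggests a handful of very large operators pull the averages up. A sketch that puts both statistics and their ratio in one pivot, on made-up toy data with the same column names:

```python
import pandas as pd

# Toy data with made-up values; one large operator in the False group
toy = pd.DataFrame({
    "requirement_met_flag": [False, False, False, True, True],
    "ntd_metric": ["total_upt"] * 5,
    "year": ["2020"] * 5,
    "ntd_metric_value": [1.0, 2.0, 30.0, 4.0, 6.0],
})

stats = pd.pivot_table(
    toy,
    values="ntd_metric_value",
    index=["requirement_met_flag", "ntd_metric", "year"],
    aggfunc=["median", "mean"],
)
stats.columns = ["median", "mean"]  # flatten the (aggfunc, value) column MultiIndex
stats["mean_to_median"] = stats["mean"] / stats["median"]
print(stats)
```

A ratio well above 1 flags groups where the mean is driven by a few large values, in which case the median tables are probably the safer summary to quote.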


In [27]:
for metric in value_cols:
    # mean of each NTD metric per year, split by whether the requirement was met
    melt_farebox_urban_agg = (
        melt_farebox[
            (melt_farebox["area_type"] == "Urban")
            & (melt_farebox["mode"].isin(modes))
            & (melt_farebox["year"].isin(years))
            & (melt_farebox["type_of_service"].isin(tos))
            & (melt_farebox["ntd_metric"] == metric)
        ]
        .groupby(["area_type", "year", "requirement_met_flag", "ntd_metric"])
        .agg({"ntd_metric_value": "mean"})
        .reset_index()
    )
    make_chart(
        data=melt_farebox_urban_agg,
        y_col="ntd_metric_value",
        x_col="year:N",
        title=f"Average {metric} for bus modes, urban operators that met vs. did not meet the Farebox Recovery Ratio requirement, 2020-2024",
        color_col="requirement_met_flag",
    )

### Rural 

In [17]:
# pivot tables; rows are split by requirement_met_flag, so met and not met both appear
rural_farebox_median_pivot = pd.pivot_table(
    melt_farebox[
        (melt_farebox["area_type"] == "Rural")
        & (melt_farebox["mode"].isin(modes))
        & (melt_farebox["year"].isin(years))
        & (melt_farebox["type_of_service"].isin(tos))
    ],
    values="ntd_metric_value",
    index=["area_type", "requirement_met_flag", "ntd_metric"],
    columns="year",
    aggfunc=["median"],
)

rural_farebox_mean_pivot = pd.pivot_table(
    melt_farebox[
        (melt_farebox["area_type"] == "Rural")
        & (melt_farebox["mode"].isin(modes))
        & (melt_farebox["year"].isin(years))
        & (melt_farebox["type_of_service"].isin(tos))
    ],
    values="ntd_metric_value",
    index=["area_type", "requirement_met_flag", "ntd_metric"],
    columns="year",
    aggfunc=["mean"],
)

display(
    """Yearly Median values for Rural Operators for Farebox Recovery Ratio Requirements""",
    rural_farebox_median_pivot,
    """
    """,
    """Yearly Mean values for Rural Operators for Farebox Recovery Ratio Requirements""",
    rural_farebox_mean_pivot,
)

'Yearly Median values for Rural Operators for Farebox Recovery Ratio Requirements'

| area_type | requirement_met_flag | ntd_metric | 2020 (median) | 2021 (median) | 2022 (median) | 2023 (median) | 2024 (median) |
|---|---|---|---|---|---|---|---|
| Rural | False | total_upt | 30454.0 | 15030.5 | 16288.0 | 22763.0 | 26394.0 |
| Rural | False | total_vrh | 5791.0 | 6291.5 | 5329.0 | 5499.0 | 5672.5 |
| Rural | True | total_pmt |  |  |  |  | 0.0 |
| Rural | True | total_upt | 54925.0 | 18596.0 | 33307.0 | 41611.5 | 48329.0 |
| Rural | True | total_vrh | 10039.0 | 4392.0 | 6211.0 | 8357.0 | 9619.0 |



'Yearly Mean values for Rural Operators for Farebox Recovery Ratio Requirements'

| area_type | requirement_met_flag | ntd_metric | 2020 (mean) | 2021 (mean) | 2022 (mean) | 2023 (mean) | 2024 (mean) |
|---|---|---|---|---|---|---|---|
| Rural | False | total_upt | 57616.06 | 32342.7 | 35268.96 | 35731.07 | 39628.08 |
| Rural | False | total_vrh | 10108.55 | 9381.4 | 9214.08 | 8222.15 | 7679.85 |
| Rural | True | total_pmt |  |  |  |  | 0.0 |
| Rural | True | total_upt | 146522.53 | 56331.47 | 111890.79 | 160709.56 | 168387.06 |
| Rural | True | total_vrh | 14035.53 | 10131.2 | 12848.32 | 14680.44 | 16179.53 |


In [19]:
for metric in value_cols:
    # mean of each NTD metric per year, split by whether the requirement was met
    melt_farebox_rural_agg = (
        melt_farebox[
            (melt_farebox["area_type"] == "Rural")
            & (melt_farebox["mode"].isin(modes))
            & (melt_farebox["year"].isin(years))
            & (melt_farebox["type_of_service"].isin(tos))
            & (melt_farebox["ntd_metric"] == metric)
        ]
        .groupby(["area_type", "year", "requirement_met_flag", "ntd_metric"])
        .agg({"ntd_metric_value": "mean"})
        .reset_index()
    )
    make_chart(
        data=melt_farebox_rural_agg,
        y_col="ntd_metric_value",
        x_col="year:N",
        title=f"Average {metric} for bus modes, rural operators that met vs. did not meet the Farebox Recovery Ratio requirement, 2020-2024",
        color_col="requirement_met_flag",
    )

## Funding Expended

### Urban 

In [31]:
# pivot tables; rows are split by requirement_met_flag, so met and not met both appear
urban_funding_median_pivot = pd.pivot_table(
    melt_funding[
        (melt_funding["area_type"] == "Urban")
        & (melt_funding["mode"].isin(modes))
        & (melt_funding["year"].isin(years))
        & (melt_funding["type_of_service"].isin(tos))
    ],
    values="ntd_metric_value",
    index=["area_type", "requirement_met_flag", "ntd_metric"],
    columns="year",
    aggfunc=["median"],
)

urban_funding_mean_pivot = pd.pivot_table(
    melt_funding[
        (melt_funding["area_type"] == "Urban")
        & (melt_funding["mode"].isin(modes))
        & (melt_funding["year"].isin(years))
        & (melt_funding["type_of_service"].isin(tos))
    ],
    values="ntd_metric_value",
    index=["area_type", "requirement_met_flag", "ntd_metric"],
    columns="year",
    aggfunc=["mean"],
)

display(
    """Yearly Median values for Urban Operators that met or did not meet Local Funding Requirements""",
    urban_funding_median_pivot,
    """
    """,
    """Yearly Mean values for Urban Operators that met or did not meet Local Funding Requirements""",
    urban_funding_mean_pivot,
)

'Yearly Median values for Urban Operators that met or did not meet Local Funding Requirements'

| area_type | requirement_met_flag | ntd_metric | 2020 (median) | 2021 (median) | 2022 (median) | 2023 (median) | 2024 (median) |
|---|---|---|---|---|---|---|---|
| Urban | False | total_pmt | 6676822.0 | 2901557.5 | 4085449.5 | 5969271.5 | 7614651.0 |
| Urban | False | total_upt | 561752.0 | 219254.0 | 311609.5 | 488451.0 | 529731.0 |
| Urban | False | total_vrh | 47554.0 | 39525.0 | 40241.0 | 46360.0 | 53970.0 |
| Urban | True | total_pmt | 18212236.5 | 3748501.0 | 10770579.0 | 8617980.0 | 7407682.0 |
| Urban | True | total_upt | 683873.0 | 343617.0 | 1519157.5 | 1102685.0 | 956966.0 |
| Urban | True | total_vrh | 46945.5 | 54540.0 | 105138.0 | 66932.0 | 70270.0 |



'Yearly Mean values for Urban Operators that met or did not meet Local Funding Requirements'

| area_type | requirement_met_flag | ntd_metric | 2020 (mean) | 2021 (mean) | 2022 (mean) | 2023 (mean) | 2024 (mean) |
|---|---|---|---|---|---|---|---|
| Urban | False | total_pmt | 26292926.43 | 13195788.14 | 19229722.86 | 23657209.94 | 27477909.84 |
| Urban | False | total_upt | 5510182.82 | 3072288.73 | 4364879.83 | 5160701.13 | 6328922.78 |
| Urban | False | total_vrh | 219514.99 | 190262.16 | 209428.92 | 218553.1 | 245506.09 |
| Urban | True | total_pmt | 15497900.17 | 5706249.2 | 11612005.33 | 16581196.82 | 15725019.8 |
| Urban | True | total_upt | 2617419.7 | 1689412.43 | 3007311.9 | 3578997.85 | 2723324.67 |
| Urban | True | total_vrh | 139173.1 | 146141.57 | 186601.2 | 213527.23 | 169323.81 |

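In the two urban tables, the means sit far above the medians (for example, 2024 `total_upt` for operators that did not meet the requirement: median 529,731.0 vs. mean 6,328,922.78), which suggests a few very large operators pull the averages up. The extra `median` / `mean` header level in those outputs comes from passing `aggfunc` as a list. The sketch below, on made-up toy rows shaped like `melt_funding`, shows that a string `aggfunc` keeps a single `year` column level, while a list lets both statistics sit side by side in one table. The `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows in the same long ("melted") shape as melt_funding.
# The numbers are made up only to show the table shapes.
toy = pd.DataFrame(
    {
        "requirement_met_flag": [True, True, False, False, False, True, False],
        "ntd_metric": ["total_upt"] * 7,
        "year": [2023, 2023, 2023, 2023, 2023, 2024, 2024],
        "ntd_metric_value": [10.0, 30.0, 5.0, 7.0, 100.0, 20.0, 8.0],
    }
)
index_cols = ["requirement_met_flag", "ntd_metric"]

# A string aggfunc keeps one column level: the years.
median_pivot = pd.pivot_table(
    toy, values="ntd_metric_value", index=index_cols, columns="year", aggfunc="median"
)

# A list aggfunc adds an outer column level per statistic, so median and
# mean can be compared side by side in a single table.
both_pivot = pd.pivot_table(
    toy,
    values="ntd_metric_value",
    index=index_cols,
    columns="year",
    aggfunc=["median", "mean"],
)

print(median_pivot)
print(both_pivot)
```

In the toy data, the 2023 `False` group (5, 7, 100) has a median of 7 but a mean near 37, the same skew pattern visible in the urban tables.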

In [32]:
for metric in value_cols:
    # Mean of each metric per year for urban operators, split by requirement_met_flag
    melt_funding_urban_agg = (
        melt_funding[
            (melt_funding["area_type"] == "Urban")
            & (melt_funding["mode"].isin(modes))
            & (melt_funding["year"].isin(years))
            & (melt_funding["type_of_service"].isin(tos))
            & (melt_funding["ntd_metric"] == metric)
        ]
        .groupby(["area_type", "year", "requirement_met_flag", "ntd_metric"])
        .agg({"ntd_metric_value": "mean"})
        .reset_index()
    )
    make_chart(
        data=melt_funding_urban_agg,
        y_col="ntd_metric_value",
        x_col="year:N",
        title=f"Average {metric} for Bus modes for Urban operators that met or did not meet Local Funding Requirements, 2019-2024",
        color_col="requirement_met_flag",
    )

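The same area/mode/year/type-of-service mask is repeated in every pivot and chart cell of this section. A minimal sketch of how it could be factored out, assuming `melt_funding` keeps the column names used above; `filter_funding` and `yearly_mean` are hypothetical helpers, not part of `calitp_data_analysis`, and the toy rows (including the `MB`/`DR` and `DO` codes) are made up for illustration.

```python
import pandas as pd


def filter_funding(df, area_type, modes, years, tos, metric=None):
    """Hypothetical helper: rows for one area type, restricted to the given
    modes, years and types of service, and optionally to a single metric."""
    mask = (
        (df["area_type"] == area_type)
        & df["mode"].isin(modes)
        & df["year"].isin(years)
        & df["type_of_service"].isin(tos)
    )
    if metric is not None:
        mask &= df["ntd_metric"] == metric
    return df[mask]


def yearly_mean(df):
    """Hypothetical helper: mean metric value per year and requirement_met_flag,
    matching the groupby used in the chart cells."""
    return (
        df.groupby(["area_type", "year", "requirement_met_flag", "ntd_metric"])
        .agg({"ntd_metric_value": "mean"})
        .reset_index()
    )


# Made-up toy rows, only to exercise the helpers.
toy = pd.DataFrame(
    {
        "area_type": ["Urban", "Urban", "Urban", "Rural"],
        "mode": ["MB", "MB", "DR", "MB"],
        "year": [2023, 2023, 2023, 2023],
        "type_of_service": ["DO", "DO", "DO", "DO"],
        "requirement_met_flag": [True, True, True, False],
        "ntd_metric": ["total_upt"] * 4,
        "ntd_metric_value": [10.0, 30.0, 99.0, 5.0],
    }
)
urban_upt = yearly_mean(
    filter_funding(toy, "Urban", ["MB"], [2023], ["DO"], metric="total_upt")
)
print(urban_upt)
```

With the helpers in place, each chart cell would reduce to one `yearly_mean(filter_funding(...))` call per metric, so the Urban and Rural cells could differ only in the `area_type` argument.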
### Rural 

In [33]:
# Yearly totals (sum) of each metric for rural operators, one row per requirement_met_flag and metric
rural_funding_pivot_true = pd.pivot_table(
    melt_funding[
    # (melt_funding["requirement_met_flag"]==True)
     (melt_funding["area_type"]=="Rural")
    & (melt_funding["mode"].isin(modes))
    & (melt_funding["year"].isin(years))
    & (melt_funding["type_of_service"].isin(tos))
],
    values = "ntd_metric_value",
    index = ["area_type","requirement_met_flag","ntd_metric"],
    columns =  "year",
    aggfunc = ["sum"]
)

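# With the requirement_met_flag filter commented out, this uses the same rows and aggfunc
# as rural_funding_pivot_true, so the two tables below are identical.
rural_funding_pivot_false = pd.pivot_table(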
    melt_funding[
    # (melt_funding["requirement_met_flag"]==False)
     (melt_funding["area_type"]=="Rural")
    & (melt_funding["mode"].isin(modes))
    & (melt_funding["year"].isin(years))
    & (melt_funding["type_of_service"].isin(tos))
],
    values = "ntd_metric_value",
    index = ["area_type","requirement_met_flag","ntd_metric"],
    columns =  "year",
    aggfunc = ["sum"]
)

display(
    """Yearly Totals for Rural Operators that met or did not meet Local Funding Requirements""",
    rural_funding_pivot_true,
    """
    """,
    """Yearly Totals for Rural Operators that met or did not meet Local Funding Requirements""",
    rural_funding_pivot_false,
)

'Yearly Totals for Rural Operators that met or did not meet Local Funding Requirements'

| area_type | requirement_met_flag | ntd_metric | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|
| Rural | False | total_pmt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Rural | False | total_upt | 3645779.0 | 1728150.0 | 1695000.0 | 1699526.0 | 1256221.0 |
| Rural | False | total_vrh | 461144.0 | 421425.0 | 395934.0 | 346111.0 | 240488.0 |
| Rural | True | total_pmt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Rural | True | total_upt | 339785.0 | 103296.0 | 1394273.0 | 1966780.0 | 2810687.0 |
| Rural | True | total_vrh | 64392.0 | 16040.0 | 94867.0 | 137541.0 | 261329.0 |



'Yearly Totals for Rural Operators that met or did not meet Local Funding Requirements'

| area_type | requirement_met_flag | ntd_metric | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|
| Rural | False | total_pmt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Rural | False | total_upt | 3645779.0 | 1728150.0 | 1695000.0 | 1699526.0 | 1256221.0 |
| Rural | False | total_vrh | 461144.0 | 421425.0 | 395934.0 | 346111.0 | 240488.0 |
| Rural | True | total_pmt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Rural | True | total_upt | 339785.0 | 103296.0 | 1394273.0 | 1966780.0 | 2810687.0 |
| Rural | True | total_vrh | 64392.0 | 16040.0 | 94867.0 | 137541.0 | 261329.0 |

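Rural `total_pmt` is 0.0 in every year for both groups. The NTD manual excerpt at the top of this notebook notes that only Full Reporters report PMT, so these zeros may mean "not reported" rather than zero passenger miles. If that reading holds, one option is to turn those zeros into missing values before averaging, so the mean-based chart cells would get NaN for PMT instead of plotting a flat zero line. A minimal sketch, assuming the long `ntd_metric` / `ntd_metric_value` layout used above; `pmt_zero_to_nan` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd


def pmt_zero_to_nan(df):
    """Hypothetical helper: return a copy where total_pmt rows equal to 0 become NaN.
    Assumes a zero PMT means 'not reported', which should be checked per agency."""
    out = df.copy()
    is_zero_pmt = (out["ntd_metric"] == "total_pmt") & (out["ntd_metric_value"] == 0)
    # Series.mask replaces values where the condition is True (NaN by default).
    out["ntd_metric_value"] = out["ntd_metric_value"].mask(is_zero_pmt)
    return out


# Made-up toy rows: one unreported PMT, one reported PMT, and a genuine zero UPT.
toy = pd.DataFrame(
    {
        "year": [2023, 2023, 2023],
        "ntd_metric": ["total_pmt", "total_pmt", "total_upt"],
        "ntd_metric_value": [0.0, 50.0, 0.0],
    }
)
cleaned = pmt_zero_to_nan(toy)

# groupby().mean() skips NaN, so the unreported zero no longer drags PMT down.
yearly = cleaned.groupby(["year", "ntd_metric"])["ntd_metric_value"].mean()
print(yearly)
```

Only `total_pmt` rows are touched; a zero in another metric is left as a real zero.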

In [34]:
for metric in value_cols:
    # Mean of each metric per year for rural operators, split by requirement_met_flag
    melt_funding_rural_agg = (
        melt_funding[
            (melt_funding["area_type"] == "Rural")
            & (melt_funding["mode"].isin(modes))
            & (melt_funding["year"].isin(years))
            & (melt_funding["type_of_service"].isin(tos))
            & (melt_funding["ntd_metric"] == metric)
        ]
        .groupby(["area_type", "year", "requirement_met_flag", "ntd_metric"])
        .agg({"ntd_metric_value": "mean"})
        .reset_index()
    )
    make_chart(
        data=melt_funding_rural_agg,
        y_col="ntd_metric_value",
        x_col="year:N",
        title=f"Average {metric} for Bus modes for Rural operators that met or did not meet Local Funding Requirements, 2019-2024",
        color_col="requirement_met_flag",
    )
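The chart cells above plot only the yearly mean, which can hide how spread out operators are. Following the box-plot idea from the task notes (min/max/average per metric), a minimal sketch of the per-year summary such a plot would draw is below; `yearly_spread` is a hypothetical helper and the toy rows are made up. Altair's `mark_boxplot` could render the same spread directly from the unaggregated rows.

```python
import pandas as pd


def yearly_spread(df):
    """Hypothetical helper: min, median, mean and max of ntd_metric_value
    per year and requirement_met_flag (the values a box plot summarizes)."""
    return (
        df.groupby(["year", "requirement_met_flag"])["ntd_metric_value"]
        .agg(["min", "median", "mean", "max"])
        .reset_index()
    )


# Made-up toy rows for a single metric, one row per operator-year.
toy = pd.DataFrame(
    {
        "year": [2023, 2023, 2023, 2023, 2023],
        "requirement_met_flag": [False, False, False, True, True],
        "ntd_metric_value": [5.0, 7.0, 100.0, 10.0, 30.0],
    }
)
spread = yearly_spread(toy)
print(spread)
```

Showing median next to mean, with min and max as bounds, makes it easier to tell whether a gap between the two groups reflects typical operators or a handful of large ones.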