In [1]:
# Parameters
state = "CA"


# Choose cut-offs for competitive or viable competitive

* What threshold should `pct_trips_competitive` meet for a route to count as competitive (based on `bus_multiplier` only)?
* What threshold should `pct_below_cutoff` meet for a competitive route to count as viable competitive (this also factors in `bus_difference`)?
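
To make the two metrics concrete, here is a minimal sketch of how route-level shares like these could be derived from trip-level data. It rests on several assumptions that this notebook does not confirm: that `bus_multiplier` is bus travel time divided by car travel time, that `bus_difference` is the gap in minutes, that both shares are taken over all trips on a route, and that the `bus_difference` cut-off is 20 minutes. The `trips` table, `MULTIPLIER_CUTOFF`, and `DIFFERENCE_CUTOFF` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical trip-level data; the real table's schema is not shown in this notebook.
trips = pd.DataFrame({
    "route_id": ["A", "A", "A", "A", "B", "B"],
    "bus_multiplier": [1.2, 1.8, 2.5, 1.5, 2.2, 3.0],  # assumed: bus time / car time
    "bus_difference": [5, 25, 40, 10, 30, 45],         # assumed: bus time - car time, in minutes
})

MULTIPLIER_CUTOFF = 2    # the "2x" rule referred to in this notebook
DIFFERENCE_CUTOFF = 20   # assumed value, for illustration only

# Flag each trip against the two cut-offs.
trips = trips.assign(
    competitive=trips.bus_multiplier <= MULTIPLIER_CUTOFF,
    below_cutoff=trips.bus_difference <= DIFFERENCE_CUTOFF,
)

# The mean of a boolean column is the share of trips that pass.
route_shares = (
    trips.groupby("route_id")
    .agg(
        pct_trips_competitive=("competitive", "mean"),
        pct_below_cutoff=("below_cutoff", "mean"),
    )
    .reset_index()
)
print(route_shares)
```

The two questions above then reduce to picking a minimum value for each of the two share columns.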

In [2]:
import geopandas as gpd
import intake
import pandas as pd

catalog = intake.open_catalog("./*.yml")



In [3]:
# parameters cell
state = "CA"

# Maps for CA

In [4]:
df = catalog.competitive_route_variability.read()

In [5]:
route_cols = ["calitp_itp_id", "route_id", "route_group"]
route_groups = df.route_group.unique().tolist()

In [6]:
print("Overall")
df2 = df[route_cols + ["pct_trips_competitive", "pct_below_cutoff"]].drop_duplicates()
print(df2.pct_trips_competitive.describe())

for i in route_groups:
    subset = df2[df2.route_group==i]
    print(f"Route Group: {i}")
    print(subset.pct_trips_competitive.describe())

Overall
count    3314.000000
mean        0.346213
std         0.451978
min         0.000000
25%         0.000000
50%         0.000000
75%         1.000000
max         1.000000
Name: pct_trips_competitive, dtype: float64
Route Group: short
count    2319.000000
mean        0.349827
std         0.456715
min         0.000000
25%         0.000000
50%         0.000000
75%         1.000000
max         1.000000
Name: pct_trips_competitive, dtype: float64
Route Group: medium
count    594.000000
mean       0.376576
std        0.457146
min        0.000000
25%        0.000000
50%        0.020000
75%        1.000000
max        1.000000
Name: pct_trips_competitive, dtype: float64
Route Group: long
count    401.000000
mean       0.280334
std        0.409386
min        0.000000
25%        0.000000
50%        0.000000
75%        0.622000
max        1.000000
Name: pct_trips_competitive, dtype: float64


For short and medium route groups, the 75th percentile sits at 100% of trips being within the 2x `bus_multiplier`.

For long routes, the 75th percentile is around 62% of trips within the 2x `bus_multiplier`.

Use 75% as the threshold for all route groups. The distribution may differ between large operators like LA Metro and small operators, and the recommendations need to work for all kinds of operators. Since only the top 15 routes will be shown, the recommended routes will probably be ones where 100% of trips are within the 2x threshold anyway.
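
The per-group comparison of 75th percentiles can also be done in one call instead of reading each `describe()` block. The sketch below uses a small made-up table (`toy`) in place of `df2`; only the column names are taken from this notebook.

```python
import pandas as pd

# Toy stand-in for df2; values are invented for illustration.
toy = pd.DataFrame({
    "route_group": ["short"] * 4 + ["long"] * 4,
    "pct_trips_competitive": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.4, 1.0],
})

PCT_COMPETITIVE_THRESHOLD = 0.75

# 75th percentile of the competitive share, per route group.
p75 = toy.groupby("route_group").pct_trips_competitive.quantile(0.75)
print(p75)

# Share of routes in each group that would pass the threshold.
share_passing = (
    (toy.pct_trips_competitive > PCT_COMPETITIVE_THRESHOLD)
    .groupby(toy.route_group)
    .mean()
)
print(share_passing)
```

Looking at `share_passing` next to `p75` shows how many routes a single statewide threshold keeps in each group.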

In [7]:
# Keep routes where more than 75% of trips are within the 2x bus_multiplier
PCT_COMPETITIVE_THRESHOLD = 0.75
df3 = df2[df2.pct_trips_competitive > PCT_COMPETITIVE_THRESHOLD]

In [8]:
print("Overall")
print(df3.pct_below_cutoff.describe())
for i in route_groups:
    subset = df3[df3.route_group==i]
    print(f"Route Group: {i}")
    print(subset.pct_below_cutoff.describe())

Overall
count    1037.000000
mean        0.886619
std         0.262732
min         0.000000
25%         1.000000
50%         1.000000
75%         1.000000
max         1.000000
Name: pct_below_cutoff, dtype: float64
Route Group: short
count    739.000000
mean       0.914522
std        0.247713
min        0.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        1.000000
Name: pct_below_cutoff, dtype: float64
Route Group: medium
count    204.000000
mean       0.869574
std        0.242886
min        0.000000
25%        0.845770
50%        1.000000
75%        1.000000
max        1.000000
Name: pct_below_cutoff, dtype: float64
Route Group: long
count    94.000000
mean      0.704246
std       0.335870
min       0.000000
25%       0.500000
50%       0.850000
75%       1.000000
max       1.000000
Name: pct_below_cutoff, dtype: float64


In [9]:
print("Overall")
print(df3[df3.calitp_itp_id==182].pct_below_cutoff.describe())
for i in route_groups:
    subset = df3[(df3.route_group==i) & (df3.calitp_itp_id==182)]
    print(f"Route Group: {i}")
    print(subset.pct_below_cutoff.describe())

Overall
count    50.000000
mean      0.856563
std       0.243884
min       0.166667
25%       0.819930
50%       1.000000
75%       1.000000
max       1.000000
Name: pct_below_cutoff, dtype: float64
Route Group: short
count    13.000000
mean      0.940966
std       0.212850
min       0.232558
25%       1.000000
50%       1.000000
75%       1.000000
max       1.000000
Name: pct_below_cutoff, dtype: float64
Route Group: medium
count    17.000000
mean      0.882275
std       0.185248
min       0.292683
25%       0.847826
50%       0.993902
75%       1.000000
max       1.000000
Name: pct_below_cutoff, dtype: float64
Route Group: long
count    20.000000
mean      0.779845
std       0.290207
min       0.166667
25%       0.623296
50%       0.934343
75%       1.000000
max       1.000000
Name: pct_below_cutoff, dtype: float64


For all route groups, the 75th percentile is at 100% of trips being within the `bus_difference` thresholds.

Let's use a more generous threshold: count a competitive route as viable competitive when 80% of its trips are within the `bus_difference` cut-off.
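
A minimal sketch of applying both thresholds in sequence, using an invented `routes` table in place of `df2`. The constant name `PCT_WITHIN_DIFFERENCE_THRESHOLD` is hypothetical, and treating exactly 80% as passing (`>=`) is an assumption; the notebook does not say whether the comparison is strict.

```python
import pandas as pd

PCT_COMPETITIVE_THRESHOLD = 0.75        # set earlier in this notebook
PCT_WITHIN_DIFFERENCE_THRESHOLD = 0.8   # hypothetical name for the 80% cut-off

# Toy route-level table; values are invented for illustration.
routes = pd.DataFrame({
    "route_id": ["A", "B", "C", "D"],
    "pct_trips_competitive": [1.0, 0.9, 0.5, 0.8],
    "pct_below_cutoff": [1.0, 0.6, 1.0, 0.8],
})

# Step 1: competitive routes, based on bus_multiplier only.
competitive = routes[routes.pct_trips_competitive > PCT_COMPETITIVE_THRESHOLD]

# Step 2: viable competitive routes, also factoring in bus_difference.
# Assumption: a route with exactly 80% of trips within the cut-off passes.
viable = competitive[
    competitive.pct_below_cutoff >= PCT_WITHIN_DIFFERENCE_THRESHOLD
]
print(viable.route_id.tolist())
```

Route C is dropped in step 1 and route B in step 2, which leaves A and D as viable competitive.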