# Are route categories stable quarter to quarter?

If a `route_id` is categorized as `parallel` in one quarter, would it change to `on_shn` in another? Categories should be fairly stable, since a bus route rarely deviates drastically from its original alignment.

Freeways don't change quarter to quarter.

So why are there large shifts in which routes are considered parallel between Q1 2022 and Q2 2022?

In [1]:
import pandas as pd

import pmac_utils
from shared_utils import rt_dates



In [2]:
# Flag parallel / intersecting routes for each quarter's analysis date
dfs = {}
for key, date in rt_dates.PMAC.items():
    df = pmac_utils.flag_parallel_intersecting_routes(date)
    dfs[key] = df

In [3]:
keep_cols = [
    "itp_id", "category", "route_id", 
    "District", "_merge"
]

df1 = dfs["Q1_2022"][keep_cols]
df2 = dfs["Q2_2022"][keep_cols]

In [4]:
m1 = pd.merge(
    df1, 
    df2,
    on = ["itp_id", "route_id", "District"],
    how = "outer",
    validate = "1:1",
    indicator="compare_categories"
)

In [5]:
len(m1)

4431

In [6]:
m1.compare_categories.value_counts()

right_only    1639
left_only     1450
both          1342
Name: compare_categories, dtype: int64

In [7]:
m2 = pd.merge(
    df1, 
    df2,
    on = ["itp_id", "route_id"],
    how = "outer",
    validate = "1:1",
    indicator="compare_categories"
)

In [8]:
m2.compare_categories.value_counts()

both          2421
right_only     560
left_only      371
Name: compare_categories, dtype: int64

It appears that adding `District` as a merge key is what throws off most of the matches: with it, only 1,342 routes match across both quarters, versus 2,421 without it. `District` comes from the highway segment, so its value depends on which `route_id`-highway overlap is kept. It is safer to attach `District` as the last step in `pmac_utils`.
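To illustrate the effect, here is a minimal sketch with made-up rows (the `itp_id`, `route_id`, `category`, and `District` values are hypothetical, not taken from the PMAC data). Route `A` keeps its category but gets a different `District`, so it only matches when `District` is left out of the merge keys. Once routes match, the two `category` columns can be compared directly.

```python
import pandas as pd

# Hypothetical toy data: route A keeps its category but its District
# differs between quarters; route C genuinely changes category.
q1 = pd.DataFrame({
    "itp_id": [1, 1, 1],
    "route_id": ["A", "B", "C"],
    "category": ["parallel", "on_shn", "parallel"],
    "District": [7, 7, 7],
})
q2 = pd.DataFrame({
    "itp_id": [1, 1, 1],
    "route_id": ["A", "B", "C"],
    "category": ["parallel", "on_shn", "on_shn"],
    "District": [12, 7, 7],
})

# Merging on District too: route A shows up once as left_only
# and once as right_only instead of matching.
with_district = pd.merge(
    q1, q2,
    on=["itp_id", "route_id", "District"],
    how="outer", validate="1:1", indicator="compare_categories",
)

# Merging on the route identifiers only: every route matches.
without_district = pd.merge(
    q1, q2,
    on=["itp_id", "route_id"],
    how="outer", validate="1:1", indicator="compare_categories",
)

n_both_with = int((with_district.compare_categories == "both").sum())
n_both_without = int((without_district.compare_categories == "both").sum())

# Among matched routes, keep those whose category actually changed.
matched = without_district[without_district.compare_categories == "both"]
changed = matched[matched.category_x != matched.category_y]
changed_routes = changed.route_id.tolist()

print(n_both_with, n_both_without, changed_routes)
```

Merging on `itp_id` and `route_id` alone separates the two questions: unmatched rows are routes that exist in only one quarter, and `changed` holds routes whose category really shifted.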

In [13]:
from D1_pmac_routes import TRAFFIC_OPS_GCS
from utils import GCS_FILE_PATH
import geopandas as gpd

date_str = "2022-02-08"

In [15]:
trips_with_hrs = pd.read_parquet(
    f"{GCS_FILE_PATH}trips_with_hrs_{date_str}.parquet")

routelines = gpd.read_parquet(
    f"{TRAFFIC_OPS_GCS}routelines_{date_str}.parquet")

In [19]:
routes = pd.merge(
    routelines.drop_duplicates(subset=["calitp_itp_id", "shape_id"])[
        ["calitp_itp_id", "shape_id", "geometry"]
    ],
    trips_with_hrs[["calitp_itp_id", "shape_id", "route_id"]],
    on = ["calitp_itp_id", "shape_id"],
    how = "outer",
    validate = "1:m",
    indicator=True
)

In [20]:
routes._merge.value_counts()

both          8070
right_only     354
left_only       16
Name: _merge, dtype: int64

In [24]:
routes[(routes._merge=='right_only') & (routes.calitp_itp_id==323)]

Unnamed: 0,calitp_itp_id,shape_id,geometry,route_id,_merge
8095,323,,,Inland Emp.-Orange Co. Line,right_only
8096,323,,,91 Line,right_only
8097,323,,,San Bernardino Line,right_only
8098,323,,,Orange County Line,right_only
8099,323,,,Antelope Valley Line,right_only
8100,323,,,Riverside Line,right_only
8101,323,,,LAX FlyAway Bus,right_only
8102,323,,,Ventura County Line,right_only


In [25]:
routes[routes._merge=='left_only']

Unnamed: 0,calitp_itp_id,shape_id,geometry,route_id,_merge
7132,323,OCout,"LINESTRING (162954.612 -438637.892, 163173.290...",,left_only
7133,323,91in,"LINESTRING (258892.478 -468536.137, 258638.176...",,left_only
7134,323,AVout,"LINESTRING (162954.612 -438637.892, 163173.290...",,left_only
7135,323,IEOCout,"LINESTRING (248145.884 -431048.582, 247615.383...",,left_only
7136,323,SBin,"LINESTRING (249601.968 -431680.896, 248960.682...",,left_only
7137,323,SBout,"LINESTRING (162954.612 -438637.892, 163151.478...",,left_only
7138,323,AVin,"LINESTRING (170662.592 -367194.989, 172889.895...",,left_only
7139,323,91out,"LINESTRING (163047.441 -438436.345, 163073.740...",,left_only
7140,323,OCin,"LINESTRING (244904.298 -532848.371, 243730.522...",,left_only
7141,323,VTout,"LINESTRING (73572.297 -417982.936, 73546.502 -...",,left_only
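The blank `shape_id` on the `right_only` rows suggests that the trips table has no `shape_id` for these `route_id` values, so they can never match a shape in `routelines`. Below is a minimal sketch of a pre-merge check. It uses a made-up stand-in for `trips_with_hrs` rather than the real parquet file, and the rows for operator `4` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for trips_with_hrs; values are illustrative only.
trips = pd.DataFrame({
    "calitp_itp_id": [323, 323, 4, 4],
    "shape_id": [None, None, "s1", "s2"],
    "route_id": ["91 Line", "Riverside Line", "10", "20"],
})

# Rows without a shape_id cannot match any routelines geometry.
missing_shape = trips[trips.shape_id.isna()]

# Count affected route_ids per operator to see who drives the mismatch.
missing_by_operator = (
    missing_shape.groupby("calitp_itp_id")
    .route_id.nunique()
    .sort_values(ascending=False)
)
print(missing_by_operator)
```

Running the same check on the real `trips_with_hrs` would show whether the 354 `right_only` rows come from a handful of operators or are spread widely.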
