# Are route categories stable quarter to quarter?

If a `route_id` is categorized as `parallel` in one quarter, would it switch to `on_shn` in another? Categories should be fairly stable, since a bus route rarely deviates drastically from its original alignment.

Freeways don't change quarter to quarter.

Why are there large shifts in what's considered parallel vs not from Q1 2022 to Q2 2022?

In [1]:
import pandas as pd

import pmac_utils
from shared_utils import rt_dates



In [2]:
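# Flag parallel / intersecting routes for each PMAC quarter,
# keyed by quarter label (e.g. "Q1_2022").
dfs = {}
for key, date in rt_dates.PMAC.items():
    df = pmac_utils.flag_parallel_intersecting_routes(date)
    dfs[key] = df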

In [3]:
keep_cols = [
    "itp_id", "category", "route_id", 
    "_merge", "District"
]

df1 = dfs["Q1_2022"][keep_cols]
df2 = dfs["Q2_2022"][keep_cols]

In [4]:
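def compare_col(left, right, col):
    """Print counts, then shares, of `col` for each of two dataframes."""
    # Raw counts per value
    print(left[col].value_counts())
    print(right[col].value_counts())
    # Shares of the total, so frames with different row counts are comparable
    print(left[col].value_counts(normalize=True))
    print(right[col].value_counts(normalize=True))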

In [5]:
compare_col(df1, df2, "category")

parallel    1977
other        738
on_shn        59
Name: category, dtype: int64
parallel    1997
other        918
on_shn        66
Name: category, dtype: int64
parallel    0.712689
other       0.266042
on_shn      0.021269
Name: category, dtype: float64
parallel    0.669909
other       0.307950
on_shn      0.022140
Name: category, dtype: float64
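
The four printed Series above are easier to read side by side: the `parallel` share falls from about 71.3% to 67.0%, while `other` rises from about 26.6% to 30.8%. A minimal sketch of a tabular version follows, assuming toy data rather than the real PMAC output; `compare_col_table` and the `toy_*` frames are hypothetical names, not part of `pmac_utils`.

```python
import pandas as pd


def compare_col_table(left, right, col, labels=("Q1", "Q2")):
    """Return counts and shares of `col` for two dataframes in one table."""
    labels = list(labels)
    # Align the two value_counts on their index; a value missing from
    # one frame gets a count of 0 instead of NaN.
    counts = (
        pd.concat(
            [left[col].value_counts(), right[col].value_counts()],
            axis=1,
            keys=labels,
        )
        .fillna(0)
        .astype(int)
    )
    # Each column divided by its own total gives the share per value.
    shares = counts / counts.sum()
    table = counts.join(shares, lsuffix="_n", rsuffix="_share")
    table["share_change"] = (
        table[f"{labels[1]}_share"] - table[f"{labels[0]}_share"]
    )
    return table


# Toy stand-ins for df1 / df2 (made-up values, for illustration only)
toy_q1 = pd.DataFrame({"category": ["parallel", "parallel", "parallel", "other"]})
toy_q2 = pd.DataFrame(
    {"category": ["parallel", "parallel", "other", "other", "on_shn"]}
)

table = compare_col_table(toy_q1, toy_q2, "category")
print(table)
```

With the real data, the call would be `compare_col_table(df1, df2, "category", labels=("Q1_2022", "Q2_2022"))`.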


In [6]:
compare_col(df1, df2, "District")

4.0     842
7.0     661
3.0     200
8.0     151
11.0    145
5.0     128
12.0    115
6.0     114
10.0     87
1.0      72
2.0      52
9.0      25
Name: District, dtype: int64
4.0     767
7.0     700
3.0     204
8.0     161
11.0    146
10.0    129
5.0     123
6.0     118
12.0    118
1.0      75
2.0      61
9.0      22
Name: District, dtype: int64
4.0     0.324846
7.0     0.255015
3.0     0.077160
8.0     0.058256
11.0    0.055941
5.0     0.049383
12.0    0.044367
6.0     0.043981
10.0    0.033565
1.0     0.027778
2.0     0.020062
9.0     0.009645
Name: District, dtype: float64
4.0     0.292302
7.0     0.266768
3.0     0.077744
8.0     0.061357
11.0    0.055640
10.0    0.049162
5.0     0.046875
6.0     0.044970
12.0    0.044970
1.0     0.028582
2.0     0.023247
9.0     0.008384
Name: District, dtype: float64


In [7]:
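# Outer merge keeps routes present in only one quarter;
# overlapping columns get the default _x (Q1) and _y (Q2) suffixes.
m1 = pd.merge(
    df1,
    df2,
    on=["itp_id", "route_id"],
    how="outer",
    validate="1:1",
    indicator="compare_categories",
)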

In [8]:
m1.compare_categories.value_counts()

both          2404
right_only     577
left_only      370
Name: compare_categories, dtype: int64
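
The row counts are internally consistent: 2,404 + 370 = 2,774 routes in Q1 and 2,404 + 577 = 2,981 in Q2, matching the category totals above. Because routes enter and leave between quarters, part of the shift in category shares could come from a change in composition rather than from reclassification. One way to check is to tabulate the category of the unmatched routes on each side. A sketch on toy data (the `toy_*` frames are made up for illustration):

```python
import pandas as pd

# Toy stand-ins for df1 / df2 (hypothetical values)
toy_q1 = pd.DataFrame(
    {
        "itp_id": [1, 1, 2],
        "route_id": ["a", "b", "a"],
        "category": ["parallel", "parallel", "other"],
    }
)
toy_q2 = pd.DataFrame(
    {
        "itp_id": [1, 2, 2, 3],
        "route_id": ["a", "a", "c", "d"],
        "category": ["parallel", "other", "other", "other"],
    }
)

toy_merged = pd.merge(
    toy_q1,
    toy_q2,
    on=["itp_id", "route_id"],
    how="outer",
    validate="1:1",
    indicator="compare_categories",
)

# Routes only in Q1 carry their category in category_x;
# routes only in Q2 carry it in category_y.
dropped = toy_merged.loc[
    toy_merged.compare_categories == "left_only", "category_x"
].value_counts()
added = toy_merged.loc[
    toy_merged.compare_categories == "right_only", "category_y"
].value_counts()

print("Only in Q1:", dropped.to_dict())
print("Only in Q2:", added.to_dict())
```

If the routes added in Q2 skew toward `other`, that alone would pull the `parallel` share down even when no existing route changes category.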

In [9]:
# Keep only routes that appear in both quarters
in_both = m1[m1.compare_categories == "both"]

In [10]:
in_both.shape

(2404, 9)

In [11]:
# Routes present in both quarters whose category differs between Q1 and Q2
in_both[in_both.category_x != in_both.category_y].shape

(194, 9)
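
So 194 of the 2,404 routes present in both quarters (about 8%) changed category. A cross-tabulation of the Q1 category against the Q2 category would show which transitions account for those changes. A minimal sketch, assuming a toy stand-in for `in_both` with the same `category_x` / `category_y` columns (the values are made up):

```python
import pandas as pd

# Toy stand-in for in_both (hypothetical values)
toy_in_both = pd.DataFrame(
    {
        "itp_id": [1, 1, 2, 2, 3],
        "route_id": ["a", "b", "a", "c", "d"],
        "category_x": ["parallel", "parallel", "other", "on_shn", "parallel"],
        "category_y": ["parallel", "other", "other", "on_shn", "other"],
    }
)

# Rows are the Q1 category, columns the Q2 category.
# Diagonal cells are stable routes; off-diagonal cells are changes.
transitions = pd.crosstab(toy_in_both.category_x, toy_in_both.category_y)
print(transitions)

# Share of matched routes whose category changed
share_changed = (toy_in_both.category_x != toy_in_both.category_y).mean()
print(f"{share_changed:.1%} of matched routes changed category")
```

On the real data, `pd.crosstab(in_both.category_x, in_both.category_y)` would show, for example, how many of the 194 changes are `parallel` to `other` versus the reverse.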