## vp_condenser...no direction

Let's see if we can get the `vp_condensed` version working with nearest neighbor.

We want to search only the vp with valid directions, do the nearest snap on that subset, and then correctly index back into the whole linestring.

If done correctly, we can remove an entire function from `gtfs_funnel`
and use a different starting point in `rt_segment_speeds` for `nearest_vp_to_stop`.


Things to update:
1. Remove `vp_nn` from `gtfs_funnel`.
2. In `vp_transform`, use `vp_condensed_line` and remove the merge on `vp_primary_direction`.
3. Rework the function to subset for valid indices first. We then need to add back all the columns we need at the end of `nearest_vp_to_stop`.
3a. If the nearest snap function only takes shapely objects, maybe we can coerce any arrays into that.
4. `nearest_vp_to_stop` has very sparse columns.
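
A minimal sketch of item 3, using plain numpy rather than the real pipeline functions: subset to the valid positions first, do the nearest snap on the subset, then map the result back to `vp_idx` in the full array. The function name `nearest_valid_vp`, the `OPPOSITE` lookup, and the rule that a vp is "valid" when it does not run opposite to the stop's direction are all assumptions for illustration, not the existing implementation.

```python
import numpy as np

# Hypothetical lookup: the direction that counts as invalid for each stop direction.
OPPOSITE = {
    "Northbound": "Southbound",
    "Southbound": "Northbound",
    "Eastbound": "Westbound",
    "Westbound": "Eastbound",
}


def nearest_valid_vp(vp_coords, vp_idx, vp_direction, stop_xy, stop_direction, k=2):
    """Return the vp_idx values of the k valid vp closest to the stop."""
    vp_coords = np.asarray(vp_coords, dtype=float)
    vp_idx = np.asarray(vp_idx)
    vp_direction = np.asarray(vp_direction)

    # 1. Positions (within the full arrays) of vp running in a valid direction.
    opposite = OPPOSITE.get(stop_direction)
    if opposite is None:
        # Assumption: a stop with an unknown direction accepts every vp.
        valid_pos = np.arange(len(vp_idx))
    else:
        valid_pos = np.flatnonzero(vp_direction != opposite)

    # 2. Nearest snap on the valid subset only (straight-line distance).
    dist = np.linalg.norm(
        vp_coords[valid_pos] - np.asarray(stop_xy, dtype=float), axis=1
    )
    nearest_in_subset = np.argsort(dist, kind="stable")[:k]

    # 3. Map subset positions back to full-array positions, then to vp_idx.
    return vp_idx[valid_pos[nearest_in_subset]]


# Made-up example: the closest vp (101) runs Westbound, so it is skipped.
example = nearest_valid_vp(
    vp_coords=[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)],
    vp_idx=[100, 101, 102, 103, 104],
    vp_direction=["Eastbound", "Westbound", "Eastbound", "Eastbound", "Westbound"],
    stop_xy=(1.2, 0),
    stop_direction="Eastbound",
)
print(example.tolist())  # [102, 100]
```

The step that is easy to get wrong is the last one: `np.argsort` returns positions within the subset, so they have to go through `valid_pos` before they can index the full `vp_idx` array.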

In [10]:
import geopandas as gpd
import pandas as pd

from update_vars import SEGMENT_GCS, GTFS_DATA_DICT
from shared_utils import rt_dates

In [11]:
dict_inputs = GTFS_DATA_DICT["stop_segments"]
analysis_date = rt_dates.DATES["oct2024"]

In [12]:
file = dict_inputs["stage2"]
df1 = pd.read_parquet(
    f"{SEGMENT_GCS}{file}_{analysis_date}.parquet")

df2 = pd.read_parquet(
    f"{SEGMENT_GCS}{file}_{analysis_date}_test.parquet")

In [15]:
df = pd.merge(
    df1,
    df2,
    on = ["trip_instance_key", "stop_sequence", "shape_array_key", "stop_geometry"],
    how = "inner"
)

In [22]:
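# Flag rows where the two runs return different sets of nearest vp.
# Comparing sets ignores the order of the indices within each array.
df = df.assign(
    different = df.apply(
        lambda x: set(x.nearest_vp_arr_x) != set(x.nearest_vp_arr_y),
        axis=1
    )
)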

In [23]:
df.different.value_counts()

False    2873531
True          11
Name: different, dtype: int64

In [25]:
2873531/(2873531+11)

0.9999961719717338

In [26]:
11/(2873531+11)

3.828028266160717e-06

In [24]:
df[df.different]

Unnamed: 0,trip_instance_key,stop_sequence,shape_array_key,stop_geometry,nearest_vp_arr_x,nearest_vp_arr_y,different
2647784,446add580d803889d500434f9ece4e76,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5638440, 5638441, 5638439, 5638442, 5638435, ...","[5638440, 5638441, 5638439, 5638442, 5638435, ...",True
2647792,53daf28e5f0e5af189abbc99b3fe8e53,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5640985, 5640986, 5640984, 5640975, 5640983, ...","[5640985, 5640986, 5640984, 5640975, 5640983, ...",True
2647800,553ec9026070f40c8487751635a8ccfe,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5640664, 5640665, 5640666, 5640663, 5640662, ...","[5640664, 5640665, 5640666, 5640663, 5640662, ...",True
2647826,71a4480b5784014caaf9b21e0328e94e,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5640483, 5640482, 5640484, 5640481, 5640477, ...","[5640483, 5640482, 5640484, 5640501, 5640481, ...",True
2647842,7bba7220b0cd77563f2a6c9c82ad9769,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5639651, 5639652, 5639650, 5639649, 5639653, ...","[5639651, 5639652, 5639650, 5639649, 5639653, ...",True
2647850,7cfa9724d2b8274d633ab3dfb21a7f8d,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5639970, 5639971, 5639972, 5639969, 5639966, ...","[5639970, 5639971, 5639972, 5639969, 5639966, ...",True
2647858,9410c17c4f154ae3aa4c25bd98096de6,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5640815, 5640816, 5640814, 5640817, 5640813, ...","[5640815, 5640816, 5640814, 5640817, 5640813, ...",True
2647874,a1cba44baf1f12ca2c06464374ff5272,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5640140, 5640141, 5640139, 5640142, 5640136, ...","[5640140, 5640141, 5640139, 5640142, 5640136, ...",True
2647904,d68e3b7a03a0c31ca8efe8941d74888c,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5639810, 5639809, 5639811, 5639808, 5639804, ...","[5639810, 5639809, 5639811, 5639808, 5639804, ...",True
2648084,d94d1eb5a31337b8f938aeaf50b967e0,32,ad7711dbb909b690ee6c2a00fd96219e,b'\x01\x01\x00\x00\x00\xe0Jvl\x04*^\xc0d\x92\x...,"[5638755, 5638754, 5638756, 5638753, 5638750, ...","[5638755, 5638754, 5638756, 5638753, 5638750, ...",True
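
The truncated display above mostly shows identical leading values, so it does not reveal where the two arrays diverge. A small sketch for listing the members unique to each side; `array_membership_diff` is a hypothetical helper and the values below are made up.

```python
def array_membership_diff(arr_x, arr_y):
    """Return (only_in_x, only_in_y) as sorted lists of values."""
    set_x, set_y = set(arr_x), set(arr_y)
    return sorted(set_x - set_y), sorted(set_y - set_x)


# Same members in a different order: not flagged by the set comparison.
reordered = array_membership_diff([20, 21, 22], [21, 20, 22])
print(reordered)  # ([], [])

# One member swapped out: this is what `different == True` picks up.
swapped = array_membership_diff([10, 11, 9, 8], [10, 11, 9, 15])
print(swapped)  # ([8], [15])
```

Applied row by row to the flagged rows (passing `nearest_vp_arr_x` and `nearest_vp_arr_y`), this would show which vp indices each version picked that the other did not.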


In [None]:
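import numpy as np

def check_value(gdf: gpd.GeoDataFrame, x: int):
    """
    For row x, print the stop's direction next to the position and
    direction of each nearest vp within the full vp array.
    """
    one_direction_arr = gdf.vp_primary_direction.iloc[x]
    one_stop_direction = gdf.stop_primary_direction.iloc[x]
    one_near_vp_arr = gdf.nearest_vp_arr.iloc[x]
    one_orig_vp_arr = gdf.vp_idx.iloc[x]

    for i in one_near_vp_arr:
        # Position of this vp_idx within the original (full) vp array
        this_index = np.where(one_orig_vp_arr == i)[0]
        this_direction = one_direction_arr[this_index]
        print(one_stop_direction, this_index, this_direction)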

In [None]:
check_value(gdf2, 3)

In [None]:
check_value(gdf2, 10)

In [None]:
check_value(gdf2, 64)

In [None]:
check_value(gdf2, 1_000)