# Research Request - GTFS Digest: Add Rail and Ferry Operators. #1386

Tiffany's comment:
If it's just a couple of rail operators (Amtrak, Metrolink) and a handful of ferry operators, it's worth digging into why they dropped off. Start by looking for their rows in the 4 schedule tables (trips, shapes, stops, stop_times), and then look for them in a vp table.
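The first part of that plan is to check each schedule table, and then a vp table, for the missing operators. It can be sketched as a helper that reports, for each operator key, which tables have at least one row. This is a minimal sketch that assumes each table is already loaded as a pandas DataFrame with a `schedule_gtfs_dataset_key` column. `find_keys_in_tables` and the toy keys are hypothetical. The shapes table is keyed by `shape_array_key`, so it would need to be joined to trips first.

```python
import pandas as pd


def find_keys_in_tables(tables, keys, key_col="schedule_gtfs_dataset_key"):
    """Return a keys x tables grid: True where the table has at least one row for that key."""
    presence = {}
    for table_name, df in tables.items():
        # Build the set of keys once per table, ignoring missing values
        present = set(df[key_col].dropna().unique())
        presence[table_name] = [key in present for key in keys]
    return pd.DataFrame(presence, index=pd.Index(keys, name=key_col))


# Toy tables; in the notebook these would be trips, shapes (joined via trips),
# stops, stop_times, and a vp table
tables = {
    "trips": pd.DataFrame({"schedule_gtfs_dataset_key": ["key_a", "key_b"]}),
    "stop_times": pd.DataFrame({"schedule_gtfs_dataset_key": ["key_a"]}),
    "vp": pd.DataFrame({"schedule_gtfs_dataset_key": pd.Series([], dtype=object)}),
}
grid = find_keys_in_tables(tables, ["key_a", "key_b", "key_c"])
print(grid)
```

Reading across a row shows the first table where an operator stops appearing, which is where it most likely dropped off.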

* I think the ferry operators and Metrolink are already associated with a district. Even Amtrak might be? But if Amtrak isn't, you can create a separate "district = Amtrak" in the merged df so it always has a tab of its own. Amtrak plots for the entire country!
* District 4: San Francisco Bay Area Rapid Transit (BART), City and County of San Francisco (Muni)
* District 7: Los Angeles County Metropolitan Transportation Authority (LA Metro)
* District 11: San Diego Metropolitan Transit System
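The separate Amtrak "district" could be a single `.loc` assignment on the merged df. This sketch assumes the merged df has `organization_name` and `caltrans_district` columns (as `schd_vp_df` does) and that the literal label `"Amtrak"` is acceptable as a district value. The rows are made up.

```python
import pandas as pd

# Toy stand-in for the merged digest df; only the two relevant columns are shown
merged = pd.DataFrame(
    {
        "organization_name": [
            "Amtrak",
            "San Diego Metropolitan Transit System",
            "Amtrak",
        ],
        "caltrans_district": ["03 - Marysville", "11 - San Diego", None],
    }
)

# Give every Amtrak row the pseudo-district "Amtrak", whatever it was before
merged.loc[merged["organization_name"] == "Amtrak", "caltrans_district"] = "Amtrak"
print(merged)
```

Because the override keys on the organization name, it also covers Amtrak rows that arrive with no district.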

Amanda's comments:
* All the ferry operators are gone.
* Amtrak is in District 3, but it is categorized as `schedule_only`, which doesn't seem right.
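One way to test whether Amtrak's `schedule_only` label is plausible is to recompute the category from which side each operator key appears on. The sketch below uses pandas' `merge(..., indicator=True)` on hypothetical keys. It illustrates the idea only and is not necessarily how the digest pipeline assigns `sched_rt_category`.

```python
import pandas as pd

# Hypothetical operator keys seen on each side for one service date
sched = pd.DataFrame({"schedule_gtfs_dataset_key": ["op_a", "op_b", "op_c"]})
vp = pd.DataFrame({"schedule_gtfs_dataset_key": ["op_a", "op_b", "op_d"]})

check = sched.merge(vp, on="schedule_gtfs_dataset_key", how="outer", indicator=True)

# Translate pandas' merge indicator into the digest's category labels
labels = {
    "both": "schedule_and_vp",
    "left_only": "schedule_only",
    "right_only": "vp_only",
}
check["sched_rt_category"] = check["_merge"].astype(str).map(labels)
print(check[["schedule_gtfs_dataset_key", "sched_rt_category"]])
```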

Other operators (thanks Meta.AI)
* Strikethroughs = these operators are already in our `schd_vp_df2`

Rail Services

* <s>Amtrak California: Offers intercity rail services throughout the state</s>
* <s>BART (Bay Area Rapid Transit): Provides rail services in the San Francisco Bay Area</s>
* Caltrain: Offers commuter rail services in the San Francisco Bay Area ¹
* LA Metro Rail: Provides rail services in Los Angeles County ¹ **There's Los Angeles County Metropolitan Transportation Authority**
* Metrolink: Offers commuter rail services in Southern California ¹ **Would this be Southern California Regional Rail Authority?**
* San Diego Trolley: Provides light rail services in San Diego ¹ **Is this part of San Diego Metropolitan Transit System?**
* San Joaquin Regional Rail Commission (ACE): Offers commuter rail services in the San Joaquin Valley ¹
* <s>SMART (Sonoma-Marin Area Rail Transit): Provides commuter rail services in Sonoma and Marin counties ¹</s>
* <s>VTA (Santa Clara Valley Transportation Authority): Offers light rail services in Santa Clara County ¹</s>

Here's a list of ferry operators in California:

* San Francisco Bay Ferry: operates 10 ferry routes in the San Francisco Bay Area, with two seasonal routes ¹
* Golden Gate Ferry: operates ferry services between Larkspur, Sausalito, Tiburon, and San Francisco ¹
* Blue and Gold Fleet: connects San Francisco with Sausalito, Tiburon, Angel Island, Oakland, Alameda, and Vallejo ²
* Balboa Island Ferry: provides daily ferry service between the Balboa Peninsula in Newport Beach and Balboa Island ¹
* Tideline Marine Group: operates commuter ferry service between Berkeley and San Francisco ¹
* Caltrans: operates the J-Mack Ferry, a cable ferry service between Ryde and Ryer Island near Sacramento ¹
* California Department of Transportation: operates the Howard Landing Ferry on the California Delta ²
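The strikethroughs above were marked by hand. The same check can be sketched as a set comparison against `schd_vp_df2.organization_name`. This assumes exact string matching, which will miss entries listed under a brand name rather than the legal name; the bold notes above flag exactly that case. The names below are illustrative.

```python
import pandas as pd

# Stand-in for schd_vp_df2.organization_name (illustrative values only)
present = pd.Series(["Amtrak", "Sonoma-Marin Area Rail Transit District", None])

# Candidate names to look for (hypothetical subset)
candidates = ["Amtrak", "Caltrain", "Sonoma-Marin Area Rail Transit District"]

present_set = set(present.dropna())
already_in = sorted(name for name in candidates if name in present_set)
missing = sorted(name for name in candidates if name not in present_set)
print("already in:", already_in)
print("missing:", missing)
```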

In [1]:
import _section1_utils as section1
import _section2_utils as section2
import geopandas as gpd
import merge_data
import merge_operator_data
import numpy as np
import pandas as pd
from segment_speed_utils import gtfs_schedule_wrangling, helpers
from segment_speed_utils.project_vars import COMPILED_CACHED_VIEWS, PROJECT_CRS
from shared_utils import catalog_utils, portfolio_utils, rt_dates
from update_vars import GTFS_DATA_DICT, RT_SCHED_GCS, SCHED_GCS, SEGMENT_GCS

In [2]:
pd.options.display.max_columns = 100
pd.options.display.float_format = "{:.2f}".format
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

In [3]:
analysis_date_list = [rt_dates.DATES["feb2025"]]

In [4]:
type(analysis_date_list)

list

In [5]:
analysis_date = rt_dates.DATES["feb2025"]

## Look at `operators_prep`
* Ferry operators aren't here.

In [6]:
schd_vp_url = f"{GTFS_DATA_DICT.digest_tables.dir}{GTFS_DATA_DICT.digest_tables.route_schedule_vp}.parquet"

In [7]:
schd_vp_df = pd.read_parquet(
    schd_vp_url,
    columns=[
        "schedule_gtfs_dataset_key",
        "caltrans_district",
        "organization_name",
        "name",
        "sched_rt_category",
        "service_date",
    ],
)

In [8]:
schd_vp_df2 = schd_vp_df.loc[
    (schd_vp_df.service_date == "2025-01-15")
    | (schd_vp_df.service_date == "2024-12-11")
]

In [9]:
schd_vp_df3 = (
    schd_vp_df2[
        ["organization_name", "service_date", "sched_rt_category", "caltrans_district"]
    ]
    .drop_duplicates(subset=["organization_name"])
    .sort_values(by=["organization_name"])
)

In [10]:
schd_vp_df3.sched_rt_category.value_counts()

schedule_and_vp    102
schedule_only       89
vp_only              6
Name: sched_rt_category, dtype: int64
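`schd_vp_df2` covers two service dates (2024-12-11 and 2025-01-15). `drop_duplicates(subset=["organization_name"])` keeps whichever row comes first, so each operator's `service_date` and `sched_rt_category` in the counts above can come from either date. If the most recent date is the one that matters, sorting by `service_date` before deduplicating makes that explicit. A sketch with made-up rows:

```python
import pandas as pd

# Toy rows: one operator observed on both dates with different categories
rows = pd.DataFrame(
    {
        "organization_name": ["Operator A", "Operator A", "Operator B"],
        "service_date": pd.to_datetime(["2024-12-11", "2025-01-15", "2024-12-11"]),
        "sched_rt_category": ["schedule_only", "schedule_and_vp", "vp_only"],
    }
)

# Put the most recent date first so drop_duplicates keeps it, then re-sort by name
latest = (
    rows.sort_values("service_date", ascending=False)
    .drop_duplicates(subset=["organization_name"])
    .sort_values("organization_name")
)
print(latest)
```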

In [11]:
schd_vp_df3.loc[schd_vp_df3.organization_name.str.contains("Rail")]

| | organization_name | service_date | sched_rt_category | caltrans_district |
|---|---|---|---|---|
| 259484 | San Joaquin Regional Rail Commission | 2024-12-11 | schedule_and_vp | 10 - Stockton |
| 11331 | Sonoma-Marin Area Rail Transit District | 2024-12-11 | schedule_and_vp | 04 - Oakland |
| 338245 | Southern California Regional Rail Authority | 2024-12-11 | vp_only | 07 - Los Angeles |


In [12]:
schd_vp_df3.loc[schd_vp_df3.organization_name.str.contains("Water")]

| | organization_name | service_date | sched_rt_category | caltrans_district |
|---|---|---|---|---|
| 334697 | San Francisco Bay Area Water Emergency Transit Authority | 2024-12-11 | vp_only | 04 - Oakland |


In [13]:
schd_vp_df3.loc[schd_vp_df3.organization_name.str.contains("Metropolitan")]

| | organization_name | service_date | sched_rt_category | caltrans_district |
|---|---|---|---|---|
| 4127 | Los Angeles County Metropolitan Transportation Authority | 2024-12-11 | schedule_and_vp | 07 - Los Angeles |
| 202479 | San Diego Metropolitan Transit System | 2024-12-11 | schedule_and_vp | 11 - San Diego |
| 69932 | Santa Barbara Metropolitan Transit District | 2024-12-11 | schedule_and_vp | 05 - San Luis Obispo |
| 127086 | Santa Cruz Metropolitan Transit District | 2024-12-11 | schedule_and_vp | 05 - San Luis Obispo |


In [14]:
schd_vp_df3.loc[schd_vp_df3.organization_name.str.contains("Fleet")]

| | organization_name | service_date | sched_rt_category | caltrans_district |
|---|---|---|---|---|


In [15]:
schd_vp_df3.loc[schd_vp_df3.organization_name.str.contains("Ferry")]

| | organization_name | service_date | sched_rt_category | caltrans_district |
|---|---|---|---|---|
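Searching one keyword at a time misses operators whose legal names don't contain the service brand. San Francisco Bay Ferry, for example, only turns up under "Water" above. The sketch below checks several keywords in one case-insensitive pass. The keyword list is an assumption to extend, and the names are illustrative.

```python
import re

import pandas as pd

# Illustrative organization names, including one missing value
names = pd.Series(
    [
        "San Francisco Bay Area Water Emergency Transit Authority",
        "Golden Gate Bridge, Highway and Transportation District",
        "Sonoma-Marin Area Rail Transit District",
        None,
    ]
)

# Hypothetical keyword list; re.escape guards against regex metacharacters
keywords = ["ferry", "fleet", "water", "golden gate"]
pattern = "|".join(re.escape(word) for word in keywords)

# case=False ignores capitalization; na=False treats missing names as non-matches
matches = names[names.str.contains(pattern, case=False, na=False, regex=True)]
print(matches.tolist())
```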



### Look at where rail routes are 

In [17]:
CLEAN_ROUTES = GTFS_DATA_DICT.schedule_tables.route_identification

In [18]:
route_names_df = pd.read_parquet(f"{SCHED_GCS}{CLEAN_ROUTES}.parquet")

In [19]:
route_names_df.columns

Index(['schedule_gtfs_dataset_key', 'name', 'route_id', 'route_long_name',
       'route_short_name', 'route_desc', 'service_date', 'combined_name',
       'route_id2', 'recent_combined_name', 'recent_route_id2'],
      dtype='object')

In [20]:
operators_to_keep = [
    "Amtrak",
    "Los Angeles County Metropolitan Transportation Authority",
    "San Diego Metropolitan Transit System",
    "Capitol Corridor Joint Powers Authority",
    "Southern California Regional Rail Authority",
    "San Joaquin Regional Rail Commission",
    "City and County of San Francisco",
]

### Look for rail operators only 

In [21]:
rail_ops_only = pd.read_parquet(schd_vp_url)

In [22]:
rail_ops_only2 = rail_ops_only.loc[
    rail_ops_only.organization_name.isin(operators_to_keep)
]

In [23]:
# rail_ops_only2 is already filtered to operators_to_keep
sched_keys_to_keep = list(rail_ops_only2.schedule_gtfs_dataset_key.unique())

In [24]:
route_names_df2 = route_names_df.loc[
    route_names_df.schedule_gtfs_dataset_key.isin(sched_keys_to_keep)
]

In [25]:
route_names_df2.columns

Index(['schedule_gtfs_dataset_key', 'name', 'route_id', 'route_long_name',
       'route_short_name', 'route_desc', 'service_date', 'combined_name',
       'route_id2', 'recent_combined_name', 'recent_route_id2'],
      dtype='object')

In [26]:
unique_routes_df = route_names_df2[
    [
        "schedule_gtfs_dataset_key",
        "name",
        "route_id",
        "route_long_name",
        "route_short_name",
        "route_desc",
    ]
].drop_duplicates()

In [27]:
len(unique_routes_df)

1430

In [28]:
# unique_routes_df

### Bring in `route_typologies`

In [29]:
EXPORT = GTFS_DATA_DICT.schedule_tables.route_typologies

In [30]:
route_typologies = pd.read_parquet(f"{SCHED_GCS}{EXPORT}_{analysis_date}.parquet")

In [31]:
route_typologies.columns

Index(['schedule_gtfs_dataset_key', 'name', 'route_type', 'route_id',
       'route_long_name', 'route_short_name', 'combined_name', 'is_express',
       'is_rapid', 'is_rail', 'is_local', 'direction_id', 'common_shape_id',
       'route_name', 'route_meters', 'is_coverage', 'is_downtown_local'],
      dtype='object')

In [32]:
route_typologies2 = route_typologies[
    [
        "route_type",
        "route_id",
        "combined_name",
        "is_express",
        "is_rapid",
        "is_rail",
        "is_local",
    ]
].drop_duplicates()

In [33]:
unique_routes_df.head(1)

| | schedule_gtfs_dataset_key | name | route_id | route_long_name | route_short_name | route_desc |
|---|---|---|---|---|---|---|
| 1046 | f5a749dd65924e025b1293c58f95f8d6 | Bay Area 511 Capitol Corridor Schedule | Shuttle | Shuttle_Auburn | Shuttle | Shuttle to Auburn |


### Merge everything together 

In [34]:
rail_ops_only2.head(1)

Unnamed: 0,schedule_gtfs_dataset_key,direction_id,time_period,avg_scheduled_service_minutes,avg_stop_miles,n_scheduled_trips,frequency,is_express,is_rapid,is_rail,is_coverage,is_downtown_local,is_local,service_date,typology,minutes_atleast1_vp,minutes_atleast2_vp,total_rt_service_minutes,total_scheduled_service_minutes,total_vp,vp_in_shape,is_early,is_ontime,is_late,n_vp_trips,vp_per_minute,pct_in_shape,pct_rt_journey_atleast1_vp,pct_rt_journey_atleast2_vp,pct_sched_journey_atleast1_vp,pct_sched_journey_atleast2_vp,rt_sched_journey_ratio,avg_rt_service_minutes,sched_rt_category,speed_mph,route_long_name,route_short_name,route_combined_name,route_id,base64_url,organization_source_record_id,organization_name,caltrans_district,route_primary_direction,name,schedule_source_record_id
4110,0666caf3ec1ecc96b74f4477ee4bc939,0.0,all_day,81.85,0.18,65,2.71,0.0,0.0,0.0,0.0,1.0,0.0,2024-05-22,downtown_local,6721,6615,9461.83,5320.0,19827,18501,1,3,61,65,2.1,0.93,0.71,0.7,1.0,1.0,1.78,145.57,schedule_and_vp,9.75,Metro Local Line,10/48,10/48 Metro Local Line,10,aHR0cHM6Ly9naXRsYWIuY29tL0xBQ01UQS9ndGZzX2J1cy9yYXcvbWFzdGVyL2d0ZnNfYnVzLnppcA==,recPnGkwdpnr8jmHB,Los Angeles County Metropolitan Transportation Authority,07 - Los Angeles,Southbound,LA Metro Bus Schedule,recX8JOPmBQM9aWLC


In [73]:
crosswalk = rail_ops_only2[
    ["schedule_gtfs_dataset_key", "organization_name", "caltrans_district"]
].drop_duplicates()

In [36]:
m1 = pd.merge(unique_routes_df, route_typologies2, on="route_id", how="left")

In [38]:
m2 = pd.merge(m1, crosswalk, on=["schedule_gtfs_dataset_key"])
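`route_typologies2` no longer carries `schedule_gtfs_dataset_key`, so `m1` matches on `route_id` alone. A `route_id` such as `"10"` that several operators use picks up every operator's typology row. `crosswalk` is built from every service date in `rail_ops_only2`, so it can also hold more than one row per key when `caltrans_district` changed over time. For example, LA Metro is labeled `07 - Los Angeles` on 2024-05-22 and 2024-12-11 and `07 - Los Angeles / Ventura` on 2025-02-12. Either effect would inflate the row counts that follow. Below is a toy sketch of the fan-out and of pandas' `validate` guard; the keys are hypothetical.

```python
import pandas as pd

# Two operators that happen to share route_id "10"
routes = pd.DataFrame(
    {"schedule_gtfs_dataset_key": ["op_a", "op_b"], "route_id": ["10", "10"]}
)
typologies = pd.DataFrame(
    {
        "schedule_gtfs_dataset_key": ["op_a", "op_b"],
        "route_id": ["10", "10"],
        "route_type": ["3", "0"],
    }
)

# Merging on route_id alone: each route matches both typology rows (2 x 2 = 4 rows)
fanned = routes.merge(
    typologies.drop(columns="schedule_gtfs_dataset_key"), on="route_id", how="left"
)

# Merging on operator + route keeps one row per route
keyed = routes.merge(
    typologies,
    on=["schedule_gtfs_dataset_key", "route_id"],
    how="left",
    validate="many_to_one",
)

# validate="many_to_one" raises MergeError when the right-hand keys repeat
try:
    routes.merge(
        typologies.drop(columns="schedule_gtfs_dataset_key"),
        on="route_id",
        how="left",
        validate="many_to_one",
    )
    caught = False
except pd.errors.MergeError:
    caught = True

print(len(fanned), len(keyed), caught)
```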

In [76]:
m2.route_type.unique()

array(['3', '2', nan, '5', '0', '1'], dtype=object)

In [77]:
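# NOTE: in GTFS, route_type "3" is Bus and "0" is Tram/Light rail, so this filter
# keeps bus routes and only picks up light rail when is_rail == 1
rail_only = m2.loc[(m2.is_rail == 1) | m2.route_type.isin(["1", "2", "3"])]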

In [42]:
m2.columns

Index(['schedule_gtfs_dataset_key', 'name', 'route_id', 'route_long_name',
       'route_short_name', 'route_desc', 'route_type', 'combined_name',
       'is_express', 'is_rapid', 'is_rail', 'is_local', 'organization_name',
       'caltrans_district'],
      dtype='object')

In [78]:
rail_only.organization_name.value_counts()

City and County of San Francisco                            1728
San Diego Metropolitan Transit System                       1689
Los Angeles County Metropolitan Transportation Authority     320
Amtrak                                                       126
Capitol Corridor Joint Powers Authority                        4
San Joaquin Regional Rail Commission                           1
Name: organization_name, dtype: int64

### Check whether a route can be identified as rail

In [43]:
schd_vp_url = f"{GTFS_DATA_DICT.digest_tables.dir}{GTFS_DATA_DICT.digest_tables.route_schedule_vp}.parquet"

In [44]:
df = pd.read_parquet(schd_vp_url)

In [62]:
df.caltrans_district.unique()

array(['06 - Fresno', '05 - San Luis Obispo', '04 - Oakland',
       '07 - Los Angeles', '03 - Marysville', '10 - Stockton',
       '07 - Los Angeles / Ventura', '01 - Eureka', None,
       '08 - San Bernardino', '02 - Redding', '11 - San Diego',
       '12 - Irvine', '09 - Bishop'], dtype=object)

In [49]:
df.service_date.unique()

array(['2024-02-14T00:00:00.000000000', '2024-03-13T00:00:00.000000000',
       '2024-04-17T00:00:00.000000000', '2024-05-22T00:00:00.000000000',
       '2024-06-12T00:00:00.000000000', '2024-07-17T00:00:00.000000000',
       '2024-08-14T00:00:00.000000000', '2024-09-18T00:00:00.000000000',
       '2024-10-16T00:00:00.000000000', '2024-11-13T00:00:00.000000000',
       '2024-12-11T00:00:00.000000000', '2025-01-15T00:00:00.000000000',
       '2025-02-12T00:00:00.000000000', '2023-05-17T00:00:00.000000000',
       '2023-06-14T00:00:00.000000000', '2023-07-12T00:00:00.000000000',
       '2023-08-15T00:00:00.000000000', '2023-09-13T00:00:00.000000000',
       '2023-10-11T00:00:00.000000000', '2023-11-15T00:00:00.000000000',
       '2023-12-13T00:00:00.000000000', '2024-01-17T00:00:00.000000000',
       '2023-04-12T00:00:00.000000000', '2023-03-15T00:00:00.000000000'],
      dtype='datetime64[ns]')

In [50]:
df2 = df.loc[
    (df.schedule_gtfs_dataset_key.isin(sched_keys_to_keep))
    & (df.service_date == "2025-02-12")
]

In [51]:
df2.head(1)

Unnamed: 0,schedule_gtfs_dataset_key,direction_id,time_period,avg_scheduled_service_minutes,avg_stop_miles,n_scheduled_trips,frequency,is_express,is_rapid,is_rail,is_coverage,is_downtown_local,is_local,service_date,typology,minutes_atleast1_vp,minutes_atleast2_vp,total_rt_service_minutes,total_scheduled_service_minutes,total_vp,vp_in_shape,is_early,is_ontime,is_late,n_vp_trips,vp_per_minute,pct_in_shape,pct_rt_journey_atleast1_vp,pct_rt_journey_atleast2_vp,pct_sched_journey_atleast1_vp,pct_sched_journey_atleast2_vp,rt_sched_journey_ratio,avg_rt_service_minutes,sched_rt_category,speed_mph,route_long_name,route_short_name,route_combined_name,route_id,base64_url,organization_source_record_id,organization_name,caltrans_district,route_primary_direction,name,schedule_source_record_id
4164,0666caf3ec1ecc96b74f4477ee4bc939,0.0,all_day,81.22,5.1,65,2.71,0.0,0.0,0.0,0.0,1.0,0.0,2025-02-12,downtown_local,6460,6226,9289.97,5156.0,18648,18032,1,5,57,63,2.01,0.97,0.69,0.67,1.0,1.0,1.8,147.46,schedule_and_vp,9.18,Metro Local Line,10/48,10/48 Metro Local Line,10,aHR0cHM6Ly9naXRsYWIuY29tL0xBQ01UQS9ndGZzX2J1cy9yYXcvbWFzdGVyL2d0ZnNfYnVzLnppcA==,recPnGkwdpnr8jmHB,Los Angeles County Metropolitan Transportation Authority,07 - Los Angeles / Ventura,Southbound,LA Metro Bus Schedule,recX8JOPmBQM9aWLC


In [75]:
df2.columns

Index(['schedule_gtfs_dataset_key', 'direction_id', 'time_period',
       'avg_scheduled_service_minutes', 'avg_stop_miles', 'n_scheduled_trips',
       'frequency', 'is_express', 'is_rapid', 'is_rail', 'is_coverage',
       'is_downtown_local', 'is_local', 'service_date', 'typology',
       'minutes_atleast1_vp', 'minutes_atleast2_vp',
       'total_rt_service_minutes', 'total_scheduled_service_minutes',
       'total_vp', 'vp_in_shape', 'is_early', 'is_ontime', 'is_late',
       'n_vp_trips', 'vp_per_minute', 'pct_in_shape',
       'pct_rt_journey_atleast1_vp', 'pct_rt_journey_atleast2_vp',
       'pct_sched_journey_atleast1_vp', 'pct_sched_journey_atleast2_vp',
       'rt_sched_journey_ratio', 'avg_rt_service_minutes', 'sched_rt_category',
       'speed_mph', 'route_long_name', 'route_short_name',
       'route_combined_name', 'route_id', 'base64_url',
       'organization_source_record_id', 'organization_name',
       'caltrans_district', 'route_primary_direction', 'name',
       'schedule_source_record_id'],
      dtype='object')

### Why do operators that certainly run rail, such as SF Muni and Amtrak, show `is_rail == 0` for their rail routes?

In [55]:
df2.sched_rt_category.value_counts()

schedule_and_vp    2591
vp_only             186
schedule_only        33
Name: sched_rt_category, dtype: int64

In [52]:
df2.is_rail.value_counts()

0.00    2371
1.00      94
Name: is_rail, dtype: int64
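`value_counts()` drops missing values by default. The 2,371 + 94 = 2,465 rows counted here are therefore fewer than the 2,810 rows counted by `sched_rt_category` (2,591 + 186 + 33), so at least 345 rows of `df2` have a missing `is_rail`. Passing `dropna=False`, or cross-tabulating missingness against `sched_rt_category`, shows where those rows sit. A toy sketch with made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows: the vp_only row has no is_rail value
toy = pd.DataFrame(
    {
        "sched_rt_category": ["schedule_and_vp", "schedule_and_vp", "vp_only", "schedule_only"],
        "is_rail": [0.0, 1.0, np.nan, 0.0],
    }
)

# dropna=False keeps a count for missing values
counts = toy["is_rail"].value_counts(dropna=False)

# Count missing vs present is_rail within each category
missing_by_category = pd.crosstab(
    toy["sched_rt_category"],
    toy["is_rail"].isna().map({True: "missing", False: "present"}),
)
print(counts)
print(missing_by_category)
```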

In [74]:
df2.groupby(
    [
        "organization_name",
        "sched_rt_category",
    ]
).agg(
    {
        "is_rail": "sum",
    }
)

| organization_name | sched_rt_category | is_rail |
|---|---|---|
| Amtrak | schedule_only | 0.0 |
| Amtrak | vp_only | 0.0 |
| Amtrak | schedule_and_vp | 0.0 |
| Capitol Corridor Joint Powers Authority | schedule_only | 0.0 |
| Capitol Corridor Joint Powers Authority | vp_only | 0.0 |
| Capitol Corridor Joint Powers Authority | schedule_and_vp | 0.0 |
| City and County of San Francisco | schedule_only | 0.0 |
| City and County of San Francisco | vp_only | 0.0 |
| City and County of San Francisco | schedule_and_vp | 58.0 |
| Flagship Cruises and Events Inc. | schedule_only | 0.0 |


In [57]:
rail_only.organization_name.value_counts()

Los Angeles County Metropolitan Transportation Authority    36
City and County of San Francisco                            35
San Diego Metropolitan Transit System                       14
Amtrak                                                       7
Capitol Corridor Joint Powers Authority                      2
San Joaquin Regional Rail Commission                         1
Name: organization_name, dtype: int64

In [65]:
rail_only.columns

Index(['schedule_gtfs_dataset_key', 'name', 'route_id', 'route_long_name',
       'route_short_name', 'route_desc', 'route_type', 'combined_name',
       'is_express', 'is_rapid', 'is_rail', 'is_local', 'organization_name',
       'caltrans_district'],
      dtype='object')

In [67]:
# https://gtfs.org/documentation/schedule/reference/#
route_type_crosswalk = {
    "route_type": ["0", "1", "2", "3", "4", "5", "6", "7", "11", "12"],
    "route_type_str": [
        "Tram, Streetcar, Light rail",
        "Subway, Metro",
        "Rail",
        "Bus",
        "Ferry.",
        "Cable tram.",
        "Aerial lift, suspended cable car (e.g., gondola lift, aerial tramway).",
        "Funicular.",
        "Trolleybus.",
        "Monorail.",
    ],
}

In [69]:
route_type_crosswalk_df = pd.DataFrame(route_type_crosswalk)

In [70]:
route_type_crosswalk_df

| | route_type | route_type_str |
|---|---|---|
| 0 | 0 | Tram, Streetcar, Light rail |
| 1 | 1 | Subway, Metro |
| 2 | 2 | Rail |
| 3 | 3 | Bus |
| 4 | 4 | Ferry. |
| 5 | 5 | Cable tram. |
| 6 | 6 | Aerial lift, suspended cable car (e.g., gondola lift, aerial tramway). |
| 7 | 7 | Funicular. |
| 8 | 11 | Trolleybus. |
| 9 | 12 | Monorail. |
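In the GTFS `route_type` codes above, the core rail modes are `0` (tram, streetcar, light rail), `1` (subway, metro), and `2` (rail). Code `3` is bus and `4` is ferry. The earlier `rail_only` filter matches `["1", "2", "3"]`, which takes in bus routes and leaves out light rail. Below is a sketch that labels and flags routes from `route_type` strings. Which codes count as "rail" is a judgment call, so `RAIL_CODES` is an assumption; it also includes cable tram, funicular, and monorail.

```python
import pandas as pd

# GTFS routes.txt route_type codes, stored as strings as in route_typologies
ROUTE_TYPE_LABELS = {
    "0": "Tram, Streetcar, Light rail",
    "1": "Subway, Metro",
    "2": "Rail",
    "3": "Bus",
    "4": "Ferry",
    "5": "Cable tram",
    "6": "Aerial lift",
    "7": "Funicular",
    "11": "Trolleybus",
    "12": "Monorail",
}
# Assumption: which modes the digest should treat as rail / ferry
RAIL_CODES = {"0", "1", "2", "5", "7", "12"}
FERRY_CODES = {"4"}

routes = pd.DataFrame(
    {"route_id": ["a", "b", "c", "d", "e"], "route_type": ["3", "2", "0", "4", None]}
)
routes["route_type_str"] = routes["route_type"].map(ROUTE_TYPE_LABELS)
# Missing route_type values are not in either set, so they come out False
routes["is_rail_by_type"] = routes["route_type"].isin(RAIL_CODES)
routes["is_ferry_by_type"] = routes["route_type"].isin(FERRY_CODES)
print(routes)
```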


In [79]:
rail_only[
    [
        "organization_name",
        "route_id",
        "route_long_name",
        "route_short_name",
        "route_desc",
        "route_type",
        "is_rail"
    ]
].sort_values(by=["organization_name"])

| | organization_name | route_id | route_long_name | route_short_name | route_desc | route_type | is_rail |
|---|---|---|---|---|---|---|---|
| 3116 | Amtrak | 95 | Adirondack | | | 3 | 0.0 |
| 3142 | Amtrak | 84 | Capitol Corridor | | | 3 | 0.0 |
| 3143 | Amtrak | 84 | Capitol Corridor | | | 3 | 0.0 |
| 3144 | Amtrak | 84 | Capitol Corridor | | | 3 | 0.0 |
| 3145 | Amtrak | 84 | Capitol Corridor | | | 3 | 0.0 |
| 3146 | Amtrak | 84 | Capitol Corridor | | | 2 | 1.0 |
| 3147 | Amtrak | 88 | Northeast Regional | | | 3 | 0.0 |
| 3148 | Amtrak | 88 | Northeast Regional | | | 3 | 0.0 |
| 3149 | Amtrak | 88 | Northeast Regional | | | 3 | 0.0 |
| 3150 | Amtrak | 78 | Pacific Surfliner | | | 3 | 0.0 |


## Scheduled Trips

In [None]:
scheduled_trips_df = pd.concat(
    [
        helpers.import_scheduled_trips(
            analysis_date,
            columns=[
                "gtfs_dataset_key",
                "name",
                "route_id",
                "route_long_name",
                "route_short_name",
                "route_desc",
            ],
            get_pandas=True,
        ).assign(service_date=pd.to_datetime(analysis_date))
        for analysis_date in analysis_date_list
    ],
    axis=0,
    ignore_index=True,
)

In [None]:
scheduled_trips_df.head(1)

### Find the ferry operators

In [None]:
scheduled_trips_df.loc[scheduled_trips_df.name.str.contains("Ferry")][
    ["name"]
].drop_duplicates()

In [None]:
scheduled_trips_df.columns

In [None]:
ferry_schd_keys = list(
    scheduled_trips_df.loc[
        scheduled_trips_df.name.str.contains("Ferry")
    ].schedule_gtfs_dataset_key.unique()
)

In [None]:
ferry_names = list(
    scheduled_trips_df.loc[scheduled_trips_df.name.str.contains("Ferry")].name.unique()
)

In [None]:
scheduled_trips_df2 = scheduled_trips_df.loc[
    scheduled_trips_df.schedule_gtfs_dataset_key.isin(ferry_schd_keys)
]

In [None]:
len(scheduled_trips_df2)

In [None]:
scheduled_trips_df2.head(2)

In [None]:
# scheduled_trips_df2

## Scheduled Shapes 

In [None]:
TABLE = GTFS_DATA_DICT.schedule_downloads.shapes
FILE = f"{COMPILED_CACHED_VIEWS}{TABLE}_{analysis_date}.parquet"

In [None]:
shapes = gpd.read_parquet(FILE)

In [None]:
shapes.columns

In [None]:
scheduled_shapes_df = helpers.import_scheduled_shapes(
    analysis_date,
    columns=["shape_array_key", "geometry"],
    get_pandas=True,
    crs=PROJECT_CRS,
)

In [None]:
scheduled_shapes_df.columns
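The shapes read above are not yet narrowed to the ferry feeds, and the shapes table is keyed by `shape_array_key` rather than by operator. One option is to collect the `shape_array_key` values from the trips of the operators of interest. This sketch assumes the trips table can be imported with a `shape_array_key` column; the trips import above does not request one. The toy frames stand in for `scheduled_trips_df` and `scheduled_shapes_df`, with geometry omitted.

```python
import pandas as pd

# Toy trips: two ferry trips on different shapes, one bus trip
trips = pd.DataFrame(
    {
        "schedule_gtfs_dataset_key": ["ferry_key", "ferry_key", "bus_key"],
        "shape_array_key": ["s1", "s2", "s3"],
    }
)
# Toy shapes table (geometry column omitted)
shapes = pd.DataFrame({"shape_array_key": ["s1", "s2", "s3", "s4"]})
ferry_schd_keys = ["ferry_key"]

# Shapes used by at least one trip of the selected operators
ferry_shape_keys = trips.loc[
    trips.schedule_gtfs_dataset_key.isin(ferry_schd_keys), "shape_array_key"
].unique()
ferry_shapes = shapes.loc[shapes.shape_array_key.isin(ferry_shape_keys)]
print(ferry_shapes)
```

The same `.loc` / `.isin` filter works unchanged on a GeoDataFrame, so the result could be passed straight to `.explore()`.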

## Scheduled Stops

In [None]:
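# NOTE: this reads the same stop_times_direction table as the Stop Times section;
# it is not a separate stops table
TABLE = GTFS_DATA_DICT.rt_vs_schedule_tables.stop_times_direction
FILE = f"{RT_SCHED_GCS}{TABLE}_{analysis_date}.parquet"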

In [None]:
stops_df = gpd.read_parquet(FILE)

In [None]:
stops_df.columns

In [None]:
stops_df2 = stops_df.loc[stops_df.schedule_gtfs_dataset_key.isin(ferry_schd_keys)]

In [None]:
len(stops_df2)

In [None]:
# stops_df2.explore()

## Scheduled Stop Times

In [None]:
TABLE = GTFS_DATA_DICT.rt_vs_schedule_tables.stop_times_direction
FILE = f"{RT_SCHED_GCS}{TABLE}_{analysis_date}.parquet"

In [None]:
sched_stops = gpd.read_parquet(FILE)

In [None]:
sched_stops.columns

In [None]:
sched_stops2 = sched_stops.loc[
    sched_stops.schedule_gtfs_dataset_key.isin(ferry_schd_keys)
]

In [None]:
# sched_stops2.explore()