## Update `trips`
* Setup: `cd rt_segment_speeds && pip install -r requirements.txt && cd ../_shared_utils && make setup_env`
* https://github.com/cal-itp/data-analyses/pull/1016
    * Keep source data + metrics tightly defined with GCS bucket organization.
    * `vp_usable` is the source data for rt_vs_sched metrics; do not merge in schedule data until the gtfs_digest report. Only bring in the `schedule_gtfs_dataset_key` column.
    * Add `route_id` and `direction_id` to `vp_usable` for trips also present in the schedule. If a trip is not in the schedule, fill `route_id` with `Unknown` and keep `direction_id` as nullable `Int64`.
    * Add function to concatenate trip file, enable us to put in 1 day or 7 days for aggregation
    * A single function for normalized metrics (percent, per min, etc)
    * A single function for aggregation (summing up numerator / denominator)
    
* https://github.com/cal-itp/data-analyses/issues/989
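
The fill rule in the PR notes above (`route_id = Unknown`, `direction_id` as `Int64`) can be sketched on a toy table. This is only an illustration: the two rows are made up, and the second one stands in for a trip with no schedule match.

```python
import pandas as pd

# Illustrative trip table after a left merge against the schedule;
# the second trip had no schedule match, so its schedule columns are missing.
trips = pd.DataFrame(
    {
        "trip_instance_key": ["a", "b"],
        "route_id": ["17", None],
        "direction_id": [0.0, None],
    }
)

trips = trips.assign(
    route_id=trips.route_id.fillna("Unknown"),
    # Nullable integer dtype keeps <NA> instead of inventing a direction of 0.
    direction_id=trips.direction_id.astype("Int64"),
)

print(trips.dtypes)
print(trips)
```

The nullable `Int64` dtype lets the column hold whole numbers and missing values at the same time, which a plain `int64` column cannot do.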

* Notes 2/6
    * GTFS digest creates four datasets: schedule, average speeds, segment speeds, and rt vs schedule
    * Currently, merging is challenging.
    * Time categories are not necessarily the same (peak/offpeak/all-day)
    * Want all datasets to merge on the same set of columns (schedule gtfs key, route id, dir id, service date, and time categories) because `shapes` are unstable.
    * `Route ID` has been stabilized by Tiffany 
    * Update work from `rt_v_scheduled.py` (steps already outlined in `scripts/route_aggregation.ipynb`)
        * Do steps up until row 339 when the % are calculated. 
        * Take away `speeds`.
        * Bring in schedule gtfs key, trip instance key, route id, direction id either at the beginning or the end using `helpers.import_scheduled_trips`
        * Coerce `direction_id` to Int64; don't fill it in with 0. A missing value is NaN, not 0.
        * Save files with the analysis date at the end instead of the beginning.
        * Split off the workstream -> one for trip level and one for route level
            * Use the config.yml to save the trips and routes stuff into their own folder.
            * Routes:
                * For routes, the minutes/pings should be totalled up. Currently we take an average of averages, which isn't accurate because every trip gets the same weight regardless of how long it ran.
                * The route level should be able to take multiple days of data and concatenate so we can get metrics for a week/2 weeks/etc instead of for a single day. [Done here](https://github.com/cal-itp/data-analyses/blob/main/rt_segment_speeds/scripts/average_speeds.py)
                * Add the route frequency as well?
            * Trips:
                * Do steps up until row 339 in `rt_v_scheduled.py`
                * Write a new generalized function to create all the % 
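
To see why the route-level notes ask for totals rather than an average of averages, here is a small sketch with two made-up trips on one route. The column names match the trip table, but the numbers are illustrative only.

```python
import pandas as pd

# Two illustrative trips on the same route: one short, one long.
trips = pd.DataFrame(
    {
        "route_id": ["17", "17"],
        "total_pings_for_trip": [10, 270],
        "rt_service_min": [10, 90],
    }
)

# Average of per-trip ratios: every trip counts equally.
mean_of_ratios = (trips.total_pings_for_trip / trips.rt_service_min).mean()

# Ratio of route totals: longer trips contribute more minutes and pings.
totals = trips.groupby("route_id")[["total_pings_for_trip", "rt_service_min"]].sum()
ratio_of_totals = totals.total_pings_for_trip / totals.rt_service_min

print(mean_of_ratios)             # (1.0 + 3.0) / 2
print(ratio_of_totals.loc["17"])  # 280 / 100
```

The two results differ (2.0 versus 2.8) because the short trip drags down the simple mean even though it accounts for only 10 of the 100 service minutes.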
            
* Notes 2/13
    * Figure out how to set up Config file
    * Tiffany:
        * `add_metrics` looks good; just move the coercing of percents to 0-100 into a separate function. I want everything from 0-1 and then, before charting, scaled up to 0-100 all at once. Can you write a general function for this? All the chart display / cleaning functions should live in 1 script in `segment_speed_utils`.
        * Another tweak for a step somewhere before `add_metrics`. Certain columns can be coerced to be integers, like `total_vp` and `vp_in_shape`, just like how `total_min_w_gtfs` is an integer. Coerce all the ones that can be integers to be integers for your trip table, and this will save on the rounding step later.
        * Column naming: think about how you want to change the column names. `total_pings_for_trip` is not going to make sense once you aggregate, so maybe go with something more generic. Otherwise, you're going to be aggregating and renaming columns constantly. I would just rely on the other columns in the row to tell us whether it's per trip or per route, and have the metrics all use generic names that are suitable for passing through aggregation functions.
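
A minimal sketch of the two helpers requested in the 2/13 notes, assuming they take a dataframe plus a list of column names. The function names `scale_pct_for_display` and `coerce_int_cols` are hypothetical and do not exist in `segment_speed_utils`.

```python
import pandas as pd


def scale_pct_for_display(df: pd.DataFrame, pct_cols: list) -> pd.DataFrame:
    """Hypothetical helper: return a copy with 0-1 proportions scaled to 0-100."""
    df = df.copy()
    df[pct_cols] = df[pct_cols] * 100
    return df


def coerce_int_cols(df: pd.DataFrame, int_cols: list) -> pd.DataFrame:
    """Hypothetical helper: cast whole-number columns to nullable Int64."""
    return df.astype({c: "Int64" for c in int_cols})


# Illustrative values only.
metrics = pd.DataFrame(
    {"spatial_accuracy_pct": [0.5, 1.0], "total_vp": [83.0, None]}
)

trip_table = coerce_int_cols(metrics, ["total_vp"])
chart_table = scale_pct_for_display(trip_table, ["spatial_accuracy_pct"])
print(chart_table)
```

Keeping the stored metrics on the 0-1 scale and scaling only in the display step means aggregation code never has to guess which scale a column is on.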

In [1]:
import dask.dataframe as dd
import pandas as pd
import yaml
from segment_speed_utils import gtfs_schedule_wrangling, helpers
from segment_speed_utils.project_vars import RT_SCHED_GCS, SEGMENT_GCS
from shared_utils import portfolio_utils, rt_dates, rt_utils

In [2]:
# Times
import datetime
from loguru import logger

In [3]:
pd.options.display.max_columns = 100
pd.options.display.float_format = "{:.2f}".format
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

In [4]:
# analysis_date = rt_dates.DATES["dec2023"]

### Load in `rt_v_scheduled_trip` functions

In [5]:
dec_df = pd.read_parquet("./ah_testing_2023-12-01.parquet")

In [6]:
len(dec_df)

86128

In [7]:
dec_df.sample()

Unnamed: 0,schedule_gtfs_dataset_key,trip_instance_key,rt_service_min,min_w_atleast2_trip_updates,total_pings_for_trip,total_min_w_gtfs,total_vp,vp_in_shape
55285,baeeb157e85a901e47b828ef9fe75091,c8ce559378a7af94523a8334f3e2fb9c,37.18,32,83,37,83.0,83.0


In [8]:
nov_df = pd.read_parquet("./ah_testing_2023-11-15.parquet")

In [9]:
nov_df.sample()

Unnamed: 0,schedule_gtfs_dataset_key,trip_instance_key,rt_service_min,min_w_atleast2_trip_updates,total_pings_for_trip,total_min_w_gtfs,total_vp,vp_in_shape
42804,cc53a0dbf5df90e3009b9cb5d89d80ba,57dce45cf5f041d3951cdd72c63b1059,34.0,34,100,34,100.0,93.0


In [10]:
len(nov_df)

86832

### Add back routes-schedule-trip instance
* This will go into rt_v_scheduled.trip
#### Fix time_of_day buckets
* https://github.com/cal-itp/data-analyses/blob/route_agg/rt_segment_speeds/segment_speed_utils/gtfs_schedule_wrangling.py


In [11]:
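def temp_function(df: pd.DataFrame, analysis_date: str) -> pd.DataFrame:
    """
    Add route_id, direction_id, and peak/offpeak time buckets
    to the trip-level vp table, and flag whether each trip
    was found in the schedule (vp_sched) or not (vp_only).
    """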
    routes_df = helpers.import_scheduled_trips(
        analysis_date,
        columns=[
            "gtfs_dataset_key",
            "route_id",
            "direction_id",
            "trip_instance_key",
        ],
        get_pandas=True,
    )

    df2 = pd.merge(
        df,
        routes_df,
        on=["schedule_gtfs_dataset_key", "trip_instance_key"],
        how="left",
        indicator="sched_rt_category",
    )

    df2 = df2.assign(
        route_id=df2.route_id.fillna("Unknown"),
        direction_id=df2.direction_id.astype("Int64"),
        total_vp = df2.total_vp.fillna(0).astype("Int64"),
        vp_in_shape = df2.vp_in_shape.fillna(0).astype("Int64"),
        rt_service_min = df2.rt_service_min.round(0),
        sched_rt_category=df2.apply(
            lambda x: "vp_only" if x.sched_rt_category == "left_only" else "vp_sched",
            axis=1,
        ),
    )

    sched_time_of_day = gtfs_schedule_wrangling.get_trip_time_buckets(analysis_date)[
        ["trip_instance_key", "time_of_day", "service_minutes"]
    ].pipe(gtfs_schedule_wrangling.add_peak_offpeak_column)[
        ["trip_instance_key", "service_minutes", "peak_offpeak"]
    ]

    df3 = pd.merge(df2, sched_time_of_day, on="trip_instance_key", how="left")

    rt_time_of_day = gtfs_schedule_wrangling.get_vp_trip_time_buckets(analysis_date)

    df4 = pd.merge(
        df3,
        rt_time_of_day,
        on=["schedule_gtfs_dataset_key", "trip_instance_key"],
        how="inner",
    )
    df4 = df4.assign(
        peak_offpeak=df4.peak_offpeak_x.fillna(df4.peak_offpeak_y)
    )
    
    df4 = df4.drop(
            columns=["peak_offpeak_x", "peak_offpeak_y"]
        )

    return df4
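
The schedule-match flag in `temp_function` relies on the `indicator` argument of `pd.merge`. The sketch below shows the same idea on two toy tables and relabels the indicator with `np.where` instead of a row-wise `apply`. That swap is an assumption about what might be faster on large tables, not something the notebook does, and the data is made up.

```python
import numpy as np
import pandas as pd

# Illustrative inputs: trip "b" has vehicle positions but no schedule row.
vp = pd.DataFrame({"trip_instance_key": ["a", "b"], "total_vp": [83, 100]})
sched = pd.DataFrame({"trip_instance_key": ["a"], "route_id": ["17"]})

merged = pd.merge(
    vp,
    sched,
    on="trip_instance_key",
    how="left",
    indicator="sched_rt_category",  # values are "both" or "left_only" for a left merge
)

# Vectorized relabel of the merge indicator.
merged["sched_rt_category"] = np.where(
    merged["sched_rt_category"] == "left_only", "vp_only", "vp_sched"
)
print(merged)
```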

In [12]:
start = datetime.datetime.now()
print(start)
nov_df2 = temp_function(nov_df, rt_dates.DATES["nov2023"])
end = datetime.datetime.now()
print(end)

2024-02-14 10:21:51.718248
2024-02-14 10:23:57.022639


In [13]:
start = datetime.datetime.now()
print(start)
dec_df2 = temp_function(dec_df, rt_dates.DATES["dec2023"])
end = datetime.datetime.now()
print(end)

2024-02-14 10:23:57.032353
2024-02-14 10:24:57.108115


### Trips: add back metrics

In [14]:
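def add_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate metrics as proportions (0-1) from the
    numerator / denominator columns, then drop the inputs.
    Works for trip-level or aggregated route-level tables.
    """
    # Work on a copy so the caller's dataframe is not modified in place
    df = df.copy()

    df["pings_per_min"] = df.total_pings_for_trip / df.rt_service_min
    df["spatial_accuracy_pct"] = df.vp_in_shape / df.total_vp
    df["rt_w_gtfs_pct"] = df.total_min_w_gtfs / df.rt_service_min
    df["rt_v_scheduled_time_pct"] = df.rt_service_min / df.service_minutes - 1

    # Mask rt_w_gtfs_pct for any values above 100% (they become NaN)
    df.rt_w_gtfs_pct = df.rt_w_gtfs_pct.mask(df.rt_w_gtfs_pct > 1)

    # Drop the input columns that are no longer needed
    drop_cols = ["total_pings_for_trip", "vp_in_shape", "total_vp", "total_min_w_gtfs"]
    df = df.drop(columns=drop_cols)
    return df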

In [15]:
dec_trip = add_metrics(dec_df2)

In [16]:
dec_trip.sample()

Unnamed: 0,schedule_gtfs_dataset_key,trip_instance_key,rt_service_min,min_w_atleast2_trip_updates,route_id,direction_id,sched_rt_category,service_minutes,peak_offpeak,pings_per_min,spatial_accuracy_pct,rt_w_gtfs_pct,rt_v_scheduled_time_pct
18139,3f3f36b4c41cc6b5df3eb7f5d8ea6e3c,d03d31399ab7fe02eb9ea2414226ef74,69.0,2,66-13172,0,vp_sched,56.0,offpeak,0.42,1.0,0.39,0.23


In [17]:
nov_trip = add_metrics(nov_df2)

In [18]:
nov_trip.sample()

Unnamed: 0,schedule_gtfs_dataset_key,trip_instance_key,rt_service_min,min_w_atleast2_trip_updates,route_id,direction_id,sched_rt_category,service_minutes,peak_offpeak,pings_per_min,spatial_accuracy_pct,rt_w_gtfs_pct,rt_v_scheduled_time_pct
48884,7dbe3e19a4966e0c0531fa826e0446d8,53e4977809d115622f5b9ee8db377249,37.0,36,525,1,vp_sched,29.0,peak,2.97,0.65,,0.28


In [19]:
nov_trip.sched_rt_category.value_counts()

vp_sched    78190
vp_only      8642
Name: sched_rt_category, dtype: int64

In [20]:
dec_trip.sched_rt_category.value_counts()

vp_sched    77977
vp_only      8151
Name: sched_rt_category, dtype: int64

In [21]:
nov_trip.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 86832 entries, 0 to 86831
Data columns (total 13 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   schedule_gtfs_dataset_key    86832 non-null  object 
 1   trip_instance_key            86832 non-null  object 
 2   rt_service_min               86832 non-null  float64
 3   min_w_atleast2_trip_updates  86832 non-null  int64  
 4   route_id                     86832 non-null  object 
 5   direction_id                 76528 non-null  Int64  
 6   sched_rt_category            86832 non-null  object 
 7   service_minutes              78190 non-null  float64
 8   peak_offpeak                 86832 non-null  object 
 9   pings_per_min                86832 non-null  float64
 10  spatial_accuracy_pct         86832 non-null  Float64
 11  rt_w_gtfs_pct                79356 non-null  float64
 12  rt_v_scheduled_time_pct      78190 non-null  float64
dtypes: Float64(1), I

### Routes add multiple days

In [22]:
def concatenate_trip_segment_speeds(analysis_date_list: list) -> pd.DataFrame:
    """
    Concatenate the trip parquets together,
    whether it's for single day or multi-day averages.
    Add a service_date column so each row keeps its date.
    """
    # Eventually read from GCS instead of the local test files:
    # SPEED_FILE = dict_inputs["stage4"]
    # pd.read_parquet(f"{SEGMENT_GCS}{SPEED_FILE}_{analysis_date}.parquet")
    df = pd.concat(
        [
            pd.read_parquet(f"./concat_test_{analysis_date}.parquet").assign(
                service_date=pd.to_datetime(analysis_date)
            )
            for analysis_date in analysis_date_list
        ],
        axis=0,
        ignore_index=True,
    )
    return df
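
The same concatenate-and-tag pattern can be tried without any parquet files. In this sketch the per-day tables are small in-memory stand-ins (made-up rows) for the files the function reads.

```python
import pandas as pd

# Stand-ins for the per-day parquet files; the notebook reads these from disk.
daily = {
    "2023-11-15": pd.DataFrame({"trip_instance_key": ["a", "b"]}),
    "2023-12-01": pd.DataFrame({"trip_instance_key": ["c"]}),
}

combined = pd.concat(
    [df.assign(service_date=pd.to_datetime(day)) for day, df in daily.items()],
    axis=0,
    ignore_index=True,  # rebuild a clean 0..n-1 index across days
)
print(combined)
```

Passing a one-item list of dates gives the single-day table, so the same function covers 1-day and multi-day aggregation.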

In [23]:
all_routes = concatenate_trip_segment_speeds(["2023-11-15", "2023-12-01"])

In [24]:
all_routes.sched_rt_category.value_counts()

vp_sched    156167
Name: sched_rt_category, dtype: int64

In [25]:
all_routes.head()

Unnamed: 0,schedule_gtfs_dataset_key,trip_instance_key,rt_service_min,min_w_atleast2_trip_updates,total_pings_for_trip,total_min_w_gtfs,total_vp,vp_in_shape,route_id,direction_id,sched_rt_category,service_minutes,peak_offpeak,service_date
0,63029a23cb0e73f2a5d98a345c5e2e40,56f15f118776aaafbf3a1c69c5821c14,62.38,62,185,63,185.0,144.0,3428,1,vp_sched,58.0,offpeak,2023-11-15
1,63029a23cb0e73f2a5d98a345c5e2e40,4244cbaa19bdbc3f6e4cc95cb792ccb0,67.7,67,201,68,201.0,147.0,3428,1,vp_sched,58.0,offpeak,2023-11-15
2,63029a23cb0e73f2a5d98a345c5e2e40,ce51c00d412991d09ad1de4ea2715f6e,127.38,127,377,127,377.0,207.0,3428,0,vp_sched,58.0,peak,2023-11-15
3,63029a23cb0e73f2a5d98a345c5e2e40,d01f03119c56bdda01210558a6f25ec2,152.02,151,449,151,449.0,186.0,3428,0,vp_sched,58.0,peak,2023-11-15
4,63029a23cb0e73f2a5d98a345c5e2e40,90e793547709584c8921f0786f9d310f,76.3,75,227,76,227.0,124.0,3429,1,vp_sched,55.0,offpeak,2023-11-15


In [26]:
all_routes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 156167 entries, 0 to 156166
Data columns (total 14 columns):
 #   Column                       Non-Null Count   Dtype         
---  ------                       --------------   -----         
 0   schedule_gtfs_dataset_key    156167 non-null  object        
 1   trip_instance_key            156167 non-null  object        
 2   rt_service_min               156167 non-null  float64       
 3   min_w_atleast2_trip_updates  156167 non-null  int64         
 4   total_pings_for_trip         156167 non-null  int64         
 5   total_min_w_gtfs             156167 non-null  int64         
 6   total_vp                     149500 non-null  float64       
 7   vp_in_shape                  149500 non-null  float64       
 8   route_id                     156167 non-null  object        
 9   direction_id                 153098 non-null  Int64         
 10  sched_rt_category            156167 non-null  object        
 11  service_minutes           

#### Add back metrics

In [27]:
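def weighted_average_function(df: pd.DataFrame, group_cols: list) -> pd.DataFrame:
    """
    Sum the numerator / denominator columns by group_cols + peak_offpeak,
    count the trips in each group, then recalculate the metrics
    from those totals so longer trips carry more weight.
    """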
    sum_cols = [
        "total_min_w_gtfs",
        "rt_service_min",
        "total_pings_for_trip",
        "service_minutes",
        "total_vp",
        "vp_in_shape",
    ]

    count_cols = ["trip_instance_key"]
    df2 = (
        df.groupby(group_cols + ["peak_offpeak"])
        .agg({**{e: "sum" for e in sum_cols}, **{e: "count" for e in count_cols}})
        .reset_index()
    )

    df2 = df2.rename(columns={"trip_instance_key": "n_trips"})

    df2 = add_metrics(df2)

    return df2

In [28]:
all_routes2 = weighted_average_function(
    all_routes,
    ["schedule_gtfs_dataset_key", "route_id", "direction_id", "sched_rt_category"],
)

In [29]:
all_routes2.columns

Index(['schedule_gtfs_dataset_key', 'route_id', 'direction_id',
       'sched_rt_category', 'peak_offpeak', 'rt_service_min',
       'service_minutes', 'n_trips', 'pings_per_min', 'spatial_accuracy_pct',
       'rt_w_gtfs_pct', 'rt_v_scheduled_time_pct'],
      dtype='object')

#### `rt_v_scheduled_time_pct` -> just delete entirely?
* This measures trip timeliness: how much longer (or shorter) a trip took in RT data compared to its scheduled duration, as a proportion of the scheduled duration.
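
As a quick check of the formula in `add_metrics` (`rt_service_min / service_minutes - 1`), the snippet below recomputes the metric for two of the route 17 rows shown in the output that follows (indexes 0 and 445).

```python
# (rt_service_min, service_minutes) copied from two route 17 rows.
rows = [(1539.42, 1134.0), (212.5, 224.0)]

# Positive means RT trips ran longer than scheduled; negative means shorter.
pcts = [round(rt / sched - 1, 2) for rt, sched in rows]
print(pcts)
```

The first group ran about 36% longer than scheduled in total, and the second about 5% shorter, matching the `rt_v_scheduled_time_pct` column.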

In [30]:
all_routes2.loc[all_routes2.route_id == "17"]

Unnamed: 0,schedule_gtfs_dataset_key,route_id,direction_id,sched_rt_category,peak_offpeak,rt_service_min,service_minutes,n_trips,pings_per_min,spatial_accuracy_pct,rt_w_gtfs_pct,rt_v_scheduled_time_pct
0,015d67d5b75b5cf2b710bbadadfb75f5,17,0,vp_sched,offpeak,1539.42,1134.0,20,2.91,0.76,0.98,0.36
1,015d67d5b75b5cf2b710bbadadfb75f5,17,0,vp_sched,peak,1723.65,1380.0,24,2.84,0.9,0.96,0.25
2,015d67d5b75b5cf2b710bbadadfb75f5,17,1,vp_sched,offpeak,1433.97,1138.0,22,2.79,0.87,0.94,0.26
3,015d67d5b75b5cf2b710bbadadfb75f5,17,1,vp_sched,peak,1309.88,1190.0,22,2.8,0.95,0.95,0.1
445,1ebafaca8716652559b2017b6eedc4ef,17,0,vp_sched,peak,212.5,224.0,6,1.97,1.0,1.0,-0.05
521,239f3baf3dd3b9e9464f66a777f9897d,17,0,vp_sched,offpeak,190.1,122.0,11,1.11,0.8,0.86,0.56
522,239f3baf3dd3b9e9464f66a777f9897d,17,0,vp_sched,peak,230.85,134.0,12,1.07,0.74,0.84,0.72
523,239f3baf3dd3b9e9464f66a777f9897d,17,1,vp_sched,offpeak,147.85,171.0,9,1.23,0.74,0.93,-0.14
524,239f3baf3dd3b9e9464f66a777f9897d,17,1,vp_sched,peak,145.65,171.0,9,1.24,0.8,0.95,-0.15
1721,43d8d305ee692724a532f30ea63a1cbe,17,1,vp_sched,offpeak,2787.37,1944.0,36,1.91,0.97,0.97,0.43


### Cleaning Function
* Prep for export and charts

In [31]:
all_routes2.columns

Index(['schedule_gtfs_dataset_key', 'route_id', 'direction_id',
       'sched_rt_category', 'peak_offpeak', 'rt_service_min',
       'service_minutes', 'n_trips', 'pings_per_min', 'spatial_accuracy_pct',
       'rt_w_gtfs_pct', 'rt_v_scheduled_time_pct'],
      dtype='object')

In [32]:
all_routes2.sample()

Unnamed: 0,schedule_gtfs_dataset_key,route_id,direction_id,sched_rt_category,peak_offpeak,rt_service_min,service_minutes,n_trips,pings_per_min,spatial_accuracy_pct,rt_w_gtfs_pct,rt_v_scheduled_time_pct
5671,f1b35a50955aeb498533c1c6fdafbe44,102,1,vp_sched,peak,1237.83,758.0,16,1.57,0.91,0.94,0.63


In [33]:
pct_cols = ["rt_w_gtfs_pct", "rt_v_scheduled_time_pct", "spatial_accuracy_pct"]

In [34]:
int_cols = ["rt_service_min", "service_minutes"]

In [35]:
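def clean_df(df: pd.DataFrame, pct_cols: list, int_cols: list) -> pd.DataFrame:
    """
    Prep for export and charts: scale proportions to 0-100,
    round the minute columns, and title-case the column names.
    """
    # Copy so the input dataframe keeps its original values and column names
    df = df.copy()
    for i in pct_cols:
        df[i] = df[i] * 100
    for i in int_cols:
        df[i] = df[i].fillna(0).round()

    df.columns = df.columns.str.replace("_", " ").str.strip().str.title()
    return df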

In [36]:
all_routes3 = clean_df(all_routes2, pct_cols, int_cols)

#### Why are there missing `Rt W Gtfs Pct` values even if it's `vp_sched`?
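
One likely explanation, judging from `add_metrics`: any `rt_w_gtfs_pct` above 1 is masked to NaN, and `total_min_w_gtfs` is a whole number that can exceed the fractional `rt_service_min`. The sketch below uses two minute pairs taken from trip rows printed in this notebook. The `clip` alternative at the end is only a suggestion, not what the notebook does.

```python
import pandas as pd

# (total_min_w_gtfs, rt_service_min) pairs from trips shown in the notebook.
routes = pd.DataFrame(
    {
        "total_min_w_gtfs": [63, 46],
        "rt_service_min": [62.38, 46.35],
    }
)

ratio = routes.total_min_w_gtfs / routes.rt_service_min

# What add_metrics does: values above 1 become NaN.
masked = ratio.mask(ratio > 1)

# Possible alternative (an assumption): cap at 1 instead of dropping the value.
clipped = ratio.clip(upper=1)

print(pd.DataFrame({"ratio": ratio, "masked": masked, "clipped": clipped}))
```

The first pair gives a ratio just over 1, so it is masked even though the trip has schedule data. The same can happen after summing by route.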

In [37]:
all_routes3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6276 entries, 0 to 6275
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Schedule Gtfs Dataset Key  6276 non-null   object 
 1   Route Id                   6276 non-null   object 
 2   Direction Id               6276 non-null   Int64  
 3   Sched Rt Category          6276 non-null   object 
 4   Peak Offpeak               6276 non-null   object 
 5   Rt Service Min             6276 non-null   float64
 6   Service Minutes            6276 non-null   float64
 7   N Trips                    6276 non-null   int64  
 8   Pings Per Min              6276 non-null   float64
 9   Spatial Accuracy Pct       5959 non-null   float64
 10  Rt W Gtfs Pct              5464 non-null   float64
 11  Rt V Scheduled Time Pct    6276 non-null   float64
dtypes: Int64(1), float64(6), int64(1), object(4)
memory usage: 594.6+ KB


In [38]:
all_routes3.loc[all_routes3['Rt W Gtfs Pct'].isna()].sample()

Unnamed: 0,Schedule Gtfs Dataset Key,Route Id,Direction Id,Sched Rt Category,Peak Offpeak,Rt Service Min,Service Minutes,N Trips,Pings Per Min,Spatial Accuracy Pct,Rt W Gtfs Pct,Rt V Scheduled Time Pct
5800,f1b35a50955aeb498533c1c6fdafbe44,94,0,vp_sched,offpeak,237.0,98.0,4,1.91,53.74,,142.26


In [39]:
all_routes.loc[all_routes.route_id == "3424"]

Unnamed: 0,schedule_gtfs_dataset_key,trip_instance_key,rt_service_min,min_w_atleast2_trip_updates,total_pings_for_trip,total_min_w_gtfs,total_vp,vp_in_shape,route_id,direction_id,sched_rt_category,service_minutes,peak_offpeak,service_date
110,63029a23cb0e73f2a5d98a345c5e2e40,c69b13c2119866eb06e6deb0acfb69a4,46.35,46,137,46,137.0,137.0,3424,0,vp_sched,35.0,offpeak,2023-11-15
111,63029a23cb0e73f2a5d98a345c5e2e40,93d1d9dfae46bf24f28e7362cab552fe,52.98,53,157,53,157.0,157.0,3424,0,vp_sched,35.0,peak,2023-11-15
112,63029a23cb0e73f2a5d98a345c5e2e40,6f74a93d0773096f7dcd0a355598a68f,34.7,35,102,35,102.0,102.0,3424,0,vp_sched,35.0,peak,2023-11-15
113,63029a23cb0e73f2a5d98a345c5e2e40,2aa610d0f1260a00a139fe6575314fa3,44.67,44,132,45,132.0,132.0,3424,0,vp_sched,35.0,peak,2023-11-15
114,63029a23cb0e73f2a5d98a345c5e2e40,7e6fdf20f688485e7d042515ec1b109f,27.58,28,81,28,81.0,81.0,3424,0,vp_sched,35.0,offpeak,2023-11-15
115,63029a23cb0e73f2a5d98a345c5e2e40,6ee8f1f8af4e608e9cd1e74b2f555aa9,48.68,49,144,49,144.0,144.0,3424,0,vp_sched,35.0,offpeak,2023-11-15
116,63029a23cb0e73f2a5d98a345c5e2e40,c74ea2abf0e63ff5780536807e2f427b,56.35,56,167,56,167.0,131.0,3424,0,vp_sched,35.0,offpeak,2023-11-15
117,63029a23cb0e73f2a5d98a345c5e2e40,ad4872a697aa30a6506537b734a5a1e1,43.35,43,128,43,128.0,128.0,3424,0,vp_sched,35.0,offpeak,2023-11-15
118,63029a23cb0e73f2a5d98a345c5e2e40,43b3456d36efd54c28ede96cb9e1e51e,93.03,92,275,93,275.0,223.0,3424,0,vp_sched,35.0,offpeak,2023-11-15
119,63029a23cb0e73f2a5d98a345c5e2e40,cd833db56a385b4d144d2100b506b0f7,44.4,42,128,45,128.0,128.0,3424,0,vp_sched,35.0,peak,2023-11-15


In [40]:
all_routes3['Sched Rt Category'].value_counts()

vp_sched    6276
Name: Sched Rt Category, dtype: int64

In [41]:
all_routes3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6276 entries, 0 to 6275
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Schedule Gtfs Dataset Key  6276 non-null   object 
 1   Route Id                   6276 non-null   object 
 2   Direction Id               6276 non-null   Int64  
 3   Sched Rt Category          6276 non-null   object 
 4   Peak Offpeak               6276 non-null   object 
 5   Rt Service Min             6276 non-null   float64
 6   Service Minutes            6276 non-null   float64
 7   N Trips                    6276 non-null   int64  
 8   Pings Per Min              6276 non-null   float64
 9   Spatial Accuracy Pct       5959 non-null   float64
 10  Rt W Gtfs Pct              5464 non-null   float64
 11  Rt V Scheduled Time Pct    6276 non-null   float64
dtypes: Int64(1), float64(6), int64(1), object(4)
memory usage: 594.6+ KB
