# GTFS Schedule and RT compliant operators

High-level metric: how many ITP IDs we track with GTFS Schedule and GTFS RT data, compared year to year.

* [Slack request](https://cal-itp.slack.com/archives/C014Q6G3VCJ/p1657141675073339)
* GTFS Schedule fact daily: https://dbt-docs.calitp.org/#!/model/model.calitp_warehouse.gtfs_schedule_fact_daily
    * this table is pre-aggregated, so we can read the distinct ITP ID count (`n_distinct_itp_ids`) straight from it
* GTFS RT fact daily feeds: https://dbt-docs.calitp.org/#!/model/model.calitp_warehouse.gtfs_rt_fact_daily_feeds
    * count distinct ITP IDs per date here, modeled after how the GTFS Schedule table reports it
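
The per-date count described in the last bullet can be sketched in plain pandas. This is only an illustration: the sample rows and the `count_distinct_ids_per_date` helper are hypothetical, and the real data comes from the warehouse tables linked above.

```python
import pandas as pd

# Hypothetical sample rows standing in for the RT daily feeds table.
feeds = pd.DataFrame({
    "calitp_itp_id": [1, 1, 2, 2],
    "date": ["2022-06-29", "2022-06-29", "2022-06-29", "2022-06-30"],
})

def count_distinct_ids_per_date(df):
    """Return one row per date with the number of distinct ITP IDs."""
    return (
        df.groupby("date")["calitp_itp_id"]
        .nunique()
        .reset_index(name="n_distinct_itp_ids")
    )

daily_counts = count_distinct_ids_per_date(feeds)
print(daily_counts)
```

Duplicate rows for the same ITP ID on the same date (for example, several feeds per operator) are counted once, which is the behavior the metric needs.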

In [1]:
import pandas as pd

from calitp.tables import tbl
from siuba import *



In [2]:
# Schedule: the table already holds one row per date with the distinct ID count
gtfs_sched_operators = (
    tbl.views.gtfs_schedule_fact_daily()
    >> select(_.date, _.n_distinct_itp_ids)
    >> collect()
)

# RT: count distinct ITP IDs per date to match the schedule table's shape
gtfs_rt_operators = (
    tbl.views.gtfs_rt_fact_daily_feeds()
    >> select(_.calitp_itp_id, _.date)
    >> distinct()
    >> group_by(_.date)
    >> summarize(n_distinct_itp_ids = _.calitp_itp_id.nunique())
    >> collect()
)

In [3]:
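def parse_date(df):
    """Convert the date column to datetime and sort rows chronologically."""
    df = df.assign(
        date = pd.to_datetime(df.date)
    ).sort_values("date").reset_index(drop=True)

    return df

def select_start_end(df, start, end):
    """Keep only the rows whose date equals start or end."""
    df2 = parse_date(df)

    # A date that is absent from the data simply yields no row
    df3 = df2[(df2.date==start) |
              (df2.date==end)].reset_index(drop=True)

    return df3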
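
One thing to keep in mind: `select_start_end` returns fewer than two rows, without complaint, when one of the dates is missing from the data. A stricter variant could raise instead. The sketch below is an assumption about how that might look, not part of the original analysis. The `select_dates_strict` name and the sample values are hypothetical.

```python
import pandas as pd

def select_dates_strict(df, dates):
    """Keep rows matching `dates`; raise if any requested date is absent."""
    df = df.assign(date=pd.to_datetime(df.date))
    wanted = pd.to_datetime(list(dates))
    # Requested dates that never appear in the data
    missing = wanted.difference(pd.DatetimeIndex(df.date))
    if len(missing) > 0:
        raise ValueError(f"dates not found: {list(missing.strftime('%Y-%m-%d'))}")
    return df[df.date.isin(wanted)].sort_values("date").reset_index(drop=True)

# Hypothetical sample data, not real counts
sample = pd.DataFrame({
    "date": ["2021-07-01", "2021-12-31", "2022-06-30"],
    "n_distinct_itp_ids": [10, 12, 15],
})
subset = select_dates_strict(sample, ["2021-07-01", "2022-06-30"])
print(subset)
```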

## GTFS Schedule - unique ITP IDs year to year

In [4]:
start_date = "2021-07-01"
end_date = "2022-06-30"

gtfs_sched = select_start_end(gtfs_sched_operators, start_date, end_date)
gtfs_sched

| | date | n_distinct_itp_ids |
|---|---|---|
| 0 | 2021-07-01 | 181 |
| 1 | 2022-06-30 | 195 |


## GTFS RT - unique ITP IDs year to year

* The earliest RT data is from 2021-07-07, which is close to the 2021-07-01 start date used for GTFS Schedule, so it serves as the RT start date.

In [5]:
earliest_rt = pd.to_datetime(gtfs_rt_operators.date.min())

gtfs_rt = select_start_end(gtfs_rt_operators, earliest_rt, end_date)
gtfs_rt

| | date | n_distinct_itp_ids |
|---|---|---|
| 0 | 2021-07-07 | 29 |
| 1 | 2022-06-30 | 79 |
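
To read the two comparisons side by side, the start and end counts can be turned into absolute and percent changes. This is a minimal sketch with the counts copied by hand from the two output tables above. The `summarize_change` helper and its column names are hypothetical.

```python
import pandas as pd

# Counts copied from the Schedule and RT output tables above
counts = pd.DataFrame({
    "feed_type": ["schedule", "rt"],
    "start_date": ["2021-07-01", "2021-07-07"],
    "end_date": ["2022-06-30", "2022-06-30"],
    "n_start": [181, 29],
    "n_end": [195, 79],
})

def summarize_change(df):
    """Add absolute and percent change between the start and end counts."""
    diff = df["n_end"] - df["n_start"]
    return df.assign(
        abs_change=diff,
        percent_change=(diff / df["n_start"] * 100).round(1),
    )

change = summarize_change(counts)
print(change[["feed_type", "abs_change", "percent_change"]])
```

Note that the RT window starts six days later than the Schedule window, so the two rows do not cover exactly the same period.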
