# Open Loop Ridership Estimates

How much do we save by pursuing a ridership boost through open loop rather than through additional service?
[GH Issue](https://github.com/cal-itp/data-analyses/issues/545)

# Datasets
* NTD, annual ridership by agency 
* NTD, cost per revenue hour
* GTFS schedule, number of trips per agency

# Calculations

## Target Ridership

First, we project each agency's target ridership after adopting open loop. Projected ridership with open loop equals existing annual ridership multiplied by 1.03, which represents the projected 3% increase.


` projected_annual_ridership = existing_annual_ridership * 1.03 `
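
As a minimal sketch of this step, assuming a pandas DataFrame with hypothetical `agency` and `existing_annual_ridership` columns (the agencies and figures are made up for illustration and are not NTD values), the projection might look like this:

```python
import pandas as pd

# Projected 3% ridership increase from open loop, as stated above
OPEN_LOOP_RIDERSHIP_FACTOR = 1.03

# Hypothetical agencies and ridership figures, for illustration only
ridership = pd.DataFrame(
    {
        "agency": ["Agency A", "Agency B"],
        "existing_annual_ridership": [1_000_000, 250_000],
    }
)

# Apply the same factor to every agency
ridership["projected_annual_ridership"] = (
    ridership["existing_annual_ridership"] * OPEN_LOOP_RIDERSHIP_FACTOR
)
```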

## Number of Riders per Additional Trip

Second, we calculate the number of riders gained per additional trip, so that we can determine the service increase that would produce the same ridership gain as open loop. This is existing ridership multiplied by the percentage increase in ridership per additional trip added.

`  number_of_rider_per_additional_trip = existing_annual_ridership * percentage_increase_of_ridership_per_additional_trip_added `
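
A small sketch of this formula follows. The function name and the example inputs are hypothetical, and the percentage is assumed to be expressed as a fraction (0.0001 for 0.01%); where that percentage comes from is not specified here.

```python
def riders_per_additional_trip(existing_annual_ridership, pct_increase_per_trip):
    """Riders gained by adding one trip.

    pct_increase_per_trip is assumed to be a fraction, e.g. 0.0001 for 0.01%.
    """
    return existing_annual_ridership * pct_increase_per_trip


# Hypothetical inputs: 1,000,000 annual riders, 0.01% gain per added trip
example_riders_per_trip = riders_per_additional_trip(1_000_000, 0.0001)
```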

## Increase in Service by Agency Needed

Third, we calculate the increase in service needed: the projected ridership with open loop minus the current ridership, divided by the number of riders per additional trip.

` increase_in_service_by_agency = (projected_annual_ridership - existing_annual_ridership ) / number_of_rider_per_additional_trip `
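
The formula above can be sketched as a small function. The function name and the numbers are hypothetical; the guard against a non-positive divisor is an added assumption, not something the analysis specifies.

```python
def increase_in_service(projected_annual_ridership,
                        existing_annual_ridership,
                        riders_per_additional_trip):
    """Additional trips needed to match the open loop ridership gain."""
    if riders_per_additional_trip <= 0:
        # Assumption: a zero or negative gain per trip makes the ratio meaningless
        raise ValueError("riders_per_additional_trip must be positive")
    ridership_gain = projected_annual_ridership - existing_annual_ridership
    return ridership_gain / riders_per_additional_trip


# Hypothetical inputs: 30,000 extra riders at 100 riders per added trip
example_additional_trips = increase_in_service(1_030_000, 1_000_000, 100)
```

Under the definitions above, existing ridership appears in both the numerator and the denominator, so the result reduces to 0.03 divided by the percentage increase per additional trip.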

## Cost Estimates

Fourth, we calculate the cost of the service increase. The NTD gives the cost of one revenue hour of service. To get the total number of additional hours needed, we multiply the number of additional trips needed by the runtime per trip.

` estimated_cost = cost_of_one_revenue_hour_of_service * increase_in_service_by_agency * runtime_per_trip `
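
A brief sketch of the cost calculation, with a hypothetical function name and made-up inputs. It assumes runtime per trip is measured in hours so that trips multiplied by runtime gives revenue hours.

```python
def estimated_cost(cost_per_revenue_hour, additional_trips, runtime_hours_per_trip):
    """Cost of the added service: hourly cost times additional revenue hours."""
    additional_revenue_hours = additional_trips * runtime_hours_per_trip
    return cost_per_revenue_hour * additional_revenue_hours


# Hypothetical inputs: $150 per revenue hour, 300 added trips, 45-minute trips
example_cost = estimated_cost(150, 300, 0.75)
```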

## Target Ridership

In [8]:
import numpy as np
import geopandas as gpd
import os
import pandas as pd

# Raise the calitp BigQuery query size limit (in bytes)
os.environ["CALITP_BQ_MAX_BYTES"] = str(50_000_000_000)

import calitp
from calitp.tables import tbls
from siuba import *

pd.set_option("display.max_rows", 10)

In [4]:
# Import data: both sheets come from the same NTD workbook

NTD_DATABASE_PATH = "gs://calitp-analytics-data/data-analyses/2021-Annual-Database-Files/September 2022 Adjusted Database.xlsx"
ntd_master_dict = pd.read_excel(NTD_DATABASE_PATH, sheet_name='MASTER')
ntd_upt_dict = pd.read_excel(NTD_DATABASE_PATH, sheet_name='UPT')

In [6]:
ntd_upt_dict

Unnamed: 0,5 digit NTD ID,4 digit NTD ID,Agency,Active,Reporter Type,UZA,UZA Name,Modes,TOS,JAN02,...,DEC21,JAN22,FEB22,MAR22,APR22,MAY22,JUN22,JUL22,AUG22,SEP22
0,1.0,1,King County Department of Metro Transit,Active,Full Reporter,14.0,"Seattle, WA",DR,PT,135144.0,...,4.001800e+04,4.060000e+04,4.273800e+04,4.891800e+04,4.782300e+04,4.793400e+04,4.848500e+04,4.678300e+04,5.010500e+04,4.788700e+04
1,1.0,1,King County Department of Metro Transit,Active,Full Reporter,14.0,"Seattle, WA",DR,TX,,...,7.962000e+03,6.469000e+03,7.298000e+03,1.093100e+04,9.229000e+03,9.084000e+03,8.798000e+03,8.817000e+03,9.688000e+03,9.582000e+03
2,1.0,1,King County Department of Metro Transit,Active,Full Reporter,14.0,"Seattle, WA",FB,DO,,...,1.729300e+04,1.505000e+04,1.831700e+04,2.660900e+04,2.929200e+04,3.355900e+04,4.508500e+04,6.125100e+04,6.078600e+04,4.697000e+04
3,1.0,1,King County Department of Metro Transit,Inactive,Full Reporter,14.0,"Seattle, WA",LR,DO,12990.0,...,,,,,,,,,,
4,1.0,1,King County Department of Metro Transit,Active,Full Reporter,14.0,"Seattle, WA",MB,DO,6045861.0,...,3.422080e+06,3.510345e+06,3.502095e+06,4.311305e+06,4.465947e+06,4.646615e+06,4.664895e+06,4.649909e+06,4.809401e+06,4.804553e+06
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2238,,,,,,,Rolling 12-Month Sum,,,,...,4.672424e+09,4.768064e+09,4.916651e+09,5.077845e+09,5.230274e+09,5.372008e+09,5.483131e+09,5.554762e+09,5.658613e+09,5.756836e+09
2239,,,,,,,Reduced Reporters,,,,...,4.294130e+07,4.294130e+07,4.294130e+07,4.294130e+07,4.294130e+07,4.294130e+07,4.294130e+07,4.294130e+07,4.294130e+07,4.294130e+07
2240,,,,,,,Rolling 12-Month Sum with Reduced Reporters,,,,...,4.715365e+09,4.811005e+09,4.959593e+09,5.120786e+09,5.273216e+09,5.414949e+09,5.526073e+09,5.597703e+09,5.701555e+09,5.799777e+09
2241,,,,,,,Rural Reporters,,,,...,7.115425e+07,7.115425e+07,7.115425e+07,7.115425e+07,7.115425e+07,7.115425e+07,7.115425e+07,7.115425e+07,7.115425e+07,7.115425e+07


In [None]:
#ntd_upt_dict.astype()

In [41]:
# Rename the identifier columns to snake_case; the monthly columns (JAN02, ...) keep their names
ntd_upt_dict_clean = ntd_upt_dict.rename(
    columns={
        '5 digit NTD ID': '5_digit_ntd_id',
        '4 digit NTD ID': '4_digit_ntd_id',
        'Agency': 'agency',
        'Active': 'active',
        'Reporter Type': 'reporter_type',
        'UZA': 'uza',
        'UZA Name': 'uza_name',
        'Modes': 'modes',
        'TOS': 'tos',
    }
)

ntd_upt_dict_clean

In [56]:
# Reshape from one column per month to one row per identifier combination and month
id_cols = ['5_digit_ntd_id', '4_digit_ntd_id', 'agency', 'active', 'reporter_type', 'uza', 'uza_name', 'modes', 'tos']
ntd_upt_dict_clean = ntd_upt_dict_clean.melt(id_vars=id_cols, var_name='month').set_index(id_cols + ['month'])
ntd_upt_dict_clean

In [None]:
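# Split the month label (e.g. "JAN02") into a month abbreviation and a two-digit year.
# After the set_index above, month is an index level rather than a column.
#ntd_upt_dict_clean.index.get_level_values('month').str.extract(r'([A-Z]+)(\d+)')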
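
To illustrate the wide-to-long reshape and the month/year split on something small, here is a sketch using a toy DataFrame. The agencies, values, and new column names (`upt`, `month_abbr`, `year_2digit`, `period`) are hypothetical, and it assumes every monthly label follows the `JAN02` pattern of three letters plus a two-digit year.

```python
import pandas as pd

# Toy wide table: one column per month, hypothetical values
wide = pd.DataFrame(
    {
        "agency": ["Agency A", "Agency B"],
        "JAN02": [100.0, 50.0],
        "FEB02": [110.0, 55.0],
    }
)

# One row per agency and month
long_df = wide.melt(id_vars=["agency"], var_name="month", value_name="upt")

# Split labels such as "JAN02" into a month abbreviation and a two-digit year
parts = long_df["month"].str.extract(r"(?P<month_abbr>[A-Z]+)(?P<year_2digit>\d{2})")
long_df = pd.concat([long_df, parts], axis=1)

# Parse the label into a timestamp; "%b%y" reads "Jan02" as January 2002
long_df["period"] = pd.to_datetime(long_df["month"].str.title(), format="%b%y")
```

Keeping `month` as a regular column until after the split avoids having to pull it back out of the index; the index can be set afterwards if needed.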