In [1]:
import json
from tqdm import tqdm
from rtc_transit_equity.datasets import generate
data = generate()
print(data.keys())

Finished all dataset gathering and preprocessing in 3.002830743789673s
dict_keys(['route_df', 'ridership_df', 'tract_population_df', 'county_population_df', 'bus_stop_income', 'result'])


In [2]:
result = data['result']
ridership_df = data['ridership_df']

# Strategic Q1
Which bus routes and stops, if made free, would most benefit low-income riders in Massachusetts?

We take two approaches to this question. First, we compute a population-weighted average of median household income across the stops on each route and rank routes by that average.

Next, we will analyze routes with respect to the lowest median household income.


p = census population count for a bus stop

m = median household income for a bus stop

We define the median household income of a route as:
$$\frac{\sum p \cdot m}{\sum p}$$

where the sums run over the stops along the route. This is a population-weighted average of the stop-level median household incomes, so stops with larger populations count for more.
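To make the weighting concrete, here is a minimal sketch on made-up numbers. The helper name `weighted_route_income` and the three example stops are hypothetical and are not taken from the dataset.

```python
def weighted_route_income(stops):
    """Population-weighted average of stop-level median household income.

    `stops` is a list of (population, median_household_income) pairs.
    """
    total_population = sum(p for p, _ in stops)
    if total_population == 0:
        raise ValueError("route has no population to weight by")
    return sum(p * m for p, m in stops) / total_population


# Hypothetical route with three stops: (population, median household income)
example_stops = [(1000, 30000.0), (3000, 50000.0), (1000, 90000.0)]
route_income = weighted_route_income(example_stops)
unweighted = sum(m for _, m in example_stops) / len(example_stops)
print(route_income, unweighted)
```

The populous middle stop pulls the weighted value below the plain average of the three incomes.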

In [3]:
route_ids = list(set(result.route_id))

# TODO FIGURE OUT WHY WE HAVE DUPLICATE AGENCIES
# Population-weighted median household income over the bus stops along each route
averages = {}
for route_id in tqdm(route_ids):
    route_stops = result.loc[result.route_id == route_id]
    route_avg = (route_stops.population * route_stops.median_household_income).sum() / route_stops.population.sum()

    agencies = list(set(route_stops.Agency))
    averages[route_id] = {
        "median_household_income": float(route_avg),
        "RTA": agencies,  # may list more than one agency (see TODO above)
        # Count each census tract once so shared tracts are not double-counted
        "route_population": float(route_stops.drop_duplicates('census_tract').population.sum()),
        "short_name": list(set(route_stops.route_short_name))
    }


100%|██████████| 77/77 [00:00<00:00, 80.08it/s]


In [4]:
vals = list(averages.items())
poorest_routes = sorted(vals, key=lambda elem: elem[1]["median_household_income"])
print(f"Poorest routes:\n\n{json.dumps(poorest_routes[:10], indent=2)}")

Poorest routes:

[
  [
    "3481",
    {
      "median_household_income": 28750.0,
      "RTA": [
        "BrocktonAreaRTA"
      ],
      "route_population": 3130.0,
      "short_name": [
        "7"
      ]
    }
  ],
  [
    "X92",
    {
      "median_household_income": 40770.87731072433,
      "RTA": [
        "PioneerValleyRTA"
      ],
      "route_population": 326174.0,
      "short_name": [
        "X92"
      ]
    }
  ],
  [
    "R29",
    {
      "median_household_income": 44615.066435470304,
      "RTA": [
        "PioneerValleyRTA",
        "FranklinRTA"
      ],
      "route_population": 63920.0,
      "short_name": [
        "R29"
      ]
    }
  ],
  [
    "2939",
    {
      "median_household_income": 45217.78339521289,
      "RTA": [
        "BrocktonAreaRTA"
      ],
      "route_population": 12937.0,
      "short_name": [
        "6"
      ]
    }
  ],
  [
    "2812",
    {
      "median_household_income": 49181.229627430606,
      "RTA": [
        "CapeCodRTA"
    

# Strategic Q2

C: Total Operating Cost

T: Total number of unlinked trips

F: Total fares in a fiscal year

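Average Cost Per Trip:
$$\frac{C}{T}$$

Free Estimated Average Cost Per Trip:
$$\frac{C + F}{1.3\,T}$$

The constant 1.3 comes from the Simpson-Curtin rule, a rule of thumb under which ridership falls by about 3% for every 10% increase in fares. Applied to a 100% fare reduction, it implies roughly a 30% increase in ridership.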
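As a worked example with hypothetical figures (not taken from the ridership data), suppose C = 1,000,000, F = 100,000 and T = 200,000. The current average cost per trip is:

$$\frac{C}{T} = \frac{1{,}000{,}000}{200{,}000} = 5.00$$

and the estimated average cost per trip with free fares is:

$$\frac{C + F}{1.3\,T} = \frac{1{,}100{,}000}{260{,}000} \approx 4.23$$

which is a relative change of about (4.23 - 5.00) / 5.00 ≈ -0.154, or roughly a 15% decrease.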

In [5]:
RIDERSHIP_INCREASE_CONSTANT = 1.3

ridership_df['Free Estimated Average Cost Per Trip'] = \
    (ridership_df['Operating Expenses FY'] + ridership_df['Fares FY']) / \
    (ridership_df['Unlinked Passenger Trips FY'] * RIDERSHIP_INCREASE_CONSTANT)

ridership_df['Free percent average cost change'] = \
    (ridership_df['Free Estimated Average Cost Per Trip'] - ridership_df['Average Cost per Trip FY']) / \
    ridership_df['Average Cost per Trip FY']

In [6]:
display(ridership_df[[
    'Free Estimated Average Cost Per Trip',
    'Average Cost per Trip FY',
    'Free percent average cost change',
    'Agency'
]].sort_values('Free percent average cost change').dropna())

| | Free Estimated Average Cost Per Trip | Average Cost per Trip FY | Free percent average cost change | Agency |
|---|---|---|---|---|
| 17 | 5.843595 | 7.5967 | -0.230772 | Woods Hole, Martha's Vineyard and Nantucket St... |
| 16 | 38.680183 | 48.876 | -0.208606 | Worcester Regional Transit Authority COA |
| 6 | 6.27262 | 7.5696 | -0.171341 | Merrimack Valley Regional Transit Authority |
| 9 | 7.516318 | 9.0034 | -0.165169 | Cape Ann Transportation Authority |
| 11 | 9.408694 | 11.1007 | -0.152423 | Greater Attleboro-Taunton Regional Transit Aut... |
| 15 | 7.948441 | 9.1898 | -0.13508 | MetroWest Regional Transit Authority |
| 10 | 9.334594 | 10.7798 | -0.134066 | Montachusett Regional Transit Authority |
| 4 | 9.223292 | 10.6482 | -0.133817 | Berkshire Regional Transit Authority |
| 2 | 6.260449 | 7.1723 | -0.127135 | Lowell Regional Transit Authority |
| 13 | 9.986896 | 11.3875 | -0.122995 | Cape Cod Regional Transit Authority |


Below I show that, assuming removing 100% of fares increases ridership by 30% (per the Simpson-Curtin rule), the average cost per trip always decreases when fare revenue is less than three tenths of the total operating cost.

Let g(x) be the relative change in average cost per trip when 100% of fares are removed, with the following variables:

x: Total revenue from fares

C: Total cost of operation

T: Total number of trips

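$$g(x) = \frac{\frac{C + x}{1.3\,T} - \frac{C}{T}}{\frac{C}{T}}$$

T cancels, so g reduces to:

$$g(x) = \frac{x - 0.3\,C}{1.3\,C}$$

This is linear in x with slope 1/(1.3 C), so g is strictly increasing in x whenever C is positive (and T is nonzero).

Now solve for g(x) = 0:

$$0 = \frac{\frac{C + x}{1.3\,T} - \frac{C}{T}}{\frac{C}{T}}$$

$$0 = \frac{C + x}{1.3\,T} - \frac{C}{T}$$

$$\frac{C}{T} = \frac{C + x}{1.3\,T}$$

$$1.3\,C = C + x$$

$$1.3\,C - C = x$$

$$0.3\,C = x$$

Since g is strictly increasing and equals zero at x = 0.3 C, g(x) is negative whenever x is less than 0.3 C. In other words, the average cost per trip decreases whenever fare revenue is below 30% of the total operating cost.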
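The break-even point can also be checked numerically. This is a minimal sketch, assuming a hypothetical agency with an operating cost of 1,000,000 and 200,000 trips; the function name `percent_cost_change` is illustrative and does not come from the notebook.

```python
RIDERSHIP_INCREASE_CONSTANT = 1.3  # same constant as used in the notebook


def percent_cost_change(fares, operating_cost, trips):
    """Relative change in average cost per trip when all fares are removed."""
    current = operating_cost / trips
    free = (operating_cost + fares) / (trips * RIDERSHIP_INCREASE_CONSTANT)
    return (free - current) / current


# Hypothetical agency (values are assumptions, not from ridership_df)
C, T = 1_000_000.0, 200_000.0

below = percent_cost_change(0.2 * C, C, T)         # fares below 0.3 * C
at_threshold = percent_cost_change(0.3 * C, C, T)  # fares equal to 0.3 * C
above = percent_cost_change(0.4 * C, C, T)         # fares above 0.3 * C
print(below, at_threshold, above)
```

The sign of the result flips at the threshold: negative below 0.3 C, approximately zero at 0.3 C, and positive above it.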