In [1]:
import json
from tqdm import tqdm
from rtc_transit_equity.datasets import generate
data = generate(regenerate=True)
print(data.keys())

Joining bus stops onto routes. This may take a while!


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super(GeoDataFrame, self).__setitem__(key, value)


Finished all dataset gathering and preprocessing in 402.13168501853943s
dict_keys(['route_df', 'ridership_df', 'tract_population_df', 'county_population_df', 'bus_stop_income', 'result'])


In [2]:
result = data['result']
ridership_df = data['ridership_df']

# Strategic Q1
Which bus routes and stops, if made free, would most benefit low-income riders in Massachusetts?

To answer this question, we use two approaches. First, we compute a population-weighted average of the median household income across the stops on each route and rank routes by that average.

Next, we will analyze routes with respect to their lowest median household income.


p = census population count at a bus stop

m = median household income at a bus stop

We define the median household income of a route as:
$$\frac{\sum p \cdot m}{\sum p}$$

where the sums run over the stops along the route. Each stop's median household income is weighted by its population, so the route value is a population-weighted average rather than a simple mean over stops.
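As a minimal sketch of the formula above, the following computes the weighted value for one route from made-up numbers. The `stops` list and the `weighted_route_income` helper are hypothetical and used only for illustration; they are not part of `rtc_transit_equity`.

```python
# Hypothetical (population, median household income) pairs for the stops on one route.
stops = [(1000, 30000.0), (3000, 50000.0)]


def weighted_route_income(stops):
    """Population-weighted average of stop-level median household income."""
    total_population = sum(p for p, _ in stops)
    if total_population == 0:
        raise ValueError("route has no population")
    return sum(p * m for p, m in stops) / total_population


route_income = weighted_route_income(stops)
print(route_income)  # 45000.0
```

The unweighted mean of the two incomes would be 40000; the weighted value is pulled toward the stop with the larger population.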

In [3]:
route_ids = list(set(result.route_id))

# TODO: figure out why we have duplicate agencies
# Population-weighted average of median household income over the stops on each route
averages = {}
for route_id in tqdm(route_ids):
    route_stops = result.loc[result.route_id == route_id]
    route_avg = (route_stops.population * route_stops.median_household_income).sum() / route_stops.population.sum()

    agencies = list(set(route_stops.Agency))
    averages[route_id] = {
        "median_household_income": float(route_avg),
        "RTA": agencies,
        # Count each census tract once when summing the route's population
        "route_population": float(route_stops.drop_duplicates('census_tract').population.sum()),
        "short_name": list(set(route_stops.route_short_name))
    }


100%|██████████| 221/221 [00:01<00:00, 121.67it/s]


In [4]:
vals = list(averages.items())
poorest_routes = sorted(vals, key=lambda elem: elem[1]["median_household_income"])
print(f"Poorest routes:\n\n{json.dumps(poorest_routes[:10], indent=2)}")

Poorest routes:

[
  [
    "B4",
    {
      "median_household_income": 16901.605198118472,
      "RTA": [
        "PioneerValleyRTA"
      ],
      "route_population": 8281.0,
      "short_name": [
        "B4"
      ]
    }
  ],
  [
    "921",
    {
      "median_household_income": 21671.941073823196,
      "RTA": [
        "PioneerValleyRTA"
      ],
      "route_population": 20286.0,
      "short_name": [
        "P21E"
      ]
    }
  ],
  [
    "3688",
    {
      "median_household_income": 26521.33830045623,
      "RTA": [
        "WRTA"
      ],
      "route_population": 11705.0,
      "short_name": [
        "6"
      ]
    }
  ],
  [
    "2995",
    {
      "median_household_income": 29320.265139403735,
      "RTA": [
        "LowellRTA"
      ],
      "route_population": 22544.0,
      "short_name": [
        "6"
      ]
    }
  ],
  [
    "2998",
    {
      "median_household_income": 30999.477108691663,
      "RTA": [
        "MerrimackValleyRTA",
        "LowellRTA"
     

# Strategic Q2

C: Total Operating Cost

T: Total number of unlinked trips

F: Total fares in a fiscal year

Average Cost = $$\frac{C}{T}$$

Free Estimated Average Cost Per Trip = $$\frac{C + F}{T * 1.3}$$

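We use the constant 1.3 because of the Simpson-Curtin rule, a rule of thumb under which ridership changes by about 0.3% in the opposite direction for every 1% change in fares. Applied to a 100% fare reduction, it implies roughly a 30% increase in ridership.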
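A short worked example of the two formulas above, using made-up figures for a single agency. The dollar amounts and trip count are hypothetical, and the helper names are assumptions for illustration only.

```python
RIDERSHIP_INCREASE_CONSTANT = 1.3  # assumed 30% ridership increase when fares are removed


def average_cost(operating_cost, trips):
    """Average cost per trip: C / T."""
    return operating_cost / trips


def free_average_cost(operating_cost, fares, trips, increase=RIDERSHIP_INCREASE_CONSTANT):
    """Estimated average cost per trip with free fares: (C + F) / (T * 1.3)."""
    return (operating_cost + fares) / (trips * increase)


# Hypothetical agency figures for one fiscal year.
operating_cost = 1_000_000.0
fares = 100_000.0
trips = 200_000.0

current = average_cost(operating_cost, trips)             # 5.0
free = free_average_cost(operating_cost, fares, trips)    # about 4.23
percent_change = (free - current) / current               # about -0.154
print(current, round(free, 4), round(percent_change, 4))
```

With these numbers, fares are 10% of operating cost, and the estimated average cost per trip falls by roughly 15%.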

In [5]:
RIDERSHIP_INCREASE_CONSTANT = 1.3

ridership_df['Free Estimated Average Cost Per Trip'] = \
    (ridership_df['Operating Expenses FY'] + ridership_df['Fares FY']) / \
    (ridership_df['Unlinked Passenger Trips FY'] * RIDERSHIP_INCREASE_CONSTANT)

ridership_df['Free percent average cost change'] = \
    (ridership_df['Free Estimated Average Cost Per Trip'] - ridership_df['Average Cost per Trip FY']) / \
    ridership_df['Average Cost per Trip FY']

In [6]:
display(ridership_df[[
    'Free Estimated Average Cost Per Trip',
    'Average Cost per Trip FY',
    'Free percent average cost change',
    'Agency'
]].sort_values('Free percent average cost change').dropna())

| | Free Estimated Average Cost Per Trip | Average Cost per Trip FY | Free percent average cost change | Agency |
|---|---|---|---|---|
| 270 | 5.843595 | 7.5967 | -0.230772 | Woods Hole, Martha's Vineyard and Nantucket St... |
| 261 | 38.680183 | 48.876 | -0.208606 | Worcester Regional Transit Authority COA |
| 170 | 6.27262 | 7.5696 | -0.171341 | Merrimack Valley Regional Transit Authority |
| 201 | 7.516318 | 9.0034 | -0.165169 | Cape Ann Transportation Authority |
| 217 | 9.408694 | 11.1007 | -0.152423 | Greater Attleboro-Taunton Regional Transit Aut... |
| 257 | 7.948441 | 9.1898 | -0.13508 | MetroWest Regional Transit Authority |
| 210 | 9.334594 | 10.7798 | -0.134066 | Montachusett Regional Transit Authority |
| 165 | 9.223292 | 10.6482 | -0.133817 | Berkshire Regional Transit Authority |
| 160 | 6.260449 | 7.1723 | -0.127135 | Lowell Regional Transit Authority |
| 247 | 9.986896 | 11.3875 | -0.122995 | Cape Cod Regional Transit Authority |


Below I will show that, assuming a 30% increase in ridership when 100% of fares are removed (per the Simpson-Curtin rule), the average cost per trip always decreases when fare revenue is less than three tenths of the total operating cost.

I will define the function below to be the percent change in average cost per trip when 100% of fares are removed. Let's define the following variables:

x: Total revenue from fares

C: Total cost of operation

T: Total number of trips

$$g(x) = \frac{\frac{C + x}{1.3 T} - \frac{C}{T}}{\frac{C}{T}}$$

Notice that this simplifies to $g(x) = \frac{C + x}{1.3 C} - 1$, which is linear in x with slope $\frac{1}{1.3 C}$, so g is strictly increasing in x when C and T are positive.

Now solve for g(x) = 0:

$$0 = \frac{\frac{C + x}{1.3 T} - \frac{C}{T}}{\frac{C}{T}}$$

$$0 = \frac{C + x}{1.3 T} - \frac{C}{T}$$

$$\frac{C}{T} = \frac{C + x}{1.3 T}$$

$$1.3 C = C + x$$

$$1.3 C - C = x$$

$$0.3 C = x$$

Because g is strictly increasing and equals zero at x = 0.3 C, g(x) is negative exactly when x < 0.3 C. In other words, the average cost per trip decreases whenever fare revenue is below 30% of the operating cost.
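The break-even point derived above can be checked numerically. This is a minimal sketch with made-up operating cost and trip figures; the `percent_change` helper is hypothetical and simply evaluates the percent-change function on either side of the 30% threshold.

```python
def percent_change(fare_revenue, operating_cost, trips, increase=1.3):
    """Percent change in average cost per trip when all fares are removed."""
    current = operating_cost / trips
    free = (operating_cost + fare_revenue) / (trips * increase)
    return (free - current) / current


# Hypothetical agency figures.
operating_cost, trips = 1_000_000.0, 200_000.0

below = percent_change(0.2 * operating_cost, operating_cost, trips)  # fares at 20% of cost
at = percent_change(0.3 * operating_cost, operating_cost, trips)     # fares at 30% of cost
above = percent_change(0.4 * operating_cost, operating_cost, trips)  # fares at 40% of cost

print(below < 0, abs(at) < 1e-9, above > 0)
```

The change is negative below the threshold, approximately zero at it (up to floating-point error), and positive above it.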

data['result']

In [7]:
set(data['result']['route_id'])

{'008eea84-4b41-4a40-834b-1fd64c4c263b',
 '047a7ee4-92be-411d-81c8-b5c368c8523b',
 '04f85805-f5dd-44e3-8ea6-f7eeba4dfde0',
 '0d53d025-3b35-4260-becb-308b0cfa9552',
 '10729',
 '10730',
 '10731',
 '10732',
 '10733',
 '10734',
 '10737',
 '10742',
 '10743',
 '10745',
 '10746',
 '10747',
 '10748',
 '10749',
 '10750',
 '10751',
 '10752',
 '10753',
 '10754',
 '10755',
 '12605',
 '14475',
 '15204',
 '159c25b6-9842-4920-9774-ed1a0475f4f6',
 '1b2be85e-a5c2-4bc7-a310-1eb635c356e9',
 '2601',
 '2602',
 '2603',
 '2604',
 '2606',
 '2607',
 '2608',
 '2741',
 '2799',
 '2801',
 '2803',
 '2804',
 '2805',
 '2807',
 '2808',
 '2812',
 '2842',
 '2843',
 '2844',
 '2845',
 '2847',
 '2848',
 '2883',
 '2884',
 '2885',
 '2886',
 '2887',
 '2888',
 '2889',
 '2890',
 '2891',
 '2892',
 '2893',
 '2894',
 '2895',
 '2896',
 '2897',
 '2898',
 '2899',
 '2900',
 '2901',
 '2902',
 '2903',
 '2904',
 '2905',
 '2906',
 '2907',
 '2908',
 '2909',
 '2910',
 '2911',
 '2912',
 '2933',
 '2934',
 '2935',
 '2936',
 '2937',
 '2938',
 '