## FY23 FTA Bus and Low- and No-Emission Grant Awards Analysis

<b>GH issue:</b> 
* Research Request - Bus Procurement Costs & Awards #897

<b>Data source(s):</b> 
1. https://www.transit.dot.gov/funding/grants/fy23-fta-bus-and-low-and-no-emission-grant-awards
2. https://storymaps.arcgis.com/stories/022abf31cedd438b808ec2b827b6faff

<b>Definitions:</b>  
* <u>Grants for Buses and Bus Facilities Program:</u>
    * 49 U.S.C. 5339(b) makes federal resources available to states and direct recipients to replace, rehabilitate, and purchase buses and related equipment and to construct bus-related facilities, including technological changes or innovations to modify low- or no-emission vehicles or facilities. Funding is provided through formula allocations and competitive grants.
<br><br>
* <u>Low or No Emission Vehicle Program:</u>
    * 49 U.S.C. 5339(c) provides funding to state and local governmental authorities for the purchase or lease of zero-emission and low-emission transit buses, as well as the acquisition, construction, and leasing of required supporting facilities.


In [61]:
import pandas as pd
#import shared_utils
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from scipy import stats
#set_option to increase max rows displayed to 200, to see the entire df in one go
pd.set_option("display.max_rows", 200)



## Reading in raw data from gcs

In [13]:
df = pd.read_csv(
    "gs://calitp-analytics-data/data-analyses/bus_procurement_cost/data-analyses_bus_procurement_cost_fta_press_release_data_csv.csv"
)

## Data Cleaning
1. snake-case column names
2. convert the currency-formatted funding column (with $ and ,) to integers
3. separate text from the # of buses col (split at '(')
    a. trim spaces in new col
    b. get rid of () characters in new col
4. trim spaces in other columns?
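
Step 1 could also be done programmatically rather than by typing each name. This is a minimal sketch, not the approach the notebook takes: `to_snake_case` is a hypothetical helper, and the `raw_headers` values are illustrative guesses, not the actual headers of the press-release CSV.

```python
import re


def to_snake_case(name: str) -> str:
    """Trim a header, lowercase it, and join whitespace-separated words with underscores."""
    return re.sub(r"\s+", "_", name.strip().lower())


# Illustrative headers only; the real CSV headers may differ.
raw_headers = ["State", "Project Sponsor", " # of Buses ", "FTA Region"]
snake_headers = [to_snake_case(h) for h in raw_headers]
print(snake_headers)
```

Typing the list by hand, as the notebook does, keeps full control over odd names such as `bus/low-no_program`; a helper like this only pays off when there are many columns.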

In [14]:
# snake case columns names via list
new_col = [
    "state",
    "project_sponsor",
    "project_title",
    "description",
    "funding",
    "#_of_buses",
    "project_type",
    "propulsion_type",
    "area_served",
    "congressional_districts",
    "fta_region",
    "bus/low-no_program",
]

df.columns = new_col
df.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       '#_of_buses', 'project_type', 'propulsion_type', 'area_served',
       'congressional_districts', 'fta_region', 'bus/low-no_program'],
      dtype='object')

In [15]:
# checking data type of funding col
# checking to see if any values are not numbers
# will need to clean up this col
display(df["funding"].dtype,
        df.funding.value_counts()
       )

dtype('O')

$5,000,000       3
$6,000,000       2
$3,400,000       2
$104,000,000     1
$4,313,552       1
                ..
$13,880,910      1
$15,423,904      1
$16,166,822      1
$16,358,000      1
$181,250         1
Name: funding, Length: 126, dtype: int64

In [16]:
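#clean up funding column. removing $ and , and casting the column to int64
#regex=False treats '$' and ',' as literal characters ('$' is otherwise the regex end-of-string anchor)
df['funding']=df['funding'].str.replace('$','', regex=False)
df['funding']=df['funding'].str.replace(',','', regex=False)
df['funding'] = df['funding'].astype('int64')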



In [17]:
#checking to see if str.replace worked.
display(df["funding"].dtype,
        df.head()
       )

dtype('int64')

Unnamed: 0,state,project_sponsor,project_title,description,funding,#_of_buses,project_type,propulsion_type,area_served,congressional_districts,fta_region,bus/low-no_program
0,DC,Washington Metropolitan Area Transit Authority...,Battery-Electric Metrobus Procurement and Elec...,WMATA will receive funding to convert its Cind...,104000000,100(beb),bus/chargers,zero,Large Urban,DC-001 ; MD-004 ; MD-008 ; VA-008 ; VA-011,3,Low-No
1,TX,Dallas Area Rapid Transit (DART),DART CNG Bus Fleet Modernization Project,Dallas Area Rapid Transit will receive funding...,103000000,90 (estimated-CNG buses),bus,low,Large Urban,TX-003 ; TX-004 ; TX-005 ; TX-006 ; TX-024 ; T...,6,Low-No
2,PA,Southeastern Pennsylvania Transportation Autho...,SEPTA Zero-Emission Bus Transition Facility Sa...,The Southeastern Pennsylvania Transportation A...,80000000,0,facility,zero,Large Urban,PA-002 ; PA-003 ; PA-004 ; PA-005,3,Low-No
3,LA,New Orleans Regional Transit Authority,Accelerating Zero-Emissions Mobility for a Res...,The New Orleans Regional Transit Authority wil...,71439261,20 (zero-emission),Bus / Chargers / Equipment,zero,Large Urban,LA-002 ; LA-001,6,Low-No
4,NJ,New Jersey Transit Corporation,Hilton Bus Garage Modernization,New Jersey Transit will receive funding to mod...,47000000,0,facility/chargers,zero,Large Urban,nj-011,2,Bus
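
The funding conversion can be checked in isolation on a few toy values. This is a sketch under the assumption that every entry follows the `$1,234` pattern seen in the value counts; `funding_raw` is made-up sample data, not the real column.

```python
import pandas as pd

# Toy values in the same "$1,234" style as the funding column.
funding_raw = pd.Series(["$5,000,000", "$181,250", "$104,000,000"])

# regex=False makes "$" a literal character rather than the end-of-string anchor.
funding_clean = (
    funding_raw
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype("int64")
)
print(funding_clean.tolist())
```

If any entry contained other text, the `astype("int64")` call would raise, which is a useful early signal that the column needs more cleaning.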


In [18]:
# test of removing the spaces first in the # of buses column, THEN split by (
df["#_of_buses"] = df["#_of_buses"].str.replace(" ", "")

In [19]:
#spaces removed, and zeros are kept
df['#_of_buses'].value_counts()

0                     34
7(Electric)            3
2                      3
4(BEBs)                3
20(BEBs)               3
                      ..
37(cng)                1
160                    1
31                     1
25(lowemissionCNG)     1
3                      1
Name: #_of_buses, Length: 81, dtype: int64

In [20]:
#splitting the # of buses column into 2, using the ( char as the delimiter
df[["bus_count", "bus_desc"]] = df["#_of_buses"].str.split(pat="(", n=1, expand=True)

In [21]:
#checking col. retained the initial col. and added new columns to the end.
df.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       '#_of_buses', 'project_type', 'propulsion_type', 'area_served',
       'congressional_districts', 'fta_region', 'bus/low-no_program',
       'bus_count', 'bus_desc'],
      dtype='object')

In [22]:
# examining the new bus count col.
# zero values remained the same
# 2 values still have text attached (56estimated-cutawayvans, 12batteryelectric)
df.bus_count.value_counts()

0                          34
4                          10
7                           8
6                           8
20                          6
2                           6
5                           6
9                           5
11                          4
3                           4
16                          3
15                          3
10                          3
25                          3
8                           2
39                          2
1                           2
13                          2
56estimated-cutawayvans     1
134                         1
42                          1
50                          1
14                          1
100                         1
37                          1
160                         1
31                          1
12batteryelectric           1
90                          1
18                          1
17                          1
23                          1
69                          1
30        

In [23]:
# function to find the row index of a specific value and column in a dataframe
def find_loc(data, col, val):
    x = data.loc[data[col] == val].index[0]
    return x

In [24]:
loc1 = find_loc(df, "bus_count", "56estimated-cutawayvans")
loc2 = find_loc(df, "bus_count", "12batteryelectric")

In [25]:
display(loc1, loc2)

58

32

In [26]:
# editing the values of the bus count col at specific locations
# syntax: df.loc[row_index, column_name] = new_value (58 and 32 are loc1 and loc2 from above)
df.loc[58, "bus_count"] = 56
df.loc[32, "bus_count"] = 12

In [27]:
# updating values again for bus_desc. same location
df.loc[58, "bus_desc"] = "estimated-cutaway vans (PM- award will not fund 68 buses)"
df.loc[32, "bus_desc"] = "battery electric"

In [28]:
# values updated as intended for bus count and bus desc
display(df.loc[32], df.loc[58])

state                                                                     MN
project_sponsor                                                Metro Transit
project_title              Investments Toward an Electric Future: Metro T...
description                Metro Transit will receive funding to buy batt...
funding                                                             17532900
#_of_buses                                                 12batteryelectric
project_type                                      Bus / Chargers / Equipment
propulsion_type                                                         zero
area_served                                                      Large Urban
congressional_districts           MN-002 ; MN-003 ; MN-004 ; MN-005 ; MN-006
fta_region                                                                 5
bus/low-no_program                                                    Low-No
bus_count                                                                 12

state                                                                     TX
project_sponsor            Texas Department of Transportation on behalf o...
project_title              FY23 Rural Transit Asset Replacement & Moderni...
description                The Texas Department of Transportation will re...
funding                                                              7443765
#_of_buses                 56estimated-cutawayvans(PM-awardwillnotfund68b...
project_type                                                 bus / facilitiy
propulsion_type                                                          low
area_served                                                            Rural
congressional_districts    TX-001 ; TX-002 ; TX-004 ; TX-005 ; TX-006 ; T...
fta_region                                                                 6
bus/low-no_program                                                    Low-No
bus_count                                                                 56

In [29]:
# confirming via value counts that all values are valid now.
df.bus_count.value_counts()

0      34
4      10
7       8
6       8
20      6
2       6
5       6
9       5
11      4
3       4
16      3
15      3
10      3
25      3
8       2
39      2
1       2
13      2
56      1
134     1
42      1
50      1
14      1
100     1
37      1
160     1
31      1
12      1
90      1
18      1
17      1
23      1
69      1
30      1
35      1
40      1
12      1
Name: bus_count, dtype: int64
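
As an alternative to splitting on `(` and then patching individual rows, the leading digits could be captured with a regular expression. This is only a sketch on made-up values modeled on the `#_of_buses` column after spaces are removed; it is not what the notebook does, and the names `raw`, `parts`, `bus_count_demo` and `bus_desc_demo` are hypothetical.

```python
import pandas as pd

# Toy values modeled on the "#_of_buses" column after spaces are removed.
raw = pd.Series(["100(beb)", "0", "12batteryelectric", "56estimated-cutawayvans"])

# Group 1: leading digits. Group 2: whatever follows, skipping one optional "(".
parts = raw.str.extract(r"^(\d+)\(?(.*)$")
bus_count_demo = parts[0].astype("int64")

# Drop a trailing ")" and turn empty descriptions into missing values.
desc = parts[1].str.rstrip(")")
bus_desc_demo = desc.mask(desc == "")

print(bus_count_demo.tolist())
print(bus_desc_demo.tolist())
```

Cells that span several lines or pack two counts into one value, such as the Charlotte row adjusted later, would still need manual review.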

In [30]:
# cleaning the bus desc col.
# removing the ) as a literal character (regex=False)
df["bus_desc"] = df["bus_desc"].str.replace(")", "", regex=False)



In [31]:
df["bus_desc"].unique()

array(['beb', 'estimated-CNGbuses', None, 'zero-emission', 'cngbuses',
       'BEBs', 'Electric\n16(Hybrid', 'FCEB', 'Electric',
       'FuelCellElectric', 'CNG', 'FuelCell', 'hybrid', 'BEB',
       'battery electric', 'lowemissionCNG', 'cng',
       'BEBsparatransitbuses', 'hybridelectric', 'zeroemissionbuses',
       'dieselelectrichybrids', 'hydrogenfuelcell',
       '2BEBsand4HydrogenFuelCellBuses', '4fuelcell/3CNG',
       'estimated-cutaway vans (PM- award will not fund 68 buses',
       'hybridelectricbuses', 'CNGfueled', 'zeroemissionelectric',
       'hybridelectrics', 'dieselandgas', 'diesel-electrichybrids',
       'propane', 'electric', 'diesel-electric', 'propanebuses',
       '1:CNGbus;2cutawayCNGbuses', 'zeroemission',
       'propanedpoweredvehicles'], dtype=object)

In [32]:
# stripping the values in the bus desc col
df["bus_desc"] = df["bus_desc"].str.strip()

In [33]:
df.bus_desc.unique()

array(['beb', 'estimated-CNGbuses', None, 'zero-emission', 'cngbuses',
       'BEBs', 'Electric\n16(Hybrid', 'FCEB', 'Electric',
       'FuelCellElectric', 'CNG', 'FuelCell', 'hybrid', 'BEB',
       'battery electric', 'lowemissionCNG', 'cng',
       'BEBsparatransitbuses', 'hybridelectric', 'zeroemissionbuses',
       'dieselelectrichybrids', 'hydrogenfuelcell',
       '2BEBsand4HydrogenFuelCellBuses', '4fuelcell/3CNG',
       'estimated-cutaway vans (PM- award will not fund 68 buses',
       'hybridelectricbuses', 'CNGfueled', 'zeroemissionelectric',
       'hybridelectrics', 'dieselandgas', 'diesel-electrichybrids',
       'propane', 'electric', 'diesel-electric', 'propanebuses',
       '1:CNGbus;2cutawayCNGbuses', 'zeroemission',
       'propanedpoweredvehicles'], dtype=object)

In [34]:
# creating a dictionary to add spaces back to the values
new_dict = {
    "beb": "BEB",
    "estimated-CNGbuses": "estimated-CNG buses",
    "cngbuses": "CNG buses",
    "BEBs": "BEB",
    "Electric\n16(Hybrid": "15 electic, 16 hybrid",
    "FuelCellElectric": "fuel cell electric",
    "FuelCell": "fuel cell",
    "lowemissionCNG": "low emission CNG",
    "cng": "CNG",
    "BEBsparatransitbuses": "BEBs paratransit buses",
    "hybridelectric": "hybrid electric",
    "zeroemissionbuses": "zero emission buses",
    "dieselelectrichybrids": "diesel electric hybrids",
    "hydrogenfuelcell": "hydrogen fuel cell",
    "2BEBsand4HydrogenFuelCellBuses": "2 BEBs and 4 hydrogen fuel cell buses",
    "4fuelcell/3CNG": "4 fuel cell / 3 CNG",
    "hybridelectricbuses": "hybrid electric buses",
    "CNGfueled": "CNG fueled",
    "zeroemissionelectric": "zero emission electric",
    "hybridelectrics": "hybrid electrics",
    "dieselandgas": "diesel and gas",
    "diesel-electrichybrids": "diesel-electric hybrids",
    "propanebuses": "propane buses",
    "1:CNGbus;2cutawayCNGbuses": "1:CNGbus ;2 cutaway CNG buses",
    "zeroemission": "zero emission",
    "propanedpoweredvehicles": "propaned powered vehicles"
}

In [35]:
#using new dictionary to replace values in the bus desc col
df.replace({'bus_desc': new_dict}, inplace=True)

In [36]:
#confirming the bus desc values were replaced as intended.
list(df.bus_desc.unique())

['BEB',
 'estimated-CNG buses',
 None,
 'zero-emission',
 'CNG buses',
 '15 electic, 16 hybrid',
 'FCEB',
 'Electric',
 'fuel cell electric',
 'CNG',
 'fuel cell',
 'hybrid',
 'battery electric',
 'low emission CNG',
 'BEBs paratransit buses',
 'hybrid electric',
 'zero emission buses',
 'diesel electric hybrids',
 'hydrogen fuel cell',
 '2 BEBs and 4 hydrogen fuel cell buses',
 '4 fuel cell / 3 CNG',
 'estimated-cutaway vans (PM- award will not fund 68 buses',
 'hybrid electric buses',
 'CNG fueled',
 'zero emission electric',
 'hybrid electrics',
 'diesel and gas',
 'diesel-electric hybrids',
 'propane',
 'electric',
 'diesel-electric',
 'propane buses',
 '1:CNGbus ;2 cutaway CNG buses',
 'zero emission',
 'propaned powered vehicles']

In [37]:
#bus count for row 12 needs to be adjusted to 31 instead of 15
df.loc[12, "bus_count"] = 31

In [38]:
#confirming the change
df.loc[12]

state                                                                     NC
project_sponsor            City of Charlotte - Charlotte Area Transit System
project_title              Charlotte Area Transit System's Sustainable Fl...
description                The city of Charlotte will receive funding to ...
funding                                                             30890413
#_of_buses                                          15(Electric)\n16(Hybrid)
project_type                                      Bus / Chargers / Equipment
propulsion_type                                                   Zero / Low
area_served                                                      Large Urban
congressional_districts           NC-008 ; NC-012 ; NC-013 ; NC-014 ; SC-005
fta_region                                                                 4
bus/low-no_program                                                       Bus
bus_count                                                                 31

In [39]:
#using str.lower() on project type 
df['project_type'] = df['project_type'].str.lower()

In [40]:
#using str.replace() to remove spaces from project type
df['project_type'] = df['project_type'].str.replace(' ','')

In [41]:
#confirming lower and replace worked as intended
list(df['project_type'].sort_values(ascending=True).unique())

['\tbus/facility',
 'bus',
 'bus/chargers',
 'bus/chargers/equipment',
 'bus/chargers/other',
 'bus/equipment',
 'bus/facilitiy',
 'bus/facility',
 'bus/facility/chargers',
 'bus/facility/chargers/equipment',
 'bus/facility/equipment',
 'bus/facility/equipment/other',
 'bus/facility/other',
 'bus/other',
 'chargers',
 'chargers/equipment',
 'facilities',
 'facility',
 'facility/chargers',
 'facility/chargers/equipment',
 'facility/equipment']

In [42]:
#some values still need to get adjusted. will use a short dictionary to fix
new_type={'\tbus/facility':'bus/facility',
          'bus/facilitiy':'bus/facility',
          'facilities':'facility',
}

In [43]:
#using replace() with the dictionary to replace keys in project type col
#syntax df.replace({'bus_desc': new_dict}, inplace=True)
df.replace({'project_type': new_type}, inplace=True)

In [44]:
#double checking to ensure the dictionary replace worked.
list(df['project_type'].sort_values(ascending=True).unique())

['bus',
 'bus/chargers',
 'bus/chargers/equipment',
 'bus/chargers/other',
 'bus/equipment',
 'bus/facility',
 'bus/facility/chargers',
 'bus/facility/chargers/equipment',
 'bus/facility/equipment',
 'bus/facility/equipment/other',
 'bus/facility/other',
 'bus/other',
 'chargers',
 'chargers/equipment',
 'facility',
 'facility/chargers',
 'facility/chargers/equipment',
 'facility/equipment']
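
The three `project_type` steps (lowercase, remove spaces, fix leftovers with a dictionary) can be chained. The sketch below uses made-up sample values and a whitespace regex, which also removes the leading tab that the plain space replacement left behind; `project_type_demo` and `fixes` are hypothetical names.

```python
import pandas as pd

# Toy values showing the problems seen in project_type: case, spaces, a tab, a typo.
project_type_demo = pd.Series(
    ["\tbus/facility", "Bus / Chargers / Equipment", "bus / facilitiy", "facilities"]
)

# Misspelled and plural forms taken from the values listed above.
fixes = {"bus/facilitiy": "bus/facility", "facilities": "facility"}

normalized = (
    project_type_demo
    .str.lower()
    .str.replace(r"\s+", "", regex=True)  # removes spaces, tabs and newlines
    .replace(fixes)                        # exact-match replacement of whole values
)
print(normalized.tolist())
```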

In [49]:
## Cleaning up the bus_desc col
list(df.bus_desc.sort_values().unique())

['15 electic, 16 hybrid',
 '1:CNGbus ;2 cutaway CNG buses',
 '2 BEBs and 4 hydrogen fuel cell buses',
 '4 fuel cell / 3 CNG',
 'BEB',
 'BEBs paratransit buses',
 'CNG',
 'CNG buses',
 'CNG fueled',
 'Electric',
 'FCEB',
 'battery electric',
 'diesel and gas',
 'diesel electric hybrids',
 'diesel-electric',
 'diesel-electric hybrids',
 'electric',
 'estimated-CNG buses',
 'estimated-cutaway vans (PM- award will not fund 68 buses',
 'fuel cell',
 'fuel cell electric',
 'hybrid',
 'hybrid electric',
 'hybrid electric buses',
 'hybrid electrics',
 'hydrogen fuel cell',
 'low emission CNG',
 'propane',
 'propane buses',
 'propaned powered vehicles',
 'zero emission',
 'zero emission buses',
 'zero emission electric',
 'zero-emission',
 None]

In [50]:
bus_dict ={
 'BEBs paratransit buses':'BEB',
 'CNG buses':'CNG',
 'CNG fueled':'CNG',
 'Electric': 'electrc (not specified)',
 'battery electric':'BEB',
 'diesel electric hybrids':'diesel-electric hybrids',
 'diesel-electric':'diesel-electric hybrids',
 'electric': 'electrc (not specified)',
 'estimated-CNG buses':'CNG',
 'fuel cell':'FCEB',
 'fuel cell electric':'FCEB',
 'hybrid': 'hybrid electric',
 'hybrid electric buses': 'hybrid electric',
 'hybrid electrics': 'hybrid electric',
 'low emission CNG': 'CNG',
 'propane buses': 'propane',
 'propaned powered vehicles': 'propane',
 'zero emission':'zero-emission bus (not specified)',
 'zero emission buses':'zero-emission bus (not specified)',
 'zero emission electric':'zero-emission bus (not specified)',
 'zero-emission':'zero-emission bus (not specified)',
 }

In [51]:
#replacing values in bus_desc with bus_dict dictionary
df.replace({'bus_desc':bus_dict}, inplace=True)

In [53]:
#list of unique bus desc values reduced.
list(df.bus_desc.unique())

['BEB',
 'CNG',
 None,
 'zero-emission bus (not specified)',
 '15 electic, 16 hybrid',
 'FCEB',
 'electrc (not specified)',
 'hybrid electric',
 'diesel-electric hybrids',
 'hydrogen fuel cell',
 '2 BEBs and 4 hydrogen fuel cell buses',
 '4 fuel cell / 3 CNG',
 'estimated-cutaway vans (PM- award will not fund 68 buses',
 'diesel and gas',
 'propane',
 '1:CNGbus ;2 cutaway CNG buses']
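
When consolidating categories with a dictionary, it can help to list the values the mapping does not cover. This is a sketch on toy data with a shortened mapping; `bus_desc_demo`, `mapping` and `unmapped` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy descriptions, including a missing value and one label the mapping ignores.
bus_desc_demo = pd.Series(
    ["CNG buses", "battery electric", "FCEB", None, "fuel cell", "diesel and gas"]
)
mapping = {"CNG buses": "CNG", "battery electric": "BEB", "fuel cell": "FCEB"}

cleaned = bus_desc_demo.replace(mapping)

# Labels left in the data that are not one of the mapping's target categories.
targets = set(mapping.values())
unmapped = sorted(v for v in cleaned.dropna().unique() if v not in targets)
print(unmapped)
```

Labels that show up in `unmapped` are candidates for either a new dictionary entry or a deliberate decision to keep them as they are.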

In [55]:
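#rename bus_desc col to propulsion_type
#NOTE: a propulsion_type column already exists, so this rename leaves two columns with the same name (see the output below)
df = df.rename(columns={'bus_desc':'propulsion_type'})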

In [58]:
#confirm column was renamed
df.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       '#_of_buses', 'project_type', 'propulsion_type', 'area_served',
       'congressional_districts', 'fta_region', 'bus/low-no_program',
       'bus_count', 'propulsion_type'],
      dtype='object')
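
The column listing above shows `propulsion_type` twice. A quick way to detect this is sketched below on a tiny made-up frame; `df_demo` is hypothetical sample data.

```python
import pandas as pd

# Tiny frame that already has a propulsion_type column, like the cleaned data.
df_demo = pd.DataFrame({"propulsion_type": ["zero"], "bus_desc": ["BEB"]})
renamed = df_demo.rename(columns={"bus_desc": "propulsion_type"})

# Index.duplicated() flags the second and later occurrences of a label.
dupes = renamed.columns[renamed.columns.duplicated()].tolist()
print(dupes)

# Selecting a duplicated label returns a DataFrame with both columns, not a Series.
selected = renamed["propulsion_type"]
print(type(selected).__name__, selected.shape)
```

Because a duplicated label returns a DataFrame, Series methods such as `value_counts()` may not behave as expected on it, so a distinct name for the renamed column is usually safer.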

In [62]:
df.description

0      WMATA will receive funding to convert its Cind...
1      Dallas Area Rapid Transit will receive funding...
2      The Southeastern Pennsylvania Transportation A...
3      The New Orleans Regional Transit Authority wil...
4      New Jersey Transit will receive funding to mod...
                             ...                        
125    The South Dakota Department of Transportation ...
126    Comanche Nation Transit will receive funds to ...
127    The North Carolina Department of Transportatio...
128    The Colorado Department of Transportation (CDO...
129    The Oregon Department of Transportation on beh...
Name: description, Length: 130, dtype: object

## Exporting cleaned data to GCS

In [59]:
#saving to GCS as csv
df.to_csv('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_bus_cost_clean.csv')

## Reading in cleaned data from GCS

In [3]:
bus_cost = pd.read_csv('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_bus_cost_clean.csv')

In [4]:
#confirming cleaned data shows as expected.
display(bus_cost.shape,
        type(bus_cost),
    bus_cost.columns
       )

(130, 15)

pandas.core.frame.DataFrame

Index(['Unnamed: 0', 'state', 'project_sponsor', 'project_title',
       'description', 'funding', '#_of_buses', 'project_type',
       'propulsion_type', 'area_served', 'congressional_districts',
       'fta_region', 'bus/low-no_program', 'bus_count', 'bus_desc'],
      dtype='object')

In [5]:
#drop unnecessary columns
bus_cost = bus_cost.drop(['Unnamed: 0', 'congressional_districts'], axis=1)

In [6]:
#confirming columns dropped as intended.
#fewer columns (15 to 13)
display(bus_cost.shape,
        bus_cost.columns)

(130, 13)

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       '#_of_buses', 'project_type', 'propulsion_type', 'area_served',
       'fta_region', 'bus/low-no_program', 'bus_count', 'bus_desc'],
      dtype='object')
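
The `Unnamed: 0` column dropped above is the DataFrame index, which `to_csv` writes by default. The sketch below shows the round trip with and without `index=False`, using an in-memory buffer instead of GCS; `demo` is made-up sample data.

```python
import io

import pandas as pd

demo = pd.DataFrame({"state": ["DC", "TX"], "funding": [104000000, 103000000]})

# Default: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(demo.to_csv()))

# index=False: only the data columns are written.
without_index = pd.read_csv(io.StringIO(demo.to_csv(index=False)))

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

Passing `index=False` when exporting would make the later drop of `Unnamed: 0` unnecessary, though that drop would then need to be removed as well.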

## Cost per Bus, per Transit Agency dataframe

In [8]:
only_bus=bus_cost[bus_cost['bus_count']>0]
only_bus.head()

Unnamed: 0,state,project_sponsor,project_title,description,funding,#_of_buses,project_type,propulsion_type,area_served,fta_region,bus/low-no_program,bus_count,bus_desc
0,DC,Washington Metropolitan Area Transit Authority...,Battery-Electric Metrobus Procurement and Elec...,WMATA will receive funding to convert its Cind...,104000000,100(beb),bus/chargers,zero,Large Urban,3,Low-No,100,BEB
1,TX,Dallas Area Rapid Transit (DART),DART CNG Bus Fleet Modernization Project,Dallas Area Rapid Transit will receive funding...,103000000,90(estimated-CNGbuses),bus,low,Large Urban,6,Low-No,90,estimated-CNG buses
3,LA,New Orleans Regional Transit Authority,Accelerating Zero-Emissions Mobility for a Res...,The New Orleans Regional Transit Authority wil...,71439261,20(zero-emission),bus/chargers/equipment,zero,Large Urban,6,Low-No,20,zero-emission
5,TX,Metropolitan Transit Authority of Harris Count...,FY 2023 Renewable Natural Gas Path to Zero Emi...,The Metropolitan Transit Authority of Harris C...,40402548,40(cngbuses),bus/facility,Low,Large Urban,6,Low-No,40,CNG buses
6,MD,"University of Maryland, College Park","35 Battery Electric Transit Buses, Infrastruct...","The University of Maryland, College Park will ...",39863156,35(BEBs),bus/chargers,zero,Large Urban,3,Low-No,35,BEB


In [9]:
cost_per_bus = only_bus.groupby('project_sponsor').agg({
    'funding':'sum',
    'bus_count':'sum'
}).reset_index()

In [10]:
cost_per_bus['cost_per_bus'] = (cost_per_bus['funding']/cost_per_bus['bus_count']).astype('int64')

In [11]:
cost_per_bus.dtypes

project_sponsor    object
funding             int64
bus_count           int64
cost_per_bus        int64
dtype: object

In [12]:
cost_per_bus

Unnamed: 0,project_sponsor,funding,bus_count,cost_per_bus
0,AUTORIDAD METROPOLITANA DE AUTOBUSES (PRMBA),10000000,8,1250000
1,Alameda-Contra Costa Transit District,25513684,25,1020547
2,Berkshire Regional Transit Authority,2212747,2,1106373
3,Brazos Transit District,9650646,11,877331
4,Cape Fear Public Transportation Authority,2860250,5,572050
...,...,...,...,...
90,Washington Metropolitan Area Transit Authority...,104000000,100,1040000
91,Washington State Department of Transportation ...,3303600,9,367066
92,Whatcom Transportation Authority (WTA),9644865,11,876805
93,White Earth Reservation Business Committee,723171,4,180792
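
The per-agency calculation above can be traced on a small example. This is a sketch with made-up sponsors and amounts; it uses named aggregation, which is equivalent to the dictionary form used in the notebook.

```python
import pandas as pd

# Made-up awards: Agency A has two projects, Agency C buys no buses.
awards = pd.DataFrame({
    "project_sponsor": ["Agency A", "Agency A", "Agency B", "Agency C"],
    "funding": [2_000_000, 1_000_000, 4_500_000, 800_000],
    "bus_count": [2, 1, 5, 0],
})

# Keep only projects that include buses, then total funding and buses per sponsor.
with_buses = awards[awards["bus_count"] > 0]
per_sponsor = (
    with_buses.groupby("project_sponsor", as_index=False)
    .agg(funding=("funding", "sum"), bus_count=("bus_count", "sum"))
)

# Whole-dollar cost per bus; astype("int64") truncates any fractional part.
per_sponsor["cost_per_bus"] = (
    per_sponsor["funding"] / per_sponsor["bus_count"]
).astype("int64")
print(per_sponsor)
```

Filtering out zero-bus projects first matters: otherwise a sponsor with only facility projects would produce a division by zero.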


In [None]:
## export cost_per_bus df to gcs
cost_per_bus.to_csv('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_cost_per_bus.csv')

## Cost per bus, stats analysis

In [None]:
#read in fta cost per bus csv
cost_per_bus = pd.read_csv('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_cost_per_bus.csv')

In [None]:
display(cost_per_bus.shape,
        cost_per_bus.head()
       )

## Initial Summary Stats

### Summary Stats

In [None]:
#top level analysis

bus_cost.agg({'project_title':'count',
              'funding':'sum',
              'bus_count':'sum'}
              )

In [None]:
#start of agg. by project_type

bus_cost.groupby('project_type').agg({
    'project_type': 'count',
    'funding': 'sum',
    'bus_count':'sum'
})

In [None]:
#agg by program

bus_cost.groupby('bus/low-no_program').agg({
    'project_type': 'count',
    'funding': 'sum',
    'bus_count':'sum'
})

In [None]:
#agg by state, by funding
bus_cost.groupby('state').agg({
    'project_type': 'count',
    'funding': 'sum',
    'bus_count':'sum'
}).sort_values(by='funding', ascending=False)

### Projects with bus purchases

In [None]:
#df of only projects with a bus count
only_bus=bus_cost[bus_cost['bus_count']>0]

In [None]:
display(only_bus.shape,
        only_bus.columns)

In [None]:
#agg by propulsion type
only_bus['propulsion_type'].value_counts()

In [None]:
only_bus.project_type.value_counts()

In [None]:
#of the rows with bus_count > 0, what are the project types?
bus_agg = only_bus.groupby('project_type').agg({
    'project_type': 'count',
    'funding': 'sum',
    'bus_count':'sum'
})

In [None]:
#new column that calculates `cost per bus`
bus_agg['cost_per_bus']=(bus_agg['funding']/bus_agg['bus_count']).astype('int64')

In [None]:
bus_agg

### Projects with no buses

In [None]:
no_bus=bus_cost[bus_cost['bus_count']<1]

In [None]:
no_bus['project_type'].value_counts()

## Overall Summary

In [None]:
project_count=bus_cost.project_title.count()
fund_sum=bus_cost.funding.sum()
bus_count_sum=bus_cost.bus_count.sum()
overall_cost_per_bus = (fund_sum)/(bus_count_sum)
bus_program_count=bus_cost['bus/low-no_program'].value_counts()

projects_with_bus=only_bus.project_title.count()
projects_with_bus_funds=only_bus.funding.sum()
#named separately so the cost_per_bus dataframe (used in the histograms below) is not overwritten
bus_only_cost_per_bus = (only_bus.funding.sum())/(bus_count_sum)

In [None]:
summary = f'''
Top-level observations:
- {project_count} projects awarded
- ${fund_sum:,.2f} awarded
- {bus_count_sum} buses to be purchased
- ${overall_cost_per_bus:,.2f} overall cost per bus

Projects include some mix of buses, facilities and equipment, making it difficult to disaggregate the actual bus cost.

Of the {project_count} projects awarded, {projects_with_bus} projects included buses. The remainder were facilities, chargers and equipment.

Projects with bus purchases:
- {projects_with_bus} projects
- ${projects_with_bus_funds:,.2f} awarded to projects that purchase buses
- ${bus_only_cost_per_bus:,.2f} cost per bus
'''

In [None]:
print(summary)
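
The summary reports a pooled figure (total funding divided by total buses), while the mean line in the histograms averages the per-agency `cost_per_bus` values. The two generally differ, as this sketch with made-up numbers shows; `per_agency`, `pooled` and `mean_of_ratios` are hypothetical names.

```python
# Made-up (funding, bus_count) pairs for three agencies.
per_agency = [(3_000_000, 3), (4_500_000, 5), (700_000, 2)]

# Pooled cost per bus: total funding divided by total buses.
pooled = sum(f for f, _ in per_agency) / sum(n for _, n in per_agency)

# Mean of per-agency costs: every agency counts equally, whatever its fleet size.
mean_of_ratios = sum(f / n for f, n in per_agency) / len(per_agency)

print(f"pooled: ${pooled:,.2f}")
print(f"mean of per-agency costs: ${mean_of_ratios:,.2f}")
```

Here the pooled figure is $820,000 and the mean of per-agency costs is $750,000, because the small, low-cost agency pulls the unweighted mean down more than it pulls the pooled figure.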

In [None]:
# Cost-per-bus values from the cost_per_bus dataframe
cost_per_bus_values = cost_per_bus['cost_per_bus']

# Calculate mean and standard deviation
mean_value = cost_per_bus_values.mean()
std_deviation = cost_per_bus_values.std()

# Plot histogram
plt.hist(cost_per_bus_values, bins=30, color='skyblue', edgecolor='black', alpha=0.7)

# Add vertical lines for mean and standard deviation
plt.axvline(mean_value, color='red', linestyle='dashed', linewidth=2, label='Mean')
plt.axvline(mean_value + std_deviation, color='green', linestyle='dashed', linewidth=2, label='Mean + 1 Std Dev')
plt.axvline(mean_value - std_deviation, color='green', linestyle='dashed', linewidth=2, label='Mean - 1 Std Dev')

# Set labels and title
plt.xlabel('cost_per_bus')
plt.ylabel('Frequency')
plt.title('Histogram of cost_per_bus with Mean and Std Dev Lines')
plt.legend()

# Show the plot
plt.show()

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker

# Cost-per-bus values from the cost_per_bus dataframe
cost_per_bus_values = cost_per_bus['cost_per_bus']

# Calculate mean and standard deviation
mean_value = cost_per_bus_values.mean()
std_deviation = cost_per_bus_values.std()

# Plot histogram
plt.hist(cost_per_bus_values, bins=20, color='skyblue', edgecolor='black', alpha=0.7)

# Add vertical lines for mean and standard deviation
plt.axvline(mean_value, color='red', linestyle='dashed', linewidth=2, label='Mean')
plt.axvline(mean_value + std_deviation, color='green', linestyle='dashed', linewidth=2, label='Mean + 1 Std Dev')
plt.axvline(mean_value - std_deviation, color='green', linestyle='dashed', linewidth=2, label='Mean - 1 Std Dev')

# Set labels and title
plt.xlabel('Cost per Bus (USD)')
plt.ylabel('Frequency')
plt.title('Histogram of Cost per Bus with Mean and Std Dev Lines')
plt.legend()

# Format x-axis ticks as USD
plt.gca().xaxis.set_major_formatter(mticker.StrMethodFormatter('${x:,.0f}'))

# Show the plot
plt.show()
