## FY23 FTA Bus and Low- and No-Emission Grant Awards Analysis

<b>GH issue:</b> 
* Research Request - Bus Procurement Costs & Awards #897

<b>Data source(s):</b> 
1. https://www.transit.dot.gov/funding/grants/fy23-fta-bus-and-low-and-no-emission-grant-awards
2. https://storymaps.arcgis.com/stories/022abf31cedd438b808ec2b827b6faff

<b>Definitions:</b>  
* <u>Grants for Buses and Bus Facilities Program:</u>
    * 49 U.S.C. 5339(b) makes federal resources available to states and direct recipients to replace, rehabilitate, and purchase buses and related equipment and to construct bus-related facilities, including technological changes or innovations to modify low or no emission vehicles or facilities. Funding is provided through formula allocations and competitive grants.
<br><br>
* <u>Low or No Emission Vehicle Program:</u>
    * 5339(c) provides funding to state and local governmental authorities for the purchase or lease of zero-emission and low-emission transit buses as well as acquisition, construction, and leasing of required supporting facilities.


In [1]:
import pandas as pd

# set_option to increase max rows displayed to 200, to see the entire df in one go
pd.set_option("display.max_rows", 200)

## Reading in raw data from gcs

In [2]:
df = pd.read_excel(
    "gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_press_release_data2.xlsx"
)

In [None]:
# data is able to be read in
display(df.shape, type(df), df.head(5))

## Data Cleaning
1. snake-case column names
2. currency-format the funding column (with $ and ,)
3. separate text from the # of buses col (split at '(')
    a. trim spaces in new col
    b. get rid of () characters in new col
4. trim spaces in other columns?

In [None]:
df.columns

In [3]:
# snake case columns names via list
new_col = [
    "state",
    "project_sponsor",
    "project_title",
    "description",
    "funding",
    "#_of_buses",
    "project_type",
    "propulsion_type",
    "area_served",
    "congressional_districts",
    "fta_region",
    "bus/low-no_program",
]

df.columns = new_col
df.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       '#_of_buses', 'project_type', 'propulsion_type', 'area_served',
       'congressional_districts', 'fta_region', 'bus/low-no_program'],
      dtype='object')
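
The hand-written list above works for a fixed set of columns. As a rough alternative, the renaming could be done programmatically. This is a minimal sketch: `to_snake_case` is a hypothetical helper, and the raw headers in `example` are assumptions, since the original spreadsheet headers are not shown in this notebook.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a column label and replace runs of whitespace with underscores."""
    return re.sub(r"\s+", "_", name.strip().lower())


# Hypothetical raw headers, used only for illustration.
example = pd.DataFrame(columns=["State", "Project Sponsor", "# of Buses", "FTA Region"])
example.columns = [to_snake_case(c) for c in example.columns]
print(list(example.columns))
```

Symbols such as `#` and `/` are left untouched here, which matches names like `#_of_buses` in the list above.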

In [4]:
# checking data type of funding col
# checking to see if any values are not numbers
display(df["funding"].dtype, df.funding.value_counts())

dtype('int64')

5000000      3
6000000      2
3400000      2
104000000    1
4313552      1
3133129      1
3187200      1
3199038      1
3248500      1
3303600      1
3326067      1
3609800      1
3645000      1
3937500      1
4094652      1
4278772      1
4500000      1
4492904      1
2860250      1
4690010      1
4738886      1
5001700      1
5750351      1
5883200      1
5945553      1
6197180      1
6341306      1
6407460      1
6424808      1
6455325      1
2932500      1
2819460      1
103000000    1
1080000      1
233760       1
280800       1
300000       1
320000       1
514002       1
653184       1
723171       1
753118       1
776714       1
945178       1
1006750      1
1010372      1
1055365      1
1145951      1
2359072      1
1162000      1
1200000      1
1276628      1
1280000      1
1456970      1
1506618      1
1672000      1
1760000      1
2063160      1
2160000      1
2162886      1
2207758      1
2212747      1
6586104      1
6635394      1
6859296      1
28947368     1
19040336  

In [None]:
# DO NOT USE THOUSAND SEPARATOR
# test of adding thousand comma separators to funding column
# df["funding"] = df["funding"].apply("{:,}".format)

In [None]:
# thousand comma showing as intended.
# df.funding.value_counts().head()
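
Formatting with `"{:,}".format` turns every value into a string, which is presumably why the cells above are commented out. One possible workaround, sketched below, keeps `funding` numeric and writes the formatted text to a separate display-only column. The column name `funding_display` and the three-row `sample` frame are assumptions for illustration.

```python
import pandas as pd

# Small stand-in frame using funding values that appear in the output above.
sample = pd.DataFrame({"funding": [104000000, 4313552, 233760]})

# Keep "funding" as an integer column for arithmetic;
# put the currency-formatted text in a separate column.
sample["funding_display"] = sample["funding"].map("${:,}".format)

print(sample)
print(sample["funding"].sum())  # the original column is still numeric
```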

In [5]:
# test of removing the spaces first in the # of buses column, THEN split by (
df["#_of_buses"] = df["#_of_buses"].str.replace(" ", "")

In [6]:
df[["bus_count", "bus_desc"]] = df["#_of_buses"].str.split(pat="(", n=1, expand=True)

In [7]:
# retained the initial col. and added new columns to the end.
df.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       '#_of_buses', 'project_type', 'propulsion_type', 'area_served',
       'congressional_districts', 'fta_region', 'bus/low-no_program',
       'bus_count', 'bus_desc'],
      dtype='object')

In [8]:
# examining the new bus count col.
# see there are 2 values that are inconsistent.
df.bus_count.value_counts()

4                          9
7                          6
20                         6
6                          6
5                          4
3                          3
2                          3
16                         3
9                          3
11                         3
25                         3
10                         3
1                          2
15                         2
56estimated-cutawayvans    1
14                         1
8                          1
50                         1
100                        1
37                         1
12batteryelectric          1
90                         1
39                         1
17                         1
13                         1
23                         1
30                         1
35                         1
40                         1
12                         1
Name: bus_count, dtype: int64
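
The two inconsistent values arise because those rows have text between the number and the opening parenthesis. A regex-based split is one way to handle that case without manual edits. The sketch below runs on a hypothetical `raw` Series modelled on the values shown above; it is not the approach used in this notebook, and multi-line entries such as `15(Electric)\n16(Hybrid)` would still need separate handling.

```python
import pandas as pd

# Stand-in values modelled on the space-stripped "#_of_buses" column.
raw = pd.Series(
    [
        "4(BEBs)",
        "56estimated-cutawayvans(PM-awardwillnotfund68buses)",
        "12batteryelectric",
        None,
    ]
)

# Leading digits become the count; the rest becomes the description,
# with one optional "(" right after the digits and one optional trailing ")" removed.
parts = raw.str.extract(r"^(?P<bus_count>\d+)\(?(?P<bus_desc>.*?)\)?$")
print(parts)
```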

In [9]:
# function to find the row index of a specific value and column in a dataframe
def find_loc(data, col, val):
    x = data.loc[data[col] == val].index[0]
    return x
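
Note that `find_loc` returns only the first match and raises an `IndexError` when nothing matches. A slightly more defensive variant might return every matching index label instead. `find_locs` and the `sample` frame below are hypothetical names used for illustration.

```python
import pandas as pd


def find_locs(data: pd.DataFrame, col: str, val) -> list:
    """Return every index label where data[col] equals val (empty list if none)."""
    return data.index[data[col] == val].tolist()


sample = pd.DataFrame({"bus_count": ["4", "12batteryelectric", "4"]})
print(find_locs(sample, "bus_count", "4"))
print(find_locs(sample, "bus_count", "missing"))
```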

In [10]:
loc1 = find_loc(df, "bus_count", "56estimated-cutawayvans")
loc2 = find_loc(df, "bus_count", "12batteryelectric")

In [11]:
display(loc1, loc2)

58

32

In [12]:
# editing the values of the bus count col at specific locations
# syntax: df.loc[<index label>, <column name>] = <new value>
df.loc[58, "bus_count"] = 56
df.loc[32, "bus_count"] = 12

In [13]:
# updating values again for bus_desc. same location
df.loc[58, "bus_desc"] = "estimated-cutaway vans (PM- award will not fund 68 buses)"
df.loc[32, "bus_desc"] = "battery electric"

In [14]:
# values updated as intended for bus count and bus desc
display(df.loc[32], df.loc[58])

state                                                                     MN
project_sponsor                                                Metro Transit
project_title              Investments Toward an Electric Future: Metro T...
description                Metro Transit will receive funding to buy batt...
funding                                                             17532900
#_of_buses                                                 12batteryelectric
project_type                                      Bus / Chargers / Equipment
propulsion_type                                                         zero
area_served                                                      Large Urban
congressional_districts           MN-002 ; MN-003 ; MN-004 ; MN-005 ; MN-006
fta_region                                                                 5
bus/low-no_program                                                    Low-No
bus_count                                                                 12

state                                                                     TX
project_sponsor            Texas Department of Transportation on behalf o...
project_title              FY23 Rural Transit Asset Replacement & Moderni...
description                The Texas Department of Transportation will re...
funding                                                              7443765
#_of_buses                 56estimated-cutawayvans(PM-awardwillnotfund68b...
project_type                                                 bus / facilitiy
propulsion_type                                                          low
area_served                                                            Rural
congressional_districts    TX-001 ; TX-002 ; TX-004 ; TX-005 ; TX-006 ; T...
fta_region                                                                 6
bus/low-no_program                                                    Low-No
bus_count                                                                 56

In [15]:
# confirming via value counts that all values are valid now.
df.bus_count.value_counts()

4      9
7      6
20     6
6      6
5      4
3      3
2      3
16     3
9      3
11     3
25     3
10     3
1      2
15     2
56     1
14     1
8      1
50     1
100    1
37     1
12     1
90     1
39     1
17     1
13     1
23     1
30     1
35     1
40     1
12     1
Name: bus_count, dtype: int64
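
The count `12` is listed twice above, which suggests the column now holds a mix of strings (from the split) and integers (from the manual assignment). A possible fix is to coerce the column to a numeric dtype. The sketch below uses a hypothetical stand-in Series rather than the real column.

```python
import pandas as pd

# Stand-in for bus_count: strings from the split plus an int assigned by hand.
bus_count = pd.Series(["4", "12", 12, None], dtype="object")
print(bus_count.value_counts())  # "12" and 12 are counted separately

# Coerce to a nullable integer dtype so equal counts are grouped together.
bus_count_num = pd.to_numeric(bus_count, errors="coerce").astype("Int64")
print(bus_count_num.value_counts())
```

With `errors="coerce"`, any value that cannot be parsed becomes missing, so it is worth checking for new nulls after the conversion.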

In [16]:
df.bus_desc.value_counts()

BEBs)                                                        13
CNG)                                                          6
hybrid)                                                       5
cng)                                                          4
Electric)                                                     4
hybridelectric)                                               4
dieselelectrichybrids)                                        2
propanebuses)                                                 2
electric)                                                     2
propane)                                                      2
BEB)                                                          2
CNGfueled)                                                    1
zeroemissionelectric)                                         1
hybridelectrics)                                              1
beb)                                                          1
dieselandgas)                           

In [17]:
# cleaning the bus desc col.
# removing the ")" (regex=False treats it as a literal character)
# then creating a dictionary to add spaces back to the values

df["bus_desc"] = df["bus_desc"].str.replace(")", "", regex=False)


In [18]:
# stripping the values in the bus desc col
df["bus_desc"] = df["bus_desc"].str.strip()

In [19]:
df.bus_desc.unique()

array(['beb', 'estimated-CNGbuses', nan, 'zero-emission', 'cngbuses',
       'BEBs', 'Electric\n16(Hybrid', 'FCEB', 'Electric',
       'FuelCellElectric', 'CNG', 'FuelCell', 'hybrid', 'BEB',
       'battery electric', 'lowemissionCNG', 'cng',
       'BEBsparatransitbuses', 'hybridelectric', 'zeroemissionbuses',
       'dieselelectrichybrids', 'hydrogenfuelcell',
       '2BEBsand4HydrogenFuelCellBuses', '4fuelcell/3CNG',
       'estimated-cutaway vans (PM- award will not fund 68 buses',
       'hybridelectricbuses', 'CNGfueled', 'zeroemissionelectric',
       'hybridelectrics', 'dieselandgas', 'diesel-electrichybrids',
       'propane', 'electric', 'diesel-electric', 'propanebuses',
       '1:CNGbus;2cutawayCNGbuses', 'zeroemission',
       'propanedpoweredvehicles'], dtype=object)

In [20]:
new_dict = {
    "beb": "BEB",
    "estimated-CNGbuses": "estimated-CNG buses",
    "cngbuses": "CNG buses",
    "BEBs": "BEB",
    "Electric\n16(Hybrid": "15 electic, 16 hybrid",
    "FuelCellElectric": "fuel cell electric",
    "FuelCell": "fuel cell",
    "lowemissionCNG": "low emission CNG",
    "cng": "CNG",
    "BEBsparatransitbuses": "BEBs paratransit buses",
    "hybridelectric": "hybrid electric",
    "zeroemissionbuses": "zero emission buses",
    "dieselelectrichybrids": "diesel electric hybrids",
    "hydrogenfuelcell": "hydrogen fuel cell",
    "2BEBsand4HydrogenFuelCellBuses": "2 BEBs and 4 hydrogen fuel cell buses",
    "4fuelcell/3CNG": "4 fuel cell / 3 CNG",
    "hybridelectricbuses": "hybrid electric buses",
    "CNGfueled": "CNG fueled",
    "zeroemissionelectric": "zero emission electric",
    "hybridelectrics": "hybrid electrics",
    "dieselandgas": "diesel and gas",
    "diesel-electrichybrids": "diesel-electric hybrids",
    "propanebuses": "propane buses",
    "1:CNGbus;2cutawayCNGbuses": "1:CNGbus ;2 cutaway CNG buses",
    "zeroemission": "zero emission",
    "propanedpoweredvehicles": "propaned powered vehicles"
}

In [21]:
#using new dictionary to replace values in the bus desc col
df.replace({'bus_desc': new_dict}, inplace=True)

In [22]:
#confirming the bus desc values were replaced as intended.
list(df.bus_desc.unique())

['BEB',
 'estimated-CNG buses',
 nan,
 'zero-emission',
 'CNG buses',
 '15 electic, 16 hybrid',
 'FCEB',
 'Electric',
 'fuel cell electric',
 'CNG',
 'fuel cell',
 'hybrid',
 'battery electric',
 'low emission CNG',
 'BEBs paratransit buses',
 'hybrid electric',
 'zero emission buses',
 'diesel electric hybrids',
 'hydrogen fuel cell',
 '2 BEBs and 4 hydrogen fuel cell buses',
 '4 fuel cell / 3 CNG',
 'estimated-cutaway vans (PM- award will not fund 68 buses',
 'hybrid electric buses',
 'CNG fueled',
 'zero emission electric',
 'hybrid electrics',
 'diesel and gas',
 'diesel-electric hybrids',
 'propane',
 'electric',
 'diesel-electric',
 'propane buses',
 '1:CNGbus ;2 cutaway CNG buses',
 'zero emission',
 'propaned powered vehicles']

In [23]:
#bus count for row 12 needs to be adjusted
df.loc[12]

state                                                                     NC
project_sponsor            City of Charlotte - Charlotte Area Transit System
project_title              Charlotte Area Transit System's Sustainable Fl...
description                The city of Charlotte will receive funding to ...
funding                                                             30890413
#_of_buses                                          15(Electric)\n16(Hybrid)
project_type                                      Bus / Chargers / Equipment
propulsion_type                                                   Zero / Low
area_served                                                      Large Urban
congressional_districts           NC-008 ; NC-012 ; NC-013 ; NC-014 ; SC-005
fta_region                                                                 4
bus/low-no_program                                                       Bus
bus_count                                                                 15

In [24]:
# 15 electric + 16 hybrid = 31 buses in total
df.loc[12, "bus_count"] = 31

In [25]:
#confirming the change
df.loc[12]

state                                                                     NC
project_sponsor            City of Charlotte - Charlotte Area Transit System
project_title              Charlotte Area Transit System's Sustainable Fl...
description                The city of Charlotte will receive funding to ...
funding                                                             30890413
#_of_buses                                          15(Electric)\n16(Hybrid)
project_type                                      Bus / Chargers / Equipment
propulsion_type                                                   Zero / Low
area_served                                                      Large Urban
congressional_districts           NC-008 ; NC-012 ; NC-013 ; NC-014 ; SC-005
fta_region                                                                 4
bus/low-no_program                                                       Bus
bus_count                                                                 31
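
The total of 31 comes from adding the two counts in `15(Electric)\n16(Hybrid)`. For entries of that shape, the sum could be computed rather than typed in. `total_bus_count` below is a hypothetical helper. It only counts numbers that sit directly before an opening parenthesis, so values such as `56estimated-cutawayvans(...)` would still need the manual fix.

```python
import re


def total_bus_count(raw: str) -> int:
    """Sum every number that directly precedes an opening parenthesis."""
    return sum(int(n) for n in re.findall(r"(\d+)\(", raw))


print(total_bus_count("15(Electric)\n16(Hybrid)"))  # 31
print(total_bus_count("4(BEBs)"))  # 4
```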

In [26]:
#dropping initial # of buses col
df = df.drop('#_of_buses', axis=1)

In [27]:
#confirming column was dropped.
df.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       'project_type', 'propulsion_type', 'area_served',
       'congressional_districts', 'fta_region', 'bus/low-no_program',
       'bus_count', 'bus_desc'],
      dtype='object')

## Exporting cleaned data to GCS

In [28]:
#saving to GCS as csv
df.to_csv('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_bus_cost_clean.csv')

## Reading in cleaned data from GCS

In [29]:
df = pd.read_csv('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_bus_cost_clean.csv')

In [30]:
#confirming cleaned data shows as expected.
display(df.shape, type(df), df.head(), df.columns)

(130, 14)

pandas.core.frame.DataFrame

| | Unnamed: 0 | state | project_sponsor | project_title | description | funding | project_type | propulsion_type | area_served | congressional_districts | fta_region | bus/low-no_program | bus_count | bus_desc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | DC | Washington Metropolitan Area Transit Authority... | Battery-Electric Metrobus Procurement and Elec... | WMATA will receive funding to convert its Cind... | 104000000 | bus/chargers | zero | Large Urban | DC-001 ; MD-004 ; MD-008 ; VA-008 ; VA-011 | 3 | Low-No | 100.0 | BEB |
| 1 | 1 | TX | Dallas Area Rapid Transit (DART) | DART CNG Bus Fleet Modernization Project | Dallas Area Rapid Transit will receive funding... | 103000000 | bus | low | Large Urban | TX-003 ; TX-004 ; TX-005 ; TX-006 ; TX-024 ; T... | 6 | Low-No | 90.0 | estimated-CNG buses |
| 2 | 2 | PA | Southeastern Pennsylvania Transportation Autho... | SEPTA Zero-Emission Bus Transition Facility Sa... | The Southeastern Pennsylvania Transportation A... | 80000000 | facility | zero | Large Urban | PA-002 ; PA-003 ; PA-004 ; PA-005 | 3 | Low-No | NaN | NaN |
| 3 | 3 | LA | New Orleans Regional Transit Authority | Accelerating Zero-Emissions Mobility for a Res... | The New Orleans Regional Transit Authority wil... | 71439261 | Bus / Chargers / Equipment | zero | Large Urban | LA-002 ; LA-001 | 6 | Low-No | 20.0 | zero-emission |
| 4 | 4 | NJ | New Jersey Transit Corporation | Hilton Bus Garage Modernization | New Jersey Transit will receive funding to mod... | 47000000 | facility/chargers | zero | Large Urban | nj-011 | 2 | Bus | NaN | NaN |


Index(['Unnamed: 0', 'state', 'project_sponsor', 'project_title',
       'description', 'funding', 'project_type', 'propulsion_type',
       'area_served', 'congressional_districts', 'fta_region',
       'bus/low-no_program', 'bus_count', 'bus_desc'],
      dtype='object')
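
The `Unnamed: 0` column appears because `to_csv` writes the index by default and `read_csv` reads it back as an unnamed column. Passing `index=False` on export is one way to avoid it. The sketch below demonstrates this with an in-memory buffer and a hypothetical two-row `sample` frame instead of the GCS path.

```python
import io

import pandas as pd

sample = pd.DataFrame({"state": ["DC", "TX"], "funding": [104000000, 103000000]})

# Default behaviour: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(sample.to_csv()))
print(list(with_index.columns))

# index=False skips it, so nothing needs to be dropped after reading back.
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))
print(list(without_index.columns))
```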

In [31]:
#drop unnecessary columns
bus_cost = df.drop(['Unnamed: 0', 'congressional_districts'], axis=1)

In [32]:
#confirming columns dropped as intended.
#fewer columns (14 to 12)
display(bus_cost.shape,
        bus_cost.columns)

(130, 12)

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       'project_type', 'propulsion_type', 'area_served', 'fta_region',
       'bus/low-no_program', 'bus_count', 'bus_desc'],
      dtype='object')

## Start to analyze `bus_cost` df

In [33]:
#what values are in project_type col?
bus_cost.project_type.value_counts()

#big mix of str. especially variations of bus/...

bus                                      26
facility                                 15
Bus                                      14
Bus / Chargers                           11
Bus / Facility / Chargers                 9
Facility                                  8
Bus / Facility                            6
Bus / Chargers / Equipment                5
Bus / Equipment                           4
bus / chargers                            4
Bus / Facility / Equipment                2
bus/chargers                              2
facilities                                2
bus / facility                            2
facility/chargers                         2
bus/facility                              2
facility/equipment                        1
chargers                                  1
Bus / Other                               1
Chargers / Equipment                      1
bus / facilitiy                           1
bus /equipment                            1
bus/equipment                   
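
Many of the variants above differ only in capitalization and in the spacing around `/`. A light normalization pass could collapse them before grouping. This is a sketch on a hypothetical sample Series, with `normalize_project_type` as an assumed helper name. Misspellings such as `facilitiy` and singular/plural pairs such as `facility` and `facilities` would still need an explicit mapping.

```python
import pandas as pd


def normalize_project_type(value: str) -> str:
    """Lowercase, trim each '/'-separated part, and rejoin with a uniform separator."""
    parts = [p.strip().lower() for p in value.split("/")]
    return " / ".join(parts)


# Stand-in values taken from the value counts above.
project_type = pd.Series(["bus", "Bus", "Bus / Chargers", "bus/chargers", "bus /equipment"])
normalized = project_type.map(normalize_project_type)
print(normalized.value_counts())
```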

In [36]:
bus_cost.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       'project_type', 'propulsion_type', 'area_served', 'fta_region',
       'bus/low-no_program', 'bus_count', 'bus_desc'],
      dtype='object')

In [34]:
# want to see the total funding for each of these values
bus_cost.groupby('project_type')['funding'].sum().sort_values(ascending=False)

project_type
bus                                      246625604
Bus / Facility / Chargers                193377901
facility                                 180356442
Bus / Chargers / Equipment               162489938
Bus / Chargers                           157391009
Bus                                      119339829
bus/chargers                             106207758
Facility                                  78301690
Bus / Facility                            77261121
facility/chargers                         48672000
bus / facility                            41055732
Facility / Chargers                       31535000
Chargers                                  30128378
bus / chargers                            30084852
Bus / Equipment                           29008217
Bus / Facility / Equipment                24261170
Bus / Chargers / Other                    23984700
facilities                                16388000
Bus / Facility / Chargers / Equipment     16166822
bus/facility      

1689864104