## FY23 FTA Bus and Low- and No-Emission Grant Awards Analysis

<b>GH issue:</b> 
* Research Request - Bus Procurement Costs & Awards #897

<b>Data source(s):</b> 
1. https://www.transit.dot.gov/funding/grants/fy23-fta-bus-and-low-and-no-emission-grant-awards
2. https://storymaps.arcgis.com/stories/022abf31cedd438b808ec2b827b6faff

<b>Definitions:</b>  
* <u>Grants for Buses and Bus Facilities Program:</u>
    * 49 U.S.C. 5339(b) makes federal resources available to states and direct recipients to replace, rehabilitate, and purchase buses and related equipment and to construct bus-related facilities, including technological changes or innovations to modify low or no emission vehicles or facilities. Funding is provided through formula allocations and competitive grants. 
<br><br>
* <u>Low or No Emission Vehicle Program:</u>
    * 5339(c) provides funding to state and local governmental authorities for the purchase or lease of zero-emission and low-emission transit buses as well as acquisition, construction, and leasing of required supporting facilities.


In [1]:
import pandas as pd

# increase the max rows displayed to 200 so the entire df is visible in one go
pd.set_option("display.max_rows", 200)

## Reading in raw data from gcs

In [None]:
df = pd.read_excel(
    "gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_press_release_data2.xlsx"
)

In [None]:
# data is able to be read in
display(df.shape, type(df), df.head(5))

## Data Cleaning
1. snake-case the column names
2. currency-format the funding column (with $ and ,)
3. separate text from the # of buses column (split at '(')
    a. trim spaces in new col
    b. get rid of () characters in new col
4. trim spaces in other columns?
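
Step 1 is done by hand below with an explicit list of names. As a rough alternative, a small helper could derive snake-case names from the raw headers. This is only a sketch: `to_snake_case` and the sample headers are hypothetical, and the real raw headers may need extra handling.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lower-case a column label and replace whitespace runs with underscores."""
    return re.sub(r"\s+", "_", name.strip()).lower()


# Hypothetical raw headers, for illustration only.
sample = pd.DataFrame(columns=["Project Sponsor", "# of Buses", " FTA Region "])
sample.columns = [to_snake_case(c) for c in sample.columns]
print(list(sample.columns))
```

The explicit list used in the notebook is still the safer choice when a header needs a name that cannot be derived mechanically.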

In [None]:
df.columns

In [None]:
# snake case columns names via list
new_col = [
    "state",
    "project_sponsor",
    "project_title",
    "description",
    "funding",
    "#_of_buses",
    "project_type",
    "propulsion_type",
    "area_served",
    "congressional_districts",
    "fta_region",
    "bus/low-no_program",
]

df.columns = new_col
df.columns

In [None]:
# checking data type of funding col
# checking to see if any values are not numbers
display(df["funding"].dtype, df.funding.value_counts())

In [None]:
# test of adding thousands comma separators to funding column
# note: this converts the values to strings
df["funding"] = df["funding"].apply("{:,}".format)

In [None]:
# thousand comma showing as intended.
df.funding.value_counts().head()
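
Formatting with `"{:,}".format` turns the funding values into strings, so sums and sorts on that column no longer behave numerically. One possible approach, sketched here with a hypothetical `funding_display` column and sample amounts, is to keep the numeric column and add a separate formatted one.

```python
import pandas as pd

# Sample funding amounts, for illustration only.
sample = pd.DataFrame({"funding": [104000000, 71439261]})

# Keep the numeric column for arithmetic; add a separate string column for display.
sample["funding_display"] = sample["funding"].map("${:,}".format)

print(sample["funding"].sum())
print(sample["funding_display"].tolist())
```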

In [None]:
# test of removing the spaces first in # of buses column, THEN split by (
df["#_of_buses"] = df["#_of_buses"].str.replace(" ", "")

In [None]:
df[["bus_count", "bus_desc"]] = df["#_of_buses"].str.split(pat="(", n=1, expand=True)
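
Removing every space before splitting also removes the spaces inside the description, which is why they have to be added back later. A possible alternative, assuming the raw values look like `<number> (<description>)`, is a single `str.extract` call that leaves the description text intact. The sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical raw values, for illustration only.
raw = pd.Series(["100 (BEB)", "20 (zero emission)", "7"])

# Leading digits become the count; text inside the parentheses, if any,
# becomes the description.
pattern = r"^\s*(?P<bus_count>\d+)\s*(?:\((?P<bus_desc>[^)]*)\))?"
parts = raw.str.extract(pattern)
parts["bus_count"] = parts["bus_count"].astype(int)
print(parts)
```

Rows that do not follow the assumed pattern would still need the manual fixes applied in the cells below.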

In [None]:
# retained the initial col. and added new columns to the end.
df

In [None]:
# examining the new bus count col.
# see there are 2 values that are inconsistent.
df.bus_count.value_counts()
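
Scanning `value_counts()` by eye works for a table this small. As a sketch of a more mechanical check, `pd.to_numeric` with `errors="coerce"` can flag the entries that are present but not numeric. The sample series is hypothetical; its two odd strings mirror the ones handled in the next cells.

```python
import pandas as pd

# Hypothetical column contents, for illustration only.
counts = pd.Series(["100", None, "56estimated-cutawayvans", "12batteryelectric"])

# Values that fail numeric conversion become NaN.
as_number = pd.to_numeric(counts, errors="coerce")

# Flag rows that have a value but could not be converted (skip genuinely empty rows).
needs_review = counts[counts.notna() & as_number.isna()]
print(needs_review)
```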

In [None]:
# function to find the row index of a specific value and column in a dataframe
def find_loc(data, col, val):
    x = data.loc[data[col] == val].index[0]
    return x
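
`find_loc` returns only the first matching row and raises an `IndexError` when nothing matches. A minimal variant, assuming it is acceptable to get a list back, returns every matching index label instead. `find_all_locs` and the sample frame are hypothetical.

```python
import pandas as pd


def find_all_locs(data: pd.DataFrame, col: str, val) -> list:
    """Return every index label where data[col] equals val (empty list if none)."""
    return data.index[data[col] == val].tolist()


# Hypothetical frame, for illustration only.
sample = pd.DataFrame({"bus_count": ["100", "12batteryelectric", "100"]})
print(find_all_locs(sample, "bus_count", "100"))
print(find_all_locs(sample, "bus_count", "missing"))
```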

In [None]:
loc1 = find_loc(df, "bus_count", "56estimated-cutawayvans")
loc2 = find_loc(df, "bus_count", "12batteryelectric")

In [None]:
display(loc1, loc2)

In [None]:
# editing the values of the bus count col at the locations found above
# loc1 and loc2 are the row labels returned by find_loc (58 and 32 in this dataset)
df.loc[loc1, "bus_count"] = 56
df.loc[loc2, "bus_count"] = 12

In [None]:
# updating values again for bus_desc. same location
df.loc[loc1, "bus_desc"] = "estimated-cutaway vans (PM- award will not fund 68 buses)"
df.loc[loc2, "bus_desc"] = "battery electric"

In [None]:
# values updated as intended for bus count and bus desc
display(df.loc[loc2], df.loc[loc1])

In [None]:
# confirming via value counts that all values are valid now.
df.bus_count.value_counts()

In [None]:
df.bus_desc.value_counts()

In [None]:
# cleaning the bus desc col.
# removing the ")" (regex=False so it is treated as a literal character)
# a dictionary to add spaces back to the values is created in a later cell

df["bus_desc"] = df["bus_desc"].str.replace(")", "", regex=False)

In [None]:
# stripping the values in the bus desc col
df["bus_desc"] = df["bus_desc"].str.strip()

In [None]:
df.bus_desc.unique()

In [None]:
new_dict = {
    "beb": "BEB",
    "estimated-CNGbuses": "estimated-CNG buses",
    "cngbuses": "CNG buses",
    "BEBs": "BEB",
    "Electric\n16(Hybrid": "15 electric, 16 hybrid",
    "FuelCellElectric": "fuel cell electric",
    "FuelCell": "fuel cell",
    "lowemissionCNG": "low emission CNG",
    "cng": "CNG",
    "BEBsparatransitbuses": "BEBs paratransit buses",
    "hybridelectric": "hybrid electric",
    "zeroemissionbuses": "zero emission buses",
    "dieselelectrichybrids": "diesel electric hybrids",
    "hydrogenfuelcell": "hydrogen fuel cell",
    "2BEBsand4HydrogenFuelCellBuses": "2 BEBs and 4 hydrogen fuel cell buses",
    "4fuelcell/3CNG": "4 fuel cell / 3 CNG",
    "hybridelectricbuses": "hybrid electric buses",
    "CNGfueled": "CNG fueled",
    "zeroemissionelectric": "zero emission electric",
    "hybridelectrics": "hybrid electrics",
    "dieselandgas": "diesel and gas",
    "diesel-electrichybrids": "diesel-electric hybrids",
    "propanebuses": "propane buses",
    "1:CNGbus;2cutawayCNGbuses": "1:CNGbus ;2 cutaway CNG buses",
    "zeroemission": "zero emission",
    "propanedpoweredvehicles": "propaned powered vehicles"
}

In [None]:
#using new dictionary to replace values in the bus desc col
df.replace({'bus_desc': new_dict}, inplace=True)

In [None]:
#confirming the bus desc values were replaced as intended.
list(df.bus_desc.unique())
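
`DataFrame.replace` with a nested dictionary matches whole cell values, not substrings, which is why the dictionary needs separate keys for near-duplicates such as `hybridelectric` and `hybridelectricbuses`. A small sketch of that behavior, using two entries from `new_dict` and hypothetical sample values:

```python
import pandas as pd

# Two entries copied from new_dict; the sample values are hypothetical.
mapping = {"cngbuses": "CNG buses", "hybridelectric": "hybrid electric"}
sample = pd.DataFrame({"bus_desc": ["cngbuses", "hybridelectricbuses", "hybridelectric"]})

# Only cells that equal a key exactly are replaced.
sample = sample.replace({"bus_desc": mapping})
print(sample["bus_desc"].tolist())
```

The middle value is left untouched because it is not an exact match for any key.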

In [None]:
#bus count for row 12 needs to be adjusted (15 electric + 16 hybrid = 31)
df.loc[12]

In [None]:
df.loc[12, "bus_count"] = 31

In [None]:
#confirming the change
df.loc[12]

In [None]:
#dropping initial # of buses col
df = df.drop('#_of_buses', axis=1)

In [None]:
#confirming column was dropped.
df.columns

## Exporting cleaned data to GCS

In [None]:
#saving to GCS as csv
df.to_csv('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_bus_cost_clean.csv')
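
By default `to_csv` also writes the DataFrame index, which comes back as an `Unnamed: 0` column when the file is read again. The sketch below shows the difference `index=False` makes on a hypothetical two-row frame, using an in-memory buffer instead of GCS.

```python
import io

import pandas as pd

# Hypothetical two-row frame, for illustration only.
sample = pd.DataFrame({"state": ["DC", "TX"], "bus_count": [100, 90]})

# Default: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(sample.to_csv()))

# index=False: only the data columns are written.
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

If the export were changed to use `index=False`, the later step that drops `Unnamed: 0` would need to be adjusted to match.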

## Reading in cleaned data from GCS

In [2]:
df = pd.read_csv('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_bus_cost_clean.csv')

In [6]:
#confirming cleaned data shows as expected.
display(df.shape,
        type(df),
    df.head(),
    df.columns
       )

(130, 14)

pandas.core.frame.DataFrame

,Unnamed: 0,state,project_sponsor,project_title,description,funding,project_type,propulsion_type,area_served,congressional_districts,fta_region,bus/low-no_program,bus_count,bus_desc
0,0,DC,Washington Metropolitan Area Transit Authority...,Battery-Electric Metrobus Procurement and Elec...,WMATA will receive funding to convert its Cind...,104000000,bus/chargers,zero,Large Urban,DC-001 ; MD-004 ; MD-008 ; VA-008 ; VA-011,3,Low-No,100.0,BEB
1,1,TX,Dallas Area Rapid Transit (DART),DART CNG Bus Fleet Modernization Project,Dallas Area Rapid Transit will receive funding...,103000000,bus,low,Large Urban,TX-003 ; TX-004 ; TX-005 ; TX-006 ; TX-024 ; T...,6,Low-No,90.0,estimated-CNG buses
2,2,PA,Southeastern Pennsylvania Transportation Autho...,SEPTA Zero-Emission Bus Transition Facility Sa...,The Southeastern Pennsylvania Transportation A...,80000000,facility,zero,Large Urban,PA-002 ; PA-003 ; PA-004 ; PA-005,3,Low-No,,
3,3,LA,New Orleans Regional Transit Authority,Accelerating Zero-Emissions Mobility for a Res...,The New Orleans Regional Transit Authority wil...,71439261,Bus / Chargers / Equipment,zero,Large Urban,LA-002 ; LA-001,6,Low-No,20.0,zero-emission
4,4,NJ,New Jersey Transit Corporation,Hilton Bus Garage Modernization,New Jersey Transit will receive funding to mod...,47000000,facility/chargers,zero,Large Urban,nj-011,2,Bus,,


Index(['Unnamed: 0', 'state', 'project_sponsor', 'project_title',
       'description', 'funding', 'project_type', 'propulsion_type',
       'area_served', 'congressional_districts', 'fta_region',
       'bus/low-no_program', 'bus_count', 'bus_desc'],
      dtype='object')

In [7]:
#drop unnecessary columns
bus_cost = df.drop(['Unnamed: 0', 'congressional_districts'], axis=1)

In [8]:
#confirming columns dropped as intended.
#fewer columns (14 to 12)
display(bus_cost.shape,
        bus_cost.columns)

(130, 12)

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       'project_type', 'propulsion_type', 'area_served', 'fta_region',
       'bus/low-no_program', 'bus_count', 'bus_desc'],
      dtype='object')

## Start to analyze `bus_cost` df