## FY23 FTA Bus and Low- and No-Emission Grant Awards Analysis

<b>GH issue:</b> 
* Research Request - Bus Procurement Costs & Awards #897

<b>Data source(s):</b> 
1. https://www.transit.dot.gov/funding/grants/fy23-fta-bus-and-low-and-no-emission-grant-awards
2. https://storymaps.arcgis.com/stories/022abf31cedd438b808ec2b827b6faff

<b>Definitions:</b>  
* <u>Grants for Buses and Bus Facilities Program:</u>
    * 49 U.S.C. 5339(b) makes federal resources available to states and direct recipients to replace, rehabilitate and purchase buses and related equipment and to construct bus-related facilities, including technological changes or innovations to modify low or no emission vehicles or facilities. Funding is provided through formula allocations and competitive grants. 
<br><br>
* <u>Low or No Emission Vehicle Program:</u>
    * 5339(c) provides funding to state and local governmental authorities for the purchase or lease of zero-emission and low-emission transit buses as well as acquisition, construction, and leasing of required supporting facilities.


In [1]:
import pandas as pd
pd.set_option('display.max_rows', 200)

In [2]:
df = pd.read_excel('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_press_release_data2.xlsx')

In [3]:
# confirm the data was read in: shape, type, and first 5 rows
display(df.shape,
      type(df),
      df.head(5)
       )

(130, 12)

pandas.core.frame.DataFrame

Unnamed: 0,State,Project Sponsor,Project Title,Description,Funding,approx # of buses,project type,propulsion type,area served,congressional districts,FTA Region,Bus/Low-No program
0,DC,Washington Metropolitan Area Transit Authority...,Battery-Electric Metrobus Procurement and Elec...,WMATA will receive funding to convert its Cind...,104000000,100(beb),bus/chargers,zero,Large Urban,DC-001 ; MD-004 ; MD-008 ; VA-008 ; VA-011,3,Low-No
1,TX,Dallas Area Rapid Transit (DART),DART CNG Bus Fleet Modernization Project,Dallas Area Rapid Transit will receive funding...,103000000,90 (estimated-CNG buses),bus,low,Large Urban,TX-003 ; TX-004 ; TX-005 ; TX-006 ; TX-024 ; T...,6,Low-No
2,PA,Southeastern Pennsylvania Transportation Autho...,SEPTA Zero-Emission Bus Transition Facility Sa...,The Southeastern Pennsylvania Transportation A...,80000000,0,facility,zero,Large Urban,PA-002 ; PA-003 ; PA-004 ; PA-005,3,Low-No
3,LA,New Orleans Regional Transit Authority,Accelerating Zero-Emissions Mobility for a Res...,The New Orleans Regional Transit Authority wil...,71439261,20 (zero-emission),Bus / Chargers / Equipment,zero,Large Urban,LA-002 ; LA-001,6,Low-No
4,NJ,New Jersey Transit Corporation,Hilton Bus Garage Modernization,New Jersey Transit will receive funding to mod...,47000000,0,facility/chargers,zero,Large Urban,nj-011,2,Bus


## Data Cleaning
1. snake-case the column names
2. currency-format the funding column (with $ and , )
3. separate text from the # of buses column (split at '(')
    a. trim spaces in new col
    b. get rid of () characters in new col
4. trim spaces in other columns?
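
For step 4, one possible approach is sketched below. It is only an illustration: the `sample` frame and its values are made up for the example, and the loop simply strips leading and trailing whitespace from every text column while leaving numeric columns alone.

```python
import pandas as pd

# Hypothetical sample data (not the FTA file) with stray spaces in text columns.
sample = pd.DataFrame(
    {
        "state": [" DC", "TX "],
        "area_served": ["Large Urban ", " Rural"],
        "funding": [104000000, 103000000],
    }
)

# Strip leading/trailing whitespace from text columns only.
for col in sample.columns:
    if pd.api.types.is_string_dtype(sample[col]):
        sample[col] = sample[col].str.strip()

print(sample)
```

Interior spaces (such as the one in "Large Urban") are kept, since `str.strip()` only touches the ends of each value.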

In [4]:
df.columns

Index(['State', 'Project Sponsor', 'Project Title', 'Description', 'Funding',
       'approx # of buses', 'project type', 'propulsion type', 'area served',
       'congressional districts', 'FTA Region', 'Bus/Low-No program'],
      dtype='object')

In [5]:
#snake-case the column names via a list (order must match df.columns)
new_col=['state',
         'project_sponsor',
         'project_title',
         'description',
         'funding',
         '#_of_buses',
         'project_type',
         'propulsion_type',
         'area_served',
         'congressional_districts',
         'fta_region',
         'bus/low-no_program'
]

df.columns = new_col
df.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       '#_of_buses', 'project_type', 'propulsion_type', 'area_served',
       'congressional_districts', 'fta_region', 'bus/low-no_program'],
      dtype='object')
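
As an alternative to typing the list by hand, the names could be derived from the existing labels. The sketch below assumes a hypothetical helper, `to_snake_case`, and an empty `sample` frame that reuses a few of the original labels. Note that it would produce `approx_#_of_buses` rather than the shorter `#_of_buses` chosen above, so it is not a drop-in replacement.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a column label and replace runs of whitespace with '_'."""
    return re.sub(r"\s+", "_", name.strip().lower())


# Hypothetical empty frame that reuses a few of the original column labels.
sample = pd.DataFrame(
    columns=["Project Sponsor", "approx # of buses", "FTA Region", "Bus/Low-No program"]
)
sample.columns = [to_snake_case(c) for c in sample.columns]

print(list(sample.columns))
# ['project_sponsor', 'approx_#_of_buses', 'fta_region', 'bus/low-no_program']
```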

In [6]:
#checking data type of funding col
#checking to see if any values are not numbers
display(df['funding'].dtype,
        df.funding.value_counts()
       )

dtype('int64')

5000000      3
6000000      2
3400000      2
104000000    1
4313552      1
3133129      1
3187200      1
3199038      1
3248500      1
3303600      1
3326067      1
3609800      1
3645000      1
3937500      1
4094652      1
4278772      1
4500000      1
4492904      1
2860250      1
4690010      1
4738886      1
5001700      1
5750351      1
5883200      1
5945553      1
6197180      1
6341306      1
6407460      1
6424808      1
6455325      1
2932500      1
2819460      1
103000000    1
1080000      1
233760       1
280800       1
300000       1
320000       1
514002       1
653184       1
723171       1
753118       1
776714       1
945178       1
1006750      1
1010372      1
1055365      1
1145951      1
2359072      1
1162000      1
1200000      1
1276628      1
1280000      1
1456970      1
1506618      1
1672000      1
1760000      1
2063160      1
2160000      1
2162886      1
2207758      1
2212747      1
6586104      1
6635394      1
6859296      1
28947368     1
19040336  

In [7]:
#test of adding thousands-separator commas to funding column (this converts the values to strings)
df['funding'] = df['funding'].apply('{:,}'.format)

In [8]:
#thousand comma showing as intended.
df.funding.value_counts().head()

5,000,000      3
6,000,000      2
3,400,000      2
104,000,000    1
4,313,552      1
Name: funding, dtype: int64
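
The cleaning plan also calls for a `$` prefix, and formatting in place turns `funding` into strings that can no longer be summed or sorted numerically. One possible way to handle both is sketched below, assuming a made-up `sample` frame and a hypothetical `funding_display` column that holds the formatted text while `funding` stays numeric.

```python
import pandas as pd

# Hypothetical sample using a few of the funding amounts shown above.
sample = pd.DataFrame({"funding": [104000000, 5000000, 233760]})

# Keep the numeric column for arithmetic; add a separate string column for display.
sample["funding_display"] = sample["funding"].map("${:,}".format)

print(sample)
print(sample["funding"].sum())  # still numeric
```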

In [9]:
#test of removing the spaces first in # of buses column, THEN split by (
#note: non-string cells (e.g. the integer 0) become NaN under the .str accessor
df['#_of_buses'] = df['#_of_buses'].str.replace(' ','')

In [10]:
df[['bus_count','bus_desc']] = df['#_of_buses'].str.split(pat='(',n=1, expand=True)

In [11]:
#retained the initial col. and added new columns to the end.
df.head()

Unnamed: 0,state,project_sponsor,project_title,description,funding,#_of_buses,project_type,propulsion_type,area_served,congressional_districts,fta_region,bus/low-no_program,bus_count,bus_desc
0,DC,Washington Metropolitan Area Transit Authority...,Battery-Electric Metrobus Procurement and Elec...,WMATA will receive funding to convert its Cind...,104000000,100(beb),bus/chargers,zero,Large Urban,DC-001 ; MD-004 ; MD-008 ; VA-008 ; VA-011,3,Low-No,100.0,beb)
1,TX,Dallas Area Rapid Transit (DART),DART CNG Bus Fleet Modernization Project,Dallas Area Rapid Transit will receive funding...,103000000,90(estimated-CNGbuses),bus,low,Large Urban,TX-003 ; TX-004 ; TX-005 ; TX-006 ; TX-024 ; T...,6,Low-No,90.0,estimated-CNGbuses)
2,PA,Southeastern Pennsylvania Transportation Autho...,SEPTA Zero-Emission Bus Transition Facility Sa...,The Southeastern Pennsylvania Transportation A...,80000000,,facility,zero,Large Urban,PA-002 ; PA-003 ; PA-004 ; PA-005,3,Low-No,,
3,LA,New Orleans Regional Transit Authority,Accelerating Zero-Emissions Mobility for a Res...,The New Orleans Regional Transit Authority wil...,71439261,20(zero-emission),Bus / Chargers / Equipment,zero,Large Urban,LA-002 ; LA-001,6,Low-No,20.0,zero-emission)
4,NJ,New Jersey Transit Corporation,Hilton Bus Garage Modernization,New Jersey Transit will receive funding to mod...,47000000,,facility/chargers,zero,Large Urban,nj-011,2,Bus,,


In [12]:
#examining the new bus count col. 
#2 values are inconsistent: they had no '(' to split on, so their text stayed in bus_count.
df.bus_count.value_counts()

4                          9
7                          6
20                         6
6                          6
5                          4
3                          3
2                          3
16                         3
9                          3
11                         3
25                         3
10                         3
1                          2
15                         2
56estimated-cutawayvans    1
14                         1
8                          1
50                         1
100                        1
37                         1
12batteryelectric          1
90                         1
39                         1
17                         1
13                         1
23                         1
30                         1
35                         1
40                         1
12                         1
Name: bus_count, dtype: int64
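
A regex-based split might avoid the two stragglers, because it does not depend on a `(` being present. The sketch below is an assumption-laden illustration, not the notebook's method: `sample` holds a few values copied from the outputs above (after space removal), and `pattern` captures the leading digits as the count and whatever follows, minus optional parentheses, as the description.

```python
import pandas as pd

# Hypothetical sample built from values seen in the outputs above.
sample = pd.DataFrame(
    {
        "#_of_buses": [
            "100(beb)",
            "90(estimated-CNGbuses)",
            "56estimated-cutawayvans",
            "12batteryelectric",
            "0",
        ]
    }
)

# Group 1: leading digits. Group 2: remaining text, without the optional ( and ).
pattern = r"^(\d+)\(?([^)]*)\)?$"
sample[["bus_count", "bus_desc"]] = sample["#_of_buses"].str.extract(pattern)

# Convert the extracted count from text to a number.
sample["bus_count"] = pd.to_numeric(sample["bus_count"])

print(sample)
```

This also covers steps 3a and 3b in one pass, since the parentheses never reach `bus_desc`. Cells that are not strings would still come back as NaN and need separate handling.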

In [13]:
#function to find the index label of the first row where column `col` equals `val`
#raises IndexError if no row matches
def find_loc(data,col,val):
    x = data.loc[data[col] == val].index[0]
    return x

In [14]:
loc1 = find_loc(df,'bus_count','56estimated-cutawayvans')
loc2 = find_loc(df,'bus_count','12batteryelectric')

In [15]:
display(loc1,
        loc2
)

58

32

In [None]:
#editing the values of the bus count col at specific location
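
One way this edit could be done is with `.loc`, using the row labels returned by `find_loc`. The sketch below runs on a made-up three-row `sample`, so the labels differ from the 58 and 32 found above, and the replacement description strings are illustrative guesses rather than values taken from the source file.

```python
import pandas as pd

# Hypothetical sample with the two inconsistent bus_count values.
sample = pd.DataFrame(
    {
        "bus_count": ["4", "56estimated-cutawayvans", "12batteryelectric"],
        "bus_desc": [None, None, None],
    }
)


def find_loc(data, col, val):
    # Index label of the first row where `col` equals `val`.
    return data.loc[data[col] == val].index[0]


loc1 = find_loc(sample, "bus_count", "56estimated-cutawayvans")
loc2 = find_loc(sample, "bus_count", "12batteryelectric")

# Overwrite both columns for each located row (replacement text is illustrative).
sample.loc[loc1, ["bus_count", "bus_desc"]] = ["56", "estimated-cutaway vans"]
sample.loc[loc2, ["bus_count", "bus_desc"]] = ["12", "battery electric"]

print(sample)
```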
