## FY23 FTA Bus and Low- and No-Emission Grant Awards Analysis

<b>GH issue:</b> 
* Research Request - Bus Procurement Costs & Awards #897

<b>Data source(s):</b> 
1. https://www.transit.dot.gov/funding/grants/fy23-fta-bus-and-low-and-no-emission-grant-awards
2. https://storymaps.arcgis.com/stories/022abf31cedd438b808ec2b827b6faff

<b>Definitions:</b>  
* <u>Grants for Buses and Bus Facilities Program:</u>
    * 49 U.S.C. 5339(b) makes federal resources available to states and direct recipients to replace, rehabilitate, and purchase buses and related equipment and to construct bus-related facilities, including technological changes or innovations to modify low- or no-emission vehicles or facilities. Funding is provided through formula allocations and competitive grants. 
<br><br>
* <u>Low or No Emission Vehicle Program:</u>
    * 49 U.S.C. 5339(c) provides funding to state and local governmental authorities for the purchase or lease of zero-emission and low-emission transit buses, as well as the acquisition, construction, and leasing of required supporting facilities.


In [2]:
import pandas as pd

In [3]:
df = pd.read_excel('gs://calitp-analytics-data/data-analyses/bus_procurement_cost/fta_press_release_data2.xlsx')

In [4]:
# confirm the file reads in: shape, object type, and first five rows
display(df.shape,
        type(df),
        df.head(5))

(130, 12)

pandas.core.frame.DataFrame

| | State | Project Sponsor | Project Title | Description | Funding | approx # of buses | project type | propulsion type | area served | congressional districts | FTA Region | Bus/Low-No program |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DC | Washington Metropolitan Area Transit Authority... | Battery-Electric Metrobus Procurement and Elec... | WMATA will receive funding to convert its Cind... | 104000000 | 100(beb) | bus/chargers | zero | Large Urban | DC-001 ; MD-004 ; MD-008 ; VA-008 ; VA-011 | 3 | Low-No |
| 1 | TX | Dallas Area Rapid Transit (DART) | DART CNG Bus Fleet Modernization Project | Dallas Area Rapid Transit will receive funding... | 103000000 | 90 (estimated-CNG buses) | bus | low | Large Urban | TX-003 ; TX-004 ; TX-005 ; TX-006 ; TX-024 ; T... | 6 | Low-No |
| 2 | PA | Southeastern Pennsylvania Transportation Autho... | SEPTA Zero-Emission Bus Transition Facility Sa... | The Southeastern Pennsylvania Transportation A... | 80000000 | 0 | facility | zero | Large Urban | PA-002 ; PA-003 ; PA-004 ; PA-005 | 3 | Low-No |
| 3 | LA | New Orleans Regional Transit Authority | Accelerating Zero-Emissions Mobility for a Res... | The New Orleans Regional Transit Authority wil... | 71439261 | 20 (zero-emission) | Bus / Chargers / Equipment | zero | Large Urban | LA-002 ; LA-001 | 6 | Low-No |
| 4 | NJ | New Jersey Transit Corporation | Hilton Bus Garage Modernization | New Jersey Transit will receive funding to mod... | 47000000 | 0 | facility/chargers | zero | Large Urban | nj-011 | 2 | Bus |


## Data Cleaning
1. Snake-case the column names.
2. Format the funding column as currency (with `$` and `,`).
3. Separate the text from the count in the # of buses column (split at `(`).
    * Trim spaces in the new column.
    * Remove the `(` and `)` characters in the new column.
4. Trim spaces in the other columns, if needed.
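Step 4 is still open. A minimal sketch of it, assuming the goal is only to strip leading and trailing whitespace from text cells: `trim_text_columns` is a hypothetical helper, and the sample rows are made up for illustration rather than taken from the award data.

```python
import pandas as pd


def trim_text_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with leading/trailing whitespace stripped from string cells.

    Numeric columns are skipped, and non-string cells (numbers, NaN) in
    mixed columns are left as they are.
    """
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out


# Hypothetical rows with stray spaces, loosely modeled on the columns above.
sample = pd.DataFrame(
    {
        "state": [" DC", "NJ "],
        "project_type": ["bus/chargers ", " facility/chargers"],
        "funding": [104000000, 47000000],
    }
)
trimmed = trim_text_columns(sample)
print(trimmed["state"].tolist())  # ['DC', 'NJ']
```

On the real data this would be `df = trim_text_columns(df)`. Checking each cell with `isinstance` keeps mixed columns such as `#_of_buses`, where some cells are numbers, from raising errors.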

In [6]:
df.columns

Index(['State', 'Project Sponsor', 'Project Title', 'Description', 'Funding',
       'approx # of buses', 'project type', 'propulsion type', 'area served',
       'congressional districts', 'FTA Region', 'Bus/Low-No program'],
      dtype='object')

In [7]:
# snake-case the column names via an explicit list (must stay in the same order as df.columns)
new_col=['state',
         'project_sponsor',
         'project_title',
         'description',
         'funding',
         '#_of_buses',
         'project_type',
         'propulsion_type',
         'area_served',
         'congressional_districts',
         'fta_region',
         'bus/low-no_program'
]

df.columns = new_col
df.columns

Index(['state', 'project_sponsor', 'project_title', 'description', 'funding',
       '#_of_buses', 'project_type', 'propulsion_type', 'area_served',
       'congressional_districts', 'fta_region', 'bus/low-no_program'],
      dtype='object')
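The explicit list above works, but it has to be kept in the same order as `df.columns`. A minimal sketch of a rule-based alternative, assuming lowercase plus underscores is the only convention wanted: `to_snake_case` is a hypothetical helper passed to `DataFrame.rename`. It keeps `approx_#_of_buses` rather than the shortened `#_of_buses`, so later cells would need the matching name, and it leaves characters such as `#` and `/` in place.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse whitespace runs into single underscores."""
    return re.sub(r"\s+", "_", name.strip().lower())


# Column headers as read from the spreadsheet (see df.columns above).
raw_columns = [
    "State", "Project Sponsor", "Project Title", "Description", "Funding",
    "approx # of buses", "project type", "propulsion type", "area served",
    "congressional districts", "FTA Region", "Bus/Low-No program",
]
demo = pd.DataFrame(columns=raw_columns)

# rename() accepts a function and applies it to every column label.
demo = demo.rename(columns=to_snake_case)
print(list(demo.columns))
```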

In [8]:
# check the dtype of the funding column (int64 means every value parsed as a number)
# and look at the value counts for repeated amounts
display(df['funding'].dtype,
        df.funding.value_counts()
       )

dtype('int64')

5000000      3
6000000      2
3400000      2
104000000    1
4313552      1
            ..
13880910     1
15423904     1
16166822     1
16358000     1
181250       1
Name: funding, Length: 126, dtype: int64
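The `int64` dtype already shows that every funding value parsed as a number; `value_counts` on its own does not flag non-numeric entries. If the column ever comes in as `object` (for example, if the spreadsheet stores some amounts as text), a more direct check is sketched below. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical values: one clean integer, one formatted string, one placeholder.
funding = pd.Series([104000000, "5,000,000", "TBD"], dtype="object")

# is_numeric_dtype answers "is the whole column numeric?" in one call.
print(pd.api.types.is_numeric_dtype(funding))  # False

# to_numeric with errors="coerce" turns anything unparseable into NaN,
# so the rows that come back NaN are the non-numeric entries.
parsed = pd.to_numeric(funding, errors="coerce")
non_numeric = funding[parsed.isna()]
print(non_numeric.tolist())  # ['5,000,000', 'TBD']
```

On the int64 `funding` column above, `pd.api.types.is_numeric_dtype(df['funding'])` returns `True` and the coerce step finds nothing.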

In [9]:
# test adding thousands separators to the funding column
# note: this converts funding from int64 to strings, so it no longer sums or sorts numerically
df['funding'] = df['funding'].apply('{:,}'.format)

In [10]:
# thousands separators display as intended
df.funding.value_counts()

5,000,000      3
6,000,000      2
3,400,000      2
104,000,000    1
4,313,552      1
              ..
13,880,910     1
15,423,904     1
16,166,822     1
16,358,000     1
181,250        1
Name: funding, Length: 126, dtype: int64
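Applying `'{:,}'.format` turns `funding` into strings, and plan item 2 also calls for a `$`. One alternative, sketched under the assumption that the formatted amounts are needed only for display: keep `funding` numeric and build a separate text column. `funding_display` is a hypothetical column name, and the three amounts are taken from the output above.

```python
import pandas as pd

# Subset of award amounts taken from the output above.
awards = pd.DataFrame({"funding": [104000000, 47000000, 181250]})

# Keep the numeric column for sums and sorting; build a separate text
# column for display with a dollar sign and thousands separators.
awards["funding_display"] = awards["funding"].map("${:,.0f}".format)

print(awards["funding_display"].tolist())
# ['$104,000,000', '$47,000,000', '$181,250']
print(awards["funding"].sum())  # 151181250
```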

In [15]:
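# test `str.split(pat='(', n=1, expand=True)` to split the # of buses column at the first '('
# note: cells stored as numbers rather than text (e.g. the 0 in rows 2 and 4) are not strings,
# so the .str accessor returns NaN for them
test = df['#_of_buses'].str.split(pat='(', n=1, expand=True)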

In [16]:
test

| | 0 | 1 |
|---|---|---|
| 0 | 100 | beb) |
| 1 | 90 | estimated-CNG buses) |
| 2 | | |
| 3 | 20 | zero-emission) |
| 4 | | |
| ... | ... | ... |
| 125 | | |
| 126 | | |
| 127 | | |
| 128 | | |
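The split leaves the closing `)` and any trailing space, and the cells shown as `0` in the head come back empty. A minimal sketch that covers steps 3a and 3b in one pass, assuming every populated cell starts with a number optionally followed by text in parentheses: cast to `str` first, then use `str.extract` with a regular expression. `bus_count` and `bus_desc` are hypothetical column names, and cells that do not match the pattern come back as NaN and would need a manual look.

```python
import pandas as pd

# Sample values copied from the "approx # of buses" column shown above;
# the 0 entries are stored as numbers, not strings.
buses = pd.Series(["100(beb)", "90 (estimated-CNG buses)", 0,
                   "20 (zero-emission)", 0], dtype="object")

# Cast everything to str so numeric cells are not turned into NaN,
# then capture the leading count and the optional text inside parentheses.
parts = buses.astype(str).str.extract(r"^\s*(\d+)\s*(?:\((.*)\))?\s*$")
parts.columns = ["bus_count", "bus_desc"]

# Convert the count to numbers and trim any spaces left inside the parentheses.
parts["bus_count"] = pd.to_numeric(parts["bus_count"])
parts["bus_desc"] = parts["bus_desc"].str.strip()
print(parts)
```

Because the parentheses sit outside the capture group, they never reach `bus_desc`, so no separate step is needed to remove them.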
