# Run Functions to Add Information to Projects

To run the data through the script, update the `my_file` path to the most recent FMIS and QMRS export uploaded to GCS, then run the function in the `Export Data` section with your dataframe and the current date. The aggregated data will then be available in GCS.

In [1]:
import pandas as pd
from siuba import *

import _script_utils

from calitp_data_analysis.sql import to_snakecase


In [2]:
pd.set_option("display.max_columns", 100)
pd.set_option('display.max_colwidth', None)

## Read in Data and Test Functions

For the following functions:
* update `my_file` to the file name of the most recent FMIS & QMRS export
* the second argument is the column holding the unique recipient identifier; it should stay the same for subsequent exports
* the third argument is the aggregation level for the output. Unless otherwise specified, use `agg`, which returns one row per project
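The `agg` level is described only as "one row per project"; the logic lives in `_script_utils`, which isn't shown here. The sketch below is a guess at the general idea, inferred from outputs further down where multiple values are joined with `|` (e.g. `program_code` `Y001|Y110|Y908|Y909`). The function name `collapse_to_projects` is hypothetical, and summing `obligations_amount` is an assumption, not something confirmed by the script.

```python
import pandas as pd


def collapse_to_projects(rows: pd.DataFrame, key: str = "project_number") -> pd.DataFrame:
    """Hypothetical sketch: collapse many rows per project into one row.

    Text columns keep their distinct values joined with '|'; the amount
    column is summed (an assumption, not confirmed by _script_utils).
    """

    def join_unique(values: pd.Series) -> str:
        # dict.fromkeys drops duplicates while keeping first-seen order
        return "|".join(dict.fromkeys(str(v) for v in values.dropna()))

    agg_spec = {c: join_unique for c in rows.columns if c not in (key, "obligations_amount")}
    agg_spec["obligations_amount"] = "sum"
    return rows.groupby(key, as_index=False).agg(agg_spec)


# Toy input: project 5004049 appears once per funding code
rows = pd.DataFrame({
    "project_number": ["5004049", "5004049", "37M0001"],
    "program_code": ["Y001", "Y110", "ER01"],
    "improvement_type": ["10", "17", "6"],
    "obligations_amount": [100.0, 50.0, 10.0],
})
projects = collapse_to_projects(rows)
print(projects)
```

Joining with `|` keeps every funding code visible on a single row, at the cost of having to split the string again if you later need per-program figures.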

In [3]:
GCS_FILE_PATH = 'gs://calitp-analytics-data/data-analyses/dla/dla-iija'

In [4]:
my_file = "IIJA Project List 01_2025.xlsx"

### Check data

In [5]:
check_data = to_snakecase(pd.read_excel(f"{GCS_FILE_PATH}/{my_file}"))

In [6]:
check_data.head(1)

Unnamed: 0,fmis_transaction_date,program_code,program_code_description,pid_district,project_number,recipient_project_number,pid_check1,efis_id,pid_check2,project_title,rk_locode,county_code,congressional_district,project_status_description,project_description,improvement_type,improvement_type_description,total_cost_amount,obligations_amount,summary_recipient_defined_text_field_1_value,comp
0,2022-01-20,ER01,EMERGENCY REL 2022 SUPPLEMENT,5.0,31RA002,0518000118S,11,518000118,10,MONTEREY COUNTY NEAR BIG SUR 2.3 MILES NORTH OF CASTRO CANYON BRIDGE TO 0.8 MILE SOUTH OF BIG SUR RIVER BRIDGE. EMERGENCY PROJECT - PERMANENT RESTORA,,53,Cong Dist 20,Active,MONTEREY COUNTY NEAR BIG SUR 2.3 MILES NORTH OF CASTRO CANYON BRIDGE TO 0.8 MILE SOUTH OF BIG SUR RIVER BRIDGE. EMERGENCY PROJECT - PERMANENT RESTORATION. COMPLETE COASTAL DEVELOPMENT PERMIT REQUIREMENTS AT PFEIFFER CANYON BRIDGE.,16,Right of Way,600000.0,531100.0,S AMBAG,IIJA-A


In [7]:
check_data.project_number.nunique()

2489
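Because `my_file` changes with every export, it can help to fail fast if a new file is missing a column the notebook relies on. The hypothetical `check_export` helper below is a sketch; `REQUIRED_COLUMNS` is drawn from column names visible in this notebook's outputs, not from any official schema.

```python
import pandas as pd

# Assumption: columns visible in this notebook's outputs, not an official schema
REQUIRED_COLUMNS = {
    "fmis_transaction_date",
    "project_number",
    "program_code",
    "summary_recipient_defined_text_field_1_value",
    "obligations_amount",
}


def check_export(data: pd.DataFrame) -> int:
    """Raise if expected columns are missing; otherwise return the # of unique projects."""
    missing = REQUIRED_COLUMNS - set(data.columns)
    if missing:
        raise KeyError(f"export is missing columns: {sorted(missing)}")
    return data["project_number"].nunique()


# Toy export: two rows for the same project
sample = pd.DataFrame({col: ["x", "y"] for col in REQUIRED_COLUMNS})
sample["project_number"] = ["31RA002", "31RA002"]
n_projects = check_export(sample)
print(n_projects)
```

In the notebook this would be called as `check_export(check_data)`, returning the same count as `check_data.project_number.nunique()` when all columns are present.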

### Run Script

In [8]:
df = _script_utils.run_script(my_file, 'summary_recipient_defined_text_field_1_value', 'agg')

  df['implementing_agency_locode'] = df['implementing_agency_locode'].str.replace('.0', '')
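The line echoed under this cell comes from `_script_utils` and is most likely pandas' `FutureWarning` about the default of `regex` in `Series.str.replace` (the warning text itself is cut off in this export). Before pandas 2.0, `regex` defaulted to `True`, so `'.0'` is read as a pattern in which `.` matches any character, not as the literal text `.0`. The sketch below shows the difference on locode-like strings; passing `regex=False` explicitly behaves the same across pandas versions. The sample output further down may be consistent with this: Antioch (`L5038MTC`) shows a locode of `38.0` and San Diego (`L5004SANDAG`) shows `4`.

```python
import pandas as pd

# Locodes that were stored as floats and then cast to strings
locodes = pd.Series(["5038.0", "5004.0", "5282.0"])

# As a regex, ".0" means "any character followed by 0", so the
# leading "50" is stripped too
as_regex = locodes.str.replace(".0", "", regex=True)

# As a literal, only the ".0" text is removed
as_literal = locodes.str.replace(".0", "", regex=False)

print(as_regex.tolist())    # ['38', '04', '5282']
print(as_literal.tolist())  # ['5038', '5004', '5282']
```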


In [9]:
df2 = _script_utils.run_script2(my_file, 'summary_recipient_defined_text_field_1_value', 'agg')

'Rows with locodes filled'

both          3085
left_only        2
right_only       0
Name: _merge, dtype: int64

'Do the # of rows match?'

True

  df['implementing_agency_locode'] = df['implementing_agency_locode'].str.replace('.0', '')
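The `both` / `left_only` / `right_only` counts printed by `run_script2` look like the `_merge` column that `pandas.DataFrame.merge` adds when called with `indicator=True`. We haven't seen how `run_script2` builds that merge, so the toy tables below are hypothetical and only illustrate what the labels mean: `left_only` marks rows with no match in the lookup, which suggests the 2 `left_only` rows above are projects left without a locode.

```python
import pandas as pd

# Hypothetical inputs: projects keyed by agency, and a locode lookup table
projects = pd.DataFrame({
    "project_number": ["5038028", "40A0074", "37M0001"],
    "implementing_agency": ["Antioch", "Palm Springs", "California"],
})
lookup = pd.DataFrame({
    "implementing_agency": ["Antioch", "Palm Springs"],
    "implementing_agency_locode": ["5038", "5282"],
})

# indicator=True adds a categorical "_merge" column: left_only / right_only / both
merged = projects.merge(lookup, on="implementing_agency", how="left", indicator=True)
counts = merged["_merge"].value_counts()
print(counts)

# A left join keeps every left-hand row, so the row counts should match
# as long as the lookup has at most one row per agency
print(len(merged) == len(projects))
```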


### Testing the data

In [10]:
df.sample(3)

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
113,2022-08-16,37M0001,California,S ER NONE,ER01,Emergency Supplement Funding Program,0518000006S,6|15|16|17,4R - Restoration & Rehabilitation|Preliminary Engineering|Right of Way|Construction Engineering,SANTA BARBARA COUNTY FROM VENTURA COUNTY LINE TO GARDEN STREET UNDERCROSSING. EMERGENCY OPENING - REPAIR ROADWAY AND BRIDGES DUE TO MUDSLIDE.,7498011,|24|,|05|,83,Santa Barbara County,|SB|,,,,Road Restoration & Rehabilitation in Santa Barbara County,"Road Restoration & Rehabilitation in Santa Barbara County, part of the Emergency Supplement Funding Program. (Federal Project ID: 37M0001)."
2175,2024-11-01,5038028,Antioch,L5038MTC,YS70,Vulnerable Road User Safety Special Rule Program,0424000478L,15,Preliminary Engineering,"69 SIGNALIZED INTERSECTIONS ALONG MULTIPLE ROADWAY SEGMENTS IMPROVE SIGNAL HARDWARE: LENSES, BACK-PLATES WITH RETROREFLECTIVE BORDERS, MOUNTING, SIZE",369000,|10|,|04|,13,Contra Costa County,|CC|,38.0,Metropolitan Transportation Commission,Metropolitan Transportation Commission,Improve Signals in Antioch,"Improve Signals in Antioch, part of the Vulnerable Road User Safety Special Rule Program. (Federal Project ID: 5038028)."
172,2022-09-20,40A0074,Palm Springs,L5282SCAG,ER01,Emergency Supplement Funding Program,0820000054L,4,4R - No Added Capacity,"ARABY ROAD AT PALM CANYON WASH PLACEMENT AND LATER REMOVAL OF BARRICADES, REMOVAL OF SAND, AND SILT FROM ROAD, AND WATER TRUCK FOR DUST CONTROL FOR E",108227,|36|,|08|,65,Riverside County,|RIV|,5282.0,Riverside County Transportation Commission,Southern California Association Of Governments,Road Construction in Palm Springs,"Road Construction in Palm Springs, part of the Emergency Supplement Funding Program. (Federal Project ID: 40A0074)."


In [11]:
## when grouping by funding program (one project can have multiple rows), len is 1612 for the 2023 version of the data
## assert that the df has one row per project, i.e. its length equals the number of unique projects
assert len(df) == check_data.project_number.nunique()
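The length check above passes whenever the counts agree, but it would not catch one project appearing twice while another went missing. A slightly stricter check, sketched below as a hypothetical helper, also asserts that `project_number` is unique and that the same set of projects appears in both frames.

```python
import pandas as pd


def assert_one_row_per_project(agg: pd.DataFrame, raw: pd.DataFrame, key: str = "project_number") -> None:
    """Hypothetical helper: check the aggregated output has exactly one row per raw project."""
    # Same number of rows as distinct projects in the raw export
    assert len(agg) == raw[key].nunique(), "row count differs from # of unique projects"
    # No project appears twice
    assert agg[key].is_unique, "some project_number appears more than once"
    # No project was lost or invented along the way
    assert set(agg[key]) == set(raw[key]), "project sets differ"


raw = pd.DataFrame({"project_number": ["A", "A", "B"]})
agg_ok = pd.DataFrame({"project_number": ["A", "B"]})
assert_one_row_per_project(agg_ok, raw)  # passes silently
```

In the notebook, the call would be `assert_one_row_per_project(df, check_data)`.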

In [12]:
## check one project with multiple funding codes
df >> filter(_.project_number == '5004049')

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
1341,2024-04-15,5004049,San Diego,L5004SANDAG,Y001|Y110|Y908|Y909,National Highway Performance Program (NHPP)|Bridge Formula Program|Bridge Replacement and Rehabilitation Program,11955780L,10|17,Bridge Replacement - Added Capacity|Construction Engineering,"WEST MISSION BAY DRIVE OVER THE SAN DIEGO RIVER BRIDGE REPLACEMENT, BR. NO. 57C-0023",80036838,|52|,|11|,73,San Diego County,|SD|,4,San Diego Association of Governments,San Diego Association Of Governments,Replace Bridge in San Diego,"Replace Bridge in San Diego, part of the National Highway Performance Program (NHPP), and the Bridge Formula Program, and the Bridge Replacement and Rehabilitation Program. (Federal Project ID: 5004049)."
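The `new_description_col` values follow a consistent pattern: the new title, then "part of the" followed by each funding program joined with ", and the", then the federal project ID. The sketch below reproduces that format from the outputs shown; `build_description` is a hypothetical name, and the real script may build the text differently.

```python
def build_description(title: str, program_descriptions: str, project_id: str) -> str:
    """Rebuild the description text from '|'-joined program names (hypothetical helper)."""
    programs = program_descriptions.split("|")
    program_text = ", and the ".join(programs)
    return f"{title}, part of the {program_text}. (Federal Project ID: {project_id})."


print(build_description(
    "Replace Bridge in San Diego",
    "National Highway Performance Program (NHPP)|Bridge Formula Program|"
    "Bridge Replacement and Rehabilitation Program",
    "5004049",
))
```

A single program yields the simpler form seen in the sample rows, e.g. "Improve Signals in Antioch, part of the Vulnerable Road User Safety Special Rule Program. (Federal Project ID: 5038028)."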


In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2489 entries, 0 to 2488
Data columns (total 21 columns):
 #   Column                                        Non-Null Count  Dtype         
---  ------                                        --------------  -----         
 0   fmis_transaction_date                         2489 non-null   datetime64[ns]
 1   project_number                                2489 non-null   object        
 2   implementing_agency                           2489 non-null   object        
 3   summary_recipient_defined_text_field_1_value  2489 non-null   object        
 4   program_code                                  2489 non-null   object        
 5   program_code_description                      2489 non-null   object        
 6   recipient_project_number                      2489 non-null   object        
 7   improvement_type                              2489 non-null   object        
 8   improvement_type_description                  2489 non-null   object

In [16]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2489 entries, 0 to 2488
Data columns (total 21 columns):
 #   Column                                        Non-Null Count  Dtype         
---  ------                                        --------------  -----         
 0   fmis_transaction_date                         2489 non-null   datetime64[ns]
 1   project_number                                2489 non-null   object        
 2   implementing_agency                           2489 non-null   object        
 3   summary_recipient_defined_text_field_1_value  2489 non-null   object        
 4   program_code                                  2489 non-null   object        
 5   program_code_description                      2489 non-null   object        
 6   recipient_project_number                      2489 non-null   object        
 7   improvement_type                              2489 non-null   object        
 8   improvement_type_description                  2489 non-null   object
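`df.info()` and `df2.info()` only show that the two outputs have the same shape and non-null counts (at least for the columns visible here), not that their values agree. A value-level comparison could look like the hypothetical `compare_outputs` below, which uses `DataFrame.compare` (pandas 1.1+) after aligning both frames on `project_number`. It assumes both frames hold the same set of projects, since `compare` requires identically labeled frames; `df.equals(df2)` is a quicker yes/no check.

```python
import pandas as pd


def compare_outputs(a: pd.DataFrame, b: pd.DataFrame, key: str = "project_number") -> pd.DataFrame:
    """Return only the cells that differ between two outputs, aligned on `key`."""
    shared = [c for c in a.columns if c in b.columns and c != key]
    left = a.set_index(key)[shared].sort_index()
    right = b.set_index(key)[shared].sort_index()
    # compare() raises if the two frames are not identically labeled
    return left.compare(right)


# Toy outputs that disagree on one locode
out1 = pd.DataFrame({"project_number": ["5038028", "5004049"],
                     "implementing_agency_locode": ["5038", "5004"]})
out2 = pd.DataFrame({"project_number": ["5038028", "5004049"],
                     "implementing_agency_locode": ["5038", "04"]})
print(compare_outputs(out1, out2))
```

An empty result means every shared column matches for every project.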

## Export Data

In [14]:
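### name the export file with the date (MMDDYYYY) followed by the aggregation level
### e.g. data exported on 01/31/2025 at the `agg` level becomes "01312025_agg"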
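The export names used in this section follow an `MMDDYYYY_agg` pattern (e.g. `01312025_agg`). A small helper can build that name from today's date instead of typing it by hand; `export_name` is hypothetical, and passing its result to `_script_utils.export_to_gcs` assumes the second argument is only the file name, as in the cells below.

```python
from datetime import date
from typing import Optional


def export_name(level: str = "agg", on: Optional[date] = None) -> str:
    """Build a name like '01312025_agg': the date as MMDDYYYY plus the aggregation level."""
    on = on or date.today()
    return f"{on.strftime('%m%d%Y')}_{level}"


print(export_name(on=date(2025, 1, 31)))  # 01312025_agg
```

Usage would then be `_script_utils.export_to_gcs(df2, export_name())`.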

In [15]:
# _script_utils.export_to_gcs(df, "01302025_agg")

In [17]:
_script_utils.export_to_gcs(df2, "01312025_agg")