# Run Functions to Add Information to Projects

To run the data through the script, update `my_file` to the file name of the most recent FMIS and QMRS export uploaded to GCS, then run the function in the `Export Data` section with your dataframe and a file name based on the current date. The aggregated data will then be available in GCS.

In [1]:
import pandas as pd
from siuba import *

import _script_utils

from calitp_data_analysis.sql import to_snakecase


In [2]:
pd.set_option("display.max_columns", 100)
pd.set_option('display.max_colwidth', None)

## Read in Data and function development / Test Function

`_script_utils.run_script` takes three arguments:
* the first is `my_file`; update it to the file name of the most recent FMIS & QMRS export
* the second is the name of the unique recipient identifier column; it should stay the same for subsequent exports
* the third is the aggregation level for the data. Unless otherwise specified, it should be `agg`, which gives one row per project
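
To make the `agg` level concrete, here is a minimal sketch of what "one row per project" could look like in pandas. It is not the code in `_script_utils`: the toy data, the `join_unique` helper, and the choice to sum `obligations_amount` are assumptions made for illustration, modeled on the pipe-delimited values that appear in the outputs of this notebook.

```python
import pandas as pd

# Toy data: project A001 is funded by two program codes (values are made up).
toy = pd.DataFrame(
    {
        "project_number": ["A001", "A001", "B002"],
        "program_code": ["Y001", "Y110", "Y001"],
        "obligations_amount": [100.0, 50.0, 75.0],
    }
)


def join_unique(values: pd.Series) -> str:
    """Join the distinct values of a column with '|' (hypothetical helper)."""
    return "|".join(sorted(values.astype(str).unique()))


# Collapse to one row per project: concatenate the codes, sum the dollars.
toy_agg = (
    toy.groupby("project_number")
    .agg(
        program_code=("program_code", join_unique),
        obligations_amount=("obligations_amount", "sum"),
    )
    .reset_index()
)

print(toy_agg)
```

Grouping on `project_number` alone is what guarantees a single row per project; any column that can vary within a project needs an explicit rule (join, sum, first) for how it is collapsed.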

In [3]:
GCS_FILE_PATH = 'gs://calitp-analytics-data/data-analyses/dla/dla-iija'

In [4]:
my_file = "FMIS_Projects_Universe_IIJA_Reporting_03012024_ToDLA.xlsx"

### Check data

In [5]:
check_data = to_snakecase(pd.read_excel(f"{GCS_FILE_PATH}/{my_file}"))

In [6]:
check_data.sample()

Unnamed: 0,fmis_transaction_date,program_code,program_code_description,project_number,recipient_project_number,project_title,county_code,congressional_district,project_status_description,project_description,improvement_type,improvement_type_description,total_cost_amount,obligations_amount,summary_recipient_defined_text_field_1_value,proj_id
3430,45261,Y001,NATIONAL HIGHWAY PERF IIJA,P905025,1119000045S,"NEAR SAN DIEGO, AT THE OTAY MESA COMMERCIAL VEHICLE ENFORCEMENT FACILITY (CVEF) ADD NEW INSPECTION LANE AND TRUCK WEIGHING SYSTEM.",73,Cong Dist 51,Active,"ON STATE ROUTE: 905. NEAR SAN DIEGO, AT THE OTAY MESA COMMERCIAL VEHICLE ENFORCEMENT FACILITY (CVEF) ADD NEW INSPECTION LANE AND TRUCK WEIGHING SYSTEM.",42,Training,3400.0,3000.0,S SANDAG,1119000045


In [7]:
check_data.project_number.nunique()

1968

### Run Script

In [8]:
df = _script_utils.run_script(my_file, 'summary_recipient_defined_text_field_1_value', 'agg')



### Testing the data

In [9]:
df.sample(3)

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
422,44886,P299171,California,S SHASTA,YS32,Section 164 Penalties - Use for HSIP Activities,0200000216S,21,Safety,IN SHA COUNTY ABOUT 17.3 MI W OF REDDING FROM 4.3 TO 5.5 MI E OF TRINITY COUNTY CURVE IMPROVEMENT (TC),1017605,|01|,|02|,89,Shasta County,|SHA|,,,,Safety Improvements in Shasta County,"Safety Improvements in Shasta County, part of the program(s) Section 164 Penalties - Use for HSIP Activities. (Federal Project ID: P299171)."
806,45057,P213008,California,S SCAG,Y001,National Highway Performance Program (NHPP),0718000286S,6|17,4R - Restoration & Rehabilitation|Construction Engineering,"IN LOS ANGELES COUNTIES, IN THE CITIES OF TORRANCE, RANCHO PALOS VERDES, LOMITA AND LOS ANGELES, FROM WEST 25TH STREET TO WEST CARSON STREET AT VARIO",4922103,|33|43|44|,|07|,37,Los Angeles County,|LA|,,,,Road Restoration & Rehabilitation in Los Angeles County,"Road Restoration & Rehabilitation in Los Angeles County, part of the program(s) National Highway Performance Program (NHPP). (Federal Project ID: P213008)."
1393,45188,29S1005,California,S MTC,ER03,Emergency Relieve Funding,0416000410S,15,Preliminary Engineering,"IN SONOMA CO., NEAR MONTE RIO, AT 0.7 MI. EAST OF OLD MONTE RIO ROAD. EMERGENCY RELIEF - PRELIMINARY ENGINEERING RELATED TO LAND SLIDE REPAIR/ SOLDI",1667019,|02|,|04|,97,Sonoma County,|SON|,,,,Repair Slide Repair in Sonoma County,"Repair Slide Repair in Sonoma County, part of the program(s) Emergency Relieve Funding. (Federal Project ID: 29S1005)."


In [11]:
## when grouping by funding program (one project can have multiple rows), len is 1612 for the 2023 version of the data
## asserting the length of the df is the same as number of projects
assert len(df) == check_data.project_number.nunique()
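
If the assertion above fails, it helps to see which projects are responsible. A minimal sketch, using a toy frame in place of `df` and assuming the mismatch comes from repeated `project_number` values:

```python
import pandas as pd

# Toy stand-in for the aggregated dataframe; "A001" appears twice on purpose.
toy_df = pd.DataFrame(
    {
        "project_number": ["A001", "A001", "B002"],
        "program_code": ["Y001", "Y110", "Y001"],
    }
)

# keep=False flags every row of a repeated project, not just the later copies.
dupes = toy_df[toy_df.duplicated(subset="project_number", keep=False)]

print(dupes.project_number.unique())
```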

In [12]:
## check one project with multiple funding codes
df >> filter(_.project_number == '5004049')

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
1129,45133,5004049,San Diego,L5004SANDAG,Y001|Y110|Y908|Y909,National Highway Performance Program (NHPP)|Bridge Formula Program|Bridge Replacement and Rehabilitation Program,11955780L,10|17,Bridge Replacement - Added Capacity|Construction Engineering,"WEST MISSION BAY DRIVE OVER THE SAN DIEGO RIVER BRIDGE REPLACEMENT, BR. NO. 57C-0023",69715548,|52|,|11|,73,San Diego County,|SD|,4,San Diego Association of Governments,San Diego Association Of Governments,Replace Bridge in San Diego,"Replace Bridge in San Diego, part of the program(s) National Highway Performance Program (NHPP), and the Bridge Formula Program, and the Bridge Replacement and Rehabilitation Program. (Federal Project ID: 5004049)."
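
The same check can be widened from one project to every project that carries more than one funding code. This sketch assumes, based on the output above, that `program_code` holds codes joined by `|`; the toy rows are made up.

```python
import pandas as pd

# Toy stand-in for the aggregated dataframe (values are illustrative only).
toy_agg = pd.DataFrame(
    {
        "project_number": ["A001", "B002", "C003"],
        "program_code": ["Y001|Y110|Y908", "Y001", "ER03|Y001"],
    }
)

# A single-character pattern is treated as a literal, so "|" is not a regex here.
n_codes = toy_agg["program_code"].str.split("|").str.len()

# Keep only the projects funded by more than one program code.
multi_funded = toy_agg[n_codes > 1]
print(multi_funded)
```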


## Export Data

In [13]:
### rename the file for export to GCS
### use date to rename

In [14]:
# _script_utils.export_to_gcs(df, "03012024_agg.csv")
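
The export name above follows a `MMDDYYYY_agg.csv` pattern. Here is a small sketch for building that name from a date rather than typing it by hand; the `export_name` helper is hypothetical and is not part of `_script_utils`.

```python
from datetime import date


def export_name(run_date: date, level: str = "agg") -> str:
    """Build a file name such as '03012024_agg.csv' (hypothetical helper)."""
    return f"{run_date.strftime('%m%d%Y')}_{level}.csv"


print(export_name(date(2024, 3, 1)))  # 03012024_agg.csv
```

The resulting string could then be passed as the second argument to `_script_utils.export_to_gcs`, in place of the hard-coded name.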