# Run Functions to Add Information to Projects

To run the data through the script, update `my_file` to the file name of the most recent FMIS and QMRS export uploaded to GCS, then run the function in the `Export Data` section with your dataframe and the current date. The aggregated data will then be available in GCS.

In [1]:
import pandas as pd
from siuba import *

import _script_utils

from calitp_data_analysis.sql import to_snakecase


In [2]:
pd.set_option("display.max_columns", 100)
pd.set_option('display.max_colwidth', None)

## Read in Data and function development / Test Function

The `_script_utils.run_script` function used below takes three positional arguments:
* the first argument is the file name: update `my_file` to the most recent file name of the FMIS & QMRS export
* the second argument is the unique recipient identifier column; it should stay the same with subsequent exports
* the third argument is the aggregation level you want for the data. Unless otherwise specified, it should be `agg`, which returns one row per project
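To make "one row per project" concrete, here is a minimal sketch of that kind of aggregation in pandas. It is not the implementation inside `_script_utils`; the toy rows, the `join_unique` helper, and the choice to sum `obligations_amount` are all assumptions made for illustration. The pipe-joined codes mirror the `Y001|Y110|...` style visible in the aggregated output.

```python
import pandas as pd

# Toy input (made-up values): one row per project and funding program.
rows = pd.DataFrame(
    {
        "project_number": ["P1", "P1", "P2"],
        "program_code": ["Y001", "Y110", "Y001"],
        "obligations_amount": [100.0, 50.0, 25.0],
    }
)


def join_unique(values: pd.Series) -> str:
    """Join distinct values with '|' while keeping first-seen order (hypothetical helper)."""
    return "|".join(dict.fromkeys(values.astype(str)))


# Collapse to one row per project: codes are pipe-joined, amounts are summed (assumption).
one_row_per_project = rows.groupby("project_number", as_index=False).agg(
    program_code=("program_code", join_unique),
    obligations_amount=("obligations_amount", "sum"),
)

print(one_row_per_project)
```

After this step the frame has as many rows as there are distinct `project_number` values, which is the property the `agg` level is expected to guarantee.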

In [3]:
GCS_FILE_PATH = 'gs://calitp-analytics-data/data-analyses/dla/dla-iija'

In [4]:
my_file = "FMIS_Projects_Universe_IIJA_Reporting_03012024_ToDLA.xlsx"

### Check data

In [5]:
check_data = to_snakecase(pd.read_excel(f"{GCS_FILE_PATH}/{my_file}"))

In [6]:
check_data.head(1)

Unnamed: 0,fmis_transaction_date,program_code,program_code_description,project_number,recipient_project_number,project_title,county_code,congressional_district,project_status_description,project_description,improvement_type,improvement_type_description,total_cost_amount,obligations_amount,summary_recipient_defined_text_field_1_value,proj_id
0,44581,ER01,EMERGENCY REL 2022 SUPPLEMENT,31RA002,0518000118S,MONTEREY COUNTY NEAR BIG SUR 2.3 MILES NORTH OF CASTRO CANYON BRIDGE TO 0.8 MILE SOUTH OF BIG SUR RIVER BRIDGE. EMERGENCY PROJECT - PERMANENT RESTORA,53,Cong Dist 20,Active,MONTEREY COUNTY NEAR BIG SUR 2.3 MILES NORTH OF CASTRO CANYON BRIDGE TO 0.8 MILE SOUTH OF BIG SUR RIVER BRIDGE. EMERGENCY PROJECT - PERMANENT RESTORATION. COMPLETE COASTAL DEVELOPMENT PERMIT REQUIREMENTS AT PFEIFFER CANYON BRIDGE.,16,Right of Way,600000.0,531100.0,S AMBAG,518000118


In [7]:
check_data.project_number.nunique()

1968

### Run Script

In [8]:
df = _script_utils.run_script(my_file, 'summary_recipient_defined_text_field_1_value', 'agg')

  df['implementing_agency_locode'] = df['implementing_agency_locode'].str.replace('.0', '')
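One thing to watch in the `.str.replace('.0', '')` call echoed above: whether `'.0'` is read as a regular expression depends on the installed pandas version (the default for `regex` changed to `False` in pandas 2.0). As a regex, `.` matches any character, so digits in front of a `0` can be removed as well. The sketch below uses made-up locode strings to compare the behaviours; the anchored pattern is a suggestion, not what `_script_utils` currently does.

```python
import pandas as pd

# Made-up locode strings as they might look after a float-to-string conversion.
locodes = pd.Series(["5004.0", "5925.0", None])

# '.0' as a regex: '.' matches any character, so "50" is also removed from "5004.0".
as_regex = locodes.str.replace(".0", "", regex=True)

# '.0' as a literal substring: only the text ".0" is removed.
as_literal = locodes.str.replace(".0", "", regex=False)

# Anchored regex: strips only a trailing ".0".
anchored = locodes.str.replace(r"\.0$", "", regex=True)

print(pd.DataFrame({"regex": as_regex, "literal": as_literal, "anchored": anchored}))
```

Passing `regex` explicitly keeps the result the same across pandas versions.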


### Testing the data

In [9]:
df.sample(3)

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
981,45098,5925051,El Dorado County,L5925SACOG,Y001,National Highway Performance Program (NHPP),03928745L,15,Preliminary Engineering,BUCKS BAR ROAD BR. @ N. FORK COSUMNES RIVER. BR. # 25C0003 BRIDGE REPLACEMENT,385105,|04|,|03|,17,El Dorado County,|ED|,5925.0,El Dorado County Transportation Commission,Sacramento Area Council Of Governments,Replace Bridge in El Dorado County,"Replace Bridge in El Dorado County, part of the National Highway Performance Program (NHPP). (Federal Project ID: 5925051)."
725,45047,5481013,Goleta,L5481SBCAG,Y001,National Highway Performance Program (NHPP),0512000237L,10|17,Bridge Replacement - Added Capacity|Construction Engineering,HOLLISTER AVENUE BRIDGE #51C0027 BRIDGE REPLACEMENT,13948388,|24|,|05|,83,Santa Barbara County,|SB|,5481.0,Santa Barbara County Association of Governments,Santa Barbara County Association Of Governments,Replace Bridge in Goleta,"Replace Bridge in Goleta, part of the National Highway Performance Program (NHPP). (Federal Project ID: 5481013)."
952,45096,5030067,Vallejo,L5030MTC,Y230,Surface Transportation Block Grant Program,0420000343L,6,4R - Restoration & Rehabilitation,"SACRAMENTO ST FROM TENNESSEE ST TO CAPITAL ST IMPLEMENT ROAD DIET PROJECT . INSTALL NEW DESIGNATED CLASS 2 BIKE LANE OR BIKE SHARROWS, WITH CORRESP",681000,|05|,|04|,95,Solano County,|SOL|,,NON-RTPA,Metropolitan Transportation Commission,Install Bike Lanes in Vallejo,"Install Bike Lanes in Vallejo, part of the Surface Transportation Block Grant Program. (Federal Project ID: 5030067)."


In [10]:
## when grouping by funding program (one project can have multiple rows), len is 1612 for the 2023 version of the data
## assert that the length of the df equals the number of unique projects
assert len(df) == check_data.project_number.nunique()
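If the assertion fails, it helps to see which project numbers appear more than once in the aggregated frame. The helper below is a hypothetical sketch, not part of `_script_utils`, and the toy frame uses made-up values.

```python
import pandas as pd


def duplicated_projects(frame: pd.DataFrame, key: str = "project_number") -> pd.Series:
    """Return the row count for every key value that appears more than once."""
    counts = frame[key].value_counts()
    return counts[counts > 1]


# Toy frame (made-up values) where project "P1" was not fully aggregated.
toy = pd.DataFrame({"project_number": ["P1", "P1", "P2"]})
print(duplicated_projects(toy))
```

In the notebook, calling `duplicated_projects(df)` would list the projects to inspect; an empty result means every project has exactly one row.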

In [11]:
## check one project with multiple funding codes
df >> filter(_.project_number == '5004049')

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
1129,45133,5004049,San Diego,L5004SANDAG,Y001|Y110|Y908|Y909,National Highway Performance Program (NHPP)|Bridge Formula Program|Bridge Replacement and Rehabilitation Program,11955780L,10|17,Bridge Replacement - Added Capacity|Construction Engineering,"WEST MISSION BAY DRIVE OVER THE SAN DIEGO RIVER BRIDGE REPLACEMENT, BR. NO. 57C-0023",69715548,|52|,|11|,73,San Diego County,|SD|,4,San Diego Association of Governments,San Diego Association Of Governments,Replace Bridge in San Diego,"Replace Bridge in San Diego, part of the National Highway Performance Program (NHPP), and the Bridge Formula Program, and the Bridge Replacement and Rehabilitation Program. (Federal Project ID: 5004049)."


## Export Data

In [14]:
### name the export file for GCS
### use the current date in the name

In [16]:
# _script_utils.export_to_gcs(df, "04162024_agg")
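The export label above looks like a date in `MMDDYYYY` form followed by the aggregation level. Assuming that pattern, a small helper can build the label instead of typing it by hand. `export_label` is a hypothetical name, and the export call stays commented out here because it writes to GCS.

```python
from datetime import date


def export_label(run_date: date, level: str = "agg") -> str:
    """Build a label such as '04162024_agg' (MMDDYYYY pattern assumed from the example above)."""
    return f"{run_date.strftime('%m%d%Y')}_{level}"


label = export_label(date(2024, 4, 16))
print(label)

# For a real run, use today's date and pass the label to the export function:
# _script_utils.export_to_gcs(df, export_label(date.today()))
```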