# Run Functions to Add Information to Projects

To run the data through the script, update `my_file` to the file name of the most recent FMIS and QMRS export uploaded to GCS, then run the function in the `Export Data` section with your dataframe and the current date. The aggregated data will then be available in GCS.

In [1]:
import pandas as pd
from siuba import *

import _script_utils

from calitp_data_analysis.sql import to_snakecase


In [2]:
pd.set_option("display.max_columns", 100)
pd.set_option('display.max_colwidth', None)

## Read in Data and Test the Function

For the following function:
* the first argument is `my_file`: update it to the file name of the most recent FMIS & QMRS export
* the second argument is the unique recipient identifier column; it should stay the same for subsequent exports
* the third argument is the aggregation level you want for the data. Unless otherwise specified, it should be `agg`, which gives one row per project

In [3]:
GCS_FILE_PATH = 'gs://calitp-analytics-data/data-analyses/dla/dla-iija'

In [4]:
my_file = "FMIS_Projects_Universe_IIJA_Reporting_03012024_ToDLA.xlsx"

### Check data

In [5]:
check_data = to_snakecase(pd.read_excel(f"{GCS_FILE_PATH}/{my_file}"))

In [6]:
check_data.sample()

Unnamed: 0,fmis_transaction_date,program_code,program_code_description,project_number,recipient_project_number,project_title,county_code,congressional_district,project_status_description,project_description,improvement_type,improvement_type_description,total_cost_amount,obligations_amount,summary_recipient_defined_text_field_1_value,proj_id
3912,45324,YS32,SEC 164 PENALTIES HSIP IIJA,P084056,0416000005S,"ALAMEDA COUNTY IN FREMONT FROM THE NORTH END OF DUMBARTON BRIDGE TO 0.1 MILE SOUTH OF TOLL PLAZA CONCRETE BARRIER, SHOULDER WIDENING, STRIPING, COLD",1,Cong Dist 15,Active,"ON STATE ROUTE: 84. ALAMEDA COUNTY IN FREMONT FROM THE NORTH END OF DUMBARTON BRIDGE TO 0.1 MILE SOUTH OF TOLL PLAZA CONCRETE BARRIER, SHOULDER WIDENING, STRIPING, COLD PLAN, AND OVERLAY.- (TC)",42,Training,5966.96,5966.96,S MTC,416000005
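
The `fmis_transaction_date` values in the sample (for example `45324`) look like Excel serial day numbers rather than parsed dates. That is an assumption; the export itself does not confirm it. A minimal sketch of converting such a number to a calendar date, using the conventional 1899-12-30 origin. The helper name `excel_serial_to_date` is hypothetical and not part of `_script_utils`:

```python
from datetime import date, timedelta

# Conventional origin for Excel's 1900 date system (valid for modern dates).
EXCEL_ORIGIN = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel serial day number to a date (hypothetical helper)."""
    return EXCEL_ORIGIN + timedelta(days=int(serial))


# Value taken from the sampled row above.
converted = excel_serial_to_date(45324)
print(converted.isoformat())
```

With pandas, the same conversion can be applied to a whole column with `pd.to_datetime(col, unit="D", origin="1899-12-30")`.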


In [7]:
check_data.project_number.nunique()

1968

### Run Script

In [8]:
df = _script_utils.run_script(my_file, 'summary_recipient_defined_text_field_1_value', 'agg')

  df['implementing_agency_locode'] = df['implementing_agency_locode'].str.replace('.0', '')
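
The line echoed after this cell appears to come from a pandas warning raised inside `_script_utils`; the warning text is not shown, so that is an assumption. If `.str.replace('.0', '')` is evaluated as a regular expression, as older pandas versions did by default, `.` matches any character, so digits inside a code can be removed along with the trailing `.0`. A minimal sketch of the difference, using illustrative locode strings rather than real data:

```python
import pandas as pd

# Illustrative values only: locodes that were read as floats and cast to str.
locodes = pd.Series(["5208.0", "5954.0"])

# Regex mode: "." is a wildcard, so the "20" inside "5208.0" is removed too.
as_regex = locodes.str.replace(".0", "", regex=True)

# Escaped and anchored pattern: strips only a literal trailing ".0".
anchored = locodes.str.replace(r"\.0$", "", regex=True)

print(as_regex.tolist())
print(anchored.tolist())
```

Short values in the `implementing_agency_locode` column would be consistent with the wildcard behavior, though confirming that requires checking `_script_utils` itself.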


### Testing the data

In [9]:
df.sample(3)

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
683,45021,5208171,Clovis,L5208FCOG,Y230,Surface Transportation Block Grant,0620000056L,15,Preliminary Engineering,"BARSTOW AVENUE BETWEEN MINNEWAWA AND CLOVIS AVENUES ROAD REHABILITATION INCLUDING CURB, SIGNAL, SIGNAGE, DETECTOR LOOPS AND STRIPING (TC)",8444,|22|,|06|,19,Fresno County,|FRE|,58,Council of Fresno County Governments,Council Of Fresno County Goverments,Signals in Clovis,"Signals in Clovis, part of the program(s) Surface Transportation Block Grant. (Federal Project ID: 5208171)."
1549,45258,5954183,San Bernardino County,L5954SCAG,Y001,National Highway Performance Program (NHPP),0821000031L,15,Preliminary Engineering,"NATIONAL TRAILS HIGHWAY (ROUTE 66) AT IZZY DITCH 24.2 MILES EAST OF KELBAKER ROAD, BR. NO. 54C-0319 REPLACE TWO LANE TIMBER BRIDGE WITH TWO LANE BRID",328000,|08|,|08|,71,San Bernardino County,|SBD|,5954,San Bernardino Associated Governments,Southern California Association Of Governments,Preliminary Engineering Projects in San Bernardino County,"Preliminary Engineering Projects in San Bernardino County, part of the program(s) National Highway Performance Program (NHPP). (Federal Project ID: 5954183)."
1421,45223,5904127,Humboldt County,L5904NON-MPO,Y001,National Highway Performance Program (NHPP),0112000291L,17,Construction Engineering,"WILLIAMS CREEK BRIDGE (04C0209) ON GRIZZLY BLUFF ROAD IN HUMBOLDT COUNTY, CA BRIDGE REPLACEMENT",45602,|02|,|01|,23,Humboldt County,|HUM|,54,Humboldt County Association of Governments,NON-MPO,Replace Bridge in Humboldt County,"Replace Bridge in Humboldt County, part of the program(s) National Highway Performance Program (NHPP). (Federal Project ID: 5904127)."


In [10]:
## when grouping by funding program (one project can have multiple rows), len is 1612 for the 2023 version of the data
## asserting the length of the df is the same as number of projects
assert len(df) == check_data.project_number.nunique()
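
If the assertion fails, it helps to see which project numbers differ between the raw export and the aggregated output. A minimal sketch with small stand-in frames (`raw` and `aggregated` are hypothetical substitutes for `check_data` and `df`), assuming `project_number` is comparable across both frames once cast to string:

```python
import pandas as pd

# Stand-in frames; in the notebook these would be check_data and df.
raw = pd.DataFrame({"project_number": ["A1", "A1", "B2", "C3"]})
aggregated = pd.DataFrame({"project_number": ["A1", "B2"]})

raw_ids = set(raw["project_number"].astype(str))
agg_ids = set(aggregated["project_number"].astype(str))

# Projects present in the export but dropped by the aggregation.
missing_from_agg = sorted(raw_ids - agg_ids)

# Projects that still occupy more than one row after aggregation.
dup_mask = aggregated["project_number"].duplicated(keep=False)
duplicated_in_agg = aggregated.loc[dup_mask, "project_number"].tolist()

print(missing_from_agg, duplicated_in_agg)
```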

In [11]:
## check one project with multiple funding codes
df >> filter(_.project_number == '5004049')

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
1129,45133,5004049,San Diego,L5004SANDAG,Y001|Y110|Y908|Y909,National Highway Performance Program (NHPP)|Bridge Formula Program|Bridge Replacement and Rehabilitation Program,11955780L,10|17,Bridge Replacement - Added Capacity|Construction Engineering,"WEST MISSION BAY DRIVE OVER THE SAN DIEGO RIVER BRIDGE REPLACEMENT, BR. NO. 57C-0023",69715548,|52|,|11|,73,San Diego County,|SD|,4,San Diego Association of Governments,San Diego Association Of Governments,Replace Bridge in San Diego,"Replace Bridge in San Diego, part of the program(s) National Highway Performance Program (NHPP), and the Bridge Formula Program, and the Bridge Replacement and Rehabilitation Program. (Federal Project ID: 5004049)."
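
The row above shows a project with several funding codes collapsed into one row, with values joined by `|`. The actual logic lives in `_script_utils` and is not shown here. The following is only a sketch of one way to produce that shape with pandas, using made-up rows, a hypothetical helper `join_unique`, and the assumption that obligations are summed per project:

```python
import pandas as pd

# Made-up rows: one project appears under several program codes.
rows = pd.DataFrame({
    "project_number": ["5004049", "5004049", "5004049", "5208171"],
    "program_code": ["Y001", "Y110", "Y001", "Y230"],
    "obligations_amount": [100.0, 50.0, 25.0, 10.0],
})


def join_unique(values: pd.Series) -> str:
    """Join distinct values with '|' in first-seen order (hypothetical helper)."""
    return "|".join(dict.fromkeys(values.astype(str)))


# One row per project: codes are joined, amounts are summed (an assumption).
agg = rows.groupby("project_number", as_index=False).agg(
    program_code=("program_code", join_unique),
    obligations_amount=("obligations_amount", "sum"),
)

print(agg)
```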


## Export Data

In [12]:
### rename the file for export to GCS
### use date to rename

In [13]:
_script_utils.export_to_gcs(df, "03012024_agg")
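
The export name in this run matches the date embedded in the input file name. A minimal sketch of deriving it automatically, assuming file names keep the `..._MMDDYYYY_ToDLA.xlsx` pattern seen in `my_file`. The helper `export_name_from_file` is hypothetical and not part of `_script_utils`:

```python
import re
from datetime import datetime


def export_name_from_file(file_name: str, level: str = "agg") -> str:
    """Build an export name like '03012024_agg' from the file name (hypothetical helper)."""
    match = re.search(r"_(\d{8})_", file_name)
    if match is None:
        raise ValueError(f"No 8-digit date found in {file_name!r}")
    stamp = match.group(1)
    # Raises ValueError if the digits are not a valid MMDDYYYY date.
    datetime.strptime(stamp, "%m%d%Y")
    return f"{stamp}_{level}"


name = export_name_from_file(
    "FMIS_Projects_Universe_IIJA_Reporting_03012024_ToDLA.xlsx"
)
print(name)
```

The result could then be passed as the second argument to `_script_utils.export_to_gcs` in place of the hand-typed string.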