# Run Functions to Add Information to Projects

To run the data through the script, update the `my_file` path to the most recent FMIS and QMRS export uploaded to GCS, then run the function in the `Export Data` section with your dataframe and the current date. The aggregated data will then be available in GCS.

In [1]:
import pandas as pd
from siuba import *

import _script_utils

In [2]:
pd.set_option("display.max_columns", 100)
pd.set_option('display.max_colwidth', None)

## Read in Data and Test the Function

For the following function:
* update the file path for `my_file` to the file name of the most recent FMIS & QMRS export
* the second argument is the unique recipient identifier; it should stay the same for subsequent exports
* the third argument is the aggregation level for the data. Unless otherwise specified, it should be `agg`, which is one row per project

In [5]:
GCS_FILE_PATH = 'gs://calitp-analytics-data/data-analyses/dla/dla-iija'

In [6]:
my_file = "FMIS_Projects_Universe_IIJA_Reporting_062923_ToDLA.xlsx"

In [7]:
df = _script_utils.run_script(my_file, 'summary_recipient_defined_text_field_1_value', 'agg')

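# strip only a trailing ".0"; an unescaped '.0' pattern can be read as a regex and also remove digits
df['implementing_agency_locode'] = df['implementing_agency_locode'].str.replace(r'\.0$', '', regex=True)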
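
One detail worth checking when stripping a trailing `.0` from locode strings: in pandas versions where `Series.str.replace` treats the pattern as a regular expression by default (before pandas 2.0), the pattern `'.0'` matches any character followed by `0`, so real digits can be lost. That may explain the locode `59` shown for project `5059242` in the sample output below. The sketch compares the loose pattern with an anchored one; the `locodes` Series is hypothetical sample data, not the script's actual input.

```python
import pandas as pd

# Hypothetical locode strings as they might look after a float-to-string cast
locodes = pd.Series(["5059.0", "5004.0", "5931.0"])

# regex=True reproduces the older default: '.' matches ANY character,
# so "50" inside "5059.0" is removed along with the trailing ".0"
loose = locodes.str.replace(".0", "", regex=True)

# An escaped, end-anchored pattern removes only a literal trailing ".0"
strict = locodes.str.replace(r"\.0$", "", regex=True)

print(loose.tolist())
print(strict.tolist())
```

If the locode column arrives as floats, casting through a nullable integer type before converting to string is another option, though whether that suits this data is an assumption to verify.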


### Testing the data

In [10]:
df.sample(3)

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
913,2023-03-28,5931030,Alpine County,L5931NON-MPO,Y001|Y110,National Highway Performance Program (NHPP)|Bridge Formula Program,1000020606L,15|10|17,Preliminary Engineering|Bridge Replacement - Added Capacity|Construction Engineering,HOT SPRINGS ROAD (BRIDGE 31C0005) OVER HOT SPRINGS CREEK BRIDGE REPLACEMENT,2923277,|04|,|10|,3,Alpine County,|ALP|,5931,Alpine County Transportation Commission,NON-MPO,Replace Bridge in Alpine County,"Replace Bridge in Alpine County, part of the program(s) National Highway Performance Program (NHPP), and the Bridge Formula Program. (Federal Project ID: 5931030)."
879,2023-03-23,5059242,Modesto,L5059STANCOG,Y230,Surface Transportation Block Grant,1020000126L,6|17,4R - Restoration & Rehabilitation|Construction Engineering,PELANDALE AVENUE FROM DALE ROAD TO DETROIT LANE PAVEMENT REHABILITATION (TC),2000000,|10|,|10|,99,Stanislaus County,|STA|,59,Stanislaus Council of Governments,Stanislaus Council Of Governments,Pavement Rehabilitation in Modesto,"Pavement Rehabilitation in Modesto, part of the program(s) Surface Transportation Block Grant. (Federal Project ID: 5059242)."
807,2023-02-27,6212022,Caltrans,S6212SCAG,Y230,Surface Transportation Block Grant,1214000097L,17,Construction Engineering,"INTERSTATE 5 FROM OSO CREEK TO ALICIA PARKWAY CONSTRUCT ONE GENERAL PURPOSE LANE ON EACH DIRECTION, RECONSTRUCT LA PAZ ROAD INTERCHANGE AND ADD AUXIL",1000000,|45|,|12|,59,Multi-County,|NA|,6212,CT-ADMIN,CT-ADMIN,Caltrans Construction Engineering Projects,"Caltrans Construction Engineering Projects, part of the program(s) Surface Transportation Block Grant. (Federal Project ID: 6212022)."


In [11]:
## when grouping by funding program (one project can have multiple rows), len is 1612
len(df)

1465
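
Since the `agg` level is described as one row per project, a quick uniqueness check can complement the row count. The following is a minimal sketch, assuming `project_number` is the key that should be unique; the helper `rows_with_duplicate_key` is hypothetical and not part of `_script_utils`, and the small `sample` frame reuses values from the output above.

```python
import pandas as pd


def rows_with_duplicate_key(frame: pd.DataFrame, key: str = "project_number") -> pd.DataFrame:
    """Return every row whose key value appears more than once (empty if unique)."""
    return frame[frame.duplicated(subset=key, keep=False)]


# Small stand-in for the aggregated dataframe
sample = pd.DataFrame(
    {
        "project_number": ["5931030", "5059242", "6212022"],
        "obligations_amount": [2923277, 2000000, 1000000],
    }
)

dupes = rows_with_duplicate_key(sample)
print(len(dupes))  # 0 rows means one row per project in this sample
```

On the real data, the same call would be `rows_with_duplicate_key(df)`; any rows returned would point to projects that were not fully aggregated.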

In [13]:
## check one project with multiple funding codes
df >> filter(_.project_number == '5004049')

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
1372,2023-06-26,5004049,San Diego,L5004SANDAG,Y001|Y110|Y908|Y909,National Highway Performance Program (NHPP)|Bridge Formula Program|Bridge Replacement and Rehabilitation Program,11955780L,10|17,Bridge Replacement - Added Capacity|Construction Engineering,"WEST MISSION BAY DRIVE OVER THE SAN DIEGO RIVER BRIDGE REPLACEMENT, BR. NO. 57C-0023",69715548,|52|,|11|,73,San Diego County,|SD|,4,San Diego Association of Governments,San Diego Association Of Governments,Replace Bridge in San Diego,"Replace Bridge in San Diego, part of the program(s) National Highway Performance Program (NHPP), and the Bridge Formula Program, and the Bridge Replacement and Rehabilitation Program. (Federal Project ID: 5004049)."
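
To find every project with multiple funding codes rather than checking one at a time, the pipe-delimited `program_code` column can be split and counted. This is a rough sketch in plain pandas, assuming codes are always separated by `|`; the column `n_program_codes` is a hypothetical name introduced here, and the `sample` frame reuses two rows' values from the outputs above.

```python
import pandas as pd

# Stand-in rows taken from the outputs shown in this notebook
sample = pd.DataFrame(
    {
        "project_number": ["5004049", "5059242"],
        "program_code": ["Y001|Y110|Y908|Y909", "Y230"],
    }
)

# A single-character separator is treated as a literal, so "|" needs no escaping
sample["n_program_codes"] = sample["program_code"].str.split("|").str.len()

# Keep only projects funded under more than one program code
multi = sample[sample["n_program_codes"] > 1]
print(multi[["project_number", "n_program_codes"]])
```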


## Export Data

In [14]:
# _script_utils.export_to_gcs(df, "export_name")
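
Because the export is meant to carry the current date, one way to build the name is sketched below. The helper `dated_export_name`, the prefix `FMIS_projects_agg`, and the ISO date format are all assumptions for illustration; the only thing taken from this notebook is that `export_to_gcs` is called with a dataframe and a name string.

```python
from datetime import date
from typing import Optional


def dated_export_name(prefix: str, on: Optional[date] = None) -> str:
    """Build an export name such as '<prefix>_2023-06-29' (hypothetical naming scheme)."""
    on = on or date.today()
    return f"{prefix}_{on.isoformat()}"


export_name = dated_export_name("FMIS_projects_agg")
print(export_name)

# Then, in the notebook (left commented out, as in the cell above):
# _script_utils.export_to_gcs(df, export_name)
```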