# Run Functions to Add Information to Projects

To run the data through the script, update `my_file` to the file name of the most recent FMIS and QMRS export uploaded to GCS, then run the function in the `Export Data` section with your dataframe and the current date. The aggregated data will then be available in GCS.

In [1]:
import pandas as pd
from siuba import *

import _script_utils

from calitp_data_analysis.sql import to_snakecase


In [2]:
pd.set_option("display.max_columns", 100)
pd.set_option('display.max_colwidth', None)

## Read in Data and function development / Test Function

For the following function:
* update `my_file` to the file name of the most recent FMIS & QMRS export
* the second argument is the column holding the unique recipient identifier; it should stay the same for subsequent exports
* the third argument is the aggregation level for the data. Unless otherwise specified, it should be `agg`, which returns one row per project
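
To illustrate what the `agg` level means, here is a minimal sketch of collapsing several funding rows into one row per project. It is not the code in `_script_utils`: the sample rows, the `aggregate_projects` helper, and the choice to join codes with `|` and sum obligations are assumptions based on the shape of the output shown in this notebook.

```python
import pandas as pd

# Made-up rows for illustration: project "A100" has two funding codes.
raw = pd.DataFrame(
    {
        "project_number": ["A100", "A100", "B200"],
        "program_code": ["YS30", "YS32", "Y301"],
        "obligations_amount": [100.0, 50.0, 75.0],
    }
)


def aggregate_projects(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse funding rows to one row per project (hypothetical helper)."""
    return df.groupby("project_number", as_index=False).agg(
        # Join the distinct program codes into a single "|"-separated string.
        program_code=("program_code", lambda s: "|".join(sorted(s.unique()))),
        # Assumption: obligations are summed across funding rows.
        obligations_amount=("obligations_amount", "sum"),
    )


agg = aggregate_projects(raw)
print(agg)
```

After aggregation, the number of rows should equal the number of unique project numbers, which is the property asserted later in this notebook.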

In [3]:
GCS_FILE_PATH = 'gs://calitp-analytics-data/data-analyses/dla/dla-iija'

In [4]:
my_file = "FMIS_Projects_Universe_IIJA_Reporting_03012024_ToDLA.xlsx"

### Check data

In [5]:
check_data = to_snakecase(pd.read_excel(f"{GCS_FILE_PATH}/{my_file}"))

In [6]:
check_data.head(1)

Unnamed: 0,fmis_transaction_date,program_code,program_code_description,project_number,recipient_project_number,project_title,county_code,congressional_district,project_status_description,project_description,improvement_type,improvement_type_description,total_cost_amount,obligations_amount,summary_recipient_defined_text_field_1_value,proj_id
0,44581,ER01,EMERGENCY REL 2022 SUPPLEMENT,31RA002,0518000118S,MONTEREY COUNTY NEAR BIG SUR 2.3 MILES NORTH OF CASTRO CANYON BRIDGE TO 0.8 MILE SOUTH OF BIG SUR RIVER BRIDGE. EMERGENCY PROJECT - PERMANENT RESTORA,53,Cong Dist 20,Active,MONTEREY COUNTY NEAR BIG SUR 2.3 MILES NORTH OF CASTRO CANYON BRIDGE TO 0.8 MILE SOUTH OF BIG SUR RIVER BRIDGE. EMERGENCY PROJECT - PERMANENT RESTORATION. COMPLETE COASTAL DEVELOPMENT PERMIT REQUIREMENTS AT PFEIFFER CANYON BRIDGE.,16,Right of Way,600000.0,531100.0,S AMBAG,518000118


In [7]:
check_data.project_number.nunique()

1968

### Run Script

In [8]:
df = _script_utils.run_script(my_file, 'summary_recipient_defined_text_field_1_value', 'agg')

  df['implementing_agency_locode'] = df['implementing_agency_locode'].str.replace('.0', '')
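
The line echoed above appears to come from `_script_utils` and strips `.0` from the locode values. In pandas, `str.replace` treats the pattern as a regular expression when `regex=True` (the default before pandas 2.0), in which case `.0` matches any character followed by `0`. A minimal sketch of a stricter variant, assuming the values are strings such as `26.0` (the sample values are made up):

```python
import pandas as pd

# Made-up locode values; "100" shows why an unanchored pattern is risky.
locodes = pd.Series(["26.0", "57.0", "4", "100"])

# Escape the dot and anchor at the end so only a trailing ".0" is removed.
cleaned = locodes.str.replace(r"\.0$", "", regex=True)
print(cleaned.tolist())
```

Passing `regex` explicitly also keeps the behavior the same across pandas versions.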


### Testing the data

In [9]:
df.sample(3)

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
267,44803,X023043,California,S NON-MPO,YS30|YS32,Highway Safety Improvement Program (HSIP)|Section 164 Penalties - Use for HSIP Activities Program,0117000023S,17|21,Construction Engineering|Safety,"IN AND NEAR ARCATA, FROM SAINT LOUIS ROAD OVERCROSSING TO 0.7 MILE NORTH OF GIUNTOLI LANE OVERCROSSING. INSTALL GUARDRAIL AND UPGRADE END TREATMENTS,",6465800,|02|,|01|,23,Humboldt County,|HUM|,,,,Install Guardrails in Humboldt County,"Install Guardrails in Humboldt County, part of the Highway Safety Improvement Program (HSIP), and the Section 164 Penalties - Use for HSIP Activities Program. (Federal Project ID: X023043)."
1136,45134,5026063,San Buenaventura,L5026SCAG,Y301,Transportation Alternatives Program,0719000155L,17|28,Construction Engineering|Facilities for Pedestrians and Bicycles,"HARMON BARRANCA PATH & TELEPHONE ROAD, HARMON BARRANCA PATH & RALSTON STREET, HARMON BARRANCA PATH NORTH LINK & ANTELOPE AVE. CONSTRUCT CLASS III BIK",432000,|26|,|07|,111,Ventura County,|VEN|,26.0,Ventura County Transportation Commission,Southern California Association Of Governments,Facilities for Pedestrians and Bicycles in San Buenaventura,"Facilities for Pedestrians and Bicycles in San Buenaventura, part of the Transportation Alternatives Program. (Federal Project ID: 5026063)."
1260,45156,5307036,Fontana,L5307SCAG,Y601,Carbon Reduction Program,0823000118L,17|28,Construction Engineering|Facilities for Pedestrians and Bicycles,"FROM THE PACIFIC ELECTRIC TRAIL TO THE SOUTH, ALONG THE ETIWANDA CREEK FLOOD CONTROL CHANNEL, TO BANYAN STREET TO THE NORTH IN FONTANA: SAN SEVAINE T",2721400,|35|,|08|,71,San Bernardino County,|SBD|,57.0,San Bernardino Associated Governments,Southern California Association Of Governments,Facilities for Pedestrians and Bicycles in Fontana,"Facilities for Pedestrians and Bicycles in Fontana, part of the Carbon Reduction Program. (Federal Project ID: 5307036)."


In [10]:
## when grouping by funding program (one project can have multiple rows), len is 1612 for the 2023 version of the data
## assert that the length of the df equals the number of unique projects
assert len(df) == check_data.project_number.nunique()
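
If the assertion above fails, it helps to see which project numbers appear more than once. A minimal sketch, using a made-up dataframe in place of `df` (the `duplicated_projects` helper is hypothetical):

```python
import pandas as pd

# Made-up stand-in for `df`: project "B200" appears twice.
sample = pd.DataFrame({"project_number": ["A100", "B200", "B200", "C300"]})


def duplicated_projects(df: pd.DataFrame, key: str = "project_number") -> list:
    """Return the sorted key values that occur on more than one row."""
    counts = df[key].value_counts()
    return sorted(counts[counts > 1].index)


dupes = duplicated_projects(sample)
print(dupes)
```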

In [11]:
## check one project with multiple funding codes
df >> filter(_.project_number == '5004049')

Unnamed: 0,fmis_transaction_date,project_number,implementing_agency,summary_recipient_defined_text_field_1_value,program_code,program_code_description,recipient_project_number,improvement_type,improvement_type_description,old_project_title_desc,obligations_amount,congressional_district,district,county_code,county_name,county_name_abbrev,implementing_agency_locode,rtpa_name,mpo_name,new_project_title,new_description_col
1129,45133,5004049,San Diego,L5004SANDAG,Y001|Y110|Y908|Y909,National Highway Performance Program (NHPP)|Bridge Formula Program|Bridge Replacement and Rehabilitation Program,11955780L,10|17,Bridge Replacement - Added Capacity|Construction Engineering,"WEST MISSION BAY DRIVE OVER THE SAN DIEGO RIVER BRIDGE REPLACEMENT, BR. NO. 57C-0023",69715548,|52|,|11|,73,San Diego County,|SD|,4,San Diego Association of Governments,San Diego Association Of Governments,Replace Bridge in San Diego,"Replace Bridge in San Diego, part of the National Highway Performance Program (NHPP), and the Bridge Formula Program, and the Bridge Replacement and Rehabilitation Program. (Federal Project ID: 5004049)."


In [22]:
# def update_program_code_list2():
#     updated_codes = to_snakecase(pd.read_excel(f"{GCS_FILE_PATH}/program_codes/FY21-22ProgramCodesAsOf5-25-2022.v2_expanded090823.xlsx"))
#     updated_codes = updated_codes>>select(_.iija_program_code, _.new_description)
#     original_codes = to_snakecase(pd.read_excel(f"{GCS_FILE_PATH}/program_codes/Copy of lst_IIJA_Code_20230908.xlsx"))
#     original_codes = original_codes>>select(_.iija_program_code, _.description, _.program_name)
    
#     program_codes = pd.merge(updated_codes, original_codes, on='iija_program_code', how = 'outer', indicator=True)
#     program_codes['new_description'] = program_codes['new_description'].str.strip()

#     program_codes.new_description.fillna(program_codes['description'], inplace=True)
    
#     program_codes = program_codes.drop(columns={'description' , '_merge'})
    
#     def add_program_to_row(row):
#         if 'Program' not in row['program_name']:
#             return row['program_name'] + ' Program'
#         else:
#             return row['program_name']
        
#     program_codes['program_name'] = program_codes.apply(add_program_to_row, axis=1)
    
#     return program_codes 

In [23]:
# def add_program_to_row(row):
#     if 'Program' not in row['program_name']:
#         return row['program_name'] + ' Program'
#     else:
#         return row['program_name']

In [24]:
# Apply the function to the column
# program_codes['program_name'] = program_codes.apply(add_program_to_row, axis=1)

In [25]:
# program_codes[~program_codes["program_name"].str.contains("Program")]
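
The commented-out cells above append ` Program` to any program name that lacks it, one row at a time with `apply`. A vectorized sketch of the same idea is below; the sample names are for illustration only, and this is an alternative approach, not the notebook's original code.

```python
import pandas as pd

# Made-up sample: the first name lacks the word "Program".
program_codes = pd.DataFrame(
    {"program_name": ["Carbon Reduction", "Bridge Formula Program"]}
)

names = program_codes["program_name"]
has_program = names.str.contains("Program", regex=False)

# Keep names that already contain "Program"; append the suffix to the rest.
program_codes["program_name"] = names.where(has_program, names + " Program")

# Same check as the last commented-out cell: no rows should remain.
missing = program_codes[~program_codes["program_name"].str.contains("Program", regex=False)]
print(program_codes["program_name"].tolist(), len(missing))
```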

## Export Data

In [12]:
### rename the file for export to GCS
### use date to rename

In [12]:
# _script_utils.export_to_gcs(df, "04092024_agg")
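
The export name above looks like the run date in `MMDDYYYY` form followed by the aggregation level. Assuming that convention, a small sketch for building the name from a date instead of typing it by hand (the `export_name` helper is hypothetical and not part of `_script_utils`):

```python
from datetime import date


def export_name(run_date: date, level: str = "agg") -> str:
    """Build an export name such as '04092024_agg' (assumed MMDDYYYY format)."""
    return f"{run_date.strftime('%m%d%Y')}_{level}"


print(export_name(date(2024, 4, 9)))
```

For a current export, the result of `export_name(date.today())` could be passed as the second argument to `_script_utils.export_to_gcs`.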