## Notes 
<b> To-Do </b>
* For PAP, is the problem that the prior-years column lumps together everything for projects before 2020?
    * The Highlands sheet does not differentiate between the years. The request is for each year to be separated out.
    * Highlands should address this.
* Make sure PPNOs match across the sheets.
* The previous SAR's columns need to be compared with the current SAR's, because any changes have to be bolded in black for CTC.
    
<b> Done </b>
* Make sure each grant recipient appears under only one spelling of its name.
    * Replaced duplicates with one version.
* SAR: the allocation amount column isn't taken from the Highlands sheet; it is the sum of GGRF Alloc + PTA-SB1.
    * Added these amounts up.
* SAR:
    * Projects with NO allocation amount won't appear in SAR.
    * Projects with 100% expenditures won't appear either.
    * Filtered them out. 
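
The SAR rules listed above can be sketched on a toy frame. This is only an illustration: the rows are invented, the un-prefixed column names are an assumption, and the `0.99` cutoff is borrowed from the filter used later in this notebook.

```python
import pandas as pd

# Toy rows (made up); column names loosely follow the cleaned allocation sheet.
toy = pd.DataFrame(
    {
        "PPNO": ["CP001", "CP002", "CP003"],
        "GGRF_Funding": [100.0, 0.0, 50.0],
        "SB1_Funding": [50.0, 0.0, 50.0],
        "Expended_Amount": [30.0, 0.0, 100.0],
    }
)

# The allocation amount is derived, not read from the sheet.
toy["Allocation_Amount"] = toy["GGRF_Funding"] + toy["SB1_Funding"]

# Drop projects with no allocation first, which also avoids dividing by zero.
sar_toy = toy[toy["Allocation_Amount"] > 0].copy()
sar_toy["Percent_Expended"] = sar_toy["Expended_Amount"] / sar_toy["Allocation_Amount"]

# Drop projects that have spent (essentially) all of their allocation.
sar_toy = sar_toy[sar_toy["Percent_Expended"] < 0.99]
```

In this toy data only `CP001` survives: `CP002` has no allocation and `CP003` is fully expended.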

In [1]:
import numpy as np
import pandas as pd
import data_prep

GCS_FILE_PATH = "gs://calitp-analytics-data/data-analyses/tircp/"
pd.options.display.max_columns = 50
pd.options.display.max_rows = 225
pd.options.display.float_format = "{:.2f}".format
from calitp import *
from siuba import *

# from styleframe import StyleFrame, Styler



In [2]:
# Load in crosswalks
# Allocation PPNO crosswalk
FILE_NAME3 = "Allocation_PPNO_Crosswalk.csv"
allocation_ppno_crosswalk = pd.read_csv(f"{GCS_FILE_PATH}{FILE_NAME3}")

# Project PPNO crosswalk
FILE_NAME4 = "Projects_PPNO.xlsx"
project_ppno_crosswalk = pd.read_excel(f"{GCS_FILE_PATH}{FILE_NAME4}")

## Clean Up Data

### Allocation 

In [3]:
FILE_NAME = "TIRCP_Allocation_March_14_2022.xlsx"
allocation = pd.read_excel(f"{GCS_FILE_PATH}{FILE_NAME}")

In [4]:
allocation.columns = allocation.columns.str.strip().str.replace(" ", "_")

In [5]:
### REMINDER: TAKE OUT DF WHEN EXPORTING TO SCRIPT ###
def allocation_function(df):
    # FILE_NAME2 = "TIRCP_Allocation_March_14_2022.xlsx"
    # df = pd.read_excel(f"{GCS_FILE_PATH}{FILE_NAME2}")

    ### GENERAL CLEAN UP ###
    # strip whitespace and replace spaces with underscores
    df.columns = df.columns.str.strip().str.replace(" ", "_")
    # strip any whitespace left in the column names
    df.columns = df.columns.map(lambda x: x.strip())
    # drop rows that are all NA
    df = df.dropna(how="all")
    # Change column name
    df = df.rename(columns={"3rd_Party_Award_Date": "Third_Party_Award_Date"})

    ### CORRECT DUPLICATES ###
    # Some grant recipients have multiple spellings of their name, e.g. BART versus Bay Area Rapid Transit
    df["Grant_Recipient"] = df["Grant_Recipient"].replace(
        {
            "Antelope Valley Transit Authority ": "Antelope Valley Transit Authority",
            "Capitol Corridor Joint Powers Authority ": "Capitol Corridor Joint Powers Authority",
            "Los Angeles County Metropolitan Transportation Authority ": "Los Angeles County Metropolitan Transportation Authority",
            "Sacramento Regional Transit District ": "Sacramento Regional Transit District",
            "Southern California Regional Rail Authority": "Southern California Regional Rail Authority (Metrolink)",
        }
    )
    ### PPNO CLEAN UP ###
    # truncate PPNO to its first 5 characters
    df = df.assign(PPNO_New=df["PPNO"].str.slice(start=0, stop=5))
    # Merge in Crosswalk
    df = pd.merge(
        df,
        allocation_ppno_crosswalk,
        left_on=["Award_Year", "Grant_Recipient"],
        right_on=["Award_Year", "Award_Recipient"],
        how="left",
    )
    # Map Crosswalk
    df.PPNO_New = df.apply(
        lambda x: x.PPNO_New if (str(x.PPNO_New2) == "nan") else x.PPNO_New2, axis=1
    )
    # Drop old PPNO
    df = df.drop(["PPNO", "PPNO_New2"], axis=1).rename(columns={"PPNO_New": "PPNO"})
    # Cast every PPNO to a string
    df.PPNO = df.PPNO.astype(str)

    ### DATES CLEAN UP ###
    # 3rd_Party_Award_Date was already renamed to Third_Party_Award_Date above
    # normalize separators and status text in all date columns at once
    alloc_dates = [
        "Allocation_Date",
        "Third_Party_Award_Date",
        "Completion_Date",
        "LED",
    ]
    df[alloc_dates] = (
        df[alloc_dates]
        .replace("/", "-", regex=True)
        .replace("Complete", "", regex=True)
        .replace("\n", "", regex=True)
        .replace("Pending", "TBD", regex=True)
        .fillna("TBD")
    )
    # replacing values for date columns to be coerced later
    df["Allocation_Date"] = df["Allocation_Date"].replace(
        {"08/12//20": "2020-08-12 00:00:00", "FY 20/21": "2020-12-31 00:00:00"}
    )

    df["Completion_Date"] = df["Completion_Date"].replace(
        {
            "Complete\n6/1/2019": "2019-06-01 00:00:00",
            "Complete\n2/11/2018": "2018-02-11 00:00:00",
            "Complete\n6/30/2020": "2020-06-30 00:00:00",
            "\n6/30/2018": "2018-06-30 00:00:00",
            "\n6/29/2020": "2020-06-29 00:00:00",
            "\n11/1/2019": "2019-11-01 00:00:00",
            "\nJun-29\n": "2029-06-01 00:00:00",
            "6/30/2021\n12/31/2021\n10/20/2022": "2022-10-20 00:00:00",
            "Complete\n1/31/2020": "2020-01-31 00:00:00",
            "Complete\n8/30/2020": "2020-08-30 00:00:00",
            "June 24. 2024": "2024-06-01 00:00:00",
            "11/21/2024\n7/30/2025 (Q4)": "2024-11-21 00:00:00",
            "Jun-26": "2026-06-01 00:00:00",
            "Jun-29": "2029-06-01 00:00:00",
            "Complete\n11/12/2019": "2019-11-12 00:00:00",
            "Deallocated": "",
            "Jun-28": "2028-06-01 00:00:00",
            "Jun-25": "2025-06-01 00:00:00",
            "Jun-23": "2023-06-01 00:00:00",
            "Jun-27": "2027-06-01 00:00:00",
            "Jan-25": "2025-01-01 00:00:00",
            "11-21-20247-30-2025 (Q4)": "2025-07-30 00:00:00",
            "6-30-202112-31-2021": "2021-12-31 00:00:00",
            "6-1-2019": "2019-06-01 00:00:00",
            "2-11-2018": "2018-02-11 00:00:00",
            "6-30-2020": "2020-06-30 00:00:00",
            " 6-30-2018": "2018-06-30 00:00:00",
            "6-29-2020": "2020-06-29 00:00:00",
            "11-1-2019": "2019-11-01 00:00:00",
            " 12-10-2018": "2018-12-10 00:00:00",
            " 11-13-2019": "2019-11-13 00:00:00",
            "3-30-2020": "2020-03-30 00:00:00",
            " 6-30-2020": "2020-06-30 00:00:00",
            "11-12-2019": "2019-11-12 00:00:00",
            "1-31-2020": "2020-01-31 00:00:00",
            "8-30-2020": "2020-08-30 00:00:00",
            "5-16-2020": "2020-05-16 00:00:00",
            "5-7-2020": "2020-05-07 00:00:00",
        }
    )

    df["Third_Party_Award_Date"] = df["Third_Party_Award_Date"].replace(
        {
            "-": "TBD",
            "Pending 6/30/2022": "2022-06-30 00:00:00",
            "Augsut 12, 2021": "2021-08-12 00:00:00",
        }
    )

    # coerce to dates
    df = df.assign(
        Allocation_Date_New=pd.to_datetime(df.Allocation_Date, errors="coerce").dt.date,
        Third_Party_Award_Date_New=pd.to_datetime(
            df.Third_Party_Award_Date, errors="coerce"
        ).dt.date,
        Completion_Date_New=pd.to_datetime(df.Completion_Date, errors="coerce").dt.date,
        LED_New=pd.to_datetime(df.LED, errors="coerce").dt.date,
    )

    # dropping old date columns
    df = df.drop(alloc_dates, axis=1)
    # rename coerced columns
    df = df.rename(
        columns={
            "Allocation_Date_New": "Allocation_Date",
            "Third_Party_Award_Date_New": "Third_Party_Award_Date",
            "Completion_Date_New": "Completion_Date",
            "LED_New": "LED",
        }
    )

    # Fill in missing dates
    missing_date = pd.to_datetime("2100-01-01")
    dates = ["Allocation_Date", "LED", "Completion_Date", "Third_Party_Award_Date"]
    for i in dates:
        df[i] = df[i].fillna(missing_date)

    ### CLEAN UP MONETARY COLS ###
    # correcting string to 0 (assign back instead of a chained inplace replace)
    df["Expended_Amount"] = df["Expended_Amount"].replace({"Deallocation": 0})
    # replacing monetary amounts with 0 & coerce to numeric
    allocation_monetary_cols = [
        "SB1_Funding",
        "Expended_Amount",
        "Allocation_Amount",
        "GGRF_Funding",
        "Prior_Fiscal_Years_to_2020",
        "Fiscal_Year_2020-2021",
        "Fiscal_Year_2021-2022",
        "Fiscal_Year_2022-2023",
        "Fiscal_Year_2023-2024",
        "Fiscal_Year_2024-2025",
        "Fiscal_Year_2025-2026",
        "Fiscal_Year_2026-2027",
        "Fiscal_Year_2027-2028",
        "Fiscal_Year_2028-2029",
        "Fiscal_Year_2029-2030",
    ]
    # coerce first, then fill, so unparseable strings also end up as 0 instead of NaN
    df[allocation_monetary_cols] = (
        df[allocation_monetary_cols].apply(pd.to_numeric, errors="coerce").fillna(0)
    )

    ### CLEAN UP IDS NUMBERS ####
    missing_ids = ["Project_ID", "EA", "CTC_Financial_Resolution"]
    df[missing_ids] = df[missing_ids].fillna(value="missing")

    ### Prefix to avoid confusion ###
    df = df.add_prefix("Allocation_")

    return df

In [6]:
# Test out function
alloc_test = allocation_function(allocation)
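
The date clean-up inside `allocation_function` boils down to three steps: strip status text, coerce to dates, and fill unknowns with a sentinel. A minimal sketch of that pattern follows, assuming US-style `month/day/year` strings; the raw values are invented and the variable names are hypothetical.

```python
import pandas as pd

# Made-up raw values resembling the free-text date cells.
raw = pd.Series(["6/30/2020", "Complete\n1/31/2020", "Pending", None])

# Remove status text and stray newlines before parsing.
cleaned = (
    raw.str.replace("Complete", "", regex=False)
    .str.replace("\n", "", regex=False)
    .str.strip()
)

# Unparseable or missing values become NaT instead of raising.
parsed = pd.to_datetime(cleaned, format="%m/%d/%Y", errors="coerce")

# Sentinel for unknown dates, the same value the function uses.
sentinel = pd.Timestamp("2100-01-01")
filled = parsed.fillna(sentinel)
```

Unlike the function above, this sketch keeps the column as `datetime64` rather than converting to Python `date` objects, so later comparisons against `pd.Timestamp` do not mix types.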

In [7]:
alloc_test.columns

Index(['Allocation_Award_Year', 'Allocation_Project_#',
       'Allocation_Grant_Recipient', 'Allocation_Implementing_Agency',
       'Allocation_Project_ID', 'Allocation_EA', 'Allocation_Components',
       'Allocation_Phase', 'Allocation_Allocation_Amount',
       'Allocation_Expended_Amount', 'Allocation_SB1_Funding',
       'Allocation_SB1_Budget_Year', 'Allocation_GGRF_Funding',
       'Allocation_GGRF_Budget_Year', 'Allocation_CTC_Financial_Resolution',
       'Allocation_CTC_Allocation_Amendment', 'Allocation_CTC_Waiver',
       'Allocation_CalSTA_Waiver', 'Allocation_PSA_#',
       'Allocation_CT_Document_#', 'Allocation_Date_Branch_Chief_Receives_PSA',
       'Allocation_Date_Regional_Coordinator_Receives_PSA',
       'Allocation_Date_OC_Receives_PSA', 'Allocation_Date_OPM_Receives_PSA',
       'Allocation_Date_Legal_Receives_PSA', 'Allocation_Date_Returned_to_PM',
       'Allocation_Date_PSA_Sent_to_Local_Agency',
       'Allocation_Date_PSA_Approved_by_Local_Agency',
       

In [8]:
alloc_test.loc[alloc_test["Allocation_PPNO"] == "CP062"]

Unnamed: 0,Allocation_Award_Year,Allocation_Project_#,Allocation_Grant_Recipient,Allocation_Implementing_Agency,Allocation_Project_ID,Allocation_EA,Allocation_Components,Allocation_Phase,Allocation_Allocation_Amount,Allocation_Expended_Amount,Allocation_SB1_Funding,Allocation_SB1_Budget_Year,Allocation_GGRF_Funding,Allocation_GGRF_Budget_Year,Allocation_CTC_Financial_Resolution,Allocation_CTC_Allocation_Amendment,Allocation_CTC_Waiver,Allocation_CalSTA_Waiver,Allocation_PSA_#,Allocation_CT_Document_#,Allocation_Date_Branch_Chief_Receives_PSA,Allocation_Date_Regional_Coordinator_Receives_PSA,Allocation_Date_OC_Receives_PSA,Allocation_Date_OPM_Receives_PSA,Allocation_Date_Legal_Receives_PSA,Allocation_Date_Returned_to_PM,Allocation_Date_PSA_Sent_to_Local_Agency,Allocation_Date_PSA_Approved_by_Local_Agency,Allocation_Date_Signed_by_DRMT,Allocation_PSA_Expiry_Date,Allocation_LONP,Allocation_Prior_Fiscal_Years_to_2020,Allocation_Fiscal_Year_2020-2021,Allocation_Fiscal_Year_2021-2022,Allocation_Fiscal_Year_2022-2023,Allocation_Fiscal_Year_2023-2024,Allocation_Fiscal_Year_2024-2025,Allocation_Fiscal_Year_2025-2026,Allocation_Fiscal_Year_2026-2027,Allocation_Fiscal_Year_2027-2028,Allocation_Fiscal_Year_2028-2029,Allocation_Fiscal_Year_2029-2030,Allocation_Allocation_Comments,Allocation_PSA_Comments,Allocation_PPNO,Allocation_Award_Recipient,Allocation_Allocation_Date,Allocation_Third_Party_Award_Date,Allocation_Completion_Date,Allocation_LED
200,2020.0,4.0,City of Inglewood,City of Inglewood,20000275,R441GA,Automated People Mover,PA&ED,20000000.0,1823462.51,10000000.0,2019-20,10000000.0,2019-10,TIRCP-2021-02,,,,07InglewoodPS-01,07InglewoodPS-01,NaT,NaT,,,,,,2021-01-06 00:00:00,2021-01-12,NaT,,0.0,20000000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,CP062,,2020-08-13,2100-01-01 00:00:00,2023-06-30,2023-06-30


In [9]:
alloc_test.shape

(215, 50)

In [10]:
Unique_Recipients = alloc_test.Allocation_Grant_Recipient.unique().tolist()

### Projects

In [11]:
FILE_NAME_Project = "TIRCP_Projects_March_14_2022.xlsx"
project = pd.read_excel(f"{GCS_FILE_PATH}{FILE_NAME_Project}")

In [12]:
### REMINDER: TAKE OUT DF WHEN EXPORTING TO SCRIPT ###
def project_function(df):
    # FILE_NAME1 = "TIRCP_Projects_March_14_2022.xlsx"
    # df = pd.read_excel(f"{GCS_FILE_PATH}{FILE_NAME1}")

    ### GENERAL CLEANING ###
    df.columns = df.columns.str.strip().str.replace(" ", "_")
    df.columns = df.columns.map(lambda x: x.strip())

    ### PPNO CLEAN UP ###
    # stripping PPNO down to <5 characters
    df = df.assign(PPNO_New=df["PPNO"].str.slice(start=0, stop=5))

    ### RECIPIENTS ###
    # Some grant recipients have multiple spellings of their name, e.g. BART versus Bay Area Rapid Transit
    df["Grant_Recipient"] = df["Grant_Recipient"].replace(
        {
            "San Joaquin Regional\nRail Commission / San Joaquin Joint Powers Authority": "San Joaquin Regional Rail Commission / San Joaquin Joint Powers Authority",
            "San Francisco Municipal  Transportation Agency": "San Francisco Municipal Transportation Agency",
            "San Francisco Municipal Transportation Agency (SFMTA)": "San Francisco Municipal Transportation Agency",
            "Capitol Corridor Joint Powers Authority (CCJPA)": "Capitol Corridor Joint Powers Authority",
            "Bay Area Rapid Transit (BART)": "Bay Area Rapid Transit District (BART)",
            "Los Angeles County Metropolitan Transportation Authority (LA Metro)": "Los Angeles County Metropolitan Transportation Authority",
            "Santa Clara Valley Transportation Authority (SCVTA)": "Santa Clara Valley Transportation Authority",
            "Solano Transportation Authority (STA)": "Solano Transportation Authority",
            "Southern California Regional Rail Authority (SCRRA - Metrolink)": "Southern California Regional Rail Authority",
        }
    )

    ### CROSSWALK ###
    df = pd.merge(
        df,
        project_ppno_crosswalk,
        left_on=["Award_Year", "Grant_Recipient"],
        right_on=["Award_Year", "Local_Agency"],
        how="left",
    )
    df.PPNO_New = df.apply(
        lambda x: x.PPNO_New if (str(x.PPNO_New2) == "nan") else x.PPNO_New2, axis=1
    )
    df = df.drop(["PPNO", "PPNO_New2"], axis=1).rename(columns={"PPNO_New": "PPNO"})

    # Cast every PPNO to a string
    df.PPNO = df.PPNO.astype(str)

    ### AWARD CYCLE CLEAN UP ###
    # Replace FY 21/22 with Cycle 4
    df["Award_Cycle"] = df["Award_Cycle"].replace({"FY 21/22": 4})

    ### MONETARY COLS CLEAN UP ###
    # correcting string to 0 (assign back instead of a chained inplace replace)
    df["Percentage_Allocated"] = df["Percentage_Allocated"].replace({"Not Allocated": 0})
    proj_cols = [
        "TIRCP_Award_Amount_($)",
        "Allocated_Amount",
        "Expended_Amount",
        "Unallocated_Amount",
        "Total_Project_Cost",
        "Other_Funds_Involved",
    ]
    # coerce first, then fill, so unparseable strings also end up as 0 instead of NaN
    df[proj_cols] = df[proj_cols].apply(pd.to_numeric, errors="coerce").fillna(0)

    # Prefix to avoid confusion
    df = df.add_prefix("Project_")

    return df

In [13]:
project_test = project_function(project)
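
Both cleaning functions override the sheet's PPNO with the crosswalk value whenever one exists, using a row-wise `apply`. As a sketch of the same idea in vectorized form, under the assumption that a missing crosswalk match shows up as `NaN` after the left merge (all agency names and PPNOs below are invented):

```python
import pandas as pd

# Toy project rows and a toy crosswalk; values are invented.
projects = pd.DataFrame(
    {
        "Award_Year": [2016, 2018, 2020],
        "Grant_Recipient": ["Agency A", "Agency B", "Agency C"],
        "PPNO_New": ["CP001", None, "CP0XX"],
    }
)
crosswalk = pd.DataFrame(
    {
        "Award_Year": [2018, 2020],
        "Local_Agency": ["Agency B", "Agency C"],
        "PPNO_New2": ["CP077", "CP060"],
    }
)

merged = projects.merge(
    crosswalk,
    left_on=["Award_Year", "Grant_Recipient"],
    right_on=["Award_Year", "Local_Agency"],
    how="left",
)

# Prefer the crosswalk value; fall back to the sheet's own PPNO.
merged["PPNO"] = merged["PPNO_New2"].fillna(merged["PPNO_New"])
merged = merged.drop(columns=["PPNO_New", "PPNO_New2"])
```

`fillna` with another Series avoids comparing against the string `"nan"` and keeps the fallback logic on one line.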

In [14]:
project_test.columns.sort_values()

Index(['Project_Allocated_Amount', 'Project_Award_Cycle', 'Project_Award_Year',
       'Project_Comments/Additional_Contacts', 'Project_County',
       'Project_District', 'Project_Expended_Amount',
       'Project_Grant_Recipient', 'Project_Local_Agency',
       'Project_Local_Agency_Address', 'Project_Local_Agency_City',
       'Project_Local_Agency_Contact', 'Project_Local_Agency_Email',
       'Project_Local_Agency_Phone_Number', 'Project_Local_Agency_Zip',
       'Project_Master_Agreement_Expiration_Date',
       'Project_Master_Agreement_Number', 'Project_Other_Funds_Involved',
       'Project_PPNO', 'Project_Percentage_Allocated', 'Project_Project_#',
       'Project_Project_Description', 'Project_Project_Manager',
       'Project_Project_Title', 'Project_Regional_Coordinator',
       'Project_TIRCP_Award_Amount_($)',
       'Project_Technical_Assistance-CALITP_(Y/N)',
       'Project_Technical_Assistance-Fleet_(Y/N)',
       'Project_Technical_Assistance-Network_Integration_(Y/

In [15]:
project_test[
    [
        "Project_Award_Year",
        "Project_Project_#",
        "Project_Grant_Recipient",
        "Project_Project_Title",
        "Project_Project_Manager",
        "Project_TIRCP_Award_Amount_($)",
    ]
].head(5)

Unnamed: 0,Project_Award_Year,Project_Project_#,Project_Grant_Recipient,Project_Project_Title,Project_Project_Manager,Project_TIRCP_Award_Amount_($)
0,2015,1,Antelope Valley Transit Authority (AVTA),Regional Transit Interconnectivity & Environme...,Yesenia Ochoa,24403000
1,2015,2,Capitol Corridor Joint Powers Authority,Travel Time Reduction Project,Doug Adams,4620000
2,2015,3,Los Angeles County Metropolitan Transportation...,Willowbrook/Rosa Parks Station & Blue Line Lig...,Arthur Murray,38494000
3,2015,4,Los Angeles-San Diego-San Luis Obispo Rail Cor...,Pacific Surfliner Transit Transfer Program,Luisa Lopez,1675000
4,2015,5,Montery-Salinas Transit,Monterey Bay Operations and Maintenance Facili...,Dina Facchini,10000000


### Checking PPNO differences between projects & allocation before crosswalk

In [16]:
PPNO_project = set(project_test.Project_PPNO.unique().tolist())
PPNO_allocation = set(alloc_test.Allocation_PPNO.unique().tolist())

In [17]:
alloc_test.Allocation_PPNO.nunique()

66

In [18]:
project_test.Project_PPNO.nunique()

69

In [19]:
PPNO_project - PPNO_allocation  # checking for differences

{'CP026',
 'CP060',
 'CP065',
 'CP068',
 'CP070',
 'CP071',
 'CP077',
 'CP078',
 'CP080'}
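
The set difference above only reports one direction. As an alternative sketch, an outer merge with `indicator=True` reports PPNOs missing from either sheet in one pass; the frames and values below are toy data, not the real sheets.

```python
import pandas as pd

# Toy PPNO columns; the allocation sheet can repeat a PPNO (one row per phase).
proj = pd.DataFrame({"PPNO": ["CP001", "CP002", "CP003"]})
alloc = pd.DataFrame({"PPNO": ["CP001", "CP003", "CP003", "CP004"]})

# _merge is "left_only", "right_only", or "both" for every PPNO.
check = proj.merge(alloc.drop_duplicates(), on="PPNO", how="outer", indicator=True)

only_in_projects = check.loc[check["_merge"] == "left_only", "PPNO"].tolist()
only_in_allocation = check.loc[check["_merge"] == "right_only", "PPNO"].tolist()
```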

In [20]:
subset = [
    "CP026",
    "CP060",
    "CP065",
    "CP068",
    "CP070",
    "CP071",
    "CP077",
    "CP078",
    "CP080",
]

In [21]:
project_subset = project_test[
    [
        "Project_Award_Year",
        "Project_Project_Title",
        "Project_Grant_Recipient",
        "Project_PPNO",
    ]
]

In [22]:
# keep only the project rows whose PPNO is in the subset
project_subset[project_subset.Project_PPNO.isin(subset)]

Unnamed: 0,Project_Award_Year,Project_Project_Title,Project_Grant_Recipient,Project_PPNO
23,2016,Downtown/Riverfront Sacramento-West Sacramento...,Sacramento Regional Transit District (SacRT),CP080
28,2016,SB 132 ACE Extension Lathrop to Ceres/Merced,San Joaquin Regional Rail Commission / San Joa...,CP026
45,2018,Ride Between the Line: Enhancing Access to Tra...,San Diego Association of Governments (SANDAG),CP077
49,2018,SamTrans Express Bus Pilot,San Mateo County Transit District (SamTrans),CP078
59,2020,The Transbaby Corridor Core Capacity Program: ...,Bay Area Rapid Transit District (BART),CP060
64,2020,Metrolink Antelope Valley Line Capital and Ser...,LA County Metropolitan Transportation Authorit...,CP065
67,2020,West Valley Connector Bus Rapid Transit Phase ...,San Bernardino County Transportation Authority...,CP068
69,2020,Core Capacity Program,San Francisco Municipal Transportation Agency,CP070
70,2020,"For People, Place and Planet: Connecting Ingle...",Santa Monica Big Blue Bus,CP071


In [23]:
# keep only the allocation rows whose PPNO is in the subset (none are expected)
alloc_test[alloc_test.Allocation_PPNO.isin(subset)]

Unnamed: 0,Allocation_Award_Year,Allocation_Project_#,Allocation_Grant_Recipient,Allocation_Implementing_Agency,Allocation_Project_ID,Allocation_EA,Allocation_Components,Allocation_Phase,Allocation_Allocation_Amount,Allocation_Expended_Amount,Allocation_SB1_Funding,Allocation_SB1_Budget_Year,Allocation_GGRF_Funding,Allocation_GGRF_Budget_Year,Allocation_CTC_Financial_Resolution,Allocation_CTC_Allocation_Amendment,Allocation_CTC_Waiver,Allocation_CalSTA_Waiver,Allocation_PSA_#,Allocation_CT_Document_#,Allocation_Date_Branch_Chief_Receives_PSA,Allocation_Date_Regional_Coordinator_Receives_PSA,Allocation_Date_OC_Receives_PSA,Allocation_Date_OPM_Receives_PSA,Allocation_Date_Legal_Receives_PSA,Allocation_Date_Returned_to_PM,Allocation_Date_PSA_Sent_to_Local_Agency,Allocation_Date_PSA_Approved_by_Local_Agency,Allocation_Date_Signed_by_DRMT,Allocation_PSA_Expiry_Date,Allocation_LONP,Allocation_Prior_Fiscal_Years_to_2020,Allocation_Fiscal_Year_2020-2021,Allocation_Fiscal_Year_2021-2022,Allocation_Fiscal_Year_2022-2023,Allocation_Fiscal_Year_2023-2024,Allocation_Fiscal_Year_2024-2025,Allocation_Fiscal_Year_2025-2026,Allocation_Fiscal_Year_2026-2027,Allocation_Fiscal_Year_2027-2028,Allocation_Fiscal_Year_2028-2029,Allocation_Fiscal_Year_2029-2030,Allocation_Allocation_Comments,Allocation_PSA_Comments,Allocation_PPNO,Allocation_Award_Recipient,Allocation_Allocation_Date,Allocation_Third_Party_Award_Date,Allocation_Completion_Date,Allocation_LED


## Reports

### Semi Annual Report
* Some projects are missing because their PPNOs are missing from the allocation sheet.
* Need to add Implementing Agency and Project ID.

'''
df_pivot = df_pivot.rename(columns = {'Project_Award_Year':'Award Year',
                                         'Project_Project_#':'Project No.',
                                         'Allocation_Grant_Recipient': 'Award Recipient',
                                         'Project_Project_Title':'Project Title',
                                         'Percent_of_Award_Fully_Allocated': 'Percent of Award Fully Allocated',
                                         'Allocation_Components': 'Project Description/Component',
                                         'Project_PPNO':'PPNO',
                                         'Allocation_Phase':'Allocation Phase',
                                         'TIRCP_Award_Amount':'Award Amount',
                                         'Allocation_Amount': 'Allocation Amount',
                                         'Allocation_GGRF_Funding': "GGRF Allocation Amount",
                                        'Allocation_SB1_Funding':"PTA-SB1 Allocation Amount",
                                          'Allocation_Allocation_Date':'Allocation Date',
                                          'CON_Contract_Award_Date':'CON Contract Award Date',
                                          'Allocation_Expended_Amount':'Expended Amount',
                                          'Percent_of_Allocation_Expended':'Percent of Allocation Expended',
                                          'Allocated_Before_July_31_2020': 'Allocated Before July 2020',
                                          'Phase_Completion_Date': 'Phase Completion Date',
                                          
})
'''

In [24]:
FAKE_FILE = "Fake_SAR_4_14.xlsx"
fake_SAR = pd.read_excel(f"{GCS_FILE_PATH}{FAKE_FILE}")

In [25]:
fake_SAR.head(2)

Unnamed: 0,Project_Award_Year,Project_Project_#,Project_Project_Manager,Allocation_Grant_Recipient,Allocation_Implementing_Agency,Project_Project_Title,Percent_of_Award_Fully_Allocated,TIRCP_Award_Amount,Allocation_Components,Project_PPNO,Allocation_Phase,Allocation_Allocation_Date,CON_Contract_Award_Date,Phase_Completion_Date,Allocation_Project_ID,Allocation_EA,Allocation_Amount,Allocation_SB1_Funding,Allocation_GGRF_Funding,Allocation_Expended_Amount,Percent_of_Allocation_Expended,Allocated_Before_July_31_2020
0,2015,1,Yesenia Ochoa,Antelope Valley Transit Authority,Antelope Valley Transit Authority,Regional Transit Interconnectivity & Environme...,1.0,24403000,Purchase 13 60-foot articulated BRT buses and ...,CP005,CONST,2015-10-22,2016-03-14,2022-03-30,16000048,T343GA,24403000,0,24403000,21714177.53,0.89,X
1,2015,5,Dina Facchini,Monterey-Salinas Transit,Monterey-Salinas Transit,Monterey Bay Operations and Maintenance Facili...,1.0,10000000,Renovation and expansion of the Monterey maint...,CP013,CONST,2016-05-19,2016-11-03,2018-09-30,16000275,R349GA,10000000,0,10000000,0.0,0.0,X


In [26]:
# For table 2 in semi annual report
def summary_SAR_table_two(df):
    # pivot
    df = (
        df.drop_duplicates()
        .groupby(["Project_Award_Year"])
        .agg(
            {
                "Project_Project_#": "count",
                "Project_TIRCP_Award_Amount_($)": "sum",
                "Project_Allocated_Amount": "sum",
                "Project_Expended_Amount": "sum",
            }
        )
        .reset_index()
    )
    # renaming columns to match report
    df = df.rename(
        columns={
            "Project_Project_#": "Number_of_Awarded_Projects",
            "Project_TIRCP_Award_Amount_($)": "Award_Amount",
            "Project_Allocated_Amount": "Amount_Allocated",
            "Project_Expended_Amount": "Expended_Amount",
            "Project_Award_Year": "Award_Year",
        }
    )
    # create percentages
    df["Expended_Percent_of_Awarded"] = df["Expended_Amount"] / df["Award_Amount"]
    df["Expended_Percent_of_Allocated"] = df["Expended_Amount"] / df["Amount_Allocated"]
    df["Percent_Allocated"] = df["Amount_Allocated"] / df["Award_Amount"]
    # transpose
    df = df.set_index("Award_Year").T
    # grand totals for monetary columns
    list_to_add = [
        "Award_Amount",
        "Amount_Allocated",
        "Expended_Amount",
        "Number_of_Awarded_Projects",
    ]
    df["Grand_Total"] = df.loc[list_to_add, :].sum(axis=1)
    # grand total variables of each monetary column to fill in percentages below.
    Exp = df.at["Expended_Amount", "Grand_Total"]
    Alloc = df.at["Amount_Allocated", "Grand_Total"]
    TIRCP = df.at["Award_Amount", "Grand_Total"]
    # filling in totals of percentages
    df.at["Expended_Percent_of_Awarded", "Grand_Total"] = Exp / TIRCP
    df.at["Expended_Percent_of_Allocated", "Grand_Total"] = Exp / Alloc
    df.at["Percent_Allocated", "Grand_Total"] = Alloc / TIRCP
    # switching rows to correct order
    df = df.reindex(
        [
            "Number_of_Awarded_Projects",
            "Award_Amount",
            "Amount_Allocated",
            "Percent_Allocated",
            "Expended_Amount",
            "Expended_Percent_of_Awarded",
            "Expended_Percent_of_Allocated",
        ]
    )

    return df
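
`summary_SAR_table_two` recomputes the grand-total percentages from the summed dollar amounts instead of combining the per-year percentages. A toy illustration of why the two differ, with invented figures:

```python
import pandas as pd

# Two made-up award years of very different size.
toy = pd.DataFrame(
    {"Award_Amount": [100.0, 300.0], "Amount_Allocated": [100.0, 150.0]},
    index=[2015, 2016],
)
toy["Percent_Allocated"] = toy["Amount_Allocated"] / toy["Award_Amount"]

# Ratio of totals: the program-wide share, which is what the function fills in.
grand_total_pct = toy["Amount_Allocated"].sum() / toy["Award_Amount"].sum()

# Mean of yearly ratios: weights every year equally regardless of size.
mean_of_pcts = toy["Percent_Allocated"].mean()
```

Here the ratio of totals is 0.625 while the mean of the yearly percentages is 0.75, so averaging (or summing) the percentage rows would misstate the program-wide figure.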

In [27]:
### SAR ENTIRE REPORT ###
def semi_annual_report(
    df_project, df_allocation
):  ## CHANGE THIS FOR SCRIPT LATER BACK TO JUST ()
    ### LOAD IN SHEETS ###
    # df_project = project()  ## CHANGE THIS LATER FOR SCRIPT
    # df_allocation = allocation()  ## CHANGE THIS LATER FOR SCRIPT
    df_project = df_project  ##DELETE FOR SCRIPT
    df_allocation = df_allocation  ##DELETE FOR SCRIPT
    df_previous_sar = fake_SAR  ##DELETE FOR SCRIPT

    ### KEEP ONLY RELEVANT COLS ###
    df_project = df_project[
        [
            "Project_Project_Manager",
            "Project_Award_Year",
            "Project_Project_#",
            "Project_Project_Title",
            "Project_PPNO",
            "Project_TIRCP_Award_Amount_($)",
            "Project_Expended_Amount",
            "Project_Allocated_Amount",
        ]
    ]
    df_allocation = df_allocation[
        [
            "Allocation_Expended_Amount",
            "Allocation_Project_ID",
            "Allocation_EA",
            "Allocation_Award_Year",
            "Allocation_Grant_Recipient",
            "Allocation_Implementing_Agency",
            "Allocation_PPNO",
            "Allocation_Phase",
            "Allocation_Allocation_Date",
            "Allocation_Completion_Date",
            "Allocation_Third_Party_Award_Date",
            "Allocation_Components",
            "Allocation_SB1_Funding",
            "Allocation_GGRF_Funding",
        ]
    ]

    ###SUMMARY TABLE ###
    summary_table_2 = summary_SAR_table_two(df_project)

    ### JOIN ###
    df_sar = df_allocation.merge(
        df_project,
        how="left",
        left_on=["Allocation_PPNO", "Allocation_Award_Year"],
        right_on=["Project_PPNO", "Project_Award_Year"],
    )
    ### DROP DUPLICATES ###
    df_sar = df_sar.drop_duplicates()

    ### ADD % & ALLOCATED AMOUNTS###
    df_sar["Allocation_Amount"] = (
        df_sar["Allocation_SB1_Funding"] + df_sar["Allocation_GGRF_Funding"]
    )
    df_sar = df_sar.assign(
        Percent_of_Allocation_Expended=(
            df_sar["Allocation_Expended_Amount"] / df_sar["Allocation_Amount"]
        ),
        Percent_of_Award_Fully_Allocated=(
            df_sar["Project_Allocated_Amount"]
            / df_sar["Project_TIRCP_Award_Amount_($)"]
        ),
    )

    ### FILTER OUT PROJECTS THAT SHOULD BE EXCLUDED ###
    # Only projects with Allocation Amounts > $0 are included
    df_sar = df_sar[df_sar["Allocation_Amount"] > 0]
    # Only projects that have spent less than 99% of their allocation are included.
    df_sar = df_sar[df_sar["Percent_of_Allocation_Expended"] < 0.99]

    ### CLEAN UP PERCENTAGES ###
    cols = [
        "Allocation_Expended_Amount",
        "Allocation_Amount",
        "Project_TIRCP_Award_Amount_($)",
        "Project_Expended_Amount",
        "Percent_of_Allocation_Expended",
        "Percent_of_Award_Fully_Allocated",
    ]
    df_sar[cols] = df_sar[cols].apply(pd.to_numeric, errors="coerce").fillna(0)
    # rename cols
    df_sar = df_sar.rename(
        columns={
            "Allocation_Completion_Date": "Phase_Completion_Date",
            "Project_TIRCP_Award_Amount_($)": "TIRCP_Award_Amount",
            "Allocation_Third_Party_Award_Date": "CON_Contract_Award_Date",
        }
    )

    ### CLEAN DATE-TIME  ###
    # blank if the allocation date is AFTER 7-31-2020, X if it is on or before 7-31-2020
    df_sar = df_sar.assign(
        Allocated_Before_July_31_2020=df_sar.apply(
            lambda x: " "
            if x.Allocation_Allocation_Date > pd.Timestamp(2020, 7, 31, 0)
            else "X",
            axis=1,
        )
    )

    ### PIVOT ###
    df_pivot = df_sar.groupby(
        [
            "Project_Award_Year",
            "Project_Project_#",
            "Project_Project_Manager",
            "Allocation_Grant_Recipient",
            "Allocation_Implementing_Agency",
            "Project_Project_Title",
            "Percent_of_Award_Fully_Allocated",
            "TIRCP_Award_Amount",
            "Allocation_Components",
            "Project_PPNO",
            "Allocation_Phase",
            "Allocation_Allocation_Date",
            "CON_Contract_Award_Date",
            "Phase_Completion_Date",
            "Allocation_Project_ID",
            "Allocation_EA",
        ]
    ).agg(
        {
            "Allocation_Amount": "sum",
            "Allocation_SB1_Funding": "sum",
            "Allocation_GGRF_Funding": "sum",
            "Allocation_Expended_Amount": "sum",
            "Percent_of_Allocation_Expended": "max",
            "Allocated_Before_July_31_2020": "max",
        }
    )

    ### COMPARE PREVIOUS SAR TO CURRENT SAR ###
    ### Unpivot ###
    df_current_reset_index = df_pivot.reset_index()
    # https://stackoverflow.com/questions/50102808/highlighting-the-difference-between-two-dataframes
    # https://stackoverflow.com/questions/56647813/perform-operations-after-styling-in-a-dataframe
    def highlight_diff(current, previous, color="pink"):
        # Define the CSS style string
        attr = "background-color: {}".format(color)
        # Where current != previous, set the style
        return pd.DataFrame(
            np.where(current.ne(previous), attr, ""),
            index=current.index,
            columns=current.columns,
        )

    # Apply Function
    df_current_highlighted_diffs = df_current_reset_index.style.apply(
        highlight_diff, axis=None, previous=df_previous_sar, color="pink"
    )

    ### GCS ###
    with pd.ExcelWriter(f"{GCS_FILE_PATH}TESTING_Semi_Annual_Report.xlsx") as writer:
        summary_table_2.to_excel(writer, sheet_name="Summary", index=True)
        df_pivot.to_excel(writer, sheet_name="FY", index=True)
        df_current_reset_index.to_excel(
            writer, sheet_name="Unpivoted_Current_Version", index=False
        )
        df_current_highlighted_diffs.to_excel(
            writer, sheet_name="Highlighted_Differences", index=False
        )

    return df_pivot  # CHANGE BACK TO ORIGINAL DF

In [28]:
sar_test = semi_annual_report(project_test, alloc_test)
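
The previous-versus-current comparison is only meaningful when both frames line up row for row. One possible way to make that explicit, sketched under the assumption that each row has a unique key (a single `PPNO` here; the real report would likely need PPNO plus phase), with invented values:

```python
import pandas as pd

previous = pd.DataFrame(
    {"PPNO": ["CP001", "CP002"], "Expended": [10.0, 20.0]}
).set_index("PPNO")
current = pd.DataFrame(
    {"PPNO": ["CP002", "CP001", "CP003"], "Expended": [25.0, 10.0, 5.0]}
).set_index("PPNO")

# Align previous to current's rows and columns; rows absent from previous become NaN.
prev_aligned = previous.reindex(index=current.index, columns=current.columns)

# A cell changed if the values differ, treating NaN on both sides as unchanged.
changed = current.ne(prev_aligned) & ~(current.isna() & prev_aligned.isna())
```

The boolean `changed` frame has the same shape as `current`, so it can be mapped to style strings inside a function passed to `Styler.apply(..., axis=None)`, as `highlight_diff` does. New rows such as `CP003` are flagged as changed.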



In [29]:
sar_test.head(2)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,Unnamed: 7_level_0,Unnamed: 8_level_0,Unnamed: 9_level_0,Unnamed: 10_level_0,Unnamed: 11_level_0,Unnamed: 12_level_0,Unnamed: 13_level_0,Unnamed: 14_level_0,Unnamed: 15_level_0,Allocation_Amount,Allocation_SB1_Funding,Allocation_GGRF_Funding,Allocation_Expended_Amount,Percent_of_Allocation_Expended,Allocated_Before_July_31_2020
Project_Award_Year,Project_Project_#,Project_Project_Manager,Allocation_Grant_Recipient,Allocation_Implementing_Agency,Project_Project_Title,Percent_of_Award_Fully_Allocated,TIRCP_Award_Amount,Allocation_Components,Project_PPNO,Allocation_Phase,Allocation_Allocation_Date,CON_Contract_Award_Date,Phase_Completion_Date,Allocation_Project_ID,Allocation_EA,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2015.0,1.0,Yesenia Ochoa,Antelope Valley Transit Authority,Antelope Valley Transit Authority,Regional Transit Interconnectivity & Environmental Sustability,1.0,24403000.0,Purchase 13 60-foot articulated BRT buses and 16 45-foot electric commuter buses,CP005,CONST,2015-10-22,2016-03-14,2022-03-30,16000048,T343GA,24403000.0,0.0,24403000.0,21714177.53,0.89,X
2015.0,5.0,Dina Facchini,Monterey-Salinas Transit,Monterey-Salinas Transit,Monterey Bay Operations and Maintenance Facility/Salinas Transit Service Project,1.0,10000000.0,Renovation and expansion of the Monterey maintenance and operations facility.,CP013,CONST,2016-05-19,2016-11-03,2018-09-30,16000275,R349GA,10000000.0,0.0,10000000.0,0.0,0.0,X


### Program Allocation Plan
* Add Allocation Date. Done. 
* Move Project ID Behind Phase. Done.
* The Highlands sheet is missing phases, so this automated version doesn't match the actual Program Allocation Plan document.
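Because the plan is built by left-merging the allocation sheet onto the project sheet on PPNO and award year, a PPNO that differs between the sheets leaves the project columns empty for that row. One way to surface such rows is `merge(..., indicator=True)`. The sketch below uses tiny made-up frames (only the column names mirror the notebook's):

```python
import pandas as pd

# Made-up rows for illustration only
alloc = pd.DataFrame(
    {
        "Allocation_PPNO": ["CP005", "CP013", "CP999"],
        "Allocation_Award_Year": [2015, 2015, 2016],
    }
)
proj = pd.DataFrame(
    {
        "Project_PPNO": ["CP005", "CP013"],
        "Project_Award_Year": [2015, 2015],
    }
)

merged = alloc.merge(
    proj,
    how="left",
    left_on=["Allocation_PPNO", "Allocation_Award_Year"],
    right_on=["Project_PPNO", "Project_Award_Year"],
    indicator=True,  # adds a _merge column: "both" or "left_only"
)

# Allocation rows that found no matching project row
unmatched = merged.loc[merged["_merge"] == "left_only", "Allocation_PPNO"].tolist()
print(unmatched)
```

Any PPNO listed in `unmatched` is a candidate for the crosswalk or for correction in the source sheet.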

In [30]:
# Take out df_project & df_allocation
def program_allocation_plan(df_project, df_allocation):
    ### LOAD IN SHEETS ### UNCOMMENT project() AND allocation() BELOW WHEN EXPORTING TO SCRIPT
    # df_project = project()
    # df_allocation = allocation()
    # Only keeping certain columns
    df_project = df_project[
        [
            "Project_Award_Year",
            "Project_Project_#",
            "Project_TIRCP_Award_Amount_($)",
            "Project_Grant_Recipient",
            "Project_Project_Title",
            "Project_PPNO",
            "Project_Unallocated_Amount",
        ]
    ]
    df_allocation = df_allocation[
        [
            "Allocation_Award_Year",
            "Allocation_Grant_Recipient",
            "Allocation_Implementing_Agency",
            "Allocation_Components",
            "Allocation_PPNO",
            "Allocation_Phase",
            "Allocation_Prior_Fiscal_Years_to_2020",
            "Allocation_Fiscal_Year_2020-2021",
            "Allocation_Fiscal_Year_2021-2022",
            "Allocation_Fiscal_Year_2022-2023",
            "Allocation_Fiscal_Year_2023-2024",
            "Allocation_Fiscal_Year_2024-2025",
            "Allocation_Fiscal_Year_2025-2026",
            "Allocation_Fiscal_Year_2026-2027",
            "Allocation_Fiscal_Year_2027-2028",
            "Allocation_Fiscal_Year_2028-2029",
            "Allocation_Fiscal_Year_2029-2030",
            "Allocation_CTC_Financial_Resolution",
            "Allocation_Project_ID",
            "Allocation_SB1_Funding",
            "Allocation_GGRF_Funding",
            "Allocation_Allocation_Amount",
            "Allocation_Allocation_Date",
        ]
    ]
    ### MERGE 2 SHEETS ###
    df_combined = df_allocation.merge(
        df_project,
        how="left",
        left_on=["Allocation_PPNO", "Allocation_Award_Year"],
        right_on=["Project_PPNO", "Project_Award_Year"],
    )

    ### CLEAN UP ###

    # Fill in missing dates with a far-future placeholder (2100-01-01),
    # since groupby drops rows whose key columns are NaN/NaT
    missing_date = pd.to_datetime("2100-01-01")
    df_combined["Allocation_Allocation_Date"] = df_combined[
        "Allocation_Allocation_Date"
    ].fillna(missing_date)

    # Create Total_Amount Col
    df_combined["Total_Amount"] = (
        df_combined["Allocation_GGRF_Funding"] + df_combined["Allocation_SB1_Funding"]
    )

    # Rename cols to the right names
    df_combined = df_combined.rename(
        columns={
            "Project_TIRCP_Award_Amount_($)": "Award_Amount",
            "Allocation_Components": "Separable_Phases/Components",
            "Allocation_CTC_Financial_Resolution": "Allocation_Resolution",
            "Allocation_SB1_Funding": "PTA-SB1_Amount",
            "Project_Unallocated_Amount": "Not_Allocated",
        }
    )
    # Drop rows that are missing any of these key columns
    df_combined = df_combined.dropna(
        subset=[
            "Allocation_Award_Year",
            "Allocation_Grant_Recipient",
            "Allocation_Implementing_Agency",
        ]
    )

    ### PIVOT ###
    def pivot(df):
        df = df.groupby(
            [
                "Allocation_Award_Year",
                "Project_Project_#",
                "Award_Amount",
                "Not_Allocated",
                "Project_PPNO",
                "Allocation_Grant_Recipient",
                "Allocation_Implementing_Agency",
                "Project_Project_Title",
                "Separable_Phases/Components",
                "Allocation_Phase",
                "Allocation_Project_ID",
                "Allocation_Resolution",
                "Allocation_Allocation_Date",
            ]
        ).agg(
            {
                "Allocation_Prior_Fiscal_Years_to_2020": "max",
                "Allocation_Fiscal_Year_2020-2021": "max",
                "Allocation_Fiscal_Year_2021-2022": "max",
                "Allocation_Fiscal_Year_2022-2023": "max",
                "Allocation_Fiscal_Year_2023-2024": "max",
                "Allocation_Fiscal_Year_2024-2025": "max",
                "Allocation_Fiscal_Year_2025-2026": "max",
                "Allocation_Fiscal_Year_2026-2027": "max",
                "Allocation_Fiscal_Year_2027-2028": "max",
                "Allocation_Fiscal_Year_2028-2029": "max",
                "Allocation_Fiscal_Year_2029-2030": "max",
                "PTA-SB1_Amount": "sum",
                "Allocation_GGRF_Funding": "sum",
                "Total_Amount": "sum",
            }
        )
        return df

    df_2015 = pivot(df_combined.loc[df_combined["Project_Award_Year"] == 2015])
    df_2016 = pivot(df_combined.loc[df_combined["Project_Award_Year"] == 2016])
    df_2018 = pivot(df_combined.loc[df_combined["Project_Award_Year"] == 2018])
    df_2020 = pivot(df_combined.loc[df_combined["Project_Award_Year"] == 2020])

    # GCS: remove "TESTING_" from the file name for the final export
    with pd.ExcelWriter(
        f"{GCS_FILE_PATH}TESTING_Program_Allocation_Plan.xlsx"
    ) as writer:
        df_2015.to_excel(writer, sheet_name="2015_Cycle_1", index=True)
        df_2016.to_excel(writer, sheet_name="2016_Cycle_2", index=True)
        df_2018.to_excel(writer, sheet_name="2018_Cycle_3", index=True)
        df_2020.to_excel(writer, sheet_name="2020_Cycle_4", index=True)

    return df_2020

In [31]:
program_test = program_allocation_plan(project_test, alloc_test)

In [32]:
program_test.head(5)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,Unnamed: 7_level_0,Unnamed: 8_level_0,Unnamed: 9_level_0,Unnamed: 10_level_0,Unnamed: 11_level_0,Unnamed: 12_level_0,Allocation_Prior_Fiscal_Years_to_2020,Allocation_Fiscal_Year_2020-2021,Allocation_Fiscal_Year_2021-2022,Allocation_Fiscal_Year_2022-2023,Allocation_Fiscal_Year_2023-2024,Allocation_Fiscal_Year_2024-2025,Allocation_Fiscal_Year_2025-2026,Allocation_Fiscal_Year_2026-2027,Allocation_Fiscal_Year_2027-2028,Allocation_Fiscal_Year_2028-2029,Allocation_Fiscal_Year_2029-2030,PTA-SB1_Amount,Allocation_GGRF_Funding,Total_Amount
Allocation_Award_Year,Project_Project_#,Award_Amount,Not_Allocated,Project_PPNO,Allocation_Grant_Recipient,Allocation_Implementing_Agency,Project_Project_Title,Separable_Phases/Components,Allocation_Phase,Allocation_Project_ID,Allocation_Resolution,Allocation_Allocation_Date,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1
2020.0,1.0,6503000.0,0.0,CP059,Antelope Valley Transit Authority,Antelope Valley Transit Authority,"Reaching the Most Transit-Vulnerable: AVTA's Zero Emission ""Microtransit"" & Bus Expansion Proposal",Network Integration,CONST,20000277,TIRCP-2021-02,2020-08-13,0.0,250000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,250000.0,0.0,250000.0
2020.0,1.0,6503000.0,0.0,CP059,Antelope Valley Transit Authority,Antelope Valley Transit Authority,"Reaching the Most Transit-Vulnerable: AVTA's Zero Emission ""Microtransit"" & Bus Expansion Proposal",Purchase of 11 Zero-Emission Vehicles and Supporting Infrastructure,CONST,20000276,TIRCP-2021-02,2020-08-13,0.0,6253000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3002000.0,3251000.0,6253000.0
2020.0,3.0,3914000.0,3194000.0,CP061,Capitol Corridor Joint Powers Authority,Capitol Corridor Joint Powers Authority,Sacramento Valley Station (SVS) Transit Center,Network Integration,CONST,20000279,TIRCP-2021-02,2020-08-12,0.0,720000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,720000.0,0.0,720000.0
2020.0,4.0,95200000.0,75200000.0,CP062,City of Inglewood,City of Inglewood,Inglewood Transit Connector Project,Automated People Mover,PA&ED,20000275,TIRCP-2021-02,2020-08-13,0.0,20000000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10000000.0,10000000.0,20000000.0
2020.0,5.0,12994000.0,12744000.0,CP063,Lake Transit Authority,Lake Transit Authority,North State Intercity Bus System,"New transit center, 4 new EV buses, expanded service routes, new infrastructure at maintenance facility",PA&ED,21000114,TIRCP-2021-09,2100-01-01 00:00:00,0.0,250000.0,150000.0,8034.0,4560.0,0.0,0.0,0.0,0.0,0.0,0.0,125000.0,125000.0,250000.0


### Tableau
* Has 75 projects b/c the Highlands sheet has one duplicate entry and one project that isn't TIRCP. 
* Didn't drop these because these will be corrected later.
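A quick way to locate the duplicate entry is `DataFrame.duplicated` on the identifying columns. This is a sketch on made-up rows, and treating `PPNO` plus `Award_Year` as the identifying key is an assumption:

```python
import pandas as pd

# Made-up rows for illustration only
projects = pd.DataFrame(
    {
        "PPNO": ["CP005", "CP012", "CP012"],
        "Award_Year": [2015, 2015, 2015],
        "Grant_Recipient": ["Agency A", "Agency B", "Agency B"],
    }
)

# keep=False marks every member of a duplicated group, not just the repeats
dupes = projects[projects.duplicated(subset=["PPNO", "Award_Year"], keep=False)]
print(dupes)
```

Listing both copies side by side makes it easier to decide which one the source sheet should keep.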

In [33]:
def tableau(df):  # DELETE DF() LATER
    # Strip the "Project_" prefix without mutating the caller's dataframe
    df = df.rename(columns=lambda col: col.replace("Project_", ""))
    # Keeping only certain columns.
    df = df[
        [
            "PPNO",
            "Award_Year",
            "#",
            "Grant_Recipient",
            "Title",
            "District",
            "County",
            "Description",
            "Master_Agreement_Number",
            "Master_Agreement_Expiration_Date",
            "Manager",
            "Regional_Coordinator",
            "Technical_Assistance-Fleet_(Y/N)",
            "Technical_Assistance-Network_Integration_(Y/N)",
            "Technical_Assistance-Priority_Population_(Y/N)",
            "Total_Cost",
            "Technical_Assistance-CALITP_(Y/N)",
            "TIRCP_Award_Amount_($)",
            "Allocated_Amount",
            "Expended_Amount",
            "Other_Funds_Involved",
        ]
    ].copy()  # copy so the column assignments below don't hit a slice of the input

    # Rename
    df = df.rename(
        columns={"TIRCP_Award_Amount_($)": "TIRCP_Amount", "Title": "Project_Title"}
    )

    # Compute expended and allocated percentages
    df["Expended_Percent"] = df["Expended_Amount"] / df["Allocated_Amount"]
    df["Allocated_Percent"] = df["Allocated_Amount"] / df["TIRCP_Amount"]

    # Unallocated amount = TIRCP award minus allocated amount
    df["Unallocated_Amount"] = df["TIRCP_Amount"] - df["Allocated_Amount"]
    # Replace NaN and infinite percentages (from division by zero) with 0
    df[["Expended_Percent", "Allocated_Percent"]] = df[
        ["Expended_Percent", "Allocated_Percent"]
    ].fillna(value=0)
    df[["Expended_Percent", "Allocated_Percent"]] = df[
        ["Expended_Percent", "Allocated_Percent"]
    ].replace(np.inf, 0)

    # Categorizing expended percentage into bins
    def expended_percent(row):

        if (row.Expended_Percent > 0) and (row.Expended_Percent < 0.26):
            return "1-25"
        elif (row.Expended_Percent > 0.25) and (row.Expended_Percent < 0.51):
            return "26-50"
        elif (row.Expended_Percent > 0.50) and (row.Expended_Percent < 0.76):
            return "51-75"
        elif (row.Expended_Percent > 0.75) and (row.Expended_Percent < 1.0):
            return "76-99"
        elif row.Expended_Percent == 0.0:
            return "0"
        else:
            return "100"

    df["Expended_Percent_Group"] = df.apply(expended_percent, axis=1)

    # Categorize years and expended_percent_group into bins
    def progress(row):
        year = row["Award_Year"]
        group = row["Expended_Percent_Group"]

        ### 2015 ###
        if year == 2015 and group in ("1-25", "26-50"):
            return "Behind"
        elif year == 2015 and group in ("51-75", "76-99"):
            return "On Track"

        ### 2016 ###
        elif year == 2016 and group in ("1-25", "26-50"):
            return "Behind"
        elif year == 2016 and group in ("51-75", "76-99"):
            return "On Track"

        ### 2018 ###
        elif year == 2018 and group == "1-25":
            return "Behind"
        elif year == 2018 and group in ("26-50", "51-75"):
            return "On Track"
        elif year == 2018 and group == "76-99":
            return "Ahead"

        ### 2020 ###
        elif year == 2020 and group == "1-25":
            return "Behind"
        elif year == 2020 and group == "26-50":
            return "On Track"
        elif year == 2020 and group in ("51-75", "76-99"):
            return "Ahead"

        ### 0 Expenditures ###
        elif group == "0":
            return "No expenditures recorded"

        ### Else ###
        else:
            return "100% of allocated funds spent"

    df["Progress"] = df.apply(progress, axis=1)

    # Classify projects as small, medium, or large by award-amount quartile
    p75 = float(df.TIRCP_Amount.quantile(0.75))
    p25 = float(df.TIRCP_Amount.quantile(0.25))

    def project_size(row):
        # Zero, negative, or missing award amounts
        if not row.TIRCP_Amount > 0:
            return "$0 recorded for TIRCP"
        elif row.TIRCP_Amount < p25:
            return "Small"
        # Inclusive upper bound so amounts equal to a quartile are not misfiled as $0
        elif row.TIRCP_Amount <= p75:
            return "Medium"
        else:
            return "Large"

    df["Project_Category"] = df.apply(project_size, axis=1)

    ### GCS ###
    # with pd.ExcelWriter(f"{GCS_FILE_PATH}Tableau_Sheet.xlsx") as writer:
    # df.to_excel(writer, sheet_name="Data", index=False)
    # return df

    return df
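The row-wise binning in `expended_percent` could also be expressed with `pd.cut`, which works on the whole column at once. This is only a sketch on made-up values. It reproduces the same edges for values from 0 up, but unlike the function above it leaves negative or missing values as NaN instead of labelling them "100":

```python
import numpy as np
import pandas as pd

# Made-up expended percentages for illustration
pct = pd.Series([0.0, 0.1, 0.26, 0.6, 0.99, 1.0])

# Left-closed bins: [0, 0.26), [0.26, 0.51), [0.51, 0.76), [0.76, 1.0), [1.0, inf)
groups = pd.cut(
    pct,
    bins=[0, 0.26, 0.51, 0.76, 1.0, np.inf],
    right=False,
    labels=["1-25", "26-50", "51-75", "76-99", "100"],
).astype(object)

# Exactly zero gets its own label, as in the row-wise version
groups = groups.mask(pct == 0, "0")
print(groups.tolist())
```

Keeping the bin edges in one list also makes it harder for the labels and the thresholds to drift apart.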

In [34]:
tableau_test = tableau(project_test)

In [35]:
len(tableau_test)

75

In [36]:
tableau_test.head(2)

Unnamed: 0,PPNO,Award_Year,#,Grant_Recipient,Project_Title,District,County,Description,Master_Agreement_Number,Master_Agreement_Expiration_Date,Manager,Regional_Coordinator,Technical_Assistance-Fleet_(Y/N),Technical_Assistance-Network_Integration_(Y/N),Technical_Assistance-Priority_Population_(Y/N),Total_Cost,Technical_Assistance-CALITP_(Y/N),TIRCP_Amount,Allocated_Amount,Expended_Amount,Other_Funds_Involved,Expended_Percent,Allocated_Percent,Unallocated_Amount,Expended_Percent_Group,Progress,Project_Category
0,CP005,2015,1,Antelope Valley Transit Authority (AVTA),Regional Transit Interconnectivity & Environme...,7,LA,Purchase 13 60-foot articulated BRT buses and ...,64AVTA2015MA,2024-04-01 00:00:00,Yesenia Ochoa,Ryan Greenway,,,,39478000,,24403000,24403000,21714177.53,0.0,0.89,1.0,0,76-99,On Track,Medium
1,CP012,2015,2,Capitol Corridor Joint Powers Authority,Travel Time Reduction Project,4,VAR,Track and curve improvements between San Jose ...,64CCJPAMA-A01,2022-05-01 00:00:00,Doug Adams,Shannon Simonds,No,No,No,5420700,No,4620000,4620000,4619999.9,0.0,1.0,1.0,0,76-99,On Track,Small


## Loading in Scripts

In [37]:
# test_SAR = TIRCP_functions.semi_annual_report()

In [38]:
# test_program = TIRCP_functions.program_allocation_plan()

In [39]:
# alloc_test = TIRCP_functions.allocation()

In [40]:
# alloc_test.columns

In [41]:
# tableau_test = TIRCP_functions.tableau()

In [42]:
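# NOTE: TIRCP_functions is never imported in this notebook, so this cell
# fails until `import TIRCP_functions` is added.
project_test = TIRCP_functions.project()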

NameError: name 'TIRCP_functions' is not defined

In [None]:
project_test.columns

In [None]:
# tableau_test.to_parquet("Tableau_parquet.parquet")  # to_parquet returns None, so don't reassign