## Quarterly billing or new project report 
* Pull a quarterly billing or new project report from the TIRCP spreadsheet, covering any new allocations (new project IDs) made since the previous report.
* Highlight new projects in yellow.
* Sort by Project ID, smallest to largest.
* Include existing projects (no highlight).
* Give each cycle its own tab.

### Columns to keep
* Project ID
* EA
* Ph.
* Dist.
* Recipient
* Project
* Amount Available
* Allocation Amount
* Fund Type
* Budget Year
* Appropriation
* CTC Allocation Date
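
A rename map is one way to get from the snake_case columns used later in this notebook to the report headers listed above. The mapping below is an assumption based on column names only, and `REPORT_COLUMNS` and `to_report_columns` are hypothetical helpers. Headers with no obvious source column (Amount Available, Budget Year, Appropriation) are left out on purpose.

```python
import pandas as pd

# Hypothetical mapping: snake_case column -> report header (assumed, not confirmed)
REPORT_COLUMNS = {
    "allocation_project_id": "Project ID",
    "allocation_ea": "EA",
    "allocation_phase": "Ph.",
    "project_district": "Dist.",
    "allocation_grant_recipient": "Recipient",
    "project_project_title": "Project",
    "allocation_allocation_date": "CTC Allocation Date",
}


def to_report_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename known columns and keep only the ones present, in report order."""
    renamed = df.rename(columns=REPORT_COLUMNS)
    keep = [col for col in REPORT_COLUMNS.values() if col in renamed.columns]
    return renamed[keep]


# Toy frame with one column that is not part of the report
demo = pd.DataFrame(
    {"allocation_ea": ["R368GA"], "allocation_project_id": [18000009], "extra": [1]}
)
report = to_report_columns(demo)
```

Columns that are not in the map, such as `extra` here, are dropped, and the output order follows the map rather than the input frame.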

In [1]:
import A1_data_prep
import A2_tableau
import A7_accounting_analysis
import numpy as np
import pandas as pd
from babel.numbers import format_currency
from calitp import *

pd.options.display.max_columns = 100
pd.options.display.float_format = "{:.2f}".format
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

In [2]:
# Load in sheets
project = A1_data_prep.clean_project()
alloc = A1_data_prep.clean_allocation()



#### Function 1: Load "Previous" Allocation Sheet 

In [83]:
def load_previous_allocation(file_path: str, previous_sheet_name: str):
    # Load in previous allocation sheet
    previous_allocation = to_snakecase(
        pd.read_excel(
            f"{A1_data_prep.GCS_FILE_PATH}{file_path}", sheet_name=previous_sheet_name
        )
    )

    # Clean project ID
    previous_allocation = A7_accounting_analysis.clean_project_ids(
        previous_allocation, "project_id"
    )

    # Coerce project ID to numeric
    previous_allocation.project_id = previous_allocation.project_id.apply(
        pd.to_numeric, errors="coerce"
    )

    # Get the set of "previous" project IDs
    previous_project_ids = set(previous_allocation.project_id.unique().tolist())

    return previous_project_ids

In [85]:
test_set = load_previous_allocation("fake_allocation_sheet.xlsx", "fake_aa")

In [87]:
type(test_set)

set
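
The coercion step inside `load_previous_allocation` can be checked on a toy series. This is a minimal sketch with made-up values, showing how `errors="coerce"` turns anything unparseable into NaN, and that the resulting set still carries a NaN entry.

```python
import pandas as pd

# Made-up raw IDs: a zero-padded string, a clean string, text, and a missing value
raw_ids = pd.Series(["0018000009", "18000010", "TBD", None])

# errors="coerce" replaces values that cannot be parsed with NaN instead of raising
numeric_ids = pd.to_numeric(raw_ids, errors="coerce")

# Count how many IDs were lost to coercion
n_unparseable = int(numeric_ids.isna().sum())

# Same pattern as the function above: unique values collected into a set
previous_ids = set(numeric_ids.unique().tolist())
```

Checking `n_unparseable` before building the set is a cheap way to notice when a sheet has more bad IDs than expected.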

In [35]:
# Load in "previous allocation"
# previous_allocation = to_snakecase(
#        pd.read_excel(
#            f"{A1_data_prep.GCS_FILE_PATH}fake_allocation_sheet.xlsx", sheet_name="fake_aa"
#        )
#    )

In [38]:
# Clean up Project ID
# previous_allocation = A7_accounting_analysis.clean_project_ids(
#    previous_allocation,
#    "project_id",
# )

In [39]:
# previous_allocation.project_id = previous_allocation.project_id.apply(
#    pd.to_numeric, errors="coerce"
# )

#### Function 2: Allocation Sheet "Current"

In [4]:
# Columns for allocation subset
alloc_subset = [
    "allocation_award_year",
    "allocation_ppno",
    "allocation_project_id",
    "allocation_ea",
    "allocation_grant_recipient",
    "allocation_phase",
    "allocation_allocation_amount",
    "allocation_sb1_funding",
    "allocation_sb1_budget_year",
    "allocation_ggrf_funding",
    "allocation_ggrf_budget_year",
    "allocation_allocation_date",
]

In [19]:
# Subset (copy so later column assignments do not trigger SettingWithCopyWarning)
alloc2 = alloc[alloc_subset].copy()

In [10]:
# Clean up Project IDs
alloc2 = A7_accounting_analysis.clean_project_ids(
    alloc2,
    "allocation_project_id",
)


In [12]:
# Filter out rows whose project ID is the string "None" (missing IDs after the string cast).
alloc2 = (alloc2.loc[alloc2.allocation_project_id != "None"]).reset_index(drop=True)

In [18]:
len(alloc2), len(alloc.loc[alloc.allocation_project_id == "None"]), len(alloc)

(280, 94, 374)
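
The three counts above are a row-accounting check: 280 kept + 94 dropped = 374 total. A minimal sketch of the same check on a toy frame, with made-up IDs:

```python
import pandas as pd

# Toy allocation frame; "None" is the string left behind by the str cast
alloc_demo = pd.DataFrame(
    {"allocation_project_id": ["18000009", "None", "20000197", "None"]}
)

# Boolean mask for rows without a usable project ID
is_missing = alloc_demo["allocation_project_id"] == "None"

kept = alloc_demo.loc[~is_missing].reset_index(drop=True)

# Kept rows plus dropped rows should account for every original row
assert len(kept) + int(is_missing.sum()) == len(alloc_demo)
```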

In [22]:
# Coerce project IDs to numeric
alloc2.allocation_project_id = alloc2.allocation_project_id.apply(
    pd.to_numeric, errors="coerce"
)


#### Function 3: Compare new versus old project IDs

In [45]:
# Get the set of "previous" project IDs using Function 1
previous_project_ids = load_previous_allocation("fake_allocation_sheet.xlsx", "fake_aa")

In [50]:
# Get the set of "current" project IDs
current_project_ids = set(alloc2.allocation_project_id.unique().tolist())

In [51]:
new_project_ids = list(current_project_ids - previous_project_ids)

In [52]:
new_project_ids

[nan, 1.0, 2.0, 3.0, 4.0, 2203000019.0, 200000283.0]
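
The `nan` in the list above comes from IDs that failed numeric coercion. Because NaN does not compare equal to NaN, it may survive the set difference even when both sheets contain missing IDs. One possible fix, sketched here with made-up values and a hypothetical helper `find_new_ids`, is to drop NaN before comparing:

```python
import pandas as pd


def find_new_ids(current: pd.Series, previous: pd.Series) -> list:
    """Return sorted IDs present in `current` but not in `previous`, ignoring NaN."""
    current_ids = set(current.dropna().astype("int64"))
    previous_ids = set(previous.dropna().astype("int64"))
    return sorted(int(project_id) for project_id in current_ids - previous_ids)


# Toy series: both contain a missing ID, and only one ID is genuinely new
previous = pd.Series([18000009.0, float("nan")])
current = pd.Series([18000009.0, 22000242.0, float("nan")])

new_ids = find_new_ids(current, previous)
```

Casting to `int64` also removes the trailing `.0`, which makes the IDs easier to compare against the spreadsheet.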

#### Function 4: Project Sheet
* Test with 2016 first

In [20]:
project_subset = [
    "project_grant_recipient",
    "project_project_title",
    "project_tircp_award_amount__$_",
    "project_ppno",
    "project_district",
    "project_award_year",
]

In [21]:
# Subset
project2 = project[project_subset]

In [23]:
# Filter
project2 = (project2.loc[project2.project_award_year == 2016]).reset_index(drop=True)

#### Functions 5-7: Create the sheet
* One function for the first merge
* One for the melt 
* One for merging the original merged df with the melted values
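
The three steps listed above could be split up roughly as follows. This is a sketch, not the final implementation: the function names are hypothetical, the demo values (including the PPNO) are made up, and the merge keys and column names are taken from the cells in this section.

```python
import pandas as pd


def merge_alloc_project(alloc_df: pd.DataFrame, project_df: pd.DataFrame) -> pd.DataFrame:
    """Step 1: inner merge of allocation and project rows on PPNO and award year."""
    return pd.merge(
        alloc_df,
        project_df,
        how="inner",
        left_on=["allocation_ppno", "allocation_award_year"],
        right_on=["project_ppno", "project_award_year"],
    )


def melt_fund_types(merged: pd.DataFrame) -> pd.DataFrame:
    """Step 2: one row per (project ID, fund type), keeping positive amounts only."""
    long = pd.melt(
        merged,
        id_vars=["allocation_project_id"],
        value_vars=["allocation_sb1_funding", "allocation_ggrf_funding"],
        var_name="Fund Type",
        value_name="Allocation Amount",
    )
    return long.loc[long["Allocation Amount"] > 0].reset_index(drop=True)


def attach_fund_types(merged: pd.DataFrame, long: pd.DataFrame) -> pd.DataFrame:
    """Step 3: merge the melted values back, dedupe, and sort by project ID."""
    return (
        pd.merge(merged, long, how="left", on="allocation_project_id")
        .drop_duplicates()
        .sort_values("allocation_project_id")
        .reset_index(drop=True)
    )


# Made-up demo data
alloc_demo = pd.DataFrame(
    {
        "allocation_ppno": ["CP001", "CP001"],
        "allocation_award_year": [2016, 2016],
        "allocation_project_id": [22000242, 18000009],
        "allocation_sb1_funding": [8459000.0, 0.0],
        "allocation_ggrf_funding": [0.0, 250000.0],
    }
)
project_demo = pd.DataFrame(
    {
        "project_ppno": ["CP001"],
        "project_award_year": [2016],
        "project_project_title": ["Demo project"],
    }
)

merged = merge_alloc_project(alloc_demo, project_demo)
sheet = attach_fund_types(merged, melt_fund_types(merged))
```

Passing `var_name` and `value_name` to `pd.melt` names the columns directly, so no separate rename step is needed.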

In [24]:
# Merge the allocation w/ project sheet
m1 = pd.merge(
    alloc2,
    project2,
    how="inner",
    left_on=["allocation_ppno", "allocation_award_year"],
    right_on=["project_ppno", "project_award_year"],
    indicator=True,
)

In [25]:
m1.shape

(39, 19)

In [26]:
# Melt based on project id
ggrf_sb1_values = pd.melt(
    m1,
    id_vars=["allocation_project_id"],
    value_vars=["allocation_sb1_funding", "allocation_ggrf_funding"],
)

In [27]:
# Keep only values above 0
ggrf_sb1_values = (
    (ggrf_sb1_values.loc[ggrf_sb1_values["value"] > 0.00])
    .reset_index(drop=True)
    .rename(columns={"variable": "Fund Type", "value": "Allocation Amount"})
)

In [28]:
# Merge m1 w/ ggrf_sb1_values
m2 = pd.merge(
    m1.drop(columns=["_merge"]),
    ggrf_sb1_values,
    how="left",
    on=["allocation_project_id"],
)

In [29]:
m3 = m2.drop_duplicates().sort_values("allocation_project_id")

In [43]:
alloc2.columns

Index(['allocation_award_year', 'allocation_ppno', 'allocation_project_id',
       'allocation_ea', 'allocation_grant_recipient', 'allocation_phase',
       'allocation_allocation_amount', 'allocation_sb1_funding',
       'allocation_sb1_budget_year', 'allocation_ggrf_funding',
       'allocation_ggrf_budget_year', 'allocation_allocation_date'],
      dtype='object')

#### Function 8: Groupby to create sheet

In [66]:
# Duplicate project ID so highlighting can be applied
m3["allocation_project_id_2"] = m3.allocation_project_id

In [89]:
groupby_cols = [
    "project_project_title",
    "allocation_grant_recipient",
    "project_district",
    "project_tircp_award_amount__$_",
    "allocation_project_id",
    "allocation_phase",
    "allocation_ea",
    "allocation_sb1_budget_year",
    "allocation_ggrf_budget_year",
    "allocation_award_year",
    "Fund Type",
]

In [90]:
grouped_test = m3.groupby(groupby_cols).agg(
    {"Allocation Amount": "max", "allocation_project_id_2": "max"}
)
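
One thing to watch with a groupby over this many keys: by default pandas drops any row where a key column is missing, and the SB1 and GGRF budget-year columns are each blank for some rows. Whether those blanks are true NaN in the real data is not confirmed here. The sketch below uses made-up values to show the effect and the `dropna=False` option (available in pandas 1.1 and later).

```python
import pandas as pd

# Toy frame: the first row has no SB1 budget year
demo = pd.DataFrame(
    {
        "allocation_project_id": [18000009, 22000242],
        "allocation_sb1_budget_year": [None, "2020-21"],
        "Allocation Amount": [250000.0, 8459000.0],
    }
)
keys = ["allocation_project_id", "allocation_sb1_budget_year"]

# Default behaviour: the row with a missing key is silently dropped
default_groups = demo.groupby(keys).agg({"Allocation Amount": "max"})

# dropna=False keeps groups whose key contains a missing value
all_groups = demo.groupby(keys, dropna=False).agg({"Allocation Amount": "max"})
```

Comparing `len(default_groups)` with `len(all_groups)` is a quick way to tell whether any rows are being lost.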

#### Function 9: Highlight new project ID

In [61]:
# https://stackoverflow.com/questions/68439695/pandas-highlight-specific-number-with-different-color-in-dataframe
def HIGHLIGHT_COLOR(x):
    # Return one CSS background rule per value in x: yellow for IDs found in
    # the global new_project_ids list, white for everything else.
    def colour_switch(number):
        if number in new_project_ids:
            color = "yellow"
        else:
            # default
            color = "white"

        return color

    return [f"background-color: {colour_switch(number)}" for number in x]

In [91]:
grouped_test.style.apply(HIGHLIGHT_COLOR)

| project_project_title | allocation_grant_recipient | project_district | project_tircp_award_amount__$_ | allocation_project_id | allocation_phase | allocation_ea | allocation_sb1_budget_year | allocation_ggrf_budget_year | allocation_award_year | Fund Type | Allocation Amount | allocation_project_id_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACE Near-Term Capacity Improvement Program | San Joaquin Regional Rail Commission | VAR | 16459000.0 | 18000009.0 | PA&ED | R368GA | | 2016-17 | 2016.0 | allocation_ggrf_funding | 250000.0 | 18000009.0 |
| ACE Near-Term Capacity Improvement Program | San Joaquin Regional Rail Commission | VAR | 16459000.0 | 18000010.0 | CONST | R368GB | | 2016-17 | 2016.0 | allocation_ggrf_funding | 7500000.0 | 18000010.0 |
| ACE Near-Term Capacity Improvement Program | San Joaquin Regional Rail Commission | VAR | 16459000.0 | 18000287.0 | PS&E | R368GC | | 2016-17 | 2016.0 | allocation_ggrf_funding | 500000.0 | 18000287.0 |
| ACE Near-Term Capacity Improvement Program | San Joaquin Regional Rail Commission | VAR | 16459000.0 | 22000242.0 | CONST | R368GD | 2020-21 | | 2016.0 | allocation_sb1_funding | 8459000.0 | 22000242.0 |
| Airport Metro Connector 96th Street Station/Metro Green Line Extension to LAX | Los Angeles County Metropolitan Transportation Authority | 7 | 40000000.0 | 20000197.0 | CONST | R435GA | | 2019-20 | 2016.0 | allocation_ggrf_funding | 40000000.0 | 20000197.0 |
| All Aboard: Transforming Southern California Rail Travel | Los Angeles-San Diego-San Luis Obispo Rail Corridor Agency | VAR | 82000000.0 | 17000182.0 | CONST | R362GA | | 2016-17 | 2016.0 | allocation_ggrf_funding | 49995000.0 | 17000182.0 |
| All Aboard: Transforming Southern California Rail Travel | Los Angeles-San Diego-San Luis Obispo Rail Corridor Agency | VAR | 82000000.0 | 17000234.0 | CONST | R366GA | | 2016-17 | 2016.0 | allocation_ggrf_funding | 4017000.0 | 17000234.0 |
| All Aboard: Transforming Southern California Rail Travel | Los Angeles-San Diego-San Luis Obispo Rail Corridor Agency | VAR | 82000000.0 | 18000175.0 | CONST | R362GB | | 2016-17 | 2016.0 | allocation_ggrf_funding | 11988000.0 | 18000175.0 |
| All Aboard: Transforming Southern California Rail Travel | Los Angeles-San Diego-San Luis Obispo Rail Corridor Agency | VAR | 82000000.0 | 19000068.0 | CONST | R401GB | | 2017-18 | 2016.0 | allocation_ggrf_funding | 500000.0 | 19000068.0 |
| All Aboard: Transforming Southern California Rail Travel | Los Angeles-San Diego-San Luis Obispo Rail Corridor Agency | VAR | 82000000.0 | 19000069.0 | CONST | R401GA | | 2017-18 | 2016.0 | allocation_ggrf_funding | 500000.0 | 19000069.0 |
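
The highlighter above reads `new_project_ids` from the notebook's global scope. A possible variant, sketched here with a hypothetical `make_highlighter` factory and made-up IDs, takes the new IDs as an argument so it can be reused across cycles and tested without a styled DataFrame.

```python
import pandas as pd


def make_highlighter(new_ids, color: str = "yellow"):
    """Build a column-wise style function that highlights values found in new_ids."""
    new_ids = set(new_ids)

    def highlight(column: pd.Series) -> list:
        # An empty string means "no extra styling" for that cell
        return [
            f"background-color: {color}" if value in new_ids else ""
            for value in column
        ]

    return highlight


highlight_new = make_highlighter([22000242.0])
styles = highlight_new(pd.Series([18000009.0, 22000242.0]))

# With the grouped frame from this notebook, the callable could be passed to
# Styler.apply and limited to the duplicated ID column, for example:
# grouped_test.style.apply(highlight_new, subset=["allocation_project_id_2"])
```

Restricting the style to one column with `subset` avoids highlighting an allocation amount that happens to equal a project ID.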


#### Function 10: Wrap everything up.
* Each project should go in the tab for the cycle it belongs to.
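
A rough sketch of the tab-per-cycle step, assuming the cycle can be identified by `allocation_award_year` (an assumption; the real cycle definition may differ). `split_by_cycle` and `write_cycle_tabs` are hypothetical helpers, and the demo IDs are made up. The writer uses `pd.ExcelWriter`, which needs an Excel engine such as openpyxl installed, so it is defined but not called here.

```python
import pandas as pd


def split_by_cycle(df: pd.DataFrame, cycle_col: str = "allocation_award_year") -> dict:
    """Return {sheet name: rows for that cycle}, one entry per distinct cycle."""
    return {
        str(int(cycle)): group.reset_index(drop=True)
        for cycle, group in df.groupby(cycle_col)
    }


def write_cycle_tabs(df: pd.DataFrame, path: str, cycle_col: str = "allocation_award_year") -> None:
    """Write one worksheet per cycle to a single Excel workbook."""
    with pd.ExcelWriter(path) as writer:
        for sheet_name, group in split_by_cycle(df, cycle_col).items():
            group.to_excel(writer, sheet_name=sheet_name, index=False)


# Toy data covering two cycles
demo = pd.DataFrame(
    {
        "allocation_award_year": [2016.0, 2018.0, 2016.0],
        "allocation_project_id": [18000009, 20000001, 22000242],
    }
)
tabs = split_by_cycle(demo)
```

To keep the yellow highlighting in the exported file, each group would need to be written through its Styler rather than as a plain DataFrame.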