# Revenue Modeling: BIRT

**Purpose:** Explore forecasts for the tax revenue of interest by testing different combinations of endogenous (`endog`) and exogenous (`exog`) variables.

Once the best fit is determined, we can copy its parameters into the main `parameters.yml` file and run reproducible model fits from the command line using `kedro run`.

## Software Setup

If changes are made to the analysis code, run the cell below to reload them:

In [None]:
%reload_kedro

Imports:

In [None]:
import pandas as pd

# Prediction functions
from fyp_analysis.pipelines.modeling.predict import (
    get_possible_endog_variables,
    run_possible_models,
    fit_var_model,
    plot_forecast_results,
)

# The main preprocess pipeline
from fyp_analysis.pipelines.data_processing.preprocess import PreprocessPipeline
from fyp_analysis.extras.datasets import load_cbo_data

In [None]:
pd.options.display.max_columns = 999

## Parameter Setup

Set up the data catalog. We can use `DATA.load()` to load a specific dataset by name.

In [None]:
DATA = catalog

Available data:

In [None]:
DATA.list()

Load the parameter dict too:

In [None]:
PARAMS = context.params

In [None]:
PARAMS

Extract specific parameters:

In [None]:
# Trim features to this start year
min_year = PARAMS["min_feature_year"]

# When is the CBO forecast from?
cbo_forecast_date = PARAMS["cbo_forecast_date"]

# First fiscal year of the plan
plan_start_year = PARAMS["plan_start_year"]

## Data Setup

Load the correlation matrix and the Granger causality matrix:

In [None]:
C = DATA.load("scaled_feature_correlations")  # correlation matrix
G = DATA.load("grangers_matrix")  # Granger causality matrix
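
To make the role of the Granger causality matrix concrete, here is a minimal sketch of how such a matrix could be filtered at a significance level. The layout (rows as responses, columns as candidate predictors, entries as p-values), the `_x`/`_y` suffixes, and the toy numbers are all assumptions for illustration; the actual structure of `G` should be checked with `G.head()`.

```python
import pandas as pd

# Hypothetical p-values: each row is a response series, each column a
# candidate predictor. These numbers are made up for illustration.
toy_G = pd.DataFrame(
    [[1.00, 0.03, 0.40], [0.20, 1.00, 0.08], [0.60, 0.01, 1.00]],
    index=["TaxBase_y", "GDP_y", "SP500_y"],
    columns=["TaxBase_x", "GDP_x", "SP500_x"],
)

alpha = 0.1  # same significance level passed to the model search later on

# p-values of every other series as a predictor of the tax base
pvalues = toy_G.loc["TaxBase_y"].drop("TaxBase_x")

# Keep predictors that pass the threshold, most significant first
significant = pvalues[pvalues < alpha].sort_values()
```

In this toy matrix only `GDP_x` passes the 0.1 threshold for the tax base.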

Load the final unscaled features:

In [None]:
unscaled_features = DATA.load("final_unscaled_features")

In [None]:
unscaled_features.head()

The final scaled features:

In [None]:
scaled_features = DATA.load("final_scaled_features")

In [None]:
scaled_features.head()

Initialize the preprocessor that transforms unscaled features into scaled features:

In [None]:
guide = DATA.load("stationary_guide")
preprocess = PreprocessPipeline(guide)

In [None]:
guide.head()
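
As a rough illustration of what going from unscaled to scaled features can involve, the sketch below applies a log transform followed by a first difference, a common way to make a trending series stationary, and then inverts it. This is an assumed example on toy data; the transformations that `PreprocessPipeline` actually applies are whatever the stationary guide specifies.

```python
import numpy as np
import pandas as pd

# Toy quarterly levels (hypothetical values, not real tax data)
levels = pd.Series(
    [100.0, 104.0, 109.0, 113.0, 120.0],
    index=["2020Q1", "2020Q2", "2020Q3", "2020Q4", "2021Q1"],
    name="ToyTaxBase",
)

# Forward transform: log, then first difference (approximate growth rate).
# The first observation has no predecessor, so it is dropped.
scaled = np.log(levels).diff().dropna()

# Inverse transform: accumulate the log changes starting from the first
# observed level, then exponentiate to get back to levels.
restored = np.exp(np.log(levels.iloc[0]) + scaled.cumsum())
```

The inverse step matters because forecasts are produced in the scaled space and have to be converted back to levels before they can be compared with actual revenue.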

Load the CBO data frame:

In [None]:
cbo_data = load_cbo_data(date=cbo_forecast_date)
cbo_columns = cbo_data.columns.tolist()

In [None]:
cbo_data.head()

## Forecast: Net Income

In [None]:
TAX_NAME = "NetIncome"
TAX_BASE_COLUMN = f"{TAX_NAME}Base"

In [None]:
SCALED_COLUMN = [col for col in scaled_features.columns if TAX_BASE_COLUMN in col][0]
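
The substring lookup above raises an `IndexError` if nothing matches and silently takes the first hit if several columns match. A slightly more defensive version is sketched below; `find_scaled_column` is a hypothetical helper, and the toy column names are assumptions about how scaled columns might be labeled.

```python
import pandas as pd

# Hypothetical scaled column names; the real names depend on the
# transformations applied during preprocessing.
toy_scaled = pd.DataFrame(columns=["D.Ln.NetIncomeBase", "D.Ln.GDP", "D.SP500"])


def find_scaled_column(columns, base_name):
    """Return the single column whose name contains `base_name`."""
    matches = [col for col in columns if base_name in col]
    if len(matches) != 1:
        raise ValueError(f"Expected one match for {base_name!r}, found {matches}")
    return matches[0]


scaled_column = find_scaled_column(toy_scaled.columns, "NetIncomeBase")
```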

### Correlations

In [None]:
C[SCALED_COLUMN].sort_values().head(n=10)

In [None]:
C[SCALED_COLUMN].sort_values().tail(n=10)
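
The two cells above read off the most negatively and most positively correlated features. A small self-contained example on synthetic data (all column names here are made up) shows how `sort_values()` orders a column of a correlation matrix:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# Synthetic features with known relationships to the toy tax base
toy = pd.DataFrame(
    {
        "TaxBase": base,
        "MovesWith": base + 0.1 * rng.normal(size=200),
        "MovesAgainst": -base + 0.1 * rng.normal(size=200),
        "Unrelated": rng.normal(size=200),
    }
)

toy_corr = toy.corr()

# Ascending sort: strongest negative correlation first, strongest positive last
ranked = toy_corr["TaxBase"].sort_values()
most_negative = ranked.index[0]
most_positive = ranked.index[-1]  # the column itself, with correlation 1
```

Because a column is perfectly correlated with itself, the last entry of the `tail` output is typically the target column, so the most informative rows sit just above it.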

Load the possible endog variables:

In [None]:
possible_endog = DATA.load("possible_endog_variables")[SCALED_COLUMN]

In [None]:
possible_endog

### Explore possible fits

In [None]:
net_income_fits = run_possible_models(
    unscaled_features,
    preprocess,
    main_endog=TAX_BASE_COLUMN,
    other_endog=[
        "ConsumerConfidence",
        "CorporateProfits",
        "UnemploymentPhillyMSA",
        "InitialClaimsPA",
        "CPIPhillyMSA",
        "SP500",
        "GDP",
        "NonresidentialInvestment",
    ],
    orders=[2, 3, 4, 5, 6, 7, 8],
    grangers=G,
    max_fit_date=["2019-12-31", "2021-06-30"],
    cbo_columns=cbo_columns,
    alpha=0.1,
    max_exog=4,
    split_year=2014,
    max_other_endog=2,
    model_quarters=[True, False],
)
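
To get a feel for the size of this search, the sketch below enumerates one plausible candidate grid from the same inputs. It assumes that each model pairs the main endogenous variable with between 1 and `max_other_endog` additional variables and that every combination of order, fit date, and quarterly flag is tried; how `run_possible_models` actually builds and prunes its grid is not shown here, and exogenous variable selection is left out entirely.

```python
from itertools import combinations, product

other_endog = [
    "ConsumerConfidence",
    "CorporateProfits",
    "UnemploymentPhillyMSA",
    "InitialClaimsPA",
    "CPIPhillyMSA",
    "SP500",
    "GDP",
    "NonresidentialInvestment",
]
orders = [2, 3, 4, 5, 6, 7, 8]
max_fit_dates = ["2019-12-31", "2021-06-30"]
model_quarters = [True, False]
max_other_endog = 2

# Assumption: every model includes 1 to max_other_endog extra endog variables
endog_sets = [
    combo
    for size in range(1, max_other_endog + 1)
    for combo in combinations(other_endog, size)
]

# Full Cartesian product of the remaining settings
candidates = [
    {
        "endog_cols": ["NetIncomeBase", *combo],
        "order": order,
        "max_fit_date": date,
        "model_quarters": quarters,
    }
    for combo, order, date, quarters in product(
        endog_sets, orders, max_fit_dates, model_quarters
    )
]

print(len(endog_sets), len(candidates))
```

Under these assumptions there are 36 endogenous sets and 1,008 candidate fits, which is why the search is run once in the notebook and only the winning parameters are carried over to the pipeline.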

In [None]:
best_params = net_income_fits[0]
best_params

In [None]:
result, net_income_forecast = fit_var_model(
    unscaled_features,
    preprocess,
    plan_start_year=plan_start_year,
    max_fit_date=best_params["max_fit_date"],
    cbo_data=cbo_data,
    endog_cols=best_params["endog_cols"],
    order=best_params["order"],
    exog_cols=best_params["exog_cols"],
    model_quarters=best_params["model_quarters"],
)
print(result.aic)

In [None]:
result.summary()

In [None]:
fig = plot_forecast_results(net_income_forecast, TAX_BASE_COLUMN);

## Forecast: Gross Receipts

In [None]:
TAX_NAME = "GrossReceipts"
TAX_BASE_COLUMN = f"{TAX_NAME}Base"

In [None]:
SCALED_COLUMN = [col for col in scaled_features.columns if TAX_BASE_COLUMN in col][0]

### Correlations

In [None]:
C[SCALED_COLUMN].sort_values().head(n=10)

In [None]:
C[SCALED_COLUMN].sort_values().tail(n=10)

Load the possible endog variables:

In [None]:
possible_endog = DATA.load("possible_endog_variables")[SCALED_COLUMN]

In [None]:
possible_endog

### Explore possible fits

In [None]:
gross_receipts_fits = run_possible_models(
    unscaled_features,
    preprocess,
    main_endog=TAX_BASE_COLUMN,
    other_endog=[
        "ConsumerConfidence",
        "CorporateProfits",
        "RealRetailFoodServiceSales",
        "CPIPhillyMSA",
        "CarSales",
        "PCEPriceIndex",
    ],
    orders=[2, 3, 4, 5, 6, 7, 8],
    grangers=G,
    max_fit_date=["2019-12-31", "2021-06-30"],
    cbo_columns=cbo_columns,
    alpha=0.1,
    max_exog=4,
    split_year=2014,
    max_other_endog=1,
    model_quarters=[True, False],
)

In [None]:
best_params = gross_receipts_fits[0]
best_params

In [None]:
result, gross_receipts_forecast = fit_var_model(
    unscaled_features,
    preprocess,
    plan_start_year=plan_start_year,
    max_fit_date=best_params["max_fit_date"],
    cbo_data=cbo_data,
    endog_cols=best_params["endog_cols"],
    order=best_params["order"],
    exog_cols=best_params["exog_cols"],
    model_quarters=best_params["model_quarters"],
)
print(result.aic)

In [None]:
result.summary()

In [None]:
fig = plot_forecast_results(gross_receipts_forecast, TAX_BASE_COLUMN);