# Volume Forecast

- **V1**:
  - Forecast by month and by year
  - **Funded loans only**: the **MBA Forecast** used here covers **funded loans**
  - **Clean Clients only**: this subset showed the **highest correlation** between the unit change in dollar amount and the unit change in loan volume, so its results are the most reliable
- **V2**:
    - Loan type does not affect seasonality; use loan purpose instead (Margie)
    - **Group by interest rate range + loan purpose**: seasonality should be easier to detect this way (Margie)
    - Split the forecast into two separate pieces:
      1) Seasonality
      2) Loan Type (loan type volumes are mainly for employee workload planning)
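
The V2 grouping can be sketched on toy data as follows. This is a minimal illustration only, not the notebook's `create_interest_rate_range_col` or `groupby_for_loans_amount_and_volume` helpers: the column names (`LoanPurpose`, `LoanAmount`, `IntRateRange`), the bucket edges, and the sample rows are all assumptions.

```python
import pandas as pd

# Toy data; column names and values are assumed for illustration only.
loans = pd.DataFrame({
    "ApplicationDate": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-02-11", "2023-02-15", "2023-02-28"]
    ),
    "LoanPurpose": ["Purchase", "Refinance", "Purchase", "Purchase", "Refinance"],
    "IntRate": [5.25, 6.10, 6.75, 6.90, 7.40],
    "LoanAmount": [300_000, 250_000, 410_000, 380_000, 275_000],
})

# Hypothetical one-point-wide buckets: [5, 6), [6, 7), [7, 8)
rate_bins = [5, 6, 7, 8]
rate_labels = ["5-6%", "6-7%", "7-8%"]
loans["IntRateRange"] = pd.cut(
    loans["IntRate"], bins=rate_bins, labels=rate_labels, right=False
)

# Year and month columns derived from the application date
loans["Year"] = loans["ApplicationDate"].dt.year
loans["Month"] = loans["ApplicationDate"].dt.month

# Loan count and total dollar amount per month, rate range, and loan purpose
monthly = (
    loans.groupby(["Year", "Month", "IntRateRange", "LoanPurpose"], observed=True)
    .agg(loan_volume=("LoanAmount", "size"), loan_amount=("LoanAmount", "sum"))
    .reset_index()
)
print(monthly)
```

Grouping on the rate bucket rather than the raw rate keeps each group large enough for a seasonal pattern to show up.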


## Imports

In [None]:
import numpy as np
import pandas as pd
import pyspark.pandas as ps
from pandas.tseries.offsets import BMonthEnd, CustomBusinessDay, MonthEnd
from functools import reduce
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
pd.options.mode.chained_assignment = None  # default='warn'

import os
import sys
from databricks.sdk.runtime import *

## Notebook imports
# nb path
sys.path.append(os.path.dirname(os.path.abspath('/Workspace/Shared/data_science/nexus_vision/Efficiency_Model_TDS1042')))
from input_data import read_sql_data, create_mba_forecast_df
import configs as c
from preprocess_data import convert_col_types, drop_nulls, create_ymd_cols, subset_data
from create_metrics import create_interest_rate_range_col, groupby_for_loans_amount_and_volume, calculate_percent_change, calulate_and_analyse_loan_volume_amount_correlations, calculate_cumulative_quarters
from create_forecast import prepare_data_for_forecast, apply_mba_forecast

## Read Data

In [None]:
sandbox = "datause1_sandbox"
folder = "nexus_vision"
dataset1 = "ds_unique_loan_record"
dataset2 = "ds_data_and_ds_data_prior"

In [None]:
# Get UniqueLoanRecord - This dataset will be used to calculate loan fallouts 
# created by Margie - pulled by Sue
unique_loans_df = read_sql_data(sandbox, folder, dataset1)

# Get DSData = DSDataPrior + DSData - has all historical-Oct8th 
# created by Margie - pulled by Cameron
dsdata = read_sql_data(sandbox, folder, dataset2)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File <command-2565087960174922>, line 3
      1 # Get UniqueLoanRecord - This dataset will be used to calculate loan fallouts 
      2 # created by Margie - pulled by Sue
----> 3 unique_loans_df = read_sql_data(sandbox, folder, dataset1)
      5 # Get DSData = DSDataPrior + DSData - has all historical-Oct8th 
      6 # created by Margie - pulled by Cameron
      7 dsdata = read_sql_data(sandbox, folder, dataset2)

File /Workspace/Shared/data_science/nexus_vision/Efficiency_Model_TDS1042/input_data.py:9, in read_sql_data(sandbox, folder, dataset)
      8 def

In [None]:
## Define parameters; DS will update these periodically as needed
dollar_amount_list = [333, 463, 444, 399, 422, 517, 543, 519]
quarters = ['Q1_23', 'Q2_23','Q3_23','Q4_23','Q1_24', 'Q2_24','Q3_24','Q4_24']
MBA_forecast = create_mba_forecast_df(dollar_amount_list, quarters)
MBA_forecast



## Data Preprocessing

In [None]:
## Drop rows with a NULL ApplicationDate: these are lead loans and should be excluded (Margie)
dsdata = drop_nulls(dsdata, 'ApplicationDate')

## convert numeric cols dsdata
dsdata = convert_col_types(dsdata)

## Create year and month cols from ApplicationDate column
dsdata = create_ymd_cols(dsdata, 'ApplicationDate', year=True, month=True, day=False, ymd=False)

## Subset dataset as desired
funded_clean_clients_loans_dsdata = subset_data(dsdata, only_funded_loans=True, only_clean_clients=True)



In [None]:
## Create interest rate ranges, to include with volume forecast 
funded_clean_clients_loans_dsdata = create_interest_rate_range_col(funded_clean_clients_loans_dsdata, 'IntRate')

## Check the correlation between the percent change in loan dollar amount and the percent change in loan volume
calulate_and_analyse_loan_volume_amount_correlations(funded_clean_clients_loans_dsdata)



Based on the correlations above, moving forward:
1) When using the MBA Forecast data, we assume that a **unit increase in dollar amount is equivalent to a unit increase in application volume**.
2) We are **more confident in the Clean-Client forecasts**, so those form the V1 forecast (for the sake of *validation*).
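
As a rough illustration of the check behind these two points, the percent change of each series can be correlated directly in pandas. This is only a sketch on made-up monthly totals; it is not the notebook's `calulate_and_analyse_loan_volume_amount_correlations` helper, and the column names and numbers are assumptions.

```python
import pandas as pd

# Toy monthly totals (hypothetical); the real values come from the
# funded, clean-client subset of the data.
monthly = pd.DataFrame({
    "loan_amount": [10.0, 12.0, 9.0, 13.5, 14.0],  # total dollars, in millions
    "loan_volume": [40, 47, 37, 52, 55],           # number of loans
})

# Month-over-month percent change for both series; the first row has no
# predecessor and is dropped
pct = monthly.pct_change().dropna()

# Pearson correlation between the two percent-change series
corr = pct["loan_amount"].corr(pct["loan_volume"])
print(f"Correlation of percent changes: {corr:.3f}")
```

A value close to 1 is what would justify treating a percent change in forecast dollars as the same percent change in volume.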

## Skeleton Volume Forecast
- Predict volume based on **historical data + seasonality adjustment**
- V1: **Forecasting Funded Loan Volumes**

### Forecasting for Loan Application Volume

- Assumptions:
  - The **MBA Mortgage Loan Originations Forecasts** are relatively close to what is expected
  - A unit change in dollar amount is equivalent to a unit change in volume (strong positive correlation)

- Logic:
  - Translate the MBA quarterly forecasts into percentage changes
  - Take the actual data for the last real quarter, **Q3_2023** (cumulative sum), and apply the forecast changes to it
  - To get monthly forecasts, split each quarterly forecast across its months (per David H's recommendation)
  - Consequently, the monthly values within the same quarter appear as a straight line
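
The logic above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of `prepare_data_for_forecast` or `apply_mba_forecast`: the MBA figures are the ones defined earlier in this notebook, while the Q3_23 actual volume of 1,200 loans and the even three-way monthly split are hypothetical.

```python
import pandas as pd

# MBA quarterly dollar forecasts, as defined earlier in the notebook
mba = pd.Series(
    [333, 463, 444, 399, 422, 517, 543, 519],
    index=["Q1_23", "Q2_23", "Q3_23", "Q4_23", "Q1_24", "Q2_24", "Q3_24", "Q4_24"],
    name="mba_dollars",
)

# Step 1: translate the quarterly forecasts into quarter-over-quarter percent changes
pct_change = mba.pct_change()

# Step 2: start from the last quarter with actual data and compound the changes forward
last_actual_quarter = "Q3_23"
last_actual_volume = 1200.0  # hypothetical funded-loan count for Q3_23
start = mba.index.get_loc(last_actual_quarter) + 1
future_changes = pct_change.iloc[start:]
quarterly_forecast = last_actual_volume * (1 + future_changes).cumprod()

# Step 3: split each quarter evenly across its three months (assumed split rule),
# which is why the months of a quarter plot as a flat line
monthly_forecast = (quarterly_forecast / 3).repeat(3)

print(quarterly_forecast.round(1))
print(monthly_forecast.round(1))
```

Because the percent changes compound, each forecast quarter equals the Q3_23 actual scaled by the ratio of that quarter's MBA figure to the Q3_23 MBA figure.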

In [None]:
data_to_forecast = prepare_data_for_forecast(funded_clean_clients_loans_dsdata)



In [None]:
display(data_to_forecast)



In [None]:
all_forecasted = apply_mba_forecast(data_to_forecast, MBA_forecast, '2023-09-01')
all_forecasted[all_forecasted['Clientkey']==168]

