# (16) Factor Augmented Vector Autoregressive Model with Elastic Net Estimation (FAVAR_Elastic_Net)

A vector autoregressive (VAR) model with $p$ lags is defined by 

$$
Y_{t} = c + \sum_{i=1}^{p} \Phi_{i}Y_{t-i} + e_{t},
$$

where $Y_{t}$ is an $8 \times 1$ vector of endogenous variables, $c$ is an $8 \times 1$ vector of equation constants, $\Phi_{i}$ is an $8 \times 8$ matrix of coefficients to be estimated, and $e_{t}$ is an $8 \times 1$ vector of one-step-ahead forecast errors. The vector of endogenous variables $Y_{t}$ contains the target series to be forecast and seven principal components extracted from the remaining variables in the data set (every series except the target). The principal components are mutually orthogonal, and each successive component captures as much of the remaining variance in the original variable space as possible.
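The two ingredients of $Y_{t}$ can be illustrated on synthetic data. The sketch below is an illustration under stated assumptions, not the notebook's pipeline: the 30-column panel, the target series, and the small lag length `p = 3` are all made up. It checks that standardized principal component scores are mutually uncorrelated and ordered by explained variance, then stacks the 8-column state vector into the lagged regressor matrix, confirming that slicing gives the same result as the `np.roll` construction used in `MODEL` once the first $p$ rows are dropped.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_obs, n_features, n_factors, p = 240, 30, 7, 3

# Synthetic stand-ins (not real housing data): a 30-column predictor panel
# driven by 3 common shocks, and a separate target series.
shocks = rng.standard_normal((n_obs, 3))
panel = shocks @ rng.standard_normal((3, n_features)) + 0.5 * rng.standard_normal((n_obs, n_features))
target = rng.standard_normal((n_obs, 1))

# Standardize the predictors, then extract the principal component scores.
scaled = StandardScaler().fit_transform(panel)
pca = PCA(n_components=n_factors).fit(scaled)
factor_scores = pca.transform(scaled)

# The scores are mutually uncorrelated (off-diagonal correlations ~ 0) ...
corr = np.corrcoef(factor_scores, rowvar=False)
print('max |off-diagonal corr|:', np.max(np.abs(corr - np.eye(n_factors))))
# ... and ordered by the share of total variance each one explains.
print('explained variance ratio:', np.round(pca.explained_variance_ratio_, 3))

# VAR state vector Y_t = [target_t, F1_t, ..., F7_t] (8 columns).
state = np.hstack((target, factor_scores))

# Regressors for Y_t are [Y_{t-1}, ..., Y_{t-p}], newest lag first.
# Slicing avoids wrap-around; rows start at t = p.
current = state[p:]
lagged = np.hstack([state[p - l : n_obs - l] for l in range(1, p + 1)])

# Same matrix as the np.roll construction in MODEL once the first p rows are dropped.
rolled = np.hstack([np.roll(state, l, axis=0) for l in range(1, p + 1)])[p:]
print(current.shape, lagged.shape, np.allclose(lagged, rolled))  # (237, 8) (237, 24) True
```

With the notebook's settings (8 variables, 36 lags), each equation has $8 \times 36 = 288$ lagged regressors, which is one reason a penalized estimator is attractive here.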

Elastic Net estimation is applied to every equation within the VAR framework. Rather than minimizing squared forecast errors alone, Elastic Net adds a penalty on the sum of absolute coefficients ($L_{1}$, as in the Lasso) and on the sum of squared coefficients ($L_{2}$, as in Ridge Regression). The $L_{1}$ penalty can set the coefficients of less important predictors exactly to zero, so Elastic Net performs variable selection. The $L_{2}$ penalty tends to give groups of highly correlated predictors similar coefficients instead of keeping only one of them, so Elastic Net works well in the presence of multicollinearity. The loss function for each equation is
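Both properties can be seen in a small synthetic experiment. This is a sketch with made-up data: `x0` and `x1` are nearly collinear and jointly drive `y`, while eight further columns are pure noise; `alpha = 1.0` and `l1_ratio = 0.5` are illustrative values only, not the notebook's tuned hyperparameters.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
n = 1000
x0 = rng.standard_normal(n)
x1 = x0 + 0.1 * rng.standard_normal(n)      # nearly collinear with x0
noise_cols = rng.standard_normal((n, 8))    # irrelevant predictors
X = np.column_stack((x0, x1, noise_cols))
y = 3.0 * x0 + 3.0 * x1 + rng.standard_normal(n)

model = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print(np.round(model.coef_, 3))
# The L1 part sets the irrelevant coefficients exactly to zero (variable selection),
# while the L2 part spreads the shared signal across the correlated pair (grouping).
```

The two correlated predictors receive similar, shrunken, non-zero weights, whereas a pure Lasso fit would be free to favor one of them.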

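$$
L(a_{1},\dots,a_{n_{a}}) = \sum_{t}\left(y_{t+1} - f_{t+1|t}\right)^{2} + \lambda_{1}\sum_{j=1}^{n_{a}}|a_{j}| + \lambda_{2}\sum_{j=1}^{n_{a}}a_{j}^{2}
$$

where $y_{t+1}$ is the dependent variable of a single VAR equation, $f_{t+1|t}$ is its fitted value from the lagged regressors available at time $t$, and $a_{1},\dots,a_{n_{a}}$ are that equation's $n_{a}$ coefficients.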
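As a worked check of the formula, the small function below evaluates the loss term by term. The function name `elastic_net_loss` and the numbers are hypothetical, chosen so each piece can be verified by hand.

```python
import numpy as np

def elastic_net_loss(y, y_hat, a, lambda_1, lambda_2):
    """Squared forecast errors plus the L1 and L2 coefficient penalties."""
    y, y_hat, a = (np.asarray(v, dtype=float) for v in (y, y_hat, a))
    sse = np.sum((y - y_hat) ** 2)
    return sse + lambda_1 * np.sum(np.abs(a)) + lambda_2 * np.sum(a ** 2)

# SSE = 0.25 + 0.25 = 0.5, L1 term = 0.1 * (1 + 2) = 0.3, L2 term = 0.2 * (1 + 4) = 1.0,
# so the loss is 0.5 + 0.3 + 1.0 = 1.8 (up to floating-point rounding).
loss = elastic_net_loss([1.0, 2.0], [0.5, 2.5], [1.0, -2.0], lambda_1=0.1, lambda_2=0.2)
print(loss)
```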

The lag length $p$ is chosen to be long enough to yield white-noise residuals; the code below uses $p = 36$ monthly lags. The penalty parameters ($\lambda_{1},\lambda_{2}$) are chosen by minimizing the root mean squared error (RMSE) on a validation set. Validation uses an 80-20 split of the in-sample data: the initial model is estimated on the first 80% of the observations, and walk-forward cross-validation with a fixed lag length is carried out over the remaining 20%. The VAR model, together with the standardization and principal components, is re-estimated each period on an expanding window, so the model weights always reflect the most recent information. Each $h$-step-ahead forecast is produced by iterating the estimated equations forward, feeding each period's forecasts back in as lagged regressors. In the code below, the word "test" refers to the validation set and NOT to an out-of-sample test set.
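The bookkeeping of the expanding-window loop can be sketched on its own. The helper `walk_forward_splits` is hypothetical; it mirrors the loop bounds in `MODEL` (`range(test_size - step_size + 1)`) and returns, for each iteration, where the training window ends and which observation the $h$-step-ahead forecast targets.

```python
def walk_forward_splits(n_obs, train_share=0.8, step_size=1):
    """Expanding-window splits as (end of training window, index of forecast target)."""
    train_size = int(n_obs * train_share)
    test_size = n_obs - train_size
    splits = []
    for t in range(test_size - step_size + 1):
        train_end = train_size + t                  # observations [0, train_end) are used for fitting
        target_index = train_end + step_size - 1    # the h-step-ahead observation being forecast
        splits.append((train_end, target_index))
    return splits

print(walk_forward_splits(10, step_size=1))  # [(8, 8), (9, 9)]
print(walk_forward_splits(10, step_size=2))  # [(8, 9)]
```

The target indices start at `train_size + step_size - 1`, which is why the validation predictions in `MODEL` are aligned with `index_values[train_size + step_size - 1:]`.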

In the Python Scikit-Learn library, the `ElasticNet` estimator parameterizes the loss function as follows:

$$
L(a_{1},\dots,a_{n_{a}}) = \frac{1}{2n}\sum_{t}(y_{t+1} - f_{t+1|t})^{2} + \alpha \lambda_{1}^{Ratio} \sum_{j=1}^{n_{a}}|a_{j}| + \frac{\alpha (1-\lambda_{1}^{Ratio})}{2}\sum_{j=1}^{n_{a}}a_{j}^{2}
$$
where $n$ is the number of training observations, $\alpha = \lambda_{1} + \lambda_{2}$ and $\lambda_{1}^{Ratio} = \lambda_{1}/(\lambda_{1} + \lambda_{2})$. Apart from the $1/(2n)$ scaling of the squared errors and the factor of $1/2$ on the $L_{2}$ term, this is the same loss as above; the MODEL function reports $\lambda_{1} = \alpha\lambda_{1}^{Ratio}$ and $\lambda_{2} = \alpha(1-\lambda_{1}^{Ratio})$ on that basis. Here, $\alpha$ is a homogeneous hyperparameter that controls the overall strength of the penalty: scaling $\alpha$ by a constant scales both penalty terms by the same constant, so doubling $\alpha$ doubles both $\lambda_{1}$ and $\lambda_{2}$. The Elastic Net mixture is controlled by the hyperparameter $\lambda_{1}^{Ratio}$. If $\lambda_{1}^{Ratio} = 0$, the Elastic Net loss function reduces to the Ridge Regression loss function; if $\lambda_{1}^{Ratio} = 1$, it reduces to the Lasso Regression loss function. Therefore, to keep both penalties active, the constraints are $\alpha > 0$ and $0 < \lambda_{1}^{Ratio} < 1$.
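The mapping between $(\alpha, \lambda_{1}^{Ratio})$ and $(\lambda_{1}, \lambda_{2})$, and the scaling of the scikit-learn objective, can be checked numerically. In this sketch the helper names (`to_lambdas`, `to_sklearn`, `sklearn_objective`) and the synthetic regression are hypothetical; the check is that small random perturbations of the fitted coefficients never lower the objective as written above, which they should if that is the function `ElasticNet` minimizes.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def to_lambdas(alpha, l1_ratio):
    """Penalty weights as reported by MODEL: lambda_1 = alpha*ratio, lambda_2 = alpha*(1-ratio)."""
    return alpha * l1_ratio, alpha * (1.0 - l1_ratio)

def to_sklearn(lambda_1, lambda_2):
    """Inverse mapping: alpha = lambda_1 + lambda_2, ratio = lambda_1 / (lambda_1 + lambda_2)."""
    return lambda_1 + lambda_2, lambda_1 / (lambda_1 + lambda_2)

def sklearn_objective(X, y, coef, intercept, alpha, l1_ratio):
    """ElasticNet objective with the 1/(2n) loss scaling and the 1/2 factor on the L2 term."""
    resid = y - X @ coef - intercept
    return ((resid @ resid) / (2 * len(y))
            + alpha * l1_ratio * np.abs(coef).sum()
            + 0.5 * alpha * (1 - l1_ratio) * (coef @ coef))

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.standard_normal(200)

alpha, l1_ratio = 0.7, 0.31   # a point inside the grid-search region used below
fit = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, tol=1e-10, max_iter=100_000).fit(X, y)
best = sklearn_objective(X, y, fit.coef_, fit.intercept_, alpha, l1_ratio)

# Small random perturbations of the fitted coefficients should never lower the objective.
worse = [sklearn_objective(X, y, fit.coef_ + 1e-3 * rng.standard_normal(5), fit.intercept_, alpha, l1_ratio)
         for _ in range(100)]
print(to_lambdas(alpha, l1_ratio), best, min(worse) >= best)
```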

The first block of code defines the MODEL function and then grid searches over the regularization parameters. The MODEL function takes six arguments. The data argument holds the large information set. The target_name argument names the variable to be forecast; that variable is removed from the information set before the principal components are extracted from the remaining series. The lags argument sets the number of lags in each equation and should be large enough to return white-noise residuals. The penalty and mixture arguments set the regularization parameters and are passed to scikit-learn as alpha and l1_ratio, respectively. Lastly, step_size sets the forecast horizon $h$.

In the first block, MODEL returns the training and validation set RMSE values, the implied penalty weights $\lambda_{1}$ and $\lambda_{2}$, and the number of principal components (factors), and the block searches a region of $\alpha$ and $\lambda_{1}^{Ratio}$ values to identify reasonable regularization parameters. The second block redefines MODEL so that it also returns the training and validation set predicted values, sets the selected regularization parameters into the model, and returns the forecasts.
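The iterated forecasts controlled by step_size can be sketched independently of the Elastic Net fits. The function `iterate_forecasts` below is hypothetical: it takes one prediction function per VAR equation and the flattened lag vector of the last observation (newest lag first), and at each horizon prepends the new forecasts as lag 1 while the oldest lag drops off, as `MODEL` does. For a known VAR(1) without constant, the result should equal $A^{h}Y_{T}$.

```python
import numpy as np

def iterate_forecasts(predict_fns, last_lags, step_size):
    """Iterated (recursive) multi-step forecasts from one predictor per VAR equation.

    last_lags is the flattened lag vector [Y_T, Y_{T-1}, ..., Y_{T-p+1}], newest block first.
    """
    n_vars = len(predict_fns)
    window = np.asarray(last_lags, dtype=float).copy()
    path = np.zeros((step_size, n_vars))
    for h in range(step_size):
        x = window.reshape(1, -1)                        # one row, as scikit-learn's predict expects
        path[h] = [np.ravel(f(x))[0] for f in predict_fns]
        # The new forecasts become lag 1; the oldest lag block drops off.
        window = np.concatenate((path[h], window))[: window.size]
    return path

# Hypothetical VAR(1) coefficient matrix; equation i forecasts with row i of A.
A = np.array([[0.5, 0.1], [0.0, 0.8]])
predict_fns = [lambda x, row=row: x[0] @ row for row in A]
path = iterate_forecasts(predict_fns, last_lags=[1.0, 2.0], step_size=3)
print(path)   # analytically A @ y_T, A^2 @ y_T, A^3 @ y_T
```

Passing `model.predict` for fitted `ElasticNet` objects works the same way, since `np.ravel(...)[0]` extracts the single prediction from the returned array.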

In [None]:
# Load Library:
from pandas import read_csv
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Function to Fit Model using Walk-Forward Cross-Validation:
def MODEL(data, target_name = 'RHP', lags = 36, penalty = 1.0, mixture = 0.5, step_size = 1):
    # Implied Penalty Weights (lambda_1 = alpha*ratio, lambda_2 = alpha*(1-ratio)):
    lambda_1 = penalty*mixture
    lambda_2 = penalty*(1-mixture)
    # Separate Target from Feature Space:
    target = data[[target_name]].values
    feature_space = data.drop(target_name, axis = 1).values
    # Store Index Values:
    index_values = data.index.values
    # Store Number of Features and Extracted Factors:
    features = feature_space.shape[1]
    factors = 7 
    # Store Training & Test Set Sizes:
    train_size = int(data.shape[0]*0.8)
    test_size = data.shape[0] - train_size
    # Storage & Model Estimation:
    test_pred = []
    name = 'FAVAR-Type Elastic Net Regression'
    print('-'*len(name))
    print(name)
    print('-'*len(name))
    print('Alpha: ', penalty)
    print('L1 Ratio: ', mixture)
    for t in range(test_size - step_size + 1):
        # Tracking Convergence:
        print('Test Set Walk Forward: Iteration '+str(t+1))
        # Define Walk-Forward (Expanding Window) Training Sets:
        target_train = target[:train_size+t]
        feature_space_train = feature_space[:train_size+t, :]
        # Define Walk-Forward Test Set:
        target_test = target[train_size+t:]
        feature_space_test = feature_space[train_size+t:, :]
        # Define Normalization Functions:
        feature_space_normalization = StandardScaler().fit(feature_space_train)
        # Normalize Training Data:
        feature_space_train = feature_space_normalization.transform(feature_space_train)
        # Normalize Test Data:
        feature_space_test = feature_space_normalization.transform(feature_space_test)
        # Define Principal Component Analysis Function:
        pca_function = PCA(n_components = factors, random_state = 1).fit(feature_space_train)
        # Extract Training Set Factors:
        feature_space_train = pca_function.transform(feature_space_train)
        # Extract Test Set Factors:
        feature_space_test = pca_function.transform(feature_space_test)
        # Compile Data to Create Current & Lagged Data Sets:
        Compiled_Features = np.concatenate((feature_space_train, feature_space_test), axis = 0)
        Compiled_Target = np.concatenate((target_train, target_test), axis = 0)
        Transformed_Data = np.concatenate((Compiled_Target, Compiled_Features), axis = 1)
        Data_to_Use = Transformed_Data
        for l in range(1,lags+1,1):
            Lag_Transformed_Data = np.roll(Transformed_Data, l, axis = 0)
            Data_to_Use = np.append(Data_to_Use, Lag_Transformed_Data, axis = 1)
        Data_to_Use = Data_to_Use[lags:, :]
        # Create Current & Lagged Data Sets:
        Current_Data = Data_to_Use[:, 0:factors+1]
        Lagged_Data = Data_to_Use[:, factors+1:]
        # Redefine Walk-Forward Training Set Post Feature Extraction
        # (column 0 of Current_Data is RHP, columns 1..factors are the factors):
        Lagged_Data_Train = Lagged_Data[:train_size-lags+t, :]
        Current_Data_Train = Current_Data[:train_size-lags+t, :]
        RHP_Train = Current_Data_Train[:, 0]
        # Fit One Elastic Net per VAR Equation (RHP, Factor1, ..., Factor7):
        Equation_Models = []
        for k in range(factors+1):
            Equation_Model = ElasticNet(alpha = penalty, l1_ratio = mixture, random_state = 1)
            Equation_Model.fit(X = Lagged_Data_Train, y = Current_Data_Train[:, k])
            Equation_Models.append(Equation_Model)
        RHP_Model = Equation_Models[0]
        # Iterated Forecasts, starting from the lags of the first validation observation:
        n_lagged = Lagged_Data.shape[1]
        forecast_storage = Lagged_Data[train_size-lags+t, :]
        horizon_forecasts = np.zeros((step_size, factors+1))
        for h in range(step_size):
            X_h = forecast_storage[:n_lagged].reshape(1, n_lagged)
            horizon_forecasts[h, :] = [model.predict(X = X_h)[0] for model in Equation_Models]
            # Update Forecast Predictor Space: new forecasts become lag 1, the oldest lag drops off:
            forecast_storage = np.concatenate((horizon_forecasts[h, :], forecast_storage))[:n_lagged]
        RHP_horizons = horizon_forecasts[:, 0]
        # Store Forecasted Values:
        test_pred = np.append(test_pred, RHP_horizons[step_size - 1])
        # Store Training Predictions:
        if t == 0:
            train_pred = RHP_Model.predict(X = Lagged_Data_Train)
            train_RMSE = np.sqrt(mean_squared_error(RHP_Train, train_pred))
    # Model Evaluation:
    test_RMSE = np.sqrt(mean_squared_error(Current_Data[train_size-lags+step_size-1:, 0], test_pred))
    return train_RMSE, test_RMSE, lambda_1, lambda_2, factors
# Setting Seed:
np.random.seed(12345)
# Load Data:
data = read_csv('Compiled_Data.csv', header = 0, index_col = 0, parse_dates = True)
data.index = pd.DatetimeIndex(data.index.values, freq = "MS")
# Set Model Hyperparameters:
Target_Name = 'RHP'
AR_Lags = 36
L1_Ratio = np.arange(0.300,0.320,0.001)
Alpha = np.arange(0.600,0.800,0.001)
horizons = 1
# Storage for Model Results:
Results = []
for mixture in L1_Ratio:
    for penalty in Alpha:
        try:
            train_RMSE, test_RMSE, lambda_1, lambda_2, factors = MODEL(data, target_name = Target_Name, lags = AR_Lags, penalty = penalty, mixture = mixture, step_size = horizons)
            Results.append({'Lags':AR_Lags, 'Factors':factors, 'Alpha':penalty, 'L1_Ratio':mixture, 'Lambda_1':lambda_1, 'Lambda_2':lambda_2, 'Train_RMSE':train_RMSE, 'Test_RMSE':test_RMSE})
        except Exception as error:
            # Report (rather than silently skip) any failed grid point:
            print('Skipped Alpha = %.3f, L1 Ratio = %.3f: %s' % (penalty, mixture, error))
# DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first:
Results = pd.DataFrame(Results, columns = ['Lags', 'Factors', 'Alpha', 'L1_Ratio', 'Lambda_1', 'Lambda_2', 'Train_RMSE', 'Test_RMSE'])

The second block of code re-estimates the best-performing model from the grid search (the one with the lowest validation RMSE) using the selected regularization parameters.
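Selecting the winning hyperparameters by column name, rather than by position with `.iloc[0,0]`, keeps the selection correct if the column order of the results table ever changes. A minimal sketch, using a hypothetical three-row table whose values are made up and are not results from this notebook:

```python
import pandas as pd

# Hypothetical grid-search results with the same column names as the Results table.
results = pd.DataFrame([
    {'Lags': 36, 'Factors': 7, 'Alpha': 0.60, 'L1_Ratio': 0.30, 'Test_RMSE': 0.52},
    {'Lags': 36, 'Factors': 7, 'Alpha': 0.65, 'L1_Ratio': 0.31, 'Test_RMSE': 0.48},
    {'Lags': 36, 'Factors': 7, 'Alpha': 0.70, 'L1_Ratio': 0.30, 'Test_RMSE': 0.50},
])

# idxmin returns the row label of the smallest validation RMSE.
best = results.loc[results['Test_RMSE'].idxmin()]
lags, penalty, mixture = int(best['Lags']), float(best['Alpha']), float(best['L1_Ratio'])
print(lags, penalty, mixture)   # 36 0.65 0.31
```

Casting `Lags` to `int` matters because it is later used as a slice bound, and a row extracted from a mixed-type table may hold it as a float.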

In [None]:
# Load Library:
from pandas import read_csv
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Function to Fit Model using Walk-Forward Cross-Validation:
def MODEL(data, target_name = 'RHP', lags = 36, penalty = 1.0, mixture = 0.5, step_size = 1):
    # Implied Penalty Weights (lambda_1 = alpha*ratio, lambda_2 = alpha*(1-ratio)):
    lambda_1 = penalty*mixture
    lambda_2 = penalty*(1-mixture)
    # Separate Target from Feature Space:
    target = data[[target_name]].values
    feature_space = data.drop(target_name, axis = 1).values
    # Store Index Values:
    index_values = data.index.values
    # Store Number of Features and Extracted Factors:
    features = feature_space.shape[1]
    factors = 7 
    # Store Training & Test Set Sizes:
    train_size = int(data.shape[0]*0.8)
    test_size = data.shape[0] - train_size
    # Storage & Model Estimation:
    test_pred = []
    name = 'FAVAR-Type Elastic Net Regression'
    print('-'*len(name))
    print(name)
    print('-'*len(name))
    print('Alpha: ', penalty)
    print('L1 Ratio: ', mixture)
    for t in range(test_size - step_size + 1):
        # Tracking Convergence:
        print('Test Set Walk Forward: Iteration '+str(t+1))
        # Define Walk-Forward (Expanding Window) Training Sets:
        target_train = target[:train_size+t]
        feature_space_train = feature_space[:train_size+t, :]
        # Define Walk-Forward Test Set:
        target_test = target[train_size+t:]
        feature_space_test = feature_space[train_size+t:, :]
        # Define Normalization Functions:
        feature_space_normalization = StandardScaler().fit(feature_space_train)
        # Normalize Training Data:
        feature_space_train = feature_space_normalization.transform(feature_space_train)
        # Normalize Test Data:
        feature_space_test = feature_space_normalization.transform(feature_space_test)
        # Define Principal Component Analysis Function:
        pca_function = PCA(n_components = factors, random_state = 1).fit(feature_space_train)
        # Extract Training Set Factors:
        feature_space_train = pca_function.transform(feature_space_train)
        # Extract Test Set Factors:
        feature_space_test = pca_function.transform(feature_space_test)
        # Compile Data to Create Current & Lagged Data Sets:
        Compiled_Features = np.concatenate((feature_space_train, feature_space_test), axis = 0)
        Compiled_Target = np.concatenate((target_train, target_test), axis = 0)
        Transformed_Data = np.concatenate((Compiled_Target, Compiled_Features), axis = 1)
        Data_to_Use = Transformed_Data
        for l in range(1,lags+1,1):
            Lag_Transformed_Data = np.roll(Transformed_Data, l, axis = 0)
            Data_to_Use = np.append(Data_to_Use, Lag_Transformed_Data, axis = 1)
        Data_to_Use = Data_to_Use[lags:, :]
        # Create Current & Lagged Data Sets:
        Current_Data = Data_to_Use[:, 0:factors+1]
        Lagged_Data = Data_to_Use[:, factors+1:]
        # Redefine Walk-Forward Training Set Post Feature Extraction
        # (column 0 of Current_Data is RHP, columns 1..factors are the factors):
        Lagged_Data_Train = Lagged_Data[:train_size-lags+t, :]
        Current_Data_Train = Current_Data[:train_size-lags+t, :]
        RHP_Train = Current_Data_Train[:, 0]
        # Fit One Elastic Net per VAR Equation (RHP, Factor1, ..., Factor7):
        Equation_Models = []
        for k in range(factors+1):
            Equation_Model = ElasticNet(alpha = penalty, l1_ratio = mixture, random_state = 1)
            Equation_Model.fit(X = Lagged_Data_Train, y = Current_Data_Train[:, k])
            Equation_Models.append(Equation_Model)
        RHP_Model = Equation_Models[0]
        # Iterated Forecasts, starting from the lags of the first validation observation:
        n_lagged = Lagged_Data.shape[1]
        forecast_storage = Lagged_Data[train_size-lags+t, :]
        horizon_forecasts = np.zeros((step_size, factors+1))
        for h in range(step_size):
            X_h = forecast_storage[:n_lagged].reshape(1, n_lagged)
            horizon_forecasts[h, :] = [model.predict(X = X_h)[0] for model in Equation_Models]
            # Update Forecast Predictor Space: new forecasts become lag 1, the oldest lag drops off:
            forecast_storage = np.concatenate((horizon_forecasts[h, :], forecast_storage))[:n_lagged]
        RHP_horizons = horizon_forecasts[:, 0]
        # Store Forecasted Values:
        test_pred = np.append(test_pred, RHP_horizons[step_size - 1])
        # Store Training Predictions:
        if t == 0:
            train_pred = RHP_Model.predict(X = Lagged_Data_Train)
            train_RMSE = np.sqrt(mean_squared_error(RHP_Train, train_pred))
    # Model Evaluation:
    test_RMSE = np.sqrt(mean_squared_error(Current_Data[train_size-lags+step_size-1:, 0], test_pred))
    train_pred = pd.DataFrame(train_pred, index = index_values[lags:train_size], columns = ['train_pred'])
    test_pred = pd.DataFrame(test_pred, index = index_values[data.shape[0]-test_size+step_size-1:], columns = ['test_pred'])
    return train_RMSE, test_RMSE, train_pred, test_pred, lambda_1, lambda_2
# Setting Seed:
np.random.seed(12345)
# Load Data:
data = read_csv('Compiled_Data.csv', header = 0, index_col = 0, parse_dates = True)
data.index = pd.DatetimeIndex(data.index.values, freq = "MS")
# Set Model Hyperparameters (lowest validation RMSE from the grid search):
Target_Name = 'RHP'
target_series = data[[Target_Name]]
Best_Model = Results.sort_values(by = 'Test_RMSE', ascending = True).iloc[0]
lags = int(Best_Model['Lags'])
penalty = float(Best_Model['Alpha'])
mixture = float(Best_Model['L1_Ratio'])
horizons = 1
# Evaluate Model:
train_RMSE, test_RMSE, train_pred, test_pred, lambda_1, lambda_2 = MODEL(data, target_name = Target_Name, lags = lags, penalty = penalty, mixture = mixture, step_size = horizons)

The third block prints and plots the stored output from the MODEL function. The model above is fit to housing price data to forecast real housing price growth rates at the U.S. national level.

In [None]:
# Evaluate Model: Growth Rates
print('-----------------------------')
print('National Housing Price Series')
print('-----------------------------')
print('Data Type: Growth Rates')
print('Model Type: FAVAR-Type Elastic Net Regression')
print('Alpha (Strength) Hyperparameter: ', penalty)
print('L1 Ratio (Mixture) Hyperparameter: ', mixture)
print('Lambda 1 (L1) Hyperparameter: ', lambda_1)
print('Lambda 2 (L2) Hyperparameter: ', lambda_2)
print('Train RMSE: %.3f' % (train_RMSE))
print('Test RMSE: %.3f' % (test_RMSE))
# Plot Forecast: Growth Rates
sns.set_theme(style = 'whitegrid')
pyplot.figure(figsize = (12,6))
pyplot.plot(target_series, label = 'Observed')
pyplot.plot(train_pred, label = 'FAVAR_Elastic_Net: Train')
pyplot.plot(test_pred, label = 'FAVAR_Elastic_Net: Test')
pyplot.xlabel('Date')
pyplot.ylabel('Growth Rate')
pyplot.title('Real Housing Price Series (National)')
pyplot.legend()
pyplot.show()

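The fourth block of code checks the in-sample (training) residuals, the errors of the model's one-step-ahead fitted values, for whiteness and stationarity. The residuals are computed and plotted both as a time series and as a histogram. Lastly, the autocorrelation function (ACF) is plotted to check for remaining serial correlation, and the Augmented Dickey-Fuller (ADF) unit root test is carried out to check for stationarity.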

In [None]:
# Load Library:
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf
# Compute Model Residuals:
Error = pd.concat([target_series,train_pred], axis = 1)
Error = Error.dropna()
Error['Resids'] = Error.iloc[:,0] - Error.iloc[:,1]
# Plot Residuals:
sns.set_theme(style = 'whitegrid')
pyplot.figure(figsize = (16,4))
pyplot.subplot(1,2,1)
pyplot.plot(Error['Resids'])
pyplot.xlabel('Date')
pyplot.title('Residual Series')
pyplot.subplot(1,2,2)
pyplot.hist(Error['Resids'], bins = 20)
pyplot.title('Residual Distribution')
pyplot.tight_layout()
pyplot.show()
# Plot Autocorrelation Function (ACF):
sns.set_theme(style = 'whitegrid')
fig, ax = pyplot.subplots(figsize=(8,4))
plot_acf(Error['Resids'], title = 'Residual ACF', lags = 36, ax = ax)
pyplot.show()
# ADF Test: Non-Stationary v. Stationary
ADF_Test = adfuller(Error['Resids'])
print('----------------------')
print('  ADF Unit-Root Test  ')
print('----------------------')
print('Test Statistic: %.3f' % (ADF_Test[0]))
print('P-Value: %.3f' % (ADF_Test[1]))
print('Critical Values:')
for key, value in ADF_Test[4].items():
    print('%s: %.3f' % (key, value))

The last block of code loads in the previous .csv files "National_Train_Growth_One" and "National_Test_Growth_One" that contain the stored forecasted values. The storage files are then augmented to include the predicted values from the current algorithm in order to estimate the forecast combinations, produce the final "top performing" model plots, and carry out the final comparison tests for predictive accuracy.
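When a new model's predictions are added to a stored forecast table, pandas aligns the new column on the date index, so rows where the new model has no prediction become `NaN` rather than being shifted. A minimal sketch with a hypothetical four-month table and made-up values:

```python
import pandas as pd

dates = pd.date_range('2020-01-01', periods=4, freq='MS')
forecasts = pd.DataFrame({'Existing_Model': [0.1, 0.2, 0.3, 0.4]}, index=dates)

# Hypothetical new forecasts covering only the last three months.
new_pred = pd.DataFrame({'test_pred': [0.25, 0.35, 0.45]}, index=dates[1:])

# Assigning a Series aligns on the DatetimeIndex; the unmatched first month becomes NaN.
forecasts['New_Model'] = new_pred['test_pred']
print(forecasts)
```

Selecting the single column (`new_pred['test_pred']`) makes the assignment an unambiguous Series-to-column operation.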

In [None]:
# Load Forecast Tables: 
train_forecasts = read_csv('National_Train_Growth_One.csv', header = 0, index_col = 0, parse_dates = True)
train_forecasts.index = pd.DatetimeIndex(train_forecasts.index.values, freq = "MS")
test_forecasts = read_csv('National_Test_Growth_One.csv', header = 0, index_col = 0, parse_dates = True)
test_forecasts.index = pd.DatetimeIndex(test_forecasts.index.values, freq = "MS")
# Add New Forecast Model (select the single column so it aligns on the date index):
train_forecasts['FAVAR_Elastic_Net'] = train_pred['train_pred']
test_forecasts['FAVAR_Elastic_Net'] = test_pred['test_pred']
# Save Forecast:
pd.DataFrame(train_forecasts).to_csv('National_Train_Growth_One.csv')
pd.DataFrame(test_forecasts).to_csv('National_Test_Growth_One.csv')