# (21) Multi-Equation Median Forecast Combination (ME_Median)

A Multi-Equation Median Forecast Combination (ME_Median) takes the best predicted values from each of the previous multi-equation (multivariate) algorithms and returns the median predicted value at each time period. Forecast combination exploits the disagreement among the individual models' predictions with the aim of producing a more accurate prediction; because the median ignores extreme values, a single poorly performing model has little influence on the combined forecast.
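
As a minimal sketch of the idea, the toy example below combines three forecasts with a row-wise median. The numbers are made up for illustration and are not taken from the housing price data; only the column names are borrowed from the model list used later.

```python
import pandas as pd

# Hypothetical forecasts from three models over three months (illustrative values only)
forecasts = pd.DataFrame(
    {
        "VAR": [0.2, 0.5, -0.1],
        "FAVAR": [0.4, 0.3, 0.0],
        "BFAVAR": [0.9, 0.4, 0.2],
    },
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
)

# Row-wise median: one combined forecast per time period
me_median = forecasts.median(axis=1)
# Row-wise mean, shown only for comparison with the median
me_mean = forecasts.mean(axis=1)

print(pd.DataFrame({"median": me_median, "mean": me_mean}))
```

In the first month the BFAVAR value sits far from the other two, so it pulls the mean upward while the median stays at the middle forecast.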

In the first block of code I define two functions. The Target_Feature_Split function takes in five arguments: the training set predicted values (train_data), the test set predicted values (test_data), the name of the target series to be forecasted (target_name), the names of the single equation forecasting algorithms (single_equation_names), and the names of the multi-equation forecasting algorithms (multiple_equation_names). The Target_Feature_Split function returns six objects: the observed series to be forecasted over the training sample (target_train), the observed series to be forecasted over the test sample (target_test), the single equation predicted values over the training (single_equation_train) and test (single_equation_test) samples, and the multi-equation predicted values over the training (multiple_equation_train) and test (multiple_equation_test) samples.

The Forecast_Combination function takes in four arguments: the observed series to be forecasted over the training sample (target_train), the observed series to be forecasted over the test sample (target_test), the forecasted values to be combined on the training set (forecasted_train), and the forecasted values to be combined on the test set (forecasted_test). The Forecast_Combination function returns the combined predictions over the training (forecast_combination_train) and test (forecast_combination_test) sets, the training set root mean squared error (train_RMSE), and the test set root mean squared error (test_RMSE).

In [None]:
# Load Library:
from pandas import read_csv
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Function to Split Observed Values from S.E. & M.E. Forecasted Values:
def Target_Feature_Split(train_data, test_data, target_name, single_equation_names, multiple_equation_names):
    # Drop Observations with Missing Values: Balances the DataFrames
    train_data = train_data.dropna()
    test_data = test_data.dropna()
    # Return Target Series:
    target_train = train_data[[target_name]]
    target_test = test_data[[target_name]]
    # Return Single Equation Forecasts:
    single_equation_train = train_data[single_equation_names]
    single_equation_test = test_data[single_equation_names]
    # Return Multiple Equation Forecasts:
    multiple_equation_train = train_data[multiple_equation_names]
    multiple_equation_test = test_data[multiple_equation_names]
    return target_train, target_test, single_equation_train, single_equation_test, multiple_equation_train, multiple_equation_test 
# Function to Derive Forecast Combination:
def Forecast_Combination(target_train, target_test, forecasted_train, forecasted_test):
    # Compute Forecast Combination: 
    forecast_combination_train = forecasted_train.median(axis = 1)
    forecast_combination_test = forecasted_test.median(axis = 1)
    # Evaluate Model:
    train_RMSE = np.sqrt(mean_squared_error(target_train.values, forecast_combination_train.values))
    test_RMSE = np.sqrt(mean_squared_error(target_test.values, forecast_combination_test.values))
    return forecast_combination_train, forecast_combination_test, train_RMSE, test_RMSE
# Set Seed:
np.random.seed(12345)
# Load in Data:
train_data = read_csv('National_Train_Growth_One.csv', header = 0, index_col = 0, parse_dates = True)
train_data.index = pd.DatetimeIndex(train_data.index.values, freq = "MS")
test_data = read_csv('National_Test_Growth_One.csv', header = 0, index_col = 0, parse_dates = True)                
test_data.index = pd.DatetimeIndex(test_data.index.values, freq = "MS")
target_data = read_csv('Housing_Prices.csv', header = 0, index_col = 0, parse_dates = True)
target_data.index = pd.DatetimeIndex(target_data.index.values, freq = "MS")
# Setting Names:
target_name = 'RHP'
single_equation_names = ['RW','RW_Drift','AR','ARMA','AR_Ridge','AR_Lasso','AR_Elastic_Net']
multiple_equation_names = ['VAR','FAVAR','BFAVAR','VAR_Ridge','VAR_Lasso','VAR_Elastic_Net','FAVAR_Ridge','FAVAR_Lasso','FAVAR_Elastic_Net','BFAVAR_Ridge','BFAVAR_Lasso','BFAVAR_Elastic_Net']
# Function to Split Data:
target_train, target_test, single_equation_train, single_equation_test, multiple_equation_train, multiple_equation_test = Target_Feature_Split(train_data, test_data, target_name, single_equation_names, multiple_equation_names)
# Evaluating Model: 
forecast_combination_train, forecast_combination_test, train_RMSE, test_RMSE = Forecast_Combination(target_train, target_test, multiple_equation_train, multiple_equation_test) 
# Setting Target Series:
target_series = target_data[[target_name]]
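
To make the evaluation step concrete, here is a small sketch on synthetic numbers that compares the RMSE of the median combination with the RMSE of each individual model. The data and the `rmse` helper are assumptions made for illustration; the helper computes the same quantity as `np.sqrt(mean_squared_error(...))` above using NumPy only.

```python
import numpy as np
import pandas as pd

def rmse(observed, predicted):
    # Hypothetical helper: root mean squared error with plain NumPy
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

# Synthetic observed series and model forecasts (illustrative values only)
observed = pd.Series([0.3, 0.4, 0.1, 0.2], name="RHP")
forecasts = pd.DataFrame(
    {
        "VAR": [0.2, 0.5, 0.0, 0.1],
        "FAVAR": [0.4, 0.3, 0.1, 0.3],
        "BFAVAR": [0.9, 0.4, 0.3, 0.2],
    }
)

# Median combination across models at each time period
combined = forecasts.median(axis=1)

# RMSE of each individual model and of the combination
per_model = {name: rmse(observed, forecasts[name]) for name in forecasts.columns}
rmse_combined = rmse(observed, combined)

for name, value in per_model.items():
    print('%s RMSE: %.3f' % (name, value))
print('ME_Median RMSE: %.3f' % rmse_combined)
```

In this constructed example the combination beats every individual model, but that is a property of the toy numbers and not a guarantee; on real data the comparison has to be checked on the test sample.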

The second block of code presents and graphs the results from the Forecast_Combination function.

In [None]:
# Evaluate Model: Growth Rates
print('-----------------------------')
print('National Housing Price Series')
print('-----------------------------')
print('Data Type: Growth Rates')
print('Model Type: Multiple Equation Median Forecast Combination')
print('Train RMSE: %.3f' % (train_RMSE))
print('Test RMSE: %.3f' % (test_RMSE))
# Plot Forecast: Growth Rates
sns.set_theme(style = 'whitegrid')
pyplot.figure(figsize = (12,6))
pyplot.plot(target_series, label = 'Observed')
pyplot.plot(forecast_combination_train, label = 'ME_Median: Train')
pyplot.plot(forecast_combination_test, label = 'ME_Median: Test')
pyplot.xlabel('Date')
pyplot.ylabel('Growth Rate')
pyplot.title('Real Housing Price Series (National)')
pyplot.legend()
pyplot.show()

The third block of code is used to analyze the training-sample forecast errors for stationarity. The forecast errors are computed, plotted as a time series, and summarized in a histogram. Lastly, the autocorrelation function (ACF) is plotted and the Augmented Dickey-Fuller (ADF) unit root test is carried out.

In [None]:
# Load Library:
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf
# Compute Model Residuals:
Error = pd.concat([target_train,forecast_combination_train], axis = 1)
Error = Error.dropna()
Error['Resids'] = Error.iloc[:,0] - Error.iloc[:,1]
# Plot Residuals:
sns.set_theme(style = 'whitegrid')
pyplot.figure(figsize = (16,4))
pyplot.subplot(1,2,1)
pyplot.plot(Error['Resids'])
pyplot.xlabel('Date')
pyplot.title('Residual Series')
pyplot.subplot(1,2,2)
pyplot.hist(Error['Resids'], bins = 20)
pyplot.title('Residual Distribution')
pyplot.tight_layout()
pyplot.show()
# Plot Autocorrelation Function (ACF):
sns.set_theme(style = 'whitegrid')
fig, ax = pyplot.subplots(figsize=(8,4))
plot_acf(Error['Resids'], title = 'Residual ACF', lags = 36, ax = ax)
pyplot.show()
# ADF Test: H0 = Unit Root (Non-Stationary) v. H1 = Stationary
ADF_Test = adfuller(Error['Resids'])
print('----------------------')
print('  ADF Unit-Root Test  ')
print('----------------------')
print('Test Statistic: %.3f' % (ADF_Test[0]))
print('P-Value: %.3f' % (ADF_Test[1]))
print('Critical Values:')
for key, value in ADF_Test[4].items():
    print('%s: %.3f' % (key, value))

The last block of code adds the predicted values from the current algorithm to the train_data and test_data DataFrames loaded in the first block and writes them back to the storage files "National_Train_Growth_One.csv" and "National_Test_Growth_One.csv". The augmented storage files are later used to produce the final "top performing" model plots and carry out the final comparison tests for predictive accuracy.

In [None]:
# Add New Forecast Model to Forecast DataFrames:
train_data['ME_Median'] = forecast_combination_train
test_data['ME_Median'] = forecast_combination_test
# Save Forecast DataFrames with New Model:
train_data.to_csv('National_Train_Growth_One.csv')
test_data.to_csv('National_Test_Growth_One.csv')
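
One detail worth checking after saving: Target_Feature_Split drops rows with missing values only inside the function, so the combined forecast can be shorter than the stored DataFrame. pandas aligns on the index when a Series is assigned as a column, which leaves NaN in the dropped periods. The sketch below reproduces this on a toy frame and confirms the new column survives a round trip through a .csv file. The data and the temporary file name are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy storage frame with one missing forecast in the first month (illustrative values only)
index = pd.date_range("2020-01-01", periods=4, freq="MS")
store = pd.DataFrame(
    {"RHP": [0.3, 0.4, 0.1, 0.2], "VAR": [np.nan, 0.5, 0.0, 0.1]},
    index=index,
)

# Combination computed on the balanced (dropna) rows only, as in Target_Feature_Split
combined = store.dropna()[["VAR"]].median(axis=1)

# Assignment aligns on the index: the dropped month becomes NaN
store["ME_Median"] = combined

# Round trip through a temporary .csv file (hypothetical file name)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy_store.csv")
    store.to_csv(path)
    reloaded = pd.read_csv(path, header=0, index_col=0, parse_dates=True)

print(reloaded)
```

The missing entries are expected and harmless here, since later steps that read the storage files can drop them again with dropna().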