# Equity Premium Prediction Analysis

This notebook follows Rapach et al. (2010)[^1] to evaluate the out-of-sample performance of equity premium predictions.

[^1]: Rapach, D. E., Strauss, J. K., & Zhou, G. (2010). Out-of-Sample Equity Premium Prediction: Combination Forecasts and Links to the Real Economy. The Review of Financial Studies, 23(2), 821–862. https://doi.org/10.1093/rfs/hhp063


In [175]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from datetime import datetime
import matplotlib.pyplot as plt
import sys
sys.path.append('../module')

from data_handler import get_monthly_date_format
from data_handler import get_econ_predictors
from IO_handler import post_dataframe_to_latex_table

Rapach et al. (2010) use three measures to evaluate equity premium forecasts:

1. [$R^2_{OS}$: the out-of-sample $R^2$](##out-of-sample-$r^2$)
2. [MSPE-adjusted statistic: the significance of $R^2_{OS}$](##mspe-adjusted-test-(mean-squared-prediction-error))
3. [$\Delta$: the utility gain](##the-utility-gain)

In [3]:
prediction_df = pd.read_csv('../../data/linear_prediction.csv', index_col=0, parse_dates=True, date_parser=get_monthly_date_format)
equity_premium = prediction_df.pop('Equity Premium')
historical_average = prediction_df.pop('Historical Average')

## Out-of-sample $R^2$

$$
\begin{equation}
R_{O S}^2=1-\frac{\sum_{k=q_0+1}^q\left(r_{m+k}-\hat{r}_{m+k}\right)^2}{\sum_{k=q_0+1}^q\left(r_{m+k}-\bar{r}_{m+k}\right)^2}
\end{equation}
$$

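Here, $m$ is the number of in-sample observations, $q_0$ is the length of the initial holdout period, and $q$ is the total number of out-of-sample observations, so the sums run over the forecast evaluation period. $r_{m+k}$ is the realized equity premium, $\hat{r}_{m+k}$ is the forecast being evaluated, and $\bar{r}_{m+k}$ is the historical average benchmark. A positive $R_{OS}^2$ means the forecast has a lower mean squared prediction error than the historical average.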
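
As a small illustration of the formula, the sketch below computes $R_{OS}^2$ on made-up numbers. The data and the helper name `oos_r_square` are hypothetical, and the benchmark is assumed to be an expanding mean of past returns, which may differ from how the `Historical Average` column in the CSV was built.

```python
import numpy as np

def oos_r_square(y, y_hat, y_bar):
    """Out-of-sample R^2 (as a fraction, not in percent)."""
    y, y_hat, y_bar = (np.asarray(a, dtype=float) for a in (y, y_hat, y_bar))
    ss_res = np.sum((y - y_hat) ** 2)   # squared errors of the forecast
    ss_tot = np.sum((y - y_bar) ** 2)   # squared errors of the benchmark
    return 1.0 - ss_res / ss_tot

# Hypothetical returns; the first observation only seeds the benchmark.
returns = np.array([0.02, -0.01, 0.03, 0.00, 0.01])

# Assumed benchmark: mean of all returns observed before each period.
benchmark = np.cumsum(returns)[:-1] / np.arange(1, len(returns))

realized = returns[1:]
forecast = np.array([0.00, 0.02, 0.01, 0.01])  # hypothetical model forecasts

r2 = oos_r_square(realized, forecast, benchmark)
print(f"R2_OS = {100 * r2:.2f}%")
```

A forecast identical to the benchmark gives exactly zero, which is why zero is the natural threshold for the significance test.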

In [167]:
def get_oos_r_square(y_hat: np.ndarray, y: np.ndarray, y_bar: np.ndarray) -> float:
    """
    This function calculates the out-of-sample R square for a prediction.

    Parameters
    ----------
    y_hat : np.ndarray
        Prediction values.
    y : np.ndarray
        True values.
    y_bar : np.ndarray
        Historical average.

    Returns
    -------
    float
        Out-of-sample R square, in percent.
    """
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y_bar) ** 2)
    R_2 = 1 - ss_res / ss_tot
    R_2_percentage = R_2 * 100

    return R_2_percentage

In [168]:
R_2_OOS = prediction_df.apply(lambda x: get_oos_r_square(y=equity_premium, y_hat=x, y_bar=historical_average))
R_2_OOS

Dividend Price Ratio    -0.182131
Dividend Yield          -0.171221
Earnings Price Ratio    -0.174394
Earnings Payout Ratio   -0.467257
Stock Variance          -1.493501
Book To Market          -1.084464
Net Equity Expansion    -0.359853
Treasury Bill           -0.219023
Long Term Yield         -0.613617
Long Term Return         0.186147
Term Spread             -0.522086
Default Yield Spread    -0.539182
Default Return Spread   -0.543614
Inflation                1.343229
Mean                     1.148581
Median                   0.804252
Trimmed mean             1.080225
DMSPE theta 1            1.186768
DMSPE theta 0.9          1.718607
dtype: float64

## MSPE adjusted test (Mean Squared Prediction Error)

We test whether a forecast significantly outperforms the historical average benchmark. We use the MSPE-adjusted statistic of Clark and West (2007), the same test used by Rapach et al. (2010).

The null hypothesis is $H_0: R_{OS}^2 \le 0$, tested against the one-sided alternative $H_1: R_{OS}^2 > 0$.

The test is based on the adjusted squared-error difference:
\begin{equation}
f_{t+1}=\left(r_{t+1}-\bar{r}_{t+1}\right)^2-\left[\left(r_{t+1}-\hat{r}_{t+1}\right)^2-\left(\bar{r}_{t+1}-\hat{r}_{t+1}\right)^2\right]
\end{equation}

We regress $f_{t+1}$ on a constant and take the one-sided (upper-tail) p-value of the $t$-statistic on the constant.
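
To see why this is an adjusted MSPE comparison, average $f_{t+1}$ over the out-of-sample periods. Here $n$ is shorthand, introduced only for this note, for the number of out-of-sample observations:

\begin{equation}
\bar{f}=\underbrace{\frac{1}{n} \sum\left(r_{t+1}-\bar{r}_{t+1}\right)^2}_{\text{MSPE of the historical average}}-\left[\underbrace{\frac{1}{n} \sum\left(r_{t+1}-\hat{r}_{t+1}\right)^2}_{\text{MSPE of the forecast}}-\underbrace{\frac{1}{n} \sum\left(\bar{r}_{t+1}-\hat{r}_{t+1}\right)^2}_{\text{adjustment}}\right]
\end{equation}

The adjustment term is never negative, so $\bar{f}$ can be positive, and the test can reject, even when $R_{OS}^2$ itself is slightly negative.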

In [34]:
def get_p_value_of_MSPE_adjusted_test(y:np.ndarray, y_bar:np.ndarray, y_hat:np.ndarray) -> float:
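    """
    Regress the Clark-West adjusted squared-error difference on a constant
    and return the p-value of the constant.

    Note: `pvalues` from a statsmodels OLS fit are two-sided.
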
    Parameters
    ----------
    y : np.ndarray (n_samples, 1)
    y_bar : np.ndarray (n_samples, 1)
    y_hat : np.ndarray (n_samples, 1)

    Returns
    -------
    p_value_of_MSPE_adjusted : float
    """
    F = (y - y_bar) ** 2 - ((y - y_hat) ** 2 - (y_bar - y_hat) ** 2)
    dummy = np.ones_like(F)
    lm_result = sm.OLS(F, dummy).fit()
    p_value = lm_result.pvalues.values[0]

    return p_value
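
The `pvalues` attribute of a statsmodels OLS fit is two-sided, whereas the test described above is one-sided. A minimal sketch of an upper-tail version follows, assuming standard normal critical values and a plain (not autocorrelation-robust) standard error. The function names are hypothetical, and the numbers it produces will differ from the outputs shown in this notebook.

```python
import math
import numpy as np

def one_sided_p_from_t(t_stat: float) -> float:
    """Upper-tail p-value P(Z > t_stat) for a standard normal Z."""
    return 0.5 * math.erfc(t_stat / math.sqrt(2.0))

def clark_west_one_sided_p_value(y, y_bar, y_hat) -> float:
    """One-sided p-value of the MSPE-adjusted statistic (sketch)."""
    y, y_bar, y_hat = (np.asarray(a, dtype=float) for a in (y, y_bar, y_hat))
    f = (y - y_bar) ** 2 - ((y - y_hat) ** 2 - (y_bar - y_hat) ** 2)
    # t-statistic of the mean of f, identical to the OLS t-statistic
    # from regressing f on a constant.
    t_stat = f.mean() / (f.std(ddof=1) / math.sqrt(f.size))
    return one_sided_p_from_t(t_stat)

# Hypothetical data: a forecast that matches the realized values exactly.
y = np.array([1.0, 2.0, 3.0, 4.0])
p = clark_west_one_sided_p_value(y=y, y_bar=np.zeros(4), y_hat=y)
print(round(p, 4))
```

With a one-sided test, a forecast that does worse than the benchmark gets a p-value above 0.5 instead of a small two-sided one.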


In [35]:
get_p_value_of_MSPE_adjusted_test(y = equity_premium,
y_bar = historical_average,
y_hat = prediction_df['Dividend Price Ratio'])

0.36931093740509213

In [36]:
p_value = prediction_df.apply(lambda x: get_p_value_of_MSPE_adjusted_test(y=equity_premium, y_hat=x, y_bar=historical_average))
p_value

Dividend Price Ratio     0.369311
Dividend Yield           0.355614
Earnings Price Ratio     0.758137
Earnings Payout Ratio    0.603071
Stock Variance           0.093280
Book To Market           0.392455
Net Equity Expansion     0.412408
Treasury Bill            0.043529
Long Term Yield          0.164870
Long Term Return         0.041693
Term Spread              0.026674
Default Yield Spread     0.887431
Default Return Spread    0.995998
Inflation                0.031556
Mean                     0.003736
Median                   0.002932
Trimmed mean             0.003053
DMSPE theta 1            0.003389
DMSPE theta 0.9          0.000184
dtype: float64

In [162]:
def get_significance_of_MSPE_adjusted_test(y:np.ndarray, y_bar:np.ndarray, y_hat:np.ndarray) -> str:
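    """
    Format the p-value of the MSPE-adjusted test, rounded to three digits,
    with significance stars: '*' for 0.05 < p < 0.1, '**' for
    0.01 < p <= 0.05 and '***' for p <= 0.01.
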
    Parameters
    ----------
    y : np.ndarray (n_samples, 1)
    y_bar : np.ndarray (n_samples, 1)
    y_hat : np.ndarray (n_samples, 1)

    Returns
    -------
    significance_of_MSPE_adjusted : str
    """

    p_value = get_p_value_of_MSPE_adjusted_test(y=y, y_hat=y_hat, y_bar=y_bar)
    p_value = round(p_value, ndigits=3)
    if p_value >= 0.1:
        significance = str(p_value) + ' '
    elif p_value > 0.05:
        significance = str(p_value) +' *'
    elif p_value > 0.01:
        significance = str(p_value) +' **'
    elif p_value <= 0.01:
        significance = str(p_value) +' ***'

    return significance

In [164]:
significance = prediction_df.apply(lambda x: get_significance_of_MSPE_adjusted_test(y=equity_premium, y_hat=x, y_bar=historical_average))
significance

Dividend Price Ratio        0.369 
Dividend Yield              0.356 
Earnings Price Ratio        0.758 
Earnings Payout Ratio       0.603 
Stock Variance             0.093 *
Book To Market              0.392 
Net Equity Expansion        0.412 
Treasury Bill             0.044 **
Long Term Yield             0.165 
Long Term Return          0.042 **
Term Spread               0.027 **
Default Yield Spread        0.887 
Default Return Spread       0.996 
Inflation                 0.032 **
Mean                     0.004 ***
Median                   0.003 ***
Trimmed mean             0.003 ***
DMSPE theta 1            0.003 ***
DMSPE theta 0.9            0.0 ***
dtype: object

## The Utility Gain

We assume a mean-variance investor who rebalances a portfolio of stocks and risk-free bills every month. The portfolio weights are determined by the equity premium forecast. The weight on stocks is

\begin{equation}
w_{j, t}=\left(\frac{1}{\gamma}\right)\left(\frac{\hat{r}_{t+1}}{\hat{\sigma}_{t+1}^2}\right)
\end{equation}

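The weight depends on the equity premium forecast $\hat{r}_{t+1}$ and on the forecast of the stock return variance $\hat{\sigma}_{t+1}^2$, which we estimate with a ten-year rolling window of past returns. This is in line with Rapach et al. (2010) and Campbell and Thompson (2008). In the code below, the stock weight is restricted to lie between 0 and 1.5. Over the out-of-sample period, the investor realizes an average utility of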

\begin{equation}
\hat{v}_0=\hat{\mu}_0-\left(\frac{1}{2}\right) \gamma \hat{\sigma}_0^2
\end{equation}

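Here, $\hat{\mu}_0$ and $\hat{\sigma}_0^2$ are the sample mean and variance of the portfolio return over the out-of-sample period. We set the risk-aversion coefficient to $\gamma = 3$. The utility gain $\Delta$ is the average utility obtained with a predictor's forecasts minus the average utility obtained with the historical average forecasts, reported in percent.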
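
The weight rule and the average utility can be sketched on made-up numbers as follows. The helper names and the inputs are hypothetical; the bounds of 0 and 1.5 mirror the `clip(0, 1.5)` call used in the function below, and the variance uses NumPy's default (population) definition, as that function does.

```python
import numpy as np

def stock_weight(r_hat, var_hat, gamma=3.0, lower=0.0, upper=1.5):
    """Mean-variance stock weight, restricted to [lower, upper]."""
    raw = (1.0 / gamma) * np.asarray(r_hat, dtype=float) / np.asarray(var_hat, dtype=float)
    return np.clip(raw, lower, upper)

def average_utility(portfolio_returns, gamma=3.0):
    """Average realized utility: mean minus 0.5 * gamma * variance."""
    r = np.asarray(portfolio_returns, dtype=float)
    return r.mean() - 0.5 * gamma * r.var()

# Hypothetical monthly forecasts and variance estimates.
forecasts = np.array([0.006, -0.005, 0.020])
variances = np.array([0.004, 0.004, 0.004])

weights = stock_weight(forecasts, variances)
print(weights)  # interior weight, floor at 0, cap at 1.5

# Utility gain of one hypothetical return stream over another, in percent.
gain = 100 * (average_utility([0.01, 0.03]) - average_utility([0.00, 0.02]))
print(gain)
```

The negative forecast is floored at a zero stock weight and the large forecast is capped at 1.5, so extreme forecasts cannot produce short or highly leveraged positions.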

In [147]:
def get_utility_gain_from_prediction(START_DATE: str,
                                     END_DATE: str,
                                     prediction: pd.Series,
                                     historical_average: pd.Series,
                                     rolling_window_size: int = 10,  # in years
                                     data_frequency: int = 12,  # observations per year
                                     gamma: int = 3) -> float:
    """
    Get the utility gain of a prediction over the historical average.

    Parameters
    ----------
    START_DATE : str
        Start date of the evaluation period.
        Format: YYYY-MM
    END_DATE : str
        End date of the evaluation period.
        Format: YYYY-MM
    prediction : pd.Series
        Equity premium forecasts.
    historical_average : pd.Series
        Historical average forecasts (the benchmark).
    rolling_window_size : int, optional
        Rolling window size for the variance estimate, in years.
        Default: 10
    data_frequency : int, optional
        Number of observations per year.
        Default: 12
    gamma : int, optional
        Risk-aversion coefficient.
        Default: 3

    Returns
    -------
    utility_gain : float
        Utility gain, in percent.
    """

    # Move the start date back by the rolling window so that the first
    # evaluation month already has a full window of past returns.
    START_DATE = datetime.strptime(START_DATE, '%Y-%m')
    START_DATE = str(START_DATE.year - rolling_window_size) + '-' + str(START_DATE.month)
    econ_predictors = get_econ_predictors(START_DATE=START_DATE, END_DATE=END_DATE)
    risk_free_bond = econ_predictors['Treasury Bill'] / 100
    stock_return = econ_predictors['Equity Premium']
    portfolio_df = pd.concat([stock_return, risk_free_bond], axis=1)

    # Rolling sample variance of the stock series, lagged by one period so
    # that each weight only uses data available before the return is realized.
    sample_variance = portfolio_df.iloc[:, 0].rolling(rolling_window_size * data_frequency - 1).var().dropna()
    variance_estimate = sample_variance.shift(1).dropna()

    # Benchmark portfolio: weights from the historical average forecast.
    stock_weight_0 = (1 / gamma) * (historical_average / variance_estimate)
    stock_weight_0 = stock_weight_0.clip(0, 1.5)
    portfolio_weight_0 = pd.concat([stock_weight_0, 1 - stock_weight_0], axis=1)
    w_0 = portfolio_weight_0.values.reshape(-1, 1, 2)

    # Forecast-based portfolio: weights from the prediction.
    stock_weight_1 = (1 / gamma) * (prediction / variance_estimate)
    stock_weight_1 = stock_weight_1.clip(0, 1.5)
    portfolio_weight_1 = pd.concat([stock_weight_1, 1 - stock_weight_1], axis=1)
    w_1 = portfolio_weight_1.values.reshape(-1, 1, 2)

    return_df = portfolio_df.loc[portfolio_weight_0.index] # need to change
    returns = return_df.values.reshape(-1, 2, 1)

    # Per-period portfolio return: (1 x 2 weights) @ (2 x 1 returns).
    portfolio_return_0 = (w_0 @ returns).flatten()
    portfolio_return_1 = (w_1 @ returns).flatten()

    mu_0 = np.mean(portfolio_return_0)
    var_0 = np.var(portfolio_return_0)
    utility_0 = mu_0 - 0.5 * gamma * var_0

    mu_1 = np.mean(portfolio_return_1)
    var_1 = np.var(portfolio_return_1)
    utility_1 = mu_1 - 0.5 * gamma * var_1

    utility_gain = utility_1 - utility_0
    utility_gain_percentage = utility_gain * 100

    return utility_gain_percentage

In [148]:
utility_gain = prediction_df.apply(lambda x: get_utility_gain_from_prediction(START_DATE='1965-01', 
                                                                            END_DATE='2005-04',
                                                                            historical_average=historical_average, 
                                                                            prediction=x))
utility_gain

Dividend Price Ratio     0.089991
Dividend Yield           0.130309
Earnings Price Ratio    -0.037092
Earnings Payout Ratio    0.076006
Stock Variance          -0.043466
Book To Market          -0.151456
Net Equity Expansion     0.047219
Treasury Bill            0.243582
Long Term Yield          0.236009
Long Term Return         0.160331
Term Spread              0.291191
Default Yield Spread    -0.070497
Default Return Spread   -0.018239
Inflation                0.280685
Mean                     0.281942
Median                   0.194431
Trimmed mean             0.269735
DMSPE theta 1            0.288201
DMSPE theta 0.9          0.361056
dtype: float64

In [174]:
evaluation_matric_df = pd.concat([R_2_OOS, significance, utility_gain], axis=1)
evaluation_matric_df.columns = ['R2', 'significance', 'Utility Gain']
evaluation_matric_df

| | R2 | significance | Utility Gain |
|---|---|---|---|
| Dividend Price Ratio | -0.182131 | 0.369 | 0.089991 |
| Dividend Yield | -0.171221 | 0.356 | 0.130309 |
| Earnings Price Ratio | -0.174394 | 0.758 | -0.037092 |
| Earnings Payout Ratio | -0.467257 | 0.603 | 0.076006 |
| Stock Variance | -1.493501 | 0.093 * | -0.043466 |
| Book To Market | -1.084464 | 0.392 | -0.151456 |
| Net Equity Expansion | -0.359853 | 0.412 | 0.047219 |
| Treasury Bill | -0.219023 | 0.044 ** | 0.243582 |
| Long Term Yield | -0.613617 | 0.165 | 0.236009 |
| Long Term Return | 0.186147 | 0.042 ** | 0.160331 |


In [176]:
post_dataframe_to_latex_table(evaluation_matric_df, table_name='equity premium out-of-sample forecasting')

Save table to:../../table/
