# TA Review Session 7

## FINM 36700 - 2023

### UChicago Financial Mathematics
* Mani Sawhney
* msawhn2@uchicago.edu

### Case: Grantham, Mayo, and Van Otterloo, 2012: Estimating the Equity Risk Premium [9-211-051]



### Notation
(Hidden LaTeX commands)







$$\newcommand{\mux}{\tilde{\boldsymbol{\mu}}}$$
$$\newcommand{\wtan}{\boldsymbol{\text{w}}^{\text{tan}}}$$
$$\newcommand{\wtarg}{\boldsymbol{\text{w}}^{\text{port}}}$$
$$\newcommand{\mutarg}{\tilde{\boldsymbol{\mu}}^{\text{port}}}$$
$$\newcommand{\wEW}{\boldsymbol{\text{w}}^{\text{EW}}}$$
$$\newcommand{\wRP}{\boldsymbol{\text{w}}^{\text{RP}}}$$
$$\newcommand{\wREG}{\boldsymbol{\text{w}}^{\text{REG}}}$$

## Agenda
 - Lecture Review : GMO Case
 - HW 7 Highlights

## GMO Case

### Overview:

- GMO functions as sophisticated asset allocators, employing a meticulous top-down modeling approach for comprehensive market forecasting.
- The organization navigates across diverse asset classes, strategically adapting to dynamic market conditions.

### Forecasting Focus:

- A primary emphasis is placed on macroeconomic models, facilitating the identification of attractive asset classes within a strategic 7-10 year timeframe.
- This forward-looking approach distinguishes GMO's investment strategy, aligning it with the extended investment horizon.


### Long-Term Value Investing Philosophy:

- GMO stands out with its unwavering commitment to long-term value investing principles.
- Views markets as a "voting machine" in the short run, influenced by sentiment, trends, and behavioral factors.
- In the long run, perceives markets as a "weighing machine," where fundamental values prevail, allowing for strategic decision-making.

### Specialization in Fundamental Values:

- GMO's specialization lies in trading based on fundamental values over the long term.
- This strategic positioning enables GMO to remain resilient against short-term market fluctuations, unaffected by transient news or Federal Reserve announcements.

### Pros and Cons:

Pros:
- a. Scientific Approach:
    GMO's methodology is rooted in a scientific analysis framework, incorporating data-driven insights and a meticulous study of price formation related to cash flows.

- b. Value Investing Premium:
    The adoption of a value investing approach provides GMO with a premium, leveraging fundamental principles for sustainable investment strategies.

Cons:
- a. Extreme Long-Term Challenges:
    Positions built on 7-10 year forecasts can take years to pay off, which requires the discipline to hold them through extended investment horizons.
    
- b. Limited Tech Stock Exposure Drawbacks:
    During the late 1990s tech boom, GMO's light exposure to tech stocks hurt short-term returns. The same value positioning paid off when the tech bubble subsequently burst.



In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats
from scipy.stats import kurtosis, skew
from scipy.stats import norm
import seaborn as sns
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS


from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn import tree
from sklearn.neural_network import MLPRegressor

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline


import matplotlib.pyplot as plt
plt.rcParams['figure.figsize']=[15, 6]
import matplotlib.cm as cm

# Helper Functions

### Performance Summary Statistics

In [2]:
def performance_summary(return_data):
    """ 
        Returns the Performance Stats for given set of returns
        Inputs: 
            return_data - DataFrame with Date index and Monthly Returns for different assets/strategies.
        Output:
            summary_stats - DataFrame with annualized mean return, volatility, and Sharpe ratio, plus skewness, excess kurtosis,
                            VaR (0.05), CVaR (0.05), min, max, and drawdown statistics based on monthly returns.
    """
    summary_stats = return_data.mean().to_frame('Mean').apply(lambda x: x*12)
    summary_stats['Volatility'] = return_data.std().apply(lambda x: x*np.sqrt(12))
    summary_stats['Sharpe Ratio'] = summary_stats['Mean']/summary_stats['Volatility']
    
    summary_stats['Skewness'] = return_data.skew()
    summary_stats['Excess Kurtosis'] = return_data.kurtosis()
    summary_stats['VaR (0.05)'] = return_data.quantile(.05, axis = 0)
    summary_stats['CVaR (0.05)'] = return_data[return_data <= return_data.quantile(.05, axis = 0)].mean()
    summary_stats['Min'] = return_data.min()
    summary_stats['Max'] = return_data.max()
    
    wealth_index = 1000*(1+return_data).cumprod()
    previous_peaks = wealth_index.cummax()
    drawdowns = (wealth_index - previous_peaks)/previous_peaks

    summary_stats['Max Drawdown'] = drawdowns.min()
    summary_stats['Peak'] = [previous_peaks[col][:drawdowns[col].idxmin()].idxmax() for col in previous_peaks.columns]
    summary_stats['Bottom'] = drawdowns.idxmin()
    
    recovery_date = []
    for col in wealth_index.columns:
        prev_max = previous_peaks[col][:drawdowns[col].idxmin()].max()
        recovery_wealth = pd.DataFrame([wealth_index[col][drawdowns[col].idxmin():]]).T
        recovery_date.append(recovery_wealth[recovery_wealth[col] >= prev_max].index.min())
    summary_stats['Recovery'] = recovery_date
    
    return summary_stats
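
The drawdown logic inside `performance_summary` can be checked by hand on a small series. The sketch below uses five made-up monthly returns (illustrative values, not GMO data) and follows the same steps: build a wealth index, track the running peak, and measure the percentage gap between the two.

```python
import pandas as pd

# Hypothetical monthly returns, for illustration only
rets = pd.Series(
    [0.10, -0.20, -0.10, 0.30, 0.15],
    index=["2020-01", "2020-02", "2020-03", "2020-04", "2020-05"],
)

wealth = (1 + rets).cumprod()          # growth of 1 unit invested
running_peak = wealth.cummax()         # highest wealth reached so far
drawdown = wealth / running_peak - 1   # fraction below the running peak

max_dd = drawdown.min()                # most negative drawdown
bottom = drawdown.idxmin()             # month of the trough
peak = wealth.loc[:bottom].idxmax()    # month of the peak preceding the trough

print(round(max_dd, 4), peak, bottom)  # -0.28 2020-01 2020-03
```

Wealth falls from 1.10 to 0.792 over two months, a 28% drawdown, and only recovers above the old peak in the final month.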

### Time-series Regression

In [3]:
def time_series_regression(portfolio, factors, multiple_factors = False, resid = False):
    
    ff_report = pd.DataFrame(index=portfolio.columns)
    bm_residuals = pd.DataFrame(columns=portfolio.columns)

    rhs = sm.add_constant(factors)

    for portf in portfolio.columns:
        lhs = portfolio[portf]
        res = sm.OLS(lhs, rhs, missing='drop').fit()
        ff_report.loc[portf, 'alpha_hat'] = res.params['const'] * 12
        if multiple_factors:
            # One beta column per factor; position 0 of params is the constant
            for k, name in enumerate(factors.columns, start=1):
                ff_report.loc[portf, name + ' beta'] = res.params.iloc[k]
        else:
            ff_report.loc[portf, factors.name + ' beta'] = res.params.iloc[1]

        ff_report.loc[portf, 'info_ratio'] = np.sqrt(12) * res.params['const'] / res.resid.std()
        # Treynor ratio uses the beta on the first factor
        ff_report.loc[portf, 'treynor_ratio'] = 12 * portfolio[portf].mean() / res.params.iloc[1]
        ff_report.loc[portf, 'R-squared'] = res.rsquared
        ff_report.loc[portf, 'Tracking Error'] = (res.resid.std()*np.sqrt(12))

        if resid:
            bm_residuals[portf] = res.resid
            
            
        
    if resid:
        return bm_residuals
        
    return ff_report

### Tangency Weights

In [4]:
def tangency_weights(returns, cov_mat = 1):
    
    if cov_mat ==1:
        cov_inv = np.linalg.inv((returns.cov()*12))
    else:
        cov = returns.cov()
        covmat_diag = np.diag(np.diag((cov)))
        covmat = cov_mat * cov + (1-cov_mat) * covmat_diag
        cov_inv = np.linalg.inv((covmat*12))  
        
    # Use every column so the vector of ones matches the covariance matrix dimension
    ones = np.ones(len(returns.columns))
    mu = returns.mean()*12
    scaling = 1/(np.transpose(ones) @ cov_inv @ mu)
    tangent_return = scaling*(cov_inv @ mu) 
    tangency_wts = pd.DataFrame(index = returns.columns, data = tangent_return, columns = ['Tangent Weights'] )
        
    return tangency_wts
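
For reference, `tangency_weights` is meant to implement the usual tangency-portfolio formula. Writing $\Sigma$ for the annualized covariance matrix of excess returns (a symbol assumed here, not defined in the notation above) and $\mux$ for the annualized mean excess returns:

$$\wtan = \frac{\Sigma^{-1}\mux}{\boldsymbol{1}'\Sigma^{-1}\mux}$$

When `cov_mat` is not 1, the code replaces $\Sigma$ with the shrunk matrix $\delta\Sigma + (1-\delta)\,\text{diag}(\Sigma)$, where $\delta$ is the value passed as `cov_mat`.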

### Out-of-sample R-squared

In [5]:
def OOS_r2(df, factors, start):
    y = df['SPY']
    X = sm.add_constant(factors)

    forecast_err, null_err = [], []

    for i,j in enumerate(df.index):
        if i >= start:
            currX = X.iloc[:i]
            currY = y.iloc[:i]
            reg = sm.OLS(currY, currX, missing = 'drop').fit()
            null_forecast = currY.mean()
            reg_predict = reg.predict(X.iloc[[i]])
            actual = y.iloc[[i]]
            forecast_err.append(reg_predict - actual)
            null_err.append(null_forecast - actual)
            
    RSS = (np.array(forecast_err)**2).sum()
    TSS = (np.array(null_err)**2).sum()
    
    return ((1 - RSS/TSS),reg)

### OOS strategy

In [6]:
def OOS_strat(df, factors, start, weight):
    returns = []
    y = df['SPY']
    X = sm.add_constant(factors)

    for i,j in enumerate(df.index):
        if i >= start:
            currX = X.iloc[:i]
            currY = y.iloc[:i]
            reg = sm.OLS(currY, currX, missing = 'drop').fit()
            pred = reg.predict(X.iloc[[i]])
            w = pred * weight
            # w is a one-element Series; take its value by position
            returns.append((df.iloc[i]['SPY'] * w).iloc[0])

    df_strat = pd.DataFrame(data = returns, index = df.iloc[-(len(returns)):].index, columns = ['Strat Returns'])
    return df_strat

# Reading Data

In [7]:
gmo_total_ret = pd.read_excel('gmo_analysis_data.xlsx',sheet_name = 'returns (total)', index_col = 0)
gmo_total_ret.index.name = 'Date'

In [8]:
path = r'gmo_analysis_data.xlsx'
rf = pd.read_excel(path,sheet_name = 'risk-free rate', index_col = 0)
rf.index.name = 'Date'

In [9]:
path = r'gmo_analysis_data.xlsx'
gmo_signals = pd.read_excel(path,sheet_name = 'signals', index_col = 0)
gmo_signals.index.name = 'Date'

In [10]:
gmo_excess_ret = gmo_total_ret.copy()
for col in gmo_excess_ret.columns:
    gmo_excess_ret[col] = gmo_excess_ret[col] - rf['US3M']

gmo_excess_ret.tail()

Date,SPY,GMWAX
2023-06-30,0.060289,0.035234
2023-07-31,0.028108,0.019797
2023-08-31,-0.020885,-0.02565
2023-09-30,-0.052018,-0.020966
2023-10-31,-0.026367,-0.0343


## 2) Analyzing GMO

#### This section utilizes data in the file, `gmo_analysis_data.xlsx`.
#### Examine GMO's performance. Use the risk-free rate to convert the total returns to excess returns

### 2.1) Calculate the mean, volatility, and Sharpe ratio for GMWAX. Do this for three samples:

### • from inception through 2011
### • 2012-present
### • inception - present

In [11]:
sub_samples = {
              '1993-2011' : ['1993','2011'],
              '2012-2023' : ['2012','2023'],
              '1993-2023' : ['1993','2023'],
              }

gmo_sum = []
for k, (start, end) in sub_samples.items():
    sub_gmo = gmo_excess_ret.loc[start:end, ['GMWAX']].dropna()
    gmo_summary = performance_summary(sub_gmo)
    gmo_summary.index = [k]
    gmo_sum.append(gmo_summary)

gmo_summary = pd.concat(gmo_sum)
gmo_summary.loc[:,['Mean','Volatility','Sharpe Ratio']]

,Mean,Volatility,Sharpe Ratio
1993-2011,0.015827,0.125011,0.126603
2012-2023,0.036436,0.094503,0.385556
1993-2023,0.024859,0.112537,0.220898


Relative to 1993-2011, the mean excess return rose (1.6% to 3.6% annualized) and volatility fell (12.5% to 9.5%) in 2012-2023, so the Sharpe ratio roughly tripled, from 0.13 to 0.39. On these measures, GMO's forecasts and the resulting asset allocations for GMWAX worked better in the later sub-period than in 1993-2011.
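
The Sharpe ratios in the table are just the ratio of the two annualized columns, which is easy to verify. The sketch below recomputes them from the reported figures and also shows the monthly-to-annual scaling convention used in `performance_summary`; the `annualize` helper is a hypothetical name introduced only for this illustration.

```python
import math

def annualize(monthly_mean, monthly_vol, periods=12):
    """Scale monthly mean and volatility to annual terms; also return the Sharpe ratio."""
    ann_mean = monthly_mean * periods
    ann_vol = monthly_vol * math.sqrt(periods)
    return ann_mean, ann_vol, ann_mean / ann_vol

# Annualized (mean, volatility) pairs copied from the table above
table = {
    "1993-2011": (0.015827, 0.125011),
    "2012-2023": (0.036436, 0.094503),
    "1993-2023": (0.024859, 0.112537),
}

sharpe = {k: m / v for k, (m, v) in table.items()}
for k, s in sharpe.items():
    print(k, round(s, 3))
```

Because the mean scales with 12 and the volatility with $\sqrt{12}$, the annualized Sharpe ratio is $\sqrt{12}$ times the monthly one.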

### 2.2 GMO believes a risk premium is compensation for a security's tendency to lose money at "bad times". For all three samples, analyze extreme scenarios by looking at -
### • Min return
### • 5th percentile (VaR-5th)
### • Maximum  Drawdown

In [12]:
sub_samples = {
              '1993-2011' : ['1993','2011'],
              '2012-2023' : ['2012','2023'],
              '1993-2023' : ['1993','2023'],
              }

gmo_mdd = []
for k,v in sub_samples.items():
    sub_gmo = gmo_total_ret.loc[sub_samples[k][0]:sub_samples[k][1],['GMWAX']].dropna()
    gmo_drawdown = performance_summary(sub_gmo)
    gmo_drawdown = gmo_drawdown.loc[:,['Max Drawdown']]
    gmo_drawdown.index = [k]
    gmo_mdd.append(gmo_drawdown)

gmo_mdd = pd.concat(gmo_mdd)
gmo_mdd_summary = gmo_summary.loc[:,['Min','VaR (0.05)']].merge(gmo_mdd,how='inner',on=gmo_mdd.index).rename(columns={'key_0':'Sub-Sample'})
gmo_mdd_summary.index = gmo_mdd_summary['Sub-Sample']
gmo_mdd_summary = gmo_mdd_summary.drop(['Sub-Sample'],axis = 1)
gmo_mdd_summary

Sub-Sample,Min,VaR (0.05),Max Drawdown
1993-2011,-0.149179,-0.059806,-0.355219
2012-2023,-0.11865,-0.037826,-0.216773
1993-2023,-0.149179,-0.047061,-0.355219


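GMWAX appears to have relatively low tail risk on these measures: the monthly 5% VaR ranges from about -3.8% to -6.0%, and the worst month is -14.9%. All three statistics are milder in 2012-2023 than in 1993-2011, although the -35.5% maximum drawdown in the earlier sample shows that losses can still compound.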
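
The historical VaR and CVaR used here are simple empirical quantities. A minimal sketch on simulated monthly returns (random draws, not GMWAX data) shows how the three tail statistics relate to each other:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated monthly returns, for illustration only
r = pd.Series(rng.normal(loc=0.005, scale=0.04, size=240))

var_05 = r.quantile(0.05)          # 5th percentile of the monthly returns
cvar_05 = r[r <= var_05].mean()    # average return across the worst 5% of months
worst = r.min()                    # single worst month

print(round(worst, 4), round(cvar_05, 4), round(var_05, 4))
```

By construction the ordering is `worst <= cvar_05 <= var_05`: CVaR averages over everything at or beyond the VaR threshold, so it is always at least as severe as VaR.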

### 2.3) For all three samples, regress excess returns of GMWAX on excess returns of SPY.

### sub1 - 1993-2011
### sub2 - 2012-2023
### sub3 - 1993-2023


In [13]:
sub_1 = time_series_regression(gmo_excess_ret.loc['1993':'2011',['GMWAX']], gmo_excess_ret.loc['1993':'2011','SPY'])
sub_2 = time_series_regression(gmo_excess_ret.loc['2012':'2023',['GMWAX']], gmo_excess_ret.loc['2012':'2023','SPY'])
sub_3 = time_series_regression(gmo_excess_ret.loc['1993':'2023',['GMWAX']], gmo_excess_ret.loc['1993':'2023','SPY'])

sub_1.index = ['GMWAX 1993-2011']
sub_2.index = ['GMWAX 2012-2023']
sub_3.index = ['GMWAX 1993-2023']


reg_sub = pd.concat([sub_1,sub_2,sub_3])



### 2.3.a) Report the estimated alpha, beta, and r-squared.

In [14]:
reg_sub.loc[:,['SPY beta','alpha_hat','R-squared']]

,SPY beta,alpha_hat,R-squared
GMWAX 1993-2011,0.539615,-0.005751,0.507129
GMWAX 2012-2023,0.573764,-0.032654,0.754377
GMWAX 1993-2023,0.550609,-0.016567,0.582145


### 2.3.b) Is GMWAX a low-beta strategy? Has that changed since the case?

GMWAX has a moderate market beta of roughly 0.54-0.57. That is not extremely low, but it is well below 1, so we can still consider it a low-beta strategy. The beta is stable across the sub-samples (0.54 in 1993-2011 versus 0.57 in 2012-2023), so this has not changed since the case.

In [15]:
reg_sub.loc['GMWAX 2012-2023',['SPY beta','alpha_hat','R-squared']].to_frame().T

,SPY beta,alpha_hat,R-squared
GMWAX 2012-2023,0.573764,-0.032654,0.754377


Correlation between SPY and GMWAX

In [35]:
gmo_excess_ret['SPY'].corr(gmo_excess_ret['GMWAX'])

0.7629845877139098

### Performance Metrics and Strategy Evaluation:

- Regression Stats:
     Negative alpha in every subsample means GMWAX earned less than its market exposure alone would explain, which underscores how hard it is to consistently outperform the market.

- Correlation:
    A correlation of roughly 76% between SPY and GMWAX over the entire period shows that GMWAX moves closely with the market.
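
The correlation and the full-sample regression are two views of the same number: in a single-regressor OLS with an intercept, the R-squared equals the squared correlation. A quick check using the figures reported above:

```python
# Full-sample correlation between SPY and GMWAX excess returns, as reported above
corr = 0.7629845877139098
# R-squared of the 1993-2023 regression of GMWAX on SPY, as reported above
r2_full = 0.582145

implied_r2 = corr ** 2
print(round(implied_r2, 6))  # 0.582145
```

The two agree to the precision shown, so the correlation adds no information beyond the regression R-squared.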


## 3 Forecast Regressions

#### This section utilizes data in the file,`gmo_analysis_data.xlsx`.

### 3.1) Consider the lagged regression, where the regressor, ($X$), is a period behind the target, ($r^{SPY}$).
\begin{align}
r^{SPY}_t = \alpha^{SPY,X}+(\beta^{SPY,X})'X_{t-1}+\epsilon^{SPY,X}_t
\end{align}
### Estimate (1) and report the $R^2$, as well as the OLS estimates for $\alpha$ and $\beta$. Do this for...
- $X$ as a single regressor, the dividend-price ratio.
- $X$ as a single regressor, the earnings-price ratio.
- $X$ as three regressors, the dividend-price ratio, the earnings-price ratio, and the 10-year yield.

### For each, report the r-squared.
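
The key mechanical step in the lagged regression is aligning $r_t$ with $X_{t-1}$, which `shift(1)` does. The sketch below illustrates this on simulated data (random draws, not the GMO signals) and fits the regression with plain least squares rather than `statsmodels`, purely to keep the example self-contained.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200

# Simulated signal and returns, for illustration only
x = pd.Series(rng.normal(size=n), name="signal")
noise = rng.normal(scale=0.04, size=n)
r = pd.Series(0.01 + 0.002 * x.shift(1) + noise, name="ret")

# Pair r_t with X_{t-1}; the first row has no lagged signal and is dropped
data = pd.concat([r, x.shift(1).rename("signal_lag")], axis=1).dropna()

# OLS with an intercept via least squares
A = np.column_stack([np.ones(len(data)), data["signal_lag"].to_numpy()])
y = data["ret"].to_numpy()
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha_hat, beta_hat = coef

resid = y - A @ coef
r2 = 1 - resid.var() / y.var()   # in-sample R-squared

print(len(data), round(alpha_hat, 4), round(beta_hat, 4), round(r2, 4))
```

One observation is lost to the lag, and the in-sample R-squared is bounded between 0 and 1 because the regression includes an intercept.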

The appeal of dynamic trading is that by varying the weight over time, we can generate less-than-perfect correlation with the single asset being traded. Even so, these strategies remain highly correlated with the market.

In [21]:
SPY = gmo_total_ret.loc[:,['SPY']]
signal_1 = ['DP']
factor_1 = gmo_signals.loc[:,signal_1].shift(1).squeeze()
signal_reg_1 = time_series_regression(SPY, factor_1, multiple_factors=False, resid=False)
signal_reg_1.index = ['DP']
signal_reg_1


signal_2 = ['EP']
factor_2 = gmo_signals.loc[:,signal_2].shift(1).squeeze()
signal_reg_2 = time_series_regression(SPY, factor_2, multiple_factors=False, resid=False)
signal_reg_2.index = ['EP']
signal_reg_2


signal_3 = ['DP','EP','US10Y']
factor_3 = gmo_signals.loc[:,['DP','EP','US10Y']].shift(1)
signal_reg_3 = time_series_regression(SPY, factor_3, multiple_factors=True, resid=False)
signal_reg_3.index = ['DP,EP,US10Y']

display(signal_reg_1,signal_reg_2,signal_reg_3)

,alpha_hat,DP beta,info_ratio,treynor_ratio,R-squared,Tracking Error
DP,-0.113775,0.009516,-0.7659,10.851643,0.009359,0.148551


,alpha_hat,EP beta,info_ratio,treynor_ratio,R-squared,Tracking Error
EP,-0.073934,0.003252,-0.497533,31.751752,0.008692,0.148601


,alpha_hat,DP beta,EP beta,US10Y beta,info_ratio,treynor_ratio,R-squared,Tracking Error
"DP,EP,US10Y",-0.180763,0.008023,0.002694,-0.000982,-1.221168,12.870259,0.016364,0.148025


### 3.2) For each of the three regressions, let’s try to utilize the resulting forecast in a trading strategy.
- Build the forecasted SPY returns: $\hat{r}^{SPY}_{t+1}$. Note that this denotes the forecast made using $X_t$ to forecast the $(t+1)$ return.
- Set the scale of the investment in SPY equal to 100 times the forecasted value:
$$
w_t = 100 \hat{r}^{SPY}_{t+1}
$$
- We are not taking this scaling too seriously. We just want the strategy to go bigger in periods where the forecast is high and to withdraw in periods where the forecast is low, or even negative.
- Calculate the return on this strategy:
$$
r^x_{t+1} = w_tr^{SPY}_{t+1}
$$

#### You should now have the trading strategy returns, $r^x$ for each of the forecasts. For each strategy, estimate:
- mean, volatility, Sharpe,
- max-drawdown
- market alpha
- market beta
- market information ratio
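
The strategy construction described in 3.2 comes down to two element-wise products. A minimal sketch with made-up numbers (hypothetical forecasts and realized returns, already aligned so that each forecast sits next to the return it predicts):

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only
forecast = pd.Series([0.010, -0.005, 0.020, 0.000])   # forecast of r_{t+1} made at t
realized = pd.Series([0.030, -0.020, -0.010, 0.015])  # realized r_{t+1}

weight = 100 * forecast      # w_t = 100 * forecast
strat = weight * realized    # r^x_{t+1} = w_t * r_{t+1}

print(weight.tolist())  # [1.0, -0.5, 2.0, 0.0]
print(strat.round(4).tolist())  # [0.03, 0.01, -0.02, 0.0]
```

A negative forecast produces a short position, so the second period earns a positive return even though the market fell; a zero forecast keeps the strategy out of the market entirely.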

In [25]:
DP_return = (gmo_signals.loc[:,'DP'].shift(1).to_frame() * signal_reg_1['DP beta'])+signal_reg_1['alpha_hat']/12
DP_return = DP_return.rename(columns={'DP':'Forecasted Return'}) * 100
DP_forecast_return = pd.DataFrame(DP_return['Forecasted Return']*gmo_total_ret.loc[:,['SPY']]['SPY'], columns=DP_return.columns, index=DP_return.index)


EP_return = (gmo_signals.loc[:,'EP'].shift(1).to_frame() * signal_reg_2['EP beta'])+signal_reg_2['alpha_hat']/12
EP_return = EP_return.rename(columns={'EP':'Forecasted Return'}) * 100
EP_forecast_return = pd.DataFrame(EP_return['Forecasted Return']*gmo_total_ret.loc[:,['SPY']]['SPY'], columns=EP_return.columns, index=EP_return.index)


forecasted_rets = (np.array(gmo_signals.shift(1).loc[:,['DP','EP','US10Y']]) @ np.array(signal_reg_3.loc[:,['DP beta','EP beta','US10Y beta']].T))
multiple_factor_return = (pd.DataFrame(forecasted_rets,columns = ['Forecasted Return'],index= gmo_signals.index)) 
multiple_factor_return['Forecasted Return'] = (multiple_factor_return['Forecasted Return'] + signal_reg_3['alpha_hat'].iloc[0]/12)*100
multiple_forecast_return = pd.DataFrame(multiple_factor_return['Forecasted Return'] *gmo_total_ret.loc[:,['SPY']]['SPY'], columns=multiple_factor_return.columns, index=multiple_factor_return.index)



In [26]:
strategy = {'DP': DP_forecast_return.dropna(),
          'EP': EP_forecast_return.dropna(),
          'DP-EP-US10Y': multiple_forecast_return.dropna()
         }
factor = gmo_excess_ret.loc[:,['SPY']]
total_strategy_summary = []

for key,value in strategy.items():
    strat = strategy[key]
    strat_summary = performance_summary(strat)
    strat_summary.index = [key]
    strat_summary['Negative Risk Premium Months'] = len(strat[strat['Forecasted Return'] - rf['US3M'] <0])
    strat_summary['Total Months'] = len(strat)
    ts = time_series_regression(strat, factor[strat.index[0]:].squeeze(), False)
    strat_summary['Market Beta'] = ts['SPY beta'].values
    strat_summary['Market Alpha'] = ts['alpha_hat'].values
    strat_summary['Market Information Ratio'] = ts['info_ratio'].values
    
    total_strategy_summary.append(strat_summary)
    
total_strategy_df = pd.concat(total_strategy_summary)
  
total_strategy_df.loc[:,['Mean','Volatility','Sharpe Ratio','Max Drawdown','Market Beta','Market Alpha','Market Information Ratio']]





,Mean,Volatility,Sharpe Ratio,Max Drawdown,Market Beta,Market Alpha,Market Information Ratio
DP,0.109536,0.148855,0.735857,-0.65696,0.861719,0.041107,0.549035
EP,0.108053,0.128903,0.83825,-0.385314,0.733538,0.049803,0.732554
DP-EP-US10Y,0.125098,0.145607,0.859145,-0.524621,0.77817,0.063304,0.721245


### 3.3) GMO believes a risk premium is compensation for a security's tendency to lose money at "bad times". Let's consider risk characteristics.

### 3.3.a) For both strategies, the market, and GMO, calculate the monthly VaR for $\pi=.05$. Just use the quantile of the historic data for this VaR calculation.

In [27]:
market_summary = performance_summary(gmo_excess_ret.loc[:,['SPY']])
gmo_summary = performance_summary(gmo_excess_ret.loc[:,['GMWAX']].dropna())
strat_var= pd.concat([total_strategy_df.loc[:,['VaR (0.05)']],market_summary.loc[:,['VaR (0.05)']],gmo_summary.loc[:,['VaR (0.05)']]])
strat_var

,VaR (0.05)
DP,-0.052332
EP,-0.053891
DP-EP-US10Y,-0.064085
SPY,-0.073525
GMWAX,-0.047061


### 3.3.b) The GMO case mentions that stocks under-performed short-term bonds from 2000-2011. Does the dynamic portfolio above under-perform the risk-free rate over this time?

All three dynamic portfolios outperform the risk-free rate over 2000-2011.

In [29]:
strats = {'DP': DP_forecast_return.dropna(),
          'EP': EP_forecast_return.dropna(),
          'DP-EP-US10Y': multiple_forecast_return.dropna()
         }
strat_summary_0011 =[]
for k,v in strats.items():
    strat = (strats[k]['2000':'2011']['Forecasted Return']).to_frame('Forecasted Returns')
    perf_summary = performance_summary(strat)
    perf_summary.index = [k]
    strat_summary_0011.append(perf_summary)
    

strat_summary_df_0011 = pd.concat(strat_summary_0011)
strat_summary_df_0011.loc[:,['Mean','Volatility','Sharpe Ratio','Max Drawdown']]

,Mean,Volatility,Sharpe Ratio,Max Drawdown
DP,0.039708,0.186013,0.21347,-0.65696
EP,0.037708,0.134765,0.279807,-0.385314
DP-EP-US10Y,0.061472,0.158856,0.386965,-0.524621


### 3.3.c) Based on the regression estimates, in how many periods do we estimate a negative risk premium?

In [30]:
neg_risk_premium = total_strategy_df.loc[:,['Negative Risk Premium Months','Total Months']]
neg_risk_premium['Negative Risk Premium Months (%)'] = neg_risk_premium['Negative Risk Premium Months'] *100/ neg_risk_premium['Total Months']
neg_risk_premium

,Negative Risk Premium Months,Total Months,Negative Risk Premium Months (%)
DP,139,368,37.771739
EP,139,368,37.771739
DP-EP-US10Y,138,368,37.5



## 4 Out-of-Sample Forecasting

#### This section utilizes data in the file, `gmo_analysis_data.xlsx`.

Reconsider the problem above, of estimating (1) for $x$. The reported $R^2$ was the in-sample $R^2$: it examined how well the forecasts fit in the sample from which the parameters were estimated.

**In particular, focus on the case of using both dividend-price and earnings-price as signals.**

Let's consider the out-of-sample r-squared. To do so, we need the following:
- Start at $t=60$.
- Estimate (1) only using data through time $t$.
- Use the estimated parameters of (1), along with $x_t$, to calculate the out-of-sample forecast for the following period, $t+1$.
\begin{align}
\hat{r}^{SPY}_{t+1} = \hat{\alpha}^{SPY,x}_t+(\hat{\beta}^{SPY,x}_t)'x_t 
\end{align}
- Calculate the $t+1$ forecast error,
\begin{align}
  e^x_{t+1} = r^{SPY}_{t+1} - \hat{r}^{SPY}_{t+1}
\end{align}
- Move to $t=61$, and loop through the rest of the sample.

You now have the time-series of out-of-sample prediction errors, $e^x$.

Calculate the time-series of out-of-sample prediction errors $e^0$, which are based on the null forecast:
\begin{align*}
\bar{r}^{SPY}_{t+1} &= \frac{1}{t}\sum^{t}_{i=1}r^{SPY}_i \\
e^0_{t+1} &= r^{SPY}_{t+1} - \bar{r}^{SPY}_{t+1}
\end{align*}
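
The null forecast is an expanding mean shifted forward by one period, so that the forecast for $t+1$ only uses returns through $t$. A minimal sketch on made-up returns (illustrative values only):

```python
import pandas as pd

# Hypothetical returns, for illustration only
r = pd.Series([0.02, -0.01, 0.03, 0.00, 0.01])

# Mean of r_1..r_t, moved forward one step so it lines up with r_{t+1}
null_forecast = r.expanding().mean().shift(1)

# Null forecast error: realized return minus the expanding-mean forecast
e0 = r - null_forecast

print(null_forecast.round(4).tolist())
print(e0.round(4).tolist())
```

The first entry is missing because there is no history to average. Without the `shift(1)`, each forecast would include the return it is trying to predict, which would be a look-ahead bias.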


### 4.1) Report the out-of-sample $R^2$:
\begin{align}
 R^2_{OOS} \equiv 1-\frac{\sum^T_{i=61}(e^x_i)^2}{\sum^T_{i=61}(e^0_i)^2} 
\end{align}
### Note that unlike an in-sample r-squared, the out-of-sample r-squared can take any value in $(-\infty,1]$.

In [36]:
def OOS_r2(df, factors, start):
    y = df['SPY']
    X = sm.add_constant(factors)

    forecast_err, null_err = [], []

    for i,j in enumerate(df.index):
        if i >= start:
            currX = X.iloc[:i]
            currY = y.iloc[:i]
            reg = sm.OLS(currY, currX, missing = 'drop').fit()
            null_forecast = currY.mean()
            reg_predict = reg.predict(X.iloc[[i]])
            actual = y.iloc[[i]]
            forecast_err.append(reg_predict - actual)
            null_err.append(null_forecast - actual)
            
    RSS = (np.array(forecast_err)**2).sum()
    TSS = (np.array(null_err)**2).sum()
    
    return ((1 - RSS/TSS),reg)

In [336]:
oos_r2_sum = pd.concat([OOS_r2_dp,OOS_r2_ep,OOS_r2_epdp,OOS_r2_all])
oos_r2_sum

,OOS R-Squared
DP,-0.002074
EP,-0.006394
DP-EP,-0.017227
All,-0.030651


All four specifications produce a negative OOS r-squared: the regression forecasts have larger squared errors than the null forecast, which is simply the expanding mean of past SPY returns.

The OOS r-squared benchmarks our forecast against this naive alternative. The denominator is the sum of squared errors from forecasting with the sample average available at each date, that is, from doing nothing; the numerator is the sum of squared errors from the regression forecast. Nothing forces the numerator to be smaller out of sample, so the OOS r-squared can be negative, and its size indicates how much worse the forecast performed than simply using the sample average.
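
A small numeric example makes the sign of the OOS r-squared concrete. The error values below are made up for illustration and are not taken from the GMO data:

```python
import numpy as np

# Hypothetical out-of-sample errors, for illustration only
e_x = np.array([0.03, -0.02, 0.04])   # errors of the regression forecast
e_0 = np.array([0.02, -0.02, 0.03])   # errors of the null (expanding-mean) forecast

rss = np.sum(e_x ** 2)    # 0.0029
tss = np.sum(e_0 ** 2)    # 0.0017
r2_oos = 1 - rss / tss

print(round(r2_oos, 4))  # -0.7059
```

Because the regression errors are larger than the null errors in two of the three periods, the ratio exceeds 1 and the OOS r-squared turns negative.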