In [32]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from arch import arch_model
from arch.univariate import GARCH, EWMAVariance 
from sklearn import linear_model
import scipy.stats as stats
from statsmodels.regression.rolling import RollingOLS
from sklearn.linear_model import LinearRegression
import seaborn as sns
import warnings
from scipy.stats import norm
warnings.filterwarnings("ignore")
pd.set_option("display.precision", 5)

# Q1

#### Q1.1 Consider our momentum construction of going long the biggest winners and short the biggest losers.
What is the tradeoff of focusing the long and short positions narrowly, (a single decile top and bottom) versus more broadly, (three deciles top and bottom)?
Did our empirical investigation support this theoretical tradeoff?


Answer 1.1
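When we focus on just the top and bottom deciles, we take more concentrated positions in the strongest momentum signals, but we carry higher idiosyncratic risk: a single asset or a handful of assets can dominate the allocation and cause large losses if they fail. Using three deciles on each side diversifies that risk at the cost of diluting the signal.
Yes, the empirical investigation supported this theoretical tradeoff: the narrower construction showed higher tail risk.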

#### Q1.2 We investigated LTCM’s market exposure and found it is nonlinear. Explain this nonlinearity: does it imply LTCM has high upside, large downside, both, etc?


Ans 1.2 When we regressed LTCM on SPY and on squared SPY returns, LTCM had a negative beta on SPY-squared. A negative loading on squared market returns behaves like a short position in market options: in a month with a large market move, the SPY-squared term is large and the negative beta produces heavy losses. The nonlinearity therefore points to large downside in extreme markets rather than extra upside. LTCM is effectively short volatility and is exposed to extreme moves, not hedged against them.

#### Q1.3 State one reason that Mean-Variance optimization is not robust, (i.e. the solution is fragile with respect to the inputs.)
State one approach we discussed regarding how to improve the stability of our optimized portfolio.

Ans 1.3 Mean-variance optimization requires inverting the covariance matrix, which is nearly singular when returns are highly correlated. As a result, the solution is very sensitive to its inputs, the expected returns and the covariance matrix, and both are estimated imprecisely: they change with the sample period and, for the covariance matrix, with the number of assets. The approach we discussed to improve the stability of the optimized portfolio is to use a rolling window to calculate the expected returns and the covariance matrix.

#### Q1.4 You have monthly returns from January 2001 to December 2022 for 40 portfolios of assets. You want to test the performance of some of your Linear Factor Pricing Models using Time Series and Cross Sectional regressions.
 What would be the regression sample sizes for the TS regression? How many TS regressions would you estimate? <br>
 What would be the regression sample sizes for the CS regression? How many CS regressions would you estimate?

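Ans 1.4 The sample runs from January 2001 to December 2022, which is 22 years × 12 = 264 monthly observations. <br>
In the TS test, we regress each of the 40 portfolios on the factors to get its alpha and betas. That gives 40 TS regressions, each with a sample size of 264. <br>
In the CS test, we regress the 40 average portfolio returns on the betas estimated in the TS regressions. That gives one CS regression with a sample size of 40.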
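
As a quick check on the counts, here is a minimal sketch. The helper `months_inclusive` is hypothetical and introduced only for illustration, and the sketch assumes the cross-sectional test is run once on sample-average returns rather than period by period.

```python
from datetime import date

def months_inclusive(start: date, end: date) -> int:
    """Number of monthly observations from the start month through the end month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

n_months = months_inclusive(date(2001, 1, 1), date(2022, 12, 1))
n_portfolios = 40

# Time-series test: one regression per portfolio, each over all months.
ts_regressions, ts_sample_size = n_portfolios, n_months
# Cross-sectional test on sample means: a single regression across portfolios.
cs_regressions, cs_sample_size = 1, n_portfolios

print(ts_regressions, ts_sample_size, cs_regressions, cs_sample_size)  # 40 264 1 40
```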

#### 1.5 GMO stated that they had a “contrarian” investment style. What did they mean by this? Was this seen in our investigation of the fund, GMWAX?
- When the market has an **overly optimistic view of future dividends**, prices would exceed fair value. Investors would then **eventually** realize they were too optimistic and prices would revert toward **fair value.**
- Thus, at times of high prices GMO would take the contrarian view that expected returns are low.
- Indeed, in 2012 GMO believed valuations were high for stocks. They were pessimistic about future earnings growth, as they believed stock prices were already high and did not have much more growth potential.
  - $\%\Delta(\frac{P}{E})$ multiple was 0% in 2011, causing positive estimated returns.
  - $\%\Delta(S)$ or Real Sales Growth was 2.9% up from 2.7% in 2011, causing positive estimated returns.

#### 1.6 How does Harvard make their portfolio allocation more realistic than a basic mean-variance optimization would imply? Is their approach easily implemented and computed from a numerical standpoint?

Ans 1.6 Harvard places bounds on the portfolio allocation, along with a long-only constraint on non-cash assets, rather than implementing whatever numbers come out of the MV optimization problem. The solution is numerical (rather than an explicit formula) due to the inequality constraints. While the solution is computationally easy, it requires many boundary parameters, which greatly influence the solution. Thus, the problem may be overparameterized, with little guidance on how to set the parameters and a temptation to choose them to achieve a desired solution.

#### 1.7 If Barnstable’s assumptions hold, (log iid returns, normally distributed,) then how will an investment’s Sharpe ratio compare across short and long-term horizons? Explain.

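Ans 1.7 The long-horizon Sharpe ratio will be higher than the short-horizon Sharpe ratio. With iid log returns, the mean of the cumulative return grows linearly with the horizon while the volatility grows with its square root, so the Sharpe ratio grows with the square root of the horizon. The 100-year market Sharpe ratio would then be about 10x the 1-year Sharpe ratio. The log iid assumption is strong, but we saw much evidence that Sharpe ratios grow nearly with the square root of the time horizon.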
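
The square-root scaling can be illustrated with a short sketch. The function `horizon_sharpe` and the annual inputs are hypothetical, chosen only to show the arithmetic under the iid assumption.

```python
import math

def horizon_sharpe(mu: float, sigma: float, h: float) -> float:
    """Sharpe ratio of cumulative log excess returns over h periods under iid returns."""
    # The mean scales with h, the volatility with sqrt(h).
    return (mu * h) / (sigma * math.sqrt(h))

# Hypothetical annual mean and volatility, for illustration only.
mu, sigma = 0.06, 0.16
ratio = horizon_sharpe(mu, sigma, 100) / horizon_sharpe(mu, sigma, 1)
print(round(ratio, 6))  # 10.0
```

The ratio does not depend on the chosen `mu` and `sigma`; only the horizon matters.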

#### Q1.8 Does Uncovered Interest Parity (UIP) imply Covered Interest Parity (CIP)? Or vice-versa? Or neither? Explain.

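Ans 1.8 Neither implies the other. Covered Interest Parity (CIP) is a no-arbitrage relation that links the forward exchange rate to the spot rate and the interest rate differential, and it uses only prices known today. Uncovered Interest Parity (UIP) is a statement about the expected future spot rate: expected currency growth should offset the interest rate differential. CIP can hold exactly while UIP fails, because the forward rate need not equal the expected future spot rate.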

#### Q1.9  Name and briefly explain two reasons why we said it is very hard for investors to understand the mean returns of managed funds.

Ans 1.9
- The first reason is that, as investors, we cannot observe the population mean of returns, μ. We must try to infer it.
- The second reason is that the mean return of a managed fund is not a simple function of the mean return of the underlying assets: it also depends on the skill of the manager and the fees charged by the manager. Returns also turn fragile when investors chase performance. And even when a return can be explained by factors, it is not clear that investors would find it convenient to build such returns themselves.

#### Q1.10 Suppose we have a strategy with returns, $r_t^i$. If we want to hedge our position with respect to SPY, how could we calculate the optimal ratio? How would this ratio then be used to build the hedged position?

Ans 1.10 We would regress our strategy's returns on SPY returns to see how much of our return is explained by SPY. The regression slope, beta, is the optimal hedge ratio. For every dollar invested in the strategy, we would take an offsetting position of beta dollars in SPY (or an ETF that tracks it), that is, short SPY if we are long our portfolio and beta is positive.
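
The hedge construction can be sketched on simulated data. Everything here is an assumption made for illustration: the return series are random draws, and the true beta of 1.5 is chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated excess returns; the true beta of 1.5 is an assumption of this toy example.
spy = rng.normal(0.005, 0.04, size=5000)
strategy = 0.002 + 1.5 * spy + rng.normal(0.0, 0.01, size=5000)

# OLS of the strategy on SPY with an intercept: the slope is the hedge ratio.
X = np.column_stack([np.ones_like(spy), spy])
alpha, beta = np.linalg.lstsq(X, strategy, rcond=None)[0]

# Hedged position: long the strategy, short beta units of SPY per unit of strategy.
hedged = strategy - beta * spy

print(round(float(beta), 2))
print(abs(np.corrcoef(hedged, spy)[0, 1]) < 1e-8)  # hedged returns are uncorrelated with SPY in sample
```

By construction of OLS, the hedged series is uncorrelated with SPY in sample; out of sample the hedge is only as good as the stability of beta.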

# Q2

In [33]:
forecast = pd.read_excel("final_exam_data.xlsx", 2, index_col=0)
forecast.head()

Date,GLD,Tbill rate,Tbill change
2009-04-19,-0.01263,0.13,-0.045
2009-04-26,0.0528,0.095,-0.035
2009-05-03,-0.03087,0.145,0.05
2009-05-10,0.03485,0.165,0.02
2009-05-17,0.01745,0.155,-0.01


In [34]:
forecast_gld = pd.DataFrame(forecast["GLD"])
forecast_gld.head()

Date,GLD
2009-04-19,-0.01263
2009-04-26,0.0528
2009-05-03,-0.03087
2009-05-10,0.03485
2009-05-17,0.01745


#### Q2.1 Calculate the 5th percentile VaR and CVaR for GLD as of the end of the sample using the empirical CDF approach over the full sample of data.


In [122]:
## Var

rets = forecast_gld['GLD']

#calculate the value at risk at 95% confidence interval

rets_var = rets.quantile(0.05)
rets_var

-0.03332338238201468

In [123]:
# empirical cdf to get cvar

rets_cvar = rets[rets < rets_var].mean()
rets_cvar

-0.04712463823967865

### Q2.2

In [130]:
WINDOW = 150
QUANTILE = 0.05
mu = 0

#sigma = pd.concat([None,sigma_expanding,sigma_rolling],axis=1,keys='empirical cdf')

from scipy.stats import norm
zscore = norm.ppf(QUANTILE)

VaRret = dict()
CVaRret = dict()

VaRret['empirical cdf'] = rets.expanding(WINDOW).quantile(QUANTILE)
CVaRret['empirical cdf'] = rets[rets<VaRret['empirical cdf']].expanding().mean()

# for method in METHODS[1:]:
#     VaRret[method] = mu + zscore * sigma[method]
#     CVaRret[method] = mu - norm.pdf(zscore)/QUANTILE * sigma[method]

VaRret = pd.concat(VaRret,axis=1)
CVaRret = pd.concat(CVaRret,axis=1)
CVaRret

Date,empirical cdf
2012-05-13,-0.03706
2013-02-17,-0.03651
2013-04-14,-0.04367
2013-04-21,-0.04748
2013-05-19,-0.0502
2013-06-23,-0.05346
2013-06-30,-0.05261
2013-09-15,-0.05193
2014-11-02,-0.05151
2015-02-08,-0.05025


### 2.3 In our analysis in the course, which of the methods above did we find did best? How did we judge which method did best?

Ans 2.3 We judged each method by its hit ratio: the frequency of periods in which the realized return fell below that method's VaR. For a 5% VaR, the method whose hit ratio is closest to 5% is best. We found that the empirical CDF method did not work well, as the returns never fell below its threshold. In our case the rolling-window technique was the best.
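
A hit ratio of this kind could be computed as in the sketch below. The function name `var_hit_ratio` and the toy data are hypothetical, and the sketch assumes each VaR forecast was made with information available before the return it is compared against.

```python
import numpy as np

def var_hit_ratio(returns, var_forecasts) -> float:
    """Share of periods in which the realized return fell below the VaR forecast for that period."""
    returns = np.asarray(returns, dtype=float)
    var_forecasts = np.asarray(var_forecasts, dtype=float)
    # Skip periods with no forecast (for example, the burn-in window).
    valid = ~np.isnan(var_forecasts)
    return float(np.mean(returns[valid] < var_forecasts[valid]))

# Toy data: four returns with a constant VaR forecast of -3%; the first forecast is missing.
rets_demo = np.array([0.01, -0.05, 0.02, -0.04])
var_demo = np.array([np.nan, -0.03, -0.03, -0.03])
hit = var_hit_ratio(rets_demo, var_demo)
print(round(hit, 4))  # 0.6667
```

For a 5% VaR, a well-calibrated method should produce a hit ratio near 0.05 over a long sample.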

# 3 Pricing Model (25 pts)

In [36]:
futures = pd.read_excel("final_exam_data.xlsx", 1, index_col=0)
factors = pd.read_excel("final_exam_data.xlsx", 0, index_col=0)
factors.head()

Date,MKT,UMD
2000-01-31,-0.0474,0.0192
2000-02-29,0.0245,0.182
2000-03-31,0.052,-0.0683
2000-04-30,-0.064,-0.0839
2000-05-31,-0.0442,-0.0898


#### Q3.1 Estimate the time-series test of the pricing model.

Ans 3.1(a)

In [37]:
def ts_test(df, factor_df, factors, constant = True, annualization=12):
    # time-series regression of each column of df on the chosen factors (written for exactly two factors)
    res = pd.DataFrame(data = None, index = df.columns, columns = ['alpha','f_1','f_2', 'r_2', 'treynor', 'info'])
    
    for port in df.columns:
        y = df[port]
        if constant:
            X = sm.add_constant(factor_df[factors])
        else:
            X = factor_df[factors]
        model = sm.OLS(y, X).fit()
        
        # with a constant, params are ordered [const, factor 1, factor 2]
        if constant:
            beta = model.params.iloc[1:]
            alpha = model.params.iloc[0] * annualization
            information_ratio = model.params.iloc[0] * np.sqrt(annualization) / model.resid.std()
        else:
            beta = model.params
            alpha = None
            information_ratio = None
    
        # Treynor ratio uses the beta on the first factor
        treynor = y.mean() * annualization / beta.iloc[0]
        res.loc[port] = [alpha, beta.iloc[0], beta.iloc[1], model.rsquared, treynor, information_ratio]
    return res

# df: returns of the asset portfolios being regressed
# factor_df: the factor data
# constant: whether to include an intercept in the regression

In [38]:
df_ts = ts_test(futures, factors, ['MKT', 'UMD'])
df_ts.rename(columns = {'f_1':'beta_MKT', 'f_2':'beta_UMD'}, inplace = True)
df_ts

Unnamed: 0,alpha,beta_MKT,beta_UMD,r_2,treynor,info
NG1,0.11195,0.35411,0.38123,0.01731,0.4066,0.2099
KC1,0.0232,0.31512,-0.02747,0.02589,0.14265,0.07326
CC1,0.07079,0.20732,-0.03582,0.01203,0.40891,0.21944
LB1,0.06448,0.94207,-0.00479,0.13685,0.13898,0.17317
CT1,0.02492,0.50425,-0.1786,0.09902,0.11353,0.08547
SB1,0.09313,0.05797,-0.3192,0.03273,1.57591,0.2731
LC1,0.01542,0.18306,0.0661,0.02005,0.16152,0.08304
W1,0.05453,0.2989,0.02243,0.02133,0.25446,0.17693
S1,0.04254,0.39948,0.02726,0.05292,0.17838,0.16493
C1,0.06087,0.3404,0.06204,0.02825,0.25281,0.2068


In [39]:
### MAE
print('MAE:-',round(df_ts['alpha'].abs().mean(),5))

MAE:- 0.05892


In [40]:
### Mean r-squared
print('Mean r-squared:-',round(df_ts['r_2'].mean(),5))

Mean r-squared:- 0.05875


ANS 3.1(b) If the pricing model worked perfectly:
- The alpha for each asset would be zero, as we expect the factors to explain all of the expected returns of the portfolios.
- Since we expect the alphas to be zero, we would expect the MAE to be near zero as well.
- Nothing could be said about the R-Squared as in the TimeSeries test we do not care about high R-Squared. Thus, the average R-Squared statistic would be unrestricted
- Nothing needs to be said about the regression $\beta$, as they would vary with the exposure to the respective factor and would vary with each asset

### Q3.2 Estimate the cross-sectional test of the pricing model.

In [41]:
def cross_regression(asset_df, factors_df, factors, intercept=True, annualization=12):
    
    res = pd.DataFrame(data = None, index = factors, columns = ['cs_premia'])
    
    factors_df = factors_df.iloc[:,1:(len(factors)+1)]
    
    y = asset_df.mean() * annualization
    if intercept == True:
        X = sm.add_constant(factors_df[factors].astype(float))
    else:
        X = factors_df[factors].astype(float)
    model = sm.OLS(y, X).fit()
    if intercept:
        alpha = model.params[0]
        mae = model.resid.abs().mean()
        for i in range(len(factors)):
            res.loc[factors[i]] = model.params[i+1]
        predicted_premia = np.matrix(factors_df[factors].astype(float)) @ np.matrix(res.astype(float))
        
    else:
        alpha = None
        mae = model.resid.abs().mean()
        for i in range(len(factors)):
            res.loc[factors[i]] = model.params[i]
        predicted_premia = np.matrix(factors_df[factors].astype(float)) @ np.matrix(res.astype(float))
        
    predicted_premia = pd.DataFrame(predicted_premia, columns = ['Cross-Section Premia'], index = factors_df.index)
    if intercept:
        return predicted_premia,res, f'r-squared = {round(model.rsquared, 4)}', f'alpha = {round(alpha, 4)}', f'mae = {round(mae, 4)}'
    else:
        return predicted_premia,res, f'r-squared = {round(model.rsquared, 4)}', f'mae = {round(mae, 4)}'

In [42]:

predicted_premia,cs_premia,r2,alpha,mae = cross_regression(futures, df_ts, ['beta_MKT', 'beta_UMD'], intercept=True)

annualized intercept

In [43]:
alpha

'alpha = 0.0611'

annualized factor premia

In [44]:
cs_premia

Unnamed: 0,cs_premia
beta_MKT,0.06197
beta_UMD,0.0735


r-squared.

In [45]:
r2

'r-squared = 0.3914'

annualized mean-absolute error.

In [46]:
mae

'mae = 0.018'

#### Ans 3.2(b)

If the pricing model worked perfectly:
- The R-squared would be near 1 as the factors would explain most of the expected returns of the portfolios.
- We expect the MAE to be near zero.
- We would expect the alpha for the cross-sectional regression to be zero if we measured the risk-free rate correctly (in building the excess returns). A non-zero cross-sectional intercept means the model pricing is off by a fixed amount, potentially due to risk-free rate mismeasurement.
- We cannot say anything specific about the factor premia as the cross-sectional regression provides the freedom for the factor premia to be anything and is derived as the regression coefficient.

### Q3.3 Compare the factor premia across the cross-sectional and time-series estimations.

In [47]:
#Calculate mean, standard deviation and sharpe ratio
def mean_vol_sharpe(df,ann=12):
    mean = df.mean() * ann
    volatility = df.std() * np.sqrt(ann)
    sharpe_ratio = mean/volatility
    return pd.DataFrame({'mean': mean, 'volatility': volatility, 'sharpe_ratio': sharpe_ratio})

In [52]:
def ts_premia(df_ts, factor_mean):
    # time-series premium of each asset: beta_MKT * mean(MKT) + beta_UMD * mean(UMD)
    premia = pd.DataFrame(data = None, index = df_ts.index, columns = ['TS Premia'])
    for row in df_ts.index:
        a = df_ts.loc[row, 'beta_MKT'] * factor_mean.iloc[0]
        b = df_ts.loc[row, 'beta_UMD'] * factor_mean.iloc[1]
        premia.loc[row] = a + b
        
    return premia

In [56]:
factor_mean = mean_vol_sharpe(factors).iloc[:,0]
ts_premia_1 = ts_premia(df_ts, factor_mean)
#ts_premia_1

In [55]:
predicted_premia.rename(columns={'predicted_premia':'cross section premia'}, inplace=True)
#predicted_premia

In [57]:
premia_all = pd.concat([ts_premia_1, predicted_premia], axis=1)
premia_all

Unnamed: 0,TS Premia,Cross-Section Premia
NG1,0.03203,0.04997
KC1,0.02175,0.01751
CC1,0.01398,0.01022
LB1,0.06645,0.05803
CT1,0.03233,0.01812
SB1,-0.00178,-0.01987
LC1,0.01415,0.0162
W1,0.02152,0.02017
S1,0.02872,0.02676
C1,0.02519,0.02566


# 4 Forecasting (50pts)

In [58]:
forecast.head()

Date,GLD,Tbill rate,Tbill change
2009-04-19,-0.01263,0.13,-0.045
2009-04-26,0.0528,0.095,-0.035
2009-05-03,-0.03087,0.145,0.05
2009-05-10,0.03485,0.165,0.02
2009-05-17,0.01745,0.155,-0.01


In [82]:
## Lagged regression: regress y_col on the signals in X_col lagged by `lag` periods,
## then build the forecast and the returns of the strategy with weight 0.2 + 80 * forecast.
## The `weight` argument is unused: the weight is recomputed inside the function.
def lagged_reg(df, y_col, X_col, weight=100, lag=1, intercept = True, annual_fac=12):
    y = df[y_col]
    if intercept == True:
        X = sm.add_constant(df[X_col].shift(lag))
    else:
        X = df[X_col].shift(lag)
    
    model = sm.OLS(y, X, missing = 'drop').fit()
    reg_df = model.params.to_frame('Regression Parameters')
    reg_df.loc['r-squared'] = model.rsquared
    
    if intercept == True:
        reg_df.loc['const'] *= annual_fac
        # undo the annualization to recover the per-period intercept
        final = reg_df.loc['const'].iloc[0] / annual_fac
        reg_df = reg_df.drop('const')
    else:
        final = 0
    
    reg_df = reg_df.drop('r-squared')
    
    # fitted value from the current signals: alpha + sum_i beta_i * x_i
    for i in reg_df.index:
        final += reg_df.loc[i].iloc[0] * df[i]
    
    # shift so the forecast dated t uses only signals known at t-1
    weight = 0.2 + 80 * final.shift().dropna()
    final = final.shift().dropna()
    final_series = (weight * df[y_col]).dropna()
    
    return model, final, final_series

## Q4.1

In [84]:
reg, a, b = lagged_reg(forecast, 'GLD', ['Tbill rate','Tbill change'], weight=100, lag=1, intercept = True, annual_fac=12)

In [85]:
a = a.to_frame('Active')
a

Date,Active
2009-04-26,0.00100
2009-05-03,0.00100
2009-05-10,0.00105
2009-05-17,0.00104
2009-05-24,0.00103
...,...
2022-11-06,0.00202
2022-11-13,0.00202
2022-11-20,0.00203
2022-11-27,0.00206


In [86]:
reg.summary()
# Alpha Beta and R-squared below

0,1,2,3
Dep. Variable:,GLD,R-squared:,0.0
Model:,OLS,Adj. R-squared:,-0.003
Method:,Least Squares,F-statistic:,0.03771
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,0.963
Time:,19:23:57,Log-Likelihood:,1734.1
No. Observations:,711,AIC:,-3462.0
Df Residuals:,708,BIC:,-3449.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0010,0.001,1.044,0.297,-0.001,0.003
Tbill rate,0.0003,0.001,0.259,0.796,-0.002,0.002
Tbill change,0.0005,0.015,0.031,0.975,-0.029,0.029

0,1,2,3
Omnibus:,32.369,Durbin-Watson:,1.952
Prob(Omnibus):,0.0,Jarque-Bera (JB):,70.408
Skew:,-0.247,Prob(JB):,5.14e-16
Kurtosis:,4.46,Cond. No.,23.3


# 4.2

In [88]:
b = b.to_frame('Active')
b

Date,Active
2009-04-26,0.01480
2009-05-03,-0.00865
2009-05-10,0.00990
2009-05-17,0.00495
2009-05-24,0.00801
...,...
2022-11-06,0.00782
2022-11-13,0.01868
2022-11-20,-0.00390
2022-11-27,0.00096


# 4.3

In [91]:
forecast_gld['active'] = b
mean_vol_sharpe(forecast_gld)

Unnamed: 0,mean,volatility,sharpe_ratio
GLD,0.01337,0.07316,0.18281
active,0.004,0.02093,0.19113


In [92]:
## Var, skewness, kurtosis, expected shortfall, maximum drawdown
def risk_stats(data, q=0.05):
    df = data.copy()
    df.index = data.index.date
    report = pd.DataFrame(columns = df.columns)
    
    report.loc['Skewness'] = df.skew()
    report.loc['Excess Kurtosis'] = df.kurtosis()
    report.loc['VaR (negated)'] = -df.quantile(q)
    report.loc['Expected Shortfall (negated)'] = -df[df < df.quantile(q)].mean()
    
    cum_ret = (1 + df).cumprod()
    rolling_max = cum_ret.cummax()
    drawdown = (cum_ret - rolling_max) / rolling_max
    report.loc['Max Drawdown'] = drawdown.min()
    report.loc['MDD Start'] = None
    report.loc['MDD End'] = drawdown.idxmin()
    report.loc['Recovery Date'] = None
    
    for col in df.columns:
        report.loc['MDD Start', col] = (rolling_max.loc[:report.loc['MDD End', col]])[col].idxmax()
        recovery_df = (drawdown.loc[report.loc['MDD End', col]:])[col]
        
        try:
            report.loc['Recovery Date', col] = recovery_df[recovery_df >= 0].index[0]

        except:
            report.loc['Recovery Date', col] = None
            report.loc['Recovery period (days)'] = None
    report.loc['Recovery period (days)'] = (report.loc['Recovery Date'] - report.loc['MDD Start']).dt.days
    return round(report,4)

#risk_stats(ltcm_ex).iloc[:3,1:4].T

In [96]:
risk_stats(forecast_gld).iloc[4:,:]

Unnamed: 0,GLD,active
Max Drawdown,-0.44745,-0.14257
MDD Start,2011-09-04,2011-09-04
MDD End,2015-11-29,2015-11-29
Recovery Date,2020-08-02,2020-05-17
Recovery period (days),3255,3178


# 4.4

In [152]:
## SIMPLE REGRESSION

def regress(y, X, intercept = True, annual_fac=12):
    if intercept == True:
        X_ = sm.add_constant(X)
        reg = sm.OLS(y, X_).fit()
        reg_df = reg.params.to_frame('Regression Parameters')
        reg_df.loc['r-squared'] = reg.rsquared
        reg_df.loc['const'] *= annual_fac
        reg_df.loc['treynor ratio'] = y.mean()/reg_df.loc['GLD'][0]
        reg_df.loc['information ratio'] = reg.params[0] * np.sqrt(annual_fac) / reg.resid.std()
    else:
        reg = sm.OLS(y, X).fit()
        reg_df = reg.params.to_frame('Regression Parameters')
        reg_df.loc['r-squared'] = reg.rsquared
        # no 'const' row exists without an intercept; use the first beta directly
        reg_df.loc['treynor ratio'] = y.mean()/reg.params.iloc[0]
    
    return reg_df

In [153]:
# regress active on GLD
forecast_gld = forecast_gld.dropna()
regress(forecast_gld['active'], forecast_gld['GLD'], intercept = True, annual_fac=12)

Unnamed: 0,Regression Parameters
const,0.00012
GLD,0.2856
r-squared,0.99713
treynor ratio,0.00117
information ratio,0.10256


#### 4.5 Suppose we were going to forecast GLD using just one of our two signals. Which of the signals would likely lead to a result where the long-term forecast compounds the effect over long horizons, as we saw for forecasting SPY using dividend-price ratios? Explain.

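The T-bill rate would compound over long horizons. The level of the T-bill rate is highly persistent (strongly autocorrelated), like the dividend-price ratio, so a signal observed today remains informative about many future periods and its forecasting effect accumulates as the horizon grows. The change in the T-bill rate has little persistence, so its effect would not compound.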
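
The difference in persistence between a level and its change can be illustrated on simulated data. The AR(1) coefficient of 0.98 and the helper `lag1_autocorr` are assumptions of this toy example, not estimates from the T-bill data.

```python
import numpy as np

def lag1_autocorr(x: np.ndarray) -> float:
    """Sample correlation between x_t and x_{t-1}."""
    return float(np.corrcoef(x[1:], x[:-1])[0, 1])

rng = np.random.default_rng(0)

# Simulated persistent "rate" series: AR(1) with coefficient 0.98 (assumed for illustration).
n, phi = 5000, 0.98
level = np.zeros(n)
shocks = rng.normal(0.0, 0.1, size=n)
for t in range(1, n):
    level[t] = phi * level[t - 1] + shocks[t]
change = np.diff(level)

# The level is strongly autocorrelated; its first difference is close to uncorrelated.
print(round(lag1_autocorr(level), 2), round(lag1_autocorr(change), 2))
```

Running the same helper on `forecast['Tbill rate']` and `forecast['Tbill change']` would show whether the actual signals behave this way.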

## 4.6

In [160]:
sigs = forecast[['Tbill rate','Tbill change']]

sigs_lag = sigs.shift().dropna()

sigs_lag, spy_aligned = sigs_lag.align(forecast[['GLD']], join='inner',axis=0)
spy = pd.DataFrame(forecast['GLD'])
spy = spy.loc[sigs_lag.index]

In [161]:
# start predict one period earlier than we want the first forecast
START_PREDICT = pd.to_datetime('1999-12-31')

forecasts_OOS = pd.DataFrame(columns=['Forecast'],index=spy_aligned.index, dtype='float64')

est = LinearRegression()

Xlag = sigs_lag
X = sigs
y = spy_aligned

for t in spy_aligned.loc[START_PREDICT:,:].index:
    yt = y.loc[:t].values.reshape(-1,1)
    Xlag_t = Xlag.loc[:t,:].values
    x_t = X.loc[t,:].values.reshape(1,-1)

    est.fit(Xlag_t,yt);
    predval = est.predict(x_t)[0,0]

    # this timing is assigning forecast to datestamp of info used to make the forecast
    forecasts_OOS.loc[t,'Forecast'] = predval

# make sure expanded mean (baseline forecast) uses all spy data, (spy, not spy_aligned)
forecasts_OOS.insert(0,'Mean', spy.expanding().mean().dropna())

# more convenient to have datestamp reflect date of the forecasted value
forecasts_OOS = forecasts_OOS.shift(1).dropna()
forecasts_OOS

Date,Mean,Forecast
2009-05-03,0.05280,0.05280
2009-05-10,0.01097,0.02596
2009-05-17,0.01893,0.09583
2009-05-24,0.01856,0.04338
2009-05-31,0.02053,0.04850
...,...,...
2022-11-06,0.00105,0.00056
2022-11-13,0.00108,0.00111
2022-11-20,0.00115,0.00252
2022-11-27,0.00114,0.00219


## 4.7 Report the out-of-sample $R^2$, relative to a baseline forecast which is simply the mean of GLD up to the point the forecast is made.

In [162]:
def oos_rsquared(data,forecasts,null=None):
    data = data.copy()
    forecasts = forecasts.copy()
    
    # if no Null forecast given, use expanding mean; otherwise work on a copy
    if null is None:
        null = data.expanding().mean().shift()
    else:
        null = null.copy()

    # label Data and Null accordingly--input may be series or dataframe
    if isinstance(null, pd.DataFrame):
        null.columns = ['Null']
    elif isinstance(null,pd.Series):
        null.name = 'Null'
    if isinstance(data, pd.DataFrame):
        data.columns = ['Data']
    elif isinstance(data,pd.Series):
        data.name = 'Data'

    # double check data is aligned and no NaN (null will have NaN in first value by default)
    alldata = forecasts.join(data,how='inner',rsuffix='_Data').join(null,how='inner',rsuffix='_Null').dropna(axis=0)
    null = alldata[['Null']]
    data = alldata[['Data']]
    forecasts = alldata.drop(columns=['Data','Null'])


    # Forecast MSE
    err_forecast = forecasts.subtract(data.values)
    mse_forecast = (err_forecast**2).sum()

    # Null MSE
    err_null = null.subtract(data.values)
    mse_null = (err_null**2).sum()

    # OOS R-squared
    r2oos = (1 - mse_forecast/mse_null.values).to_frame().T
    r2oos.index = ['OOS-Rsquared']

    return r2oos

In [163]:
spy_OOS, _ = spy.align(forecasts_OOS, join='right', axis=0)

oos_rsquared(spy_OOS,forecasts_OOS,forecasts_OOS[['Mean']])

Unnamed: 0,Mean,Forecast
OOS-Rsquared,0.0,-0.08837


### 4.8 Report the correlation between the two forecasts of SPY (regression based and the baseline forecast) with the actual realized value of SPY.

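No, neither forecast is positively correlated with the realized returns: both correlations are slightly negative and essentially zero. <br>
This is consistent with the out-of-sample R-squared, which was zero for the baseline and negative for the regression forecast.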

In [166]:
corr_val = forecasts_OOS.corrwith(spy_OOS['GLD'])
corr_val.to_frame('Corr. to GLD')

Unnamed: 0,Corr. to GLD
Mean,-0.00566
Forecast,-0.00565


## 4.9

In [167]:
wts_OOS = 0.2 + 80 * forecasts_OOS
fund_returns_OOS = wts_OOS * spy_OOS.values
fund_returns_OOS.insert(0,'Passive', spy_OOS)
fund_returns_OOS

Date,Passive,Mean,Forecast
2009-05-03,-0.03087,-0.13660,-0.13660
2009-05-10,0.03485,0.03754,0.07935
2009-05-17,0.01745,0.02991,0.13725
2009-05-24,0.02840,0.04784,0.10423
2009-05-31,0.02177,0.04011,0.08883
...,...,...,...
2022-11-06,0.02161,0.00614,0.00529
2022-11-13,0.05170,0.01482,0.01494
2022-11-20,-0.01076,-0.00314,-0.00432
2022-11-27,0.00264,0.00077,0.00099


In [168]:
mean_vol_sharpe(fund_returns_OOS)

Unnamed: 0,mean,volatility,sharpe_ratio
Passive,0.01273,0.07293,0.17459
Mean,0.00484,0.03878,0.12468
Forecast,0.00559,0.07708,0.07249


In [170]:
risk_stats(fund_returns_OOS).iloc[4:,:]

Unnamed: 0,Passive,Mean,Forecast
Max Drawdown,-0.44745,-0.23135,-0.45
MDD Start,2011-09-04,2011-09-04,2020-03-08
MDD End,2015-11-29,2015-11-29,2022-10-16
Recovery Date,2020-08-02,,
Recovery period (days),3255.0,,


In [172]:
## SIMPLE REGRESSION

def regress(y, X, intercept = True, annual_fac=12):
    if intercept == True:
        X_ = sm.add_constant(X)
        reg = sm.OLS(y, X_).fit()
        reg_df = reg.params.to_frame('Regression Parameters')
        reg_df.loc['r-squared'] = reg.rsquared
        reg_df.loc['const'] *= annual_fac
        reg_df.loc['treynor ratio'] = y.mean()/reg.params[1]
        reg_df.loc['information ratio'] = reg.params[0] * np.sqrt(annual_fac) / reg.resid.std()
    else:
        reg = sm.OLS(y, X).fit()
        reg_df = reg.params.to_frame('Regression Parameters')
        reg_df.loc['r-squared'] = reg.rsquared
        # no 'const' row exists without an intercept; use the first beta directly
        reg_df.loc['treynor ratio'] = y.mean()/reg.params.iloc[0]
    
    return reg_df

In [173]:
regress(spy_OOS,fund_returns_OOS, intercept = True, annual_fac=12)

Unnamed: 0,Regression Parameters
const,-2.68385e-17
Passive,1.0
Mean,1.34346e-15
Forecast,2.19745e-16
r-squared,1.0
treynor ratio,
information ratio,-0.21761


# Question 5 FX Carry (40pts)

In [103]:
fx = pd.read_excel('final_exam_data.xlsx',3, index_col=0)
fx.head()

DATE,GBP,SOFR,SONIA
2018-04-03,1.4068,0.0183,0.00465
2018-04-04,1.4076,0.0174,0.00462
2018-04-05,1.3991,0.0175,0.00465
2018-04-06,1.4088,0.0175,0.00467
2018-04-09,1.4136,0.0175,0.00465


## 5.1

In [105]:
fx_1 = pd.DataFrame(fx['GBP'])
log_fx = np.log(fx_1)

rf = pd.DataFrame(fx[['SOFR','SONIA']])
log_rf = np.log(1+rf)


DATE,GBP
2018-04-03,0.34132
2018-04-04,0.34189
2018-04-05,0.33583
2018-04-06,0.34274
2018-04-09,0.34614
...,...
2022-11-18,0.17412
2022-11-21,0.16424
2022-11-22,0.17227
2022-11-23,0.18565


In [106]:
# Display the mean of all three series.
pd.concat([log_fx.mean(), log_rf.mean()]).to_frame('Mean')

Unnamed: 0,Mean
GBP,0.26008
SOFR,0.01135
SONIA,0.00531


#### 5.2 (3pts) If we assume the Uncovered Interest Parity to hold true, what would you expect from the (static, passive) return to GBP?

Ans 5.2 - Under UIP (Uncovered Interest Parity), the expected excess return to a USD investor from passively holding GBP is zero: expected currency growth exactly offsets the interest rate differential.
  - In regression form: $ S_{t+1} - S_{t} = \alpha + \beta (r^\$ - r^{foreign})  + \epsilon $, with $S$ the log spot rate.
  - We expect $\alpha$ to be 0 and $\beta$ to be 1, and $\epsilon$ has mean zero.
  - If this is not true, we can run a trading strategy involving the two currencies, called the carry trade.
  - Carry trade:
  - At time t: Borrow at the risk-free rate in USD. Convert USD to GBP. Buy risk-free assets in GBP.
  - At time t+1: Close risk-free assets in GBP, convert GBP to USD, return the borrowed amount.
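
The payoff of the trade described above can be written as a one-period log excess return. This is a minimal sketch: the function name and the numbers are hypothetical, and the rates are assumed to be per-period rates known at time t.

```python
import math

def gbp_log_excess_return(spot_t: float, spot_t1: float, r_usd: float, r_gbp: float) -> float:
    """Log excess return to a USD investor from holding GBP for one period.

    spot_t and spot_t1 are USD per GBP; r_usd and r_gbp are per-period risk-free rates known at t.
    """
    fx_growth = math.log(spot_t1) - math.log(spot_t)
    rf_spread = math.log(1 + r_usd) - math.log(1 + r_gbp)
    return fx_growth - rf_spread

# Hypothetical numbers: GBP is flat, and the USD rate exceeds the GBP rate.
rx = gbp_log_excess_return(1.40, 1.40, r_usd=0.002, r_gbp=0.001)
print(rx < 0)  # True: with no FX move, the USD borrower loses the rate spread
```

Under UIP the expectation of this quantity is zero, so any predictable non-zero mean is what a carry trade tries to capture.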

#### 5.3 Calculate the excess log return to a USD investor of holding GBP. Report the following annualized stats...

In [111]:
logrfspread = (log_rf['SOFR'] - log_rf['SONIA']).shift(1).rename('RF spread')
logfxgrowth = log_fx['GBP'].diff().rename('GBP Growth')

logrx = (logfxgrowth - logrfspread).to_frame('GBP excess return')

mean_vol_sharpe(logrx)

Unnamed: 0,mean,volatility,sharpe_ratio
GBP excess return,-0.07411,0.03281,-2.25846


#### 5.4 Over the sample, was it better to be long or short GBP relative to USD?

In [112]:
fxcomponents = pd.concat([logfxgrowth, logrfspread, logrx], axis=1)
fxcomponents.mean()

GBP Growth          -0.00013
RF spread            0.00604
GBP excess return   -0.00618
dtype: float64

Ans 5.4
- We see negative GBP excess returns, so over the sample it was better to be short GBP (long USD). The USD-GBP rf spread (SOFR minus SONIA) is positive on average, which indicates USD rates were higher on average, so holding GBP gave up interest.
- Also, the GBP growth was negative on average, which means the USD was appreciating against the GBP.

### 5.5

In [115]:
import scipy.stats as scistats

YRS = 5
PERYR = 12

mu_tilde = logrx.mean()
sigma_tilde = logrx.std()
prob_rx = scistats.norm.cdf(-np.sqrt(YRS*PERYR)*(mu_tilde/sigma_tilde))[0]
underperform = pd.DataFrame(prob_rx, columns=['RfUSD'],index=['Probability of Underperforming'])
underperform.style.format('{:.2%}')

Unnamed: 0,RfUSD
Probability of Underperforming,100.00%
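
The probability computed above follows from the iid normal assumption: the cumulative log excess return over h periods is negative with probability Φ(−√h · μ/σ). Here is a minimal standalone sketch of that formula using only the standard library. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def prob_negative_cumulative(mu: float, sigma: float, periods: int) -> float:
    """P(cumulative log excess return over `periods` < 0) under iid normal log returns."""
    return NormalDist().cdf(-math.sqrt(periods) * mu / sigma)

# Hypothetical per-period inputs, for illustration only.
print(prob_negative_cumulative(0.0, 0.02, 60))          # 0.5: zero mean is a coin flip
print(prob_negative_cumulative(0.005, 0.02, 60) < 0.5)  # True: a positive mean lowers the probability
```

The mean, volatility, and number of periods must all be expressed at the same frequency for the formula to apply.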


# 5.6

In [117]:
y, X  = logfxgrowth.to_frame().dropna().align(logrfspread.to_frame().dropna(),join='inner',axis=0)
mod = LinearRegression().fit(X,y)

FXpredictOLS = pd.DataFrame(
    {'alpha':mod.intercept_[0],
     'beta':mod.coef_[0,0],
     'r-squared':mod.score(X,y)},
    index=['GBP Growth'])

FXpredictOLS

Unnamed: 0,alpha,beta,r-squared
GBP Growth,-7e-05,-0.01064,0.00016


### 5.7 If we assume the Uncovered Interest Parity to hold true, what would you expect to be true of the regression estimates?

 $ S_{t+1} - S_{t} = \alpha + \beta (r^\$ - r^{foreign})  + \epsilon $
  - We expect $\alpha$ to be 0 and $\beta$ to be 1.
  - If this is not true, we can run a trading strategy involving the two currencies.

### 5.8 Based on the regression results, if we observe an increase in the interest rate on GBP relative to USD, should we expect the USD to get stronger (appreciate) or weaker (depreciate)?

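Ans 5.8 An increase in the GBP rate relative to the USD rate lowers the regressor ($r^\$ - r^{foreign}$). Since the estimated $\beta$ is negative, predicted GBP growth rises, so GBP should appreciate relative to USD. That is, the USD should depreciate (get weaker), although the estimated $\beta$ and r-squared are both close to zero.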

### 5.9 If the risk free rates in USD increase relative to risk-free rates in GBP, we expect the forward exchange rate to be higher than the spot exchange rate?

Ans 5.9 - Yes, provided the USD rate ends up above the GBP rate. By covered interest parity, the log forward exchange rate equals the log spot exchange rate plus the interest rate differential (USD minus GBP). If the differential increases, the forward rate rises relative to spot, and it is higher than spot whenever the differential is positive.
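
The covered interest parity relation can be checked numerically with a small sketch. The function `cip_log_forward` and the rates are hypothetical, with the exchange rate quoted as USD per GBP.

```python
import math

def cip_log_forward(spot: float, r_usd: float, r_gbp: float) -> float:
    """Log forward rate (USD per GBP) implied by covered interest parity over one period."""
    return math.log(spot) + math.log(1 + r_usd) - math.log(1 + r_gbp)

spot = 1.40  # hypothetical USD per GBP
f_equal = cip_log_forward(spot, r_usd=0.01, r_gbp=0.01)
f_usd_higher = cip_log_forward(spot, r_usd=0.03, r_gbp=0.01)

# Equal rates give forward = spot; a higher USD rate pushes the forward above spot.
print(math.isclose(f_equal, math.log(spot)), f_usd_higher > math.log(spot))  # True True
```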

#### 5.10 Based on the regression results, construct an in-sample forecast of the excess log return to holding GBP. Report the forecasted values for the first 5 and last 5 dates.
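
A minimal sketch of how such a forecast could be built, under the definitions used earlier in this section: the excess return equals FX growth minus the lagged rate spread, so its forecast is alpha + (beta − 1) × spread. The function name and the spread values below are hypothetical; the coefficients are the ones reported in the regression table above.

```python
import numpy as np

def forecast_excess_return(alpha: float, beta: float, rf_spread) -> np.ndarray:
    """In-sample forecast of the GBP log excess return implied by the FX-growth regression.

    Forecast FX growth is alpha + beta * rf_spread; subtracting rf_spread gives the excess return.
    """
    return alpha + (beta - 1.0) * np.asarray(rf_spread, dtype=float)

# Hypothetical spread values, for illustration only (not taken from the data).
spread_demo = np.array([0.0, 0.01])
rx_hat = forecast_excess_return(-7e-05, -0.01064, spread_demo)
print(rx_hat)
```

Applied to the notebook's data, `forecast_excess_return(alpha, beta, logrfspread.dropna())` would give the full in-sample series, from which `.head()` and `.tail()` report the first and last 5 dates.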