# Final Exam Solution
## FINM 36700: Portfolio Theory 
### 2021
### Mark Hendricks

Discussion below goes beyond what would have been strictly required. I try to indicate extra discussion with the following symbol: ⮕

# 1 Short Answer

## 1.1
The required condition is just that correlation is less than 1. 

⮕
A common misconception is that diversification requires zero or negative correlation, but anything less than perfect correlation is sufficient. For a two-asset portfolio with positive weights, define
$$r^p_t \equiv w_1 r^1_t + w_2r^2_t$$
Then if $\rho<1$,
$$\sigma_p < w_1\sigma_1 + w_2\sigma_2$$
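
A quick numerical check of this inequality, as a sketch only: the weights, volatilities, and correlations below are illustrative assumptions, not exam data, and `portfolio_vol` is a helper defined here.

```python
import numpy as np

def portfolio_vol(w1, w2, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with correlation rho."""
    var_p = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    return np.sqrt(var_p)

# Assumed inputs: long-only weights and annualized volatilities
w1, w2 = 0.6, 0.4
sigma1, sigma2 = 0.20, 0.30
weighted_avg_vol = w1 * sigma1 + w2 * sigma2

for rho in [1.0, 0.9, 0.5, 0.0, -0.5]:
    sigma_p = portfolio_vol(w1, w2, sigma1, sigma2, rho)
    print(f"rho={rho:+.1f}: sigma_p={sigma_p:.4f}, weighted average={weighted_avg_vol:.4f}")
```

Only at $\rho=1$ does the portfolio volatility equal the weighted average; every lower correlation gives a strictly smaller value.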

## 1.2
The time-series test is stricter. The cross-sectional test allows an extra regression to fit the factor premia as well as possible.

⮕ In the CS test, the factor premia are the estimated slopes of the cross-sectional regression, whereas the time-series test requires that the factor premia are the time-series averages of the factors.

In the figure below, the time-series test completely fixes pricing through the black points and line, whereas the cross-sectional test fits the scatter plot more flexibly. (In the figure, it has an unrestricted intercept, but even without an intercept, it can choose any slope it wants, whereas the time-series test slope is fixed by the MKT premium.)

<img src="../refs/ts_vs_cs_pricing.png" width="400"/>

*If figure does not display, make sure you have the `refs` directory in the repo, at the expected relative path.*
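
The difference between the two tests can also be seen in a small simulation. This is a sketch under stated assumptions: the single-factor data, sample size, and parameter values are made up for illustration and are unrelated to the exam data.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 600, 10

# Assumed single-factor world: excess returns load on one factor plus noise
factor = 0.005 + 0.04 * rng.standard_normal(T)
true_betas = np.linspace(0.5, 1.5, N)
rets = factor[:, None] * true_betas[None, :] + 0.02 * rng.standard_normal((T, N))

# Time-series regressions (with intercept), one per asset
X = np.column_stack([np.ones(T), factor])
coefs, *_ = np.linalg.lstsq(X, rets, rcond=None)
alphas, betas = coefs[0], coefs[1]

# Time-series test: the premium is pinned to the factor's sample mean
premium_ts = factor.mean()
ts_errors = rets.mean(axis=0) - betas * premium_ts   # identical to the time-series alphas

# Cross-sectional test: the premium is the fitted slope of mean returns on betas
Z = np.column_stack([np.ones(N), betas])
(cs_intercept, premium_cs), *_ = np.linalg.lstsq(Z, rets.mean(axis=0), rcond=None)
cs_errors = rets.mean(axis=0) - cs_intercept - betas * premium_cs

print(f"TS premium (factor mean): {premium_ts:.4f}")
print(f"CS premium (fitted slope): {premium_cs:.4f}")
print(f"Sum of squared pricing errors, TS: {(ts_errors**2).sum():.2e}, CS: {(cs_errors**2).sum():.2e}")
```

Because the time-series restriction (zero intercept, slope equal to the factor mean) is one of the lines the cross-sectional regression could have chosen, the cross-sectional pricing errors can never have a larger sum of squares in-sample.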


## 1.3
The tangency portfolio is a linear combination of just these four factors (with zero weight on all other assets).

⮕ The tangency portfolio is a perfect pricing factor. Thus, any set of factors that works perfectly must span the tangency portfolio!

## 1.4
The cross-sectional regression uses time-series regression outputs (the betas) as its inputs (the regressors). Using model estimates as input data introduces additional uncertainty, noise, and imprecision. The usual OLS stats do not account for this.

⮕ Ways of dealing with this include using Fama-MacBeth regressions or Generalized Method of Moments.

## 1.5
1. Betas will be impacted.

    ⮕ If the extra regressor is correlated to any of the 4 factors, it will impact their betas. Even if it is uncorrelated to the other factors, it still may have a non-zero beta.

1. The alphas will still be zero! (Statistically zero--an estimated sample will have some statistical noise, but they will not be statistically different than zero.) 

    ⮕ Linear Factor Pricing Models are no-arbitrage models, so the 5th factor will not introduce any mispricing, though it may confuse the attribution of the pricing.

1. Yes, the factor premia estimates ($\lambda$) from the cross-sectional regression slopes will potentially be different. The time-series betas have changed, and that will potentially impact the cross-sectional regression slopes.

1. Same answer as for the time-series alphas--they will still be zero! (Statistically speaking.)


## 1.6
False. The Linear Pricing Model factors are not guaranteed to be good for Linear Factor Decomposition. 

In other words, a great (even perfect!) LPM does not necessarily have high time-series R-squared.


## 1.7
You should include an intercept in the regression. By including an intercept, you are effectively demeaning both $y$ and $X$. Thus, if you do not trust the sample mean, you should not let it impact the regression betas, and thus should include an intercept.
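
The demeaning claim can be verified numerically. The sketch below uses simulated data with assumed coefficients; nothing here comes from the exam data.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
X = rng.standard_normal((T, 2)) + np.array([3.0, -1.0])   # regressors with nonzero means
y = 0.5 + X @ np.array([1.2, -0.7]) + 0.1 * rng.standard_normal(T)

# (1) Regress y on X with an intercept
X1 = np.column_stack([np.ones(T), X])
coef_with_intercept, *_ = np.linalg.lstsq(X1, y, rcond=None)

# (2) Demean y and X, then regress without an intercept
coef_demeaned, *_ = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)

# (3) Regress raw y on raw X without an intercept: sample means now affect the betas
coef_no_intercept, *_ = np.linalg.lstsq(X, y, rcond=None)

print("betas, with intercept:   ", coef_with_intercept[1:])
print("betas, demeaned data:    ", coef_demeaned)
print("betas, no intercept, raw:", coef_no_intercept)
```

The first two sets of betas agree to machine precision, while the third differs because the sample means feed into the slope estimates.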

# Load Modules

In [1]:
import numpy as np
import pandas as pd
pd.set_option("display.precision", 4)

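import matplotlib.pyplot as plt
import seaborn as sns
# LinearRegression is called directly in 2.6, 5.1, 5.5, and 6.6, so import it explicitly
from sklearn.linear_model import LinearRegression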

import os
import warnings
import sys

if os.path.isfile('../cmds/portfolio.py'):
    sys.path.insert(0, '../cmds')
    from portfolio import *
else:
    warnings.warn('Notebook below relies on the functions in portfolio.py. Found at the GitHub repo.')

# 2. Portfolio Analysis

In [2]:
DATAPATH = '../data/midterm_2_data_pricing.xlsx'
SHEET = 'assets (excess returns)'
retsx_cmdty = pd.read_excel(DATAPATH,sheet_name=SHEET)
retsx_cmdty.set_index('Date',inplace=True)
retsx_cmdty

| Date | NG1 | KC1 | CC1 | LB1 | CT1 | SB1 | LC1 | W1 | S1 | C1 | GC1 | SI1 | HG1 | PA1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-31 | 0.1389 | -0.1217 | -0.0543 | -0.0110 | 0.1362 | -0.1185 | 0.0204 | 0.0271 | 0.0961 | 0.0717 | -0.0262 | -0.0274 | -0.0141 | 0.0749 |
| 2000-02-29 | 0.0329 | -0.1051 | -0.0571 | -0.0516 | -0.0261 | -0.1464 | -0.0100 | -0.0404 | -0.0176 | -0.0270 | 0.0345 | -0.0495 | -0.0718 | 0.4646 |
| 2000-03-31 | 0.0619 | 0.0333 | 0.0577 | -0.0214 | 0.0225 | 0.2641 | 0.0300 | 0.0570 | 0.0836 | 0.0930 | -0.0584 | -0.0102 | 0.0118 | -0.1683 |
| 2000-04-30 | 0.0620 | -0.0856 | -0.0709 | -0.0822 | -0.0464 | -0.1300 | 0.0242 | -0.0809 | -0.0394 | -0.0565 | -0.0179 | -0.0166 | -0.0171 | 0.0375 |
| 2000-05-31 | 0.3818 | -0.0291 | 0.1222 | -0.0133 | 0.1194 | 0.4582 | -0.0923 | 0.1292 | -0.0221 | 0.0006 | -0.0159 | -0.0088 | 0.0216 | -0.0674 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-02-28 | 0.0807 | 0.1131 | 0.0672 | 0.1224 | 0.0890 | 0.0392 | -0.0169 | -0.0121 | 0.0257 | 0.0155 | -0.0654 | -0.0190 | 0.1458 | 0.0424 |
| 2021-03-31 | -0.0588 | -0.0976 | -0.1307 | 0.0136 | -0.0790 | -0.1021 | 0.0696 | -0.0565 | 0.0224 | 0.0158 | -0.0092 | -0.0722 | -0.0223 | 0.1324 |
| 2021-04-30 | 0.1238 | 0.1332 | -0.0026 | 0.4870 | 0.0810 | 0.1496 | -0.0411 | 0.2015 | 0.0934 | 0.3115 | 0.0304 | 0.0538 | 0.1189 | 0.1274 |
| 2021-05-31 | 0.0188 | 0.1601 | 0.0299 | 0.0437 | -0.0607 | 0.0224 | -0.0011 | -0.1064 | -0.0258 | -0.1125 | 0.0763 | 0.0836 | 0.0454 | -0.0438 |


## 2.1

In [3]:
wts_tan = tangency_weights(retsx_cmdty).rename({0:'Tangency'},axis=1)
wts_tan

|  | Tangency |
|---|---|
| NG1 | 0.0574 |
| KC1 | -0.0728 |
| CC1 | 0.0745 |
| LB1 | 0.0866 |
| CT1 | -0.0095 |
| SB1 | 0.0636 |
| LC1 | 0.1289 |
| W1 | -0.0104 |
| S1 | 0.0273 |
| C1 | 0.0848 |
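
The `tangency_weights` helper lives in `portfolio.py` and is not shown here. A minimal sketch of the textbook formula, $w \propto \Sigma^{-1}\mu$ scaled so the weights sum to one, might look like the following. The function name `tangency_weights_sketch` and the simulated demo data are assumptions for illustration; the repo's implementation may differ in details.

```python
import numpy as np
import pandas as pd

def tangency_weights_sketch(excess_returns: pd.DataFrame) -> pd.Series:
    """Tangency weights proportional to inv(Sigma) @ mu, scaled to sum to one."""
    mu = excess_returns.mean()
    sigma = excess_returns.cov()
    raw = np.linalg.solve(sigma.values, mu.values)   # inv(Sigma) @ mu without forming the inverse
    return pd.Series(raw / raw.sum(), index=excess_returns.columns, name='Tangency')

# Usage on simulated (assumed) excess returns, not the exam data
rng = np.random.default_rng(0)
demo = pd.DataFrame(0.02 + 0.05 * rng.standard_normal((240, 3)), columns=['A', 'B', 'C'])
w = tangency_weights_sketch(demo)
print(w)
```

Solving the linear system rather than explicitly inverting the covariance matrix gives the same weights and is the numerically safer habit.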


## 2.2

In [4]:
retsx_tan = (retsx_cmdty @ wts_tan)
performanceMetrics(retsx_tan,annualization=12)

|  | Mean | Vol | Sharpe | Min | Max |
|---|---|---|---|---|---|
| Tangency | 0.0875 | 0.1163 | 0.7524 | -0.138 | 0.1102 |
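
`performanceMetrics` also comes from `portfolio.py`. Judging only from the output columns, a rough equivalent could be sketched as below. The name `performance_metrics_sketch` is hypothetical, and the choice to leave Min and Max unannualized is an assumption.

```python
import numpy as np
import pandas as pd

def performance_metrics_sketch(returns: pd.DataFrame, annualization: int = 12) -> pd.DataFrame:
    """Annualized mean, vol, and Sharpe, plus per-period min and max, for each column."""
    out = pd.DataFrame(index=returns.columns)
    out['Mean'] = returns.mean() * annualization
    out['Vol'] = returns.std() * np.sqrt(annualization)
    out['Sharpe'] = out['Mean'] / out['Vol']
    out['Min'] = returns.min()   # assumed to be reported per period, not annualized
    out['Max'] = returns.max()
    return out

# Usage on a tiny made-up monthly return series
demo = pd.DataFrame({'x': [0.01, 0.03, -0.02, 0.02]})
metrics = performance_metrics_sketch(demo, annualization=12)
print(metrics)
```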


## 2.3

In [5]:
VaR_tan = retsx_tan.quantile(.05).values[0]
f'The 5th quantile of the tangency portfolio returns is {VaR_tan:.2%}'

'The 5th quantile of the tangency portfolio returns is -4.14%'

In [6]:
MeanVaR = retsx_cmdty.mean().values / retsx_cmdty.quantile(.05)
# Series.append was removed in pandas 2.0; pd.concat is the supported equivalent
pd.concat([MeanVaR, retsx_tan.mean() / retsx_tan.quantile(.05)])

NG1        -0.0604
KC1        -0.0307
CC1        -0.0543
LB1        -0.0793
CT1        -0.0360
SB1        -0.0550
LC1        -0.0265
W1         -0.0502
S1         -0.0485
C1         -0.0496
GC1        -0.1123
SI1        -0.0716
HG1        -0.0790
PA1        -0.0649
Tangency   -0.1762
dtype: float64

### Conclusion
Yes, the Tangency compares favorably to the commodities when judged by Mean-per-VaR, which is much larger (in magnitude) for the Tangency.

#### Note
It does not matter whether you scaled VaR as a positive or negative number; we are just looking at magnitudes.

⮕ So even though the Tangency is constructed to maximize mean-per-variance and mean-per-volatility, it also seems to do a good job of boosting mean-per-VaR.

## 2.4

In [7]:
wts_tan_IS = tangency_weights(retsx_cmdty.loc[:'2017',:]).rename({0:'Tangency'},axis=1)
retsx_OOS = retsx_cmdty.loc['2018':,:] @ wts_tan_IS
performanceMetrics(retsx_OOS,annualization=12)

|  | Mean | Vol | Sharpe | Min | Max |
|---|---|---|---|---|---|
| Tangency | 0.0741 | 0.0989 | 0.7488 | -0.0674 | 0.0539 |


## 2.5

Two points: 
- inverting the covariance matrix (which is nearly singular, given highly correlated returns) 
- mean return estimates are imprecise

The first point typically matters: Mean-Variance optimization inverts a covariance matrix of returns, and this covariance matrix tends to be nearly singular. Thus, its inversion is statistically imprecise, leading to large changes in the inverted covariance matrix in-sample versus out-of-sample.
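
A two-asset illustration of this fragility, as a sketch with assumed numbers (equal volatilities, made-up means, and a small perturbation standing in for estimation error):

```python
import numpy as np

def two_asset_cov(vol, rho):
    """Covariance matrix of two assets with equal volatility and correlation rho."""
    return vol ** 2 * np.array([[1.0, rho], [rho, 1.0]])

vol = 0.20
mu = np.array([0.06, 0.05])          # assumed mean excess returns
mu_bumped = np.array([0.06, 0.055])  # same means with a small estimation error

for rho in [0.0, 0.9, 0.99]:
    cov = two_asset_cov(vol, rho)
    w = np.linalg.solve(cov, mu)
    w_bumped = np.linalg.solve(cov, mu_bumped)
    print(f"rho={rho:.2f}  cond={np.linalg.cond(cov):8.1f}  "
          f"weights={np.round(w / w.sum(), 2)}  bumped={np.round(w_bumped / w_bumped.sum(), 2)}")
```

As the correlation approaches one, the condition number of the covariance matrix grows and the same small change in the means moves the normalized weights by far more.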

## 2.6

In [8]:
SHEET = 'factors (excess returns)'
retsx_cla = pd.read_excel(DATAPATH,sheet_name=SHEET).drop(columns=['MKT'])
retsx_cla.set_index('Date',inplace=True)

In [9]:
X = retsx_cmdty[['NG1','KC1']]
y = retsx_cla
mod = LinearRegression(fit_intercept=False).fit(X,y)
hedge_ratios = pd.DataFrame(mod.coef_,index=['beta'],columns=X.columns)
hedge_ratios.index.name='Hedge Ratios'
display(hedge_ratios)

| Hedge Ratios | NG1 | KC1 |
|---|---|---|
| beta | 0.131 | 0.1064 |


In [10]:
retsx_cla['CL1-Hedged'] = (y - X.values @ mod.coef_.T)
performanceMetrics(retsx_cla,annualization=12)

|  | Mean | Vol | Sharpe | Min | Max |
|---|---|---|---|---|---|
| CL1 | 0.1087 | 0.3916 | 0.2775 | -0.5436 | 0.8837 |
| CL1-Hedged | 0.085 | 0.3839 | 0.2215 | -0.5491 | 0.8989 |


# 3. Cases

## 3.1
- Directional positions: GMO took directional positions in asset classes based on their macro forecasting. (One could note that DFA is directional in terms of their factors. LTCM has very little directional exposure.)
- DFA relies on factor investing.
- LTCM utilizes many forms of spread trades, much more than directional or factor based investing.
- DFA is a liquidity provider for small stocks, and LTCM is a liquidity provider for the less liquid side of their spread trade.
- DFA is certainly a value-factor investor, and GMO has value-driven forecasting. The value-investor label applies much less to LTCM.
- LTCM is involved in many so-called "arbitrage" trades. GMO and DFA are not.
- Funding Risk. LTCM faces enormous funding risk given the high leverage behind its trades. Neither DFA nor GMO faces particularly high funding risk, given the nature of their strategies (though any such firm has some funding risk, and GMO did face stress from investors pulling out during the late 1990s, when growth was outperforming value).
- DFA faces adverse-selection risk in that it stands ready to buy shares of small stocks and must worry that the sellers know something it does not. Neither LTCM nor GMO deals with this to a particularly large degree.

## 3.2
LTCM has almost no SPY exposure, whether measured linearly or non-linearly. GMO has substantial SPY exposure, (beta around .5.)

# 4. The Expectations Hypothesis

## 4.1
### (a)
- If EH held, then excess returns would be unpredictable: $\alpha$ and $\beta$ would be zero. 
- The weaker form of EH would allow a non-zero $\alpha$ but would still require $\beta$ to be zero.
- EH says nothing about R-squared.

### (b)
The evidence is strongly against the EH. The betas are all much closer to 1 than to 0. Of course, we do not have statistical significance reported in the table, but the point estimates are strongly against EH. The alpha estimates look small, but in monthly data, they are substantial.

### (c)
Yes, all the maturities show roughly the same thing: evidence that the EH is false, and that excess returns can be predicted (to some degree) by forward spreads.

## 4.2
### (a)
- If EH held, then current forwards would be the best predictor of future yields.
- Thus, $\alpha=0$ in the strictest version of EH, though the weaker EH would allow $\alpha\ne 0$.
- EH implies $\beta=1$.
- EH has no implication for the R-squared stat.

### (b)
The evidence is strongly against the EH. The betas are all much closer to 0 than to 1! (Again, we do not have reports on statistical significance, but the point estimates are evidence against EH.)

### (c)
- Yes, all maturities can be seen as evidence against EH given that all 4 regressions have beta estimates far from 1. 

- And if using the strict EH, note that the alphas are non-zero with substantial magnitudes (given the monthly data.)

## 4.3
Several things one could list:
- Bonds with higher yields are expected to have higher returns, even if not held to maturity.
- In times of higher forward rates or steeper yield curve, we expect long-term bonds to have particularly high excess returns.
- If the yield curve is flat or inverted (or if the forward curve is flat or inverted), we expect short-term bonds to do relatively better.
- Forward rates are not the optimal predictor of future yields; rather, they are a predictor of high excess returns.

# 5. Forecasting

## Organize Data

In [11]:
DATAPATH = '../data/final_exam_data.xlsx'
SHEET = 'signals'
sigs = pd.read_excel(DATAPATH,sheet_name=SHEET)
sigs.set_index('date',inplace=True)
sigs

| date | Level | Slope | Inflation Growth |
|---|---|---|---|
| 1993-02-28 | 6.03 | 2.11 | -0.2975 |
| 1993-03-31 | 6.03 | 2.07 | -0.6374 |
| 1993-04-30 | 6.05 | 2.22 | -0.6554 |
| 1993-05-31 | 6.16 | 1.92 | -0.3864 |
| 1993-06-30 | 5.80 | 1.77 | -0.2924 |
| ... | ... | ... | ... |
| 2021-06-30 | 1.45 | 1.20 | 0.5444 |
| 2021-07-31 | 1.24 | 1.05 | 0.0073 |
| 2021-08-31 | 1.30 | 1.10 | 0.1029 |
| 2021-09-30 | 1.52 | 1.24 | 0.5955 |


In [12]:
SHEET = 'spy (total returns)'
spy = pd.read_excel(DATAPATH,sheet_name=SHEET)
spy.set_index('date',inplace=True)
spy

| date | SPY |
|---|---|
| 1993-02-28 | 0.0107 |
| 1993-03-31 | 0.0224 |
| 1993-04-30 | -0.0256 |
| 1993-05-31 | 0.0270 |
| 1993-06-30 | 0.0037 |
| ... | ... |
| 2021-06-30 | 0.0225 |
| 2021-07-31 | 0.0244 |
| 2021-08-31 | 0.0298 |
| 2021-09-30 | -0.0466 |


## Useful Functions

In [13]:
def oos_rsquared(data,forecasts,null=None):
    data = data.copy()
    forecasts = forecasts.copy()
    
    # if no Null forecast given, use expanding mean
    # (copy only when a null is supplied; calling .copy() on None would raise)
    if null is None:
        null = data.expanding().mean().shift()
    else:
        null = null.copy()

    # label Data and Null accordingly--input may be series or dataframe
    if isinstance(null, pd.DataFrame):
        null.columns = ['Null']
    elif isinstance(null,pd.Series):
        null.name = 'Null'
    if isinstance(data, pd.DataFrame):
        data.columns = ['Data']
    elif isinstance(data,pd.Series):
        data.name = 'Data'

    # double check data is aligned and no NaN (null will have NaN in first value by default)
    alldata = forecasts.join(data,how='inner',rsuffix='_Data').join(null,how='inner',rsuffix='_Null').dropna(axis=0)
    null = alldata[['Null']]
    data = alldata[['Data']]
    forecasts = alldata.drop(columns=['Data','Null'])


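    # Forecast MSE (kept as a sum of squared errors; the 1/T factor cancels in the ratio below)
    err_forecast = forecasts.subtract(data.values)
    mse_forecast = (err_forecast**2).sum()

    # Null MSE (again a sum of squared errors over the same dates)
    err_null = null.subtract(data.values)
    mse_null = (err_null**2).sum()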

    # OOS R-squared
    r2oos = (1 - mse_forecast/mse_null.values).to_frame().T
    r2oos.index = ['OOS-Rsquared']

    return r2oos
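
In formula form, writing $r_t$ for the realized target, $\hat{r}_t$ for the forecast, and $\bar{r}_t$ for the null forecast (by default the expanding mean of data through $t-1$), the function computes
$$R^2_{OOS} = 1 - \frac{\sum_t \left(r_t-\hat{r}_t\right)^2}{\sum_t \left(r_t-\bar{r}_t\right)^2}$$
The symbols here are introduced for this note only. Both sums run over the same aligned dates, so it makes no difference that `mse_forecast` and `mse_null` are sums rather than averages. A negative value means the forecast had larger squared errors than the null.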

## 5.1 Forecasting In-Sample

### Careful with lagging

- `sigs_lag`: lag the signals
- `spy_aligned`: align the target (as the lagged signals will lose first row to NaN)
- `spy`: keep the full version of target for the expanding mean later

In [14]:
# lag the independent variable, so that we can align date stamps and still have lag
sigs_lag = sigs.shift().dropna()

# align the data frames to the same subset of dates
sigs_lag, spy_aligned = sigs_lag.align(spy[['SPY']], join='inner',axis=0)

In [15]:
X = sigs_lag
y = spy_aligned
mod = LinearRegression().fit(X,y)
forecasts = pd.DataFrame(data=mod.predict(X), columns=['Active'],index=sigs_lag.index)
forecasts

| date | Active |
|---|---|
| 1993-03-31 | 0.0020 |
| 1993-04-30 | 0.0067 |
| 1993-05-31 | 0.0060 |
| 1993-06-30 | 0.0039 |
| 1993-07-31 | 0.0044 |
| ... | ... |
| 2021-06-30 | 0.0051 |
| 2021-07-31 | 0.0068 |
| 2021-08-31 | 0.0150 |
| 2021-09-30 | 0.0134 |


In [16]:
forecast_ols = pd.DataFrame(mod.coef_,columns=X.columns,index=['OLS'])
forecast_ols.loc['OLS','alpha'] = mod.intercept_
forecast_ols.loc['OLS','R-squared'] = mod.score(X,y)
forecast_ols

|  | Level | Slope | Inflation Growth | alpha | R-squared |
|---|---|---|---|---|---|
| OLS | -0.0023 | -0.0055 | -0.0129 | 0.0238 | 0.0296 |


## 5.2 Trading the Forecast

In [17]:
wts = 100 * forecasts

fund_returns = wts * spy_aligned.values
fund_returns.insert(0,'Passive', spy_aligned)
fund_returns

| date | Passive | Active |
|---|---|---|
| 1993-03-31 | 0.0224 | 0.0046 |
| 1993-04-30 | -0.0256 | -0.0170 |
| 1993-05-31 | 0.0270 | 0.0162 |
| 1993-06-30 | 0.0037 | 0.0014 |
| 1993-07-31 | -0.0049 | -0.0021 |
| ... | ... | ... |
| 2021-06-30 | 0.0225 | 0.0115 |
| 2021-07-31 | 0.0244 | 0.0166 |
| 2021-08-31 | 0.0298 | 0.0447 |
| 2021-09-30 | -0.0466 | -0.0623 |


## 5.3 Strategy Return Stats

In [18]:
display(performanceMetrics(fund_returns,annualization=12).style.format('{:.2%}'))
display(maximumDrawdown(fund_returns))

|  | Mean | Vol | Sharpe | Min | Max |
|---|---|---|---|---|---|
| Passive | 11.14% | 14.62% | 76.20% | -16.52% | 12.70% |
| Active | 16.66% | 17.72% | 94.02% | -18.29% | 32.09% |


|  | Max Drawdown | Peak | Bottom | Recover | Duration (to Recover) |
|---|---|---|---|---|---|
| Passive | -0.508 | 2007-10-31 | 2009-02-28 | 2012-03-31 | 1613 days |
| Active | -0.2833 | 2007-10-31 | 2009-02-28 | 2009-08-31 | 670 days |
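
`maximumDrawdown` is another `portfolio.py` helper. The core calculation behind the "Max Drawdown" column can be sketched as follows. The name `max_drawdown_sketch` is hypothetical, the sketch returns only the magnitude (not the peak, bottom, or recovery dates), and it assumes drawdowns are measured on compounded returns.

```python
import pandas as pd

def max_drawdown_sketch(returns: pd.Series) -> float:
    """Largest peak-to-trough decline of the cumulative (compounded) return path."""
    wealth = (1 + returns).cumprod()
    running_peak = wealth.cummax()
    drawdown = wealth / running_peak - 1
    return drawdown.min()

# Made-up returns: the peak comes after period 1, the trough after period 3
demo = pd.Series([0.10, -0.20, -0.10, 0.30, 0.05])
mdd = max_drawdown_sketch(demo)
print(f"{mdd:.2%}")
```

In this toy series the wealth path falls by 20% and then another 10% from its peak, so the drawdown compounds to $0.8 \times 0.9 - 1 = -28\%$.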


## 5.4 Linear Factor Decomposition

In [19]:
get_ols_metrics(spy_aligned,fund_returns,annualization=12).style.format('{:.2%}')

|  | alpha | SPY | r-squared | Treynor Ratio | Info Ratio |
|---|---|---|---|---|---|
| Passive | 0.00% | 100.00% | 100.00% | 11.14% | nan% |
| Active | 6.50% | 91.18% | 56.61% | 18.27% | 55.70% |
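
The statistics in this table can be reproduced with a plain univariate regression. The sketch below is an assumption about what `get_ols_metrics` computes, inferred from the reported numbers (Treynor as annualized mean over beta, Info Ratio as alpha over residual volatility). The name `ols_metrics_sketch` and the simulated data are illustrative only.

```python
import numpy as np
import pandas as pd

def ols_metrics_sketch(factor: pd.Series, fund: pd.Series, annualization: int = 12) -> pd.Series:
    """Regress fund on factor (with intercept) and report annualized summary stats."""
    X = np.column_stack([np.ones(len(factor)), factor.values])
    (alpha, beta), *_ = np.linalg.lstsq(X, fund.values, rcond=None)
    resid = fund.values - X @ np.array([alpha, beta])
    r2 = 1 - resid.var() / fund.values.var()
    return pd.Series({
        'alpha': alpha * annualization,
        'beta': beta,
        'r-squared': r2,
        'Treynor Ratio': fund.mean() * annualization / beta,
        'Info Ratio': alpha / resid.std(ddof=1) * np.sqrt(annualization),
    })

# Usage on simulated (assumed) monthly data
rng = np.random.default_rng(0)
factor = pd.Series(0.01 + 0.04 * rng.standard_normal(500))
fund = 0.002 + 0.9 * factor + 0.01 * pd.Series(rng.standard_normal(500))
stats = ols_metrics_sketch(factor, fund)
print(stats)
```

The Passive row has an undefined Info Ratio because regressing SPY on itself leaves zero residual volatility.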


## 5.5 and 5.6: Forecasting OOS

In [20]:
# start predict one period earlier than we want the first forecast
START_PREDICT = pd.to_datetime('1999-12-31')

forecasts_OOS = pd.DataFrame(columns=['Forecast'],index=spy_aligned.index, dtype='float64')

est = LinearRegression()

Xlag = sigs_lag
X = sigs
y = spy_aligned

for t in spy_aligned.loc[START_PREDICT:,:].index:
    yt = y.loc[:t].values.reshape(-1,1)
    Xlag_t = Xlag.loc[:t,:].values
    x_t = X.loc[t,:].values.reshape(1,-1)

    est.fit(Xlag_t,yt);
    predval = est.predict(x_t)[0,0]

    # this timing is assigning forecast to datestamp of info used to make the forecast
    forecasts_OOS.loc[t,'Forecast'] = predval

# make sure expanded mean (baseline forecast) uses all spy data, (spy, not spy_aligned)
forecasts_OOS.insert(0,'Mean', spy.expanding().mean().dropna())

# more convenient to have datestamp reflect date of the forecasted value
forecasts_OOS = forecasts_OOS.shift(1).dropna()
forecasts_OOS

| date | Mean | Forecast |
|---|---|---|
| 2000-01-31 | 0.0170 | 0.0209 |
| 2000-02-29 | 0.0162 | 0.0254 |
| 2000-03-31 | 0.0158 | 0.0271 |
| 2000-04-30 | 0.0167 | 0.0384 |
| 2000-05-31 | 0.0161 | 0.0303 |
| ... | ... | ... |
| 2021-06-30 | 0.0091 | 0.0044 |
| 2021-07-31 | 0.0092 | 0.0062 |
| 2021-08-31 | 0.0092 | 0.0149 |
| 2021-09-30 | 0.0093 | 0.0134 |


## 5.7 OOS R-squared

In [21]:
spy_OOS, _ = spy.align(forecasts_OOS, join='right', axis=0)

oos_rsquared(spy_OOS,forecasts_OOS,forecasts_OOS[['Mean']])

|  | Mean | Forecast |
|---|---|---|
| OOS-Rsquared | 0.0 | -0.011 |


## 5.8 Correlation between Forecast and Target

- Yes, the forecast is positively correlated to the target (SPY). 
- However, the baseline forecast (the expanding mean) is negatively correlated to the target.

In [22]:
corr_val = forecasts_OOS.corrwith(spy_OOS['SPY'])
corr_val.to_frame('Corr. to SPY')

|  | Corr. to SPY |
|---|---|
| Mean | -0.1623 |
| Forecast | 0.0729 |


## 5.9 Trading the Forecast

In [23]:
wts_OOS = 100 * forecasts_OOS

fund_returns_OOS = wts_OOS * spy_OOS.values
fund_returns_OOS.insert(0,'Passive', spy_OOS)
fund_returns_OOS

| date | Passive | Mean | Forecast |
|---|---|---|---|
| 2000-01-31 | -0.0498 | -0.0844 | -0.1038 |
| 2000-02-29 | -0.0152 | -0.0246 | -0.0386 |
| 2000-03-31 | 0.0969 | 0.1530 | 0.2628 |
| 2000-04-30 | -0.0351 | -0.0588 | -0.1350 |
| 2000-05-31 | -0.0157 | -0.0254 | -0.0477 |
| ... | ... | ... | ... |
| 2021-06-30 | 0.0225 | 0.0205 | 0.0098 |
| 2021-07-31 | 0.0244 | 0.0224 | 0.0153 |
| 2021-08-31 | 0.0298 | 0.0274 | 0.0444 |
| 2021-09-30 | -0.0466 | -0.0432 | -0.0623 |


## 5.10 Performance Statistics

In [24]:
display(performanceMetrics(fund_returns_OOS,annualization=12).style.format('{:.2%}'))
display(maximumDrawdown(fund_returns_OOS))
display(get_ols_metrics(spy_OOS,fund_returns_OOS,annualization=12).style.format('{:.2%}'))

|  | Mean | Vol | Sharpe | Min | Max |
|---|---|---|---|---|---|
| Passive | 8.23% | 15.04% | 54.71% | -16.52% | 12.70% |
| Mean | 5.60% | 13.57% | 41.25% | -13.54% | 15.30% |
| Forecast | 11.38% | 21.45% | 53.04% | -25.68% | 29.48% |


|  | Max Drawdown | Peak | Bottom | Recover | Duration (to Recover) |
|---|---|---|---|---|---|
| Passive | -0.508 | 2007-10-31 | 2009-02-28 | 2012-03-31 | 1613 days |
| Mean | -0.5177 | 2000-08-31 | 2002-09-30 | 2013-07-31 | 4717 days |
| Forecast | -0.5051 | 2000-03-31 | 2003-02-28 | 2009-02-28 | 3256 days |


|  | alpha | SPY | r-squared | Treynor Ratio | Info Ratio |
|---|---|---|---|---|---|
| Passive | 0.00% | 100.00% | 100.00% | 8.23% | nan% |
| Mean | -1.57% | 87.06% | 93.12% | 6.43% | -43.99% |
| Forecast | 6.58% | 58.27% | 16.70% | 19.53% | 33.62% |


## 5.11 Conclusions

The Regression-Forecast Strategy **is attractive** as part of a portfolio.
- *As part of a portfolio*: Alpha is positive, Info ratio is positive.

The Regression-Forecast Strategy **is mediocre** as a stand-alone strategy compared to SPY.
- *Stand-alone*: Mean is higher than SPY (Passive), but Vol is much higher, so Sharpe is slightly lower.

The Mean-Forecast Strategy **is not attractive** on its own or as part of a portfolio.
- *Stand-alone*: Mean and Sharpe are lower than SPY (Passive)
- *As part of a portfolio*: Alpha is negative, Info ratio is substantially negative

# 6 FX Carry

In [25]:
DATAPATH = '../data/final_exam_data.xlsx'
SHEET = 'fx rates'
fx = pd.read_excel(DATAPATH,sheet_name=SHEET)
fx.set_index('date',inplace=True)

In [26]:
SHEET = 'risk-free rates'
rf = pd.read_excel(DATAPATH,sheet_name=SHEET)
rf.set_index('date',inplace=True)

USDRF = 'RF-USD'

## 6.1 Prepare Data

In [27]:
logFX = np.log(fx)
logRF = np.log(rf+1)

# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
pd.concat([logFX.mean().to_frame('Mean'), logRF.mean().to_frame('Mean')])

|  | Mean |
|---|---|
| USMX | -2.5801 |
| RF-USD | 0.0017 |
| RF-MXN | 0.0066 |


## 6.2 Excess Returns

### Excess log returns on FX can be written as
$$\tilde{\texttt{r}}^{\text{MXN}}_{t+1} \equiv \left(\texttt{s}_{t+1}^{\text{MXN}} - \texttt{s}_{t}^{\text{MXN}}\right) - \left(\texttt{r}^{f,\text{USD}}_{t,t+1} - \texttt{r}^{f,\text{MXN}}_{t,t+1}\right)$$
which we can rewrite as
$$\tilde{\texttt{r}}^{\text{MXN}}_{t+1} = \Delta{\texttt{s}}_{t+1}^{\text{MXN}}  - \tilde{\texttt{r}}^{f,\text{USD}}_{t,t+1}$$
where 
$$\Delta{\texttt{s}}_{t+1}^{\text{MXN}} \equiv \texttt{s}_{t+1}^{\text{MXN}} - \texttt{s}_{t}^{\text{MXN}}$$
and
$$\tilde{\texttt{r}}^{f,\text{USD}}_{t,t+1} \equiv \texttt{r}^{f,\text{USD}}_{t,t+1} - \texttt{r}^{f,\text{MXN}}_{t,t+1}$$

Note then that the return realized at $t+1$ depends on a time-$t+1$ variable, $\Delta{\texttt{s}}_{t+1}^{\text{MXN}}$ and a time-$t$ variable, $\tilde{\texttt{r}}^{f,\text{USD}}_{t,t+1}$.

Because of this, we will need to define both and **shift the spread of risk-free rates a period.**

In [28]:
logRFspread = (logRF['RF-USD']-logRF['RF-MXN']).shift(1).rename('RF Spread')
logFXgrowth = logFX['USMX'].diff().rename('MXN Growth')

logRX = (logFXgrowth - logRFspread).to_frame('MXN Excess Return')
logRX

performanceMetrics(logRX,annualization=12)

|  | Mean | Vol | Sharpe | Min | Max |
|---|---|---|---|---|---|
| MXN Excess Return | 0.0273 | 0.1101 | 0.2477 | -0.1678 | 0.0833 |


## 6.3 FX Return Components

Look at the mean of the two individual components of the excess returns to see how they were impacted by average FX growth and interest-rate spread.

In [29]:
FXcomponents = pd.concat([logFXgrowth,logRFspread,logRX],axis=1)
FXcomponents.mean()

MXN Growth          -0.0026
RF Spread           -0.0049
MXN Excess Return    0.0023
dtype: float64

- We see that the FX growth was negative on average, which means that the USD appreciated over the sample. (This was a drag on mean excess returns to MXN.)
- We see that the USD-MXN spread was negative on average, which indicates that MXN rates were higher on average, which **helped the excess return to MXN** over this sample.
- (In fact the latter effect was large enough to outweigh the former effect, and deliver a positive mean excess return over the sample.)

Also fine if you checked the overall trend by comparing the first and last values of the log or level values. Those interpretations would give the same answer.

## 6.4

The majority interpretation of this question is whether cumulative MXN returns underperform the USD risk-free rate. 
- So we are investigating if the sum of log returns is smaller than the sum of log USD risk-free rates.
- But this is equivalent to whether the sum of **excess** log returns is less than zero.
- Which is equivalent to whether the **mean excess log** return is less than zero.
$$\mathcal{P}\left(r_{t,t+h}<r^{f}_{t,t+h}\right)$$
$$=\mathcal{P}\left(\tilde{r}_{t,t+h}<0\right) = 
\mathcal{P}\left(\overline{\tilde{r}} < 0\right)$$

Assuming log-normal returns means that excess log returns are distributed as
$$\tilde{r}_{t,t+h}\sim \mathcal{N}\left(h\tilde{\mu},h\tilde{\sigma}^2\right)$$
$$\overline{\tilde{r}}\sim \mathcal{N}\left(\tilde{\mu},\frac{\tilde{\sigma}^2}{h}\right)$$
Thus,
$$\mathcal{P}\left(\overline{\tilde{r}} < 0\right) = \Phi\left(-\frac{\tilde{\mu}}{\frac{\tilde{\sigma}}{\sqrt{h}}}\right) = 
\Phi\left(-\sqrt{h}\frac{\tilde{\mu}}{\tilde{\sigma}}\right)$$
where $\Phi$ denotes the standard-normal CDF.

In [30]:
import scipy.stats as scistats

YRS = 10
PERYR = 12

mu_tilde = logRX.mean()
sigma_tilde = logRX.std()
prob_rx = scistats.norm.cdf(-np.sqrt(YRS*PERYR)*(mu_tilde/sigma_tilde))[0]
underperform = pd.DataFrame(prob_rx, columns=['RfUSD'],index=['Probability of Underperforming'])

### You could have interpreted this slightly differently
You may have read it as asking for the probability that cumulative MXN returns underperform 0, not the USD risk-free rate.

If so, you would solve as above with the same method, just using total log returns rather than excess log returns.
$$\mathcal{P}\left(r_{t,t+h}<0\right) = \mathcal{P}\left(\overline{r}<0\right) = \Phi\left(-\sqrt{h}\frac{\mu}{\sigma}\right)$$

In [31]:
### For those that interpreted it as whether the total return underperforms 0
# total returns
logR = logRX.add(logRF['RF-USD'],axis=0)
# underperformance
mu = logR.mean()
sigma = logR.std()
prob_r = scistats.norm.cdf(-np.sqrt(YRS*PERYR)*(mu/sigma))[0]
underperform.loc['Probability of Underperforming','0'] = prob_r

**Such reasonable interpretations are fine, and answers to both are below.**

In [32]:
underperform.style.format('{:.2%}')

|  | RfUSD | 0 |
|---|---|---|
| Probability of Underperforming | 21.67% | 8.59% |
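
As a sanity check on the first number, the formula can be evaluated from the reported summary statistics alone. The sketch below assumes the annualized Sharpe ratio of 0.2477 from the 6.2 table, and uses the fact that with monthly data and $h=120$, $\sqrt{h}\,\tilde{\mu}/\tilde{\sigma}$ equals $\sqrt{10}$ times the annualized Sharpe ratio. `normal_cdf` is a helper defined here, not part of the notebook.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard-normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

annual_sharpe = 0.2477   # annualized Sharpe of the MXN excess log return (6.2 table)
years = 10

# Phi(-sqrt(h) * mu / sigma) with monthly inputs equals Phi(-sqrt(years) * annualized Sharpe)
prob_underperform = normal_cdf(-math.sqrt(years) * annual_sharpe)
print(f"{prob_underperform:.2%}")
```

The result lands within rounding of the 21.67% reported for the USD risk-free benchmark, which also shows that the probability depends on the data only through the Sharpe ratio and the horizon.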


## 6.6 Forecasting Regression

In [33]:
y, X = logFXgrowth.to_frame().dropna().align(logRFspread.to_frame().dropna(),join='inner',axis=0)
mod = LinearRegression().fit(X,y)

FXpredictOLS = pd.DataFrame(
    {'alpha':mod.intercept_[0],
     'beta':mod.coef_[0,0],
     'r-squared':mod.score(X,y)},
    index=['MXN Growth'])

FXpredictOLS

|  | alpha | beta | r-squared |
|---|---|---|---|
| MXN Growth | -0.0079 | -1.0954 | 0.0111 |


## 6.5 Predicting Appreciation or Depreciation?
### As announced this problem should follow 6.6.

**In short, according to the regression, a larger MXN-USD interest-rate spread predicts a depreciation of the USD.**

Based on the OLS stats, an increase in the MXN-USD spread (which is a **decrease** in our regressor, given that it is constructed as USD-MXN) predicts an **increase** in the FX rate, which is a **depreciation** in the USD relative to MXN.

Note that this logic is the opposite of the UIP: it says that the bigger interest rate differential is not offset by an appreciation of the USD, but rather is exacerbated by a depreciation of the USD.
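
Written out in the notation of 6.2, the forecasting regression from 6.6 is
$$\Delta{\texttt{s}}_{t+1}^{\text{MXN}} = \alpha + \beta\left(\texttt{r}^{f,\text{USD}}_{t,t+1} - \texttt{r}^{f,\text{MXN}}_{t,t+1}\right) + \epsilon_{t+1}$$
where $\epsilon_{t+1}$ is the regression residual (a symbol introduced here). UIP corresponds to $\alpha=0$ and $\beta=1$, so that expected FX growth exactly offsets the interest-rate spread. The point estimate of $\beta\approx -1.10$ has the opposite sign, which is why a wider MXN-USD spread goes with a forecast of USD depreciation rather than appreciation.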

## 6.8 Forecasting Excess Returns

In [34]:
forecast = pd.DataFrame(data=mod.predict(X),index=X.index,columns=['Forecasted FX Growth'])
forecast['Forecasted MXN Excess Return'] = forecast.subtract(logRFspread,axis=0)
forecast

| date | Forecasted FX Growth | Forecasted MXN Excess Return |
|---|---|---|
| 1999-02-28 | 0.0195 | 4.4442e-02 |
| 1999-03-31 | 0.0158 | 3.7378e-02 |
| 1999-04-30 | 0.0114 | 2.8932e-02 |
| 1999-05-31 | 0.0083 | 2.3169e-02 |
| 1999-06-30 | 0.0085 | 2.3497e-02 |
| ... | ... | ... |
| 2021-06-30 | -0.0041 | -7.0651e-04 |
| 2021-07-31 | -0.0041 | -6.4460e-04 |
| 2021-08-31 | -0.0038 | -1.2599e-04 |
| 2021-09-30 | -0.0037 | 9.6851e-05 |


## 6.7 Fraction of Positive Months

This question could have been interpreted as the fraction of months with a positive forecast of excess returns, (in which case, it is based on the forecasts reported in 6.8)

Or it could have been interpreted as the fraction of months with a positive forecast of FX growth, which would have flowed naturally given that the question came right after the FX growth forecast regression.

**Either interpretation is fine, and both solutions are given.**

In [35]:
forecast_positive = ((forecast.dropna() > 0).sum() / forecast.dropna().shape[0]).to_frame().T
forecast_positive.index = ['fraction positive']
forecast_positive

|  | Forecasted FX Growth | Forecasted MXN Excess Return |
|---|---|---|
| fraction positive | 0.1099 | 0.6264 |


## 6.9

UIP Violations
- The mean excess returns to MXN (problem 6.2)
- The FX forecasting beta not equalling +1 (problem 6.6)
- The FX alpha not being zero

*The last two points speak to the point estimates; we did not look formally at the statistical significance.*

UIP Consistencies:
- UIP has nothing to say about excess return volatility (problem 6.2)
- UIP has nothing to say about forecasting r-squared (problem 6.6)

## 6.10

CIP Violations
- None!

CIP is a no-arbitrage relationship, and has nothing to say about the statistics we calculated here.

(You did not need to explain CIP, but as an aside, CIP would say something about the relationship of FX forward rates to interest-rate differentials.)