# Midterm #1A
## FINM 36700 - 2021

## Imports

In [38]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from arch import arch_model
from arch.univariate import GARCH, EWMAVariance 
from sklearn import linear_model
import scipy
import scipy.stats as stats
from statsmodels.regression.rolling import RollingOLS
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
pd.set_option("display.precision", 4)

## Data

In [40]:
file_path = '../data/proshares_analysis_data.xlsx'
xl = pd.ExcelFile(file_path)
print('Sheet names: %s'%(xl.sheet_names))

df_desc = xl.parse('descriptions')
df_hf = xl.parse('hedge_fund_series', index_col = 0, parse_dates=True)
df_mf = xl.parse('merrill_factors', index_col = 0, parse_dates=True)
df_other = xl.parse('other_data', index_col = 0, parse_dates=True)
df_mf_ex = df_mf.subtract(df_mf['USGG3M Index'],axis=0).drop(columns=['USGG3M Index'])

Sheet names: ['descriptions', 'hedge_fund_series', 'merrill_factors', 'other_data']


# 1. Short Answer 

(These answers are longer than I expect of students--just being thorough.)

## 1
**False**. MV Optimization seeks to maximize Sharpe of the portfolio, but that is not achieved by weighting individual assets proportional to their individual Sharpe ratios. Rather, an asset's covariances are an important determinant in whether it has a high/low, positive/negative weight.
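A small numerical sketch may make this concrete. The means, volatilities, and correlations below are hypothetical (they are not taken from the exam data); the point is only that an asset with a positive Sharpe ratio can receive a negative tangency weight when it is highly correlated with a better asset.

```python
import numpy as np

# Hypothetical monthly excess-return means and vols for three assets (illustrative only).
mu = np.array([0.010, 0.008, 0.005])
vol = np.array([0.05, 0.05, 0.05])

# Assets 0 and 1 are highly correlated; asset 2 is uncorrelated with both.
corr = np.array([[1.0, 0.9, 0.0],
                 [0.9, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
Sigma = np.outer(vol, vol) * corr

# Tangency weights: Sigma^{-1} mu, rescaled to sum to one.
w_tan = np.linalg.solve(Sigma, mu)
w_tan /= w_tan.sum()

# Naive alternative: weights proportional to each asset's own Sharpe ratio.
sharpe = mu / vol
w_sharpe = sharpe / sharpe.sum()

print("Individual Sharpe ratios:", sharpe.round(4))
print("Sharpe-proportional weights:", w_sharpe.round(4))
print("Tangency weights:", w_tan.round(4))
```

In this sketch asset 1 has a positive Sharpe ratio but a negative tangency weight, because it mostly serves as a hedge for the more attractive, highly correlated asset 0.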

## 2
**False**. Based on our discussion of LETFs, they do not track their benchmark over the long run due to the nonlinearity of compounding. However, many do track their benchmark well over the short run. We saw for several S&P500 LETFs that their long-run cumulative return was only a fraction of the leveraged multiple of the benchmark's cumulative return.
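The compounding effect can be illustrated with a toy path. The sketch below assumes a hypothetical benchmark that alternates between +10% and -10% daily returns and a 3x fund that delivers exactly three times each daily return; neither is taken from the course data.

```python
# Hypothetical benchmark path: alternating +10% and -10% daily returns (illustrative only).
benchmark_returns = [0.10, -0.10] * 5
leverage = 3

def cumulative_return(returns):
    """Compound a sequence of simple returns into one cumulative return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

bench_cum = cumulative_return(benchmark_returns)
# The LETF matches the leveraged return each day, then compounds.
letf_cum = cumulative_return([leverage * r for r in benchmark_returns])
# What an investor might naively expect: leverage times the benchmark's cumulative return.
naive_cum = leverage * bench_cum

print(f"Benchmark cumulative return: {bench_cum:.4f}")
print(f"3x LETF cumulative return:   {letf_cum:.4f}")
print(f"3x benchmark cumulative:     {naive_cum:.4f}")
```

The fund tracks perfectly day by day, yet its cumulative return falls well below three times the benchmark's cumulative return.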

## 3
**Either answer could make sense.**

**YES Intercept** This product is new, and we have high uncertainty about what its mean will be over the long-run. Its mean over a single year will be a very noisy, imprecise measure of this. Accordingly, it makes sense to include an intercept to ensure that the one-year sample average is not influencing the regression too much.

**NO Intercept** We may be confident that while BITO is new, we have enough data on Bitcoin to expect its long-run mean is much higher than SPY's (possibly due to much higher risk). Thus, if our prior is that the mean of BITO is much higher than the mean of SPY, the replication will fail to match this unless we exclude the intercept, so that the difference in means influences the replication regression.
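The trade-off between the two specifications can be sketched on simulated data. Everything below is hypothetical (the series `x` and `y` stand in for the regressor and the target, with a deliberately large mean gap); it only illustrates how dropping the intercept pushes the beta to absorb part of the difference in means.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.02, 0.04, size=120)                   # hypothetical regressor returns
y = 0.03 + 1.5 * x + rng.normal(0.0, 0.02, size=120)   # hypothetical target with a large mean gap

# No intercept: beta = sum(x*y) / sum(x*x), so the mean of y influences beta.
beta_no_int = (x @ y) / (x @ x)

# With intercept: alpha absorbs the mean gap and beta fits only the variation.
X = np.column_stack([np.ones_like(x), x])
alpha, beta_int = np.linalg.lstsq(X, y, rcond=None)[0]
resid_int = y - alpha - beta_int * x

# Mean of the target left unexplained by the replicating position beta * x.
gap_int = y.mean() - beta_int * x.mean()
gap_no_int = y.mean() - beta_no_int * x.mean()

print(f"beta with intercept:    {beta_int:.4f}, unreplicated mean: {gap_int:.4f}")
print(f"beta without intercept: {beta_no_int:.4f}, unreplicated mean: {gap_no_int:.4f}")
```

With an intercept, the unreplicated mean equals the estimated alpha; without one, the replication gives up some fit on the variation in order to close part of that gap.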

## 4
**Yes**. Per HW#2, we found that HDG tracks HFRI (via ML Factor Exchange Series) with high R-squared both in-sample and out-of-sample.

It is also reasonable for an answer to emphasize that HDG's direct benchmark is not HFRI, but rather the ML series. Still, HDG tracks this series with high (even higher!) R-squared both in-sample and out-of-sample.

## 5
**Alpha is relative to the regressors used in the estimation.** The hedge fund is likely boasting about alpha relative to just the S&P500, or some other model. Even if the fund has unexplained mean returns relative to SPY, that does not mean it has unexplained mean returns relative to the ML style factors.
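A simulated example may help show how alpha depends on the chosen regressors. The factor names and parameters below are hypothetical: the fund is constructed to load on a market factor and a style factor with no true alpha, so any alpha against the market alone is just compensation for the omitted factor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 240
mkt = rng.normal(0.008, 0.04, n)      # hypothetical market factor
style = rng.normal(0.010, 0.03, n)    # hypothetical style factor, independent of mkt
fund = 0.5 * mkt + 0.8 * style + rng.normal(0.0, 0.002, n)  # no true alpha by construction

def ols_alpha(y, factors):
    """Return the intercept from an OLS regression of y on the given factors."""
    X = np.column_stack([np.ones(len(y))] + list(factors))
    coefs = np.linalg.lstsq(X, y, rcond=None)[0]
    return coefs[0]

alpha_single = ols_alpha(fund, [mkt])          # alpha relative to the market only
alpha_multi = ols_alpha(fund, [mkt, style])    # alpha relative to both factors

print(f"Monthly alpha vs. market only:  {alpha_single:.4f}")
print(f"Monthly alpha vs. both factors: {alpha_multi:.4f}")
```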

# 2. Allocation

In [41]:
def performanceMetrics(returns, annualization=1):
    metrics = pd.DataFrame(index=returns.columns)
    metrics['Mean'] = returns.mean() * annualization
    metrics['Vol'] = returns.std() * np.sqrt(annualization)
    metrics['Sharpe'] = (returns.mean() / returns.std()) * np.sqrt(annualization)

    metrics['Min'] = returns.min()
    metrics['Max'] = returns.max()

    return metrics

# When calculating the portfolio performance, we could also use the function below 
def portfolio_stats(omega, mu_tilde, Sigma, annualize_fac):
    
    # Mean
    mean = (mu_tilde @ omega) * annualize_fac

    # Volatility
    vol = np.sqrt(omega @ Sigma @ omega) * np.sqrt(annualize_fac)

    # Sharpe ratio
    sharpe_ratio = mean / vol
    
    df_result = pd.DataFrame(data = [mean, vol, sharpe_ratio], 
                              index = ['Mean', 'Volatility', 'Sharpe'], 
                              columns = ['Portfolio Stats'])

    return round(df_result, 4)

def compute_tangency(df_tilde, diagonalize_Sigma=False):

    Sigma = df_tilde.cov()

    # N is the number of assets

    N = Sigma.shape[0]

    Sigma_adj = Sigma.copy()

    if diagonalize_Sigma:

        Sigma_adj.loc[:,:] = np.diag(np.diag(Sigma_adj))

    mu_tilde = df_tilde.mean()

    Sigma_inv = np.linalg.inv(Sigma_adj)

    weights = Sigma_inv @ mu_tilde / (np.ones(N) @ Sigma_inv @ mu_tilde)

    # For convenience, I'll wrap the solution back into a pandas.Series object.
    omega_tangency = pd.Series(weights, index=mu_tilde.index)
    
    return omega_tangency, mu_tilde, Sigma_adj


def target_mv_portfolio(df_tilde, target_return=0.01, diagonalize_Sigma=False):

    omega_tangency, mu_tilde, Sigma = compute_tangency(df_tilde, diagonalize_Sigma=diagonalize_Sigma)

    Sigma_adj = Sigma.copy()

    if diagonalize_Sigma:

        Sigma_adj.loc[:,:] = np.diag(np.diag(Sigma_adj))

    Sigma_inv = np.linalg.inv(Sigma_adj)

    N = Sigma_adj.shape[0]

    delta_tilde = ((np.ones(N) @ Sigma_inv @ mu_tilde)/(mu_tilde @ Sigma_inv @ mu_tilde)) * target_return

    omega_star = delta_tilde * omega_tangency

    return omega_star, mu_tilde, Sigma_adj

#### 1) Weights of the tangency portfolio

In [42]:
omega_tangency, mu_tilde, Sigma = compute_tangency(df_mf_ex)
omega_tangency.to_frame('Tangency Weights')

| | Tangency Weights |
|---|---|
| SPY US Equity | 2.1736 |
| EEM US Equity | -0.1521 |
| EFA US Equity | -0.7548 |
| EUO US Equity | 0.1818 |
| IWM US Equity | -0.4485 |


In [5]:
omega_tangency.sum()

1.0000000000000007

#### 2) Weights of the optimal portfolio, with targeted excess mean return of 0.02 per month. Is the optimal portfolio invested in the risk-free rate?

In [6]:
omega_star, mu_tilde, Sigma = target_mv_portfolio(df_mf_ex, target_return=0.02)
omega_star.to_frame('MV Portfolio Weights')

| | MV Portfolio Weights |
|---|---|
| SPY US Equity | 2.5161 |
| EEM US Equity | -0.1761 |
| EFA US Equity | -0.8737 |
| EUO US Equity | 0.2105 |
| IWM US Equity | -0.5192 |


In [7]:
# The risky weights sum to more than 1, so the portfolio is short the risk-free rate (it borrows to fund the risky position)
omega_star.sum()

1.1575610833779102
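The risk-free position follows from the budget constraint: whatever is not held in risky assets sits in the risk-free rate. The short check below re-adds the rounded weights reported in the table above, so the total matches the notebook output only up to rounding.

```python
# Risky-asset weights as reported above (rounded to 4 decimals).
weights = {
    "SPY US Equity": 2.5161,
    "EEM US Equity": -0.1761,
    "EFA US Equity": -0.8737,
    "EUO US Equity": 0.2105,
    "IWM US Equity": -0.5192,
}

risky_total = sum(weights.values())
# Budget constraint: risky weights plus the risk-free weight must sum to one.
risk_free_weight = 1.0 - risky_total

print(f"Total risky weight: {risky_total:.4f}")
print(f"Risk-free weight:   {risk_free_weight:.4f}")
```

A negative risk-free weight means the portfolio borrows at the risk-free rate to lever up the tangency portfolio.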

#### 3) Report the mean, vol, and Sharpe ratio of the optimized portfolio. Annualize all stats.

In [8]:
df_optimal_port = pd.DataFrame(df_mf_ex @ omega_star, columns= ['optimal portfolio'])
performanceMetrics(df_optimal_port, annualization=12)

| | Mean | Vol | Sharpe | Min | Max |
|---|---|---|---|---|---|
| optimal portfolio | 0.24 | 0.1586 | 1.5136 | -0.1123 | 0.1868 |


#### 4) Re-calculate the optimal portfolio, with the same targeted excess mean return, but only use data ```through 2018``` in doing the calculation. Calculate the return in 2019-2021 based on those optimal weights. Report on those optimal weights. Report the mean, vol, and Sharpe ratio of the 2019-2021 performance.

In [9]:
df_mf_ex_is = df_mf_ex.loc[:'2018-12-31',]
df_mf_ex_oos = df_mf_ex.loc['2019-01-01':,]

In [10]:
omega_star_is, mu_tilde_is, Sigma_is = target_mv_portfolio(df_mf_ex_is, target_return=0.02)

omega_star_is

SPY US Equity    2.9575
EEM US Equity   -0.3045
EFA US Equity   -0.8239
EUO US Equity    0.1676
IWM US Equity   -0.7442
dtype: float64

In [11]:
df_optimal_port_oos = pd.DataFrame(df_mf_ex_oos @ omega_star_is, columns= ['optimal portfolio'])
performanceMetrics(df_optimal_port_oos, annualization=12)

| | Mean | Vol | Sharpe | Min | Max |
|---|---|---|---|---|---|
| optimal portfolio | 0.3531 | 0.2387 | 1.479 | -0.0925 | 0.2045 |


#### 5) Suppose that instead of optimizing these 5 risky assets, we optimized 5 commodity futures: oil, coffee, cocoa, lumber, cattle, and gold. Do you think the out-of-sample fragility problem would be better or worse than what we have seen optimizing equities?

The main reason the MV solution is “fragile” out-of-sample is the inversion of the covariance matrix. In HW#1 we learned that optimizing over highly correlated assets leads to overfitting (as seen in extreme long-short portfolios, etc.). Thus, we expect the optimization to be overfit particularly when the assets are highly correlated.

These commodities are much less correlated with each other than our five factors (which include several equity-focused securities). We saw lower correlation in commodities in one of our demos, but even from the stated descriptions alone we can infer that the commodities will likely be less correlated, and thus the inverted covariance matrix will be less of a problem.
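One way to quantify why correlation matters for the inversion is the condition number of the covariance matrix. The sketch below uses a stylized equicorrelated covariance matrix with hypothetical correlations of 0.1 and 0.9 and a common volatility of 5%; it is not estimated from the equity or commodity data.

```python
import numpy as np

def equicorrelated_cov(n_assets, rho, vol=0.05):
    """Covariance matrix with a common volatility and a common pairwise correlation."""
    corr = np.full((n_assets, n_assets), rho)
    np.fill_diagonal(corr, 1.0)
    return corr * vol**2

# Hypothetical low-correlation and high-correlation universes of five assets.
cond_low = np.linalg.cond(equicorrelated_cov(5, 0.1))
cond_high = np.linalg.cond(equicorrelated_cov(5, 0.9))

print(f"Condition number, rho = 0.1: {cond_low:.2f}")
print(f"Condition number, rho = 0.9: {cond_high:.2f}")
```

A larger condition number means small estimation errors in the means or covariances are amplified more strongly when the matrix is inverted, which is the source of the fragility.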

# 3. Hedging & Replication

**3.1:** (5pts) What is the optimal hedge ratio over the full sample of data? That is, for every dollar invested in EEM, what would you invest in SPY?

In [12]:
y = df_mf_ex['EEM US Equity']
X = df_mf_ex['SPY US Equity']

hedge_reg = sm.OLS(y, X).fit()

hedge_reg.params.to_frame(r'$h^{*}$')

| | $h^{*}$ |
|---|---|
| SPY US Equity | 0.9257 |


For every $\$1$ invested in EEM, we would short $\$0.9257$ of SPY to build the market-hedged position.
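Because the regression has a single regressor and no intercept, the hedge ratio has a simple closed form: the sum of cross products divided by the sum of squared regressor returns. The sketch below applies it to short hypothetical return series (not the exam data) and checks that the hedged position is orthogonal to the hedging instrument.

```python
# Hypothetical excess returns (illustrative only).
spy = [0.02, -0.01, 0.03, 0.01, -0.02]
eem = [0.03, -0.02, 0.02, 0.00, -0.03]

# No-intercept OLS slope: h = sum(x * y) / sum(x * x).
hedge_ratio = sum(x * y for x, y in zip(spy, eem)) / sum(x * x for x in spy)

# Long $1 of EEM, short hedge_ratio dollars of SPY.
hedged = [y - hedge_ratio * x for x, y in zip(spy, eem)]

# By construction, the hedged returns have zero cross product with SPY.
cross_product = sum(r * x for r, x in zip(hedged, spy))

print(f"Hedge ratio: {hedge_ratio:.4f}")
print(f"Cross product of hedged position with SPY: {cross_product:.2e}")
```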

**3.2:** (5pts) What is the mean, volatility, and Sharpe ratio of the hedged position, had we applied
that hedge throughout the full sample? Annualize the statistics.

In [13]:
def summary_stats(df, annual_fac):
    ss_df = (df.mean() * annual_fac).to_frame('Mean')
    ss_df['Vol'] = df.std() * np.sqrt(annual_fac)
    ss_df['Sharpe'] = ss_df['Mean'] / ss_df['Vol']
    
    return ss_df.T

In [14]:
# Use .iloc for positional access; params is indexed by regressor name
hedged_pos = (df_mf_ex['EEM US Equity'] - hedge_reg.params.iloc[0] * df_mf_ex['SPY US Equity']).to_frame('Market Hedged EEM')

summary_stats(hedged_pos, 12)

| | Market Hedged EEM |
|---|---|
| Mean | -0.0935 |
| Vol | 0.1258 |
| Sharpe | -0.7433 |


**3.3:** (5pts) Does it have the same mean as EEM? Why or why not?

In [15]:
print('EEM mean annualized excess return: ' + str(round(df_mf_ex['EEM US Equity'].mean() * 12, 4)))

EEM mean annualized excess return: 0.0378


No, the hedged portfolio does not have the same mean as EEM. This is because we have subtracted a hedged position of SPY from EEM. The mean of the hedged portfolio is the following:
> $\mu_{h} = \mu_{EEM} - \beta_{SPY,EEM} \cdot \mu_{SPY}$. 

The hedged portfolio is EEM with exposure to SPY hedged out, so as long as some exposure to SPY exists and SPY has a non-zero mean the mean of the hedged portfolio will be different than the mean of EEM.
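The decomposition can be checked against the rounded figures printed in the outputs above (EEM mean 0.0378, hedge ratio 0.9257, hedged mean -0.0935). The SPY mean is not printed in this notebook, so the sketch backs out the value implied by those three numbers; treat it as approximate.

```python
# Rounded annualized figures from the outputs above.
mu_eem = 0.0378       # EEM mean excess return
hedge_ratio = 0.9257  # regression beta of EEM on SPY
mu_hedged = -0.0935   # mean of the market-hedged position

# mu_hedged = mu_eem - hedge_ratio * mu_spy, so solve for the implied SPY mean.
implied_mu_spy = (mu_eem - mu_hedged) / hedge_ratio

# Mean return given up by shorting SPY.
hedge_cost = hedge_ratio * implied_mu_spy

print(f"Implied SPY mean excess return: {implied_mu_spy:.4f}")
print(f"Mean given up through the hedge: {hedge_cost:.4f}")
```

The hedge removes far more mean return than EEM earns on its own, which is why the hedged position has a negative mean.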

**Another acceptable answer if students interpreted this question as asking if the mean of the lhs and rhs of the regression are the same:**

The mean is not the same as EEM due to the fact that we did not include an intercept. Thus, the regression is balancing two objectives: match the mean and match the variation. To fit the variation, the regression gives up a lot of error on the mean return. This is particularly acute in this regression due to the fact that SPY and EEM have substantially different means.

**3.4:** (5pts) Suppose we estimated a multifactor regression where in addition to SPY, we had IWM as a regressor. Why might this regression be difficult to use for attribution or even hedging?

In [16]:
df_desc

| | Unnamed: 0 | Descriptions |
|---|---|---|
| 0 | MLEIFCTR Index | ML Factor Model |
| 1 | EFA US Equity | ISHARES MSCI EAFE ETF |
| 2 | HFRIFWI Index | Hedge Fund Research HFRI Fund |
| 3 | HDG US Equity | PROSHARES HEDGE REPLICAT ETF |
| 4 | UPRO US Equity | PROSHARES ULTRAPRO S&P 500 |
| 5 | TRVCI Index | Refinitiv VC Index |
| 6 | USGG3M Index | US Generic Govt 3 Mth |
| 7 | SPY US Equity | SPDR S&P 500 ETF TRUST |
| 8 | EEM US Equity | ISHARES MSCI EMERGING MARKET |
| 9 | EUO US Equity | PROSHARES ULTRASHORT EURO |


In [17]:
print('Correlation between IWM and SPY: ' + str(round(df_mf_ex.corr().loc['IWM US Equity', 'SPY US Equity'], 4)))

Correlation between IWM and SPY: 0.8816


IWM and SPY are highly correlated (both are ETFs that track a large number of US equities across many industries), which would lead to multicollinearity in our multifactor regression. The regression cannot cleanly separate the two exposures, so the individual $\beta$'s will be imprecisely estimated and the model will be prone to overfitting.

Because we will have more factors, rebalancing and transaction costs could also become potential issues. 
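The severity of the collinearity can be gauged with the variance inflation factor. For a regression with two regressors and an intercept, it equals 1 / (1 - rho^2), where rho is their correlation. The sketch below plugs in the correlation printed above; using that excess-return correlation here is an approximation.

```python
from math import sqrt

# Correlation between IWM and SPY as printed above.
rho = 0.8816

# Variance inflation factor for a two-regressor model.
vif = 1.0 / (1.0 - rho**2)
# Standard errors of the betas scale with the square root of the VIF.
se_inflation = sqrt(vif)

print(f"VIF: {vif:.2f}")
print(f"Standard-error inflation: {se_inflation:.2f}x")
```

Under these assumptions each beta's standard error is roughly twice what it would be with uncorrelated regressors.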

# 4. Modeling Risk

1. SPY and EFA are highly correlated, yet SPY has had a much higher return. How confident are we that SPY will outperform EFA over the next 10 years?
Note: $R_{t}$ denotes the log returns. 

$$
Pr(R^{SPY}_{t,t+10}>R^{EFA}_{t,t+10})
 = Pr(\overline{R^{SPY}_{t,t+10}}>\overline{R^{EFA}_{t,t+10}})
 = Pr(\overline{R^{SPY}_{t,t+10}} - \overline{R^{EFA}_{t,t+10}} >0)
$$

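$$
 \overline{R^{SPY}_{t,t+10}} - \overline{R^{EFA}_{t,t+10}} \sim N\left(\mu_{SPY} - \mu_{EFA}, \frac{Var(R^{SPY} - R^{EFA})}{10}\right)
$$

Here the mean and variance are annualized, and log returns are assumed i.i.d. normal, so the variance of the 10-year average is one tenth of the annual variance. Writing $\tilde{\mu} = \mu_{SPY} - \mu_{EFA}$ and $\tilde{\sigma}^2 = Var(R^{SPY} - R^{EFA})$, the probability is

$$
Pr(\overline{R^{SPY}_{t,t+10}} - \overline{R^{EFA}_{t,t+10}} >0) = 1 - \Phi\left(-\sqrt{10}\,\frac{\tilde{\mu}}{\tilde{\sigma}}\right)
$$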


In [18]:
df_mf['R_diff'] = np.log(1+df_mf['SPY US Equity']) - np.log(1+df_mf['EFA US Equity'])

In [19]:
tilde_mu = df_mf['R_diff'].mean()*12
tilde_sigma = df_mf['R_diff'].std()*np.sqrt(12)

def p(h, tilde_mu, tilde_sigma):
    x = - np.sqrt(h) * tilde_mu / tilde_sigma
    val = scipy.stats.norm.cdf(x)
    return val

print(f"The probability of SPY overperforming EFA over the next 10 years is {1 - p(10, tilde_mu, tilde_sigma)}.")

The probability of SPY overperforming EFA over the next 10 years is 0.9997096146741639.
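To see how the horizon drives this probability, the sketch below evaluates the same normal formula with the standard library. The annualized mean difference of 5% and volatility of 10% are hypothetical values chosen for illustration, not the estimates from the data.

```python
from math import sqrt
from statistics import NormalDist

def prob_outperform(h_years, mu_annual, sigma_annual):
    """Probability that the h-year average log-return difference is positive,
    assuming i.i.d. normal log returns."""
    return NormalDist().cdf(sqrt(h_years) * mu_annual / sigma_annual)

# Hypothetical annualized mean and volatility of the return difference.
mu_hyp, sigma_hyp = 0.05, 0.10

probs = {h: prob_outperform(h, mu_hyp, sigma_hyp) for h in (1, 10, 30)}
for h, prob in probs.items():
    print(f"Horizon {h:>2} years: {prob:.4f}")
```

The probability rises with the horizon because the mean of the average stays fixed while its standard deviation shrinks with the square root of the horizon.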


2. Calculate the 60-month rolling volatility of EFA.  
Use the latest estimate of the volatility (Sep 2021), along with the normality formula, to calculate
a Sep 2021 estimate of the 1-month, 1% VaR. In using the VaR formula, assume that the mean
is zero.

In [20]:
sigma_roll = df_mf['EFA US Equity'].dropna().rolling(60).std()
sigma_roll.tail()

date
2021-05-31    0.0417
2021-06-30    0.0416
2021-07-31    0.0414
2021-08-31    0.0414
2021-09-30    0.0417
Name: EFA US Equity, dtype: float64

In [21]:
# Use .iloc for positional access; the Series is indexed by date
vol_latest = sigma_roll.iloc[-1]

In [22]:
mu = 0
z_phi = scipy.stats.norm.ppf(0.01)
VaR_estimate = mu + z_phi*vol_latest

In [23]:
VaR_estimate

-0.09699717879816319

In [24]:
z_phi

-2.3263478740408408
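The VaR figure can be reproduced by hand from the rounded volatility shown in the rolling output (0.0417). The sketch below uses only the standard library, so the result agrees with the notebook value only up to the rounding of the volatility.

```python
from statistics import NormalDist

# Latest 60-month rolling volatility of EFA, rounded as printed above.
vol_monthly = 0.0417
mean_monthly = 0.0  # the question assumes a zero mean

# 1% quantile of the standard normal distribution.
z_1pct = NormalDist().inv_cdf(0.01)

# Normal VaR: mean plus quantile times volatility.
var_1pct = mean_monthly + z_1pct * vol_monthly

print(f"z-score at 1%:     {z_1pct:.4f}")
print(f"1-month, 1% VaR:   {var_1pct:.4f}")
```

Read as a loss threshold, this says that under normality there is a 1% chance of a monthly return worse than roughly -9.7%.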