# Midterm 1

## 1. Short Answer

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm 
import matplotlib as mpl
import seaborn as sns
import scipy as scs
import math
plt.style.use("seaborn")
mpl.rcParams['font.family'] = 'serif'
%matplotlib inline

1. Suppose an endowment is optimizing across multiple asset classes: equities, bonds, commodities, and currencies. They are considering whether to add cryptocurrencies as a new asset
class for their mean-variance optimization. Do you think it is reasonable to make this decision
based on whether the Sharpe ratio of crypto is higher than the Sharpe ratios of their current
asset classes? Explain.

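- No, it is not reasonable to decide based on whether the Sharpe ratio of cryptocurrency is higher than those of the other asset classes. In mean-variance optimization, what matters is an asset's marginal contribution to the portfolio, which depends on its covariances with the existing asset classes as well as on its own mean and volatility. An asset with a low stand-alone Sharpe ratio can still improve the portfolio if it is weakly correlated with the existing holdings, and a high-Sharpe asset can add little if it is highly correlated with them.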
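To illustrate the point about covariances, here is a minimal sketch with made-up numbers (the means, volatilities, and correlations are assumptions, not estimates from any data). It relies on the standard result that the maximum attainable Sharpe ratio equals sqrt(mu' Sigma^{-1} mu) for excess-return means mu.

```python
import numpy as np

def max_sharpe(mu, cov):
    """Maximum Sharpe ratio attainable from assets with excess-return means mu and covariance cov."""
    return float(np.sqrt(mu @ np.linalg.solve(cov, mu)))

# Hypothetical annualized inputs: two existing asset classes plus one candidate.
vols = np.array([0.15, 0.10, 0.60])
mu = np.array([0.075, 0.04, 0.12])   # stand-alone Sharpe ratios: 0.50, 0.40, 0.20
corr = np.array([[1.0, 0.3, 0.0],
                 [0.3, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])   # candidate is uncorrelated with the rest
cov = np.outer(vols, vols) * corr

sharpe_without = max_sharpe(mu[:2], cov[:2, :2])
sharpe_with = max_sharpe(mu, cov)
print(f"without candidate: {sharpe_without:.3f}, with candidate: {sharpe_with:.3f}")
```

In this setup the candidate has the lowest stand-alone Sharpe ratio of the three, yet including it raises the best attainable Sharpe ratio because it is uncorrelated with the other two.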

2. (5pts) True or False. (And explain your reason.)
We found that small changes in estimated mean returns have large impacts on the mean-variance
frontier. (By “frontier”, we mean the set of achievable means and variances.)


- **True:** Small changes in estimated mean returns can drastically change the mean-variance frontier. The frontier is extremely sensitive to small changes in capital market assumptions because the efficient allocations depend on the inverse of the covariance matrix, which can be ill-conditioned: a slight change in one asset's estimated mean can produce a large change in the optimal weights, and therefore in the frontier.
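A minimal sketch of this sensitivity, tracking the tangency weights rather than the frontier itself. The two assets, their 95% correlation, and the mean returns are hypothetical values chosen to make the effect visible.

```python
import numpy as np

def tangency_weights(mu, cov):
    """Tangency weights: Sigma^{-1} mu, rescaled to sum to one."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

# Two hypothetical assets with equal volatility and a 0.95 correlation.
vol, rho = 0.20, 0.95
cov = vol**2 * np.array([[1.0, rho],
                         [rho, 1.0]])

w_base = tangency_weights(np.array([0.10, 0.11]), cov)
# Raise the first asset's estimated mean by one percentage point.
w_bumped = tangency_weights(np.array([0.11, 0.11]), cov)

print("base weights:  ", w_base.round(3))
print("bumped weights:", w_bumped.round(3))
```

The one-point change in a single mean moves the first asset from a sizable short position to an equal-weight position, which is the kind of instability the answer describes.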

3. (5pts) One might estimate the Value-at-Risk directly from the historic data, (the empirical cdf,)
or from using an approximation based on the normal distribution.
Which did we find performed better in actual data? How did we judge which performed better?

Historical Value-at-Risk from the empirical cdf performed worse than the rolling Value-at-Risk measure from the Barnstable case. To judge this, count how often the actual return fell below the estimated 5% VaR (the hit ratio).
In the data, actual returns fell below the historical VaR 3% of the time, whereas they fell below the rolling VaR 4.9% of the time. The closer the observed frequency is to 5%, the better the estimate, because a 5% VaR should by definition be breached 5% of the time.
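The hit-ratio comparison can be sketched as follows. This runs on simulated returns, not the course data, and it assumes the rolling measure is a normal approximation with rolling volatility and a zero mean; the 60-period window and the variable names are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated zero-mean returns (an assumption; not the data used in the answer).
r = pd.Series(rng.normal(0.0, 0.04, size=5000))

q = 0.05
min_obs = 60

# Historical VaR: expanding empirical quantile, lagged so it only uses past data.
var_hist = r.expanding(min_periods=min_obs).quantile(q).shift(1)

# Normal-approximation VaR: z_q times the lagged rolling volatility (mean taken as zero).
z_q = -1.645  # approximate 5% quantile of the standard normal
var_norm = z_q * r.rolling(min_obs).std().shift(1)

def hit_ratio(returns, var):
    """Share of periods in which the return fell below the VaR forecast."""
    valid = var.notna()
    return float((returns[valid] < var[valid]).mean())

hits = {"historical": hit_ratio(r, var_hist),
        "rolling normal": hit_ratio(r, var_norm)}
print(hits)
```

On real data the two hit ratios can differ noticeably, since volatility changes over time; the simulated series here only shows the mechanics of the comparison.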

4. (5pts) What approach does Harvard take to getting more realistic weights?
What is a problem with this approach?

Harvard used a non-negative least squares approach to calculate more realistic weights. The problem with this approach is that the solution weights do not scale proportionally with the target mean.

5. What aspect of tools such as Ridge regression and LASSO regression is useful in mean-variance optimization? That is, what problem in the classic MV solution are they addressing?

While classic MV optimization produces the *most efficient portfolio* in sample, it produces extremely unrealistic portfolios with extreme weight allocations. Ridge and LASSO regressions penalize the size of the parameters, shrinking them toward zero (LASSO can set some exactly to zero), which yields more stable and more focused portfolios. The problem they address is the instability that comes from inverting a nearly singular covariance matrix.
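A rough illustration of the regularization idea, using covariance shrinkage toward the diagonal as a ridge-like stand-in (it is not the Ridge or LASSO regression itself). The inputs and the 50% shrinkage intensity are arbitrary assumptions.

```python
import numpy as np

def tangency_weights(mu, cov):
    """Tangency weights: Sigma^{-1} mu, rescaled to sum to one."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

# Two hypothetical, highly correlated assets.
vol, rho = 0.20, 0.95
mu = np.array([0.10, 0.11])
cov = vol**2 * np.array([[1.0, rho],
                         [rho, 1.0]])

# Shrink the covariance matrix halfway toward its own diagonal.
cov_shrunk = 0.5 * cov + 0.5 * np.diag(np.diag(cov))

w_classic = tangency_weights(mu, cov)
w_shrunk = tangency_weights(mu, cov_shrunk)

print("classic weights:", w_classic.round(3))
print("shrunk weights: ", w_shrunk.round(3))
```

Shrinking the off-diagonal terms makes the matrix better conditioned, so the long-short bet between the two near-identical assets is replaced by moderate long positions in both.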

6. (5pts) Consider three series: HFRIFWI, MLEIFCTR, and HDG.
Explain how they differ. That is, why were we interested in all three with regard to hedge-fund
replication.

ProShares developed the Hedge Replication ETF (**HDG**), which investors can buy to get exposure to the broad risk-return profile of hedge funds. HDG tracks the Merrill Lynch Factor Model (**MLEIFCTR**), which in turn is built to replicate the HFRI index (**HFRIFWI**). HDG does not try to replicate HFRI directly because the HFRI index is not investable.


## 2. Allocation (15 Points)

- Consider mean-variance optimization using total returns. (That is, you are NOT analyzing excess
returns. No need to subtract or otherwise consider a risk-free rate.)


1. (5pts) Report the weights of the Global Minimum Variance (GMV) portfolio, $\omega_v$, and the weights of the Tangency portfolio, $\omega_t$.
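
For reference, these are the closed-form expressions that the functions in this section appear to implement, written with $\Sigma$ for the covariance matrix of total returns, $\mu$ for the vector of mean total returns, $\mathbf{1}$ for a vector of ones, and $\mu^p$ for the target mean (the symbols $\delta$ and $\mu^p$ are notation introduced here, not taken from the exam):

```latex
\omega_v = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}},
\qquad
\omega_t = \frac{\Sigma^{-1}\mu}{\mathbf{1}'\Sigma^{-1}\mu},
\qquad
\omega^{*} = \delta\,\omega_t + (1-\delta)\,\omega_v,
\quad
\delta = \frac{\mu^{p} - \mu'\omega_v}{\mu'\omega_t - \mu'\omega_v}
```

Both $\omega_v$ and $\omega_t$ sum to one by construction, so any mix $\omega^{*}$ of the two does as well.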

In [5]:
file_path = "C:/Users/dcste/OneDrive/Portfolio_Theory/Homework_Jupyter/portfolio_theory/midterm_1.xlsx"
total_returns = pd.read_excel(file_path, parse_dates=True).set_index("date")
total_returns.head()

date,CL1,GC1,KC1,ES1,BP1
2009-01-31,-0.113532,0.048627,0.061111,-0.086109,-0.007831
2009-02-28,0.04468,0.015187,-0.079479,-0.107294,-0.008309
2009-03-31,0.087892,-0.021111,0.034402,0.087209,0.001745
2009-04-30,-0.013826,-0.036543,-0.006491,0.094679,0.032897
2009-05-31,0.287437,0.0983,0.185518,0.055172,0.088934


In [36]:
def gmv_portfolio(asset_return):
    """
    Return the Global Minimum Variance (GMV) portfolio weights as a Series.

    Inputs:
        asset_return - DataFrame of asset returns (T x n), one column per asset
    """
    asset_cov = asset_return.cov()
    inverted_cov = np.linalg.inv(asset_cov)
    one_vector = np.ones(len(asset_cov.index))

    # w_v = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
    num = inverted_cov @ one_vector
    den = one_vector @ inverted_cov @ one_vector
    return pd.Series(num / den, index=asset_cov.index)


def tangency_portfolio(asset_return):
    """Return the tangency portfolio weights (based on total returns) as a Series."""
    mu_tilde = asset_return.mean()
    cov_matrix = asset_return.cov()
    inverted_cov = np.linalg.inv(cov_matrix)
    one_vector = np.ones(cov_matrix.shape[0])

    # w_t = Sigma^{-1} mu / (1' Sigma^{-1} mu)
    num = inverted_cov @ mu_tilde.to_numpy()
    return pd.Series(num / (one_vector @ num), index=mu_tilde.index)


def mv_portfolio(asset_return, target_ret, tangency_port):
    """
    Return the mean-variance weights that hit target_ret (same frequency as
    the data), built as a mix of the tangency and GMV portfolios.
    """
    mu_tilde = asset_return.mean()
    weight_v = gmv_portfolio(asset_return)
    weight_t = tangency_port

    # delta = (target - mu' w_v) / (mu' w_t - mu' w_v)
    delta = (target_ret - mu_tilde @ weight_v) / (mu_tilde @ weight_t - mu_tilde @ weight_v)
    return delta * weight_t + (1 - delta) * weight_v


In [42]:
min_v = gmv_portfolio(total_returns)
tang_p = tangency_portfolio(total_returns)
mv_port = mv_portfolio(total_returns,.02,tang_p)


In [29]:
min_v

CL1   -0.030674
GC1    0.179111
KC1   -0.010639
ES1    0.092154
BP1    0.770048
dtype: float64

In [38]:
tang_p

CL1   -0.128124
GC1    1.191087
KC1    0.097813
ES1    4.220019
BP1   -4.380796
dtype: float64

In [43]:
mv_port

CL1   -0.065837
GC1    0.544268
KC1    0.028494
ES1    1.581636
BP1   -1.088561
dtype: float64
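
As a consistency check on the three weight vectors reported above, the sketch below recovers the mixing coefficient between the tangency and GMV portfolios from the printed (rounded) numbers. The name `delta` and the least-squares approach are choices made here for illustration.

```python
import numpy as np

# Weights as reported in the outputs above (order: CL1, GC1, KC1, ES1, BP1).
w_gmv = np.array([-0.030674, 0.179111, -0.010639, 0.092154, 0.770048])
w_tan = np.array([-0.128124, 1.191087, 0.097813, 4.220019, -4.380796])
w_mv = np.array([-0.065837, 0.544268, 0.028494, 1.581636, -1.088561])

# Solve w_mv - w_gmv = delta * (w_tan - w_gmv) for the scalar delta by least squares.
d = w_tan - w_gmv
delta = float(d @ (w_mv - w_gmv) / (d @ d))

# Rebuild the target-mean portfolio from the two funds and compare.
reconstructed = delta * w_tan + (1 - delta) * w_gmv
max_error = float(np.abs(reconstructed - w_mv).max())

print(f"delta = {delta:.4f}, max reconstruction error = {max_error:.1e}")
```

The reported target-mean portfolio puts roughly 36% in the tangency portfolio and 64% in the GMV portfolio, and the reconstruction matches the printed weights up to rounding.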

In [40]:
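def portfolio_stats(df_tilde, omega, annualize_fac):
    """Annualized mean, volatility, and mean/volatility ratio of the portfolio with weights omega."""
    # Mean
    mean = df_tilde.mean() @ omega * annualize_fac

    # Volatility (scaled by the square root of the annualization factor)
    vol = (df_tilde @ omega).std() * np.sqrt(annualize_fac)

    # Sharpe ratio (computed on the returns as passed in; no risk-free rate is subtracted here)
    sharpe_ratio = mean / vol

    return pd.DataFrame(data = [mean, vol, sharpe_ratio],
                        index = ['Mean', 'Volatility', 'Sharpe'],
                        columns = ['Portfolio Stats']).round(4)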

In [44]:
portfolio_stats(total_returns,mv_port,12)

,Portfolio Stats
Mean,0.24
Volatility,0.2245
Sharpe,1.0692


Suppose we re-did this problem assuming the existence of a risk-free rate. Would the
solution with mean return of 0.20 be guaranteed to have less than or equal volatility as the
solution above, where we did not have a risk-free rate? Why?


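Yes, the volatility is guaranteed to be less than or equal to that of the solution above. Adding a risk-free asset only enlarges the set of feasible portfolios: every portfolio of risky assets is still available, so the minimum volatility at a given target mean cannot increase. The two solutions coincide only at the tangency portfolio; at any other target mean, allocating part of the portfolio to the risk-free asset gives strictly lower volatility.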
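
A numerical sketch of this comparison, using the standard closed forms for the two frontiers. All inputs (means, volatilities, correlations, the risk-free rate, and the target) are hypothetical monthly values, not the exam data.

```python
import numpy as np

# Hypothetical monthly inputs (assumptions for illustration only).
mu = np.array([0.010, 0.006, 0.004])
vols = np.array([0.06, 0.04, 0.02])
corr = np.array([[1.0, 0.2, 0.1],
                 [0.2, 1.0, 0.3],
                 [0.1, 0.3, 1.0]])
cov = np.outer(vols, vols) * corr
rf = 0.001
target = 0.02

ones = np.ones(len(mu))
inv = np.linalg.inv(cov)
A = ones @ inv @ ones
B = ones @ inv @ mu
C = mu @ inv @ mu

# Risky assets only: minimum volatility at the target mean (frontier hyperbola).
vol_risky_only = np.sqrt((A * target**2 - 2 * B * target + C) / (A * C - B**2))

# With a risk-free asset: the frontier is a line from rf with slope equal to the max Sharpe ratio.
excess = mu - rf * ones
max_sharpe = np.sqrt(excess @ inv @ excess)
vol_with_rf = abs(target - rf) / max_sharpe

print(f"risky only: {vol_risky_only:.4f}, with risk-free asset: {vol_with_rf:.4f}")
```

Changing `target` or `rf` moves both numbers, but the volatility with the risk-free asset never exceeds the risky-only volatility.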

## 3. Replication
Consider replicating ES1 with BP1.

In [47]:
y = total_returns["ES1"]

X = sm.add_constant(total_returns["BP1"])

ESI_regress = sm.OLS(y,X).fit()
ESI_regress.params

const    0.011401
BP1      0.817343
dtype: float64

In [48]:
from statsmodels.regression.rolling import RollingOLS

In [54]:
model_rolling = RollingOLS(y,X, window =36)
rolling_betas = model_rolling.fit().params.copy()

rolling_betas.mean()

const    0.011664
BP1      0.749920
dtype: float64

In [56]:
rep_IS = (rolling_betas*X).sum(axis = 1, skipna = False)
rep_IS.tail()

date
2022-02-28    0.010155
2022-03-31   -0.009789
2022-04-30   -0.038274
2022-05-31    0.013573
2022-06-30   -0.030127
dtype: float64
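
The replication above applies each window's coefficients to the same date on which the window ends. A possible out-of-sample variant, sketched here on simulated data with plain pandas, lags the coefficients by one period so that each fitted value uses only past information. The series names, the true beta of 0.8, and the noise levels are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, window = 120, 36

# Simulated regressor and target (hypothetical stand-ins for BP1 and ES1).
x = pd.Series(rng.normal(0.0, 0.02, n))
y = 0.01 + 0.8 * x + pd.Series(rng.normal(0.0, 0.01, n))

# Rolling OLS with one regressor: beta = cov(y, x) / var(x), alpha = mean(y) - beta * mean(x).
beta = y.rolling(window).cov(x) / x.rolling(window).var()
alpha = y.rolling(window).mean() - beta * x.rolling(window).mean()

# Out-of-sample replication: coefficients estimated through t-1 applied to the regressor at t.
rep_oos = alpha.shift(1) + beta.shift(1) * x

print(rep_oos.dropna().head())
print("correlation with target:", round(rep_oos.corr(y), 3))
```

For a single regressor with an intercept, these rolling formulas give the same coefficients as a rolling least-squares fit, so the only change relative to the in-sample version is the one-period lag.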

In [57]:
rolling_betas.tail()

date,const,BP1
2022-02-28,0.01366,1.178892
2022-03-31,0.014433,1.144168
2022-04-30,0.01227,1.247557
2022-05-31,0.013008,1.186588
2022-06-30,0.010191,1.251756


## Part Four


In [66]:
w = np.repeat(1/2,2)
data = total_returns[["GC1","ES1"]]


In [63]:
def tail_risk_report(data, q):
    """Tail-risk summary per column: skewness, kurtosis, VaR, expected shortfall, and drawdown dates."""
    df = data.copy()
    df.index = data.index.date
    report = pd.DataFrame(columns = df.columns)

    report.loc['Skewness'] = df.skew()
    report.loc['Excess Kurtosis'] = df.kurtosis()
    report.loc['VaR'] = df.quantile(q)
    # Mean of the returns strictly below the q-quantile
    report.loc['Expected Shortfall'] = df[df < df.quantile(q)].mean()

    cum_ret = (1 + df).cumprod()
    rolling_max = cum_ret.cummax()
    drawdown = (cum_ret - rolling_max) / rolling_max
    report.loc['Max Drawdown'] = drawdown.min()
    report.loc['MDD Start'] = None
    report.loc['MDD End'] = drawdown.idxmin()
    report.loc['Recovery Date'] = None
    report.loc['Recovery period (days)'] = None

    for col in df.columns:
        mdd_end = report.loc['MDD End', col]
        # Peak: date of the running maximum up to the trough
        mdd_start = rolling_max.loc[:mdd_end, col].idxmax()
        report.loc['MDD Start', col] = mdd_start

        # Recovery: first date at or after the trough with zero drawdown
        after_trough = drawdown.loc[mdd_end:, col]
        recovered = after_trough[after_trough >= 0]
        if not recovered.empty:
            report.loc['Recovery Date', col] = recovered.index[0]
            # Days from the peak (MDD Start) to the recovery date, set per column
            # so that one unrecovered asset does not overwrite the others
            report.loc['Recovery period (days)', col] = (recovered.index[0] - mdd_start).days

    return report.round(4)

In [64]:
tail_risk_report(data,.05)



,GC1,ES1
Skewness,0.113357,-0.439491
Excess Kurtosis,0.142579,0.723828
VaR,-0.065664,-0.071244
Expected Shortfall,-0.08592,-0.091453
Max Drawdown,-0.429597,-0.203174
MDD Start,2011-08-31,2019-12-31
MDD End,2015-12-31,2020-03-31
Recovery Date,,2020-07-31
Recovery period (days),,213.0
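
The drawdown rows of the report can be checked by hand on a toy series. The four returns below are made up, and the variable names are illustrative.

```python
import pandas as pd

# Four hypothetical period returns.
returns = pd.Series([0.10, -0.20, 0.05, 0.30])

wealth = (1 + returns).cumprod()        # 1.10, 0.88, 0.924, 1.2012
running_peak = wealth.cummax()          # 1.10, 1.10, 1.10, 1.2012
drawdown = wealth / running_peak - 1    # 0.00, -0.20, -0.16, 0.00

max_drawdown = drawdown.min()
trough = drawdown.idxmin()              # period of the deepest drawdown
peak = wealth.loc[:trough].idxmax()     # period of the preceding high

# First period at or after the trough where the drawdown is back to zero, if any.
after = drawdown.loc[trough:]
recovered = after[after >= 0]
recovery = recovered.index[0] if not recovered.empty else None

print(max_drawdown, peak, trough, recovery)
```

Here the peak is period 0, the trough is period 1 with a 20% drawdown, and wealth only exceeds the old peak again in period 3. An asset that never regains its peak, like GC1 in the report, would leave `recovery` as `None`.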


In [67]:
portfolio_stats(data,w,12)

,Portfolio Stats
Mean,0.0916
Volatility,0.1141
Sharpe,0.8026
