# Midterm 1

## FINM 36700 - 2023

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

# Instructions

## Please note the following:

Points
* The exam is 100 points.
* You have 120 minutes to complete the exam.
* For every minute late you submit the exam, you will lose one point.

Submission
* You will upload your solution to the `Midterm 1` assignment on Canvas, where you downloaded this. (Be sure to **submit** on Canvas, not just **save** on Canvas.)
* Your submission should be readable (the graders must be able to understand your answers), and it should **include all code used in your analysis in a file format from which the code can be executed.**

Rules
* The exam is open-material, closed-communication.
* You do not need to cite material from the course github repo--you are welcome to use the code posted there without citation.

Advice
* If you find any question to be unclear, state your interpretation and proceed. We will only answer questions of interpretation if there is a typo, error, etc.
* The exam will be graded for partial credit.

## Data

**All data files are found in the class github repo, in the `data` folder.**

This exam makes use of the following data files:
* `midterm_1_data.xlsx`

This file has sheets for...
* `info` - names of each stock ticker
* `excess returns` - weekly excess returns on several stocks
* `SPY` - weekly excess returns on SPY

Note the data is **weekly** so any annualizations should use `52` weeks in a year.
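
As a minimal sketch of the annualization convention noted above, the helper below scales a weekly mean by `52` and a weekly volatility by `sqrt(52)`. The function name `annualize` and the example returns are hypothetical and are not part of the exam data.

```python
import numpy as np

WEEKS_PER_YEAR = 52  # weekly data, as stated above

def annualize(weekly_returns, periods=WEEKS_PER_YEAR):
    """Return (mean, volatility, Sharpe) annualized from periodic excess returns."""
    r = np.asarray(weekly_returns, dtype=float)
    mean = r.mean() * periods                # means scale linearly with time
    vol = r.std(ddof=1) * np.sqrt(periods)   # volatility scales with sqrt(time)
    return mean, vol, mean / vol

# Hypothetical weekly excess returns, for illustration only
example = [0.01, -0.02, 0.015, 0.005]
ann_mean, ann_vol, ann_sharpe = annualize(example)
print(f"mean={ann_mean:.4f}, vol={ann_vol:.4f}, Sharpe={ann_sharpe:.4f}")
```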

#### If useful
here is code to load in the data.

In [1]:
import pandas as pd

FILEIN = '../data/midterm_1_data.xlsx'
sheet_exrets = 'excess returns'
sheet_spy = 'spy'

retsx = pd.read_excel(FILEIN, sheet_name=sheet_exrets).set_index('date')
spy = pd.read_excel(FILEIN, sheet_name=sheet_spy).set_index('date')

## Scoring

| Problem | Points |
|---------|--------|
| 1       | 20     |
| 2       | 35     |
| 3       | 30     |
| 4       | 15     |

### Each numbered question is worth 5 points.

### Notation
(Hidden LaTeX commands)

$$\newcommand{\mux}{\tilde{\boldsymbol{\mu}}}$$
$$\newcommand{\wtan}{\boldsymbol{\text{w}}^{\text{tan}}}$$
$$\newcommand{\wtarg}{\boldsymbol{\text{w}}^{\text{port}}}$$
$$\newcommand{\mutarg}{\tilde{\boldsymbol{\mu}}^{\text{port}}}$$
$$\newcommand{\wEW}{\boldsymbol{\text{w}}^{\text{EW}}}$$
$$\newcommand{\wRP}{\boldsymbol{\text{w}}^{\text{RP}}}$$
$$\newcommand{\wREG}{\boldsymbol{\text{w}}^{\text{REG}}}$$

# 1. Short Answer

### No Data Needed

These problems do not require any data file. Rather, analyze the situation conceptually, based on the information below. 

## 1

In what sense was ProShares `HDG` successful in hedging the `HFRI`, and in what sense was it unsuccessful in tracking the `HFRI`?

<font color='orange'>
Answer: ProShares `HDG` replicated the `HFRI` in terms of variation (hence the high correlation and R-squared), but it delivered a lower mean return than the `HFRI` and therefore a lower Sharpe ratio.
</font>

## 2

We discussed multiple ways of calculating Value-at-Risk (VaR). What are the tradeoffs between using the normal distribution formula versus a directly empirical approach?

<font color='orange'>
Answer: 

* The normal-distribution formula assumes that returns are normally distributed, so it ignores the actual tails of the return distribution (the skewness and kurtosis of the sample). In exchange, it can provide a reasonable estimate with less data.

* The empirical (historical) VaR relies on the data available, so it must interpolate the percentile when the sample is small. It also assumes i.i.d. returns. However, it requires no assumption about the shape of the return distribution, and it is easy to implement.

</font>
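
To make the tradeoff concrete, here is a small sketch comparing the two VaR estimates on simulated data. The Student-t sample is a hypothetical stand-in for fat-tailed returns; it is not the exam data, and the numbers it produces are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Hypothetical fat-tailed weekly returns (Student-t with 3 degrees of freedom)
returns = 0.03 * rng.standard_t(df=3, size=5000)

q = 0.05
# Parametric: only the mean and volatility of the sample are used
var_normal = returns.mean() + norm.ppf(q) * returns.std(ddof=1)
# Empirical: the q-th quantile of the observed returns, no distributional assumption
var_empirical = np.quantile(returns, q)

print(f"normal VaR:    {var_normal:.4f}")
print(f"empirical VaR: {var_empirical:.4f}")
```

The two numbers generally differ when the data are not normal, which is the gap the answer above describes.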

## 3

Did we find that **TIPS** have been useful in expanding the mean-variance frontier in the past? Did we conclude they might be useful in the future? Explain.

<font color='orange'>
Answer: The tangency portfolio did not change when TIPS were excluded, so they did not expand the frontier in the past. They could still be useful in the future: in HW 2, changing the expected return of TIPS by just one standard deviation expanded the frontier.
</font>

## 4.

What aspect of the classic mean-variance optimization approach leads to extreme answers? How did regularization help with this issue?

<font color='orange'>
Answer: When the covariance matrix is nearly singular (i.e. $\det(\Sigma) \approx 0$), the mean-variance weights are highly sensitive to changes in the expected returns, so the optimization amplifies any estimation error in the mean returns. Regularizing the covariance matrix cuts the covariances (but not the asset variances) in half, which reduces the estimated dependence between the assets and therefore reduces the sensitivity of the weights to changes in the expected returns.
</font>
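
The sensitivity described above can be sketched with two hypothetical, highly correlated assets. The volatilities, correlation, and expected returns below are assumptions chosen for illustration, and `tangency_weights` is a helper name introduced here, not course code.

```python
import numpy as np

def tangency_weights(sigma, mu):
    """Weights proportional to inv(sigma) @ mu, rescaled to sum to one."""
    raw = np.linalg.solve(sigma, mu)
    return raw / raw.sum()

# Hypothetical pair of assets with 20% volatility and correlation 0.99
vols = np.array([0.20, 0.20])
corr = np.array([[1.0, 0.99], [0.99, 1.0]])
sigma = np.outer(vols, vols) * corr
sigma_reg = (sigma + np.diag(np.diag(sigma))) / 2   # halves the off-diagonal covariances

mu = np.array([0.10, 0.10])
mu_bumped = np.array([0.11, 0.10])                  # small change in one expected return

shifts = {}
for name, cov in [("sample", sigma), ("regularized", sigma_reg)]:
    shifts[name] = tangency_weights(cov, mu_bumped) - tangency_weights(cov, mu)
    print(name, "cond:", round(np.linalg.cond(cov), 1), "weight shift:", shifts[name].round(3))
```

With the sample covariance, a one-point change in one mean moves the weights by several hundred percentage points; with the regularized matrix the shift is a few points.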

***

# 2. Allocation

Consider a mean-variance optimization of **excess** returns provided in `midterm_1_data.xlsx`.

In [37]:
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.stats import skew,kurtosis,norm

In [3]:
def get_metrics(port_metrics, weights=None, adj_factor=12):
    # If weights are supplied, evaluate the resulting portfolio(s) instead of the individual assets
    if weights is not None and len(weights) > 0:
        port_metrics = port_metrics @ weights
    # Annualize the mean linearly and the volatility by the square root of adj_factor
    port_metrics_r = pd.DataFrame({"Mean": port_metrics.mean()*adj_factor,"Volatility":port_metrics.std()*np.sqrt(adj_factor)})
    port_metrics_r["Sharpe_Ratio"] = (port_metrics.mean() / port_metrics.std()) * np.sqrt(adj_factor)
    # Skewness and excess kurtosis are not annualized
    port_metrics_r["Skew"] = skew(port_metrics)
    port_metrics_r["Excess Kurtosis"] = kurtosis(port_metrics, fisher=True)
    return port_metrics_r

def VaR_CVaR_Drawdown_metrics(data_daily_return):
    result = pd.DataFrame()
    for asset in data_daily_return.columns:

        data_aux = data_daily_return[[asset]].copy()

        # Empirical 5th-percentile VaR, and CVaR as the mean return at or below it
        VaR = np.percentile(data_aux[asset].values,q = 5)
        CVaR = data_aux[data_aux[asset] <= VaR].mean().values[0]

        # Drawdown of the cumulative return relative to its running maximum
        data_aux_acum_return = (data_aux + 1).cumprod()
        data_aux_max_cum_return = data_aux_acum_return.cummax()
        data_aux_drawdown = ((data_aux_acum_return-data_aux_max_cum_return)/data_aux_max_cum_return)
        max_drawdown = data_aux_drawdown.min().values[0]
        max_drawdown_date = data_aux_drawdown.idxmin().values[0]
        # Peak: the highest cumulative return reached up to the bottom of the drawdown
        peak_idx = data_aux_acum_return.loc[:max_drawdown_date].idxmax().values[0]

        # Recovery: first date from the bottom onward on which the drawdown is back to (about) zero
        recovery_idx = data_aux_drawdown[max_drawdown_date:].gt(-0.00001).idxmax().values[0]

        aux_result = pd.DataFrame([[VaR,CVaR,max_drawdown,max_drawdown_date,peak_idx,recovery_idx,(recovery_idx - max_drawdown_date)/ np.timedelta64(1, 'D')]], columns= ["VaR","CVaR","Max Drawdown","Bottom","Peak","Recovery","Duration (days)"], index = [asset])
        result = pd.concat([result,aux_result],axis=0)

    # Note: the second return value is the drawdown series of the last asset only
    return result,data_aux_drawdown

def get_metrics_all(returns,adj_factor = 12):
    # Pass adj_factor through instead of hard-coding 12
    metrics1 = get_metrics(returns,adj_factor = adj_factor)
    metrics2,_ = VaR_CVaR_Drawdown_metrics(returns)
    return pd.merge(metrics1,metrics2, left_index= True, right_index=True, how = "left")

## 1. 

Report the following **annualized** statistics:
* mean
* volatility
* Sharpe ratio

Which assets have the highest / lowest Sharpe ratios?

In [4]:
metrics_ex_return = get_metrics(retsx, adj_factor=52)[["Mean","Volatility","Sharpe_Ratio"]]
metrics_ex_return = metrics_ex_return.sort_values(by = ["Sharpe_Ratio"],ascending=False)
metrics_ex_return.style.format('{:.2%}')

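|       | Mean   | Volatility | Sharpe_Ratio |
|-------|--------|------------|--------------|
| NVDA  | 65.07% | 46.81%     | 139.00%      |
| MSFT  | 28.81% | 24.02%     | 119.93%      |
| AAPL  | 31.94% | 28.39%     | 112.52%      |
| TSLA  | 56.97% | 60.70%     | 93.86%       |
| AMZN  | 23.95% | 31.04%     | 77.15%       |
| GOOGL | 19.33% | 27.42%     | 70.50%       |
| XOM   | 12.42% | 31.16%     | 39.86%       |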


In [5]:
print(f"The asset with the highest Sharpe Ratio is \n {metrics_ex_return[['Sharpe_Ratio']].head(1)} \n and the asset with the lowest SR is: \n {metrics_ex_return[['Sharpe_Ratio']].tail(1)}")

The asset with the highest Sharpe Ratio is 
       Sharpe_Ratio
NVDA      1.390011 
 and the asset with the lowest SR is: 
      Sharpe_Ratio
XOM      0.398557


## 2.

Report the weights of the tangency portfolio.

Also report the Sharpe ratio achieved by the tangency portfolio over this sample.

In [6]:
def weights_tang(return_db, adj_factor = 12):
    sigma = (return_db.cov()*adj_factor)
    mu_excess = (return_db.mean()*adj_factor)
    vector = np.ones(len(mu_excess))
    w_tan = (np.linalg.inv(sigma) @ mu_excess )/(np.transpose(vector) @ np.linalg.inv(sigma) @ mu_excess)
    weights_db = pd.DataFrame({"w_tan": w_tan})
    weights_db.index = return_db.columns
    return weights_db

In [7]:
w_tanget = weights_tang(retsx, adj_factor = 52)

In [8]:
get_metrics(retsx @ w_tanget, adj_factor = 52)

|       | Mean     | Volatility | Sharpe_Ratio | Skew     | Excess Kurtosis |
|-------|----------|------------|--------------|----------|-----------------|
| w_tan | 0.563474 | 0.358351   | 1.572409     | 0.004363 | 1.859662        |


## 3.

* What weight is given to the asset with the lowest Sharpe ratio?
* What Sharpe ratio does the lowest (most negative) weight asset have?

Explain. Support your answer with evidence.

In [9]:
Tanget_port = pd.merge(metrics_ex_return,w_tanget, left_index=True, right_index=True, how = "left")

In [10]:
print(f"The asset with the highest SR {Tanget_port['Sharpe_Ratio'].max()} has a weight of {Tanget_port.loc[(Tanget_port['Sharpe_Ratio'] == Tanget_port['Sharpe_Ratio'].max()),'w_tan']}")
display(Tanget_port[(Tanget_port["Sharpe_Ratio"] == Tanget_port["Sharpe_Ratio"].max())])

print(f"The asset with the lowest weight {Tanget_port.loc[(Tanget_port['w_tan'] == Tanget_port['w_tan'].min()),'w_tan']} has a SR of {Tanget_port.loc[(Tanget_port['w_tan'] == Tanget_port['w_tan'].min()),'Sharpe_Ratio']}")
display(Tanget_port.loc[(Tanget_port["w_tan"] == Tanget_port["w_tan"].min()),:])

The asset with the highest SR 1.3900105643328675 has a weight of NVDA    0.495996
Name: w_tan, dtype: float64


|      | Mean     | Volatility | Sharpe_Ratio | w_tan    |
|------|----------|------------|--------------|----------|
| NVDA | 0.650658 | 0.468096   | 1.390011     | 0.495996 |


The asset with the lowest weight GOOGL   -0.502721
Name: w_tan, dtype: float64 has a SR of GOOGL    0.70502
Name: Sharpe_Ratio, dtype: float64


|       | Mean     | Volatility | Sharpe_Ratio | w_tan     |
|-------|----------|------------|--------------|-----------|
| GOOGL | 0.193328 | 0.274217   | 0.70502      | -0.502721 |


## 4.

Let's examine the out-of-sample performance.

Calculate and report the following three allocations using only data through the end of 2022:
* tangency portfolio
* equally weighted portfolio
* a regularized approach, with a new formula shown below

where
$$\wEW_i = \frac{1}{n}$$

$$\wREG \sim \widehat{\Sigma}^{-1}\mux$$

$$\widehat{\Sigma} = \frac{\Sigma + \boldsymbol{2}\,\Sigma_D}{\boldsymbol{3}}$$
where $\Sigma_D$ denotes a *diagonal* matrix of the security variances, with zeros in the off-diagonals.

In [12]:
def weights_tag_reg(return_db, adj_factor = 12, diagonal_factor = 2, denominator = 3):
    sigma = (return_db.cov()*adj_factor)
    sigma_reg = (sigma + diagonal_factor*np.diag(np.diag(sigma)))/denominator
    mu_excess = (return_db.mean()*adj_factor)
    vector = np.ones(len(mu_excess))
    w_tan = (np.linalg.inv(sigma_reg) @ mu_excess )/(np.transpose(vector) @ np.linalg.inv(sigma_reg) @ mu_excess)
    weights_db = pd.DataFrame({"w_tan_reg": w_tan})
    weights_db.index = return_db.columns
    return weights_db

In [13]:
def weights_equally_weighted(return_db):
    n = len(return_db.columns)
    weights_db = pd.DataFrame({"w_equally_weighted": [1/n]*n})
    weights_db.index = return_db.columns
    return weights_db

In [14]:
retsx_2022 = retsx[retsx.index <= "2022-12-31"]

In [16]:
w_tanget = weights_tang(retsx_2022, adj_factor = 52)
w_equally_weighted = weights_equally_weighted(retsx_2022)
q_tag_reg = weights_tag_reg(retsx_2022, adj_factor = 52, diagonal_factor = 2, denominator = 3)

in_sample_w = pd.merge(pd.merge(w_tanget,w_equally_weighted, left_index=True, right_index=True, how = "left"),q_tag_reg, left_index=True, right_index=True, how = "left")
in_sample_metrics = get_metrics(retsx_2022 @ in_sample_w, adj_factor = 52)

In [17]:
in_sample_w.style.format('{:.2%}')

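|       | w_tan   | w_equally_weighted | w_tan_reg |
|-------|---------|--------------------|-----------|
| AAPL  | 31.06%  | 14.29%             | 23.73%    |
| MSFT  | 107.31% | 14.29%             | 33.08%    |
| AMZN  | -25.91% | 14.29%             | 4.72%     |
| NVDA  | 38.01%  | 14.29%             | 19.68%    |
| GOOGL | -75.15% | 14.29%             | 1.14%     |
| TSLA  | 10.16%  | 14.29%             | 9.02%     |
| XOM   | 14.53%  | 14.29%             | 8.64%     |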


In [18]:
in_sample_metrics

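|                    | Mean     | Volatility | Sharpe_Ratio | Skew      | Excess Kurtosis |
|--------------------|----------|------------|--------------|-----------|-----------------|
| w_tan              | 0.471913 | 0.331179   | 1.424947     | -0.120234 | 2.975133        |
| w_equally_weighted | 0.293432 | 0.260804   | 1.125106     | -0.296262 | 1.672516        |
| w_tan_reg          | 0.325593 | 0.260874   | 1.248085     | -0.36496  | 1.859803        |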


## 5.

Report the out-of-sample (2023) performance of all three portfolios in terms of annualized mean, vol, and Sharpe.

In [19]:
retsx_after_2022 = retsx[retsx.index > "2022-12-31"]

In [20]:
get_metrics(retsx_after_2022 @ in_sample_w, adj_factor = 52)

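|                    | Mean     | Volatility | Sharpe_Ratio | Skew      | Excess Kurtosis |
|--------------------|----------|------------|--------------|-----------|-----------------|
| w_tan              | 1.204709 | 0.443716   | 2.715043     | -0.115814 | -0.2066         |
| w_equally_weighted | 0.955133 | 0.246953   | 3.867668     | -0.171646 | 0.645163        |
| w_tan_reg          | 1.013509 | 0.250254   | 4.049917     | -0.151873 | 0.125547        |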


## 6.

Imagine just for this problem that this data is for **total** returns, not excess returns.

Report the weights of the global-minimum-variance portfolio.

In [21]:
def weights_MV(return_db, adj_factor = 12):
    sigma = (return_db.cov()*adj_factor)
    vector = np.ones(len(return_db.columns))
    # GMV weights: inv(sigma) @ 1, rescaled so that the weights sum to one (adj_factor cancels out)
    w_gmv = (np.linalg.inv(sigma) @ vector )/(np.transpose(vector) @ np.linalg.inv(sigma) @ vector)
    weights_db = pd.DataFrame({"w_MV": w_gmv})
    weights_db.index = return_db.columns
    return weights_db

In [22]:
w_MV = weights_MV(retsx)

In [23]:
w_MV

|       | w_MV      |
|-------|-----------|
| AAPL  | 0.206231  |
| MSFT  | 0.49125   |
| AMZN  | 0.160866  |
| NVDA  | -0.119168 |
| GOOGL | 0.011378  |
| TSLA  | -0.046927 |
| XOM   | 0.296369  |


In [24]:
MV_metrics = get_metrics(retsx @ w_MV, adj_factor = 52)

In [25]:
MV_metrics

|      | Mean     | Volatility | Sharpe_Ratio | Skew      | Excess Kurtosis |
|------|----------|------------|--------------|-----------|-----------------|
| w_MV | 0.180652 | 0.202905   | 0.890329     | -0.443317 | 2.620049        |


In [26]:
T_metrics = get_metrics(retsx @ w_tanget, adj_factor = 52)

In [27]:
T_metrics

|       | Mean     | Volatility | Sharpe_Ratio | Skew      | Excess Kurtosis |
|-------|----------|------------|--------------|-----------|-----------------|
| w_tan | 0.524255 | 0.340746   | 1.538553     | -0.077035 | 2.531587        |


## 7.

To target a mean return of 0.005%, would you be long or short this global minimum variance portfolio?

In [28]:
r_target = 0.005/100
position_tangency = (r_target-0.180652)/(0.524255-0.180652)

In [29]:
print(f"The weight on the MV portfolio needed to achieve a target return of 0.005% is {(1 - position_tangency)*100:.2f}%, so it is a long position")

The weight on the MV portfolio needed to achieve a target return of 0.005% is 152.56%, so it is a long position
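
The arithmetic above can be sketched as a small helper that splits a target mean between the GMV and tangency portfolios. The function name `two_fund_split` is hypothetical; the inputs are the annualized means reported in the two tables above.

```python
def two_fund_split(mu_target, mu_gmv, mu_tan):
    """Weights (on GMV, on tangency) that hit mu_target and sum to one."""
    w_tan = (mu_target - mu_gmv) / (mu_tan - mu_gmv)
    return 1.0 - w_tan, w_tan

# Annualized means taken from the MV_metrics and T_metrics tables above
w_gmv, w_tan = two_fund_split(mu_target=0.005 / 100, mu_gmv=0.180652, mu_tan=0.524255)
print(f"GMV weight: {w_gmv:.2%}, tangency weight: {w_tan:.2%}")
```

Because the target is below the GMV mean, the tangency weight is negative and the GMV weight exceeds 100%.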


***

# 3. Performance

## 1. 

Report the following performance metrics of excess returns for Tesla (`TSLA`).
* skewness
* kurtosis

You are not annualizing any of these stats.

What do these metrics indicate about the nature of the returns?

In [30]:
get_metrics(retsx).loc["TSLA"]

Mean               0.131476
Volatility         0.291606
Sharpe_Ratio       0.450868
Skew               0.439764
Excess Kurtosis    1.492697
Name: TSLA, dtype: float64

We can interpret this as evidence that TSLA returns are not normally distributed: skewness > 0 indicates that large positive returns are more pronounced than large negative ones, and excess kurtosis > 0 (i.e. kurtosis > 3) indicates fat tails.
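
As a quick check of the convention used in this interpretation, the sketch below compares `scipy.stats.kurtosis` with `fisher=True` (excess kurtosis, 0 for a normal) and `fisher=False` (raw kurtosis, 3 for a normal). The simulated samples are hypothetical and only serve to illustrate the two scales.

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(1)
normal_sample = rng.standard_normal(100_000)
fat_tailed_sample = rng.standard_t(df=5, size=100_000)  # hypothetical fat-tailed data

for name, x in [("normal", normal_sample), ("student-t(5)", fat_tailed_sample)]:
    excess = kurtosis(x, fisher=True)    # measured relative to the normal benchmark
    raw = kurtosis(x, fisher=False)      # the normal benchmark is 3 on this scale
    print(f"{name:>12}: skew={skew(x):+.2f}, excess kurtosis={excess:.2f}, kurtosis={raw:.2f}")
```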

## 2. 

Report the maximum drawdown for `TSLA` over the sample.
* Ignore that your data is in excess returns rather than total returns.
* Simply proceed with the excess return data for this calculation.

In [31]:
get_metrics_all(retsx,adj_factor = 52).loc["TSLA","Max Drawdown"]

-0.6821852296331565
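
For reference, the maximum drawdown calculation can be sketched in isolation as follows. The function `max_drawdown` and the three-period return series are hypothetical and chosen so the result can be checked by hand.

```python
import pandas as pd

def max_drawdown(returns: pd.Series) -> float:
    """Largest peak-to-trough decline of the cumulative return path."""
    wealth = (1 + returns).cumprod()
    running_peak = wealth.cummax()
    drawdown = wealth / running_peak - 1
    return drawdown.min()

# Hypothetical returns: +10%, -50%, +20%
example = pd.Series([0.10, -0.50, 0.20])
mdd = max_drawdown(example)   # the path falls from 1.10 to 0.55, a 50% drawdown
print(f"max drawdown: {mdd:.2%}")
```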

## 3.

For `TSLA`, calculate the following metrics, relative to `SPY`:
* market beta
* alpha
* sortino ratio

Annualize alpha and sortino ratio.

In [32]:
import statsmodels.api as sm

def benchmark_regresion(data,benchmark = "SPY US Equity",adj = 12):
    result = pd.DataFrame()
    for asset in data.drop([benchmark],axis=1).columns:
        X = sm.add_constant(data[benchmark])
        y = data[asset]
        mod = sm.OLS(y, X).fit()
        inter, beta = mod.params.values[0], mod.params.values[1]
        rsquare = mod.rsquared
        std_errors= mod.resid.std()
        TR = (y.mean()/beta)*adj
        IR = (inter/std_errors)*np.sqrt(adj)
        Sortino = y.mean()/data[data[asset]<0][asset].std() * np.sqrt(adj)
        aux_result = pd.DataFrame([[inter*adj,beta,rsquare,std_errors,y.mean()*adj,TR,IR,Sortino]],columns=["Alpha","Beta","R-square","std_errors","R_mean","Treynor Ratio","Information Ratio","Sortino Ratio"], index = [asset])
        result = pd.concat([result,aux_result],axis=0)
    return result
     

In [33]:
returns = pd.merge(retsx,spy,left_index=True,right_index=True,how="left")

In [34]:
benchmark_regresion(returns,benchmark = "SPY",adj = 52).loc["TSLA",["Alpha","Beta","Sortino Ratio"]]

Alpha            0.309470
Beta             1.776825
Sortino Ratio    1.642329
Name: TSLA, dtype: float64

## 4.

Continuing with `TSLA`, calculate the full-sample, 5th-percentile CVaR.
* Use the `normal` formula, assuming mean returns are zero.
* Use the full-sample volatility.

Use the entire sample to calculate a single CVaR number. 

CVaR Parametric (q = `0.05`), where $\phi$ is the standard normal density:

$$\text{CVaR}_{0.05} = -\sigma\,\frac{\phi(1.65)}{0.05}$$
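
The multiplier in this formula can be checked numerically. The sketch below computes the closed form at the exact 5% quantile (about -1.645, which the formula above rounds to 1.65 in absolute value) and compares it with a Monte Carlo estimate from simulated standard normal draws; the simulation is only an illustrative check.

```python
import numpy as np
from scipy.stats import norm

q = 0.05
z_q = norm.ppf(q)                       # about -1.645
cvar_multiplier = -norm.pdf(z_q) / q    # E[z | z <= z_q] for a standard normal

# Monte Carlo check of the closed form
rng = np.random.default_rng(2)
z = rng.standard_normal(1_000_000)
cvar_simulated = z[z <= np.quantile(z, q)].mean()

print(f"closed form: {cvar_multiplier:.4f}, simulated: {cvar_simulated:.4f}")
```

Multiplying this (negative) number by the volatility gives the mean-zero normal CVaR.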

In [35]:
def CVaR_parametric(data, alpha = 0.05):
    return data.std()*(-norm.pdf(1.65)/alpha)

In [43]:
CVaR_parametric(retsx[["TSLA"]])

TSLA   -0.172172
dtype: float64

## 5.

Now calculate the 5th-percentile, one-period ahead, **VaR** for `TSLA`.

Here, calculate the running series of VaR estimates.

Again, 
* use the normal formula, with mean zero.

But now, use the rolling volatility, based on 
* rolling window or $m=52$ weeks.

Report the final 5 values of your calculated VaR series.

In [None]:
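def Rolling_window_VaR_CVaR(data, asset, alpha = 0.05, m = 52):
    data = data[[asset]].copy()  # copy to avoid pandas' SettingWithCopyWarning
    # Rolling volatility with mean assumed zero, lagged one period so each estimate is out-of-sample
    data["Rolling_Vol"] = np.sqrt((data.shift()[asset]**2).rolling(m).mean())
    data["Rolling_VaR"] = data["Rolling_Vol"]*(-1.65)#round(norm.ppf(alpha),2)
    # Same 1.65 cutoff as CVaR_parametric (rounding norm.ppf to 0 decimals would give -2)
    data["Rolling_CVaR"] = data["Rolling_Vol"]*(-norm.pdf(1.65)/alpha)
    # Hit ratio: share of periods in which the return falls below the VaR estimate
    Hit_Ratio = data.loc[data[asset] < data["Rolling_VaR"],asset].count()/len(data["Rolling_VaR"].dropna())
    return data, Hit_Ratio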

In [77]:
var = -1.65 * retsx["TSLA"].rolling(52).std().shift().dropna()
var.tail(5)

date
2023-06-16   -0.157886
2023-06-23   -0.157712
2023-06-30   -0.155459
2023-07-07   -0.153663
2023-07-14   -0.152046
Name: TSLA, dtype: float64

In [85]:
data, Hit_Ratio = Rolling_window_VaR_CVaR(retsx, "TSLA", alpha = 0.05, m = 52)


In [86]:
data["Rolling_VaR"].tail(5)

date
2023-06-16   -0.156624
2023-06-23   -0.156740
2023-06-30   -0.154200
2023-07-07   -0.152694
2023-07-14   -0.150967
Name: Rolling_VaR, dtype: float64

## 6. 

Calculate the out-of-sample **hit ratio** for your VaR series reported in your previous answer.

In [87]:
Hit_Ratio

0.05588235294117647
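
As a sanity check on what a hit ratio should look like, here is a sketch on simulated data. The returns are hypothetical i.i.d. normal draws with 5% weekly volatility, so a well-calibrated 5% VaR should be breached roughly 5% of the time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Hypothetical weekly returns, i.i.d. normal with 5% weekly volatility
r = pd.Series(0.05 * rng.standard_normal(2000))

m = 52
# Volatility estimated from the m observations before each date (mean assumed zero)
rolling_vol = np.sqrt((r.shift() ** 2).rolling(m).mean())
var_series = -1.65 * rolling_vol

valid = var_series.notna()
# Share of dates on which the realized return is below the VaR forecast
hit_ratio = (r[valid] < var_series[valid]).mean()
print(f"hit ratio: {hit_ratio:.3f}")
```

A hit ratio far above 5% would suggest the VaR understates risk, and one far below would suggest it is too conservative.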

***

# 4. Hedging

## 1. 

Consider the following scenario: you are holding a \$100 million long position in `NVDA`. You wish to hedge the position using some combination of 
* `AAPL`
* `AMZN`
* `GOOGL`
* `MSFT`

Report the positions you would hold of those 4 securities for an optimal hedge.

Note:
* In the regression estimation, include an intercept.
* Use the full-sample regression. No need to worry about in-sample versus out-of-sample.

In [133]:
def Linear_Factor_Descomposition(data, y_asset,x_asset, adj = 12, constant = True):

    Y = data[y_asset]
    X = data[x_asset]

    if constant:
        X = sm.add_constant(X)

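    mod = sm.OLS(Y, X).fit()
    # First coefficient: the intercept when constant=True (otherwise it is the first beta)
    inter = mod.params.values[0]

    rsquare = mod.rsquared
    std_errors= mod.resid.std()
    # Annualized volatility of the residual (the basis)
    tracking_error = mod.resid.std() * np.sqrt(adj)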

    metrics = pd.DataFrame([[inter,inter*adj,rsquare,std_errors,tracking_error]],columns=["Alpha","Alpha Adj","R-square","std_errors","tracking_error"], index = [y_asset])
    return pd.DataFrame(mod.params).T, metrics,mod.summary()

In [134]:
beta, metrics,summary = Linear_Factor_Descomposition(retsx, y_asset = "NVDA",x_asset = ["AAPL","AMZN","GOOGL","MSFT"], adj = 52,constant = True)

In [130]:
display((beta[["AAPL","AMZN","GOOGL","MSFT"]] * -100_000_000).style.format('${:,.0f}'))
print(f"Total amount invested: ${(beta[['AAPL','AMZN','GOOGL','MSFT']] * -100_000_000).sum(axis=1)[0]:,.0f}")

|   | AAPL          | AMZN          | GOOGL     | MSFT          |
|---|---------------|---------------|-----------|---------------|
| 0 | \$-34,168,649 | \$-41,725,986 | \$784,795 | \$-58,789,673 |


Total amount invested: $-133,899,513
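
The mapping from regression betas to hedge positions can be sketched on simulated data. Everything below (the two instruments, the true betas of 0.8 and 0.5, the noise levels) is a hypothetical setup, not the exam data; it uses `numpy.linalg.lstsq` in place of `statsmodels` only to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
# Hypothetical hedging instruments and a target built from known betas
X = 0.03 * rng.standard_normal((n, 2))
true_betas = np.array([0.8, 0.5])
y = 0.002 + X @ true_betas + 0.01 * rng.standard_normal(n)

# OLS with an intercept: the column of ones absorbs the difference in means
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
alpha, betas = coef[0], coef[1:]

position = 100_000_000                  # long position in the target
hedge_positions = -betas * position     # short beta dollars of each instrument
# Volatility of the hedged position's residual (the basis), per period
basis_vol = (y - design @ coef).std(ddof=design.shape[1])

print("hedge positions:", np.round(hedge_positions))
print(f"basis volatility per period: {basis_vol:.4f}")
```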


## 2.

How well does the hedge do? Cite a regression statistic to support your answer.

Also estimate the volatility of the basis, (epsilon.)

In [131]:
summary

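| Dep. Variable:    | NVDA             | R-squared:          | 0.458    |
|-------------------|------------------|---------------------|----------|
| Model:            | OLS              | Adj. R-squared:     | 0.453    |
| Method:           | Least Squares    | F-statistic:        | 81.81    |
| Date:             | Sat, 19 Oct 2024 | Prob (F-statistic): | 2.85e-50 |
| Time:             | 20:03:16         | Log-Likelihood:     | 636.39   |
| No. Observations: | 392              | AIC:                | -1263.0  |
| Df Residuals:     | 387              | BIC:                | -1243.0  |
| Df Model:         | 4                |                     |          |
| Covariance Type:  | nonrobust        |                     |          |

|       | coef    | std err | t      | P>\|t\| | [0.025 | 0.975] |
|-------|---------|---------|--------|---------|--------|--------|
| const | 0.0053  | 0.002   | 2.134  | 0.033   | 0.000  | 0.010  |
| AAPL  | 0.3417  | 0.084   | 4.045  | 0.000   | 0.176  | 0.508  |
| AMZN  | 0.4173  | 0.079   | 5.269  | 0.000   | 0.262  | 0.573  |
| GOOGL | -0.0078 | 0.102   | -0.077 | 0.939   | -0.208 | 0.192  |
| MSFT  | 0.5879  | 0.126   | 4.669  | 0.000   | 0.340  | 0.835  |

| Omnibus:       | 117.513 | Durbin-Watson:    | 1.766     |
|----------------|---------|-------------------|-----------|
| Prob(Omnibus): | 0.0     | Jarque-Bera (JB): | 529.636   |
| Skew:          | 1.224   | Prob(JB):         | 9.79e-116 |
| Kurtosis:      | 8.141   | Cond. No.         | 58.4      |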


In [135]:
metrics

|      | Alpha    | Alpha Adj | R-square | std_errors | tracking_error |
|------|----------|-----------|----------|------------|----------------|
| NVDA | 0.005264 | 0.273752  | 0.458168 | 0.047782   | 0.344562       |


The R-squared is relatively low at 45.82%: the current hedge covers only about 46% of the variance of NVDA, so NVDA could potentially be hedged better with a different combination of assets. In the regression summary, both the intercept (0.0053 per week) and the beta coefficient for GOOGL (-0.0078, p-value 0.939) are small, and GOOGL is not a statistically significant regressor. Consequently, we can re-estimate the regression excluding the intercept and GOOGL. This yields an R-squared of 47% with all remaining variables statistically significant, although that R-squared is uncentered (there is no intercept), so it is not directly comparable to the 45.82% above.

In [112]:
Linear_Factor_Descomposition(retsx, y_asset = "NVDA",x_asset = ["AAPL","AMZN","MSFT"], constant = False)

(      AAPL      AMZN      MSFT
 0  0.35139  0.413544  0.602423,
         Alpha  R-square  std_errors
 NVDA  0.35139  0.471434    0.047791,
 <class 'statsmodels.iolib.summary.Summary'>
 """
                                  OLS Regression Results                                
 Dep. Variable:                   NVDA   R-squared (uncentered):                   0.471
 Model:                            OLS   Adj. R-squared (uncentered):              0.467
 Method:                 Least Squares   F-statistic:                              115.7
 Date:                Sat, 19 Oct 2024   Prob (F-statistic):                    1.51e-53
 Time:                        19:54:00   Log-Likelihood:                          634.08
 No. Observations:                 392   AIC:                                     -1262.
 Df Residuals:                     389   BIC:                                     -1250.
 Df Model:                           3                                                  
 Covarian

## 3.

Report the annualized intercept. By including this intercept, what are you assuming about the nature of the returns of `NVDA` as well as the returns of the hedging instruments?

In [136]:
metrics

|      | Alpha    | Alpha Adj | R-square | std_errors | tracking_error |
|------|----------|-----------|----------|------------|----------------|
| NVDA | 0.005264 | 0.273752  | 0.458168 | 0.047782   | 0.344562       |


The annualized intercept is 0.2738 (about 27.4% per year). By including the intercept, we are assuming that the sample averages are not good predictors of the future averages. Thus we allow an intercept in the hedging regression to ensure that differences in mean returns do not impact the betas, which are the hedge recommendations.

If we really believed these sample averages are predictive, we would want the hedge ratios to account for that, and thus exclude an intercept, forcing these averages to impact the betas.
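
The effect described above can be sketched on simulated data: with an intercept, the beta reflects only co-movement around the means, whereas without one the beta also has to absorb the difference in mean returns. The means, beta, and noise levels below are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = 0.010 + 0.03 * rng.standard_normal(n)            # hypothetical hedge instrument
y = 0.010 + 1.0 * x + 0.03 * rng.standard_normal(n)  # target with extra mean return

# With intercept: beta is estimated from co-movement around the means
design = np.column_stack([np.ones(n), x])
alpha, beta_with = np.linalg.lstsq(design, y, rcond=None)[0]

# Without intercept: the beta must also explain the difference in means
beta_without = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(f"alpha={alpha:.4f}, beta (intercept)={beta_with:.3f}, beta (no intercept)={beta_without:.3f}")
```

In sample, the gap between the two betas equals the estimated alpha times the mean of `x` divided by the mean of `x**2`, so it vanishes only when the alpha or the mean of the hedge instrument is zero.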