# Midterm 1

## FINM 36700 - 2023

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

# Instructions

## Please note the following:

Points
* The exam is 100 points.
* You have 120 minutes to complete the exam.
* For every minute late you submit the exam, you will lose one point.

Submission
* You will upload your solution to the `Midterm 1` assignment on Canvas, where you downloaded this. (Be sure to **submit** on Canvas, not just **save** on Canvas.)
* Your submission should be readable (the graders must be able to understand your answers), and it should **include all code used in your analysis in a file format in which the code can be executed.** 

Rules
* The exam is open-material, closed-communication.
* You do not need to cite material from the course github repo--you are welcome to use the code posted there without citation.

Advice
* If you find any question to be unclear, state your interpretation and proceed. We will only answer questions of interpretation if there is a typo, error, etc.
* The exam will be graded for partial credit.

## Data

**All data files are found in the class github repo, in the `data` folder.**

This exam makes use of the following data files:
* `midterm_1_data.xlsx`

This file has sheets for...
* `info` - names of each stock ticker
* `excess returns` - weekly excess returns on several stocks
* `SPY` - weekly excess returns on SPY

Note the data is **weekly** so any annualizations should use `52` weeks in a year.

#### If useful
here is code to load in the data.

In [1]:
#Important Functions
#Covering all imports

import pandas as pd
import numpy as np
from scipy.stats import kurtosis, skew
from scipy.stats import norm
import seaborn as sns
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS
import warnings
warnings.filterwarnings("ignore")

%matplotlib inline

import matplotlib.pyplot as plt
import matplotlib.patheffects as PathEffects

In [8]:
def performance_summary(return_data, annualization = 12):
    """ 
        Returns the Performance Stats for given set of returns
        Inputs: 
            return_data - DataFrame with Date index and Monthly Returns for different assets/strategies.
        Output:
            summary_stats - DataFrame with annualized mean return, vol, Sharpe ratio, skewness, excess kurtosis, VaR (0.05),
                            CVaR (0.05), and drawdown statistics based on monthly returns. 
    """
    summary_stats = return_data.mean().to_frame('Mean').apply(lambda x: x*annualization)
    summary_stats['Volatility'] = return_data.std().apply(lambda x: x*np.sqrt(annualization))
    summary_stats['Sharpe Ratio'] = summary_stats['Mean']/summary_stats['Volatility']

    summary_stats['Skewness'] = return_data.skew()
    summary_stats['Excess Kurtosis'] = return_data.kurtosis()
    summary_stats['VaR (0.05)'] = return_data.quantile(.05, axis = 0)
    summary_stats['CVaR (0.05)'] = return_data[return_data <= return_data.quantile(.05, axis = 0)].mean()
    
    wealth_index = 1000*(1+return_data).cumprod()
    previous_peaks = wealth_index.cummax()
    drawdowns = (wealth_index - previous_peaks)/previous_peaks

    summary_stats['Max Drawdown'] = drawdowns.min()
    summary_stats['Peak'] = [previous_peaks[col][:drawdowns[col].idxmin()].idxmax() for col in previous_peaks.columns]
    summary_stats['Bottom'] = drawdowns.idxmin()
    
    recovery_date = []
    for col in wealth_index.columns:
        prev_max = previous_peaks[col][:drawdowns[col].idxmin()].max()
        recovery_wealth = pd.DataFrame([wealth_index[col][drawdowns[col].idxmin():]]).T
        recovery_date.append(recovery_wealth[recovery_wealth[col] >= prev_max].index.min())
    summary_stats['Recovery'] = recovery_date
    
    return summary_stats

def mvo_performance_stats(asset_returns,cov_matrix,port_weights, port_type,period):
    """ 
        Returns the Annualized Performance Stats for given asset returns, portfolio weights and covariance matrix
        Inputs: 
            asset_return - Excess return over the risk free rate for each asset (n x 1) Vector
            cov_matrix = nxn covariance matrix for the assets
            port_weights = weights of the assets in the portfolio (1 x n) Vector
            port_type = Type of Portfolio | Eg - Tangency or Mean-Variance Portfolio
            period = Monthly frequency
    """
    
    ret = np.dot(port_weights,asset_returns)
    vol = np.sqrt(port_weights @ cov_matrix @ port_weights.T)*np.sqrt(period)
    sharpe = ret/vol

    stats = pd.DataFrame([[ret,vol,sharpe]],columns= ["Annualized Return","Annualized Volatility","Annualized Sharpe Ratio"], index = [port_type])
    return stats

def tangency_portfolio_rfr(asset_return,cov_matrix, cov_diagnolize = False):
    """ 
        Returns the tangency portfolio weights in a (1 x n) vector
        Inputs: 
            asset_return - return for each asset (n x 1) Vector
            cov_matrix = nxn covariance matrix for the assets
    """
    if cov_diagnolize:
        asset_cov = np.diag(np.diag(cov_matrix))
    else:
        asset_cov = np.array(cov_matrix)
    inverted_cov= np.linalg.inv(asset_cov)
    one_vector = np.ones(len(cov_matrix.index))
    
    den = (one_vector @ inverted_cov) @ (asset_return)
    num =  inverted_cov @ asset_return
    return (1/den) * num

def mv_portfolio_rfr(asset_return,cov_matrix,target_ret,tangency_port):
    """ 
        Returns the Mean-Variance portfolio weights in a (1 x n) vector when a riskless asset is available
        Inputs: 
            asset_return - Excess return over the risk free rate for each asset (n x 1) Vector
            cov_matrix = nxn covariance matrix for the assets
            target_ret = Target Return (Annualized)
            tangency_port = Tangency portfolio when a riskless asset is available
    """
    asset_cov = np.array(cov_matrix)
    inverted_cov= np.linalg.inv(asset_cov)
    one_vector = np.ones(len(cov_matrix.index))
    
    delta_den = (asset_return.T @ inverted_cov) @ (asset_return)
    delta_num = (one_vector @ inverted_cov) @ (asset_return)
    delta_tilde = (delta_num/delta_den) * target_ret
    return (delta_tilde * tangency_port)

def gmv_portfolio(asset_return,cov_matrix):
    """ 
        Returns the Global Minimum Variance portfolio weights in a (1 x n) vector
        Inputs: 
            asset_return - return for each asset (n x 1) Vector
            cov_matrix = nxn covariance matrix for the assets
    """
    asset_cov = np.array(cov_matrix)
    inverted_cov= np.linalg.inv(asset_cov)
    one_vector = np.ones(len(cov_matrix.index))
    
    den = (one_vector @ inverted_cov) @ (one_vector)
    num =  inverted_cov @ one_vector
    return (1/den) * num

def mv_portfolio(asset_return,cov_matrix,target_ret,tangency_port):
    """ 
        Returns the Mean-Variance portfolio weights in a (1 x n) vector when no riskless asset is available
        Inputs: 
            asset_return - total return for each asset (n x 1) Vector
            cov_matrix = nxn covariance matrix for the assets
            target_ret = Target Return (Not-Annualized)
            tangency_port = Tangency portfolio
    """
    omega_tan = tangency_portfolio_rfr(asset_return.mean(),cov_matrix)
    omega_gmv = gmv_portfolio(asset_return,cov_matrix) 
    
    mu_tan = asset_return.mean() @ omega_tan
    mu_gmv = asset_return.mean() @ omega_gmv
    
    delta = (target_ret - mu_gmv)/(mu_tan - mu_gmv)
    mv_weights = delta * omega_tan + (1-delta)*omega_gmv
    return mv_weights

def regression_based_performance(factor,fund_ret,rf,constant = True):
    """ 
        Returns the Regression based performance Stats for given set of returns and factors
        Inputs:
            factor - Dataframe containing monthly returns of the regressors
            fund_ret - Dataframe containing monthly excess returns of the regressand fund
            rf - Monthly risk free rate of return
        Output:
            summary_stats - (Beta of regression, treynor ratio, information ratio, alpha). 
    """
    if constant:
        X = sm.tools.add_constant(factor)
    else:
        X = factor
    y=fund_ret
    model = sm.OLS(y,X,missing='drop').fit()
    
    if constant:
        beta = model.params[1:]
        alpha = round(float(model.params['const']),6)
        
    else:
        beta = model.params
    treynor_ratio = ((fund_ret.values-rf.values).mean()*12)/beta[0]
    tracking_error = (model.resid.std()*np.sqrt(12))
    if constant:        
        information_ratio = model.params[0]*12/tracking_error
    r_squared = model.rsquared
    if constant:
        return (beta,treynor_ratio,information_ratio,alpha,r_squared,tracking_error)
    else:
        return (beta,treynor_ratio,r_squared,tracking_error)
    

def rolling_regression_param(factor,fund_ret,roll_window = 60):
    """ 
        Returns the Rolling Regression parameters for given set of returns and factors
        Inputs:
            factor - Dataframe containing monthly returns of the regressors
            fund_ret - Dataframe containing monthly excess returns of the regressand fund
            roll_window = rolling window for regression
        Output:
            params - Dataframe with time-t as the index and constant and Betas as columns
    """
    X = sm.add_constant(factor)
    y= fund_ret
    rols = RollingOLS(y, X, window=roll_window)
    rres = rols.fit()
    params = rres.params.copy()
    params.index = np.arange(1, params.shape[0] + 1)
    return params
    
def calc_probability_lowret(num_years,mean_ret_check,mean_ret,vol):
        """ 
        Returns the Probability that the cumulative market return will fall short of the cumulative
        risk-free return for each period
        Inputs: 
            mean - annualized mean returns of market for a period.
            vol - annualized volatility of returns for a period
            num_years - Number of years to calculate
        Output:
            probability - DataFrame with probability for each period (step = 1)
        """
        lst = []
        for n in range (0,num_years+1,1):
            norm_val = np.sqrt(n)*(mean_ret_check - mean_ret)/(vol)
            prob = (norm.cdf(norm_val))*100
            lst.append(pd.DataFrame([[n,prob]],columns=['Time','Probability(%)']))
        probability = pd.concat(lst)
        return probability

def calc_return_metrics(data, as_df=False, adj=12):
    """
    Calculate return metrics for a DataFrame of assets.

    Args:
        data (pd.DataFrame): DataFrame of asset returns.
        as_df (bool, optional): Return a DF or a dict. Defaults to False (return a dict).
        adj (int, optional): Annualization. Defaults to 12.

    Returns:
        Union[dict, DataFrame]: Dict or DataFrame of return metrics.
    """
    summary = dict()
    summary["Annualized Return"] = data.mean() * adj
    summary["Annualized Volatility"] = data.std() * np.sqrt(adj)
    summary["Annualized Sharpe Ratio"] = (
        summary["Annualized Return"] / summary["Annualized Volatility"]
    )
    summary["Annualized Sortino Ratio"] = summary["Annualized Return"] / (
        data[data < 0].std() * np.sqrt(adj)
    )
    return pd.DataFrame(summary, index=data.columns) if as_df else summary

In [3]:
FILEIN = '../data/midterm_1_data.xlsx'
sheet_exrets = 'excess returns'
sheet_spy = 'spy'

retsx = pd.read_excel(FILEIN, sheet_name=sheet_exrets).set_index('date')
spy = pd.read_excel(FILEIN, sheet_name=sheet_spy).set_index('date')

## Scoring

| Problem | Points |
|---------|--------|
| 1       | 20     |
| 2       | 35     |
| 3       | 30     |
| 4       | 15     |

### Each numbered question is worth 5 points.

### Notation
(Hidden LaTeX commands)

$$\newcommand{\mux}{\tilde{\boldsymbol{\mu}}}$$
$$\newcommand{\wtan}{\boldsymbol{\text{w}}^{\text{tan}}}$$
$$\newcommand{\wtarg}{\boldsymbol{\text{w}}^{\text{port}}}$$
$$\newcommand{\mutarg}{\tilde{\boldsymbol{\mu}}^{\text{port}}}$$
$$\newcommand{\wEW}{\boldsymbol{\text{w}}^{\text{EW}}}$$
$$\newcommand{\wRP}{\boldsymbol{\text{w}}^{\text{RP}}}$$
$$\newcommand{\wREG}{\boldsymbol{\text{w}}^{\text{REG}}}$$

# 1. Short Answer

### No Data Needed

These problems do not require any data file. Rather, analyze the situation conceptually, based on the information below. 

## 1

In what sense was ProShares `HDG` successful in hedging the `HFRI`, and in what sense was it unsuccessful in tracking the `HFRI`?

HDG tracks a modified version of the ML Factor Model, MLFM-ES. The Merrill Lynch Factor Model involves indexes which cannot be exactly traded. For that reason, ProShares created a traded version of the Factor Model which replaces non-traded indexes with liquid, traded securities. But 

## 2

We discussed multiple ways of calculating Value-at-Risk (VaR). What are the tradeoffs between using the normal distribution formula versus a directly empirical approach?


**Normal Distribution:**

1. The normal distribution method assumes that asset returns follow a Gaussian (normal) distribution. This implies that returns are symmetric and unimodal, and the standard implementation also assumes constant volatility. These assumptions are often unrealistic for financial returns. 
2. The normal distribution method is a parametric approach that relies on estimating the mean and standard deviation of returns. While it simplifies the calculation, it can be sensitive to outliers or data deviations.
3. The normal distribution method is relatively simple and computationally efficient.

**Empirical Approach:**

1. Makes no distributional assumptions and relies on historical data, which can be preferable, especially when assuming normally distributed returns is unrealistic. 
2. The empirical approach is non-parametric, which means it does not rely on estimated parameters such as mean and standard deviation. It can capture non-normal and fat-tailed return distributions.
3. The empirical approach can be more flexible in capturing the actual characteristics of the return distribution, including skewness and kurtosis, making it suitable for a wide range of assets.

**Tradeoffs:**
Which method we use depends on what we want out of our measures. Some trade-offs include:
1. **Assumption vs. Data:** The normal distribution approach simplifies the modeling process but relies on the assumption of normality. The empirical approach avoids such assumptions but requires a sufficiently long and relevant historical dataset.
2. **Accuracy vs. Robustness:** The normal distribution method can be more accurate if returns follow a normal distribution. However, it may perform poorly during extreme market events. The empirical approach is more robust but may be less accurate if the historical data does not capture future scenarios or if we do not have enough data to work with. 
3. **Computational Complexity:** The empirical approach can be computationally more intensive, especially when dealing with large datasets, as it involves sorting and analyzing historical observations. The normal distribution approach is simpler in terms of computation.
4. **Backtesting and Validation:** Both methods require rigorous backtesting and validation to assess their accuracy and reliability. The empirical approach may require more extensive backtesting due to its non-parametric nature.
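
The contrast between the two approaches can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the function names `var_normal` and `var_empirical` are hypothetical, and the fat-tailed sample is simulated rather than taken from any exam data.

```python
import numpy as np
from scipy.stats import norm

def var_normal(returns, q=0.05):
    """Parametric VaR: mean + z_q * std, assuming normally distributed returns."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() + norm.ppf(q) * returns.std(ddof=1)

def var_empirical(returns, q=0.05):
    """Historical VaR: the q-th sample quantile, with no distributional assumption."""
    return np.quantile(np.asarray(returns, dtype=float), q)

# Hypothetical fat-tailed sample: scaled Student-t draws with 3 degrees of freedom
rng = np.random.default_rng(0)
sample = 0.02 * rng.standard_t(df=3, size=5000)

normal_var = var_normal(sample)
empirical_var = var_empirical(sample)
print(f"normal VaR: {normal_var:.4f}, empirical VaR: {empirical_var:.4f}")
```

The parametric version needs only two estimated moments, while the empirical version needs enough observations in the tail for the quantile to be stable.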

## 3

Did we find that **TIPS** have been useful in expanding the mean-variance frontier in the past? Did we conclude they might be useful in the future? Explain.

In our original homework 1, we concluded that:
* Dropping TIPS from the investment set barely impacts the weights or the resulting performance.
* Adjusting the mean of TIPS upward even just 1 standard error substantially impacts the allocations and moderately boosts the resulting performance.

Based on just a mean-variance analysis, it seems one could reasonably go either way with TIPS as an alternate asset class. In the argument to keep it separate, there is more diversification between TIPS and bonds than between SPY and many other equity buckets Harvard has. On the other hand, TIPS mostly impact the allocation to domestic bonds and might be seen as another asset in that bucket.

## 4.

What aspect of the classic mean-variance optimization approach leads to extreme answers? How did regularization help with this issue?

The classic mean-variance optimization approach, while powerful, can lead to extreme or impractical portfolio allocations, especially in cases where there is a high degree of noise or multicollinearity in the data. Mechanically, the solution depends on the inverse of the covariance matrix, which is ill-conditioned when assets are highly correlated, so small estimation errors in means or covariances are magnified into large, offsetting long-short positions. Because the optimizer seeks the highest in-sample Sharpe ratio, it can concentrate heavily in a few assets.

Regularization, such as L2 regularization (ridge regularization), helps with this issue in the following ways:

1. **Reduces Extreme Allocations**: Regularization introduces a penalty term that discourages extreme or concentrated portfolio allocations. This penalty term is based on the sum of the squared values of the portfolio weights. By minimizing the weighted sum of squares, regularization helps prevent the optimization process from allocating an impractical amount to a single asset.

2. **Mitigates Overfitting**: Regularization mitigates overfitting by constraining the model's complexity. In the context of portfolio optimization, this means that the optimization process is less likely to overemphasize the historical returns of a specific asset. Regularization prevents the model from fitting the historical data too closely, leading to allocations that are more robust and balanced.

3. **Improved Stability**: Regularization adds stability to the optimization process. Without regularization, small changes in the input data can lead to significant changes in the optimal portfolio weights. Regularization dampens these fluctuations and provides a smoother and more stable allocation.

4. **Accounting for Data Imperfections**: Regularization acknowledges the presence of noise or errors in the data. In practice, financial data can be noisy, and regularization helps the optimizer account for this noise without allocating disproportionately to assets with unusually high past returns.

In summary, regularization introduces a degree of "smoothing" into the optimization process, resulting in more balanced and practical portfolio allocations. It helps prevent the extreme outcomes that can be associated with classic mean-variance optimization when applied to real-world financial data.
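
As a rough illustration of the point above, the sketch below compares tangency weights computed from a raw covariance matrix with weights computed after shrinking the covariance toward its diagonal. The two-asset inputs and the equal-weight shrinkage are assumptions chosen for the example, and `tangency_weights` is a hypothetical helper.

```python
import numpy as np

def tangency_weights(mu, cov):
    """Weights proportional to inv(cov) @ mu, rescaled to sum to one."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

# Hypothetical inputs: two assets with 0.99 correlation and nearly equal means
vol = np.array([0.20, 0.20])
corr = np.array([[1.0, 0.99], [0.99, 1.0]])
cov = np.outer(vol, vol) * corr
mu = np.array([0.10, 0.11])

# Shrink the covariance halfway toward its diagonal (the 50/50 mix is an assumption)
cov_diag = np.diag(np.diag(cov))
cov_reg = (cov + cov_diag) / 2

w_classic = tangency_weights(mu, cov)
w_reg = tangency_weights(mu, cov_reg)
print("classic:", w_classic.round(3))
print("regularized:", w_reg.round(3))
```

With nearly collinear assets, the classic solution takes a large long-short bet on a small difference in means, while the shrunk covariance yields weights that both lie between zero and one.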

***

# 2. Allocation

Consider a mean-variance optimization of **excess** returns provided in `midterm_1_data.xlsx`.

## 1. 

Report the following **annualized** statistics:
* mean
* volatility
* Sharpe ratio

Which assets have the highest / lowest Sharpe ratios?

In [40]:
summary = performance_summary(retsx, annualization = 1)
sorted_summary = summary[['Mean', 'Volatility', 'Sharpe Ratio']].sort_values(by='Sharpe Ratio', ascending=True)
sorted_summary

| Ticker | Mean     | Volatility | Sharpe Ratio |
|--------|----------|------------|--------------|
| XOM    | 0.002388 | 0.043213   | 0.05527      |
| GOOGL  | 0.003718 | 0.038027   | 0.097769     |
| AMZN   | 0.004605 | 0.043043   | 0.106984     |
| TSLA   | 0.010956 | 0.084179   | 0.130154     |
| AAPL   | 0.006143 | 0.039368   | 0.156035     |
| MSFT   | 0.00554  | 0.033311   | 0.166318     |
| NVDA   | 0.012513 | 0.064913   | 0.19276      |


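XOM has the lowest Sharpe ratio and NVDA has the highest. Note that the call above uses `annualization = 1`, so the reported figures are weekly. Annualizing multiplies the mean by 52 and both the volatility and the Sharpe ratio by $\sqrt{52}$, which leaves the ranking unchanged. 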
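
For reference, the weekly figures can be converted to annualized ones as sketched below. The helper name `annualize_stats` is hypothetical; the two rows of inputs are the weekly mean and volatility reported above for XOM and NVDA.

```python
import numpy as np
import pandas as pd

def annualize_stats(weekly_mean, weekly_vol, periods=52):
    """Scale the mean by `periods` and the volatility by sqrt(`periods`)."""
    mean = weekly_mean * periods
    vol = weekly_vol * np.sqrt(periods)
    return pd.DataFrame({'Mean': mean, 'Volatility': vol, 'Sharpe Ratio': mean / vol})

# Weekly mean and volatility for two tickers, copied from the reported output
weekly = pd.DataFrame(
    {'Mean': [0.002388, 0.012513], 'Volatility': [0.043213, 0.064913]},
    index=['XOM', 'NVDA'],
)
annual = annualize_stats(weekly['Mean'], weekly['Volatility'])
print(annual)
```

Because every Sharpe ratio is scaled by the same factor, the ordering of assets is the same in weekly and annualized terms.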

## 2.

Report the weights of the tangency portfolio.

Also report the Sharpe ratio achieved by the tangency portfolio over this sample.

In [47]:
wtan = pd.DataFrame(tangency_portfolio_rfr(summary['Mean'], retsx.cov()), index = summary.index, columns=['Tangency Weight'])
wtan

| Ticker | Tangency Weight |
|--------|-----------------|
| AAPL   | 0.322605        |
| MSFT   | 0.787496        |
| AMZN   | -0.228607       |
| NVDA   | 0.495996        |
| GOOGL  | -0.502721       |
| TSLA   | 0.105975        |
| XOM    | 0.019257        |
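
The question also asks for the Sharpe ratio of the tangency portfolio. One way to compute it is sketched below on simulated data; `tangency_sharpe` is a hypothetical helper and the random returns stand in for the exam's `retsx`, so the printed numbers are not the exam's answer.

```python
import numpy as np
import pandas as pd

def tangency_sharpe(excess_returns, periods=52):
    """Return tangency weights (summing to one) and the annualized in-sample Sharpe ratio."""
    mu = excess_returns.mean().values
    cov = excess_returns.cov().values
    raw = np.linalg.solve(cov, mu)
    weights = raw / raw.sum()
    port = excess_returns.values @ weights  # weekly portfolio excess returns
    sharpe = port.mean() / port.std(ddof=1) * np.sqrt(periods)
    return pd.Series(weights, index=excess_returns.columns), sharpe

# Hypothetical weekly excess returns for three assets
rng = np.random.default_rng(1)
fake = pd.DataFrame(rng.normal(0.003, 0.04, size=(300, 3)), columns=['A', 'B', 'C'])

w, sr = tangency_sharpe(fake)
print(w.round(4))
print("annualized Sharpe:", round(sr, 4))
```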


## 3.

* What weight is given to the asset with the lowest Sharpe ratio?
* What Sharpe ratio does the lowest (most negative) weight asset have?

Explain. Support your answer with evidence.

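XOM has the lowest Sharpe ratio (0.0553 in weekly terms) yet receives a small positive weight of 0.019257. GOOGL receives the most negative weight of all the assets, -0.502721, even though its Sharpe ratio is positive (0.0978 in weekly terms). Tangency weights depend on the full covariance structure, not on each asset's standalone Sharpe ratio, so an asset with a modest Sharpe ratio can still be shorted heavily to offset correlated holdings. 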

## 4.

Let's examine the out-of-sample performance.

Calculate and report the following three allocations using only data through the end of 2022:
* tangency portfolio
* equally weighted portfolio
* a regularized approach, with a new formula shown below

where
$$\wEW_i = \frac{1}{n}$$

$$\wREG \sim \widehat{\Sigma}^{-1}\mux$$

$$\widehat{\Sigma} = \frac{\Sigma + \boldsymbol{2}\,\Sigma_D}{\boldsymbol{3}}$$
where $\Sigma_D$ denotes a *diagonal* matrix of the security variances, with zeros in the off-diagonals.

In [64]:
end_date = '2022-12-31'  # last date of the in-sample window

# Filter the data for the in-sample window
data = retsx[:end_date]

# Expected returns: in-sample mean of excess returns
expected_returns = data.mean()

# Covariance matrix; note that this call uses the full sample (retsx), not the in-sample window (data)
cov_matrix = retsx.cov()

# Equally weighted portfolio
n = len(expected_returns)
equally_weighted_weights = np.array([1 / n] * n)

# Regularized portfolio: covariance shrunk toward its diagonal, (Sigma + 2 * Sigma_D) / 3
Sigma_D = np.diag(np.diag(cov_matrix))
regularized_cov_matrix = (cov_matrix + 2 * Sigma_D) / 3
inv_regularized_cov_matrix = np.linalg.inv(regularized_cov_matrix)
# These weights are proportional to the formula and are not rescaled to sum to one
regularized_weights = inv_regularized_cov_matrix @ expected_returns

# Collect the weights in DataFrames
equally_weighted_weights_df = pd.DataFrame(equally_weighted_weights, columns=['Weight'], index=expected_returns.index)
regularized_weights_df = pd.DataFrame(regularized_weights, columns=['Weight'], index=expected_returns.index)

# Print the weights; wtan is the tangency portfolio from the earlier full-sample cell
print("Tangency Portfolio Weights:\n", wtan)
print("Equally Weighted Portfolio Weights:\n", equally_weighted_weights_df)
print("Regularized Portfolio Weights:\n", regularized_weights_df)


Tangency Portfolio Weights:
        Tangency Weight
AAPL          0.322605
MSFT          0.787496
AMZN         -0.228607
NVDA          0.495996
GOOGL        -0.502721
TSLA          0.105975
XOM           0.019257
Equally Weighted Portfolio Weights:
          Weight
AAPL   0.142857
MSFT   0.142857
AMZN   0.142857
NVDA   0.142857
GOOGL  0.142857
TSLA   0.142857
XOM    0.142857
Regularized Portfolio Weights:
          Weight
AAPL   2.175852
MSFT   2.715273
AMZN   0.418795
NVDA   1.646301
GOOGL  0.141063
TSLA   0.734261
XOM    0.779944
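
For a strictly out-of-sample comparison, all three allocations would be estimated on the data through 2022 only and rescaled to sum to one. The sketch below shows one way to do that on simulated data; `allocations` and `normalize` are hypothetical helpers, and the random returns stand in for `retsx`.

```python
import numpy as np
import pandas as pd

def normalize(raw):
    """Rescale a vector of raw weights so that it sums to one."""
    return raw / raw.sum()

def allocations(in_sample):
    """Tangency, equally weighted, and regularized weights from in-sample data only."""
    mu = in_sample.mean().values
    cov = in_sample.cov().values
    cov_reg = (cov + 2 * np.diag(np.diag(cov))) / 3  # (Sigma + 2 * Sigma_D) / 3
    n = len(mu)
    return pd.DataFrame({
        'Tangency': normalize(np.linalg.solve(cov, mu)),
        'Equal': np.full(n, 1 / n),
        'Regularized': normalize(np.linalg.solve(cov_reg, mu)),
    }, index=in_sample.columns)

# Hypothetical weekly excess returns with a Friday date index
dates = pd.date_range('2021-01-01', periods=150, freq='W-FRI')
rng = np.random.default_rng(2)
fake = pd.DataFrame(rng.normal(0.004, 0.04, size=(150, 4)), index=dates, columns=list('ABCD'))

in_sample = fake.loc[:'2022-12-31']
out_sample = fake.loc['2023-01-01':]

weights = allocations(in_sample)
oos_returns = out_sample @ weights  # one column of weekly returns per portfolio
print(weights.round(4))
```

Estimating both the means and the covariance on `in_sample` keeps the 2023 returns out of the weights, so the resulting performance is a genuine out-of-sample test.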


## 5.

Report the out-of-sample (2023) performance of all three portfolios in terms of annualized mean, vol, and Sharpe.

In [65]:
returns_2023 = retsx['2022-12-31':]

# Calculate portfolio returns for 2023
tangency_portfolio_returns_2023 = np.dot(returns_2023, wtan)
equally_weighted_portfolio_returns_2023 = np.dot(returns_2023, equally_weighted_weights)
regularized_portfolio_returns_2023 = np.dot(returns_2023, regularized_weights)

# Calculate portfolio statistics for 2023
mean_2023 = {
    "Tangency Portfolio": tangency_portfolio_returns_2023.mean() * 52,
    "Equally Weighted Portfolio": equally_weighted_portfolio_returns_2023.mean() * 52,
    "Regularized Portfolio": regularized_portfolio_returns_2023.mean() * 52,
}

volatility_2023 = {
    "Tangency Portfolio": tangency_portfolio_returns_2023.std() * np.sqrt(52),
    "Equally Weighted Portfolio": equally_weighted_portfolio_returns_2023.std() * np.sqrt(52),
    "Regularized Portfolio": regularized_portfolio_returns_2023.std() * np.sqrt(52),
}

sharpe_ratio_2023 = {
    "Tangency Portfolio": mean_2023["Tangency Portfolio"] / volatility_2023["Tangency Portfolio"],
    "Equally Weighted Portfolio": mean_2023["Equally Weighted Portfolio"] / volatility_2023["Equally Weighted Portfolio"],
    "Regularized Portfolio": mean_2023["Regularized Portfolio"] / volatility_2023["Regularized Portfolio"],
}

# Print the out-of-sample portfolio performance for 2023
print("Portfolio Performance in 2023:")
for portfolio, mean in mean_2023.items():
    print(f"{portfolio} - Mean Return: {mean:.4f}")
for portfolio, vol in volatility_2023.items():
    print(f"{portfolio} - Volatility: {vol:.4f}")
for portfolio, sharpe in sharpe_ratio_2023.items():
    print(f"{portfolio} - Sharpe Ratio: {sharpe:.4f}")

Portfolio Performance in 2023:
Tangency Portfolio - Mean Return: 1.4920
Equally Weighted Portfolio - Mean Return: 0.9551
Regularized Portfolio - Mean Return: 8.5855
Tangency Portfolio - Volatility: 0.4074
Equally Weighted Portfolio - Volatility: 0.2425
Regularized Portfolio - Volatility: 2.0747
Tangency Portfolio - Sharpe Ratio: 3.6620
Equally Weighted Portfolio - Sharpe Ratio: 3.9386
Regularized Portfolio - Sharpe Ratio: 4.1383


## 6.

Imagine just for this problem that this data is for **total** returns, not excess returns.

Report the weights of the global-minimum-variance portfolio.

In [52]:
gmv = pd.DataFrame(gmv_portfolio(retsx,retsx.cov()), index = summary.index, columns=['GMV Weight'])
gmv

| Ticker | GMV Weight |
|--------|------------|
| AAPL   | 0.206231   |
| MSFT   | 0.49125    |
| AMZN   | 0.160866   |
| NVDA   | -0.119168  |
| GOOGL  | 0.011378   |
| TSLA   | -0.046927  |
| XOM    | 0.296369   |


## 7.

To target a mean return of 0.005%, would you be long or short this global minimum variance portfolio?

In [55]:
gmv_ret = np.dot(summary['Mean'], gmv)
gmv_ret

array([0.00347408])

Since the GMV's mean return (about 0.347% per week) is higher than the target mean return, we would be long the GMV. However, we would only want to hold part of the portfolio in the GMV, with the rest in the risk-free asset or uninvested, to achieve the target return while taking on less volatility. 
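
An alternative check follows the two-fund logic of `mv_portfolio` defined earlier, where the target portfolio is a mix of the tangency and GMV portfolios with no risk-free asset. The sketch below plugs in the weekly means and weights reported in this notebook; `gmv_share` is a hypothetical helper, and reading the target as 0.005% per week is an assumption.

```python
import numpy as np

# Weekly means and weights copied from the reported output,
# in the order AAPL, MSFT, AMZN, NVDA, GOOGL, TSLA, XOM
mu = np.array([0.006143, 0.005540, 0.004605, 0.012513, 0.003718, 0.010956, 0.002388])
w_tan = np.array([0.322605, 0.787496, -0.228607, 0.495996, -0.502721, 0.105975, 0.019257])
w_gmv = np.array([0.206231, 0.491250, 0.160866, -0.119168, 0.011378, -0.046927, 0.296369])

def gmv_share(target, mu, w_tan, w_gmv):
    """Share of the target portfolio held in GMV: 1 - delta from the two-fund mix."""
    mu_tan = mu @ w_tan
    mu_gmv = mu @ w_gmv
    delta = (target - mu_gmv) / (mu_tan - mu_gmv)
    return 1 - delta

target = 0.00005  # 0.005% per week, read literally (an assumption)
share = gmv_share(target, mu, w_tan, w_gmv)
print("share in GMV:", round(share, 3))
```

A positive share means a long position in the GMV portfolio. Under this reading the share exceeds one, so the mix is long GMV and short the tangency portfolio.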

***

# 3. Performance

## 1. 

Report the following performance metrics of excess returns for Tesla (`TSLA`).
* skewness
* kurtosis

You are not annualizing any of these stats.

What do these metrics indicate about the nature of the returns?

In [66]:
summary = performance_summary(retsx, annualization = 1)

In [68]:
summary.loc['TSLA']

Mean                          0.010956
Volatility                    0.084179
Sharpe Ratio                  0.130154
Skewness                      0.441455
Excess Kurtosis               1.527376
VaR (0.05)                   -0.122519
CVaR (0.05)                  -0.155313
Max Drawdown                 -0.682185
Peak               2021-11-05 00:00:00
Bottom             2023-01-06 00:00:00
Recovery                           NaT
Name: TSLA, dtype: object

The positive skewness (0.44) indicates that TSLA's return distribution has a longer right tail, that is, large positive returns are more pronounced than large negative ones. The positive excess kurtosis (1.53) indicates fatter tails than a normal distribution, so extreme positive or negative returns occur more often than normality would suggest. 

## 2. 

Report the maximum drawdown for `TSLA` over the sample.
* Ignore that your data is in excess returns rather than total returns.
* Simply proceed with the excess return data for this calculation.

In [6]:
summary.loc['TSLA', 'Max Drawdown']

-0.6821852296331565

## 3.

For `TSLA`, calculate the following metrics, relative to `SPY`:
* market beta
* alpha
* sortino ratio

Annualize alpha and sortino ratio.

In [25]:
excess_tsla = pd.DataFrame(retsx['TSLA'] - spy['SPY'])
tsla_ret_metrics = calc_return_metrics(excess_tsla, as_df = True, adj = 52)

X = spy['SPY']
Y = retsx['TSLA']
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit()
beta = model.params['SPY']
alpha = model.params['const'] * 52
sortino = tsla_ret_metrics['Annualized Sortino Ratio'][0]

print(f'beta: {beta}\nalpha: {alpha}\nsortino: {sortino}')

beta: 1.7768245314443845
alpha: 0.3094703790703374
sortino: 1.4091507431551036


## 4.

Continuing with `TSLA`, calculate the full-sample, 5th-percentile CVaR.
* Use the `normal` formula, assuming mean returns are zero.
* Use the full-sample volatility.

Use the entire sample to calculate a single CVaR number. 

In [27]:
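# This question did not ask for an annualized number, so the weekly value is reported.
# Note: this is the empirical (historical) CVaR from performance_summary, not the normal-formula CVaR.
print('TSLA CVaR(0.05):', summary.loc['TSLA', 'CVaR (0.05)'])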

TSLA CVaR(0.05): -0.15531312622504762
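
The question asks for the normal-formula CVaR with a zero mean, whereas the value above is the empirical tail average. Under normality the 5% CVaR is $-\sigma\,\phi(z_{0.05})/0.05$, which can be sketched as follows. The helper `cvar_normal` is hypothetical, and the input is the full-sample weekly volatility for TSLA reported earlier.

```python
from scipy.stats import norm

def cvar_normal(vol, q=0.05, mean=0.0):
    """Normal-formula CVaR: mean - vol * pdf(z_q) / q, where z_q is the q-quantile."""
    z = norm.ppf(q)
    return mean - vol * norm.pdf(z) / q

tsla_weekly_vol = 0.084179  # full-sample weekly volatility from the summary table
tsla_cvar = cvar_normal(tsla_weekly_vol)
print("TSLA normal CVaR(0.05):", round(tsla_cvar, 6))
```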


## 5.

Now calculate the 5th-percentile, one-period ahead, **VaR** for `TSLA`.

Here, calculate the running series of VaR estimates.

Again, 
* use the normal formula, with mean zero.

But now, use the rolling volatility, based on 
* rolling window of $m=52$ weeks.

Report the final 5 values of your calculated VaR series.

In [34]:
rolling_volatility = np.sqrt((retsx['TSLA']**2).rolling(52).mean().shift())

rolling_VaR = -1.645 * rolling_volatility  # 5th percentile

final_5_VaR_values = rolling_VaR.tail(5)

print("Final 5 values of rolling VaR series:")
print(final_5_VaR_values)

Final 5 values of rolling VaR series:
date
2023-06-16   -0.156149
2023-06-23   -0.156265
2023-06-30   -0.153732
2023-07-07   -0.152232
2023-07-14   -0.150509
Name: TSLA, dtype: float64


## 6. 

Calculate the out-of-sample **hit ratio** for your VaR series reported in your previous answer.

In [35]:
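rolling_VaR = -1.645 * rolling_volatility

# Count the weeks where the realized return fell below the VaR estimate
hits = (retsx['TSLA'] < rolling_VaR).sum()
# The denominator counts every week, including the first 52 weeks, which have no VaR estimate (NaN)
total_observations = len(retsx['TSLA'])

hit_ratio = hits / total_observations

print("Out-of-sample Hit Ratio:", hit_ratio)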

Out-of-sample Hit Ratio: 0.04846938775510204
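
Since the first 52 weeks have no VaR estimate, a hit ratio restricted to weeks with a valid estimate may be a better comparison against the 5% target. A minimal sketch on simulated data follows; `rolling_normal_var` and `hit_ratio` are hypothetical helpers, and the random series stands in for the TSLA returns.

```python
import numpy as np
import pandas as pd

def rolling_normal_var(returns, window=52, z=-1.645):
    """One-period-ahead normal VaR with mean zero, using lagged rolling volatility."""
    vol = np.sqrt((returns ** 2).rolling(window).mean().shift())
    return z * vol

def hit_ratio(returns, var_series):
    """Fraction of weeks with a valid VaR estimate where the return fell below it."""
    valid = var_series.notna()
    hits = returns[valid] < var_series[valid]
    return hits.mean()

# Hypothetical weekly returns
rng = np.random.default_rng(3)
fake = pd.Series(rng.normal(0.0, 0.05, size=400))

var_series = rolling_normal_var(fake)
ratio = hit_ratio(fake, var_series)
print("valid estimates:", var_series.notna().sum(), "hit ratio:", round(ratio, 4))
```

Dividing by the number of valid estimates rather than the full sample length avoids understating the hit ratio.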


***

# 4. Hedging

## 1. 

Consider the following scenario: you are holding a \$100 million long position in `NVDA`. You wish to hedge the position using some combination of 
* `AAPL`
* `AMZN`
* `GOOGL`
* `MSFT`

Report the positions you would hold of those 4 securities for an optimal hedge.

Note:
* In the regression estimation, include an intercept.
* Use the full-sample regression. No need to worry about in-sample versus out-of-sample.

In [56]:
nvda_returns = retsx['NVDA']
other_securities_returns = retsx[['AAPL', 'AMZN', 'GOOGL', 'MSFT']]

# Add a constant (intercept) to the independent variables
X = sm.add_constant(other_securities_returns)

# Fit a linear regression model
model = sm.OLS(nvda_returns, X).fit()

# Get the coefficients of the regression
hedge_ratios = model.params[1:]  # Exclude the intercept

# Calculate the optimal hedge positions: short beta dollars of each security per dollar of NVDA held
total_investment = 100e6  # $100 million
hedge_positions = -hedge_ratios * total_investment

# Display the optimal hedge positions
print("Optimal Hedge Positions:")
print(hedge_positions)

Optimal Hedge Positions:
AAPL    -3.416865e+07
AMZN    -4.172599e+07
GOOGL    7.847952e+05
MSFT    -5.878967e+07
dtype: float64


## 2.

How well does the hedge do? Cite a regression statistic to support your answer.

Also estimate the volatility of the basis, (epsilon.)

In [57]:
# Calculate the R-squared value
r_squared = model.rsquared
print("R-squared (R²):", r_squared)

# Calculate the residuals (basis)
residuals = model.resid

# Calculate the standard deviation (volatility) of the residuals
epsilon_volatility = residuals.std()
print("Volatility of the Basis (epsilon):", epsilon_volatility)

R-squared (R²): 0.4581682361155385
Volatility of the Basis (epsilon): 0.04778208768803615


## 3.

Report the annualized intercept. By including this intercept, what are you assuming about the nature of the returns of `NVDA` as well as the returns of the hedging instruments?

In [61]:
# Assuming you have the intercept in weekly terms
weekly_intercept = model.params['const']

# Annualize the intercept using the factor for weekly data
annualized_intercept = weekly_intercept * 52
print("Annualized Intercept:", annualized_intercept)


Annualized Intercept: 0.2737521756167878
