# Midterm 2

## FINM 36700 - 2024

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

# Instructions

## Please note the following:

Points
* The exam is 100 points.
* You have 120 minutes to complete the exam.
* For every minute late you submit the exam, you will lose one point.


Submission
* You will upload your solution to the `Midterm 2` assignment on Canvas, where you downloaded this. 
* Be sure to **submit** on Canvas, not just **save** on Canvas.
* Your submission should be readable, so that the graders can understand your answers.
* Your submission should **include all code used in your analysis in a file format in which the code can be executed.**

Rules
* The exam is open-material, closed-communication.
* You do not need to cite material from the course github repo - you are welcome to use the code posted there without citation.

Advice
* If you find any question to be unclear, state your interpretation and proceed. We will only answer questions of interpretation if there is a typo, error, etc.
* The exam will be graded for partial credit.

## Data

**All data files are found in the class github repo, in the `data` folder.**

This exam makes use of the following data files:
* `midterm_2_data.xlsx`

This file contains the following sheets:
- for Section 2:
    * `sector excess returns` - MONTHLY excess returns for 49 sector stocks
    * `factors excess returns` - MONTHLY excess returns of AQR factor model from Homework 5
- for Section 3:
    * `factors excess returns` - MONTHLY excess returns of AQR factor model from Homework 5

## Scoring

| Problem | Points |
|---------|--------|
| 1       | 25     |
| 2       | 40     |
| 3       | 35     |

### Each numbered question is worth 5 points unless otherwise specified.

# 1. Short Answer

#### No Data Needed

These problems do not require any data file. Rather, analyze them conceptually. 

### 1.1.

Historically, which pricing factor among the ones we studied has shown a considerable decrease in importance?

Answer:

It is the size factor, SMB, which represents the excess return of small-cap stocks over large-cap stocks.

### 1.2.

True or False: For a given factor model and a set of test assets, the addition of one more factor to that model will surely decrease the cross-sectional MAE. 

True or False: For a given factor model and a set of test assets, the addition of one more factor to that model will surely decrease the time-series MAE. 

Along with stating T/F, explain your reasoning for the two statements.

Answer: 

1. True. The lecture shows that the cross-sectional MAE generally decreases as factors are added, because at a given point in time more factors can explain more of the variation across assets. Moving from CAPM to FF3 to FF5, the MAE falls from 0.0215 to 0.0162 to 0.0136. However, moving from FF3 to the AQR 4-factor model, the MAE rises from 0.0162 to 0.0169. This is inconsistent with the pattern above and may come from measurement error or data problems rather than from the theory itself.

2. False. In the lecture, moving from CAPM to FF3 to FF5, the time-series MAE changes from 0.0212 to 0.0253 to 0.0296. The time-series regression is concerned with whether the alpha term is zero, not with how much of the variation in the data is explained. Adding a new factor increases model flexibility, but it reduces the MAE only if the added factor is relevant to the time series itself.
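
The two error measures discussed above can be computed as in the following minimal sketch. It runs on synthetic data, not the course data, and the helper names `ts_mae` and `cs_mae` are hypothetical. It only shows how each MAE is formed; it does not settle whether either one must fall when a factor is added.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_assets, n_factors = 240, 10, 3

# Synthetic factor and asset excess returns (illustrative only, not course data).
factors = rng.normal(0.005, 0.04, size=(T, n_factors))
true_betas = rng.normal(0.5, 0.3, size=(n_assets, n_factors))
assets = factors @ true_betas.T + rng.normal(0.0, 0.02, size=(T, n_assets))


def ts_mae(assets, factors):
    """Mean absolute alpha from time-series regressions that include an intercept."""
    X = np.column_stack([np.ones(len(factors)), factors])
    coefs, *_ = np.linalg.lstsq(X, assets, rcond=None)
    return np.abs(coefs[0]).mean()  # row 0 holds the intercepts (alphas)


def cs_mae(assets, factors):
    """Mean absolute residual from regressing average returns on estimated betas."""
    X = np.column_stack([np.ones(len(factors)), factors])
    coefs, *_ = np.linalg.lstsq(X, assets, rcond=None)
    betas = coefs[1:].T  # shape (n_assets, k)
    mean_ret = assets.mean(axis=0)
    # Cross-sectional regression without an intercept
    premia, *_ = np.linalg.lstsq(betas, mean_ret, rcond=None)
    return np.abs(mean_ret - betas @ premia).mean()


for k in range(1, n_factors + 1):
    print(k, ts_mae(assets, factors[:, :k]), cs_mae(assets, factors[:, :k]))
```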

### 1.3.

Consider the scenario in which you are helping two people with investments.

* The young person has a 50 year investment horizon.
* The elderly person has a 10 year investment horizon.
* Both individuals have the same portfolio holdings.

State who has the more certain cumulative return and explain your reasoning.

Answer:

The elderly person with a 10-year investment horizon has the more certain cumulative return. 

First, over multiple periods, the variance (uncertainty) of the cumulative return increases with the length of the horizon, as implied by modeling the stock price as a geometric Brownian motion.

Second, a 50-year horizon carries additional uncertainty that is hard to predict or measure: far more can change over the young person's extra 40 years than over the elderly person's 10-year horizon.

Third, the elderly person is more risk-averse, which calls for a lower-risk and more stable investment, such as a pension-fund-style allocation.
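
The first point can be written out as a short derivation. This is a sketch that assumes i.i.d. log returns $r_{t+i}$ with per-period mean $\mu$ and variance $\sigma^2$; these symbols are introduced here for illustration.

$$
r_{t,t+h} = \sum_{i=1}^{h} r_{t+i}, \qquad \mathbb{E}\left[r_{t,t+h}\right] = h\mu, \qquad \text{Var}\left[r_{t,t+h}\right] = h\sigma^2, \qquad \text{Var}\left[\tfrac{1}{h} r_{t,t+h}\right] = \dfrac{\sigma^2}{h}
$$

Under this assumption the variance of the cumulative return grows linearly in the horizon $h$, while the variance of the average per-period return shrinks with $h$.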

### 1.4.

Suppose we find that the 10-year bond yield works well as a new pricing factor, along with `MKT`.

Consider two ways of building this new factor.
1. Directly use the index of 10-year yields, `YLD`
1. Construct a Fama-French style portfolio of equities, `FFYLD`. (Rank all the stocks by their correlation to bond yield changes, and go long the highest ranked and short the lowest ranked.)

Could you test the model with `YLD` and the model with `FFYLD` in the exact same ways? Explain.

Answer:

No. `YLD` is a non-tradable macroeconomic index computed from many 10-year government bonds, and it is hard to say what its risk premium is if we use it in a regression alongside `MKT`. `FFYLD`, by contrast, is constructed Fama-French style from actual equity data, so it is a tradable asset, with positions set by ranking stocks on their historical correlation with bond yield changes. Two caveats apply to `FFYLD`. First, because it is a tradable equity portfolio, other influences also affect it, for example companies' idiosyncratic risk. Second, how much to go long and how much to go short in this replication is open to question.


### 1.5.

Suppose we implement a momentum strategy on cryptocurrencies rather than US stocks.

Conceptually speaking, but specific to the context of our course discussion, how would the risk profile differ from the momentum strategy of US equities?

Answer:

First, the crypto market is less efficient than the US equity market, so we would expect higher volatility and more idiosyncratic risk in individual coins. In addition, macroeconomic conditions may influence crypto markets more than US equities, so crypto momentum may be driven more by exogenous variables than by a momentum factor of its own.

***

# 2. Pricing and Tangency Portfolio

You work in a hedge fund that believes that the AQR 4-Factor Model (present in Homework 5) is the perfect pricing model for stocks.

$$
\mathbb{E} \left[ \tilde{r}^i \right] = \beta^{i,\text{MKT}} \mathbb{E} \left[ \tilde{f}_{\text{MKT}} \right] + \beta^{i,\text{HML}} \mathbb{E} \left[ \tilde{f}_{\text{HML}} \right] + \beta^{i,\text{RMW}} \mathbb{E} \left[ \tilde{f}_{\text{RMW}} \right] + \beta^{i,\text{UMD}} \mathbb{E} \left[ \tilde{f}_{\text{UMD}} \right]
$$

The factors are available in the sheet `factors excess returns`.

The hedge fund invests in sector-tracking ETFs available in the sheet `sector excess returns`. You are to allocate into these sectors according to a mean-variance optimization with...

* regularization: off-diagonal elements of the covariance matrix divided by 2.
* modeled risk premia: expected excess returns given by the factor model rather than just using the historic sample averages.

You are to train the portfolio and test out-of-sample. The timeframes should be:
* Training timeframe: Jan-2018 to Dec-2022.
* Testing timeframe: Jan-2023 to most recent observation.

In [65]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import re
import statsmodels.api as sm
from scipy.stats import norm

In [43]:
import sys
sys.path.insert(0,'../cmds')
import portfolio_management_helper as pmh

In [44]:
FILEIN = "../data/midterm_2_data.xlsx"
SHEET = "sector excess returns"

sector_er = pd.read_excel(FILEIN, sheet_name=SHEET).set_index('date')
factor_er = pd.read_excel(FILEIN, sheet_name='factors excess returns').set_index('date')
sector_er.columns = sector_er.columns.str.rstrip()
factor_er.columns = factor_er.columns.str.rstrip()


display(sector_er.head(5))
print(sector_er.shape)

display(factor_er.head(5))
print(factor_er.shape)

| date | Agric | Food | Soda | Beer | Smoke | Toys | Fun | Books | Hshld | Clths | ... | Boxes | Trans | Whlsl | Rtail | Meals | Banks | Insur | RlEst | Fin | Other |
|------|-------|------|------|------|-------|------|-----|-------|-------|-------|-----|-------|-------|-------|-------|-------|-------|-------|-------|-----|-------|
| 1980-01-01 | -0.0076 | 0.0285 | 0.0084 | 0.1009 | -0.0143 | 0.1002 | 0.0362 | 0.0323 | 0.0048 | 0.0059 | ... | 0.0158 | 0.0875 | 0.0465 | -0.0126 | 0.043 | -0.0283 | 0.0258 | 0.0768 | 0.0308 | 0.0669 |
| 1980-02-01 | 0.0105 | -0.0608 | -0.0966 | -0.0322 | -0.0569 | -0.0323 | -0.0521 | -0.08 | -0.0555 | -0.0167 | ... | -0.0079 | -0.0541 | -0.0346 | -0.0639 | -0.0652 | -0.0854 | -0.0959 | -0.0347 | -0.0282 | -0.0274 |
| 1980-03-01 | -0.2224 | -0.1119 | -0.0167 | -0.1469 | -0.0193 | -0.1271 | -0.0826 | -0.1237 | -0.0566 | -0.0668 | ... | -0.0819 | -0.1509 | -0.1098 | -0.0906 | -0.1449 | -0.056 | -0.088 | -0.2451 | -0.1254 | -0.1726 |
| 1980-04-01 | 0.0449 | 0.0766 | 0.0232 | 0.0321 | 0.083 | -0.0529 | 0.0783 | 0.0153 | 0.0304 | 0.0115 | ... | 0.042 | -0.0103 | -0.0312 | 0.0353 | 0.0542 | 0.0728 | 0.053 | 0.0977 | 0.0447 | 0.0769 |
| 1980-05-01 | 0.0632 | 0.0793 | 0.0457 | 0.0863 | 0.0815 | 0.0509 | 0.0324 | 0.0886 | 0.056 | 0.0098 | ... | 0.0564 | 0.1063 | 0.1142 | 0.0877 | 0.1134 | 0.0578 | 0.0557 | 0.0915 | 0.0844 | 0.0685 |


(536, 49)


| date | MKT | HML | RMW | UMD |
|------|-----|-----|-----|-----|
| 1980-01-01 | 0.0551 | 0.0175 | -0.017 | 0.0755 |
| 1980-02-01 | -0.0122 | 0.0061 | 0.0004 | 0.0788 |
| 1980-03-01 | -0.129 | -0.0101 | 0.0146 | -0.0955 |
| 1980-04-01 | 0.0397 | 0.0106 | -0.021 | -0.0043 |
| 1980-05-01 | 0.0526 | 0.0038 | 0.0034 | -0.0112 |


(536, 4)


In [45]:
sector_er.describe()

| | Agric | Food | Soda | Beer | Smoke | Toys | Fun | Books | Hshld | Clths | ... | Boxes | Trans | Whlsl | Rtail | Meals | Banks | Insur | RlEst | Fin | Other |
|---|-------|------|------|------|-------|------|-----|-------|-------|-------|-----|-------|-------|-------|-------|-------|-------|-------|-------|-----|-------|
| count | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | ... | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 |
| mean | 0.0074 | 0.008 | 0.009 | 0.0094 | 0.0112 | 0.0049 | 0.01 | 0.0064 | 0.0067 | 0.0077 | ... | 0.0078 | 0.0075 | 0.0072 | 0.0095 | 0.0084 | 0.0077 | 0.0082 | 0.0041 | 0.0094 | 0.0047 |
| std | 0.0629 | 0.0435 | 0.0634 | 0.0489 | 0.0644 | 0.0719 | 0.0763 | 0.0597 | 0.0442 | 0.0648 | ... | 0.0576 | 0.0579 | 0.0519 | 0.0539 | 0.0524 | 0.0619 | 0.0522 | 0.0727 | 0.0643 | 0.0608 |
| min | -0.2966 | -0.1836 | -0.2646 | -0.2017 | -0.2535 | -0.3511 | -0.3252 | -0.2478 | -0.2228 | -0.3141 | ... | -0.2889 | -0.2869 | -0.2936 | -0.2973 | -0.242 | -0.2784 | -0.2706 | -0.3683 | -0.2663 | -0.2693 |
| 25% | -0.0301 | -0.0164 | -0.0237 | -0.0184 | -0.0279 | -0.0393 | -0.0295 | -0.027 | -0.0194 | -0.0277 | ... | -0.025 | -0.0263 | -0.0239 | -0.0225 | -0.0231 | -0.0297 | -0.0201 | -0.0311 | -0.0301 | -0.0303 |
| 50% | 0.0065 | 0.0094 | 0.0111 | 0.0097 | 0.0146 | 0.008 | 0.0109 | 0.0042 | 0.0077 | 0.0078 | ... | 0.0084 | 0.0105 | 0.0097 | 0.009 | 0.0088 | 0.0129 | 0.0124 | 0.0061 | 0.0135 | 0.0057 |
| 75% | 0.0435 | 0.032 | 0.0454 | 0.0374 | 0.0491 | 0.0481 | 0.0549 | 0.0386 | 0.0336 | 0.0453 | ... | 0.0447 | 0.0439 | 0.0399 | 0.0427 | 0.041 | 0.0462 | 0.0391 | 0.0407 | 0.048 | 0.0429 |
| max | 0.2813 | 0.1873 | 0.3858 | 0.2168 | 0.3237 | 0.2313 | 0.4126 | 0.3056 | 0.1839 | 0.2452 | ... | 0.1866 | 0.1762 | 0.1729 | 0.1879 | 0.1867 | 0.1985 | 0.2287 | 0.6521 | 0.194 | 0.1986 |


### 2.1.
(8pts)

Calculate the model-implied expected excess returns of every asset.

The time-series estimations should...
* NOT include an intercept. (You assume the model holds perfectly.)
* use data from the `training` timeframe.

With the time-series estimates, use the `training` timeframe's sample average of the factors as the factor premia. Together, this will give you the model-implied risk premia, which we label as
$$
\lambda_i := \mathbb{E}[\tilde{r}_i]
$$

* Store $\lambda_i$ and $\boldsymbol{\beta}^i$ for each asset.
* Print $\lambda_i$ for `Agric`, `Food`, `Soda`

In [46]:
sector_er_train = sector_er[pd.to_datetime('2018-01-01'): pd.to_datetime('2022-12-31')]
sector_er_test = sector_er[pd.to_datetime('2023-01-01'):]

factor_er_train = factor_er[pd.to_datetime('2018-01-01'): pd.to_datetime('2022-12-31')]
factor_er_test = factor_er[pd.to_datetime('2023-01-01'):]

In [47]:
sector_names = sector_er.columns
betas = pd.DataFrame(index=sector_er_train.columns, columns=factor_er_train.columns)

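def ts_regression_train(name):
    # Regress one sector's excess returns on the four factors over the training window
    y = sector_er_train[name].values
    X = factor_er_train.values

    # No constant is added to X, so the regression has no intercept
    model = sm.OLS(y, X)
    results = model.fit()
    betas.loc[name] = results.params


for name in sector_names:
    ts_regression_train(name)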

display(betas.head(5))

| | MKT | HML | RMW | UMD |
|---|-----|-----|-----|-----|
| Agric | 0.8324 | 0.5565 | -0.5021 | 0.039 |
| Food | 0.5245 | 0.2055 | 0.3097 | -0.0036 |
| Soda | 0.5402 | 0.1791 | 0.6384 | 0.0137 |
| Beer | 0.5413 | 0.023 | 0.6297 | -0.0405 |
| Smoke | 0.4982 | 0.4431 | 0.4022 | -0.134 |


In [48]:
factor_premia = factor_er_train.mean()
lambda_i = betas.astype(float).dot(factor_premia.astype(float))

In [49]:
print(f"Agric lambda: {lambda_i['Agric']}")
print(f"Food lambda: {lambda_i['Food']}")
print(f"Soda lambda: {lambda_i['Soda']}")

Agric lambda: 0.003655102106916501
Food lambda: 0.00545426786838044
Soda lambda: 0.007336244651963081


### 2.2.

Use the expected excess returns derived from (2.1) with the **regularized** covariance matrix to calculate the weights of the tangency portfolio.

- Use the covariance matrix only for `training` timeframe.
- Calculate and store the vector of weights for all the assets.
- Return the weights of the tangency portfolio for `Agric`, `Food`, `Soda`.

$$
\textbf{w}_{t} = \dfrac{\tilde{\Sigma}^{-1} \boldsymbol{\lambda}}{\boldsymbol{1}' \tilde{\Sigma}^{-1} \boldsymbol{\lambda}}
$$

where $\tilde{\Sigma}$ is the regularized covariance matrix and $\tilde{\Sigma}^{-1}$ is its inverse.
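
One motivation for this regularization is numerical: halving the off-diagonal elements pulls the covariance matrix toward its diagonal, which tends to make it better conditioned before inversion. The sketch below illustrates this on synthetic returns. The data and variable names are assumptions for illustration, not part of the exam data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic monthly returns for 5 assets that share one common shock (illustrative only).
common = rng.normal(0.0, 0.04, size=(60, 1))
returns = common + rng.normal(0.0, 0.01, size=(60, 5))

cov = np.cov(returns, rowvar=False)

# Halve every element, then restore the original variances on the diagonal.
cov_reg = cov / 2
np.fill_diagonal(cov_reg, np.diag(cov))

print("condition number, sample covariance:     ", np.linalg.cond(cov))
print("condition number, regularized covariance:", np.linalg.cond(cov_reg))
```

A lower condition number means the inverse, and therefore the tangency weights, are less sensitive to estimation noise in the covariance matrix.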

In [50]:
def tangency_portfolio_weights(lambda_i, returns):
    # Align the return columns with the order of the expected-return vector
    returns = returns[lambda_i.index]

    cov_matrix = returns.cov().values

    # Regularize: halve every off-diagonal element, keep the variances unchanged
    cov_matrix_reg = cov_matrix / 2
    np.fill_diagonal(cov_matrix_reg, np.diag(cov_matrix))

    inv_cov = np.linalg.inv(cov_matrix_reg)

    # w = inv(Sigma) @ lambda, rescaled so that the weights sum to one
    ones = np.ones(len(lambda_i))
    numerator = inv_cov @ lambda_i.values
    denominator = ones.T @ numerator

    weights = numerator / denominator
    weights_df = pd.DataFrame(weights, index=lambda_i.index, columns=['Tangency Weights'])

    return weights_df

In [51]:
weights_all = tangency_portfolio_weights(lambda_i, sector_er_train)

print(f"Agric weight: {weights_all.loc['Agric', 'Tangency Weights']}")
print(f"Food weight: {weights_all.loc['Food', 'Tangency Weights']}")
print(f"Soda weight: {weights_all.loc['Soda', 'Tangency Weights']}")

Agric weight: -0.03072271660169014
Food weight: 0.015320224544835071
Soda weight: 0.13294447809892723


### 2.3.

Evaluate the performance of this allocation in the `testing` period. Report the **annualized**
- mean
- vol
- Sharpe

In [54]:
def portfolio_statistics(df):
    statistics = pd.DataFrame()
    statistics['Mean'] = [np.mean(df) * 12]
    statistics['Volatility'] = [np.std(df, ddof = 1) * np.sqrt(12)]
    statistics['Sharpe'] = statistics['Mean'] / statistics['Volatility']

    return statistics

statistics_one = portfolio_statistics(sector_er_test @ weights_all['Tangency Weights'].T)
statistics_one.index = ['Tangency Portfolio Performance']

display(statistics_one)

| | Mean | Volatility | Sharpe |
|---|------|------------|--------|
| Tangency Portfolio Performance | 0.1812 | 0.1195 | 1.5155 |


### 2.4.

(7pts)

Construct the same tangency portfolio as in `2.2` but with one change:
* replace the risk premia of the assets, (denoted $\lambda_i$) with the sample averages of the excess returns from the `training` set.

So instead of using $\lambda_i$ suggested by the factor model (as in `2.1-2.3`) you're using sample averages for $\lambda_i$.

- Return the weights of the tangency portfolio for `Agric`, `Food`, `Soda`.

Evaluate the performance of this allocation in the `testing` period. Report the **annualized**
- mean
- vol
- Sharpe

In [57]:
lambda_i_sample = sector_er_train.mean()
display(lambda_i_sample.head(5))
weights_sample = tangency_portfolio_weights(lambda_i_sample, sector_er_train)

print(f"Agric weight (sample): {weights_sample.loc['Agric', 'Tangency Weights']}")
print(f"Food weight (sample): {weights_sample.loc['Food', 'Tangency Weights']}")
print(f"Soda weight (sample): {weights_sample.loc['Soda', 'Tangency Weights']}")

Agric   0.0102
Food    0.0046
Soda    0.0092
Beer    0.0070
Smoke   0.0031
dtype: float64

Agric weight (sample): 0.1440898645204342
Food weight (sample): -0.06980957545009599
Soda weight (sample): 0.32267987966472084


In [58]:
statistics_two = portfolio_statistics(sector_er_test @ weights_sample['Tangency Weights'].T)
statistics_two.index = ['Tangency Portfolio Performance (Sample)']

display(statistics_two)

| | Mean | Volatility | Sharpe |
|---|------|------------|--------|
| Tangency Portfolio Performance (Sample) | 0.1768 | 0.153 | 1.1555 |


### 2.5.

Which allocation performed better in the `testing` period: the allocation based on premia from the factor model or from the sample averages?

Why might this be?

In [59]:
statistics = pd.concat([statistics_one, statistics_two])
display(statistics)

| | Mean | Volatility | Sharpe |
|---|------|------------|--------|
| Tangency Portfolio Performance | 0.1812 | 0.1195 | 1.5155 |
| Tangency Portfolio Performance (Sample) | 0.1768 | 0.153 | 1.1555 |


Answer:

The tangency portfolio that uses the AQR 4-factor model to estimate $\lambda_i$ does better than the one that uses the sample means: it has a higher mean and lower volatility, and therefore a higher Sharpe ratio.

First, the factor model estimates expected returns from the assets' sensitivities to AQR's risk factors, which denoises the return series and isolates the influential factors.

Second, it helps avoid overfitting, since the sample means incorporate noise from the training period.

Third, there is risk alignment: AQR's factor model aligns expected returns with the underlying risk exposures of the assets. This helps in constructing a portfolio that efficiently balances risk and return based on systematic risk factors rather than idiosyncratic risk.

### 2.6.
Suppose you now want to build a tangency portfolio solely from the factors, without using the sector ETFs.

- Calculate the weights of the tangency portfolio using `training` data for the factors.
- Again, regularize the covariance matrix of factor returns by dividing off-diagonal elements by 2.

Report, in the `testing` period, the factor-based tangency stats **annualized**...
- mean
- vol
- Sharpe


In [61]:
lambda_i_factors = factor_er_train.mean()
weights_factor = tangency_portfolio_weights(lambda_i_factors, factor_er_train)

statistics_three = portfolio_statistics(factor_er_test @ weights_factor['Tangency Weights'].T)
statistics_three.index = ['Tangency Portfolio Performance (Factor)']

display(statistics_three)

| | Mean | Volatility | Sharpe |
|---|------|------------|--------|
| Tangency Portfolio Performance (Factor) | 0.0624 | 0.0582 | 1.0719 |


### 2.7.

Based on the hedge fund's beliefs, would you prefer to use the ETF-based tangency or the factor-based tangency portfolio? Explain your reasoning. Note that you should answer based on broad principles and not on the particular estimation results.

Answer:

First, in terms of the Sharpe ratio, the factor-based tangency portfolio delivers about 1.07, which is much lower than the 1.52 of the ETF-based tangency portfolio. Hence, we should definitely select the ETF-based one.

Second, the ETF-based tangency portfolio is directly tradable: we simply trade the ETFs at the specified weights. The factor-based portfolio is not directly tradable; to implement it we would need to recalculate the weights of each underlying security inside the factors.

Third, factors can exhibit significant volatility due to their construction and their exposure to specific risks, and most of the factors are highly correlated, so the factor portfolio is not fully diversified. Hence, we should select the ETF-based portfolio, which reflects the effects of different sectors.

Finally, the construction of factors such as HML and SMB already involves a long position in one set of assets and a short position in another, so building another tangency portfolio on top of them breaks the logic of each factor.

***

# 3. Long-Run Returns

For this question, use only the sheet `factors excess returns`.

Suppose we want to measure the long run returns of various pricing factors.

### 3.1.

Turn the data into log returns.
- Display the first 5 rows of the data.

Using these log returns, report the **annualized**
* mean
* vol
* Sharpe

### 3.2.

Consider 15-year cumulative log excess returns. Following the assumptions and modeling of Lecture 6, report the following 15-year stats:
- mean
- vol
- Sharpe

How do they compare to the estimated stats (1-year horizon) in `3.1`? 

In [62]:
log_returns = factor_er.apply(lambda x: np.log(x + 1))
display(log_returns.head(5))

| date | MKT | HML | RMW | UMD |
|------|-----|-----|-----|-----|
| 1980-01-01 | 0.0536 | 0.0173 | -0.0171 | 0.0728 |
| 1980-02-01 | -0.0123 | 0.0061 | 0.0004 | 0.0758 |
| 1980-03-01 | -0.1381 | -0.0102 | 0.0145 | -0.1004 |
| 1980-04-01 | 0.0389 | 0.0105 | -0.0212 | -0.0043 |
| 1980-05-01 | 0.0513 | 0.0038 | 0.0034 | -0.0113 |
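
Question 3.1 also asks for annualized statistics of the log returns. A minimal sketch of that calculation follows, assuming monthly data (12 periods per year). The helper name `annualized_stats` is hypothetical, and the demo uses synthetic returns, so no values for the actual factors are implied.

```python
import numpy as np
import pandas as pd


def annualized_stats(log_rets: pd.DataFrame, periods_per_year: int = 12) -> pd.DataFrame:
    """Annualized mean, volatility, and Sharpe ratio of per-period log excess returns."""
    mean = log_rets.mean() * periods_per_year
    vol = log_rets.std() * np.sqrt(periods_per_year)
    return pd.DataFrame({"Mean": mean, "Vol": vol, "Sharpe": mean / vol})


# Synthetic simple excess returns for two made-up series (illustrative only).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(0.005, 0.04, size=(120, 2)), columns=["A", "B"])
demo_log = np.log1p(demo)  # convert simple returns to log returns

print(annualized_stats(demo_log))
```

With the notebook's data, the same helper could be applied to `log_returns`.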


In [63]:
def rolling_cumulative_stats(log_returns, window_years = 15):
    periods_per_year = 12
    
    window_size = window_years * periods_per_year
    
    rolling_cumulative_log_returns = log_returns.rolling(window=window_size).sum().dropna()
    
    mean_cumulative = rolling_cumulative_log_returns.mean()
    std_cumulative = rolling_cumulative_log_returns.std()
    sharpe_ratio_cumulative = mean_cumulative / std_cumulative
    
    stats_df = pd.DataFrame({
        f'{window_years}-Year Mean': mean_cumulative,
        f'{window_years}-Year Volatility': std_cumulative,
        f'{window_years}-Year Sharpe Ratio': sharpe_ratio_cumulative
    })
    
    return stats_df

stats_15_year = rolling_cumulative_stats(log_returns, window_years=15)

display(stats_15_year)

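| | 15-Year Mean | 15-Year Volatility | 15-Year Sharpe Ratio |
|---|--------------|--------------------|----------------------|
| MKT | 0.9976 | 0.3966 | 2.5155 |
| HML | 0.3159 | 0.4376 | 0.722 |
| RMW | 0.6183 | 0.151 | 4.0933 |
| UMD | 0.7415 | 0.7299 | 1.0158 |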
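
The table above is estimated from overlapping rolling 15-year windows. An alternative, sketched below, scales annualized statistics analytically under the assumption that log returns are i.i.d. (the assumption named in 3.5). The function name `scale_to_horizon` and the input values are hypothetical and are not estimates from the exam data.

```python
import numpy as np


def scale_to_horizon(annual_mean: float, annual_vol: float, h_years: float):
    """Scale annualized log-return stats to an h-year cumulative horizon, assuming i.i.d. returns."""
    mean_h = h_years * annual_mean          # the mean grows linearly in h
    vol_h = np.sqrt(h_years) * annual_vol   # volatility grows with sqrt(h)
    sharpe_h = mean_h / vol_h               # so the Sharpe ratio grows with sqrt(h)
    return mean_h, vol_h, sharpe_h


# Hypothetical annualized inputs, for illustration only.
mean_15, vol_15, sharpe_15 = scale_to_horizon(annual_mean=0.06, annual_vol=0.15, h_years=15)
print(mean_15, vol_15, sharpe_15)
```

Under this scaling, the 15-year Sharpe ratio equals the 1-year Sharpe ratio multiplied by $\sqrt{15}$.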


### 3.3.

What is the probability that momentum factor has a negative mean excess return over the next 
* single period?
* 15 years?

In [70]:
def prob(mu, sigma, h):
    # Normal CDF evaluated at sqrt(h) * mu / sigma
    return norm.cdf(np.sqrt(h)*mu/sigma)

mu, sigma = stats_15_year.loc['UMD', '15-Year Mean'], stats_15_year.loc['UMD', '15-Year Volatility']
# Passing -mu gives the probability of a negative mean return
print(f"Probability over a single period: {prob(mu = -mu, sigma = sigma, h = 1):,.2%}")
print(f"Probability over a 15-year period: {prob(mu = -mu, sigma = sigma, h = 15):,.2%}")

Probability over a single period: 15.49%
Probability over a 15-year period: 0.00%
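
The `prob` helper above corresponds to the following normal result. It is sketched here under the assumption of i.i.d. normal log returns with per-period mean $\mu$ and volatility $\sigma$, with $\bar{r}_h$ denoting the average return over $h$ periods. These symbols are introduced for illustration.

$$
\Pr\left(\bar{r}_h < 0\right) = \Pr\left(\dfrac{\bar{r}_h - \mu}{\sigma / \sqrt{h}} < -\dfrac{\sqrt{h}\,\mu}{\sigma}\right) = \Phi\left(-\sqrt{h}\,\dfrac{\mu}{\sigma}\right), \qquad \bar{r}_h = \dfrac{1}{h}\sum_{i=1}^{h} r_{t+i}
$$

In the code, calling the helper with `mu = -mu` supplies the minus sign inside $\Phi$.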


### 3.4.

Recall from the case that momentum has been underperforming since 2009. 

Using data from 2009 to present, what is the probability that momentum *outperforms* the market factor over the next
* period?
* 15 years?

In [71]:
stats_15_year_new = rolling_cumulative_stats(log_returns[pd.to_datetime('2009-01-01'): ], window_years=15)

display(stats_15_year_new)

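| | 15-Year Mean | 15-Year Volatility | 15-Year Sharpe Ratio |
|---|--------------|--------------------|----------------------|
| MKT | 1.9124 | 0.091 | 21.0201 |
| HML | -0.244 | 0.0531 | -4.5973 |
| RMW | 0.4939 | 0.0462 | 10.6901 |
| UMD | -0.1234 | 0.3525 | -0.3499 |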


In [72]:
# mu is read from the post-2009 table; sigma is read from the full-sample table `stats_15_year`
mu, sigma = stats_15_year_new.loc['UMD', '15-Year Mean'], stats_15_year.loc['UMD', '15-Year Volatility']
print(f"Probability over a single period: {prob(mu = -mu, sigma = sigma, h = 1):,.2%}")
print(f"Probability over a 15-year period: {prob(mu = -mu, sigma = sigma, h = 15):,.2%}")

Probability over a single period: 56.71%
Probability over a 15-year period: 74.36%
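
The question is phrased in terms of momentum outperforming the market. One way to express that directly is to work with the spread between the two log-return series, as sketched below. This assumes the spread is i.i.d. normal; the helper name `prob_outperform` is hypothetical, and the demo data are synthetic stand-ins that only borrow the column labels `UMD` and `MKT`.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def prob_outperform(log_a, log_b, h_years, periods_per_year=12):
    """P(cumulative log return of a exceeds that of b over h_years), assuming an i.i.d. normal spread."""
    spread = log_a - log_b
    mu = spread.mean() * periods_per_year             # annualized mean of the spread
    sigma = spread.std() * np.sqrt(periods_per_year)  # annualized vol of the spread
    return norm.cdf(np.sqrt(h_years) * mu / sigma)


# Synthetic monthly log returns (illustrative only, not the exam data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "UMD": rng.normal(0.000, 0.045, size=180),
    "MKT": rng.normal(0.010, 0.045, size=180),
})

p_1m = prob_outperform(demo["UMD"], demo["MKT"], h_years=1 / 12)
p_15y = prob_outperform(demo["UMD"], demo["MKT"], h_years=15)
print(f"one month: {p_1m:.2%}, 15 years: {p_15y:.2%}")
```

Because the spread's volatility reflects the covariance between the two series, this treats outperformance as a statement about the pair rather than about momentum alone.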


### 3.5.
Conceptually, why is there such a discrepancy between this probability for 1 period vs. 15 years?

What assumption about the log-returns are we making when we use this technique to estimate underperformance?

Answer:

Lengthening the horizon strengthens the effect of the $\sqrt{h}$ term in the probability calculation. Hence, if the single-period probability is below 50%, the 15-year probability will be lower still, and vice versa.

The assumption is that log returns are i.i.d. and normally distributed, as we remove the drift effect from the original return series by taking logs.

### 3.6.

Using your previous answers, explain what is meant by time diversification.

Answer:

Time diversification refers to the idea that the mean annualized return becomes riskless over long investment horizons. This is why, in the previous answers, the probabilities move further away from 50% as the horizon grows.

### 3.7.

Is the probability that `HML` and `UMD` both have negative cumulative returns over the next year higher or lower than the probability that `HML` and `MKT` both have negative cumulative returns over the next year?

Answer conceptually, but specifically. (No need to calculate the specific probabilities.)

Answer:

The probability for `HML` and `UMD` should be higher. Holding `HML` fixed, the probability that `UMD` is negative is higher than the probability that `MKT` is negative: in the post-2009 15-year statistics, `MKT` has a much higher Sharpe ratio than `UMD`.
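
The comparison above holds `HML` fixed and compares marginal probabilities. A joint probability also depends on how the two factors co-move, which the sketch below illustrates under an assumed bivariate normal model. The helper name `prob_both_negative` and all input values are hypothetical and are not estimated from the exam data.

```python
import numpy as np
from scipy.stats import multivariate_normal


def prob_both_negative(mu, sigma, rho):
    """P(X < 0 and Y < 0) for a bivariate normal with means mu, vols sigma, and correlation rho."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    off_diag = rho * sigma[0] * sigma[1]
    cov = np.array([[sigma[0] ** 2, off_diag],
                    [off_diag, sigma[1] ** 2]])
    return multivariate_normal(mean=mu, cov=cov).cdf(np.zeros(2))


# Same hypothetical means and vols, different correlations.
p_neg_corr = prob_both_negative(mu=[0.03, 0.05], sigma=[0.10, 0.15], rho=-0.5)
p_pos_corr = prob_both_negative(mu=[0.03, 0.05], sigma=[0.10, 0.15], rho=0.5)
print(p_neg_corr, p_pos_corr)
```

With the marginal distributions held fixed, a more negative correlation lowers the probability that both returns are negative at the same time.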

Cite: ChatGPT o1-preview

***