# Midterm 2

## FINM 36700 - 2024

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

### Answers provided by:

* Austin Galm
* agalm@uchicago.edu

# Instructions

## Please note the following:

Points
* The exam is 100 points.
* You have 120 minutes to complete the exam.
* For every minute late you submit the exam, you will lose one point.


Submission
* You will upload your solution to the `Midterm 2` assignment on Canvas, where you downloaded this. 
* Be sure to **submit** on Canvas, not just **save** on Canvas.
* Your submission should be readable (the graders must be able to understand your answers).
* Your submission should **include all code used in your analysis in a file format in which the code can be executed.**

Rules
* The exam is open-material, closed-communication.
* You do not need to cite material from the course github repo - you are welcome to use the code posted there without citation.

Advice
* If you find any question to be unclear, state your interpretation and proceed. We will only answer questions of interpretation if there is a typo, error, etc.
* The exam will be graded for partial credit.

## Data

**All data files are found in the class github repo, in the `data` folder.**

This exam makes use of the following data files:
* `midterm_2_data.xlsx`

This file contains the following sheets:
- for Section 2:
    * `sector stocks excess returns` - MONTHLY excess returns for 49 sector stocks
    * `factors excess returns` - MONTHLY excess returns of AQR factor model from Homework 5
- for Section 3:
    * `factors excess returns` - MONTHLY excess returns of AQR factor model from Homework 5

## Scoring

| Problem | Points |
|---------|--------|
| 1       | 25     |
| 2       | 40     |
| 3       | 35     |

### Each numbered question is worth 5 points unless otherwise specified.

# 1. Short Answer

#### No Data Needed

These problems do not require any data file. Rather, analyze them conceptually. 

### 1.1.

Historically, which pricing factor among the ones we studied has shown a considerable decrease in importance?

<span style="color:red;">

The size factor has exhibited a considerable decrease in importance. This is evidenced in a tangency portfolio test of these factors' excess returns, which was explored in question 2.4 of homework 5.

</span>

### 1.2.

True or False: For a given factor model and a set of test assets, the addition of one more factor to that model will surely decrease the cross-sectional MAE. 

True or False: For a given factor model and a set of test assets, the addition of one more factor to that model will surely decrease the time-series MAE. 

Along with stating T/F, explain your reasoning for the two statements.

<span style="color:red;">

1. True, because adding another factor will add another parameter to the regression. Consequently, the model will be able to capture more of the variance in the dataset and have a better $R^2$, which is directly connected to the MAE of the cross-section regression since MAE is calculated in the cross-section regression using the residuals.

1. False, the MAE of the TS regression is measured by the alphas (intercepts) from the TS regressions. This alpha can shift up or down with the addition of more regressors to the model. Consequently, it is unclear whether the MAE of the TS regression would decrease with the addition of more factors to the model.

</span>

### 1.3.

Consider the scenario in which you are helping two people with investments.

* The young person has a 50 year investment horizon.
* The elderly person has a 10 year investment horizon.
* Both individuals have the same portfolio holdings.

State who has the more certain cumulative return and explain your reasoning.

<span style="color:red;">

The young person has more certain cumulative returns because they have a longer holding period. This conclusion is supported by the idea of time-diversification, which was explored in the Barnstable case. Time-diversification suggests that investors are rewarded with better risk-adjusted returns when they hold assets for a longer period of time. Intuitively, this is supported by the logic that investors take on more risk of loss over a longer period of time. Mathematically, this is supported by the fact that mean excess returns scale linearly with time while excess return volatility scales sub-linearly with time. Consequently, the Sharpe ratio increases at a rate of $\sqrt{t}$, where $t$ represents the number of periods the portfolio is held.

</span>

<span style="color:blue;">

This question was not a question about time diversification, but rather was a question about variance of returns over time. Consequently, the fact that variance scales linearly with time was the important fact for answering this question, not having a long time-horizon to potentially outperform some benchmark.

</span>
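
The scaling behind this correction can be written out explicitly. The following is a sketch under the assumption (not stated in the question itself) that per-period log returns are i.i.d. with mean $\mu$ and variance $\sigma^2$:

$$
r_{t,t+h} = \sum_{j=1}^{h} r_{t+j}, \qquad \mathbb{E}\left[r_{t,t+h}\right] = h\mu, \qquad \text{Var}\left[r_{t,t+h}\right] = h\sigma^2, \qquad \text{SD}\left[r_{t,t+h}\right] = \sqrt{h}\,\sigma
$$

Under that assumption, the dispersion of the cumulative return grows with the horizon, so the 10-year investor faces the narrower distribution of cumulative outcomes. It is only the average per-period return, with variance $\sigma^2 / h$, that becomes more certain over 50 years.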

### 1.4.

Suppose we find that the 10-year bond yield works well as a new pricing factor, along with `MKT`.

Consider two ways of building this new factor.
1. Directly use the index of 10-year yields, `YLD`
1. Construct a Fama-French style portfolio of equities, `FFYLD`. (Rank all the stocks by their correlation to bond yield changes, and go long the highest ranked and short the lowest ranked.)

Could you test the model with `YLD` and the model with `FFYLD` in the exact same ways? Explain.

<span style="color:red;">

Yes, you can test these models in the exact same ways even though `YLD` is not a market return series while `FFYLD` is a market return series. The fact that these regressors are different in spirit does not change the approach to specifying the factor model. We will still run a time-series regression of the `YLD` factor and then run a cross-section regression. The testing of these models will come down to the results of the cross-section regression, which will be consistent across models.

</span>

<span style="color:blue;">

While the organization of your thoughts was on track, you missed the mark. The time-series regression is still a suitable test and one that you can only conduct on the market-based asset, `FFYLD`. Consequently, the important takeaway here is that the assets are tested differently because only one can be subject to the time-series test.

</span>

### 1.5.

Suppose we implement a momentum strategy on cryptocurrencies rather than US stocks.

Conceptually speaking, but specific to the context of our course discussion, how would the risk profile differ from the momentum strategy of US equities?

<span style="color:red;">

The risk profile of this momentum strategy would likely be much more volatile for a couple of reasons:
1. Cryptocurrencies have much more volatile returns by nature.
2. There are significantly fewer cryptocurrencies in the market than there are stocks.

The second point is particularly important in the scope of this class because this will lead to a much more concentrated portfolio. In homework 5, we explored how this concentration affects the mean excess returns and volatility of a factor portfolio, finding that these metrics increase in a momentum factor portfolio. In conclusion, the risk of this momentum factor would likely lead to lower risk-adjusted returns compared to a stock-focused momentum factor portfolio.

</span>

***

In [1]:
import os
import sys
import pandas as pd
from scipy.stats import norm, chi2
import statsmodels.api as sm
import numpy as np
from functools import partial
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

current_dir = os.getcwd()
parent_dir = os.path.abspath(os.path.join(current_dir, os.pardir))
grandparent_dir = os.path.abspath(os.path.join(parent_dir, os.pardir))
sys.path.insert(0, parent_dir)
sys.path.insert(0, grandparent_dir)
import cmds.portfolio_management_helper as pmh

plt.style.use("seaborn-v0_8-whitegrid")
PLOT_WIDTH, PLOT_HEIGHT = 8, 5
COLORS = ["blue", "red", "orange"]

warnings.filterwarnings('ignore')
pd.options.display.float_format = "{:.4f}".format
p = plt.rcParams

%matplotlib inline
%load_ext autoreload
%autoreload 2

# 2. Pricing and Tangency Portfolio

You work in a hedge fund that believes that the AQR 4-Factor Model (presented in Homework 5) is the perfect pricing model for stocks.

$$
\mathbb{E} \left[ \tilde{r}^i \right] = \beta^{i,\text{MKT}} \mathbb{E} \left[ \tilde{f}_{\text{MKT}} \right] + \beta^{i,\text{HML}} \mathbb{E} \left[ \tilde{f}_{\text{HML}} \right] + \beta^{i,\text{RMW}} \mathbb{E} \left[ \tilde{f}_{\text{RMW}} \right] + \beta^{i,\text{UMD}} \mathbb{E} \left[ \tilde{f}_{\text{UMD}} \right]
$$

The factors are available in the sheet `factors excess returns`.

The hedge fund invests in sector-tracking ETFs available in the sheet `sectors excess returns`. You are to allocate into these sectors according to a mean-variance optimization with...

* regularization: off-diagonal elements of the covariance matrix divided by 2.
* modeled risk premia: expected excess returns given by the factor model rather than just using the historic sample averages.

You are to train the portfolio and test out-of-sample. The timeframes should be:
* Training timeframe: Jan-2018 to Dec-2022.
* Testing timeframe: Jan-2023 to most recent observation.
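
The regularization described above can be sketched as follows. This is a minimal illustration using a hypothetical helper `regularize_cov` and a made-up 3x3 covariance matrix; it is not the course's `pmh` implementation.

```python
import numpy as np

def regularize_cov(cov, shrink=0.5):
    """Scale the off-diagonal elements by `shrink`; leave the variances untouched.

    `regularize_cov` is a hypothetical helper written for this illustration.
    """
    cov = np.asarray(cov, dtype=float)
    diag = np.diag(np.diag(cov))          # matrix holding only the variances
    return diag + shrink * (cov - diag)   # off-diagonals are multiplied by shrink

# Made-up covariance matrix, for illustration only
cov = np.array([
    [0.04, 0.01, 0.02],
    [0.01, 0.09, 0.03],
    [0.02, 0.03, 0.16],
])
reg = regularize_cov(cov)
print(reg)
```

With `shrink=0.5` this is equivalent to averaging the full covariance matrix with its diagonal, `(cov + diag) / 2`.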

In [2]:
data_path = parent_dir + '/data/'
ff_file_name = data_path + 'midterm_2_data.xlsx'
aqr_xs_rets = pmh.read_excel_default(ff_file_name, 
                                 sheet_name='factors excess returns',
                                 index_col='date', parse_dates=True)
sect_xs_rets = pmh.read_excel_default(ff_file_name, 
                                 sheet_name='sector excess returns',
                                 index_col='date', parse_dates=True)

display(aqr_xs_rets.head())
display(sect_xs_rets.head())

| date | MKT | HML | RMW | UMD |
|------|-----|-----|-----|-----|
| 1980-01-01 | 0.0551 | 0.0175 | -0.017 | 0.0755 |
| 1980-02-01 | -0.0122 | 0.0061 | 0.0004 | 0.0788 |
| 1980-03-01 | -0.129 | -0.0101 | 0.0146 | -0.0955 |
| 1980-04-01 | 0.0397 | 0.0106 | -0.021 | -0.0043 |
| 1980-05-01 | 0.0526 | 0.0038 | 0.0034 | -0.0112 |


| date | Agric | Food | Soda | Beer | Smoke | Toys | Fun | Books | Hshld | Clths | ... | Boxes | Trans | Whlsl | Rtail | Meals | Banks | Insur | RlEst | Fin | Other |
|------|-------|------|------|------|-------|------|-----|-------|-------|-------|-----|-------|-------|-------|-------|-------|-------|-------|-------|-----|-------|
| 1980-01-01 | -0.0076 | 0.0285 | 0.0084 | 0.1009 | -0.0143 | 0.1002 | 0.0362 | 0.0323 | 0.0048 | 0.0059 | ... | 0.0158 | 0.0875 | 0.0465 | -0.0126 | 0.043 | -0.0283 | 0.0258 | 0.0768 | 0.0308 | 0.0669 |
| 1980-02-01 | 0.0105 | -0.0608 | -0.0966 | -0.0322 | -0.0569 | -0.0323 | -0.0521 | -0.08 | -0.0555 | -0.0167 | ... | -0.0079 | -0.0541 | -0.0346 | -0.0639 | -0.0652 | -0.0854 | -0.0959 | -0.0347 | -0.0282 | -0.0274 |
| 1980-03-01 | -0.2224 | -0.1119 | -0.0167 | -0.1469 | -0.0193 | -0.1271 | -0.0826 | -0.1237 | -0.0566 | -0.0668 | ... | -0.0819 | -0.1509 | -0.1098 | -0.0906 | -0.1449 | -0.056 | -0.088 | -0.2451 | -0.1254 | -0.1726 |
| 1980-04-01 | 0.0449 | 0.0766 | 0.0232 | 0.0321 | 0.083 | -0.0529 | 0.0783 | 0.0153 | 0.0304 | 0.0115 | ... | 0.042 | -0.0103 | -0.0312 | 0.0353 | 0.0542 | 0.0728 | 0.053 | 0.0977 | 0.0447 | 0.0769 |
| 1980-05-01 | 0.0632 | 0.0793 | 0.0457 | 0.0863 | 0.0815 | 0.0509 | 0.0324 | 0.0886 | 0.056 | 0.0098 | ... | 0.0564 | 0.1063 | 0.1142 | 0.0877 | 0.1134 | 0.0578 | 0.0557 | 0.0915 | 0.0844 | 0.0685 |


In [3]:
# Note: I am calculating the cross-section lambdas here as an exercise due to an initial mis-interpretation of the question.

aqr_train = aqr_xs_rets.loc['2018':'2022']
sect_train = sect_xs_rets.loc['2018':'2022']
aqr_test = aqr_xs_rets.loc['2023':]
sect_test = sect_xs_rets.loc['2023':]

cross_sec_model = pmh.calc_cross_section_regression(sect_train, aqr_train, annual_factor=12, provided_excess_returns=True,
                                  keep_columns=['Annualized Eta', 'Annualized Lambda']).T
intercept = cross_sec_model.loc['Annualized Eta']
lambdas = cross_sec_model.loc[['MKT Annualized Lambda', 
                               'HML Annualized Lambda', 
                               'RMW Annualized Lambda', 
                               'UMD Annualized Lambda']]
lambdas

Lambda represents the premium calculated by the cross-section regression and the historical premium is the average of the factor excess returns


| | MKT + HML + RMW + UMD Cross-Section Regression |
|---|---|
| MKT Annualized Lambda | 0.0999 |
| HML Annualized Lambda | 0.013 |
| RMW Annualized Lambda | 0.0151 |
| UMD Annualized Lambda | 0.1236 |


### 2.1.
(8pts)

Calculate the model-implied expected excess returns of every asset.

The time-series estimations should...
* NOT include an intercept. (You assume the model holds perfectly.)
* use data from the `training` timeframe.

With the time-series estimates, use the `training` timeframe's sample average of the factors as the factor premia. Together, this will give you the model-implied risk premia, which we label as
$$
\lambda_i := \mathbb{E}[\tilde{r}_i]
$$

* Store $\lambda_i$ and $\boldsymbol{\beta}^i$ for each asset.
* Print $\lambda_i$ for `Agric`, `Food`, `Soda`
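
The two steps above (no-intercept time-series regressions, then betas times factor sample means) can be sketched with plain NumPy. This is an illustration on synthetic data: the array names, the simulated returns, and the dimensions are all assumptions made for the example, and it does not reproduce the `pmh` helper used in the solution.

```python
import numpy as np

# Synthetic stand-ins for the training data (hypothetical numbers)
rng = np.random.default_rng(0)
T, N, K = 60, 3, 4                                   # months, assets, factors
factors = rng.normal(0.005, 0.04, size=(T, K))       # monthly factor excess returns
true_betas = rng.normal(0.5, 0.3, size=(N, K))
assets = factors @ true_betas.T + rng.normal(0.0, 0.02, size=(T, N))

# Time-series regressions WITHOUT an intercept, all assets at once:
# assets ~ factors @ betas.T, solved by least squares.
betas = np.linalg.lstsq(factors, assets, rcond=None)[0].T   # shape (N, K)

# Factor premia: annualized sample means of the factors over the same window
factor_premia = factors.mean(axis=0) * 12

# Model-implied risk premia: lambda_i = beta_i' * E[f]
model_premia = betas @ factor_premia                 # shape (N,)
print(model_premia)
```

Because the regression has no intercept, each asset's model-implied premium equals the annualized mean of its fitted returns.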

In [4]:
betas = pmh.calc_iterative_regression(sect_train, aqr_train, annual_factor=12, intercept=False, keep_columns=['Beta'])
betas.columns = [c.split(' ')[0] for c in betas.columns]
factors_mean_xs = aqr_train.mean() * 12
betas.index = [c.split(' ')[0] for c in betas.index]

pred_xs_rets = (betas @ factors_mean_xs).to_frame('CS Predicted Excess Returns')
pred_xs_rets.iloc[0:3].style.format('{:.2%}')

"calc_regression" assumes excess returns to calculate Information and Treynor Ratios


| | CS Predicted Excess Returns |
|---|---|
| Agric | 4.39% |
| Food | 6.55% |
| Soda | 8.80% |


### 2.2.

Use the expected excess returns derived from (2.1) with the **regularized** covariance matrix to calculate the weights of the tangency portfolio.

- Use the covariance matrix only for `training` timeframe.
- Calculate and store the vector of weights for all the assets.
- Return the weights of the tangency portfolio for `Agric`, `Food`, `Soda`.

$$
\boldsymbol{w}_{t} = \dfrac{\tilde{\Sigma}^{-1} \boldsymbol{\lambda}}{\boldsymbol{1}' \tilde{\Sigma}^{-1} \boldsymbol{\lambda}}
$$

where $\tilde{\Sigma}$ is the regularized covariance matrix and $\tilde{\Sigma}^{-1}$ is its inverse.
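
The formula can be sketched directly in NumPy. The helper name `tangency_weights` and the 2-asset inputs below are hypothetical, chosen only to keep the example small; the covariance matrix passed in is assumed to be already regularized.

```python
import numpy as np

def tangency_weights(cov, premia):
    """Tangency weights w = inv(cov) @ premia, rescaled to sum to one.

    Hypothetical helper for illustration. Uses a linear solve instead of an
    explicit matrix inverse, which is numerically more stable.
    """
    cov = np.asarray(cov, dtype=float)
    premia = np.asarray(premia, dtype=float)
    raw = np.linalg.solve(cov, premia)   # inv(cov) @ premia
    return raw / raw.sum()               # divide by 1' inv(cov) premia

# Made-up, already-regularized covariance matrix and risk premia
cov = np.array([[0.04, 0.005],
                [0.005, 0.09]])
premia = np.array([0.06, 0.08])
w = tangency_weights(cov, premia)
print(w)
```

Note that the normalization breaks down if `raw.sum()` is close to zero, and it flips the sign of every weight if that sum is negative, so the denominator is worth checking in practice.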

In [5]:
# With the below function, you are estimating the covariance matrix using the predicted excess returns.
# However, you need to use the actual excess returns to estimate the covariance matrix.
# Then pass the predicted mean excess returns estimated above.
pred_rets = aqr_train.apply(lambda x: betas @ x, axis=1)
reg_tan_wts = pmh.calc_tangency_weights(pred_rets, cov_mat=0.5)
reg_tan_wts.iloc[:3].style.format('{:.2%}')

| | Tangency Regularized 0.50 Weights |
|---|---|
| Agric | -3.05% |
| Food | 6.35% |
| Soda | 10.44% |


In [6]:
# Corrected Solution:
cov = sect_train.cov()
SIGMA = (cov + cov * np.eye(cov.shape[0])) / 2
lmbda = pred_xs_rets
tan_wts = (np.linalg.inv(SIGMA) @ lmbda) / (np.ones(SIGMA.shape[0]).T @ np.linalg.inv(SIGMA) @ lmbda)
tan_wts.index = cov.index
tan_wts.iloc[:3].style.format('{:.2%}')

| | CS Predicted Excess Returns |
|---|---|
| Agric | -3.07% |
| Food | 1.53% |
| Soda | 13.29% |


### 2.3.

Evaluate the performance of this allocation in the `testing` period. Report the **annualized**
- mean
- vol
- Sharpe
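
As a rough sketch of what these statistics involve, the snippet below applies fixed weights to out-of-sample returns and annualizes monthly figures. The function `annualized_stats`, the weights, and the four months of returns are made up for illustration; the sample standard deviation (`ddof=1`) is used, which is the pandas default.

```python
import numpy as np

def annualized_stats(returns, periods_per_year=12):
    """Annualized mean, vol, and Sharpe of a series of periodic excess returns.

    Hypothetical helper for illustration only.
    """
    r = np.asarray(returns, dtype=float)
    mean = r.mean() * periods_per_year                 # mean scales with time
    vol = r.std(ddof=1) * np.sqrt(periods_per_year)    # vol scales with sqrt(time)
    return {"mean": mean, "vol": vol, "sharpe": mean / vol}

# Made-up monthly excess returns for two assets over four test months
test_returns = np.array([
    [0.02, -0.01],
    [0.01, 0.03],
    [-0.02, 0.01],
    [0.03, 0.00],
])
weights = np.array([0.6, 0.4])       # weights fixed from the training period

port = test_returns @ weights        # out-of-sample portfolio returns
stats = annualized_stats(port)
print(stats)
```

The weights must cover every asset in the portfolio; applying only a subset of the weights to a subset of the assets evaluates a different portfolio.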

In [7]:
sect_test.columns = [c.strip() for c in sect_test.columns]
pmh.calc_summary_statistics((sect_test.iloc[:, :3] @ reg_tan_wts.iloc[:3]), annual_factor=12, provided_excess_returns=True,
                            keep_columns=['Annualized Mean', 'Annualized Vol', 'Annualized Sharpe']).T.style.format('{:.2%}')

| | Tangency Regularized 0.50 Weights |
|---|---|
| Annualized Mean | 0.10% |
| Annualized Vol | 1.70% |
| Annualized Sharpe | 6.12% |


In [8]:
# Corrected Solution - Did not need to filter to the specific sectors here...
tan_wts.index = [c.strip() for c in tan_wts.index]
pmh.calc_summary_statistics((sect_test @ tan_wts), annual_factor=12, provided_excess_returns=True,
                            keep_columns=['Annualized Mean', 'Annualized Vol', 'Annualized Sharpe']).iloc[:3].T.style.format('{:.2%}')

| | CS Predicted Excess Returns |
|---|---|
| Annualized Mean | 18.12% |
| Annualized Vol | 11.95% |
| Annualized Sharpe | 151.55% |


### 2.4.

(7pts)

Construct the same tangency portfolio as in `2.2` but with one change:
* replace the risk premia of the assets, (denoted $\lambda_i$) with the sample averages of the excess returns from the `training` set.

So instead of using $\lambda_i$ suggested by the factor model (as in `2.1-2.3`) you're using sample averages for $\lambda_i$.

- Return the weights of the tangency portfolio for `Agric`, `Food`, `Soda`.

Evaluate the performance of this allocation in the `testing` period. Report the **annualized**
- mean
- vol
- Sharpe

In [9]:
new_reg_tan_wts = pmh.calc_tangency_weights(sect_train, cov_mat=0.5)
new_reg_tan_wts.iloc[:3].style.format('{:.2%}')

| | Tangency Regularized 0.50 Weights |
|---|---|
| Agric | 14.41% |
| Food | -6.98% |
| Soda | 32.27% |


In [10]:
new_reg_tan_wts.index = [c.strip() for c in new_reg_tan_wts.index]
pmh.calc_summary_statistics((sect_test.iloc[:, :3] @ new_reg_tan_wts.iloc[:3]), annual_factor=12, provided_excess_returns=True,
                            keep_columns=['Annualized Mean', 'Annualized Vol', 'Annualized Sharpe']).T.style.format('{:.2%}')

| | Tangency Regularized 0.50 Weights |
|---|---|
| Annualized Mean | 2.20% |
| Annualized Vol | 5.28% |
| Annualized Sharpe | 41.72% |


In [11]:
# Corrected - Once again, did not need to filter to the specific sectors here...
new_reg_tan_wts.index = [c.strip() for c in new_reg_tan_wts.index]
pmh.calc_summary_statistics((sect_test @ new_reg_tan_wts), annual_factor=12, provided_excess_returns=True,
                            keep_columns=['Annualized Mean', 'Annualized Vol', 'Annualized Sharpe']).T.style.format('{:.2%}')

| | Tangency Regularized 0.50 Weights |
|---|---|
| Annualized Mean | 17.68% |
| Annualized Vol | 15.30% |
| Annualized Sharpe | 115.55% |


### 2.5.

Which allocation performed better in the `testing` period: the allocation based on premia from the factor model or from the sample averages?

Why might this be?

<span style="color:red;">

The allocation built using the historical excess returns performed better in the testing period. This may be because the testing period is using historical returns. Consequently, our tangency portfolio that is constructed using predicted returns won't be as good of a representation of the time series of returns. This could be evidenced by the MAE of the time-series regression, which likely suggests that the predictive power of this model is not as good as we suspect.

</span>

<span style="color:blue;">

Corrected: The allocation using the AQR predicted excess returns performed better in the testing period.

Generally, we can state that historical returns are a bad proxy for expected future returns.

On the other hand, historical factor returns are generally a better proxy for expected future factor returns. Expected returns (expected premia) of factor models are more stable, leading to better prediction of stock returns, conditional on the stability of the relationship between the factors and the assets ($\beta_i$). Focusing on factor estimation allows us to estimate the systematic risk component of assets, which is considered the only risk that earns a premium.

The fact that the expected return model outperformed the historical return model does not imply that the factor model is a perfect pricing model.

**Extra**
- Numerous studies have demonstrated that portfolios constructed using factor-based expected returns outperform those using historical returns, particularly out-of-sample.
- Expected returns from factor models tend to be less extreme than historical averages of asset returns. The resulting asset-level expected returns are therefore also less extreme, which makes the mean-variance optimization less unstable.

</span>

### 2.6.
Suppose you now want to build a tangency portfolio solely from the factors, without using the sector ETFs.

- Calculate the weights of the tangency portfolio using `training` data for the factors.
- Again, regularize the covariance matrix of factor returns by dividing off-diagonal elements by 2.

Report, in the `testing` period, the factor-based tangency stats **annualized**...
- mean
- vol
- Sharpe


In [12]:
factor_reg_tan_wts = pmh.calc_tangency_weights(aqr_train, cov_mat=0.5)
factor_reg_tan_wts.style.format('{:.2%}')

| | Tangency Regularized 0.50 Weights |
|---|---|
| MKT | 17.69% |
| HML | -1.62% |
| RMW | 59.84% |
| UMD | 24.09% |


In [13]:
pmh.calc_summary_statistics((aqr_test @ factor_reg_tan_wts), annual_factor=12, provided_excess_returns=True,
                            keep_columns=['Annualized Mean', 'Annualized Vol', 'Annualized Sharpe']).T.style.format('{:.2%}')

| | Tangency Regularized 0.50 Weights |
|---|---|
| Annualized Mean | 6.24% |
| Annualized Vol | 5.82% |
| Annualized Sharpe | 107.19% |


### 2.7.

Based on the hedge fund's beliefs, would you prefer to use the ETF-based tangency or the factor-based tangency portfolio? Explain your reasoning. Note that you should answer based on broad principles and not on the particular estimation results.

<span style="color:red;">

I would prefer to use the factor-based tangency portfolio because it provides better risk-adjusted (and absolute mean-excess) returns. However, I state this assuming that the factor portfolios are constructed from the securities that underlie the ETFs. Consequently, either approach results in investing in the same fundamental securities (i.e., the same set of company stocks). Therefore, using the factor portfolios seems to provide access to a better weighting scheme than the MVO portfolio constructed using the ETFs.

However, if the factor portfolios do not represent the same universe of assets, then I would prefer to use the ETF-based tangency portfolio because it is a better representation of the value I am endeavoring to provide to my investors. 

</span>

***

# 3. Long-Run Returns

For this question, use only the sheet `factors excess returns`.

Suppose we want to measure the long run returns of various pricing factors.

### 3.1.

Turn the data into log returns.
- Display the first 5 rows of the data.

Using these log returns, report the **annualized**
* mean
* vol
* Sharpe

### 3.2.

Consider 15-year cumulative log excess returns. Following the assumptions and modeling of Lecture 6, report the following 15-year stats:
- mean
- vol
- Sharpe

How do they compare to the estimated stats (1-year horizon) in `3.1`? 

In [14]:
# Answering 3.1 - Corrected
aqr_log_rets = np.log(1+aqr_xs_rets)
display(aqr_log_rets.head())
display(pmh.calc_summary_statistics(aqr_log_rets, annual_factor=12, provided_excess_returns=True,
                            keep_columns=['Annualized Mean', 'Annualized Vol', 'Annualized Sharpe']).T.style.format('{:.2%}'))

| date | MKT | HML | RMW | UMD |
|------|-----|-----|-----|-----|
| 1980-01-01 | 0.0536 | 0.0173 | -0.0171 | 0.0728 |
| 1980-02-01 | -0.0123 | 0.0061 | 0.0004 | 0.0758 |
| 1980-03-01 | -0.1381 | -0.0102 | 0.0145 | -0.1004 |
| 1980-04-01 | 0.0389 | 0.0105 | -0.0212 | -0.0043 |
| 1980-05-01 | 0.0513 | 0.0038 | 0.0034 | -0.0113 |


| | MKT | HML | RMW | UMD |
|---|-----|-----|-----|-----|
| Annualized Mean | 7.35% | 1.98% | 4.35% | 5.01% |
| Annualized Vol | 15.88% | 10.98% | 8.36% | 16.04% |
| Annualized Sharpe | 46.30% | 18.01% | 52.10% | 31.22% |


In [15]:
# Answering 3.2 - Corrected
# long_run_rets = aqr_log_rets.rolling(window=12*15).sum()
pmh.calc_summary_statistics(aqr_log_rets, annual_factor=12*15, provided_excess_returns=True,
                            keep_columns=['Annualized Mean', 'Annualized Vol', 'Annualized Sharpe']).T.style.format('{:.2%}')

| | MKT | HML | RMW | UMD |
|---|-----|-----|-----|-----|
| Annualized Mean | 110.32% | 29.65% | 65.31% | 75.14% |
| Annualized Vol | 61.52% | 42.52% | 32.37% | 62.14% |
| Annualized Sharpe | 179.33% | 69.75% | 201.77% | 120.93% |


<span style="color:red;">

The stats for the 15-year horizon are substantially better than those for the 1-year horizon. This is a classic example of the benefits of time-diversification that were discussed in the Barnstable case. This result shows that mean excess returns scale linearly while the volatility of excess returns scales only sub-linearly. Consequently, the risk-adjusted returns are substantially better.

</span>
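
The horizon scaling can be checked by hand. The sketch below assumes i.i.d. log returns, takes the rounded annualized MKT figures from 3.1 (7.35% mean, 15.88% vol) as inputs, and uses a hypothetical helper `horizon_stats`; because the inputs are rounded, the results only approximately match the 15-year table above.

```python
import numpy as np

def horizon_stats(mu_annual, sigma_annual, years):
    """Scale annualized log-return stats to a multi-year cumulative horizon.

    Assumes i.i.d. log returns: the mean grows with the horizon, the
    volatility with its square root. Hypothetical helper for illustration.
    """
    mean_h = mu_annual * years
    vol_h = sigma_annual * np.sqrt(years)
    return mean_h, vol_h, mean_h / vol_h

# Rounded annualized MKT stats from the 3.1 table
mean_15, vol_15, sharpe_15 = horizon_stats(0.0735, 0.1588, 15)
print(f"mean={mean_15:.2%}, vol={vol_15:.2%}, sharpe={sharpe_15:.2%}")
```

The 15-year Sharpe ratio is the 1-year Sharpe ratio multiplied by $\sqrt{15}$, which is the sub-linear volatility scaling at work.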

### 3.3.

What is the probability that momentum factor has a negative mean excess return over the next 
* single period?
* 15 years?
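
The calculation for this question can be motivated as follows, assuming (as a modeling choice in the spirit of the Lecture 6 setup cited in 3.2) that log returns are i.i.d. normal with per-period mean $\mu$ and volatility $\sigma$. The average log return over $h$ periods, $\bar{r}_h$, then satisfies

$$
\bar{r}_h \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{h}\right) \quad \Longrightarrow \quad \Pr\left[\bar{r}_h < c\right] = \Phi\left(\sqrt{h}\,\frac{c - \mu}{\sigma}\right)
$$

where $\Phi$ is the standard normal CDF. Setting $c = 0$ gives the probability of a negative mean excess return. With monthly data, $h = 1$ is a single month and $h = 180$ is 15 years.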

In [16]:
def prob_under(mu, sigma, c, h):
    return norm.cdf(((c-mu)/sigma) * np.sqrt(h))

mu = aqr_log_rets.loc[:, 'UMD'].mean()
sigma = aqr_log_rets.loc[:, 'UMD'].std()

print(f'Single Period:\n\tPr(Mean Excess Rets < 0) = {prob_under(mu, sigma, c=0, h=1):.2%}')
print(f'15-Year:\n\tPr(Mean Excess Rets < 0) = {prob_under(mu, sigma, c=0, h=12*15):.2%}')

Single Period:
	Pr(Mean Excess Rets < 0) = 46.41%
15-Year:
	Pr(Mean Excess Rets < 0) = 11.33%


### 3.4.

Recall from the case that momentum has been underperforming since 2009. 

Using data from 2009 to present, what is the probability that momentum *outperforms* the market factor over the next
* period?
* 15 years?

In [17]:
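# Corrected
# UMD moments are estimated on the post-2009 sample.
mu = aqr_log_rets.loc['2009':, 'UMD'].mean()
sigma = aqr_log_rets.loc['2009':, 'UMD'].std()
# The threshold c is the full-sample MKT mean, treated here as a fixed number.
c = aqr_log_rets.loc[:, 'MKT'].mean()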

print(f'Single Period:\n\tPr(UMD Mean Excess Rets > MKT) = {1-prob_under(mu, sigma, c=c, h=1):.2%}')
print(f'15-Year:\n\tPr(UMD Mean Excess Rets > MKT) = {1-prob_under(mu, sigma, c=c, h=12*15):.2%}')

Single Period:
	Pr(UMD Mean Excess Rets > MKT) = 43.20%
15-Year:
	Pr(UMD Mean Excess Rets > MKT) = 1.07%
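
An alternative way to frame outperformance is to work with the spread between the two factors, so that the variability of the market factor and its correlation with momentum enter the calculation. The sketch below is an illustration under assumptions, not the graded solution: the returns are simulated, the helper names are hypothetical, and the spread is assumed to be i.i.d. normal.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function (avoids a SciPy dependency)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prob_outperform(spread, h):
    """Pr(mean spread over h periods > 0), assuming i.i.d. normal spreads."""
    mu = np.mean(spread)
    sigma = np.std(spread, ddof=1)
    return 1.0 - norm_cdf(-mu / sigma * sqrt(h))

# Simulated monthly log excess returns (hypothetical numbers, not the exam data)
rng = np.random.default_rng(0)
umd = rng.normal(0.002, 0.045, size=180)
mkt = rng.normal(0.010, 0.045, size=180)

spread = umd - mkt                        # momentum minus market, period by period
p_1 = prob_outperform(spread, h=1)        # next single period
p_15y = prob_outperform(spread, h=180)    # next 15 years of monthly periods
print(f"1 period: {p_1:.2%}, 15 years: {p_15y:.2%}")
```

Whichever side of 50% the single-period probability falls on, the 15-year probability moves further in the same direction, because the standardized mean spread is multiplied by $\sqrt{h}$.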


### 3.5.
Conceptually, why is there such a discrepancy between this probability for 1 period vs. 15 years?

What assumption about the log-returns are we making when we use this technique to estimate underperformance?

<span style="color:red;">

The probability of outperformance over a single period is less than 50%. This suggests that momentum underperforms the market factor on average. When we then analyze returns over a longer time horizon, those returns cumulate. So, if the momentum factor is expected to underperform the market factor on average in a single period, then the probability that it outperforms must diminish as we look at longer-term returns. This is because we would expect momentum to cumulate at a slower pace than the market factor.

The technique assumes that log returns are i.i.d. and normally distributed, so that the mean scales with the horizon $h$ while the volatility scales with $\sqrt{h}$.

</span>

### 3.6.

Using your previous answers, explain what is meant by time diversification.

<span style="color:red;">

Time diversification refers to the idea that returns become "safer" as the holding period lengthens. This conclusion is drawn from the mathematical fact that mean returns scale linearly with time while the volatility of returns scales sub-linearly with time. Consequently, the risk-adjusted return of a longer holding-period investment is expected to be higher than that of a shorter holding-period investment.

</span>

### 3.7.

Is the probability that `HML` and `UMD` both have negative cumulative returns over the next year higher or lower than the probability that `HML` and `MKT` both have negative cumulative returns over the next year?

Answer conceptually, but specifically. (No need to calculate the specific probabilities.)

In [18]:
# h=12 monthly periods corresponds to a 1-year horizon
mu = aqr_log_rets.loc[:, 'HML'].mean()
sigma = aqr_log_rets.loc[:, 'HML'].std()
print(f'HML 1-Year:\n\tPr(Mean Excess Rets < 0) = {prob_under(mu, sigma, c=0, h=12):.2%}')

mu = aqr_log_rets.loc[:, 'UMD'].mean()
sigma = aqr_log_rets.loc[:, 'UMD'].std()
print(f'UMD 1-Year:\n\tPr(Mean Excess Rets < 0) = {prob_under(mu, sigma, c=0, h=12):.2%}')

mu = aqr_log_rets.loc[:, 'MKT'].mean()
sigma = aqr_log_rets.loc[:, 'MKT'].std()
print(f'MKT 1-Year:\n\tPr(Mean Excess Rets < 0) = {prob_under(mu, sigma, c=0, h=12):.2%}')

HML 1-Year:
	Pr(Mean Excess Rets < 0) = 42.85%
UMD 1-Year:
	Pr(Mean Excess Rets < 0) = 37.74%
MKT 1-Year:
	Pr(Mean Excess Rets < 0) = 32.17%


<span style="color:red;">

The HML factor has the highest probability of experiencing negative returns over the next year, followed by UMD and then the market. On the basis of these marginal probabilities alone, the probability that HML and UMD both have negative cumulative excess returns over the next year is higher than the probability that HML and MKT both do. Note that this comparison treats the two pairs as equally correlated; the joint probability also depends on the correlation within each pair, with a more negative correlation lowering the probability that both factors are negative together.

</span>
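
How the joint probability depends on correlation can be illustrated with a small simulation. This is a sketch under assumptions: 1-year log returns are taken to be jointly normal, the Sharpe ratios (0.18 and 0.31) are rounded from the HML and UMD figures in 3.1, and the correlations of ±0.5 are hypothetical values chosen for illustration, not estimates from the exam data. The helper `joint_prob_both_negative` is likewise made up for this example.

```python
import numpy as np

def joint_prob_both_negative(sharpe_a, sharpe_b, rho, n_draws=400_000, seed=0):
    """Monte Carlo estimate of Pr(both 1-year cumulative returns < 0).

    In standardized units, return_i < 0 is equivalent to z_i < -sharpe_i,
    where (z_a, z_b) is bivariate standard normal with correlation rho.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho],
                    [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n_draws)
    both_negative = (z[:, 0] < -sharpe_a) & (z[:, 1] < -sharpe_b)
    return both_negative.mean()

# Same marginal probabilities in both cases; only the correlation changes
p_neg_corr = joint_prob_both_negative(0.18, 0.31, rho=-0.5)
p_pos_corr = joint_prob_both_negative(0.18, 0.31, rho=0.5)
print(f"rho=-0.5: {p_neg_corr:.2%}, rho=+0.5: {p_pos_corr:.2%}")
```

Holding the marginal probabilities fixed, the pair with the lower (more negative) correlation is less likely to be negative at the same time, so comparing the two pairs requires their estimated correlations as well as their Sharpe ratios.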

***