# Midterm #2

## Imports

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from arch import arch_model
from arch.univariate import GARCH, EWMAVariance 
from sklearn import linear_model
import scipy.stats as stats
from statsmodels.regression.rolling import RollingOLS
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
pd.set_option("display.precision", 4)

## Data

In [4]:
#commodities = pd.read_excel('../data/midterm_2_data_pricing.xlsx', sheet_name = 'assets (excess returns)').set_index('Date')
commodities = pd.read_excel('midterm_2_data_pricing.xlsx', sheet_name = 'assets (excess returns)').set_index('Date')
#factors = pd.read_excel('../data/midterm_2_data_pricing.xlsx', sheet_name = 'factors (excess returns)').set_index('Date')
factors = pd.read_excel('midterm_2_data_pricing.xlsx', sheet_name = 'factors (excess returns)').set_index('Date')

commodities.head()

Date,NG1,KC1,CC1,LB1,CT1,SB1,LC1,W1,S1,C1,GC1,SI1,HG1,PA1
2000-01-31,0.1389,-0.1217,-0.0543,-0.011,0.1362,-0.1185,0.0204,0.0271,0.0961,0.0717,-0.0262,-0.0274,-0.0141,0.0749
2000-02-29,0.0329,-0.1051,-0.0571,-0.0516,-0.0261,-0.1464,-0.01,-0.0404,-0.0176,-0.027,0.0345,-0.0495,-0.0718,0.4646
2000-03-31,0.0619,0.0333,0.0577,-0.0214,0.0225,0.2641,0.03,0.057,0.0836,0.093,-0.0584,-0.0102,0.0118,-0.1683
2000-04-30,0.062,-0.0856,-0.0709,-0.0822,-0.0464,-0.13,0.0242,-0.0809,-0.0394,-0.0565,-0.0179,-0.0166,-0.0171,0.0375
2000-05-31,0.3818,-0.0291,0.1222,-0.0133,0.1194,0.4582,-0.0923,0.1292,-0.0221,0.0006,-0.0159,-0.0088,0.0216,-0.0674


In [5]:
factors.head()

Date,MKT,CL1
2000-01-31,-0.0474,0.0756
2000-02-29,0.0245,0.0966
2000-03-31,0.052,-0.1207
2000-04-30,-0.064,-0.0477
2000-05-31,-0.0442,0.122


## 1. Short Answer

**1.1:** 
False. If the factors work perfectly for pricing, then together they span the tangency portfolio. This means that some portfolio of the factors has the maximum Sharpe ratio; it says nothing about their individual Sharpe ratios.
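As a rough illustration of the point above, the sketch below compares the Sharpe ratio of each factor with that of the tangency portfolio built from both. All inputs (`mu`, `vol`, `rho`) are hypothetical numbers chosen for illustration, not estimates from the exam data.

```python
import numpy as np

# Hypothetical annualized inputs for two factors (illustrative only).
mu = np.array([0.06, 0.03])    # mean excess returns
vol = np.array([0.15, 0.10])   # volatilities
rho = -0.5                     # assumed correlation between the two factors

cov = np.array([
    [vol[0] ** 2, rho * vol[0] * vol[1]],
    [rho * vol[0] * vol[1], vol[1] ** 2],
])

# Sharpe ratio of each factor held on its own.
sharpe_individual = mu / vol

# Tangency weights are proportional to inverse(cov) @ mu, rescaled to sum to one.
w = np.linalg.solve(cov, mu)
w = w / w.sum()

# Sharpe ratio of the factor-tangency portfolio.
sharpe_tangency = (w @ mu) / np.sqrt(w @ cov @ w)

print("Individual Sharpe ratios:", sharpe_individual.round(4))
print("Tangency Sharpe ratio:   ", round(float(sharpe_tangency), 4))
```

With these assumed inputs the combined portfolio has a higher Sharpe ratio than either factor alone, which is why individual Sharpe ratios are not informative about whether the factors span the tangency portfolio.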

**1.2:**
It depends on the interpretation. True if "investment" beta refers to beta on the Fama-French investment factor, which is conservative minus aggressive investment, so a high beta actually means a high loading on low investment.

False if interpreted as beta to high (strong) investment. Fama-French find that correlation to conservative (low) investment is what boosts mean returns.

**1.3:**
Even if the factors' individual Sharpe ratios are zero, the factors may have substantial marginal impact on a portfolio that holds just the market factor. So looking at their univariate statistics is not enough.

Still, it is reasonable to point out that weak mean returns suggest the factors may not be as important as DFA thought at the time of the case.
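A two-asset case makes the marginal-impact argument concrete. The symbols here are introduced only for illustration: $s_m$ is the market's Sharpe ratio, $s_f$ is the extra factor's Sharpe ratio, and $\rho$ is their correlation. The maximum squared Sharpe ratio attainable from the pair is

$$SR_{\tan}^{2} = \frac{s_m^{2} + s_f^{2} - 2\rho\, s_m s_f}{1-\rho^{2}}$$

Setting $s_f = 0$ gives $SR_{\tan}^{2} = s_m^{2}/(1-\rho^{2})$, which exceeds $s_m^{2}$ whenever $\rho \neq 0$. A factor with zero Sharpe ratio can therefore still raise the portfolio's Sharpe ratio through its correlation with the market.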


**1.4:**
Nothing. Fama-French makes no prediction about how an asset's characteristics affect mean returns; it predicts only that the asset's betas affect mean returns. Thus, if the asset has zero correlation (and thus zero beta) to the factors, its expected excess return in the model is zero, and its characteristics are otherwise irrelevant to the model.


**1.5:**
The construction of the momentum portfolio does two things:
* It diversifies idiosyncratic risk by going long many winners and short many losers. The Fama-French construction does not trade only the most extreme names; it uses the top and bottom 30%.

* It avoids too much turnover by ranking winners and losers on a rolling 12-month return rather than a single period. This smooths the ranking and reduces turnover.
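The two construction choices above can be sketched on simulated data. This is a simplified illustration, not the exact Fama-French procedure (which also sorts on size and skips the most recent month); the stock names, return parameters, and the 0.7 / 0.3 rank cutoffs as implemented here are assumptions for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical monthly returns for 20 stocks over 36 months (simulated).
returns = pd.DataFrame(
    rng.normal(0.01, 0.05, size=(36, 20)),
    columns=[f"stock_{i}" for i in range(20)],
)

# Trailing 12-month cumulative log return, lagged one month so that
# the ranking in month t uses only information through month t-1.
signal = np.log1p(returns).rolling(12).sum().shift(1)

# Cross-sectional percentile rank of each stock within each month.
rank = signal.rank(axis=1, pct=True)

# Equal-weighted long leg (top 30%) and short leg (bottom 30%).
long_leg = returns.where(rank > 0.7).mean(axis=1)
short_leg = returns.where(rank <= 0.3).mean(axis=1)

# Long-short momentum return; months without a full 12-month history drop out.
momentum = (long_leg - short_leg).dropna()

print(momentum.head())
```

Because each leg averages several stocks and the signal is a 12-month sum, both the idiosyncratic noise and the month-to-month changes in the ranking are dampened.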

**1.6:**
A long-only momentum fund is extremely correlated to the market equity factor, as seen in HW#6, section 2. (Note that you did not need to do Section 1 to answer this.) This greatly reduces the attractiveness of momentum relative to its long-short construction, which has higher Sharpe and near zero correlation to the market equity factor.

**1.7:**
* We know NOTHING about their time-series regression fits as seen in R-squared.

* All their Treynor ratios should be identical, and equal to the market premium.

* All their information ratios should be zero.
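These three statements follow from the time-series regression on the market factor, assuming the premise of the question is that the market factor prices the assets exactly (so every $\alpha_i$ is zero). The notation below is introduced here for illustration:

$$\tilde{r}_{i,t} = \alpha_i + \beta_i\, \tilde{r}_{m,t} + \epsilon_{i,t}, \qquad \alpha_i = 0 \;\Rightarrow\; E[\tilde{r}_i] = \beta_i\, E[\tilde{r}_m]$$

$$\text{Treynor}_i = \frac{E[\tilde{r}_i]}{\beta_i} = E[\tilde{r}_m], \qquad \text{IR}_i = \frac{\alpha_i}{\sigma(\epsilon_i)} = 0, \qquad R_i^{2} = 1 - \frac{\sigma^{2}(\epsilon_i)}{\sigma^{2}(\tilde{r}_i)}$$

The restriction $\alpha_i = 0$ pins down the Treynor and information ratios, but it places no restriction on $\sigma(\epsilon_i)$, so the $R^{2}$ of each regression can be anything.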

**1.8:**
* The Central Limit Theorem supported this. One could note that the assumptions were extreme, but the overall results support this. (Barnstable made the more tenuous bet that it would outperform a constant 6% rate, but we asked about the risk-free rate.)

* This is extremely likely given the mathematics of how means versus volatilities compound. Though the log iid assumption is strong, we saw much evidence that Sharpe ratios grow nearly with the square-root of the horizon, which would make the 100-year Sharpe about 10x the 1-year Sharpe.

* This is definitely false. The volatility of the cumulative return grows with the horizon. The Central Limit Theorem gives the result that the volatility of the *average* return shrinks with the horizon (see the first bullet point), whereas the volatility of the *cumulative* return grows with the horizon.
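The scaling behind these three answers can be sketched numerically under the log-iid assumption. The annual mean and volatility below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical annual mean and volatility of log excess returns (assumed iid).
mu, sigma = 0.06, 0.16
horizons = np.array([1, 25, 100])  # horizons in years

# Cumulative log return over h years: mean grows with h, volatility with sqrt(h).
mean_cum = mu * horizons
vol_cum = sigma * np.sqrt(horizons)

# Average annual log return over h years: volatility shrinks with sqrt(h).
vol_avg = sigma / np.sqrt(horizons)

# Sharpe ratio of the cumulative return grows with sqrt(h).
sharpe = mean_cum / vol_cum

for h, vc, va, s in zip(horizons, vol_cum, vol_avg, sharpe):
    print(f"h={h:>3}: vol(cumulative)={vc:.3f}, vol(average)={va:.3f}, Sharpe={s:.3f}")
```

Under these assumptions the 100-year Sharpe ratio is exactly ten times the 1-year Sharpe ratio, while the cumulative volatility rises and the volatility of the average falls.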

## 2 Pricing Model: Time-Series Test

In [6]:
ts_test = pd.DataFrame(data = None, index = commodities.columns, columns = [r'$\alpha$', 'MKT', 'CL1'])

for asset in ts_test.index:
    y = commodities[asset]
    X = sm.add_constant(factors[['MKT','CL1']])

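    # Fitted parameters are labeled 'const', 'MKT', 'CL1'; index by label, not position
    reg = sm.OLS(y, X).fit().params
    # Annualize the monthly alpha; betas need no scaling
    ts_test.loc[asset] = [reg['const'] * 12, reg['MKT'], reg['CL1']]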
    
ts_test

,$\alpha$,MKT,CL1
NG1,0.1195,-0.0377,0.2502
KC1,0.0203,0.2992,0.0321
CC1,0.0632,0.1139,0.1243
LB1,0.0555,0.7791,0.1874
CT1,0.013,0.5291,0.0629
SB1,0.0696,0.0579,0.1628
LC1,0.0163,0.1068,0.0529
W1,0.0558,0.2912,-0.0026
S1,0.0421,0.3533,0.0386
C1,0.0609,0.2551,0.0652


**2.1:** (5pts) For the asset NG1, report the alpha and betas of the regression.

In [7]:
ts_test.loc['NG1'].to_frame()

,NG1
$\alpha$,0.1195
MKT,-0.0377
CL1,0.2502


**2.2:** (5pts) Report the two factor premia implied by the time-series test. Annualize them.

In [8]:
(factors.mean() * 12).to_frame('Factor Premia')

,Factor Premia
MKT,0.0707
CL1,0.1087


**2.3:** (5pts) Report the Mean Absolute Pricing Error (MAE) of the model. Annualize it.

In [9]:
print('MAE: ' + str(round(ts_test[r'$\alpha$'].abs().mean(), 4)))

MAE: 0.0549


**2.4:** (5pts) Report the largest predicted premium from the model, and note which asset it is.

In [10]:
(factors.mean() * 12 * ts_test[['MKT','CL1']]).sum(axis = 1).to_frame('Predicted Premium').nlargest(1, 'Predicted Premium')

,Predicted Premium
LB1,0.0754


LB1 is the lumber future. 

## 3 Pricing Model: Cross-Sectional Test

In [11]:
y = commodities.mean()
X = sm.add_constant(ts_test[['MKT','CL1']].astype(float))

cross_sect = sm.OLS(y, X).fit()

**3.1:** (5pts) For the cross-sectional regression, report the:
- $R^{2}$
- Intercept. Annualize this number.

In [12]:
print('R-squared: ' + str(round(cross_sect.rsquared, 4)))

R-squared: 0.6313


In [13]:
print('Alpha: ' + str(round(cross_sect.params['const'] * 12, 4)))

Alpha: 0.0456


**3.2:** (4pts) Are either, neither, or both of these estimated metrics evidence against the model?


Both of these estimated metrics are evidence against the model. If all asset premia were explained by these two factors, $R^{2}$ would equal 1 and $\alpha$ (the intercept) would equal zero.
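Written out, the cross-sectional regression estimated above has the following form. The symbols $\eta$ (intercept), $\lambda$ (factor premia), and $\upsilon_i$ (residual pricing error) are notation introduced here for illustration; $\bar{r}_i$ is the sample mean excess return of asset $i$ and the $\hat{\beta}_i$ are the time-series betas from Section 2.

$$\bar{r}_i = \eta + \lambda_{MKT}\, \hat{\beta}_i^{MKT} + \lambda_{CL1}\, \hat{\beta}_i^{CL1} + \upsilon_i$$

If the model held exactly, betas alone would explain mean returns, so $\eta = 0$ and $\upsilon_i = 0$ for every asset, which implies $R^{2} = 1$. The estimates reported above (annualized intercept of 0.0456 and $R^{2}$ of 0.6313) depart from both conditions.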

**3.3:** (4pts) Report the estimated factor premia. (i.e. the two cross-sectional regression slopes). Annualize this number.

In [14]:
(cross_sect.params[1:] * 12).to_frame('Estimated Factor Premia')

,Estimated Factor Premia
MKT,0.0186
CL1,0.3319


**3.4:** (4pts) Report the Mean Absolute Pricing Error (MAE) of the model. Annualize it.

In [15]:
predicted = cross_sect.params['const'] + (ts_test[['MKT','CL1']] * cross_sect.params[1:]).sum(axis=1)
MAE_cs = (commodities.mean() - predicted).abs().mean() * 12

print('MAE: ' + str(round(MAE_cs, 4)))

MAE: 0.0169


In [16]:
MAE_cs = cross_sect.resid.abs().mean() * 12

print('MAE: ' + str(round(MAE_cs, 4)))

MAE: 0.0169


### OR (both 'including the intercept' and 'excluding the intercept' will get full credit on 3.4 and 3.5)

In [17]:
predicted_without_intercept = (ts_test[['MKT','CL1']] * cross_sect.params[1:]).sum(axis=1)
MAE_cs_without_intercept = (commodities.mean() - predicted_without_intercept).abs().mean() * 12

print('MAE: ' + str(round(MAE_cs_without_intercept, 4)))

MAE: 0.0456


**3.5:** (4pts) Report the largest predicted premium from the model, and note which asset it is.

In [159]:
(predicted * 12).nlargest(1).to_frame('Predicted Premium')

,Predicted Premium
NG1,0.1279


NG1 is the natural gas future.

In [18]:
# If not including the intercept in the model:

(predicted_without_intercept * 12).nlargest(1).to_frame('Predicted Premium')

,Predicted Premium
NG1,0.0823


## 4 Pricing Model: Conceptual Questions

**4.1:** (5pts) Which is more useful in assessing the model’s fit for pricing: the r-squared of the time-series regressions, the r-squared of the cross-sectional regression, or neither?

The r-squared of the cross-sectional regression. The r-squared of the time-series regressions measures how well the factors explain variation in returns over time, which does not matter for pricing. In the cross-sectional regression, an r-squared below one tells us that the pricing model does not completely explain the premia that exist across assets.

**4.2:** (5pts) We calculated the MAE from the time-series estimation and from the cross-sectional (with intercept) estimation. Is one always bigger than the other? Why or why not?

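- If we use an intercept in the cross-section, then the cross-sectional fit has to be at least as good as the time-series fit.
- The time-series test fixes the intercept at zero and the factor premia at the factors' sample means, while the cross-sectional regression gets to adjust both, which can only help. So the MAE from the time-series will be the bigger one. (Strictly, OLS guarantees this ordering for squared errors; for absolute errors it holds in practice rather than by construction.)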
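The comparison can be checked on simulated data. The sketch below is only an illustration: the sample sizes, return parameters, and variable names are all assumptions, and it uses plain NumPy least squares rather than the `statsmodels` calls in the notebook. It compares the sum of squared pricing errors, which is the quantity OLS minimizes, and also computes both MAEs.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 240, 14, 2  # hypothetical: months, assets, factors

# Simulated factor and asset excess returns.
factors = rng.normal(0.005, 0.04, size=(T, K))
betas_true = rng.normal(0.3, 0.3, size=(N, K))
alphas_true = rng.normal(0.0, 0.002, size=N)
assets = alphas_true + factors @ betas_true.T + rng.normal(0, 0.05, size=(T, N))

# Time-series regressions (one per asset, all solved at once).
X = np.column_stack([np.ones(T), factors])
coef = np.linalg.lstsq(X, assets, rcond=None)[0]  # shape (K + 1, N)
alpha_ts = coef[0]     # time-series pricing errors
betas = coef[1:].T     # estimated betas, shape (N, K)

# Cross-sectional regression of mean returns on betas, with intercept.
mean_ret = assets.mean(axis=0)
Z = np.column_stack([np.ones(N), betas])
cs_coef = np.linalg.lstsq(Z, mean_ret, rcond=None)[0]
resid_cs = mean_ret - Z @ cs_coef  # cross-sectional pricing errors

sse_ts, sse_cs = (alpha_ts ** 2).sum(), (resid_cs ** 2).sum()
mae_ts, mae_cs = np.abs(alpha_ts).mean(), np.abs(resid_cs).mean()

print(f"SSE time-series: {sse_ts:.6f}, SSE cross-section: {sse_cs:.6f}")
print(f"MAE time-series: {mae_ts:.6f}, MAE cross-section: {mae_cs:.6f}")
```

The time-series alphas equal the cross-sectional residuals one would get by forcing the intercept to zero and the premia to the factor sample means, so the unrestricted cross-sectional regression cannot have a larger sum of squared errors.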

**4.3:** (5pts) If we add another factor, will the time-series MAE decrease? And how about the cross-sectional MAE? Explain.

- It is unclear whether MAE will increase or decrease in the time series test, as the intercept may increase or decrease when we add a factor.
- In the cross-sectional regression the fit must improve, as when we add a factor $R^{2}$ must increase.

**4.4:** Suppose we built a tangency portfolio using only the factors.

(a) (2pts) Compute tangency weights using just the two factors as the asset space. Does CL1 have much weight in this factor-tangency portfolio?

In [142]:
mu = factors.mean()
sigma = factors.cov()
w_tan_unscaled = np.linalg.inv(sigma) @ mu

wtan = pd.DataFrame(w_tan_unscaled / w_tan_unscaled.sum(), 
                    index = ['MKT','CL1'], 
                    columns = ['Tangency Weights'])

wtan

,Tangency Weights
MKT,0.8811
CL1,0.1189


CL1 does not have much weight in this factor-tangency portfolio.

(b)  (3pts) Conceptually, does this seem like evidence that CL1 is a useful pricing factor? Why?

It is a much less useful pricing factor than the market. If factors price well, we expect them to have large weights in the tangency portfolio as that is where we would seek to extract premium. 