# Factor Models and the Tangency

## Weight on the Tangency portfolio

Above we looked at the tangency portfolio weights of the securities and factors.

Are these weights "significant"? That is, are we sure that these weights are not just the 4 factors plus random noise?

It turns out that running OLS of a constant, y = 1, on X = returns (with no intercept) gives us the tangency weights. (The betas are a scaled version of the weights, but the scaling doesn't matter.)
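The proportionality can be checked numerically. The snippet below is a minimal sketch on simulated returns (the means, volatilities, and correlation are made-up illustrative values, not the course data). It compares the no-intercept OLS betas with the textbook tangency direction $\Sigma^{-1}\mu$ after rescaling both to sum to one. The two agree because $X'X/T = \Sigma + \mu\mu'$ and $X'\mathbf{1}/T = \mu$, so the betas are $(\Sigma + \mu\mu')^{-1}\mu$, which is a scalar multiple of $\Sigma^{-1}\mu$.

```python
import numpy as np

# Simulated excess returns: illustrative parameters only (assumptions).
rng = np.random.default_rng(0)
T, n = 500, 4
mu_true = np.array([0.010, 0.008, 0.005, 0.012])
cov_true = 0.04**2 * (0.3 * np.ones((n, n)) + 0.7 * np.eye(n))
R = rng.multivariate_normal(mu_true, cov_true, size=T)

# OLS of a vector of ones on the returns, with no intercept.
b, *_ = np.linalg.lstsq(R, np.ones(T), rcond=None)
w_ols = b / b.sum()            # rescale betas to sum to one

# Tangency weights from the sample mean and covariance.
mu = R.mean(axis=0)
Sigma = np.cov(R, rowvar=False)
w_tan = np.linalg.solve(Sigma, mu)
w_tan = w_tan / w_tan.sum()    # rescale to sum to one

print(np.round(w_ols, 4))
print(np.round(w_tan, 4))
```

Both printed vectors should coincide up to floating-point error, whichever degrees-of-freedom convention is used for the covariance, since that only changes the scale.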

Since this comes from regression betas, we also automatically get the t-stats and p-values. If a p-value is less than .05, then we say the regressor is statistically significant at the 5% level.
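To make the source of these t-stats concrete, here is a rough sketch of the calculation by hand on simulated data (all inputs are made up for illustration). It uses the classical OLS standard errors and, as a simplifying assumption, a large-sample normal approximation for the two-sided p-values; `statsmodels` reports p-values from a Student-t distribution by default, so its numbers will differ slightly in small samples.

```python
import math
import numpy as np

# Simulated "excess returns" (illustrative values, not the course data).
rng = np.random.default_rng(1)
T, k = 600, 3
X = rng.normal(0.01, 0.05, size=(T, k))
y = np.ones(T)                                # regress a constant on returns

# OLS betas without an intercept.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical standard errors: s^2 * (X'X)^{-1}.
resid = y - X @ b
s2 = resid @ resid / (T - k)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# t-stats and two-sided p-values (normal approximation, an assumption).
tstats = b / se
pvals = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in tstats])

for j in range(k):
    print(f"x{j}: beta={b[j]:.3f}, t={tstats[j]:.2f}, p={pvals[j]:.2%}")
```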

Let's see whether the factors are the only significant weights when included with the equities.

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
import seaborn as sns
import matplotlib.pyplot as plt

import sys
sys.path.insert(0, '../cmds')
from portfolio import *

In [2]:
facs = pd.read_excel('../data/factor_pricing_data.xlsx',sheet_name='factors (excess)').set_index('Date')
assets = pd.read_excel('../data/factor_pricing_data.xlsx',sheet_name='assets (excess)').set_index('Date')

FREQ = 12

In [3]:
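# Regress a constant (y = 1) on all excess returns; X has no intercept column.
temp = sm.add_constant(assets.join(facs))
X = temp.drop(columns=['const'])
y = temp[['const']]
mod = sm.OLS(y,X).fit()
# p-values of the betas, which are proportional to the tangency weights
pvals = mod.pvalues.to_frame()
pvals.columns = ['p-values']
pvals.T.style.format('{:.2%}')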

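| | NoDur | Durbl | Manuf | Enrgy | Chems | BusEq | Telcm | Utils | Shops | Hlth | Money | Other | MKT | SMB | HML | UMD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| p-values | 26.48% | 22.90% | 5.43% | 19.71% | 17.81% | 0.00% | 26.97% | 7.35% | 1.55% | 0.20% | 14.33% | 0.48% | 0.42% | 26.71% | 3.34% | 0.01% |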


The p-values for MKT, HML, and UMD are less than .05, so these three factors are statistically significant at the 5% level. SMB (p-value of 26.71%) is not.

However, we also see significance (p-value less than .05) in several of the test assets: BusEq, Shops, Hlth, and Other.

### Do the extra 3 factors beyond the CAPM help much?

We could see whether the tangency portfolio is improved much by using the four factors (versus just using MKT.)

In [4]:
# Same regression of a constant on returns, now using only the four factors.
temp = sm.add_constant(facs)
X = temp[['MKT','SMB','HML','UMD']]
y = temp[['const']]

mod = sm.OLS(y,X).fit()
pvals = mod.pvalues.to_frame()
pvals.columns = ['p-values']
pvals.T.style.format('{:.2%}')

| | MKT | SMB | HML | UMD |
|---|---|---|---|---|
| p-values | 0.00% | 97.40% | 2.83% | 0.06% |


It appears that the size factor, SMB, is not significant at any interesting level.

So why is the size factor used in pricing? It seems to help when we test large

But hopefully this also helps illustrate why CAPM is still the baseline for many applications.
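As a complement to the p-values, one could compare the in-sample Sharpe ratio of the tangency portfolio built from MKT alone with the one built from all four factors. The sketch below does this on simulated monthly data. The helper `max_sharpe` and the simulated means and volatilities are hypothetical, introduced only for illustration; with the real data one would pass `facs[['MKT']]` and `facs` instead. It relies on the fact that the squared Sharpe ratio of the tangency portfolio is $\mu'\Sigma^{-1}\mu$.

```python
import numpy as np

FREQ = 12  # assumes monthly data, matching FREQ in the notebook

def max_sharpe(R, freq=FREQ):
    """Annualized in-sample Sharpe ratio of the tangency portfolio
    formed from the columns of R (hypothetical helper)."""
    R = np.asarray(R, dtype=float)
    mu = R.mean(axis=0)
    Sigma = np.atleast_2d(np.cov(R, rowvar=False))
    return float(np.sqrt(freq * mu @ np.linalg.solve(Sigma, mu)))

# Simulated factor returns (illustrative values, not the course data).
rng = np.random.default_rng(2)
T = 480
means = np.array([0.006, 0.001, 0.003, 0.005])   # "MKT", "SMB", "HML", "UMD"
vols = np.array([0.045, 0.030, 0.030, 0.040])
facs_sim = rng.normal(means, vols, size=(T, 4))

sr_mkt = max_sharpe(facs_sim[:, [0]])   # MKT only
sr_all = max_sharpe(facs_sim)           # all four factors

print(f"Sharpe, MKT only:     {sr_mkt:.2f}")
print(f"Sharpe, four factors: {sr_all:.2f}")
```

In sample, adding factors can never lower this maximum Sharpe ratio, so the interesting question is how large the gain is, not whether it is positive.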