## Factor Analysis using the CAPM and Fama-French Factor Models

The main idea in Factor Analysis is to take a set of observed returns and decompose it into a set of explanatory returns.

In [1]:
"""
Asset Management(Ang 2014, Oxford University Press) Chapter 10
Dataset: Returns of Berkshire Hathaway
"""
import pandas as pd
brka_d = pd.read_csv("data/brka_d_ret.csv", parse_dates=True, index_col=0)
brka_d.head()

DATE,BRKA
1990-01-01,-0.140634
1990-02-01,-0.030852
1990-03-01,-0.069204
1990-04-01,-0.003717
1990-05-01,0.067164


The goal is to decompose the returns of Berkshire Hathaway into a set of explanatory returns and determine what is actually driving them.

In [2]:
brka_d.tail()

DATE,BRKA
2018-08-01,0.047256
2018-09-01,0.0133
2018-10-01,-0.038422
2018-11-01,0.059456
2018-12-01,-0.06135


The data are daily returns. The explanatory factor variables come from the Fama-French dataset, which I'll use at monthly frequency, so to keep things simple I'll convert the daily returns to monthly returns. The daily returns are compounded within each month to get the monthly returns.

In [3]:
import risk_kit as kit
%load_ext autoreload
%autoreload 2

In [4]:
# Resample daily to monthly returns
brka_m = brka_d.resample('M').apply(kit.compound).to_period('M')
brka_m.head()

DATE,BRKA
1990-01,-0.140634
1990-02,-0.030852
1990-03,-0.069204
1990-04,-0.003717
1990-05,0.067164
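
The helper `kit.compound` lives in the author's `risk_kit` module, whose source is not shown here. As a minimal sketch, assuming it compounds simple returns geometrically, the daily-to-monthly conversion could look like the following. The function body and the four daily returns are illustrative assumptions, not the actual helper or the BRKA data.

```python
import numpy as np
import pandas as pd

def compound(r):
    """Compound a series of simple returns into a single total return."""
    # exp(sum(log(1 + r))) - 1 is equivalent to (1 + r).prod() - 1
    return np.expm1(np.log1p(r).sum())

# Hypothetical daily returns, made up for illustration only
daily = pd.Series(
    [0.01, -0.02, 0.005, 0.003],
    index=pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-03", "2020-02-04"]),
)

# Group the days by calendar month and compound within each month
monthly = daily.groupby(daily.index.to_period("M")).apply(compound)
print(monthly)
```

Summing the daily returns instead would ignore the compounding effect, which is small over one month but accumulates over longer horizons.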


In [5]:
# Save for future use
brka_m.to_csv("data/brka_m_ret.csv")

Next, I'll load the explanatory variables: the Fama-French monthly returns dataset.

In [6]:
fama_french = kit.get_fff_returns()
fama_french

,Mkt-RF,SMB,HML,RF
1926-07,0.0296,-0.0230,-0.0287,0.0022
1926-08,0.0264,-0.0140,0.0419,0.0025
1926-09,0.0036,-0.0132,0.0001,0.0023
1926-10,-0.0324,0.0004,0.0051,0.0032
1926-11,0.0253,-0.0020,-0.0035,0.0031
...,...,...,...,...
2018-08,0.0344,0.0123,-0.0412,0.0016
2018-09,0.0006,-0.0237,-0.0134,0.0015
2018-10,-0.0768,-0.0468,0.0341,0.0019
2018-11,0.0169,-0.0074,0.0020,0.0018


**Mkt-RF** --> is the excess return on the market.

**RF** --> Risk Free rate.

Mkt-RF + RF gives the market returns.
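
As a quick check of that identity, the sketch below rebuilds the total market return from the first two rows of the table above. The column name `Mkt` is an illustrative choice and is not part of the Fama-French dataset.

```python
import pandas as pd

# First two rows of the Fama-French table shown above
ff = pd.DataFrame(
    {"Mkt-RF": [0.0296, 0.0264], "RF": [0.0022, 0.0025]},
    index=pd.PeriodIndex(["1926-07", "1926-08"], freq="M"),
)

# Total market return = excess market return + risk-free rate
ff["Mkt"] = ff["Mkt-RF"] + ff["RF"]
print(ff)
```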

The other two variables are factor-mimicking portfolios. In practice, the return of a factor is measured as the return of a real portfolio that mimics the factor.

The market factor is itself such a portfolio: it holds the market and subtracts out the risk-free rate, which gives the excess return on the market.

SMB and HML are also portfolios.

**SMB** --> Small Minus Big. SMB is a long-short portfolio: long small caps and short large caps.

--> Because it is long-short, the portfolio carries only the excess return associated with size, so its returns are essentially the returns of the size factor. Fama and French order it as SMB rather than BMS because the size factor carries a premium, so the portfolio should earn a positive return on average.

--> The portfolio should have very little market exposure. That is why it is set up as a long-short portfolio, constructed to minimize the effect of the market, so that what remains is the effect of size.

**HML** --> High Minus Low, where "high" and "low" refer to the book-to-price ratio, so this is the value factor. The long side holds high book-to-price (value) stocks and the short side holds low book-to-price (growth) stocks: value minus growth.
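
To illustrate how such long-short factor returns arise, here is a sketch based on the standard Fama-French construction from six portfolios sorted on size and book-to-market. The six one-month returns are hypothetical numbers chosen only to make the arithmetic visible.

```python
# Hypothetical one-month returns for the six size/value sorted portfolios
small_value, small_neutral, small_growth = 0.020, 0.015, 0.010
big_value, big_neutral, big_growth = 0.012, 0.010, 0.008

# SMB: average of the three small portfolios minus average of the three big ones
smb = (small_value + small_neutral + small_growth) / 3 \
    - (big_value + big_neutral + big_growth) / 3

# HML: average of the two value portfolios minus average of the two growth ones
hml = (small_value + big_value) / 2 - (small_growth + big_growth) / 2

print(f"SMB = {smb:.4f}, HML = {hml:.4f}")
```

Because each factor is a difference between two long-only averages, a move in the overall market that lifts both sides largely cancels out.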


#### Using CAPM

Factor analysis takes the returns and decomposes them into pieces according to the factor model. Now decompose BRKA into the portion that is due to the market and the remainder that is not, using the CAPM as the explanatory model.

i.e.   $$ R_{brka,t}-R_{f,t} = \alpha + \beta (R_{mkt,t}-R_{f,t}) +\epsilon_{t} $$

With the CAPM, the excess return breaks into one major piece and two small add-ons. The major piece is the part driven by the market, $\beta$ times the excess return on the market. The add-ons are a fixed return $\alpha$ and a noise term $\epsilon_{t}$.

This is nothing more than a linear regression.
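
To make the regression concrete, here is a minimal sketch on synthetic data, using NumPy only. The return series, the random seed, and the "true" alpha and beta are made up for illustration and are not the Berkshire Hathaway data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 269  # same number of months as the 1990 to 2012-05 sample

# Synthetic excess returns generated from a known alpha and beta
true_alpha, true_beta = 0.006, 0.54
market_excess = rng.normal(0.005, 0.045, n)
noise = rng.normal(0.0, 0.05, n)
asset_excess = true_alpha + true_beta * market_excess + noise

# Design matrix: a column of ones (for alpha) and the market excess return (for beta)
X = np.column_stack([np.ones(n), market_excess])
coef, *_ = np.linalg.lstsq(X, asset_excess, rcond=None)
alpha_hat, beta_hat = coef

# Residuals are the estimated noise term
residuals = asset_excess - X @ coef
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}")
```

The column of ones plays the same role as the `Constant` column added to the explanatory variables in the notebook: without it, the fit would force $\alpha$ to zero.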

In [7]:
import statsmodels.api as sm
import numpy as np

# Decompose the observed BRKA 1990 - May 2012 as in Ang(2014)
brka_excess = brka_m["1990":"2012-05"] - fama_french.loc["1990":"2012-05", ['RF']].values
market_excess = fama_french.loc["1990":"2012-05", ['Mkt-RF']]
exp_var = market_excess.copy()
exp_var["Constant"] = 1
lm = sm.OLS(brka_excess, exp_var).fit()

In [8]:
lm.summary()

0,1,2,3
Dep. Variable:,BRKA,R-squared:,0.154
Model:,OLS,Adj. R-squared:,0.15
Method:,Least Squares,F-statistic:,48.45
Date:,"Fri, 09 Jul 2021",Prob (F-statistic):,2.62e-11
Time:,12:02:14,Log-Likelihood:,388.47
No. Observations:,269,AIC:,-772.9
Df Residuals:,267,BIC:,-765.7
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.5402,0.078,6.961,0.000,0.387,0.693
Constant,0.0061,0.004,1.744,0.082,-0.001,0.013

0,1,2,3
Omnibus:,45.698,Durbin-Watson:,2.079
Prob(Omnibus):,0.0,Jarque-Bera (JB):,102.573
Skew:,0.825,Prob(JB):,5.33e-23
Kurtosis:,5.535,Cond. No.,22.2


#### The CAPM benchmark interpretation

This implies that the CAPM benchmark consists of 46 cents in T-bills and 54 cents in the market, i.e. each dollar in the Berkshire Hathaway portfolio is equivalent to 46 cents in T-bills and 54 cents in the market. Relative to this benchmark, Berkshire Hathaway adds an $\alpha$ of 0.61% per month, although its statistical significance is weak (p = 0.082).
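
The benchmark weights follow directly from the estimated beta. The sketch below uses the coefficients from the regression summary above. The function name and the one-month inputs (`rf`, `mkt_excess`) are hypothetical values for illustration.

```python
# Coefficients taken from the CAPM regression summary above
beta = 0.5402
alpha_monthly = 0.0061

# Benchmark weights per dollar invested
w_market = beta        # about 54 cents in the market
w_tbills = 1 - beta    # about 46 cents in T-bills

def capm_benchmark_return(rf, mkt_excess):
    """Return of the T-bill/market mix; algebraically rf + beta * mkt_excess."""
    return w_tbills * rf + w_market * (rf + mkt_excess)

# Hypothetical month: risk-free rate of 0.2%, market excess return of 1%
bench = capm_benchmark_return(rf=0.002, mkt_excess=0.01)
print(f"weights: market={w_market:.2f}, T-bills={w_tbills:.2f}")
print(f"benchmark return={bench:.6f}, with alpha={bench + alpha_monthly:.6f}")
```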


### Fama-French Benchmark 

Now, add in some additional explanatory variables, namely Value and Size.

In [9]:
# Select each factor as a Series (single column label) so the assignment aligns on the index
exp_var["Value"] = fama_french.loc["1990":"2012-05", "HML"]
exp_var["Size"] = fama_french.loc["1990":"2012-05", "SMB"]
exp_var.head()

,Mkt-RF,Constant,Value,Size
1990-01,-0.0785,1,0.0087,-0.0129
1990-02,0.0111,1,0.0061,0.0103
1990-03,0.0183,1,-0.029,0.0152
1990-04,-0.0336,1,-0.0255,-0.005
1990-05,0.0842,1,-0.0374,-0.0257


In [10]:
brka_excess.shape

(269, 1)

In [11]:
exp_var.shape

(269, 4)

In [12]:
lm = sm.OLS(brka_excess, exp_var).fit()
lm.summary()

0,1,2,3
Dep. Variable:,BRKA,R-squared:,0.29
Model:,OLS,Adj. R-squared:,0.282
Method:,Least Squares,F-statistic:,36.06
Date:,"Fri, 09 Jul 2021",Prob (F-statistic):,1.41e-19
Time:,12:02:15,Log-Likelihood:,412.09
No. Observations:,269,AIC:,-816.2
Df Residuals:,265,BIC:,-801.8
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.6761,0.074,9.155,0.000,0.531,0.821
Constant,0.0055,0.003,1.679,0.094,-0.001,0.012
Value,0.3814,0.109,3.508,0.001,0.167,0.595
Size,-0.5023,0.101,-4.962,0.000,-0.702,-0.303

0,1,2,3
Omnibus:,42.261,Durbin-Watson:,2.146
Prob(Omnibus):,0.0,Jarque-Bera (JB):,67.954
Skew:,0.904,Prob(JB):,1.75e-15
Kurtosis:,4.671,Cond. No.,37.2


#### The Fama-French Benchmark Interpretation.

The $\alpha$ has fallen from 0.61% to about 0.55% per month. The loading on the market has moved up from 0.54 to 0.67, so adding the new explanatory factors did change things. Had we added irrelevant variables, the loading on the market would have been unaffected.


The positive loading on Value implies that Berkshire Hathaway has a value tilt, which will not shock anyone who follows Warren Buffett. Additionally, the negative loading on Size suggests that Hathaway tends to invest in large companies rather than small ones.

In other words, Hathaway appears to be a large-cap value investor, as the numbers above show.

The new way to interpret each dollar invested in Hathaway is: 67 cents in the market and 33 cents in T-bills; long 38 cents in value stocks and short 38 cents in growth stocks; short 50 cents in small-cap stocks and long 50 cents in large-cap stocks.

With all this you would still end up underperforming Hathaway by about 55 basis points per month.
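
As a worked example of this benchmark, the sketch below applies the estimated loadings to the factor returns for 1990-01 from the `exp_var.head()` output above. It works in excess returns only, because the risk-free rate for that month is not shown in this notebook.

```python
# Loadings and alpha from the Fama-French regression summary above
loadings = {"Mkt-RF": 0.6761, "Value": 0.3814, "Size": -0.5023}
alpha = 0.0055

# Factor returns for 1990-01, taken from the exp_var.head() output
factors_1990_01 = {"Mkt-RF": -0.0785, "Value": 0.0087, "Size": -0.0129}

# Excess return of the factor benchmark: sum of loading times factor return
benchmark_excess = sum(loadings[k] * factors_1990_01[k] for k in loadings)

# The model's fitted excess return adds the alpha on top of the benchmark
fitted_excess = alpha + benchmark_excess

print(f"benchmark excess return: {benchmark_excess:.6f}")
print(f"fitted excess return:    {fitted_excess:.6f}")
```

The gap between the two printed numbers is the alpha: the part of the fitted return that the factor benchmark does not replicate.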



***Now check whether Buffett's tilts are consistent over the whole time period***

In [13]:
brka_m = pd.read_csv("data/brka_m_ret.csv", index_col=0)
brka_m.index = pd.to_datetime(brka_m.index, format="%Y-%m").to_period('M')
fama_french = kit.get_fff_returns()

In [14]:
brka_excess = brka_m["1990":"2018-12"] - fama_french.loc["1990":"2018", ['RF']].values
market_excess = fama_french.loc["1990":"2018", ['Mkt-RF']]
exp_var = market_excess.copy()

In [15]:
# Select each factor as a Series (single column label) so the assignment aligns on the index
exp_var["Value"] = fama_french.loc["1990":"2018", "HML"]
exp_var["Size"] = fama_french.loc["1990":"2018", "SMB"]

In [16]:
result = kit.regress(brka_excess, exp_var)
result.summary()

0,1,2,3
Dep. Variable:,BRKA,R-squared:,0.317
Model:,OLS,Adj. R-squared:,0.311
Method:,Least Squares,F-statistic:,53.29
Date:,"Fri, 09 Jul 2021",Prob (F-statistic):,2.59e-28
Time:,12:02:15,Log-Likelihood:,567.01
No. Observations:,348,AIC:,-1126.0
Df Residuals:,344,BIC:,-1111.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.7096,0.063,11.350,0.000,0.587,0.833
Value,0.4053,0.090,4.494,0.000,0.228,0.583
Size,-0.4829,0.085,-5.696,0.000,-0.650,-0.316
Alpha,0.0052,0.003,1.991,0.047,6.18e-05,0.010

0,1,2,3
Omnibus:,64.922,Durbin-Watson:,2.162
Prob(Omnibus):,0.0,Jarque-Bera (JB):,138.061
Skew:,0.962,Prob(JB):,1.05e-30
Kurtosis:,5.413,Cond. No.,38.1
