# Factor Analysis using the CAPM and Fama-French Factor models

Main idea: take a set of observed returns and decompose them into a set of explanatory returns.

textbook: _Asset Management_ chapter 10

returns of Berkshire Hathaway: `data/brka_d_ret.csv`

In [11]:
import pandas as pd

brka_d = pd.read_csv('data/brka_d_ret.csv',parse_dates = True,index_col = 0)
brka_d.head()

DATE,BRKA
1990-01-02,-0.005764
1990-01-03,0.0
1990-01-04,0.005797
1990-01-05,-0.005764
1990-01-08,0.0


In [12]:
brka_d.tail()

DATE,BRKA
2018-12-24,-0.018611
2018-12-26,0.0432
2018-12-27,0.012379
2018-12-28,0.013735
2018-12-31,0.011236


Convert the daily data to monthly returns using the `.resample` method with the grouping rule `'M'`, compounding the daily returns within each month.

In [20]:
import edhec_risk_kit_201 as erk

%load_ext autoreload
%autoreload 2

brka_m = brka_d.resample('M').apply(erk.compound).to_period('M')
brka_m.head()



DATE,BRKA
1990-01,-0.140634
1990-02,-0.030852
1990-03,-0.069204
1990-04,-0.003717
1990-05,0.067164
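As a small illustration of what compounding within each month means, the sketch below uses a stand-in `compound` function and made-up daily returns. It assumes `erk.compound` computes the product of `(1 + r)` minus one; that is an assumption about the toolkit, not something shown in this notebook. Grouping by calendar month is done here with `groupby` on a monthly period index, which plays the same role as `.resample('M')`.

```python
import numpy as np
import pandas as pd


def compound(r):
    """Compound simple returns: prod(1 + r) - 1.

    Hypothetical stand-in for erk.compound (assumed behaviour).
    """
    return np.expm1(np.log1p(r).sum())


# Made-up daily returns: two days in January, one in February
daily = pd.Series(
    [0.10, -0.05, 0.02],
    index=pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-03"]),
)

# Group the daily returns by calendar month and compound each group
monthly = daily.groupby(daily.index.to_period("M")).apply(compound)
print(monthly)
```

For January this gives `1.10 * 0.95 - 1`, which differs from the simple sum of the two daily returns; that difference is why the daily returns are compounded rather than added.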


In [21]:
brka_m.to_csv('brka_m.csv') # save as csv

Next, load the explanatory variables: the Fama-French monthly returns data set.

In [22]:
fff = erk.get_fff_returns()
fff.head()

,Mkt-RF,SMB,HML,RF
1926-07,0.0296,-0.023,-0.0287,0.0022
1926-08,0.0264,-0.014,0.0419,0.0025
1926-09,0.0036,-0.0132,0.0001,0.0023
1926-10,-0.0324,0.0004,0.0051,0.0032
1926-11,0.0253,-0.002,-0.0035,0.0031


Now decompose the returns with a linear regression (the CAPM), using `statsmodels.api`:

$$ R_{brka,t} - R_{f,t} = \alpha + \beta(R_{mkt,t} - R_{f,t}) + \epsilon_t $$

In [40]:
import statsmodels.api as sm
import numpy as np
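# BRKA excess returns over the risk-free rate; .values makes the subtraction positional
brka_excess = brka_m['1990':'2012-05'] - fff.loc['1990':'2012-05', ['RF']].values
# Market excess returns are already provided as the 'Mkt-RF' column
mkt_excess = fff.loc['1990':'2012-05', ['Mkt-RF']]
exp_var = mkt_excess.copy()
exp_var['Constant'] = 1  # column of ones, so the regression estimates an intercept (alpha)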
lm = sm.OLS(brka_excess,exp_var).fit()
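The `.values` in the excess-return line matters because pandas aligns DataFrames on column labels before subtracting. The sketch below uses made-up numbers (not the notebook's data) to show what happens with and without it.

```python
import pandas as pd

idx = pd.period_range("1990-01", periods=3, freq="M")
asset = pd.DataFrame({"BRKA": [0.02, -0.01, 0.03]}, index=idx)  # made-up returns
rf = pd.DataFrame({"RF": [0.005, 0.005, 0.004]}, index=idx)     # made-up risk-free rates

# Label-based subtraction: 'BRKA' and 'RF' do not match, so every cell is NaN
misaligned = asset - rf

# Positional subtraction: the (3, 1) array is subtracted row by row
excess = asset - rf.values

print(misaligned)
print(excess)
```

Because `.values` discards the index as well as the column name, this only works when both frames cover the same dates in the same order, which is why both sides are sliced with the same date range.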

In [41]:
lm.summary()

Dep. Variable:,BRKA,R-squared:,0.154
Model:,OLS,Adj. R-squared:,0.15
Method:,Least Squares,F-statistic:,48.45
Date:,"Sat, 14 Aug 2021",Prob (F-statistic):,2.62e-11
Time:,16:22:54,Log-Likelihood:,388.47
No. Observations:,269,AIC:,-772.9
Df Residuals:,267,BIC:,-765.7
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.5402,0.078,6.961,0.000,0.387,0.693
Constant,0.0061,0.004,1.744,0.082,-0.001,0.013

Omnibus:,45.698,Durbin-Watson:,2.079
Prob(Omnibus):,0.0,Jarque-Bera (JB):,102.573
Skew:,0.825,Prob(JB):,5.33e-23
Kurtosis:,5.535,Cond. No.,22.2


$\alpha$ is 0.0061, and $\beta$ is about 0.54.

### The CAPM benchmark interpretation

This implies that the CAPM benchmark consists of 46 cents in T-Bills and 54 cents in the market, i.e. each dollar in the Berkshire Hathaway portfolio is equivalent to 46 cents in T-Bills and 54 cents in the market. Relative to this benchmark, Berkshire Hathaway adds 0.61% _(per month!)_, although the statistical significance is weak: the p-value of 0.082 is above the usual 5% threshold.
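The arithmetic behind this reading can be sketched as follows. The loading and intercept are the rounded values from the regression summary; the risk-free rate for the example month is a made-up number, and annualizing alpha by monthly compounding is just one common convention.

```python
beta = 0.5402            # market loading from the regression summary
alpha_monthly = 0.0061   # monthly intercept from the regression summary

# Benchmark weights implied by the CAPM regression
weight_market = beta
weight_tbills = 1 - beta

# One example month: Mkt-RF for 1990-02 from the factor data, RF is hypothetical
mkt_excess = 0.0111
rf = 0.005  # hypothetical risk-free rate, for illustration only

# Two equivalent ways to write the benchmark return
benchmark_from_regression = rf + beta * mkt_excess
benchmark_from_weights = weight_tbills * rf + weight_market * (rf + mkt_excess)

# Annualized alpha, assuming monthly compounding
alpha_annualized = (1 + alpha_monthly) ** 12 - 1

print(f"T-Bills: {weight_tbills:.2f}, market: {weight_market:.2f}")
print(f"Annualized alpha: {alpha_annualized:.2%}")
```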

Then, add in additional explanatory variables, namely Value and Size.

In [42]:
exp_var['Value'] = fff.loc['1990':'2012-05', 'HML']  # HML (high minus low book-to-market) as the Value factor
exp_var['Size'] = fff.loc['1990':'2012-05', 'SMB']   # SMB (small minus big) as the Size factor
exp_var.head()

,Mkt-RF,Constant,Value,Size
1990-01,-0.0785,1,0.0087,-0.0129
1990-02,0.0111,1,0.0061,0.0103
1990-03,0.0183,1,-0.029,0.0152
1990-04,-0.0336,1,-0.0255,-0.005
1990-05,0.0842,1,-0.0374,-0.0257


In [44]:
lm = sm.OLS(brka_excess, exp_var).fit()
lm.summary()

Dep. Variable:,BRKA,R-squared:,0.29
Model:,OLS,Adj. R-squared:,0.282
Method:,Least Squares,F-statistic:,36.06
Date:,"Sat, 14 Aug 2021",Prob (F-statistic):,1.41e-19
Time:,16:24:10,Log-Likelihood:,412.09
No. Observations:,269,AIC:,-816.2
Df Residuals:,265,BIC:,-801.8
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.6761,0.074,9.155,0.000,0.531,0.821
Constant,0.0055,0.003,1.679,0.094,-0.001,0.012
Value,0.3814,0.109,3.508,0.001,0.167,0.595
Size,-0.5023,0.101,-4.962,0.000,-0.702,-0.303

Omnibus:,42.261,Durbin-Watson:,2.146
Prob(Omnibus):,0.0,Jarque-Bera (JB):,67.954
Skew:,0.904,Prob(JB):,1.75e-15
Kurtosis:,4.671,Cond. No.,37.2


### The Fama-French Benchmark Interpretation

Alpha has fallen from 0.0061 to 0.0055 per month. The loading on the market has moved up from 0.54 to 0.67, which means that adding these new explanatory factors did change things. If we had added irrelevant variables, the loading on the market would be unaffected.
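The point about irrelevant variables can be checked on simulated data. The sketch below is illustrative only: every series and coefficient is made up, and plain least squares via NumPy stands in for `sm.OLS`. It fits the market loading alone, then with an unrelated factor, then with a factor that is correlated with the market and actually drives the returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

market = rng.normal(0.0, 0.045, n)
value = -0.3 * market + rng.normal(0.0, 0.03, n)  # relevant, correlated with the market
unrelated = rng.normal(0.0, 0.03, n)              # irrelevant: plays no role in y

# Simulated excess returns with a true market loading of 0.7 and value loading of 0.4
y = 0.7 * market + 0.4 * value + rng.normal(0.0, 0.05, n)


def ols(y, *columns):
    """Least-squares coefficients for the given columns plus an intercept (last entry)."""
    X = np.column_stack(columns + (np.ones(len(y)),))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


beta_alone = ols(y, market)[0]
beta_with_unrelated = ols(y, market, unrelated)[0]
beta_with_value = ols(y, market, value)[0]

print(beta_alone, beta_with_unrelated, beta_with_value)
```

In this setup the market-only loading absorbs part of the value effect (about `0.7 + 0.4 * -0.3 = 0.58`), adding the unrelated factor leaves it essentially where it was, and adding the relevant factor moves it back toward 0.7.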

We can interpret the positive loading on Value as saying that Hathaway has a significant Value tilt. Additionally, the negative loading on Size suggests that Hathaway tends to invest in large companies, not small companies. In other words, Hathaway appears to be a Large Value investor.

The new way to interpret each dollar invested in Hathaway is: 67 cents in the market, 33 cents in Bills, 38 cents in Value stocks and short 38 cents in Growth stocks, short 50 cents in SmallCap stocks and long 50 cents in LargeCap stocks. If you did all this, you would still end up underperforming Hathaway by about 55 basis points per month.
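As a rough sketch of this benchmark, the excess return of the replicating portfolio in a given month is the sum of each loading times its factor return. The loadings are the coefficients from the regression summary and the factor returns are the first three rows of the explanatory-variable table; the variable names are illustrative.

```python
import pandas as pd

# Factor returns for the first three months of the sample
factors = pd.DataFrame(
    {
        "Mkt-RF": [-0.0785, 0.0111, 0.0183],
        "HML": [0.0087, 0.0061, -0.0290],
        "SMB": [-0.0129, 0.0103, 0.0152],
    },
    index=pd.period_range("1990-01", periods=3, freq="M"),
)

# Loadings from the three-factor regression (Value = HML, Size = SMB)
loadings = pd.Series({"Mkt-RF": 0.6761, "HML": 0.3814, "SMB": -0.5023})

# Benchmark excess return: loading-weighted sum of the factor returns
benchmark_excess = factors.dot(loadings)
print(benchmark_excess)
```

The gap between Hathaway's actual excess return and `benchmark_excess` is the alpha plus the residual for that month; averaged over the sample, the residual drops out and only the alpha of about 55 basis points remains.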

Add the following code to the toolkit:

```python
import statsmodels.api as sm
def regress(dependent_variable, explanatory_variables, alpha=True):
    """
    Runs a linear regression to decompose the dependent variable into the explanatory variables.
    If alpha is True, a column of ones named "Alpha" is added so that an intercept is estimated.
    Returns a statsmodels RegressionResults object on which you can call
       .summary() to print a full summary
       .params for the coefficients
       .tvalues and .pvalues for the significance levels
       .rsquared_adj and .rsquared for quality of fit
    """
    if alpha:
        explanatory_variables = explanatory_variables.copy()
        explanatory_variables["Alpha"] = 1
    
    lm = sm.OLS(dependent_variable, explanatory_variables).fit()
    return lm
```

In [45]:
result = erk.regress(brka_excess, mkt_excess)

In [46]:
result.params

Mkt-RF    0.540175
Alpha     0.006133
dtype: float64

In [47]:
result.tvalues

Mkt-RF    6.960550
Alpha     1.744449
dtype: float64

In [48]:
result.pvalues

Mkt-RF    2.622873e-11
Alpha     8.223148e-02
dtype: float64
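These three outputs are linked: each t-value is the coefficient divided by its standard error, and the p-value says how surprising that t-value would be if the true coefficient were zero. The sketch below copies the numbers printed above into plain pandas Series to recover the standard errors and to flag significance at a 5% level (the threshold is a conventional choice, not something the regression sets).

```python
import pandas as pd

# Values copied from result.params, result.tvalues and result.pvalues
params = pd.Series({"Mkt-RF": 0.540175, "Alpha": 0.006133})
tvalues = pd.Series({"Mkt-RF": 6.960550, "Alpha": 1.744449})
pvalues = pd.Series({"Mkt-RF": 2.622873e-11, "Alpha": 8.223148e-02})

# Standard error implied by coefficient / t-value
std_err = params / tvalues

# Flag coefficients that are significant at the 5% level
significant = pvalues < 0.05

print(pd.DataFrame({"coef": params, "std_err": std_err, "significant": significant}))
```

The implied standard errors round to the 0.078 and 0.004 shown in the full summary, and only the market loading passes the 5% test.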

In [49]:
result.rsquared_adj

0.15041804337083975

In [50]:
exp_var.head()

,Mkt-RF,Constant,Value,Size
1990-01,-0.0785,1,0.0087,-0.0129
1990-02,0.0111,1,0.0061,0.0103
1990-03,0.0183,1,-0.029,0.0152
1990-04,-0.0336,1,-0.0255,-0.005
1990-05,0.0842,1,-0.0374,-0.0257


In [51]:
erk.regress(brka_excess, exp_var, alpha=False).summary()

Dep. Variable:,BRKA,R-squared:,0.29
Model:,OLS,Adj. R-squared:,0.282
Method:,Least Squares,F-statistic:,36.06
Date:,"Sat, 14 Aug 2021",Prob (F-statistic):,1.41e-19
Time:,16:40:17,Log-Likelihood:,412.09
No. Observations:,269,AIC:,-816.2
Df Residuals:,265,BIC:,-801.8
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.6761,0.074,9.155,0.000,0.531,0.821
Constant,0.0055,0.003,1.679,0.094,-0.001,0.012
Value,0.3814,0.109,3.508,0.001,0.167,0.595
Size,-0.5023,0.101,-4.962,0.000,-0.702,-0.303

Omnibus:,42.261,Durbin-Watson:,2.146
Prob(Omnibus):,0.0,Jarque-Bera (JB):,67.954
Skew:,0.904,Prob(JB):,1.75e-15
Kurtosis:,4.671,Cond. No.,37.2


## Exercise
### From 2013 to 2018

In [53]:
import statsmodels.api as sm
import numpy as np
brka_excess18 = brka_m['2013':'2018'] - fff.loc['2013':'2018',['RF']].values
mkt_excess18 = fff.loc['2013':'2018',['Mkt-RF']]
exp_var18 = mkt_excess18.copy()
exp_var18['Constanct'] = 1
lm18 = sm.OLS(brka_excess18, exp_var18).fit()

In [54]:
lm18.summary()

Dep. Variable:,BRKA,R-squared:,0.491
Model:,OLS,Adj. R-squared:,0.484
Method:,Least Squares,F-statistic:,67.55
Date:,"Sat, 14 Aug 2021",Prob (F-statistic):,7.22e-12
Time:,17:08:35,Log-Likelihood:,156.99
No. Observations:,72,AIC:,-310.0
Df Residuals:,70,BIC:,-305.4
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.8376,0.102,8.219,0.000,0.634,1.041
Constanct,0.0037,0.003,1.086,0.281,-0.003,0.011

Omnibus:,2.996,Durbin-Watson:,1.752
Prob(Omnibus):,0.224,Jarque-Bera (JB):,2.674
Skew:,0.379,Prob(JB):,0.263
Kurtosis:,2.437,Cond. No.,31.2


In [55]:
exp_var18['Value'] = fff.loc['2013':'2018', 'HML']  # HML as the Value factor
exp_var18['Size'] = fff.loc['2013':'2018', 'SMB']   # SMB as the Size factor
exp_var18.head()

,Mkt-RF,Constanct,Value,Size
2013-01,0.0557,1,0.0095,0.0039
2013-02,0.0129,1,0.0003,-0.0045
2013-03,0.0403,1,-0.0029,0.0078
2013-04,0.0155,1,0.0063,-0.0242
2013-05,0.028,1,0.026,0.0167


In [56]:
exp_var18.tail()

,Mkt-RF,Constanct,Value,Size
2018-08,0.0344,1,-0.0412,0.0123
2018-09,0.0006,1,-0.0134,-0.0237
2018-10,-0.0768,1,0.0341,-0.0468
2018-11,0.0169,1,0.002,-0.0074
2018-12,-0.0955,1,-0.0151,-0.0261


In [57]:
lm = sm.OLS(brka_excess18,exp_var18).fit()
lm.summary()

Dep. Variable:,BRKA,R-squared:,0.635
Model:,OLS,Adj. R-squared:,0.619
Method:,Least Squares,F-statistic:,39.42
Date:,"Sat, 14 Aug 2021",Prob (F-statistic):,7.08e-15
Time:,17:12:37,Log-Likelihood:,168.94
No. Observations:,72,AIC:,-329.9
Df Residuals:,68,BIC:,-320.8
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.9257,0.091,10.183,0.000,0.744,1.107
Constanct,0.0035,0.003,1.180,0.242,-0.002,0.009
Value,0.5126,0.126,4.083,0.000,0.262,0.763
Size,-0.4153,0.121,-3.438,0.001,-0.656,-0.174

Omnibus:,1.307,Durbin-Watson:,1.999
Prob(Omnibus):,0.52,Jarque-Bera (JB):,1.328
Skew:,0.243,Prob(JB):,0.515
Kurtosis:,2.545,Cond. No.,46.1


In [59]:
erk.regress(brka_excess18, exp_var18, alpha=False).summary()

Dep. Variable:,BRKA,R-squared:,0.635
Model:,OLS,Adj. R-squared:,0.619
Method:,Least Squares,F-statistic:,39.42
Date:,"Sat, 14 Aug 2021",Prob (F-statistic):,7.08e-15
Time:,17:14:50,Log-Likelihood:,168.94
No. Observations:,72,AIC:,-329.9
Df Residuals:,68,BIC:,-320.8
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.9257,0.091,10.183,0.000,0.744,1.107
Constanct,0.0035,0.003,1.180,0.242,-0.002,0.009
Value,0.5126,0.126,4.083,0.000,0.262,0.763
Size,-0.4153,0.121,-3.438,0.001,-0.656,-0.174

Omnibus:,1.307,Durbin-Watson:,1.999
Prob(Omnibus):,0.52,Jarque-Bera (JB):,1.328
Skew:,0.243,Prob(JB):,0.515
Kurtosis:,2.545,Cond. No.,46.1
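To compare the two sample periods, the three-factor coefficients from the summaries can be placed side by side. The sketch below simply copies the reported coefficients into a DataFrame; the column labels are illustrative.

```python
import pandas as pd

# Three-factor coefficients as reported in the two regression summaries
coefs = pd.DataFrame(
    {
        "1990 to 2012-05": [0.0055, 0.6761, 0.3814, -0.5023],
        "2013 to 2018": [0.0035, 0.9257, 0.5126, -0.4153],
    },
    index=["Alpha", "Mkt-RF", "Value", "Size"],
)

# Change in each coefficient from the earlier to the later period
coefs["Change"] = coefs["2013 to 2018"] - coefs["1990 to 2012-05"]

# Monthly alpha expressed in basis points
alpha_bps = coefs.loc["Alpha", ["1990 to 2012-05", "2013 to 2018"]] * 10_000

print(coefs)
print(alpha_bps)
```

In the later period the market loading is much closer to one, the Value and Size tilts keep the same signs, and the monthly alpha is smaller (about 35 basis points) with a p-value of 0.242, so it is not statistically distinguishable from zero in this sample.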
