# Factor Analysis using the CAPM and Fama-French Factor models

The main idea in Factor Analysis is to take a set of observed returns and decompose it into a set of explanatory returns.

We'll analyze the returns of Berkshire Hathaway.

In [55]:
import pandas as pd

brka_d = pd.read_csv("data/brka_d_ret.csv", parse_dates=True, index_col=0)
brka_d.head()

DATE,BRKA
1990-01-02,-0.005764
1990-01-03,0.0
1990-01-04,0.005797
1990-01-05,-0.005764
1990-01-08,0.0


In [57]:
brka_d.tail()

DATE,BRKA
2018-12-24,-0.018611
2018-12-26,0.0432
2018-12-27,0.012379
2018-12-28,0.013735
2018-12-31,0.011236


Next, we need to convert these to monthly returns. The simplest way to do so is with the `.resample` method, which runs an aggregation function on each group of returns in a time series. We'll give it the grouping rule `'M'`, which means _monthly_ (consult the `pandas` documentation for other codes).

We want to compound the returns:

In [60]:
import edhec_risk_kit_201 as erk

%load_ext autoreload
%autoreload 2

brka_m = brka_d.resample('M').apply(erk.compound).to_period('M')
brka_m.head()


DATE,BRKA
1990-01,-0.140634
1990-02,-0.030852
1990-03,-0.069204
1990-04,-0.003717
1990-05,0.067164
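
The `erk.compound` helper comes from the course toolkit and is not shown here. As a minimal sketch, assuming it compounds simple returns as $\prod(1 + r) - 1$, the monthly aggregation could look like the following. The sample dates and returns are made up for illustration, and the grouping uses `groupby` on monthly periods rather than `.resample`.

```python
import numpy as np
import pandas as pd

def compound(r):
    """Compound a series of simple returns: prod(1 + r) - 1."""
    # Summing log(1 + r) and exponentiating is equivalent to the product
    return np.expm1(np.log1p(r).sum())

# Hypothetical daily returns spanning two calendar months
daily = pd.Series(
    [0.01, -0.02, 0.03, 0.005],
    index=pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-03", "2020-02-04"]),
)

# Group the daily returns by calendar month and compound within each group
monthly = daily.groupby(daily.index.to_period("M")).apply(compound)
print(monthly)
```

Note that compounding is not the same as summing: the two January returns of 1% and -2% compound to about -1.02%, not -1%.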


In [62]:
brka_m.to_csv("brka_m.csv") # for possible future use!

Next, we need to load the explanatory variables, which come from the Fama-French monthly returns data set:

```python
def get_fff_returns():
    """
    Load the Fama-French Research Factor Monthly Dataset
    """
    rets = pd.read_csv("data/F-F_Research_Data_Factors_m.csv",
                       header=0, index_col=0, na_values=-99.99)/100
    rets.index = pd.to_datetime(rets.index, format="%Y%m").to_period('M')
    return rets
```    
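
To see what each argument of the loader does, here is a small sketch that applies the same parsing steps to an in-memory excerpt instead of the file. The three rows reuse the October to December 2018 factor values in percent, except that the December SMB entry is deliberately replaced by the `-99.99` sentinel to show how missing values are handled; `sample_csv` is a name introduced only for this illustration.

```python
import io
import pandas as pd

# Illustrative excerpt in the same layout as the factor file:
# a YYYYMM first column, values in percent, -99.99 marking missing data.
# The 201812 SMB value is intentionally set to the sentinel for this demo.
sample_csv = io.StringIO(
    ",Mkt-RF,SMB,HML,RF\n"
    "201810,-7.68,-4.68,3.41,0.19\n"
    "201811,1.69,-0.74,0.20,0.18\n"
    "201812,-9.55,-99.99,-1.51,0.19\n"
)

# na_values turns the sentinel into NaN before the division by 100
rets = pd.read_csv(sample_csv, header=0, index_col=0, na_values=-99.99) / 100

# Convert the YYYYMM integers to a monthly PeriodIndex
rets.index = pd.to_datetime(rets.index, format="%Y%m").to_period("M")
print(rets)
```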

In [66]:
fff = erk.get_fff_returns()
fff.tail()

Unnamed: 0,Mkt-RF,SMB,HML,RF
2018-08,0.0344,0.0123,-0.0412,0.0016
2018-09,0.0006,-0.0237,-0.0134,0.0015
2018-10,-0.0768,-0.0468,0.0341,0.0019
2018-11,0.0169,-0.0074,0.002,0.0018
2018-12,-0.0955,-0.0261,-0.0151,0.0019


Next, we decompose the observed BRKA returns from January 1990 to May 2012, the period used in Ang (2014), into the portion that is explained by the market and the remainder that is not, using the CAPM as the explanatory model:

$$ R_{brka,t} - R_{f,t} = \alpha + \beta(R_{mkt,t} - R_{f,t}) + \epsilon_t $$
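
To make the regression concrete, here is a minimal sketch on synthetic data, using plain `numpy` least squares rather than the notebook's tooling. The values of `true_alpha` and `true_beta` and the return distributions are hypothetical and chosen only for illustration. The point is that regressing excess returns on the market excess return plus a column of ones recovers $\beta$ and $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_alpha, true_beta = 0.002, 0.6  # hypothetical values for the demo

# Simulated market excess returns and idiosyncratic noise
mkt_excess = rng.normal(0.005, 0.045, size=n)
noise = rng.normal(0.0, 0.01, size=n)

# Asset excess returns generated exactly as in the CAPM equation above
asset_excess = true_alpha + true_beta * mkt_excess + noise

# Design matrix: market excess return plus a constant column for alpha
X = np.column_stack([mkt_excess, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, asset_excess, rcond=None)
beta_hat, alpha_hat = coef
print(f"beta_hat={beta_hat:.4f}, alpha_hat={alpha_hat:.4f}")
```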

We can use `statsmodels.api` to run the linear regression as follows:

In [69]:
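import statsmodels.api as sm
import numpy as np
# Excess return of BRKA over the risk-free rate. Using .values subtracts
# positionally, so both slices must cover exactly the same months.
brka_excess = brka_m["1990":"2012-05"] - fff.loc["1990":"2012-05", ['RF']].values
# Market excess return (Mkt-RF is already net of the risk-free rate)
mkt_excess = fff.loc["1990":"2012-05",['Mkt-RF']]
exp_var = mkt_excess.copy()
# sm.OLS does not add an intercept by itself; the column of ones estimates alpha
exp_var["Constant"] = 1
lm = sm.OLS(brka_excess, exp_var).fit()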

In [82]:
brka_excess

DATE,BRKA
1990-01,-0.146334
1990-02,-0.036552
1990-03,-0.075604
1990-04,-0.010617
1990-05,0.060364
...,...
2012-01,0.027624
2012-02,0.000076
2012-03,0.033629
2012-04,-0.009024


In [84]:
exp_var

Unnamed: 0,Mkt-RF,Constant
1990-01,-0.0785,1
1990-02,0.0111,1
1990-03,0.0183,1
1990-04,-0.0336,1
1990-05,0.0842,1
...,...,...
2012-01,0.0505,1
2012-02,0.0442,1
2012-03,0.0311,1
2012-04,-0.0085,1


In [71]:
lm.summary()

0,1,2,3
Dep. Variable:,BRKA,R-squared:,0.154
Model:,OLS,Adj. R-squared:,0.15
Method:,Least Squares,F-statistic:,48.45
Date:,"Tue, 02 Apr 2024",Prob (F-statistic):,2.62e-11
Time:,10:00:10,Log-Likelihood:,388.47
No. Observations:,269,AIC:,-772.9
Df Residuals:,267,BIC:,-765.7
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.5402,0.078,6.961,0.000,0.387,0.693
Constant,0.0061,0.004,1.744,0.082,-0.001,0.013

0,1,2,3
Omnibus:,45.698,Durbin-Watson:,2.079
Prob(Omnibus):,0.0,Jarque-Bera (JB):,102.573
Skew:,0.825,Prob(JB):,5.33e-23
Kurtosis:,5.535,Cond. No.,22.2


### The CAPM benchmark interpretation

This implies that the CAPM benchmark for Berkshire Hathaway consists of 46 cents in T-Bills and 54 cents in the market: each dollar invested in Berkshire Hathaway is equivalent to holding that mix. Relative to this benchmark, Berkshire Hathaway adds an $\alpha$ of 0.61% per month, although the statistical significance is weak (p = 0.082, and the 95% confidence interval includes zero).
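
The arithmetic behind this reading can be sketched as follows, using the coefficients from the regression summary above. The annualization by compounding is an added assumption for illustration, not something the regression itself reports.

```python
# Coefficients taken from the CAPM regression summary above
beta = 0.5402
alpha_monthly = 0.0061

# A beta of 0.54 is replicated by holding beta in the market
# and the remainder in T-Bills
weight_market = beta
weight_tbills = 1 - beta

# Assumption: annualize the monthly alpha by compounding over 12 months
alpha_annualized = (1 + alpha_monthly) ** 12 - 1

print(f"Market: {weight_market:.2f}, T-Bills: {weight_tbills:.2f}")
print(f"Annualized alpha: {alpha_annualized:.2%}")
```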

Now, let's add in some additional explanatory variables, namely Value and Size.

In [13]:
exp_var["Value"] = fff.loc["1990":"2012-05",['HML']]
exp_var["Size"] = fff.loc["1990":"2012-05",['SMB']]
exp_var.head()

Unnamed: 0,Mkt-RF,Constant,Value,Size
1990-01,-0.0785,1,0.0087,-0.0129
1990-02,0.0111,1,0.0061,0.0103
1990-03,0.0183,1,-0.029,0.0152
1990-04,-0.0336,1,-0.0255,-0.005
1990-05,0.0842,1,-0.0374,-0.0257


In [75]:
lm = sm.OLS(brka_excess, exp_var).fit()
lm.summary()

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.6761,0.074,9.155,0.000,0.531,0.821
Constant,0.0055,0.003,1.679,0.094,-0.001,0.012
Value,0.3814,0.109,3.508,0.001,0.167,0.595
Size,-0.5023,0.101,-4.962,0.000,-0.702,-0.303


### The Fama-French Benchmark Interpretation

The alpha has fallen from 0.61% to about 0.55% per month. The loading on the market has moved up from 0.54 to 0.67, which means that adding these new explanatory factors did change things. If we had added irrelevant variables, the loading on the market would have been essentially unaffected.
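
A sketch of why the market loading moves, in the simplified case of a single omitted factor $F_t$ (an assumption made for illustration; the regression above adds two factors). Suppose the returns are generated by

$$ R_{brka,t} - R_{f,t} = \alpha + \beta(R_{mkt,t} - R_{f,t}) + \gamma F_t + \epsilon_t $$

Then the slope from a regression on the market alone converges to

$$ \tilde{\beta} = \beta + \gamma \, \frac{\mathrm{Cov}(R_{mkt,t} - R_{f,t},\, F_t)}{\mathrm{Var}(R_{mkt,t} - R_{f,t})} $$

The market loading therefore changes when a factor is added only if that factor both matters for the asset ($\gamma \neq 0$) and is correlated with the market excess return. An irrelevant variable has $\gamma = 0$ and leaves the market loading unchanged apart from sampling noise.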

We can interpret the positive loading on Value as saying that Berkshire Hathaway has a significant Value tilt. Additionally, the negative loading on Size suggests that Berkshire Hathaway tends to invest in large companies, not small companies.

In other words, Berkshire Hathaway appears to be a Large Value investor.

The new way to interpret each dollar invested in Berkshire Hathaway is: 67 cents in the market and 33 cents in T-Bills, plus long 38 cents in Value stocks and short 38 cents in Growth stocks, and long 50 cents in LargeCap stocks and short 50 cents in SmallCap stocks.
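
The mapping from factor loadings to dollar positions can be sketched as below. The loadings are the 1990 to May 2012 three-factor coefficients listed in the style comparison at the end of this notebook; `benchmark_positions` is a hypothetical helper written for this illustration. Because it uses the unrounded market loading, the T-Bill weight comes out slightly below the rounded figure quoted in the text.

```python
# Three-factor loadings for 1990 to May 2012 (see the comparison tables)
loadings = {"Mkt-RF": 0.6761, "Value": 0.3814, "Size": -0.5023}

def benchmark_positions(loadings):
    """Translate factor loadings into positions per dollar invested."""
    mkt = loadings["Mkt-RF"]
    return {
        "Market": mkt,
        "T-Bills": 1 - mkt,
        # HML is long Value and short Growth
        "Value stocks": loadings["Value"],
        "Growth stocks": -loadings["Value"],
        # SMB is long SmallCap and short LargeCap
        "SmallCap stocks": loadings["Size"],
        "LargeCap stocks": -loadings["Size"],
    }

positions = benchmark_positions(loadings)
for name, dollars in positions.items():
    print(f"{name:>16}: {dollars:+.2f}")
```

The Value/Growth and SmallCap/LargeCap legs are self-financing long-short pairs, so the positions still sum to one dollar.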

In [18]:
import yfinance as yf

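# Fetch daily prices and convert them to daily returns.
# auto_adjust=False keeps the "Adj Close" column, which some yfinance releases drop by default.
prices = yf.download("BRK-A", start="2018-01-01", end="2024-01-31", auto_adjust=False)["Adj Close"]
brka_2018 = pd.DataFrame(prices.pct_change().dropna())
brka_2018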


Date,Adj Close
2018-01-03,0.014032
2018-01-04,0.002037
2018-01-05,0.003358
2018-01-08,0.008805
2018-01-09,0.001052
...,...
2024-01-24,0.012132
2024-01-25,0.009167
2024-01-26,0.010567
2024-01-29,-0.006011


In [20]:
brka_2018_m = brka_2018.resample('M').apply(erk.compound).to_period('M')
brka_2018_m.columns = ["Return"]
brka_2018_m

Date,Return
2018-01,0.093388
2018-02,-0.040588
2018-03,-0.035939
2018-04,-0.028251
2018-05,-0.011870
...,...
2023-09,-0.027890
2023-10,-0.025687
2023-11,0.054217
2023-12,-0.005999


In [37]:
# Download the Fama-French Research Factor Monthly Dataset from the source
ff_factors = pd.read_csv('https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip',
                         skiprows=3,  # Skip the 3 descriptive lines above the header
                         nrows=1171,  # Read only the monthly rows (July 1926 to January 2024)
                         index_col=0)  # Use the YYYYMM column as the index

# Convert the YYYYMM index to monthly periods and the percentages to decimals
ff_factors.index = pd.to_datetime(ff_factors.index, format="%Y%m").to_period('M')
ff_factors = ff_factors / 100
ff_factors


Unnamed: 0,Mkt-RF,SMB,HML,RF
1926-07,0.0296,-0.0256,-0.0243,0.0022
1926-08,0.0264,-0.0117,0.0382,0.0025
1926-09,0.0036,-0.0140,0.0013,0.0023
1926-10,-0.0324,-0.0009,0.0070,0.0032
1926-11,0.0253,-0.0010,-0.0051,0.0031
...,...,...,...,...
2023-09,-0.0524,-0.0251,0.0152,0.0043
2023-10,-0.0319,-0.0387,0.0019,0.0047
2023-11,0.0884,-0.0002,0.0164,0.0044
2023-12,0.0485,0.0634,0.0493,0.0043


In [39]:
brka_2018_excess = brka_2018_m - ff_factors.loc["2018-01":"2024-01", ['RF']].values
brka_2018_excess

Date,Return
2018-01,0.092188
2018-02,-0.041688
2018-03,-0.037039
2018-04,-0.029651
2018-05,-0.013270
...,...
2023-09,-0.032190
2023-10,-0.030387
2023-11,0.049817
2023-12,-0.010299
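
Subtracting `.values` works only when both slices have exactly the same rows in the same order. As a hedged alternative, the subtraction can be aligned on the index instead, so that a longer or differently ordered risk-free series cannot silently shift the months. This sketch reuses the last three monthly returns shown above and the matching `RF` values; the variable names are introduced for illustration.

```python
import pandas as pd

# Asset returns for three months (values from the output above)
asset = pd.DataFrame(
    {"Return": [-0.025687, 0.054217, -0.005999]},
    index=pd.period_range("2023-10", "2023-12", freq="M"),
)

# Risk-free series that deliberately covers one extra month
rf = pd.Series(
    [0.0043, 0.0047, 0.0044, 0.0043],
    index=pd.period_range("2023-09", "2023-12", freq="M"),
    name="RF",
)

# Pick the risk-free rate for each asset month by label, then subtract row-wise
excess = asset.sub(rf.reindex(asset.index), axis=0)
print(excess)
```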


In [41]:
# Work on an explicit copy so pandas does not warn about modifying a slice
exp_var_2018 = ff_factors.loc['2018-01':].copy()
exp_var_2018 = exp_var_2018.rename(columns={'HML': 'Value', 'SMB': 'Size'}).drop(columns=['RF'])
exp_var_2018


Unnamed: 0,Mkt-RF,Size,Value
2018-01,0.0557,-0.0313,-0.0129
2018-02,-0.0365,0.0025,-0.0104
2018-03,-0.0235,0.0406,-0.0021
2018-04,0.0028,0.0113,0.0054
2018-05,0.0265,0.0525,-0.0320
...,...,...,...
2023-09,-0.0524,-0.0251,0.0152
2023-10,-0.0319,-0.0387,0.0019
2023-11,0.0884,-0.0002,0.0164
2023-12,0.0485,0.0634,0.0493


In [43]:
result = erk.regress(brka_2018_excess, exp_var_2018)
result.summary()

0,1,2,3
Dep. Variable:,Return,R-squared:,0.74
Model:,OLS,Adj. R-squared:,0.729
Method:,Least Squares,F-statistic:,65.61
Date:,"Tue, 02 Apr 2024",Prob (F-statistic):,3.58e-20
Time:,06:03:58,Log-Likelihood:,154.2
No. Observations:,73,AIC:,-300.4
Df Residuals:,69,BIC:,-291.2
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.9230,0.070,13.247,0.000,0.784,1.062
Size,-0.7103,0.127,-5.580,0.000,-0.964,-0.456
Value,0.3274,0.082,4.011,0.000,0.165,0.490
Alpha,0.0008,0.004,0.236,0.814,-0.006,0.008

0,1,2,3
Omnibus:,0.148,Durbin-Watson:,1.835
Prob(Omnibus):,0.929,Jarque-Bera (JB):,0.34
Skew:,0.036,Prob(JB):,0.844
Kurtosis:,2.673,Cond. No.,36.9


# Comparison of style over time

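1990 to May 2012 (the intercept is labeled `Constant`)

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.6761,0.074,9.155,0.000,0.531,0.821
Constant,0.0055,0.003,1.679,0.094,-0.001,0.012
Value,0.3814,0.109,3.508,0.001,0.167,0.595
Size,-0.5023,0.101,-4.962,0.000,-0.702,-0.303


January 2018 to January 2024 (the intercept is labeled `Alpha`)

,coef,std err,t,P>|t|,[0.025,0.975]
Mkt-RF,0.9230,0.070,13.247,0.000,0.784,1.062
Size,-0.7103,0.127,-5.580,0.000,-0.964,-0.456
Value,0.3274,0.082,4.011,0.000,0.165,0.490
Alpha,0.0008,0.004,0.236,0.814,-0.006,0.008

Across the two periods, the market loading rises from 0.68 to 0.92 and the Size loading becomes more negative (-0.50 to -0.71), while the Value loading is roughly unchanged (0.38 versus 0.33). The monthly alpha drops from 0.55% to 0.08% and is not statistically significant in the later period.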
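
To line the two sets of estimates up side by side, a small sketch that puts the coefficients from the tables above into one `DataFrame` and computes the change per factor. The column labels and the `Change` column are introduced here for illustration; the intercept is listed under the single name `Alpha` for both periods.

```python
import pandas as pd

# Coefficients copied from the two regression tables above
coefs = pd.DataFrame(
    {
        "1990-2012": {"Mkt-RF": 0.6761, "Size": -0.5023, "Value": 0.3814, "Alpha": 0.0055},
        "2018-2024": {"Mkt-RF": 0.9230, "Size": -0.7103, "Value": 0.3274, "Alpha": 0.0008},
    }
)

# Difference between the later and the earlier period for each coefficient
coefs["Change"] = coefs["2018-2024"] - coefs["1990-2012"]
print(coefs.round(4))
```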
