# Factor Analysis using the CAPM and Fama-French Factor models

The main idea in factor analysis is to take a set of observed returns and decompose them into a set of explanatory returns.

We'll follow *Asset Management* (Ang 2014, Oxford University Press), Chapter 10, and analyze the returns of Berkshire Hathaway (BRKA).

First, we need the daily returns of Berkshire Hathaway, which are contained in `data/brka_d_ret.csv`. Read them in as follows:

In [1]:
import pandas as pd

In [2]:
brka_d = pd.read_csv("data/brka_d_ret.csv", parse_dates=True, index_col=0)
brka_d.head()

DATE,BRKA
1990-01-02,-0.005764
1990-01-03,0.0
1990-01-04,0.005797
1990-01-05,-0.005764
1990-01-08,0.0


In [3]:
brka_d.tail()

DATE,BRKA
2018-12-24,-0.018611
2018-12-26,0.0432
2018-12-27,0.012379
2018-12-28,0.013735
2018-12-31,0.011236


In [4]:
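# Test two compounding methods on January 1990
# Method 1 (mine): multiply the gross returns (1 + r) and subtract 1
# .loc is needed to select rows by date string in current pandas
(brka_d.loc["1990-01"] + 1).prod() - 1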



BRKA   -0.140634
dtype: float64

In [5]:
# Method 2 (from the professor, more efficient): sum the log returns, then convert back
import numpy as np
np.expm1(np.log1p(brka_d.loc["1990-01"]).sum())

BRKA   -0.140634
dtype: float64

In [6]:
# Both methods yield the same result
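The two methods agree because the log of a product is the sum of the logs. Writing $r_t$ for the $T$ daily returns in a month:

$$ R_{month} = \prod_{t=1}^{T}(1 + r_t) - 1 = \exp\left(\sum_{t=1}^{T}\ln(1 + r_t)\right) - 1 $$

`np.log1p(x)` computes $\ln(1 + x)$ and `np.expm1(x)` computes $e^{x} - 1$, both accurately for small $x$, which is why the second method is preferred for daily returns.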

Next, we need to convert these to monthly returns. The simplest way to do so is with the `.resample` method, which runs an aggregation function on each group of returns in a time series. We'll give it the grouping rule `'M'`, which means monthly (month-end; recent pandas versions prefer the alias `'ME'`). Consult the pandas documentation for other codes.

We want to compound the returns, and we already have the `compound` function in our toolkit, so let's load that up now and then apply it to the daily returns.
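A minimal sketch of what this step computes, assuming the toolkit's `compound` returns $\prod(1 + r) - 1$ for each column. The `compound` below is a hypothetical stand-in, not the toolkit's code, and the daily returns are made up. It groups by `to_period('M')`, which for data with no empty months gives the same result as `resample('M')` followed by `.to_period('M')` and sidesteps the `'M'`/`'ME'` alias change across pandas versions.

```python
import numpy as np
import pandas as pd

def compound(r):
    """Hypothetical stand-in for the toolkit's compound: prod(1 + r) - 1 per column,
    computed via log returns for numerical accuracy."""
    return np.expm1(np.log1p(r).sum())

# Made-up daily returns spanning two calendar months (illustration only)
idx = pd.to_datetime(["1990-01-02", "1990-01-03", "1990-02-01", "1990-02-02"])
daily = pd.DataFrame({"BRKA": [0.01, -0.02, 0.03, 0.01]}, index=idx)

# Group each day by its calendar month, then compound the returns within each month
monthly = daily.groupby(daily.index.to_period("M")).apply(compound)
print(monthly)
```

Here January compounds to $1.01 \times 0.98 - 1 = -0.0102$ and February to $1.03 \times 1.01 - 1 = 0.0403$.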

In [7]:
import edhec_risk_kit_endOf_Course1_copie as erk

%load_ext autoreload
%autoreload 2

brka_m = brka_d.resample('M').apply(erk.compound).to_period('M')
brka_m

DATE,BRKA
1990-01,-0.140634
1990-02,-0.030852
1990-03,-0.069204
1990-04,-0.003717
1990-05,0.067164
...,...
2018-08,0.047256
2018-09,0.013300
2018-10,-0.038422
2018-11,0.059456


In [8]:
import edhec_risk_kit_endOf_Course2 as erk
test_brka_m = brka_d.resample('M').apply(erk.compound).to_period('M')
test_brka_m

DATE,BRKA
1990-01,-0.140634
1990-02,-0.030852
1990-03,-0.069204
1990-04,-0.003717
1990-05,0.067164
...,...
2018-08,0.047256
2018-09,0.013300
2018-10,-0.038422
2018-11,0.059456


In [9]:
brka_m.to_csv("brka_m.csv")  # save as a CSV file for future use
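`to_csv` writes the monthly `PeriodIndex` out as plain text such as `1990-01`, so reading the file back gives a string index. A minimal sketch of saving and reloading, using made-up returns and a temporary directory instead of the notebook's `brka_m.csv`:

```python
import os
import tempfile

import pandas as pd

# Made-up monthly returns with a monthly PeriodIndex, shaped like brka_m
monthly = pd.DataFrame(
    {"BRKA": [-0.01, 0.02]},
    index=pd.period_range("1990-01", periods=2, freq="M", name="DATE"),
)

path = os.path.join(tempfile.mkdtemp(), "brka_m.csv")
monthly.to_csv(path)

# On reload the index comes back as strings; convert it back to monthly periods
reloaded = pd.read_csv(path, index_col=0)
reloaded.index = pd.PeriodIndex(reloaded.index, freq="M")
print(reloaded)
```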

Next, we need to load the explanatory variables, which are the Fama-French monthly factor returns. Load them as follows:

In [10]:
fff = pd.read_csv('data/F-F_Research_Data_Factors_m.csv', parse_dates=True, index_col=0)
fff.index = pd.to_datetime(fff.index, format="%Y%m")
fff.head()

Unnamed: 0,Mkt-RF,SMB,HML,RF
1926-07-01,2.96,-2.3,-2.87,0.22
1926-08-01,2.64,-1.4,4.19,0.25
1926-09-01,0.36,-1.32,0.01,0.23
1926-10-01,-3.24,0.04,0.51,0.32
1926-11-01,2.53,-0.2,-0.35,0.31


In [11]:
# My own version of the loader
def get_fff_returns():
    """
    Load the Fama-French factor returns and return a DataFrame indexed by properly formatted dates.
    Returns are left in percent, as in the raw file.
    """
    fff = pd.read_csv('data/F-F_Research_Data_Factors_m.csv', parse_dates=True, index_col=0)
    fff.index = pd.to_datetime(fff.index, format="%Y%m")
    return fff
fff_v2 = get_fff_returns()
fff_v2.head()

Unnamed: 0,Mkt-RF,SMB,HML,RF
1926-07-01,2.96,-2.3,-2.87,0.22
1926-08-01,2.64,-1.4,4.19,0.25
1926-09-01,0.36,-1.32,0.01,0.23
1926-10-01,-3.24,0.04,0.51,0.32
1926-11-01,2.53,-0.2,-0.35,0.31


In [12]:
test_fff = erk.get_fff_returns_mine()
test_fff.head()

Unnamed: 0,Mkt-RF,SMB,HML,RF
1926-07-01,2.96,-2.3,-2.87,0.22
1926-08-01,2.64,-1.4,4.19,0.25
1926-09-01,0.36,-1.32,0.01,0.23
1926-10-01,-3.24,0.04,0.51,0.32
1926-11-01,2.53,-0.2,-0.35,0.31


In [13]:
fff = erk.get_fff_returns()  # toolkit version: returns in decimals, indexed by monthly periods
fff.head()

Unnamed: 0,Mkt-RF,SMB,HML,RF
1926-07,0.0296,-0.023,-0.0287,0.0022
1926-08,0.0264,-0.014,0.0419,0.0025
1926-09,0.0036,-0.0132,0.0001,0.0023
1926-10,-0.0324,0.0004,0.0051,0.0032
1926-11,0.0253,-0.002,-0.0035,0.0031
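The toolkit's output differs from the raw file in two ways: returns are divided by 100, and the index is a monthly period rather than a first-of-month date. A minimal sketch of that conversion, assuming (as the outputs above suggest) that the file stores dates as `YYYYMM` and returns in percent. The loader name `load_fff` is hypothetical, and the two rows are the first rows shown above, read from an in-memory string instead of the data file.

```python
import io

import pandas as pd

# First two rows of the factor file as shown above (YYYYMM dates, returns in percent)
raw = io.StringIO(
    ",Mkt-RF,SMB,HML,RF\n"
    "192607,2.96,-2.30,-2.87,0.22\n"
    "192608,2.64,-1.40,4.19,0.25\n"
)

def load_fff(source):
    """Hypothetical loader: percent -> decimal, YYYYMM -> monthly PeriodIndex."""
    fff = pd.read_csv(source, index_col=0)
    fff.index = pd.to_datetime(fff.index.astype(str), format="%Y%m").to_period("M")
    return fff / 100

fff_sketch = load_fff(raw)
print(fff_sketch)
```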


Next, as in Ang (2014), we decompose the observed BRKA returns from January 1990 to May 2012 into the portion that is due to the market and the remainder that is not, using the CAPM as the explanatory model, i.e.:

$$ R_{brka,t} - R_{f,t} = \alpha + \beta (R_{mkt,t} - R_{f,t}) + \epsilon_{t} $$

We can use `statsmodels.api` to run the linear regression as follows:


In [11]:
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

AttributeError: module 'statsmodels' has no attribute 'tools'