
# Factor Analysis Using the CAPM and Fama-French Factor Models
The main idea in Factor Analysis is to take a set of observed returns and decompose it into a set of explanatory returns.

This notebook will follow Asset Management (Ang 2014, Oxford University Press) Chapter 10 and analyze the returns of Berkshire Hathaway.

First, the daily returns of Berkshire Hathaway are needed:

In [1]:
!git clone https://github.com/ajmalsarwary/invest_ml.git
%cd /content/invest_ml
import sys
sys.path.append('/content/invest_ml/code')

Cloning into 'invest_ml'...
remote: Enumerating objects: 292, done.
remote: Counting objects: 100% (87/87), done.
remote: Compressing objects: 100% (63/63), done.
remote: Total 292 (delta 34), reused 57 (delta 24), pack-reused 205
Receiving objects: 100% (292/292), 4.14 MiB | 21.00 MiB/s, done.
Resolving deltas: 100% (101/101), done.
/content/invest_ml


In [2]:
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import invest_risk_kit as rk
brka_d = pd.read_csv("data/brka_d_ret.csv", parse_dates=True, index_col=0)
brka_d.head()

DATE,BRKA
1990-01-02,-0.005764
1990-01-03,0.0
1990-01-04,0.005797
1990-01-05,-0.005764
1990-01-08,0.0


In [3]:
brka_d.tail()

DATE,BRKA
2018-12-24,-0.018611
2018-12-26,0.0432
2018-12-27,0.012379
2018-12-28,0.013735
2018-12-31,0.011236


Next, these daily returns need to be converted to monthly returns. The simplest way to do so is the `.resample` method, which runs an aggregation function on each group of returns in a time series. Here the grouping rule is `'M'` (monthly).

The goal is to compound the returns within each month. The toolkit already has a `compound` function, so we apply it to the daily returns.
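As a minimal sketch of what compounding within each month means, the snippet below uses a stand-in function, `compound_sketch`, on four made-up daily returns. Both the function name and the data are illustrative assumptions; the toolkit's own `rk.compound` may be implemented differently.

```python
import numpy as np
import pandas as pd

def compound_sketch(r):
    """Compound simple returns: prod(1 + r) - 1, computed via logs."""
    return np.expm1(np.log1p(r).sum())

# Four made-up daily returns spanning two calendar months (illustrative only).
daily = pd.Series(
    [0.10, -0.05, 0.02, 0.01],
    index=pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-03", "2020-02-04"]),
)

# Group the daily returns by calendar month and compound within each group.
monthly = daily.groupby(daily.index.to_period("M")).apply(compound_sketch)
print(monthly)
```

Grouping by the monthly period gives the same kind of result as resampling by month and then converting the index to periods: one compounded return per month.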

In [4]:
brka_m = brka_d.resample('M').apply(rk.compound).to_period('M')
brka_m.head()

DATE,BRKA
1990-01,-0.140634
1990-02,-0.030852
1990-03,-0.069204
1990-04,-0.003717
1990-05,0.067164


In [5]:
brka_m.to_csv("brka_m.csv")


Next, we load the explanatory variables: the Fama-French monthly factor returns.

In [6]:
fff = rk.get_fff_returns()
fff.head()

Month,Mkt-RF,SMB,HML,RF
1926-07,0.0296,-0.023,-0.0287,0.0022
1926-08,0.0264,-0.014,0.0419,0.0025
1926-09,0.0036,-0.0132,0.0001,0.0023
1926-10,-0.0324,0.0004,0.0051,0.0032
1926-11,0.0253,-0.002,-0.0035,0.0031


Next, we decompose the observed BRKA excess returns from 1990 through May 2012, as in Ang (2014), into the portion that is due to the market and the remainder that is not, using the CAPM as the explanatory model.

$$ R_{brka,t} - R_{f,t} = \alpha + \beta(R_{mkt,t} - R_{f,t}) +\epsilon_t $$
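To make the roles of $\alpha$ and $\beta$ in this equation concrete, here is a small sketch on synthetic data. The "true" alpha and beta, the volatilities, and the sample size are all made-up assumptions, and the fit uses plain least squares from NumPy rather than the notebook's own tooling.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_alpha, true_beta = 0.005, 0.6   # assumed values, for illustration only

# Simulate market excess returns and an asset that follows the CAPM equation.
mkt_excess_sim = rng.normal(0.006, 0.045, n)
noise = rng.normal(0.0, 0.02, n)
asset_excess_sim = true_alpha + true_beta * mkt_excess_sim + noise

# Regress asset excess returns on [market excess, constant].
X = np.column_stack([mkt_excess_sim, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, asset_excess_sim, rcond=None)
beta_hat, alpha_hat = coef
print(f"beta_hat={beta_hat:.3f}, alpha_hat={alpha_hat:.4f}")
```

The column of ones plays the same role as a constant term in a regression: its coefficient is the estimate of alpha.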





We can use `statsmodels.api` to run the linear regression as follows:

In [7]:
import statsmodels.api as sm

brka_excess = brka_m["1990":"2012-05"] - fff.loc["1990":"2012-05", ['RF']].values
mkt_excess = fff.loc["1990":"2012-05",['Mkt-RF']]
exp_var = mkt_excess.copy()

# OLS aims to find the linear relationship between the dependent variable (in this case,
# brka_excess, which represents the excess returns of Berkshire Hathaway over the
# risk-free rate) and one or more independent variables (exp_var, which in this scenario
# includes the market excess returns and a constant term).

# brka_excess: These are the dependent variable returns from Berkshire Hathaway,
# adjusted for the risk-free rate, implying these are the excess returns over what
# could be earned without taking any risk.
# exp_var: The explanatory variables, including the market excess returns (Mkt-RF)
# and a constant term. The constant term is included to account for the y-intercept
# in the linear regression equation.

exp_var["Constant"] = 1
lm = sm.OLS(brka_excess, exp_var).fit()

In [8]:
print(lm.summary())

                            OLS Regression Results                            
Dep. Variable:                   BRKA   R-squared:                       0.154
Model:                            OLS   Adj. R-squared:                  0.150
Method:                 Least Squares   F-statistic:                     48.45
Date:                Tue, 13 Feb 2024   Prob (F-statistic):           2.62e-11
Time:                        23:33:11   Log-Likelihood:                 388.47
No. Observations:                 269   AIC:                            -772.9
Df Residuals:                     267   BIC:                            -765.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Mkt-RF         0.5402      0.078      6.961      0.0

### The CAPM Benchmark Interpretation
This implies that the CAPM benchmark consists of 46 cents in T-Bills and 54 cents in the market, i.e. each dollar in the Berkshire Hathaway portfolio is equivalent to 46 cents in T-Bills and 54 cents in the market. Relative to this benchmark, Berkshire Hathaway is adding (i.e. has an $\alpha$ of) 0.61% per month, although the statistical significance is weak: the t-statistic on alpha is about 1.74 (p-value of about 0.08), so it is not significant at the 5% level.
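The arithmetic behind this interpretation can be sketched as follows. The beta and monthly alpha are the CAPM regression estimates reported in this notebook; the market and T-Bill returns passed to the function are hypothetical numbers chosen only to show the calculation.

```python
beta = 0.540175          # market loading from the CAPM regression
alpha_monthly = 0.006133 # monthly alpha from the CAPM regression

# A beta of 0.54 corresponds to 54 cents in the market and the rest in T-Bills.
w_market = beta
w_tbills = 1.0 - beta

def capm_benchmark(r_mkt, r_f):
    """Return of the benchmark that mixes T-Bills and the market."""
    return w_tbills * r_f + w_market * r_mkt

# Hypothetical month: market returns 2%, T-Bills return 0.3%.
bench = capm_benchmark(0.02, 0.003)

# Compounding the monthly alpha over twelve months gives an annual figure.
alpha_annual = (1 + alpha_monthly) ** 12 - 1
print(f"weights: {w_tbills:.2f} T-Bills, {w_market:.2f} market")
print(f"benchmark return: {bench:.4%}, annualized alpha: {alpha_annual:.2%}")
```

The benchmark return is the same quantity as $R_f + \beta(R_{mkt} - R_f)$, just written as portfolio weights.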

Now, let's add in some additional explanatory variables, namely Value and Size.

In [9]:
exp_var["Value"] = fff.loc["1990":"2012-05",['HML']]
exp_var["Size"] = fff.loc["1990":"2012-05",['SMB']]
exp_var.head()

Month,Mkt-RF,Constant,Value,Size
1990-01,-0.0785,1,0.0087,-0.0129
1990-02,0.0111,1,0.0061,0.0103
1990-03,0.0183,1,-0.029,0.0152
1990-04,-0.0336,1,-0.0255,-0.005
1990-05,0.0842,1,-0.0374,-0.0257


In [10]:
lm = sm.OLS(brka_excess, exp_var).fit()
print(lm.summary())

                            OLS Regression Results                            
Dep. Variable:                   BRKA   R-squared:                       0.290
Model:                            OLS   Adj. R-squared:                  0.282
Method:                 Least Squares   F-statistic:                     36.06
Date:                Tue, 13 Feb 2024   Prob (F-statistic):           1.41e-19
Time:                        23:33:12   Log-Likelihood:                 412.09
No. Observations:                 269   AIC:                            -816.2
Df Residuals:                     265   BIC:                            -801.8
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Mkt-RF         0.6761      0.074      9.155      0.0

### The Fama-French Benchmark Interpretation

The alpha has fallen from 0.61% to about 0.55% per month. The loading on the market has moved up from 0.54 to about 0.68, which means that adding these new explanatory factors did change things. If we had added irrelevant variables, the loading on the market would have been essentially unaffected.

We can interpret the positive loading on Value as saying that Hathaway has a significant Value tilt, which should not be a shock to anyone who follows Buffett. Additionally, the negative loading on Size suggests that Hathaway tends to invest in large companies, not small companies.

In other words, Hathaway appears to be a Large Value investor. Of course, you knew this if you followed the company, but the point here is that the numbers reveal it!

The new way to interpret each dollar invested in Hathaway is: 68 cents in the market, 32 cents in Bills, 38 cents in Value stocks and short 38 cents in Growth stocks, short 50 cents in SmallCap stocks and long 50 cents in LargeCap stocks. If you did all this, you would still end up underperforming Hathaway by about 55 basis points per month.
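As a rough sketch of this replication, the snippet below computes the benchmark's excess return for a single month. The market loading is the regression estimate shown above, while the Value and Size loadings are the rounded figures quoted in the text (an assumption, since the full coefficient table is not reproduced here). The factor returns are the January 1990 row of `exp_var`.

```python
# Factor loadings: market from the regression output, Value and Size as quoted in the text.
loadings = {"Mkt-RF": 0.6761, "Value": 0.38, "Size": -0.50}

# Factor returns for 1990-01, taken from the exp_var table above.
factors_1990_01 = {"Mkt-RF": -0.0785, "Value": 0.0087, "Size": -0.0129}

# Benchmark excess return = sum of loading * factor return (alpha excluded).
benchmark_excess = sum(loadings[name] * factors_1990_01[name] for name in loadings)
print(f"benchmark excess return for 1990-01: {benchmark_excess:.4%}")
```

Whatever Berkshire Hathaway earned in excess of this benchmark in a given month is attributed to alpha plus the residual.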
   
Code for the toolkit:

```python
import statsmodels.api as sm

def regress(dependent_variable, explanatory_variables, alpha=True):
    """
    Runs a linear regression to decompose the dependent variable into the explanatory variables.
    Returns a statsmodels RegressionResults object on which you can call
       .summary() to print a full summary
       .params for the coefficients
       .tvalues and .pvalues for the significance levels
       .rsquared_adj and .rsquared for quality of fit
    """
    if alpha:
        # Work on a copy so the caller's DataFrame is not modified,
        # then add a column of ones to estimate the intercept (alpha).
        explanatory_variables = explanatory_variables.copy()
        explanatory_variables["Alpha"] = 1

    lm = sm.OLS(dependent_variable, explanatory_variables).fit()
    return lm
```

In [11]:
result = rk.regress(brka_excess, mkt_excess)

In [12]:
result.params

Mkt-RF    0.540175
Alpha     0.006133
dtype: float64

In [13]:
result.tvalues

Mkt-RF    6.960550
Alpha     1.744449
dtype: float64

In [14]:
result.pvalues

Mkt-RF    2.622873e-11
Alpha     8.223148e-02
dtype: float64
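The t-values and p-values above can be related to the coefficients with a short sketch. It assumes the classical (nonrobust) OLS covariance and, for the p-value, a normal approximation to the t distribution, which is only approximately right even with a few hundred observations. The helper names and the simulated data are illustrative, not part of the toolkit.

```python
import math
import numpy as np

def ols_tvalues(X, y):
    """Return (coefficients, standard errors, t-values) for y = X @ b + e."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)        # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical coefficient covariance
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

def two_sided_p_normal(t):
    """Two-sided p-value under a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

# Simulated data with the same number of observations as the regression above.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.04, 269)
y = 0.006 + 0.5 * x + rng.normal(0.0, 0.05, 269)
X = np.column_stack([x, np.ones_like(x)])
coef, se, t = ols_tvalues(X, y)
print("t-values:", t)

# Applied to the reported alpha t-value, the approximation lands near the reported p-value.
print("approx. p-value for t = 1.744449:", two_sided_p_normal(1.744449))
```

Each t-value is simply the coefficient divided by its standard error, so a small alpha with a large standard error yields a weak t-value even when the point estimate looks economically large.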

In [15]:
result.rsquared_adj

0.15041804337083975
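For reference, adjusted R-squared can be recovered from R-squared, the number of observations, and the number of regressors. The sketch below uses the standard formula with the rounded R-squared values printed in the two summaries, so it should agree with the reported figures only to about three decimals.

```python
def adjusted_r_squared(r2, n_obs, n_regressors):
    """Adjusted R^2 for a model with an intercept plus n_regressors slope terms."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Inputs are the rounded values from the CAPM and three-factor summaries.
capm_adj = adjusted_r_squared(0.154, 269, 1)
ff3_adj = adjusted_r_squared(0.290, 269, 3)
print(round(capm_adj, 3), round(ff3_adj, 3))
```

The adjustment penalizes extra regressors, which is why it is the more useful number when comparing the one-factor and three-factor models.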

In [16]:
exp_var.head()

Month,Mkt-RF,Constant,Value,Size
1990-01,-0.0785,1,0.0087,-0.0129
1990-02,0.0111,1,0.0061,0.0103
1990-03,0.0183,1,-0.029,0.0152
1990-04,-0.0336,1,-0.0255,-0.005
1990-05,0.0842,1,-0.0374,-0.0257


In [17]:
print(rk.regress(brka_excess, exp_var, alpha=False).summary())

                            OLS Regression Results                            
Dep. Variable:                   BRKA   R-squared:                       0.290
Model:                            OLS   Adj. R-squared:                  0.282
Method:                 Least Squares   F-statistic:                     36.06
Date:                Tue, 13 Feb 2024   Prob (F-statistic):           1.41e-19
Time:                        23:34:08   Log-Likelihood:                 412.09
No. Observations:                 269   AIC:                            -816.2
Df Residuals:                     265   BIC:                            -801.8
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Mkt-RF         0.6761      0.074      9.155      0.0