# Project 2 - Fama & French three-factor model
The three-factor model extends the CAPM by adding two factors to the market factor in order to explain the excess return of an asset or portfolio. The three factors are:
+ the **market factor (MKT)**: measures the excess return of the market, analogous to the one computed in the CAPM analysis (see [Project 1](https://github.com/aambroo/AI4Finance/tree/master/Project1)).
+ the **size factor SMB** (**S**mall **M**inus **B**ig): measures the excess return of stocks with a small market cap over those with a large market cap.
+ the **value factor HML** (**H**igh **M**inus **L**ow): measures the excess return of value stocks over growth stocks. Value stocks have a high book-to-market ratio, while the growth stocks are characterized by a low ratio.
  
The model is represented by the following formula:
$$E(r_a) = r_f + \alpha + \beta_{mkt}(E(r_m)-r_f) + \beta_{smb}SMB + \beta_{hml}HML$$
$$(E(r_a) - r_f) = \alpha + \beta_{mkt}(E(r_m)-r_f) + \beta_{smb}SMB + \beta_{hml}HML$$

Where:
+ $E(r_a)$ denotes the expected return on the asset
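+ $r_f$ is the risk-free rate (e.g. the yield on a short-term government bond)
+ $E(r_m)$ denotes the expected return on the market
+ $\beta_{mkt}$, $\beta_{smb}$ and $\beta_{hml}$ are the asset's loadings on the three factors
+ $\alpha$ is the intercept. If the three factors fully explain the asset's excess returns, $\alpha$ should not be statistically different from zero; a significant non-zero $\alpha$ points to return that the model leaves unexplained.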
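
As a rough illustration of the formula above, its excess-return form is a linear regression of the asset's excess return on the three factors. The sketch below uses synthetic data only: the factor series, the coefficients `true_alpha` and `true_betas`, and the noise level are made-up assumptions, not Fama-French data. It recovers the coefficients with ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 240

# Synthetic monthly factor returns (made-up values, for illustration only)
factors = rng.normal(loc=0.0, scale=0.04, size=(n_months, 3))  # columns: mkt, smb, hml

# Assumed "true" parameters used to generate the asset's excess return
true_alpha = 0.0
true_betas = np.array([1.2, -0.5, 0.3])
noise = rng.normal(loc=0.0, scale=0.001, size=n_months)
excess_rtn = true_alpha + factors @ true_betas + noise

# OLS: prepend a column of ones so the first coefficient is the intercept (alpha)
X = np.column_stack([np.ones(n_months), factors])
coefs, *_ = np.linalg.lstsq(X, excess_rtn, rcond=None)
alpha_hat, betas_hat = coefs[0], coefs[1:]

print('alpha:', round(float(alpha_hat), 4))
print('betas:', np.round(betas_hat, 3))
```

Because the data were generated with $\alpha = 0$, the estimated intercept should come out close to zero, which is the behaviour expected when the factors explain the excess return.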

The following lines of code contain an implementation of the three-factor model.


In [1]:
# IMPORTS
import pandas as pd
import pandas_datareader as pdr
import statsmodels.formula.api as smf
import datetime as dt
import matplotlib.pyplot as plt
from utils import three_factor_model

## Single-Stock Portfolio

In [2]:
# DEFINE PARAMETERS
ASSET = 'AAPL'
BENCHMARK = '^GSPC'
START_DATE = '1992-01-01'
END_DATE = '2022-01-01'

In [3]:
# USE FAMA&FRENCH DATAFRAME
dateparse = lambda x: dt.datetime.strptime(x,'%Y%m')
factor_df = pd.read_csv('./data/F-F_Research_Data_Factors.CSV',
                        header=0,
                        names=['Date','Mkt-RF','SMB','HML','RF'],
                        parse_dates=['Date'], date_parser=dateparse,
                        index_col=0,
                        skipfooter=99,
                        skiprows=3,
                        engine='python')
# Rename columns
factor_df.columns = ['mkt', 'smb', 'hml', 'rf']
# Filter dataframe by START_DATE and END_DATE
factor_df = factor_df.loc[START_DATE:END_DATE]
# Convert values to numeric and divide by 100
factor_df = factor_df.apply(pd.to_numeric, errors='coerce').div(100)
factor_df

Date,mkt,smb,hml,rf
1992-01-01,-0.0059,0.0846,0.0470,0.0034
1992-02-01,0.0109,0.0087,0.0647,0.0028
1992-03-01,-0.0266,-0.0104,0.0355,0.0034
1992-04-01,0.0107,-0.0606,0.0432,0.0032
1992-05-01,0.0030,0.0041,0.0119,0.0028
...,...,...,...,...
2021-09-01,-0.0437,0.0080,0.0509,0.0000
2021-10-01,0.0665,-0.0228,-0.0044,0.0000
2021-11-01,-0.0155,-0.0135,-0.0053,0.0000
2021-12-01,0.0310,-0.0157,0.0323,0.0001
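
The `date_parser` argument of `pd.read_csv` was deprecated in pandas 2.0, so on newer installations the cell above may emit a warning. A minimal alternative, assuming the `Date` column holds `YYYYMM` strings as in the monthly factor file, is to read that column as text and convert it with `pd.to_datetime`. The inline CSV below is a tiny stand-in for the real file; its rows were reconstructed by multiplying the first displayed values by 100, so treat them as illustrative.

```python
import io
import pandas as pd

# Tiny stand-in for the factor file, with values in percent
raw = io.StringIO(
    "Date,Mkt-RF,SMB,HML,RF\n"
    "199201,-0.59,8.46,4.70,0.34\n"
    "199202,1.09,0.87,6.47,0.28\n"
    "199203,-2.66,-1.04,3.55,0.34\n"
)

# Read the date as text, then parse the YYYYMM format explicitly
sample = pd.read_csv(raw, dtype={'Date': str})
sample['Date'] = pd.to_datetime(sample['Date'], format='%Y%m')
sample = sample.set_index('Date')

# Same renaming and percent-to-decimal conversion as in the notebook
sample.columns = ['mkt', 'smb', 'hml', 'rf']
sample = sample.div(100)
print(sample)
```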


In [4]:
# DOWNLOAD DATA FROM YAHOO FINANCE OF ASSET
stock_df = pdr.get_data_yahoo(ASSET, START_DATE, END_DATE, interval='m')
stock_df.tail()

Date,High,Low,Open,Close,Volume,Adj Close
2021-09-01,157.259995,141.270004,152.830002,141.5,1797835000.0,141.113998
2021-10-01,153.169998,138.270004,141.899994,149.800003,1565079000.0,149.391357
2021-11-01,165.699997,147.479996,148.990005,165.300003,1691029000.0,164.849091
2021-12-01,182.130005,157.800003,167.479996,177.570007,2444767000.0,177.344055
2022-01-01,182.940002,154.699997,177.830002,174.779999,2108446000.0,174.557602


In [5]:
# CALCULATE MONTHLY RETURNS
monthly_rets = stock_df['Adj Close'].pct_change().dropna()
monthly_rets.name = 'rtn'
monthly_rets

Date
1992-02-01    0.042471
1992-03-01   -0.135423
1992-04-01    0.032189
1992-05-01   -0.006237
1992-06-01   -0.196653
                ...   
2021-09-01   -0.066640
2021-10-01    0.058657
2021-11-01    0.103471
2021-12-01    0.075796
2022-01-01   -0.015712
Name: rtn, Length: 360, dtype: float64
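
`pct_change` computes the simple return $r_t = P_t / P_{t-1} - 1$. As a quick check, the sketch below applies it to the last few adjusted closes from the price table above (copied by hand, so treat them as illustrative) and compares the result with the formula written out explicitly.

```python
import pandas as pd

adj_close = pd.Series(
    [141.113998, 149.391357, 164.849091, 177.344055],
    index=pd.to_datetime(['2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01']),
    name='Adj Close',
)

# Library call used in the notebook
rets = adj_close.pct_change().dropna()

# Same quantity written out: P_t / P_{t-1} - 1
manual = (adj_close / adj_close.shift(1) - 1).dropna()

print(rets.round(6))
```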

In [6]:
# MERGE DATASETS AND CALCULATE EXCESS RETURN
# (Excess Return) = (Return on Asset) - (Risk-free Return)
ff_data = factor_df.join(monthly_rets, on=factor_df.index).dropna()
#ff_data.columns = ['mkt', 'smb', 'hml', 'rf', 'rtn']
ff_data['excess_rtn'] = ff_data.rtn - ff_data.rf    # Excess Return
ff_data

Date,mkt,smb,hml,rf,rtn,excess_rtn
1992-02-01,0.0109,0.0087,0.0647,0.0028,0.042471,0.039671
1992-03-01,-0.0266,-0.0104,0.0355,0.0034,-0.135423,-0.138823
1992-04-01,0.0107,-0.0606,0.0432,0.0032,0.032189,0.028989
1992-05-01,0.0030,0.0041,0.0119,0.0028,-0.006237,-0.009037
1992-06-01,-0.0234,-0.0307,0.0325,0.0032,-0.196653,-0.199853
...,...,...,...,...,...,...
2021-09-01,-0.0437,0.0080,0.0509,0.0000,-0.066640,-0.066640
2021-10-01,0.0665,-0.0228,-0.0044,0.0000,0.058657,0.058657
2021-11-01,-0.0155,-0.0135,-0.0053,0.0000,0.103471,0.103471
2021-12-01,0.0310,-0.0157,0.0323,0.0001,0.075796,0.075696


Let's slice the dataset into the following time windows, each running from its start date to the last available observation:
+ $30$ Y
+ $20$ Y
+ $10$ Y
+ $5$ Y

In [7]:
# DATAFRAME SLICING
ff_data_30y = ff_data.loc[ff_data.index >= '1992-01-01']
ff_data_20y = ff_data.loc[ff_data.index >= '2002-01-01']
ff_data_10y = ff_data.loc[ff_data.index >= '2012-01-01']
ff_data_5y = ff_data.loc[ff_data.index >= '2017-01-01']
datasets = [ff_data_30y, ff_data_20y, ff_data_10y, ff_data_5y]
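
Since the list `datasets` carries no labels, it is easy to lose track of which frame belongs to which window. One possible alternative, sketched here on a synthetic frame, keeps each label next to its start date. The `demo` data and the `cutoffs` and `windows` names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ff_data: one row per month, 1992-02 through 2021-12
idx = pd.date_range('1992-02-01', '2021-12-01', freq='MS')
demo = pd.DataFrame({'excess_rtn': np.zeros(len(idx))}, index=idx)

# Same start dates as in the cell above, keyed by window label
cutoffs = {'30Y': '1992-01-01', '20Y': '2002-01-01',
           '10Y': '2012-01-01', '5Y': '2017-01-01'}
windows = {label: demo.loc[demo.index >= start] for label, start in cutoffs.items()}

for label, frame in windows.items():
    print(label, len(frame), frame.index.min().date())
```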

In [8]:
# ESTIMATE THE THREE-FACTOR MODEL
ff_model = smf.ols(
    formula='excess_rtn ~ mkt + smb + hml',
    data=ff_data_5y).fit()
ff_model.summary()

Dep. Variable:,excess_rtn,R-squared:,0.475
Model:,OLS,Adj. R-squared:,0.448
Method:,Least Squares,F-statistic:,17.22
Date:,"Sun, 03 Apr 2022",Prob (F-statistic):,4.43e-08
Time:,20:25:06,Log-Likelihood:,84.478
No. Observations:,61,AIC:,-161.0
Df Residuals:,57,BIC:,-152.5
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.0140,0.008,1.643,0.106,-0.003,0.031
mkt,1.2512,0.183,6.850,0.000,0.885,1.617
smb,-0.4828,0.312,-1.545,0.128,-1.108,0.143
hml,-0.5206,0.203,-2.563,0.013,-0.927,-0.114

Omnibus:,7.408,Durbin-Watson:,2.212
Prob(Omnibus):,0.025,Jarque-Bera (JB):,8.304
Skew:,-0.494,Prob(JB):,0.0157
Kurtosis:,4.513,Cond. No.,40.0
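
To make the summary less of a black box, the intercept's p-value and 95% confidence interval can be reproduced by hand: $t = \hat\alpha / \mathrm{se}(\hat\alpha)$, and the interval is $\hat\alpha \pm t_{0.975,\nu}\,\mathrm{se}(\hat\alpha)$ with $\nu = 57$ residual degrees of freedom. The sketch below assumes SciPy is installed and backs out the standard error from the 5Y estimate and t-value at six decimals ($0.013967$ and $1.643435$), because the `std err` column is rounded to three decimals.

```python
from scipy import stats

alpha_hat = 0.013967   # 5Y intercept
t_alpha = 1.643435     # its t-statistic
df_resid = 57          # "Df Residuals" in the summary

se_alpha = alpha_hat / t_alpha                     # implied by t = coef / se
t_crit = stats.t.ppf(0.975, df_resid)              # two-sided 5% critical value
ci_low = alpha_hat - t_crit * se_alpha
ci_high = alpha_hat + t_crit * se_alpha
p_value = 2 * stats.t.sf(abs(t_alpha), df_resid)   # two-sided p-value

print(f'se = {se_alpha:.4f}, CI = [{ci_low:.3f}, {ci_high:.3f}], p = {p_value:.3f}')
```

The interval contains zero, which is another way of saying that the intercept is not significant at the 5% level.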


In [10]:
# alpha
alpha = three_factor_model(datasets)[['interval', 'alpha', 't-alpha', 'p-alpha']]
alpha

,interval,alpha,t-alpha,p-alpha
0,30Y,0.0151,2.6561,0.0083
1,20Y,0.0203,3.9138,0.0001
2,10Y,0.0078,1.324,0.1881
3,5Y,0.014,1.6434,0.1058
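
The `three_factor_model` helper is imported from `utils`, and its source is not part of this notebook. Judging only from the columns it returns, it presumably fits the same OLS regression on each window and collects the coefficient, t-value and p-value of every term. The function below, `three_factor_model_sketch`, is a hypothetical reconstruction along those lines, demonstrated on synthetic data. Its name, the `labels` argument and the rounding to four decimals are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def three_factor_model_sketch(datasets, labels=('30Y', '20Y', '10Y', '5Y')):
    """Fit excess_rtn ~ mkt + smb + hml on each frame and tabulate the results."""
    rows = []
    for label, data in zip(labels, datasets):
        fit = smf.ols('excess_rtn ~ mkt + smb + hml', data=data).fit()
        row = {'interval': label}
        # statsmodels names the constant term 'Intercept'; report it as alpha
        for term, name in [('Intercept', 'alpha'), ('mkt', 'mkt'),
                           ('smb', 'smb'), ('hml', 'hml')]:
            row[name] = fit.params[term]
            row[f't-{name}'] = fit.tvalues[term]
            row[f'p-{name}'] = fit.pvalues[term]
        rows.append(row)
    return pd.DataFrame(rows).round(4)


# Synthetic demo data (made-up factor returns, not Fama-French data)
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(0, 0.04, size=(120, 3)), columns=['mkt', 'smb', 'hml'])
demo['excess_rtn'] = 1.1 * demo['mkt'] - 0.3 * demo['smb'] + rng.normal(0, 0.01, 120)

result = three_factor_model_sketch([demo, demo.tail(60)], labels=('10Y', '5Y'))
print(result)
```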


In [11]:
# mkt
mkt = three_factor_model(datasets)[['interval', 'mkt', 't-mkt', 'p-mkt']]
mkt

,interval,mkt,t-mkt,p-mkt
0,30Y,1.1763,8.7915,0.0
1,20Y,1.2752,10.2043,0.0
2,10Y,1.2796,8.6331,0.0
3,5Y,1.2512,6.8502,0.0


In [12]:
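# smb
# NOTE: 'p-mkt' is selected here instead of 'p-smb', so the last column of the
# output reports the market-factor p-values, not the SMB ones.
smb = three_factor_model(datasets)[['interval', 'smb', 't-smb', 'p-mkt']]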
smb

,interval,smb,t-smb,p-mkt
0,30Y,0.1568,0.8448,0.0
1,20Y,-0.1191,-0.5458,0.0
2,10Y,-0.5287,-2.243,0.0
3,5Y,-0.4828,-1.5454,0.0


In [13]:
# hml
hml = three_factor_model(datasets)[['interval', 'hml', 't-hml', 'p-hml']]
hml

,interval,hml,t-hml,p-hml
0,30Y,-0.8297,-4.6838,0.0
1,20Y,-0.6009,-3.3893,0.0008
2,10Y,-0.5861,-3.3811,0.001
3,5Y,-0.5206,-2.5627,0.0131


In [14]:
# # PLOTTING THE RESULTS
# plot_summary = three_factor_model(datasets)[['Intercept', 'mkt', 'smb', 'hml']]
# plot_summary.plot(
#     title = 'Fama and French Three Factor Model',
#     )
# plt.show()

## Commenting on the Results
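Two questions matter most:
+ whether the intercept is positive and statistically significant
+ which factors are statistically significant, and whether their direction matches past results or our assumptions.

For the 5Y window shown in the regression summary, the intercept is positive ($0.0140$) but not statistically significant at the $5$% level ($p = 0.106$); the same holds for the 10Y window ($p = 0.1881$). Over the 30Y and 20Y windows, however, the intercept is positive and significant ($p = 0.0083$ and $p = 0.0001$), so over those horizons the three factors do not fully explain AAPL's excess return. The market loading is above $1$ and highly significant in every window, and the HML loading is negative and significant throughout, which is consistent with AAPL behaving like a growth stock.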
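
The significance check described above can be made explicit. The sketch below hard-codes the single-stock alphas and p-values from the alpha table earlier in this notebook and flags the windows in which $\alpha = 0$ is rejected at the 5% level. The `LEVEL` constant and the variable names are illustrative.

```python
LEVEL = 0.05

# (interval, alpha, p-value) copied from the single-stock alpha table
single_stock_alpha = [
    ('30Y', 0.0151, 0.0083),
    ('20Y', 0.0203, 0.0001),
    ('10Y', 0.0078, 0.1881),
    ('5Y', 0.0140, 0.1058),
]

# Windows whose intercept is significantly different from zero
significant = [interval for interval, _, p in single_stock_alpha if p < LEVEL]

for interval, alpha, p in single_stock_alpha:
    verdict = 'significant' if p < LEVEL else 'not significant'
    print(f'{interval}: alpha = {alpha:.4f}, p = {p:.4f} -> {verdict}')
```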



<!-- I drew a table featuring each parameter's value and t-value (to assess its statistical significance): -->
<!-- <table>
  <tr>
    <th>Interval</th>
    <th>Alpha</th> 
    <th>t-alpha</th>
  </tr>
  <tr>
    <td>30Y</td>
    <td>0.015121</td> 
    <td>2.656052</td>
  </tr>
  <tr>
    <td>20Y</td>
    <td>0.020302</td> 
    <td>3.913835</td>
  </tr>
  <tr>
    <td>10Y</td>
    <td>0.007810</td> 
    <td>1.323986</td>
  </tr>
  <tr>
    <td>5Y</td>
    <td>0.013967</td> 
    <td>1.643435</td>
  </tr>
</table> -->

<!-- | Interval  | Alpha       | t-alpha     |
| :---:     | :---:       | :---:       |
| 30Y       | $0.015121$  | $2.656052$  |
| 20Y       | $0.020302$  | $3.913835$  |
| 10Y       | $0.007810$  | $1.323986$  |
| 5Y        | $0.013967$  | $1.643435$  |


| Interval  | Market      | t-market    |
| :---:     | :---:       | :---:       |
| 30Y       | $1.176278$  | $8.791539$  |
| 20Y       | $1.275240$  | $10.204346$ |
| 10Y       | $1.279572$  | $8.633122$  |
| 5Y        | $1.251181$  | $6.850182$  | -->

## Multiple-Stock Portfolio

In [15]:
ASSETS = ['AAPL', 'MSFT', 'NKE', 'IBM', 'AMD', 'HD', 'BA', 'DIS', 'MO', 'PFE']
START_DATE = dt.datetime(1992,1,1)
END_DATE = dt.datetime(2022,1,1)

In [16]:
# DOWNLOAD DATA FROM YAHOO FINANCE OF PORTFOLIO
portfolio_df = pdr.get_data_yahoo(ASSETS, START_DATE, END_DATE, interval='m')
portfolio_df.tail()

Attributes,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,...,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume
Symbols,AAPL,MSFT,NKE,IBM,AMD,HD,BA,DIS,MO,PFE,...,AAPL,MSFT,NKE,IBM,AMD,HD,BA,DIS,MO,PFE
Date
2021-09-01,141.113998,280.824524,144.639389,129.5112,102.900002,323.303223,219.940002,169.169998,43.118046,42.31403,...,1797835000.0,502918700.0,171235500.0,80160627.0,867104200.0,67432900.0,194135500.0,183753100.0,157267400.0,544666400.0
2021-10-01,149.391357,330.33136,166.60968,116.618813,120.230003,367.988159,207.029999,169.070007,42.54707,43.032223,...,1565079000.0,516515800.0,132971400.0,150725359.0,930236100.0,59686400.0,171554300.0,181076800.0,157527000.0,471555700.0
2021-11-01,164.849091,329.305389,168.551758,114.182579,158.369995,396.566772,197.850006,144.899994,41.129154,52.860565,...,1691029000.0,509885200.0,117595600.0,120104799.0,1373609000.0,76047100.0,222628900.0,349411000.0,151096900.0,1010246000.0
2021-12-01,177.344055,335.626038,165.992203,132.069168,143.899994,410.821472,201.320007,154.889999,45.71085,58.60442,...,2444767000.0,625674800.0,123481200.0,113968900.0,1175494000.0,84890300.0,212678500.0,250556100.0,187391300.0,1064029000.0
2022-01-01,174.557602,310.338318,147.732895,131.98024,114.25,364.778625,200.240005,142.970001,50.019573,52.292412,...,2108446000.0,947531400.0,131502000.0,146976800.0,1638613000.0,101082900.0,219087100.0,269830300.0,208636800.0,778212000.0


In [17]:
# CALCULATE MONTHLY RETURNS
portfolio_monthly_rets = portfolio_df['Adj Close'].pct_change().mean(axis=1).dropna()
portfolio_monthly_rets.name = 'rtn'
portfolio_monthly_rets

Date
1992-02-01    0.009603
1992-03-01   -0.043091
1992-04-01   -0.002011
1992-05-01    0.011113
1992-06-01   -0.085974
                ...   
2021-09-01   -0.053541
2021-10-01    0.053837
2021-11-01    0.049379
2021-12-01    0.048757
2022-01-01   -0.061562
Name: rtn, Length: 360, dtype: float64
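
Taking `.mean(axis=1)` of the per-stock returns amounts to an equally weighted portfolio that is rebalanced every month. The sketch below makes the weights explicit on a two-stock toy frame, with prices copied by hand from the AAPL and MSFT columns above. The `weights` vector is an assumption that could be changed to model other allocations.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame(
    {'AAPL': [141.113998, 149.391357, 164.849091],
     'MSFT': [280.824524, 330.331360, 329.305389]},
    index=pd.to_datetime(['2021-09-01', '2021-10-01', '2021-11-01']),
)
asset_rets = prices.pct_change().dropna()

# Equal weights reproduce the row-wise mean used in the notebook
n_assets = asset_rets.shape[1]
weights = np.full(n_assets, 1 / n_assets)
weighted = asset_rets.dot(weights)
equal_mean = asset_rets.mean(axis=1)

print(pd.DataFrame({'weighted': weighted, 'mean': equal_mean}))
```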

In [18]:
# MERGE DATASETS AND CALCULATE EXCESS RETURN
# (Excess Return) = (Return on Asset) - (Risk-free Return)
ff_data = factor_df.join(portfolio_monthly_rets, on=factor_df.index).dropna()
#ff_data.columns = ['mkt', 'smb', 'hml', 'rf', 'rtn']
ff_data['excess_rtn'] = ff_data.rtn - ff_data.rf    # Excess Return
ff_data

Date,mkt,smb,hml,rf,rtn,excess_rtn
1992-02-01,0.0109,0.0087,0.0647,0.0028,0.009603,0.006803
1992-03-01,-0.0266,-0.0104,0.0355,0.0034,-0.043091,-0.046491
1992-04-01,0.0107,-0.0606,0.0432,0.0032,-0.002011,-0.005211
1992-05-01,0.0030,0.0041,0.0119,0.0028,0.011113,0.008313
1992-06-01,-0.0234,-0.0307,0.0325,0.0032,-0.085974,-0.089174
...,...,...,...,...,...,...
2021-09-01,-0.0437,0.0080,0.0509,0.0000,-0.053541,-0.053541
2021-10-01,0.0665,-0.0228,-0.0044,0.0000,0.053837,0.053837
2021-11-01,-0.0155,-0.0135,-0.0053,0.0000,0.049379,0.049379
2021-12-01,0.0310,-0.0157,0.0323,0.0001,0.048757,0.048657


In [19]:
# PORTFOLIO DATAFRAME SLICING
ff_data_30y = ff_data.loc[ff_data.index >= '1992-01-01']
ff_data_20y = ff_data.loc[ff_data.index >= '2002-01-01']
ff_data_10y = ff_data.loc[ff_data.index >= '2012-01-01']
ff_data_5y = ff_data.loc[ff_data.index >= '2017-01-01']
pf_datasets = [ff_data_30y, ff_data_20y, ff_data_10y, ff_data_5y]

In [20]:
pf_summary = three_factor_model(pf_datasets)

In [21]:
# alpha
pf_alpha = three_factor_model(pf_datasets)[['interval', 'alpha', 't-alpha', 'p-alpha']]
pf_alpha

,interval,alpha,t-alpha,p-alpha
0,30Y,0.0061,3.7218,0.0002
1,20Y,0.0057,3.415,0.0008
2,10Y,0.0036,1.916,0.0578
3,5Y,0.0039,1.4042,0.1657


In [22]:
# mkt
pf_mkt = three_factor_model(pf_datasets)[['interval', 'mkt', 't-mkt', 'p-mkt']]
pf_mkt

,interval,mkt,t-mkt,p-mkt
0,30Y,1.0872,28.241,0.0
1,20Y,1.1354,28.3607,0.0
2,10Y,1.1116,23.2247,0.0
3,5Y,1.1046,18.4772,0.0


In [23]:
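# smb
# NOTE: 'p-mkt' is selected here instead of 'p-smb', so the last column of the
# output reports the market-factor p-values; the SMB p-values appear in the
# full pf_summary table at the end of the notebook.
pf_smb = three_factor_model(pf_datasets)[['interval', 'smb', 't-smb', 'p-mkt']]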
pf_smb

,interval,smb,t-smb,p-mkt
0,30Y,-0.1745,-3.2674,0.0
1,20Y,-0.1657,-2.3716,0.0
2,10Y,-0.2514,-3.3025,0.0
3,5Y,-0.2286,-2.2353,0.0


In [24]:
# hml
pf_hml = three_factor_model(pf_datasets)[['interval', 'hml', 't-hml', 'p-hml']]
pf_hml

,interval,hml,t-hml,p-hml
0,30Y,-0.0911,-1.7881,0.0746
1,20Y,-0.0999,-1.7585,0.08
2,10Y,-0.1098,-1.9623,0.0521
3,5Y,-0.1034,-1.5544,0.1256


In [25]:
print('Single Stock:\n {}'.format(alpha))
print('Portfolio: \n{}'.format(pf_alpha))

Single Stock:
   interval   alpha  t-alpha  p-alpha
0      30Y  0.0151   2.6561   0.0083
1      20Y  0.0203   3.9138   0.0001
2      10Y  0.0078   1.3240   0.1881
3       5Y  0.0140   1.6434   0.1058
Portfolio: 
  interval   alpha  t-alpha  p-alpha
0      30Y  0.0061   3.7218   0.0002
1      20Y  0.0057   3.4150   0.0008
2      10Y  0.0036   1.9160   0.0578
3       5Y  0.0039   1.4042   0.1657


In [26]:
print('Single Stock:\n {}'.format(mkt))
print('Portfolio: \n{}'.format(pf_mkt))

Single Stock:
   interval     mkt    t-mkt  p-mkt
0      30Y  1.1763   8.7915    0.0
1      20Y  1.2752  10.2043    0.0
2      10Y  1.2796   8.6331    0.0
3       5Y  1.2512   6.8502    0.0
Portfolio: 
  interval     mkt    t-mkt  p-mkt
0      30Y  1.0872  28.2410    0.0
1      20Y  1.1354  28.3607    0.0
2      10Y  1.1116  23.2247    0.0
3       5Y  1.1046  18.4772    0.0


In [27]:
print('Single Stock:\n {}'.format(smb))
print('Portfolio: \n{}'.format(pf_smb))

Single Stock:
   interval     smb   t-smb  p-mkt
0      30Y  0.1568  0.8448    0.0
1      20Y -0.1191 -0.5458    0.0
2      10Y -0.5287 -2.2430    0.0
3       5Y -0.4828 -1.5454    0.0
Portfolio: 
  interval     smb   t-smb  p-mkt
0      30Y -0.1745 -3.2674    0.0
1      20Y -0.1657 -2.3716    0.0
2      10Y -0.2514 -3.3025    0.0
3       5Y -0.2286 -2.2353    0.0


In [28]:
print('Single Stock:\n {}'.format(hml))
print('Portfolio: \n{}'.format(pf_hml))

Single Stock:
   interval     hml   t-hml   p-hml
0      30Y -0.8297 -4.6838  0.0000
1      20Y -0.6009 -3.3893  0.0008
2      10Y -0.5861 -3.3811  0.0010
3       5Y -0.5206 -2.5627  0.0131
Portfolio: 
  interval     hml   t-hml   p-hml
0      30Y -0.0911 -1.7881  0.0746
1      20Y -0.0999 -1.7585  0.0800
2      10Y -0.1098 -1.9623  0.0521
3       5Y -0.1034 -1.5544  0.1256


## Commenting on the Results
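The same two questions apply to the portfolio:
+ whether the intercept is positive and statistically significant
+ which factors are statistically significant, and whether their direction matches past results or our assumptions.

The portfolio's intercept is positive in every window, but it is statistically significant at the $5$% level only over 30Y and 20Y ($p = 0.0002$ and $p = 0.0008$). Over 10Y and 5Y it is not ($0.0036$ with $p = 0.0578$, and $0.0039$ with $p = 0.1657$). The market loading stays close to $1.1$ and is highly significant throughout. The SMB loading is negative and significant in every window (see the `p-smb` column of the full summary below), while the HML loading is negative but not significant at the $5$% level. Compared with the single stock, the alphas are smaller and the market loading is estimated far more precisely.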

In [29]:
pf_summary = three_factor_model(pf_datasets)
pf_summary

,interval,alpha,t-alpha,p-alpha,mkt,t-mkt,p-mkt,smb,t-smb,p-smb,hml,t-hml,p-hml
0,30Y,0.0061,3.7218,0.0002,1.0872,28.241,0.0,-0.1745,-3.2674,0.0012,-0.0911,-1.7881,0.0746
1,20Y,0.0057,3.415,0.0008,1.1354,28.3607,0.0,-0.1657,-2.3716,0.0185,-0.0999,-1.7585,0.08
2,10Y,0.0036,1.916,0.0578,1.1116,23.2247,0.0,-0.2514,-3.3025,0.0013,-0.1098,-1.9623,0.0521
3,5Y,0.0039,1.4042,0.1657,1.1046,18.4772,0.0,-0.2286,-2.2353,0.0293,-0.1034,-1.5544,0.1256
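
To read the full summary at a glance, each p-value column can be turned into a significance flag. The sketch below rebuilds the p-value columns of the table above by hand, so the frame `pvals` is a copy and not the output of `three_factor_model`, and marks the terms that are significant at the 5% level.

```python
import pandas as pd

# p-values copied from the portfolio summary table
pvals = pd.DataFrame(
    {'p-alpha': [0.0002, 0.0008, 0.0578, 0.1657],
     'p-mkt':   [0.0, 0.0, 0.0, 0.0],
     'p-smb':   [0.0012, 0.0185, 0.0013, 0.0293],
     'p-hml':   [0.0746, 0.0800, 0.0521, 0.1256]},
    index=['30Y', '20Y', '10Y', '5Y'],
)

# True where the term is significant at the 5% level
is_significant = pvals < 0.05
is_significant.columns = [c.replace('p-', '') for c in pvals.columns]
print(is_significant)
```

On these numbers the market and size loadings are significant in every window, the value loading in none, and alpha only over the 30Y and 20Y windows.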
