The Fama-French Five-Factor Model

In [10]:
import pandas_datareader.data as web
import statsmodels.api as stp
import pandas as pd

Download the Fama-French five factors at a monthly frequency for the period 2010 to 2017.

In [3]:
ff_factor = 'F-F_Research_Data_5_Factors_2x3'
# DataReader returns a dict of tables; [0] is the monthly factor returns (in percent)
ff_factor_data = web.DataReader(ff_factor, 'famafrench',
                                start='2010', end='2017-12')[0]
ff_factor_data.info()

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 6 columns):
Mkt-RF    96 non-null float64
SMB       96 non-null float64
HML       96 non-null float64
RMW       96 non-null float64
CMA       96 non-null float64
RF        96 non-null float64
dtypes: float64(6)
memory usage: 5.2 KB


As test assets, use a panel of the 17 industry portfolios at a monthly frequency, converted to excess returns by subtracting the risk-free rate.

In [4]:
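ff_portf = '17_Industry_Portfolios'
ff_portf_data = web.DataReader(ff_portf, 'famafrench', start='2010',
                               end='2017-12')[0]  # first table: monthly returns
# Subtract each month's risk-free rate from every portfolio to get excess returns
ff_portf_data = ff_portf_data.sub(ff_factor_data.RF, axis=0)
ff_factor_data = ff_factor_data.drop('RF', axis=1)  # RF is not a pricing factor
ff_portf_data.info()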

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 17 columns):
Food     96 non-null float64
Mines    96 non-null float64
Oil      96 non-null float64
Clths    96 non-null float64
Durbl    96 non-null float64
Chems    96 non-null float64
Cnsum    96 non-null float64
Cnstr    96 non-null float64
Steel    96 non-null float64
FabPr    96 non-null float64
Machn    96 non-null float64
Cars     96 non-null float64
Trans    96 non-null float64
Utils    96 non-null float64
Rtail    96 non-null float64
Finan    96 non-null float64
Other    96 non-null float64
dtypes: float64(17)
memory usage: 13.5 KB
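The `axis=0` argument in the subtraction above matters: it aligns the risk-free series with the row (date) index, so each month's rate is subtracted from every industry column. The toy sketch below uses made-up numbers and hypothetical names (`portfolios`, `rf`), not the Fama-French data, to show the difference.

```python
import pandas as pd

# Hypothetical two-month example with two "industries".
months = ['2010-01', '2010-02']
portfolios = pd.DataFrame({'Food': [1.0, 2.0], 'Oil': [3.0, -1.0]}, index=months)
rf = pd.Series([0.1, 0.2], index=months)

# axis=0: match rf's index against the rows and subtract row by row.
excess = portfolios.sub(rf, axis=0)
print(excess)

# The default axis='columns' matches rf's index against the column names
# instead; nothing lines up, so every cell becomes NaN.
misaligned = portfolios.sub(rf)
print(misaligned.isna().all().all())
```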


# Fama-MacBeth Regression

### First stage: N time-series regressions, one for each portfolio
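For each industry portfolio $i = 1, \dots, N$ (here $N = 17$), the code below regresses the monthly excess return on a constant and the five factors over $t = 1, \dots, T$ (here $T = 96$):

$$
R^e_{i,t} = \alpha_i + \beta_{i,\mathrm{MKT}}\,\mathrm{MKT}_t + \beta_{i,\mathrm{SMB}}\,\mathrm{SMB}_t + \beta_{i,\mathrm{HML}}\,\mathrm{HML}_t + \beta_{i,\mathrm{RMW}}\,\mathrm{RMW}_t + \beta_{i,\mathrm{CMA}}\,\mathrm{CMA}_t + \varepsilon_{i,t}
$$

where $R^e_{i,t}$ is the portfolio return minus the risk-free rate and $\mathrm{MKT}_t$ is the Mkt-RF factor. Only the five slope estimates $\hat{\boldsymbol\beta}_i$ are kept; the intercept $\hat\alpha_i$ is dropped.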

In [12]:
# First pass: regress each industry's excess returns on a constant and the five factors
betas = []
for industry in ff_portf_data:
    step1 = stp.OLS(endog=ff_portf_data[industry],
                    exog=stp.add_constant(ff_factor_data)).fit()
    betas.append(step1.params.drop('const'))  # keep the factor loadings only
betas = pd.DataFrame(betas, columns=ff_factor_data.columns,
                     index=ff_portf_data.columns)
betas.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17 entries, Food  to Other
Data columns (total 5 columns):
Mkt-RF    17 non-null float64
SMB       17 non-null float64
HML       17 non-null float64
RMW       17 non-null float64
CMA       17 non-null float64
dtypes: float64(5)
memory usage: 1.4+ KB


### Second stage: T cross-sectional regressions, one for each time period
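Each month $t$, the code below regresses the cross-section of the $N$ excess returns $R^e_{i,t}$ (industry $i$'s return minus the risk-free rate) on the first-stage loadings $\hat{\boldsymbol\beta}_i = (\hat\beta_{i,\mathrm{MKT}}, \hat\beta_{i,\mathrm{SMB}}, \hat\beta_{i,\mathrm{HML}}, \hat\beta_{i,\mathrm{RMW}}, \hat\beta_{i,\mathrm{CMA}})^\top$, without a constant:

$$
R^e_{i,t} = \hat{\boldsymbol\beta}_i^\top \boldsymbol\lambda_t + \eta_{i,t}, \qquad i = 1, \dots, N
$$

The five estimated slopes $\hat{\boldsymbol\lambda}_t$ are that month's factor risk premia.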

In [13]:
# Second pass: each month, regress the cross-section of excess returns on the betas
lambdas = []
for period in ff_portf_data.index:
    step2 = stp.OLS(endog=ff_portf_data.loc[period, betas.index],
                    exog=betas).fit()  # no constant: slopes are the period's premia
    lambdas.append(step2.params)
lambdas = pd.DataFrame(lambdas, index=ff_portf_data.index,
                       columns=betas.columns.tolist())
lambdas.info()

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 5 columns):
Mkt-RF    96 non-null float64
SMB       96 non-null float64
HML       96 non-null float64
RMW       96 non-null float64
CMA       96 non-null float64
dtypes: float64(5)
memory usage: 7.0 KB


Finally, average the 96 monthly estimates to obtain the factor risk premium estimates:

In [14]:
lambdas.mean()

Mkt-RF    1.181043
SMB       0.112553
HML      -1.234931
RMW      -0.341728
CMA      -0.627899
dtype: float64
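The average alone says nothing about precision. The usual Fama-MacBeth standard error for factor $k$ is the sample standard deviation of the monthly $\hat\lambda_{k,t}$ divided by $\sqrt{T}$, and the t-statistic is the average divided by that standard error (this simple version ignores the errors-in-variables problem caused by using estimated betas). Below is a minimal NumPy sketch of both passes plus that t-statistic, run on simulated data; the function name `fama_macbeth` and all inputs are illustrative assumptions, not the notebook's data.

```python
import numpy as np


def fama_macbeth(excess_returns, factors):
    """Two-pass Fama-MacBeth estimation with plain least squares.

    excess_returns: (T, N) array, one column per test portfolio
    factors:        (T, K) array, one column per factor
    Returns (lambda_mean, lambda_se, lambda_tstat, betas).
    """
    T = excess_returns.shape[0]

    # First pass: all N time-series regressions at once (lstsq solves each
    # column of the right-hand side separately). A constant is included.
    X = np.column_stack([np.ones(T), factors])
    coefs, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)  # (K+1, N)
    betas = coefs[1:].T  # (N, K): drop the intercept row

    # Second pass: all T cross-sectional regressions at once, no constant.
    lambdas, *_ = np.linalg.lstsq(betas, excess_returns.T, rcond=None)  # (K, T)
    lambdas = lambdas.T  # (T, K): one row of risk premia per period

    # Average the per-period estimates; the standard error is the sample
    # standard deviation of the lambda_t divided by sqrt(T).
    lambda_mean = lambdas.mean(axis=0)
    lambda_se = lambdas.std(axis=0, ddof=1) / np.sqrt(T)
    return lambda_mean, lambda_se, lambda_mean / lambda_se, betas


# Simulated data (illustrative only): 96 months, 17 portfolios, 5 factors.
rng = np.random.default_rng(0)
T, N, K = 96, 17, 5
true_betas = rng.normal(1.0, 0.5, size=(N, K))
f = rng.normal(0.5, 2.0, size=(T, K))
R = f @ true_betas.T + rng.normal(0.0, 1.0, size=(T, N))

lam, se, tstat, betas_hat = fama_macbeth(R, f)
print(np.round(lam, 3), np.round(tstat, 2))
```

The statsmodels loops in the notebook should give the same point estimates; the vectorized `lstsq` calls simply solve all regressions of one pass in a single call.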

Next, the model can also be estimated with the linearmodels library.

# Use Linear Regression to Predict Returns

## Data Preparation

### Universe Creation and Time Horizon

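Use equity data from 2014 to 2015 for a custom Q100US universe, which selects the 100 stocks with the highest average dollar volume over the last 200 trading days, caps any single sector at 30% of the universe, and updates the selection at the start of each month.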

In [19]:
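# Runs only on the Quantopian research platform
from quantopian.pipeline import classifiers, factors, filters

def Q100US():
    # Top 100 US equities by 200-day average dollar volume,
    # at most 30% per sector, recomputed at the start of each month
    return filters.make_us_equity_universe(
        target_size=100,
        rankby=factors.AverageDollarVolume(window_length=200),
        mask=filters.default_us_equity_universe_mask(),
        groupby=classifiers.fundamentals.Sector(),
        max_group_weight=0.3,
        smoothing_func=lambda f: f.downsample('month_start'),
    )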
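Outside Quantopian, the core ranking step can be approximated with pandas. The sketch below is a simplified stand-in, not Quantopian's `make_us_equity_universe`: it assumes daily `close` and `volume` DataFrames (one column per asset), computes the trailing average of close times volume, and keeps the top `n` assets per date. It deliberately omits the 30% sector cap and the month-start smoothing; the function name `top_by_dollar_volume` is hypothetical.

```python
import pandas as pd


def top_by_dollar_volume(close, volume, n=100, window=200):
    """Boolean membership mask: True where an asset is among the n assets with
    the highest trailing average dollar volume on that date.

    close, volume: DataFrames indexed by date with one column per asset.
    Simplified on purpose: no sector cap and no monthly smoothing.
    """
    dollar_volume = close * volume
    adv = dollar_volume.rolling(window).mean()  # NaN until `window` rows exist
    # Rank across assets on each date; rank 1 is the most liquid asset.
    ranks = adv.rank(axis=1, ascending=False, method='first')
    return ranks <= n  # NaN ranks compare as False


# Tiny illustrative example: 3 assets, 3 days, 2-day window, keep the top 2.
dates = pd.date_range('2014-01-02', periods=3, freq='B')
close = pd.DataFrame({'A': [10, 10, 10], 'B': [20, 20, 20], 'C': [5, 5, 5]},
                     index=dates)
volume = pd.DataFrame(100, index=dates, columns=close.columns)
universe = top_by_dollar_volume(close, volume, n=2, window=2)
print(universe)
```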

### Target Return Computation

Test predictions over several lookahead periods to find the holding period with the best predictability, as measured by the information coefficient (IC).
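One common definition of the IC is the cross-sectional Spearman rank correlation between each date's predictions and the realized forward returns, averaged over dates. The sketch below computes it per lookahead horizon with pandas and NumPy; it assumes a daily `close` price panel and a same-shaped `predictions` panel (dates by assets), and the function names and the horizons (1, 5, 10, 20) are illustrative assumptions, not taken from the original analysis.

```python
import numpy as np
import pandas as pd


def forward_returns(close, lookahead):
    """Return from each date to `lookahead` rows later, aligned to the start date."""
    return close.shift(-lookahead) / close - 1


def information_coefficient(predictions, realized):
    """Spearman rank correlation between one date's predictions and outcomes."""
    pair = pd.DataFrame({'pred': predictions, 'real': realized}).dropna()
    if len(pair) < 2:
        return np.nan  # not enough assets with both values
    ranks = pair.rank()  # average ranks for ties
    return np.corrcoef(ranks['pred'], ranks['real'])[0, 1]


def mean_ic_by_lookahead(predictions, close, lookaheads=(1, 5, 10, 20)):
    """Average cross-sectional IC of a (date x asset) prediction panel per horizon."""
    result = {}
    for h in lookaheads:
        fwd = forward_returns(close, h)
        daily_ic = [information_coefficient(predictions.loc[d], fwd.loc[d])
                    for d in predictions.index]
        result[h] = np.nanmean(daily_ic)  # the last h dates have no outcome yet
    return pd.Series(result, name='mean_IC')


# Illustrative check: prices grow at constant, different rates and the
# "predictions" are those growth rates, so the ranking is always right.
dates = pd.date_range('2014-01-02', periods=30, freq='B')
growth = np.array([0.01, 0.02, 0.03, 0.04])
close = pd.DataFrame(100 * (1 + growth) ** np.arange(30)[:, None],
                     index=dates, columns=list('ABCD'))
predictions = pd.DataFrame(np.tile(growth, (30, 1)), index=dates,
                           columns=close.columns)
print(mean_ic_by_lookahead(predictions, close))
```

The horizon with the highest mean IC would then be the candidate holding period.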
