Replicate [Dynamic Return Dependencies Across Industries: A Machine Learning Approach](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3120110&download=yes) by David Rapach, Jack Strauss, Jun Tu and Guofu Zhou.

1) Use industry returns from [Ken French](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html)

2) Forecast (for example) this month's Chemical industry return using last month's returns from all 30 industries 

3) Use LASSO for predictor subset selection over the entire 1960-2016 period to determine that e.g. Beer is predicted by Food, Clothing, Coal

4) Use those predictors and simple linear regression to predict returns

5) Generate portfolios and run backtests.

- Predictor selection - finds same predictors except 2 industries. Possibly use of AICc instead of AIC (don't see an sklearn implementation that uses AICc)

- Prediction by industry - R-squareds line up pretty closely

- Portfolio performance, similar ballpark results. Since prediction is similar but return profile is different, must be some difference in portfolio construction. (am taking equal weight top 6 predicted as long and bottom 6 as short, every month)

- For some reason their mean returns don't line up to geometric mean annualized, they seem to be calculating something different.

- But it does replicate closely and perform pretty well

In [40]:
# run MLP with and without scaling, see if you get better prediction

import os
import sys
import warnings
import numpy as np
import pandas as pd
import time 
import copy
import random
from itertools import product

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' #Hide messy TensorFlow warnings
warnings.filterwarnings("ignore") #Hide messy numpy warnings

from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import mean_squared_error, explained_variance_score, r2_score
from sklearn.linear_model import LinearRegression, Lasso, lasso_path, lars_path, LassoLarsIC
from sklearn.ensemble.forest import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

import ffn
%matplotlib inline

import plotly as py
# print (py.__version__) # requires version >= 1.9.0
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly.graph_objs import *
import plotly.figure_factory as ff

init_notebook_mode(connected=True)

random.seed(1764)
np.random.seed(1764)


In [41]:
print("Loading data...")
data = pd.read_csv("30_Industry_Portfolios.csv")
data = data.set_index('yyyymm')
industries = list(data.columns)
# map industry names to col nums
ind_reverse_dict = dict([(industries[i], i) for i in range(len(industries))])

rfdata = pd.read_csv("F-F_Research_Data_Factors.csv")
rfdata = rfdata.set_index('yyyymm')
data['rf'] = rfdata['RF']

# subtract risk-free rate
# create a response variable led by 1 period to predict
for ind in industries:
    data[ind] = data[ind] - data['rf']

#for ind in industries:
#    data[ind+".3m"] = pd.rolling_mean(data[ind],3)
    
#for ind in industries:
#    data[ind+".6m"] = pd.rolling_mean(data[ind],6)

#for ind in industries:
#    data[ind+".12m"] = pd.rolling_mean(data[ind],12)
    
for ind in industries:
    data[ind+".lead"] = data[ind].shift(-1)

data = data.loc[data.index[data.index > 195911]]
data = data.drop(columns=['rf'])    
data = data.dropna(axis=0, how='any')

nresponses = len(industries)
npredictors = data.shape[1]-nresponses

predictors = list(data.columns[:npredictors])
predictor_reverse_dict = dict([(predictors[i], i) for i in range(len(predictors))])

responses = list(data.columns[-nresponses:])
response_reverse_dict = dict([(responses[i], i) for i in range(len(responses))])

print(data.shape)

data[['Food', 'Food.lead']]


Loading data...
(697, 60)


Unnamed: 0_level_0,Food,Food.lead
yyyymm,Unnamed: 1_level_1,Unnamed: 2_level_1
195912,2.01,-4.49
196001,-4.49,3.35
196002,3.35,-1.67
196003,-1.67,1.17
196004,1.17,8.20
196005,8.20,5.39
196006,5.39,-2.11
196007,-2.11,4.57
196008,4.57,-3.88
196009,-3.88,1.02


In [42]:
# exclude 2017 and later to tie to paper
data = data.loc[data.index[data.index < 201701]]
data = data.loc[data.index[data.index > 195911]]
data


Unnamed: 0_level_0,Food,Beer,Smoke,Games,Books,Hshld,Clths,Hlth,Chems,Txtls,...,Telcm.lead,Servs.lead,BusEq.lead,Paper.lead,Trans.lead,Whlsl.lead,Rtail.lead,Meals.lead,Fin.lead,Other.lead
yyyymm,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
195912,2.01,0.35,-3.02,1.64,7.29,0.67,1.87,-1.97,3.08,0.74,...,0.62,-6.18,-7.93,-9.41,-4.31,-5.33,-6.09,-10.08,-4.68,-3.98
196001,-4.49,-5.71,-2.05,1.21,-5.47,-7.84,-8.53,-6.68,-10.03,-4.77,...,8.07,9.13,5.09,3.00,-0.94,1.42,4.00,1.81,-0.98,6.32
196002,3.35,-2.14,2.27,4.23,2.39,9.31,1.44,-0.02,-0.74,0.32,...,-0.21,-0.31,3.34,-2.43,-4.99,-1.37,-0.13,-3.88,0.05,-2.43
196003,-1.67,-2.94,-0.18,-0.65,2.18,-0.56,-2.59,1.26,-2.75,-6.79,...,-1.24,7.14,1.77,0.41,-2.13,0.45,-0.53,8.86,-0.64,0.55
196004,1.17,-2.16,1.35,6.46,-1.17,-1.27,0.21,1.49,-5.53,-1.10,...,3.05,-1.75,11.90,2.85,0.90,1.65,3.11,0.80,-0.45,1.02
196005,8.20,-0.52,2.44,7.28,11.67,7.74,1.74,13.50,3.40,2.10,...,-0.58,-8.07,2.39,3.50,2.17,5.96,3.41,1.03,3.72,6.41
196006,5.39,0.47,4.73,2.24,0.02,6.38,-1.59,-0.40,0.45,4.04,...,-0.03,2.84,-2.02,-4.10,-3.11,-6.16,-2.99,-1.25,0.09,-5.95
196007,-2.11,-0.79,4.60,-4.72,0.23,-0.60,-1.10,-3.99,-6.80,-3.14,...,6.94,5.69,2.71,1.18,1.98,4.51,2.85,2.05,3.47,3.48
196008,4.57,3.24,5.20,7.16,3.63,5.09,3.34,2.29,1.17,-0.84,...,-6.07,-3.53,-7.61,-7.37,-7.07,-8.44,-8.57,-1.90,-5.78,-4.21
196009,-3.88,-5.00,-2.09,-2.33,-6.20,-9.18,-4.23,-8.87,-6.70,-5.25,...,-0.08,4.62,-3.40,-1.85,-1.02,-4.22,0.31,-4.54,-0.40,0.38


In [43]:
desc = data.describe()
desc
# min, max line up with Table 1

Unnamed: 0,Food,Beer,Smoke,Games,Books,Hshld,Clths,Hlth,Chems,Txtls,...,Telcm.lead,Servs.lead,BusEq.lead,Paper.lead,Trans.lead,Whlsl.lead,Rtail.lead,Meals.lead,Fin.lead,Other.lead
count,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,...,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0
mean,0.690715,0.710613,0.982321,0.701708,0.528277,0.55419,0.66946,0.650905,0.519781,0.667416,...,0.520847,0.694234,0.584175,0.511241,0.582088,0.625562,0.662219,0.70273,0.60981,0.38562
std,4.339811,5.090215,6.061582,7.180918,5.809314,4.759874,6.386027,4.928072,5.518477,7.022552,...,4.62852,6.527984,6.738979,5.055314,5.739306,5.605317,5.349341,6.104515,5.411766,5.815446
min,-18.15,-20.19,-25.32,-33.4,-26.56,-22.24,-31.5,-21.06,-28.6,-33.11,...,-16.44,-28.67,-32.07,-27.74,-28.5,-29.25,-29.74,-31.89,-22.53,-28.09
25%,-1.64,-2.1,-2.78,-3.49,-2.69,-2.11,-2.81,-2.24,-2.8,-3.2,...,-2.11,-3.09,-3.29,-2.43,-2.78,-2.57,-2.43,-2.94,-2.42,-2.99
50%,0.74,0.71,1.28,0.89,0.51,0.75,0.69,0.75,0.67,0.63,...,0.61,0.97,0.56,0.69,0.86,0.94,0.47,1.03,0.82,0.47
75%,3.12,3.66,4.64,5.31,3.72,3.55,4.31,3.56,3.76,4.49,...,3.36,4.29,4.59,3.46,4.06,3.88,4.0,4.33,4.0,4.2
max,19.89,25.51,32.38,34.52,33.13,18.22,31.79,29.01,21.68,59.03,...,21.22,23.38,24.66,21.0,18.5,17.53,26.49,27.38,20.59,19.96


In [44]:
# annualized returns don't match Table 1, oddly
# geometric mean, annualized
pd.DataFrame((np.prod(data/100 + 1)**(12.0/len(data))-1)[:30], columns=['Mean Ann. Return'])

Unnamed: 0,Mean Ann. Return
Food,0.07402
Beer,0.072005
Smoke,0.100147
Games,0.054031
Books,0.043953
Hshld,0.054098
Clths,0.05717
Hlth,0.065463
Chems,0.044917
Txtls,0.051888


In [45]:
# try this way, arithmetic mean then annualize (not very correct)
#print(pd.DataFrame(((desc.loc['mean']/100+1)**12-1)[:30]))
#nope

# same
pd.DataFrame(((1 + np.mean(data, axis=0)/100)**12 -1)[:30], columns=['Mean Ann. Return'])

Unnamed: 0,Mean Ann. Return
Food,0.086108
Beer,0.088687
Smoke,0.12446
Games,0.087532
Books,0.065268
Hshld,0.068568
Clths,0.08336
Hlth,0.080966
Chems,0.064188
Txtls,0.083096


In [46]:
#annualized volatility 
pd.DataFrame((desc.loc['std']*np.sqrt(12))[:30].round(2))
# lines up with table 1

Unnamed: 0,std
Food,15.03
Beer,17.63
Smoke,21.0
Games,24.88
Books,20.12
Hshld,16.49
Clths,22.12
Hlth,17.07
Chems,19.12
Txtls,24.33


In [47]:
# Run LASSO, then OLS on selected variables

# skip last row to better match published r-squared
# looks like they forecast actuals 1960-2016 using 1959m12 to 2016m11
# not exact matches to Table 2 R-squared but almost within rounding error 
X = data.values[:-1,:npredictors]
Y = data.values[:-1,-nresponses:]
nrows = X.shape[0]
X.shape

(684, 30)

In [48]:
def subset_selection(X, Y, model_aic, verbose=False):
    
    global responses
    global response_reverse_dict
    global predictors
    global predictor_reverse_dict
    
    coef_dict = {}
    for response_index, response in enumerate(responses):
        y = Y[:,response_reverse_dict[response]]
        
        model_aic.fit(X, y)

        coef_dict[response] = [predstr for i, predstr in enumerate(predictors) if model_aic.coef_[i] !=0]
        #y_response = model_aic.responseict(X)
        # print ("In-sample LASSO R-squared: %.6f" % r2_score(y, y_response))
        if verbose:
            print("LASSO variables selected for %s: " % response)
            print(coef_dict[response])
        
        if not coef_dict[response]:
            if verbose:
                print("No coefs selected for " + response + ", using all")
                print("---")
            coef_dict[response] = predictors            
        # fit OLS vs. selected vars, better fit w/o LASSO penalties
        # in-sample R-squared using LASSO coeffs
        if verbose:
            print("Running OLS for " + response + " against " + str(coef_dict[response]))
            # col nums of selected responses
            predcols = [predictor_reverse_dict[predstr] for predstr in coef_dict[response]]
            model_ols = LinearRegression()
            model_ols.fit(X[:, predcols], y)
            y_pred = model_ols.predict(X[:, predcols])
            print ("In-sample OLS R-squared: %.2f" % (100 * r2_score(y, y_pred)))
            print("---")
            
    return coef_dict

coef_dict = subset_selection(X, Y, LassoLarsIC(criterion='aic'), verbose=True)

# These subsets line up closely with Table 2
# except Clths, Whlsl, we get different responses

LASSO variables selected for Food.lead: 
['Clths', 'Coal', 'Util', 'Rtail']
Running OLS for Food.lead against ['Clths', 'Coal', 'Util', 'Rtail']
In-sample OLS R-squared: 2.24
---
LASSO variables selected for Beer.lead: 
['Food', 'Clths', 'Coal']
Running OLS for Beer.lead against ['Food', 'Clths', 'Coal']
In-sample OLS R-squared: 2.52
---
LASSO variables selected for Smoke.lead: 
['Txtls', 'Carry', 'Mines', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'Paper', 'Trans', 'Fin']
Running OLS for Smoke.lead against ['Txtls', 'Carry', 'Mines', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'Paper', 'Trans', 'Fin']
In-sample OLS R-squared: 6.55
---
LASSO variables selected for Games.lead: 
['Books', 'Clths', 'Coal', 'Fin']
Running OLS for Games.lead against ['Books', 'Clths', 'Coal', 'Fin']
In-sample OLS R-squared: 5.05
---
LASSO variables selected for Books.lead: 
['Games', 'Books', 'Coal', 'Oil', 'Util', 'Servs', 'BusEq', 'Rtail', 'Fin']
Running OLS for Books.lead against ['Games', 'Books', 'Coal', 'O

In [49]:
# same predictors selected for all but 2 response vars
# use predictors from paper to match results
if True: # turn off/on
    coef_dict = {}
    coef_dict['Food.lead'] = ['Clths', 'Coal', 'Util', 'Rtail']
    coef_dict['Beer.lead'] = ['Food', 'Clths', 'Coal']
    coef_dict['Smoke.lead'] = ['Txtls', 'Carry', 'Mines', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'Paper', 'Trans', 'Fin']
    coef_dict['Games.lead'] = ['Books', 'Clths', 'Coal', 'Fin']
    coef_dict['Books.lead'] = ['Games', 'Books', 'Coal', 'Oil', 'Util', 'Servs', 'BusEq', 'Rtail', 'Fin']
    coef_dict['Hshld.lead'] = ['Clths', 'Coal', 'Rtail']
    coef_dict['Clths.lead'] = ['Books', 'Clths', 'Chems', 'Steel', 'ElcEq', 'Carry',  'Coal', 'Oil', 'Util','Telcm', 'Servs', 'BusEq', 'Rtail']
    # Running OLS for Clths against ['Clths', 'Coal', 'Oil', 'Servs', 'Rtail']
    coef_dict['Hlth.lead'] = ['Books', 'Mines', 'Coal', 'Util']
    coef_dict['Chems.lead'] = ['Clths']
    coef_dict['Txtls.lead'] = ['Clths', 'Autos', 'Coal', 'Oil', 'Rtail', 'Fin']
    coef_dict['Cnstr.lead'] = ['Clths', 'Coal', 'Oil', 'Util', 'Trans', 'Rtail', 'Fin']
    coef_dict['Steel.lead'] = ['Fin']
    coef_dict['FabPr.lead'] = ['Trans', 'Fin']
    coef_dict['ElcEq.lead'] = ['Fin']
    coef_dict['Autos.lead'] = ['Hshld', 'Clths', 'Coal', 'Oil', 'Util', 'BusEq', 'Rtail', 'Fin']
    coef_dict['Carry.lead'] = ['Trans']
    coef_dict['Mines.lead'] = []
    coef_dict['Coal.lead'] = ['Beer', 'Smoke', 'Books', 'Autos', 'Coal', 'Oil', 'Paper', 'Rtail']
    coef_dict['Oil.lead'] = ['Beer', 'Hlth', 'Carry']
    coef_dict['Util.lead'] = ['Food', 'Beer', 'Smoke', 'Hshld', 'Hlth', 'Cnstr', 'FabPr', 'Carry', 'Mines', 'Oil', 'Util', 'Telcm', 'BusEq', 'Whlsl', 'Fin', 'Other']
    coef_dict['Telcm.lead'] = ['Beer', 'Smoke', 'Books', 'Hshld', 'Cnstr', 'Autos', 'Carry', 'Mines', 'Coal', 'Oil', 'Util', 'Servs', 'BusEq', 'Rtail', 'Meals', 'Fin']
    coef_dict['Servs.lead'] = ['Smoke', 'Books', 'Steel', 'Oil', 'Util', 'Fin']
    coef_dict['BusEq.lead'] = ['Smoke', 'Books', 'Util']
    coef_dict['Paper.lead'] = ['Clths', 'Coal', 'Oil', 'Rtail', 'Fin']
    coef_dict['Trans.lead'] = ['Fin']
    coef_dict['Whlsl.lead'] = ['Food', 'Beer', 'Smoke', 'Books', 'Hlth', 'Carry', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'BusEq', 'Fin', 'Other']
    # Running OLS for Whlsl against ['Food', 'Smoke', 'Books', 'Carry', 'Coal', 'Oil', 'Util', 'Servs', 'Fin', 'Other']
    coef_dict['Rtail.lead'] = ['Rtail']
    coef_dict['Meals.lead'] = ['Smoke', 'Books', 'Clths', 'Steel', 'Carry', 'Coal', 'Oil', 'Util', 'Servs', 'BusEq', 'Meals', 'Fin']
    coef_dict['Fin.lead'] = ['Fin']
    coef_dict['Other.lead'] = ['Clths', 'Fin']


In [50]:
def predict_with_subsets(X, Y, model, coef_dict, verbose=False):

    global responses
    global response_reverse_dict
    
    scores = []
    for response in responses:
        y = Y[:,response_reverse_dict[response]]

#        print("LASSO variables selected for %s: " % pred)
#        print(coef_dict[pred])
        
        if not coef_dict[response]:
            if verbose:
                print("No coefs selected for " + response)
 #           print("---")
            continue
        # fit model vs. selected vars, better fit w/o LASSO penalties
        # in-sample R-squared using LASSO coeffs
        #print("Running model for " + pred + " against " + str(coef_dict[pred]))
        # col nums of selected predictors
        predcols = [predictor_reverse_dict[predstr] for predstr in coef_dict[response]]
        model.fit(X[:, predcols], y)
        y_pred = model.predict(X[:, predcols])
        score = r2_score(y, y_pred)
        scores.append(score)
        if verbose:
            print ("In-sample R-squared: %.4f for %s against %s" % (score, response, str(coef_dict[response])))
#        print("---")
    
    if verbose:
        print("Mean R-squared: %.4f" % np.mean(np.array(scores)))
    return np.mean(np.array(scores))
    

predict_with_subsets(X, Y, LinearRegression(), coef_dict, verbose=True)


In-sample R-squared: 0.0224 for Food.lead against ['Clths', 'Coal', 'Util', 'Rtail']
In-sample R-squared: 0.0252 for Beer.lead against ['Food', 'Clths', 'Coal']
In-sample R-squared: 0.0655 for Smoke.lead against ['Txtls', 'Carry', 'Mines', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'Paper', 'Trans', 'Fin']
In-sample R-squared: 0.0505 for Games.lead against ['Books', 'Clths', 'Coal', 'Fin']
In-sample R-squared: 0.0630 for Books.lead against ['Games', 'Books', 'Coal', 'Oil', 'Util', 'Servs', 'BusEq', 'Rtail', 'Fin']
In-sample R-squared: 0.0297 for Hshld.lead against ['Clths', 'Coal', 'Rtail']
In-sample R-squared: 0.0782 for Clths.lead against ['Books', 'Clths', 'Chems', 'Steel', 'ElcEq', 'Carry', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'BusEq', 'Rtail']
In-sample R-squared: 0.0268 for Hlth.lead against ['Books', 'Mines', 'Coal', 'Util']
In-sample R-squared: 0.0078 for Chems.lead against ['Clths']
In-sample R-squared: 0.0791 for Txtls.lead against ['Clths', 'Autos', 'Coal', 'Oil', 'Rtail',

0.038622786316912544

In [51]:
coef_dict_all = {}
for response in responses:
    coef_dict_all[response] = predictors
predict_with_subsets(X, Y, LinearRegression(), coef_dict_all, verbose=True)


In-sample R-squared: 0.0513 for Food.lead against ['Food', 'Beer', 'Smoke', 'Games', 'Books', 'Hshld', 'Clths', 'Hlth', 'Chems', 'Txtls', 'Cnstr', 'Steel', 'FabPr', 'ElcEq', 'Autos', 'Carry', 'Mines', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'BusEq', 'Paper', 'Trans', 'Whlsl', 'Rtail', 'Meals', 'Fin', 'Other']
In-sample R-squared: 0.0506 for Beer.lead against ['Food', 'Beer', 'Smoke', 'Games', 'Books', 'Hshld', 'Clths', 'Hlth', 'Chems', 'Txtls', 'Cnstr', 'Steel', 'FabPr', 'ElcEq', 'Autos', 'Carry', 'Mines', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'BusEq', 'Paper', 'Trans', 'Whlsl', 'Rtail', 'Meals', 'Fin', 'Other']
In-sample R-squared: 0.0748 for Smoke.lead against ['Food', 'Beer', 'Smoke', 'Games', 'Books', 'Hshld', 'Clths', 'Hlth', 'Chems', 'Txtls', 'Cnstr', 'Steel', 'FabPr', 'ElcEq', 'Autos', 'Carry', 'Mines', 'Coal', 'Oil', 'Util', 'Telcm', 'Servs', 'BusEq', 'Paper', 'Trans', 'Whlsl', 'Rtail', 'Meals', 'Fin', 'Other']
In-sample R-squared: 0.0756 for Games.lead against ['Food', 'Be

0.06637486888237351

In [60]:
def fit_predict(X, Y, model, coef_dict=None, npredict=1):
    """for backtest, train model using Y_list v. X using n-npredict rows
    generate npredict prediction Y_list using last npredict rows of X
    if npredict=1, fit using n-1 rows, return prediction using X for final month
    if npredict=26, fit using n-26 rows, return prediction using X for final 26 months"""
    
    global responses
    global response_reverse_dict
    
    # keep last row to predict against
    X_predict = X[-npredict:]
    ncols = X.shape[1]
    X_predict = X_predict.reshape(npredict,ncols)
    # fit on remaining rows
    X_fit = X[:-npredict]
    Y_fit = Y[:-npredict]

    # if no coef_dict select predictors into coef_dict
    if coef_dict is None:
        coef_dict = subset_selection(X_fit, Y_fit, LassoLarsIC(criterion='aic'))
    # if coef_dict == "all" use all predictors for each response        
    elif coef_dict == 'all':
        coef_dict = {}
        for response in responses:
            coef_dict[response]=predictors

    predictions = []
    for response in responses:
        if not coef_dict[response]:
            predictions.append([np.nan]*npredict)
            continue
        # column indexes to fit against each other
        predcols = [predictor_reverse_dict[predstr] for predstr in coef_dict[response]]
        responsecol = response_reverse_dict[response]
        model.fit(X_fit[:, predcols], Y_fit[:,responsecol])
        y_pred = model.predict(X_predict[:,predcols])        
        predictions.append(y_pred)
        
    return np.array(predictions).transpose()

#    return np.argsort(predictions)

X = data.values[:,:npredictors]
Y = data.values[:, -nresponses:]
model = LinearRegression()
predictions = fit_predict(X, Y, model, coef_dict, 1)
predictions

array([[ 1.4836355 ,  1.75771382,  1.91101552,  1.96962233,  1.31106724,
         0.92052413,  1.49257673,  1.7530845 ,  0.43144952,  1.94583012,
         1.87792229,  0.77822931,  0.8564701 ,  1.02533527,  1.25730406,
         0.76996524,         nan, -1.14851378,  0.55307421,  1.59688546,
         1.17452009,  1.31729877,  0.6933691 ,  0.99440421,  0.97025453,
         1.09404707,  0.45281322,  1.61771085,  1.0268271 ,  0.6264228 ]])

In [53]:
# 197001 = 121
STARTMONTH = 121
print(X[STARTMONTH])
print(data.iloc[STARTMONTH][:30])

[ -3.34  -1.95  -7.59  -7.76 -12.05  -7.5   -5.69  -7.71  -7.37  -5.26
  -9.84  -6.31  -7.15  -6.89  -9.35 -12.49  -2.34  -0.77 -12.16  -4.83
  -3.16 -11.17  -9.73  -8.89  -8.17  -8.28  -6.31 -13.12  -9.78  -6.2 ]
Food     -3.34
Beer     -1.95
Smoke    -7.59
Games    -7.76
Books   -12.05
Hshld    -7.50
Clths    -5.69
Hlth     -7.71
Chems    -7.37
Txtls    -5.26
Cnstr    -9.84
Steel    -6.31
FabPr    -7.15
ElcEq    -6.89
Autos    -9.35
Carry   -12.49
Mines    -2.34
Coal     -0.77
Oil     -12.16
Util     -4.83
Telcm    -3.16
Servs   -11.17
BusEq    -9.73
Paper    -8.89
Trans    -8.17
Whlsl    -8.28
Rtail    -6.31
Meals   -13.12
Fin      -9.78
Other    -6.20
Name: 197001, dtype: float64


In [96]:
# predict all months starting STARTMONTH
# initialize predictions matrix P

def run_backtest(X, Y, model, coef_dict=None, startmonth=0, minmaxscale=False):
    global P
    global R 

    P = np.zeros_like(Y)
    count = 0
    for month_index in range(startmonth, X.shape[0]+1):
        # 0 to month_index-1
        Xscale = X.copy()
        Yscale = Y.copy()

        if minmaxscale:
            # minmaxscale each row - transpose because MinMaxScaler scales by columns
            Xscale = MinMaxScaler().fit_transform(Xscale.transpose()).transpose()
            Yscale = MinMaxScaler().fit_transform(Yscale.transpose()).transpose()
        
        predictions = fit_predict(Xscale[:month_index, :], 
                                  Yscale[:month_index], 
                                  model,
                                  coef_dict)
        try:
            P[month_index]= predictions
            sys.stdout.write('.')
            count += 1
            if count % 80 == 0:
                print("")
                print("%s Still training %d of %d" % (time.strftime("%H:%M:%S"), count, (X.shape[0]-startmonth+1)))
                
            sys.stdout.flush()
        except IndexError:
            # I want to run the fit and see the R-squared on full dataset
            # but we are storing the predictions in row of the month predicted
            # so we have no row to store the last prediction (2017-01)
            print("\nlast prediction not stored")

    mse = np.mean((P[startmonth:]-X[startmonth:])**2)
    print("MSE across all predictions: %.4f" % mse)
    
    R = np.zeros(P.shape[0])
    numstocks = 6 # top quintile (and bottom)

    indcount = []
    longcount = []
    shortcount = []

    for response in responses:
        indcount.append(0)
        longcount.append(0)
        shortcount.append(0)
        
    for month_index in range(startmonth, X.shape[0]):
        # get indexes of sorted smallest to largest
        # leftmost 6
        # ignore nan
        short_sort_array = [999999 if np.isnan(x) else x for x in P[month_index]]
        select_array = np.argsort(short_sort_array)
        short_indexes = select_array[:numstocks]
        # rightmost 6
        long_sort_array = [-999999 if np.isnan(x) else x for x in P[month_index]]
        select_array = np.argsort(long_sort_array)
        long_indexes = select_array[-numstocks:]
        # compute equal weighted long/short return
        R[month_index] = np.mean(X[month_index, long_indexes])/2 - np.mean(X[month_index, short_indexes])/2
        # count occurrences of each industry
        for i in short_indexes:
            indcount[i]+=1
            shortcount[i]+=1
        for i in long_indexes:
            indcount[i]+=1
            longcount[i]+=1

    for response in responses:
        i = response_reverse_dict[response]
        print("%s: long %d times, short %d times, total %d times" % (response, longcount[i], shortcount[i], indcount[i]))
        
    results = R[startmonth:]

    index = pd.date_range('01/01/1970',periods=results.shape[0], freq='M')
    perfdata = pd.DataFrame(results,index=index,columns=['Returns'])
    perfdata['Equity'] = 100 * np.cumprod(1 + results / 100)

    stats = perfdata['Equity'].calc_stats()

    retframe = pd.DataFrame([stats.stats.loc['start'],
                             stats.stats.loc['end'],
                             stats.stats.loc['cagr'],
                             stats.stats.loc['yearly_vol'],
                             stats.stats.loc['yearly_sharpe'],
                             stats.stats.loc['max_drawdown'],
                             ffn.core.calc_sortino_ratio(perfdata.Returns, rf=0, nperiods=564, annualize=False),
                            ],
                            index = ['start',
                                     'end',
                                     'cagr',
                                     'yearly_vol',
                                     'yearly_sharpe',
                                     'max_drawdown',
                                     'sortino',
                                    ],
                            columns=['Value'])   
    return retframe


In [73]:
model = LinearRegression()
run_backtest(X, Y, model, coef_dict, startmonth=STARTMONTH, minmaxscale=False)

................................................................................
15:46:23 Still training 80 of 565
................................................................................
15:46:24 Still training 160 of 565
................................................................................
15:46:25 Still training 240 of 565
................................................................................
15:46:27 Still training 320 of 565
................................................................................
15:46:28 Still training 400 of 565
................................................................................
15:46:29 Still training 480 of 565
................................................................................
15:46:30 Still training 560 of 565
....
last prediction not stored
MSE across all predictions: nan
Food.lead: long 102 times, short 41 times, total 143 times
Beer.lead: long 129 times, short 100 times, total 229 times
Smoke.

Unnamed: 0,Value
start,1970-01-31 00:00:00
end,2016-12-31 00:00:00
cagr,0.0650898
yearly_vol,0.0819691
yearly_sharpe,0.809419
max_drawdown,-0.0911841
sortino,0.622152


In [65]:
# double check results_post_LASSO
#model = LinearRegression()
#R = run_backtest(X, Y, model, coef_dict_paper, startmonth=STARTMONTH, summary=False)
results_post_LASSO = R[STARTMONTH:]
print(len(results_post_LASSO))
#print(results_post_LASSO)
print(np.mean(results_post_LASSO))
print(np.std(results_post_LASSO) * np.sqrt(12))
print(np.prod(1 + results_post_LASSO / 100))
print(np.prod(1 + results_post_LASSO / 100) ** (12.0/results_post_LASSO.shape[0]))-1

564
0.5413164893617022
5.826034731666633
19.4255053964083
0.06515344703464421


In [66]:
# run performance chart
perf_post_LASSO = 100 * np.cumprod(1 + results_post_LASSO / 100)

def mychart(args, names=None):
    x_coords = np.linspace(1970, 2016, args[0].shape[0])
    
    plotdata = []
    for i in range(len(args)):
        tracelabel = "Trace %d" % i
        if names:
                tracelabel=names[i]
        plotdata.append(Scatter(x=x_coords,
                                y=args[i].reshape(-1),
                                mode = 'line',
                                name=tracelabel))    

    layout = Layout(
        autosize=False,
        width=600,
        height=480,
        yaxis=dict(
            type='log',
            autorange=True
        )
    )
    
    fig = Figure(data=plotdata, layout=layout)
    
    return iplot(fig)
    
mychart([perf_post_LASSO],["Post-LASSO"])

In [19]:
# pass coef_dict as None
# fit_predict will do subset selection at each timestep using data it trains on
model = LinearRegression()
run_backtest(X, Y, model, coef_dict=None, startmonth=STARTMONTH)

................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
....
last prediction not stored
MSE across all predictions: 41.5883


Unnamed: 0,Value
start,1970-01-31 00:00:00
end,2016-12-31 00:00:00
cagr,0.0352209
yearly_vol,0.0479525
yearly_sharpe,0.751411
max_drawdown,-0.128334
sortino,0.326473


In [21]:
results_LASSO_each_timestep = R[STARTMONTH:]
perf_LASSO_each_timestep = 100 * np.cumprod(1 + results_LASSO_each_timestep / 100)
mychart([perf_LASSO_each_timestep])

In [22]:
# pass coef_dict as 'all'
# fit_predict will use all predictors (no subset selection)
model = LinearRegression()
run_backtest(X, Y, model, coef_dict='all', startmonth=STARTMONTH)

................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
....
last prediction not stored
MSE across all predictions: 44.0218


Unnamed: 0,Value
start,1970-01-31 00:00:00
end,2016-12-31 00:00:00
cagr,0.0275257
yearly_vol,0.0602329
yearly_sharpe,0.48398
max_drawdown,-0.16111
sortino,0.219749


In [23]:
results_OLS = R[STARTMONTH:]
perf_OLS = 100 * np.cumprod(1 + results_OLS / 100)
mychart([perf_OLS])

In [26]:
mychart([perf_post_LASSO, perf_LASSO_each_timestep, perf_OLS],["Post-LASSO", "LASSO each timestep", "OLS"])

In [27]:
def walkforward_xval (X, Y, model, coef_dict=None):

    start = time.time()

    # generate k-folds
    n_splits = 5
    kf = KFold(n_splits=n_splits)
    kf.get_n_splits(X)
    last_indexes = []
    for train_index, test_index in kf.split(X):
        # use test_index as last index to train
        last_index = test_index[-1] + 1
        last_indexes.append(last_index)
    print("%s Generate splits %s" % (time.strftime("%H:%M:%S"), str([i for i in last_indexes])))

    print("%s Starting training" % (time.strftime("%H:%M:%S")))
    
    avg_bests = []
    for i in range(1, n_splits-1):

        models = []
        losses = []
        scores = []
        count = 0        
        # skip kfold 0 so you start with train 2x size of eval set
        last_train_index = last_indexes[i]
        last_xval_index = last_indexes[i+1]

        # set up train, xval
        # train from beginning to last_train_index        
        print("Training indexes 0 to %d" % (last_train_index-1))
        X_fit = X[:last_train_index]
        Y_fit = Y[:last_train_index]
        # xval from last_train_index to last_xval_index
        print("Cross-validating indexes %d to %d" % (last_train_index, last_xval_index -1 ))
        X_xval = X[last_train_index:last_xval_index]
        Y_xval = Y[last_train_index:last_xval_index]

        if coef_dict is None:
            print("Performing LASSO subset selection on training set")
            coef_dict = subset_selection(X_fit, Y_fit, LassoLarsIC(criterion='aic'), verbose=False)
        
        mse_list = []
        
        for response in responses:
            predcols = [predictor_reverse_dict[indstr] for indstr in coef_dict[response]]
            if len(predcols) == 0:
                continue
            responsecol = response_reverse_dict[response]
            
            fit = model.fit(X_fit[:,predcols], Y_fit[:,responsecol])
            # evaluate ... run prediction, calc MSE by industry, and average
            y_xval_pred = fit.predict(X_xval[:,predcols])
            mse_list.append(mean_squared_error(Y_xval[:,i], y_xval_pred))
            sys.stdout.write('.')
            count += 1
            if count % 80 == 0:
                print("")
                print("%s Still training" % (time.strftime("%H:%M:%S")))
            sys.stdout.flush()             
        # mean mse over industry ys for this fold
        xval_score = np.mean(np.array(mse_list))            

        # choose model with lowest xval loss
        print ("\n%s Xval MSE %f" % (time.strftime("%H:%M:%S"), xval_score))
        avg_bests.append(xval_score)
    
    print ("Last Xval loss %f" % (xval_score))
    # mean over folds
    avg_loss = np.mean(np.array(avg_bests))
    print ("Avg Xval loss %f" % avg_loss)
    print("--------------------------------------------------------------------------------")
    return (avg_loss, model)


In [28]:
model = LinearRegression()
walkforward_xval (X, Y, model, coef_dict=coef_dict)


19:50:05 Generate splits [137, 274, 411, 548, 685]
19:50:05 Starting training
Training indexes 0 to 273
Cross-validating indexes 274 to 410
.............................
19:50:05 Xval MSE 31.244196
Training indexes 0 to 410
Cross-validating indexes 411 to 547
.............................
19:50:05 Xval MSE 69.965879
Training indexes 0 to 547
Cross-validating indexes 548 to 684
.............................
19:50:06 Xval MSE 60.234598
Last Xval loss 60.234598
Avg Xval loss 53.814891
--------------------------------------------------------------------------------


(53.814891092676824,
 LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False))

In [29]:
model = MLPRegressor(hidden_layer_sizes=(2,2,2),
                     alpha=1.0,
                     activation='tanh',
                     max_iter=10000, 
                     tol=1e-10,
                     solver='lbfgs')
walkforward_xval (X, Y, model, coef_dict=coef_dict)


19:50:06 Generate splits [137, 274, 411, 548, 685]
19:50:06 Starting training
Training indexes 0 to 273
Cross-validating indexes 274 to 410
.............................
19:50:17 Xval MSE 34.001979
Training indexes 0 to 410
Cross-validating indexes 411 to 547
.............................
19:50:35 Xval MSE 72.516713
Training indexes 0 to 547
Cross-validating indexes 548 to 684
.............................
19:50:50 Xval MSE 62.353781
Last Xval loss 62.353781
Avg Xval loss 56.290824
--------------------------------------------------------------------------------


(56.29082447590586,
 MLPRegressor(activation='tanh', alpha=1.0, batch_size='auto', beta_1=0.9,
        beta_2=0.999, early_stopping=False, epsilon=1e-08,
        hidden_layer_sizes=(2, 2, 2), learning_rate='constant',
        learning_rate_init=0.001, max_iter=10000, momentum=0.9,
        nesterovs_momentum=True, power_t=0.5, random_state=None,
        shuffle=True, solver='lbfgs', tol=1e-10, validation_fraction=0.1,
        verbose=False, warm_start=False))

In [30]:
MODELPREFIX = "MLP"

n_hiddens = [1, 2, 3]
layer_sizes = [1, 2, 4, 8]
reg_penalties = [0.0, 0.001, 0.01, 0.1, 1]
hyperparameter_combos = list(product(n_hiddens, layer_sizes, reg_penalties))

print("%s Running %d experiments" % (time.strftime("%H:%M:%S"), len(hyperparameter_combos)))

experiments = {}

# minmaxscale each row
for i in range(Xscale.shape[0]):
    Xscale[i] = Xscale[i] - np.min(Xscale[i])
    Xscale[i] = Xscale[i]/np.max(Xscale[i])


for i in range(Yscale.shape[0]):
    Yscale[i] = Yscale[i] - np.min(Yscale[i])
                Yscale[i] = Yscale[i]/np.max(Yscale[i])
        
for counter, param_list in enumerate(hyperparameter_combos):
    n_hidden_layers, layer_size, reg_penalty = param_list
    print("%s Running experiment %d of %d" % (time.strftime("%H:%M:%S"), counter+1, len(hyperparameter_combos)))
    key = (n_hidden_layers, layer_size, reg_penalty)
    print("%s n_hidden_layers = %d, hidden_layer_size = %d, reg_penalty = %.6f" % 
          (time.strftime("%H:%M:%S"), n_hidden_layers, layer_size, reg_penalty))
    hls = tuple([layer_size]*n_hidden_layers)
    model = MLPRegressor(hidden_layer_sizes=hls,
                         alpha=reg_penalty,
                         activation='tanh',
                         max_iter=10000, 
                         tol=1e-10,
                         solver='lbfgs')
    
    score, model = walkforward_xval (X, Y, model, coef_dict=coef_dict)

    experiments[key] = score


19:50:50 Running 60 experiments
19:50:50 Running experiment 1 of 60
19:50:50 n_hidden_layers = 1, hidden_layer_size = 1, reg_penalty = 0.000000
19:50:50 Generate splits [137, 274, 411, 548, 685]
19:50:50 Starting training
Training indexes 0 to 273
Cross-validating indexes 274 to 410
.............................
19:50:50 Xval MSE 31.757389
Training indexes 0 to 410
Cross-validating indexes 411 to 547
.............................
19:50:51 Xval MSE 69.954439
Training indexes 0 to 547
Cross-validating indexes 548 to 684
.............................
19:50:51 Xval MSE 61.582940
Last Xval loss 61.582940
Avg Xval loss 54.431589
--------------------------------------------------------------------------------
19:50:51 Running experiment 2 of 60
19:50:51 n_hidden_layers = 1, hidden_layer_size = 1, reg_penalty = 0.001000
19:50:51 Generate splits [137, 274, 411, 548, 685]
19:50:51 Starting training
Training indexes 0 to 273
Cross-validating indexes 274 to 410
.............................
19:50:

.............................
19:51:35 Xval MSE 37.227501
Training indexes 0 to 410
Cross-validating indexes 411 to 547
.............................
19:51:39 Xval MSE 72.878927
Training indexes 0 to 547
Cross-validating indexes 548 to 684
.............................
19:51:43 Xval MSE 62.865359
Last Xval loss 62.865359
Avg Xval loss 57.657262
--------------------------------------------------------------------------------
19:51:43 Running experiment 14 of 60
19:51:43 n_hidden_layers = 1, hidden_layer_size = 4, reg_penalty = 0.100000
19:51:43 Generate splits [137, 274, 411, 548, 685]
19:51:43 Starting training
Training indexes 0 to 273
Cross-validating indexes 274 to 410
.............................
19:51:46 Xval MSE 36.888061
Training indexes 0 to 410
Cross-validating indexes 411 to 547
.............................
19:51:49 Xval MSE 72.553733
Training indexes 0 to 547
Cross-validating indexes 548 to 684
.............................
19:51:53 Xval MSE 63.554094
Last Xval loss 63.554

.............................
20:12:54 Xval MSE 84.932361
Training indexes 0 to 410
Cross-validating indexes 411 to 547
.............................
20:14:24 Xval MSE 97.815730
Training indexes 0 to 547
Cross-validating indexes 548 to 684
.............................
20:16:10 Xval MSE 77.679901
Last Xval loss 77.679901
Avg Xval loss 86.809331
--------------------------------------------------------------------------------
20:16:10 Running experiment 38 of 60
20:16:10 n_hidden_layers = 2, hidden_layer_size = 8, reg_penalty = 0.010000
20:16:10 Generate splits [137, 274, 411, 548, 685]
20:16:10 Starting training
Training indexes 0 to 273
Cross-validating indexes 274 to 410
.............................
20:17:28 Xval MSE 67.440807
Training indexes 0 to 410
Cross-validating indexes 411 to 547
.............................
20:19:03 Xval MSE 91.443931
Training indexes 0 to 547
Cross-validating indexes 548 to 684
.............................
20:20:54 Xval MSE 77.812710
Last Xval loss 77.812

In [31]:
# list and chart experiments
flatlist = [list(l[0]) + [l[1]] for l in experiments.items()]
 
lossframe = pd.DataFrame(flatlist, columns=["n_hidden_layers", "layer_size", "reg_penalty", "loss"])
lossframe.sort_values(['loss'])

Unnamed: 0,n_hidden_layers,layer_size,reg_penalty,loss
55,1,1,0.001,54.322175
50,2,1,0.001,54.327089
15,2,1,0.1,54.36459
1,2,1,1.0,54.379211
43,3,1,0.01,54.386167
34,1,1,0.01,54.391798
44,1,1,1.0,54.405743
38,3,1,0.0,54.417954
47,1,1,0.0,54.431589
31,3,1,0.001,54.432149


In [32]:
# we can pick lowest loss , but first we look at patterns by hyperparameter
pd.DataFrame(lossframe.groupby(['n_hidden_layers'])['loss'].mean())


Unnamed: 0_level_0,loss
n_hidden_layers,Unnamed: 1_level_1
1,95.037897
2,63.24133
3,64.520195


In [33]:
pd.DataFrame(lossframe.groupby(['layer_size'])['loss'].mean())


Unnamed: 0_level_0,loss
layer_size,Unnamed: 1_level_1
1,54.439597
2,55.829029
4,61.406183
8,125.391087


In [34]:
pd.DataFrame(lossframe.groupby(['reg_penalty'])['loss'].mean())


Unnamed: 0_level_0,loss
reg_penalty,Unnamed: 1_level_1
0.0,123.973792
0.001,63.283206
0.01,61.948197
0.1,61.589883
1.0,60.53729


In [35]:
def plot_matrix(lossframe, x_labels, y_labels, x_suffix="", y_suffix=""):

    pivot = lossframe.pivot_table(index=[x_labels], columns=[y_labels], values=['loss'])
    # specify labels as strings, to force it to use a discrete axis
    if lossframe[x_labels].dtype == np.float64 or lossframe[x_labels].dtype == np.float32:
        xaxis = ["%f %s" % (i, x_suffix) for i in pivot.columns.levels[1].values]
    else:
        xaxis = ["%d %s" % (i, x_suffix) for i in pivot.columns.levels[1].values]
    if lossframe[y_labels].dtype == np.float64 or lossframe[y_labels].dtype == np.float32:
        yaxis = ["%f %s" % (i, y_suffix) for i in pivot.index.values]
    else:
        yaxis = ["%d %s" % (i, y_suffix) for i in pivot.index.values]
        
    print(xaxis, yaxis)
    """plot a heat map of a matrix"""
    chart_width=640
    chart_height=480
    
    layout = Layout(
        title="%s v. %s" % (x_labels, y_labels),
        height=chart_height,
        width=chart_width,     
        margin=dict(
            l=150,
            r=30,
            b=120,
            t=100,
        ),
        xaxis=dict(
            title=y_labels,
            tickfont=dict(
                family='Arial, sans-serif',
                size=10,
                color='black'
            ),
        ),
        yaxis=dict(
            title=x_labels,
            tickfont=dict(
                family='Arial, sans-serif',
                size=10,
                color='black'
            ),
        ),
    )
    
    data = [Heatmap(z=pivot.values,
                    x=xaxis,
                    y=yaxis,
                    colorscale=[[0, 'rgb(0,0,255)', [1, 'rgb(255,0,0)']]],
                   )
           ]

    fig = Figure(data=data, layout=layout)
    return iplot(fig, link_text="")

plot_matrix(lossframe, "n_hidden_layers", "layer_size", x_suffix=" units", y_suffix=" layers")



(['1  units', '2  units', '4  units', '8  units'], ['1  layers', '2  layers', '3  layers'])


In [36]:
plot_matrix(lossframe, "n_hidden_layers", "reg_penalty", x_suffix="p", y_suffix=" layers")


(['0 p', '0 p', '0 p', '0 p', '1 p'], ['1.000000  layers', '2.000000  layers', '3.000000  layers'])


In [37]:
plot_matrix(lossframe, "reg_penalty", "layer_size", x_suffix=" units", y_suffix="p")


(['1.000000  units', '2.000000  units', '4.000000  units', '8.000000  units'], ['0 p', '0 p', '0 p', '0 p', '1 p'])


In [97]:
# 1-unit layers is not really a NN but anyway let's see how it does
model = MLPRegressor(hidden_layer_sizes=(1,1,1),
                     alpha=0.01,
                     activation='tanh',
                     max_iter=10000, 
                     tol=1e-10,
                     solver='lbfgs')
run_backtest(X, Y, model, startmonth=STARTMONTH, minmaxscale=False)

................................................................................
16:31:48 Still training 80 of 565
................................................................................
16:33:49 Still training 160 of 565
................................................................................
16:35:52 Still training 240 of 565
................................................................................
16:38:04 Still training 320 of 565
................................................................................
16:40:36 Still training 400 of 565
................................................................................
16:43:16 Still training 480 of 565
................................................................................
16:46:00 Still training 560 of 565
....
last prediction not stored
MSE across all predictions: 39.7718
Food.lead: long 109 times, short 61 times, total 170 times
Beer.lead: long 149 times, short 57 times, total 206 times
Smo

Unnamed: 0,Value
start,1970-01-31 00:00:00
end,2016-12-31 00:00:00
cagr,0.0253768
yearly_vol,0.0609663
yearly_sharpe,0.420087
max_drawdown,-0.247863
sortino,0.242389


In [None]:
was .028 -> .0219