# 5. Model selection

Now that we have data to train the models, we will search for good pipelines using the TPOT library. The hyperparameters for each pipeline will later be optimized using the hyperopt library.

In [1]:
import pandas as pd
import numpy as np

from utils import *

import joblib
from sklearn.model_selection import TimeSeriesSplit
from tpot import TPOTRegressor

In [2]:
filtered_vars = joblib.load('models/filtered_vars.joblib')
cutoff_date = joblib.load('models/cutoff_date.joblib')
df = pd.read_csv('models/req_data.csv', index_col=0, parse_dates=True).dropna()
feats = df.drop(labels=['target'], axis=1)
to_predict = df.loc[:, 'target']
complete_data = pd.read_csv('models/ohlcv.csv', index_col=0, parse_dates=True)
del df

### 5.1 Model by cluster phase

Following our market structure analysis, we will create a regression model for each cluster in the market structure using the subset of variables filtered during the previous step. The target is a 7-period rolling mean of the closing price returns.
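
As a rough illustration of this kind of target, the sketch below computes simple returns from a toy closing-price series and smooths them with a 7-period rolling mean. The `close` values are made up, and this section does not show whether the actual target is also shifted forward in time, so the alignment used here is an assumption.

```python
import pandas as pd

# Toy closing prices (hypothetical values, for illustration only).
close = pd.Series([100.0, 101.0, 100.5, 102.0, 103.0,
                   102.5, 104.0, 105.0, 104.5, 106.0])

# Simple period-over-period returns; the first value is NaN.
returns = close.pct_change()

# Trailing 7-period rolling mean of the returns.
target = returns.rolling(window=7).mean()

print(target.dropna())
```

With 10 prices there are 9 returns, so only the last 3 rows have a full 7-period window.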

We will use the TPOT library to search for pipelines. Each pipeline is evaluated on the training set with the Pearson correlation coefficient, using a 10-split time series cross-validation (to avoid look-ahead bias, as we are dealing with time series). The best pipelines will then forecast the test set, and those out-of-sample forecasts (not the training set ones) will be used in the trading strategy.
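
To make the look-ahead argument concrete, the sketch below lists the index ranges produced by scikit-learn's `TimeSeriesSplit` with 10 splits and a gap of 50 observations, the same settings used for the splitter later in this notebook. The sample count of 1000 is a hypothetical placeholder; the real training sets differ per cluster.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical sample count, for illustration only.
n_samples = 1000
X = np.zeros((n_samples, 1))

cv = TimeSeriesSplit(n_splits=10, gap=50)
folds = list(cv.split(X))

for i, (train_idx, test_idx) in enumerate(folds):
    # Every training index precedes every test index, separated by the gap.
    print(i, train_idx[0], train_idx[-1], test_idx[0], test_idx[-1])
```

Each fold trains on an expanding window of past observations and tests on a later block, so no fold is scored on data that precedes its training window.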

To use the Pearson correlation as the performance measure for optimizing pipelines, you must replace the gp_deap file in the TPOT library with the gp_deap_sec file located in this repository (more specifically, add lines 40 to 49 and line 484, and comment out the rest of the try block, as in the file).
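
The contents of gp_deap_sec are not reproduced here. The sketch below shows only the metric itself, written as a plain function named `pearson_score` (a hypothetical name). Returning 0 for constant inputs is an assumption made here to avoid NaN scores, not something taken from the patched file.

```python
import numpy as np

def pearson_score(y_true, y_pred):
    """Pearson correlation between targets and predictions (higher is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # A constant vector has no defined correlation; score it as 0 instead of NaN.
    if np.std(y_true) == 0 or np.std(y_pred) == 0:
        return 0.0
    return float(np.corrcoef(y_true, y_pred)[0, 1])

print(pearson_score([0.01, -0.02, 0.03, 0.00], [0.02, -0.01, 0.05, 0.01]))
print(pearson_score([0.01, -0.02, 0.03, 0.00], [0.0, 0.0, 0.0, 0.0]))
```

A function like this could probably be wrapped with `sklearn.metrics.make_scorer` and passed through TPOT's `scoring` argument instead of editing the library file. That route is not the one used in this notebook and is untested here.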

First, we'll define the search space for TPOT.

In [3]:
regressor_config_dict = {

    'sklearn.linear_model.ElasticNetCV': {
        'l1_ratio': np.arange(0.0, 1.01, 0.05),
        'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
    },

    'sklearn.ensemble.ExtraTreesRegressor': {
        'n_estimators': [100],
        'max_features': np.arange(0.05, 1.01, 0.05),
        'min_samples_split': range(2, 21),
        'min_samples_leaf': range(1, 21),
        'bootstrap': [True, False]
    },

    'lightgbm.LGBMRegressor':{
        'learning_rate': [0.01, 0.05, 0.1, 0.5],
        'num_leaves': np.arange(20, 120, 5),
        'min_child_samples': np.arange(5, 50, 5),
        'subsample': [0.7, 0.9, 1]  
    },
    
    'sklearn.ensemble.GradientBoostingRegressor': {
        'n_estimators': [100],
        'loss': ["ls", "lad", "huber", "quantile"],
        'learning_rate': [1e-3, 1e-2, 1e-1, 0.5, 1.],
        'max_depth': range(1, 11),
        'min_samples_split': range(2, 21),
        'min_samples_leaf': range(1, 21),
        'subsample': np.arange(0.05, 1.01, 0.05),
        'max_features': np.arange(0.05, 1.01, 0.05),
        'alpha': [0.75, 0.8, 0.85, 0.9, 0.95, 0.99]
    },

    'sklearn.ensemble.AdaBoostRegressor': {
        'n_estimators': [100],
        'learning_rate': [1e-3, 1e-2, 1e-1, 0.5, 1.],
        'loss': ["linear", "square", "exponential"]
    },

    'sklearn.tree.DecisionTreeRegressor': {
        'max_depth': range(1, 11),
        'min_samples_split': range(2, 21),
        'min_samples_leaf': range(1, 21)
    },

    'sklearn.neighbors.KNeighborsRegressor': {
        'n_neighbors': range(1, 101),
        'weights': ["uniform", "distance"],
        'p': [1, 2]
    },
    
    'sklearn.neighbors.RadiusNeighborsRegressor': {
        'leaf_size': np.arange(10, 60, 10),
        'p': [1, 2],
        'weights': ['uniform', 'distance']
    },

    'sklearn.linear_model.LassoLarsCV': {
        'normalize': [True, False]
    },

    'sklearn.svm.LinearSVR': {
        'loss': ["epsilon_insensitive", "squared_epsilon_insensitive"],
        'dual': [True, False],
        'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
        'C': [1e-4, 1e-3, 1e-2, 1e-1, 0.5, 1., 5., 10., 15., 20., 25.],
        'epsilon': [1e-4, 1e-3, 1e-2, 1e-1, 1.]
    },

    'sklearn.ensemble.RandomForestRegressor': {
        'n_estimators': [100],
        'max_features': np.arange(0.05, 1.01, 0.05),
        'min_samples_split': range(2, 21),
        'min_samples_leaf': range(1, 21),
        'bootstrap': [True, False]
    },

    'sklearn.linear_model.RidgeCV': {
    },
    
    'xgboost.XGBRegressor': {
        'n_estimators': [100],
        'max_depth': range(1, 11),
        'learning_rate': [1e-3, 1e-2, 1e-1, 0.5, 1.],
        'subsample': np.arange(0.05, 1.01, 0.05),
        'min_child_weight': range(1, 21),
        'n_jobs': [1],
        'verbosity': [0],
        'objective': ['reg:squarederror']
    },

    'sklearn.linear_model.SGDRegressor': {
        'loss': ['squared_loss', 'huber', 'epsilon_insensitive'],
        'penalty': ['elasticnet'],
        'alpha': [0.0, 0.01, 0.001] ,
        'learning_rate': ['invscaling', 'constant'] ,
        'fit_intercept': [True, False],
        'l1_ratio': [0.25, 0.0, 1.0, 0.75, 0.5],
        'eta0': [0.1, 1.0, 0.01],
        'power_t': [0.5, 0.0, 1.0, 0.1, 100.0, 10.0, 50.0]
    },

    # Preprocessors
    'sklearn.preprocessing.Binarizer': {
        'threshold': np.arange(0.0, 1.01, 0.05)
    },

    'sklearn.decomposition.FastICA': {
        'tol': np.arange(0.0, 1.01, 0.05)
    },

    'sklearn.cluster.FeatureAgglomeration': {
        'linkage': ['ward', 'complete', 'average'],
        'affinity': ['euclidean', 'l1', 'l2', 'manhattan', 'cosine']
    },

    'sklearn.preprocessing.MaxAbsScaler': {
    },

    'sklearn.preprocessing.PowerTransformer': {
    },
    
    'sklearn.preprocessing.QuantileTransformer': {
    },
    
    'sklearn.preprocessing.MinMaxScaler': {
    },

    'sklearn.preprocessing.Normalizer': {
        'norm': ['l1', 'l2', 'max']
    },
    
    'sklearn.preprocessing.KBinsDiscretizer': {
        'n_bins': [10, 50, 250, 500, 1000],
        'encode': ['ordinal'],
        'strategy': ['uniform', 'quantile']
    },

    'sklearn.kernel_approximation.Nystroem': {
        'kernel': ['rbf', 'cosine', 'chi2', 'laplacian', 'polynomial', 'poly', 'linear', 'additive_chi2', 'sigmoid'],
        'gamma': np.arange(0.0, 1.01, 0.05),
        'n_components': range(1, 11)
    },

    'sklearn.decomposition.PCA': {
        'svd_solver': ['randomized'],
        'iterated_power': range(1, 11)
    },

    'sklearn.preprocessing.PolynomialFeatures': {
        'degree': [2],
        'include_bias': [False],
        'interaction_only': [False]
    },

    'sklearn.kernel_approximation.RBFSampler': {
        'gamma': np.arange(0.0, 1.01, 0.05)
    },

    'sklearn.preprocessing.RobustScaler': {
    },

    'sklearn.preprocessing.StandardScaler': {
    },

    'tpot.builtins.ZeroCount': {
    },

    'tpot.builtins.OneHotEncoder': {
        'minimum_fraction': [0.05, 0.1, 0.15, 0.2, 0.25],
        'sparse': [False],
        'threshold': [10]
    },

    # Selectors
    'sklearn.feature_selection.SelectFwe': {
        'alpha': np.arange(0, 0.05, 0.001),
        'score_func': {
            'sklearn.feature_selection.f_regression': None
        }
    },

    'sklearn.feature_selection.SelectPercentile': {
        'percentile': range(1, 100),
        'score_func': {
            'sklearn.feature_selection.f_regression': None
        }
    },

    'sklearn.feature_selection.VarianceThreshold': {
        'threshold': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2]
    },

    'sklearn.feature_selection.SelectFromModel': {
        'threshold': np.arange(0, 1.01, 0.05),
        'estimator': {
            'sklearn.ensemble.ExtraTreesRegressor': {
                'n_estimators': [100],
                'max_features': np.arange(0.05, 1.01, 0.05)
            }
        }
    }

}
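
To get a feel for the size of this search space, the sketch below counts the hyperparameter combinations for three of the operators above, copied into a reduced dictionary. The helper `count_settings` is hypothetical and only multiplies the number of candidate values per parameter; it says nothing about how TPOT actually samples or combines operators.

```python
import math

# Reduced copy of three entries from the search space, using plain Python ranges.
small_config = {
    'sklearn.tree.DecisionTreeRegressor': {
        'max_depth': range(1, 11),
        'min_samples_split': range(2, 21),
        'min_samples_leaf': range(1, 21),
    },
    'sklearn.neighbors.KNeighborsRegressor': {
        'n_neighbors': range(1, 101),
        'weights': ["uniform", "distance"],
        'p': [1, 2],
    },
    'sklearn.linear_model.RidgeCV': {},
}

def count_settings(params):
    """Number of distinct hyperparameter combinations for one operator."""
    return math.prod(len(values) for values in params.values())

for name, params in small_config.items():
    print(name, count_settings(params))
```

The decision tree alone has 10 × 19 × 20 = 3800 settings and the nearest-neighbours regressor 400, before any preprocessors or selectors are chained in front of them.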

In [4]:
vars_to_lag = ['h_high_close', 'h_low_close', 'h_candle_body', 'h_rsi_13h', 'h_ema_50', 'h_ema_200', 'h_obv10_obv50',
              'h_obv50_obv200', 'h_close_ma']
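
These variables are later passed to `shift_dataset` (imported from `utils`, not shown in this section). As a rough sketch of what lagging a feature amounts to, the hypothetical helper `add_lags` below appends shifted copies of the chosen columns and drops rows without a full history. It is an illustration under those assumptions, not the implementation in `utils`.

```python
import pandas as pd

def add_lags(frame, columns, nlag):
    """Append lagged copies (1..nlag periods back) of the chosen columns.

    Hypothetical helper for illustration; not the `shift_dataset` from `utils`.
    """
    lagged = {
        f"{col}_lag{k}": frame[col].shift(k)
        for col in columns
        for k in range(1, nlag + 1)
    }
    out = pd.concat([frame, pd.DataFrame(lagged, index=frame.index)], axis=1)
    # Rows without a full lag history contain NaN and are dropped.
    return out.dropna()

# Toy values, for illustration only.
toy = pd.DataFrame({'h_rsi_13h': [50.0, 55.0, 60.0, 58.0, 52.0],
                    'h_weekday': [0, 1, 2, 3, 4]})
result = add_lags(toy, columns=['h_rsi_13h'], nlag=2)
print(result)
```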

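As we saw earlier, asset returns are highly skewed and have high kurtosis. Some data transformations can help control this. For this problem, we will test the cube root and the inverse hyperbolic sine (arsinh). Each transformation is stored together with its inverse so that predictions can be mapped back to the original scale.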

In [5]:
# Each entry maps a name to [forward transform, inverse transform]
data_transformations = {'none': [lambda x: x, lambda x: x],
                        'arsinh': [lambda x: np.arcsinh(x), lambda x: np.sinh(x)],
                        'cuberoot': [lambda x: np.cbrt(x), lambda x: x**3]}
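
As a quick sanity check of transformations like these, the sketch below applies each one to a synthetic heavy-tailed sample (Student's t, not market data), confirms that the inverse recovers the original values, and prints the sample excess kurtosis after the transformation. The helper `excess_kurtosis` and the sample are assumptions introduced for this illustration.

```python
import numpy as np

# Same structure as above: name -> [forward transform, inverse transform].
transforms = {
    'none': [lambda x: x, lambda x: x],
    'arsinh': [np.arcsinh, np.sinh],
    'cuberoot': [np.cbrt, lambda x: x ** 3],
}

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a normal distribution)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

# Synthetic heavy-tailed sample (Student's t, 3 degrees of freedom); not market data.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=10_000)

for name, (forward, inverse) in transforms.items():
    transformed = forward(sample)
    # Applying the inverse should recover the original values.
    assert np.allclose(inverse(transformed), sample)
    print(name, round(excess_kurtosis(transformed), 2))
```

Both non-trivial transformations compress large absolute values, so one would expect lower kurtosis than for the untransformed sample.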

In [6]:
oos_predictions = {}
# Variables that won't get transformed (these are the categorical ones)
do_not_transform = ['h_weekday', 'hour_sin', 'hour_cos', 'month_sin', 'month_cos', 'cluster_mode', 'd_obv10_obv50',
                   'd_obv50_obv200', 'd_hc_15davg', 'd_lc_15davg', 'd_cb_15davg', 'd_rsi_13', 'd_ret60d']
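
The sketch below shows, on a toy frame with made-up values, how an exclusion list like this can be turned into a boolean column mask so that only the remaining columns are transformed. The names `toy` and `keep_raw` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame with one continuous and one excluded column (values are made up).
toy = pd.DataFrame({'h_candle_body': [-2.0, 0.0, 3.0],
                    'h_weekday': [0.0, 1.0, 2.0]})
keep_raw = ['h_weekday']

# Boolean mask selecting only the columns that should be transformed.
mask = ~toy.columns.isin(keep_raw)

transformed = toy.copy()
transformed.loc[:, mask] = np.arcsinh(toy.loc[:, mask])
print(transformed)
```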

In [7]:
cv = TimeSeriesSplit(n_splits=10, gap=50)
tpot_model = TPOTRegressor(generations=10, population_size=25, periodic_checkpoint_folder='checkpoints/', random_state=33, 
                           verbosity=2, cv=cv, config_dict=regressor_config_dict)
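
According to the TPOT documentation, a run evaluates population_size + generations × offspring_size pipelines. The arithmetic below applies that formula to the settings above, assuming `offspring_size` is left at its default (equal to `population_size`). The count of model fits is only a rough upper bound, since it ignores any pipelines TPOT skips or caches.

```python
# Values taken from the TPOTRegressor call and the cross-validation splitter above.
population_size = 25
generations = 10
n_splits = 10

# Assumption: offspring_size is left at its default, equal to population_size.
offspring_size = population_size

pipelines_evaluated = population_size + generations * offspring_size
fits_upper_bound = pipelines_evaluated * n_splits

print(pipelines_evaluated)  # 275
print(fits_upper_bound)     # 2750
```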

In [8]:
for key, value in data_transformations.items():
    for cluster in filtered_vars.keys():
        if cluster != 'all':
            print(key, cluster)
            cluster_index = feats[feats['cluster_mode']==cluster].index
            temp_features = feats.copy()
            temp_features.loc[:, ~temp_features.columns.isin(do_not_transform)] = temp_features.loc[:, ~temp_features.columns.isin(do_not_transform)].apply(value[0], axis=1) 
            lagged_feature = shift_dataset(temp_features.copy(), lag=True, forecast=False, nlag=50, dropna=True,
                                          var_lags=vars_to_lag)
            lagged_feature_train = lagged_feature.loc[np.intersect1d(cluster_index, lagged_feature.index), filtered_vars[cluster]]
            lagged_feature_train = lagged_feature_train.loc[:cutoff_date, :]
            lagged_feature_test = lagged_feature.loc[cutoff_date:, filtered_vars[cluster]]
            temp_target = to_predict.loc[lagged_feature_train.index].apply(value[0])
            tpot_model.fit(lagged_feature_train, temp_target)
            tpot_model.export('checkpoints/' + str(int(cluster)) + '_transf_' + key + '_best_model.py')
            joblib.dump(tpot_model.fitted_pipeline_, 'models/' + str(cluster) + '_transf_' + key + '_model.joblib')
            oos_predictions[str(int(cluster)) + '_transf_' + key] = pd.Series(data=tpot_model.predict(lagged_feature_test), index=lagged_feature_test.index).apply(value[1])
            del temp_features, lagged_feature, lagged_feature_train, lagged_feature_test

none 0.0


Generation 1 - Current best internal CV score: 0.052236197785574534

Generation 2 - Current best internal CV score: 0.052236197785574534

Generation 3 - Current best internal CV score: 0.052236197785574534

Generation 4 - Current best internal CV score: 0.052236197785574534

Generation 5 - Current best internal CV score: 0.052236197785574534

Generation 6 - Current best internal CV score: 0.05405854415814428

Generation 7 - Current best internal CV score: 0.05457910161308192

Generation 8 - Current best internal CV score: 0.05457910161308192

Generation 9 - Current best internal CV score: 0.05457910161308192

Generation 10 - Current best internal CV score: 0.05457910161308192

Best pipeline: GradientBoostingRegressor(QuantileTransformer(KBinsDiscretizer(input_matrix, encode=ordinal, n_bins=500, strategy=uniform)), alpha=0.9, learning_rate=0.01, loss=huber, max_depth=9, max_features=0.05, min_samples_leaf=9, min_samples_split=8, n_estimators=100, subsample=0.3)
none 1.0


Generation 1 - Current best internal CV score: 0.1633456626511661

Generation 2 - Current best internal CV score: 0.182855678405291

Generation 3 - Current best internal CV score: 0.182855678405291

Generation 4 - Current best internal CV score: 0.182855678405291

Generation 5 - Current best internal CV score: 0.182855678405291

Generation 6 - Current best internal CV score: 0.19794663444619195

Generation 7 - Current best internal CV score: 0.20003949834726836

Generation 8 - Current best internal CV score: 0.20003949834726836

Generation 9 - Current best internal CV score: 0.21312950614370854

Generation 10 - Current best internal CV score: 0.21312950614370854

Best pipeline: ExtraTreesRegressor(input_matrix, bootstrap=True, max_features=0.05, min_samples_leaf=15, min_samples_split=15, n_estimators=100)
none 2.0


Generation 1 - Current best internal CV score: 0.13395351088854857

Generation 2 - Current best internal CV score: 0.13395351088854857

Generation 3 - Current best internal CV score: 0.1373292411921046

Generation 4 - Current best internal CV score: 0.1373292411921046

Generation 5 - Current best internal CV score: 0.1373292411921046

Generation 6 - Current best internal CV score: 0.1566121109699179

Generation 7 - Current best internal CV score: 0.1566121109699179

Generation 8 - Current best internal CV score: 0.1592132277562311

Generation 9 - Current best internal CV score: 0.1592132277562311

Generation 10 - Current best internal CV score: 0.1592132277562311

Best pipeline: ExtraTreesRegressor(KBinsDiscretizer(input_matrix, encode=ordinal, n_bins=50, strategy=quantile), bootstrap=True, max_features=0.5, min_samples_leaf=12, min_samples_split=8, n_estimators=100)
none 3.0


Generation 1 - Current best internal CV score: 0.14905548407589736

Generation 2 - Current best internal CV score: 0.17353322307831523

Generation 3 - Current best internal CV score: 0.17353322307831523

Generation 4 - Current best internal CV score: 0.17353322307831523

Generation 5 - Current best internal CV score: 0.17353322307831523

Generation 6 - Current best internal CV score: 0.17353322307831523

Generation 7 - Current best internal CV score: 0.17353322307831523

Generation 8 - Current best internal CV score: 0.17771117926724003

Generation 9 - Current best internal CV score: 0.17771117926724003

Generation 10 - Current best internal CV score: 0.17771117926724003

Best pipeline: RandomForestRegressor(Normalizer(GradientBoostingRegressor(input_matrix, alpha=0.95, learning_rate=0.5, loss=ls, max_depth=2, max_features=0.7000000000000001, min_samples_leaf=8, min_samples_split=14, n_estimators=100, subsample=0.55), norm=l1), bootstrap=True, max_features=1.0, min_samples_leaf=5, min

Generation 1 - Current best internal CV score: 0.049879233651453886

Generation 2 - Current best internal CV score: 0.04988466736926878

Generation 3 - Current best internal CV score: 0.04988466736926878

Generation 4 - Current best internal CV score: 0.04988466736926878

Generation 5 - Current best internal CV score: 0.05477470785051934

Generation 6 - Current best internal CV score: 0.056335188060048405

Generation 7 - Current best internal CV score: 0.056335188060048405

Generation 8 - Current best internal CV score: 0.06015583901525355

Generation 9 - Current best internal CV score: 0.06015583901525355

Generation 10 - Current best internal CV score: 0.06015583901525355

Best pipeline: XGBRegressor(input_matrix, learning_rate=0.1, max_depth=1, min_child_weight=13, n_estimators=100, n_jobs=1, objective=reg:squarederror, subsample=0.05, verbosity=0)
arsinh 1.0


Generation 1 - Current best internal CV score: 0.17421370906452213

Generation 2 - Current best internal CV score: 0.17421370906452213

Generation 3 - Current best internal CV score: 0.18094518836007834

Generation 4 - Current best internal CV score: 0.1865424929910794

Generation 5 - Current best internal CV score: 0.1865424929910794

Generation 6 - Current best internal CV score: 0.19509754964449658

Generation 7 - Current best internal CV score: 0.19509754964449658

Generation 8 - Current best internal CV score: 0.19509754964449658

Generation 9 - Current best internal CV score: 0.19749571949702951

Generation 10 - Current best internal CV score: 0.2066364058706891

Best pipeline: GradientBoostingRegressor(GradientBoostingRegressor(Normalizer(QuantileTransformer(input_matrix), norm=l1), alpha=0.85, learning_rate=0.5, loss=quantile, max_depth=10, max_features=0.9000000000000001, min_samples_leaf=2, min_samples_split=18, n_estimators=100, subsample=0.05), alpha=0.99, learning_rate=0.

Generation 1 - Current best internal CV score: 0.13740276110358832

Generation 2 - Current best internal CV score: 0.14262578139672916

Generation 3 - Current best internal CV score: 0.14262578139672916

Generation 4 - Current best internal CV score: 0.14262578139672916

Generation 5 - Current best internal CV score: 0.14262578139672916

Generation 6 - Current best internal CV score: 0.14568479429851106

Generation 7 - Current best internal CV score: 0.14568479429851106

Generation 8 - Current best internal CV score: 0.14568479429851106

Generation 9 - Current best internal CV score: 0.14568479429851106

Generation 10 - Current best internal CV score: 0.14568479429851106

Best pipeline: ExtraTreesRegressor(KBinsDiscretizer(input_matrix, encode=ordinal, n_bins=50, strategy=quantile), bootstrap=True, max_features=0.5, min_samples_leaf=4, min_samples_split=8, n_estimators=100)
arsinh 3.0


Generation 1 - Current best internal CV score: 0.14844588123201108

Generation 2 - Current best internal CV score: 0.1495956501142729

Generation 3 - Current best internal CV score: 0.16424017973554356

Generation 4 - Current best internal CV score: 0.16424017973554356

Generation 5 - Current best internal CV score: 0.16424017973554356

Generation 6 - Current best internal CV score: 0.16424017973554356

Generation 7 - Current best internal CV score: 0.16424017973554356

Generation 8 - Current best internal CV score: 0.16424017973554356

Generation 9 - Current best internal CV score: 0.16424017973554356

Generation 10 - Current best internal CV score: 0.17626444152144083

Best pipeline: XGBRegressor(SGDRegressor(LassoLarsCV(input_matrix, normalize=True), alpha=0.01, eta0=0.1, fit_intercept=True, l1_ratio=0.5, learning_rate=constant, loss=epsilon_insensitive, penalty=elasticnet, power_t=50.0), learning_rate=0.5, max_depth=3, min_child_weight=13, n_estimators=100, n_jobs=1, objective=reg

Generation 1 - Current best internal CV score: 0.06610868297807639

Generation 2 - Current best internal CV score: 0.06610868297807639

Generation 3 - Current best internal CV score: 0.06867542476641568

Generation 4 - Current best internal CV score: 0.0716354643428535

Generation 5 - Current best internal CV score: 0.07274280841008887

Generation 6 - Current best internal CV score: 0.0778398984644227

Generation 7 - Current best internal CV score: 0.0778398984644227

Generation 8 - Current best internal CV score: 0.0778398984644227

Generation 9 - Current best internal CV score: 0.0778398984644227

Generation 10 - Current best internal CV score: 0.0778398984644227

Best pipeline: XGBRegressor(input_matrix, learning_rate=0.1, max_depth=5, min_child_weight=12, n_estimators=100, n_jobs=1, objective=reg:squarederror, subsample=0.8500000000000001, verbosity=0)
cuberoot 1.0


Generation 1 - Current best internal CV score: 0.14707204622543327

Generation 2 - Current best internal CV score: 0.1592765856291382

Generation 3 - Current best internal CV score: 0.16029502516249988

Generation 4 - Current best internal CV score: 0.16781695504282915

Generation 5 - Current best internal CV score: 0.16975520463249744

Generation 6 - Current best internal CV score: 0.16975520463249744

Generation 7 - Current best internal CV score: 0.16975520463249744

Generation 8 - Current best internal CV score: 0.18796847677110312

Generation 9 - Current best internal CV score: 0.18796847677110312

Generation 10 - Current best internal CV score: 0.18796847677110312

Best pipeline: ExtraTreesRegressor(SGDRegressor(GradientBoostingRegressor(input_matrix, alpha=0.85, learning_rate=0.001, loss=quantile, max_depth=8, max_features=0.9500000000000001, min_samples_leaf=19, min_samples_split=6, n_estimators=100, subsample=0.7000000000000001), alpha=0.01, eta0=0.01, fit_intercept=True, l1_

Generation 1 - Current best internal CV score: 0.15414712388266555

Generation 2 - Current best internal CV score: 0.15600407107684258

Generation 3 - Current best internal CV score: 0.15600407107684258

Generation 4 - Current best internal CV score: 0.15681966528797417

Generation 5 - Current best internal CV score: 0.15899557939254186

Generation 6 - Current best internal CV score: 0.1637743166506415

Generation 7 - Current best internal CV score: 0.16729377360957737

Generation 8 - Current best internal CV score: 0.16729377360957737

Generation 9 - Current best internal CV score: 0.16729377360957737

Generation 10 - Current best internal CV score: 0.16729377360957737

Best pipeline: ExtraTreesRegressor(Normalizer(input_matrix, norm=l1), bootstrap=True, max_features=0.5, min_samples_leaf=4, min_samples_split=8, n_estimators=100)
cuberoot 3.0


Generation 1 - Current best internal CV score: 0.14000734639828488

Generation 2 - Current best internal CV score: 0.14528802728049944

Generation 3 - Current best internal CV score: 0.14528802728049944

Generation 4 - Current best internal CV score: 0.14528802728049944

Generation 5 - Current best internal CV score: 0.16114890701060053

Generation 6 - Current best internal CV score: 0.16979638710208972

Generation 7 - Current best internal CV score: 0.16979638710208972

Generation 8 - Current best internal CV score: 0.16979638710208972

Generation 9 - Current best internal CV score: 0.16979638710208972

Generation 10 - Current best internal CV score: 0.16979638710208972

Best pipeline: RandomForestRegressor(Normalizer(input_matrix, norm=l1), bootstrap=True, max_features=0.45, min_samples_leaf=8, min_samples_split=7, n_estimators=100)


### 5.2 Model for all data

Our initial hypothesis was that clustering the market structure and fitting one model per cluster would improve performance, since each model would specialize in the distribution of its own cluster.

To check if this holds true, we will create an additional model trained on the entire dataset. 
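
Once both sets of forecasts exist, one way to compare them is to build a single series that takes, for each timestamp, the forecast of the model trained on that timestamp's cluster, and then score it against the single-model forecast. The sketch below does this on made-up data. The key names follow the pattern used for `oos_predictions`, but the values, the two-cluster setup and the use of Pearson correlation for the comparison are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up test-period data: cluster label, realized target and model forecasts.
idx = pd.date_range('2021-01-01', periods=6, freq='D')
cluster_mode = pd.Series([0.0, 0.0, 1.0, 1.0, 0.0, 1.0], index=idx)
realized = pd.Series([0.01, -0.02, 0.03, 0.00, 0.02, -0.01], index=idx)
preds = {
    '0_transf_none': pd.Series([0.02, -0.01, 0.00, 0.00, 0.01, 0.00], index=idx),
    '1_transf_none': pd.Series([0.00, 0.00, 0.02, 0.01, 0.00, -0.02], index=idx),
    'all_none': pd.Series([0.01, 0.00, 0.01, 0.01, 0.00, -0.01], index=idx),
}

# For each timestamp, keep the forecast of the model trained on its cluster.
by_cluster = pd.Series(np.nan, index=idx)
for cluster in cluster_mode.unique():
    rows = cluster_mode == cluster
    by_cluster[rows] = preds[f'{int(cluster)}_transf_none'][rows]

print('per-cluster models:', by_cluster.corr(realized))
print('single model:      ', preds['all_none'].corr(realized))
```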

In [9]:
for key, value in data_transformations.items():
    print(key)
    temp_features = feats.copy()
    temp_features.loc[:, ~temp_features.columns.isin(do_not_transform)] = temp_features.loc[:, ~temp_features.columns.isin(do_not_transform)].apply(value[0], axis=1) 
    lagged_feature = shift_dataset(temp_features.copy(), lag=True, forecast=False, nlag=50, var_lags=vars_to_lag, dropna=True)
    lagged_feature_train = lagged_feature.loc[:cutoff_date, filtered_vars['all']]
    lagged_feature_test = lagged_feature.loc[cutoff_date:, filtered_vars['all']]
    temp_target = to_predict.loc[lagged_feature_train.index].apply(value[0])
    tpot_model.fit(lagged_feature_train, temp_target)
    tpot_model.export('checkpoints/best_model_all_' + key + '.py')
    joblib.dump(tpot_model.fitted_pipeline_, 'models/best_model_all_' + key + '.joblib')
    oos_predictions['all_' + key] = pd.Series(data=tpot_model.predict(lagged_feature_test), index=lagged_feature_test.index).apply(value[1])
    del lagged_feature, temp_features, lagged_feature_train, lagged_feature_test

none


Generation 1 - Current best internal CV score: 0.12071817080508569

Generation 2 - Current best internal CV score: 0.12071817080508569

Generation 3 - Current best internal CV score: 0.12110870375150125

Generation 4 - Current best internal CV score: 0.1218100428607282

Generation 5 - Current best internal CV score: 0.1218100428607282

Generation 6 - Current best internal CV score: 0.12397649637376836

Generation 7 - Current best internal CV score: 0.12397649637376836

Generation 8 - Current best internal CV score: 0.12397649637376836

Generation 9 - Current best internal CV score: 0.12397649637376836

Generation 10 - Current best internal CV score: 0.12477542006995652

Best pipeline: GradientBoostingRegressor(SelectFwe(input_matrix, alpha=0.029), alpha=0.99, learning_rate=0.001, loss=lad, max_depth=9, max_features=0.2, min_samples_leaf=13, min_samples_split=8, n_estimators=100, subsample=0.7500000000000001)
arsinh


Generation 1 - Current best internal CV score: 0.12310819534039193

Generation 2 - Current best internal CV score: 0.12310819534039193

Generation 3 - Current best internal CV score: 0.12346068754985876

Generation 4 - Current best internal CV score: 0.12346068754985876

Generation 5 - Current best internal CV score: 0.12422596892024666

Generation 6 - Current best internal CV score: 0.12422596892024666

Generation 7 - Current best internal CV score: 0.12433560006491413

Generation 8 - Current best internal CV score: 0.12531003565377277

Generation 9 - Current best internal CV score: 0.1262059879568359

Generation 10 - Current best internal CV score: 0.1274978221247393

Best pipeline: GradientBoostingRegressor(StandardScaler(SelectFwe(input_matrix, alpha=0.048)), alpha=0.75, learning_rate=0.001, loss=lad, max_depth=9, max_features=0.2, min_samples_leaf=16, min_samples_split=18, n_estimators=100, subsample=0.4)
cuberoot


Generation 1 - Current best internal CV score: 0.10556322762573167

Generation 2 - Current best internal CV score: 0.1202716138538257

Generation 3 - Current best internal CV score: 0.1202716138538257

Generation 4 - Current best internal CV score: 0.1202716138538257

Generation 5 - Current best internal CV score: 0.1202716138538257

Generation 6 - Current best internal CV score: 0.1202716138538257

Generation 7 - Current best internal CV score: 0.12041377047744233

Generation 8 - Current best internal CV score: 0.12281671220320015

Generation 9 - Current best internal CV score: 0.12281671220320015

Generation 10 - Current best internal CV score: 0.12281671220320015

Best pipeline: ExtraTreesRegressor(KBinsDiscretizer(input_matrix, encode=ordinal, n_bins=500, strategy=quantile), bootstrap=True, max_features=0.5, min_samples_leaf=18, min_samples_split=8, n_estimators=100)
