<div align='center'>
    <h1>AutoML Tutorial</h1>
    <img src='https://github.com/vopani/fortyone/blob/main/images/automl_banner_530_x_455.png?raw=true'>
</div>

**Auto**mated **M**achine **L**earning (**AutoML**) is widely used to build, experiment with, and productionize machine learning models across business use cases.

Several open-source AutoML libraries are available. This notebook builds a simple baseline with each of the following on the [Kaggle TPS (September 2021) competition](https://www.kaggle.com/c/tabular-playground-series-sep-2021):

* [AutoGluon](#AutoGluon)
* [EvalML](#EvalML)
* [FLAML](#FLAML)
* [H2O AutoML](#H2O-AutoML)
* [LightAutoML](#LightAutoML)
* [MLJAR](#MLJAR)
* [TPOT](#TPOT)

In [1]:
## define configuration
PATH_TRAIN = '../input/tabular-playground-series-sep-2021/train.csv'
PATH_TEST = '../input/tabular-playground-series-sep-2021/test.csv'

PATH_AUTOGLUON_SUBMISSION = 'submission_autogluon.csv'
PATH_EVALML_SUBMISSION = 'submission_evalml.csv'
PATH_FLAML_SUBMISSION = 'submission_flaml.csv'
PATH_H2OAML_SUBMISSION = 'submission_h2oaml.csv'
PATH_LAML_SUBMISSION = 'submission_laml.csv'
PATH_MLJAR_SUBMISSION = 'submission_mljar.csv'
PATH_TPOT_SUBMISSION = 'submission_tpot.csv'

MAX_MODEL_RUNTIME_MINS = 10
MAX_MODEL_RUNTIME_SECS = MAX_MODEL_RUNTIME_MINS * 60
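
Each library in this notebook takes its time budget under a different keyword. The sketch below collects those keywords in one place. The names are copied from the calls made later in the notebook, while the `TIME_BUDGET_KWARGS` dictionary itself is a hypothetical helper added only for illustration.

```python
# Values copied from the configuration cell above.
MAX_MODEL_RUNTIME_MINS = 10
MAX_MODEL_RUNTIME_SECS = MAX_MODEL_RUNTIME_MINS * 60

# Hypothetical lookup table (not part of the notebook): the keyword that each
# call in this notebook uses for its time budget.
TIME_BUDGET_KWARGS = {
    'autogluon': {'time_limit': MAX_MODEL_RUNTIME_SECS},      # TabularPredictor.fit
    'evalml': {'max_time': MAX_MODEL_RUNTIME_SECS},           # AutoMLSearch
    'flaml': {'time_budget': MAX_MODEL_RUNTIME_SECS},         # AutoML.fit
    'h2o': {'max_runtime_secs': MAX_MODEL_RUNTIME_SECS},      # H2OAutoML
    'lightautoml': {'timeout': MAX_MODEL_RUNTIME_SECS},       # TabularAutoML
    'mljar': {'total_time_limit': MAX_MODEL_RUNTIME_SECS},    # supervised.AutoML
    'tpot': {'max_time_mins': MAX_MODEL_RUNTIME_MINS},        # TPOTClassifier
}

for library, kwargs in TIME_BUDGET_KWARGS.items():
    print(f'{library:12s} {kwargs}')
```

The budget should be read as a target rather than a hard cap: the MLJAR log further down reports a fit time of 1200.87 seconds against a 600-second limit.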

In [2]:
## prepare data
import gc
import os
import shutil
import datatable as dt
from pathlib import Path
import warnings

warnings.filterwarnings('ignore')

train = dt.fread(PATH_TRAIN)
test = dt.fread(PATH_TEST)

target = train['claim'].to_numpy().ravel()
test_ids = test['id']

del train[:, ['id', 'claim']]
test = test[:, train.names]

## AutoGluon
<img src='https://user-images.githubusercontent.com/16392542/77208906-224aa500-6aba-11ea-96bd-e81806074030.png' width='250px'>

[AutoGluon](https://auto.gluon.ai/stable/index.html) is an AutoML library open-sourced by [Amazon](http://amazon.com/aws).

In [3]:
## install packages
!python3 -m pip install -q "mxnet<2.0.0"
!python3 -m pip install -q autogluon
!python3 -m pip install -q -U graphviz
!python3 -m pip install -q -U scikit-learn

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
transformers 4.6.1 requires tokenizers<0.11,>=0.10.1, but you have tokenizers 0.9.4 which is incompatible.
kornia 0.5.5 requires numpy<=1.19, but you have numpy 1.19.5 which is incompatible.
gym 0.18.3 requires Pillow<=8.2.0, but you have pillow 8.3.2 which is incompatible.
allennlp 2.5.0 requires torch<1.9.0,>=1.6.0, but you have torch 1.9.0 which is incompatible.
allennlp 2.5.0 requires torchvision<0.10.0,>=0.8.1, but you have torchvision 0.10.0 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mxnet 1.8.0.post0 requires graphviz<0.9.0,>=0.8.1, but you have graphviz 0.17 which is incompatible.
ERROR: pip's dependency resolver does not c

In [4]:
## import packages
from autogluon.tabular import TabularPredictor

In [5]:
## run model
train['target'] = dt.Frame(target)

model_autogluon = TabularPredictor(label='target')
model_autogluon.fit(train_data=train.to_pandas(), excluded_model_types=['KNN'], time_limit=MAX_MODEL_RUNTIME_SECS)

del train['target']

In [6]:
## check leaderboard
model_autogluon.leaderboard()

,model,score_val,pred_time_val,fit_time,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,WeightedEnsemble_L2,0.745407,0.32723,102.390846,0.034116,4.28877,2,True,7
1,LightGBM,0.745303,0.088773,19.326672,0.088773,19.326672,1,True,2
2,LightGBMXT,0.741232,0.093913,52.793999,0.093913,52.793999,1,True,1
3,CatBoost,0.729541,0.039162,17.665062,0.039162,17.665062,1,True,3
4,ExtraTreesEntr,0.568476,0.343795,121.263236,0.343795,121.263236,1,True,5
5,ExtraTreesGini,0.560334,0.342965,116.546933,0.342965,116.546933,1,True,4
6,XGBoost,0.542276,0.110428,25.981405,0.110428,25.981405,1,True,6


In [7]:
## generate predictions
preds_autogluon = model_autogluon.predict_proba(test.to_pandas())[True]

In [8]:
## create submission
submission = dt.Frame(id=test_ids, claim=dt.Frame(preds_autogluon))
submission.head()

,id,claim
0,957919,0.485707
1,957920,0.436409
2,957921,0.470409
3,957922,0.436781
4,957923,0.436115
5,957924,0.438366
6,957925,0.554601
7,957926,0.436548
8,957927,0.485476
9,957928,0.536736


In [9]:
## save submission
submission.to_csv(PATH_AUTOGLUON_SUBMISSION)
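
Before uploading a submission file it can be worth running a quick sanity check on it. The sketch below is one possible check, not part of the original notebook: `check_submission` is a hypothetical helper, and it assumes the file has exactly the two columns `id` and `claim` with probabilities in [0, 1], as in the preview above. The demo writes a tiny throwaway file rather than touching the real submission.

```python
import csv
import os
import tempfile


def check_submission(path):
    """Return the number of rows after basic sanity checks on a submission CSV."""
    with open(path, newline='') as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ['id', 'claim']:
            raise ValueError(f'unexpected columns: {reader.fieldnames}')
        ids = set()
        n_rows = 0
        for row in reader:
            claim = float(row['claim'])
            if not 0.0 <= claim <= 1.0:
                raise ValueError(f'claim out of range: {claim}')
            ids.add(row['id'])
            n_rows += 1
    if len(ids) != n_rows:
        raise ValueError('duplicate ids found')
    return n_rows


# Demo on a throwaway file; the three rows are copied from the preview above.
demo_path = os.path.join(tempfile.mkdtemp(), 'submission_demo.csv')
with open(demo_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'claim'])
    writer.writerows([[957919, 0.485707], [957920, 0.436409], [957921, 0.470409]])

n_checked = check_submission(demo_path)
print(n_checked)
```

The same helper could be pointed at any of the `PATH_*_SUBMISSION` files once they have been written.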

In [10]:
## clear memory
shutil.rmtree('AutogluonModels')
del model_autogluon

gc.collect()

577

Read more in the [AutoGluon documentation](https://auto.gluon.ai/stable/index.html).

## EvalML
<img src='https://evalml.alteryx.com/en/stable/_images/evalml_horizontal.svg' width='250px'>

[EvalML](https://evalml.alteryx.com/en/stable) is an AutoML library open-sourced by [Alteryx](http://www.alteryx.com).

In [11]:
## install packages
!python3 -m pip install -q evalml==0.28.0

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.21.2 which is incompatible.
transformers 4.6.1 requires tokenizers<0.11,>=0.10.1, but you have tokenizers 0.9.4 which is incompatible.
tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.21.2 which is incompatible.
preprocessing 0.1.13 requires nltk==3.2.4, but you have nltk 3.6.2 which is incompatible.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.4.2 which is incompatible.
mxnet 1.8.0.post0 requires graphviz<0.9.0,>=0.8.1, but you have graphviz 0.17 which is incompatible.
matrixprofile 1.1.10 requires protobuf==3.11.2, but you have protobuf 3.17.3 which is incompatible.
kornia 0.5.5 requires numpy<=1.19, but you have numpy 1.21.2 which is incompatible.
hypertools 0.7.0 requires scikit-learn!=0.22

In [12]:
## import packages
from evalml.automl import AutoMLSearch

In [13]:
## run model
model_evalml = AutoMLSearch(X_train=train.to_pandas(), y_train=target, problem_type='binary', max_time=MAX_MODEL_RUNTIME_SECS)
model_evalml.search()

Generating pipelines to search over...
8 pipelines ready for search.

*****************************
* Beginning pipeline search *
*****************************

Optimizing for Log Loss Binary. 
Lower score is better.

Using SequentialEngine to train and score pipelines.
Will stop searching for new pipelines after 600 seconds.

Allowed model families: linear_model, xgboost, extra_trees, random_forest, decision_tree, catboost, lightgbm



Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline
Mode Baseline Binary Classification Pipeline:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 17.240

*****************************
* Evaluating Batch Number 1 *
*****************************

Elastic Net Classifier w/ Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 0.691
Decision Tree Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 0.703
Random Forest Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 0.690
LightGBM Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 0.635
Logistic Regression Classifier w/ Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 0.691
XGBoost Classifier w/ Imputer:
	Starting cross validation
	Fin

In [14]:
## check leaderboard
model_evalml.rankings

,id,pipeline_name,search_order,mean_cv_score,standard_deviation_cv_score,validation_score,percent_better_than_baseline,high_variance_cv,parameters
0,4,LightGBM Classifier w/ Imputer,4,0.634804,,0.634804,96.317805,False,"{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'LightGBM Classifier': {'boosting_type': 'gbdt', 'learning_rate': 0.1, 'n_estimators': 100, 'max_depth': 0, 'num_leaves': 31, 'min_child_samples': 20, 'n_jobs': -1, 'bagging_freq': 0, 'bagging_fraction': 0.9}}"
1,6,XGBoost Classifier w/ Imputer,6,0.639599,,0.639599,96.289992,False,"{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'XGBoost Classifier': {'eta': 0.1, 'max_depth': 6, 'min_child_weight': 1, 'n_estimators': 100, 'n_jobs': -1}}"
6,3,Random Forest Classifier w/ Imputer,3,0.689998,,0.689998,95.997649,False,"{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}"
7,1,Elastic Net Classifier w/ Imputer + Standard Scaler,1,0.690828,,0.690828,95.992833,False,"{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'Elastic Net Classifier': {'penalty': 'elasticnet', 'C': 1.0, 'l1_ratio': 0.15, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'saga'}}"
8,5,Logistic Regression Classifier w/ Imputer + Standard Scaler,5,0.690829,,0.690829,95.992832,False,"{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'}}"
9,8,CatBoost Classifier w/ Imputer,8,0.691957,,0.691957,95.986285,False,"{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'CatBoost Classifier': {'n_estimators': 10, 'eta': 0.03, 'max_depth': 6, 'bootstrap_type': None, 'silent': True, 'allow_writing_files': False, 'n_jobs': -1}}"
10,7,Extra Trees Classifier w/ Imputer,7,0.69197,,0.69197,95.986211,False,"{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'Extra Trees Classifier': {'n_estimators': 100, 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_jobs': -1}}"
11,2,Decision Tree Classifier w/ Imputer,2,0.703189,,0.703189,95.921135,False,"{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'Decision Tree Classifier': {'criterion': 'gini', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0}}"
12,0,Mode Baseline Binary Classification Pipeline,0,17.239822,,17.239822,0.0,False,{'Baseline Classifier': {'strategy': 'mode'}}
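
The baseline score of about 17.24 and the `percent_better_than_baseline` column can be reproduced by hand. The sketch below is an illustration on toy labels, not EvalML's implementation. It assumes that log loss clips probabilities at `eps = 1e-15` (a common convention, assumed here rather than confirmed for EvalML) and that the percentage is the relative improvement over the baseline, a formula inferred from the numbers in the table.

```python
import numpy as np


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss with probabilities clipped to [eps, 1 - eps]."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def percent_better_than_baseline(score, baseline):
    """Relative improvement in percent for a lower-is-better metric."""
    return 100.0 * (baseline - score) / baseline


# Toy labels (made up): class 0 is the mode, 49.9% of rows are class 1.
y_toy = np.array([0] * 501 + [1] * 499)

# A mode baseline always predicts the majority class with full confidence,
# so every minority-class row is penalised by -log(eps).
mode_preds = np.zeros(len(y_toy))
baseline_toy = binary_log_loss(y_toy, mode_preds)
print(round(baseline_toy, 3))

# Scores copied from the rankings table above (LightGBM vs. mode baseline).
improvement = percent_better_than_baseline(0.634804, 17.239822)
print(round(improvement, 4))
```

On the toy labels the baseline lands near 17.2, which suggests that the very large baseline in the table comes from confident wrong predictions on roughly half of the rows rather than from anything specific to the dataset.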


In [15]:
## generate predictions
preds_evalml = model_evalml.best_pipeline.predict_proba(test.to_pandas())[True]

In [16]:
## create submission
submission = dt.Frame(id=test_ids, claim=dt.Frame(preds_evalml))
submission.head()

,id,claim
0,957919,0.557697
1,957920,0.380795
2,957921,0.43986
3,957922,0.408037
4,957923,0.417989
5,957924,0.477723
6,957925,0.637851
7,957926,0.434575
8,957927,0.494238
9,957928,0.494289


In [17]:
## save submission
submission.to_csv(PATH_EVALML_SUBMISSION)

In [18]:
## clear memory
if Path('evalml_debug.log').exists():
    os.remove('evalml_debug.log')
del model_evalml

gc.collect()

603

Read more in the [EvalML documentation](https://evalml.alteryx.com).

## FLAML
<img src='https://github.com/microsoft/FLAML/raw/main/docs/images/FLAML.png' width='150px'>

[FLAML](https://microsoft.github.io/FLAML) is a fast and lightweight AutoML library open-sourced by [Microsoft](https://opensource.microsoft.com).

In [19]:
## install packages
!python3 -m pip install -q flaml
!python3 -m pip install -q -U graphviz
!python3 -m pip install -q -U scikit-learn



In [20]:
## import packages
from flaml import AutoML

In [21]:
## run model
model_flaml = AutoML()
model_flaml.fit(X_train=train.to_pandas(), y_train=target, metric='roc_auc', time_budget=MAX_MODEL_RUNTIME_SECS)

[flaml.automl: 09-05 06:27:05] {1289} INFO - Evaluation method: holdout
[flaml.automl: 09-05 06:27:10] {1318} INFO - Minimizing error metric: 1-roc_auc
[flaml.automl: 09-05 06:27:10] {1345} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree', 'lrl1']
[flaml.automl: 09-05 06:27:10] {1538} INFO - iteration 0, current learner lgbm
[flaml.automl: 09-05 06:27:10] {1702} INFO -  at 46.9s,	best lgbm's error=0.4834,	best lgbm's error=0.4834
[flaml.automl: 09-05 06:27:10] {1538} INFO - iteration 1, current learner lgbm
[flaml.automl: 09-05 06:27:11] {1702} INFO -  at 47.3s,	best lgbm's error=0.4834,	best lgbm's error=0.4834
[flaml.automl: 09-05 06:27:11] {1538} INFO - iteration 2, current learner lgbm
[flaml.automl: 09-05 06:27:11] {1702} INFO -  at 47.7s,	best lgbm's error=0.4740,	best lgbm's error=0.4740
[flaml.automl: 09-05 06:27:11] {1538} INFO - iteration 3, current learner lgbm
[flaml.automl: 09-05 06:27:11] {1702} INFO -  at 48.2s,	best lgbm's err

In [22]:
## generate predictions
preds_flaml = model_flaml.predict_proba(test.to_pandas())[:, 1]

In [23]:
## create submission
submission = dt.Frame(id=test_ids, claim=preds_flaml)
submission.head()

,id,claim
0,957919,0.5217
1,957920,0.29014
2,957921,0.433299
3,957922,0.263984
4,957923,0.261445
5,957924,0.388713
6,957925,0.82714
7,957926,0.25082
8,957927,0.397971
9,957928,0.648069


In [24]:
## save submission
submission.to_csv(PATH_FLAML_SUBMISSION)

In [25]:
## clear memory
if Path('catboost_info').exists():
    shutil.rmtree('catboost_info')

if Path('flaml.log').exists():
    os.remove('flaml.log')
del model_flaml

gc.collect()

515

Read more in the [FLAML documentation](https://microsoft.github.io/FLAML).

## H2O AutoML
<img src='https://docs.h2o.ai/h2o/latest-stable/h2o-docs/_images/h2o-automl-logo.jpg' width='150px'>

[H2O AutoML](https://www.h2o.ai/products/h2o-automl) is an automated machine learning library open-sourced by [H2O.ai](https://h2o.ai).

In [26]:
## import packages
import h2o
from h2o.automl import H2OAutoML

In [27]:
## prepare data
h2o.init()

h2o_train = h2o.H2OFrame(train.to_pandas())
h2o_test = h2o.H2OFrame(test.to_pandas())

h2o_train['target'] = h2o.H2OFrame(target).asfactor()

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.11" 2021-04-20; OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04); OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
  Starting server from /opt/conda/lib/python3.7/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpi4g_g5ic
  JVM stdout: /tmp/tmpi4g_g5ic/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmpi4g_g5ic/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O_cluster_uptime:,03 secs
H2O_cluster_timezone:,Etc/UTC
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.32.1.3
H2O_cluster_version_age:,3 months and 16 days !!!
H2O_cluster_name:,H2O_from_python_unknownUser_gwvpju
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,4 Gb
H2O_cluster_total_cores:,4
H2O_cluster_allowed_cores:,4


Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [28]:
## run model
features = [x for x in h2o_train.columns if x != 'target']

model_h2o = H2OAutoML(stopping_metric='AUC', max_runtime_secs=MAX_MODEL_RUNTIME_SECS)
model_h2o.train(x=features, y='target', training_frame=h2o_train)

AutoML progress: |████████████████████████████████████████████████████████| 100%


In [29]:
## check leaderboard
model_h2o.leaderboard

model_id,auc,logloss,aucpr,mean_per_class_error,rmse,mse
StackedEnsemble_AllModels_AutoML_20210905_064603,0.638401,0.652366,0.644942,0.499997,0.479156,0.229591
XGBoost_2_AutoML_20210905_064603,0.615248,0.67386,0.628427,0.5,0.490335,0.240429
XGBoost_1_AutoML_20210905_064603,0.566819,0.683063,0.589298,0.5,0.494966,0.244991




In [30]:
## generate predictions
preds_h2o = model_h2o.leader.predict(h2o_test).as_data_frame()['True']

stackedensemble prediction progress: |████████████████████████████████████| 100%


In [31]:
## create submission
submission = dt.Frame(id=test_ids, claim=dt.Frame(preds_h2o))
submission.head()

,id,claim
0,957919,0.416486
1,957920,0.416486
2,957921,0.416486
3,957922,0.416486
4,957923,0.416486
5,957924,0.416486
6,957925,0.544679
7,957926,0.416486
8,957927,0.416486
9,957928,0.812271


In [32]:
## save submission
submission.to_csv(PATH_H2OAML_SUBMISSION)

In [33]:
## clear memory
h2o.cluster().shutdown()
del model_h2o

gc.collect()

H2O session _sid_a401 closed.


429

Read more in the [H2O AutoML documentation](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html).

## LightAutoML
<img src='https://github.com/sberbank-ai-lab/LightAutoML/blob/master/imgs/LightAutoML_logo_small.png?raw=true' width='150px'>

[LightAutoML](https://github.com/sberbank-ai-lab/LightAutoML) is a framework for automatically building classification and regression models, open-sourced by [Sberbank](https://www.sberbank.com) AI Lab.

In [34]:
## install packages
!python3 -m pip install -q lightautoml
!python3 -m pip install -q -U torch
!python3 -m pip install -q -U torchvision

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
kornia 0.5.5 requires numpy<=1.19, but you have numpy 1.21.2 which is incompatible.
autogluon-contrib-nlp 0.0.1b20210201 requires tokenizers==0.9.4, but you have tokenizers 0.10.3 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.9.1 requires torch==1.8.1, but you have torch 1.9.0 which is incompatible.
lightautoml 0.2.16 requires torch<1.9, but you have torch 1.9.0 which is incompatible.
kornia 0.5.5 requires numpy<=1.19, but you have numpy 1.21.2 which is incompatible.
allennlp 2.5.0 requires torch<1.9.0,>=1.6.0, but you have torch 1.9.0 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages

In [35]:
## import packages
from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

In [36]:
## run model
train['target'] = dt.Frame(target)

model_laml = TabularAutoML(task=Task('binary'), timeout=MAX_MODEL_RUNTIME_SECS)
model_laml.fit_predict(train_data=train.to_pandas(), roles={'target': 'target'})

del train['target']

Copying TaskTimer may affect the parent PipelineTimer, so copy will create new unlimited TaskTimer


Start automl preset with listed constraints:
- time: 600 seconds
- cpus: 4 cores
- memory: 16 gb

Train data shape: (957919, 119)
Feats was rejected during automatic roles guess: []


Layer 1 ...
Train process start. Time left 540.8030400276184 secs
Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...

===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LinearL2 =====

Linear model: C = 1e-05 score = 0.7071240200800339
Linear model: C = 5e-05 score = 0.7826485420745058
Linear model: C = 0.0001 score = 0.7924084779147242
Linear model: C = 0.0005 score = 0.7979261436180876
Linear model: C = 0.001 score = 0.7982724930379466
Linear model: C = 0.005 score = 0.7984107927563805
Linear model: C = 0.01 score = 0.7984107927563805
Linear model: C = 0.05 score = 0.7984107927563805

===== Start working with fold 1 for Lvl_0_Pipe_0_Mod_0_LinearL2 =====

Linear model: C = 1e-05 score = 0.7099818882846585
Linear model: C = 5e-05 score = 0.7850737496179123
Linear model: C = 0.0001 score = 0.794405576441630

Time limit exceeded after calculating fold 1


Lvl_0_Pipe_0_Mod_0_LinearL2 fitting and predicting completed
Time left 493.62562346458435
Start fitting Selector_LightGBM ...

===== Start working with fold 0 for Selector_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's auc: 0.805119
[200]	valid's auc: 0.806746
[300]	valid's auc: 0.806778
Early stopping, best iteration is:
[262]	valid's auc: 0.80697
Selector_LightGBM fitting and predicting completed
Start fitting Lvl_0_Pipe_1_Mod_0_LightGBM ...

===== Start working with fold 0 for Lvl_0_Pipe_1_Mod_0_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's auc: 0.804177
[200]	valid's auc: 0.806939
[300]	valid's auc: 0.807725
[400]	valid's auc: 0.807866
[500]	valid's auc: 0.807814
Early stopping, best iteration is:
[409]	valid's auc: 0.807894


Time limit exceeded after calculating fold 0


Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed
Time left 34.82653570175171


Time limit exceeded in one of the tasks. AutoML will blend level 1 models.


Blending: Optimization starts with equal weights and score 0.79980098793866
Blending, iter 0: score = 0.7998954308612451, weights = [0.6976864 0.3023136]
Blending, iter 1: score = 0.7998954308612451, weights = [0.6976864 0.3023136]
No score update. Terminated

Automl preset training completed in 571.11 seconds.
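
The blending step in the log above ends with the weights `[0.6976864 0.3023136]` for the two level-1 models. The sketch below shows what applying such weights means, using made-up predictions. It is not LightAutoML's optimizer: the pairwise `roc_auc` helper and the toy arrays are illustrative assumptions, and only the two weights are taken from the log.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half. Quadratic in memory, so only suitable for small arrays.
    """
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


# Toy labels and toy predictions from two hypothetical models.
y_toy = np.array([0, 0, 1, 1, 0, 1])
preds_a = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70])
preds_b = np.array([0.30, 0.20, 0.60, 0.40, 0.50, 0.90])

# Blend weights copied from the log above.
weights = np.array([0.6976864, 0.3023136])
preds_blend = weights[0] * preds_a + weights[1] * preds_b

for name, preds in [('model a', preds_a), ('model b', preds_b), ('blend', preds_blend)]:
    print(name, round(roc_auc(y_toy, preds), 4))
```

On this toy data each input model misorders one positive/negative pair, while the blend ranks every positive above every negative. Real blends rarely improve this cleanly, as the small gain in the log (0.79980 to 0.79990) shows.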


In [37]:
## generate predictions
preds_laml = model_laml.predict(test.to_pandas()).data.ravel()

In [38]:
## create submission
submission = dt.Frame(id=test_ids, claim=preds_laml)
submission.head()

,id,claim
0,957919,0.4213
1,957920,0.182804
2,957921,0.436794
3,957922,0.244025
4,957923,0.273309
5,957924,0.269304
6,957925,0.764747
7,957926,0.265707
8,957927,0.440218
9,957928,0.668147


In [39]:
## save submission
submission.to_csv(PATH_LAML_SUBMISSION)

In [40]:
## clear memory
if Path('catboost_info').exists():
    shutil.rmtree('catboost_info')

del model_laml

gc.collect()

171

Read more in the [LightAutoML documentation](https://lightautoml.readthedocs.io/en/latest/index.html).

## MLJAR
<img src='https://mljar.com/images/logo/mljar_circle3.svg' width='150px'>

[MLJAR](https://mljar.com) is an automated machine learning tool for tabular data.

In [41]:
## install packages
!python3 -m pip install -q mljar-supervised
!python3 -m pip install -q -U graphviz

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lightautoml 0.2.16 requires lightgbm<3.0,>=2.3, but you have lightgbm 3.2.1 which is incompatible.
lightautoml 0.2.16 requires torch<1.9, but you have torch 1.9.0 which is incompatible.
flaml 0.6.2 requires xgboost<=1.3.3,>=0.90, but you have xgboost 1.4.2 which is incompatible.
evalml 0.28.0 requires xgboost<1.3.0,>=1.1.0, but you have xgboost 1.4.2 which is incompatible.


In [42]:
## import packages
from supervised import AutoML

In [43]:
## run model
model_mljar = AutoML(eval_metric='auc', total_time_limit=MAX_MODEL_RUNTIME_SECS, results_path='./model_mljar')
model_mljar.fit(X=train.to_pandas(), y=target)

Linear algorithm was disabled.
AutoML directory: ./model_mljar
The task is binary_classification with evaluation metric auc
AutoML will use algorithms: ['Baseline', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble availabe models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 2 models
1_Baseline auc 0.5 trained in 32.06 seconds
2_DecisionTree auc 0.516824 trained in 208.68 seconds
* Step default_algorithms will try to check up to 3 models
3_Default_Xgboost auc 0.772663 trained in 629.05 seconds
* Step ensemble will try to check up to 1 model
Ensemble auc 0.772663 trained in 95.48 seconds
AutoML fit time: 1200.87 seconds
AutoML best model: 3_Default_Xgboost


AutoML(eval_metric='auc', results_path='./model_mljar', total_time_limit=600)

In [44]:
## check leaderboard
model_mljar.get_leaderboard()

,name,model_type,metric_type,metric_value,train_time
0,1_Baseline,Baseline,auc,-0.5,33.81
1,2_DecisionTree,Decision Tree,auc,-0.516824,210.93
2,3_Default_Xgboost,Xgboost,auc,-0.772663,631.58
3,Ensemble,Ensemble,auc,-0.772663,95.48


In [45]:
## generate predictions
preds_mljar = model_mljar.predict_proba(test.to_pandas())[:, 1]

In [46]:
## create submission
submission = dt.Frame(id=test_ids, claim=preds_mljar)
submission.head()

,id,claim
0,957919,0.404375
1,957920,0.271768
2,957921,0.472501
3,957922,0.304574
4,957923,0.301043
5,957924,0.315587
6,957925,0.709262
7,957926,0.263653
8,957927,0.511456
9,957928,0.546277


In [47]:
## save submission
submission.to_csv(PATH_MLJAR_SUBMISSION)

In [48]:
## clear memory
shutil.rmtree('model_mljar')
del model_mljar

gc.collect()

2646955

Read more in the [MLJAR documentation](https://supervised.mljar.com).

## TPOT
<img src='https://raw.githubusercontent.com/EpistasisLab/tpot/master/images/tpot-logo.jpg' width='150px'>

[TPOT](http://epistasislab.github.io/tpot) is a Python automated machine learning tool that optimizes machine learning pipelines using genetic programming.

In [49]:
## import packages
from tpot import TPOTClassifier

In [50]:
## run model
model_tpot = TPOTClassifier(scoring='roc_auc', n_jobs=2, max_time_mins=MAX_MODEL_RUNTIME_MINS)
model_tpot.fit(features=train.to_pandas(), target=target)

TPOTClassifier(max_time_mins=10, n_jobs=2, scoring='roc_auc')

In [51]:
## check pipeline
print(model_tpot.fitted_pipeline_)

Pipeline(steps=[('stackingestimator',
                 StackingEstimator(estimator=GaussianNB())),
                ('bernoullinb', BernoulliNB(alpha=0.001, fit_prior=False))])


In [52]:
## generate predictions
preds_tpot = model_tpot.predict_proba(test.to_pandas())[:, 1]

In [53]:
## create submission
submission = dt.Frame(id=test_ids, claim=preds_tpot)
submission.head()

,id,claim
0,957919,0.517605
1,957920,0.50012
2,957921,0.519684
3,957922,0.491518
4,957923,0.476896
5,957924,0.534934
6,957925,0.483992
7,957926,0.495547
8,957927,0.524238
9,957928,0.465047


In [54]:
## save submission
submission.to_csv(PATH_TPOT_SUBMISSION)

In [55]:
## clear memory
del model_tpot

gc.collect()

509

Read more in the [TPOT documentation](http://epistasislab.github.io/tpot).

## Similar Tutorials
Similar tutorials on other Kaggle TPS competitions are published here:

* [AutoML Tutorial: TPS (January 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-january-2021)
* [AutoML Tutorial: TPS (February 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-february-2021)
* [AutoML Tutorial: TPS (March 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-march-2021)
* [AutoML Tutorial: TPS (April 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-april-2021)
* [AutoML Tutorial: TPS (May 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-may-2021)
* [AutoML Tutorial: TPS (June 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-june-2021)
* [AutoML Tutorial: TPS (July 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-july-2021)
* [AutoML Tutorial: TPS (August 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-august-2021)