## H2O AutoML
<img src='https://docs.h2o.ai/h2o/latest-stable/h2o-docs/_images/h2o-automl-logo.jpg' width='150px'>

[H2O AutoML](https://www.h2o.ai/products/h2o-automl) is an open-source automated machine learning library from [H2O.ai](https://h2o.ai).

```python
## import packages
import pandas as pd

import h2o
from h2o.automl import H2OAutoML
```

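```python
## prepare data
h2o.init()  # start a local H2O cluster (or connect to one that is already running)

h2o_train = h2o.import_file('../input/tabular-playground-series-may-2021/train.csv')
h2o_test = h2o.import_file('../input/tabular-playground-series-may-2021/test.csv')
sample = h2o.import_file('../input/tabular-playground-series-may-2021/sample_submission.csv')  # not used below
# make the target a factor (categorical) so AutoML trains classification models
h2o_train['target'] = h2o_train['target'].asfactor()
```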

```text
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.10" 2021-01-19; OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.18.04); OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.18.04, mixed mode, sharing)
  Starting server from /opt/conda/lib/python3.7/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpsr55fb5b
  JVM stdout: /tmp/tmpsr55fb5b/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmpsr55fb5b/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
```

| Property | Value |
|---|---|
| H2O_cluster_uptime | 02 secs |
| H2O_cluster_timezone | Etc/UTC |
| H2O_data_parsing_timezone | UTC |
| H2O_cluster_version | 3.32.1.1 |
| H2O_cluster_version_age | 1 month and 9 days |
| H2O_cluster_name | H2O_from_python_unknownUser_sxrcop |
| H2O_cluster_total_nodes | 1 |
| H2O_cluster_free_memory | 4 Gb |
| H2O_cluster_total_cores | 4 |
| H2O_cluster_allowed_cores | 4 |




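```python
## run model
features = [x for x in h2o_train.columns if x not in ['id', 'target']]

h2oaml = H2OAutoML(
    max_runtime_secs=3600,            # stop the whole AutoML run after one hour
    stopping_metric='logloss',        # metric used for early stopping
    sort_metric='logloss',            # metric used to rank the leaderboard
    preprocessing=["target_encoding"] # apply target encoding as a preprocessing step
)

h2oaml.train(x=features, y='target', training_frame=h2o_train)
```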

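The `preprocessing=["target_encoding"]` option asks AutoML to target-encode categorical predictors. As a rough illustration of the idea only (this is not H2O's implementation; H2O's own target encoder has further options such as blending and noise), the sketch below encodes one categorical column for a multiclass target: each row gets the smoothed share of every class among rows with the same category, computed out-of-fold so a row never sees its own label. The function name `target_encode_multiclass`, the round-robin folds, the `smoothing` pseudo-count and the toy data are all hypothetical choices for the example.

```python
from collections import Counter, defaultdict


def target_encode_multiclass(categories, labels, classes, n_folds=2, smoothing=1.0):
    """Out-of-fold, smoothed per-class target encoding of one categorical column.

    For each row, returns a dict {class: value}: the share of that class among
    rows from the *other* folds with the same category, shrunk toward the
    class prior of those rows by `smoothing` pseudo-counts.
    """
    n = len(categories)
    folds = [i % n_folds for i in range(n)]  # simple round-robin fold assignment
    encoded = [None] * n
    for k in range(n_folds):
        fit_rows = [i for i in range(n) if folds[i] != k]
        # Class prior estimated on the rows outside fold k.
        prior_counts = Counter(labels[i] for i in fit_rows)
        prior = {c: prior_counts[c] / len(fit_rows) for c in classes}
        # Class counts per category, again using only rows outside fold k.
        cat_counts = defaultdict(Counter)
        for i in fit_rows:
            cat_counts[categories[i]][labels[i]] += 1
        # Encode the rows of fold k; unseen categories fall back to the prior.
        for i in range(n):
            if folds[i] == k:
                counts = cat_counts[categories[i]]
                m = sum(counts.values())
                encoded[i] = {
                    c: (counts[c] + smoothing * prior[c]) / (m + smoothing)
                    for c in classes
                }
    return encoded


# Toy data (hypothetical): one categorical column and a two-class target.
cats = ["red", "red", "blue", "blue", "red", "blue"]
ys = ["Class_1", "Class_2", "Class_1", "Class_1", "Class_1", "Class_2"]
enc = target_encode_multiclass(cats, ys, ["Class_1", "Class_2"])
print(enc[0])  # row 0: Class_1 ≈ 0.167, Class_2 ≈ 0.833
```

Because the prior sums to 1, each row's encoded values also sum to 1, so the encoding behaves like a smoothed class distribution per category.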


```python
## check leaderboard
h2oaml.leaderboard
```

| model_id | logloss | mean_per_class_error | rmse | mse | auc | aucpr |
|---|---|---|---|---|---|---|
| StackedEnsemble_AllModels_AutoML_20210505_165851 | 1.09204 | 0.735617 | 0.624506 | 0.390008 | | |
| StackedEnsemble_BestOfFamily_AutoML_20210505_165851 | 1.09242 | 0.736548 | 0.624823 | 0.390404 | | |
| XGBoost_grid__1_AutoML_20210505_165851_model_1 | 1.09593 | 0.739728 | 0.625905 | 0.391757 | | |
| XGBoost_3_AutoML_20210505_165851 | 1.09634 | 0.741294 | 0.626193 | 0.392118 | | |
| GBM_grid__1_AutoML_20210505_165851_model_3 | 1.09659 | 0.741179 | 0.626485 | 0.392483 | | |
| XGBoost_grid__1_AutoML_20210505_165851_model_2 | 1.09695 | 0.740725 | 0.626246 | 0.392185 | | |
| GBM_grid__1_AutoML_20210505_165851_model_2 | 1.09832 | 0.741601 | 0.627315 | 0.393524 | | |
| GBM_grid__1_AutoML_20210505_165851_model_1 | 1.09951 | 0.742598 | 0.627677 | 0.393979 | | |
| GBM_2_AutoML_20210505_165851 | 1.10088 | 0.746223 | 0.631544 | 0.398848 | | |
| GBM_1_AutoML_20210505_165851 | 1.101 | 0.746445 | 0.63104 | 0.398211 | | |
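The leaderboard is ranked by `logloss` because of `sort_metric='logloss'`. For a multiclass problem this is the mean of −log of the probability assigned to each row's true class. The pure-Python sketch below computes it; the helper `multiclass_logloss`, its dict-based inputs and the clip-then-renormalise step are assumptions for illustration (a common convention), not an H2O function. It also gives a reference point: predicting 0.25 for each of the four classes scores log 4 ≈ 1.386, so the leader's 1.092 is clearly better than an uninformative guess.

```python
import math


def multiclass_logloss(y_true, probs, classes, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of class labels; probs: list of dicts {class: probability}.
    Probabilities are clipped to [eps, 1 - eps], then each row is rescaled
    to sum to 1 before scoring.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        clipped = {c: min(max(row[c], eps), 1 - eps) for c in classes}
        row_sum = sum(clipped.values())
        total += -math.log(clipped[label] / row_sum)
    return total / len(y_true)


classes = ["Class_1", "Class_2", "Class_3", "Class_4"]
uniform = {c: 0.25 for c in classes}
y = ["Class_2", "Class_3", "Class_1"]
print(multiclass_logloss(y, [uniform] * len(y), classes))  # log(4) ≈ 1.386
```

Clipping keeps a single confident but wrong prediction from producing an infinite loss.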




```python
## generate predictions
preds_h2oaml = h2oaml.leader.predict(h2o_test)  # the leader is the top-ranked model on the leaderboard
```



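```python
## create submission
# combine the test ids with the per-class probabilities, dropping the hard-label 'predict' column
submission = pd.concat([
    pd.DataFrame({'id': h2o_test['id'].as_data_frame().id}),
    preds_h2oaml.as_data_frame().drop('predict', axis=1)
], axis=1)

submission.head()
```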

| | id | Class_1 | Class_2 | Class_3 | Class_4 |
|---|---|---|---|---|---|
| 0 | 100000 | 0.082998 | 0.592065 | 0.204065 | 0.120871 |
| 1 | 100001 | 0.096633 | 0.667166 | 0.146522 | 0.08968 |
| 2 | 100002 | 0.080046 | 0.628606 | 0.189742 | 0.101607 |
| 3 | 100003 | 0.092952 | 0.571452 | 0.223778 | 0.111818 |
| 4 | 100004 | 0.086516 | 0.610032 | 0.195763 | 0.107688 |
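Before saving, a quick sanity check can catch misaligned ids, wrong column order or rows whose probabilities do not add up. A minimal pandas sketch, assuming the layout shown in the table above (`id` followed by `Class_1` through `Class_4`); the helper `check_submission` and its tolerance are hypothetical choices, and the frame below re-enters only the five rows printed by `submission.head()`. In the notebook itself you would pass the full `submission` frame instead.

```python
import pandas as pd

# The five rows printed by submission.head(), re-entered by hand.
submission = pd.DataFrame({
    "id": [100000, 100001, 100002, 100003, 100004],
    "Class_1": [0.082998, 0.096633, 0.080046, 0.092952, 0.086516],
    "Class_2": [0.592065, 0.667166, 0.628606, 0.571452, 0.610032],
    "Class_3": [0.204065, 0.146522, 0.189742, 0.223778, 0.195763],
    "Class_4": [0.120871, 0.08968, 0.101607, 0.111818, 0.107688],
})


def check_submission(df, class_cols, tol=1e-4):
    """Return a list of problems found in a probability submission (empty if none)."""
    problems = []
    if list(df.columns) != ["id"] + class_cols:
        problems.append(f"unexpected columns: {list(df.columns)}")
    if df["id"].duplicated().any():
        problems.append("duplicate ids")
    probs = df[class_cols]
    if probs.isna().any().any():
        problems.append("missing probabilities")
    if ((probs < 0) | (probs > 1)).any().any():
        problems.append("probabilities outside [0, 1]")
    # rounding in the printed values means row sums are only close to 1
    if ((probs.sum(axis=1) - 1).abs() > tol).any():
        problems.append("rows whose probabilities do not sum to 1")
    return problems


class_cols = ["Class_1", "Class_2", "Class_3", "Class_4"]
print(check_submission(submission, class_cols))  # []
```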


```python
## save submission
submission.to_csv('h2o_1.csv', index=False)
```
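`index=False` keeps the pandas row index out of the CSV. Without it, the file gains an extra first column with an empty header, which `pd.read_csv` reads back as `Unnamed: 0`. The small round trip below, using a toy frame and a temporary directory (both purely illustrative), shows the difference.

```python
import os
import tempfile

import pandas as pd

# Toy frame with the same kind of columns as the submission (values are made up).
df = pd.DataFrame({"id": [100000, 100001], "Class_1": [0.1, 0.2]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")
    df.to_csv(with_index)                  # default index=True also writes the row index
    df.to_csv(without_index, index=False)  # only the real columns are written
    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # ['Unnamed: 0', 'id', 'Class_1']
print(cols_without)  # ['id', 'Class_1']
```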

This is only a baseline submission, and there is considerable room for improvement. You can read more about H2O AutoML's workflow, settings, hyperparameters, interpretability and more here:

* [Documentation of H2O AutoML](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html)
* [Deep dive into H2O AutoML](https://github.com/vopani/fortyone#automl-series-)

## Similar Tutorials
Similar tutorials on other Kaggle TPS competitions are published here:

* [AutoML Tutorial: TPS (January 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-january-2021)
* [AutoML Tutorial: TPS (February 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-february-2021)
* [AutoML Tutorial: TPS (March 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-march-2021)
* [AutoML Tutorial: TPS (April 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-april-2021)