# 🎰 Comparison of LightAutoML & h2o.ai & FLAML
![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSl0cUeLOZ9Q5buRejZWlnKuj8wBPq4zei8o6L7uSuYCGY2C72bVwHi4hDxrrdYC8wtaqs&usqp=CAU)

* The main idea of this kernel is to compare the LightAutoML, h2o.ai and FLAML AutoML frameworks in terms of setup effort and performance in this competition (RMSE).
* Every model is submitted to the competition, and its score is reported at the end of the corresponding section of this notebook.🏵

* In all three cases I set the timeout to 1200 seconds (~20 min), so that each framework has the same amount of time to build its models.
* This notebook is mainly based on an earlier, valuable notebook by @andreshg ([link here](https://www.kaggle.com/andreshg/automl-libraries-comparison)), in which he compared 7 different AutoML libraries by their competition performance.
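
Because the comparison rests on a shared time budget, it can help to record the wall-clock time and the score of each framework side by side. The following is a minimal sketch of that bookkeeping; `run_timed` and the placeholder callable are hypothetical helpers for illustration and are not part of LightAutoML, H2O or FLAML.

```python
import time

TIMEOUT = 1200  # seconds, the shared budget used for all three frameworks


def run_timed(name, fit_fn, results):
    """Run fit_fn(), then store its returned score and elapsed seconds under name."""
    start = time.perf_counter()
    score = fit_fn()
    elapsed = time.perf_counter() - start
    results[name] = {
        "rmse": score,
        "seconds": elapsed,
        "within_budget": elapsed <= TIMEOUT,
    }
    return results[name]


results = {}
# Placeholder callable standing in for a real AutoML fit; the returned value is a dummy score.
run_timed("dummy_framework", lambda: 1.0, results)
print(results)
```

In a real run, `fit_fn` would wrap the fit call of one framework and return its validation RMSE.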


In [None]:
# Standard libraries
import os
import time
import numpy as np
import pandas as pd

In [None]:
!pip install scikit-learn --upgrade
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

In [None]:
train_df = pd.read_csv('../input/tabular-playground-series-aug-2021/train.csv')
test_df = pd.read_csv('../input/tabular-playground-series-aug-2021/test.csv')
sample_submission = pd.read_csv('../input/tabular-playground-series-aug-2021/sample_submission.csv')

In [None]:
X = train_df.drop(['id', 'loss'], axis=1)
y = train_df['loss'].values
X_test = test_df.drop(['id'], axis=1)

In [None]:
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_test = scaler.transform(X_test)

In [None]:
target = train_df['loss']
train_df.drop(['id'], axis=1, inplace=True)
test_df.drop(['id'], axis=1, inplace=True)

In [None]:
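# Standardize the 100 feature columns f0..f99 in place.
# The scaler is fitted on the training data only and then reused for the test data.
ss = StandardScaler()  # StandardScaler was already imported above
features = [f'f{i}' for i in range(100)]
train_df[features] = ss.fit_transform(train_df[features])
test_df[features] = ss.transform(test_df[features])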
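
The scaling cell above fits the scaler on the training data and only applies it to the test data. As a rough illustration of what that means for a single column, here is a standard-library sketch on toy values (not competition data). The helper names `fit_scaler` and `transform` are made up for this example. It uses the population standard deviation, which is what `StandardScaler` uses.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Return (mean, std) of a training column, using the population std."""
    return fmean(column), pstdev(column)


def transform(column, mean, std):
    """Apply z = (x - mean) / std with statistics learned on the training data."""
    return [(x - mean) / std for x in column]


train_col = [1.0, 2.0, 3.0, 4.0]  # toy values, not competition data
test_col = [2.5, 5.0]

mean, std = fit_scaler(train_col)
train_scaled = transform(train_col, mean, std)
test_scaled = transform(test_col, mean, std)  # reuses the train mean and std

print(train_scaled)
print(test_scaled)
```

The scaled training column has mean 0 and standard deviation 1. The test column is shifted and scaled by the same training statistics, so it generally does not.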

# LightAutoML 

*LightAutoML, a project of the Sberbank AI Lab AutoML group, is a framework for automatic creation of classification and regression models.*

* Thanks to @alexryzhkov for his valuable, detailed notebook on using LightAutoML: [Aug21 LightAutoML starter notebook](https://www.kaggle.com/alexryzhkov/aug21-lightautoml-starter)

[For detailed LightAutoML documentation](https://lightautoml.readthedocs.io/en/latest/)

[For Github repository](https://github.com/sberbank-ai-lab/LightAutoML)

In [None]:
!pip install -U lightautoml

In [None]:
from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML
from lightautoml.tasks import Task

import torch

In [None]:
N_THREADS = 4 
N_FOLDS = 5
RANDOM_STATE = 42
TEST_SIZE = 0.2
TIMEOUT = 1200

np.random.seed(RANDOM_STATE)
torch.set_num_threads(N_THREADS)

In [None]:
%%time

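def rmse(y_true, y_pred, **kwargs):
    # Take the square root of the MSE explicitly: the `squared=False` argument
    # is deprecated and no longer accepted by recent scikit-learn releases.
    return np.sqrt(mean_squared_error(y_true, y_pred, **kwargs))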

task = Task('reg', metric = rmse)

roles = {'target': 'loss',
        'drop': ['id']}
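
For reference, the RMSE metric used throughout this notebook is the square root of the mean squared error. Below is a small worked sketch on toy numbers that uses only the standard library. `rmse_manual` is a name chosen for this illustration and is not part of scikit-learn or LightAutoML.

```python
import math


def rmse_manual(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


y_true = [3.0, 5.0, 2.0, 7.0]  # toy targets
y_pred = [2.0, 5.0, 4.0, 7.0]  # toy predictions

# Squared errors are 1, 0, 4, 0, so the MSE is 1.25 and the RMSE is sqrt(1.25).
value = rmse_manual(y_true, y_pred)
print(value)
```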


**The parameters below come from my own notebook, where I tuned them with Optuna. I recommend taking a look and upvoting it if you are interested.**

**🤖 Optuna Tuning with XGBoost+CatBoost+LGBM [Link](https://www.kaggle.com/tolgakurtulus/optuna-tuning-with-xgboost-catboost-lgbm)**



In [None]:
lgb_params = {
    'metric': 'RMSE',
    'lambda_l1': 0.1912487104284709,
    'lambda_l2': 0.06374015849652141,
    'num_leaves': 53, 
    'learning_rate': 0.10398927752362405,
    'feature_fraction': 0.8612490357778249,
    'bagging_fraction': 0.8969003388461672,
    'bagging_freq': 0,
    'min_child_samples': 95,
    'num_threads': 8
}


cb_params = {
     #'iterations': 8908,
     'od_wait': 1707,
     'learning_rate': 0.010395447212764725,
     #'reg_lambda': 99.12580252995424,
     'subsample': 0.9982266060286022,
     'random_strength': 17.782673214289556,
     'min_data_in_leaf': 12,
     'leaf_estimation_iterations': 3,
     'loss_function': 'RMSE',
     'eval_metric': 'RMSE',
     'bootstrap_type': 'Bernoulli',
     'leaf_estimation_method': 'Newton',
     'random_seed': 42,
     'thread_count': 4
}

In [None]:
%%time 
automl = TabularAutoML(task = task, 
                       timeout = TIMEOUT,
                       cpu_limit = N_THREADS,
                       reader_params = {'n_jobs': N_THREADS, 'cv': N_FOLDS, 'random_state': RANDOM_STATE},
                       general_params = {'use_algos': [['linear_l2', 'cb', 'lgb', 'lgb_tuned']]},
                       lgb_params = {'default_params': lgb_params, 'freeze_defaults': True}, # LGBM params
                       cb_params = {'default_params': cb_params, 'freeze_defaults': True}, # CatBoost params
                       verbose = 2
                      )

In [None]:
%%time
oof_pred = automl.fit_predict(train_df,  roles = roles)
test_pred = automl.predict(test_df)
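
The variable name `oof_pred` refers to out-of-fold predictions: each training row is predicted by a model that did not see that row during fitting. The sketch below illustrates the idea with a deliberately trivial "model" (the mean of the training-fold targets) and interleaved folds. Both choices are assumptions made for illustration and do not reflect how LightAutoML builds its folds or models.

```python
from statistics import fmean


def out_of_fold_predictions(y, n_folds):
    """Predict each row with a 'model' fitted on the other folds only.

    The model here is just the mean of the training-fold targets.
    """
    n = len(y)
    oof = [None] * n
    for fold in range(n_folds):
        valid_idx = [i for i in range(n) if i % n_folds == fold]
        train_targets = [y[i] for i in range(n) if i % n_folds != fold]
        prediction = fmean(train_targets)
        for i in valid_idx:
            oof[i] = prediction
    return oof


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy targets, not competition data
oof = out_of_fold_predictions(y, n_folds=3)
print(oof)
```

Scoring such predictions against the true targets gives a validation estimate that does not reuse rows the model was trained on.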

In [None]:
sample_submission['loss'] = test_pred.data[:, 0]
sample_submission.to_csv('lightautomlsubmission.csv', index=False)

**LightAutoML competition submission score: 7.89745**

# H2o.ai AutoML

![](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/_static/logo.png)

*H2O is an open source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform that allows you to build machine learning models on big data and provides easy productionalization of those models in an enterprise environment.*

* Mainly adapted from the official h2o.ai documentation for this regression problem. [For detailed documentation](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html)

In [None]:
import h2o
from h2o.automl import H2OAutoML

In [None]:
h2o.init()

In [None]:
%%time
train_hf = h2o.H2OFrame(train_df.copy())
test_hf = h2o.H2OFrame(test_df.copy())

In [None]:
%%time
aml = H2OAutoML(seed=2021, max_runtime_secs=1200, sort_metric = "RMSE")

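# Use every column except the target as a predictor
predictors = [col for col in train_hf.columns if col != 'loss']
aml.train(x = predictors, y = 'loss', training_frame = train_hf)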

# View the AutoML Leaderboard
lb = aml.leaderboard
lb.head(rows=lb.nrows)

In [None]:
%%time

preds = aml.predict(test_hf)
preds_df = h2o.as_list(preds)
preds_df

In [None]:
sample_submission['loss'] = preds_df['predict']
sample_submission.to_csv('h2o_automl_submission.csv', index=False)

**H2o.ai competition submission score: 7.91883**

# FLAML: Fast and Lightweight AutoML by Microsoft Research

![](https://raw.githubusercontent.com/microsoft/FLAML/main/docs/images/FLAML.png)

*FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically. It frees users from selecting learners and hyperparameters for each learner. It is fast and economical. The simple and lightweight design makes it easy to extend, such as adding customized learners or metrics. FLAML is powered by a new, cost-effective hyperparameter optimization and learner selection method invented by Microsoft Research*

* Mainly adapted from the documentation in the GitHub repository for this regression problem. [For detailed documentation](https://github.com/microsoft/FLAML)

In [None]:
!pip install -U flaml

In [None]:
from flaml import AutoML

In [None]:
X = train_df.drop(['loss'], axis=1)
y = train_df['loss'].values

In [None]:
%%time

# Initialize an AutoML instance
automl = AutoML()

# Specify automl goal and constraint
automl_settings = {
    "time_budget": 1200,
    "metric": 'rmse',
    "task": 'regression',
    "seed": 2021,
    "log_file_name": 'tpsaug21log.log', 
}

# Train with labeled input data
automl.fit(X_train=X , y_train=y,
                        **automl_settings)

In [None]:
# Retrieve best config and best learner
print('Best ML learner:', automl.best_estimator)
print('Best hyperparameter config:', automl.best_config)
# best_loss is the validation loss for the chosen metric (RMSE here), not an accuracy
print('Best RMSE on validation data: {0:.4g}'.format(automl.best_loss))
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))

In [None]:
%%time
ypred = automl.predict(test_df.values)

In [None]:
ypred

In [None]:
sample_submission['loss'] = ypred
sample_submission.to_csv('microsoft_flaml_submission.csv', index=False)
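
Before uploading a submission file, a quick sanity check can catch missing rows or non-numeric predictions. The following is a minimal sketch using only the standard library. It assumes the submission has exactly the columns `id` and `loss`. The helper `check_submission` and the toy rows are hypothetical and not part of any framework used here.

```python
import csv
import io


def check_submission(csv_text, expected_rows):
    """Return a list of problems found in a submission CSV with columns id, loss."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["id", "loss"]:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems
    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for row in rows:
        try:
            float(row["loss"])
        except (TypeError, ValueError):
            problems.append(f"non-numeric loss for id {row['id']}")
    return problems


good = "id,loss\n250000,6.5\n250001,7.25\n"  # toy rows, ids are made up
bad = "id,loss\n250000,\n"                   # empty prediction

good_problems = check_submission(good, expected_rows=2)
bad_problems = check_submission(bad, expected_rows=1)
print(good_problems)
print(bad_problems)
```

For a file on disk, the text could be read with `open(path).read()` and the expected row count taken from the length of the test set.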

**FLAML competition submission score: 7.89733**

# Results

**LightAutoML competition submission score: 7.88525**

**H2o.ai competition submission score: 7.95611**

**FLAML competition submission score: 7.89733**
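
Since lower RMSE is better, the three scores listed in this Results section can be ranked with a few lines of Python. This is only a convenience sketch: the numbers are copied from the list above and the dictionary keys are labels chosen for display.

```python
# Scores copied from the Results section (lower RMSE is better).
scores = {
    "LightAutoML": 7.88525,
    "H2O AutoML": 7.95611,
    "FLAML": 7.89733,
}

ranking = sorted(scores.items(), key=lambda item: item[1])
best_name, best_score = ranking[0]

for rank, (name, score) in enumerate(ranking, start=1):
    gap = score - best_score  # distance to the best score
    print(f"{rank}. {name}: {score:.5f} (+{gap:.5f})")
```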

> ***Thank you for reading my notebook. Please don't forget to comment & upvote! 🥳🤩🤓***