# 🛳 Fast AutoML with AutoGluon and Intel® Extension for Scikit-learn* - Titanic - Machine Learning from Disaster

AutoML is a powerful tool for getting a good solution to a small problem in little time. This notebook demonstrates it with the [AutoGluon](https://github.com/awslabs/autogluon) AutoML framework and [**Intel® Extension for Scikit-learn***](https://github.com/intel/scikit-learn-intelex), which accelerates Scikit-learn algorithms with just two lines of code.

I will show you how to **speed up** your kernel without changing your code using **Intel® Extension for Scikit-learn**!

In [None]:
!pip install autogluon.tabular[all] -q --progress-bar off > /dev/null 2>&1

In [None]:
from timeit import default_timer as timer
import pandas as pd
import numpy as np
from IPython.display import HTML
import logging

### Data loading

In [None]:
competition_prefix = 'titanic'

id_column = 'PassengerId'
label = 'Survived'
train_data = pd.read_csv(f'../input/{competition_prefix}/train.csv', index_col=id_column)
test_data = pd.read_csv(f'../input/{competition_prefix}/test.csv', index_col=id_column)
sample_submission = pd.read_csv(f'../input/{competition_prefix}/gender_submission.csv', index_col=id_column)

In [None]:
train_data[:5]

In [None]:
from sklearn.model_selection import train_test_split

random_state = 42
train_data, valid_data = train_test_split(train_data, test_size=0.1, random_state=random_state)
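
As a rough check on how much data the split leaves for training, the arithmetic below assumes the standard Kaggle Titanic `train.csv` with 891 rows. It is a sketch and not part of the original notebook; `n_rows` is an assumed value, so replace it with `len(train_data)` taken before the split if your copy differs.

```python
import math

n_rows = 891        # assumption: row count of the Kaggle Titanic train.csv
test_size = 0.1     # same fraction as in the cell above

# For a float test_size, scikit-learn rounds the hold-out size up
# and gives the remaining rows to the training split.
n_valid = math.ceil(test_size * n_rows)
n_train = n_rows - n_valid
print(n_train, n_valid)
```

With about 90 validation rows, a single passenger changes validation accuracy by more than one percentage point, so small accuracy differences between runs should be read with care.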

# AutoGluon with default Scikit-learn

Let's define parameters for Gradient Boosting, Random Forest and kNN:

In [None]:
max_features_list = ['sqrt', 'log2', 0.25, 0.5, 0.75]
n_neighbors_list = [4 ** i for i in range(1, 4)]
hyperparameters = {
    'GBM': [
        {'extra_trees': True, 'seed': random_state, 'ag_args': {'name_suffix': 'XT'}},
        {},
        'GBMLarge',
    ],
    'RF': [
        {'criterion': 'gini', 'random_state': random_state, 'max_features': max_features, 'n_estimators': 500,
         'ag_args': {'name_suffix': f'Gini_{str(max_features)}', 'problem_types': ['binary', 'multiclass']}}
        for max_features in max_features_list
    ] + [
        {'criterion': 'entropy', 'random_state': random_state, 'max_features': max_features, 'n_estimators': 500,
         'ag_args': {'name_suffix': f'Entr_{str(max_features)}', 'problem_types': ['binary', 'multiclass']}}
        for max_features in max_features_list
    ],
    'KNN': [
        {'weights': 'uniform', 'n_neighbors': n_neighbors, 'ag_args': {'name_suffix': f'Unif_{n_neighbors}'},
         'ag_args_fit': {'use_daal': False}}
        for n_neighbors in n_neighbors_list
    ] + [
        {'weights': 'distance', 'n_neighbors': n_neighbors, 'ag_args': {'name_suffix': f'Dist_{n_neighbors}'},
         'ag_args_fit': {'use_daal': False}}
        for n_neighbors in n_neighbors_list
    ]
}
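
To see what the list comprehensions above expand to, the sketch below rebuilds only the two parameter grids and the name suffixes, then counts the base configurations per model family. The variables `rf_suffixes`, `knn_suffixes`, `gbm_configs` and `total` are illustrative names that do not appear in the notebook.

```python
# Rebuild only the parameter grids from the cell above.
max_features_list = ['sqrt', 'log2', 0.25, 0.5, 0.75]
n_neighbors_list = [4 ** i for i in range(1, 4)]

# Name suffixes produced by the RF and KNN comprehensions.
rf_suffixes = [f'{crit}_{mf}' for crit in ('Gini', 'Entr') for mf in max_features_list]
knn_suffixes = [f'{w}_{k}' for w in ('Unif', 'Dist') for k in n_neighbors_list]

# GBM is listed explicitly: extra-trees variant, defaults, and 'GBMLarge'.
gbm_configs = 3

total = gbm_configs + len(rf_suffixes) + len(knn_suffixes)
print(n_neighbors_list)
print(rf_suffixes)
print(knn_suffixes)
print(total)
```

The leaderboard may list more rows than this count, because the `best_quality` preset enables bagging and stacking on top of the base configurations.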

Fit AutoGluon with the `best_quality` preset:

In [None]:
from autogluon.tabular import TabularPredictor

t0 = timer()
autogluon_predictor = TabularPredictor(
    label=label,
    eval_metric="accuracy",
    learner_kwargs={'ignored_columns': [id_column]}
).fit(
    train_data=train_data,
    verbosity=2,
    presets='best_quality',
    hyperparameters=hyperparameters
)
t1 = timer()
default_leaderboard = autogluon_predictor.leaderboard(valid_data)
t2 = timer()

default_ag_fitting_time = t1 - t0
default_ag_evaluation_time = t2 - t1
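
The `t0`/`t1`/`t2` pattern above is repeated for the patched run further down. One way to avoid duplicating it is a small context manager; this is only a sketch, and `stopwatch` and `timings` are hypothetical names that are not used elsewhere in the notebook. The bodies of the `with` blocks are stand-ins for the real `.fit(...)` and `.leaderboard(...)` calls.

```python
from contextlib import contextmanager
from timeit import default_timer as timer


@contextmanager
def stopwatch(timings, key):
    """Store the wall-clock duration of the with-block in timings[key]."""
    start = timer()
    try:
        yield
    finally:
        # Record the elapsed time even if the block raises.
        timings[key] = timer() - start


timings = {}
with stopwatch(timings, 'fit'):
    sum(range(100_000))      # stand-in for TabularPredictor(...).fit(...)
with stopwatch(timings, 'evaluate'):
    sorted(range(1_000))     # stand-in for predictor.leaderboard(valid_data)

print(sorted(timings))
```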

In [None]:
default_leaderboard

# AutoGluon with optimized Scikit-learn

### Intel® Extension for Scikit-learn installation:

In [None]:
!pip install scikit-learn-intelex -q --progress-bar off > /dev/null 2>&1

### Accelerate Scikit-learn with two lines of code:

In [None]:
from sklearnex import patch_sklearn
patch_sklearn()
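
Patching of this kind rebinds estimator names inside the library's modules, which is why the order of patching and importing matters. The toy below is not sklearnex code: it uses a throwaway module (`toy_lib`, a hypothetical name) to show that a name imported before a patch still points to the original class, while a name imported afterward sees the replacement.

```python
import sys
import types

# Build a throwaway module standing in for a library with one estimator.
toy_lib = types.ModuleType('toy_lib')


class SlowEstimator:
    backend = 'stock'


class FastEstimator:
    backend = 'accelerated'


toy_lib.Estimator = SlowEstimator
sys.modules['toy_lib'] = toy_lib

from toy_lib import Estimator as imported_before  # bound before the patch

# "Patch": rebind the attribute on the module, as a monkey-patch would.
toy_lib.Estimator = FastEstimator

from toy_lib import Estimator as imported_after   # bound after the patch

print(imported_before.backend)
print(imported_after.backend)
```

The first name keeps the stock class and the second picks up the replacement, so calling `patch_sklearn()` before any Scikit-learn estimators are imported is the safest order.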

Set up logging to track which calls are accelerated:

In [None]:
logger = logging.getLogger()
fh = logging.FileHandler('log.txt')
fh.setLevel(logging.DEBUG)
logger.addHandler(fh)
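
A record reaches a handler only if it passes both the logger's level and the handler's level. The sketch below shows this with an in-memory stream and a hypothetical logger named `demo`. It assumes the acceleration messages are emitted at `INFO` level; under that assumption, if `log.txt` comes out empty, setting the logger's own level to `logging.INFO` is one thing to try.

```python
import io
import logging

stream = io.StringIO()                # in-memory stand-in for log.txt
handler = logging.StreamHandler(stream)
handler.setLevel(logging.DEBUG)       # let every level through, like fh above

demo = logging.getLogger('demo')      # hypothetical logger name
demo.propagate = False                # keep the demo isolated from root handlers
demo.addHandler(handler)

demo.setLevel(logging.WARNING)        # WARNING is the root logger's default
demo.info('dropped: the logger filters it before any handler runs')

demo.setLevel(logging.INFO)
demo.info('kept: passes the logger and the handler')

demo.removeHandler(handler)
print(stream.getvalue())
```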

Re-import the modules so that the patch takes effect:

In [None]:
from autogluon.tabular import TabularPredictor


t0 = timer()
autogluon_predictor = TabularPredictor(
    label=label,
    eval_metric="accuracy",
    learner_kwargs={'ignored_columns': [id_column]}
).fit(
    train_data=train_data,
    verbosity=2,
    presets='best_quality',
    hyperparameters=hyperparameters
)
t1 = timer()
opt_leaderboard = autogluon_predictor.leaderboard(valid_data)
t2 = timer()

opt_ag_fitting_time = t1 - t0
opt_ag_evaluation_time = t2 - t1

In [None]:
opt_leaderboard

In [None]:
fitting_speedup = round(default_ag_fitting_time / opt_ag_fitting_time, 2)
evaluation_speedup = round(default_ag_evaluation_time / opt_ag_evaluation_time, 2)
HTML(f'<h2>Fitting speedup: {fitting_speedup}x</h2>'
     f'(from {round(default_ag_fitting_time, 2)} to {round(opt_ag_fitting_time, 2)} seconds)'
     f'<h2>Evaluation speedup: {evaluation_speedup}x</h2>'
     f'(from {round(default_ag_evaluation_time, 2)} to {round(opt_ag_evaluation_time, 2)} seconds)')

In [None]:
speedups = default_leaderboard.set_index('model')['fit_time'] / opt_leaderboard.set_index('model')['fit_time']
speedups = speedups.filter(like='RandomForest')
HTML(f'<h2>Random Forest fitting speedup: {round(speedups.mean(), 2)}x</h2>')

In [None]:
logger.removeHandler(fh)

### Accelerated functions:

In [None]:
!cat log.txt | grep 'running accelerated version' | sort | uniq
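
Outside a shell, for example on Windows, the same filtering can be done in Python. This is a minimal sketch: `unique_matching_lines` is a hypothetical helper, and the sample log lines are placeholders written for the demo rather than real sklearnex output.

```python
import os
import tempfile


def unique_matching_lines(path, needle):
    """Return the sorted, de-duplicated lines of `path` that contain `needle`."""
    with open(path, encoding='utf-8') as f:
        return sorted({line.rstrip('\n') for line in f if needle in line})


# Placeholder log content; real messages depend on the sklearnex version.
sample = (
    'INFO: estimator A: running accelerated version on CPU\n'
    'INFO: estimator B: some other message\n'
    'INFO: estimator A: running accelerated version on CPU\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)

accelerated = unique_matching_lines(tmp.name, 'running accelerated version')
os.remove(tmp.name)
print(accelerated)
```

For the notebook itself, the call would be `unique_matching_lines('log.txt', 'running accelerated version')`. Python sorts by code point, so the order can differ slightly from the shell's locale-aware `sort`.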

In [None]:
predictions = autogluon_predictor.predict(test_data)
sample_submission[label] = predictions
sample_submission[:5]

In [None]:
sample_submission.to_csv("submission.csv")

In [None]:
!rm -rf AutogluonModels

# Conclusions

Intel® Extension for Scikit-learn lets you:

* Use your existing Scikit-learn code for training and inference without modification.
* Train models and use them for prediction roughly 1.4 to 1.6 times faster.
* Get predictions of similar quality to those of the other tested frameworks.

*Please upvote if you liked it.*

# Other notebooks with sklearnex usage

### [[predict sales] Stacking with scikit-learn-intelex](https://www.kaggle.com/alexeykolobyanin/predict-sales-stacking-with-scikit-learn-intelex)

### [[TPS-Aug] NuSVR with Intel Extension for Sklearn](https://www.kaggle.com/alexeykolobyanin/tps-aug-nusvr-with-intel-extension-for-sklearn)

### [Using scikit-learn-intelex for What's Cooking](https://www.kaggle.com/kppetrov/using-scikit-learn-intelex-for-what-s-cooking?scriptVersionId=58739642)

### [Fast KNN using scikit-learn-intelex for MNIST](https://www.kaggle.com/kppetrov/fast-knn-using-scikit-learn-intelex-for-mnist?scriptVersionId=58738635)

### [Fast SVC using scikit-learn-intelex for MNIST](https://www.kaggle.com/kppetrov/fast-svc-using-scikit-learn-intelex-for-mnist?scriptVersionId=58739300)

### [Fast SVC using scikit-learn-intelex for NLP](https://www.kaggle.com/kppetrov/fast-svc-using-scikit-learn-intelex-for-nlp?scriptVersionId=58739339)