<br>
<h1 style = "font-size:25px; font-family:cursive ; font-weight : bold; color : #020296; text-align: center; border-radius: 10px 15px;"> 🚀 Fast ML stack with Intel(R) Extension for Scikit-learn  </h1>
<br>

For classical machine learning, the most popular Python library is scikit-learn. We use it to fit models and search for optimal parameters, but these jobs can run for hours, if not days. Anyone who uses scikit-learn would welcome a way to speed them up.

I want to show you how to get results faster without changing your code. To do this, we will use another Python library, **[Intel(R) Extension for Scikit-learn](https://github.com/intel/scikit-learn-intelex)**. It accelerates scikit-learn and does not require any changes to code written for scikit-learn.

I will show you how to speed up your kernel from **18 minutes** to **5 minutes** without changing the code!


# 🔨 Installing Intel(R) Extension for Scikit-learn

Let's try Intel(R) Extension for Scikit-learn. First, install it with pip. The package is also available from conda; see https://github.com/intel/scikit-learn-intelex for details.

In [None]:
!pip install scikit-learn-intelex -q --progress-bar off

In [None]:
import time
import pandas as pd

# 📋 Reading data and splitting into training and validation sets

In [None]:
train = pd.read_csv('../input/tabular-playground-series-jun-2021/train.csv')
test = pd.read_csv('../input/tabular-playground-series-jun-2021/test.csv')
sample_submission = pd.read_csv('../input/tabular-playground-series-jun-2021/sample_submission.csv')

y_train = train['target']
x_train = train.drop(['id','target'], axis=1)
x_test = test.drop(['id'], axis=1)    

from sklearn.model_selection import train_test_split
x_train_sub, x_val, y_train_sub, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=42)
print(x_train_sub.shape, x_val.shape)

# 🤖 Creating ML model

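The function below builds a stacked model from scikit-learn components: two bagging ensembles of LogisticRegression serve as base estimators, and a KNeighborsClassifier serves as the final estimator. Each estimator is preceded by a QuantileTransformer.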

In [None]:
def stack_model(params, x_train, y_train, x_test):
    from sklearn.preprocessing import QuantileTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import BaggingClassifier
    from sklearn.ensemble import StackingClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.neighbors import KNeighborsClassifier
    n_first_bag_est = 6
    n_second_bag_est = 4
    params_quantile = {
        'n_quantiles': params['n_quantiles_first'],
        'random_state': 46,
    }
    params_quantile_second = {
        'n_quantiles': params['n_quantiles_second'],
        'random_state': 33,
    }
    params_quantile_final = {
        'n_quantiles': params['n_quantiles_final'],
        'random_state': 35,
    }
    params_logreg = {
        'C': params['C_lr']
    }
    params_logreg_second = {
        'C': params['C_lr2']
    }
    params_knn = {
        'n_neighbors': params['n_neighbors_knn'],
        'metric': params['metric_knn']
    }
    # The wrapped pipeline is passed positionally because its keyword was renamed
    # from `base_estimator` to `estimator` in scikit-learn 1.2.
    estimators = [
         ('lr', BaggingClassifier(make_pipeline(QuantileTransformer(**params_quantile),
                                                LogisticRegression(**params_logreg)),
                                  n_estimators=n_first_bag_est, random_state=0)),
         ('lr2', BaggingClassifier(make_pipeline(QuantileTransformer(**params_quantile_second),
                                                 LogisticRegression(**params_logreg_second)),
                                   n_estimators=n_second_bag_est, random_state=0)),
    ]
    clf = StackingClassifier(
       estimators=estimators, final_estimator=make_pipeline(QuantileTransformer(**params_quantile_final), 
                                                            KNeighborsClassifier(**params_knn)), stack_method='predict_proba'
    )
    clf.fit(x_train, y_train)
    y_pred = clf.predict_proba(x_test)
    return y_pred


# ⚙️ Best parameters
This set of parameters was found with a grid search over the parameter space.

In [None]:
parameters = {
    'n_quantiles_first': 5, 
    'n_quantiles_second': 4, 
    'n_quantiles_final': 3, 
    'C_lr': 0.00024812627870458766, 
    'C_lr2': 0.0008462404365990055, 
    'n_neighbors_knn': 2500, 
    'metric_knn': 'euclidean'
}
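
The search that produced these parameters is not part of this notebook. As a rough illustration only, an exhaustive grid search can be sketched as below. The helper `grid_search`, the candidate values in `toy_grid`, and the scoring function `toy_score` are all hypothetical; `toy_score` is a cheap stand-in for fitting the model and computing the validation log loss.

```python
import itertools


def grid_search(grid, score_fn):
    """Evaluate every combination in `grid` and return the lowest-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # lower is better, e.g. validation log loss
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: these candidate values are illustrative only and are not
# the ones used to produce the parameters above.
toy_grid = {
    "n_neighbors_knn": [500, 2500, 5000],
    "metric_knn": ["euclidean", "manhattan"],
}


def toy_score(params):
    # Stand-in for an expensive model fit followed by a validation metric.
    penalty = 0.0 if params["metric_knn"] == "euclidean" else 0.01
    return abs(params["n_neighbors_knn"] - 2500) / 10000 + penalty


best_params, best_score = grid_search(toy_grid, toy_score)
print(best_params, best_score)
```

In this notebook, the scoring function would instead call `stack_model` on the training split and return `log_loss` on the validation split, which is exactly the kind of repeated fitting that benefits from a faster backend.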

# 🚝 Fit model with Intel(R) Extension for Scikit-learn

In [None]:
from sklearnex import patch_sklearn
patch_sklearn()
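
Patching affects scikit-learn estimators imported after `patch_sklearn()` is called, which is why `stack_model` imports its estimators inside the function body. If the notebook should also run where the extension is not installed, the patch can be guarded. This is a minimal sketch; the helper name `try_patch_sklearn` is hypothetical and not part of the library.

```python
def try_patch_sklearn():
    """Patch scikit-learn if sklearnex is importable; return whether it was."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Extension not installed: keep using stock scikit-learn.
        return False
    patch_sklearn()
    return True


accelerated = try_patch_sklearn()
print("Intel(R) Extension for Scikit-learn enabled:", accelerated)
```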

In [None]:
from sklearn.metrics import log_loss
t0 = time.time()
y_pred = stack_model(parameters, x_train_sub, y_train_sub, x_val)
t1 = time.time()

In [None]:
print(f"Time for Intel(R) Extension for Scikit-learn: {t1 - t0} sec")
print(f"Metric value: {log_loss(y_val, y_pred)}")

# 🚂 Fit model with original Scikit-learn

In [None]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

In [None]:
from sklearn.metrics import log_loss
t0 = time.time()
y_pred = stack_model(parameters, x_train_sub, y_train_sub, x_val)
t1 = time.time()

In [None]:
print(f"Time for original Scikit-learn: {t1 - t0} sec")
print(f"Metric value: {log_loss(y_val, y_pred)}")
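
The two runs above reuse `t0` and `t1`, so the first timing is lost once the second run finishes. One way to keep both timings and report the speedup is sketched below. The helpers `timed` and `speedup` are hypothetical, and the workload inside the `with` block is a stand-in for the call to `stack_model`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def speedup(baseline_sec, accelerated_sec):
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_sec / accelerated_sec


timings = {}
with timed("demo", timings):
    sum(range(100_000))  # stand-in workload; replace with the model fit

print(f"demo took {timings['demo']:.4f} sec")
# The 18-minute and 5-minute figures quoted in the introduction:
print(f"speedup: {speedup(18 * 60, 5 * 60):.1f}x")
```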

# 🥇 Prepare submission

In [None]:
patch_sklearn()
y_pred = stack_model(parameters, x_train, y_train, x_test)
sample_submission[['Class_1','Class_2', 'Class_3', 'Class_4','Class_5','Class_6', 'Class_7', 'Class_8', 'Class_9']] = y_pred
sample_submission.to_csv('stack_model.csv', index=False)

# 📜 Conclusions

With Intel(R) Extension for Scikit-learn patching you can:

- Use your scikit-learn code for training and inference without modification;
- Spend less time on training and prediction, leaving more time for experiments;
- Get the same quality of predictions.

*Please, upvote if you like.*