# 🚀 Fast SVC with Intel® Extension for Scikit-learn*

<big>For classical machine learning algorithms, we often use the most popular Python library, Scikit-learn. With Scikit-learn you can fit models and search for optimal parameters, but training can sometimes run for hours.</big><br><br>

<big>I want to show you how to use the Scikit-learn library and get results faster without changing your code. To do this, we will use another Python library, <strong><a href='https://github.com/intel/scikit-learn-intelex'>Intel® Extension for Scikit-learn*</a></strong>.</big><br><br>

<big>I will show you how to <strong>speed up your kernel by more than 20 times</strong> without changing your code!</big>

In [None]:
import pandas as pd
import numpy as np
from timeit import default_timer as timer
from IPython.display import HTML
from sklearn.model_selection import train_test_split

<h2>Importing data</h2>

In [None]:
train = pd.read_csv('../input/tabular-playground-series-dec-2021/train.csv', index_col='Id')
test = pd.read_csv('../input/tabular-playground-series-dec-2021/test.csv', index_col='Id')
sample_sub = pd.read_csv('../input/tabular-playground-series-dec-2021/sample_submission.csv')

In [None]:
train.head()

In [None]:
test.head()

<h2>Preprocessing</h2>

<big>Delete the example with <code>Cover_Type = 5</code>, because only one sample has this cover type.</big>

In [None]:
train.drop(train[train["Cover_Type"] == 5].index, axis=0, inplace=True)

<big>Add some new features.</big>

In [None]:
cols = test.columns
categorical_features = cols[10:]

train["mean"] = train[cols].mean(axis=1)
train["min"] = train[cols].min(axis=1)
train["max"] = train[cols].max(axis=1)
train["cat_feat_cnt"] = train[categorical_features].sum(axis=1)

test["mean"] = test[cols].mean(axis=1)
test["min"] = test[cols].min(axis=1)
test["max"] = test[cols].max(axis=1)
test["cat_feat_cnt"] = test[categorical_features].sum(axis=1)
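<big>To make the new features concrete, here is a minimal sketch on a toy table. The column names and values are made up, and the split point is 2 instead of 10 because the toy table has only two numeric columns.</big>

```python
import pandas as pd

# Toy table (hypothetical values): two numeric columns, then three 0/1 indicators.
toy = pd.DataFrame({
    "Elevation": [2500, 3100],
    "Slope": [10, 25],
    "Wilderness_Area1": [1, 0],
    "Soil_Type1": [1, 0],
    "Soil_Type2": [0, 1],
})

# Snapshot the original columns first, so the aggregates below
# never include the columns that are being added.
base_cols = toy.columns
binary_cols = base_cols[2:]  # the notebook uses cols[10:]

toy["mean"] = toy[base_cols].mean(axis=1)             # row-wise mean
toy["min"] = toy[base_cols].min(axis=1)               # row-wise minimum
toy["max"] = toy[base_cols].max(axis=1)               # row-wise maximum
toy["cat_feat_cnt"] = toy[binary_cols].sum(axis=1)    # number of active indicators

print(toy[["mean", "min", "max", "cat_feat_cnt"]])
```

<big>Because <code>base_cols</code> is captured before the first assignment, every aggregate is computed over the same original columns.</big>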

<big>Split the data into features and target.</big>

In [None]:
X = train.drop(['Cover_Type'], axis=1)
y = train['Cover_Type']

<big>Let's use only 10 percent of the dataset for training, because on the full dataset SVC does not finish training within 9 hours. On the full dataset, SVC has a score of about 0.95158.</big>

In [None]:
x_train, _, y_train, _ = train_test_split(X, y, test_size=0.9, random_state=1)
del X
del y

<big>Normalize the data.</big>

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler_x = MinMaxScaler()
x_train = scaler_x.fit_transform(x_train)
test = scaler_x.transform(test)
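<big>As a rough illustration of what min-max scaling does, here is a NumPy-only sketch on made-up numbers. It assumes no column is constant in the training rows (the real <code>MinMaxScaler</code> handles that case; this sketch would divide by zero).</big>

```python
import numpy as np

# Toy data (hypothetical values): two features with very different ranges.
toy_train = np.array([[1800.0, 0.0], [2400.0, 1.0], [3000.0, 0.0]])
toy_test = np.array([[2100.0, 1.0], [3300.0, 0.0]])

# "Fit": learn the per-column minimum and range from the training rows only.
col_min = toy_train.min(axis=0)
col_range = toy_train.max(axis=0) - col_min

# "Transform": apply the same training statistics to both sets.
toy_train_scaled = (toy_train - col_min) / col_range
toy_test_scaled = (toy_test - col_min) / col_range

print(toy_train_scaled)
print(toy_test_scaled)
```

<big>The training columns end up in the range from 0 to 1. Test values can fall outside that range when they lie beyond the training minimum or maximum, which is expected because the scaler is fitted on the training data only.</big>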

<h2>Installing Intel® Extension for Scikit-learn</h2>

<big>Use Intel® Extension for Scikit-learn* to speed up Scikit-learn estimators.</big>

In [None]:
!pip install scikit-learn-intelex -q --progress-bar off

<big>Patch the original Scikit-learn. Apply the patch before importing the estimators you want to accelerate.</big>

In [None]:
from sklearnex import patch_sklearn
patch_sklearn()

<h2>Train SVC algorithm</h2>
<big>The main idea of the method is to map the input vectors into a higher-dimensional space and find the separating hyperplane with the largest margin in that space. Two parallel hyperplanes are constructed on either side of the separating hyperplane, each touching the nearest points of one class. The best separating hyperplane is the one that maximizes the distance between these two parallel hyperplanes.</big><br><br>

<big>Parameter:</big><br>
<big>* <code>C</code> - Regularization parameter. The strength of the regularization is inversely proportional to <code>C</code>.<br></big>
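<big>To get a feel for <code>C</code>, here is a small sketch on synthetic data (two overlapping Gaussian blobs, not the competition data). The variable names and the values of <code>C</code> other than 0.55 are my own choices for illustration.</big>

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-class data: two overlapping Gaussian blobs.
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(loc=-1.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=1.0, scale=1.0, size=(100, 2)),
])
y_toy = np.array([0] * 100 + [1] * 100)

# Fit the default RBF-kernel SVC with weak, moderate and strong penalties
# and record how many training points end up as support vectors.
support_counts = {}
for C in (0.01, 0.55, 100.0):
    clf = SVC(C=C).fit(X_toy, y_toy)
    support_counts[C] = int(clf.n_support_.sum())
    print(f"C={C}: {support_counts[C]} support vectors, "
          f"train accuracy {clf.score(X_toy, y_toy):.3f}")
```

<big>A small <code>C</code> means strong regularization: the margin is wide and many points become support vectors. A large <code>C</code> penalizes margin violations more heavily, so the model follows the training data more closely.</big>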

In [None]:
from sklearn.svm import SVC

start = timer()
model = SVC(C=0.55, random_state=1).fit(x_train, y_train)
end = timer()
fit_time_opt = end - start
print(f'Intel® Extension for Scikit-learn* fit time: {fit_time_opt / 60} minutes')

<h2>Prediction</h2>

In [None]:
start = timer()
y_pred = model.predict(test)
end = timer()
predict_time_opt = end - start
print(f'Intel® Extension for Scikit-learn* prediction time: {predict_time_opt / 60} minutes')

<h2>Now let's use the same algorithm with the original Scikit-learn</h2>
<big>Let's run the same code with the original Scikit-learn and compare its execution time with that of the version patched by Intel® Extension for Scikit-learn.</big><br>
<big>To cancel the optimizations, we call <code>unpatch_sklearn</code> and re-import the <code>SVC</code> class.</big>

In [None]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()
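<big>Why is the re-import needed? As far as I understand, patching swaps the class object that <code>sklearn.svm</code> exposes, so a name imported earlier still points to the old class. The sketch below checks which module currently provides <code>SVC</code>. The helper <code>svc_implementation</code> is hypothetical (it is not part of either library), and the exact module names printed may differ between versions.</big>

```python
import importlib


def svc_implementation():
    """Return the name of the module that currently provides sklearn.svm.SVC."""
    svm = importlib.import_module("sklearn.svm")
    return svm.SVC.__module__


try:
    from sklearnex import patch_sklearn, unpatch_sklearn
except ImportError:
    # The extension is not installed, so only stock Scikit-learn is available.
    print("sklearnex not installed, SVC comes from:", svc_implementation())
else:
    patch_sklearn(["SVC"])        # patch only the SVC estimator
    print("after patch:  ", svc_implementation())
    unpatch_sklearn()             # restore the stock implementation
    print("after unpatch:", svc_implementation())
```

<big>The sketch leaves Scikit-learn unpatched, so it does not affect the timing cells that follow.</big>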

In [None]:
from sklearn.svm import SVC

start = timer()
model = SVC(C=0.55, random_state=1).fit(x_train, y_train)
end = timer()
fit_time_original = end - start
print(f'Original Scikit-learn fit time: {fit_time_original / 60} minutes')

<big>Let's look at the prediction time of the original Scikit-learn.</big>

In [None]:
start = timer()
y_pred = model.predict(test)
end = timer()
predict_time_original = end - start
print(f'Original Scikit-learn prediction time: {predict_time_original / 60} minutes')

In [None]:
HTML(f'<h2>Fit stage speedup: {(fit_time_original/fit_time_opt):.2f}x</h2>'
     f'(from {(fit_time_original / 60):.2f} minutes to {(fit_time_opt / 60):.2f} minutes)'
     f'<h2>Prediction stage speedup: {(predict_time_original/predict_time_opt):.2f}x</h2>'
     f'(from {(predict_time_original / 60):.2f} minutes to {(predict_time_opt / 60):.2f} minutes)')
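<big>The start/end timing pattern above is repeated four times. If you prefer, it can be wrapped in a small helper. This is only a sketch: <code>timed</code> and <code>speedup</code> are hypothetical names, not part of Scikit-learn or the extension.</big>

```python
from timeit import default_timer as timer


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = timer()
    result = func(*args, **kwargs)
    return result, timer() - start


def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run was."""
    return baseline_seconds / optimized_seconds


# Usage on a cheap stand-in workload instead of a model fit.
total, elapsed = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.4f} seconds")
print(f"example speedup: {speedup(120.0, 6.0):.2f}x")
```

<big>In the notebook this would look like <code>model, fit_time_opt = timed(SVC(C=0.55, random_state=1).fit, x_train, y_train)</code>, since <code>fit</code> returns the fitted estimator.</big>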

<h2>Conclusions</h2>
<big>We can see that a single classical machine learning algorithm can give you a pretty high accuracy score. We used the well-known Scikit-learn library together with the increasingly popular Intel® Extension for Scikit-learn. Note that Intel® Extension for Scikit-learn lets you:</big>

* <big>Use your Scikit-learn code for training and inference without modification.</big>
* <big>Speed up the training and prediction stages.</big>