# 🚀 Fast KNN with Intel® Extension for Scikit-learn*

<big>For classical machine learning algorithms, we often use the most popular Python library, Scikit-learn. With Scikit-learn you can fit models and search for optimal parameters, but training can sometimes take hours.</big><br><br>

<big>I want to show you how to use the Scikit-learn library and get results faster without changing your code. To do this, we will use another Python library, <strong><a href='https://github.com/intel/scikit-learn-intelex'>Intel® Extension for Scikit-learn*</a></strong>.</big><br><br>

<big>I will show you how to <strong>speed up your kernel by more than 13 times</strong> without changing your code!</big>

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from timeit import default_timer as timer
from IPython.display import HTML

<h2>Importing data</h2>

In [None]:
train = pd.read_csv('../input/tabular-playground-series-feb-2022/train.csv')
test = pd.read_csv('../input/tabular-playground-series-feb-2022/test.csv')
sample_sub = pd.read_csv('../input/tabular-playground-series-feb-2022/sample_submission.csv')

In [None]:
train.head()

In [None]:
test.head()

<h2>Preprocessing</h2>
<big>Let's encode the target values as integers.</big>

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
train['target'] = le.fit_transform(train['target'])
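
<big>Because the target is now stored as integer codes, predictions also come back as integers. The sketch below uses made-up class names (they are placeholders, not the real labels from this dataset) to show how <code>LabelEncoder</code> assigns codes and how <code>inverse_transform</code> maps predicted codes back to the original names.</big>

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical class names standing in for the real target column.
labels = np.array(['species_b', 'species_a', 'species_c', 'species_a'])

le_demo = LabelEncoder()
# Classes are sorted alphabetically, then numbered from 0.
encoded = le_demo.fit_transform(labels)

# Pretend these integers came out of model.predict(...).
predicted_codes = np.array([2, 0, 1])
decoded = le_demo.inverse_transform(predicted_codes)

print(encoded)  # [1 0 2 0]
print(decoded)  # ['species_c' 'species_a' 'species_b']
```

<big>In this notebook the same idea would apply to <code>le</code> and <code>y_pred</code> if the predictions need to be reported with their original labels.</big>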

<big>Split the data into features and target.</big>

In [None]:
X = train.drop(['target', 'row_id'], axis=1)
y = train['target']
test.drop('row_id', axis=1, inplace=True)

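<big>Standardize the features to zero mean and unit variance. The scaler is fitted on the training data only and then applied to both the training and test sets.</big>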

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X)
X = scaler.transform(X)
test = scaler.transform(test)
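
<big>Scaling matters for KNN because neighbors are found by distance, so a feature with a large numeric range would otherwise dominate. Here is a small sketch on synthetic data (the arrays are invented for illustration) showing what the fitted scaler does to the training set and to unseen data.</big>

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the train and test feature matrices.
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))
X_test_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 4))

# Mean and standard deviation are estimated from the training data only.
scaler_demo = StandardScaler().fit(X_train_demo)
X_train_scaled = scaler_demo.transform(X_train_demo)
X_test_scaled = scaler_demo.transform(X_test_demo)

# Training columns end up with mean 0 and standard deviation 1.
print(X_train_scaled.mean(axis=0).round(6))
print(X_train_scaled.std(axis=0).round(6))
# Test columns are only approximately centered, because they reuse train statistics.
print(X_test_scaled.mean(axis=0).round(2))
```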

<h2>Installing Intel® Extension for Scikit-learn</h2>

<big>Use Intel® Extension for Scikit-learn* to speed up Scikit-learn estimators.</big>

In [None]:
!pip install scikit-learn-intelex -q --progress-bar off

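<big>Patch the original Scikit-learn. Call <code>patch_sklearn()</code> before importing any estimators, so that the imports pick up the optimized versions.</big>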

In [None]:
from sklearnex import patch_sklearn
patch_sklearn()
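
<big>If you want to confirm that the patch took effect, a check along these lines may help. It is a sketch that assumes your installed version of the extension exposes <code>sklearn_is_patched</code>, and it falls back to stock Scikit-learn when the extension is not available.</big>

```python
try:
    # Assumption: the installed sklearnex version provides sklearn_is_patched.
    from sklearnex import patch_sklearn, sklearn_is_patched
    patch_sklearn()
    patched = bool(sklearn_is_patched())
except ImportError:
    # Extension not installed (or too old): continue with stock Scikit-learn.
    patched = False

# Import estimators only after patching.
from sklearn.neighbors import KNeighborsClassifier

print(f'patched: {patched}')
# The module path hints at which implementation was imported.
print(KNeighborsClassifier.__module__)
```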

<h2>Train KNN Classifier model</h2><br><br>
<big>K-nearest neighbors (KNN) is a distance-based algorithm for classification and regression.
For classification, an object is assigned to the class that is most common among its <i>k</i> nearest neighbors, whose classes are already known. For regression, the object is assigned the average value of the <i>k</i> objects closest to it, whose values are already known.</big><br><br>
<big>Parameters:</big><br>
<big>1. <code>n_neighbors</code> - the number of neighbors to use.</big><br>
<big>2. <code>n_jobs</code> - the number of parallel jobs to run for the neighbors search; <code>-1</code> uses all available processors.</big><br>
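
<big>To make the description concrete, here is a brute-force NumPy sketch of the same idea. It is an illustration only, not how Scikit-learn or the Intel extension implement KNN; the function name <code>knn_predict</code>, the toy data, and the tie-breaking rule are choices made for this example.</big>

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k, task='classification'):
    """Brute-force k-nearest-neighbors prediction with Euclidean distance."""
    # Pairwise squared distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbor_targets = y_train[nearest]
    if task == 'regression':
        # Regression: average the targets of the k neighbors.
        return neighbor_targets.mean(axis=1)
    # Classification: majority vote; argmax breaks ties toward the smallest label.
    return np.array([np.bincount(row).argmax() for row in neighbor_targets])

# Two well-separated toy clusters.
X_train_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                         [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])
y_value = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
X_query = np.array([[0.2, 0.2], [5.5, 5.5]])

print(knn_predict(X_train_demo, y_class, X_query, k=3))                     # [0 1]
print(knn_predict(X_train_demo, y_value, X_query, k=3, task='regression'))  # [ 2. 20.]
```

<big>The brute-force version computes every query-to-train distance, which is why KNN prediction gets slow on large datasets and benefits from an optimized implementation.</big>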


In [None]:
from sklearn.neighbors import KNeighborsClassifier

start = timer()
model = KNeighborsClassifier(n_neighbors=2, n_jobs=-1).fit(X, y)
y_pred = model.predict(test)
end = timer()
time_opt = end - start
print(f'Intel® Extension for Scikit-learn* time: {time_opt} seconds')

<h2>Now we run the same algorithm with the original Scikit-learn</h2>
<big>Let’s run the same code with the original Scikit-learn and compare its execution time with that of the version patched by Intel® Extension for Scikit-learn.</big><br>
<big>To cancel the optimizations, we call <code>unpatch_sklearn</code> and reimport the <code>KNeighborsClassifier</code> class.</big>

In [None]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

In [None]:
from sklearn.neighbors import KNeighborsClassifier

start = timer()
model = KNeighborsClassifier(n_neighbors=2, n_jobs=-1).fit(X, y)
y_pred = model.predict(test)
end = timer()
time_original = end - start
print(f'Original Scikit-learn time: {time_original} seconds')

In [None]:
HTML(f'<h2>Workflow speedup: {(time_original/time_opt):.2f}x</h2>'
     f'(from {(time_original):.2f} seconds to {(time_opt):.2f} seconds)')
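
<big>A single timed run can be noisy, so the measured speedup may vary between sessions. One possible way to get a steadier number is to repeat the measurement and take the median. The helper below is a hypothetical sketch (<code>time_fit_predict</code> is not part of either library) and runs on small synthetic data so it finishes quickly.</big>

```python
from statistics import median
from timeit import default_timer as timer

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def time_fit_predict(make_model, X_fit, y_fit, X_new, repeats=3):
    """Return the median wall-clock time of fit + predict over several runs."""
    durations = []
    for _ in range(repeats):
        start = timer()
        # Build a fresh model each time so every run includes fitting.
        make_model().fit(X_fit, y_fit).predict(X_new)
        durations.append(timer() - start)
    return median(durations)

# Small synthetic dataset, only for demonstrating the helper.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 10))
y_demo = rng.integers(0, 3, size=2000)
X_new_demo = rng.normal(size=(500, 10))

elapsed = time_fit_predict(
    lambda: KNeighborsClassifier(n_neighbors=2, n_jobs=-1),
    X_demo, y_demo, X_new_demo,
)
print(f'median fit + predict time: {elapsed:.4f} seconds')
```

<big>Calling the helper once after <code>patch_sklearn()</code> and once after <code>unpatch_sklearn()</code> (reimporting the class each time) would give two medians to compare.</big>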

<h2>Conclusions</h2>
<big>We can see that a single classical machine learning algorithm can give you a pretty high accuracy score. We used the well-known library Scikit-learn together with the increasingly popular library Intel® Extension for Scikit-learn. Note that Intel® Extension for Scikit-learn gives you the opportunity to:</big>

* <big>Use your Scikit-learn code for training and inference without modification.</big>
* <big>Speed up the training and prediction stages.</big>

# Other notebooks with scikit-learn-intelex usage
<big><a href='https://www.kaggle.com/lordozvlad/introduction-to-scikit-learn-intelex'>Introduction to scikit-learn-intelex</a></big><br>
<big><a href='https://www.kaggle.com/alexeykolobyanin/tps-dec-svc-with-sklearnex-20x-speedup'>[TPS-Dec]SVC with sklearnex 20x speedup</a></big><br>
<big><a href='https://www.kaggle.com/alexeykolobyanin/tps-aug-nusvr-with-intel-extension-for-sklearn'>[TPS-Aug]NuSVR with Intel Extension for Sklearn</a></big><br>
<big><a href='https://www.kaggle.com/lordozvlad/fast-random-forest-using-scikit-learn-intelex'>Fast Random Forest using scikit-learn-intelex</a></big><br>
<big><a href='https://www.kaggle.com/lordozvlad/tps-dec-fast-feature-importance-with-sklearnex'>[TPS-DEC] Fast Feature Importance with sklearnex</a></big><br>
<big><a href='https://www.kaggle.com/alexeykolobyanin/tps-nov-log-regression-with-sklearnex-17x-speedup'>[TPS-Nov]Log Regression with sklearnex 17x speedup</a></big><br>