# Accelerating Machine Learning Applications on Intel GPUs
### Intel Extension for Scikit-learn

*Scikit-learn* is a popular Python library for machine learning. **Intel Extension for Scikit-learn** speeds up scikit-learn applications on Intel CPUs and GPUs, in both single-node and multi-node configurations. The extension dynamically patches scikit-learn estimators so that they use optimized implementations of the underlying machine learning algorithms.
#### With Intel Extension for Scikit-learn, you can:
* *Significantly speed up training and inference with equivalent mathematical accuracy.*
* *Continue to use the open source scikit-learn API.*
* *Enable and disable the extension with a couple of lines of code or at the command line.*
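
As a rough sketch of the last point, the extension can be switched on and off around the import of an estimator. The calls `patch_sklearn` and `unpatch_sklearn` are the ones used later in this document; the helper name `toggle_extension_demo` and the `try`/`except` guard are assumptions added here so that the snippet also runs on a machine where the extension is not installed.

```python
def toggle_extension_demo():
    """Patch scikit-learn, import an estimator, then restore stock scikit-learn.

    Returns True if the extension was available and toggled, False otherwise.
    """
    try:
        from sklearnex import patch_sklearn, unpatch_sklearn
    except ImportError:
        # The extension is not installed, so there is nothing to toggle.
        return False

    patch_sklearn()  # enable the optimizations
    # Import estimators after patching so they resolve to the optimized versions.
    from sklearn.cluster import DBSCAN  # noqa: F401

    unpatch_sklearn()  # disable the optimizations again
    return True


extension_toggled = toggle_extension_demo()
print(f"Extension toggled: {extension_toggled}")
```

From the command line, the project documentation describes running an unmodified script through the extension with `python -m sklearnex my_application.py`, where `my_application.py` is a placeholder for your own script.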

### Installation
* Intel Extension for Scikit-learn can be installed from the Python Package Index (PyPI) or from Anaconda Cloud, through either the Conda-Forge channel or the Intel channel.
* It is also available as part of the Intel AI Analytics Toolkit (AI Kit). If you already have AI Kit installed, you do not need to install the extension separately.


### Install from PyPI (recommended by default)

1. **(Optional, recommended) To prevent version conflicts, create and activate a new environment:**

`python -m venv env` <br>
`source env/bin/activate`

2. **Install scikit-learn-intelex:** <br>

`pip install scikit-learn-intelex`

### Install from Anaconda Cloud: Conda-Forge channel
* **Into a newly created environment** (replace `3.x` with your Python version) <br> `conda create -n env -c conda-forge python=3.x scikit-learn-intelex`
* **Into your current environment** <br> `conda install scikit-learn-intelex -c conda-forge`
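
After either installation route, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `extension_version` is hypothetical, and it assumes the import name `sklearnex` and the distribution name `scikit-learn-intelex` used elsewhere in this document.

```python
from importlib import metadata, util


def extension_version():
    """Return the installed scikit-learn-intelex version, or None if it is absent."""
    # `sklearnex` is the import name; `scikit-learn-intelex` is the distribution name.
    if util.find_spec("sklearnex") is None:
        return None
    try:
        return metadata.version("scikit-learn-intelex")
    except metadata.PackageNotFoundError:
        # Importable, but installed without distribution metadata.
        return "unknown"


version = extension_version()
print("scikit-learn-intelex:", version if version else "not installed")
```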

## Intel Extension for Scikit-learn DBSCAN on the Spoken Arabic Digit dataset

    # Import the necessary libraries
    from timeit import default_timer as timer
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import davies_bouldin_score
    from sklearn.datasets import fetch_openml
    import warnings

    # Silence library warnings to keep the notebook output readable
    warnings.filterwarnings("ignore")

**Download the dataset**

    # Fetch features and labels from OpenML by dataset name
    x, y = fetch_openml(name="spoken-arabic-digit", return_X_y=True)


**Split the data into training and testing sets**

    # Hold out 20% of the samples; random_state fixes the split for reproducibility
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

**Normalize the data**


    from sklearn.preprocessing import MinMaxScaler

    # Fit the scaler on the training data only, then apply it to both splits
    scaler_x = MinMaxScaler()
    scaler_x.fit(x_train)
    x_train = scaler_x.transform(x_train)
    x_test = scaler_x.transform(x_test)
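
To illustrate what the normalization step does, here is a plain-Python sketch of min-max scaling for a single feature, using the rule x' = (x - min) / (max - min). The function names `fit_min_max` and `transform_min_max` are hypothetical and this is not the scikit-learn implementation; mapping a constant feature to 0.0 is a choice made for this sketch. It shows why the scaler is fitted on the training split only: test values outside the training range land outside [0, 1].

```python
def fit_min_max(values):
    """Learn the minimum and maximum of one feature from training values."""
    return min(values), max(values)


def transform_min_max(values, lo, hi):
    """Rescale values with x' = (x - lo) / (hi - lo)."""
    span = hi - lo
    if span == 0:
        # Constant feature: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


train = [2.0, 4.0, 6.0]
test = [3.0, 8.0]

lo, hi = fit_min_max(train)  # statistics come from the training data only
scaled_train = transform_min_max(train, lo, hi)
scaled_test = transform_min_max(test, lo, hi)

print(scaled_train)  # [0.0, 0.5, 1.0]
print(scaled_test)   # [0.25, 1.5]
```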

**Patch original scikit-learn with Intel Extension for Scikit-learn**

    from sklearnex import patch_sklearn

    # Patch before importing any scikit-learn estimators
    patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


**Train the DBSCAN algorithm with Intel Extension for Scikit-learn on the Spoken Arabic Digit dataset**

    from sklearn.cluster import DBSCAN

    # Use all available cores; eps and min_samples keep their defaults
    params = {
        "n_jobs": -1,
    }
    start = timer()
    y_pred = DBSCAN(**params).fit_predict(x_train)
    train_patched = timer() - start
    f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"


'Intel® extension for Scikit-learn time: 22.38 s'
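
The same start/stop timing pattern is repeated for the patched and unpatched runs, so it could be factored into a small helper. The sketch below is one possible way to do that; `time_call` is a hypothetical name, and the workload shown is a stand-in so the snippet runs without the dataset. A single timed run like this is sensitive to system noise, so the numbers should be read as indicative.

```python
from timeit import default_timer as timer


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = timer()
    result = func(*args, **kwargs)
    elapsed = timer() - start
    return result, elapsed


# Stand-in workload; in the benchmark this would be
# `lambda: DBSCAN(**params).fit_predict(x_train)`.
total, seconds = time_call(sum, range(1_000_000))
print(f"sum = {total}, time: {seconds:.4f} s")
```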

**Evaluate DBSCAN performance with Intel Extension for Scikit-learn using Davies-Bouldin score**

    # Lower Davies-Bouldin scores indicate better-separated clusters
    dbs_score = davies_bouldin_score(x_train, y_pred)
    f"Intel® extension for Scikit-learn Davies-Bouldin score: {dbs_score}"

'Intel® extension for Scikit-learn Davies-Bouldin score: 0.8506779263727179'
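
For readers unfamiliar with the metric, the Davies-Bouldin index averages, over all clusters, the worst ratio of within-cluster scatter to between-centroid distance; lower values mean tighter, better-separated clusters. The following plain-Python sketch is written for illustration and is not the scikit-learn implementation. The function name `davies_bouldin` is hypothetical, it assumes no two clusters share a centroid, and it treats every distinct label (including DBSCAN's noise label `-1`) as a cluster.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for points (sequences of floats) and cluster labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Centroid and mean distance-to-centroid (scatter) for every cluster.
    centroids, scatters = {}, {}
    for label, members in clusters.items():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        centroids[label] = centroid
        scatters[label] = sum(math.dist(p, centroid) for p in members) / len(members)

    # For each cluster, keep the largest similarity ratio to any other cluster.
    worst = []
    for i in clusters:
        ratios = [
            (scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / len(worst)


# Two 1-D clusters: scatter 1.0 each, centroids 10.0 apart.
points = [(0.0,), (2.0,), (10.0,), (12.0,)]
labels = [0, 0, 1, 1]
score = davies_bouldin(points, labels)
print(score)  # 0.2
```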

### Train the same algorithm with original Scikit-learn
To cancel the optimizations, call `unpatch_sklearn` and re-import DBSCAN.

    from sklearnex import unpatch_sklearn

    unpatch_sklearn()

**Train the DBSCAN algorithm with the original Scikit-learn library on the Spoken Arabic Digit dataset**

    # Re-import so that DBSCAN resolves to the stock scikit-learn implementation
    from sklearn.cluster import DBSCAN

    start = timer()
    y_pred = DBSCAN(**params).fit_predict(x_train)
    train_unpatched = timer() - start
    f"Original Scikit-learn time: {train_unpatched:.2f} s"

'Original Scikit-learn time: 400.32 s'

**Evaluate performance using Davies-Bouldin score**

    score_original = davies_bouldin_score(x_train, y_pred)
    f"Original Scikit-learn Davies-Bouldin score: {score_original}"

'Original Scikit-learn Davies-Bouldin score: 0.8506779263727179'
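
The two runs can be compared directly from the reported outputs. The short sketch below hard-codes the timings and scores shown above (they will differ on other hardware) and derives the speedup; the variable names `speedup` and `same_quality` are introduced here for illustration.

```python
# Timings and scores copied from the outputs reported above.
train_patched = 22.38      # seconds, Intel Extension for Scikit-learn
train_unpatched = 400.32   # seconds, original scikit-learn
score_patched = 0.8506779263727179
score_original = 0.8506779263727179

speedup = train_unpatched / train_patched
same_quality = score_patched == score_original

print(f"Speedup: {speedup:.1f}x")                          # Speedup: 17.9x
print(f"Identical Davies-Bouldin score: {same_quality}")   # True
```

On this run the patched DBSCAN trained roughly 18 times faster while producing the same Davies-Bouldin score.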