# Accelerating Machine Learning Applications on Intel GPUs
### Intel Extension for Scikit-learn

*Scikit-learn* is a popular Python library for machine learning. **Intel Extension for Scikit-learn** speeds up your scikit-learn applications on Intel CPUs and GPUs across single-node and multi-node configurations. This extension package dynamically patches scikit-learn estimators to improve the performance of machine learning algorithms.
#### With Intel Extension for Scikit-learn, you can:
* *Significantly speed up training and inference with equivalent mathematical accuracy.*
* *Continue to use the open source scikit-learn API.*
* *Enable and disable the extension with a couple of lines of code or at the command line.*
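
The enable/disable switch from the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `enable_acceleration` and `disable_acceleration` are hypothetical, and the `ImportError` guard is added here only so the snippet also runs where `scikit-learn-intelex` is not installed.

```python
def enable_acceleration() -> bool:
    """Patch scikit-learn if sklearnex is importable; return True on success."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Assumption: fall back silently to stock scikit-learn.
        return False
    patch_sklearn()
    return True


def disable_acceleration() -> bool:
    """Undo the patch if sklearnex is importable; return True on success."""
    try:
        from sklearnex import unpatch_sklearn
    except ImportError:
        return False
    unpatch_sklearn()
    return True


accelerated = enable_acceleration()
print(f"Acceleration enabled: {accelerated}")
# Import estimators only AFTER patching (or unpatching), so that the
# import picks up the implementation you intend to use.
```

On the command line, the same effect is available without touching the code by running the script through the module, for example `python -m sklearnex my_application.py` (the script name is a placeholder).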

### Installation
1. **Install scikit-learn:** <br>
`pip install -U scikit-learn`<br>

2. **Install scikit-learn-intelex:** <br>

 `pip install scikit-learn-intelex`
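
To confirm that both packages are visible to the current Python environment, a quick check along these lines may help. It is a sketch that relies only on the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as used in the pip commands above.
for name in ("scikit-learn", "scikit-learn-intelex"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```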



## DBSCAN with Intel Extension for Scikit-learn on the Spoken Arabic Digit dataset

In [2]:
# Import the necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import davies_bouldin_score
from sklearn.datasets import fetch_openml
import time
import warnings
warnings.filterwarnings("ignore")

**Download the dataset**

In [3]:
x, y = fetch_openml(name="spoken-arabic-digit", return_X_y=True)


**Split the data into training and testing sets**

In [4]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

**Normalize the data**


In [5]:
from sklearn.preprocessing import MinMaxScaler

scaler_x = MinMaxScaler()
scaler_x.fit(x_train)
x_train = scaler_x.transform(x_train)
x_test = scaler_x.transform(x_test)
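
The cell above fits the scaler on the training split only and then applies the same bounds to the test split. The arithmetic behind that can be sketched in plain Python. This is an illustration of min-max scaling, not the `MinMaxScaler` implementation: the helper names and the toy data are hypothetical, and mapping a constant column to 0.0 is a simplification.

```python
def fit_min_max(rows):
    """Learn per-column (min, max) bounds from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, bounds):
    """Scale each value to (value - min) / (max - min) using the given bounds."""
    scaled = []
    for row in rows:
        scaled.append([
            # Simplification: a constant column (max == min) maps to 0.0.
            (value - lo) / (hi - lo) if hi != lo else 0.0
            for value, (lo, hi) in zip(row, bounds)
        ])
    return scaled


# Toy data, made up for illustration.
train = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
test = [[2.0, 40.0]]

bounds = fit_min_max(train)           # learned from the training split only
print(apply_min_max(train, bounds))   # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
print(apply_min_max(test, bounds))    # [[0.25, 1.5]]
```

Note that the test value 40.0 scales to 1.5: because the bounds come from the training data, test values can fall outside the [0, 1] range.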

**Patch the original scikit-learn with Intel Extension for Scikit-learn**

In [None]:
from sklearnex import patch_sklearn

patch_sklearn()

**Train the DBSCAN algorithm with Intel Extension for Scikit-learn on the Spoken Arabic Digit dataset**


In [None]:
from sklearn.cluster import DBSCAN

params = {
    "n_jobs": -1,
}
start_time = time.time()
y_pred = DBSCAN(**params).fit_predict(x_train)
train_patched = time.time() - start_time
f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"
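
Before scoring the result, it can be useful to look at what `fit_predict` returned. DBSCAN marks noise points with the label `-1`, and `davies_bouldin_score` needs at least two distinct labels, so a run that puts every point in one group cannot be scored. The helper below is a hypothetical sketch for summarizing a label array; the example labels are made up.

```python
from collections import Counter


def summarize_labels(labels):
    """Count clusters and noise points in a DBSCAN label array."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(-1, 0)  # -1 is DBSCAN's noise label
    return {"n_clusters": len(counts), "n_noise": n_noise, "sizes": dict(counts)}


# Made-up labels for illustration; in the notebook, pass y_pred instead.
example_labels = [0, 0, 1, -1, 1, 1, -1]
print(summarize_labels(example_labels))
# {'n_clusters': 2, 'n_noise': 2, 'sizes': {0: 2, 1: 3}}
```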


**Evaluate DBSCAN performance with Intel Extension for Scikit-learn using Davies-Bouldin score**

In [None]:
dbs_score = davies_bouldin_score(x_train, y_pred)
f"Intel® extension for Scikit-learn Davies-Bouldin score: {dbs_score:.4f}"
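
For readers unfamiliar with the metric, the Davies-Bouldin index averages, over all clusters, the worst-case ratio of within-cluster scatter to between-centroid distance; lower values indicate more compact, better separated clusters. The plain-Python function below is a sketch of that definition for illustration, not the scikit-learn implementation. The function name and toy data are hypothetical, and the sketch treats every label, including `-1`, as an ordinary cluster.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index computed from raw points and cluster labels."""
    # Group the points by their label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two distinct labels are required")

    # Centroid and mean distance to centroid (scatter) per cluster.
    centroids, scatters = {}, {}
    for label, members in clusters.items():
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        centroids[label] = centroid
        scatters[label] = sum(math.dist(p, centroid) for p in members) / len(members)

    # For each cluster, take the worst similarity ratio against any other.
    # Assumption: no two centroids coincide (that would divide by zero).
    worst = []
    for i in clusters:
        ratios = [
            (scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / len(worst)


# Two tight clusters 10 units apart, each with scatter 1: (1 + 1) / 10 = 0.2
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
print(f"{davies_bouldin(toy_points, toy_labels):.4f}")  # 0.2000
```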

### Train the same algorithm with original Scikit-learn
To cancel the optimizations, we call `unpatch_sklearn` and re-import DBSCAN.

In [9]:
from sklearnex import unpatch_sklearn

unpatch_sklearn()

**Train the DBSCAN algorithm with the original Scikit-learn library on the Spoken Arabic Digit dataset**

In [None]:
from sklearn.cluster import DBSCAN

start_time = time.time()
y_pred = DBSCAN(**params).fit_predict(x_train)
train_unpatched = time.time() - start_time
f"Original Scikit-learn time: {train_unpatched:.2f} s"


**Evaluate performance using Davies-Bouldin score**

In [None]:
score_original = davies_bouldin_score(x_train, y_pred)
f"Original Scikit-learn Davies-Bouldin score: {score_original:.4f}"
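
With both runs timed, the two measurements can be reduced to a single speedup factor. The helper below is a hypothetical sketch; the numbers in the example are placeholders, not measured results.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run was than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds


# Placeholder timings for illustration only.
example = speedup(12.0, 3.0)
print(f"Speedup: {example:.1f}x")  # Speedup: 4.0x
```

In the notebook this would be called as `speedup(train_unpatched, train_patched)`, and the two Davies-Bouldin scores (`dbs_score` and `score_original`) can be compared side by side to check that the clustering quality is unchanged.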

**For execution on GPU, the DPC++ compiler runtime and a GPU driver are required**

**Install from PyPI:**<br>
`pip install dpcpp-cpp-rt`<br>
**Install from Anaconda:**<br>
`conda install dpcpp_cpp_rt -c intel`


In [None]:
from sklearnex import patch_sklearn, config_context
patch_sklearn()

In [None]:
from sklearn.cluster import DBSCAN

with config_context(target_offload="gpu:0"):
    start_time = time.time()
    # fit_predict returns the cluster labels, matching the CPU runs above
    y_pred = DBSCAN().fit_predict(x_train)
    train_gpu_time = time.time() - start_time
f"Intel® extension for Scikit-learn time on GPU: {train_gpu_time:.2f} s"

Further examples:
1. https://github.com/intel/scikit-learn-intelex/blob/main/examples/sklearnex/dbscan_spmd.py
2. https://github.com/IntelSoftware/Machine-Learning-using-oneAPI