# Kmeans Algorithm Using Intel® Extension for Scikit-learn*

![Assets/kmeans.png](Assets/kmeans.png)

<a id='Back-to-Sections'></a>
# Sections
- _Discussion:_ [Kmeans Algorithm](#Kmeans-Algorithm)
- _Code:_ [Implementation of Kmeans targeting **CPU** in batch mode using Intel Extension for Scikit-learn for Kmeans](#Implementation-of-Kmeans-in-Batch-mode)
- _Code:_ [Implementation of Kmeans targeting **Distributed CPU** using Intel Extension for Scikit-learn for Kmeans](#Implementation-of-Kmeans-using-Distributed-Processing)
- _Code:_ [Implementation of Kmeans targeting **GPU** using Intel Extension for Scikit-learn for Kmeans](#Implementation-of-Kmeans-targeting-GPU)

You will review, modify, and execute code for unsupervised clustering of data using Intel Extension for Scikit-learn for Kmeans on a single CPU, on a single GPU, and distributed across multiple CPUs.



## Learning Objectives

* Describe the value of Intel® Extension for Scikit-learn methodology in extending scikit-learn optimization capabilities
* Name key imports and function calls to use Intel Extension for Scikit-learn to target Kmeans for use on CPU, GPU and distributed CPU environments
* Apply a single Daal4py function to enable Kmeans targeting CPU and GPU using SYCL context
* Build a Sklearn implementation of Kmeans targeting CPU and GPU using Intel optimized Sklearn Extensions for Kmeans


## Library Dependencies:
 - **pickle** is part of the Python standard library and does not need to be installed with pip
 - also requires these libraries if they are not already installed: **matplotlib, numpy, pandas**
 

# Intel Extension for Scikit-learn

Intel® Extension for Scikit-learn contains drop-in replacement patching functionality for the Scikit-learn machine learning library for Python. The patches were originally available in the daal4py package. All future updates for the patching will be available only in Intel Extension for Scikit-learn. All performance claims obtained using daal4py are applicable for Intel Extension for Scikit-learn.

The value of the patch is providing optimized versions of common Scikit-learn machine learning algorithms used for data science. An added value is the ability to invoke these functions on either CPU or GPU.

Applying Intel® Extension for Scikit-learn affects the following existing [scikit-learn algorithms](https://intel.github.io/scikit-learn-intelex/algorithms.html).

You can take advantage of the optimizations of Intel Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:

 - from sklearnex import patch_sklearn
 - patch_sklearn()

 - from sklearn.cluster import KMeans
 - ... import other sklearn algorithms as needed ...
 
Learn more about the [various ways to patch](https://intel.github.io/scikit-learn-intelex/) scikit-learn: selectively for individual algorithms, for entire Python scripts, or globally, as well as how to unpatch.

Intel Extension for Scikit-learn uses Intel® oneAPI Data Analytics Library (oneDAL) to achieve its acceleration. The optimizations aim for the efficient use of CPU resources. The library enables all the latest vector instructions, such as Intel® Advanced Vector Extensions (Intel AVX-512). It also uses cache-friendly data blocking, fast Basic Linear Algebra Subprograms (BLAS) operations with Intel oneAPI Math Kernel Library (oneMKL), scalable multi-threading with the Intel oneAPI Threading Building Blocks (oneTBB) library, and more.

## Intel® oneAPI Data Analytics Library (oneDAL) aka daal4py
As mentioned, Intel Extension for Scikit-learn uses Intel® oneAPI Data Analytics Library (oneDAL) under the hood to achieve its acceleration, and our general recommendation is to use Intel Extension for Scikit-learn whenever possible. Most functionality found in oneDAL is exposed through the higher-level interface, Intel Extension for Scikit-learn, and this is the preferred interface. However, a few functions found in oneDAL have not yet been ported to Intel Extension for Scikit-learn, so it is good to know how to use the functionality through either interface for now. For example, in the code below, we use daal4py to invoke the distributed compute mode for Kmeans.

oneDAL has a Python API that is provided as a standalone Python library called daal4py.

Daal4py, included in Intel® Distribution for Python* as part of the Intel® AI Analytics Toolkit, is an easy-to-use Python* API  that provides superior performance for your machine learning algorithms and frameworks. Designed for data scientists, it provides a simple way to utilize powerful Intel® DAAL machine learning algorithms in a flexible and customizable manner. For scaling capabilities, daal4py also provides you the option to process and analyze data via batch, streaming, or distributed processing modes, allowing you to choose the option to best fit your system's needs. 

The example below shows how daal4py can be used to calculate K-Means clusters:

# Kmeans Algorithm
Kmeans is a clustering algorithm that partitions the observations in a dataset into a requested number of clusters, assigning each point to the cluster whose center of mass (centroid) is closest. Starting from an initial estimate of the centroids, the algorithm iteratively updates their positions until it reaches a fixed point.


Kmeans is a simple and powerful ML algorithm to cluster data into similar groups. Its objective is to split a set of N observations into K clusters. This is achieved by minimizing inertia (i.e., the sum of squared Euclidean distances from observations to the cluster centers, or centroids). The algorithm is iterative, with two steps in each iteration:
* For each observation, compute the distance from it to each centroid, and then reassign each observation to the cluster with the nearest centroid.
* For each cluster, compute the centroid as the mean of observations assigned to this cluster.

Repeat these steps until the number of iterations exceeds a predefined maximum or the algorithm converges (i.e., the difference between two consecutive inertias is less than a predefined threshold).
Different methods are used to get initial centroids for the first iteration. The algorithm can select random observations as initial centroids or use more complex methods such as kmeans++.
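The two steps and the stopping rule described above can be sketched in plain NumPy. This is only an illustrative sketch, not the oneDAL implementation: the function name `lloyd_kmeans`, its parameters, and the small synthetic dataset are assumptions made for this example, and it uses the simple random-observation initialization.

```python
import numpy as np

def lloyd_kmeans(X, n_clusters, max_iter=300, tol=1e-4, seed=0):
    """Illustrative Kmeans loop (hypothetical helper, not the oneDAL code)."""
    rng = np.random.default_rng(seed)
    # Initialization: pick random observations as the initial centroids
    centroids = X[rng.choice(X.shape[0], size=n_clusters, replace=False)].copy()
    prev_inertia = np.inf
    for n_iter in range(1, max_iter + 1):
        # Step 1: squared Euclidean distance from each observation to each
        # centroid, then assign each observation to the nearest centroid
        dist2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        inertia = dist2[np.arange(X.shape[0]), labels].sum()
        # Convergence: stop when inertia improves by less than the threshold
        if prev_inertia - inertia < tol:
            break
        prev_inertia = inertia
        # Step 2: move each centroid to the mean of its assigned observations
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids, labels, inertia, n_iter

# Small synthetic dataset: three tight groups of 100 two-dimensional points
rng = np.random.default_rng(777)
true_centers = np.array([[-5.0, -5.0], [0.0, 5.0], [5.0, -5.0]])
X = np.vstack([c + 0.2 * rng.standard_normal((100, 2)) for c in true_centers])

centroids, labels, inertia, n_iter = lloyd_kmeans(X, n_clusters=3)
print("iterations:", n_iter)
print("inertia:", inertia)
```

Because the initial centroids are random observations, the loop can settle in a local minimum; this is one reason more careful initialization methods exist.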

- [Back to Sections](#Back-to-Sections)

### About the data
The data included in these exercises was built separately using the **sklearn.datasets make_blobs** function, which synthesizes data for analysis by specifying:
 - The number of samples in the dataset, called n_samples, for example n_samples = 200000
 - The number of columns in the dataset, called n_features, for example n_features = 50
 - The number of cluster centers, called centers, for example centers = 10
 - The standard deviation for each cluster, called cluster_std, for example cluster_std = 0.2
 - The spatial range over which the cluster centers are placed, called center_box, for example center_box = (-10.0, 10.0)
 - A seed, called random_state, for example random_state = 777
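As a minimal sketch of how such a dataset could be generated, the call below passes the parameters listed above to `make_blobs`. The sample count is deliberately scaled down so the snippet runs quickly, and the exact values used to build the exercise files are not stated here, so treat every number as illustrative.

```python
from sklearn.datasets import make_blobs

# Illustrative values only; n_samples is scaled down from the 200000 example
X, y = make_blobs(
    n_samples=2000,            # number of rows
    n_features=50,             # number of columns
    centers=10,                # number of cluster centers
    cluster_std=0.2,           # standard deviation of each cluster
    center_box=(-10.0, 10.0),  # range in which the centers are placed
    random_state=777,          # seed for reproducibility
)

print(X.shape)  # (2000, 50)
print(y.shape)  # (2000,)
```

`X` holds the observations and `y` holds the index of the center each observation was generated from, which is useful for checking a clustering result but is not passed to Kmeans.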

# Implementation of Kmeans in Batch mode
Batch Processing: For small quantities of data, you can input the data all at once using batch processing mode. Batch processing is daal4py's default process mode, so no changes need to be made to your daal4py code in order to run it.




1. Inspect the code cell below.
2. Click run ▶ to execute the code directly in the notebook.

- [Back to Sections](#Back-to-Sections)

In [None]:
#%%writefile lab/kmeans_cpu.py
#===============================================================================
# Copyright 2014-2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#===============================================================================

# Intel Extension for Scikit-learn Kmeans example for shared memory systems

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.cluster import KMeans
import numpy as np
#import logging
#logging.basicConfig(filename='bobOut.log', encoding='utf-8', level=logging.DEBUG)

# let's try to use pandas' fast csv reader
try:
    import pandas

    def read_csv(f, c, t=np.float64):
        return pandas.read_csv(f, usecols=c, delimiter=',', header=None, dtype=t)
except ImportError:
    # fall back to numpy loadtxt
    def read_csv(f, c, t=np.float64):
        # pass dtype so the fallback honors the requested type as pandas does
        return np.loadtxt(f, usecols=c, delimiter=',', ndmin=2, dtype=t)


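def main(readcsv=read_csv, method='defaultDense'):
    # method is kept for parity with the daal4py samples;
    # scikit-learn's KMeans does not use it
    infile = "./data/batch/kmeans_dense.csv"
    nClusters = 20
    maxIter = 300

    # Load the first 20 columns of the dataset
    data = readcsv(infile, range(20))

    kmeans = KMeans(n_clusters=nClusters, init='random', max_iter=maxIter, random_state=0)
    # fit_predict fits the model and returns the cluster label of each sample
    y_km = kmeans.fit_predict(data)

    print("kmeans.labels_")
    print(kmeans.labels_)

    print("kmeans.cluster_centers_")
    print(kmeans.cluster_centers_)
    return kmeans

result = main()
print('All looks good!')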



_If the Jupyter cells are not responsive or if they error out when you run the code samples, restart the Jupyter kernel with "Kernel->Restart Kernel and Clear All Outputs" and run the code samples again._

## Implementation of Kmeans using Distributed Processing
daal4py operates in Single Program Multiple Data (SPMD) style, which means your program is executed on several processes (e.g. similar to MPI). The use of MPI is not required for daal4py’s SPMD mode to work: all necessary communication and synchronization happens through daal4py. However, it is possible to use daal4py and mpi4py in the same program.

Only minimal changes to your daal4py code are needed to run it on a cluster of workstations. Add this line near the top of the Python program to initialize SPMD mode:

```
daalinit()

```
Add the distribution parameter to the algorithm construction:

```
kmi = kmeans_init(10, method="plusPlusDense", distributed=True)

```
When calling the actual computation each process expects an input file or input array/DataFrame. Your program needs to tell each process which file/array/DataFrame it should operate on.
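One way to decide which rows each process works on is to split the row range by process id. The sketch below is a hypothetical helper (the name `rows_for_process` is an assumption, not part of daal4py); in an SPMD program the process count and id would come from `d4p.num_procs()` and `d4p.my_procid()`, as in the SPMD example in this section. Unlike a plain integer division, it spreads any remainder rows over the first processes so that no rows are dropped.

```python
def rows_for_process(n_rows, n_procs, proc_id):
    """Return the [start, stop) row range owned by process proc_id."""
    base, extra = divmod(n_rows, n_procs)
    # The first `extra` processes take one additional row each
    start = proc_id * base + min(proc_id, extra)
    stop = start + base + (1 if proc_id < extra else 0)
    return start, stop

# Example: 10 rows split across 4 processes
ranges = [rows_for_process(10, 4, p) for p in range(4)]
print(ranges)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each process would then slice its local data as `data[start:stop, :]` before calling `compute`.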

Finally stop the distribution engine:

```
daalfini()

```

To actually get it executed on several processes use standard MPI mechanics, like:

```
mpirun -n 4 python ./kmeans.py

```
The binaries provided by Intel use the Intel® MPI library, but daal4py can also be compiled for any other MPI implementation.

1. Inspect the code cell below and click run ▶ to save the code to a file.
2. Next run ▶ the cell in the __Build and Run__ section below the code to compile and execute the code.

- [Back to Sections](#Back-to-Sections)

In [None]:
%%writefile lab/kmeans_spmd.py

#===============================================================================
# Copyright 2014-2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#===============================================================================

# daal4py Kmeans example for distributed memory systems; SPMD mode
# run like this:
#    mpirun -n 4 python ./kmeans_spmd.py

import daal4py as d4p
from numpy import loadtxt


def main(method='plusPlusDense'):
    infile = "./data/distributed/kmeans_dense.csv"
    nClusters = 10
    maxIter = 25
    
    print("output expected below:")
    
    # configure a kmeans-init
    init_algo = d4p.kmeans_init(nClusters, method=method, distributed=True)
    # Load the data
    data = loadtxt(infile, delimiter=',')
    # now slice the data,
    # it would have been better to read only what we need, of course...
    rpp = int(data.shape[0] / d4p.num_procs())
    data = data[rpp * d4p.my_procid(): rpp * d4p.my_procid() + rpp, :]

    # compute initial centroids
    init_result = init_algo.compute(data)
    # The result provides the initial centroids
    assert init_result.centroids.shape[0] == nClusters

    # configure kmeans main object
    algo = d4p.kmeans(nClusters, maxIter, distributed=True)
    # compute the clusters/centroids
    result = algo.compute(data, init_result.centroids)

    # Kmeans result objects provide centroids, goalFunction,
    # nIterations and objectiveFunction
    assert result.centroids.shape[0] == nClusters
    assert result.nIterations <= maxIter
    # we need an extra call to kmeans to get the assignments
    # (not directly supported through parameter assignFlag yet in SPMD mode)
    algo = d4p.kmeans(nClusters, 0, assignFlag=True)
    # maxIt=0; not distributed, we compute on local data only!
    assignments = algo.compute(data, result.centroids).assignments

    return (assignments, result)


if __name__ == "__main__":
    # Initialize SPMD mode
    d4p.daalinit()
    (assignments, result) = main()
    # result is available on all processes - but we print only on root
    if d4p.my_procid() == 0:
        print("\nFirst 10 cluster assignments:\n", assignments[0:10])
        print("\nFirst 10 dimensions of centroids:\n", result.centroids[:, 0:10])
        print("\nObjective function value:\n", result.objectiveFunction)
        print('All looks good!')
    d4p.daalfini()    

### Build and Run
Select the cell below and click run ▶ to compile and execute the code:

In [None]:
! chmod 755 q; chmod 755 run_kmeans_spmd.sh; if [ -x "$(command -v qsub)" ]; then ./q  run_kmeans_spmd.sh; else ./run_kmeans_spmd.sh; fi

## Implementation of Kmeans targeting GPU

1. Inspect the code cell below and click run ▶ to save the code to a file.
2. Next run ▶ the cell in the __Build and Run__ section below the code to compile and execute the code.

- [Back to Sections](#Back-to-Sections)

In [None]:
%%writefile lab/kmeans_gpu.py 
# Intel Extension for Scikit-learn Kmeans example targeting GPU
import pickle
import dpctl
from sklearnex import patch_sklearn
patch_sklearn()

from daal4py.oneapi import sycl_context
from daal4py.oneapi import sycl_buffer
from sklearn.cluster import KMeans
import numpy as np
import os

def write_results(resultsDict):
    print("write_results...")
    # the context manager closes the file even if pickling fails
    with open("resultsDict.pkl", "wb") as file_to_write:
        pickle.dump(resultsDict, file_to_write)
    print("write complete...")
    
# let's try to use pandas' fast csv reader
try:
    import pandas
    def read_csv(f, c, t=np.float64):
        return pandas.read_csv(f, usecols=c, delimiter=',', header=None, dtype=t)
except ImportError:
    # fall back to numpy loadtxt
    def read_csv(f, c, t=np.float64):
        # pass dtype so the fallback honors the requested type as pandas does
        return np.loadtxt(f, usecols=c, delimiter=',', ndmin=2, dtype=t)

# Common code for both CPU and GPU computations
def compute(data, nClusters, maxIter, method):
    # method is unused by scikit-learn's KMeans; initialization is set through init
    kmeans = KMeans(n_clusters=nClusters, init='random', max_iter=maxIter, random_state=0)
    # fit() returns the fitted estimator; fit once and reuse its labels
    # instead of refitting the model with fit_predict()
    y_km = kmeans.fit(data)
    pred_y = kmeans.labels_
    
    print("kmeans.labels_")
    print(kmeans.labels_)   
    
    print("kmeans.cluster_centers_")
    #print(kmeans.cluster_centers_)    
    print("\nFirst 3 cluster centers:\n", kmeans.cluster_centers_[0:3])
    resultsDict = {}
    resultsDict['y_km'] = y_km
    resultsDict['pred_y'] = pred_y
    resultsDict['kmeans.labels_'] = kmeans.labels_
    resultsDict['kmeans.cluster_centers_'] = kmeans.cluster_centers_
    return resultsDict


# At this moment with sycl we are working only with numpy arrays
def to_numpy(data):
    try:
        from pandas import DataFrame
        if isinstance(data, DataFrame):
            return np.ascontiguousarray(data.values)
    except ImportError:
        pass
    try:
        from scipy.sparse import csr_matrix
        if isinstance(data, csr_matrix):
            return data.toarray()
    except ImportError:
        pass
    return data


def main(readcsv=read_csv, method='randomDense'):
    infile = os.path.join('data', 'batch', 'kmeans_dense.csv')
    nClusters = 20
    maxIter = 5
    
    print("output expected below:")
    
    # Load the data
    data = readcsv(infile, range(20), t=np.float32)   

    # convert to numpy
    data = to_numpy(data) 

    # Prefer a GPU when one is available; otherwise fall back to the CPU
    if any(d.is_gpu for d in dpctl.get_devices()):
        device = dpctl.select_gpu_device()
    else:
        device = dpctl.select_cpu_device()
            
    print(device.device_type)
    with dpctl.device_context(device):        
        resultsDict = compute(data, nClusters, maxIter, method) 

    write_results(resultsDict)
if __name__ == "__main__":
    result = main()    
    print('All looks good!')

### Build and Run
Select the cell below and click run ▶ to compile and execute the code:

In [None]:
! chmod 755 q; chmod 755 run_kmeans_gpu.sh; if [ -x "$(command -v qsub)" ]; then ./q run_kmeans_gpu.sh; else ./run_kmeans_gpu.sh; fi

_If the Jupyter cells are not responsive or if they error out when you run the code samples, restart the Jupyter kernel with "Kernel->Restart Kernel and Clear All Outputs" and run the code samples again._

# Plot kmeans results as computed on GPU

In [None]:
import pickle
def read_results():
    # 'rb' opens the pickle file for reading in binary mode
    with open('resultsDict.pkl', 'rb') as f:
        return pickle.load(f)

resultsDict = read_results()
resultsDict

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

infile = os.path.join('data', 'batch', 'kmeans_dense.csv')
# Load the data
df = pd.read_csv(infile,  delimiter=',', usecols = range(20) , header=None, dtype=np.float32)
X = df.to_numpy()

pred_y = resultsDict['pred_y']
cluster_centers_ =  resultsDict['kmeans.cluster_centers_']
labels_ = resultsDict['kmeans.labels_']
c1 = 9
c2 = 19
plt.title('kmeans cluster centers')
plt.scatter(cluster_centers_[:, c1], cluster_centers_[:, c2], s=300, c='red')
plt.show()

## Summary
In this module you learned how to:
* Describe Daal4py and the Intel Extension for Scikit-learn methodology in extending scikit-learn optimization capabilities
* Name key imports and function calls to use Intel Extension for Scikit-learn to target Kmeans for use on CPU, GPU and distributed CPU environments
* Apply a single Daal4py function to enable Kmeans targeting CPU and GPU using SYCL context
* Build a Sklearn implementation of Kmeans targeting CPU and GPU using Intel optimized Sklearn Extensions for Kmeans

# Notices & Disclaimers 

Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. 
*Other names and brands may be claimed as the property of others.