# Module 05_03: Gallery of Functions on GPU

### Learning Objectives:

By the end of this lesson you will be able to:

- Apply the patch functions with varying granularities.
- Leverage the Compute Follows Data methodology using the Intel DPCTL library to target an Intel GPU.
- Apply DPCTL and patching to a variety of Scikit-learn algorithms in a simple test harness structure.
- Note: given the current hardware configurations on the Intel DevCloud, we are **NOT focusing on performance**.


**Not all Scikit-learn functions are available yet on Intel GPU**. These are the [functions currently optimized with Intel(R) Extension for Scikit-learn](https://intel.github.io/scikit-learn-intelex/algorithms.html). **Pay specific attention to the GPU section.** Different algorithms have been optimized for Intel CPU and Intel GPU, and the list lets a developer check which functions are currently optimized. Some functions are simply aliases of others in that list; to see the 23 optimized "unique" functions, open data/sklearnex_gallery.csv or run the following cell.

**Pandas Gallery of Functions Explorer**
Below you can interact with a Pandas DataFrame containing more information about the algorithms optimized using Intel Extension for Scikit-learn*.

In [1]:
import pandas as pd
sklearnex_gallery_gpu = pd.read_csv('data/sklearnex_gallery_gpu.csv')
sklearnex_gallery_gpu

| Unnamed: 0 | Acronym | Name | What is it used for | Advantages | Disadvantages |
|---|---|---|---|---|---|
| 0 | dbscan | Density-based spatial clustering of applicatio... | Density-based spatial clustering of applicatio... | does not require one to specify the number of ... | scklearnex must only use ‘euclidean’ or ‘minko... |
| 1 | kmeans | K-Means clustering. | KMeans algorithm clusters data by trying to se... | Relatively simple to implement. Scales to larg... | intel sklearnex: All parameters except precomp... |
| 2 | kneighborsclassifier | K Nearest Neighbors Classifier | Classifier: Find the K-neighbors of a point. R... | K-NN is pretty intuitive and simple: \nK-NN al... | Intel sklearex: All parameters except algorit... |
| 3 | kneighborsregressor | K Nearest Neighbors Regressor | Regressor: Find the K-neighbors of a point. Re... | K-NN is pretty intuitive and simple: \nK-NN al... | Intel sklearex: All parameters except algorit... |
| 4 | linear | Linear Regression | Fitting line of best fit through data | Simple method \nGood interpretation \nEasy to... | Intel sklearnex: All parameters except normali... |
| 5 | logistic | Logistic Regression | Classifier: Generally used for quick binary cl... | Doesn’t assume linear relationship between ind... | Intel sklearnex: All parameters solver must N... |
| 6 | pca | Principal Component Analysis (PCA) | Dimensionality reduction. Pre processto reduce... | PCA can help us improve performance at a very ... | intel sklearnex: All parameters svd_solver MU... |
| 7 | random_forest_classifier | Random Forest Classifier | Classifier: A random forest produces good pred... | One third of data is not used for training, he... | SKLEARNEX: warm_start must be False,cpp_alpha ... |
| 8 | random_forest_regressor | Random Forest Regressor | Regressor: A random forest produces good predi... | One third of data is not used for training, he... | SKLEARNEX: warm_start must be False,cpp_alpha ... |
| 9 | svc | Support Vector Classifier | Classification: requires X & y. It is based o... | 1. Regularization capabilities: SVM has L2 Reg... | SKLEARNEX: Only binary dense data is supported... |


Below you can filter the DataFrame to see details about the sklearnex library (also known as Intel Extension for Scikit-learn*).

- Familiarize yourself with a handful of algorithms and explore the details under 'What is it used for', 'Advantages', and 'Disadvantages'.
 

In [2]:
def print_gallery_details(acronym, column):
    # Print one column of the gallery row whose 'Acronym' matches
    print(sklearnex_gallery_gpu[sklearnex_gallery_gpu['Acronym'] == acronym][column].tolist()[0])

# Columns that can be passed as the second argument
details = ['What is it used for', 'Advantages', 'Disadvantages']
print_gallery_details('kneighborsclassifier', 'Advantages')

K-NN is pretty intuitive and simple: 
K-NN algorithm is very simple to understand and equally easy to implement. To classify the new data point K-NN algorithm reads through whole dataset to find out K nearest neighbors.    
K-NN has no assumptions: 
K-NN is a non-parametric algorithm which means there are assumptions to be met to implement K-NN. 
Parametric models like linear regression has lots of assumptions to be met by data before it can be implemented which is not the case with K-NN.    
No Training Step: K-NN does not explicitly build any model, it simply tags the new data entry based learning from historical data. New data entry would be tagged with majority class in the nearest neighbor.    
It constantly evolves: Given it’s an instance-based learning; k-NN is a memory-based approach. 
The classifier immediately adapts as we collect new training data. 
It allows the algorithm to respond quickly to changes in the input during real-time use.    
Very easy to implement for multi-c

Import get_patch_names and get_patch_map from sklearnex, then familiarize yourself with the functions available and with the details about where they reside on your system.
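As a minimal sketch of that exploration, the snippet below lists the names sklearnex reports as patchable and shows the raw patch-map entry for the first one. The import is guarded so the snippet still runs where sklearnex is not installed; `list_patchable` is a hypothetical helper name, and the exact contents of the map depend on your installed version.

```python
# Guarded import: sklearnex may not be installed in every environment.
try:
    from sklearnex import get_patch_names, get_patch_map
except ImportError:
    get_patch_names = None
    get_patch_map = None


def list_patchable():
    """Return the sorted patchable names, or an empty list without sklearnex."""
    if get_patch_names is None:
        return []
    return sorted(get_patch_names())


names = list_patchable()
print(len(names), "patchable names")
for name in names:
    print(" ", name)

# The patch map records where each patched implementation comes from.
if names:
    print(names[0], "->", get_patch_map()[names[0]])
```

Comparing this list with the GPU section of the documentation is a quick way to see which estimators are worth trying on the device.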

# Regarding when/how to cast to and from dpctl.tensors

This information bears repeating, to make sure the concept is clear.

Study the code sections near the conversion to and from dpctl/NumPy.

For all sklearnex algorithms, it is necessary to cast the X and/or y data passed in the parameter list to a dpctl tensor so that the GPU can access the data and perform the computation.


### Examples dpctl version 12 and later:
```python
import dpctl.tensor

# x and y are NumPy arrays on the host; the copies live in device memory on the GPU
x_device = dpctl.tensor.from_numpy(x, usm_type='device', device="gpu")
y_device = dpctl.tensor.from_numpy(y, usm_type='device', device="gpu")
```
Pay attention to the **return** types from:
- **fit** - in many cases in scikit-learn, fit returns the estimator itself (self)
- **fit_predict** - returns **ndarray**; requires casting on the host after the call (to_numpy)
- **predict** - returns **ndarray**; requires casting on the host after the call (to_numpy)
- **fit_transform** - returns **ndarray**; requires casting on the host after the call (to_numpy)
- **transform** - typically returns **ndarray**; requires casting on the host after the call (to_numpy)

Scikit-learn routines that potentially return ndarray objects, or that expect ndarray objects as parameters, need their data cast between NumPy and dpctl.tensor.

To cast data going to and coming from one of these routines:
- use dpctl.tensor.from_numpy() to convert from NumPy to a dpctl tensor
- use dpctl.tensor.to_numpy() to convert from a dpctl tensor to NumPy

Example: After a call to fit_predict:
- **catch_device** = estimator.fit_predict(**x_device**, **y_device**)
- **predictedHost** = dpctl.tensor.to_numpy(**catch_device**)
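The round trip described above can be wrapped in two small helpers. This is only a sketch: `to_device` and `to_host` are hypothetical names, the dpctl import is guarded, and without dpctl both helpers simply pass the data through unchanged.

```python
# Guarded import: without dpctl everything stays on the host.
try:
    import dpctl.tensor as dpt
except ImportError:
    dpt = None


def to_device(array, device="gpu"):
    """Copy a host array into device memory; pass-through when dpctl is missing."""
    if dpt is None:
        return array
    return dpt.asarray(array, usm_type="device", device=device)


def to_host(result):
    """Convert a dpctl tensor back to NumPy; leave host objects untouched."""
    if dpt is not None and isinstance(result, dpt.usm_ndarray):
        return dpt.to_numpy(result)
    return result


# Host data needs no conversion, so it comes back as-is.
print(to_host([1, 2, 3]))
```

With an estimator in hand, the pattern would be `to_host(estimator.fit_predict(to_device(x), to_device(y)))`, which keeps the conversions in one place instead of scattered through the harness.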


# Test Harness

Below we create a test harness to compare specific algorithm/parameter choices against specific synthetic data generation choices.

For measurements, we made a function, "gallery", in which data for each algorithm is generated, an estimator for the algorithm is created, and then the aggregated training and prediction times are computed.

The gallery function pairs each algorithm with a synthesized dataset, to give you a sense of how fast these algorithms can be when applied to some datasets. Not all datasets will show the same speedups, but these are likely typical for the size, shape, and complexity of the generated data.
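One way to collect the fit and predict times mentioned above is sketched here with `time.perf_counter`. `MeanPredictor` is a hypothetical stand-in estimator used only so the snippet runs anywhere, and `timed_fit_predict` is an assumed helper name, not part of the lab code.

```python
import time


class MeanPredictor:
    """Hypothetical stand-in estimator: always predicts the mean training target."""

    def fit(self, x, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        return [self.mean_] * len(x)


def timed_fit_predict(estimator, x_train, y_train, x_test):
    """Return (predictions, fit_seconds, predict_seconds) for one estimator."""
    start = time.perf_counter()
    estimator.fit(x_train, y_train)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = estimator.predict(x_test)
    predict_seconds = time.perf_counter() - start
    return predictions, fit_seconds, predict_seconds


predictions, fit_s, predict_s = timed_fit_predict(
    MeanPredictor(), [[0], [1], [2]], [1.0, 2.0, 3.0], [[3], [4]]
)
print(predictions, f"fit={fit_s:.6f}s", f"predict={predict_s:.6f}s")
```

Storing `fit_s` and `predict_s` in dictionaries keyed by case name would give the per-algorithm totals; on a device, any host/device copies inside the timed region count toward the measurement.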


# The Gallery Function

The code below tests which sklearn functions are supported by the given GPU and version of Intel Extension for Scikit-learn. Despite the documentation, only a handful of algorithms are supported on DevCloud.

Keep these pass/fail results in mind as you complete the practicums; you should not blindly submit all algorithms to the GPU.
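To keep the pass/fail results in a form you can consult later, a harness can record the outcome of each case rather than only printing it. The sketch below uses hypothetical stand-in cases (`supported_case`, `unsupported_case`) in place of real estimators.

```python
def run_cases(cases):
    """Run each zero-argument callable and record 'pass' or the failure message."""
    results = {}
    for name, run in cases.items():
        try:
            run()
            results[name] = "pass"
        except Exception as exc:  # any failure means "not supported as configured"
            results[name] = f"fail: {exc}"
    return results


def supported_case():
    return 42


def unsupported_case():
    raise RuntimeError("not supported as configured")


results = run_cases({"supported": supported_case, "unsupported": unsupported_case})
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

In the lab, each callable would wrap the fit and predict calls for one algorithm/dataset pair.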

## EXERCISE:

- Follow the inserted comments to insert or modify code in the next few cells.

In [3]:
%%writefile lab/gallery_gpu.py
from tqdm import tqdm
from sklearn.model_selection import train_test_split

############################# Import dpctl ######################
import dpctl
print(dpctl.__version__)
##################################################################

#########  apply patch here  prior to import of desired scikit-learn #######
from sklearnex import patch_sklearn
patch_sklearn()
############################################################################

def gallery(cases):
    elapsed_fit = {}  # dictionary to track the time elapsed for the fit method
    elapsed_predict = {}  # dictionary to track the time elapsed for the predict/transform method 
    # the parameters for each algorithm and for generating its data are defined in get_cases() below
    for name, case in tqdm(cases.items()):
        print("\nname: ", name)
        algorithm = case['algorithm']
        try:
            estimator = algorithm['estimator'](**algorithm['params'])
            data = case['data']
            x, y = data['generator'](**data['params'])
            # astype returns a new array, so the result must be assigned back
            x = x.astype(float)
            y = y.astype(float)
            ###################  Add code to get_devices, select_gpu_device  ########
            # Initialize before the loop so the names exist even if no device is found
            gpu_available = False
            gpu_device = None
            cpu_device = None
            for d in dpctl.get_devices():
                if d.is_gpu:
                    gpu_device = dpctl.select_gpu_device()
                    gpu_available = True
                elif d.is_cpu:
                    cpu_device = dpctl.select_cpu_device()
            if gpu_available:
                print("GPU targeted: ", gpu_device)
            else:
                print("CPU targeted: ", cpu_device)
            ######################################################################################

            # Is this computed on GPU or on Host? Remember compute follows data
            x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=72)
            
            ############### Add code to convert x & y to dpctl.tensors x_device, y_device #########
            if gpu_available:
                ################## add code to cast from NumPy to dpctl tensors #####################
                # target a remote host GPU when submitted via q.sh or qsub -I
                x_train_device = dpctl.tensor.asarray(x_train, usm_type = 'device', device = "gpu")
                y_train_device = dpctl.tensor.asarray(y_train, usm_type = 'device', device = "gpu")
                x_test_device = dpctl.tensor.asarray(x_test, usm_type = 'device', device = "gpu")
                y_test_device = dpctl.tensor.asarray(y_test, usm_type = 'device', device = "gpu")
                ######################################################################################
            else:
                ################## add code to cast from NumPy to dpctl tensors for host CPU ########
                # target a remote host CPU when submitted via q.sh or qsub -I
                x_train_device = dpctl.tensor.asarray(x_train, usm_type = 'device', device = "cpu")
                y_train_device = dpctl.tensor.asarray(y_train, usm_type = 'device', device = "cpu")
                x_test_device = dpctl.tensor.asarray(x_test, usm_type = 'device', device = "cpu")
                y_test_device = dpctl.tensor.asarray(y_test, usm_type = 'device', device = "cpu")
            ######################################################################################

            
            
            if hasattr(estimator, 'fit_predict'):
                ###################### Modify code to fit  x_device, y_device ####################
                estimator.fit(x_train_device, y_train_device)
                ##################################################################################
                
                print("fit_predict section", name," fit")
                
                ###################### Modify code to predict  x_device, y_device ####################
                catch_device = estimator.fit_predict(x_train_device, y_train_device)
                ######################################################################################
                
                print("fit_predict section", name," fit_predict")   
                
                #######################################################################################
                ##### Since we will use the prediction to score accuracy metrics, we need to cast it ##
                cast_catch_device = dpctl.tensor.to_numpy(catch_device)
                #######################################################################################
                
                # print("fit_predict section dpctl.tensor.to_numpy", name)
                ########use the correct version of the two lines below (comment one out) ##########
                ####  print(predictedHost) 
                print(cast_catch_device)
                ###################################################################################
                
            elif hasattr(estimator, 'predict'):
                estimator.fit(x_train_device, y_train_device)
                print("predict section", name, " fit")
                catch_device = estimator.predict(x_test_device)
                print("predict section", name, " predict")
                
                ################# Add cast to move the returned result from dpctl.tensor to NumPy,
                ################# put the result in the variable cast_catch_device,
                ################# then print its first few entries
                cast_catch_device = dpctl.tensor.to_numpy(catch_device)
                print('len(cast_catch_device)',len(cast_catch_device))
                ###############################################################################
                if len(cast_catch_device) > 5:
                    print(cast_catch_device[:5])
                else:
                    print(cast_catch_device)
                
                             
        except Exception as e:
            print('A problem has occurred from the Problematic code:\n', e)
            print("Not Supported as Configured\n\n")
        

def get_cases():
    return {
    'Logistic Regression': {
        "algorithm": {
            'estimator': sklearn.linear_model.LogisticRegression,
            'params': {
                'random_state': 43,
                'max_iter': 300,
                'penalty': 'l2'
            }
        },
        "data": {
            'generator': sklearn.datasets.make_classification,
            'params':
            {
                'n_samples': 10000,
                'n_features': 40,
                'n_classes': 3,
                'n_informative': 5,
                'random_state': 43,
            }
        }
    },
    'KNN Classifier': {
        "algorithm": {
            'estimator': sklearn.neighbors.KNeighborsClassifier,
            'params': {
                'n_jobs': -1,
            }
        },
        "data": {
            'generator': sklearn.datasets.make_classification,
            'params':
            {
                'n_samples': 3500,
                'n_features': 30,
                'n_classes': 3,
                'n_informative': 3,
                'random_state': 43,
            }
        }
    },
    'KNN Regression': {
        "algorithm": {
            'estimator': sklearn.neighbors.KNeighborsRegressor,
            'params': {
                'n_neighbors': 10,
                'n_jobs': -1,
            }
        },
        "data": {
            'generator': sklearn.datasets.make_regression,
            'params':
            {
                'n_samples': 3500,
                'n_features': 30,
                'n_targets': 1,
                'random_state': 43,
            }
        }
    },
    'Linear Regression': {
        "algorithm": {
            'estimator': sklearn.linear_model.LinearRegression,
            'params': {
                'n_jobs': -1,
            }
        },
        "data": {
            'generator': sklearn.datasets.make_regression,
            'params':
            {
                'n_samples': 3000,
                'n_features': 100,
                'n_targets': 1,  
                'random_state': 43,
            }
        }
    },     
    'dbscan': {
            "algorithm": {
            'estimator': sklearn.cluster.DBSCAN,
            'params': {
                'eps': 10,
                'min_samples' :2
            }
        },
        "data": {
            'generator': sklearn.datasets.make_blobs,
            'params':
            {
                'n_samples': 3000,  
                'n_features': 30,
                'centers': 8,
                'random_state': 43,
            }
        }
    },
    'k_means_random': {
            "algorithm": {
            'estimator': sklearn.cluster.KMeans,
            'params': {
                'n_clusters': 3,
                'random_state' :0, 
                'init' : 'random',                
            }
        },
        "data": {
            'generator': sklearn.datasets.make_blobs,
            'params':
            {
                'n_samples': 3000,  
                'n_features': 30,
                'centers': 8,
                'random_state': 43,
            }
        }
    },          
}
from sklearn import metrics
from sklearnex import patch_sklearn
patch_sklearn()  # patch before importing the sklearn submodules so the Intel-optimized versions are used
import sklearn.svm, sklearn.datasets, sklearn.neighbors, sklearn.linear_model, sklearn.decomposition, sklearn.cluster
cases = get_cases()  # build the algorithm/dataset pairs
gallery(cases)  # run every case and report whether it is supported as configured
print('All Tests Good\n')

Overwriting lab/gallery_gpu.py
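
The device-selection step in the script above can also be factored into a small function that is easy to test without hardware. This is a sketch under stated assumptions: `pick_device` is a hypothetical helper, and the fake devices below only mimic the `is_gpu` and `is_cpu` attributes that dpctl device objects expose.

```python
from types import SimpleNamespace


def pick_device(devices):
    """Return ('gpu', dev) for the first GPU, else ('cpu', dev), else (None, None)."""
    first_cpu = None
    for dev in devices:
        if getattr(dev, "is_gpu", False):
            return "gpu", dev
        if first_cpu is None and getattr(dev, "is_cpu", False):
            first_cpu = dev
    if first_cpu is not None:
        return "cpu", first_cpu
    return None, None


# Stand-in devices for illustration only.
fake_devices = [
    SimpleNamespace(name="cpu0", is_cpu=True, is_gpu=False),
    SimpleNamespace(name="gpu0", is_cpu=False, is_gpu=True),
]
kind, dev = pick_device(fake_devices)
print(kind, dev.name)
```

On a machine with dpctl installed, `pick_device(dpctl.get_devices())` would return the kind string to pass as the `device` argument when creating tensors.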


#### Build and Run
Select the cell below and click run ▶ to compile and execute the code:

In [4]:
! chmod 755 q; chmod 755 run_gallery_gpu.sh; if [ -x "$(command -v qsub)" ]; then ./q run_gallery_gpu.sh; else ./run_gallery_gpu.sh; fi

Job has been submitted to Intel(R) DevCloud and will execute soon.

 If you do not see result in 60 seconds, please restart the Jupyter kernel:
 Kernel -> 'Restart Kernel and Clear All Outputs...' and then try again

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
2089030.v-qsvr-1           ...ub-singleuser u177248         00:00:24 R jupyterhub     
2089595.v-qsvr-1           ...allery_gpu.sh u177248                0 Q batch          

Waiting for Output █████████████████████████████████████ Done⬇

########################################################################
#      Date:           Tue 13 Dec 2022 03:38:12 PM PST
#    Job ID:           2089595.v-qsvr-1.aidevcloud
#      User:           u177248
# Resources:           cput=35:00:00,neednodes=1:gen9:ppn=2,nodes=1:gen9:ppn=2,walltime=06:00:00
########################################################################

## u177248 

In [5]:
%%writefile lab/TestPCAonGPU.py
from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.decomposition import PCA
import numpy as np

############################# Import dpctl ######################
import dpctl
print(dpctl.__version__)
##################################################################


x = np.array([[1,1,1],[2,-1,3],[3,2,1]])

###################  Add code to get_devices, select_gpu_device  ########
# Initialize before the loop so the names exist even if no device is found
gpu_available = False
gpu_device = None
cpu_device = None
for d in dpctl.get_devices():
    if d.is_gpu:
        gpu_device = dpctl.select_gpu_device()
        gpu_available = True
    elif d.is_cpu:
        cpu_device = dpctl.select_cpu_device()
if gpu_available:
    print("GPU targeted: ", gpu_device)
else:
    print("CPU targeted: ", cpu_device)

######################################################################################

            
            
############### Add code to convert x to dpctl.tensor x_device #########
# Fall back to the CPU device when no GPU was found above
x_device = dpctl.tensor.asarray(x, usm_type = 'device', device = "gpu" if gpu_available else "cpu")
######################################################################################


pca = PCA(2)  # 2 Principal components please

# replace x with x_device #######################################
est = pca.fit(x_device)
trans = pca.transform(x)  # replace trans with equivalent trans_x on device
trans_x = pca.transform(x_device)
################# Convert trans_x on GPU from dpctl.tensor to trans_host ########
trans_host =  dpctl.tensor.to_numpy(trans_x)
##################################################################

print('components_ ', pca.components_)
print('explained_variance_ ',pca.explained_variance_)

#### Choose the one that works, depending on your device!
# print('transformed x ',trans)
# print('transformed x ',trans_host)
print('PCA All Good\n')

Overwriting lab/TestPCAonGPU.py


In [6]:
! chmod 755 q; chmod 755 run_pca_gpu.sh; if [ -x "$(command -v qsub)" ]; then ./q run_pca_gpu.sh; else ./run_pca_gpu.sh; fi

Job has been submitted to Intel(R) DevCloud and will execute soon.

 If you do not see result in 60 seconds, please restart the Jupyter kernel:
 Kernel -> 'Restart Kernel and Clear All Outputs...' and then try again

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
2089030.v-qsvr-1           ...ub-singleuser u177248         00:00:29 R jupyterhub     
2089596.v-qsvr-1           run_pca_gpu.sh   u177248                0 Q batch          

Waiting for Output █████████████████████ Done⬇

########################################################################
#      Date:           Tue 13 Dec 2022 03:38:49 PM PST
#    Job ID:           2089596.v-qsvr-1.aidevcloud
#      User:           u177248
# Resources:           cput=35:00:00,neednodes=1:gen9:ppn=2,nodes=1:gen9:ppn=2,walltime=06:00:00
########################################################################

## u177248 is compiling AI 

# Conclusions
Only a handful of sklearn algorithms are currently optimized in oneAPI for Intel GPU. It is important to keep your libraries up to date to get the latest versions with the most supported functionality!

Keep this in mind as you convert each practicum.

**Exercise:** Take a moment to enumerate the algorithms that you found to be enabled.

# Notices & Disclaimers 

Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. 
*Other names and brands may be claimed as the property of others.

In [7]:
print("All Done")

All Done
