# Module 05_02: KNN: targeting GPU and Patching 

Here you will observe that the accuracy is not materially impacted but the speedups are very large. You will generate results such as these in this module:

![Assets/GPUTargeted.png](Assets/GPUTargeted.png)
### Use nbconvert  patch_sklearn from command line

# Learning Objectives:

1) Describe how to apply dpctl compute-follows-data in conjunction with patching
1) Apply patching to the KNN algorithm on the covtype dataset


## Examples dpctl version 11:
- **x_device** = dpctl.tensor.from_numpy(**x**, usm_type = 'device', queue=dpctl.SyclQueue(gpu_device))
- **y_device** = dpctl.tensor.from_numpy(**y**, usm_type = 'device', queue=dpctl.SyclQueue(gpu_device))

## Examples dpctl version 12:
- **x_device** = dpctl.tensor.from_numpy(**x**, usm_type = 'device', device = dpctl.SyclDevice("gpu"))
- **y_device** = dpctl.tensor.from_numpy(**y**, usm_type = 'device', device = dpctl.SyclDevice("gpu"))
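The two call styles above differ only in the keyword that places the data. A minimal sketch of choosing between them from the installed version string is shown here, assuming that "version 11" and "version 12" refer to dpctl releases 0.11.x and 0.12.x; the helper names `dpctl_release` and `placement_keyword` are hypothetical and not part of dpctl.

```python
def dpctl_release(version):
    """Parse a version string such as '0.13.0+8.g921229cca' into (major, minor)."""
    release = version.split("+", 1)[0]  # drop the local build suffix, if any
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def placement_keyword(version):
    """Return the from_numpy keyword shown above for this release (assumption: 0.12 is the cutoff)."""
    return "device" if dpctl_release(version) >= (0, 12) else "queue"


for v in ("0.11.4", "0.13.0+8.g921229cca"):
    print(v, "->", placement_keyword(v))
```

In a real script you would pass `dpctl.__version__` to `placement_keyword` and build the keyword arguments for `dpctl.tensor.from_numpy` accordingly.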


![Assets/DevCloudDpctlDependency.PNG](Assets/DevCloudDpctlDependency.PNG)

# *Real World* example KNN on CovType Dataset

### Compare timings of stock KNN versus Intel Extension for Scikit-learn KNN using patch_sklearn()

Below we will apply Intel Extension for Scikit-learn to a use case on a GPU, falling back to the CPU when no GPU is available.

Intel® Extension for Scikit-learn contains drop-in replacement functionality for the stock scikit-learn package. You can take advantage of the performance optimizations of Intel Extension for Scikit-learn by adding just two lines of code before the usual scikit-learn imports. Intel® Extension for Scikit-learn patching affects performance of specific Scikit-learn functionality.

### Data: covtype

We will use the forest cover type dataset, known as covtype, and fetch the data from sklearn.datasets.


Here we are **predicting forest cover type** from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from US Geological Survey (USGS) and USFS data. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types).

This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes rather than forest management practices.



### Overview of procedure
In the example below, we will train the kNN algorithm and predict with it using Intel Extension for Scikit-learn on the covtype dataset, and measure the CPU and wall-clock time for training and prediction. Then, in the next step, we will unpatch Intel Extension for Scikit-learn and observe the time taken on the CPU for the same training and prediction.
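One way to capture both CPU time and wall-clock time around a training or prediction call is a small context manager. The sketch below uses only the Python standard library; the name `stopwatch` and the placeholder workloads are illustrative assumptions, not part of this module's lab files. Note that `time.process_time()` counts CPU time of the host process only, so work offloaded to a GPU shows up mainly in the wall-clock figure.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Record wall-clock and host CPU seconds for the enclosed block under `label`."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        results[label] = {
            "wall_s": time.perf_counter() - wall_start,
            "cpu_s": time.process_time() - cpu_start,
        }


timings = {}
with stopwatch("fit", timings):
    sum(i * i for i in range(200_000))  # stand-in for knn.fit(x_train, y_train)
with stopwatch("predict", timings):
    sum(i * i for i in range(200_000))  # stand-in for knn.predict(x_test)

for label, t in timings.items():
    print(f"{label}: wall {t['wall_s']:.4f} s, cpu {t['cpu_s']:.4f} s")
```

Running the same timed blocks once with patching and once without gives directly comparable numbers.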

### Fetch the Data


# Regarding when/how to cast to and from dpctl.tensors

This information bears repeating to make sure the concept is clear.

Study the code sections near the conversion to and from dpctl/NumPy.

For all sklearnex algorithms, it is necessary to cast the X and/or y data passed in the parameter list to dpctl tensors so that the GPU can access the data and perform the computation.

Examples:
- **x_device** = dpctl.tensor.from_numpy(**x**, usm_type = 'device', queue=dpctl.SyclQueue(gpu_device))
- **y_device** = dpctl.tensor.from_numpy(**y**, usm_type = 'device', queue=dpctl.SyclQueue(gpu_device))



Pay attention to **return** types from:
- **fit** - in many cases in scikit-learn, fit returns the self object. No need to cast
- **fit_predict** - returns an **ndarray**; requires casting on the host after the call (to_numpy)
- **predict** - returns an **ndarray**; requires casting on the host after the call (to_numpy)
- **fit_transform** - returns an **ndarray**; requires casting on the host after the call (to_numpy)
- **transform** - typically returns an **ndarray**; requires casting on the host after the call (to_numpy)

Scikit-learn routines that return ndarray objects, or that expect ndarray objects as parameters, need their data cast between NumPy and dpctl.tensor.

To cast data going to or coming from one of these routines:
- use dpctl.tensor.from_numpy() to convert from NumPy to a dpctl tensor before the call
- use dpctl.tensor.to_numpy() to convert from a dpctl tensor to NumPy after the call

Example: After a call to fit_predict:
- **catch_device** = estimator.fit_predict(**x_device**, **y_device**)
- **predictedHost** = dpctl.tensor.to_numpy(**catch_device**)
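The round trip described above can be sketched as a pair of small helpers. This is a minimal illustration, not the lab solution: the helper names `to_device` and `to_host` are hypothetical, the `device=` keyword assumes the dpctl 0.12-style signature shown earlier, and the helpers deliberately fall back to plain NumPy when dpctl or a matching device is not available so that the snippet runs anywhere.

```python
import numpy as np

try:
    import dpctl
    import dpctl.tensor as dpt
except ImportError:
    dpctl = None  # dpctl not installed: the helpers below stay on the host
    dpt = None


def to_device(host_array, device_filter="gpu"):
    """Copy a NumPy array into device memory, or return it unchanged on failure."""
    if dpt is None:
        return host_array
    try:
        device = dpctl.SyclDevice(device_filter)
        return dpt.from_numpy(host_array, usm_type="device", device=device)
    except Exception:
        # No matching device, unsupported dtype, or older signature: stay on the host.
        return host_array


def to_host(result):
    """Return a NumPy array whether `result` lives on a device or on the host."""
    if dpt is not None and isinstance(result, dpt.usm_ndarray):
        return dpt.to_numpy(result)
    return np.asarray(result)


x = np.arange(12, dtype=np.float32).reshape(3, 4)
x_device = to_device(x)   # what you would pass to fit / predict
x_back = to_host(x_device)  # what you would do with a predict result
print(type(x_device).__name__, "->", type(x_back).__name__)
```

In the lab script, `to_host` corresponds to the `dpctl.tensor.to_numpy` call applied to the output of `predict` before it is handed to `metrics.classification_report`.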


In [1]:
import dpctl
print(dpctl.__version__)

0.13.0+8.g921229cca


In [2]:
%%writefile lab/compute_KNN_GPU.py
# Copyright 2022 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml
import pandas as pd

############################ import dpctl #################################

import dpctl
print(dpctl.__version__)

############################################################################

#########  apply patch here  prior to import of desired scikit-learn #######
from sklearnex import patch_sklearn
patch_sklearn()
############################################################################


from  sklearn.datasets import fetch_covtype
x, y = fetch_covtype(return_X_y=True)
# Data Set Information:
# Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from US Geological Survey (USGS) and USFS data. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types).
# This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes rather than forest management practices.

# for the sake of time, use only 1/4 of the data
subset = x.shape[0]//4
x = x[:subset,:]
y = y[:subset]

# Is this computed on GPU or on Host? Remember compute follows data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=72)

#########  Get the GPU context and set a flag if a GPU is available   #######
#########  Also get the CPU context so this runs on either device   #######
gpu_available = False
gpu_device = None
cpu_device = None
for d in dpctl.get_devices():
    if d.is_gpu:
        gpu_device = dpctl.select_gpu_device()
        gpu_available = True
    elif d.is_cpu:
        cpu_device = dpctl.select_cpu_device()
if gpu_available:
    print("GPU targeted: ", gpu_device)
else:
    print("CPU targeted: ", cpu_device)
#########################################################################################

if gpu_available:
    ################## cast from NumPy to dpctl tensors on the GPU ###########################
    # target a remote host GPU when submitted via q.sh or qsub -I
    x_train_device = dpctl.tensor.from_numpy(x_train, usm_type = 'device', device = "gpu")
    y_train_device = dpctl.tensor.from_numpy(y_train, usm_type = 'device', device = "gpu")
    x_test_device = dpctl.tensor.from_numpy(x_test, usm_type = 'device', device = "gpu")
    y_test_device = dpctl.tensor.from_numpy(y_test, usm_type = 'device', device = "gpu")
    ##########################################################################################
else:
    ################## cast from NumPy to dpctl tensors on the host CPU ######################
    # target a remote host CPU when submitted via q.sh or qsub -I
    x_train_device = dpctl.tensor.from_numpy(x_train, usm_type = 'device', device = dpctl.SyclDevice("cpu"))
    y_train_device = dpctl.tensor.from_numpy(y_train, usm_type = 'device', device = dpctl.SyclDevice("cpu"))
    x_test_device = dpctl.tensor.from_numpy(x_test, usm_type = 'device', device = dpctl.SyclDevice("cpu"))
    y_test_device = dpctl.tensor.from_numpy(y_test, usm_type = 'device', device = dpctl.SyclDevice("cpu"))
    ##########################################################################################

params = {
    'n_neighbors': 40,  
    'weights': 'distance'
}
print('dataset shape: ', x_train_device.shape)
# import after patch_sklearn() so the patched implementation is picked up
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(**params).fit(x_train_device, y_train_device)

predictedGPU = knn.predict(x_test_device)  # predict on the device that holds the data
predictedCPU = knn.predict(x_test)  # predict on the host CPU from NumPy input

################## cast returned results from dpctl tensors to NumPy ##########################
# only needed for predict, fit_predict, transform, fit_transform, and only if the results are used on the host
# target a remote host GPU when submitted via q.sh or qsub -I
predictedGPUNumpy = dpctl.tensor.to_numpy(predictedGPU)
###############################################################################################

reportGPU = metrics.classification_report(y_test, predictedGPUNumpy)
print(f"Classification report for kNN Fit and Predicted on GPU:\n{reportGPU}\n")

reportCPU = metrics.classification_report(y_test, predictedCPU)
print(f"Classification report for kNN Fit on GPU and Predicted on CPU:\n{reportCPU}\n")

Overwriting lab/compute_KNN_GPU.py


#### Build and Run

An alternative to the q method below 
- launch a DevCloud Terminal and qsub to a GPU enabled device as follows:
- qsub -I -l  nodes=1:gpu:ppn=2
- then run the bash script as follows:
- . run_KNN_dpctl.sh

## Demonstration of speedup without significant loss of accuracy

#### For running in this notebook:
Select the cell below and click run ▶ to compile and execute the code:

In [3]:
! chmod 755 q; chmod 755 run_KNN_dpctl.sh; if [ -x "$(command -v qsub)" ]; then ./q run_KNN_dpctl.sh; else ./run_KNN_dpctl.sh; fi

Job has been submitted to Intel(R) DevCloud and will execute soon.

 If you do not see result in 60 seconds, please restart the Jupyter kernel:
 Kernel -> 'Restart Kernel and Clear All Outputs...' and then try again

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
2008264.v-qsvr-1           ...ub-singleuser u78349          00:44:06 R jupyterhub     
2008286.v-qsvr-1           run_KNN_dpctl.sh u78349                 0 Q batch          

Waiting for Output ██████████████████████████████████████████████████████████████████████████ Done⬇

########################################################################
#      Date:           Thu 13 Oct 2022 02:15:07 PM PDT
#    Job ID:           2008286.v-qsvr-1.aidevcloud
#      User:           u78349
# Resources:           neednodes=1:gen9:ppn=2,nodes=1:gen9:ppn=2,walltime=06:00:00
###############################################################

In order to cancel optimizations, we use unpatch_sklearn and reimport the class KNeighborsClassifier. Observe the classification_report
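The patch, unpatch, and re-import sequence can be sketched as follows. This is a hedged illustration rather than the lab's own code: the helper names `active_knn_module` and `toggle_report` are hypothetical, and the snippet degrades gracefully when scikit-learn or sklearnex is not installed. It reports which module provides `KNeighborsClassifier` at each stage, which is a quick way to confirm whether the optimizations are active.

```python
def active_knn_module():
    """Return the module that currently provides KNeighborsClassifier, or None."""
    try:
        # Re-importing picks up whatever class is currently bound in sklearn.neighbors.
        from sklearn.neighbors import KNeighborsClassifier
    except ImportError:
        return None
    return KNeighborsClassifier.__module__


def toggle_report():
    """Record the providing module before patching, after patching, and after unpatching."""
    report = {"stock": active_knn_module()}
    try:
        from sklearnex import patch_sklearn, unpatch_sklearn
    except ImportError:
        return report  # extension not installed: only the stock entry is available
    patch_sklearn()
    report["patched"] = active_knn_module()
    unpatch_sklearn()
    report["unpatched"] = active_knn_module()
    return report


report = toggle_report()
for stage, module_name in report.items():
    print(f"{stage}: {module_name}")
```

The re-import after `unpatch_sklearn()` matters: a name bound before unpatching still refers to the patched class.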

## Observations:

We observe that with scikit-learn-intelex and compute follows data, we can:

- Easily target training or prediction on GPU
- Easily target training on GPU and prediction on CPU

# Summary:

You have:

1) Applied patching to the KNN algorithm
2) Applied a method to submit KNN fit to an Intel GPU (model on GPU)
3) Applied a method to submit KNN predict to an Intel GPU (model on GPU)
4) Applied a method to submit KNN predict to an Intel CPU (model on CPU)
    

# Notices & Disclaimers 

Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. 
*Other names and brands may be claimed as the property of others.

In [4]:
print("All Done")

All Done
