# Module 01_01: Classifier: targeting CPU and Patching 

![Assets/KNNacceleration.jpg](Assets/KNNacceleration.jpg)
### Use nbconvert  patch_sklearn from command line

# Learning Objectives:

1) Describe how to surgically unpatch specific optimized functions if needed
2) Apply patching to the KNN algorithm
3) Describe the acceleration for the covtype dataset with KNN classification



# *Real World* example Classifier on CovType Dataset

### Compare timings of stock scikit-learn KNN versus the Intel Extension for Scikit-learn classifier using patch_sklearn()

Below we will apply Intel Extension for Scikit-learn to a use case on a CPU.

Intel® Extension for Scikit-learn contains drop-in replacement functionality for the stock scikit-learn package. You can take advantage of the performance optimizations of Intel Extension for Scikit-learn by adding just two lines of code before the usual scikit-learn imports. Intel® Extension for Scikit-learn patching affects performance of specific Scikit-learn functionality.
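
The two lines in question are the import of `patch_sklearn` from the `sklearnex` package and the call to it, both placed before any scikit-learn import. The following is a minimal sketch; the `try`/`except` fallback and the `patched` flag are additions for illustration (so the snippet still runs where the extension is not installed) and are not part of the module's own code.

```python
# Patch first, then import scikit-learn estimators.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()   # swaps supported estimators for their optimized versions
    patched = True
except ImportError:
    # Assumption for this sketch: fall back to stock scikit-learn
    # when scikit-learn-intelex is not installed.
    patched = False

# Imports made after patching pick up the optimized implementation when available.
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=3)
print("patched:", patched, "| estimator module:", type(clf).__module__)
```

Printing the estimator's module is a quick way to check which implementation is actually in use.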

### Data: covtype

We will use the forest cover type dataset, known as covtype, and fetch it from `sklearn.datasets` (the `fetch_covtype` function).


Here we are **predicting forest cover type** from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from US Geological Survey (USGS) and USFS data. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types).

This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes rather than forest management practices.


### Overview of procedure
In the example below, we train a kNN classifier on the covtype dataset with Intel Extension for Scikit-learn patching enabled, run prediction, and record the CPU and wall-clock time for both steps. In the next step, we unpatch Intel Extension for Scikit-learn and measure the time taken on the CPU for the same training and prediction.
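
The procedure can be sketched roughly as follows. This is an illustration under stated assumptions, not the exercise script itself: a small synthetic dataset from `make_classification` stands in for covtype (shaped like it, with 54 features and 7 classes) so the sketch runs quickly and offline, the helper `time_knn` and the flag `HAVE_SKLEARNEX` are hypothetical names, and only wall-clock time is measured.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

try:
    from sklearnex import patch_sklearn, unpatch_sklearn
    HAVE_SKLEARNEX = True
except ImportError:
    # Assumption: without the extension, both runs below use stock scikit-learn.
    HAVE_SKLEARNEX = False


def time_knn(X_train, X_test, y_train, y_test):
    """Fit and predict with KNN; return (fit_seconds, predict_seconds, accuracy)."""
    # Import inside the function so the current patch state is picked up.
    from sklearn.neighbors import KNeighborsClassifier

    clf = KNeighborsClassifier(n_neighbors=3)
    t0 = time.perf_counter()
    clf.fit(X_train, y_train)
    t1 = time.perf_counter()
    y_pred = clf.predict(X_test)
    t2 = time.perf_counter()
    accuracy = float((y_pred == y_test).mean())
    return t1 - t0, t2 - t1, accuracy


# Synthetic stand-in for covtype (the real data has 54 features and 7 cover types).
X, y = make_classification(n_samples=2000, n_features=54, n_informative=10,
                           n_classes=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

results = {}
if HAVE_SKLEARNEX:
    patch_sklearn()
results["patched"] = time_knn(X_train, X_test, y_train, y_test)
if HAVE_SKLEARNEX:
    unpatch_sklearn()
results["unpatched"] = time_knn(X_train, X_test, y_train, y_test)

for name, (fit_s, pred_s, acc) in results.items():
    print(f"{name:>9}: fit {fit_s:.4f}s  predict {pred_s:.4f}s  accuracy {acc:.3f}")
```

Re-importing `KNeighborsClassifier` after each `patch_sklearn()` or `unpatch_sklearn()` call matters: a class imported before the call keeps pointing at the previous implementation.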


### Show patched KNN



# Exercise: Apply patch

From a terminal window, issue the following interactive qsub command to log in to a node with a 4th Generation Intel® Xeon® Scalable Processor:

``` bash
qsub -I  -l nodes=1:spr:ppn=2
source /opt/intel/oneapi/setvars.sh --force
conda activate base
cd MLoneAPI/Machine-Learning-using-oneAPI/01_Intel_Extensions_for_Scikit-learn_Patching_CPU/
python 01_04_Patching_Classifier_XeonScalable4thGen.py 
# Note the speedup!
exit
```

# Read the resulting execution times and plot them

In [None]:
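import csv

# pred_times must already hold [unpatched_time, patched_time] in seconds
# from the timing runs; opening with 'w' overwrites any existing file.
with open('data/compareTimes.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([pred_times[0], pred_times[1]])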

## Plot KNN speed up using patch




In [None]:
# Copyright 2022 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
%matplotlib inline
import matplotlib.pyplot as plt
import csv

with open('data/compareTimes.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    string_times = list(reader)
pred_times = [float(n) for n in string_times[0]]
# The CSV row is ordered [unpatched, patched], matching the tick labels below.
unpatched_time, patched_time = pred_times

left = [1, 2]

tick_label = ['unpatched KNN', 'patched KNN']
plt.figure(figsize=(16, 8))
plt.bar(left, pred_times, tick_label=tick_label, width=0.5, color=['red', 'blue'])
plt.xlabel('Predict Method')
plt.ylabel('time [s]')
plt.title('KNN Predict time [s] - Lower is better')
plt.show()

print('Intel(R) Extension for Scikit-learn* \033[1mKNN acceleration {:4.1f} x!\033[0m'.format(unpatched_time / patched_time))

## Observations:

We observe that with scikit-learn-intelex patching you can:

- Optimize performance with minimal changes (a couple of lines of code).
- Achieve faster execution with 32 optimized sklearn algorithms.
- Achieve the same model quality.

Compare the times and accuracies of these two runs. 

Is the time versus accuracy trade-off worth the effort to patch this function?

Reminder: the list of functions available to patch can be retrieved with `get_patch_names()` from the `sklearnex` package.
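
A minimal sketch of that lookup follows, together with the selective unpatching named in the learning objectives. It assumes `scikit-learn-intelex` may be absent (in which case the list is simply empty), and the variables `names` and `knn_names` are hypothetical. The exact names returned depend on the installed version, so the KNN entries are looked up in the list rather than hard-coded.

```python
try:
    from sklearnex import get_patch_names, patch_sklearn, unpatch_sklearn
except ImportError:
    # Assumption for this sketch: extension not installed, nothing to list.
    get_patch_names = None

names = sorted(get_patch_names()) if get_patch_names else []
print(f"{len(names)} patchable names")
for name in names:
    print(" ", name)

# Selective unpatching: patch everything, then restore stock KNN only.
knn_names = [n for n in names if "kneighborsclassifier" in n.lower()]
if knn_names:
    patch_sklearn()
    unpatch_sklearn(knn_names)   # only these estimators revert to stock
    unpatch_sklearn()            # restore everything, leaving a clean state
```

Passing a list of names to `patch_sklearn` works the same way in the other direction, enabling the optimized version for just those estimators.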


# Summary:

You have:

1) Applied patching to the KNN algorithm
2) Described the acceleration for the covtype dataset
    

# Notices & Disclaimers 

Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. 
*Other names and brands may be claimed as the property of others.

In [None]:
print("All Done")