<center>
<h1>Tensors are all you need</h1>
<br>
<h3>Speed up Inference of your traditional CPU based models by converting them to Tensor based models</h3>
<br>
<a href='https://towardsdatascience.com/speed-up-the-inference-in-traditional-machine-learning-models-by-converting-them-into-tensor-based-efe6bbe5c92d?sk=4b6761f06e81403fb6297cb4d7c66f3c'>
        <img src='https://img.shields.io/badge/Medium-grey?logo=medium'>
    </a>
<a href='https://twitter.com/pandeyparul'>
        <img src='https://img.shields.io/twitter/follow/pandeyparul'>
</a>
</center>



Deep learning frameworks use tensors as their basic computational unit. As a result, they can use hardware accelerators (e.g. GPUs) to speed up model training and inference. Traditional machine learning libraries like scikit-learn, however, were developed to run on CPUs and have no notion of tensors, so they cannot take advantage of GPUs and miss out on the acceleration that deep learning libraries enjoy.

👉 In this notebook, we'll learn about a library called **[Hummingbird](https://github.com/microsoft/hummingbird)**, created to bridge this gap. Hummingbird speeds up inference in traditional machine learning models by converting them to tensor-based models. This lets us run models like scikit-learn's decision trees and random forests on GPUs and take advantage of the hardware's capabilities.

![](https://miro.medium.com/max/700/1*JWT4IwQsoRArNVmIZXJexQ.png)

*Transforming a simple decision tree into neural networks | Reproduced from Hummingbird's [official blog](https://www.microsoft.com/en-us/research/group/gray-systems-lab/articles/announcing-hummingbird-a-library-for-accelerating-inference-with-traditional-machine-learning-models/)*
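
To make the idea of a tensor-based tree concrete, here is a minimal NumPy sketch that evaluates one tiny decision tree using only matrix multiplications and comparisons, in the spirit of the GEMM-style strategy described for Hummingbird. The toy tree, the matrix names `A` to `E`, and the helper `predict_gemm` are illustrative assumptions, not Hummingbird's actual internals, and the sketch uses a strict `<` comparison for simplicity.

```python
import numpy as np

# Hypothetical toy tree over two features:
#   node0: x[0] < 0.5 ? go to node1 : leaf2 (value 30)
#   node1: x[1] < 2.0 ? leaf0 (value 10) : leaf1 (value 20)

# A[f, n] = 1 if internal node n tests feature f
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
# B[n] = threshold of internal node n
B = np.array([0.5, 2.0])
# C[n, l] = 1 if leaf l is in the left subtree of node n,
#          -1 if it is in the right subtree, 0 if node n is not an ancestor
C = np.array([[1.0, 1.0, -1.0],
              [1.0, -1.0, 0.0]])
# D[l] = number of ancestors of leaf l that are reached via a left branch
D = np.array([2.0, 1.0, 0.0])
# E[l] = value predicted by leaf l
E = np.array([[10.0], [20.0], [30.0]])


def predict_gemm(X):
    """Evaluate the toy tree for every row of X with matrix operations only."""
    # 1 where the node's condition holds (go left), 0 otherwise
    decisions = (X @ A < B).astype(float)
    # A leaf is selected when its left/right pattern matches the decisions
    leaf_mask = (decisions @ C == D).astype(float)
    # Pick the value of the selected leaf
    return (leaf_mask @ E).ravel()


X = np.array([[0.2, 1.0],   # left at node0, left at node1  -> leaf0
              [0.2, 3.0],   # left at node0, right at node1 -> leaf1
              [0.9, 1.0]])  # right at node0                -> leaf2
print(predict_gemm(X))  # [10. 20. 30.]
```

Because every step is a dense tensor operation over the whole batch, the same computation can run on any backend that accelerates matrix multiplication, which is the property the conversion relies on.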

> 🗒️ **In case you want to know more, here is an article that goes deeper into the theory behind the library along with other useful resources:** [Speed up Inference for your scikit-learn models](https://towardsdatascience.com/speed-up-the-inference-in-traditional-machine-learning-models-by-converting-them-into-tensor-based-efe6bbe5c92d?sk=4b6761f06e81403fb6297cb4d7c66f3c)

In [None]:
# Install the library. Make sure the GPU accelerator is selected and internet access is turned on.
!pip install hummingbird-ml

### Importing the necessary libraries and functions

In [None]:

import pandas as pd
from sklearn.model_selection import train_test_split

from hummingbird.ml import convert,load

### Importing the dataset

In [None]:

train = pd.read_csv('../input/tabular-playground-series-jul-2021/train.csv')

train.head()

# Training the sklearn model 

In [None]:
# Filtering Columns to be used for training:
columns = ['deg_C','relative_humidity','absolute_humidity','sensor_1','sensor_2','sensor_3','sensor_4','sensor_5']

In [None]:
# Setting the target variable and splitting the dataset. For demo purposes, only one of the target variables is used.
target = 'target_carbon_monoxide'
X_train, X_test, y_train, y_test = train_test_split(train[columns],train[target], test_size=0.20)


In [None]:
# Training the sklearn model with 1000 estimators
from sklearn.ensemble import RandomForestRegressor
num_est=1000

skl_model = RandomForestRegressor(n_estimators=num_est, max_depth=8)
skl_model.fit(X_train, y_train)

In [None]:
# Timing the inference for scikit-learn on CPU only ⏲
skl_time = %timeit -o skl_model.predict(X_test)


# Converting scikit-learn model to PyTorch on CPU 

In [None]:

model_pytorch = convert(skl_model, 'torch')

In [None]:
# Timing the inference for PyTorch on CPU only
pred_cpu = %timeit -o model_pytorch.predict(X_test)
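
Before comparing timings, it can be worth confirming that the converted model still produces the same predictions as the original. The helper below is a small sketch for that: `check_predictions_match` is a hypothetical name, the tolerances are assumptions (converted models may compute in lower precision), and the two lambdas are stand-ins so the block runs on its own.

```python
import numpy as np


def check_predictions_match(reference_predict, converted_predict, X,
                            rtol=1e-4, atol=1e-4):
    """Compare two predict functions on X.

    Raises AssertionError if the outputs differ beyond the tolerances,
    otherwise returns the largest absolute difference.
    """
    expected = np.asarray(reference_predict(X), dtype=np.float64)
    actual = np.asarray(converted_predict(X), dtype=np.float64)
    np.testing.assert_allclose(actual, expected, rtol=rtol, atol=atol)
    return float(np.max(np.abs(actual - expected)))


# Stand-in predictors (assumptions for this demo only): the second one
# mimics a model that computes in float32 instead of float64.
reference = lambda X: X.sum(axis=1)
converted = lambda X: X.astype(np.float32).sum(axis=1)

demo_X = np.array([[1.5, 2.5], [3.0, 4.0]])
max_diff = check_predictions_match(reference, converted, demo_X)
print(max_diff)
```

In this notebook the call would presumably look like `check_predictions_match(skl_model.predict, model_pytorch.predict, X_test)`.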

# Switching PyTorch from CPU to GPU 

In [None]:

%%capture 
model_pytorch.to('cuda')
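
The cell above assumes a CUDA device is present; on a CPU-only session the move to `'cuda'` would fail. A small guard like the sketch below avoids that. `pick_device` is a hypothetical helper, not part of Hummingbird, and it falls back to `'cpu'` when PyTorch is missing or reports no GPU.

```python
def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU is available
        return 'cpu'
    return 'cuda' if torch.cuda.is_available() else 'cpu'


device = pick_device()
print(device)
```

With this in place, the switch could be written as `model_pytorch.to(pick_device())`, so the notebook still runs end to end without a GPU (the "GPU" timing would then simply repeat the CPU measurement).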

In [None]:

pred_gpu = %timeit -o model_pytorch.predict(X_test)

## Comparing the results

In [None]:
def plot(title, skl_time, pred_cpu, pred_gpu):
    """Bar chart of the best %timeit run (in seconds) for each backend."""
    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib import cm

    plt.figure()

    x = ['sklearn', 'pytorch-cpu', 'pytorch-gpu']
    # .best is the fastest per-loop time recorded by %timeit -o, in seconds
    height = [skl_time.best, pred_cpu.best, pred_gpu.best]
    width = 1.0
    plt.ylabel('time in seconds')
    plt.xlabel(title)

    rects = plt.bar(x, height, width, color=cm.rainbow(np.linspace(0, 1, 5)))

    def autolabel(rects):
        # Write each bar's value just above the bar
        for rect in rects:
            bar_height = rect.get_height()
            plt.text(rect.get_x() + rect.get_width() / 2., 1.05 * bar_height,
                     '%.4f' % bar_height,
                     ha='center', va='bottom')

    autolabel(rects)
    plt.show()

In [None]:
chartname = "Random Forest Regressor on CPU and GPU"

plot(chartname, skl_time, pred_cpu, pred_gpu)
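
Alongside the chart, the comparison can be summarized as speedup factors relative to scikit-learn. The sketch below assumes only that each timing object exposes a `.best` attribute, as the `%timeit -o` results used above do; `speedups` is a hypothetical helper, and the numbers in the demo are placeholders chosen to show the arithmetic, not measurements.

```python
from types import SimpleNamespace


def speedups(baseline, **candidates):
    """Return how many times faster each candidate is than the baseline."""
    return {name: baseline.best / result.best
            for name, result in candidates.items()}


# Placeholder timings in seconds (NOT real measurements)
fake_skl = SimpleNamespace(best=2.0)
fake_cpu = SimpleNamespace(best=1.0)
fake_gpu = SimpleNamespace(best=0.5)

summary = speedups(fake_skl, pytorch_cpu=fake_cpu, pytorch_gpu=fake_gpu)
for name, factor in summary.items():
    print(f"{name}: {factor:.1f}x faster than sklearn")
```

With the real results from this notebook, the call would be `speedups(skl_time, pytorch_cpu=pred_cpu, pytorch_gpu=pred_gpu)`.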