# Hummingbird

Hummingbird is a library from Microsoft that lets traditional ML models take advantage of the hardware acceleration and optimizations implemented for neural networks, without restructuring the model.

To read more about it, refer to [this article](https://analyticsindiamag.com/guide-to-hummingbird-a-microsofts-library-for-expediting-traditional-machine-learning-models/).

## Practical implementation
The `winequality_red` dataset used in the code is available on Kaggle. The classification task is to label each wine as good (1) if its quality score is above 6.5 and as bad (0) otherwise.

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels tensorflow keras --user -q

In [None]:
#install Hummingbird library
!python -m pip install hummingbird-ml --user -q

In [None]:
#restart the kernel so that the newly installed packages are picked up
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

Import the required libraries

In [None]:
#import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import seaborn as sns
from hummingbird.ml import convert
from sklearn.model_selection import train_test_split

Load the dataset

In [None]:
# load data file
data=pd.read_csv('https://raw.githubusercontent.com/shrikant-temburwar/Wine-Quality-Dataset/master/winequality-red.csv', delimiter=";")

In [None]:
data.head()

In [None]:
#plot the Countplot for the column quality
sns.countplot(x='quality',data=data)

Convert the quality score into a binary label (0 = bad, 1 = good), then separate the data into features (x) and labels (y)

In [None]:
# store the quality dataframe
quality=data['quality']

In [None]:
# scores in (2, 6.5] are labelled 0 (bad) and scores in (6.5, 8] are labelled 1 (good)
data['quality']=pd.cut(data['quality'],bins=(2,6.5,8),labels=[0,1])

In [None]:
#change the datatype of data['quality'] from category to int64
data['quality']=data['quality'].astype('int64')
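To see how the binning behaves, here is a small sketch on made-up scores (the values in `sample_scores` are illustrative and are not rows from the dataset). By default `pd.cut` uses right-closed intervals, so the bins `(2, 6.5]` and `(6.5, 8]` map scores 3 to 6 to label 0 and scores 7 and 8 to label 1.

```python
import pandas as pd

# Illustrative quality scores (hypothetical, not taken from the dataset)
sample_scores = pd.Series([3, 5, 6, 7, 8])

# Same bins and labels as in the cell above: (2, 6.5] -> 0, (6.5, 8] -> 1
sample_labels = pd.cut(sample_scores, bins=(2, 6.5, 8), labels=[0, 1]).astype('int64')

print(sample_labels.tolist())
```

Note that a score outside the range (2, 8] would fall in no bin and become `NaN`, which would make the cast to `int64` fail.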

In [None]:
#Now plot correlation heat map
plt.figure(figsize=(60,30))
sns.heatmap(data.corr(),annot=True,fmt='.2f')
plt.show()

In [None]:
# Separate data into features and labels
x=data.iloc[:,:-1]
y=data.iloc[:,-1]

Perform a train-test split of the data with a train:test ratio of 3:1, i.e. 75% training data and 25% test data

In [None]:
#Split the data into training and testing dataset by taking train_size as 75%
x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.75,random_state=42)
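As a quick sanity check of what `train_size=0.75` and `random_state=42` do, the following sketch runs the same call on synthetic stand-in data (the arrays `X_demo` and `y_demo` are made up for illustration and have nothing to do with the wine dataset).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features (not the wine dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.integers(0, 2, size=100)

# 75% of the rows go to the training set, the remaining 25% to the test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.75, random_state=42
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state reproduces the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, train_size=0.75, random_state=42
)
print(np.array_equal(X_tr, X_tr2))
```

Fixing `random_state` matters here because the later comparison between the scikit-learn model and the converted model should run on the same test rows.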

Instantiate Random Forest classifier

In [None]:
#instantiate Random Forest classifier
model=RandomForestClassifier(n_estimators=300)

Fit the model to the training data

In [None]:
#Fit the model to the training data
model.fit(x_train,y_train)

Predict labels for the test data with the scikit-learn model

In [None]:
#predict labels for the test data without using Hummingbird
#(put %%time on the first line of the cell to measure the run time)
y_pred=model.predict(np.array(x_test))
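The cell above mentions timing but does not measure anything itself. One possible way to measure a call outside of the `%%time` cell magic is a small helper built on `time.perf_counter`; the name `time_call` and its `repeats` parameter are assumptions made for this sketch, not part of the notebook or of Hummingbird.

```python
import time

def time_call(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times; return (best_seconds, last_result)."""
    best = float('inf')
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        # keep the fastest run to reduce noise from other processes
        best = min(best, time.perf_counter() - start)
    return best, result

# Demo on a cheap built-in call
elapsed, total = time_call(sum, range(1000))
print(f"best of 5 runs: {elapsed:.6f} s")
```

With the objects from this notebook it could be used as `elapsed, y_pred = time_call(model.predict, np.array(x_test))`.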


Convert the model into a PyTorch model using the Hummingbird library

In [None]:
#convert the trained model into a PyTorch model using the Hummingbird library
model_torch=convert(model,'pytorch')

Select the device on which the converted model runs

In [None]:
#Calculation of time for applying the DNN framework
#move the converted model to a device: 'cpu' here; use 'cuda' to run on an NVIDIA GPU
model_torch.to('cpu')
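The device string does not have to be hard-coded. A minimal sketch, assuming PyTorch is the backend: the helper `pick_device` below is a hypothetical name introduced here, and it falls back to `'cpu'` when PyTorch is missing or no CUDA device is visible.

```python
def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU path is possible
        return 'cpu'
    return 'cuda' if torch.cuda.is_available() else 'cpu'

device = pick_device()
print(device)
```

In this notebook the result could be passed on as `model_torch.to(pick_device())`.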

Predict labels for the test data with the converted model

In [None]:
#Calculation of time for predicting test labels after using Hummingbird
# %%time
#prediction of labels for test data
y_pred_torch=model_torch.predict(np.array(x_test))
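Before comparing run times, it is worth checking that the converted model predicts the same labels as the original one. The sketch below uses a hypothetical helper `agreement` and made-up label lists; it only assumes that both sets of predictions can be turned into NumPy arrays of the same shape.

```python
import numpy as np

def agreement(a, b):
    """Fraction of positions at which two label arrays are equal."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("label arrays must have the same shape")
    return float(np.mean(a == b))

# Made-up labels for illustration: three of the four positions match
demo_a = [0, 1, 1, 0]
demo_b = [0, 1, 0, 0]
print(agreement(demo_a, demo_b))
```

With the variables from this notebook the check would be `agreement(y_pred, y_pred_torch)`; a value close to 1.0 indicates that the conversion preserved the model's predictions.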