# Imports

In [1]:
from transformers import DistilBertForSequenceClassification, AutoTokenizer
import openvino.runtime as ov
import warnings
from pathlib import Path
import numpy as np
import time
import torch

## Initializing the model

In [2]:
checkpoint = "distilbert-base-uncased"
model = DistilBertForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path=checkpoint
    )

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier

## Initializing the tokenizer

Text preprocessing is the step that cleans and converts raw text into a form the model can consume. [Tokenization](https://towardsdatascience.com/tokenization-for-natural-language-processing-a179a891bad4) is used in natural language processing to split paragraphs and sentences into smaller units (tokens; for DistilBERT, WordPiece subwords) that can be more easily assigned meaning. Each token is mapped to an integer ID from the tokenizer's vocabulary. The model's embedding layer then maps these IDs to vectors in a space where similar words tend to have similar vectors, which helps the model capture the context of the sentence. We use [AutoTokenizer](https://huggingface.co/docs/transformers/main_classes/tokenizer) from Hugging Face, which loads the pretrained tokenizer that matches the given checkpoint.
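To make the subword idea concrete, here is a minimal sketch of greedy longest-match-first WordPiece splitting in plain Python. It is only an illustration, not the Hugging Face implementation. The vocabulary `VOCAB` is a tiny hypothetical one (the real DistilBERT uncased vocabulary has 30,522 entries). It deliberately lacks the word "wonderful" so that the word has to be split into `wonder` + `##ful`. The real tokenizer also splits on punctuation and adds special tokens, and this sketch skips both.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword tokens."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid split: the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens


def tokenize(text, vocab):
    """Lowercase, split on whitespace, then WordPiece each word (simplified)."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens, [vocab[t] for t in tokens]


# Hypothetical toy vocabulary mapping tokens to IDs
VOCAB = {"[UNK]": 0, "i": 1, "had": 2, "a": 3, "wonder": 4, "##ful": 5, "day": 6}

tokens, ids = tokenize("I had a wonderful day", VOCAB)
print(tokens)  # ['i', 'had', 'a', 'wonder', '##ful', 'day']
print(ids)     # [1, 2, 3, 4, 5, 6]
```

Matching the longest piece first keeps frequent whole words as single tokens. Rare words still decompose into known pieces instead of becoming `[UNK]`.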

In [3]:
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=checkpoint
    )

## Convert to ONNX

**ONNX** is an open format built to represent machine learning models. ONNX defines a common set of operators (the building blocks of machine learning and deep learning models) and a common file format, so that AI developers can use models with a variety of frameworks, tools, runtimes, and compilers. We need to convert our model from PyTorch to ONNX. To do this, we use [torch.onnx.export](https://pytorch.org/docs/stable/onnx.html#example-alexnet-from-pytorch-to-onnx) to [convert the Hugging Face model](https://huggingface.co/blog/convert-transformers-to-onnx#export-with-torchonnx-low-level) to ONNX format.

In [4]:
onnx_model = "distilbert.onnx"  # written to the current working directory
MODEL_DIR = "model/"  # output directory for the OpenVINO IR

# A short sample sentence, used only to trace the model graph during export
dummy_model_input = tokenizer("This is a sample", return_tensors="pt")
torch.onnx.export(
    model,
    # DistilBERT's forward() takes (input_ids, attention_mask) in this order
    tuple(dummy_model_input.values()),
    f=onnx_model,
    input_names=['input_ids', 'attention_mask'],
    output_names=['logits'],
    # Batch size and sequence length may vary; logits are [batch_size, num_labels]
    dynamic_axes={'input_ids': {0: 'batch_size', 1: 'sequence'},
                  'attention_mask': {0: 'batch_size', 1: 'sequence'},
                  'logits': {0: 'batch_size'}},
    do_constant_folding=True,
    opset_version=13,
)



# Model Optimizer

[Model Optimizer](https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) is a cross-platform command-line tool that facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.

In [5]:
optimizer_command = f"mo \
    --input_model {onnx_model} \
    --output_dir {MODEL_DIR} \
    --model_name {checkpoint} \
    --input input_ids,attention_mask \
    --input_shape [1,128],[1,128]"
! $optimizer_command

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	C:\Users\arunimac\OneDrive - Intel Corporation\Documents\Projects\OpenVINO Contrib\225-distilbert-sequence-classification\distilbert.onnx
	- Path for generated IR: 	C:\Users\arunimac\OneDrive - Intel Corporation\Documents\Projects\OpenVINO Contrib\225-distilbert-sequence-classification\model/
	- IR output name: 	distilbert-base-uncased
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	input_ids,attention_mask
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	[1,128],[1,128]
	- Source layout: 	Not specified
	- Target layout: 	Not specified
	- Layout: 	Not specified
	- Mean values: 	Not specified
	- Scale values: 	Not specified
	- Scale factor: 	Not specified
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- User transformations: 	Not specified
	- Reverse input channels: 	False
	- Enable IR generation for fixed input shape: 	False
	- Use the tra

OpenVINO™ Runtime uses the [Infer Request](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Infer_request.html) mechanism, which runs models on different devices either synchronously or asynchronously. The path to the IR model is passed to the OpenVINO API, the model is compiled for a target device, and an inference request is created from the compiled model. The default device is AUTO, which lets OpenVINO select among the available hardware. You can change it to suit your requirements and the hardware you have. You can explore the different inference modes and their usage [here](https://docs.openvino.ai/latest/openvino_docs_Runtime_Inference_Modes_Overview.html).

In [6]:
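# Suppress warnings (e.g. deprecation notices) to keep the output readable
warnings.filterwarnings("ignore")
core = ov.Core()
# Path to the IR generated by Model Optimizer: model/distilbert-base-uncased.xml
ir_model_xml = str((Path(MODEL_DIR) / checkpoint).with_suffix(".xml"))
# "AUTO" lets OpenVINO choose among the available devices
compiled_model = core.compile_model(ir_model_xml, "AUTO")
infer_request = compiled_model.create_infer_request()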

In [7]:
"""
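Defining a numerically stable softmax function to turn
the logits returned by the IR model into class probabilities
"""


def softmax(x):
    # Subtract the maximum along the last axis before exponentiating
    # so that np.exp cannot overflow on large logits
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)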
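Softmax is monotonic, so the index of the largest probability is always the index of the largest logit. Taking `np.argmax` directly on the logits would therefore give the same label. The probabilities are still useful as a confidence score. The small check below uses made-up logits for a hypothetical batch of two sentences to show both properties. It also shows why subtracting the maximum matters.

```python
import numpy as np


def softmax(x):
    # Row-wise softmax; subtracting the row max leaves the result unchanged
    # but keeps np.exp from overflowing on large logits
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)


# Hypothetical logits: 2 sentences x 2 classes (NEGATIVE, POSITIVE)
logits = np.array([[-1.2, 2.3], [0.5, -0.4]])
probs = softmax(logits)

print(probs.sum(axis=-1))  # each row sums to 1
print(np.argmax(probs, axis=-1), np.argmax(logits, axis=-1))  # [1 0] [1 0]

# A naive np.exp(1000.0) overflows to inf; the shifted version stays finite
print(softmax(np.array([[1000.0, 1001.0]])))
```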

# Inference

In [8]:
"""
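Creating a generic inference function
to tokenize the input text and classify it
into 2 classes: Positive or Negative.
"""


def infer(input_text):
    # Pad/truncate to exactly 128 tokens to match the IR's static
    # input shape [1,128] set with Model Optimizer
    encoded = tokenizer(
        input_text,
        padding="max_length",
        max_length=128,
        truncation=True,
        return_tensors="np",
    )
    # Keys (input_ids, attention_mask) match the IR input names
    inputs = dict(encoded)
    label = {0: "NEGATIVE", 1: "POSITIVE"}
    result = infer_request.infer(inputs=inputs)
    # The model has a single output: the logits, shape [1, 2]
    logits = next(iter(result.values()))
    class_index = int(np.argmax(softmax(logits)))
    return label[class_index]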
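The IR was converted with `--input_shape [1,128],[1,128]`, so every request must contain exactly 128 token IDs. This is why `infer` passes `padding="max_length"`, `max_length=128` and `truncation=True`. The sketch below imitates that behaviour in plain Python for a single sentence. `pad_to_fixed_length` is a hypothetical helper, not part of the transformers API. The IDs 101 (`[CLS]`), 102 (`[SEP]`) and 0 (`[PAD]`) are read off the tokenizer output printed in this notebook.

```python
# Special-token IDs as seen in the tokenizer output of this notebook
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_to_fixed_length(token_ids, max_length=128):
    """Hypothetical helper mimicking padding="max_length", truncation=True."""
    body = token_ids[: max_length - 2]  # truncate from the right, leaving room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)               # 1 = real token, 0 = padding
    padding = max_length - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding


# Word-piece IDs for "I had a wonderful day", as in the notebook output
ids, mask = pad_to_fixed_length([1045, 2018, 1037, 6919, 2154])
print(len(ids), ids[:8])  # 128 [101, 1045, 2018, 1037, 6919, 2154, 102, 0]
print(sum(mask))          # 7
```

The attention mask tells the model to ignore the padded positions. Short sentences therefore get the same prediction as they would without padding, while still fitting the static input shape.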

For a single input sentence

In [9]:
input_text = "I had a wonderful day"
start_time = time.perf_counter()
result = infer(input_text)
end_time = time.perf_counter()
total_time = end_time - start_time
print("Label: ", result)
print("Total Time: ", "%.2f" % total_time, " seconds")
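The measurement above times a single call. It mixes tokenization, inference and any first-call overhead, and can vary from run to run. A common way to get a steadier number is to run a few warm-up calls and then report the median of repeated timings. `benchmark` is a hypothetical helper sketched with the standard library only. A stand-in workload is used so that the snippet runs on its own. In the notebook, `infer` and `input_text` could be passed instead.

```python
import statistics
import time


def benchmark(fn, arg, warmup=3, repeats=20):
    """Hypothetical helper: median wall-clock seconds of fn(arg) after warm-up calls."""
    for _ in range(warmup):
        fn(arg)  # early calls may include one-off setup costs
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; e.g. benchmark(infer, input_text) in the notebook
median_s = benchmark(lambda text: sorted(text * 1000), "I had a wonderful day")
print(f"Median latency: {median_s * 1000:.3f} ms")
```

The median is less sensitive than the mean to occasional slow runs caused by other processes.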

{'input_ids': array([[ 101, 1045, 2018, 1037, 6919, 2154,  102,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0]]), 'attention_mask': array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0

Read from a file

In [10]:
start_time = time.perf_counter()
with open("data/sample.txt", "r") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        print("User Input: ", line)
        result = infer(line)
        print("Label: ", result, "\n")
end_time = time.perf_counter()
total_time = end_time - start_time
print("Total Time: ", "%.2f" % total_time, " seconds")

User Input:  The food was horrible.

{'input_ids': array([[ 101, 1996, 2833, 2001, 9202, 1012,  102,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0]]), 'attention_mask': array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,