# Convert and Optimize DistilRoBERTa with OpenVINO™

Transformers are a family of popular neural network architectures used in Natural Language Processing (NLP). Since their introduction in the paper ["Attention is all you need"](https://arxiv.org/abs/1706.03762), they have transformed the NLP domain and become one of its most widely used architectures.

Bidirectional Encoder Representations from Transformers, more commonly known as [BERT](https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html), is an open-source transformer model that is widely used in NLP by fine-tuning it for particular tasks. What sets BERT apart is that it is deeply bidirectional: when representing a token, it takes into account both the left and the right context of that token.
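
To make "bidirectional" concrete, the sketch below contrasts the attention pattern of a BERT-style encoder, where every token can attend to every other token, with a left-to-right (causal) pattern, where a token only sees itself and the tokens before it. It is a simplified illustration of attention masks using a hypothetical `attention_mask` helper, not BERT's actual implementation.

```python
def attention_mask(seq_len: int, bidirectional: bool) -> list[list[int]]:
    """Return a seq_len x seq_len 0/1 mask; mask[i][j] == 1 means token i may attend to token j."""
    if bidirectional:
        # BERT-style encoder: every token sees the full left and right context
        return [[1] * seq_len for _ in range(seq_len)]
    # Causal (left-to-right) model: token i sees only tokens 0..i
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


tokens = ["I", "love", "this", "show"]
for name, flag in (("bidirectional", True), ("causal", False)):
    print(name)
    for tok, row in zip(tokens, attention_mask(len(tokens), flag)):
        print(f"  {tok:>5}: {row}")
```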

[RoBERTa - Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692), published in 2019, introduced an optimized approach to pre-training BERT-like language models. It uses the same architecture as BERT but modifies key hyperparameters and other aspects of training. Applying [Knowledge Distillation](https://en.wikipedia.org/wiki/Knowledge_distillation) to transformer models led to the widespread adoption of distilled models, which are smaller, faster and more efficient to run on common computers. [DistilRoBERTa](https://huggingface.co/distilroberta-base) and [DistilBERT](https://arxiv.org/abs/1910.01108) are two such models that use knowledge distillation to reduce model size and significantly speed up inference.
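
As a rough illustration of knowledge distillation, the sketch below computes the classic soft-target loss: the student is penalized by the KL divergence between the teacher's and the student's temperature-softened output distributions, scaled by the squared temperature (following Hinton et al.). The logits are made up and the helper names are hypothetical; DistilBERT's actual training objective combines a distillation term like this with other losses.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float], temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature**2


teacher = [2.0, 1.0, 0.1]  # hypothetical teacher logits for 3 classes
print(distillation_loss([2.0, 1.0, 0.1], teacher))  # student matches teacher -> 0.0
print(distillation_loss([0.1, 1.0, 2.0], teacher))  # disagreeing student -> larger loss
```

A higher temperature flattens both distributions, so the student also learns from the relative scores the teacher assigns to the wrong classes.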

Transformer architectures play a vital role in text analysis, and distilled models make it possible to run transformers on devices that lack significant computational power.

Although these models are implemented in different deep learning frameworks such as PyTorch and TensorFlow, they are most commonly used through the [Transformers library](https://huggingface.co/docs/transformers/index) built by [Hugging Face](https://huggingface.co).

This tutorial uses [Emotion Classification in English](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base), a fine-tuned checkpoint of [DistilRoBERTa-base](https://huggingface.co/distilroberta-base). It provides step-by-step instructions on how to convert and optimize the transformer model using the OpenVINO toolkit developed by Intel.

The following image gives a brief overview of the steps followed in the tutorial:
![204-flowchart-convert-optimize](./204-flow.jpg)

The tutorial consists of the following sections:
- [Installations and imports](#Installations-and-imports)
- [Setting up dataset](#Download-and-setup-dataset-for-the-notebook)
- [Validating original model](#Fetch-and-validate-original-model)
- [Converting model to OpenVINO IR](#Converting-transformer-model-to-OpenVINO-Model-(IR))
- [Optimizing the model](#Prepare-and-run-Optimization-(Quantization)-pipeline) (Under Construction)
- [Deleting created directories](#Deleting-created-directories-and-files-(Optional))
- [References](#References)

>**Note:** The code blocks in the Installation and imports section install and import all the libraries required to run the notebook. The commented imports in the code blocks of subsequent sections are provided to show where and how a particular import is being used.

## Installations and imports

In [None]:
!pip install optimum[openvino,nncf]==1.7.3
!pip install datasets==2.10.1

In [None]:
# All imports used in the notebook
import sys
from pathlib import Path

sys.path.append("../utils")
import json
import time
from shutil import rmtree, unpack_archive

import matplotlib.pyplot as plt
import nncf
import numpy as np
import pandas as pd
from notebook_utils import download_file
from optimum.intel import OVModelForSequenceClassification as OVModel
from optimum.intel.openvino import OVConfig, OVQuantizer
from sklearn.metrics import ConfusionMatrixDisplay, classification_report, confusion_matrix
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification as AutoModel
from transformers import AutoTokenizer, pipeline

## Download and setup dataset for the notebook

DistilRoBERTa-base was fine-tuned on multiple datasets for emotion classification, as listed [here](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base#appendix-%F0%9F%93%9A). We will use the EmotionLines dataset ([Paper](https://arxiv.org/abs/1802.08379)), which can be found [here](https://doraemon.iis.sinica.edu.tw/emotionlines/download.html). It contains dialogues from the scripts of the Friends TV series, and each utterance in a dialogue is labelled with one of Ekman's six basic emotions or the neutral emotion.

The following code block downloads and extracts the dataset into the `data` folder. The code block after it reads the dataset from the `friends_test.json` file and stores it in the `test_data` DataFrame. Note that we only keep the `utterance` and `emotion` columns, which hold the text and the corresponding true label, respectively. We will use this dataset throughout the notebook to make comparisons.
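
For reference, the sketch below shows the nesting that the parsing code assumes: `friends_test.json` holds a list of dialogues, each of which is a list of utterance dictionaries. It uses a tiny hypothetical sample (not real EmotionLines content; the `speaker` and `annotation` values are placeholders) and only the standard library, while the notebook itself uses pandas for the same steps.

```python
import json
from itertools import chain

# Hypothetical toy sample mimicking the assumed nesting:
# a list of dialogues, where each dialogue is a list of utterance dicts.
raw = json.loads(
    """
    [
      [{"speaker": "A", "utterance": "Hi!", "emotion": "joy", "annotation": "placeholder"},
       {"speaker": "B", "utterance": "Oh no.", "emotion": "sadness", "annotation": "placeholder"}],
      [{"speaker": "C", "utterance": "Okay.", "emotion": "neutral", "annotation": "placeholder"}]
    ]
    """
)

# Flatten dialogues into one list of utterances (same effect as the extend() loop)
utterances = list(chain.from_iterable(raw))

# Keep only the text and its label, dropping speaker and annotation
rows = [{"utterance": u["utterance"], "emotion": u["emotion"]} for u in utterances]
for row in rows:
    print(row)
```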

>**Note:** Run the code blocks in this section before performing classification in subsequent sections.

In [None]:
# Downloading EmotionLines Dataset and extracting files
# import sys
# from pathlib import Path
# sys.path.append("../utils")
# from shutil import unpack_archive
# from notebook_utils import download_file

DATASET_URL = "https://drive.google.com/uc?export=download&id=1Koxs2pVSmmO_-LWDGx3uUODVHY1yNrTM"
DATA_DIR = Path("data/")
DATA_DIR.mkdir(exist_ok=True)

filepath = download_file(
    DATASET_URL, directory=DATA_DIR, filename="EML_DATA.tar.gz", show_progress=True
)

if not (DATA_DIR / "EmotionLines/Friends").exists():
    unpack_archive(filepath, DATA_DIR)

In [None]:
# Parsing JSON data and reading to DataFrame
# import json
# import pandas as pd

TEST_DATA_PATH = Path("data/EmotionLines/Friends/friends_test.json")

# Reads json into list of list of dicts
with open(TEST_DATA_PATH, "r") as test_file:
    test_jsons = json.load(test_file)

# Flattening data into single list of dicts
test_dictlist = []
for dictlist in test_jsons:
    test_dictlist.extend(dictlist)

# Creating test_data DataFrame
test_data = pd.DataFrame(test_dictlist).drop(["speaker", "annotation"], axis=1)
DATA_SAVE_PATH = Path("data/test_data.csv")
test_data.to_csv(DATA_SAVE_PATH)

# Taking 10 samples as subset
test_subset = test_data.head(10)
test_subset

## Fetch and validate original model
We will use a small subset of `test_data` to check how the model performs inference. This section also describes how to use a transformer model from the [Transformers library](https://huggingface.co/docs/transformers/index) provided by [Hugging Face](https://huggingface.co). The same model will be used in the following sections.

Typical steps to use a pretrained transformer model from the transformers library:
1. Import the model, tokenizer and pipeline classes from the library
2. Load the pretrained model and tokenizer using the model's ID
3. Define a pipeline for classification

The image below shows a basic pipeline used by transformers to perform text analysis:
![204-flowchart-tokenize-predict](./204-predict.jpg)
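
Conceptually, the last stage in the image turns the model's raw output scores (logits) into a label. For a single-label classifier such as this one, the text-classification pipeline applies a softmax and picks the highest-probability class. The sketch below reproduces that step in plain Python with a hypothetical `postprocess` helper; the `ID2LABEL` order is an assumption based on the seven emotions listed on the model card (the real mapping is available as `Model.config.id2label`), and the logits are made up.

```python
import math

# Assumed label order for illustration only; read the real one from Model.config.id2label.
ID2LABEL = {0: "anger", 1: "disgust", 2: "fear", 3: "joy", 4: "neutral", 5: "sadness", 6: "surprise"}


def postprocess(logits: list[float]) -> tuple[str, float]:
    """Map raw logits for one input to (label, probability) via softmax and argmax."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the highest probability
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one utterance (made-up numbers, not real model output)
label, score = postprocess([-1.2, -2.0, -0.5, 3.1, 0.8, -0.9, 0.2])
print(label, round(score, 3))
```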

We will use the `infer` function defined below to perform inference. Setting `return_all_scores=True` makes `infer` return the score for each emotion and store the scores in the output DataFrame.
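
The two settings of `return_all_scores` produce differently shaped pipeline outputs, which is why `infer` has two branches. The sketch below mirrors that post-processing on hand-written outputs; the sample dictionaries and the `summarize` helper are hypothetical and only follow the structure that `infer` indexes into (`[0]` followed by `label`/`score` keys).

```python
# Hypothetical single-input pipeline outputs in the two shapes infer() indexes into;
# the scores are made up for illustration.
top_only = [{"label": "joy", "score": 0.91}]
all_scores = [[
    {"label": "anger", "score": 0.01},
    {"label": "joy", "score": 0.91},
    {"label": "neutral", "score": 0.08},
]]


def summarize(prediction: list, return_all_scores: bool) -> tuple:
    """Return (predicted_label, score_or_scores) the way infer() post-processes one prediction."""
    if return_all_scores:
        # One dict per emotion: keep them all and pick the highest-scoring label
        per_label = {d["label"]: d["score"] for d in prediction[0]}
        return max(per_label, key=per_label.get), per_label
    # Only the top prediction is present
    return prediction[0]["label"], prediction[0]["score"]


print(summarize(top_only, return_all_scores=False))
print(summarize(all_scores, return_all_scores=True))
```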

In [None]:
# Defining infer function and importing tokenizer, pipeline
# import numpy as np
# import pandas as pd
# from transformers import AutoTokenizer, pipeline


def infer(
    model,
    tokenizer: AutoTokenizer,
    test_dataset: pd.DataFrame,
    inp_text_col: str,
    pred_col: str = "predicted_emotion",
    return_all_scores: bool = False,
) -> pd.DataFrame:
    """
    Generic inference function that takes a DataFrame containing texts and returns a new DataFrame with the predicted emotion for each text sample, along with the annotated emotion and, optionally, the scores for each emotion.
    Parameters:
        model: Transformers (PyTorch) or Optimum OpenVINO model for sequence classification
        tokenizer (AutoTokenizer): Tokenizer for text
        test_dataset (DataFrame): Dataset for testing
        inp_text_col (str): Column name containing input sequences
        pred_col (str, *optional*, "predicted_emotion"): Column to store predictions in
        return_all_scores (bool, *optional*, False): Return scores for each emotion
    Returns:
        test_predicted (DataFrame): predictions and true labels alongside inputs
    """

    if test_dataset.empty:
        raise ValueError("Empty DataFrame provided at input")

    if inp_text_col not in test_dataset.columns:
        raise KeyError(f"Invalid column name provided - {inp_text_col}")

    test_predicted = test_dataset.copy()
    predicted_emotion = []
    emotion_score = []
    classifier_pipeline = pipeline(
        "text-classification",
        model=model,
        tokenizer=tokenizer,
        return_all_scores=return_all_scores,
    )

    try:
        if return_all_scores:
            for text in test_predicted[inp_text_col]:
                # Run the pipeline once; the output holds one score per emotion
                prediction_scores = classifier_pipeline(text)
                prediction_df = pd.DataFrame(prediction_scores[0])
                labels = prediction_df["label"]
                scores = prediction_df["score"]
                # The label with the highest score is the predicted emotion
                predicted_emotion.append(labels.iloc[np.argmax(scores)])
                emotion_score.append(dict(zip(labels, scores)))

            # Align the per-emotion scores with the input rows before concatenating
            emotion_score = pd.DataFrame(emotion_score, index=test_predicted.index)
            test_predicted[pred_col] = predicted_emotion
            test_predicted = pd.concat([test_predicted, emotion_score], axis=1)

        else:
            for text in test_predicted[inp_text_col]:
                prediction = classifier_pipeline(text)[0]
                predicted_emotion.append(prediction["label"])
                emotion_score.append(prediction["score"])
            test_predicted[pred_col] = predicted_emotion
            test_predicted["emotion_score"] = emotion_score

    except Exception as e:
        print(f"Inference failed with exception - {e}")
    return test_predicted

In [None]:
# Loading pretrained model
# from transformers import AutoModelForSequenceClassification as AutoModel

MODEL_ID = "j-hartmann/emotion-english-distilroberta-base"
Model = AutoModel.from_pretrained(MODEL_ID)
Tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

In [None]:
# Performing classification on subset of test_data
model_predictions = infer(
    model=Model,
    tokenizer=Tokenizer,
    test_dataset=test_subset,
    inp_text_col="utterance",
    return_all_scores=False,
)
model_predictions

## Converting transformer model to OpenVINO Model (IR)
Hugging Face, the creators of the Transformers library, also provide the [Optimum library](https://huggingface.co/docs/optimum/index), an extension of Transformers that can export pretrained models, so we don't have to write any conversion code ourselves. Optimum provides an API to directly export transformer models to formats such as ONNX and OpenVINO IR. For more details on OpenVINO IR, refer to this [link](https://docs.openvino.ai/latest/openvino_docs_MO_DG_IR_and_opsets.html).

The last lines of the code block below save the OpenVINO model files and the tokenizer into the `model` folder.

In [None]:
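# Loading pretrained model and exporting it to OpenVINO IR
# from pathlib import Path
# from optimum.intel import OVModelForSequenceClassification as OVModel
# from transformers import AutoTokenizer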

MODEL_ID = "j-hartmann/emotion-english-distilroberta-base"
OVModel = OVModel.from_pretrained(MODEL_ID, export=True)
Tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Saving OpenVINO model files
MODEL_DIR = Path("model/")
MODEL_DIR.mkdir(exist_ok=True)
OVModel.save_pretrained(MODEL_DIR)
Tokenizer.save_pretrained(MODEL_DIR)

In [None]:
# Performing classification on subset of test_data
OVmodel_predictions = infer(
    model=OVModel,
    tokenizer=Tokenizer,
    test_dataset=test_subset,
    inp_text_col="utterance",
    return_all_scores=False,
)
OVmodel_predictions

## Prepare and run Optimization (Quantization) pipeline

After obtaining the OpenVINO IR model in the [previous section](#Converting-transformer-model-to-OpenVINO-Model-(IR)), we can optimize it through quantization. Quantization represents weights and activations with a lower-precision data type (for example, 8-bit integers) instead of a higher-precision one (usually 32-bit floating point). This reduces memory requirements and (theoretically) energy consumption, and speeds up inference through faster matrix multiplications and other arithmetic operations. Quantization can therefore make models considerably more efficient, often with only a small reduction in prediction performance. For a detailed understanding of quantization, refer to this [paper](https://arxiv.org/pdf/1712.05877.pdf).
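
A minimal sketch of asymmetric (affine) 8-bit quantization in the spirit of the paper above: a float range is mapped onto the integers -128..127 with a scale and a zero point, and dequantization multiplies back. The weights and helper names are hypothetical, and the schemes actually used by NNCF and OpenVINO (for example symmetric or per-channel quantization) may differ in detail.

```python
def quantize_params(x_min: float, x_max: float, qmin: int = -128, qmax: int = 127) -> tuple[float, int]:
    """Scale and zero point that map the float range [x_min, x_max] onto the integers [qmin, qmax]."""
    # Extend the range to include 0.0 so that zero is represented exactly
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = round(qmin - x_min / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, qmin: int = -128, qmax: int = 127) -> int:
    """q = clamp(round(x / scale) + zero_point, qmin, qmax)"""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate the original value as scale * (q - zero_point)."""
    return scale * (q - zero_point)


weights = [-0.62, 0.0, 0.13, 0.47, 1.05]  # hypothetical FP32 weights
scale, zp = quantize_params(min(weights), max(weights))
q = [quantize(w, scale, zp) for w in weights]
restored = [dequantize(v, scale, zp) for v in q]
print("scale:", scale, "zero point:", zp)
print("int8:", q)
print("restored:", [round(r, 4) for r in restored])
```

Each value is now stored in one byte instead of four, at the cost of a rounding error of at most about half the scale for values inside the calibrated range.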

There are two ways to quantize the OpenVINO IR model obtained above:
- Using [OpenVINO's API for Quantization](https://docs.openvino.ai/latest/ptq_introduction.html)
- Using [Optimum's API for Transformer Quantization](https://huggingface.co/docs/optimum/intel/optimization_ov)

Since Optimum's API uses NNCF as its backend to perform quantization, we will use Optimum's API to demonstrate quantization.

### Defining testing function and reporting pre-optimization metrics
Before getting started with quantization, we define a testing function that reports metrics and inference time on the entire dataset downloaded [previously](#Download-and-setup-dataset-for-the-notebook). For comparison, this subsection also reports the metrics and inference time of both the original pretrained model and the OpenVINO IR model. To keep things simple, we will report and analyse four [standard classification metrics](https://towardsdatascience.com/comprehensive-guide-on-multiclass-classification-metrics-af94cfb83fbd), Accuracy, Precision, Recall and F1-Score, using [Scikit-learn's metrics module](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics). The metrics are summarized below.

1. `Accuracy`: the ratio of correctly classified samples to the total number of samples in the dataset.
2. `Precision`: how precisely the model predicts a class, i.e., out of all the samples predicted as positive, how many are actually positive.
3. `Recall`: how completely the model finds a class, i.e., out of all the actually positive samples, how many are correctly identified by the model.
4. `F1-Score`: the [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean) of Precision and Recall. Because the harmonic mean is high only when both precision and recall are high, it balances the precision-recall trade-off and penalizes both false positives and false negatives.
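
The four metrics can be computed by hand from lists of true and predicted labels, treating each emotion in turn as the "positive" class. The sketch below uses plain Python, made-up labels and a hypothetical `per_class_metrics` helper; scikit-learn's `classification_report` additionally reports support and averaged scores. Returning 0.0 when a denominator is zero is a convention chosen here.

```python
from collections import Counter


def per_class_metrics(y_true: list, y_pred: list) -> tuple:
    """Accuracy plus per-class precision, recall and F1 computed from label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    predicted = Counter(y_pred)  # TP + FP per class
    actual = Counter(y_true)  # TP + FN per class
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / actual[label] if actual[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


y_true = ["joy", "joy", "neutral", "sadness", "neutral"]
y_pred = ["joy", "neutral", "neutral", "sadness", "joy"]
acc, rep = per_class_metrics(y_true, y_pred)
print(f"accuracy = {acc:.2f}")
for label, m in rep.items():
    print(label, {k: round(v, 2) for k, v in m.items()})
```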

We also define a `clmetrics` function that computes these metrics, and a `tester` function that prints the metrics and reports inference time in an orderly fashion. Passing `plotcm=True` to `tester` also plots the confusion matrix, which can be used for further analysis.
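
A confusion matrix counts how often each true label was predicted as each label. The sketch below builds one in plain Python with hypothetical labels and a hypothetical `confusion_counts` helper, using the scikit-learn convention of true labels as rows and predicted labels as columns; the diagonal holds the correct predictions.

```python
from typing import Optional


def confusion_counts(y_true: list, y_pred: list, labels: Optional[list] = None) -> tuple:
    """Count (true, predicted) pairs; rows are true labels, columns are predicted labels."""
    labels = labels or sorted(set(y_true) | set(y_pred))
    position = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[position[t]][position[p]] += 1
    return labels, matrix


labels, cm = confusion_counts(
    ["joy", "joy", "neutral", "sadness", "neutral"],
    ["joy", "neutral", "neutral", "sadness", "joy"],
)
print(" " * 9 + "".join(f"{name:>9}" for name in labels))
for name, row in zip(labels, cm):
    print(f"{name:>9}" + "".join(f"{count:>9}" for count in row))
```

Off-diagonal cells show which emotions the model confuses, for example `joy` and `neutral` in this toy case.
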
>**Note 1**: Run the code block that defines the `infer` function (see [this](#Fetch-and-validate-original-model) section) before running the block below; otherwise it will fail.

>**Note 2**: Run the code blocks that set up the data (see [this](#Download-and-setup-dataset-for-the-notebook) section) before running the subsequent blocks.

In [None]:
# Defining function to generate classification metrics
# import time
# import matplotlib.pyplot as plt
# import numpy as np
# from sklearn.metrics import ConfusionMatrixDisplay, classification_report, confusion_matrix


def clmetrics(y_true: list, y_pred: list, plotcm: bool = False) -> tuple:
    """
    Generic function to generate four standard classification metrics, viz. Accuracy, Precision, Recall and F1-Score, using scikit-learn by taking true labels and predicted labels as input.
    Parameters:
        y_true (list, array-like): True labels
        y_pred (list, array-like): Predicted labels
        plotcm (bool, *optional*, False): Whether to plot the confusion matrix
    Returns:
        clreport (str): Classification report containing metrics
        cm (np.ndarray): Confusion Matrix
        cmplot (ConfusionMatrixDisplay): Confusion Matrix object to plot the confusion matrix
    """

    # Checking inputs
    if len(y_true) < 1:
        raise IndexError("Empty list received at input - y_true")

    if len(y_pred) < 1:
        raise IndexError("Empty list received at input - y_pred")

    # Defaults returned if metric generation fails
    clreport, cm, cmplot = None, None, None

    # Obtaining metrics using scikit-learn functions
    try:
        # Classification report - Accuracy, Precision, Recall, F1-Score
        clreport = classification_report(y_true, y_pred)

        # Confusion Matrix with explicit, sorted label names so the plot axes are readable
        labels = sorted(set(y_true) | set(y_pred))
        cm = confusion_matrix(y_true, y_pred, labels=labels)
        cmplot = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
        if plotcm:
            cmplot.plot(xticks_rotation="vertical")
            plt.show()

    except Exception as e:
        print(f"Metric generation failed with exception - {e}")
    return clreport, cm, cmplot


# Defining function to record times and report metrics
def tester(
    model,
    tokenizer: AutoTokenizer,
    test_dataset: pd.DataFrame,
    inp_text_col: str,
    true_label_col: str,
    pred_col: str = "predicted_emotion",
    model_name: str = "",
    return_all_scores: bool = False,
    plotcm: bool = False,
    use_compile: bool = True,
) -> tuple:
    """
    Tester function that uses the infer and clmetrics functions to test a model and report inference time and metrics on the provided dataset.
    Parameters:
        model: Model to test. Use use_compile=True for an OpenVINO model (it is compiled before inference) and use_compile=False for a PyTorch model.
        tokenizer (AutoTokenizer): Tokenizer for text
        test_dataset (DataFrame): Dataset for testing
        inp_text_col (str): Column name containing input sequences
        true_label_col (str): Column name containing true labels
        pred_col (str, *optional*, "predicted_emotion"): Column to store predictions in
        model_name (str, *optional*, ""): Model name - printed with the metrics for identification
        return_all_scores (bool, *optional*, False): Return scores for each emotion
        plotcm (bool, *optional*, False): Whether to plot the confusion matrix
        use_compile (bool, *optional*, True): Whether to compile the model before performing inference
    Returns:
        test_predictions (pd.DataFrame): DataFrame containing predictions and scores for inputs
        metrics (tuple): Tuple containing classification report, confusion matrix and confusion matrix plot, in that order
        infer_time (str): Time taken to perform inference over the dataset (H:M:S)
    """
    # Compiling model if needed (only applicable to OpenVINO models)
    if use_compile:
        model.compile()
    additional_prints = "=" * 20  # Just some beautification

    # Inference using infer, timed with a monotonic clock
    print(f"{additional_prints} Model inference started {additional_prints}")
    infer_start = time.perf_counter()
    test_predictions = infer(
        model=model,
        tokenizer=tokenizer,
        test_dataset=test_dataset,
        inp_text_col=inp_text_col,
        pred_col=pred_col,
        return_all_scores=return_all_scores,
    )
    infer_end = time.perf_counter()
    infer_time = time.strftime("%H:%M:%S", time.gmtime(infer_end - infer_start))
    print(f"{additional_prints} Model inference finished in {infer_time} {additional_prints}")

    # Obtaining true labels and predictions
    y_true = test_dataset[true_label_col]
    y_pred = test_predictions[pred_col]

    # Generating and printing metrics using clmetrics
    print(f"{additional_prints} Metrics of {model_name} Model {additional_prints}")
    metrics = clmetrics(y_true=y_true, y_pred=y_pred, plotcm=plotcm)
    clreport = metrics[0]
    print(clreport)
    print(additional_prints * 3)

    return test_predictions, metrics, infer_time

In [None]:
# Generating metrics using test_data for the original pretrained (PyTorch) model
Model_testout = tester(
    model=Model,
    model_name="Pretrained",
    tokenizer=Tokenizer,
    test_dataset=test_data,
    inp_text_col="utterance",
    true_label_col="emotion",
    return_all_scores=False,
    use_compile=False,
)

In [None]:
# Generating metrics using test_data for OpenVINO IR
OVmodel_testout = tester(
    model=OVModel,
    model_name="OpenVINO IR",
    tokenizer=Tokenizer,
    test_dataset=test_data,
    inp_text_col="utterance",
    true_label_col="emotion",
    return_all_scores=False,
)

## Deleting created directories and files (Optional)
The following code block deletes all the directories and files created by this notebook. To run it, uncomment the code first.

>**Note**: Please ensure you haven't stored any important files in the directories created with this notebook. Use the following code block with <u>extreme caution</u>.

In [None]:
# from shutil import rmtree
# rmtree(DATA_DIR)
# rmtree(MODEL_DIR)

## References

\[1\] [Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).](https://arxiv.org/abs/1706.03762)

\[2\] [Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).](https://arxiv.org/abs/1810.04805)

\[3\] [Liu, Yinhan, et al. "RoBERTa: A robustly optimized BERT pretraining approach." arXiv preprint arXiv:1907.11692 (2019).](https://arxiv.org/abs/1907.11692)

\[4\] [Sanh, Victor, et al. "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter." arXiv preprint arXiv:1910.01108 (2019).](https://arxiv.org/abs/1910.01108)

\[5\] [Jochen Hartmann, "Emotion English DistilRoBERTa-base", 2022.](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/)