# Hugging Face 🤗 NLP Transformers pipelines with ONNX

![logo](assets/logo.png)

*This project is linked to the Medium blog post: [How to use Hugging Face 🤗 Transformers with ONNX in real world]()*

## Working environment

First, install the required dependencies. An isolated environment is recommended to avoid conflicts with other projects.

You can use any package manager you want. I recommend [`conda`](https://conda.io/).

```bash
conda create -y -n hf-onnx python=3.8
conda activate hf-onnx
```

The project requires Python 3.8 or higher.

All required dependencies are listed in the `requirements.txt` file. To install them, run the following command:


```bash
pip install -r requirements.txt
```

```text
Ignoring colorama: markers 'platform_system == "Windows" and python_full_version >= "3.6.0" and python_version >= "3.6"' don't match your environment
Ignoring pyreadline3: markers 'sys_platform == "win32" and python_version >= "3.8" and (python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0")' don't match your environment
Collecting charset-normalizer==2.0.12
  Using cached charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Collecting flatbuffers==2.0
  Using cached flatbuffers-2.0-py2.py3-none-any.whl (26 kB)
Collecting joblib==1.1.0
  Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Collecting onnxruntime==1.10.0
  Using cached onnxruntime-1.10.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB)
Collecting psutil==5.9.0
  Using cached psutil-5.9.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (283 kB)
Collecting requests==2.27.1
  Using cached requests-2.27.1-py2.py3-none
```

## Export the model to ONNX

This example tackles Named Entity Recognition (NER), so any token-classification model from the Hugging Face Hub will work.

I chose the [`dslim/bert-base-NER`](https://huggingface.co/dslim/bert-base-NER) model because it is a `base`-sized model, which keeps inference time on CPU moderate. The BERT architecture is also a good fit for NER.
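
The pipeline code in this section loads an exported graph from `onnx/model.onnx`. One possible way to produce that file, as a minimal sketch rather than the exact procedure used for this project, is PyTorch's `torch.onnx.export`. The helper names `build_dynamic_axes` and `export_to_onnx`, the dummy sentence, opset 11, and loading the model with `return_dict=False` are my assumptions. The input order follows BERT's `forward` signature, so other architectures may need a different list.

```python
import os
from typing import Dict, List


def build_dynamic_axes(
    input_names: List[str], output_name: str = "logits"
) -> Dict[str, Dict[int, str]]:
    """Mark the batch and sequence dimensions as dynamic for inputs and output."""
    axes = {name: {0: "batch", 1: "sequence"} for name in input_names}
    axes[output_name] = {0: "batch", 1: "sequence"}
    return axes


def export_to_onnx(
    model_name: str = "dslim/bert-base-NER",
    output_path: str = "onnx/model.onnx",
    opset: int = 11,  # assumption: any opset supported by your runtime should do
) -> None:
    """Trace the PyTorch model once and write the ONNX graph to output_path."""
    # Heavy imports are deferred so build_dynamic_axes stays usable without them.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # return_dict=False makes the model return a plain tuple, which traces cleanly.
    model = AutoModelForTokenClassification.from_pretrained(
        model_name, return_dict=False
    )
    model.eval()

    # Assumed positional order of BERT's forward(); the dummy text only drives tracing.
    input_names = ["input_ids", "attention_mask", "token_type_ids"]
    dummy = tokenizer("Steve Jobs founded Apple", return_tensors="pt")

    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    torch.onnx.export(
        model,
        tuple(dummy[name] for name in input_names),
        output_path,
        input_names=input_names,
        output_names=["logits"],
        dynamic_axes=build_dynamic_axes(input_names),
        opset_version=opset,
    )
```

Calling `export_to_onnx()` once should leave `onnx/model.onnx` on disk. The dynamic axes matter because the pipeline feeds sentences of varying length.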

```python
import torch

from onnxruntime import (
    InferenceSession, SessionOptions, GraphOptimizationLevel
)
from transformers import (
    TokenClassificationPipeline, AutoTokenizer, AutoModelForTokenClassification
)
```

```python
# Load the exported ONNX graph once; the pipeline below reuses this session.
options = SessionOptions()
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL
session = InferenceSession(
    "onnx/model.onnx", sess_options=options, providers=["CPUExecutionProvider"]
)
session.disable_fallback()


class OnnxTokenClassificationPipeline(TokenClassificationPipeline):
    """Token classification pipeline whose forward pass runs on ONNX Runtime."""

    def _forward(self, model_inputs):
        """Run the ONNX session in place of the PyTorch model."""
        # These entries are needed for post-processing, not by the model itself.
        special_tokens_mask = model_inputs.pop("special_tokens_mask")
        offset_mapping = model_inputs.pop("offset_mapping", None)
        sentence = model_inputs.pop("sentence")

        # ONNX Runtime expects NumPy arrays rather than torch tensors.
        inputs = {k: v.cpu().detach().numpy() for k, v in model_inputs.items()}
        output_name = session.get_outputs()[0].name

        logits = session.run(output_names=[output_name], input_feed=inputs)[0]

        return {
            "logits": torch.tensor(logits),
            "special_tokens_mask": special_tokens_mask,
            "offset_mapping": offset_mapping,
            "sentence": sentence,
            **model_inputs,
        }

    def preprocess(self, sentence, offset_mapping=None):
        """Tokenize the sentence and keep what post-processing needs."""
        max_length = self.tokenizer.model_max_length
        truncation = bool(max_length and max_length > 0)
        model_inputs = self.tokenizer(
            sentence,
            return_attention_mask=True,
            return_tensors=self.framework,
            truncation=truncation,
            return_special_tokens_mask=True,
            return_offsets_mapping=self.tokenizer.is_fast,
        )
        if offset_mapping:
            model_inputs["offset_mapping"] = offset_mapping

        model_inputs["sentence"] = sentence

        return model_inputs
```
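
`_forward` passes every remaining tokenizer output straight to `session.run`. ONNX Runtime generally expects the feed keys to match the graph's input names, so a tokenizer that emits more or fewer tensors than the exported graph declares can fail at run time. The helper below is a small sketch of a guard for that case. `build_input_feed` is a hypothetical name, not part of this project or of ONNX Runtime.

```python
from typing import Any, Dict, Iterable


def build_input_feed(
    model_inputs: Dict[str, Any], expected_names: Iterable[str]
) -> Dict[str, Any]:
    """Keep only the entries the ONNX graph declares, and fail early if one is missing."""
    expected = list(expected_names)
    missing = [name for name in expected if name not in model_inputs]
    if missing:
        raise KeyError(f"Tokenizer output lacks inputs the graph expects: {missing}")
    return {name: model_inputs[name] for name in expected}


# Possible use inside _forward (the session comes from the code above):
#   expected = [node.name for node in session.get_inputs()]
#   inputs = build_input_feed(numpy_inputs, expected)
```

This keeps the pipeline class usable with checkpoints whose tokenizers return extra fields, at the cost of one extra lookup per call.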


```python
# model_name_from_hub = "Jean-Baptiste/roberta-large-ner-english"
model_name_from_hub = "dslim/bert-base-NER"
tokenizer = AutoTokenizer.from_pretrained(model_name_from_hub)
# The PyTorch model is still passed to the pipeline, which reads its config
# (for example the label map); inference itself goes through the ONNX session.
model = AutoModelForTokenClassification.from_pretrained(model_name_from_hub)

ner_pipeline = OnnxTokenClassificationPipeline(
    task="ner",
    model=model,
    tokenizer=tokenizer,
    framework="pt",
    aggregation_strategy="simple",
)
```
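
The ONNX session only returns raw logits. The inherited post-processing is what turns them into the scores and labels the pipeline reports. As far as I understand it, that step is a softmax over the label dimension followed by an argmax per token. The dependency-free sketch below shows the idea for a single token. The function names and the three-label map are illustrative assumptions, and the real pipeline works on whole arrays.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_label(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the most probable label for one token and its score."""
    scores = softmax(logits)
    idx = max(range(len(scores)), key=scores.__getitem__)
    return id2label[idx], scores[idx]


# Hypothetical three-label map; a real model exposes its own in model.config.id2label.
label, score = best_label([0.1, 3.0, -1.0], {0: "O", 1: "B-PER", 2: "I-PER"})
```

With `aggregation_strategy="simple"`, the pipeline then merges adjacent tokens of the same entity type into a single group.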

```python
sequence = "Apple was founded in 1976 by Steve Jobs, Steve Wozniak and Ronald Wayne to develop and sell Wozniak's Apple I personal computer"

ner_pipeline(sequence)
```

```text
[{'entity_group': 'ORG',
  'score': 0.9978969,
  'word': 'Apple',
  'start': 0,
  'end': 5},
 {'entity_group': 'PER',
  'score': 0.9981243,
  'word': 'Steve Jobs',
  'start': 29,
  'end': 39},
 {'entity_group': 'PER',
  'score': 0.9741297,
  'word': 'Steve Wozniak',
  'start': 41,
  'end': 54},
 {'entity_group': 'PER',
  'score': 0.99970996,
  'word': 'Ronald Wayne',
  'start': 59,
  'end': 71},
 {'entity_group': 'PER',
  'score': 0.86664414,
  'word': 'Wozniak',
  'start': 92,
  'end': 99},
 {'entity_group': 'MISC',
  'score': 0.99852806,
  'word': 'Apple I',
  'start': 102,
  'end': 109}]
```
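
Each entity carries `start` and `end` character offsets into the input, so the result can be checked against the original sentence and regrouped by type. The sketch below does both on the output above, with scores omitted for brevity. `spans_match` and `group_by_type` are hypothetical helper names. The slice check holds for this example, but I would not assume it for every tokenizer (an uncased one, for instance, may return a lowercased `word`).

```python
from collections import defaultdict
from typing import Dict, List

sequence = (
    "Apple was founded in 1976 by Steve Jobs, Steve Wozniak and Ronald Wayne "
    "to develop and sell Wozniak's Apple I personal computer"
)

# Entities copied from the pipeline output shown above.
entities = [
    {"entity_group": "ORG", "word": "Apple", "start": 0, "end": 5},
    {"entity_group": "PER", "word": "Steve Jobs", "start": 29, "end": 39},
    {"entity_group": "PER", "word": "Steve Wozniak", "start": 41, "end": 54},
    {"entity_group": "PER", "word": "Ronald Wayne", "start": 59, "end": 71},
    {"entity_group": "PER", "word": "Wozniak", "start": 92, "end": 99},
    {"entity_group": "MISC", "word": "Apple I", "start": 102, "end": 109},
]


def spans_match(text: str, found: List[dict]) -> bool:
    """Check that every (start, end) slice of the text equals the reported word."""
    return all(text[e["start"]:e["end"]] == e["word"] for e in found)


def group_by_type(found: List[dict]) -> Dict[str, List[str]]:
    """Collect entity words under their entity group, keeping input order."""
    grouped = defaultdict(list)
    for e in found:
        grouped[e["entity_group"]].append(e["word"])
    return dict(grouped)


grouped = group_by_type(entities)
```

Grouping like this is handy when only one entity type matters downstream, for example `grouped["PER"]` for the people mentioned.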