# Convert PyTorch -> TensorFlow Lite

We wanted to run PyTorch models on a Raspberry Pi, on both the *CPU* and a *TPU*, for benchmarking. We first tried the `torch_xla` library, but we could not get through its installation.

Instead, we will convert the PyTorch model to *TFLite*, then use the *TPU* through *TFLite* on the Raspberry Pi.

We will use the code from this blog post: [Convert Pytorch model to tf-lite with onnx-tf](https://medium.com/@zergtant/convert-pytorch-model-to-tf-lite-with-onnx-tf-232a3894657c)

In [1]:
from model_frameworks.grammar import GrammarModel

import onnxruntime
import onnx
# from onnx_tf.backend import prepare

import torch



# PyTorch to ONNX

In [2]:
grammarFramework = GrammarModel(quantization="float16")

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Loading prithivida/grammar_error_correcter_v1 [quantized into float16] model...


In [3]:
model = grammarFramework.get_model()

Model: T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=768, out_features=3072, bias=False)
              (wo): Linear(in_features=3072, out_features=768, bias=False)
              (dropou

In [4]:
tokenizer = grammarFramework.get_tokenizer()

In [5]:
input_text = "translate English to French: Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")
inputs

{'input_ids': tensor([[13959,  1566,    12,  2379,    10,  8774,     6,   149,    33,    25,
            58,     1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In [6]:
type(inputs)

transformers.tokenization_utils_base.BatchEncoding

In [7]:
# Define the encoder and decoder inputs for ONNX export
input_ids = inputs["input_ids"]  # Encoder input
attention_mask = inputs["attention_mask"]  # Attention mask

In [8]:
decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]])
decoder_input_ids

tensor([[0]])
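
The single pad token built above is where T5 starts decoding: generation feeds it as the first `decoder_input_ids`, picks the most likely next id from the logits, appends it, and repeats until the end-of-sequence id appears. Below is a minimal, model-agnostic sketch of that greedy loop. `step_fn` is a hypothetical callable that returns the next-token logits for the current decoder ids, and `fake_step` is a toy stand-in, not the real model. The default ids (0 for the start/pad token, 1 for end-of-sequence) are read off the tokenizer outputs shown above.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[List[int]], Sequence[float]],
    start_token_id: int = 0,  # pad token id, used by T5 as the decoder start
    eos_token_id: int = 1,    # id that closes the tokenized input above
    max_new_tokens: int = 32,
) -> List[int]:
    """Grow the decoder ids one token at a time, always taking the argmax."""
    decoder_ids = [start_token_id]
    for _ in range(max_new_tokens):
        logits = step_fn(decoder_ids)
        # Index of the highest-scoring vocabulary entry.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        decoder_ids.append(next_id)
        if next_id == eos_token_id:
            break
    return decoder_ids


def fake_step(decoder_ids: List[int]) -> List[float]:
    """Toy stand-in for a model call: emits ids 5, 7, then EOS (1)."""
    script = [5, 7, 1]
    vocab_size = 10
    target = script[min(len(decoder_ids) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


result = greedy_decode(fake_step)
print(result)  # [0, 5, 7, 1]
```

With a real model, `step_fn` would run one forward pass on the encoder inputs plus the current decoder ids and return the logits of the last position. Without cached key/values, that recomputes the encoder on every step, which is simple but slow.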

In [9]:
# ONNX Export Path
onnx_model_path = "../models/onnx/t5_model.onnx"

In [10]:
torch.onnx.export(
    model,  # The model
    (input_ids, attention_mask, decoder_input_ids),  # Inputs to the model
    onnx_model_path,  # Path where the ONNX file will be saved
    export_params=True,  # Store the trained weights in the model file
    opset_version=13,  # ONNX opset version, use a recent version like 13 or 14
    do_constant_folding=True,  # Optimize constants during export
    input_names=["input_ids", "attention_mask", "decoder_input_ids"],  # Input names
    output_names=["logits"],  # Output names
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},  # Variable batch size and sequence length
        "attention_mask": {0: "batch_size", 1: "sequence_length"},  # Variable batch size and sequence length
        "decoder_input_ids": {0: "batch_size", 1: "target_sequence_length"},  # Decoder sequence length
        "logits": {0: "batch_size", 1: "target_sequence_length"}  # Output dimensions
    }
)

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
  if sequence_length != 1:


## Test ONNX

In [11]:
# Load the ONNX model
ort_session = onnxruntime.InferenceSession(onnx_model_path)

# Prepare inputs
onnx_inputs = {
    "input_ids": input_ids.numpy(),
    "attention_mask": attention_mask.numpy(),
    "decoder_input_ids": decoder_input_ids.numpy(),
}

# Run inference
outputs = ort_session.run(["logits"], onnx_inputs)
print("ONNX inference successful. Output logits shape:", outputs[0].shape)

ONNX inference successful. Output logits shape: (1, 1, 32128)
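
A successful run only shows that the graph executes; it does not show that the exported model agrees with the PyTorch one. One way to check is to compare the two sets of logits element-wise. Here is a minimal sketch using only the standard library. `max_abs_diff`, `logits_match`, and the `atol=1e-2` tolerance are hypothetical choices (float16 weights usually need a looser tolerance than float32), and the sample values are made up for illustration.

```python
from typing import Iterable


def max_abs_diff(reference: Iterable[float], candidate: Iterable[float]) -> float:
    """Largest element-wise absolute difference between two flat sequences."""
    ref = list(reference)
    cand = list(candidate)
    if len(ref) != len(cand):
        raise ValueError(f"length mismatch: {len(ref)} vs {len(cand)}")
    return max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)


def logits_match(reference: Iterable[float], candidate: Iterable[float],
                 atol: float = 1e-2) -> bool:
    """True when every element differs by at most `atol` (assumed tolerance)."""
    return max_abs_diff(reference, candidate) <= atol


# Made-up values standing in for flattened PyTorch and ONNX logits.
torch_logits = [0.10, -2.50, 3.00]
onnx_logits = [0.101, -2.498, 3.004]
print(logits_match(torch_logits, onnx_logits))
```

With the objects from the cells above, the two inputs could be obtained with something like `model(input_ids=input_ids, attention_mask=attention_mask, decoder_input_ids=decoder_input_ids).logits.flatten().tolist()` (under `torch.no_grad()`) and `outputs[0].flatten().tolist()`.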


In [None]:
# Load the ONNX model
onnx_model = onnx.load(onnx_model_path)

# Check that the ONNX model is well-formed
onnx.checker.check_model(onnx_model)

# Print a human-readable representation of the graph
print(onnx.helper.printable_graph(onnx_model.graph))


# ONNX to TensorFlow

Run the following in a shell to install the library needed for this conversion. We recommend cloning it into the project's parent folder:

```shell
cd ..
git clone https://github.com/MPolaris/onnx2tflite.git
cd teacher-correction-assistant
pipenv run python ../onnx2tflite/setup.py install # If you use a pipenv environment
python ../onnx2tflite/setup.py install # Or this instead, if you use a venv environment
```

In [None]:
from onnx_tf.backend import prepare

tf_rep = prepare(onnx_model)
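
From here, the `onnx_tf` route described in the linked blog post goes through a TensorFlow SavedModel and then the TFLite converter. The sketch below wraps those steps in one function. It has not been run against this T5 export, so whether every operator converts is an open question. The function name and the paths in the usage comment are hypothetical, and enabling `SELECT_TF_OPS` is an assumption made to give unsupported operators a fallback.

```python
from pathlib import Path


def onnx_to_tflite(onnx_path: str, saved_model_dir: str, tflite_path: str) -> int:
    """ONNX -> TensorFlow SavedModel -> TFLite. Returns the number of bytes written."""
    # Imported lazily so the heavy dependencies are only needed at call time.
    import onnx
    import tensorflow as tf
    from onnx_tf.backend import prepare

    # ONNX graph -> TensorFlow representation -> SavedModel on disk.
    tf_rep = prepare(onnx.load(onnx_path))
    tf_rep.export_graph(saved_model_dir)

    # SavedModel -> TFLite flatbuffer.
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,  # native TFLite kernels
        tf.lite.OpsSet.SELECT_TF_OPS,    # assumed fallback to TensorFlow kernels
    ]
    tflite_model = converter.convert()
    return Path(tflite_path).write_bytes(tflite_model)


# Hypothetical usage (paths are placeholders, not from the project):
# onnx_to_tflite("../models/onnx/t5_model.onnx",
#                "../models/tf/t5_model",
#                "../models/tflite/t5_model.tflite")
```

If the target TPU is a Coral Edge TPU, the resulting file would additionally need full integer quantization and a pass through the Edge TPU compiler before it can run on the accelerator; a float model like this one would run on the CPU only.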