## Import ONNX T5 models from HuggingFace 🤗 into Spark NLP 🚀

Let's keep in mind a few things before we start 😊

- ONNX support was introduced in `Spark NLP 5.0.0`, enabling high-performance inference for models.
- ONNX support for `T5Transformer` is available only in `Spark NLP 5.2.0` and later, so please make sure you have upgraded to the latest Spark NLP release.
- You can import T5 models via `T5Model`. These models are usually listed under the `Text2Text Generation` category and have `T5` in their labels.
- This is a very computationally expensive module, especially on longer sequences. Using an accelerator such as a GPU is recommended.
- Reference: [T5Model](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model)
- Some [example models](https://huggingface.co/models?other=T5)

## Export and Save HuggingFace model

- Let's install the `transformers` package with the `onnx` extension and its dependencies. You don't need `onnx` installed for Spark NLP; however, we need it to load and save models from HuggingFace.
- We lock `transformers` on version `4.35.2`. This doesn't mean it won't work with future releases.
- We will also need `sentencepiece` for tokenization.

In [1]:
!pip install -q --upgrade transformers[onnx]==4.35.2 optimum sentencepiece

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.13.1 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.2 which is incompatible.
tensorflow 2.13.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.11.0 which is incompatible.

In [2]:
!pip install -q --upgrade --force-reinstall transformers[onnx]==4.35.2 optimum sentencepiece tensorflow
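
To confirm which versions actually ended up installed after the commands above, a quick check along these lines can help. It is a minimal sketch that uses only the standard library; the helper name `installed_versions` is hypothetical, and `spark-nlp` is assumed to be the pip distribution name of Spark NLP (it is installed later in this notebook, so it may show as missing at this point).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages this notebook relies on (distribution names as used by pip).
for name, ver in installed_versions(
    ["transformers", "optimum", "sentencepiece", "spark-nlp"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```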


- HuggingFace has an extension called Optimum that offers specialized model inference, including ONNX. We can use it to import and export ONNX models with `from_pretrained` and `save_pretrained`.
- We'll use the [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) model from HuggingFace as an example.
- In addition to `T5Model`, we also need to save the tokenizer. This is the same for every model; these are the assets needed for tokenization inside Spark NLP.
- If we want to optimize the model, a GPU will be needed. Make sure to select the correct runtime.
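
The `from_pretrained` / `save_pretrained` route mentioned above could look roughly like the sketch below. It assumes Optimum's `ORTModelForSeq2SeqLM` class and its `export=True` argument, which may differ across Optimum versions; the function name `export_t5_to_onnx` and the output path in the example call are hypothetical. The cells that follow use `optimum-cli` instead, and the files written by the two routes are not guaranteed to match.

```python
def export_t5_to_onnx(model_name, export_path):
    """Export a T5 checkpoint to ONNX and save it together with its tokenizer."""
    # Imported lazily so the function can be defined without the packages installed.
    from optimum.onnxruntime import ORTModelForSeq2SeqLM
    from transformers import AutoTokenizer

    # export=True converts the PyTorch checkpoint to ONNX while loading.
    model = ORTModelForSeq2SeqLM.from_pretrained(model_name, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Write the ONNX graphs and the tokenizer files into the same directory.
    model.save_pretrained(export_path)
    tokenizer.save_pretrained(export_path)
    return export_path


# Example call (downloads the model, so it is left commented out):
# export_t5_to_onnx("google/flan-t5-base", "onnx_models/google/flan-t5-base")
```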

![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_T5.ipynb)

In [1]:
import transformers
# Model name, either HF (e.g. "google/flan-t5-base") or a local path
MODEL_NAME = "/data/HW/proj2/best_model"


# Path to store the exported models
EXPORT_PATH = "/data/HW/proj2/exported_model"



In [2]:
# Export the model to ONNX using optimum

# Export with optimizations (uncomment next line)
# !optimum-cli export onnx --task text2text-generation-with-past --model {MODEL_NAME} --optimize O2 {EXPORT_PATH}
# IMPORTANT - there is a bug in onnxruntime which crashes it when trying to optimize a T5 small model (or any derivative of it)
# There are two ways to address the problem:
# 1. Go to onnx_model_bert.py in the onnxruntime module (the full path depends on the module version),
#    find the BertOnnxModel class and comment the following line in the constructor:
#    assert (num_heads == 0 and hidden_size == 0) or (num_heads > 0 and hidden_size % num_heads == 0)
# 2. Disable optimization by removing '--optimize O2' (use line below).

# Export without optimizations
!optimum-cli export onnx --task text2text-generation-with-past --model {MODEL_NAME} {EXPORT_PATH}

2024-05-22 13:10:15.390121: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-05-22 13:10:15.392066: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-22 13:10:15.432109: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/3: T5Sta

Let's have a look inside the export directory and see what we are dealing with:

In [4]:
!ls -l {EXPORT_PATH}

total 810572
-rw-r--r-- 1 root root      1532 May 22 13:10 config.json
-rw-r--r-- 1 root root 232553641 May 22 13:10 decoder_model.onnx
-rw-r--r-- 1 root root 232784326 May 22 13:10 decoder_model_merged.onnx
-rw-r--r-- 1 root root 219953955 May 22 13:10 decoder_with_past_model.onnx
-rw-r--r-- 1 root root 141456353 May 22 13:10 encoder_model.onnx
-rw-r--r-- 1 root root       142 May 22 13:10 generation_config.json
-rw-r--r-- 1 root root      2201 May 22 13:10 special_tokens_map.json
-rw-r--r-- 1 root root    791656 May 22 13:10 spiece.model
-rw-r--r-- 1 root root   2422256 May 22 13:10 tokenizer.json
-rw-r--r-- 1 root root     20771 May 22 13:10 tokenizer_config.json
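
Before moving on, it can be worth checking that the export produced the files we expect. The sketch below uses only the standard library; the helper name `missing_export_files` is hypothetical, and the list of expected files is an assumption taken from the listing above, not an official Spark NLP requirement.

```python
from pathlib import Path

# File names copied from the directory listing above (an assumption, not a spec).
EXPECTED_FILES = [
    "encoder_model.onnx",
    "decoder_model.onnx",
    "decoder_with_past_model.onnx",
    "spiece.model",
]


def missing_export_files(export_path, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from export_path."""
    root = Path(export_path)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_export_files(EXPORT_PATH)` returns an empty list when every expected file is present.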


- As you can see, we need to move the SentencePiece model `spiece.model` from the export directory into an `assets` folder, which is where Spark NLP will look for it.

In [5]:
! mkdir -p {EXPORT_PATH}/assets
! mv -t {EXPORT_PATH}/assets {EXPORT_PATH}/spiece.model

In [6]:
!ls -l {EXPORT_PATH}/assets

total 776
-rw-r--r-- 1 root root 791656 May 22 13:10 spiece.model
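
The `-t` flag of `mv` is a GNU extension and is not available everywhere, so the same move can also be done in Python. This is a minimal sketch using only the standard library; the helper name `move_to_assets` is hypothetical.

```python
import shutil
from pathlib import Path


def move_to_assets(export_path, filename="spiece.model"):
    """Move export_path/filename into export_path/assets and return the new path."""
    root = Path(export_path)
    assets = root / "assets"
    assets.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    target = assets / filename
    shutil.move(str(root / filename), str(target))
    return target
```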


## Import and Save T5 in Spark NLP

- Let's install and set up Spark NLP in Google Colab.
- This part is pretty easy via our simple script.

In [8]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Installing PySpark 3.2.3 and Spark NLP 5.3.3
setup Colab for PySpark 3.2.3 and Spark NLP 5.3.3

Let's start Spark with Spark NLP included via our simple `start()` function

In [7]:
import sparknlp

# let's start Spark with Spark NLP
spark = sparknlp.start()

:: loading settings :: url = jar:file:/opt/module/spark-3.5.0-bin-hadoop3/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml


Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
com.johnsnowlabs.nlp#spark-nlp_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-9880cb02-82c9-474a-8f38-96695ac04bf4;1.0
	confs: [default]
	found com.johnsnowlabs.nlp#spark-nlp_2.12;5.3.3 in central
	found com.typesafe#config;1.4.2 in central
	found org.rocksdb#rocksdbjni;6.29.5 in central
	found com.amazonaws#aws-java-sdk-s3;1.12.500 in central
	found com.amazonaws#aws-java-sdk-kms;1.12.500 in central
	found com.amazonaws#aws-java-sdk-core;1.12.500 in central
	found commons-logging#commons-logging;1.1.3 in central
	found commons-codec#commons-codec;1.15 in central
	found org.apache.httpcomponents#httpclient;4.5.13 in central
	found org.apache.httpcomponents#httpcore;4.4.13 in central
	found software.amazon.ion#ion-java;1.0.2 in central
	found joda-time#joda-time;2.8.1 in central
	found com.amazonaws#jmespath-java;1.12.500 in central
	found com.g

- Let's use the `loadSavedModel` function in `T5Transformer`, which allows us to load the ONNX model.
- Most params will be set automatically. They can also be set later, at runtime, after loading the model in `T5Transformer`, so don't worry about setting them now.
- `loadSavedModel` accepts two params. The first is the path to the exported model. The second is the SparkSession, that is, the `spark` variable we previously started via `sparknlp.start()`.
- NOTE: `loadSavedModel` accepts local paths in addition to distributed file systems such as `HDFS`, `S3`, `DBFS`, etc. This feature was introduced in the Spark NLP 4.2.2 release. Keep in mind that the best and recommended way to move/share/reuse Spark NLP models is to use `write.save` so you can use `.load()` from any file system natively.

In [8]:
from sparknlp.annotator import *

# T5 = T5Transformer.loadSavedModel(EXPORT_PATH, spark)\
#   .setUseCache(True) \
#   .setTask("summarize:") \
#   .setMaxOutputLength(200)
T5 = T5Transformer.loadSavedModel(EXPORT_PATH, spark)\
    .setUseCache(True) \
    .setTask("question:") \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("answers")

Using CPUs
Using CPUs


Let's save it to disk so it is easier to move around and can be used later via the `.load` function

In [9]:
T5.write().overwrite().save(f"{MODEL_NAME}_spark_nlp")

Let's clean up stuff we don't need anymore

In [10]:
!rm -rf {EXPORT_PATH}

Awesome  😎 !

This is your ONNX T5 model from HuggingFace 🤗  loaded and saved by Spark NLP 🚀

In [11]:
! ls -l {MODEL_NAME}_spark_nlp

total 366308
-rw-r--r-- 1 root root 232819989 May 22 13:12 decoder.onxx
-rw-r--r-- 1 root root 141478076 May 22 13:12 encoder.onxx
drwxr-xr-x 2 root root      4096 May 22 13:12 metadata
-rw-r--r-- 1 root root    791656 May 22 13:12 t5_spp


Now let's see how we can use it on other machines, clusters, or any place you wish to use your new and shiny T5 model 😊
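
Loading the saved model elsewhere could look like the sketch below. It assumes a running Spark session with Spark NLP available (for example via `sparknlp.start()`), and the `documents` and `answers` column names follow the cell above. The function names `build_t5_pipeline` and `run_t5`, the `text` column, and the sample input string are hypothetical.

```python
def build_t5_pipeline(model_path, task="question:", max_output_length=200):
    """Build a Spark ML pipeline around a T5 model saved by Spark NLP."""
    # Imported lazily so the function can be defined without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import T5Transformer
    from sparknlp.base import DocumentAssembler

    # Turn the raw "text" column into Spark NLP documents.
    document_assembler = (
        DocumentAssembler().setInputCol("text").setOutputCol("documents")
    )

    # Load the model written earlier with write().save(...).
    t5 = (
        T5Transformer.load(model_path)
        .setTask(task)
        .setMaxOutputLength(max_output_length)
        .setInputCols(["documents"])
        .setOutputCol("answers")
    )
    return Pipeline(stages=[document_assembler, t5])


def run_t5(spark, model_path, texts):
    """Run the pipeline on a list of strings and return the resulting DataFrame."""
    data = spark.createDataFrame([[t] for t in texts]).toDF("text")
    pipeline = build_t5_pipeline(model_path)
    return pipeline.fit(data).transform(data)


# Example (requires Spark NLP and the saved model directory):
# result = run_t5(spark, f"{MODEL_NAME}_spark_nlp", ["Your input text here"])
# result.select("answers.result").show(truncate=False)
```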

That's it! You can now go wild and use hundreds of T5 models from HuggingFace 🤗 in Spark NLP 🚀


In [15]:
MODEL_NAME

'/data/HW/proj2/best_model'