![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_T5.ipynb)

## Import ONNX T5 models from HuggingFace 🤗 into Spark NLP 🚀

Let's keep in mind a few things before we start 😊

- ONNX support was introduced in `Spark NLP 5.0.0`, enabling high-performance inference for models.
- ONNX support for the `T5Transformer` is only available in `Spark NLP 5.2.0` and later, so please make sure you have upgraded to the latest Spark NLP release.
- You can import T5 models via `T5Model`. These models are usually under the `Text2Text Generation` category and have `T5` in their labels.
- This is a very computationally expensive module, especially on longer sequences. The use of an accelerator such as a GPU is recommended.
- Reference: [T5Model](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model)
- Some [example models](https://huggingface.co/models?other=T5)
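
If you are not sure whether your installed release is new enough, the version requirement above can be checked with a small helper. This is a minimal sketch: `parse_version` and `supports_onnx_t5` are hypothetical names made up for this example, and only the `5.2.0` threshold comes from the notes above.

```python
import re

# Minimum Spark NLP release with ONNX support for T5Transformer (from the notes above)
MIN_T5_ONNX_VERSION = (5, 2, 0)


def parse_version(version: str) -> tuple:
    """Turn a version string such as '5.2.0' into a tuple like (5, 2, 0)."""
    numbers = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so a piece like '0rc1' becomes 0
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    # Pad short versions such as '5.2' with zeros
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def supports_onnx_t5(version: str) -> bool:
    """Return True if the given Spark NLP version is 5.2.0 or later."""
    return parse_version(version) >= MIN_T5_ONNX_VERSION
```

Once Spark NLP is installed, you could call it as `supports_onnx_t5(sparknlp.version())`.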

## Export and Save HuggingFace model

- Let's install the `transformers` package with the `onnx` extension and its dependencies. You don't need `onnx` to be installed for Spark NLP; however, we need it to load and save models from HuggingFace.
- We lock `transformers` on version `4.35.2`. This doesn't mean it won't work with future releases.
- We will also need `sentencepiece` for tokenization.

In [None]:
!pip install -q --upgrade transformers[onnx]==4.35.2 optimum sentencepiece


- HuggingFace has an extension called Optimum which offers specialized model inference, including ONNX. We can use this to import and export ONNX models with `from_pretrained` and `save_pretrained`.
- We'll use the [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) model from HuggingFace as an example.
- In addition to `T5Model` we also need to save the tokenizer. This is the same for every model: these are the assets needed for tokenization inside Spark NLP.
- If we want to optimize the model, a GPU will be needed. Make sure to select the correct runtime.
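
The notebook itself exports through `optimum-cli`, but the `from_pretrained` / `save_pretrained` route mentioned above could look roughly like the sketch below. Treat it as an assumption-laden alternative, not the notebook's method: `export_dir_for` and `export_t5_to_onnx` are hypothetical helper names, and the exact set of `.onnx` files written this way may differ from the CLI export depending on your `optimum` version.

```python
from pathlib import Path


def export_dir_for(model_name: str, export_root: str = "onnx_models") -> Path:
    """Build the export directory, mirroring EXPORT_PATH in this notebook."""
    return Path(export_root) / model_name


def export_t5_to_onnx(model_name: str, export_root: str = "onnx_models") -> Path:
    """Export a HuggingFace T5 model and its tokenizer to ONNX via Optimum."""
    # Imported lazily so the helpers above work without optimum installed
    from optimum.onnxruntime import ORTModelForSeq2SeqLM
    from transformers import AutoTokenizer

    export_path = export_dir_for(model_name, export_root)

    # export=True converts the PyTorch checkpoint to ONNX while loading
    model = ORTModelForSeq2SeqLM.from_pretrained(model_name, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Write the ONNX graphs and the tokenizer assets into the same directory
    model.save_pretrained(export_path)
    tokenizer.save_pretrained(export_path)
    return export_path
```

Calling `export_t5_to_onnx("google/flan-t5-base")` downloads the checkpoint, so expect it to take a while on the first run.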

In [None]:
import transformers
# Model name, either HF (e.g. "google/flan-t5-base") or a local path
MODEL_NAME = "google/flan-t5-base"


# Path to store the exported models
EXPORT_PATH = f"onnx_models/{MODEL_NAME}"

In [None]:
# Export the model to ONNX using optimum

# Export with optimizations (uncomment next line)
# !optimum-cli export onnx --task text2text-generation-with-past --model {MODEL_NAME} --optimize O2 {EXPORT_PATH}
# IMPORTANT - there is a bug in onnxruntime which crashes it when trying to optimize a T5 small model (or any derivative of it)
# There are two ways to address the problem:
# 1. Go to onnx_model_bert.py in the onnxruntime module (the full path depends on the module version),
#    find the BertOnnxModel class and comment the following line in the constructor:
#    assert (num_heads == 0 and hidden_size == 0) or (num_heads > 0 and hidden_size % num_heads == 0)
# 2. Disable optimization by removing '--optimize O2' (use line below).

# Export without optimizations
!optimum-cli export onnx --task text2text-generation-with-past --model {MODEL_NAME} {EXPORT_PATH}

2023-12-09 15:50:28.712604: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-09 15:50:28.712694: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-09 15:50:28.712744: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Framework not specified. Using pt to export to ONNX.
config.json: 100% 1.40k/1.40k [00:00<00:00, 5.73MB/s]
model.safetensors: 100% 990M/990M [00:12<00:00, 82.4MB/s]
generation_config.json: 100% 147/147 [00:00<00:00, 555kB/s]
tokenizer_config.json: 100% 2.54k/2.54k [00:00<00:00, 8.77MB/s]
spiece.model: 100% 792k/792k [00:00<00:00, 138MB/s]
tokenizer.json: 100% 

Let's have a look inside the export directory and see what we are dealing with:

In [None]:
!ls -l {EXPORT_PATH}

total 2283400
-rw-r--r-- 1 root root      1529 Dec  9 15:50 config.json
-rw-r--r-- 1 root root 651182887 Dec  9 15:51 decoder_model_merged.onnx
-rw-r--r-- 1 root root 650848962 Dec  9 15:51 decoder_model.onnx
-rw-r--r-- 1 root root 594197310 Dec  9 15:51 decoder_with_past_model.onnx
-rw-r--r-- 1 root root 438697389 Dec  9 15:50 encoder_model.onnx
-rw-r--r-- 1 root root       142 Dec  9 15:50 generation_config.json
-rw-r--r-- 1 root root      2201 Dec  9 15:50 special_tokens_map.json
-rw-r--r-- 1 root root    791656 Dec  9 15:50 spiece.model
-rw-r--r-- 1 root root     20771 Dec  9 15:50 tokenizer_config.json
-rw-r--r-- 1 root root   2422256 Dec  9 15:50 tokenizer.json
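
If you prefer a programmatic check over reading the listing, a sketch like the following reports which files are absent from the export directory. The file names are copied from the listing above; treating these three as the minimum set is an assumption for illustration, not a documented Spark NLP requirement, and `missing_export_files` is a hypothetical helper.

```python
from pathlib import Path

# Names taken from the directory listing above (assumed minimum set)
EXPECTED_FILES = ("encoder_model.onnx", "decoder_model.onnx", "spiece.model")


def missing_export_files(export_path, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in export_path."""
    export_dir = Path(export_path)
    return [name for name in expected if not (export_dir / name).is_file()]
```

An empty list from `missing_export_files(EXPORT_PATH)` means every expected file was found.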


- As you can see, the SentencePiece model `spiece.model` was saved next to the tokenizer files. We need to move it into an `assets` folder, which is where Spark NLP will look for it.

In [None]:
! mkdir -p {EXPORT_PATH}/assets
! mv -t {EXPORT_PATH}/assets {EXPORT_PATH}/spiece.model

In [None]:
!ls -l {EXPORT_PATH}/assets

total 776
-rw-r--r-- 1 root root 791656 Dec  9 15:50 spiece.model
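
On systems where `mv -t` is not available (it is a GNU option), the same move can be sketched in plain Python. `move_spiece_to_assets` is a hypothetical helper written for this example; it only mirrors the two shell commands above.

```python
import shutil
from pathlib import Path


def move_spiece_to_assets(export_path: str) -> Path:
    """Move spiece.model into <export_path>/assets and return its new path."""
    export_dir = Path(export_path)
    assets_dir = export_dir / "assets"
    # Same effect as `mkdir -p`
    assets_dir.mkdir(parents=True, exist_ok=True)

    source = export_dir / "spiece.model"
    target = assets_dir / "spiece.model"
    # Only move if the file is still in the top-level directory
    if source.exists():
        shutil.move(str(source), str(target))
    if not target.exists():
        raise FileNotFoundError(f"spiece.model not found under {export_dir}")
    return target
```

Running it twice is harmless: the second call finds the file already in `assets` and simply returns its path.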


## Import and Save T5 in Spark NLP

- Let's install and setup Spark NLP in Google Colab
- This part is pretty easy via our simple script

In [None]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Installing PySpark 3.2.3 and Spark NLP 5.2.0
setup Colab for PySpark 3.2.3 and Spark NLP 5.2.0


Let's start Spark with Spark NLP included via our simple `start()` function

In [None]:
import sparknlp

# let's start Spark with Spark NLP
spark = sparknlp.start()

- Let's use the `loadSavedModel` function in `T5Transformer`, which allows us to load the ONNX model.
- Most params will be set automatically. They can also be set later, after loading the model in `T5Transformer`, so don't worry about setting them now.
- `loadSavedModel` accepts two params: the first is the path to the exported model, and the second is the SparkSession, which is the `spark` variable we previously started via `sparknlp.start()`.
- NOTE: `loadSavedModel` accepts local paths in addition to distributed file systems such as `HDFS`, `S3`, `DBFS`, etc. This feature was introduced in the Spark NLP 4.2.2 release. Keep in mind that the best and recommended way to move/share/reuse Spark NLP models is to use `write.save`, so you can use `.load()` from any file system natively.

In [None]:
from sparknlp.annotator import *

T5 = T5Transformer.loadSavedModel(EXPORT_PATH, spark)\
  .setUseCache(True) \
  .setTask("summarize:") \
  .setMaxOutputLength(200)

Let's save it on disk so it is easier to move around and can be used later via the `.load` function

In [None]:
T5.write().overwrite().save(f"{MODEL_NAME}_spark_nlp")

Let's clean up stuff we don't need anymore

In [None]:
!rm -rf {EXPORT_PATH}

Awesome  😎 !

This is your ONNX T5 model from HuggingFace 🤗  loaded and saved by Spark NLP 🚀

In [None]:
! ls -l {MODEL_NAME}_spark_nlp

total 1065292
-rw-r--r-- 1 root root 651282390 Dec  9 16:07 decoder.onxx
-rw-r--r-- 1 root root 438764467 Dec  9 16:07 encoder.onxx
drwxr-xr-x 2 root root      4096 Dec  9 16:07 metadata
-rw-r--r-- 1 root root    791656 Dec  9 16:07 t5_spp


Now let's see how we can use it on other machines, clusters, or any place you wish to use your new and shiny T5 model 😊

In [None]:
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

test_data = spark.createDataFrame([
    ["Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a " +
       "downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness" +
       " of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this " +
       "paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework " +
       "that converts all text-based language problems into a text-to-text format. Our systematic study compares " +
       "pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens " +
       "of language understanding tasks. By combining the insights from our exploration with scale and our new " +
       "Colossal Clean Crawled Corpus, we achieve state-of-the-art results on many benchmarks covering " +
       "summarization, question answering, text classification, and more. To facilitate future work on transfer " +
       "learning for NLP, we release our data set, pre-trained models, and code."]
]).toDF("text")


document_assembler = DocumentAssembler() \
    .setInputCol("text")\
    .setOutputCol("document")

T5 = T5Transformer.load(f"{MODEL_NAME}_spark_nlp") \
  .setInputCols(["document"]) \
  .setOutputCol("summary")

pipeline = Pipeline().setStages([document_assembler, T5])

result = pipeline.fit(test_data).transform(test_data)
result.select("summary.result").show(truncate=False)

+-----------------------------------------------------------------------------------------------------------+
|result                                                                                                     |
+-----------------------------------------------------------------------------------------------------------+
|[We introduce a unified framework that converts text-to-text language problems into a text-to-text format.]|
+-----------------------------------------------------------------------------------------------------------+
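
The summary above was produced because the annotator was configured with the `summarize:` task. T5 frames every problem as text-to-text, so a task is conceptually just a text prefix placed in front of the input. The sketch below illustrates that idea only; `apply_task` is a hypothetical helper, and how Spark NLP joins the prefix and the text internally is an assumption.

```python
def apply_task(task: str, text: str) -> str:
    """Prefix the input text with a T5 task string, if one is given."""
    task = task.strip()
    # With no task, the text is passed through unchanged
    return f"{task} {text}" if task else text


prompt = apply_task("summarize:", "Transfer learning has emerged as a powerful technique.")
print(prompt)
```

To try another task with the loaded annotator, you could change the prefix with `setTask`, for example `T5.setTask("translate English to German:")`, keeping in mind that the quality depends on what the underlying model was trained to do.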



That's it! You can now go wild and use hundreds of T5 models from HuggingFace 🤗 in Spark NLP 🚀
