![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_CLIP.ipynb)

# Import TensorFlow BLIP models from HuggingFace 🤗 into Spark NLP 🚀

Let's keep in mind a few things before we start 😊

- This feature is available in `Spark NLP 5.5.1` and later, so please make sure you have upgraded to the latest Spark NLP release
- You can import BLIP models trained/fine-tuned for question answering via `TFBlipForQuestionAnswering`.
- Reference: [TFBlipForQuestionAnswering](https://huggingface.co/docs/transformers/en/model_doc/blip#transformers.TFBlipForQuestionAnswering)
- Some [example models](https://huggingface.co/models?pipeline_tag=visual-question-answering&sort=trending&search=BLIP)
- To execute this notebook on Google Colab you will need an A100 or similar instance
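
As a quick sanity check for the version requirement above, here is a minimal sketch that compares a version string against `5.5.1`. The helper names `parse_version` and `meets_minimum` are hypothetical (they are not part of Spark NLP), and pre-release suffixes such as `rc1` are simply ignored, which is an assumption made to keep the example short.

```python
def parse_version(version: str) -> tuple:
    """Turn '5.5.1' into (5, 5, 1); trailing non-digits in a piece are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "5.5.1") -> bool:
    """Return True when `installed` is at least `minimum` (hypothetical helper)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '6.0' compares as (6, 0, 0)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("5.5.1"))
print(meets_minimum("5.4.2"))
print(meets_minimum("5.10.0"))
```

Comparing integer tuples rather than raw strings matters here: as plain strings, `"5.10.0"` would sort before `"5.5.1"`. In a session where Spark NLP is installed, `sparknlp.version()` can supply the installed version string.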

## Export and Save HuggingFace model

- We pin TensorFlow to version `2.11.0` and Transformers to `4.39.3`. Other releases may work as well, but these are the versions that have been tested successfully.
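
If you want to confirm that your environment matches the tested versions, a minimal sketch using only the standard library could look like the following. The `TESTED_PINS` dictionary and the `pin_mismatches` helper are hypothetical names introduced for this illustration; the version numbers are the ones quoted above.

```python
from importlib import metadata

# Versions quoted in this notebook; adjust them if you test other releases.
TESTED_PINS = {"tensorflow": "2.11.0", "transformers": "4.39.3"}


def pin_mismatches(pins):
    """Map each package whose installed version differs from its pin to (found, expected)."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != expected:
            mismatches[name] = (found, expected)
    return mismatches


for name, (found, expected) in pin_mismatches(TESTED_PINS).items():
    print(f"{name}: found {found}, tested with {expected}")
```

Nothing is printed when both packages match. A mismatch is only a hint, since other versions may work too.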

In [1]:
!pip install -q tensorflow==2.11.0

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m588.3/588.3 MB[0m [31m3.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.7/1.7 MB[0m [31m40.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.1/1.1 MB[0m [31m46.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.0/6.0 MB[0m [31m77.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m439.2/439.2 kB[0m [31m22.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m4.9/4.9 MB[0m [31m86.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m781.3/781.3 kB[0m [31m41.1 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following depend

- For TensorFlow-based models, HuggingFace's `save_pretrained` function has a native `saved_model` option. We will use it to save the model as a TF `SavedModel`.
- We'll use the [Salesforce/blip-vqa-base](https://huggingface.co/Salesforce/blip-vqa-base) model from HuggingFace as an example.
- In addition to `TFBlipForQuestionAnswering`, we also need to save the `BlipProcessor`.
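
One detail worth knowing before running the cells below: the model name contains a slash, so the paths built with `"./{}".format(MODEL_NAME)` are nested under a `Salesforce` directory. The following sketch only manipulates path strings (nothing is created on disk) to show where the files will end up. The variable names are chosen for this illustration.

```python
from pathlib import PurePosixPath

MODEL_NAME = "Salesforce/blip-vqa-base"

# The same format strings that the notebook uses when saving
processor_dir = PurePosixPath("./{}_blip_processor".format(MODEL_NAME))
model_dir = PurePosixPath("./{}".format(MODEL_NAME))
assets_dir = model_dir / "saved_model" / "1" / "assets"

print(processor_dir)
print(model_dir)
print(assets_dir)

# Both directories share the same parent directory, named after the organisation
print(processor_dir.parent == model_dir.parent)
```

This is why the later `ls`, `mv` and `rm` commands can use `{MODEL_NAME}` directly as a relative path.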

In [2]:
from PIL import Image
import requests
from transformers import BlipProcessor, TFBlipForQuestionAnswering
import tensorflow as tf

In [3]:
MODEL_NAME = "Salesforce/blip-vqa-base"

In [4]:
processor = BlipProcessor.from_pretrained(MODEL_NAME)
processor.save_pretrained("./{}_blip_processor".format(MODEL_NAME))


In [5]:
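try:
  # Prefer native TensorFlow weights when the repository provides them
  print("TF model")
  model = TFBlipForQuestionAnswering.from_pretrained(MODEL_NAME)
except Exception:
  # Otherwise convert the PyTorch checkpoint while loading
  print("TF model with pt")
  model = TFBlipForQuestionAnswering.from_pretrained(MODEL_NAME, from_pt=True)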

model.save_pretrained("./{}".format(MODEL_NAME), saved_model=True)

TF model



Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBlipForQuestionAnswering: ['text_encoder.embeddings.position_ids', 'text_decoder.bert.embeddings.position_ids']
- This IS expected if you are initializing TFBlipForQuestionAnswering from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBlipForQuestionAnswering from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBlipForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBlipForQuestionAnswering for predictions without further training.


In [6]:
# Define TF Signature
@tf.function(
  input_signature=[
      {
          "pixel_values": tf.TensorSpec((1, None, None, None), tf.float32, name="pixel_values"),
          "input_ids": tf.TensorSpec((1, None), tf.int32, name="input_ids"),
          "attention_mask": tf.TensorSpec((1, None), tf.int64, name="attention_mask")
      }
  ]
)
def serving_fn(inputs):
    # Unpack the input dictionary and pass it to the model's generate function
    return model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        attention_mask=inputs.get("attention_mask", None)
    )

model.save_pretrained("./{}".format(MODEL_NAME), saved_model=True, signatures={"serving_default": serving_fn.get_concrete_function()})

Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'NoneType' object has no attribute '_fields'


A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method



Let's have a look inside these two directories and see what we are dealing with:

In [7]:
!ls -l {MODEL_NAME}_blip_processor

total 936
-rw-r--r-- 1 root root    471 Oct  2 18:10 preprocessor_config.json
-rw-r--r-- 1 root root    695 Oct  2 18:10 special_tokens_map.json
-rw-r--r-- 1 root root   1348 Oct  2 18:10 tokenizer_config.json
-rw-r--r-- 1 root root 711396 Oct  2 18:10 tokenizer.json
-rw-r--r-- 1 root root 231508 Oct  2 18:10 vocab.txt


In [8]:
!ls -l {MODEL_NAME}

total 1503636
-rw-r--r-- 1 root root        664 Oct  2 18:18 config.json
-rw-r--r-- 1 root root        136 Oct  2 18:18 generation_config.json
drwxr-xr-x 3 root root       4096 Oct  2 18:14 saved_model
-rw-r--r-- 1 root root 1539703504 Oct  2 18:18 tf_model.h5


In [9]:
!ls -l {MODEL_NAME}/saved_model/1

total 61764
drwxr-xr-x 2 root root     4096 Oct  2 18:14 assets
-rw-r--r-- 1 root root       55 Oct  2 18:18 fingerprint.pb
-rw-r--r-- 1 root root   604021 Oct  2 18:18 keras_metadata.pb
-rw-r--r-- 1 root root 62626669 Oct  2 18:18 saved_model.pb
drwxr-xr-x 2 root root     4096 Oct  2 18:17 variables


So we need to move the files `preprocessor_config.json`, `tokenizer.json` and `vocab.txt` from the processor directory to the SavedModel's `assets` directory:

- As you can see, we need the SavedModel from the `saved_model/1/` path
- We also need `preprocessor_config.json`, `tokenizer.json` and `vocab.txt` from the processor directory
- All we have to do is move those files to `saved_model/1/assets`, which is where Spark NLP will look for them
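
The same move can be done from Python if shell commands are not available. This is a minimal sketch: the `move_processor_assets` helper is a hypothetical name, and the demo runs on throwaway directories with empty placeholder files rather than on the real model.

```python
import shutil
import tempfile
from pathlib import Path

ASSET_FILES = ("preprocessor_config.json", "tokenizer.json", "vocab.txt")


def move_processor_assets(processor_dir, assets_dir, names=ASSET_FILES):
    """Move the listed files from processor_dir into assets_dir; return the new paths."""
    processor_dir, assets_dir = Path(processor_dir), Path(assets_dir)
    missing = [n for n in names if not (processor_dir / n).is_file()]
    if missing:
        # Fail before moving anything so the directories are never half-updated
        raise FileNotFoundError(f"missing in {processor_dir}: {missing}")
    assets_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.move(str(processor_dir / n), str(assets_dir / n))) for n in names]


# Demo on temporary directories with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "processor", Path(tmp) / "assets"
    src.mkdir()
    for n in ASSET_FILES:
        (src / n).touch()
    moved = move_processor_assets(src, dst)
    print(sorted(p.name for p in moved))
```

With the real directories, the call would take `"{}_blip_processor".format(MODEL_NAME)` as the source and `"{}/saved_model/1/assets".format(MODEL_NAME)` as the destination.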

In [10]:
!mv {MODEL_NAME}_blip_processor/preprocessor_config.json {MODEL_NAME}/saved_model/1/assets
!mv {MODEL_NAME}_blip_processor/tokenizer.json {MODEL_NAME}/saved_model/1/assets
!mv {MODEL_NAME}_blip_processor/vocab.txt  {MODEL_NAME}/saved_model/1/assets

Voila! We have our `preprocessor_config.json`, `tokenizer.json` and `vocab.txt` inside the `assets` directory

In [11]:
!ls -l {MODEL_NAME}/saved_model/1/assets

total 928
-rw-r--r-- 1 root root    471 Oct  2 18:10 preprocessor_config.json
-rw-r--r-- 1 root root 711396 Oct  2 18:10 tokenizer.json
-rw-r--r-- 1 root root 231508 Oct  2 18:10 vocab.txt


## Import and Save BLIPForQuestionAnswering in Spark NLP

Let's install and set up Spark NLP in Google Colab.
This part is pretty easy with our simple script.

In [14]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

In [18]:
import sparknlp
# let's start Spark with Spark NLP
spark = sparknlp.start()

print("Apache Spark version: {}".format(spark.version))

Apache Spark version: 3.4.0


- Let's use the `loadSavedModel` function in `BLIPForQuestionAnswering`, which allows us to load a TensorFlow model in SavedModel format
- `loadSavedModel` accepts two parameters: the first is the path to the TF SavedModel, and the second is the SparkSession, which is the `spark` variable we started earlier via `sparknlp.start()`
- NOTE: `loadSavedModel` accepts local paths in addition to distributed file systems such as `HDFS`, `S3`, `DBFS`, etc. This feature was introduced in the Spark NLP 4.2.2 release. Keep in mind that the recommended way to move, share, or reuse Spark NLP models is to use `write.save` so you can call `.load()` from any file system natively.
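
Before calling `loadSavedModel` on a local path, it can help to confirm that the directory looks the way the listings above show. The sketch below is a pre-flight check under that assumption: the expected layout is taken from this notebook's directory listings, not from an official Spark NLP specification, and `missing_entries` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Layout as seen in this notebook's listings (an assumption, not an official spec)
EXPECTED_FILES = (
    "saved_model.pb",
    "assets/preprocessor_config.json",
    "assets/tokenizer.json",
    "assets/vocab.txt",
)
EXPECTED_DIRS = ("variables",)


def missing_entries(saved_model_dir):
    """Return the expected files and directories that are absent under saved_model_dir."""
    root = Path(saved_model_dir)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    missing += [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


# Demo: a directory that has the graph and variables but no processor assets yet
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "saved_model.pb").touch()
    (root / "variables").mkdir()
    print(missing_entries(root))
```

An empty list means the local directory matches the layout used in this notebook. The check only applies to local paths, not to `HDFS`, `S3` or `DBFS` locations.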

In [19]:
from sparknlp.annotator import *
from sparknlp.base import *

blip_for_question_answering = BLIPForQuestionAnswering.loadSavedModel(
    '{}/saved_model/1'.format(MODEL_NAME),
    spark
).setSize(384)

Let's save it to disk so it is easier to move around and can be used later via the `.load` function

In [20]:
blip_for_question_answering.write().overwrite().save("./{}_spark_nlp".format(MODEL_NAME))

Let's clean up stuff we don't need anymore

In [21]:
!rm -rf {MODEL_NAME}_blip_processor {MODEL_NAME}

Awesome 😎  !

This is your BLIPForQuestionAnswering model from HuggingFace 🤗  loaded and saved by Spark NLP 🚀

In [22]:
! ls -l {MODEL_NAME}_spark_nlp

total 1563412
-rw-r--r-- 1 root root 1600921187 Oct  2 18:42 blip_vqa_tensorflow
drwxr-xr-x 4 root root       4096 Oct  2 18:41 fields
drwxr-xr-x 2 root root       4096 Oct  2 18:41 metadata


Now let's see how we can use it on other machines, clusters, or any place you wish to use your new and shiny BLIPForQuestionAnswering model in Spark NLP 🚀 pipeline!

Let's try with a public image of cats

In [23]:
!wget -O /content/cat_image.jpg "http://images.cocodataset.org/val2017/000000039769.jpg"

--2024-10-02 18:42:30--  http://images.cocodataset.org/val2017/000000039769.jpg
Resolving images.cocodataset.org (images.cocodataset.org)... 3.5.27.152, 3.5.29.161, 16.182.34.49, ...
Connecting to images.cocodataset.org (images.cocodataset.org)|3.5.27.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 173131 (169K) [image/jpeg]
Saving to: ‘/content/cat_image.jpg’


2024-10-02 18:42:31 (312 KB/s) - ‘/content/cat_image.jpg’ saved [173131/173131]



In [24]:
!mkdir images
!mv cat_image.jpg images

To proceed, please create a DataFrame with two columns:

- An `image` column that holds the image data Spark loads from each file in the directory (its `origin` field keeps the file path).
- A `text` column where you can input the specific question you would like to ask about each image.

In [25]:
from pyspark.sql.functions import lit

images_path = "./images/"
image_df = spark.read.format("image").load(path=images_path)

test_df = image_df.withColumn("text", lit("What's this picture about?"))
test_df.show()

+--------------------+--------------------+
|               image|                text|
+--------------------+--------------------+
|{file:///content/...|What's this pictu...|
+--------------------+--------------------+



Now let's build our `BLIPForQuestionAnswering` pipeline

In [26]:
from pyspark.ml import Pipeline

imageAssembler = ImageAssembler() \
  .setInputCol("image") \
  .setOutputCol("image_assembler")

# Load the model we saved earlier; it answers the question in the `text` column
visualQAClassifier = BLIPForQuestionAnswering.load("./{}_spark_nlp".format(MODEL_NAME)) \
  .setInputCols("image_assembler") \
  .setOutputCol("answer") \
  .setSize(384)

pipeline = Pipeline(
    stages=[
        imageAssembler,
        visualQAClassifier,
    ]
)

In [27]:
model = pipeline.fit(test_df)
result = model.transform(test_df)

In [28]:
result.select("image_assembler.origin", "answer.result").show(truncate = False)

+--------------------------------------+------+
|origin                                |result|
+--------------------------------------+------+
|[file:///content/images/cat_image.jpg]|[cats]|
+--------------------------------------+------+



That's it! You can now go wild and use hundreds of `BLIPForQuestionAnswering` models from HuggingFace 🤗 in Spark NLP 🚀
