![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/openvino/HuggingFace_OpenVINO_in_Spark_NLP_Qwen2VL.ipynb)

# Import OpenVINO Qwen2VL models from HuggingFace 🤗 into Spark NLP 🚀

This notebook walks through optimizing and importing Qwen2VL models from HuggingFace for use in Spark NLP with the [Intel OpenVINO toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html). The focus is on converting the model to the OpenVINO format and applying precision optimizations (INT8 and INT4) to improve performance and efficiency on CPU platforms using [Optimum Intel](https://huggingface.co/docs/optimum/main/en/intel/inference).

Let's keep in mind a few things before we start 😊

- OpenVINO support was introduced in `Spark NLP 5.4.0`, enabling high-performance CPU inference for models. Please make sure you have upgraded to the latest Spark NLP release.
- Model quantization is a computationally expensive process, so it is recommended to use a runtime with more than 32GB memory for exporting the quantized model from HuggingFace.
- You can import Qwen2VL models via `Qwen2VLForConditionalGeneration`. These models are usually under the `Image-Text-to-Text` category and have `Qwen2VL` in their labels.
- Reference: [Qwen2VL](https://huggingface.co/docs/transformers/model_doc/qwen2_vl)
- Some [example models](https://huggingface.co/models?search=Qwen2VL)

## 1. Environment Setup

This notebook installs and configures the dependencies required to load, optimize, and run Qwen2-VL models using OpenVINO and Hugging Face Transformers.

In [None]:
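import subprocess
import sys

from tqdm import tqdm

def pip_install(package_list):
    for pkg in tqdm(package_list, desc="Installing packages", ncols=100):
        # An entry may hold several pip arguments (e.g. --pre plus an index URL), so split it.
        # Calling pip through the current interpreter keeps the packages in this kernel's environment.
        subprocess.run([sys.executable, "-m", "pip", "install", *pkg.split()], stdout=subprocess.DEVNULL)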

packages = [
    'openvino>=2024.4.0',
    'nncf>=2.13.0',
    'sentencepiece',
    'tokenizers>=0.12.1',
    'transformers==4.45.0',
    'accelerate>=0.26.0',
    '--pre --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly openvino openvino-tokenizers openvino-genai',
    'torch>=2.2.1',
    'torchvision>=0.10.2',
    'qwen-vl-utils'
]

pip_install(packages)

Installing packages: 100%|██████████████████████████████████████████| 10/10 [02:56<00:00, 17.70s/it]
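
Because the install helper discards pip's standard output, it is easy to miss which versions were actually resolved. The sketch below is one way to check; the helper name `installed_versions` is hypothetical and only uses the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Return {name: version string}, with None for packages that are not installed."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the packages this notebook depends on most directly.
for name, ver in installed_versions(["openvino", "nncf", "transformers", "torch"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any entry prints `NOT INSTALLED`, re-run the install cell without `stdout=subprocess.DEVNULL` to see pip's messages.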


In [None]:
from pathlib import Path
import requests
from tqdm import tqdm

files_to_download = {
    "ov_qwen2_vl.py": "https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/e0aa6c599bb9c88de0c8758aef967a6c05ad27b6/notebooks/qwen2-vl/ov_qwen2_vl.py",
    "notebook_utils.py": "https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/e0aa6c599bb9c88de0c8758aef967a6c05ad27b6/utils/notebook_utils.py"
}

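for filename, url in tqdm(files_to_download.items(), desc="Downloading utility files", ncols=100):
    # Skip files that are already present so re-running the cell is cheap.
    if not Path(filename).exists():
        response = requests.get(url, timeout=60)
        response.raise_for_status()  # fail loudly instead of saving an error page as a .py file
        Path(filename).write_text(response.text)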

Downloading utility files: 100%|██████████████████████████████████████| 2/2 [00:00<00:00,  3.12it/s]


In [None]:
# Third-party libraries
import torch
import torch.nn as nn
import openvino as ov
import nncf

# Transformers (Hugging Face)
from transformers import AutoConfig, AutoProcessor
from transformers.models.qwen2_vl.modeling_qwen2_vl import VisionRotaryEmbedding

# Local project modules
from ov_qwen2_vl import convert_qwen2vl_model, model_selector



## Convert the model to OpenVINO

In [None]:
model_id = "numind/NuExtract-2.0-2B"
print(f"Selected model: {model_id}")

pt_model_id = model_id
model_dir = Path(model_id.split("/")[-1])

model_dir

Selected model: numind/NuExtract-2.0-2B


PosixPath('NuExtract-2.0-2B')

In [None]:
compression_configuration = {
    "mode": nncf.CompressWeightsMode.INT4_ASYM,
    "group_size": 128,
    "ratio": 1.0,
}

convert_qwen2vl_model(pt_model_id, model_dir, compression_configuration)

⌛ numind/NuExtract-2.0-2B conversion started. Be patient, it may takes some time.
⌛ Load Original model


Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}

`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46

✅ Original model successfully loaded
⌛ Convert Input embedding model
✅ Input embedding model successfully converted
⌛ Convert Language model




✅ Language model successfully converted
⌛ Weights compression with int4_asym mode started
INFO:nncf:Statistics of the bitwidth distribution:
┍━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Weight compression mode   │ % all parameters (layers)   │ % ratio-defining parameters (layers)   │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ int8_asym                 │ 15% (1 / 197)               │ 0% (0 / 196)                           │
├───────────────────────────┼─────────────────────────────┼────────────────────────────────────────┤
│ int4_asym                 │ 85% (196 / 197)             │ 100% (196 / 196)                       │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙



✅ Weights compression finished
⌛ Convert Image embedding model
⌛ Weights compression with int4_asym mode started
INFO:nncf:Statistics of the bitwidth distribution:
┍━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Weight compression mode   │ % all parameters (layers)   │ % ratio-defining parameters (layers)   │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ int8_asym                 │ 1% (1 / 130)                │ 0% (0 / 129)                           │
├───────────────────────────┼─────────────────────────────┼────────────────────────────────────────┤
│ int4_asym                 │ 99% (129 / 130)             │ 100% (129 / 129)                       │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙



✅ Weights compression finished
✅ Image embedding model successfully converted
✅ numind/NuExtract-2.0-2B model conversion finished. You can find results in NuExtract-2.0-2B


In [None]:
# Patch reshape model
class Qwen2ReshapePatches(nn.Module):
    def __init__(self, temporal_patch_size: int = 2, merge_size: int = 2, patch_size: int = 14):
        super().__init__()
        self.temporal_patch_size = temporal_patch_size
        self.merge_size = merge_size
        self.patch_size = patch_size

    def forward(self, patches, repetition_factor=1):
        patches = patches.repeat(repetition_factor, 1, 1, 1)
        channel = patches.shape[1]
        grid_t = patches.shape[0] // self.temporal_patch_size
        resized_height = patches.shape[2]
        resized_width = patches.shape[3]
        grid_h = resized_height // self.patch_size
        grid_w = resized_width // self.patch_size

        patches = patches.reshape(
            grid_t,
            self.temporal_patch_size,
            channel,
            grid_h // self.merge_size,
            self.merge_size,
            self.patch_size,
            grid_w // self.merge_size,
            self.merge_size,
            self.patch_size,
        )
        patches = patches.permute(0, 3, 6, 4, 7, 2, 1, 5, 8)
        flatten_patches = patches.reshape(
            grid_t * grid_h * grid_w,
            channel * self.temporal_patch_size * self.patch_size * self.patch_size
        )
        return flatten_patches

patch_reshape_model = Qwen2ReshapePatches()

ov_model = ov.convert_model(
    patch_reshape_model,
    example_input={
        "patches": torch.ones((1, 3, 1372, 2044), dtype=torch.float32),
        "repetition_factor": torch.tensor(2),
    }
)
ov.save_model(ov_model, model_dir / "openvino_patch_reshape_model.xml")

# Rotary embedding
config = AutoConfig.from_pretrained(model_id)

class RotaryEmbedding(nn.Module):
    def __init__(self, embed_dim, spatial_merge_size):
        super().__init__()
        self._rotary_pos_emb = VisionRotaryEmbedding(embed_dim)
        self.spatial_merge_size = spatial_merge_size

    def forward(self, grid_thw):
        t, h, w = grid_thw
        pos_ids = []

        hpos_ids = torch.arange(h).unsqueeze(1).expand(-1, w)
        hpos_ids = hpos_ids.reshape(
            h // self.spatial_merge_size,
            self.spatial_merge_size,
            w // self.spatial_merge_size,
            self.spatial_merge_size,
        )
        hpos_ids = hpos_ids.permute(0, 2, 1, 3).flatten()

        wpos_ids = torch.arange(w).unsqueeze(0).expand(h, -1)
        wpos_ids = wpos_ids.reshape(
            h // self.spatial_merge_size,
            self.spatial_merge_size,
            w // self.spatial_merge_size,
            self.spatial_merge_size,
        )
        wpos_ids = wpos_ids.permute(0, 2, 1, 3).flatten()

        pos_ids.append(torch.stack([hpos_ids, wpos_ids], dim=-1).repeat(t, 1))
        pos_ids = torch.cat(pos_ids, dim=0)

        max_grid_size = grid_thw.max()
        rotary_pos_emb_full = self._rotary_pos_emb(max_grid_size)
        rotary_pos_emb = rotary_pos_emb_full[pos_ids].flatten(1)

        return rotary_pos_emb

vision_rotary_embedding = RotaryEmbedding(
    config.vision_config.embed_dim // config.vision_config.num_heads // 2,
    config.vision_config.spatial_merge_size
)

vision_embedding_ov = ov.convert_model(
    vision_rotary_embedding,
    example_input={
        "grid_thw": torch.tensor([1, 98, 146]),
    }
)
ov.save_model(vision_embedding_ov, model_dir / "openvino_rotary_embeddings_model.xml")

# Multimodal merge module
class MergeMultiModalInputs(nn.Module):
    def __init__(self, image_token_index=151655):
        super().__init__()
        self.image_token_index = image_token_index

    def forward(self, vision_embeds, inputs_embeds, input_ids):
        # Boolean mask marking every embedding position that belongs to an image token.
        special_image_mask = (input_ids == self.image_token_index).unsqueeze(-1).expand_as(inputs_embeds)
        # Overwrite the masked positions with the vision embeddings, in order.
        final_embedding = inputs_embeds.masked_scatter(special_image_mask, vision_embeds)
        return {"inputs_embeds": final_embedding}

torch_model_merge = MergeMultiModalInputs()

ov_model_merge = ov.convert_model(
    torch_model_merge,
    example_input={
        "vision_embeds": torch.randn((3577, 1536), dtype=torch.float32),
        "inputs_embeds": torch.randn((1, 3602, 1536), dtype=torch.float32),
        "input_ids": torch.randint(0, 151656, (1, 3602), dtype=torch.long),
    }
)
ov.save_model(ov_model_merge, model_dir / "openvino_multimodal_merge_model.xml")

Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
  t, h, w = grid_thw
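
The example inputs used for tracing above are consistent with each other, which is easier to see with the arithmetic written out. The sketch below reproduces the shape computation of `Qwen2ReshapePatches.forward` in plain Python (the helper names are hypothetical). It assumes the vision merger combines each `merge_size × merge_size` block of patches into one token.

```python
def patch_grid(height, width, frames=1, repetition_factor=2,
               temporal_patch_size=2, patch_size=14):
    """Grid sizes (t, h, w) computed the same way as in Qwen2ReshapePatches.forward."""
    grid_t = (frames * repetition_factor) // temporal_patch_size
    grid_h = height // patch_size
    grid_w = width // patch_size
    return grid_t, grid_h, grid_w


def flattened_patch_shape(grid, channels=3, temporal_patch_size=2, patch_size=14):
    """Shape of the flattened patch tensor returned by the reshape module."""
    grid_t, grid_h, grid_w = grid
    return (grid_t * grid_h * grid_w,
            channels * temporal_patch_size * patch_size * patch_size)


def merged_token_count(grid, merge_size=2):
    """Number of vision tokens after merging merge_size x merge_size patch blocks (assumption)."""
    grid_t, grid_h, grid_w = grid
    return (grid_t * grid_h * grid_w) // (merge_size ** 2)


grid = patch_grid(1372, 2044)          # the 1372 x 2044 example image used for tracing
print(grid)                            # (1, 98, 146)
print(flattened_patch_shape(grid))     # (14308, 1176)
print(merged_token_count(grid))        # 3577
```

The grid `(1, 98, 146)` is the `grid_thw` example passed to the rotary embedding model, and `3577` is the first dimension of the `vision_embeds` example passed to the merge model.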


### 1.2 Load OpenVINO models

In [None]:
# Model filenames
LANGUAGE_MODEL_NAME = "openvino_language_model.xml"
IMAGE_EMBEDDING_NAME = "openvino_vision_embeddings_model.xml"
IMAGE_EMBEDDING_MERGER_NAME = "openvino_vision_embeddings_merger_model.xml"
TEXT_EMBEDDING_NAME = "openvino_text_embeddings_model.xml"
ROTARY_EMBEDDING_NAME = "openvino_rotary_embeddings_model.xml"
PATCH_RESHAPE_NAME = "openvino_patch_reshape_model.xml"

# Check that every converted model file exists before loading anything
print("Check if all models are converted")

core = ov.Core()
model_path = model_dir

required_models = [
    LANGUAGE_MODEL_NAME,
    IMAGE_EMBEDDING_NAME,
    IMAGE_EMBEDDING_MERGER_NAME,
    TEXT_EMBEDDING_NAME,
    ROTARY_EMBEDDING_NAME,
    PATCH_RESHAPE_NAME,
]
missing_models = [name for name in required_models if not (model_path / name).exists()]

if missing_models:
    print("Not all models are converted. Please check the conversion process.")
    print(f"Missing files: {missing_models}")
else:
    print(f"All models are converted. You can find results in {model_path}")

    # Load and compile the OpenVINO models for CPU inference
    language_model = core.read_model(model_path / LANGUAGE_MODEL_NAME)
    compiled_language_model = core.compile_model(language_model, "CPU")
    request = compiled_language_model.create_infer_request()

    image_embedding = core.compile_model(model_path / IMAGE_EMBEDDING_NAME, "CPU")
    image_embedding_merger = core.compile_model(model_path / IMAGE_EMBEDDING_MERGER_NAME, "CPU")
    text_embedding = core.compile_model(model_path / TEXT_EMBEDDING_NAME, "CPU")
    rotary_embedding = core.compile_model(model_path / ROTARY_EMBEDDING_NAME, "CPU")
    patch_reshape = core.compile_model(model_path / PATCH_RESHAPE_NAME, "CPU")


Check if all models are converted
All models are converted. You can find results in NuExtract-2.0-2B


### 1.3 Copy assets to the assets folder

In [None]:
import shutil

assets_dir = model_dir / "assets"
assets_dir.mkdir(exist_ok=True)

for file in model_dir.glob("*.json"):
    shutil.copy(file, assets_dir)

## Import and Save Qwen2VL in Spark NLP

- Let's install and set up Spark NLP in Google Colab
- This part is pretty easy via our simple script

**Restart the session before running the code below in Colab, as it needs more RAM.**

In [None]:
!wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Installing PySpark 3.4.4 and Spark NLP 6.0.5
setup Colab for PySpark 3.4.4 and Spark NLP 6.0.5
  Preparing metadata (setup.py) ... done
  Building wheel for pyspark (setup.py) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dataproc-spark-connect 0.8.2 requires pyspark[connect]~=3.5.1, but you have pyspark 3.4.4 which is incompatible.

Let's start Spark with Spark NLP included via our simple `start()` function

In [None]:
import sparknlp

spark = sparknlp.start()

print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)

Spark NLP version:  6.0.5
Apache Spark version:  3.4.4


- Let's use the `loadSavedModel` function in `Qwen2VLTransformer`, which allows us to load the OpenVINO model
- Most params will be set automatically. They can also be set later after loading the model in `Qwen2VLTransformer` during runtime, so don't worry about setting them now
- `loadSavedModel` accepts two params: the first is the path to the exported model, and the second is the SparkSession, that is, the `spark` variable we previously started via `sparknlp.start()`
- `setInputCols` and `setOutputCol` connect the annotator to the rest of the pipeline: here it reads from `image_assembler` and writes to `answer`
- NOTE: `loadSavedModel` accepts local paths in addition to distributed file systems such as `HDFS`, `S3`, `DBFS`, etc. This feature was introduced in the Spark NLP 4.2.2 release. Keep in mind the best and recommended way to move/share/reuse Spark NLP models is to use `write.save` so you can use `.load()` from any file system natively.


In [None]:
from pathlib import Path

model_id = "numind/NuExtract-2.0-2B"
model_path = Path(model_id.split("/")[-1])

In [None]:
from sparknlp.annotator import Qwen2VLTransformer

imageClassifier = Qwen2VLTransformer.loadSavedModel(str(model_path),spark) \
            .setInputCols("image_assembler") \
            .setOutputCol("answer")

Let's save it to disk so it is easier to move around and can be loaded later via the `.load` function

In [None]:
imageClassifier.write().overwrite().save(f"{model_id.replace('-','_')}_spark_nlp")
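
The save path is derived from `model_id`, which contains a slash, so the model ends up in a subdirectory named after the organization. The short sketch below only evaluates the same string expression to show the resulting path; `PurePosixPath` is used purely for display.

```python
from pathlib import PurePosixPath

model_id = "numind/NuExtract-2.0-2B"

# Same expression as in the save call: hyphens become underscores, then a suffix is added.
save_path = PurePosixPath(f"{model_id.replace('-', '_')}_spark_nlp")

print(save_path)         # numind/NuExtract_2.0_2B_spark_nlp
print(save_path.parent)  # numind
print(save_path.name)    # NuExtract_2.0_2B_spark_nlp
```

This is why the archive step later in the notebook points at `/content/numind/NuExtract_2.0_2B_spark_nlp` when running in Colab.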

Now let's see how we can use it on other machines, clusters, or any place you wish to use your new and shiny Qwen2VL model 😊

In [None]:
import os
from pathlib import Path
from pyspark.sql.functions import lit
from pyspark.ml import Pipeline
from sparknlp.annotator import *
from sparknlp.base import *

url1 = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
url2 = "http://images.cocodataset.org/val2017/000000039769.jpg"

Path("images").mkdir(exist_ok=True)

!wget -q -O images/image1.jpg {url1}
!wget -q -O images/image2.jpg {url2}

images_path = "file://" + os.getcwd() + "/images/"
image_df = spark.read.format("image").load(path=images_path)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n<|im_start|>assistant\n"
)
test_df = image_df.withColumn("text", lit(prompt))

image_assembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imageClassifier = Qwen2VLTransformer.load(f"{model_id.replace('-', '_')}_spark_nlp") \
    .setMaxOutputLength(50) \
    .setInputCols("image_assembler") \
    .setOutputCol("answer")

pipeline = Pipeline(stages=[image_assembler, imageClassifier])
model = pipeline.fit(test_df)

If you encounter an error at this step, try restarting the runtime. It's likely due to low RAM.
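
The prompt string used in this notebook follows a fixed chat layout: a system turn, a user turn that holds the image placeholder tokens followed by the question, and an open assistant turn. To ask different questions without retyping the special tokens, a small helper such as the one below can be used. The name `build_qwen2vl_prompt` is hypothetical, and the template is copied from the prompt in this notebook rather than from any official specification.

```python
def build_qwen2vl_prompt(question, system_message="You are a helpful assistant."):
    """Wrap a question in the single-image chat layout used by the prompts in this notebook."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
        f"{question}<|im_end|>\n<|im_start|>assistant\n"
    )


prompt = build_qwen2vl_prompt("Describe this image.")
print(prompt)
```

The returned string can be passed to `lit(...)` when building the `text` column or to `fullAnnotateImage` as the text argument.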

In [None]:
from sparknlp.base import LightPipeline

image_path = os.path.join(os.getcwd(), "images", "image1.jpg")

# Run inference with LightPipeline (for fast, local inference on small inputs)
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n<|im_start|>assistant\n"
)

light_pipeline = LightPipeline(model)
annotations_result = light_pipeline.fullAnnotateImage(image_path, prompt)

for result in annotations_result:
    print(result["answer"])

[Annotation(document, 0, 234, The image shows a cat lying inside a cardboard box. The cat has a relaxed posture, with its paws tucked under its body and its head resting on its front paws. The box is positioned on a light-colored carpet, and the background includes, Map(), [])]
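
Each entry in `annotations_result` maps the output column to a list of `Annotation` objects, and the generated text sits in the `result` field. The sketch below pulls out just the text. The helper name `answer_texts` is hypothetical, and a `SimpleNamespace` stands in for a real `Annotation` so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def answer_texts(annotations_result, output_col="answer"):
    """Collect the generated text from every annotation in the given output column."""
    return [ann.result for row in annotations_result for ann in row[output_col]]


# Stand-in for the structure returned by fullAnnotateImage (assumption: objects expose `.result`).
fake_result = [
    {"answer": [SimpleNamespace(result="The image shows a cat lying inside a cardboard box.")]}
]
print(answer_texts(fake_result))
```

The answer printed above ends mid-sentence, most likely because `setMaxOutputLength(50)` caps the number of generated tokens; raising that value should give longer descriptions.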


That's it! You can now go wild and use hundreds of Qwen2VL models from HuggingFace 🤗 in Spark NLP 🚀


Additionally, you can zip the model and use it locally using `.load()`

In [None]:
import shutil

NEW_MODEL_NAME = "nuextract_2.0_2B"

MODEL_PATH_ZIP = shutil.make_archive(
    base_name=NEW_MODEL_NAME,
    format='zip',
    root_dir='/content/numind/NuExtract_2.0_2B_spark_nlp'
)
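
On the target machine, the archive has to be unpacked before the directory can be passed to `.load()`. The sketch below shows the round trip with the standard library only. The helper name `unpack_model` and the throwaway directory contents are made up for the demonstration, and the final `Qwen2VLTransformer.load` call is left as a comment because it needs a running Spark session.

```python
import shutil
import tempfile
from pathlib import Path


def unpack_model(zip_path, target_dir):
    """Extract a zipped Spark NLP model directory and return the extraction path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(zip_path), str(target), format="zip")
    return target


# Round-trip demonstration on a throwaway directory that mimics a saved model layout.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "saved_model"
    (source / "metadata").mkdir(parents=True)
    (source / "metadata" / "part-00000").write_text("{}")

    archive = shutil.make_archive(str(Path(tmp) / "demo_model"), "zip", root_dir=str(source))
    restored = unpack_model(archive, Path(tmp) / "restored")
    unpacked_files = sorted(
        p.relative_to(restored).as_posix() for p in restored.rglob("*") if p.is_file()
    )

print(unpacked_files)
# With the real archive, the next step would be:
# Qwen2VLTransformer.load(str(restored))
```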

In [None]:
!unzip -l nuextract_2.0_2B.zip

Archive:  nuextract_2.0_2B.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2025-07-18 21:37   fields/
        0  2025-07-18 21:37   metadata/
    23572  2025-07-18 21:38   .openvino_vision_embeddings_model.xml.crc
  3015902  2025-07-18 21:38   openvino_vision_embeddings_model.xml
  3647056  2025-07-18 21:39   .openvino_text_embeddings_model.xml.crc
  2747308  2025-07-18 21:38   .openvino_vision_embeddings_merger_model.xml.crc
  7176132  2025-07-18 21:38   .openvino_language_model.xml.crc
466821888  2025-07-18 21:39   openvino_text_embeddings_model.xml
      200  2025-07-18 21:37   .openvino_patch_reshape_model.xml.crc
    30738  2025-07-18 21:39   openvino_rotary_embeddings_model.xml
      252  2025-07-18 21:39   .openvino_rotary_embeddings_model.xml.crc
    24299  2025-07-18 21:37   openvino_patch_reshape_model.xml
918543427  2025-07-18 21:38   openvino_language_model.xml
    10423  2025-07-18 21:39   openvino_multimodal_merge_model.xml
351653924  