![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/openvino/HuggingFace_OpenVINO_in_Spark_NLP_MiniLM.ipynb)

# Import OpenVINO MiniLM models from HuggingFace 🤗 into Spark NLP 🚀

This notebook provides a detailed walkthrough on optimizing and importing MiniLM models from HuggingFace for use in Spark NLP, with [Intel OpenVINO toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html). The focus is on converting the model to the OpenVINO format and applying precision optimizations (INT8 and INT4) to enhance performance and efficiency on CPU platforms using [Optimum Intel](https://huggingface.co/docs/optimum/main/en/intel/inference).

Let's keep in mind a few things before we start 😊

- OpenVINO support was introduced in `Spark NLP 5.4.0`, enabling high-performance CPU inference for models. So please make sure you have upgraded to the latest Spark NLP release.
- Model quantization is a computationally expensive process, so it is recommended to use a runtime with more than 32GB of memory for exporting the quantized model from HuggingFace.
- You can import MiniLM models via `MiniLMEmbeddings`. These models are usually listed under the `Sentence Similarity` category and have `MiniLM` in their names.
- You can find some [example models](https://huggingface.co/models?search=MiniLM) on HuggingFace.
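
Before exporting, you can check how much memory the runtime has and compare it with the recommendation above. This is a minimal sketch that relies on `os.sysconf`, which is only available on Unix-like systems; the helper name `total_memory_gib` is our own, and it returns `None` wherever the value can't be determined.

```python
import os


def total_memory_gib():
    """Return total physical RAM in GiB, or None if it can't be determined."""
    try:
        # Number of physical memory pages and the size of each page in bytes
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; unknown names raise ValueError
        return None
    if pages < 0 or page_size < 0:
        # sysconf returns -1 when the limit is indeterminate
        return None
    return pages * page_size / 1024**3


memory = total_memory_gib()
print("Total memory: unknown" if memory is None else f"Total memory: {memory:.1f} GiB")
```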

## 1. Export and Save the HuggingFace model

- Let's install the `transformers` and `openvino` packages along with other dependencies. You don't need `openvino` installed to use Spark NLP; however, we need it to load and save models from HuggingFace.
- We lock `transformers` on version `4.41.2`. This doesn't mean it won't work with future releases, but we wanted you to know which versions have been tested successfully.
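
Once the install cell has run, you can confirm which versions actually ended up in the environment, for example to compare `transformers` against the pinned version. A minimal sketch using the standard library's `importlib.metadata`; the helper `installed_versions` is hypothetical, and the names passed to it are the PyPI distribution names of packages installed in this notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(
    ["transformers", "openvino", "optimum-intel", "nncf"]
).items():
    print(f"{name}: {version or 'not installed'}")
```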

In [None]:
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [None]:
# Install OpenVINO, NNCF and Optimum Intel (plus helper packages) for model export and optimization
import platform

%pip install -q "einops" "torch>2.1" "torchvision" "matplotlib>=3.4" "timm>=0.9.8" "transformers==4.41.2" "pillow" "gradio>=4.19" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q -U --pre "openvino>=2025.0" "openvino-tokenizers>=2025.0" "openvino-genai>=2025.0" --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
%pip install -q "accelerate" "nncf>=2.14.0" "git+https://github.com/huggingface/optimum-intel.git" --extra-index-url https://download.pytorch.org/whl/cpu

if platform.system() == "Darwin":
    %pip install -q "numpy<2.0.0"

[Optimum Intel](https://github.com/huggingface/optimum-intel?tab=readme-ov-file#openvino) is the interface between the Transformers library and the various model optimization and acceleration tools provided by Intel. HuggingFace models loaded with optimum-intel are automatically optimized for OpenVINO, while being compatible with the Transformers API. It also offers the ability to perform weight compression during export.
- To load a HuggingFace model directly for inference/export, just replace the `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. We can use this to import and export OpenVINO models with `from_pretrained` and `save_pretrained`.
- By setting `export=True`, the source model is converted to OpenVINO IR format on the fly.
- We'll use the [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model from HuggingFace as an example.
- In addition to the MiniLM model, we also need to save the tokenizer and config. This is the same for every model: these are the assets needed for tokenization inside Spark NLP.
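
The bullets above describe the Python API, while the export cell in this notebook uses the `optimum-cli` command. As a sketch of the Python route, assuming a recent `optimum-intel` where `OVModelForFeatureExtraction` is the OpenVINO counterpart of `AutoModel` for embedding models, an export without weight compression (FP32) could look like this. The function name `export_minilm_openvino` and the output path in the usage note are our own choices; the imports sit inside the function so the cell can be defined even before `optimum-intel` is installed.

```python
def export_minilm_openvino(model_id: str, output_dir: str) -> None:
    """Export a HuggingFace embedding model to OpenVINO IR plus Spark NLP assets."""
    # Imported lazily so this function can be defined without optimum-intel installed
    from optimum.intel import OVModelForFeatureExtraction
    from transformers import AutoConfig, AutoTokenizer

    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
    model = OVModelForFeatureExtraction.from_pretrained(model_id, export=True)
    model.save_pretrained(output_dir)

    # Tokenizer and config go into the assets folder, as in the INT4 export
    AutoTokenizer.from_pretrained(model_id).save_pretrained(f"{output_dir}/assets")
    AutoConfig.from_pretrained(model_id).save_pretrained(f"{output_dir}/assets")
```

Usage: `export_minilm_openvino("sentence-transformers/all-MiniLM-L6-v2", "./models/fp32/sentence-transformers/all-MiniLM-L6-v2")`.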

### Exporting to OpenVINO IR in INT4 Precision

Passing `--weight-format int4` applies 4-bit quantization to the model weights.
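
The introduction mentions both INT8 and INT4; only the `--weight-format` value and the output folder change between them. A small sketch that builds the same `optimum-cli` command for either precision, so you can choose it in one place. The helper `build_export_command` and the `./models/<format>/<model_id>` layout mirror the export cell but are our own convention, and the CLI accepts more formats than the four listed here.

```python
import shlex

# Subset of `optimum-cli export openvino --weight-format` values discussed here
WEIGHT_FORMATS = ("fp32", "fp16", "int8", "int4")


def build_export_command(model_id, weight_format="int4", root="./models"):
    """Return the optimum-cli argument list and the output directory it writes to."""
    if weight_format not in WEIGHT_FORMATS:
        raise ValueError(f"weight_format must be one of {WEIGHT_FORMATS}, got {weight_format!r}")
    output_dir = f"{root}/{weight_format}/{model_id}"
    cmd = [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--weight-format", weight_format,
        output_dir,
    ]
    return cmd, output_dir


cmd, int8_dir = build_export_command("sentence-transformers/all-MiniLM-L6-v2", "int8")
print(shlex.join(cmd))
# To actually run the export (requires optimum-intel):
#   import subprocess; subprocess.run(cmd, check=True)
```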

In [None]:
model_id = "sentence-transformers/all-MiniLM-L6-v2"
output_dir = f"./models/int4/{model_id}"

!optimum-cli export openvino --model {model_id} --weight-format int4 {output_dir}


In [None]:
from transformers import AutoTokenizer, AutoConfig

model_id = "sentence-transformers/all-MiniLM-L6-v2"
output_dir = f"./models/int4/{model_id}"

tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

tokenizer.save_pretrained(f"{output_dir}/assets")
config.save_pretrained(f"{output_dir}/assets")

In [None]:
!ls -l {output_dir}

In [None]:
!ls -l {output_dir}/assets
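
Besides reading the `ls` output, you can check programmatically that the expected files are present. This is a sketch under assumptions: the names below (`openvino_model.xml`/`openvino_model.bin` from the Optimum export, and `vocab.txt`, `tokenizer_config.json`, `config.json` from the `save_pretrained` calls) are what we expect for a BERT-style WordPiece tokenizer such as MiniLM's, not a list taken from the Spark NLP source, so adjust it if your `ls` output differs. The helper `missing_export_files` is hypothetical.

```python
from pathlib import Path

# Assumed layout: OpenVINO IR at the top level, tokenizer/config files in assets/
EXPECTED_FILES = (
    "openvino_model.xml",
    "openvino_model.bin",
    "assets/vocab.txt",
    "assets/tokenizer_config.json",
    "assets/config.json",
)


def missing_export_files(export_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under export_dir."""
    root = Path(export_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Usage: `print(missing_export_files(output_dir) or "All expected files found")`.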

## 2. Import and Save MiniLM in Spark NLP

- Let's install and set up Spark NLP in Google Colab
- This part is pretty easy via our simple script

In [None]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Let's start Spark with Spark NLP included via our simple `start()` function

In [None]:
import sparknlp

# let's start Spark with Spark NLP
spark = sparknlp.start()

- Let's use the `loadSavedModel` function in `MiniLMEmbeddings`, which allows us to load the OpenVINO model.
- Most params will be set automatically. They can also be set later, after loading the model in `MiniLMEmbeddings` during runtime, so don't worry about setting them now.
- `loadSavedModel` accepts two params: the first is the path to the exported model, and the second is the SparkSession, i.e. the `spark` variable we previously started via `sparknlp.start()`.
- NOTE: `loadSavedModel` accepts local paths in addition to distributed file systems such as `HDFS`, `S3`, `DBFS`, etc. This feature was introduced in the Spark NLP 4.2.2 release. Keep in mind the best and recommended way to move/share/reuse Spark NLP models is to use `write.save` so you can use `.load()` from any file system natively.
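
Because `loadSavedModel`, `save` and `.load()` accept both local paths and distributed file system URIs, it can help to normalize paths in one place. A minimal sketch, assuming plain local paths should become `file://` URIs (like the `file:///tmp/...` paths used in this notebook) and anything that already has a scheme such as `hdfs://`, `s3://` or `dbfs:/` is passed through unchanged; the helper `to_spark_uri` is hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse


def to_spark_uri(path):
    """Turn a local path into a file:// URI; leave URIs with a scheme untouched."""
    scheme = urlparse(path).scheme
    # A one-letter "scheme" is most likely a Windows drive letter, e.g. C:\models
    if len(scheme) > 1:
        return path
    # absolute() (unlike resolve()) keeps symlinked paths such as /tmp as written
    return Path(path).absolute().as_uri()
```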

In [None]:
from sparknlp.annotator import *

MiniLM = MiniLMEmbeddings \
    .loadSavedModel(str(output_dir), spark) \
    .setInputCols(["documents"]) \
    .setOutputCol("minilm")

Let's save it to disk so it is easier to move around and to reuse later via the `.load` function.

In [10]:
MiniLM.write().overwrite().save(f"file:///tmp/{model_id}_spark_nlp")

Let's clean up stuff we don't need anymore
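
For example, once the Spark NLP copy has been saved, the intermediate OpenVINO export under `./models` is no longer needed. A minimal sketch with `shutil.rmtree`; which directory you delete is your call, and the helper `remove_dir` is hypothetical.

```python
import shutil
from pathlib import Path


def remove_dir(path):
    """Delete a directory tree if it exists; return True if something was removed."""
    target = Path(path)
    if not target.is_dir():
        return False
    shutil.rmtree(target)  # permanent: nothing goes to a trash/recycle bin
    return True
```

Usage: `remove_dir(output_dir)`, run only after the `write().overwrite().save(...)` cell has finished.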

Awesome 😎!

This is your OpenVINO MiniLM model from HuggingFace 🤗 loaded and saved by Spark NLP 🚀

In [8]:
model_id = "sentence-transformers/all-MiniLM-L6-v2"


In [11]:
! ls -l /tmp/{model_id}_spark_nlp

total 17872
drwxr-xr-x 3 prabod prabod     4096 Jun 23 09:48 fields
drwxr-xr-x 2 prabod prabod     4096 Jun 23 09:48 metadata
-rw-r--r-- 1 prabod prabod 18291023 Jun 23 09:48 minilm_openvino


Now let's see how we can use it on other machines, clusters, or any place you wish to use your new and shiny MiniLM model 😊

In [None]:
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start (or reuse) a SparkSession with Spark NLP on this machine
spark = sparknlp.start()

model_id = "sentence-transformers/all-MiniLM-L6-v2"

test_data = spark.createDataFrame([
    [1, "This is a sample sentence for embedding generation."],
    [2, "Another example sentence to demonstrate MiniLM embeddings."],
    [3, "MiniLM is a lightweight and efficient sentence embedding model that can generate text embeddings for various NLP tasks."],
    [4, "The model achieves comparable results with BERT-base while being much smaller and faster."]
]).toDF("id", "text")

# Turn raw text into Spark NLP DOCUMENT annotations
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")

# Load the model saved earlier with write().save()
MiniLM = (
    MiniLMEmbeddings.load(f"file:///tmp/{model_id}_spark_nlp")
    .setInputCols(["documents"])
    .setOutputCol("minilm")
)

pipeline = Pipeline().setStages([document_assembler, MiniLM])
results = pipeline.fit(test_data).transform(test_data)

results.select("minilm.result").show(truncate=False)
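
The `minilm.result` column shows the annotated text, while the vectors themselves are typically stored in the `embeddings` field of each annotation. To compare sentences you could collect them to the driver, e.g. `rows = results.select("id", "minilm.embeddings").collect()`, and compute cosine similarity. A plain-Python sketch of that last step; `cosine_similarity` is our own helper, and for large datasets you would rather compute similarities inside Spark than on the driver.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Usage, assuming one document annotation per row as produced by `DocumentAssembler`: `vectors = {row["id"]: row["embeddings"][0] for row in rows}`, then `cosine_similarity(vectors[1], vectors[2])`.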

That's it! You can now go wild and use hundreds of MiniLM models from HuggingFace 🤗 in Spark NLP 🚀
