![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_OLMO.ipynb)

## Import ONNX OLMO models from HuggingFace 🤗 into Spark NLP 🚀

Let's keep in mind a few things before we start 😊

- ONNX support was introduced in `Spark NLP 5.0.0`, enabling high-performance inference for models.
- You can import OLMO models via `OLMOModel`. These models are usually under the `Text Generation` category and have `OLMO` in their labels.
- This is a very computationally expensive module, especially on longer sequences. Using an accelerator such as a GPU is recommended.
- Reference: [OLMOModel](https://huggingface.co/docs/transformers/en/model_doc/OLMO)
- Some [example models](https://huggingface.co/models?other=OLMO)

## Export and Save HuggingFace model

- Let's install the `transformers` package with the `onnx` extension and its dependencies. You don't need `onnx` installed for Spark NLP; however, we need it to load and save models from HuggingFace.
- We lock `transformers` on version `4.41.0`. This doesn't mean it won't work with future releases.
- We will also need `sentencepiece` for tokenization.

In [None]:
!pip install -q --upgrade transformers[onnx]==4.41.0
!pip install optimum sentencepiece onnx onnxruntime ai2-olmo
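Since only `transformers` is pinned above, it can help to record which versions of the other packages were actually installed. A minimal sketch using only the standard library; the `installed_version` helper name is ours and is not part of any of these packages:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages installed by the cell above
for name in ["transformers", "optimum", "sentencepiece", "onnx", "onnxruntime", "ai2-olmo"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Keeping this output next to the exported model makes it easier to reproduce the export later.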

- HuggingFace has an extension called Optimum which offers specialized model inference, including ONNX. We can use this to import and export ONNX models with `from_pretrained` and `save_pretrained`.
- We'll use the [allenai/OLMo-1B-hf](https://huggingface.co/allenai/OLMo-1B-hf) model from HuggingFace as an example.
- In addition to the `OLMO` model, we also need to save the tokenizer. This is the same for every model: these are assets needed for tokenization inside Spark NLP.
- If we want to optimize the model, a GPU will be needed. Make sure to select the correct runtime.
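To check whether ONNX Runtime can actually see a GPU in the selected runtime, you can inspect its execution providers. This is a small sketch; `has_cuda_provider` is a helper name we made up. Note that the CUDA provider is only listed when the GPU build of ONNX Runtime (`onnxruntime-gpu`) is installed, whereas the plain `onnxruntime` package installed above is CPU-only:

```python
def has_cuda_provider(providers):
    """True if ONNX Runtime's CUDA execution provider is among `providers`."""
    return "CUDAExecutionProvider" in providers


try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed in this environment
    available = []

print("Available providers:", available)
print("CUDA usable by ONNX Runtime:", has_cuda_provider(available))
```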

In [3]:
import transformers
MODEL_NAME = "allenai/OLMo-1B-hf"


# Path to store the exported models
EXPORT_PATH = f"onnx_models/{MODEL_NAME}"

In [4]:
!optimum-cli export onnx  --trust-remote-code --task text-generation --model {MODEL_NAME} {EXPORT_PATH} 

config.json: 100%|█████████████████████████████| 632/632 [00:00<00:00, 38.6kB/s]
model.safetensors: 100%|███████████████████| 4.71G/4.71G [03:24<00:00, 23.1MB/s]
generation_config.json: 100%|██████████████████| 116/116 [00:00<00:00, 12.9kB/s]
tokenizer_config.json: 100%|████████████████| 5.37k/5.37k [00:00<00:00, 698kB/s]
tokenizer.json: 100%|██████████████████████| 2.12M/2.12M [00:00<00:00, 2.45MB/s]
special_tokens_map.json: 100%|███████████████| 65.0/65.0 [00:00<00:00, 25.5kB/s]
  if sequence_length != 1:
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
		-[x] values not close enough, max diff: 0.0007228851318359375 (atol: 0.0001)
- logits: max diff = 0.0007228851318359375.
 The exported model was saved at: onnx_models/allenai/OLMo-1B-hf
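The `-[x] values not close enough` line in the log above reports that the largest difference between the reference logits and the ONNX logits (about 0.00072) is above the absolute tolerance of 0.0001. The export still completes and the model is saved, as the last log line shows. The sketch below illustrates that kind of comparison with made-up numbers; it is not Optimum's validation code:

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equal-length sequences."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Made-up logits, chosen only to illustrate the check
reference_logits = [1.2500, -0.3000, 0.7000]
onnx_logits = [1.2507, -0.3001, 0.7000]

atol = 1e-4  # tolerance reported in the log above
diff = max_abs_diff(reference_logits, onnx_logits)
print(f"max diff = {diff:.6f}, within tolerance: {diff <= atol}")
```

Whether a difference of this size matters depends on your use case; comparing a few generations from the original and exported models is a reasonable sanity check.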


Let's have a look inside this directory and see what we are dealing with:

In [5]:
!ls -l {EXPORT_PATH}

total 5001720
-rw-rw-r-- 1 prabod prabod        646 Feb 12 03:51 config.json
-rw-rw-r-- 1 prabod prabod        111 Feb 12 03:51 generation_config.json
-rw-rw-r-- 1 prabod prabod     468660 Feb 12 03:52 model.onnx
-rw-rw-r-- 1 prabod prabod 5119148032 Feb 12 03:52 model.onnx_data
-rw-rw-r-- 1 prabod prabod        293 Feb 12 03:51 special_tokens_map.json
-rw-rw-r-- 1 prabod prabod       5372 Feb 12 03:51 tokenizer_config.json
-rw-rw-r-- 1 prabod prabod    2115417 Feb 12 03:51 tokenizer.json


- As you can see, the export does not include everything Spark NLP needs for tokenization. We need to save the tokenizer files (such as the vocabulary and merges) into an `assets` folder, which is where Spark NLP will look for them.

In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
from pathlib import Path
model_id = 'allenai/OLMo-1B-hf'

tokenizer = AutoTokenizer.from_pretrained(model_id,trust_remote_code=True)
config = AutoConfig.from_pretrained(model_id,trust_remote_code=True)


ASSETS_PATH = f"{EXPORT_PATH}/assets"



# make sure the directory exists
Path(ASSETS_PATH).mkdir(parents=True, exist_ok=True)

config.save_pretrained(ASSETS_PATH)
tokenizer.save_vocabulary(ASSETS_PATH)

tokenizer.save_pretrained(ASSETS_PATH)



('onnx_models/allenai/OLMo-1B-hf/assets/tokenizer_config.json',
 'onnx_models/allenai/OLMo-1B-hf/assets/special_tokens_map.json',
 'onnx_models/allenai/OLMo-1B-hf/assets/tokenizer.json')

In [5]:
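# merges.txt is normally written to assets by save_vocabulary above; move it only if it was left in the export folder
! mkdir -p {EXPORT_PATH}/assets
! [ -f {EXPORT_PATH}/merges.txt ] && mv -t {EXPORT_PATH}/assets {EXPORT_PATH}/merges.txt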

In [7]:
import json

# Convert vocab.json (token -> id) into vocab.txt with one token per line, ordered by id
with open(f"{ASSETS_PATH}/vocab.json", "r", encoding="utf-8") as f:
    vocab_json = json.load(f)

vocab = [""] * len(vocab_json)
for word, idx in vocab_json.items():
    vocab[idx] = word

with open(f"{ASSETS_PATH}/vocab.txt", "w", encoding="utf-8") as f:
    f.writelines(f"{word}\n" for word in vocab)
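The conversion above relies on line `i` of `vocab.txt` holding the token whose id is `i`. This sketch runs the same inversion on a tiny made-up vocabulary (the tokens and ids are hypothetical, and `vocab_to_lines` is our own helper name) and verifies that the mapping survives the round trip:

```python
def vocab_to_lines(vocab_json):
    """Order tokens by id so that line i of vocab.txt holds the token with id i."""
    lines = [""] * len(vocab_json)
    for token, idx in vocab_json.items():
        lines[idx] = token
    return lines


# Hypothetical toy vocabulary, standing in for the real vocab.json
toy_vocab = {"<|endoftext|>": 0, "Ġthe": 1, "Ġmodel": 2, "ing": 3}
lines = vocab_to_lines(toy_vocab)

# Rebuild the token -> id mapping from line positions and compare
rebuilt = {token: idx for idx, token in enumerate(lines)}
assert rebuilt == toy_vocab
print(lines)
```

Like the original cell, this assumes the ids are contiguous and start at 0; a vocabulary with gaps would raise an `IndexError`.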

In [8]:
!ls -l {EXPORT_PATH}/assets

total 3716
-rw-rw-r-- 1 prabod prabod     673 Feb 12 03:59 config.json
-rw-rw-r-- 1 prabod prabod  456598 Feb 12 03:59 merges.txt
-rw-rw-r-- 1 prabod prabod     293 Feb 12 03:59 special_tokens_map.json
-rw-rw-r-- 1 prabod prabod    5372 Feb 12 03:59 tokenizer_config.json
-rw-rw-r-- 1 prabod prabod 2115417 Feb 12 03:59 tokenizer.json
-rw-rw-r-- 1 prabod prabod  799451 Feb 12 03:59 vocab.json
-rw-rw-r-- 1 prabod prabod  407614 Feb 12 04:00 vocab.txt


In [9]:
import onnx
# from onnxruntime import quantization as ort_quantization
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

Path(f'onnx_models/{model_id}_int4').mkdir(parents=True, exist_ok=True)

model = onnx.load_model(f"onnx_models/{model_id}/model.onnx", load_external_data=True)
quant = MatMul4BitsQuantizer(
    model=model,
    block_size=32,
    is_symmetric=True,
    nodes_to_exclude=[],
)
quant.process()
quant.model.save_model_to_file(f'onnx_models/{model_id}_int4/model.onnx', use_external_data_format=True)

2025-02-12 04:30:03,971 onnxruntime.quantization.matmul_4bits_quantizer [INFO] - start to quantize /model/layers.0/self_attn/q_proj/MatMul ...
2025-02-12 04:30:03,994 onnxruntime.quantization.matmul_4bits_quantizer [INFO] - complete quantization of /model/layers.0/self_attn/q_proj/MatMul ...
2025-02-12 04:30:03,995 onnxruntime.quantization.matmul_4bits_quantizer [INFO] - start to quantize /model/layers.0/self_attn/k_proj/MatMul ...
2025-02-12 04:30:04,016 onnxruntime.quantization.matmul_4bits_quantizer [INFO] - complete quantization of /model/layers.0/self_attn/k_proj/MatMul ...
2025-02-12 04:30:04,017 onnxruntime.quantization.matmul_4bits_quantizer [INFO] - start to quantize /model/layers.0/self_attn/v_proj/MatMul ...
2025-02-12 04:30:04,039 onnxruntime.quantization.matmul_4bits_quantizer [INFO] - complete quantization of /model/layers.0/self_attn/v_proj/MatMul ...
2025-02-12 04:30:04,041 onnxruntime.quantization.matmul_4bits_quantizer [INFO] - start to quantize /model/layers.0/self_a
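To give an intuition for what `block_size=32` and `is_symmetric=True` mean, here is a pure-Python sketch of symmetric 4-bit quantization applied to a single block of 32 weights. This is not ONNX Runtime's implementation (which works on the MatMul weight tensors and packs the values); the function names and the weights are made up for illustration:

```python
def quantize_block(weights, bits=4):
    """Symmetric quantization of one block: a shared scale plus small signed integers."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_block(quantized, scale):
    """Map the integers back to approximate floating-point weights."""
    return [q * scale for q in quantized]


block = [0.02 * (i - 16) for i in range(32)]  # 32 made-up weights: one block
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)
worst = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale:.4f}, worst-case error={worst:.4f}")
```

Each block keeps its own scale, so the rounding error of any weight is bounded by about half of that block's scale. Smaller blocks track the weights more closely at the cost of storing more scales.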

In [10]:
model_id = 'allenai/OLMo-1B-hf'

In [None]:
import onnx
model = onnx.load(f"onnx_models/{model_id}_int4/model.onnx")
EXPORT_PATH = f"onnx_models/{model_id}_int4"
onnx.save_model(model, f"{EXPORT_PATH}/decoder_model.onnx", save_as_external_data=True, all_tensors_to_one_file=True, location="_olmo_decoder_model.onnx_data", size_threshold=1024, convert_attribute=False)


In [None]:
!rm -rf {EXPORT_PATH}/model.onnx {EXPORT_PATH}/model.onnx_data

In [10]:
#copy the assets
!cp -r onnx_models/{model_id}/assets onnx_models/{model_id}_int4/assets

In [12]:
!pip install /home/prabod/Projects/spark-nlp/python/dist/spark_nlp-5.5.3-py2.py3-none-any.whl pyspark==3.2.3

Processing /home/prabod/Projects/spark-nlp/python/dist/spark_nlp-5.5.3-py2.py3-none-any.whl
Collecting pyspark==3.2.3
  Using cached pyspark-3.2.3.tar.gz (281.5 MB)
  Preparing metadata (setup.py) ... done
Collecting py4j==0.10.9.5 (from pyspark==3.2.3)
  Using cached py4j-0.10.9.5-py2.py3-none-any.whl.metadata (1.5 kB)
spark-nlp is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
Using cached py4j-0.10.9.5-py2.py3-none-any.whl (199 kB)
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.2.3-py2.py3-none-any.whl size=281990715 sha256=ec075358b0ed3cc8cae95e6699c93f9e9949e54045ca13ced0d05052e0143361
  Stored in directory: /home/prabod/.cache/pip/wheels/cc/f4/8d/dfbbd536587311afde33711613a0c193f18e7d90b120801108
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully inst

## Import and Save OLMO in Spark NLP

- Let's install and setup Spark NLP in Google Colab
- This part is pretty easy via our simple script

In [8]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Installing PySpark 3.2.3 and Spark NLP 5.4.2
setup Colab for PySpark 3.2.3 and Spark NLP 5.4.2
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 281.5/281.5 MB 5.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55.6/55.6 kB 3.1 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 579.5/579.5 kB 29.8 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.7/199.7 kB 14.7 MB/s eta 0:00:00
  Building wheel for pyspark (setup.py) ... done


Let's start Spark with Spark NLP included via our simple `start()` function

In [9]:
import sparknlp
# let's start Spark with Spark NLP
spark = sparknlp.start()
print("Apache Spark version: {}".format(spark.version))

Collecting spark-nlp==5.5.0rc1
  Downloading spark_nlp-5.5.0rc1-py2.py3-none-any.whl.metadata (55 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55.8/55.8 kB 2.8 MB/s eta 0:00:00
Downloading spark_nlp-5.5.0rc1-py2.py3-none-any.whl (629 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 629.6/629.6 kB 17.0 MB/s eta 0:00:00
Installing collected packages: spark-nlp
  Attempting uninstall: spark-nlp
    Found existing installation: spark-nlp 5.4.2
    Uninstalling spark-nlp-5.4.2:
      Successfully uninstalled spark-nlp-5.4.2
Successfully installed

  self.pid = _posixsubprocess.fork_exec(


- Let's use the `loadSavedModel` function in `OLMoTransformer`, which allows us to load the ONNX model.
- Most params will be set automatically. They can also be set later, after loading the model in `OLMoTransformer` at runtime, so don't worry about setting them now.
- `loadSavedModel` accepts two params. The first is the path to the exported model. The second is the SparkSession, that is, the `spark` variable we previously started via `sparknlp.start()`.
- NOTE: `loadSavedModel` accepts local paths in addition to distributed file systems such as `HDFS`, `S3`, `DBFS`, etc. This feature was introduced in the Spark NLP 4.2.2 release. Keep in mind that the best and recommended way to move/share/reuse Spark NLP models is to use `write.save` so you can use `.load()` from any file system natively.
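Before calling `loadSavedModel`, it can save time to confirm that the export folder contains the files produced in the earlier steps. This is a sketch under the assumption that those files (the renamed decoder model, its external data file, and the tokenizer assets) are what the loader expects; `missing_export_files` is our own helper name, not a Spark NLP function:

```python
from pathlib import Path


def missing_export_files(export_path, required):
    """Return the entries of `required` that do not exist under `export_path`."""
    root = Path(export_path)
    return [name for name in required if not (root / name).exists()]


# File names taken from the steps above in this notebook
REQUIRED = [
    "decoder_model.onnx",
    "_olmo_decoder_model.onnx_data",
    "assets/vocab.txt",
    "assets/merges.txt",
]

# Falls back to the int4 folder used above if EXPORT_PATH is not defined
export_dir = globals().get("EXPORT_PATH", "onnx_models/allenai/OLMo-1B-hf_int4")
missing = missing_export_files(export_dir, REQUIRED)
print("Missing files:", missing or "none")
```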

In [3]:
from sparknlp.annotator import *

olmo = OLMoTransformer.loadSavedModel(EXPORT_PATH, spark)\
  .setInputCols(["documents"])\
  .setMaxOutputLength(100)\
  .setDoSample(False)\
  .setOutputCol("generation")

Could not extract bos_token_id from config.json, assigning default value -1




Let's save it on disk so it is easier to move around and can be used later via the `.load` function

In [4]:
olmo.write().overwrite().save(f"/tmp/{MODEL_NAME}_spark_nlp_int4")

                                                                                

Let's clean up stuff we don't need anymore

In [13]:
!rm -rf {EXPORT_PATH}

Awesome  😎 !

This is your ONNX OLMO model from HuggingFace 🤗  loaded and saved by Spark NLP 🚀

In [5]:
! ls -l /tmp/{MODEL_NAME}_spark_nlp_int4

total 1121168
-rw-r--r-- 1 prabod prabod     496159 Feb 12 11:54 decoder_model.onnx
drwxr-xr-x 5 prabod prabod       4096 Feb 12 11:54 fields
drwxr-xr-x 2 prabod prabod       4096 Feb 12 11:54 metadata
-rw-r--r-- 1 prabod prabod 1147568128 Feb 12 11:54 _olmo_decoder_model.onnx_data
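Comparing this listing with the one from the original export shows how much space the 4-bit quantization saved. The byte counts below are copied from the two `ls -l` outputs in this notebook:

```python
original_bytes = 5_119_148_032  # model.onnx_data from the original export
int4_bytes = 1_147_568_128      # _olmo_decoder_model.onnx_data after quantization

ratio = original_bytes / int4_bytes
print(f"original: {original_bytes / 1e9:.2f} GB, quantized: {int4_bytes / 1e9:.2f} GB")
print(f"size reduction: {ratio:.2f}x")
```

The quantized weights come out roughly 4.5 times smaller. The reduction is probably less than a naive 32-bit to 4-bit estimate would suggest because only the MatMul weights are quantized (as the quantizer log shows) and each block also stores its own scale.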


Now let's see how we can use it on other machines, clusters, or any place you wish to use your new and shiny OLMO model 😊

In [4]:
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

test_data = spark.createDataFrame([
    ["Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a " +
       "downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness" +
       " of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this " +
       "paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework " +
       "that converts all text-based language problems into a text-to-text format. Our systematic study compares " +
       "pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens " +
       "of language understanding tasks. By combining the insights from our exploration with scale and our new " +
       "Colossal Clean Crawled Corpus, we achieve state-of-the-art results on many benchmarks covering " +
       "summarization, question answering, text classification, and more. To facilitate future work on transfer " +
       "learning for NLP, we release our data set, pre-trained models, and code."]
]).toDF("text")


document_assembler = DocumentAssembler() \
    .setInputCol("text")\
    .setOutputCol("document")

olmo = OLMoTransformer.load(f"file:///tmp/{MODEL_NAME}_spark_nlp_int4")\
      .setInputCols(["document"])\
      .setMaxOutputLength(50)\
      .setDoSample(True)\
      .setTopK(50)\
      .setTemperature(0)\
      .setBatchSize(5)\
      .setNoRepeatNgramSize(3)\
      .setOutputCol("generation")

pipeline = Pipeline().setStages([document_assembler, olmo])

result = pipeline.fit(test_data).transform(test_data)
result.show(truncate=False)

                                                                                

Using CPUs




+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
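The full `show()` output is very wide because it prints every field of each annotation. To look only at the generated text, you can select the `result` field of the output column. A small sketch, where `generated_text_expr` is a helper name we made up and `generation` is the output column set above:

```python
def generated_text_expr(output_col="generation"):
    """SQL expression that flattens an annotation column into one text row per annotation."""
    return f"explode({output_col}.result) as generated_text"


# Usage with the `result` DataFrame from the cell above (needs a running Spark session):
# result.selectExpr(generated_text_expr()).show(truncate=False)
print(generated_text_expr())
```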

                                                                                

That's it! You can now go wild and use hundreds of OLMO models from HuggingFace 🤗 in Spark NLP 🚀
