![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/openvino/HuggingFace_OpenVINO_in_Spark_NLP_CoHere.ipynb)

# Import OpenVINO CoHere models from HuggingFace 🤗 into Spark NLP 🚀

This notebook walks through optimizing and importing CoHere models from HuggingFace for use in Spark NLP with the [Intel OpenVINO toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html). The focus is on converting the model to the OpenVINO format and applying precision optimizations (INT8 and INT4) to improve performance and efficiency on CPU platforms using [Optimum Intel](https://huggingface.co/docs/optimum/main/en/intel/inference).

Let's keep in mind a few things before we start 😊

- OpenVINO support was introduced in `Spark NLP 5.4.0`, enabling high-performance CPU inference for models. Please make sure you have upgraded to the latest Spark NLP release.
- Model quantization is computationally expensive, so we recommend a runtime with more than 32GB of memory for exporting the quantized model from HuggingFace.
- You can import CoHere models via `CoHereModel`. These models are usually under the `Text Generation` category and have `CoHere` in their labels.
- Reference: [CoHereModel](https://huggingface.co/docs/transformers/model_doc/CoHereTransformer#transformers.CoHereModel)
- Some [example models](https://huggingface.co/models?search=CoHere)

## 1. Export and Save the HuggingFace model

- Let's install the `transformers` and `openvino` packages along with other dependencies. You don't need `openvino` installed for Spark NLP; however, we need it to load and save models from HuggingFace.
- We lock `transformers` on version `4.41.2`. This doesn't mean it won't work with future releases, but we wanted you to know which versions have been tested successfully. Note that the install command below sets only a lower bound (`transformers>=4.39.1`), so pin the version explicitly if you want to reproduce the tested setup.

In [1]:
%pip install -q "nncf>=2.14.0" "torch>=2.3" "transformers>=4.39.1" "accelerate" "pillow" "gradio>=4.26" "datasets>=2.14.6" "tqdm" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q -U "openvino>=2024.5.0" "openvino-tokenizers>=2024.5.0" "openvino-genai>=2024.5"
%pip install -q "git+https://github.com/huggingface/optimum-intel.git" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q ipywidgets

Note: you may need to restart the kernel to use updated packages.
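
Because the install commands above set only lower bounds, it can help to record which versions were actually resolved in your runtime. The following is a minimal sketch using only the standard library; the package list and the helper name `installed_versions` are assumptions for illustration, based on the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of distributions worth recording; adjust to your environment.
PACKAGES = ["transformers", "optimum-intel", "openvino", "nncf", "torch"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Keeping this output next to the exported model makes it easier to tell later which toolchain produced it.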


In [3]:
from huggingface_hub import notebook_login
notebook_login()


[Optimum Intel](https://github.com/huggingface/optimum-intel?tab=readme-ov-file#openvino) is the interface between the Transformers library and the various model optimization and acceleration tools provided by Intel. HuggingFace models loaded with optimum-intel are automatically optimized for OpenVINO, while being compatible with the Transformers API. It also offers the ability to perform weight compression during export.
- To load a HuggingFace model directly for inference/export, just replace the `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. We can use this to import and export OpenVINO models with `from_pretrained` and `save_pretrained`.
- By setting `export=True`, the source model is converted to OpenVINO IR format on the fly.
- We'll use the [CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01) model from HuggingFace as an example.
- In addition to `CoHereModel`, we also need to save the tokenizer. This is the same for every model: these are the assets needed for tokenization inside Spark NLP.
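
The Python API route described in the list above can be sketched as follows. This is an illustrative alternative to the `optimum-cli` export that this notebook actually runs, not a replacement for it. The function name `export_int4` is hypothetical, and the quantization settings (INT4, group size 128, ratio 1.0, all layers) are assumed to correspond to the CLI flags used later.

```python
from pathlib import Path


def export_int4(model_id: str, output_dir: str) -> Path:
    """Export a HuggingFace causal LM to OpenVINO IR with INT4 weights (sketch)."""
    # Imported lazily so the function can be defined before the packages are installed.
    from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
    from transformers import AutoTokenizer

    out = Path(output_dir)
    quantization_config = OVWeightQuantizationConfig(
        bits=4, group_size=128, ratio=1.0, all_layers=True
    )

    # export=True converts the source checkpoint to OpenVINO IR on the fly.
    model = OVModelForCausalLM.from_pretrained(
        model_id, export=True, quantization_config=quantization_config
    )
    model.save_pretrained(out)

    # The tokenizer files are needed later for tokenization inside Spark NLP.
    AutoTokenizer.from_pretrained(model_id).save_pretrained(out)
    return out


# Example call, left commented out because it downloads and quantizes the full model:
# export_int4("CohereForAI/c4ai-command-r-v01", "c4ai-command-r-v01/INT4")
```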

### Exporting to OpenVINO IR in INT4 Precision

In [4]:
import requests
from pathlib import Path


utility_files = ["notebook_utils.py", "cmd_helper.py"]

for utility in utility_files:
    local_path = Path(utility)
    if not local_path.exists():
        r = requests.get(
            url=f"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/{local_path.name}",
        )
        # fail early instead of writing an HTTP error page to disk
        r.raise_for_status()
        with local_path.open("w") as f:
            f.write(r.text)

In [5]:
from cmd_helper import optimum_cli

model_id = "CohereForAI/c4ai-command-r-v01"
model_path = Path(model_id.split("/")[-1]) / "INT4"

model_path = "/mnt/research" / model_path
if not model_path.exists():
    optimum_cli(
        model_id,
        model_path,
        additional_args={"weight-format": "int4", "task": "text-generation-with-past", "group-size": "128", "ratio": "1", "all-layers": ""},
    )

**Export command:**

`optimum-cli export openvino --model CohereForAI/c4ai-command-r-v01 /mnt/research/c4ai-command-r-v01/INT4 --weight-format int4 --task text-generation-with-past --group-size 128 --ratio 1 --all-layers`

Loading checkpoint shards: 100%|██████████| 15/15 [00:03<00:00,  4.13it/s]
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


INFO:nncf:Statistics of the bitwidth distribution:
┍━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Weight compression mode   │ % all parameters (layers)   │ % ratio-defining parameters (layers)   │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ int4_asym                 │ 100% (281 / 281)            │ 100% (281 / 281)                       │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 0:04:08 • 0:00:00

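To make the relation between the `additional_args` dictionary and the export command printed above explicit, here is a minimal sketch that rebuilds the same command line. The helper `build_export_command` is hypothetical and is not the implementation of `cmd_helper.optimum_cli`; it only assumes that an empty value marks a boolean switch.

```python
import shlex


def build_export_command(model_id, output_dir, additional_args):
    """Assemble an `optimum-cli export openvino` command line from a dict of options."""
    parts = ["optimum-cli", "export", "openvino", "--model", model_id, str(output_dir)]
    for flag, value in additional_args.items():
        parts.append(f"--{flag}")
        if value != "":  # an empty value marks a boolean switch such as --all-layers
            parts.append(str(value))
    return shlex.join(parts)


command = build_export_command(
    "CohereForAI/c4ai-command-r-v01",
    "/mnt/research/c4ai-command-r-v01/INT4",
    {
        "weight-format": "int4",
        "task": "text-generation-with-past",
        "group-size": "128",
        "ratio": "1",
        "all-layers": "",
    },
)
print(command)
```

The printed string can be pasted into a terminal to run the same export outside the notebook.
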
Once the model export and quantization are complete, move the model assets needed for tokenization in Spark NLP to the `assets` directory.

Let's have a look inside these two directories and see what we are dealing with:

In [6]:
EXPORT_PATH = model_path

In [10]:
!ls -lah {EXPORT_PATH}

total 17G
drwxrwxr-x 3 prabod prabod 4.0K Feb 13 09:13 .
drwxrwxr-x 3 prabod prabod 4.0K Feb 13 09:02 ..
drwxrwxr-x 2 prabod prabod 4.0K Feb 13 09:13 assets
-rw-rw-r-- 1 prabod prabod  810 Feb 13 09:02 config.json
-rw-rw-r-- 1 prabod prabod  137 Feb 13 09:02 generation_config.json
-rw-rw-r-- 1 prabod prabod 2.8M Feb 13 09:06 openvino_detokenizer.bin
-rw-rw-r-- 1 prabod prabod  23K Feb 13 09:06 openvino_detokenizer.xml
-rw-rw-r-- 1 prabod prabod  17G Feb 13 09:11 openvino_model.bin
-rw-rw-r-- 1 prabod prabod 3.4M Feb 13 09:11 openvino_model.xml
-rw-rw-r-- 1 prabod prabod 6.6M Feb 13 09:06 openvino_tokenizer.bin
-rw-rw-r-- 1 prabod prabod  40K Feb 13 09:06 openvino_tokenizer.xml
-rw-rw-r-- 1 prabod prabod  439 Feb 13 09:02 special_tokens_map.json
-rw-rw-r-- 1 prabod prabod  21K Feb 13 09:02 tokenizer_config.json
-rw-rw-r-- 1 prabod prabod  20M Feb 13 09:02 tokenizer.json
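
As a sanity check on the 17G `openvino_model.bin` above, the size of group-wise INT4 weights can be estimated with simple arithmetic. This is a rough sketch under stated assumptions that are not in the original notebook: a parameter count of about 35 billion for this model, one FP16 scale per group, and one 4-bit zero point per group. The real file also contains layers and metadata this estimate ignores.

```python
def estimate_int4_size_bytes(num_params, group_size=128, scale_bytes=2, zero_point_bits=4):
    """Rough on-disk size of group-wise asymmetric INT4 weights, in bytes."""
    weight_bytes = num_params * 4 / 8             # 4 bits per weight
    num_groups = num_params / group_size          # one scale and zero point per group
    scale_total = num_groups * scale_bytes        # assumed FP16 scales
    zero_point_total = num_groups * zero_point_bits / 8
    return weight_bytes + scale_total + zero_point_total


size = estimate_int4_size_bytes(35e9)  # assumed parameter count
print(f"{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

Under these assumptions the estimate lands in the same range as the listing, which suggests the export compressed all weight layers as requested.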


In [8]:
assets_dir = EXPORT_PATH / "assets"
assets_dir.mkdir(exist_ok=True)

# copy all the assets to the assets directory (json files, vocab files, etc.)

import shutil

# copy all json files

for file in EXPORT_PATH.glob("*.json"):
    shutil.copy(file, assets_dir)

In [9]:
!ls -l {EXPORT_PATH}/assets

total 19692
-rw-rw-r-- 1 prabod prabod      810 Feb 13 09:13 config.json
-rw-rw-r-- 1 prabod prabod      137 Feb 13 09:13 generation_config.json
-rw-rw-r-- 1 prabod prabod      439 Feb 13 09:13 special_tokens_map.json
-rw-rw-r-- 1 prabod prabod    20749 Feb 13 09:13 tokenizer_config.json
-rw-rw-r-- 1 prabod prabod 20124090 Feb 13 09:13 tokenizer.json


## 2. Import and Save CoHere in Spark NLP

- Let's install and set up Spark NLP in Google Colab.
- This part is pretty easy via our simple script.

In [None]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Let's start Spark with Spark NLP included via our simple `start()` function

In [None]:
import sparknlp

# let's start Spark with Spark NLP
spark = sparknlp.start()

- Let's use the `loadSavedModel` function in `CoHereTransformer`, which allows us to load the OpenVINO model.
- Most params will be set automatically. They can also be set later at runtime, after the model is loaded in `CoHereTransformer`, so don't worry about setting them now.
- `loadSavedModel` accepts two params: the first is the path to the exported model, and the second is the SparkSession, i.e. the `spark` variable we started earlier via `sparknlp.start()`.
- NOTE: `loadSavedModel` accepts local paths in addition to distributed file systems such as `HDFS`, `S3`, `DBFS`, etc. This feature was introduced in the Spark NLP 4.2.2 release. Keep in mind that the best and recommended way to move/share/reuse Spark NLP models is to use `write.save` so you can use `.load()` from any file system natively.
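
Before handing the export path to `loadSavedModel`, a quick pre-flight check can catch an incomplete export early. The sketch below is hypothetical: the helper name and the lists of expected files are assumptions taken from the directory listings earlier in this notebook, not a statement of what Spark NLP strictly requires.

```python
from pathlib import Path

# Assumption: names copied from the directory listings earlier in this notebook.
EXPECTED_FILES = ["openvino_model.xml", "openvino_model.bin"]
EXPECTED_ASSETS = ["config.json", "tokenizer.json", "tokenizer_config.json"]


def missing_export_files(export_path):
    """Return the expected files that are absent from an exported model directory."""
    root = Path(export_path)
    expected = [root / name for name in EXPECTED_FILES]
    expected += [root / "assets" / name for name in EXPECTED_ASSETS]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]


# Usage with the notebook's variable (commented out to keep the sketch standalone):
# print(missing_export_files(EXPORT_PATH) or "export directory looks complete")
```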

In [14]:
from sparknlp.annotator import *

CoHere = CoHereTransformer \
    .loadSavedModel(str(EXPORT_PATH), spark) \
    .setMaxOutputLength(50) \
    .setDoSample(False) \
    .setInputCols(["documents"]) \
    .setOutputCol("generation")

25/02/13 09:19:52 WARN NativeLibrary: Failed to load library null: java.lang.UnsatisfiedLinkError: Can't load library: /tmp/openvino-native14220754060683836653/libtbb.so.2




Let's save it on disk so it is easier to move around and can be used later via the `.load` function.

In [15]:
MODEL_NAME = "CohereForAI/c4ai-command-r-v01"


In [None]:
CoHere.write().overwrite().save(f"{MODEL_NAME}_spark_nlp")

                                                                                

Let's clean up stuff we don't need anymore

In [None]:
!rm -rf {EXPORT_PATH}

Awesome  😎 !

This is your OpenVINO CoHere model from HuggingFace 🤗  loaded and saved by Spark NLP 🚀

In [None]:
! ls -l {MODEL_NAME}_spark_nlp

total 17754944
-rw-r--r-- 1 prabod prabod 18181049933 Feb 13 09:34 CoHere_openvino
drwxr-xr-x 6 prabod prabod        4096 Feb 13 09:32 fields
drwxr-xr-x 2 prabod prabod        4096 Feb 13 09:32 metadata


Now let's see how we can use it on other machines, clusters, or any place you wish to use your new and shiny CoHere model 😊

In [None]:
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

test_data = spark.createDataFrame([
            (
                1,
                "<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
            )
        ]).toDF("id", "text")


document_assembler = DocumentAssembler() \
            .setInputCol("text") \
            .setOutputCol("documents")

CoHere = CoHereTransformer \
            .load(f"{MODEL_NAME}_spark_nlp") \
            .setMaxOutputLength(50) \
            .setDoSample(False) \
            .setBeamSize(1) \
            .setInputCols(["documents"]) \
            .setOutputCol("generation")

pipeline = Pipeline().setStages([document_assembler, CoHere])
results = pipeline.fit(test_data).transform(test_data)

results.select("generation.result").show(truncate=False)



+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                                  |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|[ Hello, how are you?Hello! I'm doing well, thank you for asking! I'm excited to help you with whatever questions you have today. How can I assist you?]|
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
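
The input text above wraps the user message in the model's chat turn tokens by hand. A small helper can build that string for arbitrary messages. This is a minimal sketch: `build_command_r_prompt` is a hypothetical name, and it covers only the single-turn layout used in the test data above.

```python
def build_command_r_prompt(user_message: str) -> str:
    """Wrap a single user message in the turn tokens used in the example above."""
    return (
        "<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>"
        f"{user_message}"
        "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    )


prompt = build_command_r_prompt("Hello, how are you?")
print(prompt)
```

The resulting string can be placed in the `text` column of the input DataFrame in place of the hand-written prompt.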



                                                                                

That's it! You can now go wild and use hundreds of CoHere models from HuggingFace 🤗 in Spark NLP 🚀
