<a href="https://colab.research.google.com/github/ankitmhn/script-sandbox/blob/main/notebooks/nuextract-structure-extraction/nuextract-structure-extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Structure Extraction with NuExtract and OpenVINO

![image](https://github.com/user-attachments/assets/70dd93cc-da36-4c53-8891-78c0f9a41f20)

[NuExtract](https://huggingface.co/numind/NuExtract) is a text-to-JSON Large Language Model (LLM) that extracts arbitrarily complex information from text and turns it into structured data.

A Large Language Model (LLM) is an artificial intelligence model designed to understand and generate human-like text based on the input it receives. LLMs are trained on large text datasets to learn patterns, grammar, and semantic relationships, which allows them to generate coherent and contextually relevant responses. One core capability of LLMs is following natural language instructions. Instruction-following models generate text in response to prompts and are often used for tasks such as writing assistance, chatbots, and content generation.

In this tutorial, we show how to run a structure extraction pipeline using the NuExtract model and OpenVINO. We will use pre-trained models from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. The [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library converts the models to OpenVINO™ IR format. To simplify the user experience, we will use the [OpenVINO Generate API](https://github.com/openvinotoolkit/openvino.genai) for the generation inference pipeline.

The tutorial consists of the following steps:

- Install prerequisites
- Download and convert the model from a public source using the [OpenVINO integration with Hugging Face Optimum](https://huggingface.co/blog/openvino)
- Compress model weights to INT8 and INT4 with [OpenVINO NNCF](https://github.com/openvinotoolkit/nncf)
- Create a structure extraction inference pipeline with [Generate API](https://github.com/openvinotoolkit/openvino.genai)
- Launch interactive Gradio demo with structure extraction pipeline


#### Table of contents:

- [Prerequisites](#Prerequisites)
- [Select model for inference](#Select-model-for-inference)
- [Download and convert model to OpenVINO IR via Optimum Intel CLI](#Download-and-convert-model-to-OpenVINO-IR-via-Optimum-Intel-CLI)
- [Compress model weights](#Compress-model-weights)
    - [Weights Compression using Optimum Intel CLI](#weights-compression-using-optimum-intel-cli)
- [Select device for inference and model variant](#Select-device-for-inference-and-model-variant)
- [Create a structure extraction inference pipeline](#Create-a-structure-extraction-inference-pipeline)
- [Run interactive structure extraction demo with Gradio](#Run-interactive-structure-extraction-demo-with-Gradio)


### Installation Instructions

This is a self-contained example that relies solely on its own code.

We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).



## Prerequisites
[back to top ⬆️](#Table-of-contents:)


In [11]:
%pip install -Uq "openvino>=2024.3.0" "openvino-genai"
%pip install -q "torch>=2.1" "nncf>=2.12" "transformers>=4.40.0" "accelerate" "gradio>=4.19" "git+https://github.com/huggingface/optimum-intel.git" --extra-index-url https://download.pytorch.org/whl/cpu


In [5]:
import os
from pathlib import Path
import requests
import shutil

if not Path("notebook_utils.py").exists():
    r = requests.get(url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py")
    # Fail early instead of saving an HTTP error page as a Python module
    r.raise_for_status()
    Path("notebook_utils.py").write_text(r.text)

from notebook_utils import download_file

# Fetch llm_config.py
llm_config_shared_path = Path("../../utils/llm_config.py")
llm_config_dst_path = Path("llm_config.py")

if not llm_config_dst_path.exists():
    if llm_config_shared_path.exists():
        try:
            os.symlink(llm_config_shared_path, llm_config_dst_path)
        except Exception:
            shutil.copy(llm_config_shared_path, llm_config_dst_path)
    else:
        download_file(url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/llm_config.py")
elif not os.path.islink(llm_config_dst_path):
    print("LLM config will be updated")
    if llm_config_shared_path.exists():
        shutil.copy(llm_config_shared_path, llm_config_dst_path)
    else:
        download_file(url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/llm_config.py")


## Select model for inference
[back to top ⬆️](#Table-of-contents:)

The tutorial supports several models. You can select one of the provided options to compare the quality of open-source solutions.
>**Note**: converting some models may require additional user actions and at least 64GB of RAM.

The NuExtract model comes in several versions:

* **NuExtract-tiny** - This is a version of the [Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B) model with 0.5 billion parameters. More details about the model can be found in the [model card](https://huggingface.co/numind/NuExtract-tiny).
* **NuExtract** - This is a version of the [phi-3-mini](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model with 3.8 billion parameters. More details about the model can be found in the [model card](https://huggingface.co/numind/NuExtract).
* **NuExtract-large** - This is a version of the [phi-3-small](https://huggingface.co/microsoft/Phi-3-small-8k-instruct) model with 7 billion parameters. More details about the model can be found in the [model card](https://huggingface.co/numind/NuExtract-large).

All NuExtract models are fine-tuned on a private high-quality synthetic dataset for information extraction.

In [13]:
from llm_config import get_llm_selection_widget

models = {
    "NuExtract_tiny": {"model_id": "numind/NuExtract-tiny"},
    "NuExtract": {"model_id": "numind/NuExtract"},
    "NuExtract_large": {"model_id": "numind/NuExtract-large"},
}

form, _, model_dropdown, compression_dropdown, _ = get_llm_selection_widget(languages=None, models=models, show_preconverted_checkbox=False)

form

Box(children=(Box(children=(Label(value='Model:'), Dropdown(options={'NuExtract_tiny': {'model_id': 'numind/Nu…

In [14]:
model_name = model_dropdown.label
model_config = model_dropdown.value
print(f"Selected model {model_name} with {compression_dropdown.value} compression")

Selected model NuExtract_tiny with INT4 compression


## Select device for inference and model variant
[back to top ⬆️](#Table-of-contents:)

>**Note**: There may be no speedup for INT4/INT8 compressed models on dGPU.

In [15]:
from notebook_utils import device_widget

device = device_widget(default="CPU", exclude=["NPU"])

device

Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')

## Create a structure extraction inference pipeline
[back to top ⬆️](#Table-of-contents:)

First, we prepare the input prompt for the NuExtract model with a `prepare_input()` function. This function combines the main text, a JSON schema, and optional examples into a single string that follows the model's input format.

The `prepare_input()` function accepts the following parameters:
1. `text`: The primary text from which you want to extract information.
2. `schema`: A JSON schema string that defines the structure of the information you want to extract. It acts as a template, telling the NuExtract model what data to look for and how to format the output.
3. `examples`: An optional list of example strings. These provide the model with sample extractions, which may improve accuracy for complex or ambiguous cases.

In [16]:
import json
from typing import List, Optional


def prepare_input(text: str, schema: str, examples: Optional[List[str]] = None) -> str:
    # Re-serialize the schema so the template is always indented the same way
    schema = json.dumps(json.loads(schema), indent=4)
    input_llm = "<|input|>\n### Template:\n" + schema + "\n"
    # Append only non-empty examples, normalized like the schema
    for example in examples or []:
        if example != "":
            input_llm += "### Example:\n" + json.dumps(json.loads(example), indent=4) + "\n"

    input_llm += "### Text:\n" + text + "\n<|output|>\n"
    return input_llm
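
To make the prompt layout concrete, the following sketch assembles by hand the same string that `prepare_input()` would return for one schema, one example, and one text. The schema, the example record (`ExampleModel`, `MIT`), and the sentence are hypothetical values chosen only for illustration.

```python
import json

# Hypothetical inputs, chosen only to illustrate the prompt layout.
schema = '{"Name": "", "Licence": ""}'
example = '{"Name": "ExampleModel", "Licence": "MIT"}'
text = "Mistral 7B is released under the Apache 2.0 license."

# Same layout as prepare_input(): template, optional examples, then the text.
prompt = (
    "<|input|>\n### Template:\n"
    + json.dumps(json.loads(schema), indent=4)
    + "\n### Example:\n"
    + json.dumps(json.loads(example), indent=4)
    + "\n### Text:\n"
    + text
    + "\n<|output|>\n"
)
print(prompt)
```

The prompt ends with the `<|output|>` marker, so the model's continuation is expected to be the filled-in JSON template.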

In [None]:
from llm_config import convert_and_compress_model

model_dir = convert_and_compress_model(model_name, model_config, compression_dropdown.value, use_preconverted=False)

⌛ NuExtract_tiny conversion to INT4 started. It may takes some time.


**Export command:**

`optimum-cli export openvino --model numind/NuExtract-tiny --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.8 NuExtract_tiny/INT4_compressed_weights`
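
If you prefer to run the export yourself, a command like the one above can be assembled in Python. This is a minimal sketch: `build_export_command` is a hypothetical helper (not part of `llm_config.py`), and it assumes that `--group-size` and `--ratio` are only relevant for INT4 compression, as in the command shown above.

```python
import shlex
from typing import List


def build_export_command(
    model_id: str,
    weight_format: str,
    output_dir: str,
    group_size: int = 128,
    ratio: float = 0.8,
) -> List[str]:
    """Assemble an optimum-cli export command as an argument list (hypothetical helper)."""
    cmd = [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--task", "text-generation-with-past",
        "--weight-format", weight_format,
    ]
    # Assumption: group size and ratio are passed only for INT4 compression.
    if weight_format == "int4":
        cmd += ["--group-size", str(group_size), "--ratio", str(ratio)]
    cmd.append(output_dir)
    return cmd


cmd = build_export_command("numind/NuExtract-tiny", "int4", "NuExtract_tiny/INT4_compressed_weights")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the command as an argument list to `subprocess.run` avoids shell quoting issues; `shlex.join` is used here only to print it in a readable form.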

In [17]:
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_dir.as_posix(), device.value)


def run_structure_extraction(text: str, schema: str) -> str:
    # Use `prompt` rather than `input` to avoid shadowing the Python built-in
    prompt = prepare_input(text, schema)
    return pipe.generate(prompt, max_new_tokens=200)


To run the structure extraction pipeline, we need to provide an example text for data extraction and define the output structure as a JSON schema:

In [None]:
text = """We introduce Mistral 7B, a 7-billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B - Instruct, that surpasses Llama 2 13B - chat model both on human and
automated benchmarks. Our models are released under the Apache 2.0 license.
Code: https://github.com/mistralai/mistral-src
Webpage: https://mistral.ai/news/announcing-mistral-7b/"""

schema = """{
    "Model": {
        "Name": "",
        "Number of parameters": "",
        "Number of max token": "",
        "Architecture": []
    },
    "Usage": {
        "Use case": [],
        "Licence": ""
    }
}"""

output = run_structure_extraction(text, schema)
print(output)

{
    "Model": {
        "Name": "Mistral 7B",
        "Number of parameters": "7-billion",
        "Number of max token": "",
        "Architecture": [
            "grouped-query attention",
            "sliding window attention"
        ]
    },
    "Usage": {
        "Use case": [
            "reasoning",
            "mathematics",
            "code generation"
        ],
        "Licence": "Apache 2.0"
    }
}
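
Because generation is capped at `max_new_tokens=200`, a long extraction could be cut off before the JSON is complete. The sketch below shows one possible way to check that an output parses as JSON and has the same nested keys as the template. `matches_template` is a hypothetical helper, and the shortened schema and output strings are stand-ins for the real values above.

```python
import json
from typing import Any


def matches_template(template: Any, result: Any) -> bool:
    """Return True if `result` has the same nested keys and container types as `template`."""
    if isinstance(template, dict):
        return (
            isinstance(result, dict)
            and template.keys() == result.keys()
            and all(matches_template(template[k], result[k]) for k in template)
        )
    if isinstance(template, list):
        return isinstance(result, list)
    # Leaf values in the template are empty strings, so expect a string back.
    return isinstance(result, str)


# Shortened, hypothetical stand-ins for the schema and the model output.
schema = '{"Model": {"Name": "", "Architecture": []}, "Usage": {"Licence": ""}}'
output = (
    '{"Model": {"Name": "Mistral 7B", "Architecture": ["grouped-query attention"]},'
    ' "Usage": {"Licence": "Apache 2.0"}}'
)

try:
    parsed = json.loads(output)
except json.JSONDecodeError:
    parsed = None  # for example, the output was truncated mid-object

is_valid = parsed is not None and matches_template(json.loads(schema), parsed)
print(is_valid)
```

A check like this only validates structure. It does not verify that the extracted values are correct.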



## Run interactive structure extraction demo with Gradio
[back to top ⬆️](#Table-of-contents:)

In [None]:
if not Path("gradio_helper.py").exists():
    r = requests.get(
        url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/notebooks/nuextract-structure-extraction/gradio_helper.py"
    )
    # Fail early instead of saving an HTTP error page as a Python module
    r.raise_for_status()
    Path("gradio_helper.py").write_text(r.text)

from gradio_helper import make_demo

demo = make_demo(fn=run_structure_extraction)

try:
    demo.launch(height=800)
except Exception:
    demo.launch(share=True, height=800)
# If you are launching remotely, specify server_name and server_port
# EXAMPLE: `demo.launch(server_name='your server name', server_port='server port in int')`
# To learn more please refer to the Gradio docs: https://gradio.app/docs/

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://0b81c07a84c46fbc43.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


In [None]:
# Run this cell to stop the Gradio interface
demo.close()

Closing server running on port: 7860
