# Multimodal RAG with LlamaIndex

This notebook shows how to use LlamaIndex to perform RAG on the text, tables, and charts that NV-Ingest's PDF extraction tools produce.

To start, we'll need to install LlamaIndex along with pymilvus, which lets us connect to the Milvus vector database (VDB) that NV-Ingest uses to store embeddings.

In [None]:
pip install -qU llama_index llama-index-embeddings-nvidia llama-index-llms-nvidia llama-index-vector-stores-milvus pymilvus

Then, we'll use NV-Ingest to parse an example PDF that contains text, tables, charts, and images, embed the results with the included embedding microservice, and store them in the Milvus vector database. The NV-Ingest microservice must be up and running at localhost:7670, along with the supporting NIMs and microservices. To set this up, follow the NV-Ingest [quickstart guide](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#quickstart). This notebook requires all of the services to be [running](https://github.com/NVIDIA/nv-ingest/blob/main/docs/deployment.md#launch-nv-ingest-micro-services). Once everything is ready, we can create a job with the NV-Ingest Python client.

In [1]:
import logging
import time

from nv_ingest_client.client import NvIngestClient
from nv_ingest_client.primitives import JobSpec
from nv_ingest_client.primitives.tasks import EmbedTask, ExtractTask, VdbUploadTask
from nv_ingest_client.util.file_processing.extract import extract_file_content

logger = logging.getLogger("nv_ingest_client")

file_name = "../data/multimodal_test.pdf"
file_content, file_type = extract_file_content(file_name)

client = NvIngestClient(
    message_client_hostname="localhost",
    message_client_port=7670,
)

job_spec = JobSpec(
    document_type=file_type,
    payload=file_content,
    source_id=file_name,
    source_name=file_name,
    extended_options={
        "tracing_options": {
            "trace": True,
            "ts_send": time.time_ns()
        }
    },
)

Next, we can add and submit tasks to extract the text, tables, and charts from the example PDF, generate embeddings from the results, and store them in the Milvus VDB.

In [2]:
extract_task = ExtractTask(
    document_type=file_type,
    extract_text=True,
    extract_images=False,
    extract_tables=True,
    extract_charts=True,
)

embed_task = EmbedTask(
    text=True,
    tables=True,
)

vdb_upload_task = VdbUploadTask()

job_spec.add_task(extract_task)
job_spec.add_task(embed_task)
job_spec.add_task(vdb_upload_task)

# Register the job with the client, then submit it to the task queue
job_id = client.add_job(job_spec)

client.submit_job(job_id, "morpheus_task_queue")

result = client.fetch_job_result(job_id, timeout=60)
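
Before moving on to retrieval, it can help to confirm that the job actually produced text, table, and chart elements. The helper below is a minimal sketch that counts elements by type. It assumes each extracted element is a dict with a `document_type` key; the sample data is a hypothetical stand-in, and the exact nesting of `result` may differ between client versions, so inspect it before passing a list of elements in.

```python
from collections import Counter


def summarize_by_type(elements):
    """Count extracted elements by their `document_type` value."""
    return Counter(
        element.get("document_type", "unknown") for element in elements
    )


# Hypothetical stand-in for extraction output (assumed shape, not real results)
sample_elements = [
    {"document_type": "text", "metadata": {"content": "..."}},
    {"document_type": "structured", "metadata": {"content": "..."}},
    {"document_type": "structured", "metadata": {"content": "..."}},
]

counts = summarize_by_type(sample_elements)
print(dict(counts))
```

If one of the expected types is missing from the counts, the corresponding `extract_*` flag or supporting service is a reasonable first place to look.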

Now the text, table, and chart content has been extracted and stored in the Milvus VDB along with the embeddings. Next, we'll connect LlamaIndex to Milvus and create a vector store index so that we can query our extraction results. The vector store index must use the same embedding model as the embedding service in NV-Ingest: `nvidia/nv-embedqa-e5-v5`.

In [7]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.vector_stores.milvus import MilvusVectorStore

embed_model = NVIDIAEmbedding(base_url="http://localhost:8012/v1", model="nvidia/nv-embedqa-e5-v5")

vector_store = MilvusVectorStore(
    uri="http://localhost:19530",
    collection_name="nv_ingest_collection",
    doc_id_field="pk",
    embedding_field="vector",
    text_key="text",
    dim=1024,
    overwrite=False
)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, embed_model=embed_model)
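
Because the index only works when the query embeddings match what is stored in the collection, a quick dimension check can catch a mismatched model early. This is a minimal sketch: `check_embedding_dim` is a hypothetical helper, and the stand-in embedder below exists only so the example runs without the embedding service.

```python
def check_embedding_dim(embed_fn, expected_dim=1024, probe_text="dimension probe"):
    """Embed a short probe string and verify the length of the returned vector."""
    vector = embed_fn(probe_text)
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, expected {expected_dim}"
        )
    return len(vector)


def fake_embed(text):
    """Stand-in embedder (assumption for illustration): returns a zero vector."""
    return [0.0] * 1024


dim_ok = check_embedding_dim(fake_embed)
print(dim_ok)
```

With the services running, the same check can be pointed at the real model by passing `embed_model.get_query_embedding` as `embed_fn`. A matching dimension is necessary but not sufficient: two different models can share a dimension, so the model name still has to agree with the one NV-Ingest used.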

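Next, we'll use our vector store index to create a query engine that handles the RAG pipeline, and we'll use an LLM from the NVIDIA API catalog to generate the final response. The `NVIDIA` LLM class needs an API key for the catalog, which it reads from the `NVIDIA_API_KEY` environment variable.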

In [30]:
from llama_index.llms.nvidia import NVIDIA

llm = NVIDIA(model="meta/llama-3.1-405b-instruct")
query_engine = index.as_query_engine(llm=llm)
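
To make "handles the RAG pipeline" more concrete, here is a toy, dependency-free sketch of the retrieve-then-generate flow: score stored vectors against a query vector, keep the best matches, and assemble them into a prompt for the LLM. This illustrates the idea only and is not LlamaIndex's implementation; the function names, the three-dimensional vectors, the toy passages, and the use of cosine similarity are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, top_k=2):
    """Return the top_k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for vec, text in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


def build_prompt(question, retrieved):
    """Join the retrieved passages into a context block ahead of the question."""
    context = "\n".join(text for _, text in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy store of (vector, text) pairs; vectors and the last two texts are made up
toy_store = [
    ([1.0, 0.0, 0.0], "The dog is chasing a squirrel in the front yard."),
    ([0.0, 1.0, 0.0], "Toy passage about a table."),
    ([0.0, 0.0, 1.0], "Toy passage about a chart."),
]
toy_query_vector = [0.9, 0.1, 0.0]

hits = retrieve(toy_query_vector, toy_store, top_k=2)
prompt = build_prompt("What is the dog doing and where?", hits)
print(prompt)
```

In the real pipeline, the embedding model produces the query vector, Milvus performs the similarity search, and the LLM completes the assembled prompt.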

Finally, we can ask it questions about our example PDF.

In [9]:
query_engine.query("What is the dog doing and where?").response

'The dog is chasing a squirrel in the front yard.'
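
To see which extracted chunks an answer was grounded in, the response object returned by `query_engine.query(...)` also exposes the retrieved nodes through `source_nodes`. The sketch below wraps that in a hypothetical helper, `summarize_sources`, and runs it against a stub response so that it works without the services; the stub's score is invented for illustration.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=80):
    """Return (score, snippet) pairs for each retrieved source node."""
    return [
        (source.score, source.node.get_content()[:max_chars])
        for source in response.source_nodes
    ]


# Stub mimicking the attributes used above (assumption; not real query output)
stub_node = SimpleNamespace(
    get_content=lambda: "The dog is chasing a squirrel in the front yard."
)
stub_response = SimpleNamespace(
    response="The dog is chasing a squirrel in the front yard.",
    source_nodes=[SimpleNamespace(score=0.87, node=stub_node)],
)

sources = summarize_sources(stub_response, max_chars=20)
print(sources)
```

With a live query engine, keep the full response object (rather than reading `.response` immediately) and pass it to the helper to review the retrieved passages and their similarity scores.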