# Multimodal RAG with LlamaIndex

This cookbook shows how to use LlamaIndex to perform RAG on the text and table output of nv-ingest's PDF extraction tools.

To start, we'll need to make sure llama_index is installed.

In [None]:
%pip install llama_index

Then, we'll use nv-ingest to parse an example PDF that contains text, tables, charts, and images. The nv-ingest microservice must be up and running at localhost:7670, along with the supporting NIMs. To set this up, follow the nv-ingest [quickstart guide](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#quickstart). Once the microservice is ready, we can create a job with the nv-ingest Python client.

In [1]:
from nv_ingest_client.client import NvIngestClient
from nv_ingest_client.primitives import JobSpec
from nv_ingest_client.primitives.tasks import ExtractTask
from nv_ingest_client.primitives.tasks import SplitTask
from nv_ingest_client.util.file_processing.extract import extract_file_content
import logging, time

logger = logging.getLogger("nv_ingest_client")

file_name = "../data/multimodal_test.pdf"
file_content, file_type = extract_file_content(file_name)

job_spec = JobSpec(
    document_type=file_type,
    payload=file_content,
    source_id=file_name,
    source_name=file_name,
    extended_options={"tracing_options": {"trace": True, "ts_send": time.time_ns()}},
)

Next, we add a task that extracts the text and tables from the example PDF, submit the job, and fetch the result.

In [2]:
extract_task = ExtractTask(
    document_type=file_type,
    extract_text=True,
    extract_images=False,
    extract_tables=True,
)


job_spec.add_task(extract_task)
client = NvIngestClient()
job_id = client.add_job(job_spec)

client.submit_job(job_id, "morpheus_task_queue")

result = client.fetch_job_result(job_id, timeout=60)
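
Extraction of a large PDF may take longer than a single `timeout` window. A minimal sketch of a retry wrapper follows. `fetch_with_retry` is a hypothetical helper, not part of the nv-ingest client, and it assumes that a job which is not ready yet surfaces as a `TimeoutError`; check this against the client version you have installed. The fetch call is passed in as a plain callable so the sketch runs without a live service.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay_s=0.0, retry_on=(TimeoutError,)):
    """Call `fetch()` until it returns, retrying on the given exception types.

    `fetch` is any zero-argument callable; with the client above it could be
    `lambda: client.fetch_job_result(job_id, timeout=60)`.
    Assumption: a not-yet-finished job raises one of the types in `retry_on`.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_s)  # wait before the next try
    raise last_error


# Stand-in for the real fetch call: fails twice, then returns a result.
calls = {"n": 0}


def fake_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("job not ready")
    return [[{"document_type": "text"}]]


demo_result = fetch_with_retry(fake_fetch, attempts=5)
```

Passing the exception types as a parameter keeps the helper usable if the client reports an unfinished job with a different exception.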

In [3]:
result[0][0][0]

{'document_type': 'text',
 'metadata': {'content': 'TestingDocument\r\nA sample document with headings and placeholder text\r\nIntroduction\r\nThis is a placeholder document that can be used for any purpose. It contains some \r\nheadings and some placeholder text to fill the space. The text is not important and contains \r\nno real value, but it is useful for testing. Below, we will have some simple tables and charts \r\nthat we can use to confirm Ingest is working as expected.\r\nTable 1\r\nThis table describes some animals, and some activities they might be doing in specific \r\nlocations.\r\nAnimal Activity Place\r\nGira@e Driving a car At the beach\r\nLion Putting on sunscreen At the park\r\nCat Jumping onto a laptop In a home o@ice\r\nDog Chasing a squirrel In the front yard\r\nChart 1\r\nThis chart shows some gadgets, and some very fictitious costs. Section One\r\nThis is the first section of the document. It has some more placeholder text to show how \r\nthe document looks like.

Now we have the extraction results in the nv-ingest metadata format. We'll pull the text and table content out of that format and load it into LlamaIndex documents.

In [4]:
from llama_index.core import Document

texts = []
tables = []
for element in result[0][0]:
    if element['document_type'] == 'text':
        texts.append(Document(text=element['metadata']['content']))
    elif element['document_type'] == 'structured':
        tables.append(Document(text=element['metadata']['table_metadata']['table_content']))

In [5]:
texts

[Document(id_='9bad2140-a997-4af0-a4ea-c8236416def0', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='TestingDocument\r\nA sample document with headings and placeholder text\r\nIntroduction\r\nThis is a placeholder document that can be used for any purpose. It contains some \r\nheadings and some placeholder text to fill the space. The text is not important and contains \r\nno real value, but it is useful for testing. Below, we will have some simple tables and charts \r\nthat we can use to confirm Ingest is working as expected.\r\nTable 1\r\nThis table describes some animals, and some activities they might be doing in specific \r\nlocations.\r\nAnimal Activity Place\r\nGira@e Driving a car At the beach\r\nLion Putting on sunscreen At the park\r\nCat Jumping onto a laptop In a home o@ice\r\nDog Chasing a squirrel In the front yard\r\nChart 1\r\nThis chart shows some gadgets, and some very fictitious costs. Section One\r

In [6]:
tables

[Document(id_='542181ac-d3f2-490a-979b-cf0c6e24c1d4', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='locations. Animal Activity Place Giraffe Driving a car At the beach Lion Putting on sunscreen At the park Cat Jumping onto a laptop In a home office Dog Chasing a squirrel In the front yard', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='563e88da-dd0f-4121-98f9-c3d85f7a7bea', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='This chart shows some gadgets, and some very fictitious costs. >\\n7938.758 ext. Print & Maroon Bookshelf Fine Art Poems Collection dla Cemicon Diamtháhn | Gadgets and their cost\nSollywood for Coasters | 19875.075     t158.281 \n Hammer | 19871.55 \n Powerdrill | 12044.625 \n Bluetooth speaker 
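
The separation step above can also be sketched as a standalone function that tolerates missing keys. `split_elements` is a hypothetical helper, and the sample data is hand-written to mirror the structure shown in the outputs above; it is not real nv-ingest output. The function returns plain strings so that it runs without LlamaIndex installed.

```python
def split_elements(elements):
    """Separate extracted elements into text strings and table strings.

    Assumes each element is a dict shaped like the output shown above:
    'document_type' is 'text' or 'structured', and table content lives under
    metadata -> table_metadata -> table_content.
    """
    texts, tables = [], []
    for element in elements:
        metadata = element.get("metadata") or {}
        doc_type = element.get("document_type")
        if doc_type == "text":
            content = metadata.get("content")
            if content:
                texts.append(content)
        elif doc_type == "structured":
            table_metadata = metadata.get("table_metadata") or {}
            content = table_metadata.get("table_content")
            if content:
                tables.append(content)
    return texts, tables


# Hand-written sample mirroring the structure of the real result.
sample = [
    {"document_type": "text", "metadata": {"content": "Introduction"}},
    {
        "document_type": "structured",
        "metadata": {"table_metadata": {"table_content": "Animal Activity Place"}},
    },
    {"document_type": "image", "metadata": {}},  # skipped: neither text nor table
    {"document_type": "structured", "metadata": {"table_metadata": {}}},  # skipped: empty
]

sample_texts, sample_tables = split_elements(sample)
```

Each returned string can then be wrapped with `Document(text=...)`, as in the cell above.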

Now the text and table content is ready to be embedded and stored. We'll set our OpenAI API key in order to use OpenAI's embedding model, but any other embedding model can be used here.

In [7]:
import os
from llama_index.core import VectorStoreIndex

# TODO: Add your OpenAI API key here
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"

index = VectorStoreIndex.from_documents(texts+tables)
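
To make the embed-and-retrieve idea concrete, here is a toy illustration in plain Python. It is not how `VectorStoreIndex` or OpenAI's embedding model works internally: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the "index" is just a list. It only shows how stored chunks can be ranked by cosine similarity to a query.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=1):
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Rows taken from Table 1 of the example PDF.
chunks = [
    "Giraffe Driving a car At the beach",
    "Lion Putting on sunscreen At the park",
    "Dog Chasing a squirrel In the front yard",
]

best = top_k("What is the dog doing and where?", chunks, k=1)
```

A real embedding model captures meaning rather than word overlap, so it can match a query to a chunk even when they share no words.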

Next, we'll use our vector store index to create a query engine that handles the RAG pipeline.

In [8]:
query_engine = index.as_query_engine()
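
Conceptually, a RAG query engine retrieves relevant chunks, places them in a prompt, and asks an LLM to answer from that context. The sketch below shows only the prompt-assembly step with a stubbed model. `build_prompt`, `answer`, `stub_llm`, and the template wording are all hypothetical and do not reflect LlamaIndex's actual prompts or internals.

```python
def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into a single prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, context_chunks, llm):
    """Run the augment-then-generate step; `llm` is any str -> str callable."""
    return llm(build_prompt(question, context_chunks))


# Stub LLM that only reports the prompt length; swap in a real model call.
def stub_llm(prompt):
    return f"(stub) received {len(prompt.splitlines())} prompt lines"


retrieved = ["Dog Chasing a squirrel In the front yard"]
prompt = build_prompt("What is the dog doing and where?", retrieved)
reply = answer("What is the dog doing and where?", retrieved, stub_llm)
```

Keeping the model behind a plain callable makes the retrieval and prompt logic easy to test without network access.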

And finally, we can ask it questions about our example PDF.

In [9]:
query_engine.query("What is the dog doing and where?").response

'The dog is chasing a squirrel in the front yard.'