# Multimodal RAG with LangChain

This notebook shows how to perform RAG over the table, chart, and text extraction results of NV-Ingest's PDF extraction tools using LangChain.

To start, make sure LangChain is installed, along with pymilvus so that we can connect to the Milvus vector database (VDB) that NV-Ingest uses to store embeddings.

In [None]:
pip install -qU langchain langchain_community langchain-nvidia-ai-endpoints langchain_milvus pymilvus

Next, we'll use NV-Ingest to parse an example PDF that contains text, tables, charts, and images, embed the results with the included embedding microservice, and store them in the Milvus vector database. The NV-Ingest microservice must be up and running at localhost:7670, along with the supporting NIMs and microservices. To set this up, follow the NV-Ingest [quickstart guide](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#quickstart). This notebook requires all of the services to be [running](https://github.com/NVIDIA/nv-ingest/blob/main/docs/deployment.md#launch-nv-ingest-micro-services). Once everything is ready, we can create a job with the NV-Ingest Python client.
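
Before creating a job, it can help to confirm that something is actually listening on the NV-Ingest port. The following is a minimal sketch using only the standard library. `port_is_open` is a hypothetical helper, not part of the NV-Ingest client, and an open port only shows that the service accepts TCP connections, not that every supporting NIM is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For this notebook, the call would be `port_is_open("localhost", 7670)`, using the host and port passed to `NvIngestClient` in the next cell.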

In [1]:
from nv_ingest_client.client import NvIngestClient
from nv_ingest_client.message_clients.rest.rest_client import RestClient
from nv_ingest_client.primitives import JobSpec
from nv_ingest_client.primitives.tasks import ExtractTask
from nv_ingest_client.primitives.tasks import EmbedTask
from nv_ingest_client.primitives.tasks import VdbUploadTask


from nv_ingest_client.util.file_processing.extract import extract_file_content
import logging, time

logger = logging.getLogger("nv_ingest_client")

file_name = "../data/multimodal_test.pdf"
file_content, file_type = extract_file_content(file_name)

client = NvIngestClient(
    message_client_hostname="localhost",
    message_client_port=7670,
)

job_spec = JobSpec(
    document_type=file_type,
    payload=file_content,
    source_id=file_name,
    source_name=file_name,
    extended_options={
        "tracing_options": {
            "trace": True,
            "ts_send": time.time_ns()
        }
    },
)

Next, we add tasks to extract the text, tables, and charts from the example PDF, generate embeddings from the results, and store them in the Milvus VDB, and then submit the job.

In [2]:
extract_task = ExtractTask(
    document_type=file_type,
    extract_text=True,
    extract_images=False,
    extract_tables=True,
    extract_charts=True,
)

embed_task = EmbedTask(
    text=True,
    tables=True,
)

vdb_upload_task = VdbUploadTask()

job_spec.add_task(extract_task)
job_spec.add_task(embed_task)
job_spec.add_task(vdb_upload_task)

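# Register the job with the client; add_job returns its ID.
job_id = client.add_job(job_spec)

# Submit the job to the NV-Ingest task queue.
client.submit_job(job_id, "morpheus_task_queue")

# Wait up to 60 seconds for the result.
result = client.fetch_job_result(job_id, timeout=60)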

The text, table, and chart content has now been extracted and stored in the Milvus VDB along with its embeddings. Next, we'll connect LangChain to Milvus and create a vector store so that we can query the extraction results. The vector store must use the same embedding model as the embedding service in NV-Ingest: `nvidia/nv-embedqa-e5-v5`.

In [13]:
import os
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_milvus import Milvus

# TODO: Add your NVIDIA API key here
os.environ["NVIDIA_API_KEY"] = "<YOUR_NVIDIA_API_KEY>"

embedding = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5")

vectorstore = Milvus(
    embedding_function=embedding,
    collection_name="nv_ingest_collection",
    primary_field="pk",
    vector_field="vector",
    text_field="text",
    connection_args={"uri": "http://localhost:19530"},
)
retriever = vectorstore.as_retriever()
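
To make the "same embedding model" requirement concrete, here is a toy, standard-library sketch of what a similarity retriever does conceptually: it scores stored vectors against the query vector and returns the closest texts. Everything in it is illustrative. `top_k`, the three-dimensional vectors, and the choice of cosine similarity are assumptions for the example; Milvus uses its own indexes and whichever metric the collection was created with. A query embedded by a different model would have a different dimension or live in a different vector space, so the scores would be meaningless.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("query and stored vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, rows, k=4):
    """Return the texts of the k rows most similar to query_vec.

    rows is a list of (text, vector) pairs, a stand-in for a vector collection.
    """
    ranked = sorted(rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data: made-up 3-dimensional "embeddings", not real model output.
rows = [
    ("text chunk", [1.0, 0.0, 0.0]),
    ("table chunk", [0.0, 1.0, 0.0]),
    ("chart chunk", [0.0, 0.0, 1.0]),
]
print(top_k([0.9, 0.1, 0.0], rows, k=2))  # ['text chunk', 'table chunk']
```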

Finally, we'll create a RAG chain that we can use to query our PDF in natural language.

In [16]:
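from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# The chain below needs a chat model bound to `llm`. The model name here is an
# assumption; substitute any chat model your NVIDIA API key can access.
llm = ChatNVIDIA(model="meta/llama-3.1-70b-instruct")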

template = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Keep the answer concise."
    "\n\n"
    "{context}"
    "\n\n"  # keep the question on its own line, separate from the context
    "Question: {question}"
)

prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
rag_chain.invoke("What is the dog doing and where?")

'The dog is chasing a squirrel in the front yard.'
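
As written, the chain hands the retriever's raw output to the prompt, so `{context}` is filled with the string form of a list of document objects. A common LangChain pattern is to join the documents' `page_content` into plain text first. The sketch below shows such a helper. `format_docs` is a name chosen here, and `FakeDoc` is a hypothetical stand-in for LangChain's `Document` so that the example runs without any services; the chunk strings are placeholders, not output from the example PDF.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str


def format_docs(docs) -> str:
    """Join the text of retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


# Placeholder chunks standing in for retriever output.
docs = [FakeDoc("first chunk"), FakeDoc("second chunk")]
print(format_docs(docs))
```

In the chain above, this would slot in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, since LangChain coerces a plain function into a runnable when it is piped from one.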