# Multimodal RAG with LangChain

This cookbook shows how to perform retrieval-augmented generation (RAG) on the table and text output of the nv-ingest PDF extraction pipeline. RAG over tables can be challenging because raw table data doesn't always work well with semantic similarity search. To account for this, we will generate summaries of the table data and run the similarity search on those summaries.

First, we'll load the text and table JSON content from the extracted nv-ingest metadata:

In [2]:
import json
from pathlib import Path

text_data = json.loads(Path("/raid/cjarrett/ingest_gh/nv-ingest/processed_docs/text/multimodal_test.pdf.metadata.json").read_text())
table_data = json.loads(Path("/raid/cjarrett/ingest_gh/nv-ingest/processed_docs/structured/multimodal_test.pdf.metadata.json").read_text())

text_content = [doc["metadata"]["content"] for doc in text_data]
table_content = [table["metadata"]["table_metadata"]["table_content"] for table in table_data]
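
The list comprehensions above assume every record contains the full key path. As a hedged sketch, a small helper can skip records where a key is missing or the content is empty. The helper name `extract_content` and the sample records are hypothetical; only the `metadata`, `content`, `table_metadata`, and `table_content` keys come from the code above.

```python
from typing import Any, Iterable


def extract_content(records: Iterable[dict], *keys: str) -> list[str]:
    """Follow `keys` into each record and collect non-empty string values.

    Hypothetical helper: records missing a key (or holding empty content)
    are skipped instead of raising KeyError.
    """
    results = []
    for record in records:
        value: Any = record
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                value = None
                break
            value = value[key]
        if isinstance(value, str) and value.strip():
            results.append(value)
    return results


# Made-up records that mimic the nesting used above.
sample_tables = [
    {"metadata": {"table_metadata": {"table_content": "| a | b |"}}},
    {"metadata": {"table_metadata": {}}},  # missing table_content
    {"metadata": {"table_metadata": {"table_content": "   "}}},  # blank content
]
sample_table_content = extract_content(
    sample_tables, "metadata", "table_metadata", "table_content"
)
print(sample_table_content)
```

With this helper, only the first sample record contributes a value; the other two are dropped silently, which may or may not be what you want for your data.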

Next, we'll create a chain that uses an LLM to summarize table or text chunks:

In [3]:
import os

# Placeholder: supply your own OpenAI API key before running the cells below.
os.environ["OPENAI_API_KEY"] = ""

In [5]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# The trailing backslash joins the two prompt lines; nothing may follow it on the line.
prompt_text = """You are an assistant tasked with summarizing tables and text. \
Give a concise summary of the table or text. Table or text chunk: {element} """
prompt = ChatPromptTemplate.from_template(prompt_text)

model = ChatOpenAI(temperature=0, model="gpt-4o")
summarize_chain = {"element": lambda x: x} | prompt | model | StrOutputParser()

Then we'll apply that chain to each of our text and table chunks:

In [None]:
table_summaries = summarize_chain.batch(table_content, {"max_concurrency": 5})
text_summaries = summarize_chain.batch(text_content, {"max_concurrency": 5})
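
The `max_concurrency` setting caps how many inputs are processed at once. The idea can be illustrated with a bounded thread pool. This is a conceptual sketch only, not LangChain's implementation, and `fake_summarize` and `batch_with_limit` are hypothetical stand-ins for the LLM call and the batching logic.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_summarize(chunk: str) -> str:
    """Hypothetical stand-in for an LLM call: keep the first five words."""
    return " ".join(chunk.split()[:5])


def batch_with_limit(func, inputs, max_concurrency=5):
    """Apply `func` to every input using at most `max_concurrency` workers.

    Results come back in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(func, inputs))


# Made-up chunks, used only to exercise the sketch.
chunks = [
    "Quarterly revenue grew in every region during the last fiscal year",
    "Table 1 lists each region together with its total revenue",
]
summaries = batch_with_limit(fake_summarize, chunks, max_concurrency=2)
print(summaries)
```

A lower limit is gentler on API rate limits, while a higher one finishes sooner when the provider allows it.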

Next, we'll create a multi-vector retriever, which lets us embed the generated summaries in a vector store for similarity search while keeping the raw text and tables in a document store, linked by a shared ID:

In [None]:
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma(collection_name="summaries", embedding_function=OpenAIEmbeddings())

store = InMemoryStore()
id_key = "doc_id"

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=store,
    id_key=id_key,
)

Now we can add the raw text and tables, along with their summaries, to our multi-vector retriever:

In [None]:
import uuid

# Text chunks: embed the summaries, keep the raw text in the docstore under the same ID.
doc_ids = [str(uuid.uuid4()) for _ in text_content]
summary_texts = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(text_summaries)
]
retriever.vectorstore.add_documents(summary_texts)
retriever.docstore.mset(list(zip(doc_ids, text_content)))

# Tables: same pattern, pairing each table summary with its raw table content.
table_ids = [str(uuid.uuid4()) for _ in table_content]
summary_tables = [
    Document(page_content=s, metadata={id_key: table_ids[i]})
    for i, s in enumerate(table_summaries)
]
retriever.vectorstore.add_documents(summary_tables)
retriever.docstore.mset(list(zip(table_ids, table_content)))
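
To make the mechanism concrete: the search runs over the summaries, and the ID stored in each summary's metadata is then used to return the original chunk. Below is a toy sketch in plain Python that uses word overlap as a stand-in for embedding similarity. `ToyMultiVectorStore` and its sample data are hypothetical and not part of LangChain.

```python
import uuid


class ToyMultiVectorStore:
    """Toy illustration: search summaries, return the linked raw chunks."""

    def __init__(self):
        self.summaries = []  # list of (summary_text, doc_id) pairs
        self.docstore = {}   # doc_id -> raw chunk

    def add(self, raw_chunks, summaries):
        """Store each raw chunk and its summary under a shared ID."""
        for raw, summary in zip(raw_chunks, summaries):
            doc_id = str(uuid.uuid4())
            self.summaries.append((summary, doc_id))
            self.docstore[doc_id] = raw

    def retrieve(self, query, k=1):
        """Rank summaries against the query and return the top-k raw chunks."""
        query_words = set(query.lower().split())

        def overlap(item):
            # Word overlap stands in for embedding similarity here.
            return len(query_words & set(item[0].lower().split()))

        ranked = sorted(self.summaries, key=overlap, reverse=True)
        return [self.docstore[doc_id] for _, doc_id in ranked[:k]]


toy = ToyMultiVectorStore()
toy.add(
    raw_chunks=["| region | revenue |\n| north | 10 |", "Shipping takes three days."],
    summaries=["table of revenue by region", "text about shipping time"],
)
toy_hits = toy.retrieve("revenue per region")
print(toy_hits)
```

The query matches the table's summary, but what comes back is the raw table, which is the content the LLM ultimately needs to answer from.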

Finally, we'll create a RAG chain that we can use to query our PDF in natural language:

In [None]:
from langchain_core.runnables import RunnablePassthrough

template = """You are an assistant for question-answering tasks. 
Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI(temperature=0, model="gpt-4o")

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [None]:
chain.invoke("What activity was the giraffe performing?")
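
The retriever returns a list of chunks, and that list is what fills the prompt's `{context}` slot. As an optional sketch that is not part of the chain above, a small helper could join the chunks into one readable string before they reach the prompt. `format_context` is a hypothetical name, and the sample chunks are made up.

```python
def format_context(chunks):
    """Join retrieved chunks into one string with a blank line between them.

    Hypothetical helper; accepts plain strings or objects exposing a
    `page_content` attribute.
    """
    texts = [getattr(chunk, "page_content", chunk) for chunk in chunks]
    return "\n\n".join(str(text) for text in texts)


example_context = format_context(["First chunk.", "| a | b |"])
print(example_context)
```

Assuming the usual LangChain pipe semantics, such a helper could be placed after the retriever, for example `{"context": retriever | format_context, ...}`, so the model sees the chunks separated by blank lines rather than as a Python list.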