#Multi-Modal RAG Application PDF

This notebook covers the following steps:

1. Downloading the source
2. Installing the necessary libraries
3. Extracting text, image and table elements with [unstructured's partition_pdf](https://docs.unstructured.io/open-source/core-functionality/partitioning) method
4. Creation of text, table and image summaries
5. Creation of faiss_index using `faiss-cpu`
6. Saving the vector database locally
7. Creating a custom LLM and embedding model using [mdb.ai](https://mdb.ai/models) endpoints
8. Loading the saved vector database and creating a prompt template
9. Creating a function that retrieves the content relevant to the user's question, which is then used as context when the LLM generates the answer
10. Testing


#Architecture
![Flowcharts](https://github.com/chakka-guna-sekhar-venkata-chennaiah/Mutli-Modal-RAG-ChaBot/assets/110555361/8e0788c4-8b87-4221-9d5a-9707ccccfce4)


#1. Downloading the source pdf
For the demonstration, we use the [Monuments-of-National-Importance](https://eacpm.gov.in/wp-content/uploads/2023/01/Monuments-of-National-Importance.pdf) report.


In [1]:
import requests

# URL of the PDF file
url = 'https://eacpm.gov.in/wp-content/uploads/2023/01/Monuments-of-National-Importance.pdf'

# Send a GET request
response = requests.get(url)

# Ensure the request was successful
if response.status_code == 200:
    # Update the file path to a valid location on your system
    with open('Monuments-of-National-Importance.pdf', 'wb') as f:
        f.write(response.content)
    print("PDF downloaded successfully!")
else:
    print("Failed to retrieve the PDF. Status code:", response.status_code)


ReadTimeout: HTTPSConnectionPool(host='eacpm.gov.in', port=443): Read timed out. (read timeout=None)
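
The `ReadTimeout` above reports `read timeout=None`, meaning the request was sent without a timeout. A minimal sketch of a download helper that sets one and raises on HTTP errors follows. The name `download_pdf`, the 60-second default, and the injectable `get` parameter are assumptions made for illustration and are not part of the original notebook.

```python
def download_pdf(url, dest, timeout=60, get=None):
    """Download url to dest with an explicit timeout and return dest.

    get is an optional stand-in for requests.get (handy for testing);
    when omitted, the real requests library is imported and used.
    """
    if get is None:
        import requests
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of inspecting status_code
    with open(dest, "wb") as f:
        f.write(response.content)
    return dest
```

Calling `download_pdf(url, 'Monuments-of-National-Importance.pdf')` would then either save the file or raise an exception that names the failure.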

#2. Installing the necessary libraries


In [2]:
! pip install langchain "unstructured[all-docs]" pydantic lxml openai chromadb tiktoken



In [3]:
!sudo apt install tesseract-ocr -y
!sudo apt install libtesseract-dev -y
!sudo apt-get install poppler-utils -y

'sudo' is not recognized as an internal or external command,
operable program or batch file.
'sudo' is not recognized as an internal or external command,
operable program or batch file.


'sudo' is not recognized as an internal or external command,
operable program or batch file.
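
The `'sudo' is not recognized` messages suggest this run was on Windows, where `apt` is not available, so Tesseract and Poppler may not have been installed by the cell above. A small sketch for checking whether the executables are reachable before calling `partition_pdf` follows. The helper name `find_tools` is hypothetical; `tesseract` and `pdftoppm` are the binaries shipped by the Tesseract and Poppler packages.

```python
import shutil

def find_tools(names=("tesseract", "pdftoppm")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

for name, path in find_tools().items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If either tool is reported missing, it needs to be installed with the platform's own installer and added to PATH.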


In [4]:
! pip install langchain-community langchain-core



#3. Extracting text, images and tables by using [unstructured's partition_pdf](https://docs.unstructured.io/open-source/core-functionality/partitioning) method


Set the folder that receives the extracted images. The later cells read them from `figures`, so the same folder is used here.


In [5]:
output_path = 'figures'

Calling `partition_pdf()` from `unstructured` with the parameters that matter for this pipeline: image and table extraction, table-structure inference, and title-based chunking.


In [6]:
import os
from unstructured.partition.pdf import partition_pdf

# Get elements
raw_pdf_elements = partition_pdf(
    filename="Monuments-of-National-Importance.pdf",
    strategy='auto',
    extract_images_in_pdf=True,
    extract_image_block_types=["Image", "Table"],
    infer_table_structure=True,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path=output_path,
)

Each extracted image is then encoded with Python's `base64` module.


In [7]:
import base64

text_elements = []
table_elements = []
image_elements = []

# Function to encode images
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

for element in raw_pdf_elements:
    if 'CompositeElement' in str(type(element)):
        text_elements.append(element)
    elif 'Table' in str(type(element)):
        table_elements.append(element)

table_elements = [i.text for i in table_elements]
text_elements = [i.text for i in text_elements]



In [None]:
for image_file in os.listdir("figures"):
    if image_file.endswith(('.png', '.jpg', '.jpeg')):
        image_path = os.path.join("figures", image_file)
        encoded_image = encode_image(image_path)
        image_elements.append(encoded_image)


#4. Creation of text, table and image summaries


In [18]:
text_elements = text_elements[:5]
text_elements 

['THE URGENT NEED FOR RATIONALIZATION\n\nJanuary 2023  \n\nSanjeev Sanyal\n\nJayasimha K R\n\nApurv Kumar Mishra\n\nAe G2 é worst at antic aereer ata IT AIT = Chairman wete Tay Economic Advisory Council to the Prime Minister fade cana Government of India BIBEK DEBROY Va sargitd} SH HTT 23" January, 2023 PREFACE | am pleased to introduce this report on “Monuments of National Importance: The Urgent Need for Rationalization”. India currently has 3695 Monuments of National Importance (MNI) and the responsibility of protecting them vests with the Archeological Survey of India (ASI). Since the list of MNI has not been comprehensively reviewed since independence it has become unwieldy. Therefore, the current list needs to be immediately scrutinised and rationalised. The Report analyses various problems associated with the current list of MINI including: (a) selection errors; (b) geographically skewed distribution of monuments; (c) inadequate expenditure on protection of monuments. Various min

In [19]:
# Set Google API key (never commit a real key; replace the placeholder or set the variable outside the notebook)
os.environ.setdefault("GOOGLE_API_KEY", "replace-your-google-api-key")
api_key = os.getenv("GOOGLE_API_KEY")

In [20]:
import os
import google.generativeai as genai
import time
from PIL import Image

genai.configure(api_key=api_key)
model = genai.GenerativeModel("gemini-1.5-flash")


In [21]:
def summarize_text(text_element):
    prompt = f"Summarize the following text:\n\n{text_element}\n\nSummary:"
    response = model.generate_content(prompt)
    return response.text


In [22]:
def summarize_table(table_element):
    prompt = f"Summarize the following table:\n\n{table_element}\n\nSummary:"
    response = model.generate_content(prompt)
    return response.text


In [None]:
def summarize_image(encoded_image):
    prompt = [
        {"type": "text", "text": "Describe the contents of this image."},
        {"type": "image_url", "image_url": f"data:image/jpeg;base64,{encoded_image}"}
    ]
    response = model.generate_content(prompt)
    return response.text

In [25]:
text_summaries = []
for i, te in enumerate(text_elements):
    summary = summarize_text(te)
    text_summaries.append(summary)
    print(f"{i + 1}th element of texts processed.")
    time.sleep(10)  # Optional delay if rate limits apply


1th element of texts processed.
2th element of texts processed.
3th element of texts processed.
4th element of texts processed.
5th element of texts processed.
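
The loop above spaces requests out with a fixed `time.sleep(10)`. A sketch of an alternative that retries a failed call with exponentially growing waits follows. The helper name `call_with_backoff`, the retry count, and the delays are assumptions; it catches a generic `Exception` because the notebook does not show which exception the Gemini client raises when a rate limit is hit.

```python
import time

def call_with_backoff(func, arg, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(arg); on failure wait base_delay, 2*base_delay, 4*base_delay, ... and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func(arg)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
```

In the loop, `summary = call_with_backoff(summarize_text, te)` could then stand in for the direct call.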


In [26]:
text_summaries

['This report, authored by members of the Economic Advisory Council to the Prime Minister of India, highlights the urgent need for rationalization of the list of Monuments of National Importance (MNI) in India. The current list, which includes 3695 monuments under the care of the Archaeological Survey of India (ASI), is deemed unwieldy and outdated, suffering from issues such as:\n\n* **Selection Errors:**  The list includes minor colonial structures, monuments with local significance, and even "untraceable" monuments. \n* **Uneven Geographic Distribution:** The distribution of MNI is geographically skewed, potentially overlooking important cultural sites in certain regions.\n* **Inadequate Funding and Maintenance:**  Insufficient funds are allocated for the preservation and upkeep of the MNI, and sustainable revenue generation models are lacking.\n\nThe report proposes key recommendations:\n\n* **Increased Funding:**  Allocate more funds for protection and maintenance of the MNI.\n* *

In [27]:
table_elements = table_elements[:2]

In [28]:
table_summaries = []
for i, te in enumerate(table_elements):
    try:
        summary = summarize_table(te)
        table_summaries.append(summary)
        print(f"{i + 1}th element of tables processed.")
    except Exception as e:
        print(f"Error processing element {i + 1}: {e}")
        continue
    time.sleep(10)  # Wait 10 seconds before the next request

1th element of tables processed.
2th element of tables processed.


In [29]:
table_elements

['I. II. Annexure B: List of British graves/cemeteries treated as monuments of national importance Annexure E: Standard Operating Procedure (SoP) followed by ASI for declaring',
 'State/ No. of State/ No. of Union Territory Monuments Union Territory Monuments Andhra Pradesh 135 Manipur 1 Arunachal Pradesh 3 Meghalaya 8 Assam 55 Mizoram 1 Bihar 70 Nagaland 4 Chhattisgarh 46 N.C.T. Delhi 173 Daman & Diu (U.T) 11 Odisha 80 Goa 21 Puducherry (U.T) 7 Gujarat 205 Punjab 33 Haryana 91 Rajasthan 163 Himachal Pradesh 40 Sikkim 3 Jammu & Kashmir (U.T) 56 Telangana 8 Jharkhand 13 Tamil Nadu 412 Karnataka 506 Tripura 8 Kerala 29 Uttar Pradesh 743 Ladakh (U.T) 15 Uttarakhand 43 Madhya Pradesh 291 West Bengal 135 Maharashtra 286 Total 3695 Source: Compiled by EAC-PM']

In [30]:
def summarize_image(image_file):
    # Open the image using the provided filename
    image_path = os.path.join("figures", image_file)
    image = Image.open(image_path)

    # Generate content using the image
    response = model.generate_content([
        "Describe the contents of this image.",
        image
    ])
    
    return response.text

In [99]:
image_elements = image_elements[:2]
image_elements

['/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCABABEwDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD0PxF4i8SaIt1PaeB11SzgJIltdUBYrngmPZuzjnA3Y/WvOD+0fbBf+RPbf6f2jx/6Lr1ibR431H+0IZpbe6/vxt1+o6GvCfjX4Tnl+Ikcmi6ZPPNfaet7cx2sRbLhmV3wBxwFJ9znvQB6t4E+JmkePo5YIbc6fqsQLGyeXf

In [100]:
image_summaries = []

# Process each image file in the figures directory
for i, image_file in enumerate(image_elements):
    try:
        summary = summarize_image(image_file)  # Pass only the filename
        image_summaries.append(summary)
        print(f"{i + 1}th element of images processed.")
    except Exception as e:
        print(f"Error processing element {i + 1}: {e}")
        continue
    time.sleep(30)  # Wait for 30 seconds before the next request

Error processing element 1: [Errno 2] No such file or directory: 'C:\\9j\\4AAQSkZJRgABAQAAAQABAAD\\2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL\\2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL\\wAARCABABEwDASIAAhEBAxEB\\8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL\\8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6\\8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL\\8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6\\9oADAMBAAIRAxEAPwD0PxF4i8SaIt1PaeB11SzgJIltdUBYrngmPZuzjnA3Y\\WvOD+0fbBf+RPbf6f2jx\\6Lr1ibR431H+0IZpbe6\\vxt1

In [101]:
image_summaries

[]
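
The error above arises because `summarize_image` joins its argument onto the `figures` folder, while `image_elements` holds base64 strings (the `/9j/` prefix is how JPEG data begins in base64). The string is therefore treated as a file path, every call fails, and `image_summaries` stays empty. A minimal sketch of one way to reconcile the two follows: decode the string back to bytes and open those bytes with Pillow. The helper names `decode_image_bytes` and `open_encoded_image` are hypothetical.

```python
import base64
import io

JPEG_MAGIC = b"\xff\xd8\xff"  # the first bytes of a JPEG file

def decode_image_bytes(encoded_image):
    """Reverse encode_image(): base64 text back to the raw image bytes."""
    return base64.b64decode(encoded_image)

def open_encoded_image(encoded_image):
    """Build a PIL image from a base64 string instead of from a file path."""
    from PIL import Image  # imported here so the decoder above has no Pillow dependency
    return Image.open(io.BytesIO(decode_image_bytes(encoded_image)))

# The recorded output starts with "/9j/4AAQ"; check that it decodes to a JPEG header.
print(decode_image_bytes("/9j/4AAQ").startswith(JPEG_MAGIC))
```

Passing `open_encoded_image(e)` to `model.generate_content([...])` in place of the file-based `Image.open` call should let the loop consume `image_elements` directly.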

#5. Creation of faiss_index using `faiss-cpu`


Installing the required library...


In [41]:
! pip install faiss-cpu



In [43]:
#gemini
! pip install --upgrade --quiet langchain-google-genai 


In [47]:
# Imports
import os
import uuid
import faiss
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Create documents list
documents = []
retrieve_contents = []

# Add text documents
for e, s in zip(text_elements, text_summaries):
    i = str(uuid.uuid4())
    doc = Document(
        page_content=s,
        metadata={
            'id': i,
            'type': 'text',
            'original_content': e
        }
    )
    retrieve_contents.append((i, e))
    documents.append(doc)

# Add table documents
for e, s in zip(table_elements, table_summaries):
    i = str(uuid.uuid4())
    doc = Document(
        page_content=s,
        metadata={
            'id': i,
            'type': 'table',
            'original_content': e
        }
    )
    retrieve_contents.append((i, e))
    documents.append(doc)

# Add image documents
for e, s in zip(image_elements, image_summaries):
    i = str(uuid.uuid4())
    doc = Document(
        page_content=s,
        metadata={
            'id': i,
            'type': 'image',
            'original_content': e
        }
    )
    retrieve_contents.append((i, e))
    documents.append(doc)

# Create the FAISS vector store with Gemini embeddings
vectorstore = FAISS.from_documents(documents=documents, embedding=embeddings)

# Verify vector store creation
print("Number of items in the vector store:", len(documents))


Number of items in the vector store: 7
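
The cell above indexes the *summaries* but keeps each original element in `metadata['original_content']`, so a search matches on the summary and hands back the original. A toy sketch of that pattern follows. It uses a word-overlap score purely as a stand-in for the Gemini embeddings and FAISS (this is not how FAISS ranks results), and the function names and the abbreviated sample contents are made up for illustration.

```python
def overlap_score(query, text):
    """Stand-in for embedding similarity: fraction of query words found in text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / max(len(query_words), 1)

def search_originals(query, docs, k=1):
    """Rank docs by their summary, but return the stored original content."""
    ranked = sorted(docs, key=lambda d: overlap_score(query, d["summary"]), reverse=True)
    return [d["original_content"] for d in ranked[:k]]

# Abbreviated, illustrative entries (not the real extracted elements)
docs = [
    {"summary": "table of monuments per state", "original_content": "Andhra Pradesh 135 ..."},
    {"summary": "preface of the report", "original_content": "PREFACE I am pleased ..."},
]
print(search_originals("monuments in each state", docs))
```

The design choice is that summaries tend to be shorter and cleaner to embed than raw OCR text or flattened tables, while the LLM still receives the full original as context.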



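Each summary becomes a `Document` whose metadata keeps the original element, and the documents are then indexed in a **vector store**. The commented-out cell below is the earlier OpenAI-embeddings version of the same step.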


In [111]:
# from langchain.vectorstores import FAISS
# import os
# import uuid
# import base64
# from IPython import display
# from unstructured.partition.pdf import partition_pdf
# from langchain.chat_models import ChatOpenAI
# from langchain.embeddings import OpenAIEmbeddings
# from langchain.chains import LLMChain
# from langchain.prompts import PromptTemplate
# from langchain.schema.messages import HumanMessage, SystemMessage
# from langchain.schema.document import Document
# from langchain.vectorstores import FAISS
# from langchain.retrievers.multi_vector import MultiVectorRetriever
# # Create Documents and Vectorstore
# documents = []
# retrieve_contents = []

# for e, s in zip(text_elements, text_summaries):
#     i = str(uuid.uuid4())
#     doc = Document(
#         page_content=s,
#         metadata={
#             'id': i,
#             'type': 'text',
#             'original_content': e
#         }
#     )
#     retrieve_contents.append((i, e))
#     documents.append(doc)

# for e, s in zip(table_elements, table_summaries):
#     i = str(uuid.uuid4())
#     doc = Document(
#         page_content=s,
#         metadata={
#             'id': i,
#             'type': 'table',
#             'original_content': e
#         }
#     )
#     retrieve_contents.append((i, e))
#     documents.append(doc)

# for e, s in zip(image_elements, image_summaries):
#     i = str(uuid.uuid4())
#     doc = Document(
#         page_content=s,
#         metadata={
#             'id': i,
#             'type': 'image',
#             'original_content': e
#         }
#     )
#     retrieve_contents.append((i, e))
#     documents.append(doc)

# # Create the vector database
# vectorstore = FAISS.from_documents(documents=documents, embedding=OpenAIEmbeddings(openai_api_key="replace-your-openai-api-key"))


#6. Saving the vector database locally


In [50]:
vectorstore.save_local("../faiss_index_pdf") #You can checkout the file in your contents
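
To the best of my knowledge, LangChain's `save_local` writes two files under the default index name: `index.faiss` (the FAISS index) and `index.pkl` (the pickled documents and ID mapping). The pickle is the reason the later `load_local` call passes `allow_dangerous_deserialization=True`. A small sketch for confirming that both files are present follows; the helper name `missing_index_files` is hypothetical.

```python
from pathlib import Path

def missing_index_files(folder, index_name="index"):
    """Return the expected FAISS files that are absent from folder."""
    expected = [f"{index_name}.faiss", f"{index_name}.pkl"]
    return [name for name in expected if not (Path(folder) / name).is_file()]

# An empty list means both files were found.
print(missing_index_files("../faiss_index_pdf"))
```

Since unpickling can execute arbitrary code, only load an index folder that you created yourself or that comes from a trusted source.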

#7. Creating Custom LLM and Embedding Models using MindsDB Endpoints


Before that, generate an API key from [mdb.ai](https://mdb.ai/models). By God's grace 😀, the API is free. The cell below is left commented out because the rest of this notebook uses Gemini instead.


In [112]:
# import base64
# from openai import OpenAI
# from langchain.vectorstores import FAISS
# from langchain.chains import LLMChain
# from langchain.prompts import PromptTemplate
# from IPython.display import display, Image
# from langchain.embeddings.base import Embeddings
# from langchain.llms.base import LLM
# from pydantic import BaseModel, Field

# # Initialize OpenAI/MindsDB client
# client = OpenAI(
#     api_key="replace-your-mdb.ai-api-key",
#     base_url="https://llm.mdb.ai/"
# )

# class MDBEmbeddings(Embeddings):
#     def __init__(self, client):
#         super().__init__()
#         self.client = client

#     def embed_query(self, text):
#         response = self.client.embeddings.create(
#             model="text-embedding-ada-002",
#             input=text,
#             encoding_format="float"
#         )
#         return response.data[0].embedding

#     def __call__(self, text):
#         return self.embed_query(text)

#     def embed_documents(self, texts):
#         return [self.embed_query(text) for text in texts]

# class MDBChatLLM(LLM):
#     client: OpenAI = Field(...)

#     def __init__(self, client):
#         super().__init__()
#         self.client = client

#     def _call(self, prompt, **kwargs):
#         completion = self.client.chat.completions.create(
#             model="gpt-3.5-turbo",
#             messages=[{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": prompt}],
#             stream=False
#         )
#         return completion.choices[0].message.content

#     @property
#     def _llm_type(self) -> str:
#         return "custom_mdb_chat"

# # Instantiate the embeddings and LLM classes
# embeddings = MDBEmbeddings(client=client)
# mdb_chat_llm = MDBChatLLM(client=client)

#8. Loading the saved vector database and creating a prompt template


In [86]:
from langchain_google_genai import GoogleGenerativeAI
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnableSequence
from langchain.chains import LLMChain


In [87]:
# Load the FAISS index with the same Gemini embeddings used to build it
db = FAISS.load_local("../faiss_index_pdf", embeddings, allow_dangerous_deserialization=True)

# llm = GoogleGenerativeAI(model="models/text-bison-001", google_api_key=api_key)
llm = ChatGoogleGenerativeAI(
        model="gemini-pro",
        google_api_key=api_key,
        temperature=0.7,
        convert_system_message_to_human=True
    )

# Define the prompt template for the LLMChain
prompt_template = """
You are an assistant tasked with summarizing tables and text.
Give a concise summary of the table or text.
Answer the question based only on the following context, which can include text, images, and tables:
{context}
Question: {question}
If you are not sure, decline to answer and say "Sorry, I don't have much information about it."
Just return the helpful answer in as much detail as possible.
Answer:
"""

prompt = PromptTemplate.from_template(prompt_template)

#9. Creating a function that retrieves the content relevant to the user's question, which is then used as context when the LLM generates the answer


In [89]:
qa_chain = LLMChain(llm=llm, prompt=prompt)

# Define the answer function to retrieve content and answer queries
def answer(question):
    # Retrieve relevant documents from FAISS
    relevant_docs = db.similarity_search(question)
    # Initialize context and images list
    context_parts = []
    relevant_images = []
    
    # Build context from retrieved documents
    for doc in relevant_docs:
        doc_type = doc.metadata.get('type', 'unknown')
        
        if doc_type == 'text':
            context_parts.append(f'[text]{doc.metadata["original_content"]}')
        elif doc_type == 'table':
            context_parts.append(f'[table]{doc.metadata["original_content"]}')
        elif doc_type == 'image':
            context_parts.append(f'[image]{doc.page_content}')
            if 'original_content' in doc.metadata:
                relevant_images.append(doc.metadata['original_content'])

    # Combine all context parts
    context = "\n".join(context_parts)

    # Get the answer
    result = qa_chain.run(context=context, question=question)
    
    return result, relevant_images
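
The context-building branch of `answer()` can be exercised without FAISS or an API key. The sketch below mirrors that branching in a standalone function and feeds it stand-in objects. `build_context` and the fake documents are hypothetical, introduced only to show what context string and image list the function would produce.

```python
from types import SimpleNamespace

def build_context(docs):
    """Mirror the branching in answer(): originals for text/tables, summaries for images."""
    parts, images = [], []
    for doc in docs:
        doc_type = doc.metadata.get("type", "unknown")
        if doc_type in ("text", "table"):
            parts.append(f"[{doc_type}]{doc.metadata['original_content']}")
        elif doc_type == "image":
            parts.append(f"[image]{doc.page_content}")
            if "original_content" in doc.metadata:
                images.append(doc.metadata["original_content"])
    return "\n".join(parts), images

# Hypothetical stand-ins for the Document objects that FAISS would return
fake_docs = [
    SimpleNamespace(page_content="summary of a table",
                    metadata={"type": "table", "original_content": "Goa 21 Gujarat 205"}),
    SimpleNamespace(page_content="a photo of a fort",
                    metadata={"type": "image", "original_content": "/9j/4AAQ"}),
]
context, images = build_context(fake_docs)
print(context)
print(images)
```

Text and table hits contribute their original content, whereas image hits contribute only the summary to the prompt and return the base64 payload separately for display.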

#10. Testing


In [96]:
# Example usage
question = "List the number of Monuments in each State"
result, relevant_images = answer(question)



In [97]:
# result  # retrieved result from the LLM
result, relevant_images

('| State/ Union Territory | No. of Monuments |\n|---|---|\n| Andhra Pradesh | 135 |\n| Arunachal Pradesh | 3 |\n| Assam | 55 |\n| Bihar | 70 |\n| Chhattisgarh | 46 |\n| Daman & Diu (U.T) | 11 |\n| Goa | 21 |\n| Gujarat | 205 |\n| Haryana | 91 |\n| Himachal Pradesh | 40 |\n| Jammu & Kashmir (U.T) | 56 |\n| Jharkhand | 13 |\n| Karnataka | 506 |\n| Kerala | 29 |\n| Ladakh (U.T) | 15 |\n| Madhya Pradesh | 291 |\n| Maharashtra | 286 |\n| Manipur | 1 |\n| Meghalaya | 8 |\n| Mizoram | 1 |\n| Nagaland | 4 |\n| N.C.T. Delhi | 173 |\n| Odisha | 80 |\n| Puducherry (U.T) | 7 |\n| Punjab | 33 |\n| Rajasthan | 163 |\n| Sikkim | 3 |\n| Tamil Nadu | 412 |\n| Telangana | 8 |\n| Tripura | 8 |\n| Uttar Pradesh | 743 |\n| Uttarakhand | 43 |\n| West Bengal | 135 |',
 [])

In [95]:
# Display the most relevant image, if any were retrieved
from IPython.display import display, Image as IPImage
if relevant_images:
    display(IPImage(base64.b64decode(relevant_images[0])))
else:
    print("No relevant images were retrieved.")

#Conclusion

Check out the [GitHub repository](https://github.com/chakka-guna-sekhar-venkata-chennaiah/Mutli-Modal-RAG-ChaBot) for the Streamlit deployment.

If you appreciate this project, kindly show your support by ⭐ starring the repository and voting for me on Quria. Your encouragement would mean a lot! Additionally, I'd be grateful if you could like 👍, share, and follow me on [LinkedIn](https://linkedin.com/in/chakka-guna-sekhar-venkata-chennaiah-7a6985208/) to stay connected and get updates on my latest work. Thank you! 🙏✨
