# Testing Large Language Model Agents with Retrieval-Augmented Generation
In this notebook, we'll explore how to test various large language model (LLM) agents powered by Retrieval-Augmented Generation (RAG). We'll walk through the entire process, from loading and preprocessing multimodal data to querying exam questions using different RAG strategies.

## Table of Contents

1. [Setup](#1-setup)
2. [Data Loading and Preprocessing](#2-data-loading-and-preprocessing)
    - [2.1. Load Multimodal Data](#21-load-multimodal-data)
    - [2.2. Extract and Partition Text, Tables, and Images](#22-extract-and-partition-text-tables-and-images)
3. [Connecting to the Language Model](#3-connecting-to-the-language-model)
4. [Generating Summaries for Multimodal Data](#4-generating-summaries-for-multimodal-data)
    - [4.1. Create Text and Table Summaries](#41-create-text-and-table-summaries)
    - [4.2. Create Image Summaries](#42-create-image-summaries)
5. [Building Vector Retrievers](#5-building-vector-retrievers)
    - [5.1. Access Embedding Model](#51-access-embedding-model)
    - [5.2. Create Utility Functions](#52-create-utility-functions)
    - [5.3. Initiate Vectorstores: Chroma](#53-initiate-vectorstores-chroma)
    - [5.4. Initiate Docstores: Redis and InMemoryStore](#54-initiate-docstores-redis-and-inmemorystore)
    - [5.5. Create Retrievers](#55-create-retrievers)
6. [Querying Exam Questions](#6-querying-exam-questions)
    - [6.1. Read the Excel File](#61-read-the-excel-file)
    - [6.2. Define Helper Functions](#62-define-helper-functions)
    - [6.3. Define RAG Functions](#63-define-rag-functions)
    - [6.4. Process Each Question and Collect Responses](#64-process-each-question-and-collect-responses)

## 1. Setup

Before diving into the core functionalities, let's set up our environment by importing the necessary libraries and configuring essential settings.

In [None]:
import os
import uuid
import pandas as pd
import re
import base64
import htmltabletomd
import logging
import requests
import time
import ipywidgets as widgets
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

In [None]:
from io import BytesIO, StringIO
from PIL import Image as PILImage
from IPython.display import display, Markdown, HTML

In [None]:
# LangChain and related libraries
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain_community.storage import RedisStore
from langchain_community.utilities.redis import get_client
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain.storage import InMemoryStore

## 2. Data Loading and Preprocessing

In this section, we'll load multimodal data (PDFs containing text, tables, and images) and preprocess it for further analysis.

### 2.1. Load Multimodal Data

We'll start by locating all PDF files within the specified directory and its subdirectories. This setup ensures that we process all relevant documents while excluding hidden files and directories.

In [None]:
# Remove existing figures to ensure a clean workspace
!rm -rf ./figures

In [None]:
# Directory containing the PDFs
pdf_dir = './references'

# Collect all PDF files from the directory and subdirectories, excluding hidden ones
pdf_files = []

for root, dirs, files in os.walk(pdf_dir):
    # Exclude hidden directories
    dirs[:] = [d for d in dirs if not d.startswith('.')]
    for file in files:
        # Exclude hidden files and ensure the file has a .pdf extension
        if file.lower().endswith('.pdf') and not file.startswith('.'):
            pdf_files.append(os.path.join(root, file))
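
To check the hidden-file filtering on a throwaway directory tree, the same walk can be wrapped in a function. This is a minimal sketch: the name `find_pdfs` and the temporary file layout are illustrative and not part of the notebook.

```python
import os
import tempfile


def find_pdfs(root_dir):
    """Return sorted paths of non-hidden PDFs under root_dir, skipping hidden directories."""
    found = []
    for root, dirs, files in os.walk(root_dir):
        # Pruning dirs in place stops os.walk from descending into hidden directories
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if name.lower().endswith('.pdf') and not name.startswith('.'):
                found.append(os.path.join(root, name))
    return sorted(found)


# Throwaway tree: one visible PDF, one hidden PDF, one PDF inside a hidden directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    os.makedirs(os.path.join(tmp, '.cache'))
    for rel in ('sub/paper.PDF', '.hidden.pdf', '.cache/skipped.pdf', 'notes.txt'):
        open(os.path.join(tmp, *rel.split('/')), 'w').close()
    pdf_names = [os.path.relpath(p, tmp) for p in find_pdfs(tmp)]

print(pdf_names)
```

Only the visible PDF under `sub` should be listed; the extension check is case-insensitive, so `paper.PDF` is kept.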

### 2.2. Extract and Partition Text, Tables, and Images

Next, we'll extract the content from each PDF using UnstructuredPDFLoader. The loader is configured to extract text, tables, and images, and to partition the content into manageable chunks based on titles.

In [None]:
# Initialize an empty list to hold data from all PDFs
data = []

# Loop through each PDF file and load its content
for pdf_file in pdf_files:
    print(f'Loading {pdf_file}')
    loader = UnstructuredPDFLoader(
        file_path=pdf_file,
        strategy='hi_res',
        extract_images_in_pdf=True,
        infer_table_structure=True,
        chunking_strategy="by_title",     # Section-based chunking
        max_characters=4000,              # Max size of chunks
        new_after_n_chars=4000,           # Preferred size of chunks
        combine_text_under_n_chars=2000,  # Combine smaller chunks
        mode='elements',
        image_output_dir_path='./figures'
    )
    data.extend(loader.load())

In [None]:
# Separate documents and tables based on metadata
docs = []
tables = []

for doc in data:
    # Use .get so an element without a 'category' key is skipped instead of raising KeyError
    category = doc.metadata.get('category')
    if category == 'Table':
        tables.append(doc)
    elif category == 'CompositeElement':
        docs.append(doc)

# Display the number of documents and tables extracted
len(docs), len(tables)
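
The split above keeps only two categories and silently drops everything else. A quick tally shows what is being left out. The sketch below uses `SimpleNamespace` objects as stand-ins for the loaded elements, and `count_categories` is a hypothetical helper name.

```python
from collections import Counter
from types import SimpleNamespace


def count_categories(elements):
    """Tally the 'category' metadata value of each element; missing values count as 'Unknown'."""
    return Counter(el.metadata.get('category', 'Unknown') for el in elements)


# Stand-ins for loaded elements (hypothetical sample data)
sample = [
    SimpleNamespace(metadata={'category': 'CompositeElement'}),
    SimpleNamespace(metadata={'category': 'CompositeElement'}),
    SimpleNamespace(metadata={'category': 'Table'}),
    SimpleNamespace(metadata={}),
]
category_counts = count_categories(sample)
print(category_counts)
```

In the notebook, `count_categories(data)` would give the same breakdown for the real elements.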

In [None]:
# Convert HTML tables to Markdown for easier readability and processing
for table in tables:
    table.page_content = htmltabletomd.convert_table(table.metadata['text_as_html'])

## 3. Connecting to the Language Model

To interact with the OpenAI language models, we'll establish a connection using the OpenAI API. You'll be prompted to enter your API key securely.

In [None]:
from getpass import getpass

# Prompt the user to enter their OpenAI API Key securely
OPENAI_KEY = getpass('Enter OpenAI API Key: ')
os.environ['OPENAI_API_KEY'] = OPENAI_KEY

In [None]:
# Initialize the ChatOpenAI model with desired parameters
chatgpt = ChatOpenAI(model_name='gpt-4o', temperature=0)

## 4. Generating Summaries for Multimodal Data

The retrievers we build later index short summaries rather than the raw content, so we first summarize every text chunk, table, and image with the language model. Each summary is embedded for semantic search and points back to the raw element it describes.

### 4.1. Create Text and Table Summaries

Using a tailored prompt, we'll instruct the language model to generate detailed summaries of text and tables. These summaries are designed to be easily embedded and retrieved later.

In [None]:
# Define the prompt template for summarization
prompt_text = """
You are an assistant tasked with summarizing tables and text particularly for semantic retrieval.
These summaries will be embedded and used to retrieve the raw text or table elements.
Give a detailed summary of the table or text below that is well optimized for retrieval.
For any tables also add in a one line description of what the table is about besides the summary.
Do not add additional words like Summary: etc.

Table or text chunk:
{element}
"""
prompt = ChatPromptTemplate.from_template(prompt_text)

# Define the summarization chain
summarize_chain = (
    {"element": RunnablePassthrough()}
      |
    prompt
      |
    chatgpt
      |
    StrOutputParser()  # Extracts the response as text and returns it as a string
)

In [None]:
# Initialize lists to hold summaries
text_summaries = []
table_summaries = []

# Prepare documents for summarization
text_docs = [doc.page_content for doc in docs]
table_docs = [table.page_content for table in tables]

# Generate summaries in batches with concurrency
text_summaries = summarize_chain.batch(text_docs, {"max_concurrency": 5})
table_summaries = summarize_chain.batch(table_docs, {"max_concurrency": 5})

# Display the number of summaries generated
len(text_summaries), len(table_summaries)

### 4.2. Create Image Summaries

Images require special handling. We'll encode images to Base64 and generate summaries that describe their content, making them suitable for retrieval-based tasks.

In [None]:
# Function to encode images to Base64
def encode_image(image_path):
    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

In [None]:
# Function to generate image summaries using the language model
def image_summarize(img_base64, prompt):
    chat = ChatOpenAI(model="gpt-4o", temperature=0)

    msg = chat.invoke(
        [
            HumanMessage(
                content=[
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"},
                    },
                ]
            )
        ]
    )
    return msg.content

In [None]:
# Function to generate summaries for all images in a directory
def generate_img_summaries(path):
    """
    Generate summaries and base64 encoded strings for images
    path: Directory containing the .jpg files extracted by Unstructured
    """

    # Lists to store Base64 encoded images and their summaries
    img_base64_list = []
    image_summaries = []

    # Define the prompt for image summarization
    prompt = """You are an assistant tasked with summarizing images for retrieval.
                Remember these images could potentially contain graphs, charts or tables also.
                These summaries will be embedded and used to retrieve the raw image for question answering.
                Give a detailed summary of the image that is well optimized for retrieval.
                Do not add additional words like Summary: etc.
             """

    # Process each image file in the directory
    for img_file in sorted(os.listdir(path)):
        if img_file.endswith(".jpg"):
            img_path = os.path.join(path, img_file)
            base64_image = encode_image(img_path)
            img_base64_list.append(base64_image)
            image_summaries.append(image_summarize(base64_image, prompt))

    return img_base64_list, image_summaries

In [None]:
# Path to the directory containing extracted images
IMG_PATH = './figures'

# Generate Base64 encoded images and their summaries
imgs_base64, image_summaries = generate_img_summaries(IMG_PATH)

# Display the number of images processed
len(imgs_base64), len(image_summaries)

## 5. Building Vector Retrievers

In RAG, the retriever decides which context the model sees when it answers. We'll build two retrievers: a multimodal one over text, tables, and images, and a single-modal one over text only.

### 5.1. Access Embedding Model

We'll use OpenAI's embedding model to convert our summaries into vector representations suitable for retrieval.

In [None]:
# Initialize the OpenAI embedding model
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-large')

### 5.2. Create Utility Functions

The utility function below builds a `MultiVectorRetriever`. Summaries are embedded into the vector store, while the raw contents go into the docstore; each summary and its raw content share a `doc_id`.

In [None]:
# Function to create a multi-vector retriever
def create_multi_vector_retriever(
    docstore, vectorstore, text_summaries, texts, table_summaries, tables, image_summaries, images
):

    id_key = "doc_id"

    # Initialize the MultiVectorRetriever
    retriever = MultiVectorRetriever(
        vectorstore=vectorstore,
        docstore=docstore,
        id_key=id_key,
    )

    # Helper function to add documents to the retriever
    def add_documents(retriever, doc_summaries, doc_contents):
        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
        summary_docs = [
            Document(page_content=s, metadata={id_key: doc_ids[i]})
            for i, s in enumerate(doc_summaries)
        ]
        retriever.vectorstore.add_documents(summary_docs)
        retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

    # Add text summaries and their contents
    if text_summaries:
        add_documents(retriever, text_summaries, texts)
    # Add table summaries and their contents
    if table_summaries:
        add_documents(retriever, table_summaries, tables)
    # Add image summaries and their contents
    if image_summaries:
        add_documents(retriever, image_summaries, images)

    return retriever
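
The core idea is to search over summaries but return the raw content. The toy below illustrates it without any external services. It is a sketch, not LangChain's implementation: `ToyMultiVectorIndex` and `word_overlap` are hypothetical names, and word overlap stands in for embedding similarity.

```python
import uuid


def word_overlap(a, b):
    """Jaccard overlap between the word sets of two strings (a crude stand-in for embedding similarity)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


class ToyMultiVectorIndex:
    """Searches over summaries but returns the raw content stored under the same doc_id."""

    def __init__(self):
        self.summaries = []  # (summary, doc_id) pairs: plays the role of the vector store
        self.docstore = {}   # doc_id -> raw content: plays the role of the docstore

    def add(self, summaries, contents):
        if len(summaries) != len(contents):
            raise ValueError("summaries and contents must have the same length")
        for summary, content in zip(summaries, contents):
            doc_id = str(uuid.uuid4())
            self.summaries.append((summary, doc_id))
            self.docstore[doc_id] = content

    def retrieve(self, query, k=1):
        ranked = sorted(self.summaries, key=lambda pair: word_overlap(query, pair[0]), reverse=True)
        return [self.docstore[doc_id] for _, doc_id in ranked[:k]]


index = ToyMultiVectorIndex()
index.add(
    ["quarterly revenue table by region", "diagram of the network architecture"],
    ["| Region | Revenue |\n| :--- | :--- |\n| EU | 10 |", "<base64 image data>"],
)
hits = index.retrieve("revenue by region")
print(hits[0])
```

The query matches the first summary, yet the result is the raw Markdown table. The length check in `add` is a deliberate addition: a mismatch between summaries and contents would otherwise pair summaries with the wrong raw elements.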

### 5.3. Initiate Vectorstores: Chroma

Chroma serves as our vector store, indexing the summaries and their embeddings for efficient retrieval.

In [None]:
# Initialize the Chroma vectorstore for multimodal data
chroma_db_multimodal = Chroma(
    collection_name="mm_rag",
    embedding_function=openai_embed_model,
    collection_metadata={"hnsw:space": "cosine"},
)

In [None]:
# Initialize the Chroma vectorstore for single-modal data
chroma_db_single_modal = Chroma(
    collection_name="text_rag",
    embedding_function=openai_embed_model,
    collection_metadata={"hnsw:space": "cosine"},
)
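
The `"hnsw:space": "cosine"` setting selects cosine distance, that is, 1 minus the cosine similarity of two embeddings. As a reminder of what that metric computes, here is a small pure-Python sketch. It is independent of Chroma's own implementation, and `cosine_distance` is an illustrative name.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])  # close to 0: same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])      # close to 1: unrelated directions
```

Because the metric ignores vector length, two summaries pointing in the same direction are treated as equally close regardless of magnitude.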

### 5.4. Initiate Docstores: Redis and InMemoryStore

Docstores store the raw documents corresponding to the summaries. We'll use Redis for the multimodal retriever and an in-memory store for the single-modal retriever.

**Note:** Before proceeding, ensure that Redis Stack Server is installed and running. You can set it up by executing the following commands in JupyterLab's terminal:

```bash
# 1. Import the GPG key for the Redis repository
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg

# 2. Add the Redis repository to your sources list
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] \
https://packages.redis.io/deb $(lsb_release -cs) main" | \
sudo tee /etc/apt/sources.list.d/redis.list

# 3. Update package lists
sudo apt-get update

# 4. Install Redis Stack Server
sudo apt-get install redis-stack-server

# 5. Start Redis Stack Server in the background
redis-stack-server --daemonize yes
```

In [None]:
from langchain.storage import InMemoryStore
from langchain_community.utilities.redis import get_client
from langchain_community.storage import RedisStore

# Initialize Redis client
client = get_client('redis://localhost:6379')

# Initialize RedisStore for multimodal retriever
redis_store = RedisStore(client=client)  # Alternative stores like filestore or memorystore can also be used

In [None]:
# Initialize InMemoryStore for single-modal retriever
docstore_single_modal = InMemoryStore()
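
One practical difference between the two stores: values read back from a Redis-backed store typically arrive as `bytes`, while an in-memory store returns the objects as they were stored. The helper functions later in this notebook decode `bytes` for this reason. A minimal sketch of that normalization, with `to_text` as a hypothetical name:

```python
def to_text(value):
    """Return value as str, decoding bytes as UTF-8 and dropping undecodable bytes."""
    if isinstance(value, bytes):
        return value.decode('utf-8', errors='ignore')
    return str(value)


from_bytes = to_text(b'| a | b |')  # as it might come back from Redis
from_str = to_text('| a | b |')     # as it might come back from an in-memory store
```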

### 5.5. Create Retrievers

With our vector stores and document stores set up, we'll create both multimodal and single-modal retrievers.

In [None]:
# Create the multimodal retriever
retriever_multimodal = create_multi_vector_retriever(
    redis_store,
    chroma_db_multimodal,
    text_summaries,
    text_docs,
    table_summaries,
    table_docs,
    image_summaries,
    imgs_base64,
)

# Display the multimodal retriever
retriever_multimodal

In [None]:
# Create the single-modal retriever (only text)
retriever_single_modal = create_multi_vector_retriever(
    docstore_single_modal,
    chroma_db_single_modal,
    text_summaries,
    text_docs,
    table_summaries=[],  # No table summaries
    tables=[],
    image_summaries=[],  # No image summaries
    images=[],
)

# Display the single-modal retriever
retriever_single_modal

## 6. Querying Exam Questions

Now that our data is preprocessed and our retrievers are set up, we'll proceed to query exam questions using different RAG strategies.

### 6.1. Read the Excel File

We'll start by loading the exam questions from an Excel file into a pandas DataFrame. This setup allows us to iterate through each question systematically.

In [None]:
# Configure logging to suppress unnecessary logs
logging.basicConfig(level=logging.CRITICAL, format='%(levelname)s: %(message)s')

# Disable specific external library loggers to reduce clutter
logging.getLogger('openai').disabled = True
logging.getLogger('urllib3').disabled = True
logging.getLogger('requests').disabled = True
logging.getLogger('httpx').disabled = True

# Create a dedicated logger for the application
logger = logging.getLogger('EA_Exam_Processor')
logger.setLevel(logging.WARNING)  # Only WARNING and above will be handled

In [None]:
# Path to the Excel file containing exam questions
excel_file_path = './EA QAs/EA QAs.xlsx'

# Read the Excel file into a DataFrame
questions_df = pd.read_excel(excel_file_path, sheet_name='EA QAs')

# Ensure the columns are correctly loaded
print("Columns in the Excel file:", questions_df.columns)
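
The main loop later reads the columns `#`, `Question`, `Image`, `Option`, and `Answer` by name, so a misspelled header would only surface as a `KeyError` mid-run. A small check up front can catch that earlier. This is a sketch; `missing_columns` is a hypothetical helper that works on any iterable of column names.

```python
REQUIRED_COLUMNS = ('#', 'Question', 'Image', 'Option', 'Answer')


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent (exact match, no whitespace trimming)."""
    present = set(columns)
    return [name for name in required if name not in present]


# Hypothetical header row: 'Question ' has a stray space and 'Image' is absent
missing = missing_columns(['#', 'Question ', 'Option', 'Answer'])
print(missing)
```

The match is exact on purpose: a header such as `'Question '` with a trailing space would also fail a lookup by `'Question'`. In the notebook this could be called as `missing_columns(questions_df.columns)`.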

### 6.2. Define Helper Functions

The helpers below extract the answer letter from a model response, encode and resize images, and classify retrieved content as text, table, or image.

In [None]:
# Function to extract the answer from the model's response
def extract_answer(text):
    """
    Extracts the agent's answer, ensuring it is one of A, B, C, or D.
    Returns the uppercase letter if found, otherwise returns None
    (callers are responsible for logging the failure).
    """
    # Normalize the text
    text = text.strip().upper()

    # Most specific pattern first: an explicit "Answer: A" or "Answer: (A)"
    match = re.search(r'ANSWER[:\s]*\(?([ABCD])\b', text)
    if match:
        return match.group(1)

    # Next: a letter followed by closing punctuation, e.g. "A)" or "B."
    match = re.search(r'\b([ABCD])[).]', text)
    if match:
        return match.group(1)

    # Fallback: any standalone A, B, C, or D. After uppercasing, this can
    # misfire on the article "a" in longer responses, so it is tried last.
    match = re.search(r'\b([ABCD])\b', text)
    if match:
        return match.group(1)

    return None

In [None]:
# Function to encode the image to Base64
def encode_image(image_path):
    """
    Encodes an image file to a Base64 string after resizing it to a maximum size.
    """
    try:
        with PILImage.open(image_path) as img:
            # Resize the image while maintaining aspect ratio
            img.thumbnail((400, 400), PILImage.LANCZOS)
            # Save the image to a BytesIO object
            buffered = BytesIO()
            img.save(buffered, format="JPEG", quality=85)
            # Encode the image to Base64 and decode to string
            img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
            return img_str
    except FileNotFoundError:
        # Log a warning if the image file is not found
        logger.warning(f"Image file {image_path} not found.")
        return None
    except Exception as e:
        # Log any other exceptions
        logger.warning(f"Error encoding image {image_path}: {e}")
        return None

In [None]:
# Function to resize a Base64-encoded image
def resize_base64_image(img_base64, max_size=(400, 400)):
    """
    Resizes a Base64-encoded image to the specified maximum size.
    """
    try:
        img_data = base64.b64decode(img_base64)
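        img = PILImage.open(BytesIO(img_data))
        # JPEG has no alpha channel, so convert RGBA or palette images before saving
        if img.mode != "RGB":
            img = img.convert("RGB")
        img.thumbnail(max_size, PILImage.LANCZOS)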
        buffered = BytesIO()
        img.save(buffered, format="JPEG", quality=85)
        resized_img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
        return resized_img_str
    except Exception as e:
        # Log any exceptions during resizing
        logger.warning(f"Error resizing image: {e}")
        return img_base64  # Return original if resizing fails

In [None]:
# Function to check if a string looks like Base64
def looks_like_base64(sb):
    """Check if the string looks like base64"""
    return re.match("^[A-Za-z0-9+/]+[=]{0,2}$", sb) is not None

In [None]:
# Function to check if Base64 data is an image
def is_image_data(b64data):
    """
    Check if the base64 data is an image by looking at the start of the data
    """
    # Magic-number prefixes for common image formats
    image_signatures = {
        b"\xff\xd8\xff": "jpg",
        b"\x89\x50\x4e\x47\x0d\x0a\x1a\x0a": "png",
        b"\x47\x49\x46\x38": "gif",
        b"\x52\x49\x46\x46": "webp",  # "RIFF" container prefix, shared by WebP
    }
    try:
        header = base64.b64decode(b64data)[:8]  # Decode and get the first 8 bytes
        return any(header.startswith(sig) for sig in image_signatures)
    except Exception:
        return False
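
The two checks complement each other: the Base64 pattern alone also accepts ordinary words, so the magic-byte test does the real work. The sketch below combines both and returns a MIME type, which could be used to label a data URL when an image is passed on without being re-encoded as JPEG. `sniff_image_mime` and `IMAGE_SIGNATURES` are hypothetical names, not part of the notebook.

```python
import base64
import re

# Leading bytes for a few common image formats
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF8": "image/gif",
}


def sniff_image_mime(b64data):
    """Return a MIME type if b64data decodes to a known image header, else None."""
    if re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", b64data) is None:
        return None
    try:
        header = base64.b64decode(b64data)[:8]
    except ValueError:  # binascii.Error (bad padding) is a subclass of ValueError
        return None
    for signature, mime in IMAGE_SIGNATURES.items():
        if header.startswith(signature):
            return mime
    return None


# "Test" is valid Base64 text, but its decoded bytes match no image signature
plain_word = sniff_image_mime("Test")
# A JPEG header (not a full image), Base64-encoded for illustration
jpeg_like = sniff_image_mime(base64.b64encode(b"\xff\xd8\xff\xe0" + b"\x00" * 8).decode("ascii"))
```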

In [None]:
# Function to detect if text contains a Markdown table
def detect_markdown_table(text):
    """
    Detects if the text contains a Markdown-formatted table.
    """
    lines = text.strip().split('\n')
    if len(lines) >= 2:
        # Check for header separator line (e.g., | --- | --- | or | :--- | ---: |)
        header_line = lines[1].strip()
        if re.match(r'^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)+\|?\s*$', header_line):
            return True
    return False

In [None]:
# Function to split documents into images, texts, and tables
def split_docs_into_images_texts_tables(docs):
    """
    Splits documents into images, texts, and tables.
    """
    images = []
    texts = []
    tables = []
    for doc in docs:
        # Extract content and metadata
        if isinstance(doc, Document):
            content = doc.page_content
            metadata = doc.metadata
        else:
            content = doc
            metadata = {}
    
        # Ensure content is a string
        if isinstance(content, bytes):
            content = content.decode('utf-8', errors='ignore')
    
        # Extract category from metadata
        category = metadata.get('category', '').lower()
    
        # Check if the document is a table based on metadata or content
        if category == 'table':
            tables.append({'content': content, 'metadata': metadata})
            continue
        elif '<table' in content.lower():
            tables.append({'content': content, 'metadata': metadata})
            continue
        elif detect_markdown_table(content):
            tables.append({'content': content, 'metadata': metadata})
            continue
    
        # Remove data URL prefix if present
        if content.startswith('data:image'):
            content = content.split(',', 1)[1]
    
        # Check if content is an image
        if looks_like_base64(content) and is_image_data(content):
            images.append(content)
        else:
            texts.append(content)
    return {'images': images, 'texts': texts, 'tables': tables}

In [None]:
# Function to limit text length
def limit_text_length(text, max_words=100):
    """
    Truncates the input text to a maximum number of words.

    Parameters:
    - text (str): The input text to truncate.
    - max_words (int): The maximum number of words to retain.

    Returns:
    - str: The truncated text.
    """
    words = text.split()
    return ' '.join(words[:max_words])

In [None]:
# Function to display a Base64-encoded image using matplotlib
def display_base64_image(img_base64):
    """
    Displays a Base64-encoded image using matplotlib.
    """
    try:
        img_data = base64.b64decode(img_base64)
        img = PILImage.open(BytesIO(img_data))
        plt.figure(figsize=(6,6))
        plt.imshow(img)
        plt.axis('off')
        plt.show()
    except Exception as e:
        logger.warning(f"Error displaying image: {e}")

In [None]:
# Function to make direct API calls with images
def call_openai_api_with_image(messages, max_tokens=300):
    """
    Makes a direct API call to OpenAI's Chat Completion endpoint with structured messages.
    """
    api_key = os.getenv('OPENAI_API_KEY')
    if not api_key:
        logger.error("OpenAI API key not set in environment variables.")
        return None
    
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    
    payload = {
        "model": "gpt-4o",
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9
    }
    
    try:
        # A timeout keeps a stalled request from blocking the whole run
        response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload, timeout=60)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as http_err:
        logger.warning(f"HTTP error occurred: {http_err} - Response: {response.text}")
    except Exception as err:
        logger.warning(f"Other error occurred: {err}")
    return None

### 6.3. Define RAG Functions

We'll define functions to handle both RAG and non-RAG strategies for answering multiple-choice questions. These functions manage the retrieval of relevant documents and interact with the language model to generate answers.

In [None]:
# Modified RAG function using direct API calls with the updated invoke method
def rag_multiple_choice_qa(question, options, retriever, image_base64=None):
    """
    Answers a multiple-choice question using the given retriever.
    Returns the answer (A, B, C, or D) and the sources.
    """
    # Retrieve relevant documents using the updated invoke method
    retrieved_docs = retriever.invoke(question)
    
    # Split documents into images, texts, and tables
    sources = split_docs_into_images_texts_tables(retrieved_docs)
    
    # Limit text sources to 100 words
    sources['texts'] = [limit_text_length(text) for text in sources['texts']]
    
    # Build the prompt
    formatted_texts = "\n".join(sources['texts'])
    context_text = f"Context documents:\n{formatted_texts}"
    
    # Build the prompt with clear instructions
    prompt_text = f"""You are an assistant that answers multiple-choice questions based solely on the input provided.
You will be given a question, several options, and context documents, which may include text, HTML tables, and images.
Use the context documents to choose the correct option.
Respond with only the letter of the chosen option: A, B, C, or D. Do not provide any additional text.

User question:
{question}

Options:
{options}

{context_text}

Answer:"""
    
    # Prepare the messages
    messages = []
    
    # Include the question image if provided
    if image_base64:
        resized_image_base64 = resize_base64_image(image_base64)
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "User question includes an image."
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{resized_image_base64}"}
                }
            ]
        })
    
    # Add images from retrieved sources
    for image_data in sources['images']:
        resized_img_base64 = resize_base64_image(image_data)
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{resized_img_base64}"}
                }
            ]
        })
    
    # Add tables from retrieved sources
    for table_dict in sources['tables']:
        table_content = table_dict['content']
        if isinstance(table_content, bytes):
            table_content = table_content.decode('utf-8', errors='ignore')
        messages.append({
            "role": "system",
            "content": table_content
        })
    
    # Add the main prompt
    messages.append({
        "role": "user",
        "content": prompt_text
    })
    
    # Make the API call
    response_json = call_openai_api_with_image(messages, max_tokens=300)
    
    if not response_json:
        return 'Invalid', sources
    
    # Extract the answer
    try:
        answer_text = response_json['choices'][0]['message']['content'].strip()
        answer = extract_answer(answer_text)
    except (KeyError, IndexError) as e:
        logger.warning(f"Error parsing response: {e} - Response: {response_json}")
        return 'Invalid', sources
    
    # If answer is None, log the issue and set to 'Invalid'
    if answer is None:
        logger.warning(f"Could not extract a valid answer from response: '{answer_text}'")
        # Optionally, implement a retry mechanism here
        answer = 'Invalid'
    
    return answer, sources
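
The function above leaves a retry mechanism as an option. One possible shape is a wrapper that calls the agent again when the answer is not a valid letter, with exponential backoff between attempts. This is a sketch under assumptions: `ask_with_retry`, its parameters, and the fake agent are all hypothetical.

```python
import time


def ask_with_retry(ask, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call ask() until it returns one of A-D or the attempts run out.

    ask: zero-argument callable returning an answer string such as 'A' or 'Invalid'.
    Returns the last answer obtained.
    """
    answer = 'Invalid'
    for attempt in range(max_attempts):
        answer = ask()
        if answer in ('A', 'B', 'C', 'D'):
            return answer
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return answer


# Demonstration with a fake agent that fails twice before answering;
# delays are recorded instead of slept so the example runs instantly
responses = iter(['Invalid', 'Invalid', 'C'])
delays = []
result = ask_with_retry(lambda: next(responses), sleep=delays.append)
```

Since `rag_multiple_choice_qa` returns an `(answer, sources)` pair, the callable passed in would need to return only the answer, for example `lambda: rag_multiple_choice_qa(q, opts, retriever)[0]`. Retrying can change the outcome here because the API call uses a non-zero temperature.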

In [None]:
# Modified No-RAG function using direct API calls
def no_rag_multiple_choice_qa(question, options, image_base64=None):
    """
    Answers a multiple-choice question without retrieval (no RAG).
    Returns the answer (A, B, C, or D).
    """
    # Build the prompt with clear instructions
    prompt_text = f"""You are an assistant that answers multiple-choice questions based solely on the input provided.
You will be given a question and several options.
Choose the correct option from the given options, and respond with only the letter of the chosen option: A, B, C, or D. Do not provide any additional text.

User question:
{question}

Options:
{options}

Answer:"""
    
    # Prepare the messages
    messages = []
    
    # Include the question image if provided
    if image_base64:
        resized_image_base64 = resize_base64_image(image_base64)
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "User question includes an image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{resized_image_base64}",
                        "detail": "high"
                    }
                }
            ]
        })
    
    # Add the main prompt
    messages.append({
        "role": "user",
        "content": prompt_text
    })
    
    # Make the API call
    response_json = call_openai_api_with_image(messages, max_tokens=300)
    
    if not response_json:
        return 'Invalid'
    
    # Extract the answer
    try:
        answer_text = response_json['choices'][0]['message']['content'].strip()
        answer = extract_answer(answer_text)
    except (KeyError, IndexError) as e:
        logger.warning(f"Error parsing response: {e} - Response: {response_json}")
        return 'Invalid'
    
    # If answer is None, log the issue and set to 'Invalid'
    if answer is None:
        logger.warning(f"Invalid answer: '{answer_text}'")
        # Optionally, implement a retry mechanism here
        answer = 'Invalid'
    
    return answer
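
Both functions assemble nearly identical image messages, with one difference: the No-RAG version sets `"detail": "high"` while the RAG version leaves the default, so the agents do not receive the question image under the same settings. Factoring the message into one helper would make that choice explicit. A sketch, with `build_image_message` as a hypothetical name:

```python
def build_image_message(img_base64, caption=None, detail=None):
    """Build a user message carrying one Base64 JPEG, with optional caption text and detail level."""
    image_url = {"url": f"data:image/jpeg;base64,{img_base64}"}
    if detail is not None:
        image_url["detail"] = detail
    content = []
    if caption:
        content.append({"type": "text", "text": caption})
    content.append({"type": "image_url", "image_url": image_url})
    return {"role": "user", "content": content}


# "QUJD" is a placeholder payload, not a real image
message = build_image_message("QUJD", caption="User question includes an image.", detail="high")
```

Passing the same `detail` value from both agents would keep the comparison between RAG and No-RAG on equal footing.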

### 6.4. Process Each Question and Collect Responses

Finally, we'll iterate through each exam question, process it using different RAG strategies, and compare the generated answers against the true answers. We'll also display the sources retrieved by each retriever for transparency.

In [None]:
# Main loop to process the questions
for idx, row in questions_df.iterrows():
    question_number = row['#']
    question_text = row['Question']
    image_needed = row['Image']
    options_text = row['Option']
    true_answer = row['Answer']

    # Load image if needed
    image_base64 = None
    if str(image_needed).strip().upper() == 'TRUE':
        image_path = os.path.join('EA QAs', f"{question_number}.jpg")
        image_base64 = encode_image(image_path)  # Resized to at most 400x400 inside encode_image

    # Multimodal RAG agent
    mm_answer, mm_sources = rag_multiple_choice_qa(
        question_text, options_text, retriever_multimodal, image_base64
    )

    # Single-modal RAG agent
    sm_answer, sm_sources = rag_multiple_choice_qa(
        question_text, options_text, retriever_single_modal, image_base64=image_base64
    )

    # No-RAG agent
    nr_answer = no_rag_multiple_choice_qa(
        question_text, options_text, image_base64
    )

    # Compare answers to the true answer, normalized because extract_answer returns uppercase letters
    expected = str(true_answer).strip().upper()
    mm_correct = 'CORRECT' if mm_answer == expected else 'WRONG'
    sm_correct = 'CORRECT' if sm_answer == expected else 'WRONG'
    nr_correct = 'CORRECT' if nr_answer == expected else 'WRONG'

    # Display the results
    print(f"[{question_number}]")
    print("-"*30)
    print(f"({true_answer}): True answer\n")
    print(f"({mm_answer}): Multimodal RAG - {mm_correct}")
    print(f"({sm_answer}): Single-modal RAG - {sm_correct}")
    print(f"({nr_answer}): No-RAG - {nr_correct}")
    print("-"*30)
    print("Question:")
    print(question_text)
    if image_base64:
        display_base64_image(image_base64)
    print("\nOptions:")
    print(options_text)
    print("-"*30)
    
    # Display Multimodal RAG Sources
    print("Multimodal RAG Sources:")
    mm_text_sources = mm_sources['texts']
    mm_image_sources = mm_sources['images']
    mm_table_sources = mm_sources['tables']
    
    # Display Multimodal Text Sources
    for i, text in enumerate(mm_text_sources):
        print(f"Text Source {i+1}:")
        display(Markdown(text))
        print()
    
    # Display Multimodal Image Sources
    for i, img_base64 in enumerate(mm_image_sources):
        print(f"Image Source {i+1}:")
        display_base64_image(img_base64)
        print()
    
    # Display Multimodal Table Sources
    for i, table_dict in enumerate(mm_table_sources):
        print(f"Table Source {i+1}:")
        table_content = table_dict['content']
        # Try to parse and display as HTML table
        try:
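            # Use distinct names so the notebook-level `tables` list is not overwritten
            parsed_tables = pd.read_html(StringIO(table_content))
            for parsed_table in parsed_tables:
                display(parsed_table)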
        except ValueError:
            # If parsing fails, try to render as Markdown table
            try:
                display(Markdown(table_content))
            except Exception:
                # If all else fails, display the raw content
                print(table_content)
        print()
    
    print("-"*30)
    
    # Display Single-modal RAG Sources
    print("Single-modal RAG Sources:")
    sm_text_sources = sm_sources['texts']
    sm_image_sources = sm_sources['images']
    sm_table_sources = sm_sources['tables']
    
    # Display Single-modal Text Sources
    for i, text in enumerate(sm_text_sources):
        print(f"Text Source {i+1}:")
        display(Markdown(text))
        print()
    
    # Display Single-modal Image Sources
    for i, img_base64 in enumerate(sm_image_sources):
        print(f"Image Source {i+1}:")
        display_base64_image(img_base64)
        print()
    
    # Display Single-modal Table Sources
    for i, table_dict in enumerate(sm_table_sources):
        print(f"Table Source {i+1}:")
        table_content = table_dict['content']
        # Try to parse and display as HTML table
        try:
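            # Use distinct names so the notebook-level `tables` list is not overwritten
            parsed_tables = pd.read_html(StringIO(table_content))
            for parsed_table in parsed_tables:
                display(parsed_table)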
        except ValueError:
            # If parsing fails, try to render as Markdown table
            try:
                display(Markdown(table_content))
            except Exception:
                # If all else fails, display the raw content
                print(table_content)
        print()
    
    print("="*100)
    
    # Add a 2-second pause to avoid rate limiting
    time.sleep(2)
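
The loop prints a verdict per question but does not aggregate them. To compare the three agents overall, the per-question results could be collected and turned into an accuracy per agent. This is a sketch: `accuracy_by_agent` is a hypothetical helper, and the records are made-up values rather than results from this notebook.

```python
from collections import defaultdict


def accuracy_by_agent(records):
    """Compute the fraction of correct answers per agent.

    records: iterable of (agent_name, predicted, expected) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for agent, predicted, expected in records:
        total[agent] += 1
        # Normalize both sides so 'c ' and 'C' count as the same answer
        if str(predicted).strip().upper() == str(expected).strip().upper():
            correct[agent] += 1
    return {agent: correct[agent] / total[agent] for agent in total}


# Hypothetical results for two questions
records = [
    ('multimodal', 'A', 'A'), ('single-modal', 'B', 'A'), ('no-rag', 'Invalid', 'A'),
    ('multimodal', 'C', 'c '), ('single-modal', 'C', 'C'), ('no-rag', 'D', 'C'),
]
scores = accuracy_by_agent(records)
print(scores)
```

Inside the loop, appending tuples such as `('multimodal', mm_answer, true_answer)` to a list would provide the input. An `'Invalid'` answer simply counts as wrong.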