# Adaptive Retrieval for Enhanced RAG Systems

In this notebook, I implement an Adaptive Retrieval system that selects a retrieval strategy based on the type of query. Matching the strategy to the query is intended to make the RAG system's responses more accurate and relevant across a diverse range of questions.

Different questions demand different retrieval strategies. Our system:

1. Classifies the query type (Factual, Analytical, Opinion, or Contextual)
2. Selects the appropriate retrieval strategy
3. Executes specialized retrieval techniques
4. Generates a tailored response
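
The first three steps above can be sketched as a small dispatch table. This is only an illustration of the control flow, not the notebook's implementation: `toy_classify` is a hypothetical keyword-based stand-in for the LLM classifier, and the strategies are placeholders that simply label their output. Response generation (step 4) is left out.

```python
from typing import Callable, Dict, List

CATEGORIES = ("Factual", "Analytical", "Opinion", "Contextual")

def toy_classify(query: str) -> str:
    """Hypothetical keyword classifier standing in for the LLM call."""
    q = query.lower()
    if q.startswith(("why", "how does", "explain")):
        return "Analytical"
    if "should" in q or "best" in q:
        return "Opinion"
    padded = f" {q} "
    if " my " in padded or " i " in padded:
        return "Contextual"
    return "Factual"

def make_strategy(name: str) -> Callable[[str, int], List[str]]:
    """Build a placeholder strategy that labels its results with its name."""
    def strategy(query: str, k: int) -> List[str]:
        return [f"{name} result {i + 1} for: {query}" for i in range(k)]
    return strategy

# One retrieval function per query category
STRATEGIES: Dict[str, Callable[[str, int], List[str]]] = {
    name: make_strategy(name) for name in CATEGORIES
}

def toy_adaptive_retrieval(query: str, k: int = 2) -> List[str]:
    category = toy_classify(query)                               # step 1: classify
    strategy = STRATEGIES.get(category, STRATEGIES["Factual"])   # step 2: select
    return strategy(query, k)                                    # step 3: retrieve

print(toy_adaptive_retrieval("How do I make a plan for my family?"))
```

Keeping the mapping in a dictionary means a new category only needs a new entry, and an unrecognized label falls back to the factual strategy.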

## Setting Up the Environment
We begin by importing necessary libraries.

In [43]:
import os
import numpy as np
import json
import fitz
from openai import OpenAI
import re

## Extracting Text from a PDF File
To implement RAG, we first need a source of textual data. In this case, we extract text from every PDF file in a folder using the PyMuPDF library.

In [44]:
import os
import fitz  # pip install PyMuPDF

def extract_text_from_pdf(pdf_path: str) -> str:
    """
    Extracts text from a PDF file.

    Args:
        pdf_path (str): Path to the PDF file.

    Returns:
        str: Extracted text from the entire PDF.
    """
    doc = fitz.open(pdf_path)
    all_text = []
    for page in doc:
        all_text.append(page.get_text("text"))
    doc.close()
    return "\n".join(all_text)

def extract_texts_from_folder(folder_path: str):
    """
    Extracts text from all PDF files in a folder (recursively).
    Args:
        folder_path (str): Path to the folder containing PDFs.
    Returns:
        dict: {pdf_filename: extracted_text, ...}
    """
    pdf_texts = {}
    for root, _, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                try:
                    pdf_texts[pdf_path] = extract_text_from_pdf(pdf_path)
                except Exception as e:
                    print(f"Failed to extract {pdf_path}: {e}")
    return pdf_texts

# Example usage:
folder_path = "/Users/kekunkoya/Desktop/RAG Google 2/PDFs"
pdf_texts = extract_texts_from_folder(folder_path)

for pdf_file, text in pdf_texts.items():
    print(f"\n--- {os.path.basename(pdf_file)} ---")
    print(text[:500])  # Print the first 500 characters to verify extraction



--- PA 211 Disaster Community Resources.pdf ---
PA 211 Community Disaster and Human 
Services Resources in Pennsylvania 
Introduction 
 
Community Disaster and Human Services Resources in Pennsylvania 
 
Disasters, whether natural or man-made, have significant and far-reaching impacts on 
individuals, families, and communities. Pennsylvania, with its mix of urban, suburban, and 
rural regions, faces a diverse array of emergencies ranging from floods and severe storms to 
public health crises and housing instability. To ensure an effective res

--- 211 RESPONDS TO URGENT NEEDS.pdf ---
211 RESPONDS TO URGENT NEEDS 
FACT
211 stood up a statewide text
response to support employees
impacted by the partial federal
government shutdown who did
not know when they would
receive their next paycheck.
211 assists in times of
disaster and widespread
need
FACT
FACT
1
PLEASE VOTE TO INCLUDE FUNDING FOR PENNSYLVANIA'S 211 SYSTEM IN THE STATE BUDGET TO
SUPPORT 211'S CAPACITY TO HELP OUR COMMUNITIES IN 

## Chunking the Extracted Text
Once we have the extracted text, we divide it into smaller, overlapping chunks to improve retrieval accuracy.

In [45]:
def chunk_text(text, n, overlap):
    """
    Chunks the given text into segments of n characters with overlap.

    Args:
    text (str): The text to be chunked.
    n (int): The number of characters in each chunk.
    overlap (int): The number of overlapping characters between chunks.

    Returns:
    List[str]: A list of text chunks.
    """
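    if n <= 0 or not 0 <= overlap < n:
        # A zero step makes range() raise; a negative step silently yields no chunks
        raise ValueError("Require n > 0 and 0 <= overlap < n")

    chunks = []  # Initialize an empty list to store the chunks
    
    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])

    return chunks  # Return the list of text chunks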
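
To make the overlap arithmetic concrete, here is a small worked example on a ten-character string. The function name `chunk_chars` is chosen for this illustration only; it follows the same windowing idea as the cell above, with a step of `n - overlap` characters between chunk starts.

```python
def chunk_chars(text: str, n: int, overlap: int) -> list[str]:
    """Split text into n-character windows that share `overlap` characters."""
    if not 0 <= overlap < n:
        raise ValueError("overlap must satisfy 0 <= overlap < n")
    step = n - overlap  # distance between the starts of consecutive chunks
    return [text[i:i + n] for i in range(0, len(text), step)]

chunks = chunk_chars("abcdefghij", n=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Each chunk starts two characters after the previous one, so neighbours share two characters. The final chunk `'ij'` is a short tail that is already fully contained in `'ghij'`; with real documents such tails are usually harmless but do add a near-duplicate entry to the store.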

## Setting Up the OpenAI API Client
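We initialize an OpenAI client here. The later cells generate embeddings and responses through the `google.generativeai` (`genai`) module and do not use this `client` object.

In [46]:
# Create the client; no base URL is set, so the library's default endpoint is used
client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY")  # Retrieve the API key from environment variables
)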

## Simple Vector Store Implementation
We'll create a basic vector store to manage document chunks and their embeddings.

In [47]:
import os
import numpy as np
import fitz  # pip install PyMuPDF
import google.generativeai as genai
from dotenv import load_dotenv

class SimpleVectorStore:
    """
    A simple vector store implementation using NumPy.
    """
    def __init__(self):
        self.vectors = []   # List to store embedding vectors
        self.texts = []     # List to store original texts
        self.metadata = []  # List to store metadata for each text

    def add_item(self, text, embedding, metadata=None):
        """
        Add an item to the vector store.
        """
        self.vectors.append(np.array(embedding))
        self.texts.append(text)
        self.metadata.append(metadata or {})

    def similarity_search(self, query_embedding, k=5, filter_func=None):
        """
        Find the most similar items to a query embedding.
        """
        if not self.vectors:
            return []
        query_vector = np.array(query_embedding)
        similarities = []
        for i, vector in enumerate(self.vectors):
            if filter_func and not filter_func(self.metadata[i]):
                continue
            norm_query = np.linalg.norm(query_vector)
            norm_vector = np.linalg.norm(vector)
            if norm_query == 0 or norm_vector == 0:
                similarity = 0.0
            else:
                similarity = np.dot(query_vector, vector) / (norm_query * norm_vector)
            similarities.append((i, similarity))
        similarities.sort(key=lambda x: x[1], reverse=True)
        results = []
        for i in range(min(k, len(similarities))):
            idx, score = similarities[i]
            results.append({
                "text": self.texts[idx],
                "metadata": self.metadata[idx],
                "similarity": score
            })
        return results

def extract_text_from_pdf(pdf_path: str) -> str:
    doc = fitz.open(pdf_path)
    all_text = []
    for page in doc:
        all_text.append(page.get_text("text"))
    doc.close()
    return "\n".join(all_text)

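def chunk_text(text, chunk_size=1000, overlap=200):
    if not 0 <= overlap < chunk_size:
        # Otherwise start would never advance and the loop would not terminate
        raise ValueError("Require 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        start += chunk_size - overlap
    return chunks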

def create_gemini_embedding(text, model="models/embedding-001"):
    response = genai.embed_content(model=model, content=text)
    return response['embedding']

if __name__ == '__main__':
    # Load API key
    load_dotenv()
    api_key = os.getenv("GEMINI_API_KEY")

    try:
        genai.configure(api_key=api_key)
    except Exception as e:
        print(f"An error occurred during Gemini API configuration: {e}")
        exit()

    # Create the vector store
    store = SimpleVectorStore()

    # Folder containing PDFs
    folder_path = "/Users/kekunkoya/Desktop/RAG Google 2/PDFs/"
    for root, _, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                print(f"Processing PDF: {pdf_path}")
                try:
                    text = extract_text_from_pdf(pdf_path)
                except Exception as e:
                    print(f"Failed to extract {pdf_path}: {e}")
                    continue
                chunks = chunk_text(text, chunk_size=1000, overlap=200)
                for i, chunk in enumerate(chunks):
                    if not chunk.strip():
                        continue
                    try:
                        embedding = create_gemini_embedding(chunk)
                        store.add_item(chunk, embedding, metadata={"source": file, "chunk_id": i})
                    except Exception as e:
                        print(f"Embedding failed for {file} chunk {i}: {e}")

    print("Vector store populated with Gemini embeddings from all PDFs.")

    # Sample similarity search
    query_text = "how to make a plan for my family"
    print(f"\nSearching for items similar to: '{query_text}'")
    query_embedding = create_gemini_embedding(query_text)
    search_results = store.similarity_search(query_embedding, k=3)

    print("\nTop 3 search results:")
    for result in search_results:
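        # Build the preview first: backslashes inside f-string expressions need Python 3.12+
        preview = result['text'][:150].replace('\n', ' ')
        print(f"  - Text: {preview}...")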
        print(f"    Similarity: {result['similarity']:.4f}")
        print(f"    Metadata: {result['metadata']}")


Processing PDF: /Users/kekunkoya/Desktop/RAG Google 2/PDFs/PA 211 Disaster Community Resources.pdf
Processing PDF: /Users/kekunkoya/Desktop/RAG Google 2/PDFs/211 RESPONDS TO URGENT NEEDS.pdf
Processing PDF: /Users/kekunkoya/Desktop/RAG Google 2/PDFs/PEMA.pdf
Processing PDF: /Users/kekunkoya/Desktop/RAG Google 2/PDFs/ready-gov_disaster-preparedness-guide-for-older-adults.pdf
Processing PDF: /Users/kekunkoya/Desktop/RAG Google 2/PDFs/Substantial Damages Toolkit.pdf
Vector store populated with Gemini embeddings from all PDFs.

Searching for items similar to: 'how to make a plan for my family'

Top 3 search results:
  - Text: fore an emergency happens, sit down together and decide how you will get in  contact with each other, what mobility and/ or medication issues will nee...
    Similarity: 0.6842
    Metadata: {'source': 'PEMA.pdf', 'chunk_id': 65}
  - Text:  your family and friends have a plan in case of an emergency. Fill  out these cards and give one to each of them to make sure they
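
The ranking inside `similarity_search` is cosine similarity followed by a top-k cut. The sketch below reproduces that idea on tiny hand-made vectors using only the standard library; the vectors and helper names (`cosine_similarity`, `top_k`) are made up for illustration and are not part of the notebook's store.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k):
    """Return (index, similarity) pairs for the k most similar vectors."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
ranked = top_k([2.0, 0.0], toy_vectors, k=2)
print(ranked)
```

The query `[2.0, 0.0]` points in the same direction as the first vector, so that vector scores 1.0 regardless of length; the diagonal vector `[1.0, 1.0]` comes second at roughly 0.707, and the zero vector is guarded to 0.0 instead of dividing by zero.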

## Creating Embeddings

In [48]:
import google.generativeai as genai

# Assume genai.configure(api_key="YOUR_API_KEY") has been called and 'client' is not used.

def create_embeddings(text, model="models/embedding-001"):
    """
    Creates embeddings for the given text using the specified Gemini model.

    Args:
    text (str or List[str]): The input text(s) for which embeddings are to be created.
    model (str): The model to be used for creating embeddings. Defaults to "models/embedding-001".

    Returns:
    List[float] or List[List[float]]: The embedding vector(s).
    """
    # Gemini's embed_content can handle both a single string or a list of strings
    # in the 'content' parameter.
    response = genai.embed_content(
        model=model,
        content=text
    )

    # For a single string this is one vector; for a list of strings it is
    # a list of vectors, one per input. Both arrive under the 'embedding' key.
    return response['embedding']

## Document Processing Pipeline

In [49]:
import os

def process_folder(folder_path, chunk_size=1000, chunk_overlap=200):
    """
    Process all PDFs in a folder for use with adaptive retrieval.

    Args:
        folder_path (str): Path to the folder containing PDF files.
        chunk_size (int): Size of each chunk in characters.
        chunk_overlap (int): Overlap between chunks in characters.

    Returns:
        Tuple[List[str], SimpleVectorStore]: All document chunks and combined vector store.
    """
    all_chunks = []
    store = SimpleVectorStore()

    for root, _, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                print(f"\nExtracting text from PDF: {pdf_path}")
                extracted_text = extract_text_from_pdf(pdf_path)

                if not extracted_text:
                    print(f"Failed to extract text from {pdf_path}. Skipping.")
                    continue

                print("Chunking text...")
                chunks = chunk_text(extracted_text, chunk_size, chunk_overlap)
                print(f"Created {len(chunks)} text chunks for {file}")

                print("Creating embeddings for chunks...")
                chunk_embeddings = create_embeddings(chunks)

                if len(chunks) != len(chunk_embeddings):
                    print(f"Error: Mismatch between number of chunks and embeddings for {file}. Skipping.")
                    continue

                for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
                    store.add_item(
                        text=chunk,
                        embedding=embedding,
                        metadata={"index": i, "source": file}
                    )
                all_chunks.extend(chunks)
                print(f"Added {len(chunks)} chunks from {file} to the vector store.")

    print(f"\nAll done! Processed {len(all_chunks)} chunks from all PDFs.")
    return all_chunks, store

# Example usage:
# folder_path = "/Users/kekunkoya/Desktop/RAG Google 2/PDFs/"
# all_chunks, store = process_folder(folder_path)


## Query Classification

In [50]:
import google.generativeai as genai

def classify_query(query, model="gemini-2.0-flash"):
    """
    Classify a query into one of four categories: Factual, Analytical, Opinion, or Contextual.
    
    Args:
        query (str): User query
        model (str): LLM model to use. Defaults to "gemini-2.0-flash".
        
    Returns:
        str: Query category
    """
    # Define the prompt to guide the AI's classification
    prompt = f"""
    You are an expert at classifying questions.
    Classify the given query into exactly one of these categories:
    - Factual: Queries seeking specific, verifiable information.
    - Analytical: Queries requiring comprehensive analysis or explanation.
    - Opinion: Queries about subjective matters or seeking diverse viewpoints.
    - Contextual: Queries that depend on user-specific context.

    Return ONLY the category name, without any explanation or additional text.

    Query: {query}

    Category:
    """

    # Create a GenerativeModel instance
    model_instance = genai.GenerativeModel(model)

    # Generate the classification response from the AI model
    try:
        response = model_instance.generate_content(
            prompt,
            generation_config=genai.GenerationConfig(
                temperature=0.0, # Low temperature for deterministic output
                max_output_tokens=20 # Limit output to ensure it's just the category name
            )
        )
        
        # Extract and strip the category from the response
        category = response.text.strip()
    
        # Define the list of valid categories
        valid_categories = ["Factual", "Analytical", "Opinion", "Contextual"]
        
        # Ensure the returned category is a valid, single word
        for valid in valid_categories:
            if valid.lower() in category.lower():
                return valid
    
    except Exception as e:
        print(f"An error occurred during query classification: {e}")
        # Default to "Factual" if classification fails
        return "Factual"
    
    # Default to "Factual" if classification is not one of the valid categories
    return "Factual"
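
The post-processing step in `classify_query` can be exercised without calling the model. The sketch below isolates that substring match in a helper named `normalize_category` (a name introduced here for illustration) so that its behavior on messy model output is easy to see.

```python
VALID_CATEGORIES = ["Factual", "Analytical", "Opinion", "Contextual"]

def normalize_category(raw: str, default: str = "Factual") -> str:
    """Map free-form model output onto one of the valid category names."""
    lowered = raw.lower()
    for category in VALID_CATEGORIES:
        # Case-insensitive substring match, checked in list order
        if category.lower() in lowered:
            return category
    return default

for raw in ["  analytical\n", "Category: Opinion.", "I am not sure", "Contextual, not Factual"]:
    print(repr(raw), "->", normalize_category(raw))
```

Because the categories are checked in list order, a reply that mentions more than one label resolves to whichever appears first in `VALID_CATEGORIES`, not first in the reply; the last input above therefore maps to `Factual`. Keeping the prompt's "return ONLY the category name" instruction strict is what makes this simple match adequate.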

## Implementing Specialized Retrieval Strategies
### 1. Factual Strategy - Focus on Precision

In [51]:
import google.generativeai as genai

# Assume genai.configure(api_key="YOUR_API_KEY") has been called.

def call_gemini(prompt, model="gemini-2.0-flash", temperature=0):
    """A helper function to make a call to the Gemini API."""
    model_instance = genai.GenerativeModel(model)
    response = model_instance.generate_content(
        prompt,
        generation_config=genai.GenerationConfig(temperature=temperature)
    )
    return response.text.strip()

def factual_retrieval_strategy(query, vector_store, k=4):
    """
    Retrieval strategy for factual queries focusing on precision.
    
    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to return
        
    Returns:
        List[Dict]: Retrieved documents
    """
    print(f"Executing Factual retrieval strategy for: '{query}'")
    
    # Use LLM to enhance the query for better precision
    system_prompt = """You are an expert at enhancing search queries.
    Your task is to reformulate the given factual query to make it more precise and
    specific for information retrieval. Focus on key entities and their relationships.
    Provide ONLY the enhanced query without any explanation.
    """

    user_prompt = f"Enhance this factual query: {query}"
    
    # Generate the enhanced query using the LLM
    enhanced_query_prompt = f"{system_prompt}\n\n{user_prompt}"
    enhanced_query = call_gemini(enhanced_query_prompt, model="gemini-2.0-flash", temperature=0)
    print(f"Enhanced query: {enhanced_query}")
    
    # Create embeddings for the enhanced query
    query_embedding = create_embeddings(enhanced_query)
    
    # Perform initial similarity search to retrieve documents
    initial_results = vector_store.similarity_search(query_embedding, k=k*2)
    
    # Initialize a list to store ranked results
    ranked_results = []
    
    # Score and rank documents by relevance using LLM
    for doc in initial_results:
        relevance_score = score_document_relevance(enhanced_query, doc["text"])
        ranked_results.append({
            "text": doc["text"],
            "metadata": doc["metadata"],
            "similarity": doc["similarity"],
            "relevance_score": relevance_score
        })
    
    # Sort the results by relevance score in descending order
    ranked_results.sort(key=lambda x: x["relevance_score"], reverse=True)
    
    # Return the top k results
    return ranked_results[:k]
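
The factual strategy follows an over-retrieve-then-rerank pattern: fetch `k*2` candidates by embedding similarity, rescore each one, and keep the best `k`. The sketch below shows only that second stage. The scorer `toy_score` is a hypothetical term-counting stand-in for the LLM-based `score_document_relevance`, and the candidate documents are invented.

```python
def rerank(candidates, score_fn, k):
    """Attach a relevance score to each candidate and keep the k highest."""
    scored = [dict(doc, relevance_score=score_fn(doc["text"])) for doc in candidates]
    scored.sort(key=lambda doc: doc["relevance_score"], reverse=True)
    return scored[:k]

def toy_score(text, query_terms=("flood", "shelter")):
    """Hypothetical scorer: counts how many query terms appear in the text."""
    return float(sum(term in text.lower() for term in query_terms))

# Pretend these came back from a similarity search with k*2 = 4
candidates = [
    {"text": "Storm season overview", "similarity": 0.81},
    {"text": "Flood shelter locations by county", "similarity": 0.78},
    {"text": "Flood insurance basics", "similarity": 0.74},
    {"text": "Volunteer sign-up form", "similarity": 0.70},
]

top = rerank(candidates, toy_score, k=2)
print([doc["text"] for doc in top])
```

The document with the highest embedding similarity drops out here because the second-stage scorer finds it less relevant, which is the point of reranking. The trade-off is cost: with an LLM scorer, each query triggers one extra model call per candidate.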



### 2. Analytical Strategy - Comprehensive Coverage

In [52]:
import google.generativeai as genai



def analytical_retrieval_strategy(query, vector_store, k=4):
    """
    Retrieval strategy for analytical queries focusing on comprehensive coverage.

    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to return

    Returns:
        List[Dict]: Retrieved documents
    """
    print(f"Executing Analytical retrieval strategy for: '{query}'")

    # Define the prompt to guide the AI in generating sub-questions
    prompt = f"""
    You are an expert at breaking down complex questions.
    Generate sub-questions that explore different aspects of the main analytical query.
    These sub-questions should cover the breadth of the topic and help retrieve
    comprehensive information.

    Return a list of exactly 3 sub-questions, one per line.

    Main query: {query}
    """

    # Create a GenerativeModel instance
    model_instance = genai.GenerativeModel("gemini-2.0-flash")

    # Generate the sub-questions using the LLM
    try:
        response = model_instance.generate_content(
            prompt,
            generation_config=genai.GenerationConfig(
                temperature=0.3,
                max_output_tokens=150 # A reasonable limit for 3 questions
            )
        )
        
        # Extract and clean the sub-questions
        sub_queries = response.text.strip().split('\n')
        sub_queries = [q.strip() for q in sub_queries if q.strip()]
        print(f"Generated sub-queries: {sub_queries}")

    except Exception as e:
        print(f"An error occurred during sub-query generation: {e}")
        sub_queries = [query] # Fallback to the original query
    
    # Retrieve documents for each sub-query
    all_results = []
    for sub_query in sub_queries:
        # Create embeddings for the sub-query
        sub_query_embedding = create_embeddings(sub_query)
        # Perform similarity search for the sub-query
        results = vector_store.similarity_search(sub_query_embedding, k=2)
        all_results.extend(results)
    
    # Ensure diversity by selecting from different sub-query results
    # Remove duplicates (same text content)
    unique_texts = set()
    diverse_results = []
    
    for result in all_results:
        if result["text"] not in unique_texts:
            unique_texts.add(result["text"])
            diverse_results.append(result)
    
    # If we need more results to reach k, add more from initial results
    if len(diverse_results) < k:
        # Direct retrieval for the main query
        main_query_embedding = create_embeddings(query)
        main_results = vector_store.similarity_search(main_query_embedding, k=k)
        
        for result in main_results:
            if result["text"] not in unique_texts and len(diverse_results) < k:
                unique_texts.add(result["text"])
                diverse_results.append(result)
    
    # Return the top k diverse results
    return diverse_results[:k]
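
The merge step of the analytical strategy is an order-preserving de-duplication across the per-sub-query result lists. A minimal sketch with invented documents, using a helper name (`merge_unique`) chosen for this illustration:

```python
def merge_unique(result_lists, k):
    """Flatten several result lists, keeping the first copy of each text."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc["text"] not in seen:
                seen.add(doc["text"])
                merged.append(doc)
    return merged[:k]

per_sub_query = [
    [{"text": "A", "similarity": 0.70}, {"text": "B", "similarity": 0.65}],
    [{"text": "B", "similarity": 0.80}, {"text": "C", "similarity": 0.60}],
    [{"text": "A", "similarity": 0.75}, {"text": "D", "similarity": 0.55}],
]

merged = merge_unique(per_sub_query, k=3)
print([(doc["text"], doc["similarity"]) for doc in merged])
```

Two consequences of this design are worth noticing. The output order follows sub-query order, not similarity, and when a text is retrieved more than once the first-seen copy is kept, so `B` retains its 0.65 score even though a later sub-query matched it at 0.80.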

### 3. Opinion Strategy - Diverse Perspectives

In [53]:
import google.generativeai as genai

# Assume genai.configure(api_key="YOUR_API_KEY") has been called.
# Also assume that `create_embeddings` and `SimpleVectorStore` are defined.

def opinion_retrieval_strategy(query, vector_store, k=4):
    """
    Retrieval strategy for opinion queries focusing on diverse perspectives.
    
    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to return
        
    Returns:
        List[Dict]: Retrieved documents
    """
    print(f"Executing Opinion retrieval strategy for: '{query}'")
    
    # Define the prompt to guide the AI in identifying different perspectives
    prompt = f"""
    You are an expert at identifying different perspectives on a topic.
    For the given query about opinions or viewpoints, identify different perspectives
    that people might have on this topic.

    Return a list of exactly 3 different viewpoint angles, one per line.

    Query: {query}
    """

    # Create a GenerativeModel instance
    model_instance = genai.GenerativeModel("gemini-2.0-flash")
    
    # Generate the different perspectives using the LLM
    try:
        response = model_instance.generate_content(
            prompt,
            generation_config=genai.GenerationConfig(
                temperature=0.3,
                max_output_tokens=150 # A reasonable limit for 3 short viewpoints
            )
        )
        
        # Extract and clean the viewpoints
        viewpoints = response.text.strip().split('\n')
        viewpoints = [v.strip() for v in viewpoints if v.strip()]
        print(f"Identified viewpoints: {viewpoints}")

    except Exception as e:
        print(f"An error occurred during viewpoint generation: {e}")
        # Fallback to a simple retrieval if viewpoint generation fails
        viewpoint_embedding = create_embeddings(query)
        return vector_store.similarity_search(viewpoint_embedding, k=k)
    
    # Retrieve documents representing each viewpoint
    all_results = []
    for viewpoint in viewpoints:
        # Combine the main query with the viewpoint
        combined_query = f"{query} {viewpoint}"
        # Create embeddings for the combined query
        viewpoint_embedding = create_embeddings(combined_query)
        # Perform similarity search for the combined query
        results = vector_store.similarity_search(viewpoint_embedding, k=2)
        
        # Mark results with the viewpoint they represent
        for result in results:
            result["viewpoint"] = viewpoint
        
        # Add the results to the list of all results
        all_results.extend(results)
    
    # Select a diverse range of opinions
    # Ensure we get at least one document from each viewpoint if possible
    selected_results = []
    for viewpoint in viewpoints:
        # Filter documents by viewpoint
        viewpoint_docs = [r for r in all_results if r.get("viewpoint") == viewpoint]
        if viewpoint_docs:
            selected_results.append(viewpoint_docs[0])
    
    # Fill remaining slots with highest similarity docs
    remaining_slots = k - len(selected_results)
    if remaining_slots > 0:
        # Sort remaining docs by similarity
        remaining_docs = [r for r in all_results if r not in selected_results]
        remaining_docs.sort(key=lambda x: x["similarity"], reverse=True)
        selected_results.extend(remaining_docs[:remaining_slots])
    
    # Return the top k results
    return selected_results[:k]
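
The selection logic at the end of the opinion strategy can be summarized as "one document per viewpoint first, then fill by similarity". The sketch below shows that idea with invented data. Unlike the cell above, it takes a dictionary keyed by viewpoint instead of a flat list with `viewpoint` tags; that input shape and the name `select_diverse` are assumptions made to keep the example short.

```python
def select_diverse(results_by_viewpoint, k):
    """Pick the top document for each viewpoint, then fill up to k by similarity."""
    selected = []
    leftovers = []
    for docs in results_by_viewpoint.values():
        if docs:
            selected.append(docs[0])    # best match for this viewpoint
            leftovers.extend(docs[1:])  # the rest compete for remaining slots
    leftovers.sort(key=lambda doc: doc["similarity"], reverse=True)
    remaining = max(0, k - len(selected))
    selected.extend(leftovers[:remaining])
    return selected[:k]

results_by_viewpoint = {
    "cost": [{"text": "c1", "similarity": 0.9}, {"text": "c2", "similarity": 0.5}],
    "safety": [{"text": "s1", "similarity": 0.7}, {"text": "s2", "similarity": 0.6}],
    "access": [{"text": "a1", "similarity": 0.4}],
}

picked = select_diverse(results_by_viewpoint, k=4)
print([doc["text"] for doc in picked])
```

Note that `a1` is selected despite having the lowest similarity, because coverage of every viewpoint takes priority over raw score. Neither this sketch nor the cell above removes a chunk that is retrieved for two different viewpoints, so the same text can occupy more than one slot.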

### 4. Contextual Strategy - User Context Integration

In [54]:
import google.generativeai as genai



def call_gemini(prompt, model="gemini-2.0-flash", temperature=0):
    """A helper function to make a call to the Gemini API."""
    model_instance = genai.GenerativeModel(model)
    response = model_instance.generate_content(
        prompt,
        generation_config=genai.GenerationConfig(temperature=temperature)
    )
    return response.text.strip()

def contextual_retrieval_strategy(query, vector_store, k=4, user_context=None):
    """
    Retrieval strategy for contextual queries integrating user context.
    
    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to return
        user_context (str): Additional user context
        
    Returns:
        List[Dict]: Retrieved documents
    """
    print(f"Executing Contextual retrieval strategy for: '{query}'")
    
    # If no user context provided, try to infer it from the query
    if not user_context:
        system_prompt = """You are an expert at understanding implied context in questions.
        For the given query, infer what contextual information might be relevant or implied
        but not explicitly stated. Focus on what background would help answering this query.

        Return a brief description of the implied context."""

        user_prompt = f"Infer the implied context in this query: {query}"
        
        inferred_context_prompt = f"{system_prompt}\n\n{user_prompt}"
        
        try:
            # Generate the inferred context using the LLM
            user_context = call_gemini(inferred_context_prompt, model="gemini-2.0-flash", temperature=0.1)
            print(f"Inferred context: {user_context}")
        except Exception as e:
            print(f"Error inferring context: {e}")
            user_context = "" # Fallback to empty context
    
    # Reformulate the query to incorporate context
    system_prompt = """You are an expert at reformulating questions with context.
    Given a query and some contextual information, create a more specific query that
    incorporates the context to get more relevant information.

    Return ONLY the reformulated query without explanation."""

    user_prompt = f"""
    Query: {query}
    Context: {user_context}

    Reformulate the query to incorporate this context:"""
    
    contextualized_query_prompt = f"{system_prompt}\n\n{user_prompt}"
    
    try:
        # Generate the contextualized query using the LLM
        contextualized_query = call_gemini(contextualized_query_prompt, model="gemini-2.0-flash", temperature=0)
        print(f"Contextualized query: {contextualized_query}")
    except Exception as e:
        print(f"Error reformulating query: {e}")
        contextualized_query = query # Fallback to original query
    
    # Retrieve documents based on the contextualized query
    query_embedding = create_embeddings(contextualized_query)
    initial_results = vector_store.similarity_search(query_embedding, k=k*2)
    
    # Rank documents considering both relevance and user context
    ranked_results = []
    
    for doc in initial_results:
        # Score document relevance considering the context
        context_relevance = score_document_context_relevance(query, user_context, doc["text"])
        ranked_results.append({
            "text": doc["text"],
            "metadata": doc["metadata"],
            "similarity": doc["similarity"],
            "context_relevance": context_relevance
        })
    
    # Sort by context relevance and return top k results
    ranked_results.sort(key=lambda x: x["context_relevance"], reverse=True)
    return ranked_results[:k]

## Helper Functions for Document Scoring

In [55]:
import google.generativeai as genai
import re

# Assume genai.configure(api_key="YOUR_API_KEY") has been called.

def score_document_relevance(query, document, model="gemini-2.0-flash"):
    """
    Score document relevance to a query using a Gemini model.

    Args:
        query (str): User query
        document (str): Document text
        model (str): LLM model. Defaults to "gemini-2.0-flash".

    Returns:
        float: Relevance score from 0-10
    """
    # System prompt to instruct the model on how to rate relevance
    system_prompt = """You are an expert at evaluating document relevance.
    Rate the relevance of a document to a query on a scale from 0 to 10, where:
    0 = Completely irrelevant
    10 = Perfectly addresses the query

    Return ONLY a numerical score between 0 and 10, nothing else.
    """

    # Truncate document if it's too long
    # Gemini models have higher context limits, but truncating is still good practice.
    doc_preview = document[:4000] + "..." if len(document) > 4000 else document

    # User prompt containing the query and document preview
    user_prompt = f"""
    Query: {query}

    Document: {doc_preview}

    Relevance score (0-10):
    """
    
    # Combine the system and user prompts into a single prompt for Gemini
    full_prompt = f"{system_prompt}\n\n{user_prompt}"

    # Create a GenerativeModel instance
    model_instance = genai.GenerativeModel(model)

    # Generate response from the model
    try:
        response = model_instance.generate_content(
            full_prompt,
            generation_config=genai.GenerationConfig(
                temperature=0.0, # Low temperature for a deterministic score
                max_output_tokens=10 # Keep output short
            )
        )
        
        # Extract the score from the model's response
        score_text = response.text.strip()
        
        # Extract numeric score using regex
        match = re.search(r'(\d+(\.\d+)?)', score_text)
        if match:
            score = float(match.group(1))
            return min(10.0, max(0.0, score))  # Ensure score is within 0-10
        else:
            # Default score if extraction fails
            return 5.0

    except Exception as e:
        print(f"An error occurred during relevance scoring: {e}")
        return 5.0 # Return a neutral score on error

In [56]:
import google.generativeai as genai
import re

# Assume genai.configure(api_key="YOUR_API_KEY") has been called.

def score_document_context_relevance(query, context, document, model="gemini-2.0-flash"):
    """
    Score document relevance considering both query and context using a Gemini model.

    Args:
        query (str): User query
        context (str): User context
        document (str): Document text
        model (str): LLM model. Defaults to "gemini-2.0-flash".

    Returns:
        float: Relevance score from 0-10
    """
    # System prompt to instruct the model on how to rate relevance considering context
    system_prompt = """You are an expert at evaluating document relevance considering context.
    Rate the document on a scale from 0 to 10 based on how well it addresses the query
    when considering the provided context, where:
    0 = Completely irrelevant
    10 = Perfectly addresses the query in the given context

    Return ONLY a numerical score between 0 and 10, nothing else.
    """

    # Truncate document if it's too long
    # Gemini models have higher context limits, but truncating is still a good practice.
    doc_preview = document[:4000] + "..." if len(document) > 4000 else document
    
    # User prompt containing the query, context, and document preview
    user_prompt = f"""
    Query: {query}
    Context: {context}

    Document: {doc_preview}

    Relevance score considering context (0-10):
    """
    
    # Combine the system and user prompts into a single prompt for Gemini
    full_prompt = f"{system_prompt}\n\n{user_prompt}"

    # Create a GenerativeModel instance
    model_instance = genai.GenerativeModel(model)

    # Generate response from the model
    try:
        response = model_instance.generate_content(
            full_prompt,
            generation_config=genai.GenerationConfig(
                temperature=0.0, # Low temperature for a deterministic score
                max_output_tokens=10 # Keep output short
            )
        )
        
        # Extract the score from the model's response
        score_text = response.text.strip()
        
        # Extract numeric score using regex
        match = re.search(r'(\d+(\.\d+)?)', score_text)
        if match:
            score = float(match.group(1))
            return min(10.0, max(0.0, score))  # Ensure score is within 0-10
        else:
            # Default score if extraction fails
            return 5.0

    except Exception as e:
        print(f"An error occurred during relevance scoring: {e}")
        return 5.0 # Return a neutral score on error
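
The score-parsing step shared by both scoring functions can be checked without calling the API. The sketch below is a minimal illustration: `parse_score` is a hypothetical helper name that is not part of this notebook, and it mirrors the regex, the 0-10 clamp, and the neutral default of 5.0 used above.

```python
import re

def parse_score(score_text: str, default: float = 5.0) -> float:
    """Extract the first number from model output and clamp it to 0-10.

    Hypothetical helper mirroring the parsing logic in the scoring functions.
    """
    match = re.search(r'(\d+(\.\d+)?)', score_text)
    if not match:
        return default  # No number found: fall back to a neutral score
    return min(10.0, max(0.0, float(match.group(1))))

print(parse_score("8.5"))        # 8.5
print(parse_score("Score: 12"))  # 10.0 (clamped to the top of the scale)
print(parse_score("n/a"))        # 5.0 (default)
```

Because the regex takes the first number it finds, a reply such as "7/10" parses as 7.0.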

## The Core Adaptive Retriever

In [57]:


def adaptive_retrieval(query, vector_store, k=4, user_context=None):
    """
    Perform adaptive retrieval by selecting and executing the appropriate strategy.

    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store
        k (int): Number of documents to retrieve
        user_context (str): Optional user context for contextual queries

    Returns:
        List[Dict]: Retrieved documents
    """
    # Classify the query to determine its type
    try:
        query_type = classify_query(query)
    except Exception as e:
        print(f"Error classifying query. Falling back to Factual retrieval. Details: {e}")
        query_type = "Factual"
        
    print(f"Query classified as: {query_type}")

    # Select and execute the appropriate retrieval strategy based on the query type
    if query_type == "Factual":
        # Use the factual retrieval strategy for precise information
        results = factual_retrieval_strategy(query, vector_store, k)
    elif query_type == "Analytical":
        # Use the analytical retrieval strategy for comprehensive coverage
        results = analytical_retrieval_strategy(query, vector_store, k)
    elif query_type == "Opinion":
        # Use the opinion retrieval strategy for diverse perspectives
        results = opinion_retrieval_strategy(query, vector_store, k)
    elif query_type == "Contextual":
        # Use the contextual retrieval strategy, incorporating user context
        results = contextual_retrieval_strategy(query, vector_store, k, user_context)
    else:
        # Default to factual retrieval strategy if classification fails
        results = factual_retrieval_strategy(query, vector_store, k)
        
    return results  # Return the retrieved documents
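
The if/elif chain above can also be written as a lookup table, which keeps the fallback to factual retrieval in one place. This is only a sketch of that alternative, not the notebook's implementation: `dispatch_retrieval` and the `_stub` strategies are hypothetical names, and the stubs stand in for the real strategy functions so the example runs on its own.

```python
from typing import Callable, Dict, List, Optional

def dispatch_retrieval(
    query_type: str,
    query: str,
    vector_store: object,
    strategies: Dict[str, Callable[..., List[dict]]],
    k: int = 4,
    user_context: Optional[str] = None,
) -> List[dict]:
    """Look up the strategy for query_type, defaulting to the 'Factual' entry."""
    strategy = strategies.get(query_type, strategies["Factual"])
    if query_type == "Contextual":
        # Only the contextual strategy receives the user context
        return strategy(query, vector_store, k, user_context)
    return strategy(query, vector_store, k)

# Placeholder strategies (hypothetical stubs standing in for the real functions)
def _stub(name: str) -> Callable[..., List[dict]]:
    def run(query, vector_store, k, user_context=None):
        return [{"text": f"{name} result for: {query}", "k": k, "context": user_context}]
    return run

strategies = {name: _stub(name) for name in ("Factual", "Analytical", "Opinion", "Contextual")}

print(dispatch_retrieval("Opinion", "Is remote work better?", None, strategies)[0]["text"])
print(dispatch_retrieval("Unknown", "What is RAG?", None, strategies)[0]["text"])
```

An unrecognized type such as "Unknown" falls through to the factual strategy, matching the `else` branch above.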

## Response Generation

In [58]:
import google.generativeai as genai

# Assume genai.configure(api_key="YOUR_API_KEY") has been called.

def generate_response(query, results, query_type, model="gemini-2.0-flash"):
    """
    Generate a response based on query, retrieved documents, and query type.

    Args:
        query (str): User query
        results (List[Dict]): Retrieved documents
        query_type (str): Type of query
        model (str): LLM model. Defaults to "gemini-2.0-flash".

    Returns:
        str: Generated response
    """
    # Prepare context from retrieved documents by joining their texts with separators
    context = "\n\n---\n\n".join([r["text"] for r in results])

    # Create custom system prompt based on query type
    if query_type == "Factual":
        system_prompt = """You are a helpful assistant providing factual information.
        Answer the question based on the provided context. Focus on accuracy and precision.
        If the context doesn't contain the information needed, acknowledge the limitations."""

    elif query_type == "Analytical":
        system_prompt = """You are a helpful assistant providing analytical insights.
        Based on the provided context, offer a comprehensive analysis of the topic.
        Cover different aspects and perspectives in your explanation.
        If the context has gaps, acknowledge them while providing the best analysis possible."""

    elif query_type == "Opinion":
        system_prompt = """You are a helpful assistant discussing topics with multiple viewpoints.
        Based on the provided context, present different perspectives on the topic.
        Ensure fair representation of diverse opinions without showing bias.
        Acknowledge where the context presents limited viewpoints."""

    elif query_type == "Contextual":
        system_prompt = """You are a helpful assistant providing contextually relevant information.
        Answer the question considering both the query and its context.
        Make connections between the query context and the information in the provided documents.
        If the context doesn't fully address the specific situation, acknowledge the limitations."""

    else:
        system_prompt = """You are a helpful assistant. Answer the question based on the provided context. If you cannot answer from the context, acknowledge the limitations."""

    # Create a single user prompt by combining the system prompt, context, and query
    user_prompt = f"""
    {system_prompt}

    Context:
    {context}

    Question: {query}

    Please provide a helpful response based on the context.
    """

    # Initialize the Gemini GenerativeModel
    model_instance = genai.GenerativeModel(model)

    # Generate response using the Gemini API
    try:
        response = model_instance.generate_content(
            user_prompt,
            generation_config=genai.GenerationConfig(
                temperature=0.2 # Temperature for some creativity
            )
        )

        # Return the generated response content
        return response.text.strip()
        
    except Exception as e:
        return f"An error occurred while generating the response: {e}"

## Complete RAG Pipeline with Adaptive Retrieval

In [59]:
def rag_with_adaptive_retrieval(pdf_path, query, k=4, user_context=None):
    """
    Complete RAG pipeline with adaptive retrieval.
    
    Args:
        pdf_path (str): Path to PDF document
        query (str): User query
        k (int): Number of documents to retrieve
        user_context (str): Optional user context
        
    Returns:
        Dict: Results including query, retrieved documents, query type, and response
    """
    print("\n=== RAG WITH ADAPTIVE RETRIEVAL ===")
    print(f"Query: {query}")
    
    # Process the document to extract text, chunk it, and create embeddings
    chunks, vector_store = process_document(pdf_path)
    
    # Classify the query to determine its type
    query_type = classify_query(query)
    print(f"Query classified as: {query_type}")
    
    # Retrieve documents using the adaptive retrieval strategy based on the query type
    retrieved_docs = adaptive_retrieval(query, vector_store, k, user_context)
    
    # Generate a response based on the query, retrieved documents, and query type
    response = generate_response(query, retrieved_docs, query_type)
    
    # Compile the results into a dictionary
    result = {
        "query": query,
        "query_type": query_type,
        "retrieved_documents": retrieved_docs,
        "response": response
    }
    
    print("\n=== RESPONSE ===")
    print(response)
    
    return result

In [60]:
import os

def rag_with_adaptive_retrieval_folder(folder_path, query, k=4, user_context=None):
    """
    Complete RAG pipeline with adaptive retrieval for all PDFs in a folder.

    Args:
        folder_path (str): Path to folder containing PDF documents
        query (str): User query
        k (int): Number of documents/chunks to retrieve
        user_context (str): Optional user context

    Returns:
        Dict: Results including query, retrieved documents, query type, and response
    """
    print("\n=== RAG WITH ADAPTIVE RETRIEVAL ===")
    print(f"Query: {query}")

    # Process all PDFs in the folder to extract text, chunk it, and create embeddings
    chunks, vector_store = process_folder(folder_path)
    
    # Classify the query to determine its type
    query_type = classify_query(query)
    print(f"Query classified as: {query_type}")
    
    # Retrieve documents using the adaptive retrieval strategy based on the query type
    retrieved_docs = adaptive_retrieval(query, vector_store, k, user_context)
    
    # Generate a response based on the query, retrieved documents, and query type
    response = generate_response(query, retrieved_docs, query_type)
    
    # Compile the results into a dictionary
    result = {
        "query": query,
        "query_type": query_type,
        "retrieved_documents": retrieved_docs,
        "response": response
    }
    
    print("\n=== RESPONSE ===")
    print(response)
    
    return result




## Evaluation Framework

In [61]:
def evaluate_adaptive_vs_standard(pdf_path, test_queries, reference_answers=None):
    """
    Compare adaptive retrieval with standard retrieval on a set of test queries.
    
    This function processes a document, runs both standard and adaptive retrieval methods
    on each test query, and compares their performance. If reference answers are provided,
    it also evaluates the quality of responses against these references.
    
    Args:
        pdf_path (str): Path to PDF document to be processed as the knowledge source
        test_queries (List[str]): List of test queries to evaluate both retrieval methods
        reference_answers (List[str], optional): Reference answers for evaluation metrics
        
    Returns:
        Dict: Evaluation results containing individual query results and overall comparison
    """
    print("=== EVALUATING ADAPTIVE VS. STANDARD RETRIEVAL ===")
    
    # Process document to extract text, create chunks and build the vector store
    chunks, vector_store = process_document(pdf_path)
    
    # Initialize collection for storing comparison results
    results = []
    
    # Process each test query with both retrieval methods
    for i, query in enumerate(test_queries):
        print(f"\n\nQuery {i+1}: {query}")
        
        # --- Standard retrieval approach ---
        print("\n--- Standard Retrieval ---")
        # Create embedding for the query
        query_embedding = create_embeddings(query)
        # Retrieve documents using simple vector similarity
        standard_docs = vector_store.similarity_search(query_embedding, k=4)
        # Generate response using a generic approach
        standard_response = generate_response(query, standard_docs, "General")
        
        # --- Adaptive retrieval approach ---
        print("\n--- Adaptive Retrieval ---")
        # Classify the query to determine its type (Factual, Analytical, Opinion, Contextual)
        query_type = classify_query(query)
        # Retrieve documents using the strategy appropriate for this query type
        adaptive_docs = adaptive_retrieval(query, vector_store, k=4)
        # Generate a response tailored to the query type
        adaptive_response = generate_response(query, adaptive_docs, query_type)
        
        # Store complete results for this query
        result = {
            "query": query,
            "query_type": query_type,
            "standard_retrieval": {
                "documents": standard_docs,
                "response": standard_response
            },
            "adaptive_retrieval": {
                "documents": adaptive_docs,
                "response": adaptive_response
            }
        }
        
        # Add reference answer if available for this query
        if reference_answers and i < len(reference_answers):
            result["reference_answer"] = reference_answers[i]
            
        results.append(result)
        
        # Display preview of both responses for quick comparison
        print("\n--- Responses ---")
        print(f"Standard: {standard_response[:200]}...")
        print(f"Adaptive: {adaptive_response[:200]}...")
    
    # Calculate comparative metrics if reference answers are available
    if reference_answers:
        comparison = compare_responses(results)
        print("\n=== EVALUATION RESULTS ===")
        print(comparison)
    
    # Return the complete evaluation results
    return {
        "results": results,
        "comparison": comparison if reference_answers else "No reference answers provided for evaluation"
    }

In [62]:
def compare_responses(results):
    """
    Compare standard and adaptive responses against reference answers.
    
    Args:
        results (List[Dict]): Results containing both types of responses
        
    Returns:
        str: Comparison analysis
    """
    # Define the system prompt to guide the AI in comparing responses
    comparison_prompt = """You are an expert evaluator of information retrieval systems.
    Compare the standard retrieval and adaptive retrieval responses for each query.
    Consider factors like accuracy, relevance, comprehensiveness, and alignment with the reference answer.
    Provide a detailed analysis of the strengths and weaknesses of each approach."""
    
    # Initialize the comparison text with a header
    comparison_text = "# Evaluation of Standard vs. Adaptive Retrieval\n\n"
    
    # Iterate through each result to compare responses
    for i, result in enumerate(results):
        # Skip if there is no reference answer for the query
        if "reference_answer" not in result:
            continue
            
        # Add query details to the comparison text
        comparison_text += f"## Query {i+1}: {result['query']}\n"
        comparison_text += f"*Query Type: {result['query_type']}*\n\n"
        comparison_text += f"**Reference Answer:**\n{result['reference_answer']}\n\n"
        
        # Add standard retrieval response to the comparison text
        comparison_text += f"**Standard Retrieval Response:**\n{result['standard_retrieval']['response']}\n\n"
        
        # Add adaptive retrieval response to the comparison text
        comparison_text += f"**Adaptive Retrieval Response:**\n{result['adaptive_retrieval']['response']}\n\n"
        
        # Create the user prompt for the AI to compare the responses
        user_prompt = f"""
        Reference Answer: {result['reference_answer']}
        
        Standard Retrieval Response: {result['standard_retrieval']['response']}
        
        Adaptive Retrieval Response: {result['adaptive_retrieval']['response']}
        
        Provide a detailed comparison of the two responses.
        """
        
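        # Generate the comparison analysis using the OpenAI client
        # (assumes an OpenAI `client` instance was created earlier; the next cell
        # redefines this function to use Gemini instead)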
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": comparison_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.2
        )
        
        # Add the AI's comparison analysis to the comparison text
        comparison_text += f"**Comparison Analysis:**\n{response.choices[0].message.content}\n\n"
    
    return comparison_text  # Return the complete comparison analysis

In [63]:
import google.generativeai as genai



def compare_responses(results):
    """
    Compare standard and adaptive responses against reference answers.
    
    Args:
        results (List[Dict]): Results containing both types of responses
        
    Returns:
        str: Comparison analysis
    """
    # Define the system prompt to guide the AI in comparing responses
    comparison_prompt = """You are an expert evaluator of information retrieval systems.
    Compare the standard retrieval and adaptive retrieval responses for each query.
    Consider factors like accuracy, relevance, comprehensiveness, and alignment with the reference answer.
    Provide a detailed analysis of the strengths and weaknesses of each approach."""
    
    # Initialize the comparison text with a header
    comparison_text = "# Evaluation of Standard vs. Adaptive Retrieval\n\n"
    
    # Iterate through each result to compare responses
    for i, result in enumerate(results):
        # Skip if there is no reference answer for the query (key missing or value None)
        if not result.get("reference_answer"):
            continue
            
        # Add query details to the comparison text
        comparison_text += f"## Query {i+1}: {result['query']}\n"
        comparison_text += f"*Query Type: {result['query_type']}*\n\n"
        comparison_text += f"**Reference Answer:**\n{result['reference_answer']}\n\n"
        
        # Add standard retrieval response to the comparison text
        standard_response = result['standard_retrieval'].get('response', 'No response found.')
        comparison_text += f"**Standard Retrieval Response:**\n{standard_response}\n\n"
        
        # Add adaptive retrieval response to the comparison text
        adaptive_response = result['adaptive_retrieval'].get('response', 'No response found.')
        comparison_text += f"**Adaptive Retrieval Response:**\n{adaptive_response}\n\n"
        
        # Create the user prompt for the AI to compare the responses
        user_prompt = f"""
        Reference Answer: {result['reference_answer']}
        
        Standard Retrieval Response: {standard_response}
        
        Adaptive Retrieval Response: {adaptive_response}
        
        Provide a detailed comparison of the two responses.
        """
        
        # Combine system and user prompts into a single string for Gemini
        full_prompt = f"{comparison_prompt}\n\n{user_prompt}"

        # Initialize the Gemini GenerativeModel
        model_instance = genai.GenerativeModel("gemini-2.0-flash")

        # Generate the comparison analysis using the Gemini API
        try:
            response = model_instance.generate_content(
                full_prompt,
                generation_config=genai.GenerationConfig(
                    temperature=0.2 # Temperature for some creative analysis
                )
            )
            
            # Add the AI's comparison analysis to the comparison text
            comparison_text += f"**Comparison Analysis:**\n{response.text.strip()}\n\n"
        
        except Exception as e:
            comparison_text += f"**Comparison Analysis:**\nAn error occurred during analysis: {e}\n\n"
    
    return comparison_text # Return the complete comparison analysis

## Evaluating the Adaptive Retrieval System (Customized Queries)

To run the adaptive RAG pipeline, call `rag_with_adaptive_retrieval_folder()` with your PDF folder and test queries, then compare each generated response with its reference answer:

In [64]:
# Path to your knowledge source folder (with all your PDF files)
folder_path = "/Users/kekunkoya/Desktop/RAG Google 2/PDFs"  # Update as needed

# Define test queries (add more entries to cover different query types)
test_queries = [
    "What does the Red Cross and PEMA shelter guide say about bringing pets to emergency shelters in Harrisburg?"
]

reference_answers = [
    "The Red Cross and PEMA shelter guide indicates that select shelters, including those managed by the Red Cross in Harrisburg, permit pets in designated areas. Service animals are always allowed. PA 211 and shelter staff can provide details on which shelters are pet-friendly during each event."
]

# Example usage of your RAG function for PDFs in a folder:
for query, ref_answer in zip(test_queries, reference_answers):
    print(f"\n--- Running query: {query} ---")
    result = rag_with_adaptive_retrieval_folder(folder_path, query)
    print("\nGenerated Response:")
    print(result["response"])
    print("\nReference Answer:")
    print(ref_answer)



--- Running query: What does the Red Cross and PEMA shelter guide say about bringing pets to emergency shelters in Harrisburg? ---

=== RAG WITH ADAPTIVE RETRIEVAL ===
Query: What does the Red Cross and PEMA shelter guide say about bringing pets to emergency shelters in Harrisburg?

Extracting text from PDF: /Users/kekunkoya/Desktop/RAG Google 2/PDFs/PA 211 Disaster Community Resources.pdf
Chunking text...
Created 13 text chunks for PA 211 Disaster Community Resources.pdf
Creating embeddings for chunks...
Added 13 chunks from PA 211 Disaster Community Resources.pdf to the vector store.

Extracting text from PDF: /Users/kekunkoya/Desktop/RAG Google 2/PDFs/211 RESPONDS TO URGENT NEEDS.pdf
Chunking text...
Created 7 text chunks for 211 RESPONDS TO URGENT NEEDS.pdf
Creating embeddings for chunks...
Added 7 chunks from 211 RESPONDS TO URGENT NEEDS.pdf to the vector store.

Extracting text from PDF: /Users/kekunkoya/Desktop/RAG Google 2/PDFs/PEMA.pdf
Chunking text...
Created 69 text chunks 

ResourceExhausted: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. [violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.0-flash"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 200
}
, links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, retry_delay {
  seconds: 11
}
]
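
The `ResourceExhausted` output above carries a `retry_delay` hint (11 seconds in this run). A minimal sketch of reading that hint from the error text follows. `parse_retry_delay` and the 8-second default are assumptions for illustration, and the message format is taken from the output shown here rather than from a documented contract.

```python
import re

def parse_retry_delay(error_message: str, default_seconds: float = 8.0) -> float:
    """Return the server-suggested wait in seconds, or a default if none is found."""
    m = re.search(r"retry_delay\s*\{\s*seconds:\s*(\d+)", error_message)
    return float(m.group(1)) if m else default_seconds

# Shortened sample modeled on the error output above
sample = """429 You exceeded your current quota.
, retry_delay {
  seconds: 11
}
]"""
print(parse_retry_delay(sample))                # 11.0
print(parse_retry_delay("500 internal error"))  # 8.0 (no hint, so the default applies)
```

The `quota_id` in the output names a per-day free-tier limit, so waiting for the suggested delay may not be enough once the daily allowance is used up.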

In [27]:
def evaluate_adaptive_vs_standard(folder_path, test_queries, reference_answers):
    # Process ALL PDFs in the folder, not just one!
    all_chunks = []
    vector_store = SimpleVectorStore()
    import os

    for root, _, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_path = os.path.join(root, file)
                print(f"Processing: {pdf_path}")
                # Extract, chunk, embed
                extracted_text = extract_text_from_pdf(pdf_path)
                chunks = chunk_text(extracted_text, 1000, 200)
                chunk_embeddings = [create_embeddings(chunk) for chunk in chunks]
                for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
                    vector_store.add_item(chunk, embedding, metadata={"index": i, "source": pdf_path})
                all_chunks.extend(chunks)
    # Now proceed as before, using the combined vector_store

    all_results = []
    for i, query in enumerate(test_queries):
        print(f"\n===== Evaluating Query: '{query}' =====")
        query_type = classify_query(query)
        
        # --- Standard Retrieval ---
        print("\n--- Running Standard Retrieval ---")
        standard_embedding = create_embeddings(query)
        standard_results = vector_store.similarity_search(standard_embedding, k=4)
        standard_response = generate_response(query, standard_results, query_type)
        
        # --- Adaptive Retrieval ---
        print("\n--- Running Adaptive Retrieval ---")
        adaptive_results = adaptive_retrieval(query, vector_store, k=4)
        adaptive_response = generate_response(query, adaptive_results, query_type)
        
        # --- Store Results ---
        result_entry = {
            "query": query,
            "query_type": query_type,
            "reference_answer": reference_answers[i] if i < len(reference_answers) else None,
            "standard_retrieval": {
                "results": standard_results,
                "response": standard_response
            },
            "adaptive_retrieval": {
                "results": adaptive_results,
                "response": adaptive_response
            }
        }
        all_results.append(result_entry)
        print("------------------------------------------")
        
    comparison = compare_responses(all_results)
    
    return {
        "individual_results": all_results,
        "comparison": comparison
    }


In [31]:
import time
import re
import google.api_core.exceptions

# --- Rate-limit aware wrapper ---
def call_gemini(prompt, model="gemini-2.0-flash", temperature=0, max_retries=6, base_sleep=8):
    """
    Generate with retry/backoff when quota/rate limits are hit.
    Tries to honor retry_delay from the error when present.
    """
    model_instance = genai.GenerativeModel(model)
    for attempt in range(max_retries):
        try:
            resp = model_instance.generate_content(
                prompt,
                generation_config=genai.GenerationConfig(temperature=temperature)
            )
            return (resp.text or "").strip()
        except google.api_core.exceptions.ResourceExhausted as e:
            # Try to parse a suggested retry seconds from the message
            m = re.search(r"retry_delay\s*{\s*seconds:\s*(\d+)", str(e))
            wait_s = int(m.group(1)) if m else base_sleep * (attempt + 1)
            print(f"[Rate limit] Waiting {wait_s}s (attempt {attempt+1}/{max_retries})...")
            time.sleep(wait_s)
        except Exception as e:
            # Non-quota error: surface and stop retrying
            print(f"[Gemini error] {e}")
            return "ERROR: generation failed."
    return "ERROR: gave up after retries due to quota limits."

def generate_response(query, results, query_type, model="gemini-2.0-flash"):
    context = "\n\n---\n\n".join([r["text"] for r in results]) if results else ""
    system_prompt = (
        f"You are a helpful assistant. Answer the user's question based on the provided context. "
        f"The query type is {query_type}. If you cannot answer, say so."
    )
    user_prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return call_gemini(f"{system_prompt}\n{user_prompt}", model=model, temperature=0.2)

# --- Evaluation Pipeline using an EXISTING vector store (no PDF processing) ---
def evaluate_adaptive_vs_standard(vector_store, test_queries, reference_answers, k=4, pause_seconds=6):
    """
    Evaluate standard vs adaptive retrieval using a prebuilt vector_store.
    Adds pauses between API calls to avoid hitting per-minute quotas.
    """
    all_results = []
    print("Using existing vector store (no PDF processing).")

    for i, query in enumerate(test_queries):
        print(f"\n===== Evaluating Query: '{query}' =====")
        query_type = classify_query(query) or "Factual"
        time.sleep(pause_seconds)  # small pause between calls

        # --- Standard Retrieval ---
        print("\n--- Running Standard Retrieval ---")
        standard_embedding = create_embeddings(query)  # 1 embed call
        standard_results = vector_store.similarity_search(standard_embedding, k=k)
        standard_response = generate_response(query, standard_results, query_type)  # 1 generate call
        time.sleep(pause_seconds)

        # --- Adaptive Retrieval ---
        print("\n--- Running Adaptive Retrieval ---")
        adaptive_results = adaptive_retrieval(query, vector_store, k=k)  # internally uses a few generates/embeds
        adaptive_response = generate_response(query, adaptive_results, query_type)  # 1 generate call
        time.sleep(pause_seconds)

        # --- Store Results ---
        result_entry = {
            "query": query,
            "query_type": query_type,
            "reference_answer": reference_answers[i] if i < len(reference_answers) else None,
            "standard_retrieval": {
                "results": standard_results,
                "response": standard_response
            },
            "adaptive_retrieval": {
                "results": adaptive_results,
                "response": adaptive_response
            }
        }
        all_results.append(result_entry)
        print("------------------------------------------")

    comparison = compare_responses(all_results)  # this will call Gemini a few times; keep the pauses inside compare if needed
    return {
        "individual_results": all_results,
        "comparison": comparison
    }


In [35]:
# ====== vector_store_loader.py ======
import os
import pickle
from typing import Optional

# Optional: only import LangChain bits if present
def _lazy_import_faiss():
    from langchain_community.vectorstores import FAISS
    from langchain_core.embeddings import Embeddings
    return FAISS, Embeddings

def _lazy_import_chroma():
    from langchain_community.vectorstores import Chroma
    return Chroma

# ---------- SimpleVectorStore (your custom one) ----------
class SimpleVectorStore:
    def __init__(self):
        import numpy as np
        self.np = np
        self.vectors = []
        self.texts = []
        self.metadata = []

    def add_item(self, text, embedding, metadata=None):
        self.vectors.append(self.np.array(embedding))
        self.texts.append(text)
        self.metadata.append(metadata or {})

    def similarity_search(self, query_embedding, k=5, filter_func=None):
        if not self.vectors:
            return []
        query_vector = self.np.array(query_embedding)
        similarities = []
        for i, vector in enumerate(self.vectors):
            if filter_func and not filter_func(self.metadata[i]):
                continue
            nq = self.np.linalg.norm(query_vector)
            nv = self.np.linalg.norm(vector)
            score = 0.0 if nq == 0 or nv == 0 else float(self.np.dot(query_vector, vector) / (nq * nv))
            similarities.append((i, score))
        similarities.sort(key=lambda x: x[1], reverse=True)
        out = []
        for i in range(min(k, len(similarities))):
            idx, score = similarities[i]
            out.append({"text": self.texts[idx], "metadata": self.metadata[idx], "similarity": score})
        return out

# ---------- Adapters so your code can call .similarity_search(...) uniformly ----------
class VectorStoreAdapter:
    """Uniform interface: .similarity_search(query_embedding, k) -> List[dict(text, metadata, similarity)]."""
    def __init__(self, kind: str, store, name: Optional[str] = None):
        self.kind = kind
        self.store = store
        self.name = name or kind

    def similarity_search(self, query_embedding, k=4):
        if self.kind == "simple":
            return self.store.similarity_search(query_embedding, k=k)

        if self.kind in ("faiss", "chroma"):
            # LangChain vector stores return Documents; search by vector to avoid re-embedding.
            # Prefer the scored variant when the backend exposes it. Fetching scores in a
            # second call and matching on id() does not work, because each call returns
            # fresh Document objects and every similarity would fall back to 0.0.
            try:
                pairs = self.store.similarity_search_with_score_by_vector(query_embedding, k=k)
            except (AttributeError, NotImplementedError):
                # No scored by-vector search on this backend: use unscored results
                docs = self.store.similarity_search_by_vector(query_embedding, k=k)
                pairs = [(d, None) for d in docs]

            results = []
            for d, score in pairs:
                # LC Document: .page_content and .metadata
                # If scores are distances, you might want to invert/normalize; left as-is here.
                results.append({
                    "text": getattr(d, "page_content", ""),
                    "metadata": getattr(d, "metadata", {}),
                    "similarity": float(score) if score is not None else 0.0
                })
            return results

        raise ValueError(f"Unknown adapter kind: {self.kind}")

# ---------- Loaders ----------
def load_simple_pickle(pkl_path: str) -> VectorStoreAdapter:
    with open(pkl_path, "rb") as f:
        store = pickle.load(f)
    # If it’s raw dicts, you could reconstruct a SimpleVectorStore here.
    # We assume it’s your SimpleVectorStore instance.
    return VectorStoreAdapter("simple", store, name=os.path.basename(pkl_path))

def load_faiss(dir_path: str, embeddings=None) -> VectorStoreAdapter:
    FAISS, Embeddings = _lazy_import_faiss()

    # A tiny dummy embeddings class so FAISS.load_local won’t complain if you only do by-vector search.
    class _DummyEmbeddings(Embeddings):
        def embed_documents(self, texts): raise NotImplementedError("Not used")
        def embed_query(self, text): raise NotImplementedError("Not used")

    if embeddings is None:
        embeddings = _DummyEmbeddings()

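    # load_local unpickles index.pkl, so only enable this flag for index directories you created or trust.
    store = FAISS.load_local(dir_path, embeddings, allow_dangerous_deserialization=True)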
    return VectorStoreAdapter("faiss", store, name=os.path.basename(dir_path))

def load_chroma(dir_path: str) -> VectorStoreAdapter:
    Chroma = _lazy_import_chroma()
    store = Chroma(persist_directory=dir_path)  # embedding_function not needed for by-vector calls
    return VectorStoreAdapter("chroma", store, name=os.path.basename(dir_path))

def detect_backend(path: str) -> str:
    if os.path.isfile(path) and path.endswith(".pkl"):
        return "simple"
    if os.path.isdir(path):
        names = set(os.listdir(path))  # list the directory once
        # FAISS typical files
        if {"index.faiss", "index.pkl"}.issubset(names):
            return "faiss"
        # Chroma typical files
        chroma_markers = {"chroma-collections.parquet", "chroma-embeddings.parquet", "chroma.sqlite3"}
        if names & chroma_markers:
            return "chroma"
    raise FileNotFoundError(
        "Cannot detect vector store backend at path. "
        "Expected a .pkl file (SimpleVectorStore) or a FAISS/Chroma directory."
    )

def load_vector_store(path: str, backend: Optional[str] = None, embeddings=None) -> VectorStoreAdapter:
    """
    Auto-detect and load a vector store (SimpleVectorStore .pkl, FAISS dir, or Chroma dir).
    For FAISS, you can pass an 'embeddings' instance used at build time; otherwise a dummy is used.
    """
    kind = backend or detect_backend(path)
    if kind == "simple":
        return load_simple_pickle(path)
    if kind == "faiss":
        return load_faiss(path, embeddings=embeddings)
    if kind == "chroma":
        return load_chroma(path)
    raise ValueError(f"Unsupported backend: {kind}")

# ---------- Optional helpers to save/load SimpleVectorStore ----------
def save_simple_vector_store(store: SimpleVectorStore, pkl_path: str):
    with open(pkl_path, "wb") as f:
        pickle.dump(store, f)
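
The adapter above passes backend scores through unchanged, and its comment notes that distance-style scores may need inverting. Here is a minimal sketch of one such mapping, assuming the backend returns a non-negative distance where lower means closer (often the case for FAISS L2 indexes, though it depends on how the index was built). The helper names `distance_to_similarity` and `normalize_results` and the `1 / (1 + d)` mapping are illustrative choices, not part of the original code.

```python
from typing import Dict, List


def distance_to_similarity(distance: float) -> float:
    """Map a non-negative distance to (0, 1]; smaller distances give larger scores."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


def normalize_results(results: List[Dict]) -> List[Dict]:
    """Return copies of adapter results with 'similarity' converted, best first."""
    converted = [
        {**r, "similarity": distance_to_similarity(r["similarity"])} for r in results
    ]
    return sorted(converted, key=lambda r: r["similarity"], reverse=True)


# Hypothetical adapter output where 'similarity' actually holds a distance.
raw = [
    {"text": "far", "metadata": {}, "similarity": 3.0},
    {"text": "near", "metadata": {}, "similarity": 0.0},
]
ranked = normalize_results(raw)
print([r["text"] for r in ranked])
```

Any monotonically decreasing mapping would preserve the ranking; this one only makes scores from distance-based backends read in the same direction as the cosine scores from `SimpleVectorStore`.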


In [38]:
# === Inline vector-store loader & adapter ===
import os, pickle
from typing import Optional

import numpy as np

# --- Your SimpleVectorStore shape (only needed if you saved a pickle of it) ---
class SimpleVectorStore:
    def __init__(self):
        # NumPy comes from the module-level import. Keeping the module on the
        # instance (self.np) would make pickle.dump fail: modules cannot be pickled.
        self.vectors = []
        self.texts = []
        self.metadata = []

    def add_item(self, text, embedding, metadata=None):
        self.vectors.append(np.array(embedding))
        self.texts.append(text)
        self.metadata.append(metadata or {})

    def similarity_search(self, query_embedding, k=5, filter_func=None):
        """Return the top-k items by cosine similarity, optionally filtered by metadata."""
        if not self.vectors:
            return []
        q = np.array(query_embedding)
        nq = np.linalg.norm(q)
        sims = []
        for i, v in enumerate(self.vectors):
            # Skip items rejected by the metadata filter before scoring them.
            if filter_func and not filter_func(self.metadata[i]):
                continue
            nv = np.linalg.norm(v)
            # Zero-norm vectors get a score of 0.0 instead of dividing by zero.
            score = 0.0 if nq == 0 or nv == 0 else float(np.dot(q, v) / (nq * nv))
            sims.append((i, score))
        sims.sort(key=lambda x: x[1], reverse=True)
        return [
            {"text": self.texts[idx], "metadata": self.metadata[idx], "similarity": score}
            for idx, score in sims[:k]
        ]

# --- Uniform adapter so your evaluator can call .similarity_search(query_embedding, k) ---
class VectorStoreAdapter:
    def __init__(self, kind: str, store, name: Optional[str] = None):
        self.kind = kind
        self.store = store
        self.name = name or kind

    def similarity_search(self, query_embedding, k=4):
        if self.kind == "simple":
            return self.store.similarity_search(query_embedding, k=k)

        if self.kind in ("faiss", "chroma"):
            # by-vector search (no re-embedding)
            try:
                docs_scores = self.store.similarity_search_with_score_by_vector(query_embedding, k=k)
                results = []
                for d, s in docs_scores:
                    results.append({
                        "text": getattr(d, "page_content", ""),
                        "metadata": getattr(d, "metadata", {}),
                        "similarity": float(s) if s is not None else 0.0
                    })
                return results
            except Exception:
                docs = self.store.similarity_search_by_vector(query_embedding, k=k)
                return [{
                    "text": getattr(d, "page_content", ""),
                    "metadata": getattr(d, "metadata", {}),
                    "similarity": 0.0
                } for d in docs]

        raise ValueError(f"Unknown adapter kind: {self.kind}")

# --- Backends ---
def _lazy_import_faiss():
    from langchain_community.vectorstores import FAISS
    from langchain_core.embeddings import Embeddings
    return FAISS, Embeddings

def _lazy_import_chroma():
    from langchain_community.vectorstores import Chroma
    return Chroma

def load_simple_pickle(pkl_path: str) -> VectorStoreAdapter:
    with open(pkl_path, "rb") as f:
        store = pickle.load(f)
    # If you saved raw dicts, reconstruct here; assuming you pickled the object.
    return VectorStoreAdapter("simple", store, name=os.path.basename(pkl_path))

def load_faiss(dir_path: str, embeddings=None) -> VectorStoreAdapter:
    FAISS, Embeddings = _lazy_import_faiss()
    class _DummyEmb(Embeddings):
        def embed_documents(self, texts): raise NotImplementedError
        def embed_query(self, text): raise NotImplementedError
    if embeddings is None:
        embeddings = _DummyEmb()
    # load_local unpickles index.pkl; only use this flag on index directories you trust.
    store = FAISS.load_local(dir_path, embeddings, allow_dangerous_deserialization=True)
    return VectorStoreAdapter("faiss", store, name=os.path.basename(dir_path))

def load_chroma(dir_path: str) -> VectorStoreAdapter:
    Chroma = _lazy_import_chroma()
    store = Chroma(persist_directory=dir_path)
    return VectorStoreAdapter("chroma", store, name=os.path.basename(dir_path))

def detect_backend(path: str) -> str:
    if os.path.isfile(path) and path.endswith(".pkl"):
        return "simple"
    if os.path.isdir(path):
        names = set(os.listdir(path))
        if {"index.faiss", "index.pkl"}.issubset(names):
            return "faiss"
        if any(n in names for n in {"chroma.sqlite3", "chroma-collections.parquet", "chroma-embeddings.parquet"}):
            return "chroma"
    raise FileNotFoundError("Could not detect vector store backend at given path.")

def load_vector_store(path: str, backend: Optional[str] = None, embeddings=None) -> VectorStoreAdapter:
    kind = backend or detect_backend(path)
    if kind == "simple":
        return load_simple_pickle(path)
    if kind == "faiss":
        return load_faiss(path, embeddings=embeddings)
    if kind == "chroma":
        return load_chroma(path)
    raise ValueError(f"Unsupported backend: {kind}")
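
`SimpleVectorStore.similarity_search` scores one stored vector at a time in a Python loop. For larger stores, the same cosine ranking can be computed in a single NumPy pass. The following is a minimal sketch under stated assumptions: `cosine_top_k` is a hypothetical helper (not part of the code above), all vectors share one dimension, and zero-norm vectors score 0.0, as in the loop version.

```python
import numpy as np


def cosine_top_k(query, vectors, k=5):
    """Return [(index, cosine_similarity)] for the k best rows of `vectors`."""
    q = np.asarray(query, dtype=float)
    m = np.asarray(vectors, dtype=float)
    if m.size == 0:
        return []
    # Product of each row norm with the query norm (the cosine denominator).
    norms = np.linalg.norm(m, axis=1) * np.linalg.norm(q)
    dots = m @ q
    # Zero-norm query or row: score 0.0 instead of dividing by zero.
    scores = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)
    # Stable sort on negated scores keeps the original order among ties.
    order = np.argsort(-scores, kind="stable")[:k]
    return [(int(i), float(scores[i])) for i in order]


hits = cosine_top_k([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]], k=2)
print(hits)
```

The loop version is easier to combine with a per-item `filter_func`. The vectorized form trades that flexibility for speed and would need a boolean mask to support filtering.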
