# Relevant Segment Extraction (RSE) for Enhanced RAG

In this notebook, we implement a Relevant Segment Extraction (RSE) technique to improve the context quality in our RAG system. Rather than simply retrieving a collection of isolated chunks, we identify and reconstruct continuous segments of text that provide better context to our language model.

## Key Concept

Relevant chunks tend to be clustered together within documents. By identifying these clusters and preserving their continuity, we provide more coherent context for the LLM to work with.

## Setting Up the Environment
We begin by importing necessary libraries.

In [1]:
import fitz # PyMuPDF
import os
import numpy as np
import json
import re
import google.generativeai as genai

In [2]:

import fitz
import os
import google.generativeai as genai
from dotenv import load_dotenv


## Extracting Text from a PDF File
To implement RAG, we first need a source of textual data. In this case, we extract text from a PDF file using the PyMuPDF library.

In [4]:
import fitz
from typing import List, Dict

def extract_text_from_pdf(pdf_path: str) -> str:
    """
    Extracts text from a PDF file using PyMuPDF (fitz).

    Args:
        pdf_path (str): Path to the PDF file.

    Returns:
        str: Extracted text from the PDF, or an empty string if an error occurs.
    """
    all_text = ""
    try:
        # Use a context manager to automatically close the document
        with fitz.open(pdf_path) as mypdf:
            # Iterate through each page to extract text
            for page in mypdf:
                all_text += page.get_text("text") + " "
    except Exception as e:
        print(f"Error reading PDF file: {e}")
        return ""
    
    return all_text

## Chunking the Extracted Text
Once we have the extracted text, we divide it into smaller, overlapping chunks to improve retrieval accuracy.

In [5]:
def chunk_text(text, chunk_size=800, overlap=0):
    """
    Split text into non-overlapping chunks.
    For RSE, we typically want non-overlapping chunks so we can reconstruct segments properly.
    
    Args:
        text (str): Input text to chunk
        chunk_size (int): Size of each chunk in characters
        overlap (int): Overlap between chunks in characters
        
    Returns:
        List[str]: List of text chunks
    """
    chunks = []
    
    # Simple character-based chunking
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size]
        if chunk:  # Ensure we don't add empty chunks
            chunks.append(chunk)
    
    return chunks

## Building a Simple Vector Store
let's implement a simple vector store.

In [6]:
class SimpleVectorStore:
    """
    A lightweight vector store implementation using NumPy.
    """
    def __init__(self, dimension=1536):
        """
        Initialize the vector store.
        
        Args:
            dimension (int): Dimension of embeddings
        """
        self.dimension = dimension
        self.vectors = []
        self.documents = []
        self.metadata = []
    
    def add_documents(self, documents, vectors=None, metadata=None):
        """
        Add documents to the vector store.
        
        Args:
            documents (List[str]): List of document chunks
            vectors (List[List[float]], optional): List of embedding vectors
            metadata (List[Dict], optional): List of metadata dictionaries
        """
        if vectors is None:
            vectors = [None] * len(documents)
        
        if metadata is None:
            metadata = [{} for _ in range(len(documents))]
        
        for doc, vec, meta in zip(documents, vectors, metadata):
            self.documents.append(doc)
            self.vectors.append(vec)
            self.metadata.append(meta)
    
    def search(self, query_vector, top_k=5):
        """
        Search for most similar documents.
        
        Args:
            query_vector (List[float]): Query embedding vector
            top_k (int): Number of results to return
            
        Returns:
            List[Dict]: List of results with documents, scores, and metadata
        """
        if not self.vectors or not self.documents:
            return []
        
        # Convert query vector to numpy array
        query_array = np.array(query_vector)
        
        # Calculate similarities
        similarities = []
        for i, vector in enumerate(self.vectors):
            if vector is not None:
                # Compute cosine similarity
                similarity = np.dot(query_array, vector) / (
                    np.linalg.norm(query_array) * np.linalg.norm(vector)
                )
                similarities.append((i, similarity))
        
        # Sort by similarity (descending)
        similarities.sort(key=lambda x: x[1], reverse=True)
        
        # Get top-k results
        results = []
        for i, score in similarities[:top_k]:
            results.append({
                "document": self.documents[i],
                "score": float(score),
                "metadata": self.metadata[i]
            })
        
        return results

## Creating Embeddings for Text Chunks
Embeddings transform text into numerical vectors, which allow for efficient similarity search.

In [7]:
import os
import google.generativeai as genai
from typing import List, Any

# --- 1. Gemini API Configuration ---
# Your GOOGLE_API_KEY should be set in your environment
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. Define the create_embeddings function for Gemini ---
def create_embeddings(texts: List[str], model: str = "models/embedding-001") -> Any:
    """
    Generate embeddings for a list of texts using the Gemini API.
    
    Args:
        texts (List[str]): List of texts to embed
        model (str): Embedding model to use
        
    Returns:
        List[List[float]]: List of embedding vectors
    """
    if not texts:
        return []
        
    try:
        # The Gemini API's `embed_content` can handle a list of texts directly,
        # so explicit batching loops are not required for most use cases.
        response = genai.embed_content(
            model=model,
            content=texts
        )
        
        # The embedding list is directly under the 'embedding' key
        return response['embedding']
    except Exception as e:
        print(f"An error occurred during embedding: {e}")
        return []

# --- 3. Main Logic (Re-implemented for a runnable example) ---
if __name__ == "__main__":
    # Simulate a list of text chunks
    text_chunks = [
        "Homelessness is a complex social problem.",
        "A lack of affordable housing is a key contributing factor.",
        "Social factors like family breakdown can also lead to homelessness."
    ]

    print("Creating embeddings with Gemini...")
    # Create embeddings for the text chunks
    embeddings = create_embeddings(text_chunks)

    if embeddings:
        print("\nEmbeddings created successfully.")
        print(f"Number of embeddings: {len(embeddings)}")
        print(f"Embedding dimensions: {len(embeddings[0])}")
    else:
        print("\nFailed to create embeddings.")

Creating embeddings with Gemini...

Embeddings created successfully.
Number of embeddings: 3
Embedding dimensions: 768


## Processing Documents with RSE
Now let's implement the core RSE functionality.

In [8]:
# --- Imports ---
import os
from typing import List, Tuple, Dict, Any
import numpy as np
import google.generativeai as genai
import fitz
from tqdm import tqdm

# --- 1. Gemini API Configuration ---
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. Helper Functions (Gemini-compatible) ---
def extract_text_from_pdf(pdf_path: str) -> str:
    """Extracts text from a PDF file."""
    all_text = []
    try:
        with fitz.open(pdf_path) as doc:
            for page in doc:
                all_text.append(page.get_text("text"))
    except Exception as e:
        print(f"Error reading PDF: {e}")
        return ""
    return "\n".join(all_text)

def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Chunks the given text into segments."""
    chunks = []
    step = chunk_size - overlap
    for i in range(0, len(text), step):
        chunks.append(text[i:i + chunk_size])
    return chunks

def create_embeddings(texts: List[str], model: str = "models/embedding-001") -> Any:
    """Creates embeddings for a list of texts using the Gemini API."""
    try:
        if not texts:
            return []
        # Gemini's embed_content handles batching, no need for a manual loop
        response = genai.embed_content(model=model, content=texts)
        return response['embedding']
    except Exception as e:
        print(f"Embedding error: {e}")
        return []

class SimpleVectorStore:
    """A simple vector store implementation using NumPy."""
    def __init__(self):
        self.vectors = []
        self.documents = []
        self.metadata = []

    def add_documents(self, documents: List[str], vectors: List[Any], metadata: List[Dict]):
        for doc, vec, meta in zip(documents, vectors, metadata):
            self.documents.append(doc)
            self.vectors.append(np.array(vec, dtype=np.float32))
            self.metadata.append(meta)

# --- 3. The main processing function (revised for Gemini) ---
def process_document(pdf_path: str, chunk_size: int = 800) -> Tuple[List[str], SimpleVectorStore, Dict]:
    """
    Process a document for use with RSE.
    
    Args:
        pdf_path (str): Path to the PDF document
        chunk_size (int): Size of each chunk in characters
        
    Returns:
        Tuple[List[str], SimpleVectorStore, Dict]: Chunks, vector store, and document info
    """
    print("Extracting text from document...")
    text = extract_text_from_pdf(pdf_path)
    
    print("Chunking text into non-overlapping segments...")
    chunks = chunk_text(text, chunk_size=chunk_size, overlap=0)
    print(f"Created {len(chunks)} chunks")
    
    print("Generating embeddings for chunks...")
    # Use the Gemini-compatible embedding function
    chunk_embeddings = create_embeddings(chunks)
    
    if not chunk_embeddings or len(chunks) != len(chunk_embeddings):
        raise RuntimeError("Failed to create embeddings or embedding count does not match chunk count.")

    vector_store = SimpleVectorStore()
    
    metadata = [{"chunk_index": i, "source": pdf_path} for i in range(len(chunks))]
    vector_store.add_documents(chunks, chunk_embeddings, metadata)
    
    doc_info = {
        "chunks": chunks,
        "source": pdf_path,
    }
    
    return chunks, vector_store, doc_info

# --- 4. Main Logic for a runnable example ---
if __name__ == "__main__":
    pdf_file_path = '/Users/kekunkoya/Desktop/770 Google /Homelessness.pdf'
    
    if not os.path.exists(pdf_file_path):
        print(f"Error: PDF file not found at '{pdf_file_path}'")
        exit()

    try:
        chunks, store, doc_info = process_document(pdf_file_path, chunk_size=800)
        print(f"\nDocument '{doc_info['source']}' processed successfully.")
        print(f"Vector store contains {len(store.documents)} items.")
    except Exception as e:
        print(f"An error occurred during document processing: {e}")

Extracting text from document...
Chunking text into non-overlapping segments...
Created 65 chunks
Generating embeddings for chunks...

Document '/Users/kekunkoya/Desktop/770 Google /Homelessness.pdf' processed successfully.
Vector store contains 65 items.


## RSE Core Algorithm: Computing Chunk Values and Finding Best Segments
Now that we have the necessary functions to process a document and generate embeddings for its chunks, we can implement the core algorithm for RSE. 

In [9]:
import os
import google.generativeai as genai
import numpy as np
from typing import List, Dict, Any

# --- 1. Gemini API Configuration ---
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. Helper Functions (Assumed to be defined and configured for Gemini) ---
def create_embeddings(texts: str or List[str], model: str = "models/embedding-001") -> Any:
    """
    Creates embeddings for the given text(s) using the Gemini API.
    """
    try:
        response = genai.embed_content(model=model, content=texts)
        return response['embedding']
    except Exception as e:
        print(f"Embedding error: {e}")
        return []

class SimpleVectorStore:
    """A placeholder for your vector store class."""
    def __init__(self):
        self.vectors = []
        self.texts = []
        self.metadata = []
    
    def add_item(self, text, embedding, metadata=None):
        self.vectors.append(np.array(embedding))
        self.texts.append(text)
        self.metadata.append(metadata or {})
    
    def search(self, query_embedding, top_k=5):
        if not self.vectors: return []
        
        query_vector = np.array(query_embedding)
        similarities = []
        for i, vector in enumerate(self.vectors):
            if np.linalg.norm(query_vector) == 0 or np.linalg.norm(vector) == 0:
                similarity = 0
            else:
                similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append({"score": similarity, "metadata": self.metadata[i]})
        
        similarities.sort(key=lambda x: x["score"], reverse=True)
        return similarities[:top_k]

# --- 3. The main function (revised for Gemini) ---
def calculate_chunk_values(query: str, chunks: List[str], vector_store: SimpleVectorStore, irrelevant_chunk_penalty: float = 0.2) -> List[float]:
    """
    Calculate chunk values by combining relevance and position.
    
    Args:
        query (str): Query text
        chunks (List[str]): List of document chunks
        vector_store (SimpleVectorStore): Vector store containing the chunks
        irrelevant_chunk_penalty (float): Penalty for irrelevant chunks
        
    Returns:
        List[float]: List of chunk values
    """
    # Create query embedding using the Gemini-compatible function
    query_embedding_list = create_embeddings(query)
    if not query_embedding_list:
        return [0.0] * len(chunks)
    query_embedding = query_embedding_list[0]
    
    # Get all chunks with similarity scores
    num_chunks = len(chunks)
    results = vector_store.search(query_embedding, top_k=num_chunks)
    
    # Create a mapping of chunk_index to relevance score
    relevance_scores = {result["metadata"]["index"]: result["score"] for result in results}
    
    # Calculate chunk values (relevance score minus penalty)
    chunk_values = []
    for i in range(num_chunks):
        score = relevance_scores.get(i, 0.0)
        value = score - irrelevant_chunk_penalty
        chunk_values.append(value)
    
    return chunk_values

# --- 4. Main Logic for a runnable example ---
if __name__ == "__main__":
    # Simulate a vector store with chunks and embeddings
    store = SimpleVectorStore()
    chunks = [
        "Homelessness is a complex social problem. It is a state of not having a place to live.",
        "A lack of affordable housing is a key contributing factor. This is a big problem in many cities.",
        "The sun is the center of our solar system. The earth orbits the sun." # Irrelevant chunk
    ]
    for i, chunk in enumerate(chunks):
        embedding = create_embeddings(chunk)
        if embedding:
            store.add_item(chunk, embedding[0], {"index": i})
    
    query = "What causes homelessness?"
    
    print("Calculating chunk values for the query...")
    values = calculate_chunk_values(query, chunks, store)
    
    print("\nCalculated Chunk Values:")
    for i, value in enumerate(values):
        print(f"Chunk {i} ('{chunks[i][:30]}...') Value: {value:.4f}")

Calculating chunk values for the query...

Calculated Chunk Values:
Chunk 0 ('Homelessness is a complex soci...') Value: 0.8000
Chunk 1 ('A lack of affordable housing i...') Value: 0.8000
Chunk 2 ('The sun is the center of our s...') Value: -1.2000


In [10]:
def find_best_segments(chunk_values, max_segment_length=20, total_max_length=30, min_segment_value=0.2):
    """
    Find the best segments using a variant of the maximum sum subarray algorithm.
    
    Args:
        chunk_values (List[float]): Values for each chunk
        max_segment_length (int): Maximum length of a single segment
        total_max_length (int): Maximum total length across all segments
        min_segment_value (float): Minimum value for a segment to be considered
        
    Returns:
        List[Tuple[int, int]]: List of (start, end) indices for best segments
    """
    print("Finding optimal continuous text segments...")
    
    best_segments = []
    segment_scores = []
    total_included_chunks = 0
    
    # Keep finding segments until we hit our limits
    while total_included_chunks < total_max_length:
        best_score = min_segment_value  # Minimum threshold for a segment
        best_segment = None
        
        # Try each possible starting position
        for start in range(len(chunk_values)):
            # Skip if this start position is already in a selected segment
            if any(start >= s[0] and start < s[1] for s in best_segments):
                continue
                
            # Try each possible segment length
            for length in range(1, min(max_segment_length, len(chunk_values) - start) + 1):
                end = start + length
                
                # Skip if end position is already in a selected segment
                if any(end > s[0] and end <= s[1] for s in best_segments):
                    continue
                
                # Calculate segment value as sum of chunk values
                segment_value = sum(chunk_values[start:end])
                
                # Update best segment if this one is better
                if segment_value > best_score:
                    best_score = segment_value
                    best_segment = (start, end)
        
        # If we found a good segment, add it
        if best_segment:
            best_segments.append(best_segment)
            segment_scores.append(best_score)
            total_included_chunks += best_segment[1] - best_segment[0]
            print(f"Found segment {best_segment} with score {best_score:.4f}")
        else:
            # No more good segments to find
            break
    
    # Sort segments by their starting position for readability
    best_segments = sorted(best_segments, key=lambda x: x[0])
    
    return best_segments, segment_scores

## Reconstructing and Using Segments for RAG

In [11]:
def reconstruct_segments(chunks, best_segments):
    """
    Reconstruct text segments based on chunk indices.
    
    Args:
        chunks (List[str]): List of all document chunks
        best_segments (List[Tuple[int, int]]): List of (start, end) indices for segments
        
    Returns:
        List[str]: List of reconstructed text segments
    """
    reconstructed_segments = []  # Initialize an empty list to store the reconstructed segments
    
    for start, end in best_segments:
        # Join the chunks in this segment to form the complete segment text
        segment_text = " ".join(chunks[start:end])
        # Append the segment text and its range to the reconstructed_segments list
        reconstructed_segments.append({
            "text": segment_text,
            "segment_range": (start, end),
        })
    
    return reconstructed_segments  # Return the list of reconstructed text segments

In [12]:
def format_segments_for_context(segments):
    """
    Format segments into a context string for the LLM.
    
    Args:
        segments (List[Dict]): List of segment dictionaries
        
    Returns:
        str: Formatted context text
    """
    context = []  # Initialize an empty list to store the formatted context
    
    for i, segment in enumerate(segments):
        # Create a header for each segment with its index and chunk range
        segment_header = f"SEGMENT {i+1} (Chunks {segment['segment_range'][0]}-{segment['segment_range'][1]-1}):"
        context.append(segment_header)  # Add the segment header to the context list
        context.append(segment['text'])  # Add the segment text to the context list
        context.append("-" * 80)  # Add a separator line for readability
    
    # Join all elements in the context list with double newlines and return the result
    return "\n\n".join(context)

## Generating Responses with RSE Context

In [13]:
import os
import google.generativeai as genai
from typing import List

# --- 1. Gemini API Configuration ---
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. Define the response generator for Gemini ---
def generate_response(query: str, context: str, model: str = "gemini-1.5-flash") -> str:
    """
    Generate a response based on the query and context using Gemini.

    Args:
        query (str): User query
        context (str): Context text from relevant segments
        model (str): LLM model to use

    Returns:
        str: Generated response
    """
    print("Generating response using relevant segments as context...")
    
    # Define the system prompt to guide the AI's behavior
    system_prompt = """You are a helpful assistant that answers questions based on the provided context.
The context consists of document segments that have been retrieved as relevant to the user's query.
Use the information from these segments to provide a comprehensive and accurate answer.
If the context doesn't contain relevant information to answer the question, say so clearly."""
    
    # Create the user prompt by combining the context and the query
    user_prompt = f"""
Context:
{context}

Question: {query}

Please provide a helpful answer based on the context provided.
"""
    
    try:
        # Pass the system prompt to the GenerativeModel's system_instruction parameter
        gemini_model = genai.GenerativeModel(model, system_instruction=system_prompt)
        
        # Generate the response using the specified model
        response = gemini_model.generate_content(user_prompt)
        
        # Return the generated response content
        return response.text
    except Exception as e:
        print(f"An error occurred during response generation: {e}")
        return "I could not generate a response due to an error."

# --- 3. Main Logic (Re-implemented for a runnable example) ---
if __name__ == "__main__":
    # Simulate a query and context from a previous step
    query = "What are the main causes of homelessness?"
    context = """
    SEGMENT 1:
    Homelessness is a complex social problem with various contributing factors, including economic, social, and personal issues.
    
    SEGMENT 2:
    A key factor is the lack of affordable housing, which disproportionately affects low-income families and individuals.
    """
    
    print("Generating AI response with Gemini...")
    ai_response = generate_response(query, context)
    
    print("\nAI Response:")
    print(ai_response)

Generating AI response with Gemini...
Generating response using relevant segments as context...

AI Response:
Based on the provided text, homelessness is a complex issue stemming from a combination of economic, social, and personal factors.  A significant contributing factor is the lack of affordable housing, particularly impacting low-income individuals and families.



## Complete RSE Pipeline Function

In [14]:
def rag_with_rse(pdf_path, query, chunk_size=800, irrelevant_chunk_penalty=0.2):
    """
    Complete RAG pipeline with Relevant Segment Extraction.
    
    Args:
        pdf_path (str): Path to the document
        query (str): User query
        chunk_size (int): Size of chunks
        irrelevant_chunk_penalty (float): Penalty for irrelevant chunks
        
    Returns:
        Dict: Result with query, segments, and response
    """
    print("\n=== STARTING RAG WITH RELEVANT SEGMENT EXTRACTION ===")
    print(f"Query: {query}")
    
    # Process the document to extract text, chunk it, and create embeddings
    chunks, vector_store, doc_info = process_document(pdf_path, chunk_size)
    
    # Calculate relevance scores and chunk values based on the query
    print("\nCalculating relevance scores and chunk values...")
    chunk_values = calculate_chunk_values(query, chunks, vector_store, irrelevant_chunk_penalty)
    
    # Find the best segments of text based on chunk values
    best_segments, scores = find_best_segments(
        chunk_values, 
        max_segment_length=20, 
        total_max_length=30, 
        min_segment_value=0.2
    )
    
    # Reconstruct text segments from the best chunks
    print("\nReconstructing text segments from chunks...")
    segments = reconstruct_segments(chunks, best_segments)
    
    # Format the segments into a context string for the language model
    context = format_segments_for_context(segments)
    
    # Generate a response from the language model using the context
    response = generate_response(query, context)
    
    # Compile the result into a dictionary
    result = {
        "query": query,
        "segments": segments,
        "response": response
    }
    
    print("\n=== FINAL RESPONSE ===")
    print(response)
    
    return result

## Comparing with Standard Retrieval
Let's implement a standard retrieval approach to compare with RSE:

In [15]:
def standard_top_k_retrieval(pdf_path, query, k=10, chunk_size=800):
    """
    Standard RAG with top-k retrieval.
    
    Args:
        pdf_path (str): Path to the document
        query (str): User query
        k (int): Number of chunks to retrieve
        chunk_size (int): Size of chunks
        
    Returns:
        Dict: Result with query, chunks, and response
    """
    print("\n=== STARTING STANDARD TOP-K RETRIEVAL ===")
    print(f"Query: {query}")
    
    # Process the document to extract text, chunk it, and create embeddings
    chunks, vector_store, doc_info = process_document(pdf_path, chunk_size)
    
    # Create an embedding for the query
    print("Creating query embedding and retrieving chunks...")
    query_embedding = create_embeddings([query])[0]
    
    # Retrieve the top-k most relevant chunks based on the query embedding
    results = vector_store.search(query_embedding, top_k=k)
    retrieved_chunks = [result["document"] for result in results]
    
    # Format the retrieved chunks into a context string
    context = "\n\n".join([
        f"CHUNK {i+1}:\n{chunk}" 
        for i, chunk in enumerate(retrieved_chunks)
    ])
    
    # Generate a response from the language model using the context
    response = generate_response(query, context)
    
    # Compile the result into a dictionary
    result = {
        "query": query,
        "chunks": retrieved_chunks,
        "response": response
    }
    
    print("\n=== FINAL RESPONSE ===")
    print(response)
    
    return result

## Evaluation of RSE

In [16]:
def evaluate_methods(pdf_path, query, reference_answer=None):
    """
    Compare RSE with standard top-k retrieval.
    
    Args:
        pdf_path (str): Path to the document
        query (str): User query
        reference_answer (str, optional): Reference answer for evaluation
    """
    print("\n========= EVALUATION =========\n")
    
    # Run the RAG with Relevant Segment Extraction (RSE) method
    rse_result = rag_with_rse(pdf_path, query)
    
    # Run the standard top-k retrieval method
    standard_result = standard_top_k_retrieval(pdf_path, query)
    
    # If a reference answer is provided, evaluate the responses
    if reference_answer:
        print("\n=== COMPARING RESULTS ===")
        
        # Create an evaluation prompt to compare the responses against the reference answer
        evaluation_prompt = f"""
            Query: {query}

            Reference Answer:
            {reference_answer}

            Response from Standard Retrieval:
            {standard_result["response"]}

            Response from Relevant Segment Extraction:
            {rse_result["response"]}

            Compare these two responses against the reference answer. Which one is:
            1. More accurate and comprehensive
            2. Better at addressing the user's query
            3. Less likely to include irrelevant information

            Explain your reasoning for each point.
        """
        
        print("Evaluating responses against reference answer...")
        
        # Generate the evaluation using the specified model
        evaluation = client.chat.completions.create(
            model="meta-llama/Llama-3.2-3B-Instruct",
            messages=[
                {"role": "system", "content": "You are an objective evaluator of RAG system responses."},
                {"role": "user", "content": evaluation_prompt}
            ]
        )
        
        # Print the evaluation results
        print("\n=== EVALUATION RESULTS ===")
        print(evaluation.choices[0].message.content)
    
    # Return the results of both methods
    return {
        "rse_result": rse_result,
        "standard_result": standard_result
    }

In [17]:
import os
import json
import google.generativeai as genai
from typing import List, Dict, Any

# 0) Initialize Gemini client
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# 1) Load your validation JSON
with open('/Users/kekunkoya/Desktop/ISEM 770 Class Project/valh.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
query = data[0]['question']
reference_answer = data[0]['ideal_answer']
pdf_path = "/Users/kekunkoya/Desktop/ISEM 770 Class Project/Homelessness.pdf"

# 2) Here's the revised evaluate_methods function for Gemini
def evaluate_methods(pdf_path: str, query: str, reference_answer: str, standard_results: List[str], reranked_results: List[str]) -> str:
    """
    Evaluates reranking results using Gemini.
    
    Args:
    pdf_path (str): Path to the document.
    query (str): The user's question.
    reference_answer (str): The ideal answer.
    standard_results (List[str]): Results before reranking.
    reranked_results (List[str]): Results after reranking.
    
    Returns:
    str: A detailed analysis of the results.
    """
    system_prompt = (
        "You are an objective evaluator of RAG system responses. "
        "Provide a detailed analysis of how the two result sets compare against the reference answer, "
        "with specific examples."
    )
    
    comparison_text = (
        f"Based on the RAG pipeline’s answers to:\n"
        f"  • PDF: {pdf_path}\n"
        f"  • Question: {query}\n\n"
        f"And comparing against the reference:\n"
        f"  {reference_answer}\n\n"
        f"Standard Results:\n{standard_results}\n\n"
        f"Reranked Results:\n{reranked_results}"
    )
    
    user_prompt = (
        f"{comparison_text}\n\n"
        "Please compare faithfulness, relevance, and ordering—point out specific strengths and weaknesses."
    )
    
    try:
        gemini_model = genai.GenerativeModel("gemini-1.5-flash", system_instruction=system_prompt)
        resp = gemini_model.generate_content(user_prompt)
        return resp.text
    except Exception as e:
        print(f"An error occurred during evaluation: {e}")
        return "Evaluation failed due to an error."

# 3) Simulate RAG pipeline outputs
standard_results = ["lack of affordable housing", "job loss", "mental health issues"]
llm_results = ["job loss", "lack of affordable housing", "mental health issues"]

# 4) Call the function
results = evaluate_methods(
    pdf_path=pdf_path,
    query=query,
    reference_answer=reference_answer,
    standard_results=standard_results,
    reranked_results=llm_results
)

# 5) Print
print("\n=== EVALUATION RESULTS ===")
print(results)


=== EVALUATION RESULTS ===
The provided RAG system results are completely unfaithful and irrelevant to the reference answer regarding the ETHOS typology.  Neither the standard nor reranked results mention any aspect of the ETHOS typology's definition, structure, or purpose. Instead, they offer a list of common contributing factors to homelessness.

Let's break down the evaluation criteria:

* **Faithfulness:**  The results are entirely unfaithful.  They don't contain any information from the provided PDF about the ETHOS typology.  The reference answer clearly defines the typology as a specific framework with categories and subcategories, while the RAG system responses offer unrelated factors.

* **Relevance:** The results are irrelevant. The question explicitly asks about the ETHOS typology.  While the listed factors (job loss, lack of affordable housing, mental health issues) are related to the broader topic of homelessness, they do not answer the question asked.  They represent a di