# Feedback Loop in RAG

In this notebook, I implement a RAG system with a feedback loop mechanism that continuously improves over time. By collecting and incorporating user feedback, our system learns to provide more relevant and higher-quality responses with each interaction.

Traditional RAG systems are static - they retrieve information based solely on embedding similarity. With a feedback loop, we create a dynamic system that:

- Remembers what worked (and what didn't)
- Adjusts document relevance scores over time
- Incorporates successful Q&A pairs into its knowledge base
- Gets smarter with each user interaction

## Setting Up the Environment
We begin by importing necessary libraries.

In [1]:
import fitz # PyMuPDF
import os
import numpy as np
import json
import _datetime
import google.generativeai as genai

In [2]:

import fitz
import os
import google.generativeai as genai
from dotenv import load_dotenv


## Extracting Text from a PDF File
To implement RAG, we first need a source of textual data. In this case, we extract text from a PDF file using the PyMuPDF library.

In [3]:
import fitz
from typing import List, Dict

def extract_text_from_pdf(pdf_path: str) -> str:
    """
    Extracts text from a PDF file using PyMuPDF (fitz).

    Args:
        pdf_path (str): Path to the PDF file.

    Returns:
        str: Extracted text from the PDF, or an empty string if an error occurs.
    """
    all_text = ""
    try:
        # Use a context manager to automatically close the document
        with fitz.open(pdf_path) as mypdf:
            # Iterate through each page to extract text
            for page in mypdf:
                all_text += page.get_text("text") + " "
    except Exception as e:
        print(f"Error reading PDF file: {e}")
        return ""
    
    return all_text

## Chunking the Extracted Text
Once we have the extracted text, we divide it into smaller, overlapping chunks to improve retrieval accuracy.

In [4]:
def chunk_text(text, n, overlap):
    """
    Chunks the given text into segments of n characters with overlap.

    Args:
    text (str): The text to be chunked.
    n (int): The number of characters in each chunk.
    overlap (int): The number of overlapping characters between chunks.

    Returns:
    List[str]: A list of text chunks.
    """
    chunks = []  # Initialize an empty list to store the chunks
    
    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])

    return chunks  # Return the list of text chunks

## Simple Vector Store Implementation
We'll create a basic vector store to manage document chunks and their embeddings.

In [5]:
class SimpleVectorStore:
    """
    A simple vector store implementation using NumPy.
    
    This class provides an in-memory storage and retrieval system for 
    embedding vectors and their corresponding text chunks and metadata.
    It supports basic similarity search functionality using cosine similarity.
    """
    def __init__(self):
        """
        Initialize the vector store with empty lists for vectors, texts, and metadata.
        
        The vector store maintains three parallel lists:
        - vectors: NumPy arrays of embedding vectors
        - texts: Original text chunks corresponding to each vector
        - metadata: Optional metadata dictionaries for each item
        """
        self.vectors = []  # List to store embedding vectors
        self.texts = []    # List to store original text chunks
        self.metadata = [] # List to store metadata for each text chunk
    
    def add_item(self, text, embedding, metadata=None):
        """
        Add an item to the vector store.

        Args:
            text (str): The original text chunk to store.
            embedding (List[float]): The embedding vector representing the text.
            metadata (dict, optional): Additional metadata for the text chunk,
                                      such as source, timestamp, or relevance scores.
        """
        self.vectors.append(np.array(embedding))  # Convert and store the embedding
        self.texts.append(text)                   # Store the original text
        self.metadata.append(metadata or {})      # Store metadata (empty dict if None)
    
    def similarity_search(self, query_embedding, k=5, filter_func=None):
        """
        Find the most similar items to a query embedding using cosine similarity.

        Args:
            query_embedding (List[float]): Query embedding vector to compare against stored vectors.
            k (int): Number of most similar results to return.
            filter_func (callable, optional): Function to filter results based on metadata.
                                             Takes metadata dict as input and returns boolean.

        Returns:
            List[Dict]: Top k most similar items, each containing:
                - text: The original text
                - metadata: Associated metadata
                - similarity: Raw cosine similarity score
                - relevance_score: Either metadata-based relevance or calculated similarity
                
        Note: Returns empty list if no vectors are stored or none pass the filter.
        """
        if not self.vectors:
            return []  # Return empty list if vector store is empty
        
        # Convert query embedding to numpy array for vector operations
        query_vector = np.array(query_embedding)
        
        # Calculate cosine similarity between query and each stored vector
        similarities = []
        for i, vector in enumerate(self.vectors):
            # Skip items that don't pass the filter criteria
            if filter_func and not filter_func(self.metadata[i]):
                continue
                
            # Calculate cosine similarity: dot product / (norm1 * norm2)
            similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append((i, similarity))  # Store index and similarity score
        
        # Sort results by similarity score in descending order
        similarities.sort(key=lambda x: x[1], reverse=True)
        
        # Construct result dictionaries for the top k matches
        results = []
        for i in range(min(k, len(similarities))):
            idx, score = similarities[i]
            results.append({
                "text": self.texts[idx],
                "metadata": self.metadata[idx],
                "similarity": score,
                # Use pre-existing relevance score from metadata if available, otherwise use similarity
                "relevance_score": self.metadata[idx].get("relevance_score", score)
            })
        
        return results

## Creating Embeddings

In [6]:
import os
import google.generativeai as genai
from typing import List, Any
import numpy as np

# --- 1. Gemini API Configuration ---
# Your GOOGLE_API_KEY should be set in your environment
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. Define the create_embeddings function for Gemini ---
def create_embeddings(text: str or List[str], model: str = "models/embedding-001") -> Any:
    """
    Creates embeddings for the given text or list of texts using the Gemini API.

    Args:
    text (str or List[str]): The input text(s) for which embeddings are to be created.
    model (str): The model to be used for creating embeddings. Default is "models/embedding-001".

    Returns:
    List[float] or List[List[float]]: The embedding vector(s).
    """
    try:
        # The Gemini API can handle both single strings and lists of strings
        response = genai.embed_content(
            model=model,
            content=text
        )
        
        # If the input was a single string, the response has a single embedding.
        # If the input was a list, the response is a list of embeddings.
        return response['embedding']

    except Exception as e:
        print(f"An error occurred during embedding: {e}")
        return []

# --- 3. Main Logic (Re-implemented for a runnable example) ---
if __name__ == "__main__":
    # Example 1: Create an embedding for a single string
    single_text = "Homelessness is a complex social issue."
    embedding = create_embeddings(single_text)
    print(f"Embedding for single text (first 5 values): {embedding[:5]}")
    
    # Example 2: Create embeddings for a list of strings
    list_of_texts = [
        "A lack of affordable housing is a key contributing factor.",
        "Social factors also play a role in homelessness."
    ]
    embeddings_list = create_embeddings(list_of_texts)
    print(f"\nNumber of embeddings for list: {len(embeddings_list)}")
    print(f"First embedding in list (first 5 values): {embeddings_list[0][:5]}")

Embedding for single text (first 5 values): [0.052571062, -0.03685706, -0.06520665, -0.04034025, 0.038206574]

Number of embeddings for list: 2
First embedding in list (first 5 values): [0.07521696, -0.034325134, -0.039195377, -0.008227663, 0.10222888]


## Feedback System Functions
Now we'll implement the core feedback system components.

In [7]:
def get_user_feedback(query, response, relevance, quality, comments=""):
    """
    Format user feedback in a dictionary.
    
    Args:
        query (str): User's query
        response (str): System's response
        relevance (int): Relevance score (1-5)
        quality (int): Quality score (1-5)
        comments (str): Optional feedback comments
        
    Returns:
        Dict: Formatted feedback
    """
    return {
        "query": query,
        "response": response,
        "relevance": int(relevance),
        "quality": int(quality),
        "comments": comments,
        "timestamp": datetime.now().isoformat()
    }

In [8]:
def store_feedback(feedback, feedback_file="feedback_data.json"):
    """
    Store feedback in a JSON file.
    
    Args:
        feedback (Dict): Feedback data
        feedback_file (str): Path to feedback file
    """
    with open(feedback_file, "a") as f:
        json.dump(feedback, f)
        f.write("\n")

In [9]:
def load_feedback_data(feedback_file="feedback_data.json"):
    """
    Load feedback data from file.
    
    Args:
        feedback_file (str): Path to feedback file
        
    Returns:
        List[Dict]: List of feedback entries
    """
    feedback_data = []
    try:
        with open(feedback_file, "r") as f:
            for line in f:
                if line.strip():
                    feedback_data.append(json.loads(line.strip()))
    except FileNotFoundError:
        print("No feedback data file found. Starting with empty feedback.")
    
    return feedback_data

## Document Processing with Feedback Awareness

In [10]:
def process_document(pdf_path, chunk_size=1000, chunk_overlap=200):
    """
    Process a document for RAG (Retrieval Augmented Generation) with feedback loop.
    This function handles the complete document processing pipeline:
    1. Text extraction from PDF
    2. Text chunking with overlap
    3. Embedding creation for chunks
    4. Storage in vector database with metadata

    Args:
    pdf_path (str): Path to the PDF file to process.
    chunk_size (int): Size of each text chunk in characters.
    chunk_overlap (int): Number of overlapping characters between consecutive chunks.

    Returns:
    Tuple[List[str], SimpleVectorStore]: A tuple containing:
        - List of document chunks
        - Populated vector store with embeddings and metadata
    """
    # Step 1: Extract raw text content from the PDF document
    print("Extracting text from PDF...")
    extracted_text = extract_text_from_pdf(pdf_path)
    
    # Step 2: Split text into manageable, overlapping chunks for better context preservation
    print("Chunking text...")
    chunks = chunk_text(extracted_text, chunk_size, chunk_overlap)
    print(f"Created {len(chunks)} text chunks")
    
    # Step 3: Generate vector embeddings for each text chunk
    print("Creating embeddings for chunks...")
    chunk_embeddings = create_embeddings(chunks)
    
    # Step 4: Initialize the vector database to store chunks and their embeddings
    store = SimpleVectorStore()
    
    # Step 5: Add each chunk with its embedding to the vector store
    # Include metadata for feedback-based improvements
    for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
        store.add_item(
            text=chunk,
            embedding=embedding,
            metadata={
                "index": i,                # Position in original document
                "source": pdf_path,        # Source document path
                "relevance_score": 1.0,    # Initial relevance score (will be updated with feedback)
                "feedback_count": 0        # Counter for feedback received on this chunk
            }
        )
    
    print(f"Added {len(chunks)} chunks to the vector store")
    return chunks, store

## Relevance Adjustment Based on Feedback

In [11]:
def assess_feedback_relevance(query, doc_text, feedback):
    """
    Use LLM to assess if a past feedback entry is relevant to the current query and document.
    
    This function helps determine which past feedback should influence the current retrieval
    by sending the current query, past query+feedback, and document content to an LLM
    for relevance assessment.
    
    Args:
        query (str): Current user query that needs information retrieval
        doc_text (str): Text content of the document being evaluated
        feedback (Dict): Previous feedback data containing 'query' and 'response' keys
        
    Returns:
        bool: True if the feedback is deemed relevant to current query/document, False otherwise
    """
    # Define system prompt instructing the LLM to make binary relevance judgments only
    system_prompt = """You are an AI system that determines if a past feedback is relevant to a current query and document.
    Answer with ONLY 'yes' or 'no'. Your job is strictly to determine relevance, not to provide explanations."""

    # Construct user prompt with current query, past feedback data, and truncated document content
    user_prompt = f"""
    Current query: {query}
    Past query that received feedback: {feedback['query']}
    Document content: {doc_text[:500]}... [truncated]
    Past response that received feedback: {feedback['response'][:500]}... [truncated]

    Is this past feedback relevant to the current query and document? (yes/no)
    """

    # Call the LLM API with zero temperature for deterministic output
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0  # Use temperature=0 for consistent, deterministic responses
    )
    
    # Extract and normalize the response to determine relevance
    answer = response.choices[0].message.content.strip().lower()
    return 'yes' in answer  # Return True if the answer contains 'yes'

In [12]:
def adjust_relevance_scores(query, results, feedback_data):
    """
    Adjust document relevance scores based on historical feedback to improve retrieval quality.
    
    This function analyzes past user feedback to dynamically adjust the relevance scores of 
    retrieved documents. It identifies feedback that is relevant to the current query context,
    calculates score modifiers based on relevance ratings, and re-ranks the results accordingly.
    
    Args:
        query (str): Current user query
        results (List[Dict]): Retrieved documents with their original similarity scores
        feedback_data (List[Dict]): Historical feedback containing user ratings
        
    Returns:
        List[Dict]: Results with adjusted relevance scores, sorted by the new scores
    """
    # If no feedback data available, return original results unchanged
    if not feedback_data:
        return results
    
    print("Adjusting relevance scores based on feedback history...")
    
    # Process each retrieved document
    for i, result in enumerate(results):
        document_text = result["text"]
        relevant_feedback = []
        
        # Find relevant feedback for this specific document and query combination
        # by querying the LLM to assess relevance of each historical feedback item
        for feedback in feedback_data:
            is_relevant = assess_feedback_relevance(query, document_text, feedback)
            if is_relevant:
                relevant_feedback.append(feedback)
        
        # Apply score adjustments if relevant feedback exists
        if relevant_feedback:
            # Calculate average relevance rating from all applicable feedback entries
            # Feedback relevance is on a 1-5 scale (1=not relevant, 5=highly relevant)
            avg_relevance = sum(f['relevance'] for f in relevant_feedback) / len(relevant_feedback)
            
            # Convert the average relevance to a score modifier in range 0.5-1.5
            # - Scores below 3/5 will reduce the original similarity (modifier < 1.0)
            # - Scores above 3/5 will increase the original similarity (modifier > 1.0)
            modifier = 0.5 + (avg_relevance / 5.0)
            
            # Apply the modifier to the original similarity score
            original_score = result["similarity"]
            adjusted_score = original_score * modifier
            
            # Update the result dictionary with new scores and feedback metadata
            result["original_similarity"] = original_score  # Preserve the original score
            result["similarity"] = adjusted_score           # Update the primary score
            result["relevance_score"] = adjusted_score      # Update the relevance score
            result["feedback_applied"] = True               # Flag that feedback was applied
            result["feedback_count"] = len(relevant_feedback)  # Number of feedback entries used
            
            # Log the adjustment details
            print(f"  Document {i+1}: Adjusted score from {original_score:.4f} to {adjusted_score:.4f} based on {len(relevant_feedback)} feedback(s)")
    
    # Re-sort results by adjusted scores to ensure higher quality matches appear first
    results.sort(key=lambda x: x["similarity"], reverse=True)
    
    return results

## Fine-tuning Our Index with Feedback

In [13]:
def fine_tune_index(current_store, chunks, feedback_data):
    """
    Enhance vector store with high-quality feedback to improve retrieval quality over time.
    
    This function implements a continuous learning process by:
    1. Identifying high-quality feedback (highly rated Q&A pairs)
    2. Creating new retrieval items from successful interactions
    3. Adding these to the vector store with boosted relevance weights
    
    Args:
        current_store (SimpleVectorStore): Current vector store containing original document chunks
        chunks (List[str]): Original document text chunks 
        feedback_data (List[Dict]): Historical user feedback with relevance and quality ratings
        
    Returns:
        SimpleVectorStore: Enhanced vector store containing both original chunks and feedback-derived content
    """
    print("Fine-tuning index with high-quality feedback...")
    
    # Filter for only high-quality responses (both relevance and quality rated 4 or 5)
    # This ensures we only learn from the most successful interactions
    good_feedback = [f for f in feedback_data if f['relevance'] >= 4 and f['quality'] >= 4]
    
    if not good_feedback:
        print("No high-quality feedback found for fine-tuning.")
        return current_store  # Return original store unchanged if no good feedback exists
    
    # Initialize new store that will contain both original and enhanced content
    new_store = SimpleVectorStore()
    
    # First transfer all original document chunks with their existing metadata
    for i in range(len(current_store.texts)):
        new_store.add_item(
            text=current_store.texts[i],
            embedding=current_store.vectors[i],
            metadata=current_store.metadata[i].copy()  # Use copy to prevent reference issues
        )
    
    # Create and add enhanced content from good feedback
    for feedback in good_feedback:
        # Format a new document that combines the question and its high-quality answer
        # This creates retrievable content that directly addresses user queries
        enhanced_text = f"Question: {feedback['query']}\nAnswer: {feedback['response']}"
        
        # Generate embedding vector for this new synthetic document
        embedding = create_embeddings(enhanced_text)
        
        # Add to vector store with special metadata that identifies its origin and importance
        new_store.add_item(
            text=enhanced_text,
            embedding=embedding,
            metadata={
                "type": "feedback_enhanced",  # Mark as derived from feedback
                "query": feedback["query"],   # Store original query for reference
                "relevance_score": 1.2,       # Boost initial relevance to prioritize these items
                "feedback_count": 1,          # Track feedback incorporation
                "original_feedback": feedback # Preserve complete feedback record
            }
        )
        
        print(f"Added enhanced content from feedback: {feedback['query'][:50]}...")
    
    # Log summary statistics about the enhancement
    print(f"Fine-tuned index now has {len(new_store.texts)} items (original: {len(chunks)})")
    return new_store

## Complete RAG Pipeline with Feedback Loop

In [14]:
import os
import google.generativeai as genai
from typing import List

# --- 1. Gemini API Configuration ---
# Your GOOGLE_API_KEY should be set in your environment
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. Define the response generator for Gemini ---
def generate_response(query: str, context: str, model: str = "gemini-1.5-flash") -> str:
    """
    Generate a response based on the query and context using Gemini.

    Args:
        query (str): User query
        context (str): Context text from retrieved documents
        model (str): LLM model to use

    Returns:
        str: Generated response
    """
    # Define the system prompt to guide the AI's behavior
    system_prompt = """You are a helpful AI assistant. Answer the user's question based only on the provided context. If you cannot find the answer in the context, state that you don't have enough information."""
    
    # Create the user prompt by combining the context and the query
    user_prompt = f"""
Context:
{context}

Question: {query}

Please provide a comprehensive answer based only on the context above.
"""
    
    try:
        # Pass the system prompt to the GenerativeModel's system_instruction parameter
        gemini_model = genai.GenerativeModel(model, system_instruction=system_prompt)
        
        # Generate the response using the specified model
        response = gemini_model.generate_content(user_prompt)
        
        # Return the generated response content
        return response.text
    except Exception as e:
        print(f"An error occurred during response generation: {e}")
        return "I could not generate a response due to an error."

# --- 3. Main Logic (Re-implemented for a runnable example) ---
if __name__ == "__main__":
    # Simulate a query and context from a previous step
    query = "What are the main causes of homelessness?"
    context = "Homelessness is a complex social problem. A key factor is the lack of affordable housing, which disproportionately affects low-income families and individuals."
    
    print("Generating AI response with Gemini...")
    ai_response = generate_response(query, context)
    
    print("\nAI Response:")
    print(ai_response)

Generating AI response with Gemini...

AI Response:
Based on the provided text, a key factor contributing to homelessness is the lack of affordable housing.  This disproportionately impacts low-income families and individuals.



In [15]:


def create_embeddings(text: str, model: str = "models/embedding-001") -> Any:
    """Creates an embedding for text using the Gemini API."""
    try:
        response = genai.embed_content(model=model, content=text)
        return response['embedding']
    except Exception as e:
        print(f"Embedding error: {e}")
        return []

def generate_response(query: str, context: str, model: str = "gemini-1.5-flash") -> str:
    """Generate a response based on the query and context using Gemini."""
    system_prompt = "You are a helpful AI assistant. Answer the user's question based only on the provided context. If the context doesn't contain the answer, say so clearly."
    user_prompt = f"Context:\n{context}\n\nQuestion: {query}"
    
    try:
        gemini_model = genai.GenerativeModel(model, system_instruction=system_prompt)
        response = gemini_model.generate_content(user_prompt)
        return response.text
    except Exception as e:
        print(f"An error occurred during response generation: {e}")
        return "I could not generate a response due to an error."

class SimpleVectorStore:
    """Placeholder vector store class for a runnable example."""
    def __init__(self):
        self.vectors = []
        self.texts = []
        self.metadata = []

    def add_item(self, text, embedding, metadata=None):
        self.vectors.append(np.array(embedding))
        self.texts.append(text)
        self.metadata.append(metadata or {})
    
    def similarity_search(self, query_embedding, k=5):
        query_vector = np.array(query_embedding)
        similarities = []
        for i, vector in enumerate(self.vectors):
            if np.linalg.norm(query_vector) == 0 or np.linalg.norm(vector) == 0:
                similarity = 0
            else:
                similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append({"text": self.texts[i], "score": similarity, "metadata": self.metadata[i]})
        similarities.sort(key=lambda x: x["score"], reverse=True)
        return similarities[:k]

def adjust_relevance_scores(query, results, feedback_data):
    """
    Placeholder function to adjust relevance scores.
    In a real scenario, this would use feedback data to boost or penalize scores.
    """
    return results

# --- 2. Your original `rag_with_feedback_loop` function (revised) ---
def rag_with_feedback_loop(query: str, vector_store: SimpleVectorStore, feedback_data: List[Dict], k: int = 5, model: str = "gemini-1.5-flash") -> Dict:
    """
    Complete RAG pipeline incorporating feedback loop.
    """
    print(f"\n=== Processing query with feedback-enhanced RAG ===")
    print(f"Query: {query}")
    
    query_embedding = create_embeddings(query)
    if not query_embedding:
        return {"error": "Failed to create query embedding."}
    
    results = vector_store.similarity_search(query_embedding, k=k)
    
    adjusted_results = adjust_relevance_scores(query, results, feedback_data)
    
    retrieved_texts = [result["text"] for result in adjusted_results]
    
    context = "\n\n---\n\n".join(retrieved_texts)
    
    print("Generating response...")
    response = generate_response(query, context, model)
    
    result = {
        "query": query,
        "retrieved_documents": adjusted_results,
        "response": response
    }
    
    print("\n=== Response ===")
    print(response)
    
    return result

# --- 3. Main Logic for a runnable example ---
if __name__ == "__main__":
    # Simulate a vector store with chunks and embeddings
    store = SimpleVectorStore()
    docs = [
        "Homelessness is a complex social problem with economic, social, and personal factors.",
        "A key factor is the lack of affordable housing, which disproportionately affects low-income families.",
        "The sun is the star at the center of our solar system."
    ]
    for i, doc in enumerate(docs):
        embedding = create_embeddings(doc)
        if embedding:
            store.add_item(doc, embedding, {"id": i})

    # Simulate feedback data
    feedback = [
        {"query": "affordable housing", "doc_id": 1, "relevance_feedback": "positive"},
        {"query": "sun", "doc_id": 2, "relevance_feedback": "negative"}
    ]
    
    user_query = "What causes homelessness?"
    rag_with_feedback_loop(user_query, store, feedback)


=== Processing query with feedback-enhanced RAG ===
Query: What causes homelessness?
Generating response...

=== Response ===
Based on the provided text, a key factor in homelessness is the lack of affordable housing, which disproportionately affects low-income families.  The text also states that homelessness is a complex problem with economic, social, and personal factors, but it does not elaborate on those factors.



## Complete Workflow: From Initial Setup to Feedback Collection

In [16]:
def full_rag_workflow(pdf_path, query, feedback_data=None, feedback_file="valh.json", fine_tune=False):
    """
    Execute a complete RAG workflow with feedback integration for continuous improvement.
    
    This function orchestrates the entire Retrieval-Augmented Generation process:
    1. Load historical feedback data
    2. Process and chunk the document
    3. Optionally fine-tune the vector index with prior feedback
    4. Perform retrieval and generation with feedback-adjusted relevance scores
    5. Collect new user feedback for future improvement
    6. Store feedback to enable system learning over time
    
    Args:
        pdf_path (str): Path to the PDF document to be processed
        query (str): User's natural language query
        feedback_data (List[Dict], optional): Pre-loaded feedback data, loads from file if None
        feedback_file (str): Path to the JSON file storing feedback history
        fine_tune (bool): Whether to enhance the index with successful past Q&A pairs
        
    Returns:
        Dict: Results containing the response and retrieval metadata
    """
    # Step 1: Load historical feedback for relevance adjustment if not explicitly provided
    if feedback_data is None:
        feedback_data = load_feedback_data(feedback_file)
        print(f"Loaded {len(feedback_data)} feedback entries from {feedback_file}")
    
    # Step 2: Process document through extraction, chunking and embedding pipeline
    chunks, vector_store = process_document(pdf_path)
    
    # Step 3: Fine-tune the vector index by incorporating high-quality past interactions
    # This creates enhanced retrievable content from successful Q&A pairs
    if fine_tune and feedback_data:
        vector_store = fine_tune_index(vector_store, chunks, feedback_data)
    
    # Step 4: Execute core RAG with feedback-aware retrieval
    # Note: This depends on the rag_with_feedback_loop function which should be defined elsewhere
    result = rag_with_feedback_loop(query, vector_store, feedback_data)
    
    # Step 5: Collect user feedback to improve future performance
    print("\n=== Would you like to provide feedback on this response? ===")
    print("Rate relevance (1-5, with 5 being most relevant):")
    relevance = input()
    
    print("Rate quality (1-5, with 5 being highest quality):")
    quality = input()
    
    print("Any comments? (optional, press Enter to skip)")
    comments = input()
    
    # Step 6: Format feedback into structured data
    feedback = get_user_feedback(
        query=query,
        response=result["response"],
        relevance=int(relevance),
        quality=int(quality),
        comments=comments
    )
    
    # Step 7: Persist feedback to enable continuous system learning
    store_feedback(feedback, feedback_file)
    print("Feedback recorded. Thank you!")
    
    return result

In [None]:
import os
import json
import fitz
import numpy as np
import google.generativeai as genai
from typing import List, Dict, Tuple, Any

# --- 1. Gemini API Configuration and Helper Functions ---
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

def create_embeddings(texts: str or List[str], model: str = "models/embedding-001") -> Any:
    """Creates embeddings for text using the Gemini API."""
    try:
        response = genai.embed_content(model=model, content=texts)
        return response['embedding']
    except Exception as e:
        print(f"Embedding error: {e}")
        return []

class SimpleVectorStore:
    def __init__(self):
        self.vectors = []
        self.texts = []
        self.metadata = []
    def add_item(self, text, embedding, metadata=None):
        self.vectors.append(np.array(embedding))
        self.texts.append(text)
        self.metadata.append(metadata or {})
    def similarity_search(self, query_embedding, k=5):
        query_vector = np.array(query_embedding)
        similarities = []
        for i, vector in enumerate(self.vectors):
            similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append({"text": self.texts[i], "score": similarity, "metadata": self.metadata[i]})
        similarities.sort(key=lambda x: x["score"], reverse=True)
        return similarities[:k]

def load_feedback_data(feedback_file: str) -> List[Dict]:
    """Loads feedback data from a JSON file."""
    if os.path.exists(feedback_file):
        with open(feedback_file, 'r', encoding='utf-8') as f:
            return json.load(f)
    return []

def store_feedback(feedback: Dict, feedback_file: str):
    """Stores a new feedback entry to a JSON file."""
    feedback_data = load_feedback_data(feedback_file)
    feedback_data.append(feedback)
    with open(feedback_file, 'w', encoding='utf-8') as f:
        json.dump(feedback_data, f, indent=2)

def process_document(pdf_path: str) -> Tuple[List[str], SimpleVectorStore]:
    """A placeholder for your PDF processing and vector store creation."""
    chunks = [
        "Homelessness is a complex social problem. It has multiple causes.",
        "A lack of affordable housing is a major contributing factor.",
        "Past successful interventions involved providing supportive services.",
        "User feedback is crucial for improving RAG systems over time."
    ]
    vector_store = SimpleVectorStore()
    for i, chunk in enumerate(chunks):
        embedding = create_embeddings(chunk)
        if embedding:
            vector_store.add_item(chunk, embedding, metadata={"index": i})
    return chunks, vector_store

def fine_tune_index(vector_store, chunks, feedback_data):
    """A placeholder for fine-tuning the index based on feedback."""
    print("Fine-tuning index with feedback data...")
    return vector_store

def rag_with_feedback_loop(query, vector_store, feedback_data, k=5, model="gemini-1.5-flash"):
    """
    A placeholder RAG pipeline using Gemini and a simulated feedback adjustment.
    """
    query_embedding = create_embeddings(query)
    results = vector_store.similarity_search(query_embedding, k=k)
    retrieved_texts = [result["text"] for result in results]
    context = "\n\n---\n\n".join(retrieved_texts)
    
    # Generate response using Gemini
    system_prompt = "You are a helpful assistant. Answer the user's question based only on the provided context."
    user_prompt = f"Context:\n{context}\n\nQuestion: {query}"
    try:
        gemini_model = genai.GenerativeModel(model, system_instruction=system_prompt)
        response = gemini_model.generate_content(user_prompt)
        return {"response": response.text, "retrieved_documents": results}
    except Exception as e:
        print(f"Response generation error: {e}")
        return {"response": "Error", "retrieved_documents": results}

def get_user_feedback(query, response, relevance, quality, comments):
    """Placeholder to format feedback."""
    return {"query": query, "response": response, "relevance": relevance, "quality": quality, "comments": comments}

# --- 2. Your original `full_rag_workflow` function ---
def full_rag_workflow(pdf_path, query, feedback_data=None, feedback_file="feedback_data.json", fine_tune=False):
    """
    Execute a complete RAG workflow with feedback integration for continuous improvement.
    """
    print("=== Starting Full RAG Workflow ===")
    
    # Step 1: Load historical feedback data
    if feedback_data is None:
        feedback_data = load_feedback_data(feedback_file)
        print(f"Loaded {len(feedback_data)} feedback entries from {feedback_file}")
    
    # Step 2: Process document
    chunks, vector_store = process_document(pdf_path)
    
    # Step 3: Fine-tune the vector index
    if fine_tune and feedback_data:
        vector_store = fine_tune_index(vector_store, chunks, feedback_data)
    
    # Step 4: Execute core RAG with feedback-aware retrieval
    result = rag_with_feedback_loop(query, vector_store, feedback_data)
    
    # Step 5: Collect user feedback
    print("\n=== Would you like to provide feedback on this response? ===")
    relevance = input("Rate relevance (1-5, with 5 being most relevant): ")
    quality = input("Rate quality (1-5, with 5 being highest quality): ")
    comments = input("Any comments? (optional, press Enter to skip): ")
    
    # Step 6: Format feedback into structured data
    feedback = get_user_feedback(
        query=query,
        response=result["response"],
        relevance=int(relevance),
        quality=int(quality),
        comments=comments
    )
    
    # Step 7: Persist feedback
    
    store_feedback(feedback, feedback_file)
    print("Feedback recorded. Thank you!")
    
    return result

# --- 3. Main Logic for a runnable example ---
if __name__ == "__main__":
    pdf_file_path = "/Users/kekunkoya/Desktop/770 Google /Homelessness.pdf"
    user_query = "What are the key factors of homelessness?"
    
    full_rag_workflow(pdf_file_path, user_query)

=== Starting Full RAG Workflow ===
Loaded 0 feedback entries from feedback_data.json

=== Would you like to provide feedback on this response? ===


## Evaluating Our Feedback Loop

In [1]:
def evaluate_feedback_loop(pdf_path, test_queries, reference_answers=None):
    """
    Evaluate the impact of feedback loop on RAG quality by comparing performance before and after feedback integration.
    
    This function runs a controlled experiment to measure how incorporating feedback affects retrieval and generation:
    1. First round: Run all test queries with no feedback
    2. Generate synthetic feedback based on reference answers (if provided)
    3. Second round: Run the same queries with feedback-enhanced retrieval
    4. Compare results between rounds to quantify feedback impact
    
    Args:
        pdf_path (str): Path to the PDF document used as the knowledge base
        test_queries (List[str]): List of test queries to evaluate system performance
        reference_answers (List[str], optional): Reference/gold standard answers for evaluation
                                                and synthetic feedback generation
        
    Returns:
        Dict: Evaluation results containing:
            - round1_results: Results without feedback
            - round2_results: Results with feedback
            - comparison: Quantitative comparison metrics between rounds
    """
    print("=== Evaluating Feedback Loop Impact ===")
    
    # Create a temporary feedback file for this evaluation session only
    temp_feedback_file = "temp_evaluation_feedback.json"
    
    # Initialize feedback collection (empty at the start)
    feedback_data = []
    
    # ----------------------- FIRST EVALUATION ROUND -----------------------
    # Run all queries without any feedback influence to establish baseline performance
    print("\n=== ROUND 1: NO FEEDBACK ===")
    round1_results = []
    
    for i, query in enumerate(test_queries):
        print(f"\nQuery {i+1}: {query}")
        
        # Process document to create initial vector store
        chunks, vector_store = process_document(pdf_path)
        
        # Execute RAG without feedback influence (empty feedback list)
        result = rag_with_feedback_loop(query, vector_store, [])
        round1_results.append(result)
        
        # Generate synthetic feedback if reference answers are available
        # This simulates user feedback for training the system
        if reference_answers and i < len(reference_answers):
            # Calculate synthetic feedback scores based on similarity to reference answer
            similarity_to_ref = calculate_similarity(result["response"], reference_answers[i])
            # Convert similarity (0-1) to rating scale (1-5)
            relevance = max(1, min(5, int(similarity_to_ref * 5)))
            quality = max(1, min(5, int(similarity_to_ref * 5)))
            
            # Create structured feedback entry
            feedback = get_user_feedback(
                query=query,
                response=result["response"],
                relevance=relevance,
                quality=quality,
                comments=f"Synthetic feedback based on reference similarity: {similarity_to_ref:.2f}"
            )
            
            # Add to in-memory collection and persist to temporary file
            feedback_data.append(feedback)
            store_feedback(feedback, temp_feedback_file)
    
    # ----------------------- SECOND EVALUATION ROUND -----------------------
    # Run the same queries with feedback incorporation to measure improvement
    print("\n=== ROUND 2: WITH FEEDBACK ===")
    round2_results = []
    
    # Process document and enhance with feedback-derived content
    chunks, vector_store = process_document(pdf_path)
    vector_store = fine_tune_index(vector_store, chunks, feedback_data)
    
    for i, query in enumerate(test_queries):
        print(f"\nQuery {i+1}: {query}")
        
        # Execute RAG with feedback influence
        result = rag_with_feedback_loop(query, vector_store, feedback_data)
        round2_results.append(result)
    
    # ----------------------- RESULTS ANALYSIS -----------------------
    # Compare performance metrics between the two rounds
    comparison = compare_results(test_queries, round1_results, round2_results, reference_answers)
    
    # Clean up temporary evaluation artifacts
    if os.path.exists(temp_feedback_file):
        os.remove(temp_feedback_file)
    
    return {
        "round1_results": round1_results,
        "round2_results": round2_results,
        "comparison": comparison
    }

## Helper Functions for Evaluation

In [2]:
def calculate_similarity(text1, text2):
    """
    Calculate semantic similarity between two texts using embeddings.
    
    Args:
        text1 (str): First text
        text2 (str): Second text
        
    Returns:
        float: Similarity score between 0 and 1
    """
    # Generate embeddings for both texts
    embedding1 = create_embeddings(text1)
    embedding2 = create_embeddings(text2)
    
    # Convert embeddings to numpy arrays
    vec1 = np.array(embedding1)
    vec2 = np.array(embedding2)
    
    # Calculate cosine similarity between the two vectors
    similarity = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
    
    return similarity

In [3]:
def compare_results(queries, round1_results, round2_results, reference_answers=None):
    """
    Compare results from two rounds of RAG.
    
    Args:
        queries (List[str]): Test queries
        round1_results (List[Dict]): Results from round 1
        round2_results (List[Dict]): Results from round 2
        reference_answers (List[str], optional): Reference answers
        
    Returns:
        str: Comparison analysis
    """
    print("\n=== COMPARING RESULTS ===")
    
    # System prompt to guide the AI's evaluation behavior
    system_prompt = """You are an expert evaluator of RAG systems. Compare responses from two versions:
        1. Standard RAG: No feedback used
        2. Feedback-enhanced RAG: Uses a feedback loop to improve retrieval

        Analyze which version provides better responses in terms of:
        - Relevance to the query
        - Accuracy of information
        - Completeness
        - Clarity and conciseness
    """

    comparisons = []
    
    # Iterate over each query and its corresponding results from both rounds
    for i, (query, r1, r2) in enumerate(zip(queries, round1_results, round2_results)):
        # Create a prompt for comparing the responses
        comparison_prompt = f"""
        Query: {query}

        Standard RAG Response:
        {r1["response"]}

        Feedback-enhanced RAG Response:
        {r2["response"]}
        """

        # Include reference answer if available
        if reference_answers and i < len(reference_answers):
            comparison_prompt += f"""
            Reference Answer:
            {reference_answers[i]}
            """

        comparison_prompt += """
        Compare these responses and explain which one is better and why.
        Focus specifically on how the feedback loop has (or hasn't) improved the response quality.
        """

        # Call the OpenAI API to generate a comparison analysis
        response = client.chat.completions.create(
            model="meta-llama/Llama-3.2-3B-Instruct",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": comparison_prompt}
            ],
            temperature=0
        )
        
        # Append the comparison analysis to the results
        comparisons.append({
            "query": query,
            "analysis": response.choices[0].message.content
        })
        
        # Print a snippet of the analysis for each query
        print(f"\nQuery {i+1}: {query}")
        print(f"Analysis: {response.choices[0].message.content[:200]}...")
    
    return comparisons

## Evaluation of the feedback loop (Custom Validation Queries)

In [7]:
import os
import google.generativeai as genai
import fitz
import numpy as np
import json
from typing import List, Tuple, Dict, Any

# --- Gemini API Configuration ---
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- Helper functions (must be defined before being called) ---
def extract_text_from_pdf(pdf_path: str) -> str:
    all_text = []
    try:
        with fitz.open(pdf_path) as doc:
            for page in doc:
                all_text.append(page.get_text("text"))
    except Exception as e:
        print(f"Error reading PDF: {e}")
        return ""
    return "\n".join(all_text)

def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    chunks = []
    step = chunk_size - overlap
    for i in range(0, len(text), step):
        chunks.append(text[i:i + chunk_size])
    return chunks

def create_embeddings(texts: str or List[str], model: str = "models/embedding-001") -> Any:
    try:
        response = genai.embed_content(model=model, content=texts)
        return response['embedding']
    except Exception as e:
        print(f"Embedding error: {e}")
        return []

class SimpleVectorStore:
    def __init__(self):
        self.vectors = []
        self.texts = []
        self.metadata = []
    def add_item(self, text, embedding, metadata=None):
        self.vectors.append(np.array(embedding))
        self.texts.append(text)
        self.metadata.append(metadata or {})
    def similarity_search(self, query_embedding, k=5):
        query_vector = np.array(query_embedding)
        similarities = []
        for i, vector in enumerate(self.vectors):
            similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append({"text": self.texts[i], "score": similarity, "metadata": self.metadata[i]})
        similarities.sort(key=lambda x: x["score"], reverse=True)
        return similarities[:k]

def generate_response(query: str, context: str, model: str = "gemini-1.5-flash") -> str:
    system_prompt = "You are a helpful assistant. Answer based only on the provided context."
    user_prompt = f"Context:\n{context}\n\nQuestion: {query}"
    try:
        gemini_model = genai.GenerativeModel(model, system_instruction=system_prompt)
        response = gemini_model.generate_content(user_prompt)
        return response.text
    except Exception as e:
        print(f"Response generation error: {e}")
        return "Error"

def rag_with_feedback_loop(query: str, vector_store: SimpleVectorStore, feedback_data: List[Dict]) -> Dict:
    query_embedding = create_embeddings(query)
    results = vector_store.similarity_search(query_embedding, k=5)
    retrieved_texts = [result["text"] for result in results]
    context = "\n\n---\n\n".join(retrieved_texts)
    response = generate_response(query, context)
    return {"response": response}

def process_document(pdf_path: str, chunk_size: int = 800) -> Tuple[List[str], SimpleVectorStore]:
    text = extract_text_from_pdf(pdf_path)
    chunks = chunk_text(text, chunk_size, 0)
    chunk_embeddings = create_embeddings(chunks)
    vector_store = SimpleVectorStore()
    for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
        vector_store.add_item(chunk, embedding, metadata={"id": i})
    return chunks, vector_store

def evaluate_feedback_loop(pdf_path: str, test_queries: List[str], reference_answers: List[str]) -> List[Dict]:
    evaluation_results = []
    print("Processing document to create initial vector store...")
    chunks, vector_store = process_document(pdf_path)
    
    for i, query in enumerate(test_queries):
        print(f"\nQuery {i+1}: {query}")
        result = rag_with_feedback_loop(query, vector_store, [])
        evaluation_results.append({
            "query": query,
            "response": result["response"],
            "reference_answer": reference_answers[i] if i < len(reference_answers) else "N/A"
        })
    return evaluation_results

# --- Main Logic ---
pdf_path = "/Users/kekunkoya/Desktop/ISEM 770 Class Project/Homelessness.pdf"
test_queries = ["Why is a single number inadequate for understanding homelessness?"]
reference_answers = ["Section “How many homeless people are there?” and discussion of stock, flow, and prevalence figures"]

# Run the evaluation
evaluation_results = evaluate_feedback_loop(
    pdf_path=pdf_path,
    test_queries=test_queries,
    reference_answers=reference_answers
)

print("\nEvaluation complete.")

Processing document to create initial vector store...

Query 1: Why is a single number inadequate for understanding homelessness?

Evaluation complete.


## Visualizing Feedback Impact

In [12]:
import os
import google.generativeai as genai
from typing import List, Dict, Any
import fitz # For the PDF extraction stub

# --- Gemini API Configuration ---
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- Helper functions (revised for Gemini-compatibility) ---
def extract_text_from_pdf(pdf_path: str) -> str:
    """Extracts text from a PDF file."""
    all_text = []
    try:
        with fitz.open(pdf_path) as doc:
            for page in doc:
                all_text.append(page.get_text("text"))
    except Exception as e:
        print(f"Error reading PDF: {e}")
        return ""
    return "\n".join(all_text)

def compress_text(full_text: str, method: str) -> str:
    """Stub for text compression."""
    length = len(full_text)
    if method == "selective": return full_text[:500]
    elif method == "summary": return full_text[max(0, (length - 500) // 2):max(0, (length - 500) // 2)+500]
    elif method == "extraction": return full_text[-500:]
    return full_text

def run_rag_pipeline(context: str, query: str) -> str:
    """
    Stub for RAG pipeline that returns a dummy answer.
    """
    return f"Dummy RAG answer for query: '{query}' based on provided context."

# --- The main evaluation function (revised for Gemini) ---
def evaluate_compression(pdf_path: str, query: str, reference_answer: str, compression_types: list[str]) -> Dict[str, str]:
    """
    Applies each compression method, runs RAG, and evaluates against a reference using Gemini.
    """
    results = {}
    # Bug fix: Use the pdf_path argument instead of a hardcoded path
    full_text = extract_text_from_pdf(pdf_path)

    for ctype in compression_types:
        compressed_ctx = compress_text(full_text, method=ctype)
        rag_answer = run_rag_pipeline(compressed_ctx, query)

        system_prompt = "You are an objective evaluator of RAG outputs."
        user_prompt = (
            f"Compression type: {ctype}\n\n"
            f"Question: {query}\n\n"
            f"Context:\n{compressed_ctx}\n\n"
            f"RAG Answer: {rag_answer}\n\n"
            f"Reference Answer: {reference_answer}\n\n"
            "Evaluate for faithfulness and relevance. Provide details."
        )

        try:
            # Create a Gemini model instance with the system prompt
            resp = genai.GenerativeModel(
                "gemini-1.5-flash",
                system_instruction=system_prompt
            ).generate_content(user_prompt)
            results[ctype] = resp.text
        except Exception as e:
            print(f"An error occurred during evaluation for {ctype}: {e}")
            results[ctype] = "Evaluation failed due to an error."

    return results

# --- Main Logic for a runnable example ---
if __name__ == "__main__":
    pdf_path = "/Users/kekunkoya/Desktop/ISEM 770 Class Project/Homelessness.pdf"
    if not os.path.isfile(pdf_path):
        print(f"Error: PDF file not found at '{pdf_path}'")
        exit()

    query = "What are typical policy objectives in European homelessness strategies?"
    reference_answer = """
Section Setting concrete targets
listing of subsidized housing in the newspapers
prevention reduction of homelessness through predictive AI
supportive programs through workplace, hospitals and churches
"""
    compression_types = ["selective", "summary", "extraction"]

    results = evaluate_compression(
        pdf_path=pdf_path,
        query=query,
        reference_answer=reference_answer,
        compression_types=compression_types
    )

    for ctype, eval_text in results.items():
        print(f"\n--- {ctype.upper()} EVALUATION ---\n{eval_text}")


--- SELECTIVE EVALUATION ---
The RAG answer is completely unfaithful and irrelevant to the provided context.  The context is an abstract discussing the standardization of homelessness definitions across Europe, not the policy objectives themselves.  The RAG answer fabricates policy objectives ("subsidized housing in newspapers," "predictive AI for prevention," etc.) that are not mentioned or implied in the abstract.

**Faithfulness:**  0/5 - The answer hallucinates information entirely unrelated to the provided text.

**Relevance:** 0/5 - The answer does not address the question based on the provided context.  The context offers no information on policy objectives, and the answer does not attempt to connect to the limited information provided.

The provided "Reference Answer" is also a fabrication and should not be used as a benchmark for evaluating the RAG response.  A proper reference answer would acknowledge that the provided text does not contain information on policy objectives a

In [13]:
import os
import json
import random # Used for simulating data

# Simulate the output of a Gemini-powered RAG pipeline
# In a real scenario, this dictionary would be generated by your main evaluation function
evaluation_results = {
    'round1_results': [
        {'response': 'Homelessness is a complex social problem with various contributing factors.'},
        {'response': 'Affordable housing is a key factor.'}
    ],
    'round2_results': [
        {'response': 'Homelessness is a complex social problem caused by a lack of affordable housing and job loss.'},
        {'response': 'A lack of affordable housing and economic issues are key factors.'}
    ],
    'comparison': [
        {
            'query': 'What are the main causes of homelessness?',
            'analysis': "The RAG pipeline in round 2 provided a more complete answer by including economic issues, showing the impact of the feedback loop."
        }
    ]
}

# Add some dummy responses to simulate a real scenario
avg_len_round1 = sum(len(r['response']) for r in evaluation_results['round1_results']) / len(evaluation_results['round1_results'])
avg_len_round2 = sum(len(r['response']) for r in evaluation_results['round2_results']) / len(evaluation_results['round2_results'])
evaluation_results['round1_results'][0]['response_length'] = len(evaluation_results['round1_results'][0]['response'])
evaluation_results['round1_results'][1]['response_length'] = len(evaluation_results['round1_results'][1]['response'])
evaluation_results['round2_results'][0]['response_length'] = len(evaluation_results['round2_results'][0]['response'])
evaluation_results['round2_results'][1]['response_length'] = len(evaluation_results['round2_results'][1]['response'])


# Extract the comparison data which contains the analysis of feedback impact
comparisons = evaluation_results['comparison']

# Print out the analysis results to visualize feedback impact
print("\n=== FEEDBACK IMPACT ANALYSIS ===\n")
for i, comparison in enumerate(comparisons):
    print(f"Query {i+1}: {comparison['query']}")
    print(f"\nAnalysis of feedback impact:")
    print(comparison['analysis'])
    print("\n" + "-"*50 + "\n")

# Additionally, we can compare some metrics between rounds
round_responses = [evaluation_results[f'round{round_num}_results'] for round_num in range(1, len(evaluation_results) - 1)]
response_lengths = [[len(r["response"]) for r in round] for round in round_responses]

print("\nResponse length comparison (proxy for completeness):")
avg_lengths = [sum(lengths) / len(lengths) for lengths in response_lengths]
for round_num, avg_len in enumerate(avg_lengths, start=1):
    print(f"Round {round_num}: {avg_len:.1f} chars")

if len(avg_lengths) > 1:
    changes = [(avg_lengths[i] - avg_lengths[i-1]) / avg_lengths[i-1] * 100 for i in range(1, len(avg_lengths))]
    for round_num, change in enumerate(changes, start=2):
        print(f"Change from Round {round_num-1} to Round {round_num}: {change:.1f}%")


=== FEEDBACK IMPACT ANALYSIS ===

Query 1: What are the main causes of homelessness?

Analysis of feedback impact:
The RAG pipeline in round 2 provided a more complete answer by including economic issues, showing the impact of the feedback loop.

--------------------------------------------------


Response length comparison (proxy for completeness):
Round 1: 55.0 chars


In [14]:
import os
import google.generativeai as genai
from typing import Dict, Any

# --- 1. Gemini API Configuration ---
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. The main assessment function (revised for Gemini) ---
def assess_feedback_relevance(query: str, doc_text: str, feedback: dict) -> bool:
    """
    Returns True if the given feedback entry is relevant to the current query and document excerpt.
    Handles different feedback field names gracefully.
    """
    # Safely pull out the feedback text, trying multiple possible keys
    feedback_content = (
        feedback.get("feedback_text")
        or feedback.get("feedback")
        or feedback.get("comment")
        or str(feedback)
    )

    system_prompt = "You are an objective evaluator. Answer 'yes' or 'no' only."
    user_prompt = (
        f"Current query: {query}\n"
        f"Document excerpt (first 200 chars): {doc_text[:200]}...\n"
        f"Past feedback: {feedback_content}\n"
        "Is this past feedback relevant to the current query and document? (yes/no)"
    )

    try:
        # Create a Gemini model instance with the system prompt
        gemini_model = genai.GenerativeModel("gemini-1.5-flash", system_instruction=system_prompt)
        
        # Generate the response
        resp = gemini_model.generate_content(user_prompt)
        answer = resp.text.strip().lower()
        return answer.startswith("yes")
    except Exception as e:
        print(f"An error occurred during relevance assessment: {e}")
        return False

# --- 3. Main Logic (Re-implemented for a runnable example) ---
if __name__ == "__main__":
    # Simulate a query, document excerpt, and feedback
    sample_query = "What are the main causes of homelessness?"
    sample_doc_text = "Homelessness is a complex social problem with various contributing factors, including economic, social, and personal issues. A key factor is the lack of affordable housing, which disproportionately affects low-income families and individuals."
    
    # Simulate relevant feedback
    relevant_feedback = {"comment": "The response should mention economic factors more clearly."}
    
    # Simulate irrelevant feedback
    irrelevant_feedback = {"feedback": "The AI provided a good summary of the solar system."}

    print("Assessing relevant feedback...")
    is_relevant_1 = assess_feedback_relevance(sample_query, sample_doc_text, relevant_feedback)
    print(f"Is feedback 1 relevant? {is_relevant_1}")
    
    print("\nAssessing irrelevant feedback...")
    is_relevant_2 = assess_feedback_relevance(sample_query, sample_doc_text, irrelevant_feedback)
    print(f"Is feedback 2 relevant? {is_relevant_2}")

Assessing relevant feedback...
Is feedback 1 relevant? True

Assessing irrelevant feedback...
Is feedback 2 relevant? False


In [16]:
import os
import google.generativeai as genai
from typing import List

# --- 1. Gemini API Configuration ---
# Your GOOGLE_API_KEY should be set in your environment
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. Define the response generator for Gemini ---
def generate_response(query: str, context: str, model: str = "gemini-1.5-flash") -> str:
    """
    Generate a response based on the query and context using Gemini.

    Args:
        query (str): User query
        context (str): Context text from retrieved documents
        model (str): LLM model to use

    Returns:
        str: Generated response
    """
    # Define the system prompt to guide the AI's behavior
    system_prompt = """You are a helpful AI assistant. Answer the user's question based only on the provided context. If you cannot find the answer in the context, state that you don't have enough information."""
    
    # Create the user prompt by combining the context and the query
    user_prompt = f"""
Context:
{context}

Question: {query}

Please provide a comprehensive answer based only on the context above.
"""
    
    try:
        # Pass the system prompt to the GenerativeModel's system_instruction parameter
        gemini_model = genai.GenerativeModel(model, system_instruction=system_prompt)
        
        # Generate the response using the specified model
        response = gemini_model.generate_content(user_prompt)
        
        # Return the generated response content
        return response.text
    except Exception as e:
        print(f"An error occurred during response generation: {e}")
        return "I could not generate a response due to an error."

# --- 3. Main Logic (Re-implemented for a runnable example) ---
if __name__ == "__main__":
    # Simulate a query and context from a previous step
    query = "What are the main causes of homelessness?"
    context = "Homelessness is a complex social problem. A key factor is the lack of affordable housing, which disproportionately affects low-income families and individuals."
    
    print("Generating AI response with Gemini...")
    ai_response = generate_response(query, context)
    
    print("\nAI Response:")
    print(ai_response)

Generating AI response with Gemini...

AI Response:
Based on the provided text, a key factor contributing to homelessness is the lack of affordable housing.  This disproportionately impacts low-income families and individuals.



In [17]:
import os
import google.generativeai as genai
from typing import List, Dict, Any, Tuple

# --- 1. Gemini API Configuration ---
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
genai.configure(api_key=GOOGLE_API_KEY)

# --- 2. Define the comparison function for Gemini ---
def compare_results(queries: List[str], round1_results: List[Dict], round2_results: List[Dict], reference_answers: List[str]) -> List[Dict]:
    """
    Compares two rounds of RAG outputs using a Gemini LLM.
    
    Args:
    queries (List[str]): List of user queries.
    round1_results (List[Dict]): Results from the first RAG round.
    round2_results (List[Dict]): Results from the second RAG round.
    reference_answers (List[str]): Ideal answers for each query.
    
    Returns:
    List[Dict]: List of comparison analyses.
    """
    comparisons = []
    
    system_prompt = "You are an objective analyst comparing two rounds of RAG outputs."
    
    try:
        # Create a Gemini model instance with the system prompt
        gemini_model = genai.GenerativeModel("gemini-1.5-flash", system_instruction=system_prompt)
    except Exception as e:
        print(f"Failed to initialize Gemini model: {e}")
        return []
    
    for query, r1, r2, ref in zip(queries, round1_results, round2_results, reference_answers):
        comparison_prompt = (
            f"Query: {query}\n\n"
            f"Round 1 Answer: {r1['response']}\n\n"
            f"Round 2 Answer: {r2['response']}\n\n"
            f"Reference Answer: {ref}\n\n"
            "Compare these responses and explain which one is better and why. "
            "Focus specifically on how the feedback loop has (or hasn't) improved response quality."
        )
        
        # Generate the comparison using the Gemini API
        try:
            resp = gemini_model.generate_content(comparison_prompt)
            comparisons.append({
                "query": query,
                "analysis": resp.text
            })
        except Exception as e:
            print(f"An error occurred during comparison for query '{query}': {e}")
            comparisons.append({
                "query": query,
                "analysis": "Comparison failed due to an error."
            })
            
    return comparisons

# --- 3. Main Logic (Re-implemented for a runnable example) ---
if __name__ == "__main__":
    # Simulate a set of queries and RAG results for two rounds
    queries = ["What are the main causes of homelessness?"]
    reference_answers = ["A lack of affordable housing, economic issues, and personal crises."]
    
    round1_results = [{"response": "Homelessness is a social problem. A lack of affordable housing is a key factor."}]
    round2_results = [{"response": "Homelessness is caused by a lack of affordable housing, economic issues, and personal crises like job loss."}]
    
    print("Comparing RAG results for two rounds with Gemini...")
    comparison_analysis = compare_results(queries, round1_results, round2_results, reference_answers)
    
    for comp in comparison_analysis:
        print(f"\n--- Analysis for Query: {comp['query']} ---")
        print(comp['analysis'])

Comparing RAG results for two rounds with Gemini...

--- Analysis for Query: What are the main causes of homelessness? ---
Round 2 is significantly better than Round 1.  The feedback loop has demonstrably improved the response quality.

**Round 1's shortcomings:**

* **Vague and lacks detail:**  "Homelessness is a social problem" is a tautology and offers no actionable insight.  The mention of affordable housing is correct but lacks the specificity and breadth of the reference answer.

* **Doesn't fully answer the query:** The response is too simplistic and doesn't address the multiple contributing factors.

**Round 2's improvements:**

* **More comprehensive and accurate:**  It correctly identifies multiple key causes of homelessness: lack of affordable housing, economic issues, and personal crises (which encompasses job loss).  This aligns directly with the reference answer's key points.

* **More specific and informative:**  It moves beyond general statements to provide more concret