## 🧠 What Is RAG and Why Should You Care?

**RAG (Retrieval-Augmented Generation)** is a technique that lets a language model answer from documents you supply instead of relying only on what it learned during training. Let's break it down:

| Component | What It Does | Why It Matters |
|-----------|--------------|----------------|
| **Retrieval** | Finds relevant information from your documents | Ensures answers come from *your* data, not just the AI's training |
| **Augmentation** | Adds the retrieved passages to the prompt the AI receives | Grounds responses in current, source-specific facts |
| **Generation** | Creates human-like responses using the retrieved information | Delivers insights in natural, easy-to-understand language |

<div style="background-color: #effaf5; border: 1px solid #0d9488; padding: 15px; margin: 20px 0; border-radius: 5px;">
<h4 style="color: #000000; margin-top: 0;">💡 Real-World Analogy</h4>
<p style="color: #000000;">Think of RAG as the difference between:</p>
<ul style="color: #000000;">
<li><strong>A general knowledge expert</strong> who studied years ago (standard LLM)</li>
<li><strong>A specialist with your documents open</strong> in front of them, referencing exact paragraphs as they answer your questions (RAG system)</li>
</ul>
</div>

## 🛠️ Our Exciting Toolkit

We'll be using several cutting-edge tools to build our RAG system:

| Tool | What It Is | Why It's Amazing |
|------|------------|------------------|
| **Ollama** | An open-source platform that runs AI models locally on your computer | Privacy (your data never leaves your machine), no API costs, and complete control |
| **ChromaDB** | A specialized database for storing and searching "vector embeddings" | Lightning-fast semantic search that understands meaning, not just keywords |
| **LangChain** | A framework that connects AI components together like building blocks | Makes complex AI workflows simple and customizable |
| **Gradio** | A tool for creating web interfaces for AI models | Turns your code into a professional-looking application in minutes |

## 🎯 What We'll Build Together

By the end of this tutorial, you'll have created:

```
📄 Documents → 🔪 Chunker → 🧮 Vector DB → 🔍 Retriever → 🤖 LLM → 💬 Answer
```

A complete RAG system that can:

1. **Process PDF documents** of your choice
2. **Break them into smart chunks** that preserve meaning
3. **Transform text into vectors** that capture semantic meaning
4. **Store everything efficiently** for lightning-fast retrieval
5. **Find the most relevant information** for any question
6. **Generate accurate, helpful responses** with proper citations

<div style="background-color: #ffe4e6; border-left: 6px solid #be123c; padding: 15px; margin: 20px 0; border-radius: 5px;">
<h3 style="color: #000000; margin-top: 0;">🔥 Why This Matters For Your Career</h3>
<p style="color: #000000;">RAG systems are at the forefront of practical AI applications. At MAIA Academy, we've seen how companies are rapidly adopting this technology to:</p>
<ul style="color: #000000;">
<li>Build intelligent document assistants</li>
<li>Create knowledge bases that actually answer questions</li>
<li>Develop customer support systems that handle complex queries</li>
<li>Implement research tools that synthesize information from multiple sources</li>
</ul>
<p style="color: #000000;">The skills you'll learn today are directly transferable to real-world AI projects and align perfectly with our <strong>Foundations of AI Development</strong> and <strong>Deep Learning & LLMs</strong> modules!</p>
</div>

## 2. Setting the Stage: Configuration

<div style="background-color: #2d333b; padding: 20px; border-radius: 8px; margin-bottom: 20px; border-left: 6px solid #58a6ff;">
  <h3 style="color: #ffffff; margin-top: 0;">System Configuration Parameters</h3>
  <p style="color: #ffffff;">Before we build our RAG system, we need to configure some important settings—like tuning a new instrument before a performance. These parameters will determine how our system processes and interacts with documents.</p>
</div>

<table style="width: 100%; border-collapse: collapse; margin: 20px 0; background-color: #22272e;">
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56; width: 200px;"><strong style="color: #58a6ff;">PERSIST_DIRECTORY</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">Where we'll store our "data safe" (the vector database) on disk. This allows our system to remember what it learned even after restarting.</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">CHUNK_SIZE</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">How big each text piece will be (in characters). This affects how much context the AI has when answering questions.</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">CHUNK_OVERLAP</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">How many characters consecutive chunks share. Overlap carries context across chunk boundaries and reduces the risk of cutting a sentence or idea in two.</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">PDF_URLS</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">The documents we'll use as our knowledge base (our "reference library"). These are the sources the system will learn from.</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">LLM_MODEL</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">The "brain" that processes the context and generates answers (like llama3 or other models available in Ollama).</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">EMBEDDING_MODEL</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">The "translator" that converts text into numerical vectors that capture meaning. Different models balance between speed and accuracy.</td>
  </tr>
</table>
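
To get a feel for how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a rough sketch that estimates how many chunks a document produces. It assumes plain fixed-size windows; the splitter used later in this notebook also respects separators such as paragraph breaks, so real counts will differ somewhat. `estimate_chunk_count` is a hypothetical helper written for this illustration.

```python
import math


def estimate_chunk_count(text_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate how many fixed-size chunks a text of text_length characters yields."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # New characters covered by each extra chunk
    return math.ceil((text_length - chunk_size) / step) + 1


# Same values as the configuration used in this notebook
estimated = estimate_chunk_count(100_000, chunk_size=1000, chunk_overlap=50)
print(f"Roughly {estimated} chunks for a 100,000-character document")
```

More overlap means a smaller step, so the same document yields more chunks to embed and store.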

<div style="background-color: #2d333b; border: 1px solid #444c56; padding: 20px; border-radius: 8px; margin: 20px 0;">
  <h3 style="color: #58a6ff; margin-top: 0;">💡 What's This Chunk Stuff?</h3>
  <p style="color: #adbac7;">Think of cutting a big sandwich. If the pieces are huge, you get more filling but it's hard to bite. If they're tiny, you bite easy but might miss the full flavor. Overlap is like leaving a bit of the last bite on the next one so you don't lose track of the overall taste.</p>
  
  <div style="display: flex; justify-content: space-between; margin-top: 20px; text-align: center;">
    <div style="flex: 1; margin: 0 10px;">
      <p style="color: #58a6ff;"><strong>Large Chunks (2000+)</strong></p>
      <p style="color: #adbac7;">✅ More context<br>✅ Better for complex topics<br>❌ Less precise retrieval<br>❌ Slower processing</p>
    </div>
    <div style="flex: 1; margin: 0 10px;">
      <p style="color: #58a6ff;"><strong>Medium Chunks (800-1200)</strong></p>
      <p style="color: #adbac7;">✅ Balanced approach<br>✅ Good for most cases<br>✅ Reasonable speed<br>✅ Decent precision</p>
    </div>
    <div style="flex: 1; margin: 0 10px;">
      <p style="color: #58a6ff;"><strong>Small Chunks (300-500)</strong></p>
      <p style="color: #adbac7;">✅ Very precise retrieval<br>✅ Fast processing<br>❌ Limited context<br>❌ May miss broader concepts</p>
    </div>
  </div>
</div>
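
To make the sandwich analogy concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification under stated assumptions: `chunk_text` is a hypothetical helper, not part of LangChain, and the `RecursiveCharacterTextSplitter` used later also tries to break on natural separators instead of cutting at exact character positions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is the "bit of the last bite" carried over to the next.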

<div style="text-align: center; background-color: #22272e; padding: 10px; border-radius: 5px; margin-top: 20px;">
  <p style="color: #ff7b72; font-weight: bold;">⚠️ Warning</p>
  <p style="color: #adbac7;">This notebook is designed to be read with a dark background. If you program with a white background, just know that you're a complete psychopath and a danger to society.</p>
</div>

In [None]:
# Install libraries (first run only): !pip install ollama chromadb langchain gradio langchain_ollama langchain_community pypdf

# Standard imports
import os
import logging
import time
import sys
import tempfile
from typing import List, Dict, Any

# LangChain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Gradio for web interface
import gradio as gr

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

In [None]:
# Parameters
PERSIST_DIRECTORY = 'chroma_db' # directory to store the vector database
CHUNK_SIZE = 1000 # characters per chunk for text splitting
CHUNK_OVERLAP = 50 # characters of overlap between chunks
PDF_URLS = [os.path.join('data', f) for f in os.listdir('data') if f.endswith('.pdf')]
LLM_MODEL = 'qwen3:1.7b'
EMBEDDING_MODEL = 'all-minilm:latest'
TEMPERATURE = 0.7  # Increased temperature for more natural variation in responses
MAX_TOKENS = 1024  # Control response length

In [None]:
class RAGSystem:
    def __init__(self, pdf_urls: List[str], persist_directory: str = PERSIST_DIRECTORY):
        self.pdf_urls = pdf_urls
        self.persist_directory = persist_directory
        self.documents = []
        self.vectorstore = None
        self.llm = None
        self.chain = None
        
        # Initialize the LLM with streaming capability and additional parameters
        callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
        self.llm = ChatOllama(
            model=LLM_MODEL,
            temperature=TEMPERATURE,
            callback_manager=callback_manager,
            num_predict=MAX_TOKENS,  # Ollama's name for the response-length limit
            top_p=0.9,      # Nucleus sampling
            top_k=40,       # Top-k sampling
            repeat_penalty=1.2  # Penalize repetition
        )
        
        # Initialize embeddings
        self.embeddings = OllamaEmbeddings(model=EMBEDDING_MODEL)
        
        logger.info(f"Initialized RAG system with {len(pdf_urls)} PDFs")

    def load_documents(self) -> None:
        """Load and split PDF documents"""
        logger.info("Loading and processing PDFs...")
        logger.info(f"Attempting to load {len(self.pdf_urls)} PDFs")
        
        # Text splitter for chunking documents
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=CHUNK_SIZE,
            chunk_overlap=CHUNK_OVERLAP,
            separators=["\n\n", "\n", ". ", " ", ""]
        )
        
        all_pages = []
        successful_loads = 0
        
        for url in self.pdf_urls:
            try:
                logger.info(f"Attempting to load: {url}")
                loader = PyPDFLoader(url)
                pages = loader.load()
                logger.info(f"Successfully loaded {len(pages)} pages from {url}")
                all_pages.extend(pages)
                successful_loads += 1
            except Exception as e:
                logger.error(f"Error loading PDF from {url}: {str(e)}")
                logger.error(f"Exception type: {type(e).__name__}")
        
        logger.info(f"Successfully loaded {successful_loads} out of {len(self.pdf_urls)} PDFs")
        
        # Split the documents into chunks
        self.documents = text_splitter.split_documents(all_pages)
        logger.info(f"Created {len(self.documents)} document chunks")

    def create_vectorstore(self) -> None:
        """Create a fresh vector database"""
        # Remove any existing database
        if os.path.exists(self.persist_directory):
            import shutil
            logger.info(f"Removing existing vectorstore at {self.persist_directory}")
            shutil.rmtree(self.persist_directory, ignore_errors=True)
        
        # Create a new vector store
        logger.info("Creating new vectorstore...")
        self.vectorstore = Chroma.from_documents(
            documents=self.documents,
            embedding=self.embeddings,
            persist_directory=self.persist_directory
        )
        logger.info("Vectorstore creation complete")

    @staticmethod
    def clean_text(text: str) -> str:
        """Enhanced text cleaning function to remove various types of repetitions."""
        import re
        
        # Function to remove consecutive duplicate words
        def remove_word_repetitions(text):
            # Split text into words while preserving punctuation and spacing
            tokens = []
            current_word = []
            
            for char in text:
                if char.isalnum() or char in "-":
                    current_word.append(char)
                else:
                    if current_word:
                        tokens.append(''.join(current_word))
                        current_word = []
                    tokens.append(char)
                    
            if current_word:
                tokens.append(''.join(current_word))
            
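            # Remove consecutive duplicate words (case-insensitive).
            # Words are always separated by at least one non-word token, so
            # compare against the last word kept, as long as only whitespace
            # lies between the two occurrences.
            cleaned = []
            last_word = None
            for token in tokens:
                if token.isspace():
                    cleaned.append(token)
                    continue
                is_word = token.replace('-', '').isalnum()
                if is_word and last_word is not None and token.lower() == last_word.lower():
                    continue  # Drop the repeated word
                cleaned.append(token)
                # Punctuation resets the comparison
                last_word = token if is_word else None
            
            return ''.join(cleaned)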
        
        # Function to fix spacing issues
        def fix_spacing(text):
            # Fix multiple spaces
            text = re.sub(r'\s+', ' ', text)
            # Fix spaces around punctuation
            text = re.sub(r'\s+([.,!?;:])', r'\1', text)
            # Ensure single space after punctuation
            text = re.sub(r'([.,!?;:])(?!\s)', r'\1 ', text)
            return text.strip()
        
        # Function to remove repeated phrases
        def remove_phrase_repetitions(text):
            # Remove repeated phrases of 2-4 words
            for phrase_length in range(2, 5):
                pattern = r'\b(\w+(?:\s+\w+){' + str(phrase_length-1) + r'})\s+\1\b'
                text = re.sub(pattern, r'\1', text, flags=re.IGNORECASE)
            return text
        
        # Apply all cleaning steps
        text = remove_word_repetitions(text)
        text = remove_phrase_repetitions(text)
        text = fix_spacing(text)
        
        return text

    def setup_chain(self) -> None:
        """Set up the RAG chain with improved response processing"""
        # Create the retriever
        if not self.vectorstore:
            self.create_vectorstore()
            
        retriever = self.vectorstore.as_retriever(
            search_type="similarity",
            search_kwargs={"k": 4}
        )
        
        template = """You are a robotics and wearable robots expert. Answer questions about physical dummies in human-exoskeleton interaction research based strictly on the provided documents.

Context from papers:
{context}

Question: {question}

Guidelines:
1. Provide clear, direct answers using technical terminology from the papers
2. Use citations [Author Year] for key points
3. Focus on accuracy and clarity
4. Use natural, varied language - avoid repetition
5. If information isn't in the context, say so directly
6. End with referenced sources

Answer:"""
        
        prompt = PromptTemplate.from_template(template)
        
        # Create the chain with enhanced text processing
        self.chain = (
            {"context": retriever, "question": RunnablePassthrough()}
            | prompt
            | self.llm
            | StrOutputParser()
            | self.clean_text  # Apply enhanced text cleaning
        )
        
        logger.info("RAG chain setup complete")

    def answer_question(self, question: str) -> str:
        """
        Answer a question using the RAG chain
        
        Args:
            question: The question to answer
            
        Returns:
            The answer to the question
        """
        if not self.chain:
            self.setup_chain()
        
        logger.info(f"Answering question: {question}")
        try:
            answer = self.chain.invoke(question)
            return answer
        except Exception as e:
            logger.error(f"Error answering question: {e}")
            return f"Error processing your question: {str(e)}"

In [None]:
def load_documents(self) -> None:
    """Load and split PDF documents"""
    logger.info("Loading and processing PDFs...")
    logger.info(f"Attempting to load {len(self.pdf_urls)} PDFs")
    
    # Text splitter for chunking documents
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP,
        separators=["\n\n", "\n", ". ", " ", ""]
    )
    
    all_pages = []
    successful_loads = 0
    
    for url in self.pdf_urls:
        try:
            logger.info(f"Attempting to load: {url}")
            loader = PyPDFLoader(url)
            pages = loader.load()
            logger.info(f"Successfully loaded {len(pages)} pages from {url}")
            all_pages.extend(pages)
            successful_loads += 1
        except Exception as e:
            logger.error(f"Error loading PDF from {url}: {str(e)}")
            logger.error(f"Exception type: {type(e).__name__}")
    
    logger.info(f"Successfully loaded {successful_loads} out of {len(self.pdf_urls)} PDFs")
    
    # Split the documents into chunks
    self.documents = text_splitter.split_documents(all_pages)
    logger.info(f"Created {len(self.documents)} document chunks")

<div style="background-color: #2d333b; padding: 5px; border-radius: 4px; margin-bottom: 10px;">
  <h3 style="color: #58a6ff; margin: 10px;">3.2 Storing in a Vector Database</h3>
</div>

<div style="background-color: #22272e; padding: 15px; border-radius: 8px; border-left: 4px solid #f0883e; margin-bottom: 20px;">
  <p style="color: #adbac7;">After chunking our documents, we need to store them in a way that allows for intelligent searching. This is where vectors and ChromaDB come into play.</p>
</div>

<div style="background-color: #2d333b; border: 1px solid #444c56; padding: 20px; border-radius: 8px; margin: 20px 0;">
  <h4 style="color: #f0883e; margin-top: 0;">🧮 What Are Vectors?</h4>
  
  <p style="color: #adbac7;">Think of each chunk as a person, and we give it a unique "fingerprint" based on what it says. These fingerprints are actually lists of numbers that capture meaning.</p>
  
  <div style="display: flex; margin-top: 20px; background-color: #22272e; padding: 15px; border-radius: 8px;">
    <div style="flex: 1; padding-right: 15px;">
      <p style="color: #adbac7; font-style: italic; margin-top: 0;">"I like the sun"</p>
      <p style="color: #d2a8ff; font-family: monospace; font-size: 0.9em;">[0.12, -0.33, 0.65, ...]</p>
    </div>
    <div style="flex: 1; padding-left: 15px; border-left: 1px dashed #444c56;">
      <p style="color: #adbac7; font-style: italic; margin-top: 0;">"I love the heat"</p>
      <p style="color: #d2a8ff; font-family: monospace; font-size: 0.9em;">[0.15, -0.28, 0.61, ...]</p>
    </div>
  </div>
  
  <p style="color: #adbac7; margin-top: 20px;">These sentences get similar vector "fingerprints" because they express similar concepts. This lets us search by <strong>meaning</strong>, not just exact words.</p>
</div>
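
One common way to measure how close two "fingerprints" are is cosine similarity. The sketch below is illustrative only: the three-number vectors are toy values (real embeddings have hundreds of dimensions), the `taxes` vector is invented for contrast, and the distance metric your vector database actually uses depends on how it is configured.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (illustrative values, not real model output)
sun = [0.12, -0.33, 0.65]     # "I like the sun"
heat = [0.15, -0.28, 0.61]    # "I love the heat"
taxes = [-0.70, 0.40, 0.05]   # Hypothetical vector for an unrelated sentence

print(f"sun vs heat:  {cosine_similarity(sun, heat):.3f}")
print(f"sun vs taxes: {cosine_similarity(sun, taxes):.3f}")
```

A score near 1 means the vectors point in almost the same direction; a score near 0 or below means the sentences have little in common.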

<div style="display: flex; background-color: #22272e; border-radius: 8px; overflow: hidden; margin: 20px 0; border: 1px solid #444c56;">
  <div style="flex: 1; padding: 15px; display: flex; flex-direction: column; align-items: center; text-align: center; border-right: 1px solid #444c56;">
    <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
      <span style="color: #58a6ff; font-weight: bold; font-size: 1.5em;">1</span>
    </div>
    <p style="color: #f0883e; font-weight: bold; margin: 5px 0;">Convert</p>
    <p style="color: #adbac7; margin: 5px 0;">Text → Vector</p>
  </div>
  <div style="flex: 1; padding: 15px; display: flex; flex-direction: column; align-items: center; text-align: center; border-right: 1px solid #444c56;">
    <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
      <span style="color: #58a6ff; font-weight: bold; font-size: 1.5em;">2</span>
    </div>
    <p style="color: #f0883e; font-weight: bold; margin: 5px 0;">Store</p>
    <p style="color: #adbac7; margin: 5px 0;">In ChromaDB</p>
  </div>
  <div style="flex: 1; padding: 15px; display: flex; flex-direction: column; align-items: center; text-align: center;">
    <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
      <span style="color: #58a6ff; font-weight: bold; font-size: 1.5em;">3</span>
    </div>
    <p style="color: #f0883e; font-weight: bold; margin: 5px 0;">Retrieve</p>
    <p style="color: #adbac7; margin: 5px 0;">By Similarity</p>
  </div>
</div>

<div style="background-color: #22272e; padding: 20px; border-radius: 8px; margin: 20px 0; border: 1px solid #444c56;">
  <h4 style="color: #58a6ff; margin-top: 0;">In Plain English:</h4>
  
  <p style="color: #adbac7;">1. <strong>We transform text into numbers</strong> using the embedding model (all-minilm)</p>
  <p style="color: #adbac7;">2. <strong>We store these numbers in ChromaDB</strong> along with the original text</p>
  <p style="color: #adbac7;">3. <strong>When you ask a question</strong>, we convert your question to a vector too</p>
  <p style="color: #adbac7;">4. <strong>ChromaDB finds chunks with similar vectors</strong> to your question</p>
  <p style="color: #adbac7;">5. <strong>These similar chunks</strong> likely contain the answer you need</p>
</div>
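
The five steps above can be sketched with a tiny in-memory search. Everything here is a stand-in chosen for illustration: `embed` is a toy word-count embedding (a real model such as all-minilm captures meaning, not just shared words), and `top_k` plays the role that ChromaDB plays in this notebook.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str, vocabulary: List[str]) -> List[float]:
    """Toy embedding: count how often each vocabulary word occurs."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity that returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: List[str], vocabulary: List[str], k: int) -> List[Tuple[float, str]]:
    """Score every chunk against the question and keep the k best."""
    question_vector = embed(question, vocabulary)
    scored = [(cosine_similarity(question_vector, embed(c, vocabulary)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "the exoskeleton assists the knee joint",
    "the dummy leg measures interaction forces",
    "gradio builds a web interface",
]
vocabulary = sorted({word for chunk in chunks for word in chunk.split()})
results = top_k("which forces does the dummy measure", chunks, vocabulary, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

Note that the toy embedding treats "measure" and "measures" as unrelated words. Closing that kind of gap is exactly what a learned embedding model is for.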

<div style="display: flex; background-color: #22272e; border-radius: 8px; margin: 20px 0;">
  <div style="flex: 1; padding: 20px;">
    <h5 style="color: #7ee787; margin-top: 0;">💡 Why This Is Cool</h5>
    <ul style="color: #adbac7; list-style-type: none; padding-left: 0;">
      <li style="margin-bottom: 8px;">✅ <strong>Finds similar concepts</strong>, even with different words</li>
      <li style="margin-bottom: 8px;">✅ <strong>Lightning-fast search</strong> of large document collections</li>
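      <li style="margin-bottom: 8px;">✅ <strong>Can work across languages</strong> when the embedding model is multilingual (Spanish "sol" ≈ English "sun")</li>
      <li>✅ <strong>Often more relevant</strong> than keyword search when the question is worded differently from the source</li>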
    </ul>
  </div>
</div>

<div style="background-color: #2d333b; border-left: 4px solid #d2a8ff; padding: 15px; border-radius: 5px; margin-top: 20px;">
  <p style="color: #adbac7; margin: 0;"><strong style="color: #d2a8ff;">🚀 Pro Tip:</strong> Think of it like searching a music library - you find songs that "sound similar" to the one you like, not just songs with the exact same title.</p>
</div>

In [None]:
def create_vectorstore(self) -> None:
    """Create a fresh vector database"""
    # Remove any existing database
    if os.path.exists(self.persist_directory):
        import shutil
        logger.info(f"Removing existing vectorstore at {self.persist_directory}")
        shutil.rmtree(self.persist_directory, ignore_errors=True)
    
    # Create a new vectorstore
    logger.info("Creating new vectorstore...")
    if not self.documents:
        self.load_documents()
    
    # Create a temporary directory for the database
    # This helps avoid permission issues on some systems
    temp_dir = tempfile.mkdtemp()
    logger.info(f"Using temporary directory for initial database creation: {temp_dir}")
    
    try:
        # First create in temp directory
        self.vectorstore = Chroma.from_documents(
            documents=self.documents,
            embedding=self.embeddings,
            persist_directory=temp_dir
        )
        
        # Now create the real directory
        if not os.path.exists(self.persist_directory):
            os.makedirs(self.persist_directory)
            
        # And create the final vectorstore
        self.vectorstore = Chroma.from_documents(
            documents=self.documents,
            embedding=self.embeddings,
            persist_directory=self.persist_directory
        )
        self.vectorstore.persist()
        
        logger.info(f"Vectorstore created successfully with {len(self.documents)} documents")
    except Exception as e:
        logger.error(f"Error creating vectorstore: {e}")
        raise
    finally:
        # Clean up temp directory
        if os.path.exists(temp_dir):
            import shutil
            shutil.rmtree(temp_dir, ignore_errors=True)

<div style="background-color: #2d333b; padding: 5px; border-radius: 4px; margin-bottom: 10px;">
  <h3 style="color: #58a6ff; margin: 10px;">3.3 Building the RAG Chain</h3>
</div>

<div style="background-color: #22272e; padding: 15px; border-radius: 8px; border-left: 4px solid #79c0ff; margin-bottom: 20px;">
  <p style="color: #adbac7;">Here's where our system turns into a "detective." We connect all the components into a sequence that transforms questions into accurate answers.</p>
</div>

<div style="background-color: #1c2128; padding: 20px; border-radius: 8px; margin: 20px 0; border: 1px solid #444c56;">
  <h4 style="color: #79c0ff; margin-top: 0; text-align: center; margin-bottom: 20px;">The RAG Chain Components</h4>
  
  <!-- Retriever Component -->
  <div style="background-color: #22272e; border-radius: 8px; padding: 15px; margin-bottom: 15px; border: 1px solid #444c56;">
    <div style="display: flex; align-items: center;">
      <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-right: 15px;">
        <span style="font-size: 1.5em;">🔍</span>
      </div>
      <div>
        <p style="color: #79c0ff; margin: 0; font-weight: bold;">The Searcher (Retriever)</p>
      </div>
    </div>
    <div style="margin-top: 10px; padding-left: 65px;">
      <p style="color: #adbac7; margin: 0;">Turns your question into a fingerprint and finds the closest matches in the database.</p>
      <div style="background-color: #2d333b; padding: 8px; border-radius: 4px; margin-top: 10px;">
        <code style="color: #d2a8ff; font-size: 0.9em;">retriever = vectorstore.as_retriever(search_kwargs={"k": 4})</code>
      </div>
    </div>
  </div>
  
  <!-- Prompt Template Component -->
  <div style="background-color: #22272e; border-radius: 8px; padding: 15px; margin-bottom: 15px; border: 1px solid #444c56;">
    <div style="display: flex; align-items: center;">
      <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-right: 15px;">
        <span style="font-size: 1.5em;">📝</span>
      </div>
      <div>
        <p style="color: #79c0ff; margin: 0; font-weight: bold;">The Instructions (Prompt)</p>
      </div>
    </div>
    <div style="margin-top: 10px; padding-left: 65px;">
      <p style="color: #adbac7; margin: 0;">Like a recipe: "Be nice, use the chunks, cite your sources." This keeps answers helpful and trustworthy.</p>
      <div style="background-color: #2d333b; padding: 8px; border-radius: 4px; margin-top: 10px;">
        <code style="color: #d2a8ff; font-size: 0.9em;">prompt = PromptTemplate.from_template(template)</code>
      </div>
    </div>
  </div>
  
  <!-- LLM Component -->
  <div style="background-color: #22272e; border-radius: 8px; padding: 15px; margin-bottom: 15px; border: 1px solid #444c56;">
    <div style="display: flex; align-items: center;">
      <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-right: 15px;">
        <span style="font-size: 1.5em;">🧠</span>
      </div>
      <div>
        <p style="color: #79c0ff; margin: 0; font-weight: bold;">The AI (LLM)</p>
      </div>
    </div>
    <div style="margin-top: 10px; padding-left: 65px;">
      <p style="color: #adbac7; margin: 0;">Writes the final response based on the instructions and retrieved chunks.</p>
      <div style="background-color: #2d333b; padding: 8px; border-radius: 4px; margin-top: 10px;">
        <code style="color: #d2a8ff; font-size: 0.9em;">llm = ChatOllama(model=LLM_MODEL, temperature=TEMPERATURE)</code>
      </div>
    </div>
  </div>
  
  <!-- Output Parser Component -->
  <div style="background-color: #22272e; border-radius: 8px; padding: 15px; margin-bottom: 0; border: 1px solid #444c56;">
    <div style="display: flex; align-items: center;">
      <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-right: 15px;">
        <span style="font-size: 1.5em;">✨</span>
      </div>
      <div>
        <p style="color: #79c0ff; margin: 0; font-weight: bold;">The Formatter (Parser)</p>
      </div>
    </div>
    <div style="margin-top: 10px; padding-left: 65px;">
      <p style="color: #adbac7; margin: 0;">Extracts the plain text from the model's reply so the user receives a simple string.</p>
      <div style="background-color: #2d333b; padding: 8px; border-radius: 4px; margin-top: 10px;">
        <code style="color: #d2a8ff; font-size: 0.9em;">StrOutputParser()</code>
      </div>
    </div>
  </div>
</div>

<!-- Flow diagram -->
<div style="background-color: #22272e; padding: 20px; border-radius: 8px; margin: 20px 0;">
  <h4 style="color: #79c0ff; margin-top: 0; text-align: center;">How It All Flows Together</h4>
  
  <div style="display: flex; justify-content: center; align-items: center; flex-wrap: wrap; margin: 20px 0;">
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">❓</div>
      <div style="color: #adbac7;">Question</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">🔍</div>
      <div style="color: #adbac7;">Retriever</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">📝</div>
      <div style="color: #adbac7;">Prompt</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">🧠</div>
      <div style="color: #adbac7;">LLM</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">✨</div>
      <div style="color: #adbac7;">Parser</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">💡</div>
      <div style="color: #adbac7;">Answer</div>
    </div>
  </div>
  
  <div style="background-color: #1c2128; padding: 15px; border-radius: 8px; margin-top: 20px;">
    <p style="color: #adbac7; margin: 0; text-align: center;">This entire chain is created with just a few lines of code:</p>
    <div style="background-color: #2d333b; border-radius: 5px; padding: 15px; margin-top: 10px; font-family: monospace;">
      <pre style="color: #d2a8ff; margin: 0; overflow-x: auto; font-size: 0.9em;">self.chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | self.llm
    | StrOutputParser()
)</pre>
    </div>
  </div>
</div>

<div style="background-color: #2d333b; border-left: 4px solid #58a6ff; padding: 15px; border-radius: 5px; margin-top: 20px;">
  <p style="color: #adbac7; margin: 0;"><strong style="color: #adbac7;">💡 Pro Tip:</strong> The key to a good RAG system is balance. A great prompt template with poor retrieval won't work well, and perfect retrieval with bad instructions will still give bad answers. All pieces need to work together!</p>
</div>
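
To make the flow above concrete, here is a minimal plain-Python sketch of the same four stages. It is not LangChain code: `make_chain`, `fake_retrieve`, and `fake_llm` are hypothetical stand-ins, chosen so the example runs without Ollama or ChromaDB. The point is only to show that the question feeds both the retriever and the prompt, and that each stage hands its output to the next.

```python
from typing import Callable, Dict, List


def make_chain(
    retrieve: Callable[[str], List[str]],
    template: str,
    llm: Callable[[str], str],
) -> Callable[[str], str]:
    """Compose retriever -> prompt -> LLM -> parser into one callable."""

    def chain(question: str) -> str:
        # Stage 1: the same question feeds both entries of the input dict.
        inputs: Dict[str, str] = {
            "context": "\n\n".join(retrieve(question)),
            "question": question,
        }
        # Stage 2: fill the prompt template.
        prompt = template.format(**inputs)
        # Stage 3: call the model. Stage 4: "parse" by stripping whitespace.
        return llm(prompt).strip()

    return chain


# Toy stand-ins (assumptions for illustration only).
def fake_retrieve(question: str) -> List[str]:
    return ["Chunk A about dummies.", "Chunk B about sensors."]


def fake_llm(prompt: str) -> str:
    # Echo the last prompt line so the data flow is visible.
    return "  ANSWER based on: " + prompt.splitlines()[-1] + "  "


chain = make_chain(fake_retrieve, "Context: {context}\nQuestion: {question}", fake_llm)
answer = chain("What is a physical dummy?")
print(answer)
```

In the real chain, LangChain's `|` operator plays the role of `make_chain`, and each stand-in is replaced by the retriever, the prompt template, the Ollama model, and the output parser.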

In [None]:
def setup_chain(self) -> None:
    """Set up the RAG chain for question answering"""
    if not self.vectorstore:
        self.create_vectorstore()
    
    # Create retriever with search parameters
    retriever = self.vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 4}  # Return the 4 most relevant chunks
    )
    
    # Define the prompt template
    template = """
### INSTRUCTIONS:
    You are an expert in robotics and wearable robots. You are going to help me write a state of the art article based on the documents provided, which are focused on the use of physical dummies for the study or improvement of physical human-exoskeleton interaction. These dummies, or surrogates, or mannequins, are robotic replicas of humans or parts of humans. Base your answers strictly on the documents provided. 
    Be polite, professional, and avoid guessing or using outside sources.
 
    (1) Be attentive to details: read the question and the context thoroughly before answering.
    (2) Begin your response with a friendly tone and briefly restate the user's question to confirm understanding.
    (3) If the context allows you to answer the question, write a detailed, helpful, and easy-to-understand response.
        - Use precise terminology from the articles 
        - Reference the sources **inline** (e.g., [Article 1 §3.2], [Article 2 Fig.4]) and ONLY cite sections/figures/tables present in the provided context.
      IF NOT: if you cannot find the answer, respond with an explanation, starting with: 
        "I couldn't find the information in the documents I have access to."
    (4) Below your response, list all referenced sources (document titles/IDs and exact sections/figures/tables that support your claims).
    (5) Review your answer to ensure it answers the question, is helpful and professional, and is formatted for easy reading (short paragraphs, bullets if useful).
    Additional constraints:
    - Do not invent citations or content outside the provided context.
    - If there are conflicting statements in the articles, acknowledge the discrepancy and cite them.
 
    THINK STEP BY STEP
 
    Answer the following question using the provided context.
    ### Question: {question} ###
    ### Context: {context} ###
    ### Helpful Answer with Sources:
    """
    
    prompt = PromptTemplate.from_template(template)
    
    # Create the chain
    self.chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | self.llm
        | StrOutputParser()
    )
    
    logger.info("RAG chain setup complete")
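
The chain above hands the retriever's output directly to `{context}`, so the prompt receives whatever text representation the list of retrieved documents has. One possible refinement, sketched below, is to join the chunks into a single labeled string so the model has something clean to cite. `RetrievedDoc` is a hypothetical stand-in for a LangChain document (which exposes `page_content` and `metadata`), and the `source` and `page` metadata keys are assumptions that depend on the loader used.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a retrieved document chunk."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_docs(docs: List[RetrievedDoc]) -> str:
    """Join retrieved chunks into one context string with source labels."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        # "source" and "page" are assumed metadata keys; fall back if absent.
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "?")
        parts.append(f"[Source {i}: {source}, page {page}]\n{doc.page_content.strip()}")
    return "\n\n".join(parts)


docs = [
    RetrievedDoc("A dummy leg replicates limb stiffness. ", {"source": "paper_a.pdf", "page": 3}),
    RetrievedDoc("Force sensors measure interface pressure."),
]
context = format_docs(docs)
print(context)
```

In LangChain, a function like this is typically piped after the retriever (for example `retriever | format_docs`) in the `"context"` entry of the input dict.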

In [None]:
def answer_question(self, question: str) -> str:
    """
    Answer a question using the RAG chain
    
    Args:
        question: The question to answer
        
    Returns:
        The answer to the question
    """
    if not self.chain:
        self.setup_chain()
    
    logger.info(f"Answering question: {question}")
    try:
        answer = self.chain.invoke(question)
        return answer
    except Exception as e:
        logger.error(f"Error answering question: {e}")
        return f"Error processing your question: {str(e)}"

In [None]:
def create_gradio_interface(rag_system: RAGSystem) -> gr.Blocks:
    """Create an enhanced Gradio interface for the RAG system"""
    
    # Custom CSS for better styling
    custom_css = """
    .container {
        max-width: 900px;
        margin: auto;
    }
    .gr-button-primary {
        background: linear-gradient(45deg, #3b82f6, #0ea5e9) !important;
        border: none !important;
    }
    .gr-button-primary:hover {
        background: linear-gradient(45deg, #2563eb, #0284c7) !important;
    }
    .title-text {
        text-align: center;
        font-weight: 600;
    }
    .example-text {
        font-size: 0.9em;
        font-style: italic;
    }
    .markdown-text {
        font-size: 1.1em;
        line-height: 1.5;
    }
    """

    def process_question(question: str, history: list) -> tuple[list, str]:
        """Enhanced question processing with chat history"""
        if not question.strip():
            return history, ""
        
        try:
            # Get the answer
            answer = rag_system.answer_question(question)
            # Apply cleaning using RAGSystem's static method
            answer = RAGSystem.clean_text(answer)
            
            history = history + [(question, answer)]
            return history, ""  # Clear input after submission
        except Exception as e:
            logger.error(f"Error processing question: {e}")
            return history + [(question, f"Error: {str(e)}")], ""

    with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as interface:
        gr.Markdown(
            """
            # 🤖 Physical Human-Exoskeleton Interaction Research Assistant
            
            Ask questions about physical dummies, human-exoskeleton interaction, testing methods, and related topics.
            All responses are based on academic papers with proper citations.
            """)
        
        chatbot = gr.Chatbot(
            label="Research Discussion",
            height=400,
            show_copy_button=True
        )
        
        question = gr.Textbox(
            placeholder="Ask about physical dummies, testing methods, design approaches...",
            label="Research Question",
            lines=3
        )
        
        submit = gr.Button("Submit Question", variant="primary")
        clear = gr.Button("Clear History")

        with gr.Accordion("Example Questions", open=False):
            gr.Examples(
                examples=[
                    "What are the main applications of physical dummies in exoskeleton testing?",
                    "How are physical surrogates designed for human-robot interaction?",
                    "What measurement methods are used with physical dummies?",
                    "What are the advantages of using dummies over human subjects?",
                    "Which sensors are commonly used in physical dummy testing?"
                ],
                inputs=question,
                label="Click an example to try it"
            )

        gr.Markdown(
            """
            ### About This System
            
            - Specialized in physical dummies, or mannequins, and human-exoskeleton interaction research
            - Provides detailed answers with proper citations from academic papers
            - Uses precise terminology from source documents
            - Covers topics like design approaches, testing methods, and evaluation techniques
            
            *Note: Responses are limited to information present in the loaded research papers.*
            """)

        # Set up event handlers
        submit.click(
            fn=process_question,
            inputs=[question, chatbot],
            outputs=[chatbot, question],
            show_progress=True
        )
        
        # Handle question submission with Enter key
        question.submit(
            fn=process_question,
            inputs=[question, chatbot],
            outputs=[chatbot, question],
            show_progress=True
        )
        
        # Clear chat history
        clear.click(
            lambda: ([], ""),
            outputs=[chatbot, question],
            show_progress=True
        )

    return interface
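
The event handlers above rely on a simple contract: the callback receives the textbox value and the current chat history, and returns the updated history plus an empty string to clear the textbox. The sketch below isolates that contract in plain Python so it can be tested without launching Gradio. `make_handler` and `stub_answer` are hypothetical names, and the list-of-pairs history format is assumed to match the one used in `process_question`.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def make_handler(
    answer_fn: Callable[[str], str],
) -> Callable[[str, History], Tuple[History, str]]:
    """Build a chat callback around any question-answering function."""

    def handler(question: str, history: History) -> Tuple[History, str]:
        # Ignore blank submissions and leave the history untouched.
        if not question.strip():
            return history, ""
        try:
            answer = answer_fn(question)
        except Exception as exc:
            # Surface the failure in the chat instead of crashing the UI.
            answer = f"Error: {exc}"
        # Return a new list rather than mutating the caller's history.
        return history + [(question, answer)], ""

    return handler


def stub_answer(question: str) -> str:
    """Stand-in for rag_system.answer_question (assumption for the demo)."""
    if "fail" in question:
        raise ValueError("model unavailable")
    return f"Echo: {question}"


handler = make_handler(stub_answer)
history, box = handler("What is a dummy?", [])
history, box = handler("   ", history)
history, box = handler("please fail", history)
print(history)
```

Keeping the handler free of Gradio objects makes it easy to unit-test the chat logic before wiring it to `submit.click` and `question.submit`.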

In [None]:
def main() -> None:
    """Main function to run the RAG system and launch the interface"""
    try:
        # Display banner
        print("\n" + "="*60)
        print("🤖 DUMMY state of the art")
        print("   Review paper Document Intelligence Assistant")
        print("="*60)
        
        # Display available models
        print("\n==== CHECKING OLLAMA MODELS ====")
        try:
            import requests
            response = requests.get("http://localhost:11434/api/tags", timeout=5)
            print("Available Ollama models:")
            if response.status_code == 200:
                models = response.json().get("models", [])
                if models:
                    for model in models:
                        print(f"✓ {model['name']}")
                else:
                    print("❌ No models found")
            else:
                print(f"❌ Error checking Ollama models: {response.status_code}")
        except Exception as e:
            print(f"❌ Error connecting to Ollama: {e}")
            print("   Make sure Ollama is running with: ollama serve")
        
        print("\n📋 CONFIGURATION:")
        print(f"   LLM model: {LLM_MODEL}")
        print(f"   Embedding model: {EMBEDDING_MODEL}")
        print(f"   Documents to process: {len(PDF_URLS)} articles")
        print("\n   Make sure these models are available with 'ollama pull' commands.")
        
        # Create and initialize the RAG system
        print("\n==== INITIALIZING RAG SYSTEM ====")
        logger.info("Creating RAG system...")
        rag_system = RAGSystem(pdf_urls=PDF_URLS)
        
        # Load documents and create vectorstore
        print("📚 Loading documents...")
        rag_system.load_documents()
        
        print("🔍 Creating vector embeddings...")
        rag_system.create_vectorstore()
        
        # Test with a control question about physical dummies
        print("\n==== TESTING SYSTEM ====")
        logger.info("Testing with a control question...")
        test_questions = [
            "What is a physical dummy?",
            "What are the advantages of physical dummies for studying human-exoskeleton interaction?",
        ]
        
        # Use the first test question as the control
        test_question = test_questions[0]
        print(f"🧪 Testing with: '{test_question}'")
        test_answer = rag_system.answer_question(test_question)
        
        if test_answer and len(test_answer) > 50:
            logger.info(f"✓ Control answer received (length: {len(test_answer)})")
            print("✓ System test successful - RAG pipeline working correctly")
        else:
            logger.warning(f"⚠️ Short control answer received (length: {len(test_answer)})")
            print("⚠️ System test completed but response seems short")
        
        # Create and launch Gradio interface
        print("\n==== LAUNCHING INTERFACE ====")
        logger.info("Launching Gradio interface...")
        print("🚀 Starting web interface...")
        print("   - Access locally at: http://localhost:7860")
        print("   - Interface optimized for analysis of the state of the art on physical dummies")
        print("   - All responses based on your documents")
        
        # Use our custom interface
        interface = create_gradio_interface(rag_system)
        interface.launch(
            share=False,  # Set share=True to create a public link
            inbrowser=True,  # Automatically open browser
            show_error=True,
            quiet=False
        )
    
    except Exception as e:
        logger.error(f"An error occurred in the main function: {e}")
        print(f"\n\n❌ ERROR: {str(e)}\n\n")
        print("🔧 TROUBLESHOOTING TIPS FOR RAG SYSTEM:")
        print("="*50)
        print("1. 🖥️  OLLAMA SERVICE:")
        print("   - Make sure Ollama is running: 'ollama serve'")
        print("   - Check if Ollama is accessible at http://localhost:11434")
        print()
        print("2. 🧠 REQUIRED MODELS:")
        print(f"   - Pull LLM model: 'ollama pull {LLM_MODEL}'")
        print(f"   - Pull embedding model: 'ollama pull {EMBEDDING_MODEL}'")
        print("   - Recommended general-purpose models: llama3.1:8b or mistral:7b")
        print()
        print("3. 📄 DOCUMENT PROCESSING:")
        print("   - Verify PDF URLs are accessible")
        print("   - Check that the PDF documents are properly formatted")
        print("   - Ensure PDFs contain extractable text")
        print()
        print("4. 🔧 TECHNICAL ISSUES:")
        print("   - If dimension mismatch: try 'nomic-embed-text' embedding model")
        print("   - Check Python packages: pip install -r requirements.txt")
        print("   - Verify Chroma vector database permissions")
        print()
        print("5. 📊 PERFORMANCE OPTIMIZATION:")
        print("   - For more detailed responses, use larger models (13B+)")
        print("   - Increase chunk_size for technical documents")
        print("   - Adjust temperature for more/less conservative answers")
        print("="*50)
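
The model check in `main()` only lists what Ollama reports. A small helper can go one step further and say which required models are missing. The sketch below works on the JSON shape that `main()` already reads from `/api/tags` (a `models` list whose entries have a `name`). `missing_models` is a hypothetical helper, and treating a bare name such as `nomic-embed-text` as satisfied by any tag (for example `nomic-embed-text:latest`) is an assumption.

```python
from typing import Any, Dict, List


def missing_models(tags_payload: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the required model names that are absent from an /api/tags payload."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        # Exact match, or a bare name matched by any tagged variant (assumption).
        if name in installed or any(i.startswith(name + ":") for i in installed):
            continue
        missing.append(name)
    return missing


# Sample payload in the same shape main() reads from the Ollama API.
payload = {"models": [{"name": "llama3.1:8b"}, {"name": "nomic-embed-text:latest"}]}
result = missing_models(payload, ["llama3.1:8b", "nomic-embed-text", "mistral:7b"])
print(result)
```

Each missing name could then be reported with the matching `ollama pull` command before the RAG system is initialized.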

In [None]:
# No need for method assignments since they are now part of the class definition

In [None]:
# Run the system. In both a notebook and a script, __name__ is "__main__",
# so a single guard is enough.
if __name__ == "__main__":
    main()