# Lab 4.5.1: Complete RAG Demo with Gradio

**Module:** 4.5 - Demo Building & Prototyping  
**Time:** 3 hours  
**Difficulty:** ⭐⭐⭐ (Intermediate)

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Build complex multi-tab interfaces with Gradio Blocks API
- [ ] Implement document upload with indexing progress indicators
- [ ] Create a polished chat interface with source citations
- [ ] Add settings management with persistent state
- [ ] Deploy your demo to Hugging Face Spaces

---

## 📚 Prerequisites

- Completed: Module 3.5 (RAG Systems)
- Knowledge of: Python, basic Gradio, vector databases
- Installed: `gradio`, `chromadb`, `ollama` (plus `pypdf` for PDF uploads), and a running Ollama server with the models used below pulled

---

## 🌍 Real-World Context

You've built an amazing RAG system that can answer questions from documents. But how do you:
- Show it to your boss who doesn't know Python?
- Demo it to potential investors?
- Let beta testers try it without installing anything?

**The answer: A polished web demo!**

Products like [ChatPDF](https://www.chatpdf.com/), [Consensus](https://consensus.app/), and [Humata](https://www.humata.ai/) are built on essentially what we're building today: a nice interface around RAG.

---

## 🧒 ELI5: Why Do We Need Demos?

> **Imagine you baked the most delicious cake in the world** üéÇ
>
> But you only show people the recipe and a photo. They can't taste it, smell it, or see how moist it is when you cut it. Would they believe it's amazing?
>
> A demo is like inviting people to actually taste your cake. They click a button, ask a question, and *experience* your AI. That's 100x more convincing than any slide deck!
>
> **In AI terms:** Your Jupyter notebooks are the recipe. The trained model is the cake. Gradio/Streamlit is serving it on a beautiful plate so everyone can taste it.

---

## Part 1: Understanding the Gradio Blocks API

### The Evolution of Gradio Interfaces

Gradio has two main APIs:

1. **`gr.Interface`** - Quick and simple: wraps one function and its inputs/outputs in a ready-made layout
2. **`gr.Blocks`** - Full control, multiple components, custom layouts

Think of `Interface` as a microwave meal and `Blocks` as a full kitchen. Today we're cooking from scratch!

### 🧒 ELI5: Interface vs Blocks

> **Interface is like a vending machine** - Put money in, get snack out. Simple!
>
> **Blocks is like a restaurant kitchen** - You decide where the stove goes, which pans to use, and exactly how to plate the food.

In [None]:
# First, let's install and import what we need
# Run this once to install dependencies (quotes stop the shell from reading ">" as a redirect)
# !pip install "gradio>=4.0.0" "chromadb>=0.4.0" "ollama>=0.1.0" "pypdf>=4.0.0"

import gradio as gr
import chromadb
import ollama
import os
import time
from typing import List, Tuple, Dict, Optional
from pathlib import Path

print(f"Gradio version: {gr.__version__}")
print(f"ChromaDB version: {chromadb.__version__}")

### Your First Blocks Interface

Let's start with the simplest Blocks example to understand the pattern:

In [None]:
# The simplest Blocks example
with gr.Blocks() as simple_demo:
    # Everything inside the 'with' block becomes part of the UI
    gr.Markdown("# Hello Gradio Blocks!")
    
    # Create components
    name_input = gr.Textbox(label="Your Name")
    greeting_output = gr.Textbox(label="Greeting")
    greet_button = gr.Button("Greet Me!")
    
    # Define the function
    def greet(name):
        return f"Hello, {name}! Welcome to Gradio Blocks!"
    
    # Connect button click to function
    greet_button.click(
        fn=greet,            # Function to run
        inputs=[name_input],  # Input components
        outputs=[greeting_output]  # Output components
    )

# Launch (in Jupyter, this creates an embedded interface)
simple_demo.launch(share=False, inline=True)

### 🔍 What Just Happened?

1. `with gr.Blocks() as demo:` - Creates a blank canvas
2. Components are added in order (top to bottom)
3. `.click()` connects a button to a function
4. Inputs/outputs are lists of components
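To make the wiring concrete, here is a toy, pure-Python model of what `.click(fn=..., inputs=[...], outputs=[...])` does conceptually. It reads each input component's current value, passes the values to `fn` positionally, and writes the return value(s) into the output components in order. `ToyComponent` and `ToyButton` are hypothetical names used only for illustration. This is not how Gradio is implemented internally: Gradio runs handlers on a server and syncs values with the browser.

```python
from typing import Any, Callable, List


class ToyComponent:
    """Hypothetical stand-in for a Gradio component that just holds a value."""

    def __init__(self, value: Any = None):
        self.value = value


class ToyButton:
    """Hypothetical stand-in for gr.Button that records click handlers."""

    def __init__(self):
        self._handlers = []

    def click(self, fn: Callable, inputs: List[ToyComponent], outputs: List[ToyComponent]):
        # Register the handler, mirroring the fn/inputs/outputs shape of Gradio's API
        self._handlers.append((fn, inputs, outputs))

    def simulate_click(self):
        for fn, inputs, outputs in self._handlers:
            # 1. Read current input values, in list order
            args = [component.value for component in inputs]
            # 2. Call the function with those values as positional arguments
            result = fn(*args)
            # 3. A single output receives the value directly; multiple outputs expect a tuple
            results = (result,) if len(outputs) == 1 else tuple(result)
            for component, value in zip(outputs, results):
                component.value = value


name_input = ToyComponent("Ada")
greeting_output = ToyComponent()
greet_button = ToyButton()
greet_button.click(
    fn=lambda name: f"Hello, {name}! Welcome to Gradio Blocks!",
    inputs=[name_input],
    outputs=[greeting_output],
)
greet_button.simulate_click()
print(greeting_output.value)  # Hello, Ada! Welcome to Gradio Blocks!
```

The positional mapping explains why the order of `inputs` must match the order of the function's parameters.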

### ✋ Try It Yourself #1

Modify the greeting to include the current time.

<details>
<summary>üí° Hint</summary>
Import `datetime` and add `datetime.now().strftime("%H:%M")` to the greeting string.
</details>

In [None]:
# Your code here - add time to the greeting
from datetime import datetime

# TODO: Create a Blocks interface that shows "Hello, {name}! It's {time}."


---

## Part 2: Layout with Rows, Columns, and Tabs

Real apps need structured layouts. Gradio provides:
- `gr.Row()` - Components side by side
- `gr.Column()` - Components stacked vertically (with width control)
- `gr.Tabs()` / `gr.TabItem()` - Tabbed interfaces
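A rough mental model for `scale` is that columns in a row split the available width in proportion to their `scale` values. Under that model, `scale=2` next to `scale=1` gets about two thirds of the row. The helper below (`column_widths` is a hypothetical name) does only that arithmetic. Real Gradio layouts also honor each column's `min_width` and wrap on narrow screens, so treat the numbers as an idealized approximation.

```python
from typing import List


def column_widths(scales: List[int], row_width_px: float) -> List[float]:
    """Split a row's width across columns in proportion to their scale values.

    Idealized: ignores gaps, padding, and min_width constraints.
    """
    total = sum(scales)
    if total <= 0:
        raise ValueError("At least one column needs a positive scale")
    return [row_width_px * s / total for s in scales]


print(column_widths([2, 1], 1200))     # [800.0, 400.0]
print(column_widths([1, 2, 1], 1200))  # [300.0, 600.0, 300.0]
```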

### 🧒 ELI5: Layout Components

> Think of building with LEGO:
> - **Row** = A flat LEGO baseplate where you line things up horizontally
> - **Column** = A tower of bricks going up
> - **Tabs** = A LEGO house with different rooms you can visit one at a time

In [None]:
# Layout demonstration
with gr.Blocks(theme=gr.themes.Soft()) as layout_demo:
    gr.Markdown("# Layout Examples")
    
    with gr.Tabs():
        # Tab 1: Rows and Columns
        with gr.TabItem("üìä Row & Column Demo"):
            with gr.Row():
                # Column with scale=2 is twice as wide as scale=1
                with gr.Column(scale=2):
                    gr.Markdown("### Large Column (scale=2)")
                    large_text = gr.Textbox(label="Wide input", lines=3)
                
                with gr.Column(scale=1):
                    gr.Markdown("### Small Column (scale=1)")
                    small_text = gr.Textbox(label="Narrow input")
                    go_button = gr.Button("Go!", variant="primary")
        
        # Tab 2: Nested layouts
        with gr.TabItem("üî≤ Nested Layout"):
            with gr.Row():
                with gr.Column():
                    gr.Markdown("**Left Panel**")
                    with gr.Row():  # Nested row!
                        btn1 = gr.Button("A")
                        btn2 = gr.Button("B")
                    with gr.Row():
                        btn3 = gr.Button("C")
                        btn4 = gr.Button("D")
                
                with gr.Column():
                    gr.Markdown("**Right Panel**")
                    output = gr.Textbox(label="Output", lines=4)
        
        # Tab 3: Accordion (collapsible)
        with gr.TabItem("üìÅ Accordion Demo"):
            gr.Markdown("Accordions hide complexity until needed.")
            
            with gr.Accordion("⚙️ Advanced Settings", open=False):
                temperature = gr.Slider(0, 1, 0.7, label="Temperature")
                top_p = gr.Slider(0, 1, 0.9, label="Top-P")
                max_tokens = gr.Number(value=512, label="Max Tokens")
            
            with gr.Accordion("üìñ Help & Documentation", open=False):
                gr.Markdown("""
                **Temperature**: Controls randomness. Lower = more deterministic.
                
                **Top-P**: Nucleus sampling threshold.
                
                **Max Tokens**: Maximum response length.
                """)

layout_demo.launch(inline=True)

### ✋ Try It Yourself #2

Create a 3-column layout where:
- Left column (scale=1): File upload
- Middle column (scale=2): Main content area
- Right column (scale=1): Settings

<details>
<summary>üí° Hint</summary>
Use `gr.Row()` with three `gr.Column(scale=...)` blocks inside.
</details>

In [None]:
# Your code here - create a 3-column layout


---

## Part 3: Building the RAG Backend

Before we build the UI, we need the RAG logic. Let's create a simple but functional backend.

### 🧒 ELI5: RAG Backend

> Imagine a librarian who:
> 1. **Indexes books** - Reads every book and remembers what topics are where
> 2. **Finds relevant pages** - When you ask a question, finds the right pages
> 3. **Answers questions** - Reads those pages and gives you an answer
>
> That's exactly what our RAG system does with your documents!
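The three librarian steps map directly onto code. The sketch below is a deliberately tiny, dependency-free stand-in. It uses word-count vectors instead of a real embedding model and a Python list instead of ChromaDB. It also "answers" by returning the best-matching chunk rather than calling an LLM. Names like `embed` and `retrieve` are hypothetical. The real backend below uses Ollama embeddings, ChromaDB, and an LLM in their place.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# 1. Index: embed every chunk once and keep (vector, text) pairs
chunks = [
    "The DGX Spark has 128GB of unified memory.",
    "Gradio Blocks lets you build custom layouts.",
    "ChromaDB stores embeddings for similarity search.",
]
index: List[Tuple[Counter, str]] = [(embed(c), c) for c in chunks]


# 2. Find: rank chunks by similarity to the question
def retrieve(question: str, k: int = 1) -> List[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine_similarity(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]


# 3. Answer: a real system would pass the retrieved text to an LLM as context
print(retrieve("How much unified memory does the DGX Spark have?"))
```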

In [None]:
class RAGBackend:
    """
    A simple RAG backend for the demo.
    
    This is intentionally simple - in production, you'd use
    more sophisticated chunking, embedding models, etc.
    """
    
    def __init__(self, collection_name: str = "demo_docs"):
        """Initialize the RAG backend with ChromaDB."""
        # In-memory client for the demo: documents are lost when the kernel restarts.
        # Use chromadb.PersistentClient(path=...) to keep them on disk in real apps.
        self.client = chromadb.Client()
        
        # Get or create collection
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"}
        )
        
        # Model settings (can be changed in UI)
        self.llm_model = "llama3.2:3b"  # Default to smaller model
        self.embed_model = "qwen3-embedding:8b"
        self.n_results = 3
        self.temperature = 0.7
        
        print(f"RAG Backend initialized with collection: {collection_name}")
    
    def chunk_text(self, text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
        """
        Split text into overlapping chunks.
        
        Args:
            text: The full text to chunk
            chunk_size: Target size of each chunk in characters
            overlap: Overlap between consecutive chunks
            
        Returns:
            List of text chunks
        """
        chunks = []
        start = 0
        
        while start < len(text):
            end = start + chunk_size
            chunk = text[start:end]
            
            # Try to break at sentence boundary
            if end < len(text):
                last_period = chunk.rfind('.')
                if last_period > chunk_size // 2:
                    chunk = chunk[:last_period + 1]
                    end = start + last_period + 1
            
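            chunks.append(chunk.strip())
            
            # Stop once this chunk reaches the end of the text; stepping back by
            # `overlap` again would only re-emit the tail as a duplicate fragment
            if end >= len(text):
                break
            start = end - overlap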
        
        return [c for c in chunks if len(c) > 50]  # Filter tiny chunks
    
    def index_document(self, text: str, filename: str) -> Tuple[int, str]:
        """
        Index a document into the vector store.
        
        Args:
            text: Document text content
            filename: Name of the source file
            
        Returns:
            Tuple of (number of chunks indexed, status message)
        """
        try:
            # Chunk the document
            chunks = self.chunk_text(text)
            
            if not chunks:
                return 0, "Document too short to index"
            
            # Generate embeddings using Ollama
            embeddings = []
            for chunk in chunks:
                response = ollama.embeddings(
                    model=self.embed_model,
                    prompt=chunk
                )
                embeddings.append(response["embedding"])
            
            # Create unique IDs
            base_id = filename.replace(" ", "_").replace(".", "_")
            ids = [f"{base_id}_chunk_{i}" for i in range(len(chunks))]
            
            # Store in ChromaDB (upsert overwrites existing IDs, so re-indexing a file updates it)
            self.collection.upsert(
                ids=ids,
                embeddings=embeddings,
                documents=chunks,
                metadatas=[{"source": filename, "chunk_id": i} for i in range(len(chunks))]
            )
            
            return len(chunks), f"Successfully indexed {len(chunks)} chunks from {filename}"
            
        except Exception as e:
            return 0, f"Error indexing {filename}: {str(e)}"
    
    def search(self, query: str) -> List[Dict]:
        """
        Search for relevant chunks.
        
        Args:
            query: User's question
            
        Returns:
            List of relevant chunks with metadata
        """
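        # Nothing indexed yet: return no chunks so chat() can say so
        count = self.collection.count()
        if count == 0:
            return []
        
        # Get query embedding
        response = ollama.embeddings(
            model=self.embed_model,
            prompt=query
        )
        query_embedding = response["embedding"]
        
        # Search ChromaDB (never request more results than there are chunks)
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=min(self.n_results, count)
        )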
        
        # Format results
        chunks = []
        for i in range(len(results["documents"][0])):
            chunks.append({
                "text": results["documents"][0][i],
                "source": results["metadatas"][0][i]["source"],
                "distance": results["distances"][0][i] if "distances" in results else None
            })
        
        return chunks
    
    def chat(self, query: str, history: List[Tuple[str, str]]) -> Tuple[str, str]:
        """
        Generate a response using RAG.
        
        Args:
            query: User's question
            history: Conversation history as list of (user, assistant) tuples
            
        Returns:
            Tuple of (response text, sources markdown)
        """
        # Search for relevant context
        context_chunks = self.search(query)
        
        if not context_chunks:
            return "I don't have any documents to reference. Please upload some documents first!", ""
        
        # Build context string
        context = "\n\n---\n\n".join([c["text"] for c in context_chunks])
        
        # Build the prompt
        system_prompt = """You are a helpful assistant that answers questions based on the provided context.
Always cite which document your information comes from.
If the context doesn't contain relevant information, say so honestly.
Keep responses concise but complete."""
        
        # Build messages with history
        messages = [{"role": "system", "content": system_prompt}]
        
        # Add history (last 5 turns)
        for user_msg, assistant_msg in history[-5:]:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": assistant_msg})
        
        # Add current query with context
        user_message = f"""Context from documents:
{context}

---

Question: {query}

Please answer based on the context above."""
        
        messages.append({"role": "user", "content": user_message})
        
        # Generate response
        response = ollama.chat(
            model=self.llm_model,
            messages=messages,
            options={"temperature": self.temperature}
        )
        
        answer = response["message"]["content"]
        
        # Format sources
        sources_md = "**Sources:**\n"
        for i, chunk in enumerate(context_chunks, 1):
            confidence = 1 - (chunk["distance"] or 0)  # Convert distance to similarity
            preview = chunk["text"][:150].replace("\n", " ")  # keep the quote on one line
            sources_md += f"\n{i}. **{chunk['source']}** (relevance: {confidence:.0%})\n"
            sources_md += f"   > {preview}...\n"
        
        return answer, sources_md
    
    def get_stats(self) -> Dict:
        """Get collection statistics."""
        count = self.collection.count()
        return {
            "total_chunks": count,
            "llm_model": self.llm_model,
            "embed_model": self.embed_model
        }
    
    def clear_collection(self):
        """Clear all documents from the collection."""
        # Delete and recreate
        self.client.delete_collection(self.collection.name)
        self.collection = self.client.create_collection(
            name=self.collection.name,
            metadata={"hnsw:space": "cosine"}
        )
        return "Collection cleared!"


# Test the backend
print("Testing RAG Backend...")
rag = RAGBackend()

# Index a test document
test_doc = """
The DGX Spark is NVIDIA's first desktop AI supercomputer designed for individual developers.
It features the Blackwell GB10 Superchip with 128GB of unified memory shared between CPU and GPU.
This unified memory architecture eliminates the need for data transfers between CPU and GPU,
enabling efficient processing of large AI models.

The system includes 192 fifth-generation Tensor Cores and 6,144 CUDA cores,
delivering up to 1 petaflop of FP4 AI performance. This makes it capable of running
models with up to 200 billion parameters locally using NVFP4 quantization.
"""

chunks, msg = rag.index_document(test_doc, "dgx_spark_overview.txt")
print(f"Indexed: {msg}")
print(f"Stats: {rag.get_stats()}")

### 🔍 What Just Happened?

We created a `RAGBackend` class that:

1. **Chunks documents** - Splits text into overlapping pieces
2. **Creates embeddings** - Uses Ollama to vectorize each chunk
3. **Stores in ChromaDB** - Vector database for fast similarity search
4. **Searches and answers** - Finds relevant chunks and generates responses

This is the "kitchen" - now let's build the "restaurant front"!
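The sources panel turns ChromaDB distances into a "relevance" percentage with `1 - distance`. That works because the collection is created with `hnsw:space` set to `cosine`. In that space, distance is one minus cosine similarity, so it ranges from 0 for identical directions to 2 for opposite ones. Here is a small worked check, using hypothetical two-dimensional vectors:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance as used by hnswlib's 'cosine' space: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # parallel to the query
orthogonal = [0.0, 3.0]      # unrelated direction

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    distance = cosine_distance(query, vec)
    relevance = 1 - distance  # the same conversion the sources panel uses
    print(f"{name}: distance={distance:.2f}, relevance={relevance:.0%}")
# same_direction: distance=0.00, relevance=100%
# orthogonal: distance=1.00, relevance=0%
```

Vectors pointing in opposite directions produce a negative "relevance", so clamping the value at zero before display may be worth considering.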

---

## Part 4: Building the Complete RAG Demo

Now let's build a polished three-tab interface:

1. **📁 Documents** - Upload and manage files
2. **💬 Chat** - Interact with the RAG system
3. **⚙️ Settings** - Configure models and parameters
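Behind the Chat tab, `RAGBackend.chat` converts Gradio's tuple-style history into the message list that `ollama.chat` expects. The history is a list of `[user, assistant]` pairs; each message is a role-tagged dict. Only the last five turns are kept, so the prompt does not grow without bound. The standalone helper below isolates that step so you can test it without a model (`history_to_messages` is a hypothetical name). It also skips a pending `None` assistant reply, which the streaming pattern in Part 5 produces while a response is still being generated.

```python
from typing import Dict, List, Optional, Sequence


def history_to_messages(
    history: Sequence[Sequence[Optional[str]]],
    system_prompt: str,
    max_turns: int = 5,
) -> List[Dict[str, str]]:
    """Convert [user, assistant] pairs into chat messages, keeping the last max_turns."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_msg, assistant_msg in list(history)[-max_turns:]:
        messages.append({"role": "user", "content": user_msg})
        if assistant_msg is not None:  # skip a reply that hasn't been generated yet
            messages.append({"role": "assistant", "content": assistant_msg})
    return messages


history = [[f"question {i}", f"answer {i}"] for i in range(7)]
messages = history_to_messages(history, "You are a helpful assistant.")
print(len(messages))           # 11: 1 system + 5 turns x 2 messages
print(messages[1]["content"])  # question 2 (the two oldest turns were dropped)
```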

In [None]:
# Custom CSS for a polished look
custom_css = """
.gradio-container {
    max-width: 1200px !important;
    margin: auto !important;
}

.source-box {
    background-color: #f8f9fa;
    border-left: 4px solid #007bff;
    padding: 10px;
    margin: 10px 0;
    border-radius: 4px;
}

.stats-card {
    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
    color: white;
    padding: 20px;
    border-radius: 10px;
    text-align: center;
}

.warning-box {
    background-color: #fff3cd;
    border: 1px solid #ffc107;
    padding: 10px;
    border-radius: 4px;
}
"""

# Custom theme
theme = gr.themes.Soft(
    primary_hue="blue",
    secondary_hue="slate",
    font=gr.themes.GoogleFont("Inter"),
).set(
    button_primary_background_fill="*primary_500",
    button_primary_text_color="white",
    block_title_text_weight="600",
)

In [None]:
def create_rag_demo():
    """
    Create the complete RAG demo interface.
    
    This function creates a polished Gradio Blocks interface with
    document management, chat, and settings tabs.
    """
    
    # Initialize the RAG backend
    rag = RAGBackend()
    
    # =========================================
    # Helper Functions for UI
    # =========================================
    
    def process_uploaded_files(files, progress=gr.Progress()):
        """
        Process uploaded files and add to the index.
        
        Supports: .txt, .md, .pdf files
        """
        if not files:
            return "No files uploaded", 0, []
        
        results = []
        total_chunks = 0
        
        for file in progress.tqdm(files, desc="Indexing documents"):
            # With type="filepath", Gradio passes plain path strings;
            # fall back to .name for file-like objects
            file_path = file if isinstance(file, str) else file.name
            filename = Path(file_path).name
            
            try:
                # Read file content
                if filename.lower().endswith('.pdf'):
                    # Handle PDF
                    try:
                        from pypdf import PdfReader
                        reader = PdfReader(file_path)
                        text = "\n".join([page.extract_text() or "" for page in reader.pages])
                    except ImportError:
                        results.append(f"❌ {filename}: pypdf not installed")
                        continue
                else:
                    # Handle text files
                    with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                        text = f.read()
                
                # Index the document
                chunks, msg = rag.index_document(text, filename)
                total_chunks += chunks
                
                if chunks > 0:
                    results.append(f"✅ {filename}: {chunks} chunks indexed")
                else:
                    results.append(f"⚠️ {filename}: {msg}")
                    
            except Exception as e:
                results.append(f"❌ {filename}: {str(e)}")
        
        status = "\n".join(results)
        doc_count = rag.get_stats()["total_chunks"]
        
        return status, doc_count, files
    
    def chat_respond(message, history):
        """
        Handle chat messages.
        """
        if not message:
            return history, "", ""
        
        if rag.get_stats()["total_chunks"] == 0:
            return history + [[message, "üìö No documents indexed yet! Please upload some documents in the Documents tab first."]], "", ""
        
        # Get response from RAG
        response, sources = rag.chat(message, history)
        
        # Update history
        history = history + [[message, response]]
        
        return history, sources, ""
    
    def clear_history():
        """Clear chat history."""
        return [], ""
    
    def update_settings(model, chunks, temp):
        """Update RAG settings."""
        rag.llm_model = model
        rag.n_results = int(chunks)
        rag.temperature = temp
        return f"Settings updated: Model={model}, Chunks={chunks}, Temp={temp}"
    
    def clear_documents():
        """Clear all indexed documents."""
        rag.clear_collection()
        return "All documents cleared!", 0
    
    def get_doc_count():
        """Get current document count."""
        return rag.get_stats()["total_chunks"]
    
    # =========================================
    # Build the Interface
    # =========================================
    
    with gr.Blocks(theme=theme, css=custom_css, title="RAG Chat Demo") as demo:
        # Header
        gr.Markdown("""
        # 🤖 RAG Chat Demo
        
        Upload your documents, then chat with them! Powered by local LLMs via Ollama.
        """)
        
        with gr.Tabs() as tabs:
            # ===== TAB 1: DOCUMENTS =====
            with gr.TabItem("üìÅ Documents", id=1):
                gr.Markdown("### Upload & Manage Documents")
                
                with gr.Row():
                    with gr.Column(scale=2):
                        # File upload
                        file_upload = gr.File(
                            label="Upload Documents",
                            file_count="multiple",
                            file_types=[".txt", ".md", ".pdf"],
                            type="filepath"
                        )
                        
                        with gr.Row():
                            index_btn = gr.Button("üì• Index Documents", variant="primary")
                            clear_docs_btn = gr.Button("üóëÔ∏è Clear All", variant="secondary")
                        
                        # Status display
                        status_box = gr.Textbox(
                            label="Indexing Status",
                            lines=8,
                            interactive=False
                        )
                    
                    with gr.Column(scale=1):
                        # Stats
                        gr.Markdown("### üìä Statistics")
                        doc_count = gr.Number(
                            label="Total Chunks Indexed",
                            value=0,
                            interactive=False
                        )
                        
                        gr.Markdown("""
                        ---
                        **Supported formats:**
                        - 📄 Plain text (.txt)
                        - 📝 Markdown (.md)
                        - 📕 PDF (.pdf)
                        
                        **Tips:**
                        - Smaller documents index faster
                        - PDFs may take longer to process
                        - Clear and re-index if results seem off
                        """)
                
                # Wire up events
                index_btn.click(
                    fn=process_uploaded_files,
                    inputs=[file_upload],
                    outputs=[status_box, doc_count, file_upload]
                )
                
                clear_docs_btn.click(
                    fn=clear_documents,
                    outputs=[status_box, doc_count]
                )
            
            # ===== TAB 2: CHAT =====
            with gr.TabItem("üí¨ Chat", id=2):
                gr.Markdown("### Chat with Your Documents")
                
                with gr.Row():
                    with gr.Column(scale=3):
                        # Chat interface
                        chatbot = gr.Chatbot(
                            height=450,
                            show_copy_button=True,
                            bubble_full_width=False,
                            avatar_images=(None, "https://em-content.zobj.net/source/twitter/376/robot_1f916.png")
                        )
                        
                        with gr.Row():
                            msg_input = gr.Textbox(
                                label="Message",
                                placeholder="Ask about your documents...",
                                scale=4,
                                show_label=False
                            )
                            send_btn = gr.Button("Send", variant="primary", scale=1)
                        
                        clear_chat_btn = gr.Button("üóëÔ∏è Clear Chat", variant="secondary")
                    
                    with gr.Column(scale=1):
                        gr.Markdown("### üìö Sources")
                        sources_display = gr.Markdown(
                            value="*Sources will appear here after you ask a question*"
                        )
                
                # Example questions
                with gr.Accordion("üí° Example Questions", open=False):
                    gr.Markdown("""
                    Try these example questions:
                    - "What is the main topic of the documents?"
                    - "Summarize the key points"
                    - "What are the most important facts?"
                    """)
                
                # Wire up events
                send_btn.click(
                    fn=chat_respond,
                    inputs=[msg_input, chatbot],
                    outputs=[chatbot, sources_display, msg_input]
                )
                
                msg_input.submit(
                    fn=chat_respond,
                    inputs=[msg_input, chatbot],
                    outputs=[chatbot, sources_display, msg_input]
                )
                
                clear_chat_btn.click(
                    fn=clear_history,
                    outputs=[chatbot, sources_display]
                )
            
            # ===== TAB 3: SETTINGS =====
            with gr.TabItem("⚙️ Settings", id=3):
                gr.Markdown("### Configure RAG Settings")
                
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("#### Model Settings")
                        
                        model_select = gr.Dropdown(
                            choices=[
                                "llama3.2:3b",
                                "llama3.2:1b",
                                "qwen3:8b",
                                "qwen3:32b",
                                "mistral:7b",
                                "qwen2:7b"
                            ],
                            value="llama3.2:3b",
                            label="LLM Model",
                            info="Larger models are more capable but slower"
                        )
                        
                        chunks_slider = gr.Slider(
                            minimum=1,
                            maximum=10,
                            value=3,
                            step=1,
                            label="Retrieved Chunks",
                            info="More chunks = more context, but slower"
                        )
                        
                        temp_slider = gr.Slider(
                            minimum=0,
                            maximum=1,
                            value=0.7,
                            step=0.1,
                            label="Temperature",
                            info="Higher = more creative, Lower = more focused"
                        )
                        
                        save_settings_btn = gr.Button("üíæ Save Settings", variant="primary")
                        settings_status = gr.Textbox(label="Status", interactive=False)
                    
                    with gr.Column():
                        gr.Markdown("#### Setting Explanations")
                        gr.Markdown("""
                        **LLM Model:**
                        - `llama3.2:3b` - Fast, good for quick tests
                        - `qwen3:8b` - Balanced performance
                        - `qwen3:32b` - Best quality (needs more RAM)
                        
                        **Retrieved Chunks:**
                        - Lower (1-2): Faster, focused answers
                        - Higher (5-10): More context, comprehensive answers
                        
                        **Temperature:**
                        - 0.0: Most deterministic, usually the same answer for the same prompt
                        - 0.7: Balanced creativity
                        - 1.0: Maximum creativity/randomness
                        """)
                
                # Wire up events
                save_settings_btn.click(
                    fn=update_settings,
                    inputs=[model_select, chunks_slider, temp_slider],
                    outputs=[settings_status]
                )
        
        # Footer
        gr.Markdown("""
        ---
        *Built with üíô using Gradio, ChromaDB, and Ollama | Module 4.5 Demo*
        """)
    
    return demo

# Create and launch
print("Creating RAG Demo...")
rag_demo = create_rag_demo()
rag_demo.launch(inline=True, share=False)

### 🔍 What Just Happened?

We built a complete, shareable RAG demo with:

1. **Document Management Tab**
   - Multi-file upload with progress indicator
   - Real-time indexing status
   - Document statistics

2. **Chat Tab**
   - Clean chat interface with avatars
   - Source citations in a sidebar
   - Example questions for new users

3. **Settings Tab**
   - Model selection dropdown
   - Adjustable retrieval parameters
   - Temperature control

4. **Professional Polish**
   - Custom theme and CSS
   - Responsive layout
   - Clear instructions and feedback
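The Temperature slider in the Settings tab divides the model's raw scores (logits) by the temperature before applying the softmax that turns them into sampling probabilities. Low values sharpen the distribution toward the top token; high values flatten it. The logits below are made-up numbers for three candidate tokens. This illustrates the standard formula, not Ollama's exact sampling pipeline, which also applies settings such as top-p.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after scaling them by 1/temperature."""
    if temperature <= 0:
        raise ValueError("Use temperature > 0 here; 0 is usually treated as greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's probability shrinking as the temperature rises, which is why higher settings produce more varied answers.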

---

## Part 5: Adding Advanced Features

Let's enhance our demo with some advanced Gradio features.
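The key idea behind streaming in Gradio is that the event handler is a generator. Each `yield` pushes a new value to the output component, so yielding the *accumulated* text makes the reply appear to grow. The sketch below fakes the token stream with a plain list of chunks shaped like those `ollama.chat(..., stream=True)` returns (`{"message": {"content": ...}}`), so it runs without a model. `fake_stream` is a hypothetical stand-in.

```python
from typing import Dict, Iterator, List


def fake_stream(tokens: List[str]) -> Iterator[Dict]:
    """Hypothetical stand-in for ollama.chat(..., stream=True): yields one chunk per token."""
    for token in tokens:
        yield {"message": {"content": token}}


def stream_response(chunks: Iterator[Dict]) -> Iterator[str]:
    """Yield the full response so far after each token, like the demo's generator."""
    response_text = ""
    for chunk in chunks:
        response_text += chunk["message"]["content"]
        yield response_text


for partial in stream_response(fake_stream(["Hello", ",", " world", "!"])):
    print(partial)
# Hello
# Hello,
# Hello, world
# Hello, world!
```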

In [None]:
# Advanced feature: Streaming responses
def create_streaming_demo():
    """
    Demo showing streaming responses - much better UX!
    
    Instead of waiting for the full response, users see
    text appear word by word, like ChatGPT.
    """
    
    def stream_response(message, history):
        """
        Stream the response token by token.
        """
        # Build messages
        messages = []
        for user_msg, assistant_msg in history:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": assistant_msg})
        messages.append({"role": "user", "content": message})
        
        # Stream response
        response_text = ""
        for chunk in ollama.chat(
            model="llama3.2:3b",
            messages=messages,
            stream=True  # Enable streaming!
        ):
            token = chunk["message"]["content"]
            response_text += token
            yield response_text
    
    with gr.Blocks(theme=gr.themes.Soft()) as demo:
        gr.Markdown("# ⚡ Streaming Chat Demo")
        gr.Markdown("Watch the response appear word by word!")
        
        chatbot = gr.Chatbot(height=400)
        msg = gr.Textbox(label="Message", placeholder="Say something...")
        
        def user_message(message, history):
            return "", history + [[message, None]]
        
        def bot_response(history):
            message = history[-1][0]
            history_without_last = history[:-1]
            
            for response in stream_response(message, history_without_last):
                history[-1][1] = response
                yield history
        
        msg.submit(user_message, [msg, chatbot], [msg, chatbot]).then(
            bot_response, [chatbot], [chatbot]
        )
    
    return demo

# Uncomment to run:
# streaming_demo = create_streaming_demo()
# streaming_demo.launch(inline=True)
print("Streaming demo defined. Uncomment the last lines to run.")
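The important step in `stream_response` is the accumulation. Each `yield` returns the full text so far, not just the newest token, because Gradio replaces the output component's value with every yielded value. The sketch below isolates that step. It assumes chunks shaped like the `ollama.chat(..., stream=True)` chunks used above, and `fake_chunks` is a hypothetical stand-in for a real model, so nothing here calls Ollama:

```python
from typing import Dict, Iterable, Iterator

def accumulate_stream(chunks: Iterable[Dict]) -> Iterator[str]:
    """Yield the growing response text after each streamed chunk."""
    text = ""
    for chunk in chunks:
        # Each chunk carries only the newly generated piece of text
        text += chunk["message"]["content"]
        # Yield everything so far: Gradio overwrites the displayed value on each yield
        yield text

# Hypothetical stand-in for ollama.chat(..., stream=True)
fake_chunks = [{"message": {"content": piece}} for piece in ["Hel", "lo", ", wor", "ld!"]]
for partial in accumulate_stream(fake_chunks):
    print(partial)
```

If you yielded only `token` instead of the accumulated text, the chat bubble would show one token at a time instead of a growing answer.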

In [None]:
# Advanced feature: Authentication
def create_auth_demo():
    """
    Demo showing basic authentication.
    
    Useful when you want to restrict access to your demo.
    """
    
    def greet(name):
        return f"Hello, {name}! You're authenticated."
    
    with gr.Blocks() as demo:
        gr.Markdown("# 🔐 Authenticated Demo")
        name = gr.Textbox(label="Your Name")
        output = gr.Textbox(label="Greeting")
        btn = gr.Button("Greet")
        btn.click(greet, name, output)
    
    return demo

# To launch with auth:
# auth_demo = create_auth_demo()
# auth_demo.launch(
#     auth=[("admin", "password123"), ("user", "demo")],
#     auth_message="Please login to access the demo"
# )
print("Auth demo defined. See comments for how to launch with authentication.")
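Hard-coding `("admin", "password123")` is fine for a quick local test. However, anyone who can read your Space's files can also read credentials stored in `app.py`. `launch(auth=...)` also accepts a function that receives the username and password and returns `True` or `False`. The sketch below assumes the credentials live in environment variables named `DEMO_USERNAME` and `DEMO_PASSWORD`. These names are hypothetical. On Hugging Face Spaces, repository secrets are exposed to the app as environment variables.

```python
import hmac
import os

def check_credentials(username: str, password: str) -> bool:
    """Auth callback for demo.launch(auth=...): True lets the user in."""
    # Hypothetical variable names; set them in your shell or as Space secrets
    expected_user = os.environ.get("DEMO_USERNAME", "")
    expected_pass = os.environ.get("DEMO_PASSWORD", "")
    if not expected_user or not expected_pass:
        # Fail closed: if no credentials are configured, nobody can log in
        return False
    # compare_digest avoids leaking information through comparison timing
    user_ok = hmac.compare_digest(username.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_pass.encode())
    return user_ok and pass_ok

# Usage (not run here):
# auth_demo.launch(auth=check_credentials, auth_message="Please login to access the demo")
```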

---

## Part 6: Deploying to Hugging Face Spaces

Now let's deploy our demo to the world! Hugging Face Spaces provides free hosting for Gradio apps.

### Step 1: Prepare Your Files

Create a folder with these files:

```
my-rag-demo/
├── app.py            # Main application
├── requirements.txt  # Dependencies
└── README.md         # Space configuration
```

In [None]:
# Let's create the deployment files
import os

# Create deployment directory
deploy_dir = "rag_demo_deploy"
os.makedirs(deploy_dir, exist_ok=True)

# 1. app.py - The main application
# Raw string (r'''...''') so escapes such as "\n" are written to app.py literally
app_py_content = r'''
"""RAG Chat Demo - Hugging Face Spaces Version"""

import gradio as gr
import chromadb
from pathlib import Path

# Note: a local Ollama server isn't available in a standard Gradio Space, so a real
# deployment would call a hosted LLM API such as the Hugging Face Inference API or OpenAI.
# This example returns a mock response for demonstration.

class RAGBackend:
    """Simplified RAG backend for Spaces."""
    
    def __init__(self):
        self.client = chromadb.Client()
        self.collection = self.client.get_or_create_collection("docs")
    
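    def index_document(self, text: str, filename: str):
        # 500-character chunks starting every 450 characters (50-character overlap)
        chunks = [text[i:i+500] for i in range(0, len(text), 450)]
        chunks = [c for c in chunks if len(c) > 50]  # drop tiny trailing fragments
        
        if not chunks:
            return 0, "Document too short"
        
        ids = [f"{filename}_{i}" for i in range(len(chunks))]
        # upsert (not add) so re-indexing the same file updates chunks with matching IDs
        self.collection.upsert(
            ids=ids,
            documents=chunks,
            metadatas=[{"source": filename}] * len(chunks)
        )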
        return len(chunks), f"Indexed {len(chunks)} chunks"
    
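    def search(self, query: str, n=3):
        count = self.collection.count()
        if count == 0:
            return []  # nothing indexed yet
        # Never ask Chroma for more results than the collection holds
        results = self.collection.query(query_texts=[query], n_results=min(n, count))
        return results["documents"][0] if results["documents"] else []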
    
    def chat(self, query: str, history: list):
        context = self.search(query)
        if not context:
            return "No documents indexed. Please upload some first!", ""
        
        # Mock response - replace with actual LLM API
        context_text = "\n".join(context[:2])
        response = f"Based on the documents: {context_text[:200]}..."
        sources = f"Found {len(context)} relevant chunks."
        
        return response, sources
    
    def count(self):
        return self.collection.count()
    
    def clear(self):
        self.client.delete_collection("docs")
        self.collection = self.client.create_collection("docs")


# Initialize backend (one in-memory index shared by every visitor;
# it is emptied whenever the app restarts)
rag = RAGBackend()

# UI Functions
def process_files(files):
    if not files:
        return "No files", 0
    
    results = []
    for file in files:
        name = Path(file.name).name
        try:
            with open(file.name, "r", encoding="utf-8") as f:
                text = f.read()
            chunks, msg = rag.index_document(text, name)
            results.append(f"✅ {name}: {msg}")
        except Exception as e:
            results.append(f"❌ {name}: {e}")
    
    return "\n".join(results), rag.count()

def chat_respond(message, history):
    response, sources = rag.chat(message, history)
    return history + [[message, response]], sources, ""

def clear_all():
    rag.clear()
    return "Cleared!", 0

# Build UI
with gr.Blocks(theme=gr.themes.Soft(), title="RAG Demo") as demo:
    gr.Markdown("# 🤖 RAG Chat Demo")
    
    with gr.Tabs():
        with gr.TabItem("📁 Documents"):
            with gr.Row():
                with gr.Column():
                    files = gr.File(file_count="multiple", label="Upload")
                    with gr.Row():
                        index_btn = gr.Button("Index", variant="primary")
                        clear_btn = gr.Button("Clear")
                    status = gr.Textbox(label="Status", lines=4)
                with gr.Column():
                    count = gr.Number(label="Chunks", value=0)
            
            index_btn.click(process_files, [files], [status, count])
            clear_btn.click(clear_all, [], [status, count])
        
        with gr.TabItem("💬 Chat"):
            chatbot = gr.Chatbot(height=400)
            with gr.Row():
                msg = gr.Textbox(label="Message", scale=4)
                send = gr.Button("Send", variant="primary")
            sources = gr.Markdown()
            
            send.click(chat_respond, [msg, chatbot], [chatbot, sources, msg])
            msg.submit(chat_respond, [msg, chatbot], [chatbot, sources, msg])

if __name__ == "__main__":
    demo.launch()
'''

with open(f"{deploy_dir}/app.py", "w") as f:
    f.write(app_py_content)

# 2. requirements.txt
requirements_content = '''gradio>=4.0.0
chromadb>=0.4.0
'''

with open(f"{deploy_dir}/requirements.txt", "w") as f:
    f.write(requirements_content)

# 3. README.md with Spaces configuration
readme_content = '''---
title: RAG Chat Demo
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
---

# RAG Chat Demo

Upload documents and chat with them using RAG (Retrieval Augmented Generation).

## Features
- Multi-file upload
- Document indexing
- Question answering

## Usage
1. Go to the Documents tab and upload text files
2. Click "Index" to process them
3. Go to Chat tab and ask questions!
'''

with open(f"{deploy_dir}/README.md", "w") as f:
    f.write(readme_content)

print(f"✅ Deployment files created in '{deploy_dir}/'")
print("\nFiles created:")
for f in os.listdir(deploy_dir):
    print(f"  - {f}")
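Because `app.py` is written out as a string, any mistake in it only surfaces when the Space builds and crashes. That includes an escape sequence Python interprets inside the string, such as `\n` in a regular, non-raw triple-quoted literal. A quick local check with the standard-library `ast` module catches syntax errors before you push. `find_syntax_error` is our own helper name:

```python
import ast
from pathlib import Path
from typing import Optional

def find_syntax_error(source: str, filename: str = "<string>") -> Optional[str]:
    """Return None if `source` parses as Python, else 'file:line: message'."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None

# Check the generated app before pushing it anywhere
app_path = Path("rag_demo_deploy") / "app.py"
if app_path.exists():
    problem = find_syntax_error(app_path.read_text(encoding="utf-8"), str(app_path))
    print(problem or "app.py parses cleanly")
```

This only checks syntax. Missing packages or runtime errors still appear in the Space's build and container logs.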

### Step 2: Push to Hugging Face Spaces

```bash
# 1. Create a new Space on huggingface.co/new-space
#    Select "Gradio" as the SDK

# 2. Clone your space
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# 3. Copy your files
cp rag_demo_deploy/* YOUR_SPACE_NAME/

# 4. Push (when asked for a password, enter a Hugging Face access token with write access)
cd YOUR_SPACE_NAME
git add .
git commit -m "Initial deploy"
git push
```

Your demo will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
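If you prefer not to use git, the `huggingface_hub` Python library can create the Space and upload the folder from Python. The sketch below assumes `huggingface_hub` is installed and that you are logged in, either via `huggingface-cli login` or an `HF_TOKEN` environment variable with write access. `deploy_space` and its `dry_run` flag are our own helper, not part of the library:

```python
from pathlib import Path
from typing import List

def deploy_space(folder: str, repo_id: str, dry_run: bool = True) -> List[str]:
    """Upload the contents of `folder` to the Gradio Space `repo_id` ("user/space").

    With dry_run=True (the default) nothing is uploaded; the function only
    returns the top-level file names in `folder`.
    """
    files = sorted(p.name for p in Path(folder).iterdir() if p.is_file())
    if dry_run:
        return files

    # Imported lazily so the dry run works without huggingface_hub installed
    from huggingface_hub import HfApi

    api = HfApi()  # uses the token from `huggingface-cli login` or HF_TOKEN
    api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",
        commit_message="Initial deploy",
    )
    return files

# Preview first, then rerun with dry_run=False to publish:
# print(deploy_space("rag_demo_deploy", "YOUR_USERNAME/YOUR_SPACE_NAME"))
```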

---

## ⚠️ Common Mistakes

### Mistake 1: Not Using `gr.Blocks()` Context Manager

```python
# ❌ Wrong - components outside context
demo = gr.Blocks()
title = gr.Markdown("# Hello")  # This won't be part of demo!

# ✅ Right - everything inside 'with'
with gr.Blocks() as demo:
    title = gr.Markdown("# Hello")  # Part of demo
```

**Why:** Components must be created inside the `with` block to be added to the interface.

---

### Mistake 2: Mismatched Inputs/Outputs

```python
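import gradio as gr

def process(text):
    return text.upper(), len(text)  # two return values

with gr.Blocks() as demo:
    text_in = gr.Textbox(label="Input")
    upper_out = gr.Textbox(label="Uppercase")
    length_out = gr.Number(label="Length")
    btn = gr.Button("Process")

    # ❌ Wrong - 2 return values but only 1 output component
    # btn.click(process, [text_in], [upper_out])

    # ✅ Right - one output component per returned value, in the same order
    btn.click(process, [text_in], [upper_out, length_out])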
```

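**Why:** Gradio passes returned values to the output components by position, so `outputs` needs exactly one component per returned value. If the counts don't match, Gradio either raises an error or shows values in the wrong place. For example, with a single output, the whole tuple is sent to that one component.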

---

### Mistake 3: Long Operations Without Feedback

```python
import time
import gradio as gr

# ❌ Wrong - long operation with no sign of progress
def process_files(files):
    for file in files:
        time.sleep(10)  # User sees only a spinner; the app looks frozen
    return "Done"

# ✅ Right - report progress as you go
def process_files(files, progress=gr.Progress()):
    for file in progress.tqdm(files):
        time.sleep(10)  # Progress bar advances after each file
    return "Done"
```

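**Why:** Without progress indicators, users can't tell whether the app is working or stuck. Gradio supplies the `progress=gr.Progress()` argument itself, so it does not go in the event's `inputs` list.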
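`gr.Progress` is one option. Another is to make the handler a generator: Gradio sends each yielded value to the output component, so users can see which file is being processed. In this sketch, `index_one` and `fake_index` are hypothetical stand-ins for your real indexing function:

```python
import time
from typing import Callable, Iterator, List

def index_with_status(files: List[str], index_one: Callable[[str], int]) -> Iterator[str]:
    """Yield a growing status log while files are indexed one at a time."""
    lines = []
    for i, name in enumerate(files, start=1):
        lines.append(f"⏳ ({i}/{len(files)}) Indexing {name}...")
        yield "\n".join(lines)  # shown before the slow call starts
        n_chunks = index_one(name)
        lines[-1] = f"✅ ({i}/{len(files)}) {name}: {n_chunks} chunks"
        yield "\n".join(lines)
    lines.append("Done")
    yield "\n".join(lines)

def fake_index(name: str) -> int:
    time.sleep(0.1)  # pretend to do slow work
    return len(name)

# In a Blocks app (sketch): the handler itself must be a generator function
# def on_index(file_paths):
#     yield from index_with_status(file_paths, my_index_function)
# index_btn.click(on_index, [files], [status])
```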

---

### Mistake 4: Not Handling Errors Gracefully

```python
import ollama

# ❌ Wrong - any failure surfaces as a raw exception
def query_llm(prompt):
    response = ollama.generate(model="llama3.2:3b", prompt=prompt)  # Might fail!
    return response["response"]

# ✅ Right - catch and show friendly message
def query_llm(prompt):
    try:
        response = ollama.generate(model="llama3.2:3b", prompt=prompt)
        return response["response"]
    except Exception as e:
        return f"😔 Oops! Something went wrong: {e}. Please try again."
```

**Why:** Users shouldn't see raw Python tracebacks.
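Writing `try`/`except` in every handler gets repetitive. One option is a small decorator that logs the full traceback on the server, where you can debug it, and returns a short message to the user. `friendly_errors` is our own sketch, not a Gradio feature. Gradio's built-in alternative is to `raise gr.Error("...")`, which displays the message in the UI.

```python
import functools
import logging

logger = logging.getLogger("rag_demo")

def friendly_errors(message: str = "😔 Something went wrong. Please try again."):
    """Decorator: log the traceback server-side, return `message` to the user."""
    def decorator(fn):
        @functools.wraps(fn)  # keep the name/signature Gradio inspects
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                # logger.exception records the full traceback for the developer
                logger.exception("Handler %s failed", fn.__name__)
                return message
        return wrapper
    return decorator

@friendly_errors()
def divide(a: float, b: float) -> float:
    return a / b

print(divide(6, 3))  # 2.0
print(divide(1, 0))  # friendly message instead of a ZeroDivisionError
```

As written, this suits plain (non-generator) handlers with a single output. A handler with several outputs would need a tuple of fallback values instead of one string.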

---

## 🎉 Checkpoint

You've learned:
- ✅ How to use Gradio Blocks API for complex layouts
- ✅ How to structure tabs, rows, and columns
- ✅ How to build a complete RAG demo interface
- ✅ How to add streaming, authentication, and progress indicators
- ✅ How to deploy to Hugging Face Spaces

---

## 🚀 Challenge (Optional)

Enhance the RAG demo with these features:

1. **Multi-language support** - Add a language selector and translate UI elements
2. **Theme toggle** - Let users switch between light and dark mode
3. **Export chat** - Add a button to download conversation history as JSON/Markdown
4. **Voice input** - Use `gr.Audio` to allow voice questions

<details>
<summary>💡 Hints</summary>

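- For themes: Gradio themes such as `gr.themes.Soft()` define both light and dark colors; appending `?__theme=dark` (or `?__theme=light`) to the app's URL forces one mode
- For export: write the history to a temporary `.json` or `.md` file and return its path to a `gr.File` output
- For audio: `gr.Audio(sources=["microphone"], type="filepath")` (Gradio 4 replaced `source=` with `sources=[...]`)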
</details>
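For the export challenge, one approach is to turn the chat history into Markdown, write it to a temporary file, and return that file's path to a `gr.File` output. This sketch assumes the list-of-pairs history format (`[[user, assistant], ...]`) used by the `gr.Chatbot` components in this lab. `history_to_markdown` and `export_chat` are our own names:

```python
import tempfile
from typing import List, Optional

def history_to_markdown(history: List[List[Optional[str]]]) -> str:
    """Render [[user, assistant], ...] pairs as a Markdown transcript."""
    parts = ["# Chat Export", ""]
    for user_msg, assistant_msg in history:
        parts.append(f"**You:** {user_msg}")
        parts.append("")
        # assistant_msg is None while a streamed reply is still pending
        parts.append(f"**Assistant:** {assistant_msg or '(no response)'}")
        parts.append("")
    return "\n".join(parts)

def export_chat(history) -> str:
    """Write the transcript to a temp .md file and return its path for gr.File."""
    with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False, encoding="utf-8") as f:
        f.write(history_to_markdown(history))
        return f.name

# Wiring inside gr.Blocks (sketch):
# download = gr.File(label="Download")
# export_btn.click(export_chat, [chatbot], [download])
```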

---

## 📖 Further Reading

- [Gradio Blocks Guide](https://gradio.app/guides/blocks-and-event-listeners)
- [Gradio Custom Components](https://gradio.app/guides/custom-components-in-five-minutes)
- [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces)
- [Gradio Theming](https://gradio.app/guides/theming-guide)

---

## 🧹 Cleanup

In [None]:
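# Clean up resources
import gc

# Close any demos created earlier in this notebook; names that were never
# defined are skipped, and close() errors are reported instead of hidden
for demo_name in ["simple_demo", "layout_demo", "rag_demo", "streaming_demo", "auth_demo"]:
    demo_obj = globals().get(demo_name)
    if demo_obj is None:
        continue
    try:
        demo_obj.close()
    except Exception as e:
        print(f"Could not close {demo_name}: {e}")

# Force garbage collection
gc.collect()

print("✅ Cleanup complete!")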

---

## ➡️ Next Steps

Continue to [Lab 4.5.2: Agent Playground](lab-4.5.2-agent-playground.ipynb) to build a Streamlit app for visualizing agent reasoning!