# Lab 4.5.1: Complete RAG Demo

**Module:** 4.5 - Demo Building & Prototyping  
**Time:** 3 hours  
**Difficulty:** ⭐⭐⭐⭐☆

---

## 🎯 Lab Objectives

Build a polished, production-ready Gradio application that showcases your RAG system with:
- [ ] Multi-tab interface using Blocks API
- [ ] Document upload and indexing with progress indicators
- [ ] Chat interface with conversation history
- [ ] Source citations display
- [ ] Settings panel for configuration
- [ ] Custom styling for professional appearance
- [ ] Error handling with friendly messages
- [ ] Deployment to Hugging Face Spaces

---

## 🌍 Scenario

Your team has been developing a RAG system for 3 months. The CEO wants to see a demo next week. You need to create a polished interface that:
1. Impresses non-technical stakeholders
2. Actually works (no crashes!)
3. Can be shared with investors

Let's build it!

---

## Part 1: Setup and Dependencies

In [None]:
# Install dependencies (the version spec is quoted so the shell does not treat ">" as a redirect)
!pip install -q "gradio>=4.44.0" chromadb sentence-transformers pypdf

In [None]:
# Imports
import gradio as gr
import chromadb
from sentence_transformers import SentenceTransformer
import os
import time
from typing import List, Tuple, Dict, Optional
import hashlib
from datetime import datetime

print("✅ Dependencies loaded!")

## Part 2: RAG Backend Implementation

First, let's create a simple but functional RAG backend.

In [None]:
class SimpleRAG:
    """
    A simple RAG system for the demo.
    
    In production, you would use your actual RAG pipeline from Module 3.5.
    This is simplified for demo purposes.
    """
    
    def __init__(self):
        """Initialize the RAG system."""
        # Initialize embedding model
        print("Loading embedding model...")
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        
        # Initialize ChromaDB (get_or_create avoids an error if this cell is re-run)
        self.client = chromadb.Client()
        self.collection = self.client.get_or_create_collection(
            name="documents",
            metadata={"hnsw:space": "cosine"}
        )
        
        self.documents = {}  # Track indexed documents
        print("✅ RAG system initialized!")
    
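    def chunk_text(self, text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
        """Split text into overlapping chunks."""
        # An overlap >= chunk_size would never advance the window (infinite loop)
        if not 0 <= overlap < chunk_size:
            raise ValueError("overlap must be non-negative and smaller than chunk_size")
        chunks = []
        start = 0
        while start < len(text):
            end = start + chunk_size
            chunk = text[start:end]
            chunks.append(chunk)
            if end >= len(text):
                break  # Last window reached; avoid a redundant tail-only chunk
            start = end - overlap
        return chunks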
    
    def index_document(self, filename: str, content: str) -> int:
        """
        Index a document.
        
        Returns the number of chunks indexed.
        """
        # Generate document ID
        doc_id = hashlib.md5(filename.encode()).hexdigest()[:8]
        
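        # Chunk the document
        chunks = self.chunk_text(content)
        if not chunks:
            # Nothing to embed (empty file or a PDF with no extractable text)
            raise ValueError(f"No text found in {filename}")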
        
        # Generate embeddings
        embeddings = self.embedder.encode(chunks).tolist()
        
        # Add to ChromaDB
        ids = [f"{doc_id}_{i}" for i in range(len(chunks))]
        metadatas = [{"source": filename, "chunk_id": i} for i in range(len(chunks))]
        
        self.collection.add(
            ids=ids,
            embeddings=embeddings,
            documents=chunks,
            metadatas=metadatas
        )
        
        # Track the document
        self.documents[filename] = {
            "id": doc_id,
            "chunks": len(chunks),
            "indexed_at": datetime.now().isoformat()
        }
        
        return len(chunks)
    
    def query(self, question: str, n_results: int = 3) -> Tuple[str, List[Dict]]:
        """
        Query the RAG system.
        
        Returns (answer, sources)
        """
        if self.collection.count() == 0:
            return "No documents have been indexed yet. Please upload some documents first!", []
        
        # Generate query embedding
        query_embedding = self.embedder.encode([question])[0].tolist()
        
        # Retrieve relevant chunks
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results
        )
        
        # Format sources
        sources = []
        context_parts = []
        
        for i, (doc, metadata) in enumerate(zip(results['documents'][0], results['metadatas'][0])):
            sources.append({
                "source": metadata['source'],
                "chunk_id": metadata['chunk_id'],
                "text": doc[:200] + "..." if len(doc) > 200 else doc
            })
            context_parts.append(f"[Source: {metadata['source']}]\n{doc}")
        
        # Generate answer (simplified - in production, use an LLM)
        context = "\n\n".join(context_parts)
        
        # Simulated answer (replace with actual LLM call in production)
        answer = self._generate_answer(question, context)
        
        return answer, sources
    
    def _generate_answer(self, question: str, context: str) -> str:
        """
        Generate an answer based on the context.
        
        In production, this would call your LLM (Ollama, OpenAI, etc.)
        """
        # For the demo, we'll create a template response
        # In production, replace this with actual LLM call
        return f"""Based on the documents I found, here's what I can tell you about your question:

**Question:** {question}

**Answer:** The relevant information from your documents suggests the following key points:

1. The documents contain information related to your query.
2. Multiple sources were consulted to provide this answer.
3. See the "Sources" section below for the specific excerpts used.

*Note: In a production system, this would be a real LLM-generated response using the retrieved context.*"""
    
    def get_stats(self) -> Dict:
        """Get statistics about the indexed documents."""
        return {
            "total_documents": len(self.documents),
            "total_chunks": self.collection.count(),
            "documents": list(self.documents.keys())
        }

# Create global RAG instance
rag_system = SimpleRAG()
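
The overlapping-window chunking used by `chunk_text` is easier to see with small numbers. The following is a standalone sketch of the same sliding-window idea, written as a free function rather than a method; it stops as soon as a window reaches the end of the text, and the tiny `chunk_size` and `overlap` values are chosen only to make the overlap visible.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # the window already covers the end of the text
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks


# Small values make the shared characters easy to spot
print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk begins with the last character of the previous one, which is what lets a sentence that straddles a boundary still appear whole in at least one chunk when the overlap is large enough.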

## Part 3: Document Processing Helpers

In [None]:
def read_file_content(file_path: str) -> str:
    """
    Read content from various file types.
    
    Supports: .txt, .md, .pdf
    """
    extension = os.path.splitext(file_path)[1].lower()
    
    if extension == '.pdf':
        try:
            from pypdf import PdfReader
            reader = PdfReader(file_path)
            text = ""
            for page in reader.pages:
                # extract_text() may yield nothing for image-only pages
                text += (page.extract_text() or "") + "\n"
            return text
        except Exception as e:
            # Raise instead of returning the message, so it is never indexed as content
            raise ValueError(f"Error reading PDF: {e}") from e
    
    elif extension in ['.txt', '.md']:
        with open(file_path, 'r', encoding='utf-8') as f:
            return f.read()
    
    else:
        raise ValueError(f"Unsupported file type: {extension}")

print("✅ File processing helpers ready!")

## Part 4: Custom Theme

In [None]:
# Custom theme for professional appearance
custom_theme = gr.themes.Soft(
    primary_hue="blue",
    secondary_hue="slate",
    neutral_hue="slate",
    font=gr.themes.GoogleFont("Inter"),
).set(
    button_primary_background_fill="#2563eb",
    button_primary_background_fill_hover="#1d4ed8",
    button_primary_text_color="white",
    block_title_text_weight="600",
    block_label_text_weight="500",
    input_background_fill="#f8fafc",
)

# Custom CSS for additional styling
custom_css = """
.gradio-container {
    max-width: 1200px !important;
    margin: auto !important;
}

.source-citation {
    background-color: #f0f9ff;
    border-left: 4px solid #0284c7;
    padding: 0.75rem;
    margin: 0.5rem 0;
    border-radius: 0 8px 8px 0;
    font-size: 0.9em;
}

.stats-card {
    background: linear-gradient(135deg, #f0f9ff 0%, #e0f2fe 100%);
    border-radius: 12px;
    padding: 1rem;
    text-align: center;
}

.success-message {
    background-color: #dcfce7;
    border-left: 4px solid #16a34a;
    padding: 0.75rem;
    border-radius: 0 8px 8px 0;
}

.error-message {
    background-color: #fef2f2;
    border-left: 4px solid #dc2626;
    padding: 0.75rem;
    border-radius: 0 8px 8px 0;
}

footer {display: none !important;}
"""

print("✅ Theme and CSS configured!")

## Part 5: Building the Complete Interface

Now let's build the full multi-tab interface.

In [None]:
def create_rag_demo():
    """
    Create the complete RAG demo interface.
    """
    
    with gr.Blocks(theme=custom_theme, css=custom_css, title="Document Q&A") as demo:
        
        # Header
        gr.Markdown("""
        # 📚 Document Q&A Assistant
        
        Upload your documents and ask questions. Powered by RAG (Retrieval-Augmented Generation).
        """)
        
        # Session state
        chat_history = gr.State(value=[])
        settings = gr.State(value={
            "n_results": 3,
            "temperature": 0.7,
            "model": "local"
        })
        
        with gr.Tabs() as tabs:
            
            # =====================================================================
            # TAB 1: DOCUMENTS
            # =====================================================================
            with gr.TabItem("📁 Documents", id="documents"):
                gr.Markdown("### Upload and Index Documents")
                
                with gr.Row():
                    with gr.Column(scale=2):
                        files = gr.File(
                            label="Upload Documents",
                            file_count="multiple",
                            file_types=[".pdf", ".txt", ".md"],
                            height=200
                        )
                        
                        with gr.Row():
                            index_btn = gr.Button("📥 Index Documents", variant="primary", size="lg")
                            clear_btn = gr.Button("🗑️ Clear All", variant="secondary")
                        
                        status_box = gr.HTML(
                            value="<div class='stats-card'>No documents indexed yet</div>",
                            label="Status"
                        )
                    
                    with gr.Column(scale=1):
                        gr.Markdown("### Indexed Documents")
                        doc_list = gr.Dataframe(
                            headers=["Document", "Chunks"],
                            datatype=["str", "number"],
                            col_count=(2, "fixed"),
                            interactive=False,
                            height=200
                        )
            
            # =====================================================================
            # TAB 2: CHAT
            # =====================================================================
            with gr.TabItem("💬 Chat", id="chat"):
                with gr.Row():
                    with gr.Column(scale=3):
                        chatbot = gr.Chatbot(
                            height=450,
                            show_copy_button=True,
                            placeholder="Ask a question about your documents...",
                            bubble_full_width=False
                        )
                        
                        with gr.Row():
                            msg = gr.Textbox(
                                label="Your Question",
                                placeholder="What would you like to know?",
                                scale=5,
                                lines=2
                            )
                            send_btn = gr.Button("Send 📤", variant="primary", scale=1)
                        
                        with gr.Row():
                            clear_chat_btn = gr.Button("🗑️ Clear Chat")
                            examples_dropdown = gr.Dropdown(
                                choices=[
                                    "What are the main topics in these documents?",
                                    "Summarize the key findings.",
                                    "What recommendations are mentioned?"
                                ],
                                label="Example Questions",
                                scale=2
                            )
                    
                    with gr.Column(scale=1):
                        gr.Markdown("### 📚 Sources")
                        sources_display = gr.HTML(
                            value="<p style='color: #666;'>Sources will appear here after asking a question.</p>"
                        )
            
            # =====================================================================
            # TAB 3: SETTINGS
            # =====================================================================
            with gr.TabItem("⚙️ Settings", id="settings"):
                gr.Markdown("### Configure the Assistant")
                
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("#### Retrieval Settings")
                        n_results = gr.Slider(
                            minimum=1,
                            maximum=10,
                            value=3,
                            step=1,
                            label="Number of Sources to Retrieve",
                            info="More sources = more context, but slower"
                        )
                        
                        chunk_size = gr.Slider(
                            minimum=200,
                            maximum=1000,
                            value=500,
                            step=50,
                            label="Chunk Size",
                            info="Size of text chunks for indexing"
                        )
                    
                    with gr.Column():
                        gr.Markdown("#### Generation Settings")
                        temperature = gr.Slider(
                            minimum=0.0,
                            maximum=1.0,
                            value=0.7,
                            step=0.1,
                            label="Temperature",
                            info="Higher = more creative, Lower = more focused"
                        )
                        
                        max_tokens = gr.Slider(
                            minimum=100,
                            maximum=2000,
                            value=500,
                            step=100,
                            label="Max Response Length"
                        )
                
                save_settings_btn = gr.Button("💾 Save Settings", variant="primary")
                settings_status = gr.HTML()
        
        # Footer
        gr.Markdown("---")
        with gr.Row():
            gr.Markdown(
                "*Built with Gradio & ChromaDB | Module 4.5 Demo*",
                elem_classes="footer-text"
            )
        
        # =====================================================================
        # EVENT HANDLERS
        # =====================================================================
        
        def index_documents(files):
            """Index uploaded documents."""
            if not files:
                return (
                    "<div class='error-message'>⚠️ Please upload at least one file.</div>",
                    []
                )
            
            results = []
            for file in files:
                try:
                    content = read_file_content(file.name)
                    filename = os.path.basename(file.name)
                    chunks = rag_system.index_document(filename, content)
                    results.append((filename, chunks))
                except Exception as e:
                    results.append((os.path.basename(file.name), f"Error: {str(e)}"))
            
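            # Update stats
            stats = rag_system.get_stats()
            # Count only files that produced chunks; failed files carry an error string
            indexed = sum(1 for _, chunks in results if isinstance(chunks, int))
            status_html = f"""
            <div class='success-message'>
                ✅ Indexed {indexed} of {len(files)} document(s).<br>
                Total: {stats['total_documents']} documents, {stats['total_chunks']} chunks
            </div>
            """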
            
            # Format document list
            doc_data = [[name, chunks] for name, chunks in results]
            
            return status_html, doc_data
        
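        def clear_documents():
            """Clear all indexed documents."""
            # Drop and recreate the collection so the index is actually emptied
            rag_system.client.delete_collection(name="documents")
            rag_system.collection = rag_system.client.create_collection(
                name="documents",
                metadata={"hnsw:space": "cosine"}
            )
            rag_system.documents.clear()
            return (
                "<div class='stats-card'>All documents cleared. Upload new files to get started.</div>",
                []
            )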
        
        def chat_response(message, history, settings):
            """Generate a response to the user's question."""
            if not message.strip():
                return history, "", "<p>Please enter a question.</p>"
            
            try:
                # Query the RAG system
                answer, sources = rag_system.query(
                    message,
                    n_results=settings.get("n_results", 3)
                )
                
                # Format sources HTML
                if sources:
                    sources_html = "<div>"
                    for i, src in enumerate(sources, 1):
                        sources_html += f"""
                        <div class='source-citation'>
                            <strong>Source {i}:</strong> {src['source']}<br>
                            <em>{src['text']}</em>
                        </div>
                        """
                    sources_html += "</div>"
                else:
                    sources_html = "<p style='color: #666;'>No sources found.</p>"
                
                # Update history
                history = history + [[message, answer]]
                
                return history, "", sources_html
            
            except Exception as e:
                error_msg = "I apologize, but I encountered an error. Please try again or rephrase your question."
                history = history + [[message, error_msg]]
                return history, "", f"<p style='color: red;'>Error: {str(e)}</p>"
        
        def clear_chat():
            """Clear chat history."""
            return [], "", "<p style='color: #666;'>Sources will appear here after asking a question.</p>"
        
        def set_example(example):
            """Set an example question."""
            return example
        
        def save_settings(n_res, temp):
            """Save settings."""
            new_settings = {
                "n_results": int(n_res),
                "temperature": float(temp)
            }
            return (
                new_settings,
                "<div class='success-message'>✅ Settings saved!</div>"
            )
        
        # Wire up events
        index_btn.click(
            index_documents,
            inputs=[files],
            outputs=[status_box, doc_list]
        )
        
        clear_btn.click(
            clear_documents,
            outputs=[status_box, doc_list]
        )
        
        send_btn.click(
            chat_response,
            inputs=[msg, chatbot, settings],
            outputs=[chatbot, msg, sources_display]
        )
        
        msg.submit(
            chat_response,
            inputs=[msg, chatbot, settings],
            outputs=[chatbot, msg, sources_display]
        )
        
        clear_chat_btn.click(
            clear_chat,
            outputs=[chatbot, msg, sources_display]
        )
        
        examples_dropdown.change(
            set_example,
            inputs=[examples_dropdown],
            outputs=[msg]
        )
        
        save_settings_btn.click(
            save_settings,
            inputs=[n_results, temperature],
            outputs=[settings, settings_status]
        )
    
    return demo

print("✅ Demo interface created!")
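
The sources panel built in `chat_response` interpolates document text straight into HTML, so a chunk containing `<` or `&` could break the layout. Below is a minimal sketch of a helper that escapes the text first. The name `format_sources_html` is hypothetical (it is not part of the lab code), and it assumes each source is a dict with the same `source` and `text` keys that `SimpleRAG.query` returns.

```python
import html
from typing import Dict, List


def format_sources_html(sources: List[Dict]) -> str:
    """Render retrieved sources as HTML, escaping text taken from user documents."""
    if not sources:
        return "<p style='color: #666;'>No sources found.</p>"
    parts = ["<div>"]
    for i, src in enumerate(sources, 1):
        # Escape both fields: file names and chunk text come from uploaded files
        name = html.escape(str(src["source"]))
        text = html.escape(str(src["text"]))
        parts.append(
            "<div class='source-citation'>"
            f"<strong>Source {i}:</strong> {name}<br><em>{text}</em></div>"
        )
    parts.append("</div>")
    return "\n".join(parts)


demo_sources = [{"source": "notes.txt", "text": "Use <b>bold</b> & italics"}]
print(format_sources_html(demo_sources))
```

A helper like this could replace the inline loop in `chat_response`, which would also keep the handler shorter.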

## Part 6: Launch the Demo

In [None]:
# Create and launch the demo
demo = create_rag_demo()

# Launch in notebook
demo.launch(inline=True, share=False)

## Part 7: Export for Deployment

Let's export this as a standalone app for Hugging Face Spaces.

In [None]:
# Close the demo before exporting
demo.close()

In [None]:
# Create deployment files
import os

deploy_dir = '/tmp/rag_demo_deploy'
os.makedirs(deploy_dir, exist_ok=True)

# Write the complete app.py file
app_py = '''"""RAG Document Q&A Demo - Hugging Face Spaces Deployment"""

import gradio as gr
import chromadb
from sentence_transformers import SentenceTransformer
import os
import hashlib
from datetime import datetime
from typing import List, Tuple, Dict

# ============================================================================
# RAG BACKEND
# ============================================================================

class SimpleRAG:
    def __init__(self):
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
        self.client = chromadb.Client()
        self.collection = self.client.create_collection(
            name="documents",
            metadata={"hnsw:space": "cosine"}
        )
        self.documents = {}
    
    def chunk_text(self, text, chunk_size=500, overlap=50):
        chunks = []
        start = 0
        while start < len(text):
            end = start + chunk_size
            chunks.append(text[start:end])
            start = end - overlap
        return chunks
    
    def index_document(self, filename, content):
        doc_id = hashlib.md5(filename.encode()).hexdigest()[:8]
        chunks = self.chunk_text(content)
        embeddings = self.embedder.encode(chunks).tolist()
        ids = [f"{doc_id}_{i}" for i in range(len(chunks))]
        metadatas = [{"source": filename, "chunk_id": i} for i in range(len(chunks))]
        self.collection.add(ids=ids, embeddings=embeddings, documents=chunks, metadatas=metadatas)
        self.documents[filename] = {"chunks": len(chunks)}
        return len(chunks)
    
    def query(self, question, n_results=3):
        if self.collection.count() == 0:
            return "Please upload documents first!", []
        query_emb = self.embedder.encode([question])[0].tolist()
        results = self.collection.query(query_embeddings=[query_emb], n_results=n_results)
        sources = []
        for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
            sources.append({"source": meta["source"], "text": doc[:200] + "..."})
        answer = f"Based on {len(sources)} sources, here\'s what I found about: {question}"
        return answer, sources
    
    def get_stats(self):
        return {"total_documents": len(self.documents), "total_chunks": self.collection.count()}

rag = SimpleRAG()

# ============================================================================
# INTERFACE
# ============================================================================

theme = gr.themes.Soft(primary_hue="blue", secondary_hue="slate")

with gr.Blocks(theme=theme, title="Document Q&A") as demo:
    gr.Markdown("# 📚 Document Q&A Assistant")
    
    with gr.Tabs():
        with gr.TabItem("📁 Documents"):
            files = gr.File(label="Upload", file_count="multiple", file_types=[".pdf", ".txt", ".md"])
            index_btn = gr.Button("Index", variant="primary")
            status = gr.Textbox(label="Status", interactive=False)
            
            def index_docs(files):
                if not files:
                    return "Upload files first"
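                for f in files:
                    if f.name.lower().endswith(".pdf"):
                        # Extract text with pypdf; reading a PDF as plain text yields garbage
                        from pypdf import PdfReader
                        text = "\\n".join((p.extract_text() or "") for p in PdfReader(f.name).pages)
                    else:
                        with open(f.name, "r", errors="ignore") as fp:
                            text = fp.read()
                    rag.index_document(os.path.basename(f.name), text)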
                stats = rag.get_stats()
                return f"Indexed {stats['total_documents']} docs, {stats['total_chunks']} chunks"
            
            index_btn.click(index_docs, [files], [status])
        
        with gr.TabItem("💬 Chat"):
            chatbot = gr.Chatbot(height=400)
            msg = gr.Textbox(label="Question")
            sources_box = gr.Textbox(label="Sources", lines=5, interactive=False)
            
            def respond(message, history):
                answer, sources = rag.query(message)
                history.append((message, answer))
                # Backslash is doubled so the generated app.py contains "\n", not a raw line break
                src_text = "\\n".join([f"- {s['source']}: {s['text']}" for s in sources])
                return history, "", src_text
            
            msg.submit(respond, [msg, chatbot], [chatbot, msg, sources_box])

if __name__ == "__main__":
    demo.launch()
'''

with open(f'{deploy_dir}/app.py', 'w') as f:
    f.write(app_py)

# requirements.txt
requirements = '''gradio>=4.44.0
chromadb>=0.4.0
sentence-transformers>=2.2.0
pypdf>=3.0.0
'''

with open(f'{deploy_dir}/requirements.txt', 'w') as f:
    f.write(requirements)

# README.md
readme = '''---
title: RAG Document Q&A
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: true
---

# Document Q&A with RAG

Upload documents and ask questions using Retrieval-Augmented Generation.
'''

with open(f'{deploy_dir}/README.md', 'w') as f:
    f.write(readme)

print(f"✅ Deployment files created in: {deploy_dir}")
print(f"\nFiles created:")
for f in os.listdir(deploy_dir):
    print(f"  - {f}")
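
Because `app.py` is generated from a string literal, escape sequences such as `\n` are interpreted when the notebook cell runs, which can silently corrupt the generated source. One way to catch this before uploading is to parse the text first. The following is a rough sketch using the standard-library `ast` module; `check_python_source` is a hypothetical helper name, and in the notebook it could be called as `check_python_source(app_py)`.

```python
import ast
from typing import Optional


def check_python_source(source: str, filename: str = "app.py") -> Optional[str]:
    """Return None if `source` parses as Python, otherwise a short error message."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


# The generated file must contain the two characters backslash + n ...
good = 'text = "\\n".join(["a", "b"])\n'
# ... not a real line break, which splits the string literal in two
bad = 'text = "\n".join(["a", "b"])\n'

print(check_python_source(good))             # → None
print(check_python_source(bad) is not None)  # → True
```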

---

## ✋ Lab Exercises

### Exercise 1: Add Streaming Responses

Modify the `chat_response` function to stream the answer word by word instead of returning it all at once.

<details>
<summary>💡 Hint</summary>

Use a generator function:
```python
def streaming_response(message, history):
    answer, sources = rag_system.query(message)
    partial_answer = ""
    for word in answer.split():
        partial_answer += word + " "
        yield history + [[message, partial_answer]], sources
```
</details>

### Exercise 2: Add Confidence Indicators

Modify the sources display to show a confidence score based on the similarity scores from ChromaDB.
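
As a starting point, here is a sketch of turning a distance into a displayable score. It assumes the collection uses the cosine space configured in `SimpleRAG`, and that the distance reported by `collection.query()` is `1 - cosine similarity`; check this against your ChromaDB version. The function names and the label thresholds are arbitrary placeholders, not part of the lab code.

```python
def distance_to_confidence(distance: float) -> float:
    """Map a cosine distance to a 0-1 score, assuming distance = 1 - cosine similarity."""
    similarity = 1.0 - distance
    # Clamp, since cosine distance can range up to 2 for opposite vectors
    return max(0.0, min(1.0, similarity))


def confidence_label(score: float) -> str:
    """Turn a score into a coarse label; the thresholds are placeholders to tune."""
    if score >= 0.7:
        return "High"
    if score >= 0.4:
        return "Medium"
    return "Low"


# Hypothetical distances, shaped like results["distances"][0] from collection.query()
for d in [0.18, 0.55, 0.93]:
    score = distance_to_confidence(d)
    print(f"distance={d:.2f} -> confidence={score:.0%} ({confidence_label(score)})")
```

The score is a similarity measure between the question and a chunk, not a probability that the answer is correct, so the UI wording should reflect that.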

### Exercise 3: Add Export Functionality

Add a button to export the chat history as a text file.
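
One possible shape for the export logic is sketched below. It assumes the chat history is the list of `[question, answer]` pairs used by `chat_response`. The helper name `export_chat_history` is hypothetical, and wiring its returned path to a `gr.File` output through a button click is left as part of the exercise.

```python
import tempfile
from typing import List, Sequence


def export_chat_history(history: List[Sequence[str]]) -> str:
    """Write (question, answer) pairs to a temporary .txt file and return its path."""
    lines = []
    for turn, (question, answer) in enumerate(history, 1):
        lines.append(f"--- Turn {turn} ---")
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
        lines.append("")
    # delete=False keeps the file on disk so it can be offered for download
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", prefix="chat_history_", delete=False, encoding="utf-8"
    ) as f:
        f.write("\n".join(lines))
        return f.name


path = export_chat_history([["What is RAG?", "Retrieval-Augmented Generation."]])
with open(path, encoding="utf-8") as f:
    print(f.read())
```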

---

## 🎉 Checkpoint

You've built:
- ✅ A complete RAG backend with ChromaDB
- ✅ Multi-tab Gradio interface
- ✅ Document upload and indexing
- ✅ Chat interface with source citations
- ✅ Custom theming
- ✅ Deployment-ready files

---

## 📤 Deliverable

1. Deploy your RAG demo to Hugging Face Spaces
2. Test with at least 3 different documents
3. Share the URL with your instructor

---

## 🧹 Cleanup

In [None]:
# Cleanup
import gc
gc.collect()

print("✅ Lab complete! Ready for deployment.")