# Task 12.6: Ollama Web UI Integration

**Module:** 12 - Model Deployment & Inference Engines  
**Time:** 2 hours  
**Difficulty:** ⭐⭐

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Set up Open WebUI for Ollama
- [ ] Configure and customize the interface
- [ ] Add model presets and system prompts
- [ ] Enable advanced features like RAG and web search

---

## 📚 Prerequisites

- Completed: Previous tasks in this module
- Running: Ollama with at least one model pulled
- Docker installed (for Open WebUI)

---

## 🌍 Real-World Context

**Why use a Web UI?**

While APIs are great for integration, a web interface is valuable for:
- **Team use**: Non-technical colleagues can interact with models
- **Experimentation**: Quickly test different models and prompts
- **Documentation**: Conversation history is preserved
- **RAG demos**: Upload documents and query them
- **Model comparison**: Side-by-side model testing

**Open WebUI** (formerly Ollama WebUI) is a popular choice: it is feature-rich, actively maintained, and built with first-class Ollama support. It can also connect to OpenAI-compatible APIs.

---

## 🧒 ELI5: Web UI vs API

> **Think of ordering food...**
>
> **API** = Ordering through a restaurant's phone line.
> You need to know the exact format: "I'd like one pepperoni pizza, medium, extra cheese."
> Perfect for apps that order automatically.
>
> **Web UI** = Ordering at the restaurant with a menu.
> You see pictures, read descriptions, and click what you want.
> Perfect for humans exploring options.
>
> **Both talk to the same kitchen (Ollama)**, just with different interfaces!

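To make the "phone line" side of the analogy concrete, here is a minimal sketch of the request an application might send to Ollama's `/api/generate` endpoint. The helper names `build_generate_request` and `ask_ollama` are illustrative, not part of any library, and the model name assumes `llama3.1:8b` has already been pulled. The network call is left commented out so the cell runs even when Ollama is stopped.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address used throughout this notebook


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Send one prompt to Ollama and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"]


# This payload is the "exact format" the API expects.
payload = build_generate_request("llama3.1:8b", "Why is the sky blue?")
print(json.dumps(payload, indent=2))

# With Ollama running, uncomment to send it:
# print(ask_ollama("llama3.1:8b", "Why is the sky blue?"))
```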
---

## Part 1: Setting Up Open WebUI

Open WebUI is the leading interface for Ollama. Let's set it up.

In [None]:
import requests

# Check Ollama status
def check_ollama():
    """Check if Ollama is running and what models are available."""
    try:
        response = requests.get("http://localhost:11434/api/tags", timeout=5)
        if response.status_code == 200:
            models = response.json().get("models", [])
            return True, [m["name"] for m in models]
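    except (requests.RequestException, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        pass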
    return False, []

ollama_running, models = check_ollama()

print("🔍 System Check")
print("=" * 50)

if ollama_running:
    print("✅ Ollama is running")
    # Show at most five model names and mark the list as truncated if there are more.
    shown = ", ".join(models[:5]) + ("..." if len(models) > 5 else "")
    print(f"   Available models: {shown}")
else:
    print("❌ Ollama is not running")
    print("   Start with: ollama serve")
    print("   Then pull a model: ollama pull llama3.1:8b")

### 🐳 Installing Open WebUI

Open WebUI runs as a Docker container. Here are the setup options:

In [None]:
# Generate Open WebUI Docker commands

def generate_webui_commands(port: int = 3000, data_dir: str = "~/.open-webui"):
    """Generate Docker commands for Open WebUI."""
    
    basic = f"""
# Basic Setup (connects to Ollama on host)
# Note: Ollama runs separately on host, models stored in ~/.ollama
docker run -d -p {port}:8080 \\
    --add-host=host.docker.internal:host-gateway \\
    -v {data_dir}:/app/backend/data \\
    --name open-webui \\
    --restart always \\
    ghcr.io/open-webui/open-webui:main
"""
    
    with_gpu = f"""
# With GPU Support (for embeddings/local processing)
docker run -d -p {port}:8080 \\
    --gpus all \\
    --add-host=host.docker.internal:host-gateway \\
    -v {data_dir}:/app/backend/data \\
    --name open-webui \\
    --restart always \\
    ghcr.io/open-webui/open-webui:cuda
"""
    
    with_bundled_ollama = f"""
# All-in-One (Open WebUI + Ollama bundled)
# IMPORTANT: Mount ~/.ollama to persist downloaded models
docker run -d -p {port}:8080 \\
    --gpus all \\
    -v {data_dir}:/app/backend/data \\
    -v ~/.ollama:/root/.ollama \\
    --name open-webui \\
    --restart always \\
    ghcr.io/open-webui/open-webui:ollama
"""
    
    return {
        "basic": basic,
        "with_gpu": with_gpu,
        "bundled": with_bundled_ollama
    }

commands = generate_webui_commands()

print("🐳 Open WebUI Docker Setup Options")
print("=" * 60)
print("\n📦 OPTION 1: Basic (Ollama already running on host)")
print(commands["basic"])
print("\n📦 OPTION 2: With GPU (for local embeddings)")
print(commands["with_gpu"])
print("\n📦 OPTION 3: All-in-One (includes Ollama)")
print(commands["bundled"])
print("\n💡 Key volumes to persist data:")
print("   • ~/.open-webui:/app/backend/data - Chat history, settings")
print("   • ~/.ollama:/root/.ollama - Downloaded models (for bundled option)")

In [None]:
# Check if Open WebUI is already running
def check_webui(port: int = 3000):
    """Check if Open WebUI is running."""
    try:
        response = requests.get(f"http://localhost:{port}/", timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        # Connection refused or timed out: the UI is not reachable.
        return False

if check_webui():
    print("✅ Open WebUI is running!")
    print("   Access it at: http://localhost:3000")
else:
    print("❌ Open WebUI is not running")
    print("   Use one of the Docker commands above to start it")
    print("   After starting, access at: http://localhost:3000")

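The container can take a little while to start answering after `docker run` returns, so a single check may report a false negative. A small polling helper, sketched below, retries a check until it passes or a deadline expires. `wait_until` is a hypothetical helper, not part of Open WebUI; it accepts any zero-argument check, such as the `check_webui` function defined in the previous cell.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call check() repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demo with a stand-in check that succeeds on its third attempt.
attempts = {"count": 0}


def fake_check() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(fake_check, timeout=5.0, interval=0.01))

# Real usage (assumes check_webui from the previous cell is defined):
# wait_until(check_webui, timeout=120)
```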
---

## Part 2: Initial Configuration

After starting Open WebUI for the first time:

### 🔧 First-Time Setup

1. **Create Admin Account**
   - Navigate to `http://localhost:3000`
   - Create your admin account (first user is admin)
   - This account controls all settings

2. **Verify Ollama Connection**
   - Go to Settings → Connections
   - Ollama URL should be: `http://host.docker.internal:11434`
   - Click "Verify" to test the connection

3. **Configure Models**
   - Go to Settings → Models
   - Your Ollama models should appear automatically
   - Set a default model for new chats

In [None]:
# Configuration recommendations

config_recommendations = {
    "General Settings": [
        "Enable 'Show Username in Chat' for multi-user setups",
        "Set a default system prompt for consistent behavior",
        "Configure chat history retention",
    ],
    "Model Settings": [
        "Set default parameters (temperature: 0.7, max tokens: 2048)",
        "Create model presets for different use cases",
        "Enable 'Show Response Stats' to see tokens/sec",
    ],
    "Interface Settings": [
        "Enable dark mode for long sessions",
        "Configure keyboard shortcuts",
        "Enable code syntax highlighting",
    ],
    "Advanced Features": [
        "Enable RAG for document upload",
        "Configure web search integration",
        "Set up custom tools/functions",
    ]
}

print("⚙️ Recommended Configuration")
print("=" * 50)

for category, recommendations in config_recommendations.items():
    print(f"\n📋 {category}:")
    for rec in recommendations:
        print(f"   • {rec}")

---

## Part 3: Creating Model Presets

Model presets let you save configurations for different use cases.

In [None]:
# Model preset configurations

model_presets = {
    "Code Assistant": {
        "base_model": "llama3.1:8b",
        "system_prompt": """You are an expert programmer and software engineer. 
You help with coding tasks by:
1. Writing clean, well-documented code
2. Explaining concepts clearly
3. Suggesting best practices
4. Debugging issues step by step

Always include code examples when relevant. Use markdown code blocks with language specification.""",
        "parameters": {
            "temperature": 0.3,  # Lower for more precise code
            "top_p": 0.9,
            "max_tokens": 2048
        }
    },
    "Creative Writer": {
        "base_model": "llama3.1:8b",
        "system_prompt": """You are a creative writing assistant with expertise in storytelling, 
poetry, and various writing styles. You help users:
1. Generate creative content
2. Develop characters and plots
3. Improve writing style
4. Overcome writer's block

Be imaginative and expressive. Adapt your style to the user's preferences.""",
        "parameters": {
            "temperature": 0.9,  # Higher for creativity
            "top_p": 0.95,
            "max_tokens": 4096
        }
    },
    "Research Assistant": {
        "base_model": "llama3.1:8b",
        "system_prompt": """You are a research assistant helping with information synthesis and analysis.
You help users:
1. Summarize complex topics
2. Compare different viewpoints
3. Identify key insights
4. Suggest further reading

Always be accurate and cite limitations when unsure. Structure responses clearly.""",
        "parameters": {
            "temperature": 0.5,
            "top_p": 0.9,
            "max_tokens": 2048
        }
    },
    "Concise Helper": {
        "base_model": "llama3.1:8b",
        "system_prompt": "You are a helpful assistant. Be concise - answer in 1-3 sentences unless more detail is requested.",
        "parameters": {
            "temperature": 0.7,
            "top_p": 0.9,
            "max_tokens": 256
        }
    }
}

print("🎨 Model Preset Configurations")
print("=" * 60)

for name, config in model_presets.items():
    print(f"\n📌 {name}")
    print(f"   Base Model: {config['base_model']}")
    print(f"   Temperature: {config['parameters']['temperature']}")
    print(f"   Max Tokens: {config['parameters']['max_tokens']}")
    print(f"   System Prompt: {config['system_prompt'][:80]}...")

### 📝 Creating a Preset in Open WebUI

1. Go to **Workspace → Models**
2. Click **Create a model**
3. Fill in:
   - Name (e.g., "Code Assistant")
   - Base model (select from dropdown)
   - System prompt (paste from above)
   - Parameters (temperature, etc.)
4. Click **Save**

The preset will now appear in your model dropdown!

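A preset can also be baked into Ollama itself, outside Open WebUI, as a Modelfile. The sketch below renders a preset dictionary into Modelfile text. Two assumptions are worth flagging: `preset_to_modelfile` is an illustrative helper, and the preset's `max_tokens` key is mapped to Ollama's `num_predict` parameter, which is the closest equivalent. The model name `concise-helper` in the final comment is likewise just an example.

```python
def preset_to_modelfile(preset: dict) -> str:
    """Render a preset dict (same shape as the presets above) as Ollama Modelfile text."""
    # Assumption: Ollama's output-length limit is called num_predict, not max_tokens.
    name_map = {"max_tokens": "num_predict"}
    lines = [f"FROM {preset['base_model']}"]
    for key, value in preset.get("parameters", {}).items():
        lines.append(f"PARAMETER {name_map.get(key, key)} {value}")
    # Triple quotes allow a multi-line system prompt inside a Modelfile.
    lines.append(f'SYSTEM """{preset["system_prompt"]}"""')
    return "\n".join(lines)


example_preset = {
    "base_model": "llama3.1:8b",
    "system_prompt": "You are a helpful assistant. Be concise.",
    "parameters": {"temperature": 0.7, "top_p": 0.9, "max_tokens": 256},
}

modelfile_text = preset_to_modelfile(example_preset)
print(modelfile_text)

# Save the output as "Modelfile", then run:
#   ollama create concise-helper -f Modelfile
```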
---

## Part 4: RAG (Document Upload) Setup

Open WebUI supports RAG - upload documents and chat with them!

In [None]:
# RAG configuration guide

rag_config = {
    "Embedding Model": {
        "option": "Settings → Documents → Embedding Model",
        "recommendation": "Use 'nomic-embed-text' for good balance of speed/quality",
        "setup": "ollama pull nomic-embed-text"
    },
    "Chunk Size": {
        "option": "Settings → Documents → Chunk Size",
        "recommendation": "1000-1500 characters for most documents",
        "note": "Larger = more context but less precision"
    },
    "Top K Results": {
        "option": "Settings → Documents → Top K",
        "recommendation": "3-5 for focused retrieval, 10+ for broad coverage",
        "note": "More results = slower but potentially more complete"
    },
    "Supported Formats": {
        "formats": ["PDF", "TXT", "DOCX", "MD", "CSV", "JSON"],
        "note": "PDF requires additional processing"
    }
}

print("📚 RAG Configuration Guide")
print("=" * 60)

for setting, details in rag_config.items():
    print(f"\n🔧 {setting}:")
    for key, value in details.items():
        if key != "setup":
            print(f"   {key}: {value}")
        else:
            print(f"   Setup command: {value}")

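To build intuition for the chunk-size trade-off, the sketch below splits text into fixed-size character chunks with a small overlap and counts the result for a few sizes. It is a simplified illustration, not Open WebUI's actual splitter; `chunk_text` and its `overlap` parameter are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Stand-in for a 2,500-character document.
document = "x" * 2500
for size in (500, 1000, 1500):
    chunks = chunk_text(document, chunk_size=size, overlap=100)
    print(f"chunk_size={size}: {len(chunks)} chunks")
```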
### 📤 Using Document Upload

1. **Pull an embedding model** (if not done):
   ```bash
   ollama pull nomic-embed-text
   ```

2. **Configure in Open WebUI**:
   - Settings → Documents → Select embedding model
   - Enable RAG

3. **Upload documents**:
   - Click the **+** button in chat
   - Select "Upload document"
   - Wait for processing

4. **Query your documents**:
   - Start with "Based on the uploaded document..."
   - The model will search and cite relevant sections

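Conceptually, vector retrieval ranks chunks by how close their embeddings are to the query's embedding and passes the Top K to the model. The sketch below shows that ranking step with hand-made three-dimensional vectors standing in for real embeddings. `top_k_chunks` is an illustrative helper, the vectors and texts are invented for the demo, and the similarity measure Open WebUI actually uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the texts of the k chunks whose vectors are most similar to query_vec."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": made-up vectors, not output from a real embedding model.
toy_chunks = [
    ("Refund policy: 30 days", [0.9, 0.1, 0.0]),
    ("Shipping takes 3-5 days", [0.1, 0.9, 0.0]),
    ("The office dog is named Biscuit", [0.0, 0.1, 0.9]),
]
query_vector = [0.8, 0.2, 0.0]  # pretend this encodes "Can I get my money back?"

print(top_k_chunks(query_vector, toy_chunks, k=2))
```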
---

## Part 5: Advanced Features

Let's explore some power features of Open WebUI.

In [None]:
# Advanced features overview

advanced_features = {
    "Web Search": {
        "description": "Let the model search the web for current information",
        "setup": "Settings → Web Search → Enable (requires SearXNG or similar)",
        "use_case": "Current events, recent updates, fact-checking"
    },
    "Function Calling": {
        "description": "Define custom tools the model can use",
        "setup": "Workspace → Functions → Create",
        "use_case": "Calculations, API calls, custom integrations"
    },
    "Pipelines": {
        "description": "Chain multiple models or add preprocessing",
        "setup": "Settings → Pipelines → Configure",
        "use_case": "Translation → Response, RAG → Generation"
    },
    "Voice Input": {
        "description": "Speak to the model using Whisper",
        "setup": "Settings → Audio → Enable STT (Whisper)",
        "use_case": "Hands-free interaction, accessibility"
    },
    "Image Generation": {
        "description": "Generate images with DALL-E or local models",
        "setup": "Settings → Images → Configure backend",
        "use_case": "Creative content, diagrams, illustrations"
    },
    "Arena Mode": {
        "description": "Compare responses from multiple models",
        "setup": "Toggle in chat interface",
        "use_case": "Model evaluation, quality comparison"
    }
}

print("🚀 Advanced Open WebUI Features")
print("=" * 60)

for feature, details in advanced_features.items():
    print(f"\n✨ {feature}")
    print(f"   Description: {details['description']}")
    print(f"   Setup: {details['setup']}")
    print(f"   Use case: {details['use_case']}")

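As a rough idea of what a custom tool can look like, the sketch below assumes your Open WebUI version accepts tools written as a Python class named `Tools`, whose typed, documented methods are exposed to the model. Check the documentation for your release before relying on this layout. The method `percent_change` is a made-up example, and the class can be tested locally as plain Python before it is pasted into the editor.

```python
class Tools:
    """Assumed layout: a class named Tools whose methods become callable tools."""

    def percent_change(self, old_value: float, new_value: float) -> str:
        """
        Calculate the percentage change between two numbers.

        :param old_value: The starting value.
        :param new_value: The ending value.
        :return: A sentence describing the change.
        """
        if old_value == 0:
            return "Percentage change is undefined when the starting value is 0."
        change = (new_value - old_value) / old_value * 100
        return f"The change from {old_value} to {new_value} is {change:.1f}%."


# Local sanity check before pasting the class into Open WebUI.
print(Tools().percent_change(80, 100))
```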
---

## Part 6: Multi-User Setup

For team deployments, you may want to enable multi-user features.

In [None]:
# Multi-user configuration

multiuser_config = {
    "User Management": {
        "Admin Panel": "Accessible via Settings → Admin (admin users only)",
        "User Roles": ["Admin", "User", "Pending"],
        "Registration": "Can be enabled/disabled in admin settings"
    },
    "Access Control": {
        "Model Access": "Restrict which models users can access",
        "Feature Toggles": "Enable/disable features per user role",
        "Rate Limiting": "Per-user request limits"
    },
    "Data Isolation": {
        "Chats": "Each user's chats are private by default",
        "Documents": "Uploaded docs can be personal or shared",
        "Presets": "Personal vs shared model presets"
    }
}

print("👥 Multi-User Configuration")
print("=" * 60)

for category, details in multiuser_config.items():
    print(f"\n📌 {category}:")
    for key, value in details.items():
        if isinstance(value, list):
            print(f"   {key}: {', '.join(value)}")
        else:
            print(f"   {key}: {value}")

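To show what a per-user request limit means in practice, here is a conceptual sliding-window limiter. It is only a sketch of the general idea and says nothing about how Open WebUI enforces limits internally; the class `PerUserRateLimiter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PerUserRateLimiter:
    """Allow at most max_requests per user within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user -> timestamps of accepted requests

    def allow(self, user: str, now: Optional[float] = None) -> bool:
        """Record and accept the request if the user is under the limit."""
        now = time.monotonic() if now is None else now
        timestamps = self._history[user]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True


# Two requests per 60 seconds; explicit timestamps keep the demo deterministic.
limiter = PerUserRateLimiter(max_requests=2, window_seconds=60)
results = [
    limiter.allow("alice", now=0.0),
    limiter.allow("alice", now=1.0),
    limiter.allow("alice", now=2.0),   # third request inside the window
    limiter.allow("bob", now=2.0),     # a different user has a separate budget
    limiter.allow("alice", now=60.5),  # the request from t=0 has expired
]
print(results)
```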
---

## ⚠️ Common Mistakes

### Mistake 1: Ollama Connection Issues

```bash
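# ❌ Wrong - inside the container, localhost is the container itself, not the host
OLLAMA_BASE_URL=http://localhost:11434

# ✅ Right - Use Docker's host gateway
# (on Linux this also needs --add-host=host.docker.internal:host-gateway)
OLLAMA_BASE_URL=http://host.docker.internal:11434

# Or on Linux, share the host network. localhost then reaches Ollama,
# and the UI is served on port 8080 because -p mappings are ignored:
--network=host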
```

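If you generate container settings from a script, the same fix can be applied programmatically. The sketch below rewrites a loopback Ollama URL to the Docker host alias and leaves any other host untouched. `container_safe_url` is a hypothetical helper, and it deliberately ignores credentials embedded in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def container_safe_url(url: str, host_alias: str = "host.docker.internal") -> str:
    """Replace a loopback host with the Docker host alias, keeping scheme, port, and path."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url  # already points at a host the container can reach
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(container_safe_url("http://localhost:11434"))
print(container_safe_url("http://192.168.1.10:11434"))
```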
### Mistake 2: Data Not Persisting

```bash
# ❌ Wrong - Data lost when the container is removed or recreated (e.g., on upgrade)
docker run -d ghcr.io/open-webui/open-webui:main

# ✅ Right - Mount a volume
docker run -d -v ~/.open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main
```

### Mistake 3: RAG Not Working

```bash
# ❌ Wrong - Embedding model not installed
# RAG uploads but retrieval fails

# ✅ Right - Install embedding model first
ollama pull nomic-embed-text
# Then configure in Settings → Documents
```

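A quick way to catch this mistake is to check the model list before enabling RAG. The sketch below assumes you already have a list of installed model names, for example the second value returned by `check_ollama()` in Part 1, where names carry a tag such as `nomic-embed-text:latest`. `has_model` is an illustrative helper.

```python
def has_model(installed: list[str], wanted: str) -> bool:
    """Return True if `wanted` is installed.

    A name without a tag (e.g. "nomic-embed-text") matches any installed tag;
    a name with a tag (e.g. "llama3.1:8b") must match exactly.
    """
    if ":" in wanted:
        return wanted in installed
    return any(name.split(":", 1)[0] == wanted for name in installed)


# Example list in the shape returned by Ollama's /api/tags endpoint.
installed_models = ["llama3.1:8b", "nomic-embed-text:latest"]

print(has_model(installed_models, "nomic-embed-text"))
print(has_model(installed_models, "mxbai-embed-large"))
```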
---

## ✋ Try It Yourself

### Exercise 1: Create a Custom Preset

Create a model preset for a specific use case you care about.

In [None]:
# Exercise 1: Design your custom preset

my_preset = {
    "name": "My Custom Assistant",  # TODO: Name it
    "base_model": "llama3.1:8b",
    "system_prompt": """TODO: Write your system prompt here.
    
Think about:
- What role should the assistant have?
- What style should it use?
- What constraints or guidelines?
""",
    "parameters": {
        "temperature": 0.7,  # TODO: Adjust for your use case
        "max_tokens": 1024,  # TODO: Adjust based on expected response length
    }
}

# Print for copying into Open WebUI
print("📝 Your Custom Preset:")
print("=" * 50)
print(f"Name: {my_preset['name']}")
print(f"System Prompt:\n{my_preset['system_prompt']}")
print(f"\nParameters: {my_preset['parameters']}")

### Exercise 2: Set Up RAG

Configure RAG and upload a document to test it.

In [None]:
# Exercise 2: RAG testing checklist

rag_checklist = [
    "[ ] Pull nomic-embed-text: ollama pull nomic-embed-text",
    "[ ] Configure embedding model in Settings → Documents",
    "[ ] Upload a test document (PDF, TXT, or MD)",
    "[ ] Ask a question that requires document knowledge",
    "[ ] Verify citations are shown in the response",
]

print("📋 RAG Setup Checklist:")
for item in rag_checklist:
    print(f"   {item}")

---

## 🎉 Checkpoint

You've learned:
- ✅ How to set up Open WebUI for Ollama
- ✅ Creating and configuring model presets
- ✅ Setting up RAG for document uploads
- ✅ Advanced features like web search and function calling

---

## 🚀 Challenge (Optional)

**Build a Custom Knowledge Base**

Create a specialized assistant by:
1. Uploading 10+ related documents
2. Creating a custom preset with domain-specific system prompt
3. Testing retrieval quality
4. Documenting your findings

---

## 📖 Further Reading

- [Open WebUI Documentation](https://docs.openwebui.com/)
- [Open WebUI GitHub](https://github.com/open-webui/open-webui)
- [Ollama Model Library](https://ollama.com/library)
- [RAG Best Practices](https://www.llamaindex.ai/blog/a-guide-to-rag-evaluation)

---

## 🧹 Cleanup

In [None]:
print("✅ Module 12 Complete!")
print("\n📋 What you've accomplished:")
print("   ✓ Benchmarked inference engines")
print("   ✓ Deployed vLLM with continuous batching")
print("   ✓ Explored TensorRT-LLM optimization")
print("   ✓ Learned speculative decoding")
print("   ✓ Built a production FastAPI server")
print("   ✓ Set up Open WebUI")
print("\n🚀 You're ready for production deployments!")