Skip to content

atulkumar2/llmops

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LLMOps

A production-ready LLMOps setup for running Ollama (LLM inference engine) with Open WebUI (web interface) on Linux with NVIDIA GPU acceleration.

Overview

This project provides a complete, automated setup for local LLM operations with the following design principles:

  • GPU stays on the host (not inside Docker) for direct CUDA access
  • Ollama runs as a systemd service for stability and automatic updates
  • Open WebUI runs in Docker for easy management and updates
  • Automated maintenance via systemd timers and Watchtower
  • Production-ready with proper systemd integration and boot auto-start

Architecture

Windows/Linux Client
   │
   │ HTTP (Browser)
   ▼
Open WebUI (Docker container)
   │
   │ HTTP API
   ▼
Ollama (systemd service on Linux host)
   │
   │ CUDA
   ▼
NVIDIA GPU (on host)

Key Components:

  • Ollama Service: Runs natively on the Linux host with direct GPU access via CUDA
  • Open WebUI: Web-based chat interface running in Docker, connects to host Ollama service
  • Auto-updates: Systemd timer for Ollama binary and models, Watchtower for Docker containers
  • Test Scripts: Python examples for testing Ollama API endpoints

Project Structure

llmops/
├── README.md                    # This file - project overview
├── pyproject.toml               # Python project configuration (uv)
├── uv.lock                      # Dependency lock file
├── .python-version              # Python version specification
│
├── ollama/                      # Ollama setup and automation
│   ├── README.md                # Detailed Ollama setup instructions
│   ├── setup/                   # Systemd units and update scripts
│   │   ├── ollama_autoupdate_script      # Auto-update script
│   │   ├── ollama_autoupdate.service     # Systemd service
│   │   ├── ollama_autoupdate.timer       # Systemd timer (daily updates)
│   │   ├── ollama_commands               # CLI reference
│   │   ├── ollama_model_list             # Models to auto-update
│   │   └── ollama_models_table.sh        # Model info generator
│   └── test/                    # Test and example scripts
│       ├── main.py              # Main entry point
│       ├── test_ollama_chat.py           # Chat API example
│       ├── test_ollama_streaming.py      # Streaming API example
│       └── test_ollama_translategemma.py # Translation example
│
└── openwebui/                   # Open WebUI Docker setup
    ├── README.md                # Detailed Open WebUI setup instructions
    ├── docker-compose.yaml      # Docker Compose configuration
    └── .env                     # Environment configuration (not in repo)

Prerequisites

Hardware

  • Linux host (Ubuntu 20.04+ or similar)
  • NVIDIA GPU with CUDA support
  • Sufficient disk space for models (50GB+ recommended)

Software

  • Linux with systemd
  • Docker and Docker Compose plugin
  • NVIDIA Driver and CUDA toolkit
  • Python 3.13+ (for test scripts)
  • uv package manager (optional, for test scripts)

Quick Start

1. Install Ollama

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

# Pull a model
ollama pull llama3.1

See ollama/README.md for:

  • Systemd auto-update setup (daily updates for Ollama binary and models)
  • Model management
  • API usage examples

2. Set Up Open WebUI

# Create installation directory
sudo mkdir -p /opt/openwebui

# Copy docker-compose.yaml from this repo (adjust path to your repo location)
sudo cp /path/to/llmops/openwebui/docker-compose.yaml /opt/openwebui/

# Generate secret key in proper format
cd /opt/openwebui
echo "WEBUI_SECRET_KEY=$(openssl rand -hex 32)" | sudo tee .env
sudo chmod 600 .env

# Start services
sudo docker compose up -d

Access Open WebUI at http://localhost:3000

See openwebui/README.md for:

  • Detailed configuration
  • Systemd auto-start setup
  • Watchtower auto-updates
  • Troubleshooting

Features

Ollama Features

  • ✅ Native GPU acceleration (CUDA)
  • ✅ Automated daily updates (binary + models)
  • ✅ Systemd service integration
  • ✅ Lock-based update safety
  • ✅ Comprehensive logging
  • ✅ Multiple model support

Open WebUI Features

  • ✅ Modern web-based chat interface
  • ✅ Multi-model support
  • ✅ Chat history and management
  • ✅ Auto-updates via Watchtower
  • ✅ Systemd auto-start on boot
  • ✅ Secure session management

Test Scripts

  • ✅ Chat API examples with code generation (test_ollama_chat.py using qwen2.5-coder)
  • ✅ Streaming API examples (test_ollama_streaming.py)
  • ✅ Translation examples (test_ollama_translategemma.py)

Models Included

The setup supports various models for different use cases:

Testing

Run the included test scripts to verify your setup:

# From the repository root, create virtual environment (using uv)
uv sync

# Run test scripts
uv run python ollama/test/test_ollama_chat.py
uv run python ollama/test/test_ollama_streaming.py
uv run python ollama/test/test_ollama_translategemma.py

Maintenance

Ollama Updates

  • Automatic: Daily via systemd timer (configured in ollama/setup/)
  • Manual: sudo systemctl start ollama-autoupdate.service

Open WebUI Updates

  • Automatic: Every 24 hours via Watchtower
  • Manual: cd /opt/openwebui && sudo docker compose pull && sudo docker compose up -d

Logs

# Ollama service logs
journalctl -u ollama.service -n 50

# Ollama auto-update logs
journalctl -u ollama-autoupdate.service -n 50
tail -n 100 /var/log/ollama_update.log

# Open WebUI logs
docker logs -f openwebui

# Watchtower logs
docker logs watchtower

Troubleshooting

Common Issues

Issue Solution
Open WebUI can't connect to Ollama Ensure Ollama is bound to 0.0.0.0:11434 (see openwebui/README.md)
No GPU acceleration Verify nvidia-smi works and CUDA is properly installed
Models not updating Check /var/log/ollama_update.log and systemd timer status
Session resets in UI Don't change WEBUI_SECRET_KEY in .env

See component-specific READMEs for detailed troubleshooting:

Resources

Documentation

Project Links

Community

Design Rationale

This setup follows these principles:

  1. GPU Outside Docker: Running Ollama natively on the host avoids Docker CUDA complexity and provides better performance
  2. Systemd Integration: Proper service management ensures stability and automatic recovery
  3. Separation of Concerns: UI layer (Docker) is separate from inference layer (host service)
  4. Automated Maintenance: Reduces manual intervention with safe, tested automation
  5. Production-Ready: Designed for long-running, reliable operation

License

This project configuration is provided as-is for setting up and managing Ollama and Open WebUI. Please refer to the respective projects for their licenses:

  • Ollama: MIT License
  • Open WebUI: MIT License

Contributing

This is a personal LLMOps setup. Feel free to fork and adapt for your needs. Suggestions and improvements are welcome via issues or pull requests.

Status

✅ Production-ready for single-host deployment
✅ CUDA GPU acceleration verified
✅ Automated updates working
✅ Boot auto-start configured
✅ Test scripts validated

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors