LLMOps

A production-ready LLMOps setup for running Ollama (LLM inference engine) with Open WebUI (web interface) on Linux with NVIDIA GPU acceleration.

Overview

This project provides a complete, automated setup for local LLM operations with the following design principles:

GPU stays on the host (not inside Docker) for direct CUDA access
Ollama runs as a systemd service for stability and automatic updates
Open WebUI runs in Docker for easy management and updates
Automated maintenance via systemd timers and Watchtower
Production-ready with proper systemd integration and boot auto-start

Architecture

Windows/Linux Client
   │
   │ HTTP (Browser)
   ▼
Open WebUI (Docker container)
   │
   │ HTTP API
   ▼
Ollama (systemd service on Linux host)
   │
   │ CUDA
   ▼
NVIDIA GPU (on host)

Key Components:

Ollama Service: Runs natively on the Linux host with direct GPU access via CUDA
Open WebUI: Web-based chat interface running in Docker, connects to host Ollama service
Auto-updates: Systemd timer for Ollama binary and models, Watchtower for Docker containers
Test Scripts: Python examples for testing Ollama API endpoints

Project Structure

llmops/
├── README.md                    # This file - project overview
├── pyproject.toml               # Python project configuration (uv)
├── uv.lock                      # Dependency lock file
├── .python-version              # Python version specification
│
├── ollama/                      # Ollama setup and automation
│   ├── README.md                # Detailed Ollama setup instructions
│   ├── setup/                   # Systemd units and update scripts
│   │   ├── ollama_autoupdate_script      # Auto-update script
│   │   ├── ollama_autoupdate.service     # Systemd service
│   │   ├── ollama_autoupdate.timer       # Systemd timer (daily updates)
│   │   ├── ollama_commands               # CLI reference
│   │   ├── ollama_model_list             # Models to auto-update
│   │   └── ollama_models_table.sh        # Model info generator
│   └── test/                    # Test and example scripts
│       ├── main.py              # Main entry point
│       ├── test_ollama_chat.py           # Chat API example
│       ├── test_ollama_streaming.py      # Streaming API example
│       └── test_ollama_translategemma.py # Translation example
│
└── openwebui/                   # Open WebUI Docker setup
    ├── README.md                # Detailed Open WebUI setup instructions
    ├── docker-compose.yaml      # Docker Compose configuration
    └── .env                     # Environment configuration (not in repo)

Prerequisites

Hardware

Linux host (Ubuntu 20.04+ or similar)
NVIDIA GPU with CUDA support
Sufficient disk space for models (50GB+ recommended)

Software

Linux with systemd
Docker and Docker Compose plugin
NVIDIA Driver and CUDA toolkit
Python 3.13+ (for test scripts)
uv package manager (optional, for test scripts)

Quick Start

1. Install Ollama

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

# Pull a model
ollama pull llama3.1

See ollama/README.md for:

Systemd auto-update setup (daily updates for Ollama binary and models)
Model management
API usage examples

2. Set Up Open WebUI

# Create installation directory
sudo mkdir -p /opt/openwebui

# Copy docker-compose.yaml from this repo (adjust path to your repo location)
sudo cp /path/to/llmops/openwebui/docker-compose.yaml /opt/openwebui/

# Generate secret key in proper format
cd /opt/openwebui
echo "WEBUI_SECRET_KEY=$(openssl rand -hex 32)" | sudo tee .env
sudo chmod 600 .env

# Start services
sudo docker compose up -d

Access Open WebUI at http://localhost:3000

See openwebui/README.md for:

Detailed configuration
Systemd auto-start setup
Watchtower auto-updates
Troubleshooting

Features

Ollama Features

✅ Native GPU acceleration (CUDA)
✅ Automated daily updates (binary + models)
✅ Systemd service integration
✅ Lock-based update safety
✅ Comprehensive logging
✅ Multiple model support

Open WebUI Features

✅ Modern web-based chat interface
✅ Multi-model support
✅ Chat history and management
✅ Auto-updates via Watchtower
✅ Systemd auto-start on boot
✅ Secure session management

Test Scripts

✅ Chat API examples with code generation (test_ollama_chat.py using qwen2.5-coder)
✅ Streaming API examples (test_ollama_streaming.py)
✅ Translation examples (test_ollama_translategemma.py)

Models Included

The setup supports various models for different use cases:

Qwen2.5-Coder (14B) - Code generation and assistance
Llama 3.1 - General-purpose LLM
TranslateGemma - Translation (55 languages)
Llava - Vision-language model
Deepseek R1 - Advanced reasoning
Mistral - Efficient LLM
Nomic Embed Text - Text embeddings

Testing

Run the included test scripts to verify your setup:

# From the repository root, create virtual environment (using uv)
uv sync

# Run test scripts
uv run python ollama/test/test_ollama_chat.py
uv run python ollama/test/test_ollama_streaming.py
uv run python ollama/test/test_ollama_translategemma.py

Maintenance

Ollama Updates

Automatic: Daily via systemd timer (configured in ollama/setup/)
Manual: sudo systemctl start ollama-autoupdate.service

Open WebUI Updates

Automatic: Every 24 hours via Watchtower
Manual: cd /opt/openwebui && sudo docker compose pull && sudo docker compose up -d

Logs

# Ollama service logs
journalctl -u ollama.service -n 50

# Ollama auto-update logs
journalctl -u ollama-autoupdate.service -n 50
tail -n 100 /var/log/ollama_update.log

# Open WebUI logs
docker logs -f openwebui

# Watchtower logs
docker logs watchtower

Troubleshooting

Common Issues

Issue	Solution
Open WebUI can't connect to Ollama	Ensure Ollama is bound to `0.0.0.0:11434` (see openwebui/README.md)
No GPU acceleration	Verify `nvidia-smi` works and CUDA is properly installed
Models not updating	Check `/var/log/ollama_update.log` and systemd timer status
Session resets in UI	Don't change `WEBUI_SECRET_KEY` in `.env`

See component-specific READMEs for detailed troubleshooting:

ollama/README.md - Ollama issues
openwebui/README.md - Open WebUI issues

Resources

Documentation

Project Links

Community

Design Rationale

This setup follows these principles:

GPU Outside Docker: Running Ollama natively on the host avoids Docker CUDA complexity and provides better performance
Systemd Integration: Proper service management ensures stability and automatic recovery
Separation of Concerns: UI layer (Docker) is separate from inference layer (host service)
Automated Maintenance: Reduces manual intervention with safe, tested automation
Production-Ready: Designed for long-running, reliable operation

License

This project configuration is provided as-is for setting up and managing Ollama and Open WebUI. Please refer to the respective projects for their licenses:

Ollama: MIT License
Open WebUI: MIT License

Contributing

This is a personal LLMOps setup. Feel free to fork and adapt for your needs. Suggestions and improvements are welcome via issues or pull requests.

Status

✅ Production-ready for single-host deployment
✅ CUDA GPU acceleration verified
✅ Automated updates working
✅ Boot auto-start configured
✅ Test scripts validated

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
ollama		ollama
openwebui		openwebui
.python-version		.python-version
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

LLMOps

Overview

Architecture

Project Structure

Prerequisites

Hardware

Software

Quick Start

1. Install Ollama

2. Set Up Open WebUI

Features

Ollama Features

Open WebUI Features

Test Scripts

Models Included

Testing

Maintenance

Ollama Updates

Open WebUI Updates

Logs

Troubleshooting

Common Issues

Resources

Documentation

Project Links

Community

Design Rationale

License

Contributing

Status

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages