A production-ready LLMOps setup for running Ollama (LLM inference engine) with Open WebUI (web interface) on Linux with NVIDIA GPU acceleration.
This project provides a complete, automated setup for local LLM operations with the following design principles:
- GPU stays on the host (not inside Docker) for direct CUDA access
- Ollama runs as a systemd service for stability and automatic updates
- Open WebUI runs in Docker for easy management and updates
- Automated maintenance via systemd timers and Watchtower
- Production-ready with proper systemd integration and boot auto-start
Windows/Linux Client
│
│ HTTP (Browser)
▼
Open WebUI (Docker container)
│
│ HTTP API
▼
Ollama (systemd service on Linux host)
│
│ CUDA
▼
NVIDIA GPU (on host)
Key Components:
- Ollama Service: Runs natively on the Linux host with direct GPU access via CUDA
- Open WebUI: Web-based chat interface running in Docker, connects to host Ollama service
- Auto-updates: Systemd timer for Ollama binary and models, Watchtower for Docker containers
- Test Scripts: Python examples for testing Ollama API endpoints
llmops/
├── README.md # This file - project overview
├── pyproject.toml # Python project configuration (uv)
├── uv.lock # Dependency lock file
├── .python-version # Python version specification
│
├── ollama/ # Ollama setup and automation
│ ├── README.md # Detailed Ollama setup instructions
│ ├── setup/ # Systemd units and update scripts
│ │ ├── ollama_autoupdate_script # Auto-update script
│ │ ├── ollama_autoupdate.service # Systemd service
│ │ ├── ollama_autoupdate.timer # Systemd timer (daily updates)
│ │ ├── ollama_commands # CLI reference
│ │ ├── ollama_model_list # Models to auto-update
│ │ └── ollama_models_table.sh # Model info generator
│ └── test/ # Test and example scripts
│ ├── main.py # Main entry point
│ ├── test_ollama_chat.py # Chat API example
│ ├── test_ollama_streaming.py # Streaming API example
│ └── test_ollama_translategemma.py # Translation example
│
└── openwebui/ # Open WebUI Docker setup
├── README.md # Detailed Open WebUI setup instructions
├── docker-compose.yaml # Docker Compose configuration
└── .env # Environment configuration (not in repo)
- Linux host (Ubuntu 20.04+ or similar)
- NVIDIA GPU with CUDA support
- Sufficient disk space for models (50GB+ recommended)
- Linux with systemd
- Docker and Docker Compose plugin
- NVIDIA Driver and CUDA toolkit
- Python 3.13+ (for test scripts)
- uv package manager (optional, for test scripts)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
# Pull a model
ollama pull llama3.1See ollama/README.md for:
- Systemd auto-update setup (daily updates for Ollama binary and models)
- Model management
- API usage examples
# Create installation directory
sudo mkdir -p /opt/openwebui
# Copy docker-compose.yaml from this repo (adjust path to your repo location)
sudo cp /path/to/llmops/openwebui/docker-compose.yaml /opt/openwebui/
# Generate secret key in proper format
cd /opt/openwebui
echo "WEBUI_SECRET_KEY=$(openssl rand -hex 32)" | sudo tee .env
sudo chmod 600 .env
# Start services
sudo docker compose up -dAccess Open WebUI at http://localhost:3000
See openwebui/README.md for:
- Detailed configuration
- Systemd auto-start setup
- Watchtower auto-updates
- Troubleshooting
- ✅ Native GPU acceleration (CUDA)
- ✅ Automated daily updates (binary + models)
- ✅ Systemd service integration
- ✅ Lock-based update safety
- ✅ Comprehensive logging
- ✅ Multiple model support
- ✅ Modern web-based chat interface
- ✅ Multi-model support
- ✅ Chat history and management
- ✅ Auto-updates via Watchtower
- ✅ Systemd auto-start on boot
- ✅ Secure session management
- ✅ Chat API examples with code generation (
test_ollama_chat.pyusing qwen2.5-coder) - ✅ Streaming API examples (
test_ollama_streaming.py) - ✅ Translation examples (
test_ollama_translategemma.py)
The setup supports various models for different use cases:
- Qwen2.5-Coder (14B) - Code generation and assistance
- Llama 3.1 - General-purpose LLM
- TranslateGemma - Translation (55 languages)
- Llava - Vision-language model
- Deepseek R1 - Advanced reasoning
- Mistral - Efficient LLM
- Nomic Embed Text - Text embeddings
Run the included test scripts to verify your setup:
# From the repository root, create virtual environment (using uv)
uv sync
# Run test scripts
uv run python ollama/test/test_ollama_chat.py
uv run python ollama/test/test_ollama_streaming.py
uv run python ollama/test/test_ollama_translategemma.py- Automatic: Daily via systemd timer (configured in
ollama/setup/) - Manual:
sudo systemctl start ollama-autoupdate.service
- Automatic: Every 24 hours via Watchtower
- Manual:
cd /opt/openwebui && sudo docker compose pull && sudo docker compose up -d
# Ollama service logs
journalctl -u ollama.service -n 50
# Ollama auto-update logs
journalctl -u ollama-autoupdate.service -n 50
tail -n 100 /var/log/ollama_update.log
# Open WebUI logs
docker logs -f openwebui
# Watchtower logs
docker logs watchtower| Issue | Solution |
|---|---|
| Open WebUI can't connect to Ollama | Ensure Ollama is bound to 0.0.0.0:11434 (see openwebui/README.md) |
| No GPU acceleration | Verify nvidia-smi works and CUDA is properly installed |
| Models not updating | Check /var/log/ollama_update.log and systemd timer status |
| Session resets in UI | Don't change WEBUI_SECRET_KEY in .env |
See component-specific READMEs for detailed troubleshooting:
- ollama/README.md - Ollama issues
- openwebui/README.md - Open WebUI issues
This setup follows these principles:
- GPU Outside Docker: Running Ollama natively on the host avoids Docker CUDA complexity and provides better performance
- Systemd Integration: Proper service management ensures stability and automatic recovery
- Separation of Concerns: UI layer (Docker) is separate from inference layer (host service)
- Automated Maintenance: Reduces manual intervention with safe, tested automation
- Production-Ready: Designed for long-running, reliable operation
This project configuration is provided as-is for setting up and managing Ollama and Open WebUI. Please refer to the respective projects for their licenses:
- Ollama: MIT License
- Open WebUI: MIT License
This is a personal LLMOps setup. Feel free to fork and adapt for your needs. Suggestions and improvements are welcome via issues or pull requests.
✅ Production-ready for single-host deployment
✅ CUDA GPU acceleration verified
✅ Automated updates working
✅ Boot auto-start configured
✅ Test scripts validated