You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
llama-cpp/ stack: GPU-accelerated llama.cpp server (image ghcr.io/ggml-org/llama.cpp:server-cuda, pinned by digest). aarch64+CUDA confirmed on GB10 (compute capability 12.1, 124 GiB VRAM). OpenAI-compatible API + web UI fronted by Caddy at https://llama.${CADDY_DOMAIN}. Default model is gpt-oss-safeguard-120b via HuggingFace auto-download — workaround for the Ollama pull bug (ollama/ollama#16121). New Caddy site block + mDNS alias.
llama-cpp: read-only mounts of Ollama's blob store (open-webui-ollama external volume) and the host's HuggingFace CLI cache, plus a MODEL_PATH env var so llama-server can skip downloading and reuse any file from those caches.
Direct Caddy-fronted access to the Ollama API at https://ollama.${CADDY_DOMAIN} (no auth, LAN-trust). The ollama container joins the shared web network in addition to internal. New Caddyfile.d/ollama.caddyfile + mDNS alias.
mdns/Makefile with install / uninstall / list / help targets. Replaces the install.sh / uninstall.sh pair.
open-webui/README.md and .github/README.md so each component documents itself.
Dedicated .github/workflows/trivy.md with the full Trivy workflow doc; .github/README.md is now a thin workflow index.
Trivy: relaxed extract-tags regex to allow @: so digest-pinned tags (server-cuda@sha256:…) are accepted; added llama-cpp to the image-scan matrix.
Changed
Slim top-level README.md to an overview + per-component links; per-stack details now live in each directory's README.md. Added a table-of-contents.
Split caddy/Caddyfile into per-service files under caddy/Caddyfile.d/<name>.caddyfile, loaded via import. Adding a new app is now a single file drop + reload.
.gitignore: added host-local /opt trees we don't manage in this repo (containerd, MicronTechnology, nvidia, NVIDIA AI Workbench).
Removed
HTTP basic auth in front of Netdata. The dashboard exposes read-only telemetry on a trusted LAN; one more password to manage was friction without meaningful security gain. Use Netdata Cloud (SSO/MFA) or an OAuth forward-auth proxy if you want real auth.