v0.2.0

a1exus released this 12 May 23:42

Added

  • llama-cpp/ stack: GPU-accelerated llama.cpp server (image ghcr.io/ggml-org/llama.cpp:server-cuda, pinned by digest). aarch64+CUDA confirmed on GB10 (compute capability 12.1, 124 GiB VRAM). OpenAI-compatible API + web UI fronted by Caddy at https://llama.${CADDY_DOMAIN}. Default model is gpt-oss-safeguard-120b, auto-downloaded from HuggingFace as a workaround for the Ollama pull bug (ollama/ollama#16121). New Caddy site block + mDNS alias.
  • llama-cpp: read-only mounts of Ollama's blob store (open-webui-ollama external volume) and the host's HuggingFace CLI cache, plus a MODEL_PATH env var so llama-server can skip downloading and reuse any file from those caches.
  • Direct Caddy-fronted access to the Ollama API at https://ollama.${CADDY_DOMAIN} (no auth, LAN-trust). The ollama container joins the shared web network in addition to internal. New Caddyfile.d/ollama.caddyfile + mDNS alias.
  • mdns/Makefile with install / uninstall / list / help targets. Replaces the install.sh / uninstall.sh pair.
  • open-webui/README.md and .github/README.md so each component documents itself.
  • Dedicated .github/workflows/trivy.md with the full Trivy workflow doc; .github/README.md is now a thin workflow index.
  • Trivy: relaxed the extract-tags regex to allow the @ and : characters so digest-pinned tags (server-cuda@sha256:…) are accepted; added llama-cpp to the image-scan matrix.
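
The Trivy tag-regex change above can be illustrated with a minimal Python sketch. The workflow's actual pattern is not reproduced in these notes, so STRICT_TAG, RELAXED_TAG, and the helper is_accepted are hypothetical names, and the digest is a dummy value used only for illustration.

```python
import re

# Hypothetical "before" pattern: plain tag characters only.
STRICT_TAG = re.compile(r"[A-Za-z0-9_.-]+")

# Hypothetical "after" pattern: the same class plus '@' and ':',
# which is what a digest-pinned tag needs.
RELAXED_TAG = re.compile(r"[A-Za-z0-9_.@:-]+")


def is_accepted(tag: str, pattern: re.Pattern = RELAXED_TAG) -> bool:
    """Return True when the whole tag string matches the given pattern."""
    return pattern.fullmatch(tag) is not None


# Dummy digest (64 hex characters), not the real pinned digest.
pinned = "server-cuda@sha256:" + "a" * 64

print(is_accepted(pinned, STRICT_TAG))   # rejected by the stricter class
print(is_accepted(pinned, RELAXED_TAG))  # accepted once '@' and ':' are allowed
```

Widening the character class is the smallest possible change, but it also lets through strings such as "a@b:c" that are not valid digests. A stricter variant could match the "@sha256:" suffix explicitly; whether the workflow does so is not stated here.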

Changed

  • Slimmed the top-level README.md to an overview plus per-component links; per-stack details now live in each directory's README.md. Added a table of contents.
  • Split caddy/Caddyfile into per-service files under caddy/Caddyfile.d/<name>.caddyfile, loaded via import. Adding a new app is now a single file drop + reload.
  • .gitignore: added host-local /opt trees we don't manage in this repo (containerd, MicronTechnology, nvidia, NVIDIA AI Workbench).

Removed

  • HTTP basic auth in front of Netdata. The dashboard exposes read-only telemetry on a trusted LAN; one more password to manage was friction without meaningful security gain. Use Netdata Cloud (SSO/MFA) or an OAuth forward-auth proxy if you want real auth.