You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Added
tailscale/ stack: Tailscale sidecar — third ingress path alongside the LAN (mDNS) and public (Cloudflare Tunnel) ones. Registers this host as a node on your tailnet (spark-1822.<tailnet>.ts.net) and runs Tailscale Serve (config in serve.json) to terminate TLS on tailnet :443 with a real publicly-trusted MagicDNS cert and reverse-proxy to http://traefik:80. Joins the external traefik Docker network so backend container names resolve. Userspace networking mode (no /dev/net/tun, no privileged caps); node key + machine state in a named docker volume so secrets stay out of /opt. tailscale/README.md documents the Host-header routing caveat (Traefik routers match *.spark-1822.local, so tailnet hostnames need to be added to existing rule= labels) plus a "Hardening" section calling out four deferred follow-ups vs. Tailscale's production guidance: file-mounted auth secret over plain env, OAuth client credentials over static auth keys, --advertise-tags=tag:server for ACL identity, and kernel networking mode for throughput.
llama-cpp/ router mode (now the default): when no MODEL_* env var is set, llama-server starts with --models-dir /models and serves every GGUF in the symlink farm built by make hf-sync, loading on demand. Up to MODELS_MAX models stay resident in VRAM; LRU evicts the rest. Per-model overrides in an auto-generated config.ini with managed-fields semantics — hf-sync owns the model = line; user-edits to other keys (ctx-size, n-gpu-layers, hand-added alias, etc.) are preserved across regenerations; removed GGUFs are moved to config.ini.orphans (restored verbatim if they come back). Classic single-model mode (MODEL_PATH / MODEL_OLLAMA / MODEL_URL) still works for one-off pinning. Each GGUF is reachable under three IDs in /v1/models: the short alias from the config section name, the bare filename, and an HF-style <org>/<repo>:<quant> the router auto-derives from the symlink target path. New LLAMA_API_KEY bearer-token auth — required when the endpoint is reachable from anywhere other than 127.0.0.1 (Cloudflare Tunnel path is genuine internet exposure); generated by openssl rand -hex 32. Two new helper scripts: llama-cpp/scripts/sync-router.sh (builds the symlink farm; enumerates via hf cache scan with a find fallback) and llama-cpp/scripts/regen-config-ini.py (managed-fields INI regenerator with atomic writes). New Make target make models pretty-prints /v1/models from the running container. make up now accepts an empty ENV= to start in router mode.
Spec + plan docs under docs/superpowers/specs/ and docs/superpowers/plans/ recording the router-mode rollout (locked-in decisions, design, verification plan, and post-deployment findings).
Changed
llama-cpp/.env.example + entrypoint.sh + the hf-sync per-variant template: default CTX_SIZE 8192 → 32768. Matches what most contemporary 27B–120B GGUFs handle comfortably without YaRN tricks; per-variant overrides still win (uncomment CTX_SIZE=… in envs/<name>.env).
vllm/.env.example + entrypoint.sh + the hf-sync per-variant template: default VLLM_MAX_LEN 8192 → 32768 (same rationale as llama-cpp).
Top-level README.md: tailscale/ added to the intro bullets, Topology diagram (three ingress paths now), Layout tree, Components table, and First-time setup (new optional step 6, mirroring the Cloudflare Tunnel step).
Trivy: tailscale/tailscale added to the image-scan matrix. extract-tags now reads TAILSCALE_TAG from tailscale/.env.example (same strict regex as the other tags). trivy.md jobs table updated to list the new image.
tailscale/serve.json: now exposes both schemes on the tailnet node. :80 → http://traefik:80 (Traefik's web entrypoint 308-redirects to HTTPS, sending the client back to :443). :443 → https-insecure://traefik:443 (Tailscale terminates with the MagicDNS cert, then proxies to Traefik's websecure entrypoint; -insecure skips Traefik's wildcard-cert verification since the inner hop is container-to-container over the traefik Docker network). The previous single-listener config (:443 → http://traefik:80) would have caused a 308 redirect loop now that the dashboard router lives behind Host-based routing only. AllowFunnel: false for both ports — tailnet members only.
tailscale/services.json: new file. Tailscale VIP Service endpoint config — a separate tailnet entity from the node, with its own MagicDNS name and cert. Endpoints proxy to Traefik (tcp:80 → http://traefik:80, tcp:443 → https+insecure://traefik:443). Single shared file applied per-service with --service=svc:<name>; distinct schema from serve.json (per Tailscale Services configuration file — single-service ServiceDetailsFile form: flat version + endpoints, no outer wrapper).
tailscale/Makefile: new. make services-apply / services-status / services-clear loops over six per-backend VIP services (svc:traefik, svc:vllm, svc:llama, svc:ollama, svc:open-webui, svc:netdata) — pushes services.json into the container, runs tailscale serve set-config --service=<svc> + serve advertise <svc> for each. Daemon state persists in the tailscale-state volume. Service creation in the admin console (https://login.tailscale.com/admin/services) is one-time and manual — Tailscale doesn't expose a CLI for that. Tailnet clients hit https://<svc>.<tailnet>.ts.net and land on the matching Traefik backend.
Every Traefik router relaxed from <svc>.spark{x:.+} to <svc>.{x:.+}. The new form accepts both the LAN names (vllm.spark-1822.local) and the per-service tailnet names (vllm.<tailnet>.ts.net) without hardcoding either domain. Slightly more permissive than before — any <svc>.<anything> hitting Traefik will match its router — but the reachability surface (LAN + tailnet + Cloudflare Tunnel) is unchanged, so it's a non-issue in practice.
Legacy svc:spark (single-service-for-everything) kept working for backward compatibility: the Traefik dashboard router carries an || HostRegexp(spark{x:.+}) fallback so https://spark.<tailnet>.ts.net/ still lands on the dashboard. Retire it later — see "Legacy svc:spark" in tailscale/README.md.
Every Traefik router's rule= switched from a hardcoded Host() literal to a single HostRegexp(.spark{x:.+}) matcher. Matches the per-service subdomain form across any LAN/tailnet/Cloudflare domain (vllm.spark-1822.local, vllm.sparky.example.com, …). Six routers updated: ollama, open-webui, vllm, llama (label-based, in each app's compose) and netdata, traefik (file-based, in traefik/dynamic/services.yml). Renames of the hostname (spark-1822 → anything starting with spark) or the LAN/tailnet/public domain no longer require touching any rule. The Traefik dashboard router (traefik in dynamic/services.yml) additionally carries a second matcher HostRegexp(spark{x:.+}), so the bare tailnet hostname Tailscale Serve forwards (spark-1822.<tailnet>.ts.net) lands on the dashboard — Tailscale Serve only knows the node's MagicDNS hostname, and the dashboard is the most useful default landing. tailscale/README.md "Host-header routing" and traefik/README.md "Add an app" sections track the final state.
Image-pin policy flipped: float by default, pin in production. Every committed .env.example now uses a floating tag (latest for ollama / open-webui / netdata / tailscale / vllm / cloudflared; v2 for traefik because latest would resolve to v3 and silently break the named-capture HostRegexp rules; server-cuda for ggml-org/llama.cpp since no latest tag exists for that image). Operators override in their host-local .env with an immutable tag or content-digest pin for reproducibility — each .env.example carries the previous explicit pin in a comment + a release-page link. Root README's "Conventions" section rewritten to reflect the new rule (was: "never :latest"). Trivy's extract-tags regex (^[A-Za-z0-9._@:+-]+$) already accepted floating tags — no CI logic change; only stale "pinned" wording was cleaned up in .github/workflows/trivy.md and the cron comment.
llama-cpp/: GPU exclusivity reclassified. Router mode and Ollama are both lazy — neither claims VRAM until a model is loaded — so they coexist on the GB10's 124 GiB. The actually-exclusive engines in this stack are vLLM (--gpu-memory-utilization 0.9 reserves ~90% of VRAM eagerly) and llama-cpp's classic single-model mode (-ngl 999 pins every layer at startup). README's old "GPU exclusivity" subsection rewritten as "GPU sharing" with a posture-by-engine table. Documents that Ollama's restart: unless-stopped brings it back on Docker daemon restarts even after an explicit stop — so hard exclusivity needs ongoing intent.
llama-cpp/README.md clarifications: made explicit that the OpenAI-compatible API and the built-in web UI share one port (8080) on the same Traefik router, discriminated by path — there is no separate UI port. New "Router quirks worth knowing" subsection covering: /v1/models is unauthenticated by OpenAI-compat convention (only /v1/chat/completions enforces the bearer), each part of a multi-part split GGUF appears as its own model ID (only part 1 is loadable), loading the same physical file under two IDs counts twice toward MODELS_MAX, and gpt-oss-* models need the harmony chat template (default ChatML produces a 500 on parse). README "Pinning the image" snippet updated for ghcr.io's switch from Docker manifest list to OCI image index (the old Accept: application/vnd.docker.distribution.manifest.list.v2+json now returns 404; the new snippet sends OCI first), and adds a --help sanity check to catch upstream builds with broken RPATH/RUNPATH before deployment.
Removed
caddy/ stack: dropped. Traefik covers the same routing surface and is a better fit for a multi-container setup — discovery via Docker labels means adding a stack is a label drop, not a Caddyfile.d/*.caddyfile edit + reload. Eliminates the "two reverse proxies, pick one" footgun. Every sibling stack (open-webui/, vllm/, llama-cpp/, netdata/) had Caddy-as-backup phrasing scrubbed from its README + compose comments; ${CADDY_DOMAIN}-style URL templates replaced with the canonical literal https://<svc>.spark-1822.local (the HostRegexp rule accepts any <svc>.spark*.<domain>, so tailnet and Cloudflare Tunnel hostnames also resolve through the same routers). Trivy's image-scan matrix lost the caddy row. Host migration: docker compose -f /opt/caddy/docker-compose.yml down -v (also drops the caddy-data volume holding Caddy's internal CA), then sudo rm -rf /opt/caddy after the next git pull. Anything that was trusting caddy-root.crt should be re-pointed at traefik-root.crt (per-OS install steps in traefik/README.md).
Security
New LLAMA_API_KEY bearer-token auth on llama-cpp/ (see Added). Required for non-127.0.0.1 exposure paths. /v1/chat/completions returns 401 without it; /v1/models stays open by OpenAI-compat convention.
Known finding: Trivy flagged CVE-2026-33186 (gRPC-Go authorization bypass via HTTP/2 path validation, fixed in google.golang.org/grpc 1.79.3) inside cloudflare/cloudflared:2026.5.0's embedded Go binary. Cloudflare hadn't shipped a rebuild as of this release; with the new floating-tag convention, the next docker compose pull picks up the rebuild automatically once it ships. Practical risk for this host is low — cloudflared is a gRPC client to Cloudflare's edge (the vector requires an attacker-controlled server). The entry in .github/workflows/trivy.md → "Known findings" tracks it.