sparky.svg — project mascot / logo (full redesign from the previous version, no shared elements). A quiet speech-bubble face on warm cream paper: closed thoughtful eyes, a soft smile, a small terracotta asterisk in the upper-right (Anthropic nod), and a blinking cursor next to the sparky wordmark. Drops the previous chip-and-LED robot aesthetic for something warmer and less generic — language is the medium I actually work in, not silicon. Hand-authored SVG, animates the cursor on platforms that honor SMIL. AI's self-portrait, drawn by Claude.
mdns/Makefile: ergonomic per-alias targets, make add ALIAS=<name> / make remove ALIAS=<name> / make logs ALIAS=<name> / make resolve ALIAS=<name>. Auto-detects the host's .local domain (HOST=$(hostname).local by default; overridable). Replaces having to know the systemd template-unit syntax (sparky-mdns-alias@<name>.<host>.local).
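For reference, the instance name those targets assemble can be sketched in Python. This is a hypothetical helper (unit_instance is not part of the repo), assuming the Makefile simply joins the alias and HOST into the template-unit instance named in this entry.

```python
import socket
from typing import Optional

# Template prefix taken from the changelog entry; the helper itself is hypothetical.
UNIT_PREFIX = "sparky-mdns-alias@"


def default_host() -> str:
    """Mirror the Makefile default HOST=$(hostname).local."""
    return f"{socket.gethostname()}.local"


def unit_instance(alias: str, host: Optional[str] = None) -> str:
    """Template-unit instance name for one alias, e.g. for `make add ALIAS=vllm`."""
    if not alias or any(c.isspace() for c in alias):
        raise ValueError(f"invalid alias: {alias!r}")
    return f"{UNIT_PREFIX}{alias}.{host or default_host()}"


if __name__ == "__main__":
    print(unit_instance("vllm"))
```

Passing host explicitly plays the role of overriding HOST on the make command line.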
traefik/ stack: HTTPS reverse proxy, primary front-facing entry on this host. Routes for label-able containers (vllm., llama., ollama., open-webui.) come from traefik.* Docker labels on each app's compose; the docker provider discovers them via a read-only mount of /var/run/docker.sock. Routes that can't be label-driven (netdata. in host-network mode, the traefik. dashboard itself) live in dynamic/services.yml via the file provider. Traefik mints its own internal root CA (make ca-cert, 10-year RSA-4096) and signs a 365-day wildcard leaf (make wildcard-cert); clients install traefik-root.crt to trust it. Let's Encrypt scaffolding is in place (commented out in traefik.yml + docker-compose.yml); uncomment and activate it once this host has a publicly-resolvable DNS name.
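As an illustration of what the traefik.* labels on an app's compose look like, here is a hypothetical helper that produces the label set for one service. The keys are Traefik v2's router/service label keys; the entrypoint name websecure, the backend port 8000, and the helper itself are assumptions, not copied from this repo's compose files.

```python
from typing import Dict


def traefik_labels(name: str, domain: str, port: int, entrypoint: str = "websecure") -> Dict[str, str]:
    """Labels routing https://<name>.<domain> to container port `port` (sketch)."""
    router = f"traefik.http.routers.{name}"
    return {
        "traefik.enable": "true",
        # Host() rule: Traefik selects this router by the request's Host header.
        f"{router}.rule": f"Host(`{name}.{domain}`)",
        f"{router}.entrypoints": entrypoint,
        # Terminate TLS on this router; the certificate comes from Traefik's TLS config.
        f"{router}.tls": "true",
        # Container port the load balancer forwards to.
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


if __name__ == "__main__":
    # Print in docker-compose `labels:` list form.
    for key, value in traefik_labels("vllm", "spark-1822.local", 8000).items():
        print(f'      - "{key}={value}"')
```

Caddy never reads these keys, which is why the same compose file stays usable when Caddy is the active proxy.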
cloudflare/ stack: Cloudflare Tunnel connector. Outbound-only public ingress; no inbound ports need to be opened. cloudflared holds a persistent outbound connection to Cloudflare's edge; the edge terminates publicly-trusted TLS for the tunnel's hostnames and sends each request back over that (encrypted) tunnel connection, and cloudflared then forwards it to the configured origin over plain HTTP. Wired to dial http://traefik:80 (joins the traefik Docker network) so Traefik's Host()-based routing still applies; the per-route HTTP Host Header override is set in the Cloudflare dashboard (e.g. public vllm.example.com → origin Host vllm.spark-1822.local). Token-driven: all routing config lives in the dashboard, and the token sits in .env on the host.
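A toy model of why the Host header override matters: Traefik picks a router by Host(), so a request still carrying the public hostname would match no router. Both dictionaries are hypothetical stand-ins for the dashboard setting and the Traefik rules.

```python
from typing import Dict, Optional

# Hypothetical stand-in for per-route "HTTP Host Header" overrides in the Cloudflare dashboard.
HOST_OVERRIDES: Dict[str, str] = {"vllm.example.com": "vllm.spark-1822.local"}

# Hypothetical stand-in for Traefik Host() rules: expected Host header -> backend service.
TRAEFIK_ROUTES: Dict[str, str] = {"vllm.spark-1822.local": "vllm"}


def origin_host(public_host: str) -> str:
    """Host header cloudflared sends to http://traefik:80 (the override, if one is set)."""
    return HOST_OVERRIDES.get(public_host.lower(), public_host)


def route(host_header: str) -> Optional[str]:
    """Backend Traefik would pick for this Host header; None means Traefik answers 404."""
    return TRAEFIK_ROUTES.get(host_header.lower())
```

With the override, route(origin_host("vllm.example.com")) reaches the vllm backend; without it, the public name falls through to a 404.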
Trivy: traefik and cloudflare/cloudflared added to the image-scan matrix. extract-tags now reads TRAEFIK_TAG from traefik/.env.example and CLOUDFLARED_TAG from cloudflare/.env.example (same strict regex as the other tags — no shell-meaningful characters allowed).
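The tag extraction can be sketched as follows. The workflow's actual regex isn't reproduced in this entry, so the allow-list below (letters, digits, and . _ - : @, enough for plain versions like 2026.5.0 and digest pins like server-cuda@sha256:<hex>) is an assumption, and extract_tag is a hypothetical name.

```python
import re
import sys

# Assumed allow-list: spaces, ;, $, backticks, quotes and / are all rejected.
TAG_RE = re.compile(r"[A-Za-z0-9._:@-]+")


def extract_tag(env_text: str, key: str) -> str:
    """Return KEY's value from .env-style text, refusing anything outside TAG_RE."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            value = value.strip()
            if not TAG_RE.fullmatch(value):
                raise ValueError(f"{key} contains disallowed characters: {value!r}")
            return value
    raise KeyError(f"{key} not found")


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python extract_tag.py traefik/.env.example TRAEFIK_TAG
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(extract_tag(fh.read(), sys.argv[2]))
```

fullmatch (rather than search) is what makes the check strict: a single stray character anywhere in the value rejects it.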
.github/dependabot.yml: weekly grouped PR to bump GitHub Actions SHAs. SHA-pin policy stays in place; the bumps just don't go stale silently. First run already merged as PR #1 — actions/checkout v4.2.2 → v6.0.2 and github/codeql-action SHA refreshed.
Changed
Renamed the shared front-end Docker network from caddy → traefik, and moved its ownership from caddy/ to traefik/. The new primary proxy defines the network; Caddy (now the backup) joins it as external: true along with every other stack. Host migration: bring down everything attached → docker network rm caddy → cd /opt/traefik && docker compose up -d (recreates the network as traefik) → bring the rest back up.
caddy/ repositioned as the backup proxy. Same set of routes (driven by Caddyfile.d/*.caddyfile), but Traefik is the default. Caddy and Traefik can't both bind :80/:443 at the same time — start one or the other. Caddy reads Caddyfile entries and ignores Traefik's labels, so the same compose files work under either proxy.
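A pre-flight check along these lines can show which proxy currently holds the ports before you start the other one. It is a hypothetical helper, not something the repo ships, and it only probes TCP on the loopback address, which assumes the active proxy publishes :80/:443 on all interfaces.

```python
import socket
from typing import Iterable, List


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable
        return False


def busy_ports(ports: Iterable[int] = (80, 443)) -> List[int]:
    """Which of the proxy ports are already taken."""
    return [p for p in ports if port_in_use(p)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print(f"ports {taken} already bound: stop the running proxy before starting the other one")
    else:
        print(":80 and :443 are free")
```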
traefik/: downgraded the pinned image from v3.3 to v2.11 (LTS line). v3.3's bundled Docker client sends API version 1.24 by default, which modern Docker daemons (>= ~25) refuse with "client version 1.24 is too old, Minimum supported API version is 1.44", so the docker provider fails to discover any labelled services. v2.11 auto-negotiates the API version correctly. All static + dynamic config and labels carry over unchanged.
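To check a host before picking a Traefik line, the daemon's supported range can be read from the Docker Engine API's GET /version, whose ApiVersion and MinAPIVersion fields bound what it accepts. A minimal stdlib sketch: the helper names are hypothetical, and the socket path assumes the default /var/run/docker.sock.

```python
import http.client
import json
import socket
from typing import Tuple


def api_tuple(version: str) -> Tuple[int, int]:
    """'1.44' -> (1, 44), so versions compare numerically rather than as strings."""
    major, minor = version.split(".")
    return int(major), int(minor)


def client_supported(client: str, min_api: str, max_api: str) -> bool:
    """Would a daemon accepting [min_api, max_api] talk to a client pinned at `client`?"""
    return api_tuple(min_api) <= api_tuple(client) <= api_tuple(max_api)


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def daemon_api_range(socket_path: str = "/var/run/docker.sock") -> Tuple[str, str]:
    """(MinAPIVersion, ApiVersion) as reported by GET /version."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/version")
        info = json.loads(conn.getresponse().read())
    finally:
        conn.close()
    return info["MinAPIVersion"], info["ApiVersion"]
```

Usage on the host: lo, hi = daemon_api_range() and then client_supported("1.24", lo, hi). The tuple comparison matters: as plain strings, "1.9" would sort after "1.44".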
traefik/: decoupled the TLS material from Caddy's CA. Was: extract Caddy's root from the caddy-data Docker volume and use it to sign the wildcard. Now: Traefik mints its own internal root via make ca-cert and signs the wildcard with that. Caddy is no longer a precondition for bringing Traefik up. Stale Caddy mentions across traefik/* cleaned up — what's left is just the genuine sibling-stack cross-refs.
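The two make targets come down to a handful of openssl calls. The sketch below builds them from Python. File names other than traefik-root.{crt,key}, the subject CN, the leaf's RSA-2048 key size, and the exact flags are assumptions rather than a copy of the Makefile. The leaf carries a subjectAltName because current browsers match hostnames against SANs rather than the CN. -CAcreateserial is the flag that writes traefik-root.srl.

```python
import subprocess
from pathlib import Path
from typing import List

ROOT_CRT, ROOT_KEY = "traefik-root.crt", "traefik-root.key"


def ca_cert_cmd(certs: Path) -> List[str]:
    """Self-signed root CA: RSA-4096, 10 years (3650 days)."""
    return [
        "openssl", "req", "-x509", "-new", "-newkey", "rsa:4096", "-nodes", "-sha256",
        "-days", "3650",
        "-keyout", str(certs / ROOT_KEY),
        "-out", str(certs / ROOT_CRT),
        "-subj", "/CN=Traefik Internal Root CA",  # assumed subject
    ]


def san_ext(domain: str) -> str:
    """Extension-file contents covering *.domain and the bare domain."""
    return f"subjectAltName = DNS:*.{domain}, DNS:{domain}\n"


def wildcard_cmds(certs: Path, domain: str) -> List[List[str]]:
    """Key + CSR for *.domain, then sign it with the root for 365 days."""
    key, csr, crt = (certs / f"wildcard.{ext}" for ext in ("key", "csr", "crt"))  # hypothetical names
    return [
        ["openssl", "req", "-new", "-newkey", "rsa:2048", "-nodes",
         "-keyout", str(key), "-out", str(csr), "-subj", f"/CN=*.{domain}"],
        ["openssl", "x509", "-req", "-in", str(csr), "-sha256", "-days", "365",
         "-CA", str(certs / ROOT_CRT), "-CAkey", str(certs / ROOT_KEY),
         "-CAcreateserial",  # creates traefik-root.srl beside the root cert
         "-extfile", str(certs / "wildcard.ext"), "-out", str(crt)],
    ]


def run_all(certs: Path, domain: str) -> None:
    """Actually invoke openssl (must be on PATH); regenerates the root every call."""
    certs.mkdir(parents=True, exist_ok=True)
    (certs / "wildcard.ext").write_text(san_ext(domain))
    for cmd in [ca_cert_cmd(certs), *wildcard_cmds(certs, domain)]:
        subprocess.run(cmd, check=True)
```

In practice the root would be generated once and only the leaf re-signed each year, which is why the two steps are separate targets.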
vllm/, llama-cpp/: split host-wide config (*_TAG, HF_CACHE_HOST, HF_TOKEN, defaults like VLLM_GPU_MEM / CTX_SIZE) into .env, leaving envs/<name>.env slim: just the model selection (VLLM_MODEL / MODEL_PATH, *_SERVED_NAME / MODEL_ALIAS) and any per-variant overrides. make up ENV=<name> chains both via docker compose --env-file .env --env-file envs/<name>.env up -d; the variant wins where it specifies a value. Edit HF_TOKEN once in .env and every variant picks it up, with no duplication across variant files. The variant templates emitted by make hf-sync now use the slim shape.
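The precedence rule (the later --env-file wins) can be modelled as an ordered dictionary merge. This is a simplified sketch: parse_env handles only KEY=value lines and comments, not Compose's quoting or ${VAR} interpolation, and the values are made up.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Minimal .env parser: KEY=value lines, '#' comments, blank lines."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def layer(*env_texts: str) -> Dict[str, str]:
    """Merge env files in order; a later file overrides keys from earlier ones."""
    merged: Dict[str, str] = {}
    for text in env_texts:
        merged.update(parse_env(text))
    return merged


HOST_ENV = "HF_TOKEN=hf_xxx\nVLLM_GPU_MEM=0.90\n"        # stand-in for .env (made-up values)
VARIANT_ENV = "VLLM_MODEL=org/model\nVLLM_GPU_MEM=0.80\n"  # stand-in for envs/<name>.env
```

layer(HOST_ENV, VARIANT_ENV) keeps HF_TOKEN from the host file while the variant's VLLM_GPU_MEM overrides the host default.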
vllm/, llama-cpp/: bring back a placeholder .env so raw docker compose ps / logs / down work without --env-file. make up copies .env.example to .env on first run; make up ENV=<name> still passes --env-file envs/<name>.env so the chosen variant's values (and only those) reach the running container. .env.example carries safe placeholder values (VLLM_MODEL=placeholder etc.); these never reach a real container because make up always overrides them.
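The first-run copy is an "only if missing" step, so it never clobbers a .env that has since been edited (for example with HF_TOKEN filled in). A hypothetical Python equivalent, for illustration only:

```python
import shutil
from pathlib import Path


def ensure_env(stack_dir: Path) -> bool:
    """Create <stack>/.env from .env.example if it doesn't exist yet.

    Returns True when a copy was made, False when an existing .env was left alone.
    """
    env_file = stack_dir / ".env"
    if env_file.exists():
        return False
    shutil.copyfile(stack_dir / ".env.example", env_file)
    return True
```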
llama-cpp/Makefile's hf-sync recipe defends against an empty cache. Before: when both the llama-cpp-cache volume and the HF cache had zero GGUFs, the inner ls /d/*.gguf exited non-zero and set -o pipefail propagated that through the command substitution; set -e then aborted the whole recipe (make: *** [Makefile:72: hf-sync] Error 1) without printing the summary or the safetensors note. Both substitutions now have || true so an empty inventory produces an empty list, not an abort.
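The failure mode is easy to reproduce outside make. This sketch runs two small bash scripts against an empty directory: one mirrors the old substitution, the other adds || true. The script bodies are simplified stand-ins for the recipe (in a Makefile every $ would also be written $$), the 2>/dev/null is an assumption, and bash must be on PATH.

```python
import subprocess
import tempfile
from typing import Tuple

# Old shape: under pipefail, ls failing on an unmatched glob fails the whole substitution,
# and set -e then aborts before the summary line.
BEFORE = 'set -eo pipefail\nlist=$(ls "$1"/*.gguf 2>/dev/null | sort)\necho "summary:$list"\n'
# New shape: `|| true` turns an empty inventory into an empty string instead of an abort.
AFTER = 'set -eo pipefail\nlist=$(ls "$1"/*.gguf 2>/dev/null | sort || true)\necho "summary:$list"\n'


def run(script: str, directory: str) -> subprocess.CompletedProcess:
    # bash -c SCRIPT NAME ARG: inside the script $0 is "recipe" and $1 is `directory`.
    return subprocess.run(["bash", "-c", script, "recipe", directory],
                          capture_output=True, text=True)


def demo() -> Tuple[int, int, str]:
    """Run both variants against an empty directory (no *.gguf files)."""
    with tempfile.TemporaryDirectory() as empty:
        before, after = run(BEFORE, empty), run(AFTER, empty)
    return before.returncode, after.returncode, after.stdout
```

demo() reports a non-zero exit for the old shape and a clean exit with an empty "summary:" line for the new one.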
llama-cpp/Makefile's hf-sync now prints an explicit note when the HF cache has models--* repos that are safetensors-only. Previously it silently skipped them (llama.cpp loads GGUF, not safetensors), which read as a bug — "I downloaded a bunch of models, why don't they show up?" The note suggests the two paths to make one usable: pull a *-GGUF HF variant or run convert_hf_to_gguf.py.
llama-cpp/Makefile's hf-sync no longer false-positives "name clash" on multi-file split GGUFs. Files named *-<NNNNN>-of-<MMMMM>.gguf are part NNNNN of one logical model split into MMMMM shards; llama.cpp opens part 1 and finds the rest by naming convention. The recipe now skips every part other than part 1, so the dedup loop sees each model once.
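The split-part filter amounts to a filename regex. A sketch, assuming the five-digit -NNNNN-of-MMMMM.gguf naming that llama.cpp's gguf-split produces; logical_models is a hypothetical name, not the recipe's.

```python
import re
from typing import Iterable, List

# <stem>-<part:05d>-of-<total:05d>.gguf
SPLIT_RE = re.compile(r"-(\d{5})-of-(\d{5})\.gguf$")


def logical_models(filenames: Iterable[str]) -> List[str]:
    """Keep unsplit GGUFs and part 1 of split ones; drop parts 2..N."""
    keep = []
    for name in filenames:
        m = SPLIT_RE.search(name)
        if m and int(m.group(1)) > 1:
            continue  # later shard: llama.cpp locates it from part 1
        keep.append(name)
    return keep
```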
llama-cpp/Makefile's hf-cache no longer hides HF repos that are safetensors-only. It now lists every models--* dir in $(HF_CACHE)/hub/ with an inline annotation — [N GGUF — llama-cpp can load] for repos that have at least one GGUF, or [N safetensors — vllm only] for everything else. Before, the recipe grep'd for *.gguf only and showed an empty section when the HF cache held just vLLM-format weights, which made it look like nothing had been downloaded.
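The per-repo annotation can be sketched in Python, assuming the standard Hugging Face cache layout (hub/models--<org>--<name>/snapshots/<revision>/...). The function names are hypothetical, and the label text only approximates the recipe's output.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def repo_id(cache_dir_name: str) -> str:
    """'models--org--name' -> 'org/name' (HF cache directory naming)."""
    return cache_dir_name[len("models--"):].replace("--", "/")


def label(files: Iterable[str]) -> str:
    """Annotation for one repo, given the weight files in its snapshots."""
    names = list(files)
    gguf = sum(n.endswith(".gguf") for n in names)
    if gguf:
        return f"[{gguf} GGUF - llama-cpp can load]"
    safetensors = sum(n.endswith(".safetensors") for n in names)
    return f"[{safetensors} safetensors - vllm only]"


def annotate(hub: Path) -> List[Tuple[str, str]]:
    """(repo id, annotation) for every models--* repo under `hub`."""
    rows = []
    for repo in sorted(hub.glob("models--*")):
        if not repo.is_dir():
            continue
        snapshots = repo / "snapshots"
        files = set()
        if snapshots.is_dir():
            for p in snapshots.rglob("*"):
                if p.suffix in (".gguf", ".safetensors"):
                    # Path inside the revision, so one file present in two revisions counts once.
                    files.add("/".join(p.relative_to(snapshots).parts[1:]))
        rows.append((repo_id(repo.name), label(files)))
    return rows
```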
llama-cpp/.env.example: revert LLAMACPP_TAG to a digest pin of the multi-arch server-cuda manifest list. Earlier this round I switched it to the per-build tag server-cuda-b<NNNN>, claiming those were immutable AND multi-arch; only the first half is true. ggml-org/llama.cpp publishes the per-build tags as amd64-only single-arch images; only the floating server-cuda tag carries a multi-arch manifest list with the arm64 build. On this aarch64 host the per-build tag pulled an amd64 image (Docker warned: "requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)") and the container failed to run. New pin: server-cuda@sha256:fef7ac8d8ac4fbaffbb7e1039f999c768c6fabe4b289869dbc26c6a05fbe7b07 (the current server-cuda digest). README "Pinning the image" rewritten to lead with the digest-of-manifest-list approach and call out the per-build amd64-only trap. LLAMACPP_TAG_DEFAULT removed from the Makefile: it became a dead variable after the host/variant config split.
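Whether a reference is a multi-arch manifest list can be checked before pinning it: docker manifest inspect <image>:<tag> prints JSON, and only a list/index has a manifests array with per-platform entries. A hypothetical checker for that JSON; it treats a single-arch manifest (no platform list) as not verifiably arm64, which is the per-build-tag trap this entry describes.

```python
import json
from typing import Any, Dict, Set


def platforms(manifest: Dict[str, Any]) -> Set[str]:
    """'os/arch' strings listed by a manifest list / OCI index; empty for a single-arch manifest."""
    found = set()
    for entry in manifest.get("manifests", []):
        plat = entry.get("platform")
        if plat:
            found.add(f'{plat["os"]}/{plat["architecture"]}')
    return found


def supports(manifest_json: str, os_arch: str = "linux/arm64") -> bool:
    """True only if the reference is a list that includes an os_arch entry."""
    return os_arch in platforms(json.loads(manifest_json))
```

Feeding it the output of docker manifest inspect for the server-cuda tag and for a per-build tag should make the difference visible before anything is pulled.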
cloudflare/.env.example: bump CLOUDFLARED_TAG from 2025.5.0 → 2026.5.0.
Trivy: every job now declares timeout-minutes (5 / 20 / 10 / 10 for extract-tags / image-scan / config-scan / secret-scan) so a stuck step can't burn the runner's 6-hour default.
Top-level README.md intro paragraph + GitHub repo "About" sidebar + topics: refreshed to reflect the new shape (Traefik primary, Cloudflare Tunnel called out, homepage URL cleared, new topics traefik / cloudflare-tunnel).
Top-level README.md: full rewrite to fix stale claims (Traefik's wildcard is signed by Traefik's own CA, not Caddy's; both proxies — not just Caddy — publish :80/:443 when they're the active one) and fill gaps (Cloudflare Tunnel's ingress path, sparky.svg, .github/dependabot.yml, the actual on-first-boot sequence). New sections: a short two-line Topology showing the two ingress paths (LAN via mDNS, public via CF Tunnel), an ordered First-time setup walkthrough, and Repo housekeeping listing CI / Dependabot / LICENSE / mascot.
.gitignore: add traefik/certs/ as a blanket ignore for that directory. The existing *.crt / *.key patterns catch the bulk of it, but openssl's -CAcreateserial flag drops a traefik-root.srl sibling next to the root key that wouldn't otherwise be matched.
Removed
open-webui/docker-compose.yml: dropped the internal stack-local network. Both containers (ollama, open-webui) were already attached to the shared front-end network for the proxy to reach them, so the second attachment was pure redundancy — open-webui still resolves ollama by name on the single network. One network per stack across the whole repo now.