Releases: a1exus/sparky
v0.5.0 — llama-cpp router mode, float-tags policy
Added
- tailscale/ stack: Tailscale sidecar, a third ingress path alongside the LAN (mDNS) and public (Cloudflare Tunnel) ones. Registers this host as a node on your tailnet (spark-1822.<tailnet>.ts.net) and runs Tailscale Serve (config in serve.json) to terminate TLS on tailnet:443 with a real publicly-trusted MagicDNS cert and reverse-proxy to http://traefik:80. Joins the external traefik Docker network so backend container names resolve. Userspace networking mode (no /dev/net/tun, no privileged caps); node key + machine state in a named docker volume so secrets stay out of /opt. tailscale/README.md documents the Host-header routing caveat (Traefik routers match *.spark-1822.local, so tailnet hostnames need to be added to existing rule= labels) plus a "Hardening" section calling out four deferred follow-ups vs. Tailscale's production guidance: file-mounted auth secret over plain env, OAuth client credentials over static auth keys, --advertise-tags=tag:server for ACL identity, and kernel networking mode for throughput.
- llama-cpp/ router mode (now the default): when no MODEL_* env var is set, llama-server starts with --models-dir /models and serves every GGUF in the symlink farm built by make hf-sync, loading on demand. Up to MODELS_MAX models stay resident in VRAM; LRU evicts the rest. Per-model overrides live in an auto-generated config.ini with managed-fields semantics: hf-sync owns the model = line; user edits to other keys (ctx-size, n-gpu-layers, hand-added alias, etc.) are preserved across regenerations; removed GGUFs are moved to config.ini.orphans (restored verbatim if they come back). Classic single-model mode (MODEL_PATH / MODEL_OLLAMA / MODEL_URL) still works for one-off pinning. Each GGUF is reachable under three IDs in /v1/models: the short alias from the config section name, the bare filename, and an HF-style <org>/<repo>:<quant> the router auto-derives from the symlink target path. New LLAMA_API_KEY bearer-token auth, required when the endpoint is reachable from anywhere other than 127.0.0.1 (Cloudflare Tunnel path is genuine internet exposure); generated by openssl rand -hex 32. Two new helper scripts: llama-cpp/scripts/sync-router.sh (builds the symlink farm; enumerates via hf cache scan with a find fallback) and llama-cpp/scripts/regen-config-ini.py (managed-fields INI regenerator with atomic writes). New Make target make models pretty-prints /v1/models from the running container. make up now accepts an empty ENV= to start in router mode.
- Spec + plan docs under docs/superpowers/specs/ and docs/superpowers/plans/ recording the router-mode rollout (locked-in decisions, design, verification plan, and post-deployment findings).
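The managed-fields behaviour described for config.ini can be illustrated with a minimal Python sketch. It is not the project's regen-config-ini.py: the function name regenerate, its arguments, and the sample aliases are hypothetical, and it assumes plain INI sections keyed by alias. It also skips the atomic write and, because configparser drops comments, does not restore orphaned sections byte-for-byte.

```python
import configparser
import io


def _parse(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def _dump(parser: configparser.ConfigParser) -> str:
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


def regenerate(current_ini: str, orphans_ini: str, discovered: dict[str, str]) -> tuple[str, str]:
    """Return (new config text, new orphans text).

    `discovered` maps a section alias to the GGUF path found on disk
    (hypothetical input shape, chosen for this sketch).
    """
    current, parked = _parse(current_ini), _parse(orphans_ini)
    new_cfg, new_orphans = _parse(""), _parse("")

    for alias, gguf_path in discovered.items():
        if current.has_section(alias):
            new_cfg[alias] = dict(current[alias])   # keep hand edits
        elif parked.has_section(alias):
            new_cfg[alias] = dict(parked[alias])    # model came back: restore
        else:
            new_cfg[alias] = {}                     # brand-new model
        new_cfg[alias]["model"] = gguf_path         # the one managed field

    # Sections whose GGUF is gone are parked instead of deleted.
    for source in (current, parked):
        for alias in source.sections():
            if alias not in discovered and not new_orphans.has_section(alias):
                new_orphans[alias] = dict(source[alias])

    return _dump(new_cfg), _dump(new_orphans)
```

Only the model key is overwritten on each run; every other key in a section is carried over untouched, which is what lets user edits survive regeneration.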
Changed
- llama-cpp/.env.example + entrypoint.sh + the hf-sync per-variant template: default CTX_SIZE 8192 → 32768. Matches what most contemporary 27B–120B GGUFs handle comfortably without YaRN tricks; per-variant overrides still win (uncomment CTX_SIZE=… in envs/<name>.env).
- vllm/.env.example + entrypoint.sh + the hf-sync per-variant template: default VLLM_MAX_LEN 8192 → 32768 (same rationale as llama-cpp).
- Top-level README.md: tailscale/ added to the intro bullets, Topology diagram (three ingress paths now), Layout tree, Components table, and First-time setup (new optional step 6, mirroring the Cloudflare Tunnel step).
- Trivy: tailscale/tailscale added to the image-scan matrix. extract-tags now reads TAILSCALE_TAG from tailscale/.env.example (same strict regex as the other tags). trivy.md jobs table updated to list the new image.
- tailscale/serve.json: now exposes both schemes on the tailnet node. :80 → http://traefik:80 (Traefik's web entrypoint 308-redirects to HTTPS, sending the client back to :443). :443 → https+insecure://traefik:443 (Tailscale terminates with the MagicDNS cert, then proxies to Traefik's websecure entrypoint; +insecure skips Traefik's wildcard-cert verification since the inner hop is container-to-container over the traefik Docker network). The previous single-listener config (:443 → http://traefik:80) would have caused a 308 redirect loop now that the dashboard router lives behind Host-based routing only. AllowFunnel: false for both ports, so tailnet members only.
- tailscale/services.json: new file. Tailscale VIP Service endpoint config: a separate tailnet entity from the node, with its own MagicDNS name and cert. Endpoints proxy to Traefik (tcp:80 → http://traefik:80, tcp:443 → https+insecure://traefik:443). Single shared file applied per-service with --service=svc:<name>; distinct schema from serve.json (per Tailscale Services configuration file, single-service ServiceDetailsFile form: flat version + endpoints, no outer wrapper).
- tailscale/Makefile: new. make services-apply / services-status / services-clear loops over six per-backend VIP services (svc:traefik, svc:vllm, svc:llama, svc:ollama, svc:open-webui, svc:netdata): pushes services.json into the container, then runs tailscale serve set-config --service=<svc> + serve advertise <svc> for each. Daemon state persists in the tailscale-state volume. Service creation in the admin console (https://login.tailscale.com/admin/services) is one-time and manual; Tailscale doesn't expose a CLI for that. Tailnet clients hit https://<svc>.<tailnet>.ts.net and land on the matching Traefik backend.
- Every Traefik router relaxed from <svc>.spark{x:.+} to <svc>.{x:.+}. The new form accepts both the LAN names (vllm.spark-1822.local) and the per-service tailnet names (vllm.<tailnet>.ts.net) without hardcoding either domain. Slightly more permissive than before (any <svc>.<anything> hitting Traefik will match its router), but the reachability surface (LAN + tailnet + Cloudflare Tunnel) is unchanged, so it's a non-issue in practice.
- Legacy svc:spark (single-service-for-everything) kept working for backward compatibility: the Traefik dashboard router carries an || HostRegexp(spark{x:.+}) fallback so https://spark.<tailnet>.ts.net/ still lands on the dashboard. Retire it later; see "Legacy svc:spark" in tailscale/README.md.
- Every Traefik router's rule= switched from a hardcoded Host() literal to a single HostRegexp(.spark{x:.+}) matcher. Matches the per-service subdomain form across any LAN/tailnet/Cloudflare domain (vllm.spark-1822.local, vllm.sparky.example.com, …). Six routers updated: ollama, open-webui, vllm, llama (label-based, in each app's compose) and netdata, traefik (file-based, in traefik/dynamic/services.yml). Renames of the hostname (spark-1822 → anything starting with spark) or the LAN/tailnet/public domain no longer require touching any rule. The Traefik dashboard router (traefik in dynamic/services.yml) additionally carries a second matcher HostRegexp(spark{x:.+}), so the bare tailnet hostname Tailscale Serve forwards (spark-1822.<tailnet>.ts.net) lands on the dashboard: Tailscale Serve only knows the node's MagicDNS hostname, and the dashboard is the most useful default landing. tailscale/README.md "Host-header routing" and traefik/README.md "Add an app" sections track the final state.
- Image-pin policy flipped: float by default, pin in production. Every committed .env.example now uses a floating tag (latest for ollama / open-webui / netdata / tailscale / vllm / cloudflared; v2 for traefik because latest would resolve to v3 and silently break the named-capture HostRegexp rules; server-cuda for ggml-org/llama.cpp since no latest tag exists for that image). Operators override in their host-local .env with an immutable tag or content-digest pin for reproducibility; each .env.example carries the previous explicit pin in a comment + a release-page link. Root README's "Conventions" section rewritten to reflect the new rule (was: "never :latest"). Trivy's extract-tags regex (^[A-Za-z0-9._@:+-]+$) already accepted floating tags, so there is no CI logic change; only stale "pinned" wording was cleaned up in .github/workflows/trivy.md and the cron comment.
- llama-cpp/: GPU exclusivity reclassified. Router mode and Ollama are both lazy (neither claims VRAM until a model is loaded), so they coexist on the GB10's 124 GiB. The actually-exclusive engines in this stack are vLLM (--gpu-memory-utilization 0.9 reserves ~90% of VRAM eagerly) and llama-cpp's classic single-model mode (-ngl 999 pins every layer at startup). README's old "GPU exclusivity" subsection rewritten as "GPU sharing" with a posture-by-engine table. Documents that Ollama's restart: unless-stopped brings it back on Docker daemon restarts even after an explicit stop, so hard exclusivity needs ongoing intent.
- llama-cpp/README.md clarifications: made explicit that the OpenAI-compatible API and the built-in web UI share one port (8080) on the same Traefik router, discriminated by path; there is no separate UI port. New "Router quirks worth knowing" subsection covering: /v1/models is unauthenticated by OpenAI-compat convention (only /v1/chat/completions enforces the bearer), each part of a multi-part split GGUF appears as its own model ID (only part 1 is loadable), loading the same physical file under two IDs counts twice toward MODELS_MAX, and gpt-oss-* models need the harmony chat template (default ChatML produces a 500 on parse). README "Pinning the image" snippet updated for ghcr.io's switch from Docker manifest list to OCI image index (the old Accept: application/vnd.docker.distribution.manifest.list.v2+json now returns 404; the new snippet sends OCI first), and adds a --help sanity check to catch ...
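To make the "slightly more permissive" trade-off of the relaxed router rule concrete, here is a rough Python approximation of how a {name:regex} host template behaves. It is only a sketch: Traefik's real matcher is not reproduced here, the helper names are hypothetical, and example.ts.net stands in for an actual tailnet domain.

```python
import re

# A {name:regex} placeholder as written in the router rules, e.g. {x:.+}
_PLACEHOLDER = re.compile(r"\{(\w+):([^{}]+)\}")


def host_template_to_regex(template: str) -> re.Pattern:
    """Turn a host template such as 'vllm.{x:.+}' into a compiled regex."""
    parts, pos = [], 0
    for m in _PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))   # literal text
        parts.append(f"(?P<{m.group(1)}>{m.group(2)})")    # named capture
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(template: str, host: str) -> bool:
    # The whole Host header has to match, not just a prefix.
    return host_template_to_regex(template).fullmatch(host) is not None


if __name__ == "__main__":
    for host in ("vllm.spark-1822.local", "vllm.example.ts.net"):
        print(host, matches("vllm.spark{x:.+}", host), matches("vllm.{x:.+}", host))
```

Under this approximation the older template only accepts hosts whose second label starts with spark, while the relaxed one accepts any suffix after the service label, which is why tailnet names start matching without a rule change.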
v0.4.0 — Traefik primary, Cloudflare Tunnel, polished tooling
[0.4.0] - 2026-05-14
Added
- sparky.svg: project mascot / logo (full redesign from the previous version, no shared elements). A quiet speech-bubble face on warm cream paper: closed thoughtful eyes, a soft smile, a small terracotta asterisk in the upper-right (Anthropic nod), and a blinking cursor next to the sparky wordmark. Drops the previous chip-and-LED robot aesthetic for something warmer and less generic; language is the medium I actually work in, not silicon. Hand-authored SVG, animates the cursor on platforms that honor SMIL. AI's self-portrait, drawn by Claude.
- mdns/Makefile: ergonomic per-alias targets (make add ALIAS=<name> / make remove ALIAS=<name> / make logs ALIAS=<name> / make resolve ALIAS=<name>). Auto-detects the host's .local domain (HOST=$(hostname).local default; overrideable). Replaces having to know the systemd template-unit syntax (sparky-mdns-alias@<name>.<host>.local).
- traefik/ stack: HTTPS reverse proxy, primary front-facing entry on this host. Routes for label-able containers (vllm., llama., ollama., open-webui.) come from traefik.* Docker labels on each app's compose; the docker provider discovers them via a read-only mount of /var/run/docker.sock. Routes that can't be label-driven (netdata. in host-network mode, the traefik. dashboard itself) live in dynamic/services.yml via the file provider. Traefik mints its own internal root CA (make ca-cert, 10-year RSA-4096) and signs a 365-day wildcard leaf (make wildcard-cert); clients install traefik-root.crt to trust it. Let's Encrypt scaffolding is in place (commented out in traefik.yml + docker-compose.yml); uncomment + activate once this host has a publicly-resolvable DNS name.
- cloudflare/ stack: Cloudflare Tunnel connector. Outbound-only public ingress, no inbound ports needed. cloudflared opens a persistent connection to Cloudflare's edge; the edge terminates publicly-trusted TLS for the tunnel's hostnames and forwards cleartext through the tunnel to the configured origin. Wired to dial http://traefik:80 (joins the traefik Docker network) so Traefik's Host()-based routing still applies; the per-route HTTP Host Header override is set in the Cloudflare dashboard (e.g. public vllm.example.com → origin Host vllm.spark-1822.local). Token-driven; all routing config lives in the dashboard, the token sits in .env on the host.
- Trivy: traefik and cloudflare/cloudflared added to the image-scan matrix. extract-tags now reads TRAEFIK_TAG from traefik/.env.example and CLOUDFLARED_TAG from cloudflare/.env.example (same strict regex as the other tags: no shell-meaningful characters allowed).
- .github/dependabot.yml: weekly grouped PR to bump GitHub Actions SHAs. SHA-pin policy stays in place; the bumps just don't go stale silently. First run already merged as PR #1: actions/checkout v4.2.2 → v6.0.2 and github/codeql-action SHA refreshed.
Changed
- Renamed the shared front-end Docker network from caddy → traefik, and moved its ownership from caddy/ to traefik/. The new primary proxy defines the network; Caddy (now the backup) joins it as external: true along with every other stack. Host migration: bring down everything attached → docker network rm caddy → cd /opt/traefik && docker compose up -d (recreates the network as traefik) → bring the rest back up.
- caddy/ repositioned as the backup proxy. Same set of routes (driven by Caddyfile.d/*.caddyfile), but Traefik is the default. Caddy and Traefik can't both bind :80/:443 at the same time, so start one or the other. Caddy reads Caddyfile entries and ignores Traefik's labels, so the same compose files work under either proxy.
- traefik/: downgraded the pinned image from v3.3 to v2.11 (LTS line). v3.3's bundled Docker client sends API version 1.24 by default, which modern Docker daemons (>= ~25) refuse with "client version 1.24 is too old, Minimum supported API version is 1.44", so the docker provider fails to discover any labelled services. v2.11 auto-negotiates the API version correctly. All static + dynamic config and labels carry over unchanged.
- traefik/: decoupled the TLS material from Caddy's CA. Was: extract Caddy's root from the caddy-data Docker volume and use it to sign the wildcard. Now: Traefik mints its own internal root via make ca-cert and signs the wildcard with that. Caddy is no longer a precondition for bringing Traefik up. Stale Caddy mentions across traefik/* cleaned up; what's left is just the genuine sibling-stack cross-refs.
- vllm/, llama-cpp/: split host-wide config (*_TAG, HF_CACHE_HOST, HF_TOKEN, defaults like VLLM_GPU_MEM / CTX_SIZE) into .env, leaving envs/<name>.env slim: just the model selection (VLLM_MODEL / MODEL_PATH, *_SERVED_NAME / MODEL_ALIAS) and any per-variant overrides. make up ENV=<name> chains both via docker compose --env-file .env --env-file envs/<name>.env up -d; the variant wins where it specifies a value. Edit HF_TOKEN once in .env and every variant picks it up, with no duplication across variant files. make hf-sync templates emit the slim shape.
- vllm/, llama-cpp/: bring back a placeholder .env so raw docker compose ps / logs / down work without --env-file. make up auto-copies .env.example to .env on first run; make up ENV=<name> still passes --env-file envs/<name>.env so the chosen variant's values (and only those) reach the running container. .env.example carries safe placeholder values (VLLM_MODEL=placeholder etc.); these never reach a real container because make up always overrides.
- llama-cpp/Makefile's hf-sync recipe defends against an empty cache. Before: when both the llama-cpp-cache volume and the HF cache had zero GGUFs, the inner ls /d/*.gguf exited non-zero and set -o pipefail propagated that through the command substitution; set -e then aborted the whole recipe (make: *** [Makefile:72: hf-sync] Error 1) without printing the summary or the safetensors note. Both substitutions now have || true so an empty inventory produces an empty list, not an abort.
- llama-cpp/Makefile's hf-sync now prints an explicit note when the HF cache has models--* repos that are safetensors-only. Previously it silently skipped them (llama.cpp loads GGUF, not safetensors), which read as a bug: "I downloaded a bunch of models, why don't they show up?" The note suggests the two paths to make one usable: pull a *-GGUF HF variant or run convert_hf_to_gguf.py.
- llama-cpp/Makefile's hf-sync no longer false-positives "name clash" on multi-file split GGUFs. Files matching *-0*N-of-NNNNN.gguf are one logical model split across N parts; llama.cpp opens part 1 and finds the rest by naming convention. The recipe now skips parts where N > 1 so the dedup loop sees the model once.
- llama-cpp/Makefile's hf-cache no longer hides HF repos that are safetensors-only. It now lists every models--* dir in $(HF_CACHE)/hub/ with an inline annotation: [N GGUF — llama-cpp can load] for repos that have at least one GGUF, or [N safetensors — vllm only] for everything else. Before, the recipe grep'd for *.gguf only and showed an empty section when the HF cache held just vLLM-format weights, which made it look like nothing had been downloaded.
- llama-cpp/.env.example: revert LLAMACPP_TAG to a digest pin of the multi-arch server-cuda manifest list. Earlier this round I switched it to the per-build tag server-cuda-b<NNNN>, claiming those were immutable AND multi-arch; only the first half is true. ggml-org/llama.cpp publishes the per-build tags as amd64-only single-arch images; only the floating server-cuda tag carries a multi-arch manifest list with the arm64 build. On this aarch64 host the per-build tag pulled an amd64 image (Docker warned requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)) and the container failed to actually run. New pin: server-cuda@sha256:fef7ac8d8ac4fbaffbb7e1039f999c768c6fabe4b289869dbc26c6a05fbe7b07 (current server-cuda digest). README "Pinning the image" rewritten to lead with the digest-of-manifest-list approach and call out the per-build amd64-only trap.
- LLAMACPP_TAG_DEFAULT removed from the Makefile: dead variable after the host/variant config split.
- cloudflare/.env.example: bump CLOUDFLARED_TAG from 2025.5.0 → 2026.5.0.
- caddy/.env.example: bump CADDY_TAG from 2.11.2-alpine → 2.11.3-alpine (current Docker Hub latest).
- open-webui/.env.example: bump OLLAMA_TAG from 0.23.2 → 0.23.4 (current Docker Hub latest).
- Trivy: every job now declares timeout-minutes (5 / 20 / 10 / 10 for extract-tags / image-scan / config-scan / secret-scan) so a stuck step can't burn the runner's 6-hour default.
- Top-level README.md intro paragraph + GitHub repo "About" sidebar + topics: refreshed to reflect the new shape (Traefik primary, Cloudflare Tunnel called out, homepage URL cleared, new topics traefik / cloudflare-tunnel).
- Top-level README.md: full rewrite to fix stale claims (Traefik's wildcard is signed by Traefik's own CA, not Caddy's; both proxies, not just Caddy, publish :80/:443 when they're the active one) and fill gaps (Cloudflare Tunnel's ingress path, sparky.svg, .github/dependabot.yml, the actual on-first-boot sequence). New sections: a short two-line Topology showing the two ingress paths (LAN via mDNS, public via CF Tunnel), an ordered First-time setup walkthrough, and Repo housekeeping listing CI / Dependabot / LICENSE / mascot.
- .gitignore: add traefik/certs/ as a blanket ignore for that directory. The existing *.crt / *.key patterns catch the bulk of it, but openssl's -CAcreateserial flag drops a traefik-root.srl sibling next to the root key that wouldn't other...
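The split-GGUF handling in hf-sync (keep part 1, skip the later parts so the model is seen once) can be sketched in Python as follows. This is an illustration rather than the Makefile recipe: the function name is hypothetical, and the suffix pattern assumes zero-padded part numbers in the form -<part>-of-<total>.gguf.

```python
import re

# Assumed suffix shape for a split model part, e.g. "-00002-of-00003.gguf"
_SPLIT_PART = re.compile(r"-(\d+)-of-(\d+)\.gguf$")


def loadable_ggufs(filenames: list[str]) -> list[str]:
    """Drop parts 2..N of split models; keep single files and part 1."""
    keep = []
    for name in filenames:
        m = _SPLIT_PART.search(name)
        if m and int(m.group(1)) > 1:
            continue  # later part: found via part 1, not a model of its own
        keep.append(name)
    return keep


if __name__ == "__main__":
    files = [
        "small.gguf",
        "big-00001-of-00003.gguf",
        "big-00002-of-00003.gguf",
        "big-00003-of-00003.gguf",
    ]
    print(loadable_ggufs(files))
```

Filtering before deduplication means a three-part model contributes one entry, so a name-clash check no longer trips over its own sibling parts.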
v0.3.0 — vLLM stack, tool-calling, env-per-variant workflow
[0.3.0] - 2026-05-13
Added
- vllm/ stack: vLLM inference server (image vllm/vllm-openai:v0.20.2, multi-arch arm64+amd64). OpenAI-compatible API fronted by Caddy at https://vllm.${CADDY_DOMAIN}. Shares the host's HuggingFace cache (/opt/hf/.cache/huggingface) with llama-cpp/. Complements llama-cpp/ (vLLM for HF safetensors + high-throughput serving; llama.cpp for GGUF). restart: "no" for GPU exclusivity with Ollama / llama-cpp. Smoke-tested on GB10 with Qwen/Qwen3.6-27B (compute capability 12.1 just works, no source build needed); gpt-oss-* variants still fail at startup on v0.20.2 because the bundled openai-harmony fetches a vocab file from a URL that 404s upstream (unrelated to GB10).
- vllm/entrypoint.sh: builds the vllm serve argv from env vars, mirroring the llama-cpp/ pattern. Replaces the long YAML command: list. Enables OpenAI tool-calling on the API (--enable-auto-tool-choice --tool-call-parser qwen3_xml) so agentic clients (Opencode, etc.) can issue tool-use requests. Qwen3.6 emits the XML tool-call format (<tool_call><function=NAME><parameter=PARAM>VAL</parameter></function></tool_call>), not the Hermes JSON variant. For other model families that emit a different format, the parser is a no-op (chat completions still work).
- One-env-per-model-variant layout for llama-cpp/ and vllm/. Each stack has an envs/ directory of self-contained <name>.env variant files (image pin + HF cache + HF token + model knobs) plus a top-level Makefile (make list, make up ENV=<name>, make hf-cache, make hf-sync). make up invokes docker compose --env-file envs/<name>.env up -d directly; no rolling .env is written. Management via plain docker compose (with the same --env-file) or docker against the container name.
- make hf-cache / make hf-sync (vllm + llama-cpp): list cached HF repos / GGUFs on this host, and reconcile envs/*.env against them: create envs for newly cached models, restore <name>.env from <name>.env.bak when a model returns (preserving hand edits), move <name>.env → <name>.env.bak when a model leaves. The .bak orphan path is non-destructive.
- llama-cpp/ and vllm/: bind the engine's OpenAI-compatible API to 127.0.0.1 on the host (127.0.0.1:8080:8080 and 127.0.0.1:8000:8000). External traffic continues to flow through Caddy on the shared caddy network; the loopback bind is for direct host-side curl/benchmarks. HOST_PORT overrideable per-variant.
- caddy/Makefile: new. make ca-cert extracts Caddy's internal root CA into ./caddy-root.crt.
- caddy/README.md: expanded the local-CA install matrix with macOS (CLI) (security add-trusted-cert), Fedora/RHEL, Arch (trust anchor --store), Windows (PowerShell + cmd + GUI), and a Node.js apps row (NODE_EXTRA_CA_CERTS / Node ≥22 NODE_USE_SYSTEM_CA=1) so Opencode and other Node-bundled-CA clients can trust Caddy's leaf certs.
- "Supported model formats" sections in llama-cpp/README.md and vllm/README.md: spell out what each engine loads (GGUF vs HF safetensors) and what it doesn't, with upstream links for architecture and quantization compatibility.
- Trivy: vllm/vllm-openai added to the image-scan matrix; new vllm_tag output from extract-tags.
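The hf-sync reconciliation rules (create, restore from .bak, park as .bak) can be sketched in Python. This is an illustration of the described behaviour, not the Makefile recipe: the function name, the cached_models input, and the one-line template are all hypothetical.

```python
from pathlib import Path


def reconcile(envs_dir: Path, cached_models: set[str],
              template: str = "# variant: {name}\n") -> None:
    """Bring envs/*.env in line with the set of currently cached models."""
    envs_dir.mkdir(parents=True, exist_ok=True)

    # Model left the cache: park its env as <name>.env.bak (nothing is deleted).
    for env in sorted(envs_dir.glob("*.env")):
        if env.stem not in cached_models:
            env.rename(env.with_name(env.name + ".bak"))

    for name in sorted(cached_models):
        env = envs_dir / f"{name}.env"
        bak = envs_dir / f"{name}.env.bak"
        if env.exists():
            continue                  # already present, leave hand edits alone
        if bak.exists():
            bak.rename(env)           # model returned: restore the parked file
        else:
            env.write_text(template.format(name=name))  # newly cached model
```

Because a departing model's file is renamed rather than removed, any hand edits come back intact the next time that model shows up in the cache.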
Changed
- make up VARIANT=<name> → make up ENV=<name> (and matching renames in docs). The Make variable name now lines up with what the files are: ".env" files.
- llama-cpp/ + vllm/ variant workflow: each envs/<name>.env is self-contained (image pin + HF cache + HF token + model knobs in one file). make up ENV=<name> uses docker compose --env-file envs/<name>.env up -d directly; no rolling .env is written. The make down / logs / ps targets are dropped; docker compose --env-file ... / docker are the source of truth for those.
- vllm/, llama-cpp/: one Makefile per stack. The previous envs/Makefile for HF-cache maintenance was collapsed into the top-level Makefile; recipes cd envs/ to operate on *.env.
- caddy/: the stack now defines the shared caddy Docker network (attachable: true) instead of referencing it as external: true. Dropped the docker network create caddy one-time setup step. cd /opt/caddy && docker compose up -d creates the network on first boot. Other stacks still reference it as external: true to join.
- open-webui/docker-compose.yml: the two persistent volumes (open-webui, open-webui-ollama) are now declared external: true to match how they exist on the host and to make sure docker compose down -v never destroys them. Silences the "already exists but was not created by Docker Compose" warnings. First-time deploys need docker volume create open-webui open-webui-ollama once; documented in open-webui/README.md.
- open-webui Caddy vhost moved from the bare {$CADDY_DOMAIN} to open-webui.{$CADDY_DOMAIN}, matching the per-service subdomain convention used by every other stack (llama., vllm., ollama., netdata.). Requires a matching mDNS alias. Side effect: the bare spark-1822.local no longer routes to anything, so Caddy returns a clean 404 instead of the misleading 502 it served while open-webui was down.
- llama-cpp/ and vllm/ Makefiles: small best-practice hardening: .SUFFIXES: (disable built-in implicit rules), .DELETE_ON_ERROR:, $(strip $(ENV)) to tolerate trailing whitespace, quoted env-file paths, and a SERVICE variable for the compose service name. No behavior change for existing inputs.
- vllm/entrypoint.sh: unset the four VLLM_* helper vars (VLLM_MODEL / VLLM_SERVED_NAME / VLLM_GPU_MEM / VLLM_MAX_LEN) before exec vllm serve. They're only used to build the argv; leaving them in env triggers cosmetic "Unknown vLLM environment variable" warnings at startup.
- Top-level README.md: rewrote the Deploy workflow. The old "scp + sudo install" pattern is gone; /opt on the host is itself a checkout of this repo, so deploy is ssh spark-1822.local 'sudo git -C /opt pull --ff-only' followed by the stack-specific apply step.
- Renamed the shared external Docker network from web to caddy; the name reflects what the network actually is (the path Caddy proxies over). Every stack's compose updated. Migration on the host: docker network create caddy, docker compose up -d each stack, docker network rm web.
- llama-cpp/ switched to restart: "no" (was unless-stopped). The engine eagerly grabs ~65 GiB of VRAM and conflicts with Ollama; manual-start avoids racing each other on boot. The stack's README documents the switch-engine snippets. Same change applied to vllm/.
- HF_CACHE_HOST default moved from /home/alexus/.cache/huggingface to /opt/hf/.cache/huggingface, the host's existing system-wide HF cache (~77 GiB of models already there, including openai/gpt-oss-120b).
- .gitignore: added **/envs/*.env (variant files are host-local artifacts), *.bak (host-local backups including hf-sync's orphaning path), and hf (the host's HuggingFace cache lives at /opt/hf/).
Fixed
- llama-cpp/entrypoint.sh: marked executable (100755). The script is bind-mounted at the container's entrypoint; without the exec bit on the host file, runc failed with "permission denied" on docker compose up.
Removed
- llama-cpp/envs/: dropped the 8 Ollama-blob variant files (gpt-oss-safeguard-20b, qwen3.6-35b, phi4-14b, gemma4-e4b, llama3.1-8b, deepseek-r1-8b, granite4.1-3b, tinyllama). The MODEL_OLLAMA resolution in entrypoint.sh, the /ollama:ro mount, and the env pass-through stay so an Ollama-backed variant can be added back any time without code changes.
v0.2.0
Added
- llama-cpp/ stack: GPU-accelerated llama.cpp server (image ghcr.io/ggml-org/llama.cpp:server-cuda, pinned by digest). aarch64+CUDA confirmed on GB10 (compute capability 12.1, 124 GiB VRAM). OpenAI-compatible API + web UI fronted by Caddy at https://llama.${CADDY_DOMAIN}. Default model is gpt-oss-safeguard-120b via HuggingFace auto-download, a workaround for the Ollama pull bug (ollama/ollama#16121). New Caddy site block + mDNS alias.
- llama-cpp: read-only mounts of Ollama's blob store (open-webui-ollama external volume) and the host's HuggingFace CLI cache, plus a MODEL_PATH env var so llama-server can skip downloading and reuse any file from those caches.
- Direct Caddy-fronted access to the Ollama API at https://ollama.${CADDY_DOMAIN} (no auth, LAN-trust). The ollama container joins the shared web network in addition to internal. New Caddyfile.d/ollama.caddyfile + mDNS alias.
- mdns/Makefile with install / uninstall / list / help targets. Replaces the install.sh / uninstall.sh pair.
- open-webui/README.md and .github/README.md so each component documents itself.
- Dedicated .github/workflows/trivy.md with the full Trivy workflow doc; .github/README.md is now a thin workflow index.
- Trivy: relaxed extract-tags regex to allow @: so digest-pinned tags (server-cuda@sha256:…) are accepted; added llama-cpp to the image-scan matrix.
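A quick way to see what a strict tag allowlist admits is to run candidate tags through it. The sketch below uses the character class quoted elsewhere in these notes for the extract-tags check (^[A-Za-z0-9._@:+-]+$); the function name is hypothetical and the CI job itself is not reproduced.

```python
import re

# Same character class as the extract-tags check. fullmatch() plays the role
# of the ^...$ anchors and also rejects a trailing newline, which a bare "$"
# would tolerate in Python.
_TAG_CHARS = re.compile(r"[A-Za-z0-9._@:+-]+")


def is_acceptable_tag(tag: str) -> bool:
    """True if the tag contains only allowlisted characters."""
    return _TAG_CHARS.fullmatch(tag) is not None


if __name__ == "__main__":
    candidates = [
        "latest",
        "server-cuda",
        "server-cuda@sha256:fef7ac8d8ac4fbaffbb7e1039f999c768c6fabe4b289869dbc26c6a05fbe7b07",
        "latest; rm -rf /",
        "$(whoami)",
    ]
    for tag in candidates:
        print(f"{tag!r}: {is_acceptable_tag(tag)}")
```

Allowing @ and : is what lets a tag@sha256:digest pin through, while spaces, semicolons, dollar signs, and parentheses are still rejected.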
Changed
- Slim top-level README.md to an overview + per-component links; per-stack details now live in each directory's README.md. Added a table-of-contents.
- Split caddy/Caddyfile into per-service files under caddy/Caddyfile.d/<name>.caddyfile, loaded via import. Adding a new app is now a single file drop + reload.
- .gitignore: added host-local /opt trees we don't manage in this repo (containerd, MicronTechnology, nvidia, NVIDIA AI Workbench).
Removed
- HTTP basic auth in front of Netdata. The dashboard exposes read-only telemetry on a trusted LAN; one more password to manage was friction without meaningful security gain. Use Netdata Cloud (SSO/MFA) or an OAuth forward-auth proxy if you want real auth.
v0.1.0
Added
- open-webui stack: Open WebUI + Ollama as two pinned containers, GPU reservation on ollama only, healthchecks, log rotation, no-new-privileges. Adapted from https://build.nvidia.com/spark/open-webui/instructions with split services, version-pinned images, and .env-managed config.
- caddy stack: HTTPS reverse proxy on :80/:443 (+ :443/udp for HTTP/3) with tls internal (Caddy local CA). Hostname is parameterized via CADDY_DOMAIN. Routes ${CADDY_DOMAIN} → open-webui and netdata.${CADDY_DOMAIN} → Netdata with HTTP basic auth.
- netdata stack: real-time host + container observability with network_mode: host and pid: host, standard read-only bind mounts (/proc, /sys, /, docker.sock).
- mdns component: systemd template (sparky-mdns-alias@.service) that publishes subdomain mDNS aliases via avahi-publish so netdata.spark-1822.local (and any future *.spark-1822.local) resolves on the LAN.
- External shared Docker network web: only Caddy publishes host ports; every other service is reachable only through Caddy.
- CI: Trivy workflow (.github/workflows/trivy.yml): image CVE scans (HIGH+CRITICAL, fixed-only) for every pinned image, IaC config scan of the repo, secret scan. SARIF uploaded to Code Scanning. Pushes/PRs gate on any CRITICAL CVE or leaked secret; scheduled weekly runs are informational. Actions pinned by commit SHA.
- Top-level and per-stack READMEs (README.md, caddy/README.md, mdns/README.md, netdata/README.md).
- DGX Spark product link in the top-level README.
Changed
- Image tags for every stack moved into the stack's .env (single source of truth, surfaced to CI via .env.example).
- Open WebUI: dropped the direct 0.0.0.0:8080 host publish; now reachable only via Caddy on HTTPS.
- /opt/<stack>/ on the host is root:root; only .env is root:docker 640 so the docker-group alexus user can read it (and run docker compose without sudo) while configs require sudo to edit.
Security
- WEBUI_SECRET_KEY is required (compose refuses to start without it).
- Netdata fronted with Caddy HTTP basic auth (bcrypt hash stored in caddy/.env).
- .gitignore excludes .env, *.crt, *.key, and docker-compose.override.yml.
- All third-party GitHub Actions pinned by commit SHA.