feat(dns): vanity hostname claims (first-come-first-served) #145
Closed
posix4e wants to merge 1 commit into
Workloads can now declare `expose.claim_hostname: "nvidia-smi"` to
grab a stable short URL (`nvidia-smi.<domain>`) directly under the
zone apex, instead of the auto-labeled per-agent URL shape. DNS
uniqueness is the lock — the first agent to POST the CNAME wins,
subsequent callers get a deterministic conflict error from the CP.
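
The "DNS uniqueness is the lock" behavior can be illustrated with a minimal in-memory sketch. This is not the PR's code: `Zone`, `ClaimOutcome`, and `try_claim` are hypothetical names, and a `HashMap` stands in for the real DNS zone. It only shows the three outcomes the description implies (first claim, idempotent re-apply, conflict).

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Outcome of a claim attempt (hypothetical type, not taken from the PR).
#[derive(Debug, PartialEq, Eq)]
pub enum ClaimOutcome {
    /// No record existed; the caller now owns the hostname.
    Created,
    /// The record already points at the caller (idempotent re-apply).
    AlreadyOwned,
    /// Someone else holds the record; the caller must not overwrite it.
    Conflict { owner: String },
}

/// In-memory stand-in for a DNS zone: hostname -> CNAME target.
#[derive(Default)]
pub struct Zone {
    records: HashMap<String, String>,
}

impl Zone {
    /// Create-only claim: never upserts over an existing record.
    pub fn try_claim(&mut self, hostname: &str, target: &str) -> ClaimOutcome {
        match self.records.entry(hostname.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert(target.to_string());
                ClaimOutcome::Created
            }
            Entry::Occupied(slot) if slot.get().as_str() == target => {
                ClaimOutcome::AlreadyOwned
            }
            Entry::Occupied(slot) => ClaimOutcome::Conflict {
                owner: slot.get().clone(),
            },
        }
    }
}
```

The key design point is that the claim path never overwrites: a second agent asking for the same name gets a conflict that names the current owner rather than silently taking over.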
- Schema: `expose:` gains a mutually-exclusive `claim_hostname`
field alongside `hostname_label`. `apps/README.md` documents both.
`web-nvidia-smi` switches to `claim_hostname: "nvidia-smi"`.
- Wire: `DD_EXTRA_INGRESS` env var extends to `@name:port` for
claim entries; `label:port` stays for auto-labeled. The agent
parses into two Vecs and forwards each `/register` +
`/ingress/replace` payload entry as either `{hostname_label, port}`
or `{claim_hostname, port}`.
- CF: new `try_claim_cname` POSTs without upsert, bubbles up a
conflict if the record already exists. `apply_ingress` accepts
a claims slice, adds the ingress rules, and either POSTs a
fresh CNAME (first caller) or confirms we already own it
(idempotent re-apply) — anything else fails.
- CP: `RegisterReq`/`IngressReplaceReq` gain the variant field;
`provision_agent_access` creates a public-bypass CF Access app
per claim at the zone apex. `collector::Agent` stores claims
alongside extras for recovery via /health scrape.
- Release: the collector's orphan-GC path now iterates a dead
agent's claims and calls `cf::release_claim` (checks ownership
before deleting — avoids stomping on a legitimate takeover),
then `delete_access_apps_for` on each vanity domain. So when
the nvidia-smi agent STONITHs, the name frees up for the next
deploy.
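
As a rough illustration of the Wire bullet, the sketch below splits a `DD_EXTRA_INGRESS`-style value into auto-labeled entries and claim entries. It is not the agent's parser: `IngressEntry` and `parse_extra_ingress` are hypothetical names, and the comma separator between entries is an assumption, since the commit message only specifies the `label:port` and `@name:port` shapes.

```rust
/// One parsed entry (hypothetical struct; `name` is either a label or a claim).
#[derive(Debug, PartialEq, Eq)]
pub struct IngressEntry {
    pub name: String,
    pub port: u16,
}

/// Splits a `DD_EXTRA_INGRESS`-style value into (auto-labeled, claims).
/// ASSUMPTION: entries are comma-separated; the source does not say.
pub fn parse_extra_ingress(
    raw: &str,
) -> Result<(Vec<IngressEntry>, Vec<IngressEntry>), String> {
    let mut extra_ingress = Vec::new();
    let mut claims = Vec::new();

    for item in raw.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        // Split on the last ':' so the port is always the final segment.
        let (name, port) = item
            .rsplit_once(':')
            .ok_or_else(|| format!("missing ':port' in {item:?}"))?;
        let port = port
            .parse::<u16>()
            .map_err(|e| format!("bad port in {item:?}: {e}"))?;

        // A leading '@' marks a vanity claim; anything else is a label.
        let (bucket, name) = match name.strip_prefix('@') {
            Some(claim) => (&mut claims, claim),
            None => (&mut extra_ingress, name),
        };
        if name.is_empty() {
            return Err(format!("empty name in {item:?}"));
        }
        bucket.push(IngressEntry { name: name.to_string(), port });
    }

    Ok((extra_ingress, claims))
}
```

Under these assumptions, `"gpu:8080, @nvidia-smi:9000"` yields one auto-labeled entry (`gpu`, 8080) and one claim (`nvidia-smi`, 9000), which would map to `{hostname_label, port}` and `{claim_hostname, port}` payload entries respectively.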
No automatic failover / relaunch yet — that's Phase 2 in the plan.
For now, when the owning agent dies, the URL 404s until a fresh
deploy lands somewhere eligible.
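
The ownership check mentioned in the Release bullet is what makes the name safe to free when an agent dies. A minimal sketch of that check follows, again against an in-memory map rather than the Cloudflare API. `ReleaseOutcome` and `release_if_owned` are hypothetical names and do not reflect the actual signature of `cf::release_claim`.

```rust
use std::collections::HashMap;

/// Result of a release attempt (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ReleaseOutcome {
    /// The record pointed at the dead agent and was deleted.
    Released,
    /// The record now points elsewhere (a takeover); left untouched.
    NotOwner,
    /// No record existed for this hostname.
    Absent,
}

/// Deletes `hostname` only if it still points at `dead_agent_target`.
/// `records` is an in-memory stand-in for the zone: hostname -> CNAME target.
pub fn release_if_owned(
    records: &mut HashMap<String, String>,
    hostname: &str,
    dead_agent_target: &str,
) -> ReleaseOutcome {
    // Read the current owner first; decide before mutating anything.
    let owned = match records.get(hostname) {
        None => return ReleaseOutcome::Absent,
        Some(current) => current == dead_agent_target,
    };
    if owned {
        records.remove(hostname);
        ReleaseOutcome::Released
    } else {
        ReleaseOutcome::NotOwner
    }
}
```

Checking before deleting means a garbage-collection pass for a dead agent cannot remove a record that a newer deploy has already claimed.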
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DD preview ready
URL: https://pr-145.devopsdefender.com
Browser login: visit https://pr-145.devopsdefender.com; Cloudflare Access routes you …
Machine-to-machine: GitHub Actions workflows in the …
Register endpoint for a local agent: …
This was referenced Apr 19, 2026
Closing: parked in #148 for later. No longer actively working on this.
Summary
Workloads can now declare `expose.claim_hostname: "nvidia-smi"` to grab a stable short URL (`nvidia-smi.<domain>`) directly under the zone apex, instead of the auto-labeled per-agent URL (`<agent>-<label>.<domain>`). DNS uniqueness is the lock: the first agent to POST the CNAME wins, and subsequent callers get a deterministic conflict.

This is Phase 1 of the "DNS-based deployments" plan. Phase 2 (automatic relaunch on agent death) is not in this PR; for now, when the owning agent dies, the URL 404s until a fresh deploy lands.

What changes
- Schema (`apps/README.md`, `apps/web-nvidia-smi/workload.json`): `expose:` gains a mutually-exclusive `claim_hostname` field. nvidia-smi switches from `hostname_label: "gpu"` to `claim_hostname: "nvidia-smi"`.
- Wire: the `DD_EXTRA_INGRESS` env var extends to `@name:port` for claim entries. The agent parses into two Vecs (`extra_ingress` + `claims`); `register`/`ingress_replace` payload entries are either `{hostname_label, port}` or `{claim_hostname, port}`.
- CF: `try_claim_cname` POSTs without upsert and bubbles up a conflict if the record already exists. `apply_ingress` accepts a claims slice, then POSTs fresh CNAMEs or confirms ownership on re-apply.
- CP: `RegisterReq`/`IngressReplaceReq` parse the new variant field; `provision_agent_access` creates a public-bypass CF Access app per claim at the zone apex. `collector::Agent` stores claims alongside extras; the collector recovers them via /health scrape after a CP restart.
- Release: the collector's orphan-GC path calls `cf::release_claim` (checks ownership before deleting, so it doesn't stomp on a legitimate takeover), then sweeps the per-claim Access apps. So when the nvidia-smi agent STONITHs, the name frees up for the next deploy.

Test plan
- `cargo build --release` / `cargo clippy -- -D warnings` / `cargo test`: clean (23 tests; 4 new for claim parsing and CF helpers)
- `nvidia-smi.devopsdefender.com` resolves and serves after the prod agent registers
- … (`virsh destroy dd-local-prod`); confirm the collector's next tick frees the claim (CNAME gone, Access app gone). Next deploy can re-claim.

Follow-ups (plan file)
- … `/logs/cloudflared-tunnel` returns tunnel connection output.

🤖 Generated with Claude Code