Parked work from closed PR #145. Full design lives in the PR diff / code if we want to pick it up.
Problem
Every workload URL today is welded to an agent UUID (`-.devopsdefender.com`). When an agent STONITHs or relaunches, the URL breaks. There is no way to declare a stable short URL like `nvidia-smi.devopsdefender.com` that follows the workload around the fleet, and there is no automatic failover: if the agent serving a user-visible demo dies, the URL stays orphaned until someone redeploys manually.
Shape
- Schema: `expose:` gains a mutually-exclusive `claim_hostname` field alongside `hostname_label`.
- Wire: `DD_EXTRA_INGRESS` env extends to `@name:port` for claim entries (auto-label `label:port` stays).
- Arbitration: CP POSTs CNAMEs without upsert (`cf::try_claim_cname`). First call wins; later callers hit 409. DNS uniqueness is the lock.
- Teardown: collector's orphan-GC path releases the CNAME + CF Access app when the owning agent dies (ownership-checked so it doesn't stomp a legitimate takeover).
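The wire format and the first-call-wins arbitration above can be sketched roughly as follows. This is an illustrative in-memory model, not the code from #145: `Ingress`, `parse_entry`, `ClaimTable`, `try_claim`, and `release_if_owner` are hypothetical names, the map stands in for Cloudflare DNS (where a create-only POST plays the role of the lock), and how multiple entries are separated inside `DD_EXTRA_INGRESS` is not modeled.

```rust
use std::collections::HashMap;

/// One parsed ingress entry. `Claim` is the `@name:port` form,
/// `Label` is the existing `label:port` form.
#[derive(Debug, PartialEq)]
enum Ingress {
    Claim { name: String, port: u16 },
    Label { label: String, port: u16 },
}

/// Parse a single entry; returns None if it is malformed.
fn parse_entry(entry: &str) -> Option<Ingress> {
    let (host, port) = entry.trim().rsplit_once(':')?;
    let port: u16 = port.parse().ok()?;
    match host.strip_prefix('@') {
        Some(name) if !name.is_empty() => Some(Ingress::Claim { name: name.to_string(), port }),
        Some(_) => None, // "@:port" has no hostname to claim
        None if !host.is_empty() => Some(Ingress::Label { label: host.to_string(), port }),
        None => None,
    }
}

/// Stand-in for the 409 a later caller would see.
#[derive(Debug, PartialEq)]
enum ClaimError {
    Conflict { owner: String },
}

/// Hypothetical model of the DNS zone: hostname -> owning agent id.
#[derive(Default)]
struct ClaimTable {
    owners: HashMap<String, String>,
}

impl ClaimTable {
    /// Create-only: never overwrites an existing record (no upsert).
    fn try_claim(&mut self, hostname: &str, agent: &str) -> Result<(), ClaimError> {
        if let Some(owner) = self.owners.get(hostname) {
            return Err(ClaimError::Conflict { owner: owner.clone() });
        }
        self.owners.insert(hostname.to_string(), agent.to_string());
        Ok(())
    }

    /// Orphan GC: release only if `dead_agent` still owns the record,
    /// so a legitimate takeover by another agent is left alone.
    fn release_if_owner(&mut self, hostname: &str, dead_agent: &str) -> bool {
        if self.owners.get(hostname).map(String::as_str) == Some(dead_agent) {
            self.owners.remove(hostname);
            true
        } else {
            false
        }
    }
}
```

In this model a second `try_claim` for the same hostname fails until `release_if_owner` succeeds for the current owner, which mirrors the "DNS uniqueness is the lock" idea. A real implementation would also need to release the CF Access app alongside the CNAME.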
Phase 2 (not in #145)
Scraper-driven automatic relaunch: when an agent with active claims goes unhealthy, CP picks another eligible agent (capability match: `require_labels: ["gpu"]` etc.), posts the spec, repoints the CNAME.
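As a rough sketch of the capability-match step only (posting the spec and repointing the CNAME are left out), the selection could look like this. The `Agent` struct, its fields, and `pick_replacement` are hypothetical; the real CP data model may differ, and "first healthy match" is an assumed policy rather than anything decided in the design.

```rust
use std::collections::HashSet;

/// Hypothetical view of an agent as the CP might track it.
struct Agent {
    id: String,
    healthy: bool,
    labels: HashSet<String>,
}

/// Small constructor to keep examples short.
fn agent(id: &str, healthy: bool, labels: &[&str]) -> Agent {
    Agent {
        id: id.to_string(),
        healthy,
        labels: labels.iter().map(|l| l.to_string()).collect(),
    }
}

/// Return the first healthy agent, other than the failed one, that
/// carries every label in `require_labels`.
fn pick_replacement<'a>(
    agents: &'a [Agent],
    failed_id: &str,
    require_labels: &[&str],
) -> Option<&'a Agent> {
    agents.iter().find(|a| {
        a.healthy
            && a.id != failed_id
            && require_labels.iter().all(|l| a.labels.contains(*l))
    })
}
```

With a fleet where only one healthy agent has the `gpu` label, `pick_replacement(&fleet, failed_id, &["gpu"])` returns that agent, and returns `None` when no eligible agent remains, in which case the claim would presumably stay orphaned until capacity returns.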
Why we closed the PR
Pausing while we focus on smaller near-term fixes. Design is solid; re-open when we want DNS to be the deployment contract.