
feat(infra): DNS ownership policy gate + infra.dns_record step + wfctl infra-dns CLI #13

Merged
intel352 merged 26 commits into master from feat/dns-ownership-policy on May 25, 2026
Conversation

@intel352
Contributor

Summary

Ships workflow-plugin-infra v0.2.0 with a per-record DNS ownership policy gate. Apps (multisite, BMW, ratchet) provision DNS records via the new typed infra.dns_record step; the gate validates each record against the zone-root _workflow-dns-policy.<zone> TXT policy. SRE manages the policy via the new wfctl infra-dns admin CLI.

  • New TXT-record-based per-record ownership marker at _workflow-dns-policy.<zone> (one RR per owner, CAA-style; heritage=wfinfra-v1 o=<owner> p=<patterns> [t=<types>] [d=true]).
  • New infra.dns_record STEP type registered as TypedStepFactory[Config,Input,Output]; replaces deprecated infra.dns module.
  • internal/dnspolicy (parser + serializer + matcher + CheckAllowed) + internal/dnsgate (Gate + CachingGate for per-zone caching across multi-record applies) + internal/dnsprovider (libdns DO+CF adapters, RRset-replace semantic for DO TXT updates) + internal/dnsaudit (shared audit log) + internal/admincli (wfctl infra-dns set-policy/drift/transfer-ownership/policy show).
  • sdk.ServePluginFull + sdk.CLIProvider wiring; plugin.json declares capabilities.cliCommands for the dynamic wfctl infra-dns subcommand.
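
To make the marker format above concrete, here is a minimal sketch of parsing one policy TXT value. It is not the code in internal/dnspolicy: the `Entry` type, `parseEntry`, and `errEmptyOwner` are hypothetical names, and the comma separator inside `p=` and `t=` is an assumption, since the description does not state how multiple patterns or types are delimited.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Entry is a hypothetical in-memory form of one policy TXT record.
type Entry struct {
	Owner    string
	Patterns []string
	Types    []string
	Default  bool
}

// errEmptyOwner mirrors the idea of the ErrEmptyOwner sentinel named in the PR.
var errEmptyOwner = errors.New("dnspolicy sketch: empty owner")

const heritage = "wfinfra-v1"

// parseEntry parses one TXT value of the form
// "heritage=wfinfra-v1 o=<owner> p=<patterns> [t=<types>] [d=true]".
// ok=false with a nil error means the record is foreign and should be skipped.
func parseEntry(txt string) (e Entry, ok bool, err error) {
	kv := make(map[string]string)
	for _, field := range strings.Fields(txt) {
		if k, v, found := strings.Cut(field, "="); found {
			kv[k] = v
		}
	}
	if kv["heritage"] != heritage {
		return Entry{}, false, nil // not ours: leave it alone
	}
	if kv["o"] == "" {
		return Entry{}, false, errEmptyOwner
	}
	e = Entry{Owner: kv["o"], Default: kv["d"] == "true"}
	if p := kv["p"]; p != "" {
		e.Patterns = strings.Split(p, ",") // assumed separator
	}
	if t := kv["t"]; t != "" {
		e.Types = strings.Split(t, ",") // assumed separator
	}
	return e, true, nil
}

func main() {
	e, ok, err := parseEntry("heritage=wfinfra-v1 o=multisite p=www,*.sites t=A,CNAME")
	fmt.Printf("%+v ok=%v err=%v\n", e, ok, err)
}
```

Returning `ok=false` for records without the heritage sentinel is one way to get the foreign-RR skip behavior that the Task 1 tests describe.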

Design

See: docs/plans/2026-05-25-dns-ownership-policy-design.md — 7 adversarial cycles PASSed.

Implementation Plan

See: docs/plans/2026-05-25-dns-ownership-policy.md — 3 plan-phase adversarial cycles PASSed; alignment-check PASSed (cycle 2); scope-locked 2026-05-25T20:24:21Z (sha256=953622ea2d87…).

Scope Manifest

  • PR Count: 1
  • Tasks: 10
  • Status: Locked 2026-05-25T20:24:21Z
| PR # | Title | Tasks | Branch |
|------|-------|-------|--------|
| 1 | feat(infra): DNS ownership policy gate + infra.dns_record step + wfctl infra-dns CLI | Task 1-10 | feat/dns-ownership-policy |

Changes

  • Task 1 (da89d22): internal/dnspolicy/ Parse + Serialize + sentinel errors (ErrMultipleDefaults, ErrEmptyOwner, ErrUnknownHeritage); 7 tests including round-trip + deterministic serialize.
  • Task 2 (aba18f2): MatchPattern (* single label, ** multi-segment, @ apex) + 15 test cases.
  • Task 3 (0afb235): Policy.CheckAllowed 2-phase logic (explicit claims block default fallback) + SOA/NS protection + Types restriction (also applied to default owner).
  • Task 4 (00bcc8a): DNSPolicyReader + DNSRecordWriter + combined Adapter interfaces.
  • Task 5 (0c5e170): dnsgate.Gate (uncached) + dnsgate.CachingGate (per-zone cache; 1 GetTXT across N records in same zone). Test verifies caching behavior.
  • Task 9a (7b0e78a): internal/dnsaudit/ shared audit package (LogAttempt + LogOutcome + LogPolicyEdit). Audit log path: ${XDG_STATE_HOME:-$HOME/.local/state}/wfctl/plugins/workflow-plugin-infra/dns-policy-audit.jsonl.
  • Task 6 (8a2214a): 3-message proto split (DNSRecordStepConfig/Input/Output); plugin.contracts.json step entry with config/input/output field names; plugin_test.go kind-guards patched for step contracts.
  • Task 7 (0573b94): internal/dnsprovider/ libdns v1.1.1 adapters for DigitalOcean + Cloudflare; NewAdapter with case-folded provider name + map[string]string creds; ExpandCredsMap exported for handler use; Apply post-gate mutation; DO uses GET → AppendRecords + SetRecords(ID) pattern.
  • Task 8 (437e535): registers infra.dns_record step (both StepTypes + TypedStepTypes); handler uses shared CachingGate per step instance; deprecates infra.dns module Start() with migration error; ships docs/migration/infra-dns-to-step.md.
  • Task 9b (e5a89b2): internal/admincli/ CLIProvider with 4 subcommands (set-policy, drift, transfer-ownership, policy show); audit hooks; subcommands use dnsgate.PolicyName(zone) helper.
  • Task 10 (6bea8c6): cmd/workflow-plugin-infra/main.go switches to sdk.ServePluginFull with admincli.CLIProvider; plugin.json declares capabilities.cliCommands: [infra-dns], bumps minEngineVersion: 0.51.7 → 0.64.0, and restores the version: 0.0.0 sentinel per the workflow#758 ldflag pattern.
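
The per-zone caching from Task 5 (one GetTXT across N records in the same zone) can be sketched as follows. This is an illustration only: `policyReader`, `cachingGate`, `policyTXT`, and `countingReader` are hypothetical names, and the `GetTXT(zone, name)` signature is an assumption because the real DNSPolicyReader method set is not shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// policyReader is a hypothetical stand-in for the narrow reader interface.
type policyReader interface {
	GetTXT(zone, name string) ([]string, error)
}

// cachingGate memoizes policy TXT values per zone for its own lifetime.
type cachingGate struct {
	mu     sync.Mutex
	reader policyReader
	byZone map[string][]string
}

func newCachingGate(r policyReader) *cachingGate {
	return &cachingGate{reader: r, byZone: make(map[string][]string)}
}

// policyTXT returns the TXT values for zone, fetching them at most once.
// Failed lookups are not cached, so a later call retries upstream.
func (g *cachingGate) policyTXT(zone string) ([]string, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if txt, ok := g.byZone[zone]; ok {
		return txt, nil
	}
	txt, err := g.reader.GetTXT(zone, "_workflow-dns-policy."+zone)
	if err != nil {
		return nil, err
	}
	g.byZone[zone] = txt
	return txt, nil
}

// countingReader is a test double that counts upstream lookups.
type countingReader struct{ calls int }

func (c *countingReader) GetTXT(zone, name string) ([]string, error) {
	c.calls++
	return []string{"heritage=wfinfra-v1 o=multisite p=www"}, nil
}

func main() {
	upstream := &countingReader{}
	gate := newCachingGate(upstream)
	for _, record := range []string{"www", "api", "mail"} {
		if _, err := gate.policyTXT("example.com"); err != nil {
			fmt.Println("lookup failed for", record, err)
		}
	}
	fmt.Println("upstream lookups:", upstream.calls)
}
```

Scoping the cache to one step instance, as the PR does, keeps a stale policy from outliving a single apply.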

Post-review fixes (420b46f)

Code review surfaced 1 Critical + 4 Important + 2 Minor:

  • CRITICAL: DO UpsertTXT was leaving stale TXT entries (loop early-break bug). Fixed via new upsertTXTRRset helper (DELETE-all-then-APPEND for RRset-replace semantic, matching Cloudflare's SetRecords contract).
  • Guarded type-assert for libdns DO concrete type (was panicking on unexpected type).
  • transfer-ownership --from X --to X now errors with exit 2 + clear stderr (was silently deleting).
  • Migration doc examples corrected to match actual --owner/--patterns/--token flag surface; future -f/--bootstrap modes flagged for followup.
  • Step handler now uses shared CachingGate per step instance (amortizes GetTXT across records).
  • ErrUnknownHeritage doc clarifies reserved-for-future status.
  • All 4 admincli subcommands use dnsgate.PolicyName(zone) helper (no hardcoded prefix).
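
The RRset-replace semantic behind the Critical fix can be illustrated with an in-memory zone. This sketch does not use libdns and is not the upsertTXTRRset helper itself: `memZone`, `record`, and `replaceTXTRRset` are hypothetical stand-ins that only show why deleting every existing TXT value before appending leaves no stale entries.

```go
package main

import "fmt"

// record is a simplified DNS record for illustration.
type record struct{ Name, Type, Value string }

// memZone is an in-memory stand-in for a provider-hosted zone.
type memZone struct{ records []record }

// get returns all records matching name and type.
func (z *memZone) get(name, typ string) []record {
	var out []record
	for _, r := range z.records {
		if r.Name == name && r.Type == typ {
			out = append(out, r)
		}
	}
	return out
}

// deleteAll removes every record matching name and type.
func (z *memZone) deleteAll(name, typ string) {
	kept := z.records[:0]
	for _, r := range z.records {
		if !(r.Name == name && r.Type == typ) {
			kept = append(kept, r)
		}
	}
	z.records = kept
}

// replaceTXTRRset makes the TXT RRset at name equal to values:
// delete all existing TXT values first, then append the new set.
func replaceTXTRRset(z *memZone, name string, values []string) {
	z.deleteAll(name, "TXT")
	for _, v := range values {
		z.records = append(z.records, record{Name: name, Type: "TXT", Value: v})
	}
}

func main() {
	z := &memZone{records: []record{
		{"_workflow-dns-policy", "TXT", "heritage=wfinfra-v1 o=old p=www"},
		{"_workflow-dns-policy", "TXT", "heritage=wfinfra-v1 o=sre d=true"},
		{"www", "A", "192.0.2.1"},
	}}
	replaceTXTRRset(z, "_workflow-dns-policy", []string{"heritage=wfinfra-v1 o=sre d=true"})
	fmt.Println(len(z.get("_workflow-dns-policy", "TXT")), "TXT value(s) remain")
}
```

The sketch makes no attempt at atomicity: between the delete and the append the RRset is briefly empty, which is in keeping with concurrent-write protection being listed as out of scope for v1.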

Test Plan

  • GOWORK=off go test -count=1 -timeout 300s ./... — 7 packages PASS (admincli, contracts, dnsaudit, dnsgate, dnspolicy, dnsprovider, internal)
  • GOWORK=off go vet ./... — 0 warnings
  • GOWORK=off go build ./... — exit 0
  • wfctl plugin validate-contract --for-publish --tag v0.2.0 . — PASS
  • wfctl plugin verify-capabilities --binary <built> . — OK with ldflag-injected version
  • <binary> --wfctl-cli infra-dns → exits 2 + prints "usage" (CLI dispatch sanity)

Out of scope (per Scope Manifest)

  • Route53, GCP, Azure, Namecheap, GoDaddy, Hover provider adapters (v2 — multi-credential)
  • DNSSEC self-signing zones (v1 = managed-DNSSEC only)
  • Generation-counter concurrent-write protection (v2)
  • Provider WhoAmI / token-bound owner verification (v2)
  • Cross-driver ownership-tagging beyond DNS (workflow#779)
  • infra.dns module rewrite (deprecated to migration-error stub only)
  • gocodealone-dns/ownership/<zone>.yaml mirror + import script (separate-repo deliverable, followup)
  • set-policy -f <yaml> --bootstrap --overwrite-existing flag surface (v1 ships individual-flag set-policy; bootstrap modes deferred)

Adversarial process record

  • Design: 7 adversarial cycles (cycle 7 PASS) — surfaced 13 Criticals + 22 Importants across cycles
  • Plan: 3 adversarial cycles (cycle 3 PASS) — surfaced 7 Criticals + 7 Importants
  • Alignment: 2 cycles (cycle 2 PASS — closed gocodealone-dns mirror as out-of-scope + added CachingGate)
  • Scope-locked at 2026-05-25T20:24:21Z (sha256=953622ea2d87…)
  • Spec review: PASS
  • Code review: REJECTED → fixed → APPROVED (1 Critical + 4 Important + 2 Minor all addressed)

🤖 Generated with Claude Code

intel352 and others added 26 commits May 25, 2026 14:50
Per-record DNS ownership via zone-root _dns-mgmt TXT records with
pattern-based authorization. Lives in workflow-plugin-infra (per repo
owner). Fired by wfctl apply via Gate(provider, zone, name, type, owner).

Prior art researched: ExternalDNS (inspire-from labels.go API), libdns
(adopt for provider abstraction), miekg/dns (adopt for wire parsing),
RFC 1464 + 8552 + 8659 (justify format + naming + CAA-style multi-RR).
No existing OSS solves this exactly; write our own ~80-line parser
mimicking ExternalDNS API.

TXT byte budget: ~110 bytes per owner, ~800-byte response cap, 5-6
owners per zone comfortable. Short keys (o/p/t/d), heritage=wfinfra-v1
sentinel, schema versioning baked in.

Trust boundary: gate trusts the owner string; real auth at DNS provider
token level. Fail-closed bootstrap; SRE bootstraps via dedicated cmd.

Related: workflow#779.
…om adversarial cycle 1

C-1 Bootstrap: set-policy command bypasses Gate (different code path)
C-2 Owner field: added at YAML config layer (not proto — additive)
C-3 IaCProvider unimplemented: use libdns directly via DNSPolicyReader
I-1 Heritage collision: rename _dns-mgmt → _workflow-dns-policy
I-2 Stranded records: add transfer-records cmd + drift orphan detection
I-3 Owner trust: document credential-trust mitigation + v2 path
I-4 Race conditions: documented v1 risk + v2 generation-counter path
I-5 d=true ambiguity: multiple → parse error; zero → fail-closed
m-1 EDNS0 budget: revised to 700B / 4-5 owners
m-2 ACME shorthand: dropped (YAGNI)
m-3 pkg/ → internal/dnspolicy
m-4 Pattern syntax locked: * label, ** segments, @ apex
m-5 DNSSEC: v1 scope = managed-DNSSEC only
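
A minimal sketch of the locked pattern syntax from m-4, under stated assumptions: names are relative to the zone, `@` matches only the apex (written here as "@" or the empty string), `*` matches exactly one non-empty label, and `**` matches one or more labels. The function names are hypothetical and this is not the MatchPattern implementation from the PR; in particular, whether `**` may match zero labels is not stated in the source.

```go
package main

import (
	"fmt"
	"strings"
)

// matchPattern reports whether a zone-relative name matches pattern.
func matchPattern(pattern, name string) bool {
	apex := name == "@" || name == ""
	if pattern == "@" {
		return apex
	}
	if apex {
		return false // wildcards never match the apex in this sketch
	}
	return matchLabels(strings.Split(pattern, "."), strings.Split(name, "."))
}

// matchLabels compares pattern labels against name labels left to right.
func matchLabels(p, n []string) bool {
	if len(p) == 0 {
		return len(n) == 0
	}
	switch p[0] {
	case "**":
		// Consume one or more labels, trying the shortest run first.
		for i := 1; i <= len(n); i++ {
			if matchLabels(p[1:], n[i:]) {
				return true
			}
		}
		return false
	case "*":
		// Consume exactly one non-empty label.
		return len(n) > 0 && n[0] != "" && matchLabels(p[1:], n[1:])
	default:
		// Literal label; DNS names compare case-insensitively.
		return len(n) > 0 && strings.EqualFold(p[0], n[0]) && matchLabels(p[1:], n[1:])
	}
}

func main() {
	for _, c := range [][2]string{
		{"*", "www"}, {"*", "a.b"}, {"**.sites", "a.b.sites"}, {"@", "@"},
	} {
		fmt.Printf("%-10s %-12s %v\n", c[0], c[1], matchPattern(c[0], c[1]))
	}
}
```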
…om cycle 2

C-1 NEW (cycle-2): STRICT_PROTO conflict → typed owner field in NEW DNSRecordStepInput proto
C-2 NEW (cycle-2): infra.dns_record step type doesn't exist → explicit registration in plugin.go
I-1 NEW (cycle-2): libdns/hover absent → provider coverage matrix; workflow-plugin-hover gRPC for Hover
I-2 NEW (cycle-2): libdns module burden → isolated in internal/dnsprovider/ package
I-3 NEW (cycle-2): step/module conflation → resolved via C-2 NEW
m-1 (cycle-2): bootstrap overwrite guard → --overwrite-existing required if existing policy
m-2 (cycle-2): transfer-records rename → transfer-ownership throughout
m-3 (cycle-2): Serialize validation → ErrMultipleDefaults at write time
…om cycle 3

C-1 NEW: split DNSRecordStepInput into Config+Input+Output (TypedStepFactory[C,I,O] pattern, platform plugin precedent)
C-2 NEW: ship cmd/wfctl-infra-dns binary; wfctl plugin-binary dispatch picks it up as 'wfctl infra-dns <cmd>' — zero core changes
I-1+I-3 NEW: embed hover.Client HTTP calls directly (v1 copy ~80 lines; v2 extract to pkg/hoverclient)
I-2 NEW: plugin.contracts.json gets explicit step entry per validate-contract requirement
I-4 NEW: ErrMultipleDefaults + sibling sentinels declared in internal/dnspolicy/errors.go
m-2: full TypedStepProvider wiring shown (CreateTypedStep + TypedStepTypes)
m-3: infra.dns module Start() now returns non-nil deprecation error + migration doc
…om cycle 4

C-1 NEW: plugin.contracts.json field names config/input/output (NOT *_descriptor) per plugin_audit.go firstStringField lookup
C-2 NEW: wfctl plugin dispatch reads plugin.json.capabilities.cliCommands[] — NOT wfctl-<name> scanning. Ship dns-policy-admin binary + declare in cliCommands
I-1+I-2 NEW: Hover deferred to v2 (508-line client, multi-step adapter, username+password+TOTP auth incompatible with single-token NewAdapter). Zero blast radius (0 Hover zones).
I-3 NEW: stale 'wfctl plugin infra dns' → 'wfctl infra-dns' global replace
I-4 NEW: secret resolution via existing config.ExpandEnvInMapPreservingVars at config-load (matches infra modules)
m-1: ADDITIVE to existing infra.proto (clarified)
m-2: full TypedStepFactory 4-arg signature with handler closure
m-3: libdns/route53 import verification deferred to implementation
…om cycle 5

C-1 NEW: 'binary' field doesn't exist in CLICommandDeclaration. Switched to single-binary --wfctl-cli sentinel pattern (main plugin binary handles both plugin-server + admin CLI roles via os.Args check).
C-2 NEW: dnsprovider.Apply undefined. Defined separate DNSRecordWriter interface; Adapter combines policy R/W + record R/W; explicit Apply signature.
I-1 NEW: Route53/GCP/Azure also break single-token. Changed NewAdapter signature to map[string]string creds. v1 narrowed to DO+Cloudflare only.
I-2 NEW: Hover deferral contradicted by stale paragraph — removed.
I-3 NEW: Bootstrap UX confusion — defined 6-case behavior table.
m-1: --wfctl-cli prefix documented in main().
m-2: Audit log path explicit (XDG_STATE_HOME).
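
The single-binary --wfctl-cli sentinel pattern from C-1 can be sketched as a plain os.Args check. This reflects the cycle-5 design idea only; per the later commits the merged main.go delegates this dispatch to sdk.ServePluginFull. `splitCLIArgs` and `runCLI` are hypothetical names, and the usage text is illustrative.

```go
package main

import (
	"fmt"
	"os"
)

const cliSentinel = "--wfctl-cli"

// splitCLIArgs reports whether the process was invoked in admin-CLI mode
// and, if so, returns the arguments that follow the sentinel.
func splitCLIArgs(args []string) ([]string, bool) {
	if len(args) > 1 && args[1] == cliSentinel {
		return args[2:], true
	}
	return nil, false
}

// runCLI expects args like ["infra-dns", "<subcommand>", ...] and returns
// an exit code; a missing subcommand prints usage and returns 2.
func runCLI(args []string) int {
	if len(args) < 2 || args[0] != "infra-dns" {
		fmt.Fprintln(os.Stderr, "usage: wfctl infra-dns <subcommand> [flags]")
		return 2
	}
	fmt.Printf("would dispatch subcommand %q\n", args[1])
	return 0
}

func main() {
	if cliArgs, ok := splitCLIArgs(os.Args); ok {
		os.Exit(runCLI(cliArgs))
	}
	// Plugin-server mode; the gRPC serving is elided in this sketch.
	fmt.Println("plugin server mode")
}
```

The exit-2-plus-usage path corresponds to the dispatch sanity check listed in the Test Plan.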
…om cycle 6

C-1 NEW: switched to sdk.ServePluginFull + admincli.CLIProvider (supply-chain plugin precedent)
C-2 NEW: stale provider_token_ref → provider_creds globally
I-1 NEW: ExpandEnvInMapPreservingVars not on step-config path; documented template form + bare-shell form
I-2 NEW: 'remains untouched' contradiction rewritten
I-3 NEW: dnsgate.Gate signature uses narrow DNSPolicyReader interface (not Adapter) for test simplicity
m-1 NEW: sha256 input defined as sha256(strings.Join(sort.StringSlice(Serialize(policy)), '\n'))
m-2 NEW: secret leakage explicit guidance — dnsprovider redacts creds in errors
Handler snippet: expandCredsMap + audit-attempt comment added
Serialize: pattern + type within-entry sorting documented (deterministic hash)
NewAdapter: case-folding via strings.ToLower documented
Prose: 'dnsgate.Gate.CheckAllowed' clarified as 'Gate (function) → Policy.CheckAllowed (method)'
Apply-attempt audit: pre-mutation log entry + post-Apply outcome entry
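
The m-1 hash definition can be sketched as below. Two assumptions: Serialize is taken to return one string per owner record, and the digest is rendered as lowercase hex. Note that in Go `sort.StringSlice(x)` is only a type conversion and does not sort, so the sketch sorts a copy explicitly with `sort.Strings`; `policyHash` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// policyHash returns a hex SHA-256 over the serialized policy lines,
// sorted and joined with "\n" so that record order does not matter.
func policyHash(serialized []string) string {
	sorted := append([]string(nil), serialized...) // do not mutate the caller's slice
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := policyHash([]string{
		"heritage=wfinfra-v1 o=sre d=true",
		"heritage=wfinfra-v1 o=multisite p=www",
	})
	b := policyHash([]string{
		"heritage=wfinfra-v1 o=multisite p=www",
		"heritage=wfinfra-v1 o=sre d=true",
	})
	fmt.Println("order-independent:", a == b)
}
```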
10 tasks across 1 PR. TDD per-task with failing-test-first discipline.
Implements design at 2026-05-25-dns-ownership-policy-design.md (7 adversarial
cycles PASSed). 3 internal packages (dnspolicy + dnsgate + dnsprovider),
new infra.dns_record typed step, admincli CLIProvider with 4 subcommands,
audit log, infra.dns deprecation. Single PR feat/dns-ownership-policy.

Runtime-launch validation triggered at Task 10 (main.go startup config +
version pin change). Rollback paths per task + PR-level revert.
… plan cycle 1

C-1 m.typeName→m.infraType (correct struct field per plugin.go:193)
C-2 main.go uses internal.Version (existing goreleaser ldflag target, no new var)
C-3 CheckAllowed two-phase: explicit claims block default fallback for non-claimer
C-4 DO upsertRecords helper: GET+SetRecords(ID) for updates, AppendRecords for new
C-5 DO DeleteRecord: GET first for IDs then DeleteRecords (idempotent)
I-1 Removed make proto alternative (Makefile lacks target); protoc canonical
I-2 ExpandCredsMap exported from Task 7 (was lowercase causing Task 8 compile fail)
I-3 Extracted internal/dnsaudit/ as shared package (was wrongly in admincli)
I-4 plugin.json surgical edits with correct 13 module type names
m-1 added empty-string test for *
m-3 SOA/NS protection documented for ops
m-2/m-4 acknowledged as accepted
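
The two-phase check from C-3 can be sketched as follows. This is not Policy.CheckAllowed itself: the `entry` type and `checkAllowed` are hypothetical, patterns are compared by exact match to keep the example short, an entry without Types is assumed to allow any type, and SOA/NS are denied outright as one reading of the protection rule. The Types check in phase 2 reflects the fix added in the following plan cycle.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type entry struct {
	Owner    string
	Patterns []string
	Types    []string
	Default  bool
}

var errDenied = errors.New("dns policy sketch: denied")

// typeAllowed treats an empty Types list as "any type" (assumption).
func typeAllowed(e entry, rtype string) bool {
	if len(e.Types) == 0 {
		return true
	}
	for _, t := range e.Types {
		if strings.EqualFold(t, rtype) {
			return true
		}
	}
	return false
}

// checkAllowed applies the two phases: explicit claims first, then the
// default owner only when nobody claims the name explicitly.
func checkAllowed(policy []entry, owner, name, rtype string) error {
	switch strings.ToUpper(rtype) {
	case "SOA", "NS":
		return errDenied // protected record types
	}
	claimed := false
	for _, e := range policy {
		for _, p := range e.Patterns {
			if p != name { // exact match stands in for pattern matching
				continue
			}
			claimed = true
			if e.Owner == owner && typeAllowed(e, rtype) {
				return nil
			}
		}
	}
	if claimed {
		return errDenied // an explicit claim blocks the default fallback
	}
	for _, e := range policy {
		if e.Default && e.Owner == owner && typeAllowed(e, rtype) {
			return nil
		}
	}
	return errDenied // no default owner, or not this owner: fail closed
}

func main() {
	policy := []entry{
		{Owner: "multisite", Patterns: []string{"www"}, Types: []string{"A", "CNAME"}},
		{Owner: "sre", Types: []string{"A", "TXT"}, Default: true},
	}
	fmt.Println(checkAllowed(policy, "multisite", "www", "A"))
	fmt.Println(checkAllowed(policy, "sre", "www", "A"))
}
```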
… plan cycle 2

C-1 NEW plugin_test.go hard-fatalf on non-module contracts → patch instructions added for both kind-guards + new TestContractDeclaresStrictStepContracts
C-2 NEW Task 9 audit test wrong package → moved to internal/dnsaudit/audit_test.go (package dnsaudit)
I-1 Task 10 manual build ldflag wrong path → -X github.com/.../internal.Version=...
I-2 Priority int32→uint wrap → validation guard added (Task 7/8)
I-3 Phase-2 CheckAllowed ignores Types restriction → added Types check for default owner
m-1 internal/version.go misleading prose → corrected to internal/plugin.go:25
- gocodealone-dns mirror: explicitly out-of-scope (separate-repo deliverable; followup issue)
- per-zone policy cache: added CachingGate to Task 5 + test for 1-GetTXT-per-zone behavior
internal/dnspolicy package: Parse + Serialize + Policy + Entry types
+ ErrMultipleDefaults/ErrEmptyOwner/ErrUnknownHeritage sentinels.
6 tests covering parse happy path, foreign-RR skip, multiple-default
error (parse + serialize), empty-owner error, deterministic ordering,
round-trip idempotency.
…ly + ExpandCredsMap

internal/dnsprovider package isolates the libdns boundary. Adapter
combines DNSPolicyReader + DNSRecordWriter. NewAdapter dispatches
on provider name (case-folded) + ErrUnknownProvider sentinel.
v1 supports digitalocean + cloudflare (single-token providers).
ExpandCredsMap applies os.ExpandEnv for bare-shell creds. Apply
performs post-gate upsert/delete with redacted provider errors.
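
A minimal sketch of the two behaviors described above, credential expansion and case-folded provider dispatch. The names `expandCreds` and `normalizeProvider` are hypothetical, as is the "token" credential key; the injectable lookup function is a testing convenience of this sketch. Passing `os.Getenv` as the lookup is equivalent to calling `os.ExpandEnv` on each value.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

var errUnknownProvider = errors.New("dnsprovider sketch: unknown provider")

// expandCreds returns a copy of creds with $VAR and ${VAR} references
// expanded through lookup. The input map is left untouched.
func expandCreds(creds map[string]string, lookup func(string) string) map[string]string {
	out := make(map[string]string, len(creds))
	for k, v := range creds {
		out[k] = os.Expand(v, lookup)
	}
	return out
}

// normalizeProvider case-folds the provider name and rejects anything
// outside the two providers that v1 supports.
func normalizeProvider(name string) (string, error) {
	switch p := strings.ToLower(name); p {
	case "digitalocean", "cloudflare":
		return p, nil
	default:
		return "", fmt.Errorf("%w: %q", errUnknownProvider, name)
	}
}

func main() {
	creds := expandCreds(map[string]string{"token": "${DO_TOKEN}"}, os.Getenv)
	provider, err := normalizeProvider("DigitalOcean")
	// Print only non-secret details; credential values stay out of logs.
	fmt.Println(provider, err, "credential keys:", len(creds))
}
```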
…fra.dns module

- StepTypes/CreateStep + TypedStepTypes/CreateTypedStep wire the new step type
- Handler closure: ExpandCredsMap → NewAdapter → Gate → Apply
- infra.dns module Start() returns migration-hint error
- docs/migration/infra-dns-to-step.md guides operators

Rollback: revert commit + remove docs file. (Module deprecation reverses;
existing infra.dns YAML configs would re-work.)
internal/admincli implements sdk.CLIProvider with set-policy / drift /
transfer-ownership / policy show subcommands. Audit log at
$XDG_STATE_HOME/wfctl/plugins/workflow-plugin-infra/dns-policy-audit.jsonl
captures policy edits + apply attempts (LogAttempt/LogOutcome/LogPolicyEdit).
RunCLI returns exit code via ServePluginFull → os.Exit.
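
The audit log location can be resolved as in the sketch below, which follows the shell form `${XDG_STATE_HOME:-$HOME/.local/state}` given in the PR description (an unset or empty XDG_STATE_HOME falls back to the home directory). `auditLogPath` is a hypothetical helper, and the injected `getenv` exists only to keep the example testable.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// auditLogPath resolves the audit log file, preferring XDG_STATE_HOME and
// falling back to $HOME/.local/state when it is unset or empty.
func auditLogPath(getenv func(string) string) string {
	base := getenv("XDG_STATE_HOME")
	if base == "" {
		base = filepath.Join(getenv("HOME"), ".local", "state")
	}
	return filepath.Join(base, "wfctl", "plugins", "workflow-plugin-infra", "dns-policy-audit.jsonl")
}

func main() {
	fmt.Println(auditLogPath(os.Getenv))
}
```
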
…mands manifest

main.go switches from sdk.Serve to sdk.ServePluginFull so the SDK handles
--wfctl-cli dispatch + os.Exit propagation. plugin.json declares
capabilities.cliCommands[infra-dns] so wfctl plugin install registers
the dynamic subcommand. version restored to 0.0.0 sentinel per
workflow#758 ldflag pattern.

Rollback: revert commit + rebuild from previous main.go (sdk.Serve only;
infra-dns CLI subcommand unavailable but plugin gRPC serving intact).
- DO UpsertTXT: separate upsertTXTRRset helper does DELETE-all-then-APPEND for RRset-replace semantic (closes Critical: stale TXT entries surviving update)
- DO type-assert guarded: returns error instead of panic on unexpected libdns concrete type
- transfer-ownership: reject --from X --to X with exit 2 + clear stderr
- migration doc: examples match actual --owner/--patterns/--token flags; future file-input/bootstrap modes flagged
- plugin.go handler: shared CachingGate per step instance (amortize GetTXT across records)
- ErrUnknownHeritage doc clarifies reserved-for-future status
- admincli subcommands use dnsgate.PolicyName helper (no hardcoded prefix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@intel352 intel352 merged commit 885a8cc into master May 25, 2026
3 checks passed
@intel352 intel352 deleted the feat/dns-ownership-policy branch May 25, 2026 21:32