feat(infra): DNS ownership policy gate + infra.dns_record step + wfctl infra-dns CLI #13
Merged
Per-record DNS ownership via zone-root _dns-mgmt TXT records with pattern-based authorization. Lives in workflow-plugin-infra (per repo owner). Fired by wfctl apply via Gate(provider, zone, name, type, owner). Prior art researched: ExternalDNS (inspire-from labels.go API), libdns (adopt for provider abstraction), miekg/dns (adopt for wire parsing), RFC 1464 + 8552 + 8659 (justify format + naming + CAA-style multi-RR). No existing OSS solves this exactly; write our own ~80-line parser mimicking ExternalDNS API. TXT byte budget: ~110 bytes per owner, ~800-byte response cap, 5-6 owners per zone comfortable. Short keys (o/p/t/d), heritage=wfinfra-v1 sentinel, schema versioning baked in. Trust boundary: gate trusts the owner string; real auth at DNS provider token level. Fail-closed bootstrap; SRE bootstraps via dedicated cmd. Related: workflow#779.
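To make the short-key record format and the byte budget concrete, here is a minimal sketch of serializing one owner entry. The `Entry` type, the field order, and the comma-joined lists are assumptions of this sketch, not the plugin's confirmed layout.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical in-memory form of one owner's policy record.
type Entry struct {
	Owner    string
	Patterns []string
	Types    []string // optional; empty means no type restriction
	Default  bool
}

// heritage is the sentinel named in the design; it also carries the schema version.
const heritage = "wfinfra-v1"

// SerializeEntry renders one TXT string using the short keys o/p/t/d.
// Comma-joined lists are an assumption of this sketch.
func SerializeEntry(e Entry) string {
	parts := []string{
		"heritage=" + heritage,
		"o=" + e.Owner,
		"p=" + strings.Join(e.Patterns, ","),
	}
	if len(e.Types) > 0 {
		parts = append(parts, "t="+strings.Join(e.Types, ","))
	}
	if e.Default {
		parts = append(parts, "d=true")
	}
	return strings.Join(parts, " ")
}

func main() {
	rec := SerializeEntry(Entry{
		Owner:    "multisite",
		Patterns: []string{"www", "*.preview"},
		Types:    []string{"A", "CNAME"},
	})
	// The length is what counts against the per-owner byte budget.
	fmt.Printf("%s (%d bytes)\n", rec, len(rec))
}
```

Printing the length per record is a cheap way to check a policy against the per-owner and per-response budgets before writing it.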
…om adversarial cycle 1
C-1 Bootstrap: set-policy command bypasses Gate (different code path)
C-2 Owner field: added at YAML config layer (not proto; additive)
C-3 IaCProvider unimplemented: use libdns directly via DNSPolicyReader
I-1 Heritage collision: rename _dns-mgmt → _workflow-dns-policy
I-2 Stranded records: add transfer-records cmd + drift orphan detection
I-3 Owner trust: document credential-trust mitigation + v2 path
I-4 Race conditions: documented v1 risk + v2 generation-counter path
I-5 d=true ambiguity: multiple → parse error; zero → fail-closed
m-1 EDNS0 budget: revised to 700B / 4-5 owners
m-2 ACME shorthand: dropped (YAGNI)
m-3 pkg/ → internal/dnspolicy
m-4 Pattern syntax locked: * label, ** segments, @ apex
m-5 DNSSEC: v1 scope = managed-DNSSEC only
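The pattern syntax locked in m-4 can be sketched as a small recursive matcher. The exact semantics are assumptions here: names are taken relative to the zone, `*` matches exactly one non-empty label, `**` matches one or more labels, and `@` matches only the apex. The real `MatchPattern` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// MatchPattern reports whether name (relative to the zone; "@" or "" for the
// apex) matches pattern. Matching is case-insensitive.
func MatchPattern(pattern, name string) bool {
	if pattern == "@" || name == "@" || name == "" {
		// The apex only ever matches the "@" pattern, and vice versa.
		return pattern == "@" && (name == "@" || name == "")
	}
	return matchLabels(
		strings.Split(strings.ToLower(pattern), "."),
		strings.Split(strings.ToLower(name), "."),
	)
}

// matchLabels matches pattern labels p against name labels n, left to right.
func matchLabels(p, n []string) bool {
	if len(p) == 0 {
		return len(n) == 0
	}
	if p[0] == "**" {
		// Consume one or more labels, then try to match the remainder.
		for i := 1; i <= len(n); i++ {
			if matchLabels(p[1:], n[i:]) {
				return true
			}
		}
		return false
	}
	if len(n) == 0 || n[0] == "" {
		return false
	}
	if p[0] != "*" && p[0] != n[0] {
		return false
	}
	return matchLabels(p[1:], n[1:])
}

func main() {
	fmt.Println(MatchPattern("*", "www"))                     // single label
	fmt.Println(MatchPattern("*", "a.b"))                     // two labels, not matched by *
	fmt.Println(MatchPattern("**.preview", "pr-1.app.preview")) // multi-segment
	fmt.Println(MatchPattern("@", "@"))                       // apex
}
```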
…om cycle 2
C-1 NEW (cycle-2): STRICT_PROTO conflict → typed owner field in NEW DNSRecordStepInput proto
C-2 NEW (cycle-2): infra.dns_record step type doesn't exist → explicit registration in plugin.go
I-1 NEW (cycle-2): libdns/hover absent → provider coverage matrix; workflow-plugin-hover gRPC for Hover
I-2 NEW (cycle-2): libdns module burden → isolated in internal/dnsprovider/ package
I-3 NEW (cycle-2): step/module conflation → resolved via C-2 NEW
m-1 (cycle-2): bootstrap overwrite guard → --overwrite-existing required if existing policy
m-2 (cycle-2): transfer-records rename → transfer-ownership throughout
m-3 (cycle-2): Serialize validation → ErrMultipleDefaults at write time
…om cycle 3
C-1 NEW: split DNSRecordStepInput into Config+Input+Output (TypedStepFactory[C,I,O] pattern, platform plugin precedent)
C-2 NEW: ship cmd/wfctl-infra-dns binary; wfctl plugin-binary dispatch picks it up as 'wfctl infra-dns <cmd>' (zero core changes)
I-1+I-3 NEW: embed hover.Client HTTP calls directly (v1 copy ~80 lines; v2 extract to pkg/hoverclient)
I-2 NEW: plugin.contracts.json gets explicit step entry per validate-contract requirement
I-4 NEW: ErrMultipleDefaults + sibling sentinels declared in internal/dnspolicy/errors.go
m-2: full TypedStepProvider wiring shown (CreateTypedStep + TypedStepTypes)
m-3: infra.dns module Start() now returns non-nil deprecation error + migration doc
…om cycle 4
C-1 NEW: plugin.contracts.json field names config/input/output (NOT *_descriptor) per plugin_audit.go firstStringField lookup
C-2 NEW: wfctl plugin dispatch reads plugin.json.capabilities.cliCommands[], NOT wfctl-<name> scanning. Ship dns-policy-admin binary + declare in cliCommands
I-1+I-2 NEW: Hover deferred to v2 (508-line client, multi-step adapter, username+password+TOTP auth incompatible with single-token NewAdapter). Zero blast radius (0 Hover zones).
I-3 NEW: stale 'wfctl plugin infra dns' → 'wfctl infra-dns' global replace
I-4 NEW: secret resolution via existing config.ExpandEnvInMapPreservingVars at config-load (matches infra modules)
m-1: ADDITIVE to existing infra.proto (clarified)
m-2: full TypedStepFactory 4-arg signature with handler closure
m-3: libdns/route53 import verification deferred to implementation
…om cycle 5
C-1 NEW: 'binary' field doesn't exist in CLICommandDeclaration. Switched to single-binary --wfctl-cli sentinel pattern (main plugin binary handles both plugin-server + admin CLI roles via os.Args check).
C-2 NEW: dnsprovider.Apply undefined. Defined separate DNSRecordWriter interface; Adapter combines policy R/W + record R/W; explicit Apply signature.
I-1 NEW: Route53/GCP/Azure also break single-token. Changed NewAdapter signature to map[string]string creds. v1 narrowed to DO+Cloudflare only.
I-2 NEW: Hover deferral contradicted by stale paragraph; removed.
I-3 NEW: Bootstrap UX confusion; defined 6-case behavior table.
m-1: --wfctl-cli prefix documented in main().
m-2: Audit log path explicit (XDG_STATE_HOME).
…om cycle 6
C-1 NEW: switched to sdk.ServePluginFull + admincli.CLIProvider (supply-chain plugin precedent)
C-2 NEW: stale provider_token_ref → provider_creds globally
I-1 NEW: ExpandEnvInMapPreservingVars not on step-config path; documented template form + bare-shell form
I-2 NEW: 'remains untouched' contradiction rewritten
I-3 NEW: dnsgate.Gate signature uses narrow DNSPolicyReader interface (not Adapter) for test simplicity
m-1 NEW: sha256 input defined as sha256(strings.Join(sort.StringSlice(Serialize(policy)), '\n'))
m-2 NEW: secret leakage explicit guidance: dnsprovider redacts creds in errors
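The hash definition in m-1 can be sketched as follows. Note that in Go, `sort.StringSlice(x)` is only a type conversion and does not sort by itself, so this sketch sorts explicitly with `sort.Strings`, on the assumption that an order-independent hash is what the design intends. `PolicyHash` and its `[]string` input (one serialized record per element) are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// PolicyHash returns a hex sha256 over the serialized policy records,
// independent of the order in which the records were supplied.
func PolicyHash(records []string) string {
	sorted := append([]string(nil), records...) // copy; leave the caller's slice untouched
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := PolicyHash([]string{
		"heritage=wfinfra-v1 o=sre p=** d=true",
		"heritage=wfinfra-v1 o=multisite p=www",
	})
	b := PolicyHash([]string{
		"heritage=wfinfra-v1 o=multisite p=www",
		"heritage=wfinfra-v1 o=sre p=** d=true",
	})
	fmt.Println(a == b) // same records, different order, same hash
}
```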
Handler snippet: expandCredsMap + audit-attempt comment added
Serialize: pattern + type within-entry sorting documented (deterministic hash)
NewAdapter: case-folding via strings.ToLower documented
Prose: 'dnsgate.Gate.CheckAllowed' clarified as 'Gate (function) → Policy.CheckAllowed (method)'
Apply-attempt audit: pre-mutation log entry + post-Apply outcome entry
10 tasks across 1 PR. TDD per-task with failing-test-first discipline. Implements design at 2026-05-25-dns-ownership-policy-design.md (7 adversarial cycles PASSed). 3 internal packages (dnspolicy + dnsgate + dnsprovider), new infra.dns_record typed step, admincli CLIProvider with 4 subcommands, audit log, infra.dns deprecation. Single PR feat/dns-ownership-policy. Runtime-launch validation triggered at Task 10 (main.go startup config + version pin change). Rollback paths per task + PR-level revert.
… plan cycle 1
C-1 m.typeName→m.infraType (correct struct field per plugin.go:193)
C-2 main.go uses internal.Version (existing goreleaser ldflag target, no new var)
C-3 CheckAllowed two-phase: explicit claims block default fallback for non-claimer
C-4 DO upsertRecords helper: GET+SetRecords(ID) for updates, AppendRecords for new
C-5 DO DeleteRecord: GET first for IDs then DeleteRecords (idempotent)
I-1 Removed make proto alternative (Makefile lacks target); protoc canonical
I-2 ExpandCredsMap exported from Task 7 (was lowercase causing Task 8 compile fail)
I-3 Extracted internal/dnsaudit/ as shared package (was wrongly in admincli)
I-4 plugin.json surgical edits with correct 13 module type names
m-1 added empty-string test for *
m-3 SOA/NS protection documented for ops
m-2/m-4 acknowledged as accepted
… plan cycle 2
C-1 NEW plugin_test.go hard-fatalf on non-module contracts → patch instructions added for both kind-guards + new TestContractDeclaresStrictStepContracts
C-2 NEW Task 9 audit test wrong package → moved to internal/dnsaudit/audit_test.go (package dnsaudit)
I-1 Task 10 manual build ldflag wrong path → -X github.com/.../internal.Version=...
I-2 Priority int32→uint wrap → validation guard added (Task 7/8)
I-3 Phase-2 CheckAllowed ignores Types restriction → added Types check for default owner
m-1 internal/version.go misleading prose → corrected to internal/plugin.go:25
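The two-phase CheckAllowed logic from the plan cycles (explicit claims block the default fallback; the Types restriction also applies to the default owner) might look roughly like this. Everything here is a hedged sketch: the type shapes are hypothetical, claims use exact name matching to stay short where the real code uses patterns, and the SOA/NS rule shown (apex SOA and NS are never writable) is an assumption about what the protection covers.

```go
package main

import "fmt"

// Entry and Policy are hypothetical shapes for this sketch.
type Entry struct {
	Owner    string
	Patterns []string
	Types    []string // empty means any record type
	Default  bool
}

type Policy struct{ Entries []Entry }

// typeAllowed reports whether the entry permits rrType.
func typeAllowed(e Entry, rrType string) bool {
	if len(e.Types) == 0 {
		return true
	}
	for _, t := range e.Types {
		if t == rrType {
			return true
		}
	}
	return false
}

// claims uses exact matching to keep the sketch short.
func claims(e Entry, name string) bool {
	for _, p := range e.Patterns {
		if p == name {
			return true
		}
	}
	return false
}

// CheckAllowed reports whether owner may write (name, rrType).
func (p Policy) CheckAllowed(name, rrType, owner string) bool {
	// Assumed protection: never hand out apex SOA/NS.
	if name == "@" && (rrType == "SOA" || rrType == "NS") {
		return false
	}
	// Phase 1: explicit claims. Any claim on the name blocks the default fallback.
	claimed := false
	for _, e := range p.Entries {
		if !claims(e, name) {
			continue
		}
		claimed = true
		if e.Owner == owner && typeAllowed(e, rrType) {
			return true
		}
	}
	if claimed {
		return false
	}
	// Phase 2: default owner, still subject to its Types restriction.
	for _, e := range p.Entries {
		if e.Default && e.Owner == owner && typeAllowed(e, rrType) {
			return true
		}
	}
	return false // no claim and no matching default: fail closed
}

func main() {
	pol := Policy{Entries: []Entry{
		{Owner: "multisite", Patterns: []string{"www"}},
		{Owner: "sre", Default: true, Types: []string{"TXT"}},
	}}
	fmt.Println(pol.CheckAllowed("www", "A", "multisite")) // explicit claim
	fmt.Println(pol.CheckAllowed("www", "TXT", "sre"))     // blocked by multisite's claim
	fmt.Println(pol.CheckAllowed("mail", "TXT", "sre"))    // default fallback
}
```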
- gocodealone-dns mirror: explicitly out-of-scope (separate-repo deliverable; followup issue)
- per-zone policy cache: added CachingGate to Task 5 + test for 1-GetTXT-per-zone behavior
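The 1-GetTXT-per-zone behavior can be illustrated with a small caching wrapper. The interface below is a narrowed, hypothetical form of DNSPolicyReader (no context argument, a guessed method signature), and `countingReader` is a test double invented for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// DNSPolicyReader is a hypothetical, simplified reader interface.
type DNSPolicyReader interface {
	GetTXT(zone, name string) ([]string, error)
}

// CachingGate caches policy TXT records per zone for its own lifetime.
type CachingGate struct {
	reader DNSPolicyReader
	mu     sync.Mutex
	cache  map[string][]string
}

func NewCachingGate(r DNSPolicyReader) *CachingGate {
	return &CachingGate{reader: r, cache: make(map[string][]string)}
}

// PolicyRecords returns the policy records for zone, fetching at most once
// per zone. Errors are not cached, so a later call retries the lookup.
func (g *CachingGate) PolicyRecords(zone string) ([]string, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if recs, ok := g.cache[zone]; ok {
		return recs, nil
	}
	recs, err := g.reader.GetTXT(zone, "_workflow-dns-policy."+zone)
	if err != nil {
		return nil, err
	}
	g.cache[zone] = recs
	return recs, nil
}

// countingReader is a test double that counts lookups.
type countingReader struct{ calls int }

func (c *countingReader) GetTXT(zone, name string) ([]string, error) {
	c.calls++
	return []string{"heritage=wfinfra-v1 o=sre p=** d=true"}, nil
}

func main() {
	r := &countingReader{}
	g := NewCachingGate(r)
	for i := 0; i < 3; i++ {
		if _, err := g.PolicyRecords("example.com"); err != nil {
			panic(err)
		}
	}
	fmt.Println("GetTXT calls:", r.calls)
}
```

Holding the mutex across the lookup keeps the sketch simple; a production version might prefer per-zone locking so one slow zone does not stall others.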
internal/dnspolicy package: Parse + Serialize + Policy + Entry types + ErrMultipleDefaults/ErrEmptyOwner/ErrUnknownHeritage sentinels. 6 tests covering parse happy path, foreign-RR skip, multiple-default error (parse + serialize), empty-owner error, deterministic ordering, round-trip idempotency.
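A minimal sketch of what Parse could look like, covering foreign-RR skip and the two sentinel errors. The token layout (space-separated `key=value`, heritage first, comma-joined lists) and the error messages are assumptions; ErrUnknownHeritage is left out because it is described elsewhere in this PR as reserved for future use.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrMultipleDefaults = errors.New("dnspolicy: multiple default owners")
	ErrEmptyOwner       = errors.New("dnspolicy: empty owner")
)

// Entry is a hypothetical parsed form of one owner record.
type Entry struct {
	Owner    string
	Patterns []string
	Types    []string
	Default  bool
}

// Parse turns the TXT strings found at the policy name into entries.
// Records that do not start with the heritage sentinel are treated as
// foreign and skipped rather than rejected.
func Parse(records []string) ([]Entry, error) {
	var entries []Entry
	defaults := 0
	for _, rec := range records {
		fields := strings.Fields(rec)
		if len(fields) == 0 || fields[0] != "heritage=wfinfra-v1" {
			continue // foreign RR
		}
		var e Entry
		for _, f := range fields[1:] {
			key, val, ok := strings.Cut(f, "=")
			if !ok {
				continue
			}
			switch key {
			case "o":
				e.Owner = val
			case "p":
				e.Patterns = strings.Split(val, ",")
			case "t":
				e.Types = strings.Split(val, ",")
			case "d":
				e.Default = val == "true"
			}
		}
		if e.Owner == "" {
			return nil, ErrEmptyOwner
		}
		if e.Default {
			defaults++
		}
		entries = append(entries, e)
	}
	if defaults > 1 {
		return nil, ErrMultipleDefaults
	}
	return entries, nil
}

func main() {
	entries, err := Parse([]string{
		"v=spf1 -all", // foreign record, skipped
		"heritage=wfinfra-v1 o=multisite p=www,*.preview t=A,CNAME",
		"heritage=wfinfra-v1 o=sre p=** d=true",
	})
	fmt.Println(len(entries), err)
}
```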
…ly + ExpandCredsMap
internal/dnsprovider package isolates the libdns boundary. Adapter combines DNSPolicyReader + DNSRecordWriter. NewAdapter dispatches on provider name (case-folded) + ErrUnknownProvider sentinel. v1 supports digitalocean + cloudflare (single-token providers). ExpandCredsMap applies os.ExpandEnv for bare-shell creds. Apply performs post-gate upsert/delete with redacted provider errors.
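As a rough illustration of the credential expansion and case-folded dispatch described above: `expandCredsWith` and `ResolveProvider` are hypothetical helpers for this sketch (the real NewAdapter returns an adapter, not a string), and the sketch relies only on the standard-library fact that `os.ExpandEnv(s)` is equivalent to `os.Expand(s, os.Getenv)`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

var ErrUnknownProvider = errors.New("dnsprovider: unknown provider")

// expandCredsWith returns a copy of creds with $VAR / ${VAR} references
// resolved through lookup. Taking lookup as a parameter keeps it testable.
func expandCredsWith(creds map[string]string, lookup func(string) string) map[string]string {
	out := make(map[string]string, len(creds))
	for k, v := range creds {
		out[k] = os.Expand(v, lookup)
	}
	return out
}

// ExpandCredsMap resolves references against the process environment.
func ExpandCredsMap(creds map[string]string) map[string]string {
	return expandCredsWith(creds, os.Getenv)
}

// ResolveProvider case-folds the provider name and checks it against the
// v1 provider set (digitalocean, cloudflare).
func ResolveProvider(name string) (string, error) {
	switch p := strings.ToLower(name); p {
	case "digitalocean", "cloudflare":
		return p, nil
	default:
		// Only the provider name is echoed back; credentials never appear in errors.
		return "", fmt.Errorf("%w: %q", ErrUnknownProvider, name)
	}
}

func main() {
	creds := ExpandCredsMap(map[string]string{"token": "${DO_TOKEN}"})
	p, err := ResolveProvider("DigitalOcean")
	fmt.Println(p, err, len(creds))
}
```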
…fra.dns module
- StepTypes/CreateStep + TypedStepTypes/CreateTypedStep wire the new step type
- Handler closure: ExpandCredsMap → NewAdapter → Gate → Apply
- infra.dns module Start() returns migration-hint error
- docs/migration/infra-dns-to-step.md guides operators
Rollback: revert commit + remove docs file. (Module deprecation reverses; existing infra.dns YAML configs would work again.)
internal/admincli implements sdk.CLIProvider with set-policy / drift / transfer-ownership / policy show subcommands. Audit log at $XDG_STATE_HOME/wfctl/plugins/workflow-plugin-infra/dns-policy-audit.jsonl captures policy edits + apply attempts (LogAttempt/LogOutcome/LogPolicyEdit). RunCLI returns exit code via ServePluginFull → os.Exit.
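Resolving the audit log path might be sketched as below. The fallback to `$HOME/.local/state` when XDG_STATE_HOME is unset follows the XDG Base Directory default; `auditLogPath` and its injected `getenv` parameter are hypothetical, chosen so the function can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// auditLogPath builds the audit log location under the XDG state directory,
// falling back to $HOME/.local/state when XDG_STATE_HOME is unset or empty.
func auditLogPath(getenv func(string) string) string {
	state := getenv("XDG_STATE_HOME")
	if state == "" {
		state = filepath.Join(getenv("HOME"), ".local", "state")
	}
	return filepath.Join(state, "wfctl", "plugins", "workflow-plugin-infra", "dns-policy-audit.jsonl")
}

func main() {
	fmt.Println(auditLogPath(os.Getenv))
}
```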
…mands manifest
main.go switches from sdk.Serve to sdk.ServePluginFull so the SDK handles --wfctl-cli dispatch + os.Exit propagation. plugin.json declares capabilities.cliCommands[infra-dns] so wfctl plugin install registers the dynamic subcommand. version restored to 0.0.0 sentinel per workflow#758 ldflag pattern.
Rollback: revert commit + rebuild from previous main.go (sdk.Serve only; infra-dns CLI subcommand unavailable but plugin gRPC serving intact).
- DO UpsertTXT: separate upsertTXTRRset helper does DELETE-all-then-APPEND for RRset-replace semantic (closes Critical: stale TXT entries surviving update)
- DO type-assert guarded: returns error instead of panic on unexpected libdns concrete type
- transfer-ownership: reject --from X --to X with exit 2 + clear stderr
- migration doc: examples match actual --owner/--patterns/--token flags; future file-input/bootstrap modes flagged
- plugin.go handler: shared CachingGate per step instance (amortize GetTXT across records)
- ErrUnknownHeritage doc clarifies reserved-for-future status
- admincli subcommands use dnsgate.PolicyName helper (no hardcoded prefix)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
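The RRset-replace semantic from the first fix can be shown against an in-memory stand-in. `memZone` and `replaceTXT` are invented for this sketch and do not use libdns or the DigitalOcean API; the point is only that delete-all followed by append leaves exactly the desired set, with no stale values.

```go
package main

import "fmt"

// memZone is an in-memory stand-in for a provider: record name -> TXT values.
type memZone map[string][]string

// replaceTXT applies RRset-replace semantics: every existing TXT value at
// name is deleted, then the desired values are appended.
func (z memZone) replaceTXT(name string, values []string) {
	delete(z, name) // DELETE-all
	if len(values) > 0 {
		z[name] = append([]string(nil), values...) // APPEND
	}
}

func main() {
	name := "_workflow-dns-policy.example.com"
	z := memZone{name: {"heritage=wfinfra-v1 o=old p=www", "heritage=wfinfra-v1 o=sre p=** d=true"}}
	z.replaceTXT(name, []string{"heritage=wfinfra-v1 o=sre p=** d=true"})
	fmt.Println(len(z[name])) // only the desired set remains
}
```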
intel352 added a commit that referenced this pull request on May 26, 2026
Summary
Ships workflow-plugin-infra v0.2.0 with a per-record DNS ownership policy gate. Apps (multisite, BMW, ratchet) provision DNS records via the new typed infra.dns_record step; the gate validates against a zone-root _workflow-dns-policy.<zone> TXT policy. SRE manages the policy via a new wfctl infra-dns admin CLI.
- _workflow-dns-policy.<zone> (one RR per owner, CAA-style; heritage=wfinfra-v1 o=<owner> p=<patterns> [t=<types>] [d=true]).
- infra.dns_record STEP type registered as TypedStepFactory[Config,Input,Output]; replaces deprecated infra.dns module.
- internal/dnspolicy (parser + serializer + matcher + CheckAllowed) + internal/dnsgate (Gate + CachingGate for per-zone caching across multi-record applies) + internal/dnsprovider (libdns DO+CF adapters, RRset-replace semantic for DO TXT updates) + internal/dnsaudit (shared audit log) + internal/admincli (wfctl infra-dns set-policy/drift/transfer-ownership/policy show).
- sdk.ServePluginFull + sdk.CLIProvider wiring; plugin.json declares capabilities.cliCommands for the dynamic wfctl infra-dns subcommand.
Design
See: docs/plans/2026-05-25-dns-ownership-policy-design.md (7 adversarial cycles PASSed).
Implementation Plan
See: docs/plans/2026-05-25-dns-ownership-policy.md (3 plan-phase adversarial cycles PASSed; alignment-check PASSed in cycle 2; scope-locked 2026-05-25T20:24:21Z, sha256=953622ea2d87…).
Scope Manifest
Changes
- da89d22: internal/dnspolicy/ Parse + Serialize + sentinel errors (ErrMultipleDefaults, ErrEmptyOwner, ErrUnknownHeritage); 7 tests including round-trip + deterministic serialize.
- aba18f2: MatchPattern (* single label, ** multi-segment, @ apex) + 15 test cases.
- 0afb235: Policy.CheckAllowed 2-phase logic (explicit claims block default fallback) + SOA/NS protection + Types restriction (also applied to default owner).
- 00bcc8a: DNSPolicyReader + DNSRecordWriter + combined Adapter interfaces.
- 0c5e170: dnsgate.Gate (uncached) + dnsgate.CachingGate (per-zone cache; 1 GetTXT across N records in same zone). Test verifies caching behavior.
- 7b0e78a: internal/dnsaudit/ shared audit package (LogAttempt + LogOutcome + LogPolicyEdit). Audit log path: ${XDG_STATE_HOME:-$HOME/.local/state}/wfctl/plugins/workflow-plugin-infra/dns-policy-audit.jsonl.
- 8a2214a: 3-message proto split (DNSRecordStepConfig/Input/Output); plugin.contracts.json step entry with config/input/output field names; plugin_test.go kind-guards patched for step contracts.
- 0573b94: internal/dnsprovider/ libdns v1.1.1 adapters for DigitalOcean + Cloudflare; NewAdapter with case-folded provider name + map[string]string creds; ExpandCredsMap exported for handler use; Apply post-gate mutation; DO uses GET → AppendRecords + SetRecords(ID) pattern.
- 437e535: registers infra.dns_record step (both StepTypes + TypedStepTypes); handler uses shared CachingGate per step instance; deprecates infra.dns module Start() with migration error; ships docs/migration/infra-dns-to-step.md.
- e5a89b2: internal/admincli/ CLIProvider with 4 subcommands (set-policy, drift, transfer-ownership, policy show); audit hooks; subcommands use dnsgate.PolicyName(zone) helper.
- 6bea8c6: cmd/workflow-plugin-infra/main.go switches to sdk.ServePluginFull with admincli.CLIProvider; plugin.json declares capabilities.cliCommands: [infra-dns], bumps minEngineVersion: 0.51.7 → 0.64.0, restores version: 0.0.0 sentinel per #758 ldflag pattern.
Post-review fixes (420b46f)
Code review surfaced 1 Critical + 4 Important + 2 Minor:
- UpsertTXT was leaving stale TXT entries (loop early-break bug). Fixed via new upsertTXTRRset helper (DELETE-all-then-APPEND for RRset-replace semantic, matching Cloudflare's SetRecords contract).
- transfer-ownership --from X --to X now errors with exit 2 + clear stderr (was silently deleting).
- Migration doc examples match the actual --owner/--patterns/--token flag surface; future -f/--bootstrap modes flagged for followup.
- plugin.go handler shares one CachingGate per step instance (amortizes GetTXT across records).
- ErrUnknownHeritage doc clarifies reserved-for-future status.
- admincli subcommands use the dnsgate.PolicyName(zone) helper (no hardcoded prefix).
Test Plan
- GOWORK=off go test -count=1 -timeout 300s ./... → 7 packages PASS (admincli, contracts, dnsaudit, dnsgate, dnspolicy, dnsprovider, internal)
- GOWORK=off go vet ./... → 0 warnings
- GOWORK=off go build ./... → exit 0
- wfctl plugin validate-contract --for-publish --tag v0.2.0 . → PASS
- wfctl plugin verify-capabilities --binary <built> . → OK with ldflag-injected version
- <binary> --wfctl-cli infra-dns → exits 2 + prints "usage" (CLI dispatch sanity)
Out of scope (per Scope Manifest)
- infra.dns module rewrite (deprecated to migration-error stub only)
- gocodealone-dns/ownership/<zone>.yaml mirror + import script (separate-repo deliverable, followup)
- set-policy -f <yaml> --bootstrap --overwrite-existing flag surface (v1 ships individual-flag set-policy; bootstrap modes deferred)
Adversarial process record
🤖 Generated with Claude Code