[WIP] Upgrade Modular #4
Closed
Copilot wants to merge 2 commits into
Contributor
@copilot Try again
Copilot stopped work on behalf of intel352 due to an error
July 11, 2025 19:57
intel352 added a commit that referenced this pull request
May 2, 2026
Important #1 — pre-scan all ghosts for protected resources before any state mutation (infra_apply_refresh.go). The original loop could prune an unprotected ghost then fail on a protected one, leaving partial state. Two-pass pattern: collect all blocked names first, return error listing every blocked resource, then execute mutations only when pre-scan passes.

Important #2 — validate --allow-protected-prune requires --refresh (infra.go). Without this check the flag was silently no-op'd, misleading operators. Now returns a clear pre-flight error before any work begins.

Minor #3 — replace broken docs/plans/2026-05-02-infra-drift-recovery.md link in drift-recovery.md (design worktree path, never merged) with a pointer to the canonical source file.

Minor #4 — markdown table was already correct standard format; no change needed (table separator rows are standard |---|---|).

Tests added:
- TestApplyRefresh_MultipleGhostsAllOrNothing (all-or-nothing invariant)
- TestApplyRefresh_AllGhostsUnprotectedPrunesAll (pre-scan allows clean batch)
- TestInfraApply_AllowProtectedPruneRequiresRefresh (flag validation)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
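The two-pass pattern from Important #1 can be sketched roughly as follows. This is a minimal illustration, not the code in infra_apply_refresh.go: the `ghost` type, the `pruneGhosts` function, the `del` callback, and the error wording are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ghost is a hypothetical stand-in for a state entry that drift detection
// classified as a ghost (present in state, missing in the cloud).
type ghost struct {
	Name      string
	Protected bool
}

// pruneGhosts sketches the two-pass pattern.
// Pass 1 collects every blocked (protected) ghost without touching state.
// Pass 2 runs the deletions only when pass 1 found nothing blocked.
func pruneGhosts(ghosts []ghost, allowProtected bool, del func(name string) error) error {
	var blocked []string
	for _, g := range ghosts {
		if g.Protected && !allowProtected {
			blocked = append(blocked, g.Name)
		}
	}
	if len(blocked) > 0 {
		sort.Strings(blocked)
		// The error lists every blocked resource, not just the first one.
		return fmt.Errorf("refusing to prune protected resources: %s", strings.Join(blocked, ", "))
	}
	for _, g := range ghosts {
		if err := del(g.Name); err != nil {
			return fmt.Errorf("prune %s: %w", g.Name, err)
		}
	}
	return nil
}

func main() {
	var deleted []string
	del := func(name string) error {
		deleted = append(deleted, name)
		return nil
	}
	ghosts := []ghost{{Name: "cache"}, {Name: "db", Protected: true}}
	err := pruneGhosts(ghosts, false, del)
	fmt.Println("error:", err)
	fmt.Println("deleted:", len(deleted))
}
```

With one protected ghost in the batch, the sketch returns an error naming `db` and deletes nothing, which is the all-or-nothing invariant the new tests describe. Note that in this sketch the guarantee covers the protection check only; a storage failure during pass 2 could still leave a partially pruned batch.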
intel352 added a commit that referenced this pull request
May 2, 2026
…ction state recovery (#519)

* feat(interfaces): add DriftClass enum + Class field to DriftResult

Add DriftClass string type with 4 constants:
- DriftClassUnknown (zero value, omitempty-safe for backwards compat)
- DriftClassInSync
- DriftClassGhost (state has resource; cloud returns ErrResourceNotFound)
- DriftClassConfig (both exist; configs differ)

Extend DriftResult with Class DriftClass json:"class,omitempty" field (additive, backwards-compatible — consumers without the field see no JSON change due to omitempty).

4 tests covering constant values, omitempty-on-zero, ghost JSON rendering, and round-trip marshal/unmarshal for all 3 non-zero classes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(wfctl): implement runInfraApplyRefreshPhase for ghost-prune recovery

New function runInfraApplyRefreshPhase calls provider.DetectDrift and prunes ghost-in-state entries (DriftClassGhost) from the state store:
- Dry-run by default (no autoApprove): prints "would prune" per ghost
- autoApprove=true: calls store.DeleteResource + emits audit log to stderr
- Protected resources blocked unless allowProtectedPrune=true
- Transient DetectDrift errors propagate immediately; no pruning happens
- DriftClassConfig / DriftClassInSync entries skipped (regular plan path)

6 tests covering: dry-run no-mutate, auto-approve prune, protected-block, protected-with-flag, transient-error-propagation, in-sync-skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(wfctl): wire --refresh + --allow-protected-prune flags to infra apply

Add two flags to runInfraApply:
- --refresh: runs runInfraApplyRefreshPhase before plan+apply, iterating all state-tracked provider groups via groupStatesByProvider and pruning any DriftClassGhost entries.
- --allow-protected-prune: passed to runInfraApplyRefreshPhase to permit pruning resources with protected:true in state Outputs.

Refresh phase only fires when --refresh is set and the config has infra.* modules; silently skipped for legacy platform.* configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(wfctl): extend infra drift output with Class column

driftInfraModules now prints drift class (GHOST / CONFIG / IN-SYNC) using the DriftClass constants from interfaces:

  GHOST    <name>  <type>  — cloud reports not found
  CONFIG   <name>  <type>  <field>: expected=<v> actual=<v>
  IN-SYNC  <name>  <type>

Providers still returning DriftClassUnknown fall through to the legacy Drifted-bool behavior for backwards compatibility. Column-aligned format matches wfctl infra status output style. Drift-found message updated to suggest --refresh flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add drift-recovery operator guide + CHANGELOG Unreleased entry

docs/wfctl/drift-recovery.md (~100 lines) covering:
- Three drift classes (ghost / config / in-sync) with recovery actions
- wfctl infra drift usage + example output with Class column
- Dry-run-first workflow → auto-approve prune
- Protected resource two-key contract (--allow-protected-prune)
- Audit log format
- Production safety checklist
- CI integration patterns

CHANGELOG.md Unreleased section: DriftClass enum, --refresh flag, --allow-protected-prune flag, drift output Class column, docs file. Notes omitempty additions to DriftResult.Expected/Actual/Fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(wfctl): address 4 Copilot review findings on apply --refresh

Important #1 — pre-scan all ghosts for protected resources before any state mutation (infra_apply_refresh.go). The original loop could prune an unprotected ghost then fail on a protected one, leaving partial state. Two-pass pattern: collect all blocked names first, return error listing every blocked resource, then execute mutations only when pre-scan passes.

Important #2 — validate --allow-protected-prune requires --refresh (infra.go). Without this check the flag was silently no-op'd, misleading operators. Now returns a clear pre-flight error before any work begins.

Minor #3 — replace broken docs/plans/2026-05-02-infra-drift-recovery.md link in drift-recovery.md (design worktree path, never merged) with a pointer to the canonical source file.

Minor #4 — markdown table was already correct standard format; no change needed (table separator rows are standard |---|---|).

Tests added:
- TestApplyRefresh_MultipleGhostsAllOrNothing (all-or-nothing invariant)
- TestApplyRefresh_AllGhostsUnprotectedPrunesAll (pre-scan allows clean batch)
- TestInfraApply_AllowProtectedPruneRequiresRefresh (flag validation)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
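The pre-flight check from Important #2 could look something like the sketch below, built on the standard library `flag` package. Only the two flag names come from the commit message; `applyFlags`, `parseApplyFlags`, the help strings, and the error wording are hypothetical, and the real runInfraApply may wire its flags differently.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// applyFlags is a hypothetical container for the two flags discussed above.
type applyFlags struct {
	Refresh             bool
	AllowProtectedPrune bool
}

// parseApplyFlags parses the arguments and rejects flag combinations that
// would otherwise be silently ignored, before any work begins.
func parseApplyFlags(args []string) (applyFlags, error) {
	var f applyFlags
	fs := flag.NewFlagSet("infra apply", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage noise out of the example output
	fs.BoolVar(&f.Refresh, "refresh", false, "prune ghost entries before plan+apply")
	fs.BoolVar(&f.AllowProtectedPrune, "allow-protected-prune", false, "permit pruning protected resources")
	if err := fs.Parse(args); err != nil {
		return f, err
	}
	// --allow-protected-prune only has meaning during the refresh phase.
	if f.AllowProtectedPrune && !f.Refresh {
		return f, errors.New("--allow-protected-prune requires --refresh")
	}
	return f, nil
}

func main() {
	_, err := parseApplyFlags([]string{"--allow-protected-prune"})
	fmt.Println(err)
}
```

Failing at parse time, rather than ignoring the flag, means an operator who forgot `--refresh` learns about it before any state or cloud call is made.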
intel352 added a commit that referenced this pull request
May 4, 2026
…rkflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml (test + lint jobs)
   - .github/workflows/benchmark.yml (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows.
intel352 added a commit that referenced this pull request
May 4, 2026
…ft postcondition + diff cache (W-3a of 12) (#527) * feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type * feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel * feat(iac): wfctl infra plan writes InputSnapshot to plan.json * feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash * feat(iac): wfctl infra plan warns when plan.json not in .gitignore * feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring * feat(iac): add refreshoutputs.Refresh — read-only state output refresh T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add wfctl infra refresh-outputs subcommand T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed. 
When the resolved config has no usable iac.provider module for the requested env, emits the literal error refresh-outputs: provider not configured for env "<env>" verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): concurrency stress test for refreshoutputs.Refresh T2.5 — pure-package stress test in iac/refreshoutputs/. 
Drives Refresh with 100 fake resources at Concurrency=8 and asserts: 1. No deadlock (10s watchdog around the call). 2. Read called exactly once per ProviderID (atomic per-ID counter). 3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency. 4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit. The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document infra refresh-outputs subcommand T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md: - New row in the Command Tree mermaid graph. - New row in the infra Action table. - Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples. See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect. Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm): $ mdformat --check docs/WFCTL.md Error: File "docs/WFCTL.md" is not formatted. exit=1 This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). 
T2.6's own additions follow the existing in-file conventions exactly. $ markdown-link-check docs/WFCTL.md FILE: docs/WFCTL.md [✓] https://github.com/GoCodeAlone/workflow [✓] #build-ui [✓] mcp.md 3 links checked. exit=0 docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope. T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7): $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl exit=0 $ /tmp/wfctl infra refresh-outputs --help Usage of infra refresh-outputs: -c string Config file (short for --config) -concurrency int Maximum concurrent Read calls (default 8) -config string Config file -e string Environment name (short for --env) -env string Environment name (resolves per-module overrides) exit=0 $ cat /tmp/t27-fake.yaml modules: - name: state-store type: iac.state config: backend: filesystem directory: /tmp/t27-fake-state $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging error: refresh-outputs: provider not configured for env "staging" exit=1 No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49. PR W-2 mandate (plan line 1101): $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR. 
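The WFCTL_REFRESH_OUTPUTS activation rules from the T2.3 commit message can be expressed in a few lines with `strconv.ParseBool`. The function name `refreshEnabled` and its parameters are hypothetical; this is a sketch of the documented rules, not the code in infra_apply_refresh_pre.go.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled reports whether the apply-time pre-step should run.
// envValue is the raw value of WFCTL_REFRESH_OUTPUTS ("" when unset) and
// skipRefresh mirrors the --skip-refresh flag.
func refreshEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag wins regardless of the env var
	}
	on, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // unset, empty, or unrecognised: default off
	}
	return on
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshEnabled(v, false))
	}
}
```

The contrast with a presence-only check (`value != ""`) is the case `"0"`: a presence check would enable the pre-step, while the ParseBool form keeps it off.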
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3 ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review: - Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`. - Code-reviewer flagged this as a foot-gun (=0 mis-enables). - Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable. - Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions. - Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself. Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1) * fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema Addresses code-reviewer findings on commit 695a070: - Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward. - Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics. - Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design. 
New test covers both the typo-rejection and the root-permissivity contracts. * feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields * feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch) * fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract Addresses code-reviewer findings on commit 13a6fad: - Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers. - Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc. - Minor: added TestApplyResult_OmitEmptyContract — table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
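The omitempty contract described above relies on `encoding/json` treating both nil and empty-but-non-nil maps and slices as empty. A small sketch, assuming map and slice field types: the JSON keys are taken from the commit message, but the Go types, the lowercase `applyResult` struct, and the `encode` helper are guesses made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult mirrors only the three new fields; the field types here are
// assumptions, not the real interfaces.ApplyResult definition.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals a result and panics on failure (fine for a sketch).
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// nil fields
	fmt.Println(encode(applyResult{}))
	// empty but non-nil fields
	fmt.Println(encode(applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}))
	// keyed by the replaced resource's name, as in the roundtrip fixture
	fmt.Println(encode(applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```

The first two calls print `{}` and the third prints `{"replace_id_map":{"vpc":"new-uuid"}}`. If any of these fields were a plain struct rather than a map or slice, omitempty would not drop it, which is one reason a pinning test is useful here.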
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test Addresses code-reviewer findings on commit 8416498: - Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely. - Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success. - Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary. - Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled. - Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner. 
Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once. * fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change. * feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset) * feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery * feat(iac): doUpdate + doDelete actions * feat(iac): doReplace populates ApplyResult.ReplaceIDMap * feat(iac): add diff cache with LRU eviction + corruption recovery * fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy Three independent review-fix bundles: T3.1.5 (commit f5a7ce9 review — Minor 1): - apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically — keeps the cross-package fixture parity guaranteed. T3.2 (commit 0c30eec review — Minors 1 + 2): - apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion — distinct code path from the existing ok-but-SupportsUpsert==false test. 
Compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver. - apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. Also documents the single-pass recovery contract (recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop). T3.3 (commit a3fc98b review — Minors 1 + 2 + 4): - apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path — locks the resource-append contract so a regression that skipped the append on nil Current would fail loudly. - apply_update_delete_test.go gains parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: empty ProviderID flows to driver, no synthesized precondition error, deleteCount==1 (latent bug-fix from design — the v1 path silently skipped Delete; v2 must call it). - apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency." * fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create — surprising operators who expected fast interruption mid-Replace. 
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable. Failure shape: return fmt.Errorf("replace: canceled after delete: %w", err) Wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface. New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a captured context.CancelFunc as a side-effect, simulating exact post-Delete cancellation. Asserts Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, error has the canonical prefix. Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this commit since it's the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) intentionally skipped — all documentary or out-of-scope per reviewer guidance. * docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted: 1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files. 2. 
**WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`: - .github/workflows/ci.yml (test + lint jobs) - .github/workflows/benchmark.yml (performance benchmarks) - .github/workflows/pre-release.yml (pre-release tests) - .github/workflows/release.yml (release tests) - .github/workflows/dependency-update.yml (post-update test gate) Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference. Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows. * fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060: - Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents — never a partial write. The corruption-recovery path in Get is still the safety net for cross- filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering needed. On rename failure, best-effort cleanup of the temp file. - Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms. 
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining the tags are for log/transcript serialization, not cache keying — keyFingerprint uses NUL-separated string concat, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer. - Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless — no code path creates a file with `*` in its name. Likely leftover from earlier debugging. Removing it lets us drop the now-unused `os` import. - Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope. Skipped per reviewer guidance: - Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope." - Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation." * fix(iac): diffcache.Get refreshes mtime so LRU is actually LRU (Copilot review) Without this, frequently-read entries were evicted as if unused because maybeEvict orders by mtime. Now Get touches mtime via os.Chtimes(now, now), turning eviction from FIFO-by-write into true LRU. Mtime-touch chosen over a sidecar last-accessed file to keep the on-disk shape trivial; cost is one extra syscall per hit, errors are ignored (failure degrades eviction precision but never produces wrong cache results). Adds TestCache_LRURefreshesOnGet regression test: writes N entries, Gets the oldest, then triggers over-cap, asserts the oldest survives and the second-oldest (now the LRU) is evicted instead. 
* fix(iac): diffcache.Put uses unique temp filename to avoid same-key write races (Copilot review) Pre-fix, two goroutines calling Put with the same Key both wrote to `<key>.json.tmp` and one would clobber the other's temp file mid-write, producing either a Rename failure or a half-written final file. Now Put uses os.CreateTemp so each call gets a unique `<key>.json.<random>.tmp` filename; the final rename is racy on which payload wins, but both payloads were derived from the same Key so the outcome is deterministic from the caller's perspective. Adds godoc "Concurrency: safe for concurrent use, including concurrent Puts of the same Key." Adds TestCache_ConcurrentSameKeyPut regression: 20 goroutines Put the same Key, asserts no leftover *.tmp files, asserts final cache file decodes. Run under -race. * fix(iac): diffcache.Put atomic rename on Windows (Copilot review) Document the os.Rename Windows limitation explicitly: on Windows, os.Rename fails when the destination exists, so an in-place cache update via Put will fail. The caller treats this as a write failure and proceeds without caching — correct because apply remains correct on a 100% miss rate (per the package's cache-as-amortization framing). We chose documentation over vendoring github.com/google/renameio: adding renameio would introduce the first such dependency in the repo, and there is no Windows-supported wfctl use case today. The existing precedent in cmd/wfctl/update.go and cmd/wfctl/plugin_install.go also uses bare os.Rename without Windows guards. The fix tracks the limitation in two places: the Put godoc (where the rename happens) and the package godoc Known Limitations section (where consumers will look). * fix(iac): diffcache returns deep-copy of DiffResult to avoid shared-slice mutation (Copilot review) Pre-fix, the in-memory cache stored DiffResult by value but the Changes slice ([]FieldChange) shared its backing array between the cached entry and the value returned to the caller. 
A caller mutating the returned Changes slice (element-level or via append- into-cap) would silently mutate the cached entry. The symmetric case is the same: mutating the Put argument after the Put call would leak into the cached value. Fix: clone the Changes slice via slices.Clone in both Get and Put. Scalar struct fields are value-copied by struct assignment so a single helper (cloneDiffResult) covers both directions. The filesystem cache deserializes from JSON each time so each Get already yields a fresh slice — no change needed there. FieldChange.Old/New are typed any; if a caller stores a pointer or mutable map there, the deep-copy stops at the slice level. By convention DiffResult.Changes carries scalar Old/New (strings, numbers, bools), so that is the right tradeoff between correctness and copy cost. Documented in memoryCache godoc. Adds TestCache_MemoryDeepCopiesChanges regression: Put a value, mutate the original argument, Get + mutate (element + append), Get again, assert original is preserved. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
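The shared-backing-array fix described above can be illustrated with a small sketch. The names `cloneDiffResult`, `memoryCache`, `DiffResult`, `FieldChange`, `Changes`, `Old` and `New` appear in the commit text; everything else here (the `Path` and `NeedsUpdate` fields, the string key, the mutex, the method bodies) is an assumption made for illustration and may not match the real package.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
)

// FieldChange and DiffResult are simplified stand-ins for the project's types.
// Path and NeedsUpdate are assumed fields, not confirmed by the source.
type FieldChange struct {
	Path string
	Old  any
	New  any
}

type DiffResult struct {
	NeedsUpdate bool
	Changes     []FieldChange
}

// cloneDiffResult copies the struct by value and gives the copy its own
// backing array for Changes. The copy stops at the slice level: pointers or
// maps stored in Old/New would still be shared.
func cloneDiffResult(r DiffResult) DiffResult {
	r.Changes = slices.Clone(r.Changes)
	return r
}

// memoryCache is a minimal in-memory cache keyed by string (an assumption).
type memoryCache struct {
	mu      sync.Mutex
	entries map[string]DiffResult
}

func newMemoryCache() *memoryCache {
	return &memoryCache{entries: make(map[string]DiffResult)}
}

// Put stores a clone, so later mutation of the argument cannot leak in.
func (c *memoryCache) Put(key string, r DiffResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cloneDiffResult(r)
}

// Get returns a clone, so mutation of the result cannot leak back.
func (c *memoryCache) Get(key string) (DiffResult, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	r, ok := c.entries[key]
	if !ok {
		return DiffResult{}, false
	}
	return cloneDiffResult(r), true
}

func main() {
	c := newMemoryCache()
	in := DiffResult{NeedsUpdate: true, Changes: []FieldChange{{Path: "size", Old: "s", New: "m"}}}
	c.Put("k", in)

	in.Changes[0].New = "mutated" // mutate the Put argument after the call

	got, _ := c.Get("k")
	got.Changes[0].New = "also mutated"            // element-level mutation
	got.Changes = append(got.Changes, FieldChange{}) // append on the returned slice

	again, _ := c.Get("k")
	fmt.Println(len(again.Changes), again.Changes[0].New)
}
```

Cloning in both directions costs one slice allocation per call, which matches the tradeoff the commit describes for scalar `Old`/`New` values.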
intel352
added a commit
that referenced
this pull request
May 4, 2026
…s on manifest computePlanVersion (W-3b of 12) (#528) * feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type * feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel * feat(iac): wfctl infra plan writes InputSnapshot to plan.json * feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash * feat(iac): wfctl infra plan warns when plan.json not in .gitignore * feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring * feat(iac): add refreshoutputs.Refresh — read-only state output refresh T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add wfctl infra refresh-outputs subcommand T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed. 
When the resolved config has no usable iac.provider module for the requested env, emits the literal error refresh-outputs: provider not configured for env "<env>" verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): concurrency stress test for refreshoutputs.Refresh T2.5 — pure-package stress test in iac/refreshoutputs/. 
Drives Refresh with 100 fake resources at Concurrency=8 and asserts: 1. No deadlock (10s watchdog around the call). 2. Read called exactly once per ProviderID (atomic per-ID counter). 3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency. 4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit. The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document infra refresh-outputs subcommand T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md: - New row in the Command Tree mermaid graph. - New row in the infra Action table. - Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples. See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect. Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm): $ mdformat --check docs/WFCTL.md Error: File "docs/WFCTL.md" is not formatted. exit=1 This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). 
T2.6's own additions follow the existing in-file conventions exactly. $ markdown-link-check docs/WFCTL.md FILE: docs/WFCTL.md [✓] https://github.com/GoCodeAlone/workflow [✓] #build-ui [✓] mcp.md 3 links checked. exit=0 docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope. T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7): $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl exit=0 $ /tmp/wfctl infra refresh-outputs --help Usage of infra refresh-outputs: -c string Config file (short for --config) -concurrency int Maximum concurrent Read calls (default 8) -config string Config file -e string Environment name (short for --env) -env string Environment name (resolves per-module overrides) exit=0 $ cat /tmp/t27-fake.yaml modules: - name: state-store type: iac.state config: backend: filesystem directory: /tmp/t27-fake-state $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging error: refresh-outputs: provider not configured for env "staging" exit=1 No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49. PR W-2 mandate (plan line 1101): $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3 ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review: - Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`. - Code-reviewer flagged this as a foot-gun (=0 mis-enables). - Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable. - Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions. - Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself. Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1) * fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema Addresses code-reviewer findings on commit 695a070: - Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward. - Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics. - Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design. 
New test covers both the typo-rejection and the root-permissivity contracts. * feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields * feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch) * fix(iac): T3.0.4 review - correct ReplaceIDMap key direction + lock omitempty contract Addresses code-reviewer findings on commit 13a6fad: - Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers. - Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc. - Minor: added TestApplyResult_OmitEmptyContract, table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test Addresses code-reviewer findings on commit 8416498: - Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely. - Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success. - Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary. - Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled. - Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner. 
Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once. * fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change. * feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset) * feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery * feat(iac): doUpdate + doDelete actions * feat(iac): doReplace populates ApplyResult.ReplaceIDMap * feat(iac): add diff cache with LRU eviction + corruption recovery * fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy Three independent review-fix bundles: T3.1.5 (commit f5a7ce9 review — Minor 1): - apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically — keeps the cross-package fixture parity guaranteed. T3.2 (commit 0c30eec review — Minors 1 + 2): - apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion — distinct code path from the existing ok-but-SupportsUpsert==false test. 
Compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver. - apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. Also documents the single-pass recovery contract (recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop). T3.3 (commit a3fc98b review — Minors 1 + 2 + 4): - apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path — locks the resource-append contract so a regression that skipped the append on nil Current would fail loudly. - apply_update_delete_test.go gains parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: empty ProviderID flows to driver, no synthesized precondition error, deleteCount==1 (latent bug-fix from design — the v1 path silently skipped Delete; v2 must call it). - apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency." * fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create — surprising operators who expected fast interruption mid-Replace. 
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable. Failure shape: return fmt.Errorf("replace: canceled after delete: %w", err) Wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface. New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a captured context.CancelFunc as a side-effect, simulating exact post-Delete cancellation. Asserts Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, error has the canonical prefix. Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this commit since it's the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) intentionally skipped — all documentary or out-of-scope per reviewer guidance. * docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted: 1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files. 2. 
**WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`:
- .github/workflows/ci.yml (test + lint jobs)
- .github/workflows/benchmark.yml (performance benchmarks)
- .github/workflows/pre-release.yml (pre-release tests)
- .github/workflows/release.yml (release tests)
- .github/workflows/dependency-update.yml (post-update test gate)

Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:
- Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents — never a partial write. The corruption-recovery path in Get is still the safety net for cross-filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering needed. On rename failure, best-effort cleanup of the temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining the tags are for log/transcript serialization, not cache keying — keyFingerprint uses NUL-separated string concat, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer. - Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless — no code path creates a file with `*` in its name. Likely leftover from earlier debugging. Removing it lets us drop the now-unused `os` import. - Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope. Skipped per reviewer guidance: - Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope." - Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation." * refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change) * feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error) W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider modules in the config, groups desired specs by `provider:` field, loads each via the same loader the apply path uses, and dispatches platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates against a real plugin process at plan time, not just at apply time. BREAKING: configs declaring at least one iac.provider module now require the plugin process to load successfully. Plugin-load failure exits non-zero with the literal error documented in the v0.21.0 CHANGELOG. 
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2); operators who need pure offline validation should use `wfctl validate`. Configs without any iac.provider module fall back to the legacy ConfigHash compare path so minimal/legacy fixtures and out-of-band scripts continue to work. cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the package compiles; T3.6c replaces nil with the live provider handle. * feat(iac): wfctl infra apply threads provider into ComputePlan * test(iac): update cross-package fakes for ComputePlan provider arg W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface. Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). Plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of deferring to function exit (multi-provider plan no longer holds N gRPC connections open at once). Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally-identical 14-method shells become one. Atomic counters carried forward where used. Doc updates: - godoc on computePlanForInfraSpecs corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matches actual code). - CHANGELOG entry calls out the empty-desired alignment with apply (loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown). 
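The IIFE change described in the T3.6d commit above can be illustrated with a minimal sketch. The names `loadProvider`, `planGroups`, and `closer` are hypothetical stand-ins, not the actual wfctl identifiers; the point is only that a `defer` inside an immediately-invoked function runs at the end of each loop iteration rather than when the enclosing function returns.

```go
package main

import "fmt"

// closer stands in for a provider handle's cleanup function (hypothetical).
type closer func()

// loadProvider is a hypothetical loader that records open/close events in log.
func loadProvider(name string, log *[]string) closer {
	*log = append(*log, "open "+name)
	return func() { *log = append(*log, "close "+name) }
}

// planGroups computes each group inside an immediately-invoked function so the
// deferred closer fires when that iteration ends, not at function exit.
func planGroups(groups []string) []string {
	var log []string
	for _, g := range groups {
		func() {
			closeFn := loadProvider(g, &log)
			defer closeFn()
			log = append(log, "plan "+g)
		}()
	}
	return log
}

func main() {
	for _, event := range planGroups([]string{"do", "mock"}) {
		fmt.Println(event)
	}
}
```

With a plain `defer` in the loop body (no wrapping function), every close would be postponed until `planGroups` returned, which is the "N connections open at once" situation the commit avoids.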
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract (rev3 fix for the cycle-2 self-contradiction: test + impl ship in the same SHA, no t.Skip placeholder). ComputePlan now classifies each existing resource via p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the per-resource Diff calls in parallel under errgroup with a bounded worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override clamped 1..32). Action emission: - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew is true (the latter closes design issue C — pre-W-3b ForceNew was silently downgraded to update); - update, when DiffResult.NeedsUpdate is true and replace did not fire; - skip, when neither flag is set. Net-new resources still emit create without dispatching Diff; resources removed from desired still emit delete in reverse-dep order. Nil-tolerance contract preserved: if p is nil, or if p.ResourceDriver(typ) returns (nil, nil) for a resource type, ComputePlan falls back to the legacy ConfigHash compare for the affected resources. Replace cannot be expressed via the legacy path — callers needing Replace must supply a provider whose drivers implement Diff. Per-resource driver.Diff errors propagate via errgroup so operators see the underlying cause (rate limit, network, etc.). 
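As a rough illustration of the action-emission rules above, the following sketch classifies a diff result into replace, update, or skip. `DiffResult` and `FieldChange` here are simplified local stand-ins; the slice field name `Changes` and the nil-means-skip handling are assumptions for illustration, not confirmed details of the real interfaces package.

```go
package main

import "fmt"

// FieldChange is a simplified stand-in; only ForceNew matters for this sketch.
type FieldChange struct {
	Path     string
	ForceNew bool
}

// DiffResult is a simplified stand-in. The Changes field name is an assumption.
type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange
}

// classify applies the precedence described in the commit message:
// replace wins over update, and any ForceNew field change implies replace.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip" // assumption: a nil result means "no changes"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	forceNew := &DiffResult{
		NeedsUpdate: true,
		Changes:     []FieldChange{{Path: "region", ForceNew: true}},
	}
	fmt.Println(classify(forceNew))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true}))
	fmt.Println(classify(&DiffResult{}))
}
```

The second branch is the one that matters for design issue C: a result with `NeedsUpdate` set and a `ForceNew` field change is reported as replace, not update.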
Test surface (platform/differ_replace_test.go, NEW; ships in this commit per the rev3 atomicity rule): - TestComputePlan_NeedsReplaceEmitsReplaceAction - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate - TestComputePlan_DiffReturnsNoChanges_EmitsNothing - TestComputePlan_NilProvider_FallsBackToConfigHash - TestComputePlan_NilDriver_FallsBackToConfigHash - TestComputePlan_DriverDiffError_PropagatesAsError platform/fake_provider_test.go extended with newFakeProviderWithDiff helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse to iac/iactest until cache_test in T3.6f also depends on the helper — deferred to keep T3.6e's diff bounded). Carry-forward notes addressed: - T3.6a note 1: dropped unused *testing.T param from newFakeProvider(). - T3.6a note 2: added compile-time interface conformance asserts on fakeProvider and fakeDriver. - T3.6a note 3: nil-provider AND nil-driver guards baked in; covered by two explicit tests. - T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based phrasing. cmd/wfctl test fakes updated to match the new dispatch model: - readDriver.Diff now returns NeedsUpdate=true (the adoption tests rely on the post-adopt ComputePlan emitting update; pre-W-3b that was the ConfigHash compare's job). - refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of panicking — the refresh-outputs test fixture only exercises Read. * perf(iac): ComputePlan consults diffcache before invoking provider.Diff W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached DiffResult is used directly; on miss, the freshly-computed result is Put into the cache. 
Apply-time correctness does not depend on cache hits — fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` against the same checkout). Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazy-initialised on first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper. platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain so the platform test suite never reads/writes the developer's filesystem cache and so cache state cannot leak across tests with incidentally-aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests). Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/): - Note 1 (env-clamping testability): extract parseConcurrencyEnv as a pure function; new TestParseConcurrencyEnv table-driven test covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5". - Note 2 (parallel-dispatch correctness): new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates, asserts driver.diffCount.Load() == 5 and the resulting plan has 5 actions. - Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing. And T3.6e adversarial-review minor cleanups: - Note 4 (i := i shadowing redundant in Go 1.22+): dropped. - Note 5 (errSentinel uses custom errFromTest): replaced with errors.New. - Note 7 (concurrency contract on ComputePlan godoc): added — p and the ResourceDriver instances it returns MUST be safe for concurrent use. 
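A pure env-value parser of the kind Note 1 describes might look like the sketch below. The function name `parseConcurrency` is hypothetical, and the exact policy is an assumption: the commit messages state only a default of 8 and a clamp of 1..32, so this sketch assumes unparsable input falls back to the default and out-of-range numbers clamp to the nearest bound.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	defaultConcurrency = 8  // default worker count stated in the commit message
	minConcurrency     = 1  // lower clamp bound
	maxConcurrency     = 32 // upper clamp bound
)

// parseConcurrency turns a raw env value into a worker count.
// Assumed policy: empty or non-numeric input yields the default;
// numeric input is clamped into [minConcurrency, maxConcurrency].
func parseConcurrency(raw string) int {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return defaultConcurrency
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultConcurrency
	}
	if n < minConcurrency {
		return minConcurrency
	}
	if n > maxConcurrency {
		return maxConcurrency
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrency(raw))
	}
}
```

Keeping the parser free of `os.Getenv` is what makes a table-driven test straightforward: the caller reads the environment once and passes the string in.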
New tests (3 cache-behaviour scenarios in differ_cache_test.go): - TestComputePlan_CacheHitSkipsDiff (second call against unchanged inputs hits cache; diffCount stays at 1) - TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch) - TestComputePlan_NoopCacheNeverHits (disabled backend always re-dispatches) * test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review) Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched. The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove ≥2 Diff goroutines are simultaneously in-flight before any returns: regression to serial dispatch would hang on the second `<-entered` and time out at 5s. Pure addition (no production-code change). cacheTestProvider.driver loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell. * fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review) Code-reviewer flagged the T3.6f cache PluginVersion key as fragile: composing via `p.Name() + "@" + p.Version()` would let two genuinely-different providers — `("foo", "bar@1.0")` vs `("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"` and serve each other's cached DiffResults. Today's registered providers (digitalocean, dockercompose, mock) don't carry `@` in either field so no observed bug, but there's no compile-time guard against a future provider declaring `do@enterprise` or similar. Replace with sha256(name + "\x00" + version) — fixed-length, NUL is invalid in both fields by Unicode convention, ambiguity-free. Matches how configHash already keys per-config inputs. 
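The collision described above is easy to reproduce. In this sketch, `joinedKey` and `hashedKey` are hypothetical helper names, and hex encoding of the digest is an assumption; the commit message specifies only `sha256(name + "\x00" + version)`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// joinedKey mirrors the fragile name + "@" + version composition.
func joinedKey(name, version string) string {
	return name + "@" + version
}

// hashedKey hashes name and version with a NUL separator, so the boundary
// between the two fields cannot be shifted by an "@" in either field.
// Hex encoding of the digest is an assumption made for this sketch.
func hashedKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two different (name, version) pairs that collide under "@" joining.
	fmt.Println(joinedKey("foo", "bar@1.0") == joinedKey("foo@bar", "1.0"))
	fmt.Println(hashedKey("foo", "bar@1.0") == hashedKey("foo@bar", "1.0"))
}
```

The first comparison is true because both pairs flatten to `foo@bar@1.0`; the second is false because the NUL byte sits at a different offset in each hashed input.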
Three regression tests pin the fix: - TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug) - TestPluginVersionKey_NilProvider (defensive — empty key, no panic) - TestPluginVersionKey_Stable (deterministic across calls) Pure additive — no change to any existing test outcome. The cache re-keys against the new digest, which means any DiffResults persisted under the old `name@version` keys will miss on the next plan and re-Diff naturally (cache misses are correct by design). * feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path. rev2/rev3-locked: NO env-var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1 — there is no transitional WFCTL_USE_V2_APPLY flag to misuse. Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant. New plumbing: - iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision. - iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and ComputePlanVersion() method so tests drive both v1 (default empty) and v2 paths through the shared fake. 
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level iacProvider.computePlanVersion alongside existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (Re-reads plugin.json once per provider load rather than threading through loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable for the existing test override; cost is one tiny os.ReadFile vs the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. Drift report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/un-declared routes legacy), TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via writer-buffer substring). v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer to prove the dispatcher's "default to v1 when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised "Run on success or partial failure" but the code gated on `err == nil` (success only). The contract the comment described is the more useful behavior — operators most need the stale-input diagnostic when an apply fails ("which input went stale during the failed apply?"). Without it, the failure error and the "what changed" context are disconnected. Fix: gate on `result != nil` instead of `err == nil`. printDriftReportIfAny already no-ops on empty/nil reports so unconditional-on-result-non-nil is safe.

2.
No test for the drift-on-partial-failure path. Added TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure which has applyV2ApplyPlanFn return (resultWithDrift, applyErr) and asserts both: (a) the err propagates, AND (b) the drift report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1" paths exist:
- Path A: provider doesn't implement ComputePlanVersionDeclarer at all → type-assert fails → legacy. Covered by v1RecordingProvider.
- Path B: provider implements interface but ComputePlanVersion() returns "" (the realistic mid-transition state for v1 plugins after the SDK update lands but before they migrate) → type-assert succeeds, DispatchVersionFor returns "v1" → legacy. Was untested. Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty using iactest.NoopProvider{DispatchVersion: ""}, which always implements the interface (the method exists on the type). Pins Path B specifically.

Pure correctness fixes — no signature change, no behavior change for the success-only or v1-RecordingProvider paths.

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing current.Sensitive (map[string]bool) directly into the args map. structpb.NewStruct rejects map[string]bool — it accepts map[string]any only — and the upstream plugin/external/convert.go::mapToStruct returns &structpb.Struct{} on err rather than surfacing the typing failure. Result: every Diff dispatch over gRPC for any provider whose ResourceOutput.Sensitive map was non-nil (or even an empty map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's manifest-driven dispatch) surfaces it immediately on the first existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape NewStruct accepts.
Returns nil for empty/nil input so the wire stays trim-friendly.

Bug discovered during W-3b T3.9 runtime-launch validation against an out-of-band gRPC stub plugin; the canonical T3.9 in-tree test ships separately as a loader-seam Go integration test (per team-lead direction + plan precedent at plugin/sdk/iaclint/). Will surface in T3.10's PR description as a third incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse → state load → provider load (via the resolveIaCProvider seam from T3.6c) → ComputePlan Diff dispatch (T3.6e/f) → wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace decomposition into Delete + Create → printDriftReportIfAny — by injecting a Go in-process v2-declaring provider through the package-level seam. No out-of-process gRPC binary or plugin.json under internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin in internal/testdata/stub-provider/." Team-lead authorized switching to in-tree loader-seam validation per:
1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go test-helper package, not a runnable binary.
2. Real-gRPC runtime validation lands in P-DO when DO sets computePlanVersion: v2 in its plugin.json.
3. Hours-of-stub-plumbing cost doesn't earn proportional coverage vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply entrypoint) and asserts driver.diffCount == 1, deleteCount == 1, createCount == 1, plus exact identity of the deleted ProviderID and the created Config["region"].
- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block reaches the operator (drift-report wiring T3.7 added is end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces WFCTL_DIFFCACHE=disabled so the platform diffcache (process-scoped via getDiffCache lazy init) doesn't observe stale entries from a developer's local ~/.cache/wfctl/diff/ as false-positive cache hits skipping driver Diff dispatch. Same pattern as platform/main_test.go from T3.6f. Caught during dev when the end-to-end test failed in the full cmd/wfctl test run but passed in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1 (remoteResourceDriver.Diff sensitiveToAny conversion). The bug exists independent of which T3.9 option ships; the fix is in tree and surfaces in T3.10's PR description as the third W-3b incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring:
1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3.
map[string]bool drops gRPC args silently — sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution.

This commit:
- Status header updated to "Accepted (with extensive deliberation history — see Decision history section)".
- Context section adjusted to preface the deliberation history rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-flips mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected."
* fix(iac): T3.6e env-var hygiene — TestMain unsets WFCTL_PLAN_DIFF_CONCURRENCY (Copilot review)

A developer shell with WFCTL_PLAN_DIFF_CONCURRENCY=1 (or any other non-default value) would serialize ComputePlan's parallel Diff dispatch and break the parallelism assertions in differ tests. Explicitly unset the var in TestMain alongside the existing WFCTL_DIFFCACHE=disabled hygiene so test runs are deterministic regardless of shell environment.

Addresses Copilot inline comment on PR #528 (platform/main_test.go:24).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.6 polish — drop double error: prefix + reuse precomputed configHash (Copilot review round 2)

Two real fixes from Copilot's re-review of PR #528:

1. **Double "error:" prefix on plugin-load failure** — cmd/wfctl/main.go's top-level printer already emits "error: %v" on command failure. The T3.6b error string in cmd/wfctl/infra_plan_provider.go was prefixed with a literal "error: " of its own, producing operator output like `error: error: failed to load plugin "do": ...`. Drop the in-error prefix; update the assertion in infra_plan_provider_load_test.go to match the unprefixed root error; clarify in the CHANGELOG that the "error:" prefix in the rendered string is added by wfctl's top-level printer (not the underlying error).

2. **Duplicate configHash work in classifyModification** — ComputePlan already computes `hash := configHash(spec.Config)` while bucketing create vs modification candidates; classifyModification was re-computing the same hash on every Diff dispatch. Thread the precomputed hash through via a new `hash string` field on modCandidate + new parameter on classifyModification, so the per-candidate hashing happens exactly once.

Addresses Copilot inline comments on PR #528 (round 2):
- cmd/wfctl/infra_plan_provider.go:121
- platform/differ.go:104

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./interfaces/... ./iac/... ./plugin/sdk/...
./module/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.6/T3.9 polish — diff-cache bypass on empty ProviderID + omit empty current_sensitive arg (Copilot review round 3)

Two real fixes from Copilot's re-review of PR #528 round 3:

1. **Diff-cache hash-collision risk on empty ProviderID** — The cache key shape (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) does not include the resource Name. When two existing-state resources of the same Type both have ProviderID=="" (state-bootstrap, broken-plugin paths, transient races) and matching SHAConfig + SHAOutputs (e.g., both freshly-discovered with default-config and empty-outputs), they would share a cache key and could serve each other's cached DiffResult — misclassifying actions or skipping a required Diff. Defensive fix: classifyModification now skips both cache.Get and cache.Put when rs.ProviderID is empty, always re-dispatching to the driver. Cost is one extra Diff call per pre-bootstrap resource; benefit is correctness regardless of state completeness. New pin: TestComputePlan_EmptyProviderID_BypassesCache.

2. **`current_sensitive` arg serialized as null instead of omitted** — sensitiveToAny's docstring promises "trim-friendly" wire shape by returning nil for empty input, but the call site at remoteResourceDriver.Diff was unconditionally setting `args["current_sensitive"] = sensitiveToAny(...)`, which structpb serializes as a NullValue field rather than omitting the key. Conditionally include the key only when sensitiveToAny returns a non-nil map, matching the docstring intent. New pins: TestRemoteDriver_Diff_OmitsCurrentSensitiveWhenEmpty + TestRemoteDriver_Diff_IncludesCurrentSensitiveWhenPopulated.

Addresses Copilot inline comments on PR #528 (round 3):
- platform/differ.go:240 (cache key empty-ProviderID collision)
- cmd/wfctl/deploy_providers.go:542 (current_sensitive null vs omit)

Tests: GOWORK=off go test -race -count=1 ./platform/...
./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(iac): T3.6/T3.9 polish — preserve loadErr chain + lock-free diff cache + bypass-side-effect-free (Copilot review round 4) Three real fixes from Copilot's re-review of PR #528 round 4: 1. **loadErr chain lost across runInfraPlan re-wrap (errors.Is/As)** — computePlanForInfraSpecs returned `failed to load plugin %q: %v; ...` (using %v), losing the underlying error. After runInfraPlan re-wraps with `compute plan: %w`, callers could not errors.Is / errors.As against the original loader failure (e.g. to differentiate "plugin binary missing" from "plugin crashed during handshake"). Switch the inner wrap to %w. Rendered text is identical to %v. New pin: TestRunInfraPlan_FailsLoudOnPluginLoadFailure now asserts `errors.Is(err, loadErr)` reaches the sentinel through both wrap layers. 2. **getDiffCache called even on the empty-ProviderID bypass path** — classifyModification was calling getDiffCache() unconditionally, which (under the old per-call mutex) acquired the lock, and (under any backend-init pattern) would eagerly construct the filesystem cache backend at ~/.cache/wfctl/diff/ on the operator's machine even for resources that bypass the cache. Move the getDiffCache call inside the `if cacheable` branch so the bypass path is fully side-effect free. Round-3 already pinned the bypass behavior via TestComputePlan_EmptyProviderID_BypassesCache. 3. **Per-call sync.Mutex contention on getDiffCache hot path** — Under ComputePlan's parallel Diff fan-out (planDiffConcurrency() workers), the per-call mutex on getDiffCache was contention on every cache.Get / cache.Put, especially on cache hits where the Get itself is cheap. Refactor to sync.Once for one-time init + atomic.Pointer[diffcache.Cache] for lock-free reads. Subsequent reads are just an atomic.Load (and a typed deref). 
The test-swap helper setDiffCacheForTest is updated to Store/Restore directly on the atomic; cleanup seeds a fresh default when there was no prior value (so subsequent tests in the binary still observe a working cache). Addresses Copilot inline comments on PR #528 (round 4): - cmd/wfctl/infra_plan_provider.go:124 (%v → %w) - platform/differ.go:235 (getDiffCache eager call on bypass path) - platform/differ.go:405 (per-call mutex on hot path) Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... ./module/... — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(iac): T3.7/T3.6 polish — DispatchVersionFor centralizes type assertion + cache nil-DiffResult as zero-value (Copilot review round 5) Two real fixes from Copilot's re-review of PR #528 round 5 (a third finding, plan/apply discovery duplication, is filed as a follow-up issue rather than addressed in-PR to keep W-3b scope-locked). 1. **DispatchVersionFor docstring vs signature mismatch** — The helper claims to centralize the type assertion + non-implementer defaulting, but its parameter type was `ComputePlanVersionDeclarer`, forcing every call site to type-assert externally. Change the signature to accept `any` and perform the type assertion inside; non-implementers + nil now both return "v1" inside the helper as the docstring already promised. Param is `any` (not interfaces.IaCProvider) to keep the helper package import-free of the engine's interfaces package and to keep non-engine call sites (tests, stubs) frictionless. Updated the only production call site (cmd/wfctl/infra_apply.go) to drop the external type-assert. 2. 
**Cache no-op when driver.Diff returns (nil, nil)** — The cache.Put was guarded by `fresh != nil`, so providers using the nil-as-no-op convention (a documented option in the (DiffResult|nil, error|nil) return shape) re-Diffed on every ComputePlan call — undermining the cache contract for that whole class of providers. Cache a zero-value DiffResult on (nil, nil) returns; classifyModification's downstream switch already treats zero-value the same as nil (no plan action), so the semantic is preserved while the cache stays effective. New pin: TestComputePlan_NilDiffResult_CachesAsZeroValue verifies that the second ComputePlan against unchanged inputs is served from cache (driver.Diff invoked exactly once across two calls). 3. **Plan/apply provider-discovery duplication** (Copilot finding R5-C, not addressed in this PR) — computePlanForInfraSpecs duplicates the iac.provider discovery + grouping logic in applyInfraModules. Per workspace memory feedback_implementer_scope_bleed, refactoring to a shared helper is a separate task: the duplication exists pre-W-3b (apply was the original; plan was added in W-3b mirroring it intentionally), and the extraction touches code paths W-3b's test plan does not cover. Filed as follow-up rather than expanding W-3b's blast radius. Documented in PR description. Addresses Copilot inline comments on PR #528 (round 5): - iac/wfctlhelpers/dispatch.go:41 (signature vs docstring mismatch) - platform/differ.go:265 (cache write skipped on (nil, nil)) Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(iac): T3.7 — correct DispatchVersionFor + findIaCPluginDir doc claims (Copilot review round 6) Two doc-comment accuracy fixes from Copilot's re-review of PR #528 round 6 — both surfaced by/exposed in the round-5 changes: 1. 
**findIaCPluginDir docstring referenced wrong helper** — Round 5 changed wfctlhelpers.DispatchVersionFor to take `any` (a provider value), but findIaCPluginDir's docstring still told callers to pass the returned `computePlanVersion` string through DispatchVersionFor. That call wouldn't type-assert to ComputePlanVersionDeclarer (a string isn't a provider) and would silently default to "v1". Replaced with the correct pattern: string-equality against wfctlhelpers.DispatchVersionV2 at this loader-level seam where only the raw string is in hand. Includes example snippet. 2. **DispatchVersionFor docstring overstated the validation guarantee** — Claimed plugin/sdk.ParseManifest schema-validation means the dispatch only sees {"v1", "v2", ""}. True for callers that load via ParseManifest, but cmd/wfctl/deploy_providers.go's findIaCPluginDir / readIaCPluginComputePlanVersion path uses a minimal json.Unmarshal with NO schema validation — so unknown values CAN reach DispatchVersionFor at runtime. Updated the docstring to flag this honestly and call out that the default-to-v1 behavior is the safety net for those paths (callers must not rely on the validation guarantee). Doc-only; no code change. All packages still build + vet cleanly. Addresses Copilot inline comments on PR #528 (round 6): - cmd/wfctl/deploy_providers.go:107 (wrong helper referenced) - iac/wfctlhelpers/dispatch.go:18 (overstated validation guarantee) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(iac): T3.5 — TestParseConcurrencyEnv subtest names (Copilot review round 7) The first table case had `in: ""` and used `tc.in` directly as the t.Run subtest name. Go's testing package silently rewrites empty subtest names to "#00", which is unique enough to run but masks the case identity in -v output and failure reports. 
Add a `name` field to the table struct and use stable descriptive labels (empty, non_numeric, negative, zero, one, eight, thirty_two, thirty_three_clamped_to_max, one_hundred_clamped_to_max) while still passing the raw `tc.in` to parseConcurrencyEnv. Identical test coverage; clearer reporting. Addresses Copilot inline comment on PR #528 (round 7): - platform/differ_cache_test.go:253 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(iac): T3.5/T3.6 — clamp + in-flight counter doc accuracy (Copilot review round 8) Two doc-only nits surfaced in Copilot's round-8 re-review of PR #528. Both are accuracy fixes — no behaviour change. 1. **planDiffConcurrencyMin/Max comment overstated "disable"** — The comment said "Below 1 disables concurrency (worse than serial)", but parseConcurrencyEnv clamps values <=0 UP to planDiffConcurrencyMin (=1), which produces effectively-serial dispatch (one Diff in flight), not "disabled". Operators cannot turn the worker pool off, only narrow it to one. Updated the comment to spell that out and call out both clamp directions explicitly. 2. **channelGatedDriver.inFlight docstring claimed "peak"** — The docstring said inFlight tracks the *peak* number of simultaneous Diff goroutines, but…
intel352
added a commit
that referenced
this pull request
May 4, 2026
* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type
* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel
* feat(iac): wfctl infra plan writes InputSnapshot to plan.json
* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash
* feat(iac): wfctl infra plan warns when plan.json not in .gitignore
* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring
* feat(iac): add refreshoutputs.Refresh — read-only state output refresh
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): add wfctl infra refresh-outputs subcommand
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.
Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
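As a rough sketch of the grouping rule and the literal error described in T2.2: the types and the `ownerModule` / `errProviderNotConfigured` names below are hypothetical, and falling through to the type-based lookup when a ProviderRef does not resolve is an assumption the commit message does not confirm. Only the `fmt.Errorf` format string is taken from the source.

```go
package main

import "fmt"

// providerModule and resourceState are hypothetical, simplified types.
type providerModule struct{ Name, Type string }
type resourceState struct{ Name, ProviderRef, ProviderType string }

// ownerModule resolves the iac.provider module that owns a state entry:
// ProviderRef first, then provider type when exactly one module has that type.
func ownerModule(rs resourceState, modules []providerModule) (string, bool) {
	if rs.ProviderRef != "" {
		for _, m := range modules {
			if m.Name == rs.ProviderRef {
				return m.Name, true
			}
		}
	}
	owner, matches := "", 0
	for _, m := range modules {
		if m.Type == rs.ProviderType {
			owner = m.Name
			matches++
		}
	}
	if matches != 1 {
		return "", false // ambiguous or missing: no safe fallback
	}
	return owner, true
}

// errProviderNotConfigured builds the literal error quoted in the commit message.
func errProviderNotConfigured(env string) error {
	return fmt.Errorf("refresh-outputs: provider not configured for env %q", env)
}

func main() {
	modules := []providerModule{{Name: "aws-main", Type: "aws"}, {Name: "do-a", Type: "do"}, {Name: "do-b", Type: "do"}}
	fmt.Println(ownerModule(resourceState{Name: "vpc", ProviderType: "aws"}, modules))
	fmt.Println(ownerModule(resourceState{Name: "db", ProviderType: "do"}, modules))
	fmt.Println(errProviderNotConfigured("staging"))
}
```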
* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)
T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.
Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
no-op. Operators who use the "0"/"false" convention to disable a
feature get the expected behaviour rather than a presence-only
foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
environments that force the env var on globally).
Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).
Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
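The activation rules above reduce to a small predicate. The sketch below assumes a hypothetical helper name (`refreshPreStepEnabled`) and takes the env value and flag as plain arguments; the actual gating code in cmd/wfctl may be shaped differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshPreStepEnabled reports whether the apply-time pre-step should run.
// The name and signature are hypothetical; the rules follow the commit message.
func refreshPreStepEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // --skip-refresh wins regardless of the env var
	}
	enabled, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // unset, empty, or unrecognised: default OFF
	}
	return enabled
}

func main() {
	for _, v := range []string{"", "1", "true", "t", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshPreStepEnabled(v, false))
	}
	fmt.Println("skip flag:", refreshPreStepEnabled("1", true))
}
```

A presence-only check (`os.Getenv(...) != ""`) would treat `WFCTL_REFRESH_OUTPUTS=0` as enabled, which is the foot-gun the ParseBool approach avoids.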
* test(iac): concurrency stress test for refreshoutputs.Refresh
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no
write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving
both that parallelism happened AND that the semaphore enforced
its limit.
The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(wfctl): document infra refresh-outputs subcommand
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
literal-error contract (load-bearing per T2.7), apply-time pre-step
semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
representative examples.
See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.
Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
This failure is PRE-EXISTING. Verified by checking out the file at
the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
mdformat against it: identical error. docs/WFCTL.md has never been
mdformat-formatted in this repo. Reformatting the entire file is
out of scope for T2.6 (would introduce a multi-thousand-line
unrelated diff). T2.6's own additions follow the existing in-file
conventions exactly.
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
docs/WFCTL.md has zero broken links — including the new
refresh-outputs section. The directory-wide scan reports 7 broken
links in unrelated files (self-improvement-tutorial.md,
getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
-c string
Config file (short for --config)
-concurrency int
Maximum concurrent Read calls (default 8)
-config string
Config file
-e string
Environment name (short for --env)
-env string
Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
  - name: state-store
    type: iac.state
    config:
      backend: filesystem
      directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned
by T2.7 (plan line 1098), produced by T2.2's
fmt.Errorf("refresh-outputs: provider not configured for env %q",
env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available
in this worktree environment. Plan line 1102 marks this "if
available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3
ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
plan revert; provenance recorded in the ADR itself.
Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)
* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema
Addresses code-reviewer findings on commit 695a070:
- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
capture both *jsonschema.Schema and the compile error so concurrent
callers observe a single deterministic outcome. Adds a 32-goroutine
ParseManifest stress test that fires under -race to lock in the
invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
cannot mutate the //go:embed slice (defense-in-depth; embed slices
are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
typo like "computeplanversion" or an unknown key is rejected at
parse time instead of silently defaulting to v1 dispatch. The root
object stays permissive — existing plugin.json files carry
version/author/dependencies/etc. and the SDK manifest is a strict
subset by design. New test covers both the typo-rejection and the
root-permissivity contracts.
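The first two findings (a sync.Once that captures both the compiled value and its error, and returning a clone of the embedded bytes) can be illustrated with a small sketch. Everything here is an assumption for illustration: `schemaJSON` stands in for the //go:embed slice, and `json.Unmarshal` into a map stands in for compiling a *jsonschema.Schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// schemaJSON stands in for the //go:embed'ed schema bytes (assumption).
var schemaJSON = []byte(`{"type":"object"}`)

var (
	compileOnce sync.Once
	compiled    map[string]any // stand-in for *jsonschema.Schema
	compileErr  error
	compileRuns int // counts how often the init body actually ran
)

// compiledSchema lazily "compiles" the schema exactly once. Both the value
// and the error are captured, so every caller sees the same outcome.
func compiledSchema() (map[string]any, error) {
	compileOnce.Do(func() {
		compileRuns++
		compileErr = json.Unmarshal(schemaJSON, &compiled)
	})
	return compiled, compileErr
}

// ManifestSchemaJSON returns a copy so callers cannot mutate the shared slice.
func ManifestSchemaJSON() []byte { return bytes.Clone(schemaJSON) }

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ { // mirrors the 32-goroutine stress test
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = compiledSchema()
		}()
	}
	wg.Wait()

	cp := ManifestSchemaJSON()
	cp[0] = 'X' // mutating the copy leaves the original intact
	fmt.Println(compileRuns, string(schemaJSON))
}
```

Storing the error alongside the value matters: a bare `sync.Once` that only caches the success case would hand later callers a nil schema with no explanation.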
* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields
* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)
* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract
Addresses code-reviewer findings on commit 13a6fad:
- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
Name" but the populating site (T3.4 plan §1625) sets
result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
this. Re-worded to "Keyed by the *replaced* resource's Name" with an
explicit reference to action.Resource.Name + a sentence on how W-5 JIT
substitution will use the map (lookup by replaced-resource name to
obtain the new ProviderID for dependent configs). Locks the contract
before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
to its enforcing test (TestComputeDrift_ResultIsSortedByName in
iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
nil and empty-but-non-nil values for all three new fields, asserting
the JSON keys are absent from the encoded form. Locks the omitempty
tag behavior so a future refactor cannot silently regress to emitting
"initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test
Addresses code-reviewer findings on commit 8416498:
- Important 1 (weak Replace assertion): converted fakeDriver from
boolean call recorders to integer counters. The 4-action plan
[create, update, replace, delete] now asserts Create==2, Update==1,
Delete==2. If "case replace" were silently dropped from
dispatchAction the counts would shift to 1/1/1 and the test would
fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
fakeProvider.driverErr, asserts the canonical "resolve driver:"
prefix, and verifies the loop continues past action[0] to action[1]
(best-effort contract). Folded the loop-continues-after-failure
coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
using a selectiveFakeProvider that errors on one type only — proves
one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
fmt.Sprintf("resolve driver: %v", err) since the destination is a
string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
iteration boundary; on cancel, returns the result accumulated so far
+ the ctx error as top-level. Added
TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
documenting the same-name-same-type invariant for Replace plans.
Documenting rather than enforcing — ComputePlan upstream is the
contract owner.
Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test
Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.
Pure-mechanical cleanup. No behavior change.
* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)
* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery
* feat(iac): doUpdate + doDelete actions
* feat(iac): doReplace populates ApplyResult.ReplaceIDMap
* feat(iac): add diff cache with LRU eviction + corruption recovery
* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy
Three independent review-fix bundles:
T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
imports. Future Compute-algorithm changes (prefix length, hash) now
re-align both test files automatically — keeps the cross-package
fixture parity guaranteed.
T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
+ alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
assertion — distinct code path from the existing
ok-but-SupportsUpsert==false test. Compile-time premise check
ensures the test stays meaningful if a future refactor lifts
SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
the in-package vs at-the-ActionError-boundary distinction explicit.
External callers reading [interfaces.ApplyResult].Errors lose
errors.Is matching at the string-conversion boundary; the canonical
"upsert: read after conflict:" prefix is the discriminant. Also
documents the single-pass recovery contract (recovery Update that
itself returns ErrResourceAlreadyExists surfaces unchanged rather
than retriggering the recovery loop).
T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
now also asserts len(result.Resources) == 1 on the success path —
locks the resource-append contract so a regression that skipped the
append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
shape: empty ProviderID flows to driver, no synthesized precondition
error, deleteCount==1 (latent bug-fix from design — the v1 path
silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
section documenting the decompose-then-prefix rule (bare on simple
actions; "upsert: ..." / "replace: ..." on decomposing paths) so
future reviewers don't suggest "let's add prefixes for consistency."
* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace
Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.
Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.
Failure shape:
return fmt.Errorf("replace: canceled after delete: %w", err)
Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.
New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.
Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.
Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.
* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows
T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:
1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
as an amortization-only optimization (not correctness mechanism),
the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
the corruption recovery contract (silent eviction + once-per-process
info log), the plugin-downgrade safety property, and the rev3
"all CI workflows set :memory: explicitly" statement plus a list
of the affected workflow files.
2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
every workflow that runs `go test` or `wfctl`:
- .github/workflows/ci.yml (test + lint jobs)
- .github/workflows/benchmark.yml (performance benchmarks)
- .github/workflows/pre-release.yml (pre-release tests)
- .github/workflows/release.yml (release tests)
- .github/workflows/dependency-update.yml (post-update test gate)
Workflow files that don't invoke go test / wfctl are not modified
(codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
osv-scanner.yml, test-dispatch.yml).
Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.
Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.
* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup
Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:
- Minor 1 (atomic Put, worth-doing production improvement): Put now
uses write-temp-then-rename. POSIX rename(2) is atomic on the same
filesystem, so a process crash mid-write leaves either the prior
contents or the new contents — never a partial write. The
corruption-recovery path in Get is still the safety net for cross-
filesystem renames or NFS edge cases that don't honor atomicity.
In production this means corruption recovery essentially never
fires from native crashes. The .json extension filter in
maybeEvict already excludes .tmp orphans, so no additional
filtering is needed. On rename failure, Put makes a best-effort
attempt to remove the temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
language. Linux honors XDG_CACHE_HOME; macOS uses
~/Library/Caches; Windows uses %LocalAppData%. The previous
comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
explaining the tags are for log/transcript serialization, not
cache keying — keyFingerprint uses NUL-separated string concat,
not JSON marshaling. Future readers checking the fingerprint
shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
`os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
meaningless — no code path creates a file with `*` in its name.
Likely leftover from earlier debugging. Removing it lets us drop
the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
resolution assumption and listing the supported filesystems
(ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
filesystems (FAT32, SMB) are explicitly out of scope.
Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
in the package godoc already sets the expectation."
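As a rough illustration of the write-temp-then-rename approach from Minor 1, here is a minimal sketch. The `atomicWrite` helper, its temp-file naming pattern, and the demo function are assumptions for illustration and are not the diffcache Put implementation; the sketch also omits any fsync step.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temp file in the same directory as path and
// then renames it into place. Keeping both files in one directory keeps the
// rename on a single filesystem.
func atomicWrite(path string, data []byte) error {
	// The trailing ".tmp" keeps orphans out of a ".json" extension filter.
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpName) // best-effort cleanup
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpName) // best-effort cleanup
		return err
	}
	if err := os.Rename(tmpName, path); err != nil {
		os.Remove(tmpName) // best-effort cleanup
		return err
	}
	return nil
}

// runAtomicWriteDemo overwrites one entry twice and reports the final
// contents plus the number of files left in the directory.
func runAtomicWriteDemo() (string, int, error) {
	dir, err := os.MkdirTemp("", "diffcache-sketch")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "entry.json")
	for _, body := range []string{"first", "second"} {
		if err := atomicWrite(path, []byte(body)); err != nil {
			return "", 0, err
		}
	}
	got, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	return string(got), len(entries), nil
}

func main() {
	got, n, err := runAtomicWriteDemo()
	fmt.Println(got, n, err)
}
```

A reader of the target path sees either the previous contents or the new ones, and a successful write leaves no temp file behind.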
* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)
* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)
W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.
BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.
Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.
cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.
* feat(iac): wfctl infra apply threads provider into ComputePlan
* test(iac): update cross-package fakes for ComputePlan provider arg
W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.
Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).
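The per-group closer behaviour can be sketched as follows. This is an illustrative sketch only: `loadFn`, `planGroups`, and the event log are hypothetical names, and the real loop dispatches ComputePlan rather than appending strings.

```go
package main

import (
	"fmt"
	"strings"
)

// loadFn is a hypothetical loader that returns a cleanup func for the
// provider backing one group.
type loadFn func(group string) (closer func())

// planGroups wraps each iteration in an immediately-invoked function so the
// deferred closer runs at the end of that iteration, not at function exit.
func planGroups(groups []string, load loadFn, log *[]string) {
	for _, g := range groups {
		func() {
			closer := load(g)
			defer closer() // fires when this closure returns
			*log = append(*log, "plan "+g)
		}()
	}
}

// runPlanGroupsDemo records the order of open/plan/close events.
func runPlanGroupsDemo() string {
	var log []string
	load := func(g string) func() {
		log = append(log, "open "+g)
		return func() { log = append(log, "close "+g) }
	}
	planGroups([]string{"a", "b"}, load, &log)
	return strings.Join(log, ", ")
}

func main() {
	fmt.Println(runPlanGroupsDemo())
}
```

With a plain `defer` in the loop body, both closers would run only after the last group was planned; the closure keeps at most one provider open at a time.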
Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.
Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
in first-reference-in-`desired` order, not iac.provider declaration
order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
(loop over groupOrder is empty when no specs reference any provider;
use `wfctl infra destroy --dry-run` to preview teardown).
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace
W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).
ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:
- replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
is true (the latter closes design issue C — pre-W-3b ForceNew was
silently downgraded to update);
- update, when DiffResult.NeedsUpdate is true and replace did not
fire;
- skip, when neither flag is set.
Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).
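The action-emission rules above can be sketched as a small pure function. This is a minimal sketch under stated assumptions: the `classify` helper and the `Changes` field name are hypothetical, the local types only mirror the NeedsReplace / NeedsUpdate / ForceNew names used in this message, and treating a nil result as skip follows the nil-DiffResult behaviour mentioned later in this log.

```go
package main

import "fmt"

// FieldChange and DiffResult are local stand-ins; only the ForceNew,
// NeedsUpdate and NeedsReplace names come from the commit message.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange // field name is an assumption
}

// classify maps one Diff result to a plan action.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	// Any ForceNew field forces a replace even when NeedsReplace is unset.
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	forceNew := &DiffResult{
		NeedsUpdate: true,
		Changes:     []FieldChange{{Path: "region", ForceNew: true}},
	}
	fmt.Println(classify(&DiffResult{NeedsReplace: true}))
	fmt.Println(classify(forceNew))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true}))
	fmt.Println(classify(&DiffResult{}))
}
```

Checking ForceNew before NeedsUpdate is the part that prevents a ForceNew change from being downgraded to update.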
Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):
- TestComputePlan_NeedsReplaceEmitsReplaceAction
- TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
- TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
- TestComputePlan_DiffReturnsNoChanges_EmitsNothing
- TestComputePlan_NilProvider_FallsBackToConfigHash
- TestComputePlan_NilDriver_FallsBackToConfigHash
- TestComputePlan_DriverDiffError_PropagatesAsError
platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).
Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
phrasing.
cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
rely on the post-adopt ComputePlan emitting update; pre-W-3b that
was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
panicking — the refresh-outputs test fixture only exercises Read.
* perf(iac): ComputePlan consults diffcache before invoking provider.Diff
W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).
Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.
platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).
Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):
- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
pure function; new TestParseConcurrencyEnv table-driven test covers
empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
N=5 modification candidates, asserts driver.diffCount.Load() == 5
and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
And T3.6e adversarial-review minor cleanups:
- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
the ResourceDriver instances it returns MUST be safe for concurrent
use.
New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
re-dispatches)
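For reference, a pure env-parsing helper of the kind described in Note 1 might look like the sketch below. The behaviour on each input is an assumption: this version falls back to the default of 8 for empty or non-numeric input and clamps everything else to 1..32, which may differ from what the real parseConcurrencyEnv does for "0" or negative values.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultDiffConcurrency = 8
	minDiffConcurrency     = 1
	maxDiffConcurrency     = 32
)

// parseConcurrencyEnv turns the raw env value into a worker count.
// Unparseable input yields the default; parseable input is clamped.
func parseConcurrencyEnv(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultDiffConcurrency
	}
	if n < minDiffConcurrency {
		return minDiffConcurrency
	}
	if n > maxDiffConcurrency {
		return maxDiffConcurrency
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrencyEnv(raw))
	}
}
```

Because the function takes the raw string rather than reading the environment itself, a table-driven test can cover every case without touching process state.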
* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)
Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.
Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.
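The channel-gating idea can be sketched with the standard library alone. Everything here is hypothetical: `dispatch` is a semaphore-bounded stand-in for the errgroup worker pool, and `observedInFlight` returns a count on timeout instead of hanging the way the real test would.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// dispatch runs work(i) for i in [0, n) with at most limit calls in flight.
func dispatch(n, limit int, work func(i int)) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		sem <- struct{}{} // blocks while limit workers are in flight
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			work(i)
		}(i)
	}
	wg.Wait()
}

// observedInFlight gates every worker on a release channel and reports how
// many workers (up to 2) were seen inside work before the timeout.
func observedInFlight(limit, timeoutMs int) int {
	const n = 4
	entered := make(chan struct{}, n)
	release := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		dispatch(n, limit, func(int) {
			entered <- struct{}{} // signal: this worker is in flight
			<-release             // hold until the observer releases the gate
		})
	}()

	timeout := time.After(time.Duration(timeoutMs) * time.Millisecond)
	seen := 0
wait:
	for seen < 2 {
		select {
		case <-entered:
			seen++
		case <-timeout:
			break wait
		}
	}
	close(release)
	<-done
	return seen
}

func main() {
	fmt.Println("limit 4:", observedInFlight(4, 2000))
	fmt.Println("limit 1:", observedInFlight(1, 200))
}
```

With a limit of 1 the second worker cannot start until the first is released, so only one entry is observed before the timeout. A count-only assertion cannot detect that difference.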
* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)
Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.
Replace with sha256(name + "\x00" + version): fixed-length and
ambiguity-free, since NUL is not expected to appear in either field.
Matches how configHash already keys per-config inputs.
Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)
Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).
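The collision and the fix can be reproduced with a short sketch. The function signatures are assumptions: the real pluginVersionKey takes a provider (and tolerates nil), whereas this version takes the two strings directly, and the hex encoding of the digest is also an illustrative choice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey is the separator-based composition that was flagged as fragile.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// pluginVersionKey hashes name and version joined by a NUL byte.
func pluginVersionKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

// keysCollide reports whether the two example providers from the commit
// message collide under each scheme.
func keysCollide() (naive, hashed bool) {
	naive = naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0")
	hashed = pluginVersionKey("foo", "bar@1.0") == pluginVersionKey("foo@bar", "1.0")
	return naive, hashed
}

func main() {
	naive, hashed := keysCollide()
	fmt.Println("naive collides:", naive)
	fmt.Println("hashed collides:", hashed)
	fmt.Println("key length:", len(pluginVersionKey("foo", "1.0")))
}
```

The separator changes from a character that can plausibly occur in a name or version to one that is not expected in either, and the digest keeps the key at a fixed length.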
* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion
W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.
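A minimal sketch of the dispatch decision, assuming the optional interface is discovered by type assertion. The interface, method, and DispatchVersionV2 names come from this message; the DispatchVersionV1 constant, the `any` parameter type, and the two fixture types are assumptions.

```go
package main

import "fmt"

// ComputePlanVersionDeclarer is the optional interface a provider may satisfy.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const (
	DispatchVersionV1 = "v1" // assumed name for the legacy default
	DispatchVersionV2 = "v2"
)

// DispatchVersionFor returns "v2" only when the provider declares exactly
// "v2"; undeclared, empty, or unknown values all take the legacy path.
func DispatchVersionFor(p any) string {
	if d, ok := p.(ComputePlanVersionDeclarer); ok && d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return DispatchVersionV1
}

// legacyProvider does not implement the optional interface at all.
type legacyProvider struct{}

// declaringProvider implements it and reports whatever it was given.
type declaringProvider struct{ version string }

func (d declaringProvider) ComputePlanVersion() string { return d.version }

func main() {
	fmt.Println(DispatchVersionFor(legacyProvider{}))
	fmt.Println(DispatchVersionFor(declaringProvider{version: ""}))
	fmt.Println(DispatchVersionFor(declaringProvider{version: "v2"}))
}
```

Keeping the decision in one helper means the apply path has a single place where v1 and v2 routing can diverge.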
rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.
Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.
New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
interface + DispatchVersionV2 const + DispatchVersionFor helper.
Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
ProviderVersion fields and ComputePlanVersion() method so tests
drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
iacProvider.computePlanVersion alongside existing
capabilities.iacProvider.name; findIaCPluginDir returns the
version; readIaCPluginComputePlanVersion is the load-time helper;
remoteIaCProvider stores the value and exposes it via
ComputePlanVersion() to satisfy the optional interface. (Re-reads
plugin.json once per provider load rather than threading through
loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
for the existing test override; cost is one tiny os.ReadFile vs
the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
test seam + dispatch branch in applyWithProviderAndStore. Drift
report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
(v1/un-declared routes legacy), and
TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
asserted via writer-buffer
substring). v1 fixture v1RecordingProvider intentionally does NOT
implement ComputePlanVersionDeclarer to prove the dispatcher's
"default to v1 when un-declared" branch.
* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)
Code-reviewer flagged 3 IMPORTANT items in T3.7:
1. Comment/code mismatch on drift-report timing. The comment promised
"Run on success or partial failure" but the code gated on
`err == nil` (success only). The contract the comment described
is the more useful behavior — operators most need the
stale-input diagnostic when an apply fails ("which input went
stale during the failed apply?"). Without it, the failure error
and the "what changed" context are disconnected.
Fix: gate on `result != nil` instead of `err == nil`.
printDriftReportIfAny already no-ops on empty/nil reports, so
calling it whenever result is non-nil is safe.
2. No test for the drift-on-partial-failure path. Added
TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
and asserts both: (a) the err propagates, AND (b) the drift
report still reaches the writer.
3. Optional-interface coverage gap. Two semantically-different "v1"
paths exist:
- Path A: provider doesn't implement ComputePlanVersionDeclarer
at all → type-assert fails → legacy. Covered by
v1RecordingProvider.
- Path B: provider implements interface but ComputePlanVersion()
returns "" (the realistic mid-transition state for v1 plugins
after the SDK update lands but before they migrate) → type-
assert succeeds, DispatchVersionFor returns "v1" → legacy.
Was untested.
Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
using iactest.NoopProvider{DispatchVersion: ""}, which always
implements the interface (the method exists on the type). Pins
Path B specifically.
Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.
* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion
cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.
v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.
Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.
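The conversion itself is small. The sketch below shows one way to write it, assuming the signature implied by the description (map[string]bool in, map[string]any out, nil for empty input); the actual helper in deploy_providers.go may differ in detail, and the structpb call is deliberately left out to keep the example stdlib-only.

```go
package main

import "fmt"

// sensitiveToAny widens a map[string]bool into the map[string]any shape
// that generic struct encoders accept. Empty or nil input yields nil so the
// field can be omitted from the wire payload.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// isMarkedSensitive reads a flag back out of the widened map.
func isMarkedSensitive(m map[string]any, key string) bool {
	b, ok := m[key].(bool)
	return ok && b
}

func main() {
	converted := sensitiveToAny(map[string]bool{"password": true, "region": false})
	fmt.Println(len(converted), isMarkedSensitive(converted, "password"))
	fmt.Println(sensitiveToAny(nil) == nil, sensitiveToAny(map[string]bool{}) == nil)
}
```

The values stay plain bools inside the `any` map, so the receiving side can recover them with an ordinary type assertion.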
* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)
W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.
# ADR 007 — non-trivial deviation from plan-literal
Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:
1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
test-helper package, not a runnable binary.
2. Real-gRPC runtime validation lands in P-DO when DO sets
computePlanVersion: v2 in its plugin.json.
3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
4. W-7 conformance suite is the recurring cross-PR gRPC harness.
Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.
# Tests
- TestApply_V2_LoaderSeamDispatch_EndToEnd:
- Writes a real config + filesystem state seeded with vpc
region=nyc3 (under iacStateRecord shape).
- Sets desired region=nyc1.
- Substitutes the resolveIaCProvider seam to return a Go provider
that declares v2 + has a driver returning NeedsReplace=true.
- Calls applyInfraModules (the production runInfraApply
entrypoint) and asserts driver.diffCount == 1, deleteCount ==
1, createCount == 1, plus exact identity of the deleted
ProviderID and the created Config["region"].
- TestApply_V2_LoaderSeam_DriftReportPrinted:
- Same loader-seam setup + applyV2ApplyPlanFn substitution
returning InputDriftReport with one entry.
- Captures os.Stderr and asserts the FormatStaleError block
reaches the operator (drift-report wiring T3.7 added is
end-to-end alive in the v2 loader path).
# Test infrastructure
- cmd/wfctl/main_test.go: NEW TestMain forces
WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
scoped via getDiffCache lazy init) doesn't observe stale entries
from a developer's local ~/.cache/wfctl/diff/ as false-positive
cache hits skipping driver Diff dispatch. Same pattern as
platform/main_test.go from T3.6f. Caught during dev when the
end-to-end test failed in the full cmd/wfctl test run but passed
in isolation.
# Bug-class context
The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.
* docs(pr): note bugs incidentally fixed by W-3b
W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.
Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:
1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
converter (commit 40e07a1; surfaced during T3.9 runtime
validation; v1 plugins never tripped it)
Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.
* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)
Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.
This commit:
- Status header updated to "Accepted (with extensive deliberation
history — see Decision history section)".
- Context section adjusted to preface the deliberation history
rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
flips mid-execution, reviewer + implementer should refuse to
proceed and force explicit disambiguation. Both reviewers
endorsed this hold during transition 4; the strict-interpretation
invariant from using-superpowers was the operative rule.
Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.
Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."
* feat(iac): add ProviderValidator optional interface + PlanDiagnostic type
Adds an OPTIONAL `interfaces.ProviderValidator` interface that an IaCProvider
implementation MAY also satisfy to expose provider-side cross-resource
constraint validation at plan time:
type ProviderValidator interface {
ValidatePlan(plan *IaCPlan) []PlanDiagnostic
}
Plus the supporting `PlanDiagnostic` type and `PlanDiagnosticSeverity` enum
(Info/Warning/Error). Consumers (e.g. the R-A10 align rule landing in the
next commit) discover ValidatePlan via type-assertion, so providers that do
not implement it keep working unchanged — purely additive.
Naming note: plan T4.1 originally proposed `Diagnostic` for this type, but
`interfaces.Diagnostic` is already taken by the unrelated Troubleshooter
runtime-event finding (`iac_resource_driver.go`). Renamed to PlanDiagnostic
to preserve W-4's pure-additive contract; the existing Troubleshooter type
is untouched.
TDD via interfaces/iac_provider_test.go covering severity-constant ordering,
PlanDiagnostic field shape, and type-assertion against both an implementor
and a non-implementor (confirms the interface remains optional).
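A consumer can discover the optional interface roughly as in the sketch below. The ProviderValidator shape and the severity constant names follow this message; the `IaCPlan` and `PlanDiagnostic` fields, the integer values of the severity constants, and both fixture providers are assumptions made for illustration.

```go
package main

import "fmt"

// IaCPlan is a placeholder; the real plan type carries far more detail.
type IaCPlan struct{ Resources []string }

type PlanDiagnosticSeverity int

const (
	PlanDiagnosticInfo PlanDiagnosticSeverity = iota // values are assumed
	PlanDiagnosticWarning
	PlanDiagnosticError
)

// PlanDiagnostic field names are assumptions for this sketch.
type PlanDiagnostic struct {
	Severity PlanDiagnosticSeverity
	Resource string
	Message  string
}

// ProviderValidator is the optional interface from the commit message.
type ProviderValidator interface {
	ValidatePlan(plan *IaCPlan) []PlanDiagnostic
}

// plainProvider does not implement ProviderValidator.
type plainProvider struct{}

// validatingProvider flags plans that contain more than one resource.
type validatingProvider struct{}

func (validatingProvider) ValidatePlan(plan *IaCPlan) []PlanDiagnostic {
	if len(plan.Resources) > 1 {
		return []PlanDiagnostic{{
			Severity: PlanDiagnosticError,
			Resource: plan.Resources[0],
			Message:  "hypothetical cross-resource constraint violated",
		}}
	}
	return nil
}

// collectDiagnostics type-asserts each provider and skips non-implementors.
func collectDiagnostics(providers []any, plan *IaCPlan) []PlanDiagnostic {
	var out []PlanDiagnostic
	for _, p := range providers {
		v, ok := p.(ProviderValidator)
		if !ok {
			continue
		}
		out = append(out, v.ValidatePlan(plan)...)
	}
	return out
}

func main() {
	plan := &IaCPlan{Resources: []string{"vpc", "db"}}
	diags := collectDiagnostics([]any{plainProvider{}, validatingProvider{}}, plan)
	fmt.Println(len(diags), diags[0].Resource, diags[0].Message)
}
```

Because discovery happens through a type assertion, a provider that never adopts the interface is simply skipped.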
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): R-A10 align rule — provider.ValidatePlan dispatch
Adds R-A10, the align rule that surfaces provider-side cross-resource
constraint diagnostics at plan time. Wiring:
cmd/wfctl/infra_align_rules.go::checkRA10_provider_validate_plan
Iterates providers, type-asserts ProviderValidator, calls
ValidatePlan(plan), maps each PlanDiagnostic to an AlignFinding.
Severity mapping: Error→FAIL, Warning→WARN, Info→WARN (advisory;
align has no INFO tier today). Resource label falls back to
"<provider-name>:plan" for plan-level findings; field path is
appended to the message when present.
cmd/wfctl/infra_align.go::runInfraAlignChecks
Dispatches R-A10 only when --plan is provided (R-A7 predicate parity).
Loads providers via the new alignLoadProviders test seam — the
default implementation enumerates iac.provider modules in the YAML
and loads each through the existing resolveIaCProvider plugin path.
Closers are released after the rule runs; a per-provider load failure
logs a stderr warning and continues so other R-A* findings are not
hidden.
TDD via cmd/wfctl/infra_align_ra10_test.go covers nil-plan, no-providers,
non-validating-provider-skipped, Error→FAIL, Warning→WARN, Info→WARN,
plan-level resource fallback, and multi-provider mixed-implementation
cases. Two integration tests exercise dispatch through the seam: one
asserts R-A10 fires under --strict and produces non-zero exit; the other
asserts the rule (and the loader) is silent without --plan.
Pure-additive: providers that do not implement ProviderValidator are
skipped, so this commit changes no existing align behaviour.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(iac): document ProviderValidator + R-A10 align rule
Adds the documentation pieces for W-4:
- DOCUMENTATION.md gains a new top-level "IaC Provider Plugin Interfaces"
section that documents the optional interfaces.ProviderValidator
interface, the PlanDiagnostic/PlanDiagnosticSeverity types, the
ValidatePlan contract (read-only, no remote calls), the R-A10 consumer
and its severity mapping, and the naming-distinction note vs. the
pre-existing interfaces.Diagnostic (Troubleshooter) type.
- docs/WFCTL.md adds an `infra align` subsection under the existing
`infra` command. It lists every R-A* rule (R-A1 through R-A10 with
severities), the flag table, the R-A10 severity-mapping submatrix,
and example invocations covering both plan-less and --plan/--strict
modes.
- cmd/wfctl/dsl-reference-embedded.md (the source for `wfctl
dsl-reference`) gains the R-A9 and R-A10 rows in the rule-families
table and a short paragraph on R-A10's behaviour. The `--plan`
description is updated to enable both R-A7 and R-A10.
Pure docs change; no code touched.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(iac): T4.5 verification — `--plan` help text mentions R-A10
T4.5 verification surfaced one cosmetic gap: the `--plan` flag's help
description still read "enables R-A7 checks" after T4.2 added R-A10 as a
second `--plan`-gated rule. Updated to "enables R-A7 and R-A10 checks" so
`wfctl infra align --help` reflects current behaviour.
Verification steps (no further code change required):
- `GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \
./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/...` → all PASS.
- `go build ./cmd/wfctl` → builds clean.
- `wfctl infra align --help` → shows existing flags plus the corrected
`--plan` description.
- Fixture-provider smoke (TestInfraAlign_RA10_FixtureProvider_Fires) wires
a ProviderValidator returning a fatal diagnostic through the
alignLoadProviders seam → R-A10 finding emitted, FAIL severity, non-zero
exit under `--strict`. This satisfies T4.5 Step 3 manual rule-trigger
smoke without needing a real plugin subprocess.
- `go vet ./interfaces/... ./cmd/wfctl/... ./iac/...` → clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(iac): T4.2/T4.4 review — Info diagnostics log, no finding
Spec-reviewer flagged that the rev10 plan T4.2 acceptance criteria specify
a three-tier severity mapping ("Errors → align failures; Warnings →
warnings; Info → logs"), and that the previous commit (76c4160) collapsed
Info into WARN. The collapse meant `wfctl infra align --strict` could exit
non-zero on a purely informational provider hint — the exact scenario the
Info tier exists to prevent (e.g. billing-tier change notices, deprecation
hints) — defeating the tier's contract.
Code (cmd/wfctl/infra_align_rules.go::checkRA10_provider_validate_plan):
Severity switch reworked to three explicit cases plus a conservative
default. PlanDiagnosticInfo now writes to a new package-level sink
`ra10LogInfo` (stderr by default; overridable for tests) and emits NO
AlignFinding, so it never affects exit code under any flag combination.
PlanDiagnosticError → FAIL and PlanDiagnosticWarning → WARN are unchanged.
Unknown future severities fall back to WARN so they cannot slip past
--strict undetected.
Doc-comment rewritten to spell out the three-tier mapping and the
motivating "Info must not break --strict CI" rule.
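The three-tier mapping can be sketched as below. This is a simplified sketch: the `severity` and `finding` types, the `mapDiagnostic` helper, and the exit code value of 1 are assumptions, and the log line only approximates the "R-A10 [info]" format described in this PR.

```go
package main

import "fmt"

type severity string

const (
	sevInfo    severity = "info"
	sevWarning severity = "warning"
	sevError   severity = "error"
)

// finding is a pared-down stand-in for AlignFinding.
type finding struct {
	Severity string // "FAIL" or "WARN"
	Message  string
}

// mapDiagnostic converts one diagnostic into a finding. Info is logged and
// produces no finding; unknown severities fall back to WARN.
func mapDiagnostic(sev severity, msg string, logInfo func(string)) (finding, bool) {
	switch sev {
	case sevError:
		return finding{Severity: "FAIL", Message: msg}, true
	case sevWarning:
		return finding{Severity: "WARN", Message: msg}, true
	case sevInfo:
		logInfo("R-A10 [info] " + msg)
		return finding{}, false
	default:
		return finding{Severity: "WARN", Message: msg}, true
	}
}

// alignExitCode is non-zero on any FAIL, and on WARN only under strict.
func alignExitCode(findings []finding, strict bool) int {
	for _, f := range findings {
		if f.Severity == "FAIL" || (strict && f.Severity == "WARN") {
			return 1
		}
	}
	return 0
}

// runInfoOnlyDemo feeds a single Info diagnostic through the mapping.
func runInfoOnlyDemo() (findings, logged, strictExit int) {
	var logs []string
	var out []finding
	if f, ok := mapDiagnostic(sevInfo, "do/vpc: billing tier changes", func(s string) {
		logs = append(logs, s)
	}); ok {
		out = append(out, f)
	}
	return len(out), len(logs), alignExitCode(out, true)
}

func main() {
	fmt.Println(runInfoOnlyDemo())
}
```

Routing Info through an injectable sink is what lets a test assert on the log line while still proving that no finding, and therefore no exit-code change, results.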
Test (cmd/wfctl/infra_align_ra10_test.go):
TestCheckRA10_InfoDiagnostic_BecomesWARN renamed/rewritten as
TestCheckRA10_InfoDiagnostic_LogsAndEmitsNoFinding. Asserts:
- len(findings) == 0
- the captured log line carries the rule tag, [info] severity marker,
"<provider>/<resource>" identifier, the diagnostic message, and the
"field: <name>" suffix
- alignExitCode(findings, strict=true) == 0 (the load-bearing guarantee)
Docs (DOCUMENTATION.md, docs/WFCTL.md):
Both severity-mapping summaries replaced with a three-row table
(Error → FAIL finding, Warning → WARN finding, Info → stderr log/no
finding/no exit-code effect). Prose surrounding the table now
explicitly calls out the strict-CI safety guarantee.
Verification:
- GOWORK=off go test -race -count=1 ./interfaces/... ./iac/...
./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/... → all PASS.
- markdown-link-check on the three modified docs → 0 dead links.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(iac): T4.4 review — embedded reference Info-tier mapping
Spec-reviewer caught one stale doc site missed in commit 9c41c1d:
`cmd/wfctl/dsl-reference-embedded.md:1358-1359` (the source for `wfctl
dsl-reference`) still claimed `PlanDiagnosticInfo` produced a WARN
AlignFinding. Replaced with the full three-tier prose so `wfctl
dsl-reference` callers see the corrected mapping:
- PlanDiagnosticError → FAIL AlignFinding (always non-zero exit)
- PlanDiagnosticWarning → WARN AlignFinding (non-zero only under --strict)
- PlanDiagnosticInfo → stderr log "R-A10 [info] <provider>/<resource>:
<message>"; no AlignFinding so --strict CI
gates never fail on informational hints
The R-A10 row in the table at :1354 ("FAIL or WARN") is unchanged — Info
no longer produces a finding so the existing severity range still
exhaustively covers the possible AlignFinding severities.
Verification:
- `markdown-link-check cmd/wfctl/dsl-reference-embedded.md` → 0 dead links.
- `GOWORK=off go test -race -count=1 ./cmd/wfctl/...` → PASS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(iac): R1 review — load plan + cfg once; clean R-A10 Info log fmt; clarify PlanDiagnosticSeverity doc (Copilot review)
- runInfraAlignChecks loads --plan once and reuses the parsed *IaCPlan
for R-A7 and R-A10 (was: 2x file open + JSON decode).
- alignLoadProviders now takes *alignContext (built once via
buildAlignContext in runInfraAlignChecks) instead of re-loading the
YAML from disk. Test seam updated.
- R-A10 Info log identifies plan-level diagnostics as `<provider>/plan`
(matches the documented `R-A10 [info] <provider>/<resource>: ...`
format) instead of the redundant `<provider>/<provider>:plan: ...`.
Table label still uses `<provider>:plan`.
- PlanDiagnosticSeverity doc comment now spells out the exit-code
mapping: Error always FAILs; Warning is advisory by default but FAILs
under --strict; Info never affects exit code.
New test: TestCheckRA10_PlanLevelInfoDiagnostic_LogsAsProviderSlashPlan
covers the log-format fix.
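For reference, the exit-code mapping spelled out in the PlanDiagnosticSeverity doc comment can be sketched like this. It is an illustration under assumptions: `exitCode` and `Finding` are hypothetical names (the real function is alignExitCode, whose body is not shown in this PR text), and the concrete non-zero value 1 is assumed.

```go
package main

import "fmt"

// Finding is a hypothetical stand-in for AlignFinding.
type Finding struct {
	Level string // "FAIL" or "WARN"
}

// exitCode returns non-zero when any finding is FAIL, or when strict is set
// and any finding is WARN. Info diagnostics never become findings, so they
// cannot reach this function at all.
func exitCode(findings []Finding, strict bool) int {
	for _, f := range findings {
		if f.Level == "FAIL" || (strict && f.Level == "WARN") {
			return 1 // assumed value; the source only says "non-zero"
		}
	}
	return 0
}

func main() {
	warn := []Finding{{Level: "WARN"}}
	fail := []Finding{{Level: "FAIL"}}
	fmt.Println(exitCode(nil, true))   // no findings, strict
	fmt.Println(exitCode(warn, false)) // advisory by default
	fmt.Println(exitCode(warn, true))  // strict promotes WARN
	fmt.Println(exitCode(fail, false)) // FAIL always fails
}
```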
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352
added a commit
that referenced
this pull request
May 4, 2026
…#531) * feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type * feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel * feat(iac): wfctl infra plan writes InputSnapshot to plan.json * feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash * feat(iac): wfctl infra plan warns when plan.json not in .gitignore * feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring * feat(iac): add refreshoutputs.Refresh — read-only state output refresh T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add wfctl infra refresh-outputs subcommand T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed. 
When the resolved config has no usable iac.provider module for the requested env, emits the literal error refresh-outputs: provider not configured for env "<env>" verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): concurrency stress test for refreshoutputs.Refresh T2.5 — pure-package stress test in iac/refreshoutputs/. 
Drives Refresh with 100 fake resources at Concurrency=8 and asserts: 1. No deadlock (10s watchdog around the call). 2. Read called exactly once per ProviderID (atomic per-ID counter). 3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency. 4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit. The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document infra refresh-outputs subcommand T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md: - New row in the Command Tree mermaid graph. - New row in the infra Action table. - Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples. See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect. Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm): $ mdformat --check docs/WFCTL.md Error: File "docs/WFCTL.md" is not formatted. exit=1 This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). 
T2.6's own additions follow the existing in-file conventions exactly.
  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
  [✓] https://github.com/GoCodeAlone/workflow
  [✓] #build-ui
  [✓] mcp.md
  3 links checked.
  exit=0
docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7):
  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0
  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
          Config file (short for --config)
    -concurrency int
          Maximum concurrent Read calls (default 8)
    -config string
          Config file
    -e string
          Environment name (short for --env)
    -env string
          Environment name (resolves per-module overrides)
  exit=0
  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state
  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
  ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR.
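The WFCTL_REFRESH_OUTPUTS activation rules documented in this change (ParseBool semantics plus the --skip-refresh override) can be sketched as below. `refreshEnabled` is a hypothetical helper, not the function in infra_apply_refresh_pre.go. It takes the raw value as a parameter, where the real code presumably reads the environment, so that the example stays deterministic.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled reports whether the pre-step should run for a given raw
// env-var value and --skip-refresh setting.
func refreshEnabled(raw string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag wins regardless of the env var
	}
	on, err := strconv.ParseBool(raw)
	if err != nil {
		return false // unset, empty, or unrecognised value: default OFF
	}
	return on
}

func main() {
	for _, raw := range []string{"", "1", "true", "t", "0", "false", "f", "yes"} {
		fmt.Printf("%q -> %v\n", raw, refreshEnabled(raw, false))
	}
	fmt.Println("with --skip-refresh:", refreshEnabled("1", true))
}
```

A presence-only check (`raw != ""`) would return true for "0", which is the foot-gun the ParseBool approach avoids.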
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3 ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review: - Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`. - Code-reviewer flagged this as a foot-gun (=0 mis-enables). - Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable. - Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions. - Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself. Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1) * fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema Addresses code-reviewer findings on commit 695a070: - Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward. - Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics. - Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design. 
New test covers both the typo-rejection and the root-permissivity contracts.
* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields * feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch) * fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract Addresses code-reviewer findings on commit 13a6fad: - Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers. - Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc. - Minor: added TestApplyResult_OmitEmptyContract — table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
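The omitempty contract mentioned above can be illustrated with a small standalone program. The `result` struct is a hypothetical stand-in for ApplyResult: the Go field types are assumptions, and only the JSON keys and the `{"vpc":"new-uuid"}` fixture come from the commit text. With encoding/json, `omitempty` drops both nil and zero-length maps and slices.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a hypothetical stand-in for ApplyResult; the field types are
// assumptions made for illustration.
type result struct {
	Label            string            `json:"label"`
	InputDriftReport []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap     map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and panics on error; marshaling this struct cannot fail
// in practice, so the panic only guards against future edits.
func encode(r result) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// nil fields: keys omitted.
	fmt.Println(encode(result{Label: "nil"}))
	// empty-but-non-nil fields: keys still omitted.
	fmt.Println(encode(result{
		Label:            "empty",
		InputDriftReport: []string{},
		ReplaceIDMap:     map[string]string{},
	}))
	// populated map: key present, keyed by the replaced resource's name.
	fmt.Println(encode(result{
		Label:        "set",
		ReplaceIDMap: map[string]string{"vpc": "new-uuid"},
	}))
}
```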
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test
Addresses code-reviewer findings on commit 8416498:
- Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner.
Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once. * fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change. * feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset) * feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery * feat(iac): doUpdate + doDelete actions * feat(iac): doReplace populates ApplyResult.ReplaceIDMap * feat(iac): add diff cache with LRU eviction + corruption recovery * fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy Three independent review-fix bundles: T3.1.5 (commit f5a7ce9 review — Minor 1): - apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically — keeps the cross-package fixture parity guaranteed. T3.2 (commit 0c30eec review — Minors 1 + 2): - apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion — distinct code path from the existing ok-but-SupportsUpsert==false test. 
Compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver. - apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. Also documents the single-pass recovery contract (recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop). T3.3 (commit a3fc98b review — Minors 1 + 2 + 4): - apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path — locks the resource-append contract so a regression that skipped the append on nil Current would fail loudly. - apply_update_delete_test.go gains parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: empty ProviderID flows to driver, no synthesized precondition error, deleteCount==1 (latent bug-fix from design — the v1 path silently skipped Delete; v2 must call it). - apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency." * fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create — surprising operators who expected fast interruption mid-Replace. 
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable. Failure shape: return fmt.Errorf("replace: canceled after delete: %w", err) Wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface. New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a captured context.CancelFunc as a side-effect, simulating exact post-Delete cancellation. Asserts Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, error has the canonical prefix. Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this commit since it's the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) intentionally skipped — all documentary or out-of-scope per reviewer guidance. * docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted: 1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files. 2. 
**WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`: - .github/workflows/ci.yml (test + lint jobs) - .github/workflows/benchmark.yml (performance benchmarks) - .github/workflows/pre-release.yml (pre-release tests) - .github/workflows/release.yml (release tests) - .github/workflows/dependency-update.yml (post-update test gate) Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference. Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows. * fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060: - Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents — never a partial write. The corruption-recovery path in Get is still the safety net for cross- filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering needed. On rename failure, best-effort cleanup of the temp file. - Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms. 
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining the tags are for log/transcript serialization, not cache keying — keyFingerprint uses NUL-separated string concat, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer. - Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless — no code path creates a file with `*` in its name. Likely leftover from earlier debugging. Removing it lets us drop the now-unused `os` import. - Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope. Skipped per reviewer guidance: - Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope." - Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation." * refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change) * feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error) W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider modules in the config, groups desired specs by `provider:` field, loads each via the same loader the apply path uses, and dispatches platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates against a real plugin process at plan time, not just at apply time. BREAKING: configs declaring at least one iac.provider module now require the plugin process to load successfully. Plugin-load failure exits non-zero with the literal error documented in the v0.21.0 CHANGELOG. 
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2); operators who need pure offline validation should use `wfctl validate`. Configs without any iac.provider module fall back to the legacy ConfigHash compare path so minimal/legacy fixtures and out-of-band scripts continue to work. cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the package compiles; T3.6c replaces nil with the live provider handle. * feat(iac): wfctl infra apply threads provider into ComputePlan * test(iac): update cross-package fakes for ComputePlan provider arg W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface. Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). Plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of deferring to function exit (multi-provider plan no longer holds N gRPC connections open at once). Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally-identical 14-method shells become one. Atomic counters carried forward where used. Doc updates: - godoc on computePlanForInfraSpecs corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matches actual code). - CHANGELOG entry calls out the empty-desired alignment with apply (loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown). 
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract (rev3 fix for the cycle-2 self-contradiction: test + impl ship in the same SHA, no t.Skip placeholder). ComputePlan now classifies each existing resource via p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the per-resource Diff calls in parallel under errgroup with a bounded worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override clamped 1..32). Action emission: - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew is true (the latter closes design issue C — pre-W-3b ForceNew was silently downgraded to update); - update, when DiffResult.NeedsUpdate is true and replace did not fire; - skip, when neither flag is set. Net-new resources still emit create without dispatching Diff; resources removed from desired still emit delete in reverse-dep order. Nil-tolerance contract preserved: if p is nil, or if p.ResourceDriver(typ) returns (nil, nil) for a resource type, ComputePlan falls back to the legacy ConfigHash compare for the affected resources. Replace cannot be expressed via the legacy path — callers needing Replace must supply a provider whose drivers implement Diff. Per-resource driver.Diff errors propagate via errgroup so operators see the underlying cause (rate limit, network, etc.). 
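The action-emission rules above (replace when NeedsReplace or any ForceNew field change, update when only NeedsUpdate, skip otherwise) can be sketched as a small pure function. This is an illustrative sketch, not the code in platform/: the `DiffResult` and `FieldChange` shapes are simplified stand-ins, the `Changes` field name and the `classify` helper are hypothetical, and treating a nil result as skip is an assumption based on the "driver returns nil DiffResult emits nothing" test mentioned in this PR.

```go
package main

import "fmt"

// FieldChange and DiffResult are simplified stand-ins for the real
// interfaces types; only the fields named in the commit message are modeled.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange // hypothetical field name
}

// classify (hypothetical helper) maps a DiffResult to a plan action:
// replace wins over update, and a ForceNew field alone is enough to replace.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip" // assumption: nil result means nothing to emit
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	forceNew := &DiffResult{
		NeedsUpdate: true,
		Changes:     []FieldChange{{Path: "region", ForceNew: true}},
	}
	fmt.Println(classify(forceNew))                       // replace
	fmt.Println(classify(&DiffResult{NeedsUpdate: true})) // update
	fmt.Println(classify(&DiffResult{}))                  // skip
}
```

Checking ForceNew before NeedsUpdate is what keeps a ForceNew change from being downgraded to an update, which is the behavior this commit says it closes.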
Test surface (platform/differ_replace_test.go, NEW; ships in this commit per the rev3 atomicity rule): - TestComputePlan_NeedsReplaceEmitsReplaceAction - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate - TestComputePlan_DiffReturnsNoChanges_EmitsNothing - TestComputePlan_NilProvider_FallsBackToConfigHash - TestComputePlan_NilDriver_FallsBackToConfigHash - TestComputePlan_DriverDiffError_PropagatesAsError platform/fake_provider_test.go extended with newFakeProviderWithDiff helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse to iac/iactest until cache_test in T3.6f also depends on the helper — deferred to keep T3.6e's diff bounded). Carry-forward notes addressed: - T3.6a note 1: dropped unused *testing.T param from newFakeProvider(). - T3.6a note 2: added compile-time interface conformance asserts on fakeProvider and fakeDriver. - T3.6a note 3: nil-provider AND nil-driver guards baked in; covered by two explicit tests. - T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based phrasing. cmd/wfctl test fakes updated to match the new dispatch model: - readDriver.Diff now returns NeedsUpdate=true (the adoption tests rely on the post-adopt ComputePlan emitting update; pre-W-3b that was the ConfigHash compare's job). - refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of panicking — the refresh-outputs test fixture only exercises Read. * perf(iac): ComputePlan consults diffcache before invoking provider.Diff W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached DiffResult is used directly; on miss, the freshly-computed result is Put into the cache. 
Apply-time correctness does not depend on cache hits — fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` against the same checkout). Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazy-initialised on first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper. platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain so the platform test suite never reads/writes the developer's filesystem cache and so cache state cannot leak across tests with incidentally-aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests). Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/): - Note 1 (env-clamping testability): extract parseConcurrencyEnv as a pure function; new TestParseConcurrencyEnv table-driven test covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5". - Note 2 (parallel-dispatch correctness): new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates, asserts driver.diffCount.Load() == 5 and the resulting plan has 5 actions. - Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing. And T3.6e adversarial-review minor cleanups: - Note 4 (i := i shadowing redundant in Go 1.22+): dropped. - Note 5 (errSentinel uses custom errFromTest): replaced with errors.New. - Note 7 (concurrency contract on ComputePlan godoc): added — p and the ResourceDriver instances it returns MUST be safe for concurrent use. 
New tests (3 cache-behaviour scenarios in differ_cache_test.go): - TestComputePlan_CacheHitSkipsDiff (second call against unchanged inputs hits cache; diffCount stays at 1) - TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch) - TestComputePlan_NoopCacheNeverHits (disabled backend always re-dispatches) * test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review) Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched. The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove ≥2 Diff goroutines are simultaneously in-flight before any returns: regression to serial dispatch would hang on the second `<-entered` and time out at 5s. Pure addition (no production-code change). cacheTestProvider.driver loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell. * fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review) Code-reviewer flagged the T3.6f cache PluginVersion key as fragile: composing via `p.Name() + "@" + p.Version()` would let two genuinely-different providers — `("foo", "bar@1.0")` vs `("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"` and serve each other's cached DiffResults. Today's registered providers (digitalocean, dockercompose, mock) don't carry `@` in either field so no observed bug, but there's no compile-time guard against a future provider declaring `do@enterprise` or similar. Replace with sha256(name + "\x00" + version) — fixed-length, NUL is invalid in both fields by Unicode convention, ambiguity-free. Matches how configHash already keys per-config inputs. 
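The collision described above is easy to reproduce in isolation. The following is a minimal sketch, assuming a key function that takes the name and version as plain strings; the real pluginVersionKey takes a provider, and the hex encoding of the digest here is an assumption, since the commit message only specifies sha256(name + "\x00" + version).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey reproduces the fragile "name@version" composition.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// pluginVersionKey is a hypothetical string-based variant of the fix:
// a NUL separator removes the ambiguity, and sha256 yields a fixed-length key.
func pluginVersionKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Both pairs concatenate to "foo@bar@1.0" under the naive scheme.
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0")) // true

	// The NUL-separated digest keeps them distinct.
	fmt.Println(pluginVersionKey("foo", "bar@1.0") == pluginVersionKey("foo@bar", "1.0")) // false
}
```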
Three regression tests pin the fix: - TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug) - TestPluginVersionKey_NilProvider (defensive — empty key, no panic) - TestPluginVersionKey_Stable (deterministic across calls) Pure additive — no change to any existing test outcome. The cache re-keys against the new digest, which means any DiffResults persisted under the old `name@version` keys will miss on the next plan and re-Diff naturally (cache misses are correct by design). * feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path. rev2/rev3-locked: NO env-var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1 — there is no transitional WFCTL_USE_V2_APPLY flag to misuse. Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant. New plumbing: - iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision. - iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and ComputePlanVersion() method so tests drive both v1 (default empty) and v2 paths through the shared fake. 
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level iacProvider.computePlanVersion alongside existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (Re-reads plugin.json once per provider load rather than threading through loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable for the existing test override; cost is one tiny os.ReadFile vs the gRPC start.) - cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. Drift report printed to writer on success (no-op when empty). - cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/un-declared routes legacy), TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via writer-buffer substring). v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer to prove the dispatcher's "default to v1 when un-declared" branch. * fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review) Code-reviewer flagged 3 IMPORTANT items in T3.7: 1. Comment/code mismatch on drift-report timing. The comment promised "Run on success or partial failure" but the code gated on `err == nil` (success only). The contract the comment described is the more useful behavior — operators most need the stale-input diagnostic when an apply fails ("which input went stale during the failed apply?"). Without it, the failure error and the "what changed" context are disconnected. Fix: gate on `result != nil` instead of `err == nil`. printDriftReportIfAny already no-ops on empty/nil reports so unconditional-on-result-non-nil is safe. 2.
No test for the drift-on-partial-failure path. Added TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure which has applyV2ApplyPlanFn return (resultWithDrift, applyErr) and asserts both: (a) the err propagates, AND (b) the drift report still reaches the writer. 3. Optional-interface coverage gap. Two semantically-different "v1" paths exist: - Path A: provider doesn't implement ComputePlanVersionDeclarer at all → type-assert fails → legacy. Covered by v1RecordingProvider. - Path B: provider implements interface but ComputePlanVersion() returns "" (the realistic mid-transition state for v1 plugins after the SDK update lands but before they migrate) → type- assert succeeds, DispatchVersionFor returns "v1" → legacy. Was untested. Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty using iactest.NoopProvider{DispatchVersion: ""}, which always implements the interface (the method exists on the type). Pins Path B specifically. Pure correctness fixes — no signature change, no behavior change for the success-only or v1-RecordingProvider paths. * fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing current.Sensitive (map[string]bool) directly into the args map. structpb.NewStruct rejects map[string]bool — it accepts map[string]any only — and the upstream plugin/external/convert.go::mapToStruct returns &structpb.Struct{} on err rather than surfacing the typing failure. Result: every Diff dispatch over gRPC for any provider whose ResourceOutput.Sensitive map was non-nil (or even an empty map[string]bool{}) silently observed args=map[] on the plugin side. v1 plugins never tripped this because v1 dispatches IaCProvider.Plan server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's manifest-driven dispatch) surfaces it immediately on the first existing-resource Diff call. Fix: convert via sensitiveToAny() to the map[string]any shape NewStruct accepts. 
Returns nil for empty/nil input so the wire stays trim-friendly. Bug discovered during W-3b T3.9 runtime-launch validation against an out-of-band gRPC stub plugin; the canonical T3.9 in-tree test ships separately as a loader-seam Go integration test (per team-lead direction + plan precedent at plugin/sdk/iaclint/). Will surface in T3.10's PR description as a third incidentally-fixed-by-W-3b bug. * test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007) W-3b T3.9. Exercises the full v2 dispatch chain — config parse → state load → provider load (via the resolveIaCProvider seam from T3.6c) → ComputePlan Diff dispatch (T3.6e/f) → wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace decomposition into Delete + Create → printDriftReportIfAny — by injecting a Go in-process v2-declaring provider through the package- level seam. No out-of-process gRPC binary or plugin.json under internal/testdata/. # ADR 007 — non-trivial deviation from plan-literal Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin in internal/testdata/stub-provider/." Team-lead authorized switching to in-tree loader-seam validation per: 1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go test-helper package, not a runnable binary. 2. Real-gRPC runtime validation lands in P-DO when DO sets computePlanVersion: v2 in its plugin.json. 3. Hours-of-stub-plumbing cost doesn't earn proportional coverage vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end. 4. W-7 conformance suite is the recurring cross-PR gRPC harness. Full reasoning + considered alternatives in docs/adr/007-t3-9-runtime-validation-via-loader-seam.md. # Tests - TestApply_V2_LoaderSeamDispatch_EndToEnd: - Writes a real config + filesystem state seeded with vpc region=nyc3 (under iacStateRecord shape). - Sets desired region=nyc1. - Substitutes the resolveIaCProvider seam to return a Go provider that declares v2 + has a driver returning NeedsReplace=true. 
- Calls applyInfraModules (the production runInfraApply entrypoint) and asserts driver.diffCount == 1, deleteCount == 1, createCount == 1, plus exact identity of the deleted ProviderID and the created Config["region"]. - TestApply_V2_LoaderSeam_DriftReportPrinted: - Same loader-seam setup + applyV2ApplyPlanFn substitution returning InputDriftReport with one entry. - Captures os.Stderr and asserts the FormatStaleError block reaches the operator (drift-report wiring T3.7 added is end-to-end alive in the v2 loader path). # Test infrastructure - cmd/wfctl/main_test.go: NEW TestMain forces WFCTL_DIFFCACHE=disabled so the platform diffcache (process- scoped via getDiffCache lazy init) doesn't observe stale entries from a developer's local ~/.cache/wfctl/diff/ as false-positive cache hits skipping driver Diff dispatch. Same pattern as platform/main_test.go from T3.6f. Caught during dev when the end-to-end test failed in the full cmd/wfctl test run but passed in isolation. # Bug-class context The Option-A draft (real gRPC binary; not retained on this branch per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1 (remoteResourceDriver.Diff sensitiveToAny conversion). The bug exists independent of which T3.9 option ships; the fix is in tree and surfaces in T3.10's PR description as the third W-3b incidentally-fixed bug. * docs(pr): note bugs incidentally fixed by W-3b W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Pure-additive doc; no code changes. Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring: 1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch) 2. ForceNew silently downgraded to Update (T3.6e replace emission) 3. 
map[string]bool drops gRPC args silently — sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it) Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan. * docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions) Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution. This commit: - Status header updated to "Accepted (with extensive deliberation history — see Decision history section)". - Context section adjusted to preface the deliberation history rather than imply a single-direction trajectory. - New Decision history section lists all 5 transitions with verbatim team-lead quotes + per-transition implementer action. - Final paragraph captures the meta-lesson: when team-lead path- flips mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule. Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) unaffected. Closes spec-reviewer's Issue 1 from c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected." 
* feat(iac): jitsubst.ResolveSpec for per-module deferred substitution T5.1 — new package iac/jitsubst hosts ResolveSpec, the apply-time helper that resolves ${VAR}, ${MODULE.field}, and ${MODULE.id} references in a ResourceSpec.Config tree. Strict semantics: every reference MUST resolve or the helper returns an error and the input spec unchanged. ${MODULE.id} prefers the in-apply replaceIDMap (W-3b/T3.4) over syncedOutputs so cascade-replace ProviderID propagation is authoritative over potentially stale state outputs. Used by W-5 T5.2 (wire into wfctlhelpers.ApplyPlan) and T5.3 (wire into doReplace). No behavior change yet — helper has no in-tree caller. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): ApplyPlan resolves JIT substitutions per action T5.2 — wfctlhelpers.ApplyPlan now invokes jitsubst.ResolveSpec on every action.Resource before dispatch. The substitution sees: - result.ReplaceIDMap (this-apply Replace ProviderIDs from doReplace) - syncedOutputs (state-side outputs from action.Current entries + this-apply outputs from successful prior dispatches in the same loop) - os.LookupEnv (production env source) syncedOutputs is pre-populated from every action.Current at start-of-apply so a NEW action can reference an in-state sibling module's outputs from action zero. After each successful dispatch (when result.Resources grows), the new entry is folded into syncedOutputs via flattenOutputs — flat-copy of Outputs with the canonical 'id' key shadowed by ProviderID so ${MODULE.id} resolves predictably across new and existing modules. JIT failure surfaces as a per-action ActionError with the canonical 'jit substitution:' prefix; the offending action SKIPS dispatch (unresolved spec must not reach the driver). The loop continues to the next action — best-effort apply contract preserved. 
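The flattenOutputs step described above (flat copy of Outputs with the 'id' key shadowed by ProviderID) might look roughly like the sketch below. The `ResourceOutput` struct is a simplified stand-in for the real type, and the function signature is an assumption; in particular, whether an empty ProviderID should still shadow an existing 'id' output is not stated in the commit message, and this sketch always overwrites.

```go
package main

import "fmt"

// ResourceOutput is a simplified stand-in for the real interfaces type.
type ResourceOutput struct {
	ProviderID string
	Outputs    map[string]any
}

// flattenOutputs (hypothetical signature) copies Outputs into a new map and
// then sets "id" to ProviderID, so ${MODULE.id} resolves to the provider's
// identifier even when Outputs carries its own "id" entry.
func flattenOutputs(r ResourceOutput) map[string]any {
	flat := make(map[string]any, len(r.Outputs)+1)
	for k, v := range r.Outputs {
		flat[k] = v
	}
	flat["id"] = r.ProviderID // assumption: always shadows, even if empty
	return flat
}

func main() {
	out := flattenOutputs(ResourceOutput{
		ProviderID: "vpc-123",
		Outputs:    map[string]any{"id": "stale", "cidr": "10.0.0.0/16"},
	})
	fmt.Println(out["id"], out["cidr"]) // vpc-123 10.0.0.0/16
}
```

Copying into a fresh map rather than mutating Outputs keeps the state-side record untouched, which matters if the same entry is later persisted.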
Tests in apply_jit_test.go cover: 2-create plan with B referencing ${A.id}, pre-syncing from action.Current, unresolved-ref skipping dispatch with canonical prefix, no-refs passthrough, and loop-continues- after-per-action-JIT-error. T5.3 wires Replace cascade. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): ApplyPlan replace cascade propagates new ProviderID T5.3 — locks the Replace-cascade contract via apply_replace_cascade_test.go and updates doReplace godoc to document the cascade hookup explicitly. Two scenarios: - ReplaceCascade_DependentCreateGetsNewParentID: [Replace parent, Create dependent] where dependent's Config has ${parent.id}; dependent's Create receives the new ProviderID. - ReplaceCascade_DependentReplaceGetsNewParentID: extends to Replace-on- Replace shape; dependent's post-Delete Create still sees the resolved parent.id, while its own Delete continues to target the OLD ProviderID via action.Current (JIT does not alter action.Current). The behavior was already operational after T5.2's loop-level jitsubst.ResolveSpec call: doReplace populates result.ReplaceIDMap inside iteration N, and the loop's pre-dispatch substitution at iteration N+1 sees the fresh entry. T5.3 adds the assertion + doc that locks this ordering as a contract; future refactors that move substitution out of the loop OR delay ReplaceIDMap population will break these tests loudly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): plan SchemaVersion=2 when JIT substitution required T5.4 — runInfraPlan now stamps plan.SchemaVersion conditionally: - V1 (1) baseline when no plan action's resolved Resource.Config carries a JIT-style ${MODULE.field} or ${MODULE.id} reference. - V2 (2) when any action does — older wfctl binaries reading the persisted plan reject with the existing 'newer than supported' diagnostic at runInfraApply. 
Detection is centralized in jitsubst.HasModuleRefs (recursive walk over map[string]any / []any / string), gated by a simple regex that requires non-empty segments on both sides of the dot — plain ${VAR} env-var refs (no dot) do NOT trigger the bump, so the common operator secret-via-env workflow stays at V1. cmd/wfctl/infra.go gains: - infraPlanSchemaVersionV1 (=1) and infraPlanSchemaVersionJIT (=2) constants alongside the existing infraPlanSchemaVersion (=2, max readable). The 'max readable' constant ticks up with every schema bump; V1/JIT name the per-plan choice runInfraPlan makes. - planRequiresJITSubstitution(plan) helper that walks plan.Actions once via jitsubst.HasModuleRefs. Tests: - iac/jitsubst/jitsubst_test.go — 8 new HasModuleRefs cases (env-var is false, .field/.id are true, nested map/slice, nil-safe, malformed refs are false, mixed-string is true). - cmd/wfctl/infra_plan_schema_test.go — V1 baseline (env-var only), V2 for both .field and .id, V1 negative for env-var-only, and persisted-plan SchemaVersion=2 end-to-end (where T5.5's rejection has not yet landed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): reject persisted JIT-style plans (canonical path is apply-without-plan) T5.5 — runInfraPlan now refuses to write a plan.json via -o when the plan is JIT-style (SchemaVersion = infraPlanSchemaVersionJIT). The exact operator-facing error string is contract-stable: error: plan -o requires JIT-free config; this plan references ${MODULE.field} which only resolves at apply time. Use 'wfctl infra apply' (without --plan) for JIT-aware applies. Stdout-only emission (no -o) of a JIT-style plan is permitted — it's a preview, not a contract. The guard fires AFTER plan computation so the operator sees the plan table on stdout before the rejection at the persistence step. 
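The HasModuleRefs detection described above (a recursive walk over map[string]any / []any / string, gated by a regex that needs non-empty segments on both sides of the dot) can be sketched as follows. This is an assumption-laden illustration: the exact pattern used by jitsubst is not shown in the commit message, so the regex and the lowercase `hasModuleRefs` name here are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// moduleRef is an assumed pattern: "${", a non-empty segment without dots or
// braces, a dot, a non-empty remainder without braces, then "}".
var moduleRef = regexp.MustCompile(`\$\{[^{}.]+\.[^{}]+\}`)

// hasModuleRefs (hypothetical) reports whether any string in the config tree
// contains a ${MODULE.field}-style reference. Plain ${VAR} does not match.
func hasModuleRefs(v any) bool {
	switch t := v.(type) {
	case string:
		return moduleRef.MatchString(t)
	case map[string]any:
		for _, child := range t {
			if hasModuleRefs(child) {
				return true
			}
		}
	case []any:
		for _, child := range t {
			if hasModuleRefs(child) {
				return true
			}
		}
	}
	return false // nil and all other types carry no references
}

func main() {
	cfg := map[string]any{
		"env_vars": map[string]any{
			"TOKEN":   "${API_TOKEN}",
			"DB_HOST": "host=${pg.private_ip}:5432",
		},
	}
	fmt.Println(hasModuleRefs(cfg))          // true
	fmt.Println(hasModuleRefs("${API_KEY}")) // false
	fmt.Println(hasModuleRefs("${.id}"))     // false
}
```

Under this sketch an env-var-only config stays at the V1 schema version, while any nested module reference triggers the bump.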
Tests in cmd/wfctl/infra_plan_jit_reject_test.go (4 cases): - exact-string match (the strict contract) - stdout-only JIT plan permitted (negative-control on the guard scope) - persisted non-JIT plan permitted (V1 happy path unchanged) - canonical-keyword substring match (operator-search-engine safety net) Removed T5.4's now-redundant TestInfraPlan_SchemaVersionV2_PersistedToFile- Matches — its happy path has been replaced by T5.5's strict rejection contract; SchemaVersion stamping correctness is still locked by the helper-direct tests in the same file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): T5.7 runtime-launch-validation — JIT subst + plan rejection W-5 Task T5.7: per the plan's 'Files: none' instruction, this is a documentation-only commit recording the runtime-launch-validation transcript against the built wfctl binary. # Step 1: Build $ GOWORK=off go build -o /tmp/wfctl-jit-validation ./cmd/wfctl (no output, exit 0) # Step 3: T5.5 persisted-JIT-plan rejection (build-binary verification) Fixture (infra.yaml): modules: - name: app type: infra.container_service config: env_vars: VPC_UUID: "${vpc.id}" DB_HOST: "${pg.private_ip}" $ wfctl infra plan -o /tmp/jit-validation/plan.json --config infra.yaml Infrastructure Plan — infra.yaml + create app (infra.container_service) Plan: 1 to create, 0 to update, 0 to destroy. error: error: plan -o requires JIT-free config; this plan references ${MODULE.field} which only resolves at apply time. Use 'wfctl infra apply' (without --plan) for JIT-aware applies. EXIT=1 The doubled 'error: error:' prefix is because cmd/wfctl/main.go's top-level error reporter prepends 'error: ' to every command failure (line 211: `fmt.Fprintf(os.Stderr, "error: %v\n", rootErr)`), AND the team-lead-specified literal also begins with 'error: '. Per implementer brief: 'Match exactly.' Flagging here for visibility — a follow-up could either drop the prefix from the literal or special-case main.go's wrapping. 
Not addressing in W-5. # T5.5 inverse: stdout-only JIT plan permitted (no rejection) $ wfctl infra plan --config infra.yaml Infrastructure Plan — infra.yaml + create app (infra.container_service) Plan: 1 to create, 0 to update, 0 to destroy. EXIT=0 # T5.4 V1 baseline: non-JIT config persisted to disk still works Fixture (infra-novars.yaml): modules: - name: app type: infra.container_service config: cidr: "10.0.0.0/16" $ wfctl infra plan -o plan-novars.json --config infra-novars.yaml Plan: 1 to create, 0 to update, 0 to destroy. Plan saved to /tmp/jit-validation/plan-novars.json EXIT=0 $ jq .schema_version plan-novars.json 1 ← V1 (T5.4 stamp logic working) # Step 2: apply with ${A.id} reference — covered by in-tree tests T5.7 plan §Step 2 specifies running 'apply against fixture with ${A.id} reference' against the built binary. wfctl infra apply requires a fully-configured iac.provider plugin (manifest, plugin.json, gRPC binary), so running this end-to-end against an ad-hoc fixture is non-trivial without W-7's conformance harness. The same code path is fully covered by: - iac/wfctlhelpers/apply_jit_test.go::TestApplyPlan_JIT_TwoCreate_BSpecResolvesAID (T5.2 — basic create+create cascade) - iac/wfctlhelpers/apply_replace_cascade_test.go::TestApplyPlan_ReplaceCascade_DependentCreateGetsNewParentID (T5.3 — replace+create cascade) - iac/wfctlhelpers/apply_replace_cascade_test.go::TestApplyPlan_ReplaceCascade_DependentReplaceGetsNewParentID (T5.3 — replace+replace cascade) - iac/wfctlhelpers/apply_jit_test.go::TestApplyPlan_JIT_UnresolvedRef_RecordsActionErrorAndSkipsDispatch (T5.2 — failure path) These exercise the SAME wfctlhelpers.ApplyPlan code path the binary invokes; the unit-test fake driver is functionally equivalent to a v2 plugin from ApplyPlan's perspective. A binary-level apply smoke test is deferred to W-7's conformance gate (which adds the DO smoke test against real-cloud fixtures).
# Verification Tests pass: GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... ./platform/... ./cmd/wfctl/... ./module/... → all packages OK. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(iac): T5.5 review — exact plan-literal error string Spec-reviewer caught that the shipped error string in cmd/wfctl/infra.go diverged from the plan literal at docs/plans/2026-05-03-iac-conformance- and-replace.md §T5.5 line 2104. The kickoff brief I worked from substituted a wordier alternate string; team-lead confirmed the plan literal is the correct contract. Three fixes: 1. cmd/wfctl/infra.go:297 — replace fmt.Errorf literal with errors.New(<plan literal>). No leading 'error:' prefix — that's prepended by cmd/wfctl/main.go's top-level error wrapper, so the doubled 'error: error:' artifact in T5.7's runtime transcript is resolved as a side benefit. Switched to errors.New per spec-reviewer suggestion: avoids govet's no-format-verbs noise on the no-substitution case and is the canonical Go pattern for fixed-string sentinels. 2. cmd/wfctl/infra_plan_jit_reject_test.go:16 — expectedJITRejectError constant updated to the plan literal. Comment block expanded to document the literal's source + the leading-error-prefix nuance for future readers. 3. cmd/wfctl/infra_plan_jit_reject_test.go:125 — substring keyword list in TestInfraPlan_RejectionErrorContainsCanonicalKeywords updated to keys actually present in the new literal: 'JIT resolution', 'persisted plan.json', 'wfctl infra apply', '-o/--plan'. The exact-match test above is the strict contract; this one stays as the operator-search-engine safety net. Verified end-to-end via rebuilt wfctl binary against the same fixture from T5.7's transcript: $ wfctl infra plan -o plan.json --config infra.yaml Infrastructure Plan — infra.yaml + create app (infra.container_service) Plan: 1 to create, 0 to update, 0 to destroy. error: this plan requires JIT resolution; persisted plan.json is not supported. 
Run 'wfctl infra apply' directly without -o/--plan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): ADR 008 — JIT substitution at dispatch loop, not per-helper Records the architectural choice resolved during T5.3: jitsubst.ResolveSpec runs once at the wfctlhelpers.ApplyPlan dispatch loop (immediately before each dispatchAction call), NOT inside per-action helpers. doReplace populates result.ReplaceIDMap; the next iteration's pre-dispatch ResolveSpec consumes it. This honors the Replace-cascade contract via loop-ordering invariant rather than via an explicit substitution call inside doReplace. Plan §T5.3 specified inner-resolve in doReplace; T5.2's loop-level call already covered the cascade case. Threading syncedOutputs through dispatchAction → doReplace would have made the helper boundary leaky for one call site. Option 1 (test-only T5.3 + this ADR) chosen by team-lead over option 2 (inner-resolve rework) on 2026-05-04 after spec-reviewer escalation. Cascade contract is locked by apply_replace_cascade_test.go's two scenarios; this ADR ensures future refactors that move substitution out of the loop OR delay ReplaceIDMap population see the trade-off rather than rediscovering it via git bla…
intel352
added a commit
that referenced
this pull request
May 4, 2026
…W-6 of 12) (#532) * feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type * feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel * feat(iac): wfctl infra plan writes InputSnapshot to plan.json * feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash * feat(iac): wfctl infra plan warns when plan.json not in .gitignore * feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring * feat(iac): add refreshoutputs.Refresh — read-only state output refresh T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add wfctl infra refresh-outputs subcommand T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed. 
When the resolved config has no usable iac.provider module for the requested env, emits the literal error refresh-outputs: provider not configured for env "<env>" verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): concurrency stress test for refreshoutputs.Refresh T2.5 — pure-package stress test in iac/refreshoutputs/. 
Drives Refresh with 100 fake resources at Concurrency=8 and asserts: 1. No deadlock (10s watchdog around the call). 2. Read called exactly once per ProviderID (atomic per-ID counter). 3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency. 4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit. The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document infra refresh-outputs subcommand T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md: - New row in the Command Tree mermaid graph. - New row in the infra Action table. - Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples. See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect. Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm): $ mdformat --check docs/WFCTL.md Error: File "docs/WFCTL.md" is not formatted. exit=1 This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). 
T2.6's own additions follow the existing in-file conventions exactly. $ markdown-link-check docs/WFCTL.md FILE: docs/WFCTL.md [✓] https://github.com/GoCodeAlone/workflow [✓] #build-ui [✓] mcp.md 3 links checked. exit=0 docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope. T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7): $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl exit=0 $ /tmp/wfctl infra refresh-outputs --help Usage of infra refresh-outputs: -c string Config file (short for --config) -concurrency int Maximum concurrent Read calls (default 8) -config string Config file -e string Environment name (short for --env) -env string Environment name (resolves per-module overrides) exit=0 $ cat /tmp/t27-fake.yaml modules: - name: state-store type: iac.state config: backend: filesystem directory: /tmp/t27-fake-state $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging error: refresh-outputs: provider not configured for env "staging" exit=1 No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49. PR W-2 mandate (plan line 1101): $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR. 
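The activation rules for WFCTL_REFRESH_OUTPUTS described in the T2.3 commit can be condensed into a few lines. The function name and signature below are hypothetical; only the use of `strconv.ParseBool` and the precedence of --skip-refresh come from the commit text. The sketch takes the environment value as a parameter so it can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled reports whether the pre-step should run.
// envValue is the raw value of WFCTL_REFRESH_OUTPUTS ("" when unset).
func refreshEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag wins regardless of the env var
	}
	on, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // unset, empty, or unrecognised: stay off
	}
	return on
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshEnabled(v, false))
	}
}
```

Compared with a presence-only check such as `os.Getenv(...) != ""`, this treats "0" and "false" as off, which is the foot-gun the code review called out.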
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3 ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review: - Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`. - Code-reviewer flagged this as a foot-gun (=0 mis-enables). - Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable. - Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions. - Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself. Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1) * fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema Addresses code-reviewer findings on commit 695a070: - Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward. - Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics. - Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design. 
New test covers both the typo-rejection and the root-permissivity contracts.
* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields * feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch) * fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract Addresses code-reviewer findings on commit 13a6fad: - Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers. - Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc. - Minor: added TestApplyResult_OmitEmptyContract — table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
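The omitempty contract that TestApplyResult_OmitEmptyContract locks in can be reproduced with a small standalone check. The struct below is a hypothetical reduction: the three JSON keys come from the commit message, but the Go field types are assumptions chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult mirrors only the three new fields; types are assumed.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and panics on the (unexpected) error path.
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	cases := []struct {
		name string
		in   applyResult
	}{
		{"nil fields", applyResult{}},
		{"empty but non-nil", applyResult{
			InitialInputSnapshot: map[string]string{},
			InputDriftReport:     []string{},
			ReplaceIDMap:         map[string]string{},
		}},
		{"populated map", applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}},
	}
	for _, c := range cases {
		fmt.Printf("%s: %s\n", c.name, encode(c.in))
	}
}
```

With encoding/json, omitempty drops both nil and zero-length maps and slices, so the first two cases encode to `{}` and only the populated map emits its key.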
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test Addresses code-reviewer findings on commit 8416498: - Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely. - Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success. - Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary. - Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled. - Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner. 
Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once. * fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change. * feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset) * feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery * feat(iac): doUpdate + doDelete actions * feat(iac): doReplace populates ApplyResult.ReplaceIDMap * feat(iac): add diff cache with LRU eviction + corruption recovery * fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy Three independent review-fix bundles: T3.1.5 (commit f5a7ce9 review — Minor 1): - apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically — keeps the cross-package fixture parity guaranteed. T3.2 (commit 0c30eec review — Minors 1 + 2): - apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion — distinct code path from the existing ok-but-SupportsUpsert==false test. 
Compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver. - apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. Also documents the single-pass recovery contract (recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop). T3.3 (commit a3fc98b review — Minors 1 + 2 + 4): - apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path — locks the resource-append contract so a regression that skipped the append on nil Current would fail loudly. - apply_update_delete_test.go gains parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: empty ProviderID flows to driver, no synthesized precondition error, deleteCount==1 (latent bug-fix from design — the v1 path silently skipped Delete; v2 must call it). - apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency." * fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create — surprising operators who expected fast interruption mid-Replace. 
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable. Failure shape: return fmt.Errorf("replace: canceled after delete: %w", err) Wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface. New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a captured context.CancelFunc as a side-effect, simulating exact post-Delete cancellation. Asserts Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, error has the canonical prefix. Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this commit since it's the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) intentionally skipped — all documentary or out-of-scope per reviewer guidance. * docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted: 1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files. 2. 
**WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`: - .github/workflows/ci.yml (test + lint jobs) - .github/workflows/benchmark.yml (performance benchmarks) - .github/workflows/pre-release.yml (pre-release tests) - .github/workflows/release.yml (release tests) - .github/workflows/dependency-update.yml (post-update test gate) Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference. Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows. * fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060: - Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents — never a partial write. The corruption-recovery path in Get is still the safety net for cross- filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering needed. On rename failure, best-effort cleanup of the temp file. - Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms. 
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining the tags are for log/transcript serialization, not cache keying — keyFingerprint uses NUL-separated string concat, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer. - Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless — no code path creates a file with `*` in its name. Likely leftover from earlier debugging. Removing it lets us drop the now-unused `os` import. - Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope. Skipped per reviewer guidance: - Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope." - Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation." * refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change) * feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error) W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider modules in the config, groups desired specs by `provider:` field, loads each via the same loader the apply path uses, and dispatches platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates against a real plugin process at plan time, not just at apply time. BREAKING: configs declaring at least one iac.provider module now require the plugin process to load successfully. Plugin-load failure exits non-zero with the literal error documented in the v0.21.0 CHANGELOG. 
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2); operators who need pure offline validation should use `wfctl validate`. Configs without any iac.provider module fall back to the legacy ConfigHash compare path so minimal/legacy fixtures and out-of-band scripts continue to work. cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the package compiles; T3.6c replaces nil with the live provider handle. * feat(iac): wfctl infra apply threads provider into ComputePlan * test(iac): update cross-package fakes for ComputePlan provider arg W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface. Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). Plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of deferring to function exit (multi-provider plan no longer holds N gRPC connections open at once). Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally-identical 14-method shells become one. Atomic counters carried forward where used. Doc updates: - godoc on computePlanForInfraSpecs corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matches actual code). - CHANGELOG entry calls out the empty-desired alignment with apply (loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown). 
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract (rev3 fix for the cycle-2 self-contradiction: test + impl ship in the same SHA, no t.Skip placeholder). ComputePlan now classifies each existing resource via p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the per-resource Diff calls in parallel under errgroup with a bounded worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override clamped 1..32). Action emission: - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew is true (the latter closes design issue C — pre-W-3b ForceNew was silently downgraded to update); - update, when DiffResult.NeedsUpdate is true and replace did not fire; - skip, when neither flag is set. Net-new resources still emit create without dispatching Diff; resources removed from desired still emit delete in reverse-dep order. Nil-tolerance contract preserved: if p is nil, or if p.ResourceDriver(typ) returns (nil, nil) for a resource type, ComputePlan falls back to the legacy ConfigHash compare for the affected resources. Replace cannot be expressed via the legacy path — callers needing Replace must supply a provider whose drivers implement Diff. Per-resource driver.Diff errors propagate via errgroup so operators see the underlying cause (rate limit, network, etc.). 
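The action-emission rules above can be read as a small decision function. The sketch below assumes simplified local types: the field names `NeedsReplace`, `NeedsUpdate`, and `ForceNew` come from the commit text, while `Changes`, `Path`, and the `classify` helper are illustrative assumptions rather than the real platform API.

```go
package main

import "fmt"

// FieldChange and DiffResult are simplified local stand-ins for the
// interfaces described in the commit; Changes and Path are assumed names.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange
}

// classify maps a driver's DiffResult to a plan action:
// replace wins over update, and a nil or empty result yields skip.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			// A single ForceNew field is enough to require replacement.
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	d := &DiffResult{
		NeedsUpdate: true,
		Changes:     []FieldChange{{Path: "region", ForceNew: true}},
	}
	fmt.Println(classify(d))
}
```

Checking `ForceNew` before `NeedsUpdate` is what prevents the pre-W-3b downgrade of a forced replacement to an update.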
Test surface (platform/differ_replace_test.go, NEW; ships in this commit per the rev3 atomicity rule): - TestComputePlan_NeedsReplaceEmitsReplaceAction - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate - TestComputePlan_DiffReturnsNoChanges_EmitsNothing - TestComputePlan_NilProvider_FallsBackToConfigHash - TestComputePlan_NilDriver_FallsBackToConfigHash - TestComputePlan_DriverDiffError_PropagatesAsError platform/fake_provider_test.go extended with newFakeProviderWithDiff helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse to iac/iactest until cache_test in T3.6f also depends on the helper — deferred to keep T3.6e's diff bounded). Carry-forward notes addressed: - T3.6a note 1: dropped unused *testing.T param from newFakeProvider(). - T3.6a note 2: added compile-time interface conformance asserts on fakeProvider and fakeDriver. - T3.6a note 3: nil-provider AND nil-driver guards baked in; covered by two explicit tests. - T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based phrasing. cmd/wfctl test fakes updated to match the new dispatch model: - readDriver.Diff now returns NeedsUpdate=true (the adoption tests rely on the post-adopt ComputePlan emitting update; pre-W-3b that was the ConfigHash compare's job). - refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of panicking — the refresh-outputs test fixture only exercises Read. * perf(iac): ComputePlan consults diffcache before invoking provider.Diff W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached DiffResult is used directly; on miss, the freshly-computed result is Put into the cache. 
Apply-time correctness does not depend on cache hits — fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` against the same checkout). Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazy-initialised on first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper. platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain so the platform test suite never reads/writes the developer's filesystem cache and so cache state cannot leak across tests with incidentally-aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests). Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/): - Note 1 (env-clamping testability): extract parseConcurrencyEnv as a pure function; new TestParseConcurrencyEnv table-driven test covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5". - Note 2 (parallel-dispatch correctness): new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates, asserts driver.diffCount.Load() == 5 and the resulting plan has 5 actions. - Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing. And T3.6e adversarial-review minor cleanups: - Note 4 (i := i shadowing redundant in Go 1.22+): dropped. - Note 5 (errSentinel uses custom errFromTest): replaced with errors.New. - Note 7 (concurrency contract on ComputePlan godoc): added — p and the ResourceDriver instances it returns MUST be safe for concurrent use. 
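For Note 1, a pure env-clamping function might look like the sketch below. The default of 8 and the 1..32 bounds are from the commit text; how empty, non-numeric, and out-of-range values are mapped (default vs. clamp) is an assumption here, since the commit only lists the inputs the table-driven test covers.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultDiffConcurrency = 8  // default worker-pool size per the commit text
	minDiffConcurrency     = 1  // lower clamp bound
	maxDiffConcurrency     = 32 // upper clamp bound
)

// parseConcurrencyEnv converts the raw WFCTL_PLAN_DIFF_CONCURRENCY value to a
// worker count. Assumed semantics: empty or non-numeric input falls back to
// the default; numeric input is clamped into [1, 32].
func parseConcurrencyEnv(raw string) int {
	if raw == "" {
		return defaultDiffConcurrency
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultDiffConcurrency
	}
	if n < minDiffConcurrency {
		return minDiffConcurrency
	}
	if n > maxDiffConcurrency {
		return maxDiffConcurrency
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrencyEnv(raw))
	}
}
```

Keeping the function free of `os.Getenv` is what makes it testable with a plain table of strings; the caller reads the environment once and passes the raw value in.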
New tests (3 cache-behaviour scenarios in differ_cache_test.go): - TestComputePlan_CacheHitSkipsDiff (second call against unchanged inputs hits cache; diffCount stays at 1) - TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch) - TestComputePlan_NoopCacheNeverHits (disabled backend always re-dispatches) * test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review) Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched. The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove ≥2 Diff goroutines are simultaneously in-flight before any returns: regression to serial dispatch would hang on the second `<-entered` and time out at 5s. Pure addition (no production-code change). cacheTestProvider.driver loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell. * fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review) Code-reviewer flagged the T3.6f cache PluginVersion key as fragile: composing via `p.Name() + "@" + p.Version()` would let two genuinely-different providers — `("foo", "bar@1.0")` vs `("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"` and serve each other's cached DiffResults. Today's registered providers (digitalocean, dockercompose, mock) don't carry `@` in either field so no observed bug, but there's no compile-time guard against a future provider declaring `do@enterprise` or similar. Replace with sha256(name + "\x00" + version) — fixed-length, NUL is invalid in both fields by Unicode convention, ambiguity-free. Matches how configHash already keys per-config inputs. 
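The collision described above is easy to reproduce in isolation. The sketch below takes the name and version as plain strings (the real helper works on a provider value and returns an empty key for nil), and hex encoding of the digest is an assumption made for readability.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey reproduces the fragile "@"-joined key from the original T3.6f code.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// pluginVersionKey hashes name and version joined by a NUL byte, so the
// boundary between the two fields is unambiguous. Hex output is an assumption.
func pluginVersionKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Both pairs collapse to the same "foo@bar@1.0" string under naiveKey.
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))
	// The NUL-separated digest keeps them distinct.
	fmt.Println(pluginVersionKey("foo", "bar@1.0") == pluginVersionKey("foo@bar", "1.0"))
}
```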
Three regression tests pin the fix: - TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug) - TestPluginVersionKey_NilProvider (defensive — empty key, no panic) - TestPluginVersionKey_Stable (deterministic across calls) Pure additive — no change to any existing test outcome. The cache re-keys against the new digest, which means any DiffResults persisted under the old `name@version` keys will miss on the next plan and re-Diff naturally (cache misses are correct by design). * feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path. rev2/rev3-locked: NO env-var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1 — there is no transitional WFCTL_USE_V2_APPLY flag to misuse. Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant. New plumbing: - iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision. - iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and ComputePlanVersion() method so tests drive both v1 (default empty) and v2 paths through the shared fake. 
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level iacProvider.computePlanVersion alongside existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (Re-reads plugin.json once per provider load rather than threading through loadIaCPlugin's 4-tuple var-seam; this keeps the seam signature stable for the existing test override, at the cost of one tiny os.ReadFile vs the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. Drift report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests: TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/un-declared routes legacy), TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via writer-buffer substring). v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer to prove the dispatcher's "default to v1 when un-declared" branch.

* fix(iac): T3.7 review - drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised "Run on success or partial failure" but the code gated on `err == nil` (success only). The contract the comment described is the more useful behavior: operators most need the stale-input diagnostic when an apply fails ("which input went stale during the failed apply?"). Without it, the failure error and the "what changed" context are disconnected. Fix: gate on `result != nil` instead of `err == nil`. printDriftReportIfAny already no-ops on empty/nil reports, so printing whenever the result is non-nil is safe.
2. No test for the drift-on-partial-failure path. Added TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure, which has applyV2ApplyPlanFn return (resultWithDrift, applyErr) and asserts both: (a) the err propagates, AND (b) the drift report still reaches the writer.
3. Optional-interface coverage gap. Two semantically-different "v1" paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer at all → type-assert fails → legacy. Covered by v1RecordingProvider.
   - Path B: provider implements the interface but ComputePlanVersion() returns "" (the realistic mid-transition state for v1 plugins after the SDK update lands but before they migrate) → type-assert succeeds, DispatchVersionFor returns "v1" → legacy. Was untested.
   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty using iactest.NoopProvider{DispatchVersion: ""}, which always implements the interface (the method exists on the type). Pins Path B specifically.

Pure correctness fixes: no signature change, no behavior change for the success-only or v1RecordingProvider paths.

* fix(iac): map[string]bool drops gRPC args silently - sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing current.Sensitive (map[string]bool) directly into the args map. structpb.NewStruct rejects map[string]bool (it accepts map[string]any only), and the upstream plugin/external/convert.go::mapToStruct returns &structpb.Struct{} on err rather than surfacing the typing failure. Result: every Diff dispatch over gRPC for any provider whose ResourceOutput.Sensitive map was non-nil (or even an empty map[string]bool{}) silently observed args=map[] on the plugin side. v1 plugins never tripped this because v1 dispatches IaCProvider.Plan server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's manifest-driven dispatch) surfaces it immediately on the first existing-resource Diff call. Fix: convert via sensitiveToAny() to the map[string]any shape NewStruct accepts.
Returns nil for empty/nil input so the wire stays trim-friendly. Bug discovered during W-3b T3.9 runtime-launch validation against an out-of-band gRPC stub plugin; the canonical T3.9 in-tree test ships separately as a loader-seam Go integration test (per team-lead direction + plan precedent at plugin/sdk/iaclint/). Will surface in T3.10's PR description as a third incidentally-fixed-by-W-3b bug. * test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007) W-3b T3.9. Exercises the full v2 dispatch chain — config parse → state load → provider load (via the resolveIaCProvider seam from T3.6c) → ComputePlan Diff dispatch (T3.6e/f) → wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace decomposition into Delete + Create → printDriftReportIfAny — by injecting a Go in-process v2-declaring provider through the package- level seam. No out-of-process gRPC binary or plugin.json under internal/testdata/. # ADR 007 — non-trivial deviation from plan-literal Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin in internal/testdata/stub-provider/." Team-lead authorized switching to in-tree loader-seam validation per: 1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go test-helper package, not a runnable binary. 2. Real-gRPC runtime validation lands in P-DO when DO sets computePlanVersion: v2 in its plugin.json. 3. Hours-of-stub-plumbing cost doesn't earn proportional coverage vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end. 4. W-7 conformance suite is the recurring cross-PR gRPC harness. Full reasoning + considered alternatives in docs/adr/007-t3-9-runtime-validation-via-loader-seam.md. # Tests - TestApply_V2_LoaderSeamDispatch_EndToEnd: - Writes a real config + filesystem state seeded with vpc region=nyc3 (under iacStateRecord shape). - Sets desired region=nyc1. - Substitutes the resolveIaCProvider seam to return a Go provider that declares v2 + has a driver returning NeedsReplace=true. 
- Calls applyInfraModules (the production runInfraApply entrypoint) and asserts driver.diffCount == 1, deleteCount == 1, createCount == 1, plus exact identity of the deleted ProviderID and the created Config["region"]. - TestApply_V2_LoaderSeam_DriftReportPrinted: - Same loader-seam setup + applyV2ApplyPlanFn substitution returning InputDriftReport with one entry. - Captures os.Stderr and asserts the FormatStaleError block reaches the operator (drift-report wiring T3.7 added is end-to-end alive in the v2 loader path). # Test infrastructure - cmd/wfctl/main_test.go: NEW TestMain forces WFCTL_DIFFCACHE=disabled so the platform diffcache (process- scoped via getDiffCache lazy init) doesn't observe stale entries from a developer's local ~/.cache/wfctl/diff/ as false-positive cache hits skipping driver Diff dispatch. Same pattern as platform/main_test.go from T3.6f. Caught during dev when the end-to-end test failed in the full cmd/wfctl test run but passed in isolation. # Bug-class context The Option-A draft (real gRPC binary; not retained on this branch per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1 (remoteResourceDriver.Diff sensitiveToAny conversion). The bug exists independent of which T3.9 option ships; the fix is in tree and surfaces in T3.10's PR description as the third W-3b incidentally-fixed bug. * docs(pr): note bugs incidentally fixed by W-3b W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Pure-additive doc; no code changes. Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring: 1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch) 2. ForceNew silently downgraded to Update (T3.6e replace emission) 3. 
map[string]bool drops gRPC args silently — sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it) Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan. * docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions) Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution. This commit: - Status header updated to "Accepted (with extensive deliberation history — see Decision history section)". - Context section adjusted to preface the deliberation history rather than imply a single-direction trajectory. - New Decision history section lists all 5 transitions with verbatim team-lead quotes + per-transition implementer action. - Final paragraph captures the meta-lesson: when team-lead path- flips mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule. Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) unaffected. Closes spec-reviewer's Issue 1 from c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected." 
* feat(iac): --allow-replace flag for per-resource protected-replace opt-in

W-6/T6.1: gate replace and delete actions targeting `protected: true` resources behind a per-resource opt-in flag at apply time. Without --allow-replace=<csv>, the apply errors before any provider Apply or wfctlhelpers.ApplyPlan dispatch with the design-spec literal ("resource %q is protected: true and would be %sd; pass --allow-replace=%s to override"). With the resource name listed in --allow-replace, the protection is bypassed for that resource only.

Gate fires on both dispatch paths, live-diff (applyWithProviderAndStore) and --plan (applyPrecomputedPlanWithStore), so the safety guarantee holds regardless of plan provenance. The protected flag is sourced from Resource.Config for replace actions and Current.AppliedConfig for delete actions (where platform.differ leaves Resource.Config empty).

The allow-set is published via package-level applyAllowReplaceSet (matching the computeInfraPlan / applyV2ApplyPlanFn seam pattern) and reset to nil at the top of every runInfraApply via deferred cleanup, because override authorization must not leak across runs.

T6.2 will swap this fail-fast for an aggregated multi-blocker report with a copy-paste --allow-replace=name1,name2,... value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply batch-reports protected-replace blockers with copy-paste flag

W-6/T6.2: validateAllowReplaceProtected now walks the entire plan and aggregates ALL replace/delete blockers (resources annotated `protected: true` and not in --allow-replace) into a single error, instead of failing fast on the first one. The operator sees the complete blocker set in one apply attempt and gets a pre-formatted copy-paste flag value to authorize them all at once:

    plan would require destructive action on N protected resource(s):
      <name1> (replace)
      <name2> (delete)
      ...
    to authorize, re-run with:
      --allow-replace=<name1>,<name2>,...
Names and the csv preserve plan-action declaration order so output is deterministic. The single-blocker case still emits the batch format — operator-facing UX is consistent regardless of blocker count, which matters for automation pinning the copy-paste flag pattern. Per plan T6.2 "(or apply-time check; pick one — apply is cleaner since plan output already shows all actions)" — the gate stays in cmd/wfctl/infra_apply.go rather than platform/differ.go::ComputePlan. ComputePlan remains plugin-agnostic; the protected-resource policy is a wfctl-side operator-experience concern. T6.1's single-line error literal is superseded; T6.1 tests are updated to assert on the operator-facing essentials (resource name + copy-paste flag value) rather than the legacy literal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document --allow-replace flag W-6/T6.4: add a dedicated `infra apply` subsection to docs/WFCTL.md covering the protected-resource gate, the new --allow-replace=<csv> override, and its relation to the older --allow-protected-prune flag. Includes the canonical aggregated-blocker error format from T6.2 so operators know what to expect (and what to copy-paste) when the gate fires, plus three runnable examples (standard apply, --plan apply, authorized Replace cascade). Per W-4 team-lead Option-3, mdformat is waived; markdown-link-check is the meaningful baseline. WFCTL.md links all resolve clean against the local repo (3 internal/external refs). Pre-existing dead links elsewhere in docs/ are unchanged by this commit and out of W-6 scope. Verification: markdown-link-check docs/WFCTL.md → 0 errors GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \ ./platform/... ./cmd/wfctl/... ./module/... 
→ all pass Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(merge): restore T6.1 + T6.2 helpers lost during cascade-merge with -X theirs * fix(iac): R1 review — drop redundant ComputePlanVersionDeclarer assertion at apply call site (Copilot review) DispatchVersionFor is documented to centralise the type-assertion plus the default-to-v1 fallback so call sites pass the raw provider value rather than re-asserting the optional interface. The v2 dispatch condition reverts to the canonical form: if wfctlhelpers.DispatchVersionFor(provider) == wfctlhelpers.DispatchVersionV2 { ... } No behavior change: a provider that doesn't implement the interface, or returns anything other than "v2", still routes to the legacy v1 provider.Apply path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
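The optional-interface dispatch that DispatchVersionFor centralises can be sketched in a few lines. The names `ComputePlanVersionDeclarer`, `DispatchVersionV2`, and `DispatchVersionFor` come from the commit messages; the `any` parameter type, the `DispatchVersionV1` constant, and the two fake providers are assumptions made to keep the example self-contained.

```go
package main

import "fmt"

const (
	DispatchVersionV1 = "v1" // assumed name for the legacy default
	DispatchVersionV2 = "v2"
)

// ComputePlanVersionDeclarer is the optional interface a provider may implement.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

// DispatchVersionFor performs the type assertion and the default-to-v1
// fallback in one place, so call sites pass the raw provider value.
func DispatchVersionFor(provider any) string {
	if d, ok := provider.(ComputePlanVersionDeclarer); ok && d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return DispatchVersionV1
}

// legacyProvider does not implement the optional interface (Path A).
type legacyProvider struct{}

// declaringProvider implements it and returns whatever it was given
// (an empty string models Path B).
type declaringProvider struct{ version string }

func (p declaringProvider) ComputePlanVersion() string { return p.version }

func main() {
	fmt.Println(DispatchVersionFor(legacyProvider{}))
	fmt.Println(DispatchVersionFor(declaringProvider{version: ""}))
	fmt.Println(DispatchVersionFor(declaringProvider{version: "v2"}))
}
```

Because the helper already handles both "interface missing" and "interface present but not v2", a second assertion at the call site adds nothing, which is the redundancy the R1 review removed.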
intel352
added a commit
that referenced
this pull request
May 4, 2026
…9) (#534) * feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type * feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel * feat(iac): wfctl infra plan writes InputSnapshot to plan.json * feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash * feat(iac): wfctl infra plan warns when plan.json not in .gitignore * feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring * feat(iac): add refreshoutputs.Refresh — read-only state output refresh T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add wfctl infra refresh-outputs subcommand T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed. 
When the resolved config has no usable iac.provider module for the requested env, emits the literal error refresh-outputs: provider not configured for env "<env>" verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): concurrency stress test for refreshoutputs.Refresh T2.5 — pure-package stress test in iac/refreshoutputs/. 
Drives Refresh with 100 fake resources at Concurrency=8 and asserts: 1. No deadlock (10s watchdog around the call). 2. Read called exactly once per ProviderID (atomic per-ID counter). 3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency. 4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit. The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document infra refresh-outputs subcommand T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md: - New row in the Command Tree mermaid graph. - New row in the infra Action table. - Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples. See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect. Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm): $ mdformat --check docs/WFCTL.md Error: File "docs/WFCTL.md" is not formatted. exit=1 This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). 
T2.6's own additions follow the existing in-file conventions exactly. $ markdown-link-check docs/WFCTL.md FILE: docs/WFCTL.md [✓] https://github.com/GoCodeAlone/workflow [✓] #build-ui [✓] mcp.md 3 links checked. exit=0 docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope. T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7): $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl exit=0 $ /tmp/wfctl infra refresh-outputs --help Usage of infra refresh-outputs: -c string Config file (short for --config) -concurrency int Maximum concurrent Read calls (default 8) -config string Config file -e string Environment name (short for --env) -env string Environment name (resolves per-module overrides) exit=0 $ cat /tmp/t27-fake.yaml modules: - name: state-store type: iac.state config: backend: filesystem directory: /tmp/t27-fake-state $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging error: refresh-outputs: provider not configured for env "staging" exit=1 No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49. PR W-2 mandate (plan line 1101): $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3 ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review: - Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`. - Code-reviewer flagged this as a foot-gun (=0 mis-enables). - Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable. - Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions. - Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself. Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1) * fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema Addresses code-reviewer findings on commit 695a070: - Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward. - Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics. - Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design. 
New test covers both the typo-rejection and the root-permissivity contracts. * feat(iac): add refreshoutputs.Refresh — read-only state output refresh T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add wfctl infra refresh-outputs subcommand T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed. When the resolved config has no usable iac.provider module for the requested env, emits the literal error refresh-outputs: provider not configured for env "<env>" verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. 
Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): concurrency stress test for refreshoutputs.Refresh T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh with 100 fake resources at Concurrency=8 and asserts: 1. No deadlock (10s watchdog around the call). 2. Read called exactly once per ProviderID (atomic per-ID counter). 3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency. 4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit. The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall. 
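The WFCTL_REFRESH_OUTPUTS activation rules listed for T2.3 can be sketched as below. This is a minimal illustration, not the code in cmd/wfctl/infra_apply_refresh_pre.go: the helper name `refreshOutputsEnabled` and its two parameters are hypothetical, and only `strconv.ParseBool` is taken from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshOutputsEnabled is a hypothetical helper. envValue stands for the raw
// WFCTL_REFRESH_OUTPUTS value and skipRefresh for the --skip-refresh flag.
func refreshOutputsEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		// The flag suppresses the pre-step regardless of the env var.
		return false
	}
	enabled, err := strconv.ParseBool(envValue)
	if err != nil {
		// Unset, empty, or unrecognised values leave the pre-step off.
		return false
	}
	return enabled
}

func main() {
	for _, v := range []string{"", "1", "true", "t", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshOutputsEnabled(v, false))
	}
}
```

A presence-only check (`envValue != ""`) would return true for "0", which is the foot-gun the ParseBool approach avoids.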
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document infra refresh-outputs subcommand T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md: - New row in the Command Tree mermaid graph. - New row in the infra Action table. - Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples. See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect. Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm): $ mdformat --check docs/WFCTL.md Error: File "docs/WFCTL.md" is not formatted. exit=1 This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). T2.6's own additions follow the existing in-file conventions exactly. $ markdown-link-check docs/WFCTL.md FILE: docs/WFCTL.md [✓] https://github.com/GoCodeAlone/workflow [✓] #build-ui [✓] mcp.md 3 links checked. exit=0 docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope. 
T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7): $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl exit=0 $ /tmp/wfctl infra refresh-outputs --help Usage of infra refresh-outputs: -c string Config file (short for --config) -concurrency int Maximum concurrent Read calls (default 8) -config string Config file -e string Environment name (short for --env) -env string Environment name (resolves per-module overrides) exit=0 $ cat /tmp/t27-fake.yaml modules: - name: state-store type: iac.state config: backend: filesystem directory: /tmp/t27-fake-state $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging error: refresh-outputs: provider not configured for env "staging" exit=1 No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49. PR W-2 mandate (plan line 1101): $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3 ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review: - Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`. - Code-reviewer flagged this as a foot-gun (=0 mis-enables). - Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable. - Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions. 
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself. Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields * feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch) * fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract Addresses code-reviewer findings on commit 13a6fad: - Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers. - Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc. - Minor: added TestApplyResult_OmitEmptyContract — table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}. 
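The omitempty contract described for TestApplyResult_OmitEmptyContract can be illustrated with a small sketch. The struct below is a hypothetical stand-in, not interfaces.ApplyResult: the JSON keys come from the commit message, while the Go field types are assumptions chosen for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResultSketch is a hypothetical stand-in for the three new fields.
// Field types are assumed; only the JSON keys come from the commit message.
type applyResultSketch struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals the value and returns the JSON text.
func encode(r applyResultSketch) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// nil fields: every key is omitted.
	fmt.Println(encode(applyResultSketch{}))
	// Empty but non-nil fields: omitempty still drops the keys.
	fmt.Println(encode(applyResultSketch{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}))
	// A populated map keyed by the replaced resource's name is emitted.
	fmt.Println(encode(applyResultSketch{
		ReplaceIDMap: map[string]string{"vpc": "new-uuid"},
	}))
}
```

With encoding/json, omitempty treats nil and zero-length maps and slices alike, which is why the test can cover both cases with the same assertion.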
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test Addresses code-reviewer findings on commit 8416498: - Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely. - Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success. - Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary. - Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled. - Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner. 
Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once. * fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change. * feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset) * feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery * feat(iac): doUpdate + doDelete actions * feat(iac): doReplace populates ApplyResult.ReplaceIDMap * feat(iac): add diff cache with LRU eviction + corruption recovery * fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy Three independent review-fix bundles: T3.1.5 (commit f5a7ce9 review — Minor 1): - apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically — keeps the cross-package fixture parity guaranteed. T3.2 (commit 0c30eec review — Minors 1 + 2): - apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion — distinct code path from the existing ok-but-SupportsUpsert==false test. 
Compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver. - apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. Also documents the single-pass recovery contract (recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop). T3.3 (commit a3fc98b review — Minors 1 + 2 + 4): - apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path — locks the resource-append contract so a regression that skipped the append on nil Current would fail loudly. - apply_update_delete_test.go gains parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: empty ProviderID flows to driver, no synthesized precondition error, deleteCount==1 (latent bug-fix from design — the v1 path silently skipped Delete; v2 must call it). - apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency." * fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create — surprising operators who expected fast interruption mid-Replace. 
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable. Failure shape: return fmt.Errorf("replace: canceled after delete: %w", err) Wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface. New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a captured context.CancelFunc as a side-effect, simulating exact post-Delete cancellation. Asserts Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, error has the canonical prefix. Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this commit since it's the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) intentionally skipped — all documentary or out-of-scope per reviewer guidance. * docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted: 1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files. 2. 
**WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`: - .github/workflows/ci.yml (test + lint jobs) - .github/workflows/benchmark.yml (performance benchmarks) - .github/workflows/pre-release.yml (pre-release tests) - .github/workflows/release.yml (release tests) - .github/workflows/dependency-update.yml (post-update test gate) Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference. Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows. * fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060: - Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents — never a partial write. The corruption-recovery path in Get is still the safety net for cross-filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering needed. On rename failure, best-effort cleanup of the temp file. - Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining the tags are for log/transcript serialization, not cache keying — keyFingerprint uses NUL-separated string concat, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer. - Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless — no code path creates a file with `*` in its name. Likely leftover from earlier debugging. Removing it lets us drop the now-unused `os` import. - Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope. Skipped per reviewer guidance: - Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope." - Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation." * refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change) * feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error) W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider modules in the config, groups desired specs by `provider:` field, loads each via the same loader the apply path uses, and dispatches platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates against a real plugin process at plan time, not just at apply time. BREAKING: configs declaring at least one iac.provider module now require the plugin process to load successfully. Plugin-load failure exits non-zero with the literal error documented in the v0.21.0 CHANGELOG. 
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2); operators who need pure offline validation should use `wfctl validate`. Configs without any iac.provider module fall back to the legacy ConfigHash compare path so minimal/legacy fixtures and out-of-band scripts continue to work. cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the package compiles; T3.6c replaces nil with the live provider handle. * feat(iac): wfctl infra apply threads provider into ComputePlan * test(iac): update cross-package fakes for ComputePlan provider arg W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface. Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). Plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of deferring to function exit (multi-provider plan no longer holds N gRPC connections open at once). Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally-identical 14-method shells become one. Atomic counters carried forward where used. Doc updates: - godoc on computePlanForInfraSpecs corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matches actual code). - CHANGELOG entry calls out the empty-desired alignment with apply (loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown). 
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract (rev3 fix for the cycle-2 self-contradiction: test + impl ship in the same SHA, no t.Skip placeholder). ComputePlan now classifies each existing resource via p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the per-resource Diff calls in parallel under errgroup with a bounded worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override clamped 1..32). Action emission: - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew is true (the latter closes design issue C — pre-W-3b ForceNew was silently downgraded to update); - update, when DiffResult.NeedsUpdate is true and replace did not fire; - skip, when neither flag is set. Net-new resources still emit create without dispatching Diff; resources removed from desired still emit delete in reverse-dep order. Nil-tolerance contract preserved: if p is nil, or if p.ResourceDriver(typ) returns (nil, nil) for a resource type, ComputePlan falls back to the legacy ConfigHash compare for the affected resources. Replace cannot be expressed via the legacy path — callers needing Replace must supply a provider whose drivers implement Diff. Per-resource driver.Diff errors propagate via errgroup so operators see the underlying cause (rate limit, network, etc.). 
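The action-emission rules above can be sketched as a small classifier. The types here are simplified stand-ins, not the real interfaces package: NeedsReplace, NeedsUpdate and ForceNew come from the commit message, while the `Changes` field name, the `Path` field and the `classify` function are assumptions for illustration.

```go
package main

import "fmt"

// fieldChange and diffResult are hypothetical, simplified shapes.
type fieldChange struct {
	Path     string
	ForceNew bool
}

type diffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []fieldChange // field name is an assumption
}

// classify returns the action an existing resource would get under the
// rules described above: replace wins over update, otherwise skip.
func classify(d *diffResult) string {
	if d == nil {
		return "skip" // a nil diff is treated as no change in this sketch
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			// ForceNew alone is enough to force a replace.
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	fmt.Println(classify(&diffResult{NeedsReplace: true}))
	fmt.Println(classify(&diffResult{NeedsUpdate: true, Changes: []fieldChange{{Path: "region", ForceNew: true}}}))
	fmt.Println(classify(&diffResult{NeedsUpdate: true}))
	fmt.Println(classify(&diffResult{}))
}
```

The second case is the one the commit calls out: a diff that only sets NeedsUpdate but carries a ForceNew field change is classified as replace rather than update.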
Test surface (platform/differ_replace_test.go, NEW; ships in this commit per the rev3 atomicity rule): - TestComputePlan_NeedsReplaceEmitsReplaceAction - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate - TestComputePlan_DiffReturnsNoChanges_EmitsNothing - TestComputePlan_NilProvider_FallsBackToConfigHash - TestComputePlan_NilDriver_FallsBackToConfigHash - TestComputePlan_DriverDiffError_PropagatesAsError platform/fake_provider_test.go extended with newFakeProviderWithDiff helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse to iac/iactest until cache_test in T3.6f also depends on the helper — deferred to keep T3.6e's diff bounded). Carry-forward notes addressed: - T3.6a note 1: dropped unused *testing.T param from newFakeProvider(). - T3.6a note 2: added compile-time interface conformance asserts on fakeProvider and fakeDriver. - T3.6a note 3: nil-provider AND nil-driver guards baked in; covered by two explicit tests. - T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based phrasing. cmd/wfctl test fakes updated to match the new dispatch model: - readDriver.Diff now returns NeedsUpdate=true (the adoption tests rely on the post-adopt ComputePlan emitting update; pre-W-3b that was the ConfigHash compare's job). - refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of panicking — the refresh-outputs test fixture only exercises Read. * perf(iac): ComputePlan consults diffcache before invoking provider.Diff W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached DiffResult is used directly; on miss, the freshly-computed result is Put into the cache. 
Apply-time correctness does not depend on cache hits — fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` against the same checkout). Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazy-initialised on first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper. platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain so the platform test suite never reads/writes the developer's filesystem cache and so cache state cannot leak across tests with incidentally-aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests). Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/): - Note 1 (env-clamping testability): extract parseConcurrencyEnv as a pure function; new TestParseConcurrencyEnv table-driven test covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5". - Note 2 (parallel-dispatch correctness): new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates, asserts driver.diffCount.Load() == 5 and the resulting plan has 5 actions. - Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing. And T3.6e adversarial-review minor cleanups: - Note 4 (i := i shadowing redundant in Go 1.22+): dropped. - Note 5 (errSentinel uses custom errFromTest): replaced with errors.New. - Note 7 (concurrency contract on ComputePlan godoc): added — p and the ResourceDriver instances it returns MUST be safe for concurrent use. 
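The env-clamping behaviour covered by TestParseConcurrencyEnv can be sketched as a pure function. The commit message gives the default (8) and the clamp range (1..32) but not the handling of each individual input, so the choices below are assumptions: unparseable input falls back to the default, and out-of-range numbers clamp to the nearest bound. The name `parseConcurrencySketch` is hypothetical and is not the real parseConcurrencyEnv signature.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseConcurrencySketch maps a raw env-var value to a worker count.
// Assumed policy: unparseable -> default 8; otherwise clamp into [1, 32].
func parseConcurrencySketch(raw string) int {
	const (
		defaultWorkers = 8
		minWorkers     = 1
		maxWorkers     = 32
	)
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultWorkers
	}
	if n < minWorkers {
		return minWorkers
	}
	if n > maxWorkers {
		return maxWorkers
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrencySketch(raw))
	}
}
```

Keeping the parsing separate from the os.Getenv call is what makes a table-driven test over these inputs straightforward.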
New tests (3 cache-behaviour scenarios in differ_cache_test.go): - TestComputePlan_CacheHitSkipsDiff (second call against unchanged inputs hits cache; diffCount stays at 1) - TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch) - TestComputePlan_NoopCacheNeverHits (disabled backend always re-dispatches) * test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review) Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched. The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove ≥2 Diff goroutines are simultaneously in-flight before any returns: regression to serial dispatch would hang on the second `<-entered` and time out at 5s. Pure addition (no production-code change). cacheTestProvider.driver loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell. * fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review) Code-reviewer flagged the T3.6f cache PluginVersion key as fragile: composing via `p.Name() + "@" + p.Version()` would let two genuinely-different providers — `("foo", "bar@1.0")` vs `("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"` and serve each other's cached DiffResults. Today's registered providers (digitalocean, dockercompose, mock) don't carry `@` in either field so no observed bug, but there's no compile-time guard against a future provider declaring `do@enterprise` or similar. Replace with sha256(name + "\x00" + version) — fixed-length, NUL is invalid in both fields by Unicode convention, ambiguity-free. Matches how configHash already keys per-config inputs. 
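The collision described above, and the sha256-based replacement, can be reproduced in a few lines. This is a sketch under stated assumptions: `naiveKey` and `versionKeySketch` are hypothetical names, and hex encoding of the digest is an assumption, since the commit message only specifies sha256(name + "\x00" + version).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey mirrors the fragile name + "@" + version composition.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// versionKeySketch hashes name and version with a NUL separator, so the
// boundary between the two fields cannot be shifted by an "@" in either one.
func versionKeySketch(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Both pairs collapse to "foo@bar@1.0" under the naive scheme.
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))
	// The hashed keys differ because the NUL lands in a different position.
	fmt.Println(versionKeySketch("foo", "bar@1.0") == versionKeySketch("foo@bar", "1.0"))
}
```

The digest is fixed-length, so cache file names or map keys derived from it do not grow with the provider name or version string.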
Three regression tests pin the fix: - TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug) - TestPluginVersionKey_NilProvider (defensive — empty key, no panic) - TestPluginVersionKey_Stable (deterministic across calls) Pure additive — no change to any existing test outcome. The cache re-keys against the new digest, which means any DiffResults persisted under the old `name@version` keys will miss on the next plan and re-Diff naturally (cache misses are correct by design). * feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path. rev2/rev3-locked: NO env-var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1 — there is no transitional WFCTL_USE_V2_APPLY flag to misuse. Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant. New plumbing: - iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision. - iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and ComputePlanVersion() method so tests drive both v1 (default empty) and v2 paths through the shared fake. 
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level iacProvider.computePlanVersion alongside existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (Re-reads plugin.json once per provider load rather than threading through loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable for the existing test override; cost is one tiny os.ReadFile vs the gRPC start.) - cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. Drift report printed to writer on success (no-op when empty). - cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/un-declared routes legacy), TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via writer-buffer substring). v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer to prove the dispatcher's "default to v1 when un-declared" branch. * fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review) Code-reviewer flagged 3 IMPORTANT items in T3.7: 1. Comment/code mismatch on drift-report timing. The comment promised "Run on success or partial failure" but the code gated on `err == nil` (success only). The contract the comment described is the more useful behavior — operators most need the stale-input diagnostic when an apply fails ("which input went stale during the failed apply?"). Without it, the failure error and the "what changed" context are disconnected. Fix: gate on `result != nil` instead of `err == nil`. printDriftReportIfAny already no-ops on empty/nil reports so unconditional-on-result-non-nil is safe. 2.
No test for the drift-on-partial-failure path. Added TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure which has applyV2ApplyPlanFn return (resultWithDrift, applyErr) and asserts both: (a) the err propagates, AND (b) the drift report still reaches the writer. 3. Optional-interface coverage gap. Two semantically-different "v1" paths exist: - Path A: provider doesn't implement ComputePlanVersionDeclarer at all → type-assert fails → legacy. Covered by v1RecordingProvider. - Path B: provider implements interface but ComputePlanVersion() returns "" (the realistic mid-transition state for v1 plugins after the SDK update lands but before they migrate) → type- assert succeeds, DispatchVersionFor returns "v1" → legacy. Was untested. Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty using iactest.NoopProvider{DispatchVersion: ""}, which always implements the interface (the method exists on the type). Pins Path B specifically. Pure correctness fixes — no signature change, no behavior change for the success-only or v1-RecordingProvider paths. * fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing current.Sensitive (map[string]bool) directly into the args map. structpb.NewStruct rejects map[string]bool — it accepts map[string]any only — and the upstream plugin/external/convert.go::mapToStruct returns &structpb.Struct{} on err rather than surfacing the typing failure. Result: every Diff dispatch over gRPC for any provider whose ResourceOutput.Sensitive map was non-nil (or even an empty map[string]bool{}) silently observed args=map[] on the plugin side. v1 plugins never tripped this because v1 dispatches IaCProvider.Plan server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's manifest-driven dispatch) surfaces it immediately on the first existing-resource Diff call. Fix: convert via sensitiveToAny() to the map[string]any shape NewStruct accepts. 
Returns nil for empty/nil input so the wire stays trim-friendly. Bug discovered during W-3b T3.9 runtime-launch validation against an out-of-band gRPC stub plugin; the canonical T3.9 in-tree test ships separately as a loader-seam Go integration test (per team-lead direction + plan precedent at plugin/sdk/iaclint/). Will surface in T3.10's PR description as a third incidentally-fixed-by-W-3b bug. * test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007) W-3b T3.9. Exercises the full v2 dispatch chain — config parse → state load → provider load (via the resolveIaCProvider seam from T3.6c) → ComputePlan Diff dispatch (T3.6e/f) → wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace decomposition into Delete + Create → printDriftReportIfAny — by injecting a Go in-process v2-declaring provider through the package- level seam. No out-of-process gRPC binary or plugin.json under internal/testdata/. # ADR 007 — non-trivial deviation from plan-literal Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin in internal/testdata/stub-provider/." Team-lead authorized switching to in-tree loader-seam validation per: 1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go test-helper package, not a runnable binary. 2. Real-gRPC runtime validation lands in P-DO when DO sets computePlanVersion: v2 in its plugin.json. 3. Hours-of-stub-plumbing cost doesn't earn proportional coverage vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end. 4. W-7 conformance suite is the recurring cross-PR gRPC harness. Full reasoning + considered alternatives in docs/adr/007-t3-9-runtime-validation-via-loader-seam.md. # Tests - TestApply_V2_LoaderSeamDispatch_EndToEnd: - Writes a real config + filesystem state seeded with vpc region=nyc3 (under iacStateRecord shape). - Sets desired region=nyc1. - Substitutes the resolveIaCProvider seam to return a Go provider that declares v2 + has a driver returning NeedsReplace=true. 
- Calls applyInfraModules (the production runInfraApply entrypoint) and asserts driver.diffCount == 1, deleteCount == 1, createCount == 1, plus exact identity of the deleted ProviderID and the created Config["region"]. - TestApply_V2_LoaderSeam_DriftReportPrinted: - Same loader-seam setup + applyV2ApplyPlanFn substitution returning InputDriftReport with one entry. - Captures os.Stderr and asserts the FormatStaleError block reaches the operator (drift-report wiring T3.7 added is end-to-end alive in the v2 loader path). # Test infrastructure - cmd/wfctl/main_test.go: NEW TestMain forces WFCTL_DIFFCACHE=disabled so the platform diffcache (process- scoped via getDiffCache lazy init) doesn't observe stale entries from a developer's local ~/.cache/wfctl/diff/ as false-positive cache hits skipping driver Diff dispatch. Same pattern as platform/main_test.go from T3.6f. Caught during dev when the end-to-end test failed in the full cmd/wfctl test run but passed in isolation. # Bug-class context The Option-A draft (real gRPC binary; not retained on this branch per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1 (remoteResourceDriver.Diff sensitiveToAny conversion). The bug exists independent of which T3.9 option ships; the fix is in tree and surfaces in T3.10's PR description as the third W-3b incidentally-fixed bug. * docs(pr): note bugs incidentally fixed by W-3b W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Pure-additive doc; no code changes. Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring: 1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch) 2. ForceNew silently downgraded to Update (T3.6e replace emission) 3. 
map[string]bool drops gRPC args silently — sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it) Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan. * docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions) Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution. This commit: - Status header updated to "Accepted (with extensive deliberation history — see Decision history section)". - Context section adjusted to preface the deliberation history rather than imply a single-direction trajectory. - New Decision history section lists all 5 transitions with verbatim team-lead quotes + per-transition implementer action. - Final paragraph captures the meta-lesson: when team-lead path- flips mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule. Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) unaffected. Closes spec-reviewer's Issue 1 from c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected." 
* feat(iac): add optional ProviderPlanner interface for v2 plugins (rev10 user override) * ci(iac): cross-plugin build gate + ADR 009 (ProviderPlanner included per user override) * docs(iac): document ProviderPlanner adapter author guide * docs(adr): restore plan-literal Context para 1 in ADR 009 (T9.2 spec-review fix) * docs(iac): point ProviderPlanner author guide at real ProviderIDValidator precedent (T9.3 quality fix) * ci(iac): add fail-fast=false, concurrency, go.mod/go.sum paths to cross-plugin gate (T9.2 quality fix) * fix(iac): R2 review — correct ProviderPlanner doc/ADR/test/CI findings (Copilot review) Six Copilot inline findings + CodeQL workflow-permissions warning: 1. docs/iac/providerplanner.md: ComputePlan in v0.21.0 dispatches driver.Diff directly (in platform/differ.go); it does NOT call IaCProvider.Plan. The reverse is true (Plan delegates to ComputePlan in some implementations). Updated the call-chain description and the illustrative dispatch-site code block to reference the actual file (platform/differ.go) so adapter authors don't follow the wrong call chain. 2. docs/adr/009: replaced the personal email reference with "the workspace owner" so ADR provenance doesn't embed PII. 3. interfaces/iac_provider_planner_test.go: now actually verifies the additivity claim by reusing the package's existing mockProvider as the negative case — runtime assertion confirms mockProvider does NOT satisfy ProviderPlanner. Moved file to interfaces_test package to share fixtures. 4. .github/workflows/cross-plugin-build-test.yml: explicit `permissions: contents: read` (CodeQL workflow-permissions guidance); added `env: GOPRIVATE/GONOSUMCHECK` matching ci.yml + codeql.yml so downstream plugin builds resolve github.com/GoCodeAlone/* deps consistently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
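The T3.7 dispatch decision described above (Path A: the provider does not implement the optional interface; Path B: it implements it but returns an empty version) can be sketched roughly as follows. The names ComputePlanVersionDeclarer, DispatchVersionV2 and DispatchVersionFor come from the commit message; the signatures, the `Provider` interface, and the three fixture types are assumptions made for illustration, not the repository's actual definitions.

```go
package main

import "fmt"

// Provider is a hypothetical stand-in for the real IaC provider interface.
type Provider interface {
	Name() string
}

// ComputePlanVersionDeclarer is the optional interface named in the commit
// message. The method signature here is an assumption.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const (
	DispatchVersionV1 = "v1" // assumed name for the legacy path
	DispatchVersionV2 = "v2"
)

// DispatchVersionFor returns "v2" only when the provider both implements the
// optional interface and declares exactly "v2". Everything else is legacy.
func DispatchVersionFor(p Provider) string {
	d, ok := p.(ComputePlanVersionDeclarer)
	if !ok {
		return DispatchVersionV1 // Path A: interface not implemented
	}
	if d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return DispatchVersionV1 // Path B: implemented, but empty or unknown value
}

// legacyProvider does not implement the optional interface (Path A).
type legacyProvider struct{}

func (legacyProvider) Name() string { return "legacy" }

// declaringProvider always implements the interface; Version may be empty.
type declaringProvider struct{ Version string }

func (declaringProvider) Name() string                 { return "declaring" }
func (d declaringProvider) ComputePlanVersion() string { return d.Version }

func main() {
	fmt.Println(DispatchVersionFor(legacyProvider{}))                 // v1
	fmt.Println(DispatchVersionFor(declaringProvider{Version: ""}))   // v1
	fmt.Println(DispatchVersionFor(declaringProvider{Version: "v2"})) // v2
}
```

Keeping the decision in one function gives tests a single place to pin both legacy paths separately, which is the coverage gap the review called out.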
intel352
added a commit
that referenced
this pull request
May 4, 2026
Spec-reviewer round-2 finding (commit 26ac916): the dispatcher only forced DryRun=false on -fix, but did NOT prevent a user-supplied -dry-run=false from leaving the gate open. With the natural mode predicate `if !opts.DryRun { mutate() }`, this would silently bypass the explicit -fix gate that plan §W-8 line 2347 names as the sole mutation entry point ("-dry-run flag default true; -fix opts into mutation").

Fix: normalize the gate at the dispatcher boundary. When Fix is set, DryRun=false; when Fix is unset, DryRun=true regardless of what the user passed via -dry-run=. Fix is now the single source of truth for "may I mutate?", so any natural mode predicate is safe by construction. Options.DryRun's doc comment now states this contract explicitly so T8.2-T8.5 implementers cannot reach for the wrong predicate.

Tests pin all three cases:
- -dry-run=false alone → DryRun stays true (the bypass)
- -fix -dry-run=false → mutation authorized (Fix wins)
- -dry-run=true -fix → mutation authorized (Fix wins)

Also adds TestPackageDoc_MentionsSkipMarker (process note #6), a cheap file-content guard so a future SkipMarker rename trips a test rather than silently desyncing the package doc comment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
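The gate normalization in this commit can be illustrated with a minimal sketch. The `Options` struct layout and the `normalizeGate` name are assumptions; only the Fix and DryRun fields and the rule "Fix decides, DryRun follows" come from the commit message.

```go
package main

import "fmt"

// Options carries the two flags discussed in the commit message.
// The exact struct in the codemod is not shown in the source; this is a guess.
type Options struct {
	Fix    bool // set by -fix
	DryRun bool // set by -dry-run (default true)
}

// normalizeGate makes Fix the single source of truth: whatever the user
// passed via -dry-run is overwritten, so `if !opts.DryRun { mutate() }`
// can only fire when -fix was given.
func normalizeGate(o Options) Options {
	o.DryRun = !o.Fix
	return o
}

func main() {
	cases := []Options{
		{Fix: false, DryRun: false}, // -dry-run=false alone (the bypass)
		{Fix: true, DryRun: false},  // -fix -dry-run=false
		{Fix: true, DryRun: true},   // -dry-run=true -fix
	}
	for _, c := range cases {
		n := normalizeGate(c)
		fmt.Printf("fix=%t dry-run=%t -> may mutate: %t\n", c.Fix, c.DryRun, !n.DryRun)
	}
}
```

Normalizing once at the boundary means later code can use either predicate without reopening the bypass.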
intel352
added a commit
that referenced
this pull request
May 4, 2026
Round-1 review on PR #538 surfaced 11 substantive findings; all addressed: Critical (real bugs that broke compile or silently dropped logic): #1 [lint, refactor-plan] Rewrite target wrong — `wfctlhelpers.Plan` does not exist in the repo today. Pivoted to `platform.ComputePlan` (the real helper at platform/differ.go:72). Both targets now accepted by the lint analyzer for forward-compat with rev0 fixtures. Plan-doc §T8.3 named the wrong helper; flagged for retro. #2 [refactor-plan] rewritePlanBody only renamed `_` ctx params. A method declared `Plan(c context.Context, ...)` would be rewritten referencing undefined `ctx`. Now: any non-blank ctx-name preserved; only blank `_` renamed to `ctx`. #3 [refactor-plan] isCanonicalPlanBody too loose — extra side-effects inside the desired loop still classified as canonical. Tightened to require exactly the 3-statement template (lookup + !exists guard + configHash compare), no else branches, no trailing junk. Regression test: TestRefactorPlan_ExtraLoggingNotCanonical. #4 [refactor-plan, refactor-apply] SkipMarker only consulted on fn.Doc. PR description promised type-doc + GenDecl-doc honoring. Added receiverTypeDocs + carriesMarker; both modes now check all 3 doc levels. #5 [refactor-apply] hasCanonicalCases only checked case labels. Bespoke bookkeeping inside a case body (logging, metrics, alternate driver calls) classified as canonical and would be silently dropped on -fix. Added caseBodyIsCanonical whitelist (driver call, ResourceRef construction, ProviderID guard). Regression test: TestRefactorApply_ExtraBookkeepingNotCanonical. #6 [refactor-apply] custom-error-wrapping suggestion named fictional APIs (ApplyResultErrorHook / WrapActionError). Replaced with honest hand-port advice: skip-marker + manual switch, OR move wrap into driver methods so wfctlhelpers records it verbatim. #7 [add-validate-plan] Stub always emitted unqualified `*IaCPlan` / `[]PlanDiagnostic`. 
Files importing the interfaces module under a qualifier (e.g. `*interfaces.IaCPlan`) failed to compile after -fix. Added interfacesQualifier detector + qualified stub emission. Regression: TestAddValidatePlan_Fix_QualifiedSignature. #8 [add-validate-plan, lint] hasValidatePlanMethod / AssertProviderImplementsValidatePlan checked method NAME only. Wrong-signature ValidatePlan (e.g. takes a string) was treated as compliant even though interfaces.ProviderValidator wouldn't be satisfied. Added validatePlanSignatureMatches: shape-checks the receiver param + return slice (qualified-or-unqualified). Both callers now use it. Regression: TestAddValidatePlan_DryRun_FlagsWrongSignature. #9 [refactor-plan, refactor-apply, add-validate-plan] Single-file pass — providers whose Plan + Apply lived in sibling files were silently omitted. Added planLikeReceiversInDir: directory-wide method-set scan. Per-file fallback retained for isolated single- file targets. Important: #10 [lint] Per-file parse/type-check errors accumulated in report.errors but exit code stayed 0 if there were no findings — green CI hid coverage gaps. Now exits 1 on either findings OR errors. #11 [refactor-apply] -report-file mode flag never appeared in usage text. Documented in main.go's global usage block (the `-h` path intercepts before the per-mode FlagSet). Plan-doc gap surfaced for retro: §T8.3 line 2373 reads "replaces with `return wfctlhelpers.Plan(ctx, p, desired, current)`", but no such function exists; reality is `platform.ComputePlan`. Recurring defect class (plan-literal vs reality gap, W-4/W-5/W-7/W-9/W-8). Documented in planHelperImportPath docstring + this commit body.
intel352
added a commit
that referenced
this pull request
May 4, 2026
Round 2 surfaced 9 substantive findings; all addressed: Critical (compile-break / contract-break): #1 [refactor-plan, lint] platform.ComputePlan returns IaCPlan BY VALUE, but provider Plan methods return *IaCPlan. Single-statement `return platform.ComputePlan(...)` rewrite produced uncompilable code. Switched to canonical 2-statement form: plan, err := platform.ComputePlan(ctx, p, desired, current) return &plan, err isAlreadyDelegatedPlanBody widened to recognise both the new shape and the legacy single-statement forms (idempotent across revs). #3 [refactor-plan] rewritePlanBody fell back to recvName="p" but didn't update the receiver decl when the source had an unnamed receiver (`func (*Provider) Plan(...)`). Rewritten call referenced undefined `p`. Added ensureReceiverName: injects identifier and mutates the AST. Regression: TestRefactorPlan_Fix_UnnamedReceiverGetsName. Also added: TestRefactorPlan_Fix_PreservesCustomCtxName for round-1 finding #2 (custom ctx name preserved). #4 [refactor-apply] Same unnamed-receiver bug as #3. Same fix (ensureReceiverName + ensureCtxParamName + ensureNthParamName helpers shared with refactor-plan). Regression: TestRefactorApply_Fix_UnnamedReceiverGetsName. #5 [add-validate-plan] Stub always emitted `func (p *T) ValidatePlan(...)` even when the type used value receivers. Method-set mismatch made the type fail interfaces.ProviderValidator type assertion. Added providerReceiverConvention + receiverIsPointer; stub now matches the existing Plan/Apply convention. Regression: TestAddValidatePlan_Fix_ValueReceiverConvention. Important (skip-marker not honored in lint, single-file pass): #6 [lint] AssertPlanDelegatesToHelper checked fn.Doc only, ignoring type-doc and GenDecl-doc skip markers. Added receiverTypeDocsForPass helper; analyzer now checks all 3 doc levels. #7 [lint] AssertApplyDelegatesToHelper — same fix as #6. #8 [lint] AssertDiffSetsNeedsReplaceForForceNew — same fix as #6. 
#9 [lint] lintFile passed only the target file to the analyzers, so cross-file method sets were invisible (same blind spot the refactor-* modes had in round 1). Now lintFile loads sibling non-test .go files from the same package directory and feeds the full slice to each analyzer; diagnostics for sibling files are dropped (the outer walker visits them in their own turn) so no duplicate findings. All 4 modes now compile-clean rewrites + honor 3-level skip-marker + package-aware method-set detection.
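Round-2 finding #1 turns on a plain Go typing rule: a function that returns `(IaCPlan, error)` cannot be returned directly from a method declared to return `(*IaCPlan, error)`. The sketch below uses a local `computePlan` as a stand-in for platform.ComputePlan; its parameters and the `IaCPlan` fields are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// IaCPlan is a simplified stand-in for the real plan type.
type IaCPlan struct {
	Actions []string
}

// computePlan mimics a helper that returns the plan BY VALUE.
func computePlan(desired []string) (IaCPlan, error) {
	if len(desired) == 0 {
		return IaCPlan{}, errors.New("nothing desired")
	}
	return IaCPlan{Actions: desired}, nil
}

type Provider struct{}

// Plan must return *IaCPlan. Writing `return computePlan(desired)` here
// would not compile, because (IaCPlan, error) is not assignable to
// (*IaCPlan, error). The two-statement form takes the address explicitly.
func (p *Provider) Plan(desired []string) (*IaCPlan, error) {
	plan, err := computePlan(desired)
	return &plan, err
}

func main() {
	p := &Provider{}
	plan, err := p.Plan([]string{"create vpc"})
	fmt.Println(plan.Actions, err)
}
```

One consequence of this shape: the returned pointer is never nil, even when `err` is non-nil, so callers should check the error before using the plan.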
intel352
added a commit
that referenced
this pull request
May 4, 2026
Round 3 surfaced 7 substantive findings; all addressed: Critical (compile-break / silent data loss): #1 [add-validate-plan] Directory-wide detection only widened `provs` in round 2; methodsByRecv stayed file-local. A provider with ValidatePlan in a sibling file (or value-receiver Plan/Apply declared elsewhere) would receive a duplicate or wrong-receiver stub. Now planLikeProviderMethodsInDir returns both the recv set AND the merged method slice; methodsByRecv carries the package-wide view (deduped by method name). Stub injection still only fires when typeDecls[recv] is non-nil so we never append to a sibling file. #2 [refactor-plan] isCanonicalPlanBody accepted ANY 2-result return statement at the trailing slot. A planner with the canonical scaffold but a bespoke return (cloned plan, propagated error value) would classify as canonical and the bespoke logic would be silently dropped. Tightened to require EXACTLY `return plan, nil`. #3 [refactor-plan] rewritePlanBody hardcoded "desired"/"current" as args. A canonical Plan with renamed params (e.g. `Plan(ctx, specs, state)`) would rewrite to references to undefined identifiers. ensureNthParamName now extracts the actual signature names. #4 [refactor-plan] rewritePlanBody hardcoded "platform" as the call selector. A file using `pf "github.com/.../platform"` wouldn't compile because `platform` is undefined (ensureImport sees the aliased import as satisfying the path check). Added pkgAliasFor helper; rewrite now uses whatever local name the file imports under. #5 [refactor-apply] caseBodyIsCanonical accepted ANY AssignStmt as canonical. Bookkeeping AssignStmts (metrics counters, map updates, accumulators) passed and would be silently dropped. Tightened to a narrow whitelist: multi-target driver call, single-target driver call (LHS=err), composite-literal construction, selector-assignment to ResourceRef-style fields (ProviderID/Name/Type). Anything else rejected. 
#6 [refactor-apply] Same import-alias issue as #4 for `wfctlhelpers`. pkgAliasFor reused; rewriteApplyBody now uses whatever local name the file imports under. Important: #7 [lint] AssertProviderImplementsValidatePlan checked ts.Doc only, missing markers placed on the wrapping GenDecl. Aligns now with the receiverDoc.carriesMarker pattern used by the other 3 analyzers (round-2 #6/#7/#8). typeDocsByName captures both TypeSpec.Doc and GenDecl.Doc. Round-2 regression tests retained (TestRefactorPlan_Fix_UnnamedReceiverGetsName, TestRefactorPlan_Fix_PreservesCustomCtxName, TestRefactorApply_Fix_UnnamedReceiverGetsName, TestAddValidatePlan_Fix_ValueReceiverConvention). Round-3 fix verified end-to-end against an aliased-import fixture (pf "github.com/.../platform" + wfh "github.com/.../wfctlhelpers"): the rewritten output compiles cleanly under gofmt.
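The import-alias problem in findings #4 and #6 comes down to asking the parsed file which local name it binds to a given import path. Below is a minimal sketch of such a lookup with the standard go/ast and go/parser packages. The function name pkgAliasFor is taken from the commit message, but this body, its signature, the example import paths, and the fallback to the last path element (which assumes the package name matches the directory name) are all assumptions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"path"
	"strconv"
)

// pkgAliasFor reports the identifier under which file refers to importPath.
// It returns false when the path is not imported, or when it is imported
// with "_" or "." (neither gives a usable qualifier).
func pkgAliasFor(file *ast.File, importPath string) (string, bool) {
	for _, spec := range file.Imports {
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil || p != importPath {
			continue
		}
		if spec.Name == nil {
			// Assumption: package name equals the last path element.
			return path.Base(p), true
		}
		if spec.Name.Name == "_" || spec.Name.Name == "." {
			return "", false
		}
		return spec.Name.Name, true
	}
	return "", false
}

// parseImports parses only the import section of src.
func parseImports(src string) (*ast.File, error) {
	return parser.ParseFile(token.NewFileSet(), "example.go", src, parser.ImportsOnly)
}

func main() {
	src := `package demo

import (
	pf "example.com/acme/platform"
	"example.com/acme/wfctlhelpers"
)
`
	f, err := parseImports(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(pkgAliasFor(f, "example.com/acme/platform"))     // pf true
	fmt.Println(pkgAliasFor(f, "example.com/acme/wfctlhelpers")) // wfctlhelpers true
}
```

A rewriter that emits `<alias>.ComputePlan(...)` using this lookup would produce code that compiles regardless of how the file named the import.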
intel352
added a commit
that referenced
this pull request
May 4, 2026
… findings Round 4 surfaced 6 findings, all real. The recurring theme: rev3's pattern detectors were either too loose (accepted bookkeeping shapes as canonical) or too rigid (literal package-name matching, breaking on aliased imports). Fixes: #1 [add-validate-plan] interfacesQualifier(file) returned "" when the type-only file (no Plan/Apply imports) received the stub via cross-file detection (round-3 #1). Stub then emitted unqualified types that wouldn't compile. Now: when the file lacks an interfaces import but ANY sibling does, fall back to "interfaces" qualifier AND inject the interfaces import into the type-file via AST printing (format.Node) before appending the stub. Added siblingUsesInterfacesImport helper. #2 [refactor-apply] isCanonicalCaseAssign accepted ANY composite literal (`x := <CompositeLit>`) as canonical. A bookkeeping struct construction (audit payload, metric envelope) silently passed. Tightened to require the literal type's name (qualified or unqualified) match "ResourceRef". #3 [refactor-apply] isDriverMethodCall only checked selector NAME (Create/Read/Update/Delete). Calls like `helper.Update(...)` or `metrics.Delete(...)` were misclassified as canonical driver dispatch. Added receiver-allowlist check: only `d`, `drv`, or `driver` accepted as driver-bound identifiers (matching the standard `d, err := p.ResourceDriver(...)` pattern in DO/AWS/GCP/Azure). #4 [refactor-apply, refactor-plan] isAlreadyDelegatedApplyBody and isAlreadyDelegatedPlanBody required literal `wfctlhelpers` / `platform` package idents. Files using aliased imports (`wf "..."`, `pf "..."`) were misreported as non-canonical even though they were valid delegations. Both functions now resolve the file's local alias via pkgAliasFor; literal names retained as fallbacks. Same fix for isPlatformComputePlanAssign (the helper inside isAlreadyDelegatedPlanBody). 
#5 [lint] AssertPlanDelegatesToHelper / AssertApplyDelegatesToHelper selector matchers required literal `platform` / `wfctlhelpers` package names. Same false-positive risk as #4 for aliased imports. Both analyzers now resolve the alias and accept either the aliased OR literal form. #6 [refactor-apply] caseBodyIsCanonical accepted ANY DeclStmt as canonical, so `var x SomeBookkeepingType` declarations passed even though they're exactly the bespoke logic the codemod is supposed to preserve. Tightened via isLocalOutPointerDecl: only `var <name> *<ResourceOutput-suffix>` accepted. Smoke-tested against an aliased-import fixture (`wf "...wfctlhelpers"` + `pf "...platform"`): - refactor-apply correctly classifies as already-delegated (was: misreported as missing-action-switch) - lint reports 0 findings (was: false-positive AssertPlanDelegatesToHelper + AssertApplyDelegatesToHelper)
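Finding #3 of this round (check the receiver as well as the method name) can be sketched with go/ast. The allowlist values `d`, `drv`, `driver` and the method names Create/Read/Update/Delete are the ones the commit message lists for this round; the function body and the `check` helper are illustrative assumptions, not the codemod's code.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

var (
	driverMethods   = map[string]bool{"Create": true, "Read": true, "Update": true, "Delete": true}
	driverReceivers = map[string]bool{"d": true, "drv": true, "driver": true}
)

// isDriverMethodCall accepts only calls of the form <recv>.<Method>(...)
// where BOTH the receiver identifier and the method name are allowlisted.
func isDriverMethodCall(expr ast.Expr) bool {
	call, ok := expr.(*ast.CallExpr)
	if !ok {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	if !ok {
		return false
	}
	recv, ok := sel.X.(*ast.Ident)
	if !ok {
		return false
	}
	return driverReceivers[recv.Name] && driverMethods[sel.Sel.Name]
}

// check parses a single expression and classifies it.
func check(src string) bool {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return false
	}
	return isDriverMethodCall(expr)
}

func main() {
	fmt.Println(check("d.Update(ctx, ref, spec)")) // true
	fmt.Println(check("metrics.Delete(name)"))     // false: receiver not allowlisted
	fmt.Println(check("d.Flush()"))                // false: method not allowlisted
}
```

An identifier allowlist is a syntactic heuristic, which is presumably why later rounds widened it; a type-based check would be stricter but needs full type information.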
intel352
added a commit
that referenced
this pull request
May 4, 2026
Round 5 surfaced 9 findings; all addressed. Recurring theme: the detectors and reporters needed deeper structural verification (branch contents, outer-shape, receiver-kind, package isolation, exit-code semantics) — not just shape matching at one level. Critical (silent data loss / repair regression): #1 [refactor-plan] rangeBodyMatchesCanonicalDesired only checked the guard expressions and statement count; never inspected what the `!exists` and `configHash != configHash` branch BODIES did. A planner with extra logic (telemetry, alternate action construction, different create/update payload) inside those branches was silently rewritten away. Added isCanonicalCreateBranchBody + isCanonicalUpdateBranchBody + isPlanActionsAppendAssign to verify the create branch is exactly `append+continue` and the update branch is exactly `append`. #2 [refactor-apply] classifyApplyBody verified only the switch shape; setup/teardown/result aggregation OUTSIDE the switch was silently dropped on -fix. Added isCanonicalApplyOuterShape: the Apply body must be exactly the 3-statement scaffold (result-init + range-loop + return result, nil). #3 [add-validate-plan] hasValidatePlanMethod ignored receiver kind. A value-receiver provider with a pointer-receiver ValidatePlan still failed the ProviderValidator type assertion (method-set on `T` does not include `*T` methods), but rev2 treated it as already-implemented. Now also requires receiver-kind match. #4 [lint] AssertProviderImplementsValidatePlan had the same receiver-kind blind spot. Now delegates to hasValidatePlanMethod (centralised + DRY). #5 [refactor-plan] isAlreadyDelegatedPlanBody accepted single-statement `return platform.ComputePlan(...)` (broken rev1 form) as already-delegated, so rerunning the fixed codemod never repaired output from the earlier broken rewrite. Now ONLY accepts the canonical 2-statement form; broken single-statement forms classify as non-canonical so a fresh -fix produces compilable output. 
#6 [refactor-plan] planLikeProviderMethodsInDir merged methods from every non-test .go file regardless of `package P` clause. Mixed- package or build-tagged directories could fold methods from unrelated packages into a synthetic provider. Added two-pass package-clause check: aggregate only files matching the dominant package. Important (CI fidelity / detector recall): #7 [Makefile, lint] `|| true` in migrate-providers swallowed real execution failures alongside expected advisory findings, because lint returned 1 for both findings AND parse errors. Split the exit codes: 0 clean / 1 findings / 2 errors. Makefile now gates on `[ $? -ne 0 ] && [ $? -ne 1 ]` so parse errors fail the target. #8 [refactor-plan] Canonical matcher hardcoded the lookup flag name as `exists`. The semantically-identical `cur, ok :=` idiomatic Go form was reported non-canonical. Widened to accept both `exists` and `ok`. #9 [refactor-apply] isDriverMethodCall allowlist {d, drv, driver} missed common alternates. Widened to {d, dr, drv, rdrv, driver, resourceDriver}. Still rejects bookkeeping receivers like `metrics`, `audit`, `helper` (preserves round-4 #3 fix). End-to-end verification: lint against DO plugin produces exit 1 (3 advisory findings, no errors); broken-Go-source produces exit 2; clean source produces exit 0. Smoke-tested via /tmp/iac-codemod.
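The exit-code split from finding #7 (0 clean / 1 findings / 2 errors) is easy to state as a small function. The sketch below assumes that execution errors take precedence when a run has both findings and errors; the commit message does not spell out that case, and the `exitCode` name is hypothetical.

```go
package main

import "fmt"

// exitCode maps a lint run's outcome to a process exit code:
//
//	0: clean
//	1: advisory findings only
//	2: parse/type-check errors (assumed to win over findings)
func exitCode(findings, errs int) int {
	switch {
	case errs > 0:
		return 2
	case findings > 0:
		return 1
	default:
		return 0
	}
}

func main() {
	fmt.Println(exitCode(0, 0)) // 0
	fmt.Println(exitCode(3, 0)) // 1
	fmt.Println(exitCode(0, 1)) // 2
	fmt.Println(exitCode(3, 1)) // 2
}
```

A caller that wants to tolerate advisory findings but fail on real errors can then compare the saved exit status against 1, instead of discarding it with `|| true`.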
intel352
added a commit
that referenced
this pull request
May 4, 2026
…ening findings Round 7 surfaced 10 findings; 4 were stale (already fixed in R6). 6 real findings addressed: Critical (compile-break / silent data loss): #1 [refactor-plan] isPlanActionsAppendAssign verified the LHS but not append's first argument. A bespoke `plan.Actions = append(otherSlice, ...)` was misclassified as canonical and the alternate-slice logic silently dropped during rewrite. Now both LHS and append's first arg must reference plan.Actions. #3+#9 [refactor-apply] isCanonicalApplyOuterShape only checked the outer 3-statement scaffold; per-action logic INSIDE the for-loop body (logging, metrics, custom error handling, accumulators) was silently dropped on -fix. Added isCanonicalApplyLoopBody + isCanonicalApplyLoopAssign + isCanonicalApplyLoopIf + isCanonicalApplyLoopIfBodyStmt: every loop-body statement must match a tight whitelist (driver lookup, var-out decl, action switch, err-/out-guard ifs). #7+#8 [add-validate-plan] provs[recv].Pos() panicked when the TypeSpec was nil (cross-file scenario from round-3 #1: type declaration in sibling file). Now defaults Pos to NoPos for nil specs; sort still works (stable on name when Pos ties). Important (cross-file consistency): #4 [add-validate-plan] qualifier fallback to "interfaces" fired based on whether ANY sibling imported interfaces — unreliable if THIS provider uses local types but an unrelated sibling imports interfaces. Replaced with qualifierFromProviderMethods: inspects the provider's OWN Plan/Apply parameter types (directory-wide via round-3 #1) for the qualifier they use. #5 [add-validate-plan] skip-marker check only consulted typeDecls (current file). When Plan/Apply are here but the type with `// wfctl:skip-iac-codemod` lives in a SIBLING file, the marker was ignored. Added siblingTypeDocs lookup via receiverTypeDocsInDir (the round-6 helper). #10 [add-validate-plan] sibling-method merge deduped by method NAME only. 
If local file has wrong-signature ValidatePlan and sibling has correct one, sibling dropped, hasValidatePlanMethod saw only bad declaration, injected duplicate stub. Replaced with isLocalDuplicate: dedupes by name + parameter arity + result arity, so distinct signatures both survive. Stale findings (already fixed in R6, no action needed): #2 refactor-apply receiverTypeDocsInDir already in place #6 lint receiver-doc lookup already merged via receiverTypeDocsForPass Smoke-tested against DO plugin: refactor-plan reports DOProvider.Plan canonical, refactor-apply reports DOProvider.Apply upsert-recovery with the upsertSupporter suggestion. Output matches T8.7 baseline.
intel352
added a commit
that referenced
this pull request
May 5, 2026
…fier-name findings Round 8 surfaced 9 findings; all addressed: Critical (silent data loss / behavior change): #1 [add-validate-plan] isLocalDuplicate compared by name+arity only. Wrong-signature ValidatePlan(name string) []PlanDiagnostic and correct ValidatePlan(plan *IaCPlan) []PlanDiagnostic have same arity but different types — sibling-correct dropped, duplicate stub injected. Replaced with signature-fingerprint dedupe (signatureFingerprint + typeFingerprint walk all type shapes). #4 [refactor-apply] `default:` case clauses accepted without body inspection. Logging/metrics in default body silently dropped. Added isCanonicalDefaultBody: only `err = fmt.Errorf("unknown action %q", ...)` accepted. #5 [refactor-apply] isCanonicalApplyLoopAssign accepted any `<x>.ResourceDriver(...)`. `helper.ResourceDriver(...)` / `plan.ResourceDriver(...)` falsely classified. Now requires the receiver to match the provider's own receiver identifier (threaded through from classifyApplyBody). #8 [refactor-apply] Bare `if err != nil { continue }` accepted as canonical, but wfctlhelpers ALWAYS records ActionError before continuing — the rewrite would silently change behavior. Now requires the if-body to ALSO append to result.Errors before any continue/break. Important (skip-marker scope + identifier flexibility): #2 [add-validate-plan] Skip-marker check fired on EVERY method's fn.Doc — a marker on Destroy/Status/etc. accidentally suppressed the whole provider's analysis. Restricted to Plan/Apply (the provider-defining methods). #3 [lint] AssertProviderImplementsValidatePlan — same fix as #2. #6 [refactor-plan] Canonical detector hardcoded `current`/`desired` body identifiers. Providers using `state`/`specs` reported non-canonical despite rewriter preserving names. Added nthParamName extraction; isCanonicalPlanBody now takes the actual parameter names. #7 [refactor-apply] Driver-receiver allowlist comment claimed `rd` accepted, but the switch was missing it. Added. 
#9 [refactor-apply] Canonical detector hardcoded `result` /`plan` identifier names. Providers using `res` /`pl` rejected. Now recovers actual identifier from signature (planName) and from statement-1 LHS (resultName); both must be consistent within the body but can be any identifier. Smoke-tested against DO plugin: refactor-plan / refactor-apply still report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with stable upsertSupporter suggestion. Output matches T8.7 baseline. Removed redundant TestRefactorApply_Fix_UnnamedReceiverGetsName: the unnamed-receiver path can't have a canonical-shape Apply body (`<recv>.ResourceDriver(...)` requires recv in scope). Receiver-name injection is shared between refactor-plan and refactor-apply via ensureReceiverName; coverage stays in TestRefactorPlan_Fix_UnnamedReceiverGetsName.
intel352
added a commit
that referenced
this pull request
May 5, 2026
Round 9 surfaced 4 findings; all addressed:

Critical (silent behavior change):

#1 [refactor-apply] If-guard body accepted bare `break`, but wfctlhelpers.ApplyPlan records the error and KEEPS processing later actions. A `break` would silently change loop semantics on rewrite. Now only `continue` is accepted in if-guard bodies.

#2 [refactor-apply] Driver-method allowlist accepted `Driver` / `DriverFor` alongside `ResourceDriver`. wfctlhelpers dispatches SPECIFICALLY through IaCProvider.ResourceDriver; a wrapper like `provider.Driver(...)` would have its caching/instrumentation bypassed. Restricted to `ResourceDriver` only.

Important (false positives / cross-file alias mismatch):

#3 [add-validate-plan, lint] Receiver-kind enforcement was too strict. Per Go spec, `*T`'s method set includes BOTH pointer-receiver and value-receiver methods of T. So a value-receiver ValidatePlan on a pointer-receiver provider IS valid (satisfies ProviderValidator). hasValidatePlanMethod now only requires strict matching when the provider uses VALUE receivers (T's method set excludes *T methods).

#4 [add-validate-plan] When the qualifier was derived from a sibling method's aliased import (e.g. `iface "github.com/.../interfaces"`), the post-loop import injection used unaliased `ensureImport`, leaving the stub's `iface.IaCPlan` referring to undefined `iface`. Added ensureImportAs helper; now the import alias matches the stub's qualifier.

Smoke-tested against DO plugin: refactor-plan / refactor-apply still report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with stable upsertSupporter suggestion. Output matches T8.7 baseline.
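The method-set rule behind finding #3 can be checked directly with type assertions. In the sketch below, ProviderValidator, IaCPlan and PlanDiagnostic are simplified stand-ins whose shapes are assumed from the signatures quoted in the commit messages; the two fixture types are hypothetical.

```go
package main

import "fmt"

type IaCPlan struct{}

type PlanDiagnostic struct{ Message string }

// ProviderValidator is a simplified stand-in for interfaces.ProviderValidator.
type ProviderValidator interface {
	ValidatePlan(plan *IaCPlan) []PlanDiagnostic
}

// ValueRecv declares ValidatePlan on the value receiver.
type ValueRecv struct{}

func (ValueRecv) ValidatePlan(*IaCPlan) []PlanDiagnostic { return nil }

// PtrRecv declares ValidatePlan on the pointer receiver.
type PtrRecv struct{}

func (*PtrRecv) ValidatePlan(*IaCPlan) []PlanDiagnostic { return nil }

// satisfies reports whether the dynamic type of v implements ProviderValidator.
func satisfies(v any) bool {
	_, ok := v.(ProviderValidator)
	return ok
}

func main() {
	fmt.Println(satisfies(ValueRecv{}))  // true:  T has its value-receiver methods
	fmt.Println(satisfies(&ValueRecv{})) // true:  *T also includes T's value-receiver methods
	fmt.Println(satisfies(PtrRecv{}))    // false: T lacks pointer-receiver methods
	fmt.Println(satisfies(&PtrRecv{}))   // true:  *T has its pointer-receiver methods
}
```

Only the third case fails, which matches the relaxed rule: strict receiver matching is needed only when the provider is used as a value.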
intel352
added a commit
that referenced
this pull request
May 5, 2026
Round 10 surfaced 8 findings; all addressed:

Critical (cross-file duplicate stub / silent override):

#1 [add-validate-plan] Cross-file duplicate stub injection: when type is in file_a and Plan/Apply are in file_b, both files classified as missing-ValidatePlan and -fix injected duplicate stubs. Now only inject in the file containing the receiver TypeSpec (`if ts == nil { skip }`); the type-file's own pass handles it.

#2 [add-validate-plan] Embedded-field promoted ValidatePlan not detected; -fix would shadow it with a no-op stub, silently dropping real plan diagnostics. Added typeHasEmbeddedFields: if the receiver type has any embedded fields, suppress the missing classification (we can't statically resolve method promotion without full type info, so err on the side of NOT injecting).

#3 [lint] AssertProviderImplementsValidatePlan: same fix as #2.

#4 [refactor-apply] ProviderID/Name/Type assignment-target whitelist didn't check struct identity. `audit.Type = ...` or `result.ProviderID = ...` (wrong struct) classified as canonical and dropped on rewrite. Now requires the LHS receiver to be `ref` (the canonical ResourceRef construction site name).

Important (perf / determinism / lint precision):

#5 [lint] O(n²) lintFile re-parsed every sibling per-call. Added lintDirCache: lintPath now groups files by directory and builds one parse cache per dir, reused across the directory's files. Per-call fallback retained for single-file invocation.

#6 [refactor-plan] planLikeProviderMethodsInDir's dominant-package selection used range-over-map (random iteration), so on a package-count tie the dominant could differ across runs and rewrite against the wrong method set. Sort the package names so tie-break is lexicographic-first (deterministic).

#7 [lint] AssertPlanDelegatesToHelper accepted ANY platform.ComputePlan call ANYWHERE in the body. Now requires the canonical SHAPE: either the 2-statement rev2 form (matches isAlreadyDelegatedPlanBody) OR a single-statement legacy `return <X>.Plan(...)` / `return <X>.ComputePlan(...)`. Bespoke wrappers that call the helper as an intermediate step now correctly flag.

#8 [lint] AssertApplyDelegatesToHelper: same fix. Now uses isAlreadyDelegatedApplyBody (the rewriter's idempotency check) so anything but the canonical single-statement form flags.

Smoke-tested against DO plugin: refactor-plan / refactor-apply still report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with stable upsertSupporter suggestion. Output matches T8.7 baseline.
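
The deterministic tie-break from finding #6 can be illustrated as follows. This is a sketch under assumptions: `dominantPackage` and its `map[string]int` input (package name to file count) are hypothetical and are not the signature of the real dominant-package helper.

```go
package main

import (
	"fmt"
	"sort"
)

// dominantPackage returns the package name with the highest file count.
// Names are sorted first, so a tie always resolves to the lexicographically
// smallest name instead of depending on map iteration order.
func dominantPackage(fileCounts map[string]int) string {
	names := make([]string, 0, len(fileCounts))
	for name := range fileCounts {
		names = append(names, name)
	}
	sort.Strings(names)

	best, bestCount := "", 0
	for _, name := range names {
		// Strict '>' keeps the earliest (smallest) name on a tie.
		if fileCounts[name] > bestCount {
			best, bestCount = name, fileCounts[name]
		}
	}
	return best
}

func main() {
	counts := map[string]int{"provider": 2, "legacy": 2, "tools": 1}
	fmt.Println(dominantPackage(counts)) // legacy
}
```

Ranging directly over the map and keeping the first maximum would return either `provider` or `legacy` depending on the run, which is the nondeterminism the fix removes.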
intel352
added a commit
that referenced
this pull request
May 5, 2026
Round 11 surfaced 6 findings; all addressed:

Critical (broken-output false-clean / mode clobber):

#1 [lint] planBodyDelegatesCanonically accepted single-statement `return platform.ComputePlan(...)` (the BROKEN rev1 form, uncompilable due to value/pointer mismatch). Lint reported partially-migrated providers as clean, so migrate-providers silently missed them. Now ONLY the canonical 2-statement rev2 form OR legacy `return wfctlhelpers.Plan(...)` is accepted; the broken single-statement platform form falls through to non-canonical so lint surfaces the still-needs-fixup state.

#2 [refactor-plan] writeFileAtomic left the temp file at os.CreateTemp's default 0600 mode; rename clobbered the source's original permissions (e.g., 0644 → 0600). Added writeFileAtomicBytesPreserveMode: captures original mode via os.Stat and chmods the temp file before rename.

#5 [add-validate-plan] Same 0600 mode-clobber bug in writeFileAtomicBytes. Now delegates to writeFileAtomicBytesPreserveMode.

Important (revert + comment polish):

#3 [add-validate-plan] Round-10 #2's "any embedded field suppresses missing-ValidatePlan" was too broad: sync.Mutex, loggers, config mixins don't promote ValidatePlan, so real targets were silently missed. Reverted: report missing unconditionally. Maintainers whose providers actually promote ValidatePlan suppress with the explicit `// wfctl:skip-iac-codemod` marker.

#4 [lint] AssertProviderImplementsValidatePlan: same revert as #3.

#6 [refactor-plan] Stale enum comment for planAlreadyDelegated still referenced `wfctlhelpers.Plan` as the recognised shape; actual implementation recognises the 2-statement platform.ComputePlan form. Comment updated.

Removed dead typeHasEmbeddedFields helper (both call sites reverted in #3/#4).

Source-file mode preservation verified end-to-end: chmod 0644 → -fix → stat shows 0644 retained.

Smoke-tested against DO plugin: refactor-plan / refactor-apply still report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with stable upsertSupporter suggestion. Output matches T8.7 baseline.
intel352
added a commit
that referenced
this pull request
May 5, 2026
…tening findings

Round 12 surfaced 8 findings; all addressed:

Critical (CLI bug + silent rewrite of wrong file):

#1 [main.go] Top-level dispatcher used a single FlagSet with only -dry-run and -fix registered, so any mode-specific flag (e.g. refactor-apply's -report-file) failed with "flag provided but not defined" BEFORE the mode could parse it. -report-file was documented but UNUSABLE from the CLI entrypoint. Replaced stdlib FlagSet with a manual-scan loop in run(): -dry-run/-fix are extracted; everything else (including unknown flags) flows through to the mode's own FlagSet. Bonus: flag-position flexibility (`/path -fix` now works), updated test + usage text accordingly.

#2 [refactor-plan] Walked every .go file but built provs/typeDocs from only the dominant package. Mixed-package or build-tagged directories: a non-dominant file with overlapping receiver names was processed against another package's method set, rewriting the wrong file. Added dominantPackageForDir; each file processor now skips files in non-dominant packages.

#3 [refactor-apply] Same fix as #2.

#4 [add-validate-plan] Same fix as #2.

Important (canonical-detection precision):

#5 [refactor-plan] isPlanActionsAppendAssign didn't validate the appended action's payload: `plan.Actions = append(plan.Actions, PlanAction{Action: "queue"})` was misclassified as canonical and silently rewritten. Added `expectedAction` parameter; create branch requires `Action: "create"` and update branch requires `Action: "update"`.

#6 [refactor-apply] hasCanonicalCases verified case labels but not that the body's driver call MATCHED the label. A `case "create"` body that called `.Update()` or `.Delete()` was misclassified and silently rewritten away. Added caseBodyMatchesLabel: scans each case body for driver method calls and verifies the label-to-method mapping (create→Create, update→Update, delete→Delete, replace→Update).

#7 [refactor-apply] Driver-lookup check accepted any `<recv>.ResourceDriver(<arg>)` regardless of <arg>. wfctlhelpers always dispatches with `action.Resource.Type`, so providers using a different lookup key (e.g. action.Tag, computed value) would see different driver behavior on rewrite. Now requires the lookup key to be exactly `action.Resource.Type`.

#8 [lint] looksLikeProvider checked method NAMES + rough arity, so any unrelated type with `Plan(...)` and `Apply(...)` was treated as a provider (e.g., a deploy strategy). Tightened to verify signature shapes via type-name suffix matching: Plan must be `Plan(ctx, []ResourceSpec, []ResourceState) (*IaCPlan, error)` and Apply must be `Apply(ctx, *IaCPlan) (*ApplyResult, error)`. Qualified or unqualified accepted via typeNameTailMatches.

Smoke-tested:
- `iac-codemod refactor-apply -report-file <path> <dir>` now works (previously: "flag provided but not defined")
- DO plugin still reports DOProvider.Plan canonical / Apply upsert-recovery with stable upsertSupporter suggestion (T8.7 baseline preserved)
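
The manual-scan loop from finding #1 might look roughly like the sketch below. It is not the code in main.go: `Options`, `splitGlobalFlags`, and the exact set of spellings handled are assumptions. The one behavior taken from the commit history is that -fix alone decides whether mutation is allowed, so DryRun is derived from it.

```go
package main

import "fmt"

// Options mirrors the two global switches; the struct itself is hypothetical.
type Options struct {
	Fix    bool
	DryRun bool
}

// splitGlobalFlags extracts -fix and -dry-run from args at any position and
// returns everything else, unknown flags included, for the mode's own FlagSet.
// Only bare boolean spellings are handled here; forms such as -fix=false are
// left out of this sketch and would pass through to the mode.
func splitGlobalFlags(args []string) (Options, []string) {
	var opts Options
	rest := make([]string, 0, len(args))
	for _, arg := range args {
		switch arg {
		case "-fix", "--fix":
			opts.Fix = true
		case "-dry-run", "--dry-run", "-dry-run=true", "-dry-run=false":
			// Consumed but ignored: Fix is the single mutation gate.
		default:
			rest = append(rest, arg)
		}
	}
	opts.DryRun = !opts.Fix
	return opts, rest
}

func main() {
	opts, rest := splitGlobalFlags([]string{"-report-file", "out.json", "./providers", "-fix"})
	fmt.Printf("%+v %q\n", opts, rest)
}
```

Because the mode-specific `-report-file` is never handed to a FlagSet that lacks it, the "flag provided but not defined" failure cannot occur at the dispatcher level, and a trailing `-fix` after the path is still recognized.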
intel352
added a commit
that referenced
this pull request
May 5, 2026
…get (W-8 of 12) (#538) * feat(codemod): scaffold cmd/iac-codemod with 4-mode subcommand dispatcher T8.1: Adds cmd/iac-codemod skeleton with dispatcher for the four codemod modes — refactor-plan, refactor-apply, add-validate-plan, lint — and the shared -dry-run / -fix flag pair. Modes are registered via a map of modeFunc entries so subsequent tasks (T8.2-T8.5) can wire in real implementations file-by-file. Each mode currently delegates to a stub that prints a "not yet implemented" message and exits zero. Defaults: -dry-run is true; -fix opts into mutation and forces -dry-run to false. Unknown modes return exit 2 with usage. The // wfctl:skip-iac-codemod marker convention is documented in the package doc and usage text. Tests cover dispatch, default flag values, -fix semantics, unknown-mode handling, help routing, and positional-arg forwarding via a swappable modes map (no subprocess required). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(codemod): pin SkipMarker const + document flag ordering (T8.1 review) Addresses spec-reviewer findings on b76ab2f: 1. (BLOCKER) Extract `const SkipMarker = "// wfctl:skip-iac-codemod"` so T8.3-T8.5 parsers reference the canonical literal in one place. Plan rev2 (line 2400) unifies the four modes on this single marker specifically to prevent mismatched-marker silent-no-op surfaces; the const + TestSkipMarker_LiteralPinned + TestUsage_MentionsSkipMarker guards close the drift hole the reviewer flagged. usage() now formats the marker via the const rather than a duplicated string literal. 2. (MINOR) usage() documents the stdlib flag-parser ordering constraint (flags must precede paths). TestRun_FlagAfterPath_SilentlyTreatedAsPositional pins the failure mode so it is intentional, not a parser bug, and so future maintainers see the constraint exercised in tests. 3. (NIT) stubMode's unused args parameter renamed to _; cosmetic only. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(codemod): close -dry-run=false mutation-gate bypass (T8.1 review #4) Spec-reviewer round-2 finding (commit 26ac916): the dispatcher only forced DryRun=false on -fix, but did NOT prevent a user-supplied -dry-run=false from leaving the gate open. With the natural mode predicate `if !opts.DryRun { mutate() }`, this would silently bypass the explicit -fix gate that plan §W-8 line 2347 names as the sole mutation entry point ("-dry-run flag default true; -fix opts into mutation"). Fix: normalize the gate at the dispatcher boundary — when Fix is set, DryRun=false; when Fix is unset, DryRun=true regardless of what the user passed via -dry-run=. Fix is now the single source of truth for "may I mutate?", so any natural mode predicate is safe by construction. Options.DryRun's doc comment now states this contract explicitly so T8.2-T8.5 implementers cannot reach for the wrong predicate. Tests pin all three cases: - -dry-run=false alone → DryRun stays true (the bypass) - -fix -dry-run=false → mutation authorized (Fix wins) - -dry-run=true -fix → mutation authorized (Fix wins) Also adds TestPackageDoc_MentionsSkipMarker (process note #6) — cheap file-content guard so a future SkipMarker rename trips a test rather than silently desyncing the package doc comment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(codemod): warn future maintainers off t.Parallel in main_test.go (T8.1 review #5) Code-reviewer round-3 authorized now-fix: tests in this file mutate the package-global `modes` map under defer-restore. -race is currently clean because no test calls t.Parallel(), but the swap-and-restore pattern is a latent data race the next agent (T8.2-T8.5) could trigger by adding parallelism. Top-of-file guard comment names the constraint and points at the dependency-injection refactor as the unlock path if parallelism is ever required. Comment-only change; tests still pass with -race. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(codemod): lint mode with 4 static-check assertions T8.2: Wires the lint subcommand using golang.org/x/tools/go/analysis with the four assertions named in plan §T8.2: AssertPlanDelegatesToHelper — provider Plan() delegates to wfctlhelpers.Plan AssertApplyDelegatesToHelper — provider Apply() delegates to wfctlhelpers.ApplyPlan AssertDiffSetsNeedsReplaceForForceNew — driver Diff() sets NeedsReplace on ForceNew AssertProviderImplementsValidatePlan — provider satisfies ProviderValidator Carry-forwards from T8.1 review baked in: 1. Dispatcher fs.Usage override (main.go:run) so `iac-codemod <mode> -h` produces the global usage rather than the per-FlagSet banner. Pinned by TestRun_HelpAfterMode_PrintsGlobalUsage across all 4 modes. 2. Mutation-gate negative test pinning lint-is-read-only-by-definition: TestRunLint_DoesNotMutateFilesEvenWithFixFlag invokes lint with hostile {Fix:true, DryRun:false} flags and asserts mtime + content unchanged. Plus TestRunLint_FixFlag_WarnsItHasNoEffect surfaces a warning so users know -fix did nothing. 3. Skip-marker honored at func-doc and type-doc levels via hasSkipMarkerOn(fn.Doc) / ts.Doc; skipped sites flow through the pass.Report channel with a [skipped] prefix and are split into a separate report section by lintReport.unpackSkippedFromFindings. Plan rev2 (line 2400) requires each mode to surface a list of skipped sites in its report — pinned by TestRunLint_SkipMarker_SurfacedInReport. Precision: all helper-call analyzers gate on providerLikeReceivers (method set must contain BOTH Plan + Apply matching IaCProvider shape) to avoid false-positive flags on deploy targets and other Apply-shaped types. Manual verification against the workflow repo went from 9 findings (incl. 2 false positives in pkg/k8s) down to 7 (all genuine provider implementations awaiting v2 migration). 
Implementation notes: - File-by-file analysis via parser.ParseFile + tolerant types.Check (stub importer ignores unresolved imports). This works on plugin sources that haven't vendored their dependencies. Cross-file references won't resolve, but IaC providers and drivers are typically co-located by Go convention. - Skip-marker is encoded as a synthetic diagnostic with a `[skipped]` prefix; the driver post-processes it out of the findings list. This keeps the analyzer API surface to one channel. - go.mod: promotes golang.org/x/tools from indirect to direct. No new modules, no go.sum changes. Verification: 33/33 tests pass with -race; binary smoke-tested against workflow repo root (7 findings, exit 1). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(codemod): T8.2 review — stdout for -h, marker context, precision filter Spec-reviewer round-2 on commit 2908fa1; addresses 5 substantive findings + 1 nit (findings 5 & 6 are PR-body notes, no code change): 1. (BLOCKER) `iac-codemod <mode> -h` now prints the global usage to STDOUT, matching `iac-codemod -h` and the kubectl/git/gh convention for help-on-success. Previously it landed on STDERR via the FlagSet's SetOutput handler. Pinned by TestRun_HelpAfterMode_PrintsGlobalUsageToStdout — asserts stream specifically rather than the union of stdout+stderr (the prior test would have passed even with stderr output). Parse-error noise still flows through stderr; only the help-text body moved to stdout. 2. (MEDIUM) hasSkipMarkerOn now accepts a trailing space + arbitrary justification text after SkipMarker: // wfctl:skip-iac-codemod legacy upsert recovery, see ADR-042 Annotating WHY a site is skipped is a Go idiom; silently ignoring the marker because of trailing context would replicate the exact silent-no-op surface plan rev2 line 2400 unifies the marker to prevent. 
Two new tests pin both sides of the contract: - TestSkipMarker_AcceptsTrailingJustification - TestSkipMarker_RejectsCloseButWrongMarker (negative — the legacy `// wfctl:skip-codemod` prefix from design rev1 must still flag the diagnostic) 3. (MEDIUM) AssertDiffSetsNeedsReplaceForForceNew now gates on a new driverLikeReceivers helper (method set must contain Diff AND at least one canonical companion: Read/Create/Update/Delete). Brings the analyzer in line with the precision treatment Plan/Apply already had via providerLikeReceivers. New TestAssertDiffSetsNeedsReplaceForForceNew_NonDriverNotFlagged pins the negative case (a SettingsDiff struct with just Diff() is correctly invisible to the analyzer). 4. (LOW-MEDIUM) bodyAssignsFieldTrue → bodyAssignsField: the matcher now accepts ANY RHS, not just literal `= true`. The terser canonical pattern `r.NeedsReplace = c.ForceNew` is equally valid expression of the W-3 force-new contract; flagging it was a false positive previously hit by cmd/wfctl/deploy_providers.go remoteResourceDriver (which propagates NeedsReplace from a gRPC response via `result.NeedsReplace, _ = res["needs_replace"].(bool)`). Pinned by TestAssertDiffSetsNeedsReplaceForForceNew_AcceptsDirectAssign. 7. (NIT) Removed dead/misleading comment in lintFile that referenced a never-implemented passSkippedSink scratch field. Findings 5 & 6 (no code change — PR-body notes for team-lead): 5. Plan §T8.2 line 2363 says `golang.org/x/tools/go/analysis/passes` framework, but `/passes` is the directory of canonical reusable analyzers. The actual framework is `golang.org/x/tools/go/analysis` (which is what we import). Likely a plan typo; flag for post-merge retrospective. 6. go.mod promotes golang.org/x/tools from indirect to direct. Already-transitive dep, no go.sum changes, no new modules. Should be fine but flagged for team-lead per W-7 trigger-list rigor. Smoke-test re-verification on workflow repo: 6 genuine findings (down from 7), zero false positives. 
-h now correctly streams to stdout for both top-level and per-mode invocations. 37/37 tests pass with -race; build clean; vet clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(codemod): T8.2 review round-2 — tab-delimited marker, literal-false guard, adjacent-suffix rejection * feat(codemod): refactor-plan mode (canonical pattern detection + rewrite); honors // wfctl:skip-iac-codemod marker * feat(codemod): refactor-apply with informative non-canonical idiom reports; honors // wfctl:skip-iac-codemod marker * feat(codemod): add-validate-plan mode (no-op stub injection); honors // wfctl:skip-iac-codemod marker * chore(make): add migrate-providers target for workspace-wide codemod * fix(codemod): T8.7 verification — exclude _worktrees and other underscore-prefixed dirs from walk * fix(codemod): Copilot review round 1 — 9 critical + 2 important findings Round-1 review on PR #538 surfaced 11 substantive findings; all addressed: Critical (real bugs that broke compile or silently dropped logic): #1 [lint, refactor-plan] Rewrite target wrong — `wfctlhelpers.Plan` does not exist in the repo today. Pivoted to `platform.ComputePlan` (the real helper at platform/differ.go:72). Both targets now accepted by the lint analyzer for forward-compat with rev0 fixtures. Plan-doc §T8.3 named the wrong helper; flagged for retro. #2 [refactor-plan] rewritePlanBody only renamed `_` ctx params. A method declared `Plan(c context.Context, ...)` would be rewritten referencing undefined `ctx`. Now: any non-blank ctx-name preserved; only blank `_` renamed to `ctx`. #3 [refactor-plan] isCanonicalPlanBody too loose — extra side-effects inside the desired loop still classified as canonical. Tightened to require exactly the 3-statement template (lookup + !exists guard + configHash compare), no else branches, no trailing junk. Regression test: TestRefactorPlan_ExtraLoggingNotCanonical. #4 [refactor-plan, refactor-apply] SkipMarker only consulted on fn.Doc. 
PR description promised type-doc + GenDecl-doc honoring. Added receiverTypeDocs + carriesMarker; both modes now check all 3 doc levels. #5 [refactor-apply] hasCanonicalCases only checked case labels. Bespoke bookkeeping inside a case body (logging, metrics, alternate driver calls) classified as canonical and would be silently dropped on -fix. Added caseBodyIsCanonical whitelist (driver call, ResourceRef construction, ProviderID guard). Regression test: TestRefactorApply_ExtraBookkeepingNotCanonical. #6 [refactor-apply] custom-error-wrapping suggestion named fictional APIs (ApplyResultErrorHook / WrapActionError). Replaced with honest hand-port advice: skip-marker + manual switch, OR move wrap into driver methods so wfctlhelpers records it verbatim. #7 [add-validate-plan] Stub always emitted unqualified `*IaCPlan` / `[]PlanDiagnostic`. Files importing the interfaces module under a qualifier (e.g. `*interfaces.IaCPlan`) failed to compile after -fix. Added interfacesQualifier detector + qualified stub emission. Regression: TestAddValidatePlan_Fix_QualifiedSignature. #8 [add-validate-plan, lint] hasValidatePlanMethod / AssertProviderImplementsValidatePlan checked method NAME only. Wrong-signature ValidatePlan (e.g. takes a string) was treated as compliant even though interfaces.ProviderValidator wouldn't be satisfied. Added validatePlanSignatureMatches: shape-checks the receiver param + return slice (qualified-or-unqualified). Both callers now use it. Regression: TestAddValidatePlan_DryRun_FlagsWrongSignature. #9 [refactor-plan, refactor-apply, add-validate-plan] Single-file pass — providers whose Plan + Apply lived in sibling files were silently omitted. Added planLikeReceiversInDir: directory-wide method-set scan. Per-file fallback retained for isolated single- file targets. Important: #10 [lint] Per-file parse/type-check errors accumulated in report.errors but exit code stayed 0 if there were no findings — green CI hid coverage gaps. 
Now exits 1 on either findings OR errors. #11 [refactor-apply] -report-file mode flag never appeared in usage text. Documented in main.go's global usage block (the `-h` path intercepts before the per-mode FlagSet). Plan-doc gap surfaced for retro: §T8.3 line 2373 reads "replaces with `return wfctlhelpers.Plan(ctx, p, desired, current)`", but no such function exists; reality is `platform.ComputePlan`. Recurring defect class (plan-literal vs reality gap, W-4/W-5/W-7/W-9/W-8). Documented in planHelperImportPath docstring + this commit body. * fix(codemod): Copilot review round 2 — 5 critical + 4 important findings Round 2 surfaced 9 substantive findings; all addressed: Critical (compile-break / contract-break): #1 [refactor-plan, lint] platform.ComputePlan returns IaCPlan BY VALUE, but provider Plan methods return *IaCPlan. Single-statement `return platform.ComputePlan(...)` rewrite produced uncompilable code. Switched to canonical 2-statement form: plan, err := platform.ComputePlan(ctx, p, desired, current) return &plan, err isAlreadyDelegatedPlanBody widened to recognise both the new shape and the legacy single-statement forms (idempotent across revs). #3 [refactor-plan] rewritePlanBody fell back to recvName="p" but didn't update the receiver decl when the source had an unnamed receiver (`func (*Provider) Plan(...)`). Rewritten call referenced undefined `p`. Added ensureReceiverName: injects identifier and mutates the AST. Regression: TestRefactorPlan_Fix_UnnamedReceiverGetsName. Also added: TestRefactorPlan_Fix_PreservesCustomCtxName for round-1 finding #2 (custom ctx name preserved). #4 [refactor-apply] Same unnamed-receiver bug as #3. Same fix (ensureReceiverName + ensureCtxParamName + ensureNthParamName helpers shared with refactor-plan). Regression: TestRefactorApply_Fix_UnnamedReceiverGetsName. #5 [add-validate-plan] Stub always emitted `func (p *T) ValidatePlan(...)` even when the type used value receivers. 
Method-set mismatch made the type fail interfaces.ProviderValidator type assertion. Added providerReceiverConvention + receiverIsPointer; stub now matches the existing Plan/Apply convention. Regression: TestAddValidatePlan_Fix_ValueReceiverConvention. Important (skip-marker not honored in lint, single-file pass): #6 [lint] AssertPlanDelegatesToHelper checked fn.Doc only, ignoring type-doc and GenDecl-doc skip markers. Added receiverTypeDocsForPass helper; analyzer now checks all 3 doc levels. #7 [lint] AssertApplyDelegatesToHelper — same fix as #6. #8 [lint] AssertDiffSetsNeedsReplaceForForceNew — same fix as #6. #9 [lint] lintFile passed only the target file to the analyzers, so cross-file method sets were invisible (same blind spot the refactor-* modes had in round 1). Now lintFile loads sibling non-test .go files from the same package directory and feeds the full slice to each analyzer; diagnostics for sibling files are dropped (the outer walker visits them in their own turn) so no duplicate findings. All 4 modes now compile-clean rewrites + honor 3-level skip-marker + package-aware method-set detection. * fix(codemod): Copilot review round 3 — 6 critical + 1 important findings Round 3 surfaced 7 substantive findings; all addressed: Critical (compile-break / silent data loss): #1 [add-validate-plan] Directory-wide detection only widened `provs` in round 2; methodsByRecv stayed file-local. A provider with ValidatePlan in a sibling file (or value-receiver Plan/Apply declared elsewhere) would receive a duplicate or wrong-receiver stub. Now planLikeProviderMethodsInDir returns both the recv set AND the merged method slice; methodsByRecv carries the package-wide view (deduped by method name). Stub injection still only fires when typeDecls[recv] is non-nil so we never append to a sibling file. #2 [refactor-plan] isCanonicalPlanBody accepted ANY 2-result return statement at the trailing slot. 
A planner with the canonical scaffold but a bespoke return (cloned plan, propagated error value) would classify as canonical and the bespoke logic would be silently dropped. Tightened to require EXACTLY `return plan, nil`. #3 [refactor-plan] rewritePlanBody hardcoded "desired"/"current" as args. A canonical Plan with renamed params (e.g. `Plan(ctx, specs, state)`) would rewrite to references to undefined identifiers. ensureNthParamName now extracts the actual signature names. #4 [refactor-plan] rewritePlanBody hardcoded "platform" as the call selector. A file using `pf "github.com/.../platform"` wouldn't compile because `platform` is undefined (ensureImport sees the aliased import as satisfying the path check). Added pkgAliasFor helper; rewrite now uses whatever local name the file imports under. #5 [refactor-apply] caseBodyIsCanonical accepted ANY AssignStmt as canonical. Bookkeeping AssignStmts (metrics counters, map updates, accumulators) passed and would be silently dropped. Tightened to a narrow whitelist: multi-target driver call, single-target driver call (LHS=err), composite-literal construction, selector-assignment to ResourceRef-style fields (ProviderID/Name/Type). Anything else rejected. #6 [refactor-apply] Same import-alias issue as #4 for `wfctlhelpers`. pkgAliasFor reused; rewriteApplyBody now uses whatever local name the file imports under. Important: #7 [lint] AssertProviderImplementsValidatePlan checked ts.Doc only, missing markers placed on the wrapping GenDecl. Aligns now with the receiverDoc.carriesMarker pattern used by the other 3 analyzers (round-2 #6/#7/#8). typeDocsByName captures both TypeSpec.Doc and GenDecl.Doc. Round-2 regression tests retained (TestRefactorPlan_Fix_UnnamedReceiverGetsName, TestRefactorPlan_Fix_PreservesCustomCtxName, TestRefactorApply_Fix_UnnamedReceiverGetsName, TestAddValidatePlan_Fix_ValueReceiverConvention). 
Round-3 fix verified end-to-end against an aliased-import fixture (pf "github.com/.../platform" + wfh "github.com/.../wfctlhelpers"): the rewritten output compiles cleanly under gofmt. * fix(codemod): Copilot review round 4 — 6 critical-detection-loosening findings Round 4 surfaced 6 findings, all real. The recurring theme: rev3's pattern detectors were either too loose (accepted bookkeeping shapes as canonical) or too rigid (literal package-name matching, breaking on aliased imports). Fixes: #1 [add-validate-plan] interfacesQualifier(file) returned "" when the type-only file (no Plan/Apply imports) received the stub via cross-file detection (round-3 #1). Stub then emitted unqualified types that wouldn't compile. Now: when the file lacks an interfaces import but ANY sibling does, fall back to "interfaces" qualifier AND inject the interfaces import into the type-file via AST printing (format.Node) before appending the stub. Added siblingUsesInterfacesImport helper. #2 [refactor-apply] isCanonicalCaseAssign accepted ANY composite literal (`x := <CompositeLit>`) as canonical. A bookkeeping struct construction (audit payload, metric envelope) silently passed. Tightened to require the literal type's name (qualified or unqualified) match "ResourceRef". #3 [refactor-apply] isDriverMethodCall only checked selector NAME (Create/Read/Update/Delete). Calls like `helper.Update(...)` or `metrics.Delete(...)` were misclassified as canonical driver dispatch. Added receiver-allowlist check: only `d`, `drv`, or `driver` accepted as driver-bound identifiers (matching the standard `d, err := p.ResourceDriver(...)` pattern in DO/AWS/GCP/Azure). #4 [refactor-apply, refactor-plan] isAlreadyDelegatedApplyBody and isAlreadyDelegatedPlanBody required literal `wfctlhelpers` / `platform` package idents. Files using aliased imports (`wf "..."`, `pf "..."`) were misreported as non-canonical even though they were valid delegations. 
Both functions now resolve the file's local alias via pkgAliasFor; literal names retained as fallbacks. Same fix for isPlatformComputePlanAssign (the helper inside isAlreadyDelegatedPlanBody). #5 [lint] AssertPlanDelegatesToHelper / AssertApplyDelegatesToHelper selector matchers required literal `platform` / `wfctlhelpers` package names. Same false-positive risk as #4 for aliased imports. Both analyzers now resolve the alias and accept either the aliased OR literal form. #6 [refactor-apply] caseBodyIsCanonical accepted ANY DeclStmt as canonical, so `var x SomeBookkeepingType` declarations passed even though they're exactly the bespoke logic the codemod is supposed to preserve. Tightened via isLocalOutPointerDecl: only `var <name> *<ResourceOutput-suffix>` accepted. Smoke-tested against an aliased-import fixture (`wf "...wfctlhelpers"` + `pf "...platform"`): - refactor-apply correctly classifies as already-delegated (was: misreported as missing-action-switch) - lint reports 0 findings (was: false-positive AssertPlanDelegatesToHelper + AssertApplyDelegatesToHelper) * fix(codemod): Copilot review round 5 — 9 deeper-detection findings Round 5 surfaced 9 findings; all addressed. Recurring theme: the detectors and reporters needed deeper structural verification (branch contents, outer-shape, receiver-kind, package isolation, exit-code semantics) — not just shape matching at one level. Critical (silent data loss / repair regression): #1 [refactor-plan] rangeBodyMatchesCanonicalDesired only checked the guard expressions and statement count; never inspected what the `!exists` and `configHash != configHash` branch BODIES did. A planner with extra logic (telemetry, alternate action construction, different create/update payload) inside those branches was silently rewritten away. Added isCanonicalCreateBranchBody + isCanonicalUpdateBranchBody + isPlanActionsAppendAssign to verify the create branch is exactly `append+continue` and the update branch is exactly `append`. 
#2 [refactor-apply] classifyApplyBody verified only the switch shape; setup/teardown/result aggregation OUTSIDE the switch was silently dropped on -fix. Added isCanonicalApplyOuterShape: the Apply body must be exactly the 3-statement scaffold (result-init + range-loop + return result, nil). #3 [add-validate-plan] hasValidatePlanMethod ignored receiver kind. A value-receiver provider with a pointer-receiver ValidatePlan still failed the ProviderValidator type assertion (method-set on `T` does not include `*T` methods), but rev2 treated it as already-implemented. Now also requires receiver-kind match. #4 [lint] AssertProviderImplementsValidatePlan had the same receiver-kind blind spot. Now delegates to hasValidatePlanMethod (centralised + DRY). #5 [refactor-plan] isAlreadyDelegatedPlanBody accepted single-statement `return platform.ComputePlan(...)` (broken rev1 form) as already-delegated, so rerunning the fixed codemod never repaired output from the earlier broken rewrite. Now ONLY accepts the canonical 2-statement form; broken single-statement forms classify as non-canonical so a fresh -fix produces compilable output. #6 [refactor-plan] planLikeProviderMethodsInDir merged methods from every non-test .go file regardless of `package P` clause. Mixed- package or build-tagged directories could fold methods from unrelated packages into a synthetic provider. Added two-pass package-clause check: aggregate only files matching the dominant package. Important (CI fidelity / detector recall): #7 [Makefile, lint] `|| true` in migrate-providers swallowed real execution failures alongside expected advisory findings, because lint returned 1 for both findings AND parse errors. Split the exit codes: 0 clean / 1 findings / 2 errors. Makefile now gates on `[ $? -ne 0 ] && [ $? -ne 1 ]` so parse errors fail the target. #8 [refactor-plan] Canonical matcher hardcoded the lookup flag name as `exists`. The semantically-identical `cur, ok :=` idiomatic Go form was reported non-canonical. 
Widened to accept both `exists` and `ok`. #9 [refactor-apply] isDriverMethodCall allowlist {d, drv, driver} missed common alternates. Widened to {d, dr, drv, rdrv, driver, resourceDriver}. Still rejects bookkeeping receivers like `metrics`, `audit`, `helper` (preserves round-4 #3 fix). End-to-end verification: lint against DO plugin produces exit 1 (3 advisory findings, no errors); broken-Go-source produces exit 2; clean source produces exit 0. Smoke-tested via /tmp/iac-codemod. * fix(codemod): Copilot review round 6 — type-doc skip-marker honored across sibling files Round 6 surfaced 1 finding: #1 [refactor-plan, refactor-apply, lint] receiverTypeDocs ran per-file only, so a `// wfctl:skip-iac-codemod` marker placed on a SIBLING file's type declaration was ignored when processing methods in the primary file. Round-3's directory-wide method-set scan made this layout possible (provider type in types.go, Plan/Apply in provider.go, skip-marker on the type), but the type-doc lookup wasn't widened in tandem. Effectively: providers explicitly opted out at the type-doc level were still rewritten if their methods were in a different file from the type. Fix: - Added receiverTypeDocsInDir(dir, primary) — merges receiverTypeDocs across every non-test .go file in dir whose `package P` matches the dominant package. Honors the same dominant-package filter introduced in round-5 #6 to keep build-tagged / mixed-package directories safe. - refactor-plan + refactor-apply switched from receiverTypeDocs(file) to receiverTypeDocsInDir(filepath.Dir(path), file). - lint's receiverTypeDocsForPass refactored to build a SINGLE merged map across pass.Files (which is already directory-wide after round-2 #9) and return it per-file. First-occurrence wins. add_validate_plan unaffected: stub injection only fires when typeDecls[recv] != nil (type IS in the current file), so its skip-marker check on ts.Doc was never the cross-file scenario. 
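The directory-wide skip-marker lookup from round 6 can be sketched roughly as follows. This is only an illustration under stated assumptions: `skipMarkedTypes` and its in-memory `sources` map are hypothetical stand-ins for `receiverTypeDocsInDir`, whose real signature and return type are not shown in this log. The sketch keeps the two properties the commit describes: files outside the dominant package are ignored, and the first occurrence of a type name wins.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// skipMarker is the opt-out comment quoted in the commit message.
const skipMarker = "wfctl:skip-iac-codemod"

// skipMarkedTypes (hypothetical helper) parses every source in order and
// reports, per type name, whether its doc comment carries the skip marker.
// Files whose package clause differs from pkg are ignored; the first
// declaration of a type name wins.
func skipMarkedTypes(pkg string, order []string, sources map[string]string) (map[string]bool, error) {
	fset := token.NewFileSet()
	out := map[string]bool{}
	for _, name := range order {
		f, err := parser.ParseFile(fset, name, sources[name], parser.ParseComments)
		if err != nil {
			return nil, err
		}
		if f.Name.Name != pkg {
			continue // mixed-package directory: not the dominant package
		}
		for _, decl := range f.Decls {
			gd, ok := decl.(*ast.GenDecl)
			if !ok || gd.Tok != token.TYPE {
				continue
			}
			for _, spec := range gd.Specs {
				ts := spec.(*ast.TypeSpec)
				if _, seen := out[ts.Name.Name]; seen {
					continue
				}
				doc := ts.Doc
				if doc == nil {
					doc = gd.Doc // ungrouped `type X ...`: the doc sits on the GenDecl
				}
				out[ts.Name.Name] = hasMarker(doc)
			}
		}
	}
	return out, nil
}

// hasMarker reports whether any comment line in the group contains the marker.
func hasMarker(doc *ast.CommentGroup) bool {
	if doc == nil {
		return false
	}
	for _, c := range doc.List {
		if strings.Contains(c.Text, skipMarker) {
			return true
		}
	}
	return false
}

func main() {
	sources := map[string]string{
		"types.go":    "package provider\n\n// wfctl:skip-iac-codemod\ntype DOProvider struct{}\n",
		"provider.go": "package provider\n\nfunc (p *DOProvider) Plan() {}\n",
	}
	marked, err := skipMarkedTypes("provider", []string{"provider.go", "types.go"}, sources)
	if err != nil {
		panic(err)
	}
	// The marker lives in types.go but is visible when processing provider.go.
	fmt.Println(marked["DOProvider"])
}
```

With a map like this built once per directory, a method in provider.go can consult the marker on a type declared in types.go, which is the layout round 6 describes.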
* fix(codemod): Copilot review round 7 — 6 cross-file + detection-tightening findings Round 7 surfaced 10 findings; 4 were stale (already fixed in R6). 6 real findings addressed: Critical (compile-break / silent data loss): #1 [refactor-plan] isPlanActionsAppendAssign verified the LHS but not append's first argument. A bespoke `plan.Actions = append(otherSlice, ...)` was misclassified as canonical and the alternate-slice logic silently dropped during rewrite. Now both LHS and append's first arg must reference plan.Actions. #3+#9 [refactor-apply] isCanonicalApplyOuterShape only checked the outer 3-statement scaffold; per-action logic INSIDE the for-loop body (logging, metrics, custom error handling, accumulators) was silently dropped on -fix. Added isCanonicalApplyLoopBody + isCanonicalApplyLoopAssign + isCanonicalApplyLoopIf + isCanonicalApplyLoopIfBodyStmt: every loop-body statement must match a tight whitelist (driver lookup, var-out decl, action switch, err-/out-guard ifs). #7+#8 [add-validate-plan] provs[recv].Pos() panicked when the TypeSpec was nil (cross-file scenario from round-3 #1: type declaration in sibling file). Now defaults Pos to NoPos for nil specs; sort still works (stable on name when Pos ties). Important (cross-file consistency): #4 [add-validate-plan] qualifier fallback to "interfaces" fired based on whether ANY sibling imported interfaces — unreliable if THIS provider uses local types but an unrelated sibling imports interfaces. Replaced with qualifierFromProviderMethods: inspects the provider's OWN Plan/Apply parameter types (directory-wide via round-3 #1) for the qualifier they use. #5 [add-validate-plan] skip-marker check only consulted typeDecls (current file). When Plan/Apply are here but the type with `// wfctl:skip-iac-codemod` lives in a SIBLING file, the marker was ignored. Added siblingTypeDocs lookup via receiverTypeDocsInDir (the round-6 helper). #10 [add-validate-plan] sibling-method merge deduped by method NAME only. 
If local file has wrong-signature ValidatePlan and sibling has correct one, sibling dropped, hasValidatePlanMethod saw only bad declaration, injected duplicate stub. Replaced with isLocalDuplicate: dedupes by name + parameter arity + result arity, so distinct signatures both survive. Stale findings (already fixed in R6, no action needed): #2 refactor-apply receiverTypeDocsInDir already in place #6 lint receiver-doc lookup already merged via receiverTypeDocsForPass Smoke-tested against DO plugin: refactor-plan reports DOProvider.Plan canonical, refactor-apply reports DOProvider.Apply upsert-recovery with the upsertSupporter suggestion. Output matches T8.7 baseline. * fix(codemod): Copilot review round 8 — 9 dedup + skip-marker + identifier-name findings Round 8 surfaced 9 findings; all addressed: Critical (silent data loss / behavior change): #1 [add-validate-plan] isLocalDuplicate compared by name+arity only. Wrong-signature ValidatePlan(name string) []PlanDiagnostic and correct ValidatePlan(plan *IaCPlan) []PlanDiagnostic have same arity but different types — sibling-correct dropped, duplicate stub injected. Replaced with signature-fingerprint dedupe (signatureFingerprint + typeFingerprint walk all type shapes). #4 [refactor-apply] `default:` case clauses accepted without body inspection. Logging/metrics in default body silently dropped. Added isCanonicalDefaultBody: only `err = fmt.Errorf("unknown action %q", ...)` accepted. #5 [refactor-apply] isCanonicalApplyLoopAssign accepted any `<x>.ResourceDriver(...)`. `helper.ResourceDriver(...)` / `plan.ResourceDriver(...)` falsely classified. Now requires the receiver to match the provider's own receiver identifier (threaded through from classifyApplyBody). #8 [refactor-apply] Bare `if err != nil { continue }` accepted as canonical, but wfctlhelpers ALWAYS records ActionError before continuing — the rewrite would silently change behavior. 
Now requires the if-body to ALSO append to result.Errors before any continue/break. Important (skip-marker scope + identifier flexibility): #2 [add-validate-plan] Skip-marker check fired on EVERY method's fn.Doc — a marker on Destroy/Status/etc. accidentally suppressed the whole provider's analysis. Restricted to Plan/Apply (the provider-defining methods). #3 [lint] AssertProviderImplementsValidatePlan — same fix as #2. #6 [refactor-plan] Canonical detector hardcoded `current`/`desired` body identifiers. Providers using `state`/`specs` reported non-canonical despite rewriter preserving names. Added nthParamName extraction; isCanonicalPlanBody now takes the actual parameter names. #7 [refactor-apply] Driver-receiver allowlist comment claimed `rd` accepted, but the switch was missing it. Added. #9 [refactor-apply] Canonical detector hardcoded `result` /`plan` identifier names. Providers using `res` /`pl` rejected. Now recovers actual identifier from signature (planName) and from statement-1 LHS (resultName); both must be consistent within the body but can be any identifier. Smoke-tested against DO plugin: refactor-plan / refactor-apply still report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with stable upsertSupporter suggestion. Output matches T8.7 baseline. Removed redundant TestRefactorApply_Fix_UnnamedReceiverGetsName: the unnamed-receiver path can't have a canonical-shape Apply body (`<recv>.ResourceDriver(...)` requires recv in scope). Receiver-name injection is shared between refactor-plan and refactor-apply via ensureReceiverName; coverage stays in TestRefactorPlan_Fix_UnnamedReceiverGetsName. * fix(codemod): Copilot review round 9 — 4 behavior-preservation findings Round 9 surfaced 4 findings; all addressed: Critical (silent behavior change): #1 [refactor-apply] If-guard body accepted bare `break`, but wfctlhelpers.ApplyPlan records the error and KEEPS processing later actions. A `break` would silently change loop semantics on rewrite. 
Now only `continue` is accepted in if-guard bodies. #2 [refactor-apply] Driver-method allowlist accepted `Driver` / `DriverFor` alongside `ResourceDriver`. wfctlhelpers dispatches SPECIFICALLY through IaCProvider.ResourceDriver; a wrapper like `provider.Driver(...)` would have its caching/instrumentation bypassed. Restricted to `ResourceDriver` only. Important (false positives / cross-file alias mismatch): #3 [add-validate-plan, lint] Receiver-kind enforcement was too strict. Per Go spec, `*T`'s method set includes BOTH pointer-receiver and value-receiver methods of T. So a value-receiver ValidatePlan on a pointer-receiver provider IS valid (satisfies ProviderValidator). hasValidatePlanMethod now only requires strict matching when the provider uses VALUE receivers (T's method set excludes *T methods). #4 [add-validate-plan] When the qualifier was derived from a sibling method's aliased import (e.g. `iface "github.com/.../interfaces"`), the post-loop import injection used unaliased `ensureImport`, leaving the stub's `iface.IaCPlan` referring to undefined `iface`. Added ensureImportAs helper; now the import alias matches the stub's qualifier. Smoke-tested against DO plugin: refactor-plan / refactor-apply still report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with stable upsertSupporter suggestion. Output matches T8.7 baseline. * fix(codemod): Copilot review round 10 — 8 cross-file + perf + tightening Round 10 surfaced 8 findings; all addressed: Critical (cross-file duplicate stub / silent override): #1 [add-validate-plan] Cross-file duplicate stub injection: when type is in file_a and Plan/Apply are in file_b, both files classified as missing-ValidatePlan and -fix injected duplicate stubs. Now only inject in the file containing the receiver TypeSpec (`if ts == nil { skip }`); the type-file's own pass handles it. 
#2 [add-validate-plan] Embedded-field promoted ValidatePlan not detected; -fix would shadow it with a no-op stub, silently dropping real plan diagnostics. Added typeHasEmbeddedFields: if the receiver type has any embedded fields, suppress the missing classification (we can't statically resolve method promotion without full type info, so err on the side of NOT injecting). #3 [lint] AssertProviderImplementsValidatePlan — same fix as #2. #4 [refactor-apply] ProviderID/Name/Type assignment-target whitelist didn't check struct identity. `audit.Type = ...` or `result.ProviderID = ...` (wrong struct) classified as canonical and dropped on rewrite. Now requires the LHS receiver to be `ref` (the canonical ResourceRef construction site name). Important (perf / determinism / lint precision): #5 [lint] O(n²) lintFile re-parsed every sibling per-call. Added lintDirCache: lintPath now groups files by directory and builds one parse cache per dir, reused across the directory's files. Per-call fallback retained for single-file invocation. #6 [refactor-plan] planLikeProviderMethodsInDir's dominant-package selection used range-over-map (random iteration), so on a package-count tie the dominant could differ across runs and rewrite against the wrong method set. Sort the package names so tie-break is lexicographic-first (deterministic). #7 [lint] AssertPlanDelegatesToHelper accepted ANY platform.ComputePlan call ANYWHERE in the body. Now requires the canonical SHAPE: either the 2-statement rev2 form (matches isAlreadyDelegatedPlanBody) OR a single-statement legacy `return <X>.Plan(...)` / `return <X>.ComputePlan(...)`. Bespoke wrappers that call the helper as an intermediate step now correctly flag. #8 [lint] AssertApplyDelegatesToHelper — same fix: now uses isAlreadyDelegatedApplyBody (the rewriter's idempotency check) so anything but the canonical single-statement form flags. 
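The deterministic tie-break from finding #6 can be illustrated with a short sketch. It assumes the per-package file counts have already been collected into a map; `dominantPackage` is a hypothetical name, and the real `planLikeProviderMethodsInDir` logic may differ in detail. The point is only that sorting the keys before comparing counts removes the dependence on Go's randomized map iteration order.

```go
package main

import (
	"fmt"
	"sort"
)

// dominantPackage (hypothetical helper) returns the package name with the
// most files. On a tie, the lexicographically first name wins, so the
// result is identical on every run.
func dominantPackage(counts map[string]int) string {
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names) // fixed order instead of random map iteration

	best, bestCount := "", -1
	for _, name := range names {
		// Strict > keeps the earlier (lexicographically smaller) name on ties.
		if counts[name] > bestCount {
			best, bestCount = name, counts[name]
		}
	}
	return best
}

func main() {
	counts := map[string]int{"provider": 2, "alpha": 2, "tools": 1}
	fmt.Println(dominantPackage(counts)) // prints "alpha"
}
```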
Smoke-tested against DO plugin: refactor-plan / refactor-apply still report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with stable upsertSupporter suggestion. Output matches T8.7 baseline. * fix(codemod): Copilot review round 11 — 6 polish + revert findings Round 11 surfaced 6 findings; all addressed: Critical (broken-output false-clean / mode clobber): #1 [lint] planBodyDelegatesCanonically accepted single-statement `return platform.ComputePlan(...)` (the BROKEN rev1 form, uncompilable due to value/pointer mismatch). Lint reported partially-migrated providers as clean, so migrate-providers silently missed them. Now ONLY the canonical 2-statement rev2 form OR legacy `return wfctlhelpers.Plan(...)` is accepted; the broken single-statement platform form falls through to non-canonical so lint surfaces the still-needs-fixup state. #2 [refactor-plan] writeFileAtomic left the temp file at os.CreateTemp's default 0600 mode; rename clobbered the source's original permissions (e.g., 0644 → 0600). Added writeFileAtomicBytesPreserveMode: captures original mode via os.Stat and chmods the temp file before rename. #5 [add-validate-plan] Same 0600 mode-clobber bug in writeFileAtomicBytes. Now delegates to writeFileAtomicBytesPreserveMode. Important (revert + comment polish): #3 [add-validate-plan] Round-10 #2's "any embedded field suppresses missing-ValidatePlan" was too broad — sync.Mutex, loggers, config mixins don't promote ValidatePlan, so real targets were silently missed. Reverted: report missing unconditionally. Maintainers whose providers actually promote ValidatePlan suppress with the explicit `// wfctl:skip-iac-codemod` marker. #4 [lint] AssertProviderImplementsValidatePlan — same revert as #3. #6 [refactor-plan] Stale enum comment for planAlreadyDelegated still referenced `wfctlhelpers.Plan` as the recognised shape; actual implementation recognises the 2-statement platform.ComputePlan form. Comment updated. 
Removed dead typeHasEmbeddedFields helper (both call sites reverted in #3/#4). Source-file mode preservation verified end-to-end: chmod 0644 → -fix → stat shows 0644 retained. Smoke-tested against DO plugin: refactor-plan / refactor-apply still report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with stable upsertSupporter suggestion. Output matches T8.7 baseline. * fix(codemod): Copilot review round 12 — 8 dispatcher + detection-tightening findings Round 12 surfaced 8 findings; all addressed: Critical (CLI bug + silent rewrite of wrong file): #1 [main.go] Top-level dispatcher used a single FlagSet with only -dry-run and -fix registered, so any mode-specific flag (e.g. refactor-apply's -report-file) failed with "flag provided but not defined" BEFORE the mode could parse it. -report-file was documented but UNUSABLE from the CLI entrypoint. Replaced stdlib FlagSet with a manual-scan loop in run(): -dry-run/-fix are extracted; everything else (including unknown flags) flows through to the mode's own FlagSet. Bonus: flag-position flexibility (`/path -fix` now works), updated test + usage text accordingly. #2 [refactor-plan] Walked every .go file but built provs/typeDocs from only the dominant package. Mixed-package or build-tagged directories: a non-dominant file with overlapping receiver names was processed against another package's method set, rewriting the wrong file. Added dominantPackageForDir; each file processor now skips files in non-dominant packages. #3 [refactor-apply] Same fix as #2. #4 [add-validate-plan] Same fix as #2. Important (canonical-detection precision): #5 [refactor-plan] isPlanActionsAppendAssign didn't validate the appended action's payload — `plan.Actions = append(plan.Actions, PlanAction{Action: "queue"})` was misclassified as canonical and silently rewritten. Added `expectedAction` parameter; create branch requires `Action: "create"` and update branch requires `Action: "update"`. 
#6 [refactor-apply] hasCanonicalCases verified case labels but not that the body's driver call MATCHED the label. A `case "create"` body that called `.Update()` or `.Delete()` was misclassified and silently rewritten away. Added caseBodyMatchesLabel: scans each case body for driver method calls and verifies the label- to-method mapping (create→Create, update→Update, delete→Delete, replace→Update). #7 [refactor-apply] Driver-lookup check accepted any `<recv>.ResourceDriver(<arg>)` regardless of <arg>. wfctlhelpers always dispatches with `action.Resource.Type`, so providers using a different lookup key (e.g. action.Tag, computed value) would see different driver behavior on rewrite. Now requires the lookup key to be exactly `action.Resource.Type`. #8 [lint] looksLikeProvider checked method NAMES + rough arity, so any unrelated type with `Plan(...)` and `Apply(...)` was treated as a provider (e.g., a deploy strategy). Tightened to verify signature shapes via type-name suffix matching: Plan must be `Plan(ctx, []ResourceSpec, []ResourceState) (*IaCPlan, error)` and Apply must be `Apply(ctx, *IaCPlan) (*ApplyResult, error)`. Qualified or unqualified accepted via typeNameTailMatches. Smoke-tested: - `iac-codemod refactor-apply -report-file <path> <dir>` now works (previously: "flag provided but not defined") - DO plugin still reports DOProvider.Plan canonical / Apply upsert-recovery with stable upsertSupporter suggestion (T8.7 baseline preserved) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
intel352
added a commit
that referenced
this pull request
May 13, 2026
* fix(plugin/external): handle empty ConfigMessage for input-only STRICT_PROTO step contracts Steps that declare STRICT_PROTO mode + InputMessage + OutputMessage but no ConfigMessage (e.g., step.eventbus.ack, step.eventbus.publish) failed engine initialization with: STRICT_PROTO contract for config message "" cannot use legacy Struct fallback: missing protobuf message name The step has no per-instance config schema — data flows through the input message. Engine now treats empty ConfigMessage as "no typed config", encodes cfg as legacy *structpb.Struct, returns nil typed payload. Plugin's typed factory reads from InputMessage as designed. Caught by BMW PR #278 image-launch smoke against v0.51.3 + eventbus v0.3.0 (steps.eventbus.{ack,publish,consume} have empty ConfigMessage). Test: TestCreateTypedConfigRequestEmptyConfigMessageStrictProto. * fix: address Copilot review — comment scope + test asserts both nil + non-nil cfg paths * docs(#617): design for godo removal from workflow core Force-cutover single-PR plan: delete 11 legacy DO modules+steps (~3042 LOC), strip 8 registration sites, remove godo from go.mod, add load-time migration error pointing to workflow-plugin-digitalocean + infra.* IaC types. AWS SDK audit deferred to follow-up issue (will auto-progress after merge). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): revise design per adversarial review cycle 1 C-1 fix: add step-type migration guard (5 step.do_* types) alongside the module-type guard; error message branches on plugin-loaded detection. I-1 fix: parity matrix split into per-step rows; step.do_logs and step.do_scale flagged as GAPs with pre-merge follow-up issues in workflow-plugin-digitalocean. I-2 fix: migration error has two branches — 'install plugin' vs 'config-only issue, plugin already loaded'. Minors: exact grep invocation in T4; dns.go typo; infra_apply_test.go:1990 added to T2 review list. 
Companion: wfctl modernize rules in scope of T5 (auto-rewrite YAML). Considered approaches: added Option B' (build tag fence — rejected). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): revise design per adversarial review cycle 2 I-1: platform_doks_test.go (164 LOC) added to deletion inventory. Total now 12 files / ~3206 LOC; T1 scope updated. m-1: wfctl modernize flag corrected (--apply, not --write). m-2: example/ sub-module go.mod also pins godo as indirect; T4 now runs go mod tidy in both root and example/, plus a second grep over go.mod files to catch residual indirect dependencies. Cycle-1 fixes verified to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): incorporate adversarial cycle 3 minor amendments (PASS) m-1: grep gates now !-prefixed to fail CI on match (|| true was silent no-op). m-2: plugin-loaded detection simplified to single factory-map lookup. m-3: workflow-scenarios migration sequencing constraint added. t-1: T2 file count 9→10. Cycle 3 verdict PASS (0 Critical / 0 Important / 3 Minor incorporated). Pipeline advances to writing-plans. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): implementation plan (5 tasks, 1 PR) Single-PR force-cutover, 5 tasks: T1: delete 12 legacy DO files (~3206 LOC) T2: strip 10 registration sites + remap wfctl detection hooks T3: add legacy-type migration error guards (module + step paths) T4: go mod tidy + CI grep gate T5: docs + CHANGELOG + migration guide + wfctl modernize rules + file follow-up issues in workflow-plugin-digitalocean (logs/scale GAPs) and workflow (AWS audit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): revise plan per adversarial review cycle 1 (plan phase) C-1 fix: T3 engine test uses NewStdEngine(app, logger) + AddModuleType() per engine.go:146,210; package workflow. 
C-2 fix: T3 step test uses module.NewStepRegistry().Create() per pipeline_step_registry.go:18,32.
I-1 fix: T2 test calls KnownModuleTypes() / KnownStepTypes() directly (the previously referenced buildTypeRegistry() does not exist).
I-2 fix: iacProviderLoaded is now sync/atomic.Bool with IsIaCProviderLoaded() accessor; eliminates the race with parallel tests under go test -race.
I-3 fix: gap-type modernize test covers all 3 gap types (do_logs, do_scale, do_networking); previously only the first two.
m-1: acknowledged walkTypeNodes vs walkNodes duplication; documented intent.
m-2 fix: module.RemovedInVersion constant; no more v0.52.0 sprinkled in 7+ places.
m-3 fix: modernize/testdata/legacy-do-config{,.expected}.yaml committed; end-of-PR checklist points at them.
End-of-PR checklist: added mandatory `go test -race ./...`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise plan per adversarial review cycle 2 (plan phase)

C-1 fix (scope-limit Option 2): modernize Fix only renames type:, does NOT inject config.provider:digitalocean. Migration guide now has an explicit manual provider-add step + example YAML + the error string the user will see.
C-2 fix: cmd/wfctl/deploy.go added to T2 (platform.* prefix collector + "no platform.* modules" error message, both updated to include infra.*).
I-1: newTestEngine intentional plugin omission documented.
I-2: T5 includes comment-hygiene cleanup for hasPlatformModules / isInfraType.
m-1 fix: newTestEngine uses mockLogger{} matching engine_test.go pattern.
m-2 fix: legacyDORemovedInVersion duplicated in modernize package (import cycle prevents shared constant) with keep-in-sync comment.
m-3 fix: AWS issue body now derives in-scope list from a runtime grep rather than copying speculative names.

Cycle 1 plan-phase fixes verified to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): revise plan per adversarial review cycle 3 (plan phase) C-1 fix: drop redeclared mockLogger from engine_legacy_do_migration_test.go; reuse the existing in-package type from engine_test.go:482. C-2 fix: drop legacyDORemovedInVersion duplicate; no import cycle exists (verified via go list). modernize now imports module and uses module.RemovedInVersion directly. Single source of truth. I-1 fix: add TestLegacyDOStepError_PluginLoaded (was missing — only not-loaded branch was tested for steps). m-1 fix: actions/checkout@v5 → @v4 (repo standard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): revise plan per adversarial review cycle 4 (plan phase) C-1 fix: extract shared constants/formatters to internal/legacydo leaf package. Earlier cycle's "no import cycle" claim was wrong: module→plugin→modernize is a real transitive chain (verified via go list -deps). modernize cannot import module. Both packages now import only the leaf legacydo package. I-1 fix: replace package-level atomic.Bool iacProviderLoaded global with StepRegistry instance field. Per-registry state; parallel tests can own fresh NewStepRegistry() instances; no global mutation between tests. Engine sets the field via r.SetIaCProviderLoaded(loaded) just before pipeline construction. I-2 fix: design doc drops the credential-registry-zero-DO-entries test (unimplementable — credentialResolvers is unexported). Rationale: registry is additive via init(); deleting file removes init() — self- evidencing. No API-surface-for-test added. m-1 fix: T2 spec includes rename of platformModules local variable to deployTargetModules in cmd/wfctl/deploy.go. Cycle 1/2/3 plan-phase fixes verified to hold. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): revise plan per adversarial review cycle 5 (plan phase) C-1 fix: schema.ValidateConfig fires at engine.go:400 BEFORE the factory loop at :506. Removing legacy DO types from schema/schema.go alone would cause the generic schema error to mask the actionable migration message. T3 now appends legacydo.ModuleTypes + StepTypes to schema.WithExtra{Module,Step}Types so schema passes them through to the factory guard — the real rejection point. I-1 fix: e.stepRegistry is interfaces.StepRegistrar; SetIaCProviderLoaded is not on the interface. Plan now uses the type-assertion pattern from engine.go:163,216 (matches precedent; interface NOT widened). I-2 fix: stale "T3 introduces a package-level atomic" comment in the end-of-PR checklist updated to reflect the per-registry instance field. m-1 fix: legacyDORule() unexported (matches peers); test in internal package modernize (matches sibling test files); external modernize import dropped. Cycle 1-4 plan-phase fixes verified to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): revise plan per adversarial review cycle 6 (plan phase) C-1 fix (two parts): - Phantom schema.WithExtraStepTypes: schema.ValidateConfig only checks module types, not step types. Step migration guard at StepRegistry.Create is correctly the sole gate. Step-types schema-injection sentence/loop deleted from T3. - wfctl validate path: cmd/wfctl/validate.go and ci_validate.go call schema.ValidateConfig directly (not via engine.BuildFromConfig). Without a hook, AC3 fails on these commands. T2 now includes both files: inject legacydo.ModuleTypes into opts + add post-ValidateConfig legacy sweep emitting legacydo.Format{Module,Step}Error. I-1 fix: `if len(...) > 0 || true` replaced with unconditional code (staticcheck SA4010 was a CI lint blocker). 
m-1: cycle-5 history line referenced the now-removed step-types injection; implicit fix via T3 edit. Cycle 1-5 fixes verified to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): revise plan per adversarial review cycle 7 (plan phase) C-1 fix: validate/ci_validate post-pass step sweep was incorrect — cfg.Pipelines is map[string]any (verified config/config.go:149), not a typed slice. T2 now uses yaml.Marshal/Unmarshal pattern matching engine.go configurePipelines. Also separates ciValidateFile's accumulating errs=append from validateFile's early-return. I-1 fix: added TestValidateFile_LegacyDOModule_ReturnsActionableError and TestCIValidateFile_LegacyDOStep_ReturnsActionableError to T2 to give AC3 automated coverage on the validate path. Cycle 1-6 fixes verified to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#617): convert task headings to H3 for scope-manifest check plan-scope-check.sh requires "### Task N:" headings (H3); plan was using H2. PR Grouping rows reference Task 1-5 and the body must match. Now passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: lock scope for issue #617 godo removal (alignment passed) * feat(#617): delete legacy DO modules (godo importers) Removes 12 files / ~3206 LOC. Registration sites cleaned in T2. * platform_do_app.go + test * platform_do_database.go + test * platform_do_dns.go + test * platform_do_networking.go + test * platform_doks.go + test * cloud_account_do.go (DO credential resolvers + doClient()) * pipeline_step_do.go (5 DO App Platform step types) Adds godo_absent_test.go as a regression gate inside module/. * feat(#617): strip DO registration sites + remap wfctl detection hooks * plugins/platform: drop 5 module + 5 step factories and manifest entries. * schema/*: drop 10 entries from module/step type lists + schema descriptions. Update editor-schemas.golden.json to match. 
* cmd/wfctl/type_registry.go: drop 10 legacy DO type entries. * cmd/wfctl/{infra.go,deploy_providers.go,ci_run_dryrun.go}: remap isContainerType and deployTargetTypes to remove platform.do_app. * cmd/wfctl/deploy.go: extend prefix check to include infra.* + rename platformModules → deployTargetModules + update error message. * module/multi_region.go: rewrite DOKS multi-region hint to point at infra.k8s_cluster + workflow-plugin-digitalocean. * cmd/wfctl/infra_apply_test.go: replace platform.do_app negative-test fixture with example.legacy_unknown synthetic type. * cmd/wfctl/{validate.go,ci_validate.go}: inject legacydo.ModuleTypes into schema opts + post-ValidateConfig sweep emits actionable migration errors. * cmd/wfctl/deploy_test.go: update error message assertion. Creates internal/legacydo/types.go (leaf package — stdlib only) with the legacy-DO type maps and message formatters needed by T3's engine/step-registry guards and this task's wfctl validate edits. Adds legacy_do_types_removed_test.go (registry-absence regression gate) + TestValidateFile_LegacyDOModule_ReturnsActionableError and TestCIValidateFile_LegacyDOStep_ReturnsActionableError (validate-path AC3). * feat(#617): actionable migration errors for legacy DO types Adds legacydo.FormatModuleError + legacydo.FormatStepError (already in internal/legacydo from T2) and wires them into two rejection points: engine.go:508 (module path) — factory-loop guard now emits the actionable migration error for the 5 removed legacy DO module types, branching on whether iac.provider is already registered in the engine. pipeline_step_registry.go:Create (step path) — unknown-step guard now emits the actionable migration error for the 5 removed legacy DO step types, using the per-registry iacProviderLoaded field set via SetIaCProviderLoaded before pipeline construction. 
engine.go:393-398 — guarded WithExtraModuleTypes block replaced with unconditional injection that also includes legacydo.ModuleTypes so that schema.ValidateConfig passes legacy DO module types through to the factory-loop guard (schema rejection would mask the migration message). SetIaCProviderLoaded bridges the boolean from engine to module package via type assertion (interface deliberately NOT widened — no method burden on alternate StepRegistrar implementors). Each step type gets a per-step message; step.do_logs and step.do_scale carry GAP messages with workarounds because no 1:1 pipeline-step successor exists yet (follow-up issues in T5). Tests: 5 module × 2 branches + 5 step × 2 branches = 12 sub-cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(#617): drop godo from go.mod + add CI grep gate * go mod tidy on root and example/ drops github.com/digitalocean/godo (direct from root, indirect from example/). * New CI job 'godo-banned' fails the build on any *.go import of godo OR any mention of godo in go.mod files. Excludes _worktrees, .worktrees, .claude (local agent state) and godo_absent_test.go (T1 regression gate that references the import path as a string literal, not an actual import). This satisfies acceptance criterion #4 (dependabot bumps target the provider repo, not workflow core). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(#617): wfctl modernize rule + migration guide + CHANGELOG * New modernize rule "legacy-do-types": auto-rewrites 5 module types and 3 of 5 step types to infra.*; flags but does not modify the two GAP step types (step.do_logs, step.do_scale) and the 1→2 platform.do_networking split. Registered in AllRules(). * testdata/legacy-do-config.yaml: smoke-test fixture exercising all 10 legacy types; testdata/legacy-do-config.expected.yaml: golden post-Fix output (types renamed, GAP types preserved, provider NOT auto-injected). * CHANGELOG: v0.52.0 BREAKING entry. 
* docs/migrations/v0.52.0-godo-removal.md: full migration guide with mapping tables, before/after YAML, error reference, rollback note. workflow-plugin-digitalocean follow-up issue URLs wired in: step.do_logs GAP → GoCodeAlone/workflow-plugin-digitalocean#107 step.do_scale GAP → GoCodeAlone/workflow-plugin-digitalocean#108 * DOCUMENTATION.md: replace 10 legacy DO rows with pointers to the plugin and the migration guide. * Comment hygiene: drop "legacy" framing from hasPlatformModules and parseInfraResourceSpecs doc comments (both functions correctly handle the surviving platform.kubernetes / platform.ecs module types). Follow-up issues filed: GoCodeAlone/workflow-plugin-digitalocean#107 — step.iac_logs GAP GoCodeAlone/workflow-plugin-digitalocean#108 — step.iac_scale GAP #653 — AWS SDK audit (continuation of #617) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(#617): bump modernize rule-count expectation to include legacy-do-types T5 appended legacyDORule (id: legacy-do-types) to AllRules() but missed this counter test in cmd/wfctl/modernize_test.go. Single-line fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#617): make legacy DO step modernize findings non-fixable; fix migration guide step example step.do_deploy/status/destroy require different config keys in their successors (platform + state_store vs legacy app:) — auto-rewriting the type alone produces an invalid config. Mark step findings Fixable: false, remove step rewrites from Fix(), update testdata fixture and tests to reflect unchanged step types post-modernize. Also update FormatStepError to include required config keys in the migration error message, and fix the migration guide pipeline-step example to show the correct step.iac_apply config shape. 
Addresses Copilot review comments: - r3232996570: make step findings non-fixable (option a) - r3232996648: fix migration guide step example config shape - r3232996683: fix testdata fixture to leave step types unchanged - r3232996732: add required config keys to step migration error Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(#617): update migration guide step table to reflect non-fixable step rewrites step.do_deploy/status/destroy are now flagged-not-rewritten (Fixable: false) because their successors use different config keys. Update the step mapping table Auto-fix column and the recipe description to match. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(lint): add return after t.Fatal to resolve SA5011 nil-dereference false positives staticcheck SA5011 flags t.Fatal()/t.Fatalf() as non-terminating because testing.T.Fatal calls runtime.Goexit (not a panic/return), which staticcheck does not model as a definite exit. Adding an unreachable `return` statement after each t.Fatal in iac/conformance scenarios makes the nil-guard pattern unambiguous to static analysis: execution cannot reach the pointer dereference if result/res is nil. Affected files: - iac/conformance/scenario_delete_action.go - iac/conformance/scenario_grpc_roundtrip.go - iac/conformance/scenario_replace_cascade_preserves_dependents.go - iac/conformance/scenario_upsert_on_already_exists.go Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address Copilot review round 2 — doc YAML shape + test coverage gap Two corrections from Copilot's second-pass review: 1. docs/migrations/v0.52.0-godo-removal.md: plugin install snippet used a top-level `plugins:` sequence with `source:` which does not match the app config schema (ExternalPluginDecl has no source field; PluginsConfig wraps external under plugins.external:). Replace with the correct `wfctl plugin install` CLI command + wfctl.yaml manifest form (WfctlPluginEntry has source). 2. 
module/godo_absent_test.go: `filepath.Glob("*.go")` is non-recursive and only checks the current directory, not subdirectories. The comment claimed it covered "no file under module/", which was misleading. Switch to `filepath.WalkDir(".", ...)` to make the assertion match the comment's intent and guard against future subdirectory additions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352
added a commit
that referenced
this pull request
May 14, 2026
…ipt CI gate (+ design & plan) (#668) * docs(plans): cloud-SDK extraction design — workflow core → strict-contract plugins Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and cloud.google.com/go + google.golang.org/api direct deps from workflow core's module/ package. Architecture: 3 extension surfaces, 3 strategies: - IaC state backends → new IaCStateBackend strict proto contract; iac.state stays core, config.backend dispatches to plugin gRPC client. - platform.* provisioners → new PlatformBackend strict proto contract; module types + provider: key stay core, kind backend stays in-core, cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract. - standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload, storage.s3, storage.gcs) → plugin-native module/step types via the existing ModuleFactories/StepFactories SDK — no new contract. Credentials (Option 1): each plugin-native module carries its own credentials: block + builds aws.Config in-process; optional in-plugin credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp cloud_account files have no SDK import and stay. 4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp, D digitalocean (spaces backend, minor bump + migration doc). Includes Assumptions + Rollback sections + self-challenge top-3 doubts (PlatformBackend over-generality, provider-separability fragility, benchmark-could-invalidate-unary-default — all with mitigations deferred to writing-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 1 revisions Addresses 2 Critical + 5 Important findings from adversarial-design-review: Critical: - iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit home: deleted by Phase B's core PR; Phase D reframed from soft-compat to a real clean-break for the `spaces` backend. 
Goal "core drops aws-sdk-go-v2 entirely" is now actually achieved by the phases as written + enforced by a go list -deps CI gate. - kinesis: added Non-Goals entry explaining it's a transitive dep of modular/modules/eventbus/v2, not a direct workflow import — out of scope, with the go mod why chain documented so the literal ask is fully answered. Important: - Full grep-verified 13-file AWS inventory table in Phase B with per-file destinations; reconciled aws_api_gateway.go (route-sync module) vs platform_apigateway.go (provisioner) as two distinct files. - aksBackend assigned to Phase A (Azure gets the PlatformBackend half too); platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with explicit always-compiles coordination. - Proto contracts fold into existing plugin/external/proto/iac.proto (8 services already) instead of new files — matches precedent. - New Security section: secret-redaction in config-version-store/tracing + gRPC interceptor logging are blocking writing-plans tasks; credentials_ref blast radius documented as strictly narrower than today's cloud.account. Minor: - IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no speculative surface. - Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR). - IaCProviderRequired/ResourceDriver reuse promoted to a first-class Alternatives Considered entry with accept/reject rationale + retained as the gated fallback for PlatformBackend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 2 revisions Addresses 2 Critical + 3 Important from cycle-2 review: Critical: - platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with its own import block). The "always compiles across phases" property is now structural, not asserted. 
Added a verified per-file import-ownership table. - Corrected the false Phase A rationale: aksBackend uses raw net/http REST, NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend region). The Azure go.mod drop comes entirely from iac_state_azure.go deletion + iac_module.go edit; aksBackend extraction is code-organisation, not a dependency change. - Documented the eksBackend → cloud_account_aws.go call-graph edge as a hard same-commit atomicity constraint (verified: eksBackend calls awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138). Important: - Phase B core-PR bullet now explicitly lists "strip the spaces case from iac_module.go" (was only obliquely referenced). - New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds contract field; SaveState lost-response retry → documented idempotent (full-state replace, last-writer-wins); plugin-unreachable → abort before mutation; PlatformBackend mid-Apply crash → identical to today's in-process risk, no new mitigation. - §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no body-logging interceptor (grpc.NewServer(opts...) passthrough; only callback_server.go logs, never module config). Writing-plans adds a guard test instead of a conditional interceptor. Minor: file-count table footnoted (count = importers, not deletions); shared s3compat module added as Alternatives Considered #3 (deferred, not rejected); self-challenge doubt numbering tidied (2 mitigations cover 3 doubts, intentionally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): fix stale Phase A/B refs + Status line post-cycle-2 sed in the cycle-2 commit ran from the wrong cwd — Status line still said "cycle 1" and two interface-audit-spike references still said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 3 revisions Addresses 2 Critical + 2 Important from cycle-3 review: Critical (same root — symbol-level coupling the import-block audit missed): - parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers the plugin-bound backend files call. An import-block audit is symbol-blind. Fix: Phase 0 now does TWO moves — the file split AND relocating both helpers into a new SDK-free core module/cloud_helpers.go. Per-file table gains a "cross-file symbol deps (the trap)" column listing every helper edge per backend. Phase 0 acceptance criteria now include a grep that no core file references the helpers from their old homes. - §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file (module shell + kubernetesBackend interface + safeIntToInt32) — NOT touched by the split; only platform_kubernetes_kind.go (holds all 4 backends) is split. Earlier draft conflated the two files. Important: - Per-file ownership table relabelled "intended post-split — verified by the Phase 0 build gate" (was asserted-as-verified against an unsplit file — same hand-waving class cycle-2 flagged for "always compiles"). - lease_ttl_seconds DROPPED from the Phase A proto. It was a contract field with no enforced semantics and no implementing backend in scope — YAGNI. §Failure-modes orphaned-lock reworked: documented limitation + operator-side lock-object delete for recovery; TTL is a planned ADDITIVE follow-up paired with a conformance test, shipped with the first backend that honors expiry. Added explicit Lock-contention behavior (immediate error, matches today's in-process IaCStateStore.Lock — no new waiting state). Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence fixed; §Assumptions 2 notes Phase 0 de-risks it structurally. 
Also: removed a stray stale cycle-1 copy of this doc that was sitting untracked in the main workflow checkout (the canonical doc is here in the feat/cloud-sdk-extraction worktree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 4 revisions Addresses 2 Critical + 2 Important from cycle-4 review: Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't — that's eksBackend) and aksBackend has no cross-file deps (it does — CloudCredentials/CloudCredentialProvider from cloud_account.go, same as gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The symbol-ownership map is now a Phase 0 build artifact — scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a design-doc claim that rots on every edit. The design commits to the *method* + the *known shape* (cloud_account.go stays core; all 3 cloud backends bind to it via k.provider.GetCredentials; eksBackend additionally binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no cloud SDK). Critical 2 — Phase 0's "split into four, zero logic change" silently dropped the single func init() that registers kind/k3s/eks/gke/aks. Splitting REQUIRES partitioning init() per-file (a distribution, not zero-change). Phase 0 now has an explicit step 2 for the init() partition; relabelled "behavior-equivalent" not "zero logic change"; k3s documented as reusing kindBackend (both stay core). Important 1 — platform.* cloud credential flow across PlatformBackend was unspecified (aksBackend needs CloudCredentials — how does it reach the plugin?). Added: PlatformBackend requests carry a CloudCredentials proto message; engine resolves k.provider.GetCredentials() in-core (config-map parsing, no SDK) and serialises it. 
Unified with the Architecture-3 credentials story — ONE CloudCredentials proto shape for both surfaces, so secret-redaction has one shape to redact. Important 2 — core actually imports FOUR cloud SDK trees, not three: godo is still in cloud_account_do.go + 5 platform_do_*.go files. §Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT (user's ask was 3 trees), and the go list -deps gate is reworded to assert "zero packages from the three in-scope trees" not "zero cloud SDKs". All "zero cloud SDKs" phrasing reconciled throughout. Minor: ListStates filter + remaining-proto-messages notes folded in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 5 revisions Addresses 1 Critical + 2 Important from cycle-5 review: Critical — the init()-partition fix (cycle-4) was kubernetes-only, but the SAME defect class exists in platform_dns.go / platform_ecs.go / platform_networking.go / platform_autoscaling.go: each has a single func init() registering BOTH a core-staying `mock` backend AND a plugin-bound `aws` backend. The old Phase B inventory moved those files wholesale → would exile the mock backends + dangle the route53 registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go" to a repo-wide uniform `_core.go` / `_<provider>.go` convention across the WHOLE platform.* family. Every mixed init() is partitioned; the audit script flags any init() registering a mix of core-staying + plugin-bound factories as a CI failure. Phase B inventory rewritten to delete only `_aws.go`/`_eks.go` files, never a mixed file. Important 1 — the cycle-4 "Known shape" prose reintroduced hand-maintained cross-file symbol claims (one already incomplete: parseStringSlice consumers). FIX: cut all per-file symbol enumerations; the section now states only invariants the script VERIFIES (not discovers) + the method. No transcribed symbol lists remain. 
Important 2 + own finding — cycle-4 said the engine resolves credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile) and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK. FIX: §Architecture-2 corrected — engine passes the DECLARED credential config (plain strings) in the CloudCredentials proto; the PLUGIN resolves (incl. the SDK-bearing profile/role_arn paths). Both cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B, no core replacement — all AWS cred resolution moves plugin-side. azure/gcp resolver files stay (their resolvers are genuinely SDK-free). Minor — backend-name collision: core-reserved names (memory/filesystem/ postgres/kind/k3s/mock) cause a load-time error if a plugin collides, not silent shadowing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 6 revisions Addresses 1 Critical + 2 Important from cycle-6 review: Critical — cycle-5's credential-flow fix replaced one false claim with another: it said the CloudCredentials struct already holds "declared config (plain strings incl. profile)". VERIFIED FALSE — the struct (cloud_account.go:18) has no Profile field (profile lives in Extra map) and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner than the struct change the reviewer proposed: the struct needs NO change (Extra map already carries markers, RoleARN field exists). Instead, cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig, sts.AssumeRole) are removed; they keep their SDK-free heads (record declared inputs + an Extra["credential_source"] marker, exactly as awsStaticResolver already does). After the edit the file is SDK-free and stays in core alongside the azure/gcp resolver files. 
Only cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider + awsProviderFrom) is deleted; its profile-chain/STS logic moves into the plugin's buildAWSConfig. Every in-core resolver becomes uniformly "declare, don't resolve"; the plugin honors the markers. No unregistered- resolver failure mode — the resolver init() registrations stay. Important 1 — §Phase-0 misidentified the DNS file with the mixed init(). VERIFIED: platform_dns.go:66 has the init() (+ interface + factory registry); platform_dns_backends.go has both impls + the route53 SDK import, NO init(). DNS is a TWO-file split, unlike single-file ecs/networking/autoscaling. §Phase-0 now states the per-family layout explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling one-file) and notes the audit script determines it. Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit deferred-resolution markers for env/CLI/managed-identity/workload-identity/ profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5 now state the plugin MUST implement marker handling for every deferred type, not just AWS profile/role_arn. Minor — safeIntToInt32 relocation rationale clarified (it's a clean copy-source for the plugin-bound files, not a hard core necessity); parseStringSlice IS a hard necessity (its file is deleted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 7 revisions Addresses 2 Critical from cycle-7 review (architecture confirmed sound; these are the last two extraction-mechanic precision gaps): C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a clean contiguous tail, but awsRoleARNResolver's SDK block (base-config build + sts.AssumeRole, ~45 lines) is the larger half of the method, after the declared-input recording. 
FIX: §Architecture-2 re-characterizes the edit as a deliberate Resolve() body REWRITE (not a one-line snip) — explicitly per-resolver. Added a Phase B CI invariant: an import-block grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not prose-asserted. C2 — cloud_account_aws.go defines FOUR symbols, not one; the symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed: - AWSConfigProvider interface signature names aws.Config → CANNOT stay in core, deleted with the file. - awsProviderFrom → deleted with the interface. - ValidateCredentials → verified NO real caller (only a comment ref in cmd/wfctl/deploy.go:866) → deletes cleanly. - The 8 awsProviderFrom consumers are all verified plugin-bound — but each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the plugin there's no cloud.account to type-assert. §Cross-file-coupling invariant 3 now states Phase B must REWRITE all 8 consumers to obtain creds from the CloudCredentials proto + buildAWSConfig — explicit Phase B scope, not a footnote. Phase B table atomicity column updated. Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in Phase 0 so the dns family conforms to the uniform _core.go/_aws.go naming; no special-case three-file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-8 re-baseline against post-#653 main Cycle-8 adversarial review caught the design's file/symbol inventory as stale: it predated issue #653 (closed 2026-05-13), which already removed the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild + EKS backends. 
Re-baselined every file/symbol claim against origin/main HEAD (worktree confirmed 0 commits behind origin/main): - Added "Relationship to issue #653" section — this design is #653's named successor, extracting the AWS surface #653 scoped out ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces. - Problem table corrected: AWS 6 real-import files (not 13), Azure 3, GCP 3. storage_artifact_s3.go is comment-only — stays in core. - cloud_account_aws.go is dead code — zero non-test consumers verified; deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers removed by #653). - Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go); parseStringSlice + safeIntToInt32 no longer exist — helper-relocation task deleted. - PlatformBackend now serves only aks + gke (eks already a #653 SDK-free stub); interface-audit spike audits one interface, not five. - Phase B inventory rewritten; Phase A/C file lists corrected. - Self-challenge doubt #4 + Assumption 7 added: inventory staleness is the cycle-8 defect class; audit script makes it CI-enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-9 re-baseline + audit script Cycle-9 adversarial review caught aksBackend mis-classified as an azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match is a stale doc comment (line 332) — aksBackend.azureToken is a plain net/http OAuth2 client. An import-block-disciplined re-survey found a second comment-only false positive: nosql_dynamodb.go. Structural fix for the recurring "grep matched a comment" defect class: added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks (never comments) and emits the comment-immune real-import map. Its output now populates every file table in the design — prose claims replaced by a build artifact. Formalized + CI-wired in Phase 0. 
Corrected inventory (audit-script output): AWS 5 real-import files (not 6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go are comment-only stubs — out of scope, stay in core. Design consequences of aksBackend being SDK-free: - Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all stay in core. - Architecture §2 no longer proposes a new PlatformBackend contract. The gke cross-process mechanism is gated on an interface-audit spike whose preferred outcome is folding into the existing ResourceDriver contract — a dedicated contract for one backend is YAGNI. - Phase A (Azure) is now pure IaCStateBackend — touches no platform file. - Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/ eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and fixes the stale line-332 comment. - The gke platform extraction + its contract decision move to Phase C. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-10 re-baseline, AWS scope boundary explicit Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9 audit script scanned only module/, missing five aws-sdk-go-v2 importers under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal ("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by the four phases as written. Structural fix — third defect-class variant closed: - audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and splits results module/ vs. elsewhere. Comment-immune (cycle 9) + scope-complete (cycle 10) + CI-enforced (Phase 0). Whole-repo inventory result: - Azure + GCP SDK usage is entirely module/-resident → Phases A and C drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate). - aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go. 
Scope decision: the out-of-module/ AWS surface is exactly #653's deliberately-retained "RBAC/secrets/artifact stay" scope (plus the provider/aws deploy provider). This design does NOT unilaterally override #653's recent documented decision — it scopes that surface OUT (new Non-Goal, parallel to godo) and logs a recommended successor issue. Consequences threaded through the doc: - Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the out-of-scope surface). - Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP, module/-scoped zero for AWS. - Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's scope decision is respected, not contested. - Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check + sessionName extraction sit between declared-input recording and the SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale line-18 backend-list comment), M2 (internal/legacyaws noted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-11 PASS, minor cleanups Adversarial review cycle 11: PASS (zero Critical, zero Important). Two Minor nits applied: - audit-cloud-symbols.sh: real_import now also matches single-line `import "..."` form, not just parenthesized blocks — closes the one latent parser false-negative the reviewer flagged. - §Goals: clarified that the module/-scoped AWS-zero `--check` assertion is deferred-implementation added in Phase C (the committed script only enforces the cloud_account_aws_creds.go post-Phase-B invariant today), parallel to the Phase 0 init()-partition deferral. Design phase complete — proceeding to writing-plans. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scripts): audit-cloud-symbols single-line-import grep poisoned the pipe The cycle-11 single-line-import hardening added an inner `grep -E '^import "'` whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`, making real_import() return false for every file lacking a single-line import. Added `|| true` on the inner grep. Verified: full report restored, all REAL/comment-only classifications correct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction implementation plan (Phase 0 + Phase A) Bite-sized TDD plan for the first executable increment: Phase 0 (split platform_kubernetes_kind.go, fix the stale comment, wire the audit script into CI) + Phase A (IaCStateBackend proto + benchmark-gated proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging guard, workflow-plugin-azure implementation, core deletion dropping azure-sdk from go.mod). 14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on plan — their concrete tasks depend on Phase A's outputs (the benchmark-validated proto shape, the host-resolution pattern, the plugin-side serve path), so planning them now would be fiction. The design doc remains the authoritative B/C/D spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — address plan-phase adversarial review Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor). All addressed: - C1 (Critical): Task 4's proto used google.protobuf.Struct, which iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config as `bytes outputs_json`/`bytes config_json` (the established ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not structpb. Removed the bogus struct.proto import step. 
- I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives there), not `cd plugin/external/proto`. - I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up the new benchmarks automatically) — no redundant harness; clarified the task is a one-time decision gate. - I3: Task 8's embedded research spike resolved at plan time — engine.go was read; integration is the design-sanctioned package-level module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal. Tasks 8/13/14 now have concrete file sets. - I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo, workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly. - M1: Task 5's benchmark file is now genuinely self-contained (local benchStateToProto + benchStateBackendServer; no forward references). - M2: Task 3 names ci.yml directly, places the audit job beside the existing godo-banned/aws-sdk-banned grep-gate jobs. - M3: Task 6 pins benchstat (go install + bare invocation). - M4: Task 9 states the redaction gap is verified against step_output_redactor.go:7-19, not a live deduction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — plan-review cycle 2 fixes Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed resolved; 2 new Important + 2 Minor surfaced by the new-defect scan. All addressed: - I-A (Important): Task 9's redaction test was inconsistent with the actual redactMap behavior — a key named `credentials` matches the existing `credential` pattern and is wholesale-replaced with the placeholder STRING before any recursion, so the test's `.(map[string]any)` assertion panicked. 
Reworked Task 9: the `credentials:` block is ALREADY redacted wholesale (regression-tested); the real gap is `credentials_ref` being over-redacted (it's a module name, not a secret) — fix is a narrow `*_ref`-suffix exemption in isSensitiveField, not camelCase leaf patterns (which would be dead code given wholesale redaction happens first). - I-B (Important): Task 14's engine.go integration seam was under-specified and would fight loadPluginInternal's no-concrete-types precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now defines an `IaCStateBackendProvider` optional interface and type-asserts it in loadPluginInternal exactly like the existing stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter implements it. Concrete file set + code sketch added. - M-i: Task 6's benchmark.yml description corrected (runs `go test -bench=.` inline, not `make bench-baseline`). - M-ii: Task 4 notes the proto README's plugin.proto-specific wording is stale; trust root buf.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — plan-review cycle 3 PASS + minor cleanups Plan-phase adversarial review cycle 3: PASS (zero Critical, zero Important). Two Minor doc-tightening fixes applied: - Task 9 Step 4 now names bearer_token_ref explicitly and explains why the *_ref exemption is safe for it (SecretRef is a reference struct, not a raw secret) rather than claiming no *_ref field exists. - engine.go line citations corrected to 311-326. Plan phase complete — proceeding to alignment-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: lock scope for cloud-sdk-extraction (alignment passed) * refactor(module): split platform_kubernetes_kind.go into _core + _gke Phase 0 precursor for cloud-SDK extraction. 
kindBackend/eksErrorBackend/ aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core init(); gkeBackend (the only SDK-bearing k8s backend) moves to platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same five backend names registered. Isolates the lone SDK-bearing platform file for a later clean deletion. * docs(module): add file-purpose headers to platform_kubernetes _core/_gke Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition self-documenting for readers without the commit message. * docs(module): fix stale 'Requires the Azure SDK' comment on aksBackend aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk consumer. The stale comment is what fooled an earlier inventory pass into mis-counting platform_kubernetes_kind.go as an azure-sdk importer. * ci(audit): enforce k8s-backend init() partition + run audit on every PR Extends audit-cloud-symbols.sh --check with an init()-partition assertion (platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned / aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced artifact rather than a prose claim. * docs(plans): IaCStateBackend transport benchmark result — decision pending Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar. Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB map[string]any (inherent to the bytes outputs_json wire format the iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB message cap. The plan's contingency remedy (streaming redesign) addresses message-size-cap + memory-buffering, neither of which the benchmark hits; streaming would not move the number. Recommendation: retain unary (6.5ms is still negligible vs real cloud backend I/O — the design's own bar-rationale). 
Deviation from the literal 5ms estimate-bar is surfaced to the operator, not absorbed silently. Scope lock intact: Task 6 run + recorded, no task added/dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): Task 6 resolved — unary IaCStateBackend LOCKED (operator-confirmed) Operator reviewed the 6.51ms benchmark + root-cause analysis and confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's gate resolves Unary LOCKED — the Task 4 proto stands, no streaming redesign, PR 2/3 proceed unchanged. Operator additionally raised a long-term architectural item: IaC state is persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR) with JSON-export + content-detection-on-read would be better for processing/type-correctness/large-state scaling. Logged as a post-extraction follow-up in both the benchmark decision record and the design doc's Open items — distinct from the wire contract, cross-cutting across all IaCStateStore impls, needs its own brainstorming pass. Not actioned in this locked plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Revert "chore: lock scope for cloud-sdk-extraction (alignment passed)" This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954. * docs(plans): amend cloud-sdk-extraction plan — PR 6 (ctx) + de-gate PR 4 Operator-approved scope amendment to the (reverted-to-Draft) plan: - ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 / Task 15. Task 7 had to hardcode context.Background() in grpcIaCStateStore; the operator directed widening the interface now while we're at that boundary, so Phase B/C/D plugin backends inherit it ctx-ful. Bounded blast radius (~9 files, all in module/); interfaces.IaCStateStore already had ctx and is untouched. - ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo. 
Operator: agents should operate in plugin repos directly; the real requirement is prompt clarity (absolute repo path stated up front), not a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes updated accordingly. - Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note updated to RESOLVED (unary locked). Next: re-run alignment-check on the amended plan, then re-lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: re-lock scope for cloud-sdk-extraction (amended — alignment re-passed) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
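For readers following the Task 9 discussion in the commit message above, here is a minimal sketch of the `*_ref`-suffix exemption, assuming a substring-based redactor. Only the names `isSensitiveField` and `redactMap` come from the commit text; the pattern list, the placeholder string, and the function bodies are hypothetical and are not taken from `step_output_redactor.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// redactedPlaceholder and sensitivePatterns are illustrative assumptions,
// not the values used in the real redactor.
const redactedPlaceholder = "[REDACTED]"

var sensitivePatterns = []string{"credential", "secret", "password", "token"}

// isSensitiveField reports whether a key should be redacted.
// Keys ending in "_ref" are exempt: they name a module or a secret
// reference, not a raw secret value.
func isSensitiveField(key string) bool {
	lower := strings.ToLower(key)
	if strings.HasSuffix(lower, "_ref") {
		return false
	}
	for _, p := range sensitivePatterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

// redactMap returns a copy of in with sensitive keys replaced wholesale by
// the placeholder string. The replacement happens before any recursion, so a
// `credentials:` block becomes a string, never a partially redacted map.
func redactMap(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		if isSensitiveField(k) {
			out[k] = redactedPlaceholder
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			out[k] = redactMap(nested)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	cfg := map[string]any{
		"credentials":      map[string]any{"access_key": "example-key"},
		"credentials_ref":  "aws-prod",
		"bearer_token_ref": "vault-entry",
		"region":           "us-east-1",
	}
	redacted := redactMap(cfg)
	for _, k := range []string{"credentials", "credentials_ref", "bearer_token_ref", "region"} {
		fmt.Printf("%s = %v\n", k, redacted[k])
	}
}
```

In this sketch a test that type-asserts `redacted["credentials"].(map[string]any)` without the comma-ok form would panic, which matches the failure the cycle 2 review describes: the value is already the placeholder string by the time the test reads it.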
intel352 added a commit that referenced this pull request May 14, 2026
…roto-lock (#669) * docs(plans): cloud-SDK extraction design — workflow core → strict-contract plugins Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and cloud.google.com/go + google.golang.org/api direct deps from workflow core's module/ package. Architecture: 3 extension surfaces, 3 strategies: - IaC state backends → new IaCStateBackend strict proto contract; iac.state stays core, config.backend dispatches to plugin gRPC client. - platform.* provisioners → new PlatformBackend strict proto contract; module types + provider: key stay core, kind backend stays in-core, cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract. - standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload, storage.s3, storage.gcs) → plugin-native module/step types via the existing ModuleFactories/StepFactories SDK — no new contract. Credentials (Option 1): each plugin-native module carries its own credentials: block + builds aws.Config in-process; optional in-plugin credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp cloud_account files have no SDK import and stay. 4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp, D digitalocean (spaces backend, minor bump + migration doc). Includes Assumptions + Rollback sections + self-challenge top-3 doubts (PlatformBackend over-generality, provider-separability fragility, benchmark-could-invalidate-unary-default — all with mitigations deferred to writing-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 1 revisions Addresses 2 Critical + 5 Important findings from adversarial-design-review: Critical: - iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit home: deleted by Phase B's core PR; Phase D reframed from soft-compat to a real clean-break for the `spaces` backend. 
Goal "core drops aws-sdk-go-v2 entirely" is now actually achieved by the phases as written + enforced by a go list -deps CI gate. - kinesis: added Non-Goals entry explaining it's a transitive dep of modular/modules/eventbus/v2, not a direct workflow import — out of scope, with the go mod why chain documented so the literal ask is fully answered. Important: - Full grep-verified 13-file AWS inventory table in Phase B with per-file destinations; reconciled aws_api_gateway.go (route-sync module) vs platform_apigateway.go (provisioner) as two distinct files. - aksBackend assigned to Phase A (Azure gets the PlatformBackend half too); platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with explicit always-compiles coordination. - Proto contracts fold into existing plugin/external/proto/iac.proto (8 services already) instead of new files — matches precedent. - New Security section: secret-redaction in config-version-store/tracing + gRPC interceptor logging are blocking writing-plans tasks; credentials_ref blast radius documented as strictly narrower than today's cloud.account. Minor: - IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no speculative surface. - Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR). - IaCProviderRequired/ResourceDriver reuse promoted to a first-class Alternatives Considered entry with accept/reject rationale + retained as the gated fallback for PlatformBackend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 2 revisions Addresses 2 Critical + 3 Important from cycle-2 review: Critical: - platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with its own import block). The "always compiles across phases" property is now structural, not asserted. 
Added a verified per-file import-ownership table. - Corrected the false Phase A rationale: aksBackend uses raw net/http REST, NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend region). The Azure go.mod drop comes entirely from iac_state_azure.go deletion + iac_module.go edit; aksBackend extraction is code-organisation, not a dependency change. - Documented the eksBackend → cloud_account_aws.go call-graph edge as a hard same-commit atomicity constraint (verified: eksBackend calls awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138). Important: - Phase B core-PR bullet now explicitly lists "strip the spaces case from iac_module.go" (was only obliquely referenced). - New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds contract field; SaveState lost-response retry → documented idempotent (full-state replace, last-writer-wins); plugin-unreachable → abort before mutation; PlatformBackend mid-Apply crash → identical to today's in-process risk, no new mitigation. - §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no body-logging interceptor (grpc.NewServer(opts...) passthrough; only callback_server.go logs, never module config). Writing-plans adds a guard test instead of a conditional interceptor. Minor: file-count table footnoted (count = importers, not deletions); shared s3compat module added as Alternatives Considered #3 (deferred, not rejected); self-challenge doubt numbering tidied (2 mitigations cover 3 doubts, intentionally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): fix stale Phase A/B refs + Status line post-cycle-2 sed in the cycle-2 commit ran from the wrong cwd — Status line still said "cycle 1" and two interface-audit-spike references still said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 3 revisions Addresses 2 Critical + 2 Important from cycle-3 review: Critical (same root — symbol-level coupling the import-block audit missed): - parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers the plugin-bound backend files call. An import-block audit is symbol-blind. Fix: Phase 0 now does TWO moves — the file split AND relocating both helpers into a new SDK-free core module/cloud_helpers.go. Per-file table gains a "cross-file symbol deps (the trap)" column listing every helper edge per backend. Phase 0 acceptance criteria now include a grep that no core file references the helpers from their old homes. - §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file (module shell + kubernetesBackend interface + safeIntToInt32) — NOT touched by the split; only platform_kubernetes_kind.go (holds all 4 backends) is split. Earlier draft conflated the two files. Important: - Per-file ownership table relabelled "intended post-split — verified by the Phase 0 build gate" (was asserted-as-verified against an unsplit file — same hand-waving class cycle-2 flagged for "always compiles"). - lease_ttl_seconds DROPPED from the Phase A proto. It was a contract field with no enforced semantics and no implementing backend in scope — YAGNI. §Failure-modes orphaned-lock reworked: documented limitation + operator-side lock-object delete for recovery; TTL is a planned ADDITIVE follow-up paired with a conformance test, shipped with the first backend that honors expiry. Added explicit Lock-contention behavior (immediate error, matches today's in-process IaCStateStore.Lock — no new waiting state). Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence fixed; §Assumptions 2 notes Phase 0 de-risks it structurally. 
Also: removed a stray stale cycle-1 copy of this doc that was sitting untracked in the main workflow checkout (the canonical doc is here in the feat/cloud-sdk-extraction worktree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 4 revisions Addresses 2 Critical + 2 Important from cycle-4 review: Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't — that's eksBackend) and aksBackend has no cross-file deps (it does — CloudCredentials/CloudCredentialProvider from cloud_account.go, same as gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The symbol-ownership map is now a Phase 0 build artifact — scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a design-doc claim that rots on every edit. The design commits to the *method* + the *known shape* (cloud_account.go stays core; all 3 cloud backends bind to it via k.provider.GetCredentials; eksBackend additionally binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no cloud SDK). Critical 2 — Phase 0's "split into four, zero logic change" silently dropped the single func init() that registers kind/k3s/eks/gke/aks. Splitting REQUIRES partitioning init() per-file (a distribution, not zero-change). Phase 0 now has an explicit step 2 for the init() partition; relabelled "behavior-equivalent" not "zero logic change"; k3s documented as reusing kindBackend (both stay core). Important 1 — platform.* cloud credential flow across PlatformBackend was unspecified (aksBackend needs CloudCredentials — how does it reach the plugin?). Added: PlatformBackend requests carry a CloudCredentials proto message; engine resolves k.provider.GetCredentials() in-core (config-map parsing, no SDK) and serialises it. 
Unified with the Architecture-3 credentials story — ONE CloudCredentials proto shape for both surfaces, so secret-redaction has one shape to redact. Important 2 — core actually imports FOUR cloud SDK trees, not three: godo is still in cloud_account_do.go + 5 platform_do_*.go files. §Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT (user's ask was 3 trees), and the go list -deps gate is reworded to assert "zero packages from the three in-scope trees" not "zero cloud SDKs". All "zero cloud SDKs" phrasing reconciled throughout. Minor: ListStates filter + remaining-proto-messages notes folded in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 5 revisions Addresses 1 Critical + 2 Important from cycle-5 review: Critical — the init()-partition fix (cycle-4) was kubernetes-only, but the SAME defect class exists in platform_dns.go / platform_ecs.go / platform_networking.go / platform_autoscaling.go: each has a single func init() registering BOTH a core-staying `mock` backend AND a plugin-bound `aws` backend. The old Phase B inventory moved those files wholesale → would exile the mock backends + dangle the route53 registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go" to a repo-wide uniform `_core.go` / `_<provider>.go` convention across the WHOLE platform.* family. Every mixed init() is partitioned; the audit script flags any init() registering a mix of core-staying + plugin-bound factories as a CI failure. Phase B inventory rewritten to delete only `_aws.go`/`_eks.go` files, never a mixed file. Important 1 — the cycle-4 "Known shape" prose reintroduced hand-maintained cross-file symbol claims (one already incomplete: parseStringSlice consumers). FIX: cut all per-file symbol enumerations; the section now states only invariants the script VERIFIES (not discovers) + the method. No transcribed symbol lists remain. 
Important 2 + own finding — cycle-4 said the engine resolves credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile) and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK. FIX: §Architecture-2 corrected — engine passes the DECLARED credential config (plain strings) in the CloudCredentials proto; the PLUGIN resolves (incl. the SDK-bearing profile/role_arn paths). Both cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B, no core replacement — all AWS cred resolution moves plugin-side. azure/gcp resolver files stay (their resolvers are genuinely SDK-free). Minor — backend-name collision: core-reserved names (memory/filesystem/ postgres/kind/k3s/mock) cause a load-time error if a plugin collides, not silent shadowing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 6 revisions Addresses 1 Critical + 2 Important from cycle-6 review: Critical — cycle-5's credential-flow fix replaced one false claim with another: it said the CloudCredentials struct already holds "declared config (plain strings incl. profile)". VERIFIED FALSE — the struct (cloud_account.go:18) has no Profile field (profile lives in Extra map) and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner than the struct change the reviewer proposed: the struct needs NO change (Extra map already carries markers, RoleARN field exists). Instead, cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig, sts.AssumeRole) are removed; they keep their SDK-free heads (record declared inputs + an Extra["credential_source"] marker, exactly as awsStaticResolver already does). After the edit the file is SDK-free and stays in core alongside the azure/gcp resolver files. 
Only cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider + awsProviderFrom) is deleted; its profile-chain/STS logic moves into the plugin's buildAWSConfig. Every in-core resolver becomes uniformly "declare, don't resolve"; the plugin honors the markers. No unregistered- resolver failure mode — the resolver init() registrations stay. Important 1 — §Phase-0 misidentified the DNS file with the mixed init(). VERIFIED: platform_dns.go:66 has the init() (+ interface + factory registry); platform_dns_backends.go has both impls + the route53 SDK import, NO init(). DNS is a TWO-file split, unlike single-file ecs/networking/autoscaling. §Phase-0 now states the per-family layout explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling one-file) and notes the audit script determines it. Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit deferred-resolution markers for env/CLI/managed-identity/workload-identity/ profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5 now state the plugin MUST implement marker handling for every deferred type, not just AWS profile/role_arn. Minor — safeIntToInt32 relocation rationale clarified (it's a clean copy-source for the plugin-bound files, not a hard core necessity); parseStringSlice IS a hard necessity (its file is deleted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 7 revisions Addresses 2 Critical from cycle-7 review (architecture confirmed sound; these are the last two extraction-mechanic precision gaps): C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a clean contiguous tail, but awsRoleARNResolver's SDK block (base-config build + sts.AssumeRole, ~45 lines) is the larger half of the method, after the declared-input recording. 
FIX: §Architecture-2 re-characterizes the edit as a deliberate Resolve() body REWRITE (not a one-line snip) — explicitly per-resolver. Added a Phase B CI invariant: an import-block grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not prose-asserted. C2 — cloud_account_aws.go defines FOUR symbols, not one; the symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed: - AWSConfigProvider interface signature names aws.Config → CANNOT stay in core, deleted with the file. - awsProviderFrom → deleted with the interface. - ValidateCredentials → verified NO real caller (only a comment ref in cmd/wfctl/deploy.go:866) → deletes cleanly. - The 8 awsProviderFrom consumers are all verified plugin-bound — but each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the plugin there's no cloud.account to type-assert. §Cross-file-coupling invariant 3 now states Phase B must REWRITE all 8 consumers to obtain creds from the CloudCredentials proto + buildAWSConfig — explicit Phase B scope, not a footnote. Phase B table atomicity column updated. Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in Phase 0 so the dns family conforms to the uniform _core.go/_aws.go naming; no special-case three-file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-8 re-baseline against post-#653 main Cycle-8 adversarial review caught the design's file/symbol inventory as stale: it predated issue #653 (closed 2026-05-13), which already removed the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild + EKS backends. 
Re-baselined every file/symbol claim against origin/main HEAD (worktree confirmed 0 commits behind origin/main): - Added "Relationship to issue #653" section — this design is #653's named successor, extracting the AWS surface #653 scoped out ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces. - Problem table corrected: AWS 6 real-import files (not 13), Azure 3, GCP 3. storage_artifact_s3.go is comment-only — stays in core. - cloud_account_aws.go is dead code — zero non-test consumers verified; deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers removed by #653). - Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go); parseStringSlice + safeIntToInt32 no longer exist — helper-relocation task deleted. - PlatformBackend now serves only aks + gke (eks already a #653 SDK-free stub); interface-audit spike audits one interface, not five. - Phase B inventory rewritten; Phase A/C file lists corrected. - Self-challenge doubt #4 + Assumption 7 added: inventory staleness is the cycle-8 defect class; audit script makes it CI-enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-9 re-baseline + audit script Cycle-9 adversarial review caught aksBackend mis-classified as an azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match is a stale doc comment (line 332) — aksBackend.azureToken is a plain net/http OAuth2 client. An import-block-disciplined re-survey found a second comment-only false positive: nosql_dynamodb.go. Structural fix for the recurring "grep matched a comment" defect class: added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks (never comments) and emits the comment-immune real-import map. Its output now populates every file table in the design — prose claims replaced by a build artifact. Formalized + CI-wired in Phase 0. 
Corrected inventory (audit-script output): AWS 5 real-import files (not 6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go are comment-only stubs — out of scope, stay in core. Design consequences of aksBackend being SDK-free: - Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all stay in core. - Architecture §2 no longer proposes a new PlatformBackend contract. The gke cross-process mechanism is gated on an interface-audit spike whose preferred outcome is folding into the existing ResourceDriver contract — a dedicated contract for one backend is YAGNI. - Phase A (Azure) is now pure IaCStateBackend — touches no platform file. - Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/ eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and fixes the stale line-332 comment. - The gke platform extraction + its contract decision move to Phase C. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-10 re-baseline, AWS scope boundary explicit Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9 audit script scanned only module/, missing five aws-sdk-go-v2 importers under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal ("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by the four phases as written. Structural fix — third defect-class variant closed: - audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and splits results module/ vs. elsewhere. Comment-immune (cycle 9) + scope-complete (cycle 10) + CI-enforced (Phase 0). Whole-repo inventory result: - Azure + GCP SDK usage is entirely module/-resident → Phases A and C drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate). - aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go. 
Scope decision: the out-of-module/ AWS surface is exactly #653's deliberately-retained "RBAC/secrets/artifact stay" scope (plus the provider/aws deploy provider). This design does NOT unilaterally override #653's recent documented decision — it scopes that surface OUT (new Non-Goal, parallel to godo) and logs a recommended successor issue. Consequences threaded through the doc: - Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the out-of-scope surface). - Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP, module/-scoped zero for AWS. - Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's scope decision is respected, not contested. - Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check + sessionName extraction sit between declared-input recording and the SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale line-18 backend-list comment), M2 (internal/legacyaws noted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-11 PASS, minor cleanups Adversarial review cycle 11: PASS (zero Critical, zero Important). Two Minor nits applied: - audit-cloud-symbols.sh: real_import now also matches single-line `import "..."` form, not just parenthesized blocks — closes the one latent parser false-negative the reviewer flagged. - §Goals: clarified that the module/-scoped AWS-zero `--check` assertion is deferred-implementation added in Phase C (the committed script only enforces the cloud_account_aws_creds.go post-Phase-B invariant today), parallel to the Phase 0 init()-partition deferral. Design phase complete — proceeding to writing-plans. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scripts): audit-cloud-symbols single-line-import grep poisoned the pipe The cycle-11 single-line-import hardening added an inner `grep -E '^import "'` whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`, making real_import() return false for every file lacking a single-line import. Added `|| true` on the inner grep. Verified: full report restored, all REAL/comment-only classifications correct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction implementation plan (Phase 0 + Phase A) Bite-sized TDD plan for the first executable increment: Phase 0 (split platform_kubernetes_kind.go, fix the stale comment, wire the audit script into CI) + Phase A (IaCStateBackend proto + benchmark-gated proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging guard, workflow-plugin-azure implementation, core deletion dropping azure-sdk from go.mod). 14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on plan — their concrete tasks depend on Phase A's outputs (the benchmark-validated proto shape, the host-resolution pattern, the plugin-side serve path), so planning them now would be fiction. The design doc remains the authoritative B/C/D spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — address plan-phase adversarial review Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor). All addressed: - C1 (Critical): Task 4's proto used google.protobuf.Struct, which iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config as `bytes outputs_json`/`bytes config_json` (the established ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not structpb. Removed the bogus struct.proto import step. 
- I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives there), not `cd plugin/external/proto`. - I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up the new benchmarks automatically) — no redundant harness; clarified the task is a one-time decision gate. - I3: Task 8's embedded research spike resolved at plan time — engine.go was read; integration is the design-sanctioned package-level module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal. Tasks 8/13/14 now have concrete file sets. - I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo, workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly. - M1: Task 5's benchmark file is now genuinely self-contained (local benchStateToProto + benchStateBackendServer; no forward references). - M2: Task 3 names ci.yml directly, places the audit job beside the existing godo-banned/aws-sdk-banned grep-gate jobs. - M3: Task 6 pins benchstat (go install + bare invocation). - M4: Task 9 states the redaction gap is verified against step_output_redactor.go:7-19, not a live deduction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — plan-review cycle 2 fixes Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed resolved; 2 new Important + 2 Minor surfaced by the new-defect scan. All addressed: - I-A (Important): Task 9's redaction test was inconsistent with the actual redactMap behavior — a key named `credentials` matches the existing `credential` pattern and is wholesale-replaced with the placeholder STRING before any recursion, so the test's `.(map[string]any)` assertion panicked. 
Reworked Task 9: the `credentials:` block is ALREADY redacted wholesale (regression-tested); the real gap is `credentials_ref` being over-redacted (it's a module name, not a secret) — fix is a narrow `*_ref`-suffix exemption in isSensitiveField, not camelCase leaf patterns (which would be dead code given wholesale redaction happens first). - I-B (Important): Task 14's engine.go integration seam was under-specified and would fight loadPluginInternal's no-concrete-types precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now defines an `IaCStateBackendProvider` optional interface and type-asserts it in loadPluginInternal exactly like the existing stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter implements it. Concrete file set + code sketch added. - M-i: Task 6's benchmark.yml description corrected (runs `go test -bench=.` inline, not `make bench-baseline`). - M-ii: Task 4 notes the proto README's plugin.proto-specific wording is stale; trust root buf.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — plan-review cycle 3 PASS + minor cleanups Plan-phase adversarial review cycle 3: PASS (zero Critical, zero Important). Two Minor doc-tightening fixes applied: - Task 9 Step 4 now names bearer_token_ref explicitly and explains why the *_ref exemption is safe for it (SecretRef is a reference struct, not a raw secret) rather than claiming no *_ref field exists. - engine.go line citations corrected to 311-326. Plan phase complete — proceeding to alignment-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: lock scope for cloud-sdk-extraction (alignment passed) * refactor(module): split platform_kubernetes_kind.go into _core + _gke Phase 0 precursor for cloud-SDK extraction. 
kindBackend/eksErrorBackend/aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core init(); gkeBackend (the only SDK-bearing k8s backend) moves to platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same five backend names registered. Isolates the lone SDK-bearing platform file for a later clean deletion.
* docs(module): add file-purpose headers to platform_kubernetes _core/_gke
Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition self-documenting for readers without the commit message.
* docs(module): fix stale 'Requires the Azure SDK' comment on aksBackend
aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk consumer. The stale comment is what fooled an earlier inventory pass into mis-counting platform_kubernetes_kind.go as an azure-sdk importer.
* ci(audit): enforce k8s-backend init() partition + run audit on every PR
Extends audit-cloud-symbols.sh --check with an init()-partition assertion (platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned / aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced artifact rather than a prose claim.
* docs(plans): IaCStateBackend transport benchmark result — decision pending
Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar. Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB map[string]any (inherent to the bytes outputs_json wire format the iac.proto invariant mandates), NOT gRPC transport buffering or the 4MB message cap. The plan's contingency remedy (streaming redesign) addresses message-size-cap + memory-buffering, neither of which the benchmark hits; streaming would not move the number. Recommendation: retain unary (6.5ms is still negligible vs real cloud backend I/O, the design's own bar-rationale).
Deviation from the literal 5ms estimate-bar is surfaced to the operator, not absorbed silently. Scope lock intact: Task 6 run + recorded, no task added/dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): Task 6 resolved — unary IaCStateBackend LOCKED (operator-confirmed) Operator reviewed the 6.51ms benchmark + root-cause analysis and confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's gate resolves Unary LOCKED — the Task 4 proto stands, no streaming redesign, PR 2/3 proceed unchanged. Operator additionally raised a long-term architectural item: IaC state is persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR) with JSON-export + content-detection-on-read would be better for processing/type-correctness/large-state scaling. Logged as a post-extraction follow-up in both the benchmark decision record and the design doc's Open items — distinct from the wire contract, cross-cutting across all IaCStateStore impls, needs its own brainstorming pass. Not actioned in this locked plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Revert "chore: lock scope for cloud-sdk-extraction (alignment passed)" This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954. * docs(plans): amend cloud-sdk-extraction plan — PR 6 (ctx) + de-gate PR 4 Operator-approved scope amendment to the (reverted-to-Draft) plan: - ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 / Task 15. Task 7 had to hardcode context.Background() in grpcIaCStateStore; the operator directed widening the interface now while we're at that boundary, so Phase B/C/D plugin backends inherit it ctx-ful. Bounded blast radius (~9 files, all in module/); interfaces.IaCStateStore already had ctx and is untouched. - ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo. 
Operator: agents should operate in plugin repos directly; the real requirement is prompt clarity (absolute repo path stated up front), not a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes updated accordingly. - Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note updated to RESOLVED (unary locked). Next: re-run alignment-check on the amended plan, then re-lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: re-lock scope for cloud-sdk-extraction (amended — alignment re-passed) * feat(proto): add IaCStateBackend service to iac.proto Strict 6-method contract mirroring module.IaCStateStore 1:1, with an IaCState message mirroring module.IaCState. Free-form Outputs/Config maps cross the wire as bytes outputs_json/config_json per the iac.proto hard invariant (NO google.protobuf.Struct) — same pattern as ResourceState.outputs_json. Unary RPCs. No TTL field. Regenerated bindings via buf. * test(module): add IaCStateBackend gRPC-vs-in-process benchmark harness Drives a ~1 MB synthetic IaCState through Lock/GetState/SaveState/Unlock both in-process (baseline) and over a real bufconn gRPC boundary (post-extraction path). Self-contained (local benchStateToProto + benchStateBackendServer; Task 7 promotes production versions). Feeds the unary-vs-streaming proto-transport decision in the next task. * test(wftest): add IaCStateBackend to iacServiceChecks coverage table Task 4 added the IaCStateBackend service to iac.proto but missed the corresponding iacServiceChecks row in wftest/bdd/strict_iac.go. TestIaCServiceChecks_CoversEveryProtoService enforces parity between iac.proto's services and that table — it was failing on the missing entry. Belongs with PR 2 (the proto PR). 
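The `bytes outputs_json` / `bytes config_json` convention described in the feat(proto) commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: `wireState` and `localState` are hypothetical stand-ins for the generated IaCState proto message and for module.IaCState, and the field names are guesses, not the real definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wireState is a HYPOTHETICAL stand-in for the generated IaCState message:
// free-form maps travel as JSON-encoded bytes, not google.protobuf.Struct.
type wireState struct {
	ResourceID  string
	OutputsJSON []byte
	ConfigJSON  []byte
}

// localState is a HYPOTHETICAL stand-in for module.IaCState.
type localState struct {
	ResourceID string
	Outputs    map[string]any
	Config     map[string]any
}

// toWire encodes the free-form maps with encoding/json before they cross the wire.
func toWire(s localState) (wireState, error) {
	out, err := json.Marshal(s.Outputs)
	if err != nil {
		return wireState{}, fmt.Errorf("marshal outputs: %w", err)
	}
	cfg, err := json.Marshal(s.Config)
	if err != nil {
		return wireState{}, fmt.Errorf("marshal config: %w", err)
	}
	return wireState{ResourceID: s.ResourceID, OutputsJSON: out, ConfigJSON: cfg}, nil
}

// fromWire decodes the byte fields back into maps and surfaces decode
// failures instead of silently returning empty state.
func fromWire(w wireState) (localState, error) {
	s := localState{ResourceID: w.ResourceID}
	if len(w.OutputsJSON) > 0 {
		if err := json.Unmarshal(w.OutputsJSON, &s.Outputs); err != nil {
			return localState{}, fmt.Errorf("unmarshal outputs: %w", err)
		}
	}
	if len(w.ConfigJSON) > 0 {
		if err := json.Unmarshal(w.ConfigJSON, &s.Config); err != nil {
			return localState{}, fmt.Errorf("unmarshal config: %w", err)
		}
	}
	return s, nil
}

func main() {
	in := localState{
		ResourceID: "vpc-1",
		Outputs:    map[string]any{"cidr": "10.0.0.0/16"},
		Config:     map[string]any{"replicas": 3},
	}
	w, err := toWire(in)
	if err != nil {
		panic(err)
	}
	back, err := fromWire(w)
	if err != nil {
		panic(err)
	}
	fmt.Println(back.ResourceID, back.Outputs["cidr"], back.Config["replicas"])
}
```

One consequence of this encoding worth keeping in mind: numbers decoded into `map[string]any` come back as `float64`, so callers that need integers have to convert explicitly. The marshal and unmarshal steps are also where the benchmark commits in this PR locate the per-call cost for large states.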
* fix(iac-bench): validate SaveState input, close bufconn, broaden import audit regex
Addresses Copilot review on PR #669:
- benchStateBackendServer.SaveState now rejects nil State and propagates JSON unmarshal failures as InvalidArgument instead of silently writing corrupted/empty data.
- BenchmarkIaCStateBackend_GRPC closes the bufconn listener; comment no longer implies bufconn size sets the gRPC message cap.
- audit-cloud-symbols.sh real_import() single-line regex now matches aliased/dot/blank imports (import foo "pkg" / . / _), not just plain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
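For comparison with the regex-based real_import() check mentioned in the commit above, here is a minimal Go sketch of the same idea (detect real imports, ignore comments, accept aliased/dot/blank forms) using the standard go/parser package. This is a swapped-in technique for illustration, not what audit-cloud-symbols.sh does; `importsPrefix` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// importsPrefix (hypothetical helper) reports whether Go source src contains
// a real import whose path starts with prefix. Only parsed import
// declarations are inspected, so a comment mentioning the path never matches,
// and aliased, dot, and blank imports are all covered.
func importsPrefix(src, prefix string) (bool, error) {
	fset := token.NewFileSet()
	// parser.ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(fset, "input.go", src, parser.ImportsOnly)
	if err != nil {
		return false, err
	}
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return false, err
		}
		if strings.HasPrefix(path, prefix) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	samples := []struct{ name, src string }{
		{"aliased", "package m\n\nimport s3 \"github.com/aws/aws-sdk-go-v2/service/s3\"\n\nvar _ = s3.New\n"},
		{"blank", "package m\n\nimport _ \"github.com/aws/aws-sdk-go-v2/config\"\n"},
		{"comment-only", "package m\n\n// Requires github.com/aws/aws-sdk-go-v2 at runtime.\nimport \"fmt\"\n\nvar _ = fmt.Sprint\n"},
	}
	for _, s := range samples {
		found, err := importsPrefix(s.src, "github.com/aws/aws-sdk-go-v2")
		fmt.Println(s.name, found, err)
	}
}
```

A parser-based check trades the portability of a shell script for not having to enumerate import syntaxes by hand; whether that trade is worth it for a CI grep gate is a judgment call this sketch does not settle.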
intel352
added a commit
that referenced
this pull request
May 14, 2026
…+ security guards (#670) * docs(plans): cloud-SDK extraction design — workflow core → strict-contract plugins Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and cloud.google.com/go + google.golang.org/api direct deps from workflow core's module/ package. Architecture: 3 extension surfaces, 3 strategies: - IaC state backends → new IaCStateBackend strict proto contract; iac.state stays core, config.backend dispatches to plugin gRPC client. - platform.* provisioners → new PlatformBackend strict proto contract; module types + provider: key stay core, kind backend stays in-core, cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract. - standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload, storage.s3, storage.gcs) → plugin-native module/step types via the existing ModuleFactories/StepFactories SDK — no new contract. Credentials (Option 1): each plugin-native module carries its own credentials: block + builds aws.Config in-process; optional in-plugin credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp cloud_account files have no SDK import and stay. 4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp, D digitalocean (spaces backend, minor bump + migration doc). Includes Assumptions + Rollback sections + self-challenge top-3 doubts (PlatformBackend over-generality, provider-separability fragility, benchmark-could-invalidate-unary-default — all with mitigations deferred to writing-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 1 revisions Addresses 2 Critical + 5 Important findings from adversarial-design-review: Critical: - iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit home: deleted by Phase B's core PR; Phase D reframed from soft-compat to a real clean-break for the `spaces` backend. 
Goal "core drops aws-sdk-go-v2 entirely" is now actually achieved by the phases as written + enforced by a go list -deps CI gate. - kinesis: added Non-Goals entry explaining it's a transitive dep of modular/modules/eventbus/v2, not a direct workflow import — out of scope, with the go mod why chain documented so the literal ask is fully answered. Important: - Full grep-verified 13-file AWS inventory table in Phase B with per-file destinations; reconciled aws_api_gateway.go (route-sync module) vs platform_apigateway.go (provisioner) as two distinct files. - aksBackend assigned to Phase A (Azure gets the PlatformBackend half too); platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with explicit always-compiles coordination. - Proto contracts fold into existing plugin/external/proto/iac.proto (8 services already) instead of new files — matches precedent. - New Security section: secret-redaction in config-version-store/tracing + gRPC interceptor logging are blocking writing-plans tasks; credentials_ref blast radius documented as strictly narrower than today's cloud.account. Minor: - IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no speculative surface. - Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR). - IaCProviderRequired/ResourceDriver reuse promoted to a first-class Alternatives Considered entry with accept/reject rationale + retained as the gated fallback for PlatformBackend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 2 revisions Addresses 2 Critical + 3 Important from cycle-2 review: Critical: - platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with its own import block). The "always compiles across phases" property is now structural, not asserted. 
Added a verified per-file import-ownership table. - Corrected the false Phase A rationale: aksBackend uses raw net/http REST, NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend region). The Azure go.mod drop comes entirely from iac_state_azure.go deletion + iac_module.go edit; aksBackend extraction is code-organisation, not a dependency change. - Documented the eksBackend → cloud_account_aws.go call-graph edge as a hard same-commit atomicity constraint (verified: eksBackend calls awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138). Important: - Phase B core-PR bullet now explicitly lists "strip the spaces case from iac_module.go" (was only obliquely referenced). - New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds contract field; SaveState lost-response retry → documented idempotent (full-state replace, last-writer-wins); plugin-unreachable → abort before mutation; PlatformBackend mid-Apply crash → identical to today's in-process risk, no new mitigation. - §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no body-logging interceptor (grpc.NewServer(opts...) passthrough; only callback_server.go logs, never module config). Writing-plans adds a guard test instead of a conditional interceptor. Minor: file-count table footnoted (count = importers, not deletions); shared s3compat module added as Alternatives Considered #3 (deferred, not rejected); self-challenge doubt numbering tidied (2 mitigations cover 3 doubts, intentionally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): fix stale Phase A/B refs + Status line post-cycle-2 sed in the cycle-2 commit ran from the wrong cwd — Status line still said "cycle 1" and two interface-audit-spike references still said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 3 revisions Addresses 2 Critical + 2 Important from cycle-3 review: Critical (same root — symbol-level coupling the import-block audit missed): - parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers the plugin-bound backend files call. An import-block audit is symbol-blind. Fix: Phase 0 now does TWO moves — the file split AND relocating both helpers into a new SDK-free core module/cloud_helpers.go. Per-file table gains a "cross-file symbol deps (the trap)" column listing every helper edge per backend. Phase 0 acceptance criteria now include a grep that no core file references the helpers from their old homes. - §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file (module shell + kubernetesBackend interface + safeIntToInt32) — NOT touched by the split; only platform_kubernetes_kind.go (holds all 4 backends) is split. Earlier draft conflated the two files. Important: - Per-file ownership table relabelled "intended post-split — verified by the Phase 0 build gate" (was asserted-as-verified against an unsplit file — same hand-waving class cycle-2 flagged for "always compiles"). - lease_ttl_seconds DROPPED from the Phase A proto. It was a contract field with no enforced semantics and no implementing backend in scope — YAGNI. §Failure-modes orphaned-lock reworked: documented limitation + operator-side lock-object delete for recovery; TTL is a planned ADDITIVE follow-up paired with a conformance test, shipped with the first backend that honors expiry. Added explicit Lock-contention behavior (immediate error, matches today's in-process IaCStateStore.Lock — no new waiting state). Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence fixed; §Assumptions 2 notes Phase 0 de-risks it structurally. 
Also: removed a stray stale cycle-1 copy of this doc that was sitting untracked in the main workflow checkout (the canonical doc is here in the feat/cloud-sdk-extraction worktree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 4 revisions Addresses 2 Critical + 2 Important from cycle-4 review: Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't — that's eksBackend) and aksBackend has no cross-file deps (it does — CloudCredentials/CloudCredentialProvider from cloud_account.go, same as gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The symbol-ownership map is now a Phase 0 build artifact — scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a design-doc claim that rots on every edit. The design commits to the *method* + the *known shape* (cloud_account.go stays core; all 3 cloud backends bind to it via k.provider.GetCredentials; eksBackend additionally binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no cloud SDK). Critical 2 — Phase 0's "split into four, zero logic change" silently dropped the single func init() that registers kind/k3s/eks/gke/aks. Splitting REQUIRES partitioning init() per-file (a distribution, not zero-change). Phase 0 now has an explicit step 2 for the init() partition; relabelled "behavior-equivalent" not "zero logic change"; k3s documented as reusing kindBackend (both stay core). Important 1 — platform.* cloud credential flow across PlatformBackend was unspecified (aksBackend needs CloudCredentials — how does it reach the plugin?). Added: PlatformBackend requests carry a CloudCredentials proto message; engine resolves k.provider.GetCredentials() in-core (config-map parsing, no SDK) and serialises it. 
Unified with the Architecture-3 credentials story — ONE CloudCredentials proto shape for both surfaces, so secret-redaction has one shape to redact. Important 2 — core actually imports FOUR cloud SDK trees, not three: godo is still in cloud_account_do.go + 5 platform_do_*.go files. §Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT (user's ask was 3 trees), and the go list -deps gate is reworded to assert "zero packages from the three in-scope trees" not "zero cloud SDKs". All "zero cloud SDKs" phrasing reconciled throughout. Minor: ListStates filter + remaining-proto-messages notes folded in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 5 revisions Addresses 1 Critical + 2 Important from cycle-5 review: Critical — the init()-partition fix (cycle-4) was kubernetes-only, but the SAME defect class exists in platform_dns.go / platform_ecs.go / platform_networking.go / platform_autoscaling.go: each has a single func init() registering BOTH a core-staying `mock` backend AND a plugin-bound `aws` backend. The old Phase B inventory moved those files wholesale → would exile the mock backends + dangle the route53 registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go" to a repo-wide uniform `_core.go` / `_<provider>.go` convention across the WHOLE platform.* family. Every mixed init() is partitioned; the audit script flags any init() registering a mix of core-staying + plugin-bound factories as a CI failure. Phase B inventory rewritten to delete only `_aws.go`/`_eks.go` files, never a mixed file. Important 1 — the cycle-4 "Known shape" prose reintroduced hand-maintained cross-file symbol claims (one already incomplete: parseStringSlice consumers). FIX: cut all per-file symbol enumerations; the section now states only invariants the script VERIFIES (not discovers) + the method. No transcribed symbol lists remain. 
Important 2 + own finding — cycle-4 said the engine resolves credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile) and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK. FIX: §Architecture-2 corrected — engine passes the DECLARED credential config (plain strings) in the CloudCredentials proto; the PLUGIN resolves (incl. the SDK-bearing profile/role_arn paths). Both cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B, no core replacement — all AWS cred resolution moves plugin-side. azure/gcp resolver files stay (their resolvers are genuinely SDK-free). Minor — backend-name collision: core-reserved names (memory/filesystem/ postgres/kind/k3s/mock) cause a load-time error if a plugin collides, not silent shadowing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 6 revisions Addresses 1 Critical + 2 Important from cycle-6 review: Critical — cycle-5's credential-flow fix replaced one false claim with another: it said the CloudCredentials struct already holds "declared config (plain strings incl. profile)". VERIFIED FALSE — the struct (cloud_account.go:18) has no Profile field (profile lives in Extra map) and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner than the struct change the reviewer proposed: the struct needs NO change (Extra map already carries markers, RoleARN field exists). Instead, cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig, sts.AssumeRole) are removed; they keep their SDK-free heads (record declared inputs + an Extra["credential_source"] marker, exactly as awsStaticResolver already does). After the edit the file is SDK-free and stays in core alongside the azure/gcp resolver files. 
Only cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider + awsProviderFrom) is deleted; its profile-chain/STS logic moves into the plugin's buildAWSConfig. Every in-core resolver becomes uniformly "declare, don't resolve"; the plugin honors the markers. No unregistered- resolver failure mode — the resolver init() registrations stay. Important 1 — §Phase-0 misidentified the DNS file with the mixed init(). VERIFIED: platform_dns.go:66 has the init() (+ interface + factory registry); platform_dns_backends.go has both impls + the route53 SDK import, NO init(). DNS is a TWO-file split, unlike single-file ecs/networking/autoscaling. §Phase-0 now states the per-family layout explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling one-file) and notes the audit script determines it. Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit deferred-resolution markers for env/CLI/managed-identity/workload-identity/ profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5 now state the plugin MUST implement marker handling for every deferred type, not just AWS profile/role_arn. Minor — safeIntToInt32 relocation rationale clarified (it's a clean copy-source for the plugin-bound files, not a hard core necessity); parseStringSlice IS a hard necessity (its file is deleted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 7 revisions Addresses 2 Critical from cycle-7 review (architecture confirmed sound; these are the last two extraction-mechanic precision gaps): C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a clean contiguous tail, but awsRoleARNResolver's SDK block (base-config build + sts.AssumeRole, ~45 lines) is the larger half of the method, after the declared-input recording. 
FIX: §Architecture-2 re-characterizes the edit as a deliberate Resolve() body REWRITE (not a one-line snip) — explicitly per-resolver. Added a Phase B CI invariant: an import-block grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not prose-asserted. C2 — cloud_account_aws.go defines FOUR symbols, not one; the symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed: - AWSConfigProvider interface signature names aws.Config → CANNOT stay in core, deleted with the file. - awsProviderFrom → deleted with the interface. - ValidateCredentials → verified NO real caller (only a comment ref in cmd/wfctl/deploy.go:866) → deletes cleanly. - The 8 awsProviderFrom consumers are all verified plugin-bound — but each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the plugin there's no cloud.account to type-assert. §Cross-file-coupling invariant 3 now states Phase B must REWRITE all 8 consumers to obtain creds from the CloudCredentials proto + buildAWSConfig — explicit Phase B scope, not a footnote. Phase B table atomicity column updated. Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in Phase 0 so the dns family conforms to the uniform _core.go/_aws.go naming; no special-case three-file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-8 re-baseline against post-#653 main Cycle-8 adversarial review caught the design's file/symbol inventory as stale: it predated issue #653 (closed 2026-05-13), which already removed the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild + EKS backends. 
Re-baselined every file/symbol claim against origin/main HEAD (worktree confirmed 0 commits behind origin/main): - Added "Relationship to issue #653" section — this design is #653's named successor, extracting the AWS surface #653 scoped out ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces. - Problem table corrected: AWS 6 real-import files (not 13), Azure 3, GCP 3. storage_artifact_s3.go is comment-only — stays in core. - cloud_account_aws.go is dead code — zero non-test consumers verified; deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers removed by #653). - Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go); parseStringSlice + safeIntToInt32 no longer exist — helper-relocation task deleted. - PlatformBackend now serves only aks + gke (eks already a #653 SDK-free stub); interface-audit spike audits one interface, not five. - Phase B inventory rewritten; Phase A/C file lists corrected. - Self-challenge doubt #4 + Assumption 7 added: inventory staleness is the cycle-8 defect class; audit script makes it CI-enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-9 re-baseline + audit script Cycle-9 adversarial review caught aksBackend mis-classified as an azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match is a stale doc comment (line 332) — aksBackend.azureToken is a plain net/http OAuth2 client. An import-block-disciplined re-survey found a second comment-only false positive: nosql_dynamodb.go. Structural fix for the recurring "grep matched a comment" defect class: added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks (never comments) and emits the comment-immune real-import map. Its output now populates every file table in the design — prose claims replaced by a build artifact. Formalized + CI-wired in Phase 0. 
Corrected inventory (audit-script output): AWS 5 real-import files (not 6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go are comment-only stubs — out of scope, stay in core. Design consequences of aksBackend being SDK-free: - Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all stay in core. - Architecture §2 no longer proposes a new PlatformBackend contract. The gke cross-process mechanism is gated on an interface-audit spike whose preferred outcome is folding into the existing ResourceDriver contract — a dedicated contract for one backend is YAGNI. - Phase A (Azure) is now pure IaCStateBackend — touches no platform file. - Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/ eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and fixes the stale line-332 comment. - The gke platform extraction + its contract decision move to Phase C. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-10 re-baseline, AWS scope boundary explicit Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9 audit script scanned only module/, missing five aws-sdk-go-v2 importers under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal ("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by the four phases as written. Structural fix — third defect-class variant closed: - audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and splits results module/ vs. elsewhere. Comment-immune (cycle 9) + scope-complete (cycle 10) + CI-enforced (Phase 0). Whole-repo inventory result: - Azure + GCP SDK usage is entirely module/-resident → Phases A and C drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate). - aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go. 
Scope decision: the out-of-module/ AWS surface is exactly #653's deliberately-retained "RBAC/secrets/artifact stay" scope (plus the provider/aws deploy provider). This design does NOT unilaterally override #653's recent documented decision — it scopes that surface OUT (new Non-Goal, parallel to godo) and logs a recommended successor issue. Consequences threaded through the doc: - Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the out-of-scope surface). - Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP, module/-scoped zero for AWS. - Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's scope decision is respected, not contested. - Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check + sessionName extraction sit between declared-input recording and the SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale line-18 backend-list comment), M2 (internal/legacyaws noted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-11 PASS, minor cleanups Adversarial review cycle 11: PASS (zero Critical, zero Important). Two Minor nits applied: - audit-cloud-symbols.sh: real_import now also matches single-line `import "..."` form, not just parenthesized blocks — closes the one latent parser false-negative the reviewer flagged. - §Goals: clarified that the module/-scoped AWS-zero `--check` assertion is deferred-implementation added in Phase C (the committed script only enforces the cloud_account_aws_creds.go post-Phase-B invariant today), parallel to the Phase 0 init()-partition deferral. Design phase complete — proceeding to writing-plans. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scripts): audit-cloud-symbols single-line-import grep poisoned the pipe The cycle-11 single-line-import hardening added an inner `grep -E '^import "'` whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`, making real_import() return false for every file lacking a single-line import. Added `|| true` on the inner grep. Verified: full report restored, all REAL/comment-only classifications correct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction implementation plan (Phase 0 + Phase A) Bite-sized TDD plan for the first executable increment: Phase 0 (split platform_kubernetes_kind.go, fix the stale comment, wire the audit script into CI) + Phase A (IaCStateBackend proto + benchmark-gated proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging guard, workflow-plugin-azure implementation, core deletion dropping azure-sdk from go.mod). 14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on plan — their concrete tasks depend on Phase A's outputs (the benchmark-validated proto shape, the host-resolution pattern, the plugin-side serve path), so planning them now would be fiction. The design doc remains the authoritative B/C/D spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — address plan-phase adversarial review Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor). All addressed: - C1 (Critical): Task 4's proto used google.protobuf.Struct, which iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config as `bytes outputs_json`/`bytes config_json` (the established ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not structpb. Removed the bogus struct.proto import step. 
- I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives there), not `cd plugin/external/proto`. - I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up the new benchmarks automatically) — no redundant harness; clarified the task is a one-time decision gate. - I3: Task 8's embedded research spike resolved at plan time — engine.go was read; integration is the design-sanctioned package-level module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal. Tasks 8/13/14 now have concrete file sets. - I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo, workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly. - M1: Task 5's benchmark file is now genuinely self-contained (local benchStateToProto + benchStateBackendServer; no forward references). - M2: Task 3 names ci.yml directly, places the audit job beside the existing godo-banned/aws-sdk-banned grep-gate jobs. - M3: Task 6 pins benchstat (go install + bare invocation). - M4: Task 9 states the redaction gap is verified against step_output_redactor.go:7-19, not a live deduction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — plan-review cycle 2 fixes Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed resolved; 2 new Important + 2 Minor surfaced by the new-defect scan. All addressed: - I-A (Important): Task 9's redaction test was inconsistent with the actual redactMap behavior — a key named `credentials` matches the existing `credential` pattern and is wholesale-replaced with the placeholder STRING before any recursion, so the test's `.(map[string]any)` assertion panicked. 
Reworked Task 9: the `credentials:` block is ALREADY redacted wholesale (regression-tested); the real gap is `credentials_ref` being over-redacted (it's a module name, not a secret) — fix is a narrow `*_ref`-suffix exemption in isSensitiveField, not camelCase leaf patterns (which would be dead code given wholesale redaction happens first). - I-B (Important): Task 14's engine.go integration seam was under-specified and would fight loadPluginInternal's no-concrete-types precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now defines an `IaCStateBackendProvider` optional interface and type-asserts it in loadPluginInternal exactly like the existing stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter implements it. Concrete file set + code sketch added. - M-i: Task 6's benchmark.yml description corrected (runs `go test -bench=.` inline, not `make bench-baseline`). - M-ii: Task 4 notes the proto README's plugin.proto-specific wording is stale; trust root buf.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — plan-review cycle 3 PASS + minor cleanups Plan-phase adversarial review cycle 3: PASS (zero Critical, zero Important). Two Minor doc-tightening fixes applied: - Task 9 Step 4 now names bearer_token_ref explicitly and explains why the *_ref exemption is safe for it (SecretRef is a reference struct, not a raw secret) rather than claiming no *_ref field exists. - engine.go line citations corrected to 311-326. Plan phase complete — proceeding to alignment-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: lock scope for cloud-sdk-extraction (alignment passed) * refactor(module): split platform_kubernetes_kind.go into _core + _gke Phase 0 precursor for cloud-SDK extraction. 
kindBackend/eksErrorBackend/ aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core init(); gkeBackend (the only SDK-bearing k8s backend) moves to platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same five backend names registered. Isolates the lone SDK-bearing platform file for a later clean deletion. * docs(module): add file-purpose headers to platform_kubernetes _core/_gke Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition self-documenting for readers without the commit message. * docs(module): fix stale 'Requires the Azure SDK' comment on aksBackend aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk consumer. The stale comment is what fooled an earlier inventory pass into mis-counting platform_kubernetes_kind.go as an azure-sdk importer. * ci(audit): enforce k8s-backend init() partition + run audit on every PR Extends audit-cloud-symbols.sh --check with an init()-partition assertion (platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned / aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced artifact rather than a prose claim. * docs(plans): IaCStateBackend transport benchmark result — decision pending Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar. Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB map[string]any (inherent to the bytes outputs_json wire format the iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB message cap. The plan's contingency remedy (streaming redesign) addresses message-size-cap + memory-buffering, neither of which the benchmark hits; streaming would not move the number. Recommendation: retain unary (6.5ms is still negligible vs real cloud backend I/O — the design's own bar-rationale). 
Deviation from the literal 5ms estimate-bar is surfaced to the operator, not absorbed silently. Scope lock intact: Task 6 run + recorded, no task added/dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): Task 6 resolved — unary IaCStateBackend LOCKED (operator-confirmed) Operator reviewed the 6.51ms benchmark + root-cause analysis and confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's gate resolves Unary LOCKED — the Task 4 proto stands, no streaming redesign, PR 2/3 proceed unchanged. Operator additionally raised a long-term architectural item: IaC state is persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR) with JSON-export + content-detection-on-read would be better for processing/type-correctness/large-state scaling. Logged as a post-extraction follow-up in both the benchmark decision record and the design doc's Open items — distinct from the wire contract, cross-cutting across all IaCStateStore impls, needs its own brainstorming pass. Not actioned in this locked plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Revert "chore: lock scope for cloud-sdk-extraction (alignment passed)" This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954. * docs(plans): amend cloud-sdk-extraction plan — PR 6 (ctx) + de-gate PR 4 Operator-approved scope amendment to the (reverted-to-Draft) plan: - ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 / Task 15. Task 7 had to hardcode context.Background() in grpcIaCStateStore; the operator directed widening the interface now while we're at that boundary, so Phase B/C/D plugin backends inherit it ctx-ful. Bounded blast radius (~9 files, all in module/); interfaces.IaCStateStore already had ctx and is untouched. - ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo. 
Operator: agents should operate in plugin repos directly; the real requirement is prompt clarity (absolute repo path stated up front), not a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes updated accordingly. - Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note updated to RESOLVED (unary locked). Next: re-run alignment-check on the amended plan, then re-lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: re-lock scope for cloud-sdk-extraction (amended — alignment re-passed) * feat(proto): add IaCStateBackend service to iac.proto Strict 6-method contract mirroring module.IaCStateStore 1:1, with an IaCState message mirroring module.IaCState. Free-form Outputs/Config maps cross the wire as bytes outputs_json/config_json per the iac.proto hard invariant (NO google.protobuf.Struct) — same pattern as ResourceState.outputs_json. Unary RPCs. No TTL field. Regenerated bindings via buf. * test(module): add IaCStateBackend gRPC-vs-in-process benchmark harness Drives a ~1 MB synthetic IaCState through Lock/GetState/SaveState/Unlock both in-process (baseline) and over a real bufconn gRPC boundary (post-extraction path). Self-contained (local benchStateToProto + benchStateBackendServer; Task 7 promotes production versions). Feeds the unary-vs-streaming proto-transport decision in the next task. * test(wftest): add IaCStateBackend to iacServiceChecks coverage table Task 4 added the IaCStateBackend service to iac.proto but missed the corresponding iacServiceChecks row in wftest/bdd/strict_iac.go. TestIaCServiceChecks_CoversEveryProtoService enforces parity between iac.proto's services and that table — it was failing on the missing entry. Belongs with PR 2 (the proto PR). 
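The "bytes outputs_json/config_json" wire pattern described in the feat(proto) commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `state`, `wireState`, `toWire`, and `fromWire` are hypothetical stand-ins for module.IaCState, the generated IaCState proto message, and the real converters, and only the standard library's encoding/json is assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// state is a hypothetical stand-in for module.IaCState.
type state struct {
	ResourceID string
	Outputs    map[string]any
	Config     map[string]any
}

// wireState is a hypothetical stand-in for the generated proto message.
// Free-form maps travel as JSON-encoded bytes rather than as a Struct.
type wireState struct {
	ResourceID  string
	OutputsJSON []byte
	ConfigJSON  []byte
}

// toWire encodes the free-form maps with encoding/json.
func toWire(s state) (wireState, error) {
	outputs, err := json.Marshal(s.Outputs)
	if err != nil {
		return wireState{}, fmt.Errorf("marshal outputs: %w", err)
	}
	config, err := json.Marshal(s.Config)
	if err != nil {
		return wireState{}, fmt.Errorf("marshal config: %w", err)
	}
	return wireState{ResourceID: s.ResourceID, OutputsJSON: outputs, ConfigJSON: config}, nil
}

// fromWire decodes the JSON bytes back into maps. Empty byte slices are
// treated as "no data" and leave the corresponding map nil.
func fromWire(w wireState) (state, error) {
	s := state{ResourceID: w.ResourceID}
	if len(w.OutputsJSON) > 0 {
		if err := json.Unmarshal(w.OutputsJSON, &s.Outputs); err != nil {
			return state{}, fmt.Errorf("unmarshal outputs: %w", err)
		}
	}
	if len(w.ConfigJSON) > 0 {
		if err := json.Unmarshal(w.ConfigJSON, &s.Config); err != nil {
			return state{}, fmt.Errorf("unmarshal config: %w", err)
		}
	}
	return s, nil
}

func main() {
	in := state{
		ResourceID: "example-resource",
		Outputs:    map[string]any{"endpoint": "10.0.0.1"},
		Config:     map[string]any{"region": "example-region"},
	}
	w, err := toWire(in)
	if err != nil {
		panic(err)
	}
	out, err := fromWire(w)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.ResourceID, out.Outputs["endpoint"], out.Config["region"])
}
```

One property of this approach worth keeping in mind: encoding/json decodes every JSON number in a `map[string]any` as `float64`, so integer-typed values do not survive the round trip with their original Go type.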
* feat(module): IaCState proto converters + grpcIaCStateStore client adapter grpcIaCStateStore implements module.IaCStateStore over an IaCStateBackendClient — the host-side half of the new contract. iacStateToProto/iacStateFromProto convert the free-form Outputs/Config maps via encoding/json (no structpb — iac.proto hard invariant). iacStateBackendServer is the production server type. Promotes these out of the benchmark file so one canonical copy is shared. * docs(module): note context.Background() follow-up on grpcIaCStateStore Code-review Minor: the spec asked for the hardcoded context.Background() to be acknowledged as a known follow-up (IaCStateStore has no ctx param) rather than silently used. * feat(module): engine-side iac.state plugin-backend registry + dispatch A package-level iacStateBackendRegistry maps a backend name to a pb.IaCStateBackendClient; the engine populates it at plugin-load time (Task 14). IaCModule.Init()'s switch gains a default arm that resolves non-core backend names from the registry, constructing a grpcIaCStateStore. Reserved core names (memory/filesystem/postgres) are rejected at registration. The existing in-process backend cases (incl. azure_blob) are untouched here — the plumbing exists and is tested; PR 5 flips azure_blob onto it. * feat(module): exempt *_ref keys from redaction; lock in credentials: redaction Option-1 credentials move raw cloud secrets inline into plugin-native module config under a credentials: key — already redacted wholesale by the existing 'credential' pattern (regression test added). But that same pattern over-redacts credentials_ref:, which holds a module NAME, not a secret. Adds a narrow *_ref-suffix exemption to isSensitiveField so reference keys are preserved for trace debuggability. * refactor(module): name the _ref redaction-exemption suffix as a const Code-review Minor: refFieldSuffix const for consistency with the existing safeFieldSuffix (_display) exemption. 
* test(plugin/external): guard against gRPC body-logging interceptors

  CreateModule requests carry inline credentials: blocks (Option-1 credentials model). This guard fails CI if any plugin/external/ file gains a gRPC interceptor option, forcing a reviewer to confirm it cannot log request bodies. Implements the cloud-sdk-extraction design's Security guard-test requirement.

* test(plugin/external): broaden interceptor guard to Stream interceptors

  Code-review catch: the guard regex covered Unary only. CreateModule is unary today, but a future streaming RPC carrying credentials must not slip a stream interceptor past the guard. Now matches (Unary|Stream).

* fix(iac-host): narrow _ref redaction exemption, validate registry input, harden guard test

  Addresses Copilot review on PR #670:
  - step_output_redactor: the "_ref" suffix no longer blanket-bypasses redaction. It exempts only structural-reference words ("credential"), so credentials_ref is preserved but bearer_token_ref / api_key_ref / secret_ref still redact (token/api_key/secret are value-bearing).
  - iac_state_plugin_registry.register: rejects empty/whitespace names and nil clients; trims the name before use.
  - grpc_logging_guard_test: walks the whole plugin/external/ tree (catches subpackages like sdk/), skips generated *.pb.go / proto/ files to avoid false positives, and adds a real interceptorAllowlist mechanism the failure message now references.
  - iac_state_grpc_client_test + benchmark_iac_state_backend_test: close the bufconn listener via t.Cleanup/b.Cleanup; benchmark comment no longer implies bufconn size sets the gRPC message cap.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
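The narrowed "_ref" exemption described in the fix(iac-host) commit could be approximated as below. This is only a sketch under stated assumptions: the pattern lists, the substring-matching strategy, and the body of `isSensitiveField` are guesses for illustration, and only the names isSensitiveField and refFieldSuffix come from the commit messages.

```go
package main

import (
	"fmt"
	"strings"
)

// refFieldSuffix marks keys that hold a reference to something else.
const refFieldSuffix = "_ref"

// valueBearingPatterns always redact, even with a _ref suffix (assumed list,
// taken from the words the commit message calls value-bearing).
var valueBearingPatterns = []string{"token", "api_key", "secret"}

// structuralPatterns redact unless the key is a reference (assumed list).
var structuralPatterns = []string{"credential"}

// isSensitiveField reports whether a config key should be redacted.
func isSensitiveField(key string) bool {
	k := strings.ToLower(key)
	for _, p := range valueBearingPatterns {
		if strings.Contains(k, p) {
			return true // value-bearing words never get the _ref exemption
		}
	}
	for _, p := range structuralPatterns {
		if strings.Contains(k, p) {
			// A structural word with a _ref suffix names a module, not a secret.
			return !strings.HasSuffix(k, refFieldSuffix)
		}
	}
	return false
}

func main() {
	keys := []string{"credentials", "credentials_ref", "bearer_token_ref", "api_key_ref", "secret_ref", "region"}
	for _, key := range keys {
		fmt.Printf("%-18s redact=%v\n", key, isSensitiveField(key))
	}
}
```

Checking value-bearing patterns first means a key that mixes both kinds of word is still redacted, which errs on the side of hiding more rather than less.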
intel352 added a commit that referenced this pull request May 14, 2026
…teStore (#671) * docs(plans): cloud-SDK extraction design — workflow core → strict-contract plugins Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and cloud.google.com/go + google.golang.org/api direct deps from workflow core's module/ package. Architecture: 3 extension surfaces, 3 strategies: - IaC state backends → new IaCStateBackend strict proto contract; iac.state stays core, config.backend dispatches to plugin gRPC client. - platform.* provisioners → new PlatformBackend strict proto contract; module types + provider: key stay core, kind backend stays in-core, cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract. - standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload, storage.s3, storage.gcs) → plugin-native module/step types via the existing ModuleFactories/StepFactories SDK — no new contract. Credentials (Option 1): each plugin-native module carries its own credentials: block + builds aws.Config in-process; optional in-plugin credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp cloud_account files have no SDK import and stay. 4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp, D digitalocean (spaces backend, minor bump + migration doc). Includes Assumptions + Rollback sections + self-challenge top-3 doubts (PlatformBackend over-generality, provider-separability fragility, benchmark-could-invalidate-unary-default — all with mitigations deferred to writing-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 1 revisions Addresses 2 Critical + 5 Important findings from adversarial-design-review: Critical: - iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit home: deleted by Phase B's core PR; Phase D reframed from soft-compat to a real clean-break for the `spaces` backend. 
Goal "core drops aws-sdk-go-v2 entirely" is now actually achieved by the phases as written + enforced by a go list -deps CI gate. - kinesis: added Non-Goals entry explaining it's a transitive dep of modular/modules/eventbus/v2, not a direct workflow import — out of scope, with the go mod why chain documented so the literal ask is fully answered. Important: - Full grep-verified 13-file AWS inventory table in Phase B with per-file destinations; reconciled aws_api_gateway.go (route-sync module) vs platform_apigateway.go (provisioner) as two distinct files. - aksBackend assigned to Phase A (Azure gets the PlatformBackend half too); platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with explicit always-compiles coordination. - Proto contracts fold into existing plugin/external/proto/iac.proto (8 services already) instead of new files — matches precedent. - New Security section: secret-redaction in config-version-store/tracing + gRPC interceptor logging are blocking writing-plans tasks; credentials_ref blast radius documented as strictly narrower than today's cloud.account. Minor: - IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no speculative surface. - Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR). - IaCProviderRequired/ResourceDriver reuse promoted to a first-class Alternatives Considered entry with accept/reject rationale + retained as the gated fallback for PlatformBackend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 2 revisions Addresses 2 Critical + 3 Important from cycle-2 review: Critical: - platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with its own import block). The "always compiles across phases" property is now structural, not asserted. 
Added a verified per-file import-ownership table. - Corrected the false Phase A rationale: aksBackend uses raw net/http REST, NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend region). The Azure go.mod drop comes entirely from iac_state_azure.go deletion + iac_module.go edit; aksBackend extraction is code-organisation, not a dependency change. - Documented the eksBackend → cloud_account_aws.go call-graph edge as a hard same-commit atomicity constraint (verified: eksBackend calls awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138). Important: - Phase B core-PR bullet now explicitly lists "strip the spaces case from iac_module.go" (was only obliquely referenced). - New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds contract field; SaveState lost-response retry → documented idempotent (full-state replace, last-writer-wins); plugin-unreachable → abort before mutation; PlatformBackend mid-Apply crash → identical to today's in-process risk, no new mitigation. - §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no body-logging interceptor (grpc.NewServer(opts...) passthrough; only callback_server.go logs, never module config). Writing-plans adds a guard test instead of a conditional interceptor. Minor: file-count table footnoted (count = importers, not deletions); shared s3compat module added as Alternatives Considered #3 (deferred, not rejected); self-challenge doubt numbering tidied (2 mitigations cover 3 doubts, intentionally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): fix stale Phase A/B refs + Status line post-cycle-2 sed in the cycle-2 commit ran from the wrong cwd — Status line still said "cycle 1" and two interface-audit-spike references still said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 3 revisions Addresses 2 Critical + 2 Important from cycle-3 review: Critical (same root — symbol-level coupling the import-block audit missed): - parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers the plugin-bound backend files call. An import-block audit is symbol-blind. Fix: Phase 0 now does TWO moves — the file split AND relocating both helpers into a new SDK-free core module/cloud_helpers.go. Per-file table gains a "cross-file symbol deps (the trap)" column listing every helper edge per backend. Phase 0 acceptance criteria now include a grep that no core file references the helpers from their old homes. - §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file (module shell + kubernetesBackend interface + safeIntToInt32) — NOT touched by the split; only platform_kubernetes_kind.go (holds all 4 backends) is split. Earlier draft conflated the two files. Important: - Per-file ownership table relabelled "intended post-split — verified by the Phase 0 build gate" (was asserted-as-verified against an unsplit file — same hand-waving class cycle-2 flagged for "always compiles"). - lease_ttl_seconds DROPPED from the Phase A proto. It was a contract field with no enforced semantics and no implementing backend in scope — YAGNI. §Failure-modes orphaned-lock reworked: documented limitation + operator-side lock-object delete for recovery; TTL is a planned ADDITIVE follow-up paired with a conformance test, shipped with the first backend that honors expiry. Added explicit Lock-contention behavior (immediate error, matches today's in-process IaCStateStore.Lock — no new waiting state). Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence fixed; §Assumptions 2 notes Phase 0 de-risks it structurally. 
Also: removed a stray stale cycle-1 copy of this doc that was sitting untracked in the main workflow checkout (the canonical doc is here in the feat/cloud-sdk-extraction worktree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 4 revisions Addresses 2 Critical + 2 Important from cycle-4 review: Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't — that's eksBackend) and aksBackend has no cross-file deps (it does — CloudCredentials/CloudCredentialProvider from cloud_account.go, same as gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The symbol-ownership map is now a Phase 0 build artifact — scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a design-doc claim that rots on every edit. The design commits to the *method* + the *known shape* (cloud_account.go stays core; all 3 cloud backends bind to it via k.provider.GetCredentials; eksBackend additionally binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no cloud SDK). Critical 2 — Phase 0's "split into four, zero logic change" silently dropped the single func init() that registers kind/k3s/eks/gke/aks. Splitting REQUIRES partitioning init() per-file (a distribution, not zero-change). Phase 0 now has an explicit step 2 for the init() partition; relabelled "behavior-equivalent" not "zero logic change"; k3s documented as reusing kindBackend (both stay core). Important 1 — platform.* cloud credential flow across PlatformBackend was unspecified (aksBackend needs CloudCredentials — how does it reach the plugin?). Added: PlatformBackend requests carry a CloudCredentials proto message; engine resolves k.provider.GetCredentials() in-core (config-map parsing, no SDK) and serialises it. 
Unified with the Architecture-3 credentials story — ONE CloudCredentials proto shape for both surfaces, so secret-redaction has one shape to redact. Important 2 — core actually imports FOUR cloud SDK trees, not three: godo is still in cloud_account_do.go + 5 platform_do_*.go files. §Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT (user's ask was 3 trees), and the go list -deps gate is reworded to assert "zero packages from the three in-scope trees" not "zero cloud SDKs". All "zero cloud SDKs" phrasing reconciled throughout. Minor: ListStates filter + remaining-proto-messages notes folded in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 5 revisions Addresses 1 Critical + 2 Important from cycle-5 review: Critical — the init()-partition fix (cycle-4) was kubernetes-only, but the SAME defect class exists in platform_dns.go / platform_ecs.go / platform_networking.go / platform_autoscaling.go: each has a single func init() registering BOTH a core-staying `mock` backend AND a plugin-bound `aws` backend. The old Phase B inventory moved those files wholesale → would exile the mock backends + dangle the route53 registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go" to a repo-wide uniform `_core.go` / `_<provider>.go` convention across the WHOLE platform.* family. Every mixed init() is partitioned; the audit script flags any init() registering a mix of core-staying + plugin-bound factories as a CI failure. Phase B inventory rewritten to delete only `_aws.go`/`_eks.go` files, never a mixed file. Important 1 — the cycle-4 "Known shape" prose reintroduced hand-maintained cross-file symbol claims (one already incomplete: parseStringSlice consumers). FIX: cut all per-file symbol enumerations; the section now states only invariants the script VERIFIES (not discovers) + the method. No transcribed symbol lists remain. 
Important 2 + own finding — cycle-4 said the engine resolves credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile) and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK. FIX: §Architecture-2 corrected — engine passes the DECLARED credential config (plain strings) in the CloudCredentials proto; the PLUGIN resolves (incl. the SDK-bearing profile/role_arn paths). Both cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B, no core replacement — all AWS cred resolution moves plugin-side. azure/gcp resolver files stay (their resolvers are genuinely SDK-free). Minor — backend-name collision: core-reserved names (memory/filesystem/ postgres/kind/k3s/mock) cause a load-time error if a plugin collides, not silent shadowing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 6 revisions Addresses 1 Critical + 2 Important from cycle-6 review: Critical — cycle-5's credential-flow fix replaced one false claim with another: it said the CloudCredentials struct already holds "declared config (plain strings incl. profile)". VERIFIED FALSE — the struct (cloud_account.go:18) has no Profile field (profile lives in Extra map) and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner than the struct change the reviewer proposed: the struct needs NO change (Extra map already carries markers, RoleARN field exists). Instead, cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig, sts.AssumeRole) are removed; they keep their SDK-free heads (record declared inputs + an Extra["credential_source"] marker, exactly as awsStaticResolver already does). After the edit the file is SDK-free and stays in core alongside the azure/gcp resolver files. 
Only cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider + awsProviderFrom) is deleted; its profile-chain/STS logic moves into the plugin's buildAWSConfig. Every in-core resolver becomes uniformly "declare, don't resolve"; the plugin honors the markers. No unregistered- resolver failure mode — the resolver init() registrations stay. Important 1 — §Phase-0 misidentified the DNS file with the mixed init(). VERIFIED: platform_dns.go:66 has the init() (+ interface + factory registry); platform_dns_backends.go has both impls + the route53 SDK import, NO init(). DNS is a TWO-file split, unlike single-file ecs/networking/autoscaling. §Phase-0 now states the per-family layout explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling one-file) and notes the audit script determines it. Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit deferred-resolution markers for env/CLI/managed-identity/workload-identity/ profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5 now state the plugin MUST implement marker handling for every deferred type, not just AWS profile/role_arn. Minor — safeIntToInt32 relocation rationale clarified (it's a clean copy-source for the plugin-bound files, not a hard core necessity); parseStringSlice IS a hard necessity (its file is deleted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — adversarial review cycle 7 revisions Addresses 2 Critical from cycle-7 review (architecture confirmed sound; these are the last two extraction-mechanic precision gaps): C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a clean contiguous tail, but awsRoleARNResolver's SDK block (base-config build + sts.AssumeRole, ~45 lines) is the larger half of the method, after the declared-input recording. 
FIX: §Architecture-2 re-characterizes the edit as a deliberate Resolve() body REWRITE (not a one-line snip) — explicitly per-resolver. Added a Phase B CI invariant: an import-block grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not prose-asserted. C2 — cloud_account_aws.go defines FOUR symbols, not one; the symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed: - AWSConfigProvider interface signature names aws.Config → CANNOT stay in core, deleted with the file. - awsProviderFrom → deleted with the interface. - ValidateCredentials → verified NO real caller (only a comment ref in cmd/wfctl/deploy.go:866) → deletes cleanly. - The 8 awsProviderFrom consumers are all verified plugin-bound — but each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the plugin there's no cloud.account to type-assert. §Cross-file-coupling invariant 3 now states Phase B must REWRITE all 8 consumers to obtain creds from the CloudCredentials proto + buildAWSConfig — explicit Phase B scope, not a footnote. Phase B table atomicity column updated. Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in Phase 0 so the dns family conforms to the uniform _core.go/_aws.go naming; no special-case three-file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-8 re-baseline against post-#653 main Cycle-8 adversarial review caught the design's file/symbol inventory as stale: it predated issue #653 (closed 2026-05-13), which already removed the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild + EKS backends. 
Re-baselined every file/symbol claim against origin/main HEAD (worktree confirmed 0 commits behind origin/main): - Added "Relationship to issue #653" section — this design is #653's named successor, extracting the AWS surface #653 scoped out ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces. - Problem table corrected: AWS 6 real-import files (not 13), Azure 3, GCP 3. storage_artifact_s3.go is comment-only — stays in core. - cloud_account_aws.go is dead code — zero non-test consumers verified; deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers removed by #653). - Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go); parseStringSlice + safeIntToInt32 no longer exist — helper-relocation task deleted. - PlatformBackend now serves only aks + gke (eks already a #653 SDK-free stub); interface-audit spike audits one interface, not five. - Phase B inventory rewritten; Phase A/C file lists corrected. - Self-challenge doubt #4 + Assumption 7 added: inventory staleness is the cycle-8 defect class; audit script makes it CI-enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-9 re-baseline + audit script Cycle-9 adversarial review caught aksBackend mis-classified as an azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match is a stale doc comment (line 332) — aksBackend.azureToken is a plain net/http OAuth2 client. An import-block-disciplined re-survey found a second comment-only false positive: nosql_dynamodb.go. Structural fix for the recurring "grep matched a comment" defect class: added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks (never comments) and emits the comment-immune real-import map. Its output now populates every file table in the design — prose claims replaced by a build artifact. Formalized + CI-wired in Phase 0. 
Corrected inventory (audit-script output): AWS 5 real-import files (not 6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go are comment-only stubs — out of scope, stay in core. Design consequences of aksBackend being SDK-free: - Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all stay in core. - Architecture §2 no longer proposes a new PlatformBackend contract. The gke cross-process mechanism is gated on an interface-audit spike whose preferred outcome is folding into the existing ResourceDriver contract — a dedicated contract for one backend is YAGNI. - Phase A (Azure) is now pure IaCStateBackend — touches no platform file. - Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/ eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and fixes the stale line-332 comment. - The gke platform extraction + its contract decision move to Phase C. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-10 re-baseline, AWS scope boundary explicit Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9 audit script scanned only module/, missing five aws-sdk-go-v2 importers under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal ("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by the four phases as written. Structural fix — third defect-class variant closed: - audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and splits results module/ vs. elsewhere. Comment-immune (cycle 9) + scope-complete (cycle 10) + CI-enforced (Phase 0). Whole-repo inventory result: - Azure + GCP SDK usage is entirely module/-resident → Phases A and C drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate). - aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go. 
Scope decision: the out-of-module/ AWS surface is exactly #653's deliberately-retained "RBAC/secrets/artifact stay" scope (plus the provider/aws deploy provider). This design does NOT unilaterally override #653's recent documented decision — it scopes that surface OUT (new Non-Goal, parallel to godo) and logs a recommended successor issue. Consequences threaded through the doc: - Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the out-of-scope surface). - Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP, module/-scoped zero for AWS. - Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's scope decision is respected, not contested. - Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check + sessionName extraction sit between declared-input recording and the SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale line-18 backend-list comment), M2 (internal/legacyaws noted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction design — cycle-11 PASS, minor cleanups Adversarial review cycle 11: PASS (zero Critical, zero Important). Two Minor nits applied: - audit-cloud-symbols.sh: real_import now also matches single-line `import "..."` form, not just parenthesized blocks — closes the one latent parser false-negative the reviewer flagged. - §Goals: clarified that the module/-scoped AWS-zero `--check` assertion is deferred-implementation added in Phase C (the committed script only enforces the cloud_account_aws_creds.go post-Phase-B invariant today), parallel to the Phase 0 init()-partition deferral. Design phase complete — proceeding to writing-plans. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scripts): audit-cloud-symbols single-line-import grep poisoned the pipe The cycle-11 single-line-import hardening added an inner `grep -E '^import "'` whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`, making real_import() return false for every file lacking a single-line import. Added `|| true` on the inner grep. Verified: full report restored, all REAL/comment-only classifications correct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction implementation plan (Phase 0 + Phase A) Bite-sized TDD plan for the first executable increment: Phase 0 (split platform_kubernetes_kind.go, fix the stale comment, wire the audit script into CI) + Phase A (IaCStateBackend proto + benchmark-gated proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging guard, workflow-plugin-azure implementation, core deletion dropping azure-sdk from go.mod). 14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on plan — their concrete tasks depend on Phase A's outputs (the benchmark-validated proto shape, the host-resolution pattern, the plugin-side serve path), so planning them now would be fiction. The design doc remains the authoritative B/C/D spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — address plan-phase adversarial review Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor). All addressed: - C1 (Critical): Task 4's proto used google.protobuf.Struct, which iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config as `bytes outputs_json`/`bytes config_json` (the established ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not structpb. Removed the bogus struct.proto import step. 
- I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives there), not `cd plugin/external/proto`. - I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up the new benchmarks automatically) — no redundant harness; clarified the task is a one-time decision gate. - I3: Task 8's embedded research spike resolved at plan time — engine.go was read; integration is the design-sanctioned package-level module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal. Tasks 8/13/14 now have concrete file sets. - I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo, workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly. - M1: Task 5's benchmark file is now genuinely self-contained (local benchStateToProto + benchStateBackendServer; no forward references). - M2: Task 3 names ci.yml directly, places the audit job beside the existing godo-banned/aws-sdk-banned grep-gate jobs. - M3: Task 6 pins benchstat (go install + bare invocation). - M4: Task 9 states the redaction gap is verified against step_output_redactor.go:7-19, not a live deduction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — plan-review cycle 2 fixes Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed resolved; 2 new Important + 2 Minor surfaced by the new-defect scan. All addressed: - I-A (Important): Task 9's redaction test was inconsistent with the actual redactMap behavior — a key named `credentials` matches the existing `credential` pattern and is wholesale-replaced with the placeholder STRING before any recursion, so the test's `.(map[string]any)` assertion panicked. 
Reworked Task 9: the `credentials:` block is ALREADY redacted wholesale (regression-tested); the real gap is `credentials_ref` being over-redacted (it's a module name, not a secret) — fix is a narrow `*_ref`-suffix exemption in isSensitiveField, not camelCase leaf patterns (which would be dead code given wholesale redaction happens first). - I-B (Important): Task 14's engine.go integration seam was under-specified and would fight loadPluginInternal's no-concrete-types precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now defines an `IaCStateBackendProvider` optional interface and type-asserts it in loadPluginInternal exactly like the existing stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter implements it. Concrete file set + code sketch added. - M-i: Task 6's benchmark.yml description corrected (runs `go test -bench=.` inline, not `make bench-baseline`). - M-ii: Task 4 notes the proto README's plugin.proto-specific wording is stale; trust root buf.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): cloud-SDK extraction plan — plan-review cycle 3 PASS + minor cleanups Plan-phase adversarial review cycle 3: PASS (zero Critical, zero Important). Two Minor doc-tightening fixes applied: - Task 9 Step 4 now names bearer_token_ref explicitly and explains why the *_ref exemption is safe for it (SecretRef is a reference struct, not a raw secret) rather than claiming no *_ref field exists. - engine.go line citations corrected to 311-326. Plan phase complete — proceeding to alignment-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: lock scope for cloud-sdk-extraction (alignment passed) * refactor(module): split platform_kubernetes_kind.go into _core + _gke Phase 0 precursor for cloud-SDK extraction. 
kindBackend/eksErrorBackend/ aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core init(); gkeBackend (the only SDK-bearing k8s backend) moves to platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same five backend names registered. Isolates the lone SDK-bearing platform file for a later clean deletion. * docs(module): add file-purpose headers to platform_kubernetes _core/_gke Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition self-documenting for readers without the commit message. * docs(module): fix stale 'Requires the Azure SDK' comment on aksBackend aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk consumer. The stale comment is what fooled an earlier inventory pass into mis-counting platform_kubernetes_kind.go as an azure-sdk importer. * ci(audit): enforce k8s-backend init() partition + run audit on every PR Extends audit-cloud-symbols.sh --check with an init()-partition assertion (platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned / aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced artifact rather than a prose claim. * docs(plans): IaCStateBackend transport benchmark result — decision pending Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar. Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB map[string]any (inherent to the bytes outputs_json wire format the iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB message cap. The plan's contingency remedy (streaming redesign) addresses message-size-cap + memory-buffering, neither of which the benchmark hits; streaming would not move the number. Recommendation: retain unary (6.5ms is still negligible vs real cloud backend I/O — the design's own bar-rationale). 
Deviation from the literal 5ms estimate-bar is surfaced to the operator, not absorbed silently. Scope lock intact: Task 6 run + recorded, no task added/dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plans): Task 6 resolved — unary IaCStateBackend LOCKED (operator-confirmed) Operator reviewed the 6.51ms benchmark + root-cause analysis and confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's gate resolves Unary LOCKED — the Task 4 proto stands, no streaming redesign, PR 2/3 proceed unchanged. Operator additionally raised a long-term architectural item: IaC state is persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR) with JSON-export + content-detection-on-read would be better for processing/type-correctness/large-state scaling. Logged as a post-extraction follow-up in both the benchmark decision record and the design doc's Open items — distinct from the wire contract, cross-cutting across all IaCStateStore impls, needs its own brainstorming pass. Not actioned in this locked plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Revert "chore: lock scope for cloud-sdk-extraction (alignment passed)" This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954. * docs(plans): amend cloud-sdk-extraction plan — PR 6 (ctx) + de-gate PR 4 Operator-approved scope amendment to the (reverted-to-Draft) plan: - ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 / Task 15. Task 7 had to hardcode context.Background() in grpcIaCStateStore; the operator directed widening the interface now while we're at that boundary, so Phase B/C/D plugin backends inherit it ctx-ful. Bounded blast radius (~9 files, all in module/); interfaces.IaCStateStore already had ctx and is untouched. - ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo. 
Operator: agents should operate in plugin repos directly; the real requirement is prompt clarity (absolute repo path stated up front), not a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes updated accordingly. - Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note updated to RESOLVED (unary locked). Next: re-run alignment-check on the amended plan, then re-lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: re-lock scope for cloud-sdk-extraction (amended — alignment re-passed) * feat(proto): add IaCStateBackend service to iac.proto Strict 6-method contract mirroring module.IaCStateStore 1:1, with an IaCState message mirroring module.IaCState. Free-form Outputs/Config maps cross the wire as bytes outputs_json/config_json per the iac.proto hard invariant (NO google.protobuf.Struct) — same pattern as ResourceState.outputs_json. Unary RPCs. No TTL field. Regenerated bindings via buf. * test(module): add IaCStateBackend gRPC-vs-in-process benchmark harness Drives a ~1 MB synthetic IaCState through Lock/GetState/SaveState/Unlock both in-process (baseline) and over a real bufconn gRPC boundary (post-extraction path). Self-contained (local benchStateToProto + benchStateBackendServer; Task 7 promotes production versions). Feeds the unary-vs-streaming proto-transport decision in the next task. * test(wftest): add IaCStateBackend to iacServiceChecks coverage table Task 4 added the IaCStateBackend service to iac.proto but missed the corresponding iacServiceChecks row in wftest/bdd/strict_iac.go. TestIaCServiceChecks_CoversEveryProtoService enforces parity between iac.proto's services and that table — it was failing on the missing entry. Belongs with PR 2 (the proto PR). 
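The `bytes outputs_json` wire pattern described in the proto commit above can be sketched as follows. The types `state` and `wireState` are hypothetical stand-ins (the real code uses `module.IaCState` and the buf-generated message), and `toWire`/`fromWire` are illustrative names, not the repository's converters. The point is only that a free-form map crosses the wire as JSON bytes via `encoding/json`, with no `structpb`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// state is a hypothetical stand-in for the in-process state type.
type state struct {
	ResourceID string
	Outputs    map[string]any
}

// wireState is a hypothetical stand-in for the generated proto message,
// carrying the free-form map as JSON bytes.
type wireState struct {
	ResourceID  string
	OutputsJSON []byte
}

// toWire serializes the free-form Outputs map into JSON bytes.
func toWire(s state) (wireState, error) {
	b, err := json.Marshal(s.Outputs)
	if err != nil {
		return wireState{}, fmt.Errorf("marshal outputs: %w", err)
	}
	return wireState{ResourceID: s.ResourceID, OutputsJSON: b}, nil
}

// fromWire decodes the JSON bytes back into a map. Empty bytes yield a nil map.
func fromWire(w wireState) (state, error) {
	var outputs map[string]any
	if len(w.OutputsJSON) > 0 {
		if err := json.Unmarshal(w.OutputsJSON, &outputs); err != nil {
			return state{}, fmt.Errorf("unmarshal outputs: %w", err)
		}
	}
	return state{ResourceID: w.ResourceID, Outputs: outputs}, nil
}

func main() {
	in := state{ResourceID: "db-1", Outputs: map[string]any{"host": "db", "port": 5432}}
	w, err := toWire(in)
	if err != nil {
		panic(err)
	}
	out, err := fromWire(w)
	if err != nil {
		panic(err)
	}
	// JSON numbers decode into float64 when the target is map[string]any.
	fmt.Printf("%s %v %T\n", out.ResourceID, out.Outputs["host"], out.Outputs["port"])
}
```

One consequence worth keeping in mind with this pattern: integer values placed in the map come back as `float64` after a round trip, since `encoding/json` decodes untyped numbers that way.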
* feat(module): IaCState proto converters + grpcIaCStateStore client adapter grpcIaCStateStore implements module.IaCStateStore over an IaCStateBackendClient — the host-side half of the new contract. iacStateToProto/iacStateFromProto convert the free-form Outputs/Config maps via encoding/json (no structpb — iac.proto hard invariant). iacStateBackendServer is the production server type. Promotes these out of the benchmark file so one canonical copy is shared. * docs(module): note context.Background() follow-up on grpcIaCStateStore Code-review Minor: the spec asked for the hardcoded context.Background() to be acknowledged as a known follow-up (IaCStateStore has no ctx param) rather than silently used. * feat(module): engine-side iac.state plugin-backend registry + dispatch A package-level iacStateBackendRegistry maps a backend name to a pb.IaCStateBackendClient; the engine populates it at plugin-load time (Task 14). IaCModule.Init()'s switch gains a default arm that resolves non-core backend names from the registry, constructing a grpcIaCStateStore. Reserved core names (memory/filesystem/postgres) are rejected at registration. The existing in-process backend cases (incl. azure_blob) are untouched here — the plumbing exists and is tested; PR 5 flips azure_blob onto it. * feat(module): exempt *_ref keys from redaction; lock in credentials: redaction Option-1 credentials move raw cloud secrets inline into plugin-native module config under a credentials: key — already redacted wholesale by the existing 'credential' pattern (regression test added). But that same pattern over-redacts credentials_ref:, which holds a module NAME, not a secret. Adds a narrow *_ref-suffix exemption to isSensitiveField so reference keys are preserved for trace debuggability. * refactor(module): name the _ref redaction-exemption suffix as a const Code-review Minor: refFieldSuffix const for consistency with the existing safeFieldSuffix (_display) exemption. 
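A minimal sketch of the `*_ref` exemption described in the two redaction commits above. The identifiers `isSensitiveField`, `redactMap`, `safeFieldSuffix`, and `refFieldSuffix` come from the commit messages, but their bodies, the pattern list, and the placeholder string here are assumptions, not the repository's code.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	safeFieldSuffix = "_display" // existing exemption named in the commit message
	refFieldSuffix  = "_ref"     // reference keys hold a module name, not a secret
	placeholder     = "[REDACTED]" // assumed placeholder text
)

// sensitivePatterns is an assumed pattern list for illustration only.
var sensitivePatterns = []string{"credential", "password", "secret", "token"}

// isSensitiveField reports whether a key should be redacted. Keys ending in
// an exempt suffix are never treated as sensitive.
func isSensitiveField(key string) bool {
	k := strings.ToLower(key)
	if strings.HasSuffix(k, safeFieldSuffix) || strings.HasSuffix(k, refFieldSuffix) {
		return false
	}
	for _, p := range sensitivePatterns {
		if strings.Contains(k, p) {
			return true
		}
	}
	return false
}

// redactMap returns a copy of in with sensitive keys replaced wholesale by
// the placeholder string. Recursion happens only under non-sensitive keys.
func redactMap(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		if isSensitiveField(k) {
			out[k] = placeholder
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			out[k] = redactMap(nested)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	cfg := map[string]any{
		"credentials":     map[string]any{"client_secret": "s3cr3t"},
		"credentials_ref": "azure-account",
		"region":          "eastus",
	}
	fmt.Println(redactMap(cfg))
}
```

The sketch also shows why a test that type-asserts the redacted `credentials` value to `map[string]any` would panic: the whole value is replaced by a string before any recursion, as the plan-review commit notes.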
* test(plugin/external): guard against gRPC body-logging interceptors CreateModule requests carry inline credentials: blocks (Option-1 credentials model). This guard fails CI if any plugin/external/ file gains a gRPC interceptor option, forcing a reviewer to confirm it cannot log request bodies. Implements the cloud-sdk-extraction design's Security guard-test requirement. * test(plugin/external): broaden interceptor guard to Stream interceptors Code-review catch: the guard regex covered Unary only. CreateModule is unary today, but a future streaming RPC carrying credentials must not slip a stream interceptor past the guard. Now matches (Unary|Stream). * feat(module)!: add ctx context.Context to IaCStateStore (operator amendment) Widens module.IaCStateStore's 6 methods with a leading ctx parameter so grpcIaCStateStore plumbs the caller's real context (was context.Background()) and iacStateBackendServer forwards its gRPC ctx into the store. The 6 in-process backends accept ctx; postgres/spaces/ gcs/azure use it for their SDK/DB calls. pipeline_step_iac.go callers pass the step context. Operator-approved scope amendment — see decisions/0033. The separate interfaces.IaCStateStore already had ctx and is untouched. Caller inventory note: cmd/wfctl/infra_state_store.go (the wfctl wrapper around the concrete Spaces/Postgres stores) was also updated — a mechanical consequence of the widening, beyond the plan's Files list; its wrapper methods already carried a ctx. Rollback: revert this commit — mechanical signature-only widening. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(iac): propagate context errors from state lookups instead of swallowing Addresses Copilot review on PR #671 — now that IaCStateStore carries the caller's ctx, silently dropping its errors lets a canceled step proceed against a stale state view: - pipeline_step_iac.go: iac_plan/iac_apply/iac_destroy now go through a lookupExistingState helper that returns any GetState error (including ctx cancellation/deadline) so the step aborts. - iac_state_gcs.go / iac_state_azure.go: ListStates aborts on a context-cancellation error mid-iteration rather than returning partial results; genuinely unreadable objects/blobs are still skipped. - iac_state.go: ListStates doc clarifies nil filter == "no filter" to match actual call sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
intel352
added a commit
that referenced
this pull request
May 19, 2026
- Replace false CHANGELOG false-positive narrative in 'What Went Well #4' with accurate description of actual spec gaps: banner hyperlink format, missing .github/ templates, outdated engine version pins, example module mismatches, undocumented GH_TOKEN requirement
- Replace false CHANGELOG narrative in 'What Didn't #2' with accurate list of surface-area issues found in Tasks 11-13 and how rework was prospectively applied

Fixes spec-reviewer feedback issues 5-6 on PR #719.
intel352
added a commit
that referenced
this pull request
May 19, 2026
* docs(retro): multi-repo OSS-readiness QoL sweep (2026-05-19)

Closes the loop on the cross-repo doc + license + experimental-marker sweep authored at workflow#714. Records what went well, what didn't, and follow-up tracking issues (workflow-registry#717) for registry-manifest creation for 11 P2 plugins and archived-repo notation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs(retro): fix Tasks 11-13 issue descriptions

- Replace false CHANGELOG false-positive narrative in 'What Went Well #4' with accurate description of actual spec gaps: banner hyperlink format, missing .github/ templates, outdated engine version pins, example module mismatches, undocumented GH_TOKEN requirement
- Replace false CHANGELOG narrative in 'What Didn't #2' with accurate list of surface-area issues found in Tasks 11-13 and how rework was prospectively applied

Fixes spec-reviewer feedback issues 5-6 on PR #719.

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
intel352
added a commit
that referenced
this pull request
May 31, 2026
…#805)

* docs(cigen): design (adversarial PASS) + implementation plan for #3 per-phase secret scoping + #4 migration flags

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(cigen): revise plan per adversarial C1/C2/C3 + Important

- C1/C2: new tests in internal package cigen files (analyze_phase_test.go, render_gha_phase_test.go); unexported funcs reachable, own config/strings imports
- C3: Task 5 regen uses REAL ci plan/generate flags (--out, --write; no --stdout/--format/--output-dir) per GAP.md recipe
- Important: add on-disk golden test (multisite_evidence_test.go) locking the committed evidence
- note pre-existing render tests survive (Contains-asserts)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(cigen): plan PASS adversarial cycle-2 (fix Task 2/3 git-add paths to *_phase_test.go)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: lock scope for cigen fidelity (alignment passed)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cigen): add DeployPhase.Secrets + Scoped for per-phase scoping

* feat(cigen): scope per-phase secrets in Analyze; derive single-env migrations --env

* feat(cigen): per-phase env block + wfctl migrations up --format json

* test(cigen): regen multisite evidence (scoped prereq env + migrations --format json) + on-disk golden test; honest GAP.md

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thanks for assigning this issue to me. I'm starting to work on it and will keep this PR's description up to date as I form a plan and make progress.
Original issue description:
Fixes #3.