
[WIP] Upgrade Modular #4

Closed
Copilot wants to merge 2 commits into main from copilot/fix-3

Copilot AI commented Jul 11, 2025

Thanks for assigning this issue to me. I'm starting to work on it and will keep this PR's description up to date as I form a plan and make progress.

Original issue description:

Modular has been upgraded to v1.3.9, and all of its modules have had updates as well. Some of these changes are impactful, so we need to ensure Workflow Engine is ready to work with the latest Modular changes.

  1. Ensure all tests are functioning, enhance test functionality where necessary.
  2. Find each go.mod file in the Workflow repository, upgrade Modular to v1.3.9, upgrade all Modular modules to their latest version as well.
  3. Update Workflow modules and logic to be compatible with the interfaces defined in Modular, update configurations as well.
  4. Ensure all tests pass, fix any errors that occur.
  5. Ensure linter checks all pass, fix any errors that are identified.
  6. Make sure all examples are functional and do what they claim to do. If they're erroring out, apply whatever fixes necessary.
  7. Generate GitHub workflows to help validate the functionality and stability of Workflow Engine. GitHub workflows in the Modular library could be a useful reference point for CI workflows as well as release workflows.

Fixes #3.


@intel352
@copilot Try again

intel352 closed this Jul 11, 2025
intel352 deleted the copilot/fix-3 branch July 11, 2025 20:00
intel352 added a commit that referenced this pull request May 2, 2026
Important #1 — pre-scan all ghosts for protected resources before any
state mutation (infra_apply_refresh.go). The original loop could prune
an unprotected ghost then fail on a protected one, leaving partial state.
Two-pass pattern: collect all blocked names first, return error listing
every blocked resource, then execute mutations only when pre-scan passes.
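
The two-pass pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in infra_apply_refresh.go: the `ghost` type, the `pruneGhosts` function, the `del` callback, and the error wording are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ghost is a hypothetical stand-in for a state entry whose cloud resource
// no longer exists.
type ghost struct {
	Name      string
	Protected bool
}

// pruneGhosts sketches the two-pass pattern. Pass 1 only inspects: it
// collects every protected ghost that may not be pruned. Pass 2 mutates
// state, and runs only when pass 1 found nothing blocked.
func pruneGhosts(ghosts []ghost, allowProtected bool, del func(name string) error) error {
	var blocked []string
	for _, g := range ghosts {
		if g.Protected && !allowProtected {
			blocked = append(blocked, g.Name)
		}
	}
	if len(blocked) > 0 {
		sort.Strings(blocked)
		// Assumed wording; the real message is not shown in the commit.
		return fmt.Errorf("refusing to prune protected resources: %s", strings.Join(blocked, ", "))
	}
	for _, g := range ghosts {
		if err := del(g.Name); err != nil {
			return fmt.Errorf("prune %s: %w", g.Name, err)
		}
	}
	return nil
}

func main() {
	var deleted []string
	del := func(name string) error { deleted = append(deleted, name); return nil }

	ghosts := []ghost{{Name: "cache"}, {Name: "db", Protected: true}}
	err := pruneGhosts(ghosts, false, del)
	fmt.Println("error:", err, "deleted:", len(deleted))
}
```

Because the blocked check completes before the first delete, an unprotected ghost listed ahead of a protected one is never pruned on a run that ends in an error.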

Important #2 — validate --allow-protected-prune requires --refresh
(infra.go). Without this check the flag was silently no-op'd, misleading
operators. Now returns a clear pre-flight error before any work begins.
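
A pre-flight check of this kind might look like the sketch below. The function name `validateApplyFlags` and the error text are assumptions for illustration; only the rule itself (the prune flag is rejected unless refresh is also set) comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// validateApplyFlags is a hypothetical helper: it rejects a flag
// combination that would otherwise be silently ignored.
func validateApplyFlags(refresh, allowProtectedPrune bool) error {
	if allowProtectedPrune && !refresh {
		// Assumed wording; the real error message is not quoted in the commit.
		return errors.New("--allow-protected-prune requires --refresh")
	}
	return nil
}

func main() {
	fmt.Println(validateApplyFlags(false, true))
	fmt.Println(validateApplyFlags(true, true))
}
```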

Minor #3 — replace broken docs/plans/2026-05-02-infra-drift-recovery.md
link in drift-recovery.md (design worktree path, never merged) with a
pointer to the canonical source file.

Minor #4 — markdown table was already correct standard format; no change
needed (table separator rows are standard |---|---|).

Tests added:
- TestApplyRefresh_MultipleGhostsAllOrNothing (all-or-nothing invariant)
- TestApplyRefresh_AllGhostsUnprotectedPrunesAll (pre-scan allows clean batch)
- TestInfraApply_AllowProtectedPruneRequiresRefresh (flag validation)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 2, 2026
…ction state recovery (#519)

* feat(interfaces): add DriftClass enum + Class field to DriftResult

Add DriftClass string type with 4 constants:
- DriftClassUnknown (zero value, omitempty-safe for backwards compat)
- DriftClassInSync
- DriftClassGhost (state has resource; cloud returns ErrResourceNotFound)
- DriftClassConfig (both exist; configs differ)

Extend DriftResult with Class DriftClass json:"class,omitempty" field
(additive, backwards-compatible — consumers without the field see no
JSON change due to omitempty).

4 tests covering constant values, omitempty-on-zero, ghost JSON
rendering, and round-trip marshal/unmarshal for all 3 non-zero classes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(wfctl): implement runInfraApplyRefreshPhase for ghost-prune recovery

New function runInfraApplyRefreshPhase calls provider.DetectDrift and
prunes ghost-in-state entries (DriftClassGhost) from the state store:

- Dry-run by default (no autoApprove): prints "would prune" per ghost
- autoApprove=true: calls store.DeleteResource + emits audit log to stderr
- Protected resources blocked unless allowProtectedPrune=true
- Transient DetectDrift errors propagate immediately; no pruning happens
- DriftClassConfig / DriftClassInSync entries skipped (regular plan path)

6 tests covering: dry-run no-mutate, auto-approve prune, protected-block,
protected-with-flag, transient-error-propagation, in-sync-skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(wfctl): wire --refresh + --allow-protected-prune flags to infra apply

Add two flags to runInfraApply:
- --refresh: runs runInfraApplyRefreshPhase before plan+apply, iterating
  all state-tracked provider groups via groupStatesByProvider and pruning
  any DriftClassGhost entries.
- --allow-protected-prune: passed to runInfraApplyRefreshPhase to permit
  pruning resources with protected:true in state Outputs.

Refresh phase only fires when --refresh is set and the config has infra.*
modules; silently skipped for legacy platform.* configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(wfctl): extend infra drift output with Class column

driftInfraModules now prints drift class (GHOST / CONFIG / IN-SYNC)
using the DriftClass constants from interfaces:

  GHOST    <name>   <type>   — cloud reports not found
  CONFIG   <name>   <type>
    <field>: expected=<v>  actual=<v>
  IN-SYNC  <name>   <type>

Providers still returning DriftClassUnknown fall through to the legacy
Drifted-bool behavior for backwards compatibility.

Column-aligned format matches wfctl infra status output style.
Drift-found message updated to suggest --refresh flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add drift-recovery operator guide + CHANGELOG Unreleased entry

docs/wfctl/drift-recovery.md (~100 lines) covering:
- Three drift classes (ghost / config / in-sync) with recovery actions
- wfctl infra drift usage + example output with Class column
- Dry-run-first workflow → auto-approve prune
- Protected resource two-key contract (--allow-protected-prune)
- Audit log format
- Production safety checklist
- CI integration patterns

CHANGELOG.md Unreleased section: DriftClass enum, --refresh flag,
--allow-protected-prune flag, drift output Class column, docs file.
Notes omitempty additions to DriftResult.Expected/Actual/Fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(wfctl): address 4 Copilot review findings on apply --refresh

Important #1 — pre-scan all ghosts for protected resources before any
state mutation (infra_apply_refresh.go). The original loop could prune
an unprotected ghost then fail on a protected one, leaving partial state.
Two-pass pattern: collect all blocked names first, return error listing
every blocked resource, then execute mutations only when pre-scan passes.

Important #2 — validate --allow-protected-prune requires --refresh
(infra.go). Without this check the flag was silently no-op'd, misleading
operators. Now returns a clear pre-flight error before any work begins.

Minor #3 — replace broken docs/plans/2026-05-02-infra-drift-recovery.md
link in drift-recovery.md (design worktree path, never merged) with a
pointer to the canonical source file.

Minor #4 — markdown table was already correct standard format; no change
needed (table separator rows are standard |---|---|).

Tests added:
- TestApplyRefresh_MultipleGhostsAllOrNothing (all-or-nothing invariant)
- TestApplyRefresh_AllGhostsUnprotectedPrunesAll (pre-scan allows clean batch)
- TestInfraApply_AllowProtectedPruneRequiresRefresh (flag validation)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
…rkflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.
intel352 added a commit that referenced this pull request May 4, 2026
…ft postcondition + diff cache (W-3a of 12) (#527)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
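
The activation rules above reduce to a small predicate. The sketch below assumes a hypothetical helper named `refreshOutputsEnabled` that receives the raw environment value and the state of the skip flag; the use of `strconv.ParseBool` follows the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshOutputsEnabled is a hypothetical helper. envValue is the raw
// value of WFCTL_REFRESH_OUTPUTS ("" when unset); skipRefresh mirrors
// the --skip-refresh flag.
func refreshOutputsEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag wins over the environment
	}
	enabled, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // unset, empty, or unrecognised: default off
	}
	return enabled
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshOutputsEnabled(v, false))
	}
}
```

A presence-only check (`value != ""`) would treat `0` and `false` as enabled; parsing the value avoids that.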

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

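The omitempty contract in the last item of the T3.0.4 list can be demonstrated with a reduced struct. The JSON keys below come from the commit message; the Go field types (two maps and a slice of strings) are assumptions made so the sketch is self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult is a reduced, hypothetical version of ApplyResult carrying
// only the three new fields.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals a result and returns the JSON text (empty on error).
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	nilFields := applyResult{}
	emptyFields := applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}
	fmt.Println(encode(nilFields))
	fmt.Println(encode(emptyFields))
	fmt.Println(encode(applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```

For maps and slices, `omitempty` drops both nil and zero-length values, which is why the table-driven test covers both cases. The same tag on a struct-typed field would not omit it.
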
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  propagate to both test files automatically, which keeps the
  cross-package fixture parity guaranteed.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).
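
For readers unfamiliar with why the `!ok` arm and the ok-but-false arm are distinct code paths, here is a small sketch. The `upsertSupporter` interface and both driver types are hypothetical stand-ins; only the shape of the type assertion mirrors the text above.

```go
package main

import "fmt"

// upsertSupporter is a hypothetical stand-in for interfaces.UpsertSupporter.
type upsertSupporter interface {
	SupportsUpsert() bool
}

// bareDriver does not implement the optional interface at all.
type bareDriver struct{}

// optionalDriver implements it but may report false at runtime.
type optionalDriver struct{ enabled bool }

func (d optionalDriver) SupportsUpsert() bool { return d.enabled }

// recoveryPath reports which of the three arms a driver falls into.
func recoveryPath(d any) string {
	us, ok := d.(upsertSupporter)
	switch {
	case !ok:
		return "not implemented"
	case !us.SupportsUpsert():
		return "implemented but disabled"
	default:
		return "recover via upsert"
	}
}

func main() {
	fmt.Println(recoveryPath(bareDriver{}))
	fmt.Println(recoveryPath(optionalDriver{enabled: false}))
	fmt.Println(recoveryPath(optionalDriver{enabled: true}))
}
```

A test that only covers the second arm never exercises the failed assertion, which is the gap the new test closes.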

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.
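
A minimal sketch of the guard and of the cancel-on-Delete test scaffolding, under stated assumptions: the `driver` interface, `replace`, and `cancelOnDelete` are hypothetical, and the "replace: delete:" / "replace: create:" prefixes are illustrative. Only the "replace: canceled after delete:" prefix comes from the commit text.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// driver is a hypothetical two-method slice of a resource driver.
type driver interface {
	Delete(ctx context.Context, name string) error
	Create(ctx context.Context, name string) (string, error)
}

// replace decomposes into Delete then Create, with a ctx guard in between.
func replace(ctx context.Context, d driver, name string) (string, error) {
	if err := d.Delete(ctx, name); err != nil {
		return "", fmt.Errorf("replace: delete: %w", err)
	}
	// Guard: do not start Create once cancellation is observable.
	if err := ctx.Err(); err != nil {
		return "", fmt.Errorf("replace: canceled after delete: %w", err)
	}
	id, err := d.Create(ctx, name)
	if err != nil {
		return "", fmt.Errorf("replace: create: %w", err)
	}
	return id, nil
}

// cancelOnDelete cancels the context as a side effect of Delete,
// simulating a signal arriving exactly between the two driver calls.
type cancelOnDelete struct {
	cancel  context.CancelFunc
	creates int
}

func (d *cancelOnDelete) Delete(ctx context.Context, name string) error {
	d.cancel()
	return nil
}

func (d *cancelOnDelete) Create(ctx context.Context, name string) (string, error) {
	d.creates++
	return "new-id", nil
}

// demo reports the Create call count, whether the sentinel survives
// wrapping, and the rendered error string.
func demo() (creates int, isCanceled bool, msg string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	d := &cancelOnDelete{cancel: cancel}
	_, err := replace(ctx, d, "vpc")
	return d.creates, errors.Is(err, context.Canceled), err.Error()
}

func main() {
	fmt.Println(demo())
}
```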

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering is needed. On rename failure, Put makes a best-effort
  attempt to remove the temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."

* fix(iac): diffcache.Get refreshes mtime so LRU is actually LRU (Copilot review)

Without this, frequently-read entries were evicted as if unused
because maybeEvict orders by mtime. Now Get touches mtime via
os.Chtimes(now, now), turning eviction from FIFO-by-write into
true LRU. Mtime-touch chosen over a sidecar last-accessed file
to keep the on-disk shape trivial; cost is one extra syscall per
hit, errors are ignored (failure degrades eviction precision but
never produces wrong cache results).

Adds TestCache_LRURefreshesOnGet regression test: writes N entries,
Gets the oldest, then triggers over-cap, asserts the oldest survives
and the second-oldest (now the LRU) is evicted instead.
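
The effect of the mtime touch on eviction order can be illustrated with a small sketch. It assumes eviction picks the `.json` file with the earliest mtime; `touch`, `oldest`, and `demo` are hypothetical helpers, and the mtimes are set explicitly so the example does not depend on filesystem timestamp resolution.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// touch bumps atime and mtime to now, as a cache hit would.
func touch(path string) error {
	now := time.Now()
	return os.Chtimes(path, now, now)
}

// oldest returns the base name of the .json file with the earliest mtime,
// i.e. the next eviction candidate under mtime ordering.
func oldest(dir string) (string, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.json"))
	if err != nil {
		return "", err
	}
	var name string
	var best time.Time
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			return "", err
		}
		if name == "" || info.ModTime().Before(best) {
			name, best = filepath.Base(p), info.ModTime()
		}
	}
	return name, nil
}

// demo writes a.json (older) and b.json (newer), then reads the eviction
// candidate before and after touching a.json.
func demo() (before, after string, err error) {
	dir, err := os.MkdirTemp("", "lru-sketch")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	base := time.Now().Add(-2 * time.Hour)
	for i, n := range []string{"a.json", "b.json"} {
		p := filepath.Join(dir, n)
		if err := os.WriteFile(p, []byte("{}"), 0o600); err != nil {
			return "", "", err
		}
		ts := base.Add(time.Duration(i) * time.Hour)
		if err := os.Chtimes(p, ts, ts); err != nil {
			return "", "", err
		}
	}
	if before, err = oldest(dir); err != nil {
		return "", "", err
	}
	if err = touch(filepath.Join(dir, "a.json")); err != nil {
		return "", "", err
	}
	after, err = oldest(dir)
	return before, after, err
}

func main() {
	fmt.Println(demo())
}
```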

* fix(iac): diffcache.Put uses unique temp filename to avoid same-key write races (Copilot review)

Pre-fix, two goroutines calling Put with the same Key both wrote to
`<key>.json.tmp` and one would clobber the other's temp file
mid-write, producing either a Rename failure or a half-written
final file. Now Put uses os.CreateTemp so each call gets a unique
`<key>.json.<random>.tmp` filename. Which payload wins the final
rename is still a race, but both payloads were derived from the same
Key, so the outcome is equivalent from the caller's perspective.

Adds godoc "Concurrency: safe for concurrent use, including
concurrent Puts of the same Key." Adds TestCache_ConcurrentSameKeyPut
regression: 20 goroutines Put the same Key, asserts no leftover
*.tmp files, asserts final cache file decodes. Run under -race.
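
A sketch of the write-temp-then-rename pattern with a unique temp name per call, assuming a POSIX filesystem where rename replaces the destination. `atomicWrite` and `demo` are hypothetical names, not the diffcache API; the payload is illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// atomicWrite writes data to a unique temp file next to path, then renames
// it over path. The temp file is removed on any failure (best effort).
func atomicWrite(path string, data []byte) error {
	// The "*" in the pattern is replaced with a random string, so
	// concurrent writers of the same path never share a temp file.
	f, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmp := f.Name()
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	if err := os.Rename(tmp, path); err != nil {
		os.Remove(tmp)
		return err
	}
	return nil
}

// demo has 20 goroutines write the same target and reports how many
// .tmp files are left behind plus the final file content.
func demo() (leftoverTmp int, content string, err error) {
	dir, err := os.MkdirTemp("", "put-sketch")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "key.json")
	errs := make(chan error, 20)
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			errs <- atomicWrite(target, []byte(`{"ok":true}`))
		}()
	}
	wg.Wait()
	close(errs)
	for e := range errs {
		if e != nil {
			return 0, "", e
		}
	}
	tmps, err := filepath.Glob(filepath.Join(dir, "*.tmp"))
	if err != nil {
		return 0, "", err
	}
	b, err := os.ReadFile(target)
	if err != nil {
		return 0, "", err
	}
	return len(tmps), string(b), nil
}

func main() {
	fmt.Println(demo())
}
```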

* fix(iac): diffcache.Put atomic rename on Windows (Copilot review)

Document the os.Rename Windows limitation explicitly: on Windows,
os.Rename fails when the destination exists, so an in-place cache
update via Put will fail. The caller treats this as a write failure
and proceeds without caching — correct because apply remains correct
on a 100% miss rate (per the package's cache-as-amortization framing).

We chose documentation over vendoring github.com/google/renameio:
adding renameio would introduce the first such dependency in the
repo, and there is no Windows-supported wfctl use case today. The
existing precedent in cmd/wfctl/update.go and cmd/wfctl/plugin_install.go
also uses bare os.Rename without Windows guards.

The fix tracks the limitation in two places: the Put godoc (where
the rename happens) and the package godoc Known Limitations section
(where consumers will look).

* fix(iac): diffcache returns deep-copy of DiffResult to avoid shared-slice mutation (Copilot review)

Pre-fix, the in-memory cache stored DiffResult by value but the
Changes slice ([]FieldChange) shared its backing array between the
cached entry and the value returned to the caller. A caller
mutating the returned Changes slice (element-level or via append-
into-cap) would silently mutate the cached entry. The symmetric
case is the same: mutating the Put argument after the Put call
would leak into the cached value.

Fix: clone the Changes slice via slices.Clone in both Get and Put.
Scalar struct fields are value-copied by struct assignment so a
single helper (cloneDiffResult) covers both directions. The
filesystem cache deserializes from JSON each time so each Get
already yields a fresh slice — no change needed there.

FieldChange.Old/New are typed any; if a caller stores a pointer or
mutable map there, the deep-copy stops at the slice level. By
convention DiffResult.Changes carries scalar Old/New (strings,
numbers, bools), so that is the right tradeoff between correctness
and copy cost. Documented in memoryCache godoc.

Adds TestCache_MemoryDeepCopiesChanges regression: Put a value,
mutate the original argument, Get + mutate (element + append), Get
again, assert original is preserved.
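
The aliasing problem and the slices.Clone fix can be sketched as below. The field sets of `FieldChange` and `DiffResult` are assumptions beyond what the text names (Changes, Old, New), and this `memoryCache` is a simplified stand-in keyed by string.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
)

// FieldChange and DiffResult are reduced, partly hypothetical shapes.
type FieldChange struct {
	Path     string // assumed field, for illustration only
	Old, New any
}

type DiffResult struct {
	Changes []FieldChange
}

// cloneDiffResult copies the struct by value and gives the copy its own
// backing array. Values stored in Old/New are copied shallowly.
func cloneDiffResult(d DiffResult) DiffResult {
	d.Changes = slices.Clone(d.Changes)
	return d
}

type memoryCache struct {
	mu sync.Mutex
	m  map[string]DiffResult
}

func (c *memoryCache) Put(key string, v DiffResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] = cloneDiffResult(v) // caller may mutate v afterwards
}

func (c *memoryCache) Get(key string) (DiffResult, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[key]
	return cloneDiffResult(v), ok // caller may mutate the result
}

// demo mutates both the Put argument and a Get result, then reads the
// cached entry again.
func demo() (path string, n int) {
	c := &memoryCache{m: map[string]DiffResult{}}
	in := DiffResult{Changes: []FieldChange{{Path: "size", Old: "s", New: "m"}}}
	c.Put("k", in)
	in.Changes[0].Path = "mutated-after-put"

	got, _ := c.Get("k")
	got.Changes[0].Path = "mutated-after-get"
	got.Changes = append(got.Changes, FieldChange{Path: "extra"})

	again, _ := c.Get("k")
	return again.Changes[0].Path, len(again.Changes)
}

func main() {
	fmt.Println(demo())
}
```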

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
…s on manifest computePlanVersion (W-3b of 12) (#528)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).
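
A rough sketch of the bounded-concurrency shape described above, using a buffered channel as the semaphore. The `state` type, the `read` callback signature, and `refresh` itself are hypothetical simplifications (no provider or driver resolution), and unlike a production helper this sketch does not cancel in-flight reads on the first failure.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// state is a hypothetical, reduced resource-state record.
type state struct {
	ProviderID string
	Outputs    map[string]any
}

type readFunc func(ctx context.Context, id string) (map[string]any, error)

// refresh returns a copy of states with Outputs replaced by live values.
// Any read failure yields (nil, err) so callers never persist half a refresh.
func refresh(ctx context.Context, states []state, read readFunc, concurrency int) ([]state, error) {
	if concurrency < 1 {
		concurrency = 8 // default from the commit text
	}
	out := make([]state, len(states))
	copy(out, states)

	sem := make(chan struct{}, concurrency)
	var wg sync.WaitGroup
	var mu sync.Mutex
	var firstErr error
	for i := range out {
		wg.Add(1)
		sem <- struct{}{} // blocks while the pool is at its cap
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			outputs, err := read(ctx, out[i].ProviderID)
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
				return
			}
			out[i].Outputs = outputs // each goroutine owns its own slot
		}(i)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return out, nil
}

// demo refreshes 100 states and reports how many carry live outputs and
// the peak number of concurrent reads.
func demo() (refreshed, peak int, err error) {
	states := make([]state, 100)
	for i := range states {
		states[i].ProviderID = fmt.Sprintf("id-%d", i)
	}
	var mu sync.Mutex
	inFlight := 0
	read := func(ctx context.Context, id string) (map[string]any, error) {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()
		time.Sleep(time.Millisecond)
		mu.Lock()
		inFlight--
		mu.Unlock()
		return map[string]any{"id": id}, nil
	}
	out, err := refresh(context.Background(), states, read, 0)
	if err != nil {
		return 0, peak, err
	}
	for _, s := range out {
		if s.Outputs["id"] == s.ProviderID {
			refreshed++
		}
	}
	return refreshed, peak, nil
}

// demoFailure reports whether a failing read yields a nil slice and an error.
func demoFailure() bool {
	read := func(context.Context, string) (map[string]any, error) {
		return nil, errors.New("read failed")
	}
	out, err := refresh(context.Background(), []state{{ProviderID: "x"}}, read, 2)
	return out == nil && err != nil
}

func main() {
	fmt.Println(demo())
	fmt.Println(demoFailure())
}
```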

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
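
The activation rules above reduce to a few lines with strconv.ParseBool. In this sketch `refreshEnabled` is a hypothetical helper that takes the env value and the flag as arguments so it can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled applies the rules: --skip-refresh always wins; otherwise
// only a ParseBool-truthy value enables the pre-step. Unset, empty, and
// unrecognised values all fail to parse and therefore stay off.
func refreshEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false
	}
	on, err := strconv.ParseBool(envValue)
	return err == nil && on
}

func main() {
	for _, v := range []string{"", "1", "true", "t", "0", "false", "f", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshEnabled(v, false))
	}
	fmt.Println("skip flag:", refreshEnabled("1", true))
}
```

A presence-only check (`value != ""`) would have treated "0" as enabled, which is the foot-gun the ParseBool choice avoids.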

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
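
The first two fixes can be sketched together. This is an illustration under assumptions: `rawSchema` stands in for the //go:embed slice, json.Unmarshal stands in for the real schema compilation, and `compileCalls` exists only so the demo can observe how often the compile step ran.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// rawSchema stands in for the embedded schema bytes; content is illustrative.
var rawSchema = []byte(`{"type":"object"}`)

var (
	schemaOnce   sync.Once
	schemaVal    map[string]any
	schemaErr    error
	compileCalls int // demo-only counter
)

// compiledSchema runs the compile step at most once and hands every caller
// the same (value, error) pair.
func compiledSchema() (map[string]any, error) {
	schemaOnce.Do(func() {
		compileCalls++
		schemaErr = json.Unmarshal(rawSchema, &schemaVal)
	})
	return schemaVal, schemaErr
}

// schemaJSON returns a copy so callers cannot mutate the shared slice.
func schemaJSON() []byte {
	return bytes.Clone(rawSchema)
}

// demo fires 32 concurrent callers, then mutates a returned copy and
// reports whether the original bytes are intact.
func demo() (calls int, rawIntact bool) {
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = compiledSchema()
		}()
	}
	wg.Wait()

	cp := schemaJSON()
	cp[0] = 'X'
	return compileCalls, rawSchema[0] == '{'
}

func main() {
	fmt.Println(demo())
}
```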

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

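The omitempty contract from the last bullet above can be illustrated with a reduced struct. The JSON keys come from the commit text; the Go field types here (maps and a string slice) are assumptions chosen for the sketch, not the real ApplyResult definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult is a hypothetical, reduced shape carrying only the three
// new fields. With omitempty, nil and zero-length maps/slices are dropped.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and returns the JSON text.
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// nil fields
	fmt.Println(encode(applyResult{}))
	// empty-but-non-nil fields
	fmt.Println(encode(applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}))
	// populated map, keyed by the replaced resource's name
	fmt.Println(encode(applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```
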
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  propagate to both test files automatically, which keeps the
  cross-package fixture parity guaranteed.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not a correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering is needed. On rename failure, the temp file is removed
  on a best-effort basis.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).
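
The action-emission rules above can be sketched as a small pure function. The type names and the Changes field name are assumptions; the flags NeedsReplace, NeedsUpdate, and ForceNew are the ones named in this message.

```go
package main

import "fmt"

// fieldChange and diffResult are hypothetical mirrors of the real types.
type fieldChange struct {
	Path     string
	ForceNew bool
}

type diffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []fieldChange // field name is an assumption
}

// classify returns the action for an existing resource given its Diff result.
func classify(d *diffResult) string {
	if d == nil {
		return "skip" // nothing to compare, nothing emitted
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace" // ForceNew must not be downgraded to update
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	fmt.Println(classify(&diffResult{NeedsReplace: true}))
	fmt.Println(classify(&diffResult{NeedsUpdate: true, Changes: []fieldChange{{Path: "region", ForceNew: true}}}))
	fmt.Println(classify(&diffResult{NeedsUpdate: true}))
	fmt.Println(classify(&diffResult{}))
}
```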

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).
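
A minimal sketch of the get-before-dispatch, put-on-miss flow, assuming an in-memory map keyed by the five-part tuple. The cacheKey and memoryCache types and the string result are illustrative stand-ins, not the iac/diffcache API.

```go
package main

import "fmt"

// cacheKey mirrors the tuple named above; a struct of strings is comparable
// and can be used directly as a map key.
type cacheKey struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// memoryCache is a hypothetical in-memory backend.
type memoryCache struct{ entries map[cacheKey]string }

// diffThroughCache returns the cached result on a hit; on a miss it calls
// diff once and stores the fresh result.
func diffThroughCache(c *memoryCache, k cacheKey, diff func() string) string {
	if v, ok := c.entries[k]; ok {
		return v
	}
	v := diff()
	c.entries[k] = v
	return v
}

func main() {
	c := &memoryCache{entries: map[cacheKey]string{}}
	calls := 0
	diff := func() string { calls++; return "update" }

	k := cacheKey{PluginVersion: "pv", Type: "vpc", ProviderID: "id-1", SHAConfig: "c1", SHAOutputs: "o1"}
	diffThroughCache(c, k, diff)
	diffThroughCache(c, k, diff) // same inputs: served from cache
	fmt.Println(calls)

	k.SHAConfig = "c2" // changed config: forces a re-dispatch
	diffThroughCache(c, k, diff)
	fmt.Println(calls)
}
```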

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field, so no bug has been observed, but there's no compile-time
guard against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): fixed-length and
ambiguity-free, since NUL does not appear in either field by
convention. Matches how configHash already keys per-config inputs.

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Purely additive: no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).
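
The collision and the fix can be reproduced with a short sketch. The function names and the hex encoding of the digest are assumptions; the two provider tuples are the ones quoted above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey is the fragile composition: the separator can also occur in the data.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// hashedKey joins the fields with a NUL byte and hashes the result.
// Hex encoding of the digest is an assumption for illustration.
func hashedKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Both tuples flatten to "foo@bar@1.0" under the naive scheme.
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))
	// The NUL-separated digest keeps them distinct.
	fmt.Println(hashedKey("foo", "bar@1.0") == hashedKey("foo@bar", "1.0"))
}
```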

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy),
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer
  substring). v1 fixture v1RecordingProvider intentionally does NOT
  implement ComputePlanVersionDeclarer to prove the dispatcher's
  "default to v1 when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.
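
Path A and Path B can be told apart with a sketch of the optional-interface check. All names below are hypothetical simplifications of ComputePlanVersionDeclarer and DispatchVersionFor; only the "v2" value and the default-to-v1 rule come from the messages above.

```go
package main

import "fmt"

// versionDeclarer is a simplified stand-in for the optional interface.
type versionDeclarer interface {
	ComputePlanVersion() string
}

const versionV2 = "v2"

// dispatchVersionFor returns "v2" only when the provider implements the
// interface AND declares exactly "v2"; everything else is legacy "v1".
func dispatchVersionFor(p any) string {
	if d, ok := p.(versionDeclarer); ok && d.ComputePlanVersion() == versionV2 {
		return "v2"
	}
	return "v1"
}

// bareProvider does not implement the interface (Path A).
type bareProvider struct{}

// declaringProvider implements it; an empty version models Path B.
type declaringProvider struct{ version string }

func (p declaringProvider) ComputePlanVersion() string { return p.version }

func main() {
	fmt.Println(dispatchVersionFor(bareProvider{}))                     // Path A
	fmt.Println(dispatchVersionFor(declaringProvider{version: ""}))     // Path B
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v2"}))   // v2 route
	fmt.Println(dispatchVersionFor(declaringProvider{version: "beta"})) // anything else
}
```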

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.
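
The conversion itself is small. A sketch under the stated contract (nil for empty or nil input, otherwise a map[string]any with the same entries) is shown below; the function name is changed to mark it as an illustration rather than the in-tree helper, and the structpb call is left out so the example stays standard-library only.

```go
package main

import "fmt"

// sensitiveToAnySketch widens map[string]bool to map[string]any, the shape
// that generic struct encoders accept. Empty or nil input yields nil.
func sensitiveToAnySketch(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAnySketch(nil) == nil)
	fmt.Println(sensitiveToAnySketch(map[string]bool{}) == nil)
	fmt.Println(sensitiveToAnySketch(map[string]bool{"token": true}))
}
```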

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
  scoped via getDiffCache lazy init) doesn't observe stale entries
  from a developer's local ~/.cache/wfctl/diff/ as false-positive
  cache hits skipping driver Diff dispatch. Same pattern as
  platform/main_test.go from T3.6f. Caught during dev when the
  end-to-end test failed in the full cmd/wfctl test run but passed
  in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
  flips mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* fix(iac): T3.6e env-var hygiene — TestMain unsets WFCTL_PLAN_DIFF_CONCURRENCY (Copilot review)

A developer shell with WFCTL_PLAN_DIFF_CONCURRENCY=1 (or any other
non-default value) would serialize ComputePlan's parallel Diff dispatch
and break the parallelism assertions in differ tests. Explicitly unset
the var in TestMain alongside the existing WFCTL_DIFFCACHE=disabled
hygiene so test runs are deterministic regardless of shell environment.

Addresses Copilot inline comment on PR #528 (platform/main_test.go:24).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.6 polish — drop double error: prefix + reuse precomputed configHash (Copilot review round 2)

Two real fixes from Copilot's re-review of PR #528:

1. **Double "error:" prefix on plugin-load failure** — cmd/wfctl/main.go's
   top-level printer already emits "error: %v" on command failure. The
   T3.6b error string in cmd/wfctl/infra_plan_provider.go was prefixed
   with a literal "error: " of its own, producing operator output like
   `error: error: failed to load plugin "do": ...`. Drop the in-error
   prefix; update the assertion in infra_plan_provider_load_test.go to
   match the unprefixed root error; clarify in the CHANGELOG that the
   "error:" prefix in the rendered string is added by wfctl's top-level
   printer (not the underlying error).

2. **Duplicate configHash work in classifyModification** — ComputePlan
   already computes `hash := configHash(spec.Config)` while bucketing
   create vs modification candidates; classifyModification was
   re-computing the same hash on every Diff dispatch. Thread the
   precomputed hash through via a new `hash string` field on
   modCandidate + new parameter on classifyModification, so the per-
   candidate hashing happens exactly once.

Addresses Copilot inline comments on PR #528 (round 2):
- cmd/wfctl/infra_plan_provider.go:121
- platform/differ.go:104

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./interfaces/... ./iac/... ./plugin/sdk/... ./module/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.6/T3.9 polish — diff-cache bypass on empty ProviderID + omit empty current_sensitive arg (Copilot review round 3)

Two real fixes from Copilot's re-review of PR #528 round 3:

1. **Diff-cache hash-collision risk on empty ProviderID** — The cache
   key shape (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs)
   does not include the resource Name. When two existing-state
   resources of the same Type both have ProviderID=="" (state-bootstrap,
   broken-plugin paths, transient races) and matching SHAConfig +
   SHAOutputs (e.g., both freshly-discovered with default-config and
   empty-outputs), they would share a cache key and could serve each
   other's cached DiffResult — misclassifying actions or skipping a
   required Diff. Defensive fix: classifyModification now skips both
   cache.Get and cache.Put when rs.ProviderID is empty, always re-
   dispatching to the driver. Cost is one extra Diff call per
   pre-bootstrap resource; benefit is correctness regardless of state
   completeness. New pin: TestComputePlan_EmptyProviderID_BypassesCache.

2. **`current_sensitive` arg serialized as null instead of omitted** —
   sensitiveToAny's docstring promises "trim-friendly" wire shape by
   returning nil for empty input, but the call site at
   remoteResourceDriver.Diff was unconditionally setting
   `args["current_sensitive"] = sensitiveToAny(...)`, which structpb
   serializes as a NullValue field rather than omitting the key.
   Conditionally include the key only when sensitiveToAny returns a
   non-nil map, matching the docstring intent. New pins:
   TestRemoteDriver_Diff_OmitsCurrentSensitiveWhenEmpty +
   TestRemoteDriver_Diff_IncludesCurrentSensitiveWhenPopulated.
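
The empty-ProviderID bypass from the first fix can be sketched roughly like this. The key fields follow the shape listed above; the map-backed cache, the string result, and the `diffWithCache` helper are assumptions made for illustration and do not reflect the real diffcache API.

```go
package main

import "fmt"

// cacheKey follows the key shape named in the commit message.
// Note that it carries no resource Name.
type cacheKey struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// diffWithCache is a hypothetical stand-in for the cache-aware path.
// When ProviderID is empty it skips both the read and the write, so two
// unrelated resources can never serve each other's cached result.
func diffWithCache(cache map[cacheKey]string, k cacheKey, diff func() string) string {
	cacheable := k.ProviderID != ""
	if cacheable {
		if v, ok := cache[k]; ok {
			return v
		}
	}
	v := diff()
	if cacheable {
		cache[k] = v
	}
	return v
}

func main() {
	cache := map[cacheKey]string{}
	calls := 0
	diff := func() string { calls++; return "no-op" }

	// Two distinct resources that would collide: same Type and hashes, no ProviderID.
	a := cacheKey{PluginVersion: "1.0.0", Type: "vpc", SHAConfig: "c", SHAOutputs: "o"}
	b := a
	diffWithCache(cache, a, diff)
	diffWithCache(cache, b, diff)
	fmt.Println("driver calls:", calls, "cache entries:", len(cache))
}
```

The cost matches the trade-off stated above: one extra driver call per resource without a ProviderID, in exchange for never reading a result that belongs to another resource.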

Addresses Copilot inline comments on PR #528 (round 3):
- platform/differ.go:240 (cache key empty-ProviderID collision)
- cmd/wfctl/deploy_providers.go:542 (current_sensitive null vs omit)

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.6/T3.9 polish — preserve loadErr chain + lock-free diff cache + bypass-side-effect-free (Copilot review round 4)

Three real fixes from Copilot's re-review of PR #528 round 4:

1. **loadErr chain lost across runInfraPlan re-wrap (errors.Is/As)** —
   computePlanForInfraSpecs returned `failed to load plugin %q: %v;
   ...` (using %v), losing the underlying error. After runInfraPlan
   re-wraps with `compute plan: %w`, callers could not errors.Is /
   errors.As against the original loader failure (e.g. to differentiate
   "plugin binary missing" from "plugin crashed during handshake").
   Switch the inner wrap to %w. Rendered text is identical to %v.
   New pin: TestRunInfraPlan_FailsLoudOnPluginLoadFailure now asserts
   `errors.Is(err, loadErr)` reaches the sentinel through both wrap
   layers.

2. **getDiffCache called even on the empty-ProviderID bypass path** —
   classifyModification was calling getDiffCache() unconditionally,
   which (under the old per-call mutex) acquired the lock, and (under
   any backend-init pattern) would eagerly construct the filesystem
   cache backend at ~/.cache/wfctl/diff/ on the operator's machine
   even for resources that bypass the cache. Move the getDiffCache
   call inside the `if cacheable` branch so the bypass path is fully
   side-effect free. Round-3 already pinned the bypass behavior via
   TestComputePlan_EmptyProviderID_BypassesCache.

3. **Per-call sync.Mutex contention on getDiffCache hot path** —
   Under ComputePlan's parallel Diff fan-out (planDiffConcurrency()
   workers), the per-call mutex on getDiffCache was a contention point on
   every cache.Get / cache.Put, especially on cache hits where the
   Get itself is cheap. Refactor to sync.Once for one-time init +
   atomic.Pointer[diffcache.Cache] for lock-free reads. Subsequent
   reads are just an atomic.Load (and a typed deref). The test-swap
   helper setDiffCacheForTest is updated to Store/Restore directly
   on the atomic; cleanup seeds a fresh default when there was no
   prior value (so subsequent tests in the binary still observe a
   working cache).

Addresses Copilot inline comments on PR #528 (round 4):
- cmd/wfctl/infra_plan_provider.go:124 (%v → %w)
- platform/differ.go:235 (getDiffCache eager call on bypass path)
- platform/differ.go:405 (per-call mutex on hot path)

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... ./module/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.7/T3.6 polish — DispatchVersionFor centralizes type assertion + cache nil-DiffResult as zero-value (Copilot review round 5)

Two real fixes from Copilot's re-review of PR #528 round 5 (a third
finding, plan/apply discovery duplication, is filed as a follow-up
issue rather than addressed in-PR to keep W-3b scope-locked).

1. **DispatchVersionFor docstring vs signature mismatch** — The
   helper claims to centralize the type assertion + non-implementer
   defaulting, but its parameter type was `ComputePlanVersionDeclarer`,
   forcing every call site to type-assert externally. Change the
   signature to accept `any` and perform the type assertion inside;
   non-implementers + nil now both return "v1" inside the helper as
   the docstring already promised. Param is `any` (not
   interfaces.IaCProvider) to keep the helper package
   import-free of the engine's interfaces package and to keep
   non-engine call sites (tests, stubs) frictionless. Updated the
   only production call site (cmd/wfctl/infra_apply.go) to drop the
   external type-assert.

2. **Cache no-op when driver.Diff returns (nil, nil)** — The
   cache.Put was guarded by `fresh != nil`, so providers using the
   nil-as-no-op convention (a documented option in the
   (DiffResult|nil, error|nil) return shape) re-Diffed on every
   ComputePlan call — undermining the cache contract for that whole
   class of providers. Cache a zero-value DiffResult on (nil, nil)
   returns; classifyModification's downstream switch already treats
   zero-value the same as nil (no plan action), so the semantics are
   preserved while the cache stays effective. New pin:
   TestComputePlan_NilDiffResult_CachesAsZeroValue verifies that the
   second ComputePlan against unchanged inputs is served from cache
   (driver.Diff invoked exactly once across two calls).

3. **Plan/apply provider-discovery duplication** (Copilot finding R5-C,
   not addressed in this PR) — computePlanForInfraSpecs duplicates
   the iac.provider discovery + grouping logic in applyInfraModules.
   Per workspace memory feedback_implementer_scope_bleed, refactoring
   to a shared helper is a separate task: the duplication exists
   pre-W-3b (apply was the original; plan was added in W-3b mirroring
   it intentionally), and the extraction touches code paths W-3b's
   test plan does not cover. Filed as follow-up rather than expanding
   W-3b's blast radius. Documented in PR description.
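
A rough sketch of the signature change in the first fix, under stated assumptions: the interface method name `ComputePlanVersion` is a guess (the commit message names only the interface), and the body is an illustration of "assert inside, default to v1" rather than the actual helper.

```go
package main

import "fmt"

// ComputePlanVersionDeclarer: the method name here is an assumption;
// the commit message names only the interface.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

// DispatchVersionFor accepts any value and performs the type assertion
// itself. Nil, non-implementers, and unknown versions all yield "v1".
func DispatchVersionFor(provider any) string {
	d, ok := provider.(ComputePlanVersionDeclarer)
	if !ok {
		return "v1" // a nil interface value also lands here
	}
	if d.ComputePlanVersion() == "v2" {
		return "v2"
	}
	return "v1"
}

// v2Provider is a hypothetical provider that opts in to v2 dispatch.
type v2Provider struct{}

func (v2Provider) ComputePlanVersion() string { return "v2" }

func main() {
	fmt.Println(DispatchVersionFor(nil))
	fmt.Println(DispatchVersionFor(struct{}{}))
	fmt.Println(DispatchVersionFor(v2Provider{}))
}
```

With the assertion inside the helper, call sites can pass whatever provider value they hold without asserting first.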

Addresses Copilot inline comments on PR #528 (round 5):
- iac/wfctlhelpers/dispatch.go:41 (signature vs docstring mismatch)
- platform/differ.go:265 (cache write skipped on (nil, nil))

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(iac): T3.7 — correct DispatchVersionFor + findIaCPluginDir doc claims (Copilot review round 6)

Two doc-comment accuracy fixes from Copilot's re-review of PR #528
round 6 — both surfaced by/exposed in the round-5 changes:

1. **findIaCPluginDir docstring referenced wrong helper** — Round 5
   changed wfctlhelpers.DispatchVersionFor to take `any` (a provider
   value), but findIaCPluginDir's docstring still told callers to pass
   the returned `computePlanVersion` string through DispatchVersionFor.
   That call wouldn't type-assert to ComputePlanVersionDeclarer (a
   string isn't a provider) and would silently default to "v1".
   Replaced with the correct pattern: string-equality against
   wfctlhelpers.DispatchVersionV2 at this loader-level seam where only
   the raw string is in hand. Includes example snippet.

2. **DispatchVersionFor docstring overstated the validation
   guarantee** — Claimed plugin/sdk.ParseManifest schema-validation
   means the dispatch only sees {"v1", "v2", ""}. True for callers
   that load via ParseManifest, but cmd/wfctl/deploy_providers.go's
   findIaCPluginDir / readIaCPluginComputePlanVersion path uses a
   minimal json.Unmarshal with NO schema validation — so unknown
   values CAN reach DispatchVersionFor at runtime. Updated the
   docstring to flag this honestly and call out that the default-to-v1
   behavior is the safety net for those paths (callers must not rely
   on the validation guarantee).
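
To make the two points above concrete, here is a hedged sketch of a loader-level read that uses plain json.Unmarshal and then compares the raw string. The struct layout follows the `iacProvider.computePlanVersion` manifest field; `dispatchV2`, `readComputePlanVersion`, and `usesV2` are illustrative names, not the repository's identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dispatchV2 stands in for wfctlhelpers.DispatchVersionV2 (value assumed to be "v2").
const dispatchV2 = "v2"

// manifest models only the field this sketch needs.
type manifest struct {
	IaCProvider struct {
		ComputePlanVersion string `json:"computePlanVersion"`
	} `json:"iacProvider"`
}

// readComputePlanVersion decodes without schema validation, so any
// string in the file (including unknown values) is returned as-is.
func readComputePlanVersion(raw []byte) (string, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", err
	}
	return m.IaCProvider.ComputePlanVersion, nil
}

// usesV2 is the string-equality check for seams that hold only the raw string.
func usesV2(version string) bool { return version == dispatchV2 }

func main() {
	for _, doc := range []string{
		`{"iacProvider":{"computePlanVersion":"v2"}}`,
		`{"iacProvider":{"computePlanVersion":"v3"}}`,
		`{}`,
	} {
		v, err := readComputePlanVersion([]byte(doc))
		fmt.Printf("version=%q err=%v v2=%t\n", v, err, usesV2(v))
	}
}
```

An unknown value such as "v3" passes through the decode untouched and simply fails the equality check, which is the default-to-v1 safety net described above.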

Doc-only; no code change. All packages still build + vet cleanly.

Addresses Copilot inline comments on PR #528 (round 6):
- cmd/wfctl/deploy_providers.go:107 (wrong helper referenced)
- iac/wfctlhelpers/dispatch.go:18 (overstated validation guarantee)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(iac): T3.5 — TestParseConcurrencyEnv subtest names (Copilot review round 7)

The first table case had `in: ""` and used `tc.in` directly as the
t.Run subtest name. Go's testing package silently rewrites empty
subtest names to "#00", which is unique enough to run but masks the
case identity in -v output and failure reports. Add a `name` field
to the table struct and use stable descriptive labels (empty,
non_numeric, negative, zero, one, eight, thirty_two,
thirty_three_clamped_to_max, one_hundred_clamped_to_max) while still
passing the raw `tc.in` to parseConcurrencyEnv. Identical test
coverage; clearer reporting.
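
A sketch of the table shape described above. The clamp bounds are inferred from the label names (minimum 1, maximum 32); the fallback of 8 for empty or non-numeric input is purely an assumption, and this `parseConcurrencyEnv` is a stand-in rather than the real function.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	concurrencyMin     = 1  // from the clamp description
	concurrencyMax     = 32 // inferred from the "thirty_three_clamped_to_max" label
	concurrencyDefault = 8  // assumption: the real default is not stated here
)

// parseConcurrencyEnv is a stand-in for the function under test.
func parseConcurrencyEnv(in string) int {
	n, err := strconv.Atoi(in)
	if err != nil {
		return concurrencyDefault
	}
	if n < concurrencyMin {
		return concurrencyMin
	}
	if n > concurrencyMax {
		return concurrencyMax
	}
	return n
}

func main() {
	// Each case carries a stable name instead of reusing the raw input,
	// so the empty-string case is reported as "empty".
	cases := []struct {
		name string
		in   string
	}{
		{"empty", ""},
		{"non_numeric", "abc"},
		{"negative", "-5"},
		{"zero", "0"},
		{"eight", "8"},
		{"thirty_three_clamped_to_max", "33"},
	}
	for _, tc := range cases {
		fmt.Printf("%s: %d\n", tc.name, parseConcurrencyEnv(tc.in))
	}
}
```

In a real test file the loop body would be `t.Run(tc.name, ...)`, still passing `tc.in` to the parser.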

Addresses Copilot inline comment on PR #528 (round 7):
- platform/differ_cache_test.go:253

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(iac): T3.5/T3.6 — clamp + in-flight counter doc accuracy (Copilot review round 8)

Two doc-only nits surfaced in Copilot's round-8 re-review of PR #528.
Both are accuracy fixes — no behaviour change.

1. **planDiffConcurrencyMin/Max comment overstated "disable"** —
   The comment said "Below 1 disables concurrency (worse than serial)",
   but parseConcurrencyEnv clamps values <=0 UP to planDiffConcurrencyMin
   (=1), which produces effectively-serial dispatch (one Diff in flight),
   not "disabled". Operators cannot turn the worker pool off, only narrow
   it to one. Updated the comment to spell that out and call out both
   clamp directions explicitly.

2. **channelGatedDriver.inFlight docstring claimed "peak"** — The
   docstring said inFlight tracks the *peak* number of simultaneous
   Diff goroutines, but…
intel352 added a commit that referenced this pull request May 4, 2026
* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
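
The activation rules above reduce to a small predicate. The following is a sketch only: `refreshEnabled` is a hypothetical name, and the real code reads the environment variable and flag itself rather than taking them as parameters.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled is a hypothetical predicate for the rules listed above.
// envVal is the raw WFCTL_REFRESH_OUTPUTS value; skipRefresh mirrors --skip-refresh.
func refreshEnabled(envVal string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag wins over the env var
	}
	on, err := strconv.ParseBool(envVal)
	// Unset, empty, and unrecognised values all fail to parse and stay off.
	return err == nil && on
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		fmt.Printf("%q -> %t\n", v, refreshEnabled(v, false))
	}
	fmt.Printf("%q with --skip-refresh -> %t\n", "1", refreshEnabled("1", true))
}
```

A presence-only check would treat "0" as enabled; parsing the value avoids that.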

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
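
The first two findings can be sketched together. Everything named here is a stand-in: `compiled` replaces the real schema type, `embeddedSchema` replaces the embedded bytes, and the compile step is faked so the example stays standard-library only.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"sync/atomic"
)

// embeddedSchema stands in for the //go:embed byte slice.
var embeddedSchema = []byte(`{"type":"object"}`)

// compiled stands in for the real compiled-schema type.
type compiled struct{ size int }

var (
	schemaOnce   sync.Once
	schemaVal    *compiled
	schemaErr    error
	compileCalls atomic.Int32
)

// compile is a fake compile step that counts its invocations.
func compile(raw []byte) (*compiled, error) {
	compileCalls.Add(1)
	if len(raw) == 0 {
		return nil, fmt.Errorf("empty schema")
	}
	return &compiled{size: len(raw)}, nil
}

// compiledSchema caches both the value and the error, so every caller
// observes the same outcome regardless of which goroutine ran first.
func compiledSchema() (*compiled, error) {
	schemaOnce.Do(func() { schemaVal, schemaErr = compile(embeddedSchema) })
	return schemaVal, schemaErr
}

// schemaJSON hands out a copy so callers cannot mutate the shared bytes.
func schemaJSON() []byte { return bytes.Clone(embeddedSchema) }

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = compiledSchema()
		}()
	}
	wg.Wait()
	fmt.Println("compile calls:", compileCalls.Load())
}
```

Storing the error alongside the value matters: without it, callers arriving after a failed first compile would see a nil schema with no explanation.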

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

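The omitempty behavior that the contract test locks can be sketched as
follows. The struct is a hypothetical stand-in for ApplyResult: the JSON
keys come from the message above, while the Go field types are assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a hypothetical stand-in for ApplyResult; field types are assumed.
type result struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and returns the JSON text.
func encode(r result) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// nil fields
	fmt.Println(encode(result{}))
	// empty-but-non-nil fields: encoding/json omits these as well
	fmt.Println(encode(result{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}))
	// a populated map keyed by the replaced resource's name
	fmt.Println(encode(result{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```

The first two calls print `{}`; only the populated map produces a key.
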
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  re-align both test files automatically — keeps the cross-package
  fixture parity guaranteed.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering needed. On rename failure, best-effort cleanup of the
  temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).
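
The action-emission rules above reduce to a small classifier. The sketch
below uses hypothetical local types: NeedsReplace, NeedsUpdate and
ForceNew are named in this message, while the Changes and Path field
names are assumed.

```go
package main

import "fmt"

// FieldChange and DiffResult are hypothetical local mirrors of the
// interfaces types; only the flag names are taken from the message.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange
}

// classify returns "replace", "update" or "skip" for an existing resource.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace" // ForceNew is no longer downgraded to update
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	fmt.Println(classify(&DiffResult{NeedsReplace: true}))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true, Changes: []FieldChange{{Path: "region", ForceNew: true}}}))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true}))
	fmt.Println(classify(&DiffResult{}))
}
```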

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): the digest is fixed-length,
and the NUL separator is not expected to appear in either field, so the
encoding is ambiguity-free. Matches how configHash already keys per-config inputs.

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).
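
The collision and the fix can be reproduced in a few lines. The function
names are hypothetical and hex encoding of the digest is an assumption;
the two provider tuples are the ones quoted above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// joinedKey is the old composition that the review flagged.
func joinedKey(name, version string) string { return name + "@" + version }

// hashedKey is a sketch of the replacement: digest over name, NUL, version.
func hashedKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Both tuples collapse to the same literal string under the old scheme.
	fmt.Println(joinedKey("foo", "bar@1.0") == joinedKey("foo@bar", "1.0")) // true
	// The NUL-separated digest keeps them apart.
	fmt.Println(hashedKey("foo", "bar@1.0") == hashedKey("foo@bar", "1.0")) // false
}
```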

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.
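
The optional-interface dispatch can be sketched as below. The interface
name and its method come from this commit; the helper's signature, the
constant names and both fake providers are assumptions for illustration.

```go
package main

import "fmt"

// ComputePlanVersionDeclarer is the optional interface named above.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const (
	dispatchV1 = "v1" // hypothetical constant names
	dispatchV2 = "v2"
)

// dispatchVersionFor is a sketch of the decision: v2 only when the
// provider declares exactly "v2"; everything else takes the legacy path.
func dispatchVersionFor(p any) string {
	if d, ok := p.(ComputePlanVersionDeclarer); ok && d.ComputePlanVersion() == dispatchV2 {
		return dispatchV2
	}
	return dispatchV1
}

// bareProvider does not implement the optional interface at all.
type bareProvider struct{}

// declaringProvider implements it and returns whatever it was given.
type declaringProvider struct{ version string }

func (p declaringProvider) ComputePlanVersion() string { return p.version }

func main() {
	fmt.Println(dispatchVersionFor(bareProvider{}))                   // v1
	fmt.Println(dispatchVersionFor(declaringProvider{version: ""}))   // v1
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v2"})) // v2
}
```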

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy), and
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer substring). v1 fixture
  v1RecordingProvider intentionally does NOT implement
  ComputePlanVersionDeclarer to prove the dispatcher's "default to v1
  when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
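
A minimal sketch of the conversion, assuming only what is stated above:
the input is map[string]bool, the output is the map[string]any shape,
and empty or nil input yields nil. The body of the real sensitiveToAny
is not shown in this message, so treat this as illustrative.

```go
package main

import "fmt"

// sensitiveToAny widens map[string]bool to map[string]any so that a
// struct encoder accepting only map[string]any can take it. Empty or
// nil input returns nil.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(nil) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{}) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{"password": true}))
}
```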

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
  scoped via getDiffCache lazy init) doesn't observe stale entries
  from a developer's local ~/.cache/wfctl/diff/ as false-positive
  cache hits skipping driver Diff dispatch. Same pattern as
  platform/main_test.go from T3.6f. Caught during dev when the
  end-to-end test failed in the full cmd/wfctl test run but passed
  in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
  flips mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* feat(iac): add ProviderValidator optional interface + PlanDiagnostic type

Adds an OPTIONAL `interfaces.ProviderValidator` interface that an IaCProvider
implementation MAY also satisfy to expose provider-side cross-resource
constraint validation at plan time:

    type ProviderValidator interface {
        ValidatePlan(plan *IaCPlan) []PlanDiagnostic
    }

Plus the supporting `PlanDiagnostic` type and `PlanDiagnosticSeverity` enum
(Info/Warning/Error). Consumers (e.g. the R-A10 align rule landing in the
next commit) discover ValidatePlan via type-assertion, so providers that do
not implement it keep working unchanged — purely additive.

Naming note: plan T4.1 originally proposed `Diagnostic` for this type, but
`interfaces.Diagnostic` is already taken by the unrelated Troubleshooter
runtime-event finding (`iac_resource_driver.go`). Renamed to PlanDiagnostic
to preserve W-4's pure-additive contract; the existing Troubleshooter type
is untouched.

TDD via interfaces/iac_provider_test.go covering severity-constant ordering,
PlanDiagnostic field shape, and type-assertion against both an implementor
and a non-implementor (confirms the interface remains optional).
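
The optional-interface discovery described above can be sketched roughly as follows. This is a minimal illustration, not the code in the commit: only `ProviderValidator`, `ValidatePlan`, `PlanDiagnostic`, and `PlanDiagnosticSeverity` come from the message, while the field names, the `IaCPlan`/`IaCProvider` stand-ins, and `collectDiagnostics` are assumptions made for the example.

```go
package main

import "fmt"

// PlanDiagnosticSeverity mirrors the Info/Warning/Error enum named in the
// commit message; the concrete values are assumptions.
type PlanDiagnosticSeverity int

const (
	PlanDiagnosticInfo PlanDiagnosticSeverity = iota
	PlanDiagnosticWarning
	PlanDiagnosticError
)

// PlanDiagnostic fields are hypothetical; only the type name is from the source.
type PlanDiagnostic struct {
	Severity PlanDiagnosticSeverity
	Message  string
}

// IaCPlan and IaCProvider are simplified stand-ins for the real types.
type IaCPlan struct{ Actions []string }

type IaCProvider interface{ Name() string }

// ProviderValidator is the optional interface quoted in the commit message.
type ProviderValidator interface {
	ValidatePlan(plan *IaCPlan) []PlanDiagnostic
}

// plainProvider does not implement ProviderValidator.
type plainProvider struct{}

func (plainProvider) Name() string { return "plain" }

// validatingProvider implements both interfaces.
type validatingProvider struct{}

func (validatingProvider) Name() string { return "validating" }

func (validatingProvider) ValidatePlan(plan *IaCPlan) []PlanDiagnostic {
	if len(plan.Actions) == 0 {
		return nil
	}
	return []PlanDiagnostic{{Severity: PlanDiagnosticWarning, Message: "example constraint"}}
}

// collectDiagnostics (hypothetical helper) calls ValidatePlan only on
// providers that satisfy ProviderValidator; all others are skipped.
func collectDiagnostics(providers []IaCProvider, plan *IaCPlan) []PlanDiagnostic {
	var out []PlanDiagnostic
	for _, p := range providers {
		v, ok := p.(ProviderValidator)
		if !ok {
			continue
		}
		out = append(out, v.ValidatePlan(plan)...)
	}
	return out
}

func main() {
	plan := &IaCPlan{Actions: []string{"create"}}
	diags := collectDiagnostics([]IaCProvider{plainProvider{}, validatingProvider{}}, plan)
	fmt.Println(len(diags))
}
```

Because discovery is a type assertion rather than a required method, a provider compiled before the interface existed keeps working without changes.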

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): R-A10 align rule — provider.ValidatePlan dispatch

Adds R-A10, the align rule that surfaces provider-side cross-resource
constraint diagnostics at plan time. Wiring:

  cmd/wfctl/infra_align_rules.go::checkRA10_provider_validate_plan
      Iterates providers, type-asserts ProviderValidator, calls
      ValidatePlan(plan), maps each PlanDiagnostic to an AlignFinding.
      Severity mapping: Error→FAIL, Warning→WARN, Info→WARN (advisory;
      align has no INFO tier today). Resource label falls back to
      "<provider-name>:plan" for plan-level findings; field path is
      appended to the message when present.

  cmd/wfctl/infra_align.go::runInfraAlignChecks
      Dispatches R-A10 only when --plan is provided (R-A7 predicate parity).
      Loads providers via the new alignLoadProviders test seam — the
      default implementation enumerates iac.provider modules in the YAML
      and loads each through the existing resolveIaCProvider plugin path.
      Closers are released after the rule runs; a per-provider load failure
      logs a stderr warning and continues so other R-A* findings are not
      hidden.

TDD via cmd/wfctl/infra_align_ra10_test.go covers nil-plan, no-providers,
non-validating-provider-skipped, Error→FAIL, Warning→WARN, Info→WARN,
plan-level resource fallback, and multi-provider mixed-implementation
cases. Two integration tests exercise dispatch through the seam: one
asserts R-A10 fires under --strict and produces non-zero exit; the other
asserts the rule (and the loader) is silent without --plan.

Pure-additive: providers that do not implement ProviderValidator are
skipped, so this commit changes no existing align behaviour.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(iac): document ProviderValidator + R-A10 align rule

Adds the documentation pieces for W-4:

- DOCUMENTATION.md gains a new top-level "IaC Provider Plugin Interfaces"
  section that documents the optional interfaces.ProviderValidator
  interface, the PlanDiagnostic/PlanDiagnosticSeverity types, the
  ValidatePlan contract (read-only, no remote calls), the R-A10 consumer
  and its severity mapping, and the naming-distinction note vs. the
  pre-existing interfaces.Diagnostic (Troubleshooter) type.

- docs/WFCTL.md adds an `infra align` subsection under the existing
  `infra` command. It lists every R-A* rule (R-A1 through R-A10 with
  severities), the flag table, the R-A10 severity-mapping submatrix,
  and example invocations covering both plan-less and --plan/--strict
  modes.

- cmd/wfctl/dsl-reference-embedded.md (the source for `wfctl
  dsl-reference`) gains the R-A9 and R-A10 rows in the rule-families
  table and a short paragraph on R-A10's behaviour. The `--plan`
  description is updated to enable both R-A7 and R-A10.

Pure docs change; no code touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(iac): T4.5 verification — `--plan` help text mentions R-A10

T4.5 verification surfaced one cosmetic gap: the `--plan` flag's help
description still read "enables R-A7 checks" after T4.2 added R-A10 as a
second `--plan`-gated rule. Updated to "enables R-A7 and R-A10 checks" so
`wfctl infra align --help` reflects current behaviour.

Verification steps (no further code change required):

- `GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \
   ./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/...` → all PASS.
- `go build ./cmd/wfctl` → builds clean.
- `wfctl infra align --help` → shows existing flags plus the corrected
  `--plan` description.
- Fixture-provider smoke (TestInfraAlign_RA10_FixtureProvider_Fires) wires
  a ProviderValidator returning a fatal diagnostic through the
  alignLoadProviders seam → R-A10 finding emitted, FAIL severity, non-zero
  exit under `--strict`. This satisfies T4.5 Step 3 manual rule-trigger
  smoke without needing a real plugin subprocess.
- `go vet ./interfaces/... ./cmd/wfctl/... ./iac/...` → clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(iac): T4.2/T4.4 review — Info diagnostics log, no finding

Spec-reviewer flagged that the rev10 plan T4.2 acceptance criteria specify
a three-tier severity mapping ("Errors → align failures; Warnings →
warnings; Info → logs"), and that the previous commit (76c4160) collapsed
Info into WARN. The collapse meant `wfctl infra align --strict` could exit
non-zero on a purely informational provider hint — the exact scenario the
Info tier exists to prevent (e.g. billing-tier change notices, deprecation
hints) — defeating the tier's contract.

Code (cmd/wfctl/infra_align_rules.go::checkRA10_provider_validate_plan):

  Severity switch reworked to three explicit cases plus a conservative
  default. PlanDiagnosticInfo now writes to a new package-level sink
  `ra10LogInfo` (stderr by default; overridable for tests) and emits NO
  AlignFinding, so it never affects exit code under any flag combination.
  PlanDiagnosticError → FAIL and PlanDiagnosticWarning → WARN are unchanged.
  Unknown future severities fall back to WARN so they cannot slip past
  --strict undetected.

  Doc-comment rewritten to spell out the three-tier mapping and the
  motivating "Info must not break --strict CI" rule.
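
A rough sketch of the three-tier switch, assuming simplified types. The `Severity`, `Diagnostic`, `Finding`, `infoSink`, `mapDiagnostics`, and `exitCode` names are illustrative stand-ins, not the identifiers in infra_align_rules.go; only the mapping itself and the `R-A10 [info] <provider>/<resource>: <message>` log shape are taken from the commit messages.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Severity is a stand-in for PlanDiagnosticSeverity.
type Severity int

const (
	SeverityInfo Severity = iota
	SeverityWarning
	SeverityError
)

// Diagnostic and Finding are simplified, hypothetical shapes.
type Diagnostic struct {
	Severity Severity
	Resource string
	Message  string
}

type Finding struct {
	Level   string // "FAIL" or "WARN"
	Message string
}

// infoSink plays the role of the overridable log sink; stderr by default.
var infoSink io.Writer = os.Stderr

// mapDiagnostics converts diagnostics to findings. Info is logged and
// produces no finding; unknown severities fall back to WARN.
func mapDiagnostics(provider string, diags []Diagnostic) []Finding {
	var findings []Finding
	for _, d := range diags {
		switch d.Severity {
		case SeverityError:
			findings = append(findings, Finding{Level: "FAIL", Message: d.Message})
		case SeverityWarning:
			findings = append(findings, Finding{Level: "WARN", Message: d.Message})
		case SeverityInfo:
			fmt.Fprintf(infoSink, "R-A10 [info] %s/%s: %s\n", provider, d.Resource, d.Message)
		default:
			findings = append(findings, Finding{Level: "WARN", Message: d.Message})
		}
	}
	return findings
}

// exitCode returns non-zero for any FAIL, and for WARN only under strict.
func exitCode(findings []Finding, strict bool) int {
	for _, f := range findings {
		if f.Level == "FAIL" || (strict && f.Level == "WARN") {
			return 1
		}
	}
	return 0
}

func main() {
	info := []Diagnostic{{Severity: SeverityInfo, Resource: "db", Message: "billing tier notice"}}
	findings := mapDiagnostics("example", info)
	fmt.Println(len(findings), exitCode(findings, true))
}
```

Keeping Info out of the findings slice entirely is what makes the strict-mode guarantee hold regardless of how the exit code is later computed.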

Test (cmd/wfctl/infra_align_ra10_test.go):

  TestCheckRA10_InfoDiagnostic_BecomesWARN renamed/rewritten as
  TestCheckRA10_InfoDiagnostic_LogsAndEmitsNoFinding. Asserts:
  - len(findings) == 0
  - the captured log line carries the rule tag, [info] severity marker,
    "<provider>/<resource>" identifier, the diagnostic message, and the
    "field: <name>" suffix
  - alignExitCode(findings, strict=true) == 0 (the load-bearing guarantee)

Docs (DOCUMENTATION.md, docs/WFCTL.md):

  Both severity-mapping summaries replaced with a three-row table
  (Error → FAIL finding, Warning → WARN finding, Info → stderr log/no
  finding/no exit-code effect). Prose surrounding the table now
  explicitly calls out the strict-CI safety guarantee.

Verification:

- GOWORK=off go test -race -count=1 ./interfaces/... ./iac/...
  ./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/... → all PASS.
- markdown-link-check on the three modified docs → 0 dead links.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(iac): T4.4 review — embedded reference Info-tier mapping

Spec-reviewer caught one stale doc site missed in commit 9c41c1d:
`cmd/wfctl/dsl-reference-embedded.md:1358-1359` (the source for `wfctl
dsl-reference`) still claimed `PlanDiagnosticInfo` produced a WARN
AlignFinding. Replaced with the full three-tier prose so `wfctl
dsl-reference` callers see the corrected mapping:

  - PlanDiagnosticError   → FAIL AlignFinding (always non-zero exit)
  - PlanDiagnosticWarning → WARN AlignFinding (non-zero only under --strict)
  - PlanDiagnosticInfo    → stderr log "R-A10 [info] <provider>/<resource>:
                            <message>"; no AlignFinding so --strict CI
                            gates never fail on informational hints

The R-A10 row in the table at :1354 ("FAIL or WARN") is unchanged — Info
no longer produces a finding so the existing severity range still
exhaustively covers the possible AlignFinding severities.

Verification:
- `markdown-link-check cmd/wfctl/dsl-reference-embedded.md` → 0 dead links.
- `GOWORK=off go test -race -count=1 ./cmd/wfctl/...` → PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(iac): R1 review — load plan + cfg once; clean R-A10 Info log fmt; clarify PlanDiagnosticSeverity doc (Copilot review)

- runInfraAlignChecks loads --plan once and reuses the parsed *IaCPlan
  for R-A7 and R-A10 (was: 2x file open + JSON decode).
- alignLoadProviders now takes *alignContext (built once via
  buildAlignContext in runInfraAlignChecks) instead of re-loading the
  YAML from disk. Test seam updated.
- R-A10 Info log identifies plan-level diagnostics as `<provider>/plan`
  (matches the documented `R-A10 [info] <provider>/<resource>: ...`
  format) instead of the redundant `<provider>/<provider>:plan: ...`.
  Table label still uses `<provider>:plan`.
- PlanDiagnosticSeverity doc comment now spells out the exit-code
  mapping: Error always FAILs; Warning is advisory by default but FAILs
  under --strict; Info never affects exit code.

New test: TestCheckRA10_PlanLevelInfoDiagnostic_LogsAsProviderSlashPlan
covers the log-format fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
…#531)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).
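
For readers unfamiliar with the pattern, a bounded-concurrency refresh with all-or-nothing error handling might look roughly like this. It is a sketch under assumptions: `State`, `readFn`, `refresh`, and `demoRefresh` are hypothetical simplifications, and the real Refresh signature takes a provider and options rather than a bare read function.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// State is a simplified stand-in for a resource state entry.
type State struct {
	ID      string
	Outputs map[string]string
}

// readFn stands in for ResourceDriver.Read.
type readFn func(ctx context.Context, id string) (map[string]string, error)

// refresh reads every resource with at most `concurrency` reads in flight
// and returns a copy of states with live Outputs, or (nil, err) on failure.
func refresh(ctx context.Context, read readFn, states []State, concurrency int) ([]State, error) {
	if concurrency < 1 {
		concurrency = 8 // default named in the commit message
	}
	out := make([]State, len(states))
	copy(out, states)

	sem := make(chan struct{}, concurrency) // counting semaphore
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for i := range out {
		wg.Add(1)
		sem <- struct{}{} // blocks while the pool is full
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			live, err := read(ctx, out[i].ID)
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = fmt.Errorf("read %s: %w", out[i].ID, err)
				}
				mu.Unlock()
				return
			}
			out[i].Outputs = live // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr // never hand back a half-refreshed slice
	}
	return out, nil
}

// demoRefresh runs refresh over three fake resources; when fail is true the
// fake read returns an error for resource "b".
func demoRefresh(fail bool) (int, error) {
	read := func(_ context.Context, id string) (map[string]string, error) {
		if fail && id == "b" {
			return nil, errors.New("boom")
		}
		return map[string]string{"id": id}, nil
	}
	states := []State{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	out, err := refresh(context.Background(), read, states, 0)
	return len(out), err
}

func main() {
	n, err := demoRefresh(false)
	fmt.Println(n, err)
	n, err = demoRefresh(true)
	fmt.Println(n, err)
}
```

Writing into a pre-sized copy by index avoids any shared append, which is the property the later stress test checks as "no write-into-wrong-slot".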

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
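
The activation rules above reduce to a small gate around `strconv.ParseBool`. The sketch below is illustrative only; `refreshEnabled` is a hypothetical helper name, not the function in infra_apply_refresh_pre.go.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// refreshEnabled reports whether the pre-step should run. value is the raw
// WFCTL_REFRESH_OUTPUTS setting; skipRefresh mirrors the --skip-refresh flag.
func refreshEnabled(value string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag always wins over the env var
	}
	enabled, err := strconv.ParseBool(value)
	if err != nil {
		return false // unset, empty, or unrecognised values are a no-op
	}
	return enabled
}

func main() {
	fmt.Println(refreshEnabled(os.Getenv("WFCTL_REFRESH_OUTPUTS"), false))
}
```

A presence-only check (`value != ""`) would treat `WFCTL_REFRESH_OUTPUTS=0` as enabled, which is the foot-gun the ParseBool form avoids.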

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms of sleep when the
pool stays full; well under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification: the plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
was run with locally installed mdformat 1.0.0 (pip) and
markdown-link-check 3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
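
The first two fixes follow a common Go pattern, sketched here under assumptions: `rawSchema` stands in for the `//go:embed` slice, `json.Unmarshal` stands in for the real JSON Schema compile step, and every identifier below is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// rawSchema stands in for the embedded schema bytes.
var rawSchema = []byte(`{"type":"object"}`)

var (
	schemaOnce   sync.Once
	schemaVal    map[string]any
	schemaErr    error
	compileCount int // counts how often the expensive step actually ran
)

// compiledSchema parses the schema at most once and caches both the value
// and the error, so every caller observes the same outcome.
func compiledSchema() (map[string]any, error) {
	schemaOnce.Do(func() {
		compileCount++
		schemaErr = json.Unmarshal(rawSchema, &schemaVal)
	})
	return schemaVal, schemaErr
}

// schemaJSON returns a copy so callers cannot mutate the shared bytes.
func schemaJSON() []byte { return bytes.Clone(rawSchema) }

// compileConcurrently calls compiledSchema from n goroutines and reports
// how many times the compile step ran.
func compileConcurrently(n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = compiledSchema()
		}()
	}
	wg.Wait()
	return compileCount
}

func main() {
	fmt.Println(compileConcurrently(32))
	b := schemaJSON()
	b[0] = 'X'
	fmt.Println(string(schemaJSON()))
}
```

Caching the error alongside the value matters: without it, a failed compile would leave later callers with a nil schema and no explanation.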

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

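As a quick illustration of the omitempty contract from the last bullet above: with `encoding/json`, both nil and empty-but-non-nil maps and slices are dropped when the tag carries `omitempty`. The struct below is a hypothetical stand-in; the real ApplyResult field types are not shown in this log and may differ. Only the JSON key names are taken from the message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult is a simplified stand-in; field types are assumptions.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and returns the JSON text.
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// nil fields
	fmt.Println(encode(applyResult{}))
	// empty but non-nil fields
	fmt.Println(encode(applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}))
	// populated map, keyed by the replaced resource's name
	fmt.Println(encode(applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```

Note that `omitempty` does not drop empty struct values, so if any of the real fields is a non-pointer struct the contract would need a different mechanism.
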
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
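
The loop-boundary cancellation check from Minor 3 can be sketched roughly as
follows. This is a minimal illustration, not the ApplyPlan code: applyAll,
runPreCancelled, and the string-typed actions are hypothetical names chosen
for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// applyAll is a hypothetical reduction of the apply loop. It checks ctx.Err()
// at each iteration boundary and, on cancel, returns whatever was dispatched
// so far together with the context error.
func applyAll(ctx context.Context, actions []string, dispatch func(string)) ([]string, error) {
	var done []string
	for _, a := range actions {
		if err := ctx.Err(); err != nil {
			return done, err
		}
		dispatch(a)
		done = append(done, a)
	}
	return done, nil
}

// runPreCancelled applies two actions under an already-cancelled context and
// reports how many dispatches happened and whether the error matches
// context.Canceled.
func runPreCancelled() (calls int, canceled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := applyAll(ctx, []string{"create", "delete"}, func(string) { calls++ })
	return calls, errors.Is(err, context.Canceled)
}

func main() {
	calls, canceled := runPreCancelled()
	fmt.Println(calls, canceled)
}
```

Returning the accumulated slice alongside the error mirrors the "result so
far + ctx error" shape described above.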

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  propagate to both test files automatically, which keeps the
  cross-package fixtures in parity.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.
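
A rough sketch of the guard, assuming a two-method driver interface. Only the
"replace: canceled after delete:" prefix comes from this commit; the driver
interface, the cancelOnDelete fake, and the "replace: delete:" /
"replace: create:" prefixes are hypothetical stand-ins.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// driver is a hypothetical two-method slice of a resource driver.
type driver interface {
	Delete(ctx context.Context, id string) error
	Create(ctx context.Context, name string) (string, error)
}

// doReplace deletes the old resource, re-checks ctx, then creates the new one.
// The ID map is only written after a successful Create.
func doReplace(ctx context.Context, d driver, name, oldID string, idMap map[string]string) error {
	if err := d.Delete(ctx, oldID); err != nil {
		return fmt.Errorf("replace: delete: %w", err) // prefix assumed
	}
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("replace: canceled after delete: %w", err)
	}
	newID, err := d.Create(ctx, name)
	if err != nil {
		return fmt.Errorf("replace: create: %w", err) // prefix assumed
	}
	idMap[name] = newID
	return nil
}

// cancelOnDelete cancels the captured context as a side effect of Delete.
type cancelOnDelete struct {
	cancel           context.CancelFunc
	deletes, creates int
}

func (c *cancelOnDelete) Delete(context.Context, string) error {
	c.deletes++
	c.cancel()
	return nil
}

func (c *cancelOnDelete) Create(context.Context, string) (string, error) {
	c.creates++
	return "new-id", nil
}

// runCancelAfterDelete drives one Replace whose Delete cancels the context.
func runCancelAfterDelete() (deletes, creates, mapped int, msg string, canceled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	d := &cancelOnDelete{cancel: cancel}
	idMap := map[string]string{}
	err := doReplace(ctx, d, "vpc", "old-id", idMap)
	return d.deletes, d.creates, len(idMap), err.Error(), errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println(runCancelAfterDelete())
}
```

Because the error is wrapped with %w, errors.Is still matches the sentinel
in-package, while the string prefix remains available to callers that only see
the message.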

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering is needed. On rename failure, Put makes a best-effort
  attempt to remove the temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."
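
The write-temp-then-rename pattern from Minor 1 looks roughly like this. It is
a standalone sketch using only the standard library; atomicWrite, roundTrip,
and the temp-file naming are hypothetical and not taken from iac/diffcache.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temp file in the same directory as path and
// then renames it over path, so readers see either the old or new contents.
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpName)
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpName)
		return err
	}
	if err := os.Rename(tmpName, path); err != nil {
		os.Remove(tmpName) // best-effort cleanup of the orphaned temp file
		return err
	}
	return nil
}

// roundTrip overwrites one entry twice in a scratch directory and returns the
// final contents plus the number of files left behind.
func roundTrip() (string, int, error) {
	dir, err := os.MkdirTemp("", "diffcache-sketch")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "entry.json")
	for _, body := range []string{`{"v":1}`, `{"v":2}`} {
		if err := atomicWrite(path, []byte(body)); err != nil {
			return "", 0, err
		}
	}
	got, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	return string(got), len(entries), nil
}

func main() {
	fmt.Println(roundTrip())
}
```

Creating the temp file in the destination directory keeps the rename on one
filesystem, which is the condition the atomicity argument above relies on.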

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).
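
The per-iteration closer pattern can be illustrated as below. planGroups,
trace, and the open/compute callbacks are hypothetical; they only show why
wrapping the loop body in an immediately-invoked function changes when the
deferred closer runs.

```go
package main

import "fmt"

// planGroups is a hypothetical reduction of the plan loop. open stands in for
// the provider loader and returns a closer for that provider.
func planGroups(groups []string, open func(string) func(), compute func(string)) {
	for _, g := range groups {
		func() {
			closer := open(g)
			defer closer() // runs when this iteration's function returns
			compute(g)
		}()
	}
}

// trace records the order of open/compute/close events for two groups.
func trace() []string {
	var events []string
	open := func(g string) func() {
		events = append(events, "open "+g)
		return func() { events = append(events, "close "+g) }
	}
	compute := func(g string) { events = append(events, "compute "+g) }
	planGroups([]string{"a", "b"}, open, compute)
	return events
}

func main() {
	fmt.Println(trace())
}
```

With a plain `defer` in the loop body, both closers would run only at function
exit; here group "a" is closed before group "b" is opened.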

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
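
The emission rules above reduce to a small precedence check. The sketch below
uses hypothetical stand-in types: the NeedsReplace, NeedsUpdate, and ForceNew
names come from this message, while the Changes field name, the Path field, and
treating a nil result as skip are assumptions.

```go
package main

import "fmt"

// FieldChange and DiffResult are stand-ins for the real interface types.
type FieldChange struct {
	Path     string // assumed field
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange // field name assumed
}

// classify maps a DiffResult to an action name: replace wins over update,
// and a ForceNew field change forces replace even without NeedsReplace.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip" // assumption: a nil result emits nothing
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	fmt.Println(classify(&DiffResult{NeedsReplace: true}))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true, Changes: []FieldChange{{Path: "region", ForceNew: true}}}))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true}))
	fmt.Println(classify(&DiffResult{}))
}
```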

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
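
A pure env-parsing function of the kind Note 1 describes might look like the
sketch below. The default of 8 and the 1..32 bounds come from T3.6e; the
signature, and the choice to fall back to the default on empty or non-numeric
input while clamping out-of-range numbers to the nearest bound, are assumptions
made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultConcurrency = 8
	minConcurrency     = 1
	maxConcurrency     = 32
)

// parseConcurrencyEnv converts a raw env value into a worker count.
// Assumed policy: unparseable input yields the default; numeric input is
// clamped into [minConcurrency, maxConcurrency].
func parseConcurrencyEnv(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultConcurrency
	}
	if n < minConcurrency {
		return minConcurrency
	}
	if n > maxConcurrency {
		return maxConcurrency
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrencyEnv(raw))
	}
}
```

Keeping the function free of os.Getenv is what makes the table-driven test
possible without mutating process state.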

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): fixed-length, and
ambiguity-free as long as neither field contains a NUL byte, which
provider names and versions do not by convention.
Matches how configHash already keys per-config inputs.

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).
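
The collision and the fix can be reproduced in a few lines. naiveKey and
hashedKey are hypothetical names, and hex-encoding the digest is an assumption;
only the `@` concatenation and the sha256(name + "\x00" + version) shape come
from this message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey reproduces the separator-based composition that was replaced.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// hashedKey digests name and version joined by a NUL byte.
func hashedKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))   // true: both are "foo@bar@1.0"
	fmt.Println(hashedKey("foo", "bar@1.0") == hashedKey("foo@bar", "1.0")) // false: the inputs differ
}
```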

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests:
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy), and
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer substring). v1 fixture
  v1RecordingProvider intentionally does NOT implement
  ComputePlanVersionDeclarer to prove the dispatcher's "default to v1
  when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.
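
Paths A and B can be told apart with a small optional-interface sketch. The
interface, constant, and helper names follow T3.7, but the signatures here and
both fake providers are assumptions made for illustration.

```go
package main

import "fmt"

// ComputePlanVersionDeclarer is the optional interface a provider may implement.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const DispatchVersionV2 = "v2"

// DispatchVersionFor returns "v2" only when p declares exactly "v2";
// everything else, including a missing or empty declaration, is "v1".
func DispatchVersionFor(p any) string {
	if d, ok := p.(ComputePlanVersionDeclarer); ok && d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return "v1"
}

// bareProvider does not implement the interface at all (Path A).
type bareProvider struct{}

// declaringProvider implements the interface; an empty version is Path B.
type declaringProvider struct{ version string }

func (d declaringProvider) ComputePlanVersion() string { return d.version }

func main() {
	fmt.Println(DispatchVersionFor(bareProvider{}))
	fmt.Println(DispatchVersionFor(declaringProvider{version: ""}))
	fmt.Println(DispatchVersionFor(declaringProvider{version: "v2"}))
}
```

Both legacy paths return the same value, which is why each needs its own test:
the type assertion fails in one and succeeds in the other.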

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.
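
The conversion itself is small. The sketch below assumes the signature from the
description above (map[string]bool in, map[string]any out, nil for empty or nil
input) and leaves out the structpb call so it runs with the standard library
alone.

```go
package main

import "fmt"

// sensitiveToAny widens a map[string]bool into map[string]any.
// It returns nil for nil or empty input so nothing is sent on the wire.
func sensitiveToAny(m map[string]bool) map[string]any {
	if len(m) == 0 {
		return nil
	}
	out := make(map[string]any, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(nil) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{}) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{"token": true}))
}
```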

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
  scoped via getDiffCache lazy init) doesn't observe stale entries
  from a developer's local ~/.cache/wfctl/diff/ as false-positive
  cache hits skipping driver Diff dispatch. Same pattern as
  platform/main_test.go from T3.6f. Caught during dev when the
  end-to-end test failed in the full cmd/wfctl test run but passed
  in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
  flips mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* feat(iac): jitsubst.ResolveSpec for per-module deferred substitution

T5.1 — new package iac/jitsubst hosts ResolveSpec, the apply-time helper
that resolves ${VAR}, ${MODULE.field}, and ${MODULE.id} references in a
ResourceSpec.Config tree. Strict semantics: every reference MUST resolve
or the helper returns an error and the input spec unchanged. ${MODULE.id}
prefers the in-apply replaceIDMap (W-3b/T3.4) over syncedOutputs so
cascade-replace ProviderID propagation is authoritative over potentially
stale state outputs.

Used by W-5 T5.2 (wire into wfctlhelpers.ApplyPlan) and T5.3 (wire into
doReplace). No behavior change yet — helper has no in-tree caller.
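
The resolution order can be sketched on a single string. This is not the
jitsubst API: resolve, its parameters, the reference grammar in refPattern, and
the error wording are all hypothetical, and the real helper walks a
ResourceSpec.Config tree rather than one string.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// refPattern matches ${NAME} and ${MODULE.field}; the grammar is assumed.
var refPattern = regexp.MustCompile(`\$\{([A-Za-z0-9_.-]+)\}`)

// resolve expands references in s. ${MODULE.id} prefers replaceIDs over
// outputs; any unresolved reference returns the input unchanged plus an error.
func resolve(s string, replaceIDs map[string]string, outputs map[string]map[string]string,
	env func(string) (string, bool)) (string, error) {
	var firstErr error
	out := refPattern.ReplaceAllStringFunc(s, func(m string) string {
		ref := m[2 : len(m)-1] // strip "${" and "}"
		module, field, dotted := strings.Cut(ref, ".")
		var val string
		var ok bool
		switch {
		case !dotted:
			val, ok = env(ref)
		case field == "id":
			if val, ok = replaceIDs[module]; !ok {
				val, ok = outputs[module]["id"]
			}
		default:
			val, ok = outputs[module][field]
		}
		if !ok {
			if firstErr == nil {
				firstErr = fmt.Errorf("unresolved reference %s", m)
			}
			return m
		}
		return val
	})
	if firstErr != nil {
		return s, firstErr
	}
	return out, nil
}

func main() {
	env := func(k string) (string, bool) {
		v, ok := map[string]string{"REGION": "nyc1"}[k]
		return v, ok
	}
	ids := map[string]string{"vpc": "new-1"}
	outs := map[string]map[string]string{"vpc": {"id": "old-1", "cidr": "10.0.0.0/16"}}
	fmt.Println(resolve("${vpc.id}/${vpc.cidr}/${REGION}", ids, outs, env))
	fmt.Println(resolve("${missing.id}", ids, outs, env))
}
```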

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): ApplyPlan resolves JIT substitutions per action

T5.2 — wfctlhelpers.ApplyPlan now invokes jitsubst.ResolveSpec on every
action.Resource before dispatch. The substitution sees:

  - result.ReplaceIDMap (this-apply Replace ProviderIDs from doReplace)
  - syncedOutputs (state-side outputs from action.Current entries +
    this-apply outputs from successful prior dispatches in the same loop)
  - os.LookupEnv (production env source)

syncedOutputs is pre-populated from every action.Current at start-of-apply
so a NEW action can reference an in-state sibling module's outputs from
action zero. After each successful dispatch (when result.Resources grows),
the new entry is folded into syncedOutputs via flattenOutputs — flat-copy
of Outputs with the canonical 'id' key shadowed by ProviderID so
${MODULE.id} resolves predictably across new and existing modules.
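
A rough sketch of the flatten step, assuming a simplified state shape. `resourceState` and this `flattenOutputs` body are illustrative stand-ins, not the in-tree types:

```go
package main

import "fmt"

// resourceState is a hypothetical stand-in for a state entry.
type resourceState struct {
	ProviderID string
	Outputs    map[string]any
}

// flattenOutputs returns a shallow copy of Outputs in which the "id" key
// is always the ProviderID, even if Outputs carried its own "id".
func flattenOutputs(st resourceState) map[string]any {
	flat := make(map[string]any, len(st.Outputs)+1)
	for k, v := range st.Outputs {
		flat[k] = v
	}
	flat["id"] = st.ProviderID // shadow any driver-supplied "id"
	return flat
}

func main() {
	st := resourceState{
		ProviderID: "fresh-uuid",
		Outputs:    map[string]any{"id": "stale", "private_ip": "10.0.0.5"},
	}
	fmt.Println(flattenOutputs(st))
}
```

Copying rather than mutating keeps the state entry itself untouched, so only the substitution view sees the shadowed key.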

JIT failure surfaces as a per-action ActionError with the canonical
'jit substitution:' prefix; the offending action SKIPS dispatch
(unresolved spec must not reach the driver). The loop continues to the
next action — best-effort apply contract preserved.

Tests in apply_jit_test.go cover: 2-create plan with B referencing
${A.id}, pre-syncing from action.Current, unresolved-ref skipping
dispatch with canonical prefix, no-refs passthrough, and loop-continues-
after-per-action-JIT-error. T5.3 wires Replace cascade.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): ApplyPlan replace cascade propagates new ProviderID

T5.3 — locks the Replace-cascade contract via apply_replace_cascade_test.go
and updates doReplace godoc to document the cascade hookup explicitly.

Two scenarios:
- ReplaceCascade_DependentCreateGetsNewParentID: [Replace parent, Create
  dependent] where dependent's Config has ${parent.id}; dependent's
  Create receives the new ProviderID.
- ReplaceCascade_DependentReplaceGetsNewParentID: extends to Replace-on-
  Replace shape; dependent's post-Delete Create still sees the resolved
  parent.id, while its own Delete continues to target the OLD ProviderID
  via action.Current (JIT does not alter action.Current).

The behavior was already operational after T5.2's loop-level
jitsubst.ResolveSpec call: doReplace populates result.ReplaceIDMap
inside iteration N, and the loop's pre-dispatch substitution at
iteration N+1 sees the fresh entry. T5.3 adds the assertion + doc
that locks this ordering as a contract; future refactors that move
substitution out of the loop OR delay ReplaceIDMap population will
break these tests loudly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plan SchemaVersion=2 when JIT substitution required

T5.4 — runInfraPlan now stamps plan.SchemaVersion conditionally:

  - V1 (1) baseline when no plan action's resolved Resource.Config
    carries a JIT-style ${MODULE.field} or ${MODULE.id} reference.
  - V2 (2) when any action does — older wfctl binaries reading the
    persisted plan reject with the existing 'newer than supported'
    diagnostic at runInfraApply.

Detection is centralized in jitsubst.HasModuleRefs (recursive walk over
map[string]any / []any / string), gated by a simple regex that requires
non-empty segments on both sides of the dot — plain ${VAR} env-var
refs (no dot) do NOT trigger the bump, so the common operator
secret-via-env workflow stays at V1.
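
The detection walk can be approximated as below. The regex and the function name `hasModuleRefs` are assumptions made for this sketch; the actual pattern in jitsubst is not shown in this log.

```go
package main

import (
	"fmt"
	"regexp"
)

// moduleRef requires a non-empty segment on each side of a single dot,
// so ${VAR}, ${.id}, and ${vpc.} do not match. Hypothetical pattern.
var moduleRef = regexp.MustCompile(`\$\{[^${}.]+\.[^${}.]+\}`)

// hasModuleRefs walks map[string]any / []any / string values recursively
// and reports whether any string contains a module-style reference.
func hasModuleRefs(v any) bool {
	switch t := v.(type) {
	case string:
		return moduleRef.MatchString(t)
	case map[string]any:
		for _, child := range t {
			if hasModuleRefs(child) {
				return true
			}
		}
	case []any:
		for _, child := range t {
			if hasModuleRefs(child) {
				return true
			}
		}
	}
	return false // nil and other scalar types carry no references
}

func main() {
	cfg := map[string]any{
		"env_vars": map[string]any{"VPC_UUID": "${vpc.id}"},
	}
	fmt.Println(hasModuleRefs(cfg), hasModuleRefs("${TOKEN}"))
}
```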

cmd/wfctl/infra.go gains:
  - infraPlanSchemaVersionV1 (=1) and infraPlanSchemaVersionJIT (=2)
    constants alongside the existing infraPlanSchemaVersion (=2, max
    readable). The 'max readable' constant ticks up with every schema
    bump; V1/JIT name the per-plan choice runInfraPlan makes.
  - planRequiresJITSubstitution(plan) helper that walks plan.Actions
    once via jitsubst.HasModuleRefs.

Tests:
  - iac/jitsubst/jitsubst_test.go — 8 new HasModuleRefs cases (env-var
    is false, .field/.id are true, nested map/slice, nil-safe,
    malformed refs are false, mixed-string is true).
  - cmd/wfctl/infra_plan_schema_test.go — V1 baseline (env-var only),
    V2 for both .field and .id, V1 negative for env-var-only, and
    persisted-plan SchemaVersion=2 end-to-end (where T5.5's rejection
    has not yet landed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): reject persisted JIT-style plans (canonical path is apply-without-plan)

T5.5 — runInfraPlan now refuses to write a plan.json via -o when the
plan is JIT-style (SchemaVersion = infraPlanSchemaVersionJIT). The exact
operator-facing error string is contract-stable:

  error: plan -o requires JIT-free config; this plan references
  ${MODULE.field} which only resolves at apply time. Use
  'wfctl infra apply' (without --plan) for JIT-aware applies.

Stdout-only emission (no -o) of a JIT-style plan is permitted — it's a
preview, not a contract. The guard fires AFTER plan computation so the
operator sees the plan table on stdout before the rejection at the
persistence step.

Tests in cmd/wfctl/infra_plan_jit_reject_test.go (4 cases):
  - exact-string match (the strict contract)
  - stdout-only JIT plan permitted (negative-control on the guard scope)
  - persisted non-JIT plan permitted (V1 happy path unchanged)
  - canonical-keyword substring match (operator-search-engine safety net)

Removed T5.4's now-redundant
TestInfraPlan_SchemaVersionV2_PersistedToFileMatches — its happy path
has been replaced by T5.5's strict rejection contract; SchemaVersion
stamping correctness is still locked by the helper-direct tests in the
same file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): T5.7 runtime-launch-validation — JIT subst + plan rejection

W-5 Task T5.7: per the plan's 'Files: none' instruction, this is a
documentation-only commit recording the runtime-launch-validation
transcript against the built wfctl binary.

# Step 1: Build

  $ GOWORK=off go build -o /tmp/wfctl-jit-validation ./cmd/wfctl
  (no output, exit 0)

# Step 3: T5.5 persisted-JIT-plan rejection (build-binary verification)

Fixture (infra.yaml):
  modules:
    - name: app
      type: infra.container_service
      config:
        env_vars:
          VPC_UUID: "${vpc.id}"
          DB_HOST: "${pg.private_ip}"

  $ wfctl infra plan -o /tmp/jit-validation/plan.json --config infra.yaml
  Infrastructure Plan — infra.yaml

  + create  app  (infra.container_service)

  Plan: 1 to create, 0 to update, 0 to destroy.
  error: error: plan -o requires JIT-free config; this plan references
  ${MODULE.field} which only resolves at apply time. Use 'wfctl infra
  apply' (without --plan) for JIT-aware applies.
  EXIT=1

The doubled 'error: error:' prefix appears because cmd/wfctl/main.go's
top-level error reporter prepends 'error: ' to every command failure
(line 211: `fmt.Fprintf(os.Stderr, "error: %v\n", rootErr)`), and
the team-lead-specified literal also begins with 'error: '. Per the
implementer brief: 'Match exactly.' Flagging here for visibility: a
follow-up could either drop the prefix from the literal or special-case
main.go's wrapping. Not addressed in W-5.

# T5.5 inverse: stdout-only JIT plan permitted (no rejection)

  $ wfctl infra plan --config infra.yaml
  Infrastructure Plan — infra.yaml

  + create  app  (infra.container_service)

  Plan: 1 to create, 0 to update, 0 to destroy.
  EXIT=0

# T5.4 V1 baseline: non-JIT config persisted to disk still works

  Fixture (infra-novars.yaml):
    modules:
      - name: app
        type: infra.container_service
        config:
          cidr: "10.0.0.0/16"

  $ wfctl infra plan -o plan-novars.json --config infra-novars.yaml
  Plan: 1 to create, 0 to update, 0 to destroy.
  Plan saved to /tmp/jit-validation/plan-novars.json
  EXIT=0

  $ jq .schema_version plan-novars.json
  1                          ← V1 (T5.4 stamp logic working)

# Step 2: apply with ${A.id} reference — covered by in-tree tests

T5.7 plan §Step 2 specifies running 'apply against fixture with ${A.id}
reference' against the built binary. wfctl infra apply requires a fully-
configured iac.provider plugin (manifest, plugin.json, gRPC binary), so
running this end-to-end against an ad-hoc fixture is non-trivial without
W-7's conformance harness. The same code path is fully covered by:

  - iac/wfctlhelpers/apply_jit_test.go::
    TestApplyPlan_JIT_TwoCreate_BSpecResolvesAID
    (T5.2 — basic create+create cascade)
  - iac/wfctlhelpers/apply_replace_cascade_test.go::
    TestApplyPlan_ReplaceCascade_DependentCreateGetsNewParentID
    (T5.3 — replace+create cascade)
  - iac/wfctlhelpers/apply_replace_cascade_test.go::
    TestApplyPlan_ReplaceCascade_DependentReplaceGetsNewParentID
    (T5.3 — replace+replace cascade)
  - iac/wfctlhelpers/apply_jit_test.go::
    TestApplyPlan_JIT_UnresolvedRef_RecordsActionErrorAndSkipsDispatch
    (T5.2 — failure path)

These exercise the SAME wfctlhelpers.ApplyPlan code path the binary
invokes; the unit-test fake driver is functionally equivalent to a v2
plugin from ApplyPlan's perspective. A binary-level apply smoke test is
deferred to W-7's conformance gate (which adds the DO smoke test against
real-cloud fixtures).

# Verification

Tests pass:
  GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... ./platform/... ./cmd/wfctl/... ./module/...
  → all packages OK.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(iac): T5.5 review — exact plan-literal error string

Spec-reviewer caught that the shipped error string in cmd/wfctl/infra.go
diverged from the plan literal at docs/plans/2026-05-03-iac-conformance-
and-replace.md §T5.5 line 2104. The kickoff brief I worked from
substituted a wordier alternate string; team-lead confirmed the plan
literal is the correct contract.

Three fixes:

1. cmd/wfctl/infra.go:297 — replace fmt.Errorf literal with
   errors.New(<plan literal>). No leading 'error:' prefix — that's
   prepended by cmd/wfctl/main.go's top-level error wrapper, so the
   doubled 'error: error:' artifact in T5.7's runtime transcript is
   resolved as a side benefit. Switched to errors.New per spec-reviewer
   suggestion: avoids govet's no-format-verbs noise on the no-substitution
   case and is the canonical Go pattern for fixed-string sentinels.

2. cmd/wfctl/infra_plan_jit_reject_test.go:16 — expectedJITRejectError
   constant updated to the plan literal. Comment block expanded to
   document the literal's source + the leading-error-prefix nuance for
   future readers.

3. cmd/wfctl/infra_plan_jit_reject_test.go:125 — substring keyword
   list in TestInfraPlan_RejectionErrorContainsCanonicalKeywords
   updated to keys actually present in the new literal:
   'JIT resolution', 'persisted plan.json', 'wfctl infra apply',
   '-o/--plan'. The exact-match test above is the strict contract;
   this one stays as the operator-search-engine safety net.

Verified end-to-end via rebuilt wfctl binary against the same fixture
from T5.7's transcript:

  $ wfctl infra plan -o plan.json --config infra.yaml
  Infrastructure Plan — infra.yaml
  + create  app  (infra.container_service)
  Plan: 1 to create, 0 to update, 0 to destroy.
  error: this plan requires JIT resolution; persisted plan.json is not
  supported. Run 'wfctl infra apply' directly without -o/--plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): ADR 008 — JIT substitution at dispatch loop, not per-helper

Records the architectural choice resolved during T5.3: jitsubst.ResolveSpec
runs once at the wfctlhelpers.ApplyPlan dispatch loop (immediately before
each dispatchAction call), NOT inside per-action helpers. doReplace
populates result.ReplaceIDMap; the next iteration's pre-dispatch
ResolveSpec consumes it. This honors the Replace-cascade contract via
loop-ordering invariant rather than via an explicit substitution call
inside doReplace.

Plan §T5.3 specified inner-resolve in doReplace; T5.2's loop-level call
already covered the cascade case. Threading syncedOutputs through
dispatchAction → doReplace would have made the helper boundary leaky for
one call site. Option 1 (test-only T5.3 + this ADR) chosen by team-lead
over option 2 (inner-resolve rework) on 2026-05-04 after spec-reviewer
escalation.

Cascade contract is locked by apply_replace_cascade_test.go's two
scenarios; this ADR ensures future refactors that move substitution out
of the loop OR delay ReplaceIDMap population see the trade-off rather
than rediscovering it via git bla…
intel352 added a commit that referenced this pull request May 4, 2026
…W-6 of 12) (#532)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
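
The activation rules above reduce to a small predicate. A sketch, assuming a hypothetical helper name `refreshEnabled` that takes the raw env value and the flag state:

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled reports whether the pre-step should run.
// Unset, empty, or unparseable values are treated as "off".
func refreshEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag always wins over the env var
	}
	on, err := strconv.ParseBool(envValue)
	return err == nil && on
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshEnabled(v, false))
	}
	fmt.Println(refreshEnabled("1", true))
}
```

Note that `strconv.ParseBool` rejects values such as "yes" or "on", so under this sketch they fall into the "unrecognised" no-op case.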

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

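The omitempty contract described in the T3.0.4 review can be sketched as follows. This is a minimal illustration, not the repository's test: applyResult is a hypothetical stand-in for interfaces.ApplyResult, and only the three JSON key names come from the commit message; the Go field types are assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// applyResult is a hypothetical stand-in for interfaces.ApplyResult.
// Only the JSON key names are taken from the commit message; the field
// types are assumptions made for this sketch.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encodedKeysAbsent reports whether none of the three optional keys appear
// in the JSON encoding of r.
func encodedKeysAbsent(r applyResult) bool {
	b, err := json.Marshal(r)
	if err != nil {
		return false
	}
	for _, key := range []string{"initial_input_snapshot", "input_drift_report", "replace_id_map"} {
		if strings.Contains(string(b), `"`+key+`"`) {
			return false
		}
	}
	return true
}

func main() {
	nilFields := applyResult{}
	emptyFields := applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}
	// Keyed by the replaced resource's name, as in the roundtrip fixture.
	populated := applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}

	fmt.Println("nil fields omitted:", encodedKeysAbsent(nilFields))
	fmt.Println("empty fields omitted:", encodedKeysAbsent(emptyFields))
	fmt.Println("populated map omitted:", encodedKeysAbsent(populated))
}
```

encoding/json treats both nil and zero-length maps and slices as empty for omitempty, which is why the nil and empty-but-non-nil rows behave the same.
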
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
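
A minimal sketch of the loop shape this review settles on: a ctx.Err() check at the iteration boundary, per-action errors stored as strings, and best-effort continuation. The action type and applyAll function are hypothetical simplifications; only the "resolve driver:" prefix and the overall behaviour come from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// action is a hypothetical plan step; resolveErr simulates a failed driver lookup.
type action struct {
	name       string
	resolveErr error
}

// applyAll returns the number of simulated driver invocations, the per-action
// error strings, and a top-level error that is non-nil only on cancellation.
func applyAll(ctx context.Context, actions []action) (calls int, actionErrs []string, err error) {
	for _, a := range actions {
		// Cancellation is checked at the iteration boundary, before driver work.
		if cerr := ctx.Err(); cerr != nil {
			return calls, actionErrs, cerr
		}
		if a.resolveErr != nil {
			// The destination is a string, so %v suffices; a %w chain would
			// not survive the conversion.
			actionErrs = append(actionErrs, fmt.Sprintf("resolve driver: %v", a.resolveErr))
			continue // best-effort: later actions still run
		}
		calls++ // stands in for the real driver call
	}
	return calls, actionErrs, nil
}

// preCanceledRun applies two actions under an already-cancelled context.
func preCanceledRun() (calls int, canceled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	calls, _, err := applyAll(ctx, []action{{name: "a"}, {name: "b"}})
	return calls, errors.Is(err, context.Canceled)
}

// bestEffortRun fails the first action and lets the second proceed.
func bestEffortRun() (calls int, actionErrs []string) {
	calls, actionErrs, _ = applyAll(context.Background(), []action{
		{name: "a", resolveErr: errors.New("no driver for type")},
		{name: "b"},
	})
	return calls, actionErrs
}

func main() {
	fmt.Println(preCanceledRun())
	fmt.Println(bestEffortRun())
}
```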

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  re-align both test files automatically — keeps the cross-package
  fixture parity guaranteed.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).
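
The two distinct "no recovery" arms covered by the T3.2 tests can be sketched as below. All names are hypothetical stand-ins for interfaces.UpsertSupporter, ErrResourceAlreadyExists, and the test fakes; the point is only that a failed type assertion and an ok-but-false SupportsUpsert() are separate code paths.

```go
package main

import (
	"errors"
	"fmt"
)

// errResourceAlreadyExists stands in for the sentinel named in the commit message.
var errResourceAlreadyExists = errors.New("resource already exists")

// upsertSupporter is a hypothetical mirror of the optional interface.
type upsertSupporter interface {
	SupportsUpsert() bool
}

// bareDriver does not implement upsertSupporter at all (the !ok arm).
type bareDriver struct{}

// upsertDriver implements the interface; enabled controls the answer.
type upsertDriver struct{ enabled bool }

func (d upsertDriver) SupportsUpsert() bool { return d.enabled }

// shouldRecover reports whether a failed create may be retried as an update.
func shouldRecover(driver any, createErr error) bool {
	if !errors.Is(createErr, errResourceAlreadyExists) {
		return false
	}
	us, ok := driver.(upsertSupporter)
	if !ok {
		return false // driver lacks the optional interface entirely
	}
	return us.SupportsUpsert() // interface present, driver decides
}

func main() {
	conflict := fmt.Errorf("create vpc: %w", errResourceAlreadyExists)
	fmt.Println(shouldRecover(bareDriver{}, conflict))
	fmt.Println(shouldRecover(upsertDriver{enabled: false}, conflict))
	fmt.Println(shouldRecover(upsertDriver{enabled: true}, conflict))
}
```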

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.
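
A hedged sketch of the guard, assuming a doReplace that takes its Delete and Create steps as plain functions (the real signature is not shown in this log). Only the "replace: canceled after delete:" prefix is taken from the commit message; the other two prefixes are placeholders.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// step is a hypothetical stand-in for a driver call.
type step func(ctx context.Context) error

// doReplace decomposes Replace into Delete then Create, with a cancellation
// check between the two calls.
func doReplace(ctx context.Context, del, create step) error {
	if err := del(ctx); err != nil {
		return fmt.Errorf("replace: delete: %w", err) // prefix assumed for this sketch
	}
	// Guard: stop as soon as cancellation is observable.
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("replace: canceled after delete: %w", err)
	}
	if err := create(ctx); err != nil {
		return fmt.Errorf("replace: create: %w", err) // prefix assumed for this sketch
	}
	return nil
}

// cancelDuringDelete simulates a cancel arriving as a side effect of Delete.
func cancelDuringDelete() (createRan, isCanceled, hasPrefix bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	del := func(context.Context) error { cancel(); return nil }
	create := func(context.Context) error { createRan = true; return nil }
	err := doReplace(ctx, del, create)
	hasPrefix = err != nil && strings.HasPrefix(err.Error(), "replace: canceled after delete:")
	return createRan, errors.Is(err, context.Canceled), hasPrefix
}

func main() {
	fmt.Println(cancelDuringDelete())
}
```

Because the error is wrapped with %w, errors.Is still matches context.Canceled in-package, while the string prefix remains available to callers that only see the stringified form.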

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering is needed. On rename failure, the temp file is cleaned
  up on a best-effort basis.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."
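
The write-temp-then-rename pattern from Minor 1 can be sketched as below. atomicWrite is a hypothetical helper, not the diffcache Put implementation; the ".tmp" suffix is chosen to match the orphan-filter remark above, and the temp file is created in the destination directory so the rename stays on one filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// atomicWrite writes data to a temp file next to path and renames it into place.
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpName) // best-effort cleanup
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpName)
		return err
	}
	if err := os.Rename(tmpName, path); err != nil {
		os.Remove(tmpName) // best-effort cleanup on rename failure
		return err
	}
	return nil
}

// demoAtomicWrite overwrites one entry twice and reports the final content
// plus the number of leftover .tmp files in the directory.
func demoAtomicWrite() (content string, tmpLeft int, err error) {
	dir, err := os.MkdirTemp("", "diffcache-sketch")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "entry.json")
	for _, v := range []string{`{"v":1}`, `{"v":2}`} {
		if err = atomicWrite(path, []byte(v)); err != nil {
			return "", 0, err
		}
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	for _, e := range entries {
		if strings.HasSuffix(e.Name(), ".tmp") {
			tmpLeft++
		}
	}
	return string(data), tmpLeft, nil
}

func main() {
	fmt.Println(demoAtomicWrite())
}
```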

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).
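
The action-emission rule above can be sketched as a pure function. The type shapes are assumptions: the commit message names DiffResult.NeedsReplace, DiffResult.NeedsUpdate, and FieldChange.ForceNew, but the Changes field name and the string action values used here are hypothetical. The sketch covers only the per-resource decision, not the parallel dispatch.

```go
package main

import "fmt"

// fieldChange and diffResult are hypothetical mirrors of the interface types.
type fieldChange struct {
	Path     string
	ForceNew bool
}

type diffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []fieldChange // field name assumed for this sketch
}

// classify maps one Diff result to a plan action name.
func classify(d *diffResult) string {
	if d == nil {
		return "skip" // a nil result emits nothing
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace" // ForceNew alone is enough to force a replace
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	fmt.Println(classify(&diffResult{NeedsReplace: true}))
	fmt.Println(classify(&diffResult{NeedsUpdate: true, Changes: []fieldChange{{Path: "region", ForceNew: true}}}))
	fmt.Println(classify(&diffResult{NeedsUpdate: true}))
	fmt.Println(classify(&diffResult{}))
}
```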

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.
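
A rough sketch of the consult-then-populate flow, under stated assumptions: cacheKey mirrors the five-field tuple named above, while memoryCache, its Get/Put signatures, and the string-valued result are simplifications invented for illustration. The sketch is not safe for concurrent use, which a real backend shared across parallel Diff dispatch would need to be.

```go
package main

import "fmt"

// cacheKey mirrors the tuple from the commit message; the struct is hypothetical.
type cacheKey struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// memoryCache is a toy in-memory backend (not concurrency-safe).
type memoryCache struct {
	entries map[cacheKey]string
}

func (c *memoryCache) Get(k cacheKey) (string, bool) {
	v, ok := c.entries[k]
	return v, ok
}

func (c *memoryCache) Put(k cacheKey, v string) {
	c.entries[k] = v
}

// classifyWithCache returns the cached result on a hit; on a miss it calls
// diff and stores the fresh result.
func classifyWithCache(c *memoryCache, k cacheKey, diff func() string) string {
	if v, ok := c.Get(k); ok {
		return v
	}
	v := diff()
	c.Put(k, v)
	return v
}

// countDiffCalls runs classifyWithCache once per key and counts real Diff calls.
func countDiffCalls(keys []cacheKey) int {
	c := &memoryCache{entries: map[cacheKey]string{}}
	calls := 0
	for _, k := range keys {
		classifyWithCache(c, k, func() string { calls++; return "update" })
	}
	return calls
}

func main() {
	k := cacheKey{PluginVersion: "pv", Type: "vpc", ProviderID: "id-1", SHAConfig: "a", SHAOutputs: "o"}
	changed := k
	changed.SHAConfig = "b"
	fmt.Println(countDiffCalls([]cacheKey{k, k}))
	fmt.Println(countDiffCalls([]cacheKey{k, changed}))
}
```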

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
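
For Note 1, a pure clamp function might look like the sketch below. The default of 8 and the 1..32 range come from the T3.6e commit; how the real parseConcurrencyEnv treats empty, non-numeric, zero, or negative input is not stated in this log, so the choices here (fall back to the default on parse failure, clamp everything else) are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	defaultConcurrency = 8
	minConcurrency     = 1
	maxConcurrency     = 32
)

// parseConcurrencyEnv converts the raw env value into a worker count.
// Assumption: unparsable input yields the default; parsable input is clamped.
func parseConcurrencyEnv(raw string) int {
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return defaultConcurrency
	}
	if n < minConcurrency {
		return minConcurrency
	}
	if n > maxConcurrency {
		return maxConcurrency
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrencyEnv(raw))
	}
}
```

Keeping the function free of os.Getenv is what makes the table-driven test possible without touching process environment.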

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.
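
The channel-gating idea can be sketched without the real driver types. In this illustration a buffered-channel semaphore stands in for the errgroup limit, and inFlightAtLeast is a hypothetical helper; where the real test hangs on the second receive and times out, the helper simply reports false.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// inFlightAtLeast starts `workers` goroutines behind a limiter of size `limit`
// and reports whether `want` of them were inside the gated call at the same
// time. No worker returns before release is closed, so every receive on
// entered proves one more simultaneously in-flight worker.
func inFlightAtLeast(limit, workers, want, timeoutMS int) bool {
	entered := make(chan struct{})
	release := make(chan struct{})
	sem := make(chan struct{}, limit) // stand-in for the dispatch limit
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			select {
			case entered <- struct{}{}:
				<-release // observed: stay in flight until released
			case <-release:
				// observer already finished; exit without signalling
			}
		}()
	}
	observed := true
	deadline := time.After(time.Duration(timeoutMS) * time.Millisecond)
	for i := 0; i < want; i++ {
		select {
		case <-entered:
		case <-deadline:
			observed = false
		}
		if !observed {
			break
		}
	}
	close(release)
	wg.Wait()
	return observed
}

func main() {
	fmt.Println("limit 4:", inFlightAtLeast(4, 5, 2, 2000))
	fmt.Println("limit 1:", inFlightAtLeast(1, 5, 2, 100))
}
```

A count-only assertion would pass in both cases, since all five workers eventually run; only the gated variant distinguishes parallel from serial dispatch.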

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): fixed-length and
ambiguity-free, because NUL does not occur in either field by
convention. Matches how configHash already keys per-config inputs.

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).
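
The collision and its fix can be reproduced in a few lines. This is a sketch under assumptions: the real pluginVersionKey takes a provider value, whereas versionKey here takes the two strings directly, and hex encoding of the digest is assumed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey reproduces the fragile "@"-joined form.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// versionKey hashes name and version joined by a NUL byte, which is assumed
// never to appear in either field.
func versionKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// The two pairs from the commit message.
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))
	fmt.Println(versionKey("foo", "bar@1.0") == versionKey("foo@bar", "1.0"))
	fmt.Println(len(versionKey("foo", "1.0")))
}
```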

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.
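
A minimal sketch of the routing decision, assuming lowercase mirrors of ComputePlanVersionDeclarer and DispatchVersionFor (the real signatures live in iac/wfctlhelpers/dispatch.go and are not shown in this log). The provider types are hypothetical fixtures.

```go
package main

import "fmt"

const (
	dispatchVersionV1 = "v1"
	dispatchVersionV2 = "v2"
)

// computePlanVersionDeclarer is a hypothetical mirror of the optional interface.
type computePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

// legacyProvider does not implement the optional interface at all.
type legacyProvider struct{}

// declaringProvider implements it and returns whatever plugin.json declared.
type declaringProvider struct{ version string }

func (p declaringProvider) ComputePlanVersion() string { return p.version }

// dispatchVersionFor returns v2 only for an exact "v2" declaration;
// everything else takes the legacy path.
func dispatchVersionFor(p any) string {
	if d, ok := p.(computePlanVersionDeclarer); ok && d.ComputePlanVersion() == dispatchVersionV2 {
		return dispatchVersionV2
	}
	return dispatchVersionV1
}

func main() {
	fmt.Println(dispatchVersionFor(legacyProvider{}))
	fmt.Println(dispatchVersionFor(declaringProvider{version: ""}))
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v3"}))
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v2"}))
}
```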

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy), and
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer substring). v1 fixture v1RecordingProvider
  intentionally does NOT implement ComputePlanVersionDeclarer to prove
  the dispatcher's "default to v1 when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.
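
The conversion itself is small. The sketch below assumes the converter's signature (map[string]bool in, map[string]any out); only the nil-for-empty behaviour and the target shape come from the commit message, and the structpb call is deliberately left out so the example stays stdlib-only.

```go
package main

import "fmt"

// sensitiveToAny widens a map[string]bool to map[string]any so it can be
// placed in an args map destined for a generic struct encoder.
// It returns nil for nil or empty input.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(nil) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{}) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{"password": true}))
}
```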

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
  scoped via getDiffCache lazy init) doesn't observe stale entries
  from a developer's local ~/.cache/wfctl/diff/ as false-positive
  cache hits skipping driver Diff dispatch. Same pattern as
  platform/main_test.go from T3.6f. Caught during dev when the
  end-to-end test failed in the full cmd/wfctl test run but passed
  in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
  flips mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* feat(iac): --allow-replace flag for per-resource protected-replace opt-in

W-6/T6.1: gate replace and delete actions targeting `protected: true`
resources behind a per-resource opt-in flag at apply time. Without
--allow-replace=<csv>, the apply errors before any provider Apply or
wfctlhelpers.ApplyPlan dispatch with the design-spec literal
("resource %q is protected: true and would be %sd; pass
--allow-replace=%s to override"). With the resource name listed in
--allow-replace, the protection is bypassed for that resource only.

Gate fires on both dispatch paths — live-diff (applyWithProviderAndStore)
and --plan (applyPrecomputedPlanWithStore) — so the safety guarantee
holds regardless of plan provenance. The protected flag is sourced from
Resource.Config for replace actions and Current.AppliedConfig for delete
actions (where platform.differ leaves Resource.Config empty).

The allow-set is published via package-level applyAllowReplaceSet
(matching the computeInfraPlan / applyV2ApplyPlanFn seam pattern) and
reset to nil at the top of every runInfraApply via deferred cleanup —
override authorization must not leak across runs.

T6.2 will swap this fail-fast for an aggregated multi-blocker report
with a copy-paste --allow-replace=name1,name2,... value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply batch-reports protected-replace blockers with copy-paste flag

W-6/T6.2: validateAllowReplaceProtected now walks the entire plan and
aggregates ALL replace/delete blockers (resources annotated
`protected: true` and not in --allow-replace) into a single error,
instead of failing fast on the first one. The operator sees the
complete blocker set in one apply attempt and gets a pre-formatted
copy-paste flag value to authorize them all at once:

  plan would require destructive action on N protected resource(s):
    <name1> (replace)
    <name2> (delete)
    ...
  to authorize, re-run with:
    --allow-replace=<name1>,<name2>,...

Names and the csv preserve plan-action declaration order so output is
deterministic. The single-blocker case still emits the batch format —
operator-facing UX is consistent regardless of blocker count, which
matters for automation pinning the copy-paste flag pattern.

Per plan T6.2 "(or apply-time check; pick one — apply is cleaner since
plan output already shows all actions)" — the gate stays in
cmd/wfctl/infra_apply.go rather than platform/differ.go::ComputePlan.
ComputePlan remains plugin-agnostic; the protected-resource policy is
a wfctl-side operator-experience concern.

T6.1's single-line error literal is superseded; T6.1 tests are
updated to assert on the operator-facing essentials (resource name +
copy-paste flag value) rather than the legacy literal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document --allow-replace flag

W-6/T6.4: add a dedicated `infra apply` subsection to docs/WFCTL.md
covering the protected-resource gate, the new --allow-replace=<csv>
override, and its relation to the older --allow-protected-prune flag.
Includes the canonical aggregated-blocker error format from T6.2 so
operators know what to expect (and what to copy-paste) when the gate
fires, plus three runnable examples (standard apply, --plan apply,
authorized Replace cascade).

Per W-4 team-lead Option-3, mdformat is waived; markdown-link-check
is the meaningful baseline. WFCTL.md links all resolve clean against
the local repo (3 internal/external refs). Pre-existing dead links
elsewhere in docs/ are unchanged by this commit and out of W-6 scope.

Verification:
  markdown-link-check docs/WFCTL.md → 0 errors
  GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \
    ./platform/... ./cmd/wfctl/... ./module/... → all pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(merge): restore T6.1 + T6.2 helpers lost during cascade-merge with -X theirs

* fix(iac): R1 review — drop redundant ComputePlanVersionDeclarer assertion at apply call site (Copilot review)

DispatchVersionFor is documented to centralise the type-assertion plus
the default-to-v1 fallback so call sites pass the raw provider value
rather than re-asserting the optional interface. The v2 dispatch
condition reverts to the canonical form:

    if wfctlhelpers.DispatchVersionFor(provider) == wfctlhelpers.DispatchVersionV2 { ... }

No behavior change: a provider that doesn't implement the interface,
or returns anything other than "v2", still routes to the legacy v1
provider.Apply path.
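
For illustration, a sketch of how a helper of this kind can own both the type assertion and the v1 fallback. The interface method name `ComputePlanVersion` and the lowercase identifiers are assumptions made for this example; the real wfctlhelpers signatures may differ.

```go
package main

import "fmt"

// versionDeclarer stands in for the optional interface named in this
// commit (ComputePlanVersionDeclarer). The method name is an assumption.
type versionDeclarer interface {
	ComputePlanVersion() string
}

const (
	dispatchV1 = "v1"
	dispatchV2 = "v2"
)

// dispatchVersionFor centralises the type assertion and the
// default-to-v1 fallback, so call sites pass the raw provider value.
func dispatchVersionFor(provider any) string {
	if d, ok := provider.(versionDeclarer); ok && d.ComputePlanVersion() == dispatchV2 {
		return dispatchV2
	}
	return dispatchV1
}

// legacyProvider does not implement the optional interface.
type legacyProvider struct{}

// v2Provider opts in to v2 dispatch.
type v2Provider struct{}

func (v2Provider) ComputePlanVersion() string { return "v2" }

// oddProvider declares an unrecognised version and falls back to v1.
type oddProvider struct{}

func (oddProvider) ComputePlanVersion() string { return "v3" }

func main() {
	fmt.Println(dispatchVersionFor(legacyProvider{}))
	fmt.Println(dispatchVersionFor(v2Provider{}))
	fmt.Println(dispatchVersionFor(oddProvider{}))
}
```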

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
…9) (#534)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
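
The activation rules above can be sketched as a small predicate. `shouldRefresh` is a hypothetical helper name for this example; only the use of strconv.ParseBool and the --skip-refresh precedence come from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// shouldRefresh reports whether the pre-step runs. skipRefresh models
// the --skip-refresh flag and always wins over the env var.
func shouldRefresh(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false
	}
	enabled, err := strconv.ParseBool(envValue)
	if err != nil {
		// Unset, empty, or unrecognised values are a no-op.
		return false
	}
	return enabled
}

func main() {
	fmt.Println(shouldRefresh(os.Getenv("WFCTL_REFRESH_OUTPUTS"), false))
}
```

A presence-only check (`!= ""`) would treat `WFCTL_REFRESH_OUTPUTS=0` as enabled, which is the foot-gun the ParseBool form avoids.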

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms of sleep in total with
the pool saturated; well under the 10s watchdog). The test runs in ~1.5s
wall-clock time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
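
A minimal sketch of the two patterns from this review, using encoding/json as a stand-in for the JSON Schema compiler. The variable and function names are hypothetical, and `schemaJSON` stands in for the //go:embed slice.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// schemaJSON stands in for the embedded schema bytes.
var schemaJSON = []byte(`{"type":"object"}`)

var (
	compileOnce sync.Once
	compiled    map[string]any
	compileErr  error
)

// compiledSchema parses the schema exactly once. Both the result and the
// error are captured, so every caller observes the same outcome.
func compiledSchema() (map[string]any, error) {
	compileOnce.Do(func() {
		compileErr = json.Unmarshal(schemaJSON, &compiled)
	})
	return compiled, compileErr
}

// schemaBytes returns a copy so callers cannot mutate the shared slice.
func schemaBytes() []byte {
	return bytes.Clone(schemaJSON)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := compiledSchema(); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()

	cp := schemaBytes()
	cp[0] = 'X' // mutates only the copy
	fmt.Println(string(schemaJSON[:1]))
}
```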

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  re-align both test files automatically — keeps the cross-package
  fixture parity guaranteed.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering needed. On rename failure, best-effort cleanup of the
  temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
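
The emission rules for existing resources can be sketched as a pure function. The struct shapes here are simplified assumptions (in particular the `Changes` field name is hypothetical), and treating a nil result as "skip" is an assumption for the sketch rather than something this commit states.

```go
package main

import "fmt"

// FieldChange and DiffResult are trimmed-down stand-ins for the real types.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange // field name assumed for illustration
}

// classify maps a Diff result onto the plan action for an existing resource.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			// ForceNew wins even when NeedsReplace was not set.
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	forceNew := &DiffResult{
		NeedsUpdate: true,
		Changes:     []FieldChange{{Path: "region", ForceNew: true}},
	}
	fmt.Println(classify(forceNew))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true}))
	fmt.Println(classify(&DiffResult{}))
}
```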

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).
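
A rough sketch of the get-before-dispatch flow described above. The `Key` fields mirror the tuple named in this commit; the `Cache` interface, `memCache`, and `diffWithCache` are hypothetical shapes chosen for illustration and may not match the real iac/diffcache API.

```go
package main

import "fmt"

// Key mirrors the tuple named in the commit message.
type Key struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

type DiffResult struct{ NeedsUpdate bool }

// Cache is an assumed minimal interface, not the real package API.
type Cache interface {
	Get(Key) (DiffResult, bool)
	Put(Key, DiffResult)
}

type memCache map[Key]DiffResult

func (m memCache) Get(k Key) (DiffResult, bool) { r, ok := m[k]; return r, ok }
func (m memCache) Put(k Key, r DiffResult)      { m[k] = r }

// diffWithCache returns the cached result on a hit; on a miss it calls diff
// and stores the fresh result. Errors are never cached.
func diffWithCache(c Cache, k Key, diff func() (DiffResult, error)) (DiffResult, error) {
	if r, ok := c.Get(k); ok {
		return r, nil
	}
	r, err := diff()
	if err != nil {
		return DiffResult{}, err
	}
	c.Put(k, r)
	return r, nil
}

func main() {
	calls := 0
	diff := func() (DiffResult, error) { calls++; return DiffResult{NeedsUpdate: true}, nil }
	c := memCache{}
	k := Key{PluginVersion: "pv", Type: "vpc", ProviderID: "id-1", SHAConfig: "c1", SHAOutputs: "o1"}
	diffWithCache(c, k, diff)
	diffWithCache(c, k, diff)
	fmt.Println("driver Diff calls:", calls)
}
```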

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.
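
The channel-gating idea can be shown in isolation. This sketch does not use errgroup or the real test fixtures; `runBounded` is a hypothetical semaphore-based pool and `twoInFlight` is a hypothetical probe that reports whether two tasks were observed running at the same time before either was released.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBounded runs tasks concurrently with at most limit in flight.
func runBounded(limit int, tasks []func()) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for _, t := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks once limit tasks are in flight
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }()
			task()
		}(t)
	}
	wg.Wait()
}

// twoInFlight gates each task on release and reports whether both tasks
// signalled entry before any was allowed to return.
func twoInFlight(limit int) bool {
	entered := make(chan struct{}, 2) // buffered so late entries never block
	release := make(chan struct{})
	task := func() {
		entered <- struct{}{}
		<-release
	}
	done := make(chan struct{})
	go func() {
		runBounded(limit, []func(){task, task})
		close(done)
	}()
	ok := true
	for i := 0; i < 2; i++ {
		select {
		case <-entered:
		case <-time.After(500 * time.Millisecond):
			ok = false
		}
	}
	close(release)
	<-done
	return ok
}

func main() {
	fmt.Println("limit 2:", twoInFlight(2))
	fmt.Println("limit 1:", twoInFlight(1))
}
```

A count-only assertion passes in both configurations; only the gated probe distinguishes them.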

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): fixed-length and
ambiguity-free, provided neither field contains a NUL byte (none is
expected in a provider name or version string). Matches how
configHash already keys per-config inputs.
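
The collision and the fix can be reproduced in a few lines. `naiveKey` and `hashedKey` are hypothetical names for this sketch, and the hex encoding of the digest is an assumption; the real pluginVersionKey takes a provider rather than two strings.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey reproduces the old composition that can collide.
func naiveKey(name, version string) string { return name + "@" + version }

// hashedKey joins the fields with a NUL byte before hashing, so the boundary
// between name and version cannot be shifted by an "@" inside either field.
func hashedKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))
	fmt.Println(hashedKey("foo", "bar@1.0") == hashedKey("foo@bar", "1.0"))
}
```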

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.
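
The routing decision reduces to an optional-interface check. The identifiers ComputePlanVersionDeclarer, DispatchVersionV2, and DispatchVersionFor are named in this commit, but the signatures below, the `DispatchVersionV1` constant, and the two fixture types are assumptions made for the sketch.

```go
package main

import "fmt"

// ComputePlanVersionDeclarer is the optional interface a provider may satisfy.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const (
	DispatchVersionV1 = "v1" // constant name assumed for illustration
	DispatchVersionV2 = "v2"
)

// DispatchVersionFor returns "v2" only when the provider declares exactly
// that; a missing interface or any other value falls back to "v1".
func DispatchVersionFor(p any) string {
	if d, ok := p.(ComputePlanVersionDeclarer); ok && d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return DispatchVersionV1
}

// legacyProvider does not implement the optional interface at all.
type legacyProvider struct{}

// declaringProvider implements it and reports whatever version it was given.
type declaringProvider struct{ version string }

func (d declaringProvider) ComputePlanVersion() string { return d.version }

func main() {
	fmt.Println(DispatchVersionFor(legacyProvider{}))
	fmt.Println(DispatchVersionFor(declaringProvider{version: ""}))
	fmt.Println(DispatchVersionFor(declaringProvider{version: "v2"}))
}
```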

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy), and
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer substring). v1 fixture v1RecordingProvider
  intentionally does NOT implement ComputePlanVersionDeclarer to prove
  the dispatcher's "default to v1 when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes: no signature change, no behavior change for
the success-only or v1RecordingProvider paths.

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
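
The conversion itself is small. This sketch shows only the map reshaping and the nil-on-empty rule described above; it leaves out the structpb call, and the body is an assumed implementation of the `sensitiveToAny` helper named in this commit.

```go
package main

import "fmt"

// sensitiveToAny widens map[string]bool to map[string]any, the shape a
// generic struct encoder accepts. Empty or nil input yields nil.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(map[string]bool{"token": true}))
	fmt.Println(sensitiveToAny(map[string]bool{}) == nil)
}
```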

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
  scoped via getDiffCache lazy init) doesn't observe stale entries
  from a developer's local ~/.cache/wfctl/diff/ as false-positive
  cache hits skipping driver Diff dispatch. Same pattern as
  platform/main_test.go from T3.6f. Caught during dev when the
  end-to-end test failed in the full cmd/wfctl test run but passed
  in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
  flips mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* feat(iac): add optional ProviderPlanner interface for v2 plugins (rev10 user override)

* ci(iac): cross-plugin build gate + ADR 009 (ProviderPlanner included per user override)

* docs(iac): document ProviderPlanner adapter author guide

* docs(adr): restore plan-literal Context para 1 in ADR 009 (T9.2 spec-review fix)

* docs(iac): point ProviderPlanner author guide at real ProviderIDValidator precedent (T9.3 quality fix)

* ci(iac): add fail-fast=false, concurrency, go.mod/go.sum paths to cross-plugin gate (T9.2 quality fix)

* fix(iac): R2 review — correct ProviderPlanner doc/ADR/test/CI findings (Copilot review)

Six Copilot inline findings + CodeQL workflow-permissions warning:

1. docs/iac/providerplanner.md: ComputePlan in v0.21.0 dispatches
   driver.Diff directly (in platform/differ.go); it does NOT call
   IaCProvider.Plan. The reverse is true (Plan delegates to ComputePlan
   in some implementations). Updated the call-chain description and
   the illustrative dispatch-site code block to reference the actual
   file (platform/differ.go) so adapter authors don't follow the wrong
   call chain.
2. docs/adr/009: replaced the personal email reference with "the
   workspace owner" so ADR provenance doesn't embed PII.
3. interfaces/iac_provider_planner_test.go: now actually verifies the
   additivity claim by reusing the package's existing mockProvider as
   the negative case — runtime assertion confirms mockProvider does
   NOT satisfy ProviderPlanner. Moved file to interfaces_test package
   to share fixtures.
4. .github/workflows/cross-plugin-build-test.yml: explicit `permissions:
   contents: read` (CodeQL workflow-permissions guidance); added
   `env: GOPRIVATE/GONOSUMCHECK` matching ci.yml + codeql.yml so
   downstream plugin builds resolve github.com/GoCodeAlone/* deps
   consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
Spec-reviewer round-2 finding (commit 26ac916): the dispatcher only
forced DryRun=false on -fix, but did NOT prevent a user-supplied
-dry-run=false from leaving the gate open. With the natural mode
predicate `if !opts.DryRun { mutate() }`, this would silently bypass
the explicit -fix gate that plan §W-8 line 2347 names as the sole
mutation entry point ("-dry-run flag default true; -fix opts into
mutation").

Fix: normalize the gate at the dispatcher boundary — when Fix is set,
DryRun=false; when Fix is unset, DryRun=true regardless of what the
user passed via -dry-run=. Fix is now the single source of truth for
"may I mutate?", so any natural mode predicate is safe by construction.
Options.DryRun's doc comment now states this contract explicitly so
T8.2-T8.5 implementers cannot reach for the wrong predicate.

Tests pin all three cases:
  - -dry-run=false alone        → DryRun stays true (the bypass)
  - -fix -dry-run=false         → mutation authorized (Fix wins)
  - -dry-run=true -fix          → mutation authorized (Fix wins)
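
The normalization can be sketched with the standard flag package. `Options` with `Fix` and `DryRun` fields follows the names used above; `parseOptions` and the flag help strings are hypothetical, and the real dispatcher may be structured differently.

```go
package main

import (
	"flag"
	"fmt"
)

// Options mirrors the two fields discussed above.
type Options struct {
	Fix    bool
	DryRun bool
}

// parseOptions parses args, then normalizes the gate: Fix alone decides
// whether mutation is allowed, whatever the user passed for -dry-run.
func parseOptions(args []string) (Options, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	var o Options
	fs.BoolVar(&o.Fix, "fix", false, "opt into mutation")
	fs.BoolVar(&o.DryRun, "dry-run", true, "report only")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	o.DryRun = !o.Fix
	return o, nil
}

func main() {
	for _, args := range [][]string{
		{"-dry-run=false"},
		{"-fix", "-dry-run=false"},
		{"-dry-run=true", "-fix"},
	} {
		o, err := parseOptions(args)
		fmt.Println(args, o, err)
	}
}
```

With this normalization, a mode that checks `!opts.DryRun` and a mode that checks `opts.Fix` reach the same decision.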

Also adds TestPackageDoc_MentionsSkipMarker (process note #6) — cheap
file-content guard so a future SkipMarker rename trips a test rather
than silently desyncing the package doc comment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
Round-1 review on PR #538 surfaced 11 substantive findings; all addressed:

Critical (real bugs that broke compile or silently dropped logic):

 #1 [lint, refactor-plan] Rewrite target wrong — `wfctlhelpers.Plan` does
    not exist in the repo today. Pivoted to `platform.ComputePlan` (the
    real helper at platform/differ.go:72). Both targets now accepted by
    the lint analyzer for forward-compat with rev0 fixtures. Plan-doc
    §T8.3 named the wrong helper; flagged for retro.
 #2 [refactor-plan] rewritePlanBody only renamed `_` ctx params. A
    method declared `Plan(c context.Context, ...)` would be rewritten
    referencing undefined `ctx`. Now: any non-blank ctx-name preserved;
    only blank `_` renamed to `ctx`.
 #3 [refactor-plan] isCanonicalPlanBody too loose — extra side-effects
    inside the desired loop still classified as canonical. Tightened to
    require exactly the 3-statement template (lookup + !exists guard +
    configHash compare), no else branches, no trailing junk. Regression
    test: TestRefactorPlan_ExtraLoggingNotCanonical.
 #4 [refactor-plan, refactor-apply] SkipMarker only consulted on
    fn.Doc. PR description promised type-doc + GenDecl-doc honoring.
    Added receiverTypeDocs + carriesMarker; both modes now check all 3
    doc levels.
 #5 [refactor-apply] hasCanonicalCases only checked case labels. Bespoke
    bookkeeping inside a case body (logging, metrics, alternate driver
    calls) classified as canonical and would be silently dropped on
    -fix. Added caseBodyIsCanonical whitelist (driver call, ResourceRef
    construction, ProviderID guard). Regression test:
    TestRefactorApply_ExtraBookkeepingNotCanonical.
 #6 [refactor-apply] custom-error-wrapping suggestion named fictional
    APIs (ApplyResultErrorHook / WrapActionError). Replaced with honest
    hand-port advice: skip-marker + manual switch, OR move wrap into
    driver methods so wfctlhelpers records it verbatim.
 #7 [add-validate-plan] Stub always emitted unqualified `*IaCPlan` /
    `[]PlanDiagnostic`. Files importing the interfaces module under a
    qualifier (e.g. `*interfaces.IaCPlan`) failed to compile after
    -fix. Added interfacesQualifier detector + qualified stub emission.
    Regression: TestAddValidatePlan_Fix_QualifiedSignature.
 #8 [add-validate-plan, lint] hasValidatePlanMethod /
    AssertProviderImplementsValidatePlan checked method NAME only.
    Wrong-signature ValidatePlan (e.g. takes a string) was treated as
    compliant even though interfaces.ProviderValidator wouldn't be
    satisfied. Added validatePlanSignatureMatches: shape-checks the
    receiver param + return slice (qualified-or-unqualified). Both
    callers now use it. Regression:
    TestAddValidatePlan_DryRun_FlagsWrongSignature.
 #9 [refactor-plan, refactor-apply, add-validate-plan] Single-file
    pass — providers whose Plan + Apply lived in sibling files were
    silently omitted. Added planLikeReceiversInDir: directory-wide
    method-set scan. Per-file fallback retained for isolated single-
    file targets.

Important:

#10 [lint] Per-file parse/type-check errors accumulated in
    report.errors but exit code stayed 0 if there were no findings —
    green CI hid coverage gaps. Now exits 1 on either findings OR
    errors.
#11 [refactor-apply] -report-file mode flag never appeared in usage
    text. Documented in main.go's global usage block (the `-h` path
    intercepts before the per-mode FlagSet).

Plan-doc gap surfaced for retro: §T8.3 line 2373 reads "replaces with
`return wfctlhelpers.Plan(ctx, p, desired, current)`", but no such
function exists; reality is `platform.ComputePlan`. Recurring defect
class (plan-literal vs reality gap, W-4/W-5/W-7/W-9/W-8). Documented
in planHelperImportPath docstring + this commit body.
intel352 added a commit that referenced this pull request May 4, 2026
Round 2 surfaced 9 substantive findings; all addressed:

Critical (compile-break / contract-break):

 #1 [refactor-plan, lint] platform.ComputePlan returns IaCPlan BY VALUE,
    but provider Plan methods return *IaCPlan. Single-statement
    `return platform.ComputePlan(...)` rewrite produced uncompilable
    code. Switched to canonical 2-statement form:
        plan, err := platform.ComputePlan(ctx, p, desired, current)
        return &plan, err
    isAlreadyDelegatedPlanBody widened to recognise both the new shape
    and the legacy single-statement forms (idempotent across revs).
 #3 [refactor-plan] rewritePlanBody fell back to recvName="p" but
    didn't update the receiver decl when the source had an unnamed
    receiver (`func (*Provider) Plan(...)`). Rewritten call referenced
    undefined `p`. Added ensureReceiverName: injects identifier and
    mutates the AST. Regression: TestRefactorPlan_Fix_UnnamedReceiverGetsName.
    Also added: TestRefactorPlan_Fix_PreservesCustomCtxName for round-1
    finding #2 (custom ctx name preserved).
 #4 [refactor-apply] Same unnamed-receiver bug as #3. Same fix
    (ensureReceiverName + ensureCtxParamName + ensureNthParamName
    helpers shared with refactor-plan). Regression:
    TestRefactorApply_Fix_UnnamedReceiverGetsName.
 #5 [add-validate-plan] Stub always emitted `func (p *T) ValidatePlan(...)`
    even when the type used value receivers. Method-set mismatch made
    the type fail interfaces.ProviderValidator type assertion. Added
    providerReceiverConvention + receiverIsPointer; stub now matches
    the existing Plan/Apply convention. Regression:
    TestAddValidatePlan_Fix_ValueReceiverConvention.
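
The value/pointer mismatch behind finding #1 can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`IaCPlan`, `computePlan`, `Provider`) rather than the real platform package; it only illustrates why the 2-statement form is needed when a helper returns the plan by value.

```go
package main

import "fmt"

// IaCPlan and computePlan are hypothetical stand-ins for the real
// plan type and platform.ComputePlan; only the return shapes matter.
type IaCPlan struct{ Actions []string }

// computePlan returns the plan BY VALUE, as the commit describes.
func computePlan(desired []string) (IaCPlan, error) {
	return IaCPlan{Actions: desired}, nil
}

type Provider struct{}

// Plan must return *IaCPlan. Writing `return computePlan(desired)`
// here would not compile, because IaCPlan is not assignable to
// *IaCPlan. The 2-statement form takes the address explicitly.
func (p *Provider) Plan(desired []string) (*IaCPlan, error) {
	plan, err := computePlan(desired)
	return &plan, err
}

func main() {
	plan, err := (&Provider{}).Plan([]string{"create"})
	fmt.Println(len(plan.Actions), err)
}
```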

Important (skip-marker not honored in lint, single-file pass):

 #6 [lint] AssertPlanDelegatesToHelper checked fn.Doc only, ignoring
    type-doc and GenDecl-doc skip markers. Added receiverTypeDocsForPass
    helper; analyzer now checks all 3 doc levels.
 #7 [lint] AssertApplyDelegatesToHelper — same fix as #6.
 #8 [lint] AssertDiffSetsNeedsReplaceForForceNew — same fix as #6.
 #9 [lint] lintFile passed only the target file to the analyzers, so
    cross-file method sets were invisible (same blind spot the
    refactor-* modes had in round 1). Now lintFile loads sibling
    non-test .go files from the same package directory and feeds the
    full slice to each analyzer; diagnostics for sibling files are
    dropped (the outer walker visits them in their own turn) so no
    duplicate findings.

All 4 modes now produce compile-clean rewrites, honor the 3-level
skip-marker, and use package-aware method-set detection.
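
The 3-level skip-marker lookup from findings #6-#8 can be sketched with go/ast. This is an illustration only: `skippedNames` and `hasSkipMarker` are hypothetical helpers, and matching by substring is an assumption, not the analyzer's confirmed rule. The point is that a comment above an ungrouped `type T ...` declaration attaches to the GenDecl, not the TypeSpec, so checking `ts.Doc` alone misses it.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

const skipMarker = "// wfctl:skip-iac-codemod"

// hasSkipMarker reports whether any comment in doc carries the marker.
// Substring matching is an assumption made for this sketch.
func hasSkipMarker(doc *ast.CommentGroup) bool {
	if doc == nil {
		return false
	}
	for _, c := range doc.List {
		if strings.Contains(c.Text, skipMarker) {
			return true
		}
	}
	return false
}

// skippedNames returns the funcs and types whose doc comment carries
// the marker, checking FuncDecl.Doc, TypeSpec.Doc and GenDecl.Doc.
func skippedNames(src string) []string {
	file, err := parser.ParseFile(token.NewFileSet(), "x.go", src, parser.ParseComments)
	if err != nil {
		panic(err)
	}
	var names []string
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if hasSkipMarker(d.Doc) {
				names = append(names, d.Name.Name)
			}
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				ts, ok := spec.(*ast.TypeSpec)
				if ok && (hasSkipMarker(ts.Doc) || hasSkipMarker(d.Doc)) {
					names = append(names, ts.Name.Name)
				}
			}
		}
	}
	return names
}

func main() {
	src := "package p\n\n// wfctl:skip-iac-codemod\ntype Provider struct{}\n\nfunc (Provider) Plan() {}\n"
	fmt.Println(skippedNames(src))
}
```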
intel352 added a commit that referenced this pull request May 4, 2026
Round 3 surfaced 7 substantive findings; all addressed:

Critical (compile-break / silent data loss):

 #1 [add-validate-plan] Directory-wide detection only widened `provs`
    in round 2; methodsByRecv stayed file-local. A provider with
    ValidatePlan in a sibling file (or value-receiver Plan/Apply
    declared elsewhere) would receive a duplicate or wrong-receiver
    stub. Now planLikeProviderMethodsInDir returns both the recv set
    AND the merged method slice; methodsByRecv carries the
    package-wide view (deduped by method name). Stub injection still
    only fires when typeDecls[recv] is non-nil so we never append
    to a sibling file.
 #2 [refactor-plan] isCanonicalPlanBody accepted ANY 2-result return
    statement at the trailing slot. A planner with the canonical
    scaffold but a bespoke return (cloned plan, propagated error
    value) would classify as canonical and the bespoke logic would be
    silently dropped. Tightened to require EXACTLY `return plan, nil`.
 #3 [refactor-plan] rewritePlanBody hardcoded "desired"/"current" as
    args. A canonical Plan with renamed params (e.g. `Plan(ctx,
    specs, state)`) would rewrite to references to undefined
    identifiers. ensureNthParamName now extracts the actual signature
    names.
 #4 [refactor-plan] rewritePlanBody hardcoded "platform" as the call
    selector. A file using `pf "github.com/.../platform"` wouldn't
    compile because `platform` is undefined (ensureImport sees the
    aliased import as satisfying the path check). Added pkgAliasFor
    helper; rewrite now uses whatever local name the file imports
    under.
 #5 [refactor-apply] caseBodyIsCanonical accepted ANY AssignStmt as
    canonical. Bookkeeping AssignStmts (metrics counters, map
    updates, accumulators) passed and would be silently dropped.
    Tightened to a narrow whitelist: multi-target driver call,
    single-target driver call (LHS=err), composite-literal
    construction, selector-assignment to ResourceRef-style fields
    (ProviderID/Name/Type). Anything else rejected.
 #6 [refactor-apply] Same import-alias issue as #4 for `wfctlhelpers`.
    pkgAliasFor reused; rewriteApplyBody now uses whatever local name
    the file imports under.

Important:

 #7 [lint] AssertProviderImplementsValidatePlan checked ts.Doc only,
    missing markers placed on the wrapping GenDecl. Aligns now with
    the receiverDoc.carriesMarker pattern used by the other 3
    analyzers (round-2 #6/#7/#8). typeDocsByName captures both
    TypeSpec.Doc and GenDecl.Doc.

Round-2 regression tests retained (TestRefactorPlan_Fix_UnnamedReceiverGetsName,
TestRefactorPlan_Fix_PreservesCustomCtxName,
TestRefactorApply_Fix_UnnamedReceiverGetsName,
TestAddValidatePlan_Fix_ValueReceiverConvention).
Round-3 fix verified end-to-end against an aliased-import fixture
(pf "github.com/.../platform" + wfh "github.com/.../wfctlhelpers"):
the rewritten output compiles cleanly under gofmt.
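
A minimal sketch of the alias lookup described in #4 and #6, assuming the helper only needs to map an import path to the file's local name. `localNameFor` is a hypothetical approximation of pkgAliasFor, and the `example.com/...` import paths are placeholders, not the real module paths.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"path"
	"strconv"
)

// localNameFor returns the identifier a file uses for importPath, or
// "" when the path is not imported (or is imported as "_" or ".").
func localNameFor(file *ast.File, importPath string) string {
	for _, imp := range file.Imports {
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil || p != importPath {
			continue
		}
		if imp.Name != nil {
			if imp.Name.Name == "_" || imp.Name.Name == "." {
				return ""
			}
			return imp.Name.Name
		}
		// No alias: assume the package name equals the last path element.
		return path.Base(p)
	}
	return ""
}

// mustParse parses only the package clause and imports of src.
func mustParse(src string) *ast.File {
	f, err := parser.ParseFile(token.NewFileSet(), "x.go", src, parser.ImportsOnly)
	if err != nil {
		panic(err)
	}
	return f
}

func main() {
	f := mustParse("package p\nimport pf \"example.com/engine/platform\"\n")
	fmt.Println(localNameFor(f, "example.com/engine/platform"))
}
```

A rewriter built this way would emit `pf.ComputePlan(...)` for the aliased file instead of a hardcoded `platform.ComputePlan(...)`.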
intel352 added a commit that referenced this pull request May 4, 2026
… findings

Round 4 surfaced 6 findings, all real. The recurring theme: rev3's
pattern detectors were either too loose (accepted bookkeeping shapes
as canonical) or too rigid (literal package-name matching, breaking
on aliased imports).

Fixes:

 #1 [add-validate-plan] interfacesQualifier(file) returned "" when the
    type-only file (no Plan/Apply imports) received the stub via
    cross-file detection (round-3 #1). Stub then emitted unqualified
    types that wouldn't compile. Now: when the file lacks an interfaces
    import but ANY sibling does, fall back to "interfaces" qualifier
    AND inject the interfaces import into the type-file via AST printing
    (format.Node) before appending the stub. Added
    siblingUsesInterfacesImport helper.
 #2 [refactor-apply] isCanonicalCaseAssign accepted ANY composite
    literal (`x := <CompositeLit>`) as canonical. A bookkeeping struct
    construction (audit payload, metric envelope) silently passed.
    Tightened to require the literal type's name (qualified or
    unqualified) match "ResourceRef".
 #3 [refactor-apply] isDriverMethodCall only checked selector NAME
    (Create/Read/Update/Delete). Calls like `helper.Update(...)` or
    `metrics.Delete(...)` were misclassified as canonical driver
    dispatch. Added receiver-allowlist check: only `d`, `drv`, or
    `driver` accepted as driver-bound identifiers (matching the
    standard `d, err := p.ResourceDriver(...)` pattern in DO/AWS/GCP/Azure).
 #4 [refactor-apply, refactor-plan] isAlreadyDelegatedApplyBody and
    isAlreadyDelegatedPlanBody required literal `wfctlhelpers` /
    `platform` package idents. Files using aliased imports
    (`wf "..."`, `pf "..."`) were misreported as non-canonical even
    though they were valid delegations. Both functions now resolve
    the file's local alias via pkgAliasFor; literal names retained
    as fallbacks. Same fix for isPlatformComputePlanAssign (the
    helper inside isAlreadyDelegatedPlanBody).
 #5 [lint] AssertPlanDelegatesToHelper / AssertApplyDelegatesToHelper
    selector matchers required literal `platform` / `wfctlhelpers`
    package names. Same false-positive risk as #4 for aliased imports.
    Both analyzers now resolve the alias and accept either the
    aliased OR literal form.
 #6 [refactor-apply] caseBodyIsCanonical accepted ANY DeclStmt as
    canonical, so `var x SomeBookkeepingType` declarations passed
    even though they're exactly the bespoke logic the codemod is
    supposed to preserve. Tightened via isLocalOutPointerDecl: only
    `var <name> *<ResourceOutput-suffix>` accepted.

Smoke-tested against an aliased-import fixture (`wf "...wfctlhelpers"`
+ `pf "...platform"`):
- refactor-apply correctly classifies as already-delegated (was:
  misreported as missing-action-switch)
- lint reports 0 findings (was: false-positive
  AssertPlanDelegatesToHelper + AssertApplyDelegatesToHelper)
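
The receiver-allowlist check from #3 might look roughly like the following. `isDriverCall` is a hypothetical stand-in for isDriverMethodCall that takes source text instead of an AST node so it can be run on its own; the allowlist shown is the round-4 set named above.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

var (
	driverMethods   = map[string]bool{"Create": true, "Read": true, "Update": true, "Delete": true}
	driverReceivers = map[string]bool{"d": true, "drv": true, "driver": true}
)

// isDriverCall reports whether src is a call of the form
// <recv>.<Method>(...) where BOTH the method name and the receiver
// identifier are on their allowlists.
func isDriverCall(src string) bool {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return false
	}
	call, ok := expr.(*ast.CallExpr)
	if !ok {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	if !ok {
		return false
	}
	recv, ok := sel.X.(*ast.Ident)
	if !ok {
		return false
	}
	return driverMethods[sel.Sel.Name] && driverReceivers[recv.Name]
}

func main() {
	fmt.Println(isDriverCall("d.Create(ctx, spec)"))
	fmt.Println(isDriverCall("metrics.Delete(key)"))
}
```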
intel352 added a commit that referenced this pull request May 4, 2026
Round 5 surfaced 9 findings; all addressed. Recurring theme: the
detectors and reporters needed deeper structural verification (branch
contents, outer-shape, receiver-kind, package isolation, exit-code
semantics) — not just shape matching at one level.

Critical (silent data loss / repair regression):

 #1 [refactor-plan] rangeBodyMatchesCanonicalDesired only checked the
    guard expressions and statement count; never inspected what the
    `!exists` and `configHash != configHash` branch BODIES did. A
    planner with extra logic (telemetry, alternate action construction,
    different create/update payload) inside those branches was
    silently rewritten away. Added isCanonicalCreateBranchBody +
    isCanonicalUpdateBranchBody + isPlanActionsAppendAssign to verify
    the create branch is exactly `append+continue` and the update
    branch is exactly `append`.
 #2 [refactor-apply] classifyApplyBody verified only the switch
    shape; setup/teardown/result aggregation OUTSIDE the switch was
    silently dropped on -fix. Added isCanonicalApplyOuterShape: the
    Apply body must be exactly the 3-statement scaffold (result-init
    + range-loop + return result, nil).
 #3 [add-validate-plan] hasValidatePlanMethod ignored receiver kind.
    A value-receiver provider with a pointer-receiver ValidatePlan
    still failed the ProviderValidator type assertion (method-set on
    `T` does not include `*T` methods), but rev2 treated it as
    already-implemented. Now also requires receiver-kind match.
 #4 [lint] AssertProviderImplementsValidatePlan had the same
    receiver-kind blind spot. Now delegates to hasValidatePlanMethod
    (centralised + DRY).
 #5 [refactor-plan] isAlreadyDelegatedPlanBody accepted single-statement
    `return platform.ComputePlan(...)` (broken rev1 form) as
    already-delegated, so rerunning the fixed codemod never repaired
    output from the earlier broken rewrite. Now ONLY accepts the
    canonical 2-statement form; broken single-statement forms classify
    as non-canonical so a fresh -fix produces compilable output.
 #6 [refactor-plan] planLikeProviderMethodsInDir merged methods from
    every non-test .go file regardless of `package P` clause. Mixed-
    package or build-tagged directories could fold methods from
    unrelated packages into a synthetic provider. Added two-pass
    package-clause check: aggregate only files matching the dominant
    package.

Important (CI fidelity / detector recall):

 #7 [Makefile, lint] `|| true` in migrate-providers swallowed real
    execution failures alongside expected advisory findings, because
    lint returned 1 for both findings AND parse errors. Split the
    exit codes: 0 clean / 1 findings / 2 errors. Makefile now gates
    on `[ $? -ne 0 ] && [ $? -ne 1 ]` so parse errors fail the
    target.
 #8 [refactor-plan] Canonical matcher hardcoded the lookup flag
    name as `exists`. The semantically-identical `cur, ok :=`
    idiomatic Go form was reported non-canonical. Widened to accept
    both `exists` and `ok`.
 #9 [refactor-apply] isDriverMethodCall allowlist {d, drv, driver}
    missed common alternates. Widened to {d, dr, drv, rdrv, driver,
    resourceDriver}. Still rejects bookkeeping receivers like
    `metrics`, `audit`, `helper` (preserves round-4 #3 fix).

End-to-end verification: lint against DO plugin produces exit 1 (3
advisory findings, no errors); broken-Go-source produces exit 2;
clean source produces exit 0. Smoke-tested via /tmp/iac-codemod.
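
The exit-code split from #7 reduces to a small mapping. In this sketch `lintExitCode` is a hypothetical name, and letting errors take precedence when both findings and errors are present is an assumption; the commit only states the three codes.

```go
package main

import "fmt"

const (
	exitClean    = 0 // no findings, no errors
	exitFindings = 1 // advisory findings only
	exitErrors   = 2 // parse or type-check errors
)

// lintExitCode maps the report counters to a process exit code.
// Assumption: errors win over findings when both are non-zero, so a
// caller that tolerates exit 1 still fails on real execution errors.
func lintExitCode(findings, errs int) int {
	switch {
	case errs > 0:
		return exitErrors
	case findings > 0:
		return exitFindings
	default:
		return exitClean
	}
}

func main() {
	fmt.Println(lintExitCode(0, 0), lintExitCode(3, 0), lintExitCode(0, 1))
}
```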
intel352 added a commit that referenced this pull request May 4, 2026
…ening findings

Round 7 surfaced 10 findings; 4 were stale (already fixed in R6). 6
real findings addressed:

Critical (compile-break / silent data loss):

 #1 [refactor-plan] isPlanActionsAppendAssign verified the LHS but
    not append's first argument. A bespoke `plan.Actions =
    append(otherSlice, ...)` was misclassified as canonical and the
    alternate-slice logic silently dropped during rewrite. Now both
    LHS and append's first arg must reference plan.Actions.
 #3+#9 [refactor-apply] isCanonicalApplyOuterShape only checked the
    outer 3-statement scaffold; per-action logic INSIDE the for-loop
    body (logging, metrics, custom error handling, accumulators) was
    silently dropped on -fix. Added isCanonicalApplyLoopBody +
    isCanonicalApplyLoopAssign + isCanonicalApplyLoopIf +
    isCanonicalApplyLoopIfBodyStmt: every loop-body statement must
    match a tight whitelist (driver lookup, var-out decl, action
    switch, err-/out-guard ifs).
 #7+#8 [add-validate-plan] provs[recv].Pos() panicked when the
    TypeSpec was nil (cross-file scenario from round-3 #1: type
    declaration in sibling file). Now defaults Pos to NoPos for nil
    specs; sort still works (stable on name when Pos ties).

Important (cross-file consistency):

 #4 [add-validate-plan] qualifier fallback to "interfaces" fired
    based on whether ANY sibling imported interfaces — unreliable
    if THIS provider uses local types but an unrelated sibling
    imports interfaces. Replaced with qualifierFromProviderMethods:
    inspects the provider's OWN Plan/Apply parameter types
    (directory-wide via round-3 #1) for the qualifier they use.
 #5 [add-validate-plan] skip-marker check only consulted typeDecls
    (current file). When Plan/Apply are here but the type with
    `// wfctl:skip-iac-codemod` lives in a SIBLING file, the marker
    was ignored. Added siblingTypeDocs lookup via
    receiverTypeDocsInDir (the round-6 helper).
#10 [add-validate-plan] Sibling-method merge deduped by method NAME
    only. If the local file had a wrong-signature ValidatePlan and a
    sibling had the correct one, the sibling's was dropped,
    hasValidatePlanMethod saw only the bad declaration, and a
    duplicate stub was injected. Replaced with isLocalDuplicate:
    dedupes by name + parameter arity + result arity, so methods
    whose arities differ both survive.

Stale findings (already fixed in R6, no action needed):
 #2 refactor-apply receiverTypeDocsInDir already in place
 #6 lint receiver-doc lookup already merged via receiverTypeDocsForPass

Smoke-tested against DO plugin: refactor-plan reports DOProvider.Plan
canonical, refactor-apply reports DOProvider.Apply upsert-recovery
with the upsertSupporter suggestion. Output matches T8.7 baseline.
intel352 added a commit that referenced this pull request May 5, 2026
…fier-name findings

Round 8 surfaced 9 findings; all addressed:

Critical (silent data loss / behavior change):

 #1 [add-validate-plan] isLocalDuplicate compared by name+arity only.
    Wrong-signature ValidatePlan(name string) []PlanDiagnostic and
    correct ValidatePlan(plan *IaCPlan) []PlanDiagnostic have the same
    arity but different types, so the correct sibling declaration was
    dropped and a duplicate stub injected. Replaced with
    signature-fingerprint dedupe (signatureFingerprint +
    typeFingerprint walk all type shapes).
 #4 [refactor-apply] `default:` case clauses accepted without body
    inspection. Logging/metrics in default body silently dropped.
    Added isCanonicalDefaultBody: only `err = fmt.Errorf("unknown
    action %q", ...)` accepted.
 #5 [refactor-apply] isCanonicalApplyLoopAssign accepted any
    `<x>.ResourceDriver(...)`. `helper.ResourceDriver(...)` /
    `plan.ResourceDriver(...)` falsely classified. Now requires the
    receiver to match the provider's own receiver identifier
    (threaded through from classifyApplyBody).
 #8 [refactor-apply] Bare `if err != nil { continue }` accepted as
    canonical, but wfctlhelpers ALWAYS records ActionError before
    continuing — the rewrite would silently change behavior. Now
    requires the if-body to ALSO append to result.Errors before any
    continue/break.

Important (skip-marker scope + identifier flexibility):

 #2 [add-validate-plan] Skip-marker check fired on EVERY method's
    fn.Doc — a marker on Destroy/Status/etc. accidentally suppressed
    the whole provider's analysis. Restricted to Plan/Apply (the
    provider-defining methods).
 #3 [lint] AssertProviderImplementsValidatePlan — same fix as #2.
 #6 [refactor-plan] Canonical detector hardcoded `current`/`desired`
    body identifiers. Providers using `state`/`specs` reported
    non-canonical despite rewriter preserving names. Added
    nthParamName extraction; isCanonicalPlanBody now takes the actual
    parameter names.
 #7 [refactor-apply] Driver-receiver allowlist comment claimed `rd`
    accepted, but the switch was missing it. Added.
 #9 [refactor-apply] Canonical detector hardcoded `result`/`plan`
    identifier names. Providers using `res`/`pl` rejected. Now
    recovers actual identifier from signature (planName) and from
    statement-1 LHS (resultName); both must be consistent within the
    body but can be any identifier.

Smoke-tested against DO plugin: refactor-plan / refactor-apply still
report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery
with stable upsertSupporter suggestion. Output matches T8.7 baseline.

Removed redundant TestRefactorApply_Fix_UnnamedReceiverGetsName: the
unnamed-receiver path can't have a canonical-shape Apply body
(`<recv>.ResourceDriver(...)` requires recv in scope). Receiver-name
injection is shared between refactor-plan and refactor-apply via
ensureReceiverName; coverage stays in
TestRefactorPlan_Fix_UnnamedReceiverGetsName.
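
The signature-fingerprint dedupe from #1 could be approximated as below. This is a sketch under assumptions: `fingerprint` and `fingerprintOf` are hypothetical names, and rendering each parameter and result type with `types.ExprString` is one possible encoding, not necessarily what signatureFingerprint/typeFingerprint do.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// fingerprint encodes a method's name plus the TYPES (not the names)
// of its parameters and results.
func fingerprint(fn *ast.FuncDecl) string {
	var b strings.Builder
	b.WriteString(fn.Name.Name + "(")
	writeFields(&b, fn.Type.Params)
	b.WriteString(")(")
	writeFields(&b, fn.Type.Results)
	b.WriteString(")")
	return b.String()
}

// writeFields emits one entry per declared parameter, so `a, b int`
// and `a int, b int` produce the same output.
func writeFields(b *strings.Builder, fl *ast.FieldList) {
	if fl == nil {
		return
	}
	for _, f := range fl.List {
		n := len(f.Names)
		if n == 0 {
			n = 1 // unnamed parameter or result
		}
		for i := 0; i < n; i++ {
			b.WriteString(types.ExprString(f.Type) + ";")
		}
	}
}

// fingerprintOf parses a single method declaration and fingerprints it.
func fingerprintOf(methodSrc string) string {
	file, err := parser.ParseFile(token.NewFileSet(), "x.go", "package p\n"+methodSrc, 0)
	if err != nil {
		panic(err)
	}
	return fingerprint(file.Decls[0].(*ast.FuncDecl))
}

func main() {
	wrong := fingerprintOf("func (p *T) ValidatePlan(name string) []PlanDiagnostic { return nil }")
	right := fingerprintOf("func (p *T) ValidatePlan(plan *IaCPlan) []PlanDiagnostic { return nil }")
	fmt.Println(wrong == right)
}
```

Because the two declarations have equal arity but different fingerprints, a merge keyed on the fingerprint keeps both and the correct sibling survives.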
intel352 added a commit that referenced this pull request May 5, 2026
Round 9 surfaced 4 findings; all addressed:

Critical (silent behavior change):

 #1 [refactor-apply] If-guard body accepted bare `break`, but
    wfctlhelpers.ApplyPlan records the error and KEEPS processing
    later actions. A `break` would silently change loop semantics
    on rewrite. Now only `continue` is accepted in if-guard bodies.
 #2 [refactor-apply] Driver-method allowlist accepted `Driver` /
    `DriverFor` alongside `ResourceDriver`. wfctlhelpers dispatches
    SPECIFICALLY through IaCProvider.ResourceDriver; a wrapper like
    `provider.Driver(...)` would have its caching/instrumentation
    bypassed. Restricted to `ResourceDriver` only.

Important (false positives / cross-file alias mismatch):

 #3 [add-validate-plan, lint] Receiver-kind enforcement was too
    strict. Per Go spec, `*T`'s method set includes BOTH
    pointer-receiver and value-receiver methods of T. So a
    value-receiver ValidatePlan on a pointer-receiver provider IS
    valid (satisfies ProviderValidator). hasValidatePlanMethod now
    only requires strict matching when the provider uses VALUE
    receivers (T's method set excludes *T methods).
 #4 [add-validate-plan] When the qualifier was derived from a
    sibling method's aliased import (e.g. `iface "github.com/.../interfaces"`),
    the post-loop import injection used unaliased `ensureImport`,
    leaving the stub's `iface.IaCPlan` referring to undefined
    `iface`. Added ensureImportAs helper; now the import alias
    matches the stub's qualifier.

Smoke-tested against DO plugin: refactor-plan / refactor-apply still
report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery
with stable upsertSupporter suggestion. Output matches T8.7 baseline.
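
The method-set rule behind #3 can be checked directly. The types below are hypothetical stand-ins; the interface shape follows the `ValidatePlan(plan *IaCPlan) []PlanDiagnostic` signature quoted earlier in this log.

```go
package main

import "fmt"

type IaCPlan struct{}
type PlanDiagnostic struct{ Message string }

// ProviderValidator is a stand-in for interfaces.ProviderValidator.
type ProviderValidator interface {
	ValidatePlan(plan *IaCPlan) []PlanDiagnostic
}

// ValueValidator declares ValidatePlan on the value receiver.
type ValueValidator struct{}

func (ValueValidator) ValidatePlan(*IaCPlan) []PlanDiagnostic { return nil }

// PointerValidator declares ValidatePlan on the pointer receiver.
type PointerValidator struct{}

func (*PointerValidator) ValidatePlan(*IaCPlan) []PlanDiagnostic { return nil }

// satisfies reports whether v's dynamic type implements the interface.
func satisfies(v any) bool {
	_, ok := v.(ProviderValidator)
	return ok
}

func main() {
	fmt.Println(satisfies(&ValueValidator{}))   // *T includes value-receiver methods
	fmt.Println(satisfies(ValueValidator{}))    // T includes its own value-receiver methods
	fmt.Println(satisfies(&PointerValidator{})) // *T includes pointer-receiver methods
	fmt.Println(satisfies(PointerValidator{}))  // T excludes pointer-receiver methods
}
```

Only the last case fails, which is why strict receiver-kind matching is needed solely for value-receiver providers.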
intel352 added a commit that referenced this pull request May 5, 2026
Round 10 surfaced 8 findings; all addressed:

Critical (cross-file duplicate stub / silent override):

 #1 [add-validate-plan] Cross-file duplicate stub injection: when type
    is in file_a and Plan/Apply are in file_b, both files classified
    as missing-ValidatePlan and -fix injected duplicate stubs. Now
    only inject in the file containing the receiver TypeSpec
    (`if ts == nil { skip }`); the type-file's own pass handles it.
 #2 [add-validate-plan] Embedded-field promoted ValidatePlan not
    detected; -fix would shadow it with a no-op stub, silently
    dropping real plan diagnostics. Added typeHasEmbeddedFields:
    if the receiver type has any embedded fields, suppress the
    missing classification (we can't statically resolve method
    promotion without full type info, so err on the side of NOT
    injecting).
 #3 [lint] AssertProviderImplementsValidatePlan — same fix as #2.
 #4 [refactor-apply] ProviderID/Name/Type assignment-target whitelist
    didn't check struct identity. `audit.Type = ...` or
    `result.ProviderID = ...` (wrong struct) classified as canonical
    and dropped on rewrite. Now requires the LHS receiver to be
    `ref` (the canonical ResourceRef construction site name).

Important (perf / determinism / lint precision):

 #5 [lint] O(n²) lintFile re-parsed every sibling per-call. Added
    lintDirCache: lintPath now groups files by directory and builds
    one parse cache per dir, reused across the directory's files.
    Per-call fallback retained for single-file invocation.
 #6 [refactor-plan] planLikeProviderMethodsInDir's dominant-package
    selection used range-over-map (random iteration), so on a
    package-count tie the dominant could differ across runs and
    rewrite against the wrong method set. Sort the package names
    so tie-break is lexicographic-first (deterministic).
 #7 [lint] AssertPlanDelegatesToHelper accepted ANY platform.ComputePlan
    call ANYWHERE in the body. Now requires the canonical SHAPE:
    either the 2-statement rev2 form (matches isAlreadyDelegatedPlanBody)
    OR a single-statement legacy `return <X>.Plan(...)` /
    `return <X>.ComputePlan(...)`. Bespoke wrappers that call the
    helper as an intermediate step now correctly flag.
 #8 [lint] AssertApplyDelegatesToHelper — same fix: now uses
    isAlreadyDelegatedApplyBody (the rewriter's idempotency check)
    so anything but the canonical single-statement form flags.

Smoke-tested against DO plugin: refactor-plan / refactor-apply still
report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery
with stable upsertSupporter suggestion. Output matches T8.7 baseline.
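
A minimal sketch of the deterministic tie-break from #6, assuming the input is a per-package file count. `dominantPackage` is a hypothetical name; the real selection lives inside planLikeProviderMethodsInDir.

```go
package main

import (
	"fmt"
	"sort"
)

// dominantPackage returns the package name with the highest file
// count. Names are visited in sorted order and only a strictly larger
// count replaces the current best, so ties resolve to the
// lexicographically first name on every run.
func dominantPackage(counts map[string]int) string {
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)

	best, bestCount := "", -1
	for _, name := range names {
		if counts[name] > bestCount {
			best, bestCount = name, counts[name]
		}
	}
	return best
}

func main() {
	fmt.Println(dominantPackage(map[string]int{"provider": 2, "legacy": 2, "tools": 1}))
}
```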
intel352 added a commit that referenced this pull request May 5, 2026
Round 11 surfaced 6 findings; all addressed:

Critical (broken-output false-clean / mode clobber):

 #1 [lint] planBodyDelegatesCanonically accepted single-statement
    `return platform.ComputePlan(...)` (the BROKEN rev1 form,
    uncompilable due to value/pointer mismatch). Lint reported
    partially-migrated providers as clean, so migrate-providers
    silently missed them. Now ONLY the canonical 2-statement rev2
    form OR legacy `return wfctlhelpers.Plan(...)` is accepted;
    the broken single-statement platform form falls through to
    non-canonical so lint surfaces the still-needs-fixup state.
 #2 [refactor-plan] writeFileAtomic left the temp file at
    os.CreateTemp's default 0600 mode; rename clobbered the
    source's original permissions (e.g., 0644 → 0600). Added
    writeFileAtomicBytesPreserveMode: captures original mode via
    os.Stat and chmods the temp file before rename.
 #5 [add-validate-plan] Same 0600 mode-clobber bug in
    writeFileAtomicBytes. Now delegates to
    writeFileAtomicBytesPreserveMode.

Important (revert + comment polish):

 #3 [add-validate-plan] Round-10 #2's "any embedded field suppresses
    missing-ValidatePlan" was too broad — sync.Mutex, loggers,
    config mixins don't promote ValidatePlan, so real targets were
    silently missed. Reverted: report missing unconditionally.
    Maintainers whose providers actually promote ValidatePlan
    suppress with the explicit `// wfctl:skip-iac-codemod` marker.
 #4 [lint] AssertProviderImplementsValidatePlan — same revert as #3.
 #6 [refactor-plan] Stale enum comment for planAlreadyDelegated still
    referenced `wfctlhelpers.Plan` as the recognised shape;
    actual implementation recognises the 2-statement
    platform.ComputePlan form. Comment updated.

Removed dead typeHasEmbeddedFields helper (both call sites reverted
in #3/#4). Source-file mode preservation verified end-to-end:
chmod 0644 → -fix → stat shows 0644 retained. Smoke-tested against
DO plugin: refactor-plan / refactor-apply still report
DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with
stable upsertSupporter suggestion. Output matches T8.7 baseline.
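
The mode-preserving atomic write from #2 and #5 follows a stat, temp file, chmod, rename sequence. The sketch below is an assumption about the shape of writeFileAtomicBytesPreserveMode, not its actual body; `writeFilePreserveMode` and `demo` are hypothetical names, and the permission behavior shown assumes a Unix-like filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFilePreserveMode replaces path atomically while keeping its
// original permission bits (os.CreateTemp alone would leave 0600).
func writeFilePreserveMode(path string, data []byte) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".codemod-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // harmless once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(info.Mode().Perm()); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// demo rewrites a 0644 file and reports the mode before and after.
func demo() (before, after os.FileMode, content string, err error) {
	dir, err := os.MkdirTemp("", "codemod-demo")
	if err != nil {
		return 0, 0, "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "provider.go")
	if err := os.WriteFile(path, []byte("old"), 0o644); err != nil {
		return 0, 0, "", err
	}
	if err := os.Chmod(path, 0o644); err != nil { // ignore umask effects
		return 0, 0, "", err
	}
	if err := writeFilePreserveMode(path, []byte("new")); err != nil {
		return 0, 0, "", err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, 0, "", err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, 0, "", err
	}
	return 0o644, info.Mode().Perm(), string(data), nil
}

func main() {
	before, after, content, err := demo()
	fmt.Println(before, after, content, err)
}
```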
intel352 added a commit that referenced this pull request May 5, 2026
…tening findings

Round 12 surfaced 8 findings; all addressed:

Critical (CLI bug + silent rewrite of wrong file):

 #1 [main.go] Top-level dispatcher used a single FlagSet with only
    -dry-run and -fix registered, so any mode-specific flag (e.g.
    refactor-apply's -report-file) failed with "flag provided but
    not defined" BEFORE the mode could parse it. -report-file was
    documented but UNUSABLE from the CLI entrypoint. Replaced
    stdlib FlagSet with a manual-scan loop in run(): -dry-run/-fix
    are extracted; everything else (including unknown flags) flows
    through to the mode's own FlagSet. Bonus: flag-position
    flexibility (`/path -fix` now works), updated test +
    usage text accordingly.
 #2 [refactor-plan] Walked every .go file but built provs/typeDocs
    from only the dominant package. Mixed-package or build-tagged
    directories: a non-dominant file with overlapping receiver
    names was processed against another package's method set,
    rewriting the wrong file. Added dominantPackageForDir; each
    file processor now skips files in non-dominant packages.
 #3 [refactor-apply] Same fix as #2.
 #4 [add-validate-plan] Same fix as #2.

Important (canonical-detection precision):

 #5 [refactor-plan] isPlanActionsAppendAssign didn't validate the
    appended action's payload — `plan.Actions = append(plan.Actions,
    PlanAction{Action: "queue"})` was misclassified as canonical and
    silently rewritten. Added `expectedAction` parameter; create
    branch requires `Action: "create"` and update branch requires
    `Action: "update"`.
 #6 [refactor-apply] hasCanonicalCases verified case labels but not
    that the body's driver call MATCHED the label. A `case "create"`
    body that called `.Update()` or `.Delete()` was misclassified
    and silently rewritten away. Added caseBodyMatchesLabel: scans
    each case body for driver method calls and verifies the label-
    to-method mapping (create→Create, update→Update, delete→Delete,
    replace→Update).
 #7 [refactor-apply] Driver-lookup check accepted any
    `<recv>.ResourceDriver(<arg>)` regardless of <arg>. wfctlhelpers
    always dispatches with `action.Resource.Type`, so providers
    using a different lookup key (e.g. action.Tag, computed value)
    would see different driver behavior on rewrite. Now requires
    the lookup key to be exactly `action.Resource.Type`.
 #8 [lint] looksLikeProvider checked method NAMES + rough arity,
    so any unrelated type with `Plan(...)` and `Apply(...)` was
    treated as a provider (e.g., a deploy strategy). Tightened to
    verify signature shapes via type-name suffix matching:
    Plan must be `Plan(ctx, []ResourceSpec, []ResourceState)
    (*IaCPlan, error)` and Apply must be `Apply(ctx, *IaCPlan)
    (*ApplyResult, error)`. Qualified or unqualified accepted via
    typeNameTailMatches.

Smoke-tested:
- `iac-codemod refactor-apply -report-file <path> <dir>` now works
  (previously: "flag provided but not defined")
- DO plugin still reports DOProvider.Plan canonical / Apply
  upsert-recovery with stable upsertSupporter suggestion (T8.7
  baseline preserved)
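
The manual-scan loop from #1 can be pictured as follows. This is a sketch, not the dispatcher's code: `splitGlobalFlags` is a hypothetical name, and the exact spellings it consumes (`-dry-run=true`, `-dry-run=false`) are assumptions. Everything it does not recognise is passed through untouched so the mode's own FlagSet can parse it.

```go
package main

import "fmt"

// Options mirrors the shared -dry-run / -fix pair.
type Options struct {
	DryRun bool
	Fix    bool
}

// splitGlobalFlags pulls -fix and -dry-run out of args wherever they
// appear and returns the remaining arguments in their original order.
func splitGlobalFlags(args []string) (Options, []string) {
	var opts Options
	var rest []string
	for _, a := range args {
		switch a {
		case "-fix", "--fix":
			opts.Fix = true
		case "-dry-run", "--dry-run", "-dry-run=true", "-dry-run=false":
			// Consumed but ignored: Fix alone decides mutation.
		default:
			rest = append(rest, a) // mode-specific flags and paths
		}
	}
	opts.DryRun = !opts.Fix
	return opts, rest
}

func main() {
	opts, rest := splitGlobalFlags([]string{"-report-file", "out.md", "./providers", "-fix"})
	fmt.Println(opts.Fix, opts.DryRun, rest)
}
```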
intel352 added a commit that referenced this pull request May 5, 2026
…get (W-8 of 12) (#538)

* feat(codemod): scaffold cmd/iac-codemod with 4-mode subcommand dispatcher

T8.1: Adds cmd/iac-codemod skeleton with dispatcher for the four codemod
modes — refactor-plan, refactor-apply, add-validate-plan, lint — and the
shared -dry-run / -fix flag pair. Modes are registered via a map of
modeFunc entries so subsequent tasks (T8.2-T8.5) can wire in real
implementations file-by-file. Each mode currently delegates to a stub
that prints a "not yet implemented" message and exits zero.

Defaults: -dry-run is true; -fix opts into mutation and forces -dry-run
to false. Unknown modes return exit 2 with usage. The // wfctl:skip-iac-codemod
marker convention is documented in the package doc and usage text.

Tests cover dispatch, default flag values, -fix semantics, unknown-mode
handling, help routing, and positional-arg forwarding via a swappable
modes map (no subprocess required).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(codemod): pin SkipMarker const + document flag ordering (T8.1 review)

Addresses spec-reviewer findings on b76ab2f:

1. (BLOCKER) Extract `const SkipMarker = "// wfctl:skip-iac-codemod"` so
   T8.3-T8.5 parsers reference the canonical literal in one place. Plan
   rev2 (line 2400) unifies the four modes on this single marker
   specifically to prevent mismatched-marker silent-no-op surfaces; the
   const + TestSkipMarker_LiteralPinned + TestUsage_MentionsSkipMarker
   guards close the drift hole the reviewer flagged. usage() now formats
   the marker via the const rather than a duplicated string literal.

2. (MINOR) usage() documents the stdlib flag-parser ordering constraint
   (flags must precede paths). TestRun_FlagAfterPath_SilentlyTreatedAsPositional
   pins the failure mode so it is intentional, not a parser bug, and so
   future maintainers see the constraint exercised in tests.

3. (NIT) stubMode's unused args parameter renamed to _; cosmetic only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(codemod): close -dry-run=false mutation-gate bypass (T8.1 review #4)

Spec-reviewer round-2 finding (commit 26ac916): the dispatcher only
forced DryRun=false on -fix, but did NOT prevent a user-supplied
-dry-run=false from leaving the gate open. With the natural mode
predicate `if !opts.DryRun { mutate() }`, this would silently bypass
the explicit -fix gate that plan §W-8 line 2347 names as the sole
mutation entry point ("-dry-run flag default true; -fix opts into
mutation").

Fix: normalize the gate at the dispatcher boundary — when Fix is set,
DryRun=false; when Fix is unset, DryRun=true regardless of what the
user passed via -dry-run=. Fix is now the single source of truth for
"may I mutate?", so any natural mode predicate is safe by construction.
Options.DryRun's doc comment now states this contract explicitly so
T8.2-T8.5 implementers cannot reach for the wrong predicate.

Tests pin all three cases:
  - -dry-run=false alone        → DryRun stays true (bypass closed)
  - -fix -dry-run=false         → mutation authorized (Fix wins)
  - -dry-run=true -fix          → mutation authorized (Fix wins)
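
The three cases above can be exercised with a small stdlib-flag sketch. `parseOptions` is a hypothetical function written for illustration; the only rule taken from the commit is that Fix is the single source of truth and DryRun is derived from it.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// Options mirrors the shared -dry-run / -fix pair.
type Options struct {
	DryRun bool
	Fix    bool
}

// parseOptions parses both flags, then normalizes the gate: whatever
// the user passed via -dry-run=, DryRun ends up as the inverse of Fix.
func parseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("iac-codemod", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	fs.BoolVar(&o.DryRun, "dry-run", true, "report only (default)")
	fs.BoolVar(&o.Fix, "fix", false, "apply rewrites")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	o.DryRun = !o.Fix
	return o, nil
}

func main() {
	for _, args := range [][]string{
		{"-dry-run=false"},
		{"-fix", "-dry-run=false"},
		{"-dry-run=true", "-fix"},
	} {
		o, err := parseOptions(args)
		fmt.Println(args, o.DryRun, o.Fix, err)
	}
}
```

With the gate normalized this way, a mode may test either `opts.Fix` or `!opts.DryRun` and get the same answer.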

Also adds TestPackageDoc_MentionsSkipMarker (process note #6) — cheap
file-content guard so a future SkipMarker rename trips a test rather
than silently desyncing the package doc comment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(codemod): warn future maintainers off t.Parallel in main_test.go (T8.1 review #5)

Code-reviewer round-3 authorized now-fix: tests in this file mutate the
package-global `modes` map under defer-restore. -race is currently clean
because no test calls t.Parallel(), but the swap-and-restore pattern is
a latent data race the next agent (T8.2-T8.5) could trigger by adding
parallelism. Top-of-file guard comment names the constraint and points
at the dependency-injection refactor as the unlock path if parallelism
is ever required.

Comment-only change; tests still pass with -race.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(codemod): lint mode with 4 static-check assertions

T8.2: Wires the lint subcommand using golang.org/x/tools/go/analysis
with the four assertions named in plan §T8.2:

  AssertPlanDelegatesToHelper                — provider Plan() delegates to wfctlhelpers.Plan
  AssertApplyDelegatesToHelper               — provider Apply() delegates to wfctlhelpers.ApplyPlan
  AssertDiffSetsNeedsReplaceForForceNew      — driver Diff() sets NeedsReplace on ForceNew
  AssertProviderImplementsValidatePlan       — provider satisfies ProviderValidator

Carry-forwards from T8.1 review baked in:

  1. Dispatcher fs.Usage override (main.go:run) so `iac-codemod <mode> -h`
     produces the global usage rather than the per-FlagSet banner.
     Pinned by TestRun_HelpAfterMode_PrintsGlobalUsage across all 4 modes.

  2. Mutation-gate negative test pinning lint-is-read-only-by-definition:
     TestRunLint_DoesNotMutateFilesEvenWithFixFlag invokes lint with
     hostile {Fix:true, DryRun:false} flags and asserts mtime + content
     unchanged. Plus TestRunLint_FixFlag_WarnsItHasNoEffect surfaces a
     warning so users know -fix did nothing.

  3. Skip-marker honored at func-doc and type-doc levels via
     hasSkipMarkerOn(fn.Doc) / ts.Doc; skipped sites flow through the
     pass.Report channel with a [skipped] prefix and are split into a
     separate report section by lintReport.unpackSkippedFromFindings.
     Plan rev2 (line 2400) requires each mode to surface a list of
     skipped sites in its report — pinned by TestRunLint_SkipMarker_SurfacedInReport.

Precision: all helper-call analyzers gate on providerLikeReceivers
(method set must contain BOTH Plan + Apply matching IaCProvider shape)
to avoid false-positive flags on deploy targets and other Apply-shaped
types. Manual verification against the workflow repo went from 9
findings (incl. 2 false positives in pkg/k8s) down to 7 (all genuine
provider implementations awaiting v2 migration).
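
A rough sketch of the receiver gating described above, using only `go/ast` and `go/parser`. The `providerLike` and `recvTypeName` helpers are hypothetical stand-ins for providerLikeReceivers; they match on method names only, whereas the real helper also checks the IaCProvider shape.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// recvTypeName returns the base type name of a method's receiver,
// unwrapping a pointer receiver. Generic receivers are ignored here.
func recvTypeName(fn *ast.FuncDecl) string {
	if fn.Recv == nil || len(fn.Recv.List) == 0 {
		return ""
	}
	t := fn.Recv.List[0].Type
	if star, ok := t.(*ast.StarExpr); ok {
		t = star.X
	}
	if id, ok := t.(*ast.Ident); ok {
		return id.Name
	}
	return ""
}

// providerLike (hypothetical) returns, sorted, the receiver types in src
// that declare BOTH a Plan and an Apply method.
func providerLike(src string) ([]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	methods := map[string]map[string]bool{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		recv := recvTypeName(fn)
		if recv == "" {
			continue
		}
		if methods[recv] == nil {
			methods[recv] = map[string]bool{}
		}
		methods[recv][fn.Name.Name] = true
	}
	var out []string
	for recv, set := range methods {
		if set["Plan"] && set["Apply"] {
			out = append(out, recv)
		}
	}
	sort.Strings(out)
	return out, nil
}

const sample = `package demo

type Provider struct{}

func (p *Provider) Plan()  {}
func (p *Provider) Apply() {}

type DeployTarget struct{}

func (d *DeployTarget) Apply() {}
`

func main() {
	got, err := providerLike(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [Provider]
}
```

`DeployTarget` has an Apply-shaped method but no Plan, so it is not reported, which is the false-positive class the gating is meant to remove.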

Implementation notes:

  - File-by-file analysis via parser.ParseFile + tolerant types.Check
    (stub importer ignores unresolved imports). This works on plugin
    sources that haven't vendored their dependencies. Cross-file
    references won't resolve, but IaC providers and drivers are
    typically co-located by Go convention.
  - Skip-marker is encoded as a synthetic diagnostic with a `[skipped]`
    prefix; the driver post-processes it out of the findings list. This
    keeps the analyzer API surface to one channel.
  - go.mod: promotes golang.org/x/tools from indirect to direct. No
    new modules, no go.sum changes.

Verification: 33/33 tests pass with -race; binary smoke-tested against
workflow repo root (7 findings, exit 1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(codemod): T8.2 review — stdout for -h, marker context, precision filter

Spec-reviewer round-2 on commit 2908fa1; addresses 5 substantive
findings + 1 nit (findings 5 & 6 are PR-body notes, no code change):

1. (BLOCKER) `iac-codemod <mode> -h` now prints the global usage to
   STDOUT, matching `iac-codemod -h` and the kubectl/git/gh convention
   for help-on-success. Previously it landed on STDERR via the
   FlagSet's SetOutput handler. Pinned by
   TestRun_HelpAfterMode_PrintsGlobalUsageToStdout — asserts stream
   specifically rather than the union of stdout+stderr (the prior test
   would have passed even with stderr output). Parse-error noise still
   flows through stderr; only the help-text body moved to stdout.

2. (MEDIUM) hasSkipMarkerOn now accepts a trailing space + arbitrary
   justification text after SkipMarker:
     // wfctl:skip-iac-codemod legacy upsert recovery, see ADR-042
   Annotating WHY a site is skipped is a Go idiom; silently ignoring
   the marker because of trailing context would replicate the exact
   silent-no-op surface plan rev2 line 2400 unifies the marker to
   prevent. Two new tests pin both sides of the contract:
     - TestSkipMarker_AcceptsTrailingJustification
     - TestSkipMarker_RejectsCloseButWrongMarker (negative — the
       legacy `// wfctl:skip-codemod` prefix from design rev1 must
       still flag the diagnostic)

3. (MEDIUM) AssertDiffSetsNeedsReplaceForForceNew now gates on a new
   driverLikeReceivers helper (method set must contain Diff AND at
   least one canonical companion: Read/Create/Update/Delete). Brings
   the analyzer in line with the precision treatment Plan/Apply
   already had via providerLikeReceivers. New
   TestAssertDiffSetsNeedsReplaceForForceNew_NonDriverNotFlagged pins
   the negative case (a SettingsDiff struct with just Diff() is
   correctly invisible to the analyzer).

4. (LOW-MEDIUM) bodyAssignsFieldTrue → bodyAssignsField: the matcher
   now accepts ANY RHS, not just literal `= true`. The terser canonical
   pattern `r.NeedsReplace = c.ForceNew` is an equally valid expression
   of the W-3 force-new contract; flagging it was a false positive
   previously hit by cmd/wfctl/deploy_providers.go remoteResourceDriver
   (which propagates NeedsReplace from a gRPC response via
   `result.NeedsReplace, _ = res["needs_replace"].(bool)`). Pinned by
   TestAssertDiffSetsNeedsReplaceForForceNew_AcceptsDirectAssign.

7. (NIT) Removed dead/misleading comment in lintFile that referenced a
   never-implemented passSkippedSink scratch field.
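
The marker matching in item 2 above can be sketched roughly as below. The function names are hypothetical and the exact matching rules are an assumption: the marker must be the whole comment text or be followed by a space or tab, so a legacy or suffixed marker does not match.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// skipMarker is the marker text quoted in the commit message.
const skipMarker = "wfctl:skip-iac-codemod"

// hasSkipMarker (sketch) reports whether any line comment in doc is the
// marker, optionally followed by whitespace and free-form justification.
func hasSkipMarker(doc *ast.CommentGroup) bool {
	if doc == nil {
		return false
	}
	for _, c := range doc.List {
		text := strings.TrimSpace(strings.TrimPrefix(c.Text, "//"))
		if text == skipMarker ||
			strings.HasPrefix(text, skipMarker+" ") ||
			strings.HasPrefix(text, skipMarker+"\t") {
			return true
		}
	}
	return false
}

// firstFuncHasMarker parses src and checks the doc comment of the first
// function declaration it finds.
func firstFuncHasMarker(src string) bool {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", src, parser.ParseComments)
	if err != nil {
		return false
	}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			return hasSkipMarker(fn.Doc)
		}
	}
	return false
}

func main() {
	comments := []string{
		"// wfctl:skip-iac-codemod",
		"// wfctl:skip-iac-codemod legacy upsert recovery, see ADR-042",
		"// wfctl:skip-codemod",
		"// wfctl:skip-iac-codemodX",
	}
	for _, c := range comments {
		src := "package demo\n\n" + c + "\nfunc Plan() {}\n"
		fmt.Printf("%q -> %v\n", c, firstFuncHasMarker(src))
	}
}
```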

Findings 5 & 6 (no code change — PR-body notes for team-lead):

  5. Plan §T8.2 line 2363 says `golang.org/x/tools/go/analysis/passes`
     framework, but `/passes` is the directory of canonical reusable
     analyzers. The actual framework is `golang.org/x/tools/go/analysis`
     (which is what we import). Likely a plan typo; flag for
     post-merge retrospective.

  6. go.mod promotes golang.org/x/tools from indirect to direct.
     Already-transitive dep, no go.sum changes, no new modules. Should
     be fine but flagged for team-lead per W-7 trigger-list rigor.

Smoke-test re-verification on workflow repo: 6 genuine findings (down
from 7), zero false positives. -h now correctly streams to stdout for
both top-level and per-mode invocations.

37/37 tests pass with -race; build clean; vet clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(codemod): T8.2 review round-2 — tab-delimited marker, literal-false guard, adjacent-suffix rejection

* feat(codemod): refactor-plan mode (canonical pattern detection + rewrite); honors // wfctl:skip-iac-codemod marker

* feat(codemod): refactor-apply with informative non-canonical idiom reports; honors // wfctl:skip-iac-codemod marker

* feat(codemod): add-validate-plan mode (no-op stub injection); honors // wfctl:skip-iac-codemod marker

* chore(make): add migrate-providers target for workspace-wide codemod

* fix(codemod): T8.7 verification — exclude _worktrees and other underscore-prefixed dirs from walk

* fix(codemod): Copilot review round 1 — 9 critical + 2 important findings

Round-1 review on PR #538 surfaced 11 substantive findings; all addressed:

Critical (real bugs that broke compile or silently dropped logic):

 #1 [lint, refactor-plan] Rewrite target wrong — `wfctlhelpers.Plan` does
    not exist in the repo today. Pivoted to `platform.ComputePlan` (the
    real helper at platform/differ.go:72). Both targets now accepted by
    the lint analyzer for forward-compat with rev0 fixtures. Plan-doc
    §T8.3 named the wrong helper; flagged for retro.
 #2 [refactor-plan] rewritePlanBody only renamed `_` ctx params. A
    method declared `Plan(c context.Context, ...)` would be rewritten
    referencing undefined `ctx`. Now: any non-blank ctx-name preserved;
    only blank `_` renamed to `ctx`.
 #3 [refactor-plan] isCanonicalPlanBody too loose — extra side-effects
    inside the desired loop still classified as canonical. Tightened to
    require exactly the 3-statement template (lookup + !exists guard +
    configHash compare), no else branches, no trailing junk. Regression
    test: TestRefactorPlan_ExtraLoggingNotCanonical.
 #4 [refactor-plan, refactor-apply] SkipMarker only consulted on
    fn.Doc. PR description promised type-doc + GenDecl-doc honoring.
    Added receiverTypeDocs + carriesMarker; both modes now check all 3
    doc levels.
 #5 [refactor-apply] hasCanonicalCases only checked case labels. Bespoke
    bookkeeping inside a case body (logging, metrics, alternate driver
    calls) classified as canonical and would be silently dropped on
    -fix. Added caseBodyIsCanonical whitelist (driver call, ResourceRef
    construction, ProviderID guard). Regression test:
    TestRefactorApply_ExtraBookkeepingNotCanonical.
 #6 [refactor-apply] custom-error-wrapping suggestion named fictional
    APIs (ApplyResultErrorHook / WrapActionError). Replaced with honest
    hand-port advice: skip-marker + manual switch, OR move wrap into
    driver methods so wfctlhelpers records it verbatim.
 #7 [add-validate-plan] Stub always emitted unqualified `*IaCPlan` /
    `[]PlanDiagnostic`. Files importing the interfaces module under a
    qualifier (e.g. `*interfaces.IaCPlan`) failed to compile after
    -fix. Added interfacesQualifier detector + qualified stub emission.
    Regression: TestAddValidatePlan_Fix_QualifiedSignature.
 #8 [add-validate-plan, lint] hasValidatePlanMethod /
    AssertProviderImplementsValidatePlan checked method NAME only.
    Wrong-signature ValidatePlan (e.g. takes a string) was treated as
    compliant even though interfaces.ProviderValidator wouldn't be
    satisfied. Added validatePlanSignatureMatches: shape-checks the
    receiver param + return slice (qualified-or-unqualified). Both
    callers now use it. Regression:
    TestAddValidatePlan_DryRun_FlagsWrongSignature.
 #9 [refactor-plan, refactor-apply, add-validate-plan] Single-file
    pass — providers whose Plan + Apply lived in sibling files were
    silently omitted. Added planLikeReceiversInDir: directory-wide
    method-set scan. Per-file fallback retained for isolated single-
    file targets.

Important:

#10 [lint] Per-file parse/type-check errors accumulated in
    report.errors but exit code stayed 0 if there were no findings —
    green CI hid coverage gaps. Now exits 1 on either findings OR
    errors.
#11 [refactor-apply] -report-file mode flag never appeared in usage
    text. Documented in main.go's global usage block (the `-h` path
    intercepts before the per-mode FlagSet).

Plan-doc gap surfaced for retro: §T8.3 line 2373 reads "replaces with
`return wfctlhelpers.Plan(ctx, p, desired, current)`", but no such
function exists; reality is `platform.ComputePlan`. Recurring defect
class (plan-literal vs reality gap, W-4/W-5/W-7/W-9/W-8). Documented
in planHelperImportPath docstring + this commit body.

* fix(codemod): Copilot review round 2 — 5 critical + 4 important findings

Round 2 surfaced 9 substantive findings; all addressed:

Critical (compile-break / contract-break):

 #1 [refactor-plan, lint] platform.ComputePlan returns IaCPlan BY VALUE,
    but provider Plan methods return *IaCPlan. Single-statement
    `return platform.ComputePlan(...)` rewrite produced uncompilable
    code. Switched to canonical 2-statement form:
        plan, err := platform.ComputePlan(ctx, p, desired, current)
        return &plan, err
    isAlreadyDelegatedPlanBody widened to recognise both the new shape
    and the legacy single-statement forms (idempotent across revs).
 #3 [refactor-plan] rewritePlanBody fell back to recvName="p" but
    didn't update the receiver decl when the source had an unnamed
    receiver (`func (*Provider) Plan(...)`). Rewritten call referenced
    undefined `p`. Added ensureReceiverName: injects identifier and
    mutates the AST. Regression: TestRefactorPlan_Fix_UnnamedReceiverGetsName.
    Also added: TestRefactorPlan_Fix_PreservesCustomCtxName for round-1
    finding #2 (custom ctx name preserved).
 #4 [refactor-apply] Same unnamed-receiver bug as #3. Same fix
    (ensureReceiverName + ensureCtxParamName + ensureNthParamName
    helpers shared with refactor-plan). Regression:
    TestRefactorApply_Fix_UnnamedReceiverGetsName.
 #5 [add-validate-plan] Stub always emitted `func (p *T) ValidatePlan(...)`
    even when the type used value receivers. Method-set mismatch made
    the type fail interfaces.ProviderValidator type assertion. Added
    providerReceiverConvention + receiverIsPointer; stub now matches
    the existing Plan/Apply convention. Regression:
    TestAddValidatePlan_Fix_ValueReceiverConvention.

Important (skip-marker not honored in lint, single-file pass):

 #6 [lint] AssertPlanDelegatesToHelper checked fn.Doc only, ignoring
    type-doc and GenDecl-doc skip markers. Added receiverTypeDocsForPass
    helper; analyzer now checks all 3 doc levels.
 #7 [lint] AssertApplyDelegatesToHelper — same fix as #6.
 #8 [lint] AssertDiffSetsNeedsReplaceForForceNew — same fix as #6.
 #9 [lint] lintFile passed only the target file to the analyzers, so
    cross-file method sets were invisible (same blind spot the
    refactor-* modes had in round 1). Now lintFile loads sibling
    non-test .go files from the same package directory and feeds the
    full slice to each analyzer; diagnostics for sibling files are
    dropped (the outer walker visits them in their own turn) so no
    duplicate findings.

All 4 modes now produce compile-clean rewrites, honor the 3-level
skip-marker, and use package-aware method-set detection.

* fix(codemod): Copilot review round 3 — 6 critical + 1 important findings

Round 3 surfaced 7 substantive findings; all addressed:

Critical (compile-break / silent data loss):

 #1 [add-validate-plan] Directory-wide detection only widened `provs`
    in round 2; methodsByRecv stayed file-local. A provider with
    ValidatePlan in a sibling file (or value-receiver Plan/Apply
    declared elsewhere) would receive a duplicate or wrong-receiver
    stub. Now planLikeProviderMethodsInDir returns both the recv set
    AND the merged method slice; methodsByRecv carries the
    package-wide view (deduped by method name). Stub injection still
    only fires when typeDecls[recv] is non-nil so we never append
    to a sibling file.
 #2 [refactor-plan] isCanonicalPlanBody accepted ANY 2-result return
    statement at the trailing slot. A planner with the canonical
    scaffold but a bespoke return (cloned plan, propagated error
    value) would classify as canonical and the bespoke logic would be
    silently dropped. Tightened to require EXACTLY `return plan, nil`.
 #3 [refactor-plan] rewritePlanBody hardcoded "desired"/"current" as
    args. A canonical Plan with renamed params (e.g. `Plan(ctx,
    specs, state)`) would rewrite to references to undefined
    identifiers. ensureNthParamName now extracts the actual signature
    names.
 #4 [refactor-plan] rewritePlanBody hardcoded "platform" as the call
    selector. A file using `pf "github.com/.../platform"` wouldn't
    compile because `platform` is undefined (ensureImport sees the
    aliased import as satisfying the path check). Added pkgAliasFor
    helper; rewrite now uses whatever local name the file imports
    under.
 #5 [refactor-apply] caseBodyIsCanonical accepted ANY AssignStmt as
    canonical. Bookkeeping AssignStmts (metrics counters, map
    updates, accumulators) passed and would be silently dropped.
    Tightened to a narrow whitelist: multi-target driver call,
    single-target driver call (LHS=err), composite-literal
    construction, selector-assignment to ResourceRef-style fields
    (ProviderID/Name/Type). Anything else rejected.
 #6 [refactor-apply] Same import-alias issue as #4 for `wfctlhelpers`.
    pkgAliasFor reused; rewriteApplyBody now uses whatever local name
    the file imports under.
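
A minimal sketch of the alias lookup that findings #4 and #6 describe. `localNameFor` is a hypothetical stand-in for pkgAliasFor, the import path `example.com/workflow/platform` is made up for the example, and the fallback to the last path element assumes the package name matches its directory name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"path"
	"strconv"
)

// localNameFor (hypothetical) returns the identifier under which file
// refers to importPath: the explicit alias if one is given, otherwise
// the last element of the path. Blank and dot imports yield no usable
// qualifier.
func localNameFor(file *ast.File, importPath string) (string, bool) {
	for _, imp := range file.Imports {
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil || p != importPath {
			continue
		}
		if imp.Name == nil {
			return path.Base(p), true
		}
		if imp.Name.Name == "_" || imp.Name.Name == "." {
			return "", false
		}
		return imp.Name.Name, true
	}
	return "", false
}

// aliasIn parses only the import section of src and resolves importPath.
func aliasIn(src, importPath string) (string, bool) {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", src, parser.ImportsOnly)
	if err != nil {
		return "", false
	}
	return localNameFor(file, importPath)
}

func main() {
	const target = "example.com/workflow/platform" // made-up path
	aliased := "package demo\n\nimport pf \"" + target + "\"\n"
	plain := "package demo\n\nimport \"" + target + "\"\n"

	name, ok := aliasIn(aliased, target)
	fmt.Println(name, ok) // pf true
	name, ok = aliasIn(plain, target)
	fmt.Println(name, ok) // platform true
}
```

A rewriter would then emit `<name>.ComputePlan(...)` with the resolved name instead of a hardcoded `platform` selector.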

Important:

 #7 [lint] AssertProviderImplementsValidatePlan checked ts.Doc only,
    missing markers placed on the wrapping GenDecl. Aligns now with
    the receiverDoc.carriesMarker pattern used by the other 3
    analyzers (round-2 #6/#7/#8). typeDocsByName captures both
    TypeSpec.Doc and GenDecl.Doc.

Round-2 regression tests retained (TestRefactorPlan_Fix_UnnamedReceiverGetsName,
TestRefactorPlan_Fix_PreservesCustomCtxName,
TestRefactorApply_Fix_UnnamedReceiverGetsName,
TestAddValidatePlan_Fix_ValueReceiverConvention).
Round-3 fix verified end-to-end against an aliased-import fixture
(pf "github.com/.../platform" + wfh "github.com/.../wfctlhelpers"):
the rewritten output compiles cleanly under gofmt.

* fix(codemod): Copilot review round 4 — 6 critical-detection-loosening findings

Round 4 surfaced 6 findings, all real. The recurring theme: rev3's
pattern detectors were either too loose (accepted bookkeeping shapes
as canonical) or too rigid (literal package-name matching, breaking
on aliased imports).

Fixes:

 #1 [add-validate-plan] interfacesQualifier(file) returned "" when the
    type-only file (no Plan/Apply imports) received the stub via
    cross-file detection (round-3 #1). Stub then emitted unqualified
    types that wouldn't compile. Now: when the file lacks an interfaces
    import but ANY sibling does, fall back to "interfaces" qualifier
    AND inject the interfaces import into the type-file via AST printing
    (format.Node) before appending the stub. Added
    siblingUsesInterfacesImport helper.
 #2 [refactor-apply] isCanonicalCaseAssign accepted ANY composite
    literal (`x := <CompositeLit>`) as canonical. A bookkeeping struct
    construction (audit payload, metric envelope) silently passed.
    Tightened to require the literal type's name (qualified or
    unqualified) match "ResourceRef".
 #3 [refactor-apply] isDriverMethodCall only checked selector NAME
    (Create/Read/Update/Delete). Calls like `helper.Update(...)` or
    `metrics.Delete(...)` were misclassified as canonical driver
    dispatch. Added receiver-allowlist check: only `d`, `drv`, or
    `driver` accepted as driver-bound identifiers (matching the
    standard `d, err := p.ResourceDriver(...)` pattern in DO/AWS/GCP/Azure).
 #4 [refactor-apply, refactor-plan] isAlreadyDelegatedApplyBody and
    isAlreadyDelegatedPlanBody required literal `wfctlhelpers` /
    `platform` package idents. Files using aliased imports
    (`wf "..."`, `pf "..."`) were misreported as non-canonical even
    though they were valid delegations. Both functions now resolve
    the file's local alias via pkgAliasFor; literal names retained
    as fallbacks. Same fix for isPlatformComputePlanAssign (the
    helper inside isAlreadyDelegatedPlanBody).
 #5 [lint] AssertPlanDelegatesToHelper / AssertApplyDelegatesToHelper
    selector matchers required literal `platform` / `wfctlhelpers`
    package names. Same false-positive risk as #4 for aliased imports.
    Both analyzers now resolve the alias and accept either the
    aliased OR literal form.
 #6 [refactor-apply] caseBodyIsCanonical accepted ANY DeclStmt as
    canonical, so `var x SomeBookkeepingType` declarations passed
    even though they're exactly the bespoke logic the codemod is
    supposed to preserve. Tightened via isLocalOutPointerDecl: only
    `var <name> *<ResourceOutput-suffix>` accepted.

Smoke-tested against an aliased-import fixture (`wf "...wfctlhelpers"`
+ `pf "...platform"`):
- refactor-apply correctly classifies as already-delegated (was:
  misreported as missing-action-switch)
- lint reports 0 findings (was: false-positive
  AssertPlanDelegatesToHelper + AssertApplyDelegatesToHelper)

* fix(codemod): Copilot review round 5 — 9 deeper-detection findings

Round 5 surfaced 9 findings; all addressed. Recurring theme: the
detectors and reporters needed deeper structural verification (branch
contents, outer-shape, receiver-kind, package isolation, exit-code
semantics) — not just shape matching at one level.

Critical (silent data loss / repair regression):

 #1 [refactor-plan] rangeBodyMatchesCanonicalDesired only checked the
    guard expressions and statement count; never inspected what the
    `!exists` and `configHash != configHash` branch BODIES did. A
    planner with extra logic (telemetry, alternate action construction,
    different create/update payload) inside those branches was
    silently rewritten away. Added isCanonicalCreateBranchBody +
    isCanonicalUpdateBranchBody + isPlanActionsAppendAssign to verify
    the create branch is exactly `append+continue` and the update
    branch is exactly `append`.
 #2 [refactor-apply] classifyApplyBody verified only the switch
    shape; setup/teardown/result aggregation OUTSIDE the switch was
    silently dropped on -fix. Added isCanonicalApplyOuterShape: the
    Apply body must be exactly the 3-statement scaffold (result-init
    + range-loop + return result, nil).
 #3 [add-validate-plan] hasValidatePlanMethod ignored receiver kind.
    A value-receiver provider with a pointer-receiver ValidatePlan
    still failed the ProviderValidator type assertion (method-set on
    `T` does not include `*T` methods), but rev2 treated it as
    already-implemented. Now also requires receiver-kind match.
 #4 [lint] AssertProviderImplementsValidatePlan had the same
    receiver-kind blind spot. Now delegates to hasValidatePlanMethod
    (centralised + DRY).
 #5 [refactor-plan] isAlreadyDelegatedPlanBody accepted single-statement
    `return platform.ComputePlan(...)` (broken rev1 form) as
    already-delegated, so rerunning the fixed codemod never repaired
    output from the earlier broken rewrite. Now ONLY accepts the
    canonical 2-statement form; broken single-statement forms classify
    as non-canonical so a fresh -fix produces compilable output.
 #6 [refactor-plan] planLikeProviderMethodsInDir merged methods from
    every non-test .go file regardless of `package P` clause. Mixed-
    package or build-tagged directories could fold methods from
    unrelated packages into a synthetic provider. Added two-pass
    package-clause check: aggregate only files matching the dominant
    package.

Important (CI fidelity / detector recall):

 #7 [Makefile, lint] `|| true` in migrate-providers swallowed real
    execution failures alongside expected advisory findings, because
    lint returned 1 for both findings AND parse errors. Split the
    exit codes: 0 clean / 1 findings / 2 errors. Makefile now gates
    on `[ $? -ne 0 ] && [ $? -ne 1 ]` so parse errors fail the
    target.
 #8 [refactor-plan] Canonical matcher hardcoded the lookup flag
    name as `exists`. The semantically-identical `cur, ok :=`
    idiomatic Go form was reported non-canonical. Widened to accept
    both `exists` and `ok`.
 #9 [refactor-apply] isDriverMethodCall allowlist {d, drv, driver}
    missed common alternates. Widened to {d, dr, drv, rdrv, driver,
    resourceDriver}. Still rejects bookkeeping receivers like
    `metrics`, `audit`, `helper` (preserves round-4 #3 fix).
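
The split exit codes in finding #7 can be sketched as a small pure function. The constant and function names are hypothetical, and letting errors take precedence over findings when both occur is an assumption; the commit only states the three codes.

```go
package main

import "fmt"

// Exit codes as described in the commit message.
const (
	exitClean    = 0
	exitFindings = 1
	exitErrors   = 2
)

// lintExitCode (hypothetical) maps counts to an exit code. Errors win
// over findings here so that a coverage gap is never reported as a mere
// advisory result; that precedence is an assumption of this sketch.
func lintExitCode(findings, errs int) int {
	switch {
	case errs > 0:
		return exitErrors
	case findings > 0:
		return exitFindings
	default:
		return exitClean
	}
}

func main() {
	fmt.Println(lintExitCode(0, 0)) // 0
	fmt.Println(lintExitCode(3, 0)) // 1
	fmt.Println(lintExitCode(0, 1)) // 2
	fmt.Println(lintExitCode(3, 1)) // 2
}
```

A shell caller that branches on these codes would normally capture the status into a variable first, because `$?` is overwritten by every command, including `[`.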

End-to-end verification: lint against DO plugin produces exit 1 (3
advisory findings, no errors); broken-Go-source produces exit 2;
clean source produces exit 0. Smoke-tested via /tmp/iac-codemod.

* fix(codemod): Copilot review round 6 — type-doc skip-marker honored across sibling files

Round 6 surfaced 1 finding:

#1 [refactor-plan, refactor-apply, lint] receiverTypeDocs ran per-file
   only, so a `// wfctl:skip-iac-codemod` marker placed on a SIBLING
   file's type declaration was ignored when processing methods in the
   primary file. Round-3's directory-wide method-set scan made this
   layout possible (provider type in types.go, Plan/Apply in
   provider.go, skip-marker on the type), but the type-doc lookup
   wasn't widened in tandem. Effectively: providers explicitly opted
   out at the type-doc level were still rewritten if their methods
   were in a different file from the type.

Fix:

- Added receiverTypeDocsInDir(dir, primary) — merges receiverTypeDocs
  across every non-test .go file in dir whose `package P` matches the
  dominant package. Honors the same dominant-package filter introduced
  in round-5 #6 to keep build-tagged / mixed-package directories safe.
- refactor-plan + refactor-apply switched from receiverTypeDocs(file)
  to receiverTypeDocsInDir(filepath.Dir(path), file).
- lint's receiverTypeDocsForPass refactored to build a SINGLE merged
  map across pass.Files (which is already directory-wide after
  round-2 #9) and return it per-file. First-occurrence wins.

add_validate_plan unaffected: stub injection only fires when
typeDecls[recv] != nil (type IS in the current file), so its
skip-marker check on ts.Doc was never the cross-file scenario.

* fix(codemod): Copilot review round 7 — 6 cross-file + detection-tightening findings

Round 7 surfaced 10 findings; 4 were stale (already fixed in R6). 6
real findings addressed:

Critical (compile-break / silent data loss):

 #1 [refactor-plan] isPlanActionsAppendAssign verified the LHS but
    not append's first argument. A bespoke `plan.Actions =
    append(otherSlice, ...)` was misclassified as canonical and the
    alternate-slice logic silently dropped during rewrite. Now both
    LHS and append's first arg must reference plan.Actions.
 #3+#9 [refactor-apply] isCanonicalApplyOuterShape only checked the
    outer 3-statement scaffold; per-action logic INSIDE the for-loop
    body (logging, metrics, custom error handling, accumulators) was
    silently dropped on -fix. Added isCanonicalApplyLoopBody +
    isCanonicalApplyLoopAssign + isCanonicalApplyLoopIf +
    isCanonicalApplyLoopIfBodyStmt: every loop-body statement must
    match a tight whitelist (driver lookup, var-out decl, action
    switch, err-/out-guard ifs).
 #7+#8 [add-validate-plan] provs[recv].Pos() panicked when the
    TypeSpec was nil (cross-file scenario from round-3 #1: type
    declaration in sibling file). Now defaults Pos to NoPos for nil
    specs; sort still works (stable on name when Pos ties).
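
For finding #1 above, a sketch of a matcher that checks both the assignment target and append's first argument. The helper names are hypothetical, and the identifiers `plan` and `Actions` are hardcoded for brevity.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// isSelector reports whether e is exactly the selector expression x.sel.
func isSelector(e ast.Expr, x, sel string) bool {
	s, ok := e.(*ast.SelectorExpr)
	if !ok {
		return false
	}
	id, ok := s.X.(*ast.Ident)
	return ok && id.Name == x && s.Sel.Name == sel
}

// isPlanActionsAppend (sketch) accepts only
//
//	plan.Actions = append(plan.Actions, ...)
//
// checking the LHS AND the first argument to append.
func isPlanActionsAppend(stmt ast.Stmt) bool {
	as, ok := stmt.(*ast.AssignStmt)
	if !ok || as.Tok != token.ASSIGN || len(as.Lhs) != 1 || len(as.Rhs) != 1 {
		return false
	}
	if !isSelector(as.Lhs[0], "plan", "Actions") {
		return false
	}
	call, ok := as.Rhs[0].(*ast.CallExpr)
	if !ok || len(call.Args) < 2 {
		return false
	}
	fun, ok := call.Fun.(*ast.Ident)
	if !ok || fun.Name != "append" {
		return false
	}
	return isSelector(call.Args[0], "plan", "Actions")
}

// check wraps a single statement in a function body, parses it, and
// runs the matcher. No type checking is needed for this shape test.
func check(stmtSrc string) bool {
	src := "package demo\n\nfunc f() {\n" + stmtSrc + "\n}\n"
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", src, 0)
	if err != nil || len(file.Decls) != 1 {
		return false
	}
	fn, ok := file.Decls[0].(*ast.FuncDecl)
	if !ok || fn.Body == nil || len(fn.Body.List) != 1 {
		return false
	}
	return isPlanActionsAppend(fn.Body.List[0])
}

func main() {
	fmt.Println(check("plan.Actions = append(plan.Actions, a)")) // true
	fmt.Println(check("plan.Actions = append(otherSlice, a)"))   // false
}
```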

Important (cross-file consistency):

 #4 [add-validate-plan] qualifier fallback to "interfaces" fired
    based on whether ANY sibling imported interfaces — unreliable
    if THIS provider uses local types but an unrelated sibling
    imports interfaces. Replaced with qualifierFromProviderMethods:
    inspects the provider's OWN Plan/Apply parameter types
    (directory-wide via round-3 #1) for the qualifier they use.
 #5 [add-validate-plan] skip-marker check only consulted typeDecls
    (current file). When Plan/Apply are here but the type with
    `// wfctl:skip-iac-codemod` lives in a SIBLING file, the marker
    was ignored. Added siblingTypeDocs lookup via
    receiverTypeDocsInDir (the round-6 helper).
#10 [add-validate-plan] Sibling-method merge deduped by method NAME
    only. If the local file had a wrong-signature ValidatePlan and a
    sibling had the correct one, the sibling's method was dropped,
    hasValidatePlanMethod saw only the bad declaration, and a
    duplicate stub was injected. Replaced with isLocalDuplicate:
    dedupes by name + parameter arity + result arity, so signatures
    that differ in arity both survive.

Stale findings (already fixed in R6, no action needed):
 #2 refactor-apply receiverTypeDocsInDir already in place
 #6 lint receiver-doc lookup already merged via receiverTypeDocsForPass

Smoke-tested against DO plugin: refactor-plan reports DOProvider.Plan
canonical, refactor-apply reports DOProvider.Apply upsert-recovery
with the upsertSupporter suggestion. Output matches T8.7 baseline.

* fix(codemod): Copilot review round 8 — 9 dedup + skip-marker + identifier-name findings

Round 8 surfaced 9 findings; all addressed:

Critical (silent data loss / behavior change):

 #1 [add-validate-plan] isLocalDuplicate compared by name+arity only.
    A wrong-signature ValidatePlan(name string) []PlanDiagnostic and
    the correct ValidatePlan(plan *IaCPlan) []PlanDiagnostic have the
    same arity but different types, so the correct sibling method was
    dropped and a duplicate stub was injected. Replaced with
    signature-fingerprint dedupe (signatureFingerprint +
    typeFingerprint walk all type shapes).
 #4 [refactor-apply] `default:` case clauses accepted without body
    inspection. Logging/metrics in default body silently dropped.
    Added isCanonicalDefaultBody: only `err = fmt.Errorf("unknown
    action %q", ...)` accepted.
 #5 [refactor-apply] isCanonicalApplyLoopAssign accepted any
    `<x>.ResourceDriver(...)`. `helper.ResourceDriver(...)` /
    `plan.ResourceDriver(...)` falsely classified. Now requires the
    receiver to match the provider's own receiver identifier
    (threaded through from classifyApplyBody).
 #8 [refactor-apply] Bare `if err != nil { continue }` accepted as
    canonical, but wfctlhelpers ALWAYS records ActionError before
    continuing — the rewrite would silently change behavior. Now
    requires the if-body to ALSO append to result.Errors before any
    continue/break.
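
A rough sketch of the signature-fingerprint idea from finding #1 above. The `fingerprint` helper is hypothetical and leans on `go/types.ExprString` to render each parameter and result type; the real signatureFingerprint/typeFingerprint pair may walk types differently.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// fieldTypes renders one type string per parameter or result, so that
// `a, b int` contributes two entries and names are ignored.
func fieldTypes(fl *ast.FieldList) []string {
	if fl == nil {
		return nil
	}
	var out []string
	for _, f := range fl.List {
		n := len(f.Names)
		if n == 0 {
			n = 1 // unnamed parameter or result
		}
		for i := 0; i < n; i++ {
			out = append(out, types.ExprString(f.Type))
		}
	}
	return out
}

// fingerprint (hypothetical) combines the method name with its
// parameter and result types.
func fingerprint(fn *ast.FuncDecl) string {
	return fn.Name.Name +
		"(" + strings.Join(fieldTypes(fn.Type.Params), ",") + ")" +
		"(" + strings.Join(fieldTypes(fn.Type.Results), ",") + ")"
}

// fingerprintOf parses src and fingerprints its first function.
func fingerprintOf(src string) (string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", src, 0)
	if err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			return fingerprint(fn), nil
		}
	}
	return "", fmt.Errorf("no function found")
}

func main() {
	wrong := "package demo\nfunc (p *P) ValidatePlan(name string) []PlanDiagnostic { return nil }\n"
	right := "package demo\nfunc (p *P) ValidatePlan(plan *IaCPlan) []PlanDiagnostic { return nil }\n"
	for _, src := range []string{wrong, right} {
		fp, err := fingerprintOf(src)
		if err != nil {
			panic(err)
		}
		fmt.Println(fp)
	}
}
```

Both declarations have one parameter and one result, so an arity-only comparison treats them as duplicates, while the fingerprints differ.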

Important (skip-marker scope + identifier flexibility):

 #2 [add-validate-plan] Skip-marker check fired on EVERY method's
    fn.Doc — a marker on Destroy/Status/etc. accidentally suppressed
    the whole provider's analysis. Restricted to Plan/Apply (the
    provider-defining methods).
 #3 [lint] AssertProviderImplementsValidatePlan — same fix as #2.
 #6 [refactor-plan] Canonical detector hardcoded `current`/`desired`
    body identifiers. Providers using `state`/`specs` reported
    non-canonical despite rewriter preserving names. Added
    nthParamName extraction; isCanonicalPlanBody now takes the actual
    parameter names.
 #7 [refactor-apply] Driver-receiver allowlist comment claimed `rd`
    accepted, but the switch was missing it. Added.
 #9 [refactor-apply] Canonical detector hardcoded `result`/`plan`
    identifier names. Providers using `res`/`pl` were rejected. Now
    recovers the actual identifier from the signature (planName) and
    from statement-1 LHS (resultName); both must be consistent within
    the body but can be any identifier.

Smoke-tested against DO plugin: refactor-plan / refactor-apply still
report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery
with stable upsertSupporter suggestion. Output matches T8.7 baseline.

Removed redundant TestRefactorApply_Fix_UnnamedReceiverGetsName: the
unnamed-receiver path can't have a canonical-shape Apply body
(`<recv>.ResourceDriver(...)` requires recv in scope). Receiver-name
injection is shared between refactor-plan and refactor-apply via
ensureReceiverName; coverage stays in
TestRefactorPlan_Fix_UnnamedReceiverGetsName.

* fix(codemod): Copilot review round 9 — 4 behavior-preservation findings

Round 9 surfaced 4 findings; all addressed:

Critical (silent behavior change):

 #1 [refactor-apply] If-guard body accepted bare `break`, but
    wfctlhelpers.ApplyPlan records the error and KEEPS processing
    later actions. A `break` would silently change loop semantics
    on rewrite. Now only `continue` is accepted in if-guard bodies.
 #2 [refactor-apply] Driver-method allowlist accepted `Driver` /
    `DriverFor` alongside `ResourceDriver`. wfctlhelpers dispatches
    SPECIFICALLY through IaCProvider.ResourceDriver; a wrapper like
    `provider.Driver(...)` would have its caching/instrumentation
    bypassed. Restricted to `ResourceDriver` only.

Important (false positives / cross-file alias mismatch):

 #3 [add-validate-plan, lint] Receiver-kind enforcement was too
    strict. Per Go spec, `*T`'s method set includes BOTH
    pointer-receiver and value-receiver methods of T. So a
    value-receiver ValidatePlan on a pointer-receiver provider IS
    valid (satisfies ProviderValidator). hasValidatePlanMethod now
    only requires strict matching when the provider uses VALUE
    receivers (T's method set excludes *T methods).
 #4 [add-validate-plan] When the qualifier was derived from a
    sibling method's aliased import (e.g. `iface "github.com/.../interfaces"`),
    the post-loop import injection used unaliased `ensureImport`,
    leaving the stub's `iface.IaCPlan` referring to undefined
    `iface`. Added ensureImportAs helper; now the import alias
    matches the stub's qualifier.
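
The method-set rule behind #3 can be sketched as follows. This is only an illustration: `Validator`, `Provider`, and `PtrOnly` are hypothetical stand-ins with a simplified signature, not the real ProviderValidator interface or codemod code.

```go
package main

import "fmt"

// Validator stands in for ProviderValidator; the signature is simplified
// for illustration and is an assumption.
type Validator interface {
	ValidatePlan() error
}

// Provider is a hypothetical provider type.
type Provider struct{}

// ValidatePlan has a value receiver, so it belongs to the method sets of
// both Provider and *Provider.
func (Provider) ValidatePlan() error { return nil }

// PtrOnly declares ValidatePlan on the pointer receiver, so only *PtrOnly
// satisfies Validator.
type PtrOnly struct{}

func (*PtrOnly) ValidatePlan() error { return nil }

// implements reports whether the dynamic type of v satisfies Validator.
func implements(v any) bool {
	_, ok := v.(Validator)
	return ok
}

func main() {
	fmt.Println(implements(&Provider{})) // true: *T includes value-receiver methods
	fmt.Println(implements(Provider{}))  // true
	fmt.Println(implements(&PtrOnly{}))  // true
	fmt.Println(implements(PtrOnly{}))   // false: T excludes *T methods
}
```

Only the last case needs strict receiver matching, which is why the check is relaxed for pointer-receiver providers.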

Smoke-tested against DO plugin: refactor-plan / refactor-apply still
report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery
with stable upsertSupporter suggestion. Output matches T8.7 baseline.

* fix(codemod): Copilot review round 10 — 8 cross-file + perf + tightening

Round 10 surfaced 8 findings; all addressed:

Critical (cross-file duplicate stub / silent override):

 #1 [add-validate-plan] Cross-file duplicate stub injection: when type
    is in file_a and Plan/Apply are in file_b, both files classified
    as missing-ValidatePlan and -fix injected duplicate stubs. Now
    only inject in the file containing the receiver TypeSpec
    (`if ts == nil { skip }`); the type-file's own pass handles it.
 #2 [add-validate-plan] Embedded-field promoted ValidatePlan not
    detected; -fix would shadow it with a no-op stub, silently
    dropping real plan diagnostics. Added typeHasEmbeddedFields:
    if the receiver type has any embedded fields, suppress the
    missing classification (we can't statically resolve method
    promotion without full type info, so err on the side of NOT
    injecting).
 #3 [lint] AssertProviderImplementsValidatePlan — same fix as #2.
 #4 [refactor-apply] ProviderID/Name/Type assignment-target whitelist
    didn't check struct identity. `audit.Type = ...` or
    `result.ProviderID = ...` (wrong struct) classified as canonical
    and dropped on rewrite. Now requires the LHS receiver to be
    `ref` (the canonical ResourceRef construction site name).

Important (perf / determinism / lint precision):

 #5 [lint] O(n²) lintFile re-parsed every sibling per-call. Added
    lintDirCache: lintPath now groups files by directory and builds
    one parse cache per dir, reused across the directory's files.
    Per-call fallback retained for single-file invocation.
 #6 [refactor-plan] planLikeProviderMethodsInDir's dominant-package
    selection used range-over-map (random iteration), so on a
    package-count tie the dominant could differ across runs and
    rewrite against the wrong method set. Sort the package names
    so tie-break is lexicographic-first (deterministic).
 #7 [lint] AssertPlanDelegatesToHelper accepted ANY platform.ComputePlan
    call ANYWHERE in the body. Now requires the canonical SHAPE:
    either the 2-statement rev2 form (matches isAlreadyDelegatedPlanBody)
    OR a single-statement legacy `return <X>.Plan(...)` /
    `return <X>.ComputePlan(...)`. Bespoke wrappers that call the
    helper as an intermediate step now correctly flag.
 #8 [lint] AssertApplyDelegatesToHelper — same fix: now uses
    isAlreadyDelegatedApplyBody (the rewriter's idempotency check)
    so anything but the canonical single-statement form flags.
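
A minimal sketch of the deterministic tie-break described in #6, assuming the selection works on a map of package name to file count. `dominantPackage` here is a hypothetical helper, not the actual planLikeProviderMethodsInDir code.

```go
package main

import (
	"fmt"
	"sort"
)

// dominantPackage returns the package name with the highest file count.
// Names are sorted first so that a tie resolves to the lexicographically
// first name instead of depending on map iteration order.
func dominantPackage(counts map[string]int) string {
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)

	best, bestCount := "", -1
	for _, name := range names {
		if counts[name] > bestCount { // strict > keeps the earlier name on a tie
			best, bestCount = name, counts[name]
		}
	}
	return best
}

func main() {
	counts := map[string]int{"provider": 2, "main": 2, "util": 1}
	fmt.Println(dominantPackage(counts)) // main
}
```

Ranging over the map directly could return either `main` or `provider` here; sorting the keys first makes the result stable across runs.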

Smoke-tested against DO plugin: refactor-plan / refactor-apply still
report DOProvider.Plan canonical / DOProvider.Apply upsert-recovery
with stable upsertSupporter suggestion. Output matches T8.7 baseline.

* fix(codemod): Copilot review round 11 — 6 polish + revert findings

Round 11 surfaced 6 findings; all addressed:

Critical (broken-output false-clean / mode clobber):

 #1 [lint] planBodyDelegatesCanonically accepted single-statement
    `return platform.ComputePlan(...)` (the BROKEN rev1 form,
    uncompilable due to value/pointer mismatch). Lint reported
    partially-migrated providers as clean, so migrate-providers
    silently missed them. Now ONLY the canonical 2-statement rev2
    form OR legacy `return wfctlhelpers.Plan(...)` is accepted;
    the broken single-statement platform form falls through to
    non-canonical so lint surfaces the still-needs-fixup state.
 #2 [refactor-plan] writeFileAtomic left the temp file at
    os.CreateTemp's default 0600 mode; rename clobbered the
    source's original permissions (e.g., 0644 → 0600). Added
    writeFileAtomicBytesPreserveMode: captures original mode via
    os.Stat and chmods the temp file before rename.
 #5 [add-validate-plan] Same 0600 mode-clobber bug in
    writeFileAtomicBytes. Now delegates to
    writeFileAtomicBytesPreserveMode.

Important (revert + comment polish):

 #3 [add-validate-plan] Round-10 #2's "any embedded field suppresses
    missing-ValidatePlan" was too broad — sync.Mutex, loggers,
    config mixins don't promote ValidatePlan, so real targets were
    silently missed. Reverted: report missing unconditionally.
    Maintainers whose providers actually promote ValidatePlan
    suppress with the explicit `// wfctl:skip-iac-codemod` marker.
 #4 [lint] AssertProviderImplementsValidatePlan — same revert as #3.
 #6 [refactor-plan] Stale enum comment for planAlreadyDelegated still
    referenced `wfctlhelpers.Plan` as the recognised shape;
    actual implementation recognises the 2-statement
    platform.ComputePlan form. Comment updated.

Removed dead typeHasEmbeddedFields helper (both call sites reverted
in #3/#4). Source-file mode preservation verified end-to-end:
chmod 0644 → -fix → stat shows 0644 retained. Smoke-tested against
DO plugin: refactor-plan / refactor-apply still report
DOProvider.Plan canonical / DOProvider.Apply upsert-recovery with
stable upsertSupporter suggestion. Output matches T8.7 baseline.

* fix(codemod): Copilot review round 12 — 8 dispatcher + detection-tightening findings

Round 12 surfaced 8 findings; all addressed:

Critical (CLI bug + silent rewrite of wrong file):

 #1 [main.go] Top-level dispatcher used a single FlagSet with only
    -dry-run and -fix registered, so any mode-specific flag (e.g.
    refactor-apply's -report-file) failed with "flag provided but
    not defined" BEFORE the mode could parse it. -report-file was
    documented but UNUSABLE from the CLI entrypoint. Replaced
    stdlib FlagSet with a manual-scan loop in run(): -dry-run/-fix
    are extracted; everything else (including unknown flags) flows
    through to the mode's own FlagSet. Bonus: flag-position
    flexibility (`/path -fix` now works), updated test +
    usage text accordingly.
 #2 [refactor-plan] Walked every .go file but built provs/typeDocs
    from only the dominant package. Mixed-package or build-tagged
    directories: a non-dominant file with overlapping receiver
    names was processed against another package's method set,
    rewriting the wrong file. Added dominantPackageForDir; each
    file processor now skips files in non-dominant packages.
 #3 [refactor-apply] Same fix as #2.
 #4 [add-validate-plan] Same fix as #2.

Important (canonical-detection precision):

 #5 [refactor-plan] isPlanActionsAppendAssign didn't validate the
    appended action's payload — `plan.Actions = append(plan.Actions,
    PlanAction{Action: "queue"})` was misclassified as canonical and
    silently rewritten. Added `expectedAction` parameter; create
    branch requires `Action: "create"` and update branch requires
    `Action: "update"`.
 #6 [refactor-apply] hasCanonicalCases verified case labels but not
    that the body's driver call MATCHED the label. A `case "create"`
    body that called `.Update()` or `.Delete()` was misclassified
    and silently rewritten away. Added caseBodyMatchesLabel: scans
    each case body for driver method calls and verifies the label-
    to-method mapping (create→Create, update→Update, delete→Delete,
    replace→Update).
 #7 [refactor-apply] Driver-lookup check accepted any
    `<recv>.ResourceDriver(<arg>)` regardless of <arg>. wfctlhelpers
    always dispatches with `action.Resource.Type`, so providers
    using a different lookup key (e.g. action.Tag, computed value)
    would see different driver behavior on rewrite. Now requires
    the lookup key to be exactly `action.Resource.Type`.
 #8 [lint] looksLikeProvider checked method NAMES + rough arity,
    so any unrelated type with `Plan(...)` and `Apply(...)` was
    treated as a provider (e.g., a deploy strategy). Tightened to
    verify signature shapes via type-name suffix matching:
    Plan must be `Plan(ctx, []ResourceSpec, []ResourceState)
    (*IaCPlan, error)` and Apply must be `Apply(ctx, *IaCPlan)
    (*ApplyResult, error)`. Qualified or unqualified accepted via
    typeNameTailMatches.
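
The dispatcher change in #1 can be sketched roughly as follows, assuming the global flags are plain boolean tokens. `splitGlobalFlags` is a hypothetical name, and the real run() loop may handle more forms (for example `-fix=true`).

```go
package main

import (
	"flag"
	"fmt"
)

// splitGlobalFlags pulls -dry-run and -fix out of args wherever they
// appear and returns every other argument, in order, for the mode's own
// FlagSet to parse. Unknown flags are passed through, not rejected.
func splitGlobalFlags(args []string) (dryRun, fix bool, rest []string) {
	for _, a := range args {
		switch a {
		case "-dry-run", "--dry-run":
			dryRun = true
		case "-fix", "--fix":
			fix = true
		default:
			rest = append(rest, a)
		}
	}
	return dryRun, fix, rest
}

func main() {
	// Simulated command line: mode first, a mode-specific flag, a path,
	// and a trailing global flag.
	args := []string{"refactor-apply", "-report-file", "out.json", "./providers", "-fix"}
	dryRun, fix, rest := splitGlobalFlags(args[1:])

	// The mode registers its own flags, so -report-file is known here.
	fs := flag.NewFlagSet(args[0], flag.ContinueOnError)
	report := fs.String("report-file", "", "where to write the report")
	if err := fs.Parse(rest); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(dryRun, fix, *report, fs.Args()) // false true out.json [./providers]
}
```

One trade-off of this simple scan: a flag value that is literally `-fix` or `-dry-run` would be consumed as a global flag.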

Smoke-tested:
- `iac-codemod refactor-apply -report-file <path> <dir>` now works
  (previously: "flag provided but not defined")
- DO plugin still reports DOProvider.Plan canonical / Apply
  upsert-recovery with stable upsertSupporter suggestion (T8.7
  baseline preserved)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 13, 2026
* fix(plugin/external): handle empty ConfigMessage for input-only STRICT_PROTO step contracts

Steps that declare STRICT_PROTO mode + InputMessage + OutputMessage but no
ConfigMessage (e.g., step.eventbus.ack, step.eventbus.publish) failed engine
initialization with:

  STRICT_PROTO contract for config message "" cannot use legacy Struct
  fallback: missing protobuf message name

The step has no per-instance config schema — data flows through the input
message. Engine now treats empty ConfigMessage as "no typed config", encodes
cfg as legacy *structpb.Struct, returns nil typed payload. Plugin's typed
factory reads from InputMessage as designed.

Caught by BMW PR #278 image-launch smoke against v0.51.3 + eventbus v0.3.0
(steps.eventbus.{ack,publish,consume} have empty ConfigMessage).

Test: TestCreateTypedConfigRequestEmptyConfigMessageStrictProto.

* fix: address Copilot review — comment scope + test asserts both nil + non-nil cfg paths

* docs(#617): design for godo removal from workflow core

Force-cutover single-PR plan: delete 11 legacy DO modules+steps (~3042 LOC),
strip 8 registration sites, remove godo from go.mod, add load-time migration
error pointing to workflow-plugin-digitalocean + infra.* IaC types.

AWS SDK audit deferred to follow-up issue (will auto-progress after merge).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise design per adversarial review cycle 1

C-1 fix: add step-type migration guard (5 step.do_* types) alongside the
module-type guard; error message branches on plugin-loaded detection.
I-1 fix: parity matrix split into per-step rows; step.do_logs and step.do_scale
flagged as GAPs with pre-merge follow-up issues in workflow-plugin-digitalocean.
I-2 fix: migration error has two branches — 'install plugin' vs 'config-only
issue, plugin already loaded'.
Minors: exact grep invocation in T4; dns.go typo; infra_apply_test.go:1990
added to T2 review list.
Companion: wfctl modernize rules in scope of T5 (auto-rewrite YAML).
Considered approaches: added Option B' (build tag fence — rejected).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise design per adversarial review cycle 2

I-1: platform_doks_test.go (164 LOC) added to deletion inventory.
     Total now 12 files / ~3206 LOC; T1 scope updated.
m-1: wfctl modernize flag corrected (--apply, not --write).
m-2: example/ sub-module go.mod also pins godo as indirect; T4 now runs
     go mod tidy in both root and example/, plus a second grep over go.mod
     files to catch residual indirect dependencies.

Cycle-1 fixes verified to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): incorporate adversarial cycle 3 minor amendments (PASS)

m-1: grep gates now !-prefixed to fail CI on match (|| true was silent no-op).
m-2: plugin-loaded detection simplified to single factory-map lookup.
m-3: workflow-scenarios migration sequencing constraint added.
t-1: T2 file count 9→10.

Cycle 3 verdict PASS (0 Critical / 0 Important / 3 Minor incorporated).
Pipeline advances to writing-plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): implementation plan (5 tasks, 1 PR)

Single-PR force-cutover, 5 tasks:
T1: delete 12 legacy DO files (~3206 LOC)
T2: strip 10 registration sites + remap wfctl detection hooks
T3: add legacy-type migration error guards (module + step paths)
T4: go mod tidy + CI grep gate
T5: docs + CHANGELOG + migration guide + wfctl modernize rules + file
    follow-up issues in workflow-plugin-digitalocean (logs/scale GAPs)
    and workflow (AWS audit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise plan per adversarial review cycle 1 (plan phase)

C-1 fix: T3 engine test uses NewStdEngine(app, logger) + AddModuleType()
         per engine.go:146,210; package workflow.
C-2 fix: T3 step test uses module.NewStepRegistry().Create() per
         pipeline_step_registry.go:18,32.
I-1 fix: T2 test calls KnownModuleTypes() / KnownStepTypes() directly
         (the plan's invented buildTypeRegistry() helper does not exist).
I-2 fix: iacProviderLoaded is now sync/atomic.Bool with IsIaCProviderLoaded()
         accessor — eliminates race with parallel tests under go test -race.
I-3 fix: gap-type modernize test covers all 3 gap types (do_logs, do_scale,
         do_networking) — previously only first two.
m-1: acknowledged walkTypeNodes vs walkNodes duplication; documented intent.
m-2 fix: module.RemovedInVersion constant; no more v0.52.0 sprinkled in 7+ places.
m-3 fix: modernize/testdata/legacy-do-config{,.expected}.yaml committed;
         end-of-PR checklist points at it.

End-of-PR checklist: added mandatory `go test -race ./...`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise plan per adversarial review cycle 2 (plan phase)

C-1 fix (scope-limit Option 2): modernize Fix only renames type:, does NOT
   inject config.provider:digitalocean. Migration guide now has explicit
   manual provider-add step + example YAML + error string user will see.
C-2 fix: cmd/wfctl/deploy.go added to T2 (platform.* prefix collector +
   "no platform.* modules" error message — both updated to include infra.*).
I-1: newTestEngine intentional plugin omission documented.
I-2: T5 includes comment-hygiene cleanup for hasPlatformModules / isInfraType.
m-1 fix: newTestEngine uses mockLogger{} matching engine_test.go pattern.
m-2 fix: legacyDORemovedInVersion duplicated in modernize package (import
   cycle prevents shared constant) with keep-in-sync comment.
m-3 fix: AWS issue body now derives in-scope list from a runtime grep
   rather than copying speculative names.

Cycle 1 plan-phase fixes verified to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise plan per adversarial review cycle 3 (plan phase)

C-1 fix: drop redeclared mockLogger from engine_legacy_do_migration_test.go;
   reuse the existing in-package type from engine_test.go:482.
C-2 fix: drop legacyDORemovedInVersion duplicate; no import cycle exists
   (verified via go list). modernize now imports module and uses
   module.RemovedInVersion directly. Single source of truth.
I-1 fix: add TestLegacyDOStepError_PluginLoaded (was missing — only
   not-loaded branch was tested for steps).
m-1 fix: actions/checkout@v5 → @v4 (repo standard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise plan per adversarial review cycle 4 (plan phase)

C-1 fix: extract shared constants/formatters to internal/legacydo leaf
   package. Earlier cycle's "no import cycle" claim was wrong:
   module→plugin→modernize is a real transitive chain (verified via
   go list -deps). modernize cannot import module. Both packages now
   import only the leaf legacydo package.
I-1 fix: replace package-level atomic.Bool iacProviderLoaded global with
   StepRegistry instance field. Per-registry state; parallel tests can
   own fresh NewStepRegistry() instances; no global mutation between
   tests. Engine sets the field via r.SetIaCProviderLoaded(loaded) just
   before pipeline construction.
I-2 fix: design doc drops the credential-registry-zero-DO-entries test
   (unimplementable — credentialResolvers is unexported). Rationale:
   registry is additive via init(); deleting file removes init() — self-
   evidencing. No API-surface-for-test added.
m-1 fix: T2 spec includes rename of platformModules local variable to
   deployTargetModules in cmd/wfctl/deploy.go.

Cycle 1/2/3 plan-phase fixes verified to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise plan per adversarial review cycle 5 (plan phase)

C-1 fix: schema.ValidateConfig fires at engine.go:400 BEFORE the factory
   loop at :506. Removing legacy DO types from schema/schema.go alone
   would cause the generic schema error to mask the actionable migration
   message. T3 now appends legacydo.ModuleTypes + StepTypes to
   schema.WithExtra{Module,Step}Types so schema passes them through to
   the factory guard — the real rejection point.
I-1 fix: e.stepRegistry is interfaces.StepRegistrar; SetIaCProviderLoaded
   is not on the interface. Plan now uses the type-assertion pattern
   from engine.go:163,216 (matches precedent; interface NOT widened).
I-2 fix: stale "T3 introduces a package-level atomic" comment in the
   end-of-PR checklist updated to reflect the per-registry instance field.
m-1 fix: legacyDORule() unexported (matches peers); test in internal
   package modernize (matches sibling test files); external modernize
   import dropped.
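
The type-assertion pattern from I-1 can be sketched as below. All types are simplified stand-ins: the real interfaces.StepRegistrar and StepRegistry have different method sets, and `markIaCProviderLoaded` and `bareRegistrar` are hypothetical names.

```go
package main

import "fmt"

// StepRegistrar is a simplified stand-in for interfaces.StepRegistrar.
// It deliberately has no SetIaCProviderLoaded method.
type StepRegistrar interface {
	Create(stepType string) error
}

// StepRegistry keeps the flag as per-instance state, so tests that build
// their own registry never share it.
type StepRegistry struct {
	iacProviderLoaded bool
}

func (r *StepRegistry) SetIaCProviderLoaded(loaded bool) { r.iacProviderLoaded = loaded }

func (r *StepRegistry) Create(stepType string) error { return nil }

// bareRegistrar is an alternate implementation without the setter.
type bareRegistrar struct{}

func (bareRegistrar) Create(string) error { return nil }

// markIaCProviderLoaded sets the flag only when the concrete registrar
// offers the setter, and reports whether it did.
func markIaCProviderLoaded(reg StepRegistrar, loaded bool) bool {
	setter, ok := reg.(interface{ SetIaCProviderLoaded(bool) })
	if ok {
		setter.SetIaCProviderLoaded(loaded)
	}
	return ok
}

func main() {
	reg := &StepRegistry{}
	fmt.Println(markIaCProviderLoaded(reg, true), reg.iacProviderLoaded) // true true
	fmt.Println(markIaCProviderLoaded(bareRegistrar{}, true))            // false
}
```

Asserting against an anonymous single-method interface keeps the shared interface unchanged, so other implementors need no new method.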

Cycle 1-4 plan-phase fixes verified to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise plan per adversarial review cycle 6 (plan phase)

C-1 fix (two parts):
  - Phantom schema.WithExtraStepTypes: schema.ValidateConfig only checks
    module types, not step types. Step migration guard at StepRegistry.Create
    is correctly the sole gate. Step-types schema-injection sentence/loop
    deleted from T3.
  - wfctl validate path: cmd/wfctl/validate.go and ci_validate.go call
    schema.ValidateConfig directly (not via engine.BuildFromConfig). Without
    a hook, AC3 fails on these commands. T2 now includes both files: inject
    legacydo.ModuleTypes into opts + add post-ValidateConfig legacy sweep
    emitting legacydo.Format{Module,Step}Error.

I-1 fix: `if len(...) > 0 || true` replaced with unconditional code
   (staticcheck SA4010 was a CI lint blocker).

m-1: cycle-5 history line referenced the now-removed step-types injection;
     implicit fix via T3 edit.

Cycle 1-5 fixes verified to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): revise plan per adversarial review cycle 7 (plan phase)

C-1 fix: validate/ci_validate post-pass step sweep was incorrect —
   cfg.Pipelines is map[string]any (verified config/config.go:149), not
   a typed slice. T2 now uses yaml.Marshal/Unmarshal pattern matching
   engine.go configurePipelines. Also separates ciValidateFile's
   accumulating errs=append from validateFile's early-return.
I-1 fix: added TestValidateFile_LegacyDOModule_ReturnsActionableError
   and TestCIValidateFile_LegacyDOStep_ReturnsActionableError to T2
   to give AC3 automated coverage on the validate path.

Cycle 1-6 fixes verified to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#617): convert task headings to H3 for scope-manifest check

plan-scope-check.sh requires "### Task N:" headings (H3); plan was using H2.
PR Grouping rows reference Task 1-5 and the body must match. Now passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: lock scope for issue #617 godo removal (alignment passed)

* feat(#617): delete legacy DO modules (godo importers)

Removes 12 files / ~3206 LOC. Registration sites cleaned in T2.

* platform_do_app.go + test
* platform_do_database.go + test
* platform_do_dns.go + test
* platform_do_networking.go + test
* platform_doks.go + test
* cloud_account_do.go (DO credential resolvers + doClient())
* pipeline_step_do.go (5 DO App Platform step types)

Adds godo_absent_test.go as a regression gate inside module/.

* feat(#617): strip DO registration sites + remap wfctl detection hooks

* plugins/platform: drop 5 module + 5 step factories and manifest entries.
* schema/*: drop 10 entries from module/step type lists + schema descriptions.
  Update editor-schemas.golden.json to match.
* cmd/wfctl/type_registry.go: drop 10 legacy DO type entries.
* cmd/wfctl/{infra.go,deploy_providers.go,ci_run_dryrun.go}: remap
  isContainerType and deployTargetTypes to remove platform.do_app.
* cmd/wfctl/deploy.go: extend prefix check to include infra.* + rename
  platformModules → deployTargetModules + update error message.
* module/multi_region.go: rewrite DOKS multi-region hint to point at
  infra.k8s_cluster + workflow-plugin-digitalocean.
* cmd/wfctl/infra_apply_test.go: replace platform.do_app negative-test
  fixture with example.legacy_unknown synthetic type.
* cmd/wfctl/{validate.go,ci_validate.go}: inject legacydo.ModuleTypes into
  schema opts + post-ValidateConfig sweep emits actionable migration errors.
* cmd/wfctl/deploy_test.go: update error message assertion.

Creates internal/legacydo/types.go (leaf package — stdlib only) with the
legacy-DO type maps and message formatters needed by T3's engine/step-registry
guards and this task's wfctl validate edits.

Adds legacy_do_types_removed_test.go (registry-absence regression gate) +
TestValidateFile_LegacyDOModule_ReturnsActionableError and
TestCIValidateFile_LegacyDOStep_ReturnsActionableError (validate-path AC3).

* feat(#617): actionable migration errors for legacy DO types

Adds legacydo.FormatModuleError + legacydo.FormatStepError (already in
internal/legacydo from T2) and wires them into two rejection points:

  engine.go:508 (module path) — factory-loop guard now emits the
  actionable migration error for the 5 removed legacy DO module types,
  branching on whether iac.provider is already registered in the engine.

  pipeline_step_registry.go:Create (step path) — unknown-step guard
  now emits the actionable migration error for the 5 removed legacy DO
  step types, using the per-registry iacProviderLoaded field set via
  SetIaCProviderLoaded before pipeline construction.

  engine.go:393-398 — guarded WithExtraModuleTypes block replaced with
  unconditional injection that also includes legacydo.ModuleTypes so that
  schema.ValidateConfig passes legacy DO module types through to the
  factory-loop guard (schema rejection would mask the migration message).

SetIaCProviderLoaded bridges the boolean from engine to module package
via type assertion (interface deliberately NOT widened — no method burden
on alternate StepRegistrar implementors).

Each step type gets a per-step message; step.do_logs and step.do_scale
carry GAP messages with workarounds because no 1:1 pipeline-step
successor exists yet (follow-up issues in T5).

Tests: 5 module × 2 branches + 5 step × 2 branches = 20 sub-cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(#617): drop godo from go.mod + add CI grep gate

* go mod tidy on root and example/ drops github.com/digitalocean/godo
  (direct from root, indirect from example/).
* New CI job 'godo-banned' fails the build on any *.go import of godo OR
  any mention of godo in go.mod files. Excludes _worktrees, .worktrees,
  .claude (local agent state) and godo_absent_test.go (T1 regression gate
  that references the import path as a string literal, not an actual import).

This satisfies acceptance criterion #4 (dependabot bumps target the
provider repo, not workflow core).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(#617): wfctl modernize rule + migration guide + CHANGELOG

* New modernize rule "legacy-do-types": auto-rewrites 5 module types and 3
  of 5 step types to infra.*; flags but does not modify the two GAP step
  types (step.do_logs, step.do_scale) and the 1→2 platform.do_networking
  split. Registered in AllRules().
* testdata/legacy-do-config.yaml: smoke-test fixture exercising all 10
  legacy types; testdata/legacy-do-config.expected.yaml: golden post-Fix
  output (types renamed, GAP types preserved, provider NOT auto-injected).
* CHANGELOG: v0.52.0 BREAKING entry.
* docs/migrations/v0.52.0-godo-removal.md: full migration guide with
  mapping tables, before/after YAML, error reference, rollback note.
  workflow-plugin-digitalocean follow-up issue URLs wired in:
    step.do_logs GAP → GoCodeAlone/workflow-plugin-digitalocean#107
    step.do_scale GAP → GoCodeAlone/workflow-plugin-digitalocean#108
* DOCUMENTATION.md: replace 10 legacy DO rows with pointers to the plugin
  and the migration guide.
* Comment hygiene: drop "legacy" framing from hasPlatformModules and
  parseInfraResourceSpecs doc comments (both functions correctly handle
  the surviving platform.kubernetes / platform.ecs module types).

Follow-up issues filed:
  GoCodeAlone/workflow-plugin-digitalocean#107 — step.iac_logs GAP
  GoCodeAlone/workflow-plugin-digitalocean#108 — step.iac_scale GAP
  #653 — AWS SDK audit (continuation of #617)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(#617): bump modernize rule-count expectation to include legacy-do-types

T5 appended legacyDORule (id: legacy-do-types) to AllRules() but missed
this counter test in cmd/wfctl/modernize_test.go. Single-line fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#617): make legacy DO step modernize findings non-fixable; fix migration guide step example

step.do_deploy/status/destroy require different config keys in their
successors (platform + state_store vs legacy app:) — auto-rewriting the
type alone produces an invalid config. Mark step findings Fixable: false,
remove step rewrites from Fix(), update testdata fixture and tests to
reflect unchanged step types post-modernize.

Also update FormatStepError to include required config keys in the
migration error message, and fix the migration guide pipeline-step example
to show the correct step.iac_apply config shape.

Addresses Copilot review comments:
- r3232996570: make step findings non-fixable (option a)
- r3232996648: fix migration guide step example config shape
- r3232996683: fix testdata fixture to leave step types unchanged
- r3232996732: add required config keys to step migration error

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(#617): update migration guide step table to reflect non-fixable step rewrites

step.do_deploy/status/destroy are now flagged-not-rewritten (Fixable: false)
because their successors use different config keys. Update the step mapping
table Auto-fix column and the recipe description to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): add return after t.Fatal to resolve SA5011 nil-dereference false positives

staticcheck SA5011 flags t.Fatal()/t.Fatalf() as non-terminating because
testing.T.Fatal calls runtime.Goexit (not a panic/return), which staticcheck
does not model as a definite exit. Adding an unreachable `return` statement
after each t.Fatal in iac/conformance scenarios makes the nil-guard pattern
unambiguous to static analysis: execution cannot reach the pointer dereference
if result/res is nil.

Affected files:
- iac/conformance/scenario_delete_action.go
- iac/conformance/scenario_grpc_roundtrip.go
- iac/conformance/scenario_replace_cascade_preserves_dependents.go
- iac/conformance/scenario_upsert_on_already_exists.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address Copilot review round 2 — doc YAML shape + test coverage gap

Two corrections from Copilot's second-pass review:

1. docs/migrations/v0.52.0-godo-removal.md: plugin install snippet used a
   top-level `plugins:` sequence with `source:` which does not match the app
   config schema (ExternalPluginDecl has no source field; PluginsConfig wraps
   external under plugins.external:). Replace with the correct `wfctl plugin
   install` CLI command + wfctl.yaml manifest form (WfctlPluginEntry has source).

2. module/godo_absent_test.go: `filepath.Glob("*.go")` is non-recursive and
   only checks the current directory, not subdirectories. The comment claimed it
   covered "no file under module/", which was misleading. Switch to
   `filepath.WalkDir(".", ...)` to make the assertion match the comment's intent
   and guard against future subdirectory additions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 14, 2026
…ipt CI gate (+ design & plan) (#668)

* docs(plans): cloud-SDK extraction design — workflow core → strict-contract plugins

Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and
cloud.google.com/go + google.golang.org/api direct deps from
workflow core's module/ package.

Architecture: 3 extension surfaces, 3 strategies:
- IaC state backends → new IaCStateBackend strict proto contract;
  iac.state stays core, config.backend dispatches to plugin gRPC client.
- platform.* provisioners → new PlatformBackend strict proto contract;
  module types + provider: key stay core, kind backend stays in-core,
  cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract.
- standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload,
  storage.s3, storage.gcs) → plugin-native module/step types via the
  existing ModuleFactories/StepFactories SDK — no new contract.

Credentials (Option 1): each plugin-native module carries its own
credentials: block + builds aws.Config in-process; optional in-plugin
credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp
cloud_account files have no SDK import and stay.

4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp,
D digitalocean (spaces backend, minor bump + migration doc).

Includes Assumptions + Rollback sections + self-challenge top-3 doubts
(PlatformBackend over-generality, provider-separability fragility,
benchmark-could-invalidate-unary-default — all with mitigations
deferred to writing-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 1 revisions

Addresses 2 Critical + 5 Important findings from adversarial-design-review:

Critical:
- iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit
  home: deleted by Phase B's core PR; Phase D reframed from soft-compat to
  a real clean-break for the `spaces` backend. Goal "core drops aws-sdk-go-v2
  entirely" is now actually achieved by the phases as written + enforced by
  a go list -deps CI gate.
- kinesis: added Non-Goals entry explaining it's a transitive dep of
  modular/modules/eventbus/v2, not a direct workflow import — out of scope,
  with the go mod why chain documented so the literal ask is fully answered.

Important:
- Full grep-verified 13-file AWS inventory table in Phase B with per-file
  destinations; reconciled aws_api_gateway.go (route-sync module) vs
  platform_apigateway.go (provisioner) as two distinct files.
- aksBackend assigned to Phase A (Azure gets the PlatformBackend half too);
  platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with
  explicit always-compiles coordination.
- Proto contracts fold into existing plugin/external/proto/iac.proto
  (8 services already) instead of new files — matches precedent.
- New Security section: secret-redaction in config-version-store/tracing +
  gRPC interceptor logging are blocking writing-plans tasks; credentials_ref
  blast radius documented as strictly narrower than today's cloud.account.

Minor:
- IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore
  interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no
  speculative surface.
- Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR).
- IaCProviderRequired/ResourceDriver reuse promoted to a first-class
  Alternatives Considered entry with accept/reject rationale + retained as
  the gated fallback for PlatformBackend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 2 revisions

Addresses 2 Critical + 3 Important from cycle-2 review:

Critical:
- platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure
  mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with
  its own import block). The "always compiles across phases" property is
  now structural, not asserted. Added a verified per-file import-ownership
  table.
- Corrected the false Phase A rationale: aksBackend uses raw net/http REST,
  NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend
  region). The Azure go.mod drop comes entirely from iac_state_azure.go
  deletion + iac_module.go edit; aksBackend extraction is code-organisation,
  not a dependency change.
- Documented the eksBackend → cloud_account_aws.go call-graph edge as a
  hard same-commit atomicity constraint (verified: eksBackend calls
  awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138).

Important:
- Phase B core-PR bullet now explicitly lists "strip the spaces case from
  iac_module.go" (was only obliquely referenced).
- New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds
  contract field; SaveState lost-response retry → documented idempotent
  (full-state replace, last-writer-wins); plugin-unreachable → abort before
  mutation; PlatformBackend mid-Apply crash → identical to today's
  in-process risk, no new mitigation.
- §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no
  body-logging interceptor (grpc.NewServer(opts...) passthrough; only
  callback_server.go logs, never module config). Writing-plans adds a
  guard test instead of a conditional interceptor.

Minor: file-count table footnoted (count = importers, not deletions);
shared s3compat module added as Alternatives Considered #3 (deferred,
not rejected); self-challenge doubt numbering tidied (2 mitigations
cover 3 doubts, intentionally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): fix stale Phase A/B refs + Status line post-cycle-2

The sed pass in the cycle-2 commit ran from the wrong cwd, so the Status
line still said "cycle 1" and two interface-audit-spike references still
said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 3 revisions

Addresses 2 Critical + 2 Important from cycle-3 review:

Critical (same root — symbol-level coupling the import-block audit missed):
- parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and
  safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers
  the plugin-bound backend files call. An import-block audit is symbol-blind.
  Fix: Phase 0 now does TWO moves — the file split AND relocating both
  helpers into a new SDK-free core module/cloud_helpers.go. Per-file table
  gains a "cross-file symbol deps (the trap)" column listing every helper
  edge per backend. Phase 0 acceptance criteria now include a grep that no
  core file references the helpers from their old homes.
- §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file
  (module shell + kubernetesBackend interface + safeIntToInt32) — NOT
  touched by the split; only platform_kubernetes_kind.go (holds all 4
  backends) is split. Earlier draft conflated the two files.

Important:
- Per-file ownership table relabelled "intended post-split — verified by
  the Phase 0 build gate" (was asserted-as-verified against an unsplit
  file — same hand-waving class cycle-2 flagged for "always compiles").
- lease_ttl_seconds DROPPED from the Phase A proto. It was a contract
  field with no enforced semantics and no implementing backend in scope —
  YAGNI. §Failure-modes orphaned-lock reworked: documented limitation +
  operator-side lock-object delete for recovery; TTL is a planned ADDITIVE
  follow-up paired with a conformance test, shipped with the first backend
  that honors expiry. Added explicit Lock-contention behavior (immediate
  error, matches today's in-process IaCStateStore.Lock — no new waiting
  state).
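
The lock-contention behavior described above (immediate error, no waiting state, no TTL) might look roughly like this. `memoryStore` and `ErrLocked` are hypothetical names, and the method signatures are simplified assumptions rather than the real `IaCStateStore` interface.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrLocked is a hypothetical sentinel returned when a lock is already held.
var ErrLocked = errors.New("state is already locked")

// memoryStore is an illustrative in-memory lock table, not the project's type.
type memoryStore struct {
	mu     sync.Mutex
	locked map[string]bool
}

func newMemoryStore() *memoryStore {
	return &memoryStore{locked: map[string]bool{}}
}

// Lock fails immediately on contention instead of blocking or queueing.
func (s *memoryStore) Lock(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locked[id] {
		return ErrLocked
	}
	s.locked[id] = true
	return nil
}

// Unlock releases the lock. With no TTL, a holder that crashes never calls
// this, which is the orphaned-lock limitation the design documents.
func (s *memoryStore) Unlock(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.locked, id)
}

func main() {
	store := newMemoryStore()
	fmt.Println("first lock:", store.Lock("prod"))
	fmt.Println("second lock:", store.Lock("prod"))
	store.Unlock("prod")
	fmt.Println("after unlock:", store.Lock("prod"))
}
```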

Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence
fixed; §Assumptions 2 notes Phase 0 de-risks it structurally.

Also: removed a stray stale cycle-1 copy of this doc that was sitting
untracked in the main workflow checkout (the canonical doc is here in
the feat/cloud-sdk-extraction worktree).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 4 revisions

Addresses 2 Critical + 2 Important from cycle-4 review:

Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd
cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't
— that's eksBackend) and aksBackend has no cross-file deps (it does —
CloudCredentials/CloudCredentialProvider from cloud_account.go, same as
gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The
symbol-ownership map is now a Phase 0 build artifact —
scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a
design-doc claim that rots on every edit. The design commits to the
*method* + the *known shape* (cloud_account.go stays core; all 3 cloud
backends bind to it via k.provider.GetCredentials; eksBackend additionally
binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no
cloud SDK).

Critical 2 — Phase 0's "split into four, zero logic change" silently
dropped the single func init() that registers kind/k3s/eks/gke/aks.
Splitting REQUIRES partitioning init() per-file (a distribution, not
zero-change). Phase 0 now has an explicit step 2 for the init() partition;
relabelled "behavior-equivalent" not "zero logic change"; k3s documented
as reusing kindBackend (both stay core).

Important 1 — platform.* cloud credential flow across PlatformBackend was
unspecified (aksBackend needs CloudCredentials — how does it reach the
plugin?). Added: PlatformBackend requests carry a CloudCredentials proto
message; engine resolves k.provider.GetCredentials() in-core (config-map
parsing, no SDK) and serialises it. Unified with the Architecture-3
credentials story — ONE CloudCredentials proto shape for both surfaces,
so secret-redaction has one shape to redact.

Important 2 — core actually imports FOUR cloud SDK trees, not three:
godo is still in cloud_account_do.go + 5 platform_do_*.go files.
§Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT
(user's ask was 3 trees), and the go list -deps gate is reworded to
assert "zero packages from the three in-scope trees" not "zero cloud
SDKs". All "zero cloud SDKs" phrasing reconciled throughout.

Minor: ListStates filter + remaining-proto-messages notes folded in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 5 revisions

Addresses 1 Critical + 2 Important from cycle-5 review:

Critical — the init()-partition fix (cycle-4) was kubernetes-only, but
the SAME defect class exists in platform_dns.go / platform_ecs.go /
platform_networking.go / platform_autoscaling.go: each has a single
func init() registering BOTH a core-staying `mock` backend AND a
plugin-bound `aws` backend. The old Phase B inventory moved those files
wholesale → would exile the mock backends + dangle the route53
registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go"
to a repo-wide uniform `_core.go` / `_<provider>.go` convention across
the WHOLE platform.* family. Every mixed init() is partitioned; the
audit script flags any init() registering a mix of core-staying +
plugin-bound factories as a CI failure. Phase B inventory rewritten to
delete only `_aws.go`/`_eks.go` files, never a mixed file.

Important 1 — the cycle-4 "Known shape" prose reintroduced
hand-maintained cross-file symbol claims (one already incomplete:
parseStringSlice consumers). FIX: cut all per-file symbol enumerations;
the section now states only invariants the script VERIFIES (not
discovers) + the method. No transcribed symbol lists remain.

Important 2 + own finding — cycle-4 said the engine resolves
credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's
awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile)
and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK.
FIX: §Architecture-2 corrected — engine passes the DECLARED credential
config (plain strings) in the CloudCredentials proto; the PLUGIN
resolves (incl. the SDK-bearing profile/role_arn paths). Both
cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B,
no core replacement — all AWS cred resolution moves plugin-side. azure/gcp
resolver files stay (their resolvers are genuinely SDK-free).

Minor — backend-name collision: core-reserved names (memory/filesystem/
postgres/kind/k3s/mock) cause a load-time error if a plugin collides,
not silent shadowing.
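
A minimal sketch of that load-time collision check, assuming a simple name-to-plugin registry. The `registry` type and `registerPlugin` function are hypothetical; only the reserved names come from the note above.

```go
package main

import "fmt"

// coreReserved lists the backend names the note above reserves for core.
var coreReserved = map[string]bool{
	"memory": true, "filesystem": true, "postgres": true,
	"kind": true, "k3s": true, "mock": true,
}

// registry maps a backend name to the plugin that provides it (illustrative).
type registry struct {
	backends map[string]string
}

// registerPlugin rejects core-reserved names with an error at load time
// rather than letting the plugin silently shadow the core backend.
func (r *registry) registerPlugin(name, plugin string) error {
	if coreReserved[name] {
		return fmt.Errorf("plugin %q: backend name %q is reserved by core", plugin, name)
	}
	r.backends[name] = plugin
	return nil
}

func main() {
	r := &registry{backends: map[string]string{}}
	fmt.Println(r.registerPlugin("azure_blob", "workflow-plugin-azure"))
	fmt.Println(r.registerPlugin("mock", "workflow-plugin-azure"))
}
```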

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 6 revisions

Addresses 1 Critical + 2 Important from cycle-6 review:

Critical — cycle-5's credential-flow fix replaced one false claim with
another: it said the CloudCredentials struct already holds "declared
config (plain strings incl. profile)". VERIFIED FALSE — the struct
(cloud_account.go:18) has no Profile field (profile lives in Extra map)
and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner
than the struct change the reviewer proposed: the struct needs NO change
(Extra map already carries markers, RoleARN field exists). Instead,
cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails
of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig,
sts.AssumeRole) are removed; they keep their SDK-free heads (record
declared inputs + an Extra["credential_source"] marker, exactly as
awsStaticResolver already does). After the edit the file is SDK-free
and stays in core alongside the azure/gcp resolver files. Only
cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider
+ awsProviderFrom) is deleted; its profile-chain/STS logic moves into
the plugin's buildAWSConfig. Every in-core resolver becomes uniformly
"declare, don't resolve"; the plugin honors the markers. No unregistered-
resolver failure mode — the resolver init() registrations stay.
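
The "declare, don't resolve" idea can be sketched as follows. The struct is a reduced, hypothetical stand-in for `CloudCredentials` (the source only confirms a `RoleARN` field and an `Extra` map), and the `map[string]string` type, the `"role_arn"` marker value, and the `session_name` key are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// CloudCredentials is a reduced, hypothetical shape for illustration.
type CloudCredentials struct {
	RoleARN string
	Extra   map[string]string
}

// declareRoleARN records the declared inputs plus a marker and makes no
// SDK call. Turning the marker into real credentials (for example via STS)
// is left to the plugin.
func declareRoleARN(cfg map[string]string, creds *CloudCredentials) error {
	arn := cfg["role_arn"]
	if arn == "" {
		return errors.New("role_arn is required")
	}
	if creds.Extra == nil {
		creds.Extra = map[string]string{}
	}
	creds.RoleARN = arn
	creds.Extra["credential_source"] = "role_arn" // assumed marker value
	if name := cfg["session_name"]; name != "" {
		creds.Extra["session_name"] = name
	}
	return nil
}

func main() {
	creds := &CloudCredentials{}
	err := declareRoleARN(map[string]string{
		"role_arn": "arn:aws:iam::123456789012:role/example",
	}, creds)
	fmt.Println(err, creds.RoleARN, creds.Extra["credential_source"])
}
```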

Important 1 — §Phase-0 misidentified the DNS file with the mixed init().
VERIFIED: platform_dns.go:66 has the init() (+ interface + factory
registry); platform_dns_backends.go has both impls + the route53 SDK
import, NO init(). DNS is a TWO-file split, unlike single-file
ecs/networking/autoscaling. §Phase-0 now states the per-family layout
explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling
one-file) and notes the audit script determines it.

Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit
deferred-resolution markers for env/CLI/managed-identity/workload-identity/
profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5
now state the plugin MUST implement marker handling for every deferred
type, not just AWS profile/role_arn.

Minor — safeIntToInt32 relocation rationale clarified (it's a clean
copy-source for the plugin-bound files, not a hard core necessity);
parseStringSlice IS a hard necessity (its file is deleted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 7 revisions

Addresses 2 Critical from cycle-7 review (architecture confirmed sound;
these are the last two extraction-mechanic precision gaps):

C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the
awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a
clean contiguous tail, but awsRoleARNResolver's SDK block (base-config
build + sts.AssumeRole, ~45 lines) is the larger half of the method,
after the declared-input recording. FIX: §Architecture-2 re-characterizes
the edit as a deliberate Resolve() body REWRITE (not a one-line snip) —
explicitly per-resolver. Added a Phase B CI invariant: an import-block
grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go
has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not
prose-asserted.

C2 — cloud_account_aws.go defines FOUR symbols, not one; the
symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed:
- AWSConfigProvider interface signature names aws.Config → CANNOT stay
  in core, deleted with the file.
- awsProviderFrom → deleted with the interface.
- ValidateCredentials → verified NO real caller (only a comment ref in
  cmd/wfctl/deploy.go:866) → deletes cleanly.
- The 8 awsProviderFrom consumers are all verified plugin-bound — but
  each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the
  plugin there's no cloud.account to type-assert. §Cross-file-coupling
  invariant 3 now states Phase B must REWRITE all 8 consumers to obtain
  creds from the CloudCredentials proto + buildAWSConfig — explicit
  Phase B scope, not a footnote. Phase B table atomicity column updated.

Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in
Phase 0 so the dns family conforms to the uniform _core.go/_aws.go
naming; no special-case three-file layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-8 re-baseline against post-#653 main

Cycle-8 adversarial review caught the design's file/symbol inventory as
stale: it predated issue #653 (closed 2026-05-13), which already removed
the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild +
EKS backends.

Re-baselined every file/symbol claim against origin/main HEAD (worktree
confirmed 0 commits behind origin/main):

- Added "Relationship to issue #653" section — this design is #653's
  named successor, extracting the AWS surface #653 scoped out
  ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces.
- Problem table corrected: AWS 6 real-import files (not 13), Azure 3,
  GCP 3. storage_artifact_s3.go is comment-only — stays in core.
- cloud_account_aws.go is dead code — zero non-test consumers verified;
  deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers
  removed by #653).
- Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go);
  parseStringSlice + safeIntToInt32 no longer exist — helper-relocation
  task deleted.
- PlatformBackend now serves only aks + gke (eks already a #653 SDK-free
  stub); interface-audit spike audits one interface, not five.
- Phase B inventory rewritten; Phase A/C file lists corrected.
- Self-challenge doubt #4 + Assumption 7 added: inventory staleness is
  the cycle-8 defect class; audit script makes it CI-enforced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-9 re-baseline + audit script

Cycle-9 adversarial review caught aksBackend mis-classified as an
azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match
is a stale doc comment (line 332) — aksBackend.azureToken is a plain
net/http OAuth2 client. An import-block-disciplined re-survey found a
second comment-only false positive: nosql_dynamodb.go.

Structural fix for the recurring "grep matched a comment" defect class:
added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks
(never comments) and emits the comment-immune real-import map. Its output
now populates every file table in the design — prose claims replaced by
a build artifact. Formalized + CI-wired in Phase 0.
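
The committed script is a shell script. The Go sketch below is only an analogue of its comment-immune idea, using the standard `go/parser` package in `ImportsOnly` mode. `realImports` and the sample source are hypothetical.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// sample mentions an SDK path only in a comment; its one real import is net/http.
const sample = `package module

import (
	"net/http"
)

// aksBackend requires github.com/Azure/azure-sdk-for-go (stale comment).
type aksBackend struct{ client *http.Client }
`

// realImports parses only the import declarations of src and returns the
// import paths that start with prefix. Comments are never consulted.
func realImports(src, prefix string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, spec := range file.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		if strings.HasPrefix(path, prefix) {
			hits = append(hits, path)
		}
	}
	return hits, nil
}

func main() {
	azure, err := realImports(sample, "github.com/Azure/")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("real azure imports:", len(azure))
}
```

A plain text grep for `azure-sdk-for-go` would match the comment in `sample`; the parser-based check does not, because it inspects only import specs.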

Corrected inventory (audit-script output): AWS 5 real-import files (not
6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go
are comment-only stubs — out of scope, stay in core.

Design consequences of aksBackend being SDK-free:
- Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all
  stay in core.
- Architecture §2 no longer proposes a new PlatformBackend contract. The
  gke cross-process mechanism is gated on an interface-audit spike whose
  preferred outcome is folding into the existing ResourceDriver contract
  — a dedicated contract for one backend is YAGNI.
- Phase A (Azure) is now pure IaCStateBackend — touches no platform file.
- Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/
  eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and
  fixes the stale line-332 comment.
- The gke platform extraction + its contract decision move to Phase C.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-10 re-baseline, AWS scope boundary explicit

Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9
audit script scanned only module/, missing five aws-sdk-go-v2 importers
under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal
("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by
the four phases as written.

Structural fix — third defect-class variant closed:
- audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and
  splits results module/ vs. elsewhere. Comment-immune (cycle 9) +
  scope-complete (cycle 10) + CI-enforced (Phase 0).

Whole-repo inventory result:
- Azure + GCP SDK usage is entirely module/-resident → Phases A and C
  drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate).
- aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files
  in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go.

Scope decision: the out-of-module/ AWS surface is exactly #653's
deliberately-retained "RBAC/secrets/artifact stay" scope (plus the
provider/aws deploy provider). This design does NOT unilaterally
override #653's recent documented decision — it scopes that surface OUT
(new Non-Goal, parallel to godo) and logs a recommended successor issue.

Consequences threaded through the doc:
- Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is
  module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the
  out-of-scope surface).
- Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP,
  module/-scoped zero for AWS.
- Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's
  scope decision is respected, not contested.
- Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check +
  sessionName extraction sit between declared-input recording and the
  SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale
  line-18 backend-list comment), M2 (internal/legacyaws noted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-11 PASS, minor cleanups

Adversarial review cycle 11: PASS (zero Critical, zero Important). Two
Minor nits applied:
- audit-cloud-symbols.sh: real_import now also matches single-line
  `import "..."` form, not just parenthesized blocks — closes the one
  latent parser false-negative the reviewer flagged.
- §Goals: clarified that the module/-scoped AWS-zero `--check` assertion
  is deferred-implementation added in Phase C (the committed script only
  enforces the cloud_account_aws_creds.go post-Phase-B invariant today),
  parallel to the Phase 0 init()-partition deferral.

Design phase complete — proceeding to writing-plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): audit-cloud-symbols single-line-import grep poisoned the pipe

The cycle-11 single-line-import hardening added an inner `grep -E '^import "'`
whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`,
making real_import() return false for every file lacking a single-line
import. Added `|| true` on the inner grep. Verified: full report restored,
all REAL/comment-only classifications correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction implementation plan (Phase 0 + Phase A)

Bite-sized TDD plan for the first executable increment: Phase 0 (split
platform_kubernetes_kind.go, fix the stale comment, wire the audit
script into CI) + Phase A (IaCStateBackend proto + benchmark-gated
proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging
guard, workflow-plugin-azure implementation, core deletion dropping
azure-sdk from go.mod).

14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on
plan — their concrete tasks depend on Phase A's outputs (the
benchmark-validated proto shape, the host-resolution pattern, the
plugin-side serve path), so planning them now would be fiction. The
design doc remains the authoritative B/C/D spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — address plan-phase adversarial review

Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor).
All addressed:

- C1 (Critical): Task 4's proto used google.protobuf.Struct, which
  iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config
  as `bytes outputs_json`/`bytes config_json` (the established
  ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not
  structpb. Removed the bogus struct.proto import step.
- I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives
  there), not `cd plugin/external/proto`.
- I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up
  the new benchmarks automatically) — no redundant harness; clarified the
  task is a one-time decision gate.
- I3: Task 8's embedded research spike resolved at plan time — engine.go
  was read; integration is the design-sanctioned package-level
  module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal.
  Tasks 8/13/14 now have concrete file sets.
- I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo,
  workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly.
- M1: Task 5's benchmark file is now genuinely self-contained (local
  benchStateToProto + benchStateBackendServer; no forward references).
- M2: Task 3 names ci.yml directly, places the audit job beside the
  existing godo-banned/aws-sdk-banned grep-gate jobs.
- M3: Task 6 pins benchstat (go install + bare invocation).
- M4: Task 9 states the redaction gap is verified against
  step_output_redactor.go:7-19, not a live deduction.
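
To illustrate the C1 fix, here is a rough sketch of carrying map-typed fields as JSON bytes and converting with `encoding/json`. `iacStateWire` is a hand-written, hypothetical stand-in for the generated proto message; the real field set is not shown in this log.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// iacStateWire stands in for a generated message with bytes fields
// (outputs_json / config_json). It is not the real generated type.
type iacStateWire struct {
	ResourceID  string
	OutputsJSON []byte
	ConfigJSON  []byte
}

// toWire encodes the two maps as JSON bytes.
func toWire(id string, outputs, config map[string]any) (*iacStateWire, error) {
	out, err := json.Marshal(outputs)
	if err != nil {
		return nil, fmt.Errorf("encode outputs: %w", err)
	}
	cfg, err := json.Marshal(config)
	if err != nil {
		return nil, fmt.Errorf("encode config: %w", err)
	}
	return &iacStateWire{ResourceID: id, OutputsJSON: out, ConfigJSON: cfg}, nil
}

// fromWire decodes the bytes back into maps. Note that JSON numbers decode
// as float64, so integer-typed values do not survive the round trip as ints.
func fromWire(w *iacStateWire) (outputs, config map[string]any, err error) {
	if err = json.Unmarshal(w.OutputsJSON, &outputs); err != nil {
		return nil, nil, fmt.Errorf("decode outputs: %w", err)
	}
	if err = json.Unmarshal(w.ConfigJSON, &config); err != nil {
		return nil, nil, fmt.Errorf("decode config: %w", err)
	}
	return outputs, config, nil
}

func main() {
	w, err := toWire("vm-1", map[string]any{"endpoint": "https://example.invalid"}, map[string]any{"replicas": 3})
	if err != nil {
		fmt.Println(err)
		return
	}
	outputs, config, err := fromWire(w)
	fmt.Println(outputs, config, err)
}
```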

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — plan-review cycle 2 fixes

Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed
resolved; 2 new Important + 2 Minor surfaced by the new-defect scan.
All addressed:

- I-A (Important): Task 9's redaction test was inconsistent with the
  actual redactMap behavior — a key named `credentials` matches the
  existing `credential` pattern and is wholesale-replaced with the
  placeholder STRING before any recursion, so the test's
  `.(map[string]any)` assertion panicked. Reworked Task 9: the
  `credentials:` block is ALREADY redacted wholesale (regression-tested);
  the real gap is `credentials_ref` being over-redacted (it's a module
  name, not a secret) — fix is a narrow `*_ref`-suffix exemption in
  isSensitiveField, not camelCase leaf patterns (which would be dead code
  given wholesale redaction happens first).
- I-B (Important): Task 14's engine.go integration seam was
  under-specified and would fight loadPluginInternal's no-concrete-types
  precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now
  defines an `IaCStateBackendProvider` optional interface and
  type-asserts it in loadPluginInternal exactly like the existing
  stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter
  implements it. Concrete file set + code sketch added.
- M-i: Task 6's benchmark.yml description corrected (runs `go test
  -bench=.` inline, not `make bench-baseline`).
- M-ii: Task 4 notes the proto README's plugin.proto-specific wording is
  stale; trust root buf.yaml.
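
The optional-interface approach from I-B could be sketched like this. Only the `IaCStateBackendProvider` name comes from the plan; its method, the plugin types, and `collectBackends` are assumptions made for illustration.

```go
package main

import "fmt"

// IaCStateBackendProvider is an optional interface. The method name and
// return type here are assumed, not taken from the plan.
type IaCStateBackendProvider interface {
	IaCStateBackends() []string
}

// plainPlugin does not implement the optional interface.
type plainPlugin struct{}

// azurePlugin opts in by implementing it.
type azurePlugin struct{}

func (azurePlugin) IaCStateBackends() []string { return []string{"azure_blob"} }

// collectBackends type-asserts each loaded plugin and skips those that do
// not opt in, so the loader never has to name a concrete plugin type.
func collectBackends(plugins ...any) []string {
	var names []string
	for _, p := range plugins {
		if provider, ok := p.(IaCStateBackendProvider); ok {
			names = append(names, provider.IaCStateBackends()...)
		}
	}
	return names
}

func main() {
	fmt.Println(collectBackends(plainPlugin{}, azurePlugin{}))
}
```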

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — plan-review cycle 3 PASS + minor cleanups

Plan-phase adversarial review cycle 3: PASS (zero Critical, zero
Important). Two Minor doc-tightening fixes applied:
- Task 9 Step 4 now names bearer_token_ref explicitly and explains why
  the *_ref exemption is safe for it (SecretRef is a reference struct,
  not a raw secret) rather than claiming no *_ref field exists.
- engine.go line citations corrected to 311-326.
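
The `*_ref` exemption discussed for Task 9 might be sketched as below. The pattern list and the placeholder string are assumptions; the real `isSensitiveField` and `redactMap` may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholder and sensitivePatterns are assumed values for illustration.
const placeholder = "[REDACTED]"

var sensitivePatterns = []string{"credential", "secret", "password", "token"}

// isSensitiveField exempts keys ending in _ref, which name another module
// or a reference struct rather than holding a raw secret.
func isSensitiveField(key string) bool {
	lower := strings.ToLower(key)
	if strings.HasSuffix(lower, "_ref") {
		return false
	}
	for _, pattern := range sensitivePatterns {
		if strings.Contains(lower, pattern) {
			return true
		}
	}
	return false
}

// redactMap replaces a sensitive key's whole value with the placeholder
// before any recursion, and recurses only into non-sensitive nested maps.
func redactMap(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for key, value := range in {
		if isSensitiveField(key) {
			out[key] = placeholder
			continue
		}
		if nested, ok := value.(map[string]any); ok {
			out[key] = redactMap(nested)
			continue
		}
		out[key] = value
	}
	return out
}

func main() {
	fmt.Println(redactMap(map[string]any{
		"credentials":     map[string]any{"client_secret": "s3cr3t"},
		"credentials_ref": "azure-main",
	}))
}
```

Because the `credentials` value becomes a string, a test that asserts it is still a `map[string]any` would panic, which is the failure mode I-A describes.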

Plan phase complete — proceeding to alignment-check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: lock scope for cloud-sdk-extraction (alignment passed)

* refactor(module): split platform_kubernetes_kind.go into _core + _gke

Phase 0 precursor for cloud-SDK extraction. kindBackend/eksErrorBackend/
aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core
init(); gkeBackend (the only SDK-bearing k8s backend) moves to
platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same
five backend names registered. Isolates the lone SDK-bearing platform
file for a later clean deletion.

* docs(module): add file-purpose headers to platform_kubernetes _core/_gke

Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition
self-documenting for readers without the commit message.

* docs(module): fix stale 'Requires the Azure SDK' comment on aksBackend

aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk
consumer. The stale comment is what fooled an earlier inventory pass into
mis-counting platform_kubernetes_kind.go as an azure-sdk importer.

* ci(audit): enforce k8s-backend init() partition + run audit on every PR

Extends audit-cloud-symbols.sh --check with an init()-partition assertion
(platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only
gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned /
aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced
artifact rather than a prose claim.

* docs(plans): IaCStateBackend transport benchmark result — decision pending

Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a
worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar.

Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB
map[string]any (inherent to the bytes outputs_json wire format the
iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB
message cap. The plan's contingency remedy (streaming redesign) addresses
message-size-cap + memory-buffering, neither of which the benchmark hits;
streaming would not move the number.
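
A rough way to observe the JSON round-trip cost in isolation is sketched below. It is not the project's benchmark: `syntheticState`, the entry count, and the padding string are arbitrary assumptions chosen to produce a payload of roughly 1 MB, and timings will vary by machine.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// syntheticState builds a flat map with the given number of string entries.
func syntheticState(entries int) map[string]any {
	state := make(map[string]any, entries)
	for i := 0; i < entries; i++ {
		state[fmt.Sprintf("output_%d", i)] = "value-padding-value-padding-value-padding"
	}
	return state
}

// jsonRoundTrip marshals the state and unmarshals it again, returning the
// decoded map and the encoded size in bytes.
func jsonRoundTrip(state map[string]any) (map[string]any, int, error) {
	raw, err := json.Marshal(state)
	if err != nil {
		return nil, 0, err
	}
	var decoded map[string]any
	if err := json.Unmarshal(raw, &decoded); err != nil {
		return nil, 0, err
	}
	return decoded, len(raw), nil
}

func main() {
	state := syntheticState(20000)
	start := time.Now()
	_, size, err := jsonRoundTrip(state)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Printf("round-tripped %d bytes in %s\n", size, time.Since(start))
}
```

Nothing here touches gRPC, so whatever time it reports is encoding cost alone, which is the component the root-cause analysis points at.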

Recommendation: retain unary (6.5ms is still negligible vs real cloud
backend I/O — the design's own bar-rationale). Deviation from the literal
5ms estimate-bar is surfaced to the operator, not absorbed silently.
Scope lock intact: Task 6 run + recorded, no task added/dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): Task 6 resolved — unary IaCStateBackend LOCKED (operator-confirmed)

Operator reviewed the 6.51ms benchmark + root-cause analysis and
confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's
gate resolves Unary LOCKED — the Task 4 proto stands, no streaming
redesign, PR 2/3 proceed unchanged.

Operator additionally raised a long-term architectural item: IaC state is
persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR)
with JSON-export + content-detection-on-read would be better for
processing/type-correctness/large-state scaling. Logged as a
post-extraction follow-up in both the benchmark decision record and the
design doc's Open items — distinct from the wire contract, cross-cutting
across all IaCStateStore impls, needs its own brainstorming pass. Not
actioned in this locked plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "chore: lock scope for cloud-sdk-extraction (alignment passed)"

This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954.

* docs(plans): amend cloud-sdk-extraction plan — PR 6 (ctx) + de-gate PR 4

Operator-approved scope amendment to the (reverted-to-Draft) plan:
- ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 /
  Task 15. Task 7 had to hardcode context.Background() in
  grpcIaCStateStore; the operator directed widening the interface now
  while we're at that boundary, so Phase B/C/D plugin backends inherit it
  ctx-ful. Bounded blast radius (~9 files, all in module/);
  interfaces.IaCStateStore already had ctx and is untouched.
- ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo.
  Operator: agents should operate in plugin repos directly; the real
  requirement is prompt clarity (absolute repo path stated up front), not
  a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes
  updated accordingly.
- Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented
  (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note
  updated to RESOLVED (unary locked).

Next: re-run alignment-check on the amended plan, then re-lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-lock scope for cloud-sdk-extraction (amended — alignment re-passed)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 14, 2026
…roto-lock (#669)

* docs(plans): cloud-SDK extraction design — workflow core → strict-contract plugins

Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and
cloud.google.com/go + google.golang.org/api direct deps from
workflow core's module/ package.

Architecture: 3 extension surfaces, 3 strategies:
- IaC state backends → new IaCStateBackend strict proto contract;
  iac.state stays core, config.backend dispatches to plugin gRPC client.
- platform.* provisioners → new PlatformBackend strict proto contract;
  module types + provider: key stay core, kind backend stays in-core,
  cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract.
- standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload,
  storage.s3, storage.gcs) → plugin-native module/step types via the
  existing ModuleFactories/StepFactories SDK — no new contract.

Credentials (Option 1): each plugin-native module carries its own
credentials: block + builds aws.Config in-process; optional in-plugin
credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp
cloud_account files have no SDK import and stay.

4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp,
D digitalocean (spaces backend, minor bump + migration doc).

Includes Assumptions + Rollback sections + self-challenge top-3 doubts
(PlatformBackend over-generality, provider-separability fragility,
benchmark-could-invalidate-unary-default — all with mitigations
deferred to writing-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 1 revisions

Addresses 2 Critical + 5 Important findings from adversarial-design-review:

Critical:
- iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit
  home: deleted by Phase B's core PR; Phase D reframed from soft-compat to
  a real clean-break for the `spaces` backend. Goal "core drops aws-sdk-go-v2
  entirely" is now actually achieved by the phases as written + enforced by
  a go list -deps CI gate.
- kinesis: added Non-Goals entry explaining it's a transitive dep of
  modular/modules/eventbus/v2, not a direct workflow import — out of scope,
  with the go mod why chain documented so the literal ask is fully answered.

Important:
- Full grep-verified 13-file AWS inventory table in Phase B with per-file
  destinations; reconciled aws_api_gateway.go (route-sync module) vs
  platform_apigateway.go (provisioner) as two distinct files.
- aksBackend assigned to Phase A (Azure gets the PlatformBackend half too);
  platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with
  explicit always-compiles coordination.
- Proto contracts fold into existing plugin/external/proto/iac.proto
  (8 services already) instead of new files — matches precedent.
- New Security section: secret-redaction in config-version-store/tracing +
  gRPC interceptor logging are blocking writing-plans tasks; credentials_ref
  blast radius documented as strictly narrower than today's cloud.account.

Minor:
- IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore
  interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no
  speculative surface.
- Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR).
- IaCProviderRequired/ResourceDriver reuse promoted to a first-class
  Alternatives Considered entry with accept/reject rationale + retained as
  the gated fallback for PlatformBackend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 2 revisions

Addresses 2 Critical + 3 Important from cycle-2 review:

Critical:
- platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure
  mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with
  its own import block). The "always compiles across phases" property is
  now structural, not asserted. Added a verified per-file import-ownership
  table.
- Corrected the false Phase A rationale: aksBackend uses raw net/http REST,
  NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend
  region). The Azure go.mod drop comes entirely from iac_state_azure.go
  deletion + iac_module.go edit; aksBackend extraction is code-organisation,
  not a dependency change.
- Documented the eksBackend → cloud_account_aws.go call-graph edge as a
  hard same-commit atomicity constraint (verified: eksBackend calls
  awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138).

Important:
- Phase B core-PR bullet now explicitly lists "strip the spaces case from
  iac_module.go" (was only obliquely referenced).
- New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds
  contract field; SaveState lost-response retry → documented idempotent
  (full-state replace, last-writer-wins); plugin-unreachable → abort before
  mutation; PlatformBackend mid-Apply crash → identical to today's
  in-process risk, no new mitigation.
- §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no
  body-logging interceptor (grpc.NewServer(opts...) passthrough; only
  callback_server.go logs, never module config). Writing-plans adds a
  guard test instead of a conditional interceptor.

Minor: file-count table footnoted (count = importers, not deletions);
shared s3compat module added as Alternatives Considered #3 (deferred,
not rejected); self-challenge doubt numbering tidied (2 mitigations
cover 3 doubts, intentionally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): fix stale Phase A/B refs + Status line post-cycle-2

The sed pass in the cycle-2 commit ran from the wrong cwd, so the Status
line still said "cycle 1" and two interface-audit-spike references still
said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 3 revisions

Addresses 2 Critical + 2 Important from cycle-3 review:

Critical (same root — symbol-level coupling the import-block audit missed):
- parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and
  safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers
  the plugin-bound backend files call. An import-block audit is symbol-blind.
  Fix: Phase 0 now does TWO moves — the file split AND relocating both
  helpers into a new SDK-free core module/cloud_helpers.go. Per-file table
  gains a "cross-file symbol deps (the trap)" column listing every helper
  edge per backend. Phase 0 acceptance criteria now include a grep that no
  core file references the helpers from their old homes.
- §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file
  (module shell + kubernetesBackend interface + safeIntToInt32) — NOT
  touched by the split; only platform_kubernetes_kind.go (holds all 4
  backends) is split. Earlier draft conflated the two files.

Important:
- Per-file ownership table relabelled "intended post-split — verified by
  the Phase 0 build gate" (was asserted-as-verified against an unsplit
  file — same hand-waving class cycle-2 flagged for "always compiles").
- lease_ttl_seconds DROPPED from the Phase A proto. It was a contract
  field with no enforced semantics and no implementing backend in scope —
  YAGNI. §Failure-modes orphaned-lock reworked: documented limitation +
  operator-side lock-object delete for recovery; TTL is a planned ADDITIVE
  follow-up paired with a conformance test, shipped with the first backend
  that honors expiry. Added explicit Lock-contention behavior (immediate
  error, matches today's in-process IaCStateStore.Lock — no new waiting
  state).

Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence
fixed; §Assumptions 2 notes Phase 0 de-risks it structurally.

Also: removed a stray stale cycle-1 copy of this doc that was sitting
untracked in the main workflow checkout (the canonical doc is here in
the feat/cloud-sdk-extraction worktree).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 4 revisions

Addresses 2 Critical + 2 Important from cycle-4 review:

Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd
cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't
— that's eksBackend) and aksBackend has no cross-file deps (it does —
CloudCredentials/CloudCredentialProvider from cloud_account.go, same as
gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The
symbol-ownership map is now a Phase 0 build artifact —
scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a
design-doc claim that rots on every edit. The design commits to the
*method* + the *known shape* (cloud_account.go stays core; all 3 cloud
backends bind to it via k.provider.GetCredentials; eksBackend additionally
binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no
cloud SDK).

Critical 2 — Phase 0's "split into four, zero logic change" silently
dropped the single func init() that registers kind/k3s/eks/gke/aks.
Splitting REQUIRES partitioning init() per-file (a distribution, not
zero-change). Phase 0 now has an explicit step 2 for the init() partition;
relabelled "behavior-equivalent" not "zero logic change"; k3s documented
as reusing kindBackend (both stay core).
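
The init() partition described above can be sketched roughly as follows.
This is an illustration only: `backendFactory`, `backendFactories`,
`registerBackend` and `registeredBackends` are hypothetical names, since
the real registry in module/ is not shown in this message. Go allows
several init() functions in one package (even in one file), which is what
lets each split file register only the backends it owns.

```go
package main

import "sort"

// backendFactory is a hypothetical constructor type; the real factory
// signature in module/ is not shown in this commit message.
type backendFactory func() string

// backendFactories stands in for the package-level backend registry.
var backendFactories = map[string]backendFactory{}

// registerBackend panics on a duplicate name so that a botched partition
// fails at program start instead of silently overwriting an entry.
func registerBackend(name string, f backendFactory) {
	if _, dup := backendFactories[name]; dup {
		panic("duplicate backend registration: " + name)
	}
	backendFactories[name] = f
}

// Would live in the file that keeps the core-staying backends.
func init() {
	registerBackend("kind", func() string { return "kindBackend" })
	// k3s reuses kindBackend, as the message above notes.
	registerBackend("k3s", func() string { return "kindBackend" })
}

// Would live in a per-provider file that a later phase can remove whole,
// taking its registration with it.
func init() {
	registerBackend("eks", func() string { return "eksBackend" })
}

// registeredBackends returns the registered names in sorted order.
func registeredBackends() []string {
	names := make([]string, 0, len(backendFactories))
	for n := range backendFactories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}
```

Calling registeredBackends() from a main function or a test lists every
name that the per-file init() functions registered, which is one way to
check that a split stayed behavior-equivalent.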

Important 1 — platform.* cloud credential flow across PlatformBackend was
unspecified (aksBackend needs CloudCredentials — how does it reach the
plugin?). Added: PlatformBackend requests carry a CloudCredentials proto
message; engine resolves k.provider.GetCredentials() in-core (config-map
parsing, no SDK) and serialises it. Unified with the Architecture-3
credentials story — ONE CloudCredentials proto shape for both surfaces,
so secret-redaction has one shape to redact.

Important 2 — core actually imports FOUR cloud SDK trees, not three:
godo is still in cloud_account_do.go + 5 platform_do_*.go files.
§Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT
(user's ask was 3 trees), and the go list -deps gate is reworded to
assert "zero packages from the three in-scope trees" not "zero cloud
SDKs". All "zero cloud SDKs" phrasing reconciled throughout.

Minor: ListStates filter + remaining-proto-messages notes folded in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 5 revisions

Addresses 1 Critical + 2 Important from cycle-5 review:

Critical — the init()-partition fix (cycle-4) was kubernetes-only, but
the SAME defect class exists in platform_dns.go / platform_ecs.go /
platform_networking.go / platform_autoscaling.go: each has a single
func init() registering BOTH a core-staying `mock` backend AND a
plugin-bound `aws` backend. The old Phase B inventory moved those files
wholesale → would exile the mock backends + dangle the route53
registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go"
to a repo-wide uniform `_core.go` / `_<provider>.go` convention across
the WHOLE platform.* family. Every mixed init() is partitioned; the
audit script flags any init() registering a mix of core-staying +
plugin-bound factories as a CI failure. Phase B inventory rewritten to
delete only `_aws.go`/`_eks.go` files, never a mixed file.

Important 1 — the cycle-4 "Known shape" prose reintroduced
hand-maintained cross-file symbol claims (one already incomplete:
parseStringSlice consumers). FIX: cut all per-file symbol enumerations;
the section now states only invariants the script VERIFIES (not
discovers) + the method. No transcribed symbol lists remain.

Important 2 + own finding — cycle-4 said the engine resolves
credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's
awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile)
and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK.
FIX: §Architecture-2 corrected — engine passes the DECLARED credential
config (plain strings) in the CloudCredentials proto; the PLUGIN
resolves (incl. the SDK-bearing profile/role_arn paths). Both
cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B,
no core replacement — all AWS cred resolution moves plugin-side. azure/gcp
resolver files stay (their resolvers are genuinely SDK-free).

Minor — backend-name collision: core-reserved names (memory/filesystem/
postgres/kind/k3s/mock) cause a load-time error if a plugin collides,
not silent shadowing.
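
A minimal sketch of the collision rule, assuming a hypothetical
`registerPluginBackend` helper and `pluginBackends` map (neither name is
from the design doc). Only the list of reserved names is taken from the
message above.

```go
package main

import "fmt"

// coreReserved holds the core-reserved backend names listed in the
// commit message.
var coreReserved = map[string]bool{
	"memory": true, "filesystem": true, "postgres": true,
	"kind": true, "k3s": true, "mock": true,
}

// pluginBackends is a hypothetical registry mapping a backend name to
// the plugin that provides it.
var pluginBackends = map[string]string{}

// registerPluginBackend returns an error when a plugin claims a
// core-reserved name, so the conflict surfaces at load time rather than
// shadowing the in-core backend.
func registerPluginBackend(name, plugin string) error {
	if coreReserved[name] {
		return fmt.Errorf("plugin %q: backend name %q is reserved by core", plugin, name)
	}
	pluginBackends[name] = plugin
	return nil
}
```

A loader built this way would abort plugin loading on the first non-nil
error instead of continuing with a shadowed backend.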

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 6 revisions

Addresses 1 Critical + 2 Important from cycle-6 review:

Critical — cycle-5's credential-flow fix replaced one false claim with
another: it said the CloudCredentials struct already holds "declared
config (plain strings incl. profile)". VERIFIED FALSE — the struct
(cloud_account.go:18) has no Profile field (profile lives in Extra map)
and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner
than the struct change the reviewer proposed: the struct needs NO change
(Extra map already carries markers, RoleARN field exists). Instead,
cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails
of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig,
sts.AssumeRole) are removed; they keep their SDK-free heads (record
declared inputs + an Extra["credential_source"] marker, exactly as
awsStaticResolver already does). After the edit the file is SDK-free
and stays in core alongside the azure/gcp resolver files. Only
cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider
+ awsProviderFrom) is deleted; its profile-chain/STS logic moves into
the plugin's buildAWSConfig. Every in-core resolver becomes uniformly
"declare, don't resolve"; the plugin honors the markers. No unregistered-
resolver failure mode — the resolver init() registrations stay.

Important 1 — §Phase-0 misidentified the DNS file with the mixed init().
VERIFIED: platform_dns.go:66 has the init() (+ interface + factory
registry); platform_dns_backends.go has both impls + the route53 SDK
import, NO init(). DNS is a TWO-file split, unlike single-file
ecs/networking/autoscaling. §Phase-0 now states the per-family layout
explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling
one-file) and notes the audit script determines it.

Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit
deferred-resolution markers for env/CLI/managed-identity/workload-identity/
profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5
now state the plugin MUST implement marker handling for every deferred
type, not just AWS profile/role_arn.

Minor — safeIntToInt32 relocation rationale clarified (it's a clean
copy-source for the plugin-bound files, not a hard core necessity);
parseStringSlice IS a hard necessity (its file is deleted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 7 revisions

Addresses 2 Critical from cycle-7 review (architecture confirmed sound;
these are the last two extraction-mechanic precision gaps):

C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the
awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a
clean contiguous tail, but awsRoleARNResolver's SDK block (base-config
build + sts.AssumeRole, ~45 lines) is the larger half of the method,
after the declared-input recording. FIX: §Architecture-2 re-characterizes
the edit as a deliberate Resolve() body REWRITE (not a one-line snip) —
explicitly per-resolver. Added a Phase B CI invariant: an import-block
grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go
has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not
prose-asserted.

C2 — cloud_account_aws.go defines FOUR symbols, not one; the
symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed:
- AWSConfigProvider interface signature names aws.Config → CANNOT stay
  in core, deleted with the file.
- awsProviderFrom → deleted with the interface.
- ValidateCredentials → verified NO real caller (only a comment ref in
  cmd/wfctl/deploy.go:866) → deletes cleanly.
- The 8 awsProviderFrom consumers are all verified plugin-bound — but
  each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the
  plugin there's no cloud.account to type-assert. §Cross-file-coupling
  invariant 3 now states Phase B must REWRITE all 8 consumers to obtain
  creds from the CloudCredentials proto + buildAWSConfig — explicit
  Phase B scope, not a footnote. Phase B table atomicity column updated.

Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in
Phase 0 so the dns family conforms to the uniform _core.go/_aws.go
naming; no special-case three-file layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-8 re-baseline against post-#653 main

Cycle-8 adversarial review caught the design's file/symbol inventory as
stale: it predated issue #653 (closed 2026-05-13), which already removed
the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild +
EKS backends.

Re-baselined every file/symbol claim against origin/main HEAD (worktree
confirmed 0 commits behind origin/main):

- Added "Relationship to issue #653" section — this design is #653's
  named successor, extracting the AWS surface #653 scoped out
  ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces.
- Problem table corrected: AWS 6 real-import files (not 13), Azure 3,
  GCP 3. storage_artifact_s3.go is comment-only — stays in core.
- cloud_account_aws.go is dead code — zero non-test consumers verified;
  deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers
  removed by #653).
- Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go);
  parseStringSlice + safeIntToInt32 no longer exist — helper-relocation
  task deleted.
- PlatformBackend now serves only aks + gke (eks already a #653 SDK-free
  stub); interface-audit spike audits one interface, not five.
- Phase B inventory rewritten; Phase A/C file lists corrected.
- Self-challenge doubt #4 + Assumption 7 added: inventory staleness is
  the cycle-8 defect class; audit script makes it CI-enforced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-9 re-baseline + audit script

Cycle-9 adversarial review caught aksBackend mis-classified as an
azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match
is a stale doc comment (line 332) — aksBackend.azureToken is a plain
net/http OAuth2 client. An import-block-disciplined re-survey found a
second comment-only false positive: nosql_dynamodb.go.

Structural fix for the recurring "grep matched a comment" defect class:
added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks
(never comments) and emits the comment-immune real-import map. Its output
now populates every file table in the design — prose claims replaced by
a build artifact. Formalized + CI-wired in Phase 0.
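
The committed tool is a shell script; the sketch below is NOT that
script. It illustrates the same comment-immune idea in Go using the
standard library's go/parser in ImportsOnly mode. `importsWithPrefix` is
a hypothetical helper name.

```go
package main

import (
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// importsWithPrefix parses only the import declarations of src and
// returns the import paths that start with prefix. Comments are never
// consulted, so an SDK path mentioned in a doc comment cannot match.
func importsWithPrefix(src, prefix string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, spec := range f.Imports {
		// spec.Path.Value is the quoted string literal from the source.
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		if strings.HasPrefix(path, prefix) {
			hits = append(hits, path)
		}
	}
	return hits, nil
}
```

Because the parser sees real import specs, single-line, parenthesized,
aliased, dot and blank imports are all handled by the same code path.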

Corrected inventory (audit-script output): AWS 5 real-import files (not
6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go
are comment-only stubs — out of scope, stay in core.

Design consequences of aksBackend being SDK-free:
- Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all
  stay in core.
- Architecture §2 no longer proposes a new PlatformBackend contract. The
  gke cross-process mechanism is gated on an interface-audit spike whose
  preferred outcome is folding into the existing ResourceDriver contract
  — a dedicated contract for one backend is YAGNI.
- Phase A (Azure) is now pure IaCStateBackend — touches no platform file.
- Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/
  eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and
  fixes the stale line-332 comment.
- The gke platform extraction + its contract decision move to Phase C.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-10 re-baseline, AWS scope boundary explicit

Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9
audit script scanned only module/, missing five aws-sdk-go-v2 importers
under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal
("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by
the four phases as written.

Structural fix — third defect-class variant closed:
- audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and
  splits results module/ vs. elsewhere. Comment-immune (cycle 9) +
  scope-complete (cycle 10) + CI-enforced (Phase 0).

Whole-repo inventory result:
- Azure + GCP SDK usage is entirely module/-resident → Phases A and C
  drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate).
- aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files
  in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go.

Scope decision: the out-of-module/ AWS surface is exactly #653's
deliberately-retained "RBAC/secrets/artifact stay" scope (plus the
provider/aws deploy provider). This design does NOT unilaterally
override #653's recent documented decision — it scopes that surface OUT
(new Non-Goal, parallel to godo) and logs a recommended successor issue.

Consequences threaded through the doc:
- Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is
  module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the
  out-of-scope surface).
- Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP,
  module/-scoped zero for AWS.
- Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's
  scope decision is respected, not contested.
- Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check +
  sessionName extraction sit between declared-input recording and the
  SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale
  line-18 backend-list comment), M2 (internal/legacyaws noted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-11 PASS, minor cleanups

Adversarial review cycle 11: PASS (zero Critical, zero Important). Two
Minor nits applied:
- audit-cloud-symbols.sh: real_import now also matches single-line
  `import "..."` form, not just parenthesized blocks — closes the one
  latent parser false-negative the reviewer flagged.
- §Goals: clarified that the module/-scoped AWS-zero `--check` assertion
  is deferred-implementation added in Phase C (the committed script only
  enforces the cloud_account_aws_creds.go post-Phase-B invariant today),
  parallel to the Phase 0 init()-partition deferral.

Design phase complete — proceeding to writing-plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): audit-cloud-symbols single-line-import grep poisoned the pipe

The cycle-11 single-line-import hardening added an inner `grep -E '^import "'`
whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`,
making real_import() return false for every file lacking a single-line
import. Added `|| true` on the inner grep. Verified: full report restored,
all REAL/comment-only classifications correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction implementation plan (Phase 0 + Phase A)

Bite-sized TDD plan for the first executable increment: Phase 0 (split
platform_kubernetes_kind.go, fix the stale comment, wire the audit
script into CI) + Phase A (IaCStateBackend proto + benchmark-gated
proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging
guard, workflow-plugin-azure implementation, core deletion dropping
azure-sdk from go.mod).

14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on
plan — their concrete tasks depend on Phase A's outputs (the
benchmark-validated proto shape, the host-resolution pattern, the
plugin-side serve path), so planning them now would be fiction. The
design doc remains the authoritative B/C/D spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — address plan-phase adversarial review

Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor).
All addressed:

- C1 (Critical): Task 4's proto used google.protobuf.Struct, which
  iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config
  as `bytes outputs_json`/`bytes config_json` (the established
  ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not
  structpb. Removed the bogus struct.proto import step.
- I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives
  there), not `cd plugin/external/proto`.
- I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up
  the new benchmarks automatically) — no redundant harness; clarified the
  task is a one-time decision gate.
- I3: Task 8's embedded research spike resolved at plan time — engine.go
  was read; integration is the design-sanctioned package-level
  module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal.
  Tasks 8/13/14 now have concrete file sets.
- I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo,
  workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly.
- M1: Task 5's benchmark file is now genuinely self-contained (local
  benchStateToProto + benchStateBackendServer; no forward references).
- M2: Task 3 names ci.yml directly, places the audit job beside the
  existing godo-banned/aws-sdk-banned grep-gate jobs.
- M3: Task 6 pins benchstat (go install + bare invocation).
- M4: Task 9 states the redaction gap is verified against
  step_output_redactor.go:7-19, not a live deduction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — plan-review cycle 2 fixes

Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed
resolved; 2 new Important + 2 Minor surfaced by the new-defect scan.
All addressed:

- I-A (Important): Task 9's redaction test was inconsistent with the
  actual redactMap behavior — a key named `credentials` matches the
  existing `credential` pattern and is wholesale-replaced with the
  placeholder STRING before any recursion, so the test's
  `.(map[string]any)` assertion panicked. Reworked Task 9: the
  `credentials:` block is ALREADY redacted wholesale (regression-tested);
  the real gap is `credentials_ref` being over-redacted (it's a module
  name, not a secret) — fix is a narrow `*_ref`-suffix exemption in
  isSensitiveField, not camelCase leaf patterns (which would be dead code
  given wholesale redaction happens first).
- I-B (Important): Task 14's engine.go integration seam was
  under-specified and would fight loadPluginInternal's no-concrete-types
  precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now
  defines an `IaCStateBackendProvider` optional interface and
  type-asserts it in loadPluginInternal exactly like the existing
  stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter
  implements it. Concrete file set + code sketch added.
- M-i: Task 6's benchmark.yml description corrected (runs `go test
  -bench=.` inline, not `make bench-baseline`).
- M-ii: Task 4 notes the proto README's plugin.proto-specific wording is
  stale; trust root buf.yaml.
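
The I-A fix can be sketched as below. This is a simplified stand-in, not
the code in step_output_redactor.go: the pattern list (apart from
`credential`), the placeholder text and the function bodies are
assumptions made for illustration.

```go
package main

import "strings"

// sensitivePatterns is a hypothetical pattern list; only "credential"
// is named in the commit message.
var sensitivePatterns = []string{"credential", "secret", "password", "token"}

// redacted is a hypothetical placeholder string.
const redacted = "[REDACTED]"

// isSensitiveField reports whether a key should be redacted. Keys ending
// in "_ref" are exempt: they name another module or reference, not a
// secret value.
func isSensitiveField(key string) bool {
	k := strings.ToLower(key)
	if strings.HasSuffix(k, "_ref") {
		return false
	}
	for _, p := range sensitivePatterns {
		if strings.Contains(k, p) {
			return true
		}
	}
	return false
}

// redactMap replaces sensitive values wholesale BEFORE any recursion, so
// a nested credentials block becomes a single placeholder string.
func redactMap(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		if isSensitiveField(k) {
			out[k] = redacted
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			out[k] = redactMap(nested)
			continue
		}
		out[k] = v
	}
	return out
}
```

Under this sketch a `credentials:` block comes back as a string, which is
why a test asserting `.(map[string]any)` on it would panic, while
`credentials_ref` passes through unchanged.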

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — plan-review cycle 3 PASS + minor cleanups

Plan-phase adversarial review cycle 3: PASS (zero Critical, zero
Important). Two Minor doc-tightening fixes applied:
- Task 9 Step 4 now names bearer_token_ref explicitly and explains why
  the *_ref exemption is safe for it (SecretRef is a reference struct,
  not a raw secret) rather than claiming no *_ref field exists.
- engine.go line citations corrected to 311-326.

Plan phase complete — proceeding to alignment-check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: lock scope for cloud-sdk-extraction (alignment passed)

* refactor(module): split platform_kubernetes_kind.go into _core + _gke

Phase 0 precursor for cloud-SDK extraction. kindBackend/eksErrorBackend/
aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core
init(); gkeBackend (the only SDK-bearing k8s backend) moves to
platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same
five backend names registered. Isolates the lone SDK-bearing platform
file for a later clean deletion.

* docs(module): add file-purpose headers to platform_kubernetes _core/_gke

Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition
self-documenting for readers without the commit message.

* docs(module): fix stale 'Requires the Azure SDK' comment on aksBackend

aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk
consumer. The stale comment is what fooled an earlier inventory pass into
mis-counting platform_kubernetes_kind.go as an azure-sdk importer.

* ci(audit): enforce k8s-backend init() partition + run audit on every PR

Extends audit-cloud-symbols.sh --check with an init()-partition assertion
(platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only
gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned /
aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced
artifact rather than a prose claim.

* docs(plans): IaCStateBackend transport benchmark result — decision pending

Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a
worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar.

Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB
map[string]any (inherent to the bytes outputs_json wire format the
iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB
message cap. The plan's contingency remedy (streaming redesign) addresses
message-size-cap + memory-buffering, neither of which the benchmark hits;
streaming would not move the number.

Recommendation: retain unary (6.5ms is still negligible vs real cloud
backend I/O — the design's own bar-rationale). Deviation from the literal
5ms estimate-bar is surfaced to the operator, not absorbed silently.
Scope lock intact: Task 6 run + recorded, no task added/dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): Task 6 resolved — unary IaCStateBackend LOCKED (operator-confirmed)

Operator reviewed the 6.51ms benchmark + root-cause analysis and
confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's
gate resolves Unary LOCKED — the Task 4 proto stands, no streaming
redesign, PR 2/3 proceed unchanged.

Operator additionally raised a long-term architectural item: IaC state is
persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR)
with JSON-export + content-detection-on-read would be better for
processing/type-correctness/large-state scaling. Logged as a
post-extraction follow-up in both the benchmark decision record and the
design doc's Open items — distinct from the wire contract, cross-cutting
across all IaCStateStore impls, needs its own brainstorming pass. Not
actioned in this locked plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "chore: lock scope for cloud-sdk-extraction (alignment passed)"

This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954.

* docs(plans): amend cloud-sdk-extraction plan — PR 6 (ctx) + de-gate PR 4

Operator-approved scope amendment to the (reverted-to-Draft) plan:
- ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 /
  Task 15. Task 7 had to hardcode context.Background() in
  grpcIaCStateStore; the operator directed widening the interface now
  while we're at that boundary, so Phase B/C/D plugin backends inherit it
  ctx-ful. Bounded blast radius (~9 files, all in module/);
  interfaces.IaCStateStore already had ctx and is untouched.
- ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo.
  Operator: agents should operate in plugin repos directly; the real
  requirement is prompt clarity (absolute repo path stated up front), not
  a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes
  updated accordingly.
- Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented
  (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note
  updated to RESOLVED (unary locked).

Next: re-run alignment-check on the amended plan, then re-lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-lock scope for cloud-sdk-extraction (amended — alignment re-passed)

* feat(proto): add IaCStateBackend service to iac.proto

Strict 6-method contract mirroring module.IaCStateStore 1:1, with an
IaCState message mirroring module.IaCState. Free-form Outputs/Config
maps cross the wire as bytes outputs_json/config_json per the iac.proto
hard invariant (NO google.protobuf.Struct) — same pattern as
ResourceState.outputs_json. Unary RPCs. No TTL field. Regenerated
bindings via buf.
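
A rough sketch of the bytes-as-JSON conversion, using local stand-in
types rather than the generated bindings. `iacState`, `wireState`,
`toWire` and `fromWire` are hypothetical; only the outputs_json and
config_json field names come from this message.

```go
package main

import "encoding/json"

// iacState is a trimmed, hypothetical stand-in for module.IaCState.
type iacState struct {
	ResourceID string
	Outputs    map[string]any
	Config     map[string]any
}

// wireState is a hypothetical stand-in for the generated proto message.
type wireState struct {
	ResourceID  string
	OutputsJSON []byte // mirrors `bytes outputs_json`
	ConfigJSON  []byte // mirrors `bytes config_json`
}

// toWire encodes the free-form maps with encoding/json.
func toWire(s iacState) (wireState, error) {
	outputs, err := json.Marshal(s.Outputs)
	if err != nil {
		return wireState{}, err
	}
	config, err := json.Marshal(s.Config)
	if err != nil {
		return wireState{}, err
	}
	return wireState{ResourceID: s.ResourceID, OutputsJSON: outputs, ConfigJSON: config}, nil
}

// fromWire decodes the JSON payloads; empty payloads leave the maps nil.
func fromWire(w wireState) (iacState, error) {
	s := iacState{ResourceID: w.ResourceID}
	if len(w.OutputsJSON) > 0 {
		if err := json.Unmarshal(w.OutputsJSON, &s.Outputs); err != nil {
			return iacState{}, err
		}
	}
	if len(w.ConfigJSON) > 0 {
		if err := json.Unmarshal(w.ConfigJSON, &s.Config); err != nil {
			return iacState{}, err
		}
	}
	return s, nil
}
```

One consequence of this wire shape is that numbers decoded into
map[string]any come back as float64, so a round trip does not preserve
Go integer types.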

* test(module): add IaCStateBackend gRPC-vs-in-process benchmark harness

Drives a ~1 MB synthetic IaCState through Lock/GetState/SaveState/Unlock
both in-process (baseline) and over a real bufconn gRPC boundary
(post-extraction path). Self-contained (local benchStateToProto +
benchStateBackendServer; Task 7 promotes production versions). Feeds the
unary-vs-streaming proto-transport decision in the next task.

* test(wftest): add IaCStateBackend to iacServiceChecks coverage table

Task 4 added the IaCStateBackend service to iac.proto but missed the
corresponding iacServiceChecks row in wftest/bdd/strict_iac.go.
TestIaCServiceChecks_CoversEveryProtoService enforces parity between
iac.proto's services and that table — it was failing on the missing
entry. Belongs with PR 2 (the proto PR).

* fix(iac-bench): validate SaveState input, close bufconn, broaden import audit regex

Addresses Copilot review on PR #669:
- benchStateBackendServer.SaveState now rejects nil State and propagates
  JSON unmarshal failures as InvalidArgument instead of silently writing
  corrupted/empty data.
- BenchmarkIaCStateBackend_GRPC closes the bufconn listener; comment no
  longer implies bufconn size sets the gRPC message cap.
- audit-cloud-symbols.sh real_import() single-line regex now matches
  aliased/dot/blank imports (import foo "pkg" / . / _), not just plain.
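
A hedged sketch of the SaveState input validation from the first bullet. To stay standard-library only, `errInvalidArgument` stands in for a gRPC InvalidArgument status, and `wireState` / `decodeSaveState` are hypothetical names, not the benchmark server's real types.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errInvalidArgument stands in for a gRPC InvalidArgument status so the
// sketch needs no external modules (an assumption of this example).
var errInvalidArgument = errors.New("invalid argument")

// wireState is a hypothetical request payload.
type wireState struct {
	ResourceID  string
	OutputsJSON []byte
}

// decodeSaveState rejects a nil state and surfaces JSON decode failures
// instead of silently storing empty or corrupted data.
func decodeSaveState(s *wireState) (map[string]any, error) {
	if s == nil {
		return nil, fmt.Errorf("%w: state is nil", errInvalidArgument)
	}
	outputs := map[string]any{}
	if len(s.OutputsJSON) > 0 {
		if err := json.Unmarshal(s.OutputsJSON, &outputs); err != nil {
			return nil, fmt.Errorf("%w: outputs_json: %v", errInvalidArgument, err)
		}
	}
	return outputs, nil
}

// isInvalidArgument reports whether err wraps errInvalidArgument.
func isInvalidArgument(err error) bool {
	return errors.Is(err, errInvalidArgument)
}

func main() {
	_, err := decodeSaveState(nil)
	fmt.Println("nil state rejected:", isInvalidArgument(err))

	_, err = decodeSaveState(&wireState{ResourceID: "demo", OutputsJSON: []byte("{not json")})
	fmt.Println("bad JSON rejected:", isInvalidArgument(err))
}
```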

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 14, 2026
…+ security guards (#670)

* docs(plans): cloud-SDK extraction design — workflow core → strict-contract plugins

Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and
cloud.google.com/go + google.golang.org/api direct deps from
workflow core's module/ package.

Architecture: 3 extension surfaces, 3 strategies:
- IaC state backends → new IaCStateBackend strict proto contract;
  iac.state stays core, config.backend dispatches to plugin gRPC client.
- platform.* provisioners → new PlatformBackend strict proto contract;
  module types + provider: key stay core, kind backend stays in-core,
  cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract.
- standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload,
  storage.s3, storage.gcs) → plugin-native module/step types via the
  existing ModuleFactories/StepFactories SDK — no new contract.

Credentials (Option 1): each plugin-native module carries its own
credentials: block + builds aws.Config in-process; optional in-plugin
credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp
cloud_account files have no SDK import and stay.

4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp,
D digitalocean (spaces backend, minor bump + migration doc).

Includes Assumptions + Rollback sections + self-challenge top-3 doubts
(PlatformBackend over-generality, provider-separability fragility,
benchmark-could-invalidate-unary-default — all with mitigations
deferred to writing-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 1 revisions

Addresses 2 Critical + 5 Important findings from adversarial-design-review:

Critical:
- iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit
  home: deleted by Phase B's core PR; Phase D reframed from soft-compat to
  a real clean-break for the `spaces` backend. Goal "core drops aws-sdk-go-v2
  entirely" is now actually achieved by the phases as written + enforced by
  a go list -deps CI gate.
- kinesis: added Non-Goals entry explaining it's a transitive dep of
  modular/modules/eventbus/v2, not a direct workflow import — out of scope,
  with the go mod why chain documented so the literal ask is fully answered.

Important:
- Full grep-verified 13-file AWS inventory table in Phase B with per-file
  destinations; reconciled aws_api_gateway.go (route-sync module) vs
  platform_apigateway.go (provisioner) as two distinct files.
- aksBackend assigned to Phase A (Azure gets the PlatformBackend half too);
  platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with
  explicit always-compiles coordination.
- Proto contracts fold into existing plugin/external/proto/iac.proto
  (8 services already) instead of new files — matches precedent.
- New Security section: secret-redaction in config-version-store/tracing +
  gRPC interceptor logging are blocking writing-plans tasks; credentials_ref
  blast radius documented as strictly narrower than today's cloud.account.

Minor:
- IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore
  interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no
  speculative surface.
- Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR).
- IaCProviderRequired/ResourceDriver reuse promoted to a first-class
  Alternatives Considered entry with accept/reject rationale + retained as
  the gated fallback for PlatformBackend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 2 revisions

Addresses 2 Critical + 3 Important from cycle-2 review:

Critical:
- platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure
  mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with
  its own import block). The "always compiles across phases" property is
  now structural, not asserted. Added a verified per-file import-ownership
  table.
- Corrected the false Phase A rationale: aksBackend uses raw net/http REST,
  NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend
  region). The Azure go.mod drop comes entirely from iac_state_azure.go
  deletion + iac_module.go edit; aksBackend extraction is code-organisation,
  not a dependency change.
- Documented the eksBackend → cloud_account_aws.go call-graph edge as a
  hard same-commit atomicity constraint (verified: eksBackend calls
  awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138).

Important:
- Phase B core-PR bullet now explicitly lists "strip the spaces case from
  iac_module.go" (was only obliquely referenced).
- New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds
  contract field; SaveState lost-response retry → documented idempotent
  (full-state replace, last-writer-wins); plugin-unreachable → abort before
  mutation; PlatformBackend mid-Apply crash → identical to today's
  in-process risk, no new mitigation.
- §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no
  body-logging interceptor (grpc.NewServer(opts...) passthrough; only
  callback_server.go logs, never module config). Writing-plans adds a
  guard test instead of a conditional interceptor.

Minor: file-count table footnoted (count = importers, not deletions);
shared s3compat module added as Alternatives Considered #3 (deferred,
not rejected); self-challenge doubt numbering tidied (2 mitigations
cover 3 doubts, intentionally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): fix stale Phase A/B refs + Status line post-cycle-2

The sed pass in the cycle-2 commit ran from the wrong cwd, so the Status
line still said "cycle 1" and two interface-audit-spike references still
said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 3 revisions

Addresses 2 Critical + 2 Important from cycle-3 review:

Critical (same root — symbol-level coupling the import-block audit missed):
- parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and
  safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers
  the plugin-bound backend files call. An import-block audit is symbol-blind.
  Fix: Phase 0 now does TWO moves — the file split AND relocating both
  helpers into a new SDK-free core module/cloud_helpers.go. Per-file table
  gains a "cross-file symbol deps (the trap)" column listing every helper
  edge per backend. Phase 0 acceptance criteria now include a grep that no
  core file references the helpers from their old homes.
- §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file
  (module shell + kubernetesBackend interface + safeIntToInt32) — NOT
  touched by the split; only platform_kubernetes_kind.go (holds all 4
  backends) is split. Earlier draft conflated the two files.

Important:
- Per-file ownership table relabelled "intended post-split — verified by
  the Phase 0 build gate" (was asserted-as-verified against an unsplit
  file — same hand-waving class cycle-2 flagged for "always compiles").
- lease_ttl_seconds DROPPED from the Phase A proto. It was a contract
  field with no enforced semantics and no implementing backend in scope —
  YAGNI. §Failure-modes orphaned-lock reworked: documented limitation +
  operator-side lock-object delete for recovery; TTL is a planned ADDITIVE
  follow-up paired with a conformance test, shipped with the first backend
  that honors expiry. Added explicit Lock-contention behavior (immediate
  error, matches today's in-process IaCStateStore.Lock — no new waiting
  state).
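
A minimal sketch of the Lock-contention behavior described above (immediate error, no waiting state), assuming an in-memory lock table. `memLocks` and `errLockHeld` are hypothetical names, not the real IaCStateStore implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errLockHeld is a hypothetical sentinel returned on contention.
var errLockHeld = errors.New("state lock already held")

// memLocks is an illustrative in-memory lock table keyed by resource ID.
type memLocks struct {
	mu   sync.Mutex
	held map[string]bool
}

func newMemLocks() *memLocks {
	return &memLocks{held: map[string]bool{}}
}

// Lock fails immediately if the lock is held; it never blocks or retries.
func (l *memLocks) Lock(resourceID string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held[resourceID] {
		return fmt.Errorf("%w: %s", errLockHeld, resourceID)
	}
	l.held[resourceID] = true
	return nil
}

// Unlock releases the lock. No TTL is modelled, so an orphaned lock stays
// held until something calls Unlock for it.
func (l *memLocks) Unlock(resourceID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.held, resourceID)
}

func main() {
	locks := newMemLocks()
	fmt.Println("first lock:", locks.Lock("vpc-main"))
	fmt.Println("second lock:", locks.Lock("vpc-main"))
	locks.Unlock("vpc-main")
	fmt.Println("after unlock:", locks.Lock("vpc-main"))
}
```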

Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence
fixed; §Assumptions 2 notes Phase 0 de-risks it structurally.

Also: removed a stray stale cycle-1 copy of this doc that was sitting
untracked in the main workflow checkout (the canonical doc is here in
the feat/cloud-sdk-extraction worktree).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 4 revisions

Addresses 2 Critical + 2 Important from cycle-4 review:

Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd
cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't
— that's eksBackend) and aksBackend has no cross-file deps (it does —
CloudCredentials/CloudCredentialProvider from cloud_account.go, same as
gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The
symbol-ownership map is now a Phase 0 build artifact —
scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a
design-doc claim that rots on every edit. The design commits to the
*method* + the *known shape* (cloud_account.go stays core; all 3 cloud
backends bind to it via k.provider.GetCredentials; eksBackend additionally
binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no
cloud SDK).

Critical 2 — Phase 0's "split into four, zero logic change" silently
dropped the single func init() that registers kind/k3s/eks/gke/aks.
Splitting REQUIRES partitioning init() per-file (a distribution, not
zero-change). Phase 0 now has an explicit step 2 for the init() partition;
relabelled "behavior-equivalent" not "zero logic change"; k3s documented
as reusing kindBackend (both stay core).

Important 1 — platform.* cloud credential flow across PlatformBackend was
unspecified (aksBackend needs CloudCredentials — how does it reach the
plugin?). Added: PlatformBackend requests carry a CloudCredentials proto
message; engine resolves k.provider.GetCredentials() in-core (config-map
parsing, no SDK) and serialises it. Unified with the Architecture-3
credentials story — ONE CloudCredentials proto shape for both surfaces,
so secret-redaction has one shape to redact.

Important 2 — core actually imports FOUR cloud SDK trees, not three:
godo is still in cloud_account_do.go + 5 platform_do_*.go files.
§Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT
(user's ask was 3 trees), and the go list -deps gate is reworded to
assert "zero packages from the three in-scope trees" not "zero cloud
SDKs". All "zero cloud SDKs" phrasing reconciled throughout.

Minor: ListStates filter + remaining-proto-messages notes folded in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 5 revisions

Addresses 1 Critical + 2 Important from cycle-5 review:

Critical — the init()-partition fix (cycle-4) was kubernetes-only, but
the SAME defect class exists in platform_dns.go / platform_ecs.go /
platform_networking.go / platform_autoscaling.go: each has a single
func init() registering BOTH a core-staying `mock` backend AND a
plugin-bound `aws` backend. The old Phase B inventory moved those files
wholesale → would exile the mock backends + dangle the route53
registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go"
to a repo-wide uniform `_core.go` / `_<provider>.go` convention across
the WHOLE platform.* family. Every mixed init() is partitioned; the
audit script flags any init() registering a mix of core-staying +
plugin-bound factories as a CI failure. Phase B inventory rewritten to
delete only `_aws.go`/`_eks.go` files, never a mixed file.

Important 1 — the cycle-4 "Known shape" prose reintroduced
hand-maintained cross-file symbol claims (one already incomplete:
parseStringSlice consumers). FIX: cut all per-file symbol enumerations;
the section now states only invariants the script VERIFIES (not
discovers) + the method. No transcribed symbol lists remain.

Important 2 + own finding — cycle-4 said the engine resolves
credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's
awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile)
and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK.
FIX: §Architecture-2 corrected — engine passes the DECLARED credential
config (plain strings) in the CloudCredentials proto; the PLUGIN
resolves (incl. the SDK-bearing profile/role_arn paths). Both
cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B,
no core replacement — all AWS cred resolution moves plugin-side. azure/gcp
resolver files stay (their resolvers are genuinely SDK-free).

Minor — backend-name collision: core-reserved names (memory/filesystem/
postgres/kind/k3s/mock) cause a load-time error if a plugin collides,
not silent shadowing.
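
A hedged sketch of the collision rule: the reserved names are the ones listed above, while `backendRegistry` and `registerFromPlugin` are hypothetical names for illustration. Collisions between two plugins are not covered by the commit and are not modelled here.

```go
package main

import "fmt"

// coreReserved holds the core-reserved backend names from the commit message.
var coreReserved = map[string]struct{}{
	"memory": {}, "filesystem": {}, "postgres": {},
	"kind": {}, "k3s": {}, "mock": {},
}

// backendRegistry maps a backend name to the plugin that provides it.
type backendRegistry struct {
	owners map[string]string
}

func newBackendRegistry() *backendRegistry {
	return &backendRegistry{owners: map[string]string{}}
}

// registerFromPlugin returns an error at load time when a plugin tries to
// claim a core-reserved name, rather than silently shadowing the core backend.
func (r *backendRegistry) registerFromPlugin(plugin, backend string) error {
	if _, ok := coreReserved[backend]; ok {
		return fmt.Errorf("plugin %q: backend name %q is reserved by core", plugin, backend)
	}
	r.owners[backend] = plugin
	return nil
}

func main() {
	reg := newBackendRegistry()
	// "azure_blob" is an illustrative backend name, not taken from the source.
	fmt.Println(reg.registerFromPlugin("workflow-plugin-azure", "azure_blob"))
	fmt.Println(reg.registerFromPlugin("workflow-plugin-azure", "memory"))
}
```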

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 6 revisions

Addresses 1 Critical + 2 Important from cycle-6 review:

Critical — cycle-5's credential-flow fix replaced one false claim with
another: it said the CloudCredentials struct already holds "declared
config (plain strings incl. profile)". VERIFIED FALSE — the struct
(cloud_account.go:18) has no Profile field (profile lives in Extra map)
and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner
than the struct change the reviewer proposed: the struct needs NO change
(Extra map already carries markers, RoleARN field exists). Instead,
cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails
of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig,
sts.AssumeRole) are removed; they keep their SDK-free heads (record
declared inputs + an Extra["credential_source"] marker, exactly as
awsStaticResolver already does). After the edit the file is SDK-free
and stays in core alongside the azure/gcp resolver files. Only
cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider
+ awsProviderFrom) is deleted; its profile-chain/STS logic moves into
the plugin's buildAWSConfig. Every in-core resolver becomes uniformly
"declare, don't resolve"; the plugin honors the markers. No unregistered-
resolver failure mode — the resolver init() registrations stay.
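
A minimal sketch of the "declare, don't resolve" shape described above, assuming a reduced `CloudCredentials` struct. The `RoleARN` field, the `Extra` map, and the `credential_source` key come from the commit; the `Extra` value type, the config keys, the marker values, and the function names are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// CloudCredentials is a reduced, hypothetical version of the core struct.
type CloudCredentials struct {
	RoleARN string
	Extra   map[string]string
}

// declareRoleARN records the declared inputs plus a marker and makes no
// SDK call; the plugin is expected to honor the marker later.
func declareRoleARN(cfg map[string]any) (*CloudCredentials, error) {
	arn, _ := cfg["role_arn"].(string)
	if arn == "" {
		return nil, errors.New("role_arn is required")
	}
	creds := &CloudCredentials{
		RoleARN: arn,
		Extra:   map[string]string{"credential_source": "role_arn"},
	}
	if name, ok := cfg["session_name"].(string); ok && name != "" {
		creds.Extra["session_name"] = name
	}
	return creds, nil
}

// declareProfile records the profile name in Extra, since the struct has
// no dedicated Profile field.
func declareProfile(cfg map[string]any) (*CloudCredentials, error) {
	profile, _ := cfg["profile"].(string)
	if profile == "" {
		return nil, errors.New("profile is required")
	}
	return &CloudCredentials{
		Extra: map[string]string{"credential_source": "profile", "profile": profile},
	}, nil
}

func main() {
	creds, err := declareRoleARN(map[string]any{
		"role_arn": "arn:aws:iam::123456789012:role/demo",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(creds.RoleARN, creds.Extra["credential_source"])
}
```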

Important 1 — §Phase-0 misidentified the DNS file with the mixed init().
VERIFIED: platform_dns.go:66 has the init() (+ interface + factory
registry); platform_dns_backends.go has both impls + the route53 SDK
import, NO init(). DNS is a TWO-file split, unlike single-file
ecs/networking/autoscaling. §Phase-0 now states the per-family layout
explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling
one-file) and notes the audit script determines it.

Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit
deferred-resolution markers for env/CLI/managed-identity/workload-identity/
profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5
now state the plugin MUST implement marker handling for every deferred
type, not just AWS profile/role_arn.

Minor — safeIntToInt32 relocation rationale clarified (it's a clean
copy-source for the plugin-bound files, not a hard core necessity);
parseStringSlice IS a hard necessity (its file is deleted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 7 revisions

Addresses 2 Critical from cycle-7 review (architecture confirmed sound;
these are the last two extraction-mechanic precision gaps):

C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the
awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a
clean contiguous tail, but awsRoleARNResolver's SDK block (base-config
build + sts.AssumeRole, ~45 lines) is the larger half of the method,
after the declared-input recording. FIX: §Architecture-2 re-characterizes
the edit as a deliberate Resolve() body REWRITE (not a one-line snip) —
explicitly per-resolver. Added a Phase B CI invariant: an import-block
grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go
has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not
prose-asserted.

C2 — cloud_account_aws.go defines FOUR symbols, not one; the
symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed:
- AWSConfigProvider interface signature names aws.Config → CANNOT stay
  in core, deleted with the file.
- awsProviderFrom → deleted with the interface.
- ValidateCredentials → verified NO real caller (only a comment ref in
  cmd/wfctl/deploy.go:866) → deletes cleanly.
- The 8 awsProviderFrom consumers are all verified plugin-bound — but
  each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the
  plugin there's no cloud.account to type-assert. §Cross-file-coupling
  invariant 3 now states Phase B must REWRITE all 8 consumers to obtain
  creds from the CloudCredentials proto + buildAWSConfig — explicit
  Phase B scope, not a footnote. Phase B table atomicity column updated.

Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in
Phase 0 so the dns family conforms to the uniform _core.go/_aws.go
naming; no special-case three-file layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-8 re-baseline against post-#653 main

Cycle-8 adversarial review caught the design's file/symbol inventory as
stale: it predated issue #653 (closed 2026-05-13), which already removed
the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild +
EKS backends.

Re-baselined every file/symbol claim against origin/main HEAD (worktree
confirmed 0 commits behind origin/main):

- Added "Relationship to issue #653" section — this design is #653's
  named successor, extracting the AWS surface #653 scoped out
  ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces.
- Problem table corrected: AWS 6 real-import files (not 13), Azure 3,
  GCP 3. storage_artifact_s3.go is comment-only — stays in core.
- cloud_account_aws.go is dead code — zero non-test consumers verified;
  deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers
  removed by #653).
- Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go);
  parseStringSlice + safeIntToInt32 no longer exist — helper-relocation
  task deleted.
- PlatformBackend now serves only aks + gke (eks already a #653 SDK-free
  stub); interface-audit spike audits one interface, not five.
- Phase B inventory rewritten; Phase A/C file lists corrected.
- Self-challenge doubt #4 + Assumption 7 added: inventory staleness is
  the cycle-8 defect class; audit script makes it CI-enforced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-9 re-baseline + audit script

Cycle-9 adversarial review caught aksBackend mis-classified as an
azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match
is a stale doc comment (line 332) — aksBackend.azureToken is a plain
net/http OAuth2 client. An import-block-disciplined re-survey found a
second comment-only false positive: nosql_dynamodb.go.

Structural fix for the recurring "grep matched a comment" defect class:
added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks
(never comments) and emits the comment-immune real-import map. Its output
now populates every file table in the design — prose claims replaced by
a build artifact. Formalized + CI-wired in Phase 0.
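
The committed audit is a shell script; as a hedged Go analogue of the same comment-immune idea, the sketch below uses go/parser in ImportsOnly mode so that an SDK path mentioned only in a comment never counts. `realImports`, `cloudPrefixes`, and the sample source are illustrative assumptions, not the script's logic.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// cloudPrefixes lists the three in-scope SDK trees as import-path prefixes.
var cloudPrefixes = []string{
	"github.com/aws/aws-sdk-go-v2",
	"github.com/Azure/azure-sdk-for-go",
	"cloud.google.com/go",
}

// sample mentions the Azure SDK only in a comment and imports two others.
const sample = `package module

// Requires github.com/Azure/azure-sdk-for-go (stale comment, not an import).
import (
	"net/http"
	container "cloud.google.com/go/container/apiv1"
)

import _ "github.com/aws/aws-sdk-go-v2/config"
`

// realImports parses only the import declarations of src and returns the
// paths that start with one of the given prefixes.
func realImports(src string, prefixes []string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, spec := range f.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, p := range prefixes {
			if strings.HasPrefix(path, p) {
				hits = append(hits, path)
				break
			}
		}
	}
	return hits, nil
}

func main() {
	hits, err := realImports(sample, cloudPrefixes)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println("REAL import:", h)
	}
}
```

Because the parser works on import specs, aliased, dot, blank, single-line, and parenthesized forms are all handled the same way.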

Corrected inventory (audit-script output): AWS 5 real-import files (not
6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go
are comment-only stubs — out of scope, stay in core.

Design consequences of aksBackend being SDK-free:
- Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all
  stay in core.
- Architecture §2 no longer proposes a new PlatformBackend contract. The
  gke cross-process mechanism is gated on an interface-audit spike whose
  preferred outcome is folding into the existing ResourceDriver contract
  — a dedicated contract for one backend is YAGNI.
- Phase A (Azure) is now pure IaCStateBackend — touches no platform file.
- Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/
  eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and
  fixes the stale line-332 comment.
- The gke platform extraction + its contract decision move to Phase C.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-10 re-baseline, AWS scope boundary explicit

Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9
audit script scanned only module/, missing five aws-sdk-go-v2 importers
under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal
("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by
the four phases as written.

Structural fix — third defect-class variant closed:
- audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and
  splits results module/ vs. elsewhere. Comment-immune (cycle 9) +
  scope-complete (cycle 10) + CI-enforced (Phase 0).

Whole-repo inventory result:
- Azure + GCP SDK usage is entirely module/-resident → Phases A and C
  drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate).
- aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files
  in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go.

Scope decision: the out-of-module/ AWS surface is exactly #653's
deliberately-retained "RBAC/secrets/artifact stay" scope (plus the
provider/aws deploy provider). This design does NOT unilaterally
override #653's recent documented decision — it scopes that surface OUT
(new Non-Goal, parallel to godo) and logs a recommended successor issue.

Consequences threaded through the doc:
- Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is
  module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the
  out-of-scope surface).
- Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP,
  module/-scoped zero for AWS.
- Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's
  scope decision is respected, not contested.
- Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check +
  sessionName extraction sit between declared-input recording and the
  SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale
  line-18 backend-list comment), M2 (internal/legacyaws noted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-11 PASS, minor cleanups

Adversarial review cycle 11: PASS (zero Critical, zero Important). Two
Minor nits applied:
- audit-cloud-symbols.sh: real_import now also matches single-line
  `import "..."` form, not just parenthesized blocks — closes the one
  latent parser false-negative the reviewer flagged.
- §Goals: clarified that the module/-scoped AWS-zero `--check` assertion
  is deferred-implementation added in Phase C (the committed script only
  enforces the cloud_account_aws_creds.go post-Phase-B invariant today),
  parallel to the Phase 0 init()-partition deferral.

Design phase complete — proceeding to writing-plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): audit-cloud-symbols single-line-import grep poisoned the pipe

The cycle-11 single-line-import hardening added an inner `grep -E '^import "'`
whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`,
making real_import() return false for every file lacking a single-line
import. Added `|| true` on the inner grep. Verified: full report restored,
all REAL/comment-only classifications correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction implementation plan (Phase 0 + Phase A)

Bite-sized TDD plan for the first executable increment: Phase 0 (split
platform_kubernetes_kind.go, fix the stale comment, wire the audit
script into CI) + Phase A (IaCStateBackend proto + benchmark-gated
proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging
guard, workflow-plugin-azure implementation, core deletion dropping
azure-sdk from go.mod).

14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on
plan — their concrete tasks depend on Phase A's outputs (the
benchmark-validated proto shape, the host-resolution pattern, the
plugin-side serve path), so planning them now would be fiction. The
design doc remains the authoritative B/C/D spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — address plan-phase adversarial review

Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor).
All addressed:

- C1 (Critical): Task 4's proto used google.protobuf.Struct, which
  iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config
  as `bytes outputs_json`/`bytes config_json` (the established
  ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not
  structpb. Removed the bogus struct.proto import step.
- I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives
  there), not `cd plugin/external/proto`.
- I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up
  the new benchmarks automatically) — no redundant harness; clarified the
  task is a one-time decision gate.
- I3: Task 8's embedded research spike resolved at plan time — engine.go
  was read; integration is the design-sanctioned package-level
  module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal.
  Tasks 8/13/14 now have concrete file sets.
- I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo,
  workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly.
- M1: Task 5's benchmark file is now genuinely self-contained (local
  benchStateToProto + benchStateBackendServer; no forward references).
- M2: Task 3 names ci.yml directly, places the audit job beside the
  existing godo-banned/aws-sdk-banned grep-gate jobs.
- M3: Task 6 pins benchstat (go install + bare invocation).
- M4: Task 9 states the redaction gap is verified against
  step_output_redactor.go:7-19, not a live deduction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — plan-review cycle 2 fixes

Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed
resolved; 2 new Important + 2 Minor surfaced by the new-defect scan.
All addressed:

- I-A (Important): Task 9's redaction test was inconsistent with the
  actual redactMap behavior — a key named `credentials` matches the
  existing `credential` pattern and is wholesale-replaced with the
  placeholder STRING before any recursion, so the test's
  `.(map[string]any)` assertion panicked. Reworked Task 9: the
  `credentials:` block is ALREADY redacted wholesale (regression-tested);
  the real gap is `credentials_ref` being over-redacted (it's a module
  name, not a secret) — fix is a narrow `*_ref`-suffix exemption in
  isSensitiveField, not camelCase leaf patterns (which would be dead code
  given wholesale redaction happens first).
- I-B (Important): Task 14's engine.go integration seam was
  under-specified and would fight loadPluginInternal's no-concrete-types
  precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now
  defines an `IaCStateBackendProvider` optional interface and
  type-asserts it in loadPluginInternal exactly like the existing
  stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter
  implements it. Concrete file set + code sketch added.
- M-i: Task 6's benchmark.yml description corrected (runs `go test
  -bench=.` inline, not `make bench-baseline`).
- M-ii: Task 4 notes the proto README's plugin.proto-specific wording is
  stale; trust root buf.yaml.
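
A hedged sketch of the optional-interface seam from I-B. The interface name `IaCStateBackendProvider` is from the plan; its method set, the two plugin types, and `collectStateBackends` are assumptions made for illustration and do not reproduce engine.go.

```go
package main

import "fmt"

// IaCStateBackendProvider is optional: a plugin adapter may or may not
// implement it. The method shown here is hypothetical.
type IaCStateBackendProvider interface {
	IaCStateBackends() []string
}

// plainPlugin does not provide state backends.
type plainPlugin struct{}

// azurePlugin does; "azure_blob" is an illustrative backend name.
type azurePlugin struct{}

func (azurePlugin) IaCStateBackends() []string {
	return []string{"azure_blob"}
}

// collectStateBackends type-asserts each loaded plugin against the optional
// interface, so the loader never names a concrete plugin type.
func collectStateBackends(plugins ...any) []string {
	var names []string
	for _, p := range plugins {
		if prov, ok := p.(IaCStateBackendProvider); ok {
			names = append(names, prov.IaCStateBackends()...)
		}
	}
	return names
}

func main() {
	fmt.Println(collectStateBackends(plainPlugin{}, azurePlugin{}))
}
```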

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — plan-review cycle 3 PASS + minor cleanups

Plan-phase adversarial review cycle 3: PASS (zero Critical, zero
Important). Two Minor doc-tightening fixes applied:
- Task 9 Step 4 now names bearer_token_ref explicitly and explains why
  the *_ref exemption is safe for it (SecretRef is a reference struct,
  not a raw secret) rather than claiming no *_ref field exists.
- engine.go line citations corrected to 311-326.
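
A minimal sketch of the Task 9 `*_ref` exemption, assuming a substring-based pattern list. The pattern list, the placeholder string, and this `redactMap` body are illustrative assumptions and are not copied from step_output_redactor.go.

```go
package main

import (
	"fmt"
	"strings"
)

// sensitivePatterns is a hypothetical pattern list for this sketch.
var sensitivePatterns = []string{"credential", "secret", "password", "token"}

const placeholder = "[REDACTED]"

// isSensitiveField exempts keys ending in "_ref" (they name a reference,
// not a raw secret) before applying the substring patterns.
func isSensitiveField(key string) bool {
	k := strings.ToLower(key)
	if strings.HasSuffix(k, "_ref") {
		return false
	}
	for _, p := range sensitivePatterns {
		if strings.Contains(k, p) {
			return true
		}
	}
	return false
}

// redactMap replaces sensitive values wholesale before any recursion, so a
// "credentials" block becomes a plain string, never a redacted sub-map.
func redactMap(m map[string]any) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		if isSensitiveField(k) {
			out[k] = placeholder
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			out[k] = redactMap(nested)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	cfg := map[string]any{
		"credentials":     map[string]any{"access_key": "AKIA-demo"},
		"credentials_ref": "shared-aws-creds",
	}
	fmt.Println(redactMap(cfg))
}
```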

Plan phase complete — proceeding to alignment-check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: lock scope for cloud-sdk-extraction (alignment passed)

* refactor(module): split platform_kubernetes_kind.go into _core + _gke

Phase 0 precursor for cloud-SDK extraction. kindBackend/eksErrorBackend/
aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core
init(); gkeBackend (the only SDK-bearing k8s backend) moves to
platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same
five backend names registered. Isolates the lone SDK-bearing platform
file for a later clean deletion.

* docs(module): add file-purpose headers to platform_kubernetes _core/_gke

Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition
self-documenting for readers without the commit message.

* docs(module): fix stale 'Requires the Azure SDK' comment on aksBackend

aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk
consumer. The stale comment is what fooled an earlier inventory pass into
mis-counting platform_kubernetes_kind.go as an azure-sdk importer.

* ci(audit): enforce k8s-backend init() partition + run audit on every PR

Extends audit-cloud-symbols.sh --check with an init()-partition assertion
(platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only
gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned /
aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced
artifact rather than a prose claim.

* docs(plans): IaCStateBackend transport benchmark result — decision pending

Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a
worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar.

Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB
map[string]any (inherent to the bytes outputs_json wire format the
iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB
message cap. The plan's contingency remedy (streaming redesign) addresses
message-size-cap + memory-buffering, neither of which the benchmark hits;
streaming would not move the number.

Recommendation: retain unary (6.5ms is still negligible vs real cloud
backend I/O — the design's own bar-rationale). Deviation from the literal
5ms estimate-bar is surfaced to the operator, not absorbed silently.
Scope lock intact: Task 6 run + recorded, no task added/dropped.
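
To illustrate where the root-cause analysis places the cost, here is a hedged sketch that round-trips a roughly 1 MB synthetic `map[string]any` through encoding/json with no gRPC involved. `buildState` and `roundTrip` are hypothetical helpers, not the benchmark harness, and timings will differ by machine.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// buildState returns a synthetic state with n entries of 1 KiB each.
func buildState(n int) map[string]any {
	state := make(map[string]any, n)
	value := strings.Repeat("x", 1024)
	for i := 0; i < n; i++ {
		state[fmt.Sprintf("output_%04d", i)] = value
	}
	return state
}

// roundTrip marshals and unmarshals the state, returning the encoded size.
func roundTrip(state map[string]any) (int, error) {
	raw, err := json.Marshal(state)
	if err != nil {
		return 0, err
	}
	var decoded map[string]any
	if err := json.Unmarshal(raw, &decoded); err != nil {
		return 0, err
	}
	if len(decoded) != len(state) {
		return 0, fmt.Errorf("decoded %d entries, want %d", len(decoded), len(state))
	}
	return len(raw), nil
}

func main() {
	state := buildState(1024)
	start := time.Now()
	size, err := roundTrip(state)
	if err != nil {
		panic(err)
	}
	fmt.Printf("encoded %d bytes, JSON round trip took %s\n", size, time.Since(start))
}
```

A single timed run like this is only indicative; the recorded figures above come from the repository's Go benchmarks.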

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): Task 6 resolved — unary IaCStateBackend LOCKED (operator-confirmed)

Operator reviewed the 6.51ms benchmark + root-cause analysis and
confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's
gate resolves Unary LOCKED — the Task 4 proto stands, no streaming
redesign, PR 2/3 proceed unchanged.

Operator additionally raised a long-term architectural item: IaC state is
persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR)
with JSON-export + content-detection-on-read would be better for
processing/type-correctness/large-state scaling. Logged as a
post-extraction follow-up in both the benchmark decision record and the
design doc's Open items — distinct from the wire contract, cross-cutting
across all IaCStateStore impls, needs its own brainstorming pass. Not
actioned in this locked plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "chore: lock scope for cloud-sdk-extraction (alignment passed)"

This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954.

* docs(plans): amend cloud-sdk-extraction plan — PR 6 (ctx) + de-gate PR 4

Operator-approved scope amendment to the (reverted-to-Draft) plan:
- ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 /
  Task 15. Task 7 had to hardcode context.Background() in
  grpcIaCStateStore; the operator directed widening the interface now
  while we're at that boundary, so Phase B/C/D plugin backends inherit it
  ctx-ful. Bounded blast radius (~9 files, all in module/);
  interfaces.IaCStateStore already had ctx and is untouched.
- ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo.
  Operator: agents should operate in plugin repos directly; the real
  requirement is prompt clarity (absolute repo path stated up front), not
  a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes
  updated accordingly.
- Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented
  (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note
  updated to RESOLVED (unary locked).
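
The ADR 0033 interface widening in the first bullet can be pictured with a small Go sketch. It assumes a two-method slice of the store interface (Lock/Unlock) with hypothetical signatures; the real module.IaCStateStore signatures are not shown in this log. lockWithBackground mimics a caller that has to hardcode context.Background(), and lockAfterCancel shows what a ctx-aware store can do that the hardcoded form cannot.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// lockStore is a hypothetical two-method slice of the six-method store
// interface; the real signatures are not shown in this log.
type lockStore interface {
	Lock(ctx context.Context, resourceID string) error
	Unlock(ctx context.Context, resourceID string) error
}

var errLocked = errors.New("state is already locked")

// memoryStore is an illustrative in-memory implementation.
type memoryStore struct {
	mu    sync.Mutex
	locks map[string]bool
}

func newMemoryStore() *memoryStore {
	return &memoryStore{locks: make(map[string]bool)}
}

// Lock checks cancellation first, then fails immediately on contention
// instead of waiting for the holder to release.
func (s *memoryStore) Lock(ctx context.Context, resourceID string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks[resourceID] {
		return errLocked
	}
	s.locks[resourceID] = true
	return nil
}

// Unlock releases the lock unless the context is already cancelled.
func (s *memoryStore) Unlock(ctx context.Context, resourceID string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.locks, resourceID)
	return nil
}

// lockWithBackground is the call shape of an adapter that has no caller
// context to forward and so hardcodes context.Background().
func lockWithBackground(s lockStore, resourceID string) error {
	return s.Lock(context.Background(), resourceID)
}

// lockAfterCancel passes an already-cancelled context; the store returns
// before touching any state.
func lockAfterCancel(s lockStore, resourceID string) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return s.Lock(ctx, resourceID)
}

func main() {
	store := newMemoryStore()
	fmt.Println(lockWithBackground(store, "vpc")) // first caller wins
	fmt.Println(lockWithBackground(store, "vpc")) // contention: immediate error
	fmt.Println(lockAfterCancel(store, "subnet")) // cancelled before the call
}
```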

Next: re-run alignment-check on the amended plan, then re-lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-lock scope for cloud-sdk-extraction (amended — alignment re-passed)

* feat(proto): add IaCStateBackend service to iac.proto

Strict 6-method contract mirroring module.IaCStateStore 1:1, with an
IaCState message mirroring module.IaCState. Free-form Outputs/Config
maps cross the wire as bytes outputs_json/config_json per the iac.proto
hard invariant (NO google.protobuf.Struct) — same pattern as
ResourceState.outputs_json. Unary RPCs. No TTL field. Regenerated
bindings via buf.

* test(module): add IaCStateBackend gRPC-vs-in-process benchmark harness

Drives a ~1 MB synthetic IaCState through Lock/GetState/SaveState/Unlock
both in-process (baseline) and over a real bufconn gRPC boundary
(post-extraction path). Self-contained (local benchStateToProto +
benchStateBackendServer; Task 7 promotes production versions). Feeds the
unary-vs-streaming proto-transport decision in the next task.

* test(wftest): add IaCStateBackend to iacServiceChecks coverage table

Task 4 added the IaCStateBackend service to iac.proto but missed the
corresponding iacServiceChecks row in wftest/bdd/strict_iac.go.
TestIaCServiceChecks_CoversEveryProtoService enforces parity between
iac.proto's services and that table — it was failing on the missing
entry. Belongs with PR 2 (the proto PR).

* feat(module): IaCState proto converters + grpcIaCStateStore client adapter

grpcIaCStateStore implements module.IaCStateStore over an
IaCStateBackendClient — the host-side half of the new contract.
iacStateToProto/iacStateFromProto convert the free-form Outputs/Config
maps via encoding/json (no structpb — iac.proto hard invariant).
iacStateBackendServer is the production server type. Promotes these out
of the benchmark file so one canonical copy is shared.
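
A rough Go sketch of the converter idea follows, assuming stand-in types: iacState and wireState are hypothetical simplifications of module.IaCState and the generated proto message, and toWire/fromWire are illustrative names, not the functions added in this commit. The point is only that the free-form maps cross the boundary as JSON-encoded bytes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// iacState is a hypothetical stand-in for module.IaCState; only the two
// free-form maps named in the commit message are modelled.
type iacState struct {
	ResourceID string
	Outputs    map[string]any
	Config     map[string]any
}

// wireState is a hypothetical stand-in for the generated proto message:
// the maps travel as JSON-encoded bytes, not google.protobuf.Struct.
type wireState struct {
	ResourceID  string
	OutputsJSON []byte
	ConfigJSON  []byte
}

// encodeMap keeps a nil map as nil bytes so "absent" survives the trip.
func encodeMap(m map[string]any) ([]byte, error) {
	if m == nil {
		return nil, nil
	}
	return json.Marshal(m)
}

// decodeMap is the inverse of encodeMap.
func decodeMap(raw []byte) (map[string]any, error) {
	if len(raw) == 0 {
		return nil, nil
	}
	var m map[string]any
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return m, nil
}

func toWire(s iacState) (wireState, error) {
	outputs, err := encodeMap(s.Outputs)
	if err != nil {
		return wireState{}, fmt.Errorf("encode outputs: %w", err)
	}
	config, err := encodeMap(s.Config)
	if err != nil {
		return wireState{}, fmt.Errorf("encode config: %w", err)
	}
	return wireState{ResourceID: s.ResourceID, OutputsJSON: outputs, ConfigJSON: config}, nil
}

func fromWire(w wireState) (iacState, error) {
	outputs, err := decodeMap(w.OutputsJSON)
	if err != nil {
		return iacState{}, fmt.Errorf("decode outputs: %w", err)
	}
	config, err := decodeMap(w.ConfigJSON)
	if err != nil {
		return iacState{}, fmt.Errorf("decode config: %w", err)
	}
	return iacState{ResourceID: w.ResourceID, Outputs: outputs, Config: config}, nil
}

func main() {
	in := iacState{ResourceID: "db", Outputs: map[string]any{"replicas": 3}}
	w, err := toWire(in)
	if err != nil {
		panic(err)
	}
	out, err := fromWire(w)
	fmt.Println(string(w.OutputsJSON), out.Outputs["replicas"], err)
}
```

One property of this approach worth keeping in mind: encoding/json decodes every JSON number in a map[string]any as float64, so an int stored in Outputs comes back as a float64 after the round trip.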

* docs(module): note context.Background() follow-up on grpcIaCStateStore

Code-review Minor: the spec asked for the hardcoded context.Background()
to be acknowledged as a known follow-up (IaCStateStore has no ctx param)
rather than silently used.

* feat(module): engine-side iac.state plugin-backend registry + dispatch

A package-level iacStateBackendRegistry maps a backend name to a
pb.IaCStateBackendClient; the engine populates it at plugin-load time
(Task 14). IaCModule.Init()'s switch gains a default arm that resolves
non-core backend names from the registry, constructing a
grpcIaCStateStore. Reserved core names (memory/filesystem/postgres) are
rejected at registration. The existing in-process backend cases
(incl. azure_blob) are untouched here — the plumbing exists and is
tested; PR 5 flips azure_blob onto it.
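
The registry-plus-default-arm shape described above might look roughly like the following Go sketch. backendClient and namedClient are hypothetical stand-ins for pb.IaCStateBackendClient, and backendRegistry/resolve are illustrative names; the sketch returns a description string where the real code would construct a store.

```go
package main

import (
	"fmt"
	"sync"
)

// backendClient is a hypothetical stand-in for pb.IaCStateBackendClient.
type backendClient interface {
	Describe() string
}

// namedClient is a trivial fake used only for this sketch.
type namedClient string

func (c namedClient) Describe() string { return string(c) }

// coreBackends lists the names the commit message calls reserved.
var coreBackends = map[string]bool{"memory": true, "filesystem": true, "postgres": true}

type backendRegistry struct {
	mu      sync.RWMutex
	clients map[string]backendClient
}

func newBackendRegistry() *backendRegistry {
	return &backendRegistry{clients: make(map[string]backendClient)}
}

// register refuses reserved core names so a plugin cannot shadow them.
func (r *backendRegistry) register(name string, c backendClient) error {
	if coreBackends[name] {
		return fmt.Errorf("backend name %q is reserved for core", name)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.clients[name] = c
	return nil
}

// resolve mirrors a switch with a default arm: core names are handled
// in-process, anything else must come from the registry.
func (r *backendRegistry) resolve(name string) (string, error) {
	switch name {
	case "memory", "filesystem", "postgres":
		return "in-process " + name, nil
	default:
		r.mu.RLock()
		c, ok := r.clients[name]
		r.mu.RUnlock()
		if !ok {
			return "", fmt.Errorf("unknown iac.state backend %q", name)
		}
		return "plugin " + c.Describe(), nil
	}
}

func main() {
	reg := newBackendRegistry()
	fmt.Println(reg.register("postgres", namedClient("shadow")))
	fmt.Println(reg.register("example_blob", namedClient("example-plugin")))
	fmt.Println(reg.resolve("example_blob"))
	fmt.Println(reg.resolve("missing"))
}
```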

* feat(module): exempt *_ref keys from redaction; lock in credentials: redaction

Option-1 credentials move raw cloud secrets inline into plugin-native
module config under a credentials: key — already redacted wholesale by
the existing 'credential' pattern (regression test added). But that same
pattern over-redacts credentials_ref:, which holds a module NAME, not a
secret. Adds a narrow *_ref-suffix exemption to isSensitiveField so
reference keys are preserved for trace debuggability.

* refactor(module): name the _ref redaction-exemption suffix as a const

Code-review Minor: refFieldSuffix const for consistency with the
existing safeFieldSuffix (_display) exemption.

* test(plugin/external): guard against gRPC body-logging interceptors

CreateModule requests carry inline credentials: blocks (Option-1
credentials model). This guard fails CI if any plugin/external/ file
gains a gRPC interceptor option, forcing a reviewer to confirm it cannot
log request bodies. Implements the cloud-sdk-extraction design's Security
guard-test requirement.

* test(plugin/external): broaden interceptor guard to Stream interceptors

Code-review catch: the guard regex covered Unary only. CreateModule is
unary today, but a future streaming RPC carrying credentials must not
slip a stream interceptor past the guard. Now matches (Unary|Stream).
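
As an illustration of the kind of check such a guard performs, here is a small Go sketch that scans source text for grpc-go interceptor options. The pattern is an assumption, since the actual regex is not reproduced in this log; it targets the grpc-go option constructors (UnaryInterceptor, StreamInterceptor, and their Chain and With variants). findInterceptorLines is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// interceptorOption matches grpc-go server and dial options that install an
// interceptor. This pattern is an assumption, not the guard test's own regex.
var interceptorOption = regexp.MustCompile(`grpc\.(With)?(Chain)?(Unary|Stream)Interceptor\(`)

// findInterceptorLines returns the 1-based numbers of the lines in src that
// install a gRPC interceptor.
func findInterceptorLines(src string) []int {
	var hits []int
	for i, line := range strings.Split(src, "\n") {
		if interceptorOption.MatchString(line) {
			hits = append(hits, i+1)
		}
	}
	return hits
}

func main() {
	src := "opts = append(opts,\n\tgrpc.StreamInterceptor(logBodies),\n)\nsrv := grpc.NewServer(opts...)\n"
	fmt.Println(findInterceptorLines(src))
}
```

A guard built this way fails on any match and leaves the judgment (does this interceptor log request bodies?) to the reviewer, which is the behavior the commit message describes.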

* fix(iac-host): narrow _ref redaction exemption, validate registry input, harden guard test

Addresses Copilot review on PR #670:
- step_output_redactor: the "_ref" suffix no longer blanket-bypasses
  redaction. It exempts only structural-reference words ("credential"),
  so credentials_ref is preserved but bearer_token_ref / api_key_ref /
  secret_ref still redact (token/api_key/secret are value-bearing).
- iac_state_plugin_registry.register: rejects empty/whitespace names and
  nil clients; trims the name before use.
- grpc_logging_guard_test: walks the whole plugin/external/ tree (catches
  subpackages like sdk/), skips generated *.pb.go / proto/ files to avoid
  false positives, and adds a real interceptorAllowlist mechanism the
  failure message now references.
- iac_state_grpc_client_test + benchmark_iac_state_backend_test: close the
  bufconn listener via t.Cleanup/b.Cleanup; benchmark comment no longer
  implies bufconn size sets the gRPC message cap.
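
The narrowed _ref exemption from the first bullet can be sketched in Go as follows. The word lists are illustrative assumptions (the real pattern set lives in step_output_redactor.go and is not shown here), and this isSensitiveField is a simplified reconstruction, not the committed function.

```go
package main

import (
	"fmt"
	"strings"
)

const refFieldSuffix = "_ref"

// referenceWords name things a *_ref key can point at by module name;
// valueWords always denote secret material. Both lists are illustrative.
var (
	referenceWords = []string{"credential"}
	valueWords     = []string{"secret", "token", "api_key", "password"}
)

func containsAny(s string, words []string) bool {
	for _, w := range words {
		if strings.Contains(s, w) {
			return true
		}
	}
	return false
}

// isSensitiveField reports whether the value stored under key should be
// redacted.
func isSensitiveField(key string) bool {
	k := strings.ToLower(key)
	hasValueWord := containsAny(k, valueWords)
	hasReferenceWord := containsAny(k, referenceWords)
	// Narrow exemption: a *_ref key is spared only when the sole sensitive
	// word in it is a structural-reference word.
	if strings.HasSuffix(k, refFieldSuffix) && hasReferenceWord && !hasValueWord {
		return false
	}
	return hasValueWord || hasReferenceWord
}

func main() {
	for _, key := range []string{"credentials", "credentials_ref", "bearer_token_ref", "api_key_ref", "secret_ref", "region"} {
		fmt.Printf("%-18s redact=%v\n", key, isSensitiveField(key))
	}
}
```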

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 14, 2026
…teStore (#671)

* docs(plans): cloud-SDK extraction design — workflow core → strict-contract plugins

Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and
cloud.google.com/go + google.golang.org/api direct deps from
workflow core's module/ package.

Architecture: 3 extension surfaces, 3 strategies:
- IaC state backends → new IaCStateBackend strict proto contract;
  iac.state stays core, config.backend dispatches to plugin gRPC client.
- platform.* provisioners → new PlatformBackend strict proto contract;
  module types + provider: key stay core, kind backend stays in-core,
  cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract.
- standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload,
  storage.s3, storage.gcs) → plugin-native module/step types via the
  existing ModuleFactories/StepFactories SDK — no new contract.

Credentials (Option 1): each plugin-native module carries its own
credentials: block + builds aws.Config in-process; optional in-plugin
credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp
cloud_account files have no SDK import and stay.

4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp,
D digitalocean (spaces backend, minor bump + migration doc).

Includes Assumptions + Rollback sections + self-challenge top-3 doubts
(PlatformBackend over-generality, provider-separability fragility,
benchmark-could-invalidate-unary-default — all with mitigations
deferred to writing-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 1 revisions

Addresses 2 Critical + 5 Important findings from adversarial-design-review:

Critical:
- iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit
  home: deleted by Phase B's core PR; Phase D reframed from soft-compat to
  a real clean-break for the `spaces` backend. Goal "core drops aws-sdk-go-v2
  entirely" is now actually achieved by the phases as written + enforced by
  a go list -deps CI gate.
- kinesis: added Non-Goals entry explaining it's a transitive dep of
  modular/modules/eventbus/v2, not a direct workflow import — out of scope,
  with the go mod why chain documented so the literal ask is fully answered.

Important:
- Full grep-verified 13-file AWS inventory table in Phase B with per-file
  destinations; reconciled aws_api_gateway.go (route-sync module) vs
  platform_apigateway.go (provisioner) as two distinct files.
- aksBackend assigned to Phase A (Azure gets the PlatformBackend half too);
  platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with
  explicit always-compiles coordination.
- Proto contracts fold into existing plugin/external/proto/iac.proto
  (8 services already) instead of new files — matches precedent.
- New Security section: secret-redaction in config-version-store/tracing +
  gRPC interceptor logging are blocking writing-plans tasks; credentials_ref
  blast radius documented as strictly narrower than today's cloud.account.

Minor:
- IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore
  interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no
  speculative surface.
- Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR).
- IaCProviderRequired/ResourceDriver reuse promoted to a first-class
  Alternatives Considered entry with accept/reject rationale + retained as
  the gated fallback for PlatformBackend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 2 revisions

Addresses 2 Critical + 3 Important from cycle-2 review:

Critical:
- platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure
  mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with
  its own import block). The "always compiles across phases" property is
  now structural, not asserted. Added a verified per-file import-ownership
  table.
- Corrected the false Phase A rationale: aksBackend uses raw net/http REST,
  NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend
  region). The Azure go.mod drop comes entirely from iac_state_azure.go
  deletion + iac_module.go edit; aksBackend extraction is code-organisation,
  not a dependency change.
- Documented the eksBackend → cloud_account_aws.go call-graph edge as a
  hard same-commit atomicity constraint (verified: eksBackend calls
  awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138).

Important:
- Phase B core-PR bullet now explicitly lists "strip the spaces case from
  iac_module.go" (was only obliquely referenced).
- New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds
  contract field; SaveState lost-response retry → documented idempotent
  (full-state replace, last-writer-wins); plugin-unreachable → abort before
  mutation; PlatformBackend mid-Apply crash → identical to today's
  in-process risk, no new mitigation.
- §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no
  body-logging interceptor (grpc.NewServer(opts...) passthrough; only
  callback_server.go logs, never module config). Writing-plans adds a
  guard test instead of a conditional interceptor.

Minor: file-count table footnoted (count = importers, not deletions);
shared s3compat module added as Alternatives Considered #3 (deferred,
not rejected); self-challenge doubt numbering tidied (2 mitigations
cover 3 doubts, intentionally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): fix stale Phase A/B refs + Status line post-cycle-2

The sed pass in the cycle-2 commit ran from the wrong cwd, so the Status
line still said "cycle 1" and two interface-audit-spike references still
said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change.


Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 3 revisions

Addresses 2 Critical + 2 Important from cycle-3 review:

Critical (same root — symbol-level coupling the import-block audit missed):
- parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and
  safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers
  the plugin-bound backend files call. An import-block audit is symbol-blind.
  Fix: Phase 0 now does TWO moves — the file split AND relocating both
  helpers into a new SDK-free core module/cloud_helpers.go. Per-file table
  gains a "cross-file symbol deps (the trap)" column listing every helper
  edge per backend. Phase 0 acceptance criteria now include a grep that no
  core file references the helpers from their old homes.
- §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file
  (module shell + kubernetesBackend interface + safeIntToInt32) — NOT
  touched by the split; only platform_kubernetes_kind.go (holds all 4
  backends) is split. Earlier draft conflated the two files.

Important:
- Per-file ownership table relabelled "intended post-split — verified by
  the Phase 0 build gate" (was asserted-as-verified against an unsplit
  file — same hand-waving class cycle-2 flagged for "always compiles").
- lease_ttl_seconds DROPPED from the Phase A proto. It was a contract
  field with no enforced semantics and no implementing backend in scope —
  YAGNI. §Failure-modes orphaned-lock reworked: documented limitation +
  operator-side lock-object delete for recovery; TTL is a planned ADDITIVE
  follow-up paired with a conformance test, shipped with the first backend
  that honors expiry. Added explicit Lock-contention behavior (immediate
  error, matches today's in-process IaCStateStore.Lock — no new waiting
  state).

Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence
fixed; §Assumptions 2 notes Phase 0 de-risks it structurally.

Also: removed a stray stale cycle-1 copy of this doc that was sitting
untracked in the main workflow checkout (the canonical doc is here in
the feat/cloud-sdk-extraction worktree).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 4 revisions

Addresses 2 Critical + 2 Important from cycle-4 review:

Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd
cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't
— that's eksBackend) and aksBackend has no cross-file deps (it does —
CloudCredentials/CloudCredentialProvider from cloud_account.go, same as
gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The
symbol-ownership map is now a Phase 0 build artifact —
scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a
design-doc claim that rots on every edit. The design commits to the
*method* + the *known shape* (cloud_account.go stays core; all 3 cloud
backends bind to it via k.provider.GetCredentials; eksBackend additionally
binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no
cloud SDK).

Critical 2 — Phase 0's "split into four, zero logic change" silently
dropped the single func init() that registers kind/k3s/eks/gke/aks.
Splitting REQUIRES partitioning init() per-file (a distribution, not
zero-change). Phase 0 now has an explicit step 2 for the init() partition;
relabelled "behavior-equivalent" not "zero logic change"; k3s documented
as reusing kindBackend (both stay core).

Important 1 — platform.* cloud credential flow across PlatformBackend was
unspecified (aksBackend needs CloudCredentials — how does it reach the
plugin?). Added: PlatformBackend requests carry a CloudCredentials proto
message; engine resolves k.provider.GetCredentials() in-core (config-map
parsing, no SDK) and serialises it. Unified with the Architecture-3
credentials story — ONE CloudCredentials proto shape for both surfaces,
so secret-redaction has one shape to redact.

Important 2 — core actually imports FOUR cloud SDK trees, not three:
godo is still in cloud_account_do.go + 5 platform_do_*.go files.
§Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT
(user's ask was 3 trees), and the go list -deps gate is reworded to
assert "zero packages from the three in-scope trees" not "zero cloud
SDKs". All "zero cloud SDKs" phrasing reconciled throughout.

Minor: ListStates filter + remaining-proto-messages notes folded in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 5 revisions

Addresses 1 Critical + 2 Important from cycle-5 review:

Critical — the init()-partition fix (cycle-4) was kubernetes-only, but
the SAME defect class exists in platform_dns.go / platform_ecs.go /
platform_networking.go / platform_autoscaling.go: each has a single
func init() registering BOTH a core-staying `mock` backend AND a
plugin-bound `aws` backend. The old Phase B inventory moved those files
wholesale → would exile the mock backends + dangle the route53
registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go"
to a repo-wide uniform `_core.go` / `_<provider>.go` convention across
the WHOLE platform.* family. Every mixed init() is partitioned; the
audit script flags any init() registering a mix of core-staying +
plugin-bound factories as a CI failure. Phase B inventory rewritten to
delete only `_aws.go`/`_eks.go` files, never a mixed file.

Important 1 — the cycle-4 "Known shape" prose reintroduced
hand-maintained cross-file symbol claims (one already incomplete:
parseStringSlice consumers). FIX: cut all per-file symbol enumerations;
the section now states only invariants the script VERIFIES (not
discovers) + the method. No transcribed symbol lists remain.

Important 2 + own finding — cycle-4 said the engine resolves
credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's
awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile)
and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK.
FIX: §Architecture-2 corrected — engine passes the DECLARED credential
config (plain strings) in the CloudCredentials proto; the PLUGIN
resolves (incl. the SDK-bearing profile/role_arn paths). Both
cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B,
no core replacement — all AWS cred resolution moves plugin-side. azure/gcp
resolver files stay (their resolvers are genuinely SDK-free).

Minor — backend-name collision: core-reserved names (memory/filesystem/
postgres/kind/k3s/mock) cause a load-time error if a plugin collides,
not silent shadowing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 6 revisions

Addresses 1 Critical + 2 Important from cycle-6 review:

Critical — cycle-5's credential-flow fix replaced one false claim with
another: it said the CloudCredentials struct already holds "declared
config (plain strings incl. profile)". VERIFIED FALSE — the struct
(cloud_account.go:18) has no Profile field (profile lives in Extra map)
and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner
than the struct change the reviewer proposed: the struct needs NO change
(Extra map already carries markers, RoleARN field exists). Instead,
cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails
of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig,
sts.AssumeRole) are removed; they keep their SDK-free heads (record
declared inputs + an Extra["credential_source"] marker, exactly as
awsStaticResolver already does). After the edit the file is SDK-free
and stays in core alongside the azure/gcp resolver files. Only
cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider
+ awsProviderFrom) is deleted; its profile-chain/STS logic moves into
the plugin's buildAWSConfig. Every in-core resolver becomes uniformly
"declare, don't resolve"; the plugin honors the markers. No unregistered-
resolver failure mode — the resolver init() registrations stay.
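
A minimal Go sketch of the "declare, don't resolve" split, under stated assumptions: cloudCredentials is a cut-down stand-in for the core struct (only RoleARN and Extra are taken from the text above; the Extra value type is a guess), and the config keys, the marker values, and the declareRoleARN/pluginPlan names are hypothetical. Neither function calls an SDK; pluginPlan only reports which path a plugin would take.

```go
package main

import (
	"errors"
	"fmt"
)

// cloudCredentials is a hypothetical stand-in for the core struct; only the
// RoleARN field and the Extra map mentioned above are shown.
type cloudCredentials struct {
	RoleARN string
	Extra   map[string]string
}

// declareRoleARN records what the user declared and leaves a marker for the
// plugin. It makes no network or SDK call. Config keys and marker values
// are assumptions.
func declareRoleARN(cfg map[string]string, creds *cloudCredentials) error {
	arn := cfg["role_arn"]
	if arn == "" {
		return errors.New("role_arn is required")
	}
	if creds.Extra == nil {
		creds.Extra = make(map[string]string)
	}
	creds.RoleARN = arn
	if name := cfg["session_name"]; name != "" {
		creds.Extra["session_name"] = name
	}
	creds.Extra["credential_source"] = "role_arn"
	return nil
}

// pluginPlan stands in for the plugin side: it reads the marker and names
// the SDK-backed path it would take, failing on markers it does not handle.
func pluginPlan(creds cloudCredentials) (string, error) {
	switch source := creds.Extra["credential_source"]; source {
	case "role_arn":
		return "assume role " + creds.RoleARN, nil
	case "profile":
		return "load shared config profile " + creds.Extra["profile"], nil
	default:
		return "", fmt.Errorf("unhandled credential_source %q", source)
	}
}

func main() {
	var creds cloudCredentials
	cfg := map[string]string{"role_arn": "arn:aws:iam::123456789012:role/example"}
	if err := declareRoleARN(cfg, &creds); err != nil {
		panic(err)
	}
	fmt.Println(pluginPlan(creds))
}
```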

Important 1 — §Phase-0 misidentified the DNS file with the mixed init().
VERIFIED: platform_dns.go:66 has the init() (+ interface + factory
registry); platform_dns_backends.go has both impls + the route53 SDK
import, NO init(). DNS is a TWO-file split, unlike single-file
ecs/networking/autoscaling. §Phase-0 now states the per-family layout
explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling
one-file) and notes the audit script determines it.

Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit
deferred-resolution markers for env/CLI/managed-identity/workload-identity/
profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5
now state the plugin MUST implement marker handling for every deferred
type, not just AWS profile/role_arn.

Minor — safeIntToInt32 relocation rationale clarified (it's a clean
copy-source for the plugin-bound files, not a hard core necessity);
parseStringSlice IS a hard necessity (its file is deleted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — adversarial review cycle 7 revisions

Addresses 2 Critical from cycle-7 review (architecture confirmed sound;
these are the last two extraction-mechanic precision gaps):

C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the
awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a
clean contiguous tail, but awsRoleARNResolver's SDK block (base-config
build + sts.AssumeRole, ~45 lines) is the larger half of the method,
after the declared-input recording. FIX: §Architecture-2 re-characterizes
the edit as a deliberate Resolve() body REWRITE (not a one-line snip) —
explicitly per-resolver. Added a Phase B CI invariant: an import-block
grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go
has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not
prose-asserted.

C2 — cloud_account_aws.go defines FOUR symbols, not one; the
symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed:
- AWSConfigProvider interface signature names aws.Config → CANNOT stay
  in core, deleted with the file.
- awsProviderFrom → deleted with the interface.
- ValidateCredentials → verified NO real caller (only a comment ref in
  cmd/wfctl/deploy.go:866) → deletes cleanly.
- The 8 awsProviderFrom consumers are all verified plugin-bound — but
  each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the
  plugin there's no cloud.account to type-assert. §Cross-file-coupling
  invariant 3 now states Phase B must REWRITE all 8 consumers to obtain
  creds from the CloudCredentials proto + buildAWSConfig — explicit
  Phase B scope, not a footnote. Phase B table atomicity column updated.

Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in
Phase 0 so the dns family conforms to the uniform _core.go/_aws.go
naming; no special-case three-file layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-8 re-baseline against post-#653 main

Cycle-8 adversarial review caught the design's file/symbol inventory as
stale: it predated issue #653 (closed 2026-05-13), which already removed
the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild +
EKS backends.

Re-baselined every file/symbol claim against origin/main HEAD (worktree
confirmed 0 commits behind origin/main):

- Added "Relationship to issue #653" section — this design is #653's
  named successor, extracting the AWS surface #653 scoped out
  ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces.
- Problem table corrected: AWS 6 real-import files (not 13), Azure 3,
  GCP 3. storage_artifact_s3.go is comment-only — stays in core.
- cloud_account_aws.go is dead code — zero non-test consumers verified;
  deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers
  removed by #653).
- Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go);
  parseStringSlice + safeIntToInt32 no longer exist — helper-relocation
  task deleted.
- PlatformBackend now serves only aks + gke (eks already a #653 SDK-free
  stub); interface-audit spike audits one interface, not five.
- Phase B inventory rewritten; Phase A/C file lists corrected.
- Self-challenge doubt #4 + Assumption 7 added: inventory staleness is
  the cycle-8 defect class; audit script makes it CI-enforced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-9 re-baseline + audit script

Cycle-9 adversarial review caught aksBackend mis-classified as an
azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match
is a stale doc comment (line 332) — aksBackend.azureToken is a plain
net/http OAuth2 client. An import-block-disciplined re-survey found a
second comment-only false positive: nosql_dynamodb.go.

Structural fix for the recurring "grep matched a comment" defect class:
added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks
(never comments) and emits the comment-immune real-import map. Its output
now populates every file table in the design — prose claims replaced by
a build artifact. Formalized + CI-wired in Phase 0.
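
The script added here is a shell script. As an illustration of the same comment-immune idea using a different technique, the Go sketch below asks go/parser for a file's import declarations and ignores everything else. realImports is a hypothetical helper and is not part of the PR.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// realImports returns the import paths in src that start with prefix. It
// inspects parsed import declarations only, so a path that appears in a
// comment never matches.
func realImports(src, prefix string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	file, err := parser.ParseFile(fset, "input.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, spec := range file.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		if strings.HasPrefix(path, prefix) {
			found = append(found, path)
		}
	}
	return found, nil
}

func main() {
	src := "package module\n\n// stale note: uses github.com/Azure/azure-sdk-for-go\nimport \"net/http\"\n"
	hits, err := realImports(src, "github.com/Azure/azure-sdk-for-go")
	fmt.Println(len(hits), err)
}
```

Because the parser handles both the parenthesized block and the single-line `import "..."` form, this variant does not need a separate case for each.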

Corrected inventory (audit-script output): AWS 5 real-import files (not
6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go
are comment-only stubs — out of scope, stay in core.

Design consequences of aksBackend being SDK-free:
- Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all
  stay in core.
- Architecture §2 no longer proposes a new PlatformBackend contract. The
  gke cross-process mechanism is gated on an interface-audit spike whose
  preferred outcome is folding into the existing ResourceDriver contract
  — a dedicated contract for one backend is YAGNI.
- Phase A (Azure) is now pure IaCStateBackend — touches no platform file.
- Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/
  eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and
  fixes the stale line-332 comment.
- The gke platform extraction + its contract decision move to Phase C.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-10 re-baseline, AWS scope boundary explicit

Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9
audit script scanned only module/, missing five aws-sdk-go-v2 importers
under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal
("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by
the four phases as written.

Structural fix — third defect-class variant closed:
- audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and
  splits results module/ vs. elsewhere. Comment-immune (cycle 9) +
  scope-complete (cycle 10) + CI-enforced (Phase 0).

Whole-repo inventory result:
- Azure + GCP SDK usage is entirely module/-resident → Phases A and C
  drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate).
- aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files
  in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go.

Scope decision: the out-of-module/ AWS surface is exactly #653's
deliberately-retained "RBAC/secrets/artifact stay" scope (plus the
provider/aws deploy provider). This design does NOT unilaterally
override #653's recent documented decision — it scopes that surface OUT
(new Non-Goal, parallel to godo) and logs a recommended successor issue.

Consequences threaded through the doc:
- Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is
  module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the
  out-of-scope surface).
- Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP,
  module/-scoped zero for AWS.
- Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's
  scope decision is respected, not contested.
- Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check +
  sessionName extraction sit between declared-input recording and the
  SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale
  line-18 backend-list comment), M2 (internal/legacyaws noted).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction design — cycle-11 PASS, minor cleanups

Adversarial review cycle 11: PASS (zero Critical, zero Important). Two
Minor nits applied:
- audit-cloud-symbols.sh: real_import now also matches single-line
  `import "..."` form, not just parenthesized blocks — closes the one
  latent parser false-negative the reviewer flagged.
- §Goals: clarified that the module/-scoped AWS-zero `--check` assertion
  is deferred-implementation added in Phase C (the committed script only
  enforces the cloud_account_aws_creds.go post-Phase-B invariant today),
  parallel to the Phase 0 init()-partition deferral.

Design phase complete — proceeding to writing-plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scripts): audit-cloud-symbols single-line-import grep poisoned the pipe

The cycle-11 single-line-import hardening added an inner `grep -E '^import "'`
whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`,
making real_import() return false for every file lacking a single-line
import. Added `|| true` on the inner grep. Verified: full report restored,
all REAL/comment-only classifications correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction implementation plan (Phase 0 + Phase A)

Bite-sized TDD plan for the first executable increment: Phase 0 (split
platform_kubernetes_kind.go, fix the stale comment, wire the audit
script into CI) + Phase A (IaCStateBackend proto + benchmark-gated
proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging
guard, workflow-plugin-azure implementation, core deletion dropping
azure-sdk from go.mod).

14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on
plan — their concrete tasks depend on Phase A's outputs (the
benchmark-validated proto shape, the host-resolution pattern, the
plugin-side serve path), so planning them now would be fiction. The
design doc remains the authoritative B/C/D spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — address plan-phase adversarial review

Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor).
All addressed:

- C1 (Critical): Task 4's proto used google.protobuf.Struct, which
  iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config
  as `bytes outputs_json`/`bytes config_json` (the established
  ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not
  structpb. Removed the bogus struct.proto import step.
- I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives
  there), not `cd plugin/external/proto`.
- I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up
  the new benchmarks automatically) — no redundant harness; clarified the
  task is a one-time decision gate.
- I3: Task 8's embedded research spike resolved at plan time — engine.go
  was read; integration is the design-sanctioned package-level
  module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal.
  Tasks 8/13/14 now have concrete file sets.
- I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo,
  workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly.
- M1: Task 5's benchmark file is now genuinely self-contained (local
  benchStateToProto + benchStateBackendServer; no forward references).
- M2: Task 3 names ci.yml directly, places the audit job beside the
  existing godo-banned/aws-sdk-banned grep-gate jobs.
- M3: Task 6 pins benchstat (go install + bare invocation).
- M4: Task 9 states the redaction gap is verified against
  step_output_redactor.go:7-19, not a live deduction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — plan-review cycle 2 fixes

Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed
resolved; 2 new Important + 2 Minor surfaced by the new-defect scan.
All addressed:

- I-A (Important): Task 9's redaction test was inconsistent with the
  actual redactMap behavior — a key named `credentials` matches the
  existing `credential` pattern and is wholesale-replaced with the
  placeholder STRING before any recursion, so the test's
  `.(map[string]any)` assertion panicked. Reworked Task 9: the
  `credentials:` block is ALREADY redacted wholesale (regression-tested);
  the real gap is `credentials_ref` being over-redacted (it's a module
  name, not a secret) — fix is a narrow `*_ref`-suffix exemption in
  isSensitiveField, not camelCase leaf patterns (which would be dead code
  given wholesale redaction happens first).
- I-B (Important): Task 14's engine.go integration seam was
  under-specified and would fight loadPluginInternal's no-concrete-types
  precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now
  defines an `IaCStateBackendProvider` optional interface and
  type-asserts it in loadPluginInternal exactly like the existing
  stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter
  implements it. Concrete file set + code sketch added.
- M-i: Task 6's benchmark.yml description corrected (runs `go test
  -bench=.` inline, not `make bench-baseline`).
- M-ii: Task 4 notes the proto README's plugin.proto-specific wording is
  stale; trust root buf.yaml.
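
The optional-interface seam described in I-B can be sketched as below. This is a minimal illustration, not the repository's code: only the `IaCStateBackendProvider` name comes from the plan, while its method set and the `plugin`, `backendClient`, `plainPlugin`, `azurePlugin` and `loadPlugin` names are hypothetical stand-ins.

```go
package main

import "fmt"

// backendClient is a hypothetical stand-in for the generated gRPC client type.
type backendClient interface {
	BackendName() string
}

// plugin is a hypothetical minimal plugin interface; the loader knows only this.
type plugin interface {
	PluginName() string
}

// IaCStateBackendProvider is the optional interface. The name comes from the
// plan; the method set shown here is an assumption.
type IaCStateBackendProvider interface {
	IaCStateBackends() map[string]backendClient
}

type fakeClient struct{ name string }

func (c fakeClient) BackendName() string { return c.name }

// plainPlugin does not implement the optional interface.
type plainPlugin struct{}

func (plainPlugin) PluginName() string { return "plain" }

// azurePlugin opts in by implementing IaCStateBackends.
type azurePlugin struct{}

func (azurePlugin) PluginName() string { return "azure" }
func (azurePlugin) IaCStateBackends() map[string]backendClient {
	return map[string]backendClient{"azure_blob": fakeClient{name: "azure_blob"}}
}

// loadPlugin never names a concrete plugin type: it type-asserts the optional
// interface and registers whatever the plugin offers. It returns the number
// of backends registered.
func loadPlugin(p plugin, register func(string, backendClient)) int {
	provider, ok := p.(IaCStateBackendProvider)
	if !ok {
		return 0 // plugin contributes no state backends
	}
	count := 0
	for name, client := range provider.IaCStateBackends() {
		register(name, client)
		count++
	}
	return count
}

func main() {
	registered := map[string]backendClient{}
	register := func(name string, c backendClient) { registered[name] = c }
	for _, p := range []plugin{plainPlugin{}, azurePlugin{}} {
		fmt.Printf("%s contributed %d backend(s)\n", p.PluginName(), loadPlugin(p, register))
	}
}
```

The loader stays free of concrete types, so a plugin that never implements the optional interface needs no changes.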

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): cloud-SDK extraction plan — plan-review cycle 3 PASS + minor cleanups

Plan-phase adversarial review cycle 3: PASS (zero Critical, zero
Important). Two Minor doc-tightening fixes applied:
- Task 9 Step 4 now names bearer_token_ref explicitly and explains why
  the *_ref exemption is safe for it (SecretRef is a reference struct,
  not a raw secret) rather than claiming no *_ref field exists.
- engine.go line citations corrected to 311-326.

Plan phase complete — proceeding to alignment-check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: lock scope for cloud-sdk-extraction (alignment passed)

* refactor(module): split platform_kubernetes_kind.go into _core + _gke

Phase 0 precursor for cloud-SDK extraction. kindBackend/eksErrorBackend/
aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core
init(); gkeBackend (the only SDK-bearing k8s backend) moves to
platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same
five backend names registered. Isolates the lone SDK-bearing platform
file for a later clean deletion.

* docs(module): add file-purpose headers to platform_kubernetes _core/_gke

Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition
self-documenting for readers without the commit message.

* docs(module): fix stale 'Requires the Azure SDK' comment on aksBackend

aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk
consumer. The stale comment is what fooled an earlier inventory pass into
mis-counting platform_kubernetes_kind.go as an azure-sdk importer.

* ci(audit): enforce k8s-backend init() partition + run audit on every PR

Extends audit-cloud-symbols.sh --check with an init()-partition assertion
(platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only
gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned /
aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced
artifact rather than a prose claim.

* docs(plans): IaCStateBackend transport benchmark result — decision pending

Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a
worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar.

Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB
map[string]any (inherent to the bytes outputs_json wire format the
iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB
message cap. The plan's contingency remedy (streaming redesign) addresses
message-size-cap + memory-buffering, neither of which the benchmark hits;
streaming would not move the number.

Recommendation: retain unary (6.5ms is still negligible vs real cloud
backend I/O — the design's own bar-rationale). Deviation from the literal
5ms estimate-bar is surfaced to the operator, not absorbed silently.
Scope lock intact: Task 6 run + recorded, no task added/dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plans): Task 6 resolved — unary IaCStateBackend LOCKED (operator-confirmed)

Operator reviewed the 6.51ms benchmark + root-cause analysis and
confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's
gate resolves Unary LOCKED — the Task 4 proto stands, no streaming
redesign, PR 2/3 proceed unchanged.

Operator additionally raised a long-term architectural item: IaC state is
persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR)
with JSON-export + content-detection-on-read would be better for
processing/type-correctness/large-state scaling. Logged as a
post-extraction follow-up in both the benchmark decision record and the
design doc's Open items — distinct from the wire contract, cross-cutting
across all IaCStateStore impls, needs its own brainstorming pass. Not
actioned in this locked plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "chore: lock scope for cloud-sdk-extraction (alignment passed)"

This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954.

* docs(plans): amend cloud-sdk-extraction plan — PR 6 (ctx) + de-gate PR 4

Operator-approved scope amendment to the (reverted-to-Draft) plan:
- ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 /
  Task 15. Task 7 had to hardcode context.Background() in
  grpcIaCStateStore; the operator directed widening the interface now
  while we're at that boundary, so Phase B/C/D plugin backends inherit it
  ctx-ful. Bounded blast radius (~9 files, all in module/);
  interfaces.IaCStateStore already had ctx and is untouched.
- ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo.
  Operator: agents should operate in plugin repos directly; the real
  requirement is prompt clarity (absolute repo path stated up front), not
  a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes
  updated accordingly.
- Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented
  (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note
  updated to RESOLVED (unary locked).

Next: re-run alignment-check on the amended plan, then re-lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-lock scope for cloud-sdk-extraction (amended — alignment re-passed)

* feat(proto): add IaCStateBackend service to iac.proto

Strict 6-method contract mirroring module.IaCStateStore 1:1, with an
IaCState message mirroring module.IaCState. Free-form Outputs/Config
maps cross the wire as bytes outputs_json/config_json per the iac.proto
hard invariant (NO google.protobuf.Struct) — same pattern as
ResourceState.outputs_json. Unary RPCs. No TTL field. Regenerated
bindings via buf.

* test(module): add IaCStateBackend gRPC-vs-in-process benchmark harness

Drives a ~1 MB synthetic IaCState through Lock/GetState/SaveState/Unlock
both in-process (baseline) and over a real bufconn gRPC boundary
(post-extraction path). Self-contained (local benchStateToProto +
benchStateBackendServer; Task 7 promotes production versions). Feeds the
unary-vs-streaming proto-transport decision in the next task.

* test(wftest): add IaCStateBackend to iacServiceChecks coverage table

Task 4 added the IaCStateBackend service to iac.proto but missed the
corresponding iacServiceChecks row in wftest/bdd/strict_iac.go.
TestIaCServiceChecks_CoversEveryProtoService enforces parity between
iac.proto's services and that table — it was failing on the missing
entry. Belongs with PR 2 (the proto PR).

* feat(module): IaCState proto converters + grpcIaCStateStore client adapter

grpcIaCStateStore implements module.IaCStateStore over an
IaCStateBackendClient — the host-side half of the new contract.
iacStateToProto/iacStateFromProto convert the free-form Outputs/Config
maps via encoding/json (no structpb — iac.proto hard invariant).
iacStateBackendServer is the production server type. Promotes these out
of the benchmark file so one canonical copy is shared.
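
A minimal sketch of the JSON-bytes conversion described above, assuming simplified stand-in types. `state`, `wireState`, `toWire` and `fromWire` are hypothetical names for illustration; they are not the repository's `iacStateToProto`/`iacStateFromProto` or the generated proto message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// state is a hypothetical stand-in for the in-process state type.
type state struct {
	ResourceID string
	Outputs    map[string]any
	Config     map[string]any
}

// wireState is a hypothetical stand-in for the generated message: free-form
// maps travel as JSON-encoded bytes instead of google.protobuf.Struct.
type wireState struct {
	ResourceID  string
	OutputsJSON []byte
	ConfigJSON  []byte
}

// toWire encodes the free-form maps with encoding/json.
func toWire(s *state) (*wireState, error) {
	outputs, err := json.Marshal(s.Outputs)
	if err != nil {
		return nil, fmt.Errorf("marshal outputs: %w", err)
	}
	config, err := json.Marshal(s.Config)
	if err != nil {
		return nil, fmt.Errorf("marshal config: %w", err)
	}
	return &wireState{ResourceID: s.ResourceID, OutputsJSON: outputs, ConfigJSON: config}, nil
}

// fromWire decodes the JSON bytes back into maps; empty payloads stay nil.
func fromWire(w *wireState) (*state, error) {
	s := &state{ResourceID: w.ResourceID}
	if len(w.OutputsJSON) > 0 {
		if err := json.Unmarshal(w.OutputsJSON, &s.Outputs); err != nil {
			return nil, fmt.Errorf("unmarshal outputs: %w", err)
		}
	}
	if len(w.ConfigJSON) > 0 {
		if err := json.Unmarshal(w.ConfigJSON, &s.Config); err != nil {
			return nil, fmt.Errorf("unmarshal config: %w", err)
		}
	}
	return s, nil
}

func main() {
	in := &state{
		ResourceID: "db-1",
		Outputs:    map[string]any{"port": 5432},
		Config:     map[string]any{"tier": "small"},
	}
	w, err := toWire(in)
	if err != nil {
		panic(err)
	}
	out, err := fromWire(w)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.ResourceID, out.Outputs["port"], out.Config["tier"])
}
```

One consequence of this wire format worth keeping in mind: numbers decoded into `map[string]any` come back as `float64`, so an `int` stored in Outputs does not round-trip with its original Go type.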

* docs(module): note context.Background() follow-up on grpcIaCStateStore

Code-review Minor: the spec asked for the hardcoded context.Background()
to be acknowledged as a known follow-up (IaCStateStore has no ctx param)
rather than silently used.

* feat(module): engine-side iac.state plugin-backend registry + dispatch

A package-level iacStateBackendRegistry maps a backend name to a
pb.IaCStateBackendClient; the engine populates it at plugin-load time
(Task 14). IaCModule.Init()'s switch gains a default arm that resolves
non-core backend names from the registry, constructing a
grpcIaCStateStore. Reserved core names (memory/filesystem/postgres) are
rejected at registration. The existing in-process backend cases
(incl. azure_blob) are untouched here — the plumbing exists and is
tested; PR 5 flips azure_blob onto it.
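
A rough sketch of the registry-plus-default-arm dispatch, under stated assumptions: `registry`, `Register`, `Resolve`, `newStore` and the string return values are hypothetical, and `backendClient` stands in for `pb.IaCStateBackendClient`. Only the reserved names (memory/filesystem/postgres) are taken from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// backendClient is a hypothetical stand-in for pb.IaCStateBackendClient.
type backendClient any

// reservedCoreNames are the in-process backends a plugin may not shadow.
var reservedCoreNames = map[string]bool{
	"memory":     true,
	"filesystem": true,
	"postgres":   true,
}

// registry maps a backend name to a plugin-provided client.
type registry struct {
	mu      sync.RWMutex
	clients map[string]backendClient
}

// Register rejects reserved core names and stores everything else.
func (r *registry) Register(name string, c backendClient) error {
	if reservedCoreNames[name] {
		return fmt.Errorf("backend name %q is reserved for core", name)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.clients == nil {
		r.clients = make(map[string]backendClient)
	}
	r.clients[name] = c
	return nil
}

// Resolve looks up a plugin-provided client by backend name.
func (r *registry) Resolve(name string) (backendClient, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	c, ok := r.clients[name]
	return c, ok
}

// newStore mimics a switch whose default arm falls back to the registry.
// It returns a label instead of a real store to keep the sketch small.
func newStore(r *registry, backend string) (string, error) {
	switch backend {
	case "memory", "filesystem", "postgres":
		return "in-process:" + backend, nil
	default:
		if _, ok := r.Resolve(backend); ok {
			return "grpc:" + backend, nil
		}
		return "", fmt.Errorf("unknown iac.state backend %q", backend)
	}
}

func main() {
	r := &registry{}
	fmt.Println(r.Register("postgres", struct{}{}))   // rejected: reserved
	fmt.Println(r.Register("azure_blob", struct{}{})) // accepted
	fmt.Println(newStore(r, "azure_blob"))
}
```

Rejecting reserved names at registration time, rather than at lookup, means a misbehaving plugin fails at load instead of silently changing which store a config resolves to.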

* feat(module): exempt *_ref keys from redaction; lock in credentials: redaction

Option-1 credentials move raw cloud secrets inline into plugin-native
module config under a credentials: key — already redacted wholesale by
the existing 'credential' pattern (regression test added). But that same
pattern over-redacts credentials_ref:, which holds a module NAME, not a
secret. Adds a narrow *_ref-suffix exemption to isSensitiveField so
reference keys are preserved for trace debuggability.
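
A minimal sketch of the suffix exemption, assuming a substring-based matcher. The `refFieldSuffix` and `safeFieldSuffix` (`_display`) names and the `credential` pattern appear in these commits; the other patterns, the placeholder text and this `redactMap` body are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	refFieldSuffix  = "_ref"     // reference keys hold a module name, not a secret
	safeFieldSuffix = "_display" // display-only keys are already exempt
	placeholder     = "[REDACTED]"
)

// sensitivePatterns is illustrative; only "credential" is confirmed by the source.
var sensitivePatterns = []string{"credential", "password", "secret", "token"}

// isSensitiveField applies the suffix exemptions before pattern matching.
func isSensitiveField(key string) bool {
	k := strings.ToLower(key)
	if strings.HasSuffix(k, refFieldSuffix) || strings.HasSuffix(k, safeFieldSuffix) {
		return false
	}
	for _, p := range sensitivePatterns {
		if strings.Contains(k, p) {
			return true
		}
	}
	return false
}

// redactMap replaces a sensitive key's value wholesale, before any recursion,
// so nested fields under a credentials: block never need their own patterns.
func redactMap(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		if isSensitiveField(k) {
			out[k] = placeholder
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			out[k] = redactMap(nested)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	cfg := map[string]any{
		"credentials":     map[string]any{"accessKey": "AKIA-example"},
		"credentials_ref": "aws-creds",
		"region":          "us-east-1",
	}
	red := redactMap(cfg)
	fmt.Println(red["credentials"], red["credentials_ref"], red["region"])
}
```

Because the `credentials` value is replaced by a string before recursion, asserting it as `map[string]any` after redaction would fail, which matches the test inconsistency recorded in the plan review.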

* refactor(module): name the _ref redaction-exemption suffix as a const

Code-review Minor: refFieldSuffix const for consistency with the
existing safeFieldSuffix (_display) exemption.

* test(plugin/external): guard against gRPC body-logging interceptors

CreateModule requests carry inline credentials: blocks (Option-1
credentials model). This guard fails CI if any plugin/external/ file
gains a gRPC interceptor option, forcing a reviewer to confirm it cannot
log request bodies. Implements the cloud-sdk-extraction design's Security
guard-test requirement.

* test(plugin/external): broaden interceptor guard to Stream interceptors

Code-review catch: the guard regex covered Unary only. CreateModule is
unary today, but a future streaming RPC carrying credentials must not
slip a stream interceptor past the guard. Now matches (Unary|Stream).
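
A hedged sketch of what such a guard's matcher might look like. The regex and the `findInterceptorOptions` helper are assumptions, not the repository's test; the pattern targets the grpc-go option constructors `UnaryInterceptor`, `StreamInterceptor`, their `Chain` variants, and the `With`-prefixed dial options.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// interceptorOption matches grpc-go calls that install a unary or stream
// interceptor on either a server or a client connection.
var interceptorOption = regexp.MustCompile(
	`grpc\.(?:With)?(?:Chain)?(?:Unary|Stream)Interceptor\(`)

// findInterceptorOptions returns every source line that installs an
// interceptor, so a guard test can fail and name the offending lines.
func findInterceptorOptions(src string) []string {
	var hits []string
	for _, line := range strings.Split(src, "\n") {
		if interceptorOption.MatchString(line) {
			hits = append(hits, strings.TrimSpace(line))
		}
	}
	return hits
}

func main() {
	src := `
srv := grpc.NewServer()
logged := grpc.NewServer(grpc.StreamInterceptor(logStream))
opt := grpc.WithChainUnaryInterceptor(logUnary)
`
	for _, hit := range findInterceptorOptions(src) {
		fmt.Println("interceptor option found:", hit)
	}
}
```

A real guard would presumably read the files under plugin/external/ and fail the test when the returned slice is non-empty.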

* feat(module)!: add ctx context.Context to IaCStateStore (operator amendment)

Widens module.IaCStateStore's 6 methods with a leading ctx parameter so
grpcIaCStateStore plumbs the caller's real context (was
context.Background()) and iacStateBackendServer forwards its gRPC ctx
into the store. The 6 in-process backends accept ctx; postgres/spaces/
gcs/azure use it for their SDK/DB calls. pipeline_step_iac.go callers
pass the step context.

Operator-approved scope amendment — see decisions/0033. The separate
interfaces.IaCStateStore already had ctx and is untouched.

Caller inventory note: cmd/wfctl/infra_state_store.go (the wfctl wrapper around the concrete Spaces/Postgres stores) was also updated — a mechanical consequence of the widening, beyond the plan's Files list; its wrapper methods already carried a ctx.

Rollback: revert this commit — mechanical signature-only widening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): propagate context errors from state lookups instead of swallowing

Addresses Copilot review on PR #671 — now that IaCStateStore carries the
caller's ctx, silently dropping its errors lets a canceled step proceed
against a stale state view:
- pipeline_step_iac.go: iac_plan/iac_apply/iac_destroy now go through a
  lookupExistingState helper that returns any GetState error (including
  ctx cancellation/deadline) so the step aborts.
- iac_state_gcs.go / iac_state_azure.go: ListStates aborts on a
  context-cancellation error mid-iteration rather than returning partial
  results; genuinely unreadable objects/blobs are still skipped.
- iac_state.go: ListStates doc clarifies nil filter == "no filter" to
  match actual call sites.
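
The abort-versus-skip rule in the ListStates change can be sketched as follows. This is an illustration under assumptions: `listStates`, `isContextErr`, the `read` callback and `runDemo` are hypothetical names, and the real backends iterate cloud objects rather than a slice of keys.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// isContextErr reports whether err stems from cancellation or a deadline.
func isContextErr(err error) bool {
	return errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded)
}

// listStates reads every key. A context error aborts the whole listing so the
// caller never sees partial results; any other read error skips that entry.
func listStates(ctx context.Context, keys []string,
	read func(context.Context, string) (string, error)) ([]string, error) {
	var out []string
	for _, k := range keys {
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		v, err := read(ctx, k)
		if err != nil {
			if isContextErr(err) {
				return nil, fmt.Errorf("list states: %w", err)
			}
			continue // genuinely unreadable entry: skip it
		}
		out = append(out, v)
	}
	return out, nil
}

// runDemo lists three keys (one unreadable) with a live context, then repeats
// the call with an already-canceled context.
func runDemo() (listed int, aborted bool) {
	read := func(ctx context.Context, key string) (string, error) {
		if err := ctx.Err(); err != nil {
			return "", err
		}
		if key == "corrupt" {
			return "", errors.New("unreadable blob")
		}
		return "state:" + key, nil
	}
	keys := []string{"a", "corrupt", "b"}

	states, _ := listStates(context.Background(), keys, read)

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := listStates(ctx, keys, read)

	return len(states), isContextErr(err)
}

func main() {
	fmt.Println(runDemo()) // prints "2 true"
}
```

Wrapping with `%w` keeps `errors.Is` working for callers that need to tell a cancellation from a backend failure.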

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 19, 2026
- Replace the inaccurate CHANGELOG false-positive narrative in 'What Went Well #4' with an accurate description of the actual spec gaps: banner hyperlink format, missing .github/ templates, outdated engine version pins, example module mismatches, and the undocumented GH_TOKEN requirement
- Replace the inaccurate CHANGELOG narrative in 'What Didn't #2' with an accurate list of the surface-area issues found in Tasks 11-13 and how rework was prospectively applied

Fixes spec-reviewer feedback issues 5-6 on PR #719.
intel352 added a commit that referenced this pull request May 19, 2026
* docs(retro): multi-repo OSS-readiness QoL sweep (2026-05-19)

Closes the loop on the cross-repo doc + license + experimental-marker
sweep authored at workflow#714. Records what went well, what didn't, and
follow-up tracking issues (workflow-registry#717) for registry-manifest
creation for 11 P2 plugins and archived-repo notation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs(retro): fix Tasks 11-13 issue descriptions

- Replace the inaccurate CHANGELOG false-positive narrative in 'What Went Well #4' with an accurate description of the actual spec gaps: banner hyperlink format, missing .github/ templates, outdated engine version pins, example module mismatches, and the undocumented GH_TOKEN requirement
- Replace the inaccurate CHANGELOG narrative in 'What Didn't #2' with an accurate list of the surface-area issues found in Tasks 11-13 and how rework was prospectively applied

Fixes spec-reviewer feedback issues 5-6 on PR #719.

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 31, 2026
…#805)

* docs(cigen): design (adversarial PASS) + implementation plan for #3 per-phase secret scoping + #4 migration flags

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(cigen): revise plan per adversarial C1/C2/C3 + Important

- C1/C2: new tests in internal package cigen files (analyze_phase_test.go, render_gha_phase_test.go) — unexported funcs reachable, own config/strings imports
- C3: Task 5 regen uses REAL ci plan/generate flags (--out, --write; no --stdout/--format/--output-dir) per GAP.md recipe
- Important: add on-disk golden test (multisite_evidence_test.go) locking the committed evidence
- note pre-existing render tests survive (Contains-asserts)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(cigen): plan PASS adversarial cycle-2 (fix Task 2/3 git-add paths to *_phase_test.go)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: lock scope for cigen fidelity (alignment passed)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(cigen): add DeployPhase.Secrets + Scoped for per-phase scoping

* feat(cigen): scope per-phase secrets in Analyze; derive single-env migrations --env

* feat(cigen): per-phase env block + wfctl migrations up --format json

* test(cigen): regen multisite evidence (scoped prereq env + migrations --format json) + on-disk golden test; honest GAP.md

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>