
feat(iac): IaCPlan schema + plan-stale diagnostic (W-1 of 12)#523

Merged: intel352 merged 29 commits into main from feat/iac-plan-schema-diagnostic on May 4, 2026

@intel352 intel352 commented May 3, 2026

Summary

W-1 of the 12-PR IaC root-cause + provider conformance plan series. Adds IaCPlan.SchemaVersion, IaCPlan.InputSnapshot, PlanAction.ResolvedConfigHash, and a standalone DriftEntry type. Wires the per-key plan-stale diagnostic into the persisted --plan apply path. Adds the iac/inputsnapshot/ package (Compute, Snapshot, NewTolerantEnvProvider, the preservedFingerprint sentinel, ComputeDrift, FormatStaleError, and the ErrEnvVarChanged sentinel error). Warns when plan.json is not in .gitignore.

Plan reference: docs/plans/2026-05-03-iac-conformance-and-replace.md rev10 (commit on design/iac-conformance-and-replace branch). 9 adversarial-review cycles; user ratified Option C (W-9 includes ProviderPlanner; ADR 007).

What ships

6 commits in dependency order:

  • 324d856 — schema fields + DriftEntry type (T1.1)
  • 7c8c3ac — inputsnapshot.Compute / Snapshot / NewTolerantEnvProvider with preservedFingerprint sentinel (T1.2)
  • 4774c3b — wfctl infra plan writes InputSnapshot to plan.json (T1.3)
  • 295d354 — ComputePlan sets ResolvedConfigHash on create + update actions (T1.4)
  • b442dae — wfctl infra plan warns on missing .gitignore entry for plan.json (T1.6)
  • 43c8ced — typed ErrEnvVarChanged + ComputeDrift sentinel-honoring + persisted-plan diagnostic wiring (T1.5)

Out of scope (deferred to later PRs)

  • ApplyResult fields (InitialInputSnapshot, InputDriftReport, ReplaceIDMap) ship in W-3a/T3.0.4
  • In-process apply path postcondition wiring ships in W-3a/T3.1.5
  • wfctlhelpers.ApplyPlan body ships in W-3a/T3.1
  • Refresh-outputs (W-2), Replace action (W-3a/W-3b), JIT secret resolution (W-5), conformance suite (W-7)

Critical design constraints (rev1-rev10 history)

  • preservedFingerprint is UNEXPORTED; NewTolerantEnvProvider is the only sanctioned sentinel injector
  • ComputeDrift honors the sentinel via in-package access; tests reference the unsetFingerprintPlaceholder constant (not the literal "(unset)" string), which resolves the cycle-7 brittle-test finding
  • Persisted-plan path uses OSEnvProvider (drift detection desired); in-process path will use NewTolerantEnvProvider in W-3a (preservation desired for sub-action env-cleanup case)
  • No ghost-stub of wfctlhelpers.ApplyPlan ships in this PR — the helper package lands in W-3a/T3.1
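
The sentinel contract in these constraints can be sketched roughly as follows. This is a minimal illustration, not the code shipped in this PR: the DriftEntry field names follow those used by the PR's formatter, while the sentinel value, the "(unset)" rendering, and the function body are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// DriftEntry mirrors the field names used by the PR's formatter; the real
// type lives in the interfaces package.
type DriftEntry struct {
	Name             string
	PlanFingerprint  string
	ApplyFingerprint string
}

// Both constants are stand-ins; the actual values in the PR may differ.
const (
	preservedFingerprint        = "__plan_time_preserved__"
	unsetFingerprintPlaceholder = "(unset)"
)

// computeDrift reports keys whose plan-time fingerprint differs from the
// apply-time one. Keys marked with the preservation sentinel are skipped.
func computeDrift(planSnap, applySnap map[string]string) []DriftEntry {
	var drift []DriftEntry
	for name, planFP := range planSnap {
		applyFP, ok := applySnap[name]
		switch {
		case ok && applyFP == preservedFingerprint:
			continue // preservation requested: not counted as drift
		case !ok:
			applyFP = unsetFingerprintPlaceholder // set at plan, missing at apply
		case applyFP == planFP:
			continue // unchanged
		}
		drift = append(drift, DriftEntry{
			Name:             name,
			PlanFingerprint:  planFP,
			ApplyFingerprint: applyFP,
		})
	}
	sort.Slice(drift, func(i, j int) bool { return drift[i].Name < drift[j].Name })
	return drift
}

func main() {
	plan := map[string]string{"A": "aaaa", "B": "bbbb", "C": "cccc"}
	apply := map[string]string{"A": "aaaa", "B": "xxxx", "C": preservedFingerprint}
	for _, d := range computeDrift(plan, apply) {
		fmt.Printf("%s: %s -> %s\n", d.Name, d.PlanFingerprint, d.ApplyFingerprint)
	}
}
```

With OSEnvProvider the sentinel never appears in the apply snapshot, so every changed or missing key is reported; only a tolerant provider can suppress an entry.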

Test plan

  • GOWORK=off go test ./interfaces/... ./iac/inputsnapshot/... ./platform/... ./cmd/wfctl/... PASS
  • Manual: wfctl infra plan -o plan.json against a config with ${VAR} references; inspect plan.json contains input_snapshot map; .gitignore warning surfaces if entry absent
  • CI: standard test + lint pipeline green
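
For the manual check above, the set of names that should appear in input_snapshot can be approximated by scanning the config for ${VAR} and $VAR references. A minimal sketch, assuming a simple identifier grammar; the regular expression and the referencedEnvVars helper are hypothetical and may not match the scanner in cmd/wfctl/infra_inputsnapshot.go.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// envRefPattern matches ${NAME} and $NAME. The accepted grammar is an
// assumption made for this sketch.
var envRefPattern = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)`)

// referencedEnvVars returns the sorted, de-duplicated variable names
// referenced in config.
func referencedEnvVars(config string) []string {
	seen := map[string]struct{}{}
	for _, m := range envRefPattern.FindAllStringSubmatch(config, -1) {
		name := m[1] // braced form
		if name == "" {
			name = m[2] // bare form
		}
		seen[name] = struct{}{}
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	cfg := "password: ${STAGING_PG_PASSWORD}\nregion: $REGION\nuser: ${STAGING_PG_PASSWORD}\n"
	fmt.Println(referencedEnvVars(cfg))
}
```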

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 3, 2026 23:24

Copilot AI left a comment

Pull request overview

Adds plan-format schema fields and input-fingerprint drift diagnostics to strengthen IaC plan/apply conformance, especially for the persisted wfctl infra apply --plan path.

Changes:

  • Extend interfaces.IaCPlan / interfaces.PlanAction with SchemaVersion, InputSnapshot, ResolvedConfigHash, and a shared DriftEntry type.
  • Introduce iac/inputsnapshot to compute env-var fingerprints, drift reports, and canonical “plan stale” diagnostics (including a typed sentinel error).
  • Wire snapshot capture into wfctl infra plan, drift checking into wfctl infra apply --plan, and add a heuristic warning when plan.json isn’t gitignored.
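
As a rough picture of the env-var fingerprinting described in this overview, the sketch below hashes each set variable to a 16-hex-character SHA-256 prefix (the format given in the InputSnapshot doc comment) and omits unset variables. The EnvProvider type shape and the lower-case function names are assumptions, not the package's actual API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// EnvProvider returns a variable's value and whether it is set.
type EnvProvider func(name string) (string, bool)

// fingerprint returns the first 16 hex characters (64 bits) of the
// SHA-256 digest of value, so the raw value never lands in plan.json.
func fingerprint(value string) string {
	sum := sha256.Sum256([]byte(value))
	return hex.EncodeToString(sum[:])[:16]
}

// compute fingerprints every variable the provider reports as set.
// Unset variables are left out of the resulting map.
func compute(names []string, env EnvProvider) map[string]string {
	out := make(map[string]string, len(names))
	for _, name := range names {
		val, ok := env(name)
		if !ok {
			continue
		}
		out[name] = fingerprint(val)
	}
	return out
}

func main() {
	vals := map[string]string{"REGION": "nyc3"}
	env := func(name string) (string, bool) {
		v, ok := vals[name]
		return v, ok
	}
	fmt.Println(compute([]string{"REGION", "MISSING"}, env))
}
```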

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

File                                         Description
platform/differ.go                           Populate per-action ResolvedConfigHash on create/update plan actions.
platform/differ_test.go                      Add coverage asserting ResolvedConfigHash is set correctly per action.
interfaces/iac_state.go                      Add SchemaVersion, InputSnapshot, ResolvedConfigHash, and DriftEntry to public plan/state interfaces.
interfaces/iac_state_test.go                 JSON round-trip tests for the newly added schema fields.
iac/inputsnapshot/snapshot.go                Implement fingerprint computation, OS/env providers, and tolerant sentinel handling.
iac/inputsnapshot/snapshot_test.go           Tests for fingerprint format, determinism, unset handling, and sentinel pass-through.
iac/inputsnapshot/compute_drift.go           Compute drift entries from plan/apply snapshots, honoring preservation sentinel.
iac/inputsnapshot/compute_drift_test.go      Tests for drift detection and sentinel suppression behavior.
iac/inputsnapshot/diagnostic.go              Canonical formatting for plan-stale drift diagnostics.
iac/inputsnapshot/errors.go                  Typed sentinel error for env-var drift (plan-stale) detection.
cmd/wfctl/infra_inputsnapshot.go             Scan config for ${VAR}/$VAR refs and compute InputSnapshot for infra plans.
cmd/wfctl/infra.go                           Write snapshot/schema to plan output; warn on missing gitignore; enforce drift check during apply --plan.
cmd/wfctl/infra_plan_inputsnapshot_test.go   Ensure wfctl infra plan -o persists InputSnapshot and SchemaVersion.
cmd/wfctl/infra_plan_gitignore.go            Heuristic .gitignore coverage check for plan output warnings.
cmd/wfctl/infra_plan_gitignore_test.go       Tests for warning emission/suppression based on .gitignore contents.
cmd/wfctl/infra_apply_plan_test.go           Persisted-plan path test ensuring drift triggers typed error + per-key diagnostic and blocks provider.Apply.
Comments suppressed due to low confidence (1)

cmd/wfctl/infra.go:1104

  • SchemaVersion is written to plan.json, but the --plan apply path never validates it. To avoid older wfctl binaries accidentally applying a future/incompatible plan format, add a check after loading the plan (e.g., reject SchemaVersion > supported) before proceeding with drift/hash validation.
	if planFile != "" {
		plan, err := loadPlanFromFile(planFile)
		if err != nil {
			return err
		}
		// Validate that the plan is still current relative to the config.
		desired, err := parseInfraResourceSpecsForEnv(cfgFile, envName)
		if err != nil {
			return fmt.Errorf("parse infra resource specs: %w", err)
		}
		if plan.DesiredHash == "" {
			return fmt.Errorf("plan file has no hash — regenerate with: wfctl infra plan -o plan.json")
		}
		// Check the input-fingerprint drift first so the operator gets a
		// per-key diagnostic instead of the generic config-hash mismatch.
		// (Env-var changes are a strict subset of config-hash differences;
		// flagging them here yields the actionable message.) Names list is
		// derived from plan.InputSnapshot keys — no separate InputNames field.
		if len(plan.InputSnapshot) > 0 {
			names := make([]string, 0, len(plan.InputSnapshot))
			for k := range plan.InputSnapshot {
				names = append(names, k)
			}
			applySnap := inputsnapshot.Compute(names, inputsnapshot.OSEnvProvider)
			if drift := inputsnapshot.ComputeDrift(plan.InputSnapshot, applySnap); len(drift) > 0 {
				return fmt.Errorf("%w\n%s", inputsnapshot.ErrEnvVarChanged, inputsnapshot.FormatStaleError(drift))
			}
		}
		currentHash := desiredStateHash(desired)
		if plan.DesiredHash != currentHash {
			return fmt.Errorf("plan stale: config hash mismatch (run wfctl infra plan again)")
		}
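
The SchemaVersion check suggested in the suppressed comment might look something like the sketch below. Everything here is hypothetical: the maxSupportedSchemaVersion constant, the integer type of SchemaVersion, and the error wording are assumptions, and this PR does not add such a check.

```go
package main

import "fmt"

// maxSupportedSchemaVersion is a hypothetical constant naming the newest
// plan format this binary understands.
const maxSupportedSchemaVersion = 1

// plan is a stand-in for the loaded plan; SchemaVersion is assumed to be
// an integer here.
type plan struct {
	SchemaVersion int
}

// checkSchemaVersion rejects plans written in a newer format than the
// running binary supports, before any drift or hash validation runs.
func checkSchemaVersion(p plan) error {
	if p.SchemaVersion > maxSupportedSchemaVersion {
		return fmt.Errorf(
			"plan schema version %d is newer than supported version %d: upgrade wfctl or regenerate the plan",
			p.SchemaVersion, maxSupportedSchemaVersion)
	}
	return nil
}

func main() {
	fmt.Println(checkSchemaVersion(plan{SchemaVersion: 1}))
	fmt.Println(checkSchemaVersion(plan{SchemaVersion: 2}))
}
```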

Comment on lines +20 to +34
// Drift entries are sorted by Name for deterministic output. An empty drift
// report yields the singular line "plan stale: 0 input(s) changed since plan"
// — callers should avoid invoking the formatter when no drift exists.
func FormatStaleError(drift []interfaces.DriftEntry) string {
	sorted := make([]interfaces.DriftEntry, len(drift))
	copy(sorted, drift)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var b strings.Builder
	fmt.Fprintf(&b, "plan stale: %d input(s) changed since plan\n", len(sorted))
	for _, d := range sorted {
		fmt.Fprintf(&b, " %s: fingerprint %s (plan) → %s (apply)\n", d.Name, d.PlanFingerprint, d.ApplyFingerprint)
	}
	b.WriteString(" hint: ensure all env vars referenced by infra.yaml are exported to both Plan and Apply steps")
	return b.String()
Comment thread iac/inputsnapshot/errors.go Outdated
// has a different fingerprint at apply time. Callers can match with
// errors.Is(err, ErrEnvVarChanged) to detect the plan-stale case
// independently of the human-readable per-key drift message.
var ErrEnvVarChanged = errors.New("plan stale: env-var changed since plan")
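
A short sketch of the matching pattern the doc comment describes: the sentinel is wrapped with %w alongside the human-readable detail, and callers detect it with errors.Is. The lower-case names, the sentinel's message text, and the sample detail string are illustrative stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// errEnvVarChanged stands in for the package's exported sentinel.
var errEnvVarChanged = errors.New("env-var changed since plan")

// apply simulates the persisted-plan path returning the sentinel wrapped
// together with a per-key diagnostic.
func apply() error {
	detail := "plan stale: 1 input(s) changed since plan\n" +
		"  DB_PASSWORD: fingerprint aaaa (plan) → bbbb (apply)"
	return fmt.Errorf("%w\n%s", errEnvVarChanged, detail)
}

// isStale reports whether err carries the plan-stale sentinel anywhere in
// its wrap chain, independent of the message text.
func isStale(err error) bool {
	return errors.Is(err, errEnvVarChanged)
}

func main() {
	if err := apply(); isStale(err) {
		fmt.Println("plan is stale; re-run the plan step")
	}
}
```

Matching on the sentinel rather than on message text keeps callers working if the diagnostic wording changes.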
Comment on lines +44 to +70
// preservedFingerprint is a sentinel value indicating an env-var was set at
// plan time but is unset at apply time (sub-action cleanup is the canonical
// case). ComputeDrift (T1.5) skips drift detection for keys whose applySnap
// value is this sentinel. UNEXPORTED (rev6 — addresses cycle-5 Important on
// external-bypass channel): NewTolerantEnvProvider is the only sanctioned
// way to inject the sentinel; external callers cannot defeat drift detection.
//
// Cross-function contract:
// - Compute (this file, in-package) passes the sentinel through unhashed.
// - NewTolerantEnvProvider (this file) returns the sentinel for plan-time-set
// but apply-time-unset vars (in-package access to the constant).
// - ComputeDrift (compute_drift.go, T1.5, same package) honors the sentinel
// by skipping drift detection for that key.
const preservedFingerprint = "__plan_time_preserved__"

// NewTolerantEnvProvider returns an EnvProvider closure used by the
// in-process apply postcondition (T3.1.5). When a var was set at plan time
// (present in planSnapshot) but is now unset (sub-action cleanup), the
// closure returns the in-package preservedFingerprint sentinel so
// ComputeDrift suppresses the (false-positive) drift entry. For vars
// genuinely unset at both times, returns ("", false) → Compute drops the
// key from the resulting map.
//
// This is the ONLY sanctioned way to inject the preservation sentinel.
// Direct callers of Compute with a custom env-provider cannot construct
// the sentinel value because it is unexported.
func NewTolerantEnvProvider(planSnapshot map[string]string) func(name string) (string, bool) {
Comment thread cmd/wfctl/infra_plan_gitignore.go Outdated
Comment on lines +63 to +66
func gitignoreCovers(data []byte, base, planAbs, gitignoreDir string) bool {
	ext := filepath.Ext(base)
	scanner := bufio.NewScanner(strings.NewReader(string(data)))
	for scanner.Scan() {
Comment thread interfaces/iac_state.go Outdated
Comment on lines +58 to +60
// InputSnapshot records every env var name read during ${VAR} substitution
// at plan time, mapped to a 16-hex-char (64-bit) sha256 prefix of the value.
// Apply re-computes inputs and prints diagnostic on mismatch.

github-actions Bot commented May 3, 2026

⏱ Benchmark Results

No significant performance regressions detected.

benchstat comparison (baseline → PR)
## benchstat: baseline → PR
baseline-bench.txt:245: parsing iteration count: invalid syntax
baseline-bench.txt:290613: parsing iteration count: invalid syntax
baseline-bench.txt:644485: parsing iteration count: invalid syntax
baseline-bench.txt:968099: parsing iteration count: invalid syntax
baseline-bench.txt:1277031: parsing iteration count: invalid syntax
baseline-bench.txt:1592872: parsing iteration count: invalid syntax
benchmark-results.txt:247: parsing iteration count: invalid syntax
benchmark-results.txt:324578: parsing iteration count: invalid syntax
benchmark-results.txt:614589: parsing iteration count: invalid syntax
benchmark-results.txt:936970: parsing iteration count: invalid syntax
benchmark-results.txt:1212041: parsing iteration count: invalid syntax
benchmark-results.txt:1501269: parsing iteration count: invalid syntax
goos: linux
goarch: amd64
pkg: github.com/GoCodeAlone/workflow/dynamic
cpu: AMD EPYC 7763 64-Core Processor                
                            │ baseline-bench.txt │        benchmark-results.txt        │
                            │       sec/op       │    sec/op      vs base              │
InterpreterCreation-4              3.779m ± 172%   3.353m ± 213%       ~ (p=0.485 n=6)
ComponentLoad-4                    3.707m ±   1%   3.609m ±   1%  -2.66% (p=0.002 n=6)
ComponentExecute-4                 1.952µ ±   1%   1.935µ ±   3%       ~ (p=0.288 n=6)
PoolContention/workers-1-4         1.103µ ±   4%   1.085µ ±   2%       ~ (p=0.087 n=6)
PoolContention/workers-2-4         1.099µ ±   5%   1.098µ ±   3%       ~ (p=0.853 n=6)
PoolContention/workers-4-4         1.101µ ±   1%   1.085µ ±   1%  -1.45% (p=0.006 n=6)
PoolContention/workers-8-4         1.101µ ±   1%   1.088µ ±   1%  -1.18% (p=0.002 n=6)
PoolContention/workers-16-4        1.120µ ±   3%   1.093µ ±   7%       ~ (p=0.065 n=6)
ComponentLifecycle-4               3.825m ±   1%   3.630m ±   1%  -5.12% (p=0.002 n=6)
SourceValidation-4                 2.431µ ±   0%   2.257µ ±   1%  -7.18% (p=0.002 n=6)
RegistryConcurrent-4               900.1n ±   3%   831.1n ±   3%  -7.67% (p=0.002 n=6)
LoaderLoadFromString-4             3.855m ±   1%   3.640m ±   1%  -5.57% (p=0.002 n=6)
geomean                            18.34µ          17.61µ         -3.98%

                            │ baseline-bench.txt │        benchmark-results.txt         │
                            │        B/op        │     B/op      vs base                │
InterpreterCreation-4               2.027Mi ± 0%   2.027Mi ± 0%       ~ (p=0.853 n=6)
ComponentLoad-4                     2.180Mi ± 0%   2.180Mi ± 0%       ~ (p=0.896 n=6)
ComponentExecute-4                  1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-1-4          1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-2-4          1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-4-4          1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-8-4          1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-16-4         1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
ComponentLifecycle-4                2.183Mi ± 0%   2.183Mi ± 0%       ~ (p=0.331 n=6)
SourceValidation-4                  1.984Ki ± 0%   1.984Ki ± 0%       ~ (p=1.000 n=6) ¹
RegistryConcurrent-4                1.133Ki ± 0%   1.133Ki ± 0%       ~ (p=1.000 n=6) ¹
LoaderLoadFromString-4              2.182Mi ± 0%   2.182Mi ± 0%       ~ (p=0.907 n=6)
geomean                             15.25Ki        15.25Ki       -0.00%
¹ all samples are equal

                            │ baseline-bench.txt │        benchmark-results.txt        │
                            │     allocs/op      │  allocs/op   vs base                │
InterpreterCreation-4                15.68k ± 0%   15.68k ± 0%       ~ (p=1.000 n=6)
ComponentLoad-4                      18.02k ± 0%   18.02k ± 0%       ~ (p=1.000 n=6)
ComponentExecute-4                    25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-1-4            25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-2-4            25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-4-4            25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-8-4            25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-16-4           25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
ComponentLifecycle-4                 18.07k ± 0%   18.07k ± 0%       ~ (p=1.000 n=6) ¹
SourceValidation-4                    32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=6) ¹
RegistryConcurrent-4                  2.000 ± 0%    2.000 ± 0%       ~ (p=1.000 n=6) ¹
LoaderLoadFromString-4               18.06k ± 0%   18.06k ± 0%       ~ (p=1.000 n=6) ¹
geomean                               183.3         183.3       +0.00%
¹ all samples are equal

pkg: github.com/GoCodeAlone/workflow/middleware
                                  │ baseline-bench.txt │       benchmark-results.txt       │
                                  │       sec/op       │   sec/op     vs base              │
CircuitBreakerDetection-4                 287.2n ± 17%   285.4n ± 4%  -0.64% (p=0.022 n=6)
CircuitBreakerExecution_Success-4         21.52n ±  0%   21.52n ± 0%       ~ (p=0.974 n=6)
CircuitBreakerExecution_Failure-4         65.88n ±  1%   65.30n ± 0%  -0.87% (p=0.002 n=6)
geomean                                   74.12n         73.74n       -0.50%

                                  │ baseline-bench.txt │       benchmark-results.txt        │
                                  │        B/op        │    B/op     vs base                │
CircuitBreakerDetection-4                 144.0 ± 0%     144.0 ± 0%       ~ (p=1.000 n=6) ¹
CircuitBreakerExecution_Success-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
CircuitBreakerExecution_Failure-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                              ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                  │ baseline-bench.txt │       benchmark-results.txt        │
                                  │     allocs/op      │ allocs/op   vs base                │
CircuitBreakerDetection-4                 1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=6) ¹
CircuitBreakerExecution_Success-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
CircuitBreakerExecution_Failure-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                              ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

pkg: github.com/GoCodeAlone/workflow/module
                                 │ baseline-bench.txt │       benchmark-results.txt        │
                                 │       sec/op       │    sec/op     vs base              │
JQTransform_Simple-4                     881.2n ± 26%   875.9n ± 33%       ~ (p=0.699 n=6)
JQTransform_ObjectConstruction-4         1.465µ ±  0%   1.444µ ±  1%  -1.40% (p=0.002 n=6)
JQTransform_ArraySelect-4                3.451µ ±  3%   3.329µ ±  1%  -3.52% (p=0.002 n=6)
JQTransform_Complex-4                    39.73µ ±  0%   38.76µ ±  0%  -2.43% (p=0.002 n=6)
JQTransform_Throughput-4                 1.819µ ±  2%   1.774µ ±  1%  -2.50% (p=0.002 n=6)
SSEPublishDelivery-4                     69.89n ±  5%   73.41n ±  1%  +5.04% (p=0.002 n=6)
geomean                                  1.680µ         1.664µ        -0.94%

                                 │ baseline-bench.txt │        benchmark-results.txt         │
                                 │        B/op        │     B/op      vs base                │
JQTransform_Simple-4                   1.273Ki ± 0%     1.273Ki ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_ObjectConstruction-4       1.773Ki ± 0%     1.773Ki ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_ArraySelect-4              2.625Ki ± 0%     2.625Ki ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_Complex-4                  16.22Ki ± 0%     16.22Ki ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_Throughput-4               1.984Ki ± 0%     1.984Ki ± 0%       ~ (p=1.000 n=6) ¹
SSEPublishDelivery-4                     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                             ²                 +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                 │ baseline-bench.txt │       benchmark-results.txt        │
                                 │     allocs/op      │ allocs/op   vs base                │
JQTransform_Simple-4                     10.00 ± 0%     10.00 ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_ObjectConstruction-4         15.00 ± 0%     15.00 ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_ArraySelect-4                30.00 ± 0%     30.00 ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_Complex-4                    324.0 ± 0%     324.0 ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_Throughput-4                 17.00 ± 0%     17.00 ± 0%       ~ (p=1.000 n=6) ¹
SSEPublishDelivery-4                     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                             ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

pkg: github.com/GoCodeAlone/workflow/schema
                                    │ baseline-bench.txt │       benchmark-results.txt       │
                                    │       sec/op       │   sec/op     vs base              │
SchemaValidation_Simple-4                   1.103µ ± 21%   1.113µ ± 1%       ~ (p=0.394 n=6)
SchemaValidation_AllFields-4                1.659µ ±  2%   1.661µ ± 2%       ~ (p=0.485 n=6)
SchemaValidation_FormatValidation-4         1.587µ ±  2%   1.576µ ± 2%       ~ (p=0.851 n=6)
SchemaValidation_ManySchemas-4              1.796µ ±  1%   1.825µ ± 3%       ~ (p=0.240 n=6)
geomean                                     1.511µ         1.518µ       +0.50%

                                    │ baseline-bench.txt │       benchmark-results.txt        │
                                    │        B/op        │    B/op     vs base                │
SchemaValidation_Simple-4                   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_AllFields-4                0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_FormatValidation-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_ManySchemas-4              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                                ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                    │ baseline-bench.txt │       benchmark-results.txt        │
                                    │     allocs/op      │ allocs/op   vs base                │
SchemaValidation_Simple-4                   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_AllFields-4                0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_FormatValidation-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_ManySchemas-4              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                                ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

pkg: github.com/GoCodeAlone/workflow/store
                                   │ baseline-bench.txt │        benchmark-results.txt        │
                                   │       sec/op       │    sec/op     vs base               │
EventStoreAppend_InMemory-4                1.197µ ± 13%   1.216µ ± 14%        ~ (p=0.818 n=6)
EventStoreAppend_SQLite-4                  1.399m ±  4%   1.371m ±  8%        ~ (p=0.310 n=6)
GetTimeline_InMemory/events-10-4           13.66µ ±  4%   14.63µ ±  4%   +7.09% (p=0.004 n=6)
GetTimeline_InMemory/events-50-4           75.64µ ±  1%   63.40µ ± 10%  -16.18% (p=0.002 n=6)
GetTimeline_InMemory/events-100-4          133.1µ ± 17%   126.9µ ±  2%        ~ (p=1.000 n=6)
GetTimeline_InMemory/events-500-4          626.0µ ±  1%   655.7µ ±  1%   +4.74% (p=0.002 n=6)
GetTimeline_InMemory/events-1000-4         1.274m ±  0%   1.368m ±  1%   +7.37% (p=0.002 n=6)
GetTimeline_SQLite/events-10-4             108.5µ ±  1%   120.3µ ±  1%  +10.89% (p=0.002 n=6)
GetTimeline_SQLite/events-50-4             249.4µ ±  0%   270.4µ ±  1%   +8.41% (p=0.002 n=6)
GetTimeline_SQLite/events-100-4            419.4µ ±  1%   440.1µ ±  1%   +4.93% (p=0.002 n=6)
GetTimeline_SQLite/events-500-4            1.789m ±  1%   1.881m ±  1%   +5.16% (p=0.002 n=6)
GetTimeline_SQLite/events-1000-4           3.479m ±  1%   3.633m ±  1%   +4.42% (p=0.002 n=6)
geomean                                    220.2µ         225.4µ         +2.39%

                                   │ baseline-bench.txt │        benchmark-results.txt         │
                                   │        B/op        │     B/op      vs base                │
EventStoreAppend_InMemory-4                  800.5 ± 7%     778.0 ± 7%       ~ (p=0.677 n=6)
EventStoreAppend_SQLite-4                  1.982Ki ± 3%   1.986Ki ± 2%       ~ (p=0.197 n=6)
GetTimeline_InMemory/events-10-4           7.953Ki ± 0%   7.953Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-50-4           46.62Ki ± 0%   46.62Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-100-4          94.48Ki ± 0%   94.48Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-500-4          472.8Ki ± 0%   472.8Ki ± 0%       ~ (p=0.242 n=6)
GetTimeline_InMemory/events-1000-4         944.3Ki ± 0%   944.3Ki ± 0%       ~ (p=0.273 n=6)
GetTimeline_SQLite/events-10-4             16.74Ki ± 0%   16.74Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-50-4             87.14Ki ± 0%   87.14Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-100-4            175.4Ki ± 0%   175.4Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-500-4            846.1Ki ± 0%   846.1Ki ± 0%  +0.00% (p=0.002 n=6)
GetTimeline_SQLite/events-1000-4           1.639Mi ± 0%   1.639Mi ± 0%       ~ (p=0.485 n=6)
geomean                                    67.41Ki        67.26Ki       -0.22%
¹ all samples are equal

                                   │ baseline-bench.txt │        benchmark-results.txt        │
                                   │     allocs/op      │  allocs/op   vs base                │
EventStoreAppend_InMemory-4                  7.000 ± 0%    7.000 ± 0%       ~ (p=1.000 n=6) ¹
EventStoreAppend_SQLite-4                    53.00 ± 0%    53.00 ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-10-4             125.0 ± 0%    125.0 ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-50-4             653.0 ± 0%    653.0 ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-100-4           1.306k ± 0%   1.306k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-500-4           6.514k ± 0%   6.514k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-1000-4          13.02k ± 0%   13.02k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-10-4               382.0 ± 0%    382.0 ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-50-4              1.852k ± 0%   1.852k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-100-4             3.681k ± 0%   3.681k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-500-4             18.54k ± 0%   18.54k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-1000-4            37.29k ± 0%   37.29k ± 0%       ~ (p=1.000 n=6) ¹
geomean                                     1.162k        1.162k       +0.00%
¹ all samples are equal

Benchmarks run with go test -bench=. -benchmem -count=6.
Regressions ≥ 20% are flagged. Results compared via benchstat.

intel352 and others added 5 commits May 3, 2026 19:58
…view)

Empty drift report previously rendered as a 2-line message (header + hint),
contradicting the doc comment that promised a singular header line. Gate
the hint on len(drift) > 0 so the empty case stays minimal as documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
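
The gating described in this commit message could be sketched as below. This is an approximation rather than the committed code: the local DriftEntry type and the exact newline placement (no trailing newline in the empty case) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// DriftEntry mirrors the fields the formatter reads.
type DriftEntry struct {
	Name             string
	PlanFingerprint  string
	ApplyFingerprint string
}

// formatStaleError renders the header, one line per drifted key, and a
// hint. The hint is emitted only when at least one entry exists, so an
// empty report is a single header line.
func formatStaleError(drift []DriftEntry) string {
	sorted := make([]DriftEntry, len(drift))
	copy(sorted, drift)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var b strings.Builder
	fmt.Fprintf(&b, "plan stale: %d input(s) changed since plan", len(sorted))
	for _, d := range sorted {
		fmt.Fprintf(&b, "\n  %s: fingerprint %s (plan) → %s (apply)", d.Name, d.PlanFingerprint, d.ApplyFingerprint)
	}
	if len(sorted) > 0 {
		b.WriteString("\n  hint: ensure all env vars referenced by infra.yaml are exported to both Plan and Apply steps")
	}
	return b.String()
}

func main() {
	fmt.Println(formatStaleError(nil))
	fmt.Println(formatStaleError([]DriftEntry{{Name: "DB", PlanFingerprint: "aaaa", ApplyFingerprint: "bbbb"}}))
}
```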
…r-facing prefix (Copilot review)

The sentinel was wrapped via fmt.Errorf("%w\n%s", ErrEnvVarChanged,
FormatStaleError(...)), so its "plan stale:" prefix duplicated the
formatter's own "plan stale: %d input(s)..." header. Reduce the sentinel
to a short machine-only marker; FormatStaleError remains the sole owner
of the human-facing prefix. Existing test assertions match
strings.Contains(err.Error(), "plan stale") via the formatter portion of
the wrapped error and continue to pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…boundary not security (Copilot review)

Previous comment claimed external callers "cannot defeat drift detection"
because the sentinel is unexported, but any caller can return the literal
string "__plan_time_preserved__" from a custom env-provider closure.
Update both the constant doc and NewTolerantEnvProvider doc to be honest:
the unexported boundary is API hygiene, not a security guarantee. Sentinel
value unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Copilot review)

bufio.NewScanner over strings.NewReader(string(data)) made an extra copy
of the .gitignore contents; switch to bytes.NewReader(data) to scan the
slice directly. Also check scanner.Err() after the loop — oversized lines
(over bufio.MaxScanTokenSize) previously fell through silently as
"not covered". Conservative behavior: scan errors return false so an
operator-visible warning is emitted rather than silently letting plan.json
land in source control.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
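
The scanning change in this commit can be illustrated with a simplified stand-in for gitignoreCovers. The gitignoreMentions helper is hypothetical, matches exact file names only, and returns the scan error to the caller (the committed code is described as returning false instead).

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// gitignoreMentions reports whether any non-comment line names base
// exactly. It scans the byte slice directly and surfaces scanner errors,
// such as bufio.ErrTooLong for lines over bufio.MaxScanTokenSize.
func gitignoreMentions(data []byte, base string) (bool, error) {
	scanner := bufio.NewScanner(bytes.NewReader(data))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.TrimPrefix(line, "/") == base {
			return true, nil
		}
	}
	if err := scanner.Err(); err != nil {
		// Conservative: an unreadable file counts as "not covered" so
		// the caller still emits its warning.
		return false, err
	}
	return false, nil
}

func main() {
	covered, err := gitignoreMentions([]byte("# build output\nplan.json\n"), "plan.json")
	fmt.Println(covered, err)
}
```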
…ot review)

Comment said "every env var name read" but inputsnapshot.Compute and
OSEnvProvider intentionally omit unset vars from the resulting map. Make
the contract explicit: only set vars are fingerprinted; unset-at-plan +
unset-at-apply yields no drift entry by design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Comment on lines +72 to +74
if strings.HasPrefix(line, "!") {
	continue // negation rules — skip; conservative (warn even if a later rule re-includes)
}
Comment on lines +21 to +26
if val == preservedFingerprint {
	// Sentinel from NewTolerantEnvProvider — pass through unhashed
	// so ComputeDrift recognizes the preservation signal. (rev6 —
	// unexported per cycle-5; in-package access only.)
	out[name] = preservedFingerprint
	continue
Comment thread iac/inputsnapshot/snapshot_test.go Outdated
Comment on lines +45 to +46
func TestNewTolerantEnvProvider_UnsetButPlanned_ReturnsSentinel(t *testing.T) {
	os.Unsetenv("STAGING_PG_PASSWORD")
Comment thread interfaces/iac_state.go Outdated
Comment on lines +73 to +74
// ResolvedConfigHash is the SHA-256 of POST-substitution Resource.Config.
// Apply re-computes per-action and surfaces per-resource diagnostic on mismatch.
intel352 and others added 4 commits May 3, 2026 20:08
…(Copilot review)

Heuristic skips !-prefixed negation rules and returns on first positive
match, so a "*.json" then "!plan.json" pattern silently passes. Acceptable
for a nudge-not-enforce warning; document the limitation in the negation
branch comment so future maintainers know the boundary. Full
last-matching-rule-wins semantics or git check-ignore shell-out are out
of scope for W-1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n impossible (Copilot review)

Previously a real env var whose value happened to equal
"__plan_time_preserved__" would be treated as a preservation sentinel and
silently suppress drift detection. Embed a NUL byte: POSIX exec(3) and
Windows CreateProcess both reject NUL inside env values, so no var the OS
delivers to a Go process can collide with the constant. In-package call
sites (Compute, NewTolerantEnvProvider, ComputeDrift) compare by string
equality against the constant — value change is transparent to them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
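
A sketch of how a NUL-bearing sentinel and a tolerant provider might fit together, under stated assumptions: the constant's exact value is invented here (only the presence of a NUL byte comes from the commit message), and the closure body is inferred from the NewTolerantEnvProvider doc comment rather than copied from the PR.

```go
package main

import (
	"fmt"
	"os"
)

// preservedFingerprint embeds a NUL byte so it cannot equal a value the OS
// hands to the process. The value itself is illustrative.
const preservedFingerprint = "\x00plan_time_preserved"

// newTolerantEnvProvider returns a lookup closure. A variable that was
// present in planSnapshot but is now unset yields the sentinel; a variable
// unset at both times yields ("", false).
func newTolerantEnvProvider(planSnapshot map[string]string) func(name string) (string, bool) {
	return func(name string) (string, bool) {
		if val, ok := os.LookupEnv(name); ok {
			return val, true
		}
		if _, planned := planSnapshot[name]; planned {
			return preservedFingerprint, true
		}
		return "", false
	}
}

func main() {
	// The variable name is chosen to be unique so it is not set in practice.
	provider := newTolerantEnvProvider(map[string]string{
		"WFCTL_SKETCH_UNSET_KEY": "0123456789abcdef",
	})
	val, ok := provider("WFCTL_SKETCH_UNSET_KEY")
	fmt.Println(ok, val == preservedFingerprint)
}
```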
…v (Copilot review)

Previous test called os.Unsetenv("STAGING_PG_PASSWORD") without restoring
the prior value, leaking state across the test process and creating
order-dependence with any other test that reads STAGING_PG_PASSWORD.
Switch to a test-unique env var name (WFCTL_TEST_INPUTSNAPSHOT_UNSET_KEY)
the test never sets, so no cleanup is needed and there is no cross-test
state leak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment said "SHA-256 of POST-substitution Resource.Config" but didn't
specify lower-case-hex encoding (no "sha256:" prefix) or the empty-string
short-circuit when the config map is empty (platform.ConfigHash behavior).
Make the contract explicit so downstream consumers don't guess.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
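
The contract spelled out above (lower-case hex, no "sha256:" prefix, empty string for an empty config map) can be sketched as follows. This is not `platform.ConfigHash` itself: the JSON canonicalization step and the name `configHashSketch` are assumptions, since the real serialization is not shown in this thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// configHashSketch is a hypothetical illustration of the documented contract.
func configHashSketch(cfg map[string]any) (string, error) {
	if len(cfg) == 0 {
		return "", nil // empty-config short-circuit
	}
	// Assumption: JSON encoding as the canonical form. encoding/json sorts
	// map keys, which makes the digest independent of iteration order.
	raw, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil // 64 lower-case hex chars, no prefix
}

func main() {
	h, err := configHashSketch(map[string]any{"size": "small", "region": "nyc3"})
	fmt.Println(len(h), err)

	empty, _ := configHashSketch(nil)
	fmt.Printf("%q\n", empty)
}
```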
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Comment thread interfaces/iac_state_test.go Outdated
Comment on lines +38 to +49
func TestPlanAction_ResolvedConfigHashField(t *testing.T) {
a := PlanAction{Action: "create", ResolvedConfigHash: "sha256:abc"}
data, err := json.Marshal(a)
if err != nil {
t.Fatal(err)
}
var got PlanAction
if err := json.Unmarshal(data, &got); err != nil {
t.Fatal(err)
}
if got.ResolvedConfigHash != "sha256:abc" {
t.Errorf("ResolvedConfigHash: got %q", got.ResolvedConfigHash)
Comment thread cmd/wfctl/infra.go
Comment on lines +1086 to +1090
// Check the input-fingerprint drift first so the operator gets a
// per-key diagnostic instead of the generic config-hash mismatch.
// (Env-var changes are a strict subset of config-hash differences;
// flagging them here yields the actionable message.) Names list is
// derived from plan.InputSnapshot keys — no separate InputNames field.
Comment thread cmd/wfctl/infra_apply_plan_test.go Outdated
Comment on lines +20 to +26
// fingerprintForTest matches inputsnapshot.Compute's fingerprint format
// (16-hex-char sha256 prefix) so tests can construct expected plan-time
// snapshots without depending on the concrete env-provider closure.
func fingerprintForTest(value string) string {
sum := sha256.Sum256([]byte(value))
return hex.EncodeToString(sum[:])[:16]
}
Comment thread cmd/wfctl/infra_plan_gitignore.go Outdated
Comment on lines +105 to +111
// Scanner errors (e.g. line longer than bufio.MaxScanTokenSize) cause
// silent fall-through if not checked. Conservative: treat scan failure
// as not-covered, which surfaces a warning the operator can investigate
// rather than silently letting plan.json land in source control.
if err := scanner.Err(); err != nil {
return false
}
Comment on lines +44 to +54
func TestNewTolerantEnvProvider_UnsetButPlanned_ReturnsSentinel(t *testing.T) {
// Use a test-unique env-var name to avoid colliding with anything the
// process or other tests might rely on; we never set or unset it, so
// no cleanup is required and there is no cross-test state leak.
const key = "WFCTL_TEST_INPUTSNAPSHOT_UNSET_KEY"
plan := map[string]string{key: "deadbeef00000000"}
provider := NewTolerantEnvProvider(plan)
val, ok := provider(key)
if !ok || val != preservedFingerprint {
t.Errorf("expected (preservedFingerprint, true) for plan-time-set unset-now var; got (%q, %v)", val, ok)
}
intel352 and others added 4 commits May 3, 2026 20:20
…ip (Copilot review)

Test fixture used "sha256:abc" but the actual format produced by
platform.ConfigHash is a lower-case 64-hex sha256 digest with no prefix.
Replace with a realistic 64-hex value so the test mirrors on-disk shape
and won't mislead a future validator/refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (Copilot review)

runInfraPlan stamps SchemaVersion=1 on every emitted plan, but
runInfraApply was not validating it — a future bump (e.g. W-5 JIT plans)
would be silently mis-read by an older binary. Add an infraPlanSchemaVersion
constant + guard that rejects plans with schema_version > supported,
returning a clear "newer than this wfctl supports" message. Plans with
SchemaVersion=0 (predating the field) remain accepted for back-compat.
Test: TestInfraApplyConsumesPlan_FutureSchemaRejected.

Also reuse inputsnapshot.Compute in fingerprintForTest so the test always
exercises the production fingerprint algorithm — re-implementing
sha256+16-hex inline would silently drift if the scheme changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
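
The guard described above amounts to a one-sided version comparison. A minimal sketch, assuming a constant equivalent to `infraPlanSchemaVersion`; the function name and the full error wording are hypothetical, with only the "newer than this wfctl supports" phrase taken from the commit message.

```go
package main

import "fmt"

// supportedSchemaVersion stands in for the PR's infraPlanSchemaVersion constant.
const supportedSchemaVersion = 1

// checkSchemaVersion rejects plans written by a newer wfctl. Version 0
// (plans that predate the field) is accepted for back-compat.
func checkSchemaVersion(got int) error {
	if got > supportedSchemaVersion {
		return fmt.Errorf("plan schema_version %d is newer than this wfctl supports (max %d)",
			got, supportedSchemaVersion)
	}
	return nil
}

func main() {
	for _, v := range []int{0, 1, 2} {
		fmt.Printf("schema_version=%d -> %v\n", v, checkSchemaVersion(v))
	}
}
```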
…parse failure (Copilot review)

Previous round added a scanner.Err() check that returned false either way,
making the branch a no-op. Change the helper signature to (bool, error)
and have warnIfPlanNotGitignored emit a "could not scan ... for plan.json
coverage" stderr warning when the underlying bufio.Scanner fails (e.g. a
line over bufio.MaxScanTokenSize). The "not covered" warning is suppressed
on scan failure so the operator sees the parse error rather than a
potentially-misleading coverage warning. Test: TestGitignoreCovers_ScanError_Propagates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…with cleanup (Copilot review)

Previous round used a unique env-var name and assumed it was unset, but a
hostile CI environment could pre-set it and silently flip the test result.
Explicitly Unsetenv at start, restore prior value (if any) via t.Cleanup
so the test cannot leak state across the process.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intel352 intel352 requested a review from Copilot May 4, 2026 00:20
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Comment on lines +40 to +62
for {
gitignore := filepath.Join(dir, ".gitignore")
if data, err := os.ReadFile(gitignore); err == nil {
foundAny = true
ok, scanErr := gitignoreCovers(data, base, abs, dir)
if scanErr != nil {
// Surface parse failure to the operator (line over
// bufio.MaxScanTokenSize, etc.) rather than silently
// pretending the file is/isn't covered.
fmt.Fprintf(w, "warning: could not scan %s for %s coverage: %v\n", gitignore, base, scanErr)
scanFailed = true
}
if ok {
covered = true
break
}
}
parent := filepath.Dir(dir)
if parent == dir {
break // reached filesystem root
}
dir = parent
}
Comment thread interfaces/iac_state.go Outdated
Comment on lines +58 to +62
// InputSnapshot records every env var name read during ${VAR} substitution
// at plan time, fingerprinting only those that were SET (16-hex-char sha256
// prefix of the value). Unset vars are omitted from the map; their absence
// at apply time is therefore not flagged as drift. Apply re-computes inputs
// and prints diagnostic on mismatch.
intel352 and others added 2 commits May 3, 2026 20:52
…iew)

Previous heuristic walked .gitignore files from the plan dir up to the
filesystem root, so an unrelated /tmp/.gitignore or $HOME/.gitignore
could shadow the real coverage check (or flake the not-covered tests).
Add findGitWorktreeRoot — pure stat-based discovery that walks up
looking for a .git entry (handles both git directories and git-worktree
pointer files). The walk now terminates at the worktree root so unrelated
ancestor .gitignore files are ignored, and outside any git worktree the
warning stays silent entirely. Tests updated to mark t.TempDir() as a
worktree (mkdir .git) where they expect the heuristic to activate;
TestPlan_NoGitWorktree_NoWarning added to cover the silent-when-untracked
path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… gap (Copilot review)

The InputSnapshot field comment claimed completeness ("env var names read
during ${VAR} substitution"), but the cmd/wfctl scanner that populates
the map intentionally does not apply top-level environments[env].envVars
defaults — that limitation was already documented at
collectInfraEnvVarRefs but not surfaced in the public interface
contract. Add a "Completeness caveat" note pointing at the scanner's
limitation so consumers don't assume the snapshot is exhaustive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Comment on lines +22 to +30
func ComputeDrift(planSnap, applySnap map[string]string) []interfaces.DriftEntry {
var drift []interfaces.DriftEntry
for name, planFP := range planSnap {
applyFP, present := applySnap[name]
if !present {
drift = append(drift, interfaces.DriftEntry{
Name: name,
PlanFingerprint: planFP,
ApplyFingerprint: unsetFingerprintPlaceholder,
…rder (Copilot review)

Map iteration order in Go is randomized, so consumers that marshal /
log / compare the returned drift slice (now exposed via *StaleError.Drift)
would see non-deterministic output across runs. FormatStaleError already
sorts independently for its printed output; sort the structured slice
once at the source so all downstream consumers benefit. Test:
TestComputeDrift_ResultIsSortedByName.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
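
A minimal sketch of sorting the drift slice at the source. The local `DriftEntry` type copies the three field names visible in the review snippet; the placeholder value and the function name `computeDriftSketch` are hypothetical, and the preservation-sentinel handling that the real `ComputeDrift` performs is left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// DriftEntry mirrors the field names shown in the review thread.
type DriftEntry struct {
	Name             string
	PlanFingerprint  string
	ApplyFingerprint string
}

// unsetPlaceholder is a hypothetical stand-in for unsetFingerprintPlaceholder.
const unsetPlaceholder = "<unset>"

func computeDriftSketch(planSnap, applySnap map[string]string) []DriftEntry {
	var drift []DriftEntry
	for name, planFP := range planSnap { // map order is randomized
		applyFP, present := applySnap[name]
		switch {
		case !present:
			drift = append(drift, DriftEntry{name, planFP, unsetPlaceholder})
		case applyFP != planFP:
			drift = append(drift, DriftEntry{name, planFP, applyFP})
		}
	}
	// Sort once here so every consumer sees a deterministic order.
	sort.Slice(drift, func(i, j int) bool { return drift[i].Name < drift[j].Name })
	return drift
}

func main() {
	plan := map[string]string{"ZED": "aa", "ALPHA": "bb", "MID": "cc"}
	apply := map[string]string{"ZED": "ff", "MID": "cc"}
	for _, d := range computeDriftSketch(plan, apply) {
		fmt.Println(d.Name, d.PlanFingerprint, d.ApplyFingerprint)
	}
}
```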
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Comment thread interfaces/iac_state.go
Comment on lines +80 to +90
// ResolvedConfigHash is the SHA-256 of POST-substitution Resource.Config,
// computed via platform.ConfigHash. Encoded as lower-case hex (no
// "sha256:" prefix); empty string when the config map is empty
// (platform.ConfigHash short-circuit).
//
// Currently populated by ComputePlan and persisted in plan.json so apply
// has the per-action hash available; the apply-time consumer that surfaces
// a per-resource diagnostic on mismatch is wired in a follow-up PR (W-3a/
// T3.1.5). Until then the field is observable via plan.json inspection but
// not yet enforced at apply.
ResolvedConfigHash string `json:"resolved_config_hash,omitempty"`
…pilot review)

Field is tagged json:",omitempty" so the empty-string case is dropped
from plan.json entirely rather than persisted as ""; consumers should
treat "key missing" and "value == empty string" as the same condition.
Comment now states this explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Comment thread iac/inputsnapshot/snapshot.go Outdated
// Compute returns a map of env-var name → 16-hex-char sha256 prefix of the value.
// Variables that aren't set (lookup returns ok=false) are omitted from the snapshot.
func Compute(varNames []string, lookup func(string) (string, bool)) map[string]string {
out := make(map[string]string)
Comment thread cmd/wfctl/infra_plan_gitignore.go Outdated
Comment on lines +144 to +148
// Relative path from .gitignore dir, e.g. "cmd/wfctl/plan.json".
if rel, err := filepath.Rel(gitignoreDir, planAbs); err == nil {
if anchored == rel || anchored == filepath.ToSlash(rel) {
return true, nil
}
…stants (Copilot review)

Two micro-optimizations:
- inputsnapshot.Compute now allocates with len(varNames) capacity hint
  to avoid grow-resize cycles when many env vars are referenced.
- gitignoreCovers hoists filepath.Rel/ToSlash and the base-derived
  pattern strings (starExt, doubleStarExt, doubleStarBase) out of the
  per-line scan loop — they're constant for the whole .gitignore file.
  No behavior change; less per-line allocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
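
For reference, a sketch of a `Compute`-style function with the capacity hint applied. The fingerprint format (first 16 hex characters of the SHA-256 of the value) and the omit-when-unset rule come from the comments quoted in this thread; the name `computeSketch` is hypothetical and the preservation-sentinel branch of the real function is omitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// computeSketch fingerprints each set variable; unset variables are omitted.
func computeSketch(varNames []string, lookup func(string) (string, bool)) map[string]string {
	// Capacity hint avoids repeated map growth when many vars are referenced.
	out := make(map[string]string, len(varNames))
	for _, name := range varNames {
		val, ok := lookup(name)
		if !ok {
			continue
		}
		sum := sha256.Sum256([]byte(val))
		out[name] = hex.EncodeToString(sum[:])[:16]
	}
	return out
}

func main() {
	env := map[string]string{"DB_PASSWORD": "hunter2"}
	lookup := func(k string) (string, bool) { v, ok := env[k]; return v, ok }

	snap := computeSketch([]string{"DB_PASSWORD", "NOT_SET"}, lookup)
	fmt.Println(len(snap), len(snap["DB_PASSWORD"])) // 1 16
}
```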
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.

@intel352 intel352 merged commit 48f7a0c into main May 4, 2026
21 of 22 checks passed
@intel352 intel352 deleted the feat/iac-plan-schema-diagnostic branch May 4, 2026 01:20
intel352 added a commit that referenced this pull request May 4, 2026
…_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.
intel352 added a commit that referenced this pull request May 4, 2026
…ft postcondition + diff cache (W-3a of 12) (#527)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (100 reads × 5ms / 8 workers ≈ 63ms of sleep
in total; well under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
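
A minimal sketch of the omitempty behavior the test locks in. The struct below is a hypothetical stand-in: the JSON keys come from the message above, but the Go field types are assumptions and may not match interfaces.ApplyResult.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult is a hypothetical stand-in; field types are assumed.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and returns the JSON text.
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// nil fields: all three keys are omitted.
	fmt.Println(encode(applyResult{}))
	// empty-but-non-nil fields: omitempty drops zero-length maps and
	// slices too, so the keys are still absent.
	fmt.Println(encode(applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}))
	// populated field: keyed by the replaced resource's name.
	fmt.Println(encode(applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```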

* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Split the loop-continues-after-failure
  coverage out into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  that uses a selectiveFakeProvider erroring on one type only, which
  proves one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
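
The loop shape described above (ctx check at the iteration boundary, best-effort continuation after a per-action failure, string-valued errors) can be sketched roughly as below. The action and result types and the applyAll name are hypothetical; the real ApplyPlan signature is not shown in this message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// action and result are hypothetical stand-ins for the plan and
// ApplyResult types; only the control flow matters here.
type action struct {
	Name string
	Fail bool // simulate a driver-resolution failure for this action
}

type result struct {
	Applied []string
	Errors  []string
}

// applyAll checks ctx at each iteration boundary and otherwise keeps
// going after a per-action failure.
func applyAll(ctx context.Context, actions []action) (result, error) {
	var res result
	for _, a := range actions {
		if err := ctx.Err(); err != nil {
			// Return what was accumulated plus the ctx error.
			return res, err
		}
		if a.Fail {
			cause := errors.New("no driver for " + a.Name)
			// Errors holds strings, so %v is enough; a %w chain
			// would not survive the conversion anyway.
			res.Errors = append(res.Errors, fmt.Sprintf("resolve driver: %v", cause))
			continue
		}
		res.Applied = append(res.Applied, a.Name)
	}
	return res, nil
}

// demo runs the loop once with a live context and once pre-canceled.
func demo() (applied, failed, appliedAfterCancel int, canceled bool) {
	plan := []action{{Name: "a", Fail: true}, {Name: "b"}}
	res, _ := applyAll(context.Background(), plan)

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	res2, err := applyAll(ctx, plan)
	return len(res.Applied), len(res.Errors), len(res2.Applied), errors.Is(err, context.Canceled)
}

func main() {
	applied, failed, after, canceled := demo()
	fmt.Println(applied, failed, after, canceled)
}
```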

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  propagate to both test files automatically, which keeps the
  cross-package fixtures in parity.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."
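
The errors.Is boundary mentioned in the T3.2 notes can be illustrated with a small sketch. errReadFailed and recoverErr are hypothetical names; the point is only that a %w chain matches in-package, stops matching once the error is flattened to a string, and that the prefix then becomes the discriminant.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errReadFailed is a hypothetical sentinel used for illustration.
var errReadFailed = errors.New("read failed")

// recoverErr wraps the sentinel with the canonical prefix.
func recoverErr() error {
	return fmt.Errorf("upsert: read after conflict: %w", errReadFailed)
}

// check reports matching before and after the string conversion.
func check() (inPackage, afterString, byPrefix bool) {
	err := recoverErr()
	inPackage = errors.Is(err, errReadFailed)

	// Storing err.Error() in a string field discards the chain.
	msg := err.Error()
	afterString = errors.Is(errors.New(msg), errReadFailed)
	byPrefix = strings.HasPrefix(msg, "upsert: read after conflict:")
	return
}

func main() {
	inPkg, afterStr, byPrefix := check()
	fmt.Println(inPkg, afterStr, byPrefix)
}
```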

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.
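
A rough sketch of the guard, under stated assumptions: fakeDriver, replace, and the "replace: delete:" / "replace: create:" prefixes are hypothetical, and only the "replace: canceled after delete:" shape is taken from this message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fakeDriver counts calls; onDelete lets a caller cancel the context
// as a side effect of Delete. All names here are hypothetical.
type fakeDriver struct {
	deletes, creates int
	onDelete         func()
}

func (d *fakeDriver) Delete(ctx context.Context) error {
	d.deletes++
	if d.onDelete != nil {
		d.onDelete()
	}
	return nil
}

func (d *fakeDriver) Create(ctx context.Context) (string, error) {
	d.creates++
	return "new-id", nil
}

// replace is delete-then-create with a cancellation guard in between.
func replace(ctx context.Context, d *fakeDriver) (string, error) {
	if err := d.Delete(ctx); err != nil {
		return "", fmt.Errorf("replace: delete: %w", err) // assumed prefix
	}
	if err := ctx.Err(); err != nil {
		// %w keeps context.Canceled and context.DeadlineExceeded
		// matchable through errors.Is.
		return "", fmt.Errorf("replace: canceled after delete: %w", err)
	}
	id, err := d.Create(ctx)
	if err != nil {
		return "", fmt.Errorf("replace: create: %w", err) // assumed prefix
	}
	return id, nil
}

// demo cancels the context from inside Delete and reports the outcome.
func demo() (deletes, creates int, msg string, canceled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	d := &fakeDriver{onDelete: cancel}
	_, err := replace(ctx, d)
	return d.deletes, d.creates, err.Error(), errors.Is(err, context.Canceled)
}

func main() {
	deletes, creates, msg, canceled := demo()
	fmt.Println(deletes, creates, msg, canceled)
}
```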

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering needed. On rename failure, best-effort cleanup of the
  temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."

* fix(iac): diffcache.Get refreshes mtime so LRU is actually LRU (Copilot review)

Without this, frequently-read entries were evicted as if unused
because maybeEvict orders by mtime. Now Get touches mtime via
os.Chtimes(now, now), turning eviction from FIFO-by-write into
true LRU. Mtime-touch chosen over a sidecar last-accessed file
to keep the on-disk shape trivial; cost is one extra syscall per
hit, errors are ignored (failure degrades eviction precision but
never produces wrong cache results).

Adds TestCache_LRURefreshesOnGet regression test: writes N entries,
Gets the oldest, then triggers over-cap, asserts the oldest survives
and the second-oldest (now the LRU) is evicted instead.
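
The mtime-touch idea can be sketched as below. This is an illustration only: get and oldest are hypothetical helpers, not the diffcache API, and the demo sets mtimes explicitly so the ordering does not depend on filesystem timestamp resolution.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// get reads a cache file and, on a hit, bumps its mtime so that
// mtime-ordered eviction behaves as LRU rather than FIFO-by-write.
func get(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	_ = os.Chtimes(path, now, now) // best effort; failure only blurs eviction order
	return data, nil
}

// oldest returns the entry with the earliest mtime, i.e. the next
// eviction candidate.
func oldest(dir string) (string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	var name string
	var min time.Time
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			return "", err
		}
		if name == "" || info.ModTime().Before(min) {
			name, min = e.Name(), info.ModTime()
		}
	}
	return name, nil
}

// demo writes a.json (older) and b.json (newer), then reads a.json.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "lru-sketch-")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	base := time.Now().Add(-2 * time.Hour)
	for i, n := range []string{"a.json", "b.json"} {
		p := filepath.Join(dir, n)
		if err := os.WriteFile(p, []byte("{}"), 0o600); err != nil {
			return "", "", err
		}
		mt := base.Add(time.Duration(i) * time.Hour)
		if err := os.Chtimes(p, mt, mt); err != nil {
			return "", "", err
		}
	}
	before, err := oldest(dir)
	if err != nil {
		return "", "", err
	}
	if _, err := get(filepath.Join(dir, "a.json")); err != nil {
		return "", "", err
	}
	after, err := oldest(dir)
	return before, after, err
}

func main() {
	before, after, err := demo()
	fmt.Println(before, after, err)
}
```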

* fix(iac): diffcache.Put uses unique temp filename to avoid same-key write races (Copilot review)

Pre-fix, two goroutines calling Put with the same Key both wrote to
`<key>.json.tmp` and one would clobber the other's temp file
mid-write, producing either a Rename failure or a half-written
final file. Now Put uses os.CreateTemp so each call gets a unique
`<key>.json.<random>.tmp` filename. Which payload wins the final
rename is still a race, but both payloads were derived from the same
Key, so the outcome is equivalent from the caller's perspective.

Adds godoc "Concurrency: safe for concurrent use, including
concurrent Puts of the same Key." Adds TestCache_ConcurrentSameKeyPut
regression: 20 goroutines Put the same Key, asserts no leftover
*.tmp files, asserts final cache file decodes. Run under -race.
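
A minimal sketch of write-temp-then-rename with a unique temp name, assuming a POSIX filesystem. putAtomic is a hypothetical helper rather than the diffcache.Put implementation, and the payload is made up for the demo.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// putAtomic writes data to a uniquely named temp file in dir, then
// renames it over dir/name.
func putAtomic(dir, name string, data []byte) error {
	// The "*" is replaced by a random string: name.<random>.tmp
	tmp, err := os.CreateTemp(dir, name+".*.tmp")
	if err != nil {
		return err
	}
	tmpPath := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpPath)
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpPath)
		return err
	}
	if err := os.Rename(tmpPath, filepath.Join(dir, name)); err != nil {
		os.Remove(tmpPath) // best-effort cleanup on rename failure
		return err
	}
	return nil
}

// demo runs 20 concurrent Puts of the same key and reports how many
// temp files were left behind and whether the final file is intact.
func demo() (leftoverTmp int, finalOK bool) {
	dir, err := os.MkdirTemp("", "diffcache-sketch-")
	if err != nil {
		return -1, false
	}
	defer os.RemoveAll(dir)

	payload := []byte(`{"needs_update":false}`)
	errs := make([]error, 20)
	var wg sync.WaitGroup
	for i := range errs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs[i] = putAtomic(dir, "key.json", payload)
		}(i)
	}
	wg.Wait()
	for _, e := range errs {
		if e != nil {
			return -1, false
		}
	}
	tmps, _ := filepath.Glob(filepath.Join(dir, "*.tmp"))
	got, err := os.ReadFile(filepath.Join(dir, "key.json"))
	return len(tmps), err == nil && string(got) == string(payload)
}

func main() {
	n, ok := demo()
	fmt.Println(n, ok)
}
```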

* fix(iac): diffcache.Put atomic rename on Windows (Copilot review)

Document the os.Rename Windows limitation explicitly: on Windows,
os.Rename fails when the destination exists, so an in-place cache
update via Put will fail. The caller treats this as a write failure
and proceeds without caching — correct because apply remains correct
on a 100% miss rate (per the package's cache-as-amortization framing).

We chose documentation over vendoring github.com/google/renameio:
adding renameio would introduce the first such dependency in the
repo, and there is no Windows-supported wfctl use case today. The
existing precedent in cmd/wfctl/update.go and cmd/wfctl/plugin_install.go
also uses bare os.Rename without Windows guards.

The fix tracks the limitation in two places: the Put godoc (where
the rename happens) and the package godoc Known Limitations section
(where consumers will look).

* fix(iac): diffcache returns deep-copy of DiffResult to avoid shared-slice mutation (Copilot review)

Pre-fix, the in-memory cache stored DiffResult by value but the
Changes slice ([]FieldChange) shared its backing array between the
cached entry and the value returned to the caller. A caller
mutating the returned Changes slice (element-level or via
append-into-cap) would silently mutate the cached entry. The
symmetric case had the same flaw: mutating the Put argument after
the Put call would leak into the cached value.

Fix: clone the Changes slice via slices.Clone in both Get and Put.
Scalar struct fields are value-copied by struct assignment so a
single helper (cloneDiffResult) covers both directions. The
filesystem cache deserializes from JSON each time so each Get
already yields a fresh slice — no change needed there.

FieldChange.Old/New are typed any; if a caller stores a pointer or
mutable map there, the deep-copy stops at the slice level. By
convention DiffResult.Changes carries scalar Old/New (strings,
numbers, bools), so that is the right tradeoff between correctness
and copy cost. Documented in memoryCache godoc.

Adds TestCache_MemoryDeepCopiesChanges regression: Put a value,
mutate the original argument, Get + mutate (element + append), Get
again, assert original is preserved.
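
The clone-on-both-sides fix can be sketched as follows. The type and helper names come from this message, but the field sets (Path, NeedsUpdate) and the string cache key are assumptions, and locking is left out of the sketch for brevity.

```go
package main

import (
	"fmt"
	"slices"
)

// FieldChange and DiffResult are reduced stand-ins; Path and
// NeedsUpdate are assumed fields.
type FieldChange struct {
	Path     string
	Old, New any
}

type DiffResult struct {
	NeedsUpdate bool
	Changes     []FieldChange
}

// cloneDiffResult copies the Changes backing array. Scalars are already
// copied by value; Old/New are copied shallowly.
func cloneDiffResult(r DiffResult) DiffResult {
	r.Changes = slices.Clone(r.Changes)
	return r
}

// memoryCache clones on both Put and Get so neither side shares a
// backing array with the cached entry. Not goroutine-safe as written.
type memoryCache struct{ m map[string]DiffResult }

func (c *memoryCache) Put(k string, v DiffResult) { c.m[k] = cloneDiffResult(v) }

func (c *memoryCache) Get(k string) (DiffResult, bool) {
	v, ok := c.m[k]
	if !ok {
		return DiffResult{}, false
	}
	return cloneDiffResult(v), true
}

// demo mutates the Put argument and a Get result, then reads again.
func demo() (path string, n int) {
	c := &memoryCache{m: map[string]DiffResult{}}
	in := DiffResult{NeedsUpdate: true, Changes: []FieldChange{{Path: "size", Old: "s", New: "m"}}}
	c.Put("k", in)
	in.Changes[0].Path = "mutated-after-put"

	got, _ := c.Get("k")
	got.Changes[0].Path = "mutated-after-get"
	got.Changes = append(got.Changes, FieldChange{Path: "extra"})

	again, _ := c.Get("k")
	return again.Changes[0].Path, len(again.Changes)
}

func main() {
	path, n := demo()
	fmt.Println(path, n)
}
```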

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
…s on manifest computePlanVersion (W-3b of 12) (#528)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).
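
One way to sketch the bounded-concurrency shape described above is a buffered channel used as a semaphore. The state and reader types and the refresh signature are hypothetical simplifications of Refresh(ctx, provider, states, opts); only the default of 8 and the (nil, err) failure contract are taken from this message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// state and reader are hypothetical stand-ins for the state entry and
// the driver's Read call.
type state struct {
	ID      string
	Outputs map[string]string
}

type reader func(ctx context.Context, id string) (map[string]string, error)

// refresh reads every state with at most `concurrency` calls in flight
// and returns a copy with Outputs replaced, or (nil, err) on failure.
func refresh(ctx context.Context, read reader, states []state, concurrency int) ([]state, error) {
	if concurrency < 1 {
		concurrency = 8
	}
	out := make([]state, len(states))
	copy(out, states)

	sem := make(chan struct{}, concurrency)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for i := range out {
		wg.Add(1)
		sem <- struct{}{} // blocks while the pool is full
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			live, err := read(ctx, out[i].ID)
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
				return
			}
			out[i].Outputs = live // each goroutine owns its own slot
		}(i)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return out, nil
}

// demo refreshes 100 fake states, then exercises the failure path.
func demo() (refreshed int, inputUntouched, nilOnError bool) {
	states := make([]state, 100)
	for i := range states {
		states[i] = state{ID: fmt.Sprintf("r%d", i)}
	}
	ok := func(_ context.Context, id string) (map[string]string, error) {
		return map[string]string{"endpoint": id + ".example"}, nil
	}
	out, err := refresh(context.Background(), ok, states, 0)
	if err != nil {
		return 0, false, false
	}
	for i, s := range out {
		if s.Outputs["endpoint"] == states[i].ID+".example" {
			refreshed++
		}
	}
	bad := func(context.Context, string) (map[string]string, error) {
		return nil, errors.New("read failed")
	}
	out2, err2 := refresh(context.Background(), bad, states, 8)
	return refreshed, states[0].Outputs == nil, out2 == nil && err2 != nil
}

func main() {
	n, untouched, nilOnErr := demo()
	fmt.Println(n, untouched, nilOnErr)
}
```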

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
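
The sync.Once pattern and the defensive copy can be sketched as below. This is an assumption-laden illustration: json.Unmarshal stands in for the real schema compiler, schemaJSON stands in for the //go:embed slice, and compiledSchema / schemaCopy are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// schemaJSON stands in for the embedded schema bytes.
var schemaJSON = []byte(`{"type":"object"}`)

var (
	compileOnce  sync.Once
	compiled     map[string]any
	compileErr   error
	compileCalls int // counts how often the compile step actually ran
)

// compiledSchema compiles at most once. Both the value and the error
// are captured, so every caller observes the same outcome.
func compiledSchema() (map[string]any, error) {
	compileOnce.Do(func() {
		compileCalls++
		compileErr = json.Unmarshal(schemaJSON, &compiled)
	})
	return compiled, compileErr
}

// schemaCopy returns a copy so callers cannot mutate the shared bytes.
func schemaCopy() []byte { return bytes.Clone(schemaJSON) }

// demo calls compiledSchema from 32 goroutines, then mutates a copy.
func demo() (calls int, sourceIntact bool) {
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = compiledSchema()
		}()
	}
	wg.Wait()

	cp := schemaCopy()
	cp[0] = 'X'
	return compileCalls, schemaJSON[0] == '{'
}

func main() {
	calls, intact := demo()
	fmt.Println(calls, intact)
}
```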

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
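
The difference between the plan-literal check and the ParseBool check can be sketched as follows. This is a minimal illustration, not the code in cmd/wfctl/infra_apply_refresh_pre.go: refreshEnabled is a hypothetical helper, and treating unparseable values as disabled is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled is a hypothetical helper that decides whether a raw
// WFCTL_REFRESH_OUTPUTS value enables the refresh step.
// Assumption: empty and unparseable values are treated as disabled.
func refreshEnabled(raw string) bool {
	if raw == "" {
		return false
	}
	on, err := strconv.ParseBool(raw)
	if err != nil {
		return false
	}
	return on
}

func main() {
	for _, v := range []string{"", "0", "false", "1", "true", "yes"} {
		// The plan-literal `!= ""` check enables on any non-empty value,
		// including "0" and "false".
		fmt.Printf("%-8q non-empty=%-5v parsebool=%v\n", v, v != "", refreshEnabled(v))
	}
}
```

With the non-empty check, `WFCTL_REFRESH_OUTPUTS=0` enables the feature; with ParseBool it does not.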

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

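The omitempty behavior that TestApplyResult_OmitEmptyContract locks can be illustrated with a stand-in struct. The JSON keys come from the commit message above; the Go field types (map[string]string, []string) are assumptions for the sketch, not the real ApplyResult definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult is a stand-in for the three new fields only.
// Field types are assumed for illustration.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and returns the JSON text.
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// nil values: all three keys are omitted.
	fmt.Println(encode(applyResult{}))
	// empty-but-non-nil values: omitempty drops these as well.
	fmt.Println(encode(applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}))
	// populated map: keyed by the replaced resource's name.
	fmt.Println(encode(applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```
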
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
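
The loop-boundary cancellation check and the best-effort contract described above can be sketched together. The names here (applyAll, result) are hypothetical and the action type is reduced to a string; only the control flow is meant to mirror the description.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// result is a simplified stand-in for the accumulated apply result.
type result struct {
	Applied []string
	Errors  []string
}

// applyAll checks ctx at every iteration boundary. On cancellation it
// returns what it has so far plus the ctx error. Per-action failures
// are recorded and the loop continues (best-effort).
func applyAll(ctx context.Context, actions []string, do func(string) error) (*result, error) {
	res := &result{}
	for _, a := range actions {
		if err := ctx.Err(); err != nil {
			return res, err
		}
		if err := do(a); err != nil {
			res.Errors = append(res.Errors, fmt.Sprintf("%s: %v", a, err))
			continue
		}
		res.Applied = append(res.Applied, a)
	}
	return res, nil
}

// runCanceled cancels before the first iteration, as in the pre-call
// cancel scenario, and reports what the loop did.
func runCanceled() (calls, applied int, canceled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	res, err := applyAll(ctx, []string{"create", "update"}, func(string) error {
		calls++
		return nil
	})
	return calls, len(res.Applied), errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println(runCanceled()) // 0 0 true
}
```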

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  propagate to both test files automatically, which keeps the
  cross-package fixture fingerprints in parity.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.
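
A minimal sketch of the guard, assuming a simplified replace helper that takes the two driver calls as closures. The "replace: canceled after delete:" prefix is the one quoted above; the other prefixes in the sketch are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// replace runs del, then create, and refuses to start create if ctx
// was canceled in between. Simplified stand-in for doReplace.
func replace(ctx context.Context, del, create func(context.Context) error) error {
	if err := del(ctx); err != nil {
		return fmt.Errorf("replace: delete: %w", err) // hypothetical prefix
	}
	if err := ctx.Err(); err != nil {
		// %w keeps context.Canceled / DeadlineExceeded matchable via errors.Is.
		return fmt.Errorf("replace: canceled after delete: %w", err)
	}
	if err := create(ctx); err != nil {
		return fmt.Errorf("replace: create: %w", err) // hypothetical prefix
	}
	return nil
}

// demo cancels the context as a side effect of the delete call,
// simulating cancellation arriving exactly between the two steps.
func demo() (created bool, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	err = replace(ctx,
		func(context.Context) error { cancel(); return nil },
		func(context.Context) error { created = true; return nil },
	)
	return created, err
}

func main() {
	created, err := demo()
	fmt.Println(created, errors.Is(err, context.Canceled))
	fmt.Println(err)
}
```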

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering needed. On rename failure, best-effort cleanup of the
  temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."
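
The write-temp-then-rename pattern from Minor 1 can be sketched as below. atomicWrite is a hypothetical helper and the "put-*.tmp" naming is an assumption; the real Put may name and place its temp files differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temp file in the same directory as
// path, then renames it over path. The temp file lives next to the
// target so the rename stays on one filesystem.
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), "put-*.tmp")
	if err != nil {
		return err
	}
	name := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(name) // best-effort cleanup
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(name)
		return err
	}
	if err := os.Rename(name, path); err != nil {
		os.Remove(name)
		return err
	}
	return nil
}

// demo overwrites one entry twice and returns the final contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "diffcache-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "entry.json")
	for _, body := range []string{`{"v":1}`, `{"v":2}`} {
		if err := atomicWrite(path, []byte(body)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	fmt.Println(demo())
}
```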

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).
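
The closure-per-iteration pattern can be sketched as follows. The names (planGroups, tracker, conn) are hypothetical; the point is only that a defer inside an immediately-invoked function fires at the end of each iteration rather than at function exit.

```go
package main

import (
	"fmt"
	"io"
)

// tracker counts how many fake connections are open at once.
type tracker struct{ open, maxOpen int }

type conn struct{ t *tracker }

func (c conn) Close() error { c.t.open--; return nil }

func (t *tracker) load(string) (io.Closer, error) {
	t.open++
	if t.open > t.maxOpen {
		t.maxOpen = t.open
	}
	return conn{t: t}, nil
}

// planGroups loads one provider per group and computes its plan.
// The loop body is an immediately-invoked function so the deferred
// Close runs before the next group is loaded.
func planGroups(groups []string, load func(string) (io.Closer, error), compute func(string) error) error {
	for _, g := range groups {
		if err := func() error {
			c, err := load(g)
			if err != nil {
				return err
			}
			defer c.Close()
			return compute(g)
		}(); err != nil {
			return err
		}
	}
	return nil
}

// maxOpenDuringPlan reports the peak number of simultaneously open
// connections while planning the given groups.
func maxOpenDuringPlan(groups []string) int {
	t := &tracker{}
	_ = planGroups(groups, t.load, func(string) error { return nil })
	return t.maxOpen
}

func main() {
	fmt.Println(maxOpenDuringPlan([]string{"do", "aws", "mock"})) // 1
}
```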

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; overridable via the
WFCTL_PLAN_DIFF_CONCURRENCY env var, clamped to 1..32). Action
emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
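
The emission rule for existing resources can be sketched as a pure function. The struct shapes are assumptions (in particular the Changes field name and the string action values); only the precedence of replace over update over skip is taken from the description above.

```go
package main

import "fmt"

// fieldChange and diffResult are simplified stand-ins for the real
// interfaces types; field names are assumed.
type fieldChange struct {
	Path     string
	ForceNew bool
}

type diffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []fieldChange
}

// classify maps a Diff result to an action: replace wins over update,
// and a nil or flag-less result emits nothing ("skip").
func classify(d *diffResult) string {
	if d == nil {
		return "skip"
	}
	replace := d.NeedsReplace
	for _, c := range d.Changes {
		if c.ForceNew {
			replace = true // ForceNew alone is enough to force a replace
		}
	}
	switch {
	case replace:
		return "replace"
	case d.NeedsUpdate:
		return "update"
	default:
		return "skip"
	}
}

func main() {
	forceNew := &diffResult{NeedsUpdate: true, Changes: []fieldChange{{Path: "region", ForceNew: true}}}
	fmt.Println(classify(forceNew))                       // replace
	fmt.Println(classify(&diffResult{NeedsUpdate: true})) // update
	fmt.Println(classify(&diffResult{}))                  // skip
}
```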

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).
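
The consult-then-fill flow can be sketched with an in-memory map. The key fields follow the tuple named above; memCache, diffWithCache, and the reduced diffResult are hypothetical stand-ins, not the iac/diffcache API.

```go
package main

import "fmt"

// cacheKey mirrors the tuple named in the commit message.
type cacheKey struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// diffResult is reduced to one flag for the sketch.
type diffResult struct{ NeedsUpdate bool }

// memCache is a toy in-memory backend.
type memCache map[cacheKey]diffResult

// diffWithCache returns the cached result on a hit; on a miss it calls
// diff once and stores the fresh result.
func diffWithCache(c memCache, k cacheKey, diff func() diffResult) diffResult {
	if hit, ok := c[k]; ok {
		return hit
	}
	res := diff()
	c[k] = res
	return res
}

// diffCalls counts how many times the driver-side diff runs for a
// sequence of lookups against one shared cache.
func diffCalls(keys ...cacheKey) int {
	c := memCache{}
	calls := 0
	for _, k := range keys {
		diffWithCache(c, k, func() diffResult {
			calls++
			return diffResult{NeedsUpdate: true}
		})
	}
	return calls
}

func main() {
	a := cacheKey{PluginVersion: "pv", Type: "vpc", ProviderID: "id-1", SHAConfig: "c1", SHAOutputs: "o1"}
	b := a
	b.SHAConfig = "c2" // changed config forces a re-dispatch
	fmt.Println(diffCalls(a, a), diffCalls(a, b)) // 1 2
}
```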

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
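
A pure clamping function in the spirit of Note 1 might look like the sketch below. clampConcurrency is a hypothetical name, and the handling of out-of-range input (clamp to the nearest bound) and of empty or non-numeric input (fall back to the default of 8) is an assumption; the real parseConcurrencyEnv may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultConcurrency = 8
	minConcurrency     = 1
	maxConcurrency     = 32
)

// clampConcurrency parses a raw env value into a worker count.
// Assumed policy: empty or non-numeric input yields the default;
// numeric input is clamped into [1, 32].
func clampConcurrency(raw string) int {
	if raw == "" {
		return defaultConcurrency
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultConcurrency
	}
	if n < minConcurrency {
		return minConcurrency
	}
	if n > maxConcurrency {
		return maxConcurrency
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, clampConcurrency(raw))
	}
}
```

Keeping the parser free of os.Getenv is what makes the table-driven test possible without mutating process environment.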

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.
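
The channel-gating idea can be sketched with only the standard library. The production code uses errgroup; this sketch swaps in a semaphore channel plus sync.WaitGroup so it stays self-contained, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// dispatch calls fn once per index with at most limit calls in flight.
func dispatch(n, limit int, fn func(int)) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		sem <- struct{}{} // blocks while limit calls are already running
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			fn(i)
		}(i)
	}
	wg.Wait()
}

// observedParallel reports whether two calls were in flight at the
// same time. Each call signals entry and then blocks on release, so a
// serial dispatcher can never deliver the second entry signal.
func observedParallel(limit, timeoutMs int) bool {
	entered := make(chan struct{}, 2) // buffered so late entries never block
	release := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		dispatch(2, limit, func(int) {
			entered <- struct{}{}
			<-release
		})
	}()
	parallel := true
	for i := 0; i < 2 && parallel; i++ {
		select {
		case <-entered:
		case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
			parallel = false
		}
	}
	close(release) // let every gated call finish
	<-done
	return parallel
}

func main() {
	fmt.Println(observedParallel(2, 500), observedParallel(1, 100))
}
```

A count-only assertion passes for both limits; the gated version distinguishes them.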

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): fixed-length output,
and since neither field is expected to contain a NUL byte, the
separator is unambiguous. Matches how configHash already keys
per-config inputs.
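
The collision and the fix can be reproduced in a few lines. naiveKey shows the old composition; pluginVersionKey here is a sketch whose signature and hex encoding are assumptions, not the function in platform/.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey is the fragile composition: "@" may also appear inside
// either field.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// pluginVersionKey hashes name and version joined by a NUL byte.
// Hex encoding of the digest is an assumption of this sketch.
func pluginVersionKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two different (name, version) pairs from the commit message.
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))                 // true: collision
	fmt.Println(pluginVersionKey("foo", "bar@1.0") == pluginVersionKey("foo@bar", "1.0")) // false
	fmt.Println(len(pluginVersionKey("foo", "1.0")))                                      // 64 hex chars
}
```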

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests:
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy), and
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer substring). v1 fixture
  v1RecordingProvider intentionally does NOT implement
  ComputePlanVersionDeclarer to prove the dispatcher's
  "default to v1 when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes: no signature change, and no behavior change
for the success path or the v1RecordingProvider path.
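
Paths A and B can be sketched with an optional-interface check. The lower-case names and the `any` parameter are assumptions for illustration; the real helper lives in iac/wfctlhelpers/dispatch.go and may have a different signature.

```go
package main

import "fmt"

// versionDeclarer stands in for the optional
// ComputePlanVersionDeclarer interface.
type versionDeclarer interface {
	ComputePlanVersion() string
}

// dispatchVersionFor returns "v2" only when the provider implements
// the optional interface AND declares exactly "v2". Everything else,
// including an empty declaration, routes to "v1".
func dispatchVersionFor(p any) string {
	if d, ok := p.(versionDeclarer); ok && d.ComputePlanVersion() == "v2" {
		return "v2"
	}
	return "v1"
}

// legacyProvider does not implement the interface at all (Path A).
type legacyProvider struct{}

// declaringProvider implements it; an empty version is Path B.
type declaringProvider struct{ version string }

func (d declaringProvider) ComputePlanVersion() string { return d.version }

func main() {
	fmt.Println(dispatchVersionFor(legacyProvider{}))                 // v1 (Path A)
	fmt.Println(dispatchVersionFor(declaringProvider{version: ""}))   // v1 (Path B)
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v2"})) // v2
}
```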

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
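
The conversion itself is small. The sketch below assumes the signature func(map[string]bool) map[string]any, which is inferred from the description rather than copied from cmd/wfctl/deploy_providers.go, and it leaves out the structpb call so it stays standard-library only.

```go
package main

import "fmt"

// sensitiveToAny widens a map[string]bool into the map[string]any
// shape that generic struct encoders accept. Empty and nil inputs
// both return nil, so nothing is sent for them.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(nil) == nil, sensitiveToAny(map[string]bool{}) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{"password": true}))
}
```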

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. The plan's cited precedent (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
  scoped via getDiffCache lazy init) doesn't observe stale entries
  from a developer's local ~/.cache/wfctl/diff/ as false-positive
  cache hits skipping driver Diff dispatch. Same pattern as
  platform/main_test.go from T3.6f. Caught during dev when the
  end-to-end test failed in the full cmd/wfctl test run but passed
  in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
  flips mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* fix(iac): T3.6e env-var hygiene — TestMain unsets WFCTL_PLAN_DIFF_CONCURRENCY (Copilot review)

A developer shell with WFCTL_PLAN_DIFF_CONCURRENCY=1 (or any other
non-default value) would serialize ComputePlan's parallel Diff dispatch
and break the parallelism assertions in differ tests. Explicitly unset
the var in TestMain alongside the existing WFCTL_DIFFCACHE=disabled
hygiene so test runs are deterministic regardless of shell environment.

Addresses Copilot inline comment on PR #528 (platform/main_test.go:24).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.6 polish — drop double error: prefix + reuse precomputed configHash (Copilot review round 2)

Two real fixes from Copilot's re-review of PR #528:

1. **Double "error:" prefix on plugin-load failure** — cmd/wfctl/main.go's
   top-level printer already emits "error: %v" on command failure. The
   T3.6b error string in cmd/wfctl/infra_plan_provider.go was prefixed
   with a literal "error: " of its own, producing operator output like
   `error: error: failed to load plugin "do": ...`. Drop the in-error
   prefix; update the assertion in infra_plan_provider_load_test.go to
   match the unprefixed root error; clarify in the CHANGELOG that the
   "error:" prefix in the rendered string is added by wfctl's top-level
   printer (not the underlying error).

2. **Duplicate configHash work in classifyModification** — ComputePlan
   already computes `hash := configHash(spec.Config)` while bucketing
   create vs modification candidates; classifyModification was
   re-computing the same hash on every Diff dispatch. Thread the
   precomputed hash through via a new `hash string` field on
   modCandidate + new parameter on classifyModification, so the per-
   candidate hashing happens exactly once.

Addresses Copilot inline comments on PR #528 (round 2):
- cmd/wfctl/infra_plan_provider.go:121
- platform/differ.go:104

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./interfaces/... ./iac/... ./plugin/sdk/... ./module/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.6/T3.9 polish — diff-cache bypass on empty ProviderID + omit empty current_sensitive arg (Copilot review round 3)

Two real fixes from Copilot's re-review of PR #528 round 3:

1. **Diff-cache hash-collision risk on empty ProviderID** — The cache
   key shape (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs)
   does not include the resource Name. When two existing-state
   resources of the same Type both have ProviderID=="" (state-bootstrap,
   broken-plugin paths, transient races) and matching SHAConfig +
   SHAOutputs (e.g., both freshly-discovered with default-config and
   empty-outputs), they would share a cache key and could serve each
   other's cached DiffResult — misclassifying actions or skipping a
   required Diff. Defensive fix: classifyModification now skips both
   cache.Get and cache.Put when rs.ProviderID is empty, always re-
   dispatching to the driver. Cost is one extra Diff call per
   pre-bootstrap resource; benefit is correctness regardless of state
   completeness. New pin: TestComputePlan_EmptyProviderID_BypassesCache.

2. **`current_sensitive` arg serialized as null instead of omitted** —
   sensitiveToAny's docstring promises "trim-friendly" wire shape by
   returning nil for empty input, but the call site at
   remoteResourceDriver.Diff was unconditionally setting
   `args["current_sensitive"] = sensitiveToAny(...)`, which structpb
   serializes as a NullValue field rather than omitting the key.
   Conditionally include the key only when sensitiveToAny returns a
   non-nil map, matching the docstring intent. New pins:
   TestRemoteDriver_Diff_OmitsCurrentSensitiveWhenEmpty +
   TestRemoteDriver_Diff_IncludesCurrentSensitiveWhenPopulated.
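
A minimal sketch of the empty-ProviderID bypass from item 1, under stated assumptions: the key fields follow the shape named above, but `resource`, `planner`, the string result type, and the fixed plugin version are hypothetical stand-ins, not the real platform/differ.go types.

```go
package main

import "fmt"

// cacheKey follows the key shape named in the commit message. The resource
// Name is deliberately absent, which is what makes the collision possible.
type cacheKey struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// resource is a hypothetical stand-in for an existing-state record.
type resource struct {
	Name, Type, ProviderID, SHAConfig, SHAOutputs string
}

// planner is hypothetical; it counts driver dispatches so the bypass is visible.
type planner struct {
	cache     map[cacheKey]string
	diffCalls int
}

// diff stands in for the driver's Diff call.
func (p *planner) diff(r resource) string {
	p.diffCalls++
	return "diff-for-" + r.Name
}

// classify skips both the cache read and the cache write when ProviderID is
// empty, so two such resources can never serve each other's result.
func (p *planner) classify(r resource) string {
	key := cacheKey{"v1.0.0", r.Type, r.ProviderID, r.SHAConfig, r.SHAOutputs}
	cacheable := r.ProviderID != ""
	if cacheable {
		if hit, ok := p.cache[key]; ok {
			return hit
		}
	}
	res := p.diff(r)
	if cacheable {
		p.cache[key] = res
	}
	return res
}

func main() {
	p := &planner{cache: map[cacheKey]string{}}
	// Same Type and hashes, both without a ProviderID: identical cache keys.
	a := resource{Name: "a", Type: "vpc", SHAConfig: "c", SHAOutputs: "o"}
	b := resource{Name: "b", Type: "vpc", SHAConfig: "c", SHAOutputs: "o"}
	fmt.Println(p.classify(a), p.classify(b), "driver calls:", p.diffCalls)
}
```

Without the `cacheable` guard, the second call would return the first resource's cached result.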

Addresses Copilot inline comments on PR #528 (round 3):
- platform/differ.go:240 (cache key empty-ProviderID collision)
- cmd/wfctl/deploy_providers.go:542 (current_sensitive null vs omit)

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.6/T3.9 polish — preserve loadErr chain + lock-free diff cache + bypass-side-effect-free (Copilot review round 4)

Three real fixes from Copilot's re-review of PR #528 round 4:

1. **loadErr chain lost across runInfraPlan re-wrap (errors.Is/As)** —
   computePlanForInfraSpecs returned `failed to load plugin %q: %v;
   ...` (using %v), losing the underlying error. After runInfraPlan
   re-wraps with `compute plan: %w`, callers could not errors.Is /
   errors.As against the original loader failure (e.g. to differentiate
   "plugin binary missing" from "plugin crashed during handshake").
   Switch the inner wrap to %w. Rendered text is identical to %v.
   New pin: TestRunInfraPlan_FailsLoudOnPluginLoadFailure now asserts
   `errors.Is(err, loadErr)` reaches the sentinel through both wrap
   layers.

2. **getDiffCache called even on the empty-ProviderID bypass path** —
   classifyModification was calling getDiffCache() unconditionally,
   which (under the old per-call mutex) acquired the lock, and (under
   any backend-init pattern) would eagerly construct the filesystem
   cache backend at ~/.cache/wfctl/diff/ on the operator's machine
   even for resources that bypass the cache. Move the getDiffCache
   call inside the `if cacheable` branch so the bypass path is fully
   side-effect free. Round-3 already pinned the bypass behavior via
   TestComputePlan_EmptyProviderID_BypassesCache.

3. **Per-call sync.Mutex contention on getDiffCache hot path** —
   Under ComputePlan's parallel Diff fan-out (planDiffConcurrency()
   workers), the per-call mutex on getDiffCache caused contention on
   every cache.Get / cache.Put, especially on cache hits where the
   Get itself is cheap. Refactor to sync.Once for one-time init +
   atomic.Pointer[diffcache.Cache] for lock-free reads. Subsequent
   reads are just an atomic.Load (and a typed deref). The test-swap
   helper setDiffCacheForTest is updated to Store/Restore directly
   on the atomic; cleanup seeds a fresh default when there was no
   prior value (so subsequent tests in the binary still observe a
   working cache).

Addresses Copilot inline comments on PR #528 (round 4):
- cmd/wfctl/infra_plan_provider.go:124 (%v → %w)
- platform/differ.go:235 (getDiffCache eager call on bypass path)
- platform/differ.go:405 (per-call mutex on hot path)

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... ./module/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): T3.7/T3.6 polish — DispatchVersionFor centralizes type assertion + cache nil-DiffResult as zero-value (Copilot review round 5)

Two real fixes from Copilot's re-review of PR #528 round 5 (a third
finding, plan/apply discovery duplication, is filed as a follow-up
issue rather than addressed in-PR to keep W-3b scope-locked).

1. **DispatchVersionFor docstring vs signature mismatch** — The
   helper claims to centralize the type assertion + non-implementer
   defaulting, but its parameter type was `ComputePlanVersionDeclarer`,
   forcing every call site to type-assert externally. Change the
   signature to accept `any` and perform the type assertion inside;
   non-implementers + nil now both return "v1" inside the helper as
   the docstring already promised. Param is `any` (not
   interfaces.IaCProvider) to keep the helper package
   import-free of the engine's interfaces package and to keep
   non-engine call sites (tests, stubs) frictionless. Updated the
   only production call site (cmd/wfctl/infra_apply.go) to drop the
   external type-assert.

2. **Cache no-op when driver.Diff returns (nil, nil)** — The
   cache.Put was guarded by `fresh != nil`, so providers using the
   nil-as-no-op convention (a documented option in the
   (DiffResult|nil, error|nil) return shape) re-Diffed on every
   ComputePlan call — undermining the cache contract for that whole
   class of providers. Cache a zero-value DiffResult on (nil, nil)
   returns; classifyModification's downstream switch already treats
   zero-value the same as nil (no plan action), so the semantic is
   preserved while the cache stays effective. New pin:
   TestComputePlan_NilDiffResult_CachesAsZeroValue verifies that the
   second ComputePlan against unchanged inputs is served from cache
   (driver.Diff invoked exactly once across two calls).

3. **Plan/apply provider-discovery duplication** (Copilot finding R5-C,
   not addressed in this PR) — computePlanForInfraSpecs duplicates
   the iac.provider discovery + grouping logic in applyInfraModules.
   Per workspace memory feedback_implementer_scope_bleed, refactoring
   to a shared helper is a separate task: the duplication exists
   pre-W-3b (apply was the original; plan was added in W-3b mirroring
   it intentionally), and the extraction touches code paths W-3b's
   test plan does not cover. Filed as follow-up rather than expanding
   W-3b's blast radius. Documented in PR description.
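
A sketch of the item 1 signature change, assuming a one-method declarer interface. The method name `ComputePlanVersion` and the `DispatchVersionV1` constant are assumptions; only `DispatchVersionFor`, `ComputePlanVersionDeclarer`, and `DispatchVersionV2` are named in this PR.

```go
package main

import "fmt"

const (
	DispatchVersionV1 = "v1" // assumed constant name
	DispatchVersionV2 = "v2"
)

// ComputePlanVersionDeclarer: the method name here is an assumption.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

// DispatchVersionFor accepts any value and performs the type assertion
// itself, so call sites no longer assert externally.
func DispatchVersionFor(provider any) string {
	d, ok := provider.(ComputePlanVersionDeclarer)
	if !ok { // covers nil and non-implementers
		return DispatchVersionV1
	}
	if d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return DispatchVersionV1 // unknown or empty declarations fall back to v1
}

// Hypothetical providers for the demo.
type v2Provider struct{}

func (v2Provider) ComputePlanVersion() string { return "v2" }

type legacyProvider struct{}

func main() {
	fmt.Println(DispatchVersionFor(nil))
	fmt.Println(DispatchVersionFor(legacyProvider{}))
	fmt.Println(DispatchVersionFor(v2Provider{}))
	// A raw manifest string is not a provider, so it also yields v1.
	fmt.Println(DispatchVersionFor("v2"))
}
```

The last call shows why a loader-level seam that only holds the raw manifest string should compare it against `DispatchVersionV2` directly instead of passing it through this helper.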

Addresses Copilot inline comments on PR #528 (round 5):
- iac/wfctlhelpers/dispatch.go:41 (signature vs docstring mismatch)
- platform/differ.go:265 (cache write skipped on (nil, nil))

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(iac): T3.7 — correct DispatchVersionFor + findIaCPluginDir doc claims (Copilot review round 6)

Two doc-comment accuracy fixes from Copilot's re-review of PR #528
round 6 — both surfaced by/exposed in the round-5 changes:

1. **findIaCPluginDir docstring referenced wrong helper** — Round 5
   changed wfctlhelpers.DispatchVersionFor to take `any` (a provider
   value), but findIaCPluginDir's docstring still told callers to pass
   the returned `computePlanVersion` string through DispatchVersionFor.
   That call wouldn't type-assert to ComputePlanVersionDeclarer (a
   string isn't a provider) and would silently default to "v1".
   Replaced with the correct pattern: string-equality against
   wfctlhelpers.DispatchVersionV2 at this loader-level seam where only
   the raw string is in hand. Includes example snippet.

2. **DispatchVersionFor docstring overstated the validation
   guarantee** — Claimed plugin/sdk.ParseManifest schema-validation
   means the dispatch only sees {"v1", "v2", ""}. True for callers
   that load via ParseManifest, but cmd/wfctl/deploy_providers.go's
   findIaCPluginDir / readIaCPluginComputePlanVersion path uses a
   minimal json.Unmarshal with NO schema validation — so unknown
   values CAN reach DispatchVersionFor at runtime. Updated the
   docstring to flag this honestly and call out that the default-to-v1
   behavior is the safety net for those paths (callers must not rely
   on the validation guarantee).

Doc-only; no code change. All packages still build + vet cleanly.

Addresses Copilot inline comments on PR #528 (round 6):
- cmd/wfctl/deploy_providers.go:107 (wrong helper referenced)
- iac/wfctlhelpers/dispatch.go:18 (overstated validation guarantee)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(iac): T3.5 — TestParseConcurrencyEnv subtest names (Copilot review round 7)

The first table case had `in: ""` and used `tc.in` directly as the
t.Run subtest name. Go's testing package silently rewrites empty
subtest names to "#00", which is unique enough to run but masks the
case identity in -v output and failure reports. Add a `name` field
to the table struct and use stable descriptive labels (empty,
non_numeric, negative, zero, one, eight, thirty_two,
thirty_three_clamped_to_max, one_hundred_clamped_to_max) while still
passing the raw `tc.in` to parseConcurrencyEnv. Identical test
coverage; clearer reporting.
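
The table shape described above, sketched with a stand-in parser. The clamp bounds (1 and 32) follow the case names and the round-8 note; the default of 8 for empty or non-numeric input is an assumption, as is the parser body. In the real test each case would run under `t.Run(tc.name, ...)`.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	concurrencyMin     = 1
	concurrencyMax     = 32
	concurrencyDefault = 8 // assumption: the real default is not stated here
)

// parseConcurrencyEnv is a stand-in: unparseable input yields the default,
// everything else is clamped into [concurrencyMin, concurrencyMax].
func parseConcurrencyEnv(in string) int {
	n, err := strconv.Atoi(in)
	if err != nil {
		return concurrencyDefault
	}
	if n < concurrencyMin {
		return concurrencyMin
	}
	if n > concurrencyMax {
		return concurrencyMax
	}
	return n
}

func main() {
	// The name field labels each case; tc.in still carries the raw input,
	// including the empty string that would otherwise be reported as "#00".
	cases := []struct {
		name string
		in   string
		want int
	}{
		{"empty", "", concurrencyDefault},
		{"non_numeric", "abc", concurrencyDefault},
		{"negative", "-4", concurrencyMin},
		{"zero", "0", concurrencyMin},
		{"eight", "8", 8},
		{"thirty_three_clamped_to_max", "33", concurrencyMax},
	}
	for _, tc := range cases {
		got := parseConcurrencyEnv(tc.in)
		fmt.Printf("%-28s in=%q got=%d ok=%t\n", tc.name, tc.in, got, got == tc.want)
	}
}
```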

Addresses Copilot inline comment on PR #528 (round 7):
- platform/differ_cache_test.go:253

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(iac): T3.5/T3.6 — clamp + in-flight counter doc accuracy (Copilot review round 8)

Two doc-only nits surfaced in Copilot's round-8 re-review of PR #528.
Both are accuracy fixes — no behaviour change.

1. **planDiffConcurrencyMin/Max comment overstated "disable"** —
   The comment said "Below 1 disables concurrency (worse than serial)",
   but parseConcurrencyEnv clamps values <=0 UP to planDiffConcurrencyMin
   (=1), which produces effectively-serial dispatch (one Diff in flight),
   not "disabled". Operators cannot turn the worker pool off, only narrow
   it to one. Updated the comment to spell that out and call out both
   clamp directions explicitly.

2. **channelGatedDriver.inFlight docstring claimed "peak"** — The
   docstring said inFlight tracks the *peak* number of simultaneous
   Diff goroutines, but…
intel352 added a commit that referenced this pull request May 4, 2026
* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
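
The activation rules above reduce to a small predicate. The function name `refreshEnabled` and its parameters are hypothetical; `strconv.ParseBool` is the standard-library call the rules refer to.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled (hypothetical name) applies the activation rules:
// --skip-refresh always wins; otherwise only a ParseBool-truthy value
// enables the pre-step. Unset, empty, and unrecognised values are no-ops.
func refreshEnabled(envVal string, skipRefresh bool) bool {
	if skipRefresh {
		return false
	}
	on, err := strconv.ParseBool(envVal)
	return err == nil && on
}

func main() {
	for _, v := range []string{"", "1", "true", "t", "0", "false", "yes"} {
		fmt.Printf("WFCTL_REFRESH_OUTPUTS=%q -> %t\n", v, refreshEnabled(v, false))
	}
	fmt.Println("with --skip-refresh:", refreshEnabled("1", true))
}
```

A presence-only check (`os.Getenv(...) != ""`) would treat `"0"` and `"false"` as enabled, which is the foot-gun the ParseBool form avoids.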

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (100 reads × 5ms / 8 workers ≈ 63ms minimum; well
under the 10s watchdog). Test runs ~1.5s wall.
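
One way to measure the in-flight peak from assertion 4, sketched with hypothetical types. The real `countingDriver` implements ResourceDriver.Read; here `Read` takes no arguments and `runPool` stands in for Refresh's worker pool.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// countingDriver tracks how many Reads are in flight and the highest
// value that counter ever reached.
type countingDriver struct {
	inFlight atomic.Int32
	peak     atomic.Int32
}

func (d *countingDriver) Read() {
	cur := d.inFlight.Add(1)
	defer d.inFlight.Add(-1)
	// Raise peak to cur unless another goroutine already recorded more.
	for {
		p := d.peak.Load()
		if cur <= p || d.peak.CompareAndSwap(p, cur) {
			break
		}
	}
	time.Sleep(5 * time.Millisecond) // hold the slot so the pool fills up
}

// runPool is a stand-in for the bounded pool: the semaphore is acquired
// before each goroutine starts and released after Read returns.
func runPool(d *countingDriver, jobs, limit int) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			d.Read()
		}()
	}
	wg.Wait()
}

func main() {
	d := &countingDriver{}
	runPool(d, 100, 8)
	fmt.Println("peak in flight:", d.peak.Load(), "cap: 8")
}
```

The upper bound is guaranteed by the semaphore; the lower bound of 2 depends on scheduling, which is why the per-Read sleep matters.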

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
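
The difference between the two checks can be sketched as follows. Both helpers are hypothetical and take the raw value as a parameter rather than calling os.Getenv. Treating unparsable values as disabled is an assumption of this sketch, not something the ADR summary above states.

```go
package main

import (
	"fmt"
	"strconv"
)

// presenceEnabled mirrors the plan-literal check: any non-empty value enables.
func presenceEnabled(v string) bool { return v != "" }

// parseBoolEnabled mirrors the ParseBool approach: only truthy values enable.
// Unparsable values are treated as disabled here (an assumption).
func parseBoolEnabled(v string) bool {
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false"} {
		fmt.Printf("%q presence=%v parsebool=%v\n",
			v, presenceEnabled(v), parseBoolEnabled(v))
	}
}
```

With the presence check, a value of 0 enables the feature, which is the foot-gun the reviewer flagged.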

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

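The omitempty behavior that TestApplyResult_OmitEmptyContract locks can be illustrated with a small stand-in struct. The JSON keys are the ones quoted above; the Go field types and the driftEntry shape are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// driftEntry and applyResult are hypothetical stand-ins for the real types.
type driftEntry struct {
	Name string `json:"name"`
}

type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []driftEntry      `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and panics on error (acceptable in a sketch).
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Nil fields and empty-but-non-nil fields both encode to {}.
	fmt.Println(encode(applyResult{}))
	fmt.Println(encode(applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []driftEntry{},
		ReplaceIDMap:         map[string]string{},
	}))
	// A populated map is emitted, keyed by the replaced resource's name.
	fmt.Println(encode(applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```

encoding/json treats zero-length maps and slices as empty for omitempty, so both the nil and the empty-but-non-nil cases drop the key.
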
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
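
A rough sketch of the loop shape described above: a ctx.Err() check at the iteration boundary, plus best-effort continuation after a per-action failure. The result type, applyAll, and the demo helpers are hypothetical; the real ApplyPlan signature is not shown in this commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// result is a hypothetical stand-in for the accumulated apply result.
type result struct {
	Applied []string
	Errors  []string // string field, so the wrapping chain ends here
}

// applyAll runs do for each action. It stops at the first iteration where
// ctx is already done and keeps going after per-action failures.
func applyAll(ctx context.Context, actions []string, do func(string) error) (result, error) {
	var res result
	for _, a := range actions {
		if err := ctx.Err(); err != nil {
			return res, err // result so far plus the ctx error
		}
		if err := do(a); err != nil {
			res.Errors = append(res.Errors, fmt.Sprintf("%s: %v", a, err))
			continue
		}
		res.Applied = append(res.Applied, a)
	}
	return res, nil
}

// demoPreCanceled cancels before the call and counts driver invocations.
func demoPreCanceled() (calls int, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err = applyAll(ctx, []string{"create", "update"}, func(string) error {
		calls++
		return nil
	})
	return calls, err
}

// demoBestEffort fails the first action and lets the second succeed.
func demoBestEffort() result {
	res, _ := applyAll(context.Background(), []string{"a", "b"}, func(a string) error {
		if a == "a" {
			return errors.New("boom")
		}
		return nil
	})
	return res
}

func main() {
	calls, err := demoPreCanceled()
	fmt.Println(calls, err)
	fmt.Printf("%+v\n", demoBestEffort())
}
```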

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  propagate to both test files automatically, which keeps the
  cross-package fixtures in parity.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.
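
The guard can be sketched roughly as below. Only the "replace: canceled after delete:" prefix comes from this commit message; the replace function, the other two prefixes, and the demo helper are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// replace runs del then create, re-checking ctx between the two calls.
func replace(ctx context.Context, del, create func(context.Context) error) error {
	if err := del(ctx); err != nil {
		return fmt.Errorf("replace: delete: %w", err) // assumed prefix
	}
	if err := ctx.Err(); err != nil {
		// %w keeps context.Canceled / DeadlineExceeded matchable via errors.Is.
		return fmt.Errorf("replace: canceled after delete: %w", err)
	}
	if err := create(ctx); err != nil {
		return fmt.Errorf("replace: create: %w", err) // assumed prefix
	}
	return nil
}

// demoCancelDuringDelete cancels the context as a side effect of Delete.
func demoCancelDuringDelete() (deleted, created, isCanceled bool, msg string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	err := replace(ctx,
		func(context.Context) error { deleted = true; cancel(); return nil },
		func(context.Context) error { created = true; return nil },
	)
	return deleted, created, errors.Is(err, context.Canceled), fmt.Sprint(err)
}

func main() {
	fmt.Println(demoCancelDuringDelete())
}
```

Once the error is flattened to a string in result.Errors, errors.Is no longer applies, which is why the string prefix doubles as the discriminant at the public surface.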

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering is needed. On rename failure, Put makes a best-effort
  attempt to remove the temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."
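
For reference, the write-temp-then-rename pattern from Minor 1 looks roughly like this. atomicPut and demoPut are hypothetical names; the actual Put in iac/diffcache may differ in temp-file naming and error handling.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicPut writes data to a temp file next to path, then renames it into
// place. Readers see either the old contents or the new, never a partial file
// (given an atomic same-filesystem rename).
func atomicPut(path string, data []byte) error {
	// Same directory as the target so the rename stays on one filesystem.
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpName)
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpName)
		return err
	}
	if err := os.Rename(tmpName, path); err != nil {
		os.Remove(tmpName) // best-effort cleanup of the orphan
		return err
	}
	return nil
}

// demoPut overwrites an entry twice in a scratch directory and reads it back.
func demoPut() (string, error) {
	dir, err := os.MkdirTemp("", "diffcache-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	p := filepath.Join(dir, "entry.json")
	for _, body := range []string{`{"v":1}`, `{"v":2}`} {
		if err := atomicPut(p, []byte(body)); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(p)
	return string(b), err
}

func main() {
	fmt.Println(demoPut())
}
```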

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).
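
The IIFE pattern mentioned above can be sketched as follows. planGroups and trace are hypothetical; they only record the order of events so the effect of scoping defer to one loop iteration is visible.

```go
package main

import (
	"fmt"
	"strings"
)

// planGroups wraps each iteration in a function literal so the deferred
// close runs when that group finishes, not when planGroups returns.
func planGroups(groups []string, log func(string)) {
	for _, g := range groups {
		func() {
			log("open " + g)
			defer log("close " + g) // fires at the end of this iteration
			log("compute " + g)
		}()
	}
}

// trace returns the recorded event order as one string.
func trace(groups []string) string {
	var events []string
	planGroups(groups, func(e string) { events = append(events, e) })
	return strings.Join(events, ", ")
}

func main() {
	fmt.Println(trace([]string{"do", "mock"}))
}
```

Without the function literal, both closes would be deferred to the end of planGroups and every group's connection would stay open until then.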

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).
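
The action-emission rules above reduce to a small pure function. The struct shapes here are assumptions (in particular the Changes field name), and classify is a hypothetical helper. Mapping a nil DiffResult to skip is also an assumption of this sketch.

```go
package main

import "fmt"

// FieldChange and DiffResult are simplified stand-ins for the real types.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange // field name is an assumption
}

// classify applies the precedence: replace, then update, then skip.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace" // ForceNew alone is enough to force a replace
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	fmt.Println(classify(&DiffResult{NeedsReplace: true}))
	fmt.Println(classify(&DiffResult{
		NeedsUpdate: true,
		Changes:     []FieldChange{{Path: "region", ForceNew: true}},
	}))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true}))
	fmt.Println(classify(&DiffResult{}))
}
```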

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
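
A possible shape for the pure parsing helper from Note 1 is sketched below. The default of 8 and the 1..32 clamp come from the T3.6e commit message. Falling back to the default on empty or non-numeric input is an assumption, and parseConcurrencySketch is a hypothetical name, not the real parseConcurrencyEnv.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseConcurrencySketch turns a raw env value into a worker count.
// Empty or non-numeric input yields the default (assumed behavior);
// numeric input is clamped to [1, 32].
func parseConcurrencySketch(raw string) int {
	const def, lo, hi = 8, 1, 32
	if raw == "" {
		return def
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return def
	}
	if n < lo {
		return lo
	}
	if n > hi {
		return hi
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrencySketch(raw))
	}
}
```

Keeping the parser free of os.Getenv is what makes the table-driven test straightforward: each row is just an input string and an expected integer.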

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): the digest is
fixed-length, and the NUL separator is not expected to appear in
either field, so the concatenation is unambiguous. Matches how
configHash already keys per-config inputs.

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Purely additive: no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).
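
The collision and the fix can be reproduced in a few lines. atKey and hashedKey are hypothetical names, and hex-encoding the digest is an assumption; the real pluginVersionKey may encode it differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// atKey is the old composition: ambiguous when either field contains "@".
func atKey(name, version string) string { return name + "@" + version }

// hashedKey hashes name and version joined by a NUL byte.
func hashedKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Both pairs produce the same "@"-joined string.
	fmt.Println(atKey("foo", "bar@1.0") == atKey("foo@bar", "1.0"))
	// The NUL-separated digests differ.
	fmt.Println(hashedKey("foo", "bar@1.0") == hashedKey("foo@bar", "1.0"))
}
```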

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy),
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer substring). v1 fixture
  v1RecordingProvider intentionally does NOT
  implement ComputePlanVersionDeclarer to prove the dispatcher's
  "default to v1 when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.
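
Path A and Path B can be sketched with an optional-interface type assertion. The interface name and the "v2" constant follow the T3.7 commit message. The dispatchVersionFor signature and the two provider types are assumptions made for this sketch.

```go
package main

import "fmt"

// ComputePlanVersionDeclarer is the optional interface a provider may implement.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const DispatchVersionV2 = "v2"

// dispatchVersionFor returns "v2" only when the provider implements the
// interface AND declares exactly "v2"; everything else is "v1".
func dispatchVersionFor(p any) string {
	if d, ok := p.(ComputePlanVersionDeclarer); ok && d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return "v1"
}

// bareProvider does not implement the interface (Path A).
type bareProvider struct{}

// declaringProvider implements it; an empty version is Path B.
type declaringProvider struct{ version string }

func (d declaringProvider) ComputePlanVersion() string { return d.version }

func main() {
	fmt.Println(dispatchVersionFor(bareProvider{}))                   // Path A
	fmt.Println(dispatchVersionFor(declaringProvider{version: ""}))   // Path B
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v2"})) // v2 route
}
```

The two legacy paths reach the same answer through different branches, which is why each needs its own test.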

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.
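
A minimal sketch of the conversion, assuming the helper does nothing more than widen the value type. The name sensitiveToAny comes from this commit; the body below is a guess at its shape, not the code in deploy_providers.go.

```go
package main

import "fmt"

// sensitiveToAny widens map[string]bool to map[string]any so it can be
// nested inside a generic args map. Nil or empty input returns nil.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(nil) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{}) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{"password": true}))
}
```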

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-scoped
  via getDiffCache lazy init) cannot treat stale entries from a
  developer's local ~/.cache/wfctl/diff/ as cache hits and skip driver
  Diff dispatch. Same pattern as platform/main_test.go from T3.6f.
  Caught during dev when the end-to-end test failed in the full
  cmd/wfctl test run but passed in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

Changes in this commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
  flips mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* feat(iac): add ProviderValidator optional interface + PlanDiagnostic type

Adds an OPTIONAL `interfaces.ProviderValidator` interface that an IaCProvider
implementation MAY also satisfy to expose provider-side cross-resource
constraint validation at plan time:

    type ProviderValidator interface {
        ValidatePlan(plan *IaCPlan) []PlanDiagnostic
    }

Plus the supporting `PlanDiagnostic` type and `PlanDiagnosticSeverity` enum
(Info/Warning/Error). Consumers (e.g. the R-A10 align rule landing in the
next commit) discover ValidatePlan via type-assertion, so providers that do
not implement it keep working unchanged — purely additive.

Naming note: plan T4.1 originally proposed `Diagnostic` for this type, but
`interfaces.Diagnostic` is already taken by the unrelated Troubleshooter
runtime-event finding (`iac_resource_driver.go`). Renamed to PlanDiagnostic
to preserve W-4's pure-additive contract; the existing Troubleshooter type
is untouched.

TDD via interfaces/iac_provider_test.go covering severity-constant ordering,
PlanDiagnostic field shape, and type-assertion against both an implementor
and a non-implementor (confirms the interface remains optional).
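
The optional-interface discovery described above can be sketched as follows. All types here are simplified stand-ins: the real `IaCPlan`, `IaCProvider`, and `PlanDiagnostic` live in the `interfaces` package with more fields, and the numeric order of the severity constants is an assumption.

```go
package main

import "fmt"

// Simplified stand-ins for the real interfaces package types.
type IaCPlan struct{ Actions []string }

type PlanDiagnosticSeverity int

const (
	PlanDiagnosticInfo PlanDiagnosticSeverity = iota // ordering assumed
	PlanDiagnosticWarning
	PlanDiagnosticError
)

type PlanDiagnostic struct {
	Severity PlanDiagnosticSeverity
	Resource string
	Message  string
}

type IaCProvider interface{ Name() string }

// ProviderValidator matches the signature quoted in the commit message.
type ProviderValidator interface {
	ValidatePlan(plan *IaCPlan) []PlanDiagnostic
}

type plainProvider struct{}

func (plainProvider) Name() string { return "plain" }

type validatingProvider struct{}

func (validatingProvider) Name() string { return "validating" }
func (validatingProvider) ValidatePlan(plan *IaCPlan) []PlanDiagnostic {
	if len(plan.Actions) == 0 {
		return nil
	}
	return []PlanDiagnostic{{Severity: PlanDiagnosticWarning, Resource: "vpc", Message: "example constraint"}}
}

// collectDiagnostics type-asserts each provider and skips non-implementors.
func collectDiagnostics(providers []IaCProvider, plan *IaCPlan) []PlanDiagnostic {
	var out []PlanDiagnostic
	for _, p := range providers {
		v, ok := p.(ProviderValidator)
		if !ok {
			continue // provider does not opt in; nothing changes for it
		}
		out = append(out, v.ValidatePlan(plan)...)
	}
	return out
}

func main() {
	plan := &IaCPlan{Actions: []string{"create vpc"}}
	diags := collectDiagnostics([]IaCProvider{plainProvider{}, validatingProvider{}}, plan)
	fmt.Println(len(diags))
}
```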

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): R-A10 align rule — provider.ValidatePlan dispatch

Adds R-A10, the align rule that surfaces provider-side cross-resource
constraint diagnostics at plan time. Wiring:

  cmd/wfctl/infra_align_rules.go::checkRA10_provider_validate_plan
      Iterates providers, type-asserts ProviderValidator, calls
      ValidatePlan(plan), maps each PlanDiagnostic to an AlignFinding.
      Severity mapping: Error→FAIL, Warning→WARN, Info→WARN (advisory;
      align has no INFO tier today). Resource label falls back to
      "<provider-name>:plan" for plan-level findings; field path is
      appended to the message when present.

  cmd/wfctl/infra_align.go::runInfraAlignChecks
      Dispatches R-A10 only when --plan is provided (R-A7 predicate parity).
      Loads providers via the new alignLoadProviders test seam — the
      default implementation enumerates iac.provider modules in the YAML
      and loads each through the existing resolveIaCProvider plugin path.
      Closers are released after the rule runs; a per-provider load failure
      logs a stderr warning and continues so other R-A* findings are not
      hidden.

TDD via cmd/wfctl/infra_align_ra10_test.go covers nil-plan, no-providers,
non-validating-provider-skipped, Error→FAIL, Warning→WARN, Info→WARN,
plan-level resource fallback, and multi-provider mixed-implementation
cases. Two integration tests exercise dispatch through the seam: one
asserts R-A10 fires under --strict and produces non-zero exit; the other
asserts the rule (and the loader) is silent without --plan.

Pure-additive: providers that do not implement ProviderValidator are
skipped, so this commit changes no existing align behaviour.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(iac): document ProviderValidator + R-A10 align rule

Adds the documentation pieces for W-4:

- DOCUMENTATION.md gains a new top-level "IaC Provider Plugin Interfaces"
  section that documents the optional interfaces.ProviderValidator
  interface, the PlanDiagnostic/PlanDiagnosticSeverity types, the
  ValidatePlan contract (read-only, no remote calls), the R-A10 consumer
  and its severity mapping, and the naming-distinction note vs. the
  pre-existing interfaces.Diagnostic (Troubleshooter) type.

- docs/WFCTL.md adds an `infra align` subsection under the existing
  `infra` command. It lists every R-A* rule (R-A1 through R-A10 with
  severities), the flag table, the R-A10 severity-mapping submatrix,
  and example invocations covering both plan-less and --plan/--strict
  modes.

- cmd/wfctl/dsl-reference-embedded.md (the source for `wfctl
  dsl-reference`) gains the R-A9 and R-A10 rows in the rule-families
  table and a short paragraph on R-A10's behaviour. The `--plan`
  description is updated to enable both R-A7 and R-A10.

Pure docs change; no code touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(iac): T4.5 verification — `--plan` help text mentions R-A10

T4.5 verification surfaced one cosmetic gap: the `--plan` flag's help
description still read "enables R-A7 checks" after T4.2 added R-A10 as a
second `--plan`-gated rule. Updated to "enables R-A7 and R-A10 checks" so
`wfctl infra align --help` reflects current behaviour.

Verification steps (no further code change required):

- `GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \
   ./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/...` → all PASS.
- `go build ./cmd/wfctl` → builds clean.
- `wfctl infra align --help` → shows existing flags plus the corrected
  `--plan` description.
- Fixture-provider smoke (TestInfraAlign_RA10_FixtureProvider_Fires) wires
  a ProviderValidator returning a fatal diagnostic through the
  alignLoadProviders seam → R-A10 finding emitted, FAIL severity, non-zero
  exit under `--strict`. This satisfies T4.5 Step 3 manual rule-trigger
  smoke without needing a real plugin subprocess.
- `go vet ./interfaces/... ./cmd/wfctl/... ./iac/...` → clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(iac): T4.2/T4.4 review — Info diagnostics log, no finding

Spec-reviewer flagged that the rev10 plan T4.2 acceptance criteria specify
a three-tier severity mapping ("Errors → align failures; Warnings →
warnings; Info → logs"), and that the previous commit (76c4160) collapsed
Info into WARN. The collapse meant `wfctl infra align --strict` could exit
non-zero on a purely informational provider hint — the exact scenario the
Info tier exists to prevent (e.g. billing-tier change notices, deprecation
hints) — defeating the tier's contract.

Code (cmd/wfctl/infra_align_rules.go::checkRA10_provider_validate_plan):

  Severity switch reworked to three explicit cases plus a conservative
  default. PlanDiagnosticInfo now writes to a new package-level sink
  `ra10LogInfo` (stderr by default; overridable for tests) and emits NO
  AlignFinding, so it never affects exit code under any flag combination.
  PlanDiagnosticError → FAIL and PlanDiagnosticWarning → WARN are unchanged.
  Unknown future severities fall back to WARN so they cannot slip past
  --strict undetected.

  Doc-comment rewritten to spell out the three-tier mapping and the
  motivating "Info must not break --strict CI" rule.
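
As a rough illustration of the three-tier switch plus conservative default, assuming simplified `Diagnostic` and `Finding` types and a hypothetical `logInfo` sink in place of `ra10LogInfo`. The log format follows the documented `R-A10 [info] <provider>/<resource>: <message>` shape; `exitCode` is a stand-in for `alignExitCode`.

```go
package main

import (
	"fmt"
	"os"
)

type Severity int

const (
	SeverityInfo Severity = iota // ordering assumed for this sketch
	SeverityWarning
	SeverityError
)

type Diagnostic struct {
	Severity          Severity
	Resource, Message string
}

type Finding struct{ Level, Resource, Message string }

// logInfo is a package-level sink (stderr by default) that tests can override.
var logInfo = func(line string) { fmt.Fprintln(os.Stderr, line) }

// mapDiagnostics turns Error into FAIL, Warning into WARN, logs Info without
// emitting a finding, and treats unknown severities as WARN.
func mapDiagnostics(provider string, diags []Diagnostic) []Finding {
	var findings []Finding
	for _, d := range diags {
		switch d.Severity {
		case SeverityError:
			findings = append(findings, Finding{"FAIL", d.Resource, d.Message})
		case SeverityWarning:
			findings = append(findings, Finding{"WARN", d.Resource, d.Message})
		case SeverityInfo:
			logInfo(fmt.Sprintf("R-A10 [info] %s/%s: %s", provider, d.Resource, d.Message))
		default:
			findings = append(findings, Finding{"WARN", d.Resource, d.Message})
		}
	}
	return findings
}

// exitCode: FAIL is always non-zero; WARN is non-zero only under strict.
func exitCode(findings []Finding, strict bool) int {
	for _, f := range findings {
		if f.Level == "FAIL" || (strict && f.Level == "WARN") {
			return 1
		}
	}
	return 0
}

func main() {
	f := mapDiagnostics("do", []Diagnostic{{SeverityInfo, "vpc", "billing tier changes"}})
	fmt.Println(len(f), exitCode(f, true))
}
```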

Test (cmd/wfctl/infra_align_ra10_test.go):

  TestCheckRA10_InfoDiagnostic_BecomesWARN renamed/rewritten as
  TestCheckRA10_InfoDiagnostic_LogsAndEmitsNoFinding. Asserts:
  - len(findings) == 0
  - the captured log line carries the rule tag, [info] severity marker,
    "<provider>/<resource>" identifier, the diagnostic message, and the
    "field: <name>" suffix
  - alignExitCode(findings, strict=true) == 0 (the load-bearing guarantee)

Docs (DOCUMENTATION.md, docs/WFCTL.md):

  Both severity-mapping summaries replaced with a three-row table
  (Error → FAIL finding, Warning → WARN finding, Info → stderr log/no
  finding/no exit-code effect). Prose surrounding the table now
  explicitly calls out the strict-CI safety guarantee.

Verification:

- GOWORK=off go test -race -count=1 ./interfaces/... ./iac/...
  ./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/... → all PASS.
- markdown-link-check on the three modified docs → 0 dead links.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(iac): T4.4 review — embedded reference Info-tier mapping

Spec-reviewer caught one stale doc site missed in commit 9c41c1d:
`cmd/wfctl/dsl-reference-embedded.md:1358-1359` (the source for `wfctl
dsl-reference`) still claimed `PlanDiagnosticInfo` produced a WARN
AlignFinding. Replaced with the full three-tier prose so `wfctl
dsl-reference` callers see the corrected mapping:

  - PlanDiagnosticError   → FAIL AlignFinding (always non-zero exit)
  - PlanDiagnosticWarning → WARN AlignFinding (non-zero only under --strict)
  - PlanDiagnosticInfo    → stderr log "R-A10 [info] <provider>/<resource>:
                            <message>"; no AlignFinding so --strict CI
                            gates never fail on informational hints

The R-A10 row in the table at :1354 ("FAIL or WARN") is unchanged — Info
no longer produces a finding so the existing severity range still
exhaustively covers the possible AlignFinding severities.

Verification:
- `markdown-link-check cmd/wfctl/dsl-reference-embedded.md` → 0 dead links.
- `GOWORK=off go test -race -count=1 ./cmd/wfctl/...` → PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(iac): R1 review — load plan + cfg once; clean R-A10 Info log fmt; clarify PlanDiagnosticSeverity doc (Copilot review)

- runInfraAlignChecks loads --plan once and reuses the parsed *IaCPlan
  for R-A7 and R-A10 (was: 2x file open + JSON decode).
- alignLoadProviders now takes *alignContext (built once via
  buildAlignContext in runInfraAlignChecks) instead of re-loading the
  YAML from disk. Test seam updated.
- R-A10 Info log identifies plan-level diagnostics as `<provider>/plan`
  (matches the documented `R-A10 [info] <provider>/<resource>: ...`
  format) instead of the redundant `<provider>/<provider>:plan: ...`.
  Table label still uses `<provider>:plan`.
- PlanDiagnosticSeverity doc comment now spells out the exit-code
  mapping: Error always FAILs; Warning is advisory by default but FAILs
  under --strict; Info never affects exit code.

New test: TestCheckRA10_PlanLevelInfoDiagnostic_LogsAsProviderSlashPlan
covers the log-format fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
…#531)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
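
The activation rules above can be sketched as a single predicate. `refreshEnabled` is a hypothetical name, and the real code reads the environment and flag itself rather than taking them as arguments.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled reports whether the pre-step should run, given the raw
// WFCTL_REFRESH_OUTPUTS value and the --skip-refresh flag.
func refreshEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag wins over the env var
	}
	enabled, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // unset, empty, or unrecognised: default off
	}
	return enabled
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshEnabled(v, false))
	}
}
```

Compared with a presence-only check (`!= ""`), this treats `0` and `false` as an explicit off switch.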

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (100 Reads × 5ms / 8 workers ≈ 63ms of
sleep-bound time with the pool saturated; well under the 10s watchdog).
Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification: the plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
was run with locally installed mdformat 1.0.0 (pip) and
markdown-link-check 3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (WFCTL_REFRESH_OUTPUTS=0
  would still enable the pre-step).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
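
A minimal sketch of the two fixes named above (a `sync.Once`-guarded lazy cache that captures both value and error, and a defensive copy of the embedded bytes). Here `json.Unmarshal` stands in for the JSON Schema compile step, and `schemaJSON` stands in for the `//go:embed` slice; both are assumptions for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// schemaJSON stands in for the //go:embed-backed slice.
var schemaJSON = []byte(`{"type":"object"}`)

var (
	compileOnce  sync.Once
	compiled     map[string]any
	compileErr   error
	compileCalls int // counts how often the expensive step actually ran
)

// compiledSchema compiles at most once; every caller sees the same
// value and the same error.
func compiledSchema() (map[string]any, error) {
	compileOnce.Do(func() {
		compileCalls++
		compileErr = json.Unmarshal(schemaJSON, &compiled)
	})
	return compiled, compileErr
}

// SchemaJSON returns a copy so callers cannot mutate the shared slice.
func SchemaJSON() []byte { return bytes.Clone(schemaJSON) }

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = compiledSchema()
		}()
	}
	wg.Wait()
	fmt.Println(compileCalls)
}
```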

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
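
The difference between the two gates can be sketched as below. This is an illustration only, not the code in cmd/wfctl/infra_apply_refresh_pre.go: the helper names `enabledNonEmpty` and `enabledParseBool` are hypothetical, the raw value is passed in rather than read via os.Getenv, and treating unparsable values as disabled is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// enabledNonEmpty mirrors the plan's original check: any non-empty value
// enables the feature, including "0" and "false".
func enabledNonEmpty(raw string) bool {
	return raw != ""
}

// enabledParseBool mirrors the ParseBool approach. Assumption: values that
// strconv.ParseBool cannot parse are treated as disabled.
func enabledParseBool(raw string) bool {
	v, err := strconv.ParseBool(raw)
	return err == nil && v
}

func main() {
	for _, raw := range []string{"", "0", "false", "1", "true", "yes"} {
		fmt.Printf("%-7q nonEmpty=%-5v parseBool=%v\n",
			raw, enabledNonEmpty(raw), enabledParseBool(raw))
	}
}
```

The "0" row is the foot-gun the code-reviewer flagged: the non-empty check enables the feature, ParseBool does not.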

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

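The omitempty behavior that TestApplyResult_OmitEmptyContract locks can be sketched as follows. The `result` struct is a hypothetical stand-in for ApplyResult: only the JSON key names come from the commit message, and the Go field types are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a hypothetical stand-in for ApplyResult. The field types are
// assumed; only the JSON keys are taken from the commit message.
type result struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and returns the JSON text (or the error text).
func encode(r result) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// nil fields: all keys omitted.
	fmt.Println(encode(result{})) // {}

	// Empty-but-non-nil fields: omitempty drops zero-length maps and slices too.
	fmt.Println(encode(result{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	})) // {}

	// Keyed by the replaced resource's name, as in the roundtrip fixture.
	fmt.Println(encode(result{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```
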
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  re-align both test files automatically — keeps the cross-package
  fixture parity guaranteed.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.
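
A sketch of the guard and of why %w matters, under simplifying assumptions: `replace` and `simulate` are hypothetical, the driver calls are plain closures, and the "replace: delete:" / "replace: create:" prefixes are illustrative rather than taken from apply.go.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// replace is a hypothetical Delete-then-Create decomposition with a
// cancellation guard between the two driver calls.
func replace(ctx context.Context, del, create func() error) error {
	if err := del(); err != nil {
		return fmt.Errorf("replace: delete: %w", err) // illustrative prefix
	}
	if err := ctx.Err(); err != nil {
		// %w keeps context.Canceled / DeadlineExceeded matchable via errors.Is.
		return fmt.Errorf("replace: canceled after delete: %w", err)
	}
	if err := create(); err != nil {
		return fmt.Errorf("replace: create: %w", err) // illustrative prefix
	}
	return nil
}

// simulate cancels the context from inside Delete, mimicking a signal that
// arrives exactly between the two calls.
func simulate() (createRan, isCanceled, hasPrefix bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ran := false
	err := replace(ctx,
		func() error { cancel(); return nil },
		func() error { ran = true; return nil },
	)
	prefixed := err != nil && strings.HasPrefix(err.Error(), "replace: canceled after delete:")
	return ran, errors.Is(err, context.Canceled), prefixed
}

func main() {
	createRan, isCanceled, hasPrefix := simulate()
	fmt.Printf("createRan=%v isCanceled=%v hasPrefix=%v\n", createRan, isCanceled, hasPrefix)
}
```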

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering needed. On rename failure, best-effort cleanup of the
  temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."
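
The write-temp-then-rename shape from Minor 1 can be sketched as below. This is not the diffcache Put implementation: `atomicWrite` and `roundTrip` are hypothetical names, and the temp-file naming pattern is an assumption chosen so the orphan ends in .tmp.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temp file in the same directory as path and
// then renames it into place, so readers see the old or the new contents.
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpName)
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpName)
		return err
	}
	if err := os.Rename(tmpName, path); err != nil {
		os.Remove(tmpName) // best-effort cleanup of the temp file
		return err
	}
	return nil
}

// roundTrip overwrites one entry twice in a scratch directory and returns
// the final contents plus the number of files left behind.
func roundTrip() (string, int, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "entry.json")
	for _, v := range []string{"old", "new"} {
		if err := atomicWrite(path, []byte(v)); err != nil {
			return "", 0, err
		}
	}
	b, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	return string(b), len(entries), nil
}

func main() {
	got, n, err := roundTrip()
	fmt.Println(got, n, err)
}
```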

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).
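
The effect of wrapping the loop body in an IIFE can be sketched as follows. `processGroups` is a hypothetical model that counts open handles instead of holding gRPC connections; it only illustrates the defer scoping, not computePlanForInfraSpecs itself.

```go
package main

import "fmt"

// processGroups loads one closer per group and reports the maximum number of
// handles open at once. With perIteration set, each iteration runs inside a
// function literal so its deferred closer fires before the next group loads.
func processGroups(groups []string, perIteration bool) int {
	open, maxOpen := 0, 0
	load := func(string) func() {
		open++
		if open > maxOpen {
			maxOpen = open
		}
		return func() { open-- }
	}
	for _, g := range groups {
		if perIteration {
			func() {
				closer := load(g)
				defer closer() // runs when this function literal returns
				// ... compute the plan for group g here ...
			}()
		} else {
			closer := load(g)
			defer closer() // runs only when processGroups returns
		}
	}
	return maxOpen
}

func main() {
	groups := []string{"do", "aws", "mock"}
	fmt.Println("deferred to function exit:", processGroups(groups, false))
	fmt.Println("closed per iteration:     ", processGroups(groups, true))
}
```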

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
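
The emission rule above can be sketched as a small classifier. The types here are simplified stand-ins: `NeedsUpdate`, `NeedsReplace`, and `ForceNew` come from the commit message, while `Path`, `Changes`, the lowercase type names, and the nil-result handling are assumptions.

```go
package main

import "fmt"

// fieldChange and diffResult are simplified, hypothetical stand-ins for the
// real FieldChange and DiffResult types.
type fieldChange struct {
	Path     string
	ForceNew bool
}

type diffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []fieldChange
}

// classify maps a diff to an action name. Replace wins over update, and any
// ForceNew field change forces replace even when NeedsReplace is false.
// Assumption: a nil result is treated like "no changes".
func classify(d *diffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	forceNew := &diffResult{
		NeedsUpdate: true,
		Changes:     []fieldChange{{Path: "region", ForceNew: true}},
	}
	fmt.Println(classify(forceNew))                       // replace
	fmt.Println(classify(&diffResult{NeedsUpdate: true})) // update
	fmt.Println(classify(&diffResult{}))                  // skip
}
```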

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).
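
The consult-before-dispatch flow can be sketched with an in-memory map. This is a toy model, not the iac/diffcache API: `key` mirrors the tuple named above, while `planner`, its fields, and the string result are hypothetical.

```go
package main

import "fmt"

// key mirrors the cache tuple named in the commit message.
type key struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// planner is a hypothetical holder for a cache and a Diff call counter.
type planner struct {
	cache     map[key]string
	diffCalls int
}

// classify returns the cached result on a hit; on a miss it "dispatches Diff"
// (here just a counter bump and a fixed result) and stores the outcome.
func (p *planner) classify(k key) string {
	if v, ok := p.cache[k]; ok {
		return v
	}
	p.diffCalls++
	v := "update" // stand-in for the driver's DiffResult
	p.cache[k] = v
	return v
}

func main() {
	p := &planner{cache: map[key]string{}}
	k := key{PluginVersion: "pv", Type: "vpc", ProviderID: "id-1", SHAConfig: "c1", SHAOutputs: "o1"}
	p.classify(k)
	p.classify(k) // hit: no second dispatch
	fmt.Println("diff calls after repeat:", p.diffCalls)
	k.SHAConfig = "c2"
	p.classify(k) // different inputs: miss
	fmt.Println("diff calls after config change:", p.diffCalls)
}
```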

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
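
A pure clamping function of the kind Note 1 describes could look like this. The exact semantics of parseConcurrencyEnv are not spelled out here, so this sketch assumes unparsable or empty input falls back to the default of 8 and out-of-range numbers clamp to the nearest bound of 1..32; `parseConcurrency` and the constant names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultConcurrency = 8
	minConcurrency     = 1
	maxConcurrency     = 32
)

// parseConcurrency converts the raw env value into a worker count.
// Assumptions: empty or non-numeric input yields the default; numeric input
// outside 1..32 clamps to the nearest bound.
func parseConcurrency(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultConcurrency
	}
	if n < minConcurrency {
		return minConcurrency
	}
	if n > maxConcurrency {
		return maxConcurrency
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrency(raw))
	}
}
```

Keeping the function free of os.Getenv is what makes the table-driven test possible without touching process state.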

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.
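
The channel-gating idea can be sketched without the real driver types. Note the swap: this sketch bounds concurrency with a buffered-channel semaphore instead of errgroup so it stays stdlib-only, and `inFlightObserved` is a hypothetical helper that returns false on timeout rather than failing a test.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// inFlightObserved starts n workers with at most limit running at once.
// Each worker signals on entered and then blocks until released. The function
// reports whether two workers were in flight at the same time before the
// timeout (in milliseconds) elapsed.
func inFlightObserved(n, limit, timeoutMs int) bool {
	entered := make(chan struct{}, n)
	release := make(chan struct{})
	sem := make(chan struct{}, limit) // semaphore standing in for errgroup's limit
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			entered <- struct{}{} // "Diff" has been entered
			<-release             // gate: hold until the observer releases
		}()
	}
	seen := 0
	timer := time.After(time.Duration(timeoutMs) * time.Millisecond)
wait:
	for seen < 2 {
		select {
		case <-entered:
			seen++
		case <-timer:
			break wait
		}
	}
	close(release) // let every worker finish
	wg.Wait()
	return seen >= 2
}

func main() {
	fmt.Println("limit=4:", inFlightObserved(5, 4, 2000))
	fmt.Println("limit=1:", inFlightObserved(5, 1, 100))
}
```

With a limit of 1 the second worker cannot enter while the first is gated, which is the serial-dispatch regression a count-only assertion would miss.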

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): fixed-length and
ambiguity-free as long as neither field contains a NUL byte, which
provider names and versions are not expected to carry.
Matches how configHash already keys per-config inputs.
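
The collision and the digest-based fix can be reproduced in a few lines. A sketch only: `joinedKey` and `digestKey` are hypothetical names, and hex encoding of the digest is an assumption about the key's final form.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// joinedKey reproduces the fragile "@"-separated composition.
func joinedKey(name, version string) string {
	return name + "@" + version
}

// digestKey hashes name and version with a NUL separator. Assumption: the
// digest is hex-encoded; the real key encoding may differ.
func digestKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two different (name, version) pairs from the commit message.
	fmt.Println(joinedKey("foo", "bar@1.0") == joinedKey("foo@bar", "1.0")) // true: collision
	fmt.Println(digestKey("foo", "bar@1.0") == digestKey("foo@bar", "1.0")) // false
}
```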

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.
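
The optional-interface routing can be sketched as below. The lowercase names are hypothetical stand-ins for ComputePlanVersionDeclarer and DispatchVersionFor; the real helper's signature is not shown in this message, so accepting `any` is an assumption.

```go
package main

import "fmt"

// versionDeclarer stands in for the optional ComputePlanVersionDeclarer
// interface.
type versionDeclarer interface {
	ComputePlanVersion() string
}

// legacyProvider does not implement the optional interface at all.
type legacyProvider struct{}

// declaringProvider implements it and returns whatever its manifest declared.
type declaringProvider struct{ version string }

func (p declaringProvider) ComputePlanVersion() string { return p.version }

// dispatchVersionFor returns "v2" only when the provider implements the
// interface and declares exactly "v2"; everything else routes to "v1".
func dispatchVersionFor(p any) string {
	if d, ok := p.(versionDeclarer); ok && d.ComputePlanVersion() == "v2" {
		return "v2"
	}
	return "v1"
}

func main() {
	fmt.Println(dispatchVersionFor(legacyProvider{}))                // v1: no interface
	fmt.Println(dispatchVersionFor(declaringProvider{version: ""}))  // v1: declared empty
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v2"})) // v2
}
```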

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy), and
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via writer-buffer
  substring). v1 fixture v1RecordingProvider intentionally does NOT
  implement ComputePlanVersionDeclarer to prove the dispatcher's
  "default to v1 when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes: no signature change, no behavior change for
the success-only or v1RecordingProvider paths.
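
The gate change in item 1 can be sketched as follows. The types and the "stale input:" output format are hypothetical; only the `result != nil` gate and the no-op-on-empty behavior come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// applyResult is a hypothetical stand-in carrying only a drift report.
type applyResult struct{ Drift []string }

// printDrift no-ops on a nil result or an empty report.
func printDrift(w io.Writer, r *applyResult) {
	if r == nil || len(r.Drift) == 0 {
		return
	}
	for _, d := range r.Drift {
		fmt.Fprintf(w, "stale input: %s\n", d) // illustrative format
	}
}

// finish gates on result != nil rather than err == nil, so the report is
// printed on success and on partial failure alike, and err still propagates.
func finish(w io.Writer, r *applyResult, err error) error {
	if r != nil {
		printDrift(w, r)
	}
	return err
}

// demo returns what reached the writer and whether the apply error survived.
func demo() (string, bool) {
	var out strings.Builder
	applyErr := errors.New("apply failed")
	err := finish(&out, &applyResult{Drift: []string{"DB_URL"}}, applyErr)
	return out.String(), errors.Is(err, applyErr)
}

func main() {
	out, propagated := demo()
	fmt.Printf("%q propagated=%v\n", out, propagated)
}
```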

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects a nested map[string]bool value (nested maps must
be map[string]any), and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
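
The conversion itself is small. The following is a sketch of what such a helper might look like, not the body that shipped in commit 40e07a1; only the name sensitiveToAny and the nil-for-empty behavior come from the message above.

```go
package main

import "fmt"

// sensitiveToAny widens map[string]bool to map[string]any so each value is a
// plain bool inside a map shape that struct conversion accepts. It returns
// nil for nil or empty input so nothing is sent over the wire in that case.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(nil) == nil)               // true
	fmt.Println(sensitiveToAny(map[string]bool{}) == nil) // true
	fmt.Println(sensitiveToAny(map[string]bool{"password": true}))
}
```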

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
  scoped via getDiffCache lazy init) doesn't observe stale entries
  from a developer's local ~/.cache/wfctl/diff/ as false-positive
  cache hits skipping driver Diff dispatch. Same pattern as
  platform/main_test.go from T3.6f. Caught during dev when the
  end-to-end test failed in the full cmd/wfctl test run but passed
  in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)
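
The third bug class can be illustrated with a minimal sketch. The converter body is not shown in this log, so everything below is an assumption: `encodeKeys` is a hypothetical stand-in for an encoder that only understands `map[string]any`, and `sensitiveToAny` is assumed to widen `map[string]bool` into that shape.

```go
package main

import "fmt"

// encodeKeys is a hypothetical encoder: it accepts map[string]any and
// silently ignores every other type, returning how many keys it encoded.
func encodeKeys(arg any) int {
	m, ok := arg.(map[string]any)
	if !ok {
		return 0 // silently dropped, no error surfaced
	}
	return len(m)
}

// sensitiveToAny widens a map[string]bool into map[string]any so a
// type-strict encoder keeps the entries. Assumed shape, not the real code.
func sensitiveToAny(in map[string]bool) map[string]any {
	if in == nil {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	sensitive := map[string]bool{"token": true}
	fmt.Println(encodeKeys(sensitive))                 // raw map[string]bool
	fmt.Println(encodeKeys(sensitiveToAny(sensitive))) // widened map
}
```

A Go type assertion on `map[string]any` does not match `map[string]bool`, which is why a missing conversion can lose data without any error.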

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when the team-lead flips
  paths mid-execution, reviewer and implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* feat(iac): jitsubst.ResolveSpec for per-module deferred substitution

T5.1 — new package iac/jitsubst hosts ResolveSpec, the apply-time helper
that resolves ${VAR}, ${MODULE.field}, and ${MODULE.id} references in a
ResourceSpec.Config tree. Strict semantics: every reference MUST resolve
or the helper returns an error and the input spec unchanged. ${MODULE.id}
prefers the in-apply replaceIDMap (W-3b/T3.4) over syncedOutputs so
cascade-replace ProviderID propagation is authoritative over potentially
stale state outputs.

Used by W-5 T5.2 (wire into wfctlhelpers.ApplyPlan) and T5.3 (wire into
doReplace). No behavior change yet — helper has no in-tree caller.
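
A minimal sketch of the strict resolution semantics described above, under stated assumptions: `Sources`, `resolveString`, `resolveTree`, and the reference regex are hypothetical names and shapes, not the real `jitsubst` API. It shows the two properties the commit calls out: `${MODULE.id}` prefers the replace map over synced outputs, and any unresolved reference returns an error with the input unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// refPattern matches ${VAR} and ${MODULE.field}. Assumed syntax.
var refPattern = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_-]*)(?:\.([A-Za-z_][A-Za-z0-9_]*))?\}`)

// Sources bundles the lookup tables (hypothetical shape).
type Sources struct {
	ReplaceIDMap  map[string]string            // module -> ProviderID created in this apply
	SyncedOutputs map[string]map[string]string // module -> flattened outputs
	LookupEnv     func(string) (string, bool)  // e.g. os.LookupEnv in production
}

// resolveString substitutes every reference or fails on the first unresolved one.
func resolveString(s string, src Sources) (string, error) {
	var firstErr error
	out := refPattern.ReplaceAllStringFunc(s, func(m string) string {
		parts := refPattern.FindStringSubmatch(m)
		name, field := parts[1], parts[2]
		if field == "" {
			if v, ok := src.LookupEnv(name); ok {
				return v
			}
		} else {
			if field == "id" {
				if id, ok := src.ReplaceIDMap[name]; ok {
					return id // replace map wins over state outputs
				}
			}
			if v, ok := src.SyncedOutputs[name][field]; ok {
				return v
			}
		}
		if firstErr == nil {
			firstErr = fmt.Errorf("unresolved reference %s", m)
		}
		return m
	})
	if firstErr != nil {
		return s, firstErr
	}
	return out, nil
}

// resolveTree walks maps, slices, and strings; on error it returns the input as-is.
func resolveTree(v any, src Sources) (any, error) {
	switch t := v.(type) {
	case string:
		return resolveString(t, src)
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, e := range t {
			r, err := resolveTree(e, src)
			if err != nil {
				return v, err
			}
			out[k] = r
		}
		return out, nil
	case []any:
		out := make([]any, len(t))
		for i, e := range t {
			r, err := resolveTree(e, src)
			if err != nil {
				return v, err
			}
			out[i] = r
		}
		return out, nil
	default:
		return v, nil
	}
}

func main() {
	src := Sources{
		ReplaceIDMap:  map[string]string{"vpc": "vpc-new"},
		SyncedOutputs: map[string]map[string]string{"vpc": {"id": "vpc-old"}, "pg": {"private_ip": "10.0.0.5"}},
		LookupEnv:     func(string) (string, bool) { return "", false },
	}
	fmt.Println(resolveTree(map[string]any{"VPC_UUID": "${vpc.id}", "DB_HOST": "${pg.private_ip}"}, src))
}
```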

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): ApplyPlan resolves JIT substitutions per action

T5.2 — wfctlhelpers.ApplyPlan now invokes jitsubst.ResolveSpec on every
action.Resource before dispatch. The substitution sees:

  - result.ReplaceIDMap (this-apply Replace ProviderIDs from doReplace)
  - syncedOutputs (state-side outputs from action.Current entries +
    this-apply outputs from successful prior dispatches in the same loop)
  - os.LookupEnv (production env source)

syncedOutputs is pre-populated from every action.Current at start-of-apply
so a NEW action can reference an in-state sibling module's outputs from
action zero. After each successful dispatch (when result.Resources grows),
the new entry is folded into syncedOutputs via flattenOutputs — flat-copy
of Outputs with the canonical 'id' key shadowed by ProviderID so
${MODULE.id} resolves predictably across new and existing modules.
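
The folding step can be sketched as follows. The signature is an assumption (the real `flattenOutputs` is not shown in this log); the point is only that the copy is shallow and that `id` always carries the ProviderID, even when the driver's Outputs already hold an `id` key.

```go
package main

import "fmt"

// flattenOutputs returns a flat copy of outputs with the canonical "id"
// key forced to providerID. Hypothetical signature for illustration.
func flattenOutputs(providerID string, outputs map[string]any) map[string]any {
	flat := make(map[string]any, len(outputs)+1)
	for k, v := range outputs {
		flat[k] = v
	}
	flat["id"] = providerID // shadows any "id" the driver returned
	return flat
}

func main() {
	outputs := map[string]any{"id": "stale-output-id", "cidr": "10.0.0.0/16"}
	flat := flattenOutputs("do-vpc-123", outputs)
	fmt.Println(flat["id"], flat["cidr"], outputs["id"])
}
```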

JIT failure surfaces as a per-action ActionError with the canonical
'jit substitution:' prefix; the offending action SKIPS dispatch
(unresolved spec must not reach the driver). The loop continues to the
next action — best-effort apply contract preserved.

Tests in apply_jit_test.go cover: 2-create plan with B referencing
${A.id}, pre-syncing from action.Current, unresolved-ref skipping
dispatch with canonical prefix, no-refs passthrough, and loop-continues-
after-per-action-JIT-error. T5.3 wires Replace cascade.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): ApplyPlan replace cascade propagates new ProviderID

T5.3 — locks the Replace-cascade contract via apply_replace_cascade_test.go
and updates doReplace godoc to document the cascade hookup explicitly.

Two scenarios:
- ReplaceCascade_DependentCreateGetsNewParentID: [Replace parent, Create
  dependent] where dependent's Config has ${parent.id}; dependent's
  Create receives the new ProviderID.
- ReplaceCascade_DependentReplaceGetsNewParentID: extends to Replace-on-
  Replace shape; dependent's post-Delete Create still sees the resolved
  parent.id, while its own Delete continues to target the OLD ProviderID
  via action.Current (JIT does not alter action.Current).

The behavior was already operational after T5.2's loop-level
jitsubst.ResolveSpec call: doReplace populates result.ReplaceIDMap
inside iteration N, and the loop's pre-dispatch substitution at
iteration N+1 sees the fresh entry. T5.3 adds the assertion + doc
that locks this ordering as a contract; future refactors that move
substitution out of the loop OR delay ReplaceIDMap population will
break these tests loudly.
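
The ordering contract can be sketched in a few lines. This is a simplified model, not the real `ApplyPlan`: `action`, `applyAll`, and the string-based substitution are hypothetical, and the new ProviderID is faked as `"new-" + name`. It shows substitution running before each dispatch, a Replace at iteration N feeding iteration N+1, and an unresolved reference skipping dispatch while the loop continues.

```go
package main

import (
	"fmt"
	"strings"
)

// action is a hypothetical, flattened stand-in for a plan action.
type action struct {
	Kind   string // "replace" or "create"
	Name   string
	Config string // may contain ${module.id}
}

// applyAll returns the config each dispatched action saw plus per-action errors.
func applyAll(actions []action) (dispatched map[string]string, errs []string) {
	replaceIDs := map[string]string{}
	dispatched = map[string]string{}
	for _, a := range actions {
		// Pre-dispatch substitution: sees every ID recorded by earlier iterations.
		resolved := a.Config
		for mod, id := range replaceIDs {
			resolved = strings.ReplaceAll(resolved, "${"+mod+".id}", id)
		}
		if strings.Contains(resolved, "${") {
			// Unresolved spec must not reach the driver; record and move on.
			errs = append(errs, "jit substitution: "+a.Name+": unresolved "+resolved)
			continue
		}
		dispatched[a.Name] = resolved
		if a.Kind == "replace" {
			replaceIDs[a.Name] = "new-" + a.Name // populated inside iteration N
		}
	}
	return dispatched, errs
}

func main() {
	d, errs := applyAll([]action{
		{Kind: "replace", Name: "parent", Config: "region=nyc1"},
		{Kind: "create", Name: "dependent", Config: "vpc=${parent.id}"},
	})
	fmt.Println(d["dependent"], len(errs))
}
```

Moving the substitution out of the loop, or recording the new ID later than the iteration that created it, would leave `${parent.id}` unresolved for the dependent.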

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plan SchemaVersion=2 when JIT substitution required

T5.4 — runInfraPlan now stamps plan.SchemaVersion conditionally:

  - V1 (1) baseline when no plan action's resolved Resource.Config
    carries a JIT-style ${MODULE.field} or ${MODULE.id} reference.
  - V2 (2) when any action does — older wfctl binaries reading the
    persisted plan reject with the existing 'newer than supported'
    diagnostic at runInfraApply.

Detection is centralized in jitsubst.HasModuleRefs (recursive walk over
map[string]any / []any / string), gated by a simple regex that requires
non-empty segments on both sides of the dot — plain ${VAR} env-var
refs (no dot) do NOT trigger the bump, so the common operator
secret-via-env workflow stays at V1.
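
A minimal sketch of the detection walk, assuming a regex of roughly this shape (the real pattern in `jitsubst` is not shown here). `hasModuleRefs` and `schemaVersionFor` are hypothetical names; the behaviour mirrors the description: dotted references trigger V2, plain `${VAR}` and malformed references do not.

```go
package main

import (
	"fmt"
	"regexp"
)

// moduleRef requires a non-empty segment on both sides of the dot.
var moduleRef = regexp.MustCompile(`\$\{[^{}.]+\.[^{}.]+\}`)

// hasModuleRefs walks map[string]any / []any / string recursively.
// Other types (including nil) never match.
func hasModuleRefs(v any) bool {
	switch t := v.(type) {
	case string:
		return moduleRef.MatchString(t)
	case map[string]any:
		for _, e := range t {
			if hasModuleRefs(e) {
				return true
			}
		}
	case []any:
		for _, e := range t {
			if hasModuleRefs(e) {
				return true
			}
		}
	}
	return false
}

// schemaVersionFor picks 2 when any action config needs JIT resolution, else 1.
func schemaVersionFor(configs []map[string]any) int {
	for _, c := range configs {
		if hasModuleRefs(c) {
			return 2
		}
	}
	return 1
}

func main() {
	envOnly := map[string]any{"token": "${DO_TOKEN}"}
	jit := map[string]any{"env_vars": map[string]any{"VPC_UUID": "${vpc.id}"}}
	fmt.Println(schemaVersionFor([]map[string]any{envOnly}))
	fmt.Println(schemaVersionFor([]map[string]any{envOnly, jit}))
}
```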

cmd/wfctl/infra.go gains:
  - infraPlanSchemaVersionV1 (=1) and infraPlanSchemaVersionJIT (=2)
    constants alongside the existing infraPlanSchemaVersion (=2, max
    readable). The 'max readable' constant ticks up with every schema
    bump; V1/JIT name the per-plan choice runInfraPlan makes.
  - planRequiresJITSubstitution(plan) helper that walks plan.Actions
    once via jitsubst.HasModuleRefs.

Tests:
  - iac/jitsubst/jitsubst_test.go — 8 new HasModuleRefs cases (env-var
    is false, .field/.id are true, nested map/slice, nil-safe,
    malformed refs are false, mixed-string is true).
  - cmd/wfctl/infra_plan_schema_test.go — V1 baseline (env-var only),
    V2 for both .field and .id, V1 negative for env-var-only, and
    persisted-plan SchemaVersion=2 end-to-end (where T5.5's rejection
    has not yet landed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): reject persisted JIT-style plans (canonical path is apply-without-plan)

T5.5 — runInfraPlan now refuses to write a plan.json via -o when the
plan is JIT-style (SchemaVersion = infraPlanSchemaVersionJIT). The exact
operator-facing error string is contract-stable:

  error: plan -o requires JIT-free config; this plan references
  ${MODULE.field} which only resolves at apply time. Use
  'wfctl infra apply' (without --plan) for JIT-aware applies.

Stdout-only emission (no -o) of a JIT-style plan is permitted — it's a
preview, not a contract. The guard fires AFTER plan computation so the
operator sees the plan table on stdout before the rejection at the
persistence step.

Tests in cmd/wfctl/infra_plan_jit_reject_test.go (4 cases):
  - exact-string match (the strict contract)
  - stdout-only JIT plan permitted (negative-control on the guard scope)
  - persisted non-JIT plan permitted (V1 happy path unchanged)
  - canonical-keyword substring match (operator-search-engine safety net)

Removed T5.4's now-redundant
TestInfraPlan_SchemaVersionV2_PersistedToFileMatches: its happy path
has been replaced by T5.5's strict rejection contract. SchemaVersion
stamping correctness is still locked by the helper-direct tests in the
same file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): T5.7 runtime-launch-validation — JIT subst + plan rejection

W-5 Task T5.7: per the plan's 'Files: none' instruction, this is a
documentation-only commit recording the runtime-launch-validation
transcript against the built wfctl binary.

# Step 1: Build

  $ GOWORK=off go build -o /tmp/wfctl-jit-validation ./cmd/wfctl
  (no output, exit 0)

# Step 3: T5.5 persisted-JIT-plan rejection (build-binary verification)

Fixture (infra.yaml):
  modules:
    - name: app
      type: infra.container_service
      config:
        env_vars:
          VPC_UUID: "${vpc.id}"
          DB_HOST: "${pg.private_ip}"

  $ wfctl infra plan -o /tmp/jit-validation/plan.json --config infra.yaml
  Infrastructure Plan — infra.yaml

  + create  app  (infra.container_service)

  Plan: 1 to create, 0 to update, 0 to destroy.
  error: error: plan -o requires JIT-free config; this plan references
  ${MODULE.field} which only resolves at apply time. Use 'wfctl infra
  apply' (without --plan) for JIT-aware applies.
  EXIT=1

The doubled 'error: error:' prefix appears because cmd/wfctl/main.go's
top-level error reporter prepends 'error: ' to every command failure
(line 211: `fmt.Fprintf(os.Stderr, "error: %v\n", rootErr)`), and
the team-lead-specified literal also begins with 'error: '. The
implementer brief said 'Match exactly.', so the literal ships as
specified. Flagging here for visibility: a follow-up could either drop
the prefix from the literal or special-case main.go's wrapping. Not
addressing in W-5.
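
The prefix doubling is easy to reproduce in isolation. In this sketch `report` stands in for the top-level reporter quoted above, and `planError` is a hypothetical helper that builds the message with or without its own leading 'error: '; the message text is shortened for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

const jitMsg = "plan -o requires JIT-free config"

// planError builds the command error, optionally with its own "error: " prefix.
func planError(withPrefix bool) error {
	if withPrefix {
		return errors.New("error: " + jitMsg)
	}
	return errors.New(jitMsg)
}

// report mimics a top-level reporter that prefixes every failure.
func report(err error) string {
	return fmt.Sprintf("error: %v", err)
}

func main() {
	fmt.Println(report(planError(true)))  // prefix appears twice
	fmt.Println(report(planError(false))) // prefix appears once
}
```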

# T5.5 inverse: stdout-only JIT plan permitted (no rejection)

  $ wfctl infra plan --config infra.yaml
  Infrastructure Plan — infra.yaml

  + create  app  (infra.container_service)

  Plan: 1 to create, 0 to update, 0 to destroy.
  EXIT=0

# T5.4 V1 baseline: non-JIT config persisted to disk still works

  Fixture (infra-novars.yaml):
    modules:
      - name: app
        type: infra.container_service
        config:
          cidr: "10.0.0.0/16"

  $ wfctl infra plan -o plan-novars.json --config infra-novars.yaml
  Plan: 1 to create, 0 to update, 0 to destroy.
  Plan saved to /tmp/jit-validation/plan-novars.json
  EXIT=0

  $ jq .schema_version plan-novars.json
  1                          ← V1 (T5.4 stamp logic working)

# Step 2: apply with ${A.id} reference — covered by in-tree tests

T5.7 plan §Step 2 specifies running 'apply against fixture with ${A.id}
reference' against the built binary. wfctl infra apply requires a fully-
configured iac.provider plugin (manifest, plugin.json, gRPC binary), so
running this end-to-end against an ad-hoc fixture is non-trivial without
W-7's conformance harness. The same code path is fully covered by:

  - iac/wfctlhelpers/apply_jit_test.go::TestApplyPlan_JIT_TwoCreate_BSpecResolvesAID
    (T5.2: basic create+create cascade)
  - iac/wfctlhelpers/apply_replace_cascade_test.go::TestApplyPlan_ReplaceCascade_DependentCreateGetsNewParentID
    (T5.3: replace+create cascade)
  - iac/wfctlhelpers/apply_replace_cascade_test.go::TestApplyPlan_ReplaceCascade_DependentReplaceGetsNewParentID
    (T5.3: replace+replace cascade)
  - iac/wfctlhelpers/apply_jit_test.go::TestApplyPlan_JIT_UnresolvedRef_RecordsActionErrorAndSkipsDispatch
    (T5.2: failure path)

These exercise the SAME wfctlhelpers.ApplyPlan code path the binary
invokes; the unit-test fake driver is functionally equivalent to a v2
plugin from ApplyPlan's perspective. A binary-level apply smoke test is
deferred to W-7's conformance gate (which adds the DO smoke test against
real-cloud fixtures).

# Verification

Tests pass:
  GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... ./platform/... ./cmd/wfctl/... ./module/...
  → all packages OK.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(iac): T5.5 review — exact plan-literal error string

Spec-reviewer caught that the shipped error string in cmd/wfctl/infra.go
diverged from the plan literal at
docs/plans/2026-05-03-iac-conformance-and-replace.md §T5.5 line 2104.
The kickoff brief I worked from substituted a wordier alternate string;
team-lead confirmed the plan literal is the correct contract.

Three fixes:

1. cmd/wfctl/infra.go:297 — replace fmt.Errorf literal with
   errors.New(<plan literal>). No leading 'error:' prefix — that's
   prepended by cmd/wfctl/main.go's top-level error wrapper, so the
   doubled 'error: error:' artifact in T5.7's runtime transcript is
   resolved as a side benefit. Switched to errors.New per spec-reviewer
   suggestion: avoids govet's no-format-verbs noise on the no-substitution
   case and is the canonical Go pattern for fixed-string sentinels.

2. cmd/wfctl/infra_plan_jit_reject_test.go:16 — expectedJITRejectError
   constant updated to the plan literal. Comment block expanded to
   document the literal's source + the leading-error-prefix nuance for
   future readers.

3. cmd/wfctl/infra_plan_jit_reject_test.go:125 — substring keyword
   list in TestInfraPlan_RejectionErrorContainsCanonicalKeywords
   updated to keys actually present in the new literal:
   'JIT resolution', 'persisted plan.json', 'wfctl infra apply',
   '-o/--plan'. The exact-match test above is the strict contract;
   this one stays as the operator-search-engine safety net.

Verified end-to-end via rebuilt wfctl binary against the same fixture
from T5.7's transcript:

  $ wfctl infra plan -o plan.json --config infra.yaml
  Infrastructure Plan — infra.yaml
  + create  app  (infra.container_service)
  Plan: 1 to create, 0 to update, 0 to destroy.
  error: this plan requires JIT resolution; persisted plan.json is not
  supported. Run 'wfctl infra apply' directly without -o/--plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): ADR 008 — JIT substitution at dispatch loop, not per-helper

Records the architectural choice resolved during T5.3: jitsubst.ResolveSpec
runs once at the wfctlhelpers.ApplyPlan dispatch loop (immediately before
each dispatchAction call), NOT inside per-action helpers. doReplace
populates result.ReplaceIDMap; the next iteration's pre-dispatch
ResolveSpec consumes it. This honors the Replace-cascade contract via
loop-ordering invariant rather than via an explicit substitution call
inside doReplace.

Plan §T5.3 specified inner-resolve in doReplace; T5.2's loop-level call
already covered the cascade case. Threading syncedOutputs through
dispatchAction → doReplace would have made the helper boundary leaky for
one call site. Option 1 (test-only T5.3 + this ADR) chosen by team-lead
over option 2 (inner-resolve rework) on 2026-05-04 after spec-reviewer
escalation.

Cascade contract is locked by apply_replace_cascade_test.go's two
scenarios; this ADR ensures future refactors that move substitution out
of the loop OR delay ReplaceIDMap population see the trade-off rather
than rediscovering it via git bla…
intel352 added a commit that referenced this pull request May 4, 2026
…W-6 of 12) (#532)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).
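
A minimal sketch of the bounded-concurrency pattern described above, with several simplifications that are assumptions rather than the real API: the context and provider arguments are omitted, `ReadFunc` replaces the ResourceDriver lookup, and `State` carries only two fields. It keeps the stated contract: default concurrency of 8 when the value is below 1, and `(nil, err)` on any failure so nothing is half-persisted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// State is a trimmed-down stand-in for a resource state record.
type State struct {
	ProviderID string
	Outputs    map[string]any
}

// ReadFunc stands in for ResourceDriver.Read (hypothetical shape).
type ReadFunc func(providerID string) (map[string]any, error)

var errNotFound = errors.New("resource not found")

// refresh reads every state with at most `concurrency` Reads in flight and
// returns a fresh slice; the input slice is never modified.
func refresh(read ReadFunc, states []State, concurrency int) ([]State, error) {
	if concurrency < 1 {
		concurrency = 8
	}
	out := make([]State, len(states))
	errs := make([]error, len(states))
	sem := make(chan struct{}, concurrency)
	var wg sync.WaitGroup
	for i, s := range states {
		wg.Add(1)
		sem <- struct{}{} // blocks once the cap is reached
		go func(i int, s State) {
			defer wg.Done()
			defer func() { <-sem }()
			live, err := read(s.ProviderID)
			if err != nil {
				errs[i] = fmt.Errorf("read %s: %w", s.ProviderID, err)
				return
			}
			out[i] = State{ProviderID: s.ProviderID, Outputs: live} // each goroutine owns slot i
		}(i, s)
	}
	wg.Wait()
	if err := errors.Join(errs...); err != nil {
		return nil, err // all-or-nothing
	}
	return out, nil
}

func main() {
	read := func(id string) (map[string]any, error) {
		if id == "gone" {
			return nil, errNotFound
		}
		return map[string]any{"host": id + ".internal"}, nil
	}
	fresh, err := refresh(read, []State{{ProviderID: "a"}, {ProviderID: "b"}}, 0)
	fmt.Println(len(fresh), err)
	failed, err := refresh(read, []State{{ProviderID: "a"}, {ProviderID: "gone"}}, 2)
	fmt.Println(failed == nil, errors.Is(err, errNotFound))
}
```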

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.
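
The grouping rule can be sketched as below. The `Module` and `StateEntry` fields are hypothetical, and the handling of an unknown ProviderRef (falling through to the type-based rule) is an assumption this log does not confirm.

```go
package main

import "fmt"

// Module is a hypothetical iac.provider module declaration.
type Module struct{ Name, Type string }

// StateEntry is a hypothetical state record with its provider hints.
type StateEntry struct{ Name, ProviderRef, ProviderType string }

// groupByProvider assigns each entry to an owning module: ProviderRef first,
// then provider type when exactly one module of that type exists.
// Entries that match neither rule are returned as unowned.
func groupByProvider(mods []Module, entries []StateEntry) (map[string][]StateEntry, []StateEntry) {
	byName := map[string]bool{}
	byType := map[string][]string{}
	for _, m := range mods {
		byName[m.Name] = true
		byType[m.Type] = append(byType[m.Type], m.Name)
	}
	groups := map[string][]StateEntry{}
	var unowned []StateEntry
	for _, e := range entries {
		switch {
		case e.ProviderRef != "" && byName[e.ProviderRef]:
			groups[e.ProviderRef] = append(groups[e.ProviderRef], e)
		case len(byType[e.ProviderType]) == 1:
			owner := byType[e.ProviderType][0]
			groups[owner] = append(groups[owner], e)
		default:
			unowned = append(unowned, e) // ambiguous or unknown provider
		}
	}
	return groups, unowned
}

func main() {
	mods := []Module{{"do-main", "digitalocean"}, {"aws-a", "aws"}, {"aws-b", "aws"}}
	entries := []StateEntry{
		{Name: "vpc", ProviderRef: "do-main"},
		{Name: "db", ProviderType: "digitalocean"},
		{Name: "bucket", ProviderType: "aws"},
	}
	groups, unowned := groupByProvider(mods, entries)
	fmt.Println(len(groups["do-main"]), len(unowned))
}
```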

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
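
The activation rules above reduce to a small predicate. `refreshEnabled` is a hypothetical name; the only library call is `strconv.ParseBool`, whose error on empty or unrecognised input maps to the default-off case.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshEnabled reports whether the pre-step should run, given the raw
// WFCTL_REFRESH_OUTPUTS value and the --skip-refresh flag.
func refreshEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // flag wins over the env var
	}
	on, err := strconv.ParseBool(envValue)
	return err == nil && on // unset, empty, or unrecognised values stay off
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, refreshEnabled(v, false))
	}
	fmt.Println(refreshEnabled("1", true))
}
```

A presence-only check would treat `WFCTL_REFRESH_OUTPUTS=0` as enabled, which is the foot-gun the ParseBool form avoids.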

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver sleeps 5ms per Read so the bounded pool actually
queues at the cap (100 Reads × 5ms / 8 workers ≈ 63ms of ideal wall
time, well under the 10s watchdog). Test runs ~1.5s wall.
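
One way to measure the in-flight peak is an atomic counter with a compare-and-swap maximum. This is a sketch of the technique, not the test's actual countingDriver: `peakTracker` and `runBounded` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peakTracker records the highest number of concurrent holders seen.
type peakTracker struct {
	inFlight atomic.Int64
	peak     atomic.Int64
}

func (p *peakTracker) enter() {
	n := p.inFlight.Add(1)
	for {
		cur := p.peak.Load()
		if n <= cur || p.peak.CompareAndSwap(cur, n) {
			return
		}
	}
}

func (p *peakTracker) leave() { p.inFlight.Add(-1) }

// runBounded runs `tasks` sleeping workers behind a semaphore of size
// `limit` and returns the observed in-flight peak.
func runBounded(tasks, limit, workMillis int) int64 {
	var p peakTracker
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // released only after leave() has run
			p.enter()
			defer p.leave()
			time.Sleep(time.Duration(workMillis) * time.Millisecond)
		}()
	}
	wg.Wait()
	return p.peak.Load()
}

func main() {
	fmt.Println("peak in flight:", runBounded(100, 8, 5))
}
```

The upper bound is guaranteed by the semaphore; the lower bound of 2 asserted by the real test depends on scheduling, which is why the sleep is there.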

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
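
The first two fixes follow a common Go pattern, sketched here under assumptions: `json.Unmarshal` stands in for the real jsonschema compile step, `schemaJSON` stands in for the `//go:embed` slice, and `hammer` is a hypothetical stress helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// schemaJSON stands in for the embedded schema bytes.
var schemaJSON = []byte(`{"type":"object"}`)

var (
	schemaOnce   sync.Once
	schemaVal    map[string]any
	schemaErr    error
	compileCalls int // only written inside schemaOnce.Do
)

// compiledSchema compiles at most once; every caller observes the same
// value and the same error.
func compiledSchema() (map[string]any, error) {
	schemaOnce.Do(func() {
		compileCalls++
		schemaErr = json.Unmarshal(schemaJSON, &schemaVal)
	})
	return schemaVal, schemaErr
}

// ManifestSchemaJSON returns a copy so callers cannot mutate the shared slice.
func ManifestSchemaJSON() []byte {
	return bytes.Clone(schemaJSON)
}

// hammer calls compiledSchema from n goroutines and reports how many
// times the compile step actually ran.
func hammer(n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = compiledSchema()
		}()
	}
	wg.Wait()
	return compileCalls
}

func main() {
	fmt.Println("compiles:", hammer(32))
	b := ManifestSchemaJSON()
	b[0] = 'X'
	fmt.Println(string(ManifestSchemaJSON()))
}
```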

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
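
The gap between the two gates can be sketched as follows. This is a
minimal illustration, not the code at
cmd/wfctl/infra_apply_refresh_pre.go: the helper names
presenceEnabled and parseBoolEnabled are hypothetical, and treating
unparseable values as "disabled" is an assumption about the real
behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// presenceEnabled mirrors the plan-literal check: any non-empty value
// enables the feature, including "0" and "false".
func presenceEnabled(v string) bool { return v != "" }

// parseBoolEnabled mirrors the ParseBool approach: only values that
// strconv.ParseBool reads as true enable the feature.
// Assumption: values that fail to parse are treated as disabled.
func parseBoolEnabled(v string) bool {
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

func main() {
	for _, v := range []string{"", "0", "false", "1", "true", "yes"} {
		fmt.Printf("%-8q presence=%-5v parsebool=%v\n",
			v, presenceEnabled(v), parseBoolEnabled(v))
	}
}
```

With the presence check, WFCTL_REFRESH_OUTPUTS=0 enables the feature;
with ParseBool it does not, which is the foot-gun the reviewer flagged.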

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
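
The omitempty behavior that the contract test locks can be reproduced
with a small stand-in. The struct below is hypothetical: the JSON keys
come from the commit text above, but the Go field types are
assumptions, not the real interfaces.ApplyResult definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult is a hypothetical stand-in for the three new fields.
// Field types are assumed; only the JSON keys come from the commit.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and returns the JSON text.
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	nilFields := applyResult{}
	emptyFields := applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}
	populated := applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}

	fmt.Println(encode(nilFields))   // {}
	fmt.Println(encode(emptyFields)) // {}
	fmt.Println(encode(populated))   // {"replace_id_map":{"vpc":"new-uuid"}}
}
```

encoding/json treats nil and zero-length maps and slices alike under
omitempty, so both cases drop the key.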

* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
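
The loop-boundary cancellation check from Minor 3 can be sketched like
this. It is an illustration only: applyAll and runPreCancelled are
hypothetical names, and the real ApplyPlan works on plan actions and
an ApplyResult rather than plain strings.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// applyAll runs each action in order, checking ctx at every iteration
// boundary. On cancel it returns what completed so far plus the ctx error.
func applyAll(ctx context.Context, actions []string, run func(string)) ([]string, error) {
	var done []string
	for _, a := range actions {
		if err := ctx.Err(); err != nil {
			return done, err
		}
		run(a)
		done = append(done, a)
	}
	return done, nil
}

// runPreCancelled cancels the context before the first iteration and
// reports how many actions ran and whether the error is context.Canceled.
func runPreCancelled(actions []string) (calls int, canceled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := applyAll(ctx, actions, func(string) { calls++ })
	return calls, errors.Is(err, context.Canceled)
}

func main() {
	calls, canceled := runPreCancelled([]string{"create", "update"})
	fmt.Println(calls, canceled) // 0 true
}
```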

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  propagate to both test files automatically, which keeps the
  cross-package fixtures in parity.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.
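
A sketch of the guard, assuming a simplified shape for the replace
path. The function names and the "replace: delete:" and
"replace: create:" prefixes are illustrative assumptions; only the
"replace: canceled after delete:" prefix and the %w wrapping come from
this commit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// replace runs delete then create, re-checking ctx between the two calls.
// del and create stand in for the driver's Delete and Create.
func replace(ctx context.Context, del, create func(context.Context) error) error {
	if err := del(ctx); err != nil {
		return fmt.Errorf("replace: delete: %w", err) // illustrative prefix
	}
	// Guard: stop before Create if cancellation arrived during Delete.
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("replace: canceled after delete: %w", err)
	}
	if err := create(ctx); err != nil {
		return fmt.Errorf("replace: create: %w", err) // illustrative prefix
	}
	return nil
}

// simulateCancelDuringDelete cancels the context from inside the delete
// call, the same trick the new test's fake driver uses.
func simulateCancelDuringDelete() (createCalled, isCanceled bool, msg string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	del := func(context.Context) error { cancel(); return nil }
	create := func(context.Context) error { createCalled = true; return nil }
	err := replace(ctx, del, create)
	if err != nil {
		msg = err.Error()
	}
	return createCalled, errors.Is(err, context.Canceled), msg
}

func main() {
	created, canceled, msg := simulateCancelDuringDelete()
	fmt.Println(created, canceled, msg)
}
```

Because the error is wrapped with %w, errors.Is still matches
context.Canceled in-package, while the string prefix survives the
conversion into result.Errors.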

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering is needed. On rename failure, Put makes a best-effort
  attempt to remove the temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."
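
The write-temp-then-rename pattern from Minor 1 looks roughly like the
sketch below. atomicWrite and writeTwiceAndRead are hypothetical names,
not the diffcache Put implementation, and the temp-file naming scheme
is an assumption chosen so the orphan ends in .tmp rather than .json.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temp file in the target directory, then
// renames it over path. Readers see either the old or the new contents.
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	_, werr := tmp.Write(data)
	cerr := tmp.Close()
	if werr != nil || cerr != nil {
		os.Remove(tmpName) // best-effort cleanup
		if werr != nil {
			return werr
		}
		return cerr
	}
	if err := os.Rename(tmpName, path); err != nil {
		os.Remove(tmpName) // best-effort cleanup
		return err
	}
	return nil
}

// writeTwiceAndRead overwrites one entry twice and reports the final
// contents plus how many files remain in the directory.
func writeTwiceAndRead() (content string, files int, err error) {
	dir, err := os.MkdirTemp("", "atomic-put-demo")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "entry.json")
	for _, v := range []string{`{"v":1}`, `{"v":2}`} {
		if err := atomicWrite(path, []byte(v)); err != nil {
			return "", 0, err
		}
	}
	b, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	return string(b), len(entries), nil
}

func main() {
	fmt.Println(writeTwiceAndRead())
}
```

The temp file is created in the same directory as the target so the
rename stays on one filesystem. The sketch does not fsync, so it guards
against process crashes, not power loss.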

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).
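
The closure-per-iteration pattern can be sketched as follows. The
types and function names are hypothetical stand-ins; the real loop
loads a provider plugin and calls the computeInfraPlan seam.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for a loaded provider and its closer.
type conn struct {
	name string
	log  *[]string
}

func open(name string, log *[]string) conn {
	*log = append(*log, "open "+name)
	return conn{name: name, log: log}
}

func (c conn) Close() { *c.log = append(*c.log, "close "+c.name) }

// planGroups wraps each iteration in an immediately-invoked function so
// the deferred Close fires at the end of that group, not at function exit.
func planGroups(groups []string) []string {
	var log []string
	for _, g := range groups {
		func() {
			c := open(g, &log)
			defer c.Close()
			log = append(log, "plan "+g)
		}()
	}
	return log
}

func main() {
	for _, line := range planGroups([]string{"digitalocean", "mock"}) {
		fmt.Println(line)
	}
}
```

With a bare defer in the loop body, both closes would run only after
the last group was planned; here each provider is closed before the
next one is opened.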

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
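
The emission rule above reduces to a small decision function. The
sketch below uses trimmed-down stand-in types: the NeedsReplace,
NeedsUpdate, and ForceNew names come from this commit, while the
Changes field name and the classify helper are assumptions.

```go
package main

import "fmt"

// fieldChange and diffResult are minimal stand-ins for the real types.
type fieldChange struct {
	Path     string
	ForceNew bool
}

type diffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []fieldChange // field name assumed
}

// classify maps a Diff result to the action to emit.
func classify(d *diffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			// ForceNew alone is enough to trigger a replace.
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	fmt.Println(classify(&diffResult{NeedsReplace: true}))
	fmt.Println(classify(&diffResult{
		NeedsUpdate: true,
		Changes:     []fieldChange{{Path: "region", ForceNew: true}},
	}))
	fmt.Println(classify(&diffResult{NeedsUpdate: true}))
	fmt.Println(classify(&diffResult{}))
}
```

The second case is the one that was downgraded to update before W-3b:
NeedsUpdate is set, but a ForceNew field change takes precedence.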

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).
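
The consult-then-populate flow can be sketched with an in-memory map.
The key fields follow the tuple named above; the cache type, the
string result, and classifyCached are hypothetical simplifications of
the diffcache Get/Put interface.

```go
package main

import "fmt"

// key mirrors the cache tuple described in this commit.
type key struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// cache is a hypothetical in-memory stand-in; values are simplified
// to a string where the real cache stores a DiffResult.
type cache map[key]string

// classifyCached returns the cached result on a hit; on a miss it
// calls diff, stores the result, and returns it.
func classifyCached(c cache, k key, diff func() string) string {
	if v, ok := c[k]; ok {
		return v
	}
	v := diff()
	c[k] = v
	return v
}

func main() {
	calls := 0
	diff := func() string { calls++; return "update" }
	c := cache{}
	k := key{PluginVersion: "p1", Type: "vpc", ProviderID: "id-1", SHAConfig: "a", SHAOutputs: "b"}

	classifyCached(c, k, diff)
	classifyCached(c, k, diff) // same inputs: served from cache
	fmt.Println("calls after repeat:", calls)

	k.SHAConfig = "changed"
	classifyCached(c, k, diff) // different config hash: re-dispatches
	fmt.Println("calls after config change:", calls)
}
```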

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
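
A pure clamping function of the kind Note 1 describes might look like
this. It is a sketch under stated assumptions: clampConcurrency is a
hypothetical name, and mapping out-of-range values to the nearest
bound (rather than back to the default) is a guess based on the
"clamped 1..32" wording, not confirmed by the source.

```go
package main

import (
	"fmt"
	"strconv"
)

// clampConcurrency parses an env-var value into a worker count.
// Assumptions: empty or non-numeric input yields the default of 8;
// numeric input is clamped into the range 1..32.
func clampConcurrency(raw string) int {
	const def, lo, hi = 8, 1, 32
	if raw == "" {
		return def
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return def
	}
	if n < lo {
		return lo
	}
	if n > hi {
		return hi
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, clampConcurrency(raw))
	}
}
```

Taking the raw string as a parameter, instead of calling os.Getenv
inside, is what makes the table-driven test possible.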

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

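Replace with sha256(name + "\x00" + version): the digest is
fixed-length, and the NUL separator is unambiguous as long as neither
field contains a NUL byte (U+0000 is a valid code point, so this relies
on provider names and versions never carrying one). Matches how
configHash already keys per-config inputs.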
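
The collision and the fix can be demonstrated in a few lines. joinedKey
and digestKey are hypothetical names, and hex-encoding the digest is an
assumption about the stored form of the real pluginVersionKey.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// joinedKey is the fragile composition: the separator can also appear
// inside either field.
func joinedKey(name, version string) string {
	return name + "@" + version
}

// digestKey hashes name and version with a NUL separator.
// Assumption: the digest is hex-encoded for use as a key.
func digestKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(joinedKey("foo", "bar@1.0") == joinedKey("foo@bar", "1.0")) // true
	fmt.Println(digestKey("foo", "bar@1.0") == digestKey("foo@bar", "1.0")) // false
	fmt.Println(len(digestKey("foo", "1.0")))                               // 64
}
```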

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.
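
The optional-interface dispatch can be sketched as below. The names
mirror those introduced in this commit, but they are lowercased here
because the signatures and the "v1" constant are assumptions rather
than the contents of iac/wfctlhelpers/dispatch.go.

```go
package main

import "fmt"

// computePlanVersionDeclarer is the optional interface a provider may
// implement to declare its dispatch version.
type computePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const (
	dispatchVersionV1 = "v1" // assumed name for the legacy default
	dispatchVersionV2 = "v2"
)

// dispatchVersionFor returns "v2" only when the provider implements the
// interface and declares exactly "v2"; everything else is legacy.
func dispatchVersionFor(p any) string {
	if d, ok := p.(computePlanVersionDeclarer); ok && d.ComputePlanVersion() == dispatchVersionV2 {
		return dispatchVersionV2
	}
	return dispatchVersionV1
}

// bareProvider does not implement the interface at all.
type bareProvider struct{}

// declaringProvider implements it and returns whatever it was given.
type declaringProvider struct{ version string }

func (d declaringProvider) ComputePlanVersion() string { return d.version }

func main() {
	fmt.Println(dispatchVersionFor(bareProvider{}))                    // v1
	fmt.Println(dispatchVersionFor(declaringProvider{version: ""}))    // v1
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v2"}))  // v2
	fmt.Println(dispatchVersionFor(declaringProvider{version: "v3"}))  // v1
}
```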

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy), and
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer substring). v1 fixture v1RecordingProvider
  intentionally does NOT implement ComputePlanVersionDeclarer to prove
  the dispatcher's "default to v1 when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.

Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.
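
The gate change in item 1 can be illustrated with a simplified apply
epilogue. All names here are hypothetical stand-ins (the real helper
is printDriftReportIfAny, whose signature is not shown in this
commit), and the report format is invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// applyResult is a stand-in carrying only the drift report.
type applyResult struct{ InputDriftReport []string }

var errApply = errors.New("apply failed")

// printDriftIfAny writes nothing for nil results or empty reports.
func printDriftIfAny(w io.Writer, r *applyResult) {
	if r == nil || len(r.InputDriftReport) == 0 {
		return
	}
	fmt.Fprintf(w, "stale inputs: %s\n", strings.Join(r.InputDriftReport, ", "))
}

// finish gates the report on result != nil, so a partial failure
// (non-nil result plus an error) still prints it.
func finish(result *applyResult, err error) (string, error) {
	var out strings.Builder
	if result != nil {
		printDriftIfAny(&out, result)
	}
	return out.String(), err
}

func main() {
	out, err := finish(&applyResult{InputDriftReport: []string{"DB_URL"}}, errApply)
	fmt.Printf("%q %v\n", out, err)
}
```

Gating on err == nil instead would return an empty string in this
case, which is the disconnect the reviewer pointed out.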

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
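
A converter of the kind described might look like the sketch below. It
follows the behavior stated in this commit (widen values to any,
return nil for empty or nil input) but is not copied from
cmd/wfctl/deploy_providers.go.

```go
package main

import "fmt"

// sensitiveToAny widens a map[string]bool into the map[string]any
// shape that struct conversion accepts. Empty or nil input yields nil.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(nil) == nil)               // true
	fmt.Println(sensitiveToAny(map[string]bool{}) == nil) // true
	fmt.Println(sensitiveToAny(map[string]bool{"password": true}))
}
```

The conversion is needed because Go does not treat map[string]bool as
assignable to map[string]any, and the struct converter only recognizes
the latter when it walks nested values.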

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. The precedent the plan cites (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache
  (process-scoped via getDiffCache lazy init) cannot pick up stale
  entries from a developer's local ~/.cache/wfctl/diff/ and report
  false-positive cache hits that skip driver Diff dispatch. Same
  pattern as platform/main_test.go from T3.6f. Caught during dev
  when the end-to-end test failed in the full cmd/wfctl test run
  but passed in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when the team-lead flips
  paths mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* feat(iac): --allow-replace flag for per-resource protected-replace opt-in

W-6/T6.1: gate replace and delete actions targeting `protected: true`
resources behind a per-resource opt-in flag at apply time. Without
--allow-replace=<csv>, the apply errors before any provider Apply or
wfctlhelpers.ApplyPlan dispatch with the design-spec literal
("resource %q is protected: true and would be %sd; pass
--allow-replace=%s to override"). With the resource name listed in
--allow-replace, the protection is bypassed for that resource only.

Gate fires on both dispatch paths — live-diff (applyWithProviderAndStore)
and --plan (applyPrecomputedPlanWithStore) — so the safety guarantee
holds regardless of plan provenance. The protected flag is sourced from
Resource.Config for replace actions and Current.AppliedConfig for delete
actions (where platform.differ leaves Resource.Config empty).

The allow-set is published via package-level applyAllowReplaceSet
(matching the computeInfraPlan / applyV2ApplyPlanFn seam pattern) and
reset to nil at the top of every runInfraApply via deferred cleanup —
override authorization must not leak across runs.

T6.2 will swap this fail-fast for an aggregated multi-blocker report
with a copy-paste --allow-replace=name1,name2,... value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply batch-reports protected-replace blockers with copy-paste flag

W-6/T6.2: validateAllowReplaceProtected now walks the entire plan and
aggregates ALL replace/delete blockers (resources annotated
`protected: true` and not in --allow-replace) into a single error,
instead of failing fast on the first one. The operator sees the
complete blocker set in one apply attempt and gets a pre-formatted
copy-paste flag value to authorize them all at once:

  plan would require destructive action on N protected resource(s):
    <name1> (replace)
    <name2> (delete)
    ...
  to authorize, re-run with:
    --allow-replace=<name1>,<name2>,...

Names and the csv preserve plan-action declaration order so output is
deterministic. The single-blocker case still emits the batch format —
operator-facing UX is consistent regardless of blocker count, which
matters for automation pinning the copy-paste flag pattern.

Per plan T6.2 "(or apply-time check; pick one — apply is cleaner since
plan output already shows all actions)" — the gate stays in
cmd/wfctl/infra_apply.go rather than platform/differ.go::ComputePlan.
ComputePlan remains plugin-agnostic; the protected-resource policy is
a wfctl-side operator-experience concern.

T6.1's single-line error literal is superseded; T6.1 tests are
updated to assert on the operator-facing essentials (resource name +
copy-paste flag value) rather than the legacy literal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document --allow-replace flag

W-6/T6.4: add a dedicated `infra apply` subsection to docs/WFCTL.md
covering the protected-resource gate, the new --allow-replace=<csv>
override, and its relation to the older --allow-protected-prune flag.
Includes the canonical aggregated-blocker error format from T6.2 so
operators know what to expect (and what to copy-paste) when the gate
fires, plus three runnable examples (standard apply, --plan apply,
authorized Replace cascade).

Per W-4 team-lead Option-3, mdformat is waived; markdown-link-check
is the meaningful baseline. WFCTL.md links all resolve clean against
the local repo (3 internal/external refs). Pre-existing dead links
elsewhere in docs/ are unchanged by this commit and out of W-6 scope.

Verification:
  markdown-link-check docs/WFCTL.md → 0 errors
  GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \
    ./platform/... ./cmd/wfctl/... ./module/... → all pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(merge): restore T6.1 + T6.2 helpers lost during cascade-merge with -X theirs

* fix(iac): R1 review — drop redundant ComputePlanVersionDeclarer assertion at apply call site (Copilot review)

DispatchVersionFor is documented to centralise the type-assertion plus
the default-to-v1 fallback so call sites pass the raw provider value
rather than re-asserting the optional interface. The v2 dispatch
condition reverts to the canonical form:

    if wfctlhelpers.DispatchVersionFor(provider) == wfctlhelpers.DispatchVersionV2 { ... }

No behavior change: a provider that doesn't implement the interface,
or returns anything other than "v2", still routes to the legacy v1
provider.Apply path.
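
The centralised assertion plus default-to-v1 fallback can be sketched as follows. This is an illustration only: the interface method name `ComputePlanVersion`, the lowercase identifiers, and the constant values are assumptions, not the real wfctlhelpers API.

```go
package main

import "fmt"

// versionDeclarer is a hypothetical stand-in for the optional
// ComputePlanVersionDeclarer interface; the method name is assumed.
type versionDeclarer interface {
	ComputePlanVersion() string
}

const (
	dispatchV1 = "v1"
	dispatchV2 = "v2"
)

// dispatchVersionFor takes the raw provider value, so call sites never
// re-assert the optional interface themselves. Anything that does not
// declare exactly "v2" falls back to v1.
func dispatchVersionFor(provider any) string {
	if d, ok := provider.(versionDeclarer); ok && d.ComputePlanVersion() == dispatchV2 {
		return dispatchV2
	}
	return dispatchV1
}

type v2Provider struct{}

func (v2Provider) ComputePlanVersion() string { return "v2" }

type oddProvider struct{}

func (oddProvider) ComputePlanVersion() string { return "v3-experimental" }

type legacyProvider struct{}

func main() {
	fmt.Println(dispatchVersionFor(v2Provider{}))
	fmt.Println(dispatchVersionFor(oddProvider{}))
	fmt.Println(dispatchVersionFor(legacyProvider{}))
}
```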

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 4, 2026
…9) (#534)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).
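
The activation rules above reduce to a small predicate. The sketch below assumes a hypothetical helper name, `shouldRefreshOutputs`, and takes the env value as a parameter for testability; the real gating code in runInfraApply may be structured differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// shouldRefreshOutputs reports whether the pre-step runs.
// envValue is the raw value of WFCTL_REFRESH_OUTPUTS ("" when unset);
// skipRefresh mirrors the --skip-refresh flag.
func shouldRefreshOutputs(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag wins over any env value
	}
	enabled, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // unset, empty, or unrecognised: default OFF
	}
	return enabled
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		fmt.Printf("%q -> %v\n", v, shouldRefreshOutputs(v, false))
	}
	fmt.Printf("%q with --skip-refresh -> %v\n", "1", shouldRefreshOutputs("1", true))
}
```

Note that strconv.ParseBool does not accept "yes" or "on", so those values fall into the unrecognised, no-op case.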

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver sleeps 5ms per Read so the bounded pool actually
queues at the cap (100 reads × 5ms / 8 workers ≈ 63ms in total; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
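
The first two findings can be illustrated with a sketch like the one below. It assumes a plain byte slice in place of the //go:embed data and uses json.Unmarshal as a stand-in for the real jsonschema compile step; all identifiers are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// schemaJSON stands in for the //go:embed slice (assumption for this sketch).
var schemaJSON = []byte(`{"type":"object"}`)

var (
	schemaOnce sync.Once
	schemaVal  map[string]any
	schemaErr  error
)

// compiledSchema parses the schema exactly once. Both the value and the
// error are captured, so concurrent callers all observe the same outcome.
func compiledSchema() (map[string]any, error) {
	schemaOnce.Do(func() {
		schemaErr = json.Unmarshal(schemaJSON, &schemaVal)
	})
	return schemaVal, schemaErr
}

// SchemaJSON returns a copy so callers cannot mutate the shared bytes.
func SchemaJSON() []byte {
	return bytes.Clone(schemaJSON)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := compiledSchema(); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()

	c := SchemaJSON()
	c[0] = 'X' // only the copy changes
	fmt.Println(string(schemaJSON))
}
```

Running a loop like the one in `main` under `go test -race` is the kind of check the 32-goroutine stress test is described as providing.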

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.

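The omitempty contract from the last bullet can be demonstrated with a small sketch. The JSON keys are the ones quoted above; the Go field types are assumptions chosen for illustration and may not match the real ApplyResult definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyResult is a hypothetical subset of ApplyResult; field types are assumed.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	ReplaceIDMap         map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r and panics on error; these types always encode cleanly.
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// nil values and empty-but-non-nil values are both omitted.
	fmt.Println(encode(applyResult{}))
	fmt.Println(encode(applyResult{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []string{},
		ReplaceIDMap:         map[string]string{},
	}))
	// A populated map is keyed by the replaced resource's name.
	fmt.Println(encode(applyResult{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
}
```

encoding/json treats zero-length maps and slices as empty for omitempty purposes, which is why the nil and empty-but-non-nil cases encode identically.
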
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
  inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
  fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
  imports. Future Compute-algorithm changes (prefix length, hash) now
  propagate to both test files automatically, which keeps the
  cross-package fixtures in parity.

T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
  TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
  + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
  of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
  assertion — distinct code path from the existing
  ok-but-SupportsUpsert==false test. Compile-time premise check
  ensures the test stays meaningful if a future refactor lifts
  SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
  the in-package vs at-the-ActionError-boundary distinction explicit.
  External callers reading [interfaces.ApplyResult].Errors lose
  errors.Is matching at the string-conversion boundary; the canonical
  "upsert: read after conflict:" prefix is the discriminant. Also
  documents the single-pass recovery contract (recovery Update that
  itself returns ErrResourceAlreadyExists surfaces unchanged rather
  than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
  now also asserts len(result.Resources) == 1 on the success path —
  locks the resource-append contract so a regression that skipped the
  append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
  TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
  shape: empty ProviderID flows to driver, no synthesized precondition
  error, deleteCount==1 (latent bug-fix from design — the v1 path
  silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
  section documenting the decompose-then-prefix rule (bare on simple
  actions; "upsert: ..." / "replace: ..." on decomposing paths) so
  future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.

Failure shape:
  return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.
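
A minimal sketch of the guard described above, not the actual doReplace body. The `driver` interface, the `replace` helper, and the "replace: delete:" / "replace: create:" prefixes are assumptions made for illustration; only the "replace: canceled after delete:" prefix and the `%w` wrap come from this commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// driver is a hypothetical, trimmed-down stand-in for the real driver interface.
type driver interface {
	Delete(ctx context.Context, providerID string) error
	Create(ctx context.Context, name string) (string, error)
}

// replace runs Delete then Create, checking ctx between the two calls.
// Only the "replace: canceled after delete:" prefix comes from the commit
// message; the other prefixes are illustrative.
func replace(ctx context.Context, d driver, oldID, name string) (string, error) {
	if err := d.Delete(ctx, oldID); err != nil {
		return "", fmt.Errorf("replace: delete: %w", err)
	}
	// Guard: stop here if cancellation became observable during Delete.
	if err := ctx.Err(); err != nil {
		return "", fmt.Errorf("replace: canceled after delete: %w", err)
	}
	newID, err := d.Create(ctx, name)
	if err != nil {
		return "", fmt.Errorf("replace: create: %w", err)
	}
	return newID, nil
}

// cancelOnDelete cancels the captured context as a side effect of Delete,
// simulating a signal that arrives exactly between the two driver calls.
type cancelOnDelete struct {
	cancel           context.CancelFunc
	deletes, creates int
}

func (c *cancelOnDelete) Delete(context.Context, string) error {
	c.deletes++
	c.cancel()
	return nil
}

func (c *cancelOnDelete) Create(context.Context, string) (string, error) {
	c.creates++
	return "new-id", nil
}

// runCanceledReplace returns the call counts, the error text, and whether
// errors.Is still matches context.Canceled through the wrap.
func runCanceledReplace() (deletes, creates int, msg string, isCanceled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	d := &cancelOnDelete{cancel: cancel}
	_, err := replace(ctx, d, "old-id", "vpc")
	if err != nil {
		msg = err.Error()
	}
	return d.deletes, d.creates, msg, errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println(runCanceledReplace())
}
```

Because the error is wrapped with `%w`, in-package callers can still match the sentinel with errors.Is, while callers that only see the string form can match on the prefix.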

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.

Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
   as an amortization-only optimization (not correctness mechanism),
   the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
   filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
   the corruption recovery contract (silent eviction + once-per-process
   info log), the plugin-downgrade safety property, and the rev3
   "all CI workflows set :memory: explicitly" statement plus a list
   of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
   every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml          (test + lint jobs)
   - .github/workflows/benchmark.yml   (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml     (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified
   (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
   osv-scanner.yml, test-dispatch.yml).

Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now
  uses write-temp-then-rename. POSIX rename(2) is atomic on the same
  filesystem, so a process crash mid-write leaves either the prior
  contents or the new contents — never a partial write. The
  corruption-recovery path in Get is still the safety net for cross-
  filesystem renames or NFS edge cases that don't honor atomicity.
  In production this means corruption recovery essentially never
  fires from native crashes. The .json extension filter in
  maybeEvict already excludes .tmp orphans, so no additional
  filtering needed. On rename failure, best-effort cleanup of the
  temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
  language. Linux honors XDG_CACHE_HOME; macOS uses
  ~/Library/Caches; Windows uses %LocalAppData%. The previous
  comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
  explaining the tags are for log/transcript serialization, not
  cache keying — keyFingerprint uses NUL-separated string concat,
  not JSON marshaling. Future readers checking the fingerprint
  shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
  `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
  end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
  meaningless — no code path creates a file with `*` in its name.
  Likely leftover from earlier debugging. Removing it lets us drop
  the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
  TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
  resolution assumption and listing the supported filesystems
  (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
  filesystems (FAT32, SMB) are explicitly out of scope.
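
The write-temp-then-rename pattern from Minor 1 can be sketched as follows. This is an illustration under assumptions, not the diffcache Put implementation: `atomicWrite`, `writeTwice`, and the temp-file naming pattern are hypothetical, chosen so that orphans end in `.tmp` and would be skipped by a `.json` extension filter.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temp file in the destination directory and
// renames it over path. Creating the temp file in the same directory keeps
// the rename on one filesystem, where rename is atomic on POSIX systems.
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpName) // best-effort cleanup
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpName) // best-effort cleanup
		return err
	}
	if err := os.Rename(tmpName, path); err != nil {
		os.Remove(tmpName) // best-effort cleanup on rename failure
		return err
	}
	return nil
}

// writeTwice overwrites one entry twice and reports the final contents plus
// the number of leftover *.tmp files in the directory.
func writeTwice() (string, int, error) {
	dir, err := os.MkdirTemp("", "diffcache-sketch")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "entry.json")
	for _, body := range []string{"first", "second"} {
		if err := atomicWrite(path, []byte(body)); err != nil {
			return "", 0, err
		}
	}
	got, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	orphans, err := filepath.Glob(filepath.Join(dir, "*.tmp"))
	if err != nil {
		return "", 0, err
	}
	return string(got), len(orphans), nil
}

func main() {
	fmt.Println(writeTwice())
}
```

A reader that opens `entry.json` at any point sees either the old or the new contents; the temp file only becomes visible under the final name once it is complete.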

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
  concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
  in the package godoc already sets the expectation."

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.

Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).
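
The per-group closer behaviour can be illustrated with a small sketch. `loadProvider` and `planAllGroups` are hypothetical stand-ins, not the real loader or computePlanForInfraSpecs; the point is only that a `defer` inside an immediately-invoked closure runs at the end of each iteration rather than at function exit.

```go
package main

import (
	"fmt"
	"strings"
)

// loadProvider is a hypothetical loader. It records that a connection was
// opened and returns a closer that records when it is released.
func loadProvider(name string, events *[]string) (closer func()) {
	*events = append(*events, "open "+name)
	return func() { *events = append(*events, "close "+name) }
}

// planAllGroups plans each provider group inside an immediately-invoked
// function so the deferred closer fires before the next group is loaded.
func planAllGroups(groups []string) string {
	var events []string
	for _, g := range groups {
		func() {
			closer := loadProvider(g, &events)
			defer closer() // runs when this closure returns, not at function exit
			events = append(events, "plan "+g)
		}()
	}
	return strings.Join(events, " | ")
}

func main() {
	fmt.Println(planAllGroups([]string{"do", "mock"}))
}
```

With a plain `defer` in the loop body instead, both "close" events would be recorded only after the last group was planned.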

Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
  in first-reference-in-`desired` order, not iac.provider declaration
  order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
  (loop over groupOrder is empty when no specs reference any provider;
  use `wfctl infra destroy --dry-run` to preview teardown).

* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:

  - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
    is true (the latter closes design issue C — pre-W-3b ForceNew was
    silently downgraded to update);
  - update,  when DiffResult.NeedsUpdate is true and replace did not
    fire;
  - skip,    when neither flag is set.

Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
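
The emission rules above reduce to a small decision function. The sketch below is an assumption-laden illustration: the `classify` helper and the `Changes` field name are hypothetical, while NeedsReplace, NeedsUpdate, and ForceNew are the flags named in this commit. A nil result is treated as "skip", in line with the nil-DiffResult test added in T3.6f.

```go
package main

import "fmt"

// FieldChange and DiffResult are trimmed-down stand-ins for the real types.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange // field name is an assumption
}

// classify maps a DiffResult to the action ComputePlan would emit for an
// existing resource: replace wins over update, and update over skip.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			// ForceNew alone is enough; it is no longer downgraded to update.
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	forceNew := &DiffResult{
		NeedsUpdate: true,
		Changes:     []FieldChange{{Path: "region", ForceNew: true}},
	}
	fmt.Println(classify(forceNew))
	fmt.Println(classify(&DiffResult{NeedsUpdate: true}))
	fmt.Println(classify(&DiffResult{}))
}
```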

Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).

Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):

  - TestComputePlan_NeedsReplaceEmitsReplaceAction
  - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
  - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
  - TestComputePlan_DiffReturnsNoChanges_EmitsNothing
  - TestComputePlan_NilProvider_FallsBackToConfigHash
  - TestComputePlan_NilDriver_FallsBackToConfigHash
  - TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
  fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
  by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
  phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
  rely on the post-adopt ComputePlan emitting update; pre-W-3b that
  was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
  panicking — the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).
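
The get-before-dispatch flow can be sketched with an in-memory map. Everything below is a simplified assumption rather than the iac/diffcache API: `memCache`, `diffWithCache`, and `countDispatches` are hypothetical, and only the five key fields mirror the tuple named above.

```go
package main

import (
	"fmt"
	"sync"
)

// Key mirrors the cache-key tuple named in the commit message.
type Key struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// DiffResult is a trimmed-down stand-in for the real type.
type DiffResult struct {
	NeedsUpdate, NeedsReplace bool
}

// memCache is a hypothetical in-memory backend guarded by a mutex.
type memCache struct {
	mu      sync.Mutex
	entries map[Key]DiffResult
}

func newMemCache() *memCache {
	return &memCache{entries: make(map[Key]DiffResult)}
}

func (c *memCache) Get(k Key) (DiffResult, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	r, ok := c.entries[k]
	return r, ok
}

func (c *memCache) Put(k Key, r DiffResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[k] = r
}

// diffWithCache returns the cached result on a hit; on a miss it calls diff
// and stores the fresh result. Errors are never cached.
func diffWithCache(c *memCache, k Key, diff func() (DiffResult, error)) (DiffResult, error) {
	if r, ok := c.Get(k); ok {
		return r, nil
	}
	r, err := diff()
	if err != nil {
		return DiffResult{}, err
	}
	c.Put(k, r)
	return r, nil
}

// countDispatches reports how many times the driver-side diff actually ran.
func countDispatches(keys []Key) int {
	c := newMemCache()
	n := 0
	for _, k := range keys {
		_, err := diffWithCache(c, k, func() (DiffResult, error) {
			n++
			return DiffResult{NeedsUpdate: true}, nil
		})
		if err != nil {
			return -1
		}
	}
	return n
}

func main() {
	a := Key{PluginVersion: "pv", Type: "vpc", ProviderID: "id-1", SHAConfig: "c1", SHAOutputs: "o1"}
	b := a
	b.SHAConfig = "c2"
	fmt.Println(countDispatches([]Key{a, a}), countDispatches([]Key{a, b}))
}
```

A miss only costs one extra Diff call, which is why correctness does not depend on the cache being warm.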

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):

- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
  pure function; new TestParseConcurrencyEnv table-driven test covers
  empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
  TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
  N=5 modification candidates, asserts driver.diffCount.Load() == 5
  and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
  TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
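
A pure env-parsing helper of the kind Note 1 describes might look like the sketch below. The signature and the exact policy are assumptions: this version falls back to the default of 8 for empty or non-numeric input and clamps out-of-range numbers to the nearest bound of 1..32. The real parseConcurrencyEnv may treat "0" or negative values differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	defaultDiffConcurrency = 8  // default worker-pool size per the commit message
	minDiffConcurrency     = 1  // lower clamp bound
	maxDiffConcurrency     = 32 // upper clamp bound
)

// parseConcurrencyEnv converts the raw env-var value into a worker count.
// Assumed policy: unparsable input yields the default; numeric input is
// clamped into [minDiffConcurrency, maxDiffConcurrency].
func parseConcurrencyEnv(raw string) int {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return defaultDiffConcurrency
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultDiffConcurrency
	}
	if n < minDiffConcurrency {
		return minDiffConcurrency
	}
	if n > maxDiffConcurrency {
		return maxDiffConcurrency
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "0", "1", "8", "32", "33", "100", "-5"} {
		fmt.Printf("%q -> %d\n", raw, parseConcurrencyEnv(raw))
	}
}
```

Keeping the function free of os.Getenv calls is what makes the table-driven test possible: the caller reads the variable once and passes the string in.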

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
  errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
  the ResourceDriver instances it returns MUST be safe for concurrent
  use.

New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
  inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
  forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
  re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.
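
The channel-gating idea can be sketched with the standard library alone. This is not the test from the commit: `dispatch` uses a buffered-channel semaphore where the real code uses errgroup, `twoInFlight` is a hypothetical helper, and the timeout is shortened. Each gated call announces itself on `entered` and then blocks on `release`, so a second announcement can only arrive if two calls are in flight at once.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// dispatch runs fn for n items with at most limit goroutines in flight.
func dispatch(n, limit int, fn func(i int)) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			fn(i)
		}(i)
	}
	wg.Wait()
}

// twoInFlight reports whether at least two gated calls were simultaneously
// in flight before the timeout. A serialized dispatcher (limit 1) cannot
// deliver the second "entered" signal while the first call is still blocked.
func twoInFlight(limit, timeoutMillis int) bool {
	const n = 3
	entered := make(chan struct{}, n) // buffered so gated calls never block on send
	release := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		dispatch(n, limit, func(int) {
			entered <- struct{}{}
			<-release
		})
	}()

	ok := true
	timer := time.After(time.Duration(timeoutMillis) * time.Millisecond)
	for seen := 0; seen < 2 && ok; seen++ {
		select {
		case <-entered:
		case <-timer:
			ok = false
		}
	}
	close(release) // unblock every gated call so dispatch can finish
	<-done
	return ok
}

func main() {
	fmt.Println(twoInFlight(8, 2000), twoInFlight(1, 100))
}
```

A count-only assertion would pass in both cases, since all three calls eventually run; only the gate distinguishes parallel from serial dispatch.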

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): the digest is
fixed-length, and NUL is not expected to appear in a provider name or
version string, so the separator is unambiguous.
Matches how configHash already keys per-config inputs.
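
The collision and the fix can be reproduced in a few lines. This is a sketch: the real pluginVersionKey takes a provider value and handles nil, whereas the hypothetical helpers here take plain strings, and the hex encoding of the digest is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey is the old composition: ambiguous when either field contains "@".
func naiveKey(name, version string) string {
	return name + "@" + version
}

// pluginVersionKey hashes name and version joined by a NUL byte, which is
// not expected to occur in either field.
func pluginVersionKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two different (name, version) pairs from the commit message.
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))
	fmt.Println(pluginVersionKey("foo", "bar@1.0") == pluginVersionKey("foo@bar", "1.0"))
}
```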

Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
  interface + DispatchVersionV2 const + DispatchVersionFor helper.
  Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
  ProviderVersion fields and ComputePlanVersion() method so tests
  drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
  iacProvider.computePlanVersion alongside existing
  capabilities.iacProvider.name; findIaCPluginDir returns the
  version; readIaCPluginComputePlanVersion is the load-time helper;
  remoteIaCProvider stores the value and exposes it via
  ComputePlanVersion() to satisfy the optional interface. (Re-reads
  plugin.json once per provider load rather than threading through
  loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
  for the existing test override; cost is one tiny os.ReadFile vs
  the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
  test seam + dispatch branch in applyWithProviderAndStore. Drift
  report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests:
  TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
  routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
  (v1/un-declared routes legacy), and
  TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring
  asserted via writer-buffer substring). v1 fixture
  v1RecordingProvider intentionally does NOT implement
  ComputePlanVersionDeclarer to prove the dispatcher's "default to v1
  when un-declared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised
   "Run on success or partial failure" but the code gated on
   `err == nil` (success only). The contract the comment described
   is the more useful behavior — operators most need the
   stale-input diagnostic when an apply fails ("which input went
   stale during the failed apply?"). Without it, the failure error
   and the "what changed" context are disconnected.

   Fix: gate on `result != nil` instead of `err == nil`.
   printDriftReportIfAny already no-ops on empty/nil reports so
   unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added
   TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
   which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
   and asserts both: (a) the err propagates, AND (b) the drift
   report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1"
   paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer
     at all → type-assert fails → legacy. Covered by
     v1RecordingProvider.
   - Path B: provider implements interface but ComputePlanVersion()
     returns "" (the realistic mid-transition state for v1 plugins
     after the SDK update lands but before they migrate) → type-
     assert succeeds, DispatchVersionFor returns "v1" → legacy.
     Was untested.

   Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
   using iactest.NoopProvider{DispatchVersion: ""}, which always
   implements the interface (the method exists on the type). Pins
   Path B specifically.
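
Path A and Path B can be illustrated with a sketch of the optional-interface dispatch. The interface and helper names are taken from this PR, but their signatures, the `DispatchVersionV1` constant, and the two fixture types are assumptions for illustration only.

```go
package main

import "fmt"

// ComputePlanVersionDeclarer is the optional interface a provider may implement.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const (
	DispatchVersionV1 = "v1" // assumption: the PR names only DispatchVersionV2
	DispatchVersionV2 = "v2"
)

// DispatchVersionFor returns "v2" only when the provider implements the
// interface AND declares exactly "v2"; everything else is legacy.
func DispatchVersionFor(p any) string {
	if d, ok := p.(ComputePlanVersionDeclarer); ok && d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return DispatchVersionV1
}

// legacyProvider does not implement the interface at all (Path A).
type legacyProvider struct{}

// declaringProvider implements the interface; an empty version is Path B.
type declaringProvider struct{ version string }

func (d declaringProvider) ComputePlanVersion() string { return d.version }

func main() {
	fmt.Println(DispatchVersionFor(legacyProvider{}))                 // Path A
	fmt.Println(DispatchVersionFor(declaringProvider{}))              // Path B
	fmt.Println(DispatchVersionFor(declaringProvider{version: "v2"})) // v2 route
}
```

Both legacy paths end at the same answer but exercise different branches of the type assertion, which is why each gets its own test.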

Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.

* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
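
A converter of the shape described above could look like this sketch. It is standard-library only and does not call structpb; the body is an assumption based on the description (widen map[string]bool to map[string]any, return nil for empty or nil input), not the code from the fix commit.

```go
package main

import "fmt"

// sensitiveToAny widens a map[string]bool into the map[string]any shape
// that generic struct encoders accept. Empty or nil input yields nil so
// the caller can omit the field entirely.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(sensitiveToAny(nil) == nil)
	fmt.Println(sensitiveToAny(map[string]bool{"token": true}))
}
```

The conversion is needed because Go does not treat map[string]bool as a map[string]any, even though every bool value satisfies `any`; the element types must match exactly.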

Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.

# ADR 007 — non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:

  1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
     test-helper package, not a runnable binary.
  2. Real-gRPC runtime validation lands in P-DO when DO sets
     computePlanVersion: v2 in its plugin.json.
  3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
     vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
  4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc
    region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider
    that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply
    entrypoint) and asserts driver.diffCount == 1, deleteCount ==
    1, createCount == 1, plus exact identity of the deleted
    ProviderID and the created Config["region"].

- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution
    returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block
    reaches the operator (drift-report wiring T3.7 added is
    end-to-end alive in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: NEW TestMain forces
  WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
  scoped via getDiffCache lazy init) doesn't observe stale entries
  from a developer's local ~/.cache/wfctl/diff/ as false-positive
  cache hits skipping driver Diff dispatch. Same pattern as
  platform/main_test.go from T3.6f. Caught during dev when the
  end-to-end test failed in the full cmd/wfctl test run but passed
  in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
   converter (commit 40e07a1; surfaced during T3.9 runtime
   validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.

This commit:

- Status header updated to "Accepted (with extensive deliberation
  history — see Decision history section)".
- Context section adjusted to preface the deliberation history
  rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
  verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
  flips mid-execution, reviewer + implementer should refuse to
  proceed and force explicit disambiguation. Both reviewers
  endorsed this hold during transition 4; the strict-interpretation
  invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."

* feat(iac): add optional ProviderPlanner interface for v2 plugins (rev10 user override)

* ci(iac): cross-plugin build gate + ADR 009 (ProviderPlanner included per user override)

* docs(iac): document ProviderPlanner adapter author guide

* docs(adr): restore plan-literal Context para 1 in ADR 009 (T9.2 spec-review fix)

* docs(iac): point ProviderPlanner author guide at real ProviderIDValidator precedent (T9.3 quality fix)

* ci(iac): add fail-fast=false, concurrency, go.mod/go.sum paths to cross-plugin gate (T9.2 quality fix)

* fix(iac): R2 review — correct ProviderPlanner doc/ADR/test/CI findings (Copilot review)

Six Copilot inline findings + CodeQL workflow-permissions warning:

1. docs/iac/providerplanner.md: ComputePlan in v0.21.0 dispatches
   driver.Diff directly (in platform/differ.go); it does NOT call
   IaCProvider.Plan. The reverse is true (Plan delegates to ComputePlan
   in some implementations). Updated the call-chain description and
   the illustrative dispatch-site code block to reference the actual
   file (platform/differ.go) so adapter authors don't follow the wrong
   call chain.
2. docs/adr/009: replaced the personal email reference with "the
   workspace owner" so ADR provenance doesn't embed PII.
3. interfaces/iac_provider_planner_test.go: now actually verifies the
   additivity claim by reusing the package's existing mockProvider as
   the negative case — runtime assertion confirms mockProvider does
   NOT satisfy ProviderPlanner. Moved file to interfaces_test package
   to share fixtures.
4. .github/workflows/cross-plugin-build-test.yml: explicit `permissions:
   contents: read` (CodeQL workflow-permissions guidance); added
   `env: GOPRIVATE/GONOSUMCHECK` matching ci.yml + codeql.yml so
   downstream plugin builds resolve github.com/GoCodeAlone/* deps
   consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>