feat(iac): IaCPlan schema + plan-stale diagnostic (W-1 of 12) #523
Merged
…olvedConfigHash + DriftEntry type
…der with preservation sentinel
…ComputeDrift sentinel-honoring
Pull request overview
Adds plan-format schema fields and input-fingerprint drift diagnostics to strengthen IaC plan/apply conformance, especially for the persisted `wfctl infra apply --plan` path.
Changes:
- Extend `interfaces.IaCPlan`/`interfaces.PlanAction` with `SchemaVersion`, `InputSnapshot`, `ResolvedConfigHash`, and a shared `DriftEntry` type.
- Introduce `iac/inputsnapshot` to compute env-var fingerprints, drift reports, and canonical "plan stale" diagnostics (including a typed sentinel error).
- Wire snapshot capture into `wfctl infra plan` and drift checking into `wfctl infra apply --plan`, and add a heuristic warning when `plan.json` isn't gitignored.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| platform/differ.go | Populate per-action ResolvedConfigHash on create/update plan actions. |
| platform/differ_test.go | Add coverage asserting ResolvedConfigHash is set correctly per action. |
| interfaces/iac_state.go | Add SchemaVersion, InputSnapshot, ResolvedConfigHash, and DriftEntry to public plan/state interfaces. |
| interfaces/iac_state_test.go | JSON round-trip tests for the newly added schema fields. |
| iac/inputsnapshot/snapshot.go | Implement fingerprint computation, OS/env providers, and tolerant sentinel handling. |
| iac/inputsnapshot/snapshot_test.go | Tests for fingerprint format, determinism, unset handling, and sentinel pass-through. |
| iac/inputsnapshot/compute_drift.go | Compute drift entries from plan/apply snapshots, honoring preservation sentinel. |
| iac/inputsnapshot/compute_drift_test.go | Tests for drift detection and sentinel suppression behavior. |
| iac/inputsnapshot/diagnostic.go | Canonical formatting for plan-stale drift diagnostics. |
| iac/inputsnapshot/errors.go | Typed sentinel error for env-var drift (plan-stale) detection. |
| cmd/wfctl/infra_inputsnapshot.go | Scan config for ${VAR}/$VAR refs and compute InputSnapshot for infra plans. |
| cmd/wfctl/infra.go | Write snapshot/schema to plan output; warn on missing gitignore; enforce drift check during apply --plan. |
| cmd/wfctl/infra_plan_inputsnapshot_test.go | Ensure wfctl infra plan -o persists InputSnapshot and SchemaVersion. |
| cmd/wfctl/infra_plan_gitignore.go | Heuristic .gitignore coverage check for plan output warnings. |
| cmd/wfctl/infra_plan_gitignore_test.go | Tests for warning emission/suppression based on .gitignore contents. |
| cmd/wfctl/infra_apply_plan_test.go | Persisted-plan path test ensuring drift triggers typed error + per-key diagnostic and blocks provider.Apply. |
Comments suppressed due to low confidence (1)
cmd/wfctl/infra.go:1104
- SchemaVersion is written to plan.json, but the --plan apply path never validates it. To avoid older wfctl binaries accidentally applying a future/incompatible plan format, add a check after loading the plan (e.g., reject SchemaVersion > supported) before proceeding with drift/hash validation.
```go
if planFile != "" {
	plan, err := loadPlanFromFile(planFile)
	if err != nil {
		return err
	}
	// Validate that the plan is still current relative to the config.
	desired, err := parseInfraResourceSpecsForEnv(cfgFile, envName)
	if err != nil {
		return fmt.Errorf("parse infra resource specs: %w", err)
	}
	if plan.DesiredHash == "" {
		return fmt.Errorf("plan file has no hash — regenerate with: wfctl infra plan -o plan.json")
	}
	// Check the input-fingerprint drift first so the operator gets a
	// per-key diagnostic instead of the generic config-hash mismatch.
	// (Env-var changes are a strict subset of config-hash differences;
	// flagging them here yields the actionable message.) Names list is
	// derived from plan.InputSnapshot keys — no separate InputNames field.
	if len(plan.InputSnapshot) > 0 {
		names := make([]string, 0, len(plan.InputSnapshot))
		for k := range plan.InputSnapshot {
			names = append(names, k)
		}
		applySnap := inputsnapshot.Compute(names, inputsnapshot.OSEnvProvider)
		if drift := inputsnapshot.ComputeDrift(plan.InputSnapshot, applySnap); len(drift) > 0 {
			return fmt.Errorf("%w\n%s", inputsnapshot.ErrEnvVarChanged, inputsnapshot.FormatStaleError(drift))
		}
	}
	currentHash := desiredStateHash(desired)
	if plan.DesiredHash != currentHash {
		return fmt.Errorf("plan stale: config hash mismatch (run wfctl infra plan again)")
	}
```
Comment on lines +20 to +34
```go
// Drift entries are sorted by Name for deterministic output. An empty drift
// report yields the singular line "plan stale: 0 input(s) changed since plan"
// — callers should avoid invoking the formatter when no drift exists.
func FormatStaleError(drift []interfaces.DriftEntry) string {
	sorted := make([]interfaces.DriftEntry, len(drift))
	copy(sorted, drift)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var b strings.Builder
	fmt.Fprintf(&b, "plan stale: %d input(s) changed since plan\n", len(sorted))
	for _, d := range sorted {
		fmt.Fprintf(&b, " %s: fingerprint %s (plan) → %s (apply)\n", d.Name, d.PlanFingerprint, d.ApplyFingerprint)
	}
	b.WriteString(" hint: ensure all env vars referenced by infra.yaml are exported to both Plan and Apply steps")
	return b.String()
}
```
```go
// has a different fingerprint at apply time. Callers can match with
// errors.Is(err, ErrEnvVarChanged) to detect the plan-stale case
// independently of the human-readable per-key drift message.
var ErrEnvVarChanged = errors.New("plan stale: env-var changed since plan")
```
Comment on lines +44 to +70
```go
// preservedFingerprint is a sentinel value indicating an env-var was set at
// plan time but is unset at apply time (sub-action cleanup is the canonical
// case). ComputeDrift (T1.5) skips drift detection for keys whose applySnap
// value is this sentinel. UNEXPORTED (rev6 — addresses cycle-5 Important on
// external-bypass channel): NewTolerantEnvProvider is the only sanctioned
// way to inject the sentinel; external callers cannot defeat drift detection.
//
// Cross-function contract:
// - Compute (this file, in-package) passes the sentinel through unhashed.
// - NewTolerantEnvProvider (this file) returns the sentinel for plan-time-set
//   but apply-time-unset vars (in-package access to the constant).
// - ComputeDrift (compute_drift.go, T1.5, same package) honors the sentinel
//   by skipping drift detection for that key.
const preservedFingerprint = "__plan_time_preserved__"

// NewTolerantEnvProvider returns an EnvProvider closure used by the
// in-process apply postcondition (T3.1.5). When a var was set at plan time
// (present in planSnapshot) but is now unset (sub-action cleanup), the
// closure returns the in-package preservedFingerprint sentinel so
// ComputeDrift suppresses the (false-positive) drift entry. For vars
// genuinely unset at both times, returns ("", false) → Compute drops the
// key from the resulting map.
//
// This is the ONLY sanctioned way to inject the preservation sentinel.
// Direct callers of Compute with a custom env-provider cannot construct
// the sentinel value because it is unexported.
func NewTolerantEnvProvider(planSnapshot map[string]string) func(name string) (string, bool) {
```
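The tolerant-provider contract above can be sketched as a small closure factory. This is a minimal sketch under stated assumptions, not the PR's code. The sentinel value is a hypothetical stand-in for the unexported `preservedFingerprint`. The live-environment lookup is passed in as a parameter instead of calling `os.LookupEnv` directly, so the three cases can be exercised without touching the process environment.

```go
package main

import "fmt"

// preservedSentinel is a hypothetical stand-in for the package's unexported
// preservation constant; the real value is not reproduced here.
const preservedSentinel = "\x00plan_time_preserved\x00"

// newTolerantProvider returns a lookup closure that:
//   - returns the live value when the variable is set now;
//   - returns the sentinel when the variable was fingerprinted at plan time
//     but is unset now (e.g. cleaned up by a sub-action);
//   - returns ("", false) when the variable is unset and was never planned.
func newTolerantProvider(planSnapshot map[string]string, lookup func(string) (string, bool)) func(string) (string, bool) {
	return func(name string) (string, bool) {
		if v, ok := lookup(name); ok {
			return v, true
		}
		if _, planned := planSnapshot[name]; planned {
			return preservedSentinel, true
		}
		return "", false
	}
}

func main() {
	env := map[string]string{"REGION": "nyc3"}
	lookup := func(k string) (string, bool) { v, ok := env[k]; return v, ok }
	plan := map[string]string{"REGION": "0123456789abcdef", "PG_PASSWORD": "fedcba9876543210"}
	p := newTolerantProvider(plan, lookup)

	for _, name := range []string{"REGION", "PG_PASSWORD", "NEVER_SET"} {
		v, ok := p(name)
		fmt.Printf("%s: %q %v\n", name, v, ok)
	}
}
```

Injecting `lookup` keeps the closure's policy (live value, then sentinel, then absent) separate from where values come from. In production that source would be the OS environment.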
Comment on lines +63 to +66
```go
func gitignoreCovers(data []byte, base, planAbs, gitignoreDir string) bool {
	ext := filepath.Ext(base)
	scanner := bufio.NewScanner(strings.NewReader(string(data)))
	for scanner.Scan() {
```
Comment on lines +58 to +60
```go
// InputSnapshot records every env var name read during ${VAR} substitution
// at plan time, mapped to a 16-hex-char (64-bit) sha256 prefix of the value.
// Apply re-computes inputs and prints diagnostic on mismatch.
```
⏱ Benchmark Results: ✅ No significant performance regressions detected (benchstat comparison, baseline → PR).
…view)
Empty drift report previously rendered as a 2-line message (header + hint), contradicting the doc comment that promised a singular header line. Gate the hint on len(drift) > 0 so the empty case stays minimal as documented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r-facing prefix (Copilot review)
The sentinel was wrapped via fmt.Errorf("%w\n%s", ErrEnvVarChanged,
FormatStaleError(...)), so its "plan stale:" prefix duplicated the
formatter's own "plan stale: %d input(s)..." header. Reduce the sentinel
to a short machine-only marker; FormatStaleError remains the sole owner
of the human-facing prefix. Existing test assertions match
strings.Contains(err.Error(), "plan stale") via the formatter portion of
the wrapped error and continue to pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…boundary not security (Copilot review)
Previous comment claimed external callers "cannot defeat drift detection" because the sentinel is unexported, but any caller can return the literal string "__plan_time_preserved__" from a custom env-provider closure. Update both the constant doc and NewTolerantEnvProvider doc to be honest: the unexported boundary is API hygiene, not a security guarantee. Sentinel value unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Copilot review)
bufio.NewScanner over strings.NewReader(string(data)) made an extra copy of the .gitignore contents; switch to bytes.NewReader(data) to scan the slice directly. Also check scanner.Err() after the loop: oversized lines (over bufio.MaxScanTokenSize) previously fell through silently as "not covered". Conservative behavior: scan errors return false so an operator-visible warning is emitted rather than silently letting plan.json land in source control.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ot review)
Comment said "every env var name read" but inputsnapshot.Compute and OSEnvProvider intentionally omit unset vars from the resulting map. Make the contract explicit: only set vars are fingerprinted; unset-at-plan + unset-at-apply yields no drift entry by design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +72 to +74
```go
if strings.HasPrefix(line, "!") {
	continue // negation rules — skip; conservative (warn even if a later rule re-includes)
}
```
Comment on lines +21 to +26
```go
if val == preservedFingerprint {
	// Sentinel from NewTolerantEnvProvider — pass through unhashed
	// so ComputeDrift recognizes the preservation signal. (rev6 —
	// unexported per cycle-5; in-package access only.)
	out[name] = preservedFingerprint
	continue
```
Comment on lines +45 to +46
```go
func TestNewTolerantEnvProvider_UnsetButPlanned_ReturnsSentinel(t *testing.T) {
	os.Unsetenv("STAGING_PG_PASSWORD")
```
Comment on lines +73 to +74
```go
// ResolvedConfigHash is the SHA-256 of POST-substitution Resource.Config.
// Apply re-computes per-action and surfaces per-resource diagnostic on mismatch.
```
…(Copilot review)
Heuristic skips !-prefixed negation rules and returns on first positive match, so a "*.json" rule followed by "!plan.json" silently passes. Acceptable for a nudge-not-enforce warning; document the limitation in the negation branch comment so future maintainers know the boundary. Full last-matching-rule-wins semantics or a git check-ignore shell-out are out of scope for W-1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n impossible (Copilot review)
Previously a real env var whose value happened to equal "__plan_time_preserved__" would be treated as a preservation sentinel and silently suppress drift detection. Embed a NUL byte: POSIX exec(3) and Windows CreateProcess both reject NUL inside env values, so no var the OS delivers to a Go process can collide with the constant. In-package call sites (Compute, NewTolerantEnvProvider, ComputeDrift) compare by string equality against the constant, so the value change is transparent to them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…v (Copilot review)
Previous test called os.Unsetenv("STAGING_PG_PASSWORD") without restoring
the prior value, leaking state across the test process and creating
order-dependence with any other test that reads STAGING_PG_PASSWORD.
Switch to a test-unique env var name (WFCTL_TEST_INPUTSNAPSHOT_UNSET_KEY)
the test never sets, so no cleanup is needed and there is no cross-test
state leak.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment said "SHA-256 of POST-substitution Resource.Config" but didn't specify lower-case-hex encoding (no "sha256:" prefix) or the empty-string short-circuit when the config map is empty (platform.ConfigHash behavior). Make the contract explicit so downstream consumers don't guess.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +38 to +49
```go
func TestPlanAction_ResolvedConfigHashField(t *testing.T) {
	a := PlanAction{Action: "create", ResolvedConfigHash: "sha256:abc"}
	data, err := json.Marshal(a)
	if err != nil {
		t.Fatal(err)
	}
	var got PlanAction
	if err := json.Unmarshal(data, &got); err != nil {
		t.Fatal(err)
	}
	if got.ResolvedConfigHash != "sha256:abc" {
		t.Errorf("ResolvedConfigHash: got %q", got.ResolvedConfigHash)
```
Comment on lines +1086 to +1090
```go
// Check the input-fingerprint drift first so the operator gets a
// per-key diagnostic instead of the generic config-hash mismatch.
// (Env-var changes are a strict subset of config-hash differences;
// flagging them here yields the actionable message.) Names list is
// derived from plan.InputSnapshot keys — no separate InputNames field.
```
Comment on lines +20 to +26
```go
// fingerprintForTest matches inputsnapshot.Compute's fingerprint format
// (16-hex-char sha256 prefix) so tests can construct expected plan-time
// snapshots without depending on the concrete env-provider closure.
func fingerprintForTest(value string) string {
	sum := sha256.Sum256([]byte(value))
	return hex.EncodeToString(sum[:])[:16]
}
```
Comment on lines +105 to +111
```go
// Scanner errors (e.g. line longer than bufio.MaxScanTokenSize) cause
// silent fall-through if not checked. Conservative: treat scan failure
// as not-covered, which surfaces a warning the operator can investigate
// rather than silently letting plan.json land in source control.
if err := scanner.Err(); err != nil {
	return false
}
```
Comment on lines +44 to +54
```go
func TestNewTolerantEnvProvider_UnsetButPlanned_ReturnsSentinel(t *testing.T) {
	// Use a test-unique env-var name to avoid colliding with anything the
	// process or other tests might rely on; we never set or unset it, so
	// no cleanup is required and there is no cross-test state leak.
	const key = "WFCTL_TEST_INPUTSNAPSHOT_UNSET_KEY"
	plan := map[string]string{key: "deadbeef00000000"}
	provider := NewTolerantEnvProvider(plan)
	val, ok := provider(key)
	if !ok || val != preservedFingerprint {
		t.Errorf("expected (preservedFingerprint, true) for plan-time-set unset-now var; got (%q, %v)", val, ok)
	}
```
…ip (Copilot review)
Test fixture used "sha256:abc" but the actual format produced by platform.ConfigHash is a lower-case 64-hex sha256 digest with no prefix. Replace with a realistic 64-hex value so the test mirrors the on-disk shape and won't mislead a future validator/refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (Copilot review)
runInfraPlan stamps SchemaVersion=1 on every emitted plan, but runInfraApply was not validating it: a future bump (e.g. W-5 JIT plans) would be silently mis-read by an older binary. Add an infraPlanSchemaVersion constant + guard that rejects plans with schema_version > supported, returning a clear "newer than this wfctl supports" message. Plans with SchemaVersion=0 (predating the field) remain accepted for back-compat. Test: TestInfraApplyConsumesPlan_FutureSchemaRejected.
Also reuse inputsnapshot.Compute in fingerprintForTest so the test always exercises the production fingerprint algorithm; re-implementing sha256+16-hex inline would silently drift if the scheme changed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
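Here is a sketch of the schema-version guard this commit describes. The commit names `infraPlanSchemaVersion`; the helper name and the exact error wording below are assumptions. The key design choice is that version 0 stays valid, so plans written before the field existed keep applying. Anything newer than the binary understands is refused before the drift or hash checks can misread it.

```go
package main

import "fmt"

// infraPlanSchemaVersion mirrors the constant named in the commit; 1 matches
// the value runInfraPlan is described as stamping.
const infraPlanSchemaVersion = 1

// checkPlanSchemaVersion (hypothetical helper) accepts 0 (plans predating the
// field) and anything up to the supported version, and rejects newer plans.
func checkPlanSchemaVersion(v int) error {
	if v > infraPlanSchemaVersion {
		return fmt.Errorf("plan schema_version %d is newer than this wfctl supports (max %d); upgrade wfctl or regenerate the plan",
			v, infraPlanSchemaVersion)
	}
	return nil
}

func main() {
	for _, v := range []int{0, 1, 2} {
		fmt.Printf("schema_version=%d -> %v\n", v, checkPlanSchemaVersion(v))
	}
}
```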
…parse failure (Copilot review)
Previous round added a scanner.Err() check that returned false either way, making the branch a no-op. Change the helper signature to (bool, error) and have warnIfPlanNotGitignored emit a "could not scan ... for plan.json coverage" stderr warning when the underlying bufio.Scanner fails (e.g. a line over bufio.MaxScanTokenSize). The "not covered" warning is suppressed on scan failure so the operator sees the parse error rather than a potentially misleading coverage warning. Test: TestGitignoreCovers_ScanError_Propagates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…with cleanup (Copilot review)
Previous round used a unique env-var name and assumed it was unset, but a hostile CI environment could pre-set it and silently flip the test result. Explicitly Unsetenv at start and restore the prior value (if any) via t.Cleanup so the test cannot leak state across the process.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +40 to +62
```go
for {
	gitignore := filepath.Join(dir, ".gitignore")
	if data, err := os.ReadFile(gitignore); err == nil {
		foundAny = true
		ok, scanErr := gitignoreCovers(data, base, abs, dir)
		if scanErr != nil {
			// Surface parse failure to the operator (line over
			// bufio.MaxScanTokenSize, etc.) rather than silently
			// pretending the file is/isn't covered.
			fmt.Fprintf(w, "warning: could not scan %s for %s coverage: %v\n", gitignore, base, scanErr)
			scanFailed = true
		}
		if ok {
			covered = true
			break
		}
	}
	parent := filepath.Dir(dir)
	if parent == dir {
		break // reached filesystem root
	}
	dir = parent
}
```
Comment on lines +58 to +62
```go
// InputSnapshot records every env var name read during ${VAR} substitution
// at plan time, fingerprinting only those that were SET (16-hex-char sha256
// prefix of the value). Unset vars are omitted from the map; their absence
// at apply time is therefore not flagged as drift. Apply re-computes inputs
// and prints diagnostic on mismatch.
```
…iew)
Previous heuristic walked .gitignore files from the plan dir up to the filesystem root, so an unrelated /tmp/.gitignore or $HOME/.gitignore could shadow the real coverage check (or flake the not-covered tests). Add findGitWorktreeRoot: pure stat-based discovery that walks up looking for a .git entry (handles both git directories and git-worktree pointer files). The walk now terminates at the worktree root so unrelated ancestor .gitignore files are ignored, and outside any git worktree the warning stays silent entirely.
Tests updated to mark t.TempDir() as a worktree (mkdir .git) where they expect the heuristic to activate; TestPlan_NoGitWorktree_NoWarning added to cover the silent-when-untracked path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… gap (Copilot review)
The InputSnapshot field comment claimed completeness ("env var names read
during ${VAR} substitution"), but the cmd/wfctl scanner that populates
the map intentionally does not apply top-level environments[env].envVars
defaults — that limitation was already documented at
collectInfraEnvVarRefs but not surfaced in the public interface
contract. Add a "Completeness caveat" note pointing at the scanner's
limitation so consumers don't assume the snapshot is exhaustive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +22 to +30
```go
func ComputeDrift(planSnap, applySnap map[string]string) []interfaces.DriftEntry {
	var drift []interfaces.DriftEntry
	for name, planFP := range planSnap {
		applyFP, present := applySnap[name]
		if !present {
			drift = append(drift, interfaces.DriftEntry{
				Name:             name,
				PlanFingerprint:  planFP,
				ApplyFingerprint: unsetFingerprintPlaceholder,
```
…rder (Copilot review)
Map iteration order in Go is randomized, so consumers that marshal / log / compare the returned drift slice (now exposed via *StaleError.Drift) would see non-deterministic output across runs. FormatStaleError already sorts independently for its printed output; sort the structured slice once at the source so all downstream consumers benefit. Test: TestComputeDrift_ResultIsSortedByName.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
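The sorted, sentinel-aware drift computation can be sketched as follows. The placeholder and sentinel values are hypothetical, because the PR's `unsetFingerprintPlaceholder` and preservation constant are not shown in full. The function is a standalone illustration, not the PR's `ComputeDrift`. Go randomizes map iteration order, so sorting once at the source is what makes the returned slice deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// DriftEntry mirrors the shared drift type's fields.
type DriftEntry struct {
	Name             string
	PlanFingerprint  string
	ApplyFingerprint string
}

// Hypothetical stand-ins for the package's placeholder and sentinel values.
const (
	unsetPlaceholder = "<unset>"
	preserved        = "\x00plan_time_preserved\x00"
)

// computeDrift compares plan-time and apply-time fingerprints for every key
// recorded at plan time. A key missing at apply gets the placeholder; a key
// whose apply value is the preservation sentinel is skipped. The result is
// sorted by Name so logs, JSON and comparisons are deterministic.
func computeDrift(planSnap, applySnap map[string]string) []DriftEntry {
	var drift []DriftEntry
	for name, planFP := range planSnap {
		applyFP, present := applySnap[name]
		switch {
		case !present:
			drift = append(drift, DriftEntry{name, planFP, unsetPlaceholder})
		case applyFP == preserved:
			// set at plan time, cleaned up since: not drift
		case applyFP != planFP:
			drift = append(drift, DriftEntry{name, planFP, applyFP})
		}
	}
	sort.Slice(drift, func(i, j int) bool { return drift[i].Name < drift[j].Name })
	return drift
}

func main() {
	plan := map[string]string{"B": "1111", "A": "2222", "C": "3333", "D": "4444"}
	apply := map[string]string{"B": "9999", "C": "3333", "D": preserved}
	for _, d := range computeDrift(plan, apply) {
		fmt.Printf("%s: %s -> %q\n", d.Name, d.PlanFingerprint, d.ApplyFingerprint)
	}
}
```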
Comment on lines +80 to +90
```go
// ResolvedConfigHash is the SHA-256 of POST-substitution Resource.Config,
// computed via platform.ConfigHash. Encoded as lower-case hex (no
// "sha256:" prefix); empty string when the config map is empty
// (platform.ConfigHash short-circuit).
//
// Currently populated by ComputePlan and persisted in plan.json so apply
// has the per-action hash available; the apply-time consumer that surfaces
// a per-resource diagnostic on mismatch is wired in a follow-up PR (W-3a/
// T3.1.5). Until then the field is observable via plan.json inspection but
// not yet enforced at apply.
ResolvedConfigHash string `json:"resolved_config_hash,omitempty"`
```
…pilot review)
Field is tagged json:",omitempty" so the empty-string case is dropped from plan.json entirely rather than persisted as ""; consumers should treat "key missing" and "value == empty string" as the same condition. Comment now states this explicitly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```go
// Compute returns a map of env-var name → 16-hex-char sha256 prefix of the value.
// Variables that aren't set (lookup returns ok=false) are omitted from the snapshot.
func Compute(varNames []string, lookup func(string) (string, bool)) map[string]string {
	out := make(map[string]string)
```
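The `Compute` contract in the excerpt above maps each name to a 16-hex-char SHA-256 prefix and omits unset names. It can be sketched as below, including the `len(varNames)` capacity hint from a later commit. This standalone version leaves out the preservation-sentinel pass-through and uses a map-backed lookup for illustration. A variable set to the empty string is still fingerprinted; only a failed lookup omits the key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// compute sketches the documented contract: name → first 16 hex chars
// (64 bits) of sha256(value); names whose lookup fails are omitted.
func compute(varNames []string, lookup func(string) (string, bool)) map[string]string {
	out := make(map[string]string, len(varNames)) // capacity hint
	for _, name := range varNames {
		val, ok := lookup(name)
		if !ok {
			continue // unset: no entry, so unset-at-both-times is never drift
		}
		sum := sha256.Sum256([]byte(val))
		out[name] = hex.EncodeToString(sum[:])[:16]
	}
	return out
}

func main() {
	env := map[string]string{"REGION": "nyc3", "EMPTY": ""}
	lookup := func(k string) (string, bool) { v, ok := env[k]; return v, ok }
	fmt.Println(compute([]string{"REGION", "EMPTY", "MISSING"}, lookup))
}
```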
Comment on lines +144 to +148
```go
// Relative path from .gitignore dir, e.g. "cmd/wfctl/plan.json".
if rel, err := filepath.Rel(gitignoreDir, planAbs); err == nil {
	if anchored == rel || anchored == filepath.ToSlash(rel) {
		return true, nil
	}
```
…stants (Copilot review)
Two micro-optimizations:
- inputsnapshot.Compute now allocates with a len(varNames) capacity hint to avoid grow-resize cycles when many env vars are referenced.
- gitignoreCovers hoists filepath.Rel/ToSlash and the base-derived pattern strings (starExt, doubleStarExt, doubleStarBase) out of the per-line scan loop; they're constant for the whole .gitignore file.
No behavior change; less per-line allocation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
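Putting the pieces of this heuristic together, a sketch of the matcher might look like the following. The function name is hypothetical. The pattern set is inferred from the names in the commit above (`starExt`, `doubleStarExt`, `doubleStarBase`) plus the anchored relative path, and is intentionally not full gitignore semantics. As documented in the thread, negation lines are skipped and the first positive match wins.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// gitignoreCoversSketch approximates the review's heuristic: patterns derived
// from the plan path are built once, then each line is compared literally.
// Because negations are skipped and the first match wins, "*.json" followed
// by "!plan.json" still reports covered.
func gitignoreCoversSketch(data []byte, base, planAbs, gitignoreDir string) (bool, error) {
	ext := filepath.Ext(base)
	starExt := "*" + ext
	doubleStarExt := "**/*" + ext
	doubleStarBase := "**/" + base
	rel := ""
	if r, err := filepath.Rel(gitignoreDir, planAbs); err == nil {
		rel = filepath.ToSlash(r)
	}

	scanner := bufio.NewScanner(bytes.NewReader(data))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "", strings.HasPrefix(line, "#"):
			continue // blank line or comment
		case strings.HasPrefix(line, "!"):
			continue // negation: skipped, so a later re-include is not honored
		case line == base, line == doubleStarBase, line == starExt, line == doubleStarExt:
			return true, nil
		case rel != "" && strings.TrimPrefix(line, "/") == rel:
			return true, nil // path anchored at the .gitignore directory
		}
	}
	if err := scanner.Err(); err != nil {
		return false, err
	}
	return false, nil
}

func main() {
	gi := []byte("# build output\n*.json\n!plan.json\n")
	ok, err := gitignoreCoversSketch(gi, "plan.json", "/repo/deploy/plan.json", "/repo")
	fmt.Println(ok, err)
}
```

Directory rules such as `deploy/` are not recognized, which errs toward emitting the warning. That fits a nudge-not-enforce check.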
intel352 added a commit that referenced this pull request on May 4, 2026:
…_plan_test
Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. The cmd/wfctl test build was broken on HEAD because of the unused imports, surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change.
intel352 added a commit that referenced this pull request on May 4, 2026:
…ft postcondition + diff cache (W-3a of 12) (#527)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type
* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel
* feat(iac): wfctl infra plan writes InputSnapshot to plan.json
* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash
* feat(iac): wfctl infra plan warns when plan.json not in .gitignore
* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring
* feat(iac): add refreshoutputs.Refresh: read-only state output refresh

T2.1: bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2: `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level; never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the requested env, emits the literal error `refresh-outputs: provider not configured for env "<env>"` verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3: wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5: pure-package stress test in iac/refreshoutputs/. Drives Refresh with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map; no write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6: adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect.

Verification: the plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm):

```
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
```

This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). T2.6's own additions follow the existing in-file conventions exactly.

```
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
```

docs/WFCTL.md has zero broken links, including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7):

```
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
  -c string
        Config file (short for --config)
  -concurrency int
        Maximum concurrent Read calls (default 8)
  -config string
        Config file
  -e string
        Environment name (short for --env)
  -env string
        Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
  - name: state-store
    type: iac.state
    config:
      backend: filesystem
      directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
```

No panic, no stack trace. The stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49.

PR W-2 mandate (plan line 1101):

```
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok   github.com/GoCodeAlone/workflow/iac/refreshoutputs   1.405s
ok   github.com/GoCodeAlone/workflow/cmd/wfctl   10.485s
```

Manual smoke against staging-PG: not run; no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006: formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)
* fix(iac): T3.0 review: sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:
- Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive: existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design.
  New test covers both the typo-rejection and the root-permissivity contracts.

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh.

Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the requested env, emits the literal error

    refresh-outputs: provider not configured for env "<env>"

verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in.
Activation rules:

- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state).

Only fires for infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh with 100 fake resources at Concurrency=8 and asserts:

1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall.
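The T2.3 activation rules above reduce to a small predicate. The sketch below is illustrative only: `refreshOutputsEnabled` is a hypothetical name, not the helper in cmd/wfctl/infra_apply_refresh_pre.go, and it takes the --skip-refresh flag and the raw env value as plain arguments so the rules can be exercised directly.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// refreshOutputsEnabled is a hypothetical helper sketching the T2.3
// activation rules; it is not the actual wfctl implementation.
// skipRefresh mirrors --skip-refresh; envValue is the raw value of
// WFCTL_REFRESH_OUTPUTS ("" when unset).
func refreshOutputsEnabled(skipRefresh bool, envValue string) bool {
	// --skip-refresh suppresses the pre-step regardless of the env var.
	if skipRefresh {
		return false
	}
	// Unset or empty: default OFF.
	if envValue == "" {
		return false
	}
	// ParseBool accepts 1/t/T/TRUE/true/True and 0/f/F/FALSE/false/False.
	// Unrecognised values return an error and are treated as OFF, so "0"
	// or "false" can never enable the pre-step by mere presence.
	enabled, err := strconv.ParseBool(envValue)
	if err != nil {
		return false
	}
	return enabled
}

func main() {
	v := os.Getenv("WFCTL_REFRESH_OUTPUTS")
	fmt.Println("refresh-outputs pre-step enabled:", refreshOutputsEnabled(false, v))
}
```

Treating a ParseBool error as "off" is the design choice that keeps `WFCTL_REFRESH_OUTPUTS=0` from enabling the pre-step, which is the presence-only behaviour ADR 006 records as rejected.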
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm):

    $ mdformat --check docs/WFCTL.md
    Error: File "docs/WFCTL.md" is not formatted.
    exit=1

This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). T2.6's own additions follow the existing in-file conventions exactly.

    $ markdown-link-check docs/WFCTL.md
    FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
    exit=0

docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7):

    $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
    exit=0
    $ /tmp/wfctl infra refresh-outputs --help
    Usage of infra refresh-outputs:
      -c string
            Config file (short for --config)
      -concurrency int
            Maximum concurrent Read calls (default 8)
      -config string
            Config file
      -e string
            Environment name (short for --env)
      -env string
            Environment name (resolves per-module overrides)
    exit=0
    $ cat /tmp/t27-fake.yaml
    modules:
      - name: state-store
        type: iac.state
        config:
          backend: filesystem
          directory: /tmp/t27-fake-state
    $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
    error: refresh-outputs: provider not configured for env "staging"
    exit=1

No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49.

PR W-2 mandate (plan line 1101):

    $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
    ok   github.com/GoCodeAlone/workflow/iac/refreshoutputs  1.405s
    ok   github.com/GoCodeAlone/workflow/cmd/wfctl           10.485s

Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
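The omitempty contract that TestApplyResult_OmitEmptyContract locks follows from plain `encoding/json` behaviour: a nil or zero-length map or slice tagged `omitempty` is dropped from the encoded object. In this sketch `applyResultSketch` and `driftEntry` are hypothetical stand-ins and their field types are assumptions; only the three JSON key names and the `{"vpc":"new-uuid"}` fixture come from the commit text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// driftEntry stands in for interfaces.DriftEntry; its fields are assumed.
type driftEntry struct {
	Name string `json:"name"`
}

// applyResultSketch mirrors only the three new ApplyResult fields.
// The Go types are assumptions made for illustration.
type applyResultSketch struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []driftEntry      `json:"input_drift_report,omitempty"`
	// Keyed by the *replaced* resource's Name; value is the new ProviderID.
	ReplaceIDMap map[string]string `json:"replace_id_map,omitempty"`
}

func encode(r applyResultSketch) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// nil fields: all three keys are omitted.
	fmt.Println(encode(applyResultSketch{})) // {}

	// Empty but non-nil fields: omitempty drops zero-length maps and slices too.
	fmt.Println(encode(applyResultSketch{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []driftEntry{},
		ReplaceIDMap:         map[string]string{},
	})) // {}

	// A populated map is emitted.
	fmt.Println(encode(applyResultSketch{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
	// {"replace_id_map":{"vpc":"new-uuid"}}
}
```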
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner.
Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review — Minor 1):

- apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically — keeps the cross-package fixture parity guaranteed.

T3.2 (commit 0c30eec review — Minors 1 + 2):

- apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion — distinct code path from the existing ok-but-SupportsUpsert==false test.
  Compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. Also documents the single-pass recovery contract (recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop).

T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):

- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path — locks the resource-append contract so a regression that skipped the append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: empty ProviderID flows to driver, no synthesized precondition error, deleteCount==1 (latent bug-fix from design — the v1 path silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.

Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create — surprising operators who expected fast interruption mid-Replace.
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable.

Failure shape:

    return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a captured context.CancelFunc as a side-effect, simulating exact post-Delete cancellation. Asserts Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, error has the canonical prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this commit since it's the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) intentionally skipped — all documentary or out-of-scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files.
2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml (test + lint jobs)
   - .github/workflows/benchmark.yml (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents — never a partial write. The corruption-recovery path in Get is still the safety net for cross-filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering needed. On rename failure, best-effort cleanup of the temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining the tags are for log/transcript serialization, not cache keying — keyFingerprint uses NUL-separated string concat, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless — no code path creates a file with `*` in its name. Likely leftover from earlier debugging. Removing it lets us drop the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:

- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation."

* fix(iac): diffcache.Get refreshes mtime so LRU is actually LRU (Copilot review)

Without this, frequently-read entries were evicted as if unused because maybeEvict orders by mtime. Now Get touches mtime via os.Chtimes(now, now), turning eviction from FIFO-by-write into true LRU.

Mtime-touch chosen over a sidecar last-accessed file to keep the on-disk shape trivial; cost is one extra syscall per hit, errors are ignored (failure degrades eviction precision but never produces wrong cache results).

Adds TestCache_LRURefreshesOnGet regression test: writes N entries, Gets the oldest, then triggers over-cap, asserts the oldest survives and the second-oldest (now the LRU) is evicted instead.
* fix(iac): diffcache.Put uses unique temp filename to avoid same-key write races (Copilot review)

Pre-fix, two goroutines calling Put with the same Key both wrote to `<key>.json.tmp` and one would clobber the other's temp file mid-write, producing either a Rename failure or a half-written final file. Now Put uses os.CreateTemp so each call gets a unique `<key>.json.<random>.tmp` filename; the final rename is racy on which payload wins, but both payloads were derived from the same Key so the outcome is deterministic from the caller's perspective.

Adds godoc "Concurrency: safe for concurrent use, including concurrent Puts of the same Key."

Adds TestCache_ConcurrentSameKeyPut regression: 20 goroutines Put the same Key, asserts no leftover *.tmp files, asserts final cache file decodes. Run under -race.

* fix(iac): diffcache.Put atomic rename on Windows (Copilot review)

Document the os.Rename Windows limitation explicitly: on Windows, os.Rename fails when the destination exists, so an in-place cache update via Put will fail. The caller treats this as a write failure and proceeds without caching — correct because apply remains correct on a 100% miss rate (per the package's cache-as-amortization framing).

We chose documentation over vendoring github.com/google/renameio: adding renameio would introduce the first such dependency in the repo, and there is no Windows-supported wfctl use case today. The existing precedent in cmd/wfctl/update.go and cmd/wfctl/plugin_install.go also uses bare os.Rename without Windows guards.

The fix tracks the limitation in two places: the Put godoc (where the rename happens) and the package godoc Known Limitations section (where consumers will look).

* fix(iac): diffcache returns deep-copy of DiffResult to avoid shared-slice mutation (Copilot review)

Pre-fix, the in-memory cache stored DiffResult by value but the Changes slice ([]FieldChange) shared its backing array between the cached entry and the value returned to the caller. A caller mutating the returned Changes slice (element-level or via append-into-cap) would silently mutate the cached entry. The symmetric case is the same: mutating the Put argument after the Put call would leak into the cached value.

Fix: clone the Changes slice via slices.Clone in both Get and Put. Scalar struct fields are value-copied by struct assignment so a single helper (cloneDiffResult) covers both directions. The filesystem cache deserializes from JSON each time so each Get already yields a fresh slice — no change needed there.

FieldChange.Old/New are typed any; if a caller stores a pointer or mutable map there, the deep-copy stops at the slice level. By convention DiffResult.Changes carries scalar Old/New (strings, numbers, bools), so that is the right tradeoff between correctness and copy cost. Documented in memoryCache godoc.

Adds TestCache_MemoryDeepCopiesChanges regression: Put a value, mutate the original argument, Get + mutate (element + append), Get again, assert original is preserved.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
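The deep-copy fix in the last commit entry can be illustrated with a small in-memory cache that clones on both Put and Get. `FieldChange`, `DiffResult`, `cloneDiffResult`, and `memoryCache` here are hypothetical simplifications (the real types carry more fields), and `slices.Clone` requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
)

// Hypothetical stand-ins; only the Changes slice matters for the aliasing bug.
type FieldChange struct {
	Path     string
	Old, New any
}

type DiffResult struct {
	NeedsReplace bool
	Changes      []FieldChange
}

// cloneDiffResult copies scalar fields via struct assignment and gives Changes
// its own backing array. Old/New values themselves are not deep-copied.
func cloneDiffResult(d DiffResult) DiffResult {
	out := d
	out.Changes = slices.Clone(d.Changes)
	return out
}

type memoryCache struct {
	mu      sync.Mutex
	entries map[string]DiffResult
}

func (c *memoryCache) Put(key string, d DiffResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cloneDiffResult(d) // later mutations of d cannot leak in
}

func (c *memoryCache) Get(key string) (DiffResult, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	d, ok := c.entries[key]
	if !ok {
		return DiffResult{}, false
	}
	return cloneDiffResult(d), true // caller mutations cannot leak back
}

func main() {
	c := &memoryCache{entries: map[string]DiffResult{}}
	orig := DiffResult{Changes: []FieldChange{{Path: "size", Old: "s", New: "m"}}}
	c.Put("k", orig)
	orig.Changes[0].New = "mutated-after-put"

	got, _ := c.Get("k")
	got.Changes[0].New = "mutated-after-get"
	got.Changes = append(got.Changes, FieldChange{Path: "extra"})

	again, _ := c.Get("k")
	fmt.Println(len(again.Changes), again.Changes[0].New) // 1 m
}
```

As the commit notes, the copy stops at the slice level: a pointer or map stored in Old/New would still be shared between the cache and its callers.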
intel352 added a commit that referenced this pull request on May 4, 2026
…s on manifest computePlanVersion (W-3b of 12) (#528)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh.

Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the requested env, emits the literal error

    refresh-outputs: provider not configured for env "<env>"

verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in.

Activation rules:

- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state).

Only fires for infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5 — pure-package stress test in iac/refreshoutputs/.
Drives Refresh with 100 fake resources at Concurrency=8 and asserts:

1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm):

    $ mdformat --check docs/WFCTL.md
    Error: File "docs/WFCTL.md" is not formatted.
    exit=1

This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff).
T2.6's own additions follow the existing in-file conventions exactly.

    $ markdown-link-check docs/WFCTL.md
    FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
    exit=0

docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7):

    $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
    exit=0
    $ /tmp/wfctl infra refresh-outputs --help
    Usage of infra refresh-outputs:
      -c string
            Config file (short for --config)
      -concurrency int
            Maximum concurrent Read calls (default 8)
      -config string
            Config file
      -e string
            Environment name (short for --env)
      -env string
            Environment name (resolves per-module overrides)
    exit=0
    $ cat /tmp/t27-fake.yaml
    modules:
      - name: state-store
        type: iac.state
        config:
          backend: filesystem
          directory: /tmp/t27-fake-state
    $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
    error: refresh-outputs: provider not configured for env "staging"
    exit=1

No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49.

PR W-2 mandate (plan line 1101):

    $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
    ok   github.com/GoCodeAlone/workflow/iac/refreshoutputs  1.405s
    ok   github.com/GoCodeAlone/workflow/cmd/wfctl           10.485s

Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design.
  A new test covers both the typo-rejection and the root-permissivity contracts.

* feat(iac): add refreshoutputs.Refresh — read-only state output refresh

T2.1: bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency is 8 when Options.Concurrency < 1; otherwise the caller's value is honored. On any Read or driver-resolution failure, it returns (nil, err) so callers don't half-persist a refresh.

Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2: `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level: it never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and calls SaveResource() for any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the requested env, it emits the literal error

    refresh-outputs: provider not configured for env "<env>"

verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch validation asserts against this exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3: wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan, read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in.
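A minimal sketch of the opt-in gate, assuming a hypothetical helper named `refreshPreStepEnabled` that receives the env value and the `--skip-refresh` flag. It follows the ParseBool activation rules stated in this commit and contrasts them with the plan's presence-only check. In a real caller the first argument would come from `os.Getenv("WFCTL_REFRESH_OUTPUTS")`.

```go
package main

import (
	"fmt"
	"strconv"
)

// refreshPreStepEnabled is a hypothetical helper: --skip-refresh always wins,
// otherwise the env value must parse as a strconv.ParseBool truthy value.
func refreshPreStepEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // CI escape hatch overrides a globally forced env var
	}
	on, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // unset, empty, or unrecognised: default OFF
	}
	return on // "1"/"t"/"true" enable; "0"/"f"/"false" disable
}

func main() {
	for _, v := range []string{"", "1", "true", "t", "0", "false", "yes"} {
		fmt.Printf("%-7q -> %v\n", v, refreshPreStepEnabled(v, false))
	}
	// A presence-only check (v != "") would have enabled the step for "0".
	fmt.Println(refreshPreStepEnabled("1", true)) // false: --skip-refresh wins
}
```

Note that `strconv.ParseBool` rejects "yes"/"on", so those values fall into the "unrecognised, no-op" bucket rather than enabling the step.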
Activation rules:

- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run the pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun.
- --skip-refresh → suppress the pre-step regardless of the env var (for CI environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, the pre-step discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). It only fires for infra.* configs (the legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (the helper file plus the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5: pure-package stress test in iac/refreshoutputs/. Drives Refresh with 100 fake resources at Concurrency=8 and asserts:

1. No deadlock (10s watchdog around the call).
2. Read is called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map (no write-into-wrong-slot bug under concurrency).
4. The concurrent in-flight peak is between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). The test runs in ~1.5s wall time.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6: adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan deviation (ParseBool vs the plan-literal presence check) that the docs in this commit accurately reflect.

Verification: the plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm):

    $ mdformat --check docs/WFCTL.md
    Error: File "docs/WFCTL.md" is not formatted.
    exit=1

This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (it would introduce a multi-thousand-line unrelated diff). T2.6's own additions follow the existing in-file conventions exactly.

    $ markdown-link-check docs/WFCTL.md
    FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
    exit=0

docs/WFCTL.md has zero broken links, including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7):

    $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
    exit=0
    $ /tmp/wfctl infra refresh-outputs --help
    Usage of infra refresh-outputs:
      -c string
            Config file (short for --config)
      -concurrency int
            Maximum concurrent Read calls (default 8)
      -config string
            Config file
      -e string
            Environment name (short for --env)
      -env string
            Environment name (resolves per-module overrides)
    exit=0
    $ cat /tmp/t27-fake.yaml
    modules:
      - name: state-store
        type: iac.state
        config:
          backend: filesystem
          directory: /tmp/t27-fake-state
    $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
    error: refresh-outputs: provider not configured for env "staging"
    exit=1

No panic, no stack trace. The stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49.

PR W-2 mandate (plan line 1101):

    $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
    ok  github.com/GoCodeAlone/workflow/iac/refreshoutputs  1.405s
    ok  github.com/GoCodeAlone/workflow/cmd/wfctl  10.485s

Manual smoke against staging-PG: not run, because no staging-PG is available in this worktree environment. Plan line 1102 marks this "if available", so it is deferred to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 formalises the spec-vs-quality-review trade-off recorded during the W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: the ReplaceIDMap godoc said "Keyed by the dependent resource Name", but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name], where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Reworded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name, plus a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract, table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
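The omitempty contract that TestApplyResult_OmitEmptyContract locks can be shown with `encoding/json` alone. In this sketch the struct is a hypothetical mirror of just the three new fields, and the field types are assumptions. The JSON tag names are the ones quoted in the commit. `encoding/json` omits nil *and* zero-length maps and slices under `omitempty`, which is why both table rows encode to `{}`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type driftEntry struct {
	Name string `json:"name"`
}

// applyResultSketch mirrors only the three fields discussed above; the field
// types are assumptions, the JSON tag names are the ones quoted in the commit.
type applyResultSketch struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []driftEntry      `json:"input_drift_report,omitempty"`
	// ReplaceIDMap is keyed by the *replaced* resource's Name and maps to
	// the new ProviderID issued by the Create half of the Replace.
	ReplaceIDMap map[string]string `json:"replace_id_map,omitempty"`
}

func encode(r applyResultSketch) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// nil values and empty-but-non-nil values are both omitted.
	fmt.Println(encode(applyResultSketch{})) // {}
	fmt.Println(encode(applyResultSketch{
		InitialInputSnapshot: map[string]string{},
		InputDriftReport:     []driftEntry{},
		ReplaceIDMap:         map[string]string{},
	})) // {}
	fmt.Println(encode(applyResultSketch{ReplaceIDMap: map[string]string{"vpc": "new-uuid"}}))
	// {"replace_id_map":{"vpc":"new-uuid"}}
}
```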
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction, the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate, which isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError, which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only, proving one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err), since the destination is a string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added a ctx.Err() check at the loop iteration boundary; on cancel, ApplyPlan returns the result accumulated so far plus the ctx error as the top-level error. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: the driver receives zero invocations, and the top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing, because ComputePlan upstream is the contract owner.
Minor 2 (uniform error prefixing across sub-functions) is intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance: those tasks own the final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

The imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. The cmd/wfctl test build was broken on HEAD because of the unused imports; this surfaced while landing T3.1.5, which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review, Minor 1):

- apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically, which keeps cross-package fixture parity guaranteed.

T3.2 (commit 0c30eec review, Minors 1 + 2):

- apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion, a distinct code path from the existing ok-but-SupportsUpsert==false test.
  A compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. It also documents the single-pass recovery contract (a recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop).

T3.3 (commit a3fc98b review, Minors 1 + 2 + 4):

- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path, locking the resource-append contract so a regression that skipped the append on nil Current would fail loudly.
- apply_update_delete_test.go gains a parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: an empty ProviderID flows to the driver, no synthesized precondition error, deleteCount==1 (a latent bug fix from the design: the v1 path silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create, surprising operators who expected fast interruption mid-Replace.
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable.

Failure shape:

    return fmt.Errorf("replace: canceled after delete: %w", err)

Wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. The driver's Delete invokes a captured context.CancelFunc as a side effect, simulating exact post-Delete cancellation. The test asserts that Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, and the error has the canonical prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) is folded into this commit since it's the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) were intentionally skipped: all are documentary or out of scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up; addresses the spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not a correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files.
2.
   **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml (test + lint jobs)
   - .github/workflows/benchmark.yml (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale, plus the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents, never a partial write. The corruption-recovery path in Get is still the safety net for cross-filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering is needed. On rename failure, the temp file is cleaned up on a best-effort basis.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining that the tags are for log/transcript serialization, not cache keying; keyFingerprint uses NUL-separated string concatenation, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless, because no code path creates a file with `*` in its name; it was likely left over from earlier debugging. Removing it lets us drop the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS, i.e. the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:

- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation."

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs, which discovers iac.provider modules in the config, groups desired specs by the `provider:` field, loads each via the same loader the apply path uses, and dispatches platform.ComputePlan per group, so the v2 Diff contract (T3.6e) operates against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require the plugin process to load successfully. Plugin-load failure exits non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2); operators who need pure offline validation should use `wfctl validate`. Configs without any iac.provider module fall back to the legacy ConfigHash compare path, so minimal/legacy fixtures and out-of-band scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). The plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of deferring to function exit (a multi-provider plan no longer holds N gRPC connections open at once).

Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally identical 14-method shells become one. Atomic counters are carried forward where used.

Doc updates:

- godoc on computePlanForInfraSpecs corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply (the loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown).
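The two behaviours called out in these doc updates, first-reference grouping and a per-group closer scoped by an IIFE, fit in a short sketch. `spec` and the printed "closer" are hypothetical stand-ins for the desired-spec type and the loaded provider handle.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for a desired resource spec.
type spec struct {
	Name     string
	Provider string // the `provider:` field
}

// groupByProvider groups specs by provider; groupOrder records a provider the
// first time a spec references it, so groups follow first-reference order in
// desired rather than iac.provider declaration order.
func groupByProvider(desired []spec) (groupOrder []string, groups map[string][]spec) {
	groups = map[string][]spec{}
	for _, s := range desired {
		if _, seen := groups[s.Provider]; !seen {
			groupOrder = append(groupOrder, s.Provider)
		}
		groups[s.Provider] = append(groups[s.Provider], s)
	}
	return groupOrder, groups
}

func main() {
	desired := []spec{{"db", "aws"}, {"vpc", "do"}, {"bucket", "aws"}}
	order, groups := groupByProvider(desired)
	for _, p := range order {
		// The IIFE scopes the defer, so each provider's closer runs before
		// the next group is loaded instead of all at function exit.
		func() {
			closer := func() { fmt.Println("close", p) } // hypothetical provider handle
			defer closer()
			fmt.Println("plan", p, len(groups[p]))
		}()
	}
}
```

With an empty `desired`, `order` is empty and the loop body never runs, which is the empty-desired behaviour the CHANGELOG entry describes.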
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e: the binding TDD red→green commit for the v2 IaC contract (rev3 fix for the cycle-2 self-contradiction: test + impl ship in the same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the per-resource Diff calls in parallel under errgroup with a bounded worker pool (default 8; the WFCTL_PLAN_DIFF_CONCURRENCY env var overrides it, clamped to 1..32).

Action emission:

- replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew is true (the latter closes design issue C: pre-W-3b, ForceNew was silently downgraded to update);
- update, when DiffResult.NeedsUpdate is true and replace did not fire;
- skip, when neither flag is set.

Net-new resources still emit create without dispatching Diff; resources removed from desired still emit delete in reverse-dependency order.

Nil-tolerance contract preserved: if p is nil, or if p.ResourceDriver(typ) returns (nil, nil) for a resource type, ComputePlan falls back to the legacy ConfigHash compare for the affected resources. Replace cannot be expressed via the legacy path; callers needing Replace must supply a provider whose drivers implement Diff.

Per-resource driver.Diff errors propagate via errgroup so operators see the underlying cause (rate limit, network, etc.).
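The action-emission rules above reduce to a small pure function. `diffResult` and `fieldChange` are hypothetical mirrors of the DiffResult / FieldChange shapes named in the commit. A nil result is treated as "emit nothing" here, which matches the TestComputePlan_DriverReturnsNilDiff_EmitsNothing case added in T3.6f.

```go
package main

import "fmt"

// Hypothetical mirrors of the DiffResult / FieldChange shapes named above.
type fieldChange struct {
	Path     string
	ForceNew bool
}

type diffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []fieldChange
}

// classify maps a Diff result to a plan action. Replace wins over update,
// and any ForceNew field change forces replace even when NeedsReplace is
// false. "skip" means no action is emitted.
func classify(r *diffResult) string {
	if r == nil {
		return "skip"
	}
	replace := r.NeedsReplace
	for _, c := range r.Changes {
		if c.ForceNew {
			replace = true
		}
	}
	switch {
	case replace:
		return "replace"
	case r.NeedsUpdate:
		return "update"
	default:
		return "skip"
	}
}

func main() {
	fmt.Println(classify(&diffResult{NeedsReplace: true}))                                       // replace
	fmt.Println(classify(&diffResult{NeedsUpdate: true, Changes: []fieldChange{{"size", true}}})) // replace
	fmt.Println(classify(&diffResult{NeedsUpdate: true}))                                        // update
	fmt.Println(classify(&diffResult{}))                                                         // skip
	fmt.Println(classify(nil))                                                                   // skip
}
```

The second case is the design-issue-C scenario: a driver that reports only NeedsUpdate but marks a field ForceNew still gets a replace.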
Test surface (platform/differ_replace_test.go, NEW; ships in this commit per the rev3 atomicity rule):

- TestComputePlan_NeedsReplaceEmitsReplaceAction
- TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
- TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
- TestComputePlan_DiffReturnsNoChanges_EmitsNothing
- TestComputePlan_NilProvider_FallsBackToConfigHash
- TestComputePlan_NilDriver_FallsBackToConfigHash
- TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go is extended with the newFakeProviderWithDiff helper; the in-package no-op fakeProvider/fakeDriver are kept (they cannot collapse to iac/iactest until cache_test in T3.6f also depends on the helper; deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:

- T3.6a note 1: dropped the unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered by two explicit tests.
- T3.6a note 4: rewrote the fake_provider_test.go godoc to behavior-based phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:

- readDriver.Diff now returns NeedsUpdate=true (the adoption tests rely on the post-adopt ComputePlan emitting update; pre-W-3b that was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of panicking, since the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple; on a hit, the cached DiffResult is used directly; on a miss, the freshly computed result is Put into the cache.
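The get-before-Diff, put-on-miss flow can be sketched with an in-memory map keyed by the five-field tuple. `memCache` is a hypothetical stand-in for the `:memory:` backend, and the field names on `cacheKey` are assumptions. The mutex is there because ComputePlan runs Diff calls concurrently. Not caching errors is this sketch's choice, not something the commit states.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheKey mirrors the tuple named above; field names are assumptions.
type cacheKey struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

type diffResult struct{ NeedsUpdate bool }

// memCache is a hypothetical in-memory backend (cf. WFCTL_DIFFCACHE=":memory:").
type memCache struct {
	mu sync.Mutex // Diff dispatch is parallel, so access must be synchronized
	m  map[cacheKey]diffResult
}

func (c *memCache) Get(k cacheKey) (diffResult, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	r, ok := c.m[k]
	return r, ok
}

func (c *memCache) Put(k cacheKey, r diffResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = r
}

// diffWithCache consults the cache before calling diff and stores fresh
// results on a miss. Correctness never depends on a hit.
func diffWithCache(c *memCache, k cacheKey, diff func() (diffResult, error)) (diffResult, error) {
	if r, ok := c.Get(k); ok {
		return r, nil
	}
	r, err := diff()
	if err != nil {
		return diffResult{}, err // this sketch does not cache errors
	}
	c.Put(k, r)
	return r, nil
}

func main() {
	c := &memCache{m: map[cacheKey]diffResult{}}
	calls := 0
	diff := func() (diffResult, error) { calls++; return diffResult{NeedsUpdate: true}, nil }
	k := cacheKey{"pv", "vpc", "id-1", "cfg-sha", "out-sha"}
	diffWithCache(c, k, diff)
	diffWithCache(c, k, diff) // hit: diff is not called again
	k.SHAConfig = "cfg-sha-2"
	diffWithCache(c, k, diff) // different inputs: miss
	fmt.Println(calls)        // 2
}
```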
Apply-time correctness does not depend on cache hits — fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` against the same checkout). Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazy-initialised on first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper. platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain so the platform test suite never reads/writes the developer's filesystem cache and so cache state cannot leak across tests with incidentally-aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests). Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/): - Note 1 (env-clamping testability): extract parseConcurrencyEnv as a pure function; new TestParseConcurrencyEnv table-driven test covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5". - Note 2 (parallel-dispatch correctness): new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates, asserts driver.diffCount.Load() == 5 and the resulting plan has 5 actions. - Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing. And T3.6e adversarial-review minor cleanups: - Note 4 (i := i shadowing redundant in Go 1.22+): dropped. - Note 5 (errSentinel uses custom errFromTest): replaced with errors.New. - Note 7 (concurrency contract on ComputePlan godoc): added — p and the ResourceDriver instances it returns MUST be safe for concurrent use. 
New tests (3 cache-behaviour scenarios in differ_cache_test.go):

- TestComputePlan_CacheHitSkipsDiff (a second call against unchanged inputs hits the cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (the disabled backend always re-dispatches)

* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched.

The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove that ≥2 Diff goroutines are simultaneously in flight before any returns: a regression to serial dispatch would hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver is loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile: composing it via `p.Name() + "@" + p.Version()` would let two genuinely different providers, `("foo", "bar@1.0")` and `("foo@bar", "1.0")`, collide on the literal string `"foo@bar@1.0"` and serve each other's cached DiffResults. Today's registered providers (digitalocean, dockercompose, mock) don't carry `@` in either field, so there is no observed bug, but there is no compile-time guard against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): fixed-length, and NUL is invalid in both fields by Unicode convention, so the key is ambiguity-free. This matches how configHash already keys per-config inputs.
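The collision and its fix can be reproduced in a few lines of the standard library. `naiveKey` and `versionKey` are hypothetical names for the before/after compositions described in this commit. The NUL separator only removes the ambiguity under the commit's stated premise that neither field ever contains NUL.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey is the fragile composition the review flagged.
func naiveKey(name, version string) string { return name + "@" + version }

// versionKey hashes name and version joined by NUL, so different
// (name, version) splits of the same characters give different inputs.
func versionKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:]) // fixed length: 64 hex characters
}

func main() {
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))     // true: collision
	fmt.Println(versionKey("foo", "bar@1.0") == versionKey("foo@bar", "1.0")) // false
	fmt.Println(len(versionKey("foo", "1.0")))                                // 64
}
```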
Three regression tests pin the fix:

- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive: empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive, with no change to any existing test outcome. The cache re-keys against the new digest, which means any DiffResults persisted under the old `name@version` keys will miss on the next plan and re-Diff naturally (cache misses are correct by design).

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path.

rev2/rev3-locked: NO env var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1; there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; the v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant.

New plumbing:

- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and a ComputePlanVersion() method so tests drive both the v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level iacProvider.computePlanVersion alongside the existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (It re-reads plugin.json once per provider load rather than threading the value through loadIaCPlugin's 4-tuple var-seam; this keeps the seam signature stable for the existing test override, at the cost of one tiny os.ReadFile vs the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. The drift report is printed to the writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests: TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/undeclared routes legacy), and TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via a writer-buffer substring).

The v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer, to prove the dispatcher's "default to v1 when undeclared" branch.

* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised "Run on success or partial failure", but the code gated on `err == nil` (success only). The contract the comment described is the more useful behavior: operators most need the stale-input diagnostic when an apply fails ("which input went stale during the failed apply?"). Without it, the failure error and the "what changed" context are disconnected. Fix: gate on `result != nil` instead of `err == nil`. printDriftReportIfAny already no-ops on empty/nil reports, so running it unconditionally whenever result is non-nil is safe.
2.
No test for the drift-on-partial-failure path. Added TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure which has applyV2ApplyPlanFn return (resultWithDrift, applyErr) and asserts both: (a) the err propagates, AND (b) the drift report still reaches the writer. 3. Optional-interface coverage gap. Two semantically-different "v1" paths exist: - Path A: provider doesn't implement ComputePlanVersionDeclarer at all → type-assert fails → legacy. Covered by v1RecordingProvider. - Path B: provider implements interface but ComputePlanVersion() returns "" (the realistic mid-transition state for v1 plugins after the SDK update lands but before they migrate) → type- assert succeeds, DispatchVersionFor returns "v1" → legacy. Was untested. Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty using iactest.NoopProvider{DispatchVersion: ""}, which always implements the interface (the method exists on the type). Pins Path B specifically. Pure correctness fixes — no signature change, no behavior change for the success-only or v1-RecordingProvider paths. * fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing current.Sensitive (map[string]bool) directly into the args map. structpb.NewStruct rejects map[string]bool — it accepts map[string]any only — and the upstream plugin/external/convert.go::mapToStruct returns &structpb.Struct{} on err rather than surfacing the typing failure. Result: every Diff dispatch over gRPC for any provider whose ResourceOutput.Sensitive map was non-nil (or even an empty map[string]bool{}) silently observed args=map[] on the plugin side. v1 plugins never tripped this because v1 dispatches IaCProvider.Plan server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's manifest-driven dispatch) surfaces it immediately on the first existing-resource Diff call. Fix: convert via sensitiveToAny() to the map[string]any shape NewStruct accepts. 
Returns nil for empty/nil input so the wire stays trim-friendly. Bug discovered during W-3b T3.9 runtime-launch validation against an out-of-band gRPC stub plugin; the canonical T3.9 in-tree test ships separately as a loader-seam Go integration test (per team-lead direction + plan precedent at plugin/sdk/iaclint/). Will surface in T3.10's PR description as a third incidentally-fixed-by-W-3b bug. * test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007) W-3b T3.9. Exercises the full v2 dispatch chain — config parse → state load → provider load (via the resolveIaCProvider seam from T3.6c) → ComputePlan Diff dispatch (T3.6e/f) → wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace decomposition into Delete + Create → printDriftReportIfAny — by injecting a Go in-process v2-declaring provider through the package- level seam. No out-of-process gRPC binary or plugin.json under internal/testdata/. # ADR 007 — non-trivial deviation from plan-literal Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin in internal/testdata/stub-provider/." Team-lead authorized switching to in-tree loader-seam validation per: 1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go test-helper package, not a runnable binary. 2. Real-gRPC runtime validation lands in P-DO when DO sets computePlanVersion: v2 in its plugin.json. 3. Hours-of-stub-plumbing cost doesn't earn proportional coverage vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end. 4. W-7 conformance suite is the recurring cross-PR gRPC harness. Full reasoning + considered alternatives in docs/adr/007-t3-9-runtime-validation-via-loader-seam.md. # Tests - TestApply_V2_LoaderSeamDispatch_EndToEnd: - Writes a real config + filesystem state seeded with vpc region=nyc3 (under iacStateRecord shape). - Sets desired region=nyc1. - Substitutes the resolveIaCProvider seam to return a Go provider that declares v2 + has a driver returning NeedsReplace=true. 
- Calls applyInfraModules (the production runInfraApply entrypoint) and asserts driver.diffCount == 1, deleteCount == 1, createCount == 1, plus exact identity of the deleted ProviderID and the created Config["region"]. - TestApply_V2_LoaderSeam_DriftReportPrinted: - Same loader-seam setup + applyV2ApplyPlanFn substitution returning InputDriftReport with one entry. - Captures os.Stderr and asserts the FormatStaleError block reaches the operator (drift-report wiring T3.7 added is end-to-end alive in the v2 loader path). # Test infrastructure - cmd/wfctl/main_test.go: NEW TestMain forces WFCTL_DIFFCACHE=disabled so the platform diffcache (process- scoped via getDiffCache lazy init) doesn't observe stale entries from a developer's local ~/.cache/wfctl/diff/ as false-positive cache hits skipping driver Diff dispatch. Same pattern as platform/main_test.go from T3.6f. Caught during dev when the end-to-end test failed in the full cmd/wfctl test run but passed in isolation. # Bug-class context The Option-A draft (real gRPC binary; not retained on this branch per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1 (remoteResourceDriver.Diff sensitiveToAny conversion). The bug exists independent of which T3.9 option ships; the fix is in tree and surfaces in T3.10's PR description as the third W-3b incidentally-fixed bug. * docs(pr): note bugs incidentally fixed by W-3b W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Pure-additive doc; no code changes. Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring: 1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch) 2. ForceNew silently downgraded to Update (T3.6e replace emission) 3. 
map[string]bool drops gRPC args silently — sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it) Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan. * docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions) Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution. This commit: - Status header updated to "Accepted (with extensive deliberation history — see Decision history section)". - Context section adjusted to preface the deliberation history rather than imply a single-direction trajectory. - New Decision history section lists all 5 transitions with verbatim team-lead quotes + per-transition implementer action. - Final paragraph captures the meta-lesson: when team-lead path- flips mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule. Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) unaffected. Closes spec-reviewer's Issue 1 from c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected." 
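The sensitiveToAny fix from the map[string]bool commit, together with its round-3 follow-up that omits `current_sensitive` when empty, can be sketched with the standard library alone. This is an illustration under assumptions: the real helper's signature is not shown in this PR text, and `buildDiffArgs` plus its `config` key are hypothetical. Per the commit message, structpb only accepts nested maps typed as map[string]any, so the bool map has to be re-typed value by value; returning nil for empty input lets the call site skip the key instead of serializing a NullValue.

```go
package main

// sensitiveToAny re-types a map[string]bool as map[string]any, the nested-map
// shape structpb.NewStruct accepts. Nil or empty input returns nil so the
// caller can omit the key instead of sending an explicit null.
// Sketch only: the real helper in cmd/wfctl/deploy_providers.go may differ.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v // each bool is boxed into an interface value
	}
	return out
}

// buildDiffArgs is hypothetical; it mirrors the round-3 call-site fix where
// "current_sensitive" is only added when there is something to send.
func buildDiffArgs(config map[string]any, sensitive map[string]bool) map[string]any {
	args := map[string]any{"config": config}
	if s := sensitiveToAny(sensitive); s != nil {
		args["current_sensitive"] = s
	}
	return args
}
```

Keeping the nil-for-empty rule inside the helper and the presence check at the call site means both halves of the "trim-friendly" contract are visible where they apply.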
* fix(iac): T3.6e env-var hygiene — TestMain unsets WFCTL_PLAN_DIFF_CONCURRENCY (Copilot review)
A developer shell with WFCTL_PLAN_DIFF_CONCURRENCY=1 (or any other non-default value) would serialize ComputePlan's parallel Diff dispatch and break the parallelism assertions in differ tests. Explicitly unset the var in TestMain alongside the existing WFCTL_DIFFCACHE=disabled hygiene so test runs are deterministic regardless of shell environment.
Addresses Copilot inline comment on PR #528 (platform/main_test.go:24).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(iac): T3.6 polish — drop double error: prefix + reuse precomputed configHash (Copilot review round 2)
Two real fixes from Copilot's re-review of PR #528:
1. **Double "error:" prefix on plugin-load failure** — cmd/wfctl/main.go's top-level printer already emits "error: %v" on command failure. The T3.6b error string in cmd/wfctl/infra_plan_provider.go was prefixed with a literal "error: " of its own, producing operator output like `error: error: failed to load plugin "do": ...`. Drop the in-error prefix; update the assertion in infra_plan_provider_load_test.go to match the unprefixed root error; clarify in the CHANGELOG that the "error:" prefix in the rendered string is added by wfctl's top-level printer (not the underlying error).
2. **Duplicate configHash work in classifyModification** — ComputePlan already computes `hash := configHash(spec.Config)` while bucketing create vs modification candidates; classifyModification was re-computing the same hash on every Diff dispatch. Thread the precomputed hash through via a new `hash string` field on modCandidate + new parameter on classifyModification, so the per-candidate hashing happens exactly once.
Addresses Copilot inline comments on PR #528 (round 2):
- cmd/wfctl/infra_plan_provider.go:121
- platform/differ.go:104
Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./interfaces/... ./iac/... ./plugin/sdk/... ./module/... — all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(iac): T3.6/T3.9 polish — diff-cache bypass on empty ProviderID + omit empty current_sensitive arg (Copilot review round 3)
Two real fixes from Copilot's re-review of PR #528 round 3:
1. **Diff-cache hash-collision risk on empty ProviderID** — The cache key shape (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) does not include the resource Name. When two existing-state resources of the same Type both have ProviderID=="" (state-bootstrap, broken-plugin paths, transient races) and matching SHAConfig + SHAOutputs (e.g., both freshly-discovered with default-config and empty-outputs), they would share a cache key and could serve each other's cached DiffResult — misclassifying actions or skipping a required Diff.
Defensive fix: classifyModification now skips both cache.Get and cache.Put when rs.ProviderID is empty, always re-dispatching to the driver. Cost is one extra Diff call per pre-bootstrap resource; benefit is correctness regardless of state completeness. New pin: TestComputePlan_EmptyProviderID_BypassesCache.
2. **`current_sensitive` arg serialized as null instead of omitted** — sensitiveToAny's docstring promises "trim-friendly" wire shape by returning nil for empty input, but the call site at remoteResourceDriver.Diff was unconditionally setting `args["current_sensitive"] = sensitiveToAny(...)`, which structpb serializes as a NullValue field rather than omitting the key. Conditionally include the key only when sensitiveToAny returns a non-nil map, matching the docstring intent. New pins: TestRemoteDriver_Diff_OmitsCurrentSensitiveWhenEmpty + TestRemoteDriver_Diff_IncludesCurrentSensitiveWhenPopulated.
Addresses Copilot inline comments on PR #528 (round 3):
- platform/differ.go:240 (cache key empty-ProviderID collision)
- cmd/wfctl/deploy_providers.go:542 (current_sensitive null vs omit)
Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... — all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(iac): T3.6/T3.9 polish — preserve loadErr chain + lock-free diff cache + bypass-side-effect-free (Copilot review round 4)
Three real fixes from Copilot's re-review of PR #528 round 4:
1. **loadErr chain lost across runInfraPlan re-wrap (errors.Is/As)** — computePlanForInfraSpecs returned `failed to load plugin %q: %v; ...` (using %v), losing the underlying error. After runInfraPlan re-wraps with `compute plan: %w`, callers could not errors.Is / errors.As against the original loader failure (e.g. to differentiate "plugin binary missing" from "plugin crashed during handshake"). Switch the inner wrap to %w. Rendered text is identical to %v. New pin: TestRunInfraPlan_FailsLoudOnPluginLoadFailure now asserts `errors.Is(err, loadErr)` reaches the sentinel through both wrap layers.
2. **getDiffCache called even on the empty-ProviderID bypass path** — classifyModification was calling getDiffCache() unconditionally, which (under the old per-call mutex) acquired the lock, and (under any backend-init pattern) would eagerly construct the filesystem cache backend at ~/.cache/wfctl/diff/ on the operator's machine even for resources that bypass the cache. Move the getDiffCache call inside the `if cacheable` branch so the bypass path is fully side-effect free. Round-3 already pinned the bypass behavior via TestComputePlan_EmptyProviderID_BypassesCache.
3. **Per-call sync.Mutex contention on getDiffCache hot path** — Under ComputePlan's parallel Diff fan-out (planDiffConcurrency() workers), the per-call mutex on getDiffCache caused contention on every cache.Get / cache.Put, especially on cache hits where the Get itself is cheap. Refactor to sync.Once for one-time init + atomic.Pointer[diffcache.Cache] for lock-free reads. Subsequent reads are just an atomic.Load (and a typed deref). The test-swap helper setDiffCacheForTest is updated to Store/Restore directly on the atomic; cleanup seeds a fresh default when there was no prior value (so subsequent tests in the binary still observe a working cache).
Addresses Copilot inline comments on PR #528 (round 4):
- cmd/wfctl/infra_plan_provider.go:124 (%v → %w)
- platform/differ.go:235 (getDiffCache eager call on bypass path)
- platform/differ.go:405 (per-call mutex on hot path)
Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... ./module/... — all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(iac): T3.7/T3.6 polish — DispatchVersionFor centralizes type assertion + cache nil-DiffResult as zero-value (Copilot review round 5)
Two real fixes from Copilot's re-review of PR #528 round 5 (a third finding, plan/apply discovery duplication, is filed as a follow-up issue rather than addressed in-PR to keep W-3b scope-locked).
1. **DispatchVersionFor docstring vs signature mismatch** — The helper claims to centralize the type assertion + non-implementer defaulting, but its parameter type was `ComputePlanVersionDeclarer`, forcing every call site to type-assert externally. Change the signature to accept `any` and perform the type assertion inside; non-implementers + nil now both return "v1" inside the helper as the docstring already promised. Param is `any` (not interfaces.IaCProvider) to keep the helper package import-free of the engine's interfaces package and to keep non-engine call sites (tests, stubs) frictionless. Updated the only production call site (cmd/wfctl/infra_apply.go) to drop the external type-assert.
2. **Cache no-op when driver.Diff returns (nil, nil)** — The cache.Put was guarded by `fresh != nil`, so providers using the nil-as-no-op convention (a documented option in the (DiffResult|nil, error|nil) return shape) re-Diffed on every ComputePlan call — undermining the cache contract for that whole class of providers. Cache a zero-value DiffResult on (nil, nil) returns; classifyModification's downstream switch already treats zero-value the same as nil (no plan action), so the semantic is preserved while the cache stays effective. New pin: TestComputePlan_NilDiffResult_CachesAsZeroValue verifies that the second ComputePlan against unchanged inputs is served from cache (driver.Diff invoked exactly once across two calls).
3. **Plan/apply provider-discovery duplication** (Copilot finding R5-C, not addressed in this PR) — computePlanForInfraSpecs duplicates the iac.provider discovery + grouping logic in applyInfraModules. Per workspace memory feedback_implementer_scope_bleed, refactoring to a shared helper is a separate task: the duplication exists pre-W-3b (apply was the original; plan was added in W-3b mirroring it intentionally), and the extraction touches code paths W-3b's test plan does not cover. Filed as follow-up rather than expanding W-3b's blast radius. Documented in PR description.
Addresses Copilot inline comments on PR #528 (round 5):
- iac/wfctlhelpers/dispatch.go:41 (signature vs docstring mismatch)
- platform/differ.go:265 (cache write skipped on (nil, nil))
Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... — all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(iac): T3.7 — correct DispatchVersionFor + findIaCPluginDir doc claims (Copilot review round 6)
Two doc-comment accuracy fixes from Copilot's re-review of PR #528 round 6 — both surfaced by/exposed in the round-5 changes:
1. **findIaCPluginDir docstring referenced wrong helper** — Round 5 changed wfctlhelpers.DispatchVersionFor to take `any` (a provider value), but findIaCPluginDir's docstring still told callers to pass the returned `computePlanVersion` string through DispatchVersionFor. That call wouldn't type-assert to ComputePlanVersionDeclarer (a string isn't a provider) and would silently default to "v1". Replaced with the correct pattern: string-equality against wfctlhelpers.DispatchVersionV2 at this loader-level seam where only the raw string is in hand. Includes example snippet.
2. **DispatchVersionFor docstring overstated the validation guarantee** — Claimed plugin/sdk.ParseManifest schema-validation means the dispatch only sees {"v1", "v2", ""}. True for callers that load via ParseManifest, but cmd/wfctl/deploy_providers.go's findIaCPluginDir / readIaCPluginComputePlanVersion path uses a minimal json.Unmarshal with NO schema validation — so unknown values CAN reach DispatchVersionFor at runtime. Updated the docstring to flag this honestly and call out that the default-to-v1 behavior is the safety net for those paths (callers must not rely on the validation guarantee).
Doc-only; no code change. All packages still build + vet cleanly.
Addresses Copilot inline comments on PR #528 (round 6):
- cmd/wfctl/deploy_providers.go:107 (wrong helper referenced)
- iac/wfctlhelpers/dispatch.go:18 (overstated validation guarantee)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(iac): T3.5 — TestParseConcurrencyEnv subtest names (Copilot review round 7)
The first table case had `in: ""` and used `tc.in` directly as the t.Run subtest name. Go's testing package silently rewrites empty subtest names to "#00", which is unique enough to run but masks the case identity in -v output and failure reports.
Add a `name` field to the table struct and use stable descriptive labels (empty, non_numeric, negative, zero, one, eight, thirty_two, thirty_three_clamped_to_max, one_hundred_clamped_to_max) while still passing the raw `tc.in` to parseConcurrencyEnv. Identical test coverage; clearer reporting.
Addresses Copilot inline comment on PR #528 (round 7):
- platform/differ_cache_test.go:253
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(iac): T3.5/T3.6 — clamp + in-flight counter doc accuracy (Copilot review round 8)
Two doc-only nits surfaced in Copilot's round-8 re-review of PR #528. Both are accuracy fixes — no behaviour change.
1. **planDiffConcurrencyMin/Max comment overstated "disable"** — The comment said "Below 1 disables concurrency (worse than serial)", but parseConcurrencyEnv clamps values <=0 UP to planDiffConcurrencyMin (=1), which produces effectively-serial dispatch (one Diff in flight), not "disabled". Operators cannot turn the worker pool off, only narrow it to one. Updated the comment to spell that out and call out both clamp directions explicitly.
2. **channelGatedDriver.inFlight docstring claimed "peak"** — The docstring said inFlight tracks the *peak* number of simultaneous Diff goroutines, but…
intel352 added a commit that referenced this pull request on May 4, 2026
* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type
* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel
* feat(iac): wfctl infra plan writes InputSnapshot to plan.json
* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash
* feat(iac): wfctl infra plan warns when plan.json not in .gitignore
* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring
* feat(iac): add refreshoutputs.Refresh — read-only state output refresh
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): add wfctl infra refresh-outputs subcommand
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.
Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
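The ownership rule in T2.2 (ProviderRef first, then provider type only when exactly one module of that type exists) and the literal not-configured error can be sketched as below. The types are hypothetical simplifications, and returning an error for an unresolvable entry is an assumption; the real command may skip or warn instead.

```go
package main

import "fmt"

// Hypothetical simplified types for this sketch.
type providerModule struct {
	Name         string // iac.provider module name
	ProviderType string // e.g. "digitalocean"
}

type stateEntry struct {
	Name         string
	ProviderRef  string // owning module name, may be empty
	ProviderType string
}

// groupByProvider assigns each state entry to its owning iac.provider module.
func groupByProvider(env string, mods []providerModule, states []stateEntry) (map[string][]stateEntry, error) {
	if len(mods) == 0 {
		// Literal error text quoted in the T2.2 commit message.
		return nil, fmt.Errorf("refresh-outputs: provider not configured for env %q", env)
	}
	known := make(map[string]bool, len(mods))
	byType := make(map[string][]string)
	for _, m := range mods {
		known[m.Name] = true
		byType[m.ProviderType] = append(byType[m.ProviderType], m.Name)
	}
	groups := make(map[string][]stateEntry)
	for _, s := range states {
		switch {
		case s.ProviderRef != "" && known[s.ProviderRef]:
			// ProviderRef wins when it names a configured module.
			groups[s.ProviderRef] = append(groups[s.ProviderRef], s)
		case len(byType[s.ProviderType]) == 1:
			// Type fallback only when it is unambiguous.
			owner := byType[s.ProviderType][0]
			groups[owner] = append(groups[owner], s)
		default:
			// Assumption: the real command may skip or warn instead.
			return nil, fmt.Errorf("state %q: cannot resolve owning provider (ref %q, type %q)",
				s.Name, s.ProviderRef, s.ProviderType)
		}
	}
	return groups, nil
}
```

Refusing the type fallback when two modules share a type avoids silently reading a resource through the wrong account or region.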
* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)
T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.
Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
no-op. Operators who use the "0"/"false" convention to disable a
feature get the expected behaviour rather than a presence-only
foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
environments that force the env var on globally).
Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).
Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
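The activation rules above reduce to a small predicate. A minimal sketch with a hypothetical function name; in practice the env value would come from `os.Getenv("WFCTL_REFRESH_OUTPUTS")`. It relies on strconv.ParseBool, which accepts 1, t, T, TRUE, true, True as true, the matching 0/f/F/FALSE/false/False spellings as false, and returns an error for anything else, including the empty string.

```go
package main

import "strconv"

// shouldRunRefreshPreStep sketches the WFCTL_REFRESH_OUTPUTS activation rules.
// Hypothetical name; the real gate lives in cmd/wfctl/infra_apply_refresh_pre.go.
func shouldRunRefreshPreStep(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // --skip-refresh always wins, even if the env var is on
	}
	on, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // unset, empty, or unrecognised: default OFF
	}
	return on // truthy runs the pre-step; falsey ("0", "false", ...) does not
}
```

Treating a parse error as "off" is what closes the presence-only foot-gun: `WFCTL_REFRESH_OUTPUTS=0` and `WFCTL_REFRESH_OUTPUTS=yes` both leave the pre-step disabled.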
* test(iac): concurrency stress test for refreshoutputs.Refresh
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no
write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving
both that parallelism happened AND that the semaphore enforced
its limit.
The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (100 Reads × 5ms / 8 workers ≈ 63ms of ideal
wall time; well under the 10s watchdog). The test takes ~1.5s wall-clock.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(wfctl): document infra refresh-outputs subcommand
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
literal-error contract (load-bearing per T2.7), apply-time pre-step
semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
representative examples.
See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.
Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
This failure is PRE-EXISTING. Verified by checking out the file at
the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
mdformat against it: identical error. docs/WFCTL.md has never been
mdformat-formatted in this repo. Reformatting the entire file is
out of scope for T2.6 (would introduce a multi-thousand-line
unrelated diff). T2.6's own additions follow the existing in-file
conventions exactly.
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
docs/WFCTL.md has zero broken links — including the new
refresh-outputs section. The directory-wide scan reports 7 broken
links in unrelated files (self-improvement-tutorial.md,
getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
  -c string
        Config file (short for --config)
  -concurrency int
        Maximum concurrent Read calls (default 8)
  -config string
        Config file
  -e string
        Environment name (short for --env)
  -env string
        Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
  - name: state-store
    type: iac.state
    config:
      backend: filesystem
      directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned
by T2.7 (plan line 1098), produced by T2.2's
fmt.Errorf("refresh-outputs: provider not configured for env %q",
env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available
in this worktree environment. Plan line 1102 marks this "if
available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3
ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
plan revert; provenance recorded in the ADR itself.
Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)
* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema
Addresses code-reviewer findings on commit 695a070:
- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
capture both *jsonschema.Schema and the compile error so concurrent
callers observe a single deterministic outcome. Adds a 32-goroutine
ParseManifest stress test that fires under -race to lock in the
invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
cannot mutate the //go:embed slice (defense-in-depth; embed slices
are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
typo like "computeplanversion" or an unknown key is rejected at
parse time instead of silently defaulting to v1 dispatch. The root
object stays permissive — existing plugin.json files carry
version/author/dependencies/etc. and the SDK manifest is a strict
subset by design. New test covers both the typo-rejection and the
root-permissivity contracts.
* feat(iac): add refreshoutputs.Refresh — read-only state output refresh
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): add wfctl infra refresh-outputs subcommand
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.
Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)
T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.
Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
no-op. Operators who use the "0"/"false" convention to disable a
feature get the expected behaviour rather than a presence-only
foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
environments that force the env var on globally).
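The activation rules map directly onto `strconv.ParseBool`, which accepts 1, t, T, TRUE, true, True, 0, f, F, FALSE, false, False and returns an error for anything else, including the empty string. Below is a hypothetical helper that encodes the rules. The name and signature are assumptions, not the code in infra_apply_refresh_pre.go.

```go
package main

import "strconv"

// refreshPreStepEnabled is a hypothetical helper mirroring the activation
// rules: --skip-refresh always wins; ParseBool-truthy enables; falsey,
// empty, or unrecognised values leave the pre-step off.
func refreshPreStepEnabled(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false
	}
	on, err := strconv.ParseBool(envValue)
	if err != nil {
		return false // "", "yes", "enabled", ... behave like unset
	}
	return on
}
```

A caller would pass `os.Getenv("WFCTL_REFRESH_OUTPUTS")` and the parsed `--skip-refresh` flag, which keeps the decision itself testable without touching the environment.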
Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).
Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(iac): concurrency stress test for refreshoutputs.Refresh
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no
write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving
both that parallelism happened AND that the semaphore enforced
its limit.
The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (100 Reads × 5ms / 8 workers ≈ 63ms of Read
time at full parallelism; well under the 10s watchdog). Test runs ~1.5s wall.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(wfctl): document infra refresh-outputs subcommand
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
literal-error contract (load-bearing per T2.7), apply-time pre-step
semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
representative examples.
See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.
Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
This failure is PRE-EXISTING. Verified by checking out the file at
the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
mdformat against it: identical error. docs/WFCTL.md has never been
mdformat-formatted in this repo. Reformatting the entire file is
out of scope for T2.6 (would introduce a multi-thousand-line
unrelated diff). T2.6's own additions follow the existing in-file
conventions exactly.
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
docs/WFCTL.md has zero broken links — including the new
refresh-outputs section. The directory-wide scan reports 7 broken
links in unrelated files (self-improvement-tutorial.md,
getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
-c string
Config file (short for --config)
-concurrency int
Maximum concurrent Read calls (default 8)
-config string
Config file
-e string
Environment name (short for --env)
-env string
Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
- name: state-store
type: iac.state
config:
backend: filesystem
directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned
by T2.7 (plan line 1098), produced by T2.2's
fmt.Errorf("refresh-outputs: provider not configured for env %q",
env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available
in this worktree environment. Plan line 1102 marks this "if
available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3
ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
plan revert; provenance recorded in the ADR itself.
Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields
* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)
* fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract
Addresses code-reviewer findings on commit 13a6fad:
- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
Name" but the populating site (T3.4 plan §1625) sets
result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
this. Re-worded to "Keyed by the *replaced* resource's Name" with an
explicit reference to action.Resource.Name + a sentence on how W-5 JIT
substitution will use the map (lookup by replaced-resource name to
obtain the new ProviderID for dependent configs). Locks the contract
before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
to its enforcing test (TestComputeDrift_ResultIsSortedByName in
iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
nil and empty-but-non-nil values for all three new fields, asserting
the JSON keys are absent from the encoded form. Locks the omitempty
tag behavior so a future refactor cannot silently regress to emitting
"initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
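The contract that TestApplyResult_OmitEmptyContract locks follows from `encoding/json` itself: `omitempty` drops nil and empty-but-non-nil maps and slices alike. The struct below is a hypothetical, trimmed stand-in for `interfaces.ApplyResult`. The JSON keys come from the commit text; the Go field types (in particular `[]string` for the drift report) are assumptions.

```go
package main

import "encoding/json"

// applyResultSketch is a hypothetical stand-in for the three new fields.
type applyResultSketch struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []string          `json:"input_drift_report,omitempty"`
	// Keyed by the *replaced* resource's Name (action.Resource.Name); the
	// value is the new ProviderID produced by the Create half of Replace.
	ReplaceIDMap map[string]string `json:"replace_id_map,omitempty"`
}

func encode(r applyResultSketch) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}
```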
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test
Addresses code-reviewer findings on commit 8416498:
- Important 1 (weak Replace assertion): converted fakeDriver from
boolean call recorders to integer counters. The 4-action plan
[create, update, replace, delete] now asserts Create==2, Update==1,
Delete==2. If "case replace" were silently dropped from
dispatchAction the counts would shift to 1/1/1 and the test would
fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
fakeProvider.driverErr, asserts the canonical "resolve driver:"
prefix, and verifies the loop continues past action[0] to action[1]
(best-effort contract). Folded the loop-continues-after-failure
coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
using a selectiveFakeProvider that errors on one type only — proves
one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
fmt.Sprintf("resolve driver: %v", err) since the destination is a
string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
iteration boundary; on cancel, returns the result accumulated so far
+ the ctx error as top-level. Added
TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
documenting the same-name-same-type invariant for Replace plans.
Documenting rather than enforcing — ComputePlan upstream is the
contract owner.
Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test
Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.
Pure-mechanical cleanup. No behavior change.
* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)
* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery
* feat(iac): doUpdate + doDelete actions
* feat(iac): doReplace populates ApplyResult.ReplaceIDMap
* feat(iac): add diff cache with LRU eviction + corruption recovery
* fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy
Three independent review-fix bundles:
T3.1.5 (commit f5a7ce9 review — Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to
inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's
fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex
imports. Future Compute-algorithm changes (prefix length, hash) now
re-align both test files automatically — keeps the cross-package
fixture parity guaranteed.
T3.2 (commit 0c30eec review — Minors 1 + 2):
- apply_create_test.go gains
TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter
+ alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm
of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type
assertion — distinct code path from the existing
ok-but-SupportsUpsert==false test. Compile-time premise check
ensures the test stays meaningful if a future refactor lifts
SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make
the in-package vs at-the-ActionError-boundary distinction explicit.
External callers reading [interfaces.ApplyResult].Errors lose
errors.Is matching at the string-conversion boundary; the canonical
"upsert: read after conflict:" prefix is the discriminant. Also
documents the single-pass recovery contract (recovery Update that
itself returns ErrResourceAlreadyExists surfaces unchanged rather
than retriggering the recovery loop).
T3.3 (commit a3fc98b review — Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively
now also asserts len(result.Resources) == 1 on the success path —
locks the resource-append contract so a regression that skipped the
append on nil Current would fail loudly.
- apply_update_delete_test.go gains parallel
TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive
shape: empty ProviderID flows to driver, no synthesized precondition
error, deleteCount==1 (latent bug-fix from design — the v1 path
silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy"
section documenting the decompose-then-prefix rule (bare on simple
actions; "upsert: ..." / "replace: ..." on decomposing paths) so
future reviewers don't suggest "let's add prefixes for consistency."
* fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace
Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703.
Without the guard, a Ctrl-C / SIGTERM arriving exactly between the
Delete and Create driver calls of a Replace action would still
trigger the Create — surprising operators who expected fast
interruption mid-Replace. The half-replaced state is still the
documented recovery surface (Delete happened, Create did not, so
ReplaceIDMap stays empty), but cancellation now propagates as soon
as it is observable.
Failure shape:
return fmt.Errorf("replace: canceled after delete: %w", err)
Wrapped to preserve the context.Canceled / context.DeadlineExceeded
sentinel for in-package errors.Is matching. The "replace: canceled
after delete:" string prefix is the discriminant for callers reading
result.Errors at the public API surface.
New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate +
cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a
captured context.CancelFunc as a side-effect, simulating exact
post-Delete cancellation. Asserts Delete ran, Create did NOT,
ReplaceIDMap stays empty for the resource, error has the canonical
prefix.
Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this
commit since it's the symmetric coverage for the new guard.
Other Minors (2/4/5/6/7) intentionally skipped — all documentary or
out-of-scope per reviewer guidance.
* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows
T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer
finding on commit 8774205. Two plan-mandated deliverables that the
T3.5 commit's `git add` line omitted:
1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache
as an amortization-only optimization (not correctness mechanism),
the WFCTL_DIFFCACHE backend selection (disabled / :memory: /
filesystem default), the LRU eviction caps (1024 entries / 64 MiB),
the corruption recovery contract (silent eviction + once-per-process
info log), the plugin-downgrade safety property, and the rev3
"all CI workflows set :memory: explicitly" statement plus a list
of the affected workflow files.
2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in
every workflow that runs `go test` or `wfctl`:
- .github/workflows/ci.yml (test + lint jobs)
- .github/workflows/benchmark.yml (performance benchmarks)
- .github/workflows/pre-release.yml (pre-release tests)
- .github/workflows/release.yml (release tests)
- .github/workflows/dependency-update.yml (post-update test gate)
Workflow files that don't invoke go test / wfctl are not modified
(codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml,
osv-scanner.yml, test-dispatch.yml).
Each workflow gets a brief inline comment citing ci.yml as the
canonical rationale + the T3.5 rev3 lifecycle constraint reference.
Per spec-reviewer guidance: kept the original T3.5 package-code commit
(8774205) untouched and stacked this docs+CI commit on top. YAML
syntax verified on all 5 modified workflows.
* fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup
Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:
- Minor 1 (atomic Put, worth-doing production improvement): Put now
uses write-temp-then-rename. POSIX rename(2) is atomic on the same
filesystem, so a process crash mid-write leaves either the prior
contents or the new contents — never a partial write. The
corruption-recovery path in Get is still the safety net for cross-
filesystem renames or NFS edge cases that don't honor atomicity.
In production this means a process crash should essentially never
trigger corruption recovery. The .json extension filter in
maybeEvict already excludes .tmp orphans, so no additional
filtering needed. On rename failure, best-effort cleanup of the
temp file.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions
language. Linux honors XDG_CACHE_HOME; macOS uses
~/Library/Caches; Windows uses %LocalAppData%. The previous
comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note
explaining the tags are for log/transcript serialization, not
cache keying — keyFingerprint uses NUL-separated string concat,
not JSON marshaling. Future readers checking the fingerprint
shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the
`os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the
end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was
meaningless — no code path creates a file with `*` in its name.
Likely leftover from earlier debugging. Removing it lets us drop
the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to
TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime
resolution assumption and listing the supported filesystems
(ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime
filesystems (FAT32, SMB) are explicitly out of scope.
Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class
concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing
in the package godoc already sets the expectation."
* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)
* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)
W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider
modules in the config, groups desired specs by `provider:` field, loads
each via the same loader the apply path uses, and dispatches
platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates
against a real plugin process at plan time, not just at apply time.
BREAKING: configs declaring at least one iac.provider module now require
the plugin process to load successfully. Plugin-load failure exits
non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2);
operators who need pure offline validation should use `wfctl validate`.
Configs without any iac.provider module fall back to the legacy
ConfigHash compare path so minimal/legacy fixtures and out-of-band
scripts continue to work.
cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the
package compiles; T3.6c replaces nil with the live provider handle.
* feat(iac): wfctl infra apply threads provider into ComputePlan
* test(iac): update cross-package fakes for ComputePlan provider arg
W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in
module/infra_module_integration_test.go to the new (ctx, provider, …)
signature. Lifts the no-op fake into a small public test helper at
iac/iactest/fakeprovider.go so the same shape no longer needs to be
re-declared every time a new package wants to satisfy the interface.
Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's
computePlanForInfraSpecs now dispatches via the same computeInfraPlan
seam the apply path uses (no parallel seam variable; one override point
serves both call sites). Plan-loop body is wrapped in an IIFE so each
provider's closer fires after its group is computed instead of
deferring to function exit (multi-provider plan no longer holds N gRPC
connections open at once).
Drops the duplicated planNoopProvider and applyV2RecordingProvider
no-op implementations in cmd/wfctl tests in favor of the shared
iactest.NoopProvider. Three structurally-identical 14-method shells
become one. Atomic counters carried forward where used.
Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated
in first-reference-in-`desired` order, not iac.provider declaration
order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply
(loop over groupOrder is empty when no specs reference any provider;
use `wfctl infra destroy --dry-run` to preview teardown).
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace
W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).
ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:
- replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
is true (the latter closes design issue C — pre-W-3b ForceNew was
silently downgraded to update);
- update, when DiffResult.NeedsUpdate is true and replace did not
fire;
- skip, when neither flag is set.
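The three emission rules reduce to a small precedence function. Replace beats update, and either replace trigger (NeedsReplace, or ForceNew on any field) is enough on its own. The types below are hypothetical mirrors that carry only the flags named above, and the `Changes` field name is an assumption. An empty string models the skip case, and a nil result emits nothing, matching Note 3 of the T3.6f commit.

```go
package main

// FieldChange and DiffResult are hypothetical, trimmed mirrors of the
// v2 Diff result shape; only the flags read by classify are included.
type FieldChange struct {
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange // field name is an assumption
}

// classify returns "replace", "update", or "" (skip) for one existing resource.
func classify(d *DiffResult) string {
	if d == nil {
		return "" // nil DiffResult emits nothing
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace" // ForceNew is no longer downgraded to update
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return ""
}
```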
Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).
Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):
- TestComputePlan_NeedsReplaceEmitsReplaceAction
- TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
- TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
- TestComputePlan_DiffReturnsNoChanges_EmitsNothing
- TestComputePlan_NilProvider_FallsBackToConfigHash
- TestComputePlan_NilDriver_FallsBackToConfigHash
- TestComputePlan_DriverDiffError_PropagatesAsError
platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).
Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
phrasing.
cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
rely on the post-adopt ComputePlan emitting update; pre-W-3b that
was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
panicking — the refresh-outputs test fixture only exercises Read.
* perf(iac): ComputePlan consults diffcache before invoking provider.Diff
W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into
classifyModification: cache.Get is consulted before each
ResourceDriver.Diff dispatch under the (PluginVersion, Type,
ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached
DiffResult is used directly; on miss, the freshly-computed result is
Put into the cache. Apply-time correctness does not depend on cache
hits — fresh CI runners always miss and re-Diff (the cache is purely
an amortization optimization for repeated `wfctl infra plan` against
the same checkout).
Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env
var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" →
in-memory; "disabled" → noop. The package-level cache instance is
lazy-initialised on first ComputePlan call and shared across
subsequent calls; tests in the same package may swap it via the
internal-package setDiffCacheForTest helper.
platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain
so the platform test suite never reads/writes the developer's
filesystem cache and so cache state cannot leak across tests with
incidentally-aligned cache keys (caught during integration: T3.6e's
Replace-emission test was Putting a result that polluted later
update/no-op tests).
Folds in the T3.6e code-review IMPORTANT carry-forwards (since both
fixes touch platform/):
- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a
pure function; new TestParseConcurrencyEnv table-driven test covers
empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new
TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises
N=5 modification candidates, asserts driver.diffCount.Load() == 5
and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test
TestComputePlan_DriverReturnsNilDiff_EmitsNothing.
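Extracting the env parsing as a pure function is what makes the table-driven test possible: the caller reads `os.Getenv("WFCTL_PLAN_DIFF_CONCURRENCY")`, and the helper never touches the environment. Here is a hypothetical sketch. It assumes empty or non-numeric input falls back to the default of 8 and numeric input is clamped into 1..32. The text above does not say whether the real helper maps "0" and negatives to 1 or to the default.

```go
package main

import "strconv"

const (
	defaultDiffConcurrency = 8
	minDiffConcurrency     = 1
	maxDiffConcurrency     = 32
)

// parseConcurrency is a hypothetical take on parseConcurrencyEnv.
func parseConcurrency(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultDiffConcurrency // empty or non-numeric
	}
	if n < minDiffConcurrency {
		return minDiffConcurrency
	}
	if n > maxDiffConcurrency {
		return maxDiffConcurrency
	}
	return n
}
```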
And T3.6e adversarial-review minor cleanups:
- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with
errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added — p and
the ResourceDriver instances it returns MUST be safe for concurrent
use.
New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged
inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig
forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always
re-dispatches)
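The get-before-Diff flow can be sketched with an in-memory map standing in for the `:memory:` backend. Everything here is hypothetical. `cacheKey` mirrors the five-field tuple named above, the cached value is simplified to a string instead of a `DiffResult`, and the real `iac/diffcache` API may differ. The sketch reproduces the hit and miss behaviour that the first two cache tests assert.

```go
package main

// cacheKey is a hypothetical mirror of the (PluginVersion, Type, ProviderID,
// SHAConfig, SHAOutputs) tuple; all fields are strings so it can key a map.
type cacheKey struct {
	PluginVersion string
	Type          string
	ProviderID    string
	SHAConfig     string
	SHAOutputs    string
}

// diffCache is a hypothetical interface; the real package API may differ.
type diffCache interface {
	Get(k cacheKey) (string, bool)
	Put(k cacheKey, v string)
}

// memCache stands in for the ":memory:" backend.
type memCache map[cacheKey]string

func (m memCache) Get(k cacheKey) (string, bool) {
	v, ok := m[k]
	return v, ok
}

func (m memCache) Put(k cacheKey, v string) {
	m[k] = v
}

// diffWithCache consults the cache first; on a miss it calls diff and stores
// the result. Diff errors are returned and never cached.
func diffWithCache(c diffCache, k cacheKey, diff func() (string, error)) (string, error) {
	if v, ok := c.Get(k); ok {
		return v, nil
	}
	v, err := diff()
	if err != nil {
		return "", err
	}
	c.Put(k, v)
	return v, nil
}
```

Because the key includes SHAConfig and SHAOutputs, any change to the desired config or the observed outputs produces a new key. A stale entry is simply never looked up again, so no explicit invalidation is needed.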
* test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review)
Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff
(landed in T3.6f) per team-lead's explicit request: a regression that
accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would
still pass the count-only assertion as long as every candidate
eventually got dispatched. The new
TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses
a channel-gated driver to prove ≥2 Diff goroutines are simultaneously
in-flight before any returns: regression to serial dispatch would
hang on the second `<-entered` and time out at 5s.
Pure addition (no production-code change). cacheTestProvider.driver
loosened from *cacheTestDriver to interfaces.ResourceDriver so the
new channelGatedDriver shares the provider shell.
* fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review)
Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.
Replace with sha256(name + "\x00" + version): fixed-length, and
unambiguous as long as neither field contains a NUL byte, which plugin
names and versions are not expected to carry.
Matches how configHash already keys per-config inputs.
Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)
Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).
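The collision and its fix are easy to reproduce. Here is a hypothetical sketch that takes the name and version as plain strings. The real `pluginVersionKey` takes the provider and returns an empty key for a nil provider.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// pluginVersionKeyFor is a hypothetical sketch of the fixed key:
// sha256 over name + NUL + version, hex-encoded to 64 characters.
func pluginVersionKeyFor(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}
```

With the old `name + "@" + version` join, both ("foo", "bar@1.0") and ("foo@bar", "1.0") produce "foo@bar@1.0". With the NUL separator, the two hashed inputs differ, so the digests differ.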
* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion
W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the
loaded plugin's plugin.json declares iacProvider.computePlanVersion:
v2 (read at provider load time and surfaced via the optional
ComputePlanVersionDeclarer interface). Providers that don't declare
the field, or declare anything other than "v2", take the legacy
provider.Apply path.
rev2/rev3-locked: NO env-var, NO operator-flippable gate. The
v1/v2 routing is plugin-author-controlled via plugin.json from day 1
— there is no transitional WFCTL_USE_V2_APPLY flag to misuse.
Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5
as foundation only). The v2 dispatch path is the production caller
that surfaces the InputDriftReport to stderr after a successful
ApplyPlan return; v1 path remains untouched per the W-3a "zero
runtime change for v1 plugins" invariant.
New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer
interface + DispatchVersionV2 const + DispatchVersionFor helper.
Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion +
ProviderVersion fields and ComputePlanVersion() method so tests
drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level
iacProvider.computePlanVersion alongside existing
capabilities.iacProvider.name; findIaCPluginDir returns the
version; readIaCPluginComputePlanVersion is the load-time helper;
remoteIaCProvider stores the value and exposes it via
ComputePlanVersion() to satisfy the optional interface. (Re-reads
plugin.json once per provider load rather than threading through
loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable
for the existing test override; cost is one tiny os.ReadFile vs
the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan
test seam + dispatch branch in applyWithProviderAndStore. Drift
report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover
TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2
routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply
(v1/un-declared routes legacy), and
TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted
via writer-buffer
substring). v1 fixture v1RecordingProvider intentionally does NOT
implement ComputePlanVersionDeclarer to prove the dispatcher's
"default to v1 when un-declared" branch.
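The routing rule fits in one type assertion, including the Path A / Path B distinction that the T3.7 review commit pins. Here is a hypothetical sketch. The interface and the `v2` constant mirror the names above, while `dispatchVersionFor` and the two fake providers are illustrative only.

```go
package main

// ComputePlanVersionDeclarer mirrors the optional interface named above.
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

const DispatchVersionV2 = "v2"

// dispatchVersionFor is a hypothetical sketch: "v2" only when the provider
// implements the optional interface AND declares exactly "v2".
func dispatchVersionFor(p any) string {
	if d, ok := p.(ComputePlanVersionDeclarer); ok && d.ComputePlanVersion() == DispatchVersionV2 {
		return DispatchVersionV2
	}
	return "v1" // Path A (no interface) and Path B (declares "") both land here
}

// legacyProvider does not implement the interface (Path A).
type legacyProvider struct{}

// declaringProvider implements it with a configurable value (Path B / v2).
type declaringProvider struct{ version string }

func (d declaringProvider) ComputePlanVersion() string { return d.version }
```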
* fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review)
Code-reviewer flagged 3 IMPORTANT items in T3.7:
1. Comment/code mismatch on drift-report timing. The comment promised
"Run on success or partial failure" but the code gated on
`err == nil` (success only). The contract the comment described
is the more useful behavior — operators most need the
stale-input diagnostic when an apply fails ("which input went
stale during the failed apply?"). Without it, the failure error
and the "what changed" context are disconnected.
Fix: gate on `result != nil` instead of `err == nil`.
printDriftReportIfAny already no-ops on empty/nil reports so
unconditional-on-result-non-nil is safe.
2. No test for the drift-on-partial-failure path. Added
TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
and asserts both: (a) the err propagates, AND (b) the drift
report still reaches the writer.
3. Optional-interface coverage gap. Two semantically-different "v1"
paths exist:
- Path A: provider doesn't implement ComputePlanVersionDeclarer
at all → type-assert fails → legacy. Covered by
v1RecordingProvider.
- Path B: provider implements interface but ComputePlanVersion()
returns "" (the realistic mid-transition state for v1 plugins
after the SDK update lands but before they migrate) → type-
assert succeeds, DispatchVersionFor returns "v1" → legacy.
Was untested.
Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
using iactest.NoopProvider{DispatchVersion: ""}, which always
implements the interface (the method exists on the type). Pins
Path B specifically.
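The dispatch rule these tests pin down (v2 only when declared; v1 for both Path A and Path B) can be sketched as an optional-interface type assertion. This is a minimal illustration, not the wfctl code: ComputePlanVersionDeclarer, ComputePlanVersion and DispatchVersionFor are named in the text above, but their signatures, the IaCProvider stand-in, and the two fixture types are assumptions.

```go
package main

// IaCProvider is a hypothetical stand-in for the real provider interface;
// only the optional-interface pattern matters here.
type IaCProvider interface {
	Name() string
}

// ComputePlanVersionDeclarer is the optional interface a provider MAY
// implement to declare its dispatch version (method signature assumed).
type ComputePlanVersionDeclarer interface {
	ComputePlanVersion() string
}

// DispatchVersionFor routes to the declared version and defaults to "v1"
// when the provider lacks the interface (Path A) or returns "" (Path B).
func DispatchVersionFor(p IaCProvider) string {
	d, ok := p.(ComputePlanVersionDeclarer)
	if !ok {
		return "v1" // Path A: type assertion fails, legacy route
	}
	if v := d.ComputePlanVersion(); v != "" {
		return v
	}
	return "v1" // Path B: declarer present but empty, still legacy
}

// v1RecordingProvider deliberately lacks ComputePlanVersion (Path A).
type v1RecordingProvider struct{}

func (v1RecordingProvider) Name() string { return "v1-recording" }

// declaringProvider implements the optional interface (Path B or v2).
type declaringProvider struct{ version string }

func (declaringProvider) Name() string                 { return "declaring" }
func (p declaringProvider) ComputePlanVersion() string { return p.version }
```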
Pure correctness fixes: no signature change, and no behavior change for
the success-only or v1RecordingProvider paths.
* fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion
In cmd/wfctl/deploy_providers.go, remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct converts each value with structpb.NewValue, which
accepts nested maps only as map[string]any and rejects map[string]bool.
The upstream plugin/external/convert.go::mapToStruct then returns
&structpb.Struct{} on error rather than surfacing the typing failure.
Result: for every Diff dispatch over gRPC by a provider whose
ResourceOutput.Sensitive map was non-nil (even an empty
map[string]bool{}), the plugin side silently observed args=map[].
v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.
Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.
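A minimal sketch of the conversion, assuming sensitiveToAny has the shape the message describes (map[string]bool in, map[string]any out, nil for empty input). structValueKind is a hypothetical, stdlib-only mimic of the type switch in protobuf-go's structpb.NewValue, included only to show why the raw map[string]bool was rejected; it is not the protobuf API.

```go
package main

import "fmt"

// sensitiveToAny widens map[string]bool to map[string]any, the only
// nested-map type structpb accepts. Nil/empty input returns nil so the
// key can be left off the wire.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// structValueKind is a toy mimic (NOT the protobuf API) of how
// structpb.NewValue switches on concrete types: anything unlisted,
// including map[string]bool, is an "invalid type" error.
func structValueKind(v any) (string, error) {
	switch v.(type) {
	case nil:
		return "null", nil
	case bool:
		return "bool", nil
	case string:
		return "string", nil
	case map[string]any:
		return "struct", nil
	default:
		return "", fmt.Errorf("invalid type: %T", v)
	}
}
```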
* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)
W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.
# ADR 007 — non-trivial deviation from plan-literal
Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:
1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
test-helper package, not a runnable binary.
2. Real-gRPC runtime validation lands in P-DO when DO sets
computePlanVersion: v2 in its plugin.json.
3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
4. W-7 conformance suite is the recurring cross-PR gRPC harness.
Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.
# Tests
- TestApply_V2_LoaderSeamDispatch_EndToEnd:
- Writes a real config + filesystem state seeded with vpc
region=nyc3 (under iacStateRecord shape).
- Sets desired region=nyc1.
- Substitutes the resolveIaCProvider seam to return a Go provider
that declares v2 + has a driver returning NeedsReplace=true.
- Calls applyInfraModules (the production runInfraApply
entrypoint) and asserts driver.diffCount == 1, deleteCount ==
1, createCount == 1, plus exact identity of the deleted
ProviderID and the created Config["region"].
- TestApply_V2_LoaderSeam_DriftReportPrinted:
- Same loader-seam setup + applyV2ApplyPlanFn substitution
returning InputDriftReport with one entry.
- Captures os.Stderr and asserts the FormatStaleError block
reaches the operator (drift-report wiring T3.7 added is
end-to-end alive in the v2 loader path).
# Test infrastructure
- cmd/wfctl/main_test.go: NEW TestMain forces
WFCTL_DIFFCACHE=disabled so the platform diffcache (process-scoped
via getDiffCache lazy init) doesn't serve stale entries from a
developer's local ~/.cache/wfctl/diff/ as false-positive cache hits
that skip driver Diff dispatch. Same pattern as
platform/main_test.go from T3.6f. Caught during dev when the
end-to-end test failed in the full cmd/wfctl test run but passed
in isolation.
# Bug-class context
The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.
* docs(pr): note bugs incidentally fixed by W-3b
W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md
as a stable artifact the team-lead can copy-paste at PR-open time.
Pure-additive doc; no code changes.
Captures all three incidentally-fixed bugs surfaced during W-3b's
binding dispatch wiring:
1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently — sensitiveToAny
converter (commit 40e07a1; surfaced during T3.9 runtime
validation; v1 plugins never tripped it)
Includes summary, BREAKING-change call-out, ADR reference, rollout
notes, and test plan.
* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)
Per spec-reviewer's adversarial review of the prior keeps-grpc-stub
variant: the durability invariant for recording-decisions requires
preserving ALL transitions of a deliberation, not just the final
landing. The original ADR (loader-seam variant) recorded only one
team-lead direction; the keeps-grpc-stub variant (since superseded)
recorded only one reversal. Neither captured the full B → A → B → A →
B oscillation that played out during T3.9 execution.
This commit:
- Status header updated to "Accepted (with extensive deliberation
history — see Decision history section)".
- Context section adjusted to preface the deliberation history
rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with
verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-
flips mid-execution, reviewer + implementer should refuse to
proceed and force explicit disambiguation. Both reviewers
endorsed this hold during transition 4; the strict-interpretation
invariant from using-superpowers was the operative rule.
Pure ADR amendment; no code changes. Branch state (c9101ba T3.9
loader-seam + d2e50d4 T3.10 PR body) unaffected.
Closes spec-reviewer's Issue 1 from c9101ba pre-review:
"ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased
the durable record of team-lead's 'Path #1 — keep A' reversal.
Future branch-readers will see no record of why Option A was
considered + rejected."
* feat(iac): add ProviderValidator optional interface + PlanDiagnostic type
Adds an OPTIONAL `interfaces.ProviderValidator` interface that an IaCProvider
implementation MAY also satisfy to expose provider-side cross-resource
constraint validation at plan time:
    type ProviderValidator interface {
        ValidatePlan(plan *IaCPlan) []PlanDiagnostic
    }
Plus the supporting `PlanDiagnostic` type and `PlanDiagnosticSeverity` enum
(Info/Warning/Error). Consumers (e.g. the R-A10 align rule landing in the
next commit) discover ValidatePlan via type-assertion, so providers that do
not implement it keep working unchanged — purely additive.
Naming note: plan T4.1 originally proposed `Diagnostic` for this type, but
`interfaces.Diagnostic` is already taken by the unrelated Troubleshooter
runtime-event finding (`iac_resource_driver.go`). Renamed to PlanDiagnostic
to preserve W-4's pure-additive contract; the existing Troubleshooter type
is untouched.
TDD via interfaces/iac_provider_test.go covering severity-constant ordering,
PlanDiagnostic field shape, and type-assertion against both an implementor
and a non-implementor (confirms the interface remains optional).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
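A compact sketch of the optional-interface shape and its type-assertion discovery. The ProviderValidator method signature is taken from the message; the PlanDiagnostic field names, the iota ordering of the severity constants, the IaCPlan placeholder, and the two example providers are illustrative assumptions:

```go
package main

// PlanDiagnosticSeverity: Info < Warning < Error (ordering assumed).
type PlanDiagnosticSeverity int

const (
	PlanDiagnosticInfo PlanDiagnosticSeverity = iota
	PlanDiagnosticWarning
	PlanDiagnosticError
)

// PlanDiagnostic field names are illustrative guesses.
type PlanDiagnostic struct {
	Severity PlanDiagnosticSeverity
	Resource string // assumed empty for plan-level findings
	Field    string
	Message  string
}

// IaCPlan is a hypothetical placeholder for interfaces.IaCPlan.
type IaCPlan struct {
	Actions []string
}

type ProviderValidator interface {
	ValidatePlan(plan *IaCPlan) []PlanDiagnostic
}

// validateIfSupported discovers ValidatePlan by type assertion; providers
// without it are skipped (nil, false) instead of failing.
func validateIfSupported(provider any, plan *IaCPlan) ([]PlanDiagnostic, bool) {
	v, ok := provider.(ProviderValidator)
	if !ok {
		return nil, false
	}
	return v.ValidatePlan(plan), true
}

// plainProvider does not implement ProviderValidator.
type plainProvider struct{}

// quotaProvider flags plans larger than a made-up action quota.
type quotaProvider struct{ maxActions int }

func (p quotaProvider) ValidatePlan(plan *IaCPlan) []PlanDiagnostic {
	if len(plan.Actions) > p.maxActions {
		return []PlanDiagnostic{{Severity: PlanDiagnosticError, Message: "plan exceeds account quota"}}
	}
	return nil
}
```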
* feat(iac): R-A10 align rule — provider.ValidatePlan dispatch
Adds R-A10, the align rule that surfaces provider-side cross-resource
constraint diagnostics at plan time. Wiring:
cmd/wfctl/infra_align_rules.go::checkRA10_provider_validate_plan
Iterates providers, type-asserts ProviderValidator, calls
ValidatePlan(plan), maps each PlanDiagnostic to an AlignFinding.
Severity mapping: Error→FAIL, Warning→WARN, Info→WARN (advisory;
align has no INFO tier today). Resource label falls back to
"<provider-name>:plan" for plan-level findings; field path is
appended to the message when present.
cmd/wfctl/infra_align.go::runInfraAlignChecks
Dispatches R-A10 only when --plan is provided (R-A7 predicate parity).
Loads providers via the new alignLoadProviders test seam — the
default implementation enumerates iac.provider modules in the YAML
and loads each through the existing resolveIaCProvider plugin path.
Closers are released after the rule runs; a per-provider load failure
logs a stderr warning and continues so other R-A* findings are not
hidden.
TDD via cmd/wfctl/infra_align_ra10_test.go covers nil-plan, no-providers,
non-validating-provider-skipped, Error→FAIL, Warning→WARN, Info→WARN,
plan-level resource fallback, and multi-provider mixed-implementation
cases. Two integration tests exercise dispatch through the seam: one
asserts R-A10 fires under --strict and produces non-zero exit; the other
asserts the rule (and the loader) is silent without --plan.
Pure-additive: providers that do not implement ProviderValidator are
skipped, so this commit changes no existing align behaviour.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(iac): document ProviderValidator + R-A10 align rule
Adds the documentation pieces for W-4:
- DOCUMENTATION.md gains a new top-level "IaC Provider Plugin Interfaces"
section that documents the optional interfaces.ProviderValidator
interface, the PlanDiagnostic/PlanDiagnosticSeverity types, the
ValidatePlan contract (read-only, no remote calls), the R-A10 consumer
and its severity mapping, and the naming-distinction note vs. the
pre-existing interfaces.Diagnostic (Troubleshooter) type.
- docs/WFCTL.md adds an `infra align` subsection under the existing
`infra` command. It lists every R-A* rule (R-A1 through R-A10 with
severities), the flag table, the R-A10 severity-mapping submatrix,
and example invocations covering both plan-less and --plan/--strict
modes.
- cmd/wfctl/dsl-reference-embedded.md (the source for `wfctl
dsl-reference`) gains the R-A9 and R-A10 rows in the rule-families
table and a short paragraph on R-A10's behaviour. The `--plan`
description is updated to enable both R-A7 and R-A10.
Pure docs change; no code touched.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(iac): T4.5 verification — `--plan` help text mentions R-A10
T4.5 verification surfaced one cosmetic gap: the `--plan` flag's help
description still read "enables R-A7 checks" after T4.2 added R-A10 as a
second `--plan`-gated rule. Updated to "enables R-A7 and R-A10 checks" so
`wfctl infra align --help` reflects current behaviour.
Verification steps (no further code change required):
- `GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \
./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/...` → all PASS.
- `go build ./cmd/wfctl` → builds clean.
- `wfctl infra align --help` → shows existing flags plus the corrected
`--plan` description.
- Fixture-provider smoke (TestInfraAlign_RA10_FixtureProvider_Fires) wires
a ProviderValidator returning a fatal diagnostic through the
alignLoadProviders seam → R-A10 finding emitted, FAIL severity, non-zero
exit under `--strict`. This satisfies T4.5 Step 3 manual rule-trigger
smoke without needing a real plugin subprocess.
- `go vet ./interfaces/... ./cmd/wfctl/... ./iac/...` → clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(iac): T4.2/T4.4 review — Info diagnostics log, no finding
Spec-reviewer flagged that the rev10 plan T4.2 acceptance criteria specify
a three-tier severity mapping ("Errors → align failures; Warnings →
warnings; Info → logs"), and that the previous commit (76c4160) collapsed
Info into WARN. The collapse meant `wfctl infra align --strict` could exit
non-zero on a purely informational provider hint — the exact scenario the
Info tier exists to prevent (e.g. billing-tier change notices, deprecation
hints) — defeating the tier's contract.
Code (cmd/wfctl/infra_align_rules.go::checkRA10_provider_validate_plan):
Severity switch reworked to three explicit cases plus a conservative
default. PlanDiagnosticInfo now writes to a new package-level sink
`ra10LogInfo` (stderr by default; overridable for tests) and emits NO
AlignFinding, so it never affects exit code under any flag combination.
PlanDiagnosticError → FAIL and PlanDiagnosticWarning → WARN are unchanged.
Unknown future severities fall back to WARN so they cannot slip past
--strict undetected.
Doc-comment rewritten to spell out the three-tier mapping and the
motivating "Info must not break --strict CI" rule.
Test (cmd/wfctl/infra_align_ra10_test.go):
TestCheckRA10_InfoDiagnostic_BecomesWARN renamed/rewritten as
TestCheckRA10_InfoDiagnostic_LogsAndEmitsNoFinding. Asserts:
- len(findings) == 0
- the captured log line carries the rule tag, [info] severity marker,
"<provider>/<resource>" identifier, the diagnostic message, and the
"field: <name>" suffix
- alignExitCode(findings, strict=true) == 0 (the load-bearing guarantee)
Docs (DOCUMENTATION.md, docs/WFCTL.md):
Both severity-mapping summaries replaced with a three-row table
(Error → FAIL finding, Warning → WARN finding, Info → stderr log/no
finding/no exit-code effect). Prose surrounding the table now
explicitly calls out the strict-CI safety guarantee.
Verification:
- `GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \
./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/...` → all PASS.
- markdown-link-check on the three modified docs → 0 dead links.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
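The three-tier mapping plus the strict-mode exit rule can be sketched as follows. mapRA10 is a hypothetical stand-in for checkRA10_provider_validate_plan, the log sink is passed as a func rather than the package-level ra10LogInfo writer, and the exact "field:" suffix layout and the exit code value 1 are assumptions; alignExitCode is named in the test description above, but its body here is a guess.

```go
package main

import "fmt"

type PlanDiagnosticSeverity int

const (
	PlanDiagnosticInfo PlanDiagnosticSeverity = iota
	PlanDiagnosticWarning
	PlanDiagnosticError
)

type PlanDiagnostic struct {
	Severity PlanDiagnosticSeverity
	Resource string
	Field    string
	Message  string
}

// AlignFinding is a simplified, hypothetical finding record.
type AlignFinding struct {
	Rule     string
	Severity string // "FAIL" or "WARN"
	Message  string
}

// mapRA10: Error -> FAIL finding, Warning -> WARN finding, Info -> log
// line only (no finding), unknown -> WARN so it cannot slip past --strict.
func mapRA10(provider string, d PlanDiagnostic, logInfo func(string)) *AlignFinding {
	msg := d.Message
	if d.Field != "" {
		msg += " (field: " + d.Field + ")"
	}
	switch d.Severity {
	case PlanDiagnosticError:
		return &AlignFinding{Rule: "R-A10", Severity: "FAIL", Message: msg}
	case PlanDiagnosticWarning:
		return &AlignFinding{Rule: "R-A10", Severity: "WARN", Message: msg}
	case PlanDiagnosticInfo:
		logInfo(fmt.Sprintf("R-A10 [info] %s/%s: %s", provider, d.Resource, msg))
		return nil
	default:
		return &AlignFinding{Rule: "R-A10", Severity: "WARN", Message: msg}
	}
}

// alignExitCode: any FAIL is non-zero; WARN is non-zero only under strict.
func alignExitCode(findings []AlignFinding, strict bool) int {
	for _, f := range findings {
		if f.Severity == "FAIL" || (strict && f.Severity == "WARN") {
			return 1
		}
	}
	return 0
}
```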
* docs(iac): T4.4 review — embedded reference Info-tier mapping
Spec-reviewer caught one stale doc site missed in commit 9c41c1d:
`cmd/wfctl/dsl-reference-embedded.md:1358-1359` (the source for `wfctl
dsl-reference`) still claimed `PlanDiagnosticInfo` produced a WARN
AlignFinding. Replaced with the full three-tier prose so `wfctl
dsl-reference` callers see the corrected mapping:
- PlanDiagnosticError → FAIL AlignFinding (always non-zero exit)
- PlanDiagnosticWarning → WARN AlignFinding (non-zero only under --strict)
- PlanDiagnosticInfo → stderr log "R-A10 [info] <provider>/<resource>:
<message>"; no AlignFinding so --strict CI
gates never fail on informational hints
The R-A10 row in the table at :1354 ("FAIL or WARN") is unchanged — Info
no longer produces a finding so the existing severity range still
exhaustively covers the possible AlignFinding severities.
Verification:
- `markdown-link-check cmd/wfctl/dsl-reference-embedded.md` → 0 dead links.
- `GOWORK=off go test -race -count=1 ./cmd/wfctl/...` → PASS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(iac): R1 review — load plan + cfg once; clean R-A10 Info log fmt; clarify PlanDiagnosticSeverity doc (Copilot review)
- runInfraAlignChecks loads --plan once and reuses the parsed *IaCPlan
for R-A7 and R-A10 (was: 2x file open + JSON decode).
- alignLoadProviders now takes *alignContext (built once via
buildAlignContext in runInfraAlignChecks) instead of re-loading the
YAML from disk. Test seam updated.
- R-A10 Info log identifies plan-level diagnostics as `<provider>/plan`
(matches the documented `R-A10 [info] <provider>/<resource>: ...`
format) instead of the redundant `<provider>/<provider>:plan: ...`.
Table label still uses `<provider>:plan`.
- PlanDiagnosticSeverity doc comment now spells out the exit-code
mapping: Error always FAILs; Warning is advisory by default but FAILs
under --strict; Info never affects exit code.
New test: TestCheckRA10_PlanLevelInfoDiagnostic_LogsAsProviderSlashPlan
covers the log-format fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
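The two label forms described above, sketched as hypothetical helpers (the real code may build these strings inline). Plan-level diagnostics are assumed to carry an empty resource name, and the resource-level table label is assumed to be the bare resource name:

```go
package main

import "fmt"

// ra10TableLabel fills the align table's Resource column; plan-level
// diagnostics fall back to "<provider>:plan".
func ra10TableLabel(provider, resource string) string {
	if resource == "" {
		return provider + ":plan"
	}
	return resource
}

// ra10InfoLine renders the stderr log line, using "<provider>/plan" for
// plan-level diagnostics instead of "<provider>/<provider>:plan".
func ra10InfoLine(provider, resource, message string) string {
	if resource == "" {
		resource = "plan"
	}
	return fmt.Sprintf("R-A10 [info] %s/%s: %s", provider, resource, message)
}
```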
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request on May 4, 2026
…#531)
* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type
* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel
* feat(iac): wfctl infra plan writes InputSnapshot to plan.json
* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash
* feat(iac): wfctl infra plan warns when plan.json not in .gitignore
* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring
* feat(iac): add refreshoutputs.Refresh — read-only state output refresh
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh.
Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): add wfctl infra refresh-outputs subcommand
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace.
Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed.
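The ownership rule in the T2.2 message (ProviderRef first, provider type only when it is unambiguous) can be sketched as below. All types and the groupByProviderModule name are hypothetical; treating a ProviderRef that names no configured module as falling through to the type check, and reporting ambiguous entries as unresolved, are assumptions about edge cases the message does not spell out.

```go
package main

import "sort"

// ResourceState is a hypothetical subset of a state record.
type ResourceState struct {
	Name         string
	ProviderRef  string // owning iac.provider module name, if recorded
	ProviderType string // e.g. "digitalocean"
}

// ProviderModule is a hypothetical view of one iac.provider module.
type ProviderModule struct {
	Name string
	Type string
}

// groupByProviderModule assigns each state to a module: a ProviderRef that
// names a configured module wins; otherwise fall back to the provider type
// only if exactly one module has that type. Anything else is unresolved.
func groupByProviderModule(states []ResourceState, mods []ProviderModule) (map[string][]ResourceState, []string) {
	byName := make(map[string]bool, len(mods))
	byType := make(map[string][]string)
	for _, m := range mods {
		byName[m.Name] = true
		byType[m.Type] = append(byType[m.Type], m.Name)
	}
	groups := make(map[string][]ResourceState)
	var unresolved []string
	for _, s := range states {
		switch {
		case s.ProviderRef != "" && byName[s.ProviderRef]:
			groups[s.ProviderRef] = append(groups[s.ProviderRef], s)
		case len(byType[s.ProviderType]) == 1:
			owner := byType[s.ProviderType][0]
			groups[owner] = append(groups[owner], s)
		default:
			unresolved = append(unresolved, s.Name)
		}
	}
	sort.Strings(unresolved)
	return groups, unresolved
}
```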
When the resolved config has no usable iac.provider module for the requested env, emits the literal error
refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)
T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in.
Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally).
Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped).
Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(iac): concurrency stress test for refreshoutputs.Refresh
T2.5 — pure-package stress test in iac/refreshoutputs/.
Drives Refresh with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit.
The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(wfctl): document infra refresh-outputs subcommand
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples.
See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect.
Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm):
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff).
T2.6's own additions follow the existing in-file conventions exactly.
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7):
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
  -c string
        Config file (short for --config)
  -concurrency int
        Maximum concurrent Read calls (default 8)
  -config string
        Config file
  -e string
        Environment name (short for --env)
  -env string
        Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
  - name: state-store
    type: iac.state
    config:
      backend: filesystem
      directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3
ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself.
Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)
* fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema
Addresses code-reviewer findings on commit 695a070:
- Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design.
New test covers both the typo-rejection and the root-permissivity contracts. * feat(iac): add refreshoutputs.Refresh — read-only state output refresh T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add wfctl infra refresh-outputs subcommand T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed. When the resolved config has no usable iac.provider module for the requested env, emits the literal error refresh-outputs: provider not configured for env "<env>" verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. 
Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): concurrency stress test for refreshoutputs.Refresh T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh with 100 fake resources at Concurrency=8 and asserts: 1. No deadlock (10s watchdog around the call). 2. Read called exactly once per ProviderID (atomic per-ID counter). 3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency. 4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit. The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document infra refresh-outputs subcommand T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md: - New row in the Command Tree mermaid graph. - New row in the infra Action table. - Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples. See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect. Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm): $ mdformat --check docs/WFCTL.md Error: File "docs/WFCTL.md" is not formatted. exit=1 This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). T2.6's own additions follow the existing in-file conventions exactly. $ markdown-link-check docs/WFCTL.md FILE: docs/WFCTL.md [✓] https://github.com/GoCodeAlone/workflow [✓] #build-ui [✓] mcp.md 3 links checked. exit=0 docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope. 
T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7): $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl exit=0 $ /tmp/wfctl infra refresh-outputs --help Usage of infra refresh-outputs: -c string Config file (short for --config) -concurrency int Maximum concurrent Read calls (default 8) -config string Config file -e string Environment name (short for --env) -env string Environment name (resolves per-module overrides) exit=0 $ cat /tmp/t27-fake.yaml modules: - name: state-store type: iac.state config: backend: filesystem directory: /tmp/t27-fake-state $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging error: refresh-outputs: provider not configured for env "staging" exit=1 No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49. PR W-2 mandate (plan line 1101): $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3 ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review: - Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`. - Code-reviewer flagged this as a foot-gun (=0 mis-enables). - Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable. - Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions. 
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance is recorded in the ADR itself.

The ADR captures the rejected alternative and the rationale, with references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review: correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: the ReplaceIDMap godoc said "Keyed by the dependent resource Name", but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name], where action.Resource is the REPLACED resource. The roundtrip fixture `{"vpc":"new-uuid"}` confirms this. Reworded to "Keyed by the *replaced* resource's Name", with an explicit reference to action.Resource.Name plus a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). This locks the contract before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go), so the contract is no longer free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract, table-driven across nil and empty-but-non-nil values for all three new fields, asserting that the JSON keys are absent from the encoded form. This locks the omitempty tag behavior so a future refactor cannot silently regress to emitting `"initial_input_snapshot": {}` / `"input_drift_report": []` / `"replace_id_map": {}`.
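To make the ADR 006 foot-gun concrete, here is a minimal sketch contrasting the plan-literal presence check with ParseBool semantics. Only `strconv.ParseBool` and the variable name WFCTL_REFRESH_OUTPUTS come from the text; `presenceCheck` and `refreshOutputsEnabled` are hypothetical names, and treating an unparseable value as an error is an assumption of this sketch, not necessarily what cmd/wfctl/infra_apply_refresh_pre.go does.

```go
package main

import (
	"fmt"
	"strconv"
)

// presenceCheck is the plan-literal rule: any non-empty value enables,
// so WFCTL_REFRESH_OUTPUTS=0 would (surprisingly) turn the feature on.
func presenceCheck(v string) bool { return v != "" }

// refreshOutputsEnabled sketches the ParseBool rule: unset means disabled,
// and falsey values ("0", "f", "false", ...) explicitly disable.
// Returning an error for unparseable input is an assumption of this sketch.
func refreshOutputsEnabled(v string) (bool, error) {
	if v == "" {
		return false, nil
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return false, fmt.Errorf("WFCTL_REFRESH_OUTPUTS: %w", err)
	}
	return b, nil
}

func main() {
	for _, v := range []string{"", "0", "false", "1", "true"} {
		on, _ := refreshOutputsEnabled(v)
		fmt.Printf("%q presence=%v parsebool=%v\n", v, presenceCheck(v), on)
	}
}
```

In a caller, the input would typically be `os.Getenv("WFCTL_REFRESH_OUTPUTS")`; keeping the parser a pure function of the string makes the table of falsey values easy to test.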
* fix(iac): T3.1 review: strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction, the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate, which isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError, which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies that the loop continues past action[0] to action[1] (best-effort contract). Moved the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only; it proves one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err), since the destination is a string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added a ctx.Err() check at the loop iteration boundary; on cancel, ApplyPlan returns the result accumulated so far plus the ctx error as the top-level error. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: the driver receives zero invocations and the top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. This documents rather than enforces the invariant, since ComputePlan upstream is the contract owner.
Minor 2 (uniform error prefixing across sub-functions) is intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance: those tasks own the final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

The imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. The cmd/wfctl test build was broken on HEAD because of the unused imports; this surfaced while landing T3.1.5, which adds a new test file in the same package. Purely mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors: helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review, Minor 1):

- apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. This drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically, which keeps cross-package fixture parity guaranteed.

T3.2 (commit 0c30eec review, Minors 1 + 2):

- apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. It covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion, a distinct code path from the existing ok-but-SupportsUpsert==false test.
  A compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver.
- The apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs. at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. It also documents the single-pass recovery contract: a recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop.

T3.3 (commit a3fc98b review, Minors 1 + 2 + 4):

- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path. This locks the resource-append contract, so a regression that skipped the append on nil Current would fail loudly.
- apply_update_delete_test.go gains a parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively with the same defensive shape: empty ProviderID flows to the driver, no synthesized precondition error, deleteCount==1 (a latent bug fix from the design: the v1 path silently skipped Delete; v2 must call it).
- The apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review: ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create, surprising operators who expected fast interruption mid-Replace.
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable. Failure shape:

    return fmt.Errorf("replace: canceled after delete: %w", err)

The error is wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. The driver's Delete invokes a captured context.CancelFunc as a side effect, simulating cancellation immediately after Delete. The test asserts that Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, and the error has the canonical prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) is folded into this commit since it is the symmetric coverage for the new guard. The other Minors (2/4/5/6/7) are intentionally skipped; all are documentary or out of scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up; addresses the spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** It documents the cache as an amortization-only optimization (not a correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement, plus a list of the affected workflow files.
2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml (test + lint jobs)
   - .github/workflows/benchmark.yml (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

   Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale, plus the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance, the original T3.5 package-code commit (8774205) is kept untouched and this docs+CI commit is stacked on top. YAML syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors: atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:

- Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents, never a partial write. The corruption-recovery path in Get is still the safety net for cross-filesystem renames or NFS edge cases that don't honor atomicity; in production, this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering is needed. On rename failure, the temp file is cleaned up best-effort.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs. keyFingerprint): added a godoc note explaining that the tags are for log/transcript serialization, not cache keying; keyFingerprint uses NUL-separated string concatenation, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless: no code path creates a file with `*` in its name, and it was likely left over from earlier debugging. Removing it lets us drop the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS, i.e. the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:

- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation."

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs, which discovers iac.provider modules in the config, groups desired specs by the `provider:` field, loads each provider via the same loader the apply path uses, and dispatches platform.ComputePlan per group, so the v2 Diff contract (T3.6e) operates against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require the plugin process to load successfully. A plugin-load failure exits non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2); operators who need pure offline validation should use `wfctl validate`. Configs without any iac.provider module fall back to the legacy ConfigHash compare path, so minimal/legacy fixtures and out-of-band scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go, so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). The plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of being deferred to function exit (a multi-provider plan no longer holds N gRPC connections open at once).

Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally identical 14-method shells become one. Atomic counters are carried forward where used.

Doc updates:

- The godoc on computePlanForInfraSpecs is corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matching the actual code).
- The CHANGELOG entry calls out the empty-desired alignment with apply (the loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown).
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e: the binding TDD red→green commit for the v2 IaC contract (rev3 fix for the cycle-2 self-contradiction: test + impl ship in the same SHA, with no t.Skip placeholder).

ComputePlan now classifies each existing resource via p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the per-resource Diff calls in parallel under errgroup with a bounded worker pool (default 8; the WFCTL_PLAN_DIFF_CONCURRENCY env var override is clamped to 1..32).

Action emission:

- replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew is true (the latter closes design issue C: pre-W-3b, ForceNew was silently downgraded to update);
- update, when DiffResult.NeedsUpdate is true and replace did not fire;
- skip, when neither flag is set.

Net-new resources still emit create without dispatching Diff; resources removed from desired still emit delete in reverse-dependency order.

The nil-tolerance contract is preserved: if p is nil, or if p.ResourceDriver(typ) returns (nil, nil) for a resource type, ComputePlan falls back to the legacy ConfigHash compare for the affected resources. Replace cannot be expressed via the legacy path; callers needing Replace must supply a provider whose drivers implement Diff.

Per-resource driver.Diff errors propagate via errgroup, so operators see the underlying cause (rate limit, network, etc.).
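The action-emission rule above can be sketched as a small pure function. `FieldChange` and `DiffResult` here are simplified, hypothetical mirrors of the interfaces types (only the fields the rule reads), and `classify` is an illustrative name, not the real classifyModification. Treating a nil DiffResult as skip follows the later TestComputePlan_DriverReturnsNilDiff_EmitsNothing.

```go
package main

import "fmt"

// FieldChange and DiffResult are simplified, hypothetical mirrors of the
// real types; only the fields the emission rule reads are kept.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange
}

// classify applies the emission rule: replace wins over update, and a single
// ForceNew field change forces replace even when NeedsReplace is false.
// A nil result (driver reported no diff) classifies as skip.
func classify(d *DiffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace"
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}

func main() {
	forceNewOnly := &DiffResult{NeedsUpdate: true, Changes: []FieldChange{{Path: "region", ForceNew: true}}}
	fmt.Println(classify(forceNewOnly))                   // replace
	fmt.Println(classify(&DiffResult{NeedsUpdate: true})) // update
	fmt.Println(classify(&DiffResult{}))                  // skip
}
```

The first case is exactly design issue C: before W-3b, a ForceNew change with NeedsUpdate set would have been emitted as an update.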
Test surface (platform/differ_replace_test.go, NEW; ships in this commit per the rev3 atomicity rule):

- TestComputePlan_NeedsReplaceEmitsReplaceAction
- TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
- TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
- TestComputePlan_DiffReturnsNoChanges_EmitsNothing
- TestComputePlan_NilProvider_FallsBackToConfigHash
- TestComputePlan_NilDriver_FallsBackToConfigHash
- TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go is extended with a newFakeProviderWithDiff helper; the in-package no-op fakeProvider/fakeDriver are kept (they cannot collapse into iac/iactest until cache_test in T3.6f also depends on the helper; deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:

- T3.6a note 1: dropped the unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered by two explicit tests.
- T3.6a note 4: rewrote the fake_provider_test.go godoc with behavior-based phrasing.

cmd/wfctl test fakes are updated to match the new dispatch model:

- readDriver.Diff now returns NeedsUpdate=true (the adoption tests rely on the post-adopt ComputePlan emitting update; pre-W-3b, that was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of panicking; the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple. On a hit, the cached DiffResult is used directly; on a miss, the freshly computed result is Put into the cache.
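The consult-then-Put flow described for classifyModification is the classic cache-aside pattern. The sketch below models it with a map-backed cache and a noop cache (mirroring the "disabled" backend); `Key`, `Cache`, `memCache`, `noopCache` and `classifyWithCache` are hypothetical names, and the key fields simply follow the tuple named in the commit text rather than iac/diffcache's real types.

```go
package main

import "fmt"

// Key mirrors the cache tuple named in the commit text; illustrative only.
type Key struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

type DiffResult struct{ NeedsUpdate, NeedsReplace bool }

// Cache is a hypothetical minimal interface: Get reports a hit, Put stores.
type Cache interface {
	Get(Key) (DiffResult, bool)
	Put(Key, DiffResult)
}

type memCache map[Key]DiffResult

func (m memCache) Get(k Key) (DiffResult, bool) { r, ok := m[k]; return r, ok }
func (m memCache) Put(k Key, r DiffResult)     { m[k] = r }

// noopCache models a disabled backend: it never hits and discards writes.
type noopCache struct{}

func (noopCache) Get(Key) (DiffResult, bool) { return DiffResult{}, false }
func (noopCache) Put(Key, DiffResult)        {}

// classifyWithCache consults the cache first and only calls diff on a miss,
// storing the fresh result so a later identical call can skip the driver.
// In this sketch, errors are returned without being cached.
func classifyWithCache(c Cache, k Key, diff func() (DiffResult, error)) (DiffResult, error) {
	if r, ok := c.Get(k); ok {
		return r, nil
	}
	r, err := diff()
	if err != nil {
		return DiffResult{}, err
	}
	c.Put(k, r)
	return r, nil
}

func main() {
	calls := 0
	diff := func() (DiffResult, error) { calls++; return DiffResult{NeedsUpdate: true}, nil }
	k := Key{"pv-digest", "vpc", "id-1", "cfg-sha", "out-sha"}
	c := memCache{}
	_, _ = classifyWithCache(c, k, diff)
	_, _ = classifyWithCache(c, k, diff)
	fmt.Println("diff calls:", calls) // 1
}
```

Because every input that can change the Diff outcome is part of the key, a stale hit requires the key itself to be wrong, which is why the later pluginVersionKey collision fix matters.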
Apply-time correctness does not depend on cache hits: fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` runs against the same checkout).

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazily initialised on the first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled in TestMain so the platform test suite never reads or writes the developer's filesystem cache, and so cache state cannot leak across tests with incidentally aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/):

- Note 1 (env-clamping testability): extracted parseConcurrencyEnv as a pure function; the new table-driven TestParseConcurrencyEnv covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): the new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates and asserts driver.diffCount.Load() == 5 and that the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing.

And T3.6e adversarial-review minor cleanups:

- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added; p and the ResourceDriver instances it returns MUST be safe for concurrent use.
New tests (3 cache-behaviour scenarios in differ_cache_test.go):

- TestComputePlan_CacheHitSkipsDiff (a second call against unchanged inputs hits the cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (the disabled backend always re-dispatches)

* test(iac): T3.6e review: channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched. The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove that ≥2 Diff goroutines are simultaneously in flight before any returns: a regression to serial dispatch would hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver is loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review: pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile: composing it via `p.Name() + "@" + p.Version()` would let two genuinely different providers, `("foo", "bar@1.0")` vs. `("foo@bar", "1.0")`, collide on the literal string `"foo@bar@1.0"` and serve each other's cached DiffResults. Today's registered providers (digitalocean, dockercompose, mock) don't carry `@` in either field, so there is no observed bug, but there is no compile-time guard against a future provider declaring `do@enterprise` or similar.

Replaced with sha256(name + "\x00" + version): fixed-length and unambiguous, since NUL is not expected to appear in a provider name or version string. This matches how configHash already keys per-config inputs.
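The collision and its fix can be reproduced in a few lines. `naiveKey` and this `pluginVersionKey` take plain strings for brevity; they are illustrative sketches, and the real pluginVersionKey (which takes a provider and handles nil) may differ in signature.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveKey reproduces the fragile "@"-joined composition.
func naiveKey(name, version string) string {
	return name + "@" + version
}

// pluginVersionKey sketches the fix: hash name and version with a NUL
// separator, producing a fixed-length (64 hex chars) digest.
func pluginVersionKey(name, version string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + version))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(naiveKey("foo", "bar@1.0") == naiveKey("foo@bar", "1.0"))                 // true: collision
	fmt.Println(pluginVersionKey("foo", "bar@1.0") == pluginVersionKey("foo@bar", "1.0")) // false
	fmt.Println(len(pluginVersionKey("digitalocean", "0.1.0")))                           // 64
}
```

The separator is what removes the ambiguity; hashing additionally gives a fixed-length key regardless of how long the name and version are.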
Three regression tests pin the fix:

- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive: empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Purely additive: no change to any existing test outcome. The cache re-keys against the new digest, which means any DiffResults persisted under the old `name@version` keys will miss on the next plan and re-Diff naturally (cache misses are correct by design).

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path.

rev2/rev3-locked: NO env var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1; there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; the v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant.

New plumbing:

- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and a ComputePlanVersion() method, so tests drive both the v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads the top-level iacProvider.computePlanVersion alongside the existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (It re-reads plugin.json once per provider load rather than threading the value through loadIaCPlugin's 4-tuple var-seam; this keeps the seam signature stable for the existing test override, at the cost of one tiny os.ReadFile next to the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. The drift report is printed to the writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests: TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/undeclared providers route to legacy), and TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via a writer-buffer substring). The v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer, to prove the dispatcher's "default to v1 when undeclared" branch.

* fix(iac): T3.7 review: drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised "Run on success or partial failure", but the code gated on `err == nil` (success only). The contract the comment described is the more useful behavior: operators most need the stale-input diagnostic when an apply fails ("which input went stale during the failed apply?"). Without it, the failure error and the "what changed" context are disconnected. Fix: gate on `result != nil` instead of `err == nil`. printDriftReportIfAny already no-ops on empty/nil reports, so calling it whenever result is non-nil is safe.
2. No test for the drift-on-partial-failure path. Added TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure, which has applyV2ApplyPlanFn return (resultWithDrift, applyErr) and asserts both that (a) the err propagates, AND (b) the drift report still reaches the writer.
3. Optional-interface coverage gap. Two semantically different "v1" paths exist:
   - Path A: the provider doesn't implement ComputePlanVersionDeclarer at all → type assertion fails → legacy. Covered by v1RecordingProvider.
   - Path B: the provider implements the interface but ComputePlanVersion() returns "" (the realistic mid-transition state for v1 plugins after the SDK update lands but before they migrate) → type assertion succeeds, DispatchVersionFor returns "v1" → legacy. This was untested. Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty using iactest.NoopProvider{DispatchVersion: ""}, which always implements the interface (the method exists on the type). This pins Path B specifically.

Pure correctness fixes: no signature change, no behavior change for the success-only or v1RecordingProvider paths.

* fix(iac): map[string]bool drops gRPC args silently: sensitiveToAny conversion

In cmd/wfctl/deploy_providers.go, remoteResourceDriver.Diff was passing current.Sensitive (map[string]bool) directly into the args map. structpb.NewStruct rejects a nested map[string]bool value (nested maps must be map[string]any), and the upstream plugin/external/convert.go::mapToStruct returns &structpb.Struct{} on error rather than surfacing the typing failure. Result: every Diff dispatch over gRPC for any provider whose ResourceOutput.Sensitive map was non-nil (even an empty map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's manifest-driven dispatch) surfaces it immediately on the first existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape NewStruct accepts.
It returns nil for empty/nil input so the wire stays trim-friendly.

The bug was discovered during W-3b T3.9 runtime-launch validation against an out-of-band gRPC stub plugin; the canonical T3.9 in-tree test ships separately as a loader-seam Go integration test (per team-lead direction + plan precedent at plugin/sdk/iaclint/). It will surface in T3.10's PR description as a third incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain by injecting an in-process Go provider that declares v2 through the package-level seam: config parse → state load → provider load (via the resolveIaCProvider seam from T3.6c) → ComputePlan Diff dispatch (T3.6e/f) → wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace decomposition into Delete + Create → printDriftReportIfAny. No out-of-process gRPC binary or plugin.json under internal/testdata/.

# ADR 007: non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin in internal/testdata/stub-provider/." Team-lead authorized switching to in-tree loader-seam validation because:

1. The plan's precedent cite (plugin/sdk/iaclint/) is itself a Go test-helper package, not a runnable binary.
2. Real-gRPC runtime validation lands in P-DO when DO sets computePlanVersion: v2 in its plugin.json.
3. The hours-of-stub-plumbing cost doesn't earn proportional coverage vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end test.
4. The W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives are in docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests

- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc region=nyc3 (in the iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider that declares v2 and has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply entrypoint) and asserts driver.diffCount == 1, deleteCount == 1, createCount == 1, plus the exact identity of the deleted ProviderID and the created Config["region"].
- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + an applyV2ApplyPlanFn substitution returning an InputDriftReport with one entry.
  - Captures os.Stderr and asserts that the FormatStaleError block reaches the operator (the drift-report wiring T3.7 added is alive end to end in the v2 loader path).

# Test infrastructure

- cmd/wfctl/main_test.go: a NEW TestMain forces WFCTL_DIFFCACHE=disabled so the platform diffcache (process-scoped via getDiffCache lazy init) doesn't treat stale entries from a developer's local ~/.cache/wfctl/diff/ as false-positive cache hits that skip driver Diff dispatch. Same pattern as platform/main_test.go from T3.6f. Caught during development when the end-to-end test failed in the full cmd/wfctl test run but passed in isolation.

# Bug-class context

The Option-A draft (real gRPC binary; not retained on this branch per the ADR) surfaced a real wfctl bug, fixed in commit 40e07a1 (remoteResourceDriver.Diff sensitiveToAny conversion). The bug exists independently of which T3.9 option ships; the fix is in tree and surfaces in T3.10's PR description as the third W-3b incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Purely additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring:

1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently: sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution.

This commit:

- Updates the Status header to "Accepted (with extensive deliberation history — see Decision history section)".
- Adjusts the Context section to preface the deliberation history rather than imply a single-direction trajectory.
- Adds a new Decision history section listing all 5 transitions with verbatim team-lead quotes + per-transition implementer action.
- Adds a final paragraph capturing the meta-lesson: when team-lead path-flips mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) is unaffected.

Closes spec-reviewer's Issue 1 from the c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected."
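The third incidentally-fixed bug, the sensitiveToAny converter, is easy to illustrate. This is a minimal sketch, not the code in cmd/wfctl/deploy_providers.go: `sensitiveToAny` follows the behavior described in the commit (copy into map[string]any, nil for nil/empty input), while `acceptedByStruct` is a deliberately tiny, hypothetical stand-in for structpb's nested-value type check. The real structpb.NewValue accepts more Go types than this switch lists.

```go
package main

import "fmt"

// sensitiveToAny copies a map[string]bool into the map[string]any shape that
// structpb accepts for nested values. Nil or empty input returns nil so the
// field can be omitted from the wire payload.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// acceptedByStruct is a simplified, hypothetical stand-in for structpb's
// type check: nested maps must be map[string]any, never map[string]bool.
func acceptedByStruct(v any) bool {
	switch v.(type) {
	case nil, bool, string, float64, map[string]any:
		return true
	default:
		return false
	}
}

func main() {
	sensitive := map[string]bool{"password": true}
	fmt.Println(acceptedByStruct(sensitive))                 // false
	fmt.Println(acceptedByStruct(sensitiveToAny(sensitive))) // true
	fmt.Println(sensitiveToAny(map[string]bool{}) == nil)    // true
}
```

The conversion itself is trivial; the bug was that the rejected value was swallowed by mapToStruct returning an empty Struct instead of an error, which is why the symptom was an empty args map rather than a failure.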
* feat(iac): jitsubst.ResolveSpec for per-module deferred substitution T5.1 — new package iac/jitsubst hosts ResolveSpec, the apply-time helper that resolves ${VAR}, ${MODULE.field}, and ${MODULE.id} references in a ResourceSpec.Config tree. Strict semantics: every reference MUST resolve or the helper returns an error and the input spec unchanged. ${MODULE.id} prefers the in-apply replaceIDMap (W-3b/T3.4) over syncedOutputs so cascade-replace ProviderID propagation is authoritative over potentially stale state outputs. Used by W-5 T5.2 (wire into wfctlhelpers.ApplyPlan) and T5.3 (wire into doReplace). No behavior change yet — helper has no in-tree caller. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): ApplyPlan resolves JIT substitutions per action T5.2 — wfctlhelpers.ApplyPlan now invokes jitsubst.ResolveSpec on every action.Resource before dispatch. The substitution sees: - result.ReplaceIDMap (this-apply Replace ProviderIDs from doReplace) - syncedOutputs (state-side outputs from action.Current entries + this-apply outputs from successful prior dispatches in the same loop) - os.LookupEnv (production env source) syncedOutputs is pre-populated from every action.Current at start-of-apply so a NEW action can reference an in-state sibling module's outputs from action zero. After each successful dispatch (when result.Resources grows), the new entry is folded into syncedOutputs via flattenOutputs — flat-copy of Outputs with the canonical 'id' key shadowed by ProviderID so ${MODULE.id} resolves predictably across new and existing modules. JIT failure surfaces as a per-action ActionError with the canonical 'jit substitution:' prefix; the offending action SKIPS dispatch (unresolved spec must not reach the driver). The loop continues to the next action — best-effort apply contract preserved. 
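The following is a minimal sketch of the string-level core of the resolution rules above. It assumes a regex grammar for ${VAR} and ${MODULE.field}. Env refs go through a lookup function (os.LookupEnv in production). ${MODULE.id} checks the in-apply replace map before synced outputs. Any unresolved reference returns the input unchanged, plus an error. resolveString is an illustrative name. flattenOutputs follows the description of the helper of that name but is not its source. The real ResolveSpec walks a whole ResourceSpec.Config tree, not a single string.

```go
package main

import (
	"fmt"
	"regexp"
)

// refPattern matches ${NAME} and ${MODULE.field}. The exact grammar used by
// iac/jitsubst is not shown in the PR, so this regex is an assumption.
var refPattern = regexp.MustCompile(`\$\{([^}.]+)(?:\.([^}]+))?\}`)

// flattenOutputs copies a resource's Outputs and shadows "id" with its
// ProviderID so ${MODULE.id} resolves the same way for new and existing modules.
func flattenOutputs(providerID string, outputs map[string]any) map[string]any {
	flat := make(map[string]any, len(outputs)+1)
	for k, v := range outputs {
		flat[k] = v
	}
	flat["id"] = providerID
	return flat
}

// resolveString substitutes every reference in s. Strict semantics: if any
// reference is unresolved, it returns the input unchanged plus an error.
func resolveString(
	s string,
	replaceIDMap map[string]string, // this-apply Replace ProviderIDs
	syncedOutputs map[string]map[string]any, // module name -> flattened outputs
	lookupEnv func(string) (string, bool), // os.LookupEnv in production
) (string, error) {
	var firstErr error
	out := refPattern.ReplaceAllStringFunc(s, func(ref string) string {
		m := refPattern.FindStringSubmatch(ref)
		module, field := m[1], m[2]
		switch {
		case field == "": // plain ${VAR}
			if v, ok := lookupEnv(module); ok {
				return v
			}
		case field == "id":
			// In-apply Replace IDs are authoritative over state outputs.
			if id, ok := replaceIDMap[module]; ok {
				return id
			}
			fallthrough
		default:
			if v, ok := syncedOutputs[module][field]; ok {
				return fmt.Sprint(v)
			}
		}
		if firstErr == nil {
			firstErr = fmt.Errorf("unresolved reference %s", ref)
		}
		return ref
	})
	if firstErr != nil {
		return s, firstErr
	}
	return out, nil
}

func main() {
	env := func(k string) (string, bool) {
		if k == "REGION" {
			return "nyc3", true
		}
		return "", false
	}
	synced := map[string]map[string]any{
		"vpc": flattenOutputs("old-uuid", map[string]any{"cidr": "10.0.0.0/16"}),
	}
	replaced := map[string]string{"vpc": "new-uuid"}

	s, err := resolveString("${vpc.id} ${vpc.cidr} ${REGION}", replaced, synced, env)
	fmt.Println(s, err) // new-uuid 10.0.0.0/16 nyc3 <nil>

	_, err = resolveString("${pg.private_ip}", replaced, synced, env)
	fmt.Println(err) // unresolved reference ${pg.private_ip}
}
```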
Tests in apply_jit_test.go cover: 2-create plan with B referencing ${A.id}, pre-syncing from action.Current, unresolved-ref skipping dispatch with canonical prefix, no-refs passthrough, and loop-continues-after-per-action-JIT-error. T5.3 wires the Replace cascade.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): ApplyPlan replace cascade propagates new ProviderID

T5.3: locks the Replace-cascade contract via apply_replace_cascade_test.go and updates doReplace godoc to document the cascade hookup explicitly. Two scenarios:
- ReplaceCascade_DependentCreateGetsNewParentID: [Replace parent, Create dependent] where dependent's Config has ${parent.id}; dependent's Create receives the new ProviderID.
- ReplaceCascade_DependentReplaceGetsNewParentID: extends to Replace-on-Replace shape; dependent's post-Delete Create still sees the resolved parent.id, while its own Delete continues to target the OLD ProviderID via action.Current (JIT does not alter action.Current).

The behavior was already operational after T5.2's loop-level jitsubst.ResolveSpec call: doReplace populates result.ReplaceIDMap inside iteration N, and the loop's pre-dispatch substitution at iteration N+1 sees the fresh entry. T5.3 adds the assertion + doc that locks this ordering as a contract; future refactors that move substitution out of the loop OR delay ReplaceIDMap population will break these tests loudly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plan SchemaVersion=2 when JIT substitution required

T5.4: runInfraPlan now stamps plan.SchemaVersion conditionally:
- V1 (1) baseline when no plan action's resolved Resource.Config carries a JIT-style ${MODULE.field} or ${MODULE.id} reference.
- V2 (2) when any action does; older wfctl binaries reading the persisted plan reject it with the existing 'newer than supported' diagnostic at runInfraApply.
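The T5.3 cascade does not need a substitution call inside doReplace, because loop ordering alone delivers it. This hypothetical sketch reduces a plan to create/replace actions. It resolves only whole-value ${X.id} strings, right before each dispatch. That is enough to show the iteration N / N+1 ordering. Unlike the real loop, the sketch leaves an unresolved reference in place rather than recording an ActionError.

```go
package main

import (
	"fmt"
	"strings"
)

// action is a pared-down, hypothetical stand-in for a plan action.
type action struct {
	Kind   string // "create" or "replace"
	Name   string
	Config map[string]string
}

// applyLoop resolves ${X.id} references immediately before each dispatch.
// A Replace at iteration N records its new ID before iteration N+1 resolves,
// so later dependents see the fresh ProviderID without any substitution call
// inside the replace step itself.
func applyLoop(plan []action, newID func(name string) string) (map[string]string, map[string]map[string]string) {
	replaceIDMap := map[string]string{}
	dispatched := map[string]map[string]string{}
	for _, a := range plan {
		resolved := make(map[string]string, len(a.Config))
		for k, v := range a.Config {
			if strings.HasPrefix(v, "${") && strings.HasSuffix(v, ".id}") {
				if id, ok := replaceIDMap[v[2:len(v)-4]]; ok {
					v = id
				}
			}
			resolved[k] = v
		}
		dispatched[a.Name] = resolved // what the driver would receive
		if a.Kind == "replace" {
			// Stand-in for doReplace: Delete old, Create new, then record the
			// new ID keyed by the replaced resource's Name.
			replaceIDMap[a.Name] = newID(a.Name)
		}
	}
	return replaceIDMap, dispatched
}

func main() {
	plan := []action{
		{Kind: "replace", Name: "vpc"},
		{Kind: "create", Name: "app", Config: map[string]string{"vpc_uuid": "${vpc.id}"}},
	}
	_, got := applyLoop(plan, func(n string) string { return n + "-new" })
	fmt.Println(got["app"]["vpc_uuid"]) // vpc-new
}
```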
Detection is centralized in jitsubst.HasModuleRefs (recursive walk over map[string]any / []any / string), gated by a simple regex that requires non-empty segments on both sides of the dot — plain ${VAR} env-var refs (no dot) do NOT trigger the bump, so the common operator secret-via-env workflow stays at V1. cmd/wfctl/infra.go gains: - infraPlanSchemaVersionV1 (=1) and infraPlanSchemaVersionJIT (=2) constants alongside the existing infraPlanSchemaVersion (=2, max readable). The 'max readable' constant ticks up with every schema bump; V1/JIT name the per-plan choice runInfraPlan makes. - planRequiresJITSubstitution(plan) helper that walks plan.Actions once via jitsubst.HasModuleRefs. Tests: - iac/jitsubst/jitsubst_test.go — 8 new HasModuleRefs cases (env-var is false, .field/.id are true, nested map/slice, nil-safe, malformed refs are false, mixed-string is true). - cmd/wfctl/infra_plan_schema_test.go — V1 baseline (env-var only), V2 for both .field and .id, V1 negative for env-var-only, and persisted-plan SchemaVersion=2 end-to-end (where T5.5's rejection has not yet landed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): reject persisted JIT-style plans (canonical path is apply-without-plan) T5.5 — runInfraPlan now refuses to write a plan.json via -o when the plan is JIT-style (SchemaVersion = infraPlanSchemaVersionJIT). The exact operator-facing error string is contract-stable: error: plan -o requires JIT-free config; this plan references ${MODULE.field} which only resolves at apply time. Use 'wfctl infra apply' (without --plan) for JIT-aware applies. Stdout-only emission (no -o) of a JIT-style plan is permitted — it's a preview, not a contract. The guard fires AFTER plan computation so the operator sees the plan table on stdout before the rejection at the persistence step. 
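The T5.4 detection rule and the T5.5 persistence rule can be sketched as a recursive walk plus two small decisions. The regex, the constant names, and the error text below are assumptions. The real literal is contract-stable and lives in cmd/wfctl/infra.go. The sketch shows three behaviors: plain ${VAR} refs stay at V1, any ${MODULE.field} bumps the plan to V2, and only the -o path refuses a V2 plan.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// moduleRef approximates "non-empty segments on both sides of the dot";
// the real pattern in iac/jitsubst may differ.
var moduleRef = regexp.MustCompile(`\$\{[^.}]+\.[^}]+\}`)

// hasModuleRefs walks map[string]any / []any / string values recursively.
func hasModuleRefs(v any) bool {
	switch t := v.(type) {
	case string:
		return moduleRef.MatchString(t)
	case map[string]any:
		for _, e := range t {
			if hasModuleRefs(e) {
				return true
			}
		}
	case []any:
		for _, e := range t {
			if hasModuleRefs(e) {
				return true
			}
		}
	}
	return false // nil and other scalar types carry no references
}

const (
	schemaV1  = 1 // plays the role of infraPlanSchemaVersionV1
	schemaJIT = 2 // plays the role of infraPlanSchemaVersionJIT
)

// errPersistJIT uses placeholder text, not the contract-stable literal.
var errPersistJIT = errors.New("placeholder: JIT-style plan cannot be persisted")

// planSchemaVersion stamps V2 if any action's config needs JIT substitution.
func planSchemaVersion(configs []map[string]any) int {
	for _, c := range configs {
		if hasModuleRefs(c) {
			return schemaJIT
		}
	}
	return schemaV1
}

// checkPersist runs after plan computation: stdout-only previews (empty
// outPath) are always allowed; only -o of a JIT-style plan is refused.
func checkPersist(version int, outPath string) error {
	if outPath != "" && version == schemaJIT {
		return errPersistJIT
	}
	return nil
}

func main() {
	envOnly := []map[string]any{{"env_vars": map[string]any{"DB_PASSWORD": "${DB_PASSWORD}"}}}
	jit := []map[string]any{{"env_vars": map[string]any{"VPC_UUID": "${vpc.id}"}}}
	for _, cfg := range [][]map[string]any{envOnly, jit} {
		v := planSchemaVersion(cfg)
		fmt.Println(v, checkPersist(v, "plan.json"), checkPersist(v, ""))
	}
}
```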
Tests in cmd/wfctl/infra_plan_jit_reject_test.go (4 cases):
- exact-string match (the strict contract)
- stdout-only JIT plan permitted (negative-control on the guard scope)
- persisted non-JIT plan permitted (V1 happy path unchanged)
- canonical-keyword substring match (operator-search-engine safety net)

Removed T5.4's now-redundant TestInfraPlan_SchemaVersionV2_PersistedToFileMatches: its happy path has been replaced by T5.5's strict rejection contract; SchemaVersion stamping correctness is still locked by the helper-direct tests in the same file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): T5.7 runtime-launch-validation — JIT subst + plan rejection

W-5 Task T5.7: per the plan's 'Files: none' instruction, this is a documentation-only commit recording the runtime-launch-validation transcript against the built wfctl binary.

# Step 1: Build

$ GOWORK=off go build -o /tmp/wfctl-jit-validation ./cmd/wfctl
(no output, exit 0)

# Step 3: T5.5 persisted-JIT-plan rejection (build-binary verification)

Fixture (infra.yaml):

modules:
  - name: app
    type: infra.container_service
    config:
      env_vars:
        VPC_UUID: "${vpc.id}"
        DB_HOST: "${pg.private_ip}"

$ wfctl infra plan -o /tmp/jit-validation/plan.json --config infra.yaml
Infrastructure Plan — infra.yaml
+ create app (infra.container_service)
Plan: 1 to create, 0 to update, 0 to destroy.
error: error: plan -o requires JIT-free config; this plan references ${MODULE.field} which only resolves at apply time. Use 'wfctl infra apply' (without --plan) for JIT-aware applies.
EXIT=1

The doubled 'error: error:' prefix occurs because cmd/wfctl/main.go's top-level error reporter prepends 'error: ' to every command failure (line 211: `fmt.Fprintf(os.Stderr, "error: %v\n", rootErr)`), AND the team-lead-specified literal also begins with 'error: '. Per implementer brief: 'Match exactly.' Flagging here for visibility; a follow-up could either drop the prefix from the literal or special-case main.go's wrapping.
Not addressing in W-5.

# T5.5 inverse: stdout-only JIT plan permitted (no rejection)

$ wfctl infra plan --config infra.yaml
Infrastructure Plan — infra.yaml
+ create app (infra.container_service)
Plan: 1 to create, 0 to update, 0 to destroy.
EXIT=0

# T5.4 V1 baseline: non-JIT config persisted to disk still works

Fixture (infra-novars.yaml):

modules:
  - name: app
    type: infra.container_service
    config:
      cidr: "10.0.0.0/16"

$ wfctl infra plan -o plan-novars.json --config infra-novars.yaml
Plan: 1 to create, 0 to update, 0 to destroy.
Plan saved to /tmp/jit-validation/plan-novars.json
EXIT=0

$ jq .schema_version plan-novars.json
1   ← V1 (T5.4 stamp logic working)

# Step 2: apply with ${A.id} reference (covered by in-tree tests)

T5.7 plan §Step 2 specifies running 'apply against fixture with ${A.id} reference' against the built binary. wfctl infra apply requires a fully-configured iac.provider plugin (manifest, plugin.json, gRPC binary), so running this end-to-end against an ad-hoc fixture is non-trivial without W-7's conformance harness. The same code path is fully covered by:
- iac/wfctlhelpers/apply_jit_test.go::TestApplyPlan_JIT_TwoCreate_BSpecResolvesAID (T5.2: basic create+create cascade)
- iac/wfctlhelpers/apply_replace_cascade_test.go::TestApplyPlan_ReplaceCascade_DependentCreateGetsNewParentID (T5.3: replace+create cascade)
- iac/wfctlhelpers/apply_replace_cascade_test.go::TestApplyPlan_ReplaceCascade_DependentReplaceGetsNewParentID (T5.3: replace+replace cascade)
- iac/wfctlhelpers/apply_jit_test.go::TestApplyPlan_JIT_UnresolvedRef_RecordsActionErrorAndSkipsDispatch (T5.2: failure path)

These exercise the SAME wfctlhelpers.ApplyPlan code path the binary invokes; the unit-test fake driver is functionally equivalent to a v2 plugin from ApplyPlan's perspective. A binary-level apply smoke test is deferred to W-7's conformance gate (which adds the DO smoke test against real-cloud fixtures).
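The `jq .schema_version` check above reads the same field that an apply-time gate would inspect. The following sketches that gate. It assumes the gate compares the persisted value against the max-readable constant, which is 2 after T5.4. The error wording approximates the 'newer than supported' diagnostic and is not the real message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxReadablePlanSchema plays the role of infraPlanSchemaVersion ("max readable").
const maxReadablePlanSchema = 2

// planHeader decodes only the field the gate needs; "schema_version" is the
// key shown by `jq .schema_version` in the transcript.
type planHeader struct {
	SchemaVersion int `json:"schema_version"`
}

// checkPlanSchema rejects plans written by a newer wfctl. A missing field
// decodes to 0; how the real code treats that case is not shown in the PR.
func checkPlanSchema(raw []byte) (int, error) {
	var h planHeader
	if err := json.Unmarshal(raw, &h); err != nil {
		return 0, fmt.Errorf("decode plan: %w", err)
	}
	if h.SchemaVersion > maxReadablePlanSchema {
		return h.SchemaVersion, fmt.Errorf("plan schema_version %d is newer than supported (max %d)",
			h.SchemaVersion, maxReadablePlanSchema)
	}
	return h.SchemaVersion, nil
}

func main() {
	for _, raw := range []string{`{"schema_version":1,"actions":[]}`, `{"schema_version":3}`} {
		v, err := checkPlanSchema([]byte(raw))
		fmt.Println(v, err)
	}
}
```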
# Verification

Tests pass: GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... ./platform/... ./cmd/wfctl/... ./module/... → all packages OK.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(iac): T5.5 review — exact plan-literal error string

Spec-reviewer caught that the shipped error string in cmd/wfctl/infra.go diverged from the plan literal at docs/plans/2026-05-03-iac-conformance-and-replace.md §T5.5 line 2104. The kickoff brief I worked from substituted a wordier alternate string; team-lead confirmed the plan literal is the correct contract.

Three fixes:
1. cmd/wfctl/infra.go:297 - replace fmt.Errorf literal with errors.New(<plan literal>). No leading 'error:' prefix, because cmd/wfctl/main.go's top-level error wrapper prepends it; as a side benefit, this resolves the doubled 'error: error:' artifact in T5.7's runtime transcript. Switched to errors.New per spec-reviewer suggestion: avoids govet's no-format-verbs noise on the no-substitution case and is the canonical Go pattern for fixed-string sentinels.
2. cmd/wfctl/infra_plan_jit_reject_test.go:16 - expectedJITRejectError constant updated to the plan literal. Comment block expanded to document the literal's source + the leading-error-prefix nuance for future readers.
3. cmd/wfctl/infra_plan_jit_reject_test.go:125 - substring keyword list in TestInfraPlan_RejectionErrorContainsCanonicalKeywords updated to keywords actually present in the new literal: 'JIT resolution', 'persisted plan.json', 'wfctl infra apply', '-o/--plan'. The exact-match test above is the strict contract; this one stays as the operator-search-engine safety net.

Verified end-to-end via rebuilt wfctl binary against the same fixture from T5.7's transcript:

$ wfctl infra plan -o plan.json --config infra.yaml
Infrastructure Plan — infra.yaml
+ create app (infra.container_service)
Plan: 1 to create, 0 to update, 0 to destroy.
error: this plan requires JIT resolution; persisted plan.json is not supported.
Run 'wfctl infra apply' directly without -o/--plan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): ADR 008 — JIT substitution at dispatch loop, not per-helper Records the architectural choice resolved during T5.3: jitsubst.ResolveSpec runs once at the wfctlhelpers.ApplyPlan dispatch loop (immediately before each dispatchAction call), NOT inside per-action helpers. doReplace populates result.ReplaceIDMap; the next iteration's pre-dispatch ResolveSpec consumes it. This honors the Replace-cascade contract via loop-ordering invariant rather than via an explicit substitution call inside doReplace. Plan §T5.3 specified inner-resolve in doReplace; T5.2's loop-level call already covered the cascade case. Threading syncedOutputs through dispatchAction → doReplace would have made the helper boundary leaky for one call site. Option 1 (test-only T5.3 + this ADR) chosen by team-lead over option 2 (inner-resolve rework) on 2026-05-04 after spec-reviewer escalation. Cascade contract is locked by apply_replace_cascade_test.go's two scenarios; this ADR ensures future refactors that move substitution out of the loop OR delay ReplaceIDMap population see the trade-off rather than rediscovering it via git bla…
intel352 added a commit that referenced this pull request on May 4, 2026
…W-6 of 12) (#532) * feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type * feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel * feat(iac): wfctl infra plan writes InputSnapshot to plan.json * feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash * feat(iac): wfctl infra plan warns when plan.json not in .gitignore * feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring * feat(iac): add refreshoutputs.Refresh — read-only state output refresh T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): add wfctl infra refresh-outputs subcommand T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. Read-only at the cloud level — never invokes Update or Replace. Discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed. 
When the resolved config has no usable iac.provider module for the requested env, emits the literal error refresh-outputs: provider not configured for env "<env>" verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(iac): concurrency stress test for refreshoutputs.Refresh T2.5 — pure-package stress test in iac/refreshoutputs/. 
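The T2.3 activation rules for WFCTL_REFRESH_OUTPUTS reduce to a small predicate. In this sketch, the env value is passed in as a parameter; in practice it would come from os.Getenv("WFCTL_REFRESH_OUTPUTS"). skipRefresh stands in for --skip-refresh, and the function name is hypothetical. strconv.ParseBool accepts 1, t, T, TRUE, true, True and their false counterparts. Any other value, including the empty string, therefore falls back to the default OFF.

```go
package main

import (
	"fmt"
	"strconv"
)

// shouldRefreshOutputs decides whether to run the apply-time pre-step.
// envValue is what os.Getenv("WFCTL_REFRESH_OUTPUTS") would return and
// skipRefresh mirrors --skip-refresh; the function shape is hypothetical.
func shouldRefreshOutputs(envValue string, skipRefresh bool) bool {
	if skipRefresh {
		return false // the flag wins even if CI forces the env var on
	}
	on, err := strconv.ParseBool(envValue)
	// Unset, empty, or unrecognised values fail to parse: default OFF.
	return err == nil && on
}

func main() {
	for _, v := range []string{"", "1", "true", "0", "false", "yes"} {
		// The plan's original presence check (v != "") would enable "0" and "false".
		fmt.Printf("%q -> %v (presence check: %v)\n", v, shouldRefreshOutputs(v, false), v != "")
	}
}
```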
Drives Refresh with 100 fake resources at Concurrency=8 and asserts: 1. No deadlock (10s watchdog around the call). 2. Read called exactly once per ProviderID (atomic per-ID counter). 3. Every refreshed state carries the live Outputs map — no write-into-wrong-slot bug under concurrency. 4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit. The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(wfctl): document infra refresh-outputs subcommand T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md: - New row in the Command Tree mermaid graph. - New row in the infra Action table. - Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples. See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan-deviation (ParseBool vs plan-literal presence check) that the docs in this commit accurately reflect. Verification — plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm): $ mdformat --check docs/WFCTL.md Error: File "docs/WFCTL.md" is not formatted. exit=1 This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (would introduce a multi-thousand-line unrelated diff). 
T2.6's own additions follow the existing in-file conventions exactly. $ markdown-link-check docs/WFCTL.md FILE: docs/WFCTL.md [✓] https://github.com/GoCodeAlone/workflow [✓] #build-ui [✓] mcp.md 3 links checked. exit=0 docs/WFCTL.md has zero broken links — including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope. T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7): $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl exit=0 $ /tmp/wfctl infra refresh-outputs --help Usage of infra refresh-outputs: -c string Config file (short for --config) -concurrency int Maximum concurrent Read calls (default 8) -config string Config file -e string Environment name (short for --env) -env string Environment name (resolves per-module overrides) exit=0 $ cat /tmp/t27-fake.yaml modules: - name: state-store type: iac.state config: backend: filesystem directory: /tmp/t27-fake-state $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging error: refresh-outputs: provider not configured for env "staging" exit=1 No panic, no stack trace. Stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49. PR W-2 mandate (plan line 1101): $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s Manual smoke against staging-PG: not run — no staging-PG available in this worktree environment. Plan line 1102 marks this "if available", so deferring to the operator landing the PR. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3 ADR 006 — formalises the spec-vs-quality-review trade-off recorded during W-2 T2.3 review: - Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`. - Code-reviewer flagged this as a foot-gun (=0 mis-enables). - Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable. - Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions. - Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself. Captures the rejected alternative, the rationale, references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1) * fix(iac): T3.0 review — sync.Once-guarded schema cache + tighter iacProvider schema Addresses code-reviewer findings on commit 695a070: - Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward. - Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics. - Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design. 
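The T3.0 review fix pairs sync.Once with a cached error so that all concurrent callers see one outcome. It also returns a cloned byte slice instead of the embedded one. The sketch below makes three substitutions: json.Unmarshal stands in for jsonschema compilation, an inline byte slice stands in for the //go:embed file, and a local schema type stands in for *jsonschema.Schema. The schema content and type names are therefore assumptions. bytes.Clone requires Go 1.20 or later.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// schemaJSON stands in for the //go:embed'ed manifest schema (illustrative content).
var schemaJSON = []byte(`{"type":"object","properties":{"iacProvider":{"type":"object","additionalProperties":false}}}`)

// schema stands in for a compiled *jsonschema.Schema.
type schema struct{ root map[string]any }

var (
	schemaOnce     sync.Once
	compiledSchema *schema
	compileErr     error
)

// compiled compiles at most once; concurrent callers all observe the same
// (*schema, error) pair, including a cached compile error.
func compiled() (*schema, error) {
	schemaOnce.Do(func() {
		var root map[string]any
		if err := json.Unmarshal(schemaJSON, &root); err != nil {
			compileErr = fmt.Errorf("compile manifest schema: %w", err)
			return
		}
		compiledSchema = &schema{root: root}
	})
	return compiledSchema, compileErr
}

// manifestSchemaJSON returns a copy so callers cannot mutate the embedded bytes.
func manifestSchemaJSON() []byte { return bytes.Clone(schemaJSON) }

func main() {
	var wg sync.WaitGroup
	results := make([]*schema, 32)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = compiled()
		}(i)
	}
	wg.Wait()

	same := true
	for _, r := range results {
		same = same && r == results[0]
	}
	fmt.Println(same) // true

	b := manifestSchemaJSON()
	b[0] = 'X'
	fmt.Println(string(schemaJSON[:1])) // {
}
```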
New test covers both the typo-rejection and the root-permissivity contracts. * feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields * feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch) * fix(iac): T3.0.4 review — correct ReplaceIDMap key direction + lock omitempty contract Addresses code-reviewer findings on commit 13a6fad: - Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers. - Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc. - Minor: added TestApplyResult_OmitEmptyContract — table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
* fix(iac): T3.1 review — strengthen Replace coverage + ctx-cancel + driver-resolve test Addresses code-reviewer findings on commit 8416498: - Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely. - Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success. - Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary. - Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled. - Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner. 
Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once. * fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change. * feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset) * feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery * feat(iac): doUpdate + doDelete actions * feat(iac): doReplace populates ApplyResult.ReplaceIDMap * feat(iac): add diff cache with LRU eviction + corruption recovery * fix(iac): T3.1.5/T3.2/T3.3 review minors — helper consistency, type-assertion coverage, prefix policy Three independent review-fix bundles: T3.1.5 (commit f5a7ce9 review — Minor 1): - apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically — keeps the cross-package fixture parity guaranteed. T3.2 (commit 0c30eec review — Minors 1 + 2): - apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion — distinct code path from the existing ok-but-SupportsUpsert==false test. 
Compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver. - apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. Also documents the single-pass recovery contract (recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop). T3.3 (commit a3fc98b review — Minors 1 + 2 + 4): - apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path — locks the resource-append contract so a regression that skipped the append on nil Current would fail loudly. - apply_update_delete_test.go gains parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: empty ProviderID flows to driver, no synthesized precondition error, deleteCount==1 (latent bug-fix from design — the v1 path silently skipped Delete; v2 must call it). - apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency." * fix(iac): T3.4 review — ctx-cancel guard between Delete and Create in doReplace Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create — surprising operators who expected fast interruption mid-Replace. 
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable. Failure shape: return fmt.Errorf("replace: canceled after delete: %w", err) Wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface. New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. Driver's Delete invokes a captured context.CancelFunc as a side-effect, simulating exact post-Delete cancellation. Asserts Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, error has the canonical prefix. Code-reviewer Minor 3 (ctx-cancel mid-Replace test) folded into this commit since it's the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) intentionally skipped — all documentary or out-of-scope per reviewer guidance. * docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows T3.5 lifecycle constraint #4 (rev3) follow-up — addresses spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted: 1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files. 2. 
**WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`: - .github/workflows/ci.yml (test + lint jobs) - .github/workflows/benchmark.yml (performance benchmarks) - .github/workflows/pre-release.yml (pre-release tests) - .github/workflows/release.yml (release tests) - .github/workflows/dependency-update.yml (post-update test gate) Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference. Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows. * fix(iac): T3.5 review minors — atomic Put + godoc tightening + test cleanup Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060: - Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents — never a partial write. The corruption-recovery path in Get is still the safety net for cross-filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering is needed. On rename failure, best-effort cleanup of the temp file. - Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining the tags are for log/transcript serialization, not cache keying — keyFingerprint uses NUL-separated string concat, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer. - Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless — no code path creates a file with `*` in its name. Likely leftover from earlier debugging. Removing it lets us drop the now-unused `os` import. - Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS — the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope. Skipped per reviewer guidance: - Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope." - Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation." * refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change) * feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error) W-3b T3.6b. Adds computePlanForInfraSpecs which discovers iac.provider modules in the config, groups desired specs by `provider:` field, loads each via the same loader the apply path uses, and dispatches platform.ComputePlan per group so the v2 Diff contract (T3.6e) operates against a real plugin process at plan time, not just at apply time. BREAKING: configs declaring at least one iac.provider module now require the plugin process to load successfully. Plugin-load failure exits non-zero with the literal error documented in the v0.21.0 CHANGELOG. 
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2); operators who need pure offline validation should use `wfctl validate`. Configs without any iac.provider module fall back to the legacy ConfigHash compare path so minimal/legacy fixtures and out-of-band scripts continue to work. cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the package compiles; T3.6c replaces nil with the live provider handle. * feat(iac): wfctl infra apply threads provider into ComputePlan * test(iac): update cross-package fakes for ComputePlan provider arg W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface. Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). Plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of deferring to function exit (multi-provider plan no longer holds N gRPC connections open at once). Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally-identical 14-method shells become one. Atomic counters carried forward where used. Doc updates: - godoc on computePlanForInfraSpecs corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matches actual code). - CHANGELOG entry calls out the empty-desired alignment with apply (loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown). 
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract (rev3 fix for the cycle-2 self-contradiction: test + impl ship in the same SHA, no t.Skip placeholder). ComputePlan now classifies each existing resource via p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the per-resource Diff calls in parallel under errgroup with a bounded worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override clamped 1..32). Action emission: - replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew is true (the latter closes design issue C — pre-W-3b ForceNew was silently downgraded to update); - update, when DiffResult.NeedsUpdate is true and replace did not fire; - skip, when neither flag is set. Net-new resources still emit create without dispatching Diff; resources removed from desired still emit delete in reverse-dep order. Nil-tolerance contract preserved: if p is nil, or if p.ResourceDriver(typ) returns (nil, nil) for a resource type, ComputePlan falls back to the legacy ConfigHash compare for the affected resources. Replace cannot be expressed via the legacy path — callers needing Replace must supply a provider whose drivers implement Diff. Per-resource driver.Diff errors propagate via errgroup so operators see the underlying cause (rate limit, network, etc.). 
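The action-emission rules above reduce to a small decision function. This sketch uses hypothetical, trimmed-down `DiffResult` and `FieldChange` types that keep only the flags the rules mention. It is not the `platform.ComputePlan` source, and it treats a nil result as "emit nothing".

```go
package main

// FieldChange and DiffResult are hypothetical, trimmed-down versions of the
// types named in the commit message; only the classification flags are kept.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange
}

// classify maps a Diff result to a plan action: replace wins when
// NeedsReplace is set or any change is ForceNew; otherwise update when
// NeedsUpdate is set; a nil or empty result emits nothing ("skip").
func classify(d *DiffResult) string {
	if d == nil {
		return "skip"
	}
	if d.NeedsReplace {
		return "replace"
	}
	for _, c := range d.Changes {
		if c.ForceNew {
			return "replace" // previously downgraded to update
		}
	}
	if d.NeedsUpdate {
		return "update"
	}
	return "skip"
}
```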
Test surface (platform/differ_replace_test.go, NEW; ships in this commit per the rev3 atomicity rule): - TestComputePlan_NeedsReplaceEmitsReplaceAction - TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace - TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate - TestComputePlan_DiffReturnsNoChanges_EmitsNothing - TestComputePlan_NilProvider_FallsBackToConfigHash - TestComputePlan_NilDriver_FallsBackToConfigHash - TestComputePlan_DriverDiffError_PropagatesAsError platform/fake_provider_test.go extended with newFakeProviderWithDiff helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse to iac/iactest until cache_test in T3.6f also depends on the helper — deferred to keep T3.6e's diff bounded). Carry-forward notes addressed: - T3.6a note 1: dropped unused *testing.T param from newFakeProvider(). - T3.6a note 2: added compile-time interface conformance asserts on fakeProvider and fakeDriver. - T3.6a note 3: nil-provider AND nil-driver guards baked in; covered by two explicit tests. - T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based phrasing. cmd/wfctl test fakes updated to match the new dispatch model: - readDriver.Diff now returns NeedsUpdate=true (the adoption tests rely on the post-adopt ComputePlan emitting update; pre-W-3b that was the ConfigHash compare's job). - refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of panicking — the refresh-outputs test fixture only exercises Read. * perf(iac): ComputePlan consults diffcache before invoking provider.Diff W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached DiffResult is used directly; on miss, the freshly-computed result is Put into the cache. 
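A cache-aside lookup of this kind can be sketched with an in-memory map. The key fields mirror the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple. They are joined with NUL separators, as described for keyFingerprint; whether the real code hashes the joined string is not stated. `memCache`, `diffWithCache` and the string-valued results are hypothetical simplifications of the iac/diffcache API.

```go
package main

import "strings"

// cacheKey mirrors the tuple named in the commit message.
type cacheKey struct {
	PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs string
}

// fingerprint joins the key fields with NUL separators rather than
// JSON-marshaling them.
func (k cacheKey) fingerprint() string {
	return strings.Join([]string{k.PluginVersion, k.Type, k.ProviderID, k.SHAConfig, k.SHAOutputs}, "\x00")
}

// memCache is a hypothetical in-memory backend, similar in spirit to ":memory:".
type memCache map[string]string

// diffWithCache consults the cache before calling diff and stores the fresh
// result on a miss. A miss only costs a re-Diff, so correctness never
// depends on a hit.
func diffWithCache(c memCache, k cacheKey, diff func() string) string {
	fp := k.fingerprint()
	if v, ok := c[fp]; ok {
		return v
	}
	v := diff()
	c[fp] = v
	return v
}
```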
Apply-time correctness does not depend on cache hits — fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` against the same checkout). Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazy-initialised on first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper. platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain so the platform test suite never reads/writes the developer's filesystem cache and so cache state cannot leak across tests with incidentally-aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests). Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/): - Note 1 (env-clamping testability): extract parseConcurrencyEnv as a pure function; new TestParseConcurrencyEnv table-driven test covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5". - Note 2 (parallel-dispatch correctness): new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates, asserts driver.diffCount.Load() == 5 and the resulting plan has 5 actions. - Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing. And T3.6e adversarial-review minor cleanups: - Note 4 (i := i shadowing redundant in Go 1.22+): dropped. - Note 5 (errSentinel uses custom errFromTest): replaced with errors.New. - Note 7 (concurrency contract on ComputePlan godoc): added — p and the ResourceDriver instances it returns MUST be safe for concurrent use. 
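Extracting the env parsing into a pure function makes the clamp table-testable. Below is a sketch of such a helper. It assumes empty or non-numeric values fall back to the default of 8, and numeric values clamp into 1..32. The commit only lists "0" and "-5" as test inputs, so mapping them to 1 rather than to the default is an assumption here.

```go
package main

import "strconv"

const defaultDiffConcurrency = 8

// parseConcurrency is a hypothetical pure helper in the spirit of
// parseConcurrencyEnv: it takes the raw WFCTL_PLAN_DIFF_CONCURRENCY value
// (not the environment itself) so it can be table-tested. Empty or
// non-numeric input falls back to the default; numbers clamp to 1..32.
func parseConcurrency(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultDiffConcurrency
	}
	if n < 1 {
		return 1
	}
	if n > 32 {
		return 32
	}
	return n
}
```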
New tests (3 cache-behaviour scenarios in differ_cache_test.go): - TestComputePlan_CacheHitSkipsDiff (second call against unchanged inputs hits cache; diffCount stays at 1) - TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch) - TestComputePlan_NoopCacheNeverHits (disabled backend always re-dispatches) * test(iac): T3.6e review — channel-gated parallel-dispatch in-flight test (Copilot review) Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched. The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove ≥2 Diff goroutines are simultaneously in-flight before any returns: regression to serial dispatch would hang on the second `<-entered` and time out at 5s. Pure addition (no production-code change). cacheTestProvider.driver loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell. * fix(iac): T3.6f review — pluginVersionKey uses sha256 instead of @ separator (Copilot review) Code-reviewer flagged the T3.6f cache PluginVersion key as fragile: composing via `p.Name() + "@" + p.Version()` would let two genuinely-different providers — `("foo", "bar@1.0")` vs `("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"` and serve each other's cached DiffResults. Today's registered providers (digitalocean, dockercompose, mock) don't carry `@` in either field, so no bug has been observed, but there's no compile-time guard against a future provider declaring `do@enterprise` or similar. Replace with sha256(name + "\x00" + version): fixed-length, and ambiguity-free as long as neither field contains a NUL byte (U+0000 is a legal code point, but plugin names and version strings are not expected to contain it). Matches how configHash already keys per-config inputs.
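The collision and its fix can be reproduced in a few lines. `versioned`, `staticProvider` and this `pluginVersionKey` are illustrative only. The sketch hex-encodes the digest, which is an assumption about the key's textual form.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// versioned is a hypothetical two-method subset of the provider interface.
type versioned interface {
	Name() string
	Version() string
}

// pluginVersionKey hashes name and version with a NUL separator, so that
// ("foo", "bar@1.0") and ("foo@bar", "1.0") no longer share a key. A nil
// provider yields an empty key instead of panicking.
func pluginVersionKey(p versioned) string {
	if p == nil {
		return ""
	}
	sum := sha256.Sum256([]byte(p.Name() + "\x00" + p.Version()))
	return hex.EncodeToString(sum[:])
}

// staticProvider is a fixed-value versioned used to exercise the key.
type staticProvider struct{ name, version string }

func (s staticProvider) Name() string    { return s.name }
func (s staticProvider) Version() string { return s.version }
```

Concatenating with `"@"` maps both pairs to `"foo@bar@1.0"`. With the NUL separator, the hashed inputs are `"foo\x00bar@1.0"` and `"foo@bar\x001.0"`, which differ.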
Three regression tests pin the fix: - TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug) - TestPluginVersionKey_NilProvider (defensive — empty key, no panic) - TestPluginVersionKey_Stable (deterministic across calls) Pure additive — no change to any existing test outcome. The cache re-keys against the new digest, which means any DiffResults persisted under the old `name@version` keys will miss on the next plan and re-Diff naturally (cache misses are correct by design). * feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path. rev2/rev3-locked: NO env-var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1 — there is no transitional WFCTL_USE_V2_APPLY flag to misuse. Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant. New plumbing: - iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision. - iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and ComputePlanVersion() method so tests drive both v1 (default empty) and v2 paths through the shared fake. 
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level iacProvider.computePlanVersion alongside existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (Re-reads plugin.json once per provider load rather than threading through loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable for the existing test override; cost is one tiny os.ReadFile vs the gRPC start.) - cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. Drift report printed to writer on success (no-op when empty). - cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/un-declared routes legacy), TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via writer-buffer substring). v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer to prove the dispatcher's "default to v1 when un-declared" branch. * fix(iac): T3.7 review — drift report on partial failure + Path B coverage (Copilot review) Code-reviewer flagged 3 IMPORTANT items in T3.7: 1. Comment/code mismatch on drift-report timing. The comment promised "Run on success or partial failure" but the code gated on `err == nil` (success only). The contract the comment described is the more useful behavior — operators most need the stale-input diagnostic when an apply fails ("which input went stale during the failed apply?"). Without it, the failure error and the "what changed" context are disconnected. Fix: gate on `result != nil` instead of `err == nil`. printDriftReportIfAny already no-ops on empty/nil reports so unconditional-on-result-non-nil is safe. 2.
No test for the drift-on-partial-failure path. Added TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure which has applyV2ApplyPlanFn return (resultWithDrift, applyErr) and asserts both: (a) the err propagates, AND (b) the drift report still reaches the writer. 3. Optional-interface coverage gap. Two semantically different "v1" paths exist: - Path A: provider doesn't implement ComputePlanVersionDeclarer at all → type-assert fails → legacy. Covered by v1RecordingProvider. - Path B: provider implements interface but ComputePlanVersion() returns "" (the realistic mid-transition state for v1 plugins after the SDK update lands but before they migrate) → type-assert succeeds, DispatchVersionFor returns "v1" → legacy. Was untested. Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty using iactest.NoopProvider{DispatchVersion: ""}, which always implements the interface (the method exists on the type). Pins Path B specifically. Pure correctness fixes — no signature change, no behavior change for the success-only or v1RecordingProvider paths. * fix(iac): map[string]bool drops gRPC args silently — sensitiveToAny conversion cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing current.Sensitive (map[string]bool) directly into the args map. structpb.NewStruct rejects a map[string]bool value (nested maps must be map[string]any), and the upstream plugin/external/convert.go::mapToStruct returns &structpb.Struct{} on err rather than surfacing the typing failure. Result: every Diff dispatch over gRPC for any provider whose ResourceOutput.Sensitive map was non-nil (or even an empty map[string]bool{}) silently observed args=map[] on the plugin side. v1 plugins never tripped this because v1 dispatches IaCProvider.Plan server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's manifest-driven dispatch) surfaces it immediately on the first existing-resource Diff call. Fix: convert via sensitiveToAny() to the map[string]any shape NewStruct accepts.
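Here is a sketch of the conversion helper. It assumes the behavior stated in the commit: copy each bool into a map[string]any, and return nil for nil or empty input. It sticks to the standard library. The resulting map has the shape that `structpb.NewValue` accepts for nested maps, whereas a raw `map[string]bool` value is rejected.

```go
package main

// sensitiveToAny converts a map[string]bool into the map[string]any shape
// that structpb accepts for nested maps, returning nil for a nil or empty
// input so the field can stay off the wire. This is a sketch of the helper
// named in the commit message, not its verbatim source.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}
```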
Returns nil for empty/nil input so the wire stays trim-friendly. Bug discovered during W-3b T3.9 runtime-launch validation against an out-of-band gRPC stub plugin; the canonical T3.9 in-tree test ships separately as a loader-seam Go integration test (per team-lead direction + plan precedent at plugin/sdk/iaclint/). Will surface in T3.10's PR description as a third incidentally-fixed-by-W-3b bug. * test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007) W-3b T3.9. Exercises the full v2 dispatch chain — config parse → state load → provider load (via the resolveIaCProvider seam from T3.6c) → ComputePlan Diff dispatch (T3.6e/f) → wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace decomposition into Delete + Create → printDriftReportIfAny — by injecting a Go in-process v2-declaring provider through the package- level seam. No out-of-process gRPC binary or plugin.json under internal/testdata/. # ADR 007 — non-trivial deviation from plan-literal Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin in internal/testdata/stub-provider/." Team-lead authorized switching to in-tree loader-seam validation per: 1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go test-helper package, not a runnable binary. 2. Real-gRPC runtime validation lands in P-DO when DO sets computePlanVersion: v2 in its plugin.json. 3. Hours-of-stub-plumbing cost doesn't earn proportional coverage vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end. 4. W-7 conformance suite is the recurring cross-PR gRPC harness. Full reasoning + considered alternatives in docs/adr/007-t3-9-runtime-validation-via-loader-seam.md. # Tests - TestApply_V2_LoaderSeamDispatch_EndToEnd: - Writes a real config + filesystem state seeded with vpc region=nyc3 (under iacStateRecord shape). - Sets desired region=nyc1. - Substitutes the resolveIaCProvider seam to return a Go provider that declares v2 + has a driver returning NeedsReplace=true. 
- Calls applyInfraModules (the production runInfraApply entrypoint) and asserts driver.diffCount == 1, deleteCount == 1, createCount == 1, plus exact identity of the deleted ProviderID and the created Config["region"]. - TestApply_V2_LoaderSeam_DriftReportPrinted: - Same loader-seam setup + applyV2ApplyPlanFn substitution returning InputDriftReport with one entry. - Captures os.Stderr and asserts the FormatStaleError block reaches the operator (proving the drift-report wiring added in T3.7 is live end-to-end in the v2 loader path). # Test infrastructure - cmd/wfctl/main_test.go: NEW TestMain forces WFCTL_DIFFCACHE=disabled so the platform diffcache (process-scoped via getDiffCache lazy init) doesn't observe stale entries from a developer's local ~/.cache/wfctl/diff/ as false-positive cache hits that skip driver Diff dispatch. Same pattern as platform/main_test.go from T3.6f. Caught during dev when the end-to-end test failed in the full cmd/wfctl test run but passed in isolation. # Bug-class context The Option-A draft (real gRPC binary; not retained on this branch per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1 (remoteResourceDriver.Diff sensitiveToAny conversion). The bug exists independently of which T3.9 option ships; the fix is in tree and surfaces in T3.10's PR description as the third W-3b incidentally-fixed bug. * docs(pr): note bugs incidentally fixed by W-3b W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Pure-additive doc; no code changes. Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring: 1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch) 2. ForceNew silently downgraded to Update (T3.6e replace emission) 3.
map[string]bool drops gRPC args silently — sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it) Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan. * docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions) Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution. This commit: - Status header updated to "Accepted (with extensive deliberation history — see Decision history section)". - Context section adjusted to preface the deliberation history rather than imply a single-direction trajectory. - New Decision history section lists all 5 transitions with verbatim team-lead quotes + per-transition implementer action. - Final paragraph captures the meta-lesson: when the team lead flips direction mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule. Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) unaffected. Closes spec-reviewer's Issue 1 from c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected."
* feat(iac): --allow-replace flag for per-resource protected-replace opt-in W-6/T6.1: gate replace and delete actions targeting `protected: true` resources behind a per-resource opt-in flag at apply time. Without --allow-replace=<csv>, the apply errors before any provider Apply or wfctlhelpers.ApplyPlan dispatch with the design-spec literal ("resource %q is protected: true and would be %sd; pass --allow-replace=%s to override"). With the resource name listed in --allow-replace, the protection is bypassed for that resource only. Gate fires on both dispatch paths — live-diff (applyWithProviderAndStore) and --plan (applyPrecomputedPlanWithStore) — so the safety guarantee holds regardless of plan provenance. The protected flag is sourced from Resource.Config for replace actions and Current.AppliedConfig for delete actions (where platform.differ leaves Resource.Config empty). The allow-set is published via package-level applyAllowReplaceSet (matching the computeInfraPlan / applyV2ApplyPlanFn seam pattern) and reset to nil at the top of every runInfraApply via deferred cleanup — override authorization must not leak across runs. T6.2 will swap this fail-fast for an aggregated multi-blocker report with a copy-paste --allow-replace=name1,name2,... value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(iac): apply batch-reports protected-replace blockers with copy-paste flag W-6/T6.2: validateAllowReplaceProtected now walks the entire plan and aggregates ALL replace/delete blockers (resources annotated `protected: true` and not in --allow-replace) into a single error, instead of failing fast on the first one. The operator sees the complete blocker set in one apply attempt and gets a pre-formatted copy-paste flag value to authorize them all at once: plan would require destructive action on N protected resource(s): <name1> (replace) <name2> (delete) ... to authorize, re-run with: --allow-replace=<name1>,<name2>,... 
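The aggregation step can be sketched as a single pass over the plan. `planAction` and `checkProtected` are hypothetical, and the exact indentation of the blocker lines is an assumption. The overall message shape and the plan-order csv follow the format quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// planAction is a hypothetical, flattened plan entry. In wfctl the protected
// flag would come from Resource.Config for replaces and from
// Current.AppliedConfig for deletes.
type planAction struct {
	Name      string
	Action    string // "create", "update", "replace", "delete"
	Protected bool
}

// checkProtected walks the whole plan and aggregates every replace/delete of
// a protected resource that is not in allow, preserving plan order so the
// message and the copy-paste flag value are deterministic.
func checkProtected(plan []planAction, allow map[string]bool) error {
	var lines, names []string
	for _, a := range plan {
		if !a.Protected || allow[a.Name] {
			continue
		}
		if a.Action != "replace" && a.Action != "delete" {
			continue
		}
		lines = append(lines, fmt.Sprintf("  %s (%s)", a.Name, a.Action))
		names = append(names, a.Name)
	}
	if len(names) == 0 {
		return nil
	}
	return fmt.Errorf("plan would require destructive action on %d protected resource(s):\n%s\nto authorize, re-run with: --allow-replace=%s",
		len(names), strings.Join(lines, "\n"), strings.Join(names, ","))
}
```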
Names and the csv preserve plan-action declaration order, so output is deterministic. The single-blocker case still emits the batch format: operator-facing UX is consistent regardless of blocker count, which matters for automation that pins the copy-paste flag pattern.

Per plan T6.2 "(or apply-time check; pick one — apply is cleaner since plan output already shows all actions)", the gate stays in cmd/wfctl/infra_apply.go rather than platform/differ.go::ComputePlan. ComputePlan remains plugin-agnostic; the protected-resource policy is a wfctl-side operator-experience concern.

T6.1's single-line error literal is superseded; T6.1 tests are updated to assert on the operator-facing essentials (resource name + copy-paste flag value) rather than the legacy literal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document --allow-replace flag

W-6/T6.4: add a dedicated `infra apply` subsection to docs/WFCTL.md covering the protected-resource gate, the new --allow-replace=<csv> override, and its relation to the older --allow-protected-prune flag. Includes the canonical aggregated-blocker error format from T6.2 so operators know what to expect (and what to copy-paste) when the gate fires, plus three runnable examples (standard apply, --plan apply, authorized Replace cascade).

Per W-4 team-lead Option-3, mdformat is waived; markdown-link-check is the meaningful baseline. WFCTL.md links all resolve cleanly against the local repo (3 internal/external refs). Pre-existing dead links elsewhere in docs/ are unchanged by this commit and out of W-6 scope.

Verification:

    markdown-link-check docs/WFCTL.md  → 0 errors
    GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... \
        ./platform/... ./cmd/wfctl/... ./module/...
      → all pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(merge): restore T6.1 + T6.2 helpers lost during cascade-merge with -X theirs

* fix(iac): R1 review - drop redundant ComputePlanVersionDeclarer assertion at apply call site (Copilot review)

DispatchVersionFor is documented to centralise the type-assertion plus the default-to-v1 fallback, so call sites pass the raw provider value rather than re-asserting the optional interface. The v2 dispatch condition reverts to the canonical form:

    if wfctlhelpers.DispatchVersionFor(provider) == wfctlhelpers.DispatchVersionV2 { ... }

No behavior change: a provider that doesn't implement the interface, or returns anything other than "v2", still routes to the legacy v1 provider.Apply path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
intel352 added a commit that referenced this pull request on May 4, 2026
…9) (#534)

* feat(iac): add IaCPlan.SchemaVersion + InputSnapshot + PlanAction.ResolvedConfigHash + DriftEntry type

* feat(iac): add inputsnapshot.Compute + Snapshot + NewTolerantEnvProvider with preservation sentinel

* feat(iac): wfctl infra plan writes InputSnapshot to plan.json

* feat(iac): ComputePlan sets PlanAction.ResolvedConfigHash

* feat(iac): wfctl infra plan warns when plan.json not in .gitignore

* feat(iac): typed ErrEnvVarChanged sentinel + plan-stale diagnostic + ComputeDrift sentinel-honoring

* feat(iac): add refreshoutputs.Refresh - read-only state output refresh

T2.1: bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency is 8 when Options.Concurrency < 1; otherwise the caller's value is honored. On any Read or driver-resolution failure, it returns (nil, err) so callers don't half-persist a refresh.

Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): add wfctl infra refresh-outputs subcommand

T2.2: `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]` reads live Outputs for each resource already in state and persists any field-level changes back to the state backend. It is read-only at the cloud level: it never invokes Update or Replace.

It discovers iac.provider modules in the config (with per-env resolution), groups state entries by their owning iac.provider module (ProviderRef-first, falling back to provider type when exactly one module of that type exists), loads each provider once, calls iac/refreshoutputs.Refresh per group, and SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the requested env, it emits the literal error

    refresh-outputs: provider not configured for env "<env>"

verbatim per `fmt.Errorf("refresh-outputs: provider not configured for env %q", env)`. T2.7's runtime-launch-validation asserts against this exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): apply-time refresh-outputs pre-step (opt-in via WFCTL_REFRESH_OUTPUTS)

T2.3: wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan, read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun.
- --skip-refresh → suppress the pre-step regardless of the env var (for CI environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, the pre-step discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). It only fires for infra.* configs (the legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(iac): concurrency stress test for refreshoutputs.Refresh

T2.5: pure-package stress test in iac/refreshoutputs/.
Drives Refresh with 100 fake resources at Concurrency=8 and asserts:

1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map: no write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving both that parallelism happened AND that the semaphore enforced its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(wfctl): document infra refresh-outputs subcommand

T2.6: adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary, literal-error contract (load-bearing per T2.7), apply-time pre-step semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md records the T2.3 plan deviation (ParseBool vs. the plan-literal presence check) that the docs in this commit accurately reflect.

Verification: the plan §T2.6 line 1090 invocation `mdformat --check docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +` was run with locally installed mdformat 1.0.0 (pip) and markdown-link-check 3.14.2 (npm):

    $ mdformat --check docs/WFCTL.md
    Error: File "docs/WFCTL.md" is not formatted.
    exit=1

This failure is PRE-EXISTING. Verified by checking out the file at the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning mdformat against it: identical error. docs/WFCTL.md has never been mdformat-formatted in this repo. Reformatting the entire file is out of scope for T2.6 (it would introduce a multi-thousand-line unrelated diff).
T2.6's own additions follow the existing in-file conventions exactly.

    $ markdown-link-check docs/WFCTL.md
    FILE: docs/WFCTL.md
      [✓] https://github.com/GoCodeAlone/workflow
      [✓] #build-ui
      [✓] mcp.md
      3 links checked.
    exit=0

docs/WFCTL.md has zero broken links, including the new refresh-outputs section. The directory-wide scan reports 7 broken links in unrelated files (self-improvement-tutorial.md, getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit body per the "Files: none new" plan note for T2.7):

    $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
    exit=0

    $ /tmp/wfctl infra refresh-outputs --help
    Usage of infra refresh-outputs:
      -c string
            Config file (short for --config)
      -concurrency int
            Maximum concurrent Read calls (default 8)
      -config string
            Config file
      -e string
            Environment name (short for --env)
      -env string
            Environment name (resolves per-module overrides)
    exit=0

    $ cat /tmp/t27-fake.yaml
    modules:
      - name: state-store
        type: iac.state
        config:
          backend: filesystem
          directory: /tmp/t27-fake-state

    $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
    error: refresh-outputs: provider not configured for env "staging"
    exit=1

No panic, no stack trace. The stderr line is the verbatim literal pinned by T2.7 (plan line 1098), produced by T2.2's fmt.Errorf("refresh-outputs: provider not configured for env %q", env) at cmd/wfctl/infra_refresh_outputs.go:49.

PR W-2 mandate (plan line 1101):

    $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
    ok  github.com/GoCodeAlone/workflow/iac/refreshoutputs  1.405s
    ok  github.com/GoCodeAlone/workflow/cmd/wfctl           10.485s

Manual smoke against staging-PG: not run, since no staging-PG is available in this worktree environment. Plan line 1102 marks this "if available", so it is deferred to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(adr): record WFCTL_REFRESH_OUTPUTS ParseBool semantics deviation from plan §T2.3

ADR 006 formalises the spec-vs-quality-review trade-off recorded during the W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, and references back to the plan spec, the implementation site, the pinning test, and the operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(iac): plugin manifest gains iacProvider.computePlanVersion (default v1)

* fix(iac): T3.0 review - sync.Once-guarded schema cache + tighter iacProvider schema

Addresses code-reviewer findings on commit 695a070:
- Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive: existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design.
New test covers both the typo-rejection and the root-permissivity contracts.

* feat(iac): add ApplyResult.InitialInputSnapshot + InputDriftReport + ReplaceIDMap fields

* feat(iac): add wfctlhelpers.ApplyPlan skeleton (4-action dispatch)

* fix(iac): T3.0.4 review - correct ReplaceIDMap key direction + lock omitempty contract

Addresses code-reviewer findings on commit 13a6fad:
- Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract, table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
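The omitempty contract that TestApplyResult_OmitEmptyContract locks can be illustrated with encoding/json alone. The struct below is a hypothetical mirror: only the three JSON keys come from the commit message, while the Go field types (two maps and a slice) and `driftEntry` are assumptions; the real InitialInputSnapshot may have a different type.

```go
package main

import "encoding/json"

// driftEntry is a hypothetical placeholder for the real DriftEntry type.
type driftEntry struct {
	Name string `json:"name"`
}

// applyResult is a hypothetical mirror of the three new ApplyResult fields.
type applyResult struct {
	InitialInputSnapshot map[string]string `json:"initial_input_snapshot,omitempty"`
	InputDriftReport     []driftEntry      `json:"input_drift_report,omitempty"`
	// ReplaceIDMap is keyed by the *replaced* resource's name and maps to the
	// new ProviderID produced by the Create half of the Replace.
	ReplaceIDMap map[string]string `json:"replace_id_map,omitempty"`
}

// encode marshals r, panicking on error (cannot happen for these field types).
func encode(r applyResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}
```

omitempty drops nil and zero-length maps and slices alike, so both the nil and the empty-but-non-nil cases encode to `{}`. It never omits a struct value, however, so if the real InitialInputSnapshot were a non-pointer struct, the tag alone would not suppress it.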
* fix(iac): T3.1 review - strengthen Replace coverage + ctx-cancel + driver-resolve test

Addresses code-reviewer findings on commit 8416498:
- Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction, the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate, which isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError, which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only, proving one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err), since the destination is a string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added a ctx.Err() check at the loop iteration boundary; on cancel, it returns the result accumulated so far plus the ctx error as the top-level error. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: the driver receives zero invocations and the top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing, because ComputePlan upstream is the contract owner.
Minor 2 (uniform error prefixing across sub-functions) is intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance: those tasks own the final sub-function bodies and can pick the convention once.

* fix(wfctl): drop unused crypto/sha256 + encoding/hex from infra_apply_plan_test

The imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. The cmd/wfctl test build was broken on HEAD because of the unused imports; this surfaced while landing T3.1.5, which adds a new test file in the same package. Purely mechanical cleanup. No behavior change.

* feat(iac): in-process apply unconditional drift postcondition (panic-safe + tolerant of mid-apply env unset)

* feat(iac): doCreate honors UpsertSupporter for ErrResourceAlreadyExists recovery

* feat(iac): doUpdate + doDelete actions

* feat(iac): doReplace populates ApplyResult.ReplaceIDMap

* feat(iac): add diff cache with LRU eviction + corruption recovery

* fix(iac): T3.1.5/T3.2/T3.3 review minors - helper consistency, type-assertion coverage, prefix policy

Three independent review-fix bundles:

T3.1.5 (commit f5a7ce9 review, Minor 1):
- apply_postcondition_test.go::fingerprint now delegates to inputsnapshot.Compute, mirroring cmd/wfctl/infra_apply_plan_test.go's fingerprintForTest. Drops the inline crypto/sha256 + encoding/hex imports. Future Compute-algorithm changes (prefix length, hash) now re-align both test files automatically, which keeps cross-package fixture parity guaranteed.

T3.2 (commit 0c30eec review, Minors 1 + 2):
- apply_create_test.go gains TestApplyPlan_Create_AlreadyExists_DriverDoesNotImplementUpsertSupporter + alreadyExistsBareDriver + bareDriverProvider. Covers the `!ok` arm of doCreate's `us, ok := d.(interfaces.UpsertSupporter)` type assertion, a code path distinct from the existing ok-but-SupportsUpsert==false test.
A compile-time premise check ensures the test stays meaningful if a future refactor lifts SupportsUpsert onto the embedded fakeDriver.
- apply.go::doCreate godoc tightens the errors.Is contract to make the in-package vs. at-the-ActionError-boundary distinction explicit. External callers reading [interfaces.ApplyResult].Errors lose errors.Is matching at the string-conversion boundary; the canonical "upsert: read after conflict:" prefix is the discriminant. Also documents the single-pass recovery contract (a recovery Update that itself returns ErrResourceAlreadyExists surfaces unchanged rather than retriggering the recovery loop).

T3.3 (commit a3fc98b review, Minors 1 + 2 + 4):
- apply_update_delete_test.go::TestApplyPlan_Update_NilCurrentIsHandledDefensively now also asserts len(result.Resources) == 1 on the success path. This locks the resource-append contract so a regression that skipped the append on nil Current would fail loudly.
- apply_update_delete_test.go gains a parallel TestApplyPlan_Delete_NilCurrentIsHandledDefensively. Same defensive shape: empty ProviderID flows to the driver, no synthesized precondition error, deleteCount==1 (a latent bug fix from the design: the v1 path silently skipped Delete; v2 must call it).
- apply.go package godoc adds a "Per-action error-prefix policy" section documenting the decompose-then-prefix rule (bare on simple actions; "upsert: ..." / "replace: ..." on decomposing paths) so future reviewers don't suggest "let's add prefixes for consistency."

* fix(iac): T3.4 review - ctx-cancel guard between Delete and Create in doReplace

Addresses code-reviewer Minor 1 (worth-doing) on commit b17d703. Without the guard, a Ctrl-C / SIGTERM arriving exactly between the Delete and Create driver calls of a Replace action would still trigger the Create, surprising operators who expected fast interruption mid-Replace.
The half-replaced state is still the documented recovery surface (Delete happened, Create did not, so ReplaceIDMap stays empty), but cancellation now propagates as soon as it is observable. Failure shape:

    return fmt.Errorf("replace: canceled after delete: %w", err)

The error is wrapped to preserve the context.Canceled / context.DeadlineExceeded sentinel for in-package errors.Is matching. The "replace: canceled after delete:" string prefix is the discriminant for callers reading result.Errors at the public API surface.

New test: TestApplyPlan_Replace_CtxCancelAfterDelete_SkipsCreate + cancelOnDeleteFakeProvider scaffolding. The driver's Delete invokes a captured context.CancelFunc as a side effect, simulating exact post-Delete cancellation. It asserts that Delete ran, Create did NOT, ReplaceIDMap stays empty for the resource, and the error has the canonical prefix.

Code-reviewer Minor 3 (ctx-cancel mid-Replace test) is folded into this commit since it is the symmetric coverage for the new guard. Other Minors (2/4/5/6/7) are intentionally skipped; all are documentary or out of scope per reviewer guidance.

* docs(iac): document diffcache + set WFCTL_DIFFCACHE=:memory: in CI workflows

T3.5 lifecycle constraint #4 (rev3) follow-up; addresses the spec-reviewer finding on commit 8774205. Two plan-mandated deliverables that the T3.5 commit's `git add` line omitted:

1. **docs/WFCTL.md gains a "Diff Cache" section.** Documents the cache as an amortization-only optimization (not a correctness mechanism), the WFCTL_DIFFCACHE backend selection (disabled / :memory: / filesystem default), the LRU eviction caps (1024 entries / 64 MiB), the corruption recovery contract (silent eviction + once-per-process info log), the plugin-downgrade safety property, and the rev3 "all CI workflows set :memory: explicitly" statement plus a list of the affected workflow files.

2. **WFCTL_DIFFCACHE=:memory: at workflow-level env in CI.** Set in every workflow that runs `go test` or `wfctl`:
   - .github/workflows/ci.yml (test + lint jobs)
   - .github/workflows/benchmark.yml (performance benchmarks)
   - .github/workflows/pre-release.yml (pre-release tests)
   - .github/workflows/release.yml (release tests)
   - .github/workflows/dependency-update.yml (post-update test gate)

Workflow files that don't invoke go test / wfctl are not modified (codeql.yml, copilot-setup-steps.yml, create-release.yml, helm-lint.yml, osv-scanner.yml, test-dispatch.yml). Each workflow gets a brief inline comment citing ci.yml as the canonical rationale + the T3.5 rev3 lifecycle constraint reference.

Per spec-reviewer guidance: kept the original T3.5 package-code commit (8774205) untouched and stacked this docs+CI commit on top. YAML syntax verified on all 5 modified workflows.

* fix(iac): T3.5 review minors - atomic Put + godoc tightening + test cleanup

Addresses 5 of 7 code-reviewer minors on commits 8774205 + f80a060:
- Minor 1 (atomic Put, worth-doing production improvement): Put now uses write-temp-then-rename. POSIX rename(2) is atomic on the same filesystem, so a process crash mid-write leaves either the prior contents or the new contents, never a partial write. The corruption-recovery path in Get is still the safety net for cross-filesystem renames or NFS edge cases that don't honor atomicity. In production this means corruption recovery essentially never fires from native crashes. The .json extension filter in maybeEvict already excludes .tmp orphans, so no additional filtering is needed. On rename failure, the temp file is cleaned up on a best-effort basis.
- Minor 3 (userCacheDir godoc): tightened the platform-conventions language. Linux honors XDG_CACHE_HOME; macOS uses ~/Library/Caches; Windows uses %LocalAppData%. The previous comment overstated XDG honoring on all platforms.
- Minor 4 (Key JSON tags vs keyFingerprint): added a godoc note explaining that the tags are for log/transcript serialization, not cache keying; keyFingerprint uses NUL-separated string concatenation, not JSON marshaling. Future readers checking the fingerprint shape now have the right pointer.
- Minor 5 (vestigial sanity check): dropped the `os.Stat(filepath.Join(dir, "*.json"))` literal-glob check at the end of TestCache_EvictionTouchesNothingWhenUnderCap. The check was meaningless: no code path creates a file with `*` in its name, so it was likely left over from earlier debugging. Removing it lets us drop the now-unused `os` import.
- Minor 6 (mtime resolution test comment): added a paragraph to TestCache_LRUEvictionByCount's godoc explaining the ≤1ms mtime resolution assumption and listing the supported filesystems (ext4/btrfs/xfs/APFS/NTFS, i.e. the CI matrix). Coarse-mtime filesystems (FAT32, SMB) are explicitly out of scope.

Skipped per reviewer guidance:
- Minor 2 (maybeEvict O(N) scan on every Put): "skeleton-class concern; acceptable for W-3a scope."
- Minor 7 (Put error log-silent): "the cache-as-amortization framing in the package godoc already sets the expectation."

* refactor(iac): ComputePlan signature accepts ctx+provider (no behavior change)

* feat(iac)!: wfctl infra plan now loads provider for Diff dispatch (BREAKING: fails on plugin-load error)

W-3b T3.6b. Adds computePlanForInfraSpecs, which discovers iac.provider modules in the config, groups desired specs by the `provider:` field, loads each via the same loader the apply path uses, and dispatches platform.ComputePlan per group, so the v2 Diff contract (T3.6e) operates against a real plugin process at plan time, not just at apply time.

BREAKING: configs declaring at least one iac.provider module now require the plugin process to load successfully. Plugin-load failure exits non-zero with the literal error documented in the v0.21.0 CHANGELOG.
There is no --no-provider escape hatch (rev3 YAGNI fix per cycle-2); operators who need pure offline validation should use `wfctl validate`. Configs without any iac.provider module fall back to the legacy ConfigHash compare path, so minimal/legacy fixtures and out-of-band scripts continue to work.

cmd/wfctl/infra_apply.go:350 receives a temporary nil provider so the package compiles; T3.6c replaces nil with the live provider handle.

* feat(iac): wfctl infra apply threads provider into ComputePlan

* test(iac): update cross-package fakes for ComputePlan provider arg

W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go, so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface.

Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). The plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of deferring to function exit (a multi-provider plan no longer holds N gRPC connections open at once).

Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally identical 14-method shells become one. Atomic counters are carried forward where used.

Doc updates:
- godoc on computePlanForInfraSpecs corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matches actual code).
- CHANGELOG entry calls out the empty-desired alignment with apply (the loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown).
* feat(iac): ComputePlan dispatches Diff per resource; emits replace action when ForceNew or NeedsReplace

W-3b T3.6e: the binding TDD red→green commit for the v2 IaC contract (rev3 fix for the cycle-2 self-contradiction: test + impl ship in the same SHA, no t.Skip placeholder).

ComputePlan now classifies each existing resource via p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the per-resource Diff calls in parallel under errgroup with a bounded worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override clamped 1..32).

Action emission:
- replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew is true (the latter closes design issue C: pre-W-3b, ForceNew was silently downgraded to update);
- update, when DiffResult.NeedsUpdate is true and replace did not fire;
- skip, when neither flag is set.

Net-new resources still emit create without dispatching Diff; resources removed from desired still emit delete in reverse-dep order.

Nil-tolerance contract preserved: if p is nil, or if p.ResourceDriver(typ) returns (nil, nil) for a resource type, ComputePlan falls back to the legacy ConfigHash compare for the affected resources. Replace cannot be expressed via the legacy path; callers needing Replace must supply a provider whose drivers implement Diff. Per-resource driver.Diff errors propagate via errgroup so operators see the underlying cause (rate limit, network, etc.).
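The action-emission rules above reduce to a small decision function. This sketch models only the fields the commit names (`NeedsReplace`, `NeedsUpdate`, `FieldChange.ForceNew`); the types and string action names are hypothetical simplifications, and treating a nil result as "emit nothing" is taken from the nil-DiffResult test mentioned in the T3.6f carry-forwards.

```go
package main

// FieldChange and DiffResult are illustrative mirrors of the fields the
// commit message names, not the real interfaces types.
type FieldChange struct {
	Path     string
	ForceNew bool
}

type DiffResult struct {
	NeedsUpdate  bool
	NeedsReplace bool
	Changes      []FieldChange
}

// classify picks the plan action for an existing resource. Replace wins over
// update, and any ForceNew field change forces replace even when the driver
// did not set NeedsReplace. A nil result emits nothing ("skip").
func classify(r *DiffResult) string {
	if r == nil {
		return "skip"
	}
	replace := r.NeedsReplace
	for _, c := range r.Changes {
		if c.ForceNew {
			replace = true
			break
		}
	}
	switch {
	case replace:
		return "replace"
	case r.NeedsUpdate:
		return "update"
	default:
		return "skip"
	}
}
```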
Test surface (platform/differ_replace_test.go, NEW; ships in this commit per the rev3 atomicity rule):
- TestComputePlan_NeedsReplaceEmitsReplaceAction
- TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
- TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
- TestComputePlan_DiffReturnsNoChanges_EmitsNothing
- TestComputePlan_NilProvider_FallsBackToConfigHash
- TestComputePlan_NilDriver_FallsBackToConfigHash
- TestComputePlan_DriverDiffError_PropagatesAsError

platform/fake_provider_test.go extended with newFakeProviderWithDiff helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse to iac/iactest until cache_test in T3.6f also depends on the helper; deferred to keep T3.6e's diff bounded).

Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based phrasing.

cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests rely on the post-adopt ComputePlan emitting update; pre-W-3b that was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of panicking; the refresh-outputs test fixture only exercises Read.

* perf(iac): ComputePlan consults diffcache before invoking provider.Diff

W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached DiffResult is used directly; on miss, the freshly-computed result is Put into the cache.
Apply-time correctness does not depend on cache hits: fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` against the same checkout).

Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazy-initialised on first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper.

platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain so the platform test suite never reads/writes the developer's filesystem cache and so cache state cannot leak across tests with incidentally-aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests).

Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/):
- Note 1 (env-clamping testability): extract parseConcurrencyEnv as a pure function; new TestParseConcurrencyEnv table-driven test covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5".
- Note 2 (parallel-dispatch correctness): new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates, asserts driver.diffCount.Load() == 5 and the resulting plan has 5 actions.
- Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing.

And T3.6e adversarial-review minor cleanups:
- Note 4 (i := i shadowing redundant in Go 1.22+): dropped.
- Note 5 (errSentinel uses custom errFromTest): replaced with errors.New.
- Note 7 (concurrency contract on ComputePlan godoc): added; p and the ResourceDriver instances it returns MUST be safe for concurrent use.
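Extracting the clamp as a pure function lets it be table-tested without touching the environment. The sketch below assumes that empty or non-numeric input falls back to the default of 8 and that any numeric input (including 0 and negatives) is clamped into 1..32; the real parseConcurrencyEnv may handle out-of-range values differently.

```go
package main

import "strconv"

const (
	defaultDiffConcurrency = 8
	minDiffConcurrency     = 1
	maxDiffConcurrency     = 32
)

// parseConcurrencyEnv turns a raw WFCTL_PLAN_DIFF_CONCURRENCY value into a
// worker count. Assumed rules: unparsable input (including "") returns the
// default; parsed values are clamped into [1, 32].
func parseConcurrencyEnv(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return defaultDiffConcurrency
	}
	if n < minDiffConcurrency {
		return minDiffConcurrency
	}
	if n > maxDiffConcurrency {
		return maxDiffConcurrency
	}
	return n
}
```

A caller would pass `os.Getenv("WFCTL_PLAN_DIFF_CONCURRENCY")` and hand the result to the worker pool, for example via `SetLimit` on an `errgroup.Group` from golang.org/x/sync.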
New tests (3 cache-behaviour scenarios in differ_cache_test.go):
- TestComputePlan_CacheHitSkipsDiff (second call against unchanged inputs hits cache; diffCount stays at 1)
- TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch)
- TestComputePlan_NoopCacheNeverHits (disabled backend always re-dispatches)

* test(iac): T3.6e review: channel-gated parallel-dispatch in-flight test (Copilot review)

Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched. The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove ≥2 Diff goroutines are simultaneously in-flight before any returns: a regression to serial dispatch would hang on the second `<-entered` and time out at 5s.

Pure addition (no production-code change). cacheTestProvider.driver loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell.

* fix(iac): T3.6f review: pluginVersionKey uses sha256 instead of @ separator (Copilot review)

Code-reviewer flagged the T3.6f cache PluginVersion key as fragile: composing via `p.Name() + "@" + p.Version()` would let two genuinely-different providers, `("foo", "bar@1.0")` vs `("foo@bar", "1.0")`, collide on the literal string `"foo@bar@1.0"` and serve each other's cached DiffResults. Today's registered providers (digitalocean, dockercompose, mock) don't carry `@` in either field, so there is no observed bug, but there's no compile-time guard against a future provider declaring `do@enterprise` or similar.

Replace with sha256(name + "\x00" + version): fixed-length, NUL is invalid in both fields by Unicode convention, ambiguity-free. Matches how configHash already keys per-config inputs.
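A sketch of the NUL-separated digest described above, assuming a provider exposes `Name()` and `Version()` strings. The `provider` interface, the `staticProvider` test type, and the hex encoding are assumptions for illustration; the real key may be encoded differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// provider is a hypothetical subset of the provider interface.
type provider interface {
	Name() string
	Version() string
}

// pluginVersionKey hashes name and version with a NUL separator, so
// ("foo", "bar@1.0") and ("foo@bar", "1.0") no longer map to the same key.
// A nil provider yields the empty key instead of panicking.
func pluginVersionKey(p provider) string {
	if p == nil {
		return ""
	}
	sum := sha256.Sum256([]byte(p.Name() + "\x00" + p.Version()))
	return hex.EncodeToString(sum[:]) // 64 hex chars, fixed length
}

// staticProvider is a trivial provider used to exercise the key.
type staticProvider struct{ name, version string }

func (s staticProvider) Name() string    { return s.name }
func (s staticProvider) Version() string { return s.version }
```

The separator only has to be a byte that cannot appear in either field; hashing on top of it keeps the key fixed-length regardless of how long the name and version are.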
Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive: empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)

Pure additive: no change to any existing test outcome. The cache re-keys against the new digest, which means any DiffResults persisted under the old `name@version` keys will miss on the next plan and re-Diff naturally (cache misses are correct by design).

* feat(iac): apply path branches on plugin manifest's iacProvider.computePlanVersion

W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path.

rev2/rev3-locked: NO env-var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1; there is no transitional WFCTL_USE_V2_APPLY flag to misuse.

Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; the v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant.

New plumbing:
- iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision.
- iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and ComputePlanVersion() method so tests drive both v1 (default empty) and v2 paths through the shared fake.
- cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level iacProvider.computePlanVersion alongside existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (Re-reads plugin.json once per provider load rather than threading through loadIaCPlugin's 4-tuple var-seam; this keeps the seam signature stable for the existing test override, and the cost is one tiny os.ReadFile vs the gRPC start.)
- cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. Drift report printed to writer on success (no-op when empty).
- cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/un-declared routes legacy), TestApplyWithProviderAndStore_V2PrintsDriftReport (drift wiring asserted via writer-buffer substring). The v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer to prove the dispatcher's "default to v1 when un-declared" branch.

* fix(iac): T3.7 review: drift report on partial failure + Path B coverage (Copilot review)

Code-reviewer flagged 3 IMPORTANT items in T3.7:

1. Comment/code mismatch on drift-report timing. The comment promised "Run on success or partial failure" but the code gated on `err == nil` (success only). The contract the comment described is the more useful behavior: operators most need the stale-input diagnostic when an apply fails ("which input went stale during the failed apply?"). Without it, the failure error and the "what changed" context are disconnected. Fix: gate on `result != nil` instead of `err == nil`. printDriftReportIfAny already no-ops on empty/nil reports, so unconditional-on-result-non-nil is safe.

2. No test for the drift-on-partial-failure path. Added TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure, which has applyV2ApplyPlanFn return (resultWithDrift, applyErr) and asserts both: (a) the err propagates, AND (b) the drift report still reaches the writer.

3. Optional-interface coverage gap. Two semantically-different "v1" paths exist:
   - Path A: provider doesn't implement ComputePlanVersionDeclarer at all → type-assert fails → legacy. Covered by v1RecordingProvider.
   - Path B: provider implements the interface but ComputePlanVersion() returns "" (the realistic mid-transition state for v1 plugins after the SDK update lands but before they migrate) → type-assert succeeds, DispatchVersionFor returns "v1" → legacy. Was untested. Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty using iactest.NoopProvider{DispatchVersion: ""}, which always implements the interface (the method exists on the type). Pins Path B specifically.

Pure correctness fixes: no signature change, no behavior change for the success-only or v1-RecordingProvider paths.

* fix(iac): map[string]bool drops gRPC args silently: sensitiveToAny conversion

cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing current.Sensitive (map[string]bool) directly into the args map. structpb.NewStruct rejects map[string]bool (it accepts map[string]any only), and the upstream plugin/external/convert.go::mapToStruct returns &structpb.Struct{} on err rather than surfacing the typing failure. Result: every Diff dispatch over gRPC for any provider whose ResourceOutput.Sensitive map was non-nil (or even an empty map[string]bool{}) silently observed args=map[] on the plugin side.

v1 plugins never tripped this because v1 dispatches IaCProvider.Plan server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's manifest-driven dispatch) surfaces it immediately on the first existing-resource Diff call.

Fix: convert via sensitiveToAny() to the map[string]any shape NewStruct accepts.
Returns nil for empty/nil input so the wire stays trim-friendly.

Bug discovered during W-3b T3.9 runtime-launch validation against an out-of-band gRPC stub plugin; the canonical T3.9 in-tree test ships separately as a loader-seam Go integration test (per team-lead direction + plan precedent at plugin/sdk/iaclint/). Will surface in T3.10's PR description as a third incidentally-fixed-by-W-3b bug.

* test(iac): T3.9 runtime-launch-validation via loader-seam (ADR 007)

W-3b T3.9. Exercises the full v2 dispatch chain by injecting a Go in-process v2-declaring provider through the package-level seam: config parse → state load → provider load (via the resolveIaCProvider seam from T3.6c) → ComputePlan Diff dispatch (T3.6e/f) → wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace decomposition into Delete + Create → printDriftReportIfAny. No out-of-process gRPC binary or plugin.json under internal/testdata/.

# ADR 007: non-trivial deviation from plan-literal

Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin in internal/testdata/stub-provider/." Team-lead authorized switching to in-tree loader-seam validation per:
1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go test-helper package, not a runnable binary.
2. Real-gRPC runtime validation lands in P-DO when DO sets computePlanVersion: v2 in its plugin.json.
3. Hours-of-stub-plumbing cost doesn't earn proportional coverage vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
4. W-7 conformance suite is the recurring cross-PR gRPC harness.

Full reasoning + considered alternatives in docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.

# Tests
- TestApply_V2_LoaderSeamDispatch_EndToEnd:
  - Writes a real config + filesystem state seeded with vpc region=nyc3 (under iacStateRecord shape).
  - Sets desired region=nyc1.
  - Substitutes the resolveIaCProvider seam to return a Go provider that declares v2 + has a driver returning NeedsReplace=true.
  - Calls applyInfraModules (the production runInfraApply entrypoint) and asserts driver.diffCount == 1, deleteCount == 1, createCount == 1, plus exact identity of the deleted ProviderID and the created Config["region"].
- TestApply_V2_LoaderSeam_DriftReportPrinted:
  - Same loader-seam setup + applyV2ApplyPlanFn substitution returning InputDriftReport with one entry.
  - Captures os.Stderr and asserts the FormatStaleError block reaches the operator (the drift-report wiring T3.7 added is end-to-end alive in the v2 loader path).

# Test infrastructure
- cmd/wfctl/main_test.go: NEW TestMain forces WFCTL_DIFFCACHE=disabled so the platform diffcache (process-scoped via getDiffCache lazy init) doesn't observe stale entries from a developer's local ~/.cache/wfctl/diff/ as false-positive cache hits skipping driver Diff dispatch. Same pattern as platform/main_test.go from T3.6f. Caught during dev when the end-to-end test failed in the full cmd/wfctl test run but passed in isolation.

# Bug-class context
The Option-A draft (real gRPC binary; not retained on this branch per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1 (remoteResourceDriver.Diff sensitiveToAny conversion). The bug exists independent of which T3.9 option ships; the fix is in tree and surfaces in T3.10's PR description as the third W-3b incidentally-fixed bug.

* docs(pr): note bugs incidentally fixed by W-3b

W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Pure-additive doc; no code changes.

Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring:
1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch)
2. ForceNew silently downgraded to Update (T3.6e replace emission)
3. map[string]bool drops gRPC args silently: sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it)

Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan.

* docs(adr): amend ADR 007 with full T3.9 decision history (5 transitions)

Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution.

This commit:
- Status header updated to "Accepted (with extensive deliberation history — see Decision history section)".
- Context section adjusted to preface the deliberation history rather than imply a single-direction trajectory.
- New Decision history section lists all 5 transitions with verbatim team-lead quotes + per-transition implementer action.
- Final paragraph captures the meta-lesson: when team-lead path-flips mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule.

Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) unaffected.

Closes spec-reviewer's Issue 1 from c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected."
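The `sensitiveToAny` conversion from the gRPC-args fix can be sketched as a plain map widening. This is an illustrative version written from the commit message's description (nil for empty or nil input, otherwise a `map[string]any` copy), not the code in cmd/wfctl/deploy_providers.go.

```go
package main

// sensitiveToAny widens map[string]bool into the map[string]any shape that
// structpb.NewStruct can encode. Empty or nil input returns nil so the
// field can be left off the wire entirely.
func sensitiveToAny(in map[string]bool) map[string]any {
	if len(in) == 0 {
		return nil
	}
	out := make(map[string]any, len(in))
	for k, v := range in {
		out[k] = v // bool values are encodable by structpb
	}
	return out
}
```

The widened map would then go into the args map passed to `structpb.NewStruct` (google.golang.org/protobuf/types/known/structpb), whose parameter type is `map[string]interface{}`.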
* feat(iac): add optional ProviderPlanner interface for v2 plugins (rev10 user override)

* ci(iac): cross-plugin build gate + ADR 009 (ProviderPlanner included per user override)

* docs(iac): document ProviderPlanner adapter author guide

* docs(adr): restore plan-literal Context para 1 in ADR 009 (T9.2 spec-review fix)

* docs(iac): point ProviderPlanner author guide at real ProviderIDValidator precedent (T9.3 quality fix)

* ci(iac): add fail-fast=false, concurrency, go.mod/go.sum paths to cross-plugin gate (T9.2 quality fix)

* fix(iac): R2 review: correct ProviderPlanner doc/ADR/test/CI findings (Copilot review)

Six Copilot inline findings + CodeQL workflow-permissions warning:
1. docs/iac/providerplanner.md: ComputePlan in v0.21.0 dispatches driver.Diff directly (in platform/differ.go); it does NOT call IaCProvider.Plan. The reverse is true (Plan delegates to ComputePlan in some implementations). Updated the call-chain description and the illustrative dispatch-site code block to reference the actual file (platform/differ.go) so adapter authors don't follow the wrong call chain.
2. docs/adr/009: replaced the personal email reference with "the workspace owner" so ADR provenance doesn't embed PII.
3. interfaces/iac_provider_planner_test.go: now actually verifies the additivity claim by reusing the package's existing mockProvider as the negative case; a runtime assertion confirms mockProvider does NOT satisfy ProviderPlanner. Moved file to interfaces_test package to share fixtures.
4. .github/workflows/cross-plugin-build-test.yml: explicit `permissions: contents: read` (CodeQL workflow-permissions guidance); added `env: GOPRIVATE/GONOSUMCHECK` matching ci.yml + codeql.yml so downstream plugin builds resolve github.com/GoCodeAlone/* deps consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
W-1 of the 12-PR IaC root-cause + provider conformance plan series. Adds `IaCPlan.SchemaVersion` + `IaCPlan.InputSnapshot` + `PlanAction.ResolvedConfigHash` + a standalone `DriftEntry` type. Wires a per-key plan-stale diagnostic into the persisted `--plan` apply path. Adds the `iac/inputsnapshot/` package (Compute + Snapshot + NewTolerantEnvProvider + preservedFingerprint sentinel + ComputeDrift + FormatStaleError + ErrEnvVarChanged sentinel). Warns when plan.json is not in .gitignore.

Plan reference: `docs/plans/2026-05-03-iac-conformance-and-replace.md` rev10 (commit on `design/iac-conformance-and-replace` branch). 9 adversarial-review cycles; user ratified Option C (W-9 includes ProviderPlanner; ADR 007).

What ships
6 commits in dependency order:
Out of scope (deferred to later PRs)
- `InitialInputSnapshot`, `InputDriftReport`, `ReplaceIDMap`) ship in W-3a/T3.0.4
- `wfctlhelpers.ApplyPlan` body ships in W-3a/T3.1

Critical design constraints (rev1-rev10 history)
- `preservedFingerprint` is UNEXPORTED; `NewTolerantEnvProvider` is the only sanctioned sentinel injector
- `ComputeDrift` honors the sentinel via in-package access; tests reference the `unsetFingerprintPlaceholder` constant (not the literal `"(unset)"` string), which closes the cycle-7 brittle-test fix
- `OSEnvProvider` (drift detection desired); in-process path will use `NewTolerantEnvProvider` in W-3a (preservation desired for sub-action env-cleanup case)
- `wfctlhelpers.ApplyPlan` ships in this PR; the helper package lands in W-3a/T3.1

Test plan
- `GOWORK=off go test ./interfaces/... ./iac/inputsnapshot/... ./platform/... ./cmd/wfctl/...` PASS
- `wfctl infra plan -o plan.json` against a config with `${VAR}` references; inspect that plan.json contains an `input_snapshot` map; .gitignore warning surfaces if entry absent

🤖 Generated with Claude Code