feat(iac)!: Replace dispatch — ComputePlan calls Diff + apply branches on manifest computePlanVersion (W-3b of 12)#528
Merged
Merged
Conversation
…olvedConfigHash + DriftEntry type
…der with preservation sentinel
…ComputeDrift sentinel-honoring
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.
Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no
write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving
both that parallelism happened AND that the semaphore enforced
its limit.
The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
literal-error contract (load-bearing per T2.7), apply-time pre-step
semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
representative examples.
See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.
Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
This failure is PRE-EXISTING. Verified by checking out the file at
the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
mdformat against it: identical error. docs/WFCTL.md has never been
mdformat-formatted in this repo. Reformatting the entire file is
out of scope for T2.6 (would introduce a multi-thousand-line
unrelated diff). T2.6's own additions follow the existing in-file
conventions exactly.
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
docs/WFCTL.md has zero broken links — including the new
refresh-outputs section. The directory-wide scan reports 7 broken
links in unrelated files (self-improvement-tutorial.md,
getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
-c string
Config file (short for --config)
-concurrency int
Maximum concurrent Read calls (default 8)
-config string
Config file
-e string
Environment name (short for --env)
-env string
Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
- name: state-store
type: iac.state
config:
backend: filesystem
directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned
by T2.7 (plan line 1098), produced by T2.2's
fmt.Errorf("refresh-outputs: provider not configured for env %q",
env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available
in this worktree environment. Plan line 1102 marks this "if
available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… from plan §T2.3
ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
plan revert; provenance recorded in the ADR itself.
Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rovider schema Addresses code-reviewer findings on commit 695a070: - Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward. - Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics. - Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design. New test covers both the typo-rejection and the root-permissivity contracts.
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.
Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no
write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving
both that parallelism happened AND that the semaphore enforced
its limit.
The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
literal-error contract (load-bearing per T2.7), apply-time pre-step
semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
representative examples.
See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.
Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
This failure is PRE-EXISTING. Verified by checking out the file at
the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
mdformat against it: identical error. docs/WFCTL.md has never been
mdformat-formatted in this repo. Reformatting the entire file is
out of scope for T2.6 (would introduce a multi-thousand-line
unrelated diff). T2.6's own additions follow the existing in-file
conventions exactly.
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
docs/WFCTL.md has zero broken links — including the new
refresh-outputs section. The directory-wide scan reports 7 broken
links in unrelated files (self-improvement-tutorial.md,
getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
-c string
Config file (short for --config)
-concurrency int
Maximum concurrent Read calls (default 8)
-config string
Config file
-e string
Environment name (short for --env)
-env string
Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
- name: state-store
type: iac.state
config:
backend: filesystem
directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned
by T2.7 (plan line 1098), produced by T2.2's
fmt.Errorf("refresh-outputs: provider not configured for env %q",
env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available
in this worktree environment. Plan line 1102 marks this "if
available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… from plan §T2.3
ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
plan revert; provenance recorded in the ADR itself.
Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ReplaceIDMap fields
…mitempty contract Addresses code-reviewer findings on commit 13a6fad: - Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers. - Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc. - Minor: added TestApplyResult_OmitEmptyContract — table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
…iver-resolve test Addresses code-reviewer findings on commit 8416498: - Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely. - Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success. - Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary. - Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled. - Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner. Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once.
…_plan_test Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change.
…safe + tolerant of mid-apply env unset)
…cache + bypass-side-effect-free (Copilot review round 4) Three real fixes from Copilot's re-review of PR #528 round 4: 1. **loadErr chain lost across runInfraPlan re-wrap (errors.Is/As)** — computePlanForInfraSpecs returned `failed to load plugin %q: %v; ...` (using %v), losing the underlying error. After runInfraPlan re-wraps with `compute plan: %w`, callers could not errors.Is / errors.As against the original loader failure (e.g. to differentiate "plugin binary missing" from "plugin crashed during handshake"). Switch the inner wrap to %w. Rendered text is identical to %v. New pin: TestRunInfraPlan_FailsLoudOnPluginLoadFailure now asserts `errors.Is(err, loadErr)` reaches the sentinel through both wrap layers. 2. **getDiffCache called even on the empty-ProviderID bypass path** — classifyModification was calling getDiffCache() unconditionally, which (under the old per-call mutex) acquired the lock, and (under any backend-init pattern) would eagerly construct the filesystem cache backend at ~/.cache/wfctl/diff/ on the operator's machine even for resources that bypass the cache. Move the getDiffCache call inside the `if cacheable` branch so the bypass path is fully side-effect free. Round-3 already pinned the bypass behavior via TestComputePlan_EmptyProviderID_BypassesCache. 3. **Per-call sync.Mutex contention on getDiffCache hot path** — Under ComputePlan's parallel Diff fan-out (planDiffConcurrency() workers), the per-call mutex on getDiffCache was contention on every cache.Get / cache.Put, especially on cache hits where the Get itself is cheap. Refactor to sync.Once for one-time init + atomic.Pointer[diffcache.Cache] for lock-free reads. Subsequent reads are just an atomic.Load (and a typed deref). The test-swap helper setDiffCacheForTest is updated to Store/Restore directly on the atomic; cleanup seeds a fresh default when there was no prior value (so subsequent tests in the binary still observe a working cache). Addresses Copilot inline comments on PR #528 (round 4): - cmd/wfctl/infra_plan_provider.go:124 (%v → %w) - platform/differ.go:235 (getDiffCache eager call on bypass path) - platform/differ.go:405 (per-call mutex on hot path) Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... ./module/... — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rtion + cache nil-DiffResult as zero-value (Copilot review round 5) Two real fixes from Copilot's re-review of PR #528 round 5 (a third finding, plan/apply discovery duplication, is filed as a follow-up issue rather than addressed in-PR to keep W-3b scope-locked). 1. **DispatchVersionFor docstring vs signature mismatch** — The helper claims to centralize the type assertion + non-implementer defaulting, but its parameter type was `ComputePlanVersionDeclarer`, forcing every call site to type-assert externally. Change the signature to accept `any` and perform the type assertion inside; non-implementers + nil now both return "v1" inside the helper as the docstring already promised. Param is `any` (not interfaces.IaCProvider) to keep the helper package import-free of the engine's interfaces package and to keep non-engine call sites (tests, stubs) frictionless. Updated the only production call site (cmd/wfctl/infra_apply.go) to drop the external type-assert. 2. **Cache no-op when driver.Diff returns (nil, nil)** — The cache.Put was guarded by `fresh != nil`, so providers using the nil-as-no-op convention (a documented option in the (DiffResult|nil, error|nil) return shape) re-Diffed on every ComputePlan call — undermining the cache contract for that whole class of providers. Cache a zero-value DiffResult on (nil, nil) returns; classifyModification's downstream switch already treats zero-value the same as nil (no plan action), so the semantic is preserved while the cache stays effective. New pin: TestComputePlan_NilDiffResult_CachesAsZeroValue verifies that the second ComputePlan against unchanged inputs is served from cache (driver.Diff invoked exactly once across two calls). 3. **Plan/apply provider-discovery duplication** (Copilot finding R5-C, not addressed in this PR) — computePlanForInfraSpecs duplicates the iac.provider discovery + grouping logic in applyInfraModules. Per workspace memory feedback_implementer_scope_bleed, refactoring to a shared helper is a separate task: the duplication exists pre-W-3b (apply was the original; plan was added in W-3b mirroring it intentionally), and the extraction touches code paths W-3b's test plan does not cover. Filed as follow-up rather than expanding W-3b's blast radius. Documented in PR description. Addresses Copilot inline comments on PR #528 (round 5): - iac/wfctlhelpers/dispatch.go:41 (signature vs docstring mismatch) - platform/differ.go:265 (cache write skipped on (nil, nil)) Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closed
4 tasks
4 tasks
…laims (Copilot review round 6) Two doc-comment accuracy fixes from Copilot's re-review of PR #528 round 6 — both surfaced by/exposed in the round-5 changes: 1. **findIaCPluginDir docstring referenced wrong helper** — Round 5 changed wfctlhelpers.DispatchVersionFor to take `any` (a provider value), but findIaCPluginDir's docstring still told callers to pass the returned `computePlanVersion` string through DispatchVersionFor. That call wouldn't type-assert to ComputePlanVersionDeclarer (a string isn't a provider) and would silently default to "v1". Replaced with the correct pattern: string-equality against wfctlhelpers.DispatchVersionV2 at this loader-level seam where only the raw string is in hand. Includes example snippet. 2. **DispatchVersionFor docstring overstated the validation guarantee** — Claimed plugin/sdk.ParseManifest schema-validation means the dispatch only sees {"v1", "v2", ""}. True for callers that load via ParseManifest, but cmd/wfctl/deploy_providers.go's findIaCPluginDir / readIaCPluginComputePlanVersion path uses a minimal json.Unmarshal with NO schema validation — so unknown values CAN reach DispatchVersionFor at runtime. Updated the docstring to flag this honestly and call out that the default-to-v1 behavior is the safety net for those paths (callers must not rely on the validation guarantee). Doc-only; no code change. All packages still build + vet cleanly. Addresses Copilot inline comments on PR #528 (round 6): - cmd/wfctl/deploy_providers.go:107 (wrong helper referenced) - iac/wfctlhelpers/dispatch.go:18 (overstated validation guarantee) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ew round 7) The first table case had `in: ""` and used `tc.in` directly as the t.Run subtest name. Go's testing package silently rewrites empty subtest names to "#00", which is unique enough to run but masks the case identity in -v output and failure reports. Add a `name` field to the table struct and use stable descriptive labels (empty, non_numeric, negative, zero, one, eight, thirty_two, thirty_three_clamped_to_max, one_hundred_clamped_to_max) while still passing the raw `tc.in` to parseConcurrencyEnv. Identical test coverage; clearer reporting. Addresses Copilot inline comment on PR #528 (round 7): - platform/differ_cache_test.go:253 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t review round 8) Two doc-only nits surfaced in Copilot's round-8 re-review of PR #528. Both are accuracy fixes — no behaviour change. 1. **planDiffConcurrencyMin/Max comment overstated "disable"** — The comment said "Below 1 disables concurrency (worse than serial)", but parseConcurrencyEnv clamps values <=0 UP to planDiffConcurrencyMin (=1), which produces effectively-serial dispatch (one Diff in flight), not "disabled". Operators cannot turn the worker pool off, only narrow it to one. Updated the comment to spell that out and call out both clamp directions explicitly. 2. **channelGatedDriver.inFlight docstring claimed "peak"** — The docstring said inFlight tracks the *peak* number of simultaneous Diff goroutines, but the field is just an atomic counter incremented on Diff entry / decremented on return — that's *current* in-flight, not high-water-mark. The actual parallel-dispatch assertion happens via the `entered` channel (one signal per goroutine that reached the gate); inFlight is retained for diagnostic logging + as a sanity invariant (returns to zero post-release). Updated docstring to describe what the field actually is and why it's NOT the assertion vehicle. All packages still build + vet cleanly. Platform tests pass. Addresses Copilot inline comments on PR #528 (round 8): - platform/differ.go:365 (clamp doc) - platform/differ_cache_test.go:375 (peak vs current) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines
+246
to
+253
| cache = getDiffCache() | ||
| key = diffcache.Key{ | ||
| PluginVersion: pluginVersionKey(p), | ||
| Type: spec.Type, | ||
| ProviderID: rs.ProviderID, | ||
| SHAConfig: hash, | ||
| SHAOutputs: configHash(rs.Outputs), | ||
| } |
Contributor
Author
There was a problem hiding this comment.
Real correctness concern. PR #528 was admin-merged before this fix could land here, so I've opened follow-up PR #533 with the full implementation:
- New
configHashErr(map) (string, error)surfaces the marshal failure. - Existing
configHash(map) stringkept as backward-compat wrapper (legacy YAML-derived callers don't reach the failure path; documented). ComputePlan+classifyModificationthread a newhashableflag — cache-key paths bypass when EITHERspec.ConfigORrs.Outputsfails to marshal (same defensive treatment as the empty-ProviderIDpath).- New pins:
TestComputePlan_UnhashableInputs_BypassCache(proves 4 driver calls across 2 invocations on channel-poisoned Outputs, vs. bogus 3 without the fix) +TestConfigHashErr_PropagatesMarshalFailure(unit-level err contract).
Thanks for the catch — this would have been a silently-misclassifying-actions bug in the cache-hit path for any custom provider whose Outputs contain types with broken MarshalJSON.
This was referenced May 4, 2026
intel352
added a commit
that referenced
this pull request
May 4, 2026
…unhashable inputs (#533) * fix(iac): T3.6/T3.5 — configHashErr surfaces marshal failure; cache bypassed for unhashable inputs (Copilot review round 9) configHash silently swallowed json.Marshal errors via `data, _ := json.Marshal(ordered)`, so any input containing a non-marshalable value (channels, functions, cycles, types with broken MarshalJSON) collapsed to the constant sha256("") hash. For the diff-cache key path that meant two genuinely-different resources with such inputs would share the same SHAOutputs (or SHAConfig) and serve each other's cached DiffResult — silently misclassifying actions. In practice IaC configs come from YAML and cannot contain those types, so the bug is unreachable for the common case. Defensive coverage matters for custom-provider Outputs that could conceivably have types with broken MarshalJSON, and for any future code path that passes non-YAML-derived configs into ComputePlan. Fix: 1. **Add configHashErr(map) (string, error)** — error-aware variant that returns ("", err) on marshal failure. Backward compat: keep configHash(map) string as a wrapper that ignores the error (legacy callers operating on YAML-derived configs don't reach this path; documented in updated docstring). 2. **ComputePlan + classifyModification thread a `hashable` flag** — The candidate-bucketing loop calls configHashErr on spec.Config; on failure it sets `hashable=false` on the modCandidate. The cache- keying call site at differ.go:235 ANDs `hashable` into the `cacheable` predicate AND calls configHashErr on rs.Outputs (also bypassing on failure there). Both bypass paths fall through to unconditional driver.Diff dispatch — same defensive treatment as the empty-ProviderID path. Cost is one extra Diff call per resource with unhashable inputs, never correctness. 3. **New pins**: - TestComputePlan_UnhashableInputs_BypassCache — two resources with channel-poisoned Outputs both re-Diff on the second ComputePlan (4 driver calls across 2 invocations); without the fix the two would collide on SHAOutputs="" and one would bogus-hit the other's cache (3 driver calls). - TestConfigHashErr_PropagatesMarshalFailure — unit-level assertion that configHashErr returns ("", err) on chan-input and that the un-suffixed configHash returns "" silently for the same input (backward-compat). Addresses Copilot inline comment on PR #528 (round 9): - platform/differ.go:253 (configHash silently collapses unmarshalable inputs) Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/... ./iac/... ./interfaces/... ./plugin/sdk/... — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(iac): R10 review — restore configHash legacy sha256(nil) on marshal failure for backward-compat (Copilot review) Copilot R9 caught a silent ABI break in T3.6: the new configHash wrapper returned "" for unmarshalable inputs, whereas the pre-T3.6 implementation returned sha256(nil) = e3b0c44... (the well-known sha256 of empty bytes; data is nil after json.Marshal failure). That changes any persisted ResolvedConfigHash / ConfigHash values computed for an unhashable input — they would no longer compare equal across the upgrade boundary. While unmarshalable values are unreachable for YAML-derived configs in practice, the public platform.ConfigHash function is part of the API surface, so the defensive choice is byte-for-byte stability. Restored configHash to the pre-T3.6 sha256-of-best-effort body. Cache-key callers continue to use configHashErr (the strict variant that surfaces the marshal error so the cache is bypassed for that resource — collapsing all unmarshalable inputs to a constant hash would risk cache-key collisions across distinct resources). ComputePlan now falls back to configHash for the stored ResolvedConfigHash when configHashErr fails, while still using the hashable flag to gate cache participation in classifyModification. Test TestConfigHashErr_PropagatesMarshalFailure pinned to the restored legacy contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR W-3b: ComputePlan Diff dispatch + manifest-driven v2 apply
This PR implements W-3b of the IaC conformance + Replace plan
(
docs/plans/2026-05-03-iac-conformance-and-replace.md). Built ontop of W-3a (PR #527, manifest field + helper package + drift
postcondition + diff cache foundation). Branch:
feat/iac-replace-dispatch→main.Summary
W-3a shipped the v2 IaC contract's foundation; W-3b is the binding
runtime change that wires it together. After this PR:
platform.ComputePlanaccepts(ctx, provider, desired, current)and dispatches
ResourceDriver.Diffper existing-resourcecandidate under bounded errgroup concurrency (default 8;
WFCTL_PLAN_DIFF_CONCURRENCY1..32). Replace classification ishonest: the helper emits
Action="replace"wheneverDiffResult.NeedsReplaceis true OR anyFieldChange.ForceNewistrue. (T3.6a → T3.6e)
iac/diffcache(W-3a'sT3.5) under the
(PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs)key tuple. Apply correctness does not depend oncache hits — fresh CI runners always miss and re-Diff. (T3.6f)
wfctl infra plannow loads the gRPC plugin process so it canhonestly emit Replace actions before apply. BREAKING:
plugin-load failure exits non-zero with the literal error
error: failed to load plugin "<name>": <reason>; wfctl infra plan now requires the plugin process to compute Diff (since v0.21.0).No
--no-providerescape hatch (rev3 YAGNI fix); operators whoneed pure offline validation use
wfctl validate. (T3.6b)wfctl infra applybranches on the loaded plugin's manifestiacProvider.computePlanVersion:"v2"routes throughwfctlhelpers.ApplyPlan(W-3a's helper, now wired); anythingelse takes the legacy
provider.Applypath. NO env-var, NOoperator-flippable gate — the v1/v2 routing is plugin-author-
controlled via
plugin.json. (T3.7)printDriftReportIfAny(W-3a/T3.1.5, shipped unwired) now firesin the v2 dispatch path on success OR partial failure, so
operators see the stale-input diagnostic exactly when they need
it most.
Bugs incidentally fixed by this PR
W-3b's binding nature surfaced three pre-existing bugs that the v1
dispatch path either masked or never reached. All three ship fixed:
Delete-via-Apply state leakage — Today:
ComputePlanemitsdelete actions but
DOProvider.Applyhas nocase "delete"(falls through to
default: unknown action). wfctl prunesstate regardless. Result: cloud resource leaks. This PR adds
case "delete"towfctlhelpers.ApplyPlan(shipped in W-3a'sT3.3 doDelete + activated by T3.7's manifest-driven dispatch),
fixing the leakage for v2 plugins. Operators relying on the
(broken) skip behavior may see different outcomes — review
downstream automation that assumed delete actions were no-ops.
ForceNew silently downgraded to Update (issue C from
design) — Pre-W-3b: a provider that sets
NeedsUpdate=truewithone or more
ForceNew=trueFieldChanges (but forgets to setNeedsReplace) would haveAction="update"emitted instead ofreplace. The Update path then either no-op'd the change (whenthe underlying API rejected the field as immutable) or applied
it in-place against an API that interpreted the new value as a
recreate request — both wrong. Fixed in T3.6e:
ComputePlanemits
replacewheneverNeedsReplace=trueORhasForceNew(diff.Changes). Pin:platform/differ_replace_test.go'sTestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace.map[string]boolsilently drops gRPC args at the wire boundary —cmd/wfctl/deploy_providers.go::remoteResourceDriver.Diffwaspassing
current.Sensitive(map[string]bool) directly intothe
argsmap.structpb.NewStructrejectsmap[string]bool(it accepts
map[string]anyonly), and the upstreamplugin/external/convert.go::mapToStructreturns&structpb.Struct{}on err rather than surfacing the typingfailure. Result: every Diff dispatch over gRPC for any provider
whose
ResourceOutput.Sensitivemap was non-nil silentlyobserved
args=map[]on the plugin side — the plugin saw anempty arg map and could not honestly compute a diff. v1 plugins
never tripped this because v1 dispatches
IaCProvider.Planserver-side (no
ResourceDriver.Diffover gRPC); v2's per-resource Diff dispatch surfaces it on the first existing-resource
call. Fix:
sensitiveToAny()converter at the call site(commit
40e07a1). Discovered during T3.9 runtime-launch-validation against an out-of-band gRPC stub plugin (the kind
of v2-only regression that motivated the validation step).
Forward concern (not blocking this PR): the upstream
plugin/external/convert.go::mapToStructstill swallowsNewStructerrors. Any future caller in this codebase (or anyplugin SDK author) that passes an unsupported type to args will
hit the same silent-drop. A follow-up should make
mapToStructsurface or panic on the typing error so the bug class is closed
at the root, not just at the one observed call site.
Architecture decisions recorded
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md—T3.9 runtime-launch-validation ships as an in-tree Go integration
test using the
resolveIaCProviderseam (lifted in T3.6c)rather than as an out-of-process gRPC stub plugin. ADR records
the reasoning + considered alternatives.
Rollout
This PR ships the v2 dispatch path; it does NOT migrate any
existing plugin to declare
iacProvider.computePlanVersion: "v2".v1 plugins (every plugin shipping today, including
workflow-plugin- digitalocean) continue to take the legacyprovider.Applypathwith zero runtime change. P-DO sets
v2in a follow-up PRafter this one merges, at which point its CI exercises the gRPC
roundtrip against W-3b's tip.
Test plan
GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... ./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/...— all green
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md—end-to-end loader-seam test asserts driver.{diff,delete,create}
invocations + identity of deleted ProviderID + created Config
region
captureStderrtest assertsprintDriftReportIfAnyoutput reaches operator on v2 success +partial-failure paths
[Unreleased]calls out the BREAKINGchange to
wfctl infra planplus the empty-desired alignmentwith apply
iacProvider.computePlanVersion: "v2"to exercise the gRPCdispatch end-to-end against a production provider