Skip to content

feat(iac)!: Replace dispatch — ComputePlan calls Diff + apply branches on manifest computePlanVersion (W-3b of 12)#528

Merged
intel352 merged 59 commits into
mainfrom
feat/iac-replace-dispatch
May 4, 2026
Merged

feat(iac)!: Replace dispatch — ComputePlan calls Diff + apply branches on manifest computePlanVersion (W-3b of 12)#528
intel352 merged 59 commits into
mainfrom
feat/iac-replace-dispatch

Conversation

@intel352
Copy link
Copy Markdown
Contributor

@intel352 intel352 commented May 4, 2026

PR W-3b: ComputePlan Diff dispatch + manifest-driven v2 apply

This PR implements W-3b of the IaC conformance + Replace plan
(docs/plans/2026-05-03-iac-conformance-and-replace.md). Built on
top of W-3a (PR #527, manifest field + helper package + drift
postcondition + diff cache foundation). Branch:
feat/iac-replace-dispatchmain.

Summary

W-3a shipped the v2 IaC contract's foundation; W-3b is the binding
runtime change that wires it together. After this PR:

  • platform.ComputePlan accepts (ctx, provider, desired, current)
    and dispatches ResourceDriver.Diff per existing-resource
    candidate under bounded errgroup concurrency (default 8;
    WFCTL_PLAN_DIFF_CONCURRENCY 1..32). Replace classification is
    honest: the helper emits Action="replace" whenever
    DiffResult.NeedsReplace is true OR any FieldChange.ForceNew is
    true. (T3.6a → T3.6e)
  • Per-resource Diff results are cached via iac/diffcache (W-3a's
    T3.5) under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) key tuple. Apply correctness does not depend on
    cache hits — fresh CI runners always miss and re-Diff. (T3.6f)
  • wfctl infra plan now loads the gRPC plugin process so it can
    honestly emit Replace actions before apply. BREAKING:
    plugin-load failure exits non-zero with the literal error
    error: failed to load plugin "<name>": <reason>; wfctl infra plan now requires the plugin process to compute Diff (since v0.21.0).
    No --no-provider escape hatch (rev3 YAGNI fix); operators who
    need pure offline validation use wfctl validate. (T3.6b)
  • wfctl infra apply branches on the loaded plugin's manifest
    iacProvider.computePlanVersion: "v2" routes through
    wfctlhelpers.ApplyPlan (W-3a's helper, now wired); anything
    else takes the legacy provider.Apply path. NO env-var, NO
    operator-flippable gate
    — the v1/v2 routing is plugin-author-
    controlled via plugin.json. (T3.7)
  • printDriftReportIfAny (W-3a/T3.1.5, shipped unwired) now fires
    in the v2 dispatch path on success OR partial failure, so
    operators see the stale-input diagnostic exactly when they need
    it most.

Bugs incidentally fixed by this PR

W-3b's binding nature surfaced three pre-existing bugs that the v1
dispatch path either masked or never reached. All three ship fixed:

  1. Delete-via-Apply state leakage — Today: ComputePlan emits
    delete actions but DOProvider.Apply has no case "delete"
    (falls through to default: unknown action). wfctl prunes
    state regardless. Result: cloud resource leaks. This PR adds
    case "delete" to wfctlhelpers.ApplyPlan (shipped in W-3a's
    T3.3 doDelete + activated by T3.7's manifest-driven dispatch),
    fixing the leakage for v2 plugins. Operators relying on the
    (broken) skip behavior may see different outcomes — review
    downstream automation that assumed delete actions were no-ops.

  2. ForceNew silently downgraded to Update (issue C from
    design) — Pre-W-3b: a provider that sets NeedsUpdate=true with
    one or more ForceNew=true FieldChanges (but forgets to set
    NeedsReplace) would have Action="update" emitted instead of
    replace. The Update path then either no-op'd the change (when
    the underlying API rejected the field as immutable) or applied
    it in-place against an API that interpreted the new value as a
    recreate request — both wrong. Fixed in T3.6e: ComputePlan
    emits replace whenever NeedsReplace=true OR
    hasForceNew(diff.Changes). Pin: platform/differ_replace_test.go's
    TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace.

  3. map[string]bool silently drops gRPC args at the wire boundary
    cmd/wfctl/deploy_providers.go::remoteResourceDriver.Diff was
    passing current.Sensitive (map[string]bool) directly into
    the args map. structpb.NewStruct rejects map[string]bool
    (it accepts map[string]any only), and the upstream
    plugin/external/convert.go::mapToStruct returns
    &structpb.Struct{} on err rather than surfacing the typing
    failure. Result: every Diff dispatch over gRPC for any provider
    whose ResourceOutput.Sensitive map was non-nil silently
    observed args=map[] on the plugin side — the plugin saw an
    empty arg map and could not honestly compute a diff. v1 plugins
    never tripped this because v1 dispatches IaCProvider.Plan
    server-side (no ResourceDriver.Diff over gRPC); v2's per-
    resource Diff dispatch surfaces it on the first existing-resource
    call. Fix: sensitiveToAny() converter at the call site
    (commit 40e07a1). Discovered during T3.9 runtime-launch-
    validation against an out-of-band gRPC stub plugin
    (the kind
    of v2-only regression that motivated the validation step).

    Forward concern (not blocking this PR): the upstream
    plugin/external/convert.go::mapToStruct still swallows
    NewStruct errors. Any future caller in this codebase (or any
    plugin SDK author) that passes an unsupported type to args will
    hit the same silent-drop. A follow-up should make mapToStruct
    surface or panic on the typing error so the bug class is closed
    at the root, not just at the one observed call site.

Architecture decisions recorded

  • docs/adr/007-t3-9-runtime-validation-via-loader-seam.md
    T3.9 runtime-launch-validation ships as an in-tree Go integration
    test using the resolveIaCProvider seam (lifted in T3.6c)
    rather than as an out-of-process gRPC stub plugin. ADR records
    the reasoning + considered alternatives.

Rollout

This PR ships the v2 dispatch path; it does NOT migrate any
existing plugin to declare iacProvider.computePlanVersion: "v2".
v1 plugins (every plugin shipping today, including workflow-plugin- digitalocean) continue to take the legacy provider.Apply path
with zero runtime change. P-DO sets v2 in a follow-up PR
after this one merges, at which point its CI exercises the gRPC
roundtrip against W-3b's tip.

Test plan

  • GOWORK=off go test -race -count=1 ./interfaces/... ./iac/... ./platform/... ./plugin/sdk/... ./cmd/wfctl/... ./module/...
    — all green
  • T3.9 runtime-launch-validation per
    docs/adr/007-t3-9-runtime-validation-via-loader-seam.md
    end-to-end loader-seam test asserts driver.{diff,delete,create}
    invocations + identity of deleted ProviderID + created Config
    region
  • T3.9 drift-report wiring — captureStderr test asserts
    printDriftReportIfAny output reaches operator on v2 success +
    partial-failure paths
  • CHANGELOG entry under [Unreleased] calls out the BREAKING
    change to wfctl infra plan plus the empty-desired alignment
    with apply
  • (Manual, post-merge) P-DO rebases on W-3b tip and sets
    iacProvider.computePlanVersion: "v2" to exercise the gRPC
    dispatch end-to-end against a production provider

intel352 and others added 30 commits May 3, 2026 19:05
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rovider schema

Addresses code-reviewer findings on commit 695a070:

- Important: race on lazy compiledSchema cache. Wrap with sync.Once;
  capture both *jsonschema.Schema and the compile error so concurrent
  callers observe a single deterministic outcome. Adds a 32-goroutine
  ParseManifest stress test that fires under -race to lock in the
  invariant going forward.
- Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers
  cannot mutate the //go:embed slice (defense-in-depth; embed slices
  are technically writable). New test verifies the copy semantics.
- Minor: iacProvider sub-object gains additionalProperties:false so a
  typo like "computeplanversion" or an unknown key is rejected at
  parse time instead of silently defaulting to v1 dispatch. The root
  object stays permissive — existing plugin.json files carry
  version/author/dependencies/etc. and the SDK manifest is a strict
  subset by design. New test covers both the typo-rejection and the
  root-permissivity contracts.
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls
ResourceDriver.Read per resource and returns a copy of the state slice with
Outputs reconciled to the live values. Default concurrency 8 when
Options.Concurrency < 1; otherwise honor the caller's value. On any Read or
driver-resolution failure, returns (nil, err) so callers don't half-persist
a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in
apply pre-step (T2.3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.

Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.

When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
  refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ESH_OUTPUTS)

T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan
read-only state reconciliation. Default OFF: operators get pre-W-2
behavior unless they explicitly opt in.

Activation rules:
- WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default).
- WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) →
  run pre-step.
- WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) →
  no-op. Operators who use the "0"/"false" convention to disable a
  feature get the expected behaviour rather than a presence-only
  foot-gun.
- --skip-refresh → suppress pre-step regardless of env var (for CI
  environments that force the env var on globally).

Behavior: after the existing --refresh drift/prune phase and before the
plan/apply dispatch, discovers iac.provider modules with per-env
resolution, loads current state, and calls
refreshOutputsAcrossProviders to read live Outputs and persist any
field-level changes. On any Read or driver-resolution failure, apply
aborts with the wrapped error from T2.1's helper (no half-persisted
refresh, no plan computed against stale state). Only fires for
infra.* configs (legacy platform.* path is silently skipped).

Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert
this commit. Reverting removes the pre-step entirely (helper file plus
the gated block in infra.go).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:

  1. No deadlock (10s watchdog around the call).
  2. Read called exactly once per ProviderID (atomic per-ID counter).
  3. Every refreshed state carries the live Outputs map — no
     write-into-wrong-slot bug under concurrency.
  4. Concurrent in-flight peak between 2 and the requested cap, proving
     both that parallelism happened AND that the semaphore enforced
     its limit.

The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:

- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
  literal-error contract (load-bearing per T2.7), apply-time pre-step
  semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
  representative examples.

See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.

Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):

  $ mdformat --check docs/WFCTL.md
  Error: File "docs/WFCTL.md" is not formatted.
  exit=1

  This failure is PRE-EXISTING. Verified by checking out the file at
  the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
  mdformat against it: identical error. docs/WFCTL.md has never been
  mdformat-formatted in this repo. Reformatting the entire file is
  out of scope for T2.6 (would introduce a multi-thousand-line
  unrelated diff). T2.6's own additions follow the existing in-file
  conventions exactly.

  $ markdown-link-check docs/WFCTL.md
  FILE: docs/WFCTL.md
    [✓] https://github.com/GoCodeAlone/workflow
    [✓] #build-ui
    [✓] mcp.md
    3 links checked.
  exit=0

  docs/WFCTL.md has zero broken links — including the new
  refresh-outputs section. The directory-wide scan reports 7 broken
  links in unrelated files (self-improvement-tutorial.md,
  getting-started.md, etc.); all are pre-existing and out of scope.

T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):

  $ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
  exit=0

  $ /tmp/wfctl infra refresh-outputs --help
  Usage of infra refresh-outputs:
    -c string
      	Config file (short for --config)
    -concurrency int
      	Maximum concurrent Read calls (default 8)
    -config string
      	Config file
    -e string
      	Environment name (short for --env)
    -env string
      	Environment name (resolves per-module overrides)
  exit=0

  $ cat /tmp/t27-fake.yaml
  modules:
    - name: state-store
      type: iac.state
      config:
        backend: filesystem
        directory: /tmp/t27-fake-state

  $ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
  error: refresh-outputs: provider not configured for env "staging"
  exit=1

  No panic, no stack trace. Stderr line is the verbatim literal pinned
  by T2.7 (plan line 1098), produced by T2.2's
  fmt.Errorf("refresh-outputs: provider not configured for env %q",
  env) at cmd/wfctl/infra_refresh_outputs.go:49.

  PR W-2 mandate (plan line 1101):
  $ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
  ok  	github.com/GoCodeAlone/workflow/iac/refreshoutputs	1.405s
  ok  	github.com/GoCodeAlone/workflow/cmd/wfctl	10.485s

  Manual smoke against staging-PG: not run — no staging-PG available
  in this worktree environment. Plan line 1102 marks this "if
  available", so deferring to the operator landing the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… from plan §T2.3

ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:

- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
  strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
  superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
  plan revert; provenance recorded in the ADR itself.

Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mitempty contract

Addresses code-reviewer findings on commit 13a6fad:

- Important: ReplaceIDMap godoc said "Keyed by the dependent resource
  Name" but the populating site (T3.4 plan §1625) sets
  result.ReplaceIDMap[action.Resource.Name] where action.Resource is the
  REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms
  this. Re-worded to "Keyed by the *replaced* resource's Name" with an
  explicit reference to action.Resource.Name + a sentence on how W-5 JIT
  substitution will use the map (lookup by replaced-resource name to
  obtain the new ProviderID for dependent configs). Locks the contract
  before the field has any consumers.
- Minor: cross-referenced the InputDriftReport sort-stability guarantee
  to its enforcing test (TestComputeDrift_ResultIsSortedByName in
  iac/inputsnapshot/compute_drift_test.go) so the contract is no longer
  free-floating on the field godoc.
- Minor: added TestApplyResult_OmitEmptyContract — table-driven across
  nil and empty-but-non-nil values for all three new fields, asserting
  the JSON keys are absent from the encoded form. Locks the omitempty
  tag behavior so a future refactor cannot silently regress to emitting
  "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
…iver-resolve test

Addresses code-reviewer findings on commit 8416498:

- Important 1 (weak Replace assertion): converted fakeDriver from
  boolean call recorders to integer counters. The 4-action plan
  [create, update, replace, delete] now asserts Create==2, Update==1,
  Delete==2. If "case replace" were silently dropped from
  dispatchAction the counts would shift to 1/1/1 and the test would
  fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that
  isolates Replace via a single-action plan: 1 Delete + 1 Create + 0
  Update. Removes the calledReplace() proxy entirely.
- Important 2 (resolve-driver-error path uncovered): added
  TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises
  fakeProvider.driverErr, asserts the canonical "resolve driver:"
  prefix, and verifies the loop continues past action[0] to action[1]
  (best-effort contract). Folded the loop-continues-after-failure
  coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure
  using a selectiveFakeProvider that errors on one type only — proves
  one action's failure does not block another's success.
- Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to
  fmt.Sprintf("resolve driver: %v", err) since the destination is a
  string field and the wrapping chain dies at the field boundary.
- Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop
  iteration boundary; on cancel, returns the result accumulated so far
  + the ctx error as top-level. Added
  TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel:
  driver receives zero invocations, top-level error is context.Canceled.
- Minor 5 (refFromAction defensive note): added a godoc paragraph
  documenting the same-name-same-type invariant for Replace plans.
  Documenting rather than enforcing — ComputePlan upstream is the
  contract owner.

Minor 2 (uniform error prefixing across sub-functions) intentionally
deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the
final sub-function bodies and can pick the convention once.
…_plan_test

Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when
fingerprintForTest was switched to delegate to inputsnapshot.Compute
instead of computing sha256 inline. cmd/wfctl test build was broken on
HEAD because of the unused imports — surfaced while landing T3.1.5,
which adds a new test file in the same package.

Pure-mechanical cleanup. No behavior change.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

Comment thread cmd/wfctl/infra_plan_provider.go Outdated
Comment thread platform/differ.go Outdated
Comment thread platform/differ.go
…cache + bypass-side-effect-free (Copilot review round 4)

Three real fixes from Copilot's re-review of PR #528 round 4:

1. **loadErr chain lost across runInfraPlan re-wrap (errors.Is/As)** —
   computePlanForInfraSpecs returned `failed to load plugin %q: %v;
   ...` (using %v), losing the underlying error. After runInfraPlan
   re-wraps with `compute plan: %w`, callers could not errors.Is /
   errors.As against the original loader failure (e.g. to differentiate
   "plugin binary missing" from "plugin crashed during handshake").
   Switch the inner wrap to %w. Rendered text is identical to %v.
   New pin: TestRunInfraPlan_FailsLoudOnPluginLoadFailure now asserts
   `errors.Is(err, loadErr)` reaches the sentinel through both wrap
   layers.

2. **getDiffCache called even on the empty-ProviderID bypass path** —
   classifyModification was calling getDiffCache() unconditionally,
   which (under the old per-call mutex) acquired the lock, and (under
   any backend-init pattern) would eagerly construct the filesystem
   cache backend at ~/.cache/wfctl/diff/ on the operator's machine
   even for resources that bypass the cache. Move the getDiffCache
   call inside the `if cacheable` branch so the bypass path is fully
   side-effect free. Round-3 already pinned the bypass behavior via
   TestComputePlan_EmptyProviderID_BypassesCache.

3. **Per-call sync.Mutex contention on getDiffCache hot path** —
   Under ComputePlan's parallel Diff fan-out (planDiffConcurrency()
   workers), the per-call mutex on getDiffCache was contention on
   every cache.Get / cache.Put, especially on cache hits where the
   Get itself is cheap. Refactor to sync.Once for one-time init +
   atomic.Pointer[diffcache.Cache] for lock-free reads. Subsequent
   reads are just an atomic.Load (and a typed deref). The test-swap
   helper setDiffCacheForTest is updated to Store/Restore directly
   on the atomic; cleanup seeds a fresh default when there was no
   prior value (so subsequent tests in the binary still observe a
   working cache).

Addresses Copilot inline comments on PR #528 (round 4):
- cmd/wfctl/infra_plan_provider.go:124 (%v → %w)
- platform/differ.go:235 (getDiffCache eager call on bypass path)
- platform/differ.go:405 (per-call mutex on hot path)

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... ./module/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

Comment thread iac/wfctlhelpers/dispatch.go
Comment thread platform/differ.go Outdated
Comment thread cmd/wfctl/infra_plan_provider.go
…rtion + cache nil-DiffResult as zero-value (Copilot review round 5)

Two real fixes from Copilot's re-review of PR #528 round 5 (a third
finding, plan/apply discovery duplication, is filed as a follow-up
issue rather than addressed in-PR to keep W-3b scope-locked).

1. **DispatchVersionFor docstring vs signature mismatch** — The
   helper claims to centralize the type assertion + non-implementer
   defaulting, but its parameter type was `ComputePlanVersionDeclarer`,
   forcing every call site to type-assert externally. Change the
   signature to accept `any` and perform the type assertion inside;
   non-implementers + nil now both return "v1" inside the helper as
   the docstring already promised. Param is `any` (not
   interfaces.IaCProvider) to keep the helper package
   import-free of the engine's interfaces package and to keep
   non-engine call sites (tests, stubs) frictionless. Updated the
   only production call site (cmd/wfctl/infra_apply.go) to drop the
   external type-assert.

2. **Cache no-op when driver.Diff returns (nil, nil)** — The
   cache.Put was guarded by `fresh != nil`, so providers using the
   nil-as-no-op convention (a documented option in the
   (DiffResult|nil, error|nil) return shape) re-Diffed on every
   ComputePlan call — undermining the cache contract for that whole
   class of providers. Cache a zero-value DiffResult on (nil, nil)
   returns; classifyModification's downstream switch already treats
   zero-value the same as nil (no plan action), so the semantic is
   preserved while the cache stays effective. New pin:
   TestComputePlan_NilDiffResult_CachesAsZeroValue verifies that the
   second ComputePlan against unchanged inputs is served from cache
   (driver.Diff invoked exactly once across two calls).

3. **Plan/apply provider-discovery duplication** (Copilot finding R5-C,
   not addressed in this PR) — computePlanForInfraSpecs duplicates
   the iac.provider discovery + grouping logic in applyInfraModules.
   Per workspace memory feedback_implementer_scope_bleed, refactoring
   to a shared helper is a separate task: the duplication exists
   pre-W-3b (apply was the original; plan was added in W-3b mirroring
   it intentionally), and the extraction touches code paths W-3b's
   test plan does not cover. Filed as follow-up rather than expanding
   W-3b's blast radius. Documented in PR description.

Addresses Copilot inline comments on PR #528 (round 5):
- iac/wfctlhelpers/dispatch.go:41 (signature vs docstring mismatch)
- platform/differ.go:265 (cache write skipped on (nil, nil))

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

Comment thread cmd/wfctl/deploy_providers.go Outdated
Comment thread iac/wfctlhelpers/dispatch.go Outdated
…laims (Copilot review round 6)

Two doc-comment accuracy fixes from Copilot's re-review of PR #528
round 6 — both surfaced by/exposed in the round-5 changes:

1. **findIaCPluginDir docstring referenced wrong helper** — Round 5
   changed wfctlhelpers.DispatchVersionFor to take `any` (a provider
   value), but findIaCPluginDir's docstring still told callers to pass
   the returned `computePlanVersion` string through DispatchVersionFor.
   That call wouldn't type-assert to ComputePlanVersionDeclarer (a
   string isn't a provider) and would silently default to "v1".
   Replaced with the correct pattern: string-equality against
   wfctlhelpers.DispatchVersionV2 at this loader-level seam where only
   the raw string is in hand. Includes example snippet.

2. **DispatchVersionFor docstring overstated the validation
   guarantee** — Claimed plugin/sdk.ParseManifest schema-validation
   means the dispatch only sees {"v1", "v2", ""}. True for callers
   that load via ParseManifest, but cmd/wfctl/deploy_providers.go's
   findIaCPluginDir / readIaCPluginComputePlanVersion path uses a
   minimal json.Unmarshal with NO schema validation — so unknown
   values CAN reach DispatchVersionFor at runtime. Updated the
   docstring to flag this honestly and call out that the default-to-v1
   behavior is the safety net for those paths (callers must not rely
   on the validation guarantee).

Doc-only; no code change. All packages still build + vet cleanly.

Addresses Copilot inline comments on PR #528 (round 6):
- cmd/wfctl/deploy_providers.go:107 (wrong helper referenced)
- iac/wfctlhelpers/dispatch.go:18 (overstated validation guarantee)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

Comment thread platform/differ_cache_test.go
…ew round 7)

The first table case had `in: ""` and used `tc.in` directly as the
t.Run subtest name. Go's testing package silently rewrites empty
subtest names to "#00", which is unique enough to run but masks the
case identity in -v output and failure reports. Add a `name` field
to the table struct and use stable descriptive labels (empty,
non_numeric, negative, zero, one, eight, thirty_two,
thirty_three_clamped_to_max, one_hundred_clamped_to_max) while still
passing the raw `tc.in` to parseConcurrencyEnv. Identical test
coverage; clearer reporting.

Addresses Copilot inline comment on PR #528 (round 7):
- platform/differ_cache_test.go:253

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

Comment thread platform/differ.go Outdated
Comment thread platform/differ_cache_test.go Outdated
…t review round 8)

Two doc-only nits surfaced in Copilot's round-8 re-review of PR #528.
Both are accuracy fixes — no behaviour change.

1. **planDiffConcurrencyMin/Max comment overstated "disable"** —
   The comment said "Below 1 disables concurrency (worse than serial)",
   but parseConcurrencyEnv clamps values <=0 UP to planDiffConcurrencyMin
   (=1), which produces effectively-serial dispatch (one Diff in flight),
   not "disabled". Operators cannot turn the worker pool off, only narrow
   it to one. Updated the comment to spell that out and call out both
   clamp directions explicitly.

2. **channelGatedDriver.inFlight docstring claimed "peak"** — The
   docstring said inFlight tracks the *peak* number of simultaneous
   Diff goroutines, but the field is just an atomic counter
   incremented on Diff entry / decremented on return — that's
   *current* in-flight, not high-water-mark. The actual
   parallel-dispatch assertion happens via the `entered` channel
   (one signal per goroutine that reached the gate); inFlight is
   retained for diagnostic logging + as a sanity invariant (returns
   to zero post-release). Updated docstring to describe what the
   field actually is and why it's NOT the assertion vehicle.

All packages still build + vet cleanly. Platform tests pass.

Addresses Copilot inline comments on PR #528 (round 8):
- platform/differ.go:365 (clamp doc)
- platform/differ_cache_test.go:375 (peak vs current)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intel352 intel352 merged commit a434d19 into main May 4, 2026
17 of 18 checks passed
@intel352 intel352 deleted the feat/iac-replace-dispatch branch May 4, 2026 06:57
@intel352 intel352 requested a review from Copilot May 4, 2026 06:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

Comment thread platform/differ.go
Comment on lines +246 to +253
cache = getDiffCache()
key = diffcache.Key{
PluginVersion: pluginVersionKey(p),
Type: spec.Type,
ProviderID: rs.ProviderID,
SHAConfig: hash,
SHAOutputs: configHash(rs.Outputs),
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real correctness concern. PR #528 was admin-merged before this fix could land here, so I've opened follow-up PR #533 with the full implementation:

  1. New configHashErr(map) (string, error) surfaces the marshal failure.
  2. Existing configHash(map) string kept as backward-compat wrapper (legacy YAML-derived callers don't reach the failure path; documented).
  3. ComputePlan + classifyModification thread a new hashable flag — cache-key paths bypass when EITHER spec.Config OR rs.Outputs fails to marshal (same defensive treatment as the empty-ProviderID path).
  4. New pins: TestComputePlan_UnhashableInputs_BypassCache (proves 4 driver calls across 2 invocations on channel-poisoned Outputs, vs. bogus 3 without the fix) + TestConfigHashErr_PropagatesMarshalFailure (unit-level err contract).

Thanks for the catch — this would have been a silently-misclassifying-actions bug in the cache-hit path for any custom provider whose Outputs contain types with broken MarshalJSON.

intel352 added a commit that referenced this pull request May 4, 2026
…unhashable inputs (#533)

* fix(iac): T3.6/T3.5 — configHashErr surfaces marshal failure; cache bypassed for unhashable inputs (Copilot review round 9)

configHash silently swallowed json.Marshal errors via `data, _ :=
json.Marshal(ordered)`, so any input containing a non-marshalable
value (channels, functions, cycles, types with broken MarshalJSON)
collapsed to the constant sha256("") hash. For the diff-cache key
path that meant two genuinely-different resources with such inputs
would share the same SHAOutputs (or SHAConfig) and serve each
other's cached DiffResult — silently misclassifying actions.

In practice IaC configs come from YAML and cannot contain those
types, so the bug is unreachable for the common case. Defensive
coverage matters for custom-provider Outputs that could conceivably
have types with broken MarshalJSON, and for any future code path
that passes non-YAML-derived configs into ComputePlan.

Fix:

1. **Add configHashErr(map) (string, error)** — error-aware variant
   that returns ("", err) on marshal failure. Backward compat: keep
   configHash(map) string as a wrapper that ignores the error
   (legacy callers operating on YAML-derived configs don't reach
   this path; documented in updated docstring).

2. **ComputePlan + classifyModification thread a `hashable` flag** —
   The candidate-bucketing loop calls configHashErr on spec.Config;
   on failure it sets `hashable=false` on the modCandidate. The cache-
   keying call site at differ.go:235 ANDs `hashable` into the
   `cacheable` predicate AND calls configHashErr on rs.Outputs (also
   bypassing on failure there). Both bypass paths fall through to
   unconditional driver.Diff dispatch — same defensive treatment as
   the empty-ProviderID path. Cost is one extra Diff call per
   resource with unhashable inputs, never correctness.

3. **New pins**:
   - TestComputePlan_UnhashableInputs_BypassCache — two resources
     with channel-poisoned Outputs both re-Diff on the second
     ComputePlan (4 driver calls across 2 invocations); without
     the fix the two would collide on SHAOutputs="" and one would
     bogus-hit the other's cache (3 driver calls).
   - TestConfigHashErr_PropagatesMarshalFailure — unit-level
     assertion that configHashErr returns ("", err) on chan-input
     and that the un-suffixed configHash returns "" silently for
     the same input (backward-compat).

Addresses Copilot inline comment on PR #528 (round 9):
- platform/differ.go:253 (configHash silently collapses unmarshalable inputs)

Tests: GOWORK=off go test -race -count=1 ./platform/... ./cmd/wfctl/...
./iac/... ./interfaces/... ./plugin/sdk/... — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(iac): R10 review — restore configHash legacy sha256(nil) on marshal failure for backward-compat (Copilot review)

Copilot R9 caught a silent ABI break in T3.6: the new configHash
wrapper returned "" for unmarshalable inputs, whereas the pre-T3.6
implementation returned sha256(nil) = e3b0c44... (the well-known
sha256 of empty bytes; data is nil after json.Marshal failure).

That changes any persisted ResolvedConfigHash / ConfigHash values
computed for an unhashable input — they would no longer compare
equal across the upgrade boundary. While unmarshalable values are
unreachable for YAML-derived configs in practice, the public
platform.ConfigHash function is part of the API surface, so the
defensive choice is byte-for-byte stability.

Restored configHash to the pre-T3.6 sha256-of-best-effort body.
Cache-key callers continue to use configHashErr (the strict variant
that surfaces the marshal error so the cache is bypassed for that
resource — collapsing all unmarshalable inputs to a constant hash
would risk cache-key collisions across distinct resources).

ComputePlan now falls back to configHash for the stored
ResolvedConfigHash when configHashErr fails, while still using the
hashable flag to gate cache participation in classifyModification.

Test TestConfigHashErr_PropagatesMarshalFailure pinned to the
restored legacy contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants