feat(iac): ProviderPlanner interface + cross-plugin build CI gate (W-9)#534
Merged
Conversation
…olvedConfigHash + DriftEntry type
…der with preservation sentinel
…ComputeDrift sentinel-honoring
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.
Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no
write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving
both that parallelism happened AND that the semaphore enforced
its limit.
The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
literal-error contract (load-bearing per T2.7), apply-time pre-step
semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
representative examples.
See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.
Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
This failure is PRE-EXISTING. Verified by checking out the file at
the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
mdformat against it: identical error. docs/WFCTL.md has never been
mdformat-formatted in this repo. Reformatting the entire file is
out of scope for T2.6 (would introduce a multi-thousand-line
unrelated diff). T2.6's own additions follow the existing in-file
conventions exactly.
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
docs/WFCTL.md has zero broken links — including the new
refresh-outputs section. The directory-wide scan reports 7 broken
links in unrelated files (self-improvement-tutorial.md,
getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
-c string
Config file (short for --config)
-concurrency int
Maximum concurrent Read calls (default 8)
-config string
Config file
-e string
Environment name (short for --env)
-env string
Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
- name: state-store
type: iac.state
config:
backend: filesystem
directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned
by T2.7 (plan line 1098), produced by T2.2's
fmt.Errorf("refresh-outputs: provider not configured for env %q",
env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available
in this worktree environment. Plan line 1102 marks this "if
available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… from plan §T2.3
ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
plan revert; provenance recorded in the ADR itself.
Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rovider schema Addresses code-reviewer findings on commit 695a070: - Important: race on lazy compiledSchema cache. Wrap with sync.Once; capture both *jsonschema.Schema and the compile error so concurrent callers observe a single deterministic outcome. Adds a 32-goroutine ParseManifest stress test that fires under -race to lock in the invariant going forward. - Minor: ManifestSchemaJSON() now returns bytes.Clone(...) so callers cannot mutate the //go:embed slice (defense-in-depth; embed slices are technically writable). New test verifies the copy semantics. - Minor: iacProvider sub-object gains additionalProperties:false so a typo like "computeplanversion" or an unknown key is rejected at parse time instead of silently defaulting to v1 dispatch. The root object stays permissive — existing plugin.json files carry version/author/dependencies/etc. and the SDK manifest is a strict subset by design. New test covers both the typo-rejection and the root-permissivity contracts.
T2.1 — bounded-concurrency Refresh(ctx, provider, states, opts) that calls ResourceDriver.Read per resource and returns a copy of the state slice with Outputs reconciled to the live values. Default concurrency 8 when Options.Concurrency < 1; otherwise honor the caller's value. On any Read or driver-resolution failure, returns (nil, err) so callers don't half-persist a refresh. Foundation for wfctl infra refresh-outputs (T2.2) and the opt-in apply pre-step (T2.3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.2 — `wfctl infra refresh-outputs [-c CONFIG] [--env ENV] [--concurrency N]`
reads live Outputs for each resource already in state and persists any
field-level changes back to the state backend. Read-only at the cloud
level — never invokes Update or Replace.
Discovers iac.provider modules in the config (with per-env resolution),
groups state entries by their owning iac.provider module (ProviderRef-first,
falling back to provider type when exactly one module of that type exists),
loads each provider once, calls iac/refreshoutputs.Refresh per group, and
SaveResource()s any state whose Outputs map changed.
When the resolved config has no usable iac.provider module for the
requested env, emits the literal error
refresh-outputs: provider not configured for env "<env>"
verbatim per `fmt.Errorf("refresh-outputs: provider not configured for
env %q", env)`. T2.7's runtime-launch-validation asserts against this
exact line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ESH_OUTPUTS) T2.3 — wires iac/refreshoutputs.Refresh into runInfraApply as a pre-plan read-only state reconciliation. Default OFF: operators get pre-W-2 behavior unless they explicitly opt in. Activation rules: - WFCTL_REFRESH_OUTPUTS unset, empty, or unrecognised → no-op (default). - WFCTL_REFRESH_OUTPUTS="1"/"true"/"t" (strconv.ParseBool truthy) → run pre-step. - WFCTL_REFRESH_OUTPUTS="0"/"false"/"f" (strconv.ParseBool falsey) → no-op. Operators who use the "0"/"false" convention to disable a feature get the expected behaviour rather than a presence-only foot-gun. - --skip-refresh → suppress pre-step regardless of env var (for CI environments that force the env var on globally). Behavior: after the existing --refresh drift/prune phase and before the plan/apply dispatch, discovers iac.provider modules with per-env resolution, loads current state, and calls refreshOutputsAcrossProviders to read live Outputs and persist any field-level changes. On any Read or driver-resolution failure, apply aborts with the wrapped error from T2.1's helper (no half-persisted refresh, no plan computed against stale state). Only fires for infra.* configs (legacy platform.* path is silently skipped). Rollback: unset WFCTL_REFRESH_OUTPUTS, pass --skip-refresh, or revert this commit. Reverting removes the pre-step entirely (helper file plus the gated block in infra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.5 — pure-package stress test in iac/refreshoutputs/. Drives Refresh
with 100 fake resources at Concurrency=8 and asserts:
1. No deadlock (10s watchdog around the call).
2. Read called exactly once per ProviderID (atomic per-ID counter).
3. Every refreshed state carries the live Outputs map — no
write-into-wrong-slot bug under concurrency.
4. Concurrent in-flight peak between 2 and the requested cap, proving
both that parallelism happened AND that the semaphore enforced
its limit.
The countingDriver introduces a 5ms sleep per Read so the bounded pool
actually queues at the cap (5ms × 100 / 8 ≈ 63ms total at peak; well
under the 10s watchdog). Test runs ~1.5s wall.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
T2.6 — adds the infra refresh-outputs section to docs/WFCTL.md:
- New row in the Command Tree mermaid graph.
- New row in the infra Action table.
- Dedicated #### subsection with usage, flag table, behavior summary,
literal-error contract (load-bearing per T2.7), apply-time pre-step
semantics (WFCTL_REFRESH_OUTPUTS, --skip-refresh), and three
representative examples.
See also: docs/adr/006-wfctl-refresh-outputs-env-var-parsebool.md
records the T2.3 plan-deviation (ParseBool vs plan-literal presence
check) that the docs in this commit accurately reflect.
Verification — plan §T2.6 line 1090 invocation `mdformat --check
docs/WFCTL.md && find docs -name "*.md" -exec markdown-link-check {} +`
ran with locally-installed mdformat 1.0.0 (pip) and markdown-link-check
3.14.2 (npm):
$ mdformat --check docs/WFCTL.md
Error: File "docs/WFCTL.md" is not formatted.
exit=1
This failure is PRE-EXISTING. Verified by checking out the file at
the W-2 T2.2 tip (181e579) before any T2.6 edits and rerunning
mdformat against it: identical error. docs/WFCTL.md has never been
mdformat-formatted in this repo. Reformatting the entire file is
out of scope for T2.6 (would introduce a multi-thousand-line
unrelated diff). T2.6's own additions follow the existing in-file
conventions exactly.
$ markdown-link-check docs/WFCTL.md
FILE: docs/WFCTL.md
[✓] https://github.com/GoCodeAlone/workflow
[✓] #build-ui
[✓] mcp.md
3 links checked.
exit=0
docs/WFCTL.md has zero broken links — including the new
refresh-outputs section. The directory-wide scan reports 7 broken
links in unrelated files (self-improvement-tutorial.md,
getting-started.md, etc.); all are pre-existing and out of scope.
T2.7 runtime-launch-validation transcript (folded into this commit
body per the "Files: none new" plan note for T2.7):
$ GOWORK=off go build -o /tmp/wfctl ./cmd/wfctl
exit=0
$ /tmp/wfctl infra refresh-outputs --help
Usage of infra refresh-outputs:
-c string
Config file (short for --config)
-concurrency int
Maximum concurrent Read calls (default 8)
-config string
Config file
-e string
Environment name (short for --env)
-env string
Environment name (resolves per-module overrides)
exit=0
$ cat /tmp/t27-fake.yaml
modules:
- name: state-store
type: iac.state
config:
backend: filesystem
directory: /tmp/t27-fake-state
$ /tmp/wfctl infra refresh-outputs -c /tmp/t27-fake.yaml --env staging
error: refresh-outputs: provider not configured for env "staging"
exit=1
No panic, no stack trace. Stderr line is the verbatim literal pinned
by T2.7 (plan line 1098), produced by T2.2's
fmt.Errorf("refresh-outputs: provider not configured for env %q",
env) at cmd/wfctl/infra_refresh_outputs.go:49.
PR W-2 mandate (plan line 1101):
$ GOWORK=off go test ./iac/refreshoutputs/... ./cmd/wfctl/... -count=1 -race
ok github.com/GoCodeAlone/workflow/iac/refreshoutputs 1.405s
ok github.com/GoCodeAlone/workflow/cmd/wfctl 10.485s
Manual smoke against staging-PG: not run — no staging-PG available
in this worktree environment. Plan line 1102 marks this "if
available", so deferring to the operator landing the PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… from plan §T2.3
ADR 006 — formalises the spec-vs-quality-review trade-off recorded
during W-2 T2.3 review:
- Plan §T2.3 line 1061 specified `os.Getenv("WFCTL_REFRESH_OUTPUTS") != ""`.
- Code-reviewer flagged this as a foot-gun (=0 mis-enables).
- Implementation at cmd/wfctl/infra_apply_refresh_pre.go (bfd1bbe) uses
strconv.ParseBool so falsey values explicitly disable.
- Spec-reviewer accepted post-hoc and requested this ADR per
superpowers:recording-decisions.
- Team-lead approved option-1 (approve-as-is + follow-up ADR) over a
plan revert; provenance recorded in the ADR itself.
Captures the rejected alternative, the rationale, references back to
the plan spec, the implementation site, the pinning test, and the
operator-facing docs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ReplaceIDMap fields
…mitempty contract Addresses code-reviewer findings on commit 13a6fad: - Important: ReplaceIDMap godoc said "Keyed by the dependent resource Name" but the populating site (T3.4 plan §1625) sets result.ReplaceIDMap[action.Resource.Name] where action.Resource is the REPLACED resource. The roundtrip fixture {"vpc":"new-uuid"} confirms this. Re-worded to "Keyed by the *replaced* resource's Name" with an explicit reference to action.Resource.Name + a sentence on how W-5 JIT substitution will use the map (lookup by replaced-resource name to obtain the new ProviderID for dependent configs). Locks the contract before the field has any consumers. - Minor: cross-referenced the InputDriftReport sort-stability guarantee to its enforcing test (TestComputeDrift_ResultIsSortedByName in iac/inputsnapshot/compute_drift_test.go) so the contract is no longer free-floating on the field godoc. - Minor: added TestApplyResult_OmitEmptyContract — table-driven across nil and empty-but-non-nil values for all three new fields, asserting the JSON keys are absent from the encoded form. Locks the omitempty tag behavior so a future refactor cannot silently regress to emitting "initial_input_snapshot": {} / "input_drift_report": [] / "replace_id_map": {}.
…iver-resolve test Addresses code-reviewer findings on commit 8416498: - Important 1 (weak Replace assertion): converted fakeDriver from boolean call recorders to integer counters. The 4-action plan [create, update, replace, delete] now asserts Create==2, Update==1, Delete==2. If "case replace" were silently dropped from dispatchAction the counts would shift to 1/1/1 and the test would fail. Added TestApplyPlan_ReplaceDispatchesViaDeleteThenCreate that isolates Replace via a single-action plan: 1 Delete + 1 Create + 0 Update. Removes the calledReplace() proxy entirely. - Important 2 (resolve-driver-error path uncovered): added TestApplyPlan_ResolveDriverErrorRecordsActionError which exercises fakeProvider.driverErr, asserts the canonical "resolve driver:" prefix, and verifies the loop continues past action[0] to action[1] (best-effort contract). Folded the loop-continues-after-failure coverage into a separate TestApplyPlan_LoopContinuesAfterPerActionFailure using a selectiveFakeProvider that errors on one type only — proves one action's failure does not block another's success. - Minor 1 (wasted %w): switched fmt.Errorf(...).Error() to fmt.Sprintf("resolve driver: %v", err) since the destination is a string field and the wrapping chain dies at the field boundary. - Minor 3 (ctx.Done not checked): added ctx.Err() check at the loop iteration boundary; on cancel, returns the result accumulated so far + the ctx error as top-level. Added TestApplyPlan_CtxCancellationStopsLoop covering pre-call cancel: driver receives zero invocations, top-level error is context.Canceled. - Minor 5 (refFromAction defensive note): added a godoc paragraph documenting the same-name-same-type invariant for Replace plans. Documenting rather than enforcing — ComputePlan upstream is the contract owner. Minor 2 (uniform error prefixing across sub-functions) intentionally deferred to T3.2/T3.3/T3.4 per reviewer guidance — those tasks own the final sub-function bodies and can pick the convention once.
…_plan_test Imports were left orphaned by W-1 PR #523 (commit 48f7a0c) when fingerprintForTest was switched to delegate to inputsnapshot.Compute instead of computing sha256 inline. cmd/wfctl test build was broken on HEAD because of the unused imports — surfaced while landing T3.1.5, which adds a new test file in the same package. Pure-mechanical cleanup. No behavior change.
…safe + tolerant of mid-apply env unset)
W-3b T3.6d. Updates the 4 cross-package ComputePlan call sites in module/infra_module_integration_test.go to the new (ctx, provider, …) signature. Lifts the no-op fake into a small public test helper at iac/iactest/fakeprovider.go so the same shape no longer needs to be re-declared every time a new package wants to satisfy the interface. Folds in the T3.6c review's IMPORTANT follow-up: cmd/wfctl's computePlanForInfraSpecs now dispatches via the same computeInfraPlan seam the apply path uses (no parallel seam variable; one override point serves both call sites). Plan-loop body is wrapped in an IIFE so each provider's closer fires after its group is computed instead of deferring to function exit (multi-provider plan no longer holds N gRPC connections open at once). Drops the duplicated planNoopProvider and applyV2RecordingProvider no-op implementations in cmd/wfctl tests in favor of the shared iactest.NoopProvider. Three structurally-identical 14-method shells become one. Atomic counters carried forward where used. Doc updates: - godoc on computePlanForInfraSpecs corrected: groups are concatenated in first-reference-in-`desired` order, not iac.provider declaration order (matches actual code). - CHANGELOG entry calls out the empty-desired alignment with apply (loop over groupOrder is empty when no specs reference any provider; use `wfctl infra destroy --dry-run` to preview teardown).
…tion when ForceNew or NeedsReplace
W-3b T3.6e — the binding TDD red→green commit for the v2 IaC contract
(rev3 fix for the cycle-2 self-contradiction: test + impl ship in the
same SHA, no t.Skip placeholder).
ComputePlan now classifies each existing resource via
p.ResourceDriver(spec.Type).Diff(ctx, spec, currentOut), running the
per-resource Diff calls in parallel under errgroup with a bounded
worker pool (default 8; WFCTL_PLAN_DIFF_CONCURRENCY env var override
clamped 1..32). Action emission:
- replace, when DiffResult.NeedsReplace OR any FieldChange.ForceNew
is true (the latter closes design issue C — pre-W-3b ForceNew was
silently downgraded to update);
- update, when DiffResult.NeedsUpdate is true and replace did not
fire;
- skip, when neither flag is set.
Net-new resources still emit create without dispatching Diff;
resources removed from desired still emit delete in reverse-dep order.
Nil-tolerance contract preserved: if p is nil, or if
p.ResourceDriver(typ) returns (nil, nil) for a resource type,
ComputePlan falls back to the legacy ConfigHash compare for the
affected resources. Replace cannot be expressed via the legacy path —
callers needing Replace must supply a provider whose drivers implement
Diff. Per-resource driver.Diff errors propagate via errgroup so
operators see the underlying cause (rate limit, network, etc.).
Test surface (platform/differ_replace_test.go, NEW; ships in this
commit per the rev3 atomicity rule):
- TestComputePlan_NeedsReplaceEmitsReplaceAction
- TestComputePlan_ForceNewWithoutNeedsReplace_StillEmitsReplace
- TestComputePlan_NeedsUpdateWithoutForceNew_EmitsUpdate
- TestComputePlan_DiffReturnsNoChanges_EmitsNothing
- TestComputePlan_NilProvider_FallsBackToConfigHash
- TestComputePlan_NilDriver_FallsBackToConfigHash
- TestComputePlan_DriverDiffError_PropagatesAsError
platform/fake_provider_test.go extended with newFakeProviderWithDiff
helper; in-package no-op fakeProvider/fakeDriver kept (cannot collapse
to iac/iactest until cache_test in T3.6f also depends on the helper —
deferred to keep T3.6e's diff bounded).
Carry-forward notes addressed:
- T3.6a note 1: dropped unused *testing.T param from newFakeProvider().
- T3.6a note 2: added compile-time interface conformance asserts on
fakeProvider and fakeDriver.
- T3.6a note 3: nil-provider AND nil-driver guards baked in; covered
by two explicit tests.
- T3.6a note 4: rewrote fake_provider_test.go godoc to behavior-based
phrasing.
cmd/wfctl test fakes updated to match the new dispatch model:
- readDriver.Diff now returns NeedsUpdate=true (the adoption tests
rely on the post-adopt ComputePlan emitting update; pre-W-3b that
was the ConfigHash compare's job).
- refreshOutputsCmdFakeDriver.Diff now returns (nil, nil) instead of
panicking — the refresh-outputs test fixture only exercises Read.
W-3b T3.6f. Wires the iac/diffcache package (W-3a/T3.5) into classifyModification: cache.Get is consulted before each ResourceDriver.Diff dispatch under the (PluginVersion, Type, ProviderID, SHAConfig, SHAOutputs) tuple; on hit, the cached DiffResult is used directly; on miss, the freshly-computed result is Put into the cache. Apply-time correctness does not depend on cache hits — fresh CI runners always miss and re-Diff (the cache is purely an amortization optimization for repeated `wfctl infra plan` against the same checkout). Cache backend selection follows iac/diffcache's WFCTL_DIFFCACHE env var contract: unset → filesystem (~/.cache/wfctl/diff/); ":memory:" → in-memory; "disabled" → noop. The package-level cache instance is lazy-initialised on first ComputePlan call and shared across subsequent calls; tests in the same package may swap it via the internal-package setDiffCacheForTest helper. platform/main_test.go (NEW) sets WFCTL_DIFFCACHE=disabled at TestMain so the platform test suite never reads/writes the developer's filesystem cache and so cache state cannot leak across tests with incidentally-aligned cache keys (caught during integration: T3.6e's Replace-emission test was Putting a result that polluted later update/no-op tests). Folds in the T3.6e code-review IMPORTANT carry-forwards (since both fixes touch platform/): - Note 1 (env-clamping testability): extract parseConcurrencyEnv as a pure function; new TestParseConcurrencyEnv table-driven test covers empty, non-numeric, "0", "1", "8", "32", "33", "100", "-5". - Note 2 (parallel-dispatch correctness): new TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff exercises N=5 modification candidates, asserts driver.diffCount.Load() == 5 and the resulting plan has 5 actions. - Note 3 (driver returns nil DiffResult): explicit test TestComputePlan_DriverReturnsNilDiff_EmitsNothing. And T3.6e adversarial-review minor cleanups: - Note 4 (i := i shadowing redundant in Go 1.22+): dropped. - Note 5 (errSentinel uses custom errFromTest): replaced with errors.New. - Note 7 (concurrency contract on ComputePlan godoc): added — p and the ResourceDriver instances it returns MUST be safe for concurrent use. New tests (3 cache-behaviour scenarios in differ_cache_test.go): - TestComputePlan_CacheHitSkipsDiff (second call against unchanged inputs hits cache; diffCount stays at 1) - TestComputePlan_CacheMissesOnDifferentInputs (varying SHAConfig forces re-dispatch) - TestComputePlan_NoopCacheNeverHits (disabled backend always re-dispatches)
…est (Copilot review) Strengthens the count-only TestComputePlan_ParallelDispatch_AllCandidatesObserveDiff (landed in T3.6f) per team-lead's explicit request: a regression that accidentally serialized Diff dispatch (e.g., g.SetLimit(1)) would still pass the count-only assertion as long as every candidate eventually got dispatched. The new TestComputePlan_ParallelDiffDispatch_InFlightGoroutinesObserved uses a channel-gated driver to prove ≥2 Diff goroutines are simultaneously in-flight before any returns: regression to serial dispatch would hang on the second `<-entered` and time out at 5s. Pure addition (no production-code change). cacheTestProvider.driver loosened from *cacheTestDriver to interfaces.ResourceDriver so the new channelGatedDriver shares the provider shell.
…parator (Copilot review)
Code-reviewer flagged the T3.6f cache PluginVersion key as fragile:
composing via `p.Name() + "@" + p.Version()` would let two
genuinely-different providers — `("foo", "bar@1.0")` vs
`("foo@bar", "1.0")` — collide on the literal string `"foo@bar@1.0"`
and serve each other's cached DiffResults. Today's registered
providers (digitalocean, dockercompose, mock) don't carry `@` in
either field so no observed bug, but there's no compile-time guard
against a future provider declaring `do@enterprise` or similar.
Replace with sha256(name + "\x00" + version) — fixed-length, NUL is
invalid in both fields by Unicode convention, ambiguity-free.
Matches how configHash already keys per-config inputs.
Three regression tests pin the fix:
- TestPluginVersionKey_NoCollisionOnAtSeparator (the actual bug)
- TestPluginVersionKey_NilProvider (defensive — empty key, no panic)
- TestPluginVersionKey_Stable (deterministic across calls)
Pure additive — no change to any existing test outcome. The cache
re-keys against the new digest, which means any DiffResults persisted
under the old `name@version` keys will miss on the next plan and
re-Diff naturally (cache misses are correct by design).
…tePlanVersion W-3b T3.7. Routes apply through wfctlhelpers.ApplyPlan when the loaded plugin's plugin.json declares iacProvider.computePlanVersion: v2 (read at provider load time and surfaced via the optional ComputePlanVersionDeclarer interface). Providers that don't declare the field, or declare anything other than "v2", take the legacy provider.Apply path. rev2/rev3-locked: NO env-var, NO operator-flippable gate. The v1/v2 routing is plugin-author-controlled via plugin.json from day 1 — there is no transitional WFCTL_USE_V2_APPLY flag to misuse. Wires the printDriftReportIfAny helper (added unwired in W-3a/T3.1.5 as foundation only). The v2 dispatch path is the production caller that surfaces the InputDriftReport to stderr after a successful ApplyPlan return; v1 path remains untouched per the W-3a "zero runtime change for v1 plugins" invariant. New plumbing: - iac/wfctlhelpers/dispatch.go (NEW): ComputePlanVersionDeclarer interface + DispatchVersionV2 const + DispatchVersionFor helper. Single override point for the dispatch decision. - iac/iactest/fakeprovider.go: NoopProvider gains DispatchVersion + ProviderVersion fields and ComputePlanVersion() method so tests drive both v1 (default empty) and v2 paths through the shared fake. - cmd/wfctl/deploy_providers.go: iacPluginManifest reads top-level iacProvider.computePlanVersion alongside existing capabilities.iacProvider.name; findIaCPluginDir returns the version; readIaCPluginComputePlanVersion is the load-time helper; remoteIaCProvider stores the value and exposes it via ComputePlanVersion() to satisfy the optional interface. (Re-reads plugin.json once per provider load rather than threading through loadIaCPlugin's 4-tuple var-seam — keeps the seam signature stable for the existing test override; cost is one tiny os.ReadFile vs the gRPC start.) - cmd/wfctl/infra_apply.go: applyV2ApplyPlanFn = wfctlhelpers.ApplyPlan test seam + dispatch branch in applyWithProviderAndStore. Drift report printed to writer on success (no-op when empty). - cmd/wfctl/infra_apply_v2_test.go: 3 new tests cover TestApplyWithProviderAndStore_V2RoutesThroughWfctlhelpers (v2 routes), TestApplyWithProviderAndStore_V1FallsThroughToProviderApply (v1/un-declared routes legacy), TestApplyWithProviderAndStore_V2 PrintsDriftReport (drift wiring asserted via writer-buffer substring). v1 fixture v1RecordingProvider intentionally does NOT implement ComputePlanVersionDeclarer to prove the dispatcher's "default to v1 when un-declared" branch.
…rage (Copilot review)
Code-reviewer flagged 3 IMPORTANT items in T3.7:
1. Comment/code mismatch on drift-report timing. The comment promised
"Run on success or partial failure" but the code gated on
`err == nil` (success only). The contract the comment described
is the more useful behavior — operators most need the
stale-input diagnostic when an apply fails ("which input went
stale during the failed apply?"). Without it, the failure error
and the "what changed" context are disconnected.
Fix: gate on `result != nil` instead of `err == nil`.
printDriftReportIfAny already no-ops on empty/nil reports so
unconditional-on-result-non-nil is safe.
2. No test for the drift-on-partial-failure path. Added
TestApplyWithProviderAndStore_V2PrintsDriftReportOnPartialFailure
which has applyV2ApplyPlanFn return (resultWithDrift, applyErr)
and asserts both: (a) the err propagates, AND (b) the drift
report still reaches the writer.
3. Optional-interface coverage gap. Two semantically-different "v1"
paths exist:
- Path A: provider doesn't implement ComputePlanVersionDeclarer
at all → type-assert fails → legacy. Covered by
v1RecordingProvider.
- Path B: provider implements interface but ComputePlanVersion()
returns "" (the realistic mid-transition state for v1 plugins
after the SDK update lands but before they migrate) → type-
assert succeeds, DispatchVersionFor returns "v1" → legacy.
Was untested.
Added TestApplyWithProviderAndStore_V1Path_DeclarerReturnsEmpty
using iactest.NoopProvider{DispatchVersion: ""}, which always
implements the interface (the method exists on the type). Pins
Path B specifically.
Pure correctness fixes — no signature change, no behavior change for
the success-only or v1-RecordingProvider paths.
…onversion
cmd/wfctl/deploy_providers.go remoteResourceDriver.Diff was passing
current.Sensitive (map[string]bool) directly into the args map.
structpb.NewStruct rejects map[string]bool — it accepts map[string]any
only — and the upstream plugin/external/convert.go::mapToStruct
returns &structpb.Struct{} on err rather than surfacing the typing
failure. Result: every Diff dispatch over gRPC for any provider whose
ResourceOutput.Sensitive map was non-nil (or even an empty
map[string]bool{}) silently observed args=map[] on the plugin side.
v1 plugins never tripped this because v1 dispatches IaCProvider.Plan
server-side (no ResourceDriver.Diff over gRPC). v2 (W-3b T3.7's
manifest-driven dispatch) surfaces it immediately on the first
existing-resource Diff call.
Fix: convert via sensitiveToAny() to the map[string]any shape
NewStruct accepts. Returns nil for empty/nil input so the wire stays
trim-friendly. Bug discovered during W-3b T3.9 runtime-launch
validation against an out-of-band gRPC stub plugin; the canonical
T3.9 in-tree test ships separately as a loader-seam Go integration
test (per team-lead direction + plan precedent at plugin/sdk/iaclint/).
Will surface in T3.10's PR description as a third
incidentally-fixed-by-W-3b bug.
W-3b T3.9. Exercises the full v2 dispatch chain — config parse →
state load → provider load (via the resolveIaCProvider seam from
T3.6c) → ComputePlan Diff dispatch (T3.6e/f) →
wfctlhelpers.ApplyPlan (T3.7's manifest-driven branch) → Replace
decomposition into Delete + Create → printDriftReportIfAny — by
injecting a Go in-process v2-declaring provider through the package-
level seam. No out-of-process gRPC binary or plugin.json under
internal/testdata/.
# ADR 007 — non-trivial deviation from plan-literal
Plan §T3.9 specified "Build a real gRPC-loaded stub provider plugin
in internal/testdata/stub-provider/." Team-lead authorized switching
to in-tree loader-seam validation per:
1. Plan precedent cite (plugin/sdk/iaclint/) is itself a Go
test-helper package, not a runnable binary.
2. Real-gRPC runtime validation lands in P-DO when DO sets
computePlanVersion: v2 in its plugin.json.
3. Hours-of-stub-plumbing cost doesn't earn proportional coverage
vs. T3.6e/f + T3.7 unit tests + this loader-seam end-to-end.
4. W-7 conformance suite is the recurring cross-PR gRPC harness.
Full reasoning + considered alternatives in
docs/adr/007-t3-9-runtime-validation-via-loader-seam.md.
# Tests
- TestApply_V2_LoaderSeamDispatch_EndToEnd:
- Writes a real config + filesystem state seeded with vpc
region=nyc3 (under iacStateRecord shape).
- Sets desired region=nyc1.
- Substitutes the resolveIaCProvider seam to return a Go provider
that declares v2 + has a driver returning NeedsReplace=true.
- Calls applyInfraModules (the production runInfraApply
entrypoint) and asserts driver.diffCount == 1, deleteCount ==
1, createCount == 1, plus exact identity of the deleted
ProviderID and the created Config["region"].
- TestApply_V2_LoaderSeam_DriftReportPrinted:
- Same loader-seam setup + applyV2ApplyPlanFn substitution
returning InputDriftReport with one entry.
- Captures os.Stderr and asserts the FormatStaleError block
reaches the operator (drift-report wiring T3.7 added is
end-to-end alive in the v2 loader path).
# Test infrastructure
- cmd/wfctl/main_test.go: NEW TestMain forces
WFCTL_DIFFCACHE=disabled so the platform diffcache (process-
scoped via getDiffCache lazy init) doesn't observe stale entries
from a developer's local ~/.cache/wfctl/diff/ as false-positive
cache hits skipping driver Diff dispatch. Same pattern as
platform/main_test.go from T3.6f. Caught during dev when the
end-to-end test failed in the full cmd/wfctl test run but passed
in isolation.
# Bug-class context
The Option-A draft (real gRPC binary; not retained on this branch
per the ADR) surfaced a real wfctl bug fixed in commit 40e07a1
(remoteResourceDriver.Diff sensitiveToAny conversion). The bug
exists independent of which T3.9 option ships; the fix is in tree
and surfaces in T3.10's PR description as the third W-3b
incidentally-fixed bug.
W-3b T3.10. Stages the W-3b PR body text in docs/prs/w3b-pr-body.md as a stable artifact the team-lead can copy-paste at PR-open time. Pure-additive doc; no code changes. Captures all three incidentally-fixed bugs surfaced during W-3b's binding dispatch wiring: 1. Delete-via-Apply state leakage (T3.3 doDelete + T3.7 dispatch) 2. ForceNew silently downgraded to Update (T3.6e replace emission) 3. map[string]bool drops gRPC args silently — sensitiveToAny converter (commit 40e07a1; surfaced during T3.9 runtime validation; v1 plugins never tripped it) Includes summary, BREAKING-change call-out, ADR reference, rollout notes, and test plan.
Per spec-reviewer's adversarial review of the prior keeps-grpc-stub variant: the durability invariant for recording-decisions requires preserving ALL transitions of a deliberation, not just the final landing. The original ADR (loader-seam variant) recorded only one team-lead direction; the keeps-grpc-stub variant (since superseded) recorded only one reversal. Neither captured the full B → A → B → A → B oscillation that played out during T3.9 execution. This commit: - Status header updated to "Accepted (with extensive deliberation history — see Decision history section)". - Context section adjusted to preface the deliberation history rather than imply a single-direction trajectory. - New Decision history section lists all 5 transitions with verbatim team-lead quotes + per-transition implementer action. - Final paragraph captures the meta-lesson: when team-lead path- flips mid-execution, reviewer + implementer should refuse to proceed and force explicit disambiguation. Both reviewers endorsed this hold during transition 4; the strict-interpretation invariant from using-superpowers was the operative rule. Pure ADR amendment; no code changes. Branch state (c9101ba T3.9 loader-seam + d2e50d4 T3.10 PR body) unaffected. Closes spec-reviewer's Issue 1 from c9101ba pre-review: "ADR-history erasure: cherry-picking 92f060e onto 40e07a1 erased the durable record of team-lead's 'Path #1 — keep A' reversal. Future branch-readers will see no record of why Option A was considered + rejected."
…10 user override)
…per user override)
…ator precedent (T9.3 quality fix)
…ss-plugin gate (T9.2 quality fix)
Contributor
There was a problem hiding this comment.
Pull request overview
Adds an additive IaC extension seam (ProviderPlanner) to the public interfaces/ package, plus a new GitHub Actions gate that builds key external IaC plugins against the PR’s checkout to catch cross-repo compile breaks early. The PR also documents the new interface via an adapter author guide, ADR, and a brief entry in DOCUMENTATION.md.
Changes:
- Introduce
interfaces.ProviderPlanneras an optional interface for custom plan computation (PlanV2). - Add a
cross-plugin-build-testGitHub Actions workflow that checks out AWS/GCP/Azure plugin repos and runsgo buildwith a localreplaceto the PR’s workflow checkout. - Add documentation: adapter author guide + ADR +
DOCUMENTATION.mdentry.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
interfaces/iac_provider.go |
Adds the new optional ProviderPlanner interface to the IaC provider API surface. |
interfaces/iac_provider_planner_test.go |
Adds a compile-time-ish test asserting a mock type satisfies ProviderPlanner. |
DOCUMENTATION.md |
Documents ProviderPlanner and links to the guide + ADR. |
docs/iac/providerplanner.md |
New adapter author guide explaining when/how to implement and consume ProviderPlanner. |
docs/adr/009-providerplanner-included-per-user-override.md |
ADR recording the decision/ratification to ship the interface now. |
.github/workflows/cross-plugin-build-test.yml |
New CI workflow to build external IaC plugins against the PR’s code via go mod edit -replace. |
⏱ Benchmark Results✅ No significant performance regressions detected. benchstat comparison (baseline → PR)
|
…s (Copilot review) Six Copilot inline findings + CodeQL workflow-permissions warning: 1. docs/iac/providerplanner.md: ComputePlan in v0.21.0 dispatches driver.Diff directly (in platform/differ.go); it does NOT call IaCProvider.Plan. The reverse is true (Plan delegates to ComputePlan in some implementations). Updated the call-chain description and the illustrative dispatch-site code block to reference the actual file (platform/differ.go) so adapter authors don't follow the wrong call chain. 2. docs/adr/009: replaced the personal email reference with "the workspace owner" so ADR provenance doesn't embed PII. 3. interfaces/iac_provider_planner_test.go: now actually verifies the additivity claim by reusing the package's existing mockProvider as the negative case — runtime assertion confirms mockProvider does NOT satisfy ProviderPlanner. Moved file to interfaces_test package to share fixtures. 4. .github/workflows/cross-plugin-build-test.yml: explicit `permissions: contents: read` (CodeQL workflow-permissions guidance); added `env: GOPRIVATE/GONOSUMCHECK` matching ci.yml + codeql.yml so downstream plugin builds resolve github.com/GoCodeAlone/* deps consistently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
W-9 from the IaC conformance plan (Phase 3 final). Adds the ProviderPlanner optional interface + cross-plugin build CI gate + adapter author guide, per ADR 009.
This PR closes the design's W-9 commitment ratified under user Option C (ratified 2026-05-03): ship ProviderPlanner now rather than defer to first consumer, so adapter PRs can begin against a stable seam.
Changes
interfaces/iac_provider.go: addProviderPlanneroptional interface (purely additive).github/workflows/cross-plugin-build-test.yml: matrix CI gate (aws, gcp, azure) + ADR 009docs/iac/providerplanner.md: adapter author guideADR
docs/adr/009-providerplanner-included-per-user-override.md(ADR slot 008 reserved for W-5's JIT-substitution ADR)Incidental findings (plan-spec defects caught in review)
Plan rev10 §T9.2 didn't anticipate repo CI hygiene precedent. Fix-now applied during quality review (5-line diff total):
strategy.fail-fast: false— first failing matrix job no longer cancels the others (defeats the purpose of a cross-plugin breakage surface)pathsfilter now includesgo.mod+go.sum— dep-bump PRs no longer silently skip the gateconcurrencygroup on${{ github.ref }}withcancel-in-progress: true— standard repo hygieneSame defect class as W-4 PlanDiagnostic + W-5 T5.5 string drift. Worth a post-pipeline retro entry to bake CI-workflow hygiene checklist into upstream `writing-plans`.
Test plan
Engineering notes
🤖 Generated with Claude Code