Skip to content
Merged
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -63,3 +63,5 @@ benchstat-output.txt
.worktrees
cmd/wfctl/.wfctl.yaml
cmd/wfctl/.wfctl-lock.yaml

.claude/
2 changes: 1 addition & 1 deletion cmd/wfctl/infra_apply_in_process_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ func TestApply_InProcess_PlanStaleDiagnostic_NamesChangedKeys(t *testing.T) {
plan := &interfaces.IaCPlan{
InputSnapshot: map[string]string{varName: planFP},
}
result, err := wfctlhelpers.ApplyPlan(context.Background(), inProcessFakeProvider{}, plan)
result, err := wfctlhelpers.ApplyPlanWithHooks(context.Background(), inProcessFakeProvider{}, plan, wfctlhelpers.ApplyPlanHooks{})
if err != nil {
t.Fatalf("ApplyPlan returned top-level error: %v", err)
}
Expand Down
56 changes: 56 additions & 0 deletions decisions/0040-v2-action-lifecycle-provider-compatibility.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# 0040. V2 action lifecycle — provider compatibility expectations + Phase 2 hard-cutover

**Status:** Accepted
**Date:** 2026-05-16
**Decision-makers:** autonomous pipeline (cloud-sdk-bcd shutdown; new fresh team for Phase 2/3 execution per user mandate 2026-05-16), Jon (operator — direction 2026-05-16: "#640 is worth tracking as well")
**Related:** GoCodeAlone/workflow#640, PR #639 (v2 hooks engine landing), `docs/migrations/2026-05-16-v2-lifecycle-phase1-inventory.md`, `decisions/0024-iac-typed-force-cutover.md` (no-compat-shim mandate), `feedback_force_strict_contracts_no_compat`

## Context

PR #639 landed `wfctlhelpers.ApplyPlanWithHooks` — the v2 action lifecycle hook path. wfctl persists state at each successful cloud-mutation boundary instead of waiting for whole-plan completion. The pre-existing `wfctlhelpers.ApplyPlan` is preserved as `ApplyPlanWithHooks(ctx, p, plan, ApplyPlanHooks{})` (empty-hooks path) for backwards compatibility.

#640 tracks the migration of all v1 callers to v2 + eventual removal of v1. Phase 1 (this ADR + inventory doc) defines the provider-side compatibility expectations + records the cutover constraint for Phase 2.

**Critical Phase 1 finding (verified inline 2026-05-16):** 3 of 4 IaC plugins (workflow-plugin-aws / -gcp / -azure) do NOT use the iac-codemod canonical pattern (`return wfctlhelpers.ApplyPlan(ctx, p, plan)`). Only workflow-plugin-digitalocean does. The 3 non-canonical plugins implement their own `IaCProvider.Apply` loop, dispatching per-action via `provider.ResourceDriver(...)` directly. Phase 2 v2 contract design must accommodate both implementation paths.

**Anti-pattern guard (per ADR 0024):** the strict-contracts cutover (`decisions/0024-iac-typed-force-cutover.md`) explicitly mandates "no compat shim, no build-tag dual-path" — graceful-fallback proto evolution would re-introduce the bug-class surface ADR 0024 forbids.

## Decision

**Adopt 5 normative provider-side compatibility expectations.** Plugins shipping Phase 2-conformant `IaCProviderRequiredServer.Apply` RPC MUST satisfy:

1. **Per-action success evidence.** `ApplyResponse` MUST include per-action outcome (success / error per `PlanAction`), not aggregated whole-plan result. Wire format: extend `ApplyResponse` proto with `repeated ActionResult actions`, where `ActionResult { uint32 action_index; ActionStatus status; map<string,string> output_keys; string error; }`. Required to fire wfctl-side `OnResourceApplied` hooks at correct boundary.

2. **Failed-delete preservation.** Apply MUST flag failed-delete actions distinctly (e.g., `ActionStatus = DELETE_FAILED`) so wfctl-side `OnResourceDeleted` hook does NOT fire and state is preserved. Today this is detected via `ApplyResult.Errors` array with no per-action granularity — invariant requires action-level tagging.

3. **Compensation evidence.** Apply MUST include the compensation outcome when create/replace persistence/routing fails — wfctl needs to know whether the cloud-side resource was successfully torn down (state DOESN'T leak) or whether compensation itself failed (state SHOULD be preserved with operator alert). Today this is opaque.

4. **Update-failure non-deletion.** Plugins MUST NOT pre-emptively delete existing managed resources on update failure. Engine-side already enforces this (per #639); the contract requires plugins to confirm they don't override with custom cleanup.

5. **ResourceReplacer advertisement.** Plugin manifest MUST advertise `ResourceReplacer` interface usage at the resource-type level so wfctl pre-mutation can decide whether to abort (delete-hook-active case) or allow (no-hook case). Today wfctl finds out at dispatch time, which is too late for safe rejection.

**Phase 2 ships as a coordinated PR cascade per ADR 0024.** No compat shim, no graceful proto fallback. Workflow + workflow-plugin-aws/gcp/azure/digitalocean all ship Phase 2-conformant `ApplyResponse` shape simultaneously. Plugins on workflow ≥ Phase-2-tag receive the new contract; plugins on older workflow tags are NOT supported (operator upgrades workflow + all 4 plugins together).

**Phase 3 plugin migration is bifurcated.** workflow-plugin-digitalocean (canonical-form delegate) gets codemod-driven rewrite once iac-codemod's `applyCanonicalCallExpr` constant + `rewriteApplyBody` + `isAlreadyDelegatedApplyBody` + `runAssertApplyDelegatesToHelper` AST functions are updated in lockstep (Phase 3 work). The other 3 plugins (aws/gcp/azure) need MANUAL migration since their custom Apply loops are not codemod-rewritable.

## Consequences

- **Phase 2 design pass MUST cite this ADR as a constraint.** The Phase 2 writing-plans agent MUST read ADR 0040 first and reject any compat-shim or graceful-fallback approach. Recorded explicitly here so the consequence survives team rotation per user mandate "recreate your agent team for each task."

- **Phase 2 release coordination cost is real.** Same shape as the strict-contracts cutover (referenced in `feedback_force_strict_contracts_no_compat`): 5 repos must release in coordination, each pinned to the same workflow tag. Operator workflow during the cutover: install/upgrade workflow Phase-2 tag → install/upgrade aws+gcp+azure+DO Phase-2 tags simultaneously.

- **Manifest validation gap on `cmd/wfctl/deploy_providers.go::findIaCPluginDir`** (per Phase 1 design Assumption 8) MUST be addressed in Phase 2 scope. `dispatch.go`'s own warning ("DO NOT rely on the manifest-validation guarantee in callers") means a typo in plugin's manifest `computePlanVersion` field SILENTLY falls to v1 dispatch. Phase 2 adds runtime validation.

- **Phase 3 plugin migration is bifurcated** (codemod for DO, manual for aws/gcp/azure). Phase 3 design must split into 4 sub-plans, not assume uniform codemod-fix-it run.

- **Cost of the per-action ActionResult proto field** is small (additive proto change) but the COORDINATION cost (5 repos simultaneous release) is large. Prior cloud-SDK extraction + plugin sweep precedents (this session) show the team can execute coordinated multi-repo cascades in a single autonomous pipeline run.

- **Phase 4 conformance scenario migration SHIPPED in the same PR as this ADR** (4 call sites total: 3 conformance scenarios + 1 test file in cmd/wfctl). Folded into Phase 1 because staticcheck SA1019 (from the new godoc Deprecated marker) required immediate migration. Removes the Phase 5 prerequisite gating.

- **Phase 5 ApplyPlan removal** now gates ONLY on Phase 2 + Phase 3 (since Phase 4 shipped with this PR). Plugin canonical-form propagation is the remaining blocker: DO already canonical; aws/gcp/azure must migrate to either canonical OR Phase-2-direct path.

## Alternatives rejected

- **Graceful proto-evolution fallback** — DIRECTLY contradicts ADR 0024. Would silently degrade plugin Apply outcomes to "all succeeded" when extended fields absent, exact failure mode ADR 0024 forbids.
- **Per-plugin opt-in via manifest flag** — re-introduces the dual-path bug class. ADR 0024 explicitly rejects build-tag/feature-flag dual-paths.
- **Single codemod-fix-it run across all 4 plugins** — would silently rewrite aws/gcp/azure custom Apply loops INCORRECTLY (codemod assumes canonical pattern). Phase 1 finding rules this out.
89 changes: 89 additions & 0 deletions docs/migrations/2026-05-16-v2-lifecycle-phase1-inventory.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
# V2 Action Lifecycle — Phase 1 Inventory

**Status:** Phase 1 complete 2026-05-16
**Tracking issue:** GoCodeAlone/workflow#640
**Phase 1 design:** `docs/plans/2026-05-16-v2-lifecycle-phase1-design.md`
**ADR:** `decisions/0040-v2-action-lifecycle-provider-compatibility.md`

## Background

PR #639 landed the v2 action lifecycle hook path (`wfctlhelpers.ApplyPlanWithHooks`) — wfctl can persist state at each successful cloud-mutation boundary instead of waiting for whole-plan completion. The pre-existing `wfctlhelpers.ApplyPlan` is preserved for backwards compatibility but is now legacy debt.

#640 tracks the multi-phase migration:
- **Phase 1 (this document) — SHIPPED**: inventory + provider-compatibility ADR + Deprecated marker
- **Phase 2 — pending**: design + ship v2-hooks-over-gRPC contract (HARD-CUTOVER per ADR 0024)
- **Phase 3 — pending**: migrate plugins to v2 contract (per-plugin manual + codemod for canonical-form plugins)
- ~~Phase 4~~ — **SHIPPED in this PR**: migrated 3 conformance scenarios + 1 test file from `ApplyPlan` to `ApplyPlanWithHooks(..., ApplyPlanHooks{})`. Folded into Phase 1 because staticcheck SA1019 (from the new godoc marker) required it
- **Phase 5 — pending**: remove `wfctlhelpers.ApplyPlan` entirely (after Phase 2 + Phase 3 land)

## Workflow-side caller inventory

### v1 `wfctlhelpers.ApplyPlan` callers (production)

| File:Line | Classification | Notes |
|-----------|----------------|-------|
| `iac/conformance/scenario_upsert_on_already_exists.go:88` | MIGRATED in this PR (Phase 4 folded) | Renamed to `ApplyPlanWithHooks(..., ApplyPlanHooks{})`. Empty-hooks form has zero semantic difference. |
| `iac/conformance/scenario_delete_action.go:74` | MIGRATED in this PR | Same |
| `iac/conformance/scenario_replace_cascade_preserves_dependents.go:92` | MIGRATED in this PR | Same |
| `cmd/wfctl/infra_apply_in_process_test.go:77` | MIGRATED in this PR (was missing from initial inventory; surfaced by staticcheck SA1019) | Same |

### iac-codemod tool references (NOT runtime callers)

| File:Line | Notes |
|-----------|-------|
| `cmd/iac-codemod/refactor_apply.go:29` (`applyCanonicalCallExpr` constant, `//nolint:unused`) | **Documentation-only constant; NOT consumed by AST rewriter.** Phase 3 lockstep update bumps this constant TOGETHER with `rewriteApplyBody` (line 1231 hardcoded `ast.NewIdent("ApplyPlan")`) + `isAlreadyDelegatedApplyBody` (line 630 hardcoded `sel.Sel.Name != "ApplyPlan"`) + `runAssertApplyDelegatesToHelper` + `refactor_apply_test.go:593`. Phase 1 does NOT touch this — bumping just the constant would create internal inconsistency. |
| `cmd/iac-codemod/refactor_apply.go:1208` (doc comment) | Same Phase 3 lockstep update |
| `cmd/iac-codemod/lint.go:54` (comment) + `lint.go:641` (matcher consumer) | Same |

### v1 `provider.Apply(ctx, &plan)` direct callers (workflow side, NOT through wfctlhelpers)

| File:Line | Classification | Notes |
|-----------|----------------|-------|
| `cmd/wfctl/infra_apply.go:486` | MIGRATE-NEEDED (Phase 2 — v1 dispatch path removal) | `else` branch sister to L473 `applyV2ApplyPlanWithHooksFn`. Selected by `DispatchVersionFor(provider) == DispatchVersionV2` predicate. v2 path uses `provider.ResourceDriver(action.Resource.Type)` per action — NOT `provider.Apply`. Phase 2 removes this v1 path entirely after all 4 plugins ship Phase 2-conformant Apply RPC + manifest declaration. |
| `cmd/wfctl/infra_apply.go:1615` | MIGRATE-NEEDED (Phase 2) | Same — second occurrence (likely refresh path). |
| `cmd/wfctl/iac_typed_adapter.go:350` | NOT IN V2 HOT PATH (Phase 2 wire-format work) | `typedIaCAdapter.Apply` is called ONLY on the v1 dispatch path. Per-action wire format change happens in `applyResultFromPB` (the typed adapter's response decoder) when Phase 2 extends `ApplyResponse` proto with per-action `Actions []ActionResult` field. Adapter dispatch shape doesn't need to change; response decoder does. |

## Plugin-side IaCProvider.Apply implementation inventory (verified 2026-05-16 via `gh api`)

| Plugin | File:Line | Pattern | Phase 3 path |
|--------|-----------|---------|--------------|
| workflow-plugin-aws | `provider/provider.go:237` `AWSProvider.Apply` | **NON-CANONICAL** — own loop with `p.mu.RLock` + init check + custom dispatch | **MIGRATE-NEEDED (Phase 3 MANUAL)** — codemod cannot rewrite |
| workflow-plugin-gcp | `provider/provider.go:226` `GCPProvider.Apply` | **NON-CANONICAL** — own `for _, action := range plan.Actions` with `p.ResourceDriver(action.Resource.Type)` per-action | **MIGRATE-NEEDED (Phase 3 MANUAL)** — codemod cannot rewrite |
| workflow-plugin-azure | `internal/provider.go:138` `AzureProvider.Apply` | **NON-CANONICAL** — own loop with `p.mu.RLock` | **MIGRATE-NEEDED (Phase 3 MANUAL)** — codemod cannot rewrite |
| workflow-plugin-digitalocean | `internal/provider.go:274-275` `DOProvider.Apply` | **CANONICAL** — `result, err := wfctlhelpers.ApplyPlan(ctx, p, plan)` delegate (with custom post-flush wrapper) | LEAVE-AS-IS for Phase 1; Phase 3 codemod CAN rewrite (after AST functions updated in lockstep with constant) |

## Major architectural finding

**3 of 4 IaC plugins do NOT use the iac-codemod canonical pattern (`return wfctlhelpers.ApplyPlan(ctx, p, plan)`).** The canonical-form constant in `cmd/iac-codemod/refactor_apply.go:29` is aspirational, not reality.

Phase 2 + Phase 3 implications:
1. **Phase 3 is NOT a single codemod-fix-it run.** 3 plugins need MANUAL migration; 1 plugin (DO) can be codemod-rewritten.
2. **Phase 2 v2 hooks contract design** must accommodate two plugin implementation paths:
- **(a) Canonical delegate**: `provider.Apply` → `wfctlhelpers.ApplyPlan(ctx, p, plan)` → `applyPlanWithEnvProviderAndHooks(ctx, p, plan, nil, hooks)` — wfctl-side hooks fire automatically at each `dispatchAction` boundary
- **(b) Custom loop**: `provider.Apply` runs its own `for _, action := range plan.Actions` loop, calling `p.ResourceDriver(action.Resource.Type)` per-action. The plugin must EMIT per-action outcome via the Phase 2 extended `ApplyResponse` proto so wfctl-side reconstructs the hook events
3. **Phase 2 contract MUST be a hard-cutover per ADR 0024.** No graceful proto fallback — workflow + 4 plugin repos ship the new ApplyResponse shape simultaneously, same coordination pattern as the strict-contracts cutover.

## Provider compatibility expectations (Phase 2 contract preview)

Per `decisions/0040-v2-action-lifecycle-provider-compatibility.md`, plugins MUST satisfy these 5 invariants at the IaCProviderRequiredServer.Apply RPC boundary:

1. **Per-action success evidence** — ApplyResponse MUST include per-action outcome (success/error per `PlanAction`)
2. **Failed-delete preservation** — Apply MUST flag failed-delete actions distinctly so wfctl `OnResourceDeleted` does NOT fire
3. **Compensation evidence** — Apply MUST include compensation outcome when create/replace persistence/routing fails
4. **Update-failure non-deletion** — Plugins MUST NOT pre-emptively delete on update failure (engine-side enforced; plugin must not override)
5. **ResourceReplacer advertisement** — Plugin manifest MUST advertise ResourceReplacer usage so wfctl pre-mutation gates correctly

## Out of scope for Phase 1

- Phase 2 gRPC contract design + implementation — separate design pass
- Phase 3 plugin migration — separate per-plugin design + execution passes
- Phase 4 conformance scenario migration — separate trivial PR
- Phase 5 ApplyPlan removal — gates on Phase 4 completion + plugin canonical-form propagation
Comment on lines +80 to +81

## References

- ADR 0024 (`decisions/0024-iac-typed-force-cutover.md`) — strict-contracts cutover precedent (no compat shim)
- ADR 0040 (`decisions/0040-v2-action-lifecycle-provider-compatibility.md`) — provider-side compatibility contract
- PR #639 — v2 hooks engine landing
- `iac/wfctlhelpers/apply.go` — engine source; package-level doc comment now embeds the v1/v2 contract pointer; `ApplyPlan` symbol carries `// Deprecated:` marker per Phase 1
- Prior product design: `docs/plans/2026-04-25-wfctl-lifecycle-product-design.md`
Loading
Loading