docs: v0.19.0 architectural cleanup design#474
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a v0.19.0 architectural cleanup design document and fixes a wfctl infra state-store bug where --env was not applied when initializing/loading the iac.state backend (causing remote backends with env-only config to silently fall back to no-op persistence).
Changes:
- Add a detailed v0.19.0 architectural cleanup design doc (plugins manifest/lockfile, registries, typed args, migrate image, teardown).
- Thread
envNamethrough wfctl infra direct-path state loading/store resolution (resolveStateStore,loadCurrentState) for apply/destroy/status/drift/plan. - Add regression tests plus a changelog entry for the env-aware state store fix.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/plans/2026-04-24-v0.19.0-architectural-cleanup-design.md | New design doc describing the planned v0.19.0 architectural cleanup bundle and rollout. |
| cmd/wfctl/infra_state_store.go | Updates resolveStateStore to apply env-resolved module config before initializing the state backend. |
| cmd/wfctl/infra.go | Updates loadCurrentState signature and callers to pass envName so env-specific backends work. |
| cmd/wfctl/infra_apply.go | Uses env-aware loadCurrentState / resolveStateStore during apply. |
| cmd/wfctl/infra_destroy.go | Uses env-aware resolveStateStore during destroy. |
| cmd/wfctl/infra_status_drift.go | Uses env-aware resolveStateStore during status/drift. |
| cmd/wfctl/infra_state.go | Updates calls to the new loadCurrentState(cfgFile, envName) signature (currently passing empty env). |
| cmd/wfctl/infra_state_store_test.go | New tests covering env override behavior and persistence regression for apply. |
| CHANGELOG.md | Adds a 0.18.8 entry documenting the env-aware state store fix and tests. |
| **Goal:** Land four architectural changes together that share config-file shape (`wfctl.yaml` + `.wfctl-lock.yaml`), release boundary (workflow v0.19.0), and consumer impact (BMW CI deploy.yml collapses to pure declarative). Each is independently valuable; bundling avoids breaking downstream config format twice. | ||
|
|
| ## Non-goals | ||
|
|
||
| - Schema evolution of `ResourceSpec` or `ResourceState` — stable. | ||
| - Registry types beyond the five enumerated (digitalocean/aws-ecr/gcp-ar/azure-acr/github/docker). Future types file a follow-up; the dispatch is extensible. |
| | A. Plugin manifest + lockfile split | #42, #43 | workflow core | | ||
| | B. Multi-registry + IaCProvider.EnsureRegistryAuth | #48 | workflow core + all 4 IaC plugins | | ||
| | C. Typed-args refactor for IaCProvider gRPC | #41 | workflow core + all 4 IaC plugins | | ||
| | D. Official workflow-migrate Docker image | #49 | workflow core (CI only) | |
| **New command:** `wfctl infra teardown --env <env> -c <config> [--dry-run] [--approve | --yes]` | ||
|
|
||
| **Safety model (mandatory, non-negotiable):** | ||
|
|
||
| Teardown is the most destructive operation in wfctl — it deletes cloud resources AND wipes state. The command refuses to proceed unless one of these conditions holds: | ||
|
|
||
| | Flag | Behavior | | ||
| |---|---| | ||
| | `--dry-run` (default on first run without approval) | Prints a detailed preview of what would be destroyed + what state keys would be wiped. No API calls that modify anything. Exits 0 after preview. | | ||
| | `--approve` (explicit) | Skips interactive prompt. Required in non-interactive environments (CI, scripts). Cannot be aliased via env var. | | ||
| | Interactive TTY | Prints the same preview as `--dry-run`, then prompts `Type 'yes' to destroy all N resources and wipe state: ` on stderr. Only proceeds on exact `yes` literal match. | | ||
| | No flag, non-interactive | Errors out with "refusing to destroy without --approve or interactive confirmation. Run with --dry-run first." | |
| states := loadCurrentState(cfgFile, "") | ||
| if len(states) == 0 { | ||
| fmt.Println("No resources tracked in state.") | ||
| return nil | ||
| } | ||
|
|
||
| tw := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0) | ||
| fmt.Fprintln(tw, "Name\tType\tProvider\tProviderID") | ||
| fmt.Fprintln(tw, "----\t----\t--------\t----------") | ||
| for i := range states { | ||
| s := &states[i] | ||
| fmt.Fprintf(tw, "%s\t%s\t%s\t%s\n", s.Name, s.Type, s.Provider, s.ProviderID) | ||
| } | ||
| tw.Flush() | ||
| fmt.Printf("\n%d resource(s) tracked.\n", len(states)) | ||
| return nil | ||
| } | ||
|
|
||
| func runInfraStateExport(args []string) error { | ||
| fs := flag.NewFlagSet("infra state export", flag.ContinueOnError) | ||
| var configFlag, formatVal, outputFlag string | ||
| fs.StringVar(&configFlag, "config", "", "Config file") | ||
| fs.StringVar(&configFlag, "c", "", "Config file (short for --config)") | ||
| fs.StringVar(&formatVal, "format", "tfstate", "Export format: tfstate") | ||
| fs.StringVar(&formatVal, "f", "tfstate", "Export format (short for --format)") | ||
| fs.StringVar(&outputFlag, "output", "", "Output file (default: stdout)") | ||
| fs.StringVar(&outputFlag, "o", "", "Output file (short for --output)") | ||
| if err := fs.Parse(args); err != nil { | ||
| return err | ||
| } | ||
|
|
||
| cfgFile := configFlag | ||
| if cfgFile == "" { | ||
| var err error | ||
| cfgFile, err = resolveInfraConfig(fs, configFlag) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| } | ||
|
|
||
| states := loadCurrentState(cfgFile) | ||
| states := loadCurrentState(cfgFile, "") | ||
|
|
| // Attempt env resolution so per-env backend config (e.g. region, prefix) | ||
| // is applied before initialising the store. Failure is non-fatal — fall | ||
| // back to the base config rather than dropping state persistence entirely. | ||
| if tmp, err := writeEnvResolvedConfig(cfgFile, envName); err == nil { | ||
| defer os.Remove(tmp) | ||
| cfgToUse = tmp | ||
| } |
| ## [0.18.8] - 2026-04-24 | ||
|
|
||
| ### Fixed | ||
|
|
||
| - **`resolveStateStore` and `loadCurrentState` now accept `envName`** — when `--env` is passed to `wfctl infra apply/destroy/status/drift`, per-environment overrides on the `iac.state` module (e.g. `region`, `bucket` prefix) are now applied before the state backend is initialised. Previously the base config was always used, so remote backends (Spaces, S3) that declare credentials or endpoints only under `environments.<env>:` failed with "region or endpoint must be set", silently falling back to no-op persistence and causing every deploy to re-create already-provisioned resources (409 cascades). | ||
|
|
||
| ### Tests | ||
|
|
||
| - `cmd/wfctl/infra_state_store_test.go` — `TestResolveStateStore_NoEnv_FallsBackToBase`, `TestResolveStateStore_EnvOverride_UsesEnvConfig`, `TestApplyInfraModules_PersistsResourceState` (end-to-end regression gate: verifies provider.Apply results are persisted to state store) |
⏱ Benchmark Results✅ No significant performance regressions detected. benchstat comparison (baseline → PR)
|
There was a problem hiding this comment.
Pull request overview
Adds v0.19.0 architectural-cleanup design + execution plan documentation, and includes a wfctl infra state-store fix to correctly apply per-environment iac.state overrides (with tests and changelog entry).
Changes:
- Added design + implementation-plan docs for the v0.19.0 architectural cleanup bundle.
- Fixed wfctl direct-path infra commands to pass
envNameinto state-store initialization and state loading (so env overrides are honored). - Added regression tests and documented the fix in
CHANGELOG.md.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/plans/2026-04-24-v0.19.0-architectural-cleanup.md | New implementation plan doc for phased delivery of v0.19.0 cleanup features. |
| docs/plans/2026-04-24-v0.19.0-architectural-cleanup-design.md | New design doc describing the v0.19.0 cleanup features, schema, rollout, and test strategy. |
| cmd/wfctl/infra_status_drift.go | Propagates envName into state-store resolution for status/drift. |
| cmd/wfctl/infra_state_store_test.go | New tests covering state-store env override behavior and persistence regression gate. |
| cmd/wfctl/infra_state_store.go | Updates resolveStateStore to accept envName and apply env-resolved config before backend init. |
| cmd/wfctl/infra_state.go | Updates state list/export to call the new loadCurrentState(cfgFile, envName) signature. |
| cmd/wfctl/infra_destroy.go | Propagates envName into state-store resolution for destroy. |
| cmd/wfctl/infra_apply.go | Propagates envName into state loading + state-store resolution for apply. |
| cmd/wfctl/infra.go | Updates loadCurrentState signature and forwards envName where needed. |
| CHANGELOG.md | Adds v0.18.8 entry documenting the envName/state-store fix and new tests. |
| **Goal:** Land four architectural changes together that share config-file shape (`wfctl.yaml` + `.wfctl-lock.yaml`), release boundary (workflow v0.19.0), and consumer impact (BMW CI deploy.yml collapses to pure declarative). Each is independently valuable; bundling avoids breaking downstream config format twice. | ||
|
|
||
| **Why:** Tonight's BMW deploy surfaced four pieces of architectural debt through the blocker chain: |
| sensitive: true | ||
| sinks: | ||
| - type: github_secret | ||
| repo: "${{ github.repository }}" # or literal GoCodeAlone/buymywishlist |
| SPACES_access_key: ${{ secrets.SPACES_access_key }} | ||
| SPACES_secret_key: ${{ secrets.SPACES_secret_key }} | ||
| GH_TOKEN: ${{ steps.app-token.outputs.token }} | ||
| run: wfctl infra teardown --env staging -c infra.yaml --confirm |
| > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. | ||
|
|
||
| **Goal:** Ship workflow v0.19.0 with 7 architectural cleanup features bundled into a single config-schema release boundary (wfctl.yaml + .wfctl-lock.yaml): plugin manifest split, multi-registry, typed gRPC args, official migrate image, wfctl infra teardown, wfctl infra outputs (masked), wfctl deploy verify, declarative secret sinks. | ||
|
|
| // ── TestApplyInfraModules_PersistsResourceState ──────────────────────────────── | ||
|
|
||
| // TestApplyInfraModules_PersistsResourceState is an end-to-end regression gate | ||
| // verifying that applyInfraModules actually calls store.SaveResource for each | ||
| // resource returned by provider.Apply. This test would have caught the silent | ||
| // state-drop caused by the missing envName propagation to resolveStateStore. | ||
| func TestApplyInfraModules_PersistsResourceState(t *testing.T) { |
| ## [0.18.8] - 2026-04-24 | ||
|
|
||
| ### Fixed | ||
|
|
||
| - **`resolveStateStore` and `loadCurrentState` now accept `envName`** — when `--env` is passed to `wfctl infra apply/destroy/status/drift`, per-environment overrides on the `iac.state` module (e.g. `region`, `bucket` prefix) are now applied before the state backend is initialised. Previously the base config was always used, so remote backends (Spaces, S3) that declare credentials or endpoints only under `environments.<env>:` failed with "region or endpoint must be set", silently falling back to no-op persistence and causing every deploy to re-create already-provisioned resources (409 cascades). | ||
|
|
||
| ### Tests | ||
|
|
||
| - `cmd/wfctl/infra_state_store_test.go` — `TestResolveStateStore_NoEnv_FallsBackToBase`, `TestResolveStateStore_EnvOverride_UsesEnvConfig`, `TestApplyInfraModules_PersistsResourceState` (end-to-end regression gate: verifies provider.Apply results are persisted to state store) |
…egistry, typed gRPC args, migrate image, teardown Five features bundled into v0.19.0 for shared config-file shape (wfctl.yaml + .wfctl-lock.yaml) and release boundary. Each addresses architectural debt surfaced during BMW tonight's deploy blocker chain. Features: - A. Plugin manifest + lockfile split (tasks #42/#43) - B. Multi-registry + IaCProvider.EnsureRegistryAuth (task #48) - C. Typed-args refactor for IaCProvider gRPC (task #41) - D. Official workflow-migrate Docker image (task #49) - E. wfctl infra teardown with mandatory dry-run + --approve flag (new) Non-goals: constraint-based plugin resolution (v0.20.0), transitive plugin deps, OCI chart/artifact registries, cross-registry mirroring. Autonomous pipeline target: v0.19.0 after BMW post-teardown stabilizes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sinks) Scope expanded from 5 to 7 features per user feedback on BMW CI gap audit: - F. wfctl infra outputs with masked-by-default sensitivity + GHA ::add-mask:: - G. wfctl deploy verify with multi-target healthcheck + retry/timeout gate - H. Declarative secret sinks (outputs.<field>.sinks[]) — plaintext never leaves wfctl process; built-in github_secret + github_env handlers; aws/gcp/azure sinks via plugin fan-out in v0.19.x Motivation: BMW's Capture staging DB URL step uses doctl + awk + gh secret set shell pipeline, leaking DATABASE_URL plaintext through stdout/env/argv. Declarative sink pattern (like terraform's output-to-secret-manager) writes the value in-process directly to the GitHub secrets API with libsodium encryption. Matches user's stated principle: "if BMW CI has provider-specific shell, fix it in workflow/wfctl so the CI stays declarative." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matches design doc 2026-04-24-v0.19.0-architectural-cleanup-design.md: - Phase 1 alpha.1: Feature A (plugin manifest + lockfile) - Phase 2 alpha.2: Feature C client-side (typed gRPC args) - Phase 3 (DO plugin v0.8.0): Feature C server-side + integration tests - Phase 4 alpha.3: Feature B (multi-registry) - Phase 5 (DO plugin v0.8.1): Feature B server-side (EnsureRegistryAuth) - Phase 6a rc1: Feature D (workflow-migrate image) - Phase 6b rc2: Feature E (wfctl infra teardown) - Phase 6c rc3: Features F + G + H (outputs + verify + sinks) - Phase 7: v0.19.0 final + changelog + docs - Phase 8: Plugin fan-out (aws/gcp/azure/tofu) in parallel - Phase 9: BMW migration PR (after v0.19.0 stabilizes) Timing: all phases can merge independently; final v0.19.0 tag and Phase 9 hold until BMW's tonight deploy chain reaches prod /healthz green (task #26). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…source task column
608635b to
2cd43c2
Compare
…nfra_output (#476) * docs: v0.19.0 architectural cleanup design — plugin manifest, multi-registry, typed gRPC args, migrate image, teardown Five features bundled into v0.19.0 for shared config-file shape (wfctl.yaml + .wfctl-lock.yaml) and release boundary. Each addresses architectural debt surfaced during BMW tonight's deploy blocker chain. Features: - A. Plugin manifest + lockfile split (tasks #42/#43) - B. Multi-registry + IaCProvider.EnsureRegistryAuth (task #48) - C. Typed-args refactor for IaCProvider gRPC (task #41) - D. Official workflow-migrate Docker image (task #49) - E. wfctl infra teardown with mandatory dry-run + --approve flag (new) Non-goals: constraint-based plugin resolution (v0.20.0), transitive plugin deps, OCI chart/artifact registries, cross-registry mirroring. Autonomous pipeline target: v0.19.0 after BMW post-teardown stabilizes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: v0.19.0 design — add Features F, G, H (outputs, verify, secret sinks) Scope expanded from 5 to 7 features per user feedback on BMW CI gap audit: - F. wfctl infra outputs with masked-by-default sensitivity + GHA ::add-mask:: - G. wfctl deploy verify with multi-target healthcheck + retry/timeout gate - H. Declarative secret sinks (outputs.<field>.sinks[]) — plaintext never leaves wfctl process; built-in github_secret + github_env handlers; aws/gcp/azure sinks via plugin fan-out in v0.19.x Motivation: BMW's Capture staging DB URL step uses doctl + awk + gh secret set shell pipeline, leaking DATABASE_URL plaintext through stdout/env/argv. Declarative sink pattern (like terraform's output-to-secret-manager) writes the value in-process directly to the GitHub secrets API with libsodium encryption. Matches user's stated principle: "if BMW CI has provider-specific shell, fix it in workflow/wfctl so the CI stays declarative." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: v0.19.0 implementation plan — 7 features × 9 phases Matches design doc 2026-04-24-v0.19.0-architectural-cleanup-design.md: - Phase 1 alpha.1: Feature A (plugin manifest + lockfile) - Phase 2 alpha.2: Feature C client-side (typed gRPC args) - Phase 3 (DO plugin v0.8.0): Feature C server-side + integration tests - Phase 4 alpha.3: Feature B (multi-registry) - Phase 5 (DO plugin v0.8.1): Feature B server-side (EnsureRegistryAuth) - Phase 6a rc1: Feature D (workflow-migrate image) - Phase 6b rc2: Feature E (wfctl infra teardown) - Phase 6c rc3: Features F + G + H (outputs + verify + sinks) - Phase 7: v0.19.0 final + changelog + docs - Phase 8: Plugin fan-out (aws/gcp/azure/tofu) in parallel - Phase 9: BMW migration PR (after v0.19.0 stabilizes) Timing: all phases can merge independently; final v0.19.0 tag and Phase 9 hold until BMW's tonight deploy chain reaches prod /healthz green (task #26). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: address PR #474 review — reconcile feature count, flag naming, source task column * docs: v0.18.9 phase-continuation design — env-resolution consistency BMW deploy run 24888583717 created a duplicate DO App Platform app because wfctl infra apply used env-resolved name "bmw-staging" while wfctl ci run --phase deploy used base module name "bmw-app". Both paths call driver.Read by name; with different names they find different resources (or none) and each calls Create, producing duplicates. Root cause: cmd/wfctl/deploy_providers.go:769 reads m.Name directly after ResolveForEnv has been applied. Same class as v0.18.7's Task #32 fix but in the deploy-phase code path. Fix: refactor resolveModCfg closure to return *ResolvedModule, use resolved.Name at call sites. Audit + patch infra_output source resolution (task #56) with the same pattern. Ship as v0.18.9. Does not require state-sharing between IaC and CI phases; the bug is about names, not state. Both phases use driver.Read by name; aligning the names aligns the lookups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: v0.18.9 phase-continuation implementation plan 9 tasks across Phase 1 (core fixes: deploy_providers.go + infra_secrets.go + regression tests) and Phase 2 (release + BMW unblock: PR, merge, tag, BMW bump, teardown, redeploy). Same-class fix as v0.18.7 Task #32: env-resolved Name used consistently wherever modules are consumed. Target: v0.18.9 hotfix; unblocks BMW staging deploy from run 24888583717 duplicate-resource failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(wfctl): ci run deploy uses env-resolved module name (not base) Refactored resolveModCfg closure in deploy_providers.go to return *config.ResolvedModule so callers see both resolved.Name (env-override lifted from Config["name"]) and resolved.Config. All three call sites (iac.provider lookup, findByType, fallback loop) now read resolved.Name instead of m.Name. Same class as v0.18.7 Task #32 fix for ResourceSpec.Name — env override of Config["name"] was lifted into ResolvedModule.Name but deploy_providers.go read m.Name directly, ignoring the override. Caused BMW deploy run 24888583717 to create duplicate DO apps (bmw-app vs bmw-staging). Regression tested via: - TestPluginDeployProvider_UsesEnvResolvedName (new, was failing) - TestPluginDeployProvider_FallsBackToModuleNameWhenNoEnv (new, baseline) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(wfctl): infra_output source module name flows through env resolution Introduces resolveInfraOutput(wfCfg, source, envName, stateOutputs) which translates the base module name in a "module.field" source string to its env-resolved name before looking up state. State is persisted under the env-resolved name (e.g. "bmw-staging-db"), so "bmw-database.uri" with --env staging now correctly finds the state entry. syncInfraOutputSecrets now accepts wfCfg and envName so the new resolution is applied for every infra_output secret in the generate list. The call site in infra.go (runInfraApply) loads the workflow config and passes it through. Closes task #56. Regression tested via: - TestInfraOutput_EnvResolvesModuleSource (new, was failing) - TestInfraOutput_NoEnvUsesBaseName (new, baseline) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: CHANGELOG v0.18.9 entry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(wfctl): stateKeys actually sorts keys (comment matched implementation) Agent-Logs-Url: https://github.com/GoCodeAlone/workflow/sessions/a0429849-a053-4485-914d-ccb115be94e8 Co-authored-by: intel352 <77607+intel352@users.noreply.github.com> * fix(wfctl): address 4 Copilot round-1 findings on v0.18.9 (#476) - resolveInfraOutput: ResolveForEnv ok=false now errors (config error) instead of silently falling back to base module name — prevents the env-resolution fix from being bypassed on misconfigured envs - stateKeys: add sort.Strings so error messages list available modules in deterministic order (comment already said "sorted") - infra.go: surface config.LoadFromFile error instead of discarding it — silent failure would regress env resolution to the pre-fix nil-wfCfg path - CHANGELOG: replace "Closes task #60" (ambiguous GitHub issue ref) with "Root cause from BMW deploy run 24888583717" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(wfctl): accurate error message + test for explicitly-disabled module in resolveInfraOutput Agent-Logs-Url: https://github.com/GoCodeAlone/workflow/sessions/3accbfdf-259b-4b98-a44e-8b538d3f5857 Co-authored-by: intel352 <77607+intel352@users.noreply.github.com> * fix(wfctl): gate LoadFromFile on envName + infra_output presence (#476) Skip config.LoadFromFile when env resolution is not needed: - envName="" → no env resolution, wfCfg=nil is correct - no infra_output generators → syncInfraOutputSecrets ignores wfCfg Avoids unnecessary file I/O on every infra apply when the caller has no infra_output secrets or is not running with --env. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
Summary
Design doc for workflow v0.19.0 — five architectural cleanup features bundled for shared config-file shape (
wfctl.yaml+.wfctl-lock.yaml) and release boundary. Each addresses architectural debt surfaced during BMW tonight's deploy blocker chain.Features
app.yamlplugin version pins intowfctl.yamlmanifest + committed.wfctl-lock.yamllockfileScope + rollout
9-phase rollout across workflow core + 4 IaC plugins + BMW consumer migration. Details in the doc.
Test plan
🤖 Generated with Claude Code