fix(wfctl): v0.18.9 — env-resolution consistency in ci run deploy + infra_output#476
fix(wfctl): v0.18.9 — env-resolution consistency in ci run deploy + infra_output#476
Conversation
…egistry, typed gRPC args, migrate image, teardown Five features bundled into v0.19.0 for shared config-file shape (wfctl.yaml + .wfctl-lock.yaml) and release boundary. Each addresses architectural debt surfaced during BMW tonight's deploy blocker chain. Features: - A. Plugin manifest + lockfile split (tasks #42/#43) - B. Multi-registry + IaCProvider.EnsureRegistryAuth (task #48) - C. Typed-args refactor for IaCProvider gRPC (task #41) - D. Official workflow-migrate Docker image (task #49) - E. wfctl infra teardown with mandatory dry-run + --approve flag (new) Non-goals: constraint-based plugin resolution (v0.20.0), transitive plugin deps, OCI chart/artifact registries, cross-registry mirroring. Autonomous pipeline target: v0.19.0 after BMW post-teardown stabilizes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sinks) Scope expanded from 5 to 7 features per user feedback on BMW CI gap audit: - F. wfctl infra outputs with masked-by-default sensitivity + GHA ::add-mask:: - G. wfctl deploy verify with multi-target healthcheck + retry/timeout gate - H. Declarative secret sinks (outputs.<field>.sinks[]) — plaintext never leaves wfctl process; built-in github_secret + github_env handlers; aws/gcp/azure sinks via plugin fan-out in v0.19.x Motivation: BMW's Capture staging DB URL step uses doctl + awk + gh secret set shell pipeline, leaking DATABASE_URL plaintext through stdout/env/argv. Declarative sink pattern (like terraform's output-to-secret-manager) writes the value in-process directly to the GitHub secrets API with libsodium encryption. Matches user's stated principle: "if BMW CI has provider-specific shell, fix it in workflow/wfctl so the CI stays declarative." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matches design doc 2026-04-24-v0.19.0-architectural-cleanup-design.md: - Phase 1 alpha.1: Feature A (plugin manifest + lockfile) - Phase 2 alpha.2: Feature C client-side (typed gRPC args) - Phase 3 (DO plugin v0.8.0): Feature C server-side + integration tests - Phase 4 alpha.3: Feature B (multi-registry) - Phase 5 (DO plugin v0.8.1): Feature B server-side (EnsureRegistryAuth) - Phase 6a rc1: Feature D (workflow-migrate image) - Phase 6b rc2: Feature E (wfctl infra teardown) - Phase 6c rc3: Features F + G + H (outputs + verify + sinks) - Phase 7: v0.19.0 final + changelog + docs - Phase 8: Plugin fan-out (aws/gcp/azure/tofu) in parallel - Phase 9: BMW migration PR (after v0.19.0 stabilizes) Timing: all phases can merge independently; final v0.19.0 tag and Phase 9 hold until BMW's tonight deploy chain reaches prod /healthz green (task #26). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…source task column
BMW deploy run 24888583717 created a duplicate DO App Platform app because wfctl infra apply used env-resolved name "bmw-staging" while wfctl ci run --phase deploy used base module name "bmw-app". Both paths call driver.Read by name; with different names they find different resources (or none) and each calls Create, producing duplicates. Root cause: cmd/wfctl/deploy_providers.go:769 reads m.Name directly after ResolveForEnv has been applied. Same class as v0.18.7's Task #32 fix but in the deploy-phase code path. Fix: refactor resolveModCfg closure to return *ResolvedModule, use resolved.Name at call sites. Audit + patch infra_output source resolution (task #56) with the same pattern. Ship as v0.18.9. Does not require state-sharing between IaC and CI phases; the bug is about names, not state. Both phases use driver.Read by name; aligning the names aligns the lookups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9 tasks across Phase 1 (core fixes: deploy_providers.go + infra_secrets.go + regression tests) and Phase 2 (release + BMW unblock: PR, merge, tag, BMW bump, teardown, redeploy). Same-class fix as v0.18.7 Task #32: env-resolved Name used consistently wherever modules are consumed. Target: v0.18.9 hotfix; unblocks BMW staging deploy from run 24888583717 duplicate-resource failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refactored resolveModCfg closure in deploy_providers.go to return *config.ResolvedModule so callers see both resolved.Name (env-override lifted from Config["name"]) and resolved.Config. All three call sites (iac.provider lookup, findByType, fallback loop) now read resolved.Name instead of m.Name. Same class as v0.18.7 Task #32 fix for ResourceSpec.Name — env override of Config["name"] was lifted into ResolvedModule.Name but deploy_providers.go read m.Name directly, ignoring the override. Caused BMW deploy run 24888583717 to create duplicate DO apps (bmw-app vs bmw-staging). Regression tested via: - TestPluginDeployProvider_UsesEnvResolvedName (new, was failing) - TestPluginDeployProvider_FallsBackToModuleNameWhenNoEnv (new, baseline) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces resolveInfraOutput(wfCfg, source, envName, stateOutputs) which translates the base module name in a "module.field" source string to its env-resolved name before looking up state. State is persisted under the env-resolved name (e.g. "bmw-staging-db"), so "bmw-database.uri" with --env staging now correctly finds the state entry. syncInfraOutputSecrets now accepts wfCfg and envName so the new resolution is applied for every infra_output secret in the generate list. The call site in infra.go (runInfraApply) loads the workflow config and passes it through. Closes task #56. Regression tested via: - TestInfraOutput_EnvResolvesModuleSource (new, was failing) - TestInfraOutput_NoEnvUsesBaseName (new, baseline) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@copilot review |
There was a problem hiding this comment.
Pull request overview
This PR targets wfctl v0.18.9 by fixing env-resolution consistency so deploy and infra_output secret resolution use the same env-resolved module identity, preventing duplicate cloud resources and failed state lookups.
Changes:
- Refactors
newPluginDeployProviderenv-resolution to return/use*config.ResolvedModuleso deploy usesresolved.Name/resolved.Typeconsistently. - Adds env-aware infra_output source resolution (
module.field) for secret generation by resolving the module name throughResolveForEnv. - Adds regression tests for both the deploy path and infra_output resolution, plus a v0.18.9 changelog entry and planning/design docs.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/plans/2026-04-24-wfctl-phase-continuation.md | Implementation plan documenting steps/tests for the v0.18.9 hotfix. |
| docs/plans/2026-04-24-wfctl-phase-continuation-design.md | Design doc describing the env-resolution identity mismatch and fix approach. |
| docs/plans/2026-04-24-v0.19.0-architectural-cleanup.md | Longer-term v0.19.0 plan (context/follow-up work). |
| docs/plans/2026-04-24-v0.19.0-architectural-cleanup-design.md | Longer-term v0.19.0 design (context/follow-up work). |
| cmd/wfctl/deploy_providers.go | Uses ResolvedModule so deploy uses env-resolved name/type instead of base m.Name. |
| cmd/wfctl/deploy_providers_env_test.go | Regression tests for env-resolved deploy target naming and no-env fallback. |
| cmd/wfctl/infra_output_secrets.go | Adds resolveInfraOutput helper and threads wfCfg/envName into infra_output secret sync. |
| cmd/wfctl/infra_secrets_env_test.go | Regression tests ensuring infra_output source resolves via env-renamed module. |
| cmd/wfctl/infra_output_secrets_test.go | Updates test call sites for new syncInfraOutputSecrets signature. |
| cmd/wfctl/infra.go | Passes wfCfg and envName into syncInfraOutputSecrets after infra apply. |
| CHANGELOG.md | Adds v0.18.9 entry describing the fixes and tests. |
| if resolved, ok := m.ResolveForEnv(envName); ok && resolved.Name != "" { | ||
| moduleName = resolved.Name | ||
| } |
There was a problem hiding this comment.
ResolveForEnv can return ok=false when the module is explicitly deleted for this env (m.Environments[envName] == nil). In that case this code breaks out of the loop without changing moduleName, potentially falling back to the base module name and reading stale/incorrect state outputs. Treat ok=false as an error for infra_output resolution (module removed for env) rather than silently continuing.
| if resolved, ok := m.ResolveForEnv(envName); ok && resolved.Name != "" { | |
| moduleName = resolved.Name | |
| } | |
| resolved, ok := m.ResolveForEnv(envName) | |
| if !ok { | |
| return "", fmt.Errorf("infra_output: module %q is not available in environment %q", moduleName, envName) | |
| } | |
| if resolved.Name == "" { | |
| return "", fmt.Errorf("infra_output: module %q resolved to empty name for environment %q", moduleName, envName) | |
| } | |
| moduleName = resolved.Name |
There was a problem hiding this comment.
Fixed in 25602af — ResolveForEnv ok=false now returns an explicit error instead of silently falling back to the base name. Message: infra_output: module X is not available in env Y (no environment override defined). This prevents the env-resolution fix from being bypassed on misconfigured modules.
| } | ||
| states := loadCurrentState(cfgFile, envName) | ||
| return syncInfraOutputSecrets(ctx, secretsCfg, secretsProvider, states) | ||
| wfCfg, _ := config.LoadFromFile(cfgFile) |
There was a problem hiding this comment.
config.LoadFromFile(cfgFile) error is ignored here. If loading fails (or returns nil), syncInfraOutputSecrets will skip env-based module-name resolution and may reintroduce the base-name vs env-resolved-name mismatch this PR is fixing. Please handle the error (and consider failing fast when envName != "").
| wfCfg, _ := config.LoadFromFile(cfgFile) | |
| wfCfg, err := config.LoadFromFile(cfgFile) | |
| if err != nil { | |
| if envName != "" { | |
| return fmt.Errorf("load workflow config for infra_output sync with environment %q: %w", envName, err) | |
| } | |
| return fmt.Errorf("load workflow config for infra_output sync: %w", err) | |
| } | |
| if wfCfg == nil && envName != "" { | |
| return fmt.Errorf("load workflow config for infra_output sync with environment %q: empty config", envName) | |
| } |
There was a problem hiding this comment.
Fixed in 25602af — error is now surfaced: wfCfg, err := config.LoadFromFile(cfgFile) with if err != nil { return fmt.Errorf("load config for infra_output env resolution: %w", err) }. Silent failure would have regressed env resolution to the pre-fix nil-wfCfg path.
| - **`wfctl ci run --phase deploy` now uses env-resolved module name** — `pluginDeployProvider.resourceName` was populated from `m.Name` (base config name) instead of `ResolvedModule.Name` (env-override lifted from `Config["name"]`). When infra apply had env-renamed a module (e.g. BMW's `bmw-app` → `bmw-staging` for staging), the deploy phase used the base name for `driver.Read` lookup, didn't find the resource, and went down the Create path — producing duplicate DO resources. Same class as v0.18.7 Task #32 fix, but in the ci-run code path. BMW deploy run 24888583717 is the regression case. Closes task #60. | ||
| - **`infra_output` source resolution now applies env override to module name** — `secrets.generate[].source: "bmw-database.uri"` now resolves to the env-resolved state key (e.g. `bmw-staging-db`) when `--env staging` is set, matching how infra apply persists state. Closes task #56. |
There was a problem hiding this comment.
The v0.18.9 changelog entry says it “Closes task #60”, but the linked issue #60 in this PR is about plugin handler coupling (PipelineHandler() getter), not env-resolution consistency. If this is intended to auto-close the GitHub issue, the reference looks incorrect—please adjust/remove to avoid closing an unrelated issue.
⏱ Benchmark Results✅ No significant performance regressions detected. benchstat comparison (baseline → PR)
|
…tion) Agent-Logs-Url: https://github.com/GoCodeAlone/workflow/sessions/a0429849-a053-4485-914d-ccb115be94e8 Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
Review complete. The two core fixes (env-resolved name in One bug found and fixed in 6674def: |
- resolveInfraOutput: ResolveForEnv ok=false now errors (config error) instead of silently falling back to base module name — prevents the env-resolution fix from being bypassed on misconfigured envs - stateKeys: add sort.Strings so error messages list available modules in deterministic order (comment already said "sorted") - infra.go: surface config.LoadFromFile error instead of discarding it — silent failure would regress env resolution to the pre-fix nil-wfCfg path - CHANGELOG: replace "Closes task #60" (ambiguous GitHub issue ref) with "Root cause from BMW deploy run 24888583717" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@copilot review |
| wfCfg, err := config.LoadFromFile(cfgFile) | ||
| if err != nil { | ||
| return fmt.Errorf("load config for infra_output env resolution: %w", err) | ||
| } | ||
| return syncInfraOutputSecrets(ctx, secretsCfg, secretsProvider, states, wfCfg, envName) | ||
| } | ||
|
|
There was a problem hiding this comment.
runInfraApply now always reloads the full workflow config via config.LoadFromFile(cfgFile) before calling syncInfraOutputSecrets. This work is unnecessary when envName == "" (no env resolution needed) and also when there are no infra_output generators (since syncInfraOutputSecrets will return early). Consider only loading/passing wfCfg when envName != "" and at least one secrets.generate[] entry has Type == "infra_output"; otherwise pass nil to avoid extra IO/parse overhead in the common case.
| wfCfg, err := config.LoadFromFile(cfgFile) | |
| if err != nil { | |
| return fmt.Errorf("load config for infra_output env resolution: %w", err) | |
| } | |
| return syncInfraOutputSecrets(ctx, secretsCfg, secretsProvider, states, wfCfg, envName) | |
| } | |
| var wfCfg any | |
| if envName != "" && hasInfraOutputGenerators(secretsCfg) { | |
| wfCfg, err = config.LoadFromFile(cfgFile) | |
| if err != nil { | |
| return fmt.Errorf("load config for infra_output env resolution: %w", err) | |
| } | |
| } | |
| return syncInfraOutputSecrets(ctx, secretsCfg, secretsProvider, states, wfCfg, envName) | |
| } | |
| func hasInfraOutputGenerators(secretsCfg *secrets.Config) bool { | |
| if secretsCfg == nil { | |
| return false | |
| } | |
| for _, gen := range secretsCfg.Generate { | |
| if gen.Type == "infra_output" { | |
| return true | |
| } | |
| } | |
| return false | |
| } |
There was a problem hiding this comment.
Fixed in b50e3f3 — LoadFromFile is now gated: only called when envName != "" AND at least one secretsCfg.Generate entry has Type=="infra_output". When either condition is false, wfCfg remains nil and syncInfraOutputSecrets skips env resolution via its existing nil-guard.
…ule in resolveInfraOutput Agent-Logs-Url: https://github.com/GoCodeAlone/workflow/sessions/3accbfdf-259b-4b98-a44e-8b538d3f5857 Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
Review complete. Two remaining issues found and fixed in 4bc15d2:
All existing tests continue to pass. |
Skip config.LoadFromFile when env resolution is not needed: - envName="" → no env resolution, wfCfg=nil is correct - no infra_output generators → syncInfraOutputSecrets ignores wfCfg Avoids unnecessary file I/O on every infra apply when the caller has no infra_output secrets or is not running with --env. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| var wfCfg *config.WorkflowConfig | ||
| if envName != "" { | ||
| for _, g := range secretsCfg.Generate { | ||
| if g.Type == "infra_output" { | ||
| var loadErr error | ||
| wfCfg, loadErr = config.LoadFromFile(cfgFile) | ||
| if loadErr != nil { | ||
| return fmt.Errorf("load config for infra_output env resolution: %w", loadErr) | ||
| } | ||
| break | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
secretsCfg can be nil here, which would panic when evaluating secretsCfg.Generate. Since syncInfraOutputSecrets already treats a nil secrets config as a no-op, add an explicit guard (e.g., if envName != "" && secretsCfg != nil { ... }) before ranging over secretsCfg.Generate.
There was a problem hiding this comment.
Declined — secretsCfg cannot be nil at this point. Line 820 guards: if err != nil || secretsCfg == nil { return err }. Execution only reaches the optimization block when secretsCfg is non-nil. Adding a redundant guard would mislead future readers into thinking nil is possible here. b50e3f3 is correct as-is.
| ### Tests | ||
|
|
||
| - `cmd/wfctl/deploy_providers_env_test.go` — `TestPluginDeployProvider_UsesEnvResolvedName`, `TestPluginDeployProvider_FallsBackToModuleNameWhenNoEnv` | ||
| - `cmd/wfctl/infra_secrets_env_test.go` — `TestInfraOutput_EnvResolvesModuleSource`, `TestInfraOutput_NoEnvUsesBaseName` |
There was a problem hiding this comment.
The v0.18.9 changelog test list omits TestInfraOutput_ExplicitlyDisabledModuleErrors, which was added in cmd/wfctl/infra_secrets_env_test.go. Please update this line to include the third test (or list the file without enumerating specific test names) so the changelog stays accurate.
| - `cmd/wfctl/infra_secrets_env_test.go` — `TestInfraOutput_EnvResolvesModuleSource`, `TestInfraOutput_NoEnvUsesBaseName` | |
| - `cmd/wfctl/infra_secrets_env_test.go` — `TestInfraOutput_EnvResolvesModuleSource`, `TestInfraOutput_NoEnvUsesBaseName`, `TestInfraOutput_ExplicitlyDisabledModuleErrors` |
| if stateOutputs == nil { | ||
| return "", fmt.Errorf("infra_output: state outputs not available for source %q — did infra apply succeed?", source) | ||
| } | ||
| outputs, ok := stateOutputs[moduleName] | ||
| if !ok { | ||
| return "", fmt.Errorf("infra_output: module %q not found in state (available: %s)", moduleName, strings.Join(stateKeys(stateOutputs), ", ")) | ||
| } | ||
| val, ok := outputs[field] | ||
| if !ok { | ||
| return "", fmt.Errorf("infra_output: field %q not found in outputs of module %q", field, moduleName) | ||
| } | ||
| s, ok := val.(string) | ||
| if !ok { | ||
| return "", fmt.Errorf("infra_output: output field %q of module %q is %T, expected string", field, moduleName, val) | ||
| } | ||
| return s, nil |
There was a problem hiding this comment.
resolveInfraOutput re-implements the existing secrets.GenerateSecret "infra_output" generator logic (source parsing, state lookup, type assertion, and related error messages). This duplicates behavior across packages and risks drift/inconsistent user-facing errors. Consider reusing the existing generator by doing only the env-specific module-name resolution here (including the explicit-disabled error), then calling secrets.GenerateSecret("infra_output", ...) with an updated source and the _state_outputs map. That keeps infra_output semantics in one place and preserves existing error wording/prefixes.
| if stateOutputs == nil { | |
| return "", fmt.Errorf("infra_output: state outputs not available for source %q — did infra apply succeed?", source) | |
| } | |
| outputs, ok := stateOutputs[moduleName] | |
| if !ok { | |
| return "", fmt.Errorf("infra_output: module %q not found in state (available: %s)", moduleName, strings.Join(stateKeys(stateOutputs), ", ")) | |
| } | |
| val, ok := outputs[field] | |
| if !ok { | |
| return "", fmt.Errorf("infra_output: field %q not found in outputs of module %q", field, moduleName) | |
| } | |
| s, ok := val.(string) | |
| if !ok { | |
| return "", fmt.Errorf("infra_output: output field %q of module %q is %T, expected string", field, moduleName, val) | |
| } | |
| return s, nil | |
| return secrets.GenerateSecret( | |
| "infra_output", | |
| map[string]any{ | |
| "source": moduleName + "." + field, | |
| }, | |
| map[string]any{ | |
| "_state_outputs": stateOutputs, | |
| }, | |
| ) |
|
Round-4 follow-up (PR merged before fixes landed): Finding 1 (CHANGELOG test name): TestInfraOutput_ExplicitlyDisabledModuleErrors is in the test file (4bc15d2) — CHANGELOG omission is doc-only, can be a follow-up. Finding 2 (DRY refactor): Declined — out-of-scope for v0.18.9 hotfix per team review. Cross-package refactor + error prefix change logged as follow-up task. |
Problem
Two related env-resolution gaps caused BMW staging deploy to create duplicate DO App Platform apps (run 24888583717):
ci run --phase deploy(deploy_providers.go):resolveModCfgreturned only the config map; callers still readm.Name(base name) to identify the resource. When--env stagingrenamedbmw-app → bmw-staging, the deploy used the base name and tried to create a second resource instead of updating the existing one.infra_outputsecret generation (infra_output_secrets.go):syncInfraOutputSecretshad no access towfCfgorenvName, so source strings likebmw-database.uricould not be resolved to the env-overridden module name (bmw-staging-db) used as the state key.Changes
cmd/wfctl/deploy_providers.goresolveModCfg→resolveModule; return type changed from(map[string]any, bool)to(*config.ResolvedModule, bool)findByTypeclosure, fallback loop) now useresolved.Nameandresolved.Typecmd/wfctl/infra_output_secrets.goresolveInfraOutput(wfCfg, source, envName, stateOutputs)helper: parsesmodule.field, applies env-resolution viam.ResolveForEnv(envName), then looks up the statesyncInfraOutputSecretssignature extended withwfCfg *config.WorkflowConfig, envName stringcmd/wfctl/infra.gowfCfgand passesenvNametosyncInfraOutputSecretsTests
deploy_providers_env_test.go: 2 regression tests (env-resolved name / base name fallback)infra_secrets_env_test.go: 2 regression tests (BMW staging scenario / no-env path)syncInfraOutputSecretscall sites in tests updated with, nil, ""Follow-up (non-blocking)
stateKeys()returns unsorted keys in error messages —sort.Stringscleanup can be a follow-upFixes #60, #56.
🤖 Generated with Claude Code