feat: native Helm components (experimental) Erik Osterman (Cloud Posse) (@osterman) (#2667)
what
Adds the native Helm component type to Atmos (stacked on the native Kubernetes work), so components.helm.<name> is a first-class component with the same stack semantics as Terraform/Kubernetes — rendered, diffed, applied, and deleted through the Helm Go SDK (no helm/helmfile binary required).
This branch contributes, on top of osterman/kubernetes-native-component:
- Native Helm component + `atmos helm` commands: `template`, `diff`, `plan`, `apply`, `deploy`, `delete`. Charts can be local, repository, OCI, or vendored from a `source:` (parity with terraform/kubernetes JIT provisioning). `values:` is the chart's values; secrets flow via `!secret` and are masked.
- Marked experimental: `atmos helm` renders the `[EXPERIMENTAL]` badge and honors `settings.experimental` / `ATMOS_EXPERIMENTAL`.
- Real `diff`: `atmos helm diff` / `plan` now produces a true unified diff via the helm-diff library (used as a Go library, not the CLI plugin; `v3.15.10` pins the same `helm.sh/helm/v4 v4.2.1` Atmos uses). Secret values are redacted. Three baselines:
  - deployed release (default; `action.NewGet`, cluster only for the baseline)
  - `--from-manifest=<path>`: local baseline file (offline)
  - `--against=target[:<name>]`: current manifests in a git deployment-repo provision target (offline; the GitOps producer-side diff)
- GitOps producer side: `apply`/`deploy` can publish rendered manifests to a git deployment repository (`provision` targets). A new optional `Fetcher` capability on the target registry lets `diff` read that target's current state.
- CI `.Diff` job summary at parity with the native Kubernetes component (collapsible block, Secrets omitted).
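As a rough illustration of the "unified diff with redacted Secret values" idea described above, here is a minimal Python sketch. It is not how helm-diff or Atmos implement it: the `REDACTED` placeholder, the `redact_secret`/`render`/`unified` helpers, and the flat text rendering are all assumptions made for the example.

```python
import difflib

REDACTED = "REDACTED"  # hypothetical placeholder; the real masking text may differ


def redact_secret(manifest: dict) -> dict:
    """Return a copy of the manifest with Secret values masked."""
    if manifest.get("kind") != "Secret":
        return manifest
    masked = dict(manifest)
    for field in ("data", "stringData"):
        if field in manifest:
            masked[field] = {key: REDACTED for key in manifest[field]}
    return masked


def render(manifest: dict) -> list[str]:
    """Render a flat, illustrative text form (not real YAML serialization)."""
    lines = []
    for key in sorted(manifest):
        value = manifest[key]
        if isinstance(value, dict):
            lines.append(f"{key}:")
            lines.extend(f"  {k}: {v}" for k, v in sorted(value.items()))
        else:
            lines.append(f"{key}: {value}")
    return lines


def unified(baseline: dict, rendered: dict) -> str:
    """Unified diff between a baseline manifest and the newly rendered one."""
    a = render(redact_secret(baseline))
    b = render(redact_secret(rendered))
    return "\n".join(
        difflib.unified_diff(a, b, "baseline", "rendered", lineterm="")
    )
```

One consequence of masking before diffing: a Secret whose value changes but whose keys do not produces no diff lines in this sketch, while added or removed keys still show up.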
why
Helm has no native cross-release dependency ordering, no first-class secrets, and no in-process rendering — the ecosystem stitches together helm, helmfile, helm-secrets, and helm-diff. Atmos provides these directly through the stack model and the Helm Go SDK, including a real diff with no plugin to install, plus an offline GitOps-repo diff for producer-side workflows.
references
- Docs: `atmos helm`, `helm diff`, Helm components
- Changelog: `website/blog/2026-06-15-native-helm-components.mdx`
- Example: `examples/helm`
- Helmfile parity request: #2069
notes
- Experimental feature; ships behind the experimental gate.
- Pre-existing helm-feature lint debt (5 issues in `executor.go`/`provision.go`, e.g. `os.Getenv`→viper, funlen, arg-limit) is tracked for a follow-up cleanup; the diff work itself is lint-clean against the base.
feat: local Terraform tests against cloud emulators Erik Osterman (Cloud Posse) (@osterman) (#2663)
what
- Run `atmos terraform test` (Terraform's native `*.tftest.hcl` framework) against a local cloud emulator instead of a real cloud account, via a new `examples/terraform-tests` example.
- Add `before.terraform.test` / `after.terraform.test` lifecycle events and wire `cmd/terraform/test.go` to capture output and fire them, which drives both component `hooks:` and the native-CI plugin from one place.
- New `emulator` workflow step type that drives emulator up/down/reset, so declarative `kind: step` hooks can bring a sandbox up before tests and tear it down after (`when: always`), with no manual `atmos emulator up`/`down`.
- Native-CI job step summary for `terraform test`: pass/fail/skip badges and a per-run results table, alongside the existing `plan`/`apply` summaries.
- Bug fix: under the Podman runtime, `parsePodmanContainer` dropped the container `Ports` array, so the emulator endpoint resolved empty and Terraform silently hit real AWS (403 `InvalidAccessKeyId`). Podman's structured `Ports` are now parsed into `Info.Ports`.
- Docs (emulator step type, hook events, job summaries, hooks guide), a changelog blog post, a roadmap milestone, and a `docs/fixes/` write-up for the Podman fix.
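To make the Podman bug concrete, the sketch below parses a container's structured `Ports` array and resolves a local endpoint from it. This is an illustration in Python, not the Go fix itself: the JSON field names (`host_port`, `container_port`, `protocol`) are assumed to match Podman's JSON output, and `parse_ports`/`endpoint` are hypothetical helpers.

```python
import json
from typing import Optional


def parse_ports(container_json: str) -> list[dict]:
    """Extract port mappings from one container's JSON description."""
    container = json.loads(container_json)
    ports = []
    # "Ports" may be missing or null for containers without published ports.
    for entry in container.get("Ports") or []:
        ports.append(
            {
                "host": entry.get("host_port"),
                "container": entry.get("container_port"),
                "protocol": entry.get("protocol", "tcp"),
            }
        )
    return ports


def endpoint(ports: list[dict], container_port: int) -> Optional[str]:
    """Return the local URL for a container port, or None when unpublished."""
    for port in ports:
        if port["container"] == container_port and port["host"]:
            return f"http://localhost:{port['host']}"
    return None
```

A caller would presumably treat a `None` endpoint as a hard error; quietly continuing with an empty endpoint is the failure mode that sent requests to real AWS.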
why
- `terraform test`'s `apply` run blocks create real infrastructure, so they need a cloud account and spend and rarely run locally; pointing them at an emulator makes them free, hermetic, and identical local↔CI.
- A single hook-events seam keeps the emulator lifecycle declarative (in the component) rather than a hand-written custom command, and reuses the existing `kind: step` machinery.
- The Podman fix is required for any emulator-backed Terraform to reach the sandbox at all on Podman (it also fixes the existing `emulator-aws` example), and is documented in `docs/fixes/` rather than the changelog because it restores already-intended behavior.
references
- Builds on the emulators feature (#2647).
- Podman fix rationale: `docs/fixes/2026-06-27-podman-port-readback-emulator-endpoint.md`.
- Changelog: `website/blog/2026-06-27-local-terraform-tests-with-emulators.mdx`.
[codex] Fix mobile gutters and name runtime CSS Erik Osterman (Cloud Posse) (@osterman) (#2673)
what
- Renamed the homepage runtime stylesheet from `landing-redesign.css` to `landing-runtime.css`.
- Updated the homepage import to use the new runtime stylesheet name.
- Tightened mobile and tablet hero CSS so the homepage content keeps consistent left/right gutters and CTA elements stay within the content column.
- Added a more compact phone hero by reducing vertical spacing, scaling mobile type, hiding the heavier demo/runs band on small screens, centering the overall mobile content column, placing cloud logos in the whitespace to the right of the value props, and centering the CTA row.
- Optimized the mobile AI section by hiding the decorative badge, reducing text scale/line-height, tightening spacing, and using left-aligned copy on phones.
why
- Makes the stylesheet name describe the current homepage theme instead of a past redesign event.
- Fixes the mobile homepage hero feeling clipped or overly left-aligned on narrow viewports without making the lower action area look disconnected.
- Helps the primary mobile hero and AI section content fit better above the fold on common devices.
- Protects the runtime hero from legacy broad landing-page header rules at responsive breakpoints.
references
- Validation: pre-commit hooks passed during commit.
- Validation: Docusaurus dev server compiled successfully with `src/css/landing-runtime.css` and `AISection/styles.css`.
- Validation: `postcss.parse` passed for the updated CSS files.
feat: native Kubernetes components with GitOps deployment-repo delivery Erik Osterman (Cloud Posse) (@osterman) (#2607)
what
- Native `kubernetes` component type. Define Kubernetes objects in stacks and run `atmos kubernetes render|diff|plan|apply|deploy|delete <component> -s <stack>` through the Kubernetes Go SDKs (server-side apply); no `kubectl` or `kustomize` binary required.
- Inputs can be inline `manifests`, files/directories (`paths`), and Kustomize overlays; full stack semantics (vars/env/auth/metadata/inheritance/overrides), `--all`/`--affected` DAG ordering, Atmos Auth (e.g. EKS) integration, and dotted lifecycle hooks (`before/after.kubernetes.*`).
- GitOps delivery via `provision.targets`. `apply`/`deploy` deliver to a target selected by `kind`: `kubernetes` applies to the cluster (default), `git` renders the manifests and commits them to a managed Git deployment repository (Argo CD/Flux source-of-truth) instead. Selected with `--target` (precedence: `--target` → `provision.default` → implicit cluster), so existing components are unaffected.
- New reusable, component-agnostic target-provisioner registry (`pkg/provisioner/target`, registry pattern) + a `ProvisionArtifact` model. The git target composes the `pkg/git` service: clone-reconcile a `git.repositories.<name>`, replace the managed templated `path` with the rendered files, path-scoped commit with provenance trailers, and push-with-retry. Credentials come from Atmos Auth (GitHub STS); `pull_request` publishing is deferred.
- Schema, LSP, docs, examples, changelog. Typed `kubernetes` component and `provision.targets` in Go schema and both JSON schemas; LSP; command/config/stack docs; `examples/kubernetes` and `examples/kustomize`; a changelog blog post and a roadmap milestone.
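The target precedence stated above (`--target` → `provision.default` → implicit cluster) can be sketched as a small resolver. This is illustrative Python only: the `IMPLICIT_CLUSTER` sentinel and the `select_target` helper are hypothetical, and the `provision` argument is assumed to be the component's `provision` section as a plain dict.

```python
from typing import Optional

IMPLICIT_CLUSTER = "<implicit cluster>"  # hypothetical sentinel for "no target configured"


def select_target(cli_target: Optional[str], provision: dict) -> str:
    """Resolve the delivery target name using the documented precedence."""
    if cli_target:  # 1. explicit --target flag
        return cli_target
    default = provision.get("default")
    if default:  # 2. provision.default from stack config
        return default
    return IMPLICIT_CLUSTER  # 3. existing components keep applying to the cluster
```

Because the last step falls back to the cluster, a component with no `provision` section behaves as it did before targets existed.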
why
- Kubernetes should be orchestrated by the same stack-based engine as Terraform/Helmfile/Ansible (one set of inheritance, auth, and affected-detection) rather than shelling out to `kubectl`/`kustomize` from glue scripts.
- GitOps pipelines have always needed ad hoc glue to render manifests into a deployment repo, commit, survive push races, and wire credentials. Atmos already owns rendering, lifecycle events, and authentication, so `provision.targets` adds the delivery step with centralized safety rules: the same component config can apply to a cluster in dev and publish to a GitOps repo in prod with one flag.
references
- Builds on the Atmos Git foundational capability (#2597), now merged into `main`, which provides the reusable `pkg/git` service and `git.repositories` configuration consumed by the git target.
- Docs: Kubernetes component, `atmos kubernetes`.
Add emulator workflows, skill catalog, and website refresh Erik Osterman (Cloud Posse) (@osterman) (#2665)
what
- Added emulator workflow improvements, including emulator listing, Kubernetes readiness handling, Podman port parsing, and emulator-aware Terraform backend reads for AWS, GCP, and Azure.
- Added offline bundled AI skill catalog support, including available-vs-installed skill listing and install-by-name behavior.
- Added component dependency listing support plus updated examples, docs, landing-page demo assets, and website sidebar/landing refresh work.
why
- Makes local emulator workflows more reliable by keeping in-process backend reads pointed at emulator endpoints instead of real cloud services.
- Lets users discover and install bundled Atmos AI skills without requiring network or Git access.
- Improves dependency visibility and updates the docs/website experience around the new emulator and skill workflows.
references
- None.
Add parallel and matrix workflow control steps Erik Osterman (Cloud Posse) (@osterman) (#2635)
what
- Add `parallel` and `matrix` workflow control steps with sibling `needs` DAG scheduling.
- Add configurable failure behavior, parent-owned grouped/prefixed/none output, summary rendering through UI helpers, and child result metadata capture.
- Keep the internal exec integration thin while placing the scheduler, matrix expansion, command child executor, output handling, and tests in `pkg/workflow`.
- Add workflow/schema validation, registered `pkg/runner/step/parallel` and `pkg/runner/step/matrix` handlers, JSON schema updates, and `examples/parallel-steps`.
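For readers unfamiliar with `needs`-style DAG scheduling and matrix expansion, the following Python sketch shows the general technique. It is not the `pkg/workflow` scheduler: `schedule` and `expand_matrix` are hypothetical helpers, and grouping ready steps into sorted "waves" is an assumption about one reasonable way to get deterministic ordering.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import product


def schedule(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group sibling steps into waves; each wave can run concurrently.

    `needs` maps a step name to the sibling steps it depends on.
    Raises graphlib.CycleError when the dependencies are cyclic.
    """
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        waves.append(ready)
        sorter.done(*ready)
    return waves


def expand_matrix(axes: dict[str, list]) -> list[dict]:
    """Expand matrix axes into one parameter dict per combination."""
    names = list(axes)
    combos = product(*(axes[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]
```

For example, `schedule({"lint": [], "build": [], "test": ["build"], "publish": ["lint", "test"]})` yields three waves: `build` and `lint` together, then `test`, then `publish`.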
why
- Enables non-interactive workflow steps to run concurrently without moving orchestration policy into `internal/exec`.
- Provides deterministic dependency, failure, output, and matrix semantics before allowing broader step types inside concurrent groups.
- Documents the new behavior with runnable examples and keeps `pkg/workflow` coverage above 80%.
references
- `pkg/workflow` coverage: 82.9%
- Validation run: `go test ./pkg/schema ./pkg/runner/step ./pkg/scheduler ./pkg/workflow ./internal/exec`
- Validation run: `go test ./cmd ./tests -run 'Workflow|workflow|Schema|schema'`
- Validation run: `./custom-gcl run --new-from-rev=origin/main --config=.golangci.yml`
feat(emulator): local cloud emulators + emulator-based advanced quick-start & docs example drawer Erik Osterman (Cloud Posse) (@osterman) (#2647)
what
Emulator feature — run cloud-service emulators locally as first-class Atmos components:
- `emulator` component kind + driver registry (`pkg/emulator`): the `EmulatorDriver` interface, `ResolveDriver`/`Drivers`, `Endpoint`/`Profile` types, the built-in AWS Floci driver, and the AWS target-profile builder (dummy creds, `AWS_ENDPOINT_URL`, and the Terraform provider behavior flags env can't set).
- `atmos emulator` CLI (`cmd/emulator`): lifecycle verbs (`up`/`down`/`reset`/`list`/…), flags, and shell completions.
- Auth/identity binding so in-process AWS and Terraform both reach the emulator (`pkg/auth`, `pkg/component`, `internal/exec`); generic provider-config contribution (`pkg/generator`).
- Design captured as three PRDs: `docs/prd/emulators.md`, `docs/prd/kubernetes-identity.md`, `docs/prd/provider-config-contributor.md`.
- Examples + E2E: `examples/emulator-aws`, `examples/demo-floci`, and the floci/acceptance jobs.
- Changelog: `website/blog/2026-06-22-emulator-persistence.mdx`.
Emulator-based advanced quick-start — rewrote the advanced tutorial to deploy a real event-driven AWS backend (KMS key, encrypted S3 bucket, DynamoDB table, SNS topic, SQS queue, SSM Parameter Store config) entirely on your laptop, with no AWS account and no credentials, via the emulator. New backing example examples/quick-start-advanced (replaces the old VPC-based one).
Docs UI — a right-side [Example] drawer that follows each quickstart page and shows the page's backing example files (QuickStartExampleDrawer, wired through theme/DocItem/Content + theme/Root, reading the file-browser plugin's global data). Plus restyled File, Terminal, KeyPoints ("You will learn"), KeyTakeaways, EmbedExample, and ActionCard components, a CodeBlock line-numbers toggle, and supporting theme/CSS overrides.
why
- Emulators let contributors and CI run the full Atmos workflow — auth, secrets, vendoring, toolchain, Terraform apply — against local cloud emulators, the same on a laptop and in CI, with no cloud credentials. That makes the advanced tutorial runnable by anyone and gives fast, hermetic local iteration.
- The example drawer and component restyle let each tutorial page show its backing example inline, so readers can follow the docs and the code side by side.
references
- Stacked on `osterman/container-component-type` (reuses its persistent container lifecycle via `ComponentType: "emulator"`).
- See `docs/prd/emulators.md` for the full design and per-step implementation sequence.
Fix bare command docs links Erik Osterman (Cloud Posse) (@osterman) (#2660)
what
- Adds explicit redirects from bare command overview routes for `auth`, `ai`, and `toolchain` to their canonical `/usage` pages.
- Updates announcement and feature-card links to point directly at the canonical command overview URLs.
why
- Prevents users from hitting 404s when following bare command docs links like `/cli/commands/auth`.
- Keeps existing `/usage` command overview URLs canonical without changing valid bare command routes such as `workflow`, `devcontainer`, and `ci`.
references
- Reported from `https://atmos.tools/cli/commands/auth` returning 404.
- Validated with `cd website && npm run build`.
feat(hooks): run custom step types as lifecycle hooks (kind: step) Erik Osterman (Cloud Posse) (@osterman) (#2658)
what
- Add a new `kind: step` component-lifecycle hook that delegates to the workflow/custom-command step registry, making every registered step type (`container`, `http`, `toast`, `log`, `markdown`, …) runnable on terraform lifecycle events: name a step `type:` and pass its parameters under `with:`.
- Plumb the operation outcome to hooks: user hooks now fire on the failure path (not just success), a new `when: success|failure|always` selector (default `success`) controls outcome-based firing, and `{{ .status }}` / `{{ .exit_code }}` / `{{ .error }}` template context plus `ATMOS_HOOK_*` env vars (alongside component/stack) let a hook announce exactly what happened.
- Tighten the `hooks` JSON schema into a structured per-hook envelope (`kind` enum incl. `step`, `events`, `on_failure`, `when`, `type`, `with`, `retry`) across all three schema copies, kept non-breaking (`additionalProperties: true`).
- Add docs (hooks reference + new sections), a PRD, a changelog blog post, and a roadmap milestone; unit tests cover routing, nested `with:` decode, `when` filtering, outcome template/env exposure, retry, and `on_failure`.
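The `when` selector semantics described above (default `success`, with `failure` and `always` as the other values) can be sketched as follows. This is an illustrative Python model, not the Go implementation: `should_fire` and `select_hooks` are hypothetical helpers, and the hook names in the usage note are made up.

```python
VALID_WHEN = ("success", "failure", "always")


def should_fire(when: str, succeeded: bool) -> bool:
    """Decide whether a hook fires for the given operation outcome."""
    if when not in VALID_WHEN:
        raise ValueError(f"unsupported when value: {when!r}")
    if when == "always":
        return True
    return succeeded if when == "success" else not succeeded


def select_hooks(hooks: dict, event: str, succeeded: bool) -> list[str]:
    """Return the names of hooks that should run for an event and outcome.

    A hook without an explicit `when` defaults to "success", which keeps
    pre-existing hooks on their success-only behavior.
    """
    return [
        name
        for name, hook in hooks.items()
        if event in hook.get("events", [])
        and should_fire(hook.get("when", "success"), succeeded)
    ]
```

With three hooks on `after.terraform.test` (one with no `when`, one with `when: failure`, one with `when: always`), a failed run selects the latter two and a successful run selects the first and third.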
why
- The hook system previously hard-coded a small kind list (`store`, `command`, `infracost`, `checkov`, `kics`, `trivy`, `git`); every new capability meant a new kind. Reusing the existing, well-tested step registry lets the whole step library work as hooks without forking the abstraction.
- A key use case, "the VPC component in the foobar stack failed", was impossible: `after-*` hooks fired only on success (cobra skips `PostRunE` on error) and the outcome reached only CI hooks, never user hooks. Firing user hooks on failure with `when` + outcome context closes that gap while defaulting to success-only so existing hooks (e.g. `store`) keep their behavior.
references
- PRD: `docs/prd/hooks-step-types.md`
- Docs: `/stacks/hooks#kind-step-run-a-step-type` and `#reacting-to-success-or-failure`
- The `http` step type used in the Slack example lands in a separate PR; the bridge works today with every registered step type.
Skip fork autofix and refresh setup-go pins Erik Osterman (Cloud Posse) (@osterman) (#2659)
what
- Skip the `atmos.ci` `autofix` job when a pull request comes from a fork.
- Keep the existing `atmos-pro[bot]` loop guard and same-repo PR autofix behavior.
- Refresh eight `actions/setup-go` `v6` SHA pins to match the current upstream `v6` tag.
why
- Fork PRs do not receive OIDC, repo variables, or writable credentials, so `atmos pro commit` cannot authenticate or push fixes.
- Skipping the job avoids guaranteed red checks for external contributors while preserving formatting automation for internal PRs.
- The `verify` workflow checks that SHA-pinned actions match their tag comments; the previous `setup-go` pins pointed at `v6.4.0` while labeled as `v6`.
references
- Validated with workflow YAML parsing, upstream tag checks for `actions/setup-go`, and commit hook `check yaml`.
feat(workflows): http step type (webhook alias) with retries Erik Osterman (Cloud Posse) (@osterman) (#2641)
what
- Add a native `http` workflow/custom-command step (`type: http`) that performs an HTTP request: any verb (GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS), `query` string parameters, `headers`, and a request body via `body` (raw) or `form` (urlencoded, or JSON when `Content-Type` is JSON).
- Keep `webhook` as a first-class alias for `http` (`type: webhook` behaves identically) for the fire-a-notification use case. This adds alias support to the step registry: `NewBaseHandler` is variadic for aliases, `Get()` resolves aliases, and `List`/`Count` report only the canonical entry (no duplicate step type).
- Per-attempt `timeout` and `retry` that composes with the existing `retry:` policy; retry is HTTP-aware (transport errors, `5xx`, and `429` retry by default, other `4xx` fail fast, and `retry.conditions` regexes force additional cases).
- Configurable success criteria via `expect.status` (codes) and `expect.response` (regexes); the response body and status are captured as the step's value/metadata for downstream steps.
- Schema fields on `WorkflowStep` and `Task` (so it works in both workflows and custom commands) plus the `HTTPExpect` struct, `ErrHTTPStep*` sentinels, JSON manifest updates, docs, an `examples/http-webhooks` example, a changelog blog post, and a roadmap milestone.
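The HTTP-aware retry rule above (transport errors, `5xx`, and `429` retry by default; other `4xx` fail fast; `retry.conditions` regexes force additional cases) amounts to a small predicate. The Python sketch below illustrates that decision only; `should_retry` is a hypothetical helper, and matching the condition regexes against the response body is an assumption, since the source does not say what they are matched against.

```python
import re
from typing import Iterable, Optional


def should_retry(
    status: Optional[int],
    body: str = "",
    conditions: Iterable[str] = (),
) -> bool:
    """Decide whether a failed attempt should be retried.

    `status` is None when the request failed before any response arrived
    (a transport error such as a refused connection or a timeout).
    """
    if status is None:
        return True  # transport error
    if status >= 500 or status == 429:
        return True  # server error or rate limiting
    # Everything else fails fast unless a configured regex forces a retry.
    return any(re.search(pattern, body) for pattern in conditions)
```

A generic retry wrapper only sees "error or no error"; a predicate like this is what lets the handler distinguish a `404` (give up) from a `503` (try again).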
why
- Calling external endpoints (notify a service, trigger a CI job, hit a deployment webhook, poll a health check) previously required shelling out to `curl`, which isn't portable (Windows), is awkward to template, and gets no first-class timeout/retry handling.
- The step is a general-purpose, verb-agnostic outbound HTTP client, so `http` is the accurate name (an inbound callback receiver is what "webhook" conventionally means); `webhook` is retained as an alias so the evocative name still works.
- Extended/registry step types are not wrapped by the legacy `retry.Do` path that `shell`/`atmos` use, so the handler applies retry itself via `retry.WithPredicate`, which is what enables status-code-aware retry decisions a generic wrapper can't make.
references
- Docs: workflow step types and custom commands
- Changelog: `website/blog/2026-06-20-http-step-type.mdx`
feat(workflows): add `say` step for audible TTS notifications Erik Osterman (Cloud Posse) (@osterman) (#2640)
what
- Add a new `say` workflow step type that speaks its `content` aloud using text-to-speech, and gracefully degrades to printing the message as a Markdown blockquote when no speech engine is available or when running in CI/headless.
- Introduce a reusable cross-platform `pkg/say` package (mirroring `pkg/browser`) that detects macOS `say`, Linux `spd-say`/`espeak`/`espeak-ng`, and Windows PowerShell `System.Speech`, behind a `Speaker` interface with a `CommandRunner` DI seam and functional options.
- Support a CSS font-family-style `voice` list (first installed voice on the host wins), a `rate` field (`slow`/`normal`/`fast`), and a `print` policy (`fallback`/`always`/`never`); add the matching `Voice`/`Rate`/`Print` fields to `WorkflowStep` and sentinel errors `ErrSayNotFound`/`ErrVoiceListUnsupported`.
- Add an `examples/say-something/` reference example, workflow step-type docs, an announcement blog post, and a roadmap milestone under the Workflows Overhaul initiative.
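The "first installed voice wins" rule for the `voice` list works like a CSS `font-family` fallback chain. A minimal Python sketch of that selection follows; `pick_voice` is a hypothetical helper, and both the comma-separated input format and exact-name matching are assumptions.

```python
from typing import Optional


def pick_voice(voice_list: str, installed: set[str]) -> Optional[str]:
    """Return the first voice from the list that is installed on the host."""
    for raw in voice_list.split(","):
        candidate = raw.strip().strip("\"'")  # tolerate quoted names
        if candidate and candidate in installed:
            return candidate
    return None  # caller falls back to the engine's default voice
```

For example, `pick_voice("Samantha, Alex, Daniel", {"Alex", "Daniel"})` returns `"Alex"`, so the same step definition can name a preferred voice per platform.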
why
- Long-running workflows often outlast your attention; `say` gives an audible cue when a workflow finishes or needs input, going beyond the bell-only `alert` step by announcing what happened.
- Shelling out to `say` only works on macOS; this makes audible notifications portable across macOS/Linux/Windows and safe in CI, so the same workflow runs unchanged everywhere and never fails on a missing engine.
feat(hooks): CI annotations and SARIF upload for scanner findings Erik Osterman (Cloud Posse) (@osterman) (#2631)
what
- Surface scanner-hook findings (Checkov, Trivy, KICS) in CI beyond the job summary:
  - `ci.annotations` (default on): inline GitHub `::error`/`::warning` annotations anchored at each finding's file and line on the PR diff. The non-Code-Scanning path: needs no GitHub Advanced Security.
  - `ci.results` (default off): upload the raw SARIF to GitHub Code Scanning (Security tab) natively, with no `github/codeql-action` step. Analysis category is auto-derived from the scan target so per-component uploads don't overwrite each other.
- Implemented as native CI provider capabilities (`Annotator`, `SARIFReporter`), siblings of the existing check-run/comment/summary capabilities, not as hooks. All three reporting outputs (`ci.summary`/`ci.annotations`/`ci.results`) are gated by `ci.enabled`.
- Custom hooks opt in by adding `format: sarif` to a `kind: command` hook; any SARIF-emitting tool (tfsec, semgrep, gitleaks, …) gets findings, annotations, and upload with no Go code.
- Docs (incl. required GitHub Actions permissions), a changelog blog post, and a roadmap milestone.
why
- The CI job summary (#2617) gave a readable report, but the two richest GitHub surfaces — inline PR annotations and tracked Code Scanning alerts — were missing even though the data was already in the parsed SARIF.
- Modeling this as provider capabilities (rather than reviving the deprecated `ci.*` hook kinds) keeps CI reporting where it belongs and lets every SARIF-emitting hook, built-in or custom, participate through one shared path.
references
feat(container): persistent container component kind + compositions Erik Osterman (Cloud Posse) (@osterman) (#2645)
what
Adds a stack-scoped, Atmos-native container component kind — one component is one persistent service — plus first-class compositions membership.
- Lifecycle (`atmos container <verb> <component> -s <stack>`): `build`, `push`, `pull`, `run`, `up`, `ps`, `logs`, `exec`, `restart`, `stop`, `rm`, `down`. Long-running containers are discovered by labels derived from the canonical instance address `<stack>/container/<component>` (name `atmos-<stack>-container-<component>`), not local state files. `up`/`run` build the image first when `vars.build`-style `build:` is set and the image is missing.
- First-class config: `image`, `build`, `run` are top-level component sections (reusing the workflow container-step structs `ContainerBuildStep`/`ContainerRunStep`), not nested under `vars`. Container app env comes from the component `env:` section.
- `atmos container list` shows per-instance running state (running/stopped/unknown), discovered by label. The generic `atmos list components` lists containers as a component type without container-specific status, consistent with terraform/ansible (there is no `atmos terraform list`/`atmos ansible list`).
- Compositions: a first-class `composition:` membership field + a top-level `compositions:` section (closed for membership, open for fulfillment). Operating a component with undeclared membership is a hard error; `atmos composition validate <name> -s <stack>` reports fulfilled vs. not-provided services.
- Deep merge: the custom-component fallback now runs `metadata.inherits` inheritance + generic deep-merge of all top-level keys, so `container` honors catalog/abstract defaults like built-in kinds. Abstract components are rejected for execution and filtered from listings.
- Extends the describe-component auto-detect and the describe/list type whitelist for `container` (and fixes the pre-existing `ansible` gap in `list components`).
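The instance address and container name patterns quoted in the lifecycle bullet can be written out as two tiny helpers. This Python sketch only restates those patterns; `instance_address` and `container_name` are hypothetical names, and any sanitizing of stack or component names (for example, characters a container runtime would reject) is left out because the source does not describe it.

```python
def instance_address(stack: str, component: str) -> str:
    """Canonical instance address: <stack>/container/<component>."""
    return f"{stack}/container/{component}"


def container_name(stack: str, component: str) -> str:
    """Runtime container name: atmos-<stack>-container-<component>."""
    return f"atmos-{stack}-container-{component}"
```

Deriving both values from the stack and component alone is what makes label-based discovery possible without a local state file.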
why
Containers should be first-class, addressable component instances like terraform/helmfile/packer/ansible, and atmos list components should show whether each is running. A set of container components grouped by a composition is "our own Compose" with no compose.yaml. Implements docs/prd/container-components.md.
references
- PRD: `docs/prd/container-components.md`, `docs/prd/compositions.md`
- Examples: `examples/container-component/`, `examples/compositions/`
- Docs: `website/docs/cli/commands/container/`, `website/docs/components/container.mdx`
- Contributor skill: `.claude/skills/atmos-core-component-development/`
[!NOTE]
Stacked on `osterman/container-step-prd` (the container step), not `main`. Changelog/roadmap are not required for this base (the gate is main-only); they'll go on the PR that brings the container feature to `main`.
docs(ci): document github/artifacts planfile runtime-token requirement + E2E test Erik Osterman (Cloud Posse) (@osterman) (#2649)
what
- Planfile storage works end-to-end in CI. The `github/artifacts` store talks to the GitHub Artifacts runtime API for both upload and download, so a planfile uploaded by a `plan` job can be consumed by a separate `deploy` job in the same run.
- Automatic, configurable drift verification on `deploy`. When planfile storage is configured and `atmos terraform deploy` runs under CI, Atmos downloads the stored plan, re-plans, compares them with a semantic JSON plan-diff, and applies the verified plan, failing on drift by default. Configurable via `components.terraform.planfiles.verify` (`fail | warn | off`) and `--verify-plan`/`--no-verify-plan` (CLI > config > CI default).
- Generalized the in-repo `github-runtime` action to advertise planfiles, documented the runtime-token requirement, and added the automatic-flow `planfile-verify-e2e` workflow (kept the manual `planfile-artifacts-e2e`).
why
- The same-run plan→deploy handoff (the core CI use case) was broken: GitHub's REST API won't serve an in-progress run's artifact, and verification was opt-in and undocumented.
- A planfile legitimately varies between review and apply (values known-after-apply, computed fields, hashes, ordering). A naive diff rejects a still-valid plan as "drifted"; the semantic comparison tolerates benign variation while catching real drift — which is what makes plan-then-deploy practical.
- Verification belongs on `deploy` (which re-runs `plan`, so a fresh plan exists to diff against), not `apply` (which never re-plans).
references
- Docs: Planfile Storage, Planfile drift verification, `atmos terraform deploy`
- Changelog: `website/blog/2026-06-22-native-ci-planfile-verification.mdx`
feat: native container steps for workflows and custom commands Erik Osterman (Cloud Posse) (@osterman) (#2626)
what
- Add a native `type: container` step (build/push/run) to the shared step library used by both workflows and custom commands, built on the existing `pkg/container` Docker/Podman runtime (new ephemeral one-shot runner plus image build/tag/push/inspect helpers; `ImageInspect` added to the `Runtime` interface, mocks regenerated).
- Formalize step outputs: every named step exposes `value`/`values`/`metadata`/`outputs`/`skipped`/`error` (command-like steps add `stdout`/`stderr`/`exit_code`), so a build step can publish an image reference consumed by later push/run steps via `{{ .steps.<name>.outputs.<key> }}`.
- Support per-step `identity` for registry auth and Docker Buildx + Buildx Bake builds; Podman uses the native `podman build` path.
- Add the `examples/container-step` example and a hermetic GitHub Actions job (`[container-step]`) that exercises build → push → run against a `registry:2` service on `localhost:5000`, including failure-propagation.
- Document the step type (`website/docs/workflows`), add a changelog blog post, and update the roadmap (container steps + step outputs marked shipped).
- Land the design PRDs for the follow-on primitives (`container-components.md`, `compose-components.md`, and a rewritten membership-based `compositions.md`) and trim `container-actions-and-step-outputs.md` to cover only the procedural step. Remove the earlier `targets:`-based composition scaffolding (`pkg/composition`, `cmd/composition`, the composition step, and `schema.Composition*`) in favor of those PRDs.
- Split the container-step handler into focused files and reduce complexity to satisfy the lint gate.
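To show how a `{{ .steps.<name>.outputs.<key> }}` reference lets a later step consume an earlier step's output, here is a small Python model of the substitution. Atmos uses Go templates for this; the regex-based `render` helper below is a hypothetical stand-in, and the `build` step with an `image` output key is invented for the example.

```python
import re

# Matches references of the form {{ .steps.<name>.outputs.<key> }}.
STEP_OUTPUT_REF = re.compile(r"\{\{\s*\.steps\.(\w+)\.outputs\.(\w+)\s*\}\}")


def render(template: str, steps: dict) -> str:
    """Substitute step-output references using previously recorded results.

    Raises KeyError when a referenced step or output key does not exist.
    """

    def replace(match: re.Match) -> str:
        name, key = match.group(1), match.group(2)
        return str(steps[name]["outputs"][key])

    return STEP_OUTPUT_REF.sub(replace, template)
```

Given `steps = {"build": {"outputs": {"image": "localhost:5000/app:abc123"}}}`, rendering `"{{ .steps.build.outputs.image }}"` produces the image reference with no shell parsing or temporary env files involved.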
why
- Atmos workflows and custom commands increasingly resemble CI pipelines; running containers natively (build images, push to registries, run tools) removes the need for one-off shell scripts and keeps the same automation usable locally and in CI.
- A first-class step-outputs contract lets build → push → run/deploy pipelines pass structured values without shell parsing or temporary env files.
- The procedural container step is the shippable foundation; the component kinds (container, compose) and compositions are specified as PRDs so the broader system can be designed and reviewed before implementation, without blocking this PR.
references
- PRDs: `docs/prd/container-actions-and-step-outputs.md`, `docs/prd/container-components.md`, `docs/prd/compose-components.md`, `docs/prd/compositions.md`
- Changelog: `website/blog/2026-06-17-native-container-steps.mdx`
- Roadmap initiative: "Container Composition & Local Development" in `website/src/data/roadmap.js`
docs: add Custom to the Component Library Erik Osterman (Cloud Posse) (@osterman) (#2638)
what
- Add a Custom entry to the Component Library so command-backed custom component types are discoverable alongside Terraform/OpenTofu, Helmfile, Packer, and Ansible.
- New page `website/docs/components/custom.mdx` explaining custom component types (with a minimal Script Runner example and a native-vs-custom comparison), linking to the existing reference and changelog rather than duplicating them.
- Wire the new page into the Component Library sidebar (`website/sidebars.js`) after Ansible.
- Surface custom types in the Component Library overview (`components-overview.mdx`): a pointer under the Component Types table and a Next Steps bullet.
why
- Custom component types already shipped and are fully documented under `cli/configuration/commands#custom-component-types`, but a user browsing the Component Library never saw them as a first-class option; the nav didn't match the actual capability.
- This is a docs-only change (`no-release`): no behavior changes, and the feature already has its own changelog post.
references
- Reference: `/cli/configuration/commands#custom-component-types`
- Changelog: `/changelog/custom-component-types`
feat: support description in component metadata Erik Osterman (Cloud Posse) (@osterman) (#2634)
what
- Add an optional `description` field to component `metadata`.
- Update the embedded, test-fixture, and published website JSON schemas to allow `metadata.description`.
- Document the field in the component metadata reference and quick-start guides, and demo it in the quick-start example.
- Add a schema validation test (`pkg/datafetcher/schema_metadata_validation_test.go`) verifying both the embedded and website schemas accept `metadata.description`.
- Add a changelog blog post and a shipped roadmap milestone.
why
- Lets users document the purpose of a component inline, right next to its configuration — especially useful when many components share the same Terraform root module with different configs.
- The field is purely informational: it does not change how a component is processed, planned, or applied, so the change is safe and additive (component `metadata` is already a free-form map at runtime, so no Go changes were required).
- Schema support gives editors auto-completion and validation for the new field.
references
- Component metadata docs: /stacks/components/component-metadata
feat: terminal steps - tty/interactive fields and exec step type Erik Osterman (Cloud Posse) (@osterman) (#2602)
what
Terminal steps for custom commands and workflows — three related capabilities:
- `interactive: true`: attach host stdin and let the step own Ctrl-C. Atmos suspends its SIGINT-exit handler while the step runs (new `pkg/signals` suspension registry consulted by the `main.go` signal handler).
- `tty: true`: allocate a pseudo-terminal (reusing `pkg/terminal/pty`, same engine as `atmos devcontainer attach`). The command sees a real TTY; secret masking is applied to PTY output. With `interactive: true`, the host terminal switches to raw mode so Ctrl-C flows through the PTY to the child.
- `type: exec`: replace the Atmos process entirely (shell `exec` semantics): `execve` of the system shell on Unix (env, working directory, and terminal inherited natively; `ATMOS_SHLVL` unchanged), spawn-and-propagate-exit-code emulation on Windows. Validated to be the final step; `tty`/`interactive`/`retry`/`timeout`/`output` are rejected on exec steps.
Architecture: all logic lives in narrow packages — pkg/process (RunShellStep routing, RunShellSession, ReplaceShellSession), pkg/schema (validation), pkg/signals (interrupt suspension). cmd/ and internal/exec contain only inline switch-case call sites; pkg/runner and the step handler share the same routing.
Also fixes in pkg/terminal/pty found along the way:
- stdin copier no longer blocks completion (it's detached, docker-CLI pattern)
- session teardown is bounded: when grandchildren (e.g. aws ssm's session-manager-plugin) keep the PTY slave open after the child exits, output drains on a 1s deadline instead of hanging with the terminal in raw mode
- DisableStdinForward for -t without -i semantics
why
Custom commands had no way to hand the terminal to an interactive process:
    commands:
      - name: ssh
        steps:
          - type: shell
            command: "exec aws ssm start-session --target {{ .Arguments.instance_id }}"

ran the SSM session as a piped, masked subprocess: full-screen rendering broke, and Ctrl-C inside the session killed Atmos itself (global SIGINT handler exits 130), killing the orphaned session with SIGPIPE.
With this change:
    commands:
      - name: ssh
        steps:
          - type: shell
            tty: true
            interactive: true
            command: "aws ssm start-session --target {{ .Arguments.instance_id }}"

behaves like docker run -it (supervised: masking preserved, more steps can follow), and:

          - type: exec
            command: "aws ssm start-session --target {{ .Arguments.instance_id }}"

hands the process over entirely (launcher: native job control, zero proxy overhead, must be the last step).
references
- Reported in SweetOps Slack (SSM session via custom command gets a mangled terminal and dies with SIGPIPE on Ctrl-C); teardown hang + raw-terminal-after-exit reproduced live on this PR and fixed
- Docs: Interactive and TTY Steps
feat(secrets): declarative secrets management with !secret, CRUD CLI, and masking Erik Osterman (Cloud Posse) (@osterman) (#1911)
what
Implements the Secrets Management PRD end to end — a GitOps-friendly, multi-cloud secrets workflow built on top of the existing store registry (not a parallel backend). Secrets are declared in stack config (committed to git) and their values live in a cloud secret backend or a SOPS-encrypted file, managed with a Vercel-like CLI and resolved at runtime with a new !secret YAML function.
Stores (pkg/store)
- StoreConfig gains secret: true (subsystem membership) and kind (cloud/thing) with legacy type mapping; !store against a secret: true store is now an error ("use !secret").
- New DeletableStore/StatusStore/SecretAwareStore interfaces; AWS SSM writes SecureString when used as a secret backend and gains Delete/Has.
- New store backends: AWS Secrets Manager and HashiCorp Vault (KV v2). Registry refactored to a table-driven builder map; kind↔type compatibility.
Secrets core (pkg/secrets)
- service, declaration registry, resolver, validator, kinds, and a leaf pkg/secrets/providers/ subpackage with a store-adapter (track 1) and a native SOPS provider (track 2: age/aws-kms/gcp-kms/gpg).
- SOPS providers can be defined in atmos.yaml, globally in a stack (secrets: top-level merges into every component), or under a single component.
!secret + masking (the headline)
- !secret NAME [| path ...] [| default ...] wired into the live YAML pipeline, with automatic masker registration.
- Mask-without-retrieval: inspection commands (describe, list) resolve !secret to <MASKED> without contacting the backend when masking is on (the default), so you can inspect a stack with no cloud credentials. Value-producing commands (secret get, terraform plan/apply) always retrieve; --mask/ATMOS_MASK only controls redaction of display output.
- Sensitive Terraform outputs (sensitive = true) auto-register with the masker as they flow through !terraform.output / atmos.Component() / describe.
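The mask-without-retrieval rule can be shown with a small Go sketch. It is only an illustration of the decision described above: Backend, resolveSecret, and the inspecting/maskOn parameters are hypothetical and do not mirror the pkg/secrets API.

```go
package main

import "fmt"

// Backend is a hypothetical stand-in for a lookup against a secret store.
type Backend func(name string) (string, error)

// resolveSecret returns the placeholder without touching the backend when an
// inspection command runs with masking on; every other case retrieves.
func resolveSecret(name string, inspecting, maskOn bool, fetch Backend) (string, error) {
	if inspecting && maskOn {
		return "<MASKED>", nil
	}
	return fetch(name)
}

func main() {
	calls := 0
	fetch := func(name string) (string, error) {
		calls++ // counts backend round-trips
		return "s3cr3t", nil
	}

	v, _ := resolveSecret("DB_PASSWORD", true, true, fetch)
	fmt.Println(v, calls) // <MASKED> 0
}
```

Because the backend is never called on the inspection path, no cloud credentials are needed there, which is the property the notes call out.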
CLI (cmd/secret)
init, set (alias add), get, delete (alias rm), list, pull, push, import, validate.
Stack processing
secrets is now a first-class inheritable component section, plus a global stack-level secrets: block that merges into every component.
Docs + example
- Full Docusaurus docs: atmos secret overview + all 9 subcommands, secrets configuration page, !secret function page; blog post (with an embedded example) and a roadmap milestone.
- examples/sops-secrets/: fully self-contained, age-encrypted, no cloud credentials. Bundled atmos test custom command (.atmos.d/test.yaml) proves the full lifecycle end to end (set → encrypted-at-rest → get → list → validate → masked-without-credentials inspection → reveal-needs-key).
why
There was no unified way to manage human-provisioned secrets in Atmos — stores were designed for machine-written Terraform outputs, and the historical workaround (Chamber) was AWS-only. This adds explicit, declarative secret registration so a secret must be declared before it can be set, read, or resolved, and makes "inspect a stack" decoupled from "authenticate to the secret backend."
references
- PRD: docs/prd/secrets-management.md and docs/prd/secrets-masking/
- Example: examples/sops-secrets/ (run atmos test)
notes / follow-ups
- Fixed a pre-existing init-ordering bug where the global --mask=false flag did not disable the early-initialized I/O masker (only ATMOS_MASK=false did). io.ReconcileMasking() now reconciles the masker after flags are parsed, so --mask=false and ATMOS_MASK=false behave identically.
- pkg/store backend implementations could be moved into a pkg/store/providers/ subpackage (mirroring pkg/secrets/providers/); this is deferred to a dedicated follow-up PR since it touches ~30 external call-sites.
- Base-component (metadata.component) inheritance of the secrets section is not wired yet (component-level + import: + global-stack inheritance all work).
feat(terraform): registry cache, RC management, and multi-platform mirror Erik Osterman (Cloud Posse) (@osterman) (#2582)
what
- Add a transparent Terraform/OpenTofu registry cache: an ephemeral local HTTPS network-mirror proxy (pkg/http/proxy, pkg/terraform/{cache,registry}) that caches providers and modules in the canonical filesystem_mirror layout, enabled with components.terraform.cache.enabled: true.
- Add the atmos terraform cache command group: list, stats, prune, delete, plus mirror (alias warm) for eager multi-platform pre-seeding and trust/untrust for the proxy certificate.
- Add declarative Terraform CLI-config (.terraformrc) management via components.terraform.rc, exposed to the subprocess through TF_CLI_CONFIG_FILE/TOFU_CLI_CONFIG_FILE.
- Add a first-class components.terraform.platforms setting (target <os>_<arch> list) that drives both eager atmos terraform cache mirror pre-seeding (--all/--components/--query/-s, package-manager-style TUI, --format json|yaml) and automatic completion of .terraform.lock.hcl.
- Keep .terraform.lock.hcl complete across platforms: a built-in after.terraform.init provisioner runs terraform/tofu providers lock -platform=… for the declared platforms whenever a customized provider installation method (the default plugin cache, or the registry cache) is active. Because it runs after init, it sees the fully JIT-vendored and code-generated working directory, so the generated provider set (including stack-config provider versions) is what gets locked, and committed lock files install cleanly on every platform in a fleet.
- Generate and cache a self-signed loopback certificate so the proxy can serve HTTPS (required by Terraform/OpenTofu network mirrors); trusted automatically via SSL_CERT_FILE on Linux/CI and via a one-time atmos terraform cache trust on macOS/Windows.
- Add examples/caching (auto-installs OpenTofu via the toolchain), PRDs, command + configuration docs, blog posts, and a roadmap update.
why
- Repeated and CI runs re-download the same providers and modules; the cache eliminates that, keeps runs working through registry outages, and preserves the exact versions a deployment used.
- Atmos enables a provider plugin cache (TF_PLUGIN_CACHE_DIR) by default, and network mirrors behave the same way: Terraform can no longer record the registry's signed cross-platform checksums, so init writes a .terraform.lock.hcl with hashes for only the current platform and prints the "Incomplete lock file information for providers" warning. Declaring components.terraform.platforms lets Atmos complete the lock automatically for every target platform.
- The lazy proxy only caches the host platform, so mixed CI/developer fleets and air-gapped reproducible builds need declarative multi-platform pre-seeding; components.terraform.platforms + cache mirror provide it.
- Declarative rc lets teams manage provider mirrors, credentials, and other CLI-config directives from atmos.yaml instead of per-machine .terraformrc files.
references
- Closes #2150
- docs/prd/terraform-registry-cache.md, docs/prd/terraform-rc-management.md, docs/prd/terraform-registry-cache-tls.md
feat: Atmos Git — foundational capability for GitOps enablement Erik Osterman (Cloud Posse) (@osterman) (#2597)
what
Atmos Git: Git becomes a foundational platform capability, on par with Toolchain, Auth, and Hooks — the enablement layer for GitOps workflows where Atmos commits generated artifacts to a source-of-truth repository. PRD: docs/prd/git-ops.md.
- Top-level git config: git.repositories.<name> declares managed repositories (uri, branch, remote, clone depth/filter/single-branch/submodules, auth.identity, commit.signing/commit.author, push.retries), git.hooks declares local Git hooks, git.list configures list output. Workdirs default to automatic XDG cache locations ($XDG_CACHE_HOME/atmos/git/repositories/<name>) so the native CI cache captures and restores managed clones.
- pkg/git service: provider registry (registry pattern) with the cli provider in v1 (chosen because GitHub STS materializes credentials as GIT_CONFIG_* env vars, which subprocess git honors and go-git ignores). Clone is defined as reconcile (clone-if-absent, else fetch + checkout + ff-only) so stale CI-cache restores are just faster clones. Safety rules: ff-only pull, no force push ever, push retry-with-rebase on non-fast-forward rejection, path-scoped commits that fail on unrelated dirty files, worktree path-traversal validation, per-invocation commit author injection (CI runners need no user.name), provenance trailers (Atmos-Stack, Atmos-Component, Atmos-Source-SHA).
- atmos git command group: clone, pull, status, diff, commit, push, list, clean, plus git hooks install|uninstall|run, registered under the Git help group. --all bulk operations (bounded concurrency, attempt-all with errors.Join). Clone accepts configured names, plain URLs, and go-getter git::...?ref=&depth= URIs. No-arg clone in native CI (ci.enabled: true) infers the current repository from CI metadata and clones into the workspace, making it an actions/checkout replacement. atmos list git-repositories alias registered.
- git hook kind: publishes generated artifacts on lifecycle events (after.terraform.apply, ...) to the current repository by default or a named managed repository, with templated commit messages, trailers, clean no-ops, and push-after-commit with retry. Inherits --skip-hooks and on_failure.
- Local Git hook shims: atmos git hooks install writes worktree-aware .git/hooks/* shims (marker-protected, --force to overwrite, warns when core.hooksPath is set); run dispatches configured commands with stdin forwarding and exit-code propagation.
- Error handling: new sentinels (ErrGitRepositoryNotFound, ErrGitAuthFailed, ErrGitPushRejected, ErrGitDirtyUnmanagedFiles, ErrGitPathEscapesWorktree, ErrGitHookNotConfigured, ErrGitRepositoryRequired, ErrGitProviderNotFound) with error-builder hints and exit-code mapping. Git stderr streams to the masked writer and is never embedded in error chains.
- Docs & example: command pages under website/docs/cli/commands/git/, git configuration reference, hook kind docs, changelog blog post (atmos-gitops), roadmap milestone (CI/CD Simplification initiative), and a GitOps publishing demo at examples/gitops (reconcile → review → publish against a managed deployment repo via custom commands).
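The "clone is reconcile" rule (clone-if-absent, else fetch + checkout + ff-only) can be sketched as a planner that returns the git invocations to run. This is an assumption-laden illustration: planReconcile is a hypothetical helper, the remote is assumed to be named origin, and the actual cli provider may use different git arguments.

```go
package main

import "fmt"

// planReconcile returns the git argument lists needed to bring a managed
// workdir up to date. It never plans a force operation.
func planReconcile(workdirExists bool, uri, branch, dir string) [][]string {
	if !workdirExists {
		// Nothing on disk yet: a plain clone of the configured branch.
		return [][]string{{"clone", "--branch", branch, uri, dir}}
	}
	// A restored (possibly stale) clone: fetch, check out, fast-forward only.
	return [][]string{
		{"-C", dir, "fetch", "origin", branch},
		{"-C", dir, "checkout", branch},
		{"-C", dir, "merge", "--ff-only", "origin/" + branch},
	}
}

func main() {
	for _, args := range planReconcile(true, "https://example.com/deploy.git", "main", "/tmp/deploy") {
		fmt.Println("git", args)
	}
}
```

Treating a stale cache restore as the second branch is what makes a restored workdir "just a faster clone": both paths converge on the same checked-out state.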
What this is — and isn't
Atmos owns the publishing side of GitOps: render → diff → commit → push, with centralized safety rules. Reconciliation stays with the consumer — Argo CD or Flux pulls from the repository, or CI applies on merge. There are no agents and no drift-correction loop in Atmos itself (explicit non-goal in the PRD); Atmos is the producer feeding the reconciler. This also isn't a replacement for the existing GitHub Actions plan/apply integration — it's the Git plumbing those pipelines use.
why
GitOps workflows have always needed glue: ad hoc scripts to render manifests into deployment repos, commit them, survive push races, and wire credentials. Atmos already owns rendering, lifecycle events, toolchain, and credentials (GitHub STS) — this PR gives it the Git operations between them, with centralized safety rules instead of per-pipeline shell scripts. It is the foundation for Kubernetes deployment-repository provisioning (Argo CD / Flux rendered-manifest publishing, on the kubernetes component branch) and a future github provider for pull-request-based publishing to protected branches.
references
- PRD: docs/prd/git-ops.md (in this PR)
- Coverage: pkg/git 86%, pkg/git/providers/cli 88%, pkg/hooks/kinds/git 94%, cmd/git 81%
- Related: native CI cache (XDG-root archiving) and the Kubernetes component branch (consumes provision.git next)
feat: support dotenv files in !include Erik Osterman (Cloud Posse) (@osterman) (#1930)
Summary
Adds explicit dotenv file support to the existing !include YAML function. Dotenv files now resolve to maps, so they can be used directly in CLI and stack env sections and with YAML merge keys.
    env:
      <<: !include .env
      AWS_REGION: us-east-2

Dotenv files can also be layered with YAML merge sequences. This uses YAML's << merge-key syntax, the same YAML mechanism commonly used with anchors and aliases:

    env:
      <<:
        - !include .env.local
        - !include .env
      AWS_REGION: us-east-2

YAML merge sequence precedence is earlier item wins, and inline keys under env override all merged values.
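The precedence rule can be checked with a short Go sketch. mergeEnv is a hypothetical helper written only to illustrate the ordering; it is not how Atmos or its YAML parser implements merge keys, and the variable values are made up.

```go
package main

import "fmt"

// mergeEnv applies merge-sequence semantics: among merged maps the earlier
// item wins, and inline keys override every merged value.
func mergeEnv(inline map[string]string, merged ...map[string]string) map[string]string {
	out := map[string]string{}
	// Walk the sequence backwards so earlier items overwrite later ones.
	for i := len(merged) - 1; i >= 0; i-- {
		for k, v := range merged[i] {
			out[k] = v
		}
	}
	for k, v := range inline {
		out[k] = v
	}
	return out
}

func main() {
	envLocal := map[string]string{"AWS_PROFILE": "dev"}                               // .env.local
	envBase := map[string]string{"AWS_PROFILE": "default", "AWS_REGION": "us-west-2"} // .env
	inline := map[string]string{"AWS_REGION": "us-east-2"}

	env := mergeEnv(inline, envLocal, envBase)
	fmt.Println(env["AWS_PROFILE"], env["AWS_REGION"]) // dev us-east-2
}
```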
What Changed
- Parse .env, .env.*, and exact *.env filenames as dotenv files when used with !include
- Support env: !include .env and env: { <<: !include .env } / block merge forms in stack config
- Support dotenv !include in atmos.yaml env, including merge sequences for layered dotenv files
- Preserve !include.raw behavior for raw file contents
- Keep .envrc and foo.env.local unsupported/raw; Atmos does not auto-load or execute dotenv files
- Preserve YAML custom tags during schema validation so env: !include .env satisfies stack manifest schema rules
- Update the stack manifest JSON schema description for env to document the !include string form
- Document dotenv includes in both CLI env and stack env docs, including YAML merge-key behavior, include path resolution, and layered files
- Add a short blog post for explicit dotenv inclusion
- Add a roadmap milestone entry for the shipped dotenv !include support
- Add coverage-focused tests for dotenv merge-key retry handling, include path helpers, case-preservation helpers, and YAML custom-tag conversion
- Harden the LocalStack demo provider config to use the local edge endpoint directly, path-style S3, and skip AWS account-ID discovery so Terraform does not hang before reaching LocalStack in CI
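One way to read the filename rules in the list above is the following Go sketch. isDotenvFile is a hypothetical function, and the matching logic is an interpretation of ".env, .env.*, and exact *.env" that happens to agree with the stated exclusions (.envrc, foo.env.local); the actual detection code may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isDotenvFile reports whether the base name looks like a dotenv file:
// ".env", ".env.<suffix>", or "<name>.env".
func isDotenvFile(path string) bool {
	name := filepath.Base(path)
	return name == ".env" ||
		strings.HasPrefix(name, ".env.") ||
		strings.HasSuffix(name, ".env")
}

func main() {
	for _, p := range []string{".env", ".env.local", "config/prod.env", ".envrc", "foo.env.local"} {
		fmt.Printf("%-16s %v\n", p, isDotenvFile(p))
	}
}
```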
Tests
- cd examples/demo-localstack && ATMOS_IDENTITY=false go run ../.. describe component demo -s dev --format json --logs-level Off | jq '.providers.aws'
- cd examples/demo-localstack && ATMOS_IDENTITY=false go run ../.. validate stacks --logs-level Off
- go test ./pkg/config ./pkg/validator ./pkg/filetype
- go test ./internal/exec -run 'TestGenerateProviderOverrides|TestGenerateProviderOverridesForAliases|TestProcessStackConfigProviderSection'
- go test ./pkg/config ./pkg/validator -coverprofile=.context/dotenv-include-coverage.out
- go test ./pkg/utils -run 'TestInclude(Dotenv|ExtensionBased|RawFunction|WithNoExtension)'
- node -e "import('./website/src/data/roadmap.js').then(() => console.log('roadmap import ok'))"
- git diff --check
- Real stack manifest schema regression: env: !include .env validates against tests/fixtures/schemas/atmos/atmos-manifest/1.0/atmos-manifest.json
- Commit hooks passed: go-fumpt, Go build, go mod tidy, golangci-lint, whitespace/EOF/large-file checks
feat(ci): GitHub Actions build cache (atmos ci cache) Erik Osterman (Cloud Posse) (@osterman) (#2579)
what
- Add a CI build cache that restores the well-known Atmos cache root (~/.cache/atmos: toolchain binaries, vendored components, remote import clones, provider/plugin caches) at startup and saves it at exit, using the same store actions/cache uses (GitHub Actions Cache Service v2).
- New atmos ci cache subcommands: restore, save, list, delete, so the lifecycle can run in one invocation or be spread across CI steps.
- New ci.cache configuration block (enabled, auto: off|restore|save|both, root, paths, key, restore_keys, compression) with ATMOS_CI_CACHE_* env overrides.
- Model it as a CI-provider capability (provider.CacheProvider + ci.DetectCache()) with a backend registry (pkg/ci/cache) and a GitHub Actions implementation (pkg/ci/cache/github), mirroring the existing artifact subsystem; outside a runner it's a clean no-op.
- Consolidate the default toolchain install path under the XDG cache root (~/.cache/atmos/toolchain) so a single cache captures it; add a PRD, command/config docs, blog post, and roadmap entry.
why
- In CI, every job re-downloads the toolchain, providers, and modules from upstream — wasting time/bandwidth and exposing runs to transient and rate-limit failures. Persisting the cache root across jobs makes executions faster, more reliable, and reduces supply-chain exposure.
- Teams otherwise hand-wire an actions/cache step and own the key/path logic themselves; Atmos already knows its cache root and can derive a stable key from toolchain.lock.yaml + OS/arch, so it's two settings to enable.
- Cache entries are write-once; a per-run state marker makes automatic and manual usage idempotent (an exact-key hit on restore skips the save), so the same operations work whether triggered automatically or via the subcommands.
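Deriving a stable key from the lock file plus OS/arch might look like the sketch below. The key format ("atmos-<os>-<arch>-<hash prefix>"), the choice of SHA-256, and the cacheKey helper are all assumptions for illustration; the notes do not specify how Atmos actually composes its key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"runtime"
)

// cacheKey hashes the lock file contents and combines the digest prefix with
// the target OS and architecture, so the key changes only when one of those
// inputs changes.
func cacheKey(lockFile []byte, goos, goarch string) string {
	sum := sha256.Sum256(lockFile)
	return fmt.Sprintf("atmos-%s-%s-%s", goos, goarch, hex.EncodeToString(sum[:8]))
}

func main() {
	// Placeholder bytes standing in for the contents of toolchain.lock.yaml.
	lock := []byte("tools:\n  opentofu: 1.9.0\n")
	fmt.Println(cacheKey(lock, runtime.GOOS, runtime.GOARCH))
}
```

Since cache entries are write-once, a content-derived key like this means a changed lock file naturally produces a new entry instead of requiring an overwrite.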
references
- PRD: docs/prd/native-ci/framework/ci-cache.md
- Docs: /cli/commands/ci/cache and /cli/configuration/ci/cache
- GitHub Actions Cache Service v2 (the store actions/cache uses)
🚀 Enhancements
Warn on explicit version constraint overrides Erik Osterman (Cloud Posse) (@osterman) (#2670)
what
- Downgrade version constraint failures to structured log.Warn messages when an explicit version override is present.
- Detect overrides from --use-version, ATMOS_VERSION_USE, ATMOS_USE_VERSION, and ATMOS_VERSION, while keeping config-only version.use enforcement unchanged.
- Preserve fatal errors for invalid constraint syntax and add coverage for non-semver override binaries like test.
why
- --use-version ref:* can re-exec into unreleased binaries that report version.Version == "test", which previously failed constraint validation before the requested command could run.
- Explicit overrides are intentional, so Atmos should continue with a warning that explains the bypass instead of enforcing the configured constraint.
references
- Closes #2668
fix(validate): dogfood `atmos validate stacks` for example YAML; fix nil-map crash Erik Osterman (Cloud Posse) (@osterman) (#2666)
what
- Replace the deprecated third-party InoUno/yaml-ls-check GitHub Action in the [validate] CI matrix with Atmos itself, running atmos validate stacks --schemas-atmos-manifest <in-repo schema> against each example.
- Expand the matrix to also validate three previously-excluded function-using examples (custom-components, sops-secrets, onepassword-secrets), which now pass because Atmos understands its own YAML tags natively.
- Fix a crash in atmos validate stacks --schemas-atmos-manifest: it panicked with assignment to entry in nil map when the target atmos.yaml had no schemas: section. Added a lazy-initializing SetSchemaRegistry setter and a regression test.
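The nil-map crash and its lazy-initializing fix follow a common Go pattern, sketched here. The Config type and the setter's signature are hypothetical; only the name SetSchemaRegistry and the "assignment to entry in nil map" panic come from the notes.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the loaded atmos.yaml. Schemas stays
// nil when the file has no schemas: section.
type Config struct {
	Schemas map[string]string
}

// SetSchemaRegistry allocates the map on first use, so writing to a config
// without a schemas: section no longer panics.
func (c *Config) SetSchemaRegistry(key, value string) {
	if c.Schemas == nil {
		c.Schemas = make(map[string]string)
	}
	c.Schemas[key] = value
}

func main() {
	cfg := &Config{} // Schemas is nil here
	// A direct write, cfg.Schemas["atmos"] = "...", would panic at this point.
	cfg.SetSchemaRegistry("atmos", "schemas/atmos-manifest.json")
	fmt.Println(len(cfg.Schemas)) // 1
}
```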
why
- The third-party action targets Node 20 and is force-run on Node 24, emitting deprecation warnings across every [validate] job; it also can't parse Atmos YAML tags, which forced many examples to be excluded from validation.
- atmos validate stacks is a strict superset of the old static check (YAML syntax + manifest JSON Schema + import resolution + duplicate-component detection) and parses Atmos tags natively, giving better coverage with no external dependency. Pointing --schemas-atmos-manifest at the in-repo schema lets a PR add a schema field and use it in an example in the same change.
- Dogfooding immediately surfaced and fixed a real user-visible crash in the validate command.
references
- quick-start-advanced and native-terraform are intentionally left out of the matrix (documented inline): the former's stacks/workflows/*.yaml uses newer workflow step types the manifest schema doesn't describe yet, and the latter intentionally configures no stacks.name_pattern.
fix(hooks): store-output hooks inherit the run's default identity Andriy Knysh (@aknysh) (#2662)
what
- Make the terraform after-apply store-outputs hook path inherit the run's auto-detected identity for stores that don't declare their own identity, matching the main terraform path.
- Add a new internal/exec.HookStoreDefaultIdentity helper (auto-detect the active identity from the auth manager's chain, normalize empty/select/disabled to ""); cmd/terraform's injectHookStoreAuthResolver now calls SetAuthContextResolverWithDefaultIdentity instead of the resolver-only variant.
- Fix an adjacent bug: pkg/store.defaultIdentityForStore was missing *SecretsManagerStore (aws/asm), so AWS Secrets Manager stores never inherited a default identity on any path. Added the case so aws/asm behaves like aws/ssm.
- Tests: internal/exec.TestHookStoreDefaultIdentity (new), cmd/terraform TestInjectHookStoreAuthResolver_InheritsDefaultIdentity (replaces …_ResolverOnly), updated pkg/store default-identity test so identity-less aws/asm asserts inheritance, and Floci E2E TestAWSStoreHooks_InheritedIdentity_FlociE2E with fixture aws-store-hooks-floci-inherit.
- Fix doc: docs/fixes/2026-06-27-store-hook-inherit-default-identity.md.
why
- Hook fix. Under Atmos auth, atmos terraform apply on a component with a store-outputs hook applied successfully but then failed in the hook when the target store had no identity:

      INFO Running hooks event=after.terraform.apply status=success
      ✓ Fetching <output> from <component> in <stack>
      Error: failed to assume write role: … get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, … ec2imds: GetMetadata …

  Hooks run in a freshly-loaded config, so the apply-phase store registry (and its injected default identity) is gone. The hook re-injected the resolver but no default identity, so identity-less stores fell back to the default AWS SDK credential chain, which is empty under Atmos auth (credentials live in the keyring, not the environment), and dropped to EC2 IMDS. The main terraform path and !store reads already inherit the run's identity; this removes a surprising asymmetry and completes the follow-up explicitly deferred in #2625 ("Component-identity inheritance for identity-less stores is intentionally left for a follow-up design decision").
- ASM fix. defaultIdentityForStore handled *SSMStore, *AzureKeyVaultStore, and *GSMStore but not *SecretsManagerStore, so aws/asm stores without an explicit identity could never inherit one. This was latent before (and was even codified by the old test); the hook fix's E2E surfaced it.
- Backward compatible. HookStoreDefaultIdentity returns "" whenever no identity is resolved (no auth manager, or empty/select/disabled), and SetAuthContextResolverWithDefaultIdentity("") is a no-op for the default, so runs without Atmos auth keep their prior ambient/default-SDK credential behavior, and stores with an explicit identity are never overridden.
references
- Follow-up to #2625 (AWS stores/secrets auth; deferred identity-less inheritance in the hook path).
- Related fix docs: docs/fixes/2026-06-17-aws-stores-secrets-auth-and-gists.md, docs/fixes/2026-05-25-store-hook-missing-backend-role-assumption.md.
fix(secret): skip remote-state reads in credential-free secret list Brian Ojeda (@sgtoj) (#2657)
what
- Make credential-free atmos secret list skip the YAML functions that perform authenticated backend reads (!terraform.state, !terraform.output, !store, !store.get) while it enumerates secret declarations.
- Add a credentialFreeSkip() helper and use it in the two credential-free paths: secret list -s <stack> enumeration and the single-scope secret list -s <stack> -c <component> path without --verify.
- Authenticated paths (get/set/exec/shell, and secret list --verify) are unchanged; they keep skipping only !secret.
- Adds TestCredentialFreeSkip pinning the skip set and a docs/fixes write-up.
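A rough Go sketch of the skip set follows. The helper name credentialFreeSkip comes from the notes, but its return type, the authenticatedSkip variable, the isSkipped matcher, and the choice to store tags with a leading "!" are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// authenticatedSkip is what the authenticated paths skip: only !secret.
var authenticatedSkip = []string{"!secret"}

// credentialFreeSkip adds the functions that perform authenticated backend
// reads on top of the base skip set.
func credentialFreeSkip() []string {
	skip := []string{"!terraform.state", "!terraform.output", "!store", "!store.get"}
	return append(skip, authenticatedSkip...)
}

// isSkipped reports whether a raw YAML value starts with a skipped tag.
// A skipped value is left in place as its raw string.
func isSkipped(value string, skip []string) bool {
	tag, _, _ := strings.Cut(strings.TrimSpace(value), " ")
	for _, s := range skip {
		if tag == s {
			return true
		}
	}
	return false
}

func main() {
	raw := "!terraform.state vpc prod vpc_id"
	fmt.Println(isSkipped(raw, credentialFreeSkip())) // true
	fmt.Println(isSkipped(raw, authenticatedSkip))    // false
}
```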
why
- Secret listing is intentionally credential-free: it disables authentication so a large stack doesn't run one full auth cycle per component. But it still evaluated !terraform.state / !terraform.output / !store in component vars/settings. With auth disabled, the S3 backend assumes its configured role with no base credentials, the AWS SDK falls back to the default credential chain, and ultimately dials the EC2 IMDS endpoint (unreachable on a workstation), so the command aborts with a confusing assume-role/credentials error even immediately after a successful atmos auth login.
- Listing only needs the static secrets.vars declarations (secrets.ExtractDeclarations), which never require a resolved value. Evaluating these functions was unnecessary and failure-prone. A skipped function leaves its raw string in place, which the declaration extractor ignores, so discovery is unchanged.
- This is a regression: before credential-free enumeration was introduced, secret list authenticated per component, so these reads had credentials (slow, but working). Disabling auth removed the credentials without removing the reads.
references
- Related to #2639 (originally reported against atmos secret list).
- Follow-up to #2646, which made secret-list enumeration credential-free but left the credentialed function evaluation in place.
- Write-up: docs/fixes/2026-06-23-secret-list-credential-free-skip.md
- Verified with go test ./cmd/secret/... and the repo's custom-gcl lint (both green), and end-to-end against a multi-account repo whose components reference cross-account !terraform.state: secret list -s <stack> aborted before, completes after (no state reads, no credential-resolution fallbacks).
fix(auth): retry transient auth on freshly-brokered STS git clones Erik Osterman (Cloud Posse) (@osterman) (#2653)
what
- Retry transient git authentication failures within a bounded window (default 30s, exponential backoff + jitter) only when Atmos brokered a fresh GitHub STS token in this process, wired through a new broker.HasBrokeredCredentials() signal and a CustomGitGetter.RetryAuthErrors flag (existing per-source retry: config still takes precedence).
- Keep auth failures terminal (fail fast) for non-brokered/static-credential clones, so a genuinely wrong or expired token is never masked by retries.
- Surface previously-swallowed credential-broker failures at Warn (was Debug, invisible at the default Warning log level) and log an actionable Error when the brokered-auth retry window is exhausted.
- Add tests: brokered retry succeeds, non-brokered fails fast, bounded-budget exhaustion, and a -race concurrency guard proving EnsureCredentials provisions exactly once with a happens-before barrier.
why
- Under Atmos Pro cross-repo STS, atmos vendor pull intermittently failed with fatal: Authentication failed even though the same run logged a successful token mint and OIDC auth; a subset of clones failed and a rerun was clean.
- Root cause is GitHub's brief post-creation 401 window: a just-minted installation token is not yet valid across all of GitHub's git frontends. The atmos-pro server already self-heals its own API calls on this 401 (Sentry APP-CLOUDPOSSE-COM-AM2), but the CLI git path did not: isRetryableGitError treated auth as terminal and vendor sources have no retry by default, so the earliest clones failed hard.
- This gives the CLI the same tolerance the server has, scoped narrowly to brokered tokens so static credentials still fail fast, and removes the observability gap that made the failure hard to diagnose.
references
- Token TTL is GitHub's standard ~60 min (confirmed in atmos-pro mint.ts / token-provider), ruling out mid-run expiry; the post-mint propagation window is the cause.
- Follow-up (out of scope): revoke_on_exit cross-process token race on the shared, unlocked github/sts state.json.
perf(stacks): dedupe per-identity auth in nested terraform.state resolution Brian Ojeda (@sgtoj) (#2656)
what
- Extends the per-component AuthManager memoization introduced in #2652 from the top-level describe stacks pass into the nested resolution path that runs while templates and YAML functions are evaluated (!terraform.state, !terraform.output, atmos.Component(...)).
- Adds a process-scoped nestedAuthManagerCache, consulted by resolveAuthManagerForNestedComponent, keyed by the parent auth chain + a deterministic JSON fingerprint of the component's auth section.
- Extracts the key logic into a shared buildComponentAuthCacheKey used by both the processor cache (#2652) and this nested path, so the two keying strategies cannot drift.
- Caches only successful, non-nil resolutions; ResetStateCache() also clears the new cache (kept consistent with the terraformStateCache it pairs with). Neither is reset in production.
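A possible shape for the cache key is sketched below. The function name buildComponentAuthCacheKey is taken from the notes, but its signature, the use of SHA-256, and the "chain|hash" layout are assumptions. The determinism relies on encoding/json emitting map keys in sorted order.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// buildComponentAuthCacheKey combines the parent auth chain with a JSON
// fingerprint of the component's auth section. The second result is false
// when the section cannot be serialized, in which case nothing is cached.
func buildComponentAuthCacheKey(parentChain []string, authSection map[string]any) (string, bool) {
	raw, err := json.Marshal(authSection) // map keys are emitted sorted
	if err != nil {
		return "", false
	}
	sum := sha256.Sum256(raw)
	return strings.Join(parentChain, "/") + "|" + hex.EncodeToString(sum[:]), true
}

func main() {
	a := map[string]any{"identity": "deployer", "default": true}
	b := map[string]any{"default": true, "identity": "deployer"}

	k1, _ := buildComponentAuthCacheKey([]string{"sso", "admin"}, a)
	k2, _ := buildComponentAuthCacheKey([]string{"sso", "admin"}, b)
	fmt.Println(k1 == k2) // true
}
```

Including the parent chain in the key keeps components with identical auth sections but different parents from sharing a manager, which matches the "parent-chain isolation" case in the test plan.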
why
- #2652 deduped per-component auth at the top level, but a component that references another component via !terraform.state still ran a full auth cycle (credential writes, file locks, keyring rebuilds) once per distinct target, even when every target resolves to the same identity. terraformStateCache only short-circuits a repeat read of the same target, not distinct targets that share an identity.
- The result was the same N-auth blowup #2652 removed, just relocated into template/YAML resolution. Memoizing by identity removes it.
Measured — atmos describe stacks -s <stack> on a large real-world stack (credentials provided via auth exec, 45s cap; only the binary under test varies):
| build | wall time | per-component auth cycles |
|---|---|---|
| latest release | DNF (>45s) | — |
| main (includes #2652) | ~17–19s | 44 |
| this PR (#2652 + nested dedup) | ~10–11s | 5 |
Output was verified equivalent to main: the remaining run-to-run differences are pre-existing auto_provision_workdir_for_outputs / tofu output provisioning nondeterminism present on both builds (same identity resolved throughout, no new errors). A matched-output pair differed by fewer lines than the main-vs-main noise floor.
The nested path is shared by describe affected, list, and terraform --all/--query. On a large multi-component change, the full per-identity auth cycles during describe affected likewise drop from scaling with the number of resolved components to roughly one-per-identity, with the rest served from the cache.
test plan
- go build ./... && go test ./internal/exec/... (new unit tests cover key behavior, dedupe-by-identity, parent-chain isolation, errors-not-cached, unserializable-section-not-cached, and the ResetStateCache coupling).
- custom-gcl run --new-from-rev=main → 0 issues.
- Real-repo benchmark above.
references
fix(describe): respect metadata.enabled when evaluating component functions Brian Ojeda (@sgtoj) (#2655)
what
- Respect metadata.enabled when the shared describe pipeline (describe affected, describe stacks, list) evaluates a component's functions:
  - !terraform.state / !terraform.output are skipped for components disabled via metadata.enabled: false; the raw function string is left unresolved (no backend read).
  - atmos.Component(...) returns empty sections (including an empty outputs) when the enclosing component is disabled: no describe, no state read, and template-safe (.outputs.x / .vars.x evaluate to nil instead of erroring).
- The gate keys strictly on metadata.enabled (via the existing isComponentEnabled), independent of vars.enabled.
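The gate's truth table (nil or absent metadata means enabled, vars.enabled: false alone means enabled, metadata.enabled: false means disabled) can be expressed in a few lines of Go. The function below borrows the name enclosingComponentDisabled from the test plan, but its signature and the map-based component representation are assumptions.

```go
package main

import "fmt"

// enclosingComponentDisabled reports true only when metadata.enabled is
// explicitly false. vars.enabled is intentionally not consulted.
func enclosingComponentDisabled(section map[string]any) bool {
	metadata, ok := section["metadata"].(map[string]any)
	if !ok {
		return false // nil or absent metadata means enabled
	}
	enabled, ok := metadata["enabled"].(bool)
	return ok && !enabled
}

func main() {
	disabled := map[string]any{"metadata": map[string]any{"enabled": false}}
	varsOnly := map[string]any{"vars": map[string]any{"enabled": false}}

	fmt.Println(enclosingComponentDisabled(disabled)) // true
	fmt.Println(enclosingComponentDisabled(varsOnly)) // false
}
```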
why
describe affected describes the current and base stacks and evaluates every component's templates/YAML functions with no metadata.enabled gate. A component disabled with metadata.enabled: false that references an unprovisioned component's state therefore failed hard with terraform state not provisioned, even though disabled components are (correctly) excluded from the final affected list. The enabled-aware filters (shouldSkipComponent, FilterAbstractComponents) only run when assembling that list, after the describe phase has already failed.
Fixes #2654.
references
- Fixes #2654
- Design notes: docs/fixes/2026-06-22-describe-respect-metadata-enabled.md
test plan
- Unit tests: disabledComponentTerraformSkip (adds the terraform funcs, clones the base skip), enclosingComponentDisabled (nil/absent metadata ⇒ enabled; vars.enabled:false alone ⇒ enabled; metadata.enabled:false ⇒ disabled), componentFunc returns empty sections for a disabled enclosing component, and an end-to-end processComponentEntry test (disabled ⇒ !terraform.state not resolved; enabled / vars.enabled:false-only ⇒ resolved).
- go build ./..., go vet ./internal/exec/..., and custom-gcl run --new-from-rev=main (0 issues).

Note: the TestDescribeAffected* integration tests are environment-sensitive and fail identically on a clean main checkout locally (macOS); they are unrelated to this change. CI (Linux) is authoritative.
fix(stacks): scope and cache per-component auth in describe stacks Brian Ojeda (@sgtoj) (#2652)
what
- Move the stack and component filters above resolveComponentAuthManager in processComponentEntry so only in-scope components authenticate (auth still precedes BuildTerraformWorkspace, template, and YAML-function processing).
- Add a pass-scoped auth cache keyed by the parent chain + a deterministic JSON fingerprint of the component auth section, so components that share an auth section reuse one authenticated manager.
- Regression tests: out-of-scope skip + cache reuse.
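A minimal sketch of the caching idea, assuming the fingerprint is a SHA-256 over the JSON encoding of the auth section. The types authManager and authCache, the key layout, and the choice not to store failures are illustrative assumptions, not the code in this PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// authManager is a placeholder for an authenticated manager.
type authManager struct{ identity string }

// authCache lives for a single describe pass (hypothetical type).
type authCache struct {
	entries map[string]*authManager
}

func newAuthCache() *authCache {
	return &authCache{entries: map[string]*authManager{}}
}

// cacheKey joins the parent chain with a SHA-256 fingerprint of the auth
// section. json.Marshal emits map keys in sorted order, so equal sections
// produce equal fingerprints regardless of insertion order.
func cacheKey(parentChain string, authSection map[string]any) (string, error) {
	raw, err := json.Marshal(authSection)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return parentChain + "|" + hex.EncodeToString(sum[:]), nil
}

// get returns the cached manager for the key, or builds and stores one.
func (c *authCache) get(parentChain string, authSection map[string]any,
	build func() (*authManager, error)) (*authManager, error) {
	key, err := cacheKey(parentChain, authSection)
	if err != nil {
		return build() // cannot fingerprint: authenticate without caching
	}
	if m, ok := c.entries[key]; ok {
		return m, nil
	}
	m, err := build()
	if err != nil {
		return nil, err // assumption: failures are not stored
	}
	c.entries[key] = m
	return m, nil
}

func main() {
	cache := newAuthCache()
	builds := 0
	build := func() (*authManager, error) {
		builds++
		return &authManager{identity: "dev"}, nil
	}
	section := map[string]any{"identity": "dev", "default": true}
	cache.get("root", section, build)
	cache.get("root", section, build)
	fmt.Println("authentications:", builds)
}
```

Including the parent chain in the key keeps components that share an auth section but sit under different parents from sharing a manager.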
why
Any auth-enabled ExecuteDescribeStacks caller — atmos describe stacks, atmos list values/instances, atmos terraform --all/--query — resolves per-component auth before the stack/component filters and never reuses it. On a multi-stack repo where components declare their own default: true identity, atmos describe stacks -s <stack> authenticates components in other stacks before discarding them, and re-authenticates each same-identity component from scratch — so the command effectively hangs.
Per-component auth exists only to populate info.AuthContext for that component's later template (atmos.Component(...)) and YAML-function (!terraform.state, !terraform.output) processing, which is skipped for filtered-out components — so authenticating them is wasted work.
#2646 fixed atmos secret list by disabling per-component auth for that command; it did not touch the shared processor, so every other caller still hits this.
Measured with the identical command atmos describe stacks -s <stack> --logs-level Debug under a 45s budget, only the atmos binary varying:
| binary | result |
|---|---|
| latest release (v1.221.1) | did not complete within 45s (authenticating mostly out-of-scope stacks) |
| current main (aa68d85be) | did not complete within 45s |
| this PR | completed in ~18s |
With the fix, in-scope processor-path authentications drop to 2 and out-of-scope ones to zero (the ~42 remaining auths are legitimate nested !terraform.output / atmos.Component reads).
references
fix(secrets): fast, credential-free atmos secret list Erik Osterman (Cloud Posse) (@osterman) (#2646)
what
- Make atmos secret list require no authenticated identity and never decrypt; it only reports whether each secret is set. On a 72-component stack, listing drops from ~21–34s (it previously authenticated every component and decrypted every secret) to effectively instant.
- Disable per-component authentication during secret-list stack enumeration.
- Resolve SOPS initialization status from the file's cleartext key names: no age key, no decryption.
- Rewrite every store's existence check (Has) to a non-decrypting metadata API: SSM GetParameter with WithDecryption=false, Secrets Manager DescribeSecret, GCP GetSecretVersion, Azure Key Vault properties pager, Vault KV metadata read, and a no-reveal 1Password check.
- Add a tri-state STATUS (initialized/missing/unknown) plus a new --verify flag: remote-store status shows unknown by default; --verify contacts backends with a read/describe identity (never a decrypt identity) on a fully-scoped target.
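The SOPS case works because SOPS encrypts values but leaves key names in cleartext. The sketch below illustrates a decryption-free existence check under several assumptions: a JSON-format SOPS file, top-level secret names, and a hypothetical sopsStatus helper that maps an unreadable file to unknown. None of these details are taken from the Atmos code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type status string

const (
	statusInitialized status = "initialized"
	statusMissing     status = "missing"
	statusUnknown     status = "unknown"
)

// sopsStatus inspects only the cleartext key names of a JSON-format SOPS file.
// Values stay as opaque ENC[...] strings and are never decrypted.
func sopsStatus(file []byte, name string) status {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(file, &doc); err != nil {
		return statusUnknown // assumption: unreadable file means undetermined
	}
	if name == "sops" {
		return statusMissing // the "sops" key holds SOPS metadata, not a secret
	}
	if _, ok := doc[name]; ok {
		return statusInitialized
	}
	return statusMissing
}

func main() {
	// Placeholder content; real files carry full ciphertext and metadata.
	file := []byte(`{
  "db_password": "ENC[AES256_GCM,data:placeholder,type:str]",
  "sops": {"version": "placeholder"}
}`)
	for _, name := range []string{"db_password", "api_token"} {
		fmt.Printf("%s: %s\n", name, sopsStatus(file, name))
	}
}
```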
why
- Listing is introspection: it shows which secrets are declared and whether they exist, and never needs a plaintext value, so it should not force authentication or decryption (or require kms:Decrypt-style permissions).
- The old path authenticated per component and fetched+decrypted every secret just to populate the status column, making secret list slow (and prone to hanging) on real-world stacks and unusable without cloud credentials.
- Existence on a remote store still needs a read credential, so those rows now default to unknown (credential-free) and opt into a real check via --verify, while local backends (SOPS) always show accurate status for free.
references
fix(secrets): SOPS cloud-KMS secrets authenticate via Atmos identity Erik Osterman (Cloud Posse) (@osterman) (#2643)
what
- atmos secret and !secret (during terraform plan) against a SOPS cloud-KMS backend now authenticate using the Atmos identity (--identity / ATMOS_IDENTITY, the per-provider secrets.providers.<name>.identity, or the stack/component effective identity) instead of requiring ambient cloud credentials.
- The cloud is inferred from the SOPS file's master-key type at runtime (AWS KMS / GCP KMS / Azure Key Vault); there is no per-cloud kind. Credentials are injected into the in-process getsops encrypt/decrypt via ApplyToMasterKey (no process-environment mutation).
- Refactors SOPS into its own package pkg/secrets/providers/sops/ with a registry of per-cloud key handlers (aws.go/gcp.go/azure.go); the cloud-SDK credential building lives in the depguard-exempt pkg/store/sopsauth/ bridge so the SOPS package imports no cloud SDK directly.
- Threads the auth resolver + effective identity to the provider via a transient AtmosConfiguration.SecretsAuth seam, populated in both the atmos secret and terraform code paths.
- Fixes the SOPS decrypt error to emit identity/permission hints for cloud-KMS files (derived from the file's actual key types) instead of the misleading age-key hint.
why
- Resolves #2637: the documented secrets.providers.<name>.identity field and --identity were silently ignored for the SOPS cloud-KMS track, forcing every command to be wrapped in atmos auth exec even though Track-1 stores (SSM/ASM/Key Vault/Secret Manager) already authenticated via the identity.
- The fix is additive and backward compatible: with no resolvable identity, the SOPS provider falls back to the ambient credential chain exactly as before. kind remains only for the legitimate age-vs-KMS keygen distinction.
- Covered by a Floci KMS end-to-end test (ambient AWS creds cleared, identity-only secret set/get: the exact #2637 scenario, RED before this change) plus unit tests for key-service selection, per-cloud registry dispatch, identity precedence, and kind-aware error hints.
references
- Closes #2637
- docs/fixes/2026-06-20-sops-cloud-kms-identity.md (root cause, fix, and full backend audit)
Add Atmos media kit and CI branding Erik Osterman (Cloud Posse) (@osterman) (#2636)
what
- Add an Atmos media kit page, blog announcement, brandkit redirect, and generated ZIP download workflow.
- Add logo, wordmark, animated gradient, Atmos CI, and Atmos AI SVG variants for light and dark surfaces.
- Update native Terraform CI summaries and fixtures to use the Atmos CI lockup linking to https://atmos.tools/ci.
why
- Provide a canonical source for Atmos brand assets and usage guidance.
- Align CI summary branding with Atmos instead of the Cloud Posse logo.
- Keep animated treatment assets downloadable and consistent across docs, media kit, and CI output.
references
- Validation: go test ./pkg/ci/plugins/terraform
- Validation: pnpm exec docusaurus build
Fix AWS store auth and add Floci E2E coverage Erik Osterman (Cloud Posse) (@osterman) (#2625)
what
- Fix AWS SSM/Secrets Manager store auth during hooks, describe, and secret commands, including inherited identities and secret-store access enforcement.
- Make slash kind notation canonical, add AWS store/secrets gists, document the fix, and add custom endpoint support for AWS, GCP Secret Manager, and Azure Key Vault.
- Add opt-in Floci E2E tests and CI coverage for AWS, GCP, and Azure store/secrets workflows.
why
- The reported SSM hook workflow could fall back to ambient AWS credentials or fail with a missing auth resolver even when the Terraform identity was valid.
- The feature needed full-circle examples plus emulator-backed regression coverage so AWS stores and declared secrets stay working across providers.
references
- No issue linked.
Fix use-version before command resolution Erik Osterman (Cloud Posse) (@osterman) (#2629)
what
- Run explicit --use-version / ATMOS_USE_VERSION re-exec before Cobra resolves subcommands.
- Add regression coverage for env var, --use-version=..., and --use-version ... forms with commands unknown to the current binary.
- Add a few unrelated, test-only coverage improvements to satisfy Codecov; these do not change production behavior.
why
- Cobra rejected newly added commands before PersistentPreRun could switch Atmos versions.
- This restores the workflow for testing new commands from ref:, sha:, and PR Atmos builds.
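One way to detect the requested version before any command parser runs is to scan the raw arguments directly. The sketch below covers the three forms named above. findUseVersion is a hypothetical helper; its flag-over-environment precedence and its handling of "--" are assumptions, not documented Atmos behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// findUseVersion scans raw arguments for --use-version before any command
// parser sees them, so subcommands unknown to this binary cannot cause a
// rejection first. env is the value of ATMOS_USE_VERSION (may be empty).
func findUseVersion(args []string, env string) string {
	for i, arg := range args {
		if arg == "--" {
			break // assumption: arguments after "--" are not Atmos flags
		}
		if v, ok := strings.CutPrefix(arg, "--use-version="); ok {
			return v
		}
		if arg == "--use-version" && i+1 < len(args) {
			return args[i+1]
		}
	}
	return env
}

func main() {
	fmt.Println(findUseVersion([]string{"newcmd", "--use-version=1.2.3"}, ""))
	fmt.Println(findUseVersion([]string{"--use-version", "sha:abc123", "newcmd"}, ""))
	fmt.Println(findUseVersion([]string{"newcmd"}, "ref:main"))
}
```

A caller would re-exec the resolved version when the result is non-empty and only then hand the arguments to the command tree.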
references
- Closes #2624
- Tested with go test ./cmd -run 'UseVersion|UnknownSubcommand|ParseUseVersion' and go test ./pkg/version -run 'CheckAndReexec|UseVersion|RefVersion'.
fix(toolchain): harden cosign verifier bootstrap Erik Osterman (Cloud Posse) (@osterman) (#2627)
what
- Keep verifier bootstrap version resolution latest-first, using the existing authenticated GitHub/Aqua lookup path.
- Add a sigstore/cosign@v3.0.6 fallback only when latest-version lookup fails.
- Add Renovate regex-manager coverage for the fallback cosign version so the safety pin is updateable.
- Update installer tests to prove latest wins when available, cosign falls back when lookup fails, and non-pinned verifier lookup errors still surface.
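The latest-first policy with a pinned safety net can be sketched as follows. resolveVerifierVersion, errLookup, and the lookup callback are hypothetical names for illustration; only the v3.0.6 pin and the three behaviors (latest wins, cosign falls back, other tools surface the error) come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// fallbackCosignVersion mirrors the pinned safety version named above.
const fallbackCosignVersion = "v3.0.6"

// errLookup simulates a failed latest-version lookup in this sketch.
var errLookup = errors.New("latest-version lookup failed")

// resolveVerifierVersion prefers the latest release and uses the pin only
// when the lookup fails for cosign; other tools surface the lookup error.
func resolveVerifierVersion(tool string, lookupLatest func(string) (string, error)) (string, error) {
	version, err := lookupLatest(tool)
	if err == nil {
		return version, nil
	}
	if tool == "sigstore/cosign" {
		return fallbackCosignVersion, nil
	}
	return "", fmt.Errorf("resolve latest version of %s: %w", tool, err)
}

func main() {
	offline := func(string) (string, error) { return "", errLookup }
	version, err := resolveVerifierVersion("sigstore/cosign", offline)
	fmt.Println(version, err)
	_, err = resolveVerifierVersion("example/other-verifier", offline)
	fmt.Println(err)
}
```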
why
- Prevent OpenTofu toolchain installs from failing when cosign auto-install hits a slow or unavailable GitHub releases API.
- Avoid making the fallback version the default forever; normal installs still use the latest resolved cosign release when GitHub lookup succeeds.
- Preserve existing escape hatches: existing cosign on PATH still wins, and verifier_install: path_only still disables auto-install.
references
- Failing run: https://github.com/cloudposse/atmos/actions/runs/27661641040/job/81808473011
- Fallback cosign release: https://github.com/sigstore/cosign/releases/tag/v3.0.6
test: stabilize Terraform cache coverage Erik Osterman (Cloud Posse) (@osterman) (#2620)
what
- Add environment overrides for components.terraform.cache.enabled and components.terraform.cache.location, plus docs in the Terraform config and environment variable references.
- Add focused registry-cache coverage, including Windows-safe trust command unit tests and a non-golden acceptance test with an isolated cache location.
- Stabilize acceptance CI provider reuse with a process-level TF_PLUGIN_CACHE_DIR under the Atmos XDG cache root, and bump the CI cache key so actions/cache saves a fresh provider-plugin cache.
why
- The native registry cache should be testable on Windows only after its loopback certificate is trusted, but it should not be enabled globally where cold/warm cache state can flip snapshots or screenshots.
- Windows timeout mitigation should use Terraform’s provider plugin cache, which avoids the native cache proxy TLS trust problem.
- The new environment overrides make targeted cache dogfooding possible without editing shared fixture atmos.yaml files.
references
- Related context from #2607.
- Validated with go test ./pkg/config -run 'TestViperBindEnv_.*Cache', go test ./pkg/terraform/cache -run 'Test.*Trust|Test.*Windows', go test ./tests -run TestTerraformRegistryCache -timeout 10m, and git diff --check.
refactor(utils): drop dead helpers and hand-rolled SliceContainsString Erik Osterman (Cloud Posse) (@osterman) (#2608)
what
- Replace the hand-rolled SliceContainsString/SliceContainsStringHasPrefix/SliceContainsStringStartsWith helpers with stdlib slices.Contains / slices.ContainsFunc across ~39 call sites, and remove the helpers from pkg/utils.
- Delete nine dead exported functions that had zero callers anywhere: ExtractAtmosConfig, GetGitHubRepoReleases, GetGitHubReleaseByTag, GetGitHubLatestRelease, PrintAsHcl, NewHighlightWriter (plus the now-orphaned HighlightWriter type/method), GetAtmosConfigJSON, PrintAsJSONToFileDescriptor, and PrintAsYAMLWithConfig, including the now-empty config_utils.go and cascaded-unused imports/aliases.
- Convert two depends_on dynamic errors in stack_utils.go to wrapped static errors (ErrDependencyResolution); their messages now carry a dependency resolution failed: prefix.
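For readers unfamiliar with the stdlib replacements (Go 1.21+), the swap looks roughly like this. anyHasPrefix is a hypothetical wrapper shown only to illustrate slices.ContainsFunc; the exact semantics of the deleted prefix helpers are not stated in this PR and are assumed here.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// anyHasPrefix reports whether any element of items starts with prefix.
// It shows how a prefix-matching helper reduces to slices.ContainsFunc.
func anyHasPrefix(items []string, prefix string) bool {
	return slices.ContainsFunc(items, func(item string) bool {
		return strings.HasPrefix(item, prefix)
	})
}

func main() {
	stacks := []string{"plat-ue2-dev", "plat-ue2-prod"}
	// Exact membership needs no helper at all.
	fmt.Println(slices.Contains(stacks, "plat-ue2-dev"))
	fmt.Println(anyHasPrefix(stacks, "plat-uw2"))
}
```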
why
- First step in dismantling pkg/utils, one of the repo's historical "dumping grounds"; CLAUDE.md already forbids adding to it, so this begins draining it.
- slices.Contains performs the same O(n) scan as the deleted helper (the hot path in yaml_utils.go already uses an O(1) map), so there is no behavior or performance change from the swap; it also drops a per-call perf.Track defer.
- The static-error conversion satisfies the err113 lint gate after a flagged if/else chain was restructured into early returns, and aligns with the mandatory static-error policy.
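The static-error pattern that err113 enforces can be sketched as below. ErrDependencyResolution and its message prefix come from the description above; resolveDependency, isDependencyResolutionError, and the detail text are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDependencyResolution is a static sentinel; callers match it with
// errors.Is instead of comparing message strings.
var ErrDependencyResolution = errors.New("dependency resolution failed")

// resolveDependency is a hypothetical lookup that wraps the sentinel with
// dynamic detail, so the message starts with the sentinel's text.
func resolveDependency(known map[string]bool, name string) error {
	if !known[name] {
		return fmt.Errorf("%w: component %q not found", ErrDependencyResolution, name)
	}
	return nil
}

// isDependencyResolutionError shows the caller-side check.
func isDependencyResolutionError(err error) bool {
	return errors.Is(err, ErrDependencyResolution)
}

func main() {
	err := resolveDependency(map[string]bool{"vpc": true}, "eks")
	fmt.Println(err)
	fmt.Println(isDependencyResolutionError(err))
}
```

Wrapping with %w keeps the dynamic detail in the message while leaving the error identity stable for callers and tests.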
references
- Internal cleanup; no issue. Follow-up PRs will relocate the remaining pkg/utils files into purpose-built packages (pkg/yaml, pkg/filesystem, pkg/data, etc.).
fix(terraform): restore init + workspace in terraform shell, add --skip-init Erik Osterman (Cloud Posse) (@osterman) (#2616)
what
- Restore terraform init and Terraform workspace select/new to atmos terraform shell so the interactive shell again starts in an initialized component and the correct workspace (not default).
- Extract a provisioner-free executeTerraformInitCommand from executeTerraformInitPhase so the shell can run init without re-firing the before.terraform.init provisioners it already runs (no double execution). Main ExecuteTerraform pipeline behavior is unchanged.
- The shell now resolves the terraform/tofu binary (and toolchain), generates backend/provider-override files, and assembles the full component environment before launching, matching the shared pipeline.
- Add a --skip-init opt-out to atmos terraform shell (reuses the existing terraform flag; no new flag definition). Workspace selection stays governed by workspaces_enabled.
- Add regression tests for the init → workspace → shell ordering, the --skip-init decoupling, and the shell's shouldRunTerraformInit / shouldSkipWorkspaceSetup contract; document --skip-init in the command docs.
why
- This was an accidental regression introduced in v1.202.0 by #1813, which migrated terraform shell to a standalone ExecuteTerraformShell and silently dropped the init + workspace steps that the shared ExecuteTerraform pipeline used to run.
- The result contradicted the published docs (which promise the command generates a backend file and creates the component's workspace) and forced users to pin old versions.
- --with-secrets behavior is preserved: secrets are still kept out of the on-disk varfile and withheld from the shell unless explicitly requested.
references
- Regression introduced in #1813 (first released in v1.202.0).
fix(templates): honor ignore_missing_template_values for stack name_template (#2345) Andriy Knysh (@aknysh) (#2619)
what
- Route the global templates.settings.ignore_missing_template_values flag into every stack name_template rendering site. Previously all 11 name-template ProcessTmpl(...) call sites passed a hardcoded false, so the flag was silently ignored for name-template rendering.
- Sites updated: atlantis stack name, EKS cluster name, spacelift admin/stack name (describe affected), describe locals name, spacelift utils, terraform workspace, terraform generate backends/varfiles, the shared name-template util, and validate stacks.
- Add TestBuildTerraformWorkspace_IgnoreMissingTemplateValues asserting both directions (flag off → error; flag on → <no value>).
- Incidental cleanup: gofumpt reformatting of two adjacent pre-existing fmt.Errorf calls in stack_utils.go exposed err113 debt under golangci-lint --new-from-rev=origin/main. Converted them to the mandated static-wrapped-error pattern (new sentinels ErrInvalidDependsOn / ErrInvalidSettingsDependsOn in errors/errors.go) with tests covering both resolution branches and the errors.Is behavior.
why
- When a user sets templates.settings.ignore_missing_template_values: true, they still hit hard errors like map has no entry for key "..." whenever the error originated from rendering the stack name_template, because the name-template ProcessTmpl sites bypassed the flag.
- The fix is behavior-preserving: the flag defaults to false, so existing configurations render exactly as before; behavior only changes for users who explicitly opt in.
- The err113 conversion follows the repository's mandated static-error pattern and keeps the pre-commit/CI lint green; messages are unchanged.
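The two outcomes named above (a hard map has no entry for key error versus <no value>) correspond to the missingkey option of Go's text/template. The sketch below shows that standard-library behavior; renderName is a hypothetical helper, and whether ProcessTmpl maps the flag onto missingkey in exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderName renders a name template over a map. With ignoreMissing unset,
// a missing key is a hard error; with it set, the key renders as "<no value>".
func renderName(tmpl string, data map[string]any, ignoreMissing bool) (string, error) {
	option := "missingkey=error"
	if ignoreMissing {
		option = "missingkey=default"
	}
	t, err := template.New("name").Option(option).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	vars := map[string]any{"tenant": "acme"} // "stage" is deliberately absent
	for _, ignore := range []bool{false, true} {
		name, err := renderName("{{ .tenant }}-{{ .stage }}", vars, ignore)
		fmt.Printf("ignore=%v name=%q err=%v\n", ignore, name, err)
	}
}
```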
references
- Closes #2345
- Fix doc: docs/fixes/2026-06-15-name-template-ignore-missing-template-values.md
Summary by CodeRabbit
- Bug Fixes
  - Updated template rendering to consistently honor ignore_missing_template_values across stack- and dependency-related name derivations (including Terraform workspace and generated stack naming).
  - Added clearer error handling for invalid depends_on inputs via dedicated sentinel errors.
- Tests
  - Added regression tests covering enabled/disabled ignore_missing_template_values behavior and dependency resolution success/failure.
- Documentation
  - Added a documentation page explaining the corrected ignore_missing_template_values behavior for stack name template rendering.
fix(flags): scope --skip-hooks to the terraform command subtree Erik Osterman (Cloud Posse) (@osterman) (#2578)
what
- Scope --skip-hooks to the terraform command subtree. The flag (and ATMOS_SKIP_HOOKS) moved off the global flag set onto atmos terraform and its subcommands, so it no longer appears in the help of unrelated commands (auth, helmfile, atlantis, toolchain, about, secret, …). Lifecycle hooks only ever run on terraform plan/apply/deploy.
- Stop tracking native-ci CI scratch output. tests/fixtures/scenarios/native-ci/{github-output,github-step-summary}.txt are runtime artifacts; gitignored and untracked (matching the newer native-ci-gha-plan scenario).
- Standardize the CLI test suite on OpenTofu. The suite forces ATMOS_COMPONENTS_TERRAFORM_COMMAND=tofu via a single test-harness default, gates every binary-invoking test on a precondition so a missing binary skips cleanly (instead of baking "executable file not found" into goldens), and sanitizes the harness-injected env var out of debug snapshots. A small parity set (terraform -help/-version passthrough) opts back into terraform.
- Provision test tooling via the Atmos toolchain (dogfooding). TestMain installs any missing pinned binary (terraform/tofu/packer/helmfile/helm) through the Atmos toolchain itself and prepends it to PATH ("install as necessary"), so CI (which supplies them via setup-* actions) downloads nothing while local runs become self-contained. No host binaries (brew, etc.) required.
why
- --skip-hooks on every command was misleading: hooks only run on terraform. Mirrors the existing --github-token / toolchain scoping precedent.
- The native-ci scratch files were tracked, so every local run without terraform dirtied them. They're CI artifacts, not fixtures.
- Test runs depended on whatever terraform/tofu binary was on the host; a missing binary silently corrupted golden snapshots and tracked fixtures. Standardizing on a single, license-clean (MPL) OpenTofu, with explicit preconditions, makes the suite deterministic and host-independent. The product runtime default stays terraform; only tests change.
- Provisioning tools through the toolchain dogfoods the feature and removes the dependency on host-installed binaries, so the suite runs the same way everywhere.
references
- Follows the --github-token / toolchain flag-scoping precedent in pkg/flags/global_builder.go.
fix(toolchain): retry cosign verification on transport-level network errors Erik Osterman (Cloud Posse) (@osterman) (#2604)
what
- Add a transportFlakeMarkers allowlist to the cosign retry classifier (pkg/toolchain/verification/signature_rekor.go) so transport-level network errors are retried like other transient Sigstore Rekor flakes:
  - stream error: stream ID (Go net/http2 stream errors; covers all HTTP/2 error codes and both send/recv variants)
  - connection reset by peer
  - TLS handshake timeout
  - i/o timeout
  - unexpected EOF
- Extend TestClassifyCosignError with the exact error observed in CI plus one case per new marker, and add TestRunCosignWithRetry_RecoversFromTransportFlake covering end-to-end retry recovery.
why
CI failed on TestToolchainCustomCommands_InstallAllTools/Install_tofu while toolchain install opentofu/opentofu@1.9.0 was verifying the download signature. Cosign's query to the Sigstore Rekor transparency log died with:
searching log query: stream error: stream ID 1; INTERNAL_ERROR; received from peer
Atmos already retries cosign flakes (runCosignWithRetry, 5 attempts with exponential backoff), but the retryable classification is a deliberate allowlist that only recognized Rekor HTTP response flakes (searchLogQueryBadRequest, the IEEE_P1363 decode error, and 5xx scoped to the tlog retrieve endpoint). An HTTP/2 transport error matched none of the markers, so it surfaced on the first attempt with no retry.
Broadening to transport-level failures is safe within the allowlist's design rule: the allowlist exists so a real signature verdict (tampering, identity mismatch, expired cert) is never silently retried away. A transport failure means the request never completed and no verdict was rendered, so retrying it categorically cannot mask tampering. Existing negative tests (tampered artifact, identity mismatch, generic failure) continue to assert those still fail on the first attempt.
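The allowlist rule can be illustrated with a small substring classifier. The marker strings are the ones listed in this PR; isTransportFlake and the plain case-sensitive substring match are assumptions about how such a classifier might look, not the code in signature_rekor.go.

```go
package main

import (
	"fmt"
	"strings"
)

// transportFlakeMarkers lists substrings that identify transport-level
// failures, where the request never completed and no verdict was rendered.
var transportFlakeMarkers = []string{
	"stream error: stream ID",
	"connection reset by peer",
	"TLS handshake timeout",
	"i/o timeout",
	"unexpected EOF",
}

// isTransportFlake reports whether the message matches the allowlist.
// Anything that does not match is treated as a real verdict and not retried.
func isTransportFlake(message string) bool {
	for _, marker := range transportFlakeMarkers {
		if strings.Contains(message, marker) {
			return true
		}
	}
	return false
}

func main() {
	observed := "searching log query: stream error: stream ID 1; INTERNAL_ERROR; received from peer"
	fmt.Println(isTransportFlake(observed))
	fmt.Println(isTransportFlake("signature identity mismatch"))
}
```

Because the default is "do not retry", adding a marker can only widen the set of retried transport failures; it cannot cause a verification failure to be retried unless that failure's text happens to contain a marker.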
references
- Observed failure: Acceptance Tests (linux), TestToolchainCustomCommands_InstallAllTools/Install_tofu