v0.13.0

@github-actions github-actions released this 16 May 00:18
134b23d

This release focuses on scaling out our recipe matrix, evidence-based recipe validation, additional deployer targets, and a hardened component supply chain.

Highlights

Recipe Evidence: Evidence captured during cluster validation lets users and contributors verify, without access to the validating cluster, that a recipe actually deployed and delivered the expected performance characteristics. aicr validate now emits a Recipe Evidence v1 bundle, and a new aicr evidence verify command validates that evidence from either a local directory or a signed OCI image. This closes the loop between recipe authorship, deployment, and audit.

New Deployers: The bundle command now supports Helmfile and Flux alongside the existing Argo CD and raw-Helm targets. AICR also adds a URL-portable argocd-helm bundle option so users can apply a single manifest without local chart access. Helm vendoring is now supported for air-gapped environments; an image-mirroring option is still to come (see NVIDIA/aicr#743).

Overlays & Components

  • Added deployment validation to EKS GB200
  • Added Slinky platform support with Slurm operator
  • Added Talos Linux support via new os-talos mixin and bundler preManifestFiles
  • Updated AKS H100 Dynamo to match working cluster state
  • Migrated GB200 kernel-module-params to preManifestFiles
  • Fixed AKS H100 RDMA network operator dependency and metrics

Other Improvements

  • New doc site is now live at docs.nvidia.com/aicr with per-release versioning
  • New diff command detects configuration drift between recipes and live state
  • Unified file-based config across snapshot, recipe, bundle, and validate for easier reproducibility
  • Reliable cluster identity derived from snapshot measurements, making it easier to correlate clusters over time
  • --storage-class flag on the bundle command for registry-driven storage-class injection

Supply Chain: A new CycloneDX 1.6 BOM generator publishes a per-recipe container image inventory as an in-repo artifact, with strict validation that rejects bare scalar image references missing a tag, digest, or registry host. A growing number of component chart versions now also explicitly digest-pin their image references.
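
The strict validation described above can be illustrated with a minimal sketch. This is not AICR's implementation: the function names (split_ref, is_qualified, is_sha256_pinned) are hypothetical, and the sketch assumes one reading of the rule, namely that a bare scalar is rejected when it carries none of a tag, a digest, or a registry host. The registry host is detected with the common container-reference convention: the first path component counts as a host if it contains a dot or a colon, or is "localhost".

```python
import re

# A sha256 digest suffix: "@sha256:" followed by exactly 64 lowercase hex chars.
SHA256_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def split_ref(ref):
    """Split an image reference into (host, tag, digest); absent parts are None."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)

    # The first path component is a registry host only if it looks like one.
    first, slash, _rest = ref.partition("/")
    host = None
    path = ref
    if slash and ("." in first or ":" in first or first == "localhost"):
        host = first
        path = ref[len(first) + 1:]

    # A tag, if present, follows a colon in the last path component.
    last = path.rsplit("/", 1)[-1]
    tag = last.split(":", 1)[1] if ":" in last else None
    return host, tag, digest


def is_qualified(ref):
    """True if the scalar carries at least one of: registry host, tag, digest."""
    return any(split_ref(ref))


def is_sha256_pinned(ref):
    """True only if the reference ends in a well-formed sha256 digest."""
    return SHA256_DIGEST_RE.search(ref) is not None
```

Under these assumptions, a bare scalar such as "nginx" or "nvidia/cuda" would be rejected, while "nginx:1.27" and "nvcr.io/nvidia/cuda" would pass. The stricter is_sha256_pinned check corresponds to digest pinning as opposed to tag pinning.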

Thanks to @ayuskauskas, @dims, @dtzar, @faganihajizada, @haarchri, @Jont828, @lockwobr, @njhensley, @pdmack, @sanjeevrg89, @xdu31, @yuanchen8911, and @mchmarny.

Changelog

New Features

  • (tools) Add install-rc helper for latest RC binary by @mchmarny
  • (cli) Add --config support to snapshot command by @mchmarny
  • (recipes) Update AKS H100 Dynamo recipe to match working cluster state by @Jont828
  • (bom) Add CycloneDX 1.6 image BOM generator by @mchmarny
  • (ci) Add self-hosted renovate alongside dependabot by @njhensley
  • (recipes) Pin nfd and k8s-ephemeral-storage-metrics chart versions by @mchmarny
  • (bom) Publish container image inventory as a doc artifact by @mchmarny
  • (bundler) Add --storage-class flag for registry-driven injection by @dtzar
  • (recipes) Pin chart versions for NVIDIA-owned components (#748 Phase B) by @mchmarny
  • (recipes) Digest-pin explicit image references by @mchmarny
  • (cli) Unified --config flag for recipe and bundle by @mchmarny
  • (tools) Add s3c supply-chain presence checker by @mchmarny
  • (bundler) URL-portable argocd-helm bundle (#664, #665) by @lockwobr
  • (docs) Add versioned docs dropdown with CI content pinning by @pdmack
  • (tools) Add local Talos cluster + snapshot chainsaw test by @ayuskauskas
  • (fingerprint) Cluster identity projection from snapshot measurements by @njhensley
  • Add support for helm vendoring by @lockwobr
  • (oci) Expose URIScheme constant and Ensure/TrimScheme helpers by @njhensley
  • (cli) Add aicr diff for configuration drift detection by @sanjeevrg89
  • (config) aicr validate --config support by @njhensley
  • (validator) Apply hybrid resource pattern to ValidatorCatalog by @xdu31
  • (recipe) Extract Validation as standalone type with hybrid resource pattern by @xdu31
  • os-talos mixin + bundler preManifestFiles support by @ayuskauskas
  • (flux) Add bundle flux option by @haarchri
  • (evidence) Emit Recipe Evidence v1 bundle from aicr validate by @njhensley
  • (evidence) aicr evidence verify (directory input) by @njhensley
  • (evidence) aicr evidence verify (signed OCI bundles) by @njhensley
  • (recipes) Add deployment validation to GB200/EKS recipes by @njhensley
  • (bundler) Add helmfile deployer by @lockwobr
  • (recipes) Add Slinky slurm-operator as platform-slurm by @faganihajizada

Bug Fixes

  • (validator) Accept pre-release tags as release versions by @mchmarny
  • (bundler) Synthesize GKE ResourceQuota for critical-priority pods by @mchmarny
  • (bundler) Split helmfile bundle into CRD + main sub-helmfiles by @mchmarny
  • (bundler) Wire PreManifestFiles through flux deployer with terminal-aware dependsOn by @yuanchen8911
  • (bundler) Carry localformat createNamespace into helmfile.yaml by @yuanchen8911
  • (ci) Harden Fern docs CI and configure custom domain by @pdmack
  • (docs) Replace bare angle-bracket URL that breaks MDX parser by @pdmack
  • (recipes) Fully-qualify image refs in component manifests by @mchmarny
  • AKS H100 RDMA: set network operator as dependency and fix chart values/metrics by @Jont828
  • (recipes) Document aws-efa regional ECR override pattern by @mchmarny
  • (bom) Reject bare scalars without tag, digest, or registry host by @mchmarny
  • (validators) Bump aiperf-bench to python:3.13 to clear CVEs by @mchmarny
  • (recipes) Track nri-device-injector by tag, ignore tcpxo image by @njhensley
  • (api) Sync OpenAPI platform enum with Go criteria type by @mchmarny
  • (bundler) Suppress kubectl auth prompt in undeploy.sh post-flight by @mchmarny
  • (fern) Drop https scheme from instances URL by @pdmack
  • (recipes) Migrate GB200 kernel-module-params to preManifestFiles by @mchmarny
  • (validator) Write ValidationInput wire shape to ConfigMap by @njhensley
  • (validator) Make ExtractResult sidecar-safe by reading 'validator' container explicitly by @xdu31
  • (validator) Per-run RBAC names to prevent concurrent-run races by @yuanchen8911
  • (evidence) Fix a regression in cncf ai conformance evidence collection by @yuanchen8911
  • (ci) Populate frozen version content in preview build and surface fern errors by @pdmack
  • (validator) Surface skip reason in CTRF, treat missing constraint as skip by @ayuskauskas
  • (bundler) Stratify helmfile bundle by DAG level by @lockwobr
  • (recipes) Fix stale kgateway-crds path in slinky-slurm-operator-crds comment by @yuanchen8911
  • (recipes) Align overlay network-operator pins to v26.1.1 by @yuanchen8911
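
For readers unfamiliar with the idea behind the "stratify helmfile bundle by DAG level" fix above, the following is a rough sketch of grouping components into dependency levels. It is an illustration only, not the bundler's code: dag_levels and the component names in the usage note are hypothetical, and the input is assumed to be a mapping from each component to the set of components it depends on.

```python
def dag_levels(deps):
    """Group components into levels so each level depends only on earlier ones.

    deps maps a component name to the set of names it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Copy the input and register dependencies that never appear as keys.
    remaining = {name: set(d) for name, d in deps.items()}
    for d in deps.values():
        for name in d:
            remaining.setdefault(name, set())

    levels = []
    while remaining:
        # Components with no unresolved dependencies form the next level.
        ready = sorted(n for n, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for n in ready:
            del remaining[n]
        for d in remaining.values():
            d.difference_update(ready)
    return levels
```

With a hypothetical input where "gpu-operator" and "network-operator" both depend on "cert-manager", and "workload" depends on both operators, the function returns three levels: the first holds "cert-manager", the second holds the two operators, and the third holds "workload". Each level could then be applied as its own unit once the previous level is ready.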

Other Tasks

  • (demos) Add config-driven GKE CUJ with evidence verify by @mchmarny
  • Add top level THIRD_PARTY_NOTICES by @ayuskauskas
  • (bom) Wrap auto-generated image inventory with hand-written prose by @mchmarny
  • (recipes) Enforce sha256 specifically in digest-pin gate (CodeRabbit follow-up to #778) by @mchmarny
  • (adr) Add ADR-006 container image pinning policy by @mchmarny
  • (go) .go-version as single source of truth for Go toolchain by @mchmarny
  • (renovate) Hand workflow bumps to dependabot, disable dashboard by @njhensley
  • Update copyright headers to NVIDIA CORPORATION & AFFILIATES by @ayuskauskas
  • Update golang version by @lockwobr
  • (design) Add ADR-007 verifiable recipe test evidence by @njhensley
  • (tests) Use host aicr binary in snapshot deploy-agent test by @pdmack
  • (design) Add ADR-008 KWOK CI deployer matrix by @mchmarny
  • (evidence) Split CNCF code into pkg/evidence/cncf subpackage by @njhensley
  • (bundler) Expose SignStatement primitive for predicate-agnostic keyless signing by @njhensley
  • (catalog) Fix wrong component-selection bullets by @faganihajizada
  • (validator) Use ValidationInput throughout, remove dead Validation type by @xdu31
  • (docs) Catch up container-images.md BOM + document the regen rule by @yuanchen8911
  • (recipes) Bump aws-efa to v0.5.26 keeping AICR's hardened securityContext by @yuanchen8911
  • (recipes) Migrate kgateway -> agentgateway for v2.2 inference routing by @yuanchen8911
  • (validator) Move Phase to v1 and export FilterEntriesByValidation by @xdu31