v0.11.0
Changelog
New Features
- 500b561: feat(recipes): add GKE COS inference and Dynamo overlay recipes (#414) (@yuanchen8911)
- 3e46e47: feat(snapshot): add --runtime-class flag for CDI environments (#434) (@atif1996)
- d3fd483: feat(validator): add EKS/GKE cluster autoscaling fallback (#438) (@yuanchen8911)
- 87fd28f: feat: Add AKS (Azure Kubernetes Service) H100 recipe overlays (#415) (@Jont828)
- 0866ef0: feat: add B200 accelerator type support (#437) (@atif1996)
- 46736f8: feat: add query command for hydrated recipe value extraction (#445) (@mchmarny)
Bug Fixes
- 7c377c1: fix(bundler): clean up orphaned KAI and Kubeflow Trainer CRDs on undeploy (#416) (@yuanchen8911)
- 437126c: fix(gke): remove CAP_ prefix from capability names in TCPXO manifests (#428) (@yuanchen8911)
- f2ec6b2: fix(gke): update TCPXO to NRI profile without hostNetwork (#420) (@yuanchen8911)
- 8a65335: fix(validator): add retry for ai-service-metrics Prometheus query (#393) (@yuanchen8911)
- d99235e: fix(validator): remove hostNetwork and privileged from GKE NCCL runtime, use NRI device injection (#427) (@xdu31)
- e15a3c6: fix(validator): source NCCL env from host profile instead of hardcoding (#422) (@xdu31)
- 70efe82: fix: ArgoCD deployer generates valid YAML, add structural validation (#410) (#413) (@lockwobr)
Other Tasks
- 84f3c4c: chore: bump nvsentinel from v0.10.x to v1.1.0 (#423) (@mchmarny)
- 75092d8: chore: deps: bump github.com/in-toto/attestation from 1.1.2 to 1.2.0 (#431) (@dependabot[bot])
- ea19bdf: chore: deps: bump github/codeql-action from 4.32.6 to 4.33.0 (#418) (@dependabot[bot])
- a10d4b3: chore: deps: bump google.golang.org/grpc from 1.79.2 to 1.79.3 (#430) (@dependabot[bot])
- 9e81d69: chore: deps: bump the kubernetes group with 3 updates (#446) (@dependabot[bot])
- f23ade5: chore: ignore movies (@mchmarny)
- d4e818f: ci(kwok): implement tiered testing strategy per ADR-003 (#432) (@mchmarny)
- 9101d29: ci: build and publish validator images on merge to main (#412) (@yuanchen8911)
- ff9c66d: docs(conformance): update CNCF evidence for multi-platform and training (#425) (@yuanchen8911)
- 5d4aa7c: docs(validator): add custom image testing and private registry guide (#417) (@xdu31)