v1.93.0
What's Changed
Key New Features 🎉
- feat(gke): Add example for GKE H4D - DWS flex start compact placement by @agrawalkhushi18 in #5737
- feat: integrate GKE-managed ML Diagnostics by @AdarshK15 in #5731
- feat: Cluster Toolkit Slurm HA Controllers by @andybubu in #5352
Improvements 🛠
- [Telemetry] Fix flaky is_googler metric and improve it with local caching by @kadupoornima in #5703
- Remove additional network settings from slurm blueprints - A3U, A4H, A4X by @agrawalkhushi18 in #5738
- Upgrading slurm-gcp-rocky 8 images to rocky 9 images by @LAVEEN in #5735
- Remove additional network settings from slurm H4D examples by @agrawalkhushi18 in #5741
- Remove additional network settings from slurm A3U, A4, A4X VM examples by @agrawalkhushi18 in #5740
- Updating from Rocky8 plain images to Rocky9 by @LAVEEN in #5736
Bug fixes 🐞
- fix(gke-cluster): Respect release channel when selecting GKE version by prefix by @kadupoornima in #5729
- fix(job): add labels for dynamic slicing to kueue by @Neelabh94 in #5730
- fix(slurm): resolve KeyError control_host in slurmdbd config templates by @SwarnaBharathiMantena in #5765
- fix: add null check for controller_state_disk by @sudheer-quad in #5766
Full Changelog: v1.92.0...v1.93.0