v1.83.0
What's Changed
Key New Features 🎉
- feat(validations): Add early conditional validation by @AdarshK15 in #5160
- A4X-Max Bare Metal Slurm support by @arpit974 in #5222
- Adding GKE TPU DWS Queued Provisioning support for v6e and 7x by @shubpal07 in #5218
- feat(validations): Add early required validation by @AdarshK15 in #5166
- Module deprecation warning system by @vikramvs-gg in #5229
- A4X-Max Bare Metal GKE toolkit blueprint by @vikramvs-gg in #5211
Breaking Changes 🚨
- Update and pin terraform version to 1.12.2 by @parulbajaj01 in #5216
- Update wait flag and resolve helm_release deadlock destruction error by @agrawalkhushi18 in #5147
Module Improvements 🔨
- Migrate configure_kueue from gavinbunney to helm by @agrawalkhushi18 in #5129
- Migrate install_gib from kubectl to helm by @agrawalkhushi18 in #5256
Improvements 🛠
- Add reservation name check validator by @saara-tyagi27 in #5185
- Update go files to add timestamps to gcluster logs by @agrawalkhushi18 in #5198
- Pin DCGM version to 4.5.1-1 by @saara-tyagi27 in #5197
- Add support for DualStack (IPv4/IPv6) networks by @DomiKoPL in #5206
Bug fixes 🐞
- Update slurm_cluster_name regex by @saara-tyagi27 in #5261
- Fix SELinux issue in hpc-build-slurm-image blueprint by @AdarshK15 in #5266
- Hotfix: update G4 NVIDIA drivers for kernel 6.17 compatibility by @SwarnaBharathiMantena in #5289
- Hardcode zone in a2high PR test to fix test failures by @kadupoornima in #5305
- Modify prefix_length for PSA to accommodate sufficient IPs for peering by @vikramvs-gg in #5306
- fix: Update a3m and a3u script to resolve slurm nccl test failure by @agrawalkhushi18 in #5308
New Contributors
Full Changelog: v1.82.0...v1.83.0