22.01
DeepOps 22.01 Release Notes
General
- Updates for Slurm and Kubernetes
- Bug fixes
Slurm
- Slurm version 21.08.5
- HPC SDK 22.1
- Open OnDemand v2.0.9
- CUDA toolkit 11.5
- Slurm Pyxis plugin 0.11.1
- Enroot container runtime v3.2.0
- hwloc 2.5.0, PMIx 3.2.3
- Spack v0.16.2
K8s
- Kubernetes version v1.20.7 (kubespray v2.17.1)
- Helm version v3.7.1
- GPU Operator v1.8.2 (GPU driver 470.57.02)
- GPU Device Plugin v0.9.0
- GPU Feature Discovery v0.4.1
- NFS Client Provisioner v4.0.13
Changes
Bugs/Enhancements
- Add new HPL files for DGX A100 (#1047)
- Fix vagrant_startup.sh on Ubuntu 20.04 (#1049)
- Improve documentation and playbook for DGX firmware upgrade (#1058)
- Update firmware docs (#1063)
- Fix python interpreter (#1061)
- GPU Operator automation with NVIDIA AI Enterprise (#1059)
- [Open OnDemand] Remove task for ood_auth_map.regex permissions (#1068)
- Change the default Ansible interpreter to the system default instead of Python 3 (#1078)
- Add Log4Shell mitigation to ES statefulset example (#1080)
- Default to testing in Ubuntu 20.04 (#1051)
- Update k8s logging doc to use Elastic stack (#1081)
- Rewrite of DeepOps update documentation (#1050)
- Update Slurm Elasticsearch logging playbook for Log4Shell (#1079)
- Introduce a common script library, config for env vars, and inject these into all scripts (#953)
- Add proxy config to standalone container registry (#1090)
- Stop systemd-resolved on Ubuntu 20.04 (#1089)
- Add Molecule testing for Singularity, plus infra for more roles (#1088)
Upgrade steps
If you are upgrading to this version of DeepOps from a previous release, follow the upgrade section of the Slurm or Kubernetes Deployment Guides. In addition, re-run the ./scripts/setup.sh script and add any new variables from the config.example files to your existing config. For a full diff from release 21.09, run git diff 21.09 22.01 -- config.example/ from the repository root. If you encounter a problem, please open a GitHub issue. See the update guide for additional guidance.
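
The configuration comparison described above can be sketched as a pair of small shell helpers. This is a minimal sketch, not part of DeepOps: the function names show_config_changes and list_changed_config_files are hypothetical, and it assumes you run them from the root of a DeepOps git checkout where both release tags are available locally.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of DeepOps) that wrap the git diff command
# from the upgrade notes. Run them from the root of a DeepOps git checkout.

# Print the full diff of config.example/ between two release tags.
show_config_changes() {
    local old_release="$1" new_release="$2"
    git diff "$old_release" "$new_release" -- config.example/
}

# Print only the names of the config.example/ files that changed.
list_changed_config_files() {
    local old_release="$1" new_release="$2"
    git diff --name-only "$old_release" "$new_release" -- config.example/
}

# Example usage (assumes tags 21.09 and 22.01 exist locally):
#   ./scripts/setup.sh
#   list_changed_config_files 21.09 22.01
#   show_config_changes 21.09 22.01
```

Listing the changed file names first can make it easier to decide which files in your existing config need attention before reading the full diff.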