Releases: aws/sagemaker-training-toolkit
v4.4.0
Features
- integrate SMDDP collectives into smdataparallel runner
v4.3.2
Bug Fixes and Other Changes
- add general exception to filter
v4.3.1
Bug Fixes and Other Changes
- integrate upcoming dataparallel change to modelparallel
- add unit tests for the torchrun launcher and the collections package DeprecationWarning
v4.3.0
Features
- Add torch_distributed support for Trainium instances in SageMaker
v4.2.10
v4.2.9
Bug Fixes and Other Changes
- Add SageMaker Debugger exceptions
v4.2.8
prepare release v4.2.8
v4.2.7
Bug Fixes and Other Changes
- improve worker node wait logic and update EFA flags
v4.2.6
Bug Fixes and Other Changes
- Enable PyTorch XLA (PT XLA) distributed training on homogeneous clusters
v4.2.5
Bug Fixes and Other Changes
- relax exception type