Releases: aws/sagemaker-training-toolkit

v4.4.0

06 Dec 17:48

Features

  • integrate SMDDP collectives into smdataparallel runner

v4.3.2

29 Nov 23:06

Bug Fixes and Other Changes

  • add general exception to filter

v4.3.1

27 Oct 20:32

Bug Fixes and Other Changes

  • integrate upcoming dataparallel change to modelparallel
  • add unit tests for torchrun launcher and collections package DeprecationWarning

v4.3.0

20 Oct 18:48

Features

  • Add torch_distributed support for Trainium instances in SageMaker

v4.2.10

17 Oct 16:31

Bug Fixes and Other Changes

  • feature: Add neuron cores support (#21)

v4.2.9

26 Sep 16:33

Bug Fixes and Other Changes

  • Add SageMaker Debugger exceptions

v4.2.8

12 Sep 20:19
prepare release v4.2.8

v4.2.7

10 Sep 00:18

Bug Fixes and Other Changes

  • improve worker node wait logic and update EFA flags

v4.2.6

18 Aug 15:17

Bug Fixes and Other Changes

  • Enable PT XLA distributed training on homogeneous clusters

v4.2.5

17 Aug 16:28

Bug Fixes and Other Changes

  • relax exception type