SageMaker distributed training libraries offer both data parallel and model parallel training strategies. They combine software and hardware technologies to improve inter-GPU and inter-node communications. They extend SageMaker’s training capabilities with built-in options that require only small code changes to your training scripts.
smd_data_parallel sdp_versions/latest smd_data_parallel_use_sm_pysdk smd_data_parallel_release_notes/smd_data_parallel_change_log
Note
Since the release of the SageMaker model parallelism (SMP) version 2 in December 2023, this documentation is no longer supported for maintenence. The live documentation is available at SageMaker model parallelism library v2 in the Amazon SageMaker User Guide.
The documentation for the SMP library v1.x is archived and available at Run distributed training with the SageMaker model parallelism library in the Amazon SageMaker User Guide, and the SMP v1.x API reference is available in the SageMaker Python SDK v2.199.0 documentation.