[PyTorch][Training][EC2][SageMaker] PyTorch 2.10 Currency Release #5746

Merged: bhanutejagk merged 4 commits into aws:master from bhanutejagk:pytorch-2.10-currency-release on Mar 17, 2026

@bhanutejagk commented on Mar 13, 2026

  • Add CPU and GPU Dockerfiles with SM SDK v3, fastai, Python 3.13, CUDA 13.0
  • Add buildspecs for EC2 and SageMaker
  • Add EC2 test file for PyTorch 2.10
  • Update conftest.py with pytorch_training___2__10 fixture and version regex fix
  • Update SageMaker conftest.py skip_smppy_test for 2.10
  • Add mlflow/skops CVEs to SM allowlists
  • Add sanity test fixes for SM SDK v3 in-image (utility install, remote function, pip check)
  • Configure dlc_developer_config.toml for PyTorch training build
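
The fixture name in the list above, `pytorch_training___2__10`, appears to encode the framework, job type, and version, with three underscores before the major version and two between major and minor. A minimal sketch of that naming pattern follows. The helper `fixture_name` is hypothetical and not taken from the repository's conftest.py; it only illustrates how a two-digit minor version such as 2.10 maps onto the name.

```python
def fixture_name(framework: str, job_type: str, version: str) -> str:
    """Build a fixture-style name from a framework, job type, and version.

    Hypothetical helper: the naming pattern is inferred from the single
    fixture name mentioned in this PR, not from the repository's code.
    """
    # Keep only major and minor, so "2.10" and "2.10.0" give the same name.
    major, minor = version.split(".")[:2]
    return f"{framework}_{job_type}___{major}__{minor}"


if __name__ == "__main__":
    print(fixture_name("pytorch", "training", "2.10"))
```

Splitting on the dot rather than matching single digits keeps the minor version `10` intact, which is the kind of case a version regex has to handle for this release.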

Purpose

Currency release of PyTorch 2.10.0 training DLC images (CPU and GPU) to keep AWS Deep Learning Containers up to date with the latest stable PyTorch release. This enables customers to use PyTorch 2.10 with Python 3.13, CUDA 13.0, and SageMaker SDK v3 on both EC2 and SageMaker.

Test Plan

Testing the following on images built from this PR. Tests enabled via dlc_developer_config.toml:

sanity_tests = true — pip check, utility installation, remote function compatibility, pre-release checks
security_tests = true — ECR scan, safety report with CVE allowlists
ecs_tests = true
eks_tests = true
ec2_tests = true — EC2 training via test_pytorch_training_2_10.py
ec2_benchmark_tests = true
ec2_tests_on_heavy_instances = true
sagemaker_local_tests = true
sagemaker_remote_tests = true
sagemaker_efa_tests = true
sagemaker_rc_tests = true
sagemaker_benchmark_tests = true

Test Result

EC2 image -
d4aa71f - passing ec2, ecs, sanity, and security tests
cc85ac4 - passing eks test

SM image -
179e610 - passing all sm related tests


Toggle if you are merging into master Branch

By default, docker image builds and tests are disabled. Two ways to run builds and tests:

  1. Using dlc_developer_config.toml
  2. Using this PR description (currently only supported for PyTorch, TensorFlow, vllm, and base images)

How to use the helper utility for updating dlc_developer_config.toml

Assuming your remote is called origin (you can find out more with git remote -v)...

  • Run default builds and tests for a particular buildspec - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -cp origin

  • Enable specific tests for a buildspec or set of buildspecs - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -t sanity_tests -cp origin

  • Restore TOML file when ready to merge

python src/prepare_dlc_dev_environment.py -rcp origin

NOTE: If you are creating a PR for a new framework version, please ensure success of the local, standard, rc, and efa sagemaker tests by updating the dlc_developer_config.toml file:

  • sagemaker_remote_tests = true
  • sagemaker_efa_tests = true
  • sagemaker_rc_tests = true
  • sagemaker_local_tests = true

How to use PR description

Use the code block below to uncomment commands and run the PR CodeBuild jobs. There are two commands available:
  • # /buildspec <buildspec_path>
    • e.g.: # /buildspec pytorch/training/buildspec.yml
    • If this line is commented out, dlc_developer_config.toml will be used.
  • # /tests <test_list>
    • e.g.: # /tests sanity security ec2
    • If this line is commented out, it will run the default set of tests (same as the defaults in dlc_developer_config.toml): sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local.
# /buildspec <buildspec_path>
# /tests <test_list>
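
The comment-toggling behavior described above can be sketched as a small parser. This is an illustration under stated assumptions, not the repository's actual implementation: `parse_pr_commands` and `COMMAND_RE` are hypothetical names, and a value of `None` stands for the documented fallback (dlc_developer_config.toml for the buildspec, the default test set for tests).

```python
import re

# Matches an uncommented command at the start of a line,
# e.g. "/tests sanity security ec2". Lines starting with "#" do not match.
COMMAND_RE = re.compile(r"^/(buildspec|tests)\s+(.+?)\s*$")


def parse_pr_commands(description: str) -> dict:
    """Extract active /buildspec and /tests commands from a PR description.

    Hypothetical sketch: None means the command was absent or commented
    out, so the caller would fall back to its defaults.
    """
    commands = {"buildspec": None, "tests": None}
    for line in description.splitlines():
        match = COMMAND_RE.match(line.strip())
        if match is None:
            continue  # commented-out ("# /tests ...") or unrelated line
        name, argument = match.groups()
        if name == "buildspec":
            commands["buildspec"] = argument
        else:
            commands["tests"] = argument.split()
    return commands


if __name__ == "__main__":
    active = "/buildspec pytorch/training/buildspec.yml\n/tests sanity security ec2"
    print(parse_pr_commands(active))
```

With both lines left commented, as in this PR's description, the sketch returns `None` for both keys, which corresponds to using dlc_developer_config.toml.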
Toggle if you are merging into main Branch

PR Checklist

  • [] I ran pre-commit run --all-files locally before creating this PR. (Read DEVELOPMENT.md for details).

@bhanutejagk requested a review from a team as a code owner on March 13, 2026 22:15
@aws-deep-learning-containers-ci (bot) added the authorized, build, ec2, pytorch, sagemaker_tests, sanity, Size:XL, and test labels on Mar 13, 2026
@bhanutejagk bhanutejagk merged commit c369b98 into aws:master Mar 17, 2026
30 checks passed