Conversation

@awaelchli
Contributor

@awaelchli awaelchli commented Oct 4, 2023

What does this PR do?

Upgrades our GPU CI to PyTorch 2.1


📚 Documentation preview 📚: https://pytorch-lightning--18719.org.readthedocs.build/en/18719/

cc @carmocca @Borda

@github-actions github-actions bot added ci Continuous Integration dockers labels Oct 4, 2023
@Borda Borda changed the title from "WIP: Update GPU CI and docker images for PyTorch 2.1" to "WIP: Update GPU CI and docker images for PyTorch 2.1 [wip]" Oct 4, 2023
@Borda Borda marked this pull request as ready for review October 4, 2023 21:37
@github-actions
Contributor

github-actions bot commented Oct 4, 2023

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Azure GPU
| Check ID | Status |
| --- | --- |
| [pytorch-lightning (GPUs) (testing Lightning latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=177809&view=logs&jobId=47e66f3c-897a-5428-da11-bf5c7745762e) | success |
| [pytorch-lightning (GPUs) (testing PyTorch latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=177809&view=logs&jobId=fe777007-6e77-5e50-6e71-fc5977ab193a) | success |

These checks are required after the changes to .azure/gpu-tests-pytorch.yml.

🟢 pytorch_lightning: Docker
| Check ID | Status |
| --- | --- |
| build-cuda (3.9, 1.12, 11.7.1) | success |
| build-cuda (3.9, 1.13, 11.8.0) | success |
| build-cuda (3.9, 1.13, 12.0.1) | success |
| build-cuda (3.10, 2.0, 11.8.0) | success |
| build-cuda (3.10, 2.1, 12.1.0) | success |
| build-pl (3.9, 1.12, 11.7.1) | success |
| build-pl (3.9, 1.13, 11.8.0) | success |
| build-pl (3.9, 1.13, 12.0.1) | success |
| build-pl (3.10, 2.0, 11.8.0) | success |
| build-pl (3.10, 2.1, 12.1.0) | success |

These checks are required after the changes to .github/workflows/docker-build.yml, dockers/base-cuda/Dockerfile.

🟢 lightning_fabric: Azure GPU
| Check ID | Status |
| --- | --- |
| [lightning-fabric (GPUs) (testing Fabric latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=177810&view=logs&jobId=3f274fac-2e11-54ca-487e-194c91f3ae9f) | success |
| [lightning-fabric (GPUs) (testing Lightning latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=177810&view=logs&jobId=47e66f3c-897a-5428-da11-bf5c7745762e) | success |

These checks are required after the changes to .azure/gpu-tests-fabric.yml.


Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

@awaelchli awaelchli mentioned this pull request Oct 4, 2023
@Borda
Collaborator

Borda commented Oct 4, 2023

To build the images, this needs to be a full PR; once all checks pass, you can dispatch the branch to upload the images to Docker Hub.

@Borda
Collaborator

Borda commented Oct 4, 2023

It seems that PyTorch was built only for CUDA 12.1, not for 12.0 or 12.2.
Also, NCCL has a limited build for 12.1, see: https://docs.nvidia.com/deeplearning/nccl/release-notes/index.html
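
For anyone who wants to verify the CUDA pairing locally, here is a minimal sketch, not part of this PR. It reads `torch.version.cuda` (the CUDA version the installed wheel was built against) and compares it with an image's CUDA tag at the major.minor level. The helper names `cuda_major_minor`, `built_cuda_version`, and `image_matches_wheel` are hypothetical and exist only for illustration.

```python
from typing import Optional


def cuda_major_minor(version: str) -> str:
    """Reduce a CUDA version such as '12.1.0' to its 'major.minor' part."""
    parts = version.strip().split(".")
    return ".".join(parts[:2])


def built_cuda_version() -> Optional[str]:
    """Return the CUDA version the installed PyTorch wheel was built against.

    Returns None when PyTorch is not installed or the build has no CUDA support.
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for builds without CUDA


def image_matches_wheel(image_cuda: str, wheel_cuda: Optional[str]) -> bool:
    """True if the image's CUDA tag matches the wheel's CUDA at major.minor level."""
    if wheel_cuda is None:
        return False
    return cuda_major_minor(image_cuda) == cuda_major_minor(wheel_cuda)


if __name__ == "__main__":
    wheel = built_cuda_version()
    print("PyTorch built for CUDA:", wheel)
    # "12.1.0" is the CUDA tag of the 2.1 image in the build matrix above.
    print("Matches a 12.1.0 image:", image_matches_wheel("12.1.0", wheel))
```

Comparing only major.minor is an assumption made here for simplicity; it treats a `12.1.0` image and a wheel reporting `12.1` as a match, while `12.0.1` would not match.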

awaelchli and others added 5 commits October 5, 2023 00:04
@awaelchli awaelchli changed the title from "WIP: Update GPU CI and docker images for PyTorch 2.1 [wip]" to "Update GPU CI and docker images for PyTorch 2.1" Oct 5, 2023
@awaelchli awaelchli added this to the 2.1 milestone Oct 5, 2023
awaelchli and others added 3 commits October 4, 2023 21:02
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
@awaelchli awaelchli added the fun Staff contributions outside working hours - to differentiate from the "community" label label Oct 5, 2023
@mergify mergify bot added the ready PRs ready to be merged label Oct 6, 2023
@awaelchli awaelchli merged commit 77eef8a into master Oct 6, 2023
@awaelchli awaelchli deleted the feature/2-1-cuda branch October 6, 2023 12:12