
Add multi-GPU support with accelerate #76

Closed
wants to merge 11 commits into from

Conversation


@GavEdwards GavEdwards commented Feb 4, 2022

Summary

Enable GPU support (+more) via the Accelerate library.

This is still a work in progress - there are still some bugs to be ironed out around multi-GPU and some models.

TODO:

  • add documentation for accelerate
  • multi-gpu tests
  • test with all models

  • Unit tests provided for these changes
  • Documentation and docstrings added for these changes using the sphinx style

Changes

  • Add Accelerate as a dependency
  • Adjust the pipeline code to use Accelerate (see the sketch after this list)
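
For illustration, adapting a PyTorch training loop to Accelerate usually looks something like the sketch below. This is not the exact pipeline change in this PR: model, optimizer, loader, and loss stand in for the existing chemicalx objects, and model.unpack follows the snippet reviewed further down.

from accelerate import Accelerator

def train(model, optimizer, loader, loss, epochs: int = 1):
    accelerator = Accelerator()
    # prepare() moves the model and data onto the right device(s) and wraps
    # them for single- or multi-GPU execution.
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            prediction = model(*model.unpack(batch))
            loss_value = loss(prediction, batch.labels)
            accelerator.backward(loss_value)  # instead of loss_value.backward()
            optimizer.step()
    return model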

@GavEdwards GavEdwards mentioned this pull request Feb 4, 2022
@benedekrozemberczki
Contributor

PackedGraph objects do not have a .to() method, only a custom .cuda() method. Pretty messed up.

https://github.com/DeepGraphLearning/torchdrug/blob/master/torchdrug/data/graph.py

@GavEdwards
Author

@cthoyt
Contributor

cthoyt commented Feb 5, 2022

So one main thing about this PR is that I don't think it's necessary to use Accelerate to add GPU support - we only have to keep track of devices more cleverly. I'm not against using Accelerate, since it has some nice add-ons for multi-GPU etc., but I wouldn't depend on it to solve the original problem.

Perhaps we can monkey patch a to() method into the packed graph class.
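
A minimal sketch of such a monkey patch, assuming only PackedGraph's existing cuda()/cpu() helpers (untested):

import torch
from torchdrug.data import PackedGraph

def _packed_graph_to(self, device):
    # PackedGraph has no native to(), so delegate to its cuda()/cpu() methods.
    device = torch.device(device)
    return self.cuda() if device.type == "cuda" else self.cpu()

PackedGraph.to = _packed_graph_to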

@GavEdwards
Author

So one main thing about this PR is that I don't think it's necessary to use Accelerate to add GPU support - we only have to keep track of devices more cleverly. I'm not against using Accelerate, since it has some nice add-ons for multi-GPU etc., but I wouldn't depend on it to solve the original problem.

Perhaps we can monkey patch a to() method into the packed graph class.

Hi @cthoyt 😄 My thinking was that this approach avoids us having to reinvent the wheel and quickly solves the current GPU need. The original issue (#65) doesn't describe the requirements to aim for.

Two questions:

  • In your opinion, which parts of the original problem does Accelerate not solve?
  • What requirements need to be met for this approach to be accepted?

I'm not attached to Accelerate as a solution either - just trying to understand the limits and needs.

@codecov-commenter

codecov-commenter commented Feb 7, 2022

Codecov Report

Merging #76 (0542ab9) into main (20f0e85) will decrease coverage by 0.65%.
The diff coverage is 54.16%.


@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
- Coverage   94.65%   94.00%   -0.66%     
==========================================
  Files          34       34              
  Lines        1478     1500      +22     
==========================================
+ Hits         1399     1410      +11     
- Misses         79       90      +11     
Impacted Files            Coverage Δ
chemicalx/pipeline.py     80.19% <54.16%> (-8.41%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20f0e85...0542ab9.

@cthoyt
Contributor

cthoyt commented Feb 8, 2022

I'm saying that implementing GPU usability and implementing Accelerate are two independent things, and I would rather see a PR that first enables GPU usage explicitly, without Accelerate, to make sure we don't make any Accelerate-specific mistakes.

@cthoyt
Contributor

cthoyt commented Feb 8, 2022

Additionally, I've opened a PR on torchdrug to solve the problem upstream, which will be much more elegant than us hacking it in: DeepGraphLearning/torchdrug#70. In the meantime, we could provide a compat module where we subclass PackedGraph with this function built in and refer to that class throughout chemicalx.
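
A compat module along those lines might look roughly like the following (hypothetical module name and code, superseded once the upstream torchdrug fix is released):

# chemicalx/compat.py (hypothetical): a PackedGraph subclass with to() built in,
# to be imported throughout chemicalx instead of torchdrug's class.
import torch
from torchdrug import data

class PackedGraph(data.PackedGraph):
    def to(self, device):
        # Note: the parent's cuda()/cpu() return plain torchdrug PackedGraph
        # instances, which is fine for moving batches around.
        device = torch.device(device)
        return self.cuda() if device.type == "cuda" else self.cpu()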

prediction = model(*model.unpack(batch))
loss_value = loss(prediction, batch.labels)

device_batch = to_device(model.unpack(batch), device)
Contributor

I think the batch generator should know what device to put the batches on, so this doesn't have to be changed in the pipeline.

Contributor
I.e., the BatchGenerator.__init__ should take an optional torch.device (if not given, assume CPU), and the generation steps should take care of moving the tensors over to the appropriate device - along the lines of the sketch below.
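
A rough sketch of that idea, with a hypothetical constructor signature (the real chemicalx BatchGenerator takes more arguments):

from typing import Optional
import torch

class BatchGenerator:
    def __init__(self, batch_size: int, device: Optional[torch.device] = None):
        self.batch_size = batch_size
        # Assume CPU when no device is given.
        self.device = torch.device("cpu") if device is None else torch.device(device)

    def _move(self, tensor: torch.Tensor) -> torch.Tensor:
        # Applied to every tensor produced while generating a batch, so the
        # pipeline never has to move data itself.
        return tensor.to(self.device)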

cthoyt added a commit to cthoyt/chemicalx that referenced this pull request Feb 10, 2022
Closes AstraZeneca#76. This PR first requires AstraZeneca#84 to be tested and merged.

## Blocked by

- [ ] AstraZeneca#84
@cthoyt cthoyt changed the title Add GPU support Add multi-GPU support with accelerate Feb 11, 2022
@cthoyt
Contributor

cthoyt commented Feb 12, 2022

@GavEdwards please note we've already merged a simple solution into the main branch and have now updated your PR with it - please check it out.

@@ -15,6 +15,9 @@
"pystow",
"pytdc",
"more-itertools",
"accelerate",
# FIXME what is packaging for?
"packaging",
Contributor
what's packaging for?

@@ -70,6 +71,45 @@ def save(self, directory: Union[str, Path]) -> None:
)


def to_device(objects, device):
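
For context, a helper with this signature would plausibly walk the unpacked batch members and move anything that supports it onto the target device; a sketch of that shape (not necessarily the body in this PR):

def to_device(objects, device):
    # Move every member that exposes a to() method; leave the rest untouched.
    return tuple(
        obj.to(device) if hasattr(obj, "to") else obj
        for obj in objects
    )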
Contributor

@cthoyt cthoyt Feb 12, 2022

I don't think this is necessary now that #84 and #86 are merged - please read through those PRs.

Author

Agreed, will close the PR

Contributor

@GavEdwards I was only referring to the to_device function. There's still some benefit to considering accelerate for multi-GPU or TPU usage.

@GavEdwards GavEdwards closed this Mar 21, 2022