Replies: 1 comment 1 reply
While multi-GPU training should work in theory, no testing has been done to validate it. If you can make it work yourself (lucky you, having multiple GPUs! :-)), then a PR to either the TorchSharpExamples repo or to this repo demonstrating it would be very much appreciated!
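For anyone who wants to try: TorchSharp does expose the per-device placement primitives that multi-GPU training would build on. A minimal sketch follows; it is untested on an actual multi-GPU machine and assumes a recent TorchSharp where `torch.cuda` and `torch.device` carry the libtorch names:

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;

// enumerate the CUDA devices libtorch can see
Console.WriteLine($"CUDA available: {cuda.is_available()}, device count: {cuda.device_count()}");

// modules and tensors can be pinned to a specific GPU by index
var net = nn.Linear(10, 1);
net.to(device("cuda:1"));

// inputs must live on the same device as the module's parameters
var x = randn(4, 10).to(device("cuda:1"));
var y = net.forward(x);
```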
1 reply
I see a few examples of data-parallel and model-parallel training with PyTorch, but it doesn't seem to be supported by TorchSharp out of the box (correct me if I'm wrong). Even if it isn't supported out of the box, do you have examples of how to do it with the data transfers between GPUs handled explicitly?

As I understand it, data parallelism boils down to training on N GPUs with identical copies of the model, and there is a step in training where the gradients from all the copies need to be combined, right? I don't quite understand how and when to do that. A step-by-step list of actions, with examples of which API calls to make, would help.

My model has about 400K parameters, and I have about 1.5x10^9 training samples. One pass over all the samples takes about 5 days on my system, but I have 2 GPUs and 1 powerful CPU, so I would like to try to use all the resources in the hope of cutting the training time at least in half.
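As a rough starting point for such a recipe, here is a sketch of one manual data-parallel training step in TorchSharp. Everything in it is illustrative rather than official: `nn.Linear(100, 1)` stands in for the real 400K-parameter model, the `randn` batch stands in for real data, SGD and MSE loss are arbitrary choices, and depending on the TorchSharp version, gradient access is the `grad` property or a `grad()` method. TorchSharp has no built-in DataParallel, so the replication, gradient averaging, and weight broadcast are all done by hand:

```csharp
using System;
using System.Linq;
using TorchSharp;
using static TorchSharp.torch;

var dev0 = device("cuda:0");
var dev1 = device("cuda:1");

// two replicas of the same model, one per GPU
var model0 = nn.Linear(100, 1);
var model1 = nn.Linear(100, 1);
model0.to(dev0);
model1.to(dev1);

// copy replica 0's parameters into replica 1 (Tensor.copy_ performs the
// cross-device transfer); also used to re-synchronize after each update
void SyncReplicas()
{
    using var _ = no_grad();
    foreach (var (p0, p1) in model0.parameters().Zip(model1.parameters()))
        p1.copy_(p0);
}
SyncReplicas();

// a single optimizer drives replica 0; replica 1 only contributes gradients
var opt = optim.SGD(model0.parameters(), 0.01);

// --- one data-parallel training step ---
var input  = randn(64, 100);   // stand-in batch, still on the CPU
var target = randn(64, 1);

// 1. shard the batch, one piece per GPU
var xs = input.chunk(2);
var ys = target.chunk(2);

// 2. independent forward/backward on each replica
opt.zero_grad();
foreach (var p in model1.parameters()) p.grad?.zero_();

nn.functional.mse_loss(model0.forward(xs[0].to(dev0)), ys[0].to(dev0)).backward();
nn.functional.mse_loss(model1.forward(xs[1].to(dev1)), ys[1].to(dev1)).backward();

// 3. combine: move replica 1's gradients to GPU 0 and average them in place
using (no_grad())
{
    foreach (var (p0, p1) in model0.parameters().Zip(model1.parameters()))
        if (p0.grad is not null && p1.grad is not null)
            p0.grad.add_(p1.grad.to(dev0)).div_(2);
}

// 4. update replica 0, then broadcast its new weights back to replica 1
opt.step();
SyncReplicas();
```

With a ~400K-parameter model the gradients total roughly 1.6 MB in float32, so the per-step device-to-device traffic is cheap; the real question is whether each per-GPU shard is large enough to keep both devices busy. Driving a single optimizer and broadcasting weights avoids having to keep two optimizer states (e.g. momentum buffers) synchronized.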