Test and add a description of distributed computation #14
Hi Alan, …
Hi Vladimir,
PyTorch provides several options for distributed execution: https://pytorch.org/tutorials/beginner/dist_overview.html. It should also be possible to achieve it manually using mpi4py, if you prefer. Do you wish to run on multiple GPUs connected to a single node, multiple nodes each with a single device (CPU or GPU), or multiple nodes each with multiple GPUs? Are you just doing forward modeling, or also backpropagation/inversion?
-Alan
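(One of the options in the linked overview, DistributedDataParallel on a single node, might look roughly like the sketch below. ToyProp is a placeholder module, not part of Deepwave; one process is launched per GPU, each works on its own subset of shots, and DDP averages the gradients during backward.)

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


class ToyProp(torch.nn.Module):
    """Placeholder standing in for a wave propagator with a velocity model."""
    def __init__(self):
        super().__init__()
        self.vp = torch.nn.Parameter(torch.full((100, 100), 1500.0))

    def forward(self, x):
        return x * self.vp.mean()


def run(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    prop = DDP(ToyProp().cuda(rank), device_ids=[rank])
    x = torch.randn(10, 100, 100, device=f"cuda:{rank}")  # this rank's subset of shots
    loss = prop(x).sum()
    loss.backward()  # DDP averages gradients across ranks here

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    torch.multiprocessing.spawn(run, args=(world_size,), nprocs=world_size)
```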
Thanks a lot for getting back. I am trying to distribute FWI and LSRTM within a single node. I tried wrapping the propagator with prop = nn.DataParallel(prop), which appears to be the most basic option.
Hi Vladimir,
I think I see the problem. The model is not being registered as a parameter of the propagator, so PyTorch doesn't know that it needs to copy it to the other devices. I don't have multiple GPUs to test on at the moment, but does manually registering the parameter as below work?
prop = deepwave.scalar.Propagator({'vp': model}, dx)
prop.register_parameter('vp', torch.nn.Parameter(model))
prop = torch.nn.DataParallel(prop)
If so, I will fix it in the next release of Deepwave so that the parameters are registered automatically.
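(A self-contained version of that suggestion might look like the sketch below. The Propagator construction and the registration lines are taken from the comment above; the velocity model, grid spacing, and the commented-out forward call are illustrative assumptions rather than something confirmed in this thread.)

```python
import torch
import deepwave

model = torch.full((100, 100), 1500.0)   # velocity model in m/s (illustrative)
dx = 5.0                                 # grid spacing in m (illustrative)

prop = deepwave.scalar.Propagator({'vp': model}, dx)
prop.register_parameter('vp', torch.nn.Parameter(model))  # make vp visible to PyTorch
prop = torch.nn.DataParallel(prop)       # replicate across GPUs, split the shot batch

# Hypothetical forward call (argument order assumed, not taken from this thread):
# receiver_amplitudes = prop(source_amplitudes, source_locations,
#                            receiver_locations, dt)
```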
Hi Alan,
Registering the model as a parameter did not change the behavior. Sending it to the device manually, along with the inputs, inside the forward method lets nn.DataParallel run, but it looks like it runs the forward propagation for the different GPUs sequentially.
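(A hypothetical reconstruction of that workaround is sketched below: a thin wrapper whose forward builds the propagator with the model moved onto whatever device the inputs arrive on. The wrapper class and the propagator's call signature are assumptions, not taken from this thread.)

```python
import torch
import deepwave


class DeviceAwareProp(torch.nn.Module):
    """Hypothetical wrapper that rebuilds the propagator on the inputs' device."""

    def __init__(self, model, dx):
        super().__init__()
        self.vp = torch.nn.Parameter(model)
        self.dx = dx

    def forward(self, source_amplitudes, source_locations, receiver_locations, dt):
        vp = self.vp.to(source_amplitudes.device)  # follow DataParallel's scatter
        prop = deepwave.scalar.Propagator({'vp': vp}, self.dx)
        return prop(source_amplitudes, source_locations, receiver_locations, dt)


# prop = torch.nn.DataParallel(DeviceAwareProp(model, dx))
```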
Hi Vladimir,
That is unfortunate. It should definitely be possible, though. Perhaps larger changes are required. I planned to overhaul Deepwave during the summer, but will try to get to it sooner. Please let me know if you can think of any other features that you would like it to have, so that I can try to include them when planning the modifications.
In the meantime, another option, if you are very keen to use multiple GPUs and feeling ambitious, might be to use mpi4py. It will be more work, as you will need to launch multiple processes yourself and send data and gradient updates between them, but it should give you enough control to make it work. If you can wait a few weeks, though, I will hopefully have a clean way working.
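(For reference, a rough sketch of the mpi4py route described above: each MPI rank computes the misfit and gradient for its own subset of shots, and the gradients are summed across ranks before every update. The misfit function here is a trivial placeholder standing in for the actual Deepwave forward modeling and loss.)

```python
# Launch with e.g.: mpirun -n 2 python this_script.py
from mpi4py import MPI
import numpy as np
import torch

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

model = torch.full((100, 100), 1500.0, requires_grad=True)  # shared starting model
optimizer = torch.optim.SGD([model], lr=1.0)


def misfit_for_rank(model, rank, size):
    """Placeholder for the FWI/LSRTM misfit over this rank's shots."""
    return ((model - 1600.0) ** 2).sum() / size


for iteration in range(10):
    optimizer.zero_grad()
    misfit_for_rank(model, rank, size).backward()

    # Sum gradients from all ranks so every rank applies the same update.
    local_grad = model.grad.detach().numpy()
    summed_grad = np.zeros_like(local_grad)
    comm.Allreduce(local_grad, summed_grad, op=MPI.SUM)
    model.grad.copy_(torch.from_numpy(summed_grad))

    optimizer.step()
```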
PyTorch allows distributed computation, and this is usually necessary for realistic datasets. It should be tested to ensure that it works with Deepwave, and a description added to the documentation explaining to users how to do it.