
Test and add a description of distributed computation #14

Closed
ar4 opened this issue Jul 14, 2018 · 6 comments

@ar4
Owner

ar4 commented Jul 14, 2018

PyTorch allows distributed computation, and this is usually necessary for realistic datasets. It should be tested to ensure that it works with Deepwave, and a description added to the documentation explaining to users how to do it.

@vkazei
Contributor

vkazei commented Mar 21, 2022

Hi Alan,
Could you share some thoughts on the easiest way to distribute shots?
Best,
Vladimir

@ar4
Owner Author

ar4 commented Mar 22, 2022 via email

@vkazei
Contributor

vkazei commented Mar 22, 2022

Thanks a lot for getting back. I am trying to distribute FWI and LSRTM within a single node. I tried wrapping the propagator with prop = nn.DataParallel(prop), which appears to be the most basic option.
Applying source_amplitudes.swapaxes(0,1).swapaxes(1,2) before the propagator and swapping back inside the propagator seems to distribute the inputs properly, but the model itself does not get replicated and stays on the "cuda:0" device, so even forward propagation does not work.
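
(For illustration, a minimal sketch of the kind of shot-first wrapper described above, assuming the old deepwave.scalar.Propagator call signature prop(source_amplitudes, x_s, x_r, dt) with source_amplitudes shaped [nt, num_shots, num_sources_per_shot]; the class name is hypothetical.)

import torch

class ShotFirstProp(torch.nn.Module):
    # Hypothetical wrapper: nn.DataParallel scatters and gathers tensors
    # along dim 0, so inputs and outputs are exchanged shots-first and
    # swapped back around the call to the underlying Deepwave propagator.
    def __init__(self, prop):
        super().__init__()
        self.prop = prop

    def forward(self, source_amplitudes, x_s, x_r, dt):
        # Caller passes [num_shots, num_sources_per_shot, nt]
        # (source_amplitudes.swapaxes(0, 1).swapaxes(1, 2));
        # restore [nt, num_shots, num_sources_per_shot] here.
        source_amplitudes = source_amplitudes.permute(2, 0, 1)
        receiver_amplitudes = self.prop(source_amplitudes, x_s, x_r, dt)
        # Return shots-first so the gather concatenates along the shot
        # axis instead of the time axis.
        return receiver_amplitudes.permute(1, 0, 2)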

@ar4
Owner Author

ar4 commented Mar 22, 2022

Hi Vladimir,

I think I see the problem. The model is not being registered as a parameter of the propagator, and so PyTorch doesn't know that it needs to copy it to the other devices. I don't have multiple GPUs to test it on at the moment, but does manually registering the parameters as below work?

# Register the velocity model as a parameter so DataParallel knows to
# replicate it to each device
prop = deepwave.scalar.Propagator({'vp': model}, dx)
prop.register_parameter('vp', torch.nn.Parameter(model))
prop = torch.nn.DataParallel(prop)

If so, I will fix it in the next release of Deepwave so that the parameters are automatically registered.
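
(For context on why registration matters, a minimal standalone sketch, not Deepwave code: nn.DataParallel only replicates tensors that a module exposes as parameters or buffers, so a tensor stored as a plain attribute is shared by reference and stays on its original device.)

import torch

class Toy(torch.nn.Module):
    def __init__(self, vp):
        super().__init__()
        # Assigning an nn.Parameter registers it, so DataParallel copies
        # it to every device; a plain tensor attribute (self.vp = vp)
        # would stay wherever it was created.
        self.vp = torch.nn.Parameter(vp)

    def forward(self, x):
        return x * self.vp.sum()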

@vkazei
Contributor

vkazei commented Mar 23, 2022

Hi Alan,

Registering the model as a parameter did not change the behavior. Manually sending it to the same device as the inputs inside the forward method lets nn.DataParallel run, but it looks like the forward propagation for the different GPUs runs sequentially.
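
(One hedged sketch of the usual alternative when nn.DataParallel ends up serializing work, and not necessarily what later closed this issue: one process per GPU via torch.multiprocessing and torch.distributed, each rank propagating its own subset of shots and all-reducing the model gradient. The old deepwave.scalar.Propagator API and the shapes discussed above are assumed, and the helper names are made up for illustration.)

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import deepwave

def run_rank(rank, world_size, model_cpu, dx, dt,
             source_amplitudes, x_s, x_r, observed):
    # One process per GPU; NCCL backend for the gradient all-reduce.
    dist.init_process_group('nccl', init_method='tcp://127.0.0.1:29500',
                            rank=rank, world_size=world_size)
    device = torch.device('cuda', rank)
    model = model_cpu.to(device).requires_grad_()
    prop = deepwave.scalar.Propagator({'vp': model}, dx)

    # Each rank takes every world_size-th shot (shot axis: dim 1 of
    # source_amplitudes/observed, dim 0 of x_s/x_r).
    shots = torch.arange(rank, x_s.shape[0], world_size)
    pred = prop(source_amplitudes[:, shots].to(device),
                x_s[shots].to(device), x_r[shots].to(device), dt)
    loss = torch.nn.functional.mse_loss(pred, observed[:, shots].to(device))
    loss.backward()

    # Sum the velocity-model gradient over ranks before an optimizer step.
    dist.all_reduce(model.grad)
    dist.destroy_process_group()

# Launch, e.g.:
# mp.spawn(run_rank, args=(n_gpus, model, dx, dt, src, x_s, x_r, obs),
#          nprocs=n_gpus)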

@ar4
Owner Author

ar4 commented Mar 23, 2022 via email

ar4 closed this as completed in ab95326 on Sep 3, 2022