Multigpu Capability #2
Multi-GPU support is in the works, but I've run into issues with some PyTorch differences (while replicating the Lua/Torch7 code).
Hello ProGamerGov, and thanks for your tool. Any news about this support? Is it still planned? I would love to be able to use multiple GPUs too.
@LouSparfell I attempted to implement multi-GPU support here: https://github.com/ProGamerGov/neural-style-pt/tree/multi-gpu, but I've run into a bunch of issues. I don't have a readily available computer with multiple GPUs either, so I can't really test things. You are welcome to submit a pull request if you are able to get it working!
@ProGamerGov I'd like to make an attempt at writing the multi-GPU code. Would you have any free time for a quick chat about what issues you have already run into, so I don't trip over the same things?
@ajhool It's been a while since I was trying to get multi-GPU working, but my main issues were testing with multiple GPUs and dealing with the feval function (as PyTorch doesn't really have an exact equivalent of the Lua/Torch7 feval setup).
Okay, thanks. I'll check it out and see how things go.
@ajhool Alright, let me know how things go!
@ajhool I made progress on multi-device support using this guide I found: https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html Here's the current multi-gpu/multi-device branch: https://github.com/ProGamerGov/neural-style-pt/tree/multi-gpu Unfortunately, I am stuck on an error.
This is the setup function for multiple devices: https://github.com/ProGamerGov/neural-style-pt/blob/multi-gpu/neural_style.py#L303-L335 This is the class I created for the multi-device model: https://github.com/ProGamerGov/neural-style-pt/blob/multi-gpu/CaffeLoader.py#L110-L124 I got the model spread out across all of the selected devices successfully, but for some reason I can't run the input images through it. Any ideas?
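For context, the pattern that tutorial describes looks roughly like this; a minimal sketch with illustrative module names and split points, not the actual CaffeLoader code:

```python
import torch
import torch.nn as nn

class MultiDeviceModel(nn.Module):
    # Sketch: split a sequential model across two devices and move
    # the activations between them inside forward().
    def __init__(self, layers, split_index, device_a, device_b):
        super(MultiDeviceModel, self).__init__()
        self.chunk_a = nn.Sequential(*layers[:split_index]).to(device_a)
        self.chunk_b = nn.Sequential(*layers[split_index:]).to(device_b)
        self.device_b = device_b

    def forward(self, x):
        x = self.chunk_a(x)       # runs on device_a
        x = x.to(self.device_b)   # copy activations to the next device
        return self.chunk_b(x)    # runs on device_b
```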
Using multiple GPUs is now possible! Though I am not sure how I am going to deal with the feval function in a more elegant way. I am also not sure what effect, if any, the new code will have on neural-style-pt's speed, because it adds some redundant device handling. It should be possible to use both multiple GPUs and the CPU at the same time, or just a single GPU and the CPU at the same time, but I believe that will require some sort of conversion between tensor types in order to work. Due to this ability to use both CPUs and GPUs, I renamed the `-multigpu_strategy` parameter to `-multidevice_strategy`.
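A minimal sketch of the tensor type conversion that mixing the CPU and a GPU involves (device choices here are illustrative):

```python
import torch

gpu = torch.device("cuda:0")
cpu = torch.device("cpu")

x = torch.randn(1, 3, 256, 256, device=cpu)
# Moving between devices also changes the underlying tensor type:
# torch.FloatTensor on the CPU <-> torch.cuda.FloatTensor on the GPU.
x = x.to(gpu)   # now a torch.cuda.FloatTensor
x = x.to(cpu)   # back to a torch.FloatTensor
```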
@ProGamerGov Nice work adding this support. It's unclear if you're still seeing the input image issue, but I was never able to invest much time into figuring out multi-GPU PyTorch, so I'm of no help here. I like the multidevice strategy idea; that's clever. Following this chain [1] and [2], is it possible that the GPU 0 usage is the CUDA driver allocation made when loading PyTorch? I think a really useful feature would be to add a background nvidia-smi watcher with a high sample rate that could produce a plot of the memory usage; the native PyTorch utils apparently don't do well [2]. There is a spike at the beginning of the program that isn't captured by the steady-state memory usage, and it can be large enough to crash the program. It might also be useful in the dev/debug phase to determine what program is running on each GPU. I was never able to capture that entire memory usage profile (with the spike) in Lua, but maybe PyTorch makes it easier using a library like [3]. It's been a long time since I've used Python, but something like the sketch below might work:
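A rough sketch of the watcher idea, assuming nvidia-smi is on the PATH (the sampling interval and parsing here are guesses, untested against this repo):

```python
import subprocess
import threading
import time

samples = []
watching = True

def watch_gpu_memory(interval=0.05):
    # Poll nvidia-smi at a high sample rate and record per-GPU
    # memory usage in MiB, so the startup spike gets captured.
    while watching:
        out = subprocess.check_output([
            "nvidia-smi", "--query-gpu=memory.used",
            "--format=csv,noheader,nounits"])
        samples.append([int(v) for v in out.decode().split()])
        time.sleep(interval)

t = threading.Thread(target=watch_gpu_memory)
t.start()
# ... run the style transfer here ...
watching = False
t.join()
# samples can now be plotted (e.g. with matplotlib) to see the spike.
```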
You could also try disabling GPU 0 and seeing what breaks. [2] https://discuss.pytorch.org/t/memory-cached-and-memory-allocated-does-not-nvidia-smi-result/28420/2
nvgpu just uses nvidia-smi under the hood, and I think I can replicate the behavior in an easier way with an nvidia-smi command:
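The original command wasn't preserved above, but something along these lines samples per-GPU memory use in a loop:

```
nvidia-smi --query-gpu=index,memory.used --format=csv -lms 200
```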
I got CPU device support working, and I'm not sure if I can reproduce the error while using both a GPU and the CPU as devices. I think the way I used the dtype variable in my code puts the input images on GPU:0, because it's the default GPU. But I move the inputs to their device afterwards, so I don't think that should matter? In my experiments that I shared here: #20, GPU:0 went from 5549MiB to 4973MiB when I only had layer 1 on it, so sticking most of the model on the other devices still left some usage on GPU:0. I was thinking that maybe the dtype handling is responsible.
@ajhool I found a memory tracking library called pytorch_memlab and used it to track the memory usage line by line in my code. Here's the short version: https://gist.github.com/ProGamerGov/0ab55d9b23bb409ca116188883f4a1fd And here's the full line-by-line memory tracking output: https://gist.github.com/ProGamerGov/de7a8734e05011018d535385de31b034 It also shows what tensors exist on each device; I used pytorch_memlab for all of these reports.
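For anyone trying to reproduce this, pytorch_memlab is used roughly like so (a sketch; the profiled function here is hypothetical, not the actual neural-style-pt code):

```python
import torch
from pytorch_memlab import profile, MemReporter

@profile
def run_step(model, x):
    # The @profile decorator prints per-line CUDA memory usage
    # every time the decorated function runs.
    loss = model(x).sum()
    loss.backward()
    return loss

# MemReporter lists every live tensor and the device it sits on.
reporter = MemReporter()
reporter.report()
```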
Looks like the anomalous GPU:0 memory usage comes from here; the line-by-line memory usage by default only tells you what GPU:0 is using.
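To check usage on every device rather than just GPU:0, torch.cuda's per-device counters can be queried directly (a quick sketch):

```python
import torch

for i in range(torch.cuda.device_count()):
    # The allocator's view of each device; this can differ from
    # what nvidia-smi reports, since cached blocks are counted too.
    print("cuda:%d allocated=%d cached=%d" % (
        i, torch.cuda.memory_allocated(i), torch.cuda.memory_cached(i)))
```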
Here are the line-by-line memory usage results and what tensors exist when I'm using multiple GPUs: https://gist.github.com/ProGamerGov/e383bc19023d72022e5e426f4a0260af
@ajhool I figured it out! It was actually something that I had suspected initially, but I didn't make the connection until I saw the note in the model parallel tutorial: I was converting my input tensors to CUDA before running them through my model with a plain `.cuda()` call, which puts them on the default device. I'm not sure if this is a bug that I should report to PyTorch? Here's the fix: d08b594
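The gist of the fix, as a sketch (not the literal diff from d08b594): a bare `.cuda()` call targets the current default device, GPU:0, while `.to(device)` targets the intended device explicitly.

```python
import torch

device = torch.device("cuda:1")  # device holding the first model chunk

x = torch.randn(1, 3, 512, 512)
# Buggy: lands on the default device (cuda:0) no matter which
# device the first chunk of the model actually lives on.
x_bad = x.cuda()
# Fixed: explicitly move the input to the first chunk's device.
x_good = x.to(device)
```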
Now, all I need to figure out is whether I can turn the new multi-device section of code back into the simpler single-device version it replaced.
Edit: Maybe it's okay to leave the code like this? In some previous testing, I didn't notice any increase in speed when I was only using a single GPU.
- You can now use multiple GPUs in the same way that you could in the original neural-style.
- The `-multigpu_strategy` parameter was renamed to `-multidevice_strategy`. #2
- You can use any combination of GPUs and your CPU as devices.
So, apparently the seed behaves differently now? Though I changed nothing with the seed code, other than adding the multi-device changes. Edit: I removed the problematic code.
- You can now use multiple GPUs in the same way that you could in the original neural-style with the `-multidevice_strategy` parameter. #2
- You can use any combination of GPUs and your CPU as devices with the `-multidevice_strategy` parameter.
- New `-disable_check` parameter for advanced users. #5
- AMD GPU support.
- Changed `-lbfgs_num_correction` default. #7
I've merged the multi-gpu branch into both the master branch and the pip-master branch! If you experience any issues with the new update, let me know!
I'm going to close this issue now, as the update seems to be working well without any issues. If you experience any issues with the multi-device feature, please make a new issue.
So, I was able to achieve a larger `-image_size` output using multiple devices.
This was the strategy I used:
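The exact values from this run aren't shown above, but for illustration, a multi-device invocation looks something like this (the layer indices and image size below are hypothetical; `-multidevice_strategy` lists the layer indices where the model moves to the next device):

```
python neural_style.py -gpu 0,1,2,3 -multidevice_strategy 3,6,12 -image_size 2000
```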
I wonder how much higher I could go with the Adam optimizer and a less memory-demanding model like VGG-16, Channel Pruning, or NIN?
Fantastic work! Managed to get it to run on Windows 10.
Just curious: does it support a `-multigpu` argument like the one in the Torch7 version?