
GPU Tutorial #28

Closed
iancze opened this issue Apr 15, 2021 · 9 comments
Labels
documentation Improvements or additions to documentation roadmap Planned development
Milestone

Comments

@iancze
Collaborator

iancze commented Apr 15, 2021

Is your feature request related to a problem or opportunity? Please describe.
One great benefit of PyTorch is the ability to easily run on GPU-accelerated hardware for substantial speedups. Currently, none of the tutorials show off this functionality, though it exists.

Describe the solution you'd like
At minimum, a section of a tutorial incorporating transfer code like in the PyTorch tutorial. For monolithic batches (common to most simple RML imaging problems), GPU accelerated training loops are the quickest way to a significant training speedup.
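A minimal sketch of the transfer pattern meant here, assuming a monolithic batch that fits in GPU memory at once (the loss below is a placeholder standing in for the RML imaging objective, and all names are hypothetical):

```python
import torch

# pick the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# hypothetical monolithic batch: all data loaded to the device up front
data = torch.randn(10_000, 2).to(device)

# hypothetical image-plane parameters, created directly on the device
params = torch.zeros(64, 64, requires_grad=True, device=device)
optimizer = torch.optim.SGD([params], lr=1e-4)

for _ in range(100):
    optimizer.zero_grad()
    # placeholder scalar loss standing in for the RML objective
    loss = (params.mean() - data.mean()) ** 2
    loss.backward()
    optimizer.step()
```

Because both the data and the parameters live on the device, the whole loop runs there with no per-iteration host/device copies.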

Alternative solutions
For batched training workflows (e.g., where each batch is an execution block of a measurement set), a distributed training loop (across multiple CPUs or even GPUs) has the potential to be faster.

@iancze iancze added documentation Improvements or additions to documentation roadmap Planned development labels Apr 15, 2021
@iancze iancze mentioned this issue Apr 15, 2021
10 tasks
@iancze iancze added this to the v0.1.1 milestone Apr 15, 2021
@iancze iancze modified the milestones: v0.1.1, v0.1.2 May 5, 2021
@trq5014
Contributor

trq5014 commented Jun 1, 2021

I was able to download and install CUDA for GPU use; however, I am not able to get PyTorch to see it (torch.cuda.is_available() returns False). I have restarted my computer and checked the $CUDA_PATH variables, and everything looks good. I can use Numba with the GPU as far as I can tell, but PyTorch won't recognize it. Does PyTorch require the TCC driver?
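For reference, a quick way to see what the installed PyTorch build reports about CUDA (a CPU-only wheel will show a version string like "1.8.1+cpu" and report no CUDA version at all):

```python
import torch

# a CPU-only wheel reports "+cpu" in the version string
print(torch.__version__)

# None for CPU-only builds, a version string like "11.1" otherwise
print(torch.version.cuda)

# the check in question
print(torch.cuda.is_available())
```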


@iancze iancze added this to To do in DSHARP HD 143006 Tutorial via automation Jun 1, 2021
@iancze
Collaborator Author

iancze commented Jun 1, 2021

I'm not sure whether you need to specifically configure CUDA, as long as you install the right version of PyTorch for your hardware.

There is a little install-selector widget on the PyTorch homepage: https://pytorch.org/ that might give a hint of what to try.

It looks like for Windows you might need to do something like
pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

If this resolves it, we should make a note in the installation instructions. I've just used the default pip install torch torchvision, and that has always seemed to pick up the GPU on the Roar compute environment.

@trq5014
Contributor

trq5014 commented Jun 1, 2021

Sounds good, I will check on this and get back with any further issues. This specific issue isn't well documented online, so it will be helpful to include it in the tutorial.

@trq5014
Contributor

trq5014 commented Jun 1, 2021

Using that worked; the plain pip install torch torchvision was not working on my system for whatever reason. I will include this fix in the tutorial.

@iancze
Collaborator Author

iancze commented Jun 1, 2021

Great! A brief note in the tutorial would be good, and a short note in the "Installation.rst" file about the possibility of GPU-specific installs would be great, thanks!

@trq5014
Contributor

trq5014 commented Jun 1, 2021

Would it be possible to include a screenshot (.png) of the "Install PyTorch" selector from the PyTorch homepage in the .rst file, for greater clarity on this?

@iancze
Collaborator Author

iancze commented Jun 1, 2021

It's possible, but probably not necessary. I would just add a sentence or two saying that torch may need to be (re)installed separately and that more information is available on the PyTorch homepage.

@trq5014
Contributor

trq5014 commented Jun 2, 2021

For the GPU tutorial itself, would it be sufficient to follow the Optimization Loop tutorial on the documentation site, show it running on the GPU, and then compare the runtime of the loop on the CPU against the GPU, and also against the cluster (with multiple GPUs)? Or would you prefer something else? Also, using the GPU may be worth adding to the chapters of issue #25.
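A sketch of the kind of CPU/GPU timing comparison described, with a matrix multiply standing in for the optimization loop (sizes and repetition counts are arbitrary):

```python
import time
import torch

def time_matmul(device, n=512, reps=10):
    """Time `reps` n-by-n matrix multiplies on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # finish pending kernels before timing
    start = time.perf_counter()
    for _ in range(reps):
        c = a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()  # GPU kernels launch asynchronously
    return time.perf_counter() - start

cpu_time = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_time:.4f} s")
if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device("cuda:0"))
    print(f"GPU: {gpu_time:.4f} s")
```

The explicit torch.cuda.synchronize() calls matter: without them the timer stops before the GPU has actually finished, making the GPU look misleadingly fast.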

@iancze
Collaborator Author

iancze commented Jun 2, 2021

I think this first GPU tutorial can be on a much smaller scale and doesn't need to require a fully running optimization loop. The main challenge here is that GitHub Actions (where we run the tests and build the documentation and tutorials for the docs) doesn't yet provide a GPU environment. So we won't be able to test any particular tutorial as part of the continuous integration environment, and will just need to rely on our local tests where we have GPUs. If we did have this functionality available, then I agree a loop comparing the runtimes would be really nice.

I was thinking this could be a very short example of how to initialize a model and then transfer it to the GPU to run. Given the restrictions on GPU CI, it might be easiest to write this as a static *.rst file with Python code snippets (e.g., https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#showing-code-examples) highlighting the main idea. For example, a snippet like

# query to see if we have a GPU
if torch.cuda.is_available():
    device = "cuda:0"
else:
    device = "cpu"

and then a few examples on how to initialize/transfer tensors and nn.modules to and from the GPU, like in here: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html
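Those initialize/transfer examples could be sketched along these lines (nn.Linear is just a stand-in for an MPoL model):

```python
import torch
import torch.nn as nn

# pick the device: GPU if available, otherwise the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# tensors are created on the CPU by default; .to() copies them to the device
x = torch.randn(100, 3).to(device)

# a module's parameters move in place with .to()
model = nn.Linear(3, 1).to(device)

# inputs and parameters must live on the same device for the forward pass
y = model(x)

# bring results back to the CPU (e.g., for numpy conversion or plotting)
y_cpu = y.detach().cpu()
```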

We'll most likely return to this idea in the production-ready scripts of #63 , that we ourselves will want to run on Roar w/ multiple GPUs. So we will cite/link to this tutorial for more explanation.

trq5014 pushed a commit to trq5014/MPoL that referenced this issue Jun 3, 2021
Updated the installation.rst file to include potential issues when installing torch and torchvision for CUDA work
Added a gpu_setup_tutorial.rst file to the docs/tutorials folder
@iancze iancze moved this from To do to In progress in DSHARP HD 143006 Tutorial Jun 10, 2021
@iancze iancze closed this as completed in 10e77a4 Jun 10, 2021
DSHARP HD 143006 Tutorial automation moved this from In progress to Done Jun 10, 2021