set_device error for non-integer CUDA_VISIBLE_DEVICES environment #2420

Closed
jgbos opened this issue Jun 29, 2020 · 1 comment
Labels
bug Something isn't working help wanted Open to be worked on

Comments

@jgbos
Contributor

jgbos commented Jun 29, 2020

🐛 Bug

My cluster does not set CUDA_VISIBLE_DEVICES to a list of integers; instead it is a list of hashes. As a result, my code crashes on this line:

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/distrib_data_parallel.py#L509

To Reproduce

This can be reproduced by setting CUDA_VISIBLE_DEVICES to a list of non-integers.
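A minimal sketch of the failure mode, assuming the linked line converts each entry of CUDA_VISIBLE_DEVICES with `int()` (the exact code at that line is not quoted here). The device identifiers and the helper `parse_visible_devices_as_ints` are hypothetical and only illustrate the parsing step:

```python
import os

# Hypothetical hash-style identifiers; real values depend on the cluster.
os.environ["CUDA_VISIBLE_DEVICES"] = "GPU-3f2a9c1e,GPU-7b8d4e2f"


def parse_visible_devices_as_ints(value):
    """Mimic code that assumes every entry is an integer device index."""
    return [int(entry) for entry in value.split(",")]


try:
    parse_visible_devices_as_ints(os.environ["CUDA_VISIBLE_DEVICES"])
    parse_failed = False
except ValueError as err:
    # int("GPU-3f2a9c1e") raises ValueError, which is the crash described above.
    parse_failed = True
    print(f"ValueError: {err}")
```

With an integer list such as `"0,1"` the same helper returns `[0, 1]`, so the problem only appears when the scheduler exports non-integer identifiers.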

Expected behavior

Shouldn't local_rank already be a number in the range [0, num_gpus-1]? Is there a reason to actually choose from this list of devices (perhaps a use case I'm not aware of)? Maybe just set local_rank = global_rank % num_gpus?
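The suggestion above could look roughly like the sketch below. It assumes every node exposes the same number of GPUs and that global ranks are assigned contiguously per node. The helper names and device identifiers are hypothetical. Since CUDA renumbers the visible devices from 0 inside the process, the resulting index could be passed to `torch.cuda.set_device` without parsing the entries themselves:

```python
def count_visible_devices(value: str) -> int:
    """Count entries in a CUDA_VISIBLE_DEVICES-style string without parsing them."""
    return len([entry for entry in value.split(",") if entry.strip()])


def local_rank_from_global(global_rank: int, num_gpus: int) -> int:
    """Map a global rank to a per-node device index in [0, num_gpus - 1]."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return global_rank % num_gpus


# Hypothetical hash-style identifiers, as in the report above.
visible = "GPU-3f2a9c1e,GPU-7b8d4e2f"
num_gpus = count_visible_devices(visible)

# Four processes across two nodes with two GPUs each.
ranks = [local_rank_from_global(rank, num_gpus) for rank in range(4)]
print(ranks)  # [0, 1, 0, 1]
```

Only the number of entries is used here, so the format of each identifier no longer matters.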

Environment

  • CUDA:
    - GPU:
      - Tesla V100-PCIE-32GB
      - Tesla V100-PCIE-32GB
    - available: True
    - version: 10.2
  • Packages:
    - numpy: 1.18.1
    - pyTorch_debug: False
    - pyTorch_version: 1.5.0
    - pytorch-lightning: 0.8.2
    - tensorboard: 2.2.1
    - tqdm: 4.46.0
  • System:
    - OS: Linux
    - architecture: 64bit
    - processor: x86_64
    - python: 3.7.7
    - version: #1 SMP Fri May 29 11:57:47 EDT 2020
@jgbos jgbos added bug Something isn't working help wanted Open to be worked on labels Jun 29, 2020
@edenlightning
Contributor

Closing this as it should be fixed. @jgbos please try master, and let us know if you run into any issues!
