
[PyTorch/Segmentation/nnUNet] If multiple GPUs requested code will not run #1189

Open
vijaypshah opened this issue Aug 14, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@vijaypshah

Related to Model/Framework(s)
PyTorch/Segmentation/nnUNet

Describe the bug
I am trying to run the example provided for nnUNet. The code works fine when I use a single GPU; however, when I request 2 GPUs it does not run.
The following command works:
python scripts/benchmark.py --mode train --gpus 1 --dim 3 --batch_size 2 --amp

The following command gets stuck:
python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp

387 training, 97 validation, 484 test examples
Filters: [32, 64, 128, 256, 320, 320],
Kernels: [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]
Strides: [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called full_state_update that has
not been set for this class (Dice). The property determines if update by
default needs access to the full metric state. If this is not the case, significant speedups can be
achieved and we recommend setting this to False.
We provide an checking function
from torchmetrics.utilities import check_forward_full_state_property
that can be used to check if the full_state_update=True (old and potential slower behaviour,
default for now) or if full_state_update=False can be used safely.

warnings.warn(*args, **kwargs)
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default ModelSummary callback.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:133: UserWarning: You defined a validation_step but have no val_dataloader. Skipping val loop.
rank_zero_warn("You defined a validation_step but have no val_dataloader. Skipping val loop.")
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2

To Reproduce
Steps to reproduce the behavior:

  1. Install '...':
     git clone https://github.com/NVIDIA/DeepLearningExamples
     cd DeepLearningExamples/PyTorch/Segmentation/nnUNet
     docker build -t nnunet .
     mkdir data results
     sudo singularity build nnunetMultiGPU.sif docker-daemon://nnunet:latest

  2. Launch : singularity shell --nv -B ${PWD}/data:/data -B ${PWD}/results:/results -B ${PWD}:/workspace nnunetMultiGPU.sif
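
To help isolate whether the hang is in the nnUNet/Lightning stack or in the underlying NCCL setup, a minimal multi-GPU check can be run inside the same container. The script below is only an illustrative sketch, not part of the repository; the filename, port, and tensor values are arbitrary. It spawns one process per visible GPU and performs a single NCCL all_reduce, which should print immediately on a healthy 2-GPU system.

# ddp_sanity_check.py -- illustrative only, not part of the nnUNet repo.
# Spawns one process per visible GPU and runs a single NCCL all_reduce
# to confirm that basic multi-GPU communication works inside the container.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def run(rank: int, world_size: int) -> None:
    # Rendezvous settings; the port is arbitrary and assumed to be free.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Each rank contributes its rank index; with 2 GPUs the summed result is 1.0.
    t = torch.tensor([float(rank)], device=f"cuda:{rank}")
    dist.all_reduce(t)
    print(f"rank {rank}: all_reduce ok, result = {t.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size,), nprocs=world_size)

If this check also hangs, the problem is likely in the system's NCCL/GPU interconnect configuration rather than in the nnUNet code.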

Expected behavior
Training should start as shown in the example.

Environment
Please provide at least:

  • Container version (e.g. pytorch:19.05-py3): PyTorch 21.11 NGC container
  • GPUs in the system: (e.g. 8x Tesla V100-SXM2-16GB): 2x Tesla V100-SXM3-32GB
  • CUDA driver version (e.g. 418.67): 440.118.02
@vijaypshah vijaypshah added the bug Something isn't working label Aug 14, 2022
@vijaypshah vijaypshah changed the title from [Model/Framework] What is the problem? to [PyTorch/Segmentation/nnUNet] If multiple GPUs requested code will not run on Aug 14, 2022
@michal2409
Contributor

michal2409 commented Aug 14, 2022

Hi,

I've run the command for 2 GPUs and it works fine for me:

root@6e38dc6f86a4:/workspace/nnunet_pyt# python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp
387 training, 97 validation, 484 test examples
Filters: [32, 64, 128, 256, 320, 320],
Kernels: [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]
Strides: [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called `full_state_update` that has
                not been set for this class (Dice). The property determines if `update` by
                default needs access to the full metric state. If this is not the case, significant speedups can be
                achieved and we recommend setting this to `False`.
                We provide an checking function
                `from torchmetrics.utilities import check_forward_full_state_property`
                that can be used to check if the `full_state_update=True` (old and potential slower behaviour,
                default for now) or if `full_state_update=False` can be used safely.
                
  warnings.warn(*args, **kwargs)
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:133: UserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
  rank_zero_warn("You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.")
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name               | Type             | Params
--------------------------------------------------------
0 | model              | DynUNet          | 31.2 M
1 | model.input_block  | UnetBasicBlock   | 31.2 K
2 | model.downsamples  | ModuleList       | 8.5 M 
3 | model.bottleneck   | UnetBasicBlock   | 5.5 M 
4 | model.upsamples    | ModuleList       | 17.2 M
5 | model.output_block | UnetOutBlock     | 132   
6 | model.skip_layers  | DynUNetSkipLayer | 31.2 M
7 | loss               | Loss             | 0     
8 | loss.loss_fn       | DiceCELoss       | 0     
9 | dice               | Dice             | 0     
--------------------------------------------------------
31.2 M    Trainable params
0         Non-trainable params
31.2 M    Total params
62.386    Total estimated model params size (MB)
Epoch 0    ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/150 0:00:23 • 0:01:14 2.00it/s loss: 2.81

I've found that this might be a PyTorch Lightning issue on some systems; please check Lightning-AI/pytorch-lightning#4612
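
On systems affected by that issue, a general debugging step (not specific to this repository) is to rerun the launch with NCCL debug logging enabled, for example NCCL_DEBUG=INFO python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp; the extra output usually shows whether the second rank ever joins the NCCL rendezvous or where transport initialization stalls. Note that in the report above only "GLOBAL_RANK: 0, MEMBER: 1/2" is printed, i.e. the second process never registers.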
