
[PyTorch/Segmentation/nnUNet] If multiple GPUs requested code will not run #1189

Open
vijaypshah opened this issue Aug 14, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@vijaypshah

Related to Model/Framework(s)
PyTorch/Segmentation/nnUNet

Describe the bug
I am trying to run the example provided for nnUNet. The code works fine when I use a single GPU; however, when I request 2 GPUs it does not run.
The following command works:
python scripts/benchmark.py --mode train --gpus 1 --dim 3 --batch_size 2 --amp

The following command gets stuck:
python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp

387 training, 97 validation, 484 test examples
Filters: [32, 64, 128, 256, 320, 320],
Kernels: [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]
Strides: [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called full_state_update that has
not been set for this class (Dice). The property determines if update by
default needs access to the full metric state. If this is not the case, significant speedups can be
achieved and we recommend setting this to False.
We provide an checking function
from torchmetrics.utilities import check_forward_full_state_property
that can be used to check if the full_state_update=True (old and potential slower behaviour,
default for now) or if full_state_update=False can be used safely.

warnings.warn(*args, **kwargs)
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default ModelSummary callback.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:133: UserWarning: You defined a validation_step but have no val_dataloader. Skipping val loop.
rank_zero_warn("You defined a validation_step but have no val_dataloader. Skipping val loop.")
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2

To Reproduce
Steps to reproduce the behavior:

  1. Install '...':
     git clone https://github.com/NVIDIA/DeepLearningExamples
     cd DeepLearningExamples/PyTorch/Segmentation/nnUNet
     docker build -t nnunet .
     mkdir data results
     sudo singularity build nnunetMultiGPU.sif docker-daemon://nnunet:latest

  2. Launch : singularity shell --nv -B ${PWD}/data:/data -B ${PWD}/results:/results -B ${PWD}:/workspace nnunetMultiGPU.sif
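
To help isolate whether the hang is in the nnUNet/Lightning stack or in the underlying NCCL setup, a minimal multi-GPU check can be run inside the same container. The script below is only an illustrative sketch, not part of the repository; the filename, port, and tensor values are arbitrary. It spawns one process per visible GPU and performs a single NCCL all_reduce, which should print immediately on a healthy 2-GPU system.

# ddp_sanity_check.py -- illustrative only, not part of the nnUNet repo.
# Spawns one process per visible GPU and runs a single NCCL all_reduce
# to confirm that basic multi-GPU communication works inside the container.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def run(rank: int, world_size: int) -> None:
    # Rendezvous settings; the port is arbitrary and assumed to be free.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Each rank contributes its rank index; with 2 GPUs the summed result is 1.0.
    t = torch.tensor([float(rank)], device=f"cuda:{rank}")
    dist.all_reduce(t)
    print(f"rank {rank}: all_reduce ok, result = {t.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size,), nprocs=world_size)

If this check also hangs, the problem is likely in the system's NCCL/GPU interconnect configuration rather than in the nnUNet code.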

Expected behavior
Training should start as shown in the example.

Environment
Please provide at least:

  • Container version (e.g. pytorch:19.05-py3): PyTorch 21.11 NGC container
  • GPUs in the system: (e.g. 8x Tesla V100-SXM2-16GB): 2x Tesla V100-SXM3-32GB
  • CUDA driver version (e.g. 418.67): 440.118.02
@vijaypshah vijaypshah added the bug Something isn't working label Aug 14, 2022
@vijaypshah vijaypshah changed the title from [Model/Framework] What is the problem? to [PyTorch/Segmentation/nnUNet] If multiple GPUs requested code will not run on Aug 14, 2022
@michal2409
Contributor

michal2409 commented Aug 14, 2022

Hi,

I've run the command for 2 GPUs and it works fine for me:

root@6e38dc6f86a4:/workspace/nnunet_pyt# python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp
387 training, 97 validation, 484 test examples
Filters: [32, 64, 128, 256, 320, 320],
Kernels: [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]
Strides: [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Torchmetrics v0.9 introduced a new argument class property called `full_state_update` that has
                not been set for this class (Dice). The property determines if `update` by
                default needs access to the full metric state. If this is not the case, significant speedups can be
                achieved and we recommend setting this to `False`.
                We provide an checking function
                `from torchmetrics.utilities import check_forward_full_state_property`
                that can be used to check if the `full_state_update=True` (old and potential slower behaviour,
                default for now) or if `full_state_update=False` can be used safely.
                
  warnings.warn(*args, **kwargs)
Using 16bit native Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:133: UserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
  rank_zero_warn("You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.")
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name               | Type             | Params
--------------------------------------------------------
0 | model              | DynUNet          | 31.2 M
1 | model.input_block  | UnetBasicBlock   | 31.2 K
2 | model.downsamples  | ModuleList       | 8.5 M 
3 | model.bottleneck   | UnetBasicBlock   | 5.5 M 
4 | model.upsamples    | ModuleList       | 17.2 M
5 | model.output_block | UnetOutBlock     | 132   
6 | model.skip_layers  | DynUNetSkipLayer | 31.2 M
7 | loss               | Loss             | 0     
8 | loss.loss_fn       | DiceCELoss       | 0     
9 | dice               | Dice             | 0     
--------------------------------------------------------
31.2 M    Trainable params
0         Non-trainable params
31.2 M    Total params
62.386    Total estimated model params size (MB)
Epoch 0    ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/150 0:00:23 • 0:01:14 2.00it/s loss: 2.81

I've found that this might be a PyTorch Lightning issue on some systems; please check Lightning-AI/pytorch-lightning#4612
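
On systems affected by that issue, a general debugging step (not specific to this repository) is to rerun the launch with NCCL debug logging enabled, for example NCCL_DEBUG=INFO python scripts/benchmark.py --mode train --gpus 2 --dim 3 --batch_size 2 --amp; the extra output usually shows whether the second rank ever joins the NCCL rendezvous or where transport initialization stalls. Note that in the report above only "GLOBAL_RANK: 0, MEMBER: 1/2" is printed, i.e. the second process never registers.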
