how to train on my designated GPU? #68

ChibisukeDragon · 2021-08-20T09:37:47Z

I want to train this model on GPU 4.
I used this command:

for FOLD in 0 1 2 3 4; do CUDA_VISIBLE_DEVICES=4 nnUNet_train 3d_fullres nnUNetPlusPlusTrainerV2 Task003_Liver $FOLD; done

But according to the log below, it always tried to allocate memory on GPU 0.
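
For comparison, the same per-fold launch can be sketched in Python. This is only an illustration: `fold_job` is a hypothetical helper, not part of nnUNet, and the only things taken from the post are the `nnUNet_train` arguments and the `CUDA_VISIBLE_DEVICES=4` setting. The variable is placed in the child process environment, which has the same effect as the shell prefix.

```python
import os


def fold_job(fold, gpu_id):
    """Hypothetical helper: build the command and environment for one fold."""
    # Arguments copied from the command in the post.
    cmd = ["nnUNet_train", "3d_fullres", "nnUNetPlusPlusTrainerV2",
           "Task003_Liver", str(fold)]
    # Only the listed GPU is exposed to the child process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return cmd, env


jobs = [fold_job(fold, gpu_id=4) for fold in range(5)]

# To actually launch the folds one after another (needs nnUNet installed):
#   import subprocess
#   for cmd, env in jobs:
#       subprocess.run(cmd, env=env, check=True)
```

The environment is built per job so that the parent process keeps seeing all GPUs.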

heyupeng_2020@irip-114:~$ for FOLD in 0 1 2 3 4
do
CUDA_VISIBLE_DEVICES=4 nnUNet_train 3d_fullres nnUNetPlusPlusTrainerV2 Task003_Liver $FOLD
done

Please cite the following paper when using nnUNet:
Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020).
If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetPlusPlusTrainerV2.nnUNetPlusPlusTrainerV2'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0
{'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([195, 207, 207]), 'current_spacing': array([2.473119 , 1.89831205, 1.89831205]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1
{'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([482, 512, 512]), 'current_spacing': array([1. , 0.76757812, 0.76757812]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /mnt2/heyupeng_2020/environment_variables/nnUNet_preprocessed/Task003_Liver/nnUNetData_plans_v2.1
###############################################
loading dataset
loading all case properties
unpacking dataset
done
weight_decay: 3e-05
2021-08-20 17:30:57.956403: lr: 0.01
using pin_memory on device 0
using pin_memory on device 0
2021-08-20 17:32:23.117584: Unable to plot network architecture:
2021-08-20 17:32:31.011488: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total capacity; 8.80 GiB already allocated; 401.00 MiB free; 9.75 GiB reserved in total by PyTorch)
2021-08-20 17:32:31.011869:
printing the network instead:

2021-08-20 17:32:31.012049: Generic_UNetPlusPlus(
(loc0): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(640, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(1): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(768, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(2): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(512, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(3): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(320, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(4): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(192, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(loc1): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(512, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(1): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(384, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(2): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(3): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(160, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(loc2): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(1): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(192, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(2): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(loc3): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(1): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(96, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(loc4): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(conv_blocks_context): ModuleList(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(2): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(3): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 256, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(4): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 320, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(5): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(td): ModuleList()
(up0): ModuleList(
(0): ConvTranspose3d(320, 320, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): ConvTranspose3d(320, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(2): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(3): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(4): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(up1): ModuleList(
(0): ConvTranspose3d(320, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(2): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(3): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(up2): ModuleList(
(0): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(2): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(up3): ModuleList(
(0): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(up4): ModuleList(
(0): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(seg_outputs): ModuleList(
(0): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(1): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(2): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(3): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(4): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
)
)
2021-08-20 17:32:31.022252:

2021-08-20 17:32:31.371178:
epoch: 0
Traceback (most recent call last):
File "/home/heyupeng_2020/anaconda3/bin/nnUNet_train", line 33, in <module>
sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/run/run_training.py", line 148, in main
trainer.run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 422, in run_training
ret = super().run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetTrainer.py", line 316, in run_training
super(nnUNetTrainer, self).run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/network_trainer.py", line 491, in run_training
l = self.run_iteration(self.tr_gen, True)
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 240, in run_iteration
output = self.network(data)
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/network_architecture/generic_UNetPlusPlus.py", line 417, in forward
x0_4 = self.loc1[3](torch.cat([x0_0, x0_1, x0_2, x0_3, self.up13], 1))
RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total capacity; 8.69 GiB already allocated; 487.00 MiB free; 9.66 GiB reserved in total by PyTorch)
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 99, in results_loop
raise RuntimeError("Someone died. Better end this madness. This is not the actual error message! Look "
RuntimeError: Someone died. Better end this madness. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full
Exception in thread Thread-5:
Traceback (most recent call last):
File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 99, in results_loop
raise RuntimeError("Someone died. Better end this madness. This is not the actual error message! Look "
RuntimeError: Someone died. Better end this madness. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full

Please cite the following paper when using nnUNet:
Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020).
If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetPlusPlusTrainerV2.nnUNetPlusPlusTrainerV2'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0
{'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([195, 207, 207]), 'current_spacing': array([2.473119 , 1.89831205, 1.89831205]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1
{'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([482, 512, 512]), 'current_spacing': array([1. , 0.76757812, 0.76757812]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /mnt2/heyupeng_2020/environment_variables/nnUNet_preprocessed/Task003_Liver/nnUNetData_plans_v2.1
###############################################
loading dataset
loading all case properties
unpacking dataset
done
weight_decay: 3e-05
2021-08-20 17:33:23.698772: lr: 0.01
^CTraceback (most recent call last):
File "/home/heyupeng_2020/anaconda3/bin/nnUNet_train", line 33, in <module>
sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/run/run_training.py", line 148, in main
trainer.run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 422, in run_training
ret = super().run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetTrainer.py", line 316, in run_training
super(nnUNetTrainer, self).run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/network_trainer.py", line 453, in run_training
_ = self.tr_gen.next()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 190, in next
return self.next()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 211, in next
self._start()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 246, in _start
with threadpool_limits(limits=1, user_api="blas"):
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 171, in __init__
self._original_info = self._set_threadpool_limits()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 268, in _set_threadpool_limits
modules = _ThreadpoolInfo(prefixes=self._prefixes,
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 340, in __init__
self._load_modules()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 375, in _load_modules
self._find_modules_with_dl_iterate_phdr()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 387, in _find_modules_with_dl_iterate_phdr
libc = self._get_libc()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 553, in _get_libc
libc_name = find_library("c")
File "/home/heyupeng_2020/anaconda3/lib/python3.8/ctypes/util.py", line 350, in find_library
_findSoname_ldconfig(name) or
File "/home/heyupeng_2020/anaconda3/lib/python3.8/ctypes/util.py", line 290, in _findSoname_ldconfig
with subprocess.Popen(['/sbin/ldconfig', '-p'],
File "/home/heyupeng_2020/anaconda3/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/heyupeng_2020/anaconda3/lib/python3.8/subprocess.py", line 1662, in _execute_child
part = os.read(errpipe_read, 50000)
KeyboardInterrupt
^C
heyupeng_2020@irip-114:~$

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8106      C   python                                     11075MiB |
|    1      8106      C   python                                      6681MiB |
|    2      8106      C   python                                      6681MiB |
|    3      8106      C   python                                      6681MiB |
|    6     30886      C   python                                      6115MiB |
|    7     20708      C   python                                      9571MiB |
+-----------------------------------------------------------------------------+
heyupeng_2020@irip-114:~$
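
The figures in the out-of-memory message from the log above can be pulled apart to see how full the device was. The sketch below is only an illustration: `parse_oom` is a hypothetical helper, and the regular expressions assume the exact message wording shown in this log, which may differ in other PyTorch versions.

```python
import re

# Message copied from the traceback in the log above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total "
    "capacity; 8.69 GiB already allocated; 487.00 MiB free; 9.66 GiB reserved "
    "in total by PyTorch)"
)

UNIT_IN_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024}


def parse_oom(message):
    """Hypothetical helper: extract the sizes (in GiB) and the device index."""
    fields = {}
    sizes = re.findall(
        r"(\d+(?:\.\d+)?) (GiB|MiB) (total capacity|already allocated|free|reserved)",
        message,
    )
    for value, unit, label in sizes:
        fields[label] = float(value) * UNIT_IN_GIB[unit]
    requested = re.search(r"Tried to allocate (\d+(?:\.\d+)?) (GiB|MiB)", message)
    fields["requested"] = float(requested.group(1)) * UNIT_IN_GIB[requested.group(2)]
    fields["device"] = int(re.search(r"GPU (\d+);", message).group(1))
    return fields


info = parse_oom(MESSAGE)
print(f"device {info['device']}: requested {info['requested']:.2f} GiB, "
      f"free {info['free']:.2f} GiB, "
      f"reserved by PyTorch {info['reserved'] / info['total capacity']:.0%} of the card")
```

Most of the card is reserved by the training process itself, so the request fails because this one job needs more memory than the device has.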


ChibisukeDragon · 2021-08-20T14:18:54Z

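I think the "GPU 0" in the log is actually my GPU 4: with CUDA_VISIBLE_DEVICES=4, only that card is visible to PyTorch, so it is reported as device 0.
So the real problem is memory, and I should not use a 1080 Ti (10.92 GiB) to train this model.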
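
The renumbering suspected here can be illustrated with a small sketch. `physical_gpu` is a hypothetical helper, not a PyTorch or nnUNet function; it only mimics how the entries of `CUDA_VISIBLE_DEVICES` become logical devices 0, 1, 2, and so on inside the process. One caveat, stated as an assumption about this machine: the ids follow CUDA's own device enumeration, which can differ from the order shown by `nvidia-smi` unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.

```python
def physical_gpu(logical_index, visible_devices):
    """Hypothetical helper: map a logical CUDA index to its CUDA_VISIBLE_DEVICES entry."""
    # Entries are kept as strings because the variable may also hold GPU UUIDs.
    ids = [part.strip() for part in visible_devices.split(",") if part.strip()]
    return ids[logical_index]


# With CUDA_VISIBLE_DEVICES=4, the "GPU 0" printed by PyTorch is entry "4".
print(physical_gpu(0, "4"))  # 4
```

With several ids, for example `CUDA_VISIBLE_DEVICES=4,6`, logical device 1 would correspond to entry `6`.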

ChibisukeDragon closed this as completed Aug 27, 2021
