how to train on my designated GPU? #68

Closed
ChibisukeDragon opened this issue Aug 20, 2021 · 1 comment

ChibisukeDragon commented Aug 20, 2021

I want to train this model on GPU 4, so I ran this command:

for FOLD in 0 1 2 3 4; do CUDA_VISIBLE_DEVICES=4 nnUNet_train 3d_fullres nnUNetPlusPlusTrainerV2 Task003_Liver $FOLD; done

But it always tries to allocate memory on GPU 0.

heyupeng_2020@irip-114:~$ for FOLD in 0 1 2 3 4

do
CUDA_VISIBLE_DEVICES=4 nnUNet_train 3d_fullres nnUNetPlusPlusTrainerV2 Task003_Liver $FOLD
done

Please cite the following paper when using nnUNet:
Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020).
If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetPlusPlusTrainerV2.nnUNetPlusPlusTrainerV2'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0
{'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([195, 207, 207]), 'current_spacing': array([2.473119 , 1.89831205, 1.89831205]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1
{'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([482, 512, 512]), 'current_spacing': array([1. , 0.76757812, 0.76757812]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /mnt2/heyupeng_2020/environment_variables/nnUNet_preprocessed/Task003_Liver/nnUNetData_plans_v2.1
###############################################
loading dataset
loading all case properties
unpacking dataset
done
weight_decay: 3e-05
2021-08-20 17:30:57.956403: lr: 0.01
using pin_memory on device 0
using pin_memory on device 0
2021-08-20 17:32:23.117584: Unable to plot network architecture:
2021-08-20 17:32:31.011488: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total capacity; 8.80 GiB already allocated; 401.00 MiB free; 9.75 GiB reserved in total by PyTorch)
2021-08-20 17:32:31.011869:
printing the network instead:

2021-08-20 17:32:31.012049: Generic_UNetPlusPlus(
(loc0): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(640, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(1): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(768, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(2): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(512, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(3): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(320, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(4): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(192, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(loc1): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(512, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(1): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(384, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(2): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(3): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(160, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(loc2): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(1): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(192, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(2): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(loc3): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
(1): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(96, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(loc4): ModuleList(
(0): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(conv_blocks_context): ModuleList(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(2): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(3): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(128, 256, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(4): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(256, 320, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
(1): ConvDropoutNormNonlin(
(conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(5): Sequential(
(0): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
(1): StackedConvLayers(
(blocks): Sequential(
(0): ConvDropoutNormNonlin(
(conv): Conv3d(320, 320, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(instnorm): InstanceNorm3d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
)
)
(td): ModuleList()
(up0): ModuleList(
(0): ConvTranspose3d(320, 320, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): ConvTranspose3d(320, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(2): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(3): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(4): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(up1): ModuleList(
(0): ConvTranspose3d(320, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(2): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(3): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(up2): ModuleList(
(0): ConvTranspose3d(256, 128, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(2): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(up3): ModuleList(
(0): ConvTranspose3d(128, 64, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(up4): ModuleList(
(0): ConvTranspose3d(64, 32, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
)
(seg_outputs): ModuleList(
(0): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(1): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(2): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(3): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(4): Conv3d(32, 3, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
)
)
2021-08-20 17:32:31.022252:

2021-08-20 17:32:31.371178:
epoch: 0
Traceback (most recent call last):
File "/home/heyupeng_2020/anaconda3/bin/nnUNet_train", line 33, in <module>
sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/run/run_training.py", line 148, in main
trainer.run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 422, in run_training
ret = super().run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetTrainer.py", line 316, in run_training
super(nnUNetTrainer, self).run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/network_trainer.py", line 491, in run_training
l = self.run_iteration(self.tr_gen, True)
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 240, in run_iteration
output = self.network(data)
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/network_architecture/generic_UNetPlusPlus.py", line 417, in forward
x0_4 = self.loc1[3](torch.cat([x0_0, x0_1, x0_2, x0_3, self.up13], 1))
RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 10.92 GiB total capacity; 8.69 GiB already allocated; 487.00 MiB free; 9.66 GiB reserved in total by PyTorch)
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 99, in results_loop
raise RuntimeError("Someone died. Better end this madness. This is not the actual error message! Look "
RuntimeError: Someone died. Better end this madness. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full
Exception in thread Thread-5:
Traceback (most recent call last):
File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 99, in results_loop
raise RuntimeError("Someone died. Better end this madness. This is not the actual error message! Look "
RuntimeError: Someone died. Better end this madness. This is not the actual error message! Look further up your stdout to see what caused the error. Please also check whether your RAM was full

Please cite the following paper when using nnUNet:
Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein "Automated Design of Deep Learning Methods for Biomedical Image Segmentation" arXiv preprint arXiv:1904.08128 (2020).
If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

###############################################
I am running the following nnUNet: 3d_fullres
My trainer class is: <class 'nnunet.training.network_training.nnUNetPlusPlusTrainerV2.nnUNetPlusPlusTrainerV2'>
For that I will be using the following configuration:
num_classes: 2
modalities: {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage: 0
{'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([195, 207, 207]), 'current_spacing': array([2.473119 , 1.89831205, 1.89831205]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

stage: 1
{'batch_size': 2, 'num_pool_per_axis': [5, 5, 5], 'patch_size': array([128, 128, 128]), 'median_patient_size_in_voxels': array([482, 512, 512]), 'current_spacing': array([1. , 0.76757812, 0.76757812]), 'original_spacing': array([1. , 0.76757812, 0.76757812]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}

I am using stage 1 from these plans
I am using batch dice + CE loss

I am using data from this folder: /mnt2/heyupeng_2020/environment_variables/nnUNet_preprocessed/Task003_Liver/nnUNetData_plans_v2.1
###############################################
loading dataset
loading all case properties
unpacking dataset
done
weight_decay: 3e-05
2021-08-20 17:33:23.698772: lr: 0.01
^CTraceback (most recent call last):
File "/home/heyupeng_2020/anaconda3/bin/nnUNet_train", line 33, in <module>
sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/run/run_training.py", line 148, in main
trainer.run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetPlusPlusTrainerV2.py", line 422, in run_training
ret = super().run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/nnUNetTrainer.py", line 316, in run_training
super(nnUNetTrainer, self).run_training()
File "/home/heyupeng_2020/HUAWEI/UNetPlusPlus/pytorch/nnunet/training/network_training/network_trainer.py", line 453, in run_training
_ = self.tr_gen.next()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 190, in next
return self.next()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 211, in next
self._start()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 246, in _start
with threadpool_limits(limits=1, user_api="blas"):
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 171, in __init__
self._original_info = self._set_threadpool_limits()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 268, in _set_threadpool_limits
modules = _ThreadpoolInfo(prefixes=self._prefixes,
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 340, in __init__
self._load_modules()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 375, in _load_modules
self._find_modules_with_dl_iterate_phdr()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 387, in _find_modules_with_dl_iterate_phdr
libc = self._get_libc()
File "/home/heyupeng_2020/anaconda3/lib/python3.8/site-packages/threadpoolctl.py", line 553, in _get_libc
libc_name = find_library("c")
File "/home/heyupeng_2020/anaconda3/lib/python3.8/ctypes/util.py", line 350, in find_library
_findSoname_ldconfig(name) or
File "/home/heyupeng_2020/anaconda3/lib/python3.8/ctypes/util.py", line 290, in _findSoname_ldconfig
with subprocess.Popen(['/sbin/ldconfig', '-p'],
File "/home/heyupeng_2020/anaconda3/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/heyupeng_2020/anaconda3/lib/python3.8/subprocess.py", line 1662, in _execute_child
part = os.read(errpipe_read, 50000)
KeyboardInterrupt
^C
heyupeng_2020@irip-114:~$


heyupeng_2020@irip-114:~$ nvidia-smi
Fri Aug 20 17:37:16 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:04:00.0 Off | N/A |
| 38% 59C P2 76W / 250W | 11085MiB / 11178MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:06:00.0 Off | N/A |
| 30% 65C P2 84W / 250W | 6691MiB / 11178MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:07:00.0 Off | N/A |
| 33% 67C P2 86W / 250W | 6691MiB / 11178MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:0B:00.0 Off | N/A |
| 34% 55C P2 76W / 250W | 6691MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 108... Off | 00000000:0C:00.0 Off | N/A |
| 18% 37C P0 60W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX 108... Off | 00000000:0D:00.0 Off | N/A |
| 14% 42C P0 63W / 250W | 0MiB / 11178MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce GTX 108... Off | 00000000:0E:00.0 Off | N/A |
| 51% 79C P2 95W / 250W | 6809MiB / 11178MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX 108... Off | 00000000:0F:00.0 Off | N/A |
| 54% 84C P2 267W / 250W | 9581MiB / 11178MiB | 100% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 8106 C python 11075MiB |
| 1 8106 C python 6681MiB |
| 2 8106 C python 6681MiB |
| 3 8106 C python 6681MiB |
| 6 30886 C python 6115MiB |
| 7 20708 C python 9571MiB |
+-----------------------------------------------------------------------------+
heyupeng_2020@irip-114:~$

ChibisukeDragon (Author) commented

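I think the "GPU 0" in the log is actually my GPU 4. With CUDA_VISIBLE_DEVICES=4 the process sees only that one device, and it is numbered 0.
So the real problem is running out of memory: the 11 GB on a 1080 Ti is not enough to train this model with this configuration.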
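
For anyone who hits the same confusion, here is a minimal sketch of the renumbering described above. The helper name `physical_gpu_id` is hypothetical and is not part of nnUNet or PyTorch. It assumes CUDA_VISIBLE_DEVICES holds comma-separated integer indices (not GPU UUIDs) and that CUDA enumerates the cards in the same order as nvidia-smi, which can be forced with CUDA_DEVICE_ORDER=PCI_BUS_ID.

```python
import os


def physical_gpu_id(logical_index, visible_devices=None):
    """Translate a device index seen inside the process to the nvidia-smi ID.

    Assumption: CUDA_VISIBLE_DEVICES contains comma-separated integers and
    the CUDA enumeration order matches the nvidia-smi order.
    """
    if visible_devices is None:
        visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible_devices is None:
        # Variable unset: all GPUs are visible and nothing is renumbered.
        return logical_index
    # Visible devices are renumbered 0, 1, 2, ... in the order they are listed.
    ids = [int(part) for part in visible_devices.split(",") if part.strip()]
    return ids[logical_index]


# With CUDA_VISIBLE_DEVICES=4, "GPU 0" in an error message is physical GPU 4.
print(physical_gpu_id(0, "4"))
# With CUDA_VISIBLE_DEVICES=4,5, logical device 1 is physical GPU 5.
print(physical_gpu_id(1, "4,5"))
```

So a "GPU 0" in a PyTorch out-of-memory message does not by itself mean the wrong card was used. It has to be read together with the value of CUDA_VISIBLE_DEVICES that was set for that process.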
