"Expected more than 1 value per channel when training.." #14

Esaada · 2019-08-24T10:09:09Z

I'm trying to run with this command line:
CUDA_VISIBLE_DEVICES=0 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 2
This is after your post on how to solve memory issues. I'm still getting a memory error, so apparently those settings are not enough for me.

After adding a second GPU, i.e. running this command line:
CUDA_VISIBLE_DEVICES=0,1 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 2
I'm getting this error:
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/workdisk/AutoML/auto_deeplab.py", line 348, in forward
    aspp_result_4 = self.aspp_4 (self.level_4[-1])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/workdisk/AutoML/operations.py", line 164, in forward
    conv_image_pool = self.conv_p(image_pool)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1652, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 20, 1, 1])

Any idea?
Thanks!

iariav · 2019-08-25T04:20:51Z

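It is a known issue, not specific to this repository.
BatchNorm requires more than one value per channel in train mode. For the 1x1 pooled feature map in your traceback, that means a batch size > 1 on each GPU, or it will raise an error like the one you got.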
see this for instance
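
To make the arithmetic concrete, here is a minimal pure-Python sketch of why `--batch-size 2` on two GPUs ends up with the `[1, 20, 1, 1]` input from the traceback. It assumes the multi-GPU run splits the batch evenly along dimension 0 across devices (as `nn.DataParallel` does). The helper names `per_device_batch_sizes`, `values_per_channel` and `batchnorm_train_ok` are hypothetical and only mirror the size check that raises the error; they are not part of PyTorch or of this repository.

```python
from math import ceil


def per_device_batch_sizes(batch_size, num_devices):
    """Split a batch along dim 0 into near-equal chunks, one per device.

    Assumption: this mimics an even, chunk-style split across GPUs.
    """
    chunk = ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def values_per_channel(shape):
    """Count the values per channel for an (N, C, *spatial) input."""
    count = shape[0]          # batch dimension N
    for dim in shape[2:]:     # spatial dimensions, skipping channels C
        count *= dim
    return count


def batchnorm_train_ok(shape):
    """Train-mode batch statistics need more than one value per channel."""
    return values_per_channel(shape) > 1


# Batch of 2 split over 2 GPUs: each replica sees a batch of 1.
sizes = per_device_batch_sizes(2, 2)

# After global average pooling the spatial size is 1x1, as in the traceback.
pooled_shape = (sizes[0], 20, 1, 1)
print(sizes, pooled_shape, batchnorm_train_ok(pooled_shape))
```

With a batch of 4 on two GPUs, each replica would see `(2, 20, 1, 1)` and the check would pass; the same holds for a batch of 2 on a single GPU, memory permitting.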

Esaada · 2019-08-27T15:05:26Z

I had a feeling it was something like this, but even with --filter_multiplier 4 --resize 358 --crop_size 224 (which is a decrease from the initial parameters) I don't have room for a batch of 2 on 1 GPU.
And I have a 12 GB GPU.
Any idea?

NoamRosenberg · 2019-08-28T04:03:15Z

@BarakBat a fix has been merged into the master branch. Let us know if you still have any trouble.

zhizhangxian mentioned this issue Aug 26, 2019

Test1 #15

Merged

NoamRosenberg closed this as completed Aug 28, 2019
