"Expected more than 1 value per channel when training.." #14

Closed
Esaada opened this issue Aug 24, 2019 · 3 comments

Comments

@Esaada

Esaada commented Aug 24, 2019

I'm trying to run with this command line:
CUDA_VISIBLE_DEVICES=0 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 2
This is after following your post on how to solve memory issues. I'm still getting a memory error, so apparently those reductions are not enough for me.

After adding another GPU, i.e. running this command line:
CUDA_VISIBLE_DEVICES=0,1 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 2
I get this error:
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/workdisk/AutoML/auto_deeplab.py", line 348, in forward
    aspp_result_4 = self.aspp_4 (self.level_4[-1])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/workdisk/AutoML/operations.py", line 164, in forward
    conv_image_pool = self.conv_p(image_pool)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1652, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 20, 1, 1])

Any idea?
Thanks!
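
The input size on the last line of the traceback, torch.Size([1, 20, 1, 1]), is consistent with the batch of 2 being split into one sample per GPU and then pooled down to 1x1, which leaves a single value per channel. The sketch below reproduces that arithmetic in plain Python. It assumes the training script splits each batch evenly across the visible devices, as `torch.nn.DataParallel` does (the thread does not confirm this), and both helper names are hypothetical.

```python
import math


def per_device_batch_sizes(batch_size, num_devices):
    """Split a batch into near-equal chunks, one per device (chunk-style split).

    Hypothetical helper: mimics an even split along the batch dimension.
    """
    chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def values_per_channel(shape):
    """Number of values batch norm averages over per channel: N * H * W * ..."""
    count = shape[0]
    for dim in shape[2:]:
        count *= dim
    return count


# --batch-size 2 on two GPUs: each replica receives a single sample.
sizes = per_device_batch_sizes(2, 2)            # [1, 1]

# After global image pooling the feature map is 1x1, so one sample
# contributes exactly one value per channel.
single = values_per_channel((1, 20, 1, 1))      # 1, rejected in training mode
pair = values_per_channel((2, 20, 1, 1))        # 2, accepted
```

Under this assumption, the same `--batch-size 2` that is valid on one GPU becomes a per-replica batch of 1 on two GPUs, which is what triggers the check.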

@iariav
Collaborator

iariav commented Aug 25, 2019

It is a known issue, not specific to this repository.
BatchNorm needs more than one value per channel in train mode (for a 1x1 feature map, that means a batch size > 1), or it raises an error like the one you got.
See this for instance.
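
For reference, a minimal sketch that reproduces the behaviour described above with a standalone `nn.BatchNorm2d` layer. The channel count of 20 and the 1x1 spatial size are taken from the traceback; everything else is illustrative and not code from this repository.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(20)

# Training mode with a single 1x1 sample: one value per channel, so the
# batch statistics cannot be computed and PyTorch raises ValueError.
bn.train()
try:
    bn(torch.randn(1, 20, 1, 1))
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

# Training mode with two samples: two values per channel, accepted.
out_train = bn(torch.randn(2, 20, 1, 1))

# Eval mode uses the stored running statistics, so a single sample is fine.
bn.eval()
out_eval = bn(torch.randn(1, 20, 1, 1))
```

Note that switching to eval mode only avoids the check; it also stops the layer from updating its running statistics, so it is not a drop-in fix for training.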

@zhizhangxian zhizhangxian mentioned this issue Aug 26, 2019
@Esaada
Author

Esaada commented Aug 27, 2019

I had a feeling it was something like this, but even when I use --filter_multiplier 4 --resize 358 --crop_size 224 (which is a decrease from the initial parameters), a batch of 2 does not fit on 1 GPU.
And I have a 12 GB GPU.
Any idea?
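
When only one sample fits per GPU, one commonly used workaround is to replace batch normalization with a normalization layer that does not depend on the batch dimension, such as `nn.GroupNorm`. The sketch below is an assumption for illustration only: it is not confirmed to be the fix merged into this repository, the helper name is hypothetical, and `pool_branch` is a stand-in for an image-pooling branch rather than the actual module in operations.py.

```python
import torch
import torch.nn as nn


def replace_batchnorm_with_groupnorm(module, num_groups=1):
    """Recursively swap every nn.BatchNorm2d for nn.GroupNorm (hypothetical helper).

    num_groups must divide the channel count; 1 always does.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            replace_batchnorm_with_groupnorm(child, num_groups)
    return module


# Stand-in for a global-pooling branch: pool to 1x1, 1x1 conv, norm, ReLU.
pool_branch = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(20, 20, kernel_size=1, bias=False),
    nn.BatchNorm2d(20),
    nn.ReLU(),
)

replace_batchnorm_with_groupnorm(pool_branch)

# A single sample in training mode no longer raises, because GroupNorm
# normalizes over channels within each sample instead of over the batch.
pool_branch.train()
out = pool_branch(torch.randn(1, 20, 7, 7))
```

Swapping the normalization layer changes the model, so results may differ from those obtained with batch normalization, and pretrained BatchNorm weights would not carry over.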

@NoamRosenberg
Owner

@BarakBat a fix has been merged into the master branch. Let us know if you still have any trouble.

3 participants