tensor size not match #31

Closed

JingweiZhang12 opened this issue Sep 3, 2019 · 4 comments

@JingweiZhang12

Before describing the problems, here is my environment:
PyTorch version: 1.0.1.post2
CUDA version: 8.0.61
cuDNN version: 7102
Python version: 3.7

I ran the latest code, and two problems came up:

1. Traceback (most recent call last):
File "train_autodeeplab.py", line 20, in
import apex
File "/home/zhangjw/anaconda3/lib/python3.7/site-packages/apex/init.py", line 18, in
from apex.interfaces import (ApexImplementation,
File "/home/zhangjw/anaconda3/lib/python3.7/site-packages/apex/interfaces.py", line 10, in
class ApexImplementation(object):
File "/home/zhangjw/anaconda3/lib/python3.7/site-packages/apex/interfaces.py", line 14, in ApexImplementation
implements(IApex)
File "/home/zhangjw/anaconda3/lib/python3.7/site-packages/zope/interface/declarations.py", line 483, in implements
raise TypeError(_ADVICE_ERROR % 'implementer')
TypeError: Class advice impossible in Python3. Use the @Implementer class decorator instead.
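
The workaround I ended up with is roughly the guarded import below (paraphrased, not the repository's exact code; APEX_AVAILABLE is the flag the training script reads, and I assume it gates the mixed-precision path):

    # Rough sketch of the workaround, not the repo's exact code: guard the apex
    # import so training falls back to plain FP32 when apex cannot be imported.
    try:
        import apex
        from apex import amp  # mixed-precision utilities shipped with NVIDIA apex
        APEX_AVAILABLE = True
    except (ImportError, TypeError):
        # The zope.interface "class advice" TypeError above is raised while
        # importing the broken apex package on Python 3, so it is caught here too.
        APEX_AVAILABLE = False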

With the apex-related code commented out and APEX_AVAILABLE set to False as above, the code continues to run, but then problem 2 occurs:

2. Traceback (most recent call last):
     File "train_autodeeplab.py", line 421, in <module>
       main()
     File "train_autodeeplab.py", line 414, in main
       trainer.training(epoch)
     File "train_autodeeplab.py", line 176, in training
       output = self.model(image)
     File "/home/zhangjw/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
       result = self.forward(*input, **kwargs)
     File "/home/zhangjw/AutoML/auto_deeplab.py", line 282, in forward
       normalized_alphas)
     File "/home/zhangjw/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
       result = self.forward(*input, **kwargs)
     File "/home/zhangjw/AutoML/cell_level_search.py", line 138, in forward
       s = sum(new_states)
   RuntimeError: The size of tensor a (13) must match the size of tensor b (14) at non-singleton dimension 3
It's so weird, so I printed new_states[0].shape, with the following output:

    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 16, 14, 14])
    torch.Size([2, 32, 7, 7])
    torch.Size([2, 32, 7, 7])
    torch.Size([2, 32, 7, 7])
    torch.Size([2, 32, 7, 7])
    torch.Size([2, 32, 7, 7])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 4, 56, 56])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 8, 28, 28])
    torch.Size([2, 16, 13, 13])

The RuntimeError occurs as soon as the last line, torch.Size([2, 16, 13, 13]), appears. It's very strange.
Have you encountered this situation? I would appreciate it if you could tell me how to solve it.
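
For reference, the dump above comes from printing new_states[0].shape on each call to the cell's forward. A slightly more thorough variant (paraphrased, not the file's exact code) checks every entry right before the failing sum:

    # Debug sketch (paraphrased, not the repo's exact code): dump the shape of
    # every tensor being summed in cell_level_search.py, right before the line
    # that raises the RuntimeError. All entries must share one shape for the
    # element-wise sum to work.
    for t in new_states:
        print(t.shape)
    s = sum(new_states)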

@NoamRosenberg
Owner

@JingweiZhang12 Hi, we're running this code daily and haven't seen this error yet. Can you give us the details of your run (hyperparameter settings, etc.)? Sorry about this; we'll try to get it working again quickly. And do let us know if you find anything.

@albert-ba

I encountered something similar when I used --crop_size 112.
I don't know if it helps, but once I set it back to 224, everything was OK.

@iariav
Collaborator

iariav commented Sep 5, 2019

@JingweiZhang12
Hi, the first issue you described with apex looks like an apex installation problem and is not specific to this repository. Please see NVIDIA/apex#116. Could you try uninstalling and then reinstalling apex to see if that solves the issue?

Regarding the tensor size mismatch, I think @albert-ba might be right. The network can't accept an arbitrary input size, since some sizes cause a mismatch during one of the down-sampling / up-sampling operations. Could you please share the hyperparameters you used in your run?
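
As a first-pass sanity check (only a sketch: the strides 4/8/16/32 below are inferred from the feature-map sizes 56/28/14/7 printed above, not read from this repository's code), you can verify that a candidate crop size divides evenly by every downsampling factor. Divisibility seems necessary but apparently not sufficient, since a 224 crop passes the check yet still hit the mismatch above:

    # Sanity-check sketch: does a crop size divide cleanly by all assumed strides?
    # The stride values are an inference from the printed feature-map sizes,
    # not constants taken from this repository.
    def crop_divides_cleanly(crop_size, strides=(4, 8, 16, 32)):
        return all(crop_size % s == 0 for s in strides)

    print(crop_divides_cleanly(112))  # False: 112 % 32 == 16
    print(crop_divides_cleanly(224))  # True, yet the 224 run above still failed
    print(crop_divides_cleanly(256))  # True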

@JingweiZhang12
Author

@iariav
Thanks for the helpful information. I followed NVIDIA/apex#116 and reinstalled apex; the apex module can be used now.
As for the tensor size mismatch, I changed --crop-size from 224 to 256, and it works now.
