
RuntimeError: The size of tensor a (96774) must match the size of tensor b (290322) at non-singleton dimension 0 #6

Closed
lamhoangtung opened this issue Jul 5, 2019 · 25 comments

Comments

@lamhoangtung

I'm trying to rerun your code, but I run into this error:

linus@srv-aws:~/2DOCR/ultra_high_resolution_segmentation$ ./train_deep_globe.sh
fpn_global.508_4.28.2019_lr2e5
mode: 1 evaluation: False test: False
preparing datasets and dataloaders......
creating models......
Using poly LR Scheduler!
start training......
  0%|          | 0/215 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0000500,                 previous best = 0.0000
Traceback (most recent call last):
  File "train_deep_globe.py", line 105, in <module>
    loss = trainer.train(sample_batched, model, global_fixed)
  File "/mnt/data/linus/2DOCR/ultra_high_resolution_segmentation/helper.py", line 346, in train
    loss = self.criterion(outputs_global, labels_glb)
  File "train_deep_globe.py", line 85, in <lambda>
    criterion = lambda x,y: criterion1(x, y)
  File "/mnt/data/linus/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data/linus/2DOCR/ultra_high_resolution_segmentation/utils/loss.py", line 57, in forward
    probs = (probs * target).sum(1)
RuntimeError: The size of tensor a (96774) must match the size of tensor b (290322) at non-singleton dimension 0

Any idea how to fix this, @chenwydj? Thanks a lot.

lamhoangtung pushed a commit to lamhoangtung/GLNet that referenced this issue Jul 8, 2019
lamhoangtung added a commit to lamhoangtung/GLNet that referenced this issue Jul 8, 2019
lamhoangtung pushed a commit to lamhoangtung/GLNet that referenced this issue Jul 11, 2019
@lamhoangtung
Author

lamhoangtung commented Jul 11, 2019

Hi @chenwydj, thanks for your amazing work. I'm trying to rerun your experiments on DeepGlobe.
Since I can't find any mask labels for the validation and test sets, I split some samples off the training set for validation and testing.
Currently I'm facing the issue above: RuntimeError: The size of tensor a (96774) must match the size of tensor b (290322) at non-singleton dimension 0.
After debugging for a while, I found that the input and the target of the focal loss calculation don't have matching shapes: the predicted tensor has 7 channels, but the label mask only has 3.
For now, I'm trying to fix it by writing my own code to create the label masks for the loss calculation; you can see it in commit 0cad452. I was really surprised that you already have code to do that but don't use it anywhere, so I leveraged it a lot ;)
But now I'm facing a new issue:
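The numbers in the first error fit this diagnosis: 3 × 96774 = 290322, so the target holds exactly three values for every value in the prediction, which is what you would expect if the three colour channels of an RGB label image were flattened together with the pixels. Below is a minimal sketch of turning an (H, W, 3) RGB mask into an (H, W) class-index map. It assumes the DeepGlobe land-cover colour coding listed in `DEEPGLOBE_COLORS` (check it against the dataset description), and the names `DEEPGLOBE_COLORS` and `rgb_mask_to_class_index` are hypothetical, not the repository's or the fork's code.

```python
import numpy as np

# ASSUMED DeepGlobe land-cover colour coding (RGB -> class index).
# Verify against the dataset documentation before relying on it.
DEEPGLOBE_COLORS = {
    (0, 255, 255): 0,    # urban land
    (255, 255, 0): 1,    # agriculture land
    (255, 0, 255): 2,    # rangeland
    (0, 255, 0): 3,      # forest land
    (0, 0, 255): 4,      # water
    (255, 255, 255): 5,  # barren land
    (0, 0, 0): 6,        # unknown
}


def rgb_mask_to_class_index(mask_rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB label image into an (H, W) class-index map.

    Pixels whose colour is not in DEEPGLOBE_COLORS are mapped to -1.
    """
    if mask_rgb.ndim != 3 or mask_rgb.shape[-1] != 3:
        raise ValueError(f"expected an (H, W, 3) mask, got {mask_rgb.shape}")
    # Threshold each channel at 128 so slightly-off values (e.g. after
    # resizing) still snap to the nearest pure 0/255 colour.
    bits = (mask_rgb >= 128).astype(np.int64)
    # Pack the three on/off channels into one integer code in [0, 7].
    code = bits[..., 0] * 4 + bits[..., 1] * 2 + bits[..., 2]
    # Lookup table from colour code to class index; -1 = unknown colour.
    lut = np.full(8, -1, dtype=np.int64)
    for (r, g, b), cls in DEEPGLOBE_COLORS.items():
        lut[(r // 255) * 4 + (g // 255) * 2 + (b // 255)] = cls
    return lut[code]
```

The result has one value per pixel, so it flattens to H·W entries instead of 3·H·W, matching a per-pixel prediction.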

Traceback (most recent call last):
  File "train_deep_globe.py", line 129, in <module>
    loss = trainer.train(sample_batched, model, global_fixed)
  File "/root/ultra_high_resolution_segmentation/helper.py", line 429, in train
    self.metrics_global.update(labels_npy, predictions_global)
  File "/root/ultra_high_resolution_segmentation/utils/metrics.py", line 23, in update
    tmp = self._fast_hist(lt.flatten(), lp.flatten(), self.n_classes)
  File "/root/ultra_high_resolution_segmentation/utils/metrics.py", line 18, in _fast_hist
    hist = np.bincount(n_class * label_true[mask].astype(int) + label_pred[mask], minlength=n_class**2).reshape(n_class, n_class)
IndexError: boolean index did not match indexed array along dimension 0; dimension is 5992704 but corresponding boolean dimension is 17978112
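The second error shows the same factor of three: 3 × 5992704 = 17978112, and 5992704 = 2448 × 2448, which is consistent with a single DeepGlobe image whose ground truth is still an RGB mask while the prediction is a class-index map. Below is a sketch of the common `_fast_hist` confusion-matrix pattern with an explicit size check, so the mismatch surfaces as a readable error instead of an `IndexError`. `fast_hist` is a hypothetical stand-in; the repository's `utils/metrics.py` may differ in details.

```python
import numpy as np


def fast_hist(label_true: np.ndarray, label_pred: np.ndarray, n_class: int) -> np.ndarray:
    """Confusion matrix (rows: ground truth, columns: prediction)."""
    label_true = np.asarray(label_true).ravel()
    label_pred = np.asarray(label_pred).ravel()
    # Both inputs must hold exactly one class index per pixel. An RGB mask
    # (H, W, 3) flattens to 3x as many values as an (H, W) prediction, which
    # is what breaks the boolean indexing below.
    if label_true.size != label_pred.size:
        raise ValueError(
            f"label/prediction size mismatch: {label_true.size} vs {label_pred.size}"
        )
    # Ignore pixels whose ground truth is not a valid class index (e.g. -1).
    mask = (label_true >= 0) & (label_true < n_class)
    hist = np.bincount(
        n_class * label_true[mask].astype(int) + label_pred[mask].astype(int),
        minlength=n_class ** 2,
    ).reshape(n_class, n_class)
    return hist
```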

So I suspect my fix might not be the right approach...
Do you have any idea how to resolve this? Could you take a look at the code? It's all available on the deepglobe branch: https://github.com/lamhoangtung/ultra_high_resolution_segmentation/tree/deepglobe

Thanks ;)

@zxshi

zxshi commented Jul 16, 2019

I have encountered the same problem. Have you solved it?

@lamhoangtung
Author

@zxshi I haven't :3 ...
And there's also no response from the author :((

@bigKoki

bigKoki commented Jul 30, 2019

Have you solved it?

@lamhoangtung
Author


Solved it for binary segmentation.

@bigKoki

bigKoki commented Jul 31, 2019

Can you tell me how you solved it? Do you still use this code?

@lamhoangtung
Author


I solved it by making sure that the predicted tensor and the target mask tensor have the same number of channels.

You can see the code at https://github.com/lamhoangtung/ultra_high_resolution_segmentation/tree/master

But keep in mind that this only works for binary semantic segmentation so far.
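The focal-loss line in the first traceback, `probs = (probs * target).sum(1)`, multiplies element-wise, so `target` has to line up with `probs` channel for channel. One way to give the target the same number of channels as the prediction is to one-hot encode a class-index map into (N, C, H, W); this is an assumption about what the loss expects, not the fork's actual fix. The sketch uses NumPy for illustration, and `one_hot_nchw` and `check_same_shape` are hypothetical names. In PyTorch, `torch.nn.functional.one_hot` followed by a `permute` does the same job.

```python
import numpy as np


def one_hot_nchw(class_map: np.ndarray, n_class: int) -> np.ndarray:
    """Turn an (N, H, W) integer class map into an (N, C, H, W) one-hot target."""
    if class_map.min() < 0 or class_map.max() >= n_class:
        raise ValueError("class indices must lie in [0, n_class)")
    eye = np.eye(n_class, dtype=np.float32)
    # eye[class_map] has shape (N, H, W, C); move the class axis to position 1.
    return np.moveaxis(eye[class_map], -1, 1)


def check_same_shape(pred: np.ndarray, target: np.ndarray) -> None:
    """Fail early, before the loss, when prediction and target disagree."""
    if pred.shape != target.shape:
        raise ValueError(f"prediction {pred.shape} vs target {target.shape}")
```

For binary segmentation `n_class=2`; for the 7-channel DeepGlobe prediction it would be `n_class=7`. Calling `check_same_shape` right before the loss turns a cryptic size error into one that names both shapes.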

@bigKoki

bigKoki commented Jul 31, 2019


Thank you very much!!! It will help.

@zzx0836

zzx0836 commented Jul 31, 2019

@lamhoangtung When I train with mode=3, I get this error: AttributeError: 'Trainer' object has no attribute 'template'
Can you give me some help?
Thanks

@bigKoki

bigKoki commented Aug 1, 2019

Hi, I want to study this code, but I don't have a deep understanding of semantic segmentation. Could you please explain what modes 1, 2, and 3 do and how they differ? Thank you very much!

@bigKoki

bigKoki commented Aug 1, 2019

Hi, GLNet is a segmentation network that combines global information with local information. It is divided into a global branch and a local branch, and during training the global and local features interact and are fused. mode=1: train the global branch; mode=2: train the local branch from the global one; mode=3: train the global branch from the local one.

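Hello, I guess you're Chinese? I only started looking at this recently. I understand that the network mainly fuses global information with local information, but I saw three modes in the code. Do the three modes use different network structures, or do they differ in what the training produces?
Could you please explain? Many thanks!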

@bigKoki

bigKoki commented Aug 1, 2019


Thanks a lot for the explanation!!! It gave me a better understanding.

@lamhoangtung
Author


I'm trying to train with mode 3 now, but I'm getting a NaN loss because the forward pass is producing NaN tensors.

Still looking into it ;). The code is on the master branch of my fork.
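A generic way to narrow down where NaNs first appear is to check each intermediate output of one forward pass in execution order and report the first one that is not finite. The sketch below is NumPy-only and the stage names in the test are hypothetical; in PyTorch, `torch.isfinite` plays the same role per tensor, and `torch.autograd.set_detect_anomaly(True)` can help locate NaNs produced in the backward pass.

```python
import numpy as np


def first_non_finite(named_tensors):
    """Return the name of the first array containing NaN or Inf, else None.

    `named_tensors` is an ordered list of (name, array) pairs, e.g. the
    intermediate outputs of one forward pass, listed in execution order.
    """
    for name, value in named_tensors:
        if not np.all(np.isfinite(value)):
            return name
    return None
```

Checking in execution order matters: once a NaN appears, every later stage will usually contain NaNs too, so only the first offender points at the cause.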

@lamhoangtung
Author

OK, thanks. If you solve all the problems, please share the details of the training. Thank you very much!

Use the code in my fork, on the master branch ;)

@lamhoangtung
Author

@zzx0836 I fixed that in e3c5128 ;). Mode 3 is working well now :P

@zzx0836

zzx0836 commented Aug 3, 2019 via email

@wangbyz

wangbyz commented Aug 13, 2019

Hi, have you fixed this problem?
I want to run this code in mode 1 but I hit the same problem.

@bigKoki

bigKoki commented Aug 23, 2019

Hi, could you please tell me which dataset you used with this code? I see that the author didn't provide the training code for AerialImageDataset.

@zzx0836

zzx0836 commented Aug 30, 2019 via email

@bigKoki

bigKoki commented Aug 30, 2019

@zzx0836 Thank you! I used the code from lamhoangtung's fork, and it works.

@zzx0836

zzx0836 commented Aug 30, 2019 via email

@bigKoki

bigKoki commented Sep 2, 2019

Hi, can you run it on the DeepGlobe dataset? I have used your code on AerialImageDataset, but it failed on DeepGlobe.

@lamhoangtung
Author


No, I can't :3

@chenwydj
Collaborator

chenwydj commented Oct 1, 2019

Hi! We updated the training and evaluation instructions in the README and uploaded the pretrained model for DeepGlobe. The updated training and evaluation bash scripts work on my side. Please take a look and give it a try! :)

@lamhoangtung
Author

Thanks a lot @chenwydj. It worked ;)


6 participants