
object __array__ method not producing an array #72

Open
hepinghu opened this issue Sep 25, 2018 · 20 comments
@hepinghu

train epoch..
test epoch..
train epoch
.
.
.
1671it [11:17, 2.47it/s]'>' not supported between instances of 'float' and 'NoneType'
1675it [11:18, 2.47it/s]Traceback (most recent call last):
File "train.py", line 132, in
fire.Fire()
File "/home/hhp/.local/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/hhp/.local/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/hhp/.local/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "train.py", line 81, in train
trainer.train_step(img, bbox, label, scale)
File "/home/hhp/project/faster_rcnn_cervix/trainer.py", line 168, in train_step
losses = self.forward(imgs, bboxes, labels, scale)
File "/home/hhp/project/faster_rcnn_cervix/trainer.py", line 99, in forward
self.faster_rcnn.rpn(features, img_size, scale)
File "/home/hhp/.conda/envs/chainer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/home/hhp/project/faster_rcnn_cervix/model/region_proposal_network.py", line 127, in forward
scale=scale)
File "/home/hhp/project/faster_rcnn_cervix/model/utils/creator_tool.py", line 429, in call
roi = roi[keep]
ValueError: object __array__ method not producing an array
This problem occurred at the K-th epoch of training. Thanks for your help.

@SystemErrorWang

I met the same problem in creator_tool.py and non_maximum_suppression.py. I would be very grateful for any help or useful information.

@ChuckGithub

ChuckGithub commented Oct 10, 2018

Has this problem been solved?
@hepinghu @SystemErrorWang

@foreverzzx

I also met this problem. When I checked the code, I found that the loc produced by the RPN is -inf. I really don't know where the error comes from.
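
For anyone debugging the same symptom, here is a minimal sketch (the helper name is made up) that flags non-finite RPN offsets right where @foreverzzx observed them, before they get decoded into ROIs:

import torch

# Hypothetical debugging helper: check that the RPN's predicted offsets are finite.
def assert_finite(rpn_locs, name="rpn_locs"):
    n_nan = torch.isnan(rpn_locs).sum().item()
    n_inf = torch.isinf(rpn_locs).sum().item()
    if n_nan or n_inf:
        raise RuntimeError("%s has %d nan and %d inf entries; check the feature "
                           "extractor and the loss for numerical blow-ups."
                           % (name, n_nan, n_inf))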

@hepinghu
Author

hepinghu commented Oct 29, 2018 via email

@ChuckGithub

I changed the PyTorch version to 0.4 and it's OK.

@hepinghu
Author

hepinghu commented Oct 29, 2018 via email

@abhishekcvedia

@hepinghu @ChuckGithub @foreverzzx Any solution to this? My PyTorch version is 0.4 but I am still having this issue.

@ChuckGithub

sry

@PedroCastro

Hey guys, I was having the same issue. The ProposalCreator was removing all ROIs because they were smaller than min_size, so no ROIs were left to pass on to the rest of the network, which led to this error.

I later found out that my pre-processing was being done incorrectly, which led to unreasonable values at that point, and everything was getting pruned (because of my pretrained network). So make sure you are performing the correct image pre-processing.

Hope this helps!
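
For reference, a rough sketch of consistent pre-processing; this is an illustration only, not this repository's own preprocessing code, and the resize bounds plus the torchvision ImageNet mean/std are assumptions that depend on the pretrained backbone you use:

import numpy as np
import cv2

# Illustrative only: resize so the short side is ~600 px (long side capped at
# 1000 px), normalize, and rescale the ground-truth boxes by the same factor
# so image and boxes stay aligned.
def preprocess(img, bbox, min_size=600, max_size=1000):
    H, W, _ = img.shape                       # img: HWC uint8, bbox: (R, 4) as (y1, x1, y2, x2)
    scale = min(min_size / min(H, W), max_size / max(H, W))
    img = cv2.resize(img, (int(W * scale), int(H * scale)))
    img = img.astype(np.float32) / 255.0
    img = (img - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
    return img, bbox * scale, scale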

@AngelaDevHao

(quoting @PedroCastro's comment above)

thank you! it works!

@blateyang

I also met this problem and my PyTorch version is 0.4.1. @F0lha @AngelaDevHao I didn't change the source code, so how could the pre-processing have been done incorrectly? Can you show me some more details? Many thanks!

@howardyclo

howardyclo commented Dec 11, 2018

TL;DR: Make sure the width and height of your bboxes are greater than zero!

So, I also ran into this issue (I am experimenting with my own dataset and using PyTorch 1.0.0).

As @foreverzzx mentioned, this error is caused by the loc produced by the RPN being filled with -inf and nan.

This is my tracing process:

  1. I further traced the code and found that the nan values are actually produced by x (the RPN's input), which is all nan, and this comes from the VGG16 feature extractor (the 10th conv layer and the later conv layers in my case).
  2. I checked the problematic conv layer's weights and they had all become nan, which may be caused by a bug during optimization.
  3. I logged the losses in trainer.py to check whether there were any nan or -inf values, and found that rpn_loc_loss has nans.
  4. Further, I found that gt_rpn_loc produced by gt_rpn_loc, gt_rpn_label = anchor_target_creator contains some -inf.
  5. I found that this is caused by bbox2loc() in bbox_tool.py, which computes the offsets and scales given the source and target bboxes.
  6. In my case, I found that some of my bboxes' heights (base_height in bbox2loc()) are zero, which is the problem.
  7. Finally I made sure that none of the bboxes have zero height or width, which solved the bug (see the sketch after this comment).

Hope my trace can help you guys. @blateyang
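
A minimal check along the lines of step 7 (the helper name is hypothetical; it assumes the bboxes are numpy arrays in (y_min, x_min, y_max, x_max) order, as in this codebase):

import numpy as np

# Hypothetical helper: drop degenerate ground-truth boxes before training so
# that bbox2loc() never sees zero heights/widths, which produce -inf/nan offsets.
def drop_degenerate_bboxes(bbox, label, min_side=1.0):
    hs = bbox[:, 2] - bbox[:, 0]
    ws = bbox[:, 3] - bbox[:, 1]
    keep = (hs >= min_side) & (ws >= min_side)
    return bbox[keep], label[keep]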

@HogFeet

HogFeet commented Dec 15, 2018

In my case, it was caused by the bbox predictions sometimes turning into nan. Changing the lr from 1e-3 to 1e-4 solved the problem.

@blateyang

Thanks for the advice from @howardyclo and @HogFeet. I made sure that all the bboxes have valid heights and widths, and I also changed the lr from 1e-3 to 1e-4. Finally the problem has been solved.

@zjuPeco

zjuPeco commented Jan 15, 2019

Thank you, @howardyclo !
I changed

height = src_bbox[:, 2] - src_bbox[:, 0]
width = src_bbox[:, 3] - src_bbox[:, 1]

base_height = dst_bbox[:, 2] - dst_bbox[:, 0]
base_width = dst_bbox[:, 3] - dst_bbox[:, 1]

to

height = src_bbox[:, 2] - src_bbox[:, 0] + 1
width = src_bbox[:, 3] - src_bbox[:, 1] + 1

base_height = dst_bbox[:, 2] - dst_bbox[:, 0] + 1
base_width = dst_bbox[:, 3] - dst_bbox[:, 1] + 1

in the function bbox2loc() in bbox_tool.py, and the problem was solved.
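
For context, bbox2loc() uses the standard R-CNN encoding dh = log(base_height / height), so a zero height or width on either side turns into -inf or nan in the regression targets; the +1 keeps both sides strictly positive. A tiny numpy illustration:

import numpy as np

height, base_height = 10.0, 0.0                   # degenerate ground-truth box
print(np.log(base_height / height))               # -inf, which then poisons the loss
print(np.log((base_height + 1) / (height + 1)))   # finite after the +1 change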

@pazlvbanke

pazlvbanke commented Nov 16, 2019

Try replacing the function _smooth_l1_loss with this code.

import numpy as np

def _smooth_l1_loss(x, t, in_weight, sigma):
    sigma2 = sigma ** 2
    diff = in_weight * (x - t)
    abs_diff = diff.abs()

    flag = (abs_diff.data < (1. / sigma2)).float()
    y = (flag * (sigma2 / 2.) * (diff ** 2) +
         (1 - flag) * (abs_diff - 0.5 / sigma2))
    modif = tonumpy(y)          # tonumpy: the codebase's tensor-to-numpy helper
    modif[np.isnan(modif)] = 0  # replace nan entries with zero
    return modif.sum()

The problem is caused by nan values in the loss function: some values of the tensor y become nan, so I just replaced the nan values with zeros and summed the rest of the tensor, which gives a loss that is still positive but not nan. I can't say whether it's scientifically correct, but training doesn't crash anymore.
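
Note that, assuming tonumpy converts the tensor to a numpy array (as its name and the np.isnan call suggest), the masked loss above is detached from the autograd graph and no longer contributes gradients. If the goal is only to zero out nan entries, a sketch that stays in PyTorch and keeps gradients for the finite entries could look like this:

import torch

# Sketch of an in-graph alternative: zero the nan entries of the elementwise
# smooth-L1 values but keep everything else differentiable.
def _smooth_l1_loss_nan_safe(x, t, in_weight, sigma):
    sigma2 = sigma ** 2
    diff = in_weight * (x - t)
    abs_diff = diff.abs()
    flag = (abs_diff.data < (1. / sigma2)).float()
    y = (flag * (sigma2 / 2.) * (diff ** 2) +
         (1 - flag) * (abs_diff - 0.5 / sigma2))
    y = torch.where(torch.isnan(y), torch.zeros_like(y), y)  # mask nans in-graph
    return y.sum()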

@Xxxxxxxi

(quoting @pazlvbanke's _smooth_l1_loss workaround above)

thank you!!!!!!

@evanBear

(quoting @pazlvbanke's _smooth_l1_loss workaround and @Xxxxxxxi's reply above)
modif = tonumpy(y) — what is tonumpy here?

@bufferXia

I think the problem is in ProposalCreator(). This function filters out inappropriate anchors, so when the RPN's output is abnormal, the ROIs become abnormal after loc2bbox(anchor, loc). The code after loc2bbox() removes boxes whose height or width is smaller than min_size, so when the lr or the loss function is not a good fit, after several iterations many of the box sizes become abnormal and the number of boxes left after filtering drops to 0. Finally the error appears at roi = roi[keep], because len(roi) is 0.
So I think we can change the lr, adjust the loss function, or apply clip() to the loc. This is just my idea.

roi = loc2bbox(anchor, loc)

# Clip predicted boxes to the image.
roi[:, slice(0, 4, 2)] = np.clip(
    roi[:, slice(0, 4, 2)], 0, img_size[0])
roi[:, slice(1, 4, 2)] = np.clip(
    roi[:, slice(1, 4, 2)], 0, img_size[1])

# Remove predicted boxes with either height or width < threshold.
min_size = self.min_size * scale
hs = roi[:, 2] - roi[:, 0]
ws = roi[:, 3] - roi[:, 1]
keep = np.where((hs >= min_size) & (ws >= min_size))[0]
before_size = len(roi)
roi = roi[keep, :]
after_size = len(roi)
score = score[keep]
print("before_size:", before_size, " after_size:", after_size)

@willy2cqu

Hi guys, I also met this error when using TensorFlow 2.2 (CPU). I solved it by upgrading numpy from 1.15.4 to 1.19.2 (the latest version at the time).
Moreover, make sure that only one copy of numpy is installed. I found that installing via both conda and pip can leave two copies of numpy on the system, which may also be a cause.
