run train.py Input to reshape is a tensor with 1 values, but the requested shape has 0 #113

Open
ycui123 opened this issue Jul 19, 2017 · 22 comments

Comments

@ycui123

ycui123 commented Jul 19, 2017

Hi everyone,

I got this error while training the model: Input to reshape is a tensor with 1 values, but the requested shape has 0
I ran python train/train.py and the error occurred partway through training. Sometimes it happens around iteration 30+, sometimes around iteration 300+. I don't know how to fix it.

Has anyone else run into the same problem?

@AlexGfocus

I think this is the same as issue #88.

@AihahaFox

I got this problem too. I use version 1.1.0, and the suggestions in issue #88 may not help.

@ycui123
Author

ycui123 commented Jul 21, 2017

But did you manage to fix the problem?

@LovPe

LovPe commented Aug 3, 2017

This may be caused by writing a tfrecord example whose instance number is 0. Try skipping that kind of image when writing the tfrecord.

@Sharathnasa

Sharathnasa commented Aug 3, 2017 via email

@LovPe

LovPe commented Aug 4, 2017

@Sharathnasa
[image]

The code marked in red is the instance number; you can step into it to see the details.
I think newer versions of TensorFlow (1.2+) perform some checks when resizing, so when you
have an example whose instance number is zero, it blocks the reading thread.

I added an if condition when writing the tfrecord to make sure the instance number is > 0.
I rewrote the writing process, so the details may differ, but after this change training works
well on tf1.3rc.
[image]
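
The check described above (skip any image whose instance count is zero before it is serialized) can be sketched in plain Python as follows. This is only an illustration under assumptions: `filter_annotated` and the `(img_id, gt_boxes)` record layout are hypothetical names, and each box is assumed to hold five values. It is not the code from the screenshots.

```python
def filter_annotated(records):
    """Yield only the records that have at least one ground-truth box.

    `records` is a hypothetical iterable of (img_id, gt_boxes) pairs, where
    gt_boxes is a list of rows (assumed here: [x1, y1, x2, y2, class_id]).
    """
    for img_id, gt_boxes in records:
        if len(gt_boxes) == 0:
            # An image without instances would be serialized with an empty
            # gt_boxes byte string, so it is skipped here.
            continue
        yield img_id, gt_boxes


records = [
    (1, [[10.0, 20.0, 50.0, 80.0, 3.0]]),
    (2, []),  # image with no instances
    (3, [[0.0, 0.0, 5.0, 5.0, 1.0], [1.0, 1.0, 4.0, 4.0, 2.0]]),
]

kept = [img_id for img_id, _ in filter_annotated(records)]
print(kept)  # [1, 3]
```

Note that tfrecord files written before such a check was added still contain the zero-instance examples, so they would need to be regenerated for the check to have any effect.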

@Sharathnasa

Sharathnasa commented Aug 4, 2017 via email

@LovPe

LovPe commented Aug 4, 2017

@Sharathnasa
I did not use TensorFlow 1.1, but you can try it.

@WKChung1028

@LovPe
May I know which line your code starts from? I ran into the same problem too. Your assistance is highly appreciated.

@LovPe

LovPe commented Sep 21, 2017

@WKChung1028
In my implementation it is tf_rcnn->lib->datasets->convert_coco.py, line 226,
but I rewrote the code, so I'm not sure the line numbers match!

@WKChung1028

@LovPe
Do you mean you added the check right after these two lines, like this?

    mask = mask.astype(np.uint8)
    assert masks.shape[0] == gt_boxes.shape[0], 'Shape Error'
    show_result = False
    break
    if gt_boxes.shape[0] > 0:
        img_raw = img.tostring()
        mask_raw = mask.tostring()

        example = _to_tfexample_coco_raw(
            img_id,
            img_raw,
            mask_raw,
            height, width, gt_boxes.shape[0],
            gt_boxes.tostring(), masks.tostring())

        tfrecord_writer.write(example.SerializeToString())

@LovPe

LovPe commented Sep 21, 2017

Something like that:
add if gt_boxes.shape[0] > 0:
before calling the _to_tfexample_coco_raw() function.

@WKChung1028

        img = img.astype(np.uint8)
        assert img.size == width * height * 3, '%s' % str(img_id)

        img_raw = img.tostring()
        mask_raw = mask.tostring()
        if gt_boxes.shape[0] > 0:
            example = _to_tfexample_coco_raw(
              img_id,
              img_raw,
              mask_raw,
              height, width, gt_boxes.shape[0],
              gt_boxes.tostring(), masks.tostring())
        
            tfrecord_writer.write(example.SerializeToString())

sys.stdout.write('\n')
sys.stdout.flush()

I added the check like this and it ran fine for a few iterations.

But then it stopped again and showed the error below:

['background']
iter 126: image-id:0516249, time:14.586(sec), regular_loss: 0.247718, total-loss 0.6493(0.0921, 0.5463, 0.000000, 0.0109, 0.0000), instances: 11, batch:(26|114, 0|33, 0|0)
labels
[]
classes
['background']
Traceback (most recent call last):
  File "train/train.py", line 339, in <module>
    train()
  File "train/train.py", line 335, in train
    coord.join(threads)
  File "/home/ubuntu/Documents/WK/my_project/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/ubuntu/Documents/WK/my_project/local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "/home/ubuntu/Documents/WK/my_project/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/ubuntu/Documents/WK/my_project/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape has 0
  [[Node: Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](DecodeRaw_1, Reshape/shape)]]
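
The message itself describes an element-count mismatch: a reshape can only succeed when the number of values in the input equals the product of the requested shape. Below is a minimal sketch of that arithmetic, assuming the failing node reshapes the decoded boxes to [num_instances, 5]. The width of 5 is an assumption that this thread does not confirm, and `reshape_is_valid` is a hypothetical helper, not a TensorFlow function.

```python
def reshape_is_valid(num_values, shape):
    """Return True when `num_values` elements fit exactly into `shape`."""
    requested = 1
    for dim in shape:
        requested *= dim
    return num_values == requested


# 11 instances (as in the "instances: 11" log line) with 5 values each.
print(reshape_is_valid(55, (11, 5)))  # True

# Zero instances: the requested shape holds 0 values, but the decoded
# tensor reportedly still carries 1 value, so the reshape is rejected.
print(reshape_is_valid(1, (0, 5)))    # False
```

Under that assumption, "a tensor with 1 values" and "the requested shape has 0" point at an example whose instance count is zero.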

@WKChung1028

@LovPe
I am a beginner and would like to understand your solution better. Thank you.

@LovPe

LovPe commented Sep 21, 2017

@WKChung1028
1/ Make sure your tfrecord file was generated by the new code.
2/ I use python 3.5 with tf1.3.

@WKChung1028

  1. Does that mean I need to delete all tfrecord files generated by the previous code before I run the new code?

  2. I am using tf 1.4 and python 2.7 on CPU.

@LovPe

LovPe commented Sep 21, 2017

1/ Yes.
2/ I think that is all right.

@anatolix

Hi, I've looked into this bug. The offending line is actually
coco.py: gt_boxes = tf.decode_raw(features['label/gt_boxes'], tf.float32)
The problem is that if we call tf.decode_raw on an empty string ('') it returns a one-element tensor, [0],
i.e. tf.decode_raw('', tf.float32).eval() == array([ 0.], dtype=float32)
I think we should handle the empty string as a special case here.
I'll try to make a fix.

I am actually not sure about 'delete all tfrecord file generated by previous code', because by doing this you will drop all images without markup. They could still be useful for training.
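
The idea of treating the empty string as a special case on the reading side can be illustrated outside TensorFlow. The sketch below is a plain-Python analogue, not the fix in the linked pull request: `decode_boxes` is a hypothetical helper, and the layout (little-endian float32, five values per box) is an assumption.

```python
import struct

BOX_WIDTH = 5   # assumed number of float32 values per box
FLOAT_SIZE = 4  # bytes per float32


def decode_boxes(raw, num_instances):
    """Decode raw float32 bytes into a list of boxes.

    An image without instances yields an empty list instead of a
    placeholder value, so images without markup can stay in the dataset.
    """
    if num_instances == 0:
        return []
    expected = num_instances * BOX_WIDTH * FLOAT_SIZE
    if len(raw) != expected:
        raise ValueError(
            'expected %d bytes for %d boxes, got %d'
            % (expected, num_instances, len(raw)))
    values = struct.unpack('<%df' % (num_instances * BOX_WIDTH), raw)
    return [list(values[i:i + BOX_WIDTH])
            for i in range(0, len(values), BOX_WIDTH)]


raw = struct.pack('<5f', 10.0, 20.0, 50.0, 80.0, 3.0)
print(decode_boxes(raw, 1))  # [[10.0, 20.0, 50.0, 80.0, 3.0]]
print(decode_boxes(b'', 0))  # []
```

Compared with skipping such images at write time, a read-side guard of this kind keeps images without annotations available for training, which is the concern raised above.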

@anatolix

fix: #160

@zzdgit

zzdgit commented Mar 6, 2018

My tfrecord file was generated with tf1.2, but I run training with tf1.5. Would that cause this problem?

@LiuPearl1

LiuPearl1 commented Jun 12, 2018

@LovPe I followed your advice and the problem has been partially solved, but the program still stopped at iter 124 with the same error. I would like to know how to fix it completely. I use tf1.4 and python2.7.

@LovPe

LovPe commented Jun 13, 2018

@LiuPearl1 I used python3.6 and tf1.2 when solving this problem. I have since given up on this project and now work with the original implementation on caffe2. I found that some details differ between the two projects.


9 participants