INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Retval[0] does not have value #18

zhangshuaitao · 2017-10-20T06:56:57Z

INFO:tensorflow:global step 109662: loss = 5.3843 (0.160 sec/step)
INFO:tensorflow:global step 109663: loss = 4.5832 (0.256 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Retval[0] does not have value
INFO:tensorflow:global step 109664: loss = 8.8361 (0.098 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "./train_seglink.py", line 275, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./train_seglink.py", line 271, in main
    train(train_op)
  File "./train_seglink.py", line 260, in train
    session_config = sess_config
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 759, in train
    sv.saver.save(sess, sv.save_path, global_step=sv.global_step)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 296, in stop_on_exception
    yield
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 494, in run
    self.run_loop()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 994, in run_loop
    self._sv.global_step])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Retval[0] does not have value
zst@zst-robot1:~/zst/seglink-master$
My TF version is 1.2.1. I also tried running it on 1.1.0 and got the same error.
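Reading the traceback from the bottom up: the exception was raised inside `supervisor.py`'s `run_loop` (a Supervisor background thread that fetches `global_step`), captured by `Coordinator.stop_on_exception`, and only re-raised by `coord.join()` when `managed_session` exited, which is why "Finished training! Saving model to disk." is printed before the crash. The sketch below mimics that stop/join protocol with the standard library only. `MiniCoordinator`, `background_service` and `train` are hypothetical names; this is not TensorFlow's implementation, just an illustration of the control flow visible in the log.

```python
import threading
import time


class MiniCoordinator:
    """Hypothetical stdlib analog of tf.train.Coordinator's stop/join protocol."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._lock = threading.Lock()
        self._exc = None

    def request_stop(self, exc=None):
        with self._lock:
            # Remember only the first reported exception.
            if exc is not None and self._exc is None:
                self._exc = exc
        self._stop_event.set()

    def should_stop(self):
        return self._stop_event.is_set()

    def join(self, threads):
        for t in threads:
            t.join()
        # The background error surfaces here, in the main thread, after training.
        if self._exc is not None:
            raise self._exc


def background_service(coord):
    """Stands in for a Supervisor service thread such as the step-counter loop."""
    try:
        raise ValueError("Retval[0] does not have value")  # simulated failure
    except Exception as exc:
        coord.request_stop(exc)


def train(coord, max_steps=1000):
    """Main loop: runs steps until the coordinator asks it to stop."""
    step = 0
    while not coord.should_stop() and step < max_steps:
        step += 1  # one simulated training step
        time.sleep(0.001)
    return step
```

Typical usage: start the service thread, run `train(coord)`, then call `coord.join([thread])`; the main loop ends quietly and the `ValueError` is only raised by `join`, matching the order of messages in the log above.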

zhangshuaitao · 2017-10-20T06:57:17Z

@dengdan

dengdan · 2017-10-20T07:26:40Z

Please provide the command you are running, including the parameters.

zhangshuaitao · 2017-10-20T07:28:49Z

sudo python ./train_seglink.py --dataset_name=icdar2015 --dataset_dir=/home/zst/result --batch_size=1 --train_dir=/home/zst/zst/seglink-master/train_dir --checkpoint_path=/home/zst/zst/seglink

zhangshuaitao · 2017-10-20T07:31:33Z

@dengdan I changed the learning rate to 0.00001.

dengdan · 2017-10-21T03:16:26Z

Why sudo?

zhangshuaitao · 2017-10-21T03:23:04Z

Because I use sudo to run Python 2.7; without sudo it uses the Anaconda Python.
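Since `sudo` and the user shell can resolve `python` to different binaries, a quick way to see what each invocation actually runs is a tiny diagnostic script. This is a hypothetical helper (call it `check_env.py`), not part of SegLink: run it once as `python check_env.py` and once as `sudo python check_env.py` and compare. If the goal is just the system Python 2.7, invoking it by its full path (e.g. `/usr/bin/python2.7`) avoids `sudo` altogether.

```python
import platform
import sys


def describe_interpreter():
    """Report the running Python binary, its version, and TensorFlow's version."""
    info = {
        "executable": sys.executable,
        "version": platform.python_version(),
    }
    try:
        import tensorflow as tf  # may not be installed for this interpreter
        info["tensorflow"] = tf.__version__
    except Exception:  # broken installs can fail with errors other than ImportError
        info["tensorflow"] = None
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```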

zhangshuaitao · 2017-10-21T03:24:47Z

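I have another question: roughly how many iterations does it take before training starts to converge? I have already trained for 500,000 iterations and it still hasn't converged.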

dengdan · 2017-10-21T03:54:15Z

Try not to use sudo to run Python 2.7; something might go wrong because of differences in the bin path.
The loss will decrease at the very beginning of training if everything works well. The final loss will be around 1.0.

zhangshuaitao · 2017-10-21T06:50:22Z

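@dengdan I took your advice and installed TensorFlow 1.1.0 on another machine, but the problem is still there. However, before each error it automatically saves the corresponding checkpoint, and when I start training again it continues from the previous iteration. Does this have a big effect on the results?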

dengdan · 2017-10-22T04:49:28Z

Well, I am not sure, because the cause of the bug has not been figured out yet, and it is also unknown how to reproduce it.
But do you mean that training can be restarted and works well after the error?

zhangshuaitao · 2017-10-22T04:51:10Z

yes
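Given that each crash leaves a checkpoint and a relaunch resumes from it, one stopgap (it does not fix the underlying bug) is to relaunch automatically. The sketch below is a hypothetical, generic retry wrapper; `run_with_restarts` and `train_once` are illustrative names, and `train_once` is assumed to return normally once training reaches its final step.

```python
import logging
import time


def run_with_restarts(train_once, max_restarts=5, retry_on=(Exception,), delay_secs=0.0):
    """Call train_once() again whenever it raises one of `retry_on`.

    Hypothetical workaround: relies on the behaviour reported in this thread,
    i.e. a checkpoint is saved before each crash and the next run resumes from it.
    """
    attempts = 0
    while True:
        try:
            return train_once()
        except retry_on as exc:
            attempts += 1
            if attempts > max_restarts:
                raise  # give up and surface the last error
            logging.warning("training crashed (%s); restart %d/%d",
                            exc, attempts, max_restarts)
            time.sleep(delay_secs)
```

For a TensorFlow 1.x script, having `train_once` launch `train_seglink.py` in a fresh process (for example with `subprocess.check_call`, which raises `CalledProcessError` on a non-zero exit) is probably safer than re-entering training in the same process, because the graph and session are then rebuilt from scratch.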

zhangshuaitao · 2017-10-22T13:35:44Z

When training reaches step 500,000, I find that fmean drops, so it is probably overfitting. Could you tell me the size of your training set? I suspect fmean declines because the training set is not large enough.

dengdan · 2017-10-23T11:24:26Z

SynthText 0.8M, IC15 1000.
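Assuming fmean here is the usual F-measure (the harmonic mean of precision and recall, reported as "hmean" by the ICDAR 2015 evaluation), a late-training drop can be handled by evaluating several saved checkpoints and keeping the best one instead of the last. The sketch below is hypothetical: the `{step: (precision, recall)}` layout is an assumption, not the output format of SegLink's evaluation script.

```python
def fmean(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def best_checkpoint(results):
    """Return the global step whose (precision, recall) gives the highest fmean.

    `results` maps global step -> (precision, recall); hypothetical layout.
    """
    return max(results, key=lambda step: fmean(*results[step]))
```

Selecting by fmean on a held-out set is a simple form of early stopping: if the score peaks before step 500,000, the earlier checkpoint is kept.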

Donaghys · 2020-06-14T03:51:13Z

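When training reaches step 500,000, I find that fmean drops; it should be overfitting. Could I know the size of your training set? I think fmean drops because the training set is not large enough.

Hello, did you manage to get the loss to decrease properly by tuning the parameters?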

dengdan closed this as completed Oct 27, 2017
