AssertionError: "Either detect or segment should be True" #5

Closed
DrSleep opened this issue Sep 17, 2017 · 6 comments

DrSleep commented Sep 17, 2017

Hi there,

I have tried to execute the example training command:

python training.py --run_name=BlitzNet300_x4_VOC0712_detseg --dataset=voc07+12-segmentation --trunk=resnet50 --x4 --batch_size=32 --optimizer=adam --max_iterations=65000 --lr_decay 40000 50000

but have encountered an assertion error:

Traceback (most recent call last):
  File "training.py", line 327, in <module>
    tf.app.run()
  File "/home/drsleep/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "training.py", line 280, in main
    assert args.detect or args.segment, "Either detect or segment should be True"
AssertionError: Either detect or segment should be True

Does this mean I need to pass --detect and --segment explicitly?

@dvornikita
Owner

You are right, there is a mistake in the example command: both flags are missing. Run with --detect and --segment if you want to train for both tasks, or pass only one of them to train a single task. I have pushed that modification.
Thank you.
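
For reference, the flag check behind the assertion can be sketched roughly as follows. This is only an illustration and not the repository's actual parser: `build_parser` and `parse_tasks` are hypothetical names, and the only parts taken from this thread are the `--detect` / `--segment` flags and the assertion message.

```python
import argparse


def build_parser():
    # Hypothetical parser: the real training.py may define these flags differently.
    parser = argparse.ArgumentParser()
    parser.add_argument("--detect", action="store_true",
                        help="train the detection task")
    parser.add_argument("--segment", action="store_true",
                        help="train the segmentation task")
    return parser


def parse_tasks(argv):
    args = build_parser().parse_args(argv)
    # Same check and message as the assertion shown in the traceback.
    assert args.detect or args.segment, "Either detect or segment should be True"
    return args


if __name__ == "__main__":
    # Both tasks enabled, as in the corrected example command.
    args = parse_tasks(["--detect", "--segment"])
    print(args.detect, args.segment)
```

With neither flag on the command line, both values default to False and the assertion fires, which matches the error reported above.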

fastlater commented Sep 19, 2017

@DrSleep The correction has been made. I guess you can close this issue.

@dvornikita
Owner

Fixed

fastlater commented Sep 21, 2017

@DrSleep Did you run the training script until the end?
I am trying:
python training.py --run_name=BlitzNet300_VOC12_Det_Seg --dataset=voc12-train --trunk=resnet50 --x4 --batch_size=1 --optimizer=adam --detect --segment --max_iterations=1001 --lr_decay 1000 1500

@dvornikita
I am using an NVIDIA Quadro K2000, which has only 2 GB of memory, so I reduced the batch size from 32 to 1 and, in train.txt, reduced the number of input images from 1464 to 500.
After a few steps (fewer than 100), I get the error:
Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero.
Could it be an out-of-memory error? I want to be sure that the problem is my GPU and not the code.
Is there a way to test the training process on minimal hardware by adjusting the configuration and other arguments?

2017-09-21 09:56:14.669454: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 586.13MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
[INFO]: step 0, loss = 29.05, acc = 0.01, iou=0.000000, lr=5.000 (0.0 examples/sec; 20.544 sec/batch)
[INFO]: step 1, loss = 26.51, acc = 0.01, iou=0.000000, lr=4.000 (0.7 examples/sec; 1.366 sec/batch)
[INFO]: step 2, loss = 25.99, acc = 0.00, iou=0.003891, lr=4.000 (0.7 examples/sec; 1.379 sec/batch)
[INFO]: step 3, loss = 19.77, acc = 0.18, iou=0.008367, lr=4.000 (0.7 examples/sec; 1.354 sec/batch)
[INFO]: step 4, loss = 20.65, acc = 0.00, iou=0.016453, lr=4.000 (0.7 examples/sec; 1.400 sec/batch)
[INFO]: step 5, loss = 20.03, acc = 0.00, iou=0.022892, lr=4.000 (0.7 examples/sec; 1.360 sec/batch)
[INFO]: step 6, loss = 21.74, acc = 0.00, iou=0.027287, lr=4.000 (0.7 examples/sec; 1.388 sec/batch)
[INFO]: step 7, loss = 19.69, acc = 0.00, iou=0.030866, lr=4.000 (0.7 examples/sec; 1.364 sec/batch)
[INFO]: step 8, loss = 21.50, acc = 0.28, iou=0.032887, lr=4.000 (0.7 examples/sec; 1.371 sec/batch)
[INFO]: step 9, loss = 21.19, acc = 0.35, iou=0.033197, lr=4.000 (0.7 examples/sec; 1.390 sec/batch)
[INFO]: step 10, loss = 16.02, acc = 0.52, iou=0.033137, lr=4.000 (0.7 examples/sec; 1.420 sec/batch)
[INFO]: step 11, loss = 18.08, acc = 0.43, iou=0.034473, lr=4.000 (0.7 examples/sec; 1.341 sec/batch)
[INFO]: step 12, loss = 16.75, acc = 0.57, iou=0.035309, lr=4.000 (0.7 examples/sec; 1.340 sec/batch)
[INFO]: step 13, loss = 17.05, acc = 0.61, iou=0.035514, lr=4.000 (0.7 examples/sec; 1.404 sec/batch)
[INFO]: step 14, loss = 21.49, acc = 0.47, iou=0.036379, lr=4.000 (0.7 examples/sec; 1.428 sec/batch)
[INFO]: step 15, loss = 19.92, acc = 0.57, iou=0.035743, lr=4.000 (0.7 examples/sec; 1.376 sec/batch)
[INFO]: step 16, loss = 16.62, acc = 0.46, iou=0.036653, lr=4.000 (0.7 examples/sec; 1.344 sec/batch)
[INFO]: step 17, loss = 12.99, acc = 0.51, iou=0.037424, lr=4.000 (0.7 examples/sec; 1.421 sec/batch)
[INFO]: step 18, loss = 18.08, acc = 0.69, iou=0.037933, lr=4.000 (0.7 examples/sec; 1.353 sec/batch)
[INFO]: step 19, loss = 30.51, acc = 0.75, iou=0.037144, lr=4.000 (0.7 examples/sec; 1.361 sec/batch)
[INFO]: step 20, loss = 16.09, acc = 0.72, iou=0.037762, lr=4.000 (0.7 examples/sec; 1.348 sec/batch)
[INFO]: step 21, loss = 18.75, acc = 0.64, iou=0.038262, lr=4.000 (0.7 examples/sec; 1.369 sec/batch)
[INFO]: step 22, loss = 23.29, acc = 0.75, iou=0.038670, lr=4.000 (0.7 examples/sec; 1.356 sec/batch)
[INFO]: step 23, loss = 16.26, acc = 0.71, iou=0.039054, lr=4.000 (0.7 examples/sec; 1.371 sec/batch)
[INFO]: step 24, loss = 15.54, acc = 0.73, iou=0.038320, lr=4.000 (0.7 examples/sec; 1.355 sec/batch)
[INFO]: step 25, loss = 11.92, acc = 0.74, iou=0.039059, lr=4.000 (0.7 examples/sec; 1.375 sec/batch)
2017-09-21 09:57:03.093948: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.093948: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.093948: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.095198: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.100198: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.101448: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.103948: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.107698: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.108948: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
2017-09-21 09:57:03.122698: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\kernels\check_numerics_op.cc:157] abnormal_detected_host @0000000200EF0B00 = {1, 0} LossTensor is inf or nan
2017-09-21 09:57:03.140199: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\kernels\queue_base.cc:295] _0_parallel_read/filenames: Skipping cancelled enqueue attempt with queue not closed
2017-09-21 09:57:03.141449: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\kernels\queue_base.cc:295] _2_parallel_read/common_queue: Skipping cancelled enqueue attempt with queue not closed
2017-09-21 09:57:03.142699: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\kernels\queue_base.cc:295] _2_parallel_read/common_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Enqueue operation was cancelled
[[Node: parallel_read/common_queue_enqueue_1 = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/common_queue, parallel_read/ReaderReadV2_1, parallel_read/ReaderReadV2_1:1)]]
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
[INFO]: Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Enqueue operation was cancelled
[[Node: parallel_read/common_queue_enqueue_1 = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/common_queue, parallel_read/ReaderReadV2_1, parallel_read/ReaderReadV2_1:1)]]
    return fn(*args)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\client\session.py", line 1306, in _run_fn
    status, run_metadata)
  File "C:\Program Files (x86)\Python 3.5.2\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
[[Node: PiecewiseConstant/case/Assert/AssertGuard/pred_id/_587 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_685_PiecewiseConstant/case/Assert/AssertGuard/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "training.py", line 333, in <module>
    tf.app.run()
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "training.py", line 312, in main
    train(dataset, net, net_config)
  File "training.py", line 249, in train
    update_mean_iou, learning_rate])
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
    run_metadata_ptr)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\client\session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
    options, run_metadata)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
[[Node: PiecewiseConstant/case/Assert/AssertGuard/pred_id/_587 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_685_PiecewiseConstant/case/Assert/AssertGuard/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'gradients/TopKV2_grad/Reshape', defined at:
  File "training.py", line 333, in <module>
    tf.app.run()
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "training.py", line 312, in main
    train(dataset, net, net_config)
  File "training.py", line 209, in train
    summarize_gradients=True)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 440, in create_train_op
    check_numerics=check_numerics)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\contrib\training\python\training\training.py", line 439, in create_train_op
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\training\optimizer.py", line 386, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 542, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 348, in _MaybeCompile
    return grad_fn()  # Exit early
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 542, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\ops\nn_grad.py", line 707, in _TopKGrad
    ind_2d = array_ops.reshape(op.outputs[1], array_ops.stack([-1, ind_lastdim]))
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 2619, in reshape
    name=name)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\framework\ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\framework\ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op 'TopKV2', defined at:
  File "training.py", line 333, in <module>
    tf.app.run()
[elided 1 identical lines from previous traceback]
  File "training.py", line 312, in main
    train(dataset, net, net_config)
  File "training.py", line 175, in train
    seg_gt, dataset, config)
  File "training.py", line 117, in objective
    detection_loss(location, confidence, refine_ph, classes_ph, pos_mask)
  File "training.py", line 74, in detection_loss
    number_of_negatives)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1949, in top_k
    return gen_nn_ops._top_kv2(input, k=k, sorted=sorted, name=name)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 2577, in _top_kv2
    name=name)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\framework\ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Program Files (x86)\Python 3.5.2\lib\site-packages\tensorflow\python\framework\ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_851, gradients/TopKV2_grad/stack)]]
[[Node: PiecewiseConstant/case/Assert/AssertGuard/pred_id/_587 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_685_PiecewiseConstant/case/Assert/AssertGuard/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

@dvornikita
Owner

@fastlater Regarding the checkpoints, we save them every 1000 iterations no matter what. You can see this in the main training loop in training.py, and you can change this value.
Regarding your error, I guess it is caused by the absence of positive proposals for the single image that you feed. As you know, the pipeline involves some data augmentation, so it can happen that a random crop doesn't contain any object. The probability p of this event is pretty low, and with 32 images in the batch the probability that no crop contains an object becomes p^32, so almost zero. Also keep in mind that batch normalization won't produce anything meaningful with a batch size of one.
TL;DR: Increase your batch size.
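
To put rough numbers on the p^32 argument, here is a small sketch. It assumes that crops are independent and that a step fails only when every crop in the batch is empty. The value 0.05 for p is made up purely for illustration, and both function names are hypothetical.

```python
def prob_batch_has_no_objects(p_empty_crop, batch_size):
    """Probability that every crop in one batch contains no object.

    Assumes the crops are drawn independently, so the probabilities multiply.
    """
    return p_empty_crop ** batch_size


def prob_failure_within(p_empty_crop, batch_size, steps):
    """Probability of hitting at least one all-empty batch within `steps` steps."""
    p_step = prob_batch_has_no_objects(p_empty_crop, batch_size)
    return 1.0 - (1.0 - p_step) ** steps


if __name__ == "__main__":
    p = 0.05  # illustrative assumption, not a measured value
    for batch_size in (1, 4, 32):
        print(batch_size,
              prob_batch_has_no_objects(p, batch_size),
              prob_failure_within(p, batch_size, steps=100))
```

Under these assumptions, a batch size of one makes an empty batch likely within the first hundred steps, while a batch size of 32 makes it negligible.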

fastlater commented Sep 22, 2017

@dvornikita Thank you for your reply. About the checkpoint saving, I found it in training.py. I added a few lines to save the checkpoint when max_iterations is reached:
if step % args.max_iterations == 0 and step > 0: #here summaries and save checkpoints

I was training with batch_size=1 because with any value higher than 1 the execution stops due to low memory. I will upgrade my GPU and run it next time with batch_size=32.
Thanks.
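
For anyone else doing short test runs, the saving condition described above can be sketched as a small helper. This is an illustration only: `should_save_checkpoint` and `save_every` are hypothetical names, and it assumes the training loop's step counter actually reaches `max_iterations`, which depends on how the loop in training.py is written.

```python
def should_save_checkpoint(step, max_iterations, save_every=1000):
    """Return True on every `save_every`-th step and on the final step.

    `save_every=1000` mirrors the interval mentioned in this thread; the
    final-step check is an assumed addition for short runs.
    """
    if step <= 0:
        return False
    return step % save_every == 0 or step == max_iterations


if __name__ == "__main__":
    # A short run like the one above (--max_iterations=1001).
    saved = [s for s in range(0, 1002) if should_save_checkpoint(s, 1001)]
    print(saved)
```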
