How to solve the problem of "summary" errors during training? #99

sunruina2 opened this issue Oct 17, 2019 · 0 comments

sunruina2 commented Oct 17, 2019

epoch 0, total_step 90180, total loss is 15.06 , inference loss is 8.29, weight deacy loss is 6.77, training accuracy is 0.312500, time 124.800 samples/sec
epoch 0, total_step 90200, total loss is 15.34 , inference loss is 8.57, weight deacy loss is 6.77, training accuracy is 0.343750, time 132.006 samples/sec
epoch 0, total_step 90220, total loss is 14.04 , inference loss is 7.27, weight deacy loss is 6.77, training accuracy is 0.328125, time 123.523 samples/sec
epoch 0, total_step 90240, total loss is 17.67 , inference loss is 10.90, weight deacy loss is 6.77, training accuracy is 0.281250, time 130.974 samples/sec
epoch 0, total_step 90260, total loss is nan , inference loss is nan, weight deacy loss is nan, training accuracy is 0.000000, time 128.621 samples/sec
epoch 0, total_step 90280, total loss is nan , inference loss is nan, weight deacy loss is nan, training accuracy is 0.000000, time 133.669 samples/sec
Traceback (most recent call last):
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1
[[{{node resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1}} = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1/tag, resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma/read/_1409)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_nets.py", line 210, in <module>
summary_op_val = sess.run(summary_op, feed_dict=feed_dict)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1
[[node resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1 (defined at train_nets.py:161) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1/tag, resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma/read/_1409)]]

Caused by op 'resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1', defined at:
File "train_nets.py", line 161, in <module>
summaries.append(tf.summary.histogram(var.op.name, var))
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/summary/summary.py", line 187, in histogram
tag=tag, values=values, name=scope)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 284, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Nan in summary histogram for: resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1
[[node resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1 (defined at train_nets.py:161) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1/tag, resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma/read/_1409)]]
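
Note that in the log above the losses are already nan at total_step 90260, before the histogram summary raises, so the summary error looks like a symptom of the diverged loss rather than the root problem. A minimal sketch of a guard that could run before `sess.run(summary_op, ...)` is below. It assumes the loss values are available in the training loop as plain Python floats; `first_non_finite` and the dictionary keys are hypothetical names, not part of train_nets.py. It only avoids the crash and does not address why the loss diverged.

```python
import math


def first_non_finite(named_values):
    """Return the name of the first NaN/inf value, or None if all are finite."""
    for name, value in named_values.items():
        if not math.isfinite(value):
            return name
    return None


# Loss values copied from the log above (total_step 90240 and 90260).
healthy = {"total_loss": 17.67, "inference_loss": 10.90, "weight_decay_loss": 6.77}
diverged = {
    "total_loss": float("nan"),
    "inference_loss": float("nan"),
    "weight_decay_loss": float("nan"),
}

for step, losses in ((90240, healthy), (90260, diverged)):
    bad = first_non_finite(losses)
    if bad is None:
        # Hypothetical placement: this is where summary_op would be run.
        print("step %d: losses are finite, summary can be written" % step)
    else:
        # Skip the summary; stop training or restore an earlier checkpoint.
        print("step %d: %s is not finite, skipping summary" % (step, bad))
```

With a check like this, the run could stop or fall back to an earlier checkpoint at the first non-finite loss instead of failing later inside the histogram op.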
