Bias not found in checkpoint #7

Closed
abhisuri97 opened this issue Jun 20, 2017 · 18 comments

@abhisuri97

I’m having some problems running the Pretrained-Show-and-Tell-model.
I am using TensorFlow v1.2 and Python 2.7 (and I have checked out the latest versions of your repo and the im2txt model repo).
I am getting the following error:

NotFoundError (see above for traceback): Key lstm/basic_lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2_380 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_380/tensor_names, save/RestoreV2_380/shape_and_slices)]]

My full trace is below:

(venv) abhi@Abhinavs-MacBook-Pro ~/Desktop/Various_dev_projects/testing/models/im2txt (master) $ bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --vocab_file=${VOCAB_FILE} \
  --input_files=${IMAGE_FILE}


INFO:tensorflow:Building model.
INFO:tensorflow:Initializing vocabulary from file: ptrain/word_counts.txt
INFO:tensorflow:Created vocabulary with 11520 words
INFO:tensorflow:Running caption generation on 1 files matching images/1.jpg
2017-06-19 22:24:00.000062: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 22:24:00.000085: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 22:24:00.000090: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-19 22:24:00.000094: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Loading model from checkpoint: ptrain/model.ckpt-2000000
INFO:tensorflow:Restoring parameters from ptrain/model.ckpt-2000000
2017-06-19 22:24:02.932954: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key lstm/basic_lstm_cell/bias not found in checkpoint
2017-06-19 22:24:02.933737: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key lstm/basic_lstm_cell/kernel not found in checkpoint
Traceback (most recent call last):
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 85, in <module>
    tf.app.run()
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 65, in main
    restore_fn(sess)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 96, in _restore_fn
    saver.restore(sess, checkpoint_path)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key lstm/basic_lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2_380 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_380/tensor_names, save/RestoreV2_380/shape_and_slices)]]

Caused by op u'save/RestoreV2_380', defined at:
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 85, in <module>
    tf.app.run()
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 51, in main
    FLAGS.checkpoint_path)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 116, in build_graph_from_config
    saver = tf.train.Saver()
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 640, in restore_v2
    dtypes=dtypes, name=name)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/abhi/Desktop/Various_dev_projects/testing/models/im2txt/venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()
@KranthiGV
Owner

@abhisuri97
That's very unusual.
Are you sure the checkpoint files are in place?
Run ls -l --block-size=K in the ptrain folder and post the output here.

@abhisuri97
Author

Here is the output I get from that command:

ls: illegal option -- -
usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

The regular ls -l command outputs the following:

-rw-r--r--@ 1 abhi  staff  149002244 Jun 19 22:21 model.ckpt-2000000.data-00000-of-00001
-rw-r--r--  1 abhi  staff      16876 Jun 19 22:21 model.ckpt-2000000.index
-rw-r--r--  1 abhi  staff     121881 Jun 19 22:21 word_counts.txt

Am I missing a file or something here?
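
For anyone checking the same thing: a checkpoint in this format is normally an .index file plus one or more .data-NNNNN-of-MMMMM shards that share a prefix, which matches the listing above. A minimal sketch of that check in plain Python 3, with no TensorFlow needed. The helper name checkpoint_files is made up for illustration, and the path in the usage part is just the one from the log above.

```python
import glob
import os


def checkpoint_files(prefix):
    """Return (index_path_or_None, sorted_data_shards) for a checkpoint prefix.

    `prefix` is the path without any suffix, e.g. "ptrain/model.ckpt-2000000".
    This is a hypothetical helper, not part of TensorFlow.
    """
    index_path = prefix + ".index"
    index = index_path if os.path.isfile(index_path) else None
    # glob.escape keeps special characters in the prefix from acting as wildcards.
    shards = sorted(glob.glob(glob.escape(prefix) + ".data-*-of-*"))
    return index, shards


if __name__ == "__main__":
    # Path taken from the log above; adjust it to your own layout.
    index, shards = checkpoint_files("ptrain/model.ckpt-2000000")
    print("index file:", index)
    print("data shards:", shards)
```

If both parts are present, as they are here, the files themselves are probably not the problem.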

@DominikFilipiak

DominikFilipiak commented Jun 20, 2017

@abhisuri97, I tried to run this model on the latest version of TF (1.2.0) and encountered the same error:
Key lstm/basic_lstm_cell/kernel not found in checkpoint
I found @cshallue's answer, but the original version didn't work for me. After some digging, I changed the values in vars_to_rename, and that did the trick.

import tensorflow as tf

# Checkpoint prefixes (not file names); adjust both to match your own checkpoint.
OLD_CHECKPOINT_FILE = "model.ckpt-1000000"
NEW_CHECKPOINT_FILE = "model2.ckpt-1000000"

# Name stored in the old checkpoint -> name that TF 1.2 looks for.
vars_to_rename = {
    "lstm/basic_lstm_cell/weights": "lstm/basic_lstm_cell/kernel",
    "lstm/basic_lstm_cell/biases": "lstm/basic_lstm_cell/bias",
}

# Load every tensor from the old checkpoint, keyed by its (possibly renamed) name.
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
  new_name = vars_to_rename.get(old_name, old_name)
  new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

# Write all variables back out under the new names.
init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
  sess.run(init)
  saver.save(sess, NEW_CHECKPOINT_FILE)
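
Before converting, it may help to confirm that the checkpoint really contains the old names, since the script above only changes keys that appear in vars_to_rename. A minimal sketch, assuming the variable names have already been read from the checkpoint. The helper applicable_renames and the sample name list are made up for illustration. The commented lines show how the names could be read in TensorFlow versions that provide tf.train.list_variables.

```python
RENAMES = {
    "lstm/basic_lstm_cell/weights": "lstm/basic_lstm_cell/kernel",
    "lstm/basic_lstm_cell/biases": "lstm/basic_lstm_cell/bias",
}


def applicable_renames(checkpoint_names, rename_map):
    """Return the entries of rename_map whose old name is in the checkpoint."""
    present = set(checkpoint_names)
    return {old: new for old, new in rename_map.items() if old in present}


if __name__ == "__main__":
    # With TensorFlow installed, the names could be read like this
    # (assumption: your version has tf.train.list_variables):
    #   import tensorflow as tf
    #   names = [n for n, _shape in tf.train.list_variables("ptrain/model.ckpt-2000000")]
    # Made-up sample list standing in for a real checkpoint:
    names = ["lstm/basic_lstm_cell/weights", "lstm/basic_lstm_cell/biases", "global_step"]
    for old, new in sorted(applicable_renames(names, RENAMES).items()):
        print(old, "->", new)
```

If nothing is printed for your checkpoint, the names in it differ from the ones in this thread and the mapping would need to be adapted.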

@abhisuri97
Author

@mimol that worked. Thank you so much!

@KranthiGV
Owner

@mimol
Thank you so much!
Will update the readme with this fix! :)

@PapaMadeleine2022

@mimol your code finally solved my problem. Thank you very much!

@eyaler

eyaler commented Apr 7, 2018

Thanks. Please add this to the readme.

@monajalal

monajalal commented Apr 19, 2018

@0xDFDFDF What if we can't go back to TensorFlow 1.1? Or how can I install that version, given the error below?
[jalal@goku CharLSTM]$ conda install tensorflow-gpu==1.1.0
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:

  • tensorflow-gpu-base -> cudatoolkit=8.0
  • tensorflow-gpu-base -> cudnn=7
  • tensorflow-gpu==1.1.0 -> cudnn==5.1
    Use "conda info " to see the dependencies for each package.

I can't downgrade my cudnn and cudatoolkit because of system-wide settings.

@monajalal

monajalal commented Apr 19, 2018

@0xDFDFDF where did you apply these changes? Could you give some more pointers on where to start? I am using CharLSTM and got this error: charlesashby/CharLSTM#9

@danlou

danlou commented Aug 22, 2018

To anyone wondering, I've just run @0xDFDFDF's script (#7 (comment)) using TensorFlow v1.10.0 and it worked fine.

@ndwuhuangwei

@abhisuri97 could you tell me what @mimol said? I just can't see any answer from him

@DominikFilipiak

@ndwuhuangwei that was my old login, so you are perhaps looking for this answer: #7 (comment)

@maoao686868

@abhisuri97 So how did you finally solve the problem? I am also very troubled by it. I need your help, and I can't see what mimol said.

@abhisuri97
Author

@maoao686868 As @DominikFilipiak mentioned earlier, the code is here: #7 (comment). Of course, I haven't looked at this codebase for a while, so I am not sure whether that is still a valid solution.

@maoao686868

@abhisuri97, thanks for your reply, but my error is "biases not found in checkpoint", not "kernel not found in checkpoint". I look forward to your answer.

@abhisuri97
Author

abhisuri97 commented Nov 13, 2020

@maoao686868 It seems like this line "lstm/basic_lstm_cell/biases": "lstm/basic_lstm_cell/bias", in the answer I linked should take care of that; however, I don't recall coming across a "biases not found in checkpoint" error.
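
One possible reading, which nobody in this thread has confirmed: the key named in the error is the one the graph is asking for, so a "biases not found" error could mean the graph was built with the older names while the checkpoint already stores the newer ones. In that case the mapping would have to run the other way round. A small sketch of flipping it, where invert_renames is a hypothetical helper and the result would replace vars_to_rename in the linked script:

```python
VARS_TO_RENAME = {
    "lstm/basic_lstm_cell/weights": "lstm/basic_lstm_cell/kernel",
    "lstm/basic_lstm_cell/biases": "lstm/basic_lstm_cell/bias",
}


def invert_renames(rename_map):
    """Swap keys and values so the mapping goes from new names to old names."""
    inverted = {new: old for old, new in rename_map.items()}
    if len(inverted) != len(rename_map):
        # Two old names mapped to the same new name, so the flip is ambiguous.
        raise ValueError("rename map is not one-to-one and cannot be inverted")
    return inverted


if __name__ == "__main__":
    for name_in_checkpoint, name_wanted in sorted(invert_renames(VARS_TO_RENAME).items()):
        print(name_in_checkpoint, "->", name_wanted)
```

Listing the names actually stored in the checkpoint first would show which direction applies.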

@BassantTolba1234

@DominikFilipiak, please: I'm using a pretrained model for my new model, but the same error (about bias not existing in the checkpoint) appears.
I tried your solution above, #7 (comment), but it says that there is no file called model.ckpt-100000.
Could you kindly help me?
Here is a picture of my three saved pretrained models.
I'm waiting for your reply.
Thanks in advance.
image

@BassantTolba1234

Please, all: I'm using a pretrained model for my new model, but the same error (about bias not existing in the checkpoint) appears.
I tried the solution above, #7 (comment), but it says that there is no file called model.ckpt-100000.
Could you kindly help me?
Here is a picture of my three saved pretrained models.
I'm waiting for your reply.
Thanks in advance.
image
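
A note on the "no file called ..." message: OLD_CHECKPOINT_FILE in the linked script is a checkpoint prefix, and the value there is only an example, so it has to be changed to match the checkpoint that is actually on disk. A minimal sketch for discovering the available prefixes in a folder, assuming the checkpoints are stored with an .index file as in the listing earlier in this thread. The helper name find_checkpoint_prefixes and the "ptrain" directory are placeholders.

```python
import glob
import os


def find_checkpoint_prefixes(directory):
    """Return sorted checkpoint prefixes found in `directory`.

    A prefix is the path of an .index file with the ".index" suffix removed.
    This is a hypothetical helper, not a TensorFlow API.
    """
    suffix = ".index"
    pattern = os.path.join(glob.escape(directory), "*" + suffix)
    return sorted(path[: -len(suffix)] for path in glob.glob(pattern))


if __name__ == "__main__":
    # "ptrain" is the folder name used earlier in this thread; use your own.
    for prefix in find_checkpoint_prefixes("ptrain"):
        print(prefix)
```

Any prefix printed here is a value that could be tried for OLD_CHECKPOINT_FILE.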


10 participants