Running the test code #6

Closed
StevenPuttemans opened this issue Sep 3, 2018 · 11 comments

@StevenPuttemans

So I set up a Docker container with an exact copy of your code.
The configuration file contains:

# Parameter section begins here. Edit to change number of test scenes, which model to use, output path.
MAX_NUM_TEST_SCENES=1
NUM_HIERARCHY_LEVELS=3
BASE_OUTPUT_DIR=../results

# Fill in path to test scenes
TEST_SCENES_PATH_3='../data/vox19'
TEST_SCENES_PATH_2='../data/vox9'
TEST_SCENES_PATH_1='../data/vox5'

# Fill in model to use here
PREDICT_SEMANTICS=0
HIERARCHY_LEVEL_3_MODEL='../models/completion/hierarchy1of3'
HIERARCHY_LEVEL_2_MODEL='../models/completion/hierarchy2of3'
HIERARCHY_LEVEL_1_MODEL='../models/completion/hierarchy3of3'

# Specify output folders for each hierarchy level.
OUTPUT_FOLDER_3=${BASE_OUTPUT_DIR}/vis_level3
OUTPUT_FOLDER_2=${BASE_OUTPUT_DIR}/vis_level2
OUTPUT_FOLDER_1=${BASE_OUTPUT_DIR}/vis_level1

When I run the shell script from the src folder, processing of the first hierarchy level starts, but I then get an error (the traceback ends in saver.restore):

Processing hierarchy level 3, scene 1 of 1: ../data/vox19/e582d82458819d07c184a49feac3ca85__0__.tfrecords.
Traceback (most recent call last):
  File "complete_scan.py", line 394, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "complete_scan.py", line 325, in main
    assign_fn(session)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 697, in callback
    saver.restore(session, model_path)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1713, in restore
    raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.

In my opinion this means that the paths are configured incorrectly somewhere. Any clues on where to look?
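
One quick way to test the path hypothesis is to resolve each relative path from the config against the folder the script is launched from and see which ones exist. The sketch below does only that. `check_paths` is a hypothetical helper, not part of ScanComplete, and the path list is copied from the config above.

```python
import os


def check_paths(base_dir, relative_paths):
    """Map each configured path to a tuple (absolute path, exists?)."""
    report = {}
    for rel in relative_paths:
        absolute = os.path.normpath(os.path.join(base_dir, rel))
        report[rel] = (absolute, os.path.exists(absolute))
    return report


if __name__ == "__main__":
    # Paths copied from the config above; run this from the src folder.
    configured = [
        "../data/vox19",
        "../data/vox9",
        "../data/vox5",
        "../models/completion/hierarchy1of3",
        "../models/completion/hierarchy2of3",
        "../models/completion/hierarchy3of3",
    ]
    report = check_paths(os.getcwd(), configured)
    for rel in configured:
        absolute, exists = report[rel]
        print("{} -> {} [{}]".format(rel, absolute, "ok" if exists else "MISSING"))
```

If every path is reported as present, the paths themselves are probably fine and the problem lies in what the model folders contain.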

@speed8928

I ran it and got the same error. Have you solved it?

@StevenPuttemans
Author

Since I am still posting here: nope.

  • Changed the TensorFlow version back from 1.10 to 1.3: same issue.
  • This makes me believe the problem is still somewhere in the paths and not in the actual TensorFlow code behind it.

Will keep trying things out, we really need this implementation to work :D

@StevenPuttemans
Author

StevenPuttemans commented Sep 4, 2018

I traced the issue to the following lines. This is with the CPU version of TensorFlow; I am going to check whether the GPU version has the same issue.

To me it seems that the lines below

# Use the explicitly requested checkpoint if given, else the latest one in the model folder.
if FLAGS.model_checkpoint:
      checkpoint_path = os.path.join(model_path, FLAGS.model_checkpoint)
else:
      checkpoint_path = tf.train.latest_checkpoint(model_path)
assign_fn = tf.contrib.framework.assign_from_checkpoint_fn(
      checkpoint_path, tf.contrib.framework.get_variables_to_restore())

are the actual issue; I am guessing this is where the pretrained checkpoints of the model are read from the folder. However, it seems that after this block, in the first iteration, checkpoint_path is still None, so calling assign_fn fails because there is no checkpoint to restore.
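
If that guess is right, the failure could be caught one step earlier, where the cause is still visible: tf.train.latest_checkpoint is documented to return None when it finds no checkpoint. A minimal sketch, assuming the lookup result is passed through a small guard before assign_from_checkpoint_fn is built. `require_checkpoint` is a hypothetical helper, not a function in complete_scan.py or TensorFlow.

```python
def require_checkpoint(checkpoint_path, model_path):
    """Return checkpoint_path unchanged, or fail with a message naming the folder."""
    if not checkpoint_path:
        raise IOError(
            "No checkpoint found for model folder {!r}: the lookup returned {!r}."
            .format(model_path, checkpoint_path))
    return checkpoint_path
```

In the snippet above it would wrap the lookup, for example `checkpoint_path = require_checkpoint(tf.train.latest_checkpoint(model_path), model_path)`, so the error names the model folder instead of surfacing later inside saver.restore.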

Will keep you posted on the evolution.

@StevenPuttemans
Author

Same issue for GPU version.

All subsequent errors occur because the first run cannot save its result, so the next hierarchy level has nothing to pick up from the previous one.

@StevenPuttemans
Author

Next update: it seems that the issue is indeed that the checkpoint files cannot be read. I tried providing:

  • The path: /home/github/ScanComplete/models/completion/hierarchy3of3/
  • The path + modelname: /home/github/ScanComplete/models/completion/hierarchy3of3/model.ckpt.data-00000-of-00001

But with both, the latest-checkpoint function returns None for the checkpoint path.
@angeladai, could you give some pointers on this?
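
Neither attempt matches what the two code paths appear to expect. tf.train.latest_checkpoint takes a directory, while Saver.restore takes a checkpoint prefix such as model.ckpt rather than the name of a .data shard. The sketch below derives the prefix from a shard, index, or meta filename. `checkpoint_prefix` is a hypothetical helper, and the regular expression only covers the usual suffix shapes.

```python
import re

# Suffixes that sit behind the checkpoint prefix in the file names on disk.
_SUFFIX = re.compile(r"\.(data-\d{5}-of-\d{5}|index|meta)$")


def checkpoint_prefix(filename):
    """Strip a .data-XXXXX-of-XXXXX, .index or .meta suffix, if present."""
    return _SUFFIX.sub("", filename)


if __name__ == "__main__":
    shard = ("/home/github/ScanComplete/models/completion/"
             "hierarchy3of3/model.ckpt.data-00000-of-00001")
    print(checkpoint_prefix(shard))
```

Judging by the snippet quoted earlier, setting the model_checkpoint flag to the bare prefix would skip the latest-checkpoint lookup altogether, though that is an assumption about how the flag is wired up.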

@StevenPuttemans
Author

I found this page where the following is stated

So, to summarize, Tensorflow models for versions greater than 0.10 look like this:
checkpoint file
*.data file
*.index file
*.meta file

This basically means that the TF 1.3 interface expects a checkpoint file pointing to the last saved state. I will now look into how to generate one.
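
To see which of those four pieces a model folder actually has, a listing like the following may help. It is a stdlib-only sketch. `inspect_model_dir` is a hypothetical name, and the model.ckpt prefix is assumed from the file name mentioned earlier in this thread.

```python
import os


def inspect_model_dir(model_dir, prefix="model.ckpt"):
    """Report which of the four expected checkpoint pieces exist in model_dir."""
    names = os.listdir(model_dir) if os.path.isdir(model_dir) else []
    return {
        "checkpoint": "checkpoint" in names,
        "data": any(name.startswith(prefix + ".data-") for name in names),
        "index": (prefix + ".index") in names,
        "meta": (prefix + ".meta") in names,
    }


if __name__ == "__main__":
    # Folder name copied from the config in the first post.
    print(inspect_model_dir("../models/completion/hierarchy1of3"))
```

A folder that reports data and index as present but checkpoint as missing would fit the behaviour described above.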

@StevenPuttemans
Author

OK, solution found: each model folder needs a file called checkpoint with the following content:

model_checkpoint_path: "model.ckpt"
all_model_checkpoint_paths: "model.ckpt"

Once you have that, it works glamorously :) @angeladai, I suggest you add those files to help people in the future =/
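
Creating the file by hand in three folders is easy to get slightly wrong, so here is a small sketch that writes it. `write_checkpoint_state` is a hypothetical helper; the two-line content and the model.ckpt prefix are the ones given above, and the folder names are taken from the config in the first post.

```python
import os

# The two lines described above, with the checkpoint prefix filled in.
CHECKPOINT_STATE = (
    'model_checkpoint_path: "{prefix}"\n'
    'all_model_checkpoint_paths: "{prefix}"\n'
)


def write_checkpoint_state(model_dir, prefix="model.ckpt", overwrite=False):
    """Write the 'checkpoint' state file into model_dir and return its path."""
    target = os.path.join(model_dir, "checkpoint")
    if os.path.exists(target) and not overwrite:
        return target  # leave an existing state file untouched
    with open(target, "w") as handle:
        handle.write(CHECKPOINT_STATE.format(prefix=prefix))
    return target


if __name__ == "__main__":
    # Run from the src folder so the relative paths match the config.
    for name in ("hierarchy1of3", "hierarchy2of3", "hierarchy3of3"):
        model_dir = os.path.join("..", "models", "completion", name)
        if os.path.isdir(model_dir):
            print("wrote {}".format(write_checkpoint_state(model_dir)))
        else:
            print("skipped, not a directory: {}".format(model_dir))
```

The file goes inside each model folder, next to the model.ckpt.* files. The path in it can stay relative because TensorFlow, as far as is known here, resolves it against the folder that contains the checkpoint file.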

@livc

livc commented Apr 11, 2019

good job, dude!

by the way,

HIERARCHY_LEVEL_3_MODEL='../models/completion/hierarchy1of3'
HIERARCHY_LEVEL_2_MODEL='../models/completion/hierarchy2of3'
HIERARCHY_LEVEL_1_MODEL='../models/completion/hierarchy3of3'

it seems this should be

HIERARCHY_LEVEL_3_MODEL='../models/completion/hierarchy3of3'
HIERARCHY_LEVEL_2_MODEL='../models/completion/hierarchy2of3'
HIERARCHY_LEVEL_1_MODEL='../models/completion/hierarchy1of3'

@StevenPuttemans
Author

@livc that's only an impression ;) We eventually changed the inner concepts, but somehow, for some weird reason, they switched it up... Be aware of that when running the default model. In the long run we replaced everything with vox19cm, vox9cm and vox5cm labels instead of 1of3, 2of3 and 3of3.

@euzer

euzer commented Jul 16, 2019

Hi everyone!
Sorry to bother you, but I need a little help.
I am getting errors when running sh run_complete_scans_hierarchical.sh.
I have specified the paths to the model and the test scenes in the run_complete_scans_hierarchical.sh file.
Could you please take a look at this error?
I think the problem is due to the checkpoint file (.ckpt), isn't it? Where I looked for this file, I found .index and .meta files after ckpt, as @StevenPuttemans described.

Traceback (most recent call last):
  File "complete_scan.py", line 396, in <module>
    tf.app.run(main)
  File "/home/shanshan/anaconda3/envs/tf2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "complete_scan.py", line 327, in main
    assign_fn(session)
  File "/home/shanshan/anaconda3/envs/tf2.7/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 750, in callback
    saver.restore(session, model_path)
  File "/home/shanshan/anaconda3/envs/tf2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1268, in restore
    + compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: /home/shanshan/ScanComplete/src/train/train_v003/
run_complete_scans_hierarchical.sh: 53: run_complete_scans_hierarchical.sh: count++: not found
run_complete_scans_hierarchical.sh: 54: run_complete_scans_hierarchical.sh: count: not found
Processing hierarchy level 3, scene 1 of 1: /home/shanshan/Documents/vox19/e66305ae98faab6b809ed2b2cbe82bed__0
_.tfrecords.

Use eager execution and:
tf.data.TFRecordDataset(path)
Traceback (most recent call last):
  File "complete_scan.py", line 396, in <module>
    tf.app.run(main)
  File "/home/shanshan/anaconda3/envs/tf2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "complete_scan.py", line 302, in main
    FLAGS.num_quant_levels, FLAGS.p_norm, FLAGS.predict_semantics)
  File "complete_scan.py", line 130, in read_inputs
    assert os.path.isfile(previous_file)
AssertionError
run_complete_scans_hierarchical.sh: 107: run_complete_scans_hierarchical.sh: count++: not found
run_complete_scans_hierarchical.sh: 108: run_complete_scans_hierarchical.sh: count: not found
Processing hierarchy level 1, scene 1 of 1: /home/shanshan/Documents/vox5/e78488fccfc6871503d32753d43aa6ce__0
_.tfrecords.
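
The count++: not found lines look unrelated to the checkpoint problem. That message shape is what a POSIX sh such as dash typically prints when it meets a bash-only arithmetic command like ((count++)), so launching the script with bash instead of sh may be enough to make them go away. The sketch below flags such lines in a script. `find_bash_arithmetic` is a hypothetical helper, and the sample text is made up for illustration, not copied from run_complete_scans_hierarchical.sh.

```python
import re

# A line consisting only of a (( ... )) command; $(( ... )) expansions are POSIX and not flagged.
_BASH_ARITH = re.compile(r"^\s*\(\(.*\)\)\s*;?\s*$")


def find_bash_arithmetic(script_text):
    """Return (line number, stripped line) for each standalone (( ... )) command."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if _BASH_ARITH.match(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "count=0\n((count++))\necho $((count + 1))\n"  # illustrative only
    for number, line in find_bash_arithmetic(sample):
        print("line {}: {}".format(number, line))
```

If the real script contains lines like these at the reported line numbers (53, 54, 107, 108), that would support the dash explanation.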

@euzer

euzer commented Jul 16, 2019

@StevenPuttemans wrote:

Ok solution found, you need for each model folder a file called checkpoint with the following content

model_checkpoint_path: "model.ckpt"
all_model_checkpoint_paths: "model.ckpt"

Once you have that, it works glamoursly :) @angeladai I suggest you add those to help people in the future =/

Hi @StevenPuttemans,
does that mean copying that checkpoint file into each of the model folders?
