Use Context Path In TensorflowV2ModelStep Saving #12
Conversation
@alexbrillant please consider fixing this so the issue can be resolved.
```diff
  step.setup()
- step.checkpoint.restore(step.checkpoint_manager.latest_checkpoint)
+ step.checkpoint.restore(checkpoint_manager.latest_checkpoint)
```
We need to take the different summary ids into account to solve the issue, because if the hyperparameters are different we should not restore. This doesn't seem right yet.
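A hedged sketch of the guard being discussed here: before restoring, compare a deterministic hash of the step's current hyperparameters against the hash stored at save time, and skip the restore on a mismatch. The helper names, the `step.get_hyperparams()` accessor, and the hash layout are assumptions for illustration, not Neuraxle's actual API.

```python
import hashlib
import json


def hyperparams_summary_id(hyperparams: dict) -> str:
    """Hypothetical helper: deterministic short hash of a hyperparameter dict."""
    # sort_keys=True makes the encoding independent of dict insertion order.
    encoded = json.dumps(hyperparams, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:16]


def maybe_restore(step, checkpoint_manager, saved_summary_id: str) -> bool:
    """Only restore the checkpoint if it was saved under the same hyperparams."""
    current_id = hyperparams_summary_id(step.get_hyperparams())  # assumed accessor
    if current_id != saved_summary_id:
        return False  # hyperparams changed: the old weights do not apply
    step.checkpoint.restore(checkpoint_manager.latest_checkpoint)
    return True
```

The same `hyperparams_summary_id` string could double as the directory suffix when saving, so the save and load paths agree by construction.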
I have a few questions before merging. Depending on the answers, more fixes may be required.
```python
self,
savers=[step_saver],
hyperparams=self.__class__.HYPERPARAMS,
hyperparams_space=self.__class__.HYPERPARAMS_SPACE
```
Can you explain this change to me, please? And why isn't the TF saver here?
```python
step.checkpoint_manager.save()
checkpoint_manager = tf.train.CheckpointManager(
    step.checkpoint,
    context.get_path(),
```
Maybe `os.path.join(context.get_path(), summary_id)`?

Otherwise, how do we differentiate between different runs with different hyperparams when saving and reloading? It should be simple to make this compatible with the rest of Neuraxle's mechanics here.
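As a minimal sketch of that suggestion: if `summary_id` is a string derived from the step's hyperparameters, the checkpoint location becomes one subdirectory per hyperparameter configuration, so `latest_checkpoint` in one run's directory can never pick up another run's weights. The function name and paths below are illustrative assumptions, not the actual Neuraxle API.

```python
import os


def checkpoint_dir(context_path: str, summary_id: str) -> str:
    """One checkpoint subdirectory per hyperparameter configuration."""
    return os.path.join(context_path, summary_id)


# Two runs with different hyperparams would save to disjoint directories,
# which a tf.train.CheckpointManager could then use as its `directory` argument:
run_a = checkpoint_dir("/tmp/neuraxle/step", "hp_a1b2c3")
run_b = checkpoint_dir("/tmp/neuraxle/step", "hp_d4e5f6")
```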
However, I recognize that from one minibatch to the next the path should remain the same; the path should only change when the hyperparameters change.
Hmm, if the context's path is already unique according to the hyperparameters, then we don't need to add more hashing. Maybe I'm the one not fully understanding here, and it's probably already okay.

A solution would therefore be to make sure that the path in `context.get_path()` is unique according to the hyperparameters of this step and of all the previous ones. If not, we may need to add a `context.get_unique_hp_path()` or something like that.
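A hedged sketch of what such a `get_unique_hp_path()` could compute: the context's base path extended with a hash over the hyperparameters of the current step and every step before it, so a change anywhere upstream yields a fresh directory. Everything besides the idea itself (signature, argument shape) is a hypothetical assumption.

```python
import hashlib
import json
import os


def get_unique_hp_path(base_path: str, steps_hyperparams: list) -> str:
    """Hypothetical: base path plus a hash of all steps' hyperparams so far.

    steps_hyperparams is the ordered list of hyperparameter dicts for this
    step and all preceding steps in the pipeline.
    """
    encoded = json.dumps(steps_hyperparams, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(encoded).hexdigest()[:16]
    return os.path.join(base_path, digest)


path_1 = get_unique_hp_path("/tmp/cache", [{"lr": 0.01}, {"units": 64}])
path_2 = get_unique_hp_path("/tmp/cache", [{"lr": 0.02}, {"units": 64}])  # upstream hp changed
```

Because the hash covers the whole chain of hyperparams, `path_1` and `path_2` differ even though only the first step's hyperparameters changed.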
```diff
@@ -199,8 +192,13 @@ def load_step(self, step: 'Tensorflow2ModelStep', context: 'ExecutionContext') -
     :return: loaded step
     """
     step.is_initialized = False
+    checkpoint_manager = tf.train.CheckpointManager(
+        step.checkpoint,
+        context.get_path(),
```
Same as above.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in the next 180 days. Thank you for your contributions.
@guillaume-chevalier Note: this doesn't take into account the summary id, but this is fine for now.