
Use Context Path In TensorflowV2ModelStep Saving #12

Closed
wants to merge 5 commits

Conversation

alexbrillant
Contributor

@alexbrillant alexbrillant commented Mar 16, 2020

@guillaume-chevalier Note: this doesn't take into account the summary id, but this is fine for now.

Member

@guillaume-chevalier guillaume-chevalier left a comment


@alexbrillant please consider fixing this for the issue to be resolved.

step.setup()
step.checkpoint.restore(step.checkpoint_manager.latest_checkpoint)
step.checkpoint.restore(checkpoint_manager.latest_checkpoint)


We need to take the different summary ids into account to solve the issue: if the hyperparameters are different, we should not restore. This doesn't seem fine yet.

Member

@guillaume-chevalier guillaume-chevalier left a comment


I have a few questions before merging. Depending on the answers, more fixes may be required.

self,
savers=[step_saver],
hyperparams=self.__class__.HYPERPARAMS,
hyperparams_space=self.__class__.HYPERPARAMS_SPACE


Can you explain this change to me, please? And why isn't the tf saver here?

step.checkpoint_manager.save()
checkpoint_manager = tf.train.CheckpointManager(
step.checkpoint,
context.get_path(),


Maybe os.path.join(context.get_path(), summary_id) ?

Otherwise, how do we differentiate between different runs with different hyperparams when saving and reloading ? It should be simple to be compatible with all the rest of the mechanics of Neuraxle here.
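The suggested fix could be sketched like this (an assumption-laden sketch: `checkpoint_dir_for` is a hypothetical helper name, and `context.get_path()` / the summary id come from the discussion above, not from a verified API):

```python
import os

def checkpoint_dir_for(context_path: str, summary_id: str) -> str:
    # Hypothetical helper: one checkpoint subdirectory per summary id,
    # so runs with different hyperparameters save to different paths
    # and cannot overwrite or wrongly restore each other.
    return os.path.join(context_path, summary_id)
```

The resulting path would then be passed as the directory argument of `tf.train.CheckpointManager` instead of the bare `context.get_path()`.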


However, I recognize that from one minibatch to the other the path should remain the same, and the path should only change when hyperparameters change.


Hmm, if the context's path is already unique according to the hyperparameters, then we don't need to add more hashing. Maybe it's me who doesn't fully understand here, and it's probably already okay.

A solution would therefore be to make sure that the path in context.get_path() is unique according to the hyperparameters of this step and of all the previous ones. If it isn't, we may need to add context.get_unique_hp_path() or something like that.
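A hypothetical `get_unique_hp_path` could be sketched as follows (everything here is an assumption for illustration, not Neuraxle's actual API): hash the hyperparameters so the path changes whenever they change, but stays stable from one minibatch to the next.

```python
import hashlib
import json
import os

def unique_hp_path(base_path: str, hyperparams: dict) -> str:
    # Hash the hyperparameters deterministically (sorted keys) so that
    # the same hyperparams always map to the same subdirectory, and
    # different hyperparams map to different subdirectories.
    digest = hashlib.sha256(
        json.dumps(hyperparams, sort_keys=True).encode("utf-8")
    ).hexdigest()[:16]
    return os.path.join(base_path, digest)
```

This keeps the checkpoint directory stable across minibatches of the same run while separating runs with different hyperparameters.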

@@ -199,8 +192,13 @@ def load_step(self, step: 'Tensorflow2ModelStep', context: 'ExecutionContext') -
:return: loaded step
"""
step.is_initialized = False
checkpoint_manager = tf.train.CheckpointManager(
step.checkpoint,
context.get_path(),


Same comment as above.

@stale

stale bot commented Nov 23, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in the next 180 days. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Nov 23, 2021
@stale stale bot closed this May 22, 2022
Labels
wontfix This will not be worked on

2 participants