IndexError when max_checkpoints_to_keep=0 in TorchModel's (#3810)
Hello @arunppsg, thank you so much for raising this. I'm looking into this issue.
What do you think?
Something like this:

```python
def save_checkpoint(self,
                    max_checkpoints_to_keep: int = 5,
                    model_dir: Optional[str] = None) -> None:
    """Save a checkpoint to disk.

    Usually you do not need to call this method, since fit() saves checkpoints
    automatically. If you have disabled automatic checkpointing during fitting,
    this can be called to manually write checkpoints.

    Parameters
    ----------
    max_checkpoints_to_keep: int
        the maximum number of checkpoints to keep. Older checkpoints are discarded.
    model_dir: str, default None
        Model directory to save checkpoint to. If None, revert to self.model_dir
    """
    if max_checkpoints_to_keep == 0:
        return
    self._ensure_built()
    if model_dir is None:
        model_dir = self.model_dir
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
```
Or do you think we should instead disable the call in the fit loop, so the method is never entered when checkpointing is effectively off? Something like:

```python
if checkpoint_interval > 0 and current_step % checkpoint_interval == checkpoint_interval - 1 and max_checkpoints_to_keep > 0:
    self.save_checkpoint(max_checkpoints_to_keep)
```
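The two proposals differ in where the guard lives. As a minimal, hypothetical sketch of the call-site guard (`run_steps` and its arguments are illustrative stand-ins, not DeepChem's actual `fit()` implementation), the condition behaves like this:

```python
def run_steps(n_steps, checkpoint_interval, max_checkpoints_to_keep):
    """Toy stand-in for a fit() loop: record which steps would checkpoint."""
    saved_at = []
    for current_step in range(n_steps):
        # Same condition as the proposed guard: additionally requiring a
        # positive max_checkpoints_to_keep means save_checkpoint is never
        # entered when checkpointing is effectively disabled.
        if (checkpoint_interval > 0
                and current_step % checkpoint_interval == checkpoint_interval - 1
                and max_checkpoints_to_keep > 0):
            saved_at.append(current_step)
    return saved_at

print(run_steps(10, 3, 5))  # checkpoints at steps 2, 5, 8
print(run_steps(10, 3, 0))  # max_checkpoints_to_keep=0: no checkpoints
```

Either guard avoids the crash; the early return inside `save_checkpoint` also protects direct callers of the method, while the call-site guard skips the call altogether.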
This sounds like a good idea.
Cool! Let me create a PR!
Steps to reproduce:
The last line raises the following error:
Setting max_checkpoints_to_keep=0 helps to avoid time spent on disk I/O during development of large models.
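To make the failure mode concrete, here is a hedged, self-contained reconstruction: it mirrors how an IndexError can arise when checkpoint paths are generated from `range(max_checkpoints_to_keep)`, but it is not DeepChem's exact `save_checkpoint` implementation.

```python
import os
import tempfile

def save_checkpoint(model_dir, max_checkpoints_to_keep=5):
    """Hypothetical sketch: write the newest checkpoint to the first slot."""
    if max_checkpoints_to_keep == 0:
        return  # proposed fix: nothing to keep, so write nothing
    # Without the guard above, max_checkpoints_to_keep=0 makes this list
    # empty, and paths[0] below raises IndexError.
    paths = [os.path.join(model_dir, 'checkpoint%d.pt' % (i + 1))
             for i in range(max_checkpoints_to_keep)]
    with open(paths[0], 'w') as f:
        f.write('fake checkpoint data')

with tempfile.TemporaryDirectory() as d:
    save_checkpoint(d, max_checkpoints_to_keep=0)  # returns without writing
    print(os.listdir(d))  # -> []
```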