Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add version suffix to saved model checkpoint with a version #5000

Closed
rohitgr7 opened this issue Dec 7, 2020 · 12 comments 路 Fixed by #5008
Closed

Add version suffix to saved model checkpoint with a version #5000

rohitgr7 opened this issue Dec 7, 2020 · 12 comments 路 Fixed by #5008
Labels
checkpointing Related to checkpointing feature Is an improvement or enhancement

Comments

@rohitgr7
Copy link
Contributor

rohitgr7 commented Dec 7, 2020

馃殌 Feature

Currently, if a duplicate filename is found in the checkpoint directory saved by the ModelCheckpoint, then a version is added to it and the old file is kept as it is. Here is an enhancement request to update the old file too with a version.

Before

file.ckpt
file-v0.ckpt
file-v1.ckpt

After

file-v0.ckpt
file-v1.ckpt
file-v2.ckpt

cc @carmocca

@rohitgr7 rohitgr7 added feature Is an improvement or enhancement help wanted Open to be worked on checkpointing Related to checkpointing and removed help wanted Open to be worked on labels Dec 7, 2020
@Borda Borda added this to the 1.1.x milestone Dec 7, 2020
@carmocca
Copy link
Contributor

carmocca commented Dec 7, 2020

On it 馃挭

@carmocca
Copy link
Contributor

carmocca commented Dec 7, 2020

Should I include adding version suffixes to last.ckpt files?

@rohitgr7
Copy link
Contributor Author

rohitgr7 commented Dec 8, 2020

Should I include adding version suffixes to last.ckpt files?

Isn't it saved only once?

@carmocca
Copy link
Contributor

carmocca commented Dec 8, 2020

It is for one run, but if you do a re-run, it gets overwritten

@rohitgr7
Copy link
Contributor Author

rohitgr7 commented Dec 8, 2020

oh... it's a bug then. Yep version suffixes should be added. If I have a file already present in the directory, not saved by pl.ModelCheckpoint, then it should remain untouched.

@carmocca
Copy link
Contributor

carmocca commented Dec 8, 2020

oh... it's a bug then. Yep version suffixes should be added. If I have a file already present in the directory, not saved by pl.ModelCheckpoint, then it should remain untouched.

Okay, let's do it after #5008 is merged then. Created #5030 to track it.

@awaelchli
Copy link
Member

awaelchli commented Dec 9, 2020

@carmocca the filename last.ckpt needs to be kept for easy access, which is what it is for (save_last=True) :)

@ananthsub
Copy link
Contributor

agreed w/ @awaelchli that we shouldn't change the last.ckpt name by adding versions to it. it needs to be deterministic for restarting

@Borda Borda changed the title [Enhancement] Add version suffix to saved model checkpoint with a version. Add version suffix to saved model checkpoint with a version Dec 11, 2020
@Borda
Copy link
Member

Borda commented Dec 11, 2020

@carmocca the filename last.ckpt needs to be kept for easy access, which is what it is for (save_last=True) :)

but if you have in the folder already one last.ckpt so the new one (which you want to use) is last-v0.ckpt
cc: @ananthsub

oh... it's a bug then. Yep version suffixes should be added. If I have a file already present in the directory, not saved by pl.ModelCheckpoint, then it should remain untouched.

Okay, let's do it after #5008 is merged then. Created #5030 to track it.

@rohitgr7 @carmocca seems we have two issues on the same topic, right? let's close one of them...

@carmocca
Copy link
Contributor

@rohitgr7 @carmocca seems we have two issues on the same topic, right? let's close one of them...

This is about version suffixes for normal checkpoints.
#5030 is the same for last checkpoints.

@Borda Borda modified the milestones: 1.1.x, 1.2 Dec 30, 2020
@edenlightning edenlightning modified the milestones: 1.2, 1.3 Feb 8, 2021
@edenlightning edenlightning removed this from the v1.3 milestone Apr 27, 2021
@stale
Copy link

stale bot commented May 30, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label May 30, 2021
@stale stale bot closed this as completed Jun 6, 2021
@carmocca carmocca reopened this Jun 6, 2021
@stale stale bot removed the won't fix This will not be worked on label Jun 6, 2021
@carmocca carmocca added this to the v1.5 milestone Jun 6, 2021
@awaelchli awaelchli modified the milestones: v1.5, v1.6 Nov 4, 2021
@carmocca carmocca modified the milestones: 1.6, future Feb 28, 2022
@carmocca
Copy link
Contributor

Closing as we don't plan to do this since it would be inconsistent with the requirements of #5030 where the original last.ckpt couldn't be renamed for backwards compatibility

@carmocca carmocca removed this from the future milestone Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
checkpointing Related to checkpointing feature Is an improvement or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants