-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Fix ModelCheckpoint default paths #413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ModelCheckpoint default paths #413
Conversation
checkpoints and version file names would just have a number. it's easy to tell what you're looking at with version_ prepended
|
Any changes needed for this to be merged? |
| ) | ||
| else: | ||
| ckpt_path = self.default_save_path | ||
| ckpt_path = "checkpoints" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be:
self.default_save_path
because the user sets this in the trainer as the place where everything saves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about os.path.join(self.default_save_path, "checkpoints")? The point of this PR was to make sure that we're not storing checkpoints in the current directory, because the whole directory gets wiped out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
|
@neggert just two things:
|
| # assert logger.hparams_logged == hparams | ||
| # assert logger.metrics_logged != {} | ||
| # assert logger.finalized_status == "success" | ||
| def test_mlflow_logger(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this uncommented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this PR fixes the bug that led to it being commented out.
| super(ModelCheckpoint, self).__init__() | ||
| if ( | ||
| save_best_only and | ||
| os.path.exists(filepath) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use just os.path.isdir(filepath) instead of os.path.exists(filepath) and os.path.isdir(filepath)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
|
|
||
| @property | ||
| def name(self): | ||
| if self._experiment is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather:
name = self._name if self._experiment is None else self.experiment.name
return name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO the current version is easier to read.
|
@williamFalcon Comments are addressed |
| trainer2 = pickle.loads(pkl_bytes) | ||
| trainer2.logger.log_metrics({"acc": 1.0}) | ||
|
|
||
| testing_utils.clear_save_dir() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an assert to check that the output is as you expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is designed to detect a very particular failure mode when an exception is thrown on the pickle dump/load. No need to assert. All an assert here would test is whether the standard pickle module works correctly.
The test isn't really the focus of this PR. For context, this test was in master for a while, then a tricky interaction of some new (and mostly unrelated) features caused problems when using the default checkpoint saver, default save path, and anything other than the test tube logger. These tests got commented out as a short term fix. I'm just uncommenting them now, since this PR resolves the underlying problem.
I definitely think there are some improvements that could be made to this and other tests, but lets deal with them in a separate issue / PR.
|
@williamFalcon I think this is ready to merge. |
|
GPU tests passed. Waiting on Circle CI |
Fixes #394.
os.path.join(default_save_path, logger.name, logger.version, "checkpoints")if a logger is defined, otherwise"checkpoints".