Contributor

neggert commented Oct 22, 2019

Fixes #394.

  • Add a new (required) name property to LightningLoggerBase and make the existing version property required
  • Have the default ModelCheckpoint save to os.path.join(default_save_path, logger.name, logger.version, "checkpoints") if a logger is defined, otherwise "checkpoints".
  • Add a warning in ModelCheckpoint if filepath already exists and has files in it.
  • Re-instate logger tests
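
One way to picture the "required property" change in the first bullet is with abstract properties. This is only a sketch: LoggerBase, DummyLogger and IncompleteLogger are hypothetical names, and the PR may enforce the requirement differently (for example by raising NotImplementedError) rather than through the abc module.

```python
from abc import ABC, abstractmethod


class LoggerBase(ABC):
    """Hypothetical stand-in for LightningLoggerBase, not the library's class."""

    @property
    @abstractmethod
    def name(self):
        """Experiment name, usable as a directory component."""

    @property
    @abstractmethod
    def version(self):
        """Experiment version, usable as a directory component."""


class DummyLogger(LoggerBase):
    """Implements both required properties, so it can be instantiated."""

    def __init__(self, name, version):
        self._name = name
        self._version = version

    @property
    def name(self):
        return self._name

    @property
    def version(self):
        return self._version


class IncompleteLogger(LoggerBase):
    """Omits `version`, so instantiating it raises TypeError."""

    @property
    def name(self):
        return "incomplete"
```

With this layout a logger that forgets either property fails at construction time instead of later, when the checkpoint path is built.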

neggert changed the title from "Fixe ModelCheckpoint default paths" to "Fix ModelCheckpoint default paths" Oct 22, 2019
Otherwise the checkpoint and version file names would just be a number. It's easier to tell what you're looking at with version_ prepended.
@neggert
Contributor Author

neggert commented Oct 23, 2019

Any changes needed for this to be merged?

            )
        else:
-           ckpt_path = self.default_save_path
+           ckpt_path = "checkpoints"
Contributor

this should be:

self.default_save_path

because the user sets this in the trainer as the place where everything saves

Contributor Author

How about os.path.join(self.default_save_path, "checkpoints")? The point of this PR was to make sure that we're not storing checkpoints in the current directory, because the whole directory gets wiped out.

Contributor Author

Fixed!
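
The path logic discussed in this thread can be sketched roughly as follows. It assumes the fallback ended up as os.path.join(default_save_path, "checkpoints"), as proposed above. The function name default_checkpoint_dir and the str() conversions are assumptions for illustration, not code from the PR.

```python
import os
from types import SimpleNamespace


def default_checkpoint_dir(default_save_path, logger=None):
    """Return the directory that checkpoints would be written to by default."""
    if logger is not None:
        # str() guards against a logger whose version is an integer (assumption).
        return os.path.join(
            default_save_path, str(logger.name), str(logger.version), "checkpoints"
        )
    # No logger: still nest under default_save_path rather than using it directly.
    return os.path.join(default_save_path, "checkpoints")


# Hypothetical logger object exposing only the two properties the path needs.
example_logger = SimpleNamespace(name="my_exp", version=3)
print(default_checkpoint_dir("runs", example_logger))
print(default_checkpoint_dir("runs"))
```

Either way the checkpoints land in a dedicated subdirectory, so cleaning up old checkpoints never touches default_save_path itself or the current working directory.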

@williamFalcon
Contributor

@neggert just two things:

  1. Rebase this branch onto master.
  2. Can you look at the comment I left in the review? Sorry, I forgot to press "Submit review".

# assert logger.hparams_logged == hparams
# assert logger.metrics_logged != {}
# assert logger.finalized_status == "success"
def test_mlflow_logger():
Collaborator

why is this uncommented?

Contributor Author

Because this PR fixes the bug that led to it being commented out.

        super(ModelCheckpoint, self).__init__()
        if (
            save_best_only and
            os.path.exists(filepath) and
Collaborator

use just os.path.isdir(filepath) instead of os.path.exists(filepath) and os.path.isdir(filepath)

Contributor Author

Fixed!
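
The suggestion works because os.path.isdir already returns False for a path that does not exist, so the extra os.path.exists check is redundant. A minimal sketch of the resulting warning check is below. The condition in the diff is truncated, so the os.listdir test is an assumption based on the PR description ("has files in it"), and warn_if_nonempty_dir is a hypothetical helper name.

```python
import os
import tempfile
import warnings


def warn_if_nonempty_dir(filepath, save_best_only=True):
    """Warn if filepath is an existing, non-empty directory.

    Returns True when a warning was issued, False otherwise.
    """
    # isdir() is False for missing paths, so no separate exists() check is needed.
    if save_best_only and os.path.isdir(filepath) and len(os.listdir(filepath)) > 0:
        warnings.warn(
            "Checkpoint directory {} exists and is not empty.".format(filepath)
        )
        return True
    return False


with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, "does_not_exist")
    print(os.path.isdir(missing))         # missing path: not a directory
    print(warn_if_nonempty_dir(tmp_dir))  # directory exists but is empty
```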


    @property
    def name(self):
        if self._experiment is None:
Collaborator

rather:

name = self._name if self._experiment is None else self.experiment.name
return name

Contributor Author

IMHO the current version is easier to read.

@neggert
Contributor Author

neggert commented Oct 29, 2019

@williamFalcon Comments are addressed

trainer2 = pickle.loads(pkl_bytes)
trainer2.logger.log_metrics({"acc": 1.0})

testing_utils.clear_save_dir()
Collaborator

add an assert to check that the output is as you expect

Contributor Author

This test is designed to detect one particular failure mode: an exception thrown during the pickle dump/load. There is no need to assert. All an assert here would test is whether the standard pickle module works correctly.

The test isn't really the focus of this PR. For context, this test was in master for a while, then a tricky interaction of some new (and mostly unrelated) features caused problems when using the default checkpoint saver, default save path, and anything other than the test tube logger. These tests got commented out as a short-term fix. I'm just uncommenting them now, since this PR resolves the underlying problem.

I definitely think there are some improvements that could be made to this and other tests, but let's deal with them in a separate issue / PR.
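
To illustrate the kind of smoke test described here, the sketch below pickles an object that holds an unpicklable handle and then uses the restored copy. FakeLogger and pickle_round_trip_smoke_check are hypothetical names, and the lambda is only a stand-in for whatever resource a real logger holds. None of this is taken from the PR's test code.

```python
import pickle


class FakeLogger:
    """Hypothetical logger that lazily creates a handle which cannot be pickled."""

    def __init__(self):
        self._experiment = None
        self.metrics = []

    @property
    def experiment(self):
        if self._experiment is None:
            # A lambda stands in for an unpicklable resource (client, file handle).
            self._experiment = lambda metrics: self.metrics.append(metrics)
        return self._experiment

    def log_metrics(self, metrics):
        self.experiment(metrics)

    def __getstate__(self):
        # Drop the unpicklable handle; it is rebuilt lazily after loading.
        state = self.__dict__.copy()
        state["_experiment"] = None
        return state


def pickle_round_trip_smoke_check():
    logger = FakeLogger()
    logger.log_metrics({"acc": 0.5})  # forces creation of the unpicklable handle
    restored = pickle.loads(pickle.dumps(logger))
    # No assert is needed for the failure mode in question:
    # any exception raised above or below fails the check.
    restored.log_metrics({"acc": 1.0})
    return restored
```

Without the __getstate__ override, pickle.dumps would raise on the lambda, which is the sort of regression such a test catches without any explicit assert.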

@neggert
Contributor Author

neggert commented Nov 1, 2019

@williamFalcon I think this is ready to merge.

@williamFalcon
Contributor

GPU tests passed. Waiting on Circle CI

Development

Successfully merging this pull request may close these issues.

ModelCheckpoint wipes out current directory

3 participants