Status: Closed
Labels: bug
Description
@williamFalcon I think this is what you were seeing in #389. If we let the `Trainer` create the default `ModelCheckpoint` callback and don't use `TestTubeLogger`, the checkpoint prefix ends up being set to the current directory. Then, when `ModelCheckpoint` tries to clean up previous checkpoints, it wipes out everything in the current directory.
Relevant bits of code:
- `default_save_path` set to `os.getcwd()`: https://github.com/williamFalcon/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L151
- `ModelCheckpoint` falls back to `default_save_path`: https://github.com/williamFalcon/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L280
- `ModelCheckpoint` blows away pre-existing files in the checkpoint directory (sketched below): https://github.com/williamFalcon/pytorch-lightning/blob/master/pytorch_lightning/callbacks/pt_callbacks.py#L216
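To make the failure mode concrete, here is a rough sketch of the kind of cleanup logic at issue. It is not the actual `pt_callbacks.py` code, just an illustration of why a checkpoint directory that silently falls back to `os.getcwd()` is dangerous:

```python
import os

def naive_checkpoint_cleanup(checkpoint_dir):
    """Illustrative only: delete pre-existing files in the checkpoint directory.

    If checkpoint_dir has fallen back to os.getcwd(), this removes every
    regular file in the user's working directory, not just stale checkpoints,
    which is the data-loss behaviour reported here.
    """
    for name in os.listdir(checkpoint_dir):
        path = os.path.join(checkpoint_dir, name)
        if os.path.isfile(path):
            os.remove(path)  # destructive when checkpoint_dir == os.getcwd()
```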
The most obvious fix is to provide a better default checkpoint prefix, but there would still be a lurking footgun for a user who sets `default_save_path` incorrectly. Should we maybe insist that the checkpoint directory not exist before training starts, or that it be empty?
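One possible shape for that guard, purely as a sketch of the idea (the function name and exact policy here are illustrative, not part of the library):

```python
import os

def ensure_safe_checkpoint_dir(dirpath):
    """Refuse to use a non-empty existing directory for checkpoints.

    This is the "must not exist, or must be empty" policy suggested above,
    so that a mis-set default_save_path cannot cause checkpoint cleanup to
    delete unrelated user files.
    """
    if os.path.isdir(dirpath) and os.listdir(dirpath):
        raise ValueError(
            "Checkpoint directory %r already exists and is not empty; "
            "refusing to write or clean checkpoints there." % dirpath
        )
    os.makedirs(dirpath, exist_ok=True)
```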