
Names of subdirectories for checkpoint and logging #1953

Closed
fjhheras opened this issue May 26, 2020 · 4 comments
Labels
logger (Related to the Loggers), question (Further information is requested)

Comments

@fjhheras
Contributor

❓ Questions and Help

First question

The directories where checkpoints are saved are always something like

$DIR/$SUBDIR/checkpoint/

I have $DIR under control: it can be set via the default_save_path argument of Trainer, and it is also available as an attribute of the same name.

$SUBDIR seems randomly generated. Where is it generated? Is there any way I can access it after training?

My use case: I am training a LightningModule, and by default it saves the best model seen during training (the checkpoint with the lowest val_loss). I want to load that best model right after training and, for example, run a test.
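To make this concrete, here is roughly what I am after; the glob pattern is only my guess based on the layout above, and MyLightningModule stands in for my own module class:

import glob
import os

# rough sketch of the goal, assuming the $DIR/$SUBDIR/checkpoint/ layout above
# (the glob pattern and MyLightningModule are placeholders, not Lightning API)
ckpts = glob.glob(os.path.join(trainer.default_save_path, "*", "checkpoint", "*.ckpt"))
best_path = ckpts[0]  # fragile: nothing tells me which checkpoint is the best one
model = MyLightningModule.load_from_checkpoint(best_path)
trainer.test(model)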

Second question:

A similar question about the wandb logger: is there any way I can see which logger directory and subdirectory it is pointing to?
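Something along these lines is what I would like to read back after training; the attribute names below are only guesses based on what TensorBoardLogger exposes, I have not confirmed them for the wandb logger:

# attribute names are guesses, not confirmed for WandbLogger
print(trainer.logger.save_dir)  # base logging directory
print(trainer.logger.name)      # experiment name (first subdirectory level)
print(trainer.logger.version)   # run version (second subdirectory level)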

My goal would be to select, say, the 5 best models out of 20 runs and load them by combining the metrics stored by the logger with the weights stored in the checkpoints. Alternatively, what would be the best practice here?

What have you tried?

I went through the code looking for where $SUBDIR is generated and did not find it. I also could not find anything relevant among the attributes of the Trainer or LightningModule classes.

@fjhheras fjhheras added the question Further information is requested label May 26, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@fjhheras
Contributor Author

Hi, it seems I found part of the answer: trainer.checkpoint_callback.best_k_models is a dictionary mapping checkpoint paths to their metric (in my case, validation loss). By default it contains only one element, the best model, so this works for me:

path_to_best_model = next(iter(trainer.checkpoint_callback.best_k_models.keys()))
# load_from_checkpoint is a classmethod and returns a new instance, so keep the result
model = model.load_from_checkpoint(path_to_best_model)
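This also seems to cover my goal of picking the best 5 out of 20 runs, roughly like this (trainers and MyLightningModule below are placeholders for my own 20 finished Trainer objects and my module class; just a sketch):

# run a test right after training with the best checkpoint of this run
trainer.test(model)

# collect {checkpoint_path: val_loss} across runs and keep the 5 best
all_models = {}
for t in trainers:  # placeholder: my list of finished Trainer objects
    all_models.update(t.checkpoint_callback.best_k_models)

best_5_paths = sorted(all_models, key=all_models.get)[:5]  # lowest val_loss first
best_5_models = [MyLightningModule.load_from_checkpoint(p) for p in best_5_paths]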

@lgvaz
Contributor

lgvaz commented May 29, 2020

If you have more than one model saved, you can use the following snippet for getting the best model:

# best_k_models maps checkpoint path -> metric; pick the path with the best metric
# (checkpoint_callback is e.g. trainer.checkpoint_callback)
op = min if checkpoint_callback.mode == 'min' else max
best_model_path = op(
    checkpoint_callback.best_k_models,
    key=checkpoint_callback.best_k_models.get
)

BTW, this snippet comes from #1799; that PR will make it even easier to get back the best model =)
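If I read that PR correctly, it exposes the best checkpoint path directly on the callback, so afterwards something like this should work (best_model_path is the attribute I believe it adds, so double-check against the merged PR; MyLightningModule stands in for your own module class):

# assumed attribute from #1799; verify against the merged PR
best_model = MyLightningModule.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)
trainer.test(best_model)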

@Borda
Member

Borda commented Jun 11, 2020

btw, #1799 has landed, so I assume this can also be closed... feel free to reopen if needed 🦝

@Borda Borda closed this as completed Jun 11, 2020
@Borda Borda added the logger Related to the Loggers label Aug 4, 2020