
Names of subdirectories for checkpoint and logging #1953

Closed
fjhheras opened this issue May 26, 2020 · 4 comments
Labels
logger (Related to the Loggers), question (Further information is requested)

Comments

@fjhheras
Contributor

❓ Questions and Help

First question

The directories where checkpoints are saved are always something like

$DIR/$SUBDIR/checkpoint/

I have $DIR under control: it can be set via the default_save_path argument of Trainer, and it is also available as an attribute of the same name.

$SUBDIR seems randomly generated. Where is it generated? Is there any way I can access it after training?

My use case: I am training a LightningModule, and by default it saves the best model seen during training (the checkpoint with the lowest val_loss). I want to load that best model right after training and, for example, run a test.
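To make this concrete, here is roughly what I am after; the glob pattern is only my guess based on the layout above, and MyLightningModule stands in for my own module class:

import glob
import os

# rough sketch of the goal, assuming the $DIR/$SUBDIR/checkpoint/ layout above
# (the glob pattern and MyLightningModule are placeholders, not Lightning API)
ckpts = glob.glob(os.path.join(trainer.default_save_path, "*", "checkpoint", "*.ckpt"))
best_path = ckpts[0]  # fragile: nothing tells me which checkpoint is the best one
model = MyLightningModule.load_from_checkpoint(best_path)
trainer.test(model)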

Second question:

A similar question about the wandb logger: is there any way I can see which logger directory and subdirectory it is pointing to?
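Something along these lines is what I would like to read back after training; the attribute names below are only guesses based on what TensorBoardLogger exposes, I have not confirmed them for the wandb logger:

# attribute names are guesses, not confirmed for WandbLogger
print(trainer.logger.save_dir)  # base logging directory
print(trainer.logger.name)      # experiment name (first subdirectory level)
print(trainer.logger.version)   # run version (second subdirectory level)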

My goal would be to select, say, the 5 best models out of 20 runs and load them by combining the metrics stored by the logger with the weights stored in the checkpoints. Alternatively, what would be the best practice here?

What have you tried?

I went through the code looking for where $SUBDIR is generated and did not find it. I also could not find anything relevant among the attributes of the Trainer or LightningModule classes.

@fjhheras fjhheras added the question Further information is requested label May 26, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@fjhheras
Contributor Author

Hi, it seems I found part of the answer: trainer.checkpoint_callback.best_k_models is a dictionary mapping checkpoint paths to their metric (in my case, validation loss). By default it contains only one element, the best model, so this works for me:

path_to_best_model = next(iter(trainer.checkpoint_callback.best_k_models.keys()))
# load_from_checkpoint is a classmethod and returns a new instance, so keep the result
model = model.load_from_checkpoint(path_to_best_model)
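This also seems to cover my goal of picking the best 5 out of 20 runs, roughly like this (trainers and MyLightningModule below are placeholders for my own 20 finished Trainer objects and my module class; just a sketch):

# run a test right after training with the best checkpoint of this run
trainer.test(model)

# collect {checkpoint_path: val_loss} across runs and keep the 5 best
all_models = {}
for t in trainers:  # placeholder: my list of finished Trainer objects
    all_models.update(t.checkpoint_callback.best_k_models)

best_5_paths = sorted(all_models, key=all_models.get)[:5]  # lowest val_loss first
best_5_models = [MyLightningModule.load_from_checkpoint(p) for p in best_5_paths]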

@lgvaz
Contributor

lgvaz commented May 29, 2020

If you have more than one model saved, you can use the following snippet for getting the best model:

# best_k_models maps checkpoint path -> metric; pick the path with the best metric
# (checkpoint_callback is e.g. trainer.checkpoint_callback)
op = min if checkpoint_callback.mode == 'min' else max
best_model_path = op(
    checkpoint_callback.best_k_models,
    key=checkpoint_callback.best_k_models.get
)

BTW, this snippet comes from #1799; that PR will make it even easier to get back the best model =)
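If I read that PR correctly, it exposes the best checkpoint path directly on the callback, so afterwards something like this should work (best_model_path is the attribute I believe it adds, so double-check against the merged PR; MyLightningModule stands in for your own module class):

# assumed attribute from #1799; verify against the merged PR
best_model = MyLightningModule.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)
trainer.test(best_model)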

@Borda
Member

Borda commented Jun 11, 2020

btw, #1799 has landed, so I assume this can also be closed... feel free to reopen if needed 🦝

@Borda Borda closed this as completed Jun 11, 2020
@Borda Borda added the logger Related to the Loggers label Aug 4, 2020