Open
Labels
bug, checkpointing, lightningcli, trainer: predict, ver: 2.5.x
Description
Bug description
When running trainer.predict via LightningCLI with --ckpt_path best, Lightning raises:
ValueError: `.predict(ckpt_path="best")` is set but `ModelCheckpoint` is not configured to save the best model.
However, ModelCheckpoint is configured under trainer.callbacks in the YAML, and training is executed via LightningCLI. This looks like one of the following:
1. LightningCLI isn't detecting the ModelCheckpoint from the CLI YAML when resolving best during predict.
2. The checkpoint_connector's detection logic for best doesn't handle the CLI/YAML path or log directory resolution.
3. A mismatch between monitor metric registration and best-checkpoint discovery is not surfaced clearly (i.e., the callback exists, but the condition for "best" was silently unmet).
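To make the distinction behind hypothesis 3 concrete, here is a simplified model of the two failure cases I would expect to be reported separately. This is not Lightning's actual code: `FakeCheckpointCallback` and `resolve_best` are hypothetical names, and the only thing borrowed from the real callback is the `best_model_path` attribute, which I assume stays empty until a checkpoint has been saved during fit in the same process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeCheckpointCallback:
    """Hypothetical stand-in for ModelCheckpoint; models only best_model_path."""

    # Assumption: empty until a checkpoint has been saved during fit.
    best_model_path: str = ""


def resolve_best(callback: Optional[FakeCheckpointCallback]) -> str:
    """Hypothetical resolver that reports the two failure cases separately."""
    if callback is None:
        # Case A: no checkpoint callback was configured at all.
        raise ValueError("no ModelCheckpoint callback is configured")
    if not callback.best_model_path:
        # Case B: the callback exists, but no best checkpoint is known yet.
        raise ValueError(
            "ModelCheckpoint is configured, but best_model_path is empty; "
            "pass an explicit checkpoint path instead of 'best'"
        )
    return callback.best_model_path
```

If the real check behaves like case B here, the current message ("is not configured to save the best model") would be misleading for a predict-only run, since the callback is configured but has no recorded best path.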
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
How to reproduce the bug
trainer:
  callbacks:
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        monitor: val_loss
        mode: min
        save_top_k: 1
        filename: "{val_loss:.4f}"
        save_last: false
uv run src/cli.py predict -c config.yaml --ckpt_path=best
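A possible workaround in the meantime is to resolve the checkpoint outside Lightning and pass its path explicitly. The helper below is a minimal sketch, not part of Lightning: `find_best_checkpoint` is a hypothetical name, the checkpoint directory is whatever your run wrote to, and it assumes the default `auto_insert_metric_name=True` behaviour, so the `filename` template above produces files such as `val_loss=0.1234.ckpt`.

```python
import re
from pathlib import Path
from typing import Optional, Union

# Matches the metric value in names like "val_loss=0.1234.ckpt" (assumed naming).
_VAL_LOSS = re.compile(r"val_loss=(-?\d+(?:\.\d+)?)")


def find_best_checkpoint(
    ckpt_dir: Union[str, Path], mode: str = "min"
) -> Optional[Path]:
    """Return the checkpoint with the best val_loss parsed from its filename.

    Returns None when no matching *.ckpt file is found in ckpt_dir.
    """
    scored = []
    for path in Path(ckpt_dir).glob("*.ckpt"):
        match = _VAL_LOSS.search(path.stem)
        if match:
            scored.append((float(match.group(1)), path))
    if not scored:
        return None
    pick = min if mode == "min" else max
    # Compare on the parsed metric value only.
    return pick(scored, key=lambda item: item[0])[1]
```

The returned path can then be passed in place of `best`, i.e. `--ckpt_path=<returned path>` in the predict command above.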
Error messages and logs
# Error messages and logs here please
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
More info
No response