Skip to content

fix: add best_metric_key field to CheckpointingConfig dataclass#1641

Merged
akoumpa merged 1 commit intoNVIDIA-NeMo:mainfrom
stanley1208:fix/add-best-metric-key-to-checkpoint-config
Apr 5, 2026
Merged

fix: add best_metric_key field to CheckpointingConfig dataclass#1641
akoumpa merged 1 commit intoNVIDIA-NeMo:mainfrom
stanley1208:fix/add-best-metric-key-to-checkpoint-config

Conversation

@stanley1208
Copy link
Copy Markdown
Contributor

What does this PR do?

Add the missing best_metric_key field to CheckpointingConfig dataclass, fixing a crash when users set this field in their checkpoint YAML config.

Fixes #971.

Bug

Setting best_metric_key in the checkpoint config crashes with CheckpointingConfig.__init__() got an unexpected keyword argument 'best_metric_key' because the dataclass doesn't include this field.

The recipes (train_ft.py, train_seq_cls.py, finetune.py, kd.py) already read checkpoint.best_metric_key from config and pass it to _maybe_save_checkpoint(), but the CheckpointingConfig dataclass rejects it during construction at build_checkpoint_config().

Fix

Add best_metric_key: str = "default" to CheckpointingConfig. The default value "default" matches the fallback used in all 4 recipe files.

Related

CC @akoumpa @adil-a

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@stanley1208
Copy link
Copy Markdown
Contributor Author

@akoumpa @adil-a Ready for review — one-line fix adding the missing best_metric_key field to CheckpointingConfig. Fixes #971. Thanks!

@akoumpa
Copy link
Copy Markdown
Contributor

akoumpa commented Apr 1, 2026

/claude review

@akoumpa
Copy link
Copy Markdown
Contributor

akoumpa commented Apr 1, 2026

/ok to test 7ca34bb

@akoumpa
Copy link
Copy Markdown
Contributor

akoumpa commented Apr 2, 2026

/claude review

@akoumpa
Copy link
Copy Markdown
Contributor

akoumpa commented Apr 2, 2026

@adil-a can you review? IDK if this option is wired with the rest of the checkpointing code.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Apr 4, 2026
@akoumpa akoumpa merged commit 4ecba76 into NVIDIA-NeMo:main Apr 5, 2026
53 checks passed
@chtruong814 chtruong814 removed the needs-follow-up Issue needs follow-up label Apr 6, 2026
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
Made-with: Cursor

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

best_metric_key cannot be set in config

4 participants