There's an inconsistency between PPOTrainer and the SFT and DPO trainers: PPOTrainer always evaluates just a single (and always the same) validation batch on every validation run, ignoring the rest of the validation set. The original intent was for validation to track the same set of held-out samples throughout training, which is why the validation iterator is reset on every validation run.
The expected behavior is to let the user control how much of the validation set is used for evaluation. One possible solution is to have the validation dataloader return batches of size mbs * dp and let the user control how many of them to evaluate via limit_val_batches, as is done with the other trainers.
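For illustration, a minimal sketch of what the proposed behavior could look like (this is not NeMo-Aligner's actual implementation; `val_dataloader`, `limit_val_batches`, and `compute_val_metrics` are hypothetical names used only to show the idea):

```python
import itertools


def run_validation(val_dataloader, limit_val_batches, compute_val_metrics):
    """Evaluate up to `limit_val_batches` batches from the validation set.

    The dataloader is assumed to yield global batches of size mbs * dp,
    so the user controls how much of the held-out set is evaluated via
    `limit_val_batches`, mirroring the SFT and DPO trainers.
    """
    metrics = []
    # Take at most `limit_val_batches` batches from the iterator instead of
    # resetting it and consuming only the first batch every run, which is
    # what causes the reported behavior.
    for batch in itertools.islice(iter(val_dataloader), limit_val_batches):
        metrics.append(compute_val_metrics(batch))
    # Average per-batch metrics (simple mean as a placeholder).
    keys = metrics[0].keys() if metrics else []
    return {k: sum(m[k] for m in metrics) / len(metrics) for k in keys}
```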