There's an inconsistency between PPOTrainer and the SFT and DPO trainers: PPOTrainer always evaluates just a single (and always the same) validation batch on every validation run, ignoring the rest of the validation set. The original intent was for validation to track the same set of held-out samples throughout training, which is why the validation iterator is reset on every validation run.
The expected behavior is to let the user control how much of the validation set is used for evaluation. One possible solution is to have the validation dataloader return batches of size mbs * dp and let the user control how many of them to evaluate via limit_val_batches, as is done with the other trainers.
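For illustration, a minimal sketch of what the proposed behavior could look like (this is not NeMo-Aligner's actual implementation; `val_dataloader`, `limit_val_batches`, and `compute_val_metrics` are hypothetical names used only to show the idea):

```python
import itertools


def run_validation(val_dataloader, limit_val_batches, compute_val_metrics):
    """Evaluate up to `limit_val_batches` batches from the validation set.

    The dataloader is assumed to yield global batches of size mbs * dp,
    so the user controls how much of the held-out set is evaluated via
    `limit_val_batches`, mirroring the SFT and DPO trainers.
    """
    metrics = []
    # Take at most `limit_val_batches` batches from the iterator instead of
    # resetting it and consuming only the first batch every run, which is
    # what causes the reported behavior.
    for batch in itertools.islice(iter(val_dataloader), limit_val_batches):
        metrics.append(compute_val_metrics(batch))
    # Average per-batch metrics (simple mean as a placeholder).
    keys = metrics[0].keys() if metrics else []
    return {k: sum(m[k] for m in metrics) / len(metrics) for k in keys}
```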