filter out zero-advantage samples #2
Thanks for the PR! My understanding is that with entropy_bonus=0 and kl_coef=0 this PR should not change anything, because it only masks terms that are already multiplied by 0. Is that correct? We can also drop zero-advantage groups earlier in the pipeline, in … Happy to hop on a call tomorrow to better understand what you are trying to achieve.
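A minimal sketch of the argument in this question, assuming a summed (not sample-count-averaged) loss of the form policy term + kl_coef·KL − entropy_bonus·entropy. `per_sample_loss`, `batch_loss` and the sample fields are hypothetical names, not PipelineRL's code. A zero-advantage sample contributes nothing to the policy term, so once kl_coef and entropy_bonus are both 0, masking it out leaves the loss unchanged; with a nonzero coefficient it does not.

```python
def per_sample_loss(advantage, log_ratio, kl, entropy, kl_coef, entropy_bonus):
    """Hypothetical per-sample RL loss: policy-gradient term plus KL and entropy terms."""
    policy_term = -advantage * log_ratio
    return policy_term + kl_coef * kl - entropy_bonus * entropy


def batch_loss(samples, kl_coef, entropy_bonus, mask_zero_advantage):
    """Sum of per-sample losses, optionally skipping samples whose advantage is 0."""
    total = 0.0
    for s in samples:
        if mask_zero_advantage and s["advantage"] == 0.0:
            continue  # masked out: contributes nothing to the summed loss
        total += per_sample_loss(s["advantage"], s["log_ratio"], s["kl"],
                                 s["entropy"], kl_coef, entropy_bonus)
    return total


samples = [
    {"advantage": 0.0, "log_ratio": 0.3, "kl": 0.2, "entropy": 1.5},
    {"advantage": 1.0, "log_ratio": 0.1, "kl": 0.05, "entropy": 1.2},
]
# With both coefficients at 0, masking the zero-advantage sample changes nothing.
print(batch_loss(samples, 0.0, 0.0, True) == batch_loss(samples, 0.0, 0.0, False))  # True
# With a nonzero KL coefficient, the masked sample's KL term is dropped, so the losses differ.
print(batch_loss(samples, 0.1, 0.0, True) == batch_loss(samples, 0.1, 0.0, False))  # False
```

If the loss is instead averaged over the unmasked samples, masking also changes the denominator, so the equivalence then holds only up to that rescaling.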
@rizar you are right! When the … Good point about dropping them earlier, let me check!
@rizar I've switched to dropping zero-advantage groups earlier in the pipeline.
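One way to drop zero-advantage groups before they reach the trainer, sketched under two assumptions: each sample carries a `group_id` and a scalar `reward`, and the advantage is the reward minus the group's mean reward (a GRPO-style baseline). Under that baseline every advantage in a group is zero exactly when all rewards in the group are equal, so the sketch compares rewards directly instead of subtracting a floating-point mean. `filter_zero_advantage_groups` and its `enabled` flag are hypothetical, not PipelineRL's API.

```python
from collections import defaultdict


def filter_zero_advantage_groups(samples, enabled=True):
    """Drop every group whose rewards are all identical (hence all advantages are 0).

    Hypothetical helper: assumes each sample is a dict with "group_id" and "reward"
    keys and a mean-reward baseline. Returns (kept_samples, groups_filtered_out).
    """
    if not enabled:
        # Filtering is optional: pass everything through unchanged.
        return list(samples), 0

    groups = defaultdict(list)
    for sample in samples:
        groups[sample["group_id"]].append(sample)

    kept, filtered_out = [], 0
    for group in groups.values():
        first_reward = group[0]["reward"]
        if all(s["reward"] == first_reward for s in group):
            filtered_out += 1  # reward - mean(reward) == 0 for every sample
        else:
            kept.extend(group)
    return kept, filtered_out


samples = [
    {"group_id": "a", "reward": 1.0},
    {"group_id": "a", "reward": 1.0},
    {"group_id": "b", "reward": 0.0},
    {"group_id": "b", "reward": 1.0},
]
kept, filtered_out = filter_zero_advantage_groups(samples)
print(f"kept {len(kept)} samples (filtered out {filtered_out})")  # kept 2 samples (filtered out 1)
```

Filtering whole groups, rather than masking individual samples inside the loss, keeps the trainer's batches free of samples that carry no policy-gradient signal.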
Great, that will be a nice contribution, @kashif! But there are good reasons to make it optional, the main one being that … I'd be happy to work with you to finish up this PR a bit later. Right now, and until May 15, we are crunching for NeurIPS. I'm not sure I'll find time before May 15, but right after that we should definitely add this feature.
I ran this code once, and "(filtered out 0)" is what I see every time. I'll take a closer look on Monday.
@rizar so by default I made filtering …
Indeed, my bad: I had this set to 0 in my launch command. Now that I've turned it on, it crashes:
`[2025-05-31 17:18:15,668][pipelinerl.run_preprocess][ERROR] - Error in preprocessor worker: 'group_id'`
The reason is that preprocessing removes the group_id column. We could keep it, though, if it doesn't break the trainer component (finetune).
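The crash can be reproduced in miniature, assuming preprocessing keeps only a fixed whitelist of columns and the zero-advantage filter then looks samples up by `group_id`. The column list, `preprocess`, `group_key` and `run_worker` are hypothetical stand-ins, not the repo's code. Because `group_id` is not whitelisted, the lookup raises `KeyError('group_id')`, whose string form is exactly the `'group_id'` printed in the log.

```python
# Hypothetical whitelist standing in for ["input_ids", "labels", "attention_mask"] + RL_DATA_COLUMNS.
COLUMNS = ["input_ids", "labels", "attention_mask", "rewards", "advantages"]


def preprocess(sample: dict) -> dict:
    """Keep only the whitelisted columns; everything else, including group_id, is dropped."""
    return {name: value for name, value in sample.items() if name in COLUMNS}


def group_key(row: dict) -> str:
    """A group-level filter needs to know which prompt each sample came from."""
    return row["group_id"]


def run_worker(sample: dict) -> str:
    try:
        return group_key(preprocess(sample))
    except KeyError as err:
        # str(KeyError("group_id")) is "'group_id'", matching the logged error text.
        return f"Error in preprocessor worker: {err}"


sample = {"input_ids": [1, 2], "labels": [1, 2], "attention_mask": [1, 1],
          "rewards": [1.0, 1.0], "advantages": [0.0, 0.0], "group_id": "prompt-7"}
print(run_worker(sample))  # Error in preprocessor worker: 'group_id'
```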
Ah yes, sure!
rizar left a comment:
```python
    rl_config: RLConfig,
) -> Dataset:
    preprocess = partial(preprocess_fn, seq_length=seq_length, tokenizer=tokenizer, is_rl=True)
    columns = ["input_ids", "labels", "attention_mask"] + RL_DATA_COLUMNS
```
You can always add "group_id" to the columns; it won't hurt.
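A sketch of the suggested fix, assuming samples are plain dicts and the trainer reads its inputs by column name. The contents of `RL_DATA_COLUMNS` below are a hypothetical stand-in (the real list lives in the repo), and `select_columns` and `trainer_inputs` are illustrative helpers. Keeping `group_id` lets group-level filtering work, while a trainer that selects only the columns it needs never sees the extra key, which is one reason adding it should be harmless.

```python
# Hypothetical stand-in for the repo's RL_DATA_COLUMNS; the real list may differ.
RL_DATA_COLUMNS = ["rewards", "advantages", "old_logprobs", "ref_logprobs"]

# The reviewer's suggestion: keep "group_id" alongside the training columns.
columns = ["input_ids", "labels", "attention_mask"] + RL_DATA_COLUMNS + ["group_id"]


def select_columns(sample: dict, columns: list) -> dict:
    """Keep only the listed columns that are present in the sample."""
    return {name: sample[name] for name in columns if name in sample}


def trainer_inputs(row: dict) -> dict:
    """A trainer that reads its inputs by name simply ignores extra keys like group_id."""
    needed = ["input_ids", "labels", "attention_mask"] + RL_DATA_COLUMNS
    return {name: row[name] for name in needed}


sample = {
    "input_ids": [1, 2, 3], "labels": [-100, 2, 3], "attention_mask": [1, 1, 1],
    "rewards": [0.0, 0.0, 1.0], "advantages": [0.0, 0.0, 0.5],
    "old_logprobs": [0.0, -0.1, -0.2], "ref_logprobs": [0.0, -0.1, -0.3],
    "group_id": "prompt-7", "text": "raw text that preprocessing drops",
}
row = select_columns(sample, columns)
print(row["group_id"])  # prompt-7
```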
No description provided.