@LovelyBuggies
Member

No description provided.

@LovelyBuggies LovelyBuggies merged commit 9331ca2 into main Sep 24, 2025
@LovelyBuggies LovelyBuggies deleted the new branch September 24, 2025 16:57
ryankamiri pushed a commit that referenced this pull request Sep 24, 2025
* make random default

* reset train num and levelfeedback as default

* delete unused files

* fix CHE train split being too small

* Config and external overhaul:

  - Introduce unified external section (mode, sandbox_slice, original/previous, expert_model).
  - Default external.mode=level_feedback; sandbox_slice=1 (supports 0/None/'all').
  - Handoff handled in CoMLRL trainer with strict modes; expose magrpo/grpo.handoff.
  - Update HumanEval/CHE splits (HE train 33:163, eval :32; CHE train 16:, eval :16).
  - Set output.save_final_model=false by default.
  - Set wandb.dir and output.base_dir to storage paths by trainer/mode:
    * ST GRPO: output_st_grpo, ST MAGRPO: output_st_magrpo
    * MT GRPO: output_mt_grpo, MT MAGRPO: output_mt_magrpo
  - Rename expert model key to external.expert_model (used only for expert_edits).
  - Simplify YAML comments to section headers only.
  - Read external.* in train_magrpo.py and train_grpo.py; defaults adjusted.
  - README: clarify external keys and sandbox_slice semantics.

* Remove unnecessary try/except; robust sandbox_slice parsing without exceptions; minimal default tags

* Fix: define external_cfg before use; remove duplicate assignments in train_magrpo.py and train_grpo.py

* Fix: handle dataset load failure in train_magrpo.py (return early) to avoid UnboundLocalError

* Configs: reduce num_train_epochs by 20% (rounded) across all YAMLs
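
The commits above mention parsing `sandbox_slice` robustly "without exceptions", with a default of 1 and support for 0/None/'all'. The repository's actual code is not shown here, so the following is only a minimal sketch of what such parsing could look like. The function name `parse_sandbox_slice`, the reading of 0/None/'all' as "no limit", and the fallback to the default on unrecognized input are all assumptions made for illustration.

```python
from typing import Optional, Union


def parse_sandbox_slice(
    value: Union[int, str, None], default: Optional[int] = 1
) -> Optional[int]:
    """Normalize a sandbox_slice setting using type checks instead of try/except.

    Hypothetical helper, not the repository's implementation.
    Returns None for "no limit" (assumed meaning of 0, None and 'all'),
    otherwise an int. Unrecognized input falls back to `default`.
    """
    if value is None:
        return None
    # bool is a subclass of int, so reject it before the int branch.
    if isinstance(value, bool):
        return default
    if isinstance(value, int):
        return None if value == 0 else value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("all", "none"):
            return None
        # isascii() guards against Unicode digits that int() would reject.
        if text.isascii() and text.isdigit():
            number = int(text)
            return None if number == 0 else number
    return default


if __name__ == "__main__":
    for raw in (1, 0, None, "all", " 3 ", "abc"):
        print(repr(raw), "->", parse_sandbox_slice(raw))
```

Checking the input's type and content up front means no code path can raise, which is one way to drop a try/except block without losing robustness. Whether invalid values should fall back to the default or be reported to the user is a design choice this sketch does not settle.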
2 participants