v0.5.0
What's new
- Online RL Algorithms: We now support PPO and GRPO for online RL training.
- RL with Verifiable Rewards: We've added support for verifiable rewards with online RL algorithms, along with evaluations during training.
- Registries for extensible and composable design
- Robust vLLM support for efficient inference during online RL training
What's Changed
- Update version to match latest release by @dakinggg in #25
- attach vllm engines to state by @vchiley in #20
- Adding warning for truncating preferences by @bcui-db in #27
- Add load planner for PPO by @bcui-db in #18
- Auto set TP size by @vchiley in #29
- Enable Masking of EOS tokens list by @bcui-db in #31
- Accommodate typing changes for transformers 4.51 by @dakinggg in #33
- Dataloader changes for RLVR by @gupta-abhay in #21
- Moved the long seq fix on top of main by @abaheti95 in #34
- Changes for better reward validation by @gupta-abhay in #35
- Inheritance fix by @gupta-abhay in #37
- Simple change by @gupta-abhay in #40
- K generation per prompt by @abaheti95 in #36
- Merge ReadMEs for easier parsing by @gupta-abhay in #41
- Enable hf token for restricted data access by @gupta-abhay in #42
- Enable different KL estimators for training by @gupta-abhay in #44
- update readme by @bcui-db in #45
- Upgrade yapf version by @gupta-abhay in #46
- Fast inference w/ single vllm generate call per PPO iter by @abaheti95 in #43
- Addressing cleanup comments on fast vLLM PR by @abaheti95 in #49
- Improving online RL logging by @abaheti95 in #50
- Update vLLM, enables single node Tensor parallel sizes (1, 2, 4, 8) by @bcui-db in #48
- Unified kl estimators by @gupta-abhay in #53
- Add codeowners by @gupta-abhay in #54
- Add `chat` functionality to vLLM actor by @bcui-db in #55
- Exposing average log prob flag by @abaheti95 in #56
- Modifying codeowners by @gupta-abhay in #57
- GRPO implementation by @abaheti95 in #51
- Registries for extending compose-rl by @gupta-abhay in #47
- Simple tests for new registries by @gupta-abhay in #58
- Timeout change by @gupta-abhay in #59
- Fix label generation for MATH to match verification by @gupta-abhay in #60
- Changes for optional tokens list by @gupta-abhay in #61
- Minor changes for dtype and docstrings by @gupta-abhay in #62
New Contributors
- @vchiley made their first contribution in #20
- @gupta-abhay made their first contribution in #21
Full Changelog: v0.4.0...v0.5.0