Release v0.5.0 · databricks/compose-rl

What's new

Online RL Algorithms: We now support PPO and GRPO for online RL training.
RL with Verifiable Rewards: We've added support for verifiable rewards with online RL algorithms, along with evaluations during training.
Registries for extensible and composable design
Robust vLLM support for efficient inference during online RL training
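
As a rough illustration of how verifiable rewards and GRPO fit together, the sketch below scores several completions for one prompt with an exact-match check, then normalizes the rewards within that group. It is not compose-rl's API: the function names `exact_match_reward` and `group_relative_advantages`, the exact-match rule, and the `eps` value are all assumptions made for this example, and the normalization shown is the commonly described GRPO form (reward minus group mean, divided by group standard deviation).

```python
from statistics import mean, pstdev


def exact_match_reward(completion: str, answer: str) -> float:
    """Hypothetical verifiable reward: 1.0 on an exact (whitespace-trimmed) match."""
    return 1.0 if completion.strip() == answer.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of generations.

    Assumed form: (r - group mean) / (group std + eps). The eps term only
    guards against division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for a single prompt whose reference answer is "4".
completions = ["4", "5", " 4 ", "four"]
rewards = [exact_match_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, and a group where every completion earns the same reward contributes zero advantage for all of its members.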

What's Changed

Update version to match latest release by @dakinggg in #25
attach vllm engines to state by @vchiley in #20
Adding warning for truncating preferences by @bcui-db in #27
Add load planner for PPO by @bcui-db in #18
Auto set TP size by @vchiley in #29
Enable Masking of EOS tokens list by @bcui-db in #31
Accommodate typing changes for transformers 4.51 by @dakinggg in #33
Dataloader changes for RLVR by @gupta-abhay in #21
Moved the long seq fix on top of main by @abaheti95 in #34
Changes for better reward validation by @gupta-abhay in #35
Inheritance fix by @gupta-abhay in #37
Simple change by @gupta-abhay in #40
K generation per prompt by @abaheti95 in #36
Merge ReadMEs for easier parsing by @gupta-abhay in #41
Enable hf token for restricted data access by @gupta-abhay in #42
Enable different KL estimators for training by @gupta-abhay in #44
update readme by @bcui-db in #45
Upgrade yapf version by @gupta-abhay in #46
Fast inference w/ single vllm generate call per PPO iter by @abaheti95 in #43
Addressing cleanup comments on fast vLLM PR by @abaheti95 in #49
Improving online RL logging by @abaheti95 in #50
Update vLLM, enables single node Tensor parallel sizes (1, 2, 4, 8) by @bcui-db in #48
Unified kl estimators by @gupta-abhay in #53
Add codeowners by @gupta-abhay in #54
Add chat functionality to vLLM actor by @bcui-db in #55
Exposing average log prob flag by @abaheti95 in #56
Modifying codeowners by @gupta-abhay in #57
GRPO implementation by @abaheti95 in #51
Registries for extending compose-rl by @gupta-abhay in #47
Simple tests for new registries by @gupta-abhay in #58
Timeout change by @gupta-abhay in #59
Fix label generation for MATH to match verification by @gupta-abhay in #60
Changes for optional tokens list by @gupta-abhay in #61
Minor changes for dtype and docstrings by @gupta-abhay in #62

New Contributors

@vchiley made their first contribution in #20
@gupta-abhay made their first contribution in #21

Full Changelog: v0.4.0...v0.5.0
