v0.5.0

@gupta-abhay released this 15 May 04:43
· 44 commits to main since this release
d65335f

What's new

  • Online RL Algorithms: We now support PPO and GRPO for online RL training
  • RL with Verifiable Rewards: We've added support for verifiable rewards with online RL algorithms, along with evaluations during training.
  • Registries: We've added registries for a more extensible and composable design.
  • vLLM Inference: We now provide robust vLLM support for efficient inference during online RL training.
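
To make the verifiable-rewards and GRPO items above concrete, here is a minimal, library-independent sketch. It is not this project's API: the function names `verifiable_reward` and `group_relative_advantages` are hypothetical, the "last number must match" check is just one assumed example of a programmatically checkable reward, and the normalization shown is the group-relative advantage commonly associated with GRPO (reward minus group mean, divided by group standard deviation).

```python
import re
from statistics import mean, pstdev


def verifiable_reward(completion: str, expected: str) -> float:
    """Hypothetical verifier: 1.0 if the last number in the completion
    equals the expected answer string, otherwise 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == expected else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps); eps is an
    assumed small constant that avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for a single prompt whose reference answer is "4".
completions = ["2 + 2 = 4", "The answer is 5", "I think it is 4", "3"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for text, r, a in zip(completions, rewards, advantages):
    print(f"{text!r}: reward={r}, advantage={a:+.3f}")
```

Completions that pass the check receive a positive advantage and the rest a negative one; if every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.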

Full Changelog: v0.4.0...v0.5.0