What's Changed
- Removed Compatibility directories by @jdchang1 in #98
- Prompts per iteration by @bcui-db in #94
- Allow reference checkpoint to be both callback and load_path by @dakinggg in #99
- Added MessagesDataloader so we can just use `messages` in our datasets rather than tokenized inputs by @SeanKski in #92
- Add logs around reward process pool recreation by @dakinggg in #101
- Revert timeout by @dakinggg in #104
- Added proper temperature scaling of logits by @jdchang1 in #105
- Wensun/apo by @wensun in #96
- vLLM Chat Conversion by @jdchang1 in #102
- hotfix by @jdchang1 in #108
- STEM Benchmarks and verifiers by @gupta-abhay in #95
- Add prefix caching by @gupta-abhay in #107
- Update codeowners by @dakinggg in #113
- Changes for accumulate flag by @gupta-abhay in #111
- Adding Token Counter for Online RL by @rithwik-db in #110
- Use Single Controller design with unit test [Experimental] by @bowenyang008 in #114
- Update Code Owners by @gupta-abhay in #116
- refactor ppo callback to move its logic to single controller [Experimental] by @bowenyang008 in #115
New Contributors
- @SeanKski made their first contribution in #92
- @wensun made their first contribution in #96
- @rithwik-db made their first contribution in #110
Full Changelog: v0.7.0...v0.8.0