Release v0.2.6
Changes
- Upgraded vLLM to v0.4.1 @mgerstgrasser @wuxibin89 @hijkzzz
- Upgraded Transformers to v4.40.1 and DeepSpeed to v0.14.0 @hijkzzz
- Fixed typo in train_ppo_ray.py @mickelliu
- Fixed size mismatch between output_state_dict (148) and state_dict (149) in model saving @hijkzzz
- Added support for --colocate_actor_ref and --colocate_critic_reward in train_ppo_ray.py @hijkzzz
- Added support for offloading the reward and ref models in Ray PPO @hijkzzz