Releases · OpenLLMAI/OpenRLHF
Release v0.2.0
Changes
- Supported vLLM 0.3.1 @wuxibin89
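As a usage sketch (not part of the release notes): vLLM acceleration applies to the Ray-based PPO path. The script path and flag names below (`examples/train_ppo_ray.py`, `--vllm_num_engines`, `--vllm_tensor_parallel_size`) are assumptions based on the repository's example scripts, and the model paths are placeholders.

```bash
# Hedged sketch: launch Ray PPO with vLLM generation engines.
# Script path, flags, and model names are assumptions/placeholders;
# verify against the examples/ directory of your checkout.
python examples/train_ppo_ray.py \
  --pretrain meta-llama/Llama-2-7b-hf \
  --reward_pretrain meta-llama/Llama-2-7b-hf \
  --vllm_num_engines 2 \
  --vllm_tensor_parallel_size 1
```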
Release v0.1.10
Changes
- Fixed `save_models` for `named_buffer` @wuxibin89
- Fixed a vLLM generation hang (requires vLLM < 0.2.7) @hijkzzz
Release v0.1.9
Changes
- Supported `input_template` #203 @rbao2018
- Supported KTO #201 @Dylancer1998 (see the sketch after this list)
- Upgraded HuggingFace Transformers to 4.37.1
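A hedged sketch of using the two v0.1.9 features together: KTO training with a custom input template. The script name (`examples/train_kto.py`), dataset, and template format are assumptions, not taken from these notes.

```bash
# Hedged sketch: KTO training with a custom --input_template.
# Script path, dataset, and template format are assumptions/placeholders;
# $'...' makes the shell pass a real newline in the template.
python examples/train_kto.py \
  --pretrain meta-llama/Llama-2-7b-hf \
  --dataset Anthropic/hh-rlhf \
  --input_template $'Human: {}\nAssistant: '
```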
Release v0.1.8
Changes
- Upgraded transformers to version 4.37
- Fixed gradient checkpointing configuration in Ray RLHF @wuxibin89
- Fixed loss coefficient for PPO-ptx @hijkzzz
Release v0.1.7
Changes
- Fixed LLaMA RoPE initialization bug for ZeRO3 @wuxibin89
- Fixed a DPO training script bug @hijkzzz
Release v0.1.6
Changes
- Fixed DeepSpeed configs to improve PPO training stability @hijkzzz
Release v0.1.5
Release v0.1.4
Changes
- Fixed reward model training when using the HuggingFace ZeRO3 initialization API (for models with 70 billion+ parameters) @wuxibin89
- Added support for the Mixtral 8x7b balancing loss (`--balancing_loss_coef`) @hijkzzz (see the sketch after this list)
- Fixed an issue with `vllm_engine` when `tp=1` @wuxibin89
- Fixed ZeRO2 model saving bugs @hijkzzz
- Added the `--grad_accum_dtype` argument to reduce CPUAdam memory usage @hijkzzz
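A hedged sketch combining the two new flags from this release. Only the flag names come from the notes above; the training entrypoint, model path, and values are placeholders.

```bash
# Hedged sketch: --balancing_loss_coef enables the Mixtral MoE router
# balancing loss; --grad_accum_dtype selects a lower-precision gradient
# accumulation dtype to reduce CPUAdam (ZeRO offload) memory.
# Entrypoint, model path, and values are placeholders.
python examples/train_sft.py \
  --pretrain mistralai/Mixtral-8x7B-v0.1 \
  --balancing_loss_coef 0.01 \
  --grad_accum_dtype fp16
```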
Release v0.1.3
Changes
- Fixed HuggingFace reward model saving @wuxibin89
- Improved `mask_mean` for the loss function @hijkzzz
- Fixed `num_actions` and `action_mask` @ZiyiLiubird
- Optimized PPO performance of example scripts (set `micro_batch_size=4`) @hijkzzz
Release v0.1.2
Changes
- Fixed reward model hidden size and `value_head` initialization @wuxibin89
- Fixed model saving bugs @hijkzzz