Releases: OpenLLMAI/OpenRLHF
Release v0.1.1
Changes
- Switched to Hugging Face-format Actor and Reward/Critic models @wuxibin89
  - https://huggingface.co/OpenLLMAI/Llama-2-7b-sft-model-ocra-500k
  - https://huggingface.co/OpenLLMAI/Llama-2-7b-rm-anthropic_hh-lmsys-oasst-webgpt
- Upgraded the PyTorch NGC container to 23.12
- Upgraded FlashAttention2 to 2.4.2
- Added a continued pre-training script @hijkzzz
Release v0.1.0
Changes
- Added support for vLLM generation in RLHF @wuxibin89
- Added 70B RLHF training scripts @wuxibin89
- Optimized padding removal using torch.argmax @li-plus
- Upgraded the container to NVIDIA PyTorch 23.10
- Upgraded Transformers and DeepSpeed
- Fixed FlashAttention 2 @hijkzzz
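The torch.argmax optimization above likely exploits the fact that argmax returns the first index of the maximum, so applied to a 0/1 mask it finds the first matching position without a Python loop. A minimal sketch of the idea (function and variable names are illustrative, not OpenRLHF's actual code):

```python
import torch

def first_eos_indices(sequences: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    """Index of the first EOS token in each row of a (batch, seq_len) tensor.

    torch.argmax returns the index of the first maximal value, so argmax
    over a 0/1 mask yields the first True position. Caveat: a row with no
    EOS token also returns 0, so callers must handle that case.
    """
    mask = (sequences == eos_token_id).int()
    return mask.argmax(dim=1)

seqs = torch.tensor([
    [5, 7, 2, 0, 0],  # EOS (id 2) at index 2
    [9, 4, 6, 2, 0],  # EOS at index 3
])
print(first_eos_indices(seqs, eos_token_id=2))  # tensor([2, 3])
```

The same trick can locate the end of padding on the left side by flipping the comparison to `sequences != pad_token_id`.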
Release v0.0.2
Changes
- Removed pad_token for Llama 2 @hijkzzz
- Added support for cDPO/IPO @hijkzzz
- Fixed Ray RLHF sync bugs @wuxibin89
- Optimized eos_indices with torch.argmax @li-plus
- Fixed local datasets @catqaq
- Fixed DPO DataLoader bugs
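cDPO and IPO are drop-in variants of the DPO objective, computed on the same policy/reference log-ratio margin: cDPO adds label smoothing to tolerate noisy preference labels, and IPO replaces the logistic loss with a squared loss. A hedged sketch of the standard formulations (not necessarily OpenRLHF's exact implementation):

```python
import torch
import torch.nn.functional as F

def preference_loss(margin: torch.Tensor, beta: float = 0.1,
                    label_smoothing: float = 0.0, ipo: bool = False) -> torch.Tensor:
    """Per-example preference loss on the log-ratio margin.

    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)

    label_smoothing == 0.0, ipo=False -> vanilla DPO: -log sigmoid(beta * margin)
    label_smoothing  > 0.0            -> cDPO (robust to flipped preference labels)
    ipo=True                          -> IPO: (margin - 1 / (2 * beta)) ** 2
    """
    if ipo:
        return (margin - 1.0 / (2.0 * beta)) ** 2
    return (-F.logsigmoid(beta * margin) * (1.0 - label_smoothing)
            - F.logsigmoid(-beta * margin) * label_smoothing)

margin = torch.tensor([0.0])
print(preference_loss(margin))  # DPO loss at zero margin: log(2) ≈ tensor([0.6931])
```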
Release v0.0.1
Features
- A fast LLaMA 2 SFT/PPO training framework based on DeepSpeed. @hijkzzz
- Multi-node training scripts for Slurm. @hijkzzz
- Support DPO (Direct Preference Optimization). @hijkzzz
- Distributed PPO based on Ray for 34B+ models, and 7B models on RTX 4090. @wuxibin89
- Support Conditional SFT (https://arxiv.org/abs/2308.12050). @hijkzzz
- Support Wandb logging (--wandb). @dabney777
- Support conda environments / NVIDIA Docker. @catqaq
- Support FlashAttention2 (--flash_attn). @pikaqqqqqq
- Support popular Chinese models. @catqaq
- Support GPT-4 evaluation. @hijkzzz
- Support multiple reward models. @wuxibin89
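Conditional SFT typically means conditioning the model on a control tag derived from a reward or quality signal during fine-tuning, so that prepending the high-quality tag at inference time steers generation. A hypothetical data-formatting sketch (the tag strings, threshold, and function name are invented for illustration, not taken from the cited paper or OpenRLHF):

```python
def make_conditional_sample(prompt: str, response: str,
                            reward: float, threshold: float = 0.5) -> str:
    """Prepend a quality control tag chosen by thresholding the reward.

    During training, every sample is prefixed with its tag; at inference
    time, prompts are prefixed with the high-quality tag ("<good>") to
    steer the model toward high-reward responses.
    """
    tag = "<good>" if reward >= threshold else "<bad>"
    return f"{tag} {prompt}{response}"

print(make_conditional_sample("Q: 2+2?\nA: ", "4", reward=0.9))
```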