Releases: OpenLLMAI/OpenRLHF
Release v0.1.1
Changes
- Switched to Hugging Face-format Actor and Reward/Critic models @wuxibin89
  - https://huggingface.co/OpenLLMAI/Llama-2-7b-sft-model-ocra-500k
  - https://huggingface.co/OpenLLMAI/Llama-2-7b-rm-anthropic_hh-lmsys-oasst-webgpt
- Upgraded the PyTorch NGC container to 23.12
- Upgraded FlashAttention2 to 2.4.2
- Added a continued pre-training script @hijkzzz
Release v0.1.0
Changes
- Added support for vLLM generation in RLHF @wuxibin89
- Added 70B RLHF training scripts @wuxibin89
- Optimized padding removal using torch.argmax @li-plus
- Upgraded the container to NVIDIA PyTorch 23.10
- Upgraded Transformers and DeepSpeed
- Fixed FlashAttention 2 @hijkzzz
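The torch.argmax optimization above likely exploits the fact that argmax returns the first index of the maximum, so applied to a 0/1 mask it finds the first matching position without a Python loop. A minimal sketch of the idea (function and variable names are illustrative, not OpenRLHF's actual code):

```python
import torch

def first_eos_indices(sequences: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    """Index of the first EOS token in each row of a (batch, seq_len) tensor.

    torch.argmax returns the index of the first maximal value, so argmax
    over a 0/1 mask yields the first True position. Caveat: a row with no
    EOS token also returns 0, so callers must handle that case.
    """
    mask = (sequences == eos_token_id).int()
    return mask.argmax(dim=1)

seqs = torch.tensor([
    [5, 7, 2, 0, 0],  # EOS (id 2) at index 2
    [9, 4, 6, 2, 0],  # EOS at index 3
])
print(first_eos_indices(seqs, eos_token_id=2))  # tensor([2, 3])
```

The same trick can locate the end of padding on the left side by flipping the comparison to `sequences != pad_token_id`.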
Release v0.0.2
Changes
- Removed pad_token for Llama 2 @hijkzzz
- Added support for cDPO/IPO @hijkzzz
- Fixed Ray RLHF sync bugs @wuxibin89
- Optimized eos_indices with torch.argmax @li-plus
- Fixed local datasets @catqaq
- Fixed DPO DataLoader bugs
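cDPO and IPO are drop-in variants of the DPO objective, computed on the same policy/reference log-ratio margin: cDPO adds label smoothing to tolerate noisy preference labels, and IPO replaces the logistic loss with a squared loss. A hedged sketch of the standard formulations (not necessarily OpenRLHF's exact implementation):

```python
import torch
import torch.nn.functional as F

def preference_loss(margin: torch.Tensor, beta: float = 0.1,
                    label_smoothing: float = 0.0, ipo: bool = False) -> torch.Tensor:
    """Per-example preference loss on the log-ratio margin.

    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)

    label_smoothing == 0.0, ipo=False -> vanilla DPO: -log sigmoid(beta * margin)
    label_smoothing  > 0.0            -> cDPO (robust to flipped preference labels)
    ipo=True                          -> IPO: (margin - 1 / (2 * beta)) ** 2
    """
    if ipo:
        return (margin - 1.0 / (2.0 * beta)) ** 2
    return (-F.logsigmoid(beta * margin) * (1.0 - label_smoothing)
            - F.logsigmoid(-beta * margin) * label_smoothing)

margin = torch.tensor([0.0])
print(preference_loss(margin))  # DPO loss at zero margin: log(2) ≈ tensor([0.6931])
```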
Release v0.0.1
Features
- A fast LLaMA 2 SFT/PPO training framework based on DeepSpeed. @hijkzzz
- Multi-node training scripts for Slurm. @hijkzzz
- Support DPO (Direct Preference Optimization). @hijkzzz
- Distributed PPO based on Ray for 34B+ models, and 7B models on RTX 4090. @wuxibin89
- Support Conditional SFT (https://arxiv.org/abs/2308.12050). @hijkzzz
- Support Wandb logging (--wandb). @dabney777
- Support conda environments / NVIDIA Docker. @catqaq
- Support FlashAttention2 (--flash_attn). @pikaqqqqqq
- Support popular Chinese models. @catqaq
- Support GPT-4 evaluation. @hijkzzz
- Support multiple reward models. @wuxibin89
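Conditional SFT typically means conditioning the model on a control tag derived from a reward or quality signal during fine-tuning, so that prepending the high-quality tag at inference time steers generation. A hypothetical data-formatting sketch (the tag strings, threshold, and function name are invented for illustration, not taken from the cited paper or OpenRLHF):

```python
def make_conditional_sample(prompt: str, response: str,
                            reward: float, threshold: float = 0.5) -> str:
    """Prepend a quality control tag chosen by thresholding the reward.

    During training, every sample is prefixed with its tag; at inference
    time, prompts are prefixed with the high-quality tag ("<good>") to
    steer the model toward high-reward responses.
    """
    tag = "<good>" if reward >= threshold else "<bad>"
    return f"{tag} {prompt}{response}"

print(make_conditional_sample("Q: 2+2?\nA: ", "4", reward=0.9))
```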