
Releases: NVIDIA/NeMo-Aligner

NVIDIA NeMo-Aligner v0.3.0.trtllm

03 May 18:34
e61121b

New features and optimizations

  • TRT-LLM support in the RLHF pipeline. This adds a significant speedup (6.96x) over our previous implementation.

Container

Please refer to the Dockerfile, which contains all the dependencies needed to run TRT-LLM and RLHF.

Learn More

For more information, see our Paper and Usage Guide.

NVIDIA NeMo-Aligner v0.2.0

13 Mar 23:17
3b8b70e

New features and optimizations

  • Added a public-facing official Dockerfile for NeMo-Aligner.
  • PPO: memory optimization to help avoid OOM in the actor when sending training data to the critic.
  • PPO: it is now possible to use a custom end string in sampling_params.end_strings that is different from <extra_id_1>.
  • SFT: added support for custom validation metrics based on model generations.
  • Added the ability to do multi-epoch training (cfg.max_epochs > 1) for reward models, DPO, PPO, and SFT (see the configuration sketch after this list).
  • SFT/SteerLM: added LoRA tuning as an option besides full fine-tuning; only the attention_qkv layer is supported.
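
As a rough illustration of the new options, here is a minimal Python/OmegaConf sketch. Only sampling_params.end_strings and cfg.max_epochs are named in the notes above; the surrounding config layout is an assumption for illustration, not the exact NeMo-Aligner schema.

from omegaconf import OmegaConf

# Hypothetical config fragment: only end_strings and max_epochs are
# documented above; the rest of the layout is assumed.
cfg = OmegaConf.create({
    "max_epochs": 3,  # multi-epoch training (cfg.max_epochs > 1)
    "sampling_params": {
        # Custom end string instead of the default <extra_id_1>.
        "end_strings": ["<my_end_token>"],
    },
})
print(OmegaConf.to_yaml(cfg))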

Breaking changes

  • We have changed the shuffle logic in the data sampler to support multi-epoch training: the seed value is now modified slightly
    per epoch, so training runs using identical parameters will no longer produce the same results as before.
    If you run CI/regression-type tests, be warned that they may break due to this shuffle change (a sketch of the idea follows).
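
A minimal sketch of the per-epoch seeding idea (not NeMo-Aligner's exact sampler code; the seed derivation here is an assumption):

import random

def shuffled_indices(num_samples: int, base_seed: int, epoch: int) -> list[int]:
    # Derive a distinct seed per epoch so each epoch sees a different order;
    # NeMo-Aligner's exact derivation may differ.
    rng = random.Random(base_seed + epoch)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices

print(shuffled_indices(8, base_seed=42, epoch=0))
print(shuffled_indices(8, base_seed=42, epoch=1))  # different order than epoch 0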

Bug Fixes

  • Fixed a potential issue when the base model's model.data.data_prefix config is a list and is overridden with
    a dictionary from the training configuration.
  • exp_manager.max_time_per_run is now respected: the trainers will save and run validation before exiting once the time limit
    is reached (see the sketch after this list).
  • Fixed a crash in PPO when using a separate reward model server (i.e., with combine_rm_and_critic_server=False).
  • Fixed a crash when the LR scheduler is not specified.
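
A minimal sketch of the max_time_per_run behavior, assuming a hypothetical training loop (step_fn and save_and_validate_fn are illustrative placeholders, not NeMo-Aligner APIs):

import time

def train_with_time_limit(max_seconds, total_steps, step_fn, save_and_validate_fn):
    # Checkpoint and run validation before exiting once the wall-clock
    # budget (analogous to exp_manager.max_time_per_run) is spent.
    start = time.monotonic()
    for step in range(total_steps):
        if time.monotonic() - start >= max_seconds:
            save_and_validate_fn(step)
            return
        step_fn(step)
    save_and_validate_fn(total_steps)

train_with_time_limit(0.5, 10**6, lambda s: time.sleep(0.01), lambda s: print(f"saved at step {s}"))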

Container

docker pull nvcr.io/nvidia/nemo:24.01.framework

To get access:

  1. Sign up to get free and immediate access to the NVIDIA NeMo Framework container. If you don't have an NVIDIA NGC account, you will be prompted to create one before proceeding.
  2. If you don't have an NVIDIA NGC API key, sign in to NVIDIA NGC, select the organization/team ea-bignlp/ga-participants, and click Generate API key; save this key for the next step. If you already have a key, skip this step.
  3. On your machine, log in to nvcr.io with Docker:
docker login nvcr.io
Username: $oauthtoken
Password: <Your Saved NGC API Key>

PyPI

https://pypi.org/project/nemo-aligner/0.2.0/

NVIDIA NeMo-Aligner v0.1.0

06 Dec 17:59

Highlights

First open-source release of NeMo-Aligner, featuring:

  • Support for the full Reinforcement Learning from Human Feedback (RLHF) pipeline, including SFT, Reward Model Training, and Reinforcement Learning
  • Support for the SteerLM technique
  • Support for Direct Preference Optimization
  • Support for all Megatron Core GPT models such as LLAMA2 70B

Container

docker pull nvcr.io/ea-bignlp/ga-participants/nemofw-training:23.11

To get access:

  1. Sign up to get free and immediate access to the NVIDIA NeMo Framework container. If you don't have an NVIDIA NGC account, you will be prompted to create one before proceeding.
  2. If you don't have an NVIDIA NGC API key, sign in to NVIDIA NGC, select the organization/team ea-bignlp/ga-participants, and click Generate API key; save this key for the next step. If you already have a key, skip this step.
  3. On your machine, log in to nvcr.io with Docker:
docker login nvcr.io
Username: $oauthtoken
Password: <Your Saved NGC API Key>

PyPI

https://pypi.org/project/nemo-aligner/0.1.0/