Skip to content

feat: adding support for Bradley-Terry reward model training#609

Merged
terrykong merged 28 commits intomainfrom
jveronvialard/bt-rm-training
Jul 29, 2025
Merged

feat: adding support for Bradley-Terry reward model training#609
terrykong merged 28 commits intomainfrom
jveronvialard/bt-rm-training

Conversation

@jveronvialard
Copy link
Contributor

@jveronvialard jveronvialard commented Jul 3, 2025

What does this PR do ?

Adding support for Bradley-Terry reward model training in NVIDIA-NeMo/RL.

Usage

The command to launch a Bradley-Terry reward model training job is as follows:

uv run examples/run_rm.py --config <PATH TO YAML CONFIG> <OVERRIDES>

An example config can be found at examples/configs/rm.yaml. Please refer to docs/guides/rm.md for more information.

Example convergence plots:
image
image
image

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jul 3, 2025
@jveronvialard jveronvialard added enhancement New feature or request training Training related algorithm labels Jul 3, 2025
@jveronvialard jveronvialard changed the title adding support for Bradley-Terry reward model training feat: adding support for Bradley-Terry reward model training Jul 3, 2025
odelalleau
odelalleau previously approved these changes Jul 4, 2025
Copy link
Contributor

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! lgtm (I had already reviewed those changes privately) -- I just realized though that we probably need an update of README.md which contains the list of algorithms (and should link to the new rm.md). Would still be good to get this merged asap to unblock people who need this feature.

@SahilJain314
Copy link
Contributor

Thanks for the PR! Would you mind adding convergence plots to your PR description?

Copy link
Collaborator

@terrykong terrykong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @jveronvialard !

  1. can you also update index.md to include rm.md so it appears in our docs? @jgerh could you review the docs?
  2. could you add unit tests? Cursor can help with most of the heavy lifting

@terrykong terrykong requested a review from jgerh July 11, 2025 00:46
Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Copy link
Contributor

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just one small suggestion

odelalleau
odelalleau previously approved these changes Jul 29, 2025
Copy link
Contributor

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but I'm probably biased since I contributed a few commits ;)

terrykong
terrykong previously approved these changes Jul 29, 2025
@terrykong
Copy link
Collaborator

@jveronvialard can you run the linter?

@odelalleau odelalleau dismissed stale reviews from terrykong and themself via 43f7eae July 29, 2025 19:15
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
@odelalleau odelalleau force-pushed the jveronvialard/bt-rm-training branch from 43f7eae to 7530c84 Compare July 29, 2025 19:15
@odelalleau
Copy link
Contributor

@jveronvialard can you run the linter?

Ah my bad my last commit introduced a minor linting issue, I just fixed it.

@terrykong terrykong added this pull request to the merge queue Jul 29, 2025
Merged via the queue into main with commit d467852 Jul 29, 2025
15 checks passed
@terrykong terrykong deleted the jveronvialard/bt-rm-training branch July 29, 2025 23:09
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jul 30, 2025
…NeMo#609)

Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Julien Veron Vialard <50602890+jveronvialard@users.noreply.github.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
…NeMo#609)

Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Julien Veron Vialard <50602890+jveronvialard@users.noreply.github.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
…NeMo#609)

Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Julien Veron Vialard <50602890+jveronvialard@users.noreply.github.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
soodoshll pushed a commit to soodoshll/RL that referenced this pull request Aug 13, 2025
…NeMo#609)

Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Julien Veron Vialard <50602890+jveronvialard@users.noreply.github.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Qidong Su <qidongs@nvidia.com>
@terrykong
Copy link
Collaborator

@phtran8 can you QA?

@github-actions
Copy link

github-actions bot commented Oct 2, 2025

❌ Submodule Fast-Forward Check Failed

Check based on commit: 7530c84 (PR #609 from jveronvialard/bt-rm-training)

❌ Submodules that need attention:

Megatron-LM: ❌ PR branch is BEHIND main branch
TARGET (main branch): https://github.com/terrykong/Megatron-LM/commits/af73aa2cebf94a0bee5ea6dda2614ad989faffae/
CURRENT (PR #609 from jveronvialard/bt-rm-training): https://github.com/terrykong/Megatron-LM/commits/2ff0f099ffc30ffd152e3e29e921a1609d00855c/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@github-actions
Copy link

github-actions bot commented Oct 3, 2025

❌ Submodule Fast-Forward Check Failed

Check based on commit: 7530c84 (PR #609 from jveronvialard/bt-rm-training)

❌ Submodules that need attention:

Megatron-LM: ❌ PR branch is BEHIND main branch
TARGET (main branch): https://github.com/terrykong/Megatron-LM/commits/af73aa2cebf94a0bee5ea6dda2614ad989faffae/
CURRENT (PR #609 from jveronvialard/bt-rm-training): https://github.com/terrykong/Megatron-LM/commits/2ff0f099ffc30ffd152e3e29e921a1609d00855c/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

algorithm CI Relating to CI documentation Improvements or additions to documentation enhancement New feature or request QA:Verified training Training related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants