Remove reward model codepath by SumanthRH · Pull Request #374 · NovaSky-AI/SkyRL

SumanthRH · 2025-10-01T20:19:17Z

What does this PR do?

Should close #371

We've had an unused codepath for using an outcome reward model in the training loop for a while. This was primarily for RLHF use cases, which we don't target, so it can be removed.

TODO:

- [x] Cleanup `custom_rewards` / `orm_rewards` keys in the trainer
- [x] Cleanup `RewardModel` logic
- [x] Cleanup `normalize_reward` config
- [x] E2E test with gsm8k (GRPO and PPO)
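
For illustration only, the key cleanup in the first item can be sketched as below. This is a minimal sketch, not the actual trainer code: `migrate_reward_keys` is a hypothetical helper, and the shape of the batch dict is assumed. It shows the intent of the `custom_rewards -> rewards` commit: the reward-model key goes away and the remaining reward signal lives under a single `rewards` key.

```python
from typing import Any, Dict

# Keys named in this PR's TODO list; everything else here is assumed.
LEGACY_KEYS = ("custom_rewards", "orm_rewards")


def migrate_reward_keys(batch: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: collapse legacy reward keys into `rewards`.

    `orm_rewards` (outcome reward model scores) is dropped, and
    `custom_rewards` is carried over under the new name.
    """
    # Copy every non-legacy entry unchanged.
    out = {k: v for k, v in batch.items() if k not in LEGACY_KEYS}
    # Rename the remaining reward signal, if present.
    if "custom_rewards" in batch:
        out["rewards"] = batch["custom_rewards"]
    return out


if __name__ == "__main__":
    old_batch = {
        "response_ids": [[1, 2], [3, 4]],
        "custom_rewards": [1.0, 0.0],
        "orm_rewards": [0.3, 0.2],
    }
    print(migrate_reward_keys(old_batch))
```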

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

erictang000

lgtm, if you haven't already can you try an e2e gsm8k run with ppo just to make sure, since the critic code path was more tightly coupled with reward?


SumanthRH added 4 commits October 1, 2025 20:16

remove reward model codepath

bae6e3a

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

custom_rewards -> rewards

f018b23

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

x

2046fc7

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

remove normalize reward; cleanup

002e021

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH marked this pull request as ready for review October 1, 2025 20:58

SumanthRH requested a review from erictang000 October 1, 2025 20:58

SumanthRH assigned erictang000 Oct 1, 2025

erictang000 approved these changes Oct 1, 2025


Review comment on skyrl-train/skyrl_train/trainer.py (outdated, resolved)

x

5674cee

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH merged commit 8805e75 into NovaSky-AI:main Oct 2, 2025
3 checks passed

erictang000 pushed a commit that referenced this pull request Oct 2, 2025

[Fix] Fix ci after #374 (#378)

6078885

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025

[Fix] Fix ci after NovaSky-AI#374 (NovaSky-AI#378)

9d03662

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>


SumanthRH merged 5 commits into NovaSky-AI:main from SumanthRH:remove-rm
