@vivekkalyan
Collaborator

Because of how we efficiently handle common substrings, there was an edge case where RULER sent empty trajectories to the judge when all trajectories in a group were identical. This produced scores of 0 for that group. That does not affect the advantage, since every trajectory in the group gets the same score, but it might be alarming.
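To make the edge case concrete, here is a minimal sketch, assuming the shared content is a common message prefix that gets stripped before judging. The helper names `common_prefix_length` and `strip_common_prefix` are hypothetical and are not taken from the RULER source.

```python
from typing import Any

Message = dict[str, Any]


def common_prefix_length(message_lists: list[list[Message]]) -> int:
    """Count the leading messages shared by every trajectory (hypothetical helper)."""
    if not message_lists:
        return 0
    length = 0
    # zip stops at the shortest trajectory, so we never index out of range.
    for group in zip(*message_lists):
        if any(msg != group[0] for msg in group):
            break
        length += 1
    return length


def strip_common_prefix(message_lists: list[list[Message]]) -> list[list[Message]]:
    """Drop the shared prefix so only the differing tail of each trajectory remains."""
    n = common_prefix_length(message_lists)
    return [msgs[n:] for msgs in message_lists]


trajectory = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
]

# When every trajectory is identical, the whole trajectory is "common",
# so nothing is left to show the judge.
remainders = strip_common_prefix([trajectory, list(trajectory)])
print(remainders)  # [[], []]
```

Under that assumption, identical trajectories leave the judge with nothing to compare, which is consistent with the 0 scores described above.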

This PR detects the edge case, sends only one trajectory to the judge, and duplicates the resulting score across all trajectories. Every trajectory still ends up with the same score, but it is now a number between 0 and 1 chosen by the judge.
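The approach can be sketched roughly as follows. This is not the code from the PR: `score_group`, `judge`, and `stub_judge` are hypothetical names, and the judge is replaced by a stub so the example runs without a model call.

```python
from typing import Any, Callable

Message = dict[str, Any]
Judge = Callable[[list[list[Message]]], list[float]]


def score_group(message_lists: list[list[Message]], judge: Judge) -> list[float]:
    """Score a group, judging only one trajectory when all are identical."""
    if not message_lists:
        return []
    all_identical = all(msgs == message_lists[0] for msgs in message_lists)
    # Send a single representative trajectory when there is nothing to compare.
    to_judge = message_lists[:1] if all_identical else message_lists
    scores = judge(to_judge)
    if all_identical:
        # Duplicate the single score for every trajectory in the group.
        return scores * len(message_lists)
    return scores


calls: list[int] = []


def stub_judge(trajectories: list[list[Message]]) -> list[float]:
    """Stand-in for the real judge: records the batch size, returns a fixed score."""
    calls.append(len(trajectories))
    return [0.7 for _ in trajectories]


trajectory = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
]
scores = score_group([trajectory, list(trajectory), list(trajectory)], stub_judge)
print(scores)  # [0.7, 0.7, 0.7]
print(calls)   # [1]
```

The stub shows that only one trajectory reaches the judge, while the caller still receives one score per trajectory.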

To test this case:

    message_lists = [
        [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "What is 2+2?"},
            {"role": "assistant", "content": "4"},
        ],
        [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "What is 2+2?"},
            {"role": "assistant", "content": "4"},
        ],
    ]
    scores = await ruler(message_lists, debug=True)

    # If all trajectories were identical, we only sent one to the judge
    # Duplicate the score for all trajectories
    if all_identical:
        assert len(parsed.scores) == 1
Contributor

Instead of assert, could you use an if -> raise block, similar to line 184? A user might otherwise have to do a bit of debugging to figure out what exactly is going wrong.

Collaborator Author

Thanks for the suggestion. I added raise statements to both conditionals.
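For readers unfamiliar with the pattern, here is a minimal sketch of an explicit check replacing an `assert`. The function name `expand_single_score` and the error wording are hypothetical and do not reflect the merged code. The motivation is general Python behavior: `assert` statements are skipped when the interpreter runs with `-O`, and a bare `AssertionError` carries no explanation.

```python
def expand_single_score(scores: list[float], group_size: int) -> list[float]:
    """Duplicate the judge's single score across a group of identical trajectories."""
    if len(scores) != 1:
        # An explicit raise survives `python -O` and tells the user what went wrong.
        raise ValueError(
            "Expected exactly 1 score for a group of identical trajectories, "
            f"got {len(scores)}."
        )
    return scores * group_size


print(expand_single_score([0.5], 3))  # [0.5, 0.5, 0.5]
```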

@vivekkalyan vivekkalyan merged commit 0ac6720 into main Nov 14, 2025
2 checks passed
@vivekkalyan vivekkalyan deleted the fix/ruler-same-trajectories-reward branch November 14, 2025 02:01
4 participants