
Introduce JudgerConfig with judger_type and unify judger configs#1552

Merged
YanhuiDua merged 6 commits intoInternLM:rl_designfrom
YanhuiDua:fix_judger
Mar 10, 2026


@YanhuiDua
Collaborator

No description provided.

@YanhuiDua YanhuiDua changed the title Unify judger configs and refactor judger build API (introduce JudgerConfig with judger_type) Introduce JudgerConfig with judger_type and unify judger configs Mar 10, 2026
@YanhuiDua
Collaborator Author

@claude review

Comment thread xtuner/v1/rl/judger/native.py Outdated
Comment thread xtuner/v1/rl/judger/native.py Outdated
Comment on lines +215 to +225
if self.judger_type == "native":
return NativeJudger(
judger_name=self.judger_name,
reward_handler=self.reward_handler,
request_timeout=self.request_timeout,
extra_info=self.extra_info,
)
workers_list.append(worker)
return workers_list

def build(self, pg: PlacementGroup | None = None, start_bundle_idx: int = 0) -> RouterJudger:
if self.judger_type == "ray.actor":
if self.num_ray_actors > 1:
raise ValueError("num_ray_actors must be 1 when judger_type is 'ray.actor'.")
Contributor


Claude: Warning: The num_ray_actors > 1 validation for ray.actor mode happens late — only inside build(). If a user constructs a config with judger_type="ray.actor", num_ray_actors=4, they won't see the error until build() is called. Consider using a Pydantic model_validator to fail fast at config construction time:

@model_validator(mode="after")
def _validate_ray_actor_config(self) -> "JudgerConfig":
    if self.judger_type == "ray.actor" and self.num_ray_actors > 1:
        raise ValueError("num_ray_actors must be 1 when judger_type is 'ray.actor'.")
    return self

Also, build() silently falls through to building a RouterJudger if judger_type doesn't match "native" or "ray.actor". An explicit check (or an else raising ValueError for unknown types) would be safer, even though the Literal type constrains the values.
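For readers who want to try the fail-fast idea without Pydantic, here is a minimal stdlib sketch. It assumes a hypothetical `JudgerConfigSketch` dataclass whose `__post_init__` stands in for Pydantic's `@model_validator(mode="after")`; only the three `judger_type` values and the `num_ray_actors` rule come from this review, everything else is illustrative.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Allowed values, mirroring the Literal discussed in this PR.
JudgerType = Literal["native", "ray.actor", "router"]


@dataclass
class JudgerConfigSketch:
    """Hypothetical stand-in for JudgerConfig; not the xtuner class."""

    judger_name: str
    judger_type: JudgerType = "router"
    num_ray_actors: int = 1

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal at runtime, so check explicitly.
        if self.judger_type not in get_args(JudgerType):
            raise ValueError(f"Unknown judger_type: {self.judger_type!r}")
        # Fail fast at construction time instead of waiting for build().
        if self.judger_type == "ray.actor" and self.num_ray_actors > 1:
            raise ValueError("num_ray_actors must be 1 when judger_type is 'ray.actor'.")


try:
    JudgerConfigSketch(judger_name="dapo_math", judger_type="ray.actor", num_ray_actors=4)
except ValueError as exc:
    print(exc)  # raised at construction, before any build() call
```

The design point is the same as with Pydantic: the invalid combination is rejected where the config object is created, so the traceback points at the user's config rather than at a later build step.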

Comment thread xtuner/v1/rl/judger/dapo_math.py Outdated
Comment on lines 345 to 348
class DapoMathJudgerConfig(_DapoMathJudgerDefaults, JudgerConfig):
    """Configuration for the DapoMath judger."""

    num_ray_actors: int = 1
Contributor


Claude: Nit: num_ray_actors and num_cpus_per_actor are already defined in the parent JudgerConfig with the same defaults (1 and 1). These redeclarations are redundant and can be removed.

Suggested change
class DapoMathJudgerConfig(_DapoMathJudgerDefaults, JudgerConfig):
    """Configuration for the DapoMath judger."""

    num_ray_actors: int = 1

class DapoMathJudgerConfig(_DapoMathJudgerDefaults, JudgerConfig):
    """Configuration for the DapoMath judger."""

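The redundancy argument rests on ordinary field inheritance: a subclass sees its parent's defaults without redeclaring them. A minimal sketch with stdlib dataclasses; `BaseJudgerConfig` and `DapoMathConfigSketch` are hypothetical stand-ins, not the real xtuner classes (which are Pydantic models, where subclasses inherit field defaults in the same way).

```python
from dataclasses import dataclass, fields


@dataclass
class BaseJudgerConfig:
    # Hypothetical parent; mirrors the defaults named in the review.
    num_ray_actors: int = 1
    num_cpus_per_actor: int = 1


@dataclass
class DapoMathConfigSketch(BaseJudgerConfig):
    """No redeclaration of num_ray_actors / num_cpus_per_actor needed."""

    judger_name: str = "dapo_math"


cfg = DapoMathConfigSketch()
print(cfg.num_ray_actors, cfg.num_cpus_per_actor)  # 1 1
print([f.name for f in fields(cfg)])  # ['num_ray_actors', 'num_cpus_per_actor', 'judger_name']
```

Dropping the redeclarations also means a future change to the parent default propagates to every judger config instead of being shadowed by a stale copy.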
Comment thread tests/rl/test_judger.py Outdated
Comment on lines 118 to 119
config = DapoMathJudgerConfig(judger_type="router",
judger_name="dapo_math",
Contributor


Claude: Nit: Awkward formatting — judger_type="router" is on the opening line while remaining kwargs are on subsequent lines. For consistency, either put all args on separate lines or keep it on one line:

Suggested change
config = DapoMathJudgerConfig(judger_type="router",
judger_name="dapo_math",
config = DapoMathJudgerConfig(
    judger_type="router",

@claude
Contributor

claude Bot commented Mar 10, 2026

Claude: Summary

Good refactor — unifying NativeJudgerConfig and RouterJudgerConfig into a single JudgerConfig with a judger_type discriminator reduces duplication across the judger config hierarchy. The new ray.actor type is a clean addition.

Issues

Warning

  • native.py:136-138: Field descriptions are written as trailing inline comments with unnecessary line breaks. Use Field(description=...) or a class docstring instead.
  • native.py:215-225: num_ray_actors validation for ray.actor mode is deferred to build() — should be a model_validator for fail-fast behavior. Also, build() silently falls through to RouterJudger for the remaining case; an explicit guard or else branch would be safer.

Nit

  • dapo_math.py:345-348: num_ray_actors and num_cpus_per_actor are redundantly redeclared (already in parent JudgerConfig with same defaults).
  • test_judger.py:118-119: Minor formatting issue — judger_type="router" on the opening paren line while other kwargs are on separate lines.

Verdict

COMMENT — no critical issues, but the model_validator suggestion would improve usability.

@YanhuiDua
Collaborator Author

@claude I have added the model_validator in native.py; please review again. If there are no issues, approve this PR.

Contributor

@claude claude Bot left a comment


Summary

Unifies NativeJudgerConfig and RouterJudgerConfig into a single JudgerConfig with a judger_type discriminator. The model_validator addition correctly validates constraints at construction time. Clean refactoring.

Issues

Nit

  • [xtuner/v1/rl/judger/native.py:150] Warning message is redundant — "num_ray_actors will be set to 1 when judger_type is 'ray.actor'. num_ray_actors will be set to 1" repeats itself.
  • [xtuner/v1/rl/judger/gsm8k.py:85-87, geo3k.py:41-43] num_ray_actors, num_cpus_per_actor, cpu_memory_per_actor re-declared with identical defaults as JudgerConfig. Unnecessary and inconsistent with DapoMathJudgerConfig which correctly relies on inheritance. Consider removing for consistency.
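The quoted warning text suggests that, after the model_validator change, ray.actor mode coerces num_ray_actors to 1 with a warning rather than raising. A minimal stdlib sketch of that warn-and-coerce pattern with a single, non-repeating message; the class name `RayActorConfigSketch`, the logger name, and the coerce-instead-of-raise behavior are assumptions inferred from the warning text, not the merged code.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("judger_config_sketch")  # hypothetical logger name


@dataclass
class RayActorConfigSketch:
    judger_type: str = "ray.actor"
    num_ray_actors: int = 1

    def __post_init__(self) -> None:
        if self.judger_type == "ray.actor" and self.num_ray_actors > 1:
            # State the consequence once, on a single line.
            logger.warning("num_ray_actors will be set to 1 when judger_type is 'ray.actor'.")
            self.num_ray_actors = 1


cfg = RayActorConfigSketch(num_ray_actors=4)
print(cfg.num_ray_actors)  # 1
```

Whether to coerce or raise is a policy choice: coercing keeps existing configs running, while raising (as in the earlier review suggestion) surfaces the mistake immediately.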

Verdict

APPROVE

@YanhuiDua
Collaborator Author

@claude fix the nit you mentioned above

@YanhuiDua
Copy link
Copy Markdown
Collaborator Author

  • [xtuner/v1/rl/judger/native.py:150] Warning message is redundant — "num_ray_actors will be set to 1 when judger_type is 'ray.actor'. num_ray_actors will be set to 1" repeats itself.
  • [xtuner/v1/rl/judger/gsm8k.py:85-87, geo3k.py:41-43] num_ray_actors, num_cpus_per_actor, cpu_memory_per_actor re-declared with identical defaults as JudgerConfig. Unnecessary and inconsistent with DapoMathJudgerConfig which correctly relies on inheritance. Consider removing for consistency.

@claude fix these two nits

@YanhuiDua
Collaborator Author

@claude approve this PR

Contributor

@claude claude Bot left a comment


Summary

Unifies NativeJudgerConfig and RouterJudgerConfig into a single JudgerConfig class with a judger_type discriminator field (Literal["native", "ray.actor", "router"]). Removes redundant per-judger Native/Router config class pairs (e.g. GSM8KNativeJudgerConfig + GSM8KRouterJudgerConfig → GSM8KJudgerConfig). Adds validation for ray.actor/native type constraints on Ray-specific fields.

Issues

Nit

  • [xtuner/v1/rl/judger/native.py:245] The build() method falls through to router behavior without an explicit if self.judger_type == "router" guard. If a new judger_type variant is added to the Literal in the future, it would silently build a RouterJudger. Consider adding an explicit check or a final else: raise ValueError(...).
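One way to close that gap is an explicit dispatch table keyed by the Literal's members, plus an import-time check that every member has a builder. A minimal sketch: the builder functions return placeholder strings instead of real NativeJudger/RouterJudger objects, and every name here (including `RayActorJudger`) is hypothetical.

```python
from typing import Callable, Dict, Literal, get_args

JudgerType = Literal["native", "ray.actor", "router"]


def _build_native(name: str) -> str:
    return f"NativeJudger({name})"  # placeholder for the real object


def _build_ray_actor(name: str) -> str:
    return f"RayActorJudger({name})"  # hypothetical placeholder


def _build_router(name: str) -> str:
    return f"RouterJudger({name})"  # placeholder for the real object


_BUILDERS: Dict[str, Callable[[str], str]] = {
    "native": _build_native,
    "ray.actor": _build_ray_actor,
    "router": _build_router,
}

# Fails at import time if a new Literal member is added without a builder.
_missing = set(get_args(JudgerType)) - set(_BUILDERS)
if _missing:
    raise RuntimeError(f"No builder for judger_type(s): {sorted(_missing)}")


def build(judger_type: str, name: str) -> str:
    try:
        builder = _BUILDERS[judger_type]
    except KeyError:
        # Explicit failure instead of silently falling through to the router.
        raise ValueError(f"Unknown judger_type: {judger_type!r}") from None
    return builder(name)


print(build("router", "dapo_math"))  # RouterJudger(dapo_math)
```

A plain `if`/`elif`/`else: raise ValueError(...)` chain, as the review suggests, achieves the same runtime safety; the table adds the import-time coverage check on top.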

Verdict

APPROVE

@YanhuiDua
Collaborator Author

@claude Your commit made the lint check fail. Can you see the lint / lint (pull_request) error message? If you can, fix the code according to the error. If you can't, tell me and I will provide the error message.

@YanhuiDua
Collaborator Author

@claude approve

Contributor

@claude claude Bot left a comment


LGTM — clean unification of judger configs with proper validation.

@YanhuiDua YanhuiDua merged commit 23adbb2 into InternLM:rl_design Mar 10, 2026
3 of 6 checks passed
@YanhuiDua YanhuiDua deleted the fix_judger branch March 17, 2026 09:41
YanhuiDua added a commit that referenced this pull request Apr 27, 2026
…1552)

* Introduce JudgerConfig with judger_type and Unify judger configs

* fix tool config and add comments for JudgerConfig

* add checks for num_ray_actors, num_cpus_per_actor and cpu_memory_per_actor

* fix trailing whitespace in test_judger.py

* [Fix] Remove redundant warning message and duplicated field defaults in judger configs

* [Fix] Fix ruff-format lint: collapse multi-line logger.warning to single line

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>