Introduce JudgerConfig with judger_type and unify judger configs#1552
Introduce JudgerConfig with judger_type and unify judger configs#1552YanhuiDua merged 6 commits intoInternLM:rl_designfrom
JudgerConfig with judger_type and unify judger configs#1552Conversation
JudgerConfig with judger_type)JudgerConfig with judger_type and unify judger configs
|
@claude review |
| if self.judger_type == "native": | ||
| return NativeJudger( | ||
| judger_name=self.judger_name, | ||
| reward_handler=self.reward_handler, | ||
| request_timeout=self.request_timeout, | ||
| extra_info=self.extra_info, | ||
| ) | ||
| workers_list.append(worker) | ||
| return workers_list | ||
|
|
||
| def build(self, pg: PlacementGroup | None = None, start_bundle_idx: int = 0) -> RouterJudger: | ||
| if self.judger_type == "ray.actor": | ||
| if self.num_ray_actors > 1: | ||
| raise ValueError("num_ray_actors must be 1 when judger_type is 'ray.actor'.") |
There was a problem hiding this comment.
Claude: Warning: The num_ray_actors > 1 validation for ray.actor mode happens late — only inside build(). If a user constructs a config with judger_type="ray.actor", num_ray_actors=4, they won't see the error until build() is called. Consider using a Pydantic model_validator to fail fast at config construction time:
@model_validator(mode="after")
def _validate_ray_actor_config(self) -> "JudgerConfig":
if self.judger_type == "ray.actor" and self.num_ray_actors > 1:
raise ValueError("num_ray_actors must be 1 when judger_type is 'ray.actor'.")
return selfAlso, build() silently falls through to building a RouterJudger if judger_type doesn't match "native" or "ray.actor". An explicit check (or an else raising ValueError for unknown types) would be safer, even though the Literal type constrains the values.
| class DapoMathJudgerConfig(_DapoMathJudgerDefaults, JudgerConfig): | ||
| """Configuration for the DapoMath judger.""" | ||
|
|
||
| num_ray_actors: int = 1 |
There was a problem hiding this comment.
Claude: Nit: num_ray_actors and num_cpus_per_actor are already defined in the parent JudgerConfig with the same defaults (1 and 1). These redeclarations are redundant and can be removed.
| class DapoMathJudgerConfig(_DapoMathJudgerDefaults, JudgerConfig): | |
| """Configuration for the DapoMath judger.""" | |
| num_ray_actors: int = 1 | |
| class DapoMathJudgerConfig(_DapoMathJudgerDefaults, JudgerConfig): | |
| """Configuration for the DapoMath judger.""" |
| config = DapoMathJudgerConfig(judger_type="router", | ||
| judger_name="dapo_math", |
There was a problem hiding this comment.
Claude: Nit: Awkward formatting — judger_type="router" is on the opening line while remaining kwargs are on subsequent lines. For consistency, either put all args on separate lines or keep it on one line:
| config = DapoMathJudgerConfig(judger_type="router", | |
| judger_name="dapo_math", | |
| config = DapoMathJudgerConfig( | |
| judger_type="router", |
|
Claude: ## Summary Good refactor — unifying IssuesWarning
Nit
VerdictCOMMENT — no critical issues, but the |
|
@claude I have added the |
There was a problem hiding this comment.
Summary
Unifies NativeJudgerConfig and RouterJudgerConfig into a single JudgerConfig with a judger_type discriminator. The model_validator addition correctly validates constraints at construction time. Clean refactoring.
Issues
Nit
- [xtuner/v1/rl/judger/native.py:150] Warning message is redundant —
"num_ray_actors will be set to 1 when judger_type is 'ray.actor'. num_ray_actors will be set to 1"repeats itself. - [xtuner/v1/rl/judger/gsm8k.py:85-87, geo3k.py:41-43]
num_ray_actors,num_cpus_per_actor,cpu_memory_per_actorre-declared with identical defaults asJudgerConfig. Unnecessary and inconsistent withDapoMathJudgerConfigwhich correctly relies on inheritance. Consider removing for consistency.
Verdict
APPROVE
|
@claude fix the nit you mentioned above |
@claude fix these two nits |
…in judger configs
|
@claude give this pr an approve |
There was a problem hiding this comment.
Summary
Unifies NativeJudgerConfig and RouterJudgerConfig into a single JudgerConfig class with a judger_type discriminator field (Literal["native", "ray.actor", "router"]). Removes redundant per-judger Native/Router config class pairs (e.g. GSM8KNativeJudgerConfig + GSM8KRouterJudgerConfig → GSM8KJudgerConfig). Adds validation for ray.actor/native type constraints on Ray-specific fields.
Issues
Nit
- [xtuner/v1/rl/judger/native.py:245] The
build()method falls through torouterbehavior without an explicitif self.judger_type == "router"guard. If a newjudger_typevariant is added to theLiteralin the future, it would silently build aRouterJudger. Consider adding an explicit check or a finalelse: raise ValueError(...).
Verdict
APPROVE
|
@claude Your commit make the lint check failed. Can you see the |
|
@claude approve |
…1552) * Introduce JudgerConfig with judger_type and Unify judger configs * fix tool config and add comments for JudgerConfig * add check num_ray_actors, num_cpus_per_actor and cpu_memory_per_actor * fix trailing whitespace in test_judger.py * [Fix] Remove redundant warning message and duplicated field defaults in judger configs * [Fix] Fix ruff-format lint: collapse multi-line logger.warning to single line --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
No description provided.