support log rollout info by YanhuiDua · Pull Request #1579 · InternLM/xtuner

YanhuiDua · 2026-03-13T11:04:55Z

No description provided.

YanhuiDua · 2026-03-13T11:06:11Z

claude · 2026-03-13T11:08:08Z

Claude: ## Summary
Clean addition of timing and replay-buffer statistics to the rollout pipeline. The ProduceBatchStats / ProduceBatchResult dataclasses are well-structured and the plumbing through producer → manager → trainer is straightforward.

Issues

Warning

agent_loop_manager.py:116-121 Three sequential await replay_buffer.count() calls are independent — use asyncio.gather() to run them concurrently. (See inline comment with suggestion.)
rl_colocate_trainer.py # TODO: save train trajectory is placed right after the code that already saves the trajectory — this is confusing. Clarify the intent or remove.

Nit

agent_loop_manager.py:1 If adopting asyncio.gather(), add import asyncio at the top.

Verdict

COMMENT — no blocking issues, but the asyncio.gather() optimization is worth picking up.

…_completed

YanhuiDua · 2026-03-16T09:28:04Z

@claude review and check the definition of ProduceBatchResult and ProducerTimings is clear enough for users.

claude · 2026-03-16T09:30:01Z

Claude: ## Summary
Good iteration — the previous review's suggestions (asyncio.gather, stale TODO removal) are addressed in commits 2-3, and the rename from timing_* → group_gen_* / completed_samples → leftover_* improves clarity.

Regarding the specific question about whether ProduceBatchResult and ProducerTimings are clear enough for users:

ProducerTimings — clean and minimal. Two fields, self-explanatory names. Suggest adding a one-line docstring.

ProduceBatchResult — the field names are reasonable after the rename, but the class is doing double duty (generation timing stats + replay-buffer leftovers) and is part of the public API. It needs a proper docstring to explain: (1) what "group gen" refers to, (2) the None-vs-populated semantics of the timing fields, and (3) that "leftover" counts are post-retrieval replay-buffer state. See inline comment with a concrete suggestion.

Issues

Warning

[agent_loop_manager.py:106] group_gen_pause_time_s actually measures wall-clock time for the entire cleanup phase (pause + drain), not just pause overhead. Consider documenting or renaming.
[producer.py:318] The useful final stats log was removed in commit 2 — consider keeping a summary log at the producer level for operational debugging.

Nit

[agent_loop_manager.py:17-29] Public dataclass ProduceBatchResult needs a docstring per project standards. (Inline suggestion provided.)

Verdict

COMMENT — no blocking issues. Adding docstrings to the two new dataclasses would make the API self-documenting.

YanhuiDua · 2026-03-16T11:25:37Z

@claude review, if you think this pr is ok, give this pr an approve

claude · 2026-03-16T11:27:30Z

+@dataclass
+class ProduceBatchResult:
+    """Result of a single ``produce_batch`` call.
+
+    Attributes:
+        rollout_states (list[list[RolloutState]]): Completed rollout groups retrieved from the replay buffer for training.
+        group_gen_count (int | None): Number of generate-group calls finished in this batch (None if no generations ran).
+        group_gen_mean_s (float | None): Mean wall-clock time per generate-group call, in seconds.
+        group_gen_p50_s (float | None): Median (p50) generate-group time, in seconds.
+        group_gen_p99_s (float | None): 99th percentile generate-group time, in seconds.
+        group_gen_p99_p50_ratio (float | None): Ratio of p99 to p50, indicating tail-latency skew.
+        group_gen_pause_time_s (float | None): Time spent in pause/cleanup phase (async strategy only), in seconds.
+        leftover_completed (int): Number of completed groups remaining in the replay buffer after this batch.
+        leftover_aborted (int): Number of aborted groups remaining in the replay buffer.


Claude: Per CLAUDE.md: "Do not declare Attributes in the class docstring." For dataclasses, the fields are __init__ parameters, so Google style uses Args: instead. The previous review already suggested this format — please switch Attributes: → Args:.

Suggested change

@dataclass

class ProduceBatchResult:

"""Result of a single ``produce_batch`` call.

Attributes:

rollout_states (list[list[RolloutState]]): Completed rollout groups retrieved from the replay buffer for training.

group_gen_count (int | None): Number of generate-group calls finished in this batch (None if no generations ran).

group_gen_mean_s (float | None): Mean wall-clock time per generate-group call, in seconds.

group_gen_p50_s (float | None): Median (p50) generate-group time, in seconds.

group_gen_p99_s (float | None): 99th percentile generate-group time, in seconds.

group_gen_p99_p50_ratio (float | None): Ratio of p99 to p50, indicating tail-latency skew.

group_gen_pause_time_s (float | None): Time spent in pause/cleanup phase (async strategy only), in seconds.

leftover_completed (int): Number of completed groups remaining in the replay buffer after this batch.

leftover_aborted (int): Number of aborted groups remaining in the replay buffer.

"""Result of a single ``produce_batch`` call.

Args:

rollout_states (list[list[RolloutState]]): Completed rollout groups retrieved from the replay buffer for training.

group_gen_count (int | None): Number of generate-group calls finished in this batch (None if no generations ran).

group_gen_mean_s (float | None): Mean wall-clock time per generate-group call, in seconds.

group_gen_p50_s (float | None): Median (p50) generate-group time, in seconds.

group_gen_p99_s (float | None): 99th percentile generate-group time, in seconds.

group_gen_p99_p50_ratio (float | None): Ratio of p99 to p50, indicating tail-latency skew.

group_gen_pause_time_s (float | None): Time spent in pause/cleanup phase (async strategy only), in seconds.

leftover_completed (int): Number of completed groups remaining in the replay buffer after this batch.

leftover_aborted (int): Number of aborted groups remaining in the replay buffer.

leftover_expired (int): Number of expired groups remaining in the replay buffer.

"""

claude · 2026-03-16T11:27:33Z

+    """Handle preprocessing and postprocessing for partial rollout
+    continuation."""


Claude: Nit: This docstring rewrap appears unrelated to the rollout logging feature. The original single-line docstring was fine. Consider reverting to keep the diff minimal per PR standards ("Keep the diff minimal — fix only the bug. Do not refactor surrounding code").

Suggested change

"""Handle preprocessing and postprocessing for partial rollout

continuation."""

"""Handle preprocessing and postprocessing for partial rollout continuation."""

claude · 2026-03-16T11:27:54Z

Claude: ## Summary
Good progress across the 4 commits — previous review feedback (asyncio.gather, stale TODO) is addressed. The rollout timing and replay-buffer stats plumbing is clean and well-structured.

Issues

Nit

agent_loop_manager.py:19-32 ProduceBatchResult docstring uses Attributes: — CLAUDE.md says not to; use Args: for dataclass fields (Google style).
producer.py:19-24 ProducerTimings docstring is in Chinese while ProduceBatchResult is in English — inconsistent. Also uses Attributes: instead of Args:.
utils.py:9-10 Unrelated docstring rewrap — revert to keep the diff minimal.

Verdict

APPROVE — the remaining items are all nits. The core logic is correct and the API is well-designed. Nice work iterating on the review feedback.

claude

Claude: LGTM — only docstring nits remaining (see inline comments). Core logic is correct.

* support log rollout info * fix claude comments * replace timing_n to group_gen_count and completed_samples to leftover_completed * add docstring for ProducerTimings and ProduceBatchResult * replace to Attributes to Args

support log rollout info

07c1809

claude Bot reviewed Mar 13, 2026

View reviewed changes

Comment thread xtuner/v1/rl/agent_loop/agent_loop_manager.py Outdated

claude Bot reviewed Mar 13, 2026

View reviewed changes

Comment thread xtuner/v1/rl/agent_loop/agent_loop_manager.py

YanhuiDua added 2 commits March 16, 2026 11:25

fix claude comments

120ba13

replace timing_n to group_gen_count and completed_samples to leftover…

d2f474d

…_completed