Skip to content

support log rollout info#1579

Merged
YanhuiDua merged 5 commits intoInternLM:rl_designfrom
YanhuiDua:add_log_info
Mar 16, 2026
Merged

support log rollout info#1579
YanhuiDua merged 5 commits intoInternLM:rl_designfrom
YanhuiDua:add_log_info

Conversation

@YanhuiDua
Copy link
Copy Markdown
Collaborator

No description provided.

@YanhuiDua
Copy link
Copy Markdown
Collaborator Author

@claude review

Comment thread xtuner/v1/rl/agent_loop/agent_loop_manager.py Outdated
Comment thread xtuner/v1/rl/agent_loop/agent_loop_manager.py
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 13, 2026

Claude: ## Summary
Clean addition of timing and replay-buffer statistics to the rollout pipeline. The ProduceBatchStats / ProduceBatchResult dataclasses are well-structured and the plumbing through producer → manager → trainer is straightforward.

Issues

Warning

  • agent_loop_manager.py:116-121 Three sequential await replay_buffer.count() calls are independent — use asyncio.gather() to run them concurrently. (See inline comment with suggestion.)
  • rl_colocate_trainer.py # TODO: save train trajectory is placed right after the code that already saves the trajectory — this is confusing. Clarify the intent or remove.

Nit

Verdict

COMMENT — no blocking issues, but the asyncio.gather() optimization is worth picking up.

@YanhuiDua
Copy link
Copy Markdown
Collaborator Author

@claude review and check the definition of ProduceBatchResult and ProducerTimings is clear enough for users.

Comment thread xtuner/v1/rl/agent_loop/agent_loop_manager.py
Comment thread xtuner/v1/rl/agent_loop/producer.py
Comment thread xtuner/v1/rl/agent_loop/agent_loop_manager.py
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 16, 2026

Claude: ## Summary
Good iteration — the previous review's suggestions (asyncio.gather, stale TODO removal) are addressed in commits 2-3, and the rename from timing_*group_gen_* / completed_samplesleftover_* improves clarity.

Regarding the specific question about whether ProduceBatchResult and ProducerTimings are clear enough for users:

ProducerTimings — clean and minimal. Two fields, self-explanatory names. Suggest adding a one-line docstring.

ProduceBatchResult — the field names are reasonable after the rename, but the class is doing double duty (generation timing stats + replay-buffer leftovers) and is part of the public API. It needs a proper docstring to explain: (1) what "group gen" refers to, (2) the None-vs-populated semantics of the timing fields, and (3) that "leftover" counts are post-retrieval replay-buffer state. See inline comment with a concrete suggestion.

Issues

Warning

  • [agent_loop_manager.py:106] group_gen_pause_time_s actually measures wall-clock time for the entire cleanup phase (pause + drain), not just pause overhead. Consider documenting or renaming.
  • [producer.py:318] The useful final stats log was removed in commit 2 — consider keeping a summary log at the producer level for operational debugging.

Nit

  • [agent_loop_manager.py:17-29] Public dataclass ProduceBatchResult needs a docstring per project standards. (Inline suggestion provided.)

Verdict

COMMENT — no blocking issues. Adding docstrings to the two new dataclasses would make the API self-documenting.

@YanhuiDua
Copy link
Copy Markdown
Collaborator Author

@claude review, if you think this pr is ok, give this pr an approve

Comment on lines +19 to +32
@dataclass
class ProduceBatchResult:
"""Result of a single ``produce_batch`` call.

Attributes:
rollout_states (list[list[RolloutState]]): Completed rollout groups retrieved from the replay buffer for training.
group_gen_count (int | None): Number of generate-group calls finished in this batch (None if no generations ran).
group_gen_mean_s (float | None): Mean wall-clock time per generate-group call, in seconds.
group_gen_p50_s (float | None): Median (p50) generate-group time, in seconds.
group_gen_p99_s (float | None): 99th percentile generate-group time, in seconds.
group_gen_p99_p50_ratio (float | None): Ratio of p99 to p50, indicating tail-latency skew.
group_gen_pause_time_s (float | None): Time spent in pause/cleanup phase (async strategy only), in seconds.
leftover_completed (int): Number of completed groups remaining in the replay buffer after this batch.
leftover_aborted (int): Number of aborted groups remaining in the replay buffer.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Per CLAUDE.md: "Do not declare Attributes in the class docstring." For dataclasses, the fields are __init__ parameters, so Google style uses Args: instead. The previous review already suggested this format — please switch Attributes:Args:.

Suggested change
@dataclass
class ProduceBatchResult:
"""Result of a single ``produce_batch`` call.
Attributes:
rollout_states (list[list[RolloutState]]): Completed rollout groups retrieved from the replay buffer for training.
group_gen_count (int | None): Number of generate-group calls finished in this batch (None if no generations ran).
group_gen_mean_s (float | None): Mean wall-clock time per generate-group call, in seconds.
group_gen_p50_s (float | None): Median (p50) generate-group time, in seconds.
group_gen_p99_s (float | None): 99th percentile generate-group time, in seconds.
group_gen_p99_p50_ratio (float | None): Ratio of p99 to p50, indicating tail-latency skew.
group_gen_pause_time_s (float | None): Time spent in pause/cleanup phase (async strategy only), in seconds.
leftover_completed (int): Number of completed groups remaining in the replay buffer after this batch.
leftover_aborted (int): Number of aborted groups remaining in the replay buffer.
"""Result of a single ``produce_batch`` call.
Args:
rollout_states (list[list[RolloutState]]): Completed rollout groups retrieved from the replay buffer for training.
group_gen_count (int | None): Number of generate-group calls finished in this batch (None if no generations ran).
group_gen_mean_s (float | None): Mean wall-clock time per generate-group call, in seconds.
group_gen_p50_s (float | None): Median (p50) generate-group time, in seconds.
group_gen_p99_s (float | None): 99th percentile generate-group time, in seconds.
group_gen_p99_p50_ratio (float | None): Ratio of p99 to p50, indicating tail-latency skew.
group_gen_pause_time_s (float | None): Time spent in pause/cleanup phase (async strategy only), in seconds.
leftover_completed (int): Number of completed groups remaining in the replay buffer after this batch.
leftover_aborted (int): Number of aborted groups remaining in the replay buffer.
leftover_expired (int): Number of expired groups remaining in the replay buffer.
"""

Comment thread xtuner/v1/rl/agent_loop/producer.py
Comment on lines +9 to +10
"""Handle preprocessing and postprocessing for partial rollout
continuation."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Nit: This docstring rewrap appears unrelated to the rollout logging feature. The original single-line docstring was fine. Consider reverting to keep the diff minimal per PR standards ("Keep the diff minimal — fix only the bug. Do not refactor surrounding code").

Suggested change
"""Handle preprocessing and postprocessing for partial rollout
continuation."""
"""Handle preprocessing and postprocessing for partial rollout continuation."""

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 16, 2026

Claude: ## Summary
Good progress across the 4 commits — previous review feedback (asyncio.gather, stale TODO) is addressed. The rollout timing and replay-buffer stats plumbing is clean and well-structured.

Issues

Nit

  • agent_loop_manager.py:19-32 ProduceBatchResult docstring uses Attributes: — CLAUDE.md says not to; use Args: for dataclass fields (Google style).
  • producer.py:19-24 ProducerTimings docstring is in Chinese while ProduceBatchResult is in English — inconsistent. Also uses Attributes: instead of Args:.
  • utils.py:9-10 Unrelated docstring rewrap — revert to keep the diff minimal.

Verdict

APPROVE — the remaining items are all nits. The core logic is correct and the API is well-designed. Nice work iterating on the review feedback.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: LGTM — only docstring nits remaining (see inline comments). Core logic is correct.

@YanhuiDua YanhuiDua merged commit 7ba27ba into InternLM:rl_design Mar 16, 2026
2 of 5 checks passed
@YanhuiDua YanhuiDua deleted the add_log_info branch March 17, 2026 09:41
YanhuiDua added a commit that referenced this pull request Apr 27, 2026
* support log rollout info

* fix claude comments

* replace timing_n to group_gen_count and completed_samples to leftover_completed

* add docstring for ProducerTimings and ProduceBatchResult

* replace to Attributes to Args
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant