Skip to content

Rebase to latest slime#10

Merged
SamitHuang merged 158 commits into
SamitHuang:mainfrom
knlnguyen1802:rebase-vllm
May 17, 2026
Merged

Rebase to latest slime#10
SamitHuang merged 158 commits into
SamitHuang:mainfrom
knlnguyen1802:rebase-vllm

Conversation

@knlnguyen1802
Copy link
Copy Markdown
Collaborator

No description provided.

SamitHuang and others added 30 commits March 2, 2026 16:32
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: samithuang <285365963@qq.com>
Co-authored-by: Copilot <copilot@github.com>
Add rollout backend client and test qwen2.5-0.5b non-colocate training
Signed-off-by: samithuang <285365963@qq.com>
Reorder weight synchronization support for colocate and non-colocate scenarios in the goal plan.
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
Co-authored-by: coding-famer <chenhegu0109@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
)

Co-authored-by: benyi <huangliangmeng.hlm@alibaba-inc.com>
zhuzilin and others added 26 commits April 25, 2026 01:29
…#1862)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
THUDM#1883)

Co-authored-by: yeqinghe <yeqinghe@MacBook-Pro-6.local>
Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>
…site (THUDM#1902)

Co-authored-by: jingshenghang <shenghang.jing@aminer.cn>
)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for NVIDIA GB10 and NPU backends, refactors the rollout system to integrate vLLM, and optimizes loss computation for context-parallel training. It also includes extensive documentation for role-based training overrides and advanced engine deployments. Feedback points to a potential TypeError when using non-reentrant checkpointing, a possible crash in advantage estimation if all log-probability sources are null, and an opportunity to reduce memory pressure by avoiding global tensor allocations during token redistribution.


if args.recompute_loss_function:
loss, log = checkpoint(func, args, batch, logits, sum_of_sample_mean)
loss, log = checkpoint(func, args, batch, logits, sum_of_sample_mean, use_reentrant=False)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The checkpoint function from megatron.core.tensor_parallel (which is typically what is used here) does not support use_reentrant as a keyword argument in many versions of Megatron-LM. Passing it here may cause a TypeError. If you are using a custom wrapper or a specific version that supports it, please ensure compatibility. Otherwise, it is safer to pass arguments positionally or verify the signature of the imported checkpoint function.

Comment on lines +614 to 615
xs = log_probs or rollout_log_probs or values
kl = [torch.zeros_like(x, dtype=torch.float32, device=x.device) for x in xs]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If log_probs, rollout_log_probs, and values are all None, xs will be None, and the list comprehension on the next line will raise a TypeError. While there is an early return check at the beginning of compute_advantages_and_returns, it only checks log_probs and values. If args.use_rollout_logprobs is true and rollout_log_probs is available but log_probs and values are None, this path might still be reached. Consider adding an explicit check for xs or ensuring the early return covers all cases.

References
  1. Enforce defensive programming: ensure appropriate null/nil/None checks exist before object property accesses or iterations.

Comment on lines +231 to +240
def _build_shifted_tokens(
T: int,
device: torch.device,
unconcat_tokens: list[torch.Tensor],
total_lengths: list[int],
response_lengths: list[int],
qkv_format: str,
max_seq_lens: list[int] | None,
allgather_cp: bool,
) -> torch.Tensor:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When allgather_cp is enabled, T_global represents the total sequence length across all CP ranks. Allocating full_tokens with T_global on every rank may lead to high memory pressure or OOM if the global sequence length is extremely large (e.g., millions of tokens), even though it only stores int64. Since this is immediately sliced to the local chunk size T, consider if there is a way to construct only the local chunk of shifted tokens directly to save memory.

@SamitHuang SamitHuang merged commit 2cb7c80 into SamitHuang:main May 17, 2026
SamitHuang added a commit that referenced this pull request May 17, 2026
Revert "Rebase to latest slime (#10)"

This reverts commit 2cb7c80.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.