Rebase to latest slime by knlnguyen1802 · Pull Request #10 · SamitHuang/slime

knlnguyen1802 · 2026-05-17T13:42:34Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces support for NVIDIA GB10 and NPU backends, refactors the rollout system to integrate vLLM, and optimizes loss computation for context-parallel training. It also includes extensive documentation for role-based training overrides and advanced engine deployments. Feedback points to a potential TypeError when using non-reentrant checkpointing, a possible crash in advantage estimation if all log-probability sources are null, and an opportunity to reduce memory pressure by avoiding global tensor allocations during token redistribution.

gemini-code-assist · 2026-05-17T13:48:55Z


    if args.recompute_loss_function:
-        loss, log = checkpoint(func, args, batch, logits, sum_of_sample_mean)
+        loss, log = checkpoint(func, args, batch, logits, sum_of_sample_mean, use_reentrant=False)


The checkpoint function from megatron.core.tensor_parallel (which is typically what is used here) does not support use_reentrant as a keyword argument in many versions of Megatron-LM. Passing it here may cause a TypeError. If you are using a custom wrapper or a specific version that supports it, please ensure compatibility. Otherwise, it is safer to pass arguments positionally or verify the signature of the imported checkpoint function.
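One way to act on the "verify the signature" suggestion is to inspect the callable before forwarding the keyword. The sketch below is a minimal illustration, not code from this PR. `call_with_supported_kwargs` is a hypothetical helper, and the two `checkpoint_*` functions are stand-ins with assumed shapes, not the real Megatron-LM or PyTorch APIs.

```python
import inspect


def call_with_supported_kwargs(fn, *args, **kwargs):
    """Call fn, dropping keyword arguments its signature does not accept.

    Hypothetical helper for illustration; not part of Megatron-LM or slime.
    """
    params = inspect.signature(fn).parameters
    # A **kwargs parameter means fn accepts arbitrary keywords.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    supported = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in keyword_kinds
    }
    return fn(*args, **supported)


# Stand-ins for two checkpoint variants (assumed shapes, not real APIs).
def checkpoint_without_flag(func, *args):
    return func(*args)


def checkpoint_with_flag(func, *args, use_reentrant=True):
    return func(*args), use_reentrant
```

Silently dropping `use_reentrant` changes which checkpointing mode actually runs, so in real code it may be preferable to log a warning or fail loudly rather than filter the argument quietly.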

gemini-code-assist · 2026-05-17T13:48:55Z

+        xs = log_probs or rollout_log_probs or values
        kl = [torch.zeros_like(x, dtype=torch.float32, device=x.device) for x in xs]


If log_probs, rollout_log_probs, and values are all None, xs will be None, and the list comprehension on the next line will raise a TypeError. While there is an early return check at the beginning of compute_advantages_and_returns, it only checks log_probs and values. If args.use_rollout_logprobs is true and rollout_log_probs is available but log_probs and values are None, this path might still be reached. Consider adding an explicit check for xs or ensuring the early return covers all cases.

References

Enforce defensive programming: ensure appropriate null/nil/None checks exist before object property accesses or iterations.
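The explicit check suggested above could look roughly like the following sketch. `first_available` is a hypothetical helper name, and plain Python lists stand in for the tensor lists. It differs from the `a or b or c` chain in that it tests `is not None`, so an empty list is still treated as a present source; whether that is the desired behaviour here is an assumption.

```python
def first_available(**candidates):
    """Return (name, value) for the first candidate that is not None.

    Hypothetical helper for illustration. Raises ValueError when every
    candidate is None, instead of failing later with a TypeError.
    """
    for name, value in candidates.items():
        if value is not None:
            return name, value
    raise ValueError("all sources are None: " + ", ".join(candidates))


# Example: only rollout_log_probs is present.
name, xs = first_available(
    log_probs=None,
    rollout_log_probs=[[0.1, 0.2]],
    values=None,
)
```

Raising a `ValueError` that names the missing sources makes the failure point obvious; an early return would be the alternative if the caller is expected to tolerate the all-`None` case.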

gemini-code-assist · 2026-05-17T13:48:55Z

+def _build_shifted_tokens(
+    T: int,
+    device: torch.device,
+    unconcat_tokens: list[torch.Tensor],
+    total_lengths: list[int],
+    response_lengths: list[int],
+    qkv_format: str,
+    max_seq_lens: list[int] | None,
+    allgather_cp: bool,
+) -> torch.Tensor:


When allgather_cp is enabled, T_global represents the total sequence length across all CP ranks. Allocating full_tokens with T_global on every rank may lead to high memory pressure or OOM if the global sequence length is extremely large (e.g., millions of tokens), even though it only stores int64. Since this is immediately sliced to the local chunk size T, consider if there is a way to construct only the local chunk of shifted tokens directly to save memory.
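To make the suggestion concrete, here is a minimal sketch of building only the local window of shifted tokens. It rests on assumptions not confirmed by the PR: samples are laid out back to back in the global sequence, each rank owns a contiguous window `[start, start + T)`, the shifted token at position `i` of a sample is that sample's token `i + 1`, and the last position of each sample is padded. `local_shifted_tokens` is a hypothetical name, and plain lists stand in for tensors.

```python
def local_shifted_tokens(samples, start, T, pad=0):
    """Return shifted tokens for global positions [start, start + T).

    samples: per-sample token lists, laid out back to back globally
    (assumed layout). Only O(T) memory is allocated for the output.
    """
    out = [pad] * T
    offset = 0  # global position where the current sample begins
    for tokens in samples:
        n = len(tokens)
        # Overlap of this sample's span [offset, offset + n) with the window.
        lo = max(offset, start)
        hi = min(offset + n, start + T)
        for g in range(lo, hi):
            i = g - offset
            if i + 1 < n:  # last position of a sample keeps the pad value
                out[g - start] = tokens[i + 1]
        offset += n
        if offset >= start + T:
            break  # remaining samples lie entirely past the window
    return out


samples = [[1, 2, 3], [4, 5, 6, 7]]
full = local_shifted_tokens(samples, start=0, T=7)
chunk = local_shifted_tokens(samples, start=2, T=3)
```

In the actual code the window layout depends on `qkv_format` and the context-parallel partitioning, so the index arithmetic would need to follow whatever slicing is applied to the global buffer today.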

SamitHuang and others added 30 commits March 2, 2026 16:32

temp save rfc

08ce80b

Signed-off-by: SamitHuang <285365963@qq.com>

add plan

3af806e

Signed-off-by: SamitHuang <285365963@qq.com>

update

48fbde3

Signed-off-by: SamitHuang <285365963@qq.com>

[docker] remove true on policy patches (THUDM#1661)

d78bb43

Co-authored-by: Copilot <copilot@github.com>

[fix]: Qwen3.5-35B-A3B 8-GPU: set TP size to 2 for num_query_groups=2 (…

0104a9e

…THUDM#1662)

Remove FSDP support (THUDM#1664)

e4faf63

Co-authored-by: Copilot <copilot@github.com>

docs: add OpenClaw-RL to projects built upon slime (THUDM#1635)

988c45a

qwen2.5 0.5b non-colocate (first attempt ok, but nccl error later)

f8ceed6

Signed-off-by: samithuang <285365963@qq.com>

add convert script

2caa4a0

add setup doc

8caa8ba

Support setting update weights in sglang_config (THUDM#1665)

de84e10

Co-authored-by: Copilot <copilot@github.com>

fix nccl error by NcclBridge subprocess

25ee005

Add rollout backend client and test qwen2.5-0.5b non-colocate training

09f534a

Add rollout backend client and test qwen2.5-0.5b non-colocate training

eliminate gpu to cpu weight transfer

ab7eb0b

Signed-off-by: samithuang <285365963@qq.com>

Eliminate intermediate CPU tensors for faster weight transfer

411e2d2

Eliminate intermediate CPU tensors

Revise weight synchronization strategy in goal plan

546d2ad

Reorder weight synchronization support for colocate and non-colocate scenarios in the goal plan.

[fix] Fix numerical accuracy issue in dynamic sampling filter (THUDM#…

dd6888d

…1674)

sync from internal (THUDM#1677)

2cd28d6

Co-authored-by: Copilot <copilot@github.com>

bugfixes from community (THUDM#1678)

9268231

Co-authored-by: Copilot <copilot@github.com> Co-authored-by: yueming-yuan <yym022502@gmail.com> Co-authored-by: coding-famer <chenhegu0109@gmail.com>

Fix: pass return_tensors in text_kwargs for transformers>=5.0.0 compa…

450e9f3

…tibility (THUDM#1648)

Fix missing packed_seq_params in bshd qkv_format (THUDM#1649)

ce204d1

[Multimodal][Model] Qwen3.5 VL training example/support (THUDM#1676)

241c75b

update docs (THUDM#1680)

ceea3b0

Co-authored-by: Copilot <copilot@github.com>

update docs (THUDM#1681)

b2c5fc7

Co-authored-by: Copilot <copilot@github.com>

support offloading non-updatable server (THUDM#1668)

1018387

Co-authored-by: Copilot <copilot@github.com>

bugfix (THUDM#1685)

29e1487

Co-authored-by: Copilot <copilot@github.com>

fix: handle Qwen3.5 in quantize_params_fp8 (THUDM#1683)

22915a3

bugfix (THUDM#1687)

80d4bc3

Co-authored-by: Copilot <copilot@github.com>

Fix Qwen3.5 & Qwen3-Next linear attention cu_seqlens missing (THUDM#1686

1d31a49

) Co-authored-by: benyi <huangliangmeng.hlm@alibaba-inc.com>

fix: use semantic version comparison for PyTorch >= 2.6 detection (TH…

a4492dd

…UDM#1667)

zhuzilin and others added 26 commits April 25, 2026 01:29

[docker] update v0.5.9 patch

3cdec0c

Rename critic config to megatron config (THUDM#1866)

f65c6e8

[Fix] Use Ray ObjectRef await instead of asyncio.to_thread in distrib…

5d41cf7

…uted POST (THUDM#1873)

chore: include length context in slice_log_prob_with_cp assert (THUDM…

27b899f

…#1862) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[docker] upgrade megatron to 1dcf0dafa (THUDM#1867)

f3e7bd7

fix ppo value head load bugs (THUDM#1878)

9a022d8

[docker] upgrade sglang to v0.5.10.post1 (THUDM#1874)

f588add

[docs] update docs

07beb18

[docker] update megatron-bridge and add qwen3.6 tests (THUDM#1884)

fe1152b

fix lint

9b50665

Fix(checkpoint): add resume/pause in save_model() for offload_train (f…

16924b6

…ixes THUDM#1886) (THUDM#1888)

fix ppo value offload bugs (THUDM#1882)

1027409

fix qwen3.6 hf config validation bug (THUDM#1889)

3e0a3ca

Add missing metrics to log (THUDM#1890)

4cacab3

fix(qwen3_next): use torch.get_default_dtype() — get_current_dtype do… (

04059e5

THUDM#1883) Co-authored-by: yeqinghe <yeqinghe@MacBook-Pro-6.local>

Fix location error in install script (THUDM#1877)

477541d

Only allow --allgather-cp for DSA model (THUDM#1891)

82007fa

Migrate internal feature (THUDM#1897)

bf9b1a3

[Fix] Fix distributed POST actor concurrency split (THUDM#1880)

8ef1fb4

Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>

Fix CI: update rollout_data_postprocess plugin contract for new call …

c8aaf01

…site (THUDM#1902) Co-authored-by: jingshenghang <shenghang.jing@aminer.cn>

Patch Megatron TP grad coalesce to chunked all-reduce (THUDM#1899)

5b326e6

fix: harden retool rollout against multi-turn / retry desync (THUDM#1861

41dc3b6

) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fix log file

a7a3ee1

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

Rebase

e7dbc8d

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

Fix import engine group

b015d72

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

Fix rebase code

26f979d

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

gemini-code-assist Bot reviewed May 17, 2026

SamitHuang merged commit 2cb7c80 into SamitHuang:main May 17, 2026

SamitHuang mentioned this pull request May 17, 2026

Revert "Rebase to latest slime" #11

Merged

SamitHuang added a commit that referenced this pull request May 17, 2026

Revert "Rebase to latest slime" (#11)

05f6cf9

Revert "Rebase to latest slime (#10)" This reverts commit 2cb7c80.

SamitHuang merged 158 commits into SamitHuang:main from knlnguyen1802:rebase-vllm

20 participants
