Rebase to latest slime #12

Merged
SamitHuang merged 42 commits into SamitHuang:rebase from knlnguyen1802:rebase-vllm
May 18, 2026
@knlnguyen1802
Collaborator

No description provided.

SamitHuang and others added 30 commits March 2, 2026 16:32
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: samithuang <285365963@qq.com>
Add rollout backend client and test qwen2.5-0.5b non-colocate training
Signed-off-by: samithuang <285365963@qq.com>
Reorder weight synchronization support for colocate and non-colocate scenarios in the goal plan.
* Draft router design

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Add vllm router

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Add router to script

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix gpu memory utilization

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix output token ids

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Add more nccl flag

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix bug

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

---------

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
knlnguyen1802 and others added 12 commits April 22, 2026 14:46
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
…site (THUDM#1902)

Co-authored-by: jingshenghang <shenghang.jing@aminer.cn>
)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces vLLM as a first-class rollout backend for Slime, refactoring the codebase to support backend-agnostic requests and responses through new adapter interfaces. Significant additions include a managed vLLM engine lifecycle, a translation sidecar for protocol compatibility, specialized weight synchronization mechanisms for both colocated and distributed deployments, and a Megatron patch for chunked gradient coalescence. Review feedback highlights the need to restore configuration validation logic and remove an unconditional override of the rollout offloading setting. Furthermore, the reviewer suggests replacing hardcoded file paths in the new shell scripts with variables to enhance environment portability.

Comment on lines +135 to +140
# if hasattr(hf_config, hf_config_name) and hasattr(args, megatron_config_name):
#     if not compare_fn(getattr(hf_config, hf_config_name), getattr(args, megatron_config_name)):
#         errors.append(
#             f"{hf_config_name} in hf config {getattr(hf_config, hf_config_name)} is not equal to "
#             f"{megatron_config_name} {getattr(args, megatron_config_name)}, please check the config."
#         )

high

The validation logic comparing Hugging Face config values with Megatron arguments has been commented out. This is risky as it can lead to silent misconfigurations that are hard to debug at runtime. If the validation was causing issues, it should be fixed or adapted rather than completely disabled. Please consider restoring this validation to ensure configuration correctness.
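A minimal sketch of what the restored check could look like, reusing the variable names from the commented-out lines above. The function name `validate_hf_against_megatron`, the shape of the `checks` list, and the sample field values are assumptions made for illustration; they are not taken from the PR.

```python
from types import SimpleNamespace


def validate_hf_against_megatron(hf_config, args, checks):
    """Return a list of mismatch messages; an empty list means the configs agree.

    `checks` is a hypothetical list of
    (hf_config_name, megatron_config_name, compare_fn) tuples.
    """
    errors = []
    for hf_config_name, megatron_config_name, compare_fn in checks:
        # Skip fields that either side does not define, as the original guard did.
        if not (hasattr(hf_config, hf_config_name) and hasattr(args, megatron_config_name)):
            continue
        hf_value = getattr(hf_config, hf_config_name)
        megatron_value = getattr(args, megatron_config_name)
        if not compare_fn(hf_value, megatron_value):
            errors.append(
                f"{hf_config_name} in hf config {hf_value} is not equal to "
                f"{megatron_config_name} {megatron_value}, please check the config."
            )
    return errors


# Illustrative values only; the field names below are assumptions.
hf_config = SimpleNamespace(hidden_size=896, num_attention_heads=14)
args = SimpleNamespace(hidden_size=896, num_attention_heads=16)
checks = [
    ("hidden_size", "hidden_size", lambda a, b: a == b),
    ("num_attention_heads", "num_attention_heads", lambda a, b: a == b),
]
errors = validate_hf_against_megatron(hf_config, args, checks)
```

Collecting messages in a list, rather than raising on the first mismatch, lets the caller report every disagreement at once before aborting.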

Comment thread slime/utils/arguments.py
        args.offload_train = False
    if args.offload_rollout is None:
        args.offload_rollout = False
    args.offload_rollout = False

high

The offload_rollout argument is unconditionally set to False. This seems to override any user-provided value and disables the feature. If this is intentional, it should be documented. Otherwise, this line should be removed to respect the user's configuration.
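One way to read this suggestion, as a sketch: keep the `None` guard and drop the trailing assignment, so that a value the user set explicitly is preserved. The helper name `apply_offload_defaults` is hypothetical, and the `if args.offload_train is None` guard is an assumption, since the quoted snippet shows only the assignment.

```python
from types import SimpleNamespace


def apply_offload_defaults(args):
    """Fill in False only where the user left the option unset (None)."""
    # Assumed guard: the quoted diff shows only the assignment below it.
    if args.offload_train is None:
        args.offload_train = False
    if args.offload_rollout is None:
        args.offload_rollout = False
    # No unconditional assignment here, so an explicit True from the user survives.
    return args


unset = apply_offload_defaults(SimpleNamespace(offload_train=None, offload_rollout=None))
explicit = apply_offload_defaults(SimpleNamespace(offload_train=None, offload_rollout=True))
```

If forcing the value to `False` is in fact intended for the vLLM backend, a comment at that line stating why would address the other half of the review note.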

Comment thread convert_qwen2.5_ckpt.sh
Comment on lines +2 to +5
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
    ${MODEL_ARGS[@]} \
    --hf-checkpoint /root/Qwen2.5-0.5B-Instruct \
    --save /root/Qwen2.5-0.5B-Instruct_torch_dist/

medium

This script contains hardcoded paths (e.g., /root/Megatron-LM, /root/Qwen2.5-0.5B-Instruct). This reduces portability and makes it difficult for other users to run the script in different environments. Consider using environment variables or command-line arguments to specify these paths to improve reusability.
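As a rough illustration of the environment-variable approach, the sketch below resolves each path from the environment and falls back to the current hardcoded value. It is written in Python to stay in the repository's main language; the same idea carries over to `${VAR:-default}` expansions in the shell script itself. The variable names `MEGATRON_PATH`, `HF_CHECKPOINT`, and `SAVE_DIR` are hypothetical, and the `MODEL_ARGS` array from the script is left out for brevity.

```python
import os


def build_convert_command(env=None):
    """Build the conversion command with paths taken from the environment.

    Returns (command, child_env); defaults mirror the paths in the script.
    """
    env = os.environ if env is None else env
    megatron_path = env.get("MEGATRON_PATH", "/root/Megatron-LM")
    hf_checkpoint = env.get("HF_CHECKPOINT", "/root/Qwen2.5-0.5B-Instruct")
    # Derive the output directory from the checkpoint path unless overridden.
    save_dir = env.get("SAVE_DIR", f"{hf_checkpoint.rstrip('/')}_torch_dist/")
    command = [
        "python", "tools/convert_hf_to_torch_dist.py",
        "--hf-checkpoint", hf_checkpoint,
        "--save", save_dir,
    ]
    child_env = {"PYTHONPATH": megatron_path}
    return command, child_env


default_cmd, default_env = build_convert_command(env={})
custom_cmd, custom_env = build_convert_command(
    env={"HF_CHECKPOINT": "/data/models/Qwen2.5-0.5B-Instruct/", "MEGATRON_PATH": "/opt/Megatron-LM"}
)
```

With an empty environment the command matches the paths currently in the script, so existing setups keep working while other users can override each location.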

Comment on lines +23 to +24
--hf-checkpoint /root/Qwen2.5-0.5B-Instruct/
--ref-load /root/Qwen2.5-0.5B-Instruct_torch_dist/

medium

This script contains hardcoded paths (e.g., /root/Qwen2.5-0.5B-Instruct/, /root/gsm8k/...). This reduces portability and makes it difficult for other users to run the script in different environments. It's recommended to use environment variables or command-line arguments for these paths.

Comment thread run-qwen2.5-0.5B-vllm.sh
Comment on lines +25 to +26
--hf-checkpoint /root/Qwen2.5-0.5B-Instruct/
--ref-load /root/Qwen2.5-0.5B-Instruct_torch_dist/

medium

This script contains hardcoded paths (e.g., /root/Qwen2.5-0.5B-Instruct/, /root/gsm8k/...). This reduces portability and makes it difficult for other users to run the script in different environments. It's recommended to use environment variables or command-line arguments for these paths.

@knlnguyen1802
Collaborator Author

knlnguyen1802 commented May 18, 2026

Latest result

Colocated training
[Figure: colocated]
Reward: [Figure: colocated_reward]

Non-colocated training
[Figure: non_colocated]
Reward: [Figure: non_colocated_reward]

@SamitHuang SamitHuang merged commit 9146f9f into SamitHuang:rebase May 18, 2026
1 check failed
4 participants