feat: Add merged vLLM rollout weights #631
Conversation
```python
QWEN3_5_MOE_MODELS = {
    "Qwen/Qwen3.5-35B-A3B",
    "Qwen/Qwen3.5-397B-A17B",
```
Are we able to support training for 397B-A17B?
Not tested for now, but it should work with Megatron. We need to add the merging logic to our Megatron service as well.
```python
)
response.raise_for_status()

peft_model.merge_adapter()
```
Would it be more intuitive to perform the merging in _merged_checkpoint_weights?
I prefer to leave _merged_checkpoint_weights to only collect the checkpoint-format weights and normalize the names into the surface vLLM expects.
This keeps the pause, merge, send-weights, unmerge, and resume steps part of the same flow in _sync_merged_weights, which makes the flow clearer to follow.
I renamed the function to _merged_checkpoint_weights_for_vllm to make it a little clearer that it's just a transformation function.
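For readers less familiar with the merge step being discussed: folding a LoRA adapter into the base weight is just `W + (alpha / r) * (B @ A)`, after which the merged matrix can be served as ordinary dense weights. A minimal plain-Python sketch of the math (illustrative only, not the PR's code):

```python
# Sketch of LoRA adapter merging: fold the scaled low-rank update
# B @ A into the base weight W. Uses small list-of-list matrices so
# the arithmetic is easy to follow; real code would use tensors.

def merge_lora(w, a, b, alpha, r):
    """Return w + (alpha / r) * (b @ a)."""
    scale = alpha / r
    rows, cols = len(w), len(w[0])
    merged = [row[:] for row in w]
    for i in range(rows):
        for j in range(cols):
            update = sum(b[i][k] * a[k][j] for k in range(r))
            merged[i][j] += scale * update
    return merged

# 2x2 base weight with a rank-1 adapter
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]           # r x cols
b = [[0.5], [0.25]]        # rows x r
print(merge_lora(w, a, b, alpha=2, r=1))  # → [[2.0, 2.0], [0.5, 2.0]]
```

Unmerging (PEFT's unmerge_adapter) subtracts the same scaled update back out, which is why the pause/merge/send/unmerge/resume sequence leaves the trainable adapter untouched.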
src/art/dev/get_model_config.py
Outdated
We'll probably need to configure separate target_modules for the Qwen3.5 MoE models.
This is the config I've been using when training Qwen3.5, but it's worth double-checking against the latest Unsloth version.
```python
target_modules=[
    # Full attention layers (25% of layers)
    "q_proj", "k_proj", "v_proj", "o_proj",
    # DeltaNet linear attention layers (75% of layers)
    "in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a", "out_proj",
    # MLP (all layers)
    "gate_proj", "up_proj", "down_proj",
    # MoE shared expert gate (only present in MoE models, ignored for dense)
    "shared_expert_gate",
],
```
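Since module names can drift between Unsloth releases, one quick way to do that double-check is to scan a loaded model's named modules for each suggested name. A sketch (the helper name and sample module paths are illustrative, not part of this PR):

```python
# Sketch: report which suggested target_modules never appear as a leaf
# module name in the model. In practice `named_modules` would come from
# dict(model.named_modules()).keys() on a loaded checkpoint.
def missing_target_modules(named_modules, target_modules):
    present = set()
    for name in named_modules:
        leaf = name.rsplit(".", 1)[-1]
        if leaf in target_modules:
            present.add(leaf)
    return sorted(set(target_modules) - present)

# Illustrative module paths, not a real Qwen3.5 layout
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
]
print(missing_target_modules(names, ["q_proj", "gate_proj", "shared_expert_gate"]))
# → ['shared_expert_gate']
```

Any name in the returned list either doesn't exist in that checkpoint or is spelled differently in the current release.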
Enable ART to serve merged LoRA weights through a dedicated vLLM server so Qwen3.5-MoE training works on the current vLLM build.
Changes
`rollout_weights_mode: "lora" | "merged"` with dedicated-only validation and a merged-mode requirement for `Qwen/Qwen3.5-35B-A3B` and `Qwen/Qwen3.5-397B-A17B`
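A rough sketch of what that validation could look like (the model set matches the diff above; the function name and error messages are illustrative, not the PR's actual code):

```python
# Sketch of the merged-mode requirement: Qwen3.5 MoE models must use
# rollout_weights_mode="merged". Model names come from the PR; the
# validation function itself is illustrative.
QWEN3_5_MOE_MODELS = {
    "Qwen/Qwen3.5-35B-A3B",
    "Qwen/Qwen3.5-397B-A17B",
}

def validate_rollout_weights_mode(model: str, mode: str) -> str:
    if mode not in ("lora", "merged"):
        raise ValueError(f"unknown rollout_weights_mode: {mode!r}")
    if model in QWEN3_5_MOE_MODELS and mode != "merged":
        raise ValueError(f"{model} requires rollout_weights_mode='merged'")
    return mode

print(validate_rollout_weights_mode("Qwen/Qwen3.5-35B-A3B", "merged"))
# → merged
```

Gating this at config-construction time surfaces the incompatibility before any vLLM server is launched, rather than failing mid-rollout.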