feat: Add merged vLLM rollout weights#631

Merged
vivekkalyan merged 7 commits into main from feat/merged-inference on Mar 25, 2026

Conversation


@vivekkalyan vivekkalyan commented Mar 25, 2026

Enable ART to serve merged LoRA weights through a dedicated vLLM server, so that Qwen3.5-MoE training works on the current vLLM build.

Changes

  • Add rollout_weights_mode: "lora" | "merged" with dedicated-only validation and a merged-mode requirement for Qwen/Qwen3.5-35B-A3B and Qwen/Qwen3.5-397B-A17B
  • Push merged weights into dedicated vLLM with native weight transfer while keeping LoRA checkpoints for training and persistence
  • Update dedicated server wiring, validation, and Qwen3.5 smoke scripts/tests for the merged-inference path
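The first bullet's validation rules can be sketched as a small check. This is a hypothetical sketch, not the PR's actual code: `validate_rollout_weights_mode` and its signature are invented for illustration; only the model names and the two mode values come from the PR description.

```python
# Sketch of the rollout_weights_mode rules described above (illustrative only;
# the function name and signature are assumptions, not ART's real API).
QWEN3_5_MOE_MODELS = {
    "Qwen/Qwen3.5-35B-A3B",
    "Qwen/Qwen3.5-397B-A17B",
}


def validate_rollout_weights_mode(
    base_model: str,
    rollout_weights_mode: str,  # "lora" | "merged"
    dedicated_server: bool,
) -> None:
    if rollout_weights_mode not in ("lora", "merged"):
        raise ValueError(f"unknown rollout_weights_mode: {rollout_weights_mode!r}")
    # Merged-weight pushing is only wired up for the dedicated vLLM server.
    if rollout_weights_mode == "merged" and not dedicated_server:
        raise ValueError("rollout_weights_mode='merged' requires a dedicated vLLM server")
    # Qwen3.5 MoE models cannot serve raw LoRA adapters on the current vLLM build.
    if base_model in QWEN3_5_MOE_MODELS and rollout_weights_mode != "merged":
        raise ValueError(f"{base_model} requires rollout_weights_mode='merged'")
```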

@vivekkalyan vivekkalyan requested a review from bradhilton March 25, 2026 17:50

QWEN3_5_MOE_MODELS = {
    "Qwen/Qwen3.5-35B-A3B",
    "Qwen/Qwen3.5-397B-A17B",
}
Collaborator


Are we able to support training for 397B-A17B?

Collaborator Author


Not tested yet, but it should work with Megatron. We need to add the merging logic to our Megatron service as well.

)
response.raise_for_status()

peft_model.merge_adapter()
Collaborator


Would this be more intuitive if we performed the merging in _merged_checkpoint_weights?

Collaborator Author


I prefer to leave _merged_checkpoint_weights to only collect the checkpoint-format weights and normalize the names into the surface vLLM expects.

This keeps the pause, merge, send-weights, unmerge, resume sequence together in _sync_merged_weights, which makes the flow clearer to follow.

I renamed the function to _merged_checkpoint_weights_for_vllm to make it a little clearer that it's just a transformation function.
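The pause → merge → push → unmerge → resume flow described above could be sketched roughly as follows. The merge_adapter()/unmerge_adapter() calls are real PEFT methods, but the server client, its method names, and the key-normalization helper here are illustrative stand-ins, not ART's actual implementation:

```python
def sync_merged_weights(peft_model, server) -> None:
    """Push merged LoRA weights into a dedicated vLLM server (sketch).

    `peft_model` is expected to expose merge_adapter()/unmerge_adapter()
    (as peft.PeftModel does) plus state_dict(); `server` is a hypothetical
    stand-in for the dedicated-server client.
    """
    server.pause_generation()       # stop rollouts while weights change
    peft_model.merge_adapter()      # fold LoRA deltas into the base weights
    try:
        # Normalize checkpoint-format names into the surface vLLM expects.
        weights = merged_checkpoint_weights_for_vllm(peft_model.state_dict())
        server.load_weights(weights)  # native (non-LoRA) weight transfer
    finally:
        peft_model.unmerge_adapter()  # restore the trainable LoRA view
    server.resume_generation()


def merged_checkpoint_weights_for_vllm(state_dict):
    # Illustrative name normalization only; the real mapping is model-specific.
    return {
        name.replace("base_model.model.", ""): tensor
        for name, tensor in state_dict.items()
        if "lora_" not in name  # merged weights only; drop adapter tensors
    }
```

The try/finally guard keeps the model trainable even if the weight transfer fails, which is one reason to keep the whole sequence in a single function.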

Collaborator


We'll probably need to configure separate target_modules for the Qwen3.5 MoE models.
This is the config I've been using when training Qwen3.5, but it's worth double-checking against the latest Unsloth version.

target_modules=[
    # Full attention layers (25% of layers)
    "q_proj", "k_proj", "v_proj", "o_proj",
    # DeltaNet linear attention layers (75% of layers)
    "in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a", "out_proj",
    # MLP (all layers)
    "gate_proj", "up_proj", "down_proj",
    # MoE shared expert gate (only present in MoE models, ignored for dense)
    "shared_expert_gate",
],

@vivekkalyan vivekkalyan merged commit fb26124 into main Mar 25, 2026
5 checks passed
@vivekkalyan vivekkalyan deleted the feat/merged-inference branch March 25, 2026 23:51