[Feature] Support sharing MTP weights. #1672
Conversation
Refactor MTP configuration to support weight sharing across layers. Updated MoE and MTPBlock classes to handle shared weights and adjusted layer initialization accordingly. Added share_weights parameter to MTPConfig for better control over layer behavior.
…computed as necessary.
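A minimal sketch of what the weight-sharing construction could look like. Only `share_weights`, `num_layers`, `MTPConfig`, and `MTPBlock` come from the PR description; the remaining fields, the block body, and the builder function are illustrative assumptions:

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class MTPConfig:
    num_layers: int              # logical number of MTP prediction steps
    hidden_size: int = 1024      # illustrative field, not necessarily in the real config
    share_weights: bool = False  # new flag: reuse one physical block for every step


class MTPBlock(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # Placeholder body; the real block wraps attention/MoE sublayers.
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        return self.proj(x)


def build_mtp_layers(cfg: MTPConfig) -> nn.ModuleList:
    """Build one shared block, or num_layers independent blocks."""
    if cfg.share_weights:
        # A single physical block is reused for every logical step.
        return nn.ModuleList([MTPBlock(cfg.hidden_size)])
    return nn.ModuleList([MTPBlock(cfg.hidden_size) for _ in range(cfg.num_layers)])
```

With `share_weights=True` the module list holds a single block that every prediction step reuses, so the parameter count stays flat as `num_layers` grows.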
@claude review
```python
super().__init__()
if not mtp_layers:
    raise ValueError("mtp_layers cannot be empty")
```
Claude: Nit: `self.num_layers` is now semantically ambiguous. It stores the physical layer count (`len(mtp_layers)`), but the forward loop iterates `self.mtp_config.num_layers` times (the logical step count). When `share_weights=True`, these differ (1 vs. N).
If `self.num_layers` is not used externally, consider removing it or renaming it to `self.num_physical_layers` to avoid confusion. Alternatively, you could update it to always reflect the logical count:

```python
self.num_layers = mtp_config.num_layers
```
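To make the physical/logical distinction concrete, here is a minimal sketch of how the constructor and forward loop could fit together; the class name, argument names, and modulo indexing are assumptions for illustration, not the PR's actual code:

```python
import torch
import torch.nn as nn


class MTP(nn.Module):
    def __init__(self, mtp_config, mtp_layers: nn.ModuleList):
        super().__init__()
        if not mtp_layers:
            raise ValueError("mtp_layers cannot be empty")
        self.mtp_config = mtp_config
        self.mtp_layers = mtp_layers
        # Physical count: 1 when share_weights=True, otherwise num_layers.
        self.num_physical_layers = len(mtp_layers)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Iterate the logical step count; with shared weights every step
        # resolves to the same single physical block.
        for step in range(self.mtp_config.num_layers):
            layer = self.mtp_layers[step % self.num_physical_layers]
            hidden_states = layer(hidden_states)
        return hidden_states
```

Under this sketch, renaming the attribute to `num_physical_layers` keeps it truthful while the loop bound stays on the config's logical `num_layers`.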
Claude:

## Summary
Adds support for sharing MTP weights across layers via a `share_weights` parameter on `MTPConfig`.

## Issues
- **Warning**: …
- **Nit**: `self.num_layers` is semantically ambiguous (see the inline comment above).

## Verdict
COMMENT — the logic is sound overall. The main thing to address is the missing …