refactor(mtp): extract BaseMTPModel mixin shared by existing MTP draft models #1337
Consolidate the duplicated draft-model wiring across the deepseek / mistral / glm4_moe_lite / qwen3_moe MTP models into a single BaseMTPModel mixin: draft models reuse the main model's req/mem managers and rope caches, pop the main_model / previous-draft-models kwargs before the base __init__, and carry the is_mtp_draft_model marker. Behaviour-preserving (+8/-99 across the four models); no new dependencies. Includes a unit test for the mixin wiring.
Code Review
This pull request introduces a new BaseMTPModel mixin class to consolidate duplicated initialization logic, manager sharing, and ROPE cache reuse across multiple MTP draft models (DeepSeek, GLM4, Mistral, and Qwen3). The review feedback correctly identifies a critical issue with the cooperative multiple inheritance (MRO) chain in BaseMTPModel._init_custom, which overrides the method without calling super()._init_custom(). Suggestions are provided to call super()._init_custom() and to update the unit tests using a dummy subclass to properly test this cooperative inheritance behavior.
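To make the review's point concrete, here is a minimal sketch of the cooperative-inheritance concern. All class names are hypothetical stand-ins, not the actual lightllm classes; only the `_init_custom` hook name comes from the review. It shows that a mixin placed first in the MRO silently drops the base hook unless it calls `super()._init_custom()`, and how a dummy subclass can exercise that behaviour in a test.

```python
class ConcreteBase:
    """Hypothetical stand-in for a concrete model base class."""

    def __init__(self):
        self.calls = []
        self._init_custom()

    def _init_custom(self):
        # Base-class setup that a mixin may or may not want to keep.
        self.calls.append("base")


class NonCooperativeMixin:
    def _init_custom(self):
        # Overrides the hook without calling super(): the chain stops here.
        self.calls.append("mixin")


class CooperativeMixin:
    def _init_custom(self):
        # Runs the next _init_custom in the MRO first, then the mixin's own setup.
        super()._init_custom()
        self.calls.append("mixin")


# Dummy subclasses: the mixin comes before the concrete base, as in the PR.
class BrokenDraft(NonCooperativeMixin, ConcreteBase):
    pass


class FixedDraft(CooperativeMixin, ConcreteBase):
    pass


broken_calls = BrokenDraft().calls
fixed_calls = FixedDraft().calls
```

Whether the base hook should run at all depends on what the real base `_init_custom` does (for example, if it rebuilds caches the draft model is meant to share), so this is an illustration of the mechanism rather than a recommendation for the actual fix.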
What
Extracts a small BaseMTPModel mixin that consolidates the draft-model wiring duplicated across the existing MTP models: deepseek_mtp, mistral_mtp, glm4_moe_lite_mtp, and qwen3_moe_mtp.
Each of those draft models repeated the same setup: reuse the main model's req/mem managers and rope caches, pop the main_model / mtp_previous_draft_models kwargs before the base __init__, and mark themselves as draft models. This moves that shared logic into one place.
Why
A behaviour-preserving cleanup (+8/−99 across the four models) that removes duplication and gives a single home for the is_mtp_draft_model marker. It is a preparatory refactor split out of a larger Qwen3.5 / Qwen3.5-MoE MTP change so the follow-up feature PR stays focused on the new capability.
Changes
- lightllm/models/base_mtp_model.py: the BaseMTPModel mixin, plus a unit test for its wiring.
- The deepseek_mtp / mistral_mtp / glm4_moe_lite_mtp / qwen3_moe_mtp model classes now mix in BaseMTPModel (mixed in before the concrete base model so the shared overrides win via MRO).
Testing
- pre-commit (black 21.12b0 + flake8 6.1.0) clean.
- unit_tests/models/test_base_mtp_model_mixin.py covers the mixin's kwargs-popping, manager/rope sharing, and the is_mtp_draft_model marker.
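For readers unfamiliar with the pattern, the wiring described above can be sketched roughly as follows. This is not the code from the PR: `FakeMainModel`, `FakeBaseModel`, `SketchMTPMixin`, the `_init_managers` hook, the `rope_cache` attribute, and the `weight_dir` key are all hypothetical names chosen for illustration. Only the `main_model` / `mtp_previous_draft_models` kwargs and the `is_mtp_draft_model` marker are taken from the description.

```python
class FakeMainModel:
    """Hypothetical stand-in for an already-initialised main model."""

    def __init__(self):
        self.req_manager = object()
        self.mem_manager = object()
        self.rope_cache = object()


class FakeBaseModel:
    """Hypothetical concrete base; it should never see the MTP-only kwargs."""

    def __init__(self, kvargs):
        self.kvargs = kvargs
        self._init_managers()

    def _init_managers(self):
        # A normal (non-draft) model would allocate its own resources here.
        self.req_manager = object()
        self.mem_manager = object()
        self.rope_cache = object()


class SketchMTPMixin:
    # Class-level marker so callers can tell draft models apart.
    is_mtp_draft_model = True

    def __init__(self, kvargs):
        kvargs = dict(kvargs)  # copy, so the caller's dict is left untouched
        # Pop the MTP-only entries before the base __init__ runs.
        self.main_model = kvargs.pop("main_model")
        self.mtp_previous_draft_models = kvargs.pop("mtp_previous_draft_models", [])
        super().__init__(kvargs)

    def _init_managers(self):
        # Reuse the main model's managers and rope cache instead of allocating new ones.
        self.req_manager = self.main_model.req_manager
        self.mem_manager = self.main_model.mem_manager
        self.rope_cache = self.main_model.rope_cache


# The mixin is listed before the concrete base so its overrides win via MRO.
class SketchDraftModel(SketchMTPMixin, FakeBaseModel):
    pass


main = FakeMainModel()
draft = SketchDraftModel(
    {"main_model": main, "mtp_previous_draft_models": [], "weight_dir": "/tmp/x"}
)
```

In this sketch the draft model ends up holding the same manager objects as the main model, and the base `__init__` only receives the kwargs it already understood before the refactor.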