
fix: fall back to HF for Mistral3 VLMs with non-Mistral4 text backbone#1557

Merged
akoumpa merged 1 commit into main from fix/mistral3-devstral-small-fallback
Mar 17, 2026

Conversation

@HuiyingLi
Contributor

Summary

  • The custom `Mistral3ForConditionalGeneration` model (added for Mistral4 MoE+MLA text backbones) was intercepting all models with that HF architecture, including Devstral-Small, which uses a dense Ministral3 text backbone, causing `AttributeError: 'Ministral3Config' object has no attribute 'moe_intermediate_size'`
  • Add a `supports_config` classmethod to `Mistral3ForConditionalGeneration` that returns `False` for non-Mistral4 text configs (see the sketch after this list)
  • Add `resolve_custom_model_cls` to `_ModelRegistry` that checks `supports_config` before returning a custom model class, falling back to HF's native implementation when unsupported (sketched after the commit message below)
  • 9 new unit tests covering `resolve_custom_model_cls` (5) and `supports_config` (4)
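As a rough illustration of the model-side hook, the opt-out might look like the sketch below. The class name and the Mistral4 check follow the PR description, but the config attributes (`text_config`, `model_type`) and the `"mistral4"` model-type string are assumptions for illustration, not the repository's actual code:

```python
# Hypothetical sketch only -- attribute names and the "mistral4"
# model_type string are assumptions, not the repo's actual code.
from transformers import PretrainedConfig


class Mistral3ForConditionalGeneration:
    """Custom implementation targeting Mistral4 MoE+MLA text backbones."""

    @classmethod
    def supports_config(cls, config: PretrainedConfig) -> bool:
        # Only claim configs whose text backbone is Mistral4. Dense
        # Ministral3 backbones (e.g. Devstral-Small) return False here,
        # so the caller falls back to HF's native implementation.
        text_config = getattr(config, "text_config", None)
        if text_config is None:
            return False
        return getattr(text_config, "model_type", None) == "mistral4"
```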

Test plan

  • All 4129 unit tests pass locally (0 failures)
  • New unit tests cover `resolve_custom_model_cls` with: found / not-found / supports-true / supports-false / config-passthrough
  • New unit tests cover `supports_config` with: mistral4-text / non-mistral4-text / no-text-config / no-model-type (one case is sketched after this list)
  • E2E: Mistral4 VLM 2-layer proxy trains successfully (custom model path)
  • E2E: Devstral-Small correctly falls back to HF (log: "Custom model Mistral3ForConditionalGeneration does not support config Mistral3Config, falling back to HF")
  • ruff check and ruff format pass on all changed files
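For flavor, the non-mistral4-text case might be exercised roughly as follows, reusing the hypothetical class from the sketch above; the stub config classes stand in for real HF config objects and are not the repo's actual fixtures:

```python
def test_supports_config_rejects_non_mistral4_text_backbone():
    # Hypothetical stubs standing in for HF config objects: a Mistral3
    # VLM whose text backbone is dense Ministral3, as in Devstral-Small.
    class FakeTextConfig:
        model_type = "ministral3"

    class FakeConfig:
        text_config = FakeTextConfig()

    # The custom class should decline the config, triggering HF fallback.
    assert Mistral3ForConditionalGeneration.supports_config(FakeConfig()) is False
```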

🤖 Generated with Claude Code

The custom Mistral3ForConditionalGeneration model (added for Mistral4
MoE+MLA text backbones) was intercepting all models with that HF
architecture, including Devstral-Small which uses a dense Ministral3
text backbone. This caused an AttributeError on `moe_intermediate_size`.

Add a `supports_config` classmethod that custom model classes can define
to opt out for incompatible configs. The registry's new
`resolve_custom_model_cls` method checks this before returning a custom
class, falling back to HF's native implementation when unsupported.

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
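To make the registry half concrete, here is a minimal sketch of `resolve_custom_model_cls`, assuming the registry keeps a `_custom_models` mapping from HF architecture name to custom class. The mapping name, method signature, and logging wiring are illustrative rather than the repository's actual API; the log message mirrors the E2E line quoted in the test plan:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class _ModelRegistry:
    def __init__(self) -> None:
        # Illustrative: maps HF architecture name -> custom model class.
        self._custom_models: dict[str, type] = {}

    def resolve_custom_model_cls(self, architecture: str, config) -> Optional[type]:
        """Return a custom class for `architecture`, or None to use HF's native model."""
        model_cls = self._custom_models.get(architecture)
        if model_cls is None:
            return None  # not found: nothing registered for this architecture
        # Honor the optional opt-out hook before committing to the custom path.
        supports = getattr(model_cls, "supports_config", None)
        if supports is not None and not supports(config):
            logger.info(
                "Custom model %s does not support config %s, falling back to HF",
                model_cls.__name__,
                type(config).__name__,
            )
            return None
        return model_cls
```

With this shape, a caller that receives `None` simply instantiates the HF class, which is what produces the fallback log line seen in the Devstral-Small E2E run.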

copy-pr-bot Bot commented Mar 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@HuiyingLi
Contributor Author

/ok to test a678fa5

@akoumpa akoumpa merged commit 6423542 into main Mar 17, 2026
52 checks passed
@akoumpa akoumpa deleted the fix/mistral3-devstral-small-fallback branch March 17, 2026 18:51
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
fix: fall back to HF for Mistral3 VLMs with non-Mistral4 text backbone (#1557)

