Conversation

@LRL2-ModelCloud LRL2-ModelCloud commented Sep 27, 2025

Completes #1848

Qubitium commented Sep 27, 2025

Refactor: add new BaseQModel properties:

ATTENTION_MASKS_DTYPE = torch.bool  # defaults to torch.bool
ATTENTION_MASKS_REQUIRED_FOR_INPUT: bool = False  # defaults to False
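A minimal sketch of how a per-model class might override these new defaults. `BaseQModel` and the two attribute names come from this PR; the stub class, the `Qwen3OmniMoeQModel` subclass name, and the string stand-ins for `torch` dtypes are illustrative assumptions, as is which override values any real model needs:

```python
class BaseQModel:
    # Defaults proposed in this PR ("bool" stands in for torch.bool here
    # so the sketch runs without torch installed).
    ATTENTION_MASKS_DTYPE = "bool"
    ATTENTION_MASKS_REQUIRED_FOR_INPUT: bool = False


class Qwen3OmniMoeQModel(BaseQModel):
    # Hypothetical override: a model whose forward() must receive
    # attention masks, and expects them as integer ("long") tensors.
    ATTENTION_MASKS_DTYPE = "long"
    ATTENTION_MASKS_REQUIRED_FOR_INPUT: bool = True
```

The point of class-level attributes is that per-model quirks become declarative one-liners instead of branches scattered through the quantization loop.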

@Qubitium Qubitium self-assigned this Sep 27, 2025
@Qubitium Qubitium mentioned this pull request Sep 27, 2025
@Qubitium

@LRL2-ModelCloud

def pre_quantize_generate_hook_end(self):
    self.model.thinker.model.embed_tokens = self.model.thinker.model.embed_tokens.to(CPU)
    self.model.thinker.visual = self.model.thinker.visual.to(CPU)
    self.model.thinker.audio_tower = self.model.thinker.audio_tower.to(CPU)

    self.model.thinker.visual.rotary_pos_emb = self.model.thinker.visual.rotary_pos_emb.to(CPU)
    self.model.thinker.model.rotary_emb = self.model.thinker.model.rotary_emb.to(CPU)

Actually, call offload_to_disk() on these 3 modules so they go directly to disk and do not waste CPU memory.
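A hedged sketch of the disk-offload idea. `offload_to_disk` is the helper named in the review; its real signature and mechanism in GPTQModel may differ (the real helper presumably memory-maps tensors rather than pickling whole modules). Everything below, including `load_from_disk`, is a stand-in to illustrate the pattern:

```python
import os
import pickle


def offload_to_disk(module, offload_dir):
    # Serialize the module to disk so the quantization loop can release
    # its CPU memory; returns the path needed to restore it later.
    path = os.path.join(offload_dir, f"module_{id(module)}.pkl")
    with open(path, "wb") as f:
        pickle.dump(module, f)
    return path


def load_from_disk(path):
    # Restore a previously offloaded module when it is needed again.
    with open(path, "rb") as f:
        return pickle.load(f)
```

Under this sketch, `pre_quantize_generate_hook_end` would call `offload_to_disk` on `embed_tokens`, `visual`, and `audio_tower` instead of moving them with `.to(CPU)`, trading RAM for disk space during quantization.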

@Qubitium

@LRL2-ModelCloud

Add:

INPUT_EMBEDDING_EXTRA_ARGS = None

This is for models like Qwen VL where we need to pass args such as return_audio=False or return_video=False. We pass it as **INPUT_EMBEDDING_EXTRA_ARGS if it is not None.
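A small sketch of the kwargs-merging described above. `INPUT_EMBEDDING_EXTRA_ARGS` is the attribute named in this comment; the helper function name and the merge-precedence choice (extra args win on conflict) are assumptions:

```python
def build_input_embedding_kwargs(base_kwargs, extra_args=None):
    # Merge per-model extra args (e.g. return_audio=False for Qwen VL-style
    # models) into the embedding call's kwargs. When extra_args is None the
    # base kwargs pass through unchanged, matching the "if not None" check.
    merged = dict(base_kwargs)
    if extra_args is not None:
        merged.update(extra_args)
    return merged
```

The call site would then do `model.get_input_embeddings()(**build_input_embedding_kwargs(kwargs, cls.INPUT_EMBEDDING_EXTRA_ARGS))`, keeping model-specific flags out of the shared code path.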

@Qubitium Qubitium marked this pull request as ready for review September 29, 2025 00:45
@Qubitium Qubitium merged commit 000d41b into main Sep 29, 2025
5 checks passed
@CSY-ModelCloud CSY-ModelCloud deleted the qwen3-omni-moe-support branch October 20, 2025 03:21