Conversation

@LRL2-ModelCloud LRL2-ModelCloud commented Sep 27, 2025

Completes #1848

Qubitium commented Sep 27, 2025

Refactor: add new BaseQModel properties:

ATTENTION_MASKS_DTYPE = torch.bool  # defaults to torch.bool
ATTENTION_MASKS_REQUIRED_FOR_INPUT: bool = False  # defaults to False
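A minimal sketch of how a per-model class might override these new defaults. `BaseQModel` and the two attribute names come from this PR; the stub class, the `Qwen3OmniMoeQModel` subclass name, and the string stand-ins for `torch` dtypes are illustrative assumptions, as is which override values any real model needs:

```python
class BaseQModel:
    # Defaults proposed in this PR ("bool" stands in for torch.bool here
    # so the sketch runs without torch installed).
    ATTENTION_MASKS_DTYPE = "bool"
    ATTENTION_MASKS_REQUIRED_FOR_INPUT: bool = False


class Qwen3OmniMoeQModel(BaseQModel):
    # Hypothetical override: a model whose forward() must receive
    # attention masks, and expects them as integer ("long") tensors.
    ATTENTION_MASKS_DTYPE = "long"
    ATTENTION_MASKS_REQUIRED_FOR_INPUT: bool = True
```

The point of class-level attributes is that per-model quirks become declarative one-liners instead of branches scattered through the quantization loop.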

@Qubitium Qubitium self-assigned this Sep 27, 2025
@Qubitium Qubitium mentioned this pull request Sep 27, 2025
@Qubitium

@LRL2-ModelCloud

def pre_quantize_generate_hook_end(self):
    self.model.thinker.model.embed_tokens = self.model.thinker.model.embed_tokens.to(CPU)
    self.model.thinker.visual = self.model.thinker.visual.to(CPU)
    self.model.thinker.audio_tower = self.model.thinker.audio_tower.to(CPU)

    self.model.thinker.visual.rotary_pos_emb = self.model.thinker.visual.rotary_pos_emb.to(CPU)
    self.model.thinker.model.rotary_emb = self.model.thinker.model.rotary_emb.to(CPU)

Actually, call offload_to_disk() on these 3 modules so they go directly to disk and do not waste CPU memory.
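A hedged sketch of the disk-offload idea. `offload_to_disk` is the helper named in the review; its real signature and mechanism in GPTQModel may differ (the real helper presumably memory-maps tensors rather than pickling whole modules). Everything below, including `load_from_disk`, is a stand-in to illustrate the pattern:

```python
import os
import pickle


def offload_to_disk(module, offload_dir):
    # Serialize the module to disk so the quantization loop can release
    # its CPU memory; returns the path needed to restore it later.
    path = os.path.join(offload_dir, f"module_{id(module)}.pkl")
    with open(path, "wb") as f:
        pickle.dump(module, f)
    return path


def load_from_disk(path):
    # Restore a previously offloaded module when it is needed again.
    with open(path, "rb") as f:
        return pickle.load(f)
```

Under this sketch, `pre_quantize_generate_hook_end` would call `offload_to_disk` on `embed_tokens`, `visual`, and `audio_tower` instead of moving them with `.to(CPU)`, trading RAM for disk space during quantization.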

@Qubitium

@LRL2-ModelCloud

Add:

INPUT_EMBEDDING_EXTRA_ARGS = None

This is for models like Qwen VL where we need to pass args such as return_audio=False or return_video=False. We pass it as **INPUT_EMBEDDING_EXTRA_ARGS if it is not None.
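A small sketch of the kwargs-merging described above. `INPUT_EMBEDDING_EXTRA_ARGS` is the attribute named in this comment; the helper function name and the merge-precedence choice (extra args win on conflict) are assumptions:

```python
def build_input_embedding_kwargs(base_kwargs, extra_args=None):
    # Merge per-model extra args (e.g. return_audio=False for Qwen VL-style
    # models) into the embedding call's kwargs. When extra_args is None the
    # base kwargs pass through unchanged, matching the "if not None" check.
    merged = dict(base_kwargs)
    if extra_args is not None:
        merged.update(extra_args)
    return merged
```

The call site would then do `model.get_input_embeddings()(**build_input_embedding_kwargs(kwargs, cls.INPUT_EMBEDDING_EXTRA_ARGS))`, keeping model-specific flags out of the shared code path.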

@Qubitium Qubitium marked this pull request as ready for review September 29, 2025 00:45
@Qubitium Qubitium merged commit 000d41b into main Sep 29, 2025
5 checks passed
@CSY-ModelCloud CSY-ModelCloud deleted the qwen3-omni-moe-support branch October 20, 2025 03:21