
Conversation

@Qubitium Qubitium (Collaborator) commented Oct 4, 2025

Fix #1971

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
@Qubitium Qubitium changed the title from Flash2 dtype to Add OVIS 2.5 Support on Oct 4, 2025
@Qubitium Qubitium merged commit c145f31 into main Oct 4, 2025
5 checks passed
@Qubitium Qubitium deleted the flash2-dtype branch October 4, 2025 20:07
@Qubitium Qubitium (Collaborator, Author) commented Oct 4, 2025

@avtc I have added barriers so the main loop waits for all threads to complete their work before any forward and forward (replay) passes run. This may reduce your chance of OOM. Another toggle you can try is changing the per-device cuda value in module_looper.py from 4 to 1 so that only one work task is allowed per gpu:index device.
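
For illustration only, here is a minimal sketch of the two ideas above: capping in-flight work tasks per device and waiting for every worker to finish before the next forward / forward (replay) pass. This is not GPTQModel's actual module_looper.py code; the device list, TASKS_PER_DEVICE constant, and function names are made up for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-device cap: 1 mirrors the suggested "change 4 to 1" toggle,
# i.e. at most one in-flight work task per gpu:index device.
TASKS_PER_DEVICE = 1
DEVICES = ["cuda:0", "cuda:1"]

# One semaphore per device limits concurrent tasks on that device.
device_slots = {d: threading.Semaphore(TASKS_PER_DEVICE) for d in DEVICES}

def run_task(device: str, task_id: int) -> None:
    # Acquire a slot on the target device before doing any work on it.
    with device_slots[device]:
        print(f"task {task_id} running on {device}")
        # ... quantize / process one module here ...

def process_layer(tasks: list[tuple[str, int]]) -> None:
    # Submit all per-module tasks, then block until every one has finished
    # (the "barrier") before the next forward / replay pass is allowed to start.
    with ThreadPoolExecutor(max_workers=len(DEVICES) * TASKS_PER_DEVICE) as pool:
        futures = [pool.submit(run_task, dev, tid) for dev, tid in tasks]
        for f in futures:
            f.result()  # re-raise any worker exception; all work done before returning

process_layer([("cuda:0", 0), ("cuda:1", 1), ("cuda:0", 2)])
```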

@avtc avtc (Contributor) commented Oct 4, 2025

@Qubitium

Traceback (most recent call last):
  File "/home/ubuntu/Documents/Quantize/quantize-glm4.5-Air-gptqmodel-moe-prune-smart-4.py", line 462, in <module>
    from gptqmodel import GPTQModel, QuantizeConfig
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/__init__.py", line 11, in <module>
    from .models import GPTQModel, get_best_device
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/models/__init__.py", line 7, in <module>
    from .auto import MODEL_MAP, GPTQModel
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/models/auto.py", line 60, in <module>
    from .definitions.apertus import ApertusQModel  # noqa: E402
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/models/definitions/__init__.py", line 47, in <module>
    from .ovis2_5 import Ovis2_5QModel
ModuleNotFoundError: No module named 'gptqmodel.models.definitions.ovis2_5'

@Qubitium Qubitium (Collaborator, Author) commented Oct 4, 2025

@avtc Oof, comment that line out. I forgot to git add the new file.
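
For anyone hitting the same error before the fix lands, a minimal sketch of the workaround, assuming nothing else in the tree references Ovis2_5QModel yet (only this one import appears in the traceback): comment out the failing import in gptqmodel/models/definitions/__init__.py.

```python
# gptqmodel/models/definitions/__init__.py  (line 47 per the traceback above)
# Temporary workaround until the missing ovis2_5.py file is committed:
# from .ovis2_5 import Ovis2_5QModel
```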

@avtc avtc (Contributor) commented Oct 4, 2025

Will check tomorrow

Qubitium added a commit that referenced this pull request Oct 6, 2025
This reverts commit c145f31

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
Qubitium added a commit that referenced this pull request Oct 6, 2025
* Revert "Fix missing file (#1983)"

This reverts commit 673a1cb.

* Revert "Add OVIS 2.5 Support (#1981)"

This reverts commit c145f31

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>

* format

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>

* reduce usage of tctl.threadpool_limit

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>

---------

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>


Development

Successfully merging this pull request may close these issues.

Ovis2.5 9B Quantization Support

3 participants