Open · Labels: bug (Something isn't working)
Description
Trying to quantize GLM-4.5-Air with gptqmodel (commit d8f3c78) and mock_quantization=False, and got an error on the first layer that contains experts (layer 1):
Quantizing mlp.experts.32.gate_proj in layer [1 of 45] | 0:13:41 / 5:14:43 [2/46] 4.3%

```
Traceback (most recent call last):
  File "/home/ubuntu/Documents/Quantize/quantize-glm4.5-Air-gptqmodel-moe-prune-smart-4.py", line 489, in <module>
    model.quantize(
    ~~~~~~~~~~~~~~^
        calibration_dataset,
        ^^^^^^^^^^^^^^^^^^^^
        batch_size=BATCH_SIZE,
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/models/base.py", line 717, in quantize
    return module_looper.loop(
           ~~~~~~~~~~~~~~~~~~^
        backend=backend,
        ^^^^^^^^^^^^^^^^
        fail_safe=self.quantize_config.fail_safe,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/looper/module_looper.py", line 850, in loop
    name, m = fut.result()
              ~~~~~~~~~~^^
  File "/home/ubuntu/.pyenv/versions/3.13.7t/lib/python3.13t/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/home/ubuntu/.pyenv/versions/3.13.7t/lib/python3.13t/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/utils/threadx.py", line 360, in _run
    result = fn(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/looper/module_looper.py", line 842, in _process_on_worker
    proc.process(module=nm)
    ~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/looper/gptq_processor.py", line 123, in process
    wq, q_scales, q_zeros, q_g_idx, duration, avg_loss, damp_percent, nsamples = g.quantize()
                                                                                 ~~~~~~~~~~^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/quantization/gptq.py", line 354, in quantize
    Hinv, damp = self.hessian_inverse(self.H)
                 ~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/quantization/gptq.py", line 257, in hessian_inverse
    H2 = torch.linalg.cholesky(H2)
RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling `cusolverDnCreate(handle)`. If you keep seeing this error, you may use `torch.backends.cuda.preferred_linalg_library()` to try linear algebra operators with other supported backends. See https://pytorch.org/docs/stable/backends.html#torch.backends.cuda.preferred_linalg_library
```
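For context: a `CUSOLVER_STATUS_INTERNAL_ERROR` raised from `cusolverDnCreate(handle)` often points to the environment (e.g. GPU memory pressure or a driver issue) rather than the matrix itself. Two workarounds worth trying are sketched below. The first is the documented PyTorch knob the error message suggests; the second, `safe_cholesky`, is a hypothetical wrapper (not part of GPTQModel) that retries the factorization on CPU when the GPU solver fails:

```python
import torch

# 1) Prefer the MAGMA backend for CUDA linalg ops, as the error message
#    suggests (only meaningful when CUDA is available):
if torch.cuda.is_available():
    torch.backends.cuda.preferred_linalg_library("magma")

# 2) Hypothetical fallback wrapper: if the GPU Cholesky fails, redo the
#    factorization on CPU and move the factor back to the original device.
def safe_cholesky(H: torch.Tensor) -> torch.Tensor:
    try:
        return torch.linalg.cholesky(H)
    except RuntimeError:
        return torch.linalg.cholesky(H.cpu()).to(H.device)
```

The CPU fallback is slower but avoids cuSOLVER entirely, which can help narrow down whether the failure is in the solver setup rather than the Hessian being non-positive-definite.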