Status: Closed
Labels: bug
Description
Describe the bug
Trying to quantize GLM-4.5-Air with 8 × RTX 3090 on Kubuntu 24.04 (torch 2.8.0, CUDA 12.8, Python 3.13.7t), and got an error:
INFO Hooked Modules: Using legacy based config for targeting of modules
INFO ModuleLooper: forward start (processor=`gptq`, layer=`model.layers.0`, subset=1/7, batches=1057)
Quantizing layer 0 of 45 [0 of 45] [1/46] 2.2%
Traceback (most recent call last):
File "/home/ubuntu/Documents/Quantize/quantize-glm4.5-Air-gptqmodel-moe-prune-smart-4.py", line 489, in <module>
model.quantize(
~~~~~~~~~~~~~~^
calibration_dataset,
^^^^^^^^^^^^^^^^^^^^
batch_size=BATCH_SIZE,
^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/models/base.py", line 875, in quantize
return module_looper.loop(
~~~~~~~~~~~~~~~~~~^
backend=backend,
^^^^^^^^^^^^^^^^
fail_safe=self.quantize_config.fail_safe,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/looper/module_looper.py", line 786, in loop
forward_outputs = self._run_forward_batches(
module=module,
...<10 lines>...
reuse_kv=reuse_kv,
)
File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/looper/module_looper.py", line 248, in _run_forward_batches
return self._run_forward_batches_parallel(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
module=module,
^^^^^^^^^^^^^^
...<11 lines>...
devices=devices,
^^^^^^^^^^^^^^^^
)
^
File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/looper/module_looper.py", line 394, in _run_forward_batches_parallel
batch_idx, module_output, kv_next = fut.result()
~~~~~~~~~~^^
File "/home/ubuntu/.pyenv/versions/3.13.7t/lib/python3.13t/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
~~~~~~~~~~~~~~~~~^^
File "/home/ubuntu/.pyenv/versions/3.13.7t/lib/python3.13t/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/utils/threadx.py", line 367, in _run
result = fn(*args, **kwargs)
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/utils/looper_helpers.py", line 291, in forward_batch_worker
module_output = module(*inputs, **additional_inputs)
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/transformers/modeling_layers.py", line 94, in __call__
return super().__call__(*args, **kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 380, in forward
hidden_states, _ = self.self_attn(
~~~~~~~~~~~~~~^
hidden_states=hidden_states,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<6 lines>...
**kwargs,
^^^^^^^^^
)
^
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 170, in forward
query_states = self.q_proj(hidden_states).view(hidden_shape)
~~~~~~~~~~~^^^^^^^^^^^^^^^
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/nn_modules/hooked_linear.py", line 218, in forward
with tf32_enable_guard():
~~~~~~~~~~~~~~~~~^^
File "/home/ubuntu/.pyenv/versions/3.13.7t/lib/python3.13t/contextlib.py", line 141, in __enter__
return next(self.gen)
File "/home/ubuntu/git/avtc/GPTQModel/gptqmodel/utils/torch.py", line 251, in tf32_enable_guard
if torch.backends.fp32_precision == "tf32":
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch.backends' has no attribute 'fp32_precision'
terminate called without an active exception
terminate called recursively
Aborted (core dumped)
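For context, `torch.backends.fp32_precision` only exists in newer PyTorch releases, so the check in `tf32_enable_guard` raises `AttributeError` on torch 2.8. A minimal sketch of a backward-compatible guard, using stdlib stand-ins for the two `torch.backends` shapes (the namespace objects and attribute layout here are illustrative assumptions, not the gptqmodel fix):

```python
import types

# Stand-in for torch.backends on an older release (e.g. torch 2.8),
# which exposes only the long-standing allow_tf32 matmul flag.
old_backends = types.SimpleNamespace(
    cuda=types.SimpleNamespace(matmul=types.SimpleNamespace(allow_tf32=True))
)
# Stand-in for torch.backends on a newer release with fp32_precision.
new_backends = types.SimpleNamespace(fp32_precision="tf32")

def tf32_is_enabled(backends) -> bool:
    # Guard the new attribute so older PyTorch builds fall back to
    # the matmul allow_tf32 flag instead of raising AttributeError.
    if hasattr(backends, "fp32_precision"):
        return backends.fp32_precision == "tf32"
    return backends.cuda.matmul.allow_tf32

print(tf32_is_enabled(old_backends), tf32_is_enabled(new_backends))
```

The same `hasattr` guard applied inside `tf32_enable_guard` in `gptqmodel/utils/torch.py` would avoid the crash on torch versions that predate `fp32_precision`.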
Environment:
gptqmodel commit hash: 5d80bdcc28e88ea642cdd79a2e9dd6fd78c8b7e9
last known working hash: d8f3c78988bb8f11982a5e52361537ffba05d145
(intermediate commits were not checked)
transformers version: 4.56.1
accelerate version: 1.10.1
triton version: 3.4.0