Closed
Labels
bug (Something isn't working)
Description
Describe the bug
The float64 dtype is not supported by the Apple MPS backend, yet float64 is used to warm up GPTQModel. After commenting out this dtype, quantization completes successfully.
Error Logs
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/threadx.py", line 391, in _run
self._run_warmup()
File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/threadx.py", line 375, in _run_warmup
warmup_fn(self.device)
File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/linalg_warmup.py", line 49, in run_torch_linalg_warmup
_run_cholesky_and_eigh(device, dtype)
File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/linalg_warmup.py", line 24, in _run_cholesky_and_eigh
spd = _make_spd(4, device, dtype)
File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/linalg_warmup.py", line 18, in _make_spd
base = torch.randn((size, size), device=device, dtype=dtype)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Warmup code:
GPTQModel/gptqmodel/utils/linalg_warmup.py
Lines 39 to 51 in f8dd297
def run_torch_linalg_warmup(device: torch.device) -> None:
    """
    Execute the torch.linalg operators used across the project once on the worker thread.
    Serialized under a global lock to avoid races inside PyTorch's lazy wrappers. The warmup
    still runs once per physical device so backend-specific handles are initialized where needed.
    """
    with _GLOBAL_WARMUP_LOCK:
        dtypes = (torch.float32, torch.float64)
        for dtype in dtypes:
            _run_cholesky_and_eigh(device, dtype)
            _run_svd(device, dtype)
            _run_qr(device, dtype)
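One possible fix (a sketch only, not an actual project patch) would be to pick the warmup dtypes per device backend, since MPS supports only float32; the helper name `supported_linalg_dtypes` is hypothetical:

```python
import torch

def supported_linalg_dtypes(device: torch.device) -> tuple:
    """Return the float dtypes safe to warm up on this device.

    The MPS backend has no float64 support, so only float32 is
    returned there; other backends keep the original pair.
    """
    if device.type == "mps":
        return (torch.float32,)
    return (torch.float32, torch.float64)
```

The warmup loop could then iterate over `supported_linalg_dtypes(device)` instead of a hard-coded `(torch.float32, torch.float64)` tuple.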
GPU Info
Apple M4 GPU
Software Info
MacOS 26 + Python 3.10
Show output of:
# pip show gptqmodel torch transformers accelerate triton
WARNING: Package(s) not found: triton
Name: GPTQModel
Version: 5.4.4
Summary: Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.
Home-page: https://github.com/ModelCloud/GPTQModel
Author:
Author-email: ModelCloud <qubitium@modelcloud.ai>
License-Expression: Apache-2.0
Location: /opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages
Requires: accelerate, datasets, device-smi, dill, hf_transfer, huggingface_hub, logbar, maturin, numpy, packaging, pillow, protobuf, pyarrow, pypcre, random_word, safetensors, threadpoolctl, tokenicer, torch, torchao, transformers
Required-by:
---
Name: torch
Version: 2.9.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org
Author:
Author-email: PyTorch Team <packages@pytorch.org>
License: BSD-3-Clause
Location: /opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: accelerate, GPTQModel, optimum
---
Name: transformers
Version: 4.57.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: GPTQModel, optimum, tokenicer
---
Name: accelerate
Version: 1.12.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: zach.mueller@huggingface.co
License: Apache
Location: /opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages
Requires: huggingface_hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: GPTQModel

To Reproduce
# 1. install gptqmodel in Apple M4
# 2. run example
python examples/quantization/transformers_usage.py

Expected behavior
The quantization succeeds without error, and the model is saved.
Additional context
The example script succeeds if the float64 warmup step is removed, but I’m not sure whether skipping it has any side effects.
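A more defensive alternative to special-casing the backend would be to probe dtype support at runtime. This is a sketch under that assumption; `can_use_dtype` is a hypothetical helper, not part of GPTQModel:

```python
import torch

def can_use_dtype(device: torch.device, dtype: torch.dtype) -> bool:
    """Probe whether a tensor of the given dtype can be allocated on the device.

    On MPS, float64 allocation raises TypeError (as in the traceback above),
    so the warmup could silently skip unsupported dtypes instead of crashing.
    """
    try:
        torch.empty(1, device=device, dtype=dtype)
        return True
    except (TypeError, RuntimeError):
        return False
```

With this, the warmup could filter its dtype tuple once per device rather than assuming float64 is always available.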

Assignees
Qubitium