[BUG] float64 is not supported in MPS backend. #2215

@hebangwen

Description

Describe the bug

The float64 dtype is not supported by the Apple MPS backend, but GPTQModel's warmup step uses float64. After commenting out that dtype, quantization completes successfully.

Error Logs

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/threadx.py", line 391, in _run
    self._run_warmup()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/threadx.py", line 375, in _run_warmup
    warmup_fn(self.device)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/linalg_warmup.py", line 49, in run_torch_linalg_warmup
    _run_cholesky_and_eigh(device, dtype)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/linalg_warmup.py", line 24, in _run_cholesky_and_eigh
    spd = _make_spd(4, device, dtype)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages/gptqmodel/utils/linalg_warmup.py", line 18, in _make_spd
    base = torch.randn((size, size), device=device, dtype=dtype)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
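The failure can be reproduced outside GPTQModel with a single allocation. A minimal sketch, assuming torch is installed (`try_float64` is a hypothetical helper, not part of either codebase); on machines without MPS the final check is simply skipped:

```python
import torch

def try_float64(device_type: str):
    """Attempt a small float64 allocation on the device.

    Returns the error message on failure, or None on success.
    """
    try:
        torch.randn((2, 2), device=device_type, dtype=torch.float64)
        return None
    except TypeError as exc:
        return str(exc)

if torch.backends.mps.is_available():
    # On Apple Silicon this prints the "Cannot convert a MPS Tensor
    # to float64" TypeError message from the traceback above.
    print(try_float64("mps"))
```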

warmup code:

def run_torch_linalg_warmup(device: torch.device) -> None:
    """
    Execute the torch.linalg operators used across the project once on the worker thread.
    Serialized under a global lock to avoid races inside PyTorch's lazy wrappers. The warmup
    still runs once per physical device so backend-specific handles are initialized where needed.
    """
    with _GLOBAL_WARMUP_LOCK:
        dtypes = (torch.float32, torch.float64)
        for dtype in dtypes:
            _run_cholesky_and_eigh(device, dtype)
            _run_svd(device, dtype)
            _run_qr(device, dtype)
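Instead of dropping float64 everywhere, the warmup could select its dtypes per backend. A hedged sketch of one possible guard (`select_warmup_dtypes` is a hypothetical helper, not the project's actual patch):

```python
import torch

def select_warmup_dtypes(device: torch.device) -> tuple:
    """Pick warmup dtypes per backend.

    The MPS framework has no float64 support, so skip it there;
    all other backends keep the original float32/float64 pair.
    """
    if device.type == "mps":
        return (torch.float32,)
    return (torch.float32, torch.float64)
```

The warmup loop would then iterate over `select_warmup_dtypes(device)` instead of a fixed tuple, leaving CUDA and CPU behavior unchanged.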

GPU Info

Apple M4 GPU

Software Info

MacOS 26 + Python 3.10

Show output of:

# pip show gptqmodel torch transformers accelerate triton

WARNING: Package(s) not found: triton
Name: GPTQModel
Version: 5.4.4
Summary: Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.
Home-page: https://github.com/ModelCloud/GPTQModel
Author: 
Author-email: ModelCloud <qubitium@modelcloud.ai>
License-Expression: Apache-2.0
Location: /opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages
Requires: accelerate, datasets, device-smi, dill, hf_transfer, huggingface_hub, logbar, maturin, numpy, packaging, pillow, protobuf, pyarrow, pypcre, random_word, safetensors, threadpoolctl, tokenicer, torch, torchao, transformers
Required-by: 
---
Name: torch
Version: 2.9.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org
Author: 
Author-email: PyTorch Team <packages@pytorch.org>
License: BSD-3-Clause
Location: /opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: accelerate, GPTQModel, optimum
---
Name: transformers
Version: 4.57.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: GPTQModel, optimum, tokenicer
---
Name: accelerate
Version: 1.12.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: zach.mueller@huggingface.co
License: Apache
Location: /opt/homebrew/Caskroom/miniconda/base/envs/gptq/lib/python3.10/site-packages
Requires: huggingface_hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: GPTQModel

To Reproduce

# 1. install gptqmodel in Apple M4

# 2. run example
python examples/quantization/transformers_usage.py

Expected behavior

Quantization succeeds without error, and the model is saved.

Screenshots

Additional context

The example script succeeds if the float64 warmup step is removed, but I'm not sure whether skipping it has any side effects.

Metadata

Labels

bug: Something isn't working
