Hi, thanks for maintaining DeepQuantum.
I tested `deepquantum` on an Apple Silicon Mac and found two MPS-related problems that look reproducible with the current release combination:

- deepquantum==4.4.0
- torch==2.10.0
- macOS on an Apple M2 Ultra
- Python 3.11.11
PyTorch MPS is available in this environment:

```python
import torch

print(torch.__version__)
print(torch.backends.mps.is_built())
print(torch.backends.mps.is_available())
```

Output:

```
2.10.0
True
True
```
## Summary

There seem to be two separate issues on `mps`:

- Parametric single-qubit gates such as `Rx` fail even for very small circuits.
- Dense statevector simulation on `mps` fails around 16 qubits for a simple GHZ-style circuit.

In contrast, the same circuits run fine on CPU.
## Issue 1: `Rx` fails on MPS because of mixed float/complex stacking

### Minimal reproduction

```python
import torch
import deepquantum as dq

print("torch", torch.__version__)
print("mps available", torch.backends.mps.is_available())

cir = dq.QubitCircuit(2)
cir.rx(0, 0.1)
cir.to("mps")
cir()
```

### Observed error

```
RuntimeError: Failed to create function state object for: cat_int32_t_float_float2
```
### Likely cause

From local inspection, `Rx.get_matrix()` currently does:

```python
theta = self.inputs_to_tensor(theta)
cos = torch.cos(theta / 2)
isin = torch.sin(theta / 2) * 1j
return torch.stack([cos, -isin, -isin, cos]).reshape(2, 2)
```

On MPS, `cos` is `float32` while `isin` is `complex64`, and `torch.stack([...])` with that mixture fails.
I verified that a temporary local patch which casts both branches to `complex64` before stacking makes the parametric circuit work on MPS up to 14 qubits in my test:

```python
def patched_get_matrix(self, theta):
    theta = self.inputs_to_tensor(theta)
    cos = torch.cos(theta / 2).to(torch.complex64)
    isin = torch.sin(theta / 2).to(torch.complex64) * 1j
    return torch.stack([cos, -isin, -isin, cos]).reshape(2, 2)
```

This suggests the first issue is likely fixable inside DeepQuantum.
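For completeness, the dtype fix can be exercised outside the library. The sketch below is standalone (the `rx_matrix` name is mine, not DeepQuantum's API) and shows that stacking uniformly `complex64` entries produces a valid Rx unitary without ever hitting a mixed-dtype `cat` path:

```python
import torch

def rx_matrix(theta: float) -> torch.Tensor:
    """Build the Rx matrix with a uniform complex64 dtype before stacking."""
    theta = torch.as_tensor(theta, dtype=torch.float32)
    cos = torch.cos(theta / 2).to(torch.complex64)
    isin = torch.sin(theta / 2).to(torch.complex64) * 1j
    # All four entries are complex64, so torch.stack never has to promote
    # a float tensor next to a complex one.
    return torch.stack([cos, -isin, -isin, cos]).reshape(2, 2)

m = rx_matrix(0.1)
print(m.dtype)  # torch.complex64
# Rx is unitary: m @ m.conj().T should be the 2x2 identity.
print(torch.allclose(m @ m.conj().T,
                     torch.eye(2, dtype=torch.complex64), atol=1e-6))  # True
```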
## Issue 2: dense statevector simulation fails around 16 qubits on Apple MPS

### Minimal reproduction

```python
import deepquantum as dq

cir = dq.QubitCircuit(16)
cir.h(0)
for i in range(15):
    cir.cnot(i, i + 1)
cir.to("mps")
cir()
```

### Observed error

```
RuntimeError: MPS supports tensors with dimensions <= 16, but got 17.
```
### Additional observations

- 14 qubits works for a fixed GHZ circuit on MPS in my test.
- 16 qubits fails.
- CPU works for the same circuit up to at least 24 qubits in my local probe.
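The rank arithmetic behind these numbers can be checked on CPU. Assuming the state carries one leading batch axis plus one size-2 axis per qubit (an assumption on my part, but it matches the 17 dimensions the error reports for 16 qubits), a minimal sketch is:

```python
import torch

n_qubits = 16

# Dense statevector |0...0>, then reshaped so every qubit gets its own
# size-2 axis on top of a leading batch axis: rank = 1 + n_qubits.
state = torch.zeros(2 ** n_qubits, dtype=torch.complex64)
state[0] = 1.0
state = state.reshape([1] + [2] * n_qubits)
print(state.dim())  # 17 -> exceeds the "dimensions <= 16" cap in the error
```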
This second issue may be a PyTorch MPS backend limitation rather than a pure DeepQuantum bug, because DeepQuantum reshapes the state into a high-rank tensor during evolution. However, it would still be very helpful if DeepQuantum could either:
- detect this case and fall back to CPU automatically, or
- raise a clearer compatibility warning for Apple MPS.
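As a rough sketch of what such a guard could look like (a hypothetical helper for illustration, not an existing DeepQuantum API):

```python
import warnings

# PyTorch's MPS backend currently rejects tensors with more than 16 dims.
MPS_MAX_DIMS = 16

def resolve_device(device: str, n_qubits: int) -> str:
    """Fall back to CPU when a dense n-qubit state (plus one batch axis)
    would exceed the MPS rank limit. Hypothetical helper, for illustration."""
    if device == "mps" and n_qubits + 1 > MPS_MAX_DIMS:
        warnings.warn(
            f"{n_qubits} qubits needs a rank-{n_qubits + 1} tensor, but MPS "
            f"supports at most {MPS_MAX_DIMS} dims; falling back to CPU."
        )
        return "cpu"
    return device

print(resolve_device("mps", 14))  # mps -> under the limit
print(resolve_device("mps", 16))  # cpu -> falls back with a warning
```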
## Larger probe results

I ran a small probe comparing CPU vs MPS for two circuit types:

- `fixed_ghz`: H + CNOT chain + expectation
- `param_ghz`: `fixed_ghz` plus one `Rx` on each qubit

Observed behavior:

- `fixed_ghz` on CPU: successful up to 24 qubits
- `fixed_ghz` on MPS: successful up to 14 qubits, failed at 16 qubits
- `param_ghz` on CPU: successful up to 24 qubits
- `param_ghz` on MPS: failed already at 2 qubits with the `cat_int32_t_float_float2` error

Selected console output:
```
[fixed_ghz]
Device: mps
 2 qubits -> ok
 4 qubits -> ok
 6 qubits -> ok
 8 qubits -> ok
10 qubits -> ok
12 qubits -> ok
14 qubits -> ok
16 qubits -> FAIL (RuntimeError: MPS supports tensors with dimensions <= 16, but got 17.)

[param_ghz]
Device: mps
 2 qubits -> FAIL (RuntimeError: Failed to create function state object for: cat_int32_t_float_float2)
```
## Environment

```
macOS: Darwin 25.3.0
Machine: arm64
Hardware: Apple M2 Ultra
Python: 3.11.11
torch: 2.10.0
deepquantum: 4.4.0
cuda available: False
mps built: True
mps available: True
```
## Request

Would you consider:

- fixing the parametric gate matrix construction so that MPS avoids mixed float/complex `torch.stack` paths, and
- documenting or guarding the dense high-qubit MPS limitation on Apple Silicon?

If you prefer, I can split this into two separate issues, since the first one looks like a library bug while the second one may partly come from a PyTorch MPS backend limitation.