## Summary
custatevec-backed quantum simulation (via PennyLane `lightning.gpu`) fails with an out-of-memory error on the NVIDIA DGX Spark (GB10 Superchip, SM 12.1), even for trivially small state vectors (4 qubits = 256 bytes). The failure originates from `cudaMalloc` on the unified memory architecture, where `cudaMemGetInfo` reports near-zero free memory despite 128GB of shared DRAM being available.
## Environment
- Hardware: NVIDIA DGX Spark, GB10 Superchip (SM 12.1)
- Memory: 128GB LPDDR5X unified CPU+GPU (no discrete VRAM)
- CUDA: 13.0, Driver 580.126.09
- OS: Ubuntu (aarch64)
- custatevec: 1.12.0 (via `custatevec-cu12` pip package)
## Reproducer

```python
import pennylane as qml

# Uses custatevec under the hood
dev = qml.device('lightning.gpu', wires=4)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(wires=0)
    return qml.expval(qml.PauliZ(0))

result = circuit()  # Fails with OOM
```

```
pennylane_lightning.lightning_gpu_ops.LightningException:
[...DevTag.hpp][Line:65][Method:refresh]: Error in PennyLane Lightning: out of memory
```
## Root Cause

DGX Spark implements a Unified Memory Architecture (UMA): the GPU and CPU share the same physical DRAM. Standard CUDA memory query APIs (`cudaMemGetInfo`) return misleading values on UMA systems, causing libraries that pre-check available memory to conclude that no GPU memory exists.
This is documented by NVIDIA:

- DGX Spark Known Issues: "`cudaMemGetInfo` does not account for memory that could potentially be reclaimed from SWAP"
- DGX Spark CUDA Porting Guide
Similar issues have been reported in other CUDA libraries.
## Questions

- Does custatevec internally use `cudaMemGetInfo` for pre-allocation checks? If so, is there a plan to support UMA platforms?
- Would using `cudaMallocManaged` instead of `cudaMalloc` on UMA systems (detected via `cudaDeviceProperties::integrated`) be a viable fix?
- Is there a custatevec configuration option or environment variable to bypass the memory pre-check?
- What is the roadmap for cuQuantum support on DGX Spark / Grace-Blackwell UMA systems?
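The second question above can be sketched as follows, in plain Python for illustration only (the function and return values are hypothetical placeholders, not real custatevec or CUDA runtime bindings): on integrated/UMA devices, skip the free-memory pre-check and prefer managed allocation.

```python
# Hypothetical UMA-aware allocator selection.
# On integrated (UMA) devices the free-memory query is unreliable, so the
# pre-check is skipped and managed memory is preferred.

def choose_allocation(requested: int, reported_free: int, integrated: bool) -> str:
    """Return which allocation path a UMA-aware library could take."""
    if integrated:
        # CPU and GPU share DRAM: use cudaMallocManaged-style allocation
        # and do not gate on the (misleading) free-memory figure.
        return "cudaMallocManaged"
    if requested > reported_free:
        raise MemoryError("out of memory")
    return "cudaMalloc"

# Discrete GPU with ample reported free memory: normal cudaMalloc path.
# UMA system reporting ~0 free bytes: managed path instead of an OOM error.
```

Whether this detection via the `integrated` device property is appropriate for DGX Spark is exactly what the question asks the maintainers to confirm.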
## Context
The DGX Spark is shipping to quantum computing researchers and developers. GPU-accelerated quantum simulation is a natural use case for this hardware. Currently, custatevec is unusable on it, forcing fallback to CPU-only simulation.
We've also filed a related issue on PennyLaneAI/pennylane-lightning, since the allocation code path goes through their `DataBuffer.hpp`, but the underlying question is whether custatevec itself has UMA-incompatible memory assumptions.