
custatevec OOM on DGX Spark (GB10 UMA) — cudaMalloc fails on unified memory architecture #215

@reinamora137

Description


Summary

custatevec-backed quantum simulation (via PennyLane lightning.gpu) fails with an out-of-memory error on the NVIDIA DGX Spark (GB10 Superchip, SM 12.1), even for trivially small state vectors (4 qubits = 256 bytes). The failure originates in cudaMalloc on the unified memory architecture, where cudaMemGetInfo reports near-zero free memory despite 128 GB of shared DRAM being available.
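As a sanity check on the quoted figure, the state-vector footprint can be computed directly: a dense state vector holds 2^n complex128 amplitudes at 16 bytes each.

```python
def statevector_bytes(n_qubits: int) -> int:
    """Memory footprint of a dense state vector in complex128 (16 bytes per amplitude)."""
    return (1 << n_qubits) * 16

print(statevector_bytes(4))  # -> 256 bytes, vastly below the 128 GB of shared DRAM
```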

Environment

  • Hardware: NVIDIA DGX Spark, GB10 Superchip (SM 12.1)
  • Memory: 128GB LPDDR5X unified CPU+GPU (no discrete VRAM)
  • CUDA: 13.0, Driver 580.126.09
  • OS: Ubuntu (aarch64)
  • custatevec: 1.12.0 (via custatevec-cu12 pip package)

Reproducer

import pennylane as qml

# Uses custatevec under the hood
dev = qml.device('lightning.gpu', wires=4)

@qml.qnode(dev)
def circuit():
    qml.Hadamard(wires=0)
    return qml.expval(qml.PauliZ(0))

result = circuit()

This fails with:

pennylane_lightning.lightning_gpu_ops.LightningException:
[...DevTag.hpp][Line:65][Method:refresh]: Error in PennyLane Lightning: out of memory

Root Cause

DGX Spark implements Unified Memory Architecture (UMA) — the GPU and CPU share the same physical DRAM. Standard CUDA memory query APIs (cudaMemGetInfo) return misleading values on UMA systems, causing libraries that pre-check available memory to believe no GPU memory exists.
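For illustration, here is a minimal sketch of a UMA-aware allocation policy. This is a hypothetical helper, not cuStateVec's actual logic; it assumes the caller has already read the `integrated` flag from cudaGetDeviceProperties and the free-memory figure from cudaMemGetInfo.

```python
def pick_allocator(integrated: bool, free_bytes: int, request_bytes: int) -> str:
    """Choose a CUDA allocation call for a buffer of `request_bytes`.

    Hypothetical policy: on UMA systems (cudaDeviceProp.integrated == 1),
    cudaMemGetInfo's free-memory value is unreliable, so skip the pre-check
    and use managed memory; on discrete GPUs, honor the reported free memory.
    """
    if integrated:
        # UMA: `free_bytes` may read as ~0 despite ample shared DRAM.
        return "cudaMallocManaged"
    if request_bytes <= free_bytes:
        return "cudaMalloc"
    raise MemoryError(f"requested {request_bytes} B but only {free_bytes} B free")

# A 4-qubit state vector (256 B) on a UMA device reporting 0 B free:
print(pick_allocator(integrated=True, free_bytes=0, request_bytes=256))
# -> cudaMallocManaged
```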

This UMA behavior is documented by NVIDIA, and similar issues have been reported against other CUDA libraries.

Questions

  1. Does custatevec internally use cudaMemGetInfo for pre-allocation checks? If so, is there a plan to support UMA platforms?
  2. Would using cudaMallocManaged instead of cudaMalloc on UMA systems (detected via cudaDeviceProp::integrated) be a viable fix?
  3. Is there a custatevec configuration option or environment variable to bypass the memory pre-check?
  4. What is the roadmap for cuQuantum support on DGX Spark / Grace-Blackwell UMA systems?

Context

The DGX Spark is shipping to quantum computing researchers and developers. GPU-accelerated quantum simulation is a natural use case for this hardware. Currently, custatevec is unusable on it, forcing fallback to CPU-only simulation.

We've also filed a related issue on PennyLaneAI/pennylane-lightning since the allocation code path goes through their DataBuffer.hpp, but the underlying question is whether custatevec itself has UMA-incompatible memory assumptions.
