Description
Hi, the following code causes a GPU OOM on Hopper with NVLS enabled. I am using the latest main branch.
from mscclpp import Transport, TcpBootstrap, Communicator
from mscclpp._mscclpp import Context, RawGpuBuffer
import cupy as cp

cp.cuda.Device(0).use()
bootstrap = TcpBootstrap.create(0, 1)
bootstrap.initialize(bootstrap.create_unique_id(), 60)
comm = Communicator(bootstrap)
for i in range(100):
    if i % 10 == 0:
        print(f"{i=}", flush=True)
    mem = RawGpuBuffer(2 ** 30)
    reg = comm.register_memory(mem.data(), mem.bytes(), Transport.CudaIpc)
    del reg, mem
Output:
i=0
i=10
i=20
i=30
i=40
i=50
i=60
i=70
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
mscclpp._mscclpp.CuError: (2, 'Call to result failed./.../mscclpp/src/gpu_utils.cc:128 (Cu failure: out of memory)')
The code runs fine if the memory is not registered. Could you please check whether this reproduces on your side?
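For reference, this is what I mean by "not registered": the same loop with only the comm.register_memory call removed (a minimal sketch of the variant, same setup as above), which completes all 100 iterations without OOM on my side:

from mscclpp import TcpBootstrap, Communicator
from mscclpp._mscclpp import RawGpuBuffer
import cupy as cp

cp.cuda.Device(0).use()
bootstrap = TcpBootstrap.create(0, 1)
bootstrap.initialize(bootstrap.create_unique_id(), 60)
comm = Communicator(bootstrap)
for i in range(100):
    if i % 10 == 0:
        print(f"{i=}", flush=True)
    # Allocate and free 1 GiB per iteration, without registering it.
    mem = RawGpuBuffer(2 ** 30)
    del mem

So the leak seems tied to registering the buffer for Transport.CudaIpc rather than to the allocation itself.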
Activity
Binyang2014 commented on May 25, 2025
We can reproduce this issue. Let me figure out the reason.