// We don't want to fail in these particular cases since this function
// can be called when users only want to run on CPU even if CUDA API is
// enabled, or in a forked subprocess where CUDA context cannot be
// initialized. So we just mark the CUDA context to unavailable and
// return.
is_available_ = false;
cudaGetLastError(); // clear error
This has caused a lot of trouble in the past, so really glad you've caught this and will fix it, @chang-l. It will also save a lot of developer time as the resulting bugs from this issue take a while to track down.
In the PyTorch dataloader (CPU sampling), worker processes never initialize a CUDA context, because the CUDA runtime does not support the `fork` start method (https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing). I think the DGL dataloader should also follow this convention, if possible.
However, when a coo/csr matrix is created here, the constructor calls the `cudaPointerGetAttributes` CUDA runtime API inside the `IsPinned` function. As a result, every worker that runs this constructor starts initializing/accessing CUDA instances.
dgl/include/dgl/aten/coo.h
Lines 68 to 71 in fedaa36
This is not a bug and does not raise an error, because the failed call is guarded here by marking the CUDA context as unavailable and clearing the CUDA error (see below):
dgl/src/runtime/cuda/cuda_device_api.cc
Lines 295 to 301 in b35757a
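To make the guard described above concrete, here is a minimal, CUDA-free sketch of the pattern. Everything in it is hypothetical and for illustration only: `Status`, `FakeRuntime`, and `DeviceApiSketch` are stand-ins, not DGL or CUDA types, and the early return on `is_available_` is an assumption about how such a flag would typically be consumed. The point it tries to show is that the guard avoids a crash, but the worker still touches the runtime once before the context is marked unavailable.

```cpp
// Hypothetical stand-ins for CUDA runtime status codes (not the real cudaError_t).
enum class Status { kSuccess, kInvalidValue, kInitializationError };

// Hypothetical stub runtime. With can_initialize == false it behaves like a
// forked worker in which the context cannot be initialized.
struct FakeRuntime {
  bool can_initialize = false;
  int query_count = 0;                   // how many times the runtime was touched
  Status last_error = Status::kSuccess;  // sticky error, like a runtime's last error

  Status QueryPointer(const void* /*ptr*/, bool* is_host_pinned) {
    ++query_count;
    if (!can_initialize) {
      last_error = Status::kInitializationError;
      return last_error;
    }
    *is_host_pinned = false;  // the stub never reports pinned memory
    return Status::kSuccess;
  }

  void ClearLastError() { last_error = Status::kSuccess; }
};

// Sketch of a device API whose IsPinned tolerates an uninitializable context.
class DeviceApiSketch {
 public:
  explicit DeviceApiSketch(FakeRuntime* rt) : rt_(rt) {}

  bool IsPinned(const void* ptr) {
    // Assumption: once marked unavailable, skip the runtime entirely.
    if (!is_available_) return false;
    bool pinned = false;
    switch (rt_->QueryPointer(ptr, &pinned)) {
      case Status::kSuccess:
        return pinned;
      case Status::kInvalidValue:
        rt_->ClearLastError();  // treat as ordinary (non-pinned) memory
        return false;
      case Status::kInitializationError:
        is_available_ = false;  // mark the context as unavailable
        rt_->ClearLastError();  // clear the error so later calls do not see it
        return false;
    }
    return false;
  }

  bool is_available() const { return is_available_; }

 private:
  FakeRuntime* rt_;
  bool is_available_ = true;
};
```

With the stub left at `can_initialize = false`, the first `IsPinned` call returns `false`, flips the availability flag, and leaves no pending error; later calls no longer reach the runtime. The first call is still one runtime access per worker, which is the behavior this issue proposes to avoid.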
Nevertheless, I believe it would still be preferable to adhere to PyTorch's convention by removing the `IsPinned` call from the constructor of the coo/csr matrix.
I can come up with a PR for the fix later. cc @nv-dlasalle @yaox12 @frozenbugs @TristonC
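One possible shape for such a change, sketched without any CUDA dependency: defer the pinned check until something actually asks for it, so that constructing a matrix in a CPU-only worker never reaches the device API. The names `PinnedProbe`, `EagerCoo`, and `LazyCoo` are hypothetical and are not DGL types; the eventual PR may take a different approach.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical probe standing in for a device-API pinned-memory query.
using PinnedProbe = std::function<bool(const void*)>;

// Eager variant: the constructor itself queries the probe, so every
// construction (including in CPU-only workers) reaches the device API.
struct EagerCoo {
  std::vector<int64_t> row, col;
  bool is_pinned;

  EagerCoo(std::vector<int64_t> r, std::vector<int64_t> c,
           const PinnedProbe& probe)
      : row(std::move(r)),
        col(std::move(c)),
        is_pinned(probe(row.data()) && probe(col.data())) {}
};

// Lazy variant: construction never calls the probe. The result is computed
// on first request and cached afterwards.
class LazyCoo {
 public:
  LazyCoo(std::vector<int64_t> r, std::vector<int64_t> c)
      : row_(std::move(r)), col_(std::move(c)) {}

  bool IsPinned(const PinnedProbe& probe) {
    if (!checked_) {
      pinned_ = probe(row_.data()) && probe(col_.data());
      checked_ = true;
    }
    return pinned_;
  }

 private:
  std::vector<int64_t> row_, col_;
  bool checked_ = false;  // whether pinned_ has been computed yet
  bool pinned_ = false;
};
```

In this sketch a worker that only builds and samples from `LazyCoo` objects never invokes the probe, while code paths that genuinely need the pinned status pay for the query once. Passing the flag in from the caller would be another option with the same effect on the constructor.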