
Error raised: samgraph/common/cpu/cpu_device.cc:39 Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: os call failed or operation not supported on this OS when running /gnn_lab/example/samgraph/train_gcn.py on papers100M dataset #15

Open
weihai-98 opened this issue Sep 18, 2023 · 6 comments

Comments

@weihai-98

Hey, I got an error "samgraph/common/cpu/cpu_device.cc:39 Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: os call failed or operation not supported on this OS" when running /gnn_lab/example/samgraph/train_gcn.py on the papers100M dataset. However, this script runs successfully on other datasets such as ogbn-products and Reddit. How can I fix it? Looking forward to your help. Thanks!

@weihai-98
Author

I fixed this bug by decreasing the cache_percentage from 0.21 to 0.001. However, I am curious how to achieve the cache ratio of 0.21 reported in your paper. Should I increase the pin-memory limit, or do something else?
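A minimal sketch for sanity-checking the pin-memory angle. It assumes the failing call in cpu_device.cc is a host-memory pinning operation (e.g. cudaHostAlloc/cudaHostRegister) subject to the locked-memory limit, and it assumes papers100M's usual shape of roughly 111M nodes with 128-dim float32 features; none of that is confirmed in this thread, so treat it only as a quick check, not a diagnosis.

```python
# Sketch only: estimate the pinned footprint and compare it with RLIMIT_MEMLOCK.
# Assumptions: ~111M nodes, 128-dim float32 features, and a pin-memory root cause.
import resource

NUM_NODES = 111_059_956   # ogbn-papers100M node count (assumed)
FEAT_DIM = 128            # feature dimension (assumed)
BYTES_PER_FEAT = 4        # float32

feature_bytes = NUM_NODES * FEAT_DIM * BYTES_PER_FEAT
cache_bytes = int(feature_bytes * 0.21)
print(f"full feature table: {feature_bytes / 2**30:.1f} GiB")
print(f"21% cache:          {cache_bytes / 2**30:.1f} GiB")

soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print(f"RLIMIT_MEMLOCK: soft={soft}, hard={hard}")
if soft != resource.RLIM_INFINITY and soft < feature_bytes:
    print("locked-memory limit is smaller than the feature table; "
          "try `ulimit -l unlimited` (or `--ulimit memlock=-1` for Docker) and rerun")
```

If the limit is already unlimited, the pin-memory theory is probably not the culprit and the call stack suggested later in the thread is the better lead.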

@molamooo
Contributor

Can you provide more information about your setup? E.g., the number of GPUs, GPU memory, batch size, the script and parameters you use, and how you generated the dataset.

@weihai-98
Author

Yeah, I use two RTX 3090 GPUs with 24GB of device memory each, one for sampling and one for training. The training batch size is 8000, the hidden dim is 64, and I generated the dataset with the code you provided (gnn_lab/utility/data-process/dataset/papers100M.ipynb). The script I run is /gnn_lab/example/samgraph/train_gcn.py.

@weihai-98
Author

My papers100M dataset was downloaded from https://snap.stanford.edu/ogb/data/nodeproppred/.

@molamooo
Contributor

Then 24GB of GPU memory should be enough for a 21% cache rate, since the entire feature table is around 55GB. We need more information to identify the root cause, e.g., the call stack when the error is raised. You could sleep for several seconds right after the trainer process launches, print its PID, and attach GDB to it.
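A minimal sketch of the suggested debugging hook. The placement (near the top of train_gcn.py, or wherever the trainer process starts) and the 30-second delay are arbitrary choices for illustration, not something the script already provides.

```python
# Debug hook sketch: print the trainer PID and pause so GDB can attach before the crash.
import os
import time

pid = os.getpid()
print(f"[trainer] pid={pid}; attach with: gdb -p {pid}", flush=True)
time.sleep(30)  # attach from another shell, set breakpoints, then `continue`
```

Once attached, a backtrace (`bt`) at the failed check should reveal which CUDA call in cpu_device.cc actually returns the error.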

@weihai-98
Author

Thanks! I will try to print more detailed information.
