
GCMC with 350M edges CUDA OOM #957

Closed
burness opened this issue Oct 26, 2019 · 3 comments

@burness

burness commented Oct 26, 2019

❓ Questions and Help

Hi DGLer,

I used your GCMC code to train on my dataset and built a graph as shown in the figure below.

[screenshot: graph statistics]

The graph is probably too large to fit on a single 1080 Ti card.

[screenshot: CUDA OOM error]

So I changed the ctx to use eight 1080 Ti cards, but I found that the code cannot use multiple cards even though it has a ctx config: only gpu0 is used in https://github.com/dmlc/dgl/blob/master/examples/mxnet/gcmc/train.py#L223.

Is there any multi-card or distributed solution?

@jermainewang jermainewang changed the title DCMC with 350M edges CUDA OOM GCMC with 350M edges CUDA OOM Oct 26, 2019
@jermainewang
Member

Hi @burness , the memory consumption of GCMC is mainly from storing the user and mlog features. You have 3.6M users and 67K mlogs. The default hidden size is 500, so storing the user features alone takes ~7GB. You also have 4 relation types, so that number needs to be multiplied by 4. Two possible solutions:

  • Use a smaller hidden size, though that might hurt model performance.
  • Use the mini-batch training approach from the author's implementation (https://github.com/riannevdberg/gc-mc). The idea is to load a subgraph from CPU to GPU. It would be great if you'd like to contribute this to DGL.
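The memory estimate above can be checked with quick arithmetic, assuming dense float32 features (4 bytes each) and the counts from the issue:

```python
# Back-of-envelope check of the GCMC feature-memory estimate.
# Counts come from the issue; float32 (4 bytes per value) is assumed.
num_users = 3_600_000
num_mlogs = 67_000
hidden_size = 500       # default gcn_agg_units-scale hidden size
num_relations = 4
bytes_per_float = 4

user_feat_gb = num_users * hidden_size * bytes_per_float / 1e9
total_gb = (num_users + num_mlogs) * hidden_size * bytes_per_float * num_relations / 1e9

print(f"user features alone: {user_feat_gb:.1f} GB")  # ~7.2 GB
print(f"all 4 relations:     {total_gb:.1f} GB")      # ~29.3 GB
```

Either number alone already exceeds the 11 GB on a single 1080 Ti, which matches the OOM you are seeing.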

@burness
Author

burness commented Oct 26, 2019

Hi @jermainewang

[screenshot: CPU training log]

I used the CPU to run GCMC but it's too slow. The previous version of my code had a bug where the user and mlog feature shapes were (user_num, user_num) and (mlog_num, mlog_num). After fixing this bug, it still OOMs.

Thanks for the solutions mentioned above. I will change gcn_agg_units and gcn_out_units to 100 and 15. And if time permits, I will try to implement the mini-batch GCMC.
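For anyone attempting the mini-batch version: the core loop is just iterating over shuffled edge IDs in fixed-size chunks, where each chunk induces a subgraph whose features get copied from CPU to GPU for one step. A minimal, framework-agnostic sketch with NumPy (the sampler only — wiring the chunks into DGL subgraphs and the training step is left out):

```python
import numpy as np

# Generic edge mini-batch sampler: shuffle all edge IDs once per epoch
# and yield fixed-size chunks. Each chunk would induce a subgraph whose
# node features are moved from CPU to GPU for a single training step.
def edge_minibatches(num_edges, batch_size, seed=0):
    rng = np.random.default_rng(seed)
    edge_ids = rng.permutation(num_edges)
    for start in range(0, num_edges, batch_size):
        yield edge_ids[start:start + batch_size]

# Example: with 350M edges and 1M-edge batches, one epoch is 350 steps.
batches = list(edge_minibatches(num_edges=10, batch_size=4))
```

Every edge is visited exactly once per epoch, so gradients are computed over the full edge set while GPU memory only has to hold one subgraph at a time.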

BTW, the data reader in the gcmc example is very slow. I changed some code to speed it up, but it still needs 40 minutes or more to read the data.
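As an aside on the slow reader: a common speedup is to replace per-line Python parsing with a vectorized pandas read. A minimal sketch, assuming a tab-separated ratings file with hypothetical columns user_id, mlog_id, rating (not the example's actual format):

```python
import io
import pandas as pd

# Hypothetical in-memory sample standing in for the real ratings file.
raw = io.StringIO("u1\tm1\t5\nu2\tm1\t3\nu2\tm2\t4\n")

# pandas' C-engine parser is typically far faster than looping over
# lines in Python; explicit dtypes avoid a second type-inference pass.
ratings = pd.read_csv(
    raw,
    sep="\t",
    names=["user_id", "mlog_id", "rating"],
    dtype={"user_id": str, "mlog_id": str, "rating": float},
)
print(len(ratings))  # 3
```

For repeated runs, caching the parsed arrays to a binary format (e.g. `np.save`) after the first read avoids re-parsing the text entirely.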

@burness
Author

burness commented Oct 27, 2019

[screenshot: OOM traceback after reducing hidden sizes]

After changing gcn_agg_units and gcn_out_units, it still OOMs on GPU, but the training time on CPU decreased.
