
GCMC with 350M edges CUDA OOM #957

Closed
burness opened this issue Oct 26, 2019 · 3 comments

@burness

burness commented Oct 26, 2019

❓ Questions and Help

Hi DGLer,

I used your GCMC code to train on my dataset and built a graph as shown in the figure below.

[screenshot: graph statistics]

The graph is probably too large to fit on a single 1080 Ti card.

[screenshot: CUDA OOM error]

So I changed the ctx to use eight 1080 Ti cards, but I found that the code cannot use multiple cards even though it has a ctx config: only gpu0 is used in https://github.com/dmlc/dgl/blob/master/examples/mxnet/gcmc/train.py#L223.

Is there any multi-card or distributed solution?

@jermainewang jermainewang changed the title DCMC with 350M edges CUDA OOM GCMC with 350M edges CUDA OOM Oct 26, 2019
@jermainewang
Member

Hi @burness , the memory consumption of GCMC is mainly from storing the user and mlog features. You have 3.6M users and 67K mlogs. The default hidden size is 500, so storing the user features alone takes ~7GB. You also have 4 relation types, so that number needs to be multiplied by 4. Two possible solutions:

  • Use a smaller hidden size, though that might hurt model performance.
  • Use the mini-batch training approach from the author's implementation (https://github.com/riannevdberg/gc-mc). The idea is to load a subgraph from CPU to GPU. It would be great if you'd like to contribute this to DGL.
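The memory estimate above can be checked with quick arithmetic, assuming dense float32 features (4 bytes each) and the counts from the issue:

```python
# Back-of-envelope check of the GCMC feature-memory estimate.
# Counts come from the issue; float32 (4 bytes per value) is assumed.
num_users = 3_600_000
num_mlogs = 67_000
hidden_size = 500       # default gcn_agg_units-scale hidden size
num_relations = 4
bytes_per_float = 4

user_feat_gb = num_users * hidden_size * bytes_per_float / 1e9
total_gb = (num_users + num_mlogs) * hidden_size * bytes_per_float * num_relations / 1e9

print(f"user features alone: {user_feat_gb:.1f} GB")  # ~7.2 GB
print(f"all 4 relations:     {total_gb:.1f} GB")      # ~29.3 GB
```

Either number alone already exceeds the 11 GB on a single 1080 Ti, which matches the OOM you are seeing.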

@burness
Author

burness commented Oct 26, 2019

Hi @jermainewang

[screenshot: CPU training log]

I used the CPU to run GCMC but it's too slow. The previous version of my code had a bug where the user and mlog feature shapes were (user_num, user_num) and (mlog_num, mlog_num). After fixing this bug, it still OOMs.

Thanks for the solutions mentioned above. I will change gcn_agg_units and gcn_out_units to 100 and 15. And if time permits, I will try to implement the mini-batch GCMC.
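For anyone attempting the mini-batch version: the core loop is just iterating over shuffled edge IDs in fixed-size chunks, where each chunk induces a subgraph whose features get copied from CPU to GPU for one step. A minimal, framework-agnostic sketch with NumPy (the sampler only — wiring the chunks into DGL subgraphs and the training step is left out):

```python
import numpy as np

# Generic edge mini-batch sampler: shuffle all edge IDs once per epoch
# and yield fixed-size chunks. Each chunk would induce a subgraph whose
# node features are moved from CPU to GPU for a single training step.
def edge_minibatches(num_edges, batch_size, seed=0):
    rng = np.random.default_rng(seed)
    edge_ids = rng.permutation(num_edges)
    for start in range(0, num_edges, batch_size):
        yield edge_ids[start:start + batch_size]

# Example: with 350M edges and 1M-edge batches, one epoch is 350 steps.
batches = list(edge_minibatches(num_edges=10, batch_size=4))
```

Every edge is visited exactly once per epoch, so gradients are computed over the full edge set while GPU memory only has to hold one subgraph at a time.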

BTW, the data reader in the gcmc example is very slow. I changed some code to speed it up, but it still needs 40 minutes or more to read the data.
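As an aside on the slow reader: a common speedup is to replace per-line Python parsing with a vectorized pandas read. A minimal sketch, assuming a tab-separated ratings file with hypothetical columns user_id, mlog_id, rating (not the example's actual format):

```python
import io
import pandas as pd

# Hypothetical in-memory sample standing in for the real ratings file.
raw = io.StringIO("u1\tm1\t5\nu2\tm1\t3\nu2\tm2\t4\n")

# pandas' C-engine parser is typically far faster than looping over
# lines in Python; explicit dtypes avoid a second type-inference pass.
ratings = pd.read_csv(
    raw,
    sep="\t",
    names=["user_id", "mlog_id", "rating"],
    dtype={"user_id": str, "mlog_id": str, "rating": float},
)
print(len(ratings))  # 3
```

For repeated runs, caching the parsed arrays to a binary format (e.g. `np.save`) after the first read avoids re-parsing the text entirely.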

@burness
Author

burness commented Oct 27, 2019

[screenshot: OOM traceback after reducing hidden sizes]

After changing gcn_agg_units and gcn_out_units, it still OOMs on GPU, but the training time on CPU decreased.
