[Feature Request] Improve metapath_reachable_graph for large graphs #2248
Comments
Sounds like a generalization of PinSage. PinSage basically samples some neighbors on a
I use
@BarclayII I was planning to implement something similar using
@v2psv What you suggested is similar to the second alternative I proposed, but you don't make use of heterogeneity and metapath information when performing subgraph sampling. I'm not sure whether the sampled subgraph will maintain semantic information or will be biased toward the edge types that have more edges. It would be nice to try this out on large datasets like MAG and OAG.
Closed due to no further discussion
Hi @lingfanyu, have you tried mini-batch training HAN on ogbn-mag? Does it work?
I had the same problem when building a 3-hop metapath graph for the mag dataset using this API. The build is too slow and gets stuck. Does this API perform better now?
When I use this function, it runs out of memory. The error says: unable to allocate 538 GiB for an array with shape.... and data type int64. Could you solve this?
🚀 Feature
I suggest the following support for dgl.metapath_reachable_graph:
Motivation
I am trying to run Heterogeneous Graph Attention Network (HAN) on large graphs. The example HAN implementation in DGL works perfectly on small graphs like ACM (only 4k paper nodes). However, it is not easy to scale to larger datasets like ogbn-mag.
The problem
HAN defines a node's neighborhood based on metapaths. However, this step makes the graph a lot denser. Take the ACM dataset as an example: there are about 4k papers, 70 fields, and 4k paper-field edges in the graph, yet the reachable graph of the metapath paper-field-paper has about 2.2M edges.
Large datasets have millions of nodes and more edge types (resulting in more potential metapaths), so performing metapath_reachable_graph on the full graph does not scale.
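To make the densification concrete, here is a minimal sketch in plain Python (no DGL) that counts the paper-field-paper walks implied by a list of paper-field edges without materializing them. The function name and the toy edge list are hypothetical, chosen only for illustration. The count includes self-pairs and counts a pair once per shared field, so it is an upper bound on the number of distinct edges in the reachable graph; it is exact when every paper has a single field.

```python
from collections import Counter


def count_pfp_walks(paper_field_edges):
    """Count walks along paper-field-paper (hypothetical helper).

    Each field with n papers contributes n * n (paper, paper) walks,
    so the total grows quadratically with the size of popular fields.
    """
    papers_per_field = Counter(field for _, field in paper_field_edges)
    return sum(n * n for n in papers_per_field.values())


# Toy data: 6 papers, 2 fields, one field per paper.
edges = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), (5, 1)]
num_walks = count_pfp_walks(edges)
print(len(edges), num_walks)  # 6 paper-field edges -> 4*4 + 2*2 = 20 walks
```

A few very popular intermediate nodes dominate this sum, which is one way a few thousand paper-field edges can turn into millions of paper-paper edges.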
Pitch
I want two things to happen if this feature is added:
Alternatives
One alternative way I can think of to implement what I need using DGL's current API is to use dgl.sampling.random_walk. For a mini-batch of nodes, it might be equivalent to performing dgl.sampling.random_walk(g, nodes, metapath) K times. Then I can take only the last step of the returned paths to form a size-K neighborhood for each node in the batch.
Another way (less relevant to the requested feature) to scale HAN to large graphs is to design a subgraph sampling approach that is aware of the heterogeneity of the graph and the user-specified metapaths. HAN can then run on the much smaller sampled heterogeneous subgraph.
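The random-walk alternative can be sketched in plain Python as follows. This is an illustration under stated assumptions, not DGL code: the graph is a hypothetical dict mapping each edge type to an adjacency dict, and sample_metapath_neighbors is a made-up helper name. With DGL, the inner loop would presumably be replaced by repeated calls to dgl.sampling.random_walk, keeping only the last node of each returned trace.

```python
import random


def sample_metapath_neighbors(adj_by_etype, nodes, metapath, k, seed=None):
    """Sample up to k metapath-reachable neighbors per seed node.

    adj_by_etype: {edge_type: {src: [dst, ...]}}  (hypothetical layout)
    metapath: list of edge types to follow in order
    Returns {seed: set of end nodes reached by k random walks}.
    """
    rng = random.Random(seed)
    neighbors = {v: set() for v in nodes}
    for v in nodes:
        for _ in range(k):
            cur = v
            for etype in metapath:
                successors = adj_by_etype[etype].get(cur, [])
                if not successors:
                    cur = None  # walk terminated early; discard it
                    break
                cur = rng.choice(successors)
            if cur is not None:
                neighbors[v].add(cur)  # keep only the last step of the walk
    return neighbors


# Toy graph: papers 0 and 1 share field 0; paper 2 is alone in field 1.
adj = {
    "paper-field": {0: [0], 1: [0], 2: [1]},
    "field-paper": {0: [0, 1], 1: [2]},
}
sampled = sample_metapath_neighbors(
    adj, nodes=[0, 2], metapath=["paper-field", "field-paper"], k=5, seed=0
)
```

Because the K walks are drawn independently, duplicates collapse and a node may end up with fewer than K distinct neighbors; the cost per batch depends on K and the metapath length rather than on the density of the full reachable graph.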
@jermainewang @BarclayII @mufeili