Make graph compaction handle isolated nodes #1266
Comments
Solution is to add an optional
Even if you preserve the isolated nodes (as seed nodes), they still cannot get their embeddings in training through message passing. Will this cause problems in models that concatenate embeddings layer by layer?
Isolated nodes indeed do not get anything from message passing. In the case of GraphSAGE, it naturally reduces to an MLP for the mean and LSTM aggregators according to the formulation (our implementation of the max aggregator currently returns
This actually throws a wrench into heterogeneous graph training, especially if a sampling algorithm samples only some of the relations (e.g. MEIRec). Which edge type would you assign it to?
What I mean by 'concatenate embeddings layer by layer' is that the embeddings from each frontier are concatenated together. You can refer to nn.pytorch.conv.tagconv as an example.
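To make the concern concrete, here is a small pure-Python illustration (not DGL code; `mean_neighbors` and `tag_style_concat` are invented names) of TAGConv-style layer-wise concatenation, where each hop's aggregated features are appended to the node's own features. For an isolated node every hop contributes only zeros:

```python
def mean_neighbors(adj, feats, v):
    """Mean of in-neighbor features; zero vector if there are none."""
    nbrs = adj.get(v, [])
    dim = len(feats[v])
    if not nbrs:
        return [0.0] * dim
    return [sum(feats[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]

def tag_style_concat(adj, feats, v, hops=2):
    """Concatenate a node's own features with each hop's aggregate."""
    out = list(feats[v])
    cur = {u: list(f) for u, f in feats.items()}
    for _ in range(hops):
        nxt = {u: mean_neighbors(adj, cur, u) for u in cur}
        out.extend(nxt[v])
        cur = nxt
    return out

adj = {1: [0], 2: []}                  # node 2 has no in-edges (isolated)
feats = {0: [1.0], 1: [2.0], 2: [3.0]}
print(tag_style_concat(adj, feats, 1))  # [2.0, 1.0, 0.0]
print(tag_style_concat(adj, feats, 2))  # [3.0, 0.0, 0.0]: zeros past hop 0
```

The isolated node still produces a well-defined output here, but the zero slots are exactly the degenerate case the comment above is asking about.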
This is a problem for self_loop. It seems there is no silver bullet in this case.
🚀 Feature
Make graph compaction keep a given set of isolated nodes.
Motivation
In the future, our recommended way of performing minibatch training for node classification is shown in the following pseudocode (see the document in #1199 for a more complete explanation):
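The original pseudocode block did not survive in this copy; the following is a rough runnable sketch of such a pipeline under assumed names (`sample_frontiers`, `compact_nodes`, and the toy adjacency dict are illustrative stand-ins, not the actual DGL API):

```python
def sample_frontiers(adj, seeds, hops=1):
    """Full in-neighbor 'sampling': one frontier (edge list) per hop."""
    frontiers, cur = [], set(seeds)
    for _ in range(hops):
        edges = [(u, v) for v in cur for u in adj.get(v, [])]
        frontiers.append(edges)
        cur = {u for u, _ in edges}
    return frontiers

def compact_nodes(frontiers):
    """Current compaction behavior: keep only nodes touching an edge."""
    return {n for edges in frontiers for e in edges for n in e}

adj = {1: [0], 2: []}              # node 2 has no inbound edges
seeds = [1, 2]                     # one minibatch of seed nodes
frontiers = sample_frontiers(adj, seeds)
nodes = compact_nodes(frontiers)
# gather features for `nodes`, run message passing, compute loss...
print(sorted(nodes))               # [0, 1]: isolated seed 2 vanishes
```

This reproduces the failure mode described below: the isolated seed node is silently dropped before the model ever sees it.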
If the seed nodes contain isolated nodes (i.e. those with no inbound edges), then those seed nodes would actually be removed from the sampled frontiers in compact_graphs. The consequence is that those isolated nodes would never be trained in node classification by the pipeline above.

Note that link prediction does not suffer from this problem. We recommend constructing a pair graph with edges connecting the positive and negative pairs respectively, and compacting the pair graphs and frontiers together. Therefore even the isolated nodes would have at least one edge in one of the pair graphs, and would not be removed during graph compaction.
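A minimal sketch of the proposed behavior, with a hypothetical optional `always_preserve` argument (the name and signature are illustrative, not a committed API):

```python
def compact(edges, always_preserve=()):
    """Keep only nodes incident to a sampled edge, plus any nodes
    explicitly preserved (the optional argument this issue proposes)."""
    nodes = ({u for u, _ in edges}
             | {v for _, v in edges}
             | set(always_preserve))
    return sorted(nodes)

edges = [(0, 1)]                            # sampled frontier; seed 2 is isolated
print(compact(edges))                       # [0, 1]: seed 2 dropped
print(compact(edges, always_preserve=[2]))  # [0, 1, 2]: seed 2 kept
```

Passing the seed nodes through such an argument would let isolated seeds survive compaction and receive gradient updates.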
Alternatives
We can technically ignore the isolated nodes during training. It is not clear how ignoring those examples would impact performance on current benchmarks, but if a GNN model did fail to beat a baseline model on a dataset, it would be hard to determine if the performance loss is due to discarding the isolated nodes.
We can also work around it by manually adding self loops for isolated nodes, but this would introduce other subtleties, such as assigning an edge type to such self loops, and changing the formulation of GraphSAGE and other GNNs for such corner cases (although the papers don't explicitly say how to handle isolated nodes anyway).
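The self-loop workaround can be sketched in plain Python (`add_self_loops_for_isolated` is an invented helper, not a DGL function):

```python
def add_self_loops_for_isolated(edges, nodes):
    """Workaround: give every node without an in-edge a self loop.
    On a heterogeneous graph, each added loop would also need an edge
    type assigned to it, which is the subtlety noted above."""
    has_in = {v for _, v in edges}
    return edges + [(v, v) for v in nodes if v not in has_in]

edges = [(0, 1)]
print(add_self_loops_for_isolated(edges, [0, 1, 2]))
# [(0, 1), (0, 0), (2, 2)] — node 0 also has no in-edge here
```

Note that this changes the message-passing semantics for every node it touches, not just the isolated ones it was meant to rescue.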
Pitch
Handle training of isolated nodes in the same minibatch training pipeline.