[Doc] Improve Documentation for Message Passing and Computing #357

Open
mufeili opened this Issue Jan 15, 2019 · 8 comments

Member

mufeili commented Jan 15, 2019

📚 Documentation

This issue concerns the API documentation for message passing and is motivated by a discussion thread. Briefly, when we call update_all, or apply recv/pull to all nodes, any node without incoming edges receives a zero placeholder, which is likely to overwrite its original node features. This is subtle and can badly hurt a model if not handled properly. Ideally, we would also add a convenient way for users to check automatically whether a graph is completely connected. What would be a good way to notify users about this?
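As a rough illustration of the kind of check mentioned above, here is a minimal pure-Python sketch that lists the nodes with no incoming edges. It does not use DGL at all: the helper name `zero_in_degree_nodes` and the edge-list representation are assumptions made for this example only.

```python
def zero_in_degree_nodes(num_nodes, edges):
    """Return the IDs of nodes that have no incoming edge, in ascending order.

    `edges` is an iterable of (src, dst) pairs over node IDs 0..num_nodes-1.
    Hypothetical helper for illustration; not part of any library.
    """
    has_incoming = [False] * num_nodes
    for _src, dst in edges:
        has_incoming[dst] = True
    return [node for node, flag in enumerate(has_incoming) if not flag]


# Toy directed graph: 0 -> 1, 0 -> 2, 1 -> 2. Node 0 receives nothing.
edges = [(0, 1), (0, 2), (1, 2)]
isolated = zero_in_degree_nodes(3, edges)
print(isolated)  # [0]
```

A non-empty result means at least one node would be affected by the zero-placeholder behavior described above.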


wll199566 commented Jan 15, 2019

Hi,

I can suggest two ways to do this from my experience of learning and using DGL.

  1. Add related examples and a notice to the tutorial. I assume most users, like me, start learning DGL from the tutorial you provide, so we would be alerted the first time we see the notice. If it is not too much trouble, I think this is a good choice.

  2. Meanwhile, adding a warning like the one for the zero initializer is another choice, if that is possible once you add some code to check the complete connectivity of the graph. Careful coders can then think about how to avoid the warning and figure out the difference between the command they used and the one they should use.

Thanks so much again for such a convenient and awesome graph computing framework!!
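To make the second suggestion concrete, a warning of this kind could look roughly like the sketch below. It uses only Python's standard `warnings` module. The function name `warn_on_zero_in_degree` and the message text are made up for illustration and are not DGL's actual warning.

```python
import warnings


def warn_on_zero_in_degree(num_nodes, edges):
    """Warn if any node has no incoming edge; return the affected node IDs.

    Hypothetical helper for illustration; `edges` holds (src, dst) pairs.
    """
    receivers = {dst for _src, dst in edges}
    missing = [n for n in range(num_nodes) if n not in receivers]
    if missing:
        warnings.warn(
            f"{len(missing)} node(s) have no incoming edges: {missing}. "
            "Their features may be replaced by a zero placeholder.",
            UserWarning,
            stacklevel=2,
        )
    return missing


# Node 0 has no incoming edge, so this call emits a UserWarning.
warn_on_zero_in_degree(3, [(0, 1), (1, 2)])
```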

Member

jermainewang commented Jan 16, 2019

There is a paragraph about zero-degree behavior in the docstring of recv. It's buried a little deep, though.

Member Author

mufeili commented Jan 16, 2019

@jermainewang We should probably still highlight it somewhere else, as newcomers may use update_all directly without paying attention to recv?

Member

jermainewang commented Jan 22, 2019

@mufeili Agreed. How about adding something like "For nodes with no incoming messages, the reduce function will be skipped. See recv (give link) for more details." to the docstrings of update_all and pull?

Member Author

mufeili commented Jan 22, 2019

@jermainewang Sure. But I think the emphasis should be on the fact that if update_all and pull are applied to nodes with no incoming messages, their node features will be overwritten, which is very likely undesired. In that case users should use send_and_recv instead.
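The difference can be illustrated with a toy simulation in plain Python. This is not DGL code and says nothing about DGL's internals. It only mimics the behavior described in this comment: nodes without messages are overwritten with a zero placeholder. It then contrasts that with updating only the nodes that actually received a message. The function `sum_incoming` and its `only_receivers` flag are hypothetical names.

```python
def sum_incoming(features, edges, only_receivers):
    """Toy reduce step: a node's new feature is the sum of its in-neighbors' features.

    only_receivers=False imitates the behavior described in this thread:
        nodes without incoming messages get a zero placeholder.
    only_receivers=True updates only the nodes that received a message.
    """
    mailbox = {}
    for src, dst in edges:
        mailbox.setdefault(dst, []).append(features[src])

    updated = list(features)
    for node in range(len(features)):
        if node in mailbox:
            updated[node] = sum(mailbox[node])
        elif not only_receivers:
            updated[node] = 0.0  # zero placeholder overwrites the old value
    return updated


features = [1.0, 2.0, 3.0]
edges = [(0, 1), (0, 2), (1, 2)]  # node 0 has no incoming edge

print(sum_incoming(features, edges, only_receivers=False))  # [0.0, 1.0, 3.0]
print(sum_incoming(features, edges, only_receivers=True))   # [1.0, 1.0, 3.0]
```

In the first call node 0 loses its original feature 1.0. In the second call it keeps it, because only nodes with messages are touched.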

Contributor

hbsun2113 commented Feb 17, 2019

image
Could you explain the picture in the PageRank tutorial? I am confused about the types of data and lines. @mufeili

Member Author

mufeili commented Feb 17, 2019

Given a directed edge (i, j), let h_i and h_j be the node features of i and j, and let a_ij be the feature of the edge. h_i, h_j and a_ij are kept in edge.src.data, edge.dst.data and edge.data, respectively, in the figure above.

We want to update h_j based on h_i and a_ij. The whole process proceeds as follows:

  1. The source node i sends a message computed as f(h_i, a_ij) for some function f, and the message is stored in the mailbox of the destination node.
  2. The destination node j may have received multiple messages from different source nodes, and it applies a reduce function to all of them (e.g. averaging over all messages received). The computed result is then used as the new h_j.
  3. Finally, an apply function may be applied to the new h_j (e.g. a ReLU activation function).
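The three steps above can be sketched in plain Python as follows. This is only an illustration with scalar features and hypothetical names (`message_passing_step`, `message_fn`, `reduce_fn`, `apply_fn`). It is not how DGL implements message passing, and in this sketch nodes that receive no message simply keep their old feature.

```python
def message_passing_step(node_feats, edges, edge_feats, message_fn, reduce_fn, apply_fn):
    """One round of message passing on a directed graph (illustrative only).

    edges[k] = (i, j) is a directed edge whose feature is edge_feats[k].
    Nodes that receive no message keep their old feature in this sketch.
    """
    # Step 1: each source node i sends f(h_i, a_ij) to the mailbox of j.
    mailbox = {}
    for (i, j), a_ij in zip(edges, edge_feats):
        mailbox.setdefault(j, []).append(message_fn(node_feats[i], a_ij))

    new_feats = list(node_feats)
    for j, messages in mailbox.items():
        # Step 2: reduce all messages received by j into the new h_j.
        h_j = reduce_fn(messages)
        # Step 3: apply a final function to the new h_j.
        new_feats[j] = apply_fn(h_j)
    return new_feats


node_feats = [1.0, -2.0, 3.0]
edges = [(0, 2), (1, 2), (0, 1)]
edge_feats = [0.5, 1.0, 2.0]

result = message_passing_step(
    node_feats,
    edges,
    edge_feats,
    message_fn=lambda h_i, a_ij: h_i * a_ij,       # f(h_i, a_ij)
    reduce_fn=lambda msgs: sum(msgs) / len(msgs),  # average of the messages
    apply_fn=lambda h: max(h, 0.0),                # ReLU
)
print(result)  # [1.0, 2.0, 0.0]
```

Node 2 receives 0.5 and -2.0, averages them to -0.75, and ReLU clips that to 0.0. Node 1 receives a single message 2.0. Node 0 receives nothing and is left unchanged.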

I'm also cc'ing @VoVAllen, who drew the figure, in case he wants to add anything.

Collaborator

VoVAllen commented Feb 18, 2019

The orange lines show the output (destination) of the user-defined functions, and the blue lines show their input.
