New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Users should not make assumption in the batch dimension. #346

Closed
rmurphy2718 opened this Issue Jan 8, 2019 · 4 comments

Comments

Projects
None yet
3 participants
@rmurphy2718
Copy link

rmurphy2718 commented Jan 8, 2019

馃摎 Documentation

In the API reference for "user-defined function related data structures", you write

"Note: the size of the batch dimension is determined by the DGL framework for good efficiency and small memory footprint. Users should not make assumption in the batch dimension."

This makes me very nervous, because it sounds like we never know what to expect the input to look like when writing the Edge UDF or Node UDF. For example, in the Karate tutorial with GCN, you use torch.sum(nodes.mailbox['msg'], dim=1). Your quote above makes it sound like we need to choose the dimension of summation (dim=1) on a case-by-case basis.

Could you please elaborate on what is constant whenever a node/edge UDF is called?

Thanks!

@jermainewang

This comment has been minimized.

Copy link
Member

jermainewang commented Jan 8, 2019

This looks like a general question rather than a document improvement. Please use our discussion forum so your question could help the others.

There are two things here.

First, how many dimensions of the tensors and what are their semantics? This is fixed and will not change in the future. To elaborate, nodes.data['feat'] always returns a tensor of shape (B, D1, D2, ...), where the first dimension is the batch dimension (the number of nodes in this batch) and (D1, D2, ...) is the feature shape. Likewise, edges.data['feat'] and edges.src['feat'] return tensors of shape (B, D1, D2, ...), but here B is the number of edges in this batch. For nodes.mailbox['msg'], it always return tensor of shape (B, deg, D1, D2, ...), where the second dimension is the number of in-coming messages (thus, the degree) for the nodes in the current batch. All these semantics are stable, so if messages are to be aggregated, dim=1 is always the correct answer.

Second, what is the exact value of the B dimension? Here is what we mean by "Users should not make assumption in the batch dimension". In a word, it is flexible. We might choose batching 10 nodes or 20 nodes together due to efficiency concern. That's why it is not a good idea to make assumption on that. For example, normalizing/aggregating on the batch dimension will likely to fail and result in some unexpected behaviors.

@BarclayII

This comment has been minimized.

Copy link
Collaborator

BarclayII commented Jan 8, 2019

The features in edges.src, edges.dst and edges.data of Edge UDF, as well as nodes.data of Node UDF, are always stacked along the first dimension (i.e. dim=0). The statement you quoted actually means that one should not make any assumption on how the features are stacked (e.g. how many nodes/edges are stacked at a time, what is the order of stacking, etc.)

For message receiving, incoming messages to all the nodes are stored in nodes.mailbox of Node UDFs. The messages are always stacked along the second dimension for different incoming messages of a single node, and along the first dimension for different nodes. Again, one should not assume the number of nodes stacked at a time, the order of stacking, etc. It does make sure that all nodes at a time will have the same number of incoming messages (hence being able to stack along the second dimension into a dense tensor), though.

That being said, I personally found that the documentation (in particular the tutorials) about features/messages stacking is somewhat scattered and unclear. We need to refine that.

@rmurphy2718

This comment has been minimized.

Copy link
Author

rmurphy2718 commented Jan 8, 2019

Thank you for your quick and helpful response! I'm sorry for posting this in the wrong place, I will use the discussion forum in the future and also include a link to this post in the forums so others can find it.

@jermainewang

This comment has been minimized.

Copy link
Member

jermainewang commented Jan 8, 2019

Yeah, @BarclayII please go ahead and fix that. Also @rmurphy2718 thanks for the question so we could fix the vagueness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment