Question about the sampling code #9

Open
chocolate9624 opened this issue Jan 4, 2021 · 2 comments

Comments

@chocolate9624

chocolate9624 commented Jan 4, 2021

In line #L517, the nodes of the lowest layer are treated as the input nodes of the GCN. This implies that the lowest layer contains all the nodes in the sampled subgraph, which is not always true.
For example,


layer1: 4 -> 2, 5 -> 2
layer2: 2 -> 1, 3 -> 1
layer3: 1 -> 0


The lowest layer contains nodes (4, 5, 2), the middle layer contains nodes (2, 3, 1), and the top layer contains nodes (1, 0). In your code, the features of nodes 1 and 3 are lost.
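To make the example concrete, here is a small sketch that encodes the three layers above as (source, target) edge lists and checks which nodes of the sampled subgraph never appear in the lowest layer. The edge-list encoding and all variable names are assumptions for illustration; they are not taken from the LADIES code.

```python
# Hypothetical encoding of the example: each layer is a list of (source, target)
# edges, ordered from the lowest layer (layer1) to the top layer (layer3).
layers = [
    [(4, 2), (5, 2)],  # layer1
    [(2, 1), (3, 1)],  # layer2
    [(1, 0)],          # layer3
]

def layer_nodes(edges):
    """Return every node (source or target) that appears in one layer."""
    return {n for edge in edges for n in edge}

# All nodes that appear anywhere in the sampled subgraph.
subgraph_nodes = set().union(*(layer_nodes(e) for e in layers))

# Nodes covered by the lowest layer (its sources and its targets).
lowest_layer_nodes = layer_nodes(layers[0])

# Nodes of the subgraph that the lowest layer does not contain.
missing = sorted(subgraph_nodes - lowest_layer_nodes)
print(missing)  # [0, 1, 3]
```

Apart from the top-layer output node 0, the result lists exactly nodes 1 and 3, the two nodes named above.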

@acbull
Owner

acbull commented Jan 5, 2021

Sorry, which line are you referring to?

We don't assume the lowest layer contains all the nodes, because we are conducting layer-wise sampling. For each GNN layer, the source nodes and target nodes can be different, and they are stored as a layer-specific adjacency matrix (LADIES/pytorch_ladies.py, lines 117 to 120 in 303036e):

adj = row_norm(U[: , after_nodes].multiply(1/p[after_nodes]))
# Turn the sampled adjacency matrix into a sparse matrix. If implemented by PyG
# This sparse matrix can also provide index and value.
adjs += [sparse_mx_to_torch_sparse_tensor(row_normalize(adj))]

What you describe is closer to graph-wise sampling: sampling a subgraph and then applying the GNN to the whole sampled graph, which is different from our setting.
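A minimal sketch of the layer-wise setting described above, assuming each layer aggregates with a row-normalized (targets x sources) matrix via `h = adj @ h` and leaving out weights and nonlinearities. The helper `layer_adj`, the `layers` dictionaries and the toy `feat_data` are hypothetical and are not LADIES code. The sketch shows two mechanical facts: input features are sliced only by the lowest layer's source nodes, and the matrix chain only lines up when each layer's target nodes match the next layer's source nodes in the same order.

```python
import numpy as np

def layer_adj(edges, sources, targets):
    """Row-normalized (len(targets) x len(sources)) matrix from (source, target) edges."""
    col = {n: j for j, n in enumerate(sources)}
    row = {n: i for i, n in enumerate(targets)}
    adj = np.zeros((len(targets), len(sources)))
    for s, t in edges:
        adj[row[t], col[s]] = 1.0
    deg = adj.sum(axis=1, keepdims=True)
    # Rows with no sampled neighbours stay all-zero instead of dividing by zero.
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

# Hypothetical toy data: 6 nodes with 2 features each (row i = node i).
feat_data = np.arange(12, dtype=float).reshape(6, 2)

# Hypothetical layer-wise sample, listed bottom-up. Each layer has its own
# source nodes (matrix columns) and target nodes (matrix rows).
layers = [
    {"sources": [2, 3, 4, 5], "targets": [2, 3],
     "edges": [(2, 2), (4, 2), (5, 2), (3, 3)]},
    {"sources": [2, 3], "targets": [1],
     "edges": [(2, 1), (3, 1)]},
]

# Row i of h after one layer is consumed as column i of the next matrix,
# so each layer's targets must equal the next layer's sources, in order.
for lower, upper in zip(layers, layers[1:]):
    assert lower["targets"] == upper["sources"]

# Input features are sliced only by the lowest layer's source nodes.
input_nodes = layers[0]["sources"]
h = feat_data[input_nodes]
for layer in layers:
    adj = layer_adj(layer["edges"], layer["sources"], layer["targets"])
    h = adj @ h  # one row per target node of this layer

print(h)  # [[6.66666667 7.66666667]]
```

Note that in this toy sample node 3 is kept as a target of the lower layer because the upper layer uses it as a source; without that, the shapes in the chain would not match.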

@chocolate9624
Author

Thanks for your quick response!

return adjs, previous_nodes, batch_nodes

This line returns 'previous_nodes', which holds the nodes of the lowest layer.

LADIES/pytorch_ladies.py, lines 241 to 246 in c10b526:

for adjs, input_nodes, output_nodes in train_data:
    adjs = package_mxl(adjs, device)
    optimizer.zero_grad()
    t1 = time.time()
    susage.train()
    output = susage.forward(feat_data[input_nodes], adjs)

In these lines, 'previous_nodes' is received as 'input_nodes', and these nodes are used to slice the node features for the sampled subgraph.
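The slicing step can be illustrated with numpy's integer-array indexing, assuming `feat_data` is indexed by node id so that row i holds the features of node i (a PyTorch tensor indexed with an integer array selects rows the same way). The toy feature values are hypothetical, and `input_nodes` uses the lowest-layer nodes (2, 4, 5) from the example in the first comment.

```python
import numpy as np

# Hypothetical toy features: row i holds the features of node i (6 nodes, 2 features).
feat_data = np.arange(12, dtype=float).reshape(6, 2)

# Lowest-layer nodes from the example, playing the role of 'input_nodes'.
input_nodes = np.array([2, 4, 5])

# Integer-array indexing keeps only the listed rows, in the listed order.
x = feat_data[input_nodes]

print(x.shape)     # (3, 2)
print(x.tolist())  # [[4.0, 5.0], [8.0, 9.0], [10.0, 11.0]]
```

Only rows 2, 4 and 5 reach the model here; the rows for nodes 1 and 3 (`[2.0, 3.0]` and `[6.0, 7.0]`) are not part of `x`.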

I think 'previous_nodes' from the lowest layer may not contain all the nodes in the sampled subgraph, so the features of some nodes are never used.

Thanks!
