Question about the sampling code #9

chocolate9624 · 2021-01-04T08:10:17Z

In line #L517, the nodes of the lowest layer are treated as the input nodes for the GCN. This suggests that the lowest layer contains all the nodes in the sampled subgraph. However, this is not always true.
For example:

layer1: 4->2, 5->2
layer2: 2->1, 3->1
layer3: 1->0

The lowest layer contains nodes (4, 5, 2), the middle layer contains nodes (2, 3, 1), and the top layer contains nodes (1, 0). In your code, the features for nodes 1 and 3 are lost.
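To make the example concrete, here is a minimal sketch that lists the nodes touched by each layer and the nodes that never appear in the lowest layer. The variable and function names (`layers`, `nodes_in_layer`) are made up for illustration and are not taken from the repository. Note that the set difference also reports node 0, which is the final target of the top layer.

```python
# Edges of the example above, written as (source, target), lowest layer first.
layers = [
    [(4, 2), (5, 2)],  # layer1
    [(2, 1), (3, 1)],  # layer2
    [(1, 0)],          # layer3
]


def nodes_in_layer(edges):
    """Return every node that appears as a source or a target in one layer."""
    return {node for edge in edges for node in edge}


per_layer = [nodes_in_layer(edges) for edges in layers]

# All nodes that occur anywhere in the sampled structure.
all_nodes = set().union(*per_layer)

# Nodes that are part of the sample but absent from the lowest layer.
missing_from_lowest = sorted(all_nodes - per_layer[0])

print(per_layer)
print(missing_from_lowest)
```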

acbull · 2021-01-05T03:55:14Z

Sorry, which line are you referring to?

We don't assume the lowest layer contains all the nodes, since we are conducting layer-wise sampling. For each GNN layer, the source nodes and target nodes can be different; each layer's connectivity is stored as its own adjacency matrix in:

LADIES/pytorch_ladies.py

Lines 117 to 120 in 303036e

    adj = row_norm(U[: , after_nodes].multiply(1/p[after_nodes]))
    #     Turn the sampled adjacency matrix into a sparse matrix. If implemented by PyG
    #     This sparse matrix can also provide index and value.
    adjs += [sparse_mx_to_torch_sparse_tensor(row_normalize(adj))]

What you describe is closer to graph-wise sampling, which samples a subgraph and then applies the GNN to the whole sampled graph. That is different from our setting.
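The difference can be illustrated with a small sketch of layer-wise propagation, assuming one rectangular adjacency matrix per layer whose rows are that layer's target nodes and whose columns are its source nodes. The node ids, feature values, and the `matmul` helper are all invented for illustration, and learned weights and nonlinearities are left out. This is not the repository's implementation.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical sampled node sets, lowest layer first.
input_nodes = [3, 4, 5]   # sources of the first GNN layer
hidden_nodes = [1, 2]     # targets of the first layer, sources of the second
output_nodes = [0]        # targets of the second layer (the batch)

# One row-normalized adjacency per layer: rows = targets, columns = sources.
adj_1 = [[0.5, 0.5, 0.0],  # node 1 aggregates nodes 3 and 4
         [0.0, 0.5, 0.5]]  # node 2 aggregates nodes 4 and 5
adj_2 = [[0.5, 0.5]]       # node 0 aggregates nodes 1 and 2

# Made-up raw features; only the first layer's source nodes are looked up.
features = {3: [1.0, 0.0], 4: [0.0, 1.0], 5: [1.0, 1.0]}
x = [features[n] for n in input_nodes]

h = matmul(adj_1, x)    # hidden states, one row per node in hidden_nodes
out = matmul(adj_2, h)  # outputs, one row per node in output_nodes

print(h)
print(out)
```

In this sketch, raw features are read only for `input_nodes`. The second layer consumes the hidden states produced by the first layer, so it never looks up raw features for its own source nodes.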

chocolate9624 · 2021-01-07T12:48:18Z

Thanks for your quick response!

LADIES/pytorch_ladies.py

Line 157 in c10b526

return adjs, previous_nodes, batch_nodes

This line returns 'previous_nodes', which are the nodes from the lowest layer.

LADIES/pytorch_ladies.py

Lines 241 to 246 in c10b526

    
    for adjs, input_nodes, output_nodes in train_data:
        adjs = package_mxl(adjs, device)
        optimizer.zero_grad()
        t1 = time.time()
        susage.train()
        output = susage.forward(feat_data[input_nodes], adjs)

In these lines, 'previous_nodes' is renamed to 'input_nodes', and these nodes are used to slice the node features for the sampled subgraph.

I think 'previous_nodes' from the lowest layer may not contain all the nodes in the sampled subgraph, so the features of some nodes are never used.

Thanks!
