kent edited this page Nov 28, 2019 · 2 revisions

Introduction

Common neighbor calculation algorithm: given two nodes in the network, it computes the number of neighbors they share and can optionally output the list of those common neighbors.
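For intuition, the count for a single pair can be computed by merging two sorted neighbor lists. The following is a minimal sequential sketch in C++. It assumes each vertex's neighbors are stored as a sorted, duplicate-free std::vector<uint32_t>. NeighborList, common_neighbors, and common_count are hypothetical names, not Plato's API, and Plato's distributed implementation may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Sorted, duplicate-free neighbor list of one vertex (hypothetical layout,
// not Plato's internal graph representation).
using NeighborList = std::vector<uint32_t>;

// Returns the common neighbors of two vertices by intersecting their sorted
// neighbor lists in O(|a| + |b|) time.
NeighborList common_neighbors(const NeighborList& a, const NeighborList& b) {
  assert(std::is_sorted(a.begin(), a.end()));
  assert(std::is_sorted(b.begin(), b.end()));
  NeighborList out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
  return out;
}

// Counts the common neighbors without materializing the list, which is all
// the default 'src,dst,common_cnt' output needs.
std::size_t common_count(const NeighborList& a, const NeighborList& b) {
  std::size_t i = 0, j = 0, cnt = 0;
  while (i < a.size() && j < b.size()) {
    if (a[i] < b[j]) {
      ++i;
    } else if (b[j] < a[i]) {
      ++j;
    } else {
      ++cnt;  // same neighbor id in both lists
      ++i;
      ++j;
    }
  }
  return cnt;
}
```

With a = {1, 3, 5, 7} and b = {3, 4, 5, 8}, common_count returns 2 and common_neighbors returns {3, 5}. The two-pointer merge avoids hashing and runs in linear time once the lists are sorted.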

Parameters

  • --thread: Number of threads.
  • --input_edges: Path to the input data; HDFS is supported. Input is in CSV format and treated as an undirected graph; gzip is supported.
  • --output: Path to the output data; HDFS is supported. Output is in CSV format with gzip compression.
  • --ouput_list: Whether to output the common neighbor list. Defaults to false, in which case only the common neighbor count is output, in the format 'src,dst,common_cnt'. When true, the common neighbor list is output, in the format 'src,dst,item1,item2,item3...'.
  • --common: Whether to compute common neighbors of heterogeneous nodes. Defaults to false, which computes common neighbors of homogeneous nodes using only input_edges; input_vertices, separator, and vdata_bits are ignored. When true, common neighbor counts of heterogeneous nodes are computed from the lists provided by input_vertices, so input_vertices must be non-empty.
  • --input_vertices: Path to the node neighbor lists. Effective only when common is true. Each line has the format 'user,item1:item2:item3:...'. A user may appear on multiple lines, in which case its items are appended.
  • --separator: Separator used in the node neighbor lists. Defaults to ':' and takes effect only when common is true. For example, if the value is '/', each line under input_vertices has the format 'user,item1/item2/item3/...'.
  • --vdata_bits: Bit width of the vertex state data: 16, 32, or 64. Effective only when common is true. Choose the smallest width that does not overflow.
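When common is true, each input_vertices line carries one user's items. The sketch below shows one plausible way to read that format with a configurable separator and to append the items of a user that appears on several lines. ItemLists and append_vertex_line are hypothetical helpers written for illustration; they are not part of Plato, and Plato's own parser may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container: user id -> items, accumulated across all lines
// that mention the user.
using ItemLists = std::unordered_map<uint32_t, std::vector<uint32_t>>;

// Parses one 'user,item1<sep>item2<sep>...' line and appends (concatenates)
// its items to the user's existing list.
void append_vertex_line(const std::string& line, char sep, ItemLists& lists) {
  const std::size_t comma = line.find(',');
  assert(comma != std::string::npos);  // malformed line: no 'user,' prefix
  const auto user = static_cast<uint32_t>(std::stoul(line.substr(0, comma)));
  std::vector<uint32_t>& items = lists[user];
  std::istringstream rest(line.substr(comma + 1));
  std::string tok;
  // Split the remainder on the separator (':' by default, '/' in the example).
  while (std::getline(rest, tok, sep)) {
    if (!tok.empty()) items.push_back(static_cast<uint32_t>(std::stoul(tok)));
  }
}
```

Reading "7,1:2:3" and then "7,4" with ':' leaves user 7 with items {1, 2, 3, 4}. Duplicates are not removed here, because the description above only promises an append.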

Input Format

Input files should be formatted as follows:

<src>,<dst>

where <src> and <dst> are integers of type uint32_t, representing the end nodes of an edge. Note that Plato treats every input graph as undirected by default. For a directed graph, please ensure both <A, B> and <B, A> appear in the input file if they exist. Edges that appear more than once will be considered as multiple edges between the same pair of nodes.

Input example (the following numbers are synthetic and for demonstration purposes only):

4564,823192
...
1996,973033
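The edge format above can be loaded into per-vertex neighbor lists as shown in the sketch below. This is illustrative only, not Plato's loader. Adjacency and build_adjacency are hypothetical names, and the symmetrize flag is an assumption that lets the caller decide whether each line should also be stored in the reverse direction. Repeated lines are kept as repeated neighbors, following the multiple-edge note above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: vertex id -> sorted neighbor ids. std::map keeps the
// vertices ordered so iteration is deterministic in this sketch.
using Adjacency = std::map<uint32_t, std::vector<uint32_t>>;

// Builds adjacency lists from '<src>,<dst>' lines. With symmetrize == true,
// every edge is also stored as dst -> src. Repeated edges are kept, so they
// appear as repeated neighbors (multiple edges).
Adjacency build_adjacency(const std::vector<std::string>& lines, bool symmetrize) {
  Adjacency adj;
  for (const std::string& line : lines) {
    const std::size_t comma = line.find(',');
    assert(comma != std::string::npos);  // malformed line: no comma
    const auto src = static_cast<uint32_t>(std::stoul(line.substr(0, comma)));
    const auto dst = static_cast<uint32_t>(std::stoul(line.substr(comma + 1)));
    adj[src].push_back(dst);
    if (symmetrize) adj[dst].push_back(src);
  }
  // Sort each list so pairs can later be intersected with a linear merge.
  for (auto& entry : adj) {
    std::sort(entry.second.begin(), entry.second.end());
  }
  return adj;
}
```

Sorting once after loading keeps the per-pair intersection linear in the two list lengths.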

Output Format

Output files are formatted as follows:

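<src>,<dst>,<common_cnt>

where <src>,<dst> represents an edge and <common_cnt> is the number of common neighbors of <src> and <dst>. When --ouput_list is true, each line lists the common neighbors instead: <src>,<dst>,<item1>,<item2>,...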

Output example (the following numbers are synthetic and for demonstration purposes only):

4564,823192,12
...
1996,973033,3
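Producing one output record in either mode amounts to string formatting. The hypothetical format_record helper below assumes the common neighbors of a pair have already been computed. It emits the count format by default and the list format when output_list is true. Here output_list stands in for the --ouput_list flag. Gzip compression and HDFS writing are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Formats one output record (hypothetical helper, not Plato's writer).
// Count mode:  'src,dst,common_cnt'
// List mode:   'src,dst,item1,item2,...'
std::string format_record(uint32_t src, uint32_t dst,
                          const std::vector<uint32_t>& common, bool output_list) {
  std::string line = std::to_string(src) + "," + std::to_string(dst);
  if (!output_list) {
    return line + "," + std::to_string(common.size());
  }
  for (uint32_t item : common) {
    line += "," + std::to_string(item);
  }
  return line;
}
```

For a pair (1, 2) with common neighbors {3, 5}, count mode yields "1,2,2" and list mode yields "1,2,3,5".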

Code

https://github.com/Tencent/plato/blob/master/example/mutual.cc
