Our method has two parts: topic extraction and community search. The figure below shows the pipeline of the algorithm.
The pipeline of SNCS.
For the input heterogeneous graph (HG), we extract topics and reconstruct the graph in parallel. We aggregate topic vectors from the neighbors of vertices that have the same type as the query vertex
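The aggregation step can be illustrated with a minimal sketch. It assumes the aggregation is an element-wise mean, which this README does not specify, and it uses plain in-memory maps rather than the repository's own data structures. The class name `TopicAggregationSketch` and the method `aggregate` are hypothetical and do not come from the TCS code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the TCS code base.
final class TopicAggregationSketch {

    /**
     * For every vertex whose type equals targetType (the query vertex type),
     * average the topic vectors of its neighbors. Neighbors without a topic
     * vector are skipped. Mean aggregation is an assumption of this sketch.
     */
    static Map<Integer, double[]> aggregate(Map<Integer, List<Integer>> adjacency,
                                            Map<Integer, String> vertexType,
                                            Map<Integer, double[]> topicVectors,
                                            String targetType,
                                            int dim) {
        Map<Integer, double[]> result = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : adjacency.entrySet()) {
            int v = entry.getKey();
            if (!targetType.equals(vertexType.get(v))) {
                continue; // only vertices of the query vertex type receive a vector
            }
            double[] sum = new double[dim];
            int count = 0;
            for (int u : entry.getValue()) {
                double[] vec = topicVectors.get(u);
                if (vec == null) {
                    continue; // neighbor carries no topic vector
                }
                for (int i = 0; i < dim; i++) {
                    sum[i] += vec[i];
                }
                count++;
            }
            if (count > 0) {
                for (int i = 0; i < dim; i++) {
                    sum[i] /= count;
                }
            }
            result.put(v, sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy graph: author 0 is linked to papers 1 and 2.
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        adjacency.put(0, Arrays.asList(1, 2));
        adjacency.put(1, Arrays.asList(0));
        adjacency.put(2, Arrays.asList(0));

        Map<Integer, String> vertexType = new HashMap<>();
        vertexType.put(0, "author");
        vertexType.put(1, "paper");
        vertexType.put(2, "paper");

        Map<Integer, double[]> topicVectors = new HashMap<>();
        topicVectors.put(1, new double[] {1.0, 0.0});
        topicVectors.put(2, new double[] {0.0, 1.0});

        Map<Integer, double[]> aggregated =
                aggregate(adjacency, vertexType, topicVectors, "author", 2);
        System.out.println(Arrays.toString(aggregated.get(0))); // prints [0.5, 0.5]
    }
}
```

In this toy example the query vertex type is `author`, so only vertex 0 receives an aggregated vector. A different aggregation (for example a sum or a weighted mean) would change only the inner loop.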
The TCS folder contains the code for the graph reconstruction method and community search, implemented in Java (JDK 1.8). We use publicly available datasets that can be downloaded from the internet. To make the code easier to follow, we provide an example with the dblp dataset. Note that the algorithm for community search by meta-path (BatchEcore) [1] was obtained from its author, so it is not publicly available; we modified the BatchEcore algorithm to add thematic constraints.
Topic extraction is a widely used technique, and we implemented it with the help of OCTIS (https://github.com/MIND-Lab/OCTIS). See that repository for details. We have also made public the code and the sample datasets we processed.
The proof is detailed in "proof.pdf".
In line with existing work on meta-structures, we focus on meta-structures with diameters of at most four. We select meta-structures with more connected vertices, as experts suggest, to ensure that our query is meaningful. Our dataset contains four vertex types that, coincidentally, constitute a meta-structure. To ensure the validity of our experiments, we randomly selected 20 vertices as the set of query vertices. The topic similarity threshold
For the parameter settings of Table 4 in the paper, we used the same vertex set as query vertices and set
Implementation details are discussed in "implementationDetails.pdf", which covers the quality metrics, the selection of query vertices and meta-structures, and the settings for
Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz with 64 GB of memory, running Ubuntu.
JDK 1.8
All dependencies required by OCTIS
The standard data formats for TCS-HINMS input are mainly:
graph, vertex type, edge type, topic vector, Meta-path, Meta-structure.
SNCS/Sample.java is a sample of the data format.
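The authoritative layout of these inputs is the one in SNCS/Sample.java and is not reproduced here. As a rough illustration of how the graph, vertex type, and meta-path inputs relate to one another, the sketch below checks whether a sequence of vertices is an instance of a meta-path. It assumes a meta-path is given as a sequence of vertex type labels and ignores edge types for brevity. The class `MetaPathSketch`, the method `isInstance`, and the labels `A` and `P` are hypothetical and do not come from the repository.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the format used by SNCS/Sample.java.
final class MetaPathSketch {

    /**
     * Returns true if the vertex sequence matches the meta-path's type
     * sequence and every pair of consecutive vertices is adjacent.
     */
    static boolean isInstance(List<Integer> path,
                              List<String> metaPath,
                              Map<Integer, String> vertexType,
                              Map<Integer, Set<Integer>> adjacency) {
        if (path.size() != metaPath.size()) {
            return false;
        }
        for (int i = 0; i < path.size(); i++) {
            int v = path.get(i);
            if (!metaPath.get(i).equals(vertexType.get(v))) {
                return false; // vertex type does not match the meta-path position
            }
            if (i > 0) {
                Set<Integer> prevNeighbors = adjacency.get(path.get(i - 1));
                if (prevNeighbors == null || !prevNeighbors.contains(v)) {
                    return false; // consecutive vertices are not connected
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy graph: authors 0 and 2 both wrote paper 1.
        Map<Integer, String> vertexType = new HashMap<>();
        vertexType.put(0, "A");
        vertexType.put(1, "P");
        vertexType.put(2, "A");

        Map<Integer, Set<Integer>> adjacency = new HashMap<>();
        adjacency.put(0, new HashSet<>(Arrays.asList(1)));
        adjacency.put(1, new HashSet<>(Arrays.asList(0, 2)));
        adjacency.put(2, new HashSet<>(Arrays.asList(1)));

        List<String> apa = Arrays.asList("A", "P", "A");
        System.out.println(isInstance(Arrays.asList(0, 1, 2), apa, vertexType, adjacency)); // prints true
        System.out.println(isInstance(Arrays.asList(0, 2, 1), apa, vertexType, adjacency)); // prints false
    }
}
```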
[1] Y. Fang, Y. Yang, W. Zhang, X. Lin, and X. Cao, “Effective and efficient community search over large heterogeneous information networks,” Proceedings of the VLDB Endowment, vol. 13, no. 6, pp. 854–867, 2020.