This is the code repository for the paper: Exploratory Analysis of Graph Data by Leveraging Domain Knowledge
By Di Jin and Danai Koutra.
"Summarize an unknown graph from known ones."
The data directory contains "real_train", which holds the raw files of the domain knowledge (the known graphs), and "real_test", which holds the input file of the unknown graph.
This directory contains the scripts for the experiments conducted in the paper. To reproduce the experiments, run "exp_effectiveness", "exp_scalability_1", "exp_scalability_2", and "exp_sensitivity". For example, the following command evaluates the diversity and domain-specificity of the graph invariant distributions selected by EAGLE and the baselines:
$ exp_effectiveness
The supplementary results for Satisfaction of Desired Properties (Section V, part D) of the paper can be obtained by running the same script with different correlation metrics. Specifically, the first figure shows the evaluation based on Pearson correlation, which is the figure in the paper. The second figure shows the evaluation based on Kendall's tau. The third figure shows the evaluation based on Spearman's rank correlation. As stated in the paper, EAGLE outperforms the baseline methods in all three cases.
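The three correlation metrics named above measure different things: Pearson captures linear association, while Kendall's tau and Spearman's rank correlation capture monotone association through ranks. The following is a minimal Python sketch of that difference, written independently of the repository's MATLAB code. The function names and the sample data are hypothetical, and the Kendall variant shown is tau-a, which assumes no tied values.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 30]

print("Pearson :", pearson(x, y))
print("Kendall :", kendall_tau(x, y))
print("Spearman:", spearman(x, y))
```

On this monotone but nonlinear sample, both rank-based metrics reach 1 while Pearson stays below 1, which is why a conclusion that holds under all three metrics is more robust than one checked under Pearson alone.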
This directory contains the extra graph invariants computed with SNAP.
- matlab_bgl: A third-party library written by David Gleich (https://www.cs.purdue.edu/homes/dgleich/packages/matlab_bgl/).
- util: This library contains several toolkits used in this project.
This directory contains the raw graphs after processing, in the format required to run EAGLE.
This directory contains the stored results of the experiments in the paper, produced by the scripts in analysis.
This code was built with MATLAB 2016a. Preprocessing the raw data files is time-consuming. To run the code without preprocessing them, use the command:
$ main