Graph topology inference benchmarks for machine learning

Code for benchmarking graph topology inference methods designed to improve performance of machine learning methods. We provide code for simple plug and play evaluation of new methods and also some baseline results.

Datasets

We provide 4 datasets (cora, toronto, ESC-50 and ) in numpy and Matlab format. The files are available in the refined_datasets folder. We also provide code for reproducing our feature extraction by running the extract_features.py script. If you want to re-extract the features you will have to download the raw data using the .sh scripts in the download_raw (./download_raw/{dataset}.sh DATA_FOLDER) folder

Graph inference baselines

We provide code for generating both naive inference (KNN with different values of k/similarity kernels) and NNK. We thank the authors of NNK for their easy to use and accesible code. We also test with the Kalofolias inference method, but for the moment we do not provide the code we used to do so (because it is done in matlab/octave we would need to include other toolboxes).

Our methods accept graphs in plain/compressed(gz) text form. A function save_adjacence_matrix is available in generate_graph.py to convert from numpy to this representation. Each line of the file contains:

node_i node_j weight

Testing

We provide three scripts, unsupervised_benchmark.py, semi_supervised_benchmark.py and graph_denoising.py that allow one to quickly benchmark their inferred graphs.

Spectral clustering example

To run the spectral clustering test for a normalized Cosine 10-nn graph:

python unsupervised_benchmark.py --graph_path graph/cora_Cosine_False_10_BothSides_False_False.gz --dataset cora

result:

AMI: 20.98, NMI: 21.31, ARI: 9.51

Label propagation example

To run the Label prop test for a normalized Cosine 10-nn graph:

python semi_supervised_benchmark.py --graph_path graph/cora_Cosine_False_10_BothSides_False_False.gz --dataset cora --model LabelProp

result:

Train Accuracy: 100.00, Test Accuracy: 67.79
Train STD: 0.00, Test STD: 1.08

SGC example

To run the SGC test for a normalized Cosine 10-nn graph:

python semi_supervised_benchmark.py --graph_path graph/cora_Cosine_False_10_BothSides_False_False.gz --dataset cora --model SGC

result:

Train Accuracy: 80.47, Test Accuracy: 70.34
Train STD: 1.31, Test STD: 0.97

Graph denoising example

To run the graph denoising using an NNK 1500 neighbors graph:

python graph_denoising.py --graph_path graph/toronto_RBF_False_1500_BothSides_True_False.gz

result:

Best SnR: 9.99, Best Threshold: 0.20

Reproducing the results from the paper

To reproduce the results of the paper, first you have to run the graph_denoising.sh and signals_over_graph_baselines.sh scripts. This will generate all the inferred graphs. You can then run benchmark_{task}.py to get the results for each task

Baseline results

Task 1

AMI scores:

Method	Inference/Dataset	ESC-50	cora	flowers102
C-means		0.59	0.10	0.36
Spectral clustering	Naive	0.65	0.34	0.46
Spectral clustering	NNK	0.66	0.34	0.44
Spectral clustering	Kalofolias	0.65	0.27	0.44

We also provide non aggregated results in results/unsupervised.csv

Task 2

Mean test accuracy +- Standard deviation

Method	Inference/Dataset	ESC-50	cora	flowers102
Logistic Regression	Inference/Dataset	52.92% +- 1.9	46.84% +- 1.6	33.51% +- 1.7
Label Propagation	Naive	59.05% +- 1.8	58.86% +- 2.9	36.73% +- 1.6
Label Propagation	NNK	57.44% +- 2.2	58.66% +- 2.9	33.57% +- 1.6
Label Propagation	Kalofolias	59.16% +- 1.8	58.60% +- 3.4	37.01% +- 1.7
SGC	Naive	60.48% +- 2.0	67.19% +- 1.5	37.73% +- 1.5
SGC	NNK	61.38% +- 2.0	66.58% +- 1.5	36.81% +- 1.5
SGC	Kalofolias	59.36% +- 2.0	66.28% +- 1.5	37.5% +- 1.5

We also provide non aggregated results in results/sgc.csv and results/labelprop.csv

Task 3

Graph	Best SNR
No denoising	7
Real graph	10.32
Kalofolias	10.41
RBF NNK	9.99
RBF KNN	9.80

We also provide non aggregated results in results/benchmark_denoising.csv

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
SGC @ c449c07		SGC @ c449c07
download_raw		download_raw
graph		graph
nnk		nnk
refined_datasets/features		refined_datasets/features
results		results
sound_feature_extractor @ ef3c769		sound_feature_extractor @ ef3c769
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
benchmark_denoising.py		benchmark_denoising.py
benchmark_labelprop.py		benchmark_labelprop.py
benchmark_sgc.py		benchmark_sgc.py
benchmark_unsupervised.py		benchmark_unsupervised.py
environment.yml		environment.yml
extract_features.py		extract_features.py
generate_graph.py		generate_graph.py
graph_denoising.py		graph_denoising.py
graph_denoising.sh		graph_denoising.sh
nn.csv		nn.csv
prepare_flowers102.py		prepare_flowers102.py
read_results.ipynb		read_results.ipynb
semi_supervised_benchmark.py		semi_supervised_benchmark.py
signals_over_graphs_baselines.sh		signals_over_graphs_baselines.sh
unsupervised_benchmark.py		unsupervised_benchmark.py

License

cadurosar/benchmark_graphinference

Folders and files

Latest commit

History

Repository files navigation

Graph topology inference benchmarks for machine learning

Datasets

Graph inference baselines

Testing

Spectral clustering example

Label propagation example

SGC example

Graph denoising example

Reproducing the results from the paper

Baseline results

Task 1

Task 2

Task 3

About

Resources

License

Stars

Watchers

Forks

Languages