A Sampling-based Framework for Hypothesis Testing on Large Attributed Graphs

This repository contains the implementation and experimental data for the paper "A Sampling-based Framework for Hypothesis Testing on Large Attributed Graphs", accepted to PVLDB 2024.

📄 Paper: Link to paper

The repository provides implementations of:

the sampling-based hypothesis testing framework
the hypothesis-aware sampler PHASE and its optimized version, Opt-PHASE

Install and Run

Download the repository

git clone https://github.com/Carrieww/GraphHT.git

All graph data must be stored in the datasets folder. A graph.pkl for the MovieLens dataset is included for illustration.

Install required packages

pip install -r requirements.txt

Run the framework

The preprocessed dataset is available on OneDrive: link.

You can either run:

python main.py --sampling_method "PHASE"

or specify the sampling method in run.sh and run

bash run.sh

in the terminal.

To change the dataset, sampling budget, or hypothesis, set them in config.py and then run either of the two commands above.
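The command-line invocation above can also be scripted from Python, for example to launch several samplers one after another. The following is a minimal sketch, not part of the repository: the helper names build_command and run_methods are hypothetical, and only the --sampling_method flag and the value "PHASE" are taken from the instructions above. Any other sampler name you pass in is an assumption that should be checked against main.py.

```python
import subprocess
import sys


def build_command(sampling_method, script="main.py"):
    """Return the argument list for one run of main.py with the given sampler."""
    return [sys.executable, script, "--sampling_method", sampling_method]


def run_methods(methods, dry_run=True):
    """Build one command per sampler; execute them only if dry_run is False."""
    commands = [build_command(method) for method in methods]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if main.py exits with an error.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # "PHASE" is the only sampler name shown in this README.
    for command in run_methods(["PHASE"], dry_run=True):
        print(" ".join(command))
```

With dry_run=True the sketch only prints the commands it would run; set dry_run=False from the repository root to execute them.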

Output files

All output files are stored in the folder /result/one_sample_log_and_results_*. The two output files are:

*.txt: a tab-separated table storing the accuracy, time, p-value, and confidence interval results at the specified sampling budgets.
*.log: logger information
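The tab-separated result tables can be loaded for further analysis with the standard library alone. The sketch below is an illustration under stated assumptions: it assumes the *.txt file starts with a header row, it makes no assumption about the column names, and the function names find_result_tables and read_result_table are hypothetical helpers, not part of the repository.

```python
import csv
from pathlib import Path


def find_result_tables(root="result"):
    """Return sorted paths of all *.txt tables in one_sample_log_and_results_* folders."""
    return sorted(Path(root).glob("one_sample_log_and_results_*/*.txt"))


def read_result_table(path):
    """Parse one tab-separated table into a list of dicts keyed by the header row."""
    # Assumption: the first line of the file holds the column names.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    for table in find_result_tables():
        for row in read_result_table(table):
            # All values are returned as strings; convert them as needed.
            print(table.name, row)
```

Run from the repository root, this prints every row of every result table found; if no result folder exists yet, it prints nothing.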

To plot accuracy, time, p-value, and confidence interval against the sampling budget, edit the parameters in makePlots.py and run it. The plots are saved to the same folder as the txt and log files.

Citation

If you find this work useful in your research, please cite:

@article{wang2024sampling,
  title={A Sampling-Based Framework for Hypothesis Testing on Large Attributed Graphs},
  author={Wang, Yun and Kosyfaki, Chrysanthi and Amer-Yahia, Sihem and Cheng, Reynold},
  journal={Proceedings of the VLDB Endowment},
  volume={17},
  number={11},
  pages={3192--3200},
  year={2024},
  publisher={VLDB Endowment}
}

For questions or issues, please refer to the paper or contact the authors.
