Source code for (1) analysis of real-world hypergraphs and (2) our model HyperFF for generating realistic hypergraphs, described in the paper Evolution of Real-world Hypergraphs: Patterns and Models without Oracles, Yunbum Kook, Jihoon Ko and Kijung Shin, IEEE ICDM 2020.
In this work, we (A) establish structural and dynamical patterns of real-world hypergraphs and (B) devise a stochastic model for generating realistic hypergraphs. Specifically:
(A) Establishment of structural and dynamical patterns in real-world hypergraphs
- Structural: Heavy-tailed distributions of degrees, hyperedge sizes, intersection sizes, and singular values of incidence matrices.
- Dynamical: Diminishing overlaps of hyperedges, densification, and shrinking diameter.
(B) Stochastic model HyperFF (Hyper Forest Fire) for hypergraph generation with the following merits
- Realistic: It exhibits all seven observed patterns and the five structural patterns reported in the previous study.
- Self-contained: It does not rely on oracles or external information, and it is parameterized by just two scalars.
- Emergent: Its simple and interpretable mechanisms on individual nodes non-trivially produce the examined patterns at the macroscopic level.
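As a rough illustration of the structural quantities listed under (A), the sketch below computes the distributions of node degrees, hyperedge sizes, and pairwise intersection sizes for a small hypergraph given as a list of node sets. The function names and the toy input are hypothetical and are not taken from this repository; the analysis code in src may define or compute these quantities differently.

```python
from collections import Counter
from itertools import combinations


def degree_distribution(hyperedges):
    """Map each degree d to the number of nodes contained in exactly d hyperedges."""
    degrees = Counter(node for edge in hyperedges for node in set(edge))
    return Counter(degrees.values())


def size_distribution(hyperedges):
    """Map each hyperedge size s to the number of hyperedges with s distinct nodes."""
    return Counter(len(set(edge)) for edge in hyperedges)


def intersection_size_distribution(hyperedges):
    """Map each non-zero intersection size to the number of hyperedge pairs with it.

    This enumerates all pairs, so it is quadratic in the number of hyperedges
    and only suitable for small inputs.
    """
    sets = [set(edge) for edge in hyperedges]
    return Counter(len(a & b) for a, b in combinations(sets, 2) if a & b)


# Hypothetical toy hypergraph with six nodes and four hyperedges.
toy = [{1, 2, 3}, {2, 3}, {3, 4, 5, 6}, {1, 6}]

print(dict(degree_distribution(toy)))
print(dict(size_distribution(toy)))
print(dict(intersection_size_distribution(toy)))
```

On real datasets, heavy tails are typically inspected by plotting these counts on log-log axes.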
To install the requirements, run the following command in your terminal:
pip install -r requirements.txt
Please download the datasets from the links in the table below, unzip them, and put the unzipped folders under the "./data" folder so that the hierarchy looks like this, for example:
data
|__contact-high-school
|__email-Eu-full
src
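Before running the analysis, it may help to confirm that the unzipped folders are where the layout above expects them. The following is a minimal sketch, assuming each dataset lives in a folder named exactly after the dataset directly under ./data; the helper `missing_datasets` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Folder names taken from the dataset table in this README.
DATASETS = [
    "contact-high-school",
    "email-Eu-full",
    "tags-ask-ubuntu",
    "NDC-substances-full",
    "threads-math-sx",
    "coauth-DBLP-full",
]


def missing_datasets(data_root, names=DATASETS):
    """Return the dataset names that have no folder directly under data_root."""
    root = Path(data_root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets("./data")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```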
The datasets used in the paper are collected by Austin R. Benson and listed as follows:
Name | #Nodes | #Edges | Description | Download |
---|---|---|---|---|
contact-high-school (contact) | 327 | 172,035 | Social Interaction | Link |
email-Eu-full (email) | 1,005 | 235,263 | Email | Link |
tags-ask-ubuntu (tags) | 3,029 | 271,233 | Q&A | Link |
NDC-substances-full (substances) | 5,556 | 112,919 | Drug | Link |
threads-math-sx (threads) | 176,445 | 719,792 | Q&A | Link |
coauth-DBLP-full (coauth) | 1,924,991 | 3,700,067 | Coauthorship | Link |
After downloading the required datasets, run main.py from your terminal. Its usage is as follows:
main.py [-h] [-p BURNING] [-q EXPANDING] [-n NODES] dataset
positional arguments:
dataset Select dataset for analysis
optional arguments:
-h, --help show this help message and exit
-p BURNING, --burning BURNING
Select the burning probability p (if the target dataset is 'model')
-q EXPANDING, --expanding EXPANDING
Select the expanding probability q (if the target dataset is 'model')
-n NODES, --nodes NODES
Select the number of nodes n (if the target dataset is 'model')
For example, python main.py substances
reproduces the results for the NDC-substances-full (substances) dataset.
To reproduce the results in the paper generated by HyperFF, run python main.py -p 0.51 -q 0.2 -n 10000 model
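For readers who want to script these runs, the sketch below reconstructs a command-line parser from the help text above. It is an approximation, not the actual contents of main.py: the function name `build_parser` is hypothetical, and the argument types (float for p and q, int for n) are assumptions inferred from the example values.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (types are assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("dataset", help="Select dataset for analysis")
    parser.add_argument(
        "-p", "--burning", type=float,
        help="Select the burning probability p (if the target dataset is 'model')",
    )
    parser.add_argument(
        "-q", "--expanding", type=float,
        help="Select the expanding probability q (if the target dataset is 'model')",
    )
    parser.add_argument(
        "-n", "--nodes", type=int,
        help="Select the number of nodes n (if the target dataset is 'model')",
    )
    return parser


# Parse the same arguments as the HyperFF example command.
args = build_parser().parse_args(["-p", "0.51", "-q", "0.2", "-n", "10000", "model"])
print(args.dataset, args.burning, args.expanding, args.nodes)
```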
If you use this code as part of any published research, please consider acknowledging our IEEE ICDM 2020 paper.
@inproceedings{kook2020hyperff,
title={Evolution of Real-world Hypergraphs: Patterns and Models without Oracles},
author={Kook, Yunbum and Ko, Jihoon and Shin, Kijung},
booktitle={IEEE International Conference on Data Mining (ICDM)},
year={2020},
}