HyCoSBM
Hypergraph Covariate Stochastic Block Model

A probabilistic model on hypergraphs that can incorporate information about node covariates.

License: MIT · Made with Python · Code style: black · arXiv: 2311.03857

This repository contains the implementation of the HyCoSBM model presented in:

   [1] Hypergraphs with node attributes: structure and inference.
        Anna Badalyan, Nicolò Ruggeri, and Caterina De Bacco
        [ arXiv:2311.03857 ]

HyCoSBM is a stochastic block model for higher-order interactions that can incorporate node covariates for improved inference.
This code is made available to the public; if you make use of it, please cite our work using the reference above. The implementation is based on the Hy-MMSBM model.

Code installation

The code was developed with Python 3.9 and can be downloaded and used locally as-is.
To install the necessary packages, run the following command:

pip install -r requirements.txt

Inference of community structure

The inference of the affinity matrix w and community assignments u is performed by running the code in main_inference.py.

The most basic run only needs a hypergraph, the number of communities K, and a path to store the results.
For example, to perform inference on the High School dataset with K=2 communities, one can run the following command:

python main_inference.py \
    --K 2 --out_dir ./out_inference --pickle_file data/examples/high_school_dataset/hypergraph.pkl

This basic run, however, does not use the attributes. To add attributes, specify the path to a CSV file containing them with the --attribute_file parameter, and the names of the columns to be used as attributes with --attribute_names. By default gamma = 0.0; this parameter can be changed with, e.g., --gamma 0.8. The following command runs inference on the High School dataset using the attributes class and sex, with K = 2 and gamma = 0.8.

python main_inference.py \
    --K 2 \
    --gamma 0.8 \
    --out_dir ./out_inference \
    --pickle_file data/examples/high_school_dataset/hypergraph.pkl \
    --attribute_file data/examples/high_school_dataset/attributes.csv \
    --attribute_names class sex
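If you are unsure which column names to pass to --attribute_names, the snippet below is a quick way to inspect the attribute file. This is only a convenience check, not part of the repository, and it assumes the CSV has a header row and that pandas is available in your environment.

import pandas as pd

# Peek at the attribute file to confirm the column names used with --attribute_names.
attributes = pd.read_csv("data/examples/high_school_dataset/attributes.csv")
print(attributes.columns.tolist())  # expect entries such as 'class' and 'sex'
print(attributes.head())            # in this assumed layout, one row per node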

Input dataset format

It is possible to provide the input dataset in two formats.

1. Text format
A hypergraph can be provided as input via two .txt files, containing the list of hyperedges and their relative weights. This allows the user to provide arbitrary datasets as inputs. To perform inference on a dataset specified in text format, provide the paths to the two files as

python main_inference.py \
    --K 2 \
    --out_dir ./out_inference \
    --hyperedge_file data/examples/high_school_dataset/hyperedges.txt \
    --weight_file data/examples/high_school_dataset/weights.txt
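As a rough illustration of how such files can be produced, the sketch below writes a toy hypergraph to disk. The assumed layout, space-separated node indices with one hyperedge per line and the matching weight on the corresponding line of the weights file, is a guess; the example files under data/examples/high_school_dataset/ are the authoritative reference for the exact format.

# Illustrative sketch (not from the repository) of writing the two input files.
# Assumed format: one hyperedge per line, space-separated node indices, and the
# corresponding weight on the matching line of weights.txt.
hyperedges = [(0, 1, 2), (1, 3), (2, 3, 4, 5)]  # toy hypergraph
weights = [1, 2, 1]                             # one weight per hyperedge

with open("hyperedges.txt", "w") as f:
    for hye in hyperedges:
        f.write(" ".join(map(str, hye)) + "\n")

with open("weights.txt", "w") as f:
    for w in weights:
        f.write(f"{w}\n")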

2. Pickle format
Alternatively, one can provide a Hypergraph instance, which is the main representation utilized internally in the code (see src.data.representation), serialized via the pickle Python library.
An example equivalent to the above is

python main_inference.py \
    --K 2 \
    --out_dir ./out_inference \
    --pickle_file data/examples/high_school_dataset/hypergraph.pkl

As with the text format, this allows arbitrary hypergraphs to be provided as input.
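A minimal sketch of working with the pickled input is shown below. It relies only on the standard-library pickle module and the example file shipped with the repository; since unpickling needs the Hypergraph class to be importable, it assumes the script is run from the repository root so that src.data.representation can be found.

import pickle

# Load the example Hypergraph instance shipped with the repository.
with open("data/examples/high_school_dataset/hypergraph.pkl", "rb") as f:
    hypergraph = pickle.load(f)

print(type(hypergraph))  # a Hypergraph from src.data.representation

# Re-serializing a (possibly modified or newly built) instance works the same
# way, producing a file that can be passed via --pickle_file.
with open("my_hypergraph.pkl", "wb") as f:
    pickle.dump(hypergraph, f)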

Additional options

Additional options can be specified; the full documentation is shown by running

python main_inference.py --help

Among the important ones we list:

  • --assortative whether to run inference with a diagonal affinity matrix w.
  • --max_hye_size to keep only hyperedges up to a given size for inference. If None, all hyperedges are utilized.
  • --w_prior and --u_prior the rates for the exponential priors on the parameters. A value of zero is equivalent to no prior; any positive value is utilized for MAP inference.
    For non-uniform priors, the path to a file containing a NumPy array can be specified, which will be loaded via numpy.load (see the sketch after this list).
  • --em_rounds number of EM steps during optimization. Increasing it is sometimes useful when the model does not converge rapidly.
  • --training_rounds the number of models to train with different random initializations. The one with the highest log-likelihood is returned and saved.
  • --seed integer random seed.
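For the non-uniform priors mentioned above, the sketch below saves an array of rates in a file that numpy.load can read and that can then be passed via --u_prior or --w_prior. The array shape used here (one rate per entry of u, i.e. number of nodes by K) and the node count are illustrative assumptions, not a specification from the repository.

import numpy as np

K = 2     # number of communities, matching --K
N = 327   # hypothetical number of nodes in the hypergraph
# One exponential rate per entry of u; any non-negative values can be used.
u_prior_rates = np.full((N, K), 0.5)
np.save("u_prior.npy", u_prior_rates)  # then pass: --u_prior u_prior.npy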

Data release

All synthetically generated attributes and hypergraphs used in the experiments are available in the data/generated folder.

All real datasets used in the experiments are publicly available.
