A Versatile Hypergraph Model for Document Collections

Andreas Spitz, Dennis Aumiller, Bálint Soproni and Michael Gertz
"A Versatile Hypergraph Model for Document Collections"
SSDBM 2020
DOI: preprint: TODO

This repository contains the main code and all results needed to reproduce the figures in the submission. We additionally provide a more detailed analysis of the underlying dataset.

Reproducing Figures from the Paper

Figures 4 and 5, together with all relevant evaluation data, can be found in the folder SSDBM_figures. The subfolders document_vs_storage and window_size_vs_storage belong to Figure 4 and contain JSON files with the respective sizes.
To see which entities were evaluated, see entities.json in the runtime_eval subfolder.
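
As a starting point for inspecting the Figure 4 data, the following minimal sketch loads every JSON file from the two subfolders named above. It assumes the JSON files sit directly inside each subfolder; the helper name `load_size_reports` is hypothetical, and nothing is assumed about the internal structure of the files.

```python
import json
from pathlib import Path


def load_size_reports(root, subfolders=("document_vs_storage", "window_size_vs_storage")):
    """Load every JSON file found directly in root/<subfolder>.

    Returns a dict mapping subfolder name -> {file name -> parsed JSON}.
    Assumption: the files are plain JSON and are not nested any deeper.
    """
    reports = {}
    for sub in subfolders:
        folder = Path(root) / sub
        reports[sub] = {}
        if not folder.is_dir():
            # A missing subfolder yields an empty entry instead of an error.
            continue
        for path in sorted(folder.glob("*.json")):
            with path.open(encoding="utf-8") as fh:
                reports[sub][path.name] = json.load(fh)
    return reports


if __name__ == "__main__":
    # Run from the repository root; prints the file names found per subfolder.
    for sub, files in load_size_reports("SSDBM_figures").items():
        print(sub, sorted(files))
```

The parsed content is returned untouched, so any plotting code can decide how to interpret the sizes.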

Data Definition

For Postgres, we list the various schemata in, also see the definition in docker/base11/createDBSchema.sql, as well as and
For Neo4j, see the Neoj4 subfolder, which contains definitions for the respective models. Note that in Neo4j we need to create a temporary copy of the SQL tables due to the lack of composite primary keys in Neo4j.
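
To illustrate why a missing composite primary key calls for an intermediate copy, the sketch below shows one generic way to collapse a multi-column key into a single string key before export. This is not the authors' procedure; the function name `surrogate_key` and the example column values are hypothetical.

```python
def surrogate_key(*parts, sep="|"):
    """Join the columns of a composite key into one string key.

    Backslashes and separator characters inside a part are escaped so that
    different column tuples cannot collapse to the same key.
    """
    escaped = (
        str(part).replace("\\", "\\\\").replace(sep, "\\" + sep)
        for part in parts
    )
    return sep.join(escaped)


if __name__ == "__main__":
    # Hypothetical rows keyed by two integer columns.
    rows = [(1, 23), (12, 3)]
    for row in rows:
        print(row, "->", surrogate_key(*row))
```

A plain concatenation without a separator would map both example rows to "123", which is why the separator and the escaping are needed.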

Query Implementations

A detailed comparison between the PostgreSQL and Neo4j queries can be found in

Dataset Analysis

For a more descriptive analysis of the dataset used in the Evaluation section, see old_eval/dataset_plots/. The old_eval folder also contains additional runtime analysis results for various query tasks on three representative entities: "Donald Trump" (the most frequently occurring entity), "Boris Johnson" (highly frequent), and "2016-07-09" (moderate occurrence frequency). These runs were performed only on our PostgreSQL implementation.

Installation Instructions

For now, a complete reproduction of the results is out of scope: the original dataset is hosted internally, and the complete evaluation data amounts to a blob of over 800 GB. For specific installation instructions and reasonable requests about the dataset, please reach out to Dennis Aumiller (lastname (at) informatik.uni-heidelberg (dot) de).

The implementation was carried out with PostgreSQL 11.4 and Neo4j 3.5.11.


If you are using our work, please cite our paper:

  author    = {Andreas Spitz and
               Dennis Aumiller and
               B\'alint Soproni and
               Michael Gertz},
  title     = {{A Versatile Hypergraph Model for Document Collections}},
  booktitle = {Proceedings of the 32nd International Conference on Scientific and
               Statistical Database Management, {SSDBM} 2020, Vienna, Austria,
               July 07-09, 2020},
  publisher = {{ACM}},
  year      = {2020},
  url       = {},
  doi       = {10.1145/1122445.1122456},








