Getting Started with EpiGEN


EpiGEN is an easy-to-use epistasis simulation pipeline written in Python. It supports epistasis models of arbitrary size, which can be specified either extensionally or via parametrized risk models. Moreover, the user can specify the minor allele frequencies (MAFs) of both noise and disease SNPs, and provide a biased target distribution for the generated phenotypes to simulate observation bias.


EpiGEN is freely available on GitHub. Before installing it on your machine, make sure that you have Git and Git LFS installed. Then simply execute the following line in a terminal:

git clone


The user interface of EpiGEN consists of three scripts:


The simulation script simulates epistasis data on top of a pre-computed genotype corpus. For each chromosome <CHROM> and each HAPMAP3 population code <POP>, EpiGEN contains a pre-computed corpus for 10000 individuals, which is identified by the prefix corpora/<CHROM>_<POP>_. For example, to generate epistasis data with ID 0 for 7500 individuals and 10000 SNPs on top of the pre-computed corpus corpora/1_ASW_, where the parametrized epistasis model models/param_model.xml acts upon the SNPs with IDs 156, 3, and 1076 in the corpus, you can run the script as follows:

python3 --sim-ids 0 --corpus-id 1 --pop ASW --inds 7500 --snps 10000 --disease-snps 156 3 1076 --model models/param_model.xml  
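The relation between the options --corpus-id and --pop and the corpus prefix can be sketched in a few lines of Python. This is only an illustration of the naming convention described above; the helper name corpus_prefix and its corpus_dir parameter are hypothetical and not part of EpiGEN.

```python
def corpus_prefix(corpus_id, pop, corpus_dir="corpora"):
    """Return the file prefix of a corpus, following <dir>/<ID>_<POP>_.

    Hypothetical helper for illustration only; EpiGEN builds its
    prefixes internally.
    """
    return "{}/{}_{}_".format(corpus_dir, corpus_id, pop)


# The corpus used in the command above (--corpus-id 1 --pop ASW).
print(corpus_prefix(1, "ASW"))
```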

As you will notice when executing this command, a large fraction of the script's runtime is spent loading the corpora. If you want to simulate data for only a small number of individuals, it is therefore advisable to first compute your own, smaller corpora. You can also speed up the script by unzipping the corpora before running it.

If you want to use custom corpora instead of the pre-computed ones, you can generate them via the corpus generation script. For example, the corpus corpora/1_ASW_ shipped with EpiGEN was generated as follows:

python3 --corpus-id 1 --pop ASW --inds 10000 --chroms 1 --compress 

The merging script allows you to merge pre-computed corpora into a larger corpus. For instance, the following command merges the pre-computed corpora corpora/1_ASW_ and corpora/2_ASW_ into a newly generated corpus corpora/23_ASW_:

python3 --corpus-ids 1 2 --pops ASW ASW --corpus-id 23 --append SNPS
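Conceptually, appending along the SNP axis stacks the SNPs of both corpora for the same set of individuals. The following sketch illustrates this idea on small in-memory matrices. It assumes, purely for illustration, that a corpus is a list of SNP rows with one genotype per individual; the function append_snps is hypothetical and does not reflect EpiGEN's file format or implementation.

```python
def append_snps(corpus_a, corpus_b):
    """Stack the SNP rows of two corpora describing the same individuals.

    Assumption for this sketch: each corpus is a list of rows, one row
    per SNP, with one genotype entry per individual.
    """
    row_lengths = {len(row) for row in corpus_a + corpus_b}
    if len(row_lengths) > 1:
        raise ValueError("all SNP rows must cover the same number of individuals")
    return corpus_a + corpus_b


# Two toy corpora for three individuals: 2 SNPs and 1 SNP, respectively.
chrom_1 = [[0, 1, 2], [1, 1, 0]]
chrom_2 = [[2, 0, 0]]
merged = append_snps(chrom_1, chrom_2)
print(len(merged), "SNPs for", len(merged[0]), "individuals")
```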

Finally, you can validate the simulated data with the validation script. For categorical phenotypes, this script carries out the chi-squared test; for quantitative phenotypes, it runs one-way ANOVA.
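To make the two tests more concrete, the sketch below computes their test statistics by hand in plain Python. It assumes that the genotype at a SNP defines the groups and that the phenotype is the response; how EpiGEN's validation script groups the data is not specified here, and both function names are hypothetical. In practice, scipy.stats.chi2_contingency and scipy.stats.f_oneway provide the same statistics together with p-values.

```python
from collections import Counter, defaultdict


def chi_squared_statistic(genotypes, phenotypes):
    """Pearson chi-squared statistic of the genotype-by-phenotype table."""
    n = len(genotypes)
    observed = Counter(zip(genotypes, phenotypes))
    row_totals = Counter(genotypes)
    col_totals = Counter(phenotypes)
    statistic = 0.0
    for g, row_total in row_totals.items():
        for p, col_total in col_totals.items():
            expected = row_total * col_total / n
            statistic += (observed[(g, p)] - expected) ** 2 / expected
    return statistic


def anova_f_statistic(genotypes, phenotypes):
    """One-way ANOVA F statistic with genotype classes as groups."""
    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    n, k = len(phenotypes), len(groups)
    grand_mean = sum(phenotypes) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: four individuals, one SNP.
print(chi_squared_statistic([0, 0, 1, 1], [0, 0, 1, 1]))      # categorical phenotype
print(anova_f_statistic([0, 0, 1, 1], [1.0, 2.0, 3.0, 4.0]))  # quantitative phenotype
```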

More detailed descriptions of how to use the scripts can be found in the user guide or by calling them with the option --help.

Implementing Custom Interaction Models

EpiGEN natively supports four parametrized interaction models: exponential, multiplicative, joint-dominant, and joint-recessive interaction. Further interaction models can easily be implemented by the user. Assume, for instance, that the user wants to implement xor-dominant interaction, i.e., a parametrized interaction model where there is an effect if and only if there is at least one minor allele at exactly one of the SNPs involved in the interaction. Then it suffices to insert the following five lines of code at line 242 of utils/

elif model_type == "xor-dominant":
	if np.sum(gen_at_snp_set[poss]) == 1:
		return alpha
	else:
		return 1

For consistency, it is also advisable to add the string "xor-dominant" to the error message on line 249 of utils/, as well as to the list of acceptable interaction types on line 42 of the document type definition models/ParametrizedModel.dtd.
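The verbal definition of xor-dominant interaction can also be checked in isolation. The following standalone sketch assumes that genotypes are encoded as minor allele counts (0, 1, or 2) and that 1 is the neutral effect, as in the snippet above. The function xor_dominant_effect is hypothetical and independent of EpiGEN's internal data structures.

```python
def xor_dominant_effect(genotypes, alpha):
    """Return alpha if exactly one SNP carries a minor allele, else 1.

    Assumption: each entry of `genotypes` is the minor allele count
    (0, 1, or 2) at one of the SNPs involved in the interaction.
    """
    snps_with_minor_allele = sum(1 for g in genotypes if g > 0)
    return alpha if snps_with_minor_allele == 1 else 1


# No carrier SNP, exactly one carrier SNP, and two carrier SNPs.
for genotypes in [(0, 0, 0), (2, 0, 0), (1, 1, 0)]:
    print(genotypes, xor_dominant_effect(genotypes, alpha=1.5))
```

Note that this sketch counts SNPs with at least one minor allele rather than minor alleles, so a homozygous minor genotype at a single SNP still triggers the effect.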


EpiGEN has the following dependencies:

  • Python 3.3 or higher.
  • Numpy 1.17.3 or higher.
  • Scipy 1.3.1 or higher.
  • Matplotlib 3.1.1 or higher.

Moreover, due to its HAPGEN2 dependency, the corpus generation script needs to be run on a Linux machine or on a machine running macOS 10.14 or lower. However, you can avoid running it by using the pre-computed corpora and merging them, if necessary.
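If you are unsure whether your environment meets the version requirements listed above, a small check along the following lines may help. This is a sketch rather than part of EpiGEN: the names MINIMUM_VERSIONS, parse_version, and missing_requirements are hypothetical, and the check relies on the __version__ attribute that NumPy, SciPy, and Matplotlib expose.

```python
import importlib
import sys
from itertools import takewhile

# Minimum versions as listed in the dependencies above.
MINIMUM_VERSIONS = {
    "numpy": (1, 17, 3),
    "scipy": (1, 3, 1),
    "matplotlib": (3, 1, 1),
}


def parse_version(text):
    """Turn '1.17.3' into (1, 17, 3); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def missing_requirements():
    """Return a list of human-readable problems; empty if all is fine."""
    problems = []
    if sys.version_info < (3, 3):
        problems.append("Python 3.3 or higher is required")
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(name + " is not installed")
            continue
        if parse_version(getattr(module, "__version__", "0")) < minimum:
            problems.append(name + " is older than " + ".".join(map(str, minimum)))
    return problems


for problem in missing_requirements():
    print(problem)
```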

User Guide

EpiGEN comes with a detailed user guide. The main files of the HTML and LaTeX versions are, respectively, docs/build/html/index.html and docs/build/latex/user_guide.pdf. If you want to re-compile the user guide, you additionally need to install Sphinx, the extension recommonmark, and the package mock. With these packages installed, the HTML and PDF documentation can be re-compiled by executing make html and make latexpdf from the docs directory.


All of EpiGEN's Python sources are licensed under the GNU General Public License 3. However, this license does not cover the HAPGEN2 binaries, which are distributed with EpiGEN and are called by the corpus generation script. HAPGEN2 is property of the University of Oxford and may only be freely used for academic research and in accordance with its license. Copies of the GNU General Public License 3 and of the license for HAPGEN2 are distributed with EpiGEN.

Citing EpiGEN

If you use EpiGEN, please cite the following paper:

  • D. B. Blumenthal, L. Viola, M. List, J. Baumbach, P. Tieri, T. Kacprowski (2020). “EpiGEN: an epistasis simulation pipeline”, Bioinformatics, DOI: 10.1093/bioinformatics/btaa245.

Structure of the Repository

├──                        // README
├── LICENSE                          // A copy of the GNU General Public License 3
├── requirements.txt                 // Lists dependencies
├──                 // Script to simulate epistasis data
├──      // Script to generate genotype corpus
├──        // Script to merge genotype corpora
├──       // Script to validate simulated data
├──                  // Script to test EpiGEN's runtime performance
├── docs                             // Contains Sphinx documentation
├── sim                              // Output directory for simulated data
├── corpora                          // Output directory for genotype corpora
├── temp                             // Contains auxiliary files 
├── ext                              // Contains external libraries and data
│   ├── HAPGEN2                      // Contains HAPGEN2 binaries and license
│   └── HAPMAP3                      // Contains HAPMAP3 data
├── models                           // Contains epistasis models
│   ├── ParametrizedModel.dtd        // Doctype definition for parametrized models
│   ├── ext_model.ini                // An example of an extensional model
│   ├── param_model.xml              // An example of a parametrized model
│   └── ...                          // Further models
└── utils                            // Contains the core of EpiGEN
    ├──                  // __init__ file
    ├──            // Implements simulation of epistasis data
    ├── // Implements generation of genotype corpora
    ├──     // Implements merging of genotype corpora
    ├──       // Implements parametrized models 
    ├──        // Implements extensional models
    └──          // Implements argparse checks

