Linking GWAS studies to genes through cis-regulatory datasets

Post-GWAS Analysis Pipeline

Copyright holder: EMBL-European Bioinformatics Institute (Apache 2 License)

This script is designed to automatically fine-map GWAS results and highlight the likely causal variants behind them by cross-examining GWAS, population genetic, epigenetic and cis-regulatory datasets.

Its original design was based on STOPGAP. It takes as input a disease identifier, extracts associated SNPs via GWAS databases, expands them by LD, then searches an array of regulatory and cis-regulatory databases for gene associations.
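The LD-expansion step can be illustrated with a minimal Python sketch. It assumes pairwise LD is already available as (snp_a, snp_b, r²) tuples and uses a hypothetical r² cut-off of 0.7. The function name `expand_by_ld`, the input layout and the threshold are illustrative assumptions, not POSTGAP's actual code or defaults.

```python
from collections import defaultdict


def expand_by_ld(lead_snps, ld_pairs, r2_threshold=0.7):
    """Map each lead SNP to itself plus its direct LD proxies.

    lead_snps    -- iterable of SNP identifiers reported by the GWAS source
    ld_pairs     -- iterable of (snp_a, snp_b, r2) tuples (hypothetical layout)
    r2_threshold -- minimum r^2 for a proxy (assumed value, not a POSTGAP default)
    """
    # Build an undirected adjacency map that keeps only pairs in strong LD.
    proxies = defaultdict(set)
    for snp_a, snp_b, r2 in ld_pairs:
        if r2 >= r2_threshold:
            proxies[snp_a].add(snp_b)
            proxies[snp_b].add(snp_a)

    # Each lead SNP stays in its own set; proxies of proxies are not followed.
    return {lead: {lead} | proxies.get(lead, set()) for lead in lead_snps}


if __name__ == "__main__":
    pairs = [
        ("lead1", "proxyA", 0.92),
        ("lead1", "proxyB", 0.35),   # below the cut-off, dropped
        ("proxyA", "proxyC", 0.95),  # proxy of a proxy, not followed
    ]
    print(expand_by_ld(["lead1"], pairs))
```

Only direct proxies of each lead SNP are kept here; whether a real pipeline should also follow chains of LD is a design choice this sketch does not settle.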

Pipeline diagram


Add the lib/ directory to your $PYTHONPATH environment variable.

Installing dependencies

To install all dependencies run sh

Add the bin directory to your $PATH environment variable.
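If you launch POSTGAP from another Python program instead of an interactive shell, the two environment changes above can be applied to the child process's environment. This is a minimal sketch: `postgap_env` is a hypothetical helper, and `repo_root` stands for wherever you cloned the repository.

```python
import os


def postgap_env(repo_root, base_env=None):
    """Return a copy of the environment with the repository's lib/ and bin/ added.

    repo_root is the path to the cloned repository (hypothetical argument).
    The directories are prepended so they take precedence over other copies.
    """
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.join(repo_root, "lib")
    bin_dir = os.path.join(repo_root, "bin")

    # filter(None, ...) drops an empty existing value so no stray separator is left.
    env["PYTHONPATH"] = os.pathsep.join(filter(None, [lib_dir, env.get("PYTHONPATH", "")]))
    env["PATH"] = os.pathsep.join(filter(None, [bin_dir, env.get("PATH", "")]))
    return env
```

The returned dictionary can be passed as the `env=` argument of `subprocess.run`; the caller's own environment is copied, not modified.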

Flatfile preparation

  • Via the FTP site (recommended)

    The following script downloads a set of files into the current working directory (PWD). sh

    Ideally, save these files in a separate directory, which we will call databases_dir.

    Every time you run POSTGAP, add --database_dir /path/to/databases_dir to the command line.

  • Manually (very slow). The following steps create a databases_dir directory for you:

    1. Run make download to download the public databases.
    2. Run make process to preprocess the databases. Warning: this may take days, as it needs to split the full 1000 Genomes files by population.
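Splitting 1000 Genomes data by population starts from knowing which samples belong to which population. A minimal sketch of that grouping, assuming a tab-delimited sample panel with `sample`, `pop` and `super_pop` columns (the layout of the 1000 Genomes Phase 3 `.panel` file); it illustrates the idea and is not the Makefile's actual implementation. The sample IDs in the example are made up.

```python
import csv
import io
from collections import defaultdict


def samples_by_population(panel_handle, column="super_pop"):
    """Group sample IDs by one population column of a tab-delimited panel file."""
    groups = defaultdict(list)
    for row in csv.DictReader(panel_handle, delimiter="\t"):
        groups[row[column]].append(row["sample"])
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical panel content; real files list thousands of samples.
    panel = io.StringIO(
        "sample\tpop\tsuper_pop\tgender\n"
        "S1\tGBR\tEUR\tmale\n"
        "S2\tYRI\tAFR\tfemale\n"
        "S3\tCEU\tEUR\tmale\n"
    )
    print(samples_by_population(panel))
```

Each resulting list could be written to a file and passed to a tool such as `bcftools view -S` to extract one VCF per population; that is one possible approach, not necessarily the one used by make process.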


In the simplest case, run the following command from the repository's root directory:

python --disease autism  

Multiple disease names can be provided.

You can also provide a list of EFOs:

python --efos EFO_0000196

Or an rsID:

python --rsID rs10009124

Or a manually defined variant:

python --coords my_variant 1 1234567 

Direct data upload

To bypass the GWAS databases and supply your own data from a file:

python --summary_stats my_stats.txt

The summary statistics file should be tab-delimited, with the following columns:

  • Chromosome (GRCh37)
  • Position (GRCh37)
  • MarkerName
  • Effect_allele
  • Non_Effect_allele
  • Beta
  • SE
  • Pvalue
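The column layout above can be produced with Python's standard csv module. In this minimal sketch, whether POSTGAP expects a header line is not stated in this README, so writing one is an assumption (toggle `header`); the helper name `write_summary_stats`, its sanity checks, and the example values are illustrative.

```python
import csv
import io

# Column order as listed above; coordinates are GRCh37.
COLUMNS = ["Chromosome", "Position", "MarkerName", "Effect_allele",
           "Non_Effect_allele", "Beta", "SE", "Pvalue"]


def write_summary_stats(rows, handle, header=True):
    """Write tab-delimited summary statistics after light sanity checks."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    if header:
        writer.writerow(COLUMNS)
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}: {row!r}")
        _chrom, position, _name, _ea, _nea, beta, se, pvalue = row
        # Numeric fields must parse (int()/float() raise ValueError otherwise).
        int(position)
        float(beta)
        if float(se) <= 0:
            raise ValueError(f"SE must be positive: {row!r}")
        if not 0.0 <= float(pvalue) <= 1.0:
            raise ValueError(f"Pvalue out of range: {row!r}")
        writer.writerow(row)


if __name__ == "__main__":
    buffer = io.StringIO()
    # Hypothetical example row.
    write_summary_stats([("1", 1234567, "my_variant", "A", "G", 0.12, 0.03, 6.3e-5)], buffer)
    print(buffer.getvalue(), end="")
```

Replace the StringIO buffer with `open("my_stats.txt", "w", newline="")` to produce a file for --summary_stats.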

Bayesian mode (EXPERIMENTAL)

For an EFO, you can trigger the Bayesian calculations with:

python --efos EFO_0000196 --bayesian

In this case, POSTGAP produces an output file, 'postgap_output', which can be displayed as:


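This README does not describe the model behind the Bayesian mode. As a hedged illustration of what single-causal-variant fine-mapping from Beta and SE looks like, the sketch below uses Wakefield's approximate Bayes factor (ABF), a standard technique that may differ from POSTGAP's implementation. The prior standard deviation of 0.2 (a value often used for log odds ratios) is an assumption, and the function names are hypothetical.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor (association vs. null) for one variant.

    ABF = sqrt(V / (V + W)) * exp(z^2 * W / (2 * (V + W))), with V = se^2, W = prior_sd^2.
    """
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta / se) ** 2
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r


def posterior_probabilities(stats, prior_sd=0.2):
    """Posterior probability that each variant is causal, assuming exactly one causal variant.

    stats maps variant ID -> (beta, se).
    """
    logs = {snp: log_abf(b, s, prior_sd) for snp, (b, s) in stats.items()}
    top = max(logs.values())  # subtract the max before exponentiating for numerical stability
    weights = {snp: math.exp(value - top) for snp, value in logs.items()}
    total = sum(weights.values())
    return {snp: w / total for snp, w in weights.items()}


def credible_set(probs, coverage=0.95):
    """Smallest set of top-ranked variants whose posteriors sum to at least `coverage`."""
    selected, cumulative = [], 0.0
    for snp, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        selected.append(snp)
        cumulative += p
        if cumulative >= coverage:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical effect sizes for two variants in one locus.
    probs = posterior_probabilities({"snpA": (0.5, 0.05), "snpB": (0.1, 0.05)})
    print(probs, credible_set(probs))
```

With equal standard errors, the variant with the larger |z| receives the larger posterior, which is the behaviour any single-causal-variant model of this kind should show.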

By default, the script writes a tab-delimited table to standard output.

To write it to a file instead, use --output:

python --disease autism --output results.txt
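For downstream analysis in Python, the tab-delimited output can be loaded with the standard csv module. This minimal sketch assumes the first line of the file is a header naming the columns; the output columns are not listed in this README, so the column you pass to the hypothetical `count_by` helper is up to you.

```python
import csv
from collections import Counter


def read_postgap_table(handle):
    """Load tab-delimited results into a list of dicts (assumes a header row)."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column):
    """Count rows per distinct value of one column."""
    return Counter(row[column] for row in rows)
```

Usage, with a placeholder column name:

```python
with open("results.txt", newline="") as handle:
    rows = read_postgap_table(handle)
print(count_by(rows, "COLUMN_NAME").most_common(10))
```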

If you want a JSON dump of all the data retrieved by the pipeline:

python --disease autism --output results.json --json
python --disease autism --json


python --database_dir /path/to/postgap/databases --efos EFO_0008263 --output EFO_0008263.txt --GWAS GWAS_Catalog

More Info

Check out our Wiki