GAMBIT

A C++ tool for Gene-based Analysis with oMniBus, Integrative Tests

  • Implements several gene-based test forms (quadratic: weighted sum of Zsq, linear: weighted sum of Z, and maximum Zsq) to aggregate GWAS single-variant summary statistics cross-referenced with variant- or region-based functional annotations
  • Calculates annotation-stratified gene-based tests (e.g., TWAS/PrediXcan tests using eSNPs, gene-based tests using only coding variants, and gene-based tests using enhancer-to-target-gene maps), and omnibus tests by combining p-values for each gene
  • Inputs: GWAS association summary statistics file (chromosome, position, ref/alt allele, and z-score or beta-hat + se), annotation files, and LD reference panel

GWAS Summary Statistics

  • GWAS summary statistics files can be specified via --gwas my_summary_stats.txt.gz. Input files must be ordered by chromosome and genomic position, with input fields as shown below:
#CHR  POS     REF  ALT  SNP_ID      N         ZSCORE   ANNO
1     721290  C    G    rs12565286  58663.62  0.86661  Intergenic
1     752566  G    A    rs3094315   57135     0.5521   Intergenic
1     775659  A    G    rs2905035   54570     1.12098  Intron:LOC643837
1     777122  A    T    rs2980319   54570     1.11906  Exon:LOC643837
  • The first four fields and ZSCORE are required, while SNP_ID, ANNO and N (effective sample size) are optional.
  • See format_gwas_summary_stats.sh for annotating GWAS summary statistics files using EPACTS/TabAnno.
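
The ordering and required-field rules above can be checked before running an analysis. The following is a minimal sketch, not part of GAMBIT: the function name `check_summary_stats` is hypothetical, fields are assumed to be whitespace-delimited, and "ordered by chromosome" is interpreted as each chromosome appearing in one contiguous block with non-decreasing positions.

```python
REQUIRED = ["#CHR", "POS", "REF", "ALT", "ZSCORE"]

def check_summary_stats(lines):
    """Return a list of problems found in summary-statistics lines (header first)."""
    problems = []
    lines = iter(lines)
    header = next(lines).split()
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        return ["missing required columns: " + ", ".join(missing)]
    chr_i, pos_i = header.index("#CHR"), header.index("POS")
    seen_chroms = []   # chromosomes in order of first appearance
    last_pos = None    # previous position on the current chromosome
    for n, line in enumerate(lines, start=2):
        fields = line.split()
        if not fields:
            continue
        chrom, pos = fields[chr_i], int(fields[pos_i])
        if not seen_chroms or chrom != seen_chroms[-1]:
            if chrom in seen_chroms:
                problems.append(f"line {n}: chromosome {chrom} is not contiguous")
            seen_chroms.append(chrom)
            last_pos = None
        if last_pos is not None and pos < last_pos:
            problems.append(f"line {n}: position {pos} precedes {last_pos}")
        last_pos = pos
    return problems

example = [
    "#CHR  POS     REF  ALT  SNP_ID      N         ZSCORE   ANNO",
    "1     721290  C    G    rs12565286  58663.62  0.86661  Intergenic",
    "1     752566  G    A    rs3094315   57135     0.5521   Intergenic",
]
print(check_summary_stats(example))  # prints []
```

For a gzipped file, the same function can be given a handle opened with `gzip.open(path, "rt")`, since it only iterates over lines.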

Annotation-Stratified Gene-Based Tests

Gene-Based Analysis with Regulatory Elements

  • To compute gene-based tests using regulatory element annotations, specify an annotation bed file with regulatory-element-to-target-gene weights via --anno-bed my_reg_elems.txt.gz, formatted as below:
#CHR  START   END     CLASS     ELEMENT_ID          TARGET_GENES                     ANNO
chr1  567400  567600  Enhancer  chr1:567400:567600  MIB2:4.12|CPTP:2.53|GLTPD1:2.53  .
chr1  568000  568200  Enhancer  chr1:568000:568200  ATAD3A:2.75                      .
chr1  758600  758800  Enhancer  chr1:758600:758800  C1orf170:2.57|PERM1:2.57         .
chr1  769200  769400  Enhancer  chr1:769200:769400  C1orf170:3.36|PERM1:3.36         .
  • Association tests for individual regulatory elements are reported in *.stratified_out.txt files, and gene-based p-values (aggregating across regulatory elements for each gene) in *.summary_out.txt files.

  • Aggregation Methods for Regulatory Elements. By default, GAMBIT aggregates test statistics across variants in regulatory elements using a weighted sum of single-variant chi-squared statistics (SKAT gene-based test). To instead use weighted ACAT or HMP to combine single-variant p-values, specify --no-skat and a p-value combination method via --pcomb.
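
To illustrate the TARGET_GENES field in the annotation bed file, the sketch below parses `GENE:weight` entries separated by `|` and inverts the element-to-gene map so that each gene lists its regulatory elements. The function names are hypothetical, whitespace-delimited columns are assumed, and a `.` value is assumed to mean "no target genes". This is only a reading aid for the file format, not GAMBIT's own parser.

```python
def parse_target_genes(field):
    """Parse 'GENE_A:w1|GENE_B:w2' into {gene: weight}."""
    weights = {}
    if field == ".":          # assumption: '.' marks an empty field
        return weights
    for entry in field.split("|"):
        gene, _, weight = entry.rpartition(":")
        weights[gene] = float(weight)
    return weights

def genes_to_elements(bed_lines):
    """Invert the element-to-gene map: {gene: [(element_id, weight), ...]}."""
    by_gene = {}
    for line in bed_lines:
        if line.startswith("#") or not line.strip():
            continue
        # Columns: CHR, START, END, CLASS, ELEMENT_ID, TARGET_GENES, ...
        element_id, targets = line.split()[4:6]
        for gene, w in parse_target_genes(targets).items():
            by_gene.setdefault(gene, []).append((element_id, w))
    return by_gene

bed = [
    "#CHR  START   END     CLASS     ELEMENT_ID          TARGET_GENES                     ANNO",
    "chr1  758600  758800  Enhancer  chr1:758600:758800  C1orf170:2.57|PERM1:2.57         .",
    "chr1  769200  769400  Enhancer  chr1:769200:769400  C1orf170:3.36|PERM1:3.36         .",
]
print(genes_to_elements(bed)["C1orf170"])
```

Here `C1orf170` maps to both enhancers, with weights 2.57 and 3.36 respectively.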

Gene-Based Analysis with Coding and Other Annotated Variants

  • To compute gene-based tests using coding and other variants, GAMBIT relies on the ANNO field in GWAS summary statistics and an annotation hierarchy definitions file specified via --anno-defs my_defs.txt, formatted as below:
#CLASS    SUBCLASS          ANNO_TERMS
Coding    Protein_Altering  Nonsynonymous,Start_Loss,Stop_Gain,Stop_Loss,CodonGain,CodonLoss,Frameshift
Coding    Splice_Site       Essential_Splice_Site,Normal_Splice_Site
Coding    Exon_Other        Exon,Synonymous
UTR       UTR3              Utr3
UTR       UTR5              Utr5
  • The ANNO_TERMS field specifies a comma-separated list of annotation terms (matching terms from the GWAS summary statistics file's ANNO field), and CLASS and SUBCLASS determine the annotation hierarchy and classes reported in output files.

  • Gene-Based Test Output. Test statistics stratified by gene and annotation subclass are provided in *.stratified_out.txt files, and gene-based p-values (aggregating across annotation classes for each gene) in *.summary_out.txt files.

  • Variant Aggregation Methods. By default, GAMBIT aggregates test statistics across variants using a weighted sum of single-variant chi-squared statistics (SKAT gene-based test). To instead use weighted ACAT or HMP to combine single-variant p-values, specify --no-skat and a p-value combination method via --pcomb.
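
The relationship between the definitions file and the ANNO field can be sketched as a lookup from annotation term to (CLASS, SUBCLASS). The helper names below are hypothetical, and the code assumes that ANNO values take the form `Term:Gene` (as in `Exon:LOC643837` in the summary statistics example) and that terms are matched exactly; GAMBIT's actual matching rules may differ.

```python
def load_anno_defs(lines):
    """Map each annotation term to its (CLASS, SUBCLASS) pair."""
    term_to_class = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cls, subclass, terms = line.split()[:3]
        for term in terms.split(","):
            term_to_class[term] = (cls, subclass)
    return term_to_class

def classify(anno, term_to_class):
    """Return (class, subclass, gene) for an ANNO value, or None if the term is not defined."""
    term, _, gene = anno.partition(":")
    hit = term_to_class.get(term)
    return None if hit is None else (hit[0], hit[1], gene)

defs = load_anno_defs([
    "#CLASS    SUBCLASS          ANNO_TERMS",
    "Coding    Splice_Site       Essential_Splice_Site,Normal_Splice_Site",
    "Coding    Exon_Other        Exon,Synonymous",
    "UTR       UTR3              Utr3",
])
print(classify("Exon:LOC643837", defs))  # prints ('Coding', 'Exon_Other', 'LOC643837')
print(classify("Intergenic", defs))      # prints None
```

Under these assumptions, variants whose term is absent from the definitions file (such as `Intergenic` or `Intron` here) simply fall outside every annotation class.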

TWAS Analysis

  • To compute TWAS/PrediXcan gene-based tests using GAMBIT, specify an eWeight file via --eweights my_eWeights.txt.gz, formatted as below:
##TISSUE_IDS=0:Adipose_Subcutaneous,1:Adipose_Visceral_Omentum,2:Adrenal_Gland,3:Artery_Aorta
#CHR  POS     RSID       REF  ALT  BETAS
1     752566  rs3094315  G    A    C1orf159=3.92e-02@0|UBE2J2=-1.49e-02@0|FAM87B=2.75e-01@1;1.25e-01@2;1.17e-01@3
1     752721  rs3131972  A    G    LINC00115=1.15e-01@0;1.75e-02@3;4.90e-02@4|RP11-206L10.8=3.21e-02@1
1     754182  rs3131969  A    G    LINC00115=-2.1e-02@1|RP5-857K21.2=-8.27e-02@2|RP11-206L10.9=-1.11e-01@2
1     760912  rs1048488  C    T    C1orf159=3.35e-04@0|TTLL10=-1.4e-02@3|FAM87B=1.75e-01@1;1.12e-01@2;9.51e-02@3|SAMD11=-1.27e-02@2
  • The BETAS field format is eGene_A=Weight_A1@Tissue_A1;Weight_A2@Tissue_A2|eGene_B=Weight_B1@Tissue_B1, and labels for tissue IDs can be specified in the header.
  • Subsetting tissues. To restrict analysis to a subset of tissues/cell-types, specify a comma-separated list of tissues following the --tissues flag. By default, GAMBIT includes all tissues/cell-types present in the eWeight file.
  • Tissue Aggregation for Omnibus Tests. GAMBIT reports both single-tissue TWAS/PrediXcan analysis results and omnibus test results aggregating across all specified tissues/cell-types for each eGene. Omnibus p-values for multi-tissue TWAS/PrediXcan analysis can be calculated using 1) the maximum single-tissue test statistic, evaluated against the joint distribution of single-tissue statistics (MinP), 2) the sum of squared single-tissue z-scores, analogous to SKAT (SKAT), or 3) p-value combination with ACAT or HMP (PCOMB, the default). The omnibus test method can be specified via --tissue-aggreg (PCOMB, MinP, SKAT, or ALL), and the p-value combination method via --pcomb (ACAT or HMP).
  • Single-tissue and omnibus test output. Gene-based tests and p-values for each eGene-tissue pair are reported in *.stratified_out.txt files, and omnibus p-values (aggregating across all tissues for each eGene) in *.summary_out.txt files.
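
The BETAS field packs several genes and tissues into one string, so a small parsing sketch may help when preparing or inspecting eWeight files. The function names are hypothetical and the code follows only the format described above (`|` between eGenes, `=` after the gene name, `;` between tissue entries, `@` before the tissue ID); it is not GAMBIT's own reader.

```python
def parse_tissue_ids(header_line):
    """Parse '##TISSUE_IDS=0:Name,1:Name' into {id: name} (IDs kept as strings)."""
    _, _, body = header_line.strip().partition("=")
    return dict(item.split(":", 1) for item in body.split(","))

def parse_betas(field):
    """Parse a BETAS field into {gene: {tissue_id: weight}}."""
    out = {}
    for gene_entry in field.split("|"):
        gene, _, pairs = gene_entry.partition("=")
        out[gene] = {}
        for pair in pairs.split(";"):
            weight, _, tissue = pair.partition("@")
            out[gene][tissue] = float(weight)
    return out

tissues = parse_tissue_ids("##TISSUE_IDS=0:Adipose_Subcutaneous,1:Adipose_Visceral_Omentum,2:Adrenal_Gland,3:Artery_Aorta")
betas = parse_betas("C1orf159=3.92e-02@0|UBE2J2=-1.49e-02@0|FAM87B=2.75e-01@1;1.25e-01@2;1.17e-01@3")

for gene, by_tissue in betas.items():
    for tissue_id, weight in by_tissue.items():
        # Fall back to the raw ID if the header does not label it.
        print(gene, tissues.get(tissue_id, tissue_id), weight)
```

Tissue IDs are kept as strings so that IDs missing from the header can still be reported unchanged.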

dTSS-Weighted Gene-Based Tests

  • To incorporate un-annotated regulatory variants in gene-based analysis, GAMBIT implements a dTSS (distance to Transcription Start Site) weighted gene-based test, which aggregates all single-variant p-values within a specified window from each gene's TSS using weighted ACAT or HMP and assigns higher weight to variants nearer the TSS using an exponential decay function.
  • To compute dTSS-weighted gene-based tests, specify a TSS bed file via --tss-bed my_tss_bed.bed.gz, formatted as below:
#CHR  START   END     SYMBOL      GENE             GENE_ANNO
1     11868   11869   DDX11L1     ENSG00000223972  transcribed_unprocessed_pseudogene
1     62947   62948   OR4G11P     ENSG00000240361  transcribed_unprocessed_pseudogene
1     69090   69091   OR4F5       ENSG00000186092  protein_coding
1     131024  131025  CICP27      ENSG00000233750  processed_pseudogene
  • Window size. The window size for dTSS-weighted gene-based tests can be modified by specifying --tss-window BASEPAIRS (500 Kbp by default).
  • dTSS decay function. The relative weight assigned to variants nearer to or farther from the TSS can be modified by specifying --tss-alpha ALPHA, where alpha=0 gives all variants equal weight, and larger values give more weight to variants nearer the TSS. --tss-alpha also accepts a comma-separated list of alpha values, in which case GAMBIT computes a global test p-value across all specified values (individual p-values are reported in the INFO output field). By default, GAMBIT uses dTSS alpha values 1e-4,5e-5,1e-5,5e-6.
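
As a rough illustration of the dTSS-weighted test, the sketch below combines single-variant p-values with the standard weighted Cauchy combination (ACAT) using exponential-decay weights. Several details are assumptions rather than facts about GAMBIT: the weight is taken to be exp(-alpha * |pos - TSS|), the window is treated as a maximum distance on either side of the TSS, and the function names are hypothetical. The sketch also omits the small-p-value approximation that production ACAT implementations use for numerical stability, and it does not attempt the global test across multiple alpha values.

```python
import math

def dtss_weights(positions, tss, alpha, window=500_000):
    """Weights exp(-alpha * |pos - tss|) for variants within `window` bp of the TSS."""
    return {pos: math.exp(-alpha * abs(pos - tss))
            for pos in positions if abs(pos - tss) <= window}

def acat(pvalues, weights):
    """Weighted Cauchy combination (ACAT) of p-values; weights are normalized to sum to 1."""
    total = sum(weights)
    stat = sum(w / total * math.tan((0.5 - p) * math.pi)
               for p, w in zip(pvalues, weights))
    return 0.5 - math.atan(stat) / math.pi

# Hypothetical variants near the OR4F5 TSS (position 69090 in the example above).
pvals = {69000: 0.001, 75000: 0.20, 300000: 0.65}
for alpha in (1e-4, 5e-5, 1e-5, 5e-6):
    w = dtss_weights(pvals, tss=69090, alpha=alpha)
    p = acat([pvals[pos] for pos in w], [w[pos] for pos in w])
    print(alpha, p)
```

With alpha=0 every variant in the window receives weight 1, which reduces the combination to an unweighted ACAT test.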

Feedback and bug reports

  • Feel free to contact Corbin Quick (corbinq@gmail.com) with questions, bug reports, or feedback