Step 0a. Obtain gene-level MAGMA association statistics
We use MAGMA to prioritize disease-associated genes from GWAS summary statistics. MAGMA prioritizes genes located near genetic variants that are strongly associated with the disease.
Running MAGMA typically involves two steps:
- Create an annotation file that maps SNPs to genes using a window-based approach
- Calculate gene-level association statistics using the annotation file and the SNP-level GWAS summary statistics
More details on how to run MAGMA can be found on the MAGMA page here.
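To make the window-based mapping concrete, here is a toy sketch (not MAGMA's implementation) that lists the SNPs falling inside one gene's extended window. It assumes a PLINK-style `.bim` file (chromosome, SNP ID, genetic distance, base-pair position, allele 1, allele 2) and window sizes in kilobases, as in MAGMA's `window=` modifier. The function name `snps_in_window` is hypothetical, and for simplicity it places the upstream window before the gene start and the downstream window after the gene stop without taking strand into account.

```bash
#!/usr/bin/env bash
# Toy illustration of window-based SNP-to-gene mapping.
# Hypothetical helper: not part of MAGMA or scEPS.
# Usage: snps_in_window <bim file> <chr> <gene start> <gene stop> <upstream kb> <downstream kb>
# Prints the IDs of SNPs on <chr> whose base-pair position lies in
# [start - upstream*1000, stop + downstream*1000]. Strand is ignored.
snps_in_window() {
  local bim="$1" chr="$2" start="$3" stop="$4" up_kb="$5" down_kb="$6"
  awk -v chr="$chr" -v lo="$start" -v hi="$stop" -v up="$up_kb" -v down="$down_kb" '
    # Extend the gene boundaries by the window sizes (kb -> bp).
    BEGIN { lo = lo - up * 1000; hi = hi + down * 1000 }
    # .bim columns: 1 = chromosome, 2 = SNP ID, 4 = base-pair position
    $1 == chr && $4 >= lo && $4 <= hi { print $2 }
  ' "$bim"
}
```

For example, `snps_in_window reference.bim 1 1000000 1005000 10 10` (hypothetical file and gene coordinates) prints the SNPs on chromosome 1 between 990,000 and 1,015,000 bp.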
You will need the following inputs:
- GWAS summary statistics data, containing SNP IDs, association p-values, and sample sizes
- An LD reference panel matching the GWAS population. This can be downloaded from the MAGMA page here.
- A gene annotation file providing Entrez IDs, gene symbols, chromosomes, and start and end base-pair positions of the genes. This can be downloaded from the MAGMA page here.
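Before running MAGMA, it can help to confirm that the GWAS summary statistics file actually contains the columns you will pass via the `use=` and `ncol=` modifiers. The sketch below assumes a whitespace-delimited file whose first line is a header; the function name `check_gwas_columns` and the column names in the example (`SNP`, `P`, `N`) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that every requested column name appears in the
# header (first line) of a whitespace-delimited GWAS summary statistics file.
# Usage: check_gwas_columns <file> <column name> [<column name> ...]
# Returns 0 if all columns are present, 1 otherwise (missing names go to stderr).
check_gwas_columns() {
  local file="$1"; shift
  local col status=0
  for col in "$@"; do
    # Exit status 0 from awk means an exact match was found among the header fields.
    if ! head -n 1 "$file" | awk -v c="$col" '
        { for (i = 1; i <= NF; i++) if ($i == c) found = 1 }
        END { exit !found }'; then
      echo "missing column: $col" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_gwas_columns my_gwas.txt SNP P N` returns a non-zero status and names the missing column if any of the three is absent.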
We provide an example script to obtain gene-level MAGMA association statistics in misc/run_magma.sh (also see below). You will need to modify the relevant variables to run MAGMA on your GWAS of interest.
```bash
# Path to the MAGMA executable
MAGMA="<path to the MAGMA package>/magma"

# Step 1: map SNPs to genes using MAGMA
WINDOW="<upstream window size>,<downstream window size>"  # In kb; by default, WINDOW=10,10
SNP_LOC="<prefix of the reference panel PLINK files (.bed/.bim/.fam) across all SNPs>"
GENE_LOC="<gene annotation file>"  # Columns: Entrez ID, chromosome, start base pair, stop base pair, strand, gene symbol
ANNOT_OUT="<MAGMA annotation output file prefix>"

"$MAGMA" \
  --annotate window="$WINDOW" \
  --snp-loc "$SNP_LOC.bim" \
  --gene-loc "$GENE_LOC" \
  --out "$ANNOT_OUT"

# Step 2: compute gene-level association statistics
PVAL="<GWAS summary statistics data file>"  # Path to the SNP-level GWAS summary statistics file
USE="<SNP ID column name>,<p-value column name>"  # SNP ID and p-value columns in the GWAS summary statistics file
NCOL="<sample size column name>"  # Sample size column in the GWAS summary statistics file
GS_OUT="<output file prefix>"

"$MAGMA" \
  --bfile "$SNP_LOC" \
  --pval "$PVAL" use="$USE" ncol="$NCOL" \
  --gene-annot "$ANNOT_OUT.genes.annot" \
  --out "$GS_OUT"
```

MAGMA writes the gene-level association statistics to `$GS_OUT.genes.out`; scEPS uses its GENE and ZSTAT columns for downstream analysis. The table has the following columns:
GENE CHR START STOP NSNPS NPARAM N ZSTAT P
By default, MAGMA output identifies genes by Entrez ID. Before running scEPS, replace the Entrez IDs with gene symbols or Ensembl IDs, whichever your single-cell data uses to represent genes.
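If your single-cell data uses gene symbols, one option is to reuse the gene annotation file passed to `--gene-loc`, whose first column is the Entrez ID and whose sixth column is the gene symbol. The sketch below joins that file with MAGMA's gene-level output table (`<GS_OUT>.genes.out`) and keeps only the gene symbol and ZSTAT. It assumes both files are whitespace-delimited, that the MAGMA output has a header line containing `GENE` and `ZSTAT`, and that the gene annotation file has no header; the function name `magma_z_by_symbol` is hypothetical. Mapping to Ensembl IDs would instead require a separate Entrez-to-Ensembl table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace Entrez IDs in MAGMA gene-level output with gene
# symbols from the MAGMA gene annotation file, keeping GENE and ZSTAT only.
# Usage: magma_z_by_symbol <gene annotation file> <MAGMA .genes.out file> > out.tsv
# Genes without an entry in the gene annotation file are skipped.
magma_z_by_symbol() {
  local gene_loc="$1" genes_out="$2"
  awk '
    # First file: gene annotation (col 1 = Entrez ID, col 6 = gene symbol).
    NR == FNR { symbol[$1] = $6; next }
    # Second file, header line: locate the GENE and ZSTAT columns by name.
    FNR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == "GENE") g = i; if ($i == "ZSTAT") z = i }
      print "GENE\tZSTAT"
      next
    }
    # Data lines: emit symbol and Z statistic for genes with a known symbol.
    ($g in symbol) { print symbol[$g] "\t" $z }
  ' "$gene_loc" "$genes_out"
}
```

For example, `magma_z_by_symbol "$GENE_LOC" "$GS_OUT.genes.out" > magma_zstat.tsv` writes a tab-separated GENE/ZSTAT table. If several Entrez IDs share one symbol, the output will contain duplicate symbols that you may need to resolve.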