Summary Statistics File Format
This page describes all new file formats introduced for use with the --h2 and --rg flags.
NOTE: chromosomes are assumed to be integers. We haven't yet implemented LD Score regression for sex chromosomes.
The .sumstats format is for GWAS data. It is whitespace-delimited text, one row per SNP, with a header row. Column order does not matter.
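As a rough illustration of what "whitespace-delimited, header row, column order does not matter" means in practice, here is a minimal sketch of a reader that keys each row by column name. The function name `read_sumstats` and the sample values are hypothetical and are not part of ldsc.

```python
import io

def read_sumstats(handle):
    """Parse whitespace-delimited text with a header row into a list of dicts."""
    header = None
    rows = []
    for line in handle:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # the first non-blank line is the header row
            continue
        if len(fields) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(fields)}: {line!r}"
            )
        # Keying by column name makes the column order irrelevant.
        rows.append(dict(zip(header, fields)))
    return rows

# Made-up data: the same SNP written with different delimiters and column order.
a = read_sumstats(io.StringIO("SNP A1 A2 Z N\nrs1 A G 1.5 1000\n"))
b = read_sumstats(io.StringIO("N\tZ\tSNP\tA2\tA1\n1000\t1.5\trs1\tG\tA\n"))
```

Both inputs parse to the same rows because fields are looked up by header name rather than by position.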
We recommend that you convert your summary statistics to the .sumstats format using the munge_sumstats.py program included with ldsc.
munge_sumstats.py checks all the gotchas that we've run into over the course of developing this software and applying it to a lot of data.
SNP -- SNP identifier (e.g., rs number)
N -- sample size (which may vary from SNP to SNP)
Z -- z-score. Sign with respect to A1 (warning, possible gotcha)
A1 -- first allele (effect allele)
A2 -- second allele (other allele)
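The columns listed above, and the sign convention for Z, can be sketched as follows. This is only an illustration and assumes all five columns are required. The helper names `check_columns` and `z_for_allele` are hypothetical, and this is not how ldsc itself handles these checks.

```python
REQUIRED = ("SNP", "N", "Z", "A1", "A2")

def check_columns(header):
    """Raise ValueError if any of the expected columns is absent from the header."""
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))

def z_for_allele(z, a1, a2, effect_allele):
    """Return the z-score expressed with respect to effect_allele.

    Z is signed with respect to A1, so asking for the effect of A2
    instead flips the sign.
    """
    if effect_allele == a1:
        return z
    if effect_allele == a2:
        return -z
    raise ValueError(f"{effect_allele!r} is neither A1 ({a1!r}) nor A2 ({a2!r})")

check_columns(["A2", "A1", "N", "Z", "SNP"])  # order does not matter
flipped = z_for_allele(1.5, "A", "G", "G")    # effect of G rather than A
```

The sign flip is the gotcha: two files that list the same SNP with A1 and A2 swapped describe the same effect only if their Z values have opposite signs.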
ldsc filters out all variants that are not SNPs, as well as strand-ambiguous SNPs.
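To make the two filters concrete, here is a minimal sketch under two assumptions that this page does not spell out: a SNP is taken to be a variant with two distinct single-base alleles, and a strand-ambiguous SNP is one whose alleles are complements of each other (A/T or C/G), so the pair reads the same on either strand. The function names are hypothetical, and ldsc's actual filtering code may differ.

```python
BASES = {"A", "C", "G", "T"}
# Allele pairs that are complements of each other look identical on both strands.
AMBIGUOUS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

def is_snp(a1, a2):
    """True if both alleles are single bases and they differ."""
    return a1 in BASES and a2 in BASES and a1 != a2

def is_strand_ambiguous(a1, a2):
    """True if the allele pair is A/T or C/G in either order."""
    return (a1, a2) in AMBIGUOUS

def keep_variant(a1, a2):
    """True if the variant is a SNP and is not strand-ambiguous."""
    a1, a2 = a1.upper(), a2.upper()
    return is_snp(a1, a2) and not is_strand_ambiguous(a1, a2)

# Made-up allele pairs: a clean SNP, a strand-ambiguous SNP, and an indel.
kept = [pair for pair in [("A", "G"), ("A", "T"), ("AT", "A")] if keep_variant(*pair)]
```

Here only the A/G pair survives: A/T is strand-ambiguous and AT/A is not a SNP.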