Overview of the data QC, code, and GWAS summary output from the 2017 UK Biobank data release
liameabbott Merge pull request #12 from liameabbott/master
added script selecting European ancestry samples
Latest commit 91c29cb Aug 29, 2018
imputed-v2-gwas updated README Apr 26, 2018
00.load_sample_qc_kt.py update Apr 21, 2018
01.load_genotype_snp_qc_kt.py update Apr 21, 2018
02.export_vcf.py update Apr 21, 2018
03.load_genotype_vds.py update Apr 21, 2018
04.subset_samples.py update Apr 21, 2018
05.create_pca_vds.py update Apr 21, 2018
06.run_pca.py update Apr 21, 2018
07.vep_imputed_v3_sites.py update Apr 21, 2018
08.categorize_vep_consequences.py update Apr 21, 2018
09.load_mfi_vds.py update Apr 21, 2018
10.run_variant_qc_autosomes.py update Apr 21, 2018
11.run_variant_qc_chrX.py update Apr 21, 2018
12.subset_variants_autosomes.py added ldsc sumstats export May 1, 2018
13.subset_variants_chrX.py update Apr 21, 2018
14.create_covariates_kt.py update Apr 21, 2018
15.create_phesant_pipelines.py update Apr 21, 2018
16.create_icd10_pipelines.py update Apr 21, 2018
17.create_finngen_pipelines.py update Apr 21, 2018
18.create_curated_pipeline.py update Apr 21, 2018
19.create_icd10_phenotype_summaries.py update Apr 21, 2018
20.create_finngen_phenotype_summaries.py update Apr 21, 2018
21.create_curated_phenotype_summaries.py update Apr 21, 2018
22.run_regressions.py update Apr 21, 2018
23.export_results.py update Apr 21, 2018
24.create_variant_annotation_file.py added ldsc sumstats export May 1, 2018
25.create_phenotype_annotation_files.py update Apr 21, 2018
26.export_ldsc_sumstats.py change HM3 file to preemptively remove MHC for ldsc May 3, 2018
README.md updated README Aug 1, 2018
ukb31063_eur_selection.R added script selecting European ancestry samples Aug 29, 2018



Updates

With the re-release of UK Biobank genotype imputation (which we term imputed-v3), we have generated an updated set of GWAS summary statistics for the genetics community.

  • Increased the number of phenotypes under application UKB31063 and added custom-curated phenotypes (see imputed-v3 Phenotypes)
  • More liberal inclusion of samples (see imputed-v3 Sample QC)
  • Inclusion of more SNPs (see imputed-v3 Variant QC)
  • Updates to our association model (see imputed-v3 Association model). Our largest change is that, for every phenotype, we ran a female-only and a male-only GWAS in addition to the both-sexes GWAS.

Information and scripts from the previous round of GWAS are available in the imputed-v2-gwas subdirectory.

imputed-v3 Phenotypes

  • Phenotypes auto-curated using PHESANT

  • ICD10 codes (all non-coded individuals treated as controls)

  • Curated phenotypes in collaboration with the FinnGen consortium

  • Phenotypes in both sexes

    • PHESANT: 2891 total (274 continuous / 271 ordinal / 2346 binary)
    • ICD10: 633 binary
    • FinnGen curated: 559
  • Phenotypes in females

    • PHESANT: 2393 total (259 continuous / 257 ordinal / 1877 binary)
    • ICD10: 482 binary
    • FinnGen curated: 412
  • Phenotypes in males

    • PHESANT: 2305 total (262 continuous / 259 ordinal / 1784 binary)
    • ICD10: 439 binary
    • FinnGen curated: 400
  • Unique PHESANT phenotypes: 3011, of which 274 are continuous

  • 4203 total unique phenotypes: 3011 PHESANT + 559 FinnGen + 633 ICD10

  • Summary files:

    • phenotypes.both_sexes.tsv.gz
    • phenotypes.female.tsv.gz
    • phenotypes.male.tsv.gz
  • Columns in each summary file:

    • phenotype - phenotype ID
    • description - short description of the phenotype
    • source - PHESANT auto-curation, ICD10, or FinnGen
    • n_controls - number of QC-passing samples that are controls for the phenotype (NA if quantitative)
    • n_cases - number of QC-passing samples that are cases for the phenotype (NA if quantitative)
    • n_missing - number of QC-passing samples with a missing value for the phenotype
    • n_non_missing - number of QC-passing samples with a non-missing value for the phenotype
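To cross-check the per-source counts above against the released summary files, here is a minimal standard-library sketch. It assumes each file is gzipped, tab-separated, and has a header row containing a `source` column, as the column list above suggests. The function names are hypothetical, and the exact strings stored in `source` are not specified here.

```python
import csv
import gzip
from collections import Counter


def tally_sources(lines):
    """Count phenotypes per value of the `source` column.

    `lines` is any iterable of tab-separated text lines whose first line is
    the header row (an open text-mode file handle works).
    """
    # QUOTE_NONE: treat quote characters in descriptions as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["source"] for row in reader)


def tally_summary_file(path):
    """Tally a gzipped summary file such as phenotypes.both_sexes.tsv.gz."""
    with gzip.open(path, "rt", newline="") as handle:
        return tally_sources(handle)
```

For example, `counts = tally_summary_file("phenotypes.both_sexes.tsv.gz")` followed by `sum(counts.values())` gives the number of phenotype rows. If the file matches the figures above, the both-sexes tallies should come to 2891 PHESANT, 633 ICD10 and 559 FinnGen.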

imputed-v3 Sample QC

  • imputed-v3 parameters
    • Used.in.pca.calculation filter (unrelated samples)
    • sex chromosome aneuploidy filter
    • Used the provided PCs for European sample selection to determine British ancestry
      • Kept samples lying within 7 standard deviations on each of the first 6 PCs
      • Further filtered to self-reported 'white-British' / 'Irish' / 'White'
    • QCed sample count: 361194 samples
  • imputed-v2 parameters
    • Used.in.pca.calculation filter (unrelated samples)
    • sex chromosome aneuploidy filter
    • White.british.ancestry filter
    • QCed sample count: 337199 samples
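The imputed-v3 sample filters can be sketched as a single pass over per-sample records. This is a minimal illustration and not a port of the repository's ukb31063_eur_selection.R. The `Sample` record and function names are hypothetical, and the README does not say which samples define each PC's mean and SD, so that reference set is left to the caller.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    sample_id: str
    used_in_pca_calculation: bool   # unrelated-sample flag
    sex_chromosome_aneuploidy: bool
    self_reported_ethnicity: str
    pcs: Sequence[float]            # provided PCs, PC1 first


def pc_bounds(reference: List[Sample], n_pcs: int = 6,
              n_sd: float = 7.0) -> List[Tuple[float, float]]:
    """(low, high) = mean -/+ n_sd * SD for each of the first n_pcs PCs.

    The choice of `reference` samples is an assumption left to the caller.
    """
    bounds = []
    for k in range(n_pcs):
        column = [s.pcs[k] for s in reference]
        mu, sd = mean(column), stdev(column)
        bounds.append((mu - n_sd * sd, mu + n_sd * sd))
    return bounds


def select_v3_samples(samples, reference, allowed_ethnicities,
                      n_pcs=6, n_sd=7.0):
    """IDs of samples passing every imputed-v3 filter listed above."""
    bounds = pc_bounds(reference, n_pcs, n_sd)
    kept = []
    for s in samples:
        if not s.used_in_pca_calculation:   # unrelated samples only
            continue
        if s.sex_chromosome_aneuploidy:     # drop sex chromosome aneuploidy
            continue
        if not all(lo <= s.pcs[k] <= hi for k, (lo, hi) in enumerate(bounds)):
            continue                        # outside the 7-SD box on PC1-PC6
        if s.self_reported_ethnicity not in allowed_ethnicities:
            continue
        kept.append(s.sample_id)
    return kept
```

Because every filter is a logical AND, their order does not change the result. Only the reference set used for the PC bounds does.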

imputed-v3 Variant QC

  • imputed-v3 parameters
    • Autosomes and the X chromosome (excluding the pseudo-autosomal region and XY)
    • SNPs from HRC, UK10K, and 1KG imputation (~90 million)
    • INFO score > 0.8
    • MAF > 0.0001
      • Exception: VEP-annotated missense variants and protein-truncating variants (PTVs) need only MAF > 1e-6
    • HWE p-value > 1e-10
    • QCed SNP count: 13.7 million
  • imputed-v2 parameters
    • Autosomes only
    • SNPs from HRC imputation (~40 million)
    • INFO score > 0.8
    • MAF > 0.0001
    • QCed SNP count: 10.9 million
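The threshold lists above translate directly into per-variant predicates. The following is a minimal sketch over a hypothetical `Variant` record. The consequence labels `"missense"` and `"ptv"` are assumed stand-ins for whatever categories 08.categorize_vep_consequences.py produces. The restriction to specific imputation panels (HRC / UK10K / 1KG) is not modeled here.

```python
from dataclasses import dataclass

AUTOSOMES = {str(c) for c in range(1, 23)}
CODING_CLASSES = {"missense", "ptv"}  # hypothetical labels for VEP categories


@dataclass
class Variant:
    chrom: str        # "1".."22", "X", "XY", ...
    info: float       # imputation INFO score
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium p-value
    consequence: str  # VEP-derived category, e.g. "missense", "ptv", "other"


def passes_v3_qc(v: Variant) -> bool:
    """Imputed-v3 variant filters as listed above."""
    if v.chrom not in AUTOSOMES and v.chrom != "X":
        return False  # excludes XY and any other contig
    if v.info <= 0.8:
        return False
    # Missense and PTV variants get the looser 1e-6 MAF floor.
    maf_floor = 1e-6 if v.consequence in CODING_CLASSES else 1e-4
    if v.maf <= maf_floor:
        return False
    return v.hwe_p > 1e-10


def passes_v2_qc(v: Variant) -> bool:
    """Imputed-v2 filters: autosomes only, INFO > 0.8, MAF > 0.0001 (no HWE filter listed)."""
    return v.chrom in AUTOSOMES and v.info > 0.8 and v.maf > 1e-4
```

Keeping the v2 and v3 rules as separate predicates makes the differences easy to see: the v3 rules add chrX, the HWE filter, and the lower MAF floor for coding variants.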

imputed-v3 Association model

  • imputed-v3 model
    • Linear regression model in Hail (linreg)
    • Three GWAS per phenotype
      • Both sexes
      • Female only
      • Male only
    • Covariates: 1st 20 PCs + sex + age + age^2 + sex*age + sex*age^2
    • Covariates for the sex-specific GWAS: 1st 20 PCs + age + age^2
    • Extra column for variant confidence in case/control phenotypes
      • Column name: expected_case_minor_AC
      • Used to filter out likely false-positive associations when the case count is low
      • Details are given in the accompanying blog post
  • imputed-v2 model
    • Linear regression model in Hail (linreg)
    • Covariates: 1st 10 PCs + sex
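The covariate sets above can be written out per sample, and the confidence column can be illustrated in the same sketch. This is a minimal illustration that does not use Hail's API. It assumes sex is coded 0/1, which matters for the interaction terms. It also assumes that expected_case_minor_AC means the expected number of minor alleles among cases, 2 × minor allele frequency × number of cases; the README only names the column. Any filtering threshold is left to the caller. All function names are hypothetical.

```python
from typing import List, Sequence


def v3_covariates(pcs: Sequence[float], sex: int, age: float,
                  sex_specific: bool = False, n_pcs: int = 20) -> List[float]:
    """Covariate row for one sample under the imputed-v3 model.

    Both-sexes GWAS: PCs + sex + age + age^2 + sex*age + sex*age^2.
    Sex-specific GWAS: PCs + age + age^2 (sex is constant within one sex).
    `sex` is assumed to be coded 0/1; the intercept is left to the regression.
    """
    if len(pcs) < n_pcs:
        raise ValueError(f"need at least {n_pcs} PCs, got {len(pcs)}")
    row = [float(x) for x in pcs[:n_pcs]]
    if sex_specific:
        return row + [age, age ** 2]
    return row + [sex, age, age ** 2, sex * age, sex * age ** 2]


def v2_covariates(pcs: Sequence[float], sex: int) -> List[float]:
    """Imputed-v2 covariate row: first 10 PCs + sex."""
    if len(pcs) < 10:
        raise ValueError(f"need at least 10 PCs, got {len(pcs)}")
    return [float(x) for x in pcs[:10]] + [sex]


def expected_case_minor_ac(alt_af: float, n_cases: int) -> float:
    """Assumed definition: 2 * minor allele frequency * number of cases."""
    minor_af = min(alt_af, 1.0 - alt_af)
    return 2.0 * minor_af * n_cases


def is_confident(alt_af: float, n_cases: int, min_expected_ac: float) -> bool:
    """True if the variant clears a caller-chosen expected_case_minor_AC threshold."""
    return expected_case_minor_ac(alt_af, n_cases) >= min_expected_ac
```

A rare variant in a phenotype with few cases has a small expected minor allele count among cases. In that regime, linear-regression test statistics are least reliable, which is why a column like this is useful for filtering.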