This repository has been archived by the owner on May 30, 2024. It is now read-only.

Resources for the GA4GH 2020 Plenary.

DNAstack/plenary-resources-2020

Resources for the 2020 GA4GH Plenary

Workflows

A test workflow may be found here. This workflow runs a basic GWAS on a joint-called 1000 Genomes chromosome VCF using PLINK. A sample inputs file specifies the gs:// locations of example workflow inputs.

The metadata CSV input is available both in a public GCS bucket and in the workflows directory. It contains metadata for every sample in the 1000 Genomes (1KG) VCF, plus an additional column of simulated disease status (case/control). The Simulated_disease column is used as the phenotype in the GWAS, and the Super_Population column is added as a covariate.
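As a quick sanity check before running the GWAS, the phenotype and covariate columns can be tallied. The following is a minimal sketch, not part of the workflow itself. It assumes the column names Simulated_disease and Super_Population given above; the `Sample` column name, the string encoding of case/control, and the toy rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_phenotype(csv_text, pheno_col="Simulated_disease",
                    covar_col="Super_Population"):
    """Count samples per (covariate, phenotype) pair in metadata CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        counts[(row[covar_col], row[pheno_col])] += 1
    return counts


# Toy rows for illustration only; IDs and values are hypothetical,
# not taken from the real metadata file.
example = """Sample,Super_Population,Simulated_disease
S1,EUR,case
S2,EUR,control
S3,AFR,case
"""

counts = tally_phenotype(example)
for (population, status), n in sorted(counts.items()):
    print(population, status, n)
```

For the real file, read its contents with `open(path).read()` and pass the text to `tally_phenotype`; a population with no cases or no controls would be worth checking before interpreting the covariate-adjusted results.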

To run this workflow locally (rather than on a Google Cloud backend), download a 1000 Genomes VCF for the chromosome of interest and edit the inputs file to point to the local paths of the VCF and metadata file instead of their gs:// locations.
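The path substitution described above can be sketched as a small helper. This assumes the inputs file is a JSON object of key/value pairs, which the text does not state; the input keys, bucket name, and local directory below are hypothetical. The helper only rewrites paths and does not download anything.

```python
import json
import posixpath


def localize_inputs(inputs, local_dir):
    """Return a copy of inputs with each gs:// value replaced by a local path.

    Each gs:// URL is mapped to local_dir/<file name>; other values are
    left unchanged. The files must already have been downloaded.
    """
    localized = {}
    for key, value in inputs.items():
        if isinstance(value, str) and value.startswith("gs://"):
            localized[key] = posixpath.join(local_dir, posixpath.basename(value))
        else:
            localized[key] = value
    return localized


# Hypothetical input keys and bucket paths, for illustration only.
cloud_inputs = {
    "gwas.vcf": "gs://example-bucket/chr21.vcf.gz",
    "gwas.metadata_csv": "gs://example-bucket/metadata.csv",
    "gwas.threads": 2,
}

local_inputs = localize_inputs(cloud_inputs, "/data")
print(json.dumps(local_inputs, indent=2))
```

Writing `local_inputs` back out with `json.dump` would give an inputs file suitable for a local run, provided the real file follows the same JSON layout.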

The workflow outputs the association file generated by PLINK and a Manhattan plot generated from that association file.
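To illustrate how an association file relates to a Manhattan plot, the sketch below converts each variant's p-value into the -log10(p) value plotted on the y-axis. It is not the workflow's own plotting code. It assumes a whitespace-delimited file with a header row containing CHR, BP, and P columns, as in PLINK 1.9 association output; the example rows are made up.

```python
import math


def manhattan_points(assoc_text):
    """Return (chromosome, position, -log10(p)) for each row with a usable P."""
    lines = [ln.split() for ln in assoc_text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    chr_i = header.index("CHR")
    bp_i = header.index("BP")
    p_i = header.index("P")
    points = []
    for fields in rows:
        try:
            p = float(fields[p_i])
        except ValueError:
            continue  # skip non-numeric entries such as "NA"
        if math.isnan(p) or p <= 0:
            continue  # -log10 is undefined for these values
        points.append((fields[chr_i], int(fields[bp_i]), -math.log10(p)))
    return points


# Made-up rows for illustration only.
example_assoc = """CHR SNP BP A1 TEST NMISS OR STAT P
21 rs0001 1000 A ADD 100 1.5 3.29 0.001
21 rs0002 2000 G ADD 100 NA NA NA
21 rs0003 3000 T ADD 100 1.0 0.00 1
"""

points = manhattan_points(example_assoc)
for chrom, pos, score in points:
    print(chrom, pos, round(score, 3))
```

When covariates are included, PLINK's logistic output may contain one row per model term for each variant, so filtering on the TEST column (for example, keeping only ADD rows) may be needed before plotting.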