Daniel Cotter[1], Timothy Webster[2,3], Melissa Wilson[3,4]
2021
1. Department of Genetics, Stanford University
2. Department of Anthropology, University of Utah
3. School of Life Sciences, Arizona State University
4. Center for Evolution and Medicine, Biodesign Institute, Arizona State University
Clone this repository and make sure conda is installed. Then use the provided PAB_variation.yml environment file to set up a new conda environment.
conda env create -f environment.yml
Data should be downloaded from the provided links and copied into the data/ directory of the project folder.
Variant data come from the 1000 Genomes Project phase 3 VCF files for chrX, chrY, and chr8:
- chrX: ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz
- chrY: ALL.chrY.phase3_integrated_v2a.20130502.genotypes.vcf.gz
- chr8: ALL.chr8.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
The strict mask provided by the 1000 Genomes Project is used to identify monomorphic sites. It is distributed as a whole-genome BED file:
- Strict Mask: 20141020.strict_mask.whole_genome.bed
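As a rough illustration of working with the mask, the sketch below totals the number of bases covered by the mask on each chromosome. It assumes the standard BED convention (tab- or space-separated `chrom`, `start`, `end` columns with 0-based, half-open intervals); the function name `callable_bases` is hypothetical and is not part of this repository, and the workflow's own handling of the mask may differ.

```python
from collections import defaultdict


def callable_bases(bed_lines):
    """Sum interval lengths per chromosome from an iterable of BED lines.

    Assumes 0-based, half-open BED intervals, so each interval
    contributes (end - start) bases.
    """
    totals = defaultdict(int)
    for line in bed_lines:
        # Skip blank lines and optional BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        totals[chrom] += end - start
    return dict(totals)


# Example usage (path assumes the file was copied into data/):
# with open("data/20141020.strict_mask.whole_genome.bed") as handle:
#     print(callable_bases(handle))
```

Positions inside the mask that carry no variant in the VCF could then be counted as monomorphic; that interpretation is an assumption based on the description above.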
Population and subpopulation lists are derived from:
- Population Panel: integrated_call_samples_v3.20130502.ALL.panel
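Before running the workflow, it can help to confirm that every input file listed above is in place. The following is a minimal sketch, not part of the repository: the file names are copied from this README, the `data/` location follows the instructions above, and the helper name `missing_inputs` is hypothetical.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_FILES = [
    "ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz",
    "ALL.chrY.phase3_integrated_v2a.20130502.genotypes.vcf.gz",
    "ALL.chr8.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz",
    "20141020.strict_mask.whole_genome.bed",
    "integrated_call_samples_v3.20130502.ALL.panel",
]


def missing_inputs(data_dir="data"):
    """Return the expected input files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected input files are present.")
```

Run it from the top of the project directory so that the relative `data/` path resolves correctly.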
We analyze all individuals for the X chromosome and chromosome 8, and all males for the Y chromosome. To analyze only males or only females for chrX and chr8, change the global variable SEX at the top. The number of males and females in each population is given below:
[Table: number of males and females in each population, with columns for AFR, AMR, EAS, EUR, and SAS populations]
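The per-population counts of males and females can be recomputed from the population panel file. The sketch below assumes the panel is tab-separated with a header row containing the columns `sample`, `pop`, `super_pop`, and `gender`; the function name `sex_counts` is hypothetical and is not taken from this repository.

```python
import csv
from collections import Counter


def sex_counts(panel_lines):
    """Count samples per (super_pop, pop, gender) from panel file lines.

    Assumes a tab-separated file whose header includes the columns
    'pop', 'super_pop', and 'gender'.
    """
    reader = csv.DictReader(panel_lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        counts[(row["super_pop"], row["pop"], row["gender"])] += 1
    return counts


# Example usage (path assumes the file was copied into data/):
# with open("data/integrated_call_samples_v3.20130502.ALL.panel", newline="") as handle:
#     for (super_pop, pop, sex), n in sorted(sex_counts(handle).items()):
#         print(super_pop, pop, sex, n)
```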
Population codes can be found here.
Run the analysis using snakemake once all of the raw data files are downloaded. Navigate to the top of the project directory and type the following commands:
conda activate chrX_variation
snakemake
An example of this process for the YRI population is presented below:
This example workflow contains 34 jobs, whereas the workflow for all of the figures in the paper contains 1,737 jobs.