djcotter/chrX_regional_variation

Analyses from X Chromosome Variation Paper

Daniel Cotter[1], Timothy Webster[2,3], Melissa Wilson[3,4]

2021

  1. Department of Genetics, Stanford University
  2. Department of Anthropology, University of Utah
  3. School of Life Sciences, Arizona State University
  4. Center for Evolution and Medicine, Biodesign Institute, Arizona State University

Analysis Steps

Step 1: Install/Set-up

Clone this repository and make sure conda is installed. Then use the provided PAB_variation.yml environment file to set up a new conda environment.

conda env create -f environment.yml

Step 2: Get data

Data should be downloaded from the provided links and copied into the data/ directory of the project folder.

Variant Data

Variant data come from the 1000 Genomes Project phase 3 VCF files for chrX, chrY, and chr8:

Genome masks

The strict mask provided by the 1000 Genomes Project is used to identify monomorphic sites. It is distributed as a whole-genome BED file:
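As a rough illustration of how a BED mask can be used to count monomorphic sites, the sketch below counts the callable bases of a window and subtracts the callable variant positions. It is not taken from this repository's workflow: the function names, the toy intervals, and the assumption that mask intervals do not overlap are all hypothetical. BED coordinates are 0-based and half-open, while VCF positions are 1-based.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end), as in BED


def parse_bed(lines: Iterable[str], chrom: str) -> List[Interval]:
    """Return sorted (start, end) intervals for one chromosome from BED lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        fields = line.split()
        if fields[0] == chrom:
            intervals.append((int(fields[1]), int(fields[2])))
    return sorted(intervals)


def callable_bases(intervals: List[Interval], win_start: int, win_end: int) -> int:
    """Count mask bases inside the window [win_start, win_end).

    Assumes the mask intervals do not overlap one another.
    """
    total = 0
    for start, end in intervals:
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > 0:
            total += overlap
    return total


def monomorphic_sites(intervals: List[Interval], variant_positions: Iterable[int],
                      win_start: int, win_end: int) -> int:
    """Callable sites in the window minus callable variant sites.

    variant_positions are 1-based VCF POS values.
    """
    variant_n = 0
    for pos in set(variant_positions):
        zero_based = pos - 1  # convert VCF POS to BED coordinates
        in_window = win_start <= zero_based < win_end
        if in_window and any(s <= zero_based < e for s, e in intervals):
            variant_n += 1
    return callable_bases(intervals, win_start, win_end) - variant_n


# Toy example with made-up intervals (not real mask data).
toy_bed = ["chrX\t10\t20\tstrict", "chrX\t30\t35\tstrict", "chr8\t0\t100\tstrict"]
toy_mask = parse_bed(toy_bed, "chrX")
toy_count = monomorphic_sites(toy_mask, [11, 15, 25, 31], 0, 32)
```

Only variants that fall inside the mask are subtracted, so positions outside callable regions affect neither count.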

Population Lists

Population and subpopulation lists are calculated using:

We analyze all individuals for the X chromosome and chromosome 8, and only males for the Y chromosome. To restrict the chrX and chr8 analyses to males or to females, change the global variable SEX at the top. The number of females and males in each population is given below:

AFR Populations

POP      Females   Males
ACB      49        47
YRI      56        52
ASW      35        26
ESN      46        53
MSL      43        42
GWD      58        55
LWK      55        44
TOTAL    342       319

AMR Populations

POP      Females   Males
MXL      32        32
PUR      50        54
CLM      51        43
PEL      44        41
TOTAL    177       170

EAS Populations

POP      Females   Males
CHB      57        46
JPT      48        56
CHS      53        52
CDX      49        44
KHV      53        46
TOTAL    260       244

EUR Populations

POP      Females   Males
CEU      50        49
TSI      54        53
FIN      61        38
GBR      45        46
IBS      53        54
TOTAL    263       240

SAS Populations

POP      Females   Males
GIH      47        56
PJL      48        48
BEB      44        42
STU      47        55
ITU      43        59
TOTAL    229       260
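The per-population counts above can be checked against the TOTAL rows with a few lines of Python. This is only a sanity-check sketch: the counts are copied from the tables, while the dictionary layout and function name are illustrative choices rather than part of the repository.

```python
# (females, males) per population, copied from the tables above.
COUNTS = {
    "AFR": {"ACB": (49, 47), "YRI": (56, 52), "ASW": (35, 26), "ESN": (46, 53),
            "MSL": (43, 42), "GWD": (58, 55), "LWK": (55, 44)},
    "AMR": {"MXL": (32, 32), "PUR": (50, 54), "CLM": (51, 43), "PEL": (44, 41)},
    "EAS": {"CHB": (57, 46), "JPT": (48, 56), "CHS": (53, 52), "CDX": (49, 44),
            "KHV": (53, 46)},
    "EUR": {"CEU": (50, 49), "TSI": (54, 53), "FIN": (61, 38), "GBR": (45, 46),
            "IBS": (53, 54)},
    "SAS": {"GIH": (47, 56), "PJL": (48, 48), "BEB": (44, 42), "STU": (47, 55),
            "ITU": (43, 59)},
}


def superpop_totals(counts):
    """Sum females and males over the populations of each superpopulation."""
    return {
        superpop: (sum(f for f, _ in pops.values()), sum(m for _, m in pops.values()))
        for superpop, pops in counts.items()
    }


totals = superpop_totals(COUNTS)
all_females = sum(f for f, _ in totals.values())
all_males = sum(m for _, m in totals.values())

for superpop, (females, males) in totals.items():
    print(f"{superpop}\t{females}\t{males}")
print(f"ALL\t{all_females}\t{all_males}\t{all_females + all_males}")
```

The summed rows reproduce the TOTAL lines of each table, and the grand total is 2,504 individuals (1,271 females and 1,233 males).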

Population codes can be found here.
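The effect of the SEX option described above can be sketched as a filter over a sample panel. This is a hypothetical illustration, not the repository's code: the accepted values of SEX ("all", "males", "females"), the function names, and the four-column panel layout (sample, pop, super_pop, gender) are assumptions, and the sample IDs in the usage lines are made up.

```python
SEX = "all"  # assumed values: "all", "males", or "females"

_GENDER_LABEL = {"males": "male", "females": "female"}


def sex_for_chromosome(chrom, sex=SEX):
    """chrY is always analyzed in males; other chromosomes follow SEX."""
    return "males" if chrom == "chrY" else sex


def select_samples(panel_lines, pop, sex="all"):
    """Return sample IDs from `pop` (population or superpopulation code).

    Each panel line is assumed to hold: sample, pop, super_pop, gender.
    """
    if sex != "all" and sex not in _GENDER_LABEL:
        raise ValueError(f"unknown sex setting: {sex!r}")
    samples = []
    for line in panel_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] == "sample":
            continue  # skip the header and malformed lines
        sample, sample_pop, super_pop, gender = fields[:4]
        if pop not in (sample_pop, super_pop):
            continue
        if sex == "all" or gender == _GENDER_LABEL[sex]:
            samples.append(sample)
    return samples


# Made-up panel lines for illustration only.
toy_panel = [
    "sample\tpop\tsuper_pop\tgender",
    "S1\tYRI\tAFR\tmale",
    "S2\tYRI\tAFR\tfemale",
    "S3\tCEU\tEUR\tmale",
]
yri_chrX = select_samples(toy_panel, "YRI", sex_for_chromosome("chrX", "all"))
yri_chrY = select_samples(toy_panel, "YRI", sex_for_chromosome("chrY", "all"))
```

Routing the chromosome through a helper keeps the chrY rule (males only) in one place, regardless of how SEX is set for chrX and chr8.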

Step 3: Run the analysis

Run the analysis using snakemake once all of the raw data files are downloaded. Navigate to the top of the project directory and type the following commands:

conda activate chrX_variation
snakemake

An example of this process for the YRI population is presented below:

[Figure: DAG of the workflow for the YRI sample]

The YRI example workflow contains 34 jobs, whereas the workflow for all of the figures in the paper contains 1,737 jobs.

About

X chromosome regional variation by population
