Tobias Tilly edited this page May 4, 2017 · 4 revisions

Welcome to the wgs-exercise wiki!

Here you will find basic information on the created tool.

First of all: What is wgs-exercise?

wgs-exercise is a Python tool that wraps bcftools and generates a broad range of statistics automatically with a single command. It is split into multiple modules, each with its own area of functionality, which keeps the tool easy to extend. The entry point is main.py, which provides a small command line interface for running some of the predefined workflows.

Prerequisites

To run wgs-exercise you need the Python libraries matplotlib and numpy (both available via pip). bcftools (version 1.4) [0] and vcftools [1] should be downloaded and installed from GitHub (links provided below). The tabix and python-tk packages are available in most package managers.
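The prerequisites listed above can be checked before the first run. The following is a minimal sketch and not part of wgs-exercise itself: the function name missing_prerequisites is hypothetical, and it assumes Python 3, where the module installed by python-tk is called tkinter.

```python
import importlib.util
import shutil

# Names taken from the prerequisites above. "tkinter" is the Python 3 module
# name provided by python-tk (assumption: the tool is run under Python 3).
REQUIRED_MODULES = ["numpy", "matplotlib", "tkinter"]
REQUIRED_BINARIES = ["bcftools", "vcftools", "tabix"]


def missing_prerequisites(modules=REQUIRED_MODULES, binaries=REQUIRED_BINARIES):
    """Return (missing_modules, missing_binaries) as two lists of names."""
    # find_spec returns None when a top-level module cannot be imported.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not on PATH.
    missing_binaries = [b for b in binaries if shutil.which(b) is None]
    return missing_modules, missing_binaries


if __name__ == "__main__":
    mods, bins = missing_prerequisites()
    if mods:
        print("Missing Python modules:", ", ".join(mods))
    if bins:
        print("Missing command line tools:", ", ".join(bins))
    if not mods and not bins:
        print("All prerequisites found.")
```

The check only confirms that the modules and executables can be found; it does not verify the bcftools version.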

bcftools is mainly used to generate the statistics, which are then parsed and partly plotted. vcftools is used to create the subsets of the 1000 Genomes VCF files and to run the Hardy-Weinberg analysis.
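To illustrate the "generate, then parse" step, here is a rough sketch of calling bcftools stats and reading its summary numbers. It is not the code used in wgs-exercise: the function names are hypothetical, and it assumes that only the tab-separated SN lines of the bcftools stats output (tag, id, key, value) are of interest.

```python
import subprocess


def run_bcftools_stats(vcf_path):
    """Run `bcftools stats` on one file and return its text output."""
    result = subprocess.run(
        ["bcftools", "stats", vcf_path],
        check=True,           # raise CalledProcessError if bcftools fails
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
    )
    return result.stdout


def parse_summary_numbers(stats_text):
    """Collect the SN (summary numbers) lines into a dict of key -> value."""
    summary = {}
    for line in stats_text.splitlines():
        # Comment lines start with "#"; other sections use other tags.
        if not line.startswith("SN\t"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        key = fields[2].rstrip(":")
        try:
            summary[key] = int(fields[3])
        except ValueError:
            # Keep the raw string if a value is not a plain integer.
            summary[key] = fields[3]
    return summary
```

A typical use would be parse_summary_numbers(run_bcftools_stats("FILE.vcf.gz")), where FILE.vcf.gz stands for any VCF file that bcftools can read.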

[0] https://github.com/samtools/bcftools

[1] https://github.com/vcftools/vcftools

The CLI

The CLI offers three workflows. Running python main.py --download 1 downloads chromosome 1 of the 1000 Genomes project (phase 3), the corresponding tbi index file, the panel file (which holds information about the individuals) and chromosome 1 of the gnomAD project.

Running python main.py --summary FILE creates a short statistical summary of the given FILE.

Running python main.py --files FILE1 FILE2 performs a full comparison of the statistics of the two files. This may take a while, because the tool creates population subsets of FILE1 (which should be the 1000 Genomes file) and intersections between FILE1 and FILE2.
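The three workflows map naturally onto three command line options. The sketch below shows one way such an interface could be declared with argparse. It is an illustration only: the actual main.py may be structured differently, the function names are hypothetical, and treating the options as mutually exclusive is an assumption.

```python
import argparse


def build_parser():
    """Declare the three options described above (sketch, not the real main.py)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Wrapper around bcftools for generating VCF statistics.",
    )
    # Assumption: exactly one workflow is selected per invocation.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--download", metavar="CHROMOSOME",
                       help="download the data for one chromosome")
    group.add_argument("--summary", metavar="FILE",
                       help="create a short statistical summary of FILE")
    group.add_argument("--files", nargs=2, metavar=("FILE1", "FILE2"),
                       help="compare the statistics of FILE1 and FILE2")
    return parser


def select_workflow(argv):
    """Return (workflow_name, arguments) for a list of command line arguments."""
    args = build_parser().parse_args(argv)
    if args.download is not None:
        return ("download", [args.download])
    if args.summary is not None:
        return ("summary", [args.summary])
    return ("files", list(args.files))
```

For example, select_workflow(["--files", "FILE1", "FILE2"]) selects the comparison workflow with both file names as arguments.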

The generated statistics are stored in bcftools-output/, the subsets in data/ and the intersections in data/isec_$POP directories. Generated plots are saved as png files.
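As an illustration of the data/isec_$POP layout, the following sketch builds the intersection directory for one population and the matching bcftools isec call, whose -p option sets the output directory. It assumes that each population subset is intersected with the second file. The function names, the population code and the file names are hypothetical.

```python
from pathlib import Path


def isec_directory(population, data_dir="data"):
    """Return the intersection directory for one population, e.g. data/isec_EUR."""
    return Path(data_dir) / f"isec_{population}"


def isec_command(subset_vcf, other_vcf, population, data_dir="data"):
    """Build the argument list for `bcftools isec` writing into that directory."""
    out_dir = isec_directory(population, data_dir)
    return [
        "bcftools", "isec",
        "-p", out_dir.as_posix(),  # -p: directory that receives the isec output
        str(subset_vcf),
        str(other_vcf),
    ]
```

The returned list can be passed directly to subprocess.run, for example subprocess.run(isec_command("data/EUR.vcf.gz", "gnomad.vcf.gz", "EUR"), check=True).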

Sidenotes

The file commands_used.txt lists most of the commands that were used or tested while creating this wrapper.