update docs
brentp committed Jan 12, 2016
1 parent 40a8677 commit e4ad650
Showing 2 changed files with 23 additions and 13 deletions.
17 changes: 5 additions & 12 deletions docs/index.rst
@@ -9,17 +9,15 @@ in `.ped` files and the relationships inferred from a
corresponding `.vcf` file such as can occur from sample-swaps
or pedigree misspecifications.

-The code to do this is quite simple. Below, we check for pedigree
+The code to do this is quite simple, but we can also automate it from the command line.
+Below, we check for pedigree
violations by looking at 5,000 sites (see :doc:`relatedness <qc>`
for more details on how the sites are selected) and for sex discrepancies by looking at the
non-PAR (non-pseudoautosomal) regions of the X chromosome, where males should have very few HET calls.

-.. code-block:: python
+.. code-block:: bash
-    from pedagree import Ped
-    p = Ped('ceph1463.ped')
-    ped_df = p.ped_check('ceph1463.vcf.gz', plot='ped-check.png')
-    sex_df = p.sex_check('ceph1463.vcf.gz', plot='sex-check.png', cutoff=0.15)
+    python -m pedagree --plot --prefix ceph-1463 ceph1463.vcf.gz ceph1463.ped
This will create the images:

@@ -38,12 +36,7 @@ From both of these cases, we can see that it doesn't look like there are any
sample mixups. See the docs here for an example of how a sample mixup appears.


-Both of these commands also create pandas dataframes that can be saved to a file with:
-
-.. code-block:: python
-    ped_df.to_csv('ped-check.tsv', sep="\t", index=False)
-    sex_df.to_csv('sex-check.tsv', sep="\t", index=False)
+For each of those images, there is a corresponding `.csv` file.

The `sex-check` file will look like::

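As a rough illustration of the sex check described in the index.rst text (HET calls on the non-PAR part of chrX), here is a standalone Python sketch. It is not pedagree's implementation. It assumes genotypes are already coded like cyvcf2's default `gt_types` (0 = HOM_REF, 1 = HET, 2 = UNKNOWN, 3 = HOM_ALT), that positions use GRCh37 coordinates for the pseudoautosomal regions, and that a single threshold separates males from females. The 0.15 default only borrows the value passed as `cutoff=` in the removed `sex_check` example; its exact meaning there is not documented here. All function names are hypothetical.

```python
"""Sketch: flag possible sex discrepancies from chrX heterozygosity."""
import math
from typing import List, Sequence

# Assumed genotype codes (cyvcf2's default ``gt_types`` coding).
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3

# Assumed GRCh37 pseudoautosomal regions on chrX (1-based, inclusive).
# Other builds (e.g. GRCh38) use different coordinates.
PAR_GRCH37 = [(60_001, 2_699_520), (154_931_044, 155_260_560)]


def in_par(pos: int) -> bool:
    """True if a chrX position falls inside one of the GRCh37 PARs."""
    return any(start <= pos <= end for start, end in PAR_GRCH37)


def x_het_proportion(positions: Sequence[int],
                     genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Per-sample fraction of called, non-PAR chrX genotypes that are HET.

    ``genotypes[i][s]`` is the call for sample ``s`` at ``positions[i]``.
    Samples with no usable calls get NaN.
    """
    n_samples = len(genotypes[0]) if genotypes else 0
    het = [0] * n_samples
    called = [0] * n_samples
    for pos, row in zip(positions, genotypes):
        if in_par(pos):
            continue  # PARs are diploid in males, so they say nothing about sex
        for s, gt in enumerate(row):
            if gt == UNKNOWN:
                continue
            called[s] += 1
            if gt == HET:
                het[s] += 1
    return [h / c if c else float("nan") for h, c in zip(het, called)]


def flag_sex_discrepancies(het_props: Sequence[float],
                           ped_sex: Sequence[int],
                           cutoff: float = 0.15) -> List[bool]:
    """Flag samples whose chrX het rate disagrees with the PED sex code.

    PED coding: 1 = male, 2 = female, anything else = unknown (never flagged).
    Assumes one cutoff separates the sexes: males should fall below it,
    females at or above it. NaN proportions are never flagged.
    """
    flags = []
    for prop, sex in zip(het_props, ped_sex):
        if math.isnan(prop) or sex not in (1, 2):
            flags.append(False)
        elif sex == 1:
            flags.append(prop >= cutoff)
        else:
            flags.append(prop < cutoff)
    return flags


if __name__ == "__main__":
    positions = [100_000, 3_000_000, 5_000_000, 60_000_000]
    genotypes = [[HET, HET],          # inside PAR1 -> ignored
                 [HOM_REF, HET],
                 [HOM_ALT, HET],
                 [HOM_REF, HOM_REF]]
    props = x_het_proportion(positions, genotypes)  # [0.0, 0.666...]
    print(flag_sex_discrepancies(props, [1, 1]))    # [False, True]
```

In the usage example both samples are labelled male; sample 1 is HET at two of its three non-PAR calls, so it is flagged, while the call inside PAR1 is skipped for both samples.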
19 changes: 18 additions & 1 deletion docs/relatedness.rst → docs/qc.rst
@@ -1,5 +1,8 @@
+QC
+==
+
relatedness calculations
-========================
+------------------------

Using cyvcf2, we can quickly calculate relatedness using the method
described in http://www.nature.com/ng/journal/v42/n7/full/ng.608.html in
@@ -37,3 +40,17 @@ have more heterozygotes. With that in mind, we can find sample swaps
that involve sex by observing the proportion of heterozygote calls.
If a sample is indicated to be male by the ped file, it should have
a low value for the proportion of het calls.

+het QC
+------
+
+We also check that het calls in general have an alternate-allele read count that is
+about 50% of the total reads at the site. This only makes sense for germline variant
+calling, but there it is useful for finding contamination. The actual metric is the
+interquartile range (IQR) of the alternate-allele ratio, alt / (ref + alt). For perfect
+calls, every ratio would be exactly 0.5, so the IQR would be 0. With contamination, the
+ratios spread much more widely around 0.5.
+
+We can also check the proportion of heterozygote calls. In a contaminated sample, extra reads
+make some homozygous sites look heterozygous, so the proportion of het calls is much higher.
+
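The two het QC metrics in the qc.rst text can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the project's code: inputs are (n_sites, n_samples) arrays of genotype codes and ref/alt read depths (cyvcf2 exposes per-variant `gt_types`, `gt_ref_depths`, and `gt_alt_depths`, which could be stacked across sites), HET is assumed to be coded as 1, missing depths are assumed to be negative, and the `min_depth=10` filter is a hypothetical addition that keeps very shallow calls from dominating the ratio.

```python
"""Sketch: per-sample het QC metrics (allele-balance IQR, het proportion)."""
import numpy as np

# Assumed genotype codes (cyvcf2's default ``gt_types`` coding).
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def het_ab_iqr(gt_types, ref_depths, alt_depths, min_depth=10):
    """Per-sample IQR of alt / (ref + alt) over heterozygous calls.

    Inputs have shape (n_sites, n_samples). Calls with a missing depth
    (assumed negative) or a total depth below ``min_depth`` are ignored.
    Samples with no usable het calls get NaN.
    """
    gt_types = np.asarray(gt_types)
    ref_depths = np.asarray(ref_depths)
    alt_depths = np.asarray(alt_depths)
    total = ref_depths + alt_depths
    usable = ((gt_types == HET) & (ref_depths >= 0) & (alt_depths >= 0)
              & (total >= min_depth))
    iqrs = np.full(gt_types.shape[1], np.nan)
    for s in range(gt_types.shape[1]):
        mask = usable[:, s]
        if not mask.any():
            continue
        ratio = alt_depths[mask, s] / total[mask, s]
        q1, q3 = np.percentile(ratio, [25, 75])
        iqrs[s] = q3 - q1
    return iqrs


def het_proportion(gt_types):
    """Per-sample fraction of called (non-UNKNOWN) genotypes that are HET."""
    gt_types = np.asarray(gt_types)
    n_called = (gt_types != UNKNOWN).sum(axis=0)
    n_het = (gt_types == HET).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return n_het / n_called  # NaN for samples with no calls


if __name__ == "__main__":
    gts = np.full((4, 2), HET)  # 4 sites, 2 samples, every call HET
    ref = np.array([[10, 16], [12, 10], [15, 4], [20, 10]])
    alt = np.array([[10, 4], [12, 10], [15, 16], [20, 10]])
    # Sample 0: every ratio is 0.5, so its IQR is 0.
    # Sample 1: ratios 0.2, 0.5, 0.8, 0.5, so its IQR is about 0.15.
    print(het_ab_iqr(gts, ref, alt))
    print(het_proportion(gts))
```

The IQR is computed per sample in a loop because each sample keeps a different number of usable het calls, so the ratios do not form a rectangular array.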