update docs

brentp · Jan 12, 2016 · e4ad650
1 parent 40a8677
commit e4ad650

Showing 2 changed files with 23 additions and 13 deletions.
diff --git a/docs/index.rst b/docs/index.rst
@@ -9,17 +9,15 @@ in `.ped` files and the relationships inferred from a
 corresponding `.vcf` file such as can occur from sample-swaps 
 or pedigree misspecifications.
 
-The code to do this is quite simple. Below, we check for pedigree
+The code to do this is quite simple. But we can automate using the command-line.
+Below, we check for pedigree
 violations by looking at 5,000 sites (see :doc:`relatedness <relatedness>`
 for more details on selection) and for sex discrepancies by looking at the
 non-PA regions of the X chromosome where males should have very few HET calls.
 
-.. code-block:: python
+.. code-block::
 
-    from pedagree import Ped
-    p = Ped('ceph1463.ped')
-    ped_df = p.ped_check('ceph1463.vcf.gz', plot='ped-check.png')
-    sex_df = p.sex_check('ceph1463.vcf.gz', plot='sex-check.png', cutoff=0.15)
+    python -m pedagree --plot --prefix ceph-1463 ceph1463.vcf.gz ceph1463.ped
 
 This will create the images:
 
@@ -38,12 +36,7 @@ From both of these cases, we can see that it doesn't look like there are any
 sample mixups. See the docs here for an example of how a sample mixup appears.
 
 
-Both of these commands also create pandas dataframes that can be saved to a file with:
-
-.. code-block:: python
-
-    ped_df.to_csv('ped-check.tsv', sep="\t", index=False)
-    sex_df.to_csv('sex-check.tsv', sep="\t", index=False)
+For each of those images, there is a corresponding `.csv` file.
 
 The `sex-check` file will look like::
 

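The sex check described in this file relies on males having very few HET calls in the non-PA regions of the X chromosome, and the removed Python example passed `cutoff=0.15`. As a rough illustration of that idea only, the sketch below flags samples whose recorded sex disagrees with their het proportion. The function name, the tuple layout, and the sample values are hypothetical and are not taken from pedagree; how pedagree applies its cutoff internally is not shown in this diff.

```python
# Hypothetical helper, not part of pedagree: flag samples whose sex in the
# ped file disagrees with the proportion of het calls on the non-PA X regions.

def flag_sex_discrepancies(samples, cutoff=0.15):
    """Return the ids of samples whose ped sex disagrees with their het ratio.

    `samples` is an iterable of (sample_id, ped_sex, het_ratio) tuples, where
    ped_sex is "male" or "female" and het_ratio is the fraction of called
    X genotypes (outside the PA regions) that are heterozygous.
    """
    flagged = []
    for sample_id, ped_sex, het_ratio in samples:
        # Assumption: a het ratio below the cutoff is treated as male.
        inferred = "male" if het_ratio < cutoff else "female"
        if inferred != ped_sex:
            flagged.append(sample_id)
    return flagged


# Made-up values for illustration only.
example = [
    ("s1", "male", 0.02),
    ("s2", "female", 0.30),
    ("s3", "male", 0.28),  # recorded as male but het-rich: a possible swap
]
print(flag_sex_discrepancies(example))
```

A flagged sample is only a candidate for a swap or a pedigree error; the plots produced by the command above remain the place to confirm it.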
diff --git a/docs/relatedness.rst → docs/qc.rst b/docs/relatedness.rst → docs/qc.rst
@@ -1,5 +1,8 @@
+QC
+==
+
 relatedness calculations
-========================
+------------------------
 
 Using cyvcf2, we can quickly calculate relatedness using the method
 described in http://www.nature.com/ng/journal/v42/n7/full/ng.608.html in
@@ -37,3 +40,17 @@ have more heterozygotes. With that in mind, we can find sample swaps
 that involve sex by observing the proportion of heterozygote calls.
 If a sample is indicated to be male by the ped file, it should have
 a low value for the proportion of het calls.
+
+het QC
+------
+
+We also check that het-calls in general have an alternate count that is 
+about 50% of the total reads. This only makes sense for germline variant
+calling but is useful for finding contamination. The actual metric is the
+interquartile range of the alternate ratio. For perfect calls, they should
+all be exactly 0.5 so the range will be 0. With contamination, there will
+be much more of a range around 0.5.
+
+We can also check the proportion of heterozygote calls. In a contaminated
+sample the number of het calls will be much higher.
+
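The het QC metric added in this hunk, the interquartile range of the alternate ratio at het calls, can be sketched in a few lines. This is a minimal illustration using only the standard library, assuming per-site (ref depth, alt depth) pairs are already available for one sample; the function name and the input layout are hypothetical, and pedagree's own implementation may compute the quartiles differently.

```python
from statistics import quantiles


def alt_ratio_iqr(het_calls):
    """Interquartile range of alt / (ref + alt) over heterozygous calls.

    `het_calls` is an iterable of (ref_depth, alt_depth) pairs, one per
    heterozygous site in a single sample (hypothetical input layout).
    """
    # Skip sites with no reads to avoid dividing by zero.
    ratios = [alt / (ref + alt) for ref, alt in het_calls if ref + alt > 0]
    if len(ratios) < 2:
        raise ValueError("need at least two het calls with coverage")
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(ratios, n=4, method="inclusive")
    return q3 - q1


# Made-up depths for illustration only.
clean = [(10, 10)] * 8  # every het call has exactly half alt reads
noisy = [(18, 2), (15, 5), (10, 10), (5, 15), (12, 8), (16, 4)]

print(alt_ratio_iqr(clean))  # all ratios are 0.5, so the range is 0
print(alt_ratio_iqr(noisy))  # ratios spread away from 0.5, so the range grows
```

Comparing this value across the samples in a cohort, rather than against a fixed threshold, is one way to spot the outliers that the text attributes to contamination.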