add docs on best-practices for resolving errors detected by peddy
brentp committed Nov 1, 2016
1 parent b6c73d7 commit f78724f
Showing 22 changed files with 137 additions and 44 deletions.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/html.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/output.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/qc.doctree
Binary file not shown.
Binary file added docs/_build/doctrees/resolve.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_build/html/.buildinfo
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 377ab8e42b60ca097ab33ac59cda77ea
config: e0a5a6255b38100ac5e31b14cc706c83
tags: 645f666f9bcd5a90fca523b33c5a78b7
8 changes: 5 additions & 3 deletions docs/_build/html/_sources/index.txt
@@ -14,10 +14,11 @@ The command-line usage looks like:

.. code-block:: bash

python -m peddy -p 12 --plot ceph1463.vcf.gz ceph1463.ped
python -m peddy -p 4 --plot ceph1463.vcf.gz ceph1463.ped

This will use 12 cpus to run various checks and create `ceph1463.html <_static/ceph.html>`_ which
you can open in any browser to interactively explore your data.
This will use 4 cpus to run various checks and create `ceph1463.html <_static/ceph.html>`_ which
you can open in any browser to interactively explore your data. Unless you have triple digit numbers
of samples, using more than 4 cpus will give only marginal improvement.

It will also create 4 csv files and 4 static QC plots that mirror those in the interactive html.
These will indicate:
@@ -109,3 +110,4 @@ See `output` for a description of the columns.

output
html
resolve
6 changes: 3 additions & 3 deletions docs/_build/html/_sources/qc.txt
@@ -15,13 +15,13 @@ have fast methods to do this on the entire genome.
The limitation of these methods is that they assume the average pair of samples
is unrelated.

In `peddy`, we use about 5,000 variants described in http://www.nature.com/nature/journal/v506/n7487/full/nature12975.html
In `peddy`, we use about 25,000 variants described in http://www.nature.com/nature/journal/v506/n7487/full/nature12975.html
that are known to be targeted by most exome platforms, in Hardy-Weinberg equilibrium in 1000 Genomes,
and mostly unlinked.

When a user requests to calculate relatedness, we use those 5K sites and
When a user requests to calculate relatedness, we use those 25K sites and
the genotypes from the 2504 1KG samples to provide a background of samples so
that most samples are indeed unrelated. Since we are sampling on 5K sites,
that most samples are indeed unrelated. Since we are sampling on 25K sites,
the calculations are quite fast (~5 minutes) and match very well what
we get from a whole-genome scan because of the properties of those sites.
Though we use the additional 2504 1KG samples internally, only the information
40 changes: 40 additions & 0 deletions docs/_build/html/_sources/resolve.txt
@@ -0,0 +1,40 @@
.. _output:

Error Resolution
================

Once `peddy` finds errors, the user must decide whether to discard the bad samples or
to resolve the errors. Deciding how to resolve the errors can be difficult. Here
we enumerate some observations and strategies for doing this.

In our experience a general strategy to follow is this:

1) Look for samples that are outliers in the depth vs heterozygosity plot. If the sample appears
as an outlier there, it is also likely to appear aberrant in the sex plot and the relatedness plot.
If the heterozygosity is too high, the sample will need to be discarded as it likely has
contamination. If it's too low, the researcher should consider whether the sample could come from consanguineous parents.

2) Look for samples that are in obvious error in the sex plot. If a sample is an outlier in the sex plot,
it is most likely a swap involving samples of different sex, a mis-representation in the PED file,
or a sample with a sex-chromosome aneuploidy, such as a missing X chromosome in Turner syndrome or an additional one in Klinefelter syndrome.

3) Look at the relatedness plot with aberrant samples from the above 2 checks in mind. If we have seen
2 parents from a trio that both have a reported sex that doesn't match their genotypes, we can be
quite sure that either the samples have been swapped or the ped file has swapped their names.

4) Look for a single point of a different color in a cluster of other colors. E.g. a blue point (indicating
that the sample is unrelated according to the pedigree file) clustering with a group of green triangles
(indicating sib-sib pairs) is often a case where the parents of actual siblings have not been specified
in the ped file. The solution for this is to add matching parental ids to the ped file.

Other cases like this are also fairly common in our experience, where, for example, a parental id was
mis-specified and is therefore reported as unrelated to the kid by the ped file.

5) For large families, a single sample swap can affect many relations. Hovering over points in the relatedness
plot that are out-of-place will reveal a single (or few) samples that are consistently involved in the
outlier pairs. Once that sample is identified, it can be removed or the pedigree file can be adjusted if
possible.

6) For large cohorts with many problematic samples or relationships, the user can limit the view to a selected
family by typing a family id in the box below the family id column. Resolving one family at a time as
described above can be a way to iteratively pare down errors.
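
The idea in step 5 can be sketched in a few lines of Python. This is only an illustration and not part of `peddy`: the function name ``rank_suspect_samples`` and the sample ids are hypothetical, and the list of out-of-place pairs is assumed to have been collected by hand (for example while hovering over points in the relatedness plot).

```python
from collections import Counter


def rank_suspect_samples(outlier_pairs):
    """Count how often each sample occurs among out-of-place pairs.

    `outlier_pairs` is an iterable of (sample_a, sample_b) tuples.
    Returns (sample_id, count) tuples, most frequent first.
    """
    counts = Counter()
    for sample_a, sample_b in outlier_pairs:
        counts[sample_a] += 1
        counts[sample_b] += 1
    return counts.most_common()


# Hypothetical pairs noted from the relatedness plot (not real data).
pairs = [("kid1", "dad"), ("kid1", "mom"), ("kid1", "sib2"), ("sib2", "aunt")]
ranked = rank_suspect_samples(pairs)
for sample_id, n_pairs in ranked:
    print(sample_id, n_pairs)
```

A sample that tops this ranking by a wide margin is the natural first candidate to inspect, remove, or correct in the ped file. Re-running the checks afterwards shows whether the remaining outlier pairs disappear.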
3 changes: 2 additions & 1 deletion docs/_build/html/_static/ceph.html
@@ -66,7 +66,7 @@
</span>

<div class="container" style="width:98%">
<table id="pedigree" class="stripe row-border" cellspacing="0"></table>
<table id="pedigree" class="cchild stripe row-border" cellspacing="0"></table>
</div>

<div id="pcaplot"> </div>
@@ -214,6 +214,7 @@
titlefont: {size: 19},
showline: false,
tickfont: {size: 12},
range: [0.34, 0.46],
},
legend: {x: 0.7, y: 0.7, bgcolor: '#EEE'},
margin: { t: 30 },
8 changes: 4 additions & 4 deletions docs/_build/html/genindex.html
@@ -7,15 +7,15 @@
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<title>Index &mdash; peddy 0.1.9 documentation</title>
<title>Index &mdash; peddy 0.2.7 documentation</title>

<link rel="stylesheet" href="_static/haiku.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />

<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '0.1.9',
VERSION: '0.2.7',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
@@ -24,11 +24,11 @@
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="peddy 0.1.9 documentation" href="index.html" />
<link rel="top" title="peddy 0.2.7 documentation" href="index.html" />
</head>
<body role="document">
<div class="header" role="banner"><h1 class="heading"><a href="index.html">
<span>peddy 0.1.9 documentation</span></a></h1>
<span>peddy 0.2.7 documentation</span></a></h1>
<h2 class="heading"><span>Index</span></h2>
</div>
<div class="topnav" role="navigation" aria-label="top navigation">
16 changes: 11 additions & 5 deletions docs/_build/html/html.html
@@ -6,15 +6,15 @@
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<title>Example HTML &mdash; peddy 0.1.9 documentation</title>
<title>Example HTML &mdash; peddy 0.2.7 documentation</title>

<link rel="stylesheet" href="_static/haiku.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />

<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '0.1.9',
VERSION: '0.2.7',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
@@ -23,12 +23,13 @@
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="peddy 0.1.9 documentation" href="index.html" />
<link rel="top" title="peddy 0.2.7 documentation" href="index.html" />
<link rel="next" title="Error Resolution" href="resolve.html" />
<link rel="prev" title="CSV Output" href="output.html" />
</head>
<body role="document">
<div class="header" role="banner"><h1 class="heading"><a href="index.html">
<span>peddy 0.1.9 documentation</span></a></h1>
<span>peddy 0.2.7 documentation</span></a></h1>
<h2 class="heading"><span>Example HTML</span></h2>
</div>
<div class="topnav" role="navigation" aria-label="top navigation">
@@ -37,6 +38,8 @@ <h2 class="heading"><span>Example HTML</span></h2>
«&#160;&#160;<a href="output.html">CSV Output</a>
&#160;&#160;::&#160;&#160;
<a class="uplink" href="index.html">Contents</a>
&#160;&#160;::&#160;&#160;
<a href="resolve.html">Error Resolution</a>&#160;&#160;»
</p>

</div>
@@ -114,7 +117,7 @@ <h1>Example HTML<a class="headerlink" href="#example-html" title="Permalink to t
</span>

<div class="container" style="width:98%">
<table id="pedigree" class="stripe row-border" cellspacing="0"></table>
<table id="pedigree" class="cchild stripe row-border" cellspacing="0"></table>
</div>

<div id="pcaplot"> </div>
@@ -262,6 +265,7 @@ <h1>Example HTML<a class="headerlink" href="#example-html" title="Permalink to t
titlefont: {size: 19},
showline: false,
tickfont: {size: 12},
range: [0.34, 0.46],
},
legend: {x: 0.7, y: 0.7, bgcolor: '#EEE'},
margin: { t: 30 },
@@ -689,6 +693,8 @@ <h1>Example HTML<a class="headerlink" href="#example-html" title="Permalink to t
«&#160;&#160;<a href="output.html">CSV Output</a>
&#160;&#160;::&#160;&#160;
<a class="uplink" href="index.html">Contents</a>
&#160;&#160;::&#160;&#160;
<a href="resolve.html">Error Resolution</a>&#160;&#160;»
</p>

</div>
18 changes: 10 additions & 8 deletions docs/_build/html/index.html
@@ -6,15 +6,15 @@
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<title>manipulation, validation and exploration of pedigrees &mdash; peddy 0.1.9 documentation</title>
<title>manipulation, validation and exploration of pedigrees &mdash; peddy 0.2.7 documentation</title>

<link rel="stylesheet" href="_static/haiku.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />

<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '0.1.9',
VERSION: '0.2.7',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
@@ -23,12 +23,12 @@
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="peddy 0.1.9 documentation" href="#" />
<link rel="top" title="peddy 0.2.7 documentation" href="#" />
<link rel="next" title="CSV Output" href="output.html" />
</head>
<body role="document">
<div class="header" role="banner"><h1 class="heading"><a href="#">
<span>peddy 0.1.9 documentation</span></a></h1>
<span>peddy 0.2.7 documentation</span></a></h1>
<h2 class="heading"><span>manipulation, validation and exploration of pedigrees</span></h2>
</div>
<div class="topnav" role="navigation" aria-label="top navigation">
@@ -50,11 +50,12 @@ <h1>manipulation, validation and exploration of pedigrees<a class="headerlink" h
<p>It samples the VCF at about 25000 sites (plus chrX) to accurately estimate <strong>relatedness</strong>, <strong>IBS0</strong>, <strong>heterozygosity</strong>, <strong>sex</strong> and <strong>ancestry</strong>. It uses the 2504 samples from the 1000 Genomes Project as a background to calibrate the relatedness calculation and to make ancestry predictions.</p>
<p>It does this very quickly by sampling, by using C for computationally intensive parts, and by parallelization.</p>
<p>The command-line usage looks like:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>python -m peddy -p <span class="m">12</span> --plot ceph1463.vcf.gz ceph1463.ped
<div class="highlight-bash"><div class="highlight"><pre><span></span>python -m peddy -p <span class="m">4</span> --plot ceph1463.vcf.gz ceph1463.ped
</pre></div>
</div>
<p>This will use 12 cpus to run various checks and create <a class="reference external" href="_static/ceph.html">ceph1463.html</a> which
you can open in any browser to interactively explore your data.</p>
<p>This will use 4 cpus to run various checks and create <a class="reference external" href="_static/ceph.html">ceph1463.html</a> which
you can open in any browser to interactively explore your data. Unless you have triple digit numbers
of samples, using more than 4 cpus will give only marginal improvement.</p>
<p>It will also create 4 csv files and 4 static QC plots that mirror those in the interactive html.
These will indicate:</p>
<ul class="simple">
@@ -66,7 +67,7 @@ <h1>manipulation, validation and exploration of pedigrees<a class="headerlink" h
<p>Finally, it will create a new ped file <cite>ceph1463.peddy.ped</cite> that also lists
the most useful columns from the <cite>het-check</cite> and <cite>sex-check</cite>. Users can <strong>first
look at this extended ped file for an overview of likely problems</strong>.</p>
<p>The columns in the CSV output are documented in <a class="reference internal" href="output.html#output"><span class="std std-ref">CSV Output</span></a></p>
<p>The columns in the CSV output are documented in <a class="reference internal" href="resolve.html#output"><span class="std std-ref">Error Resolution</span></a></p>
<div class="section" id="static-images">
<h2>Static Images<a class="headerlink" href="#static-images" title="Permalink to this headline"></a></h2>
<p>This will create a number of images:</p>
@@ -120,6 +121,7 @@ <h3>CSVs<a class="headerlink" href="#csvs" title="Permalink to this headline">¶
<ul>
<li class="toctree-l1"><a class="reference internal" href="output.html">CSV Output</a></li>
<li class="toctree-l1"><a class="reference internal" href="html.html">Example HTML</a></li>
<li class="toctree-l1"><a class="reference internal" href="resolve.html">Error Resolution</a></li>
</ul>
</div>
</div>
Binary file modified docs/_build/html/objects.inv
Binary file not shown.
14 changes: 7 additions & 7 deletions docs/_build/html/qc.html
@@ -6,15 +6,15 @@
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<title>QC &mdash; peddy 0.1.9 documentation</title>
<title>QC &mdash; peddy 0.2.7 documentation</title>

<link rel="stylesheet" href="_static/haiku.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />

<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '0.1.9',
VERSION: '0.2.7',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
@@ -23,11 +23,11 @@
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="peddy 0.1.9 documentation" href="index.html" />
<link rel="top" title="peddy 0.2.7 documentation" href="index.html" />
</head>
<body role="document">
<div class="header" role="banner"><h1 class="heading"><a href="index.html">
<span>peddy 0.1.9 documentation</span></a></h1>
<span>peddy 0.2.7 documentation</span></a></h1>
<h2 class="heading"><span>QC</span></h2>
</div>
<div class="topnav" role="navigation" aria-label="top navigation">
@@ -52,12 +52,12 @@ <h2>relatedness calculations<a class="headerlink" href="#relatedness-calculation
have fast methods to do this on the entire genome.</p>
<p>The limitation of these methods is that they assume the average pair of samples
is unrelated.</p>
<p>In <cite>peddy</cite>, we use about 5,000 variants described in <a class="reference external" href="http://www.nature.com/nature/journal/v506/n7487/full/nature12975.html">http://www.nature.com/nature/journal/v506/n7487/full/nature12975.html</a>
<p>In <cite>peddy</cite>, we use about 25,000 variants described in <a class="reference external" href="http://www.nature.com/nature/journal/v506/n7487/full/nature12975.html">http://www.nature.com/nature/journal/v506/n7487/full/nature12975.html</a>
that are known to be targeted by most exome platforms, in Hardy-Weinberg equilibrium in 1000 Genomes,
and mostly unlinked.</p>
<p>When a user requests to calculate relatedness, we use those 5K sites and
<p>When a user requests to calculate relatedness, we use those 25K sites and
the genotypes from the 2504 1KG samples to provide a background of samples so
that most samples are indeed unrelated. Since we are sampling on 5K sites,
that most samples are indeed unrelated. Since we are sampling on 25K sites,
the calculations are quite fast (~5 minutes) and match very well what
we get from a whole-genome scan because of the properties of those sites.
Though we use the additional 2504 1KG samples internally, only the information
8 changes: 4 additions & 4 deletions docs/_build/html/search.html
@@ -6,15 +6,15 @@
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<title>Search &mdash; peddy 0.1.9 documentation</title>
<title>Search &mdash; peddy 0.2.7 documentation</title>

<link rel="stylesheet" href="_static/haiku.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />

<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '0.1.9',
VERSION: '0.2.7',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
@@ -24,7 +24,7 @@
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/searchtools.js"></script>
<link rel="top" title="peddy 0.1.9 documentation" href="index.html" />
<link rel="top" title="peddy 0.2.7 documentation" href="index.html" />
<script type="text/javascript">
jQuery(function() { Search.loadIndex("searchindex.js"); });
</script>
@@ -35,7 +35,7 @@
</head>
<body role="document">
<div class="header" role="banner"><h1 class="heading"><a href="index.html">
<span>peddy 0.1.9 documentation</span></a></h1>
<span>peddy 0.2.7 documentation</span></a></h1>
<h2 class="heading"><span>Search</span></h2>
</div>
<div class="topnav" role="navigation" aria-label="top navigation">