Additional work on BOSC 2013 talk: full initial set of slides

1 parent 0adeef5 commit 17968c54c5d3c41d99a4f12f335a3dfa5c85adaf @chapmanb committed Jul 15, 2013
@@ -2,7 +2,7 @@
<!doctype html>
<html lang="en">
<head>
-<meta charset="utf-8"><title>(Variant calling pipelines: quantification, scaling and community development)</title>
+<meta charset="utf-8"><title>(Community developed variant calling pipelines)</title>
<meta name="author" content="(Brad Chapman)"/>
<link rel="stylesheet" href="../reveal.js/css/reveal.min.css">
<link rel="stylesheet" href="../reveal.js/css/theme/simple.css" id="theme">
@@ -13,7 +13,7 @@
<div class="reveal">
<div class="slides">
<section>
-<h3>Variant calling pipelines: quantification, scaling and community development</h3>
+<h3>Community developed variant calling pipelines</h3>
<h4>Brad Chapman</h4>
<h4>Bioinformatics Core, Harvard School of Public Health</h4>
<h4><a href='https://github.com/chapmanb'>@chapmanb</a></h4>
@@ -85,15 +85,17 @@
<h2>Development goals</h2>
<ul class="org-ul">
-<li>Quantifiable: assess variant quality
+<li>Quantifiable
</li>
-<li>Scalable: 1500 whole genome samples
+<li>Analyzable
</li>
-<li>Reproducible: text-configurable, provenance, version tracking
+<li>Scalable
</li>
-<li>Community developed: open source, documented and widely deployable
+<li>Reproducible
</li>
-<li>Accessible: Usable by researchers and non-scientists
+<li>Community developed
+</li>
+<li>Accessible
</li>
</ul>
</section>
@@ -110,11 +112,49 @@
<li>Quantification details: <a href="http://j.mp/bcbioeval">http://j.mp/bcbioeval</a>
</li>
</ul>
+
+</section>
+<section id="sec-5-1" >
+
+<h3>Known unknowns</h3>
+<ul class="org-ul">
+<li>Coverage: summarize what you can't assess
+</li>
+<li>Structural: large, complex rearrangements
+</li>
+</ul>
</section>
</section>
<section>
<section id="sec-6" >
+<h2>Analysis</h2>
+</section>
+<section id="sec-6-1" >
+
+<h3>Query</h3>
+
+<div class="figure">
+<p><img src="./images/gemini.png" alt="gemini.png"/></p>
+</div>
+
+<p>
+<a href="https://github.com/arq5x/gemini">https://github.com/arq5x/gemini</a>
+</p>
+</section>
+<section id="sec-6-2" >
+
+<h3>Visualize</h3>
+<img src="images/o8.png" width="1000">
+
+<p>
+<a href="https://github.com/chapmanb/o8">https://github.com/chapmanb/o8</a>
+</p>
+</section>
+</section>
+<section>
+<section id="sec-7" >
+
<h2>Parallel scaling</h2>
<div class="figure">
@@ -129,21 +169,32 @@
</ul>
</section>
-<section id="sec-6-1" >
+<section id="sec-7-1" >
-<h3>Increasing parallel blocks</h3>
+<h3>Better parallel blocks</h3>
<div class="figure">
<p><img src="./images/parallel-genome.png" alt="parallel-genome.png"/></p>
</div>
</section>
</section>
<section>
-<section id="sec-7" >
+<section id="sec-8" >
<h2>Reproducibility</h2>
+<ul class="org-ul">
+<li>Express intentions at a high level
+</li>
+<li>Revision controlled configuration
+</li>
+<li>Handle complex distributed logging
+</li>
+<li>Provenance tracking
+</li>
+</ul>
+
</section>
-<section id="sec-7-1" >
+<section id="sec-8-1" >
<h3>Configuration</h3>
<div class="org-src-container">
@@ -165,21 +216,37 @@
</pre>
</div>
</section>
-<section id="sec-7-2" >
+<section id="sec-8-2" >
<h3>Provenance</h3>
+<ul class="org-ul">
+<li>Excellent logging
+</li>
+<li>Third party version tracking
+</li>
+<li>Beyond logging:
+<ul class="org-ul">
+<li>BioLite: <a href="https://bitbucket.org/caseywdunn/biolite">https://bitbucket.org/caseywdunn/biolite</a>
+</li>
+<li>Arvados: <a href="https://arvados.org/">https://arvados.org/</a>
+</li>
+</ul>
+</li>
+</ul>
</section>
</section>
<section>
-<section id="sec-8" >
+<section id="sec-9" >
<h2>Community developed</h2>
<ul class="org-ul">
<li>Fully automated installation: CloudBioLinux
</li>
-<li>Deployable on multiple clusters (LSF, SGE, Torque)
+<li>Deployable on multiple clusters (LSF, SGE, Torque&#x2026;)
+</li>
+<li>API for new aligners and variant callers
</li>
-<li>Open source and documented
+<li>Open source, hackable and documented
</li>
</ul>
@@ -188,28 +255,67 @@
</p>
</section>
-<section id="sec-8-1" >
+<section id="sec-9-1" >
<h3>Automated installation</h3>
+<ul class="org-ul">
+<li>Single biggest software problem: running for the first time
+</li>
+<li>Bootstrap from bare machine to ready-to-go pipeline
+</li>
+<li>Builds off existing installation work: CloudBioLinux
+</li>
+<li>Provide example pipelines with real data
+</li>
+</ul>
+
+<p>
+<a href="http://cloudbiolinux.org">http://cloudbiolinux.org</a>
+</p>
+
+<p>
+<a href="https://bcbio-nextgen.readthedocs.org">https://bcbio-nextgen.readthedocs.org</a>
+</p>
</section>
</section>
<section>
-<section id="sec-9" >
+<section id="sec-10" >
<h2>Accessible</h2>
+<img src="images/dtc_genomics.jpg" width="400">
+
+<p>
+<a href="http://exploringpersonalgenomics.org/">http://exploringpersonalgenomics.org/</a>
+</p>
+
</section>
-<section id="sec-9-1" >
+<section id="sec-10-1" >
<h3>Galaxy</h3>
+<div class="figure">
+<p><img src="./images/galaxy_pipeline.png" alt="galaxy_pipeline.png"/></p>
+</div>
+
+<p>
+<a href="https://bitbucket.org/hbc/galaxy-central-hbc">https://bitbucket.org/hbc/galaxy-central-hbc</a>
+</p>
</section>
-<section id="sec-9-2" >
+<section id="sec-10-2" >
<h3>STORMSeq</h3>
+
+<div class="figure">
+<p><img src="./images/4.1_stormseq.png" alt="4.1_stormseq.png"/></p>
+</div>
+
+<p>
+<a href="http://www.stormseq.org/">http://www.stormseq.org/</a>
+</p>
</section>
</section>
<section>
-<section id="sec-10" >
+<section id="sec-11" >
<h2>Summary</h2>
<ul class="org-ul">
@@ -219,13 +325,15 @@
<ul class="org-ul">
<li>Assessing quality: good science
</li>
+<li>Analysis: enable exploration
+</li>
<li>Scalability: finish in time
</li>
<li>Reproducibility: show your work
</li>
</ul>
</li>
-<li>Make widely available
+<li>Widely accessible
</li>
</ul>
@@ -1,9 +1,9 @@
-#+title: Variant calling pipelines: quantification, scaling and community development
+#+title: Community developed variant calling pipelines
#+author: Brad Chapman
#+creator: Bioinformatics Core, Harvard School of Public Health
#+date: 20 July 2013
-#+OPTIONS: reveal_center:t reveal_progress:t reveal_history:tl reveal_control:t
+#+OPTIONS: reveal_center:t reveal_progress:t reveal_history:t reveal_control:t
#+OPTIONS: reveal_overview:t reveal_keyboard:t
#+OPTIONS: toc:nil num:nil
#+OPTIONS: reveal_width:1200 reveal_height:800
@@ -49,11 +49,12 @@
* Development goals
-- Quantifiable: assess variant quality
-- Scalable: 1500 whole genome samples
-- Reproducible: text-configurable, provenance, version tracking
-- Community developed: open source, documented and widely deployable
-- Accessible: Usable by researchers and non-scientists
+- Quantifiable
+- Analyzable
+- Scalable
+- Reproducible
+- Community developed
+- Accessible
* Quantify quality
@@ -62,19 +63,43 @@
- Reference materials: [[http://www.genomeinabottle.org/]]
- Quantification details: [[http://j.mp/bcbioeval]]
+** Known unknowns
+
+- Coverage: summarize what you can't assess
+- Structural: large, complex rearrangements
+
+* Analysis
+
+** Query
+
+[[./images/gemini.png]]
+
+[[https://github.com/arq5x/gemini]]
+
+** Visualize
+
+#+REVEAL_HTML: <img src="images/o8.png" width="1000">
+
+[[https://github.com/chapmanb/o8]]
+
* Parallel scaling
[[./images/parallel-clustertypes.png]]
- Infrastructure details: [[http://j.mp/bcbioscale]]
- IPython: [[http://ipython.org/ipython-doc/dev/parallel/index.html]]
-** Increasing parallel blocks
+** Better parallel blocks
[[./images/parallel-genome.png]]
* Reproducibility
+- Express intentions at a high level
+- Revision controlled configuration
+- Handle complex distributed logging
+- Provenance tracking
+
** Configuration
#+BEGIN_SRC yaml
@@ -96,30 +121,59 @@
** Provenance
+- Excellent logging
+- Third party version tracking
+- Beyond logging:
+ - BioLite: [[https://bitbucket.org/caseywdunn/biolite]]
+ - Arvados: [[https://arvados.org/]]
+
* Community developed
- Fully automated installation: CloudBioLinux
-- Deployable on multiple clusters (LSF, SGE, Torque)
-- Open source and documented
+- Deployable on multiple clusters (LSF, SGE, Torque...)
+- API for new aligners and variant callers
+- Open source, hackable and documented
[[https://github.com/chapmanb/bcbio-nextgen]]
** Automated installation
+- Single biggest software problem: running for the first time
+- Bootstrap from bare machine to ready-to-go pipeline
+- Builds off existing installation work: CloudBioLinux
+- Provide example pipelines with real data
+
+[[http://cloudbiolinux.org]]
+
+[[https://bcbio-nextgen.readthedocs.org]]
+
* Accessible
+#+REVEAL_HTML: <img src="images/dtc_genomics.jpg" width="400">
+
+[[http://exploringpersonalgenomics.org/]]
+
** Galaxy
+[[./images/galaxy_pipeline.png]]
+
+[[https://bitbucket.org/hbc/galaxy-central-hbc]]
+
** STORMSeq
+[[./images/4.1_stormseq.png]]
+
+[[http://www.stormseq.org/]]
+
* Summary
- Community developed pipelines > challenges
- Focus
- Assessing quality: good science
+ - Analysis: enable exploration
- Scalability: finish in time
- Reproducibility: show your work
-- Make widely available
+- Widely accessible
[[https://github.com/chapmanb/bcbio-nextgen]]