Documentation
davidealbanese committed Apr 19, 2018
1 parent 638cc6e commit c471e79
Showing 24 changed files with 770 additions and 671 deletions.
71 changes: 71 additions & 0 deletions doc/source/denoising_illumina.rst
@@ -0,0 +1,71 @@
Denoising (Illumina only)
=========================

Usually, amplicon sequences are clustered into **Operational Taxonomic Units**
(OTUs) using a similarity threshold of 97%, which represents the common working
definition of bacterial species.

Another approach consists of identifying the **Sequence Variants** (SVs, see
:doc:`/otu` for details). This approach avoids clustering sequences at a
predefined similarity threshold and usually includes a denoising algorithm in
order to identify SVs.

In this tutorial we show how to perform the denoising of Illumina overlapping
paired-end sequences in order to detect the SVs. Although this tutorial explains
how to apply the pipeline to 16S paired-end Illumina reads, it can be adapted to
Illumina single-end sequencing or to other marker genes/spacers, e.g. **Internal
Transcribed Spacer (ITS)**, **18S** or **28S**.

.. contents:: Table of Contents
   :local:

Data download and preprocessing
-------------------------------

In this tutorial we analyze the same dataset used in :doc:`/pairedend_97`. Read
merging, primer trimming and quality filtering are the same as in
:doc:`/pairedend_97`:

.. code-block:: sh

   # Download and unpack the example dataset
   wget ftp://ftp.fmach.it/metagenomics/micca/examples/garda.tar.gz
   tar -zxvf garda.tar.gz
   cd garda
   # Merge the overlapping paired-end reads
   micca mergepairs -i fastq/*_R1*.fastq -o merged.fastq -l 100 -d 30
   # Trim the forward and reverse primers
   micca trim -i merged.fastq -o trimmed.fastq -w CCTACGGGNGGCWGCAG -r GACTACNVGGGTWTCTAATCC -W -R -c
   # Quality-filter the trimmed reads
   micca filter -i trimmed.fastq -o filtered.fasta -e 0.75 -m 400

Denoising - Sequence Variants identification
--------------------------------------------

The :doc:`/commands/otu` command implements the UNOISE3 protocol
(``denovo_unoise``), which includes dereplication, denoising and chimera
filtering:

.. code-block:: sh

   micca otu -m denovo_unoise -i filtered.fasta -o denovo_unoise_otus -t 4 -c

The :doc:`/commands/otu` command returns several files in the output directory,
including the **SV table** (``otutable.txt``) and a FASTA file containing the
**representative sequences** (``otus.fasta``).
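
As a quick sanity check on these two outputs, the following shell sketch counts
the SVs in the table and the representative sequences in the FASTA file; the two
numbers would be expected to agree. It assumes the layout described in this
documentation (one header row in ``otutable.txt``, one ``>`` header line per
sequence in ``otus.fasta``). The small files written to a temporary directory
are made-up stand-ins, not real micca output.

```shell
# Made-up stand-in files; replace with denovo_unoise_otus/otutable.txt and
# denovo_unoise_otus/otus.fasta when working with real output.
workdir=$(mktemp -d)
otutable="$workdir/otutable.txt"
otus="$workdir/otus.fasta"

printf 'OTU\tMw_01\tMw_02\nDENOVO1\t151\t178\nDENOVO2\t33\t20\n' > "$otutable"
printf '>DENOVO1\nACGTACGT\n>DENOVO2\nTTGCAAGC\n' > "$otus"

# Rows in the table, excluding the header row and any empty lines.
n_table=$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$otutable")

# Header lines (starting with ">") in the FASTA file.
n_fasta=$(grep -c '^>' "$otus")

echo "SVs in table: $n_table"
echo "Representative sequences: $n_fasta"
```

If the two counts differ on real data, the table or the FASTA file was probably
edited or truncated after the ``otu`` step.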

.. Note::

   See :doc:`/otu` for how to apply the **de novo swarm**,
   **closed-reference** and **open-reference** OTU picking strategies to
   these data.

Further steps
-------------

* :ref:`pairedend_97-taxonomy`

* :ref:`pairedend_97-tree`

* :ref:`pairedend_97-biom`

* :doc:`/phyloseq`

* :doc:`/table`
4 changes: 0 additions & 4 deletions doc/source/filtering.rst

This file was deleted.

61 changes: 31 additions & 30 deletions doc/source/formats.rst
@@ -5,43 +5,38 @@ Sequence files
--------------

`FASTA <https://en.wikipedia.org/wiki/FASTA_format>`_ and `FASTQ
<https://en.wikipedia.org/wiki/FASTQ_format>`_ Sanger/Illumina 1.8+
format (phred+33) formats are supported. micca provides the
:doc:`/commands/convert` command to convert between sequence file
formats.

<https://en.wikipedia.org/wiki/FASTQ_format>`_ (Sanger/Illumina 1.8+, phred+33)
formats are supported. micca provides the :doc:`/commands/convert` command to
convert between sequence file formats.

Taxonomy files
--------------

Taxonomy files map sequence IDs to taxonomy. Input taxonomy files must
be TAB-delimited files where rows are either in the form:

#. ``SEQID[TAB]k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__;g__;``
#. ``SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales;;;``
#. ``SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales``
#. ``SEQID[TAB]D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__;D_5__;``

#. ``SEQID[TAB]k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__;g__;``
#. ``SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales;;;``
#. ``SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales``
#. ``SEQID[TAB]D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__;D_5__;``


Compatible taxonomy files are in:
Compatible taxonomy files are:

* Greengenes (http://greengenes.secondgenome.com/downloads);
* QIIME-formatted SILVA (https://www.arb-silva.de/download/archive/qiime/);
* UNITE (https://unite.ut.ee/repository.php);
* Human Oral Microbiome Database (HOMD) (http://www.homd.org/).

The output taxonomy file returned by :doc:`/commands/classify` is a
TAB-delimited file where each row is always in the format::
TAB-delimited file where each row is in the format::

SEQID[TAB]Bacteria;Firmicutes;Clostridia;Clostridiales
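
The prefixed input forms and the plain output form carry the same lineage. The
following shell sketch only illustrates that relationship and is not micca's own
parser: ``normalize_taxonomy`` is a hypothetical helper that strips rank
prefixes such as ``k__`` or ``D_0__`` and trailing empty ranks from the second
column, and the ``SEQ1``..``SEQ3`` rows are made-up input.

```shell
# Hypothetical helper, not part of micca: strip rank prefixes such as "k__" or
# "D_0__" and trailing empty ranks from the second (taxonomy) column.
normalize_taxonomy() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
    {
        tax = ";" $2                               # make every rank start with ";"
        gsub(/;(D_[0-9]+__|[a-z]__)/, ";", tax)    # drop the rank prefixes
        sub(/^;/, "", tax)                         # remove the helper ";"
        sub(/;+$/, "", tax)                        # drop trailing empty ranks
        print $1, tax
    }'
}

# Made-up rows covering three of the accepted input forms.
normalized=$(printf 'SEQ1\tk__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__;g__;\nSEQ2\tD_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__;D_5__;\nSEQ3\tBacteria;Firmicutes;Clostridia;Clostridiales;;;\n' | normalize_taxonomy)
echo "$normalized"
```

All three rows reduce to the same ``Bacteria;Firmicutes;Clostridia;Clostridiales``
lineage. Prefixes are matched only at the start of a rank, so double underscores
elsewhere in a taxon name are left untouched.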

OTU/SV tables and taxonomy tables
---------------------------------

OTU table and taxonomy tables
-----------------------------

The OTU table returned by :doc:`/commands/otu` is an OTU x sample,
TAB-delimited text file, containing the number of times an OTU is
found in each sample::
The OTU table returned by :doc:`/commands/otu` is an OTU x sample, TAB-delimited
text file, containing the number of times an OTU is found in each sample::

OTU Mw_01 Mw_02 Mw_03 ...
DENOVO1 151 178 177 ...
@@ -50,14 +45,14 @@ found in each sample::
DENOVO4 166 299 115 ...
... ... ... ... ...
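
Because the table is plain TAB-delimited text, simple summaries need nothing
more than standard shell tools. The sketch below sums each sample column to get
the number of reads assigned per sample; the three-row table is made up for
illustration and the temporary file stands in for a real ``otutable.txt``.

```shell
# Made-up example table in the OTU x sample layout shown above.
table=$(mktemp)
printf 'OTU\tMw_01\tMw_02\tMw_03\nDENOVO1\t151\t178\t177\nDENOVO2\t10\t0\t5\nDENOVO3\t166\t299\t115\n' > "$table"

# Sum every sample column; the first row holds the sample names.
totals=$(awk -F '\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    { for (i = 2; i <= ncol; i++) total[i] += $i }
    END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], total[i] }
' "$table")
echo "$totals"
```

Per-sample totals of this kind are a convenient way to spot samples with very
few reads before any downstream analysis.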

The :doc:`/commands/tabletotax` command returns the "taxonomy tables"
for each taxonomic level, e.g.::
The :doc:`/commands/tabletotax` command returns the "taxonomy tables" for each
taxonomic level, e.g.::

OTU Mw_01 Mw_02 Mw_03 ...
Bacteria;Bacteroidetes 1363 1543 1168 ...
Bacteria;Bacteroidetes 1363 1543 1168 ...
Bacteria;Cyanobacteria/Chloroplast 0 0 0 ...
Bacteria;Firmicutes 6257 5780 6761 ...
Bacteria;Lentisphaerae 0 1 0 ...
Bacteria;Lentisphaerae 0 1 0 ...
... ... ... ... ...


@@ -66,13 +61,12 @@ for each taxonomic level, e.g.::
Sample data
-----------

The sample data file contains all of the information about the
samples. In QIIME this file is called `Mapping File
<http://qiime.org/tutorials/tutorial.html#mapping-file-tab-delimited-txt>`_.
In micca, the sample data file must be a TAB-delimited text file (a
row for each sample). The first column must be the sample identifier
(assigned in :doc:`/commands/merge`, :doc:`/commands/split` or
:doc:`/commands/mergepairs`)::
The sample data file contains all of the information about the samples. In QIIME
this file is called `Mapping File
<http://qiime.org/tutorials/tutorial.html#mapping-file-tab-delimited-txt>`_. In
micca, the sample data file must be a TAB-delimited text file (a row for each
sample). The first column must be the sample identifier (assigned in
:doc:`/commands/merge`, :doc:`/commands/split` or :doc:`/commands/mergepairs`)::

ID Group Altitude
Mw_01 Mw1 492
@@ -87,3 +81,10 @@ Phylogenetic tree

Only the `Newick format <https://en.wikipedia.org/wiki/Newick_format>`_ is
supported.

BIOM file
---------

The :doc:`/commands/tobiom` command generates OTU/SV tables in the biom version
1.0 JSON file format
(http://biom-format.org/documentation/format_versions/biom-1.0.html).
Binary file removed doc/source/images/alpha454.png
Binary file not shown.
Binary file removed doc/source/images/beta454.png
Binary file not shown.
Binary file removed doc/source/images/filterstatspaired.png
Binary file not shown.
Binary file added doc/source/images/garda_alpha.png
Binary file added doc/source/images/garda_beta.png
Binary file added doc/source/images/garda_stats_plot.png
Binary file added doc/source/images/garda_stats_qualsumm_plot.png
Binary file added doc/source/images/garda_taxtable2.png
Binary file removed doc/source/images/rarecurve.png
Binary file not shown.
Binary file removed doc/source/images/taxtable.png
Binary file not shown.
5 changes: 2 additions & 3 deletions doc/source/index.rst
@@ -10,15 +10,15 @@
:caption: Getting Started

install
run
databases

.. toctree::
:maxdepth: 1
:caption: Tutorials

pairedend_97
denoising_illumina
singleend
pairedend
phyloseq
table
picrust
@@ -27,7 +27,6 @@
:maxdepth: 1
:caption: In Depth

filtering
otu
formats
changes
10 changes: 5 additions & 5 deletions doc/source/install.rst
@@ -48,6 +48,8 @@ which all the software has already been installed, configured and tested.
Using pip
---------

At the moment, only Python 2.7 is supported.

On Ubuntu >= 12.04 and Debian >=7
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -182,11 +184,9 @@ Testing the installation
Install RDP classifier (optional)
---------------------------------

The RDP Classifier is a naive bayesian classifier for
taxonomic assignments
(http://sourceforge.net/projects/rdp-classifier/). The RDP classifier
can be used in the :doc:`/commands/classify` command (option
``-m/--method rdp``).
The RDP Classifier is a naive Bayesian classifier for taxonomic assignments
(http://sourceforge.net/projects/rdp-classifier/). The RDP classifier can be
used in the :doc:`/commands/classify` command (option ``-m/--method rdp``).

.. warning::

