Denoising (Illumina only)

Usually, amplicon sequences are clustered into Operational Taxonomic Units (OTUs) using a similarity threshold of 97%, which represents the common working definition of bacterial species.

Another approach is to identify the Sequence Variants (SVs, see /otu for details). This approach avoids clustering sequences at a predefined similarity threshold and usually relies on a denoising algorithm to identify the SVs.

In this tutorial we show how to denoise overlapping Illumina paired-end sequences in order to detect the SVs. Although this tutorial explains how to apply the pipeline to 16S paired-end Illumina reads, it can be adapted to Illumina single-end sequencing or to other marker genes/spacers, e.g. Internal Transcribed Spacer (ITS), 18S or 28S.

Data download and preprocessing

In this tutorial we analyze the same dataset used in /pairedend_97. Read merging, primer trimming and quality filtering are the same as in /pairedend_97:

wget ftp://ftp.fmach.it/metagenomics/micca/examples/garda.tar.gz
tar -zxvf garda.tar.gz
cd garda

micca mergepairs -i fastq/*_R1*.fastq -o merged.fastq -l 100 -d 30
micca trim -i merged.fastq -o trimmed.fastq -w CCTACGGGNGGCWGCAG -r GACTACNVGGGTWTCTAATCC -W -R -c
micca filter -i trimmed.fastq -o filtered.fasta -e 0.75 -m 400
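
As a quick sanity check after these three steps, it can help to count how many reads survive each stage. The sketch below is not part of micca: `count_fastq` and `count_fasta` are hypothetical helper names, and it assumes four-line FASTQ records and one `>` header line per FASTA record.

```shell
#!/bin/sh
# Hypothetical helpers (not part of micca) for counting reads per stage.

# Count FASTQ records, assuming exactly four lines per record.
count_fastq() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Count FASTA records by counting header lines.
count_fasta() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Report counts only for the files that exist in the current directory.
for f in merged.fastq trimmed.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fastq "$f") reads"
    fi
done
if [ -f filtered.fasta ]; then
    echo "filtered.fasta: $(count_fasta filtered.fasta) reads"
fi
```

A large drop between two consecutive stages usually suggests that the parameters of that step deserve a second look.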

Denoising - Sequence Variants identification

The /commands/otu command implements the UNOISE3 protocol (denovo_unoise), which includes dereplication, denoising and chimera filtering:

micca otu -m denovo_unoise -i filtered.fasta -o denovo_unoise_otus -t 4 -c
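
To give a sense of the first of these stages, dereplication collapses identical reads into unique sequences, each with an abundance count. The sketch below only illustrates the idea and is not how micca implements the step: `dereplicate` is a hypothetical name, and the function assumes an unwrapped FASTA file (one sequence line per record).

```shell
#!/bin/sh
# Hypothetical illustration of dereplication (not micca's implementation).
# Prints "abundance<TAB>sequence", most abundant sequence first.
# Assumes unwrapped FASTA: each record is one header line and one sequence line.
dereplicate() {
    awk '!/^>/ && NF { count[toupper($0)]++ }
         END { for (s in count) print count[s] "\t" s }' "$1" |
        sort -k1,1nr -k2,2
}

# Show the five most abundant unique sequences, if the file is present.
if [ -f filtered.fasta ]; then
    dereplicate filtered.fasta | head -n 5
fi
```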

The /commands/otu command returns several files in the output directory, including the SV table (otutable.txt) and a FASTA file containing the representative sequences (otus.fasta).
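
The two output files can be inspected with standard tools. The sketch below assumes that otutable.txt is tab-delimited, with a header row of sample names and the SV identifiers in the first column; `sample_totals` is a hypothetical helper, not a micca command.

```shell
#!/bin/sh
# Hypothetical helper (not part of micca): sum the counts in each sample column.
# Assumes a tab-delimited table with a header row and SV identifiers in column 1.
sample_totals() {
    awk -F '\t' '
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        { for (i = 2; i <= ncol; i++) total[i] += $i }
        END { for (i = 2; i <= ncol; i++) print name[i] "\t" (total[i] + 0) }
    ' "$1"
}

# Run only if the output directory from the command above is present.
if [ -d denovo_unoise_otus ]; then
    # Number of SVs, i.e. number of representative sequences
    grep -c '^>' denovo_unoise_otus/otus.fasta
    # Total number of reads assigned to SVs in each sample
    sample_totals denovo_unoise_otus/otutable.txt
fi
```

Per-sample totals are useful later on, for instance when choosing a rarefaction depth.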

Note

See /otu for how to apply the de novo swarm, closed-reference and open-reference OTU picking strategies to these data.

Further steps

  • pairedend_97-taxonomy
  • pairedend_97-tree
  • pairedend_97-biom
  • /phyloseq
  • /table