# Module: DADA2 Denoising in QIIME2

This notebook is a guide to processing **raw paired-end demultiplexed reads** in QIIME2. It covers quality checking of the raw reads, primer trimming, and denoising.

This module was built with the following as the main references: [LangilleLab SOP](https://github.com/LangilleLab/microbiome_helper/wiki/Amplicon-SOP-v2-(qiime2-2020.8)), ["Moving pictures" Tutorial](https://docs.qiime2.org/2024.10/tutorials/moving-pictures/), and ["Atacama soil microbiome" tutorial](https://docs.qiime2.org/2024.10/tutorials/atacama-soils/).

Created by: _Microbial Oceanography Laboratory (MOLab)_

---
## How to Use This Notebook

1. Activate the conda environment in a terminal window. Change the environment name to match your own installation.
```bash
conda activate qiime2-2023.2
```
2. Open jupyter notebook with the command below and select the notebook.
```bash
jupyter notebook
```
3. To run the cells in this notebook, press Shift+Enter.

---
## Tools Used
1. **QIIME 2 Amplicon Distribution**
    - Installation procedure can be found here: [QIIME2 native installation](https://docs.qiime2.org/2024.10/install/native/)

---
## Starting Files 

1. Paired-end demultiplexed FASTQ dataset imported as a QIIME2 artifact (filename: `seqs.qza`, location: `0-raw-sequences`)
2. Directories to organize the files. Run the command below:

In [None]:
# -p avoids an error if a directory (e.g. 0-raw-sequences) already exists
!mkdir -p \
0-raw-sequences \
1-cleanup

---
## Expected Outputs

1. `.qza` of type `FeatureTable[Frequency]`
2. `.qza` of type `FeatureData[Sequence]`
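
Once the notebook has been run, the semantic types of the two outputs can be double-checked. The sketch below assumes that a `.qza` file is a zip archive with a top-level `<uuid>/metadata.yaml` containing a `type:` line; it reads that field with the standard library only. The helper names `peek_semantic_type` and `check_outputs` are hypothetical, and the file paths are the ones used later in this notebook. Inside a QIIME2 environment, `qiime tools peek` reports the same information. Call `check_outputs()` only after the denoising step has produced the files.

```python
import zipfile


def peek_semantic_type(archive_path):
    """Return the `type:` value from the top-level metadata.yaml of a .qza/.qzv."""
    with zipfile.ZipFile(archive_path) as zf:
        # Assumed layout: <uuid>/metadata.yaml directly under the archive root.
        candidates = [
            name for name in zf.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError(f"no top-level metadata.yaml in {archive_path}")
        text = zf.read(candidates[0]).decode("utf-8")
    for line in text.splitlines():
        if line.startswith("type:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    raise ValueError("metadata.yaml has no 'type:' field")


# Output paths as written by the denoising step of this notebook.
EXPECTED_TYPES = {
    "1-cleanup/2-table-dada2.qza": "FeatureTable[Frequency]",
    "1-cleanup/2-rep-seqs-dada2.qza": "FeatureData[Sequence]",
}


def check_outputs(expected=EXPECTED_TYPES):
    """Map each path to True/False depending on whether its type matches."""
    return {
        path: peek_semantic_type(path) == wanted
        for path, wanted in expected.items()
    }
```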

---
## Table of Contents
 * [**Data Processing**](#Data-Processing)  
     * [Inspecting raw data](#Inspecting-raw-data)
     * [Trimming primers](#Trim-primers)
     * [Denoising](#Denoising-with-DADA2)
---

# <font color = 'gray'>Data Processing</font>

This stage involves the following steps:
1. Inspecting raw data
2. Trimming primers
3. Denoising with DADA2

The DADA2 workflow wraps quality filtering, dereplication, read merging, and chimera filtering in a single command. Unlike the OTU clustering workflow, it does not need a separate command for each of these steps.

### Inspecting raw data

Our sequences are already *demultiplexed*, meaning they are already separated by sample, so we can go straight to the `demux` plugin to summarize and visualize them. **QIIME2 visualizations** have the extension `.qzv`. The `.qzv` files can be viewed at https://view.qiime2.org, or we can import the `qiime2` module to view them inline.

In [None]:
# Make summary of the QIIME2 artifact (.qza file)
!qiime demux summarize \
    --i-data 0-raw-sequences/seqs.qza \
    --p-n 100000 \
    --o-visualization 0-raw-sequences/seqs.qzv

In [None]:
import qiime2 as q2
# Visualize
q2.Visualization.load('0-raw-sequences/seqs.qzv')

### Trim primers
To remove the primers from our sequences, we use the `cutadapt` plugin. The primers used were E572F/E1009R, which are <b>18 bp</b> and <b>20 bp</b> long, respectively. Removing the primers is especially important when they contain ambiguous bases, which might be mistaken for chimeric or low-quality positions. You can explore the primer sequences, lengths, and predicted amplicon sizes in this excellent app: <a href="https://app.pr2-primers.org/">PR-2 Primers</a>.

<div class="alert alert-block alert-info">
<b>Note:</b> 
    
If you are not using the E572F/E1009R primer pairs, you must replace the sequences indicated in the <code>--p-front-f</code> and <code>--p-front-r</code> options.
</div>

<div class="alert alert-block alert-info">
<b>Tip:</b> 
    
Inspect the standard output of the <code>cutadapt trim-paired</code> command. Look for anything unusual and adjust the options accordingly. For instance, if a large fraction of reads is discarded, you can either increase <code>--p-error-rate</code> or omit <code>--p-discard-untrimmed</code> (although you may end up with lower-quality sequences).
</div>
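
Before trimming, it can help to confirm the primer lengths and see how many concrete sequences each degenerate primer stands for. The sketch below uses the standard IUPAC nucleotide codes and the two primer sequences passed to `--p-front-f` and `--p-front-r` in this notebook. The helper names `primer_to_regex` and `count_variants` are hypothetical, and the sketch is independent of how cutadapt itself matches primers.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Turn a degenerate primer into a regular expression string."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return "".join(parts)


def count_variants(primer):
    """Number of non-degenerate sequences a degenerate primer represents."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


# Primer sequences as used in the trim-paired command of this notebook.
PRIMERS = {
    "E572F": "CYGCGGTAATTCCAGCTC",
    "E1009R": "AYGGTATCTRATCRTCTTYG",
}

for name, seq in PRIMERS.items():
    print(name, len(seq), "bp,", count_variants(seq), "variants,", primer_to_regex(seq))
```

The regular expression can be used with `re.match` to test whether a read starts with the primer, for example when spot-checking a few raw reads by hand.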

In [None]:
!qiime cutadapt trim-paired \
    --i-demultiplexed-sequences 0-raw-sequences/seqs.qza \
    --p-front-f CYGCGGTAATTCCAGCTC  \
    --p-front-r AYGGTATCTRATCRTCTTYG  \
    --p-error-rate 0 \
    --p-discard-untrimmed \
    --o-trimmed-sequences 1-cleanup/1-primer-trimmed-seqs.qza \
    --verbose

In [None]:
#Check after trimming primers
!qiime demux summarize \
    --i-data 1-cleanup/1-primer-trimmed-seqs.qza \
    --p-n 100000 \
    --o-visualization 1-cleanup/1-primer-trimmed-seqs.qzv

In [None]:
#Visualize
import qiime2 as q2
q2.Visualization.load('1-cleanup/1-primer-trimmed-seqs.qzv')

### Denoising with DADA2

Two denoising methods are available in QIIME2: Deblur and DADA2. This workflow uses the DADA2 denoiser. DADA2 involves several steps, but the QIIME2 plugin wraps them into a single command, making it easier to run. If you are interested in running the DADA2 pipeline in R, see this tutorial: [DADA2 Pipeline Tutorial](https://benjjneb.github.io/dada2/tutorial_1_8.html)

<div class="alert alert-block alert-info">
<b>Note:</b> 
    
Set the truncation lengths with the <code>--p-trunc-len-f</code> (forward reads) and <code>--p-trunc-len-r</code> (reverse reads) options. Each read is cut at the given position, which removes its 3' end, and reads shorter than that length are discarded. Base the values on the quality report of the primer-trimmed sequences, and keep the truncated reads long enough for the forward and reverse reads to still overlap when they are merged.
</div>
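
When choosing truncation lengths, a quick arithmetic check shows whether the truncated forward and reverse reads would still overlap enough to be merged. The sketch below is only an approximation: `AMPLICON_LEN` is a hypothetical primer-trimmed amplicon length that you must replace with the longest amplicon you expect, and the 12 nt minimum is assumed to be the DADA2 default, so verify it against the help text of your QIIME2 version. The function names are hypothetical as well.

```python
# Assumption: 12 nt is the default minimum overlap DADA2 requires for merging.
MIN_OVERLAP = 12


def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads."""
    return trunc_len_f + trunc_len_r - amplicon_len


def truncation_allows_merging(trunc_len_f, trunc_len_r, amplicon_len,
                              min_overlap=MIN_OVERLAP):
    """True if the truncated reads overlap by at least min_overlap bases."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Hypothetical primer-trimmed amplicon length; replace with your own value.
AMPLICON_LEN = 420

# Truncation lengths used in the denoise-paired command of this notebook.
overlap = expected_overlap(258, 237, AMPLICON_LEN)
mergeable = truncation_allows_merging(258, 237, AMPLICON_LEN)
print(f"expected overlap: {overlap} nt, mergeable: {mergeable}")
```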

In [None]:
#Denoising with dada2 (--p-n-threads 0 uses all available cores)
!qiime dada2 denoise-paired \
    --i-demultiplexed-seqs 1-cleanup/1-primer-trimmed-seqs.qza \
    --p-trunc-len-f 258 \
    --p-trunc-len-r 237 \
    --o-table 1-cleanup/2-table-dada2.qza \
    --o-representative-sequences 1-cleanup/2-rep-seqs-dada2.qza \
    --o-denoising-stats 1-cleanup/2-stats-dada2.qza \
    --p-n-threads 0 \
    --verbose

Create a table that summarizes the sequence count after each step of the `dada2 denoise-paired` command.

<div class="alert alert-block alert-info">
<b>Note:</b> 
    
Inspect the number of reads retained after running the code above. Adjust the command parameters if too many reads are discarded.
</div>
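
To see where reads are lost, it helps to express each column of the denoising stats as a percentage of the input and of the previous step. The sketch below assumes the stats table has the count columns `input`, `filtered`, `denoised`, `merged`, and `non-chimeric`; the counts in `example_counts` are invented for illustration and are not results from this dataset. As a rough guide, a large drop at filtering usually points to the truncation or quality settings, a large drop at merging to insufficient read overlap, and a large drop at chimera removal to primers that were not fully removed.

```python
# Count columns assumed to be present in the DADA2 denoising stats table.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def retention_report(counts):
    """Return (step, reads, % of input, % of previous step) for each step."""
    rows = []
    previous = counts[STEPS[0]]
    total = counts[STEPS[0]]
    for step in STEPS:
        reads = counts[step]
        pct_input = 100.0 * reads / total if total else 0.0
        pct_prev = 100.0 * reads / previous if previous else 0.0
        rows.append((step, reads, round(pct_input, 2), round(pct_prev, 2)))
        previous = reads
    return rows


def largest_drop(counts):
    """Name of the step that loses the largest share of the reads entering it."""
    rows = retention_report(counts)
    return min(rows[1:], key=lambda row: row[3])[0]


# Hypothetical counts for one sample, for illustration only.
example_counts = {
    "input": 10000, "filtered": 8000, "denoised": 7800,
    "merged": 6000, "non-chimeric": 5400,
}

for step, reads, pct_input, pct_prev in retention_report(example_counts):
    print(f"{step:>12}: {reads:>6} reads, {pct_input}% of input, {pct_prev}% of previous step")
print("largest relative loss at:", largest_drop(example_counts))
```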

In [None]:
#Creating a visualization file of the denoising stats output
!qiime metadata tabulate \
    --m-input-file 1-cleanup/2-stats-dada2.qza \
    --o-visualization 1-cleanup/2-stats-dada2.qzv

In [None]:
#Visualize
import qiime2 as q2
q2.Visualization.load('1-cleanup/2-stats-dada2.qzv')

Check the output feature table.

In [None]:
#Creating a visualization file of the feature table (metadata.txt, the sample metadata file, must be in the working directory)
!qiime feature-table summarize \
    --i-table 1-cleanup/2-table-dada2.qza \
    --o-visualization 1-cleanup/2-table-dada2.qzv \
    --m-sample-metadata-file metadata.txt

In [None]:
#Visualize
import qiime2 as q2
q2.Visualization.load('1-cleanup/2-table-dada2.qzv')

Check the output denoised sequences.

<div class="alert alert-block alert-info">
<b>Note:</b> 
    
Inspect the lengths of the denoised and merged sequences. Are these within the expected size range of your target amplicon?
</div>
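
The length check can also be scripted. The sketch below assumes the representative sequences have been exported to FASTA (for example with `qiime tools export`, which writes a `dna-sequences.fasta` file) and flags any sequence outside a size window. The window of 350 to 450 bp and the two demo records are hypothetical; substitute the range expected for your own amplicon. The function names `read_fasta` and `flag_unexpected_lengths` are hypothetical too.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def flag_unexpected_lengths(records, min_len, max_len):
    """Return (header, length) for sequences outside [min_len, max_len]."""
    return [
        (header, len(seq))
        for header, seq in records
        if not (min_len <= len(seq) <= max_len)
    ]


# Made-up records for illustration; with real data, pass an open file handle
# instead, e.g. open("dna-sequences.fasta").
demo = [">asv1", "ACGT" * 100, ">asv2", "ACGT" * 10]
print(flag_unexpected_lengths(read_fasta(demo), 350, 450))
```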

In [None]:
#Creating a visualization file of the ASV sequences
!qiime feature-table tabulate-seqs \
    --i-data 1-cleanup/2-rep-seqs-dada2.qza \
    --o-visualization 1-cleanup/2-rep-seqs-dada2.qzv

In [None]:
#Visualize
import qiime2 as q2
q2.Visualization.load('1-cleanup/2-rep-seqs-dada2.qzv')