# Module: Taxonomy Assignment in QIIME2

Taxonomic annotation in metabarcoding is the process of assigning biological classifications to DNA sequences obtained from environmental samples. After sequencing, the raw reads are typically clustered or denoised into unique representative sequences, which are then matched to reference databases. This matching process relies on specific genetic markers, allowing for identification at various taxonomic levels, from broad classifications (e.g., phylum) to finer ones (e.g., species).

This notebook demonstrates how taxonomic assignment is performed in QIIME2 using a pre-built classifier.

The following references were used for this tutorial: [LangilleLab SOP](https://github.com/LangilleLab/microbiome_helper/wiki/Amplicon-SOP-v2-(qiime2-2020.8)) and the ["Moving pictures" Tutorial](https://docs.qiime2.org/2024.10/tutorials/moving-pictures/).

Created by: _Microbial Oceanography Laboratory (MOLab)_

---
## How to Use This Notebook

1. Activate the conda environment in a terminal window. Replace the environment name below with the one installed on your system.
>`conda activate qiime2-2023.2`
2. Start Jupyter Notebook with the command below, then open this notebook.
>`jupyter notebook`
3. To run the cells in this notebook, press Shift+Enter.

---
## Tools Used
1. **QIIME 2 Amplicon Distribution**
    - Installation procedure can be found here: [QIIME2 native installation](https://docs.qiime2.org/2024.10/install/native/)

---
## Starting Files 

1. `.qza` of type `FeatureData[Sequence]` (here, `rep-seqs.qza`). Can be generated by OTU clustering or denoising.
2. `.qza` of type `FeatureTable[Frequency]` (here, `feature-table.qza`). Can be generated by OTU clustering or denoising.
3. `.qza` of type `TaxonomicClassifier` (here, `silva-138-nord-classifier.qza`)
4. Optionally, a metadata file (must be formatted according to QIIME2's specifications: [Metadata in QIIME2](https://docs.qiime2.org/2024.10/tutorials/metadata/))
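
Before running the notebook, it can help to confirm that each starting file has the expected semantic type. The supported way to do this is `qiime tools peek <file>.qza`. The sketch below is only an illustration of the same idea in plain Python: it assumes that a `.qza` is a zip archive whose top-level folder holds a `metadata.yaml` with a `type:` line. The helper name `read_artifact_type` is hypothetical and not part of QIIME2.

```python
import zipfile


def read_artifact_type(path):
    """Return the semantic type recorded in an artifact's top-level metadata.yaml.

    Assumes the archive layout <uuid>/metadata.yaml with a line 'type: ...'.
    """
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only the top-level metadata file, not the copies under provenance/.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = archive.read(name).decode("utf-8")
                for line in text.splitlines():
                    if line.startswith("type:"):
                        # Drop the key and any surrounding quotes.
                        return line.split(":", 1)[1].strip().strip("'\"")
    raise ValueError(f"no top-level metadata.yaml with a 'type' field in {path}")
```

For example, `read_artifact_type("rep-seqs.qza")` should report `FeatureData[Sequence]` if the file is the kind of artifact listed in item 1 above.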

---
## Expected Outputs

1. `.qza` of type `FeatureData[Taxonomy]`
2. `.qzv` of taxonomic assignments per feature.
3. `.qzv` of relative abundance barplot.

---
## Table of Contents
 * [**Taxonomy Assignment**](#Taxonomy-Assignment)  
     * [Annotate features](#Annotate-features)
     * [Relative abundance barplot](#Relative-abundance-barplot)

# <font color = 'gray'>Taxonomy Assignment</font>

### Annotate features

There are several methods to annotate sequences in QIIME2 (check `qiime feature-classifier --help`). Here, the `classify-sklearn` method will be demonstrated. This method requires a classifier, which you can either train yourself ([Training feature classifiers](https://docs.qiime2.org/2024.10/tutorials/feature-classifier/)) or download from the set pre-built by QIIME2's team ([QIIME2 data resources](https://docs.qiime2.org/2024.10/data-resources/#taxonomy-classifiers-for-use-with-q2-feature-classifier)). The pre-built classifiers target the small-subunit (SSU) ribosomal RNA gene: Greengenes covers 16S, and SILVA covers both 16S and 18S.

The classifier specified below was trained by _MOLab_ using entries from both the [SILVA](https://www.arb-silva.de/) and [Nordicana](http://www.cen.ulaval.ca/nordicanad/dpage.aspx?doi=45409XD-79A199B76BCC4110) databases. It is trained specifically for the eukaryotic 18S V4 region. You can find the classifier's `.qza` file in the same folder as this notebook.

In [None]:
!qiime feature-classifier classify-sklearn \
    --i-classifier silva-138-nord-classifier.qza \
    --i-reads rep-seqs.qza \
    --o-classification rep-seqs-taxa.qza \
    --verbose
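
The command above runs with default settings. `classify-sklearn` also accepts `--p-confidence` (default 0.7) and `--p-n-jobs` (default 1); check `qiime feature-classifier classify-sklearn --help` for the options in your installed version. The following is a minimal sketch of assembling the same command from Python so that these parameters are easy to vary. The helper name `build_classify_command` and the value `n_jobs=4` are illustrative assumptions, not part of QIIME2.

```python
def build_classify_command(classifier, reads, output, confidence=0.7, n_jobs=1):
    """Assemble the argument list for `qiime feature-classifier classify-sklearn`."""
    return [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,
        "--i-reads", reads,
        "--o-classification", output,
        "--p-confidence", str(confidence),  # minimum confidence to keep a rank
        "--p-n-jobs", str(n_jobs),          # number of parallel jobs
        "--verbose",
    ]


cmd = build_classify_command(
    "silva-138-nord-classifier.qza",
    "rep-seqs.qza",
    "rep-seqs-taxa.qza",
    n_jobs=4,  # arbitrary illustrative value; adjust to your machine
)
print(" ".join(cmd))

# To actually run it (requires an activated QIIME 2 environment):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list keeps the file names and parameters in one place, which is convenient when the same classification is repeated on several datasets.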

You can then tabulate the per-feature taxonomic assignments.

In [None]:
!qiime metadata tabulate \
    --m-input-file rep-seqs-taxa.qza \
    --o-visualization rep-seqs-taxa.qzv
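
If you prefer to inspect the assignments programmatically, the taxonomy artifact can be exported with `qiime tools export`, which writes a tab-separated `taxonomy.tsv`. The sketch below assumes that file has the columns `Feature ID`, `Taxon`, and `Confidence`, and that ranks in `Taxon` are separated by semicolons. The rows in `EXAMPLE_TSV` are hypothetical and are not results from this notebook.

```python
import csv
import io

# Hypothetical rows laid out like an exported taxonomy.tsv (not real results).
EXAMPLE_TSV = (
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\td__Eukaryota; p__Diatomea; c__Mediophyceae\t0.98\n"
    "feat2\td__Eukaryota; p__Dinoflagellata\t0.81\n"
    "feat3\tUnassigned\t0.42\n"
)


def split_taxon(taxon):
    """Split a semicolon-delimited taxon string into a list of rank labels."""
    return [rank.strip() for rank in taxon.split(";") if rank.strip()]


def assignment_depth_counts(tsv_text):
    """Count features by how many ranks were assigned (0 means 'Unassigned')."""
    counts = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        taxon = row["Taxon"]
        depth = 0 if taxon == "Unassigned" else len(split_taxon(taxon))
        counts[depth] = counts.get(depth, 0) + 1
    return counts


print(assignment_depth_counts(EXAMPLE_TSV))
```

A summary like this gives a quick sense of how many features were resolved only to broad ranks and how many reached finer ones.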

### Relative abundance barplot

Using the feature table from your starting files (`feature-table.qza`) and the taxonomic annotations generated above (`rep-seqs-taxa.qza`), you can also summarize the relative abundances of different taxonomic groups in a taxa barplot. Optionally, you can supply a metadata file with the `--m-metadata-file` option, which lets you sort your samples by the variables (columns) of that file.

In [None]:
!qiime taxa barplot \
    --i-table feature-table.qza \
    --i-taxonomy rep-seqs-taxa.qza \
    --o-visualization taxa-barplot.qzv
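
To make clear what the barplot displays, here is a simplified sketch of the underlying arithmetic: feature counts are summed per taxon at a chosen rank, then divided by each sample's total. This is an illustration under stated assumptions, not the plugin's implementation. The function name `relative_abundance`, the sample IDs, and all counts and taxon strings are hypothetical.

```python
def relative_abundance(table, taxonomy, level):
    """Collapse counts to the first `level` ranks and convert to proportions.

    table:    {sample_id: {feature_id: count}}
    taxonomy: {feature_id: "rank1; rank2; ..."}
    level:    number of leading ranks to keep (1 = broadest)
    """
    result = {}
    for sample, counts in table.items():
        collapsed = {}
        for feature, count in counts.items():
            taxon = taxonomy.get(feature, "Unassigned")
            ranks = [r.strip() for r in taxon.split(";")]
            label = "; ".join(ranks[:level])
            collapsed[label] = collapsed.get(label, 0) + count
        total = sum(collapsed.values())
        # Leave samples with no reads empty to avoid dividing by zero.
        result[sample] = {k: v / total for k, v in collapsed.items()} if total else {}
    return result


# Hypothetical counts and annotations, for illustration only.
table = {
    "S1": {"feat1": 30, "feat2": 10, "feat3": 10},
    "S2": {"feat1": 5, "feat2": 15},
}
taxonomy = {
    "feat1": "d__Eukaryota; p__Diatomea",
    "feat2": "d__Eukaryota; p__Dinoflagellata",
    "feat3": "d__Eukaryota; p__Diatomea",
}

for sample, fractions in relative_abundance(table, taxonomy, level=2).items():
    print(sample, fractions)
```

In this toy example, the two features annotated as `p__Diatomea` are merged into one bar segment, so sample `S1` is 80% Diatomea and 20% Dinoflagellata at the second rank.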