# QIIME 2 Tutorial: Read Processing

This notebook contains materials accompanying the workshop **Microbiome-Based Tools: From Research to Application**. The notebook and corresponding setup script were adapted from the [**Advanced Block Course: Computational Biology**](https://github.com/bokulich-lab/advanced-comp-bio-tutorial.git); all source code is licensed under the Apache License 2.0.

Save your own local copy of this notebook by using `File > Save a copy in Drive`. At some point you may be prompted to trust the notebook. We promise that it is safe 🤞

**Disclaimer:**

The Google Colab notebook environment interprets every command as Python code by default. To run a bash command, we prefix it with `!`. Any command shown with a leading `!` is therefore a bash command; to run it in your own terminal, omit the `!`. For example, `!wget` in the Colab notebook becomes `wget` in your terminal.
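
As a rough analogy for what the `!` prefix does, the sketch below hands a shell command to the standard-library `subprocess` module and prints its output. This is only an illustration; it is not how Colab implements `!` internally.

```python
import subprocess

# Roughly what `!echo hello` does in a notebook cell:
# pass the command string to a shell and capture what it prints.
result = subprocess.run(
    "echo hello",
    shell=True,           # let the shell interpret the command string
    capture_output=True,  # collect stdout/stderr instead of streaming them
    text=True,            # decode bytes to str
    check=True,           # raise an error if the command fails
)
print(result.stdout.strip())
```

In the notebook itself the `!` form is shorter and streams output as the command runs, which is why we use it throughout.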

In this notebook we use the `!` prefix because we run all QIIME 2 commands through [`q2cli`](https://github.com/qiime2/q2cli/), the QIIME 2 command-line interface. QIIME 2 also has a Python API and a Galaxy interface. You can learn more about these and other QIIME 2 interfaces at https://qiime2.org/.

You can run the entire notebook by selecting `Runtime > Run all` from the menu in Google Colab. Some steps are time-consuming and the whole notebook may take 30-60 minutes, so start it now and we will inspect the commands and results as we work through them as a class.

## Setup

QIIME 2 is usually installed by following the [official installation instructions](https://docs.qiime2.org/2023.9/install/). However, because we are using Google Colab and there are some caveats to using conda here, we will have to hack around the installation a little. But no worries, we provide a setup script below which does all this work for us. 😌

So let's start by pulling a local copy of the project repository down from GitHub.

In [None]:
! git clone https://github.com/bokulich-lab/uzh-microbiome-tutorial.git materials
! mkdir -p /content/prefetch_cache

We will switch to working within the `materials` directory for the rest of the notebook.

In [None]:
%cd materials

Now we are ready to set up our environment. This will take about 10 minutes.

**Note:** This setup is only relevant for Google Colaboratory and will not work on your local machine. Please follow the [official installation instructions](https://docs.qiime2.org/2023.9/install/) for that.

In [None]:
%run setup_qiime2

And we will use some Python packages below, so let's load these here:

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

## Import data into QIIME 2
Run the following cells first! Feel free to start them while Anton explains the basics of QIIME 2.

In [None]:
! qiime tools import \
    --type 'SampleData[SequencesWithQuality]' \
    --input-path data/moving_pictures/moving_pictures_manifest.tsv \
    --output-path sequences.qza \
    --input-format SingleEndFastqManifestPhred33V2
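
The manifest passed to `--input-path` is a tab-separated file that maps each sample ID to the absolute path of its FASTQ file. For the `SingleEndFastqManifestPhred33V2` format the header columns are `sample-id` and `absolute-filepath`. Here is a minimal sketch of writing such a file; the sample IDs and paths are made up for illustration and are not the ones in the tutorial manifest.

```python
import csv
import io


def write_manifest(samples, handle):
    """Write (sample_id, absolute_path) pairs as a single-end V2 manifest."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath"])
    for sample_id, path in samples:
        writer.writerow([sample_id, path])


# Hypothetical samples and paths, for illustration only.
samples = [
    ("sample-1", "/content/materials/example_reads/sample-1.fastq.gz"),
    ("sample-2", "/content/materials/example_reads/sample-2.fastq.gz"),
]

buffer = io.StringIO()  # stands in for an open file handle
write_manifest(samples, buffer)
print(buffer.getvalue())
```

To write a real file, you would pass a handle from `open("manifest.tsv", "w", newline="")` instead of the in-memory buffer.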

In [None]:
! qiime demux summarize \
    --i-data sequences.qza \
    --o-visualization qualities.qzv

## Denoise amplicon sequence variants

In [None]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs sequences.qza \
    --p-trunc-len 135 \
    --p-n-threads 2 \
    --output-dir dada2 \
    --verbose
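
The `--p-trunc-len 135` setting cuts every read at position 135 from the 3' end and discards reads shorter than that; a value of 0 turns truncation and length filtering off. The toy sketch below shows only this length rule on plain strings. It is not DADA2 itself, which also filters on quality and models sequencing errors.

```python
def truncate_reads(reads, trunc_len):
    """Apply the truncation-length rule to a list of read sequences."""
    if trunc_len == 0:
        # 0 means: no truncation and no length filtering.
        return list(reads)
    # Drop reads that are too short, then cut the rest to trunc_len.
    return [read[:trunc_len] for read in reads if len(read) >= trunc_len]


# Toy reads of length 10, 5 and 8 (real reads are much longer).
reads = ["ACGTACGTAC", "ACGTA", "ACGTACGT"]
kept = truncate_reads(reads, trunc_len=8)
print(kept)
```

With `trunc_len=8` the 5-base read is dropped and the other two are both 8 bases long afterwards, so every read that survives has the same length.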

In [None]:
# Optional
! qiime metadata tabulate \
    --m-input-file dada2/denoising_stats.qza \
    --o-visualization dada2/denoising_stats.qzv

In [None]:
! qiime feature-table summarize \
    --i-table dada2/table.qza \
    --m-sample-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --o-visualization dada2/table.qzv

In [None]:
# Optional
! qiime feature-table tabulate-seqs \
    --i-data dada2/representative_sequences.qza \
    --o-visualization dada2/representative_sequences.qzv

## Generate a phylogenetic tree

In [None]:
! qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences dada2/representative_sequences.qza \
    --output-dir phylogeny

## Analyze phylogenetic diversity

In [None]:
! qiime diversity core-metrics-phylogenetic \
    --i-phylogeny phylogeny/rooted_tree.qza \
    --i-table dada2/table.qza \
    --p-sampling-depth 1100 \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --output-dir core-metrics-results
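
The `--p-sampling-depth 1100` setting rarefies the table before diversity is computed: samples with fewer than 1100 reads are dropped, and every remaining sample is randomly subsampled without replacement to exactly 1100 reads. Here is a minimal sketch of that idea for a single sample, using made-up counts and hypothetical feature names; QIIME 2's own implementation differs in detail.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's feature counts to `depth` reads.

    Returns None if the sample has fewer than `depth` reads.
    """
    if sum(counts.values()) < depth:
        return None  # too shallow: the sample is dropped
    # One entry per read, labelled by the feature it belongs to.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)  # without replacement
    rarefied = {}
    for feature in drawn:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


# Made-up counts for illustration.
deep = {"asv1": 700, "asv2": 500, "asv3": 300}  # 1500 reads
shallow = {"asv1": 400}                         # 400 reads

rarefied = rarefy(deep, depth=1100)
print(rarefied, rarefy(shallow, depth=1100))
```

Choosing the depth is a trade-off: a higher value keeps more reads per sample but drops more samples.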

In [None]:
! qiime diversity alpha-group-significance \
    --i-alpha-diversity core-metrics-results/faith_pd_vector.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --o-visualization core-metrics-results/faith_pd_group_significance.qzv

In [None]:
# Optional
! qiime diversity alpha-group-significance \
    --i-alpha-diversity core-metrics-results/evenness_vector.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --o-visualization core-metrics-results/evenness_group_significance.qzv
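
The `evenness_vector.qza` artifact holds Pielou's evenness, which is the Shannon entropy of a sample divided by its maximum possible value, the logarithm of the number of observed features. Below is a small sketch of the formula in plain Python. Returning `nan` for samples with fewer than two features is a choice made here, not necessarily what QIIME 2 does.

```python
import math


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S) for a list of feature counts."""
    observed = [c for c in counts if c > 0]
    if len(observed) < 2:
        return float("nan")  # undefined with fewer than two features
    total = sum(observed)
    shannon = -sum((c / total) * math.log(c / total) for c in observed)
    return shannon / math.log(len(observed))


print(pielou_evenness([10, 10, 10, 10]))  # perfectly even community
print(pielou_evenness([97, 1, 1, 1]))     # dominated by one feature
```

Values close to 1 mean reads are spread evenly across features; values close to 0 mean a few features dominate.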

In [None]:
! qiime emperor plot \
    --i-pcoa core-metrics-results/bray_curtis_pcoa_results.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --o-visualization core-metrics-results/bray_curtis_pcoa.qzv
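
The PCoA plotted here is built from pairwise Bray-Curtis dissimilarities between samples. For two samples, this is the sum of absolute differences in feature counts divided by the total count across both samples. Here is a minimal sketch with made-up counts and hypothetical feature names.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples given as count dicts."""
    features = set(u) | set(v)
    difference = sum(abs(u.get(f, 0) - v.get(f, 0)) for f in features)
    total = sum(u.get(f, 0) + v.get(f, 0) for f in features)
    return difference / total


# Made-up counts for illustration.
sample_a = {"asv1": 6, "asv2": 4}
sample_b = {"asv1": 2, "asv2": 8}
sample_c = {"asv3": 10}

print(bray_curtis(sample_a, sample_b))  # partly overlapping
print(bray_curtis(sample_a, sample_c))  # no shared features
```

The value ranges from 0 for identical samples to 1 for samples that share no features.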

## Classify by taxonomy

In [None]:
! wget https://data.qiime2.org/2023.9/common/gg-13-8-99-515-806-nb-weighted-classifier.qza

In [None]:
! qiime feature-classifier classify-sklearn \
    --i-reads dada2/representative_sequences.qza \
    --i-classifier gg-13-8-99-515-806-nb-weighted-classifier.qza \
    --p-n-jobs 2 \
    --output-dir taxonomy

In [None]:
! qiime metadata tabulate \
    --m-input-file taxonomy/classification.qza \
    --o-visualization taxonomy/classification.qzv

In [None]:
! qiime taxa barplot \
    --i-table dada2/table.qza \
    --i-taxonomy taxonomy/classification.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --o-visualization taxonomy/taxa_barplot.qzv

## Optional section: Understand differentially abundant features

This section may be skipped if time is short. It shows how to find features (here collapsed to taxonomic level 6) whose abundance differs between groups of samples.

In [None]:
! mkdir -p diff_abund

! qiime taxa collapse \
    --i-table dada2/table.qza \
    --i-taxonomy taxonomy/classification.qza \
    --p-level 6 \
    --o-collapsed-table diff_abund/table_l6.qza
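
Collapsing at `--p-level 6` sums the counts of all features that share the same taxonomy down to the sixth rank (genus in Greengenes-style annotations). The sketch below shows the idea on a toy table with hypothetical feature IDs and counts. Padding missing ranks with `__` is an assumption about the output labels, so treat the exact label format as illustrative.

```python
from collections import defaultdict


def collapse_table(table, taxonomy, level):
    """Sum feature counts that share the same taxonomy down to `level`."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, sample_counts in table.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        # Keep the first `level` ranks; pad shallower annotations.
        ranks = ranks[:level] + ["__"] * (level - len(ranks))
        label = ";".join(ranks)
        for sample_id, count in sample_counts.items():
            collapsed[label][sample_id] += count
    return {label: dict(counts) for label, counts in collapsed.items()}


# Toy data for illustration only.
bacteroides = ("k__Bacteria; p__Bacteroidetes; c__Bacteroidia; "
               "o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides")
taxonomy = {
    "asv1": bacteroides + "; s__fragilis",
    "asv2": bacteroides + "; s__",
    "asv3": "k__Bacteria; p__Firmicutes; c__Clostridia; "
            "o__Clostridiales; f__Lachnospiraceae",
}
table = {
    "asv1": {"s1": 5, "s2": 0},
    "asv2": {"s1": 3, "s2": 2},
    "asv3": {"s1": 1, "s2": 4},
}

collapsed = collapse_table(table, taxonomy, level=6)
for label, counts in collapsed.items():
    print(label, counts)
```

Here `asv1` and `asv2` end up in one genus-level row, while `asv3`, which is only annotated to family level, keeps its own row.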

In [None]:
! qiime composition add-pseudocount \
    --i-table diff_abund/table_l6.qza \
    --o-composition-table diff_abund/comp_table_l6.qza
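
The pseudocount step is needed because ANCOM works with log-ratios of abundances, and the logarithm of zero is undefined. Adding a small constant (1 by default) to every count makes all values positive. Here is a minimal sketch with made-up counts; it shows only the arithmetic, not ANCOM itself.

```python
import math


def add_pseudocount(counts, pseudocount=1):
    """Add a constant to every count so that none of them is zero."""
    return [count + pseudocount for count in counts]


raw = [0, 9, 99]              # made-up counts for three features
composition = add_pseudocount(raw)

# Log-ratio of each feature against the first one; this would fail on
# the raw counts because of the zero.
log_ratios = [math.log(c / composition[0]) for c in composition]
print(composition, log_ratios)
```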

In [None]:
! qiime feature-table filter-samples \
    --i-table diff_abund/comp_table_l6.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --p-where "[body-site]='gut'" \
    --o-filtered-table diff_abund/comp_gut_table_l6.qza
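
The `--p-where` expression uses SQLite `WHERE`-clause syntax, with metadata column names in square brackets. The sketch below runs the same expression against a tiny in-memory SQLite table to show which samples it would keep. The sample IDs and rows are made up; only the `body-site` and `subject` column names come from the commands in this notebook.

```python
import sqlite3

# Made-up metadata rows: (sample-id, body-site, subject).
rows = [
    ("sample-a", "gut", "subject-1"),
    ("sample-b", "left palm", "subject-1"),
    ("sample-c", "tongue", "subject-2"),
    ("sample-d", "gut", "subject-2"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE metadata ("sample-id" TEXT, "body-site" TEXT, "subject" TEXT)'
)
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)

# The same expression that is passed to --p-where above.
where = "[body-site]='gut'"
query = f'SELECT "sample-id" FROM metadata WHERE {where} ORDER BY "sample-id"'
kept = [row[0] for row in conn.execute(query)]
print(kept)
conn.close()
```

Conditions can be combined with `AND` and `OR` in the same way as in any SQLite query.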

In [None]:
! qiime composition ancom \
    --i-table diff_abund/comp_gut_table_l6.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --m-metadata-column subject \
    --o-visualization diff_abund/ancom_gut_subject_l6.qzv

## Additional tools
* `q2-fondue`
* Beta diversity methods in `q2-diversity`:
  * `qiime diversity beta-group-significance`
  * `qiime diversity adonis`