<a href="https://colab.research.google.com/github/Gibbons-Lab/isb_course_2020/blob/master/16S.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🦠 Amplicon Sequencing Data Analysis with Qiime 2

This notebook will accompany the session of the ISB Microbiome course 2020. The presentation slides can be [found here](https://gibbons-lab.github.io/isb_course_2020/16S). 

You can save a local copy of this notebook by using `File > Save a copy in Drive`. You may be prompted to certify that the notebook is safe. We promise that it is 🤞

**Disclaimer:**

The Google Colab notebook environment interprets any command as Python code by default. To run bash commands we have to prefix them with `!`. So any command you see with a leading `!` is a bash command, and if you wanted to run it in your terminal you would omit the leading `!`. For example, if the notebook runs `!wget` you would just run `wget` in your terminal.
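For comparison, a plain Python script (outside of a notebook) has no `!` shortcut, but the standard library can run shell commands as well. The following is only a minimal sketch; the `echo` command is a placeholder chosen for illustration and is not part of the course materials.

```python
import subprocess

# Roughly what `!echo hello from bash` does in a notebook cell:
# run a command in a shell and collect what it prints.
result = subprocess.run(
    "echo hello from bash",
    shell=True,           # hand the string to the shell, like `!` does
    capture_output=True,  # collect stdout/stderr instead of printing directly
    text=True,            # decode the output bytes to str
    check=True,           # raise an error if the command fails
)
print(result.stdout.strip())
```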

## Setup

Qiime 2 can usually be installed by following the [official installation instructions](https://docs.qiime2.org/2020.6/install/). Since we are using Google Colab and there are some caveats to using conda here, we will have to hack around those a little bit. But no worries, we will use a setup script which does all the work for us 😌 So let's start by getting a local copy of the project repository.

In [None]:
!git clone https://github.com/gibbons-lab/isb_course_2020 materials

Cloning into 'materials'...
remote: Enumerating objects: 626, done.
remote: Counting objects: 100% (626/626), done.
remote: Compressing objects: 100% (435/435), done.
remote: Total 626 (delta 155), reused 608 (delta 147), pack-reused 0
Receiving objects: 100% (626/626), 29.69 MiB | 33.52 MiB/s, done.
Resolving deltas: 100% (155/155), done.


In [None]:
%run materials/setup_qiime2.py

[19:39:01] 🐍 Miniconda is already installed. Skipped.        setup_qiime2.py:70
           🔍 Qiime 2 is already installed. Skipped.          setup_qiime2.py:90
           🔍 Fixed import paths to include Qiime 2.          setup_qiime2.py:93
           📊 Checking that Qiime 2 command line works...     setup_qiime2.py:39
[19:39:02] 📊 Qiime 2 command line looks good 🎉              setup_qiime2.py:45
           📊 Checking if Qiime 2 import works...            setup_qiime2.py:103
           📊 Qiime 2 can be imported 🎉                     setup_qiime2.py:109
           Cleaned up unneeded files.                         setup_qiime2.py:34
           Everything is A-OK. You can start using Qiime 2   setup_qiime2.py:113
           now 👍                                                               


## Our first Qiime 2 command

Let's remember our workflow for today.

![our workflow](https://github.com/Gibbons-Lab/isb_course_2020/raw/master/docs/16S/assets/steps.png)

The first thing we have to do is to get the data into an artifact.
We can import the data with the `import` action from the tools. For that we have to give
Qiime 2 a *manifest* (list of raw files) and tell it what *type of data* we
are importing and what *type of artifact* we want. 
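A manifest is just a small text table that maps sample IDs to FASTQ files. As a rough illustration, a file in the comma-separated layout expected by `SingleEndFastqManifestPhred33` could be generated as sketched below. The sample IDs and file paths are made up for this example, and the helper name `build_manifest` is hypothetical; the manifest used in this notebook may look different.

```python
import csv
import io

# Hypothetical samples and paths, for illustration only.
samples = {
    "sample-1": "/content/data/sample-1.fastq.gz",
    "sample-2": "/content/data/sample-2.fastq.gz",
}

def build_manifest(samples):
    """Return manifest text with one row per single-end FASTQ file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row of the CSV manifest layout.
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sample_id, path in samples.items():
        # Single-end data only has forward reads.
        writer.writerow([sample_id, path, "forward"])
    return buffer.getvalue()

manifest_text = build_manifest(samples)
print(manifest_text)
```

Each row names one sample and the absolute path to its reads, which is all Qiime 2 needs to locate the raw files.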

**QoL Tip:** Qiime 2 commands can get very long. To split them up over several lines we can use `\` which means "continue on the next line".

In [None]:
!qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path ubc_manifest.csv \
  --output-path ubc_data.qza \
  --input-format SingleEndFastqManifestPhred33

Since we have quality information for the sequencing reads, let's also generate
our first visualization by inspecting those. 

---

Qiime 2 commands can become pretty long. Here are some pointers to help you remember the
structure of a command:

```
qiime plugin action --i-argument1 ... --o-argument2 ...
```

Argument types usually begin with a letter denoting their meaning:

- `--i-...` = input files
- `--o-...` = output files
- `--p-...` = parameters
- `--m-...` = metadata
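To practice reading commands this way, the prefix convention can be written down as a small lookup. This is only an illustrative sketch and not part of Qiime 2; the helper name `classify` is made up, and the example command is the `demux summarize` call used in this notebook.

```python
# Map the prefixes listed above to their meaning.
PREFIXES = {
    "--i-": "input",
    "--o-": "output",
    "--p-": "parameter",
    "--m-": "metadata",
}

def classify(option):
    """Return the kind of a Qiime 2 option based on its prefix."""
    for prefix, kind in PREFIXES.items():
        if option.startswith(prefix):
            return kind
    return "other"  # e.g. --verbose or --output-dir

command = (
    "qiime demux summarize "
    "--i-data ubc_data.qza --o-visualization qualities.qzv"
)

# Print the kind of every option in the command.
for token in command.split():
    if token.startswith("--"):
        print(token, "->", classify(token))
```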

---

In this case we will use the `summarize` action from the `demux` plugin with the previously generated artifact as input and output the resulting visualization to the `qualities.qzv` file.

In [None]:
!qiime demux summarize --i-data ubc_data.qza --o-visualization qualities.qzv

You can open the visualization by downloading it and using http://view.qiime2.org. To download it, click on the folder symbol on the left and choose download from the dot menu next to the file. Alternatively, you can also have a look directly [here](https://gibbons-lab.github.io/isb_course_2020/16S/qualities).
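Qiime 2 artifacts (`.qza`) and visualizations (`.qzv`) are ordinary zip archives, so their contents can also be listed without any Qiime 2 tooling. The following is a minimal sketch using only the Python standard library. The helper name `list_archive` is made up for this example, and the code assumes that `qualities.qzv` sits in the current working directory.

```python
import os
import zipfile

def list_archive(path):
    """Return the names of all files stored in a .qza/.qzv archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()

# Only inspect the file if the previous cell has produced it.
if os.path.exists("qualities.qzv"):
    for name in list_archive("qualities.qzv"):
        print(name)
```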

🤔 What do you observe across the read? Where would you truncate the reads?

## Analyzing sequence variants with DADA2

We will now run the DADA2 plugin which will do 4 things:

1. filter and trim the reads
2. find the most likely original sequences in the sample (ASVs)
3. remove chimeras
4. count the abundances
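To make the first and last steps more concrete, here is a toy sketch of trimming, truncating, and counting reads. It is not DADA2: it skips the error model that infers ASVs as well as chimera removal, and simply counts identical trimmed reads. The reads and the helper name `trim_read` are made up for illustration.

```python
from collections import Counter

def trim_read(seq, trim_left, trunc_len):
    """Trim the 5' end and truncate the 3' end of one read.

    Returns None for reads shorter than trunc_len, which are discarded.
    """
    if len(seq) < trunc_len:
        return None
    return seq[trim_left:trunc_len]

# Toy reads, made up for illustration (real reads are much longer).
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGT"]

# Apply trimming and drop the reads that were too short.
kept = [trim_read(r, trim_left=2, trunc_len=8) for r in reads]
kept = [r for r in kept if r is not None]

# Count how often each unique sequence remains.
abundances = Counter(kept)
print(abundances)
```

Note that the resulting read length is `trunc_len - trim_left`, so truncating at 220 and trimming 8 bases from the start leaves reads of 212 bases.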


Since this takes a while, let's start the process and use the time to
understand what is happening:

In [6]:
!qiime dada2 denoise-single \
    --i-demultiplexed-seqs ubc_data.qza \
    --p-trunc-len 220 --p-trim-left 8 \
    --output-dir dada2 --verbose

Usage: qiime dada2 denoise-single
           [OPTIONS]

  This method denoises single-end sequences,
  dereplicates them, and filters chimeras.

Inputs:
  --i-demultiplexed-seqs ARTIFACT
    SampleData[SequencesWithQuality |
    PairedEndSequencesWithQuality]
                         The single-end
                         demultiplexed sequences
                         to be denoised.
                                        [required]
Parameters:
  --p-trunc-len INTEGER  Position at which
                         sequences should be
                         truncated due to decrease
                         in quality. This
                         truncates the 3' end of
                         the of the input
                         sequences, which will be
                         the bases that were
                         sequenced in the last
                         cycles. Reads that 