<a id='setup'></a>

## 0. Setup

In [1]:
import os
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import numpy as np

import qiime2 as q2

%matplotlib inline

In [2]:
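data_dir = '../data'

# Create the data directory and the subdirectories written to below
for sub_dir in ('', 'metadata', 'denoising'):
    os.makedirs(os.path.join(data_dir, sub_dir), exist_ok=True)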

<a id='data_import'></a>

## 1. Data import

In [3]:
! wget -nv -O $data_dir/sequences.qza 'https://polybox.ethz.ch/index.php/s/PCQspFMocVCKjZ3/download'

2022-12-16 13:18:11 URL:https://polybox.ethz.ch/index.php/s/PCQspFMocVCKjZ3/download [3433846903/3433846903] -> "../data/sequences.qza" [1]


In [5]:
! wget -nv -O $data_dir/metadata/sample_metadata.tsv 'https://polybox.ethz.ch/index.php/s/r1AYzdUVWnQyiRL/download'

2022-12-16 13:18:17 URL:https://polybox.ethz.ch/index.php/s/r1AYzdUVWnQyiRL/download [10012/10012] -> "../data/metadata/sample_metadata.tsv" [1]


We first analyze the sequence data and metadata to decide on the denoising parameters.

In [6]:
metadata_df = pd.read_csv(f'{data_dir}/metadata/sample_metadata.tsv', sep='\t')

In [7]:
! qiime tools peek $data_dir/sequences.qza

UUID:        394c4773-80e2-46a6-9fba-40e7c8ec3fb9
Type:        SampleData[PairedEndSequencesWithQuality]
Data format: SingleLanePerSamplePairedEndFastqDirFmt


In [8]:
! qiime demux summarize \
    --i-data $data_dir/sequences.qza \
    --o-visualization $data_dir/sequences.qzv

Saved Visualization to: ../data/sequences.qzv

Visualizing the sequences shows that the sequencing quality drops significantly after 183 bases for forward reads and after 190 bases for reverse reads.

In [9]:
Visualization.load(f'{data_dir}/sequences.qzv')

<a id='remove primers'></a>

## 2. Remove Primers

Cutadapt is used to trim the forward and reverse primers; we use the same primers that were used in the sequencing procedure. (Some of the output files are not provided because of their large size.)
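The primers contain degenerate IUPAC codes (for example `Y`, `D`, `N`), so even with an error rate of 0 each primer stands for a whole family of concrete sequences. The sketch below illustrates this idea in plain Python. It is a simplification and not how Cutadapt is implemented: it assumes that IUPAC codes in the primer are treated as wildcards, it only handles full-length matches, and `primer_to_regex`, `trim_front_primer` and the toy read are hypothetical names and data introduced for illustration.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for
IUPAC = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': 'AG', 'Y': 'CT', 'S': 'CG', 'W': 'AT', 'K': 'GT', 'M': 'AC',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
}


def primer_to_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return ''.join(
        IUPAC[base] if len(IUPAC[base]) == 1 else f'[{IUPAC[base]}]'
        for base in primer
    )


def trim_front_primer(read, primer):
    """Remove everything up to and including the first exact primer match.

    The read is returned unchanged if the primer is not found.
    """
    match = re.search(primer_to_regex(primer), read)
    return read[match.end():] if match else read


forward_primer = 'AYTGGGYDTAAAGNG'  # same forward primer as in the command below
# Made-up read: one concrete instance of the primer followed by a short insert
toy_read = 'ACTGGGCGTAAAGCG' + 'TTGACCA'

print(primer_to_regex(forward_primer))
print(trim_front_primer(toy_read, forward_primer))
```

In this toy case the read loses exactly the 15 primer bases, which is why read lengths shrink after primer removal.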

In [10]:
! qiime cutadapt trim-paired \
  --i-demultiplexed-sequences $data_dir/sequences.qza \
  --p-front-f AYTGGGYDTAAAGNG \
  --p-front-r CCGTCAATTYHTTTRAGT \
  --p-error-rate 0 \
  --o-trimmed-sequences $data_dir/denoising/primer-trimmed-seqs.qza

Saved SampleData[PairedEndSequencesWithQuality] to: ../data/denoising/primer-trimmed-seqs.qza

In [11]:
! qiime demux summarize \
  --i-data $data_dir/denoising/primer-trimmed-seqs.qza \
  --o-visualization $data_dir/denoising/primer-trimmed-seqs.qzv

Saved Visualization to: ../data/denoising/primer-trimmed-seqs.qzv

Comparing this visualization with the previous one shows that the forward reads have become slightly shorter.

In [12]:
Visualization.load(f'{data_dir}/denoising/primer-trimmed-seqs.qzv')

<a id='denoising'></a>

## 3. Denoising - Amplicon Sequence Variants

DADA2 is used to denoise the paired-end sequences. The truncation lengths are chosen so that the median quality score stays above 30 up to the truncation position.
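The selection rule can be sketched as follows. This is only an illustration of the criterion, assuming the per-position median quality scores are available as a plain list; the function name `truncation_length` and the numbers are made up and are not taken from `sequences.qzv`.

```python
import numpy as np


def truncation_length(median_quality, threshold=30):
    """Count the leading positions whose median quality exceeds `threshold`.

    The count stops at the first position with a median at or below the
    threshold, even if later positions recover.
    """
    scores = np.asarray(median_quality)
    not_above = np.flatnonzero(scores <= threshold)
    return int(not_above[0]) if not_above.size else int(scores.size)


# Made-up per-position medians for illustration only
toy_medians = [38, 38, 37, 36, 34, 31, 29, 25, 30, 20]
print(truncation_length(toy_medians))
```

For the toy medians, the first six positions stay above 30, so the read would be truncated to length 6.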

In [13]:
! qiime dada2 denoise-paired \
    --i-demultiplexed-seqs $data_dir/denoising/primer-trimmed-seqs.qza \
    --p-trunc-len-f 183 \
    --p-trunc-len-r 190 \
    --p-n-threads 4 \
    --o-table $data_dir/denoising/dada2_table.qza \
    --o-representative-sequences $data_dir/denoising/dada2_rep_set.qza \
    --o-denoising-stats $data_dir/denoising/dada2_stats.qza

Saved FeatureTable[Frequency] to: ../data/dada2_table.qza
Saved FeatureData[Sequence] to: ../data/denoising/dada2_rep_set.qza
Saved SampleData[DADA2Stats] to: ../data/denoising/dada2_stats.qza

In [16]:
## Inspect the denoising stats
! qiime metadata tabulate \
    --m-input-file $data_dir/denoising/dada2_stats.qza \
    --o-visualization $data_dir/denoising/dada2_stats.qzv

Saved Visualization to: ../data/denoising/dada2_stats.qzv

In [17]:
Visualization.load(f'{data_dir}/denoising/dada2_stats.qzv')
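A quick way to read the denoising stats is to look at the share of input reads that survive all steps. The sketch below computes this with pandas on a made-up table. The column names `input` and `non-chimeric` are assumed to match those in the DADA2 stats table, and `fraction_retained` and the sample names are hypothetical. With QIIME 2 available, the real table can presumably be obtained with `q2.Artifact.load(...).view(q2.Metadata).to_dataframe()`.

```python
import pandas as pd


def fraction_retained(stats, start='input', end='non-chimeric'):
    """Per-sample percentage of reads that survive from `start` to `end`."""
    return (100 * stats[end] / stats[start]).round(1)


# Made-up read counts for two hypothetical samples (not real results)
toy_stats = pd.DataFrame(
    {
        'input': [10000, 8000],
        'filtered': [9000, 6000],
        'denoised': [8800, 5900],
        'merged': [8000, 5000],
        'non-chimeric': [7500, 4000],
    },
    index=['sample-a', 'sample-b'],
)

retained = fraction_retained(toy_stats)
print(retained)
```

Samples with a much lower percentage than the rest are worth a closer look, for instance by passing `end='merged'` to see whether the loss happens when the paired reads are merged.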

In [18]:
## Visualize the feature table
! qiime feature-table summarize \
  --i-table $data_dir/denoising/dada2_table.qza \
  --o-visualization $data_dir/denoising/dada2_table.qzv

Saved Visualization to: ../data/denoising/dada2_table.qzv

In [19]:
Visualization.load(f'{data_dir}/denoising/dada2_table.qzv')

In [20]:
## Visualize the resulting sequences
! qiime feature-table tabulate-seqs \
  --i-data $data_dir/denoising/dada2_rep_set.qza \
  --o-visualization $data_dir/denoising/dada2_rep_set.qzv

Saved Visualization to: ../data/denoising/dada2_rep_set.qzv

In [21]:
Visualization.load(f'{data_dir}/denoising/dada2_rep_set.qzv')