<a id='setup'></a>

## 0. Setup

In [1]:
import os
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import numpy as np

import qiime2 as q2

%matplotlib inline

In [2]:
data_dir = 'Alien_data'

# Create the data directory if it does not exist yet
os.makedirs(data_dir, exist_ok=True)

<a id='data_import'></a>

## 1. Data import

In [3]:
! wget -nv -O $data_dir/sequences.qza 'https://polybox.ethz.ch/index.php/s/PCQspFMocVCKjZ3/download'

2022-10-29 21:23:43 URL:https://polybox.ethz.ch/index.php/s/PCQspFMocVCKjZ3/download [3433846903/3433846903] -> "Alien_data/sequences.qza" [1]


In [4]:
! wget -nv -O $data_dir/sample_metadata.tsv 'https://polybox.ethz.ch/index.php/s/r1AYzdUVWnQyiRL/download'

2022-10-29 21:23:49 URL:https://polybox.ethz.ch/index.php/s/r1AYzdUVWnQyiRL/download [10012/10012] -> "Alien_data/sample_metadata.tsv" [1]


In [5]:
metadata_df = pd.read_csv(f'{data_dir}/sample_metadata.tsv', sep='\t')

In [6]:
! qiime tools peek $data_dir/sequences.qza

UUID:        394c4773-80e2-46a6-9fba-40e7c8ec3fb9
Type:        SampleData[PairedEndSequencesWithQuality]
Data format: SingleLanePerSamplePairedEndFastqDirFmt


In [7]:
! qiime demux summarize \
    --i-data $data_dir/sequences.qza \
    --o-visualization $data_dir/sequences.qzv

Saved Visualization to: Alien_data/sequences.qzv

In [8]:
Visualization.load(f'{data_dir}/sequences.qzv')

<a id='remove primers'></a>

## 2. Remove Primers

Cutadapt trims the forward and reverse primers from the paired-end reads. With `--p-error-rate 0`, only exact primer matches are removed:

In [9]:
! qiime cutadapt trim-paired \
  --i-demultiplexed-sequences $data_dir/sequences.qza \
  --p-front-f AYTGGGYDTAAAGNG \
  --p-front-r CCGTCAATTYHTTTRAGT \
  --p-error-rate 0 \
  --o-trimmed-sequences $data_dir/primer-trimmed-seqs.qza

Saved SampleData[PairedEndSequencesWithQuality] to: Alien_data/primer-trimmed-seqs.qza
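Both primers contain degenerate IUPAC codes (`Y`, `D`, `N`, ...), so "exact match" here means a match against any base that a code stands for. The sketch below illustrates that idea in plain Python. It is not how Cutadapt is implemented: the helper names `primer_to_regex` and `trim_front` and the toy read are made up for illustration, and partial primer matches at the read start are ignored.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Compile a degenerate primer into a regex that allows no mismatches."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


def trim_front(read: str, primer: str) -> str:
    """Drop the primer and everything before it; keep the read as-is if absent."""
    match = primer_to_regex(primer).search(read)
    return read[match.end():] if match else read


forward_primer = "AYTGGGYDTAAAGNG"        # taken from the cutadapt call above
toy_read = "ACTGGGCGTAAAGGG" + "TTGACCA"  # hypothetical read: primer + insert
print(trim_front(toy_read, forward_primer))
```

Reads without a primer match are returned unchanged in this sketch, which mirrors the call above in that no option to discard untrimmed reads is passed.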

In [10]:
! qiime demux summarize \
  --i-data $data_dir/primer-trimmed-seqs.qza \
  --o-visualization $data_dir/primer-trimmed-seqs.qzv

Saved Visualization to: Alien_data/primer-trimmed-seqs.qzv

In [11]:
Visualization.load(f'{data_dir}/primer-trimmed-seqs.qzv')

<a id='denoising'></a>

## 3. Denoising - Amplicon Sequence Variants

DADA2 is used to denoise the paired-end reads. The truncation lengths (183 for forward reads, 190 for reverse reads) were chosen as the last positions at which the median quality score stays above 30.

In [12]:
! qiime dada2 denoise-paired \
    --i-demultiplexed-seqs $data_dir/primer-trimmed-seqs.qza \
    --p-trunc-len-f 183 \
    --p-trunc-len-r 190 \
    --p-n-threads 4 \
    --o-table $data_dir/dada2_table.qza \
    --o-representative-sequences $data_dir/dada2_rep_set.qza \
    --o-denoising-stats $data_dir/dada2_stats.qza

Saved FeatureTable[Frequency] to: Alien_data/dada2_table.qza
Saved FeatureData[Sequence] to: Alien_data/dada2_rep_set.qza
Saved SampleData[DADA2Stats] to: Alien_data/dada2_stats.qza
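The truncation rule described above (cut where the median quality stops being larger than 30) can be written down as a small helper. This is only a sketch of that rule: the function name `truncation_length` is hypothetical, and the per-position medians below are toy values, not numbers read from `primer-trimmed-seqs.qzv`.

```python
import numpy as np


def truncation_length(median_quality, threshold=30):
    """Count the leading positions whose median quality is larger than threshold."""
    median_quality = np.asarray(median_quality)
    # Indices of positions that fail the "larger than threshold" rule.
    failing = np.flatnonzero(median_quality <= threshold)
    # Truncate right before the first failing position, or keep everything.
    return int(failing[0]) if failing.size else int(median_quality.size)


# Hypothetical per-position median quality scores (toy data).
toy_medians = [38, 38, 37, 36, 34, 31, 29, 25]
print(truncation_length(toy_medians))
```

In practice the medians would come from the interactive quality plot, once for the forward reads and once for the reverse reads.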

In [13]:
## Inspect the denoising stats
! qiime metadata tabulate \
    --m-input-file $data_dir/dada2_stats.qza \
    --o-visualization $data_dir/dada2_stats.qzv

Saved Visualization to: Alien_data/dada2_stats.qzv

In [14]:
Visualization.load(f'{data_dir}/dada2_stats.qzv')

In [15]:
## Visualize the feature table
! qiime feature-table summarize \
  --i-table $data_dir/dada2_table.qza \
  --o-visualization $data_dir/dada2_table.qzv

Saved Visualization to: Alien_data/dada2_table.qzv

In [16]:
Visualization.load(f'{data_dir}/dada2_table.qzv')

In [17]:
## Visualize the resulting sequences
! qiime feature-table tabulate-seqs \
  --i-data $data_dir/dada2_rep_set.qza \
  --o-visualization $data_dir/dada2_rep_set.qzv

Saved Visualization to: Alien_data/dada2_rep_set.qzv

In [18]:
Visualization.load(f'{data_dir}/dada2_rep_set.qzv')

<a id='taxonomy'></a>

## 4. Taxonomy Classification

<a id='train_classifier'></a>

### 4.1 Training taxonomy classifier

We tried to train a naive Bayes classifier on the SILVA database, but training exceeded the available memory, and so did classification with the pre-trained SILVA classifier. We therefore decided to use the pre-trained Greengenes classifier.

In [19]:
#! qiime feature-classifier fit-classifier-naive-bayes \
#     --i-reference-reads $data_dir/silva-138-ssu-nr99-seqs-515f-806r-uniq.qza \
#     --i-reference-taxonomy $data_dir/silva-138-ssu-nr99-tax-515f-806r-derep-uniq.qza \
#     --p-classify--chunk-size 1000 \
#     --o-classifier $data_dir/515f-806r-classifier.qza

In [20]:
! wget -nv -O $data_dir/gg-13-8-99-nb-classifier.qza 'https://data.qiime2.org/2022.8/common/gg-13-8-99-nb-classifier.qza'

2022-10-29 22:35:19 URL:https://s3-us-west-2.amazonaws.com/qiime2-data/2022.8/common/gg-13-8-99-nb-classifier.qza [104512483/104512483] -> "Alien_data/gg-13-8-99-nb-classifier.qza" [1]


<a id='tax_assignment'></a>

### 4.2 Taxonomy assignment

In [21]:
! qiime feature-classifier classify-sklearn \
    --i-classifier $data_dir/gg-13-8-99-nb-classifier.qza \
    --i-reads $data_dir/dada2_rep_set.qza \
    --o-classification $data_dir/taxonomy.qza

Saved FeatureData[Taxonomy] to: Alien_data/taxonomy.qza

<a id='tax_visualization'></a>

### 4.3 Taxonomy visualization

In [22]:
! qiime metadata tabulate \
    --m-input-file $data_dir/taxonomy.qza \
    --o-visualization $data_dir/taxonomy.qzv

Saved Visualization to: Alien_data/taxonomy.qzv

In [23]:
Visualization.load(f'{data_dir}/taxonomy.qzv')

In [24]:
! qiime taxa barplot \
    --i-table $data_dir/dada2_table.qza \
    --i-taxonomy $data_dir/taxonomy.qza \
    --m-metadata-file $data_dir/sample_metadata.tsv \
    --o-visualization $data_dir/taxa-bar-plots.qzv

Saved Visualization to: Alien_data/taxa-bar-plots.qzv

In [25]:
Visualization.load(f'{data_dir}/taxa-bar-plots.qzv')

Mitochondrial and chloroplast sequences are filtered out because they do not belong to the gut microbiota:

In [26]:
! qiime taxa filter-table \
    --i-table $data_dir/dada2_table.qza \
    --i-taxonomy $data_dir/taxonomy.qza \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table $data_dir/table-filtered.qza

Saved FeatureTable[Frequency] to: Alien_data/table-filtered.qza

In [27]:
! qiime taxa filter-seqs \
    --i-sequences $data_dir/dada2_rep_set.qza \
    --i-taxonomy $data_dir/taxonomy.qza \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-sequences $data_dir/rep-seqs-filtered.qza

Saved FeatureData[Sequence] to: Alien_data/rep-seqs-filtered.qza
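To make the effect of `--p-exclude mitochondria,chloroplast` more concrete, here is a rough pandas sketch of the same idea, assuming the default matching behaviour (a feature is dropped if its taxonomy string contains any of the terms, ignoring case). The helper `exclude_taxa`, the feature IDs, the lineages, and the counts are all made up for illustration; this is not the QIIME 2 implementation.

```python
import pandas as pd


def exclude_taxa(taxonomy: pd.Series, terms) -> pd.Index:
    """Return the IDs of features whose annotation contains none of the terms."""
    lowered = taxonomy.str.lower()
    hit = pd.Series(False, index=taxonomy.index)
    for term in terms:
        # Plain substring search, case-insensitive via lowercasing both sides.
        hit |= lowered.str.contains(term.lower(), regex=False)
    return taxonomy.index[~hit]


# Toy annotations in Greengenes style (hypothetical features).
toy_taxonomy = pd.Series({
    "asv1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "asv2": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    "asv3": "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; "
            "o__Rickettsiales; f__mitochondria",
    "asv4": "k__Bacteria; p__Bacteroidetes",
})

# Toy feature table: rows are features, columns are samples.
toy_table = pd.DataFrame(
    {"sampleA": [10, 5, 2, 7], "sampleB": [3, 0, 1, 9]},
    index=toy_taxonomy.index,
)

kept = exclude_taxa(toy_taxonomy, ["mitochondria", "chloroplast"])
filtered_table = toy_table.loc[kept]
print(filtered_table)
```

The same set of kept IDs would be used for the representative sequences, so that the table and the sequences stay consistent.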

In [28]:
! qiime taxa barplot \
    --i-table $data_dir/table-filtered.qza \
    --i-taxonomy $data_dir/taxonomy.qza \
    --m-metadata-file $data_dir/sample_metadata.tsv \
    --o-visualization $data_dir/taxa-bar-plots_filtered.qzv

Saved Visualization to: Alien_data/taxa-bar-plots_filtered.qzv

In [29]:
Visualization.load(f'{data_dir}/taxa-bar-plots_filtered.qzv')