# 1. Import packages

In [1]:
# Importing all required packages at the start of the notebook
import IPython

from qiime2 import Visualization

import qiime2 as q2
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# 2. Import the data

In [2]:
# Location of the project's data
!mkdir -p "Project_data"
data_dir = "Project_data/Import_and_Denoizing"

In [3]:
%%bash -s $data_dir
mkdir -p "$1"

wget -nc --progress=dot:giga -P "$1" https://polybox.ethz.ch/index.php/s/uV06vmm96ZzB5eM/download/fungut_forward_reads.qza
wget -nc --progress=dot:giga -P "$1" https://polybox.ethz.ch/index.php/s/CA76kKFC9FApqpR/download/fungut_metadata.tsv

chmod -R +rxw "$1"

File ‘Project_data/Import_and_Denoizing/fungut_forward_reads.qza’ already there; not retrieving.

File ‘Project_data/Import_and_Denoizing/fungut_metadata.tsv’ already there; not retrieving.



# 3. Feature table construction

In [4]:
# Visual summary of the data
! qiime demux summarize \
    --i-data $data_dir/fungut_forward_reads.qza \
    --o-visualization $data_dir/fungut_forward_reads_demux_seqs.qzv

Saved Visualization to: Project_data/Import_and_Denoizing/fungut_forward_reads_demux_seqs.qzv

In [5]:
Visualization.load(f"{data_dir}/fungut_forward_reads_demux_seqs.qzv")

Because read quality stays high all the way to the last position (151 nt), we first tried truncating at that length. However, this lost far more sequences during denoising than truncating at 130 or 140. We also tried denoising with no truncation at all and then filtering out the sequences that were present in too few samples or at too low a frequency, but that also lost a large number of sequences. We finally settled on a truncation length of 135, because it gave good results in the denoising statistics.
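The comparison described above comes down to asking what share of the input reads survives denoising for each candidate truncation length. A minimal sketch of that summary follows, assuming the DADA2 statistics are available as a pandas DataFrame with the columns `input`, `filtered`, `denoised` and `non-chimeric`. The helper name `summarize_retention` and the toy values are hypothetical and are not results from this project.

```python
import pandas as pd


def summarize_retention(stats: pd.DataFrame) -> pd.Series:
    """Summarize how many reads survive denoising.

    Assumes `stats` has one row per sample and numeric columns
    'input' and 'non-chimeric' (hypothetical helper, for illustration).
    """
    total_input = stats["input"].sum()
    total_retained = stats["non-chimeric"].sum()
    per_sample_pct = 100 * stats["non-chimeric"] / stats["input"]
    return pd.Series(
        {
            "input_reads": total_input,
            "retained_reads": total_retained,
            "pct_retained": 100 * total_retained / total_input,
            "median_pct_per_sample": per_sample_pct.median(),
        }
    )


# Toy values for illustration only; these are NOT results from this project.
toy_stats = pd.DataFrame(
    {
        "input": [1000, 2000, 4000],
        "filtered": [900, 1800, 3000],
        "denoised": [880, 1700, 2900],
        "non-chimeric": [800, 1500, 2600],
    },
    index=["sample_a", "sample_b", "sample_c"],
)

summary = summarize_retention(toy_stats)
print(summary)
```

Running the same summary on the statistics from each truncation length would give one comparable number per run. In this notebook the table could presumably be obtained with `q2.Artifact.load(f"{data_dir}/dada2_stats.qza").view(q2.Metadata).to_dataframe()`, but that step is an assumption and is not part of the original workflow.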

# Denoising - Amplicon Sequence Variants

In [11]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs $data_dir/fungut_forward_reads.qza \
    --p-trunc-len 135 \
    --p-n-threads 3 \
    --p-max-ee 4 \
    --p-min-fold-parent-over-abundance 4 \
    --o-table $data_dir/dada2_table.qza \
    --o-representative-sequences $data_dir/dada2_rep_set.qza \
    --o-denoising-stats $data_dir/dada2_stats.qza

Saved FeatureTable[Frequency] to: Project_data/Import_and_Denoizing/dada2_table.qza
Saved FeatureData[Sequence] to: Project_data/Import_and_Denoizing/dada2_rep_set.qza
Saved SampleData[DADA2Stats] to: Project_data/Import_and_Denoizing/dada2_stats.qza

In [12]:
! qiime metadata tabulate \
  --m-input-file $data_dir/dada2_stats.qza \
  --o-visualization $data_dir/dada2_stats.qzv

! qiime feature-table tabulate-seqs \
  --i-data $data_dir/dada2_rep_set.qza \
  --o-visualization $data_dir/dada2_rep_set.qzv

! qiime feature-table summarize \
  --i-table $data_dir/dada2_table.qza \
  --m-sample-metadata-file $data_dir/fungut_metadata.tsv \
  --o-visualization $data_dir/dada2_table.qzv

Saved Visualization to: Project_data/Import_and_Denoizing/dada2_stats.qzv
Saved Visualization to: Project_data/Import_and_Denoizing/dada2_rep_set.qzv
Saved Visualization to: Project_data/Import_and_Denoizing/dada2_table.qzv

In [13]:
Visualization.load(f"{data_dir}/dada2_stats.qzv")

In [14]:
Visualization.load(f"{data_dir}/dada2_rep_set.qzv")

In [15]:
Visualization.load(f"{data_dir}/dada2_table.qzv")