# 1. Import and Denoising

> **Goal:** Import sequencing reads, assess read quality and generate a denoised feature table for downstream analyses.

---

**Overview**

In this section, we import the sequencing reads for the project and perform initial quality control and denoising.  
The aim is to remove low-quality reads and sequencing errors while retaining high-quality biological signal.

The workflow is organized into five key steps:

1. Import required Python and QIIME 2 packages  
2. Import the sequencing data
3. Construct an initial feature table  
4. Apply quality filtering based on read quality profiles  
5. Perform denoising to obtain ASVs

---

**Table of Contents**

- [1.1 Import packages](#1.1-Import-packages)
- [1.2 Import the data](#1.2-Import-the-data)
- [1.3 Feature table construction](#1.3-Feature-table-construction)
- [1.4 Quality filtering](#1.4-Quality-filtering)
- [1.5 Denoising](#1.5-Denoising)

## 1.1 Import packages

In [1]:
# Import all required packages at the start of the notebook
import os

import IPython
import matplotlib.pyplot as plt
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization

%matplotlib inline

## 1.2 Import the data

In [2]:
# Location of the project's data
!mkdir -p "Project_data"
data_dir = "Project_data/Import_and_Denoising"

In [3]:
%%bash -s $data_dir
mkdir -p "$1"

wget -nc --progress=dot:giga -P "$1" https://polybox.ethz.ch/index.php/s/uV06vmm96ZzB5eM/download/fungut_forward_reads.qza

chmod -R +rxw "$1"

--2025-12-11 18:07:36--  https://polybox.ethz.ch/index.php/s/uV06vmm96ZzB5eM/download/fungut_forward_reads.qza
Resolving polybox.ethz.ch (polybox.ethz.ch)... 129.132.71.243
Connecting to polybox.ethz.ch (polybox.ethz.ch)|129.132.71.243|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 712595535 (680M) [application/octet-stream]
Saving to: ‘Project_data/Import_and_Denoising/fungut_forward_reads.qza’

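Because `wget -nc` skips the download whenever a file with the same name already exists, an interrupted earlier run could leave a partial archive in place. The sketch below is one way to sanity-check the file before moving on. It relies on a `.qza` being a zip archive whose top-level directory contains a `metadata.yaml`; `inspect_qza` is a hypothetical helper and not part of QIIME 2.

```python
import zipfile
from pathlib import Path


def inspect_qza(path):
    """Report basic facts about a .qza file without loading it into QIIME 2."""
    path = Path(path)
    info = {"size_bytes": 0, "valid_zip": False, "has_metadata": False}
    if not path.is_file():
        return info
    info["size_bytes"] = path.stat().st_size
    # A truncated download usually fails this check because the zip
    # central directory sits at the end of the file.
    if not zipfile.is_zipfile(path):
        return info
    info["valid_zip"] = True
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    # Assumption: the archive root is a single directory holding metadata.yaml.
    info["has_metadata"] = any(
        name.count("/") == 1 and name.endswith("/metadata.yaml") for name in names
    )
    return info
```

Calling `inspect_qza(f"{data_dir}/fungut_forward_reads.qza")` and comparing `size_bytes` with the `Length: 712595535` reported by `wget` gives a quick indication of whether the download completed.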

## 1.3 Feature table construction

In [4]:
# Visual summary of the data
! qiime demux summarize \
    --i-data $data_dir/fungut_forward_reads.qza \
    --o-visualization $data_dir/fungut_forward_reads_demux_seqs.qzv

Saved Visualization to: Project_data/Import_and_Denoising/fungut_forward_reads_demux_seqs.qzv

In [5]:
Visualization.load(f"{data_dir}/fungut_forward_reads_demux_seqs.qzv")

The mean quality score is high across nucleotide positions (quality score = 38 at the last position). However, at an early position, the lower whisker drops sharply, indicating that quality varies considerably between reads.
For this reason, and because ITS amplicons vary in length, we first filter the reads by PHRED quality score and then denoise without truncating them to a fixed length.
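To put the quality scores discussed above into perspective, a PHRED score Q corresponds to a base-calling error probability of 10^(-Q/10). The short sketch below converts a few scores; `phred_to_error_prob` is a helper name chosen here for illustration.

```python
def phred_to_error_prob(q):
    """Convert a PHRED quality score into the probability of a wrong base call."""
    return 10 ** (-q / 10)


# Q30 is the filtering threshold used in this notebook, Q38 the score
# observed at the last position of the quality plot.
for q in (20, 30, 38):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.6f}")
```

A score of 30 therefore means roughly one expected error per 1,000 bases, while 38 corresponds to fewer than two errors per 10,000 bases.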

## 1.4 Quality filtering

In [6]:
! qiime quality-filter q-score \
    --i-demux $data_dir/fungut_forward_reads.qza \
    --p-min-quality 30 \
    --o-filtered-sequences $data_dir/fungut_forward_reads_quality_filtered.qza \
    --o-filter-stats $data_dir/quality_filtering_stats.qza 

Saved SampleData[SequencesWithQuality] to: Project_data/Import_and_Denoising/fungut_forward_reads_quality_filtered.qza
Saved QualityFilterStats to: Project_data/Import_and_Denoising/quality_filtering_stats.qza

In [7]:
! qiime metadata tabulate \
    --m-input-file $data_dir/quality_filtering_stats.qza \
    --o-visualization $data_dir/quality_filtering_stats.qzv

Saved Visualization to: Project_data/Import_and_Denoising/quality_filtering_stats.qzv

In [8]:
Visualization.load(f"{data_dir}/quality_filtering_stats.qzv")
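As we understand it, the `q-score` method truncates each read at the first run of low-quality bases that is longer than the quality window, then discards reads that end up shorter than a given fraction of their original length. The following is a simplified sketch of that idea, not the plugin's implementation. `truncate_read` is a hypothetical helper, qualities are assumed to be Phred+33 encoded, and the defaults `window=3` and `min_length_fraction=0.75` are assumed to mirror the plugin's defaults.

```python
def truncate_read(seq, qual, min_quality=30, window=3, min_length_fraction=0.75):
    """Truncate a read before its first long low-quality run.

    Returns (sequence, quality) after truncation, or None if the read
    becomes shorter than min_length_fraction of its original length.
    """
    scores = [ord(char) - 33 for char in qual]  # decode Phred+33
    cut = len(seq)
    run = 0  # current number of consecutive low-quality bases
    for i, score in enumerate(scores):
        if score < min_quality:
            run += 1
            if run > window:
                cut = i - window  # index where the low-quality run started
                break
        else:
            run = 0
    if cut < min_length_fraction * len(seq):
        return None
    return seq[:cut], qual[:cut]
```

With `min_quality=30`, as used in this notebook, the filter is strict: any stretch of more than three consecutive bases below Q30 ends the read at that point, which would explain a large number of truncated or discarded reads in the statistics table.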

In [9]:
! qiime demux summarize \
    --i-data $data_dir/fungut_forward_reads_quality_filtered.qza \
    --o-visualization $data_dir/fungut_forward_reads_filtered_demux_seqs.qzv

Saved Visualization to: Project_data/Import_and_Denoising/fungut_forward_reads_filtered_demux_seqs.qzv

In [10]:
Visualization.load(f"{data_dir}/fungut_forward_reads_filtered_demux_seqs.qzv")

## 1.5 Denoising

In [11]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs $data_dir/fungut_forward_reads_quality_filtered.qza \
    --p-trunc-len 0 \
    --p-n-threads 3 \
    --o-table $data_dir/dada2_table.qza \
    --o-representative-sequences $data_dir/dada2_rep_set.qza \
    --o-denoising-stats $data_dir/dada2_stats.qza

Saved FeatureTable[Frequency] to: Project_data/Import_and_Denoising/dada2_table.qza
Saved FeatureData[Sequence] to: Project_data/Import_and_Denoising/dada2_rep_set.qza
Saved SampleData[DADA2Stats] to: Project_data/Import_and_Denoising/dada2_stats.qza

In [12]:
! qiime metadata tabulate \
  --m-input-file $data_dir/dada2_stats.qza \
  --o-visualization $data_dir/dada2_stats.qzv

! qiime feature-table tabulate-seqs \
  --i-data $data_dir/dada2_rep_set.qza \
  --o-visualization $data_dir/dada2_rep_set.qzv

! qiime feature-table summarize \
  --i-table $data_dir/dada2_table.qza \
  --m-sample-metadata-file Project_data/Metadata/updated_fungut_metadata.tsv \
  --o-visualization $data_dir/dada2_table.qzv

Saved Visualization to: Project_data/Import_and_Denoising/dada2_stats.qzv
Saved Visualization to: Project_data/Import_and_Denoising/dada2_rep_set.qzv
Saved Visualization to: Project_data/Import_and_Denoising/dada2_table.qzv

In [13]:
Visualization.load(f"{data_dir}/dada2_stats.qzv")
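The denoising statistics can also be checked programmatically, for example after exporting `dada2_stats.qza` with `qiime tools export`. The sketch below computes, for each sample, the fraction of input reads that survive to the non-chimeric stage and flags samples below a threshold. It assumes the exported TSV has `sample-id`, `input` and `non-chimeric` columns plus a `#q2:types` row; `retention_by_sample` and the 0.5 threshold are illustrative choices, not values from this project.

```python
import csv
import io


def retention_by_sample(tsv_text, min_fraction=0.5):
    """Map each sample ID to (fraction of reads retained, below-threshold flag)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    results = {}
    for row in reader:
        sample_id = row["sample-id"]
        if sample_id.startswith("#q2:"):
            continue  # skip the QIIME 2 column-type directive row
        n_input = int(row["input"])
        n_final = int(row["non-chimeric"])
        fraction = n_final / n_input if n_input else 0.0
        results[sample_id] = (fraction, fraction < min_fraction)
    return results
```

Samples flagged here lose most of their reads during filtering, denoising or chimera removal and may need a closer look before downstream analyses.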

In [14]:
Visualization.load(f"{data_dir}/dada2_rep_set.qzv")

In [15]:
Visualization.load(f"{data_dir}/dada2_table.qzv")