## 1. Primary analyses of our sequences

In [1]:
import os
import sys
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Define the data directory
data_dir = './data'  # relative path, matching the ./data/... paths used in the commands below

### Checking out the reads

In [2]:
! qiime demux summarize \
  --i-data ./data/forward_reads/fungut_forward_reads.qza \
  --o-visualization ./data/forward_reads/fungut_forward_reads_summary.qzv

Saved Visualization to: ./data/forward_reads/fungut_forward_reads_summary.qzv

In [None]:
Visualization.load('./data/forward_reads/fungut_forward_reads_summary.qzv')

## 2. Filtering the forward reads according to the preprocessed metadata

We filtered the forward reads to retain only the samples that remain in our processed metadata, i.e. after entries with certain NaN values had been removed.

This ensures that our datasets are consistent and ready for downstream analyses.
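The sample selection behind this filter can be illustrated with a minimal plain-Python sketch. It is not how QIIME 2 implements `demux filter-samples`; it only shows the idea of keeping the samples whose IDs are listed in the metadata. The column name `country`, the sample IDs, and the read counts are hypothetical toy values.

```python
import csv
import io


def metadata_sample_ids(metadata_tsv):
    """Collect sample IDs from the first column of a tab-separated metadata table."""
    reader = csv.reader(io.StringIO(metadata_tsv), delimiter="\t")
    next(reader)  # skip the header row
    # Ignore empty rows and comment rows that start with '#'.
    return {row[0] for row in reader if row and not row[0].startswith("#")}


def keep_listed_samples(reads_per_sample, sample_ids):
    """Keep only the samples whose ID appears in the metadata."""
    return {sid: n for sid, n in reads_per_sample.items() if sid in sample_ids}


# Hypothetical toy data: S2 was dropped from the metadata during preprocessing.
toy_metadata = "sample-id\tcountry\nS1\tCH\nS3\tDE\n"
toy_reads = {"S1": 1200, "S2": 800, "S3": 950}

filtered_reads = keep_listed_samples(toy_reads, metadata_sample_ids(toy_metadata))
print(filtered_reads)
```

In this toy case only `S1` and `S3` survive, mirroring how samples absent from the processed metadata are dropped from the reads.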

In [3]:
!qiime demux filter-samples \
    --i-demux ./data/forward_reads/fungut_forward_reads.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --o-filtered-demux ./data/forward_reads/filtered_fungut_forward_reads.qza

Saved SampleData[SequencesWithQuality] to: ./data/forward_reads/filtered_fungut_forward_reads.qza

### Checking out the filtered forward reads

In [4]:
! qiime demux summarize \
  --i-data ./data/forward_reads/filtered_fungut_forward_reads.qza \
  --o-visualization ./data/forward_reads/filtered_fungut_forward_reads_summary.qzv

Saved Visualization to: ./data/forward_reads/filtered_fungut_forward_reads_summary.qzv

In [5]:
Visualization.load('./data/forward_reads/filtered_fungut_forward_reads_summary.qzv')

## 3. Denoising with DADA2

Denoising was performed with DADA2 without truncating or trimming, because the quality scores remained consistently high (25th percentile ≥ 30) across the entire read length.

Retaining the full length of the sequences maximizes the information available for accurate taxonomic classification and diversity analysis downstream.
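The truncation decision can be expressed as a simple check: find the first read position whose 25th-percentile quality score falls below the threshold. The sketch below is an illustration only. The function name, the threshold default, and the toy score lists are hypothetical, and the percentile uses Python's `statistics.quantiles` with the inclusive method, which may differ slightly from how the interactive quality plot computes it.

```python
from statistics import quantiles


def first_low_quality_position(scores_by_position, threshold=30):
    """Return the first position whose 25th-percentile quality is below threshold.

    scores_by_position: one list of quality scores per read position.
    Returns None when every position meets the threshold.
    """
    for position, scores in enumerate(scores_by_position):
        q25 = quantiles(scores, n=4, method="inclusive")[0]
        if q25 < threshold:
            return position
    return None


# Hypothetical toy profiles (three positions, four reads each).
high_quality = [[36, 37, 38, 38], [35, 36, 37, 38], [33, 34, 36, 37]]
degrading = [[36, 37, 38, 38], [30, 31, 33, 34], [18, 22, 27, 31]]

print(first_low_quality_position(high_quality))
print(first_low_quality_position(degrading))
```

A result of `None` corresponds to our situation, where no truncation point is needed and `--p-trunc-len 0` (no truncation) is a reasonable choice.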


In [6]:
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs ./data/forward_reads/filtered_fungut_forward_reads.qza \
  --p-trim-left 0 \
  --p-trunc-len 0 \
  --o-representative-sequences ./data/feature_tables_dada/rep-seqs.qza \
  --o-table ./data/feature_tables_dada/feature-table.qza \
  --o-denoising-stats ./data/feature_tables_dada/denoising-stats.qza

Saved FeatureTable[Frequency] to: ./data/feature_tables_dada/feature-table.qza
Saved FeatureData[Sequence] to: ./data/feature_tables_dada/rep-seqs.qza
Saved SampleData[DADA2Stats] to: ./data/feature_tables_dada/denoising-stats.qza

#### Check out the feature table

In [7]:
!qiime feature-table summarize \
  --i-table ./data/feature_tables_dada/feature-table.qza \
  --o-visualization ./data/feature_tables_dada/feature-table-summary.qzv

Saved Visualization to: ./data/feature_tables_dada/feature-table-summary.qzv

In [8]:
Visualization.load('./data/feature_tables_dada/feature-table-summary.qzv')

## 4. Taxonomic Classification
This step was executed on Euler because JupyterHub did not provide sufficient memory.
The bash script we ran on Euler is in `./scripts/`.

For classification we used the UNITE pretrained classifier from:

https://github.com/colinbrislawn/unite-train/releases

Classification was performed with `unite_ver10_dynamic_s_all_04.04.2024-Q2-2024.5.qza`, since resources were not an issue when performing the classification on `Euler`.

In [None]:
'''
!qiime feature-classifier classify-sklearn \
  --i-classifier ./data/taxonomy_classification/unite_ver10_dynamic_s_all_04.04.2024-Q2-2024.5.qza \
  --i-reads ./data/rep-seqs.qza \
  --p-reads-per-batch 1000 \
  --o-classification ./taxonomy_unite_dynamic_s_all.qza
'''

In [None]:
# tabulation of the taxonomy classification
!qiime metadata tabulate \
  --m-input-file ./data/taxonomy_classification/taxonomy_unite_dynamic_s_all.qza \
  --o-visualization ./data/taxonomy_classification/taxonomy_unite_dynamic_s_all.qzv

In [None]:
Visualization.load('./data/taxonomy_classification/taxonomy_unite_dynamic_s_all.qzv')

#### Create the taxonomy bar plot

In [9]:
!qiime taxa barplot \
  --i-table ./data/feature_tables_dada/feature-table.qza \
  --i-taxonomy ./data/taxonomy_classification/taxonomy_unite_dynamic_s_all.qza \
  --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
  --o-visualization ./data/taxonomy_classification/taxa-bar-plots.qzv

Saved Visualization to: ./data/taxonomy_classification/taxa-bar-plots.qzv

Our initial taxonomy classification revealed sequences categorized as `Eukaryota`, `Unassigned` and taxa from kingdoms other than `k__Fungi`.

These likely represent contaminants, sequencing errors or classifications limited by insufficient information in the reference database. To keep our analysis focused on fungal taxa, we filtered out all non-fungal sequences, including these categories, in our next step.
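To get a quick overview of how many features fall into each top-level category, the first rank of each taxonomy string can be tallied. This is a minimal sketch on hypothetical toy labels, assuming semicolon-delimited lineage strings that start with the kingdom rank; the feature IDs and lineages are illustrative and not taken from our data.

```python
from collections import Counter


def top_rank(taxon):
    """Return the first rank of a semicolon-delimited lineage string."""
    return taxon.split(";")[0].strip()


def tally_top_ranks(taxonomy):
    """Count features per top-level label (e.g. k__Fungi, Unassigned)."""
    return Counter(top_rank(taxon) for taxon in taxonomy.values())


# Hypothetical feature-to-lineage assignments.
toy_assignments = {
    "asv1": "k__Fungi;p__Ascomycota",
    "asv2": "k__Fungi;p__Basidiomycota",
    "asv3": "k__Viridiplantae;p__Anthophyta",
    "asv4": "Unassigned",
}

kingdom_counts = tally_top_ranks(toy_assignments)
for label, count in kingdom_counts.most_common():
    print(label, count)
```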

In [10]:
Visualization.load('./data/taxonomy_classification/taxa-bar-plots.qzv')

#### Filtering the feature table to retain only fungi
We observed non-fungal contamination in our samples, so we kept only features classified as `k__Fungi`. This removes features labelled only `Eukaryota`, `Unassigned`, or assigned to other kingdoms.
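The effect of an include filter such as `--p-include k__Fungi` can be approximated in plain Python as follows. This is a simplified sketch, not the QIIME 2 implementation: it uses a case-sensitive substring match and drops features without a taxonomy entry, both of which are assumptions made for illustration. The table, taxonomy, and feature IDs are hypothetical toy data.

```python
def filter_table_to_taxon(table, taxonomy, include):
    """Keep only features whose lineage string contains the search term.

    table: {sample_id: {feature_id: count}}
    taxonomy: {feature_id: lineage string}
    """
    keep = {fid for fid, taxon in taxonomy.items() if include in taxon}
    return {
        sample: {fid: n for fid, n in counts.items() if fid in keep}
        for sample, counts in table.items()
    }


# Hypothetical toy inputs.
toy_taxonomy = {
    "asv1": "k__Fungi;p__Ascomycota",
    "asv2": "k__Viridiplantae;p__Anthophyta",
    "asv3": "Unassigned",
}
toy_table = {
    "S1": {"asv1": 40, "asv2": 5},
    "S3": {"asv1": 12, "asv3": 7},
}

fungal_table = filter_table_to_taxon(toy_table, toy_taxonomy, "k__Fungi")
print(fungal_table)
```

Because the filter is an include list rather than an exclude list, anything not explicitly matching `k__Fungi` is removed, which is why `Unassigned` features disappear without being named.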



In [11]:
!qiime taxa filter-table \
  --i-table ./data/feature_tables_dada/feature-table.qza \
  --i-taxonomy ./data/taxonomy_classification/taxonomy_unite_dynamic_s_all.qza \
  --p-include k__Fungi \
  --o-filtered-table ./data/feature_tables_dada/filtered-feature-table.qza


Saved FeatureTable[Frequency] to: ./data/feature_tables_dada/filtered-feature-table.qza

We now summarize the filtered feature table and then regenerate the taxonomy barplot to confirm that only `k__Fungi` classifications remain.

### Summary of filtered feature table

In [12]:
!qiime feature-table summarize \
    --i-table ./data/feature_tables_dada/filtered-feature-table.qza \
    --o-visualization ./data/feature_tables_dada/filtered-feature-table_summary.qzv

Saved Visualization to: ./data/feature_tables_dada/filtered-feature-table_summary.qzv

In [13]:
Visualization.load('./data/feature_tables_dada/filtered-feature-table_summary.qzv')

### Taxa barplot with filtered feature table

In [14]:
!qiime taxa barplot \
  --i-table ./data/feature_tables_dada/filtered-feature-table.qza \
  --i-taxonomy ./data/taxonomy_classification/taxonomy_unite_dynamic_s_all.qza \
  --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
  --o-visualization ./data/taxonomy_classification/filtered-taxa-bar-plots.qzv

Saved Visualization to: ./data/taxonomy_classification/filtered-taxa-bar-plots.qzv

In [15]:
Visualization.load('./data/taxonomy_classification/filtered-taxa-bar-plots.qzv')