# 1.Import packages

In [10]:
# Importing all required packages at the start of the notebook
import IPython

from qiime2 import Visualization

import pandas as pd

# 2.Import the data

In [4]:
# Directory for the outputs of the diversity analysis
data_dir = "Project_data/Diversity"
! mkdir -p "$data_dir"

# 3.Determination of the sampling depth

In [3]:
! qiime feature-table summarize \
    --i-table Project_data/Taxonomy/table_filtered.qza \
    --m-sample-metadata-file Project_data/Metadata/updated_fungut_metadata.tsv \
    --o-visualization $data_dir/table_filtered.qzv

Saved Visualization to: Project_data/Diversity/table_filtered.qzv

In [4]:
Visualization.load(f"{data_dir}/table_filtered.qzv")

In [5]:
! qiime diversity alpha-rarefaction \
    --i-table Project_data/Taxonomy/table_filtered.qza \
    --p-max-depth 80000 \
    --m-metadata-file Project_data/Metadata/updated_fungut_metadata.tsv \
    --o-visualization $data_dir/alpha-rarefaction.qzv

Saved Visualization to: Project_data/Diversity/alpha-rarefaction.qzv

In [6]:
Visualization.load(f"{data_dir}/alpha-rarefaction.qzv")

Based on the alpha rarefaction curves, a sampling depth of 20,000 was chosen, since the Shannon and observed-features metrics start to plateau at this point. According to the feature table summary, this depth retains 2,720,000 reads (40.81%) across 136 samples (90.67%).
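
The retention figures follow from the per-sample read counts: a sample is kept if it has at least as many reads as the sampling depth, and each kept sample then contributes exactly that many reads. The arithmetic can be sketched as follows. The helper `rarefaction_retention` and the counts in `toy_counts` are hypothetical and only illustrate the calculation; they are not taken from the project's feature table.

```python
def rarefaction_retention(sample_counts, depth):
    """Summarise what survives rarefying every sample to `depth` reads."""
    # A sample is retained only if it has at least `depth` reads
    kept = [sample for sample, reads in sample_counts.items() if reads >= depth]
    total_reads = sum(sample_counts.values())
    # Every retained sample is subsampled to exactly `depth` reads
    retained_reads = len(kept) * depth
    return {
        "samples_retained": len(kept),
        "samples_pct": round(100 * len(kept) / len(sample_counts), 2),
        "reads_retained": retained_reads,
        "reads_pct": round(100 * retained_reads / total_reads, 2),
    }

# Hypothetical per-sample read counts (not project data)
toy_counts = {"s1": 50_000, "s2": 25_000, "s3": 20_000, "s4": 5_000}
summary = rarefaction_retention(toy_counts, depth=20_000)
print(summary)

# Consistency check of the reported figure: 136 samples at 20,000 reads each
reported_reads = 136 * 20_000
print(reported_reads)
```

The last two lines confirm that the reported number of retained reads equals the number of retained samples multiplied by the sampling depth.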

# 4.Euler
The diversity analysis was performed using the `q2-boots` plugin for QIIME 2. To run the bootstrapping with a sufficiently high number of iterations (`n = 1000`), this step was performed on Euler. Because this plugin is not included in the previously installed MOSHPIT distribution, the Amplicon distribution was installed in addition via Miniconda.
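
To illustrate the idea behind bootstrapped alpha diversity, the following is a conceptual sketch only, not the `q2-boots` implementation: reads of one sample are resampled with replacement to a fixed depth, Shannon entropy is computed for each resample, and the median over all resamples is reported. The function names, the toy counts, and the use of log base 2 are assumptions made for this example, and the sketch does not reproduce the k-mer counting that `kmer-diversity` performs.

```python
import math
import random
import statistics
from collections import Counter

def shannon(counts):
    """Shannon entropy (log base 2) of a list of feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def bootstrap_shannon(feature_counts, depth, n, seed=0):
    """Median Shannon entropy over `n` resamples of `depth` reads drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    features = list(feature_counts)
    weights = [feature_counts[f] for f in features]
    estimates = []
    for _ in range(n):
        # Drawing reads with replacement is equivalent to drawing features
        # with probability proportional to their observed counts
        draw = rng.choices(features, weights=weights, k=depth)
        estimates.append(shannon(list(Counter(draw).values())))
    return statistics.median(estimates)

# Hypothetical counts for a single sample (not project data)
toy_sample = {"featA": 600, "featB": 300, "featC": 90, "featD": 10}
estimate = bootstrap_shannon(toy_sample, depth=200, n=100)
print(round(estimate, 3))
```

Taking the median over many resamples makes the reported value less sensitive to a single unlucky draw, which is the motivation for running a large number of iterations.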

## 4.1 Import files
As with the 2.Taxonomy script, the files required to run the bootstrapping were uploaded to Polybox so that the script running on Euler could download them.

## 4.2 Bootstrapping script
The following script was run on Euler.

```bash
#!/bin/bash
#SBATCH --job-name=bootstraping
#SBATCH --time=24:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=32G
#SBATCH --output=bootstraping_%j.out
#SBATCH --error=bootstraping_%j.err
#SBATCH --mail-type=END,FAIL

# Activate conda
source ~/miniconda3/etc/profile.d/conda.sh
conda activate qiime2-amplicon-2025.10

# Data folder
data_dir="ProjectData"


# Download the metadata and reads
module load eth_proxy

wget --content-disposition -nc --progress=dot:giga -P "$data_dir" https://polybox.ethz.ch/index.php/s/e7ieANgiAn26oBs/download
wget --content-disposition -nc --progress=dot:giga -P "$data_dir" https://polybox.ethz.ch/index.php/s/xNSLKnR2y3QG9eb/download
wget --content-disposition -nc --progress=dot:giga -P "$data_dir" https://polybox.ethz.ch/index.php/s/KscLWzSGnkmEmY5/download

echo "Download done!"

# Run the bootstrapping
qiime boots kmer-diversity \
  --i-table $data_dir/table_filtered.qza \
  --i-sequences $data_dir/rep-seqs_filtered.qza \
  --m-metadata-file $data_dir/updated_fungut_metadata.tsv \
  --p-sampling-depth 20000 \
  --p-n 1000 \
  --p-replacement \
  --p-alpha-average-method median \
  --p-beta-average-method medoid \
  --output-dir $data_dir/boots-kmer-diversity

echo "Bootstrapping done!"
```
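
The script averages the bootstrapped beta diversity results with `medoid`. As a rough illustration of what a medoid is, the sketch below picks, from a set of distance matrices, the one with the smallest total distance to all others, so the result is always one of the actual resampled matrices rather than a synthetic average. The helper `medoid_index`, the toy matrices, and the choice of summed absolute differences as the matrix-to-matrix distance are assumptions for this example; the measure `q2-boots` uses internally is not shown in this notebook.

```python
def medoid_index(matrices):
    """Index of the matrix with the smallest total distance to all others."""
    def distance(a, b):
        # Assumed measure: sum of absolute element-wise differences
        return sum(
            abs(x - y)
            for row_a, row_b in zip(a, b)
            for x, y in zip(row_a, row_b)
        )

    totals = [sum(distance(m, other) for other in matrices) for m in matrices]
    return totals.index(min(totals))

# Three hypothetical 2x2 distance matrices from three bootstrap iterations
toy_matrices = [
    [[0.0, 0.50], [0.50, 0.0]],
    [[0.0, 0.52], [0.52, 0.0]],
    [[0.0, 0.90], [0.90, 0.0]],
]
best = medoid_index(toy_matrices)
print(best)
```

In this toy case the third matrix is an outlier, and the second matrix is selected because it lies closest to the other two overall.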

# 5.Diversity
The files created by the script on Euler were downloaded and then uploaded to Polybox so that this notebook can access them.

In [5]:
%%bash -s $data_dir

wget --content-disposition -nc --progress=dot:giga -P "$1" https://polybox.ethz.ch/index.php/s/nmb4j2YDSJbjJP2/download
wget --content-disposition -nc --progress=dot:giga -P "$1" https://polybox.ethz.ch/index.php/s/sYGkqwCffpcK8Si/download
wget --content-disposition -nc --progress=dot:giga -P "$1" https://polybox.ethz.ch/index.php/s/XJFWGkkNYfZSyse/download

chmod -R +rxw "$1"

--2025-11-18 13:01:38--  https://polybox.ethz.ch/index.php/s/nmb4j2YDSJbjJP2/download
Resolving polybox.ethz.ch (polybox.ethz.ch)... 129.132.71.243
Connecting to polybox.ethz.ch (polybox.ethz.ch)|129.132.71.243|:443... connected.
HTTP request sent, awaiting response... 200 OK
--2025-11-18 13:01:38--  https://polybox.ethz.ch/index.php/s/sYGkqwCffpcK8Si/download
Resolving polybox.ethz.ch (polybox.ethz.ch)... 129.132.71.243
Connecting to polybox.ethz.ch (polybox.ethz.ch)|129.132.71.243|:443... connected.
HTTP request sent, awaiting response... 200 OK
--2025-11-18 13:01:38--  https://polybox.ethz.ch/index.php/s/XJFWGkkNYfZSyse/download
Resolving polybox.ethz.ch (polybox.ethz.ch)... 129.132.71.243
Connecting to polybox.ethz.ch (polybox.ethz.ch)|129.132.71.243|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 81537205 (78M) [application/octet-stream]
Saving to: ‘Project_data/Diversity/braycurtis.qza’


## 5.1 Alpha diversity

In [6]:
! qiime diversity alpha-group-significance \
  --i-alpha-diversity $data_dir/shannon.qza \
  --m-metadata-file $data_dir/../Metadata/updated_fungut_metadata.tsv \
  --o-visualization $data_dir/alpha_group_significance.qzv

Saved Visualization to: Project_data/Diversity/alpha_group_significance.qzv

In [7]:
Visualization.load(f"{data_dir}/alpha_group_significance.qzv")

## 5.2 Beta diversity

In [8]:
Visualization.load(f"{data_dir}/scatter_plot.qzv")

In [None]:
meta_data_df = pd.read_csv(f"{data_dir}/../Metadata/updated_fungut_metadata.tsv", sep="\t")

In [28]:
meta_data_df.columns

Index(['ID', 'country_sample', 'state_sample', 'latitude_sample',
       'longitude_sample', 'sex_sample', 'age_years_sample',
       'height_cm_sample', 'weight_kg_sample', 'bmi_sample',
       'diet_type_sample', 'ibd_sample', 'gluten_sample', 'age_range',
       'bmi_category', 'continent'],
      dtype='object')

In [31]:
meta_cols = ["age_range", "country_sample", "sex_sample", "bmi_sample", "diet_type_sample", "ibd_sample", "gluten_sample", "continent"]

for col in meta_cols:
    output_name = f"{data_dir}/bray_curtis-{col}-significance.qzv"
    print(f"Running for column: {col}")

    ! qiime diversity beta-group-significance \
        --i-distance-matrix $data_dir/braycurtis.qza \
        --m-metadata-file $data_dir/../Metadata/updated_fungut_metadata.tsv \
        --m-metadata-column {col} \
        --p-pairwise \
        --o-visualization {output_name}
    
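# Two columns fail: country_sample (the plugin reports that all values in the grouping vector are unique)
# and bmi_sample (numeric, whereas beta-group-significance expects a categorical column).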

Running for column: age_range
Saved Visualization to: Project_data/Diversity/bray_curtis-age_range-significance.qzv
Running for column: country_sample
Plugin error from diversity:

  All values in the grouping vector are unique. This method cannot operate on a grouping vector with only unique values (e.g., there are no 'within' distances because each group of objects contains only a single object).

Debug info has been saved to /tmp/qiime2-q2cli-err-si1r_y1o.log
Running for column: sex_sample
Saved Visualization to: Project_data/Diversity/bray_curtis-sex_sample-significance.qzv
Running for column: bmi_sample
Usage: qiime diversity beta-group-significance [OPTIONS]

  Determine whether groups of samples are significantly different from one
  another using a permutation-based statistical test.

Inputs:
  --i-dis

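The two failures in the loop above could be caught before calling QIIME 2 by screening each metadata column first. The following is a minimal sketch under stated assumptions: the helper `grouping_problem` and the toy values are hypothetical, and the checks only mirror the two error conditions seen in the output (numeric column, all values unique) plus the trivial cases of an empty column or a single group.

```python
import math

def is_missing(value):
    """True for None, empty strings and NaN."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))

def is_number(value):
    """True if the value can be interpreted as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def grouping_problem(values):
    """Return why a column is unsuitable for a group comparison, or None if it looks usable."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "no values"
    if all(is_number(v) for v in present):
        return "numeric"
    groups = set(present)
    if len(groups) == len(present):
        return "all values unique"
    if len(groups) < 2:
        return "single group"
    return None

# Hypothetical toy columns (not project data)
toy_columns = {
    "sex_sample": ["female", "male", "female", "male"],
    "bmi_sample": [21.4, 30.1, 18.9, 25.0],
    "country_sample": ["A", "B", "C", "D"],
}
report = {name: grouping_problem(values) for name, values in toy_columns.items()}
print(report)
```

With the DataFrame loaded earlier, the same check could be applied per column, for example `grouping_problem(meta_data_df[col].tolist())`, and only columns that return `None` would be passed on to `beta-group-significance`.
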
In [32]:
Visualization.load(f"{data_dir}/bray_curtis-age_range-significance.qzv")

In [34]:
Visualization.load(f"{data_dir}/bray_curtis-diet_type_sample-significance.qzv")

In [35]:
Visualization.load(f"{data_dir}/bray_curtis-ibd_sample-significance.qzv")

In [36]:
Visualization.load(f"{data_dir}/bray_curtis-sex_sample-significance.qzv")

In [37]:
Visualization.load(f"{data_dir}/bray_curtis-continent-significance.qzv")