## (Optional) Krona Tools Install

If you use the QIIME 2 Krona plugin, this setup is probably unnecessary.

### Creating the database

In [7]:
import os
import subprocess

if not os.path.exists("/opt/conda"):
    os.system("ln -s /root/miniconda3 /opt/conda")

# Always remove /opt/conda/opt/krona/taxonomy and recreate it as a symbolic link
print("Removing and linking taxonomy directory...")
os.system("rm -rf /opt/conda/opt/krona/taxonomy")
os.system("mkdir -p /notebooks/taxonomy")
os.system("ln -s /notebooks/taxonomy /opt/conda/opt/krona/taxonomy")

# Check the contents of /notebooks/taxonomy
taxonomy_dir = "/notebooks/taxonomy"

# Check whether the directory is empty
if not os.listdir(taxonomy_dir):
    # Directory is empty, so run the setup
    print("No taxonomy data found. Running ktUpdateTaxonomy.sh to update taxonomy...")
    subprocess.run(["ktUpdateTaxonomy.sh"], check=True)
    print("Krona taxonomy update completed.")
else:
    # Taxonomy data is already present
    print("Taxonomy data already exists. Skipping ktUpdateTaxonomy.sh.")

Removing and linking taxonomy directory...
Taxonomy data already exists. Skipping ktUpdateTaxonomy.sh.
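
The linking step above can also be written without shelling out. The following is a minimal sketch, assuming a POSIX filesystem; `link_taxonomy` is a hypothetical helper, and the paths in the demonstration are throwaway stand-ins for `/opt/conda/opt/krona/taxonomy` and `/notebooks/taxonomy`. Unlike the `rm -rf` in the cell above, this variant only replaces an existing symlink and refuses to delete a real directory.

```python
import tempfile
from pathlib import Path


def link_taxonomy(link_path: Path, target_dir: Path) -> bool:
    """Point link_path at target_dir; return True if target_dir still needs data."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # Replace only a stale symlink; refuse to delete a real directory.
    if link_path.is_symlink():
        link_path.unlink()
    elif link_path.exists():
        raise FileExistsError(f"{link_path} is a real path; move it aside first")
    link_path.parent.mkdir(parents=True, exist_ok=True)
    link_path.symlink_to(target_dir, target_is_directory=True)
    # An empty target means the taxonomy download has not run yet.
    return not any(target_dir.iterdir())


# Demonstration in a temporary directory (hypothetical paths).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    needs_update = link_taxonomy(root / "krona" / "taxonomy", root / "notebooks" / "taxonomy")
    print("needs update:", needs_update)
```

When the function returns `True`, the notebook would go on to call `ktUpdateTaxonomy.sh` as in the cell above.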


## Qiime2

### Importing FASTQ files

In [1]:
%%bash

qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path data/manifest.csv \
    --output-path data/paired-end-demux.qza \
    --input-format PairedEndFastqManifestPhred33



Imported data/manifest.csv as PairedEndFastqManifestPhred33 to data/paired-end-demux.qza
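
The import above reads `data/manifest.csv`, which this notebook does not show. As a rough sketch, a CSV manifest for `PairedEndFastqManifestPhred33` has the header `sample-id,absolute-filepath,direction` and one `forward` and one `reverse` row per sample. The helper names and the `_R1.fastq.gz` / `_R2.fastq.gz` file-naming convention below are assumptions for illustration, not something taken from this dataset.

```python
import csv
from pathlib import Path


def manifest_rows(fastq_dir, fwd_suffix="_R1.fastq.gz", rev_suffix="_R2.fastq.gz"):
    """Yield (sample-id, absolute-filepath, direction) rows for paired files."""
    fastq_dir = Path(fastq_dir).resolve()
    for fwd in sorted(fastq_dir.glob(f"*{fwd_suffix}")):
        # Sample ID is the file name with the (assumed) forward suffix removed.
        sample_id = fwd.name[: -len(fwd_suffix)]
        rev = fastq_dir / f"{sample_id}{rev_suffix}"
        if not rev.exists():
            raise FileNotFoundError(f"missing reverse reads for {sample_id}")
        yield sample_id, str(fwd), "forward"
        yield sample_id, str(rev), "reverse"


def write_manifest(fastq_dir, out):
    """Write the manifest header and rows to an open text file object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    writer.writerows(manifest_rows(fastq_dir))
```

A call such as `write_manifest("data/fastq", open("data/manifest.csv", "w", newline=""))` would produce the file, where `data/fastq` is a hypothetical directory holding the reads.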


In [None]:
%%bash

qiime demux summarize \
    --i-data data/paired-end-demux.qza \
    --o-visualization data/paired-end-demux.qzv

### Denoising with DADA2

In [4]:
%%bash

qiime dada2 denoise-paired \
    --i-demultiplexed-seqs data/paired-end-demux.qza \
    --p-trunc-len-f 0 \
    --p-trunc-len-r 0 \
    --p-trim-left-f 0 \
    --p-trim-left-r 0 \
    --p-trunc-q 2 \
    --p-n-reads-learn 1000000 \
    --p-max-ee-f 2.0 \
    --p-max-ee-r 2.0 \
    --p-n-threads 10 \
    --o-table data/table.qza \
    --o-denoising-stats data/stats.qza \
    --o-representative-sequences data/rep-seqs.qza \
    --verbose


R version 4.3.3 (2024-02-29) 


Loading required package: Rcpp


DADA2: 1.30.0 / Rcpp: 1.0.13 / RcppParallel: 5.1.9 
2) Filtering ..........
3) Learning Error Rates
143137500 total bases in 954250 reads from 10 samples will be used for learning the error rates.
143137500 total bases in 954250 reads from 10 samples will be used for learning the error rates.
3) Denoise samples ..........
..........
5) Remove chimeras (method = consensus)
6) Report read numbers through the pipeline
7) Write output




Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/tmpm8pg8wbt/forward --input_directory_reverse /tmp/tmpm8pg8wbt/reverse --output_path /tmp/tmpm8pg8wbt/output.tsv.biom --output_track /tmp/tmpm8pg8wbt/track.tsv --filtered_directory /tmp/tmpm8pg8wbt/filt_f --filtered_directory_reverse /tmp/tmpm8pg8wbt/filt_r --truncation_length 150 --truncation_length_reverse 150 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 2 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 10 --learn_min_reads 1000000

Saved FeatureTable[Frequency] to: data/table.qza
Saved FeatureData[Sequence] to: data/rep-seqs.qza
Saved SampleData[DADA2
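
To make the `--p-max-ee-f 2.0` / `--p-max-ee-r 2.0` settings more concrete: DADA2 defines a read's expected errors as EE = sum(10^(-Q/10)) over its quality scores and discards reads whose EE exceeds the threshold. The sketch below illustrates only that formula for Phred33-encoded quality strings; the function names are hypothetical, and the real filter runs inside DADA2 after truncation.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for an ASCII-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_max_ee(quality: str, max_ee: float = 2.0) -> bool:
    """Keep a read only if its expected errors do not exceed max_ee."""
    return expected_errors(quality) <= max_ee


# 'I' encodes Q40 in Phred33 (error probability 1e-4 per base),
# '+' encodes Q10 (error probability 0.1 per base).
high_quality = "I" * 150
low_quality = "+" * 30
print(passes_max_ee(high_quality), passes_max_ee(low_quality))
```

A 150-base read of uniform Q40 accumulates about 0.015 expected errors and passes, while 30 bases of Q10 already accumulate about 3 and would be dropped at a threshold of 2.0.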

### Taxonomy analysis

#### Downloading the SILVA database


In [2]:
"""
Download: [Silva 138 99% OTUs full-length sequences](https://data.qiime2.org/classifiers/sklearn-1.4.2/silva/silva-138-99-nb-classifier.qza)
UUID: 70b4b5f4-8fce-40bd-b508-afacbc12a5ed
SHA256: c08a1aa4d56b449b511f7215543a43249ae9c54b57491428a7e5548a62613616
Sklearn Version: 1.4.2
Date Trained: 2024-05-30
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Silva
"""


In [3]:
import hashlib
import requests

# URL and expected SHA256 checksum
url = "https://data.qiime2.org/classifiers/sklearn-1.4.2/silva/silva-138-99-nb-classifier.qza"
expected_sha256 = "c08a1aa4d56b449b511f7215543a43249ae9c54b57491428a7e5548a62613616"
file_path = "data/silva-138-99-nb-classifier.qza"

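# Download the file (streamed so the whole artifact is not held in memory)
print("Downloading the file...")
with requests.get(url, stream=True, timeout=60) as response:
    response.raise_for_status()  # fail early on HTTP errors instead of saving an error page
    with open(file_path, "wb") as file:
        for chunk in response.iter_content(chunk_size=1 << 20):
            file.write(chunk)
print("Download completed.")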

# Verify the SHA256 checksum
print("Verifying the SHA256 checksum...")
sha256_hash = hashlib.sha256()
with open(file_path, "rb") as file:
    for byte_block in iter(lambda: file.read(4096), b""):
        sha256_hash.update(byte_block)
calculated_sha256 = sha256_hash.hexdigest()

if calculated_sha256 == expected_sha256:
    print("SHA256 checksum verification successful.")
else:
    print("SHA256 checksum verification failed.")
    print(f"Expected: {expected_sha256}")
    print(f"Calculated: {calculated_sha256}")

Downloading the file...
Download completed.
Verifying the SHA256 checksum...
SHA256 checksum verification successful.


In [8]:
%%bash

qiime feature-classifier classify-sklearn \
    --i-classifier data/silva-138-99-nb-classifier.qza \
    --i-reads data/rep-seqs.qza \
    --o-classification data/taxonomy.qza

Saved FeatureData[Taxonomy] to: data/taxonomy.qza


In [15]:
%%bash

# [WIP] V3-V4 region (note: the 515-806 classifier used below targets the V4 region only)

qiime feature-classifier classify-sklearn \
    --i-classifier data/silva-138-99-515-806-nb-classifier.qza \
    --i-reads data/rep-seqs.qza \
    --o-classification data/taxonomy-v3-v4.qza

Plugin error from feature-classifier:

  The scikit-learn version (0.24.1) used to generate this artifact does not match the current version of scikit-learn installed (1.4.2). Please retrain your classifier for your current deployment to prevent data-corruption errors.

Debug info has been saved to /tmp/qiime2-q2cli-err-9wsicwg6.log


CalledProcessError: Command 'b'\n# V3-V4 region\n\nqiime feature-classifier classify-sklearn \\\n    --i-classifier data/silva-138-99-515-806-nb-classifier.qza \\\n    --i-reads data/rep-seqs.qza \\\n    --o-classification data/taxonomy-v3-v4.qza\n'' returned non-zero exit status 1.
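
The error above comes from a mismatch between the scikit-learn version stored with the classifier and the installed one. A pre-flight check along the following lines could catch this before running `classify-sklearn`. This is a sketch under assumptions: it relies on the classifier archive being a zip file that contains `<uuid>/data/sklearn_version.json` with a `sklearn-version` key, which is how q2-feature-classifier appears to store it but is not confirmed by this notebook. Both function names are hypothetical.

```python
import json
import zipfile


def classifier_sklearn_version(qza_path):
    """Return the scikit-learn version recorded inside a classifier .qza, or None."""
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            # Assumed layout: <uuid>/data/sklearn_version.json
            if name.endswith("/data/sklearn_version.json"):
                with archive.open(name) as fh:
                    return json.load(fh).get("sklearn-version")
    return None


def is_compatible(qza_path, installed_version):
    """True only if a recorded version was found and equals the installed one."""
    recorded = classifier_sklearn_version(qza_path)
    return recorded is not None and recorded == installed_version
```

In the notebook this might be called as `is_compatible("data/silva-138-99-515-806-nb-classifier.qza", sklearn.__version__)` after `import sklearn`. A `False` result means the classifier needs to be retrained, or replaced with one built for the installed version.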

### Krona Plugin

In [13]:
!pip install git+https://github.com/kaanb93/q2-krona.git

Collecting git+https://github.com/kaanb93/q2-krona.git
  Cloning https://github.com/kaanb93/q2-krona.git to /tmp/pip-req-build-pdunra_7
  Running command git clone --filter=blob:none --quiet https://github.com/kaanb93/q2-krona.git /tmp/pip-req-build-pdunra_7
  Resolved https://github.com/kaanb93/q2-krona.git to commit d794d4bafca56737732bb065589a8c1ab76eb0bd
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: q2-krona
  Building wheel for q2-krona (setup.py) ... done
  Created wheel for q2-krona: filename=q2_krona-1.0.2-py3-none-any.whl size=4626 sha256=53ce1084438be8015012952722c0b59411d50ba545434da28a923e7c3199d823
  Stored in directory: /tmp/pip-ephem-wheel-cache-r7gcfbym/wheels/c0/3b/82/30540f77515eb7a920375db686f03ea973ece9e5eaf21ff2d3
Successfully built q2-krona
Installing collected packages: q2-krona
Successfully installed q2-krona-1.0.2

In [14]:
%%bash

qiime krona collapse-and-plot \
    --i-table data/table.qza \
    --i-taxonomy data/taxonomy.qza \
    --o-krona-plot data/krona-plot.qzv

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.


Saved Visualization to: data/krona-plot.qzv
