In [None]:
#set up environment
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization

# create directories for the notebook. DO NOT change
data_dir = 'data/06_Kraken_results'
# IPython expands $data_dir inside the ! commands below, so no shell assignment is needed

!mkdir -p data
!mkdir -p $data_dir

# fetches useful files for the current notebook. All files will be saved in $data_dir
!wget 'https://polybox.ethz.ch/index.php/s/BDifSidsoRrAaqX/download' -O data/Download.zip
!unzip -o data/Download.zip -d data
!rm data/Download.zip

# Taxonomy: Kraken2 based classification
For classification we used the Kraken2 tool from MOSHPIT on the combined dereplicated MAGs, with the pluspfp16 Kraken2 database. If you are not familiar with Kraken, "pluspfp" contains the reference sequences of the standard database (archaea, bacteria, viral, plasmid, human, UniVec_Core) and, in addition, protozoa, fungi and plants. We chose it in order to classify the dereplicated MAGs extracted in step 03 across all 3 domains of life. The size-capped version (16 GB) was used because of limits on computational resources. [^1]

### Step 1: Downloading the database

The following script was originally meant to download the database and directly classify only the bacterial dereplicated MAGs. It has not been modified, since an alternative was used for the actual classification (see the next markdown cell) and the database used is the same.

[^1]: Wood et al., Genome Biology (2019) 20:257, https://doi.org/10.1186/s13059-019-1891-0


In [None]:
!head -n 35 $data_dir/06_Taxonomy_classification.slurms.sh

### Step 2: Classification
Premise: a bug in how MOSHPIT handles CPUs for this step prevented us from using a .slurms.sh script. A pipeline of commands was run on an interactive node instead.

As previously mentioned, the input is the combination of the 3 sets of dereplicated MAGs obtained in Notebook 03. This command produces 2 files: the reports and the hits.
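
To make the report file more concrete, here is a minimal Python sketch of how one line of a Kraken2 report can be parsed. It assumes the standard six-column, tab-separated report layout (percentage, clade count, direct count, rank code, NCBI taxid, indented name) and that the report has already been exported from the `.qza` artifact to plain text. The `ReportRow` and `parse_report_line` names and the example lines are made up for illustration and are not part of the project scripts.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float     # percentage of fragments in the clade rooted at this taxon
    clade_count: int   # fragments assigned to this taxon or its descendants
    direct_count: int  # fragments assigned directly to this taxon
    rank_code: str     # e.g. U, R, D, P, C, O, F, G, S
    taxid: int         # NCBI taxonomy ID
    name: str          # scientific name with the indentation removed
    depth: int         # nesting depth, two leading spaces per level


def parse_report_line(line: str) -> ReportRow:
    """Parse one line of a standard six-column Kraken2 report (assumed layout)."""
    pct, clade, direct, rank, taxid, name = line.rstrip("\n").split("\t")
    stripped = name.lstrip(" ")
    depth = (len(name) - len(stripped)) // 2
    return ReportRow(float(pct), int(clade), int(direct), rank, int(taxid),
                     stripped, depth)


# Toy lines, invented for illustration (not results from this project)
example = [
    " 10.00\t10\t10\tU\t0\tunclassified",
    " 90.00\t90\t0\tR\t1\troot",
    " 85.00\t85\t5\tD\t2\t    Bacteria",
]
rows = [parse_report_line(line) for line in example]
for row in rows:
    print(row.rank_code, row.depth, row.name, row.clade_count)
```

The indentation depth is what encodes the tree structure in the report, so it is kept as its own field rather than discarded with the whitespace.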

In [None]:
!cat $data_dir/06_Interactive_script.sh

### Step 3: Annotation
Annotation is necessary to combine the reports and the hits into a feature table for the taxonomy.
The end result is a visualization of the taxonomic classification table for all dereplicated MAGs.
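
For later analysis it can help to split each taxonomy string into its ranks. The sketch below assumes the taxonomy column holds semicolon-separated lineage strings with rank prefixes such as `d__` or `g__`. The exact prefixes in the real table should be checked first. The `split_lineage` helper and the example lineage are hypothetical and only illustrate the idea.

```python
# Assumed mapping from rank prefix to rank name; adjust to the real table
RANK_NAMES = {
    "d": "domain", "k": "kingdom", "p": "phylum", "c": "class",
    "o": "order", "f": "family", "g": "genus", "s": "species",
}


def split_lineage(taxon: str) -> dict:
    """Split a 'd__X;p__Y;...' lineage string into a {rank: name} dict."""
    ranks = {}
    for part in taxon.split(";"):
        prefix, sep, name = part.strip().partition("__")
        if not sep or not name:
            continue  # skip empty levels such as 's__' or unprefixed text
        ranks[RANK_NAMES.get(prefix, prefix)] = name
    return ranks


# Made-up lineage for illustration
lineage = ("d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;"
           "f__Lactobacillaceae;g__Lactobacillus")
print(split_lineage(lineage))
```

Levels that are missing or empty are simply left out of the result, so MAGs classified only to a higher rank yield shorter dictionaries.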

In [None]:
#annotate hits to tree
! qiime annotate kraken2-to-mag-features \
    --i-reports $data_dir/Classify/kraken_reports_mags_derep_50.qza \
    --i-outputs $data_dir/Classify/kraken_hits_derep_50.qza \
    --o-taxonomy $data_dir/mags-taxonomy.qza

In [None]:
#create taxonomy visualization
! qiime metadata tabulate \
    --m-input-file $data_dir/mags-taxonomy.qza \
    --o-visualization $data_dir/taxonomy.qzv

In [None]:
Visualization.load(f"{data_dir}/taxonomy.qzv")

### Step 4: Downstream analysis
This step is still a work in progress. The result we would like to achieve is the metacoder tree graph (seen in the Week 11 lecture) to compare different countries and food types with each other.
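
The export below writes a `metadata.tsv` into `taxonomy_out`. As a Python-side sanity check before moving to R, the sketch here reads such a file with the standard library. It assumes the file follows the QIIME 2 metadata layout, where a `#q2:types` directive line may follow the header. The `read_q2_metadata` helper and the demo rows are invented for illustration.

```python
import csv
import io


def read_q2_metadata(handle):
    """Return (header, rows) from a QIIME 2 style metadata TSV (assumed layout)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = []
    for fields in reader:
        if not fields or fields[0].startswith("#q2:"):
            continue  # skip blank lines and the column-type directive
        rows.append(dict(zip(header, fields)))
    return header, rows


# Made-up content standing in for taxonomy_out/metadata.tsv
demo = io.StringIO(
    "id\tTaxon\n"
    "#q2:types\tcategorical\n"
    "mag-001\td__Bacteria;p__Bacillota\n"
    "mag-002\td__Eukaryota;k__Fungi\n"
)
header, rows = read_q2_metadata(demo)
print(header, len(rows))
```

With the real export, the same function could be called on `open(f"{data_dir}/taxonomy_out/metadata.tsv", newline="")` instead of the in-memory demo.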

In [None]:
!qiime tools export \
    --input-path $data_dir/taxonomy.qzv \
    --output-path $data_dir/taxonomy_out

In [None]:
#setup R environment for metacoder plot
!pip install rpy2
%load_ext rpy2.ipython

In [None]:
%%R
#install.packages("metacoder")

In [None]:
%%R
# install the metacoder library from CRAN (the %%R magic must be the first line of the cell)
options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("metacoder")


In [None]:
%%R
# read the table exported to data/06_Kraken_results/taxonomy_out above
tax <- read.table("data/06_Kraken_results/taxonomy_out/metadata.tsv",
                  header = TRUE,
                  sep = "\t",
                  stringsAsFactors = FALSE)
