8f. Walkthrough SILVA 16S database with kraken2

(This walkthrough is a work in progress and is not yet fully tested.) SILVA input data is supported from version 0.4.1 onwards.

In this walkthrough we will show how to create a Kraken2 16S database with FlexTaxD using SILVA taxonomy and sequences.

1 - Setup

1.1 - Environments

Install mamba in your base conda environment.

conda install mamba -n base -c conda-forge

We will then create a self-contained conda environment with flextaxd and kraken2:

mamba create -n flextaxd flextaxd kraken2

Note that you can also use, for example, KrakenUniq or Ganon instead of Kraken2.

(For users new to conda, please see the conda guide.)

# Remember to activate the flextaxd environment
conda activate flextaxd
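A quick way to confirm that the environment is active is to check that the programs used later in this walkthrough are on the PATH. The sketch below is only an illustration: `require_tools` is a hypothetical helper name, not part of flextaxd or kraken2.

```shell
# Hypothetical helper: report whether each named program is on PATH
# and return non-zero if any of them is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the programs called in this walkthrough.
require_tools flextaxd flextaxd-create kraken2 \
    || echo "Some tools are missing; is the flextaxd environment activated?"
```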

1.2 - Fetch source files from the SILVA archive

Taxonomy | Fasta sequences

#SILVA 16S taxonomy file

#SILVA 16S sequence file

1.3 - Prepare genomes

Prepare the genome ID to taxid file (this prints the genus level, i.e. the second-to-last lineage field, NF-1)

grep ">" silva_genomes.fasta | awk 'BEGIN {FS = ";" } {print $1,"\t",$(NF-1)}' | sed -r 's/\S+//2' | sed -r 's/>//g' > silva_genomeid_to_taxid.txt
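To see what this step extracts, here is a minimal sketch on a made-up header in the SILVA style (accession, a space, then a semicolon-separated lineage). The file names `example_headers.fasta` and `example_genomeid_to_taxid.txt` and the record itself are hypothetical, and the awk program is an alternative formulation that writes one clean tab-separated line per header, not the command used above.

```shell
# Hypothetical SILVA-style record: accession, space, semicolon-separated lineage.
printf '%s\n' \
  '>AB000001.1.1500 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia-Shigella;Escherichia coli' \
  'ACGU' \
  > example_headers.fasta

# For each header line: take the accession (text before the first space,
# without the leading ">") and the second-to-last lineage field (genus level),
# and print them separated by a single tab.
awk -F';' '/^>/ {
    split($1, head, " ")              # head[1] is ">accession"
    print substr(head[1], 2) "\t" $(NF-1)
}' example_headers.fasta > example_genomeid_to_taxid.txt

cat example_genomeid_to_taxid.txt
```

Note that headers whose lineage stops above genus level will yield a higher rank in the second column, so it may be worth inspecting the output before passing it on.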

Translate from RNA to DNA (only on sequence lines, so that a "U" in a header is left untouched)

sed -i '/^>/!s/U/T/g' silva_genomes.fasta
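The effect of a U-to-T substitution can be checked on a toy file before touching the real data. The sketch below restricts the substitution to sequence lines so that header text is left alone; the file names `toy_rna.fasta` and `toy_dna.fasta` and both records are made up for illustration.

```shell
# Hypothetical two-record RNA FASTA; the headers deliberately contain the letter U.
printf '%s\n' '>AU000001.1.10 Bacteria;Unclassified' 'ACGUUGCA' \
              '>AB000002.1.8 Bacteria;Firmicutes' 'GGUUAACC' > toy_rna.fasta

# Replace U with T only on lines that do not start with ">".
sed '/^>/!s/U/T/g' toy_rna.fasta > toy_dna.fasta

cat toy_dna.fasta
```

Writing to a second file instead of using `-i` keeps the original available for comparison.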

Split up the sequences into separate files, one per record (at some point flextaxd will be able to handle these files directly)

awk '/^>/ {close(F) ; F = substr($1,2,length($1)-1)".fasta"} {print >> F}' silva_genomes.fasta
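Since the split produces one file per sequence, it can be convenient to write them into a dedicated directory and to check that the number of files matches the number of headers. The following is a sketch under assumed names: `toy_genomes.fasta` and `split_genomes` are hypothetical, and the toy records are made up.

```shell
# Hypothetical toy input with two records (the first spans two sequence lines).
printf '%s\n' '>AB000001.1.10 Bacteria;A' 'ACGT' 'TTGA' \
              '>AB000002.1.8 Bacteria;B' 'GGCC' > toy_genomes.fasta

# Write each record to split_genomes/<accession>.fasta.
# Using ">" means a rerun overwrites the files instead of appending to them.
mkdir -p split_genomes
awk -v outdir="split_genomes" '
    /^>/ { if (out) close(out); out = outdir "/" substr($1, 2) ".fasta" }
    { print > out }
' toy_genomes.fasta

# The number of files should equal the number of headers.
grep -c "^>" toy_genomes.fasta
ls split_genomes | wc -l
```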

1.4 - Create the flextaxd taxonomy database and write out names.dmp and nodes.dmp

flextaxd -db 16S_database.db -tf tax_slv_ssu_138.1.txt.gz -tt SILVA --genomeid2taxid silva_genomeid_to_taxid.txt --verbose --dbprogram kraken2 --dump

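Create the Kraken2 database (set --db_name to your own database name and --genomes_path to the directory containing the per-sequence FASTA files from step 1.3; the values below are examples)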

flextaxd-create -db 16S_database.db --create_db --db_name GTDB_arc_bact_taxo_krakendb --genomes_path GBRS_genomes --dbprogram kraken2 -o taxonomy --verbose --processes 40 --dump