
8f. Walkthrough SILVA 16S database with kraken2

Andreas Sjödin edited this page Nov 8, 2023 · 1 revision

(This walkthrough is a work in progress and has not yet been fully tested.) SILVA input data is supported from FlexTaxD version 0.4.1 onwards.

In this walkthrough we will show how to create a Kraken2 16S database with FlexTaxD using SILVA taxonomy and sequences.

1 - Setup

1.1 - Environments

Install mamba in your base conda environment.

conda install mamba -n base -c conda-forge

We will then create a self-contained conda environment with flextaxd and kraken2:

mamba create -n flextaxd -c conda-forge -c bioconda flextaxd kraken2

Note that you can also use other classifiers instead of Kraken2, for example KrakenUniq or Ganon.

(If you are new to conda, please see the conda guide.)

# Remember to activate the flextaxd environment
conda activate flextaxd
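Before going further, it can help to confirm that the activated environment actually provides the commands used in this walkthrough. A minimal sketch; check_tools is a hypothetical helper, not part of FlexTaxD or conda:

```shell
# Hypothetical helper (not part of FlexTaxD or conda): print each command
# that is not found on PATH and return non-zero if any is missing.
check_tools() {
  local missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_tools flextaxd flextaxd-create kraken2 kraken2-build
```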

1.2 - Fetch source files from the SILVA archive


#SILVA 16S taxonomy file
wget https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/taxonomy/tax_slv_ssu_138.1.txt.gz

#SILVA 16S sequence file
wget https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/SILVA_138.1_SSURef_tax_silva.fasta.gz
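The later steps read an uncompressed file named silva_genomes.fasta, which none of the download commands produce. A minimal sketch, assuming that file is meant to be the decompressed SILVA sequence download; silva_decompress is a hypothetical helper that also prints the number of sequences as a quick sanity check:

```shell
# Hypothetical helper (not part of FlexTaxD): decompress a gzipped FASTA
# file and print how many sequences (header lines) it contains.
# Assumption: silva_genomes.fasta is the file name the later steps expect.
silva_decompress() {
  local gz="$1" out="$2"
  gunzip -c "$gz" > "$out" || return 1
  grep -c '^>' "$out"    # each FASTA record starts with a '>' header line
}

# Example:
# silva_decompress SILVA_138.1_SSURef_tax_silva.fasta.gz silva_genomes.fasta
```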

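1.3 - Prepare genomes

Prepare the genome ID to taxid file from the decompressed SILVA sequence file (silva_genomes.fasta). Each SILVA header holds the sequence ID followed by the ";"-separated taxonomic path, which ends with the species name, so the second-to-last field ($(NF-1)) is the genus. The command writes one line per sequence with the ID and the genus separated by a tab:

grep "^>" silva_genomes.fasta | awk -F';' '{split($1, id, " "); sub(/^>/, "", id[1]); print id[1] "\t" $(NF-1)}' > silva_genomeid_to_taxid.txt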
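To see which fields the command picks, the field selection can be tried on a single header in the SILVA layout (sequence ID, a space, then the ";"-separated path ending with the species). The header below is made up for illustration and is not a real SILVA record:

```shell
# Made-up header in the SILVA layout (illustrative only)
header='>AB000000.1.1500 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus subtilis'

# With ';' as the field separator, $NF is the species and $(NF-1) the genus
species=$(printf '%s\n' "$header" | awk -F';' '{ print $NF }')
genus=$(printf '%s\n' "$header" | awk -F';' '{ print $(NF-1) }')

echo "species: $species"
echo "genus:   $genus"
```

Paths that are shorter than usual or contain placeholder names such as "uncultured" may give a value for $(NF-1) that is not a true genus name, so it may be worth inspecting the mapping file before building the database.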

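Convert the sequences from RNA to DNA by replacing U with T. The /^>/! address skips header lines, so taxon names in the headers are left unchanged:

sed -i '/^>/!s/U/T/g' silva_genomes.fasta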
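The difference between substituting on every line and substituting only on sequence lines can be seen on a tiny made-up record whose taxonomic path contains a capital U. The record is shortened and illustrative only:

```shell
# Made-up, shortened record whose header contains a capital U
tmp=$(mktemp)
printf '>XY000000.1.1400 Bacteria;Ureaplasma;Ureaplasma urealyticum\nAUGCUUAGG\n' > "$tmp"

# Substitute on every line, headers included
all_lines=$(sed 's/U/T/g' "$tmp")
# Substitute only on lines that do not start with '>'
seq_only=$(sed '/^>/!s/U/T/g' "$tmp")

printf '%s\n---\n%s\n' "$all_lines" "$seq_only"
rm -f "$tmp"
```

Both variants turn AUGCUUAGG into ATGCTTAGG, but the whole-file substitution also rewrites the header, so Ureaplasma becomes Treaplasma; the /^>/! form leaves the header exactly as SILVA wrote it.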

Split the sequences into one FASTA file per sequence, written to a separate directory that --genomes_path can point to later (at some point FlexTaxD will be able to handle the combined file directly):

mkdir -p silva_genomes_split
awk '/^>/ {close(F); F = "silva_genomes_split/" substr($1, 2) ".fasta"} {print >> F}' silva_genomes.fasta

1.4 - Create the FlexTaxD taxonomy database and write out names.dmp and nodes.dmp

flextaxd -db 16S_database.db -tf tax_slv_ssu_138.1.txt.gz -tt SILVA --genomeid2taxid silva_genomeid_to_taxid.txt --verbose --dbprogram kraken2 --dump

Create the Kraken2 database from the split sequence files (adjust --processes to the number of CPU cores available):

flextaxd-create -db 16S_database.db --create_db --db_name SILVA_16S_krakendb --genomes_path silva_genomes_split --dbprogram kraken2 -o taxonomy --verbose --processes 40 --dump
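Because the mapping file and the per-sequence files are produced by separate steps, it may be worth confirming before running flextaxd-create that every sequence ID in the mapping file has a matching FASTA file. A minimal sketch, assuming the split files are named <ID>.fasta as in the split step; check_split_files is a hypothetical helper, not a FlexTaxD command:

```shell
# Hypothetical pre-flight check (not part of FlexTaxD).
# $1: genome ID to taxid file (ID in the first whitespace-separated column)
# $2: directory holding the per-sequence files named <ID>.fasta
# Prints every ID without a file and returns non-zero if any are missing.
check_split_files() {
  local mapping="$1" dir="$2" missing=0 id rest
  while read -r id rest; do
    [ -n "$id" ] || continue            # skip empty lines
    if [ ! -f "$dir/$id.fasta" ]; then
      echo "$id"
      missing=1
    fi
  done < "$mapping"
  return "$missing"
}

# Example:
# check_split_files silva_genomeid_to_taxid.txt path/to/split_files
```

Reading only the first whitespace-separated column keeps the check independent of whether the ID and genus are separated by a bare tab or by a tab with surrounding spaces.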