8f. Walkthrough SILVA 16S database with kraken2
(This walkthrough is a work in progress and is not yet fully tested.) SILVA input data is supported from FlexTaxD version 0.4.1.
In this walkthrough we will show how to create a Kraken2 16S database with FlexTaxD using SILVA taxonomy and sequences.
Install mamba in your base conda environment.
conda install mamba -n base -c conda-forge
We will then create a self-contained conda environment with flextaxd and kraken2:
mamba create -n flextaxd flextaxd kraken2
Note that other classifiers, for example KrakenUniq or Ganon, can be used instead of Kraken2.
(Users new to conda should see the conda guide.)
# Remember to activate the flextaxd environment
conda activate flextaxd
#SILVA 16S (SSU) taxonomy file
wget https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/taxonomy/tax_slv_ssu_138.1.txt.gz
#SILVA 16S sequence file
wget https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/SILVA_138.1_SSURef_tax_silva.fasta.gz
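The commands that follow read an uncompressed file called `silva_genomes.fasta`, while the download above is gzip-compressed and carries its SILVA release name. A minimal sketch of bridging the two, assuming `silva_genomes.fasta` is simply the decompressed download (the walkthrough does not state this explicitly); `decompress_fasta` is a hypothetical helper name:

```shell
# Hypothetical helper: decompress a gzipped FASTA ($1) to a chosen
# file name ($2), keeping the original archive in place.
decompress_fasta() {
    gunzip -c "$1" > "$2"
}

# Assumed usage for this walkthrough:
# decompress_fasta SILVA_138.1_SSURef_tax_silva.fasta.gz silva_genomes.fasta
```

Using `gunzip -c` rather than an in-place `gunzip` keeps the archive, so the step can be repeated without downloading again.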
# Build the sequence ID to taxon mapping from the FASTA headers
grep ">" silva_genomes.fasta | awk 'BEGIN {FS = ";" } {print $1,"\t",$(NF-1)}' | sed -r 's/\S+//2' | sed -r 's/>//g' > silva_genomeid_to_taxid.txt
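To see what this pipeline produces, it can help to run it on a couple of made-up headers first. The sketch below uses the SILVA header layout (accession, a space, then the semicolon-separated taxonomy path); the two records and the file name `toy_silva.fasta` are invented for illustration, and GNU sed is assumed for `\S`.

```shell
# Toy input: two invented records in SILVA header style.
workdir=$(mktemp -d)
cat > "$workdir/toy_silva.fasta" <<'EOF'
>AB000001.1.1500 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia-Shigella;Escherichia coli
ACGUACGU
>AB000002.1.1400 Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;Methanobrevibacter smithii
GGCCUUAA
EOF

# Same pipeline as in the walkthrough, applied to the toy file.
mapping=$(grep ">" "$workdir/toy_silva.fasta" \
    | awk 'BEGIN {FS = ";" } {print $1,"\t",$(NF-1)}' \
    | sed -r 's/\S+//2' \
    | sed -r 's/>//g')

# Print the result with runs of spaces and tabs squeezed for readability.
printf '%s\n' "$mapping" | tr -s ' \t' ' '
rm -r "$workdir"
```

Each output line pairs the accession with the second-to-last entry of the taxonomy path, so the first toy record maps `AB000001.1.1500` to `Escherichia-Shigella`. In the unsqueezed output the two columns are separated by a tab with some surrounding spaces.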
# Convert RNA to DNA (U to T) in sequence lines only, leaving header lines untouched
sed -i '/^>/! s/U/T/g' silva_genomes.fasta
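SILVA exports use the RNA alphabet, hence the U to T conversion. A plain `s/U/T/g` also rewrites any `U` that occurs in a header line, which can leave accessions out of step with the mapping file. A sketch of a substitution restricted to sequence lines, shown on an invented record:

```shell
# Invented record whose accession happens to contain a "U".
record=$(printf '>U00001.1.1500 Bacteria;Example\nACGUUGCA\n')

# Substitute only on lines that do not start with ">".
converted=$(printf '%s\n' "$record" | sed '/^>/! s/U/T/g')

# The header keeps its "U"; the sequence line becomes ACGTTGCA.
printf '%s\n' "$converted"
```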
# Split the multi-FASTA into one file per sequence, named <sequence ID>.fasta
awk '/^>/ {close(F) ; F = substr($1,2,length($1)-1)".fasta"} {print >> F}' silva_genomes.fasta
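The command above writes one FASTA file per sequence into the current directory. Since the database build step takes a `--genomes_path`, it may be more convenient to collect those files in a directory of their own. A sketch of that variant; `split_fasta` and the directory name in the usage line are hypothetical, not part of FlexTaxD:

```shell
# Hypothetical helper: split a multi-FASTA ($1) into one file per record,
# named <sequence ID>.fasta, inside a target directory ($2).
split_fasta() {
    mkdir -p "$2"
    awk -v dir="$2" '
        /^>/ { if (F != "") close(F); F = dir "/" substr($1, 2) ".fasta" }
        { print >> F }
    ' "$1"
}

# Assumed usage for this walkthrough:
# split_fasta silva_genomes.fasta silva_genomes
```

Because the files are opened in append mode, as in the original command, rerunning the split should be done against an empty directory to avoid duplicated records.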
# Build the FlexTaxD database from the SILVA taxonomy and the mapping file
flextaxd -db 16S_database.db -tf tax_slv_ssu_138.1.txt.gz -tt SILVA --genomeid2taxid silva_genomeid_to_taxid.txt --verbose --dbprogram kraken2 --dump
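Create the Kraken2 database. Adjust --db_name and --genomes_path to your own setup; --genomes_path must point to the directory containing the per-sequence FASTA files.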
flextaxd-create -db 16S_database.db --create_db --db_name GTDB_arc_bact_taxo_krakendb --genomes_path GBRS_genomes --dbprogram kraken2 -o taxonomy --verbose --processes 40 --dump
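Once the build finishes, a quick sanity check is to confirm that the three files a Kraken2 database consists of (`hash.k2d`, `opts.k2d`, `taxo.k2d`) are present and non-empty. A minimal sketch; `check_kraken2_db` is a hypothetical helper, and the directory to pass is wherever the build wrote the database:

```shell
# Hypothetical helper: verify that a directory ($1) holds the three
# files that make up a built Kraken2 database.
check_kraken2_db() {
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f" >&2
            return 1
        fi
    done
    echo "Kraken2 database files found in $1"
}

# Example (the path is an assumption):
# check_kraken2_db path/to/krakendb
```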