Custom databases

KijinKim edited this page Feb 22, 2023 · 13 revisions

Kraken2 Database

Centrifuge Database

BLAST Database

Taxonomy Database

Kraken2 Database

VirPipe employs Kraken2 for taxonomic classification of Illumina reads.

A Kraken2 database can be downloaded from the Kraken2 RefSeq index page or built by following the instructions in the official Kraken2 GitHub repository.
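If you choose to build the database yourself, the process could look roughly like the sketch below. It wraps the standard `kraken2-build` steps (taxonomy download, library download, build) in a shell function. The function name, the default directory `kraken2_viral`, the thread count, and the choice of the `viral` library are assumptions made for illustration, not VirPipe requirements.

```shell
# Sketch: build a viral-only Kraken2 database.
# Function name and defaults are hypothetical; adjust to your setup.
build_viral_kraken2_db() {
    db_dir="${1:-kraken2_viral}"   # output directory (assumed name)
    threads="${2:-4}"              # number of build threads (assumed default)

    # Download the NCBI taxonomy, then the RefSeq viral library, then build
    kraken2-build --download-taxonomy --db "$db_dir" &&
    kraken2-build --download-library viral --db "$db_dir" &&
    kraken2-build --build --threads "$threads" --db "$db_dir"
}
```

Defining the function does not download anything. Run `build_viral_kraken2_db my_db 8` when Kraken2 is installed and you are ready to start; the downloads need network access and some time.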

Centrifuge Database

VirPipe employs Centrifuge for taxonomic classification of Nanopore reads.

Some prebuilt Centrifuge databases can be downloaded from the program's homepage. However, none of them contains viral sequences only, so such a database has to be built by the user. We have built a database from RefSeq viral sequences only and shared it via Zenodo.
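For readers who prefer to build their own viral index instead of using the shared one, the following is a minimal sketch based on the commands in the Centrifuge manual. It is not necessarily how the shared database was produced. The function name, the index name `viral`, and the thread count are assumptions.

```shell
# Sketch: build a Centrifuge index from RefSeq viral genomes.
# Function name and defaults are hypothetical; adjust to your setup.
build_viral_centrifuge_db() {
    index_name="${1:-viral}"   # basename of the index files (assumed name)
    threads="${2:-4}"          # number of build threads (assumed default)

    # Fetch the NCBI taxonomy and the RefSeq viral genomes,
    # writing the sequence-ID-to-taxid map as a side product
    centrifuge-download -o taxonomy taxonomy &&
    centrifuge-download -o library -m -d "viral" refseq > seqid2taxid.map &&

    # Concatenate all downloaded genomes into one FASTA file
    cat library/*/*.fna > input-sequences.fna &&

    # Build the index
    centrifuge-build -p "$threads" \
        --conversion-table seqid2taxid.map \
        --taxonomy-tree taxonomy/nodes.dmp \
        --name-table taxonomy/names.dmp \
        input-sequences.fna "$index_name"
}
```

Call `build_viral_centrifuge_db viral 8` from an empty working directory, since the function writes `taxonomy/`, `library/`, and the index files into the current directory.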

BLAST Database

VirPipe runs BLAST as a post-assembly analysis.

Download the prebuilt BLAST database

NCBI publishes prebuilt BLAST databases on its FTP server, so you can download whichever database you want to deploy.
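One way to fetch a prebuilt database is the `update_blastdb.pl` script that ships with BLAST+. The sketch below assumes BLAST+ is already installed. The function name, the default database `ref_viruses_rep_genomes`, and the target directory `blastdb` are illustrative choices; `update_blastdb.pl --showall` lists the databases that are actually available.

```shell
# Sketch: download and unpack a prebuilt NCBI BLAST database.
# Function name and defaults are hypothetical; adjust to your setup.
download_prebuilt_blastdb() {
    db_name="${1:-ref_viruses_rep_genomes}"   # assumed database choice
    target_dir="${2:-blastdb}"                # assumed download directory

    mkdir -p "$target_dir"
    # Run in a subshell so the caller's working directory is unchanged
    ( cd "$target_dir" && update_blastdb.pl --decompress "$db_name" )
}
```

For example, `download_prebuilt_blastdb ref_viruses_rep_genomes /data/blastdb` would place the unpacked database files under `/data/blastdb`.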

Build a new BLAST database

Alternatively, you can build a new BLAST database from your own sequences. This requires a local installation of BLAST+. Follow the steps below:

  1. Install BLAST+ (if you have not done so already)
conda create -n blast blast && conda activate blast
  2. Download accession2taxid from NCBI and decompress the file
# Download can take several minutes
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz && gunzip nucl_gb.accession2taxid.gz
  3. Parse accession2taxid to make the taxid map (accession.version and taxid columns)
sed '1d' nucl_gb.accession2taxid | awk '{print $2" "$3}' > taxidmapfile # can take several hours
  4. Collect the sequences to be included in the database into one FASTA file
cat virus_1.fasta virus_2.fasta > viruses.fasta
  5. Set the database name
blast_dbname=viruses
  6. Build the BLAST database
makeblastdb -in viruses.fasta -parse_seqids -taxid_map taxidmapfile -dbtype nucl -out "$blast_dbname"
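To see what the parsing step produces, the sketch below runs the same `sed`/`awk` pipeline on a tiny stand-in file. The accessions and taxids are made-up placeholders, not real records, and all `sample*` file names are hypothetical. The second half shows an optional variation, not part of the original steps: keeping only the map rows whose accession appears in your FASTA file, which may give a much smaller taxid map.

```shell
# Tiny stand-in for nucl_gb.accession2taxid (tab-separated, with a header line).
# All accessions and taxids here are placeholders.
printf 'accession\taccession.version\ttaxid\tgi\n' > sample.accession2taxid
printf 'AB000001\tAB000001.1\t11111\t0\n' >> sample.accession2taxid
printf 'AB000002\tAB000002.2\t22222\t0\n' >> sample.accession2taxid

# Same parsing as in the steps above: drop the header,
# then keep the accession.version and taxid columns
sed '1d' sample.accession2taxid | awk '{print $2" "$3}' > sample_taxidmap

cat sample_taxidmap
# AB000001.1 11111
# AB000002.2 22222

# Optional variation: restrict the map to sequences present in a FASTA file.
# This placeholder FASTA contains only one of the two accessions.
printf '>AB000002.2 hypothetical virus\nACGTACGT\n' > sample.fasta

# Sequence IDs are the header text up to the first space
grep '^>' sample.fasta | sed 's/^>//; s/ .*//' > sample_ids.txt

# First file: remember the IDs. Second file: skip the header,
# print rows whose accession.version is one of those IDs.
awk 'NR==FNR {keep[$1]; next} FNR>1 && ($2 in keep) {print $2" "$3}' \
    sample_ids.txt sample.accession2taxid > sample_taxidmap_filtered

cat sample_taxidmap_filtered
# AB000002.2 22222
```

Each line of the taxid map pairs one sequence ID with one taxid, which is the two-column format that `makeblastdb -taxid_map` reads. The filtered variant only works if the FASTA headers start with the same accession.version identifiers used in the accession2taxid file.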

Taxonomy Database

VirPipe annotates BLAST results with taxonomy information using the taxonomizr package in R. This step requires a database that maps accessions to taxonomies, which you can download from the Zenodo repository.