- python>=3.8
- R 4.2
VirID requires third-party packages from the conda-forge and bioconda channels:
conda install -c conda-forge -c bioconda blast bbmap seqkit mafft megahit trimal pplacer taxonkit bowtie2 cd-hit
conda install -c conda-forge -c bioconda diamond==2.0.15 samtools==1.16.1
pip install Bio biopython DendroPy matplotlib numpy pandas regex seaborn tqdm
Notes:
- Tool versions, for reference:
  - bbduk.sh (bbmap) v39.01; seqkit v2.4.0; bowtie2 v2.5.1; megahit v1.2.9
  - mafft v7.520; trimal v1.4.1; makeblastdb, blastn, blastp v2.13.0+
  - samtools v1.16.1; diamond v2.0.15
- The taxonkit dataset must also be downloaded!
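As a quick sanity check after the conda step, the sketch below reports which of the tools named in the version list are on `PATH`. The function name `check_tools` is hypothetical and not part of VirID; it only tests that each command can be found and does not compare version numbers.

```shell
# Report which command-line tools are available on PATH.
# Prints one line per tool; the return code is the number of missing tools.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found    %s -> %s\n' "$tool" "$(command -v "$tool")"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Tool names taken from the version list above.
check_tools bbduk.sh seqkit bowtie2 megahit mafft trimal \
    makeblastdb blastn blastp samtools diamond taxonkit \
    || echo "some tools are missing; re-run the conda install commands"
```

If anything is reported as missing, re-running the corresponding `conda install` line in the active environment is usually the first thing to try.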
All Python packages will be installed automatically!
pip install VirID
# Install R packages
R
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ggtree")
packages <- c("tidyverse", "ggplot2", "RColorBrewer", "phangorn", "networkD3", "jsonlite", "dplyr")
# Install any packages that are not yet present, then load them all
ipak <- function(pkg) {
    new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
    if (length(new.pkg))
        install.packages(new.pkg)
    sapply(pkg, require, character.only = TRUE)
}
ipak(packages)
Note: tidyverse depends on systemfonts. If the tidyverse installation fails, you may need to install systemfonts first:
conda install r-systemfonts
VirID requires an environment variable named VirID_DB_PATH, which points to the parent directory of the following databases.
See below for specific database configurations.
# Set the VirID_DB_PATH environment variable
export VirID_DB_PATH=/path/to/the/database/
Notes:
- The databases take up a lot of space, so make sure you have enough disk space. If you already have these databases, you can skip the download step and just configure them.
- The download speed of the databases depends on your network connection. You can also use other download methods, such as ascp.
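Before downloading anything, it can help to confirm that `VirID_DB_PATH` is set, create the sub-directories written to in the steps that follow, and look at the free space on that filesystem. This is a minimal sketch: `prepare_db_dir` is a hypothetical helper, and it only creates the `rRNA` and `accession2taxid` directories named in this section.

```shell
# Confirm VirID_DB_PATH is set, create the sub-directories used by the
# database steps in this section, and show free space on that filesystem.
prepare_db_dir() {
    if [ -z "${VirID_DB_PATH:-}" ]; then
        echo "VirID_DB_PATH is not set" >&2
        return 1
    fi
    # rRNA/ and accession2taxid/ are the directories the later commands write to.
    mkdir -p "$VirID_DB_PATH/rRNA" "$VirID_DB_PATH/accession2taxid" || return 1
    # -h prints sizes in human-readable units.
    df -h "$VirID_DB_PATH"
}

# Usage (after `export VirID_DB_PATH=/path/to/the/database/`):
#   prepare_db_dir
```

The `df` output shows the available space on the filesystem that holds the database directory, which is the number to compare against the size of the downloads.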
1.2 Unzip the file and use bowtie2 to build the index.
bunzip2 -cv VirID_rRNA_db.fasta.bz2 > $VirID_DB_PATH/rRNA/VirID_rRNA_db.fasta
bowtie2-build $VirID_DB_PATH/rRNA/VirID_rRNA_db.fasta $VirID_DB_PATH/rRNA/rRNA_cutout_ref
#Download the `PROT_ACC2TAXID` file
wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz.md5
#Check the file integrity
md5sum -c prot.accession2taxid.gz.md5
#Unzip the file and configure it
gunzip -c prot.accession2taxid.gz > $VirID_DB_PATH/accession2taxid/prot.accession2taxid
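Once the files are unpacked, a short check like the one below can confirm that they ended up where the commands above put them. This is a sketch only: `check_db_files` is a hypothetical helper, and it covers just the rRNA FASTA, its bowtie2 index, and `prot.accession2taxid`, not any other database VirID may need.

```shell
# Report whether the files produced by the steps above are in place.
# Takes the database directory as $1, falling back to VirID_DB_PATH.
check_db_files() {
    db=${1:-${VirID_DB_PATH:-}}
    status=0
    for f in rRNA/VirID_rRNA_db.fasta accession2taxid/prot.accession2taxid; do
        if [ -s "$db/$f" ]; then      # -s: file exists and is not empty
            echo "ok       $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    # bowtie2-build writes several index files that share the given prefix
    # (*.bt2, or *.bt2l for large indexes).
    if ls "$db"/rRNA/rRNA_cutout_ref*.bt2* >/dev/null 2>&1; then
        echo "ok       rRNA/rRNA_cutout_ref (bowtie2 index)"
    else
        echo "missing  rRNA/rRNA_cutout_ref (bowtie2 index)"
        status=1
    fi
    return "$status"
}

# Usage:
#   check_db_files "$VirID_DB_PATH"
```

A non-zero return code means at least one file is absent or empty, so the corresponding download or build step should be repeated.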
VirID method [options]
- method:
- end_to_end
- assembly_and_basic_annotation
- phylogenetic_analysis
VirID end_to_end -i 1.fastq -i2 2.fastq \
-out_dir out_path --threads 60 --keep_dup
VirID assembly_and_basic_annotation -i 1.fastq -i2 2.fastq \
-out_dir out_path --threads 60
VirID phylogenetic_analysis -classify_i test/test_contig.fasta \
-out_dir out_path --threads 90 --keep_dup
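For several paired-end samples, the `end_to_end` call above can be wrapped in a loop. The sketch below assumes a file naming scheme of `<sample>_1.fastq` and `<sample>_2.fastq` and one output directory per sample; both are illustrative assumptions, not VirID conventions, and `run_samples` is a hypothetical helper. By default it only prints the commands so they can be reviewed first.

```shell
# Build one `VirID end_to_end` command per paired-end sample.
# Assumes (hypothetically) files named <sample>_1.fastq / <sample>_2.fastq.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run_samples() {
    in_dir=$1
    out_root=$2
    threads=${3:-8}
    for r1 in "$in_dir"/*_1.fastq; do
        [ -e "$r1" ] || continue              # no matches: glob stays literal
        sample=$(basename "$r1" _1.fastq)
        r2="$in_dir/${sample}_2.fastq"
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: no mate file" >&2
            continue
        fi
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "VirID end_to_end -i $r1 -i2 $r2 -out_dir $out_root/$sample --threads $threads"
        else
            VirID end_to_end -i "$r1" -i2 "$r2" \
                -out_dir "$out_root/$sample" --threads "$threads"
        fi
    done
}

# Usage:
#   run_samples reads/ results/ 60             # print the commands
#   DRY_RUN=0 run_samples reads/ results/ 60   # actually run VirID
```

Printing first keeps a typo in a path from launching a long assembly job; set `DRY_RUN=0` once the printed commands look right.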