Setting up

Georgios Koutsovoulos edited this page Nov 23, 2023 · 22 revisions

Installation

Clone the GitHub repository

git clone https://github.com/GDKO/AvP.git

Create an environment with conda

conda create --name avp python=3
conda activate avp

# Install Programs
conda install -c bioconda mafft blast=2.9.0 trimal fasttree iqtree

# Install Python libraries
pip install numpy networkx pyyaml ete3 six biopython docopt pybedtools

[!] If you want to use diamond, download the latest version from here. Avoid installing it with conda, which sometimes installs a very old version.

[!] The first time you run the program, it will create a database of nodes and names from the NCBI taxonomy.
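Before the first run, it can help to confirm that the installed programs are actually on the PATH. The following is a minimal sketch, not part of AvP: `missing_tools` is a hypothetical helper, and the executable names in the usage comment (mafft, blastp, trimal, fasttree, iqtree) are assumptions that may differ between package versions.

```shell
# Hypothetical helper (not part of AvP): print each named program
# that cannot be found on the PATH. Prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage (executable names are assumptions; adjust to your installation):
#   missing_tools mafft blastp trimal fasttree iqtree
```

Any name it prints was not found in the active environment, which usually means the avp environment is not activated or the package did not install.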

Databases

We recommend using BLAST with NR and Diamond with other databases.

NR

A copy of NR should be present on the system. See here for how to download NR.

If you want to use diamond with NR, see (#11) (thanks to @bshrestha0).

Other databases

mkdir taxdump
tar xvf taxdump.tar.gz -C taxdump
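The two commands above assume that taxdump.tar.gz, the NCBI taxonomy dump, is already in the working directory. The diamond makedb commands further down read taxdump/nodes.dmp and taxdump/names.dmp, so a quick check after extraction can catch an incomplete download. A minimal sketch, where `check_taxdump` is a hypothetical helper and not part of AvP:

```shell
# Hypothetical helper (not part of AvP): verify that the extracted
# taxonomy dump contains the two non-empty files passed to diamond makedb.
check_taxdump() {
    dir=${1:-taxdump}
    status=0
    for f in nodes.dmp names.dmp; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_taxdump taxdump && echo "taxdump looks complete"
```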

Swissprot

Download SwissProt

# Create taxid file with acc2taxid.py
acc2taxid.py -i uniprot_sprot.fasta.gz -m swissprot > sp.taxids

# Makedb with diamond
diamond makedb --in uniprot_sprot.fasta.gz --taxonmap sp.taxids --db uniprot_sprot.fasta.dmnd --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp

Uniref90

Download Uniref90

# Rename headers with sed
sed 's/>/>Uniref90|/' <(zcat uniref90.fasta.gz) | gzip > uniref90.fasta.fixed.gz

# Create taxid file with acc2taxid.py
acc2taxid.py -i uniref90.fasta.fixed.gz -m uniref > un.taxids

# Makedb with diamond
diamond makedb --in uniref90.fasta.fixed.gz --taxonmap un.taxids --db uniref90.fasta.dmnd --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp
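To see what the header rename does before running it on the full file, the same substitution can be tried on a single made-up record. This is only a sketch: the header below is illustrative, not a real UniRef90 entry, `rename_headers` is a hypothetical wrapper, and the pattern is anchored with `^` here so that only the leading `>` of a header line is touched.

```shell
# Hypothetical wrapper around the sed substitution used above,
# anchored to the start of the line.
rename_headers() {
    sed 's/^>/>Uniref90|/'
}

# Illustrative two-line FASTA record (made up for this example).
printf '>TEST1 example protein TaxID=654924\nMAFSAEDVLK\n' | rename_headers
# Prints:
#   >Uniref90|TEST1 example protein TaxID=654924
#   MAFSAEDVLK
```

Sequence lines pass through unchanged because they do not start with `>`.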

Custom databases

# FASTA headers should be in the following format:
>DB|Accession TaxID=Number (e.g. >Uniref90|Q6GZX3 TaxID=654924)

# Create taxid file with acc2taxid.py
acc2taxid.py -i [custom DB] -m uniref > db.taxids

# Create Diamond db
diamond makedb --in [custom DB] --taxonmap db.taxids --db db.fasta.dmnd --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp
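Since the taxid file is derived from the FASTA headers, it may be worth checking a custom database for headers that do not follow the format above before building it. A minimal sketch, not part of AvP: `bad_headers` is a hypothetical helper for uncompressed FASTA files, and the regular expression is an assumption based on the `>DB|Accession TaxID=Number` format described here, so it may be stricter or looser than what acc2taxid.py actually accepts.

```shell
# Hypothetical helper (not part of AvP): print FASTA header lines that do
# not look like ">DB|Accession ... TaxID=Number".
# The pattern is an assumption derived from the format described above.
bad_headers() {
    grep '^>' "$1" |
        grep -Ev '^>[^|[:space:]]+[|][^[:space:]]+[[:space:]](.*[[:space:]])?TaxID=[0-9]+' ||
        true
}

# Usage (file name is a placeholder):
#   bad_headers custom_db.fasta
```

If the helper prints nothing, every header matched the assumed pattern.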

Experimental data_type= DNA option

This option, set in the file config.yaml, has only been tested with simulated datasets, so please report any issues. blastn should work with NCBI databases (e.g. nt); for custom databases, see here.