
Creating bacterial phylogenetic trees with PhyloPhlAn

alvaralmstedt edited this page Aug 13, 2015 · 9 revisions

PhyloPhlAn is a program for generating bacterial phylogenetic trees from your own genomes. It ships with a large number of bacterial reference genomes, so you can either add your own sequences to that default tree or build a new tree from your own sequences only. Unfortunately, it is not very user-friendly, and it takes some tinkering to make it work the way you want.

The official wiki, where you can find explanations of the flags etc., is available at:

https://bitbucket.org/nsegata/phylophlan/wiki/Home

PhyloPhlAn reads its input from and writes its output to hard-coded locations inside its installation directory, so you need to download it to a location where you have full write permissions, for example your home folder.

Download:

wget https://bitbucket.org/nsegata/phylophlan/get/default.tar.gz

Then unpack it:

tar -zxf default.tar.gz
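The archive unpacks into a directory named after the repository and commit rather than plain phylophlan (the job script further down this page, for instance, uses nsegata-phylophlan-f2d78771d71d). If you want to look that name up instead of guessing it, here is a minimal Bash helper. It assumes the archive has a single top-level directory, and tarball_topdir is just an illustrative name:

```bash
#!/usr/bin/env bash
# Print the name of the top-level directory stored in a .tar.gz archive,
# e.g. the nsegata-phylophlan-<commit> directory of the PhyloPhlAn tarball.
# Assumes every entry in the archive lives under one top-level directory.
tarball_topdir() {
    # List the archive contents and keep the first path component of the first entry
    tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}
```

For example, cd "$(tarball_topdir default.tar.gz)" takes you straight into the unpacked installation directory.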

Next, make sure you have the required versions of the external dependencies. On Albiorix, the wrong versions of usearch and FastTree come first in $PATH. Let's fix usearch first. Unless you already have one, create a directory in your home folder that you will put first in $PATH:

mkdir ~/bin

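Then open your ~/.bashrc in a text editor and insert ~/bin: immediately after export PATH= so that ~/bin is searched before any other directory. If your ~/.bashrc has no export PATH= line, add export PATH=~/bin:$PATH at the end of the file instead.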
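If you prefer not to edit ~/.bashrc by hand, appending an export line at the end of the file has the same effect, as long as nothing after it resets $PATH. A minimal sketch; prepend_bin_to_rc is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Append an export line that puts ~/bin first in $PATH to an rc file
# (default: ~/.bashrc). The single quotes keep $HOME and $PATH unexpanded,
# so they are evaluated each time the file is sourced.
prepend_bin_to_rc() {
    local rc=${1:-$HOME/.bashrc}
    local line='export PATH="$HOME/bin:$PATH"'
    # grep -qxF matches the whole line literally, so repeated runs add it only once
    grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}
```

Run prepend_bin_to_rc once, then source ~/.bashrc so the change applies to your current shell.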

Now copy the correct version of usearch into ~/bin. PhyloPhlAn runs the program under the name usearch, so rename the copy accordingly:

cp /usr/local/bin/usearch5.2.236_i86linux32 ~/bin/usearch

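Now fix FastTree by downloading the binary into ~/bin, making it executable (wget does not set the execute permission) and reloading ~/.bashrc so that the new $PATH takes effect:

cd ~/bin
wget http://www.microbesonline.org/fasttree/FastTree
chmod +x FastTree
source ~/.bashrc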

You should now have all the dependencies sorted.
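To double-check, the sketch below reports where each tool resolves on $PATH and whether that location is inside ~/bin. It assumes PhyloPhlAn invokes the tools as usearch and FastTree; check_tools_in_bin is a hypothetical name:

```bash
#!/usr/bin/env bash
# Print where each named tool resolves on $PATH and flag any that are missing
# or that resolve to somewhere other than ~/bin. Returns non-zero if any tool
# is missing or lies outside ~/bin.
check_tools_in_bin() {
    local tool path status=0
    for tool in "$@"; do
        if ! path=$(command -v "$tool"); then
            echo "MISSING  $tool"
            status=1
        elif [[ "$path" == "$HOME/bin/"* ]]; then
            echo "OK       $tool -> $path"
        else
            echo "WRONG    $tool -> $path"
            status=1
        fi
    done
    return "$status"
}
```

check_tools_in_bin usearch FastTree should print an OK line for both tools and return 0; a WRONG or MISSING line means the $PATH or file name setup needs another look.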

If your genome is a nucleotide FASTA file, you first need to predict its genes and write the translated proteins to a multi-sequence amino acid FASTA file (.faa). This can be done with Prodigal:

prodigal -i <genome.fasta> -a <genome.faa>
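With several genomes, a loop saves typing. The sketch below runs the same prodigal command on every .fasta file in one directory and writes the .faa files straight into another, such as the job's input folder. The .fasta extension and the function name predict_proteins are assumptions:

```bash
#!/usr/bin/env bash
# Run prodigal on every *.fasta file in in_dir and write one protein .faa per
# genome into out_dir (created if needed). The .fasta extension is an
# assumption; change the glob and basename suffix to match your files.
predict_proteins() {
    local in_dir=$1 out_dir=$2 fasta name
    mkdir -p "$out_dir"
    for fasta in "$in_dir"/*.fasta; do
        [ -e "$fasta" ] || continue       # no matches: the glob stays literal
        name=$(basename "$fasta" .fasta)
        # -a writes the translated proteins; the gene coordinates that
        # prodigal prints on stdout are not needed here
        prodigal -i "$fasta" -a "$out_dir/$name.faa" > /dev/null || return 1
    done
}
```

For example, predict_proteins ~/genomes ~/phylophlan/input/job_name, where ~/genomes is a hypothetical folder holding your nucleotide FASTA files.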

Next, put genome.faa in a subdirectory named after your job inside the input directory of the PhyloPhlAn installation. If you have several genomes, put all of their .faa files in the same subdirectory. It should look something like this:

~/phylophlan/input/job_name/genome.faa
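Before submitting, it can be worth checking that the job folder actually contains protein FASTA files. This sketch (check_faa_dir is a hypothetical helper) only checks that there is at least one .faa file and that each one is non-empty and starts with a > header line; it does not validate the sequences themselves:

```bash
#!/usr/bin/env bash
# Basic sanity check of a PhyloPhlAn job input directory: at least one .faa
# file, and every .faa file non-empty and starting with a FASTA header (">").
check_faa_dir() {
    local dir=$1 faa n=0 status=0
    for faa in "$dir"/*.faa; do
        [ -e "$faa" ] || continue         # no matches: the glob stays literal
        n=$((n + 1))
        if [ ! -s "$faa" ] || [ "$(head -c 1 "$faa")" != ">" ]; then
            echo "BAD  $faa"
            status=1
        fi
    done
    [ "$n" -gt 0 ] || { echo "no .faa files in $dir"; return 1; }
    return "$status"
}
```

For example: check_faa_dir ~/phylophlan/input/job_name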

You are now ready to run the program. I recommend running it through an SGE job script in the phylophlan directory, so that all output, including an error report if something goes wrong during the run, is saved to a text file. To do this, go to the phylophlan directory, create the script with vim phylophlan.sge and paste in something like this:

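#!/bin/bash

# Run the job in the directory it was submitted from
#$ -cwd
# Write the job's output to stdout.txt
#$ -o stdout.txt
#$ -e stderr.txt
# Merge stderr into stdout; with -j y, SGE ignores the -e path above,
# so error messages also end up in stdout.txt
#$ -j y
# Interpret the job script with bash
#$ -S /bin/bash

# SGE sets $NSLOTS to the number of slots requested with -pe
/home/username/programs/nsegata-phylophlan-f2d78771d71d/phylophlan.py -i -t job_name --nproc "$NSLOTS"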

Adjust the path, flags and job name to your needs. The report, including any error messages, will be written to stdout.txt in the phylophlan directory.

To run the job specified in this file on, for example, node0 using 16 cores, submit it with:

qsub -q node0 -pe mpich 16 phylophlan.sge
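You can follow the job with qstat. Once it has finished, a quick way to spot problems is to search the report for error-like lines. The patterns below are a guess at what a failing run prints, not an exhaustive list, and report_errors is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Print (with line numbers) the lines of an SGE report that look like errors.
# Returns 0 if suspicious lines were found, 1 if none, 2 if the file is unreadable.
report_errors() {
    local report=${1:-stdout.txt}
    [ -r "$report" ] || { echo "cannot read $report" >&2; return 2; }
    # Case-insensitive search for typical failure messages (an assumed, partial list)
    grep -n -i -E 'error|traceback|exception|not found' "$report"
}
```

For example, if report_errors stdout.txt; then echo "possible problems above"; fi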

The resulting Newick tree file will be located in the output/job_name folder. However, this tree does not contain full taxonomic names for any tips except the ones you added; all other tips are labelled only with their IMG taxon IDs. To fix this, run the following command on your tree, adjusting the paths to your needs:

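IFS=$'\n'
# Each line of ppafull.tax.txt is tab-separated: IMG taxon ID, then taxonomy
for r in $(cat /home/username/programs/data/ppafull.tax.txt); do
    id=$(echo "${r}" | cut -f1)
    tax=$(echo "${r}" | cut -f2)
    # Replace every occurrence of the ID in the tree with ID_taxonomy
    sed -i "s/${id}/${id}_${tax}/g" /home/username/programs/nsegata-phylophlan-f2d78771d71d/output/job_name/genome.tree.int.nwk
done
unset IFS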
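The loop above runs sed once per line of the taxonomy file and replaces an ID wherever it occurs, so an ID that is a prefix of a longer ID (or a matching run of digits inside a branch length) can be relabelled by mistake. Below is a sketch of a stricter alternative, meant to be used instead of the loop: it builds a single sed script and only rewrites complete tip labels. It assumes GNU sed, a tab-separated ID/taxonomy file, bare numeric IDs as tip labels, and taxonomy strings without /, & or \ characters; label_tree is a hypothetical name:

```bash
#!/usr/bin/env bash
# Append taxonomy to IMG-ID tip labels in a Newick tree, e.g. (637000001:0.01,...
# becomes (637000001_<taxonomy>:0.01,... In Newick a tip label is preceded by
# "(" or "," and followed by ":", "," or ")", so anchoring on those characters
# leaves branch lengths and longer IDs that merely start with the same digits alone.
label_tree() {
    local tax=$1 tree=$2 script status
    script=$(mktemp) || return 1
    # One substitution per "ID<TAB>taxonomy" line, written as a sed script
    awk -F'\t' 'NF >= 2 && $1 != "" {
        printf "s/\\([(,]\\)%s\\([,:)]\\)/\\1%s_%s\\2/g\n", $1, $1, $2
    }' "$tax" > "$script" &&
        sed -i -f "$script" "$tree"    # GNU sed: edit the tree in place, one pass
    status=$?
    rm -f "$script"
    return "$status"
}
```

With the paths from the loop above, this would be: label_tree /home/username/programs/data/ppafull.tax.txt /home/username/programs/nsegata-phylophlan-f2d78771d71d/output/job_name/genome.tree.int.nwk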

There! Your tree should now contain full taxonomic names for all tips. You can view it with your favourite tree viewer (e.g. FigTree or TreeGraph 2).

If there is anything you wish to add or modify on this page, feel free to do so. /Alvar