A simple computational workflow, written in Nextflow, for inferring phylogenetic trees from a set of unaligned FASTA files of protein sequences and for visualizing these trees after midpoint rooting. The input directory and options are provided in the params.json file. Given a directory of unaligned FASTA files, the tool performs the following tasks:
- Multiple sequence alignment (MUSCLE, MAFFT or FSA)
- Alignment trimming (trimAl)
- Phylogenetic reconstruction using IQ-TREE v. 1.6.X or FastTree v. 2.1.X. If IQ-TREE is chosen, model selection is performed automatically and branch support is based on 2000 SH-aLRT replicates. If FastTree is chosen, the LG model (Le and Gascuel, 2008) is used
- Tree rerooting at midpoint and extraction of rooted tree figures in .svg format using ETE3
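
The choice between the two tree-inference programs described above can be pictured as a choice between two command lines. The following is a minimal sketch, not the workflow's actual code: the function name `tree_command`, the executable names `iqtree` and `fasttree`, and the exact set of flags are assumptions made for illustration. It uses `-m MFP` to request IQ-TREE's automatic model selection, `-alrt 2000` for the 2000 SH-aLRT replicates, and FastTree's `-lg` switch for the LG model.

```python
from typing import List


def tree_command(method: str, alignment: str) -> List[str]:
    """Return an argv list for the chosen tree-inference program.

    Executable names and flags are assumptions for illustration only;
    the workflow itself may invoke these programs differently.
    """
    if method == "iqtree":
        # -m MFP: ModelFinder model selection followed by tree inference
        # -alrt 2000: 2000 SH-aLRT branch-support replicates
        return ["iqtree", "-s", alignment, "-m", "MFP", "-alrt", "2000"]
    if method == "fasttree":
        # -lg: LG amino-acid model; FastTree writes the Newick tree to stdout
        return ["fasttree", "-lg", alignment]
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    # "example.trimmed.fasta" is a hypothetical trimmed alignment file.
    print(" ".join(tree_command("iqtree", "example.trimmed.fasta")))
```

Returning an argument list rather than a shell string keeps file names with spaces safe if the command is later passed to `subprocess.run`.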
nextflow run fasta2tree.nf -params-file params.json
The following should be on your PATH:
- Nextflow
Multiple sequence alignment:
- MAFFT
- MUSCLE
- FSA
Alignment trimming:
- trimAl
Phylogenetic reconstruction software:
- IQ-TREE
- FastTree
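
Before launching the workflow, it can help to confirm that the tools listed above are actually reachable. The sketch below is one possible check using Python's standard `shutil.which`; the executable names in `REQUIRED_TOOLS` and the helper `missing_tools` are assumptions, since installations may name the binaries differently (for example `FastTree` instead of `fasttree`).

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["nextflow", "mafft", "muscle", "fsa", "trimal", "iqtree", "fasttree"]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that `which` cannot locate on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.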
In addition, Python 3 and ETE3 (http://etetoolkit.org/) must be installed.
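
ETE3 is what handles the midpoint rerooting and the .svg export. As a rough idea of what that step involves, the sketch below uses ETE3's `Tree.get_midpoint_outgroup`, `Tree.set_outgroup` and `Tree.render`. The function name `midpoint_root_to_svg` and the file names are hypothetical, and the workflow's own script may differ. Note that `render` also needs ETE3's Qt-based drawing dependencies to be available.

```python
def midpoint_root_to_svg(newick_path: str, svg_path: str) -> None:
    """Reroot a Newick tree at its midpoint and render it to an image file.

    A sketch under stated assumptions, not the workflow's actual script.
    """
    # Imported inside the function so this file can be loaded
    # even on a machine where ETE3 is not installed.
    from ete3 import Tree

    tree = Tree(newick_path)

    # Find the node that splits the tree at its midpoint and root on it.
    outgroup = tree.get_midpoint_outgroup()
    tree.set_outgroup(outgroup)

    # The output format is inferred from the file extension (.svg here).
    tree.render(svg_path)


# Example call with hypothetical file names:
# midpoint_root_to_svg("example.treefile", "example.rooted.svg")
```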