A repository for data and tools used to benchmark getphylo.
This repository contains the following directories and subdirectories:
This directory contains information for the performance benchmarking of getphylo.
This directory contains text files listing the NCBI accessions for the Streptomyces genomes that comprise each of the six benchmarking datasets.
This directory contains the trees produced during benchmarking by getphylo, autoMLST and GTDB-Tk.
This directory contains data from four case studies used to demonstrate the utility of getphylo.
This directory contains data relevant to case study one - a bacterial phylogeny. It includes the input GenBank files and the output alignment, partition and tree.
This directory contains data relevant to case study two - a phylogeny of gene clusters related to resorculin biosynthesis. It includes a list of the input MIBiG accessions and the resorculin BGC as a GenBank file. It also includes the output alignment, partition and tree.
This directory contains data relevant to case study three - a phylogeny of primates. It contains a list of NCBI accessions used as input and the output alignment, partition and tree.
This directory contains data relevant to case study four - a phylogeny of Eurotiomycete fungi. It contains a list of strains used as input and the output alignment, partition and tree.
This directory contains three scripts used during benchmarking:
- alignment_information.py - prints information, such as the number of informative sites, about the supplied alignment
- gtdbtk_unrooted.bash - runs the first three modules (identify, align and infer) of the GTDB-Tk de novo workflow to produce an unrooted tree
- treesum.py - prints information, such as average branch support, about the supplied Newick tree
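
To make the two summary statistics mentioned above concrete, here is a minimal, dependency-free sketch of how the number of parsimony-informative sites and the mean branch support could be computed. It is not the code in alignment_information.py or treesum.py; the function names (`read_fasta`, `count_informative_sites`, `mean_branch_support`) are hypothetical, and it assumes a FASTA alignment and a Newick string whose internal node labels are plain numeric support values.

```python
import re
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def count_informative_sites(sequences, ignore="-?X"):
    """Count parsimony-informative columns in an alignment.

    A column is counted when at least two different states each occur
    in at least two sequences. Characters in `ignore` (an assumption:
    gaps and unknown residues) are skipped.
    """
    sequences = list(sequences)
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences are not aligned to the same length")
    informative = 0
    for column in zip(*sequences):
        counts = Counter(c.upper() for c in column if c.upper() not in ignore)
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
    return informative


# Numeric label directly after a closing parenthesis, followed by a
# branch length, sibling separator, parent close, or end of tree.
_SUPPORT = re.compile(r"\)\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)(?=\s*[:,);])")


def mean_branch_support(newick):
    """Return the mean of numeric internal-node labels, or None if absent.

    Assumes unquoted labels; compound labels such as "98/100" are ignored.
    """
    values = [float(match) for match in _SUPPORT.findall(newick)]
    if not values:
        return None
    return sum(values) / len(values)
```

For example, `count_informative_sites(read_fasta(open("alignment.fasta").read()).values())` would summarise an alignment file (the file name here is a placeholder). For trees with quoted labels or comments, a dedicated Newick parser would be a safer choice than the regular expression used in this sketch.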
If you use getphylo or make use of the scripts or curated data/case studies in this repository, please cite:
Booth, T. J., Shaw, S., & Weber, T. (2023). getphylo: rapid and automatic generation of multi-locus phylogenetic trees. bioRxiv, 2023.07.26.550493.