PhyloLib is a library of efficient algorithms for phylogenetic analysis, provided as a command line application. It was developed as part of a master's thesis divided into two phases. The first phase was the project, composed of a report and a presentation. The second phase was the dissertation, composed of a report, an article with supplementary data, a presentation, documentation, a video explaining how to use the library, and a video explaining how to deploy the application as a Docker image and run it. Instead of building the Docker image, it is also possible to run an image already published on Docker Hub. The second phase followed an agile approach, managed with GitHub's project functionality. The unit tests and benchmarks developed for this library are available in the project's test folder.
To execute a command, type phylolib followed by the command name, its type, and its options. For example, in phylolib distance hamming, distance is the command and hamming is its type. The full usage can be displayed by running phylolib help, which prints the following:
Usage:
phylolib help
phylolib distance (hamming|grapetree|kimura) [options]
phylolib correction (jukescantor) [options]
phylolib algorithm (goeburst|edmonds|sl|cl|upgma|upgmc|wpgma|wpgmc|saitounei|studierkepler|unj) [options]
phylolib optimization (lbr) [options]
Options:
-o=<file> --out=<file> Output file as <format>:<location> with format being (asymmetric|symmetric|newick|nexus)
-d=<file> --dataset=<file> Input dataset file as <format>:<location> with format being (fasta|ml|snp)
-m=<file> --matrix=<file> Input distance matrix file as <format>:<location> with format being (asymmetric|symmetric)
-t=<file> --tree=<file> Input phylogenetic tree file as <format>:<location> with format being (newick|nexus)
-l=<number> --lvs=<number> Limit of locus variants to consider using goeBURST algorithm [default: 3]
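The <format>:<location> convention shared by the file options can be illustrated with a short Java sketch. It is hypothetical and not PhyloLib's actual parser: the class FileOptionSketch, its parse method, and the FORMATS table are illustrative names, and the accepted formats are copied from the usage text above. The value is split at the first ":" only, so that a location that itself contains a colon (such as a Windows path) stays intact.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, not PhyloLib's parser: splits "<format>:<location>" option values. */
public class FileOptionSketch {

    // Accepted formats per option, as listed in the usage text (illustrative table).
    static final Map<String, Set<String>> FORMATS = Map.of(
            "out", Set.of("asymmetric", "symmetric", "newick", "nexus"),
            "dataset", Set.of("fasta", "ml", "snp"),
            "matrix", Set.of("asymmetric", "symmetric"),
            "tree", Set.of("newick", "nexus"));

    /** Parsed option value: the file format and where the file lives. */
    static final class FileOption {
        final String format;
        final String location;

        FileOption(String format, String location) {
            this.format = format;
            this.location = location;
        }
    }

    /** Parses e.g. ("out", "newick:tree.txt"); throws if the value or the format is invalid. */
    static FileOption parse(String option, String value) {
        int sep = value.indexOf(':'); // split at the FIRST colon only
        if (sep <= 0 || sep == value.length() - 1) {
            throw new IllegalArgumentException("Expected <format>:<location> but got: " + value);
        }
        String format = value.substring(0, sep);
        String location = value.substring(sep + 1);
        Set<String> allowed = FORMATS.get(option);
        if (allowed == null || !allowed.contains(format)) {
            throw new IllegalArgumentException("Unsupported format '" + format + "' for --" + option);
        }
        return new FileOption(format, location);
    }

    public static void main(String[] args) {
        FileOption out = parse("out", "newick:tree.txt");
        System.out.println(out.format + " -> " + out.location); // prints "newick -> tree.txt"
    }
}
```

Validating the format per option means a mistake such as --dataset=newick:tree.txt is rejected before any file is read.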
You can also run multiple commands in a single invocation by separating them with a ":" character, as in:
phylolib algorithm upgma --out=newick:tree.txt : distance hamming --dataset=ml:dataset.txt
The order in which the commands are executed is dictated by the phylogenetic analysis workflow, so the order in which they are provided does not matter. The exception is multiple commands of the same type, which is possible for commands that can be executed more than once, such as optimization: these are executed in the order in which they are provided. For example, in the invocation above, distance is executed before algorithm, even though algorithm is written first.
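The ordering rule can be sketched in Java as follows. This is a hypothetical illustration rather than PhyloLib's implementation: the class CommandOrderSketch and the WORKFLOW list are illustrative, and the stage order assumed here (distance, then correction, then algorithm, then optimization) is inferred from the usual analysis workflow, not taken from the source. Because ArrayList.sort is a stable sort, commands of the same type, such as several optimization commands, keep the order in which they were provided.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch, not PhyloLib's code: orders ":"-separated commands by workflow stage. */
public class CommandOrderSketch {

    // Assumed workflow stages, earliest first.
    static final List<String> WORKFLOW = List.of("distance", "correction", "algorithm", "optimization");

    /** Splits the arguments on ":" tokens and sorts the resulting commands by stage. */
    static List<List<String>> order(String[] args) {
        List<List<String>> commands = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String arg : args) {
            if (arg.equals(":")) { // ":" ends the current command
                if (!current.isEmpty()) commands.add(current);
                current = new ArrayList<>();
            } else {
                current.add(arg);
            }
        }
        if (!current.isEmpty()) commands.add(current);
        // Stable sort on the stage of each command's name (its first token):
        // same-type commands keep their input order.
        commands.sort(Comparator.comparingInt(cmd -> WORKFLOW.indexOf(cmd.get(0))));
        return commands;
    }

    public static void main(String[] args) {
        String[] example = {"algorithm", "upgma", "--out=newick:tree.txt", ":",
                            "distance", "hamming", "--dataset=ml:dataset.txt"};
        for (List<String> cmd : order(example)) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Running main prints the distance command on the first line and the algorithm command on the second, matching the example invocation. The sketch does not reject unknown command names (they would sort first); a real implementation would validate them.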
To compile this project into a JAR and execute it, you should:
- Install Gradle and Java JDK 13 or higher.
- Open the terminal in the project's folder.
- Run the following command to clean the project:
gradle clean
- Run the following command to build the JAR:
gradle jar
- Open the terminal in the project's build/libs folder.
- Run the following command to execute the JAR:
java -jar PhyloLib-1.0-SNAPSHOT.jar help
To build a Docker image for this project and execute it, you should:
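- Install Docker and compile the project's JAR as described above.
- Open the terminal in the project's folder.
- Run the following command to build the Docker image:
docker build -t phylolib .
- Run the following command to execute the Docker image:
docker run --rm -v $HOME/<DIRECTORY>/files:/files -v $HOME/<DIRECTORY>/logs:/logs phylolib:latest help
Here, <DIRECTORY> is the folder under your home directory that contains the files and logs directories. The -v options mount them into the container at /files and /logs respectively, and --rm removes the container when it exits.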