Introduction

Written as part of the 2021 KIT bioinformatics practical. The program calculates the Robinson-Foulds (RF) distance and the generalized RF distances. To get started, clone the repository:

git clone --recursive git@github.com:DoktorBotti/RF_Metrics.git
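
For readers new to the RF distance, the following is a minimal sketch of the classic definition: the number of non-trivial bipartitions (splits) found in exactly one of two trees over the same taxa. The nested-tuple tree representation and all function names are assumptions made for this illustration; they are not the data structures or the algorithm used in this repository.

```python
def leaf_sets(tree, clades):
    """Return the leaf set below `tree`; append every internal clade to `clades`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial splits of `tree`, read as an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always gets the same key.
    """
    clades = []
    taxa = leaf_sets(tree, clades)
    reference = min(taxa)
    result = set()
    for clade in clades:
        side = taxa - clade if reference in clade else clade
        # Keep only splits with at least two taxa on both sides.
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return result


def rf_distance(tree_a, tree_b):
    """Size of the symmetric difference of the two split sets."""
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    # Hypothetical example trees, written as nested tuples instead of Newick.
    t1 = (("A", "B"), ("C", "D"))
    t2 = (("A", "C"), ("B", "D"))
    print(rf_distance(t1, t1), rf_distance(t1, t2))
```

The two four-taxon trees each have exactly one non-trivial split and share none, so their distance is 2, while a tree compared with itself yields 0.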

Building

Prerequisites:

Boost Version 1.76.0

The log and thread components need to be compiled:

./bootstrap.sh --with-libraries=thread,log,date_time --with-toolset=<your-compiler-here> && ./b2

Then either point the build to the compiled libraries or install the components with ./b2 install.

OR-Tools Version 9.0.9048

Binary releases are available for most operating systems.

These dependencies are found automatically in standard locations. For other locations, set ORTOOLS_ROOT / BOOST_ROOT as a CMake or environment variable.

Running the program

commandline_rf --metric [ RF | MCI | MSI | SPI ] -i [input-file-path] -o [output-file-path] -n [true|false] -p [number of threads or -1 for auto]
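
When the tool is driven from a script, the invocation above can be assembled as an argument list. The sketch below uses only the flags shown in the usage line. The helper name `build_command`, the example file names, and the assumption that the binary is reachable as `commandline_rf` are hypothetical.

```python
METRICS = ("RF", "MCI", "MSI", "SPI")


def build_command(metric, input_path, output_path, normalize=False,
                  threads=-1, binary="commandline_rf"):
    """Build the argument list for one run (hypothetical helper).

    `threads=-1` mirrors the documented "auto" setting of -p.
    """
    if metric not in METRICS:
        raise ValueError(f"unknown metric {metric!r}, expected one of {METRICS}")
    return [
        binary,
        "--metric", metric,
        "-i", str(input_path),
        "-o", str(output_path),
        "-n", "true" if normalize else "false",
        "-p", str(threads),
    ]


if __name__ == "__main__":
    # File names are placeholders, not files shipped with the repository.
    print(" ".join(build_command("MCI", "trees.newick", "distances.txt", normalize=True)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the binary has been built.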

The input must be in Newick format. The output contains the pairwise distances between all trees given in the input. If normalization is turned on (-n true), all results are divided by the maximum score in the current calculation. If no -p option is provided, the number of threads is set to the num_procs of your machine (equivalent to -p -1).
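
The normalization step can be pictured as follows. This is a sketch under one assumption: "maximum score" is taken to mean the largest value in the pairwise result matrix of the current run. The program may define the maximum differently, and `normalize_by_max` is a hypothetical name.

```python
def normalize_by_max(matrix):
    """Divide every entry of a pairwise distance matrix by its largest entry.

    Assumption: the maximum is the largest computed pairwise value.
    A matrix whose maximum is 0 (all trees identical) is returned unchanged.
    """
    peak = max((value for row in matrix for value in row), default=0)
    if peak == 0:
        return [list(row) for row in matrix]
    return [[value / peak for value in row] for row in matrix]


if __name__ == "__main__":
    # Made-up distances for three trees, not program output.
    distances = [
        [0, 2, 4],
        [2, 0, 6],
        [4, 6, 0],
    ]
    for row in normalize_by_max(distances):
        print(row)
```

After this step, every value lies between 0 and 1, and the most dissimilar pair of trees in the run has the value 1.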

Performance

The benchmarks ran on a six-core Ryzen 5 3600 with 16 GB of RAM. We evaluated our implementation with 2, 10, 50, 100, and 130 trees. We also tested multiple problem configurations with up to 1000 taxa; the implementation can handle considerably larger inputs.
The first plot shows the average execution time per metric. We divided each measurement by its taxa count to counteract the influence of different problem sizes. The following plots show the execution time, with a different tree count per plot. Vertical lines mark timeouts or errors during the calculation, drawn in the color of the affected series. These plots contain only timeouts, since this benchmark was performed only on small instances. With increasing tree and taxa counts, the RAM eventually does not suffice, which results in a BAD_ALLOC. The plots below show the speedup and the execution time as a function of the tree count, averaged over all taxa instances.
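
The per-taxon scaling and the speedup described above can be sketched as follows. The record layout `(metric, taxa, seconds)`, the function names, and all numbers are assumptions for illustration; they are not the benchmark scripts or results of this repository.

```python
from collections import defaultdict


def mean_time_per_taxon(measurements):
    """Average of (seconds / taxa) per metric.

    `measurements` is an iterable of (metric, taxa, seconds) tuples.
    Dividing by the taxa count makes runs of different sizes comparable.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for metric, taxa, seconds in measurements:
        totals[metric] += seconds / taxa
        counts[metric] += 1
    return {metric: totals[metric] / counts[metric] for metric in totals}


def speedup(serial_seconds, parallel_seconds):
    """Classic speedup: single-thread time divided by multi-thread time."""
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    # Made-up timings, only to show the calculation.
    runs = [("RF", 100, 1.0), ("RF", 1000, 20.0), ("MCI", 100, 5.0)]
    print(mean_time_per_taxon(runs))
    print(speedup(12.0, 3.0))
```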
