Docker-based benchmark prototype for multiple sequence aligners

BENGEN

Introduction

BenGen is a fully reproducible, automatic and scalable benchmarking prototype, which provides consistently annotated and community-sharable results.

BenGen currently benchmarks multiple sequence aligners, but it can be easily adapted to benchmark other bioinformatics methods.

How does it work?

Nextflow is the skeleton of BenGen and defines the benchmarking workflow.

Aligner tools are stored as Docker images and are available through Docker Hub. A unique ID is assigned to each image. This guarantees the containers' immutability and the full replicability of the benchmark over time.

Docker provides a container runtime for local and cloud environments. Singularity performs the same role in the context of HPC clusters.

An RDF database, based on the EDAM ontology vocabulary, contains metadata about each component of the benchmark. This makes it possible to automate the benchmark and to provide a consistent, machine-readable description of the incorporated data, algorithms and their results.

GitHub stores and tracks code changes in a consistent manner. It also provides a friendly and well-known user interface that enables third parties to contribute their own tools with ease.

GETTING STARTED

Dependencies

To run BenGen on your machine, Docker and Nextflow must be installed.
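
Before continuing, it can help to confirm that both tools are on your PATH. The snippet below is a minimal sketch and not part of the BenGen repository; the helper name missing_tools is hypothetical. It only checks that the docker and nextflow executables can be found, not that their versions are suitable.

```shell
# Print the name of every tool passed as an argument that is not on PATH.
# (Hypothetical helper, not shipped with BenGen.)
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the tool cannot be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools docker nextflow)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Missing dependencies:" $missing >&2
else
    echo "Docker and Nextflow found."
fi
```

If a name is reported as missing, install that tool before running the setup steps.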

Setup

First, clone the BenGen repository:

git clone https://github.com/cbcrg/bengen

Then move into the bengen directory and run make to build all the required images:

cd bengen && make

Now you are ready to use BenGen!

RUNNING BENGEN LOCALLY (automatic mode)

After completing the steps in the Getting Started section, you can run BenGen locally in its automatic mode with the following command:

nextflow run query.nf

Tip: Use the -resume option to reuse cached results from previous runs instead of recomputing them. This is useful when you run BenGen multiple times.

nextflow run query.nf -resume

In this mode, the metadata dataset is queried, and the datasets, methods and scoring functions are selected and run automatically. The selection is defined by the SPARQL query file query.rq, which selects only the combinations that are eligible to run. Finally, the results are stored in the scores.ttl file in RDF format.

RUNNING BENGEN LOCALLY (manually)

To run BenGen manually, choosing the datasets, scoring functions and methods yourself, use the bengen.nf script:

nextflow run bengen.nf

Tip: Use the -resume option to reuse cached results from previous runs instead of recomputing them. This is useful when you run BenGen multiple times.

nextflow run bengen.nf -resume

To try BenGen on a small amount of data and quickly get an overview of how it works, use the following command:

nextflow run bengen.nf --scores DEMO/scores_demo.txt --methods DEMO/methods_demo.txt --dataset_folder "benchmarking_datasets_demo"

The overall benchmark is driven by a configuration file that defines the following components:

  • params.dataset: Defines which dataset to use. Right now only the datasets provided in the benchmark_dataset directory are allowed. To use all of them, set params.dataset="*".
  • params.renderer: Defines which renderer to use among the ones provided (csv, html, json).
  • params.out: Defines the name of the output file.

Example of configuration file content:

docker.enabled = true

params.dataset = "balibase-v3.01"
params.renderer = "csv"
params.out = "output.${params.renderer}"
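
In Nextflow, a double-dash option such as --dataset sets the matching params.dataset value for a single run. The sketch below builds such a command line for the three parameters above and rejects a renderer that is not in the listed set. It prints the command rather than running it. The helper names valid_renderer and bengen_cmd are hypothetical, and it is an assumption that bengen.nf picks up these overrides in the usual Nextflow way.

```shell
# Succeed only for the renderers listed above: csv, html, json.
valid_renderer() {
    case "$1" in
        csv|html|json) return 0 ;;
        *) return 1 ;;
    esac
}

# Print (do not run) a bengen.nf command line that overrides the three params.
# $1 = dataset name, $2 = renderer; the output name mirrors "output.<renderer>".
bengen_cmd() {
    valid_renderer "$2" || { echo "unknown renderer: $2" >&2; return 1; }
    printf 'nextflow run bengen.nf --dataset "%s" --renderer %s --out "output.%s"\n' "$1" "$2" "$2"
}

bengen_cmd "balibase-v3.01" json
```

Printing the command first makes it easy to review the parameters before launching a long benchmark run.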

Important: Inside the bengen directory you can find the methods.txt and scores.txt files. They define which aligners and which scoring functions to use. You can modify them by adding or removing lines with the names of the aligners/scores you want to run (e.g. bengen/NameOfAlignerOrScore).

Example of methods.txt:

bengen/mafft
bengen/tcoffee
bengen/clustalo

Example of scores.txt:

bengen/qscore
bengen/baliscore

Note: You can see which aligners and scores are already integrated in the project by looking in the boxes and boxes_score directories, respectively. Both are in the bengen directory.
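
Because methods.txt and scores.txt are plain lists with one bengen/Name entry per line, a quick format check can catch typos before a run. The following is a minimal sketch, not a BenGen feature. The helper name invalid_entries is hypothetical, and the set of characters allowed in a name (letters, digits, underscore, dot, hyphen) is an assumption.

```shell
# Print every non-empty line of a list file that does not look like "bengen/<name>".
# (Hypothetical helper; the allowed name characters are an assumption.)
invalid_entries() {
    grep -v -E '^[[:space:]]*$' "$1" | grep -v -E '^bengen/[A-Za-z0-9_.-]+$' || true
}

# Demonstration on a temporary copy of the methods.txt example above.
tmp=$(mktemp)
printf '%s\n' bengen/mafft bengen/tcoffee bengen/clustalo > "$tmp"
bad=$(invalid_entries "$tmp")
if [ -z "$bad" ]; then
    echo "all entries look valid"
else
    echo "suspicious entries: $bad"
fi
rm -f "$tmp"
```

The check covers only the format of each line. It does not verify that a matching aligner or score exists in the boxes or boxes_score directories.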

MODIFY BENGEN

Add a Multiple Sequence Aligner

You can easily integrate your new MSA into BenGen by using a script that does the work for you automatically.

In the bengen directory that you cloned you can find the add.sh script.

ARGUMENTS:

  • -n Name of your MSA (compulsory)
  • -m Complete path to your metadata file (compulsory)
  • -t Complete path to your template file (compulsory)

Example:
bash add.sh -n MSA-NAME -m /complete/path/to/your/metadatafile -t /complete/path/to/your/templatefile
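
Since add.sh expects complete paths, it can be useful to validate the three arguments before calling it. The sketch below is an illustration only. The helper names is_abs_file and check_add_args are hypothetical, and the rule that both files must already exist is an assumption about what add.sh needs.

```shell
# Return 0 only if the argument is an absolute path to an existing regular file.
is_abs_file() {
    case "$1" in
        /*) [ -f "$1" ] ;;
        *)  return 1 ;;
    esac
}

# Validate the three add.sh arguments (name, metadata file, template file).
# Prints one message per problem and returns non-zero if any check fails.
check_add_args() {
    rc=0
    [ -n "$1" ] || { echo "MSA name is empty" >&2; rc=1; }
    is_abs_file "$2" || { echo "metadata file must be an existing absolute path: $2" >&2; rc=1; }
    is_abs_file "$3" || { echo "template file must be an existing absolute path: $3" >&2; rc=1; }
    return "$rc"
}
```

A possible use is to chain the check in front of the script with &&, passing the same name and paths to check_add_args first, so that add.sh is only invoked when all three arguments pass.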

You can find more information on how to properly create the metadata and template files in the documentation.

CONTRIBUTE TO THE PROJECT

If you wish to contribute to the project you can integrate your new MSA in the public project.

Follow these steps:

  1. Clone the repository and add your new MSA to it.
  2. Open a pull request to merge your changes into the project.
  3. Upload the Docker images to Docker Hub.

Afterwards, the maintainer of the project will receive a notification and accept the pull request if it is relevant to the project. The maintainer then triggers the computation, and the new results are shown on a public HTML page.
