Daniel Svensson edited this page Jun 25, 2019 · 7 revisions

Motivation

Many bioinformatics applications expose user-adjustable parameters that affect the outcome of the analysis. However, selecting proper parameter settings can be challenging. Each parameter has an individual effect on the outcome, and parameters may also interact with one another. Both kinds of effect can be difficult to predict. To make the situation even more complex, multiple tools may be run in a sequential pipeline, where the final output depends on the parameter configuration of every tool in the pipeline. Because outcomes are so difficult to predict, in practice parameters are often left at their default settings, or set based on personal or peer experience gained through trial and error. A systematic approach is needed to allow reliable and efficient selection of parameters for bioinformatics pipelines.

What doepipeline offers

doepipeline is a novel approach for optimizing bioinformatic software parameters, based on core concepts of Design of Experiments (DoE) methodology and recent advances in subset designs.

In short, parameter settings are first approximated in a screening phase (figure 1a) using a subset design that efficiently spans the entire search space. They are then refined in an optimization phase (figure 1b) using response surface designs and ordinary least squares (OLS) modeling.
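To make the optimization phase more concrete, the following is a minimal sketch of fitting a response surface with OLS and reading off a predicted optimum. It is not doepipeline's own code or API. It assumes two hypothetical quantitative parameters in coded units, a central composite design as the response surface design, and a synthetic, noise-free response standing in for a real pipeline run.

```python
import numpy as np


def quadratic_terms(x):
    """Expand two-factor settings into the terms of a full quadratic model."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1 * x2, x1**2, x2**2])


def fit_response_surface(x, y):
    """Estimate the quadratic model coefficients by ordinary least squares."""
    coef, *_ = np.linalg.lstsq(quadratic_terms(x), y, rcond=None)
    return coef


def stationary_point(coef):
    """Solve gradient = 0 for the fitted quadratic surface."""
    _, b1, b2, b12, b11, b22 = coef
    linear = np.array([b1, b2])
    quadratic = np.array([[b11, b12 / 2], [b12 / 2, b22]])
    return np.linalg.solve(-2 * quadratic, linear)


def run_pipeline(x):
    """Hypothetical stand-in for running the pipeline and measuring a response."""
    return 10 - (x[:, 0] - 0.3) ** 2 - 2 * (x[:, 1] + 0.2) ** 2


# Central composite design in coded units (an assumed design choice):
# four factorial points, four axial points, and one center point.
a = np.sqrt(2)
design = np.array([
    [-1, -1], [1, -1], [-1, 1], [1, 1],
    [-a, 0], [a, 0], [0, -a], [0, a],
    [0, 0],
])

response = run_pipeline(design)
coef = fit_response_surface(design, response)
optimum = stationary_point(coef)
print("Predicted optimal settings (coded units):", optimum)
```

The stationary point is a maximum only if the fitted surface is concave, and it may fall outside the region the design covers. A real run would need to check both and, where necessary, move the design and repeat.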

Use doepipeline to:

  • Optimize parameter settings in a single tool, or a pipeline of tools
  • Optimize both qualitative and quantitative parameters

Figure 1: doepipeline overview

Example cases

Four example cases are provided to help you get started with doepipeline:

  1. de novo genome assembly
  2. scaffolding of a fragmented genome assembly
  3. k-mer taxonomic classification of ONT MinION reads
  4. genetic variant calling

In all four example cases, doepipeline found parameter settings that produced a better outcome, with respect to the measured characteristic, than the default settings. doepipeline provides a systematic and robust framework for optimizing software parameter settings, in contrast to labor- and time-intensive manual parameter tweaking.

The implementation in doepipeline makes our methodology accessible and user-friendly, and allows for automatic optimization of tools in a wide range of cases.