This section provides guidelines for adding a new process to the main.nf file so that the config files and the recipes to build the |singularity|_ and |docker|_ containers can be generated automatically. Note that a geniac command line interface is provided to :ref:`cli-page` and to ensure that the pipeline complies with the following guidelines.
Note
All the examples below are taken from the |geniacdemo|_ pipeline. You can clone this repository and reproduce what is presented. This |geniacdemo|_ is fully functional.
Important
Consider that one process invokes only one tool.
Each process must have a label directive. The label name may be different from the process name. For example:
    process fastqc {
      label 'fastqc'
      label 'lowMem'
      label 'lowCpu'

      tag "${prefix}"
      publishDir "${params.outDir}/fastqc", mode: 'copy'

      input:
      set val(prefix), file(reads) from rawReadsFastqcCh

      output:
      file "*_fastqc.{zip,html}" into fastqcResultsCh
      file "v_fastqc.txt" into fastqcVersionCh

      script:
      """
      fastqc -q $reads
      fastqc --version > v_fastqc.txt
      """
    }
Having a label is essential because it makes it possible to automatically generate the configuration files conda.config, multiconda.config, singularity.config, docker.config, path.config and multipath.config. These configuration files use the withLabel process selector. The section :ref:`process-guidelines` explains that the name of the label must follow specific rules.
Important
Take great care to declare the label for each process: the automatic generation of the configuration files mentioned above, along with the singularity / docker recipes and containers, relies on the label name and on parsing the conf/geniac.config file from the source code.
Note
Why did we use withLabel rather than withName as the process selector in the configuration files? Using withLabel makes it possible to use the exact same tool within two or more different processes with different options. This is a big advantage, especially when you use containers: you don't have to build one container per process, since the same container can be shared between processes.
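Because everything downstream depends on labels, it can help to scan main.nf for processes that declare none. The following is a minimal sketch and not part of geniac: the function names and regular expressions are assumptions made for illustration, and a process body is approximated as the text up to the next process keyword. The geniac command line interface remains the reference check.

```python
import re

# Matches the opening of a process definition, e.g. "process fastqc {".
PROCESS_RE = re.compile(r"^\s*process\s+(\w+)\s*\{", re.MULTILINE)
# Matches a label directive, e.g. "label 'fastqc'" or 'label "fastqc"'.
LABEL_RE = re.compile(r"""^\s*label\s+['"](\w+)['"]""", re.MULTILINE)


def labels_by_process(nf_text):
    """Map each process name to the list of labels found in its body."""
    matches = list(PROCESS_RE.finditer(nf_text))
    result = {}
    for i, match in enumerate(matches):
        # Approximation: the body runs until the next process definition.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(nf_text)
        result[match.group(1)] = LABEL_RE.findall(nf_text[match.end():end])
    return result


def processes_without_label(nf_text):
    """Return the names of the processes that declare no label."""
    return [name for name, labels in labels_by_process(nf_text).items() if not labels]
```

Passing the content of main.nf to processes_without_label returns the processes that still need a label directive.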
- Yes, it is something like grep, sed, cat, etc., then see :ref:`process-unix`.
- Yes, the tool is available in conda and can be easily installed from bioconda, conda-forge channels, then see :ref:`process-easy-conda`.
- Yes, but it cannot be easily installed as the order of the channels matters or it requires dependencies and/or pip directives in the conda recipe, then see :ref:`process-custom-conda`.
- Yes, it is available as a binary (but without source code available) or as an executable script (shell, python, perl), then see :ref:`process-exec`.
- Yes, then see :ref:`process-source-code`.
- Yes, then see :ref:`process-renv`.
Probably not, otherwise, you would not be reading this. This means that the tool can fall in any of these categories:
- it is provided as deb, rpm packages or any executable installer,
- it is a windows executable that needs mono to be run,
- it is whatever that needs a custom installation procedure.
Then see :ref:`process-custom-install`.
If Yes, see :ref:`process-env-var`.
See :ref:`process-resource` to define the informatics resources necessary to run your process.
This is an easy one.
The command must work on a standard UNIX system.
Always use label 'onlyLinux'.
    process standardUnixCommand {
      label 'onlyLinux'
      label 'minMem'
      label 'minCpu'

      publishDir "${params.outDir}/standardUnixCommand", mode: 'copy'

      input:
      file hello from helloWorldOutputCh

      output:
      file "bonjourMonde.txt"

      script:
      """
      sed -e 's/Hello World/Bonjour Monde/g' ${hello} > bonjourMonde.txt
      """
    }
You have nothing to do, the install process will build the recipes and images for you.
Of course, the tool has to be available in a conda channel.
Edit the file conf/geniac.config and add, for example, rmarkdown = "conda-forge::r-markdown=0.8=r351h96ca727_1003" in the section params.geniac.tools as follows:

    params {
      geniac {
        tools {
          rmarkdown = "conda-forge::r-markdown=0.8=r351h96ca727_1003"
        }
      }
    }
The syntax follows the conda package naming pattern: softName = "condaChannelName::softName=version=buildString".
Note that for some tools, other conda dependencies are required and can be added as follows:
    params {
      geniac {
        tools {
          fastqc = "conda-forge::openjdk=8.0.192=h14c3975_1003 bioconda::fastqc=0.11.6=2"
        }
      }
    }
Note also that you can reuse the conda dependencies of other tools that have been set in the section params.geniac.tools. This keeps tool versions consistent across tools whenever this is required. To do so, just add the corresponding variable to the list, such as ${params.geniac.tools.python}, as shown below:
    params {
      geniac {
        tools {
          fastqc = "${params.geniac.tools.python} conda-forge::openjdk=8.0.192=h14c3975_1003 bioconda::fastqc=0.11.6=2"
        }
      }
    }
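To make the naming pattern concrete, the sketch below splits a tools entry into its conda specifications and rejects items that do not follow condaChannelName::softName=version=buildString. It is an illustration only: the function name and the regular expression are assumptions, not geniac code, and references such as ${params.geniac.tools.python} are simply skipped rather than resolved.

```python
import re

# channel::name=version=build, e.g. "bioconda::fastqc=0.11.6=2".
SPEC_RE = re.compile(
    r"^(?P<channel>[\w.-]+)::(?P<name>[\w.-]+)=(?P<version>[^=\s]+)=(?P<build>[^=\s]+)$"
)


def parse_tool_specs(value):
    """Split a tools entry into (channel, name, version, build) tuples.

    Raises ValueError for any item that does not follow the pattern.
    """
    specs = []
    for item in value.split():
        # References to other entries are skipped, not resolved.
        if item.startswith("${") and item.endswith("}"):
            continue
        match = SPEC_RE.match(item)
        if match is None:
            raise ValueError(f"not a channel::name=version=build spec: {item!r}")
        specs.append(match.group("channel", "name", "version", "build"))
    return specs
```

A specification without a channel or without a build string, such as fastqc=0.11.6, raises ValueError, which is a cheap way to catch an unpinned package before the containers are built.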
The label directive must have the exact same name as given in the params.geniac.tools
section. The label must not contain the prefix renv which is reserved for a tool with :ref:`process-renv`.
Add your process in the main.nf. It can take any name (which is not necessarily the same name as the software that will be called on command line) provided it follows the :ref:`overview-naming`.
    process fastqc {
      label 'fastqc'
      label 'lowMem'
      label 'lowCpu'

      tag "${prefix}"
      publishDir "${params.outDir}/fastqc", mode: 'copy'

      input:
      set val(prefix), file(reads) from rawReadsFastqcCh

      output:
      file "*_fastqc.{zip,html}" into fastqcResultsCh
      file "v_fastqc.txt" into fastqcVersionCh

      script:
      """
      fastqc -q $reads
      fastqc --version > v_fastqc.txt
      """
    }
In most cases, you will have nothing to do. However, some tools depend on packages that have to be installed from the :ref:`linux-page`. For example, fastqc requires some fonts to be installed. In that case, add the list of packages that have to be installed with dnf (the Dandified YUM command, which is the package management utility for the :ref:`linux-page`). To do so, edit the file conf/geniac.config and add, for example, fastqc = 'fontconfig dejavu*' in the section params.geniac.containers.yum as follows:
    geniac {
      containers {
        yum {
          fastqc = 'fontconfig dejavu*'
        }
      }
    }
Warning
Make sure that you use the exact same name in params.geniac.containers.yum, params.geniac.tools and label, otherwise the container will not work.
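The naming constraint in this warning can be checked mechanically once the three sets of names are known. The sketch below assumes that they have already been collected by other means (how they are extracted from conf/geniac.config and main.nf is left out), and the function name is hypothetical.

```python
def check_yum_names(tools, yum, labels):
    """Return a list of problems; an empty list means the names agree.

    tools  -- names declared in params.geniac.tools
    yum    -- names declared in params.geniac.containers.yum
    labels -- labels used by the processes in main.nf
    """
    tools, labels = set(tools), set(labels)
    problems = []
    for name in sorted(yum):
        if name not in tools:
            problems.append(f"'{name}' is in yum but not in tools")
        if name not in labels:
            problems.append(f"'{name}' is in yum but is not a process label")
    return problems
```

The comparison is case-sensitive on purpose: fastQC and fastqc are treated as different names, which is exactly the kind of mismatch the warning is about.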
If you need to :ref:`customcmd-page`, this can be done using the following scopes associated with the label of the tool:
- params.geniac.containers.cmd.post: to define commands which will be executed at the end of the default commands generated by geniac.
- params.geniac.containers.cmd.envCustom: to define environment variables which will be set inside the docker and singularity images.
Of course, the tool has to be available in a conda channel.
Write the custom conda recipe in the directory recipes/conda, for example add the file trickySoftware.yml:
    name: trickySoftware_env
    channels:
      - bioconda
      - conda-forge
      - defaults
    dependencies:
      - python=3.7.8=h6f2ec95_1_cpython
      - pip
      - pip:
        - numpy==1.19.2
Warning
The yml file with the conda recipe must follow these guidelines:
- Name the file using the name of the label (e.g. if the label is trickySoftware, the file must be named trickySoftware.yml).
- Choose a unique name for your conda environment.
- Each conda package has the naming pattern softName = "condaChannelName::softName=version=buildString".
- If you need pip to install some packages, add pip in your dependencies and use the pattern softName==version for each package to be installed with pip.
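Two of these guidelines lend themselves to a quick check: the file name must match the label, and pip packages must be pinned with softName==version. The helpers below are a minimal sketch with hypothetical names; they work on a path and on a list of pip entries that the caller has already read from the yml file, so no YAML parser is assumed.

```python
import re
from pathlib import PurePosixPath

# softName==version, e.g. "numpy==1.19.2".
PIP_PIN_RE = re.compile(r"^[A-Za-z0-9._-]+==[^=\s]+$")


def recipe_matches_label(recipe_path, label):
    """True when the recipe file is named <label>.yml."""
    return PurePosixPath(recipe_path).name == f"{label}.yml"


def unpinned_pip_packages(pip_packages):
    """Return the pip entries that do not follow softName==version."""
    return [pkg for pkg in pip_packages if not PIP_PIN_RE.match(pkg)]
```

For the recipe shown above, recipe_matches_label("recipes/conda/trickySoftware.yml", "trickySoftware") is true and unpinned_pip_packages(["numpy==1.19.2"]) returns an empty list.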
Edit the file conf/geniac.config and add, for example, trickySoftware = "${projectDir}/recipes/conda/trickySoftware.yml" in the section params.geniac.tools as follows:

    geniac {
      tools {
        trickySoftware = "${projectDir}/recipes/conda/trickySoftware.yml"
      }
    }
The label directive must have the exact same name as given in the params.geniac.tools
section. The label must not contain the prefix renv which is reserved for a tool with :ref:`process-renv`.
Add your process in the main.nf. It can take any name (which is not necessarily the same name as the software that will be called on command line) provided it follows the :ref:`overview-naming`.
    process trickySoftware {
      label 'trickySoftware'
      label 'minMem'
      label 'minCpu'

      publishDir "${params.outDir}/trickySoftware", mode: 'copy'

      output:
      file "trickySoftwareResults.txt"

      script:
      """
      python --version > trickySoftwareResults.txt 2>&1
      """
    }
In most cases, you will have nothing to do. However, some tools depend on packages that have to be installed from the :ref:`linux-page`. For example, fastqc requires some fonts to be installed. In that case, add the list of packages that have to be installed with dnf (the Dandified YUM command, which is the package management utility for the :ref:`linux-page`). To do so, edit the file conf/geniac.config and add, for example, myFavouriteTool = 'gsl blas' in the section params.geniac.containers.yum as follows:

    geniac {
      containers {
        yum {
          myFavouriteTool = 'gsl blas'
        }
      }
    }
If you need to :ref:`customcmd-page`, this can be done using the following scopes associated with the label of the tool:
- params.geniac.containers.cmd.post: to define commands which will be executed at the end of the default commands generated by geniac.
- params.geniac.containers.cmd.envCustom: to define environment variables which will be set inside the docker and singularity images.
Warning
Make sure that you use the exact same name in params.geniac.containers.yum, params.geniac.tools and label, otherwise the container will not work.
Add the binary or the executable script in the bin/ directory of the pipeline. It must have read and execute UNIX permissions. It must work on a UNIX system.
Use label 'onlyLinux' if this is a bash script, or define a new tool with the expected programming language to run the script or binary (e.g. label 'python').
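The permission requirement is easy to overlook when a script is copied from elsewhere. As a minimal sketch, the helper below (a hypothetical name, not a geniac function) checks that the owner of a file has both read and execute permission by inspecting its mode bits.

```python
import os
import stat


def is_readable_and_executable(path):
    """True when the file owner has both read and execute permission."""
    mode = os.stat(path).st_mode
    needed = stat.S_IRUSR | stat.S_IXUSR
    return (mode & needed) == needed
```

A script for which this returns False can typically be fixed with chmod u+rx on the file in bin/.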
Add your process in the main.nf. It can take any name (which is not necessarily the same name as the software that will be called on command line) provided it follows the :ref:`overview-naming`.
    process execBinScript {
      label 'onlyLinux'
      label 'minMem'
      label 'minCpu'

      publishDir "${params.outDir}/execBinScript", mode: 'copy'

      output:
      file "execBinScriptResults_*"

      script:
      """
      apMyscript.sh > execBinScriptResults_1.txt
      someScript.sh > execBinScriptResults_2.txt
      """
    }
Note
apMyscript.sh is named with the ap prefix since it has been developed for the pipeline, while someScript.sh does not have this prefix as it is a third-party script (see :ref:`overview-naming`).
You have nothing to do, the install process will build the recipes and images for you.
First, retrieve the source code and add it in a subdirectory of the modules/fromSource directory. Create the modules/fromSource directory if needed. For example, add the source code of the helloWorld tool in the modules/fromSource/helloWorld directory. This directory can be added as a |gitsubmodule|_ (see this tutorial).
Then comes the tricky part. Add in the file :download:`modules/fromSource/CMakeLists.txt <../data/modules/fromSource/CMakeLists.txt>` the |cmakeexternalproject|_ function from |cmake|_.
    ExternalProject_Add(
      helloWorld
      SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/helloWorld
      CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${CMAKE_BINARY_DIR}/externalProject/bin)
Important
Always use the variable ${CMAKE_CURRENT_SOURCE_DIR} in the SOURCE_DIR directive, for example SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/helloWorld.
Always install the binary in ${CMAKE_BINARY_DIR}/externalProject/bin (note that CMAKE_BINARY_DIR is actually the build directory you have created to configure and build the pipeline, see :ref:`install-page`).
Important
Always create another CMakeLists.txt file in the folder which stores the source code of the tool. For example, create the modules/fromSource/helloWorld/CMakeLists.txt file which will explain how the source code must be installed. Depending on the source code you added, refer to the |cmake|_ documentation to correctly write the CMakeLists.txt file.
Note
Installation from source code offers great flexibility as the software developer can control everything during the installation process. However, this obviously requires more configuration. In particular, the software developer has to be fluent with |cmake|_ in order to tackle specific use cases, see :ref:`from-source-examples-page` for more details.
The label must have the same name as the directory in which you added the source code, for example helloWorld. The label must not contain the prefix renv which is reserved for a tool with :ref:`process-renv`.
Add your process in the main.nf. It can take any name (which is not necessarily the same name as the software that will be called on command line) provided it follows the :ref:`overview-naming`.
    process helloWorld {
      label 'helloWorld'
      label 'minMem'
      label 'minCpu'

      publishDir "${params.outDir}/helloWorld", mode: 'copy'

      output:
      file "helloWorld.txt" into helloWorldOutputCh

      script:
      """
      helloWorld > helloWorld.txt
      """
    }
You have nothing to do, the install process will build the recipes and images for you.
If you need to :ref:`customcmd-page`, this can be done using the following scopes associated with the label of the tool:
- params.geniac.containers.cmd.post: to define commands which will be executed at the end of the default commands generated by geniac.
- params.geniac.containers.cmd.envCustom: to define environment variables which will be set inside the docker and singularity images.
The renv package helps you to create reproducible environments for your R projects. The renv.lock lockfile records the state of your project’s private library, and can be used to restore the state of that library as required. geniac can use a renv.lock lockfile to install all the package dependencies needed by your R environment.
You will need to:
- create the conda recipe in the folder recipes/conda, which defines the R version you want to use.
- add the label with the three scopes yml, env and bioc in the section params.geniac.tools of the file conf/geniac.config.
- copy the renv.lock file in a subfolder with the name of the label inside the folder recipes/dependencies/.
The label directive must have the exact same name as given in the params.geniac.tools
section. The label must contain the prefix renv.
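The requirements for a renv tool (the renv prefix, the three scopes and the location of the lockfile) can be summarised in a small check. This is a sketch under stated assumptions: the function name is hypothetical, the label renvExample used in the note below is invented for illustration, and the scopes are passed in as a plain list of names rather than read from conf/geniac.config.

```python
from pathlib import PurePosixPath

# Scopes expected under the label in params.geniac.tools.
REQUIRED_SCOPES = ("yml", "env", "bioc")


def check_renv_tool(label, scopes):
    """Return the expected renv.lock location, or raise ValueError.

    label  -- label declared in params.geniac.tools
    scopes -- names of the scopes defined under that label
    """
    if not label.startswith("renv"):
        raise ValueError(f"label {label!r} must start with 'renv'")
    missing = [scope for scope in REQUIRED_SCOPES if scope not in scopes]
    if missing:
        raise ValueError(f"label {label!r} lacks scopes: {', '.join(missing)}")
    return PurePosixPath("recipes/dependencies") / label / "renv.lock"
```

With the hypothetical label renvExample and all three scopes defined, the function returns recipes/dependencies/renvExample/renv.lock, which is where the lockfile is expected.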
Adding a tool with R packages using renv requires two processes to be defined. Therefore, the complete guidelines are described in the section :ref:`renv-page`.
You have nothing to do, the install process will build the recipes and images for you.
If you need to :ref:`customcmd-page`, this can be done using the following scopes associated with the label of the tool:
- params.geniac.containers.cmd.post: to define commands which will be executed at the end of the default commands generated by geniac.
- params.geniac.containers.cmd.envCustom: to define environment variables which will be set inside the docker and singularity images.
Create a folder in recipes/dependencies/ with the label of your tool, for example recipes/dependencies/alpine. Add your installer file (deb, rpm or whatever) to this folder, along with any other files that could be needed, especially to build the container.
Choose any name you want.
Add your process in the main.nf. It can take any name (which is not necessarily the same name as the software that will be called on command line) provided it follows the :ref:`overview-naming`.
    process alpine {
      label 'alpine'
      label 'minMem'
      label 'minCpu'

      publishDir "${params.outDir}/alpine", mode: 'copy'

      input:
      val x from oneToFiveCh

      output:
      file "alpine_*"

      script:
      """
      source ${projectDir}/env/alpine.env
      echo "Hello from alpine: \$(date). This is very high here: \${PEAK_HEIGHT}!" > alpine_${x}.txt
      """
    }
This is the only case where you will have to write the recipe yourself. The recipe should have the same name as the label, with the suffix .def for singularity and .Dockerfile for docker. Save your recipes in the folders recipes/singularity and recipes/docker respectively. For example, the alpine.def recipe looks like this:
    Bootstrap: docker
    From: alpine:3.7

    %setup
        mkdir -p ${SINGULARITY_ROOTFS}/opt

    %files
        alpine/myDependency.sh /opt/myDependency.sh

    %post
        apk update
        apk add bash
        bash /opt/myDependency.sh

    %environment
        export LC_ALL=C
        export PATH=/usr/games:$PATH
The alpine.Dockerfile recipe looks like this:

    FROM alpine:3.7

    RUN mkdir -p /opt
    ADD alpine/myDependency.sh /opt/myDependency.sh

    RUN apk update
    RUN apk add bash
    RUN bash /opt/myDependency.sh

    ENV LC_ALL C
    ENV PATH /usr/games:$PATH
Important
As your recipe will very likely depend on files you added, for example in the recipes/dependencies/alpine directory, you can just mention the name of the files in the %files section for singularity or with the ADD directive for docker, provided that the path includes the name of the label, for example alpine/myDependency.sh.
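The file layout described for a custom installation can be summarised as a function of the label. The helper below is an illustrative sketch with a hypothetical name: it only builds the expected paths from the conventions stated in this section and does not touch the file system.

```python
from pathlib import PurePosixPath


def custom_recipe_layout(label, dependency_files=()):
    """Expected locations for a tool with a custom install procedure."""
    return {
        # Recipes are named after the label, one per container engine.
        "singularity": PurePosixPath("recipes/singularity") / f"{label}.def",
        "docker": PurePosixPath("recipes/docker") / f"{label}.Dockerfile",
        # Paths as written in %files or ADD: <label>/<file>.
        "recipe_references": [f"{label}/{name}" for name in dependency_files],
        # Where the same files are stored in the repository.
        "dependency_paths": [
            PurePosixPath("recipes/dependencies") / label / name
            for name in dependency_files
        ],
    }
```

For the alpine example, custom_recipe_layout("alpine", ["myDependency.sh"]) lists recipes/singularity/alpine.def, recipes/docker/alpine.Dockerfile and the reference alpine/myDependency.sh used in both recipes.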
prerequisite
If the environment variable will be used by several processes, add it in the conf/base.config file in the env scope as follows:

    env {
      MY_GLOBAL_VAR = "someValue"
    }
example
The script apMyscript.sh uses MY_GLOBAL_VAR:

    #! /bin/bash

    echo "This is a script I have developed for the pipeline."
    echo "MY_GLOBAL_VAR: ${MY_GLOBAL_VAR}"
This script is called in the following process:
    process execBinScript {
      label 'onlyLinux'
      label 'minMem'
      label 'minCpu'

      publishDir "${params.outDir}/execBinScript", mode: 'copy'

      output:
      file "execBinScriptResults_*"

      script:
      """
      apMyscript.sh > execBinScriptResults_1.txt
      someScript.sh > execBinScriptResults_2.txt
      """
    }
prerequisite
Add a file with the name of your process and the extension .env in the folder env/. For example, add env/alpine.env:

    #!/bin/bash

    # required environment variables for alpine
    PEAK_HEIGHT="4810m"
    export PEAK_HEIGHT
example
In your process, source the env/alpine.env file and then use the variable you defined:

    process alpine {
      label 'alpine'
      label 'minMem'
      label 'minCpu'

      publishDir "${params.outDir}/alpine", mode: 'copy'

      input:
      val x from oneToFiveCh

      output:
      file "alpine_*"

      script:
      """
      source ${projectDir}/env/alpine.env
      echo "Hello from alpine: \$(date). This is very high here: \${PEAK_HEIGHT}!" > alpine_${x}.txt
      """
    }
Everything related to processes is defined in conf/process.config.
You can define generic labels for both CPU and memory (as you wish) in the file conf/process.config. For example:

    withLabel: minCpu {
      cpus = 1
    }
    withLabel: lowCpu {
      cpus = 2
    }
    withLabel: medCpu {
      cpus = 4
    }
    withLabel: highCpu {
      cpus = 8
    }
    withLabel: extraCpu {
      cpus = 16
    }
    withLabel: minMem {
      memory = 1.GB
    }
    withLabel: lowMem {
      memory = 2.GB
    }
    withLabel: medMem {
      memory = 8.GB
    }
    withLabel: highMem {
      memory = 16.GB
    }
    withLabel: extraMem {
      memory = 32.GB
    }
Warning
Note that you must use a multi-line format as shown above, otherwise the linter :ref:`cli-page` will throw an error.
Then, in any process, you can just set any label you need. For example:
    process execBinScript {
      label 'onlyLinux'
      label 'minMem'
      label 'minCpu'

      publishDir "${params.outDir}/execBinScript", mode: 'copy'

      output:
      file "execBinScriptResults_*"

      script:
      """
      apMyscript.sh > execBinScriptResults_1.txt
      someScript.sh > execBinScriptResults_2.txt
      """
    }
To optimize the resources used on a computing cluster, you may want to finely tune the CPU and memory requested by the process. To do so, define the process selector withName in the file conf/process.config for your process of interest. For example:

    withName:outputDocumentation {
      memory = { checkMax( 100.MB, 'memory' ) }
    }
Tip
To assess the amount of resources used by your process, refer to the Metrics section of the |nextflow|_ documentation.
Use the publishDir directive with the ${params.outDir} parameter and organize your results as you wish. For example:

    publishDir "${params.outDir}/execBinScript", mode: 'copy'