- How can I use geniac on an existing repository?
- What does the repository look like?
- How can I install custom commands in the docker/singularity recipes automatically generated by geniac?
- How can I use a custom docker registry to build the containers?
- How can I write the config files for the different nextflow profiles?
- How should I define the path to the genome annotations?
- How can I pass specific options to run docker or singularity containers?
- How can a conda environment be activated in docker or singularity containers?
- How can I see the recipes for the containers?
- How can I generate all the files automatically created by geniac without installing the pipeline?
- Is geniac compatible with nextflow DSL2?
- How can a process have a label which is defined by a variable?
- What are the @git_*@ variables?
- Why does the conda profile fail to build its environment or take too much time?
- Why are the tools available from source installed in pipeline/bin/fromSource and not in pipeline/bin?
- What privileges do I need to build the singularity images?
- What is the difference between singularity and apptainer?
The structure of the repository is based on |nfcore|_ and additional files and folders are expected.
All the resources for geniac are available here:
- |geniacdoc|_
- |geniacrepo|_
- |geniacdemo|_
- |geniacdemodsl2|_
- |geniactemplate|_
- Example: :download:`useCases.bash <../data/useCases.bash>`
Follow the guidelines below if you want to use geniac on an existing repository.
The guidelines and additional utilities we developed for geniac should be located in a folder named ``geniac`` in your repository. The utilities in the ``geniac`` folder can either be copied or linked to your pipeline repository as a |gitsubmodule|_.
Note
If geniac is used as a submodule in your repository, execute the command ``git submodule update --init --recursive`` once you have created the ``geniac`` submodule, otherwise the ``geniac`` folder will remain empty.
If you want to create a submodule, you can edit and modify the variables in the file :download:`createSubmodule.bash <../data/createSubmodule.bash>` and follow the procedure.
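As a quick sanity check for the situation described in the note, the following sketch reports whether the ``geniac`` folder of a repository is missing or empty. The function ``check_geniac_folder`` is a hypothetical helper, not part of geniac, and it is only exercised here on a temporary directory.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of geniac): succeed only if the geniac
# folder of the given repository exists and contains at least one entry.
check_geniac_folder() {
  local repo_dir="$1"
  local geniac_dir="${repo_dir}/geniac"

  if [ ! -d "${geniac_dir}" ]; then
    echo "missing: ${geniac_dir}"
    return 1
  fi
  # ls -A lists every entry except . and .. ; empty output means empty folder
  if [ -z "$(ls -A "${geniac_dir}")" ]; then
    echo "empty: run 'git submodule update --init --recursive' in ${repo_dir}"
    return 1
  fi
  echo "ok: ${geniac_dir}"
}

# Demonstration on a throwaway directory
demo_repo="$(mktemp -d)"
mkdir "${demo_repo}/geniac"
check_geniac_folder "${demo_repo}" || true    # empty folder, as after a plain clone
touch "${demo_repo}/geniac/CMakeLists.txt"
check_geniac_folder "${demo_repo}"            # now reported as ok
```

In a real repository, the function would be called with the path of the pipeline repository instead of the temporary directory.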
The following files are mandatory:
- :download:`CMakeLists.txt <../data/modules/fromSource/CMakeLists.txt>`: create a folder named ``modules/fromSource`` and copy this file inside if you need to :ref:`process-source-code`. Check that the file is named ``CMakeLists.txt``.
- :download:`geniac.config <../data/conf/geniac.config>`: copy the file in the folder ``conf``. This file contains a scope named ``geniac`` that defines all the nextflow variables needed to build, deploy and run the pipeline.
Moreover, depending on your use case when you :ref:`process-page`, you can create the following folders whenever you need them::

   ├── env
   ├── modules
   └── recipes
       ├── conda
       ├── dependencies
       ├── docker
       └── singularity
The source code of your repository should look like this::

   ├── assets                       # assets needed for runtime
   ├── bin                          # scripts or binaries for the pipeline
   ├── conf                         # configuration files for the pipeline
   │   ├── geniac.config            # contains the scopes mandatory for geniac
   ├── docs                         # documentation of the pipeline
   ├── env                          # process specific environment variables
   ├── geniac                       # geniac utilities
   │   ├── cmake                    # source files for the configuration step
   │   ├── docs                     # guidelines for installation
   │   ├── install                  # scripts for the build step
   ├── main.nf
   ├── modules
   │   └── fromSource               # tools installed from source code
   │       ├── CMakeLists.txt
   │       └── helloWorld
   ├── nextflow.config
   ├── recipes                      # installation recipes for the tools
   │   ├── conda
   │   ├── dependencies
   │   ├── docker
   │   └── singularity
   └── test                         # data to test the pipeline
       └── data
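The geniac-specific part of this layout can be prepared with a few shell commands. The sketch below is only an illustration: the helper names ``create_geniac_layout`` and ``list_missing_files`` are hypothetical, and the demonstration runs in a temporary directory rather than in a real pipeline repository.

```shell
#!/usr/bin/env bash

# Hypothetical helper: create the geniac-related folders inside a repository.
create_geniac_layout() {
  local repo_dir="$1"
  mkdir -p \
    "${repo_dir}/conf" \
    "${repo_dir}/env" \
    "${repo_dir}/modules/fromSource" \
    "${repo_dir}/recipes/conda" \
    "${repo_dir}/recipes/dependencies" \
    "${repo_dir}/recipes/docker" \
    "${repo_dir}/recipes/singularity"
}

# Hypothetical helper: print the files that still have to be copied
# (CMakeLists.txt is only needed when tools are installed from source).
# Returns non-zero if at least one file is missing.
list_missing_files() {
  local repo_dir="$1"
  local status=0
  local f
  for f in conf/geniac.config modules/fromSource/CMakeLists.txt; do
    if [ ! -f "${repo_dir}/${f}" ]; then
      echo "${f}"
      status=1
    fi
  done
  return "${status}"
}

# Demonstration on a throwaway directory
layout_repo="$(mktemp -d)"
create_geniac_layout "${layout_repo}"
list_missing_files "${layout_repo}" || echo "copy the files listed above before running cmake"
```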
How can I install custom commands in the docker/singularity recipes automatically generated by geniac?
For some tools, it might be necessary to add custom commands in the docker/singularity recipes automatically generated by geniac. In the ``conf/geniac.config`` file, you can use:

- ``params.geniac.containers.cmd.post``: to define commands which will be executed at the end of the default commands generated by geniac.
- ``params.geniac.containers.cmd.envCustom``: to define environment variables which will be set inside the docker and singularity images.
For more details on how to proceed, see :ref:`customcmd-page`.
Geniac automatically generates recipes for Docker and Singularity. To build the containers, it bootstraps on two docker containers from the official |4geniac|_ docker hub registry which includes several :ref:`linux-page`. Instead of using the official docker hub registry, you may want to use a custom registry. In this case, make sure that the exact same container tags available on |4geniac|_ are available on this custom registry and use the following option at the configuration step::

   cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -Dap_docker_registry=my-registry-url/

For example, ``my-registry-url`` could be a docker registry available in your gitlab.
The utilities we propose allow the automatic generation of all the config files for the nextflow :ref:`run-profiles`. However, if you really want to write them yourself, follow the examples described in :ref:`profiles-page`.
When the pipeline is installed with geniac, the :ref:`install-structure-dir-tree` contains a directory named ``annotations``. This directory can be a symlink to the directory with your existing annotations (this can be set during :ref:`install-configure` with the option ``ap_annotation_path``). Check that:
- The file :download:`geniac.config <../data/conf/geniac.config>` defines the ``genomeAnnotationPath`` in the scope ``params`` as follows::

     params {
       genomeAnnotationPath = params.genomeAnnotationPath ?: "${projectDir}/../annotations"
     }

- All the paths to your annotations are defined using the variable ``params.genomeAnnotationPath`` as shown in the file :download:`genomes.config <../data/conf/genomes.config>`
- You use the variables defined in the :download:`genomes.config <../data/conf/genomes.config>` in the ``main.nf``, for example ``params.genomes['mm10'].fasta``
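To see how the default value ``${projectDir}/../annotations`` resolves when ``annotations`` is a symlink, the following sketch rebuilds a tiny install tree in a temporary directory. All paths and the ``genome.fa`` file are hypothetical stand-ins; in a real installation the symlink is handled at the configuration step rather than by hand.

```shell
#!/usr/bin/env bash

# All paths below are hypothetical and created in a temporary directory.
work_dir="$(mktemp -d)"
existing_annotations="${work_dir}/shared/annotations"   # stands in for your existing data
install_dir="${work_dir}/install"                       # stands in for ${INSTALL_DIR}

mkdir -p "${existing_annotations}/mm10" "${install_dir}/pipeline"
touch "${existing_annotations}/mm10/genome.fa"

# The annotations directory of the install tree is a symlink to the existing data
ln -s "${existing_annotations}" "${install_dir}/annotations"

# Same default as in geniac.config: "${projectDir}/../annotations",
# where projectDir is the folder that contains main.nf
project_dir="${install_dir}/pipeline"
genome_annotation_path="${project_dir}/../annotations"

# The annotation file is reachable through the default path
ls "${genome_annotation_path}/mm10"
```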
If needed, you can set the ``singularityRunOptions`` and ``dockerRunOptions`` values to whatever is needed for your configuration in the ``geniac.config`` file. This will set the ``runOptions`` setting (see Nextflow configuration) of the |singularity|_ and |docker|_ scopes respectively to the selected value when the |singularity|_ and |docker|_ profiles are used.
In order to activate a conda environment with ``conda activate myConda_env`` when a docker or singularity container is executed, some configuration is required inside the recipes. This is described in :ref:`conda-page`.
As geniac automatically generates the recipes of the containers, they are not available in the git repository. However, they can be easily retrieved in several ways. An example to :ref:`install-generate-recipes` is provided.
There are several ways to generate the files automatically created by geniac without installing the pipeline:

- either using make commands: see :ref:`install-target-config` and :ref:`install-target-containers`,
- or using Geniac CLI: see :ref:`cli-configs` and :ref:`cli-recipes`.
Since version 20.07.1, |nextflow|_ provides the DSL2 syntax that allows the definition of module libraries and simplifies the writing of complex data analysis pipelines. geniac is fully compatible with DSL2 and we provide |geniacdemodsl2|_ as an example. The guidelines to :ref:`process-page` remain exactly the same.
The main differences between |geniacdemo|_ and |geniacdemodsl2|_ are:

- each process is located in one dedicated file in the folder ``nf-modules/local/process``
- each subworkflow that combines different processes is located in the folder ``nf-modules/local/subworkflow``
- the ``main.nf`` includes these two folders and uses the ``workflow`` directive
::

   ├── assets                       # assets needed for runtime
   ├── bin                          # scripts or binaries for the pipeline
   ├── conf                         # configuration files for the pipeline
   │   ├── geniac.config            # contains the geniac scope mandatory for nextflow
   ├── docs                         # documentation of the pipeline
   ├── env                          # process specific environment variables
   ├── geniac                       # geniac utilities
   │   ├── cmake                    # source files for the configuration step
   │   ├── docs                     # guidelines for installation
   │   ├── install                  # scripts for the build step
   ├── main.nf
   ├── modules                      # tools installed from source code
   │   ├── CMakeLists.txt
   │   ├── helloWorld
   ├── nextflow.config
   ├── nf-modules                   # nextflow files for DSL2
   │   └── local
   │       ├── process
   │       │   ├── alpine.nf
   │       │   ├── checkDesign.nf
   │       │   ├── execBinScript.nf
   │       │   ├── fastqc.nf
   │       │   ├── getSoftwareVersions.nf
   │       │   ├── helloWorld.nf
   │       │   ├── multiqc.nf
   │       │   ├── outputDocumentation.nf
   │       │   ├── standardUnixCommand.nf
   │       │   ├── trickySoftware.nf
   │       │   └── workflowSummaryMqc.nf
   │       └── subworkflow
   │           ├── myWorkflow0.nf
   │           └── myWorkflow1.nf
   ├── recipes                      # installation recipes for the tools
   │   ├── conda
   │   ├── dependencies
   │   ├── docker
   │   └── singularity
   └── test                         # data to test the pipeline
       └── data
The |geniacdemodsl2|_ can be run as follows::

   export WORK_DIR="${HOME}/tmp/myPipelineDSL2"
   export SRC_DIR="${WORK_DIR}/src"
   export INSTALL_DIR="${WORK_DIR}/install"
   export BUILD_DIR="${WORK_DIR}/build"
   export GIT_URL="https://github.com/bioinfo-pf-curie/geniac-demo-dsl2.git"

   mkdir -p ${INSTALL_DIR} ${BUILD_DIR}

   # clone the repository
   # the option --recursive is needed if you use geniac as a submodule
   # the option --remote-submodules will pull the last geniac version
   # using the release branch from https://github.com/bioinfo-pf-curie/geniac
   git clone --remote-submodules --recursive ${GIT_URL} ${SRC_DIR}

   cd ${BUILD_DIR}
   cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR}
   make
   make install

   cd ${INSTALL_DIR}/pipeline

   nextflow -c conf/test.config run main.nf -profile multiconda
With nextflow, it is possible to define a label using a variable instead of a fixed string. In this case, the label value must be given in parentheses. geniac also supports such labels. However, the label must be defined according to the following format: ``label (params.someValue ?: 'toolPrefix')``. In any case, the content of ``params.someValue`` must start with the ``toolPrefix`` value. The geniac linter will check that there is a tool with a name starting with such a prefix; if this is not the case, it will throw an error.
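The prefix rule can be illustrated with a small shell check. This is a sketch of the rule as stated above, not the implementation of the geniac linter; the function name ``label_matches_prefix`` and the sample values are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical check (not the geniac linter): succeed if the label value
# starts with the tool prefix.
label_matches_prefix() {
  local label_value="$1"
  local tool_prefix="$2"
  case "${label_value}" in
    "${tool_prefix}"*) return 0 ;;   # quoted prefix is literal, * matches the rest
    *) return 1 ;;
  esac
}

# Sample values for params.someValue, tested against the prefix 'toolPrefix'
for value in toolPrefix toolPrefix_v2 v2; do
  if label_matches_prefix "${value}" "toolPrefix"; then
    echo "${value}: accepted"
  else
    echo "${value}: rejected"
  fi
done
```

A value such as ``v2`` alone is rejected because it does not start with the prefix, whereas ``toolPrefix_v2`` is accepted.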
A typical use case is the possibility to launch a pipeline with a version of a tool given as an option on the nextflow command line. Suppose that you have declared three versions of the ``mySoft`` tool in the ``geniac.config`` file as follows::

   params {
     geniac {
       tools {
         mySoft = "conda-forge::mySoft=v0=r351h96ca727_1003"
         mySoft_v1 = "conda-forge::mySoft=v1=r351h96ca727_1003"
         mySoft_v2 = "conda-forge::mySoft=v2=r351h96ca727_1003"
       }
     }
   }
Then, in the nextflow process, define the label as follows::

   process mySoft {
     label (params.mySoftVersion ?: 'mySoft')
     label 'minMem'
     label 'minCpu'

     script:
     """
     mySoft --version
     """
   }
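When you launch nextflow, pass the option ``--mySoftVersion`` to set which version of ``mySoft`` you want to use. The option name must match the parameter used in the label (nextflow parameters are case-sensitive) and its value must be the name of one of the tools declared in ``geniac.config``::

   nextflow run main.nf --mySoftVersion mySoft_v2 -profile test,singularity

You may also write your nextflow code to use the default version (i.e. ``v0`` with the ``mySoft`` label) if no version is specified.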
You will find in both the ``main.nf`` and ``nextflow.config`` some variables surrounded by ``@`` such as ``@git_repo_name@``. During the ``cmake`` step, the corresponding information is extracted from the git repository and each variable is replaced with its value. These variables are used in the nextflow manifest, for example. If needed, you can remove these variables and set the values to whatever you want.
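The effect of this substitution can be mimicked with ``sed``. The sketch below is an illustration only: geniac performs the replacement during the ``cmake`` step, and the file name ``nextflow.config.in``, the template content and the value ``myPipeline`` are assumptions made for the example.

```shell
#!/usr/bin/env bash

# Illustration only: geniac performs this substitution during the cmake step.
# The template content and the replacement value are hypothetical.
template_dir="$(mktemp -d)"
cat > "${template_dir}/nextflow.config.in" <<'EOF'
manifest {
  name = '@git_repo_name@'
}
EOF

git_repo_name="myPipeline"

# Replace the @git_repo_name@ placeholder with its value
sed "s|@git_repo_name@|${git_repo_name}|g" \
  "${template_dir}/nextflow.config.in" > "${template_dir}/nextflow.config"

cat "${template_dir}/nextflow.config"
```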
The :ref:`run-profile-conda` relies on the ``environment.yml`` that is automatically generated by geniac. However, building a |conda| recipe can sometimes be very tricky as the order of the channels and the dependencies matters. geniac cannot guess what the appropriate order is. Moreover, |conda| may have to resolve conflicts between incompatible packages. Thus, in some cases, you will have no choice but to correct the ``environment.yml`` file manually, add it to the git repository (in the folder where the ``main.nf`` file is located) and install the pipeline with the following option::

   cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -Dap_keep_envyml_from_source=ON

Note that it may be impossible to obtain a working ``environment.yml`` file due to incompatibilities between tools. In that case, use the :ref:`run-profile-multiconda` profile instead of the :ref:`run-profile-conda` profile.
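For the manual correction, a hand-written ``environment.yml`` could look like the one created by the sketch below. The environment name, the channel list and the pinned tools are assumptions chosen for illustration, and the file is written to a temporary directory that stands in for the folder containing ``main.nf``. In conda, channels listed first take priority over those listed after them.

```shell
#!/usr/bin/env bash

# Hypothetical repository location; in practice this is the folder with main.nf
src_dir="$(mktemp -d)"

# Hand-written environment.yml: channels are listed from highest to lowest priority.
# The name and the dependencies are examples, not values generated by geniac.
cat > "${src_dir}/environment.yml" <<'EOF'
name: myPipeline_env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - fastqc=0.11.9
  - multiqc=1.9
EOF

cat "${src_dir}/environment.yml"
```

Once the file is committed next to ``main.nf``, the pipeline is configured with ``-Dap_keep_envyml_from_source=ON`` so that this file is kept.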
Why are the tools available from source installed in pipeline/bin/fromSource and not in pipeline/bin?
The tools available from source are installed in ``pipeline/bin/fromSource`` to ensure that, when using the singularity or docker profiles, the tools installed inside the containers are used. Indeed, nextflow adds the folder ``pipeline/bin`` to the environment variable ``PATH`` when the container is launched. To illustrate the impact of this setting, assume that the pipeline has been installed with the singularity images in ``${INSTALL_DIR}/pipeline``. Then, create the file ``${INSTALL_DIR}/pipeline/bin/helloWorld`` which contains the following bash script::

   #! /bin/bash
   echo "Buenos dias!"
Make this bash script executable with ``chmod +x ${INSTALL_DIR}/pipeline/bin/helloWorld`` and execute the pipeline as follows::

   nextflow run main.nf -profile test,singularity
Then, the file ``${INSTALL_DIR}/pipeline/results/helloWorld/helloWorld.txt`` will contain ``Buenos dias!`` instead of ``Hello World!``. This means that singularity uses the bash script in ``${INSTALL_DIR}/pipeline/bin/helloWorld`` instead of the ``helloWorld`` executable which has been installed inside the image. This raises a reproducibility issue, which is avoided by installing the tools in the folder ``pipeline/bin/fromSource``, which is not in the ``PATH``.
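The shadowing effect can be reproduced without nextflow or singularity. In the sketch below, two throwaway folders play the roles of the tools installed inside the image and of ``pipeline/bin``; the folder names are assumptions, and the ``PATH`` is modified by hand to mimic what happens when the container is launched.

```shell
#!/usr/bin/env bash

# Two throwaway folders: one plays the role of the tools installed inside the
# container image, the other the role of pipeline/bin (both hypothetical).
sandbox="$(mktemp -d)"
mkdir -p "${sandbox}/image/bin" "${sandbox}/pipeline/bin"

printf '#!/bin/bash\necho "Hello World!"\n' > "${sandbox}/image/bin/helloWorld"
printf '#!/bin/bash\necho "Buenos dias!"\n' > "${sandbox}/pipeline/bin/helloWorld"
chmod +x "${sandbox}/image/bin/helloWorld" "${sandbox}/pipeline/bin/helloWorld"

# Only the image folder is in the PATH: the tool from the image is found
image_only="$(PATH="${sandbox}/image/bin:${PATH}"; helloWorld)"

# pipeline/bin placed ahead in the PATH: the script there shadows the tool
shadowed="$(PATH="${sandbox}/pipeline/bin:${sandbox}/image/bin:${PATH}"; helloWorld)"

echo "image only: ${image_only}"
echo "with pipeline/bin first: ${shadowed}"
```

A tool placed in a subfolder such as ``pipeline/bin/fromSource`` is not found through the ``PATH`` lookup, so it cannot shadow the executable of the image in this way.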
There are several ways to build the singularity images, depending on your privileges:

- Using Geniac CLI (see :ref:`cli-singularity-build`):

  - if you have the sudo privileges, use the ``singularity`` mode,
  - if you are allowed to use the fakeroot option, use the ``singularityfakeroot`` mode.

- Using standard cmake options:

  - if you have the sudo privileges, pass the option ``-Dap_install_singularity_images=ON`` to cmake, and then run ``sudo make`` (see :ref:`install-run-singularity`),
  - if you are allowed to use the fakeroot option, pass both options ``-Dap_install_singularity_images=ON`` and ``-Dap_singularity_build_options=--fakeroot`` to cmake, and then run ``make``.
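The two cmake variants above can be summarised in a small helper that prints the commands to run for a given privilege mode. This is only a convenience sketch: the function names and the ``mode`` variable are hypothetical, and the commands are printed rather than executed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the cmake options listed above for a given
# privilege mode ("sudo" or "fakeroot").
singularity_cmake_options() {
  case "$1" in
    sudo)
      echo "-Dap_install_singularity_images=ON"
      ;;
    fakeroot)
      echo "-Dap_install_singularity_images=ON -Dap_singularity_build_options=--fakeroot"
      ;;
    *)
      echo "unknown mode: $1 (expected sudo or fakeroot)" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: print the matching build command.
singularity_make_command() {
  case "$1" in
    sudo) echo "sudo make" ;;
    fakeroot) echo "make" ;;
    *) return 1 ;;
  esac
}

# Print (do not run) the commands for the fakeroot case
mode="fakeroot"
echo "cmake \${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=\${INSTALL_DIR} $(singularity_cmake_options "${mode}")"
singularity_make_command "${mode}"
```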
In May 2021, Sylabs, the commercial entity behind Singularity, forked the project. The original Singularity repository has been moved to https://github.com/apptainer/singularity which will persist as an archive and be set to read-only after the first release of Apptainer (https://github.com/apptainer/apptainer). Apptainer will provide ``singularity`` as a command line link and will maintain as much of the CLI and environment functionality as possible. From the user's perspective, very little, if anything, will change.