FAQ

The structure of the repository is based on |nfcore|_, but geniac expects additional files and folders.

All the resources for geniac are available here:

Follow the guidelines below if you want to use geniac on an existing repository.

Create the folder geniac

The guidelines and additional utilities we developed for geniac must be located in a folder named geniac in your repository. They can either be copied into your pipeline repository or linked to it as a |gitsubmodule|_.

Note

If geniac is used as a submodule in your repository, run the command git submodule update --init --recursive once the geniac submodule has been created; otherwise, the geniac folder will remain empty.

If you want to create a submodule, edit the variables in the file :download:`createSubmodule.bash <../data/createSubmodule.bash>` and follow the procedure.

Create additional files and folders

The following files are mandatory:

Moreover, depending on your use case when you :ref:`process-page`, you can create the following folders whenever you need them:

├── env
├── modules
└── recipes
    ├── conda
    ├── dependencies
    ├── docker
    └── singularity
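The optional folders above can be created in one go from the root of your repository. This is a minimal sketch: PIPELINE_DIR is a hypothetical variable that defaults to the current directory, and you only need the folders that match your use case.

```shell
#!/bin/sh
# Create the optional geniac folders at the root of the pipeline repository.
# PIPELINE_DIR is a hypothetical helper variable (defaults to the current directory).
PIPELINE_DIR="${PIPELINE_DIR:-.}"

for dir in env modules \
           recipes/conda recipes/dependencies recipes/docker recipes/singularity; do
  mkdir -p "${PIPELINE_DIR}/${dir}"
done
```

Note that git does not track empty directories, so a folder only shows up in the repository once it contains at least one file.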

The source code of your repository should look like this:

├── assets                       # assets needed for runtime
├── bin                          # scripts or binaries for the pipeline
├── conf                         # configuration files for the pipeline
│   ├── geniac.config            # contains the scopes mandatory for geniac
├── docs                         # documentation of the pipeline
├── env                          # process specific environment variables
├── geniac                       # geniac utilities
│   ├── cmake                    # source files for the configuration step
│   ├── docs                     # guidelines for installation
│   ├── install                  # scripts for the build step
├── main.nf
├── modules
│   └── fromSource               # tools installed from source code
│       ├── CMakeLists.txt
│       └── helloWorld
├── nextflow.config
├── recipes                      # installation recipes for the tools
│   ├── conda
│   ├── dependencies
│   ├── docker
│   └── singularity
└── test                         # data to test the pipeline
    └── data

For some tools, it might be necessary to add custom commands to the docker/singularity recipes automatically generated by geniac. In the conf/geniac.config file, you can use:

  • params.geniac.containers.cmd.post: to define commands which will be executed at the end of the default commands generated by geniac.
  • params.geniac.containers.cmd.envCustom: to define environment variables which will be set inside the docker and singularity images.

For more details on how to proceed, see :ref:`customcmd-page`.

Geniac automatically generates recipes for Docker and Singularity. To build the containers, it bootstraps from two Docker images available on the official |4geniac|_ Docker Hub registry, which provides several :ref:`linux-page`. Instead of the official Docker Hub registry, you may want to use a custom registry. In this case, make sure that exactly the same image tags as on |4geniac|_ are available on this custom registry, and use the following option at the configuration step:

cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -Dap_docker_registry=my-registry-url/

For example, my-registry-url could be a docker registry available in your gitlab.

The utilities we propose allow the automatic generation of all the config files for the nextflow :ref:`run-profiles`. However, if you really want to write them yourself, follow the examples described in :ref:`profiles-page`.

When the pipeline is installed with geniac, the :ref:`install-structure-dir-tree` contains a directory named annotations. This directory can be a symlink to the directory containing your existing annotations (it can be set during :ref:`install-configure` with the option ap_annotation_path). Check that:

  1. The file :download:`geniac.config <../data/conf/geniac.config>` defines the genomeAnnotationPath in the scope params as follows:

     params {
       genomeAnnotationPath = params.genomeAnnotationPath ?: "${projectDir}/../annotations"
     }

  2. All the paths to your annotations are defined using the variable params.genomeAnnotationPath, as shown in the file :download:`genomes.config <../data/conf/genomes.config>`.
  3. You use the variables defined in :download:`genomes.config <../data/conf/genomes.config>` in the main.nf, for example params.genomes['mm10'].fasta.
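If ap_annotation_path was not set at the configuration step, the symlink can also be created by hand after installation. The sketch below assumes the installation layout described above (${INSTALL_DIR}/pipeline next to ${INSTALL_DIR}/annotations). EXISTING_ANNOTATIONS is a hypothetical variable pointing to your annotation directory; both variables fall back to temporary directories so that the snippet can be tried safely.

```shell
#!/bin/sh
# Point ${INSTALL_DIR}/annotations to an existing annotation directory.
# The mktemp fallbacks are only for a dry run; set both variables for real use.
INSTALL_DIR="${INSTALL_DIR:-$(mktemp -d)}"
EXISTING_ANNOTATIONS="${EXISTING_ANNOTATIONS:-$(mktemp -d)}"   # hypothetical variable

mkdir -p "${INSTALL_DIR}/pipeline"
link="${INSTALL_DIR}/annotations"

if [ -e "${link}" ] && [ ! -L "${link}" ]; then
  # Do not clobber a real annotations directory
  echo "error: ${link} exists and is not a symlink" >&2
else
  # -n replaces an existing symlink instead of creating a link inside its target
  ln -sfn "${EXISTING_ANNOTATIONS}" "${link}"
fi

# Directory targeted by the default "${projectDir}/../annotations" (GNU readlink -f)
readlink -f "${INSTALL_DIR}/pipeline/../annotations"
```

Because geniac.config uses the ?: operator, a value passed with --genomeAnnotationPath on the nextflow command line takes precedence over this default.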

If needed, you can set the singularityRunOptions and dockerRunOptions values in the geniac.config file to whatever your configuration requires. They set the runOptions setting (see the Nextflow configuration documentation) of the |singularity|_ and |docker|_ scopes, respectively, when the |singularity|_ and |docker|_ profiles are used.

In order to activate a conda environment with conda activate myConda_env when a docker or singularity container is executed, some configuration is required inside the recipes. This is described in :ref:`conda-page`.

As geniac automatically generates the container recipes, they are not available in the git repository. However, they can easily be retrieved in several ways. An example to :ref:`install-generate-recipes` is provided.

There are several ways:

Since version 20.07.1, |nextflow|_ provides the DSL2 syntax that allows the definition of module libraries and simplifies the writing of complex data analysis pipelines. geniac is fully compatible with DSL2 and we provide |geniacdemodsl2|_ as an example. The guidelines to :ref:`process-page` remain exactly the same.

The main differences between |geniacdemo|_ and |geniacdemodsl2|_ are:

  • each process is located in one dedicated file in the folder nf-modules/local/process
  • each subworkflow that combines different processes is located in the folder nf-modules/local/subworkflow
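  • the main.nf includes the processes and subworkflows from these two folders and combines them in a workflow block.

The source code of |geniacdemodsl2|_ is organized as follows:
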
├── assets                       # assets needed for runtime
├── bin                          # scripts or binaries for the pipeline
├── conf                         # configuration files for the pipeline
│   ├── geniac.config            # contains the geniac scope mandatory for nextflow
├── docs                         # documentation of the pipeline
├── env                          # process specific environment variables
├── geniac                       # geniac utilities
│   ├── cmake                    # source files for the configuration step
│   ├── docs                     # guidelines for installation
│   ├── install                  # scripts for the build step
├── main.nf
├── modules                      # tools installed from source code
│   ├── CMakeLists.txt
│   ├── helloWorld
├── nextflow.config
├── nf-modules                   # nextflow files for DSL2
│   └── local
│       ├── process
│       │   ├── alpine.nf
│       │   ├── checkDesign.nf
│       │   ├── execBinScript.nf
│       │   ├── fastqc.nf
│       │   ├── getSoftwareVersions.nf
│       │   ├── helloWorld.nf
│       │   ├── multiqc.nf
│       │   ├── outputDocumentation.nf
│       │   ├── standardUnixCommand.nf
│       │   ├── trickySoftware.nf
│       │   └── workflowSummaryMqc.nf
│       └── subworkflow
│           ├── myWorkflow0.nf
│           └── myWorkflow1.nf
├── recipes                      # installation recipes for the tools
│   ├── conda
│   ├── dependencies
│   ├── docker
│   └── singularity
└── test                         # data to test the pipeline
    └── data

The |geniacdemodsl2|_ can be run as follows:

export WORK_DIR="${HOME}/tmp/myPipelineDSL2"
export SRC_DIR="${WORK_DIR}/src"
export INSTALL_DIR="${WORK_DIR}/install"
export BUILD_DIR="${WORK_DIR}/build"
export GIT_URL="https://github.com/bioinfo-pf-curie/geniac-demo-dsl2.git"

mkdir -p ${INSTALL_DIR} ${BUILD_DIR}

# clone the repository
# the option --recursive is needed if you use geniac as a submodule
# the option --remote-submodules will pull the latest geniac version
# using the release branch from https://github.com/bioinfo-pf-curie/geniac
git clone --remote-submodules --recursive ${GIT_URL} ${SRC_DIR}

cd ${BUILD_DIR}
cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR}
make
make install

cd ${INSTALL_DIR}/pipeline

nextflow -c conf/test.config run main.nf -profile multiconda

With nextflow, it is possible to define a label using a variable instead of a fixed string. In this case, the label value must be given in parentheses. geniac also supports such labels, provided that they follow the format label (params.someValue ?: 'toolPrefix'). In any case, the value of params.someValue must start with the toolPrefix value. The geniac linter checks that there is a tool whose name starts with this prefix; if there is none, it throws an error.

A typical use case is launching a pipeline with a version of a tool given as an option on the nextflow command line. Suppose that you have declared three versions of the mySoft tool in the geniac.config file as follows:

params {
   geniac {
      tools {
         mySoft = "conda-forge::mySoft=v0=r351h96ca727_1003"
         mySoft_v1 = "conda-forge::mySoft=v1=r351h96ca727_1003"
         mySoft_v2 = "conda-forge::mySoft=v2=r351h96ca727_1003"
      }
   }
}

Then, in the nextflow process, define the label as follows:

process mySoft {
  label (params.mySoftVersion ?: 'mySoft')
  label 'minMem'
  label 'minCpu'


  script:
  """
  mySoft --version
  """
}

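When you launch nextflow, pass the option --mySoftVersion to set which version of mySoft you want to use. Since the value must start with the mySoft prefix, give the name of the corresponding tool, for example mySoft_v2:

nextflow run main.nf --mySoftVersion mySoft_v2 -profile test,singularity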

If no version is specified, the ?: operator makes the label fall back to 'mySoft', i.e. the default version v0.
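The two constraints on a variable label can be illustrated with a small shell function. This is a hypothetical re-implementation written for this FAQ, not the code of the geniac linter: check_label_value is an invented name, and it only checks that at least one declared tool name starts with the prefix and that the label value starts with the prefix too.

```shell
#!/bin/sh
# Hypothetical illustration of the label rules; NOT geniac's linter code.
# Usage: check_label_value VALUE PREFIX TOOL...
#   VALUE  value of params.someValue (e.g. mySoft_v2)
#   PREFIX toolPrefix used as fallback in the label (e.g. mySoft)
#   TOOL   tool names declared in params.geniac.tools
check_label_value() {
  value="$1"
  prefix="$2"
  shift 2

  # Rule checked by the linter: some declared tool starts with the prefix
  found=no
  for tool in "$@"; do
    case "${tool}" in
      "${prefix}"*) found=yes ;;
    esac
  done
  if [ "${found}" = no ]; then
    echo "error: no tool name starts with '${prefix}'" >&2
    return 1
  fi

  # Rule on the value: it must start with the prefix
  case "${value}" in
    "${prefix}"*) return 0 ;;
    *) echo "error: '${value}' does not start with '${prefix}'" >&2; return 1 ;;
  esac
}

# ${tools} is left unquoted on purpose so that it splits into separate names
tools="mySoft mySoft_v1 mySoft_v2"
check_label_value mySoft_v2 mySoft ${tools} && echo "mySoft_v2: accepted"
check_label_value v2 mySoft ${tools} || echo "v2: rejected"
```

With the tools declared above, mySoft_v2 is accepted, whereas a bare v2 is rejected because it does not start with mySoft.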

In both the main.nf and nextflow.config files, you will find some variables surrounded by @, such as @git_repo_name@. During the cmake step, these variables are replaced by values extracted from the git repository; they are used, for example, in the nextflow manifest. If needed, you can remove these variables and set the values to whatever you want.
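To see which placeholders remain in your sources, or to replace one with a fixed value by hand, standard grep and sed are enough. This is a minimal sketch: the manifest line and the value myPipeline are hypothetical, it works on a throwaway demo file, and it does not reproduce what cmake does internally.

```shell
#!/bin/sh
# Demo input with hypothetical content: in practice, set CONFIG to your own
# nextflow.config or main.nf and skip the two lines that create the demo file.
CONFIG=$(mktemp)
printf "manifest {\n  name = '@git_repo_name@'\n}\n" > "${CONFIG}"

# List every @...@ placeholder with its line number
grep -n '@[A-Za-z0-9_]*@' "${CONFIG}"

# Replace one placeholder with a fixed value (hypothetical value: myPipeline).
# Writing to a temporary file avoids the non-portable "sed -i".
sed 's/@git_repo_name@/myPipeline/g' "${CONFIG}" > "${CONFIG}.new" && mv "${CONFIG}.new" "${CONFIG}"
cat "${CONFIG}"
```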

The :ref:`run-profile-conda` relies on the environment.yml file that is automatically generated by geniac. However, building a |conda| recipe can sometimes be very tricky, as the order of the channels and of the dependencies matters, and geniac cannot guess the appropriate order. Moreover, |conda| may fail to resolve conflicts between incompatible packages. Thus, in some cases, you will have no choice but to correct the environment.yml file manually, add it to the git repository (in the folder where the main.nf file is located) and install the pipeline with the following option:

cmake ${SRC_DIR}/geniac -DCMAKE_INSTALL_PREFIX=${INSTALL_DIR} -Dap_keep_envyml_from_source=ON

Note that, due to incompatibilities between tools, it may be impossible to obtain a working environment.yml file. In this case, use the :ref:`run-profile-multiconda` profile instead of the :ref:`run-profile-conda` profile.
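A hand-written environment.yml is an ordinary conda environment file in which channels listed first have the highest priority. The sketch below writes one at the root of the repository; the environment name and the package list are hypothetical placeholders, not the recipe of an actual pipeline.

```shell
#!/bin/sh
# SRC_DIR is the root of your git repository (where main.nf is); defaults to "." here.
SRC_DIR="${SRC_DIR:-.}"

# Hypothetical content: channels listed first have the highest priority.
cat > "${SRC_DIR}/environment.yml" <<'EOF'
name: myPipeline_env
channels:
  - conda-forge
  - bioconda
dependencies:
  - fastqc
  - multiqc
EOF

# Then commit the file and configure with -Dap_keep_envyml_from_source=ON, e.g.:
#   git -C "${SRC_DIR}" add environment.yml
#   git -C "${SRC_DIR}" commit -m "Add hand-written environment.yml"
```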

The tools available from source are installed in pipeline/bin/fromSource to ensure that, when using the singularity or docker profiles, the tools installed inside the containers are used. Indeed, nextflow adds the folder pipeline/bin to the PATH environment variable when the container is launched. To illustrate the impact of this setting, assume that the pipeline has been installed with the singularity images in ${INSTALL_DIR}/pipeline. Then, create the file ${INSTALL_DIR}/pipeline/bin/helloWorld containing the following bash script:

#! /bin/bash

echo "Buenos dias!"

Make this bash script executable with chmod +x ${INSTALL_DIR}/pipeline/bin/helloWorld and execute the pipeline as follows:

nextflow run main.nf -profile test,singularity

Then, the file ${INSTALL_DIR}/pipeline/results/helloWorld/helloWorld.txt will contain Buenos dias! instead of Hello World! This means that singularity runs the bash script ${INSTALL_DIR}/pipeline/bin/helloWorld instead of the helloWorld executable installed inside the image, which raises a reproducibility issue. Installing the tools in the folder pipeline/bin/fromSource, which is not in the PATH, avoids this issue.
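The same PATH lookup can be reproduced outside any container. In this sketch, the directory image/bin is a hypothetical stand-in for a directory on the container's PATH, and run_hello is an invented helper that prepends pipeline/bin to PATH, as nextflow does.

```shell
#!/bin/sh
# Sketch of the PATH lookup behind the pipeline/bin/fromSource choice.
# "image/bin" stands in (hypothetically) for a directory on the container's PATH.
demo=$(mktemp -d)
mkdir -p "${demo}/pipeline/bin/fromSource" "${demo}/image/bin"

# helloWorld built from source: one copy "inside the image", one in fromSource
printf '#!/bin/sh\necho "Hello World!"\n' > "${demo}/image/bin/helloWorld"
cp "${demo}/image/bin/helloWorld" "${demo}/pipeline/bin/fromSource/helloWorld"
chmod +x "${demo}/image/bin/helloWorld" "${demo}/pipeline/bin/fromSource/helloWorld"

# pipeline/bin (not pipeline/bin/fromSource) comes first in PATH
run_hello() { ( PATH="${demo}/pipeline/bin:${demo}/image/bin:${PATH}"; helloWorld ); }

before=$(run_hello)   # fromSource is not searched, so the image copy runs

# A stray script dropped into pipeline/bin now shadows the image copy
printf '#!/bin/sh\necho "Buenos dias!"\n' > "${demo}/pipeline/bin/helloWorld"
chmod +x "${demo}/pipeline/bin/helloWorld"
after=$(run_hello)

echo "before: ${before}"
echo "after:  ${after}"
```

The first call prints Hello World! because PATH lookup does not descend into subdirectories such as fromSource; the second prints Buenos dias! because pipeline/bin is searched before the image directory.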

There are several ways.

  • Using the Geniac CLI (see :ref:`cli-singularity-build`):
    • if you have sudo privileges, use the singularity mode,
    • if you are allowed to use the fakeroot option, use the singularityfakeroot mode.
  • Using standard cmake options:
    • if you have sudo privileges, pass the option -Dap_install_singularity_images=ON to cmake, and then run sudo make (see :ref:`install-run-singularity`),
    • if you are allowed to use the fakeroot option, pass both options -Dap_install_singularity_images=ON and -Dap_singularity_build_options=--fakeroot to cmake, and then run make.

In May 2021, Sylabs, the commercial entity behind Singularity, forked the project. The original Singularity repository has been moved to https://github.com/apptainer/singularity, where it will persist as an archive and be set to read-only after the first release of Apptainer (https://github.com/apptainer/apptainer). Apptainer will provide singularity as a command line link and will maintain as much of the CLI and environment functionality as possible. From the user's perspective, very little, if anything, will change.