The root of all Roddy plugins, including the PluginBase plugin.
All top-level tools or scripts that are supposed to be started on the cluster by Roddy are not started directly, but are wrapped by the resources/roddyTools/wrapInScript.sh script contained in this plugin.
You need at least Bash 4.2 to run wrapInScript.sh.
The wrapper script has the following general structure:
- setup
- source baseEnvironmentScript (e.g. /etc/profile)
- source the job parameter file (PARAMETER_FILE)
- optionally, if outputFileGroup != "false", change to the requested group with sg and restart the script at the "setup" step (above)
- source job-specific environment script (see "Environment Setup Support" below)
- setup scratch directory
- update the jobStateLogfile.txt using a lock-file
- run the wrapped script using bash
- kill child-processes still running after the wrapped script ended
- update the jobStateLogfile.txt with the job's exit code
- exit
Each job is started with the default environment configured in your applicationProperties.ini in the baseEnvironmentScript variable. The baseEnvironmentScript serves as a kind of general configuration of your cluster environment. Usually you will use a script like /etc/profile, $HOME/.profile, or $HOME/.bashrc.
Note that the baseEnvironmentScript is often not under your control and may be sensitive to certain shell options, such as set -e or set -u. Therefore, the error checks and logging options that wrapInScript.sh turns on if you set debugWrapInScript=true are turned off while the base environment is read.
After the base environment script and the job parameter file have been sourced, the wrapper script checks whether you have defined a dedicated environment script for the whole workflow or for this specific cluster job. These environment scripts are defined in one of the configuration XMLs or on the command line via the --cvalues parameter.
The "workflow-environment" script defines the environment for all jobs of the workflow. By contrast, "job-environment" scripts define the environment for individual jobs and take precedence over the workflow-environments.
To define a workflow-level environment setup script, you can add lines like the following to your XMLs:
<configurationvalues>
<cvalue name="workflowEnvironmentScript" value="workflowEnvironment_conda" type="string"
description="Use 'workflowEnvironment_conda' for a generic Conda environment."/>
</configurationvalues>
<processingTools>
<tool name="workflowEnvironment_conda" value="conda.sh" basepath="environments"/>
<tool name="workflowEnvironment_lsf" value="lsf.sh" basepath="environments"/>
</processingTools>
This declares two environment scripts and selects "workflowEnvironment_conda" as the environment to use. The user can still select lsf.sh as the workflow environment by defining e.g. --cvalues="workflowEnvironmentScript:workflowEnvironment_lsf" on the command line. In this example, the environment scripts need to be located in the resources/environments directory of the plugin, which is copied to the execution host.
You may want to specify dedicated job-environment scripts for individual cluster jobs. These take precedence over the global workflow environment script. For instance, the following defines a tool as environment script for the correctGcBias cluster job (which is also defined as a tool).
<configurationvalues>
<cvalue name="correctGcBiasEnvironmentScript" value="${TOOL_CORRECT_GC_BIAS_ENVIRONMENT_CONDA}" type="string"/>
</configurationvalues>
<processingTools>
<tool name="correctGcBiasEnvironment_conda" value="conda-correctGcBias.sh" basepath="environments"/>
</processingTools>
Internally, the tool names are mapped to a TOOL_ Bash variable according to the following rules:
- inserting an underscore '_' before all capitals,
- changing all letters to upper-case, and
- prepending "TOOL_" before the name.
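The three rules above can be sketched as a small Bash function. This is an illustrative reimplementation, not the code Roddy uses internally; the function name toolVariableName is hypothetical, and the ${var^^} expansion assumes Bash 4 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: derive the TOOL_ variable name
# from a tool name by following the three rules listed above.
toolVariableName() {
    local toolName="$1"
    local withUnderscores
    # Rule 1: insert an underscore before every capital letter.
    withUnderscores=$(printf '%s' "$toolName" | sed 's/[[:upper:]]/_&/g')
    # Rules 2 and 3: change all letters to upper-case and prepend "TOOL_".
    printf 'TOOL_%s\n' "${withUnderscores^^}"
}

toolVariableName "correctGcBias"
toolVariableName "workflowEnvironment_conda"
```

With these inputs the function prints TOOL_CORRECT_GC_BIAS and TOOL_WORKFLOW_ENVIRONMENT_CONDA, which matches the variable names used in the XML examples of this document.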
It is also possible to refer to the tool by using a configuration value of the form ${TOOL_WORKFLOW_ENVIRONMENT_CONDA}. This form is occasionally used in existing plugins, but we advise you to use the first, simpler form.
Sometimes having to modify the plugin in place is not possible or desirable, in particular during development. In this case, you can also specify the environment script directly in the configuration value, as in "/path/to/develEnv.sh". This path should be absolute and must be available on the execution host. This possibility has only been available since version 1.2.2-5 of this plugin.
The logic to discriminate between these three cases is as follows:
- the value contains a '/': this is a direct path. This only works since version 1.2.2-5.
- the value starts with '${': this is a TOOL_ path. Since version 1.2.2-5 the matching is on ${TOOL_}.
- otherwise, compose the TOOL_ variable name from the job-name, as described above.
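The case distinction above can be sketched as follows. This is a simplified illustration under stated assumptions, not the wrapper's actual code: the function name classifyEnvironmentScriptValue and the three labels it prints are hypothetical, and the second check matches only on the leading '${'.

```shell
#!/usr/bin/env bash
# Hypothetical classifier, for illustration only. The order of the checks
# follows the list above: direct path first, then ${...} reference, then name.
classifyEnvironmentScriptValue() {
    local value="$1"
    if [[ "$value" == */* ]]; then
        # Contains a slash: treat the value as a path to the script.
        echo "path"
    elif [[ "$value" == '${'* ]]; then
        # Starts with '${': treat the value as a TOOL_ variable reference.
        echo "tool-variable"
    else
        # Anything else: a name from which the TOOL_ variable is composed.
        echo "name"
    fi
}

classifyEnvironmentScriptValue "/path/to/develEnv.sh"
classifyEnvironmentScriptValue '${TOOL_WORKFLOW_ENVIRONMENT_CONDA}'
classifyEnvironmentScriptValue "workflowEnvironment_conda"
```

The order of the checks matters: the slash test comes first, so any value containing a '/' is treated as a direct path regardless of how it starts.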
The environment script is simply source'd, so you can access variables from the parameter file (PARAMETER_FILE, sourced before; see above) from within that script. For instance, suppose you have a conda.sh that activates a Conda environment, but you want to keep the environment name configurable. You can then define the Conda environment name in the XML:
<cvalue name="condaEnvironmentName" value="myWorkflow" type="string"
description="Name of the Conda environment on the execution hosts. Used by the environment setup script conda.sh defined as tool below."/>
Then your conda.sh may look like this:
source activate "$condaEnvironmentName"
The environment setup scripts are mostly useful for setting up environment variables that can be used in the wrapped script, which does the actual job for you. To achieve this, Bash variables need to be exported with the export declaration.
Sometimes it can be useful to define a Bash function in the environment script for use in the wrapped script. Such functions can be exported with export -f. An example is a wrapper function for a tool with a complex call, which you want to wrap for better readability in your workflow code.
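A minimal sketch of an environment script that exports a variable and a function might look like this. All names are hypothetical (REFERENCE_GENOME, runAligner, the aligner command and its path), and the final bash -c call merely stands in for the wrapped script running in its own Bash process.

```shell
#!/usr/bin/env bash
# Sketch of an environment script; all names and paths are hypothetical.

# Exported variables are visible in the wrapped script's process.
export REFERENCE_GENOME="/path/to/reference.fa"

# A wrapper function hiding a longer call; here it only prints the command.
runAligner() {
    echo "aligner --ref $REFERENCE_GENOME $*"
}
# Without 'export -f' the function would not exist in child Bash processes.
export -f runAligner

# Stand-in for the wrapped script, which runs in a separate Bash process.
bash -c 'runAligner sample.bam'
```

The last line prints "aligner --ref /path/to/reference.fa sample.bam", showing that both the variable and the function reach the child process.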
Note that due to a bug with exported array variables in Bash <4.4, something like export -a won't work. We suggest taking the same strategy as the PARAMETER_FILE does, namely to export arrays as a quoted Bash array string
export arrayStringVar="(a b c d)"
and then cast this string into a Bash array in your wrapped script with
declare -a arrayVar="$arrayStringVar"
The debugWrapInScript variable (defaulting to false) turns on the set -xv verbosity shell options.
The baseEnvironmentScript is sourced with relaxed values for set, i.e. with set +ue, because files like /etc/profile are often not under the control of the person running the workflow. Conversely, changes to the set options made in the baseEnvironmentScript are not inherited by subsequent code in wrapInScript.sh.
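The behaviour described above (relax errexit and nounset while sourcing, then discard whatever the sourced file changed) can be sketched as follows. This is an assumption about one possible way to implement it, not the code from wrapInScript.sh; the function name sourceRelaxed is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: source a file with errexit and
# nounset relaxed, then restore the caller's settings for both options.
sourceRelaxed() {
    local file="$1"
    local savedFlags="$-"    # current single-letter shell options, e.g. "ehuB"
    set +ue
    # shellcheck disable=SC1090
    source "$file"
    # Restore the saved state, discarding changes made by the sourced file.
    if [[ "$savedFlags" == *e* ]]; then set -e; else set +e; fi
    if [[ "$savedFlags" == *u* ]]; then set -u; else set +u; fi
}
```

A call such as sourceRelaxed /etc/profile under set -eu would then tolerate unset variables and failing commands inside the sourced file, while the calling script keeps its strict settings afterwards. The sketch reads the state from $- rather than from the output of set +o in a command substitution, because Bash clears errexit in command-substitution subshells when not in POSIX mode.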
The environment script is sourced with the same shell options (as set via set in Bash) as the wrapper. In particular, this means that errexit is set. Changes made in the environment script are inherited by subsequent code in wrapInScript.sh.
It is possible to run the same command that Roddy runs as a remote job from the interactive command line. The wrapper script recognizes that it is running in an interactive session and avoids exiting Bash upon errors (i.e. set +e is set), but should otherwise behave exactly as if run by bsub or qsub.
Finally, the wrapped script has debugging options WRAPPED_SCRIPT_DEBUG_OPTIONS. For convenience, the application of these options can be turned off via the disableDebugOptionsForToolscript variable.
As stated previously, the wrapped script is executed by Bash. This means you can use a shebang-line to select an arbitrary interpreter, e.g. one you have pulled into the environment via the baseEnvironmentScript or the workflow- or job-specific environment scripts.
The following conventions are nothing more than that and are currently not enforced by Roddy:
- use camel-case tool names starting with small letters (e.g. "correctGcBias")
- append the arbitrary environment name that you want to use to the tool name to get the name of the environment variable
- describe the environment in the description attribute of the cvalue tag
- the environment setup scripts are located in the "environments" subdirectory of the workflow directory in the plugin
- 1.2.2-5
  - Turn off debugging options when sourcing environment files. This allows using environment scripts that fail because of set -u.
  - Refactored lockfile code in wrapInScript.sh
  - Report if user is not member of outputFileGroup.
  - Allow defining environment scripts outside the plugin resources/ directory.
- 1.2.2-4
  - buildversion.txt did not correctly reflect the version 1.2.2
  - allow for ad hoc custom environment scripts
- 1.2.2-3
  - get Bash via /usr/bin/env
  - using a bash 4 feature to do the childprocess listing
  - child-process killing
- 1.2.2-2
  - removed unused preventJobExecution variable
  - extended checks for RODDY_SCRATCH
  - add killBackgroundJobs to deal with processes not killed by batch-processing system
  - set generic temporary variables (TMP, TMPDIR, TEMP) to scratch
  - set {input,output}AnalysisBaseDirectory defaults
- 1.2.2-1
  - updated dependency to Roddy 3.0 (note Roddy "2.4" is a development-only version)
- 1.2.2
  - added shunit2 tests
  - Remove autocheckpoint code
  - Improve debugging
  - disableDebugOptionsForToolscript to turn off wrapped script debugging
  - fixed typo that caused 2-directory to be created in user's home
  - source baseEnvironmentScript
  - remove CONFIG_FILE references (i.e. runtimeConfig.sh)
  - deal with environments that have LD_LIBRARY_PATH undefined when set -u is configured
- 1.2.1
  - check LD_LIBRARY_PATH definition before exporting, otherwise error with set -u
- 1.2.0
  - defaultScratchDir removed
  - error redirection into stderr
  - fixed errors if debugOptionsUseUndefinedVariableBreak is set
  - write environment into extended logs
- 1.0.34
  - require Roddy 2.4 (=3.0) and PluginBase 1.0.29
  - "native" workflow support
  - removed some older scripts not used anymore (fileStreamBuffer.sh, findOpenPort.sh, jobEpilogue.sh, jobPostEpilogue.sh, streamBuffer.sh)
  - module support directly in wrapInScript.sh
  - check parameter and configuration file usability
- 1.0.33
  - first Github version of the plugin
  - Roddy 2.3