Given one or more CTD files, CTDConverter
generates the needed wrappers to include them in workflow engines, such as Galaxy and CWL.
CTDConverter
has the following python dependencies:
- CTDopts
lxml
ruamel.yaml
- Python 3.6+
Python 2.x is no longer supported as of release 3.0.0
We recommend the use of conda
to manage all dependencies. If you're not sure what conda
is, make sure to read about the conda project.
The easiest way to get you started with CTD conversion is to create a conda
environment on which you'll install all dependencies. Using environments in conda
allows you to have parallel, independent python environments, thus avoiding conflicts between libraries.
If you haven't installed conda
, check conda-install(conda's installation guide). See this guide if you need help creating a conda
environment.
Once you've installed conda
, create an environment named ctd-converter
, like so:
$ conda create --name ctd-converter
You will now need to activate the environment by executing the following command:
$ source activate ctd-converter
Install the required dependencies as follows (the order of execution is actually important, due to transitive dependencies):
$ conda install --channel workflowconversion ctdopts
$ conda install lxml
$ conda install --channel conda-forge ruamel.yaml
$ conda install libxml2=2.9.2
lxml
depends on libxml2
. When you install lxml
you'll get the latest version of libxml2
(2.9.4) by default. You would usually want the latest version, but there is, however, a bug in validating XML files against a schema in this version of libxml2
.
If you require validation of input CTDs against a schema (which we recommend), you will need to downgrade to the latest known version of libxml2
that works, namely, 2.9.2.
You could just download dependencies manually and make them available through your PYTHONPATH
environment variable, if you're into that. To get more information about how to install python modules without using conda
, visit: https://docs.python.org/2/install/.
CTDConverter
is not a python module, rather, a series of scripts, so installing it is as easy as downloading the source code from https://github.com/genericworkflownodes/CTDConverter. Once you've installed all dependencies, downloaded CTDConverter
and activated your conda
environment, you're good to go.
The first thing that you need to tell CTDConverter
is the output format of the converted wrappers. CTDConverter
supports conversion of CTDs into Galaxy and CWL. Invoking it is as simple as follows:
$ python convert.py [FORMAT] [ADDITIONAL_PARAMETERS ...]
Here [FORMAT]
can be any of the supported formats (i.e., cwl
, galaxy
). CTDConverter
offers a series of format-specific scripts and we've designed these scripts to behave somewhat similarly. All converter scripts have the same core functionality, that is, read CTD files, parse them using CTDopts, validate against a schema, etc. Of course, each converter script might add extra functionality that is not present in other engines. Only the Galaxy converter script supports generation of a tool_conf.xml
file, for instance.
The following sections in this file describe the parameters that all converter scripts share.
Please refer to the detailed documentation for each of the converters for more information:
CTDConverter
can parse several CTDs and convert them. However, the process will be interrupted and an error code will be returned at the first encountered error (e.g., a CTD is not valid, there are missing support files, etc.).
In its simplest form, the converter takes an input CTD file and generates an output file. The following usage of CTDConverter
:
$ python convert.py [FORMAT] -i /data/sample_input.ctd -o /data/sample_output.xml
will parse /data/sample_input.ctd
and generate an appropriate converted file under /data/sample_output.xml
. The generated file can be added to your workflow engine as usual.
When converting several CTDs, the expected value for the -o
/--output
parameter is a folder. For example:
$ python convert.py [FORMAT] -i /data/ctds/one.ctd /data/ctds/two.ctd -o /data/converted-files
Will convert /data/ctds/one.ctd
into /data/converted-files/one.[EXT]
and /data/ctds/two.ctd
into /data/converted-files/two.[EXT]
. Each converter has a preferred extension, here shown as a variable ([EXT]
). Galaxy prefers xml
, while CWL prefers cwl
.
You can use wildcard expansion, as supported by most modern operating systems:
$ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/converted-files
- Purpose: Provide input CTD file(s) to convert.
- Short/long version:
-i
/--input
- Required: yes.
- Taken values: a list of input CTD files.
Examples:
Any of the following invocations will convert /data/input_one.ctd
and /data/input_two.ctd
:
$ python convert.py [FORMAT] -i /data/input_one.ctd -i /data/input_two.ctd -o /data/generated
$ python convert.py [FORMAT] -i /data/input_one.ctd /data/input_two.ctd -o /data/generated
$ python convert.py [FORMAT] --input /data/input_one.ctd /data/input_two.ctd -o /data/generated
$ python convert.py [FORMAT] --input /data/input_one.ctd --input /data/input_two.ctd -o /data/generated
The following invocation will convert /data/input.ctd
into /data/output.xml
:
$ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
Of course, you can also use wildcards, which will be automatically expanded by any modern operating system. This is extremely useful if you want to convert several files at a time. Let's assume that the folder /data/ctds
contains three files: input_one.ctd
, input_two.ctd
and input_three.ctd
. The following two invocations will produce the same output in the /data/wrappers
folder:
$ python convert.py [FORMAT] -i /data/input_one.ctd /data/input_two.ctd /data/input_three.ctd -o /data/wrappers
$ python convert.py [FORMAT] -i /data/*.ctd -o /data/wrappers
- Purpose: Provide output destination for the converted wrapper files.
- Short/long version:
-o
/--output-destination
- Required: yes.
- Taken values: if a single input file is given, then a single output file is expected. If multiple input files are given, then an existent folder in which all converted CTDs will be written is expected.
Examples:
A single input is given, and the output will be generated into /data/output.xml
:
$ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
Several inputs are given. The output is the already existent folder, /data/wrappers
, and at the end of the operation, the files /data/wrappers/input_one.[EXT]
and /data/wrappers/input_two.[EXT]
will be generated:
$ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd -o /data/stubs
Please note that the output file name is not taken from the name of the input file, rather from the name of the tool, that is, from the name
attribute in the <tool>
element in its corresponding CTD. By convention, the name of the CTD file and the name of the tool match.
- Purpose: Some parameters present in the CTD are not to be exposed on the output files (e.g. parameters such as
--help
,--debug
that might won't make much sense to be exposed to users in a workflow management system), other parameters should be hardcoded (i.e. parameters that should not be exposed to the user but still set to a fixed value on the generated command line), and for other parameters it might be necessary to modify attributes of the input CTD or the generated output. - Short/long version:
-p
/--hardcoded-parameters
- Required: no.
- Taken values: A json file defining: exclusion, hardcoded, and modifications of parameters
Example:
$ pythonconvert.py [FORMAT] ... -p JSON_FILE
The json defines a mapping from parameter names to a list of modifications:
{
"parameter1": [MODIFICATION1, ...],
"parameter2": [MODIFICATION1, ...]
...
}
where each modification is a mapping as defined below.
If a parameter should always be set on the command line using a fixed value, i.e. the user can to choose the value, this can be done as follows:
"parameter": [{"value":"HARDCODED_VALUE"}]
In order to exclude a parameter, it will not appear in the generated tool or the generated command line, the same syntax as for hardcoding is used but a special reserved value is used:
"parameter": [{"value": "@"}]
It's possible to modify attributes of the input CTD definition of a parameter as well as attributes of the generated Galaxy XML tags.
"test": [{
"CTD:type": "text",
"XML:type": "hidden"
}],
Its possible to specify modifications to a parameter for only a subset of the tools by specifying a list of tools as follows:
"output_files": [{
"CTD:required": true,
"tools": ["OpenSwathDIAPreScoring"]
}, {
"CTD:restrictions": "txt,tsv,pep.xml,pepXML,html",
"tools": ["SpectraSTSearchAdapter"]
}]
- Purpose: Provide validation of input CTDs against a schema file (i.e, a XSD file).
- Short/long version:
-V
/--validation-schema
- Required: no.
- Taken values: location of the schema file (e.g., CTD.xsd).
CTDs can be validated against a schema. The master version of the schema can be found on CTDSchema.
If a schema is provided, all input CTDs will be validated against it.
NOTE: Please make sure to read the section on issues with schema validation if you require validation of CTDs against a schema.
- Purpose: Help workflow engines locate tools by providing a path.
- Short/long version:
-x
/--default-executable-path
- Required: no.
- Taken values: The default executable path of the tools in the target workflow engine.
CTDs can contain an <executablePath>
element that will be used when executing the tool binary. If this element is missing, the value provided by this parameter will be used as a prefix when building the appropriate sections in the output files.
The following invocation of the converter will use /opt/suite/bin
as a prefix when providing the executable path in the output files for any input CTD that lacks the <executablePath>
section:
$ python convert.py [FORMAT] -x /opt/suite/bin ...
There are two ways to bump tool versions.
- Definition of a
@GALAXY_VERSION@
token in the macros file. This can be used to bump all tools at once. Tool versions will be@TOOL_VERSION@+galaxy@GALAXY_VERSION@
. - Use the
--bump-file
parameter to specify the wrapper version of a subset of the tools in a json file that maps tool names/ids to a wrapper version. Tool version will be set to@TOOL_VERSION@+galaxyX
, whereX
is the version found in the json file or0
if not found.
In case of an update of the tool version, i.e. @TOOL_VERSION@
, in the first case @GALAXY_VERSION@
should be reset to 0 and the dictionary in the bump file should be emptied otherwise.
Rationale: the auto-generation of the tool xml files would overwrite the wrapper version when regenerated. Hence it needs to be specified externally, e.g. in the macros.xml or in the bump file.
Tests for Galaxy tools are generated with:
PATH=$(pwd)/tests/test-data/:$PATH
for i in tests/test-data/*ctd
do
b=$(basename $i .ctd)
python convert.py galaxy -i tests/test-data/$b.ctd -o tests/test-data/$b.xml -m tests/test-data/macros.xml -f tests/test-data/filetypes.txt --test-test -p tests/test-data/hardcoded_params.json --tool-version 5.0.011
done