This documentation provides a comprehensive guide to the template structure necessary for implementing Workflow objects. These objects enable users to codify pipeline steps and store metadata to track inputs, outputs, software, and description files (e.g., WDL or CWL) for each workflow.
## Workflow information #####################################
#     General information for the workflow
#############################################################
# All the following fields are required
name: <string>
description: <string>

runner:
  language: <language>            # cwl, wdl
  main: <file>                    # .cwl or .wdl file
  child:
    - <file>                      # .cwl or .wdl file

# All the following fields are optional and provided as example,
#   can be expanded to anything accepted by the schema
#   https://github.com/dbmi-bgm/cgap-portal/tree/master/src/encoded/schemas
title: <string>

software:
  - <software>@<version|commit>

## Input information ########################################
#     Input files and parameters
#############################################################
input:

  # File argument
  <file_argument_name>:
    argument_type: file.<format>  # bam, fastq, bwt, ...

  # Parameter argument
  <parameter_argument_name>:
    argument_type: parameter.<type>  # string, integer, float, json, boolean

## Output information #######################################
#     Output files and quality controls
#############################################################
output:

  # File output
  <file_output_name>:
    argument_type: file.<format>
    secondary_files:
      - <format>                  # bam, fastq, bwt, ...

  # QC output
  <qc_output_name>:
    argument_type: qc.<type>      # qc_type, e.g. quality_metric_vcfcheck
                                  # none can be used as <type>
                                  #   if a qc_type is not defined
                                  # quality_metric_generic can be used as <type>
                                  #   to use the general qc_type instead of a custom one
    argument_to_be_attached_to: <file_output_name>
    # All the following fields are optional and provided as example,
    #   can be expanded to anything accepted by the schema
    html: <boolean>
    json: <boolean>
    table: <boolean>
    zipped: <boolean>
    # If the output is a zipped folder with multiple QC files,
    #   fields to define the target files inside the folder
    html_in_zipped: <file>
    tables_in_zipped:
      - <file>
    # Fields still used by tibanna that need refactoring,
    #   listing them as they are
    qc_acl: <string>              # e.g. private
    qc_unzip_from_ec2: <boolean>

  # Report output
  <report_output_name>:
    argument_type: report.<type>  # report_type, e.g. file
All the following fields are required.
``name``: Name of the workflow. The name MUST be globally unique across all portal objects.
``description``: Description of the workflow.
``runner``: Definition of the data processing flow for the workflow. This field specifies the standard language and the description files that define the workflow. It takes the following subfields:
- language [required]: Language standard used for the workflow description
- main [required]: Main description file
- child [optional]: List of supplementary description files used by main
Two standards are currently supported: Common Workflow Language (CWL) and Workflow Description Language (WDL).
``input``: Description of input files and parameters for the workflow. See :ref:`Input Definition <input_a>`.
``output``: Description of expected outputs for the workflow. See :ref:`Output Definition <output_a>`.
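The required fields and the runner subfields described above can be checked with a short script. The following is a minimal sketch, not part of the pipeline tooling: the function name ``check_workflow``, the example values, and the assumption that every description file carries the extension of the declared language are hypothetical.

```python
# Required top-level fields, as listed in the text above.
REQUIRED_FIELDS = ("name", "description", "runner", "input", "output")
# Assumption: description files use the extension of the declared language.
LANGUAGE_EXTENSIONS = {"cwl": ".cwl", "wdl": ".wdl"}


def check_workflow(workflow):
    """Return a list of problems found in a workflow definition (empty if none)."""
    problems = ["missing required field: " + key
                for key in REQUIRED_FIELDS if key not in workflow]
    runner = workflow.get("runner") or {}
    language = runner.get("language")
    if language not in LANGUAGE_EXTENSIONS:
        problems.append("unsupported runner language: {!r}".format(language))
        return problems
    extension = LANGUAGE_EXTENSIONS[language]
    # main is required; child is an optional list of supplementary files.
    files = [runner.get("main")] + list(runner.get("child") or [])
    for file_name in files:
        if not file_name or not file_name.endswith(extension):
            problems.append("expected a {} file, got: {!r}".format(extension, file_name))
    return problems


# Hypothetical workflow definition, already loaded into a dictionary.
example = {
    "name": "example_alignment",
    "description": "Hypothetical workflow used only for illustration",
    "runner": {
        "language": "cwl",
        "main": "workflow_main.cwl",
        "child": ["step_one.cwl"],
    },
    "input": {"input_reads": {"argument_type": "file.fastq"}},
    "output": {"output_alignment": {"argument_type": "file.bam"}},
}

print(check_workflow(example))  # prints []
```

The sketch only checks the structure of the definition. It does not verify global uniqueness of the name, which depends on the objects already present in the portal.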
All the following fields are optional and provided as examples. The definition can be expanded with any field accepted by the schema, see the schemas at https://github.com/dbmi-bgm/cgap-portal/tree/master/src/encoded/schemas.
``title``: Title of the workflow.
``software``: List of software used by the workflow.
Each entry gives the name of the software and its version (either a version or a commit) in the format ``<software>@<version|commit>``.
Each entry needs to match a software that has been previously defined, see :ref:`Software <software>`.
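As an illustration of the ``<software>@<version|commit>`` format, the sketch below splits an entry into its two parts. The function name ``parse_software`` and the sample entries are hypothetical, and splitting on the last ``@`` is an assumption rather than a documented rule.

```python
def parse_software(entry):
    """Split '<software>@<version|commit>' into (name, version_or_commit)."""
    # Assumption: the last '@' separates the name from the version or commit.
    name, separator, version = entry.rpartition("@")
    if not separator or not name or not version:
        raise ValueError("expected <software>@<version|commit>, got: {!r}".format(entry))
    return name, version


print(parse_software("example_tool@1.2.0"))    # ('example_tool', '1.2.0')
print(parse_software("example_tool@0a1b2c3"))  # ('example_tool', '0a1b2c3')
```

A check of this kind only validates the syntax of an entry. Whether the software has been previously defined still has to be verified against the Software objects.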
Each input argument is defined by its name. Additional subfields need to be specified depending on the argument type.
``argument_type``: Definition of the type of the argument.
For a file argument, the argument type is defined as ``file.<format>``, where ``<format>`` is the format used by the file. ``<format>`` needs to match a file format that has been previously defined, see :ref:`File Format <file_format>`.
For a parameter argument, the argument type is defined as ``parameter.<type>``, where ``<type>`` is the type of the value expected for the argument (string, integer, float, json, or boolean).
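The two input argument types can be told apart by the prefix before the first dot. The sketch below shows one way to do this under that assumption; ``parse_input_type`` is a hypothetical helper, and it does not check that a file format has been previously defined.

```python
# Value types accepted for parameter arguments, as listed above.
PARAMETER_TYPES = {"string", "integer", "float", "json", "boolean"}


def parse_input_type(argument_type):
    """Split an input argument_type into (category, value) and validate it."""
    category, _, value = argument_type.partition(".")
    if category == "file" and value:
        return category, value
    if category == "parameter" and value in PARAMETER_TYPES:
        return category, value
    raise ValueError("invalid input argument_type: {!r}".format(argument_type))


print(parse_input_type("file.bam"))           # ('file', 'bam')
print(parse_input_type("parameter.integer"))  # ('parameter', 'integer')
```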
Each output is defined by its name. Additional subfields need to be specified depending on the output type.
``argument_type``: Definition of the type of the output.
For a file output, the argument type is defined as ``file.<format>``, where ``<format>`` is the format used by the file. ``<format>`` needs to match a file format that has been previously defined, see :ref:`File Format <file_format>`.
For a report output, the argument type is defined as ``report.<type>``, where ``<type>`` is the type of the report (e.g., file).
For a QC (Quality Control) output, the argument type is defined as ``qc.<type>``, where ``<type>`` is a ``qc_type`` defined in the portal schemas.
While custom ``qc_type`` schemas are still supported for compatibility, we introduced a new generic type, ``quality_metric_generic``.
We recommend using this new type to implement QCs.
When using ``quality_metric_generic`` as a ``qc_type``, it is possible to generate two different types of output: a JSON file of key-value pairs and a compressed file.
The JSON file can be used to create a summary report of the quality metrics generated by the QC process.
The compressed file can be used to store the original output for the QC, including additional data or graphs.
Both the JSON file and the compressed file will be attached to the file specified as target by ``argument_to_be_attached_to`` with a ``QualityMetricGeneric`` object.
The content of the JSON file will be patched directly on the object, while the compressed file will be made available for download via a link.
The output type can be specified by setting ``json: True`` or ``zipped: True`` in the QC output definition.
Template for ``quality_metric_generic``:
{
  "name": "Quality metric name",
  "qc_values": [
    {
      "key": "Name of the key",
      "tooltip": "Tooltip for the key",
      "value": "Value for the key"
    }
  ]
}
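A QC step could produce a JSON file following this template with a few lines of code. The sketch below is illustrative only: ``build_generic_qc``, the metric name, and the sample values are hypothetical, and values are written as strings simply to mirror the template.

```python
import json


def build_generic_qc(name, metrics):
    """Build a dictionary following the quality_metric_generic template.

    metrics is an iterable of (key, tooltip, value) tuples.
    """
    return {
        "name": name,
        "qc_values": [
            {"key": key, "tooltip": tooltip, "value": value}
            for key, tooltip, value in metrics
        ],
    }


# Hypothetical metrics for illustration.
report = build_generic_qc(
    "Example QC",
    [("Total reads", "Number of reads in the file", "1000")],
)

# Serialize to the key-value JSON expected for a QC output with json: True.
print(json.dumps(report, indent=2))
```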
``secondary_files``: This field can be used for output files.
List of ``<format>`` entries for the secondary files associated with the output file.
Each ``<format>`` needs to match a file format that has been previously defined, see :ref:`File Format <file_format>`.
``argument_to_be_attached_to``: This field can be used for output QCs.
Name of the output file the QC is calculated for.
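The link between a QC output and its target file can be verified before a workflow is submitted. The sketch below assumes, based on the description above, that the target must be a file output declared in the same ``output`` section. The helper ``check_qc_targets`` and the output names are hypothetical.

```python
def check_qc_targets(output):
    """Return a list of QC outputs whose target is not a declared file output."""
    file_outputs = {
        name for name, spec in output.items()
        if spec.get("argument_type", "").startswith("file.")
    }
    problems = []
    for name, spec in output.items():
        if not spec.get("argument_type", "").startswith("qc."):
            continue  # only QC outputs carry argument_to_be_attached_to
        target = spec.get("argument_to_be_attached_to")
        if target not in file_outputs:
            problems.append("{}: target {!r} is not a file output".format(name, target))
    return problems


# Hypothetical output section, already loaded into a dictionary.
outputs = {
    "output_vcf": {"argument_type": "file.vcf_gz"},
    "vcf_qc": {
        "argument_type": "qc.quality_metric_generic",
        "argument_to_be_attached_to": "output_vcf",
        "json": True,
    },
}

print(check_qc_targets(outputs))  # prints []
```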