# Workflow Description Language

## Table Of Contents

<!---toc start-->

* [Workflow Description Language](#workflow-description-language)
  * [Table Of Contents](#table-of-contents)
  * [Introduction](#introduction)
  * [State of the Specification](#state-of-the-specification)
* [Language Specification](#language-specification)
  * [Global Grammar Rules](#global-grammar-rules)
    * [Whitespace, Strings, Identifiers, Constants](#whitespace-strings-identifiers-constants)
    * [Types](#types)
    * [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers)
    * [Declarations](#declarations)
    * [Expressions](#expressions)
    * [Operator Precedence Table](#operator-precedence-table)
    * [Member Access](#member-access)
    * [Map and Array Indexing](#map-and-array-indexing)
    * [Pair Indexing](#pair-indexing)
    * [Function Calls](#function-calls)
    * [Array Literals](#array-literals)
    * [Map Literals](#map-literals)
    * [Pair Literals](#pair-literals)
  * [Document](#document)
  * [Import Statements](#import-statements)
  * [Task Definition](#task-definition)
    * [Sections](#sections)
    * [Command Section](#command-section)
      * [Command Parts](#command-parts)
      * [Command Part Options](#command-part-options)
        * [sep](#sep)
        * [true and false](#true-and-false)
        * [default](#default)
      * [Alternative heredoc syntax](#alternative-heredoc-syntax)
      * [Stripping Leading Whitespace](#stripping-leading-whitespace)
    * [Outputs Section](#outputs-section)
    * [String Interpolation](#string-interpolation)
    * [Runtime Section](#runtime-section)
      * [docker](#docker)
      * [memory](#memory)
    * [Parameter Metadata Section](#parameter-metadata-section)
    * [Metadata Section](#metadata-section)
    * [Examples](#examples)
      * [Example 1: Simplest Task](#example-1-simplest-task)
      * [Example 2: Inputs/Outputs](#example-2-inputsoutputs)
      * [Example 3: Runtime/Metadata](#example-3-runtimemetadata)
      * [Example 4: BWA mem](#example-4-bwa-mem)
      * [Example 5: Word Count](#example-5-word-count)
      * [Example 6: tmap](#example-6-tmap)
  * [Workflow Definition](#workflow-definition)
    * [Call Statement](#call-statement)
      * [Sub Workflows](#sub-workflows)
    * [Scatter](#scatter)
    * [Loops](#loops)
    * [Conditionals](#conditionals)
    * [Parameter Metadata](#parameter-metadata)
    * [Metadata](#metadata)
    * [Outputs](#outputs)
* [Namespaces](#namespaces)
* [Scope](#scope)
* [Optional Parameters & Type Constraints](#optional-parameters--type-constraints)
  * [Prepending a String to an Optional Parameter](#prepending-a-string-to-an-optional-parameter)
* [Scatter / Gather](#scatter--gather)
* [Variable Resolution](#variable-resolution)
  * [Task-Level Resolution](#task-level-resolution)
  * [Workflow-Level Resolution](#workflow-level-resolution)
* [Computing Inputs](#computing-inputs)
  * [Task Inputs](#task-inputs)
  * [Workflow Inputs](#workflow-inputs)
  * [Specifying Workflow Inputs in JSON](#specifying-workflow-inputs-in-json)
* [Type Coercion](#type-coercion)
* [Standard Library](#standard-library)
  * [File stdout()](#file-stdout)
  * [File stderr()](#file-stderr)
  * [Array\[String\] read_lines(String|File)](#arraystring-read_linesstringfile)
  * [Array\[Array\[String\]\] read_tsv(String|File)](#arrayarraystring-read_tsvstringfile)
  * [Map\[String, String\] read_map(String|File)](#mapstring-string-read_mapstringfile)
  * [Object read_object(String|File)](#object-read_objectstringfile)
  * [Array\[Object\] read_objects(String|File)](#arrayobject-read_objectsstringfile)
  * [mixed read_json(String|File)](#mixed-read_jsonstringfile)
  * [Int read_int(String|File)](#int-read_intstringfile)
  * [String read_string(String|File)](#string-read_stringstringfile)
  * [Float read_float(String|File)](#float-read_floatstringfile)
  * [Boolean read_boolean(String|File)](#boolean-read_booleanstringfile)
  * [File write_lines(Array\[String\])](#file-write_linesarraystring)
  * [File write_tsv(Array\[Array\[String\]\])](#file-write_tsvarrayarraystring)
  * [File write_map(Map\[String, String\])](#file-write_mapmapstring-string)
  * [File write_object(Object)](#file-write_objectobject)
  * [File write_objects(Array\[Object\])](#file-write_objectsarrayobject)
  * [File write_json(mixed)](#file-write_jsonmixed)
  * [Float size(File, \[String\])](#float-sizefile-string)
  * [String sub(String, String, String)](#string-substring-string-string)
  * [Array\[Int\] range(Int)](#arrayint-rangeint)
  * [Array\[Array\[X\]\] transpose(Array\[Array\[X\]\])](#arrayarrayx-transposearrayarrayx)
  * [Array\[Pair(X,Y)\] zip(Array\[X\], Array\[Y\])](#arraypairxy-ziparrayx-arrayy)
  * [Array\[Pair(X,Y)\] cross(Array\[X\], Array\[Y\])](#arraypairxy-crossarrayx-arrayy)
  * [Integer length(Array\[X\])](#integer-lengtharrayx)
  * [Array\[String\] prefix(String, Array\[X\])](#arraystring-prefixstring-arrayx)
  * [X select_first(Array\[X?\])](#x-select_firstarrayx)
  * [Array\[X\] select_all(Array\[X?\])](#arrayx-select_allarrayx)
  * [Boolean defined(X?)](#boolean-definedx)
  * [String basename(String)](#string-basenamestring)
  * [Int floor(Float), Int ceil(Float) and Int round(Float)](#int-floorfloat-int-ceilfloat-and-int-roundfloat)
* [Data Types & Serialization](#data-types--serialization)
  * [Serialization of Task Inputs](#serialization-of-task-inputs)
    * [Primitive Types](#primitive-types)
    * [Compound Types](#compound-types)
      * [Array serialization](#array-serialization)
        * [Array serialization by expansion](#array-serialization-by-expansion)
        * [Array serialization using write_lines()](#array-serialization-using-write_lines)
        * [Array serialization using write_json()](#array-serialization-using-write_json)
      * [Map serialization](#map-serialization)
        * [Map serialization using write_map()](#map-serialization-using-write_map)
        * [Map serialization using write_json()](#map-serialization-using-write_json)
      * [Object serialization](#object-serialization)
        * [Object serialization using write_object()](#object-serialization-using-write_object)
        * [Object serialization using write_json()](#object-serialization-using-write_json)
      * [Array\[Object\] serialization](#arrayobject-serialization)
        * [Array\[Object\] serialization using write_objects()](#arrayobject-serialization-using-write_objects)
        * [Array\[Object\] serialization using write_json()](#arrayobject-serialization-using-write_json)
  * [De-serialization of Task Outputs](#de-serialization-of-task-outputs)
    * [Primitive Types](#primitive-types)
    * [Compound Types](#compound-types)
      * [Array deserialization](#array-deserialization)
        * [Array deserialization using read_lines()](#array-deserialization-using-read_lines)
        * [Array deserialization using read_json()](#array-deserialization-using-read_json)
      * [Map deserialization](#map-deserialization)
        * [Map deserialization using read_map()](#map-deserialization-using-read_map)
        * [Map deserialization using read_json()](#map-deserialization-using-read_json)
      * [Object deserialization](#object-deserialization)
        * [Object deserialization using read_object()](#object-deserialization-using-read_object)
      * [Array\[Object\] deserialization](#arrayobject-deserialization)
        * [Object deserialization using read_objects()](#object-deserialization-using-read_objects)

<!---toc end-->

## Introduction

WDL is meant to be a *human readable and writable* way to express tasks and workflows. The "Hello World" tool in WDL would look like this:

```wdl
task hello {
  String pattern
  File in

  command {
    egrep '${pattern}' '${in}'
  }

  runtime {
    docker: "broadinstitute/my_image"
  }

  output {
    Array[String] matches = read_lines(stdout())
  }
}

workflow wf {
  call hello
}
```

This describes a task called `hello`, which has two parameters (`String pattern` and `File in`). A `task` definition is a way of **encapsulating a UNIX command and environment and presenting them as functions**. Tasks have both inputs and outputs. Inputs are written as declarations at the top of the `task` definition, while outputs are defined in the `output` section.

The user must provide a value for these two parameters in order for this task to be runnable. Implementations of WDL should accept their [inputs in JSON format](#specifying-workflow-inputs-in-json). For example, the above task needs values for two parameters, `String pattern` and `File in`:

|Variable          |Value    |
|------------------|---------|
|wf.hello.pattern  |^[a-z]+$ |
|wf.hello.in       |/file.txt|

Or, in JSON format:

```json
{
  "wf.hello.pattern": "^[a-z]+$",
  "wf.hello.in": "/file.txt"
}
```

Running the `wf` workflow with these parameters would yield the following command line from `call hello`:

```
egrep '^[a-z]+$' '/file.txt'
```

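The substitution step that turns the inputs into this command line can be sketched in Python. This is a minimal illustration, not how any particular engine works: it assumes each `${...}` placeholder holds a bare identifier (the spec allows full expressions), and it assumes the engine strips the `wf.hello.` prefix from the fully-qualified input names to obtain the task's own declaration names. The `instantiate` helper is hypothetical.

```python
import json
import re

# Matches ${name} where name is a bare identifier; real placeholders may hold expressions.
PLACEHOLDER = re.compile(r"\$\{\s*([A-Za-z][A-Za-z0-9_]*)\s*\}")


def instantiate(command, values):
    """Replace each ${name} with str(values[name]); a missing input raises KeyError."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), command)


inputs = json.loads('{"wf.hello.pattern": "^[a-z]+$", "wf.hello.in": "/file.txt"}')

# Hypothetical mapping from fully-qualified names to the task's own declarations.
prefix = "wf.hello."
task_values = {k[len(prefix):]: v for k, v in inputs.items() if k.startswith(prefix)}

command_line = instantiate("egrep '${pattern}' '${in}'", task_values)
print(command_line)  # egrep '^[a-z]+$' '/file.txt'
```

Because values are pasted in verbatim, an input containing a single quote would break the quoting in this command; the sketch makes no attempt to escape it.
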
A simple workflow that runs this task in parallel would look like this:

```wdl
workflow example {
  Array[File] files
  scatter(path in files) {
    call hello {input: in=path}
  }
}
```

The inputs to this workflow would be `example.files` and `example.hello.pattern`.

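Conceptually, `scatter` runs one independent `call hello` per element of `files` and gathers the results in input order. The Python sketch below imitates that with a thread pool. It is only an analogy: `run_hello` is a hypothetical stand-in that builds the command line instead of executing the task, and the file paths are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def run_hello(pattern, path):
    """Hypothetical stand-in for one `call hello`; it only builds the command line."""
    return f"egrep '{pattern}' '{path}'"


files = ["/a.txt", "/b.txt", "/c.txt"]  # made-up value for example.files
pattern = "^[a-z]+$"                     # value for example.hello.pattern

# Each element gets its own call; pool.map returns results in input order.
with ThreadPoolExecutor() as pool:
    commands = list(pool.map(lambda path: run_hello(pattern, path), files))

for command in commands:
    print(command)
```

Every shard shares the same `pattern`, which is why `example.hello.pattern` is a single workflow input while `in` is supplied per element by the scatter.
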
## State of the Specification

**17 August 2015**

* Added the concept of a fully-qualified name as well as a namespaced identifier.
* Changed task definitions to have all inputs as declarations.
* Changed command parameters (`${`...`}`) to accept expressions and fewer "declarative" elements.
  * Command parameters are also required to evaluate to primitive types.
* Added an `output` section to workflows.
* Added many functions to the standard library for serializing/deserializing WDL values.
* Specified scope, namespace, and variable resolution semantics.

# Language Specification

## Global Grammar Rules

### Whitespace, Strings, Identifiers, Constants

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

These rules are common to many of the following sections:

```
$ws = (0x20 | 0x9 | 0xD | 0xA)+
$identifier = [a-zA-Z][a-zA-Z0-9_]*
$string = "([^\\\"\n]|\\[\\"\'nrbtfav\?]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+|\\[uU]([0-9a-fA-F]{4})([0-9a-fA-F]{4})?)*"
$string = '([^\\\'\n]|\\[\\"\'nrbtfav\?]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+|\\[uU]([0-9a-fA-F]{4})([0-9a-fA-F]{4})?)*'
$boolean = 'true' | 'false'
$integer = [1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]*
$float = (([0-9]+)?\.([0-9]+)|[0-9]+\.|[0-9]+)([eE][-+]?[0-9]+)?
```

`$string` can accept the following between single or double quotes:

* Any character not in the set: `\`, `"` (or `'` for a single-quoted string), or a newline (`\n`)
* An escape sequence starting with `\`, followed by one of the following characters: `\`, `"`, `'`, `[nrbtfav]`, `?`
* An escape sequence starting with `\`, followed by 1 to 3 digits of value 0 through 7 inclusive. This specifies an octal escape code.
* An escape sequence starting with `\x`, followed by hexadecimal characters `0-9a-fA-F`. This specifies a hexadecimal escape code.
* An escape sequence starting with `\u` or `\U`, followed by either 4 or 8 hexadecimal characters `0-9a-fA-F`. This specifies a Unicode code point.

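The identifier and numeric rules translate directly into Python regular expressions, which makes it easy to probe edge cases. In this sketch `re.fullmatch` forces the whole token to match, and the identifier pattern is written as `[a-zA-Z][a-zA-Z0-9_]*` so that one-character names such as `x`, which the examples in this document use, are accepted. The `classify` helper is illustrative only.

```python
import re

IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
INTEGER = re.compile(r"[1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]*")
FLOAT = re.compile(r"(([0-9]+)?\.([0-9]+)|[0-9]+\.|[0-9]+)([eE][-+]?[0-9]+)?")


def classify(token):
    """Return the names of the rules that match the whole token."""
    kinds = []
    for name, pattern in (("identifier", IDENTIFIER), ("integer", INTEGER), ("float", FLOAT)):
        if pattern.fullmatch(token):
            kinds.append(name)
    return kinds


for token in ["x", "my_var2", "_x", "42", "0x1F", "017", "08", ".14", "3.", "1e-3"]:
    print(f"{token!r}: {classify(token)}")
```

A plain decimal literal such as `42` matches both `$integer` and `$float`, while `08` matches only `$float` (it is not a valid octal integer), so the order in which a lexer tries these rules matters; the grammar above does not state that order.
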
### Types

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

All inputs and outputs must be typed.

```
$type = ($primitive_type | $array_type | $map_type | $object_type) $type_postfix_quantifier?
$primitive_type = ('Boolean' | 'Int' | 'Float' | 'File' | 'String')
$array_type = 'Array' '[' ($primitive_type | $object_type | $array_type) ']'
$object_type = 'Object'
$map_type = 'Map' '[' $primitive_type ',' ($primitive_type | $array_type | $map_type | $object_type) ']'
$type_postfix_quantifier = '?' | '+'
```

Some examples of types:

* `File`
* `Array[File]`
* `Map[String, String]`
* `Object`

Types can also have a `$type_postfix_quantifier` (either `?` or `+`):

* `?` means that the value is optional. Any expressions that fail to evaluate because this value is missing will evaluate to the empty string.
* `+` can only be applied to `Array` types, and it signifies that the array is required to have one or more values in it.

For more details on the `$type_postfix_quantifier`, see the section on [Optional Parameters & Type Constraints](#optional-parameters--type-constraints).

For more information on types and how they are used to construct commands and define outputs of tasks, see the [Data Types & Serialization](#data-types--serialization) section.

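To make the type grammar concrete, here is a small recursive-descent parser for `$type` in Python. It is a sketch, not a reference implementation: the nested-tuple result (for example `("Map", "String", ("Array", "Int"))`) is a representation chosen only for this example, the parser follows the productions above literally (so an `Array` of `Map`, which `$array_type` does not list, is rejected), and it enforces the rule that `+` applies only to `Array` types.

```python
import re

PRIMITIVES = {"Boolean", "Int", "Float", "File", "String"}


def parse_type(text):
    """Parse a $type string into (type_tree, quantifier); raise ValueError if invalid."""
    tokens = re.findall(r"[A-Za-z]+|[\[\],?+]", text)
    pos = 0

    def expect(token):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != token:
            raise ValueError(f"expected {token!r} in {text!r}")
        pos += 1

    def base(allowed):
        # `allowed` names the productions permitted at this position.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"unexpected end of {text!r}")
        name = tokens[pos]
        pos += 1
        if name in PRIMITIVES and "primitive" in allowed:
            return name
        if name == "Object" and "object" in allowed:
            return name
        if name == "Array" and "array" in allowed:
            expect("[")
            element = base({"primitive", "object", "array"})
            expect("]")
            return ("Array", element)
        if name == "Map" and "map" in allowed:
            expect("[")
            key = base({"primitive"})
            expect(",")
            value = base({"primitive", "array", "map", "object"})
            expect("]")
            return ("Map", key, value)
        raise ValueError(f"unexpected {name!r} in {text!r}")

    tree = base({"primitive", "array", "map", "object"})
    quantifier = None
    if pos < len(tokens) and tokens[pos] in ("?", "+"):
        quantifier = tokens[pos]
        pos += 1
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in {text!r}")
    if quantifier == "+" and not (isinstance(tree, tuple) and tree[0] == "Array"):
        raise ValueError("'+' can only be applied to Array types")
    return tree, quantifier


print(parse_type("Array[File]+"))             # (('Array', 'File'), '+')
print(parse_type("Map[String, Array[Int]]"))  # (('Map', 'String', ('Array', 'Int')), None)
```

Inputs such as `File+` or `Map[Array[Int], String]` raise `ValueError`, the first because `+` needs an `Array` and the second because a map key must be a primitive type.
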
### Fully Qualified Names & Namespaced Identifiers

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$fully_qualified_name = $identifier ('.' $identifier)*
$namespaced_identifier = $identifier ('.' $identifier)*
```

A fully qualified name is the unique identifier of any particular `call` or call input or output. For example:

other.wdl
```wdl
task foobar {
  File in
  command {
    sh setup.sh ${in}
  }
  output {
    File results = stdout()
  }
}
```

main.wdl
```wdl
import "other.wdl" as other

task test {
  String my_var
  command {
    ./script ${my_var}
  }
  output {
    File results = stdout()
  }
}

workflow wf {
  Array[String] arr = ["a", "b", "c"]
  call test
  call test as test2
  call other.foobar
  output {
    test.results
    foobar.results
  }
  scatter(x in arr) {
    call test as scattered_test {
      input: my_var=x
    }
  }
}
```

The following fully-qualified names would exist within `workflow wf` in main.wdl:

* `wf` - References the top-level workflow
* `wf.test` - References the first call to task `test`
* `wf.test2` - References the second call to task `test` (aliased as `test2`)
* `wf.test.my_var` - References the `String` input of the first call to task `test`
* `wf.test.results` - References the `File` output of the first call to task `test`
* `wf.test2.my_var` - References the `String` input of the second call to task `test`
* `wf.test2.results` - References the `File` output of the second call to task `test`
* `wf.foobar.results` - References the `File` output of the call to `other.foobar`
* `wf.foobar.in` - References the `File` input of the call to `other.foobar`
* `wf.arr` - References the `Array[String]` declaration on the workflow
* `wf.scattered_test` - References the scattered version of `call test`
* `wf.scattered_test.my_var` - References an `Array[String]` holding each element used as `my_var` when running the scattered version of `call test`
* `wf.scattered_test.results` - References an `Array[File]` holding the accumulated results from scattering `call test`
* `wf.scattered_test.1.results` - References a `File` from the second invocation (0-indexed) of `call test` within the scatter block. This particular invocation used the value "b" for `my_var`.

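The per-shard names in the last entry follow a simple pattern: the zero-based shard index sits between the call alias and the input or output name. The Python sketch below enumerates those names for the scatter over `arr` and builds the gathered whole-call view; it only illustrates the naming scheme and assumes nothing about how an engine stores the values.

```python
arr = ["a", "b", "c"]  # wf.arr from main.wdl

# One shard per element; the shard index is zero-based.
shard_inputs = {f"wf.scattered_test.{i}.my_var": x for i, x in enumerate(arr)}

# Gathered view: the whole-call name refers to an array holding every shard's value.
gathered = {
    "wf.scattered_test.my_var": [shard_inputs[f"wf.scattered_test.{i}.my_var"] for i in range(len(arr))]
}

for name, value in {**shard_inputs, **gathered}.items():
    print(name, "=", value)
```
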
A namespaced identifier has the same syntax as a fully-qualified name. It is interpreted as the left-hand side being the name of a namespace and the right-hand side being the name of a workflow, task, or namespace within that namespace. Consider this workflow:

```wdl
import "other.wdl" as ns
workflow wf {
  call ns.ns2.task
}
```

Here, `ns.ns2.task` is a namespaced identifier (see the [Call Statement](#call-statement) section for more details). Namespaced identifiers, like fully-qualified names, are left-associative, which means `ns.ns2.task` is interpreted as `((ns.ns2).task)`. Therefore `ns.ns2` has to resolve to a namespace so that `.task` can be applied. If `ns2` were a task definition within `ns`, then this namespaced identifier would be invalid.

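Left-to-right resolution can be sketched in a few lines of Python. Namespaces are modeled as nested dictionaries, and anything that is not a dictionary (here the marker string `"task"`) is treated as a leaf definition. `resolve` is an illustrative helper, not part of any WDL implementation, and the task is named `mytask` rather than `task` only to keep the example readable.

```python
def resolve(identifier, root):
    """Resolve a dotted identifier left to right; every prefix must be a namespace."""
    current = root
    consumed = []
    for part in identifier.split("."):
        if not isinstance(current, dict):
            raise LookupError(f"{'.'.join(consumed)!r} is not a namespace")
        if part not in current:
            scope = ".".join(consumed) or "<root>"
            raise LookupError(f"{part!r} not found in {scope}")
        current = current[part]
        consumed.append(part)
    return current


# `ns` is an imported file that itself imports another file as `ns2`.
valid = {"ns": {"ns2": {"mytask": "task"}}}
print(resolve("ns.ns2.mytask", valid))  # task

# Here `ns2` is a task inside `ns`, so `.mytask` cannot be applied to it.
invalid = {"ns": {"ns2": "task"}}
try:
    resolve("ns.ns2.mytask", invalid)
except LookupError as err:
    print(err)  # 'ns.ns2' is not a namespace
```
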
### Declarations

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$declaration = $type $identifier ('=' $expression)?
```

Declarations appear at the top of any [scope](#scope).

In a [task definition](#task-definition), declarations are interpreted as inputs to the task that are not part of the command line itself.

If a declaration does not have an initialization, then the value is expected to be provided by the user before the workflow or task is run.

Some examples of declarations:

* `File x`
* `String y = "abc"`
* `Float pi = 3 + .14`
* `Map[String, String] m`

A declaration may also refer to elements that are outputs of tasks. For example:

```wdl
task test {
  String var
  command {
    ./script ${var}
  }
  output {
    String value = read_string(stdout())
  }
}

task test2 {
  Array[String] array
  command {
    ./script ${write_lines(array)}
  }
  output {
    Int value = read_int(stdout())
  }
}

workflow wf {
  call test as x {input: var="x"}
  call test as y {input: var="y"}
  Array[String] strs = [x.value, y.value]
  call test2 as z {input: array=strs}
}
```

In this case, `strs` is undefined until both `call test as x` and `call test as y` have completed successfully. If either of the two tasks fails, then evaluating `strs` should return an error to indicate that the `call test2 as z` operation should be skipped.

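The rule for `strs` amounts to a three-way check on its upstream calls: failed, still pending, or complete. The Python sketch below models that check. The `results` dictionary (call alias mapped to an output dictionary, `None` while pending, or an exception on failure) and the `evaluate_declaration` helper are assumptions made for illustration, not an engine API.

```python
def evaluate_declaration(deps, results, build):
    """Return the value, None while it is still undefined, or raise if an upstream call failed."""
    for alias in deps:
        if isinstance(results.get(alias), Exception):
            raise RuntimeError(f"call {alias!r} failed; skip downstream calls") from results[alias]
    if any(results.get(alias) is None for alias in deps):
        return None  # at least one upstream call has not finished yet
    return build(results)


def build_strs(results):
    # Array[String] strs = [x.value, y.value]
    return [results["x"]["value"], results["y"]["value"]]


print(evaluate_declaration(["x", "y"], {"x": {"value": "1"}}, build_strs))  # None
print(evaluate_declaration(["x", "y"], {"x": {"value": "1"}, "y": {"value": "2"}}, build_strs))  # ['1', '2']
```

Failures are checked before pending calls so that a failed `y` is reported even while `x` is still running.
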
### Expressions

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$expression = '(' $expression ')'
$expression = $expression '.' $expression
$expression = $expression '[' $expression ']'
$expression = $expression '(' ($expression (',' $expression)*)? ')'
$expression = '!' $expression
$expression = '+' $expression
$expression = '-' $expression
$expression = 'if' $expression 'then' $expression 'else' $expression
$expression = $expression '*' $expression
$expression = $expression '%' $expression
$expression = $expression '/' $expression
$expression = $expression '+' $expression
$expression = $expression '-' $expression
$expression = $expression '<' $expression
$expression = $expression '<=' $expression
$expression = $expression '>' $expression
$expression = $expression '>=' $expression
$expression = $expression '==' $expression
$expression = $expression '!=' $expression
$expression = $expression '&&' $expression
$expression = $expression '||' $expression
$expression = '{' ($expression ':' $expression (',' $expression ':' $expression)*)? '}'
$expression = '[' ($expression (',' $expression)*)? ']'
$expression = $string | $integer | $float | $boolean | $identifier
```

Below are the valid results for operators on types. Any combination not in the list will result in an error.

|LHS Type |Operators |RHS Type |Result |Semantics|
|-----------|-----------|-----------------|---------|---------|
|`Boolean`|`==`|`Boolean`|`Boolean`||
|`Boolean`|`!=`|`Boolean`|`Boolean`||
|`Boolean`|`>`|`Boolean`|`Boolean`||
|`Boolean`|`>=`|`Boolean`|`Boolean`||
|`Boolean`|`<`|`Boolean`|`Boolean`||
|`Boolean`|`<=`|`Boolean`|`Boolean`||
|`Boolean`|`\|\|`|`Boolean`|`Boolean`||
|`Boolean`|`&&`|`Boolean`|`Boolean`||
|`File`|`+`|`File`|`File`|Append file paths|
|`File`|`==`|`File`|`Boolean`||
|`File`|`!=`|`File`|`Boolean`||
|`File`|`+`|`String`|`File`||
|`File`|`==`|`String`|`Boolean`||
|`File`|`!=`|`String`|`Boolean`||
|`Float`|`+`|`Float`|`Float`||
|`Float`|`-`|`Float`|`Float`||
|`Float`|`*`|`Float`|`Float`||
|`Float`|`/`|`Float`|`Float`||
|`Float`|`%`|`Float`|`Float`||
|`Float`|`==`|`Float`|`Boolean`||
|`Float`|`!=`|`Float`|`Boolean`||
|`Float`|`>`|`Float`|`Boolean`||
|`Float`|`>=`|`Float`|`Boolean`||
|`Float`|`<`|`Float`|`Boolean`||
|`Float`|`<=`|`Float`|`Boolean`||
|`Float`|`+`|`Int`|`Float`||
|`Float`|`-`|`Int`|`Float`||
|`Float`|`*`|`Int`|`Float`||
|`Float`|`/`|`Int`|`Float`||
|`Float`|`%`|`Int`|`Float`||
|`Float`|`==`|`Int`|`Boolean`||
|`Float`|`!=`|`Int`|`Boolean`||
|`Float`|`>`|`Int`|`Boolean`||
|`Float`|`>=`|`Int`|`Boolean`||
|`Float`|`<`|`Int`|`Boolean`||
|`Float`|`<=`|`Int`|`Boolean`||
|`Float`|`+`|`String`|`String`||
|`Int`|`+`|`Float`|`Float`||
|`Int`|`-`|`Float`|`Float`||
|`Int`|`*`|`Float`|`Float`||
|`Int`|`/`|`Float`|`Float`||
|`Int`|`%`|`Float`|`Float`||
|`Int`|`==`|`Float`|`Boolean`||
|`Int`|`!=`|`Float`|`Boolean`||
|`Int`|`>`|`Float`|`Boolean`||
|`Int`|`>=`|`Float`|`Boolean`||
|`Int`|`<`|`Float`|`Boolean`||
|`Int`|`<=`|`Float`|`Boolean`||
|`Int`|`+`|`Int`|`Int`||
|`Int`|`-`|`Int`|`Int`||
|`Int`|`*`|`Int`|`Int`||
|`Int`|`/`|`Int`|`Int`|Integer division|
|`Int`|`%`|`Int`|`Int`|Remainder of integer division|
|`Int`|`==`|`Int`|`Boolean`||
|`Int`|`!=`|`Int`|`Boolean`||
|`Int`|`>`|`Int`|`Boolean`||
|`Int`|`>=`|`Int`|`Boolean`||
|`Int`|`<`|`Int`|`Boolean`||
|`Int`|`<=`|`Int`|`Boolean`||
|`Int`|`+`|`String`|`String`||
|`String`|`+`|`Float`|`String`||
|`String`|`+`|`Int`|`String`||
|`String`|`+`|`String`|`String`||
|`String`|`==`|`String`|`Boolean`||
|`String`|`!=`|`String`|`Boolean`||
|`String`|`>`|`String`|`Boolean`||
|`String`|`>=`|`String`|`Boolean`||
|`String`|`<`|`String`|`Boolean`||
|`String`|`<=`|`String`|`Boolean`||
||`-`|`Float`|`Float`||
||`+`|`Float`|`Float`||
||`-`|`Int`|`Int`||
||`+`|`Int`|`Int`||
||`!`|`Boolean`|`Boolean`||

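A type checker can treat the table as a lookup keyed on `(LHS type, operator, RHS type)`, where a missing key is a type error. The Python sketch below encodes only a handful of rows for brevity and represents the empty LHS of unary operators as `None`; `RESULT_TYPES` and `result_type` are illustrative names, not part of any WDL tooling.

```python
# A few rows of the operator table above, keyed by (LHS type, operator, RHS type).
RESULT_TYPES = {
    ("Int", "+", "Int"): "Int",
    ("Int", "/", "Int"): "Int",          # integer division
    ("Int", "+", "Float"): "Float",
    ("Float", "*", "Int"): "Float",
    ("Int", "+", "String"): "String",
    ("String", "+", "String"): "String",
    ("File", "+", "String"): "File",
    ("Int", "<", "Int"): "Boolean",
    ("Boolean", "&&", "Boolean"): "Boolean",
    (None, "!", "Boolean"): "Boolean",   # unary operators have no LHS
}


def result_type(lhs, op, rhs):
    """Look up the result type; combinations missing from the table are type errors."""
    try:
        return RESULT_TYPES[(lhs, op, rhs)]
    except KeyError:
        raise TypeError(f"no rule for {lhs} {op} {rhs}") from None


print(result_type("Int", "+", "Float"))  # Float
try:
    result_type("Boolean", "+", "Int")
except TypeError as err:
    print(err)  # no rule for Boolean + Int
```
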
#### If then else

This is an operator that takes three arguments: a condition expression, an if-true expression, and an if-false expression. The condition is always evaluated. If the condition is true, then the if-true expression is evaluated and returned. If the condition is false, then the if-false expression is evaluated and returned. Both branches should have the same type; otherwise the type of the result depends on which branch is taken, and runtime problems might occur.

Examples:

- Choose whether to say "good morning" or "good afternoon":
  ```
  Boolean morning = ...
  String greeting = "good " + if morning then "morning" else "afternoon"
  ```
- Choose how much memory to use for a task:
  ```
  Int array_length = length(array)
  runtime {
    memory: if array_length > 100 then "16GB" else "8GB"
  }
  ```

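The key property is that only the selected branch is evaluated. The Python sketch below models each branch as a zero-argument function, an assumption of the sketch standing in for an unevaluated expression, so an error in the branch that is not taken never surfaces. The `array` value is made up for the memory example.

```python
def if_then_else(condition, if_true, if_false):
    """Evaluate the condition, then evaluate only the branch it selects."""
    return if_true() if condition() else if_false()


def fail():
    raise RuntimeError("this branch should never be evaluated")


array = list(range(150))  # hypothetical input with more than 100 elements
memory = if_then_else(lambda: len(array) > 100, lambda: "16GB", fail)
print(memory)  # 16GB
```
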
### Operator Precedence Table

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

| Precedence | Operator type         | Associativity | Example              |
|------------|-----------------------|---------------|----------------------|
| 12         | Grouping              | n/a           | (x)                  |
| 11         | Member Access         | left-to-right | x.y                  |
| 10         | Index                 | left-to-right | x[y]                 |
| 9          | Function Call         | left-to-right | x(y,z,...)           |
| 8          | Logical NOT           | right-to-left | !x                   |
|            | Unary Plus            | right-to-left | +x                   |
|            | Unary Negation        | right-to-left | -x                   |
| 7          | Multiplication        | left-to-right | x*y                  |
|            | Division              | left-to-right | x/y                  |
|            | Remainder             | left-to-right | x%y                  |
| 6          | Addition              | left-to-right | x+y                  |
|            | Subtraction           | left-to-right | x-y                  |
| 5          | Less Than             | left-to-right | x<y                  |
|            | Less Than Or Equal    | left-to-right | x<=y                 |
|            | Greater Than          | left-to-right | x>y                  |
|            | Greater Than Or Equal | left-to-right | x>=y                 |
| 4          | Equality              | left-to-right | x==y                 |
|            | Inequality            | left-to-right | x!=y                 |
| 3          | Logical AND           | left-to-right | x&&y                 |
| 2          | Logical OR            | left-to-right | x\|\|y               |
| 1          | Assignment            | right-to-left | x=y                  |

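For the binary operators, the table can drive a precedence-climbing parser directly. The Python sketch below handles only precedence levels 2 through 7 plus parentheses (no unary operators, member access, indexing, or calls) and returns nested `(operator, left, right)` tuples. `PRECEDENCE`, `tokenize`, and `parse` are names invented for this example.

```python
import re

# Binary operators from the table: a higher number binds more tightly.
PRECEDENCE = {
    "||": 2, "&&": 3,
    "==": 4, "!=": 4,
    "<": 5, "<=": 5, ">": 5, ">=": 5,
    "+": 6, "-": 6,
    "*": 7, "/": 7, "%": 7,
}

TOKEN = re.compile(r"\s*(\|\||&&|==|!=|<=|>=|[<>+\-*/%()]|[A-Za-z_][A-Za-z0-9_]*|\d+)")


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(src):
    tokens = tokenize(src)
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = expression(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("missing ')'")
            pos += 1
            return node
        return token

    def expression(min_prec):
        nonlocal pos
        left = atom()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # Parsing the right side one level higher makes the operator left-to-right.
            right = expression(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = expression(1)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree


print(parse("1 + 2 * 3"))    # ('+', '1', ('*', '2', '3'))
print(parse("a - b - c"))    # ('-', ('-', 'a', 'b'), 'c')
print(parse("a || b && c"))  # ('||', 'a', ('&&', 'b', 'c'))
```
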
### Member Access

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The syntax `x.y` refers to member access. `x` must be an object or a task call in a workflow. A call can be thought of as an object whose attributes are the outputs of the called task.

```wdl
workflow wf {
  Object obj
  Object foo

  # This would cause a syntax error,
  # because foo is defined twice in the same namespace.
  call foo {
    input: var=obj.attr # Object attribute
  }

  call foo as foo2 {
    input: var=foo.out # Task output
  }
}
```

### Map and Array Indexing

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The syntax `x[y]` is for indexing maps and arrays. If `x` is an array, then `y` must evaluate to an integer. If `x` is a map, then `y` must evaluate to a key in that map.

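The two rules can be sketched as a single Python function over lists (standing in for arrays) and dictionaries (standing in for maps). The sketch assumes zero-based array indices, which matches the 0-indexed scatter invocations described in the fully-qualified names section but is not stated for `x[y]` itself, and it treats an out-of-range index or a missing key as an error. `index` is a hypothetical helper.

```python
def index(x, y):
    """Evaluate x[y] for an array (list) or a map (dict)."""
    if isinstance(x, list):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(y, int) or isinstance(y, bool):
            raise TypeError("an array index must evaluate to an Int")
        if not 0 <= y < len(x):
            raise IndexError(f"index {y} is out of range for an array of length {len(x)}")
        return x[y]
    if isinstance(x, dict):
        if y not in x:
            raise KeyError(f"{y!r} is not a key in the map")
        return x[y]
    raise TypeError("only arrays and maps can be indexed")


print(index(["a", "b", "c"], 1))     # b
print(index({"a": 1, "b": 2}, "b"))  # 2
```
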
### Pair Indexing

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given a Pair `x`, the left and right elements of that pair can be accessed using the syntax `x.left` and `x.right`.

### Function Calls

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Function calls, in the form of `func(p1, p2, p3, ...)`, are either [standard library functions](#standard-library) or engine-defined functions.

In the current iteration of the spec, users cannot define their own functions.

### Array Literals

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Array values can be specified using Python-like syntax, as follows:

```
Array[String] a = ["a", "b", "c"]
Array[Int] b = [0,1,2]
```

### Map Literals

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Map values can be specified using a similar Python-like syntax:

```
Map[Int, Int] m1 = {1: 10, 2: 11}
Map[String, Int] m2 = {"a": 1, "b": 2}
```

### Pair Literals

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Pair values can be specified using a Python-like tuple syntax, as follows:

```
Pair[Int, String] twenty_threes = (23, "twenty-three")
```

Pair values can also be specified within the [workflow inputs JSON](https://github.com/broadinstitute/wdl/blob/develop/SPEC.md#specifying-workflow-inputs-in-json) as a JSON object with a `Left` and a `Right` value. For example, given a workflow `wf_hello` with a workflow-level declaration `twenty_threes`, the value could be specified in the workflow inputs JSON as follows:

```
{
  "wf_hello.twenty_threes": { "Left": 23, "Right": "twenty-three" }
}
```

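To make the JSON form concrete, here is a minimal Python sketch of how an engine *might* turn such an input value into a pair. The helper name `json_to_pair` is hypothetical, and modelling a WDL `Pair` as a Python tuple is an assumption made only for illustration.

```python
import json


def json_to_pair(value):
    """Convert a JSON object {"Left": ..., "Right": ...} into a (left, right) tuple.

    Hypothetical helper: a Python tuple stands in for a WDL Pair here.
    """
    if not isinstance(value, dict) or set(value) != {"Left", "Right"}:
        raise ValueError("a Pair must be a JSON object with exactly the keys 'Left' and 'Right'")
    return (value["Left"], value["Right"])


inputs = json.loads('{"wf_hello.twenty_threes": {"Left": 23, "Right": "twenty-three"}}')
twenty_threes = json_to_pair(inputs["wf_hello.twenty_threes"])
print(twenty_threes)  # (23, 'twenty-three')
```

Rejecting objects with extra keys is a choice of this sketch; the text above does not say how engines should treat them.
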
## Document

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$document = ($import | $task | $workflow)+
```

`$document` is the root of the parse tree; it consists of one or more import statements, task definitions, or workflow definitions.

## Import Statements

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

A WDL file may contain import statements to include WDL code from other sources.

```
$import = 'import' $ws+ $string ($ws+ 'as' $ws+ $identifier)?
```

The import statement specifies a `$string`, which is interpreted as a URI pointing to a WDL file. The engine is responsible for resolving the URI and downloading the contents. The contents of the document at each URI must be WDL source code.

Every imported WDL file requires a namespace, which can be specified using an identifier (via the `as $identifier` syntax). If you do not explicitly specify a namespace identifier, then the default namespace is the filename of the imported WDL, minus the `.wdl` extension. For all imported WDL files, the tasks and workflows imported from that file will only be accessible through that assigned [namespace](#namespaces).

```wdl
import "http://example.com/lib/analysis_tasks" as analysis
import "http://example.com/lib/stdlib"

workflow wf {
  File bam_file

  # file_size is from "http://example.com/lib/stdlib"
  call stdlib.file_size {
    input: file=bam_file
  }
  call analysis.my_analysis_task {
    input: size=file_size.bytes, file=bam_file
  }
}
```

Engines should, at the very least, support the following protocols for import URIs:

* `http://` and `https://`
* `file://`
* no protocol (which should be interpreted as `file://`)

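The namespace and protocol rules above can be sketched in Python as follows. `resolve_import` is a hypothetical helper, not part of any engine's API; it only classifies the URI and derives the namespace, without fetching anything. Rejecting protocols other than the three listed is a simplification, since engines may support more.

```python
import posixpath
from urllib.parse import urlparse

# The minimum set of protocols listed above; real engines may accept more.
SUPPORTED_SCHEMES = {"http", "https", "file"}


def resolve_import(uri, alias=None):
    """Return (scheme, namespace) for a WDL import URI (hypothetical helper)."""
    parsed = urlparse(uri)
    scheme = parsed.scheme or "file"  # no protocol is interpreted as file://
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported import protocol: {scheme}")
    if alias is not None:
        return scheme, alias  # explicit `as $identifier`
    # Default namespace: the file name without its .wdl extension.
    filename = posixpath.basename(parsed.path)
    if filename.endswith(".wdl"):
        filename = filename[: -len(".wdl")]
    return scheme, filename


print(resolve_import("http://example.com/lib/analysis_tasks", alias="analysis"))  # ('http', 'analysis')
print(resolve_import("http://example.com/lib/stdlib"))  # ('http', 'stdlib')
print(resolve_import("sub_wdl.wdl"))  # ('file', 'sub_wdl')
```
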
## Task Definition

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

A task is a declarative construct with a focus on constructing a command from a template. The command specification is interpreted in an engine-specific way, though a typical case is that a command is a UNIX command line which would be run in a Docker image.

Tasks also define their outputs, which is essential for building dependencies between tasks. Any other data specified in the task definition (e.g. runtime information and metadata) is optional.

```
$task = 'task' $ws+ $identifier $ws* '{' $ws* $declaration* $task_sections $ws* '}'
```

For example, `task name { ... }`. The task's declarations and sections are defined inside the curly braces.

### Sections

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The task has one or more sections:

```
$task_sections = ($command | $runtime | $task_output | $parameter_meta | $meta)+
```

> *Additional requirement*: Exactly one `$command` section needs to be defined, preferably as the first section.

### Command Section

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$command = 'command' $ws* '{' (0xA | 0xD)* $command_part+ $ws+ '}'
$command = 'command' $ws* '<<<' (0xA | 0xD)* $command_part+ $ws+ '>>>'
```

A command is a *task section* that starts with the keyword `command` and is enclosed in curly braces or `<<<` `>>>`. The body of the command specifies the literal command line to run, with placeholders (`$command_part_var`) for the parts of the command line that need to be filled in.

#### Command Parts

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$command_part = $command_part_string | $command_part_var
$command_part_string = ^'${'+
$command_part_var = '${' $var_option* $expression '}'
```

The parser should read characters from the command line until it reaches a `${` character sequence. The characters read up to that point are interpreted as a literal string (`$command_part_string`).

The parser should interpret any expression enclosed in `${`...`}` as a `$command_part_var`.

The `$expression` usually references declarations at the task level. For example:

```wdl
task test {
  String flags
  command {
    ps ${flags}
  }
}
```

In this case, `flags` within the `${`...`}` is an expression. The `$expression` can also be more complex, such as a function call: `write_lines(some_array_value)`.

> **NOTE**: the `$expression` in this context can only evaluate to a primitive type (e.g. not `Array`, `Map`, or `Object`). The only exception to this rule is when `sep` is specified as one of the `$var_option` fields.

As another example, consider how the parser would parse the following command:

```
grep '${start}...${end}' ${input}
```

This command would be parsed as:

* `grep '` - command_part_string
* `${start}` - command_part_var
* `...` - command_part_string
* `${end}` - command_part_var
* `' ` - command_part_string
* `${input}` - command_part_var

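The splitting rule above can be sketched in Python. `split_command_parts` is a hypothetical helper that returns `("string", text)` and `("var", expression)` tuples; it deliberately assumes that an expression never contains a `}` character, which a real parser would have to handle.

```python
def split_command_parts(command):
    """Split a command template into ("string", text) and ("var", expr) parts.

    Simplified sketch: assumes no '}' appears inside a ${...} expression.
    """
    parts = []
    pos = 0
    while pos < len(command):
        start = command.find("${", pos)
        if start == -1:
            # No more placeholders: the rest is a literal command_part_string.
            parts.append(("string", command[pos:]))
            break
        if start > pos:
            parts.append(("string", command[pos:start]))
        end = command.find("}", start + 2)
        if end == -1:
            raise ValueError("unterminated ${ in command")
        # Everything between ${ and } (including any var options) is the var part.
        parts.append(("var", command[start + 2:end]))
        pos = end + 1
    return parts


parts = split_command_parts("grep '${start}...${end}' ${input}")
for kind, text in parts:
    print(f"command_part_{kind}: {text!r}")
```
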
#### Command Part Options

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$var_option = $var_option_key $ws* '=' $ws* $var_option_value
$var_option_key = 'sep' | 'true' | 'false' | 'quote' | 'default'
$var_option_value = $expression
```

The `$var_option` is a set of key-value pairs for any additional and less-used options that need to be set on a parameter.

##### sep

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`sep` is interpreted as the separator string used to join the elements of an array together. `sep` is only valid if the expression evaluates to an `Array`.

For example, given a declaration `Array[Int] numbers = [1,2,3]`, the command `python script.py ${sep=',' numbers}` would yield the command line:

```
python script.py 1,2,3
```

Alternatively, if the command were `python script.py ${sep=' ' numbers}`, it would evaluate to:

```
python script.py 1 2 3
```

> *Additional Requirements*:
>
> 1. `sep` MUST accept only a string as its value

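A minimal Python sketch of the `sep` behaviour described above. `render_sep` is a hypothetical name, and serializing each element with `str()` is an assumption that only holds for simple values such as integers and strings.

```python
def render_sep(values, sep):
    """Join the elements of an array with `sep`, as ${sep='...' expr} does."""
    if not isinstance(sep, str):
        raise TypeError("sep MUST be a string")
    if not isinstance(values, list):
        raise TypeError("sep is only valid when the expression is an Array")
    # str() is a simplifying assumption about how elements are serialized.
    return sep.join(str(v) for v in values)


numbers = [1, 2, 3]
print("python script.py " + render_sep(numbers, ","))  # python script.py 1,2,3
print("python script.py " + render_sep(numbers, " "))  # python script.py 1 2 3
```
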
##### true and false

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`true` and `false` are only used with `Boolean` values; they specify what the parameter evaluates to when the value is true or false, respectively.

For example, given a declaration `Boolean yes_or_no`, `${true='--enable-foo' false='--disable-foo' yes_or_no}` would evaluate to either `--enable-foo` or `--disable-foo`, depending on the value of `yes_or_no`.

If either value is left out, it is equivalent to specifying the empty string. If the parameter is `${true='--enable-foo' yes_or_no}` and `yes_or_no` is false, then the parameter evaluates to the empty string.

> *Additional Requirement*:
>
> 1. `true` and `false` values MUST be strings.
> 2. `true` and `false` are only allowed if the type is `Boolean`

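The following Python sketch mirrors the `true`/`false` rules. The function `render_boolean` is hypothetical; its keyword arguments are deliberately named after the WDL options (they are not reserved words in Python).

```python
def render_boolean(value, true="", false=""):
    """Evaluate ${true='...' false='...' expr} for a Boolean `value`.

    An omitted option defaults to the empty string, matching the text above.
    """
    if not isinstance(value, bool):
        raise TypeError("true/false options are only allowed for Boolean values")
    if not isinstance(true, str) or not isinstance(false, str):
        raise TypeError("true and false values MUST be strings")
    return true if value else false


print(render_boolean(True, true="--enable-foo", false="--disable-foo"))  # --enable-foo
print(repr(render_boolean(False, true="--enable-foo")))  # ''
```
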
##### default

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`default` specifies the value to use if no other value is specified for this parameter.

```
task default_test {
  String? s
  command {
    ./my_cmd ${default="foobar" s}
  }
}
```

This task takes an optional `String` parameter; if a value is not specified, the value `foobar` is used instead.

> *Additional Requirements*:
>
> 1. The type of the expression must match the type of the parameter
> 2. If 'default' is specified, the `$type_postfix_quantifier` for the variable's type MUST be `?`

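As a sketch under the assumption that Python's `None` represents an optional input with no value, the `default` option in `task default_test` behaves like this hypothetical `render_with_default` helper:

```python
def render_with_default(value, default):
    """Evaluate ${default=... expr}: fall back to `default` when no value is given.

    None stands in for an unset optional (e.g. String?) input; that mapping
    is an assumption of this sketch.
    """
    if value is None:
        return default
    if type(value) is not type(default):
        raise TypeError("the default must have the same type as the parameter")
    return value


print("./my_cmd " + render_with_default(None, "foobar"))  # ./my_cmd foobar
print("./my_cmd " + render_with_default("baz", "foobar"))  # ./my_cmd baz
```
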
#### Alternative heredoc syntax

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Sometimes a command is long enough, or uses enough `{` characters, that a different set of delimiters would make it clearer. In this case, enclose the command in `<<<`...`>>>`, as follows:

```wdl
task heredoc {
  File in

  command<<<
  python <<CODE
  with open("${in}") as fp:
    for line in fp:
      if not line.startswith('#'):
        print(line.strip())
  CODE
  >>>
}
```

This command should be parsed in the same way as described in the previous sections.

#### Stripping Leading Whitespace

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

After the command is instantiated, all *common leading whitespace* should be removed from the text of the `command` section. In the `task heredoc` example in the previous section, if the user specifies `/path/to/file` as the value for `File in`, then the command should be:

```
python <<CODE
with open("/path/to/file") as fp:
  for line in fp:
    if not line.startswith('#'):
      print(line.strip())
CODE
```

The two spaces of indentation common to every line were removed.

If the user mixes tabs and spaces, the behavior is undefined. A warning is suggested, and perhaps a convention of 4 spaces per tab. Other implementations might return an error in this case.

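Python's standard `textwrap.dedent` function removes common leading whitespace and can illustrate the rule above. It is used here only as an illustration, not as the behaviour any engine is required to match; in particular, `dedent` treats a tab and a space as different characters, which is just one possible answer to the undefined mixed-indentation case.

```python
import textwrap

# The heredoc command body after ${in} has been replaced with
# /path/to/file; every line still carries two leading spaces.
instantiated = (
    "  python <<CODE\n"
    '  with open("/path/to/file") as fp:\n'
    "    for line in fp:\n"
    "      if not line.startswith('#'):\n"
    "        print(line.strip())\n"
    "  CODE\n"
)

# dedent removes the whitespace prefix shared by all non-blank lines.
command = textwrap.dedent(instantiated)
print(command)
```
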
### Outputs Section

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The outputs section defines which of the files and values should be exported after a successful run of this tool.

```
$task_output = 'output' $ws* '{' ($ws* $task_output_kv $ws*)* '}'
$task_output_kv = $type $identifier $ws* '=' $ws* $expression
```

The outputs section contains typed variable declarations, each bound to an expression that produces the value it exports.

The left-hand side of the equals sign defines the type and name of the output.

The right-hand side is an expression that computes the output's value, typically by reading a file that the command produced.

For example, if a task's output section looks like this:

```
output {
  Int threshold = read_int("threshold.txt")
}
```

Then the task expects a file called `threshold.txt` in the working directory where the task was executed. That file must contain exactly one line consisting only of an integer and optional whitespace. See the [Data Types & Serialization](#data-types--serialization) section for more details.

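A minimal Python sketch of the file format that `read_int` expects, as described above. This is not an engine's implementation; the error handling is an assumption, and `threshold.txt` is written to a temporary directory only for the demonstration.

```python
import os
import tempfile


def read_int(path):
    """Read a file containing a single line with an integer and optional whitespace."""
    with open(path) as fh:
        text = fh.read().strip()
    if not text or "\n" in text:
        raise ValueError(f"{path}: expected exactly one line containing an integer")
    return int(text)  # raises ValueError if the line is not an integer


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "threshold.txt")
    with open(path, "w") as fh:
        fh.write("  42 \n")
    print(read_int(path))  # 42
```
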
The filename strings may also reference variables (see the [String Interpolation](#string-interpolation) section below for more details):

```
output {
  Array[String] quality_scores = read_lines("${sample_id}.scores.txt")
}
```

If this is the case, then `sample_id` is considered an input to the task.

As with inputs, outputs can reference previous outputs in the same block. The only requirement is that the output being referenced must be specified *before* the output that uses it.

```
output {
  String a = "a"
  String ab = a + "b"
}
```

Globs can be used to define outputs that contain many files. The `glob` function generates an array of `File` outputs:

```
output {
  Array[File] output_bams = glob("*.bam")
}
```

### String Interpolation

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Within tasks, any string literal can use string interpolation to access the value of any of the task's inputs. The most obvious example of this is defining an output file whose name is a function of the task's inputs. For example:

```wdl
task example {
  String prefix
  File bam
  command {
    python analysis.py --prefix=${prefix} ${bam}
  }
  output {
    File analyzed = "${prefix}.out"
    File bam_sibling = "${bam}.suffix"
  }
}
```

Any `${identifier}` inside of a string literal must be replaced with the value of the identifier. If `prefix` were specified as `foobar`, then `"${prefix}.out"` would evaluate to `"foobar.out"`.

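The substitution rule can be sketched with Python's `re` module. The helper `interpolate` and the `/data/sample.bam` value are hypothetical, and the sketch only handles plain identifiers inside `${...}`, not arbitrary expressions.

```python
import re

# ${identifier} placeholders; only plain identifiers are handled here.
PLACEHOLDER = re.compile(r"\$\{\s*([A-Za-z][A-Za-z0-9_]*)\s*\}")


def interpolate(template, env):
    """Replace each ${identifier} in `template` with str(env[identifier])."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined identifier in string: {name}")
        return str(env[name])
    return PLACEHOLDER.sub(lookup, template)


env = {"prefix": "foobar", "bam": "/data/sample.bam"}
print(interpolate("${prefix}.out", env))  # foobar.out
print(interpolate("${bam}.suffix", env))  # /data/sample.bam.suffix
```
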
### Runtime Section

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$runtime = 'runtime' $ws* '{' ($ws* $runtime_kv $ws*)* '}'
$runtime_kv = $identifier $ws* ':' $ws* $expression
```

The runtime section defines key/value pairs for runtime information needed for this task. Individual backends define which keys they inspect, so a key/value pair may or may not actually be honored, depending on how the task is run.

Values can be any expression, and it is up to the engine to reject keys and/or values that do not make sense in that context. For example, consider the following WDL:

```wdl
task test {
  command {
    python script.py
  }
  runtime {
    docker: ["ubuntu:latest", "broadinstitute/scala-baseimage"]
  }
}
```

The value for the `docker` runtime attribute in this case is an array of values. The parser should accept this. Some engines might interpret it as "either this image or that image", while others could reject it outright.

Since values are expressions, they can also reference variables in the task:

```wdl
task test {
  String ubuntu_version

  command {
    python script.py
  }
  runtime {
    docker: "ubuntu:" + ubuntu_version
  }
}
```

Most key/value pairs are arbitrary. However, the following keys have recommended conventions:

#### docker

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Location of a Docker image in which this task ought to be run. This can have a format like `ubuntu:latest` or `broadinstitute/scala-baseimage`, in which case it should be interpreted as an image on DockerHub (i.e. it is valid to use in a `docker pull` command).

```wdl
task docker_test {
  String arg

  command {
    python process.py ${arg}
  }
  runtime {
    docker: "ubuntu:latest"
  }
}
```

#### memory

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Memory requirements for this task. Two kinds of values are supported for this attribute:

* `Int` - Interpreted as bytes
* `String` - This should be a decimal value with suffixes like `B`, `KB`, `MB` or binary suffixes `KiB`, `MiB`. For example: `6.2 GB`, `5MB`, `2GiB`.

```wdl
task memory_test {
  String arg

  command {
    python process.py ${arg}
  }
  runtime {
    memory: "2GB"
  }
}
```

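A rough Python sketch of how a backend might normalize the two accepted forms to a byte count. `memory_to_bytes` is hypothetical, and the exact unit multipliers (powers of 1000 for `KB`/`MB`/`GB`, powers of 1024 for `KiB`/`MiB`/`GiB`) are an assumption, since the spec only lists example strings.

```python
import re
from decimal import Decimal

# Assumption: decimal suffixes are powers of 1000 and binary (...iB) suffixes
# are powers of 1024; the text above gives examples but not the multipliers.
UNITS = {
    "B": 1,
    "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3, "TIB": 1024**4,
}
MEMORY_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def memory_to_bytes(value):
    """Convert a `memory` runtime value (Int or String) to a number of bytes."""
    if isinstance(value, int):
        return value  # Int values are interpreted as bytes
    match = MEMORY_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"cannot parse memory value: {value!r}")
    number, unit = match.groups()
    multiplier = UNITS.get(unit.upper())  # case-insensitive: also an assumption
    if multiplier is None:
        raise ValueError(f"unknown memory unit: {unit!r}")
    # Decimal avoids float rounding for values such as "6.2 GB".
    return int(Decimal(number) * multiplier)


for spec in ["2GB", "6.2 GB", "5MB", "2GiB", 1024]:
    print(spec, "->", memory_to_bytes(spec))
```
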
### Parameter Metadata Section

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$parameter_meta = 'parameter_meta' $ws* '{' ($ws* $parameter_meta_kv $ws*)* '}'
$parameter_meta_kv = $identifier $ws* ':' $ws* $string
```

This purely optional section contains key/value pairs where the keys are names of parameters and the values are string descriptions for those parameters.

> *Additional requirement*: Any key in this section MUST correspond to a parameter in the command line.

### Metadata Section

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$meta = 'meta' $ws* '{' ($ws* $meta_kv $ws*)* '}'
$meta_kv = $identifier $ws* ':' $ws* $string
```

This purely optional section contains key/value pairs for any additional metadata that should be stored with the task, such as the author or a contact email.

### Examples

#### Example 1: Simplest Task

```wdl
task hello_world {
  command {
    echo hello world
  }
}
```

#### Example 2: Inputs/Outputs

```wdl
task one_and_one {
  String pattern
  File infile

  command {
    grep ${pattern} ${infile}
  }
  output {
    File filtered = stdout()
  }
}
```

#### Example 3: Runtime/Metadata

```wdl
task runtime_meta {
  String memory_mb
  String sample_id
  String param

  command {
    java -Xmx${memory_mb}M -jar task.jar -id ${sample_id} -param ${param} -out ${sample_id}.out
  }
  output {
    File results = "${sample_id}.out"
  }
  runtime {
    docker: "broadinstitute/baseimg"
  }
  parameter_meta {
    memory_mb: "Amount of memory to allocate to the JVM"
    param: "Some arbitrary parameter"
    sample_id: "The ID of the sample in format foo_bar_baz"
  }
  meta {
    author: "Joe Somebody"
    email: "joe@company.org"
  }
}
```

#### Example 4: BWA mem

```wdl
task bwa_mem_tool {
  Int threads
  Int min_seed_length
  Array[Int]+ min_std_max_min
  File reference
  Array[File]+ reads

  command {
    bwa mem -t ${threads} \
            -k ${min_seed_length} \
            -I ${sep=',' min_std_max_min} \
            ${reference} \
            ${sep=' ' reads} > output.sam
  }
  output {
    File sam = "output.sam"
  }
  runtime {
    docker: "broadinstitute/baseimg"
  }
}
```

A notable piece of this example is `${sep=',' min_std_max_min}`. Because `min_std_max_min` is declared as `Array[Int]+`, it holds one or more integers (the `+` after the type indicates that the array must not be empty). The array is flattened by joining its elements with the separator character (`sep=','`).

This task also exports one file, called `sam`, which contains the standard output of `bwa mem` (redirected to `output.sam`).

The `docker` portion of this task definition specifies that this task must only be run on the specified Docker image.

#### Example 5: Word Count

```wdl
task wc2_tool {
  File file1
  command {
    wc -l < ${file1}
  }
  output {
    Int count = read_int(stdout())
  }
}

workflow count_lines4_wf {
  Array[File] files
  scatter(f in files) {
    call wc2_tool {
      input: file1=f
    }
  }
  output {
    wc2_tool.count
  }
}
```

This example is mostly boilerplate, declarative code. Its only language-like feature is the function call `read_int(stdout())`. Functions like this are easy to parse, and new ones would not be hard to add, either through custom function definitions or by running snippets of a language such as JavaScript or Python.

#### Example 6: tmap

This task should produce a command line like this:

```
tmap mapall \
  stage1 map1 --min-seq-length 20 \
         map2 --min-seq-length 20 \
  stage2 map1 --max-seq-length 20 --min-seq-length 10 --seed-length 16 \
         map2 --max-seed-hits -1 --max-seq-length 20 --min-seq-length 10
```

The task definition would look like this:

```wdl
task tmap_tool {
  Array[String] stages
  File reads

  command {
    tmap mapall ${sep=' ' stages} < ${reads} > output.sam
  }
  output {
    File sam = "output.sam"
  }
}
```

For this particular case, where the command line is *itself* a mini DSL, the best option is to let the user type in the rest of the command line, which is what `${sep=' ' stages}` is for. This allows the user to specify an array of strings as the value for `stages`, which are then concatenated together with a space character:

|Variable|Value|
|--------|-----|
|reads   |/path/to/fastq|
|stages  |["stage1 map1 --min-seq-length 20 map2 --min-seq-length 20", "stage2 map1 --max-seq-length 20 --min-seq-length 10 --seed-length 16 map2 --max-seed-hits -1 --max-seq-length 20 --min-seq-length 10"]|

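To check the example, the following Python snippet reproduces what `${sep=' ' stages}` does with the values from the table: it joins the two `stages` strings with a single space. Building the rest of the command by plain string concatenation is only a simplification of template evaluation.

```python
# Input values taken from the table above.
reads = "/path/to/fastq"
stages = [
    "stage1 map1 --min-seq-length 20 map2 --min-seq-length 20",
    "stage2 map1 --max-seq-length 20 --min-seq-length 10 --seed-length 16 "
    "map2 --max-seed-hits -1 --max-seq-length 20 --min-seq-length 10",
]

# Mirrors: tmap mapall ${sep=' ' stages} < ${reads} > output.sam
command = "tmap mapall " + " ".join(stages) + " < " + reads + " > output.sam"
print(command)
```
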
## Workflow Definition

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$workflow = 'workflow' $ws* $identifier $ws* '{' $ws* $workflow_element* $ws* '}'
$workflow_element = $call | $loop | $conditional | $declaration | $scatter | $parameter_meta | $meta
```

A workflow is defined by the keyword `workflow`, followed by its name and a body enclosed in curly braces.

An example of a workflow that runs one task (not defined here) would be:

```wdl
workflow wf {
  Array[File] files
  Int threshold
  Map[String, String] my_map

  call analysis_job {
    input: search_paths=files, threshold=threshold, gender_lookup=my_map
  }
}
```

### Call Statement

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$call = 'call' $ws* $namespaced_identifier $ws+ ('as' $identifier)? $ws* $call_body?
$call_body = '{' $ws* $inputs? $ws* '}'
$inputs = 'input' $ws* ':' $ws* $variable_mappings
$variable_mappings = $variable_mapping_kv (',' $variable_mapping_kv)*
$variable_mapping_kv = $identifier $ws* '=' $ws* $expression
```

A workflow may call other tasks or workflows via the `call` keyword. The `$namespaced_identifier` is a reference to the task to run. Most commonly, it's simply the name of a task (see examples below), but it can also use `.` as a namespace resolver.

See the section on [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers) for details about how the `$namespaced_identifier` ought to be interpreted.

All `call` statements must be uniquely identifiable. By default, the call's unique identifier is the task name (e.g. `call foo` would be referenced by name `foo`). However, if one were to `call foo` twice in a workflow, each subsequent `call` statement would need to alias itself to a unique name using the `as` clause: `call foo as bar`.

A `call` statement may reference a workflow too (e.g. `call other_workflow`). In this case, the `$inputs` section specifies a subset of the workflow's inputs and must specify fully qualified names.

```wdl
import "lib.wdl" as lib

workflow wf {
  call my_task
  call my_task as my_task_alias
  call my_task as my_task_alias2 {
    input: threshold=2
  }
  call lib.other_task
}
```

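The uniqueness rule can be checked mechanically. This Python sketch uses a hypothetical `call_names` helper and models each call as a `(target, alias)` tuple. Taking the last component of a namespaced target (e.g. `other_task` for `lib.other_task`) as the default name is an assumption based on the import example earlier, where `call stdlib.file_size` is referenced as `file_size`.

```python
from collections import Counter


def call_names(calls):
    """Return each call's unique name, raising if two calls share a name.

    `calls` is a list of (target, alias) tuples; alias is None when the call
    has no `as` clause.
    """
    names = [alias if alias is not None else target.split(".")[-1] for target, alias in calls]
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError(f"call names must be unique; duplicated: {duplicates}")
    return names


# The calls from workflow `wf` above.
print(call_names([
    ("my_task", None),
    ("my_task", "my_task_alias"),
    ("my_task", "my_task_alias2"),
    ("lib.other_task", None),
]))  # ['my_task', 'my_task_alias', 'my_task_alias2', 'other_task']
```
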
The `$call_body` is optional and is meant to specify how to satisfy a subset of the task or workflow's input parameters, as well as a way to map task outputs to variables defined in the [visible scopes](#scope).

A `$variable_mapping` in the `$inputs` section maps parameters in the task to expressions. These expressions usually reference outputs of other tasks, but they can be arbitrary expressions.

As an example, here is a workflow in which the second task requires an output from the first task:

```wdl
task task1 {
  command {
    python do_stuff.py
  }
  output {
    File results = stdout()
  }
}
task task2 {
  File foobar
  command {
    python do_stuff2.py ${foobar}
  }
  output {
    File results = stdout()
  }
}
workflow wf {
  call task1
  call task2 {
    input: foobar=task1.results
  }
}
```

#### Sub Workflows

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Workflows can also be called inside of workflows.

`main.wdl`
```
import "sub_wdl.wdl" as sub

workflow main_workflow {

  call sub.wf_hello { input: wf_hello_input = "sub world" }

  output {
    String main_output = wf_hello.salutation
  }
}
```

`sub_wdl.wdl`
```
task hello {
  String addressee
  command {
    echo "Hello ${addressee}!"
  }
  runtime {
    docker: "ubuntu:latest"
  }
  output {
    String salutation = read_string(stdout())
  }
}

workflow wf_hello {
  String wf_hello_input

  call hello {input: addressee = wf_hello_input }

  output {
    String salutation = hello.salutation
  }
}
```

Note that because a WDL file can only contain one workflow, sub workflows can only be used through imports.
Otherwise, calling a workflow and calling a task are syntactically equivalent:
inputs are specified and outputs retrieved the same way as they are for task calls.

### Scatter

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$scatter = 'scatter' $ws* '(' $ws* $scatter_iteration_statement $ws* ')' $ws* $scatter_body
$scatter_iteration_statement = $identifier $ws* 'in' $ws* $expression
$scatter_body = '{' $ws* $workflow_element* $ws* '}'
```

A "scatter" clause defines that everything in the body (`$scatter_body`) can be run in parallel. The clause in parentheses (`$scatter_iteration_statement`) declares which collection to scatter over and what to call each element.

The `$scatter_iteration_statement` has two parts: the "item" and the "collection". For example, `scatter(x in y)` would define `x` as the item, and `y` as the collection. The item is always an identifier, while the collection is an expression that MUST evaluate to an `Array` type. The item will represent each item in that expression. For example, if `y` evaluated to an `Array[String]` then `x` would be a `String`.

The `$scatter_body` defines a set of scopes that will execute in the context of this scatter block.

For example, if `$expression` is an array of integers of size 3, then the body of the scatter clause can be executed three times in parallel. `$identifier` would refer to each integer in the array.

```
scatter(i in integers) {
  call task1{input: num=i}
  call task2{input: num=task1.output}
}
```

In this example, `task2` depends on `task1`. Variable `i` has an implicit `index` attribute to make sure we can access the right output from `task1`. Since both `task1` and `task2` run N times, where N is the length of the array `integers`, any scalar output of these tasks becomes an array when it is referenced outside the scatter block.

### Loops

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

```
$loop = 'while' '(' $expression ')' '{' $workflow_element* '}'
```

Loops are distinct from scatter clauses because the body of a while loop must run to completion before the next iteration is started. The `$expression` condition is evaluated only when the iteration count is zero or when all `$workflow_element`s in the body have completed successfully for the current iteration.

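The ordering rule above can be pictured with a small Python model. It is not WDL and not how any engine implements loops (the note above says they are not yet available in Cromwell); `run_while`, `condition`, and `body` are hypothetical names, and the single `body` callable stands in for all `$workflow_element`s of one iteration.

```python
from typing import Callable


def run_while(condition: Callable[[], bool], body: Callable[[], None]) -> int:
    """Hypothetical model of `while` semantics; returns the number of iterations run.

    The condition is checked once before the first iteration (iteration count zero)
    and afterwards only once the whole body has finished for the current iteration,
    so iterations never overlap the way scatter iterations may.
    """
    iterations = 0
    while condition():
        body()  # every element of the body completes before the next check
        iterations += 1
    return iterations


# Usage: keep doubling a value while it is below 10.
state = {"value": 1}


def double() -> None:
    state["value"] *= 2


completed = run_while(lambda: state["value"] < 10, double)
```

Here the condition is checked five times (values 1, 2, 4, 8, 16), so the body runs four times and the final value is 16.
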
### Conditionals

:pig2: Available in [Cromwell](https://github.com/broadinstitute/cromwell) version 24 and higher

```
$conditional = 'if' '(' $expression ')' '{' $workflow_element* '}'
```

Conditionals only execute the body if the expression evaluates to true.

* When a call's output is referenced outside the `if` block that contains it, it must be handled as an optional type. For example:
  ```
  workflow foo {
    # Call 'x', producing a Boolean output:
    call x
    Boolean x_out = x.out

    # Call 'y', producing an Int output, in a conditional block:
    if (x_out) {
      call y
      Int y_out = y.out
    }

    # Outside the if block, we have to handle this output as optional:
    Int? y_out_maybe = y.out

    # Call 'z' which takes an optional Int input:
    call z { input: optional_int = y_out_maybe }
  }
  ```
* Optional types can be coalesced by using the `select_all` and `select_first` array functions:
  ```
  workflow foo {
    Array[Int] scatter_range = [1, 2, 3, 4, 5]
    scatter (i in scatter_range) {
      call x { input: i = i }
      if (x.validOutput) {
        Int x_out = x.out
      }
    }

    # Because it was declared inside the scatter and the if-block, the type of x_out is different here:
    Array[Int?] x_out_maybes = x_out

    # We can select only the valid elements with select_all:
    Array[Int] x_out_valids = select_all(x_out_maybes)

    # Or we can select the first valid element:
    Int x_out_first = select_first(x_out_maybes)
  }
  ```

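To make the coalescing behaviour concrete, here is a Python model of the two functions, with Python's `None` standing in for an undefined optional value. It is a sketch, not an engine implementation. In particular, raising an error when `select_first` finds no defined element is an assumption of this model, and the sample values are made up.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def select_all(values: List[Optional[T]]) -> List[T]:
    """Keep only the defined (non-None) elements, preserving their order."""
    return [v for v in values if v is not None]


def select_first(values: List[Optional[T]]) -> T:
    """Return the first defined element; this sketch raises if there is none."""
    for v in values:
        if v is not None:
            return v
    raise ValueError("select_first: no defined value in the array")


# Hypothetical per-iteration results of the scatter above: iterations 1 and 3
# failed the `x.validOutput` check, so their x_out is undefined.
x_out_maybes = [None, 20, None, 40, 50]
x_out_valids = select_all(x_out_maybes)   # [20, 40, 50]
x_out_first = select_first(x_out_maybes)  # 20
```

The model tests for `None` rather than truthiness, so a defined value such as `0` is still selected.
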
### Parameter Metadata

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$wf_parameter_meta = 'parameter_meta' $ws* '{' ($ws* $wf_parameter_meta_kv $ws*)* '}'
$wf_parameter_meta_kv = $identifier $ws* '=' $ws* $string
```

This purely optional section contains key/value pairs where the keys are names of parameters and the values are string descriptions for those parameters.

> *Additional requirement*: Any key in this section MUST correspond to a workflow input.

As an example:
```
parameter_meta {
  memory_mb: "Amount of memory to allocate to the JVM"
  param: "Some arbitrary parameter"
  sample_id: "The ID of the sample in format foo_bar_baz"
}
```

### Metadata

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```
$wf_meta = 'meta' $ws* '{' ($ws* $wf_meta_kv $ws*)* '}'
$wf_meta_kv = $identifier $ws* '=' $ws* $string
```

This purely optional section contains key/value pairs for any additional metadata that should be stored with the workflow, for example the author or a contact email.

As an example:
```
meta {
  author: "Joe Somebody"
  email: "joe@company.org"
}
```

### Outputs

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Each `workflow` definition can specify an optional `output` section. This section lists outputs from individual `call`s that you also want to expose as outputs to the `workflow` itself.
If the `output {...}` section is omitted, then the workflow includes all outputs from all calls in its final output.
Workflow outputs follow the same syntax rules as task outputs.
They can reference call outputs, workflow inputs and previous workflow outputs.
For example:

```
task t {
  command {
    # do something
  }
  output {
    String out = "out"
  }
}

workflow w {
  String w_input = "some input"

  call t
  call t as u

  output {
    String t_out = t.out
    String u_out = u.out
    String input_as_output = w_input
    String previous_output = u_out
  }
}
```

Note that workflow outputs can't reference call inputs. However, the same effect can be achieved by declaring the desired call input as an output of the task.
Expressions are allowed.

When a workflow output references a call inside a scatter, the gathered (array) output of that call is used.
For example:

```
task t {
  command {
    # do something
  }
  output {
    String out = "out"
  }
}

workflow w {
  Array[Int] arr = [1, 2]

  scatter(i in arr) {
    call t
  }

  output {
    Array[String] t_out = t.out
  }
}
```

`t_out` has an `Array[String]` result type, because `call t` is inside a scatter.

*THE FOLLOWING SYNTAX IS DEPRECATED BUT IS STILL SUPPORTED TO MAINTAIN BACKWARD COMPATIBILITY*

```
$workflow_output = 'output' '{' $workflow_output_fqn ($workflow_output_fqn)* '}'
$workflow_output_fqn = $fully_qualified_name '.*'?
```

Replacing a call output name with `*` acts as a match-all wildcard.

The output names in this section must be qualified with the call that created them, as in the example below.

```
task task1 {
  command { ./script }
  output { File results = stdout() }
}

task task2 {
  command { ./script2 }
  output {
    File results = stdout()
    String value = read_string("some_file")
  }
}

workflow wf {
  call task1
  call task2 as altname
  output {
    task1.*
    altname.value
  }
}
```

In this example, the fully-qualified names exposed as workflow outputs would be `wf.task1.results` and `wf.altname.value`.

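The expansion of these deprecated entries can be sketched in Python. This is a hypothetical model, not an engine API. `expand_outputs` and its arguments are illustrative names, and each call is assumed to be identified by its (possibly aliased) name, with the list of output names declared by its task.

```python
from typing import Dict, List


def expand_outputs(workflow: str, call_outputs: Dict[str, List[str]], entries: List[str]) -> List[str]:
    """Expand deprecated `output { ... }` entries into fully qualified names.

    `call_outputs` maps each call name (after aliasing) to its declared output names.
    An entry is either `call.output` or `call.*`, where the wildcard selects every output.
    """
    fqns: List[str] = []
    for entry in entries:
        call, _, name = entry.rpartition(".")
        if call not in call_outputs:
            raise KeyError(f"unknown call: {call}")
        if name == "*":
            fqns.extend(f"{workflow}.{call}.{out}" for out in call_outputs[call])
        elif name in call_outputs[call]:
            fqns.append(f"{workflow}.{call}.{name}")
        else:
            raise KeyError(f"{call} has no output named {name}")
    return fqns


# The example workflow: task1 keeps its name, task2 is aliased to altname.
outputs = {"task1": ["results"], "altname": ["results", "value"]}
exposed = expand_outputs("wf", outputs, ["task1.*", "altname.value"])
```

Writing `altname.*` instead would expose both `wf.altname.results` and `wf.altname.value`.
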
# Namespaces

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Import statements can be used to pull in tasks/workflows from other locations as well as to create namespaces. In the simplest case, an import statement adds the imported tasks/workflows to the specified namespace. For example:

tasks.wdl
```
task x {
  command { python script.py }
}
task y {
  command { python script2.py }
}
```

workflow.wdl
```
import "tasks.wdl" as pyTasks

workflow wf {
  call pyTasks.x
  call pyTasks.y
}
```

Tasks `x` and `y` are inside the namespace `pyTasks`, which is different from the `wf` namespace belonging to the primary workflow. However, if no namespace is specified for tasks.wdl:

workflow.wdl
```
import "tasks.wdl"

workflow wf {
  call tasks.x
  call tasks.y
}
```

Now everything inside of `tasks.wdl` must be accessed through the default namespace `tasks`.

Each namespace may contain namespaces, tasks, and at most one workflow. The names of the contained namespaces, tasks, and workflow need to be unique within that namespace. For example, one cannot import two workflows under the same namespace identifier. Additionally, a workflow and a namespace both named `foo` cannot exist inside a common namespace. Similarly, there cannot be a task `foo` in a workflow also named `foo`.
However, you can import two workflows with different namespace identifiers that have identically named tasks. For example, you can import namespaces `foo` and `bar`, both of which contain a task `baz`, and you can call `foo.baz` and `bar.baz` from the same primary workflow.

# Scope

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Scopes are defined as:

* `workflow {...}` blocks
* `call` blocks
* `while(expr) {...}` blocks
* `if(expr) {...}` blocks
* `scatter(x in y) {...}` blocks

Inside of any scope, variables may be [declared](#declarations). The variables declared in that scope are visible to any sub-scope, recursively. For example:

```
task my_task {
  Int x
  Int var
  File f
  command {
    my_cmd --integer=${var} ${f}
  }
}

workflow wf {
  Array[File] files
  Int x = 2
  scatter(file in files) {
    Int x = 3
    call my_task {
      Int x = 4
      input: var=x, f=file
    }
  }
}
```

`my_task` will use `x=4` to set the value for `var` in its command line, because the innermost declaration of `x` (the one in the `call` block) is the one that is visible. However, `my_task` also declares its own input `x` at the task level. Of the task's inputs `x` and `var`, only `var` is set in the `call my_task` block (alongside `f`), so the value for `my_task.x` still needs to be provided by the user when the workflow is run.

# Optional Parameters & Type Constraints

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

[Types](#types) can be optionally suffixed with a `?` or `+` in certain cases.

* `?` means that the parameter is optional. A user does not need to specify a value for the parameter in order to satisfy all the inputs to the workflow.
* `+` applies only to `Array` types and it represents a constraint that the `Array` value must contain one or more elements.

```
task test {
  Array[File] a
  Array[File]+ b
  Array[File]? c
  #File+ d <-- can't do this, + only applies to Arrays

  command {
    /bin/mycmd ${sep=" " a}
    /bin/mycmd ${sep="," b}
    /bin/mycmd ${write_lines(c)}
  }
}

workflow wf {
  call test
}
```

If you provided these values for inputs:

|var |value|
|---------|-----|
|wf.test.a|["1", "2", "3"]|
|wf.test.b|[]|

The workflow engine should reject this because `wf.test.b` is required to have at least one element. If we change it to:

|var |value|
|---------|-----|
|wf.test.a|["1", "2", "3"]|
|wf.test.b|["x"]|

This would be valid input because `wf.test.c` is not required. Given these values, the command would be instantiated as:

```
/bin/mycmd 1 2 3
/bin/mycmd x
/bin/mycmd
```

If our inputs were:

|var |value|
|---------|-----|
|wf.test.a|["1", "2", "3"]|
|wf.test.b|["x","y"]|
|wf.test.c|["a","b","c","d"]|

Then the command would be instantiated as:

```
/bin/mycmd 1 2 3
/bin/mycmd x,y
/bin/mycmd /path/to/c.txt
```

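The validation and instantiation walked through above can be modelled in Python as a rough sketch. `instantiate_test` is a hypothetical name. `c_file` stands in for the file that `write_lines(c)` would create, and its default simply reuses the placeholder path from the example. The real path is chosen by the engine.

```python
from typing import List, Optional


def instantiate_test(a: List[str], b: List[str], c: Optional[List[str]],
                     c_file: str = "/path/to/c.txt") -> List[str]:
    """Hypothetical model of validating inputs to `task test` and rendering its command lines."""
    # Array[File]+ : an empty array is rejected before anything runs
    if not b:
        raise ValueError("wf.test.b is declared Array[File]+ and must not be empty")
    lines = [
        "/bin/mycmd " + " ".join(a),  # ${sep=" " a}
        "/bin/mycmd " + ",".join(b),  # ${sep="," b}
    ]
    # Array[File]? : when c is omitted, the placeholder renders as nothing
    lines.append(("/bin/mycmd " + c_file) if c is not None else "/bin/mycmd")
    return lines


valid = instantiate_test(["1", "2", "3"], ["x"], None)
```

Passing `b=[]` raises an error, mirroring the rejected first input table.
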
## Prepending a String to an Optional Parameter

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Sometimes, optional parameters need a string prefix. Consider this task:

```wdl
task test {
  String? val
  command {
    python script.py --val=${val}
  }
}
```

Since `val` is optional, this command line can be instantiated in two ways:

```
python script.py --val=foobar
```

Or

```
python script.py --val=
```

The latter case is very likely an error, and the `--val=` part should be left off if a value for `val` is omitted. To solve this problem, modify the expression inside the template tag as follows:

```
python script.py ${"--val=" + val}
```

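The fix relies on how undefined values propagate. The Python sketch below models that behaviour under an assumption that is not spelled out in the text above: concatenating a string with an undefined optional yields an undefined result, and an undefined placeholder renders as an empty string. Here, `None` stands in for an undefined `String?`, and the function names are hypothetical.

```python
from typing import Optional


def concat(prefix: str, value: Optional[str]) -> Optional[str]:
    """`prefix + value`, where an undefined operand makes the whole result undefined."""
    return None if value is None else prefix + value


def render(value: Optional[str]) -> str:
    """Render a `${...}` placeholder; an undefined value becomes an empty string."""
    return "" if value is None else value


def command(val: Optional[str]) -> str:
    # Models: python script.py ${"--val=" + val}  (trailing whitespace trimmed)
    return ("python script.py " + render(concat("--val=", val))).rstrip()
```

With `val` set to `"foobar"` this yields `python script.py --val=foobar`. With `val` omitted, the whole `--val=` part disappears.
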
# Scatter / Gather

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The `scatter` block is meant to parallelize a series of identical tasks but give them slightly different inputs. The simplest example is:

```wdl
task inc {
  Int i

  command <<<
    python -c "print(${i} + 1)"
  >>>

  output {
    Int incremented = read_int(stdout())
  }
}

workflow wf {
  Array[Int] integers = [1,2,3,4,5]
  scatter(i in integers) {
    call inc{input: i=i}
  }
}
```

Running this workflow (which needs no inputs) would yield a value of `[2,3,4,5,6]` for `wf.inc.incremented`. While `task inc` itself returns an `Int`, when it is called inside a scatter block, that type becomes an `Array[Int]`.

Any task that's downstream from the call to `inc` and outside the scatter block must accept an `Array[Int]`:

```wdl
task inc {
  Int i

  command <<<
    python -c "print(${i} + 1)"
  >>>

  output {
    Int incremented = read_int(stdout())
  }
}

task sum {
  Array[Int] ints

  command <<<
    python -c "print(${sep="+" ints})"
  >>>

  output {
    Int sum = read_int(stdout())
  }
}

workflow wf {
  Array[Int] integers = [1,2,3,4,5]
  scatter (i in integers) {
    call inc {input: i=i}
  }
  call sum {input: ints = inc.incremented}
}
```

This workflow will output a value of `20` for `wf.sum.sum`. This works because, outside the scatter block, the output of `call inc` is an `Array[Int]`.

However, from inside the scope of the scatter block, the output of `call inc` is still an `Int`. So the following is valid:

```wdl
workflow wf {
  Array[Int] integers = [1,2,3,4,5]
  scatter(i in integers) {
    call inc {input: i=i}
    call inc as inc2 {input: i=inc.incremented}
  }
  call sum {input: ints = inc2.incremented}
}
```

In this example, within each scatter iteration `inc` and `inc2` are called in serial, with the output of one fed to the other. `inc2` would output the array `[3,4,5,6,7]`.

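The scatter/gather typing can be checked with a small Python simulation of the last workflow. This is a sequential model with hypothetical function names; a real engine may run the iterations in parallel. Inside each iteration, values are plain integers. After the loop, they are gathered into lists, which mirrors `Int` becoming `Array[Int]` outside the scatter.

```python
from typing import Dict, List


def inc(i: int) -> int:
    """Stand-in for `task inc`, which prints i + 1."""
    return i + 1


def run_wf(integers: List[int]) -> Dict[str, object]:
    """Sequential model of the scatter above; iterations are independent of each other."""
    gathered_inc: List[int] = []
    gathered_inc2: List[int] = []
    for i in integers:                          # one scatter iteration per element
        incremented = inc(i)                    # inside the scatter, inc.incremented is an Int
        gathered_inc.append(incremented)
        gathered_inc2.append(inc(incremented))  # inc2 consumes that Int directly
    # Outside the scatter, the per-iteration Ints are gathered into arrays
    return {
        "inc.incremented": gathered_inc,
        "inc2.incremented": gathered_inc2,
        "sum.sum": sum(gathered_inc2),
    }


result = run_wf([1, 2, 3, 4, 5])
```

Summing `inc.incremented` instead of `inc2.incremented` reproduces the value `20` from the earlier workflow. Summing `inc2.incremented` gives `25`.
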
# Variable Resolution

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Inside of [expressions](#expressions), variables are resolved differently depending on whether the expression is in a `task` declaration or a `workflow` declaration.

## Task-Level Resolution

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Inside a task, resolution is trivial: the variable referenced MUST be a [declaration](#declarations) of the task. For example:

```wdl
task my_task {
  Array[String] strings
  command {
    python analyze.py --strings-file=${write_lines(strings)}
  }
}
```

Inside of this task, there exists only one expression: `write_lines(strings)`. When the expression evaluator resolves `strings`, it must find a declaration of the task with that name (in this case, `Array[String] strings`).

## Workflow-Level Resolution

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

In a workflow, resolution works by traversing the scope hierarchy, starting from the scope of the expression that references the variable and moving outward until a declaration with that name is found.

```wdl
workflow wf {
  String s = "wf_s"
  String t = "t"
  call my_task {
    String s = "my_task_s"
    input: in0 = s+"-suffix", in1 = t+"-suffix"
  }
}
```

In this example, there are two expressions: `s+"-suffix"` and `t+"-suffix"`. `s` is resolved as `"my_task_s"` and `t` is resolved as `"t"`.

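The outward walk through the scope hierarchy can be sketched in Python as a chain of dictionaries. The `Scope` class is hypothetical and only models declarations that already have a value. It does not model types or the other kinds of scope listed in the Scope section.

```python
from typing import Dict, Optional


class Scope:
    """Hypothetical model of a WDL scope: its own declarations plus an enclosing scope."""

    def __init__(self, declarations: Dict[str, str], parent: Optional["Scope"] = None):
        self.declarations = declarations
        self.parent = parent

    def resolve(self, name: str) -> str:
        """Walk outward from this scope and return the nearest declaration of `name`."""
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.declarations:
                return scope.declarations[name]
            scope = scope.parent
        raise NameError(f"no declaration named {name!r} is visible")


wf = Scope({"s": "wf_s", "t": "t"})
call_my_task = Scope({"s": "my_task_s"}, parent=wf)

in0 = call_my_task.resolve("s") + "-suffix"  # "my_task_s-suffix"
in1 = call_my_task.resolve("t") + "-suffix"  # "t-suffix"
```

The inner `s` shadows the workflow-level `s` only for expressions evaluated in the `call` block. `wf.resolve("s")` still returns `"wf_s"`.
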
# Computing Inputs

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Both tasks and workflows have typed inputs that must be satisfied in order to run. The following sections describe how to compute inputs for `task` and `workflow` declarations.

## Task Inputs

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Tasks define all their inputs as declarations at the top of the task definition.

```wdl
task test {
  String s
  Int i
  Float f

  command {
    ./script.sh -i ${i} -f ${f}
  }
}
```

In this example, `s`, `i`, and `f` are inputs to this task, even though the command line does not reference `${s}`. Implementations of WDL engines may display a warning or report an error in this case, since `s` isn't used.

## Workflow Inputs

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Workflows have declarations, like tasks, but a workflow must also account for all calls to sub-tasks when determining inputs.

Workflows also return their inputs as fully qualified names. Tasks only return the names of the variables as inputs (as they're guaranteed to be unique within a task). However, since workflows can call the same task twice, names might collide. The general algorithm for computing inputs goes something like this:

* Take all inputs to all `call` statements in the workflow
* Subtract out all inputs that are satisfied through the `input: ` section
* Add in all declarations which don't have a static value defined

Consider the following workflow:

```wdl
task t1 {
  String s
  Int x

  command {
    ./script --action=${s} -x${x}
  }
  output {
    Int count = read_int(stdout())
  }
}

task t2 {
  String s
  Int t
  Int x

  command {
    ./script2 --action=${s} -x${x} --other=${t}
  }
  output {
    Int count = read_int(stdout())
  }
}

task t3 {
  Int y
  File ref_file # Do nothing with this

  command {
    python -c "print(${y} + 1)"
  }
  output {
    Int incr = read_int(stdout())
  }
}

workflow wf {
  Int int_val
  Int int_val2 = 10
  Array[Int] my_ints
  File ref_file

  call t1 {
    input: x=int_val
  }
  call t2 {
    input: x=int_val, t=t1.count
  }
  scatter(i in my_ints) {
    call t3 {
      input: y=i, ref_file=ref_file
    }
  }
}
```

The inputs to `wf` would be:

* `wf.t1.s` as a `String`
* `wf.t2.s` as a `String`
* `wf.int_val` as an `Int`
* `wf.my_ints` as an `Array[Int]`
* `wf.ref_file` as a `File`

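The three-step algorithm can be sketched in Python. This is a hypothetical model: `workflow_inputs` and its dictionary arguments are illustrative, types are left out, and call names are flat strings. The sample data assumes that both of `t3`'s inputs (`y` and `ref_file`) are the ones satisfied in its `input:` section.

```python
from typing import Dict, List, Set


def workflow_inputs(
    workflow: str,
    declarations: Dict[str, bool],
    call_inputs: Dict[str, Set[str]],
    satisfied: Dict[str, Set[str]],
) -> List[str]:
    """Sketch of the algorithm above.

    declarations: workflow declaration name -> whether it has a static value
    call_inputs:  call name -> names of the called task's inputs
    satisfied:    call name -> inputs set in that call's `input:` section
    """
    inputs: List[str] = []
    # Steps 1 and 2: every call input that is not satisfied through `input:`
    for call, names in call_inputs.items():
        for name in sorted(names - satisfied.get(call, set())):
            inputs.append(f"{workflow}.{call}.{name}")
    # Step 3: every workflow declaration without a static value
    for name, has_value in declarations.items():
        if not has_value:
            inputs.append(f"{workflow}.{name}")
    return inputs


computed = workflow_inputs(
    "wf",
    declarations={"int_val": False, "int_val2": True, "my_ints": False, "ref_file": False},
    call_inputs={"t1": {"s", "x"}, "t2": {"s", "t", "x"}, "t3": {"y", "ref_file"}},
    satisfied={"t1": {"x"}, "t2": {"x", "t"}, "t3": {"y", "ref_file"}},
)
```

`int_val2` is excluded because it has a static value, and every input of `t3` is satisfied through `input:`.
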
## Specifying Workflow Inputs in JSON

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Once workflow inputs are computed (see previous section), the value for each of the fully-qualified names needs to be specified per invocation of the workflow. Workflow inputs are specified in JSON or YAML format. In JSON, the inputs to the workflow in the previous section can be:

```
{
  "wf.t1.s": "some_string",
  "wf.t2.s": "some_string",
  "wf.int_val": 3,
  "wf.my_ints": [5,6,7,8],
  "wf.ref_file": "/path/to/file.txt"
}
```

It's important to note that the type in JSON must be coercible to the WDL type. For example, `wf.int_val` expects an integer, but if we specified it in JSON as `"wf.int_val": "3"`, this coercion from string to integer is not valid and would result in a type error. See the section on [Type Coercion](#type-coercion) for more details.

# Type Coercion

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

WDL values can be created from either JSON values or from native language values. The table below uses String-like, Integer-like, etc. to refer to values in a particular programming language. For example, "String-like" could mean a `java.lang.String` in the Java context or a `str` in Python. An "Array-like" could refer to a `Seq` in Scala or a `list` in Python.

|WDL Type |Can Accept |Notes / Constraints|
|---------|-------------|-------------------|
|`String` |JSON String||
| |String-like||
| |`String`|Identity coercion|
| |`File`||
|`File` |JSON String|Interpreted as a file path|
| |String-like|Interpreted as a file path|
| |`String`|Interpreted as a file path|
| |`File`|Identity coercion|
|`Int` |JSON Number|Use floor of the value for non-integers|
| |Integer-like||
| |`Int`|Identity coercion|
|`Float` |JSON Number||
| |Float-like||
| |`Float`|Identity coercion|
|`Boolean`|JSON Boolean||
| |Boolean-like||
| |`Boolean`|Identity coercion|
|`Array[T]`|JSON Array|Elements must be coercible to `T`|
| |Array-like|Elements must be coercible to `T`|
|`Map[K, V]`|JSON Object|Keys and values must be coercible to `K` and `V`, respectively|
| |Map-like|Keys and values must be coercible to `K` and `V`, respectively|

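The JSON rows of the table can be sketched as a recursive Python function over parsed JSON values. This is a hypothetical model, not Cromwell's implementation, and it covers only the JSON column. A `File` is kept as its path string. `Map` keys are coerced like any other value, so in this sketch a `Map[Int, V]` cannot be read from a JSON object, because JSON object keys are always strings.

```python
import json
import math
from typing import Any, List


def _split_top_level(params: str) -> List[str]:
    """Split 'K, V' on the comma that is not nested inside brackets."""
    depth = 0
    for index, char in enumerate(params):
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        elif char == "," and depth == 0:
            return [params[:index].strip(), params[index + 1:].strip()]
    raise ValueError(f"expected two type parameters in {params!r}")


def coerce(wdl_type: str, value: Any) -> Any:
    """Coerce a parsed JSON value to a WDL type, following the JSON rows of the table."""
    if wdl_type in ("String", "File"):
        if isinstance(value, str):
            return value  # a File is kept as its path string in this sketch
    elif wdl_type in ("Int", "Float"):
        # bool is a subclass of int in Python, so it is ruled out explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return math.floor(value) if wdl_type == "Int" else float(value)
    elif wdl_type == "Boolean":
        if isinstance(value, bool):
            return value
    elif wdl_type.startswith("Array[") and wdl_type.endswith("]"):
        if isinstance(value, list):
            element_type = wdl_type[len("Array["):-1]
            return [coerce(element_type, item) for item in value]
    elif wdl_type.startswith("Map[") and wdl_type.endswith("]"):
        if isinstance(value, dict):
            key_type, value_type = _split_top_level(wdl_type[len("Map["):-1])
            return {coerce(key_type, k): coerce(value_type, v) for k, v in value.items()}
    raise TypeError(f"cannot coerce {value!r} to {wdl_type}")


int_val = coerce("Int", json.loads("3"))
my_ints = coerce("Array[Int]", json.loads("[5, 6, 7, 8]"))
```

Calling `coerce("Int", json.loads('"3"'))` raises a `TypeError`, which matches the invalid `"wf.int_val": "3"` example in the previous section.
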
|
|
||
| 2058 | ||
| 2059 | # Standard Library | |
| 2060 | ||
|
|
||
## File stdout()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Returns a `File` reference to the stdout that this task generated.

## File stderr()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Returns a `File` reference to the stderr that this task generated.

## Array[String] read_lines(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given a file-like object (`String`, `File`) as a parameter, this will read each line as a string and return an `Array[String]` representation of the lines in the file.

The order of the lines in the returned `Array[String]` must be the order in which the lines appear in the file-like object.

This task would `grep` through a file and return all lines that match the pattern:

```wdl
task do_stuff {
  String pattern
  File file
  command {
    grep '${pattern}' ${file}
  }
  output {
    Array[String] matches = read_lines(stdout())
  }
}
```

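The line splitting behind `read_lines()` can be sketched in Python for engine implementers. This is a minimal, non-normative sketch: the helper names `parse_lines` and `wdl_read_lines` are hypothetical, and treating a final newline as the end of the last line (rather than the start of an extra, empty element) is an assumption the text above does not settle.

```python
from pathlib import Path


def parse_lines(text: str) -> list[str]:
    """Split file contents into lines, in the order they appear."""
    lines = text.split("\n")
    # Assumption: a final newline ends the last line; it does not start
    # an extra, empty element.
    if lines[-1] == "":
        lines.pop()
    return lines


def wdl_read_lines(path: str) -> list[str]:
    # Text mode with default newline handling turns "\r\n" into "\n" first.
    return parse_lines(Path(path).read_text())
```

For the `grep` task above, an engine would apply something like `wdl_read_lines` to the file holding the task's stdout; the lines come back in file order, as required.
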
## Array[Array[String]] read_tsv(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The `read_tsv()` function takes one parameter, which is a file-like object (`String`, `File`), and returns an `Array[Array[String]]` representing the table from the TSV file.

If the parameter is a `String`, this is assumed to be a local file path relative to the current working directory of the task.

For example, if I write a task that outputs a file to `./results/file_list.tsv`, and my task is defined as:

```wdl
task do_stuff {
  File file
  command {
    python do_stuff.py ${file}
  }
  output {
    Array[Array[String]] output_table = read_tsv("./results/file_list.tsv")
  }
}
```

Then when the task finishes, to fulfill the `output_table` variable, `./results/file_list.tsv` must be a valid TSV file or an error will be reported.

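A TSV parse for `read_tsv()` might look like the following Python sketch. `parse_tsv` is a hypothetical helper; it assumes the simplest TSV dialect (no quoting or escaping, so every tab separates two cells) and ignores a trailing newline. Cells stay strings, matching the `Array[Array[String]]` return type.

```python
def parse_tsv(text: str) -> list[list[str]]:
    """Parse TSV text into rows of string cells."""
    rows = text.split("\n")
    # Assumption: a trailing newline does not add an empty final row.
    if rows[-1] == "":
        rows.pop()
    # Assumption: no quoting or escaping, so every tab separates two cells.
    return [row.split("\t") for row in rows]
```
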
## Map[String, String] read_map(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given a file-like object (`String`, `File`) as a parameter, this will read each line from a file and expect the line to have the format `col1\tcol2`. In other words, the file-like object must be a two-column TSV file.

The following task writes a two-column TSV to standard out, which is then interpreted as a `Map[String, String]`:

```wdl
task do_stuff {
  String flags
  File file
  command {
    ./script --flags=${flags} ${file}
  }
  output {
    Map[String, String] mapping = read_map(stdout())
  }
}
```

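The two-column check described above can be sketched in Python as follows. `parse_map` is hypothetical; raising an error for a line that is not exactly `col1\tcol2` follows the text, while letting a repeated key overwrite an earlier one is an assumption the section does not address.

```python
def parse_map(text: str) -> dict[str, str]:
    """Parse a two-column TSV (col1<TAB>col2 per line) into a mapping."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # assumption: ignore the trailing newline
    result: dict[str, str] = {}
    for number, line in enumerate(lines, start=1):
        cells = line.split("\t")
        if len(cells) != 2:
            raise ValueError(f"line {number}: expected 2 columns, got {len(cells)}")
        key, value = cells
        # Assumption: a repeated key overwrites the earlier entry.
        result[key] = value
    return result
```
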
## Object read_object(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given a file-like object that contains a 2-row and n-column TSV file, this function will turn that into an `Object`. The first row supplies the attribute names and the second row the corresponding values.

```wdl
task test {
  command <<<
    python <<CODE
    print('\t'.join(["key_{}".format(i) for i in range(1, 4)]))
    print('\t'.join(["value_{}".format(i) for i in range(1, 4)]))
    CODE
  >>>
  output {
    Object my_obj = read_object(stdout())
  }
}
```

The command will output the following to stdout (`\t` denotes a tab character):

```
key_1\tkey_2\tkey_3
value_1\tvalue_2\tvalue_3
```

This would be turned into an `Object` in WDL that looks like this:

|Attribute|Value|
|---------|-----|
|key_1 |"value_1"|
|key_2 |"value_2"|
|key_3 |"value_3"|

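As a rough Python sketch of this conversion (the helper `parse_object` is hypothetical, with a `dict` standing in for an `Object`), the first row supplies attribute names and the second row their values. Rejecting any other row count, or rows of different widths, is this sketch's reading of "2-row and n-column".

```python
def parse_object(text: str) -> dict[str, str]:
    """Turn a 2-row, n-column TSV into attribute -> value pairs."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # assumption: ignore the trailing newline
    if len(lines) != 2:
        raise ValueError(f"expected 2 rows, got {len(lines)}")
    names = lines[0].split("\t")
    values = lines[1].split("\t")
    if len(names) != len(values):
        raise ValueError("the two rows have different column counts")
    return dict(zip(names, values))
```
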
## Array[Object] read_objects(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given a file-like object that contains an n-column TSV file with a header row of attribute names followed by rows of values, this function turns each value row into an `Object` and returns them as an `Array[Object]`.

```wdl
task test {
  command <<<
    python <<CODE
    print('\t'.join(["key_{}".format(i) for i in range(1, 4)]))
    print('\t'.join(["value_{}".format(i) for i in range(1, 4)]))
    print('\t'.join(["value_{}".format(i) for i in range(1, 4)]))
    print('\t'.join(["value_{}".format(i) for i in range(1, 4)]))
    CODE
  >>>
  output {
    Array[Object] my_obj = read_objects(stdout())
  }
}
```

The command will output the following to stdout (`\t` denotes a tab character):

```
key_1\tkey_2\tkey_3
value_1\tvalue_2\tvalue_3
value_1\tvalue_2\tvalue_3
value_1\tvalue_2\tvalue_3
```

This would be turned into an `Array[Object]` in WDL that looks like this:

|Index|Attribute|Value|
|-----|---------|-----|
|0 |key_1 |"value_1"|
| |key_2 |"value_2"|
| |key_3 |"value_3"|
|1 |key_1 |"value_1"|
| |key_2 |"value_2"|
| |key_3 |"value_3"|
|2 |key_1 |"value_1"|
| |key_2 |"value_2"|
| |key_3 |"value_3"|

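`read_objects()` extends the single-object case: one header row, then one object per subsequent row. In this hypothetical Python sketch (`parse_objects`), a row whose column count differs from the header's is treated as an error. That is an assumption, not something the text above spells out.

```python
def parse_objects(text: str) -> list[dict[str, str]]:
    """Header row of attribute names, then one object per following row."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # assumption: ignore the trailing newline
    if not lines:
        raise ValueError("missing header row")
    names = lines[0].split("\t")
    objects = []
    for number, line in enumerate(lines[1:], start=2):
        values = line.split("\t")
        # Assumption: every row must have as many columns as the header.
        if len(values) != len(names):
            raise ValueError(f"row {number}: expected {len(names)} columns, got {len(values)}")
        objects.append(dict(zip(names, values)))
    return objects
```
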
## mixed read_json(String|File)

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

The `read_json()` function takes one parameter, which is a file-like object (`String`, `File`), and returns a data type which matches the data structure in the JSON file. The mapping of JSON type to WDL type is:

|JSON Type|WDL Type|
|---------|--------|
|object|`Map[String, ?]`|
|array|`Array[?]`|
|number|`Int` or `Float`|
|string|`String`|
|boolean|`Boolean`|
|null|???|

If the parameter is a `String`, this is assumed to be a local file path relative to the current working directory of the task.

For example, if I write a task that outputs a file to `./results/file_list.json`, and my task is defined as:

```wdl
task do_stuff {
  File file
  command {
    python do_stuff.py ${file}
  }
  output {
    Map[String, String] output_table = read_json("./results/file_list.json")
  }
}
```

Then when the task finishes, to fulfill the `output_table` variable, `./results/file_list.json` must be a valid JSON file or an error will be reported.

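The JSON-to-WDL mapping in the table can be sketched in Python on top of the standard `json` module. `wdl_type_of` is a hypothetical helper, not part of any engine. Two details matter: Python's `bool` is a subclass of `int`, so the Boolean check must come first, and the sketch follows `json`'s own rule for numbers (no fraction or exponent gives `int`). JSON `null` is rejected because the table leaves its WDL type open.

```python
import json


def wdl_type_of(value) -> str:
    """Name the WDL type that a parsed JSON value maps to, per the table."""
    # bool is a subclass of int in Python, so test it before int.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        return "Array[?]"
    if isinstance(value, dict):
        return "Map[String, ?]"
    # JSON null: the table leaves the WDL type unresolved ("???").
    raise ValueError("no WDL type is defined for JSON null")
```
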
## Int read_int(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The `read_int()` function takes a file path which is expected to contain 1 line with 1 integer on it. This function returns that integer.

## String read_string(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The `read_string()` function takes a file path which is expected to contain 1 line with 1 string on it. This function returns that string.

The returned string does not include any trailing newline characters.

## Float read_float(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The `read_float()` function takes a file path which is expected to contain 1 line with 1 floating point number on it. This function returns that float.

## Boolean read_boolean(String|File)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The `read_boolean()` function takes a file path which is expected to contain 1 line with 1 Boolean value (either "true" or "false") on it. This function returns that Boolean value.

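The four scalar readers above differ only in how their single line is interpreted. This Python sketch works on the file's text (reading the file itself is omitted). All names are hypothetical, and accepting only lowercase `true`/`false` mirrors the wording of `read_boolean()` rather than any particular engine.

```python
def _single_line(text: str) -> str:
    # Drop the trailing line terminator; more than one line is an error.
    line = text.rstrip("\r\n")
    if "\n" in line:
        raise ValueError("expected exactly one line")
    return line


def wdl_read_int(text: str) -> int:
    return int(_single_line(text).strip())


def wdl_read_float(text: str) -> float:
    return float(_single_line(text).strip())


def wdl_read_string(text: str) -> str:
    # Trailing newline characters are not part of the returned string.
    return _single_line(text)


def wdl_read_boolean(text: str) -> bool:
    value = _single_line(text).strip()
    if value not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {value!r}")
    return value == "true"
```
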
## File write_lines(Array[String])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given something that's compatible with `Array[String]`, this writes each element to its own line in a file, with newline (`\n`) characters as line separators.

```wdl
task example {
  Array[String] array = ["first", "second", "third"]
  command {
    ./script --file-list=${write_lines(array)}
  }
}
```

If this task were run, the command might look like:

```
./script --file-list=/local/fs/tmp/array.txt
```

And `/local/fs/tmp/array.txt` would contain:

```
first
second
third
```

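A Python sketch of what an engine might do for `write_lines()` (names hypothetical). Following the text, `\n` separates the elements; whether the file should also end with a newline is not specified, and this sketch omits it. The temporary path stands in for the engine-chosen location, such as `/local/fs/tmp/array.txt` above.

```python
import os
import tempfile


def serialize_lines(values: list[str]) -> str:
    # "\n" separates elements; no newline is added after the last one.
    return "\n".join(values)


def wdl_write_lines(values: list[str]) -> str:
    """Write the elements to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(serialize_lines(values))
    return path
```
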
## File write_tsv(Array[Array[String]])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given something that's compatible with `Array[Array[String]]`, this writes a TSV file of the data structure.

```wdl
task example {
  Array[Array[String]] array = [["one", "two", "three"], ["un", "deux", "trois"]]
  command {
    ./script --tsv=${write_tsv(array)}
  }
}
```

If this task were run, the command might look like:

```
./script --tsv=/local/fs/tmp/array.tsv
```

And `/local/fs/tmp/array.tsv` would contain:

```
one\ttwo\tthree
un\tdeux\ttrois
```

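The TSV layout shown above can be produced by a short Python sketch (`serialize_tsv` is hypothetical). Because the format as described has no escaping, this sketch rejects cells containing a tab or newline instead of writing a file that would read back with a different shape; that check is an assumption.

```python
def serialize_tsv(rows: list[list[str]]) -> str:
    """Join cells with tabs and rows with newlines."""
    for row in rows:
        for cell in row:
            # Assumption: no escaping exists, so reject cells that would
            # silently change the table's shape.
            if "\t" in cell or "\n" in cell:
                raise ValueError(f"cell {cell!r} contains a tab or newline")
    return "\n".join("\t".join(row) for row in rows)
```
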
## File write_map(Map[String, String])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given something that's compatible with `Map[String, String]`, this writes a TSV file of the data structure.

```wdl
task example {
  Map[String, String] map = {"key1": "value1", "key2": "value2"}
  command {
    ./script --map=${write_map(map)}
  }
}
```

If this task were run, the command might look like:

```
./script --map=/local/fs/tmp/map.tsv
```

And `/local/fs/tmp/map.tsv` would contain:

```
key1\tvalue1
key2\tvalue2
```

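A matching Python sketch for `write_map()`: one `key<TAB>value` line per entry. The helper name is hypothetical, writing entries in the dictionary's insertion order is an assumption (the text does not specify an order), and so is rejecting keys or values that contain a tab or newline.

```python
def serialize_map(mapping: dict[str, str]) -> str:
    """One key<TAB>value line per entry, in the dict's insertion order."""
    for key, value in mapping.items():
        # Assumption: reject text that would break the two-column layout.
        if any(ch in part for part in (key, value) for ch in "\t\n"):
            raise ValueError(f"entry {key!r} contains a tab or newline")
    return "\n".join(f"{key}\t{value}" for key, value in mapping.items())
```
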
## File write_object(Object)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given any `Object`, this will write out a 2-row, n-column TSV file with the object's attributes and values.

```wdl
task test {
  Object input
  command <<<
    /bin/do_work --obj=${write_object(input)}
  >>>
  output {
    File results = stdout()
  }
}
```

If `input` were to have the value:

|Attribute|Value|
|---------|-----|
|key_1 |"value_1"|
|key_2 |"value_2"|
|key_3 |"value_3"|

The command would instantiate to:

```
/bin/do_work --obj=/path/to/input.tsv
```

Where `/path/to/input.tsv` would contain:

```
key_1\tkey_2\tkey_3
value_1\tvalue_2\tvalue_3
```

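The two-row layout can be sketched in Python with a `dict` standing in for an `Object` (a modeling assumption; `serialize_object` is hypothetical). Attribute names go on the first row and the matching values on the second, in the same order.

```python
def serialize_object(obj: dict[str, str]) -> str:
    """Row 1: attribute names; row 2: the matching values."""
    names = "\t".join(obj.keys())
    values = "\t".join(obj.values())
    return names + "\n" + values
```
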
## File write_objects(Array[Object])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given any `Array[Object]`, this will write out a 2+ row, n-column TSV file with each object's attributes and values.

```wdl
task test {
  Array[Object] in
  command <<<
    /bin/do_work --obj=${write_objects(in)}
  >>>
  output {
    File results = stdout()
  }
}
```

If `in` were to have the value:

|Index|Attribute|Value|
|-----|---------|-----|
|0 |key_1 |"value_1"|
| |key_2 |"value_2"|
| |key_3 |"value_3"|
|1 |key_1 |"value_4"|
| |key_2 |"value_5"|
| |key_3 |"value_6"|
|2 |key_1 |"value_7"|
| |key_2 |"value_8"|
| |key_3 |"value_9"|

The command would instantiate to:

```
/bin/do_work --obj=/path/to/input.tsv
```

Where `/path/to/input.tsv` would contain:

```
key_1\tkey_2\tkey_3
value_1\tvalue_2\tvalue_3
value_4\tvalue_5\tvalue_6
value_7\tvalue_8\tvalue_9
```

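Serializing several objects needs a single header row, so this Python sketch (hypothetical `serialize_objects`) requires every object to have the same attribute names as the first one and writes each object's values in header order. Raising on mismatched attributes or on an empty array is an assumption; the text above does not cover those cases.

```python
def serialize_objects(objects: list[dict[str, str]]) -> str:
    """Header row from the first object, then one value row per object."""
    if not objects:
        raise ValueError("need at least one object to derive the header row")
    names = list(objects[0].keys())
    lines = ["\t".join(names)]
    for index, obj in enumerate(objects):
        # Assumption: every object must share the first object's attributes.
        if set(obj.keys()) != set(names):
            raise ValueError(f"object {index} has different attributes")
        # Emit values in header order, whatever order this object uses.
        lines.append("\t".join(obj[name] for name in names))
    return "\n".join(lines)
```
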
## File write_json(mixed)

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

Given something with any type, this writes the JSON equivalent to a file. See the table in the definition of [read_json()](#mixed-read_jsonstringfile).

```wdl
task example {
  Map[String, String] map = {"key1": "value1", "key2": "value2"}
  command {
    ./script --map=${write_json(map)}
  }
}
```

If this task were run, the command might look like:

```
./script --map=/local/fs/tmp/map.json
```

And `/local/fs/tmp/map.json` would contain:

```
{
  "key1": "value1",
  "key2": "value2"
}
```

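A Python sketch using the standard `json` module (the helper `wdl_write_json` is hypothetical). The pretty-printing with `indent=2` only mirrors the example above; any valid JSON encoding of the value would serve equally well.

```python
import json
import os
import tempfile


def wdl_write_json(value) -> str:
    """Write a JSON-compatible value to a new temporary file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        # indent=2 mirrors the example; any valid JSON encoding would do.
        json.dump(value, handle, indent=2)
    return path
```
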
## Float size(File, [String])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given a `File` and an optional `String`, returns the size of the file in bytes, or in the unit specified by the second argument.

```wdl
task example {
  File input_file

  command {
    echo "this file is 22 bytes" > created_file
  }

  output {
    Float input_file_size = size(input_file)
    Float created_file_size = size("created_file") # 22.0
    Float created_file_size_in_KB = size("created_file", "K") # 0.022
  }
}
```

Supported units are kilobyte ("K", "KB"), megabyte ("M", "MB"), gigabyte ("G", "GB"), terabyte ("T", "TB"), as well as their [binary versions](https://en.wikipedia.org/wiki/Binary_prefix) "Ki" ("KiB"), "Mi" ("MiB"), "Gi" ("GiB"), "Ti" ("TiB").
The default unit is bytes ("B").

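The unit handling can be sketched in Python: decimal units divide by powers of 1000 and binary units by powers of 1024, which reproduces the `0.022` in the example (22 / 1000). `UNIT_FACTORS`, `convert_size` and `wdl_size` are hypothetical names, and rejecting unknown or differently-cased unit strings is an assumption.

```python
import os

# Decimal prefixes are powers of 1000; binary (IEC) prefixes are powers of 1024.
UNIT_FACTORS = {
    "B": 1,
    "K": 1000, "KB": 1000,
    "M": 1000 ** 2, "MB": 1000 ** 2,
    "G": 1000 ** 3, "GB": 1000 ** 3,
    "T": 1000 ** 4, "TB": 1000 ** 4,
    "Ki": 1024, "KiB": 1024,
    "Mi": 1024 ** 2, "MiB": 1024 ** 2,
    "Gi": 1024 ** 3, "GiB": 1024 ** 3,
    "Ti": 1024 ** 4, "TiB": 1024 ** 4,
}


def convert_size(num_bytes: int, unit: str = "B") -> float:
    if unit not in UNIT_FACTORS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return num_bytes / UNIT_FACTORS[unit]


def wdl_size(path: str, unit: str = "B") -> float:
    # os.path.getsize returns the file's size in bytes.
    return convert_size(os.path.getsize(path), unit)
```
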
## String sub(String, String, String)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given 3 String parameters `input`, `pattern`, `replace`, this function will replace any occurrence matching `pattern` in `input` with `replace`.
`pattern` is expected to be a [regular expression](https://en.wikipedia.org/wiki/Regular_expression). Details of regex evaluation will depend on the execution engine running the WDL.

Example 1:

```wdl
String chocolike = "I like chocolate when it's late"

String chocolove = sub(chocolike, "like", "love") # I love chocolate when it's late
String chocoearly = sub(chocolike, "late", "early") # I like chocoearly when it's early
String chocolate = sub(chocolike, "late$", "early") # I like chocolate when it's early
```

The `sub` function will also accept `input` and `replace` parameters that can be coerced to a String (e.g. `File`). This can be useful, for example, to swap the extension of a filename.

Example 2:

```wdl
task example {
  File input_file = "my_input_file.bam"
  String output_file_name = sub(input_file, "\\.bam$", ".index") # my_input_file.index

  command {
    echo "I want an index instead" > ${output_file_name}
  }

  output {
    File outputFile = output_file_name
  }
}
```

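Python's `re.sub` offers a quick way to check the examples above. This is only a sketch: `wdl_sub` is hypothetical and uses Python's regex dialect, which, as the text notes, may differ from the engine's. For instance, Python writes group references in the replacement as `\1`, while Java's `String.replaceAll` uses `$1`.

```python
import re


def wdl_sub(input_str: str, pattern: str, replace: str) -> str:
    # re.sub replaces every non-overlapping match, as in Example 1.
    return re.sub(pattern, replace, input_str)
```

Note that the WDL string `"\\.bam$"` denotes the regex `\.bam$`, which is written `r"\.bam$"` in Python.
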
## Array[Int] range(Int)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given an integer argument, the `range` function creates an array of integers, counting up from 0, of length equal to the given argument. For example, `range(3)` provides the array `[0, 1, 2]`.

## Array[Array[X]] transpose(Array[Array[X]])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given a two-dimensional array argument, the `transpose` function transposes it according to the standard matrix transpose rules. For example, `transpose([[0, 1, 2], [3, 4, 5]])` will return the transposed two-dimensional array `[[0, 3], [1, 4], [2, 5]]`.

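`range()` and `transpose()` can be sketched in Python as below (hypothetical names). Rejecting a negative length and ragged rows (rows of unequal length) are assumptions; the text only describes the well-formed cases.

```python
def wdl_range(n: int) -> list[int]:
    # Assumption: a negative length is an error (the text does not say).
    if n < 0:
        raise ValueError("range() needs a non-negative length")
    return list(range(n))


def wdl_transpose(matrix: list[list]) -> list[list]:
    if not matrix:
        return []
    width = len(matrix[0])
    # Assumption: ragged input is rejected rather than padded or truncated.
    if any(len(row) != width for row in matrix):
        raise ValueError("transpose() needs rows of equal length")
    # Column j of the input becomes row j of the output.
    return [[row[j] for row in matrix] for j in range(width)]
```
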
## Array[Pair[X,Y]] zip(Array[X], Array[Y])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given two arrays, the `zip` function returns an array of `Pair`s in which the i-th pair holds the i-th element of each input array.

```wdl
Pair[Int, String] p = (0, "z")
Array[Int] xs = [ 1, 2, 3 ]
Array[String] ys = [ "a", "b", "c" ]
Array[String] zs = [ "d", "e" ]

Array[Pair[Int, String]] zipped = zip(xs, ys) # i.e. zipped = [ (1, "a"), (2, "b"), (3, "c") ]
```

## Array[Pair[X,Y]] cross(Array[X], Array[Y])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given two arrays, the `cross` function returns their Cartesian product: an array of `Pair`s containing every combination of an element of the first array with an element of the second.

```wdl
Pair[Int, String] p = (0, "z")
Array[Int] xs = [ 1, 2, 3 ]
Array[String] ys = [ "a", "b", "c" ]
Array[String] zs = [ "d", "e" ]

Array[Pair[Int, String]] crossed = cross(xs, zs) # i.e. crossed = [ (1, "d"), (1, "e"), (2, "d"), (2, "e"), (3, "d"), (3, "e") ]
```

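In Python, tuples can stand in for WDL `Pair`s. In this sketch (hypothetical names), `wdl_zip` refuses arrays of different lengths instead of truncating like Python's built-in `zip` (an assumption), and `wdl_cross` uses `itertools.product`, whose ordering matches the `crossed` example above.

```python
from itertools import product


def wdl_zip(xs: list, ys: list) -> list[tuple]:
    # Assumption: unequal lengths are an error rather than truncated.
    if len(xs) != len(ys):
        raise ValueError(f"zip() needs equal lengths, got {len(xs)} and {len(ys)}")
    return list(zip(xs, ys))


def wdl_cross(xs: list, ys: list) -> list[tuple]:
    # product() varies the second array fastest, matching the example order.
    return list(product(xs, ys))
```
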
## Integer length(Array[X])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given an Array, the `length` function returns the number of elements in the Array as an `Int`.

```wdl
Array[Int] xs = [ 1, 2, 3 ]
Array[String] ys = [ "a", "b", "c" ]
Array[String] zs = [ ]

Int xlen = length(xs) # 3
Int ylen = length(ys) # 3
Int zlen = length(zs) # 0
```

## Array[String] prefix(String, Array[X])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given a String and an Array[X] where X is a primitive type, the `prefix` function returns an array of strings composed of each element of the input array prefixed by the specified prefix string. For example:

```wdl
Array[String] env = ["key1=value1", "key2=value2", "key3=value3"]
Array[String] env_param = prefix("-e ", env) # ["-e key1=value1", "-e key2=value2", "-e key3=value3"]

Array[Int] env2 = [1, 2, 3]
Array[String] env2_param = prefix("-f ", env2) # ["-f 1", "-f 2", "-f 3"]
```

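A Python sketch of `prefix()` (hypothetical names). The only subtle step is turning each primitive into a string: this sketch assumes Booleans render as the WDL literals `true`/`false` (Python's `str(True)` is `"True"`) and leaves `Float` formatting to `str()`, which may not match a given engine.

```python
def to_wdl_string(value) -> str:
    # Assumption: Booleans render as WDL literals; other primitives use str().
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def wdl_prefix(prefix: str, values: list) -> list[str]:
    # Prepend the prefix to each element's string form, keeping order.
    return [prefix + to_wdl_string(v) for v in values]
```
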
## X select_first(Array[X?])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given an array of optional values, `select_first` will select the first defined value and return it. Note that this is a runtime check and requires that at least one defined value will exist: if no defined value is found when `select_first` is evaluated, the workflow will fail.

## Array[X] select_all(Array[X?])

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Given an array of optional values, `select_all` will select only those elements which are defined.

## Boolean defined(X?)

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

This function will return `false` if the argument is an unset optional value. It will return `true` in all other cases.

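Modeling an unset optional value as Python's `None` (an assumption of this sketch), the three functions above can be written as follows; the names are hypothetical. Falsy values such as `0` or `""` still count as defined.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def wdl_defined(value: Optional[T]) -> bool:
    # None stands in for an unset optional value.
    return value is not None


def wdl_select_all(values: list[Optional[T]]) -> list[T]:
    return [v for v in values if v is not None]


def wdl_select_first(values: list[Optional[T]]) -> T:
    for v in values:
        if v is not None:
            return v
    # No defined value: per the text, this is a runtime failure.
    raise RuntimeError("select_first: no defined value")
```
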
## String basename(String)

:pig2: [Supported in Cromwell 27](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

- This function returns the basename of a file path passed to it: `basename("/path/to/file.txt")` returns `"file.txt"`.
- It also supports an optional second parameter, a suffix to remove: `basename("/path/to/file.txt", ".txt")` returns `"file"`.

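A Python sketch of `basename()` with its optional suffix (hypothetical `wdl_basename`), assuming POSIX-style `/` separators. The suffix is removed only when the name actually ends with it.

```python
import posixpath


def wdl_basename(path: str, suffix: str = "") -> str:
    # Keep everything after the last "/".
    name = posixpath.basename(path)
    # Strip the suffix only when the name ends with it.
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name
```
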
## Int floor(Float), Int ceil(Float) and Int round(Float)

:pig2: [Supported in Cromwell 28](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

- These functions convert a Float value into an Int by:
  - floor: Round **down** to the largest integer less than or equal to the value
  - ceil: Round **up** to the smallest integer greater than or equal to the value
  - round: Round to the nearest integer based on standard rounding rules

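The three conversions can be sketched in Python (hypothetical names). "Standard rounding rules" is ambiguous for values exactly halfway between two integers; this sketch assumes halves round toward positive infinity, which differs from Python's built-in `round()` (banker's rounding, so `round(2.5) == 2`).

```python
import math


def wdl_floor(x: float) -> int:
    # Largest integer less than or equal to x.
    return math.floor(x)


def wdl_ceil(x: float) -> int:
    # Smallest integer greater than or equal to x.
    return math.ceil(x)


def wdl_round(x: float) -> int:
    # Assumption: halves round up (toward +infinity). Comparing x - floor(x)
    # avoids the float error that floor(x + 0.5) can introduce.
    low = math.floor(x)
    return low + 1 if x - low >= 0.5 else low
```
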
# Data Types & Serialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Tasks and workflows are given values for their input parameters in order to run. Each input parameter is declared, with its type, in the `task` or `workflow`, and can be of any [valid type](#types):

Primitive Types:

* String
* Int
* Float
* File
* Boolean

Compound Types:

* Array
* Map
* Object
* Pair

When a WDL workflow engine instantiates a command specified in the `command` section of a `task`, it must serialize all `${...}` tags in the command into primitive types.

For example, if I'm writing a tool that operates on a list of FASTQ files, there are a variety of ways that this list can be passed to that task:

* A file containing one file path per line (e.g. `Rscript analysis.R --files=fastq_list.txt`)
* A file containing a JSON list (e.g. `Rscript analysis.R --files=fastq_list.json`)
* Enumerated on the command line (e.g. `Rscript analysis.R 1.fastq 2.fastq 3.fastq`)

Each of these methods has its merits, and one method might suit one tool better while another method suits a different tool.

On the other end, tasks need to be able to communicate data structures back to the workflow engine. For example, say this same tool that takes a list of FASTQs wants to return a `Map[File, Int]` representing the number of reads in each FASTQ. A tool might choose to output it as a two-column TSV or as a JSON object, and WDL needs to know how to convert that to the proper data type.

WDL provides some [standard library functions](#standard-library) for converting compound types like `Array` into primitive types, like `File`.

When a task finishes, the `output` section defines how to convert the files and stdout/stderr into WDL types. For example,

```wdl
task test {
  Array[File] files
  command {
    Rscript analysis.R --files=${sep=',' files}
  }
  output {
    Array[String] strs = read_lines(stdout())
  }
}
```

Here, the expression `read_lines(stdout())` says "take the output from stdout, break it into lines, and return that result as an `Array[String]`". See the definitions of [read_lines](#arraystring-read_linesstringfile) and [stdout](#file-stdout) for more details.

## Serialization of Task Inputs

### Primitive Types

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Serializing primitive inputs is straightforward: each value is turned into a string and inserted into the command line.

Consider this example:

```wdl
task output_example {
  String s
  Int i
  Float f

  command {
    python do_work.py ${s} ${i} ${f}
  }
}
```

If I provide values for the declarations in the task as:

|var|value|
|---|-----|
|s |"str"|
|i |2 |
|f |1.3 |

Then, the command would be instantiated as:

```
python do_work.py str 2 1.3
```

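The instantiation above can be sketched in Python for the simplest placeholder form, a bare `${name}` (hypothetical helpers `serialize_primitive` and `instantiate`). Values are inserted verbatim with no shell quoting, as in the example; rendering Booleans as `true`/`false` is an assumption.

```python
import re


def serialize_primitive(value) -> str:
    # Assumption: Booleans render as WDL literals; other primitives use str().
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def instantiate(template: str, values: dict) -> str:
    """Replace each bare ${name} with its serialized value (no shell quoting)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: serialize_primitive(values[m.group(1)]), template)
```
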
### Compound Types

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Compound types, like `Array` and `Map`, must be converted to a primitive type before they can be used in the command. There are many ways to turn a compound type into a primitive type, as laid out in the following sections.

#### Array serialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Arrays can be serialized in two ways:

* **Array Expansion**: elements in the list are flattened to a string with a separator character.
* **File Creation**: create a file with the elements of the array in it and pass that file as the parameter on the command line.

##### Array serialization by expansion

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

The array flattening approach is used when a parameter is specified as `${sep=' ' my_param}`. `my_param` must be declared as an `Array` of primitive types. When the value of `my_param` is specified, the values are joined together with the separator character (a space in this case). For example:

```wdl
task test {
  Array[File] bams
  command {
    python script.py --bams=${sep=',' bams}
  }
}
```

If passed an array for the value of `bams`:

|Element |
|--------------|
|/path/to/1.bam|
|/path/to/2.bam|
|/path/to/3.bam|

This would produce the command `python script.py --bams=/path/to/1.bam,/path/to/2.bam,/path/to/3.bam`.

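The `sep` expansion can be sketched in Python with a regular expression. `expand_sep` is hypothetical and only recognizes the single-quoted form `${sep='X' name}` used in this section; a real engine would parse the full placeholder grammar.

```python
import re

# Matches ${sep='X' name}; only the single-quoted form is handled here.
SEP_PLACEHOLDER = re.compile(r"\$\{sep='([^']*)'\s+(\w+)\}")


def expand_sep(template: str, arrays: dict) -> str:
    """Join each referenced array with its separator, in element order."""
    def replace(match):
        separator, name = match.group(1), match.group(2)
        return separator.join(str(item) for item in arrays[name])
    return SEP_PLACEHOLDER.sub(replace, template)
```
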
| 2780 | ##### Array serialization using write_lines() | |
|
|
||
| 2781 | ||
|
|
||
| 2782 | :pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark: | |
| 2783 | ||
|
|
||
| 2784 | An array may be turned into a file with each element in the array occupying a line in the file. | |
| 2785 | ||
|
|
||
| 2786 | ```wdl | |
|
|
||
| 2787 | task test { | |
| 2788 | Array[File] bams | |
| 2789 | command { | |
| 2790 | sh script.sh ${write_lines(bams)} | |
| 2791 | } | |
| 2792 | } | |
|
|
||
| 2793 | ``` | |
| 2794 | ||
| 2795 | if `bams` is given this array: | |
| 2796 | ||
| 2797 | |Element | | |
| 2798 | |--------------| | |
| 2799 | |/path/to/1.bam| | |
| 2800 | |/path/to/2.bam| | |
| 2801 | |/path/to/3.bam| | |
| 2802 | ||
| 2803 | Then, the resulting command line could look like: | |
| 2804 | ||
| 2805 | ``` | |
|
|
||
| 2806 | sh script.sh /jobs/564758/bams | |
|
|
||
| 2807 | ``` | |
| 2808 | ||
|
|
||
| 2809 | Where `/jobs/564758/bams` would contain: | |
|
|
||
| 2810 | ||
| 2811 | ``` | |
| 2812 | /path/to/1.bam | |
| 2813 | /path/to/2.bam | |
| 2814 | /path/to/3.bam | |
| 2815 | ``` | |
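
The following Python sketch approximates what `write_lines()` produces. `write_lines` here is a hypothetical stand-in, the temporary directory replaces an engine-managed path such as `/jobs/564758`, and ending the last line with a newline is an assumption.

```python
import os
import tempfile


def write_lines(values: list, path: str) -> str:
    """Write one array element per line and return the file's path."""
    with open(path, "w") as f:
        for value in values:
            # Assumption: every line, including the last, ends with a newline.
            f.write(f"{value}\n")
    return path


# A temporary directory stands in for an engine-managed job directory.
job_dir = tempfile.mkdtemp()
bams = ["/path/to/1.bam", "/path/to/2.bam", "/path/to/3.bam"]
path = write_lines(bams, os.path.join(job_dir, "bams"))
command = f"sh script.sh {path}"
```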

##### Array serialization using write_json()

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

The array may be turned into a JSON document, with the path of the JSON file passed in as the parameter:

```wdl
task test {
  Array[File] bams
  command {
    sh script.sh ${write_json(bams)}
  }
}
```

If `bams` is given this array:

|Element       |
|--------------|
|/path/to/1.bam|
|/path/to/2.bam|
|/path/to/3.bam|

Then, the resulting command line could look like:

```
sh script.sh /jobs/564758/bams.json
```

Where `/jobs/564758/bams.json` would contain:

```
[
  "/path/to/1.bam",
  "/path/to/2.bam",
  "/path/to/3.bam"
]
```
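
A comparable sketch for `write_json()`, assuming the engine maps a WDL `Array` onto a JSON array. The helper name and the `indent=2` formatting are illustrative choices, not requirements of the specification.

```python
import json
import os
import tempfile


def write_json(value, path: str) -> str:
    """Serialize a value built from lists, dicts, strings and numbers to a JSON file."""
    with open(path, "w") as f:
        json.dump(value, f, indent=2)
    return path


job_dir = tempfile.mkdtemp()  # stands in for an engine-managed job directory
bams = ["/path/to/1.bam", "/path/to/2.bam", "/path/to/3.bam"]
path = write_json(bams, os.path.join(job_dir, "bams.json"))
command = f"sh script.sh {path}"
```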

#### Map serialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Map types cannot be serialized on the command line directly and must be serialized through a file.

##### Map serialization using write_map()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Using the `write_map()` function, a map can be serialized as a two-column TSV file, and the parameter on the command line is given the path to that file:

```wdl
task test {
  Map[String, Float] sample_quality_scores
  command {
    sh script.sh ${write_map(sample_quality_scores)}
  }
}
```

If `sample_quality_scores` is given this `Map[String, Float]`:

|Key    |Value |
|-------|------|
|sample1|98    |
|sample2|95    |
|sample3|75    |

Then, the resulting command line could look like:

```
sh script.sh /jobs/564757/sample_quality_scores.tsv
```

Where `/jobs/564757/sample_quality_scores.tsv` would contain:

```
sample1\t98
sample2\t95
sample3\t75
```
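
A minimal Python sketch of the two-column layout, assuming keys and values are written as plain text with no quoting or escaping. `write_map` is a hypothetical helper; the scores are given as Python integers so the output matches the table above, and how an engine renders a `Float` (for example `98` versus `98.0`) is not settled here.

```python
import os
import tempfile


def write_map(mapping: dict, path: str) -> str:
    """Write each key/value pair as one row of a two-column, tab-separated file."""
    with open(path, "w") as f:
        for key, value in mapping.items():
            # Keys and values are written as plain text; no quoting is applied.
            f.write(f"{key}\t{value}\n")
    return path


scores = {"sample1": 98, "sample2": 95, "sample3": 75}
path = write_map(scores, os.path.join(tempfile.mkdtemp(), "sample_quality_scores.tsv"))
```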

##### Map serialization using write_json()

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

Using the `write_json()` function, a map can also be serialized as a JSON file, and the parameter on the command line is given the path to that file:

```wdl
task test {
  Map[String, Float] sample_quality_scores
  command {
    sh script.sh ${write_json(sample_quality_scores)}
  }
}
```

If `sample_quality_scores` is given this map:

|Key    |Value |
|-------|------|
|sample1|98    |
|sample2|95    |
|sample3|75    |

Then, the resulting command line could look like:

```
sh script.sh /jobs/564757/sample_quality_scores.json
```

Where `/jobs/564757/sample_quality_scores.json` would contain:

```
{
  "sample1": 98,
  "sample2": 95,
  "sample3": 75
}
```

#### Object serialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

An object is a more general case of a map: its keys are strings, and its values can be of arbitrary types and are treated as strings. Objects can be serialized with either the `write_object()` or the `write_json()` function.

##### Object serialization using write_object()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

```wdl
task test {
  Object sample
  command {
    perl script.pl ${write_object(sample)}
  }
}
```

If `sample` is provided as:

|Attribute|Value |
|---------|------|
|attr1    |value1|
|attr2    |value2|
|attr3    |value3|
|attr4    |value4|

Then, the resulting command line could look like:

```
perl script.pl /jobs/564759/sample.tsv
```

Where `/jobs/564759/sample.tsv` would contain:

```
attr1\tattr2\tattr3\tattr4
value1\tvalue2\tvalue3\tvalue4
```
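
The two-row layout can be sketched in Python as follows. `write_object` is a hypothetical helper, and it assumes the column order follows the object's attribute order.

```python
import os
import tempfile


def write_object(obj: dict, path: str) -> str:
    """Write an object as a two-row TSV: attribute names first, values second."""
    names = list(obj)
    with open(path, "w") as f:
        f.write("\t".join(names) + "\n")
        # Values are treated as strings, in the same column order as the names.
        f.write("\t".join(str(obj[name]) for name in names) + "\n")
    return path


sample = {"attr1": "value1", "attr2": "value2", "attr3": "value3", "attr4": "value4"}
path = write_object(sample, os.path.join(tempfile.mkdtemp(), "sample.tsv"))
```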

##### Object serialization using write_json()

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

```wdl
task test {
  Object sample
  command {
    perl script.pl ${write_json(sample)}
  }
}
```

If `sample` is provided as:

|Attribute|Value |
|---------|------|
|attr1    |value1|
|attr2    |value2|
|attr3    |value3|
|attr4    |value4|

Then, the resulting command line could look like:

```
perl script.pl /jobs/564759/sample.json
```

Where `/jobs/564759/sample.json` would contain:

```
{
  "attr1": "value1",
  "attr2": "value2",
  "attr3": "value3",
  "attr4": "value4"
}
```
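
Because the JSON file is meant to be parsed by the task's script, it must be valid JSON, with no trailing comma after the last attribute. A quick Python check using the standard `json` module, under the assumption that an `Object` maps onto a JSON object of strings:

```python
import json

sample = {"attr1": "value1", "attr2": "value2", "attr3": "value3", "attr4": "value4"}

# json.dumps never emits a trailing comma, so the result is valid JSON.
text = json.dumps(sample, indent=2)
print(text)

# Parsing the text back is a cheap way to confirm it is well-formed.
assert json.loads(text) == sample
```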

#### Array[Object] serialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`Array[Object]` must guarantee that all objects in the array have the same set of attributes. These can be serialized with either the `write_objects()` or the `write_json()` function, as described in the following sections.

##### Array[Object] serialization using write_objects()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

An `Array[Object]` can be serialized using `write_objects()` into a TSV file:

```wdl
task test {
  Array[Object] sample
  command {
    perl script.pl ${write_objects(sample)}
  }
}
```

If `sample` is provided as:

|Index|Attribute|Value  |
|-----|---------|-------|
|0    |attr1    |value1 |
|     |attr2    |value2 |
|     |attr3    |value3 |
|     |attr4    |value4 |
|1    |attr1    |value5 |
|     |attr2    |value6 |
|     |attr3    |value7 |
|     |attr4    |value8 |

Then, the resulting command line could look like:

```
perl script.pl /jobs/564759/sample.tsv
```

Where `/jobs/564759/sample.tsv` would contain:

```
attr1\tattr2\tattr3\tattr4
value1\tvalue2\tvalue3\tvalue4
value5\tvalue6\tvalue7\tvalue8
```
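
A Python sketch of the `write_objects()` layout, including the homogeneity requirement stated above. `write_objects` is hypothetical, and rejecting an empty array (which has no attributes to use as a header) is an assumption of this sketch rather than something the specification states.

```python
import os
import tempfile


def write_objects(objects: list, path: str) -> str:
    """Write a header row of attribute names, then one TSV row per object."""
    if not objects:
        # Assumption: an empty array has no header to write, so it is rejected.
        raise ValueError("cannot infer a header from an empty Array[Object]")
    names = list(objects[0])
    for index, obj in enumerate(objects):
        # Every object must have exactly the same set of attributes.
        if set(obj) != set(names):
            raise ValueError(f"object {index} has attributes {sorted(obj)}, expected {sorted(names)}")
    with open(path, "w") as f:
        f.write("\t".join(names) + "\n")
        for obj in objects:
            f.write("\t".join(str(obj[name]) for name in names) + "\n")
    return path


samples = [
    {"attr1": "value1", "attr2": "value2", "attr3": "value3", "attr4": "value4"},
    {"attr1": "value5", "attr2": "value6", "attr3": "value7", "attr4": "value8"},
]
path = write_objects(samples, os.path.join(tempfile.mkdtemp(), "sample.tsv"))
```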

##### Array[Object] serialization using write_json()

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

An `Array[Object]` can be serialized using `write_json()` into a JSON file:

```wdl
task test {
  Array[Object] sample
  command {
    perl script.pl ${write_json(sample)}
  }
}
```

If `sample` is provided as:

|Index|Attribute|Value  |
|-----|---------|-------|
|0    |attr1    |value1 |
|     |attr2    |value2 |
|     |attr3    |value3 |
|     |attr4    |value4 |
|1    |attr1    |value5 |
|     |attr2    |value6 |
|     |attr3    |value7 |
|     |attr4    |value8 |

Then, the resulting command line could look like:

```
perl script.pl /jobs/564759/sample.json
```

Where `/jobs/564759/sample.json` would contain:

```
[
  {
    "attr1": "value1",
    "attr2": "value2",
    "attr3": "value3",
    "attr4": "value4"
  },
  {
    "attr1": "value5",
    "attr2": "value6",
    "attr3": "value7",
    "attr4": "value8"
  }
]
```

## De-serialization of Task Outputs

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

A task's command can only output data as files. Therefore, every de-serialization function in WDL takes a file as input and returns a WDL type.

### Primitive Types

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

De-serialization of primitive types is done through a `read_*` function, such as `read_int("file/path")` or `read_string("file/path")`.

For example, consider a task that outputs an `Int` and a `String`:

```wdl
task output_example {
  String param1
  String param2
  command {
    python do_work.py ${param1} ${param2} --out1=int_file --out2=str_file
  }
  output {
    Int my_int = read_int("int_file")
    String my_str = read_string("str_file")
  }
}
```

The files `int_file` and `str_file` should each contain a single line holding the value. This value is then validated against the type of the output variable. If `int_file` contains a line with the text "foobar", the engine must fail this task with an error.
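
A minimal Python sketch of the validation described above. `read_int`, `read_string` and `read_single_line` are hypothetical helpers, and treating anything other than exactly one line as an error is an assumption. Note that Python's `int()` is slightly more lenient than a strict parser (it accepts surrounding whitespace, for instance).

```python
import os
import tempfile


def read_single_line(path: str) -> str:
    """Return the only line of a file, failing if there is not exactly one."""
    with open(path) as f:
        lines = f.read().splitlines()
    if len(lines) != 1:
        raise ValueError(f"{path}: expected exactly one line, found {len(lines)}")
    return lines[0]


def read_int(path: str) -> int:
    text = read_single_line(path)
    try:
        return int(text)
    except ValueError:
        # A non-integer value such as "foobar" must fail the task.
        raise ValueError(f"{path}: {text!r} is not a valid Int") from None


def read_string(path: str) -> str:
    return read_single_line(path)


job_dir = tempfile.mkdtemp()
for name, content in [("int_file", "42\n"), ("str_file", "hello\n"), ("bad_int_file", "foobar\n")]:
    with open(os.path.join(job_dir, name), "w") as f:
        f.write(content)

my_int = read_int(os.path.join(job_dir, "int_file"))
my_str = read_string(os.path.join(job_dir, "str_file"))

try:
    read_int(os.path.join(job_dir, "bad_int_file"))
    bad_int_rejected = False
except ValueError:
    bad_int_rejected = True
```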

### Compound Types

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Tasks can also output an `Array`, `Map`, or `Object` data structure to a file or to stdout/stderr in two major formats:

* JSON - because it fits naturally with the types within WDL
* Text-based / TSV - usually simple tabular or line-based encodings (e.g. an `Array[String]` could be serialized by having each element be a line in a file)

#### Array deserialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Arrays are deserialized from:

* Files that contain a JSON Array as their top-level element.
* Any file where it is desirable to interpret each line as an element of the `Array`.

##### Array deserialization using read_lines()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`read_lines()` will return an `Array[String]` where each element in the array is a line in the file.

This return value can be automatically converted to other `Array` types. For example:

```wdl
task test {
  command <<<
    python <<CODE
    import random
    for i in range(10):
      print(random.randrange(10))
    CODE
  >>>
  output {
    Array[Int] my_ints = read_lines(stdout())
  }
}
```

`my_ints` would contain ten random integers ranging from 0 to 9.
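
The following Python sketch mirrors the example: it simulates the task's stdout, splits it into lines, and coerces each `String` element to an `Int`. The helper names are hypothetical. It also shows why the values fall between 0 and 9: `random.randrange(10)` excludes its upper bound.

```python
import os
import random
import tempfile


def read_lines(path: str) -> list:
    """Return an Array[String]-like list with one element per line of the file."""
    with open(path) as f:
        return f.read().splitlines()


def to_int_array(values: list) -> list:
    """Coerce each String element to an Int, failing on any non-integer line."""
    return [int(value) for value in values]


# Simulate the task's stdout: ten values from random.randrange(10), one per line.
stdout_path = os.path.join(tempfile.mkdtemp(), "stdout")
with open(stdout_path, "w") as f:
    for _ in range(10):
        print(random.randrange(10), file=f)

my_ints = to_int_array(read_lines(stdout_path))
```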

##### Array deserialization using read_json()

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

`read_json()` will return whatever data type resides in that JSON file.

```wdl
task test {
  command <<<
    echo '["foo", "bar"]'
  >>>
  output {
    Array[String] my_array = read_json(stdout())
  }
}
```

This task would assign the array with elements `"foo"` and `"bar"` to `my_array`.

If the echo statement were instead `echo '{"foo": "bar"}'`, the engine MUST fail the task for a type mismatch.
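
A Python sketch of the type check implied above, assuming the declared output type is `Array[String]`. For brevity it parses the stdout text directly rather than a file path, and `read_json_string_array` is a hypothetical helper, not an engine API.

```python
import json


def read_json_string_array(text: str) -> list:
    """Parse JSON and check that it matches the declared type Array[String]."""
    value = json.loads(text)
    if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
        raise TypeError(f"type mismatch: expected Array[String], got {type(value).__name__}")
    return value


my_array = read_json_string_array('["foo", "bar"]')

# A JSON object where an array is declared must be rejected.
try:
    read_json_string_array('{"foo": "bar"}')
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```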

#### Map deserialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Maps are deserialized from:

* Files that contain a JSON Object as their top-level element.
* Two-column TSV files.

##### Map deserialization using read_map()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`read_map()` will return a `Map[String, String]` where the keys are the first column in the TSV input file and the corresponding values are the second column.

This return value can be automatically converted to other `Map` types. For example:

```wdl
task test {
  command <<<
    python <<CODE
    for i in range(3):
      print("key_{idx}\t{idx}".format(idx=i))
    CODE
  >>>
  output {
    Map[String, Int] my_ints = read_map(stdout())
  }
}
```

This would put a map containing three keys (`key_0`, `key_1`, and `key_2`) and three respective values (`0`, `1`, and `2`) as the value of `my_ints`.
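
A Python sketch of `read_map()` followed by the `Map[String, Int]` coercion, under the assumption that any row without exactly two tab-separated columns is an error. `read_map` here is a hypothetical stand-in for the engine's implementation.

```python
import os
import tempfile


def read_map(path: str) -> dict:
    """Parse a two-column TSV file into a Map[String, String]-like dict."""
    result = {}
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            columns = line.rstrip("\n").split("\t")
            if len(columns) != 2:
                raise ValueError(f"line {line_number}: expected 2 columns, found {len(columns)}")
            key, value = columns
            result[key] = value
    return result


# Reproduce the task's stdout: key_0<TAB>0, key_1<TAB>1, key_2<TAB>2.
stdout_path = os.path.join(tempfile.mkdtemp(), "stdout")
with open(stdout_path, "w") as f:
    for i in range(3):
        print("key_{idx}\t{idx}".format(idx=i), file=f)

# Coerce the String values to Int, as declaring Map[String, Int] would.
my_ints = {key: int(value) for key, value in read_map(stdout_path).items()}
```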

##### Map deserialization using read_json()

:pig2: Coming soon in [Cromwell](https://github.com/broadinstitute/cromwell)

`read_json()` will return whatever data type resides in that JSON file. If that file contains a JSON object with homogeneous key/value pair types (e.g. `string -> int` pairs), then the `read_json()` function would return a `Map`.

```wdl
task test {
  command <<<
    echo '{"foo":"bar"}'
  >>>
  output {
    Map[String, String] my_map = read_json(stdout())
  }
}
```

This task would assign the one key-value pair map in the echo statement to `my_map`.

If the echo statement were instead `echo '["foo", "bar"]'`, the engine MUST fail the task for a type mismatch.

#### Object deserialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

Objects are deserialized from two-row, n-column TSV files. The first row contains the object attribute names, and the corresponding entries in the second row are the values.

##### Object deserialization using read_object()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`read_object()` will return an `Object` where the keys are the first row in the TSV input file and the corresponding values are the second row (same column).

```wdl
task test {
  command <<<
    python <<CODE
    print('\t'.join(["key_{}".format(i) for i in range(3)]))
    print('\t'.join(["value_{}".format(i) for i in range(3)]))
    CODE
  >>>
  output {
    Object my_obj = read_object(stdout())
  }
}
```

This would put an object containing three attributes (`key_0`, `key_1`, and `key_2`) and three respective values (`value_0`, `value_1`, and `value_2`) as the value of `my_obj`.
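
A Python sketch of the two-row parsing rule. `read_object` is a hypothetical helper, and rejecting files whose rows have mismatched column counts is an assumption.

```python
import os
import tempfile


def read_object(path: str) -> dict:
    """Parse a two-row TSV: attribute names on row 1, values on row 2."""
    with open(path) as f:
        rows = [line.rstrip("\n").split("\t") for line in f]
    if len(rows) != 2 or len(rows[0]) != len(rows[1]):
        raise ValueError(f"{path}: expected two rows with the same number of columns")
    # Pair each attribute name with the value in the same column.
    return dict(zip(rows[0], rows[1]))


# Reproduce the task's stdout from the example above.
stdout_path = os.path.join(tempfile.mkdtemp(), "stdout")
with open(stdout_path, "w") as f:
    print("\t".join("key_{}".format(i) for i in range(3)), file=f)
    print("\t".join("value_{}".format(i) for i in range(3)), file=f)

my_obj = read_object(stdout_path)
```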

#### Array[Object] deserialization

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`Array[Object]` MUST assume that all objects in the array are homogeneous (they have the same attributes, but the attributes don't have to have the same values).

An `Array[Object]` is deserialized from a uniform n-column TSV file that contains at least 2 rows. The first row contains the object attribute names, and each subsequent row contains the values for one object.

##### Object deserialization using read_objects()

:pig2: [Cromwell supported](https://github.com/broadinstitute/cromwell#wdl-support) :white_check_mark:

`read_objects()` will return an `Array[Object]` where the keys are the first row in the TSV input file and each subsequent row supplies the corresponding values (same column) for one object.

```wdl
task test {
  command <<<
    python <<CODE
    print('\t'.join(["key_{}".format(i) for i in range(3)]))
    print('\t'.join(["value_{}".format(i) for i in range(3)]))
    print('\t'.join(["value_{}".format(i) for i in range(3)]))
    print('\t'.join(["value_{}".format(i) for i in range(3)]))
    CODE
  >>>
  output {
    Array[Object] my_obj = read_objects(stdout())
  }
}
```

This would create an array of **three identical** `Object`s, each containing three attributes (`key_0`, `key_1`, and `key_2`) with three respective values (`value_0`, `value_1`, and `value_2`), as the value of `my_obj`.
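
A Python sketch of `read_objects()`, assuming every value row must have as many columns as the header row. The helper is hypothetical; the demo reproduces the example's stdout and yields three identical objects.

```python
import os
import tempfile


def read_objects(path: str) -> list:
    """Parse a header row plus one or more value rows into a list of dicts."""
    with open(path) as f:
        rows = [line.rstrip("\n").split("\t") for line in f]
    if len(rows) < 2:
        raise ValueError(f"{path}: expected a header row and at least one value row")
    header = rows[0]
    objects = []
    for row_number, row in enumerate(rows[1:], start=2):
        # Every object must have the same attributes as the header.
        if len(row) != len(header):
            raise ValueError(f"{path}: row {row_number} has {len(row)} columns, expected {len(header)}")
        objects.append(dict(zip(header, row)))
    return objects


# Reproduce the task's stdout: one header row and three identical value rows.
stdout_path = os.path.join(tempfile.mkdtemp(), "stdout")
with open(stdout_path, "w") as f:
    print("\t".join("key_{}".format(i) for i in range(3)), file=f)
    for _ in range(3):
        print("\t".join("value_{}".format(i) for i in range(3)), file=f)

my_obj = read_objects(stdout_path)
```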