PMCC-BioinformaticsCore/python-wdlgen

python-wdlgen


Workflow Description Language (WDL) is a way to describe tasks and workflows in a "human readable and writable way". It was initially developed by the Broad Institute to be paired with their workflow engine Cromwell; it has since been made open source and is supported by other engines such as Toil and DNAnexus*.

WARNING

This module now only generates developmental WDL. This includes Directories and wrapping all inputs in an input block. To use the generated WDL, you must use a version of Cromwell higher than 37.

This module automatically includes version development in the Workflow and Task outputs. The guides below may not reflect the current version of this repository, but will be updated soon.

This syntax is based on the Developmental Workflow Description Language specification.


Motivation

I needed an easy way to generate some basic WDL from in-memory objects. I was already using (a fork of) common-workflow-language/python-cwlgen, so I figured I could open this up to see what use it has.

Installation

pip install illusional.wdlgen

General support

This software is provided as-is, without warranty of any kind ... and so on.

It's a pretty dumb wrapper that uses string interpolation to generate the structure. It does not automatically escape illegal characters.

Generally it supports:

  • Types - All types are represented as a WdlType, which can either be a PrimitiveType, or an ArrayType (see goal). Also supports the postfix quantifiers.

  • Workflow creation (wdlgen.Workflow)

    • manual imports (wdlgen.Workflow.WorkflowImport)
    • inputs (wdlgen.Input)
    • outputs (wdlgen.Output)
    • calls:
      • general call (wdlgen.WorkflowCall)
      • scatter (wdlgen.WorkflowScatter(WorkflowCall[]))
    • meta: wdlgen.Meta
    • parameter_meta: wdlgen.ParameterMeta
  • Task creation (wdlgen.Task) - This is modelled on how CWL constructs its commands.

    • inputs: wdlgen.Input
    • outputs: wdlgen.Output
    • runtime: wdlgen.Task.Runtime
    • command: wdlgen.Task.Command
      • arguments: wdlgen.Task.Command.Argument
      • inputs: wdlgen.Task.Command.CommandInput
    • meta: wdlgen.Meta
    • parameter_meta: wdlgen.ParameterMeta

How to use

This will give you a brief overview of how to use python-wdlgen. One goal is to write proper documentation, but if you have a moderate understanding of workflows in either CWL or WDL, this code will hopefully be fairly intuitive.

Every class inherits from a WDLBase which means it must have a get_string() method which returns the string representation of the class, calling this on any children it may have.
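The pattern described above can be sketched without the library. This is a minimal illustration only: Node, Declaration and InputBlock are hypothetical names, not wdlgen classes. It shows a parent producing its text by calling get_string() on each child.

```python
from abc import ABC, abstractmethod


class Node(ABC):
    """Hypothetical stand-in for the library's base class."""

    @abstractmethod
    def get_string(self) -> str:
        """Return the WDL text for this node."""


class Declaration(Node):
    def __init__(self, type_name: str, name: str):
        self.type_name = type_name
        self.name = name

    def get_string(self) -> str:
        return f"{self.type_name} {self.name}"


class InputBlock(Node):
    def __init__(self, declarations: list):
        self.declarations = declarations

    def get_string(self) -> str:
        # The parent renders itself by calling get_string() on each child.
        body = "\n".join("  " + d.get_string() for d in self.declarations)
        return "input {\n" + body + "\n}"


block = InputBlock([Declaration("String", "taskGreeting"), Declaration("Int", "count")])
print(block.get_string())
```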

Types

All types are represented as a WdlType, which has a parse method. It's a little overkill in some cases, but makes managing attributes a bit easier.

import wdlgen

parsed_string = wdlgen.WdlType.parse("String")   # WdlType<PrimitiveType<String>>
parsed_op_str = wdlgen.WdlType.parse("String?")  # WdlType<PrimitiveType<String>>, optional
parsed_array = wdlgen.WdlType.parse("File[]")    # WdlType<ArrayType<File>>
parsed_ar_oq = wdlgen.WdlType.parse("Int?[]+")   # WdlType<ArrayType<Int?> (+)>

You can also construct these manually:

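# Assumes these names are exported at the top level of the package.
from wdlgen import WdlType, PrimitiveType, ArrayType

parsed_string = WdlType(PrimitiveType("String"))
# optional is an argument of WdlType, as in the array example below
parsed_op_str = WdlType(PrimitiveType("String"), optional=True)
parsed_array = WdlType(ArrayType(WdlType(PrimitiveType("File"))))
parsed_ar_q = WdlType(ArrayType(WdlType(PrimitiveType("Int"), optional=True), requires_multiple=True))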
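To make the postfix quantifiers concrete, here is a standalone sketch of how shorthand such as "Int?[]+" could be parsed by peeling quantifiers off the right-hand side. It does not use wdlgen and is not its implementation: Primitive, Array, Type, parse_type and render are hypothetical names, and rendering to WDL's Array[...] syntax is an assumption about what the final text should look like.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Primitive:
    name: str


@dataclass
class Array:
    subtype: "Type"
    requires_multiple: bool = False


@dataclass
class Type:
    inner: Union[Primitive, Array]
    optional: bool = False


def parse_type(text: str) -> Type:
    """Parse shorthand such as 'Int?[]+' by peeling quantifiers off the right."""
    text = text.strip()
    optional = text.endswith("?")
    if optional:
        text = text[:-1]
    requires_multiple = text.endswith("+")
    if requires_multiple:
        text = text[:-1]
        if not text.endswith("[]"):
            raise ValueError("'+' is only valid directly after '[]'")
    if text.endswith("[]"):
        # Everything left of '[]' is the element type, parsed recursively.
        return Type(Array(parse_type(text[:-2]), requires_multiple), optional)
    if not text:
        raise ValueError("missing type name")
    return Type(Primitive(text), optional)


def render(t: Type) -> str:
    """Render using WDL's Array[...] syntax."""
    if isinstance(t.inner, Array):
        base = f"Array[{render(t.inner.subtype)}]"
        if t.inner.requires_multiple:
            base += "+"
    else:
        base = t.inner.name
    return base + ("?" if t.optional else "")


for shorthand in ["String", "String?", "File[]", "Int?[]+"]:
    print(shorthand, "->", render(parse_type(shorthand)))
```

Working from the right keeps the parser short because every quantifier in this shorthand is a suffix of the type it modifies.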

Input / Output

Input: wdlgen.Input(data_type: WdlType, name: str, expression: str = None)

Output: wdlgen.Output(data_type: WdlType, name: str, expression: str = None)

Both render as something like: {WdlType} {name} [= {expression}]
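The template above amounts to a small formatting rule. A minimal sketch, assuming the type has already been rendered to a string; render_declaration is a hypothetical helper, not part of wdlgen:

```python
from typing import Optional


def render_declaration(type_str: str, name: str, expression: Optional[str] = None) -> str:
    """Render '{type} {name}' and append '= {expression}' only when one is given."""
    line = f"{type_str} {name}"
    if expression is not None:
        line += f" = {expression}"
    return line


print(render_declaration("String", "taskGreeting"))
print(render_declaration("File", "standardOut", "stdout()"))
```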

Task

A task is a collection of Inputs, Outputs and a Command that are identified by a name. Inputs and Outputs are as above. Note that you can use functions such as stdout() or others for the expression.

If you don't want to play by these rules, don't include any inputs or outputs and just provide your whole string to the initializer for command.

import wdlgen

t = wdlgen.Task("task_name")
t.inputs.append(wdlgen.Input(wdlgen.WdlType.parse("String"), "taskGreeting"))
# command in next section
t.outputs.append(wdlgen.Output(wdlgen.WdlType.parse("File"), "standardOut", "stdout()"))

Command

The command is broken up similarly to how CWL breaks up its command generation: it has a base command plus a list of components. Each component has a corresponding input (otherwise use the wdlgen.Task.Command.Argument class), optionality, position, prefix (and whether the value should be separated from the prefix; think -o {val} vs outputDir={val}) and potentially a default.

Construct a command like the following:

command = wdlgen.Task.Command("echo")
command.inputs.append(wdlgen.Task.Command.CommandInput("taskGreeting", optional=False, position=None, prefix="-a", separate_value_from_prefix=True, default=None))
command.inputs.append(wdlgen.Task.Command.CommandInput("otherInput", optional=True, position=2, prefix="optional-param=", separate_value_from_prefix=False, default=None))

# t is the task
t.command = command
print(command.get_string())

This will result in the following WDL command:

echo \
  -a ${taskGreeting} \
  ${"optional-param=" + otherInput}
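How the attributes combine into that text can be sketched independently of the library. CommandPart, render_part and render_command are hypothetical names, and treating a missing position as 0 is an assumption inferred from the ordering in the example above, not a documented rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommandPart:
    """Hypothetical stand-in for a command input; not the library's class."""
    name: str
    optional: bool = False
    position: Optional[int] = None
    prefix: Optional[str] = None
    separate_value_from_prefix: bool = True


def render_part(part: CommandPart) -> str:
    if part.prefix is None:
        return "${" + part.name + "}"
    sep = " " if part.separate_value_from_prefix else ""
    if part.optional:
        # Keep the prefix inside the placeholder so that both vanish
        # together when the optional value is not supplied.
        return '${"' + part.prefix + sep + '" + ' + part.name + "}"
    return part.prefix + sep + "${" + part.name + "}"


def render_command(base: str, parts: List[CommandPart]) -> str:
    # Assumption: parts without a position sort as position 0; sorted() is
    # stable, so ties keep their insertion order.
    ordered = sorted(parts, key=lambda p: p.position or 0)
    lines = [base] + ["  " + render_part(p) for p in ordered]
    return " \\\n".join(lines)


parts = [
    CommandPart("taskGreeting", prefix="-a"),
    CommandPart("otherInput", optional=True, position=2,
                prefix="optional-param=", separate_value_from_prefix=False),
]
print(render_command("echo", parts))
```

Putting the prefix inside the placeholder for optional inputs relies on WDL evaluating a placeholder to an empty string when an optional operand of + is undefined.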

Task output:

The combination of the task and command outputs:

version development

task task_name {
  input {
    String taskGreeting
  }
  command {
    echo \
      -a ${taskGreeting} \
      ${"optional-param=" + otherInput}
  }

  output {
    File standardOut = stdout()
  }
}

Workflow

You should have a moderate idea of the structure of WDL, as there's no cleverness or abstraction done anywhere. Beware: there's also no attribute checking (to see if your inputMap actually corresponds to inputs).

The structure of a workflow is m

w = wdlgen.Workflow("workflow_name")

w.imports.append(wdlgen.Workflow.WorkflowImport("tool_file", ""))
w.inputs.append(
    wdlgen.Input(
        wdlgen.WdlType.parse("String"), 
        "inputGreeting"
    )
)


inputs_map = {"taskGreeting": "inputGreeting"}
w.calls.append(wdlgen.WorkflowCall("Q.namspaced_task_identifier", "task_alias", inputs_map))
w.outputs.append(wdlgen.Output(wdlgen.WdlType.parse("File"), "standardOut", "task_alias.standardOut"))

Which outputs:

version development

import "tools/tool_file.wdl"

workflow workflow_name {
  input { 
    String inputGreeting
  }
  call Q.namspaced_task_identifier as task_alias {
    input:
      taskGreeting=inputGreeting
  }
  output {
    File standardOut = task_alias.standardOut
  }
}
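The call block and the missing input check can both be sketched in a few lines. This is an illustration under assumptions rather than the library's code: render_call and unknown_inputs are hypothetical helpers, and the list of declared task inputs has to be supplied by the caller.

```python
from typing import Dict, Iterable, List


def render_call(task: str, alias: str, inputs_map: Dict[str, str]) -> str:
    """Render a WDL call block; keys are task inputs, values are workflow expressions."""
    lines = [f"call {task} as {alias} {{"]
    if inputs_map:
        lines.append("  input:")
        pairs = [f"    {key}={value}" for key, value in inputs_map.items()]
        # WDL separates multiple call inputs with commas.
        lines.append(",\n".join(pairs))
    lines.append("}")
    return "\n".join(lines)


def unknown_inputs(inputs_map: Dict[str, str], task_inputs: Iterable[str]) -> List[str]:
    """Return the keys of inputs_map that the task does not declare."""
    declared = set(task_inputs)
    return [key for key in inputs_map if key not in declared]


inputs_map = {"taskGreeting": "inputGreeting"}
print(render_call("Q.namspaced_task_identifier", "task_alias", inputs_map))
print(unknown_inputs({"taskGreeting": "a", "taskGreting": "b"}, ["taskGreeting"]))
```

Running a check like unknown_inputs before generating the workflow would catch a misspelled key that would otherwise only surface when the engine validates the WDL.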

Known limitations

I'm not a fan of the string interpolation that this module uses to generate WDL. I think a better approach would be to build an abstract syntax tree and then have a separate step that converts it into the DSL that WDL uses.
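As a rough idea of what that separation could look like, the sketch below keeps the tree as plain data and puts all text generation in one dispatching function. Decl, TaskNode and to_wdl are hypothetical names chosen for illustration; nothing here exists in wdlgen today.

```python
from dataclasses import dataclass, field
from functools import singledispatch
from typing import List


@dataclass
class Decl:
    type_name: str
    name: str


@dataclass
class TaskNode:
    name: str
    inputs: List[Decl] = field(default_factory=list)


@singledispatch
def to_wdl(node) -> str:
    # Fallback for node types that have no registered renderer.
    raise TypeError(f"no renderer for {type(node).__name__}")


@to_wdl.register
def _(node: Decl) -> str:
    return f"{node.type_name} {node.name}"


@to_wdl.register
def _(node: TaskNode) -> str:
    decls = "\n".join("    " + to_wdl(d) for d in node.inputs)
    return f"task {node.name} {{\n  input {{\n{decls}\n  }}\n}}"


tree = TaskNode("task_name", [Decl("String", "taskGreeting")])
print(to_wdl(tree))
```

Because the nodes hold no formatting logic, the same tree could be validated or rendered for a different WDL version by swapping the renderer.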

You could also cause syntax errors in generated WDL by providing illegal characters.
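One way to guard against this before generating anything is to validate names up front. The sketch below is not part of wdlgen: check_identifier is a hypothetical helper, and the pattern (a letter followed by letters, digits or underscores) is my reading of the WDL identifier rule, so verify it against the specification version you target. It does not reject reserved keywords.

```python
import re

# Assumption: identifiers start with a letter and continue with letters,
# digits or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_identifier(name: str) -> str:
    """Return name unchanged, or raise ValueError if it is not a plain identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a valid WDL identifier: {name!r}")
    return name


print(check_identifier("taskGreeting"))
for bad in ["task greeting", "1st", "out-dir"]:
    try:
        check_identifier(bad)
    except ValueError as err:
        print(err)
```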

Goals

  • Improve code-level documentation.
  • Increase the testing coverage + quality of unit tests.
  • Better represent the WDL spec.
  • Find an easier distribution / release method - such as PIP.
  • Automate testing and delivery through TravisCI / CircleCI or similar.
  • Validate each value by WDL's language specifications.
  • Add support for structs

Long goals

  • Write a documentation site.
  • Make classes convert into AST and then into DSL.

Issues and Pull Requests

Feel free to log issues and make pull requests. I make no guarantee as to the existence or timeliness of replies.
