You are very unlikely to be in the right place. If you're looking for a Python WDL parser, see the WDL repo.
This repository is deprecated. Its intention was to provide a Python object model on top of parsed WDL. It is out of date, but we're leaving it here in case someone wants an example of how to do such a thing. If you'd like to pick up the torch, please let us know.
NOTE AGAIN: If you're reading below this, you're almost certainly in the wrong place!
A Python implementation of a WDL parser and language bindings.
For Scala language bindings, use WDL4S.
PyWDL works with Python 2 or Python 3. Install via setup.py:
$ python setup.py install
Or via pip:
$ pip install wdl
The main wdl package provides an interface to turn WDL source code into native Python objects. This means that a workflow {} block in WDL becomes a Workflow object in Python and a task {} block becomes a Task object.
To parse WDL source code into a WdlDocument object, import the wdl package and load a WDL string with wdl.loads("wdl code") or WDL from a file-like object using wdl.load(fp, resource_name).
For example:
import wdl
import wdl.values
wdl_code = """
task my_task {
  File file
  command {
    ./my_binary --input=${file} > results
  }
  output {
    File results = "results"
  }
}
workflow my_wf {
  call my_task
}
"""
# Use the language bindings to parse WDL into Python objects
wdl_namespace = wdl.loads(wdl_code)
for workflow in wdl_namespace.workflows:
    print('Workflow "{}":'.format(workflow.name))
    for call in workflow.calls():
        print('  Call: {} (task {})'.format(call.name, call.task.name))
for task in wdl_namespace.tasks:
    name = task.name
    abstract_command = task.command
    # Supply a value for each variable referenced by the command
    def lookup(name):
        if name == 'file': return wdl.values.WdlFile('/path/to/file.txt')
    instantiated_command = task.command.instantiate(lookup)
    print('Task "{}":'.format(name))
    print('  Abstract Command: {}'.format(abstract_command))
    print('  Instantiated Command: {}'.format(instantiated_command))
Using the language bindings as shown above is the recommended way to use PyWDL. One can also access the parser directly to parse WDL source code into an abstract syntax tree using the wdl.parser package:
import wdl.parser
wdl_code = """
task my_task {
  File file
  command {
    ./my_binary --input=${file} > results
  }
  output {
    File results = "results"
  }
}
workflow my_wf {
  call my_task
}
"""
# Parse source code into abstract syntax tree
ast = wdl.parser.parse(wdl_code).ast()
# Print out abstract syntax tree
print(ast.dumps(indent=2))
# Access the first task definition, print out its name
print(ast.attr('definitions')[0].attr('name').source_string)
# Find all 'Task' ASTs
task_asts = wdl.find_asts(ast, 'Task')
for task_ast in task_asts:
    print(task_ast.dumps(indent=2))
# Find all 'Workflow' ASTs
workflow_asts = wdl.find_asts(ast, 'Workflow')
for workflow_ast in workflow_asts:
    print(workflow_ast.dumps(indent=2))
An AST is the output of the parsing algorithm. It is a tree structure in which the root node is always a Document AST.
The best way to get started working with ASTs is to visualize them by using the wdl parse subcommand to see the AST as text. For example, consider the following WDL file, example.wdl:
task a {
  command {./foo_bin}
}
task b {
  command {./bar_bin}
}
task c {
  command {./baz_bin}
}
workflow w {}
Then, use the command line to parse and output the AST:
$ wdl parse example.wdl
(Document:
  imports=[],
  definitions=[
    (Task:
      name=<string:1:6 identifier "YQ==">,
      declarations=[],
      sections=[
        (RawCommand:
          parts=[
            <string:2:12 cmd_part "Li9mb29fYmlu">
          ]
        )
      ]
    ),
    (Task:
      name=<string:4:6 identifier "Yg==">,
      declarations=[],
      sections=[
        (RawCommand:
          parts=[
            <string:5:12 cmd_part "Li9iYXJfYmlu">
          ]
        )
      ]
    ),
    (Task:
      name=<string:7:6 identifier "Yw==">,
      declarations=[],
      sections=[
        (RawCommand:
          parts=[
            <string:8:12 cmd_part "Li9iYXpfYmlu">
          ]
        )
      ]
    ),
    (Workflow:
      name=<string:10:10 identifier "dw==">,
      body=[]
    )
  ]
)
To traverse this AST programmatically and pull out data:
import wdl.parser
import wdl
with open('example.wdl') as fp:
    ast = wdl.parser.parse(fp.read()).ast()
task_a = ast.attr('definitions')[0]
task_b = ast.attr('definitions')[1]
task_c = ast.attr('definitions')[2]
# Find the command section of task a
for section in task_a.attr('sections'):
    if section.name == 'RawCommand':
        task_a_command = section
# Terminals are literal command text; anything else is a command parameter
for part in task_a_command.attr('parts'):
    if isinstance(part, wdl.parser.Terminal):
        print('command string: ' + part.source_string)
    else:
        print('command parameter: ' + part.dumps())
The Ast class is a syntax tree with a name and child nodes.
Attributes:
name - a string that refers to the type of AST (e.g. Workflow, Task, Document, RawCommand).
attributes - a dictionary where the keys are attribute names and each value is one of three types: Ast, AstList, or Terminal.
Methods:
def attr(self, name) - ast.attr('name') is the same as ast.attributes['name'].
def dumps(self, indent=None, b64_source=True) - returns a string representation of this Ast. The indent parameter takes an integer for the indent level; omitting it produces a string with no newlines. b64_source is passed on to recursive invocations of dumps.
The wdl.parser.Terminal object represents a literal piece of the original source code. Terminals always appear as leaf nodes of Ast objects.
Attributes:
source_string - String segment from the source code.
line - Line number where source_string was in the source code.
col - Column number where source_string was in the source code.
resource - Name of the location of the source code. Usually a file system path or perhaps a URI.
id - Numeric identifier, unique to the top-level Ast. Used mostly internally.
str - String identifier of this terminal. Used mostly internally.
Methods:
def dumps(self, b64_source=True, **kwargs) - returns a string representation of this terminal. When b64_source is true, the source code is base64 encoded, because the source sometimes contains newlines or special characters that make a string-ified AST difficult to read.
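The quoted values in the AST dump of example.wdl shown earlier are base64-encoded source strings. As a small illustration, the sketch below decodes a few of them with Python's standard base64 module. The helper name decode_terminal_source is made up for this example and is not part of PyWDL.

```python
import base64

def decode_terminal_source(encoded):
    """Decode the base64 payload shown inside a dumped terminal.

    Hypothetical helper for illustration; not part of the wdl package.
    """
    return base64.b64decode(encoded).decode('utf-8')

# Values copied from the example.wdl AST dump
encoded_values = ['YQ==', 'Li9mb29fYmlu', 'dw==']
decoded_values = [decode_terminal_source(value) for value in encoded_values]
print(decoded_values)  # ['a', './foo_bin', 'w']
```

The decoded strings line up with the task name, command text, and workflow name in example.wdl.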
class AstList(list) represents a sequence of Ast, AstList, and Terminal objects.
Methods:
def dumps(self, indent=None, b64_source=True) - returns a string representation of this AstList. The indent parameter takes an integer for the indent level; omitting it produces a string with no newlines. b64_source is passed on to recursive invocations of dumps.
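To make the relationship between the three node types concrete, here is a rough sketch built on simplified stand-in classes. These are not the real wdl.parser classes: they only mimic the name, attributes, attr() and source_string members described above, and collect_asts is a hypothetical helper showing one way a recursive search by AST name could work.

```python
class Terminal:
    """Stand-in leaf node holding a piece of source text."""
    def __init__(self, source_string):
        self.source_string = source_string

class Ast:
    """Stand-in tree node with a name and a dict of attributes."""
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes

    def attr(self, name):
        # Same as self.attributes[name], as described above
        return self.attributes[name]

class AstList(list):
    """Stand-in sequence of Ast, AstList, and Terminal objects."""

def collect_asts(node, name):
    """Recursively gather every Ast called `name` at or beneath `node`."""
    found = []
    if isinstance(node, Ast):
        if node.name == name:
            found.append(node)
        for child in node.attributes.values():
            found.extend(collect_asts(child, name))
    elif isinstance(node, AstList):
        for child in node:
            found.extend(collect_asts(child, name))
    return found  # Terminals are leaves and contribute nothing

# Hand-built tree shaped like the parse of example.wdl, trimmed to two tasks
document = Ast('Document', {
    'imports': AstList(),
    'definitions': AstList([
        Ast('Task', {'name': Terminal('a'), 'declarations': AstList(), 'sections': AstList()}),
        Ast('Task', {'name': Terminal('b'), 'declarations': AstList(), 'sections': AstList()}),
        Ast('Workflow', {'name': Terminal('w'), 'body': AstList()}),
    ]),
})

task_names = [task.attr('name').source_string for task in collect_asts(document, 'Task')]
print(task_names)  # ['a', 'b']
```

Because attribute values are always an Ast, an AstList, or a Terminal, the walker needs only those three cases.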
Parsing a WDL file will result in unevaluated expressions. For example:
workflow test {
  Int a = (1 + 2) * 3
  call my_task {
    input: var=a*2, var2="file"+".txt"
  }
}
This workflow definition has three expressions in it: (1 + 2) * 3, a*2, and "file"+".txt".
Expressions are stored in wdl.binding.Expression objects, each of which holds the AST for its expression.
Expressions can be evaluated with the eval() method on the Expression class.
import wdl
# Manually parse expression into wdl.binding.Expression
expression = wdl.parse_expr("(1 + 2) * 3")
# Evaluate the expression.
# Returns a WdlValue, specifically a WdlInteger(9)
evaluated = expression.eval()
# Get the Python value
print(evaluated.value)
Sometimes expressions contain references to variables or functions. For these to be resolved, pass a lookup function and an implementation of the functions that you want to support:
import wdl
from wdl.values import WdlInteger, WdlUndefined
# NOTE: EvalException must also be imported from the wdl package before
# get_function() can raise it; this snippet does not show where it lives.
def test_lookup(identifier):
    if identifier == 'var':
        return WdlInteger(4)
    else:
        return WdlUndefined
def test_functions():
    def add_one(parameters):
        # assume at least one parameter exists, for simplicity
        return WdlInteger(parameters[0].value + 1)
    def get_function(name):
        if name == 'add_one': return add_one
        else: raise EvalException("Function {} not defined".format(name))
    return get_function
# WdlInteger(12)
print(wdl.parse_expr("var * 3").eval(test_lookup))
# WdlInteger(8)
print(wdl.parse_expr("var + var").eval(test_lookup))
# WdlInteger(9)
print(wdl.parse_expr("add_one(var + var)").eval(test_lookup, test_functions()))
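The callback pattern used above (one function that resolves identifiers, another that resolves function names) can be tried without PyWDL installed. The following is a toy sketch only: it uses Python's built-in ast module to parse a tiny arithmetic subset and plain Python integers instead of WdlValue objects. toy_eval, lookup, and get_function are hypothetical names, and nothing here reflects how PyWDL implements eval() internally.

```python
import ast
import operator

# Binary operators supported by this toy evaluator
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def toy_eval(source, lookup=None, functions=None):
    """Evaluate a tiny arithmetic expression, resolving names via callbacks."""
    def visit(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if lookup is None:
                raise NameError(node.id)
            return lookup(node.id)  # variable reference
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if functions is None:
                raise NameError(node.func.id)
            func = functions(node.func.id)  # function reference
            return func([visit(arg) for arg in node.args])
        raise ValueError("unsupported expression: " + ast.dump(node))
    return visit(ast.parse(source, mode="eval").body)

def lookup(identifier):
    # Only one variable is defined in this sketch
    return {"var": 4}[identifier]

def get_function(name):
    if name == "add_one":
        # Functions receive their evaluated arguments as a list
        return lambda parameters: parameters[0] + 1
    raise NameError("Function {} not defined".format(name))

results = [
    toy_eval("(1 + 2) * 3"),
    toy_eval("var * 3", lookup),
    toy_eval("add_one(var + var)", lookup, get_function),
]
print(results)  # [9, 12, 9]
```

Keeping variable and function resolution outside the evaluator lets the caller decide what each name means, which is the same division of responsibility the lookup and functions arguments provide.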
$ wdl --help
usage: wdl [-h] [--version] [--debug] [--no-color] {run,parse} ...
Workflow Description Language (WDL)
positional arguments:
  {run,parse}  WDL Actions
    run        Run you a WDL
    parse      Parse a WDL file, print parse tree
optional arguments:
  -h, --help   show this help message and exit
  --version    show program's version number and exit
  --debug      Open the floodgates
  --no-color   Don't colorize output
Parse a WDL file:
$ wdl parse examples/ex2.wdl
(Document:
  definitions=[
    (Task:
      name=<ex2.wdl:1:6 identifier "c2NhdHRlcl90YXNr">,
      declarations=[],
      sections=[
        (RawCommand:
          ...
A WDL file can be converted to the DOT format so that the pipeline can be visualized as a graph. For example:
$ wdl2dot -i hello.wdl -o hello.dot
Then open it in the interactive renderer xdot, or save it to an image with dot:
$ xdot hello.dot
$ dot -Tsvg hello.dot -o hello.svg
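For readers unfamiliar with the DOT format, the sketch below writes a minimal digraph by hand. It is only an illustration of what a DOT file looks like: the to_dot helper and the three call names are invented for this example, and the output is not claimed to match what wdl2dot produces.

```python
def to_dot(graph_name, edges):
    """Render (upstream, downstream) call pairs as DOT digraph text."""
    lines = ["digraph {} {{".format(graph_name)]
    for upstream, downstream in edges:
        lines.append('  "{}" -> "{}";'.format(upstream, downstream))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical three-step pipeline, not derived from a real WDL file
dot_text = to_dot("my_wf", [("prepare", "analyze"), ("analyze", "report")])
print(dot_text)
```

Saving the printed text to a .dot file gives something that the xdot and dot commands above can render.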