Editing .dax templates

whelks-chance edited this page Sep 27, 2012 · 1 revision

1. To design a DAX (a Pegasus-readable XML file), create a workflow within Triana using the DaxJob and DaxFile units. These units are found in the org/trianacode/pegasus/dax folder in the side pane. Dragging two fileUnits and one jobUnit onto a fresh workflow window should produce something similar to:

These can then be connected together as in a regular workflow. Pegasus expects the pattern
file → job → file
so we connect the units in that order.

This workflow represents a single input file being passed to a processing job, which in turn produces a single output file.

2. Each unit can be described further through the settings in its properties panel. Double-clicking on a unit brings up the option to change the name of the unit. In the case of a fileUnit, this name defines the filename of the data file. In the case of a job, the name is simply a way to describe what the job is doing. We will change the units' names now:

3. This is the simplest type of workflow, and we can now create a Pegasus-readable DAX XML file from it. To do this, we attach a DaxCreatorUnit to the end of the workflow:

The output location for the XML file can be set in the unit's preference panel. Running the workflow produces the DAX file, which contains the following:


<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2010-09-29T11:35:38+01:00 -->
<!-- generated by: ian [??] -->
<adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-2.1.xsd" version="2.1" count="1" index="0" name="test" jobCount="1" fileCount="0" childCount="0">
<!-- part 1: list of all referenced files (may be empty) -->
<!-- part 2: definition of all jobs (at least one) -->
  <job id="ID0000001" name="Do_something">
    <argument></argument>
    <uses file="Input_file" link="input" register="true" transfer="true" type="data"/>
    <uses file="Output_file" link="output" register="true" transfer="true" type="data"/>
  </job>
<!-- part 3: list of control-flow dependencies (may be empty) -->
</adag>
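
To check what a generated DAX describes without opening Triana, the file can be read with any XML parser. The following is a minimal sketch in Python using only the standard library. It assumes the namespace shown in the `<adag>` element above; the function name `summarise_jobs` and the trimmed `DAX_TEXT` copy are illustrative and not part of Triana or Pegasus.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the <adag> element above.
DAX_NS = {"dax": "http://pegasus.isi.edu/schema/DAX"}

# Trimmed copy of the generated DAX (schema attributes omitted for brevity).
DAX_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<adag xmlns="http://pegasus.isi.edu/schema/DAX" version="2.1" name="test" jobCount="1">
  <job id="ID0000001" name="Do_something">
    <argument></argument>
    <uses file="Input_file" link="input" register="true" transfer="true" type="data"/>
    <uses file="Output_file" link="output" register="true" transfer="true" type="data"/>
  </job>
</adag>
"""


def summarise_jobs(dax_text):
    """Return {job name: {"input": [...], "output": [...]}} for each <job>."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(dax_text.encode("utf-8"))
    summary = {}
    for job in root.findall("dax:job", DAX_NS):
        files = {"input": [], "output": []}
        for uses in job.findall("dax:uses", DAX_NS):
            # The link attribute says whether the file is consumed or produced.
            files.setdefault(uses.get("link"), []).append(uses.get("file"))
        summary[job.get("name")] = files
    return summary


jobs = summarise_jobs(DAX_TEXT)
print(jobs)
```

Each job should list at least one input and one output, matching the file → job → file pattern described earlier.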

4. Triana can also render an existing DAX XML file as a workflow. Clicking File → Import… and selecting the file we just created will render the workflow in a new window.

5. To extend this example, let's assume we have a processing job which takes more than one input file, and that there is a logical pattern to the filenames, e.g. input1, input2, input3… These inputs have some connection to each other, and we would like to show this in our workflow without displaying a large number of very similar fileUnits on the screen. A workflow cluttered in that way would be difficult to understand and less usable.

Instead, we introduce the idea of a collection of files, which denotes the existence of multiple files which have some connection to each other, and follow a predictable naming pattern.

We will create a file collection now, using the first fileUnit in our workflow. Selecting “collection” in the fileUnit’s properties panel allows us to define how many files this unit should represent:

Here we have selected the “Collection” checkbox, and moved the slider slightly to define 3 files. When “Apply” is clicked, the unit's symbol changes to reflect this:

The shadow shows this change to “Input_file”. Running the workflow again will produce an updated DAX file (overwriting the previous one, unless the output location is changed). If this DAX xml is then imported as before, we get:

The DaxCreatorUnit has created multiple “Input_file”s and has appended an incrementing number to each filename. The jobUnit has been updated to accept all these inputs.
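
The expansion of a file collection can be pictured with a short Python sketch. This is not the DaxCreatorUnit's actual code: the naming scheme (base name followed by a 1-based counter, mirroring the input1, input2, input3 pattern mentioned above) and the helper names `expand_collection` and `build_job` are assumptions made for illustration. The attributes on each `uses` element are copied from the DAX shown in step 3.

```python
import xml.etree.ElementTree as ET


def expand_collection(base_name, count):
    """Hypothetical naming scheme: base name plus a 1-based counter."""
    return [f"{base_name}{i}" for i in range(1, count + 1)]


def build_job(job_id, job_name, inputs, outputs):
    """Build a <job> element with one <uses> child per input and output file."""
    job = ET.Element("job", {"id": job_id, "name": job_name})
    ET.SubElement(job, "argument")
    for link, names in (("input", inputs), ("output", outputs)):
        for name in names:
            ET.SubElement(job, "uses", {
                "file": name,
                "link": link,
                "register": "true",
                "transfer": "true",
                "type": "data",
            })
    return job


# A collection of 3 files, as set with the slider in the example above.
inputs = expand_collection("Input_file", 3)
job = build_job("ID0000001", "Do_something", inputs, ["Output_file"])
print(ET.tostring(job, encoding="unicode"))
```

The single job element ends up with three input `uses` entries and one output entry, which is the shape the imported workflow shows; the exact filenames Triana writes may differ.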

6. JobUnits can also be made into collections. This represents replication of a job, for example when there are numerous inputs and throughput can be increased by sharing these files between many jobs. The combination of file collections and job collections can produce very complex DAX XML output, while retaining a simple and readable workflow within Triana.
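
One way to picture files being shared between replicated jobs is a round-robin assignment, sketched below in Python. How Triana actually connects file collections to job collections is not stated here, so the round-robin policy, the function name `share_files`, and the numbered file and job names are all assumptions for illustration only.

```python
def share_files(files, job_count):
    """Assign files to job_count jobs in round-robin order (illustrative policy)."""
    if job_count < 1:
        raise ValueError("job_count must be at least 1")
    shares = [[] for _ in range(job_count)]
    for index, name in enumerate(files):
        # File 0 goes to job 0, file 1 to job 1, and so on, wrapping around.
        shares[index % job_count].append(name)
    return shares


# Hypothetical collection of six input files shared between two job copies.
files = [f"Input_file{i}" for i in range(1, 7)]
shares = share_files(files, 2)
for job_number, share in enumerate(shares, start=1):
    print(f"Do_something{job_number}: {share}")
```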

Here all the units are now collections with varying numbers set in their properties panels:

This produces the output:
