Workflow-level function calls used in multiple stages pushed into first stage #162

Closed
jtratner opened this issue Jul 19, 2018 · 7 comments

@jtratner
Contributor

Could the common stage handle running string manipulation functions on workflow inputs? That way, the parallelism would be preserved.

For example, this workflow will be compiled such that both stageB and stageC are dependent upon stageA for run_name.

workflow myworkflow {
    File seq_tarball
    String run_name = basename(seq_tarball, ".tar.gz")

    call stageA { input: seq_tarball=seq_tarball, run_name=run_name }
    call stageB { input: seq_tarball=seq_tarball, run_name=run_name }
    call stageC { input: seq_tarball=seq_tarball, run_name=run_name }
}

task stageA {
    File seq_tarball
    String run_name
    command { echo ${run_name}; sleep 300 }
}
task stageB {
    File seq_tarball
    String run_name
    command { echo ${run_name}; sleep 300 }
}
task stageC {
    File seq_tarball
    String run_name
    command { echo ${run_name}; sleep 300 }
}

It gets compiled so that the run_name calculation is pushed into the "stageA" stage, and stageB and stageC then depend on stageA for run_name. So if stageA were a long-running task, the later stages would all have to wait for it to finish.

If you add a call to a task that doesn't do anything (compile_workflow_vars below), it's all good and the parallelism is preserved.

workflow myworkflow {
    File seq_tarball
    String run_name = basename(seq_tarball, ".tar.gz")

    call compile_workflow_vars { input: run_name=run_name }
    call stageA { input: seq_tarball=seq_tarball, run_name=run_name }
    call stageB { input: seq_tarball=seq_tarball, run_name=run_name }
    call stageC { input: seq_tarball=seq_tarball, run_name=run_name }
}

task compile_workflow_vars {
    String run_name
    command { echo ${run_name}}
}

task stageA {
    File seq_tarball
    String run_name
    command { echo ${run_name}; sleep 300 }
}
task stageB {
    File seq_tarball
    String run_name
    command { echo ${run_name}; sleep 300 }
}
task stageC {
    File seq_tarball
    String run_name
    command { echo ${run_name}; sleep 300 }
}
@jtratner
Contributor Author

Compiled version of initial workflow:

Properties          dxWDL_checksum=15B316145AFAA216823FBF2F4D71985F
Tags                dxWDL
Edit Version        0
Title               myworkflow
Summary
Output Folder       -
Input Spec          stage-2:
                      [stage-2.seq_tarball (file, default={"$dnanexus_link": {"outputField": "seq_tarball", "stage": "stage-0"}})]
                      [stage-2.run_name (string, default={"$dnanexus_link": {"outputField": "run_name", "stage": "stage-1"}})]
                    stage-3:
                      [stage-3.seq_tarball (file, default={"$dnanexus_link": {"outputField": "seq_tarball", "stage": "stage-0"}})]
                      [stage-3.run_name (string, default={"$dnanexus_link": {"outputField": "run_name", "stage": "stage-1"}})]
                    stage-0:
                      stage-0.seq_tarball (file)
                    stage-1:
                      [stage-1.seq_tarball (file, default={"$dnanexus_link": {"outputField": "seq_tarball", "stage": "stage-0"}})]
Output Spec         stage-0:
                      stage-0.seq_tarball (file)
                    stage-1:
                      stage-1.run_name (string)
Stage 0             common (stage-0)
  Executable        applet-FJ8G3900gzY0Qg1V6Y002f1p
Stage 1             stageA (stage-1)
  Executable        applet-FJ8G3980gzY3QjG66q1bg6VF
Stage 2             stageB (stage-2)
  Executable        applet-FJ8G3800gzY48FQK8vZk3BzF
Stage 3             stageC (stage-3)
  Executable        applet-FJ8G38Q0gzY3QjG66q1bg6V9
Stage 4             outputs (stage-last)
  Executable        applet-FJ8G38j0gzYGvX63JqvZ1pF5

@orodeh
Contributor

orodeh commented Jul 20, 2018

@jtratner I think this will work the way you want in release 0.71. Can you check it out, please?

@orodeh
Contributor

orodeh commented Jul 20, 2018

Another improvement was made in release 0.72: any top-level WDL call that has no subexpressions is compiled directly into a stage.
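To make "no subexpressions" concrete, here is a minimal WDL draft-2 sketch contrasting the two kinds of call. It reflects one reading of the 0.72 note (that a subexpression is a call input given as an expression rather than a plain variable reference); it is not taken from the dxWDL documentation, and the workflow, task, and alias names are hypothetical.

```wdl
# Hypothetical example: subexpr_example, echo_name, plain, and computed are illustrative names.
workflow subexpr_example {
    File seq_tarball
    String run_name = basename(seq_tarball, ".tar.gz")

    # Every input is a plain reference to a workflow variable (no subexpressions).
    call echo_name as plain { input: seq_tarball=seq_tarball, run_name=run_name }

    # The run_name input is itself an expression, i.e. a subexpression of the call.
    call echo_name as computed {
        input: seq_tarball=seq_tarball, run_name=basename(seq_tarball, ".tar.gz")
    }
}

task echo_name {
    File seq_tarball
    String run_name
    command { echo ${run_name} }
}
```

Under that reading, `plain` would be compiled directly into a stage while `computed` would need extra evaluation. Note that `plain` still depends on the workflow-level `run_name` declaration, whose evaluation has to happen somewhere; that placement is what the rest of this thread is about.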

@orodeh orodeh closed this as completed Jul 20, 2018
@jtratner
Contributor Author

Is it possible that the fix here got reverted at some point? In 0.78.1 I had to add a no-op task to prevent this from happening.

Specifically, this is the case where:

call mytask
call anothertask
Array[File] whatever = flatten([mytask.out, anothertask.out])
call task2 { input: whatever=whatever }
call task3 { input: whatever=whatever }

Then task3 has to wait for task2 to complete.
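Applied to the snippet above, the no-op workaround would look roughly like the following WDL draft-2 sketch. The task bodies, the `glob` outputs, and the names `flatten_example` and `noop_barrier` are hypothetical; only the call structure comes from the snippet. Based on the behavior reported in this thread (the shared expression gets pushed into the first stage that consumes it), the idea is to put a cheap no-op call first so that task2 and task3 no longer depend on each other.

```wdl
# Sketch of the no-op workaround; flatten_example, noop_barrier, and all
# task bodies are hypothetical placeholders.
workflow flatten_example {
    call mytask
    call anothertask
    Array[File] whatever = flatten([mytask.out, anothertask.out])

    # Cheap no-op call placed first so that, per the behavior reported in
    # this thread, the flatten evaluation lands here instead of in task2.
    call noop_barrier { input: whatever=whatever }
    call task2 { input: whatever=whatever }
    call task3 { input: whatever=whatever }
}

task mytask {
    command { echo a > a.txt }
    output { Array[File] out = glob("a.txt") }
}

task anothertask {
    command { echo b > b.txt }
    output { Array[File] out = glob("b.txt") }
}

# Does nothing useful; it only consumes `whatever`.
task noop_barrier {
    Array[File] whatever
    command { echo ${sep=" " whatever} }
}

task task2 {
    Array[File] whatever
    command { cat ${sep=" " whatever}; sleep 300 }
}

task task3 {
    Array[File] whatever
    command { cat ${sep=" " whatever}; sleep 300 }
}
```

The trade-off is one extra, very short stage in exchange for keeping the long-running stages parallel.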

@orodeh
Contributor

orodeh commented Jan 16, 2019

Are you compiling it in locked, or unlocked mode?

@orodeh orodeh reopened this Jan 17, 2019
@jtratner
Contributor Author

Sorry I didn't respond earlier. Unlocked mode.

@orodeh
Contributor

orodeh commented Apr 25, 2019

Having just rewritten the compiler, I think your workaround is the way to solve it. Yes, it is hacky, but it is simple.


2 participants