
Arrays are not scattered when passed to subworkflows #20

Open
biokcb opened this issue Jul 19, 2018 · 6 comments
Labels
enhancement New feature or request

Comments

@biokcb

biokcb commented Jul 19, 2018

Hi,

We have a simple example workflow that seems to pass array inputs down to lower-level scripts without scattering them.

top_workflow.cwl calls -> subworkflow.cwl calls -> echocat.cwl calls -> echocat.sh which takes 3 inputs (string, file, file).

subworkflow.cwl has a single step that takes a string input and a File[] input and passes them to the command line tool. On its own, this works fine with cwlexec. But when I use top_workflow.cwl to scatter over an array of strings or an array of File arrays, the arrays are not scattered. Instead they are passed directly to the command line tool, which fails because the shell script cannot use them this way: the whole string array is passed where a single string is expected, and the whole array of File arrays is passed where a single File array is expected. The example is attached; in the output.txt file, the command at line 646 is built incorrectly.

SubworkflowArrayScatterError.tar.gz
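
For reference, this is roughly what scattering the subworkflow step should produce. In CWL, a step that scatters over two inputs with `scatterMethod: dotproduct` runs once per array index, pairing the i-th element of each array (the top-level workflow also needs `ScatterFeatureRequirement` and `SubworkflowFeatureRequirement`). The Python sketch below only models that pairing, assuming dotproduct was used; the sample and file names are hypothetical, and it assumes echocat.cwl binds the string followed by each file positionally.

```python
from typing import List, Tuple


def dotproduct_scatter(
    samples: List[str], file_groups: List[List[str]]
) -> List[Tuple[str, List[str]]]:
    """Pair the i-th string with the i-th File[], as a dotproduct scatter does."""
    if len(samples) != len(file_groups):
        raise ValueError("dotproduct scatter needs input arrays of equal length")
    return list(zip(samples, file_groups))


def build_commands(samples: List[str], file_groups: List[List[str]]) -> List[List[str]]:
    """One echocat.sh command line per scattered subworkflow instance.

    Assumption (hypothetical binding): the string comes first, then each File.
    """
    return [
        ["echocat.sh", sample, *files]
        for sample, files in dotproduct_scatter(samples, file_groups)
    ]


# Hypothetical inputs: two samples, each with the two files echocat.sh expects.
commands = build_commands(["s1", "s2"], [["a1.txt", "a2.txt"], ["b1.txt", "b2.txt"]])
for cmd in commands:
    print(" ".join(cmd))
# echocat.sh s1 a1.txt a2.txt
# echocat.sh s2 b1.txt b2.txt
```

The bug reported here is that a single command is built from the whole arrays instead of one command per element.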

@skeeey
Collaborator

skeeey commented Jul 25, 2018

@biokcb, currently cwlexec does not support scattering a step at the subflow level. This is because cwlexec submits all of the jobs in a flow to LSF at the beginning, which lets the jobs be queued better; it also means cwlexec has to expand all of the jobs in a flow up front. A scattered subflow makes this more complex: for example, if the subflow depends on some other steps, we must wait for those steps to finish before we can expand it. Since there is always a workaround that bypasses the scattered subflow, we decided to make this a low priority, but I think we will support it in the future.

For your case, you can scatter the echocat.cwl step inside subworkflow.cwl instead:
SubworkflowArrayScatterWorkaround.zip
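
The constraint described above can be illustrated with a toy model of a scheduler that wants to create every job before anything runs. The `Step` class and `expand_at_submission` function are hypothetical and are not cwlexec's internal code; the sketch only shows that a scattered subflow whose array comes from an upstream step cannot be expanded at submission time, because its job count is not yet known.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Step:
    name: str
    inner_steps: List[str] = field(default_factory=list)  # non-empty => subflow
    scattered: bool = False
    scatter_len: Optional[int] = None  # None => length only known at run time


def expand_at_submission(steps: List[Step]) -> Tuple[List[str], List[str]]:
    """Return (jobs that can be submitted now, steps whose expansion must wait)."""
    jobs: List[str] = []
    deferred: List[str] = []
    for step in steps:
        if step.scattered and step.scatter_len is None:
            # The number of copies depends on an upstream output, so these
            # jobs cannot be created until that upstream step has finished.
            deferred.append(step.name)
            continue
        prefixes = (
            [f"{step.name}[{i}]" for i in range(step.scatter_len)]
            if step.scattered
            else [step.name]
        )
        for prefix in prefixes:
            if step.inner_steps:
                # A subflow expands into one job per inner step, per copy.
                jobs.extend(f"{prefix}/{inner}" for inner in step.inner_steps)
            else:
                jobs.append(prefix)
    return jobs, deferred


# Hypothetical flow: "per_sample" scatters over an array produced at run time.
jobs, deferred = expand_at_submission([
    Step("align", scattered=True, scatter_len=2),
    Step("per_sample", inner_steps=["trim", "call"], scattered=True),
])
print(jobs)      # ['align[0]', 'align[1]']
print(deferred)  # ['per_sample']
```

Scattering each inner step separately, as in the workaround, keeps every step a plain scattered tool step that this kind of up-front expansion can handle once its input length is known.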

@biokcb
Author

biokcb commented Jul 25, 2018

@skeeey Thanks for the update! I can definitely implement the workaround for now, but the example I gave was a minimal one-step subworkflow that reproduces the error. Some of my workflows have multiple steps that I'd like to group into a subworkflow so that each sample can proceed through the steps independently. If I scatter each command line tool step separately, each step expects an array and must wait until all samples have been processed by the previous step. If the other samples don't have to wait on one particularly time-intensive sample, our overall processing time can be reduced. I believe this would be a useful feature for us, so if you are able to support it in the future, that would be great. Thanks!
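
The timing argument above can be made concrete with a back-of-the-envelope model. This sketch assumes unlimited LSF job slots, no queueing delay, and that every sample runs the same steps; the function names and per-sample durations are hypothetical.

```python
from typing import Dict, List


def step_level_makespan(durations: Dict[str, List[float]]) -> float:
    """Every step is scattered on its own: step k+1 cannot start for any
    sample until all samples have finished step k (a barrier per step)."""
    n_steps = len(next(iter(durations.values())))
    return sum(max(d[k] for d in durations.values()) for k in range(n_steps))


def subworkflow_level_makespan(durations: Dict[str, List[float]]) -> float:
    """The whole multi-step subworkflow is scattered: each sample runs its
    steps back to back, independently of the other samples."""
    return max(sum(d) for d in durations.values())


# Hypothetical per-sample durations (hours) for a two-step subworkflow.
durations = {"A": [1, 1], "B": [5, 1], "C": [1, 5]}
print(step_level_makespan(durations))         # 10
print(subworkflow_level_makespan(durations))  # 6
```

Under these assumptions, scattering each step costs the sum of the per-step maxima, while scattering the subworkflow costs only the slowest sample's total.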

@drjrm3

drjrm3 commented Jul 25, 2018

@skeeey Can you explain this workaround a bit more? I don't quite see how this workaround helps our situation, but I want to understand what you mean by this first.

> there is always a workaround can bypass the scattered subflow, so we finally decide to put this as a low priority, I think we will support this in future.

@skeeey
Collaborator

skeeey commented Jul 26, 2018

@drjrm3 The workaround is like @biokcb's approach: instead of scattering the whole subflow, we scatter every step inside the subflow. Indeed, it has the drawback @biokcb described. Currently we are focusing on implementing ExpressionTool; I think once that is finished, we can solve this problem.

@skeeey added the enhancement label (New feature or request) and removed the known issue label on Sep 30, 2018
@skeeey
Collaborator

skeeey commented Oct 8, 2018

Also need to test #33

@nick018905

nick018905 commented Jan 13, 2022

Hi,
@skeeey, cwlexec is a very convenient CWL engine for dispatching jobs to IBM LSF, but it is a pity that it cannot scatter subworkflows. Are there any plans to support this? Thank you.


4 participants