Allow merge #78

cimendes · 2018-05-15T10:37:04Z

At the moment assemblerflow allows to fork processes but if you want to add another process after the fork, you have to add after each process in the fork. Example:

assemblerflow.py build -t "integrity_coverage fastqc_trimmomatic remove_host (spades card_rgi | metaspades card_rgi | megahit card_rgi)"

It's much more intuitive to instead do:

assemblerflow.py build -t "integrity_coverage fastqc_trimmomatic remove_host (spades | metaspades | megahit) card_rgi "

This requires adding a new operator merge to assemblerflow.

The text was updated successfully, but these errors were encountered:

tiagofilipe12 · 2018-05-17T21:09:35Z

This is not actually a merge, because you don't want to merge the outputs from spades, metaspades and megahit. You want to multiply the card_rgi process as many times as the lanes in the fork.

To merge outputs assemblerflow already have some solutions like compiler channels.

Despite I am familiar with the issue you raise here, I am also skeptical that the second pipeline string is more human readable than the first.

cimendes · 2018-05-18T14:02:57Z

I agreed that it's not actually a merge per se, and that's something we should discuss about implementing.

Adding the behaviour to multiply the processes after a fork to all the lanes would prevent users from having to copy big chunks of processes to each lane. This is specially important if there are lots of processes after or if there are many lanes in the fork.

ODiogoSilva · 2018-05-18T14:25:43Z

As @tiagofilipe12 said, the compiler channels already handle cases of merging. However, I agree with @cimendes in that it is cumbersome to repeat some processes after the fork. I've checked the pipeline parser code, and I think it would be simple to implement the following syntax:

A (B C D E | X - | Z - )
# Is the same as
A (B C D E | X C D E | Z C D E )

The - character would act as a repeater of all processes after the fork.

EDIT: This could get tricky when there lanes with different processes, but the rule could be to repeat the last lane's processes:

A (B C D E | X E | Z - )
# would be the same as
A (B C D E | X E | Z E )

cimendes · 2018-05-24T10:56:23Z

I find @ODiogoSilva proposal a minimal simplification of what is implemented at the moment and it would only be advantageous if you implement a very long chain of processes after the fork.

A (B C D E | X C D E | Z C D E )

 A (B | X | Z ) C D E

With this syntax, I find it very intuitive that the CDE processes are repeated in all the lanes of the fork. I don't find the proposed alternative as intuitive

A (B C D E | X - | Z - )

Plus, as @ODiogoSilva stated, the lanes with different processes aren't as clear with this syntax. in that canse my proposal would be something like

A (B C D | X | Z) E

tiagofilipe12 · 2018-11-30T13:53:43Z

I am keen on implementing @ODiogoSilva syntax, because it doesn't break the current usage (it just extends it) and it doesn't get confounded with merge of processes from different lanes (within a fork).

Merge is in fact a trickier question because you may have 3 lanes but just want to merge two of them. I am still unsure on how to handle that in a simple nomenclature as a pipeline string. But maybe a possible implementation would be something like this:

# this would result in B, C and D outputs being merged by E
A (B | C | D) E
# this would result in B an C outputs being merged by E
A (B | C | D) E[B,C]

The default behaviour would be if no list of processes is given to E, then every output from the different lanes in the fork would be fed to E. On the other hand, if you want to explicitly provide the inputs only from some of the lanes in the fork you would pass a list together with the process name (E in this case) but maybe it needs to add a new character between the process name and the list. However, I would avoid using = since it may be confounded with directives.
Then, there is everything else related with the channels handling after the pipeline string parser...

So breaking into tasks:

allow repeat option within forks as suggested by @ODiogoSilva .
allow lane merging from the pipeline string.
handle merging by the engine after the pipeline string parser and checks.

What do you think?

ODiogoSilva · 2018-11-30T15:33:35Z

Closed in favor of #174

cimendes added enhancement New feature or request help wanted Extra attention is needed labels May 15, 2018

tiagofilipe12 added the discussion label May 17, 2018

ODiogoSilva added this to the 1.3.0 milestone Jun 7, 2018

ODiogoSilva closed this as completed Nov 30, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow merge #78

Allow merge #78

cimendes commented May 15, 2018

tiagofilipe12 commented May 17, 2018 •

edited

Loading

cimendes commented May 18, 2018

ODiogoSilva commented May 18, 2018 •

edited

Loading

cimendes commented May 24, 2018

tiagofilipe12 commented Nov 30, 2018 •

edited

Loading

ODiogoSilva commented Nov 30, 2018

Allow merge #78

Allow merge #78

Comments

cimendes commented May 15, 2018

tiagofilipe12 commented May 17, 2018 • edited Loading

cimendes commented May 18, 2018

ODiogoSilva commented May 18, 2018 • edited Loading

cimendes commented May 24, 2018

tiagofilipe12 commented Nov 30, 2018 • edited Loading

ODiogoSilva commented Nov 30, 2018

tiagofilipe12 commented May 17, 2018 •

edited

Loading

ODiogoSilva commented May 18, 2018 •

edited

Loading

tiagofilipe12 commented Nov 30, 2018 •

edited

Loading