Description
A Task is the basic building block of pipelines in Waterwheel. It needs to be able to handle streams as well as synchronous and asynchronous functions. There needs to be a standard way to declare that a Task is over. Tasks also need to be joined into new Tasks and parallelized into new Tasks. You should be able to join two Tasks that stream one into the other without additional config.
Tasks can be:
- BlockingTask:
  - input: anything - Array or Object of primitives, File, Stream
  - output: will not provide a Readable Stream, instead ending with or without a return value
- StreamingTask:
  - input: anything
  - output: Readable Stream
- AsyncTask:
  - input: anything
  - output: Promise preferred, or callback
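The three kinds above differ mainly in what the task action returns, so one way to tell them apart is to inspect the return value. The following is a minimal sketch under that assumption; `classifyResult` is a hypothetical helper, not part of Waterwheel, and it cannot detect callback-style async tasks, which return nothing useful.

```javascript
const { Readable } = require('stream')

// Hypothetical helper: decide which kind of Task a returned value represents.
function classifyResult (result) {
  // A Readable (or Duplex/Transform) stream means a StreamingTask.
  if (result instanceof Readable) return 'StreamingTask'
  // Any thenable is treated as a Promise, hence an AsyncTask.
  if (result !== null && typeof result === 'object' && typeof result.then === 'function') {
    return 'AsyncTask'
  }
  // Everything else ended synchronously, with or without a return value.
  return 'BlockingTask'
}

console.log(classifyResult(Readable.from(['a', 'b']))) // StreamingTask
console.log(classifyResult(Promise.resolve(42))) // AsyncTask
console.log(classifyResult(42)) // BlockingTask
console.log(classifyResult(undefined)) // BlockingTask
```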
A File is just a class that holds a value. At the moment it is used in the task action creator for an `input instanceof File` check, which determines whether the file pattern (e.g. `**/*.sra`) should be resolved with globby. This is so that the task stream/promise creator can use a resolved input variable with the actual filename.
We will refer to the following as the orchestration of Tasks:
- joining: `join(task1, task2, taskn) -> Task`
- parallelization: `parallel(task1, [task2, task3], [task4, task5]) -> Task`
- forking: consider a pipeline in which the `seqtk merge` task is forked and provided to two other tasks, `filter kmc` and `filter khmer`. This can be done by defining a filter task with an array of task action creators. Then, since the kmc and khmer variants produce the same output, just via a different tool, the pipeline will automatically duplicate the `bwa mem | samtools view | samtools sort; samtools index; samtools mpileup | bcftools call` section of the pipeline for each of them.
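To illustrate the forking idea, here is a minimal sketch that expands a pipeline description into one linear lineage per fork variant. It assumes tasks are represented as plain strings and that a fork is written as an array of alternatives; `expandForks` is a hypothetical name. It only shows how the downstream section gets duplicated, and does not model running the shared upstream step once.

```javascript
// A step is a task name, or an array of alternative task names (a fork).
// Returns every linear path through the pipeline.
function expandForks (steps) {
  return steps.reduce((lineages, step) => {
    const variants = Array.isArray(step) ? step : [step]
    const expanded = []
    for (const lineage of lineages) {
      // Each existing lineage is continued once per variant.
      for (const variant of variants) expanded.push([...lineage, variant])
    }
    return expanded
  }, [[]])
}

const lineages = expandForks([
  'seqtk merge',
  ['filter kmc', 'filter khmer'],
  'bwa mem | samtools view | samtools sort',
  'samtools index',
  'samtools mpileup | bcftools call'
])

console.log(lineages.length) // 2
console.log(lineages.map((lineage) => lineage[1])) // [ 'filter kmc', 'filter khmer' ]
```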
One way to enable the orchestration of Tasks is with callbacks. The task creator takes an object defining `input` and `output`, and then a function describing the task action creator. It returns a function of `next`. The `join` method and `parallel` method programmatically assign a function to `next` to achieve their goals.
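A callback-based version might be sketched as follows. The signatures are assumptions made for illustration: `task(props, actionCreator)` returns a function of `next`, and `next` follows the Node `(err, result)` convention. For simplicity `parallel` here accepts plain tasks only, not nested arrays.

```javascript
// Hypothetical task creator: returns a function of `next`.
function task (props, actionCreator) {
  return function run (next) {
    actionCreator(props, next)
  }
}

// join: each task's `next` is assigned a function that starts the following task.
function join (...tasks) {
  return function run (next) {
    const results = []
    const step = (i) => {
      if (i === tasks.length) return next(null, results)
      tasks[i]((err, result) => {
        if (err) return next(err)
        results.push(result)
        step(i + 1)
      })
    }
    step(0)
  }
}

// parallel: all tasks start at once; `next` fires after the last one reports back.
function parallel (...tasks) {
  return function run (next) {
    const results = new Array(tasks.length)
    let pending = tasks.length
    let failed = false
    if (pending === 0) return next(null, results)
    tasks.forEach((t, i) => {
      t((err, result) => {
        if (failed) return
        if (err) { failed = true; return next(err) }
        results[i] = result
        if (--pending === 0) next(null, results)
      })
    })
  }
}

// Usage: each task "finishes" after a delay and passes its output name along.
const makeTask = (name, delay) => task({ input: null, output: name }, (props, next) => {
  setTimeout(() => next(null, props.output), delay)
})

join(makeTask('a', 20), parallel(makeTask('b', 10), makeTask('c', 5)))((err, results) => {
  if (err) throw err
  console.log(results) // [ 'a', [ 'b', 'c' ] ]
})
```

Because `join` and `parallel` themselves return a function of `next`, their results are Tasks too and can be nested.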
Another way is with Promises. They are perhaps more elegant than callbacks, and a Promise can `reject` when things go bad.
Another way to do this is through events. The Task function can emit an event once the Object returned from the task creator has completed. This has the advantage that it helps define a standard way to declare that tasks are over. However, listening for the same event can become messy when doing joins and parallels. This could be addressed by emitting a `taskFinish` event with some data that includes, for example, a task uuid.
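A minimal sketch of the event-based idea, assuming all tasks share one `EventEmitter` and the `taskFinish` payload has the shape `{ uuid, output }`. The names `createTask` and `join`, the payload shape, and the `taskError` event are assumptions made for illustration.

```javascript
const { EventEmitter } = require('events')
const { randomUUID } = require('crypto')

// Hypothetical task creator: the task announces completion on the shared
// emitter, and the uuid in the payload identifies which task finished.
function createTask (emitter, action) {
  const uuid = randomUUID()
  const run = (input) => {
    Promise.resolve(input).then(action).then(
      (output) => emitter.emit('taskFinish', { uuid, output }),
      (error) => emitter.emit('taskError', { uuid, error }) // assumed event name
    )
  }
  return { uuid, run }
}

// join: start the second task when the first task's uuid shows up in taskFinish.
function join (emitter, first, second) {
  const onFinish = ({ uuid, output }) => {
    if (uuid !== first.uuid) return // some other task's event; ignore it
    emitter.off('taskFinish', onFinish)
    second.run(output)
  }
  emitter.on('taskFinish', onFinish)
  // The joined Task is over when the second task is over.
  return { uuid: second.uuid, run: first.run }
}

const emitter = new EventEmitter()
const upper = createTask(emitter, (s) => s.toUpperCase())
const exclaim = createTask(emitter, async (s) => s + '!')
const pipeline = join(emitter, upper, exclaim)

emitter.on('taskFinish', ({ uuid, output }) => {
  if (uuid === pipeline.uuid) console.log(output) // HELLO!
})
pipeline.run('hello')
```

Filtering on the uuid is what keeps a single shared event name manageable: every listener sees every `taskFinish`, but only reacts to the task it is waiting for.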