Task Architecture #1

Closed
@thejmazz

A Task is the basic building block of pipelines in Waterwheel. It needs to be able to handle streams, synchronous functions, and asynchronous functions. There needs to be a standard way to declare that a Task is over. Tasks also need to be joined into new Tasks and parallelized into new Tasks. It should be possible to join two Tasks that stream one into the other without additional config.

Tasks can be:

  • BlockingTask:
    • input: anything - Array or Object of primitive, File, Stream
    • output: will not provide a Readable Stream, instead ending with or without a return value
  • StreamingTask:
    • input: anything
    • output: Readable Stream
  • AsyncTask:
    • input: anything
    • output: Promise preferred, or callback
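
One possible way to give the three kinds of output a single "task is over" signal is to normalise them to a Promise. The sketch below is only an illustration under that assumption: `whenDone` is a hypothetical helper, not part of Waterwheel, and it drains a stream itself, which a real pipeline would leave to the downstream consumer.

```javascript
const { Readable, finished } = require('stream')

// Hypothetical helper: turn whatever a task action returns into a Promise
// that settles when the task is over.
function whenDone (result) {
  // StreamingTask: a Readable is over once it has ended or errored
  if (result instanceof Readable) {
    return new Promise((resolve, reject) => {
      finished(result, (err) => (err ? reject(err) : resolve()))
      // Discard the data so the stream can reach its end in this sketch
      result.resume()
    })
  }
  // AsyncTask (a Promise) and BlockingTask (a plain value or undefined)
  // are both covered by Promise.resolve
  return Promise.resolve(result)
}
```

A callback-style AsyncTask is not handled here; it would first need to be wrapped in a Promise.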

A File is just a class that holds a value. At the moment it is used in the task action creator for an input instanceof File check, which determines whether the file pattern (e.g. **/*.sra) should be resolved with globby. This lets the task stream/promise creator use a resolved input variable containing the actual filename.
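
A minimal sketch of that check might look like the following. The names `resolveInput` and `glob` are hypothetical: `glob` stands in for a pattern matcher such as globby and is passed in as a plain function so the sketch runs without dependencies. Taking the first match is also an assumption.

```javascript
// Wrapper that marks a value as a file pattern rather than a plain string
class File {
  constructor (value) {
    this.value = value
  }
}

// Hypothetical resolver: `glob` takes a pattern and returns an array of
// matching paths.
function resolveInput (input, glob) {
  if (input instanceof File) {
    const matches = glob(input.value)
    // Assumption: the task wants the first matching filename
    return matches[0]
  }
  // Anything that is not a File is passed through untouched
  return input
}
```

With this, `resolveInput(new File('**/*.sra'), glob)` hands the task action a concrete filename, while a plain string input is left alone.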

We will refer to joining:

  • join(task1, task2, taskn) -> Task

parallelization:

  • parallel(task1, [task2, task3], [task4, task5]) -> Task

and forking as the orchestration of Tasks.

As an example of forking, consider this pipeline:

[Image: new doc 6_1]

The seqtk merge task can be forked and provided to two other tasks: filter kmc and filter khmer. This can be done by defining a filter task with an array of task action creators. Then, since the kmc and khmer variants produce the same output, just via a different tool, the pipeline will automatically duplicate the bwa mem | samtools view | samtools sort; samtools index; samtools mpileup | bcftools call section of the pipeline for each of them.

One way to enable the orchestration of Tasks is with callbacks. The task creator takes an object defining input and output, and then a function describing the task action creator. It returns a function of next. The join and parallel methods programmatically assign a function to next to achieve their goals.
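
The callback approach could be sketched roughly as follows. This is an illustration only: the `(err, value)` shape of next and the choice to collect results in an array are assumptions, and only join is shown.

```javascript
// Hypothetical task creator: takes { input, output } plus an action creator
// and returns a function of `next`.
function task (props, actionCreator) {
  return function run (next) {
    const value = actionCreator(props)
    if (typeof next === 'function') next(null, value)
  }
}

// join supplies each task's `next` so that it starts the following task.
// The result is again a function of `next`, so it can be used as a Task.
function join (...tasks) {
  return function run (next) {
    const results = []
    const step = (i) => {
      if (i === tasks.length) return next && next(null, results)
      tasks[i]((err, value) => {
        if (err) return next && next(err)
        results.push(value)
        step(i + 1)
      })
    }
    step(0)
  }
}
```

Because the joined value has the same shape as a single task, joins can be nested without extra configuration.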

Another way is with Promises. This is perhaps more elegant than callbacks, and a Promise can reject when something goes wrong.
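
As a rough sketch of the Promise approach, assume each Task is a function that takes an input and returns a Promise (or a plain value). The treatment of an array argument to parallel as a joined sub-pipeline is a guess at the intended meaning of the signature above, not something the issue specifies.

```javascript
// join: run tasks in sequence, passing each result to the next task.
// A rejection or thrown error in any task rejects the whole chain.
const join = (...tasks) => (input) =>
  tasks.reduce((prev, t) => prev.then(t), Promise.resolve(input))

// parallel: start every task with the same input and wait for all of them.
// Assumption: an array entry is a sub-pipeline that is joined first.
const parallel = (...tasks) => (input) =>
  Promise.all(
    tasks.map((t) =>
      Promise.resolve(input).then(Array.isArray(t) ? join(...t) : t)
    )
  )
```

Both helpers return a function with the same shape as a single Task, so the results can be passed to join or parallel again.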

Another way to do this is through events. The Task function can emit an event once the Object returned from the task creator has completed. This has the advantage of defining a standard way to declare that tasks are over. However, listening for the same event could become messy when doing joins and parallels. This can be addressed by emitting a taskFinish event with some data, perhaps including a task uuid.
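
The event approach might be sketched like this, assuming a single shared emitter (here called `bus`) and synchronous task actions. The shape of the event payload and the helper names are hypothetical.

```javascript
const { EventEmitter } = require('events')
const { randomUUID } = require('crypto')

// Assumption: all tasks announce completion on one shared emitter
const bus = new EventEmitter()

// Hypothetical task creator: running the task emits 'taskFinish'
// with the task's uuid and its output.
function task (action) {
  const uuid = randomUUID()
  const run = (input) => bus.emit('taskFinish', { uuid, output: action(input) })
  return { uuid, run }
}

// join: start `second` whenever a finish event carrying `first`'s uuid arrives.
// The joined task finishes when `second` does, so it reuses second's uuid.
function join (first, second) {
  bus.on('taskFinish', (data) => {
    if (data.uuid === first.uuid) second.run(data.output)
  })
  return { uuid: second.uuid, run: first.run }
}
```

The uuid is what keeps listeners apart: every listener sees every taskFinish event and ignores the ones that belong to other tasks.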
