Refactor pipeline parsing #74

tiagofilipe12 opened this issue Aug 23, 2017 · 2 comments


@tiagofilipe12 tiagofilipe12 commented Aug 23, 2017

Right now bionode-watermill resolves a set of promises,

const pipeline = join(task1, task2)

but it would be nice to have an object-like structure in the pipeline definition:

const pipeline = { join: { task1, task2 } }

bionode-watermill would then parse this object to obtain the whole pipeline before actually executing it.
This way it would be possible to predict the pipeline's shape, inputs and outputs before running it, and then, after the run, to confirm that everything was properly set up and executed as expected. It could also greatly improve pipeline visualization, since tasks could be rendered in different colors depending on whether they were run, are running, or have ended.
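
To make the idea concrete, here is a minimal sketch of such a pre-run parse. It is not bionode-watermill code: the task descriptors with `name`, `input` and `output` fields, the `planJoin` helper and the `pending` status are all assumptions made for illustration, and only a single flat `join` is handled.

```javascript
// Hypothetical task descriptors: plain objects standing in for real tasks.
const task1 = { name: 'task1', input: 'reads.fastq', output: 'reads.trimmed.fastq' }
const task2 = { name: 'task2', input: 'reads.trimmed.fastq', output: 'reads.bam' }

const pipeline = { join: { task1, task2 } }

// Walk the definition without running anything.
// `join` is read as "run in order", so each task gets an edge to the next one.
function planJoin (definition) {
  // Relies on object keys keeping their insertion order.
  const tasks = Object.values(definition.join)
  const edges = []
  for (let i = 0; i < tasks.length - 1; i++) {
    edges.push({
      from: tasks[i].name,
      to: tasks[i + 1].name,
      // Does the declared output of one task match the input of the next?
      linked: tasks[i].output === tasks[i + 1].input
    })
  }
  // Every task starts as 'pending'; a runner could update this for visualization.
  const status = Object.fromEntries(tasks.map(t => [t.name, 'pending']))
  return { nodes: tasks.map(t => t.name), edges, status }
}

const plan = planJoin(pipeline)
console.log(plan)
```

The `status` map is where a visualization could later record which tasks were run, are running, or have ended.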


@thejmazz thejmazz commented Aug 24, 2017

Two points:

  1. We will need to be very precise about the use of objects and arrays; your example could just as easily be:
const pipeline = { join: [ task1, task2 ] }
  2. Last time I thought about this, I tried to convert an existing, somewhat complex pipeline into an object, and it ended up being kinda verbose. So I'm partial to keeping the function style (it's nice and terse, almost like a DSL, and the user imports specific functions whose params can each be validated, as opposed to writing keys in objects by hand) but having it build an object representation internally: the "verbose" representation. That said, maybe this verbose object would be more appealing in YAML (though converting strings into JS functions is sketchy; scripts are fine).

PS: I also think we don't necessarily need an object representation to have a pre-computed DAG. Those two features do look like they go hand in hand, but it is probably also possible to construct a DAG from the orchestrator functions themselves. If so, I would prefer to implement new features (the pre-computed DAG) without overhauling other things.
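
As a rough sketch of that last idea, the orchestrator functions could return plain descriptors instead of running anything, and a DAG could be derived from those. The `task`, `join` and `junction` functions below are simplified local stand-ins written for this example, not the library's implementations, and `toDag` is a hypothetical helper. It assumes `join` means "in sequence" and `junction` means "in parallel", and that no orchestrator is called with zero arguments.

```javascript
// Stand-in orchestrators: they only describe the pipeline, nothing is executed.
const task = name => ({ type: 'task', name })
const join = (...children) => ({ type: 'join', children })
const junction = (...children) => ({ type: 'junction', children })

// Derive DAG edges from the descriptor tree.
// heads: tasks that start a sub-pipeline, tails: tasks that end it.
function toDag (node) {
  if (node.type === 'task') {
    return { heads: [node.name], tails: [node.name], edges: [] }
  }
  const parts = node.children.map(toDag)
  const edges = parts.flatMap(p => p.edges)
  if (node.type === 'junction') {
    // Parallel children are not connected to each other.
    return {
      heads: parts.flatMap(p => p.heads),
      tails: parts.flatMap(p => p.tails),
      edges
    }
  }
  // join: connect every tail of one child to every head of the next child.
  for (let i = 0; i < parts.length - 1; i++) {
    for (const from of parts[i].tails) {
      for (const to of parts[i + 1].heads) edges.push([from, to])
    }
  }
  return { heads: parts[0].heads, tails: parts[parts.length - 1].tails, edges }
}

// The terse function style is kept; the verbose object only exists internally.
const pipeline = join(junction(task('A'), task('B')), task('C'))
const dag = toDag(pipeline)
console.log(dag.edges) // [ [ 'A', 'C' ], [ 'B', 'C' ] ]
```

The user-facing definition stays as terse as before, while `pipeline` itself is the "verbose" object that could be inspected or drawn before anything runs.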

Member Author

@tiagofilipe12 tiagofilipe12 commented Aug 24, 2017

Just one more thing: after this refactor, where the DAG can be obtained before running the pipeline, fork can also be refactored to work without that bunch of rules in lib/orchestrators/join.js, since the tasks will be available before the pipeline runs. The issue right now is that we can only fetch tasks when we run a given orchestrator, and everything outside the scope of that orchestrator is unavailable to it. fork, however, needs to handle downstream tasks (tasks that run after fork) by duplicating them once for each of its branches.
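
A sketch of how that duplication might look once the whole tree is known up front. Everything here is hypothetical: the descriptor shapes, the `expandForks` and `tag` helpers and the `_0`, `_1` name suffixes are assumptions for illustration and do not reflect the current rules in lib/orchestrators/join.js. Only a fork that sits directly inside a join is expanded.

```javascript
// Stand-in orchestrators returning plain descriptors (nothing is executed).
const task = name => ({ type: 'task', name })
const join = (...children) => ({ type: 'join', children })
const junction = (...children) => ({ type: 'junction', children })
const fork = (...children) => ({ type: 'fork', children })

// Copy a subtree, tagging every task name with the branch index it belongs to.
function tag (node, suffix) {
  if (node.type === 'task') return task(`${node.name}_${suffix}`)
  return { type: node.type, children: node.children.map(c => tag(c, suffix)) }
}

// Rewrite join(up, fork(b0, b1), down) as
// join(up, junction(join(b0, down_0), join(b1, down_1))).
function expandForks (node) {
  if (node.type === 'task') return node
  const children = node.children.map(expandForks)
  if (node.type !== 'join') return { type: node.type, children }
  const i = children.findIndex(c => c.type === 'fork')
  if (i === -1) return join(...children)
  const upstream = children.slice(0, i)
  const downstream = children.slice(i + 1)
  // One copy of the downstream tasks per branch; recurse for any later fork.
  const branches = children[i].children.map((branch, n) =>
    expandForks(join(branch, ...downstream.map(d => tag(d, n))))
  )
  return join(...upstream, junction(...branches))
}

const expanded = expandForks(
  join(task('A'), fork(task('B'), task('C')), task('D'))
)
console.log(JSON.stringify(expanded, null, 2))
```

Here `A` runs once, while `D` appears twice (`D_0` after `B`, `D_1` after `C`), which is the multiplication of downstream tasks described above.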

Watermill board
DAG related