Pipeline Graph (DAG) Visualization #50

Closed
thejmazz opened this Issue May 29, 2017 · 4 comments

Comments

3 participants
@thejmazz
Member

thejmazz commented May 29, 2017

It is useful to have a visual representation of the Directed Acyclic Graph (DAG) that is produced during the execution of a pipeline.

In the graph,

  • each node is a task
  • a node may use the outputs of n parent nodes as its input(s). Each input value is resolved from one node, but which node supplied which input is not currently stored. That could be recorded with an alternative edge type (or perhaps with discrete edge weights, one per edge type)
  • in the diagrams below, read three stacked | characters as the child node of two parent nodes. TODO: actual graph diagrams
  • join(A, B) creates the DAG a ---> b
  • junction(A, B) creates the DAG
a ---|
     |--->
b ---|
  • join(junction(A, B), C) creates the DAG
a ---|
     |---> c
b ---|
  • join(A, fork(X, Y), C) creates the DAG
      |---> x ---> c'
a --->|
      |---> y ---> c''
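
The shapes above can be reproduced with a small toy model. This is only a sketch: the `task`, `join`, `junction`, `fork` and `build` helpers below are hypothetical stand-ins written for illustration and are not watermill's implementation. It assumes that a fork copies the rest of the enclosing join once per branch, which is what the last diagram suggests.

```javascript
// Toy descriptions of a pipeline (hypothetical helpers, not watermill's API)
const task = (name) => ({ type: 'task', name })
const join = (...parts) => ({ type: 'join', parts })
const junction = (...parts) => ({ type: 'junction', parts })
const fork = (...parts) => ({ type: 'fork', parts })

// Turn a description into { nodes, edges } plus entry (heads) and exit (tails) nodes
function build (desc, suffix = '') {
  if (desc.type === 'task') {
    const id = desc.name + suffix
    return { nodes: [id], edges: [], heads: [id], tails: [id], forked: false }
  }
  if (desc.type === 'join') return buildJoin(desc.parts, suffix)
  // junction and fork both place their parts side by side
  const frags = desc.parts.map((part) => build(part, suffix))
  return {
    nodes: frags.flatMap((f) => f.nodes),
    edges: frags.flatMap((f) => f.edges),
    heads: frags.flatMap((f) => f.heads),
    tails: frags.flatMap((f) => f.tails),
    forked: desc.type === 'fork'
  }
}

function buildJoin (parts, suffix) {
  const [first, ...rest] = parts
  const head = build(first, suffix)
  if (rest.length === 0) return head
  // After a fork every branch gets its own copy of the remaining tasks;
  // otherwise all exit nodes feed one shared continuation.
  const groups = head.forked ? head.tails.map((t) => [t]) : [head.tails]
  const result = {
    nodes: [...head.nodes],
    edges: [...head.edges],
    heads: head.heads,
    tails: [],
    forked: false
  }
  groups.forEach((tails, i) => {
    const copySuffix = head.forked ? suffix + "'".repeat(i + 1) : suffix
    const tail = buildJoin(rest, copySuffix)
    result.nodes.push(...tail.nodes)
    result.edges.push(...tail.edges)
    for (const source of tails) {
      for (const target of tail.heads) result.edges.push({ source, target })
    }
    result.tails.push(...tail.tails)
  })
  return result
}

const forked = build(join(task('a'), fork(task('x'), task('y')), task('c')))
console.log(forked.edges.map((e) => `${e.source} ---> ${e.target}`).join('\n'))
```

The model ignores cases such as a join that ends in a fork, so it is a reading aid for the diagrams rather than a specification.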

The redux reducer for the DAG is here. It uses graph.js.

The graph lives in the store under the path collection (i.e. a valid selector would be (state) => state.collection).

A function jsonifyGraph is also exported. This is because the graph object from graph.js is not serializable. This creates a serializable JSON representation of the graph.

See here how the collection (aka DAG) is logged out during task resolution for debug.

A first implementation of this could be to write the JSON graph to disk during pipeline execution, overwriting the previous file whenever an ADD_OUTPUT or ADD_JUNCTION_VERTEX action is dispatched (i.e. whenever the state of the DAG changes). This way, if a task fails, we at least have the most recent graph stored.
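
One way to sketch this is a redux middleware that rewrites a snapshot file after each of those two actions. Everything named here except the action types and the state.collection path is an assumption: `createGraphDumper`, the `file` option and the `serialize` option are hypothetical, and `serialize` stands in for something like jsonifyGraph, whose exact signature is not shown in this issue.

```javascript
const fs = require('fs')

// Action types named in this issue; nothing is assumed about their payloads
const GRAPH_ACTIONS = new Set(['ADD_OUTPUT', 'ADD_JUNCTION_VERTEX'])

// Hypothetical middleware factory. `serialize` must turn the graph held at
// state.collection into plain JSON-compatible data.
const createGraphDumper = ({ file, serialize }) => (store) => (next) => (action) => {
  const result = next(action) // let the reducers update the state first
  if (GRAPH_ACTIONS.has(action.type)) {
    const json = JSON.stringify(serialize(store.getState().collection), null, 2)
    const tmp = `${file}.tmp`
    fs.writeFileSync(tmp, json)
    // Rename over the old snapshot so a crash mid-write cannot leave a
    // half-written graph file behind
    fs.renameSync(tmp, file)
  }
  return result
}
```

It would be registered with redux's applyMiddleware when the store is created. Writing to a temporary file and renaming it is a design choice of this sketch, made so that the last good snapshot survives if the process dies while writing.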

Then it is a matter of parsing that JSON into a visualization using something like d3.

Suggestions to improve the way the graph is handled within watermill are welcome. Perhaps there is a better serializable format to use (e.g. graphml format).
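
As a rough idea of what a GraphML export might look like, here is a minimal sketch. It assumes the graph has already been reduced to a plain object of node ids and { source, target } edges; that input shape and the `toGraphML` name are hypothetical, not something watermill provides.

```javascript
// Escape the characters that are unsafe inside a double-quoted XML attribute
const escapeXml = (value) => String(value)
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;')

// Hypothetical exporter: { nodes: ['a'], edges: [{ source, target }] } -> GraphML text
function toGraphML ({ nodes, edges }) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
    '  <graph id="pipeline" edgedefault="directed">',
    ...nodes.map((id) => `    <node id="${escapeXml(id)}"/>`),
    ...edges.map(({ source, target }) =>
      `    <edge source="${escapeXml(source)}" target="${escapeXml(target)}"/>`),
    '  </graph>',
    '</graphml>'
  ].join('\n')
}

console.log(toGraphML({
  nodes: ['a', 'b', 'c'],
  edges: [{ source: 'a', target: 'c' }, { source: 'b', target: 'c' }]
}))
```

The result is a plain string, so it can be written to disk or kept in the redux state like any other serializable value.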

BONUS

  • do it in real time, with an Electron/browser app listening to changes in the redux store. New nodes should be added as the tasks run.
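
A minimal sketch of the listening part, using the subscribe/getState methods that every redux store has. `watchGraph` is a hypothetical helper, and it assumes the reducer returns a new object for state.collection whenever the DAG changes; if the graph were mutated in place, comparing a serialized form would be needed instead.

```javascript
// Hypothetical helper: calls onChange(graph) whenever state.collection is replaced.
// Returns the unsubscribe function handed back by store.subscribe.
function watchGraph (store, onChange) {
  let previous = store.getState().collection
  return store.subscribe(() => {
    const current = store.getState().collection
    if (current !== previous) {
      previous = current
      onChange(current) // e.g. push the new graph to the renderer
    }
  })
}
```

In an Electron or browser app, onChange is where the updated graph would be handed to the drawing code, so that new nodes appear while the tasks run.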

@thejmazz thejmazz added the mozsprint label May 29, 2017

@bmpvieira bmpvieira referenced this issue May 30, 2017

Closed

Mozilla's Global Sprint (June 1st and 2nd 2017) #44

7 of 17 tasks complete

@bmpvieira bmpvieira added this to Backlog in Bionode Project Board May 30, 2017

@bmpvieira bmpvieira added the feature label May 30, 2017

@thejmazz

Member

thejmazz commented Jun 22, 2017

See the current JSON representation here. Note that nodes are logged more than once if they are children of other nodes. This JSON graph structure is probably not ideal, or at least it should be derived from another structure of the form { nodes: [], edges: [] }.
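
To illustrate the { nodes: [], edges: [] } idea, here is a sketch that flattens a nested representation into that shape while dropping duplicates. The nested input shape ({ id, children }) is an assumption made for this example, since the current JSON is only linked above, and `flattenGraph` is a hypothetical name.

```javascript
// Assumed (hypothetical) input: an array of { id: 'a', children: [ ...same shape ] }
function flattenGraph (roots) {
  const nodes = new Set()
  const edges = new Map() // keyed by source and target so repeats collapse

  const visit = (node) => {
    nodes.add(node.id)
    for (const child of node.children || []) {
      edges.set(`${node.id}\u0000${child.id}`, { source: node.id, target: child.id })
      visit(child)
    }
  }
  roots.forEach(visit)

  return { nodes: [...nodes], edges: [...edges.values()] }
}

// c is nested under both a and b, but appears once in the flat result
const flat = flattenGraph([
  { id: 'a', children: [{ id: 'c', children: [] }] },
  { id: 'b', children: [{ id: 'c', children: [] }] }
])
console.log(JSON.stringify(flat))
```

The flat result contains only arrays, strings and plain objects, so it survives JSON.stringify and could sit in redux state directly.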

It could be useful to use a more standard graph format. Ideally, the format would be:

  • serializable (so it can belong to redux state)
  • can be loaded into a nice graph api

Some links:

Even OBO could work: this could have nodes for a file, which might have an edge "created_by" and "used_by" to different task nodes

@tiagofilipe12 tiagofilipe12 self-assigned this Jun 30, 2017

@tiagofilipe12 tiagofilipe12 moved this from Backlog to In Progress in Bionode Project Board Jun 30, 2017

@tiagofilipe12

Member

tiagofilipe12 commented Jul 9, 2017

We now have this kind of structure and a simple graph visualization with d3, available at localhost:8084 when watermill is running. Though, it still lacks operationString.

tiagofilipe12 added a commit that referenced this issue Jul 11, 2017

@tiagofilipe12

Member

tiagofilipe12 commented Jul 20, 2017

Shall we close this? Of course, the graph visualization can be improved further, but for now we have a simple DAG visualization tool.

@thejmazz

Member

thejmazz commented Jul 20, 2017

Yes, let's close for now. We can always open an issue for more specific improvements.

@thejmazz thejmazz closed this Jul 20, 2017

@bmpvieira bmpvieira moved this from In Progress to Done in Bionode Project Board Aug 23, 2017
