Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cli): add renku workflow visualize #2372

Merged
merged 14 commits into from Oct 12, 2021
Merged

Conversation

Panaetius
Copy link
Member

@Panaetius Panaetius commented Sep 27, 2021

adds new command renku workflow visualize with options:
: limit activity graph to activity chains ending in
--from : limit activity graph to activity chains starting at (allows multiple)
--columns: Columns to show in graph, valid columns are command, id, date
--hide-files: only show activities, not inputs/outputs
--ascii: don't use unicode box characters for display, only ascii characters
--no-color: don't colorize output
--no-pager: don't use less pager, print to stdout directly
--interactive: Enter interactive mode, use arrow keys to navigate, press enter to show details of highlighted activity

For help, see renku workflow visualize --help.
Example usage

$ renku run echo 'A' > input
$ renku run cp input output
$ renku workflow visualize output

closes #2131

@Panaetius Panaetius force-pushed the 2131-workflow-visualize branch 5 times, most recently from f7abb2c to b1e4ad9 Compare September 28, 2021 17:04
@rokroskar
Copy link
Member

As much as I hate to admit it, this is pretty cool 😆

A couple of things I noticed:

  • make -c command the default (i.e. don't show the id)
  • the colors change between each set of nodes - ideally the same color would be used for an entire step
  • maybe --no-pager should be the default... or better yet, we should detect when the output will exceed the size of the terminal and use a pager.
  • would be nice to visualize workflows by name not just by path of output files

@Panaetius Panaetius force-pushed the 2131-workflow-visualize branch 2 times, most recently from 331f75b to c8755a3 Compare September 29, 2021 07:10
@Panaetius
Copy link
Member Author

Panaetius commented Sep 29, 2021

make -c command the default (i.e. don't show the id)

👍

* the colors change between each set of nodes - ideally the same color would be used for an entire step

What do you mean they change between each set of nodes? Each edge gets a unique color so there's no confusion if there's overlapping edges. Or you mean the colors change between executions of the visualize command? That I could fix.

* maybe `--no-pager` should be the default... or better yet, we should detect when the output will exceed the size of the terminal and use a pager.

That's a good idea.

* would be nice to visualize workflows by name not just by path of output files

This command visualizes activities, which don't have names. The only place where visualizing a Plan (which has a name) would make sense is for CompositePlans, but those don't have Activities.

@Panaetius Panaetius force-pushed the 2131-workflow-visualize branch 2 times, most recently from 73cd33a to 8df41b9 Compare September 29, 2021 07:44
@rokroskar
Copy link
Member

rokroskar commented Sep 29, 2021

Thanks @Panaetius!

  • colors changing: let me try to clarify - if I run renku workflow run wc hello.txt > hello.wc the step includes the input file hello.txt, the output file hello.wc and the execution node of the command. Right now, the edge hello.txt --> execution node is a different color than the edge execution node --> hello.wc. I think they should be the same color.
  • visualizing plans: I understand that this is currently visualizing activities, but if I can do renku run --name step1 I would expect to be able to do renku workflow visualize step1. Similarly (and perhaps especially) if I do renku workflow compose my-workflow step1 step2 I definitely want to do renku workflow visualize my-workflow.

One final point - this seems like a fantastic starting point to have a less obtuse way to create pngs as well. We should probably make a follow-up issue to replace the old rdf2dot functionality? Or do you want to use some other library?

@Panaetius
Copy link
Member Author

Yes, I originally wanted to add dot output to this PR but then decided a follow-up issue is better, just haven't created it yet. I added one now: #2376

For colors, well, it's difficult, since the File nodes and Activity nodes are just independent nodes after layout/when coloring is done. And the edges are independent of the nodes at this point (they just have the correct coordinates so printing them works out). It also makes assigning colors more difficult, as currently I can just check if an edge overlaps and has the same color, then assign a different color, but with nodes I would have to check if any edges of a node overlap with any edges of another node and change the color of the edges for the node, which could cause a conflict with some other node and edge that didn't have a conflict before.
So I'd say it's at least half a day of extra work, if not more.

visualizing plans: This uses mostly the same logic as rerun, it just gets activity chains between paths and at no point are Plans involved in the calculation of the graph (other then for getting the executed command for display purposes).
We could get Plans, map them to activities and then layout that. For composite plans we could get the contained basic Plans and then do the same, but this might produce unexpected results if the composite plan was never executed (the graph would not respect the linking rules set on the composite). Also, just passing the name instead of a path would probably not work, as there could be ambiguities (If I have a file named run1 and a plan named run1). So the question in case of an unexecuted Composite plan is, what do you want to see? The layout as enforced by the Composite or the layout as produced by linking activities associated with plans contained in the Composite? echo 'test' > A and cp B C plans, linked in a composite by joining A->B, would show two disconnected nodes if the Composite was never executed. Otherwise we would suddenly have different semantics, not "Showing a graph of what was executed" but "Showing a graph of what would happen if this was executed"

@rokroskar
Copy link
Member

rokroskar commented Sep 29, 2021

(edited)

re: colors - it would be fine if we do it as a follow-up, but I think on bigger visualizations showing all edges of a step in the same color will make it much easier to parse what's going on.

re: plans vs activities

renku workflow
Usage: renku workflow [OPTIONS] COMMAND [ARGS]...

  Workflow commands.

Options:
  -h, --help  Show this message and exit.

Commands:
  compose    Create a composite workflow consisting of multiple steps.
  edit       Edit workflow details.
  execute    Execute a given workflow.
  export     Export workflow.
  inputs     Show all inputs used by workflows.
  ls         List or manage workflows with subcommands.
  outputs    Show all outputs generated by workflows.
  remove     Remove a workflow named <name>.
  show       Show details for workflow <name_or_id>.
  visualize  Graph visualization of a workflow pipeline.

afaict none of the commands except for visualize here deal with activities rather than plans. So if visualize will show provenance then we should probably not make that the default.

As for your question about what I would expect in the composite plan - I think in your example it's clear that the user would want to see the two steps linked, i.e. flow from A --> B irrespective if this had been executed or not (indeed, the workflow commands are for assembling/dreaming up workflows)

@rokroskar
Copy link
Member

maybe it's just a matter of the current visualize implementation being in the wrong place? should it be under renku graph?

@Panaetius
Copy link
Member Author

Panaetius commented Sep 29, 2021

afaict none of the commands except for visualize here deal with activities rather than plans

visualize, inputs, outputs deal with activities.

But maybe you're right and it's in the wrong place. We also have renku log that (at least for now) also only deals in activities but is somewhere else entirely. The reason it's all under workflow is that users, especially beginners, might not know the distinction between activities and plans, and so having something like renku workflow ... and renku activity ... might be confusing. Also renku graph is intended for knowledge graph (i.e. JSON-LD) stuff.

I agree the current way of organizing the commands isn't the best, but I'm not sure what the best way would be in terms of the trade-off between complexity and being clear.

And follow-up for the colors: #2377

@Panaetius Panaetius force-pushed the 2131-workflow-visualize branch 2 times, most recently from e409d07 to 32cec04 Compare September 29, 2021 10:10
@Panaetius Panaetius force-pushed the 2131-workflow-visualize branch 7 times, most recently from 2b2a16a to e2b57d6 Compare September 29, 2021 14:30
@Panaetius Panaetius marked this pull request as ready for review September 29, 2021 14:30
@Panaetius
Copy link
Member Author

From my point of view, this is working pretty well now, modulo #2380. A couple more questions:

* One little nit-picky thing - usually `-h` and `--help` both do the same thing and I often use the former when wanting a quick look at the options. In this command `-h` stands for something else - could we come up with a different naming there?

* Should calling `renku workflow visualize` without an argument show the help message instead of the workflows for all the outputs? Otherwise we should include in the doc string a brief description of what the default behavior is.

* Can we add an option to specify a revision range? I'd like to see the workflow that produced a previous version of a file

* Maybe the command description could be more specific: "Visualization of workflows that produced outputs at the specified paths. (default: show workflows of all current outputs)"

@rokroskar all done. It now shows help by default.

@rokroskar
Copy link
Member

Great thanks @Panaetius - I think -x/--exclude-files is also much better.

@rokroskar
Copy link
Member

I've tried this out now also with the --revision flag and it seems to work very well, thanks for adding that! Would be great if @gavin-k-lee or @TaoSunVoyage might have some time to give it a spin before we merge and if @mohammad-sdsc or @vigsterkr can review the code. This is a really great new addition!

@gavin-k-lee
Copy link

gavin-k-lee commented Oct 5, 2021

Hi! This is really cool! I ran into some errors but I think they're due to space issues on my screen (however my monitor is quite big, 26in).

I'm running renku workflow visualize on this project https://renkulab.io/gitlab/lee.gavin.k/advanced-tutorial on my local laptop.

I was just running random things so if the things I ran weren't expected, ignore them.

(base) gavin@gavin-XPS-13-9310:~/Repositories/advanced-tutorial$ renku workflow visualize --no-color -x -i
Ahhhhhhhh! You have found a bug. 🐞

1. Open an issue by typing "open";
2. Print human-readable information by typing "print";
3. See the full traceback without submitting details (default: "ignore").

Please select an action by typing its name (open, print, ignore) [ignore]: print
## Describe the bug
A clear and concise description.

## Details
*Please verify and redact the details.*

**Renku version:** 1.0.0rc2.dev10+gad59f780
**OS:** Linux (#47-Ubuntu SMP Wed Aug 18 10:41:03 UTC 2021)
**Python:** 3.8.5

### Traceback

Traceback (most recent call last):
File "[...]/renku/cli/exception_handler.py", line 123, in main
result = super().main(*args, **kwargs)
File "[...]/renku/cli/exception_handler.py", line 89, in main
return super().main(*args, **kwargs)
File "[...]/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "[...]/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "[...]/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "[...]/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "[...]/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "[...]/renku/cli/workflow.py", line 883, in visualize
viewer.run()
File "[...]/renku/cli/utils/curses.py", line 322, in run
curses.wrapper(self._main)
File "[...]/curses/init.py", line 105, in wrapper
return func(stdscr, *args, **kwds)
File "[...]/renku/cli/utils/curses.py", line 317, in _main
self._init_curses(screen)
File "[...]/renku/cli/utils/curses.py", line 87, in _init_curses
self._addstr_with_color_codes(self.content_pad, i, 0, l)
File "[...]/renku/cli/utils/curses.py", line 132, in _addstr_with_color_codes
window.addstr(y, x, snippet, color)
_curses.error: addwstr() returned ERR


## Additional context
Add any other context about the problem.

I got the same error with all the following commands:

renku workflow visualize -i outputs/report.txt
renku workflow visualize -i --from outputs/healthcare-dataset-stroke-data_females.txt

It might have to do with mixing -x -i flags? I'm not 100% sure.

vigsterkr
vigsterkr previously approved these changes Oct 6, 2021
Copy link
Contributor

@vigsterkr vigsterkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀
it felt like reviewing an old DOS game rather than renku code 🙃

@vigsterkr
Copy link
Contributor

@gavin-k-lee what was your terminal size....?

renku/cli/utils/curses.py Show resolved Hide resolved
renku/cli/utils/curses.py Show resolved Hide resolved
@gavin-k-lee
Copy link

FWIW:

(base) gavin@gavin-XPS-13-9310:~/Repositories/adv-tutorial$ stty size
55 203

@Panaetius
Copy link
Member Author

Should be fixed now. @gavin-k-lee

vigsterkr
vigsterkr previously approved these changes Oct 8, 2021
gavin-k-lee
gavin-k-lee previously approved these changes Oct 8, 2021
Copy link

@gavin-k-lee gavin-k-lee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! The aforementioned bugs seem to be resolved. I especially like the 'full command' info in the interactive mode because that's something the commit messages when doing renku commands have necessarily needed to be cut short.

@Panaetius Panaetius dismissed stale reviews from gavin-k-lee and vigsterkr via a9832ed October 8, 2021 07:59
@Panaetius
Copy link
Member Author

Looks great! The aforementioned bugs seem to be resolved. I especially like the 'full command' info in the interactive mode because that's something the commit messages when doing renku commands have necessarily needed to be cut short.

It'll still be cut off if the command doesn't fit in the popup window. We might want to add scrolling to the popup as well but the UX around that isn't clear to me. I think for now it's good enough

@Panaetius Panaetius enabled auto-merge (squash) October 8, 2021 16:33
Copy link
Contributor

@vigsterkr vigsterkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Panaetius Panaetius merged commit 3a2c35d into master Oct 12, 2021
@Panaetius Panaetius deleted the 2131-workflow-visualize branch October 12, 2021 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add renku workflow visualize command
4 participants