New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add colors to represent high level layer types #7974
Conversation
Can one of the admins verify this patch? |
Discussed in the meeting today, we want to:
|
For that second part, why should someone care / what does it mean:
|
UpdateColor Schemelayer_colors = {
"DataFrameIOLayer": "purple",
"ShuffleLayer": "rose",
"SimpleShuffleLayer": "rose",
"ArrayOverlayLayer": "pink",
"BroadcastJoinLayer": "blue",
"Blockwise": "green",
"BlockwiseLayer": "green",
"BlockwiseCreateArray": "green",
"MaterializedLayer": "gray",
} Explanation
Key Points
|
Apart from colors, we could also tinker around with the outline or the node shape as well. Some examples
|
and, where do we stand on adding these colors to the HTML Reprs as well (as shown #7919 (comment)) |
Can we put "Legend: Layer types" as the legend heading? I think adding that it's about layer types will help make it clearer. The I like the double layer outline, that does draw my attention to it. The rounded edges I would be less likely to notice, for me that's less effective. I personally am not a big fan of adding more colours to the HTML reprs. I think it makes it more confusing because it's not explained what they mean, and the layer type information is included in the details dropdown for each of them already. |
Which all layers would you like to see following this? My plan:
|
I'm unsure about whether we should highlight some layer types (eg: with double outlines) given that we don't really have a clear-cut, definitive list for what is or isn't efficient. I'll let @martindurant weigh in though, since I think you may have had some previous conversations around that. |
@martindurant when you get a chance can you look over this for merging. It is Freyam's last few days of GSOC, so we'd like to get these open PRs in. |
Can we have a final demo image here, perhaps also add to the initial PR description, so people coming here get to understand what was done quickly? I'm not sure I answered regarding double outlining some nodes; I would not do it yet, and see how the colouring goes down with the community first. If you did the rounded corners, that's fine, since it's rather subtle. I expect in the long run we will come up with some shapes too, but let's let this PR do what the current title says. |
@martindurant Updated ✔️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I went through this again and I have some thoughts.
dask/highlevelgraph.py
Outdated
@@ -1271,13 +1320,52 @@ def to_graphviz( | |||
attrs.setdefault("fontsize", str(node_size)) | |||
attrs.setdefault("tooltip", str(node_tooltips)) | |||
|
|||
if color == "layer_type": | |||
layer_colors = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible future improvement: it feels like these should be (class) attributes of the layers themselves. Since colouring is not going to be default yet, it's ok to leave them here for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments (sorry) on the docs
💛 |
black dask
/flake8 dask
/isort dask
In this PR, I have added an extra attribute to the graphviz output of High-Level graphs - node fill color. Nodes are colored on the basis of their
layer_type
.This was achieved by using DOT's
fillcolor
attribute.Currently, this option is made optional to the users. They need to provide an additional kwarg while calling
dask.visualize()
withcolor="layer_type"
.Demo
c.dask.visualize(color="layer_type")
c.dask.visualize(color="layer_type")
Color Scheme
Explanation
DataFrameIOLayer
: inefficient;ShuffleLayer
,SimpleShuffleLayer
: inefficient;ArrayOverlayLayer
: inefficient;BroadcastJoinLayer
: (?);Blockwise
,BlockwiseLayer
,BlockwiseCreateArray
: efficient; easy to parallize;MaterializedLayer
: inefficient; better to materialize as late as possible;Key Points