-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provenance graph visualization enhancement #4081
Provenance graph visualization enhancement #4081
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #4081 +/- ##
===========================================
+ Coverage 78.85% 78.86% +0.01%
===========================================
Files 467 467
Lines 34468 34481 +13
===========================================
+ Hits 27178 27190 +12
- Misses 7290 7291 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @unkcpz , thanks for the contribution! I made some specific change requests below, let me know what you think. I also have two more general comments:
(1) This being a change that involves a change in how a visual output is produced, it could have been helpful if you provided some visual aids to see this difference in your OP (perhaps an examples of a graph before and after the modifs). Or maybe also as cript that creates a DB and prints the plot. Anyways, just to take into consideration for the future.
(2) We are currently undergoing a complete re-structuring of our documentation, but perhaps it would be worthwhile to add a few lines about how to use this feature in the graph
documentation. You could open an issue so we don't forget.
I also have a general issue to point out that may involve some changes, but would first require a little bit of discussion or feedback from other core-developers, so I'm going to just make a comment on the PR after sending this more concrete review.
aiida/cmdline/commands/cmd_node.py
Outdated
@click.option( | ||
'--target-cls', help='Only show nodes of target node class specified by class name', type=click.STRING, default='' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) Perhaps the name target_cls
could be a bit more descriptive, specially since people could think "target" refers to the node being selected as origin node for the graph. Perhaps it would be better to use something along the lines of highlight_class
or something similar.
(2) I would say that, unless you have a reason to use the empty string, it would be better to use None
as the default for non-provided inputs.
(3) Would there be a way to add into the help string information so the user knows what is the string that needs to be provided here? As it stands right now its not very clear how I would identify the class that I want to highlight (do I use "FolderData"? "data.folder.FolderData"?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be better to use something along the lines of highlight_class or something similar.
Yes, target-cls
seems a little ambiguous. highlight_class
is better, will change to it. But I think highlight-class is to long as a cmd option, so I add -c
(short for 'class') for this.
it would be better to use None as the default for non-provided inputs.
I keep it a empty string here to make the variable assignment simple, otherwise getattr(orm, target_cls, None)
will complain about the type of target_cls
.
The simple assignment code getattr(orm, highlight_class, None)
for highlight_class will be:
if highlight_class:
highlight_class = getattr(orm, highlight_class)
else:
highlight_class = None
If the specification in the code base is using None for the situation, I'll adopt your suggestion. Please let me know about your preference.
Would there be a way to add into the help string information so the user knows what is the string that needs to be provided here?
I assume all builtin data class can be import by orm.<className>
for example, orm.FolderData
is equivalent to orm.data.folder.FolderData
. Is that true for all the builtin data class and will this alias of data type remains invariable in future API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If fact, the argument highlight_class
here can be a tuple of data type class. But I don't know how to implement it in cmdline interface. Maybe it can be realized by using multiple=True
option of click
. Are there other places in the code base multiple=True
be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, I see what you mean. Notice however that if highlight_class
is None
by default, then you don't really need the else
clause of that conditional, right? Just the following should be enough:
if highlight_class:
highlight_class = getattr(orm, highlight_class, None)
Regarding the multiple=True
option, from what I could find this is to be able to pass multiple times the flag (so --highlight_class class1 --highlight_class class2
), does it also allow you to pass a single list? (--highlight_class=[class1, class2]
). This indeed would be useful versatility.
aiida/cmdline/commands/cmd_node.py
Outdated
origin_style_override = {'color': 'red', 'penwidth': 6} | ||
graph.recurse_ancestors( | ||
root_node, | ||
depth=ancestor_depth, | ||
link_types=link_types, | ||
annotate_links='both', | ||
origin_style=origin_style_override, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this doesn't seem to be an option right now, we might eventually allow the user to choose his own style for the origin/root node (or maybe the whole palette), in which case it is perhaps not a good idea to override this in here. Why not set this as the default origin_style
inside graph.recurse_ancestors
? (and descendants
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I put it here since I assume its not an option for user in cmdline interface, and I am trying to keep the internal code as much as possible, so that not to affect the default functionality. I have same consideration previously but I personally prefer the present scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding new features, this is indeed a good consideration, but take into account that the origin_style
highlight is actually a current issue-bug in the code, as you noted in your OP. So I would say it makes sense that the fix for this is internal to the code, and since the fix is making this the default behaviour, so why not make it the default origin_style
?
@@ -183,6 +183,61 @@ def test_graph_recurse_ancestors(self): | |||
(nodes.pd1.pk, nodes.wc1.pk, LinkPair(LinkType.INPUT_WORK, 'input2'))]) | |||
) | |||
|
|||
def test_graph_recurse_spot_target_cls(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this test is very nicely structured, but there is perhaps a subtlety: it is checking for the equality with a complete hardcoded graph which includes very specific formatting stuff (for example, the exact colors of the default palette of the graph generator). This makes it a bit more cumbersome to maintain.
Since most of the functionality should already be covered in the other tests, I would think that here you only need to check that the highlighted nodes are different. Would it perhaps be possible to run the meethod twice, with and without the "highlighted class" option and compare the specific places where they have to be similar or different (and in this second case just check that they are not the same)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! modified to only test for the diff.
aiida/tools/visualization/graph.py
Outdated
if target_cls and not isinstance(traversed_node, target_cls): | ||
style_override = self._ignre_node_override_style | ||
self.add_node(traversed_node, style_override=style_override) | ||
else: | ||
self.add_node(traversed_node, style_override={}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) So, maybe I would just call this _ignored_node_style
? (writing "ignored" fully for clarity and removing "override" since this is just a style, the action of overriding happens when you pass it to add_node
in the corresponding parameter).
(2) Also, notice that the default for style_override
is None
. Although currently passing the empty dict has the same effect, it is conceivable (though I admit a bit unlikely) that this could be re-purposed in the future. Just for consistency and to be on the safe side I would advise to use None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the typo here 😅 . Changed as your advise.
aiida/tools/visualization/graph.py
Outdated
override_styles_dict = { # pylint: disable=invalid-name | ||
'ignore_node': { | ||
'color': 'lightgray', | ||
'fillcolor': 'white', | ||
'penwidth': 2 | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure about the idea of having a global dict
for just a single case, than then is stored as an internal variable of the class, that then is used in two ocassions. It feels like one too many layers of generalization with something that is still not clear HOW should be generalized (i.e. maybe in the future we need the ignored nodes to be different for each class, maybe a lighter version of the current defaults, and so we need to convert this into a function or even an optional flag in the current implementation).
Did you have any particular reason why you implemented it this way? If not I would maybe get rid of this global dict and just have the other line (see other comment for name change resoning):
self._ignored_node_style = {
'color': 'lightgray',
'fillcolor': 'white',
'penwidth': 2
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep it as a global variable under the *_node_styles
. So all styles are set in the one place. I admit that using a dict rather that a function as other style setting codes is a bit confusing.
I add another override style for origin_node
, and all override styles can be collected here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is good to keep a dict constant variable here. but seems it is a reason here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having two entries would justiy a bit more the existence of the global variable. I still think this is premature generalization, but I guess if you prefer to keep it like this there is no strong reason to insist in changing it, so ¯\_(ツ)_/¯
.
Thanks for review @ramirezfranciscof . The effect of the implementation of the feature: happy to get more feedback. 😄 |
4edef73
to
c622396
Compare
Ok, so, one question to @unkcpz : when you were designing this, what did you have in mind/expected that the user would be able to do when they wanted to highlight processes? Right you the behaviour is that of the first case, and as @ltalirz promptly identified, many users will probably be more comfortable identifying the nodes with their more specific "sub-class" names (like they appear in the graphs) as in the second example. If you agree with this, then maybe we need to change a bit the behavior of this implementation and instead of passing the class like this:
You need to just pass to
which leaves us with two ugly options:
Of these two, I think the best is the second one, or actually an improved version of the second one: to take this logic of assigning labels and put it in an external function (similar to In any case, this makes me wonder if maybe this points to a deeper structural issue and we should actually not be making these decisions inside |
I'll probably get round to looking at this properly next week. But note that you should make sure that the documentation is also updated here: https://github.com/aiidateam/aiida-core/tree/develop/docs/source/working_with_aiida/visualising_graphs (I'm not sure where this is moving to in https://github.com/aiidateam/aiida-core/projects/18?) |
Hi @unkcpz - while I'll leave most of the review to @ramirezfranciscof , there is one important request I need to make:
Users will expect this to work with those labels just as well. |
aiida/cmdline/commands/cmd_node.py
Outdated
@click.option( | ||
'-c', | ||
'--highlight-class', | ||
help="Only colored nodes of specific datatype identified by its class name ('StructureData', 'FolderData', etc.).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
help="Only colored nodes of specific datatype identified by its class name ('StructureData', 'FolderData', etc.).", | |
help="Only color nodes of specific class label (as displayed in the graph, e.g. 'StructureData', 'FolderData', etc.).", |
@ramirezfranciscof Thanks for your feedback. To be honest, I did not thought about how to deal with the Process type at first. I couldn't agree more that name shown in the graph is much reasonable to be chosen as the filter option. And as @ltalirz suggesting, comparing strings from CLI with label in graph would be a better design and be more versatile.
with exported data from https://github.com/aiidateam/aiida-export-migration-tests/blob/master/aiida-export-migration-tests/archives/export_v0.8_manual.aiida will generate graph as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @unkcpz ! Looking almost ready, just a couple more modifications and a question and we should be ready to go.
aiida/tools/visualization/graph.py
Outdated
sublabel_text = node_sublabel_func(node) | ||
if sublabel_text: | ||
label.append(sublabel_text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, why did you change this? I mean, the previous variable is still called label
, not label_text
. I would change this back unless you had a reason for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use 'text' as suffix to distinguish the type of these variable from the type of label
where label
is a list
here, so I think it might be more readable to name the inside text of label as label_text and sublabel_text. And that is why I named the isolated function _get_node_label_text
rather than _get_node_label
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. This is actually a good reason. I would ideally prefer to clarify this the other way around (I would expect the sublabel
to be a string/text, so I would change the other one to label_list
), but it is ok if you want to leave it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I'll adopt your advice.
aiida/tools/visualization/graph.py
Outdated
if not origin_style: | ||
origin_style = self._origin_node_style |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you doing this instead of setting the default for the function? If the caller wants to pass "None" as the origin node style (maybe he wants to map a whole workflow without highlighting any starting point) this would override that choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.... I did this because pylint complain about using dict as default value. Maybe user should pass {}
as the orgin style to achieve the purpose you mentioned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, I see, this is because it is not a good idea to use mutables as defaults. I think if you wrap your dictionary around dict( your_dict )
that should do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use types.MappingProxyType({})
as a default value.
The reason pylint is complaining here is that having mutable defaults is dangerous - any mutation to the default object will affect subsequent function calls. The MappingProxyType
wraps a dict in an immutable proxy. Since no one holds a "normal" reference to that dict (it's constructed right there), it's effectively immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you wrap your dictionary around
dict( your_dict )
that should do the trick.
@greschd Isn't this enought?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, no idea.. MappingProxyType
was added to types
in Python 3.3.
@unkcpz is the pylint
version you've got installed locally exactly the same as what's in the aiida-core
setup.json
?
I'd be tempted to just slap a # pylint: disable=no-name-in-module,useless-suppression
on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it could be that aiida/cmdline/params/types
is shadowing the types
module, or pylint
just doesn't understand that that is not a top-level module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the pylint
version is same, also version of packages in "dev_precommit". @greschd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@unkcpz We can't seem to figure it out. For now, I would say just add the pylint: disable
and open up an issue with this problem once the PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. @ramirezfranciscof
got_diff = ''.join([l for l in diff if l.startswith('+Exit')]) | ||
|
||
expected_diff = """\ | ||
+Exit Code: 200" color=lightgray fillcolor=white penwidth=2 shape=rectangle style=filled]""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why are you only checking this difference in the "CalcFunctionNode" if you are also changing the "WorkChainNode", the "RemoteData" and the "FolderData"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I test one condition here to make sure the changes works. As you say, better to make a complete contrast. Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that it makes more sense to check all the expected changes.
aiida/tools/visualization/graph.py
Outdated
@@ -187,6 +190,19 @@ def pstate_node_styles(node): | |||
return node_style | |||
|
|||
|
|||
override_styles_dict = { # pylint: disable=invalid-name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason pylint
complains here is that PEP8 suggests constants to be named in all caps (OVERRIDE_STYLES_DICT
).
Since we're generally following that guideline (see e.g. hashing.py), I'd suggest renaming the variable and dropping the pylint: disable
statement here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Here I use lowercase to makes it look similar to other functions that do style collecting. Again, it seems unnecessary. 😿
Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, just nitpicking here 🙂
On second thought, do we want this constant to be used outside this module, or should it remain an implementation detail here? If it's the latter, we should probably add a leading underscore, and remove it from the __all__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it might be used outside this module just as other style setting functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll let @ramirezfranciscof decide that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I don't think this would be used in any other part of the aiida-core, if that is what you meant. If you intended this to be exposed to users so that they can "pick" the style from this dict and pass that to the recurse_whatever
functions when manually using them, I guess it is ok to expose it, but I personally doubt that this dict
is going to get much bigger or that people are going to use it to customize their plots. Personally I would err on the side of caution and not expose it unless we really think it will be used this way, but I will leave this up to you @unkcpz .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it in first place since I expose and use it in verdi_graph
. I agree that there is no chance that others would reuse this dict.
Erased it from __all__
and add a leading underscore.
88ab225
to
fdc8530
Compare
Hey @chrisjsewell , you wanted to check this out? We are almost ready to merge. |
d4eb3e2
to
0268769
Compare
the argument highlight_classes is added in method `recurse_ancestor` and `recurse_descendants` in Provenance Graph visualization for the purpose of quickly spotting the desire node. The target classes to be highlight are passed to the argument as a class name string, it was compared with the label of node. So the node label getting function is isolated to be reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I see you even switched the sublabel_text
/label_list
naming, nice touch, appreciated ;)
@ramirezfranciscof before this is merged can we should check that https://github.com/aiidateam/aiida-core/blob/develop/docs/source/working_with_aiida/visualising_graphs/visualising_graphs.ipynb still produces the same outcome: https://aiida.readthedocs.io/projects/aiida-core/en/latest/working_with_aiida/visualising_graphs/visualising_graphs.html |
@chrisjsewell Sure there are some small changes in graph outcomes. Since the origin_style to highlight the origin node is set to default. |
@unkcpz I can do this later today once |
Ok added this feature to documentation, so good to go 👍 https://206-77234579-gh.circle-artifacts.com/0/html/howto/visualising_graphs/visualising_graphs.html |
Thanks @unkcpz and @chrisjsewell ! |
Thanks for reviewing! |
Fixes #3718
In this PR, features of
verdi node graph generate
is extended.I am trying to find a way, in order to quickly spot nodes of specific node type from a large provenance graph, in my user scenario.
Provenance graph provide an easy way to quickly find the ancestor or descendants nodes, however, when the workflow is complex it is often not easy to scroll the image to find the desire node. I add a argument
target_cls
to recurse functions so that the generated graph will more recognizable with expected nodes.