Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: respect nested output namespaces in Process.exposed_outputs #4863

Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Apr 19, 2021

Fixes #3533

The exposed_outputs method was detecting nested namespaces in the
dictionary of outputs of a node by using the namespace separator
character of the port namespace class. However, this character, which is
the ., gets converted to __ when the link is stored in the database.
This transformation is necessary since the . is a reserved character
to enable attribute based dereferencing, e.g.

node.outputs.some_output

Since link labels can contain underscores, we use a double underscore
which, together with the guarantee that link labels don't have leading
or terminating underscores, can uniquely determine the namespaces in any
given link label.

The solution is to not manually reconstruct the namespaces in the output
dictionary but simply use the get_outgoing().nested() method which
does it for us.

@ramirezfranciscof
Copy link
Member

Does this also address #4623 ? I would think it should, or at least be intimately related. It was mentioned by @bosonie in issue #3533 but I've seen no comment regarding that.

@JPchico
Copy link
Contributor

JPchico commented Apr 19, 2021

Does this also address #4623 ? I would think it should, or at least be intimately related. It was mentioned by @bosonie in issue #3533 but I've seen no comment regarding that.

I do not think it does, since I think that node.get_outgoing(link_label_filter=name) would still fail to get the proper name as it would pick just the top level label instead of having something like namespace__port which is the kind of label that one would have to pass to link_label_filter.

@bosonie
Copy link

bosonie commented Apr 19, 2021

I do not think it does, since I think that node.get_outgoing(link_label_filter=name) would still fail to get the proper name as it would pick just the top level label instead of having something like namespace__port which is the kind of label that one would have to pass to link_label_filter.

Correct

@sphuber sphuber force-pushed the fix/3533/exposed-outputs-nested-namespaces branch from 8c9a87e to 6867517 Compare April 19, 2021 16:00
@codecov
Copy link

codecov bot commented Apr 19, 2021

Codecov Report

Merging #4863 (9668717) into develop (f1aab1f) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #4863   +/-   ##
========================================
  Coverage    79.62%   79.62%           
========================================
  Files          519      519           
  Lines        37095    37095           
========================================
  Hits         29532    29532           
  Misses        7563     7563           
Flag Coverage Δ
django 74.36% <100.00%> (ø)
sqlalchemy 73.24% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/engine/processes/process.py 91.73% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f1aab1f...9668717. Read the comment docs.

The `exposed_outputs` method was detecting nested namespaces in the
dictionary of outputs of a node by using the namespace separator
character of the port namespace class. However, this character, which is
the `.`, gets converted to `__` when the link is stored in the database.
This transformation is necessary since the `.` is a reserved character
to enable attribute based dereferencing, e.g.

    node.outputs.some_output

Since link labels can contain underscores, we use a double underscore
which, together with the guarantee that link labels don't have leading
or terminating underscores, can uniquely determine the namespaces in any
given link label.

The solution is to not manually reconstruct the namespaces in the output
dictionary but simply use the `get_outgoing().nested()` method which
does it for us.
@sphuber
Copy link
Contributor Author

sphuber commented Apr 26, 2021

@ramirezfranciscof could you review this today. This needs to be released in 1.6.2 a.s.a.p as it is blocking some users.

Copy link
Member

@ramirezfranciscof ramirezfranciscof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, LGTM!

BTW: I'm a bit curious about having tests for this in test_process.py and test_work_chain.py. It would seem that the main difference is that the first calls the methods directly (tests that the methods work), whereas the second goes through the engine running system (i.e. tests that these methods are actually called during run).

If this is so, perhaps the name / location of the tests is not very indicative of their purpose and it would be good to open an issue to at some point deal with that. Am I missing something else @sphuber ?

@ramirezfranciscof ramirezfranciscof self-assigned this Apr 26, 2021
@sphuber
Copy link
Contributor Author

sphuber commented Apr 26, 2021

Thanks for the quick review @ramirezfranciscof

BTW: I'm a bit curious about having tests for this in test_process.py and test_work_chain.py. It would seem that the main difference is that the first calls the methods directly (tests that the methods work), whereas the second goes through the engine running system (i.e. tests that these methods are actually called during run).

Well, the exposed_outputs is defined on the Process class and so I added the tests in the corresponding unit test. That being said, exposing outputs really only makes sense for workflow type processes because typically you want to expose the ports of a subprocess that you want to call, which is only something that workflows can do. Calculation processes therefore don't really have a use. But that is a different question really. I think that unit tests for a particular method should be done in the file of the respective class. If we move the method to another (sub)class, then the unit tests move as well.

@sphuber sphuber merged commit 42455ac into aiidateam:develop Apr 26, 2021
@sphuber sphuber deleted the fix/3533/exposed-outputs-nested-namespaces branch April 26, 2021 09:21
@ramirezfranciscof
Copy link
Member

Well, the exposed_outputs is defined on the Process class and so I added the tests in the corresponding unit test. That being said, exposing outputs really only makes sense for workflow type processes because typically you want to expose the ports of a subprocess that you want to call, which is only something that workflows can do. Calculation processes therefore don't really have a use. But that is a different question really. I think that unit tests for a particular method should be done in the file of the respective class. If we move the method to another (sub)class, then the unit tests move as well.

Mmm ok, I think we are saying a similar thing. Let me reformulate maybe: I agree with the following points...

  • Since exposed_outputs is defined on the Process class then the tests for the methods should be on test_process.py (I'm not sure why it has to be defined there instead of in the workchain subclass, but anyways I'm ok with that, it was not the issue).
  • It is a good idea to test that the engine is calling this method and setting the right thing when we run or submit. And since it only makes sense to do this with a workchain, this should be what gets built.

My point was more questioning if "integration" test should be on test_work_chain.py. Now that I take a second look, I see that all (or at least most) of the tests in test_work_chain.py are done through run calls to the engine, which is the same way we seem to be testing all specific processes (test_calc_job.py, test_calcfunction.py, test_workfunction.py).

Now, I think I still have the question of why don't we test more directly (without going through the engine) the methods of specific subclasses of Process. And also perhaps if we should make it more clear where should the test for direct methods go, and where should the "integration" ones that go through the engine go.

@sphuber
Copy link
Contributor Author

sphuber commented Apr 26, 2021

I don't think that there should be a distinction whether integration tests go into test_process or test_work_chain. They both should contain unit tests. Now, writing pure unit tests for these complex classes are tricky and often would require a lot of mocking. We are simply using the full execution to do the mocking for us. You can argue that this makes it an integration test, but I don't think it matters too much. I think the idea should be that whenever reasonably possible we should attempt to isolate the direct interface that we are testing and just mock the required state, but in the case of processes, this can be so difficult that simply running them may be a viable solution as well.

@mbercx
Copy link
Member

mbercx commented Jul 5, 2021

While trying to add a note on tab-completion for the outputs for the tutorial, I ran into the following deprecation warning:

In [1]: cj_node = load_node(311)
/opt/conda/lib/python3.7/site-packages/aiida/orm/utils/managers.py:98: AiidaDeprecationWarning: dereferencing nodes with links containing double underscores is deprecated, simply replace the double underscores with a single dot instead. For example:
`self.inputs.some__label` can be written as `self.inputs.some.label` instead.
Support for double underscores will be removed in the future.
  'Support for double underscores will be removed in the future.', AiidaDeprecationWarning
In [2]: cj_node.outputs. <TAB>

The deprecation warning prints when I try to tab-complete the outputs. @ramirezfranciscof guided me to this PR. I didn't look into much detail what is responsible, but if I revert back to aiida-core==1.6.1, before this change was released, I no longer have this problem.

@sphuber
Copy link
Contributor Author

sphuber commented Jul 5, 2021

This is not the corresponding PR, but #4625 is. I am bit surprised that the deprecation warning triggers already at tab completion though. It should only trigger when actually retrieving a node through a label containing double underscores. So I am confused how this is happening. Still, does the tab-completion still work, despite the warning?

@mbercx
Copy link
Member

mbercx commented Jul 5, 2021

This is not the corresponding PR, but #4625 is. I am bit surprised that the deprecation warning triggers already at tab completion though. It should only trigger when actually retrieving a node through a label containing double underscores. So I am confused how this is happening. Still, does the tab-completion still work, despite the warning?

Yes, but it prints every time the tab-completion is attempted. 😅

@unkcpz
Copy link
Member

unkcpz commented Jul 6, 2021

By setting a traceback right before the deprecate waring I got the following information:

Traceback (most recent call last):                                                                                      
  File "/home/unkcpz/miniconda3/envs/aiida-core-dev/lib/python3.8/site-packages/jedi/cache.py", line 110, in wrapper
    return dct[key]
KeyError: ((), frozenset())  

During handling of the above exception, another exception occurred:
                              
Traceback (most recent call last):
  File "/home/unkcpz/Projects/python-code/aiida_core/aiida/orm/utils/managers.py", line 88, in _get_node_by_link_label
    node = attribute_dict[label]                                                                                        
KeyError: '__wrapped__'

I have no idea where this __wrapped__ key come from, but escape it solve the issue.

@sphuber
Copy link
Contributor Author

sphuber commented Jul 6, 2021

The __wrapped__ dunder method is added when a function is wrapped by a decorator and this returns the original wrapped function (see Python docs). The wrapping seems to come from this jedi cache that is present. I have no idea where this is coming from though and who is asking to cache this stuff. The problem is though that in our code, we warn whenever the label contains a double underscore, which is the case for dunder methods, but we should ignore those. I split this issue off to #5010 and am submitting a PR to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Process.exposed_outputs does not work with nested namespaces
6 participants