Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process: Update outputs before updating node process state #5813

Merged
merged 2 commits into from
Dec 5, 2022

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Dec 2, 2022

Fixes #5812

In a7be795a26aea9fdff1d6f762582f1b77801ff3d the order of the two
method calls in Process.update_node_state was reversed. The idea was
that the update_outputs call can except, for example if a Process
implementation adds an invalid output by calling Process.out. In this
case the process will go into the excepted state and the process state
would sometimes not be updated on the node. By reversing the two actions
the process state was hoped to be more accurate in these cases.

However, since this change, the nightly workflow started failing where
nested workchains would except because a subprocess didn't have an
output that they expected. It is not clear at all what the relation is
but it seems that reverting the change described above resolves the
problem. Unfortunately, the bug does not seem to be deterministic and
doesn't always manifest.

Here, we reverse the order to the original state, updating the outputs
first, but in order to guarantee that the process state is always
updated, even if the output update excepts, it is called in the finally
clause of the try-except in which the update_outputs is wrapped.

@sphuber sphuber force-pushed the fix/nightly branch 8 times, most recently from 424020b to d671488 Compare December 2, 2022 17:38
@sphuber sphuber force-pushed the fix/nightly branch 2 times, most recently from e91e0c0 to 508970d Compare December 3, 2022 13:25
In `a7be795a26aea9fdff1d6f762582f1b77801ff3d` the order of the two
method calls in `Process.update_node_state` was reversed. The idea was
that the `update_outputs` call can except, for example if a `Process`
implementation adds an invalid output by calling `Process.out`. In this
case the process will go into the excepted state and the process state
would sometimes not be updated on the node. By reversing the two actions
the process state was hoped to be more accurate in these cases.

However, since this change, the `nightly` workflow started failing where
nested workchains would except because a subprocess didn't have an
output that they expected. It is not clear at all what the relation is
but it seems that reverting the change described above resolves the
problem. Unfortunately, the bug does not seem to be deterministic and
doesn't always manifest.

Here, we reverse the order to the original state, updating the outputs
first, but in order to guarantee that the process state is always
updated, even if the output update excepts, it is called in the finally
clause of the try-except in which the `update_outputs` is wrapped.
@sphuber sphuber changed the title Tests: Fix the nightly workflow Process: Update outputs before updating node process state Dec 3, 2022
@unkcpz
Copy link
Member

unkcpz commented Dec 5, 2022

Hi @sphuber, I don't have clue why it failed. But since it happened on NestedWorkChain, the error may able to be reproduced by a unit test with daemon fixture?

@sphuber
Copy link
Contributor Author

sphuber commented Dec 5, 2022

Hi @sphuber, I don't have clue why it failed. But since it happened on NestedWorkChain, the error may able to be reproduced by a unit test with daemon fixture?

The problem is that the bug is not deterministic. In many of the runs that failed, a majority of the workchains worked just fine, and it was a random one that failed. That's why adding an explicit unit test for this would be pointless I think.

@@ -406,7 +406,20 @@ def on_entered(self, from_state: Optional[plumpy.process_states.State]) -> None:
"""After entering a new state, save a checkpoint and update the latest process state change timestamp."""
# pylint: disable=cyclic-import
from aiida.engine.utils import set_process_state_change_timestamp
self.update_node_state(self._state)

# For reasons unknown, it is important to update the outputs first, before doing anything else, otherwise there
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should be turned into an issue, for someone to figure out later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nightly tests failing
3 participants