Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #722 issue affecting failed workflows #728

Merged
merged 2 commits into from
Sep 27, 2017

Conversation

muhrin
Copy link
Contributor

@muhrin muhrin commented Sep 27, 2017

We overwrote the constructor for Persistence but did not call the super
class. This was a problem because the super constructor was actually
adding the class as a listener of the MONITOR. It is the MONITOR that
would tell the persistence when a process had crashed (because of
exception). This means that the pickle was not being moved to failed
and therefore the workflow would be re-ran.

We overwrote the constructor for Persistence but did not call the super
class.  This was a problem because the super constructor was actually
adding the class as a listener of the MONITOR.  It is the MONITOR that
would tell the persistence when a process had crashed (because of
exception).  This means that the pickle was not being moved to failed
and therefore the workflow would be re-ran.
@muhrin muhrin requested a review from lekah September 27, 2017 13:29
@sphuber
Copy link
Contributor

sphuber commented Sep 27, 2017

Martin! Good to hear from you mate. I am curious though, why did this not happen when I just ran the daemon as opposed to manually calling tick_workflow_engine?

@muhrin
Copy link
Contributor Author

muhrin commented Sep 27, 2017

@sphuber I asked myself the same thing. Turns out the bug was sorta there with the daemon as well. What would happen is that the process would crash but the flock would remain (because persistence hadn't received the message about it failing). So the daemon wouldn't try to re-run it...until it was restarted. So it's a little subtle to notice.

@sphuber
Copy link
Contributor

sphuber commented Sep 27, 2017

Pulled the changes locally and the zombie test now runs as expected.

@sphuber sphuber merged commit 1aa2bd3 into aiidateam:develop Sep 27, 2017
@muhrin muhrin deleted the fix_722_failed_wf branch September 27, 2017 17:11
@lekah
Copy link
Contributor

lekah commented Sep 27, 2017

Thanks guys, it also works for me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants