Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the execmanager.retrieve_calculation idempotent'ish #3142

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Jul 7, 2019

Fixes #3141

The retrieve_calculation would cause an exception if called multiple
times for the same calculations, which can happen if the first time that
the runner was working on it got interrupted, for example due to a
daemon shutdown. The reason is that the second time around the adding of
the retrieved folder data node will raise a uniqueness exception,
because there can only be one output with the same label.

Note that full idem-potency is impossible, but this change should make
the problem a lot less likely to occur. The idea is to delay the actual
attaching of the retrieved folder data node to the last moment possible.
This way, if the method is called again and the folder is already there,
we can be reasonably sure that the files were already retrieved
successfully and we simply return, leaving the call a no-op. This is done
in the beginning of the function to check if the output node already
exists using the LinkManager.first() call.

The LinkManager.first() method is adapted to, instead of raising a
ValueError when no entry is found at all, simply return None. This
is more consistent with the behavior of QueryBuilder.first().

The `retrieve_calculation` would cause an exception if called multiple
times for the same calculations, which can happen if the first time that
the runner was working on it got interrupted, for example due to a
daemon shutdown. The reason is that the second time around the adding of
the `retrieved` folder data node will raise a uniqueness exception,
because there can only be one output with the same label.

Note that full idempotency is impossible, but this change should make
the problem a lot less likely to occurr. The idea is to delay the actual
attaching of the retrieved folder data node to the last moment possible.
This way, if the method is called again and the folder is already there,
we can be reasonably sure that the files were already retrieved
succesfully and we simply return, leaving the call a no-op. This is done
in the beginning of the function to check if the output node already
exists using the `LinkManager.first()` call. If the node exists, the
retrieve function has apparently already been called before and reached
the end of the function where it adds the retrieved folder. This means
all the files were already successfully retrieved so we can safely skip.

The `LinkManager.first()` method is adapted to, instead of raising a
`ValueError` when no entry is found at all, simply return `None`. This
is more consistent with the behavior of `QueryBuilder.first()`.
@sphuber sphuber force-pushed the fix_3141_execmanager_retrieve_calculation_idempotence branch from 19e3e84 to 81de7b2 Compare July 8, 2019 10:07
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - ideally, the final solution would be to make sure the add_incoming an atomic operation (wrapping it in a transaction?). I think we can approve this, maybe can you add a note to check/improve this in the issue where we should address atomicity, transactions etc?

I also agree with returning None in first(), this is consistence with SQLAlchemy's first() method vs. one() that instead raises

@sphuber sphuber merged commit 8447c6b into aiidateam:develop Jul 9, 2019
@sphuber sphuber deleted the fix_3141_execmanager_retrieve_calculation_idempotence branch July 9, 2019 13:27
d-tomerini pushed a commit to d-tomerini/aiida_core that referenced this pull request Sep 30, 2019
…#3142)

The `retrieve_calculation` would cause an exception if called multiple
times for the same calculations, which can happen if the first time that
the runner was working on it got interrupted, for example due to a
daemon shutdown. The reason is that the second time around the adding of
the `retrieved` folder data node will raise a uniqueness exception,
because there can only be one output with the same label.

Note that full idempotency is impossible, but this change should make
the problem a lot less likely to occur. The idea is to delay the actual
attaching of the retrieved folder data node to the last moment possible.
This way, if the method is called again and the folder is already there,
we can be reasonably sure that the files were already retrieved
successfully and we simply return, leaving the call a no-op. This is done
in the beginning of the function to check if the output node already
exists using the `LinkManager.first()` call. If the node exists, the
retrieve function has apparently already been called before and reached
the end of the function where it adds the retrieved folder. This means
all the files were already successfully retrieved so we can safely skip.

The `LinkManager.first()` method is adapted to, instead of raising a
`ValueError` when no entry is found at all, simply return `None`. This
is more consistent with the behavior of `QueryBuilder.first()`.
d-tomerini pushed a commit to d-tomerini/aiida_core that referenced this pull request Oct 16, 2019
…#3142)

The `retrieve_calculation` would cause an exception if called multiple
times for the same calculations, which can happen if the first time that
the runner was working on it got interrupted, for example due to a
daemon shutdown. The reason is that the second time around the adding of
the `retrieved` folder data node will raise a uniqueness exception,
because there can only be one output with the same label.

Note that full idempotency is impossible, but this change should make
the problem a lot less likely to occur. The idea is to delay the actual
attaching of the retrieved folder data node to the last moment possible.
This way, if the method is called again and the folder is already there,
we can be reasonably sure that the files were already retrieved
successfully and we simply return, leaving the call a no-op. This is done
in the beginning of the function to check if the output node already
exists using the `LinkManager.first()` call. If the node exists, the
retrieve function has apparently already been called before and reached
the end of the function where it adds the retrieved folder. This means
all the files were already successfully retrieved so we can safely skip.

The `LinkManager.first()` method is adapted to, instead of raising a
`ValueError` when no entry is found at all, simply return `None`. This
is more consistent with the behavior of `QueryBuilder.first()`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make the execmanager.retrieve_calculation idempotent as best as possible
2 participants