Make some DataONE APIs into transactions in Metacat #1642
Comments
@artntek @taojing2002 One way to determine whether this will be a real issue on k8s deployments would be to develop a test for CRUD API operations that take some time, and force a pod eviction at key points in the middle of what should be an atomic operation. For that to happen, we might need to test with a long-running operation, such as a large data upload. During that "transaction", we could then use the k8s Eviction API to evict the metacat pod and make k8s spin it up on another node. This is what would happen if, for example, a hardware failure occurs or if Nick cordons and drains a node for maintenance. The example API call given in the docs is:

curl -v -H 'Content-type: application/json' https://your-cluster-api-endpoint.example/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json

We should be able to automate that in a test, albeit with some complications due to authentication against a k8s cluster; it's a complicated integration test. Maybe it's sufficient for us to just run this test manually at first to gauge whether there are any issues here to be addressed.
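If the manual run does turn up problems, the eviction call could be scripted along these lines. This is only a rough sketch, not anything that exists in Metacat: `build_eviction_request` is a hypothetical helper, bearer-token authentication is an assumption about how the test would reach the cluster, and the body follows the `policy/v1` Eviction object from the k8s docs (the same content that `eviction.json` would hold in the curl example).

```python
import json
import urllib.request


def build_eviction_request(api_server, namespace, pod, token):
    """Build (but do not send) a POST to the pod's eviction subresource.

    Hypothetical helper; bearer-token auth is an assumption.
    """
    # Eviction object as described in the Kubernetes API-initiated eviction docs.
    body = {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": pod, "namespace": namespace},
    }
    url = (
        f"{api_server.rstrip('/')}/api/v1/namespaces/"
        f"{namespace}/pods/{pod}/eviction"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

A test would start the large upload, wait until it is in flight, then pass the request to `urllib.request.urlopen` (with an SSL context that trusts the cluster CA) and check afterwards whether Metacat holds a complete record or none at all.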
In this ticket: we found:
Matthew commented: Matthew and I discussed the issue - now the code is fine.
Recently we got some reports that Metacat instances held only partial records for some objects. This is a warning sign that uploading an object to Metacat is not a transaction.
`create`, `update`, and `delete` are the methods which should be transactions: they should either fully succeed or roll back properly. As we move Metacat instances to k8s, we should expect interruptions during those procedures more frequently.
So we need to design a mechanism that makes these operations transactional.
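To make the "fully succeed or roll back" requirement concrete, here is a minimal sketch of an all-or-nothing `create`. It is an illustration under stated assumptions, not Metacat's actual code or schema: `create_object`, the `objects` table, the SQLite database, and the use of the identifier as a file name are all hypothetical stand-ins for the real storage layer.

```python
import os
import sqlite3
import tempfile


def create_object(conn, store_dir, pid, data):
    """Store bytes and a metadata row so that both persist or neither does.

    Hypothetical sketch: table name, columns, and file layout are assumptions.
    """
    final_path = os.path.join(store_dir, pid)
    fd, tmp_path = tempfile.mkstemp(dir=store_dir)
    moved = False
    try:
        # Stage the bytes in a temp file so a partial upload is never visible.
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # The INSERT stays uncommitted until the file is in place.
        conn.execute(
            "INSERT INTO objects (pid, size) VALUES (?, ?)", (pid, len(data))
        )
        os.replace(tmp_path, final_path)  # atomic rename within one filesystem
        moved = True
        conn.commit()
    except Exception:
        # Undo both halves: the database row and whichever file we created.
        conn.rollback()
        leftover = final_path if moved else tmp_path
        if os.path.exists(leftover):
            os.remove(leftover)
        raise
```

A pod eviction between the rename and the commit would still leave a file with no metadata row, so a real design would also need a reconciliation pass at startup, or a journal of in-flight operations.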