
Fix for issue #1136 - Error 500 deleting DB without quorum #1139

Merged
merged 7 commits into apache:master on Jul 13, 2018

Conversation

@jjrodrig
Contributor

jjrodrig commented Feb 1, 2018

Overview

The current behaviour for database deletion in a cluster is:

  • Database deletion returns 404 - Not found if all nodes respond not found
  • Database deletion returns 200 - OK if all nodes respond and at least one is ok
  • Database deletion returns 202 - Accepted if the number of responses meets the quorum and at least one is ok
  • Database deletion returns 500 - Error if the number of responses is below the quorum

After this PR, the behaviour for database deletion will be (a sketch of the mapping is given after the list):

  • Database deletion returns 404 - Not found if all nodes respond not found
  • Database deletion returns 200 - OK if the quorum is met and at least one is ok
  • Database deletion returns 202 - Accepted if the number of responses is below the quorum and at least one is ok
  • Database deletion returns 500 - Error in other cases
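
To make the intended mapping concrete, here is a minimal sketch in Erlang. The module, function, and variable names (delete_db_sketch, response/5, Oks, NotFounds, Replies, NodeCount, Quorum) are illustrative assumptions, not the actual fabric code path touched by this PR:

 %% Illustrative sketch only -- not the real fabric_db_delete logic.
 %% Oks = positive acks, NotFounds = not_found replies,
 %% Replies = total responses received, NodeCount = nodes asked,
 %% Quorum = number of responses required for quorum.
 -module(delete_db_sketch).
 -export([response/5]).

 response(_Oks, NotFounds, _Replies, NodeCount, _Quorum) when NotFounds =:= NodeCount ->
     404;    % every node reported not_found
 response(Oks, _NotFounds, Replies, _NodeCount, Quorum) when Oks > 0, Replies >= Quorum ->
     200;    % quorum met and at least one ok
 response(Oks, _NotFounds, _Replies, _NodeCount, _Quorum) when Oks > 0 ->
     202;    % at least one ok, but responses below quorum
 response(_Oks, _NotFounds, _Replies, _NodeCount, _Quorum) ->
     500.    % anything else is still an error

For example, delete_db_sketch:response(1, 0, 1, 3, 2) evaluates to 202 under this mapping: one node answered ok, but only one of the two responses required for quorum arrived.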

Testing recommendations

This PR can be tested as follows:

 make test-cluster-with-quorum
 make test-cluster-without-quorum

A new test for db deletion is included.

Side change in test/javascript/run
I've included a change in test/javascript/run. The script no longer exits with an error if the suites or ignore_js_suites parameter is given a value that does not match an existing test.

After this PR, it is possible to run make check ignore_js_suites=reduce_builtin or make check suites=all_docs without getting an error in the new cluster testing targets.

Related Issues or Pull Requests

Fixes #1136

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

Commit: "Complete deletion tests with not found"
@wohali
Member

wohali commented Feb 16, 2018

@rnewson can you review this? You may also want to have a glance at the already merged companion PR, #1127

{N, M, _} when N >= (W div 2 + 1), M > 0 ->
    {stop, accepted};
A project member commented on this diff:

@kocolosk as the original decision maker for these semantics, do you have a problem with this? We've accepted the same idea on the create db side but before we accept this one, and make a formal release including both changes, I wanted to hear your thoughts. The idea here is that you should not get a 500 error when creating or deleting a db in a degraded cluster.
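
For readers skimming the diff, a small illustration of how a guard of this shape behaves. The function name maybe_stop and the reading of W as the total number of expected worker responses are assumptions made for the example, not a claim about the surrounding accumulator in fabric:

 %% Illustration only. Assuming W is the total number of expected
 %% responses, W div 2 + 1 is a simple majority: for W = 3 it is 2,
 %% so the clause stops early once at least two responses have
 %% arrived (N) and at least one of them was a success (M > 0).
 maybe_stop(N, M, W) when N >= (W div 2 + 1), M > 0 ->
     {stop, accepted};
 maybe_stop(_N, _M, _W) ->
     continue.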

@jjrodrig
Contributor Author

jjrodrig commented Feb 21, 2018

@rnewson I've done some more testing on deletion under different cluster conditions. It seems that some .couch files remain orphaned if a database is deleted while one of the cluster nodes is stopped.

  • Start a 3-node cluster and create a test db:
./dev/run -n 3 --with-admin-party-please
curl -X PUT http://127.0.0.1:15984/test?q=1

[Screenshot: one .couch file is created per node]

  • Stop 1 node in the cluster and delete the database
./dev/run -n 3 --with-admin-party-please --degrade-cluster 1
curl -X DELETE http://127.0.0.1:15984/test

[Screenshot: the .couch files are removed on the running nodes and persist in the stopped node, as expected]

  • Start the complete cluster and check if the database is deleted
./dev/run -n 3 --with-admin-party-please 
curl -X HEAD http://127.0.0.1:15984/test -v

The database is not found, but the .couch file is propagated from the restarted node to the rest of the nodes.

It seems that .couch files remain orphaned in the system after a deletion performed with at least one node stopped. This PR does not modify that behavior. Do you think this is an issue?

@kocolosk
Member

Thanks for this PR @jjrodrig, very well done.

The orphaned files are a known condition. They are OK from the perspective of database correctness, as a new database created with the same name will have a different creation timestamp and so the old data would not be surfaced. It could be a useful future enhancement to remove orphaned shard files in a background process.

The previous behavior of distinguishing between a majority or minority of committed updates to the replicas of the shard table is not something I'm interested in preserving. I think we can do better.

I do think the use of 202 Accepted as an indicator to the client that "hey, things are a little messy right now, you might see surprises while we work things out" is a good thing. My concern with the current PR is that we may return 200 OK to a user even though some nodes in the cluster still host the old (soon-to-be-deleted) version of the database. For example, consider the following sequence of events:

  1. Cluster experiences a network partition which creates subsets A and B
  2. User submits DELETE /foo to a node in A and receives 200 OK
  3. User submits PUT /foo to a node in A and receives 200 OK
  4. User submits PUT /foo/bar to a node in B and receives 202 Accepted
  5. Network partition resolves, shard maps are updated

In this sequence the /foo/bar document will be lost permanently 👎

Perhaps a preferable approach is to use 202 Accepted for every situation in which

  1. We get at least one acknowledgement and
  2. We do not hear back from a cluster member

The downside of this approach is that we wait around to hear back from every cluster member, and the request_timeout can be quite large. But I think we need to be quite careful about using 200 OK in situations where some cluster member could still be accepting new data in a shard on death row.

What do you think?

nickva added this to the 2.2.0 milestone on Feb 28, 2018
@jjrodrig
Contributor Author

jjrodrig commented Mar 6, 2018

Thanks @kocolosk for your comments.

I see that the orphaned files question is a different issue. We are facing it as part of a cleanup process that we have implemented in our system: databases are periodically transferred to a temporary database and then moved back to the work database. It is like a purging procedure that allows us to keep the databases small. During this process we remove and create databases, and orphaned files are produced. +1 for the orphaned-files cleanup process enhancement.

Regarding the main issue of this PR, I think the main problem is responding with a 500 Error when the operation has been accepted and the database has been deleted on the nodes that received the request. We respond with an error, but the database is deleted on the active nodes and the deletion is later propagated to the rest of the nodes once they become accessible. It seems to me that this behaviour is more akin to a 202 Accepted result.

The idea of responding with 200 OK when the quorum is met is mainly to keep the API consistent with other operations where the quorum is considered, but I see that this can have some drawbacks.

@wohali
Member

wohali commented Mar 26, 2018

@jjrodrig Are there updates coming to this PR from you?

@jjrodrig
Contributor Author

@wohali I'll modify the condition to respond with 200 - OK only if we have a response from all the nodes. If we have a positive response from only some of them, 202 - Accepted is returned.

The problem I see is that this behaviour is not consistent with database creation, where the quorum is considered, but it is better than the current situation, which returns a 500 - Error.
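
A rough sketch of that adjusted rule, again with hypothetical names (adjusted_response/3, Oks, NotFounds, NodeCount) rather than the actual fabric change, and mirroring @kocolosk's suggestion above that 202 should cover any case where a cluster member has not been heard from:

 %% Illustrative sketch of the adjusted behaviour: 200 only when every
 %% node has answered (ok or not_found) and at least one answered ok;
 %% 202 when at least one node answered ok but some node stayed silent.
 adjusted_response(_Oks, NotFounds, NodeCount) when NotFounds =:= NodeCount ->
     404;
 adjusted_response(Oks, NotFounds, NodeCount) when Oks > 0, Oks + NotFounds =:= NodeCount ->
     200;
 adjusted_response(Oks, _NotFounds, _NodeCount) when Oks > 0 ->
     202;
 adjusted_response(_Oks, _NotFounds, _NodeCount) ->
     500.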

@janl
Member

janl commented Jul 9, 2018

@jjrodrig can you resolve the conflict? Thanks!

@jjrodrig
Contributor Author

@janl I've updated the db deletion tests to cover the new behaviour.

Thanks

janl merged commit 71cf9f4 into apache:master on Jul 13, 2018
cloudant-service pushed a commit to ibm-cloud-docs/Cloudant that referenced this pull request on Sep 14, 2018:
apache/couchdb#1139 changes the meaning of some response codes for database deletion requests. This PR documents the response codes according to that change.