
Fix for issue #1136 - Error 500 deleting DB without quorum #1139

Merged
merged 7 commits into apache:master on Jul 13, 2018

Conversation

@jjrodrig
Contributor

jjrodrig commented Feb 1, 2018

Overview

The current behaviour for database deletion in a cluster is:

  • Database deletion returns 404 - Not found if all nodes respond not found
  • Database deletion returns 200 - OK if all nodes respond and at least one is ok
  • Database deletion returns 202 - Accepted if the number of responses meets the quorum and at least one is ok
  • Database deletion returns 500 - Error if the number of responses is below the quorum

After this PR, the behaviour for database deletion will be (a sketch of the mapping is given after the list):

  • Database deletion returns 404 - Not found if all nodes respond not found
  • Database deletion returns 200 - OK if the quorum is met and at least one is ok
  • Database deletion returns 202 - Accepted if the number of responses is below the quorum and at least one is ok
  • Database deletion returns 500 - Error in other cases
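
To make the intended mapping concrete, here is a minimal sketch in Erlang. The module, function, and variable names (delete_db_sketch, response/5, Oks, NotFounds, Replies, NodeCount, Quorum) are illustrative assumptions, not the actual fabric code path touched by this PR:

 %% Illustrative sketch only -- not the real fabric_db_delete logic.
 %% Oks = positive acks, NotFounds = not_found replies,
 %% Replies = total responses received, NodeCount = nodes asked,
 %% Quorum = number of responses required for quorum.
 -module(delete_db_sketch).
 -export([response/5]).

 response(_Oks, NotFounds, _Replies, NodeCount, _Quorum) when NotFounds =:= NodeCount ->
     404;    % every node reported not_found
 response(Oks, _NotFounds, Replies, _NodeCount, Quorum) when Oks > 0, Replies >= Quorum ->
     200;    % quorum met and at least one ok
 response(Oks, _NotFounds, _Replies, _NodeCount, _Quorum) when Oks > 0 ->
     202;    % at least one ok, but responses below quorum
 response(_Oks, _NotFounds, _Replies, _NodeCount, _Quorum) ->
     500.    % anything else is still an error

For example, delete_db_sketch:response(1, 0, 1, 3, 2) evaluates to 202 under this mapping: one node answered ok, but only one of the two responses required for quorum arrived.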

Testing recommendations

This PR can be tested as follows:

 make test-cluster-with-quorum
 make test-cluster-without-quorum

A new test for db deletion is included.

Side change in test/javascript/run
I've included a change in test/javascript/run. The script no longer exits with an error if the suites or ignore_js_suites parameter is given a value that does not match an existing test.

After this PR, it is possible to run make check ignore_js_suites=reduce_builtin or make check suites=all_docs without getting an error in the new cluster testing targets.

Related Issues or Pull Requests

Fixes #1136

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

Commit: "Complete deletion tests with not found"
@wohali
Member

wohali commented Feb 16, 2018

@rnewson can you review this? You may also want to have a glance at the already merged companion PR, #1127

{N, M, _} when N >= (W div 2 + 1), M > 0 ->
    {stop, accepted};
A project member commented on this diff:

@kocolosk as the original decision maker for these semantics, do you have a problem with this? We've accepted the same idea on the create db side but before we accept this one, and make a formal release including both changes, I wanted to hear your thoughts. The idea here is that you should not get a 500 error when creating or deleting a db in a degraded cluster.
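
For readers skimming the diff, a small illustration of how a guard of this shape behaves. The function name maybe_stop and the reading of W as the total number of expected worker responses are assumptions made for the example, not a claim about the surrounding accumulator in fabric:

 %% Illustration only. Assuming W is the total number of expected
 %% responses, W div 2 + 1 is a simple majority: for W = 3 it is 2,
 %% so the clause stops early once at least two responses have
 %% arrived (N) and at least one of them was a success (M > 0).
 maybe_stop(N, M, W) when N >= (W div 2 + 1), M > 0 ->
     {stop, accepted};
 maybe_stop(_N, _M, _W) ->
     continue.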

@jjrodrig
Contributor Author

jjrodrig commented Feb 21, 2018

@rnewson I've done some more testing on deletion under different cluster conditions. It seems that some .couch files remain orphaned if a database is deleted while one of the cluster nodes is stopped.

  • Start a 3-node cluster and create a test db:
./dev/run -n 3 --with-admin-party-please
curl -X PUT http://127.0.0.1:15984/test?q=1

[Screenshot: one .couch file is created per node]

  • Stop 1 node in the cluster and delete the database
./dev/run -n 3 --with-admin-party-please --degrade-cluster 1
curl -X DELETE http://127.0.0.1:15984/test

[Screenshot: the .couch files are removed on the running nodes and persist in the stopped node, as expected]

  • Start the complete cluster and check if the database is deleted
./dev/run -n 3 --with-admin-party-please 
curl -X HEAD http://127.0.0.1:15984/test -v

The database is not found, but the .couch file is propagated from the restarted node to the rest of the nodes.

It seems that .couch files remain orphaned in the system after a deletion performed with at least one node stopped. This PR does not modify that behavior. Do you think this is an issue?

@kocolosk
Member

Thanks for this PR @jjrodrig, very well done.

The orphaned files are a known condition. They are OK from the perspective of database correctness, as a new database created with the same name will have a different creation timestamp and so the old data would not be surfaced. It could be a useful future enhancement to remove orphaned shard files in a background process.

The previous behavior of distinguishing between a majority or minority of committed updates to the replicas of the shard table is not something I'm interested in preserving. I think we can do better.

I do think the use of 202 Accepted as an indicator to the client that "hey, things are a little messy right now, you might see surprises while we work things out" is a good thing. My concern with the current PR is that we may return 200 OK to a user even though some nodes in the cluster still host the old (soon-to-be-deleted) version of the database. For example, consider the following sequence of events:

  1. Cluster experiences a network partition which creates subsets A and B
  2. User submits DELETE /foo to a node in A and receives 200 OK
  3. User submits PUT /foo to a node in A and receives 200 OK
  4. User submits PUT /foo/bar to a node in B and receives 202 Accepted
  5. Network partition resolves, shard maps are updated

In this sequence the /foo/bar document will be lost permanently 👎

Perhaps a preferable approach is to use 202 Accepted for every situation in which

  1. We get at least one acknowledgement and
  2. We do not hear back from a cluster member

The downside of this approach is that we wait around to hear back from every cluster member, and the request_timeout can be quite large. But I think we need to be quite careful about using 200 OK in situations where some cluster member could still be accepting new data in a shard on death row.

What do you think?

nickva added this to the 2.2.0 milestone on Feb 28, 2018
@jjrodrig
Contributor Author

jjrodrig commented Mar 6, 2018

Thanks @kocolosk for your comments.

I see that the orphaned files question is a different issue. We are facing it as part of a cleanup process that we have implemented in our system: databases are periodically transferred to a temporary database and then moved back to the work database. It is like a purging procedure that allows us to keep the databases small. During this process we remove and create databases, and orphaned files are produced. +1 for the orphaned-files cleanup process enhancement.

Regarding the main issue of this PR, I think the main problem is responding with a 500 Error when the operation has been accepted and the database has been deleted on the nodes that received the request. We respond with an error, but the database is deleted on the active nodes and the deletion is later propagated to the rest of the nodes once they become accessible. It seems to me that this behaviour is more akin to a 202 Accepted result.

The idea of responding with 200 OK when the quorum is met is mainly to keep the API consistent with other operations where the quorum is considered, but I see that this can have some drawbacks.

@wohali
Member

wohali commented Mar 26, 2018

@jjrodrig Are there updates coming to this PR from you?

@jjrodrig
Contributor Author

@wohali I'll modify the condition to respond with 200 - OK only if we have a response from all the nodes. If we have a positive response from only some of them, 202 - Accepted is returned.

The problem I see is that this behaviour is not consistent with database creation, where the quorum is considered, but it is better than the current situation, which returns a 500 - Error.
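
A rough sketch of that adjusted rule, again with hypothetical names (adjusted_response/3, Oks, NotFounds, NodeCount) rather than the actual fabric change, and mirroring @kocolosk's suggestion above that 202 should cover any case where a cluster member has not been heard from:

 %% Illustrative sketch of the adjusted behaviour: 200 only when every
 %% node has answered (ok or not_found) and at least one answered ok;
 %% 202 when at least one node answered ok but some node stayed silent.
 adjusted_response(_Oks, NotFounds, NodeCount) when NotFounds =:= NodeCount ->
     404;
 adjusted_response(Oks, NotFounds, NodeCount) when Oks > 0, Oks + NotFounds =:= NodeCount ->
     200;
 adjusted_response(Oks, _NotFounds, _NodeCount) when Oks > 0 ->
     202;
 adjusted_response(_Oks, _NotFounds, _NodeCount) ->
     500.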

@janl
Member

janl commented Jul 9, 2018

@jjrodrig can you resolve the conflict? Thanks!

@jjrodrig
Contributor Author

@janl I've updated the db deletion tests to cover the new behaviour.

Thanks

janl merged commit 71cf9f4 into apache:master on Jul 13, 2018
cloudant-service pushed a commit to ibm-cloud-docs/Cloudant that referenced this pull request on Sep 14, 2018:
apache/couchdb#1139 changes the meaning of some response codes for database deletion requests. This PR documents the response codes according to that change.