@jjrodrig
Contributor

@jjrodrig jjrodrig commented Jan 25, 2018

Overview

The current behaviour of database creation in a degraded cluster is not consistent with the general behaviour described for document creation in the same situation.

The number of copies of a document with the same revision that have to be read before CouchDB returns a 200 is equal to half of the total copies of the document plus one. The same applies to the number of nodes that need to save a document before a write returns 201. If there are fewer nodes than that number, then 202 is returned. Both the read and write numbers can be specified with a request as the r and w parameters, respectively.

The current behaviour for database creation in a cluster is:

  • Database creation returns 201 - Created if all nodes respond ok
  • Database creation returns 202 - Accepted if the quorum is met
  • Database creation returns 500 - Error if the responses are below quorum

The quorum is the default: (number of nodes / 2) + 1, using integer division.
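
As a worked example of the default quorum, the arithmetic can be sketched as below. This is a Python illustration only, not the Erlang code touched by this PR (which uses `W div 2 + 1`); the function name `default_quorum` is hypothetical.

```python
def default_quorum(total_nodes: int) -> int:
    """Default quorum: integer-divide the node count by two, then add one."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


# Print the quorum for a few cluster sizes.
for nodes in (1, 2, 3, 5):
    print(f"{nodes} node(s): quorum = {default_quorum(nodes)}")
```

For the three-node cluster used in the tests of this PR, the quorum is 2.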

This PR changes the database creation result to the following behaviour:

  • Database creation returns 201 - Created if the quorum is met
  • Database creation returns 202 - Accepted if at least one node responds ok
  • Database creation returns 500 - Error if there is no correct response from any node
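
The difference between the two behaviours described above can be sketched as a pair of small functions. This is a Python illustration under the assumption that only the count of ok responses matters; the names `status_before` and `status_after` are hypothetical and do not appear in the CouchDB code base.

```python
def default_quorum(total_nodes: int) -> int:
    """Default quorum: (number of nodes // 2) + 1."""
    return total_nodes // 2 + 1


def status_before(total_nodes: int, ok_responses: int) -> int:
    """Behaviour before this PR, as listed in the description."""
    if ok_responses == total_nodes:
        return 201  # Created: every node responded ok
    if ok_responses >= default_quorum(total_nodes):
        return 202  # Accepted: quorum met, but not all nodes
    return 500      # Error: responses below quorum


def status_after(total_nodes: int, ok_responses: int) -> int:
    """Behaviour proposed by this PR, as listed in the description."""
    if ok_responses >= default_quorum(total_nodes):
        return 201  # Created: quorum met
    if ok_responses > 0:
        return 202  # Accepted: at least one node responded ok
    return 500      # Error: no node responded ok


# Compare both behaviours for a three-node cluster.
for ok in (3, 2, 1, 0):
    print(ok, status_before(3, ok), status_after(3, ok))
```

With three nodes, the only cases that change are two ok responses (202 becomes 201) and one ok response (500 becomes 202).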

Testing recommendations

  • All previous tests are ok
  • I've focused on chttpd and javascript tests
  • I've skipped the reduce_builtin.js test as it is failing even on the master branch
    test/javascript/tests/reduce_builtin.js Error: {gen_server,call,[<0.2426.1>,{get_state,49},infinity]}

make check apps=chttpd ignore_js_suites=reduce_builtin

I couldn't find any existing infrastructure for testing cluster degradation issues, so I decided to add some support for testing under different cluster conditions. make now has two more tasks:

  • test-cluster-with-quorum, which launches a three-node cluster, stops one node, and then executes the tests
  • test-cluster-without-quorum, which launches a three-node cluster, stops two nodes, and then executes the tests

This PR can be tested this way:

 make test-cluster-with-quorum
 make test-cluster-without-quorum
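
To make the two cluster conditions concrete, the sketch below works out how many nodes stay up under each make task and whether the default quorum is still reachable. It is a Python illustration that assumes only the node counts stated above; the `Scenario` class is hypothetical and not part of the test infrastructure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    make_target: str
    total_nodes: int
    stopped_nodes: int

    @property
    def live_nodes(self) -> int:
        return self.total_nodes - self.stopped_nodes

    @property
    def has_quorum(self) -> bool:
        # Default quorum: (number of nodes // 2) + 1
        return self.live_nodes >= self.total_nodes // 2 + 1


SCENARIOS = [
    Scenario("test-cluster-with-quorum", total_nodes=3, stopped_nodes=1),
    Scenario("test-cluster-without-quorum", total_nodes=3, stopped_nodes=2),
]

for s in SCENARIOS:
    print(f"{s.make_target}: {s.live_nodes} live node(s), quorum met: {s.has_quorum}")
```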

Related Issues or Pull Requests

Issue #603

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests; (DB creation yes, Cluster degradation no)
  • Documentation reflects the changes;

Member

@rnewson rnewson left a comment

please rebase and squash, I'm happy with this change. Thanks!

Makefile Outdated
endif
@rm -rf dev/lib
@dev/run -n 3 -q --with-admin-party-please \
--enable-erlang-views --degrade-cluster 2 \
Member

indentation

Makefile Outdated
-c 'startup_jitter=0' \
'test/javascript/run --suites "$(suites)" \
--ignore "$(ignore_js_suites)" \
--path test/javascript/tests-cluster/with-quorum'
Member

indentation

NumOk when NumOk >= (W div 2 +1) ->
{stop, ok};
NumOk when NumOk >= (W div 2 + 1) ->
NumOk when NumOk > 0 ->
Member

this is the right change.

Add degrade-cluster option for cluster testing

Add tests for different cluster conditions with/without quorum

Add test-cluster-with-quorum and test-cluster-without-quorum tasks
@jjrodrig jjrodrig force-pushed the 603-error-creating-db branch from 3573f3e to a585f0b Compare January 29, 2018 15:27
@rnewson
Member

rnewson commented Jan 30, 2018

+1

@janl
Member

janl commented Jan 30, 2018

@jjrodrig we wanted to make it so that make test-cluster-with[out]-quorum is run as part of make check, but this doesn’t seem to work on Travis. Would you be able to assist with debugging this?

@jjrodrig
Contributor Author

Ok, I'll check it.

@janl
Member

janl commented Jan 30, 2018

Thank you! :)

@janl janl force-pushed the 603-error-creating-db branch from 31843dc to 2ae9fd2 Compare January 30, 2018 15:38
@janl
Member

janl commented Jan 30, 2018

I’m trying a new variant where the new make targets run first in make check.

@jjrodrig
Contributor Author

@janl I've reproduced the problem in my environment. It seems that the stop after the mango tests is colliding with the start of the cluster tests.
See this commit:

jjrodrig@f45b404

I've introduced a sleep after the mango tests, and the check is now working.

Also, I've noticed that the mango tests do not clean up dev/lib, so they can be affected by previous test executions.
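
For illustration only, an alternative to a fixed sleep would be to poll until the previous cluster has stopped listening before starting the next one. This is not what the linked commit does, and it rests on an assumption that the collision comes from a node still holding its port. The helper name `wait_until_port_closed` and its parameters are hypothetical; the port to poll depends on the dev cluster configuration, which is not given in this thread.

```python
import socket
import time


def wait_until_port_closed(host: str, port: int,
                           timeout: float = 30.0,
                           interval: float = 0.5) -> bool:
    """Return True once connections to (host, port) are refused,
    or False if something is still listening when the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means a process is still listening.
            with socket.create_connection((host, port), timeout=interval):
                pass
        except ConnectionRefusedError:
            return True  # nothing is listening any more
        except OSError:
            pass  # inconclusive (for example a timeout); try again
        time.sleep(interval)
    return False
```

A fixed sleep is simpler and matches the commit above; polling only avoids guessing how long the shutdown takes.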

@janl janl merged commit 1c39e0c into apache:master Jan 31, 2018
@janl
Member

janl commented Jan 31, 2018

merged this, and filed an issue for cleaning up mango: #1134

Thanks a lot for this contribution!
