Use configured shards db in custodian instead of "dbs" #3807
Conversation
How to test:
Previously, dbs with N < cluster default N would pollute logs with critical errors about not having enough shards. Instead, use each database's expected N value to emit custodian reports. Note: the expected N value is a bit tricky to understand since, with the shard splitting feature, shard ranges are not guaranteed to exactly match for all copies. The N value is then defined as the max number of rings which can be completed with the given set of shards -- complete the ring once, remove participating shards, try again, etc. Luckily for us, that function is already written (`mem3_util:calculate_max_n(Shards)`), so we are just re-using it.
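To make the "max number of completable rings" idea concrete, here is an illustrative Python sketch (not CouchDB's actual Erlang code; shard ranges and the greedy matching are simplified assumptions) of the algorithm described above: tile the full key range with shard ranges, remove the shards used by each completed ring, and repeat until no full ring can be formed.

```python
# Illustrative sketch of the "max completed rings" definition of N.
# Each shard is a (begin, end) range over the full 32-bit keyspace.

FULL_RANGE = (0x00000000, 0xFFFFFFFF)

def complete_ring(shards, full_range=FULL_RANGE):
    """Greedily pick shards that tile full_range end to end.

    Returns the list of shards used, or None if no full ring can be
    completed. (The real mem3 code is more careful than this greedy
    first-match; this is just to show the shape of the algorithm.)
    """
    start, stop = full_range
    used = []
    pos = start
    while pos <= stop:
        # Find an unused shard whose range begins exactly at pos.
        nxt = next((s for s in shards if s[0] == pos and s not in used), None)
        if nxt is None:
            return None
        used.append(nxt)
        pos = nxt[1] + 1  # ring continues where this shard ends
    return used

def calculate_max_n(shards):
    """Count how many disjoint complete rings the shard set yields."""
    remaining = list(shards)
    n = 0
    while True:
        ring = complete_ring(remaining)
        if ring is None:
            return n
        for s in ring:
            remaining.remove(s)  # participating shards leave the pool
        n += 1
```

For example, a database with one unsplit copy and one copy whose first half was split still has an effective N of 2, because two full rings can be completed even though the shard ranges don't match exactly across copies.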
d6af550 to 8d86b39
Works as advertised:
[notice] 2021-10-28T22:48:16.151309Z node1@127.0.0.1 <0.7556.0> 16807916a8 localhost:15984 127.0.0.1 adm PUT /n1db?n=1 201 ok 69
[notice] 2021-10-28T22:48:25.183759Z node1@127.0.0.1 <0.7739.0> d924989833 localhost:15984 127.0.0.1 adm PUT /n2db?n=2 201 ok 71
[notice] 2021-10-28T22:48:47.861749Z node1@127.0.0.1 <0.132.0> -------- config: [couchdb] maintenance_mode set to true for reason nil
[critical] 2021-10-28T22:48:47.864359Z node1@127.0.0.1 <0.8160.0> -------- 1 shard in cluster with only 1 copy on nodes not in maintenance mode
[critical] 2021-10-28T22:48:47.864401Z node1@127.0.0.1 <0.8160.0> -------- 1 shard in cluster with only 0 copies on nodes not in maintenance mode
[warning] 2021-10-28T22:48:47.864451Z node1@127.0.0.1 <0.8160.0> -------- 4 shards in cluster with only 2 copies on nodes not in maintenance mode
[notice] 2021-10-28T22:49:45.102174Z node1@127.0.0.1 <0.132.0> -------- config: [couchdb] maintenance_mode set to false for reason nil
[notice] 2021-10-28T22:51:05.504482Z node1@127.0.0.1 <0.350.0> -------- rexi_server_mon : cluster unstable
[notice] 2021-10-28T22:51:05.504535Z node1@127.0.0.1 <0.354.0> -------- rexi_server_mon : cluster unstable
[notice] 2021-10-28T22:51:05.504620Z node1@127.0.0.1 <0.349.0> -------- rexi_server : cluster unstable
[notice] 2021-10-28T22:51:05.504672Z node1@127.0.0.1 <0.353.0> -------- rexi_buffer : cluster unstable
[notice] 2021-10-28T22:51:05.505066Z node1@127.0.0.1 <0.442.0> -------- couch_replicator_clustering : cluster unstable
[notice] 2021-10-28T22:51:05.505197Z node1@127.0.0.1 <0.452.0> -------- Stopping replicator db changes listener <0.1061.0>
[notice] 2021-10-28T22:51:05.511643Z node1@127.0.0.1 <0.10406.0> -------- All system databases exist.
[warning] 2021-10-28T22:51:05.513446Z node1@127.0.0.1 <0.10405.0> -------- 2 shards in cluster with only 1 copy on nodes that are currently up
[warning] 2021-10-28T22:51:05.513493Z node1@127.0.0.1 <0.10405.0> -------- 2 shards in cluster with only 1 copy on nodes not in maintenance mode
[warning] 2021-10-28T22:51:05.513539Z node1@127.0.0.1 <0.10405.0> -------- 4 shards in cluster with only 2 copies on nodes that are currently up
[warning] 2021-10-28T22:51:05.513576Z node1@127.0.0.1 <0.10405.0> -------- 4 shards in cluster with only 2 copies on nodes not in maintenance mode
[notice] 2021-10-28T22:51:20.505004Z node1@127.0.0.1 <0.349.0> -------- rexi_server : cluster stable
[notice] 2021-10-28T22:51:20.505575Z node1@127.0.0.1 <0.353.0> -------- rexi_buffer : cluster stable
@@ -1,6 +1,6 @@
 Custodian is responsible for the data stored in CouchDB databases.

-Custodian scans the "dbs" database, which details the location of
+Custodian scans the shards database, which details the location of
Nice fix!
couch_event:link_listener(
    ?MODULE, handle_db_event, nil, [{dbname, <<"dbs">>}]
Sorry I missed this during import 😞
No worries! Thanks for taking a look at the PR
However, with custodian finally working, it also started to emit false positive errors in the logs for dbs with `N` < cluster default `N`. We fix that in a separate commit where, instead of the cluster default `N` value, we use each database's expected `N` value. The expected `N` value is a bit tricky to understand since, with the shard splitting feature, shard ranges are not guaranteed to exactly match for all copies. The `N` value is then defined as the max number of rings which can be completed with the given set of shards -- complete the ring once, remove participating shards, try again, etc. Luckily for us, that function is already written as `mem3_util:calculate_max_n/2`, so we are just re-using it.