Use configured shards db in custodian instead of "dbs" #3807
Merged
Conversation
Contributor
Author
How to test:
Previously, dbs with N < cluster default N would pollute logs with critical errors about not having enough shards. Instead, use each database's expected N value to emit custodian reports. Note: the expected N value is a bit tricky to understand since, with the shard splitting feature, shard ranges are not guaranteed to match exactly across all copies. The N value is therefore defined as the maximum number of rings which can be completed with the given set of shards: complete the ring once, remove the participating shards, try again, and so on. Luckily for us, that function is already written (`mem3_util:calculate_max_n(Shards)`), so we are just re-using it.
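The ring-completion idea can be sketched in a few lines. This is a hypothetical Python rendering of the algorithm described above (shards modeled as inclusive `(begin, end)` ranges over the full 32-bit keyspace), not a translation of CouchDB's actual Erlang `mem3_util:calculate_max_n`; the greedy first-match shard choice is a simplification:

```python
RING_END = 2**32 - 1  # inclusive upper bound of the shard keyspace

def complete_ring(shards):
    """Try to cover [0, RING_END] with contiguous shard ranges.
    Return the indices of the shards used, or None if no ring completes."""
    pos, used = 0, []
    while pos <= RING_END:
        idx = next((i for i, (b, e) in enumerate(shards)
                    if b == pos and i not in used), None)
        if idx is None:
            return None          # gap in the ring: cannot complete it
        used.append(idx)
        pos = shards[idx][1] + 1  # next ring position after this range
    return used

def calculate_max_n(shards):
    """Max number of rings completable from the given shards: complete
    one ring, remove the participating shards, and try again."""
    remaining = list(shards)
    n = 0
    while (used := complete_ring(remaining)) is not None:
        for i in sorted(used, reverse=True):
            del remaining[i]
        n += 1
    return n
```

For example, one full-range copy plus one copy split into two halves yields N = 2, even though the shard ranges of the two copies do not match exactly.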
d6af550 to 8d86b39
Contributor jaydoane approved these changes on Oct 28, 2021 and left a comment:
Works as advertised:
[notice] 2021-10-28T22:48:16.151309Z node1@127.0.0.1 <0.7556.0> 16807916a8 localhost:15984 127.0.0.1 adm PUT /n1db?n=1 201 ok 69
[notice] 2021-10-28T22:48:25.183759Z node1@127.0.0.1 <0.7739.0> d924989833 localhost:15984 127.0.0.1 adm PUT /n2db?n=2 201 ok 71
[notice] 2021-10-28T22:48:47.861749Z node1@127.0.0.1 <0.132.0> -------- config: [couchdb] maintenance_mode set to true for reason nil
[critical] 2021-10-28T22:48:47.864359Z node1@127.0.0.1 <0.8160.0> -------- 1 shard in cluster with only 1 copy on nodes not in maintenance mode
[critical] 2021-10-28T22:48:47.864401Z node1@127.0.0.1 <0.8160.0> -------- 1 shard in cluster with only 0 copies on nodes not in maintenance mode
[warning] 2021-10-28T22:48:47.864451Z node1@127.0.0.1 <0.8160.0> -------- 4 shards in cluster with only 2 copies on nodes not in maintenance mode
[notice] 2021-10-28T22:49:45.102174Z node1@127.0.0.1 <0.132.0> -------- config: [couchdb] maintenance_mode set to false for reason nil
[notice] 2021-10-28T22:51:05.504482Z node1@127.0.0.1 <0.350.0> -------- rexi_server_mon : cluster unstable
[notice] 2021-10-28T22:51:05.504535Z node1@127.0.0.1 <0.354.0> -------- rexi_server_mon : cluster unstable
[notice] 2021-10-28T22:51:05.504620Z node1@127.0.0.1 <0.349.0> -------- rexi_server : cluster unstable
[notice] 2021-10-28T22:51:05.504672Z node1@127.0.0.1 <0.353.0> -------- rexi_buffer : cluster unstable
[notice] 2021-10-28T22:51:05.505066Z node1@127.0.0.1 <0.442.0> -------- couch_replicator_clustering : cluster unstable
[notice] 2021-10-28T22:51:05.505197Z node1@127.0.0.1 <0.452.0> -------- Stopping replicator db changes listener <0.1061.0>
[notice] 2021-10-28T22:51:05.511643Z node1@127.0.0.1 <0.10406.0> -------- All system databases exist.
[warning] 2021-10-28T22:51:05.513446Z node1@127.0.0.1 <0.10405.0> -------- 2 shards in cluster with only 1 copy on nodes that are currently up
[warning] 2021-10-28T22:51:05.513493Z node1@127.0.0.1 <0.10405.0> -------- 2 shards in cluster with only 1 copy on nodes not in maintenance mode
[warning] 2021-10-28T22:51:05.513539Z node1@127.0.0.1 <0.10405.0> -------- 4 shards in cluster with only 2 copies on nodes that are currently up
[warning] 2021-10-28T22:51:05.513576Z node1@127.0.0.1 <0.10405.0> -------- 4 shards in cluster with only 2 copies on nodes not in maintenance mode
[notice] 2021-10-28T22:51:20.505004Z node1@127.0.0.1 <0.349.0> -------- rexi_server : cluster stable
[notice] 2021-10-28T22:51:20.505575Z node1@127.0.0.1 <0.353.0> -------- rexi_buffer : cluster stable
Custodian is responsible for the data stored in CouchDB databases.

- Custodian scans the "dbs" database, which details the location of
+ Custodian scans the shards database, which details the location of

start_event_listener() ->
    DbName = mem3_sync:shards_db(),
    couch_event:link_listener(
        ?MODULE, handle_db_event, nil, [{dbname, <<"dbs">>}]
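The diff above replaces a hardcoded database name with a lookup of the configured shards database. A minimal Python sketch of that pattern follows; the config section, key name, and `_dbs` default here are illustrative assumptions, not necessarily CouchDB's actual values:

```python
# Sketch of "look up the configured name instead of hardcoding it".
# The "mem3"/"shards_db"/"_dbs" names below are illustrative assumptions.

def shards_db(config):
    """Return the configured shards database name, with a fallback default."""
    return config.get("mem3", {}).get("shards_db", "_dbs")

# Callers ask for the configured name rather than embedding a literal "dbs":
config = {"mem3": {"shards_db": "custom_dbs"}}
print(shards_db(config))  # prints: custom_dbs
print(shards_db({}))      # prints: _dbs
```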
Contributor
Sorry I missed this during import 😞
Contributor
Author
No worries! Thanks for taking a look at the PR
However, with custodian finally working, it also started to emit false-positive errors in the logs for dbs with N < cluster default N. We fix that in a separate commit where, instead of the cluster default N value, we use each database's expected N value. The expected N value is a bit tricky to understand since, with the shard splitting feature, shard ranges are not guaranteed to match exactly across all copies. The N value is then defined as the max number of rings which can be completed with the given set of shards: complete the ring once, remove the participating shards, try again, and so on. Luckily for us, that function is already written as `mem3_util:calculate_max_n/2`, so we are just re-using it.
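The effect of the change can be summarized as: custodian compares each shard's live copy count against the database's own expected N rather than the cluster default, so an n=1 database with its single copy up no longer produces a report. The following is an illustrative Python sketch, not custodian's actual logic; the function name and level thresholds are assumptions:

```python
def report_level(copies_up, expected_n):
    """Pick a log level for a shard range (illustrative thresholds only)."""
    if copies_up == 0:
        return "critical"   # no live copies at all
    if copies_up < expected_n:
        return "warning"    # degraded, but still serving
    return None             # fully replicated: nothing to report

# An n=1 database with its one copy up no longer triggers a false
# positive, even on a cluster whose default N is 3.
print(report_level(1, 1))   # prints: None
print(report_level(1, 3))   # prints: warning
```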