Skip to content
This repository was archived by the owner on Oct 17, 2022. It is now read-only.

Conversation

@adrienverge
Copy link
Contributor

@adrienverge adrienverge commented Oct 8, 2018

The doc said:

If you have 4 shards, that means that you can have at most 4 nodes.

But at other places it states:

For systems with lots of small, infrequently accessed databases, or
for servers with fewer CPU cores, consider reducing this value to
1 or 2.

and:

In a default 3-node cluster, each node would receive 8 shards.

From my understanding, the number of shards can be anything, and even
q=1 will not alter safety of data. Only n can. If I understand
correctly:

  • If you have 4 copies of a shard (n=4), that means that you can
    have at most 4 nodes.
  • The number of nodes is not impacted by the number of shards (q).
  • You can have safe clusters with n=3, q=1 (2 nodes can be down).

So this commit tries to make the doc clearer. -- Please tell me if I'm wrong!

@wohali
Copy link
Member

wohali commented Oct 11, 2018

This isn't accurate either, unfortunately. I think we need a more complete rewrite here.

If you have 4 copies of a shard, n=4, then you can still have 100 nodes in your cluster - but that specific shard replica will live at most on 4 of those nodes. If you have q=8 on top of your n=4, then you will use a maximum of 8 x 4 = 32 nodes in a 100 node cluster. Again, this is fine - if you had 4 databases with q=8 and n=4 you'd have at least one shard replica on every node, with some nodes having 2 replicas.

shard you can have only one node, just as with CouchDB 1.x.
A shard is a part of a database. It can be replicated multiple times. The more
copies of a shard, the more you can scale out. If you have 4 replicas, that
means that you can have at most 4 nodes. With one replica you can have only one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"means that all 4 copies of this specific shard will live on at most 4 nodes."

A shard is a part of a database. It can be replicated multiple times. The more
copies of a shard, the more you can scale out. If you have 4 replicas, that
means that you can have at most 4 nodes. With one replica you can have only one
node, just as with CouchDB 1.x.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"No node can have more than one copy of each shard replica."

"The default for CouchDB since 2.0.0 is q=8 and n=3, meaning each database (and secondary index) is split into 8 shards, with 3 replicas per shard, for a total of 24 shard replica files.

"For a CouchDB cluster only hosting a single database with these default values, a maximum of 24 nodes can be used to scale horizontally."

The doc said:
> If you have 4 shards, that means that you can have at most 4 nodes.

But at other places it states:
> For systems with lots of small, infrequently accessed databases, or
> for servers with fewer CPU cores, consider reducing this value to
> ``1`` or ``2``.

and:
> In a default 3-node cluster, each node would receive 8 shards.

So this commit tries to make the doc clearer.
@adrienverge
Copy link
Contributor Author

Thanks for the explanation @wohali, it's clear now.

I've updated the pull request according to your suggestions.

@wohali wohali merged commit 938a8e4 into apache:master Oct 11, 2018
@wohali
Copy link
Member

wohali commented Oct 11, 2018

Thanks for the help, @adrienverge !

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants