From 3f16cf2e2bbba183e36bbbc8f66848cdb2da650a Mon Sep 17 00:00:00 2001
From: Gary Gray <137797428+ggray-cb@users.noreply.github.com>
Date: Fri, 29 Mar 2024 14:13:01 -0400
Subject: [PATCH 1/6] First pass

---
 .../deployment-considerations-lt-3nodes.adoc | 51 +++++++++++++++----
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/modules/install/pages/deployment-considerations-lt-3nodes.adoc b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
index 4dc73a01ec..3ba259f0e9 100644
--- a/modules/install/pages/deployment-considerations-lt-3nodes.adoc
+++ b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
@@ -18,6 +18,7 @@ The following limitations apply to deployments with two nodes:
 +
 When a deployment of Couchbase Server has fewer than three nodes, auto-failover is disabled.
 This is because with fewer than three nodes in the deployment, it's not easy to determine which node is having an issue and thus avoid a split-brain configuration.
+You can optionally add an arbiter node to your cluster to to provide xref:#quorum-arbitration[quorum arbitration].
 
 * *Maximum number of replicas is 1*
 +
@@ -32,29 +33,57 @@ In the event of a failure, running the cluster at this level ensures that the re
 [#quorum-arbitration]
 === Quorum Arbitration
 
-For a two-node cluster, safer running can be achieved by the deployment of a third node, as an _Arbiter_ node: this means _a node that hosts no Couchbase service_.
-An Arbiter node can be deployed only under Couchbase Server Version 7.6 and later.
+If you're deploying a two-node database or a database with two server groups, consider adding an arbiter node to your cluster.
+An arbiter node is part of the Couchbase Server cluster, but it doesn't run any service other than the xref:learn:clusters-and-availability/cluster-manager.adoc[], which joins it to the cluster.
+The arbiter node helps your cluster in two ways:
 
-Note that although the addition of the Arbiter does mean that the cluster is thereby made a three-node cluster, the third node may not require the capacity of either of the data-serving nodes.
+* It provides xref:learn:clusters-and-availability/nodes.adoc#fast-failover[fast failover], which decreases the cluster's latency when responding to a failover.
 
-Such a deployment ensures that the cluster is protected against the situation where the nodes get divided into two separate sets, due to an unanticipated https://en.wikipedia.org/wiki/Network_partition[network partition^].
-For information, see xref:learn:clusters-and-availability/hard-failover.adoc[Hard Failover], and in particular, xref:learn:clusters-and-availability/hard-failover.adoc#default-and-unsafe[Hard Failover in Default and Unsafe Modes].
+* It provides quorum arbitration, which allows Couchbase Server to execute a failover when it would otherwise refuse to do so.
 
-For information on service deployment with:
+In a two-node cluster without an arbiter node, when one node fails the other cannot differentiate between node failure and a network failure causing https://en.wikipedia.org/wiki/Network_partition[network partitioning^].
+The remaining node refuses to fail over to avoid a split-brain configuration where two groups of nodes independently alter data and risk potential data conflicts.
+The arbiter node eliminates the possibility of conflict by ensuring only one set of nodes can continue to operate.
+It does this by providing a quorum to the remaining node, letting it know that it's safe to fail over and take on the role of the failed node.
 
-* The REST API, see xref:rest-api:rest-set-up-services.adoc[Assigning Services].
+The arbiter node fulfills a similar role when there are two xref:learn:clusters-and-availability/groups.adoc[server groups].
+In case of network partitioning, the arbiter node can make sure that only one server group continues processing.
+To play this role, the arbiter node should be in a third server group to maximize the chance of it remaining available if one of the server groups fails.
+Ideally, you should locate it in a separate rack when using physical hardware.
+When it runs in a container, cloud instance, or as a virtual machine, locate it on a separate host machine from the server groups.
 
-* Couchbase Web Console during node-addition and node-joining, see the demonstrated uses of checkboxes, in xref:manage:manage-nodes/add-node-and-rebalance#arbiter-node-addition[Add a Node and Rebalance] and in xref:manage:manage-nodes/join-cluster-and-rebalance#arbiter-node-join[Join a Cluster and Rebalance].
+To deploy an arbiter node, add or join a node to the cluster without assigning it any services.
+For instructions on adding a node, see:
 
-* The CLI during node-addition, see xref:cli:cbcli/couchbase-cli-server-add[server-add].
+* xref:rest-api:rest-set-up-services.adoc[Assigning Services] to deploy via the REST API.
+
+* xref:manage:manage-nodes/add-node-and-rebalance#arbiter-node-addition[Add a Node and Rebalance] and xref:manage:manage-nodes/join-cluster-and-rebalance#arbiter-node-join[Join a Cluster and Rebalance] to deploy via the Couchbase Server Web Console.
+
+* The xref:cli:cbcli/couchbase-cli-server-add[server-add] command to deploy via the command line interface.
+
+For more information about failover, see xref:learn:clusters-and-availability/hard-failover.adoc[Hard Failover], and in particular, xref:learn:clusters-and-availability/hard-failover.adoc#default-and-unsafe[Hard Failover in Default and Unsafe Modes].
+
+==== Arbiter Nodes and Licenses
+
+An arbiter node does not run any services other than the xref:learn:clusters-and-availability/cluster-manager.adoc[].
+This service is necessary to make the arbiter node part of the Couchbase Server cluster.
+Because it does not run other services, such as the xref:learn:services-and-indexes/services/data-service.adoc[], the arbiter node does not require an additional license to run.
+It's essentially free aside from the cost of running the physical hardware, container, or cloud instance.
+
+==== Sizing Recommendations for Arbiter Nodes
+
+Because the arbiter node only runs the Cluster Manager service, it often does not require the same resources as other nodes in the cluster.
+Your best option for choosing hardware or a cloud instance for an arbiter node is to start with the smallest instance you plan to or currently use in your cluster.
+Then monitor the arbiter node's CPU and RAM usage when your cluster experiences failovers and rebalances.
+These events in the cluster are what cause the highest loads on the arbiter node.
+If you find the arbiter is not using all of the CPU and RAM you allocated to it, you can choose to downsize the arbiter node.
 
 === Metadata Management
 
 In Couchbase Server 7.0+, metadata is managed by means of _Chronicle_, which is a _consensus-based_ system, based on the https://raft.github.io/[Raft^] algorithm.
 Due to the strong consistency with which topology-related metadata is thus managed, in the event of a _quorum failure_ (meaning, the unresponsiveness of at least half of the cluster's nodes -- for example, the unresponsiveness of one node in a two-node cluster), no modification of nodes, buckets, scopes, and collections can take place until the quorum failure is resolved.
-Note that optionally, the quorum failure can be resolved by means of _unsafe failover_.
-However, that the consequences of unsafe failover in 7.0 are different from those in previous versions; and the new consequences should be fully understood before unsafe failover is attempted.
+
 For a complete overview of how all metadata is managed by Couchbase Server, see xref:learn:clusters-and-availability/metadata-management.adoc[Metadata Management].
 For information on _unsafe failover_ and its consequences, see xref:learn:clusters-and-availability/hard-failover.adoc#performing-an-unsafe-failover[Performing an Unsafe Failover].

From 8ee395bee1dc79de2220c14698c1ed590895a308 Mon Sep 17 00:00:00 2001
From: Gary Gray <137797428+ggray-cb@users.noreply.github.com>
Date: Mon, 1 Apr 2024 09:47:36 -0400
Subject: [PATCH 2/6] Minor edit

---
 modules/install/pages/deployment-considerations-lt-3nodes.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/install/pages/deployment-considerations-lt-3nodes.adoc b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
index 3ba259f0e9..6ecec98665 100644
--- a/modules/install/pages/deployment-considerations-lt-3nodes.adoc
+++ b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
@@ -44,7 +44,7 @@ The arbiter node helps your cluster in two ways:
 In a two-node cluster without an arbiter node, when one node fails the other cannot differentiate between node failure and a network failure causing https://en.wikipedia.org/wiki/Network_partition[network partitioning^].
 The remaining node refuses to fail over to avoid a split-brain configuration where two groups of nodes independently alter data and risk potential data conflicts.
 The arbiter node eliminates the possibility of conflict by ensuring only one set of nodes can continue to operate.
-It does this by providing a quorum to the remaining node, letting it know that it's safe to fail over and take on the role of the failed node.
+The arbiter provides a quorum to the remaining node, letting it know that it's safe to fail over and take on the role of the failed node.
 
 The arbiter node fulfills a similar role when there are two xref:learn:clusters-and-availability/groups.adoc[server groups].
 In case of network partitioning, the arbiter node can make sure that only one server group continues processing.
From 8d7ed205525ba5364272f17523c8fb8126819f04 Mon Sep 17 00:00:00 2001
From: Gary Gray <137797428+ggray-cb@users.noreply.github.com>
Date: Mon, 1 Apr 2024 10:11:54 -0400
Subject: [PATCH 3/6] Minor fixes

---
 .../install/pages/deployment-considerations-lt-3nodes.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/modules/install/pages/deployment-considerations-lt-3nodes.adoc b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
index 6ecec98665..bb96c9c280 100644
--- a/modules/install/pages/deployment-considerations-lt-3nodes.adoc
+++ b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
@@ -18,7 +18,7 @@ The following limitations apply to deployments with two nodes:
 +
 When a deployment of Couchbase Server has fewer than three nodes, auto-failover is disabled.
 This is because with fewer than three nodes in the deployment, it's not easy to determine which node is having an issue and thus avoid a split-brain configuration.
-You can optionally add an arbiter node to your cluster to to provide xref:#quorum-arbitration[quorum arbitration].
+You can optionally add an arbiter node to your cluster to provide xref:#quorum-arbitration[quorum arbitration].
 
 * *Maximum number of replicas is 1*
 +
@@ -76,7 +76,7 @@ Because the arbiter node only runs the Cluster Manager service, it often does no
 Your best option for choosing hardware or a cloud instance for an arbiter node is to start with the smallest instance you plan to or currently use in your cluster.
 Then monitor the arbiter node's CPU and RAM usage when your cluster experiences failovers and rebalances.
 These events in the cluster are what cause the highest loads on the arbiter node.
-If you find the arbiter is not using all of the CPU and RAM you allocated to it, you can choose to downsize the arbiter node.
+If you find the arbiter is not using all of the CPU and RAM you allocated to it, you can choose to downsize it.
 
 === Metadata Management

From 4195b247c8c3c266b6bbd60bef30a32f477d5a9a Mon Sep 17 00:00:00 2001
From: Gary Gray <137797428+ggray-cb@users.noreply.github.com>
Date: Mon, 1 Apr 2024 11:37:34 -0400
Subject: [PATCH 4/6] More minor updates

---
 .../install/pages/deployment-considerations-lt-3nodes.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/modules/install/pages/deployment-considerations-lt-3nodes.adoc b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
index bb96c9c280..d288a8659e 100644
--- a/modules/install/pages/deployment-considerations-lt-3nodes.adoc
+++ b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
@@ -72,8 +72,8 @@ It's essentially free aside from the cost of running the physical hardware, cont
 
 ==== Sizing Recommendations for Arbiter Nodes
 
-Because the arbiter node only runs the Cluster Manager service, it often does not require the same resources as other nodes in the cluster.
-Your best option for choosing hardware or a cloud instance for an arbiter node is to start with the smallest instance you plan to or currently use in your cluster.
+Because the arbiter node runs only the Cluster Manager service, it often does not require the same resources as other nodes in the cluster.
+Your best option for choosing hardware or a cloud instance for an arbiter node is to start with the smallest instance you plan to or are already using in your cluster.
 Then monitor the arbiter node's CPU and RAM usage when your cluster experiences failovers and rebalances.
 These events in the cluster are what cause the highest loads on the arbiter node.
 If you find the arbiter is not using all of the CPU and RAM you allocated to it, you can choose to downsize it.

From ba4a643385971e9e5165239097293831d37951a9 Mon Sep 17 00:00:00 2001
From: Gary Gray <137797428+ggray-cb@users.noreply.github.com>
Date: Thu, 11 Apr 2024 11:33:36 -0400
Subject: [PATCH 5/6] Edit based on Ben's feedback.
---
 .../install/pages/deployment-considerations-lt-3nodes.adoc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/modules/install/pages/deployment-considerations-lt-3nodes.adoc b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
index d288a8659e..6c68880ef8 100644
--- a/modules/install/pages/deployment-considerations-lt-3nodes.adoc
+++ b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
@@ -75,8 +75,10 @@ It's essentially free aside from the cost of running the physical hardware, cont
 Because the arbiter node runs only the Cluster Manager service, it often does not require the same resources as other nodes in the cluster.
 Your best option for choosing hardware or a cloud instance for an arbiter node is to start with the smallest instance you plan to or are already using in your cluster.
 Then monitor the arbiter node's CPU and RAM usage when your cluster experiences failovers and rebalances.
-These events in the cluster are what cause the highest loads on the arbiter node.
+These events (especially rebalances) in the cluster cause the highest CPU loads and memory use on the arbiter node.
 If you find the arbiter is not using all of the CPU and RAM you allocated to it, you can choose to downsize it.
+The arbiter's RAM and CPU use grow as the number of nodes, buckets, and collections grow in your cluster.
+Therefore, continue to monitor the RAM and CPU use of your arbiter node, especially after you expand your cluster.
 
 === Metadata Management

From 0b41a00141ea4609e5b322f53d5045db20766d52 Mon Sep 17 00:00:00 2001
From: Gary Gray <137797428+ggray-cb@users.noreply.github.com>
Date: Thu, 11 Apr 2024 13:37:16 -0400
Subject: [PATCH 6/6] Grammar edits.
---
 modules/install/pages/deployment-considerations-lt-3nodes.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/install/pages/deployment-considerations-lt-3nodes.adoc b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
index 6c68880ef8..7f7118fda6 100644
--- a/modules/install/pages/deployment-considerations-lt-3nodes.adoc
+++ b/modules/install/pages/deployment-considerations-lt-3nodes.adoc
@@ -75,7 +75,7 @@ It's essentially free aside from the cost of running the physical hardware, cont
 Because the arbiter node runs only the Cluster Manager service, it often does not require the same resources as other nodes in the cluster.
 Your best option for choosing hardware or a cloud instance for an arbiter node is to start with the smallest instance you plan to or are already using in your cluster.
 Then monitor the arbiter node's CPU and RAM usage when your cluster experiences failovers and rebalances.
-These events (especially rebalances) in the cluster cause the highest CPU loads and memory use on the arbiter node.
+Rebalances in the cluster cause the highest CPU loads and memory use on the arbiter node.
 If you find the arbiter is not using all of the CPU and RAM you allocated to it, you can choose to downsize it.
 The arbiter's RAM and CPU use grow as the number of nodes, buckets, and collections grow in your cluster.
 Therefore, continue to monitor the RAM and CPU use of your arbiter node, especially after you expand your cluster.
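
The patched page tells readers to deploy an arbiter by adding a node without assigning it any services, via the `server-add` command. A rough command-line sketch of that workflow follows; it is an illustration, not part of the patch series. All host names and credentials are placeholders, and the assumption that omitting the `--services` option leaves the new node service-less should be verified against the `server-add` reference page linked in the docs.

```shell
# Add a service-less (arbiter) node to an existing cluster (Couchbase Server 7.6+).
# Placeholder hosts/credentials; no --services option is passed, on the assumption
# that this leaves the node running only the Cluster Manager.
couchbase-cli server-add \
  --cluster https://cluster-node-1:18091 \
  --username Administrator \
  --password example-password \
  --server-add https://arbiter-node:18091 \
  --server-add-username Administrator \
  --server-add-password example-password

# Rebalance so the newly added arbiter node becomes an active cluster member.
couchbase-cli rebalance \
  --cluster https://cluster-node-1:18091 \
  --username Administrator \
  --password example-password
```

These commands require a reachable cluster and so are not runnable in isolation; the same addition can be performed through the REST API or the Web Console checkboxes described in the patched text.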