DOC-11185 document fast failover/node without services #3378

Merged
merged 7 commits into couchbase:release/7.6
Mar 14, 2024

tonyjhillman
Contributor

No description provided.

The _Data Service_ must run on at least one node.
Each node can run any number of services, up to the maximum, which is seven.
In Couchbase Enterprise Server Version 7.6+, a node can be configured to run _no_ service.
The _Data Service_ must run on at least one node of the cluster.
Member

I'm not sure this is important here, but the first node of the cluster (the one that creates the cluster) must have the Data Service configured.

* _Quorum Arbitration_.
See xref:install:deployment-considerations-lt-3nodes.adoc#quorum-arbitration[Quorum Arbitration].

Note, however, that in the creation of an initial, one-node cluster, the node must always be assigned at least the Data Service.
Member

Ah, I see it mentioned here.

@@ -21,3 +21,11 @@ See xref:rest-api:rest-setting-security.adoc[Configure On-the-Wire Security].
* Credentials for Couchbase-Server internal users can now be rotated at any time, by means of the REST API.
See xref:rest-api:rest-rotate-internal-credentials.adoc[Rotate Internal Credentials].

* One or more _Serviceless Nodes_ can now be added to a cluster.
Contributor

This should change to Arbiter Node


(Disclaimer: I've not read the doc proposal in its entirety yet; this comment is just about the naming of this new capability.)

I agree with Tony's comments in the email that @stevewatanabe shared that "Arbiter Node" isn't a good description of this feature. Rather than describe a new capability of the server (that we can add nodes running only the cluster manager) it describes one use-case for those nodes (acting as an arbiter). Given that we've identified at least two uses for these types of nodes so far (acting as an arbiter and for preferential orchestrator selection) I don't think that it's a particularly good idea to name them after one use case, rather than the capability that we've added. We don't know what other use cases we might find for these in the future, and "Arbiter Node" might not fit those particularly well.

A couple of points additionally worth highlighting:

  1. In addition to the faster failover case, the preferential orchestrator selection can additionally be used to take orchestrator burden (the increased CPU/memory required) away from any other node.
  2. Similarly to the example given here, that any non-KV node can be used to the same extent as a service-less node to achieve faster failover, any node running any service can act as an arbiter to achieve a quorum majority. We've recommended, and I believe seen a few customers use, a backup service node for this in the past.

I'm not a fan of naming things, but IMO we should call these something like "Cluster Manager Node" (perhaps "Cluster Manager Only Node"), and explicitly call out that other service nodes will pick up cluster management activities if such a node is used, and that this is perfectly fine for most use cases.

Contributor

I can't come up with a better term off the top of my head (and I haven't fully read the draft doc or the associated PRDs). At my previous company, we called a similar feature "standby nodes."

But, assuming that "Arbiter Node" is mentioned in the UI, the API, or some message someplace in the product, it's too late to try to find the perfect term. I'll go ahead and update this PR with the approved "Arbiter Node" term.


FWIW this feature is only available in the REST API, it is not in the UI, and it is configured as an absence of services when we set up a node. We have not called this an "arbiter node" anywhere within ns_server.

Contributor

I've gone ahead and swapped all instances of serviceless node with arbiter node.

In cases where the Master Services are co-located with a highly subscribed service (in particular, the Data Service), this can result in unnecessary latency in failing over the node and ensuring that data can be served.

In consequence, the Master Services should ideally _not_ be co-located with any service.
Such a configuration is possible in Couchbase Enterprise Server 7.6+: one or more _serviceless_ nodes can be added to the cluster, with the intention of ensuring that the Master Services will occupy such a node, and thereby not be co-located with any service.
Contributor

@stevewatanabe please correct me if I'm wrong, but there's no requirement to use an arbiter node to achieve this goal. The cluster will automatically choose any node that isn't running the data service as a preference. The addition of an arbiter node is useful where there are no available nodes in the cluster in the required configuration such as a cluster of all data nodes.

Member

I've asked @BenHuddleston to chime in, as he implemented a weighting system based on the services, and their types, running on each node. Ian is correct that an arbiter node isn't required. Ben can provide more detailed information.


@b33f any non-KV node could be used for this. For your interest, the priority order is as follows (higher weight => lower priority, service-less nodes are implicitly 0):

%% Weight values used in mb_master to determine orchestrator placement.
-define(DEFAULT_SERVICE_WEIGHTS, [{kv, 10000},
                                  {index, 1000},
                                  {fts, 1000},
                                  {cbas, 1000},
                                  {n1ql, 100},
                                  {eventing, 100},
                                  {backup, 10}]).
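
To make the ordering concrete, here is a minimal Python sketch of how these weights could rank candidate nodes. It is an illustration only, not the mb_master implementation: the weight values are copied from the snippet above, but the assumption that a node's weight is the sum of its services' weights, the tie-break by node name, and the names `node_weight` and `preferred_orchestrator` are all hypothetical.

```python
# Weights copied from the DEFAULT_SERVICE_WEIGHTS snippet above.
SERVICE_WEIGHTS = {
    "kv": 10000,
    "index": 1000,
    "fts": 1000,
    "cbas": 1000,
    "n1ql": 100,
    "eventing": 100,
    "backup": 10,
}


def node_weight(services):
    """Total weight of a node. ASSUMPTION: weights are summed per node.
    A node running no services therefore has weight 0."""
    return sum(SERVICE_WEIGHTS[s] for s in services)


def preferred_orchestrator(nodes):
    """Return the name of the lowest-weight node (lower weight = higher priority).
    ASSUMPTION: ties are broken by node name; the real tie-break is not shown above."""
    return min(sorted(nodes), key=lambda name: node_weight(nodes[name]))


# Hypothetical three-node cluster.
cluster = {
    "node1": ["kv"],
    "node2": ["index", "n1ql", "fts"],
    "node3": [],  # service-less node
}

print(node_weight(cluster["node2"]))    # 2100
print(preferred_orchestrator(cluster))  # node3
```

Under these assumptions, removing `node3` would make `node2` the preferred node, because its three services total 2100, which is still below the 10000 of a node running only the Data Service.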

Contributor

@ggray-cb ggray-cb Mar 6, 2024

I added some more detail here to explain the rankings of services. I didn't put down the actual scores, but just mentioned that each "tier" is 10x its predecessor. Mainly, I think, it just answers potential user questions about why a node with more services running was picked over a node running just the KV service. Please let me know if that should be removed, or if I got it wrong.

@ggray-cb ggray-cb changed the title from "Doc 11565" to "DOC-11185 document fast failover/node without services" Mar 5, 2024
@ggray-cb
Contributor

ggray-cb commented Mar 5, 2024

Renamed PR to match outstanding JIRA ticket so I don't lose this thing again.

@ggray-cb
Contributor

I will be merging this tomorrow (March 14th) unless there are objections.

@BenHuddleston

No new comments from me, thanks Gary.

@bfavini bfavini merged commit 981c7db into couchbase:release/7.6 Mar 14, 2024
6 participants