Motivation
The Java client introduced SameAuthParamsLookupAutoClusterFailover in apache/pulsar#23129 (merged August 2024, released in Pulsar 4.0.0 and backported to 3.0.7 / 3.3.2). This ServiceUrlProvider implementation addresses a well-known reliability gap in AutoClusterFailover that is particularly relevant to geo-replication deployments sitting behind a Pulsar Proxy.
The problem with AutoClusterFailover: its health probe is a raw TCP connection. In a typical deployment where a Pulsar Proxy fronts the brokers, the TCP probe succeeds as soon as the proxy accepts the connection — even if all brokers behind the proxy have crashed. This means AutoClusterFailover cannot detect broker-layer failure and may reconnect clients to a cluster that is not actually serving requests.
What SameAuthParamsLookupAutoClusterFailover does differently:
- Probes cluster health via a topic lookup (getBroker() on a configurable test topic) rather than a raw TCP connection. A broker that can respond to a lookup is demonstrably processing requests, so the proxy cannot mask broker failure here.
- Introduces a hysteresis state machine with separate failoverThreshold and recoverThreshold counters (default 5 each), requiring consecutive failures before cutting over and consecutive successes before switching back. This prevents flapping without requiring a coarse switchBackDelay timer.
- Targets geo-replication topologies where all clusters share the same authentication credentials, which is the common case.
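To make the threshold behavior concrete, here is a minimal C++ sketch of the consecutive-failure / consecutive-success counting described above. It is an illustration only, not the Java implementation and not an existing pulsar-client-cpp API: the names HysteresisTracker, Health, and onProbe are hypothetical, the tracker covers a single cluster, and the caller is assumed to supply the result of each lookup probe as a boolean.

```cpp
#include <cassert>

// Hypothetical names throughout; this is NOT part of pulsar-client-cpp.
enum class Health { Healthy, Failed };

// Tracks one cluster's health from a stream of probe results.
// The state flips only after N consecutive results that contradict it.
class HysteresisTracker {
public:
    explicit HysteresisTracker(int failoverThreshold = 5, int recoverThreshold = 5)
        : failoverThreshold_(failoverThreshold), recoverThreshold_(recoverThreshold) {
        assert(failoverThreshold >= 1 && recoverThreshold >= 1);
    }

    // Feed one probe result (true = lookup succeeded).
    // Returns the state after applying it.
    Health onProbe(bool lookupSucceeded) {
        const bool contradictsState =
            (state_ == Health::Healthy) ? !lookupSucceeded : lookupSucceeded;
        if (!contradictsState) {
            streak_ = 0;  // any agreeing result resets the streak
            return state_;
        }
        const int threshold =
            (state_ == Health::Healthy) ? failoverThreshold_ : recoverThreshold_;
        if (++streak_ >= threshold) {
            state_ = (state_ == Health::Healthy) ? Health::Failed : Health::Healthy;
            streak_ = 0;
        }
        return state_;
    }

    Health state() const { return state_; }

private:
    int failoverThreshold_;
    int recoverThreshold_;
    int streak_ = 0;  // consecutive results contradicting the current state
    Health state_ = Health::Healthy;
};
```

A provider built on this idea would presumably call onProbe once per check interval with the outcome of the test-topic lookup, and switch service URLs only when the returned state changes. A single successful probe in the middle of a failure streak resets the count, which is what suppresses flapping.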
Request
Port SameAuthParamsLookupAutoClusterFailover to the C++ client.
The ServiceInfoProvider interface is already part of the C++ public API (include/pulsar/ServiceInfoProvider.h), and AutoClusterFailover is already implemented against it — so the interface contract is defined and the pattern is established. The Java implementation (SameAuthParamsLookupAutoClusterFailover.java) serves as a direct reference.
Impact
The C++ client is the foundation for the Node.js client binding. Once SameAuthParamsLookupAutoClusterFailover is available in C++, it can be surfaced to Node.js consumers as well — a client language that currently has no automatic failover support at all.
This would bring C++ and Node.js deployments to parity with Java on the most important AutoClusterFailover reliability fix for proxy-fronted geo-replication clusters.
References
- include/pulsar/ServiceInfoProvider.h: existing C++ interface
- lib/AutoClusterFailover.cc: existing C++ AutoClusterFailover implementation (reference for structure)