Search before creation
Documentation Related
I would like to request clearer official documentation for RocketMQ deployments across two same-city availability zones or data centers.
This is not intended as environment-specific consulting. The goal is to make the official documentation more explicit about the supported production topology, minimum node count, failure recovery boundaries, and active-active limitations for same-city dual-site deployments.
The scenario is:
- two availability zones or data centers in the same city;
- both sites are expected to serve production traffic;
- producers and consumers may connect to either site;
- the deployment should tolerate single-node failures and, where possible, one-site failures;
- the architecture should avoid split-brain, ambiguous failover behavior, and reliance on message availability guarantees that are not officially supported.
It would be very helpful if the documentation could clarify the recommended approach for this scenario, especially:
- Whether RocketMQ recommends a single logical cluster stretched across the two sites, or separate RocketMQ clusters with replication/application-level routing.
- The minimum production-ready node count for NameServer, Broker, Controller, and/or DLedger-based deployments in this scenario.
- Whether a third failure domain, arbitration node, or witness-like deployment is required for quorum and split-brain avoidance.
- How Broker master/slave replicas, Controller nodes, DLedger groups, and NameServer nodes should be distributed across the two sites.
- Which failure scenarios can recover automatically, for example single Broker failure, Controller leader failure, one-site failure, NameServer failure, cross-site network partition, or loss of an arbitration node.
- Whether active-active writes to the same logical topic from both sites are supported, discouraged, or intentionally out of scope.
- If active-active writes are not recommended, what the official alternative is, such as active-passive disaster recovery, dual clusters with application-level routing, or another documented pattern.
- Whether there are special limitations for ordered messages, transactional messages, delayed messages, consumer offset consistency, and message duplication during failover.
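To illustrate why the third-failure-domain question matters, here is a minimal sketch of the generic majority-quorum arithmetic. It assumes the Controller or DLedger group elects a leader by strict majority, as Raft-style protocols do. The site names, voter counts, and helper functions are hypothetical and are not taken from RocketMQ; the sketch does not describe actual RocketMQ failover behavior, which is exactly what the documentation would need to confirm.

```python
from typing import Dict


def majority(n: int) -> int:
    """Smallest number of voters that forms a strict majority of n."""
    return n // 2 + 1


def survives_site_loss(voters_per_site: Dict[str, int]) -> Dict[str, bool]:
    """For each failure domain, report whether the voters left after
    losing that whole domain still form a strict majority."""
    total = sum(voters_per_site.values())
    need = majority(total)
    return {
        site: (total - count) >= need
        for site, count in voters_per_site.items()
    }


# Hypothetical layout 1: three voters spread over two sites only.
two_sites = survives_site_loss({"site-a": 2, "site-b": 1})

# Hypothetical layout 2: one voter per site plus a third witness domain.
three_domains = survives_site_loss({"site-a": 1, "site-b": 1, "witness": 1})

print(two_sites)
print(three_domains)
```

Under these assumptions, no two-site split of the voters keeps a majority after the loss of either site: an uneven split fails when the larger site goes down, and an even split fails in both cases. With a third failure domain, any single domain can be lost. It would help if the documentation stated whether this reasoning applies to RocketMQ as-is.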
A reference architecture or decision matrix in the documentation would be valuable. For example, it could compare:
- a single RocketMQ cluster deployed across two same-city sites;
- a two-site deployment plus a third quorum or arbitration failure domain;
- two independent RocketMQ clusters with replication or application-level routing;
- active-passive disaster recovery;
- patterns that are not recommended, such as unsupported active-active writes to the same logical topic.
This clarification would help production users avoid incorrect assumptions about quorum, failover, data consistency, and message availability when designing same-city dual-site RocketMQ architectures.
Are you willing to submit PR?