Improve availability by caching system.peers #864
Labels: community/request, kind/enhancement
When JVM Cassandra clients connect to YugaByte's CQL interface, they make an immediate, synchronous request to discover the other nodes in the cluster by querying the system.peers table. The client caches this information so it can automatically route requests to other peers, both for load balancing and for fault tolerance.
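For illustration, here is a minimal sketch in Java of the kind of query involved, using the DataStax driver that most JVM Cassandra clients build on. The contact point, port, and selected columns are assumptions for the example, not a transcription of the driver's internal discovery code:

```java
// Illustrative sketch: roughly what a JVM client's peer discovery amounts to.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class PeerDiscoverySketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")   // any reachable YCQL node (placeholder)
                .withPort(9042)
                .build();
             Session session = cluster.connect()) {

            // The driver issues an equivalent query synchronously on connect
            // and caches the result for load balancing and failover.
            ResultSet peers = session.execute(
                "SELECT peer, rpc_address, data_center, rack FROM system.peers");
            for (Row row : peers) {
                System.out.printf("peer=%s rpc_address=%s dc=%s rack=%s%n",
                        row.getInet("peer"), row.getInet("rpc_address"),
                        row.getString("data_center"), row.getString("rack"));
            }
        }
    }
}
```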
In YugaByte 1.1.13.0-b2 CE, the system.peers table is stored on the masters. This means that when a given node n cannot reach a master that is connected to a majority of the masters (e.g. because the local node is partitioned away from the masters, the masters are partitioned from each other, master nodes have crashed, or some combination thereof), no client can connect to n.

While it may be advantageous to perform linearizable reads of the system.peers table in some circumstances, the common case (what clients do automatically at the start of every connection) does not require up-to-date information. Clients already cache the results of system.peers queries, and even if a client receives only a partial list of peers (including the trivial list containing only the node it is talking to), that is still enough to begin doing work.
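As a concrete illustration of that client-side caching, the driver exposes its locally cached topology (built from the initial system.peers read) without any further round trips. A small sketch, assuming the same DataStax driver as above:

```java
// Sketch: reading the driver's cached view of the topology. This view keeps
// serving the load-balancing policy even if later metadata refreshes fail.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;

public class CachedPeersSketch {
    public static void printCachedHosts(Cluster cluster) {
        // Metadata.getAllHosts() returns the locally cached host set;
        // no query against system.peers is made for this call.
        for (Host host : cluster.getMetadata().getAllHosts()) {
            System.out.printf("host=%s dc=%s up=%b%n",
                    host.getAddress(), host.getDatacenter(), host.isUp());
        }
    }
}
```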
By caching the system.peers table, and/or by returning a trivial row containing only the current node when no master is available, YugaByte DB could let clients continue to connect and do work during some fault conditions.
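A minimal sketch of that fallback logic, written in Java purely to illustrate the idea (the actual YCQL layer is implemented in C++, and every name below is hypothetical):

```java
// Hypothetical sketch of the proposed fallback: serve a cached copy of
// system.peers, and if neither the masters nor a cache are available,
// answer with a single trivial row describing the local node.
import java.net.InetAddress;
import java.util.Collections;
import java.util.List;

public class PeersFallbackSketch {
    private volatile List<InetAddress> cachedPeers = Collections.emptyList();
    private final InetAddress localRpcAddress;

    public PeersFallbackSketch(InetAddress localRpcAddress) {
        this.localRpcAddress = localRpcAddress;
    }

    /** Called whenever a fresh peer list is successfully fetched from the masters. */
    public void onMasterResponse(List<InetAddress> peersFromMaster) {
        cachedPeers = List.copyOf(peersFromMaster);
    }

    /** Answer a system.peers query even when the masters are currently unreachable. */
    public List<InetAddress> answerPeersQuery(boolean mastersReachable,
                                              List<InetAddress> freshPeers) {
        if (mastersReachable) {
            onMasterResponse(freshPeers);
            return cachedPeers;
        }
        if (!cachedPeers.isEmpty()) {
            return cachedPeers;               // stale but still usable
        }
        return List.of(localRpcAddress);      // trivial answer: only this node
    }
}
```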