base: reduce network timeouts
This patch reduces the network timeout from 3 seconds to 2 seconds. This
change also affects gRPC keepalive intervals/timeouts.

Furthermore, the RPC heartbeat interval is now reduced to half of the
network timeout (from 3 seconds to 1 second), with a timeout equal to
the network timeout (from 6 seconds to 2 seconds).

When a peer is unresponsive, these timeouts determine how quickly RPC
calls (and thus critical operations such as lease acquisitions) will be
retried against a different node. Reducing them therefore improves
recovery time during infrastructure outages.

Release note (ops change): The network timeout for RPC connections
between cluster nodes has been reduced from 3 seconds to 2 seconds, in
order to reduce unavailability and tail latencies during infrastructure
outages.
erikgrinaker committed Nov 27, 2022
1 parent 1a6e9f8 commit 71fd9eb
Showing 1 changed file with 13 additions and 4 deletions.
17 changes: 13 additions & 4 deletions pkg/base/config.go
@@ -48,7 +48,14 @@ const (
 	defaultHTTPAddr = ":" + DefaultHTTPPort

 	// NetworkTimeout is the timeout used for network operations.
-	NetworkTimeout = 3 * time.Second
+	//
+	// The maximum RTT between GCP regions is roughly 350 ms (asia-south2 to
+	// southamerica-west1). Linux has an RTT-dependent retransmission timeout
+	// (RTO) which we can approximate as 1.5x RTT (smoothed RTT + 4x RTT
+	// variance), with a lower bound of 200ms, so the worst-case RTO is 600ms. 2s
+	// should therefore be sufficient under most normal conditions.
+	// https://datastudio.google.com/reporting/fc733b10-9744-4a72-a502-92290f608571/page/70YCB
+	NetworkTimeout = 2 * time.Second

 	// defaultRaftTickInterval is the default resolution of the Raft timer.
 	defaultRaftTickInterval = 200 * time.Millisecond
@@ -66,9 +73,11 @@ const (
 	// each heartbeat.
 	defaultRaftHeartbeatIntervalTicks = 5

-	// defaultRPCHeartbeatInterval is the default value of RPCHeartbeatIntervalAndHalfTimeout
-	// used by the rpc context.
-	defaultRPCHeartbeatInterval = 3 * time.Second
+	// defaultRPCHeartbeatInterval is the default value of
+	// RPCHeartbeatIntervalAndHalfTimeout used by the RPC context. The heartbeat
+	// timeout is twice this value, and we want that to be equivalent to
+	// NetworkTimeout to quickly detect peer unavailability.
+	defaultRPCHeartbeatInterval = NetworkTimeout / 2

 	// defaultRangeLeaseRenewalFraction specifies what fraction the range lease
 	// renewal duration should be of the range lease active time. For example,