forked from elastic/elasticsearch
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Adds documentation suggesting reducing `tcp_retries2` on Linux to detect network partitions more quickly. Relates elastic#34405
- Loading branch information
1 parent
3515909
commit cdec0f8
Showing
2 changed files
with
52 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
[[system-config-tcpretries]] | ||
=== TCP retransmission timeout | ||
|
||
Each pair of nodes in a cluster communicates via a number of TCP connections | ||
which remain open until one of the nodes shuts down or communication between | ||
the nodes is disrupted by a failure in the underlying infrastructure. | ||
|
||
TCP provides reliable communication over occasionally-unreliable networks by | ||
hiding temporary network disruptions from the communicating applications. Your | ||
operating system will retransmit any lost messages a number of times before | ||
informing the sender of any problem. Most Linux distributions default to | ||
retransmitting any lost packets 15 times. Retransmissions back off | ||
exponentially, so these 15 retransmissions take over 900 seconds to complete. | ||
This means it takes Linux many minutes to detect a network partition or a | ||
failed node with this method. Windows defaults to just 5 retransmissions which | ||
corresponds with a timeout of around 6 seconds. | ||
|
||
The Linux default allows for communication over networks that may experience | ||
very long periods of packet loss, but this default is excessive for production | ||
networks within a single data centre as is the case for most {es} clusters. | ||
Highly-available clusters must be able to detect node failures quickly so that | ||
they can react promptly by reallocating lost shards, rerouting searches and | ||
perhaps electing a new master node. Linux users should therefore reduce the | ||
maximum number of TCP retransmissions. | ||
|
||
You can decrease the maximum number of TCP retransmissions to `5` by running | ||
the following command as `root`. Five retransmissions corresponds with a | ||
timeout of around 6 seconds. | ||
|
||
[source,sh] | ||
------------------------------------- | ||
sysctl -w net.ipv4.tcp_retries2=5 | ||
------------------------------------- | ||
|
||
To set this value permanently, update the `net.ipv4.tcp_retries2` setting in | ||
`/etc/sysctl.conf`. To verify after rebooting, run `sysctl | ||
net.ipv4.tcp_retries2`. | ||
|
||
{es} also implements its own health checks with timeouts that are much shorter | ||
than the default retransmission timeout on Linux. However these health checks | ||
must allow for application-level effects such as garbage collection pauses. We | ||
do not recommend reducing any timeouts related to these application-level | ||
health checks. | ||
|
||
IMPORTANT: This setting applies to all TCP connections and will affect the | ||
reliability of communication with systems outside your cluster too. If your | ||
cluster communicates with external systems over an unreliable network then you | ||
may need to select a higher value for `net.ipv4.tcp_retries2`. For this reason, | ||
{es} does not adjust this setting automatically. |