New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vmstorageUserTimeout
flags to configure TCP User Timeout on Linux
#4423
Conversation
Nice PR! Would you mind fixing tests for CI? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@wjordan Could you please rebase to the latest version of cluster? |
Use TCP_TIMEOUT_USER linux socket option to close connection based on user provided timeout. Using default timeout and keepalive flow is not sufficient to close connection in some cases which leads to timeout not being applied. Effectively, if connection stops processing new packets it will take up to 1 minute(with default configuration) instead of 1 second(based on keepalive value) to mark connection as broken. This can be tested by adding firewall rule to deny packets to vmstorage port: iptables -A INPUT -p tcp --dport {port} -j DROP To reverse this: iptables -D INPUT -p tcp --dport {port} -j DROP See: #4423 for more details. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
f7427ad
to
3b667c8
Compare
`TCP_USER_TIMEOUT` (since Linux 2.6.37) specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the connection and return `ETIMEDOUT` to the application. Setting a low TCP user timeout allows RPC connections quickly reroute around unavailable storage nodes during network interruptions.
@zekker6 I've rebased the PR, updated the test and added some additional context to the flag descriptions. |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## cluster #4423 +/- ##
===========================================
- Coverage 59.35% 59.32% -0.03%
===========================================
Files 393 394 +1
Lines 74723 74735 +12
===========================================
- Hits 44349 44338 -11
- Misses 27944 27964 +20
- Partials 2430 2433 +3
☔ View full report in Codecov by Sentry. |
@wjordan , thanks for the pull request! |
@wjordan , I set default value for
|
FYI, this pull request has been included in cluster version of VictoriaMetrics starting from v1.94.0. |
The
TCP_USER_TIMEOUT
socket option (since Linux 2.6.37) specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the connection and returnETIMEDOUT
to the application.Setting a low user timeout allows RPC connections to be closed and more quickly rerouted to alternate storage nodes during network interruptions:
For
vminsert
, setting a user timeout would allow an insert request to be configured to route around an unavailable storage node in milliseconds, instead of waiting for the full >60-second request timeout to mark the node as broken.For
vmselect
, setting a user timeout would be an additional solution to the issues described in #711 and #3378, allowing rerouted and partial-result queries to be returned more quickly when some storage nodes are unavailable.The added
vmstorageUserTimeout
flags will both default to0
where they will be ignored, so this PR shouldn't introduce any breaking changes.Related note: the existing 1-second keepalive mentioned in #711 (comment) isn't sufficient to automatically mark unavailable nodes as broken in max 1 second as described. First, TCP keepalive only applies to completely idle connections and doesn't work at all if there's data waiting in the send buffer or in transit when the network is interrupted. In that case, the TCP retransmission limit eventually closes the connection (
tcp_retries2
sysctl defaults to 15 retries =~16 minutes). Second, a number of keepalive probes are sent before closing an idle connection (tcp_keepalive_probes
sysctl defaults to 9), so an idle connection would eventually close after 9 seconds of failed probes.See https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ for more discussion of the interactions between
SO_KEEPALIVE
andTCP_USER_TIMEOUT
.