Add `vmstorageUserTimeout` flags to configure TCP User Timeout on Linux #4423

wjordan · 2023-06-07T20:53:25Z

The TCP_USER_TIMEOUT socket option (since Linux 2.6.37) specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the connection and return ETIMEDOUT to the application.

Setting a low user timeout allows RPC connections to be closed and more quickly rerouted to alternate storage nodes during network interruptions:

For vminsert, setting a user timeout would allow an insert request to be configured to route around an unavailable storage node in milliseconds, instead of waiting for the full >60-second request timeout to mark the node as broken.

For vmselect, setting a user timeout would be an additional solution to the issues described in #711 and #3378, allowing rerouted and partial-result queries to be returned more quickly when some storage nodes are unavailable.

The added vmstorageUserTimeout flags will both default to 0 where they will be ignored, so this PR shouldn't introduce any breaking changes.

Related note: the existing 1-second keepalive mentioned in #711 (comment) isn't sufficient to automatically mark unavailable nodes as broken in max 1 second as described. First, TCP keepalive only applies to completely idle connections and doesn't work at all if there's data waiting in the send buffer or in transit when the network is interrupted. In that case, the TCP retransmission limit eventually closes the connection (tcp_retries2 sysctl defaults to 15 retries =~16 minutes). Second, a number of keepalive probes are sent before closing an idle connection (tcp_keepalive_probes sysctl defaults to 9), so an idle connection would eventually close after 9 seconds of failed probes.

See https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ for more discussion of the interactions between SO_KEEPALIVE and TCP_USER_TIMEOUT.

app/vmselect/netstorage/netstorage.go

hagen1778 · 2023-06-09T07:23:07Z

Nice PR! Would you mind fixing tests for CI?

f41gh7

LGTM

zekker6 · 2023-08-09T14:34:08Z

@wjordan Could you please rebase to the latest version of cluster?
Also, it seems like there are tests failing in CI, would you mind checking tests related to this change?

Use TCP_TIMEOUT_USER linux socket option to close connection based on user provided timeout. Using default timeout and keepalive flow is not sufficient to close connection in some cases which leads to timeout not being applied. Effectively, if connection stops processing new packets it will take up to 1 minute(with default configuration) instead of 1 second(based on keepalive value) to mark connection as broken. This can be tested by adding firewall rule to deny packets to vmstorage port: iptables -A INPUT -p tcp --dport {port} -j DROP To reverse this: iptables -D INPUT -p tcp --dport {port} -j DROP See: #4423 for more details. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

`TCP_USER_TIMEOUT` (since Linux 2.6.37) specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the connection and return `ETIMEDOUT` to the application. Setting a low TCP user timeout allows RPC connections quickly reroute around unavailable storage nodes during network interruptions.

wjordan · 2023-08-28T16:59:44Z

@zekker6 I've rebased the PR, updated the test and added some additional context to the flag descriptions.

codecov · 2023-08-28T23:11:37Z

Codecov Report

Patch coverage: 29.41% and project coverage change: -0.03% ⚠️

Comparison is base (db0ae3f) 59.35% compared to head (9bdaf0a) 59.32%.
Report is 10 commits behind head on cluster.

Additional details and impacted files

@@             Coverage Diff             @@
##           cluster    #4423      +/-   ##
===========================================
- Coverage    59.35%   59.32%   -0.03%     
===========================================
  Files          393      394       +1     
  Lines        74723    74735      +12     
===========================================
- Hits         44349    44338      -11     
- Misses       27944    27964      +20     
- Partials      2430     2433       +3

Files Changed	Coverage Δ
lib/netutil/tcpdialer_linux.go	`0.00% <0.00%> (ø)`
lib/netutil/tcpdialer.go	`63.41% <10.00%> (-17.84%)`	⬇️
app/vminsert/netstorage/netstorage.go	`24.29% <100.00%> (ø)`
app/vmselect/netstorage/netstorage.go	`9.45% <100.00%> (ø)`
lib/netutil/conn_pool.go	`36.02% <100.00%> (ø)`

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

valyala · 2023-08-29T09:46:56Z

@wjordan , thanks for the pull request!

valyala · 2023-08-29T10:24:32Z

@wjordan , I set default value for -vmstorageUserTimeout to 3 seconds in the follow-up commit 19d6173 , since the default 0 value results in too long recovery duration when some of vmstorage nodes become unavailable because of networking issues:

minimum 60 seconds for vminsert nodes, since 60 seconds timeout is used for sending data to vmstorage.
up to 16 minutes for vmselect nodes, since there is no write deadline when sending requests from vmselect to vmstorage.

valyala · 2023-10-02T23:04:41Z

FYI, this pull request has been included in cluster version of VictoriaMetrics starting from v1.94.0.

zekker6 requested review from valyala, hagen1778 and f41gh7 June 8, 2023 08:42

hagen1778 reviewed Jun 9, 2023

View reviewed changes

app/vmselect/netstorage/netstorage.go Outdated Show resolved Hide resolved

hagen1778 assigned zekker6 Aug 3, 2023

f41gh7 reviewed Aug 8, 2023

View reviewed changes

zekker6 mentioned this pull request Aug 14, 2023

lib/netutil/tcpdialer: enhance timeout handling #4829

Closed

wjordan force-pushed the tcptimeout branch 2 times, most recently from f7427ad to 3b667c8 Compare August 28, 2023 16:56

wjordan force-pushed the tcptimeout branch from 3b667c8 to 9bdaf0a Compare August 28, 2023 16:58

valyala approved these changes Aug 29, 2023

View reviewed changes

valyala merged commit 2b7b329 into VictoriaMetrics:cluster Aug 29, 2023
8 of 10 checks passed

valyala mentioned this pull request Sep 6, 2023

vmcluster: rerouting enhancement #4922

Closed

3 tasks

valyala mentioned this pull request Feb 7, 2024

Improve detection slow/unhealthy vmstorage #515

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `vmstorageUserTimeout` flags to configure TCP User Timeout on Linux #4423

Add `vmstorageUserTimeout` flags to configure TCP User Timeout on Linux #4423

wjordan commented Jun 7, 2023

hagen1778 commented Jun 9, 2023

f41gh7 left a comment

zekker6 commented Aug 9, 2023

wjordan commented Aug 28, 2023

codecov bot commented Aug 28, 2023 •

edited

valyala commented Aug 29, 2023

valyala commented Aug 29, 2023

valyala commented Oct 2, 2023

Add vmstorageUserTimeout flags to configure TCP User Timeout on Linux #4423

Add vmstorageUserTimeout flags to configure TCP User Timeout on Linux #4423

Conversation

wjordan commented Jun 7, 2023

hagen1778 commented Jun 9, 2023

f41gh7 left a comment

Choose a reason for hiding this comment

zekker6 commented Aug 9, 2023

wjordan commented Aug 28, 2023

codecov bot commented Aug 28, 2023 • edited

Codecov Report

valyala commented Aug 29, 2023

valyala commented Aug 29, 2023

valyala commented Oct 2, 2023

Add `vmstorageUserTimeout` flags to configure TCP User Timeout on Linux #4423

Add `vmstorageUserTimeout` flags to configure TCP User Timeout on Linux #4423

codecov bot commented Aug 28, 2023 •

edited