Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vmstorageUserTimeout flags to configure TCP User Timeout on Linux #4423

Merged
merged 1 commit into from Aug 29, 2023

Conversation

wjordan
Copy link

@wjordan wjordan commented Jun 7, 2023

The TCP_USER_TIMEOUT socket option (since Linux 2.6.37) specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the connection and return ETIMEDOUT to the application.

Setting a low user timeout allows RPC connections to be closed and more quickly rerouted to alternate storage nodes during network interruptions:

For vminsert, setting a user timeout would allow an insert request to be configured to route around an unavailable storage node in milliseconds, instead of waiting for the full >60-second request timeout to mark the node as broken.

For vmselect, setting a user timeout would be an additional solution to the issues described in #711 and #3378, allowing rerouted and partial-result queries to be returned more quickly when some storage nodes are unavailable.

The added vmstorageUserTimeout flags will both default to 0 where they will be ignored, so this PR shouldn't introduce any breaking changes.


Related note: the existing 1-second keepalive mentioned in #711 (comment) isn't sufficient to automatically mark unavailable nodes as broken in max 1 second as described. First, TCP keepalive only applies to completely idle connections and doesn't work at all if there's data waiting in the send buffer or in transit when the network is interrupted. In that case, the TCP retransmission limit eventually closes the connection (tcp_retries2 sysctl defaults to 15 retries =~16 minutes). Second, a number of keepalive probes are sent before closing an idle connection (tcp_keepalive_probes sysctl defaults to 9), so an idle connection would eventually close after 9 seconds of failed probes.

See https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ for more discussion of the interactions between SO_KEEPALIVE and TCP_USER_TIMEOUT.

@hagen1778
Copy link
Collaborator

Nice PR! Would you mind fixing tests for CI?

Copy link
Contributor

@f41gh7 f41gh7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zekker6
Copy link
Contributor

zekker6 commented Aug 9, 2023

@wjordan Could you please rebase to the latest version of cluster?
Also, it seems like there are tests failing in CI, would you mind checking tests related to this change?

zekker6 added a commit that referenced this pull request Aug 14, 2023
Use TCP_TIMEOUT_USER linux socket option to close connection based on user provided timeout.
Using default timeout and keepalive flow is not sufficient to close connection in some cases which leads to timeout not being applied.

Effectively, if connection stops processing new packets it will take up to 1 minute(with default configuration) instead of 1 second(based on keepalive value) to mark connection as broken.

This can be tested by adding firewall rule to deny packets to vmstorage port:
iptables -A INPUT -p tcp --dport {port} -j DROP
To reverse this:
iptables -D INPUT -p tcp --dport {port} -j DROP

See: #4423 for more details.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
@wjordan wjordan force-pushed the tcptimeout branch 2 times, most recently from f7427ad to 3b667c8 Compare August 28, 2023 16:56
`TCP_USER_TIMEOUT` (since Linux 2.6.37) specifies the maximum amount of
time that transmitted data may remain unacknowledged before TCP will
forcibly close the connection and return `ETIMEDOUT` to the application.

Setting a low TCP user timeout allows RPC connections quickly reroute
around unavailable storage nodes during network interruptions.
@wjordan
Copy link
Author

wjordan commented Aug 28, 2023

@zekker6 I've rebased the PR, updated the test and added some additional context to the flag descriptions.

@codecov
Copy link

codecov bot commented Aug 28, 2023

Codecov Report

Patch coverage: 29.41% and project coverage change: -0.03% ⚠️

Comparison is base (db0ae3f) 59.35% compared to head (9bdaf0a) 59.32%.
Report is 10 commits behind head on cluster.

Additional details and impacted files
@@             Coverage Diff             @@
##           cluster    #4423      +/-   ##
===========================================
- Coverage    59.35%   59.32%   -0.03%     
===========================================
  Files          393      394       +1     
  Lines        74723    74735      +12     
===========================================
- Hits         44349    44338      -11     
- Misses       27944    27964      +20     
- Partials      2430     2433       +3     
Files Changed Coverage Δ
lib/netutil/tcpdialer_linux.go 0.00% <0.00%> (ø)
lib/netutil/tcpdialer.go 63.41% <10.00%> (-17.84%) ⬇️
app/vminsert/netstorage/netstorage.go 24.29% <100.00%> (ø)
app/vmselect/netstorage/netstorage.go 9.45% <100.00%> (ø)
lib/netutil/conn_pool.go 36.02% <100.00%> (ø)

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@valyala valyala merged commit 2b7b329 into VictoriaMetrics:cluster Aug 29, 2023
8 of 10 checks passed
@valyala
Copy link
Collaborator

valyala commented Aug 29, 2023

@wjordan , thanks for the pull request!

@valyala
Copy link
Collaborator

valyala commented Aug 29, 2023

@wjordan , I set default value for -vmstorageUserTimeout to 3 seconds in the follow-up commit 19d6173 , since the default 0 value results in too long recovery duration when some of vmstorage nodes become unavailable because of networking issues:

  • minimum 60 seconds for vminsert nodes, since 60 seconds timeout is used for sending data to vmstorage.
  • up to 16 minutes for vmselect nodes, since there is no write deadline when sending requests from vmselect to vmstorage.

@valyala valyala mentioned this pull request Sep 6, 2023
3 tasks
@valyala
Copy link
Collaborator

valyala commented Oct 2, 2023

FYI, this pull request has been included in cluster version of VictoriaMetrics starting from v1.94.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants