
vmagent 1.90.0 couldn't send a block with size #4139

Closed
ihard opened this issue Apr 17, 2023 · 7 comments
Assignees: Amper
Labels: enhancement, vmagent

Comments

ihard commented Apr 17, 2023

Describe the bug

On a completely idle cluster running version 1.90.0, I periodically (roughly once every 30 minutes) see the following in the vmagent logs:
2023-04-17T17:11:29.482+0700 warn VictoriaMetrics/app/vmagent/remotewrite/client.go:370 couldn't send a block with size 203 bytes to "7:secret-url": Post "http://127.0.0.1:8581/insert/multitenant/prometheus": EOF; re-sending the block in 2.000 seconds
The local vmagent sends traffic to vminsert over 127.0.0.1 and still gets these errors (it is hard to see what could go wrong over loopback).
Running tcpdump, I can see that at the moment of the error an RST arrives from the vminsert port.

To Reproduce

Run vmagent and VictoriaMetrics cluster v1.90.0 on the same virtual machine.

Version

vmagent:v1.90.0
vminsert:v1.90.0-cluster

Logs

No response

Screenshots

[Five screenshots attached, timestamped 2023-04-17 15:38 to 15:41]

Used command-line flags

  - '--promscrape.config=/etc/vmagent/vmagent.yml'
  - '--graphiteListenAddr=:3003'
  - '--httpListenAddr=:8429'
  - '--remoteWrite.maxDiskUsagePerURL=10GB'
  - '--maxConcurrentInserts=100000'
  - '--insert.maxQueueDuration=60s'
  - '--remoteWrite.url=http://127.0.0.1:8581/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/prometheus-60s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8581/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/prometheus-30s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8581/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/prometheus-10s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8591/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/prometheus-60s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8591/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/prometheus-30s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8591/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/prometheus-10s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8581/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/graphite-60s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8581/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/graphite-30s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8581/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/graphite-10s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8591/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/graphite-60s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8591/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/graphite-30s.yml'
  - '--remoteWrite.url=http://127.0.0.1:8591/insert/multitenant/prometheus'
  - '--remoteWrite.urlRelabelConfig=/etc/vmagent/relabel/graphite-10s.yml'

Additional information

No response

ihard added the bug label Apr 17, 2023
Amper self-assigned this Apr 17, 2023
Amper (Contributor) commented Apr 18, 2023

Hi @ihard.
Are there any errors in the vminsert log?
What command-line flags are used in vminsert?

ihard (Author) commented Apr 18, 2023

vminsert flags:

      - '--httpListenAddr=:8581'
      - '--storageNode=10.111.234.4:8682'
      - '--storageNode=10.111.224.4:8682'
      - '--storageNode=10.111.225.4:8682'
      - '--replicationFactor=2'

The vminsert log is empty.

IvanZenger commented Jun 1, 2023

Is there any news here? We have the same problem.

The error appears roughly every 30 minutes: couldn't send a block with size 152176 bytes to "2:secret-url": Post "https://vminsert.example.com:8581/api/v1/write": EOF.

We do not see any other errors on vmagent, vminsert or vmstorage.

vmagent has the following configuration: [configuration screenshot]

vminsert has the following configuration: [configuration screenshot]

Dashboards: [dashboard screenshot]

wjordan commented Jun 14, 2023

This kind of error occurs when the HTTP client attempts to reuse an idle connection that's already been closed on the server. This can happen if the client's idle-connection timeout is greater than or equal to the timeout on the server, and both vmagent's client IdleConnTimeout and vminsert's server IdleTimeout are set to 1 minute by default:

IdleConnTimeout: time.Minute, // vmagent's HTTP client default

IdleTimeout: *idleConnTimeout, // vminsert's HTTP server setting

idleConnTimeout = flag.Duration("http.idleConnTimeout", time.Minute, "Timeout for incoming idle http connections")
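
To make the race concrete, here is a minimal generic Go sketch (not VictoriaMetrics source; the port is simply the one from this issue's config) showing the two knobs quoted above at their 1-minute defaults:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Server side: the setting behind -http.idleConnTimeout, i.e. how long the
	// server keeps an idle keep-alive connection open before closing it.
	srv := &http.Server{
		Addr:        ":8581",
		IdleTimeout: time.Minute,
	}

	// Client side: the equivalent of the client's IdleConnTimeout, i.e. how long
	// an idle connection stays in the client's pool before being dropped.
	tr := &http.Transport{IdleConnTimeout: time.Minute}
	client := &http.Client{Transport: tr}
	_ = client

	// With both timeouts equal, the client may pick a pooled connection at the
	// same moment the server closes it; the next POST on that connection then
	// fails with EOF. Keeping the client's timeout below the server's removes
	// the race, because the client then drops the idle connection before the
	// server closes it.
	fmt.Println("server idle timeout:", srv.IdleTimeout, "client idle timeout:", tr.IdleConnTimeout)
}
```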

I can think of two possible solutions:

  1. The underlying Go issue is discussed in golang/go#22158 (net/http: Client returns errors on POST if keep-alive connection closes at unfortunate time), where a workaround was implemented that automatically retries POST requests tagged with a special X-Idempotency-Key header.
    A PR could add this header to VictoriaMetrics client requests that are known to be idempotent (I'm pretty sure remote-write requests are idempotent thanks to the deduplication logic?). See the sketch after this list.
  2. Setting vminsert's -http.idleConnTimeout slightly higher than the hard-coded 1m vmagent client timeout (e.g. -http.idleConnTimeout=1m5s) might also help prevent this error, since idle connections would then be more likely to be closed by the client before the server.
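
As a rough illustration of the first option (generic Go, not a VictoriaMetrics patch; the URL and payload below are placeholders): tagging a POST with the non-standard Idempotency-Key / X-Idempotency-Key header tells Go's net/http transport that the request is safe to replay, so it retries it transparently on a new connection when a reused keep-alive connection turns out to be closed by the server, provided the body can be replayed (http.NewRequest sets GetBody automatically for a bytes.Reader):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Placeholder payload and endpoint, for illustration only.
	body := []byte("compressed remote-write payload")
	req, err := http.NewRequest(http.MethodPost,
		"http://127.0.0.1:8581/insert/multitenant/prometheus", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	// Marking the POST as idempotent lets the transport retry it on another
	// connection when a reused keep-alive connection has already been closed by
	// the server (the EOF case from the logs above). The retry is only possible
	// because bytes.NewReader gives the request a GetBody func, so the body can
	// be replayed.
	req.Header.Set("X-Idempotency-Key", "any-value")

	client := &http.Client{
		Transport: &http.Transport{IdleConnTimeout: time.Minute},
		Timeout:   30 * time.Second,
	}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```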

hagen1778 added the question label and removed the bug and need more info labels Aug 18, 2023
hagen1778 (Collaborator) commented:

As @wjordan mentioned, such errors can happen when the connection timeouts on the client and the server don't match. However, this does not result in data loss, as vmagent retries the write attempt over a newly established connection.
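
For intuition, here is a minimal Go sketch of that behaviour (assumed logic only, not the actual vmagent code from the commits below): on an EOF caused by a stale connection, the block is re-sent immediately on a rebuilt request, since the body of the original request may already have been consumed.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// sendBlock posts one block of data to a remote-write URL. If the first attempt
// fails because the server has just closed an idle keep-alive connection (which
// surfaces to the caller as an EOF error), it retries once immediately over a
// fresh connection instead of logging an error and waiting for backoff.
func sendBlock(c *http.Client, url string, block []byte) error {
	doPost := func() (*http.Response, error) {
		// The request is rebuilt for every attempt: the body reader from a
		// previous attempt may already have been partially consumed.
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(block))
		if err != nil {
			return nil, err
		}
		return c.Do(req)
	}

	resp, err := doPost()
	if err != nil && errors.Is(err, io.EOF) {
		// Stale connection closed by the server: retry right away.
		resp, err = doPost()
	}
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("unexpected status code %d", resp.StatusCode)
	}
	return nil
}

func main() {
	err := sendBlock(http.DefaultClient,
		"http://127.0.0.1:8581/insert/multitenant/prometheus", []byte("payload"))
	fmt.Println("send result:", err)
}
```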

hagen1778 added a commit that referenced this issue Aug 18, 2023
 Retry failed write request on the closed connection immediately,
 without waiting for backoff. This should improve rules data delivery speed
 and reduce amount of error logs emitted by vmagent when using idle connections.

 #4139

Signed-off-by: hagen1778 <roman@victoriametrics.com>
hagen1778 added a commit that referenced this issue Aug 18, 2023
 Retry failed write request on the closed connection immediately,
 without waiting for backoff. This should improve data delivery speed
 and reduce amount of error logs emitted by vmagent when using idle connections.

 #4139

Signed-off-by: hagen1778 <roman@victoriametrics.com>
f41gh7 added a commit that referenced this issue Aug 23, 2023
* vmagent: retry failed write request on the closed connection

 Retry failed write request on the closed connection immediately,
 without waiting for backoff. This should improve data delivery speed
 and reduce amount of error logs emitted by vmagent when using idle connections.

 #4139

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmagent: retry failed write request on the closed connection

Re-instantiate request before retry as body could have been already spoiled.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
hagen1778 added a commit that referenced this issue Aug 27, 2023
 vmagent: retry failed write request on the closed connection (same commit message as above; cherry picked from commit 992a1c0)
valyala (Collaborator) commented Aug 29, 2023

The commit 992a1c0 should make vmagent retry requests to the remote storage immediately on a closed idle connection, without logging the error. This should remove the couldn't send a block with size ... bytes to "...": Post "...": EOF error logs. The change can be tested by building and running vmagent from this commit according to these docs. If everything is OK, the commit will be included in the next release.

valyala pushed a commit that referenced this issue Aug 29, 2023
 vmagent: retry failed write request on the closed connection (same commit message as above)
valyala added a commit that referenced this issue Aug 29, 2023
…expectedEOF, since this error isn't returned on stale connection

Also, mention the #4139 in comments to the code
in order to simplify further maintenance of this code.

This is a follow-up for 992a1c0
Further commits referencing this issue were pushed by valyala on Aug 29, 2023, repeating the same commit messages as above.
valyala added the enhancement and vmagent labels and removed the question label Aug 29, 2023
valyala (Collaborator) commented Oct 2, 2023

Starting from v1.94.0, vmagent shouldn't emit couldn't send a block error logs on the first unsuccessful attempt to send data to remote storage. It will retry sending the block immediately over a new connection to the remote storage in this case. Closing the issue as resolved then.
