
vmalert does not respect -remoteWrite.maxBatchSize at shutdown #6025

Closed
BenNF opened this issue Mar 26, 2024 · 6 comments
Labels
bug Something isn't working vmalert

Comments


BenNF commented Mar 26, 2024

Describe the bug

When vmalert shuts down, it writes the entire remaining remote-write queue to its remote write target without considering whether the queue is larger than -remoteWrite.maxBatchSize. This can cause issues when writing to other VM components, since the final request can easily hit the -maxInsertRequestSize limit even when typical batched requests do not.

The shutdown function of the remote write client takes no account of -remoteWrite.maxBatchSize and simply writes the entire remaining input in a single flush.

If my understanding is correct, the fix seems as simple as modifying this function to split the remaining input into multiple flushes instead of one.

To Reproduce

Most apparent in replay mode:

  1. Run an alert replay with an expression that produces a large number of time series from a quick-to-execute query, with a remote write target of a VM component with a lowered (or default) -maxInsertRequestSize.
  2. Observe that the final series written after the replay completes are not batched and can hit the limit.

Version

Encountered on v1.96.0, but the code is unchanged in the most recent v1.99.0.

Logs

2024-03-26T19:53:34.973Z	info	VictoriaMetrics/app/vmalert/replay.go:71	replay finished! Imported 967830 samples
2024-03-26T19:53:35.038Z	info	VictoriaMetrics/app/vmalert/remotewrite/client.go:160	shutting down remote write client and flushing remained 197818 series
2024-03-26T19:53:35.166Z	warn	VictoriaMetrics/app/vmalert/remotewrite/client.go:248	attempt 1 to send request failed: unexpected response code 400 for https://xxxxxx:443/api/v1/write. Response body "remoteAddr: \"127.0.0.1:33176\"; requestURI: /api/v1/write; too big unpacked request; mustn't exceed `-maxInsertRequestSize=33554432` bytes; got 45696024 bytes\n" (retriable: false)
2024-03-26T19:53:35.166Z	error	VictoriaMetrics/app/vmalert/remotewrite/client.go:281	attempts to send remote-write request failed - dropping 197818 time series

Screenshots

No response

Used command-line flags

Increased max queue size, but otherwise defaults:

            - '-replay.timeFrom=2024-01-30T15:34:21.000Z'
            - '-replay.timeTo=2024-03-26T15:34:21.000Z'
            - '-remoteWrite.maxQueueSize=10000000'

Additional information

No response

BenNF added the bug label Mar 26, 2024
@jiekun
Contributor

jiekun commented Mar 27, 2024

It's probably a good first issue. I'm interested in this one if it needs a fix and no one (including @BenNF) is working on it :P

@hagen1778
Collaborator

@jiekun sounds good to me! Do you want to be assigned to this issue?

@hagen1778 hagen1778 changed the title VmAlert does not respect -remoteWrite.maxBatchSize at shutdown vmalert does not respect -remoteWrite.maxBatchSize at shutdown Mar 28, 2024
@jiekun
Contributor

jiekun commented Mar 28, 2024

> @jiekun sounds good to me! Do you want to be assigned to this issue?

Yes, please assign it to me. Ty

hagen1778 pushed a commit that referenced this issue Mar 29, 2024
…6039)

During the shutdown period of vmalert, the remotewrite client retrieves all pending time series from the buffer queue, composes them into one batch, and executes a remote write.

This final batch may exceed the limit of -remoteWrite.maxBatchSize, and be rejected by the receiver (gateway, vmcluster or others). 

This change ensures that even during shutdown vmalert won't exceed the max batch size limit for the remote write destination.

#6025
hagen1778 pushed commits that referenced this issue Mar 29, 2024
(cherry picked from commit 623d257)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
@valyala
Collaborator

valyala commented Apr 4, 2024

This issue has been fixed in vmalert v1.100.0. Closing it as fixed then.

@valyala valyala closed this as completed Apr 4, 2024
@valyala
Collaborator

valyala commented Apr 19, 2024

The bugfix for this issue has been included in v1.93.14 LTS release.

@valyala
Collaborator

valyala commented Apr 19, 2024

The bugfix for this issue has also been included in the v1.97.4 LTS release.
