stream aggregation: deduplication should keep the sample with biggest timestamp on interval #5643
hagen1778 added a commit that referenced this issue on Mar 12, 2024:
…cator (#5939) Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication This would require more memory for deduplication, since we need to track timestamp for each record. However, deduplication should become more consistent. #5643 --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Re-opening the issue until the new release is published with this feature.
valyala pushed a commit that referenced this issue on Mar 17, 2024:
…cator (#5939) Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication This would require more memory for deduplication, since we need to track timestamp for each record. However, deduplication should become more consistent. #5643 --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue on Mar 27, 2024:
…cator (VictoriaMetrics#5939) Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication This would require more memory for deduplication, since we need to track timestamp for each record. However, deduplication should become more consistent. VictoriaMetrics#5643 --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue on Mar 27, 2024:
- Properly set pushSample.timestamp when flushing de-duplicated samples to stream aggregation This is needed for VictoriaMetrics#5931 - Re-classify this change as feature instead of bugfix at docs/CHANGELOG.md - Verify de-duplication logic for samples with different timestamps Updates VictoriaMetrics#5643 Updates VictoriaMetrics#5939
Stream de-duplication uses the same logic as the regular de-duplication starting from v1.100.0. Closing the feature request as done.
Is your feature request related to a problem? Please describe
Currently, the --remoteWrite.streamAggr.dedupInterval option in vmagent reuses the last aggregation output for deduplication. But this differs from how VictoriaMetrics deduplicates samples and may result in incorrect output.
Describe the solution you'd like
It would be better not to keep the last received value on the deduplication interval, but to keep the sample with the biggest timestamp on the interval, or the biggest value on timestamp conflicts - see https://docs.victoriametrics.com/#deduplication
This should prevent improper deduplication results when there is more than one metrics source producing duplicates. For example, if an HA pair of clients pushes data to vmagent, there is a chance that the pushed values will differ. This is extremely important for counter calculations, because it may cause a counter to behave more like a gauge if clientA sends a lower value than clientB.
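The difference between the two policies can be sketched in Go. This is a minimal illustration, not the actual VictoriaMetrics implementation; the `sample` type and function names are invented for this example. It shows how "last received wins" can keep a stale value when a newer sample from an HA peer happens to arrive first, while the biggest-timestamp rule (biggest value on timestamp ties) always keeps the newest sample:

```go
package main

import "fmt"

// sample pairs a value with its timestamp in unix milliseconds.
// Hypothetical type for illustration only.
type sample struct {
	value     float64
	timestamp int64
}

// dedupLastReceived mimics the old behavior: whichever sample
// arrived last wins, regardless of its timestamp.
func dedupLastReceived(samples []sample) sample {
	return samples[len(samples)-1]
}

// dedupBiggestTimestamp implements the rule described at
// https://docs.victoriametrics.com/#deduplication : keep the sample
// with the biggest timestamp; on timestamp ties, keep the biggest value.
func dedupBiggestTimestamp(samples []sample) sample {
	kept := samples[0]
	for _, s := range samples[1:] {
		if s.timestamp > kept.timestamp ||
			(s.timestamp == kept.timestamp && s.value > kept.value) {
			kept = s
		}
	}
	return kept
}

func main() {
	// An HA pair scrapes the same counter. clientB's sample is newer
	// (v=105) but arrives over the network before clientA's older one.
	arrivalOrder := []sample{
		{value: 105, timestamp: 1100}, // clientB
		{value: 100, timestamp: 1000}, // clientA
	}
	fmt.Println(dedupLastReceived(arrivalOrder).value)     // stale value kept
	fmt.Println(dedupBiggestTimestamp(arrivalOrder).value) // newest value kept
}
```

With last-received-wins the counter appears to drop from 105 to 100, which downstream rate() calculations would misread as a counter reset; the biggest-timestamp rule keeps the series monotonic.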