
stream aggregation: deduplication should keep the sample with biggest timestamp on interval #5643

Closed
hagen1778 opened this issue Jan 19, 2024 · 2 comments · Fixed by #5939


hagen1778 commented Jan 19, 2024

Is your feature request related to a problem? Please describe

Currently, the --remoteWrite.streamAggr.dedupInterval option in vmagent reuses the last aggregation output for deduplication. This differs from how VictoriaMetrics deduplicates samples and may result in incorrect output.

Describe the solution you'd like

It would be better not to keep the last received value on the deduplication interval, but instead keep the sample with the biggest timestamp on the interval, or the biggest value when timestamps conflict - see https://docs.victoriametrics.com/#deduplication

This should prevent improper deduplication results when more than one metrics source produces duplicates. For example, if an HA pair of clients pushes data to vmagent, there is a chance that the pushed values will differ. This is extremely important for counter calculations, because a counter may start to behave more like a gauge if clientA sends a lower value than clientB. A minimal sketch of the proposed selection rule is shown below.
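For illustration, here is a minimal Go sketch of the selection rule described above (hypothetical type and function names, not the actual vmagent code): a candidate sample replaces the currently kept one only if its timestamp is bigger, or its timestamp is equal and its value is bigger.

```go
package main

import "fmt"

// sample is a hypothetical representation of a single data point
// arriving during one deduplication interval.
type sample struct {
	timestampMs int64   // sample timestamp in milliseconds
	value       float64 // sample value
}

// shouldReplace reports whether the candidate sample should replace
// the currently kept sample, following the rule from
// https://docs.victoriametrics.com/#deduplication: keep the sample
// with the biggest timestamp on the interval, and on timestamp
// conflicts keep the biggest value.
func shouldReplace(kept, candidate sample) bool {
	if candidate.timestampMs != kept.timestampMs {
		return candidate.timestampMs > kept.timestampMs
	}
	return candidate.value > kept.value
}

func main() {
	// clientA and clientB (an HA pair) push slightly different samples
	// with the same timestamp.
	kept := sample{timestampMs: 1000, value: 42}      // from clientA
	candidate := sample{timestampMs: 1000, value: 43} // from clientB

	if shouldReplace(kept, candidate) {
		kept = candidate
	}
	fmt.Printf("kept sample: %+v\n", kept) // the bigger value wins on a timestamp conflict
}
```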

@hagen1778 hagen1778 added enhancement New feature or request vmagent labels Jan 19, 2024
@hagen1778 hagen1778 added the TBD To Be Done label Jan 19, 2024
hagen1778 added a commit that referenced this issue Mar 12, 2024
…cator (#5939)

Apply the same deduplication logic as in https://docs.victoriametrics.com/#deduplication
This would require more memory for deduplication, since we need to track timestamp
for each record. However, deduplication should become more consistent.

#5643

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
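The extra memory mentioned in the commit message above comes from having to remember a timestamp per tracked series, not just the last value. A rough sketch of such per-series deduplication state, with assumed names rather than the actual lib/streamaggr code:

```go
package main

import "fmt"

// dedupEntry is a hypothetical per-series state kept by the deduplicator.
// Keeping only the last value was enough before; the timestamp-based rule
// requires storing the timestamp as well, which is the extra memory cost.
type dedupEntry struct {
	timestampMs int64
	value       float64
}

// dedupState maps a series key (e.g. the marshaled label set) to its entry.
type dedupState map[string]dedupEntry

// push applies the deduplication rule to a single incoming sample:
// keep the sample with the biggest timestamp, and on equal timestamps
// keep the biggest value.
func (ds dedupState) push(key string, timestampMs int64, value float64) {
	e, ok := ds[key]
	if !ok || timestampMs > e.timestampMs ||
		(timestampMs == e.timestampMs && value > e.value) {
		ds[key] = dedupEntry{timestampMs: timestampMs, value: value}
	}
}

func main() {
	ds := dedupState{}
	key := `http_requests_total{instance="a"}`
	ds.push(key, 1000, 42)
	ds.push(key, 2000, 41) // newer timestamp wins even though the value is lower
	fmt.Println(ds[key])   // {2000 41}
}
```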
valyala added a commit that referenced this issue Mar 17, 2024
- Properly set pushSample.timestamp when flushing de-duplicated samples to stream aggregation
  This is needed for #5931

- Re-classify this change as feature instead of bugfix at docs/CHANGELOG.md

- Verify de-duplication logic for samples with different timestamps

Updates #5643
Updates #5939
@valyala valyala reopened this Mar 17, 2024

valyala commented Mar 17, 2024

Re-opening the issue until the new release is published with this feature.


valyala commented Apr 4, 2024

Stream de-duplication uses the same logic as the regular de-duplication starting from v1.100.0.

Closing the feature request as done.

@valyala valyala closed this as completed Apr 4, 2024
@valyala valyala removed the TBD To Be Done label Apr 4, 2024