Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream aggregation: fix possible duplicated aggregation results #7118

Merged
merged 3 commits into from
Sep 30, 2024

Conversation

Haleygo
Copy link
Contributor

@Haleygo Haleygo commented Sep 27, 2024

When ingesting samples with the same labels(duplicated samples or samples with the same labels after by or without options). They could register different entries for the same labelset in LabelsCompressor.
For example, both index 99 and 100 can be assigned to label foo=1 in two concurrent pushes. Then due to differing label indexes in encoded keys, the samples will appear as distinct in aggrState, resulting in duplicated results after decompressing the label indexes.

buf = compressLabels(buf, inputLabels.Labels, outputLabels.Labels)

In this pull request, since we need to store idxToLabel first to ensure the idx can be searched after lc.labelToIdxStore,
the lc.idxToLabel still could contain a duplicated entries [100]="foo=1". But given the low likelihood of this issue and the size of idxToLabel, it should be fine.

@Haleygo Haleygo force-pushed the fix-duplicated-aggregation-results branch 2 times, most recently from 183c227 to 1f0f380 Compare September 27, 2024 12:15
@hagen1778 hagen1778 added the bug Something isn't working label Sep 30, 2024
Copy link
Contributor

@AndrewChubatiuk AndrewChubatiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Before, same labelset could create multiple indexes in LabelsCompressor when there are concurrent sample ingestions, leading to duplicated states in aggrState and results.
@Haleygo Haleygo force-pushed the fix-duplicated-aggregation-results branch from d74c77f to 7b63105 Compare September 30, 2024 10:29
Copy link
Collaborator

@hagen1778 hagen1778 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@f41gh7 f41gh7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@f41gh7 f41gh7 merged commit 664f337 into master Sep 30, 2024
8 checks passed
@f41gh7 f41gh7 deleted the fix-duplicated-aggregation-results branch September 30, 2024 12:25
valyala pushed a commit that referenced this pull request Sep 30, 2024
When ingesting samples with the same labels(duplicated samples or
samples with the same labels after `by` or `without` options). They
could register different entries for the same labelset in
LabelsCompressor.
For example, both index 99 and 100 can be assigned to label `foo=1` in
two concurrent pushes. Then due to differing label indexes in encoded
keys, the samples will appear as distinct in aggrState, resulting in
duplicated results after decompressing the label indexes.

https://github.com/VictoriaMetrics/VictoriaMetrics/blob/fbde238cdcdf4e2c892d85a3e9e2be6e54e69cef/lib/streamaggr/streamaggr.go#L933

In this pull request, since we need to store `idxToLabel` first to
ensure the idx can be searched after `lc.labelToIdxStore`,
the `lc.idxToLabel` still could contain a duplicated entries
[100]="foo=1". But given the low likelihood of this issue and the size
of idxToLabel, it should be fine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants