
Vmstorage Dedup handles NaN #5587

Closed · 3 tasks
okzheng opened this issue Jan 9, 2024 · 2 comments

Labels: question (The question issue)
okzheng (Contributor) commented Jan 9, 2024

Is your question related to a specific component?

vmstorage

Describe the question in detail

I have two vmagent pods in a Kubernetes Deployment, both configured to scrape the same targets, which include a Service backed by two pods. A ServiceMonitor has been set up to discover and scrape these metrics.
I also configured relabeling in the ServiceMonitor to drop differing labels, so the metrics scraped from the two pods carry the same label set.
However, I have observed random gaps in the metrics during a rolling update of the service.
My guess is that when a pod is deleted, vmagent marks its metrics as stale, and the resulting stale-marker value together with the samples collected from the healthy pod is not handled correctly by the deduplication logic.
Reading the deduplication function in lib/storage/dedup.go, I don't see any special handling of NaN. When NaN is compared with a regular value, the preserved sample may depend on the order of arrival (see the sketch below), which contradicts the original design intent of preserving the maximum value when the timestamps are the same.
I'm not sure if my understanding is wrong or if it is the expected behavior.
Thanks for explaining it.
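
To illustrate what I mean, here is a minimal Go sketch of how a naive "keep the bigger value on equal timestamps" rule becomes order-dependent once NaN is involved (`pickMax` is a hypothetical helper made up for this example, not the actual lib/storage/dedup.go code):

```go
package main

import (
	"fmt"
	"math"
)

// pickMax mimics a naive "keep the bigger value when timestamps are equal" rule.
// Every comparison against NaN evaluates to false, so the kept value depends
// on which sample happens to be inspected first.
func pickMax(a, b float64) float64 {
	if b > a {
		return b
	}
	return a
}

func main() {
	staleMarker := math.NaN()
	value := 42.0

	fmt.Println(pickMax(value, staleMarker)) // 42  (NaN > 42 is false, keep 42)
	fmt.Println(pickMax(staleMarker, value)) // NaN (42 > NaN is also false, keep NaN)
}
```

Since every comparison against NaN is false, whichever sample is examined first wins.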

Troubleshooting docs

@okzheng okzheng added the question The question issue label Jan 9, 2024
@Amper Amper self-assigned this Jan 9, 2024
@hagen1778 hagen1778 assigned hagen1778 and unassigned Amper Jan 11, 2024
hagen1778 pushed a commit that referenced this issue Jan 11, 2024
See #5587

Signed-off-by: hagen1778 <roman@victoriametrics.com>
hagen1778 (Collaborator) commented

Hello @okzheng and thank you for the good question!

This is expected behavior. A Prometheus stale marker should be treated like any other value. Consider a case where vmagent is configured to scrape a target with a 1m interval and reports a stale marker for this target on the 4th minute. If the user then configures a deduplication interval of 5 minutes, the stale marker should be preserved as the last value within that interval.
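
For illustration, here is a simplified keep-last-per-interval sketch in Go showing why the stale marker survives in that scenario (`sample` and `dedupKeepLast` are made up for this example, not the actual vmstorage implementation):

```go
package main

import (
	"fmt"
	"math"
)

type sample struct {
	ts    int64 // timestamp in milliseconds
	value float64
}

// dedupKeepLast keeps the last sample within each dedup interval.
// A stale marker (NaN) is treated like any other value, so if it is
// the last sample in the interval, it is the one that survives.
func dedupKeepLast(samples []sample, intervalMs int64) []sample {
	var out []sample
	lastBucket := int64(math.MinInt64)
	for _, s := range samples {
		bucket := s.ts / intervalMs
		if bucket == lastBucket {
			out[len(out)-1] = s // a later sample in the same interval wins
			continue
		}
		out = append(out, s)
		lastBucket = bucket
	}
	return out
}

func main() {
	const minute = int64(60 * 1000)
	samples := []sample{
		{1 * minute, 10},
		{2 * minute, 11},
		{3 * minute, 12},
		{4 * minute, math.NaN()}, // stale marker reported on the 4th minute
	}
	// With a 5-minute dedup interval every sample falls into the same bucket,
	// so only the trailing stale marker is kept.
	fmt.Println(dedupKeepLast(samples, 5*minute)) // [{240000 NaN}]
}
```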

As a workaround for your case, I'd propose disabling stale markers on the vmagent side.
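(If I remember the vmagent flags correctly, this is done with the -promscrape.noStaleMarkers command-line flag; please double-check the flag name against the vmagent documentation for your version.)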

okzheng (Contributor, Author) commented Jan 12, 2024

Thanks for your persuasive explanation and proposal! @hagen1778

@okzheng okzheng closed this as completed Jan 12, 2024
valyala added a commit that referenced this issue Jan 16, 2024
…are treated as ordinary values during de-duplication

This is a follow-up for d374595
Updates #5587
valyala pushed a commit that referenced this issue Jan 16, 2024
See #5587

Signed-off-by: hagen1778 <roman@victoriametrics.com>
valyala added a commit that referenced this issue Jan 16, 2024
…are treated as ordinary values during de-duplication

This is a follow-up for d374595
Updates #5587
Labels: question (The question issue)
Projects: None yet
Development: No branches or pull requests
3 participants