Vmstorage Dedup handles NaN #5587
Comments
See #5587 Signed-off-by: hagen1778 <roman@victoriametrics.com>
Hello @okzheng and thank you for a good question! This is expected behavior: a Prometheus stale marker should be treated like any other value. Consider a case where vmagent is configured to scrape a target with a 1m interval, and on the 4th minute vmagent reports a stale marker for this target. If the user then configures a deduplication interval of 5m, the stale marker should be preserved as the last value in the interval. As a workaround for your case, I'd propose disabling stale markers on the vmagent side.
Thanks for your persuasive explanation and proposal! @hagen1778
Is your question related to a specific component?
vmstorage
Describe the question in detail
I have 2 vmagent pods in a k8s deployment and these pods are configured to scrape metrics from the same targets, which include a service backed by two pods. A ServiceMonitor has been set up to discover and scrape these metrics.
At the same time, I configured relabeling in the ServiceMonitor to drop the labels that differ between the two vmagent pods, so the metrics scraped by both of them carry the same label set.
However, I have observed an issue where random gaps appear in the metrics during a rolling update of the service.
I suspect that when a pod is deleted, its metrics are marked as stale by vmagent, and that the stale-marker values and the metrics collected from the healthy pod are not handled correctly by the deduplication logic.
Reading the deduplication function in lib/storage/dedup.go, I don't find any special handling of NaN. When NaN is compared with a regular value, the preserved value may depend on the order of arrival. This contradicts the design intent (preserving the maximum value when the timestamps are the same).
I'm not sure if my understanding is wrong or if it is the expected behavior.
Thanks for explaining it.
Troubleshooting docs