
Last value in graph is wrong (query_range/maxStepForPointsAdjustment) #1442

Closed
mxsiegle opened this issue Jul 9, 2021 · 3 comments
Labels: bug Something isn't working

mxsiegle commented Jul 9, 2021

Hey,

I noticed that all of our graphs in Grafana show a drop in the value of the last datapoint. It looks like this:

[screenshot: Grafana graph with a drop at the last datapoint]

This only affects range queries (/api/v1/query_range); instant values are calculated correctly. The actual metric values are correct as well: if you wait a moment until new data arrives, the datapoint which had issues suddenly reports the correct value, while the new one is wrong. You can repeat that indefinitely; it's always the latest datapoint that is wrong.
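To make the comparison concrete, here is a minimal reproduction sketch that queries both endpoints and compares the last point. The vmselect address, tenant and metric name are assumptions, not taken from this report:

```python
# Minimal sketch: compare the last query_range point with the instant value.
# Assumptions (not from the issue): vmselect listens on localhost:8481 for
# tenant 0, and `up` stands in for any scraped metric.
import time
import requests

BASE = "http://localhost:8481/select/0/prometheus"  # hypothetical vmselect URL
QUERY = "up"
now = int(time.time())

# Range query, roughly as Grafana issues it for 'Last 7 days' with a 5m step.
rng = requests.get(f"{BASE}/api/v1/query_range", params={
    "query": QUERY,
    "start": now - 7 * 24 * 3600,
    "end": now,
    "step": "5m",
}).json()

# Instant query for the same expression at the same end timestamp.
inst = requests.get(f"{BASE}/api/v1/query", params={
    "query": QUERY,
    "time": now,
}).json()

last_range_point = rng["data"]["result"][0]["values"][-1]  # [timestamp, value]
instant_point = inst["data"]["result"][0]["value"]         # [timestamp, value]
print("last point from query_range:", last_range_point)
print("value from instant query:   ", instant_point)
# On an affected setup the two values differ; once new data arrives, the
# previously wrong point reports correctly and the newest point is off again.
```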

While troubleshooting, I made the following observations:

Sometimes the values are reported correctly, other times they are not. This is what it looks like when I refresh the table a couple of times. Note the last entry on the bottom right:

[screenshot: the same table across several refreshes, last entry changing]

When I run VMSelect with -search.disableCache, the wrong value appears 100% of the time. So somehow the cache affects the value and sometimes causes it to be reported correctly.

The issue only appears when the selected time range results in a step value higher than the maximum set by -search.maxStepForPointsAdjustment (default: 1m0s).
When I select 'Last 6 hours', the step value in Grafana becomes 20s and the correct value is reported:

[screenshot: 'Last 6 hours' with a 20s step, correct last value]

When I change it to 'Last 7 days', the step becomes 5m and the issue appears:

[screenshot: 'Last 7 days' with a 5m step, wrong last value]
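For context, here is a rough sketch of how the Grafana step grows with the selected time range and where it crosses the -search.maxStepForPointsAdjustment default of 1m0s. The max-data-points value and the rounding table are assumptions picked to match the steps reported in this issue, not taken from Grafana's actual source:

```python
# Rough sketch of how Grafana's step grows with the time range. MAX_DATA_POINTS
# (2100) and NICE_STEPS are assumptions chosen so that the result matches the
# steps observed in this issue (6h -> 20s, 7d -> 5m, 30d -> 30m).
SCRAPE_INTERVAL = 20     # seconds, per the setup details below
MAX_DATA_POINTS = 2100   # hypothetical Grafana max data points
NICE_STEPS = [20, 30, 60, 120, 300, 600, 900, 1800, 3600]  # seconds

def grafana_step(range_seconds: int) -> int:
    """Approximate Grafana's step: range / max data points, never below the
    scrape interval, rounded up to the next 'nice' interval."""
    raw = max(SCRAPE_INTERVAL, range_seconds / MAX_DATA_POINTS)
    return next(s for s in NICE_STEPS if s >= raw)

MAX_STEP_FOR_POINTS_ADJUSTMENT = 60  # -search.maxStepForPointsAdjustment=1m0s (default)

for label, rng in [("Last 6 hours", 6 * 3600),
                   ("Last 7 days", 7 * 24 * 3600),
                   ("Last 30 days", 30 * 24 * 3600)]:
    step = grafana_step(rng)
    affected = step > MAX_STEP_FOR_POINTS_ADJUSTMENT
    print(f"{label}: step={step}s, exceeds maxStepForPointsAdjustment: {affected}")
# Output: 20s (not affected), 300s and 1800s (affected) -- matching the
# observation that the wrong last value only shows up once the step exceeds
# the flag's value.
```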

If I set -search.maxStepForPointsAdjustment to e.g. 6m, the issue disappears for 'Last 7 days', but of course it reappears as soon as the step value exceeds that (e.g. 'Last 30 days' -> 30m steps).
Can anyone tell me why this is happening and how I can fix it?

Setup: Cluster with 4x VMStorage, 4x VMSelect and 3x VMInsert.
Version: v1.62.0
Retention: 90d
Scrape interval: 20s (VMAgent)

It does not matter which metric or target I choose; all of them have this issue. Weirdly, recording rules are totally fine. I assume that's because they use the (correct) instant values, so their last values are reported correctly in the graphs/tables.

I only noticed this behavior recently. We had recently updated to v1.62.0, so I thought maybe that was the cause, but rolling back to v1.59.0 and v1.55.1 (the versions we were running before) didn't make any difference.

Thanks in advance.

@valyala valyala added the bug Something isn't working label Jul 9, 2021
valyala (Collaborator) commented Aug 11, 2021

This may be related to #1526

valyala (Collaborator) commented Aug 15, 2021

FYI, VictoriaMetrics and vmagent gained support for Prometheus staleness markers starting from release v1.64.0.

hagen1778 (Collaborator) commented

Closing the issue as inactive. Feel free to reopen if the problem still exists.
