query: different results for rate function when not dedup or using implicit step interval #7341
Comments
Is it the same if you specify the timestamp? How could we reproduce this?
FWIW: since subqueries (even with the same step) are a very different construct, it is not unimaginable that they return different results. As for the dedup, I don't know what's happening here.
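To illustrate why a subquery can behave differently from a plain range selector, here is a minimal Python sketch of where the inner expression of `[5m:1m]` gets evaluated. It assumes, as I understand Prometheus 2.x to behave, that subquery evaluation timestamps are aligned to multiples of the step. The function name and the example numbers are hypothetical.

```python
def subquery_timestamps(eval_ts, range_s, step_s):
    """Step-aligned timestamps (seconds) at which a subquery's inner
    expression is evaluated, for an outer evaluation time `eval_ts`.

    Assumption: timestamps are multiples of `step_s`. Whether a point
    exactly on the left boundary is included depends on the Prometheus
    version; this sketch includes it.
    """
    start = eval_ts - range_s
    first = step_s * (start // step_s)  # round down to the step grid
    if first < start:
        first += step_s                 # move back inside the window
    return list(range(first, eval_ts + 1, step_s))


# Hypothetical evaluation at t=1000s with a [5m:1m] subquery.
print(subquery_timestamps(1000, 300, 60))  # [720, 780, 840, 900, 960]
```

With `rate(my_metric[5m:1m])`, the rate is therefore computed over points resampled onto this grid, not over the raw scrape timestamps that `rate(my_metric[5m])` sees.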
@GiedriusS I'm using a specific timestamp. @MichaHoffmann I tried to debug it a bit. I suspect it has to do with how dedup chooses which timestamps to use from the samples, but I got a little lost there.
I found something interesting. If I do
without deduplication at a specific timestamp, I get the following:
server2:
But when I do the same with deduplication, I get one more sample. The interval between the first and second samples is much smaller (23 seconds) than the rest (60 seconds), and the difference in values is 0, since the first and second samples have the same value.
It seems to take the first sample from server1 and the rest from server2. Any idea?
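As a rough illustration of why that extra leading sample matters, here is a minimal Python sketch. It uses a simplified rate (increase between first and last sample divided by the elapsed time) and ignores the extrapolation to the window boundaries that Prometheus applies, so the numbers only show the direction of the effect. All sample values are made up; only the 23s and 60s spacings come from the observation above.

```python
def simple_rate(samples):
    """Per-second increase between the first and last sample.

    Simplification: no extrapolation and no counter-reset handling,
    unlike the real PromQL rate().
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical (timestamp_seconds, value) pairs, 60s apart.
server2 = [(23, 100.0), (83, 110.0), (143, 120.0), (203, 130.0), (263, 140.0)]

# Deduplicated view: one extra sample from server1, 23s earlier,
# carrying the same value as server2's first sample.
deduped = [(0, 100.0)] + server2

print(simple_rate(server2))  # increase of 40 over 240s
print(simple_rate(deduped))  # same increase of 40 spread over 263s
```

The same increase is spread over a longer interval, so the deduplicated series yields a lower rate in this simplified model.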
I managed to reproduce this use case in a unit test.
The first sample doesn't get deduplicated, and it takes the first samples from both sets. I'm not sure if that's the expected behavior.
Thank you for the test and the debug work! I'll look into this over the weekend.
So, yeah: the scrape interval is large enough that the dedup algorithm thinks the second sample of the first iterator is actually missing, and that we need to fill in from the second iterator from then on. This might be a bit hard to solve, since right now we don't know the proper scrape interval a priori. I think we could maybe add a configurable flag for this. @GiedriusS @fpetkovski @yeya24 how does that sound?
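The behavior described above can be reproduced with a small Python model of penalty-based deduplication. This is a simplified sketch based on my reading of the approach, not the Thanos code: the replica that was not picked is sought with a penalty of twice the last observed delta, or a fixed initial penalty when no delta is known yet. The 5000 ms default, the `initial_penalty_ms` parameter, and all sample data are assumptions for illustration; `initial_penalty_ms` is not an existing Thanos flag.

```python
import bisect


def seek(samples, t):
    """Index of the first sample with timestamp >= t, or None if exhausted."""
    i = bisect.bisect_left([ts for ts, _ in samples], t)
    return i if i < len(samples) else None


def dedup(a, b, initial_penalty_ms=5000):
    """Merge two replica series of (timestamp_ms, value) into one.

    Simplified model: always take the earlier candidate; the replica that
    was not picked gets a seek penalty of 2x the last delta (or the
    initial penalty before any delta is known).
    """
    out, last_t, pen_a, pen_b = [], None, 0, 0
    while True:
        if last_t is None:
            ia, ib = (0 if a else None), (0 if b else None)
        else:
            ia = seek(a, last_t + 1 + pen_a)
            ib = seek(b, last_t + 1 + pen_b)
        if ia is None and ib is None:
            return out
        use_a = ib is None or (ia is not None and a[ia][0] <= b[ib][0])
        t, v = a[ia] if use_a else b[ib]
        penalty = 2 * (t - last_t) if last_t is not None else initial_penalty_ms
        pen_a, pen_b = (0, penalty) if use_a else (penalty, 0)
        out.append((t, v))
        last_t = t


# Hypothetical replicas scraping every 60s, offset by 23s from each other.
server1 = [(i * 60_000, 100.0 + 10 * i) for i in range(5)]
server2 = [(23_000 + i * 60_000, 100.0 + 10 * i) for i in range(5)]

# A 5s initial penalty is far below the 60s scrape interval:
# one sample from server1, then everything from server2.
print([t for t, _ in dedup(server1, server2)])
# [0, 23000, 83000, 143000, 203000, 263000]

# An initial penalty matching the scrape interval stays on server1.
print([t for t, _ in dedup(server1, server2, initial_penalty_ms=60_000)])
# [0, 60000, 120000, 180000, 240000]
```

In this model the first two output samples are 23 seconds apart and carry the same value, which matches the extra sample reported earlier in the thread.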
Perhaps I'm mixing things up, but shouldn't --query.default-step affect that penalty?
Hi.
Hi @MichaHoffmann @GiedriusS @fpetkovski @yeya24
Thanos, Prometheus and Golang version used:
thanos query 0.35.0
thanos sidecar 0.35.0
prometheus 2.45.0
What happened:
When calculating the rate for a metric, there are cases where the result is different depending on whether dedup is enabled or disabled.
In addition, the same difference appears when using a subquery step [5m:1m] instead of [5m], although the default step is the same.
What you expected to happen:
I expect the rate result to be the same (or much closer).
How to reproduce it (as minimally and precisely as possible):
My environment has 2 Prometheus servers which are replicas of each other (scraping the same targets).
Sidecar external labels: fqdn=, monitor="master", tenant_id="default-tenant"
I start Thanos with those 2 servers as stores, with replica-label=fqdn.
Then I run the following rate query in Thanos at a specific time:
rate(my_metric{_label_0="service",_label_1="requests",_label_2="timer", server="server00102"}[5m])
With dedup and without dedup, the results are completely different.
The results are also different when querying with a step (same results as without dedup):
rate(my_metric{_label_0="service",_label_1="requests",_label_2="timer", server="server00102"}[5m:1m])