Cortex 1.16.0 Upgrade Error: LabelValues() from merge generic querier for label #5701
Comments
Looks like the error is coming from https://github.com/cortexproject/cortex/blob/master/vendor/github.com/prometheus/prometheus/storage/merge.go#L165
"Context canceled" errors come from either a timeout or a canceled request. If requests are being canceled, that should be fine. If it is a timeout, you probably need to check your service.
@yeya24 we haven't been able to figure out where the requests are coming from, because when we look at the querier and ruler we don't see any timeouts that correlate with the error above. Are there other places we should look to track it down? We decided to test rolling back, and the errors stopped once we rolled back to v1.15.3.
Are you seeing some impact? @yeya24 could this be because we fixed some context cancellation propagation and now we are actually canceling the queries that time out? (If so, this is actually a good thing?) To prove that this may be the case, could you keep the ingesters on 1.16 and all other query components on 1.15.3 (queriers, query-frontend and, if used, query-scheduler)?
Hey @alanprot I was able to bump the ingesters up to v1.16.0 but we still see these in the logs: It seems like it continuously repeats. We haven't seen any impact or heard from users about any impact. It's just new log errors we haven't seen before, and we don't want to move to production in case there is some impact that we don't understand.
@dpericaxon Did you see any real availability drop on your query side? I think it is expected because we have quorum and some requests are canceled anyway.
@yeya24 we updated in a few environments and haven't noticed any issues so far. So we might be good to close this! Thank you for all of your help!
Describe the bug
After upgrading from Cortex 1.15.3 to 1.16.0 we started seeing errors like these on our ingesters:
I checked the Queriers and don't see any errors, all I see are 200's.
On the distributors we started seeing these errors:
The "err: duplicate sample for timestamp" part is familiar to us, but the beginning part of that log line, related to "maxFailure (quorum)", is new. Are these expected?
We do see some of these happening on store-gateway:
We aren't sure whether these messages are just noise, or whether they come from cancelled requests and that's why we see the error.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Environment:
Additional Context