code_verb:apiserver_request_total:increase30d is failing to evaluate #503
Comments
In addition, I can confirm this calculation leads to increased CPU load :(
I've seen this before; you need to bump the max-samples limit. I think I did something like:
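The max-samples limit is a Prometheus server flag (`--query.max-samples`); with the Prometheus Operator it can be raised through the Prometheus CRD's query settings. A sketch, assuming the operator's QuerySpec field names:

```yaml
# Hypothetical Prometheus CRD snippet; maxSamples maps to the
# --query.max-samples flag on the Prometheus server.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  query:
    maxSamples: 100000000  # Prometheus default is 50000000
```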
I did find the max-samples solution, but if it's required it should be set by kube-prometheus so that the default rules don't fail.
In addition, increasing max-samples increases the CPU load and the time needed to calculate such a metric, so it is not a straightforward option.
One option to improve this would be to move the availability calculations into their own group. We would then be able to set a higher evaluation interval, such as every 3 minutes (taking the 5m staleness into account).
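Such a split could be sketched as a separate PrometheusRule with its own group-level interval (group and resource names here are illustrative, not the upstream manifests):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-apiserver-availability
  namespace: monitoring
spec:
  groups:
    - name: kube-apiserver-availability.rules
      interval: 3m  # evaluate less frequently than the default
      rules:
        - record: code_verb:apiserver_request_total:increase30d
          expr: sum by (code, verb) (increase(apiserver_request_total{job="apiserver"}[30d]))
```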
Might it be reasonable to change the query to something like:
In those recording rules we don't have subqueries like that; it's literally just summing up the counts of requests. For 28d that might still be too many data points.
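One way to avoid summing raw counters over the full 30d window is to record a short-range increase first and extrapolate from it. A sketch of that pattern (rule names assumed; not necessarily the exact upstream fix):

```yaml
- record: code_verb:apiserver_request_total:increase1h
  expr: sum by (code, verb) (increase(apiserver_request_total{job="apiserver"}[1h]))
- record: code_verb:apiserver_request_total:increase30d
  # average hourly increase over 30d, scaled up to a 30d total;
  # each evaluation only touches the much smaller recorded series
  expr: avg_over_time(code_verb:apiserver_request_total:increase1h[30d]) * 24 * 30
```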
@metalmatze synced the updated rules downstream to the Helm chart, and we're still seeing a couple of errors for too many samples. For example:
Closing as this appears to be already fixed. Please reopen if this is still an issue.
What happened?
I upgraded from an older version from a few months ago to the latest 0.5 release with jb, and the "Prometheus is failing rule evaluations" alert started firing immediately. I checked Prometheus, and https://github.com/coreos/kube-prometheus/blob/dcc46c8aa8c242b845024188a66171b5f08b8513/manifests/prometheus-rules.yaml#L393 is in ERR state: query processing would load too many samples into memory in query execution
Did you expect to see something different?
The included rule shouldn't be failing.
How to reproduce it (as minimally and precisely as possible):
Environment
GKE
Prometheus Operator version:
v0.38.1
Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-26T06:16:15Z", GoVersion:"go1.14", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-gke.5", GitCommit:"a5bf731ea129336a3cf32c3375317b3a626919d7", GitTreeState:"clean", BuildDate:"2020-03-31T02:49:49Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes cluster kind:
GKE
level=warn ts=2020-04-18T11:01:09.892Z caller=manager.go:525 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: code_verb:apiserver_request_total:increase30d\nexpr: sum by(code, verb) (increase(apiserver_request_total{job=\"apiserver\"}[30d]))\n" err="query processing would load too many samples into memory in query execution"