
increase time period for rate over cadvisor metrics #254

Merged
merged 2 commits into prometheus-operator:master on Oct 2, 2019

Conversation

paulfantom
Member

Workaround for an issue described in prometheus-operator/prometheus-operator#2677 (comment)

/cc @s-urbaniak
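
Concretely, this widens the range that rate() uses over the cAdvisor CPU metric in the adapter's containerQuery so that it spans several kubelet scrapes. A minimal sketch of the resulting rule, written in the same TypeScript/Helm-values shape quoted later in this thread (the asset in this repo is jsonnet, so only the 5m window is the point here; the surrounding structure is illustrative):

// Illustrative sketch only: the adapter's CPU resource rule with the rate
// window widened to 5m. The query string mirrors the one quoted later in
// this thread; the actual generated asset lives in this repo's jsonnet.
const cpuResourceRule = {
    containerQuery: "sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!='',container!='POD',pod!=''}[5m])) by (<<.GroupBy>>)",
    resources: {
        overrides: {
            node: { resource: "node" },
            namespace: { resource: "namespace" },
            pod: { resource: "pod" }
        }
    },
    containerLabel: "container"
};

The adapter's window setting is expected to match the duration used inside rate(), which is what the rest of this conversation is about.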

@s-urbaniak
Contributor

I think you have to regenerate the assets too.

@s-urbaniak
Contributor

/lgtm

@paulfantom merged commit 21ace9b into prometheus-operator:master on Oct 2, 2019
@tbarrella

Hello, I was looking into applying this change and noticed this comment in the prometheus-adapter code, saying that window should match the duration in the rate queries. Should the nodeQuery and window be adjusted to 5m as well?

I was also curious about the trade-offs between using a larger window in prometheus-adapter and reducing the cAdvisor scraping interval, which also seems to address the issue.
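
For reference, the "reduce the cAdvisor scraping interval" alternative would mean shortening the interval on the kubelet ServiceMonitor endpoint that scrapes /metrics/cadvisor. A hypothetical sketch using pulumi-kubernetes; the field names follow the prometheus-operator ServiceMonitor CRD, but the object name, namespace, selector labels and 10s interval are assumptions, not values taken from this repo:

import * as k8s from "@pulumi/kubernetes";

// Hypothetical: scrape cAdvisor more often instead of widening the rate window.
const kubeletCadvisorMonitor = new k8s.apiextensions.CustomResource("kubelet-cadvisor", {
    apiVersion: "monitoring.coreos.com/v1",
    kind: "ServiceMonitor",
    metadata: { name: "kubelet-cadvisor", namespace: "monitoring" },   // assumed names
    spec: {
        selector: { matchLabels: { "k8s-app": "kubelet" } },           // assumed kubelet Service label
        namespaceSelector: { matchNames: ["kube-system"] },
        endpoints: [{
            port: "https-metrics",
            scheme: "https",
            path: "/metrics/cadvisor",
            interval: "10s",   // shorter interval so rate() has enough samples in a small window
            honorLabels: true,
            bearerTokenFile: "/var/run/secrets/kubernetes.io/serviceaccount/token",
            tlsConfig: { insecureSkipVerify: true }
        }]
    }
});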

@gitfool
Contributor

gitfool commented Oct 6, 2019

So nodeQuery should also be changed to use 5m, right?

Another question: is there a reason it uses a kubelet (cAdvisor) metric for containerQuery and a node-exporter metric for nodeQuery? Wouldn't it be better, or at least more consistent, to use the same metric source for both?

@s-urbaniak
Contributor

@gitfool AFAIK nodeQuery is not affected, as it is sourced from node-exporter, not from cAdvisor.

@gitfool
Contributor

gitfool commented Oct 7, 2019

@s-urbaniak Okay, but now I'm confused: my understanding was that, for the resource metrics config, any rate used in a containerQuery or nodeQuery should use the same duration as specified in window, regardless of other problems like jitter.

Also, could you enlighten me about the split between kubelet and node-exporter metrics? Is one preferred over the other for some reason?

@paulfantom
Member Author

Also, could you enlighten me about the split between kubelet and node-exporter metrics? Is one preferred over the other for some reason?

Node-exporter gives much more information about the node itself, as that is its main job. The kubelet, on the other hand, is aimed at pods/containers and thus gives better information about those. It's just a matter of selecting the best tool for the job.
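
As a rough summary of that split, these are the metric families behind the queries discussed in this thread (a summary only; both components expose far more than this):

// Where the resource metrics in this conversation come from.
const metricSources = {
    kubeletCadvisor: [
        "container_cpu_usage_seconds_total",   // per-container CPU counter, rate()d for usage
        "container_memory_working_set_bytes"   // per-container working-set memory
    ],
    nodeExporter: [
        "node_cpu_seconds_total",              // per-CPU mode counters, e.g. mode="idle"
        "node_memory_MemTotal_bytes",
        "node_memory_MemAvailable_bytes"
    ]
};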

Okay, but now I'm confused: my understanding was that, for the resource metrics config, any rate used in a containerQuery or nodeQuery should use the same duration as specified in window, regardless of other problems like jitter.

Yes, the rate window in nodeQuery should be fixed and changed to 5m as well. Do you want to create a PR with the fix? :)
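
A sketch of what that fix would look like, using the node-exporter-based nodeQuery quoted further down in this thread with its rate window widened to match window:

// Sketch only (not the merged change): the node-exporter based CPU nodeQuery
// with a 5m rate window so it matches `window` and the containerQuery.
const cpuNodeQuery =
    "sum(1 - rate(node_cpu_seconds_total{mode='idle'}[5m])" +
    " * on(namespace,pod) group_left(node)" +
    " node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>}) by (<<.GroupBy>>)";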

@gitfool
Contributor

gitfool commented Oct 7, 2019

Node-exporter gives much more information about the node itself, as that is its main job. The kubelet, on the other hand, is aimed at pods/containers and thus gives better information about those. It's just a matter of selecting the best tool for the job.

@paulfantom I can appreciate that, but it does add to the confusion for Prometheus newbies like myself. For simple resource metrics, specifically CPU and memory usage, is there a difference in the quality of the values coming from the kubelet vs. the node exporter?

I'm currently using the kubelet for all resource metrics as follows (using pulumi-kubernetes):

// (import shown for completeness; setMonitoringNamespace, monitoringNamespace,
// prometheusOperatorChart and provider are defined elsewhere in this program)
import * as k8s from "@pulumi/kubernetes";

const prometheusAdapterChart = new k8s.helm.v2.Chart("pa", {
    repo: "stable",
    chart: "prometheus-adapter",
    version: "1.3.0",
    namespace: "monitoring",
    transformations: [ setMonitoringNamespace ],
    values: {
        prometheus: { url: "http://po-prometheus-operator-prometheus" },
        rules: {
            resource: {
                cpu: {
                    containerQuery: "sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!='',container!='POD',pod!=''}[5m])) by (<<.GroupBy>>)",
                    nodeQuery: "sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,id='/'}[5m])) by (<<.GroupBy>>)",
                    resources: {
                        overrides: {
                            node: { resource: "node" },
                            namespace: { resource: "namespace" },
                            pod: { resource: "pod" }
                        }
                    },
                    containerLabel: "container"
                },
                memory: {
                    containerQuery: "sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!='',container!='POD',pod!=''}) by (<<.GroupBy>>)",
                    nodeQuery: "sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)",
                    resources: {
                        overrides: {
                            node: { resource: "node" },
                            namespace: { resource: "namespace" },
                            pod: { resource: "pod" }
                        }
                    },
                    containerLabel: "container"
                },
                window: "5m"
            }
        }
    }
}, {
    dependsOn: [ monitoringNamespace, prometheusOperatorChart ],
    provider: provider
});

Previously I was actually using a mix with the node exporter as follows:

const prometheusAdapterChart = new k8s.helm.v2.Chart("pa", {
    repo: "stable",
    chart: "prometheus-adapter",
    version: "1.3.0",
    namespace: "monitoring",
    transformations: [ setMonitoringNamespace ],
    values: {
        prometheus: { url: "http://po-prometheus-operator-prometheus" },
        rules: {
            resource: {
                cpu: {
                    containerQuery: "sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!='',container!='POD',pod!=''}[5m])) by (<<.GroupBy>>)",
                    nodeQuery: "sum(1 - rate(node_cpu_seconds_total{mode='idle'}[5m]) * on(namespace,pod) group_left(node) node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>}) by (<<.GroupBy>>)",
                    resources: {
                        overrides: {
                            node: { resource: "node" },
                            namespace: { resource: "namespace" },
                            pod: { resource: "pod" }
                        }
                    },
                    containerLabel: "container"
                },
                memory: {
                    containerQuery: "sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!='',container!='POD',pod!=''}) by (<<.GroupBy>>)",
                    nodeQuery: "sum((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) * on(namespace,pod) group_left(node) node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>}) by (<<.GroupBy>>)",
                    resources: {
                        overrides: {
                            node: { resource: "node" },
                            namespace: { resource: "namespace" },
                            pod: { resource: "pod" }
                        }
                    },
                    containerLabel: "container"
                },
                window: "5m"
            }
        }
    }
}, {
    dependsOn: [ monitoringNamespace, prometheusOperatorChart ],
    provider: provider
});

It took me a while to work out what I needed to do to get the memory nodeQuery working with the node exporter. Examples, like the config used in this repo, didn't work until I realized I also needed to get the node from node_namespace_pod:kube_pod_info: as the instance label did not match the node name, presumably due to environmental differences when running on AWS EKS.

Anyway, this exercise taught me a lot about the Prometheus Adapter but left me with questions about what was the best way to do things and why, so I appreciate any insights. 😃
