
[integrations-core][kubernetes_state] Metrics update #853

Merged
21 commits, merged Nov 16, 2017

Conversation

CharlyF
Contributor

@CharlyF CharlyF commented Nov 6, 2017

What does this PR do?

Adds new metrics from kubernetes_state.

Motivation

We have received requests from the community.

Testing Guidelines

I will add unit tests if the methods to collect data (e.g. terminated/waiting reason) are the way we choose to proceed.

Versioning

  • Bumped the version check in manifest.json
  • Updated CHANGELOG.md. Please use Unreleased as the date in the title
    for the new section.

@CharlyF CharlyF changed the title [WIP][integrations-core][kubernetes_state] Metrics update [integrations-core][kubernetes_state] Metrics update Nov 7, 2017
@CharlyF CharlyF requested a review from xvello November 7, 2017 18:04
Contributor

@xvello xvello left a comment

Good first iteration, please also update the metadata.csv

@@ -68,6 +72,8 @@ def __init__(self, name, init_config, agentConfig, instances=None):
'kube_pod_container_status_running': 'container.running',
'kube_pod_container_status_terminated': 'container.terminated',
'kube_pod_container_status_waiting': 'container.waiting',
'kube_pod_container_resource_requests_nvidia_gpu_devices': 'container.gpu.resource_request',
Contributor

let's just rename them container.gpu.request / .limit like for cpu and memory

@@ -73,9 +82,9 @@ def assertMetricNotAllZeros(self, metric_name):

@mock.patch('checks.prometheus_check.PrometheusCheck.poll')
def test__update_kube_state_metrics(self, mock_poll):
f_name = os.path.join(os.path.dirname(__file__), 'ci', 'fixtures', 'prometheus', 'protobuf.bin')
Contributor

Please remove the protobuf payload if unused, we can still retrieve it from the git history

tags = [self._format_tag(label.name, label.value) for label in metric.label]
self.gauge(metric_name, 0, tags)

def kube_cronjob_next_schedule_time(self, message, **kwargs):
Contributor

Great use case, we must test during the RC phase that it recovers correctly (one could taint all the nodes with NoSchedule, make sure the cronjob goes critical, then remove the taint and observe it recovering).

Contributor

You should add a message in the service check to compute the delay (now - scheduled)
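
(For illustration only, a minimal sketch of what that suggestion could look like, reusing the check helpers visible in this diff; the message wording and its placement inside kube_cronjob_next_schedule_time are assumptions, not the final implementation.)

```python
# Hypothetical sketch: report the cronjob schedule as a service check and
# embed the computed delay (now - scheduled) in the service check message.
def kube_cronjob_next_schedule_time(self, message, **kwargs):
    check_basename = self.NAMESPACE + '.cronjob.on_schedule_check'
    curr_time = int(time.time())  # assumes `import time` at module level
    for metric in message.metric:
        on_schedule = int(metric.gauge.value) - curr_time
        tags = [self._format_tag(label.name, label.value) for label in metric.label]
        if on_schedule < 0:
            # The job is behind schedule: go critical and report how late it is.
            msg = "The cronjob is {} seconds late on its schedule".format(abs(on_schedule))
            self.service_check(check_basename, self.CRITICAL, tags=tags, message=msg)
        else:
            self.service_check(check_basename, self.OK, tags=tags)
```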

@@ -231,6 +254,33 @@ def kube_pod_status_phase(self, message, **kwargs):
status = self.pod_phase_to_status.get(phase, self.UNKNOWN)
self.service_check(check_basename + phase, status, tags=tags)

def kube_cronjob_status_last_schedule_time(self, message, **kwargs):
Contributor

Seems redundant with kube_cronjob_next_schedule_time, and the .delay name is misleading (I'd expect it to be positive if scheduling is lagging behind). I'd rather remove it unless we have a clear use case for this gauge.

Contributor Author

Got it, I like the recommendation of adding a message in the SC kube_cronjob_next_schedule_time.

@@ -326,6 +376,22 @@ def kube_node_spec_unschedulable(self, message, **kwargs):
else:
self.log.error("Metric type %s unsupported for metric %s" % (message.type, message.name))

def kube_pod_container_status_waiting_reason(self, message, **kwargs):
Contributor

For these two metrics, I'm not sure whether we should tag by podname, or just increment a node counter. @hkaj what's your opinion on that?

Member

We need the pod name, node name and reason for this metric. One reason a pod can be waiting is ErrImagePull, as a user I'd like to know if I fat-fingered the image name in my pod spec and all the pods of a same family are failing because of that.

I would also like to know if a node is making all its containers wait because it can't pull images or its containers are stuck in creation.

Now, how we expose this info is up for discussion. I think the metric name should be self.NAMESPACE + '.pod_container_status_waiting.' + reason. Reasons are finite and hardcoded here: https://github.com/kubernetes/kube-state-metrics/blob/b26cfb26fe5e858f8033a641a321abef25496c62/collectors/pod.go#L38 so it should be fine. If we add pods and nodes as tags we should be able to group/filter across the cluster quite easily. I'm open to other options though @CharlyF & @xvello

Contributor Author

@hkaj I liked the option of not adding the reason to the metric name, so that one could monitor the reason tag and be alerted if there are more than X pods waiting because of {{reason}} on {{node.name}}.
Less cumbersome for a user to set up, imo.

Member

Yeah that's a good point, then what do you think of removing .reason from the metric name? pod.containers_waiting with the reason tag?

Contributor Author

So I think reason should remain in the name to distinguish the metric from

'kube_pod_container_status_waiting': 'container.waiting',
(so that both can be used)

Member

hmm shouldn't we just add the reason tag to the existing metric then? Does the new one add any value?

Contributor Author

I actually get the point - indeed the .reason metric is at 1 only on the tag representing the reason, whereas the regular metric is at 1 whenever the container is waiting.
https://github.com/kubernetes/kube-state-metrics/blob/master/collectors/pod_test.go#L284-L292.

Now, I can't just remove kube_pod_container_status_waiting, and I don't know if using kube_pod_container_status_waiting_reason to override or even replace container.waiting is a good idea, because this metric only became available with 1.1.0, so people already monitoring container.waiting might start missing it. Thoughts?

Member

could we keep the existing metric, but instead of adding the new one we match its pod with the existing metric and just add the reason as a tag to it?

Contributor Author

The best solution I came up with is adding a meta crawl first: DataDog/dd-agent#3583

Contributor Author

@CharlyF CharlyF Nov 15, 2017

Why did I implement it this way:
The prometheus process_metric method goes through all the items in ignored, then metrics_mapper, and finally getattr (i.e. methods in the k8s-state check named after the metric).
While processing the waiting_reason metric, we don't have access to the message of waiting (i.e. the pod name).
The approach is: if we see the new waiting_reason, populate the old metric name and add the reason as a tag.
We can't use a boolean because we can't trust the order in which the metrics come from prometheus, and hence we might miss the _reason and compute the old one first.
The solution:
We can't use a single boolean for all the new metrics because waiting_reason might be exposed while terminated_reason is not, and we don't want to miss terminated.
Secondly, from my testing, the decision of whether to compute the new or the old metrics has to be made before processing the metrics in the kube-state check, because getattr is non-blocking and metrics get missed.
Using a dict of meta metrics makes this reusable, should we want to process the prometheus output several times to associate more metadata with the underlying pods.
cc @xvello and @hkaj
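
(A rough sketch of the two-pass idea described above, not the actual code from DataDog/dd-agent#3583; the helper name `_build_meta_metrics` and the exact dict layout are assumptions.)

```python
# Hypothetical first pass: record which pod/container pairs expose a *_reason
# metric at 1, keyed by metric family, before the normal processing pass runs.
from collections import defaultdict

REASON_METRICS = (
    'kube_pod_container_status_waiting_reason',
    'kube_pod_container_status_terminated_reason',
)

def _build_meta_metrics(self, messages):
    meta = defaultdict(dict)
    for message in messages:
        if message.name not in REASON_METRICS:
            continue
        for metric in message.metric:
            labels = {label.name: label.value for label in metric.label}
            if metric.gauge.value == 1:
                # Remember the reason seen for this pod/container pair so the
                # legacy waiting/terminated handlers can attach it as a tag.
                meta[message.name][(labels.get('pod'), labels.get('container'))] = labels.get('reason')
    return meta
```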

@CharlyF CharlyF force-pushed the charly/kube_state_metrics_update branch from 97d75ed to da26a02 Compare November 8, 2017 15:24
check_basename = self.NAMESPACE + '.cronjob.on_schedule_check'
curr_time = int(time.time())
for metric in message.metric:
on_schedule = int(metric.gauge.value) - curr_time
Member

Just to be sure: there's no TZ issue with this?

Contributor Author

There should not be a TZ issue, as we rely on the epoch timestamp.


def kube_pod_container_status_waiting_reason(self, message, **kwargs):
""" Reason a container is in a waiting state """
# Capturing the pod name so the metric can be compared to kube_pod_container_status_waiting
metric_name = self.NAMESPACE + '.pod_container_status_waiting.reason'
Member

not sure about this metric name, see my comment above

@CharlyF CharlyF force-pushed the charly/kube_state_metrics_update branch from e158e97 to c53fa39 Compare November 14, 2017 17:33
@CharlyF
Contributor Author

CharlyF commented Nov 15, 2017

Sister PR: DataDog/dd-agent#3583

@CharlyF
Contributor Author

CharlyF commented Nov 15, 2017

@hkaj I have rolled back my work on adding the reason tag to the legacy metric, for the following reason.
As it stands, the .waiting and the .waiting.reason (respectively .terminated and .terminated.reason) metrics are not consistent.
This output illustrates the problem:
Run at T1

kube_pod_container_status_waiting{container="nginxs",namespace="default",pod="nginxs-d78b8df7d-bc6vp"} 1
[...]
kube_pod_container_status_waiting_reason{container="nginx",namespace="default",pod="nginx-58dd565ddf-lvpxt",reason="ErrImagePull"} 0
kube_pod_container_status_waiting_reason{container="nginxs",namespace="default",pod="nginxs-d78b8df7d-bc6vp",reason="ContainerCreating"} 0
kube_pod_container_status_waiting_reason{container="nginxs",namespace="default",pod="nginxs-d78b8df7d-bc6vp",reason="ErrImagePull"} 1

Run at T1 + 30 seconds

kube_pod_container_status_waiting{container="nginxs",namespace="default",pod="nginxs-d78b8df7d-bc6vp"} 1
[...]
kube_pod_container_status_waiting_reason{container="nginx",namespace="default",pod="nginx-58dd565ddf-lvpxt",reason="ErrImagePull"} 0
kube_pod_container_status_waiting_reason{container="nginxs",namespace="default",pod="nginxs-d78b8df7d-bc6vp",reason="ContainerCreating"} 0
kube_pod_container_status_waiting_reason{container="nginxs",namespace="default",pod="nginxs-d78b8df7d-bc6vp",reason="ErrImagePull"} 0

As you can see, the .reason metric is at 1 for only a few seconds and then goes back to 0, whereas the pod is still waiting, so the waiting metric stays at 1.
This is probably because the reason is at first ErrImagePull but then changes to ImagePullBackOff (which is not an available reason).

[Screenshot attached: 2017-11-15 6:34 PM]

I am implementing these back into the check as counts, so they can be used alongside the regular .waiting metrics. The point is to see, across the last hour, how many OOMs we have had (with the terminated reason metric), etc.:
[Screenshot attached: 2017-11-15 6:36 PM]

cc: @xvello
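
(A hedged sketch of the count-based approach described above; the terminated metric name mirrors the .container.status_report.count.waiting name from the diff below and is an assumption, as is the whitelist.)

```python
# Hypothetical sketch: submit terminations as a count tagged by reason, so it
# can be summed over the last hour (e.g. how many OOMKilled) alongside the
# existing container.terminated gauge.
WHITELISTED_TERMINATED_REASONS = ['OOMKilled', 'ContainerCannotRun', 'Error']

def kube_pod_container_status_terminated_reason(self, message, **kwargs):
    metric_name = self.NAMESPACE + '.container.status_report.count.terminated'
    for metric in message.metric:
        tags = [self._format_tag(label.name, label.value) for label in metric.label]
        reasons = [label.value for label in metric.label if label.name == 'reason']
        if metric.gauge.value > 0 and any(r in WHITELISTED_TERMINATED_REASONS for r in reasons):
            self.count(metric_name, metric.gauge.value, tags)
```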

@CharlyF CharlyF mentioned this pull request Nov 15, 2017
@@ -34,6 +36,9 @@ kubernetes_state.limitrange.memory.max_limit_request_ratio,gauge,,,,Maximum memo
kubernetes_state.node.cpu_capacity,gauge,,cpu,,The total CPU resources of the node,0,kubernetes,k8s_state.node.cpu_capacity
kubernetes_state.node.memory_capacity,gauge,,byte,,The total memory resources of the node,0,kubernetes,k8s_state.node.memory_capacity
kubernetes_state.node.pods_capacity,gauge,,,,The total pod resources of the node,0,kubernetes,k8s_state.node.pods_capacity
kubernetes_state.node.gpu.cards_allocatable,gauge,,,,The Nvidia GPU resources of a node that are available for scheduling,0,kubernetes,k8s_state.node.gpu.cards_allocatable
Contributor

Let's remove the chameleon from our descriptions

@@ -273,6 +297,32 @@ def kube_pod_status_phase(self, message, **kwargs):
status = self.pod_phase_to_status.get(phase, self.UNKNOWN)
self.service_check(check_basename + phase, status, tags=tags)

def kube_pod_container_status_waiting_reason(self, message, **kwargs):
metric_name = self.NAMESPACE + '.container.status_report.count.waiting'
Contributor

Name LGTM, need to add it to the metadata though

def kube_pod_container_status_waiting_reason(self, message, **kwargs):
metric_name = self.NAMESPACE + '.container.status_report.count.waiting'
for metric in message.metric:
tags = [self._format_tag(label.name, label.value) for label in metric.label]
Contributor

We should at least remove the container tag from these two metrics, see https://github.com/kubernetes/kube-state-metrics/blob/master/Documentation/pod-metrics.md

I still think even pod cardinality is overkill for that metric; I'd remove the pod label too, especially as we'll be submitting counts even on normal operations (terminated OK, waiting on ContainerCreating), which would spam a lot of contexts.
@hkaj can you chime in on that?

Contributor

Let's rename container to kube_container_name to be consistent with the kubernetes integration.
Let's not submit the pod tag, and add a //TODO: in agent6, use the tagger to get pod creator tags.
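
(A small sketch of the tag handling being suggested here; purely illustrative.)

```python
# Hypothetical tag-building loop: rename `container` to `kube_container_name`
# and drop the pod tag for now.
tags = []
for label in metric.label:
    if label.name == 'container':
        # Stay consistent with the kubernetes integration tag name.
        tags.append(self._format_tag('kube_container_name', label.value))
    elif label.name == 'pod':
        # TODO: in agent6, use the tagger to get pod creator tags instead.
        continue
    else:
        tags.append(self._format_tag(label.name, label.value))
```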

Member

The //TODO should also be here, so users know that we purposefully skipped it but it's coming.

Contributor

@xvello xvello left a comment

Optimizations/nitpick, plus don't forget https://github.com/DataDog/integrations-core/pull/853/files#r151357487

Then we're good to go!

@@ -172,6 +173,8 @@ def check(self, instance):
self.job_succeeded_count = defaultdict(int)
self.job_failed_count = defaultdict(int)

self.whitelisted_reasons = {"waiting":["ErrImagePull"],"terminated":["OOMKilled","ContainerCannotRun","Error"]}
Contributor

They'll be created at every run. You can put them as module constants at the top, like METRIC_TYPES. You should assign two separate arrays instead of a dict for performance.
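
(What that suggestion might look like; the constant names are illustrative.)

```python
# Module-level constants: built once at import time instead of on every run,
# and kept as two separate lists rather than one dict.
WHITELISTED_WAITING_REASONS = ['ErrImagePull']
WHITELISTED_TERMINATED_REASONS = ['OOMKilled', 'ContainerCannotRun', 'Error']
```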

@@ -91,6 +91,7 @@ def __init__(self, name, init_config, agentConfig, instances=None):
'kube_replicationcontroller_status_replicas': 'replicationcontroller.replicas',
'kube_statefulset_replicas': 'statefulset.replicas_desired',
'kube_statefulset_status_replicas': 'statefulset.replicas',

Contributor

let's remove that empty line

tags.append(self._format_tag(label.name, label.value))
else:
reason = False
report = True
Contributor

do you mean skip_metric = True?

You can also start with skip_metric = True, then set it to False when encountering a valid reason. That way you don't need the else.
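
(A sketch of the suggested control flow, as a fragment of the per-metric loop; metric_name and the whitelist constant are assumed from the surrounding diff.)

```python
# Start pessimistic and only flip the flag when a whitelisted reason is seen,
# so no `else` branch is needed.
skip_metric = True
tags = []
for label in metric.label:
    if label.name == 'reason' and label.value in WHITELISTED_WAITING_REASONS:
        skip_metric = False
    tags.append(self._format_tag(label.name, label.value))
if not skip_metric:
    self.count(metric_name, metric.gauge.value, tags)
```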

Contributor

@xvello xvello left a comment

:chameleon:
