[kubernetes] Ingest k8s events + limits and requests metrics #2551

masci · 2016-05-27T17:12:56Z

What

Fetch events from the Kubernetes API and feed them to Datadog.
Add new limits and requests metrics.

Logic recap:

~~Each agent fetches the whole list of available events and discards those not regarding pods running on the same node.~~
If the collect_events config param is set to True, the agent fetches the whole list of available events
~~The strategy above could be replaced with delegating~~ Users should delegate one and only one agent pod to collect and send all the events - that agent possibly being the one running on k8s master. This unfortunately doesn't work on GCE where the master is not part of the cluster but it's provided by the platform as an external service instead. How to achieve this part is left to the users.
To avoid duplicates, the timestamp of the oldest event seen is stored in the Kubeutil singleton and used to filter out older events.

Open issues:

There are events not directly tied to Pods, like ReplicaSet events. Those are discarded for the moment.
There are serious concerns about flooding since there's no way at the moment to perform selection queries through the event API. We rely on the fact that k8s events currently have a TTL limited to 1 hour.

masci · 2016-05-27T17:18:16Z

tests/checks/fixtures/kubernetes/events.json

+      "type": "Normal"
+    }
+  ]
+}


This fixture is a little bit large but contains a different kind of events, could be useful later.

irabinovitch · 2016-06-01T09:01:36Z

@masci @remh Given the discussion about above GKE vs K8S should we reach out to our friends at Google for feedback on approach?

irabinovitch · 2016-06-01T23:17:35Z

Do we have examples of the "extraneous" events that aren't tied to pods or replica sets?

masci · 2016-06-01T23:56:21Z

@irabinovitch generally speaking, every node having the Kind field set to something different from Pod, for example Created pod: xxx having ReplicaSet kind, or Scaled up xxx of Deployment kind.

query the master API for events fixed line endings query the master API pass the auth token more logs resolve host_ip minor improvements, fixed log prints post events to DD fixed timestamp evaluation fix timestamp evaluation again on ts logic do not overwrite last timestamp with 0 skip events from other nodes default value if port is missing from instance update skip condition debug logs++ use node ip as host moved timestamp register to kubeutil fixed tests added kubeutil tests added tests for events

therc · 2016-07-20T16:27:40Z

For people with multiple masters, saying "make the pod running on the master be the one to get all events" doesn't work. Perhaps a PetSet might help? Or perhaps just run a DaemonSet on every node, then create an additional ReplicaSet/Deployment with a single agent pod that only reports events (the other metrics are left to the regular DaemonSet pod). So, on one machine, you'd have two agents, with no overlap between the data they report back.

onbjerg · 2016-08-03T00:00:24Z

Hi @masci! I was notified about this PR by @irabinovitch. I just opened a new PR (#2728) about k8s deployment metrics. We should see if we could somehow combine the two.

A quick recap of what I did:

Queried the deployments endpoint (/apis/extensions/v1beta1/deployments)
Looped over each deployment and pulled data for the 3 metrics in [kubernetes] add metrics for k8s deployments #2728 (kubernetes.pods.desired, kubernetes.pods.available, kubernetes.pods.unavailable), tagged with kube_deployment:{name}
Resolved the deployment name for a pod based on the ReplicaSet name in kubernetes.io/created-by and the pod template hash (see L108 of kubeutil.py in the PR). This is used to tag the subcontainers with kube_deployment:{name}.

Other than that there's the obvious new fixture and some added cases in the tests. I also opened an issue #2722.

onbjerg · 2016-08-03T01:05:03Z

Also: Are there any events for deployments? Currently I do not have deploy markers on my graphs for my team, because we are deploying using kubectl. It would be cool to have some sort of way to detect a k8s deploy.

hkaj · 2016-08-18T15:21:25Z

utils/kubeutil.py

@@ -93,5 +106,101 @@ def extract_kube_labels(self, pods_list, excluded_keys=None):

        return kube_labels

+    def extract_meta(self, pods_list, field_name):


do we need this? It's nice and it's tested ( 🍪 ) but I don't see any usage of it.

hkaj · 2016-08-18T17:01:31Z

Let's make sure that it works nicely with 1.2 and 1.3, then it's good to go.

hkaj · 2016-08-23T13:22:09Z

all good on 1.2 too, you can squash and 👍

therc · 2016-08-30T17:42:24Z

Is this available for testing in any of the Docker images?

masci · 2016-08-31T08:42:18Z

You can try the nightly tag from https://hub.docker.com/r/datadog/docker-dd-agent/tags/ but please consider the agent as unstable. Events collection is disabled by default, you can activate it in the check configuration file if you want. Later it'll be possible to instrument that feature through env variables, see DataDog/docker-dd-agent#117

therc · 2016-08-31T16:23:36Z

I tried nightly, but I'm not sure it has the latest:

-rw-r--r--. 4 root root 10843 Aug 19 04:04 kubernetes.py
-rw-r--r--. 1 root root 9823 Aug 31 15:41 kubernetes.pyc

It doesn't e.g. import the time module as done in this PR.

therc · 2016-08-31T16:39:32Z

From docker inspect:

"Created": "2016-08-29T16:41:53.174520676Z",

From https://hub.docker.com/r/datadog/docker-dd-agent/builds/bc5brbm22yzmrwsiruph9ua/

2016-08-29T16:40:33.009Z

Nothing else lines up exactly, either. E.g. I can't find a revision of kubernetes.py that is exactly 10843 bytes.

therc · 2016-08-31T16:51:18Z

Nevermind, I was looking at the commit history in the PR, not in the repo. It turns out that the image has kubernetes.py from my Aug 7 change. :-)

https://github.com/DataDog/dd-agent/blob/ef3aa479ea2aae19cb74a5f14e70a45566197550/checks.d/kubernetes.py

So, despite the nightly name, it looks like the source tree has .py files from Aug 19.

rosskukulinski · 2016-09-12T16:55:47Z

Important note: This requires that the kubernetes service exists within the namespace dd-agent is running. This is not always the case (especially in namespaces that are not default).

The preferred mechanism to get the API Server name is by using the environmental variables injected into every pod: KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT.

rosskukulinski · 2016-09-12T16:57:53Z

This also only gets the events for the default namespace. I believe it should pull events from all namespaces.

rosskukulinski · 2016-09-12T16:59:25Z

cc: @remh who I was chatting with on IRC

masci · 2016-09-12T17:19:37Z

Hi @rosskukulinski and thanks for your feedback, very precious.

You can specify any namespace other than default in the configuration file under the instance section. Anyway, we could (and probably should) collect all the events with no regards for the namespace, but I'd like this to be behind some sort of configuration parameter with reasonable defaults.

Since this code is already in master and is going to be shipped with 5.9, do you mind to open two new, different issues to track and discuss:
a) the way we get the api URL
b) how we can improve events gathering when someone is interested in any namespace available

A PR would work well too 😜 !

Thanks!

rosskukulinski · 2016-09-12T17:45:33Z

done, thanks @masci. Up to my clients as to whether I invest the development time in submitting PRs.

therc · 2016-09-22T21:58:14Z

checks.d/kubernetes.py

+                # requests
+                try:
+                    for request, value_str in container['resources']['requests'].iteritems():
+                        values = [float(s) for s in prog.findall(value_str)]


Isn't this going to parse eg. 500m as 500 instead of 0.5?

FYI: the Kubernetes code dealing with quantities lives in

https://github.com/kubernetes/kubernetes/blob/master/pkg/api/resource/quantity.go
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/resource/suffix.go

jonmoter · 2016-10-06T23:30:16Z

@masci, you said:

You can specify any namespace other than default in the configuration file under the instance section.

I'm looking at the kubernetes.yaml.example file, and I don't see anything documented on how to change the namespace where it looks for the kubernetes service. Could you clarify?

masci · 2016-10-07T13:57:00Z

Hi @jonmoter
that's because documentation and the example yaml file are not up to date unfortunately but basically all you have to do is

instance:
  namespace: my-other-namespace

for more details on the issue and provide feedback let's use #2838

masci force-pushed the massi/ingest_k8s_events branch from a901d9d to 57cb066 Compare May 27, 2016 17:15

masci added checks feature labels May 27, 2016

masci reviewed May 27, 2016
View reviewed changes

masci force-pushed the massi/ingest_k8s_events branch from c7996e2 to bc586d5 Compare June 6, 2016 17:49

masci changed the title ~~[WIP][kubernetes] Ingest Kubernetes events~~ [kubernetes] Ingest Kubernetes events Jun 6, 2016

masci changed the title ~~[kubernetes] Ingest Kubernetes events~~ [kubernetes] Ingest k8s events + limits and requests metrics Jun 21, 2016

masci force-pushed the massi/ingest_k8s_events branch from fa3dd1d to d802dad Compare June 21, 2016 15:37

Massimiliano Pippi and others added 9 commits July 5, 2016 15:07

conditionally execute events fetching

7fd85f4

do not filter events by pod anymore

beecadd

explicit the default value

ccbba19

[skip ci] comment the default value

4d54479

fixed tests

42c0507

wip

e024215

fixed tests according to new fixtures

4c7b121

small fix

c612a72

masci force-pushed the massi/ingest_k8s_events branch 4 times, most recently from 7d97705 to f5cf8f6 Compare July 5, 2016 15:44

fixed tests

15f0912

masci force-pushed the massi/ingest_k8s_events branch from f5cf8f6 to 15f0912 Compare July 5, 2016 15:54

hkaj added this to the 5.9.0 milestone Jul 13, 2016

masci modified the milestones: 5.9.0, 5.10.0 Aug 18, 2016

remove overly verbose debug log

04e46be

hkaj reviewed Aug 18, 2016
View reviewed changes

Massimiliano Pippi added 3 commits August 18, 2016 18:22

avoid logging auth token

4f7c4c1

added notes on unused methods

bd54009

[ci skip] fix boolean values syntax in the yaml file

6f770a4

calm down the logger in case limits were not set

fa37f26

masci merged commit d79f7c8 into master Aug 23, 2016

masci deleted the massi/ingest_k8s_events branch August 23, 2016 14:30

This was referenced Sep 12, 2016

[kubenetes] Lookup k8s API endpoint using environment variables #2837

Closed

[kubernetes] Events from all namespaces #2838

Closed

therc reviewed Sep 22, 2016

View reviewed changes

arohner mentioned this pull request Oct 17, 2016

Use the env var KUBERNETES_SERVICE_HOST if present #2921

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[kubernetes] Ingest k8s events + limits and requests metrics #2551

[kubernetes] Ingest k8s events + limits and requests metrics #2551

masci commented May 27, 2016 •

edited

masci May 27, 2016

irabinovitch commented Jun 1, 2016

irabinovitch commented Jun 1, 2016

masci commented Jun 1, 2016

therc commented Jul 20, 2016

onbjerg commented Aug 3, 2016 •

edited

onbjerg commented Aug 3, 2016

hkaj Aug 18, 2016

hkaj commented Aug 18, 2016

hkaj commented Aug 23, 2016

therc commented Aug 30, 2016

masci commented Aug 31, 2016

therc commented Aug 31, 2016

therc commented Aug 31, 2016

therc commented Aug 31, 2016

rosskukulinski commented Sep 12, 2016

rosskukulinski commented Sep 12, 2016

rosskukulinski commented Sep 12, 2016

masci commented Sep 12, 2016

rosskukulinski commented Sep 12, 2016

therc Sep 22, 2016

therc Sep 22, 2016

jonmoter commented Oct 6, 2016

masci commented Oct 7, 2016

		@@ -93,5 +106,101 @@ def extract_kube_labels(self, pods_list, excluded_keys=None):

		return kube_labels

		def extract_meta(self, pods_list, field_name):

[kubernetes] Ingest k8s events + limits and requests metrics #2551

[kubernetes] Ingest k8s events + limits and requests metrics #2551

Conversation

masci commented May 27, 2016 • edited

What

masci May 27, 2016

Choose a reason for hiding this comment

irabinovitch commented Jun 1, 2016

irabinovitch commented Jun 1, 2016

masci commented Jun 1, 2016

therc commented Jul 20, 2016

onbjerg commented Aug 3, 2016 • edited

onbjerg commented Aug 3, 2016

hkaj Aug 18, 2016

Choose a reason for hiding this comment

hkaj commented Aug 18, 2016

hkaj commented Aug 23, 2016

therc commented Aug 30, 2016

masci commented Aug 31, 2016

therc commented Aug 31, 2016

therc commented Aug 31, 2016

therc commented Aug 31, 2016

rosskukulinski commented Sep 12, 2016

rosskukulinski commented Sep 12, 2016

rosskukulinski commented Sep 12, 2016

masci commented Sep 12, 2016

rosskukulinski commented Sep 12, 2016

therc Sep 22, 2016

Choose a reason for hiding this comment

therc Sep 22, 2016

Choose a reason for hiding this comment

jonmoter commented Oct 6, 2016

masci commented Oct 7, 2016

masci commented May 27, 2016 •

edited

onbjerg commented Aug 3, 2016 •

edited