[kubernetes] Ingest k8s events + limits and requests metrics #2551
"type": "Normal" | ||
} | ||
] | ||
} |
This fixture is a little bit large but contains a different kind of events, could be useful later.
Do we have examples of the "extraneous" events that aren't tied to pods or replica sets?
@irabinovitch generally speaking, every node having the
query the master API for events fixed line endings query the master API pass the auth token more logs resolve host_ip minor improvements, fixed log prints post events to DD fixed timestamp evaluation fix timestamp evaluation again on ts logic do not overwrite last timestamp with 0 skip events from other nodes default value if port is missing from instance update skip condition debug logs++ use node ip as host moved timestamp register to kubeutil fixed tests added kubeutil tests added tests for events
For people with multiple masters, saying "make the pod running on the master be the one to get all events" doesn't work. Perhaps a PetSet might help? Or perhaps just run a DaemonSet on every node, then create an additional ReplicaSet/Deployment with a single agent pod that only reports events (the other metrics are left to the regular DaemonSet pod). So, on one machine, you'd have two agents, with no overlap between the data they report back.
Hi @masci! I was notified about this PR by @irabinovitch. I just opened a new PR (#2728) about k8s deployment metrics. We should see if we could somehow combine the two. A quick recap of what I did:
Other than that there's the obvious new fixture and some added cases in the tests. I also opened an issue #2722.
Also: Are there any events for deployments? Currently I do not have deploy markers on my graphs for my team, because we are deploying using kubectl. It would be cool to have some sort of way to detect a k8s deploy.
@@ -93,5 +106,101 @@ def extract_kube_labels(self, pods_list, excluded_keys=None):

        return kube_labels

    def extract_meta(self, pods_list, field_name):
Do we need this? It's nice and it's tested ( 🍪 ) but I don't see any usage of it.
Let's make sure that it works nicely with 1.2 and 1.3, then it's good to go.
All good on 1.2 too, you can squash and 👍
Is this available for testing in any of the Docker images?
You can try the
I tried nightly, but I'm not sure it has the latest:
-rw-r--r--. 4 root root 10843 Aug 19 04:04 kubernetes.py
It doesn't e.g. import the time module as done in this PR.
From
From https://hub.docker.com/r/datadog/docker-dd-agent/builds/bc5brbm22yzmrwsiruph9ua/
2016-08-29T16:40:33.009Z
Nothing else lines up exactly, either. E.g. I can't find a revision of kubernetes.py that is exactly 10843 bytes.
Never mind, I was looking at the commit history in the PR, not in the repo. It turns out that the image has kubernetes.py from my Aug 7 change. :-) So, despite the nightly name, it looks like the source tree has .py files from Aug 19.
Important note: This requires that the The preferred mechanism to get the API Server name is by using the environment variables injected into every pod:
This also only gets the events for the
cc: @remh who I was chatting with on IRC |
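The snippet that originally accompanied the note above is not preserved here. As a minimal sketch of the idea, assuming the standard `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` variables that Kubernetes sets in every pod for the default `kubernetes` service, the API server address could be resolved roughly like this. The function name `api_server_url` and the `https` default are illustrative and not part of the agent code.

```python
import os


def api_server_url(environ=None, scheme="https"):
    """Build the API server URL from the env vars injected into a pod.

    Hypothetical helper for illustration; returns None when the host
    variable is missing (e.g. when running outside a cluster).
    """
    env = os.environ if environ is None else environ
    host = env.get("KUBERNETES_SERVICE_HOST")
    port = env.get("KUBERNETES_SERVICE_PORT")
    if not host:
        return None
    if port:
        return "%s://%s:%s" % (scheme, host, port)
    return "%s://%s" % (scheme, host)
```

Passing the environment as a parameter keeps the lookup easy to test; inside a pod it would simply be called with no arguments.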
Hi @rosskukulinski and thanks for your feedback, very precious. You can specify any namespace other than Since this code is already in A PR would work well too 😜 ! Thanks!
Done, thanks @masci. It's up to my clients whether I invest the development time in submitting PRs.
# requests
try:
    for request, value_str in container['resources']['requests'].iteritems():
        values = [float(s) for s in prog.findall(value_str)]
Isn't this going to parse eg. 500m as 500 instead of 0.5?
FYI: the Kubernetes code dealing with quantities lives in
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/resource/quantity.go
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/resource/suffix.go
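To illustrate the concern about `500m`, here is a minimal sketch of a suffix-aware parser. It is not the code from this PR and not a port of `quantity.go`: the name `parse_quantity` is hypothetical, and it assumes only plain decimal numbers followed by the common decimal SI and binary suffixes, so exponent notation such as `1e3` is deliberately rejected.

```python
import re

# Multipliers for the quantity suffixes handled by this sketch.
_SUFFIXES = {
    "": 1.0,
    # decimal SI suffixes
    "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15, "E": 1e18,
    # binary (power-of-two) suffixes
    "Ki": 2.0 ** 10, "Mi": 2.0 ** 20, "Gi": 2.0 ** 30,
    "Ti": 2.0 ** 40, "Pi": 2.0 ** 50, "Ei": 2.0 ** 60,
}

# A signed decimal number followed by an optional alphabetic suffix.
_QUANTITY_RE = re.compile(r"^([+-]?(?:\d+(?:\.\d*)?|\.\d+))([A-Za-z]*)$")


def parse_quantity(value_str):
    """Convert a quantity string such as '500m' or '128Mi' to a float."""
    match = _QUANTITY_RE.match(value_str.strip())
    if match is None:
        raise ValueError("unparseable quantity: %r" % (value_str,))
    number, suffix = match.groups()
    if suffix not in _SUFFIXES:
        raise ValueError("unknown suffix %r in %r" % (suffix, value_str))
    return float(number) * _SUFFIXES[suffix]
```

With this approach `500m` yields half a CPU and `128Mi` yields a byte count, whereas extracting only the digits would report 500 and 128 respectively.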
@masci, you said:
I'm looking at the kubernetes.yaml.example file, and I don't see anything documented on how to change the namespace where it looks for the kubernetes service. Could you clarify?
What
limits and requests metrics.

Logic recap:
Each agent fetches the whole list of available events and discards those not regarding pods running on the same node.
If the collect_events config param is set to True, the agent fetches the whole list of available events.
The strategy above could be replaced with delegating
Users should delegate one and only one agent pod to collect and send all the events, that agent possibly being the one running on the k8s master. This unfortunately doesn't work on GCE, where the master is not part of the cluster but is provided by the platform as an external service instead. How to achieve this part is left to the users.
Kubeutil singleton and used to filter out older events.

Open issues: