Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker stats without CPU precentage calculation #28881

Closed
ruflin opened this issue Nov 28, 2016 · 22 comments
Closed

Docker stats without CPU precentage calculation #28881

ruflin opened this issue Nov 28, 2016 · 22 comments
Labels
area/api kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.

Comments

@ruflin
Copy link

ruflin commented Nov 28, 2016

When using the docker stats api currently each request to the API will take up to 2 seconds. As explained in #23188 the docker API needs two measurement points to calculate an average.

For stats collectors like Metricbeat it would be useful to get the raw cpu values and either do the calculations on the collector side or directly when viewing the data on the storage system (in our case Elasticsearch).

I'm proposing to add a flag/param to the docker stats API request that skips average calculations and return only raw values instead. This will speed up the API request and will potentially also reduce the load on the docker service as the request is open for a much shorter time.

Collecting stats for a large number of containers has currently to be done one by one which leads to lots of open http requests and is quite slow because of the above issue. This will potentially be solved with #25361 by @WeiZhang555 but also as part of this bulk request it would be nice to have the flag to disable calculations.

@thaJeztah
Copy link
Member

I wonder if this would be more suitable for the metrics roadmap; #27307

@thaJeztah thaJeztah added area/api kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. and removed area/swarm labels Nov 28, 2016
@ruflin
Copy link
Author

ruflin commented Nov 28, 2016

@thaJeztah I was looking at #27307 and the reason I did not add it there is because of the Prometheus specific discussions. But looking at the current implementations it looks like they are quite generic and not necessarily Prometheus specific (which I like). So this would also fit on the Roadmap, but then probably the title should be adjusted to make it a generic metrics roadmap discussion.

@WeiZhang555
Copy link
Contributor

In #23188, I think this is already answered. The daemon is doing the collecting periodly(every 1 second), even you disabled the CPU Percentage calculation (means don't wait the PreCPUStat), every API call still needs about 1s.

For many containers and many API calls, I think #25361 should help a lot with a "stating all containers in one http API call".

@ruflin
Copy link
Author

ruflin commented Nov 29, 2016

@WeiZhang555 Instead of waiting 1s, just the current one in memory could be returned. I haven't checked the code yet for the daemon, but I would assume this could also be fetched on request time instead of fetching it every second which would also eliminate the wait time.

I see docker stats and fetching data from the api as two different use cases. For docker stats some prefetching and calculations are needed, as this is what the end user will consume directly. The API is consumed by stats collectors which can do additional processing with the data.

@brian-brazil
Copy link

Just a quick note to add that Prometheus would also be looking for the raw data, and there are parsers available to read our data format.

@ruflin
Copy link
Author

ruflin commented Dec 12, 2016

@brian-brazil Can you elaborate on that? Do you mean that docker would by default expose the "raw" metric data in "prometheus" format?

@cpuguy83
Copy link
Member

@ruflin If we can reduce the load caused by stats collection I could definitely see that as an option at least. Unfortunately right now performance for stats collection is unacceptable for more than just a few containers.... maybe it could be exposed as an option regardless and leave it up to the admin to decide if they want it on or off.

In 1.13 dockerd will start exposing a bunch of it's own internal runtime metrics (not container related) for prometheus.

@ruflin
Copy link
Author

ruflin commented Dec 13, 2016

@cpuguy83 I assume the load caused by stats collection is more the amount of stats than can be collected from each container and less about the format. Do the stats also cause load if no collection happens from an external tool? If yes, agree an option to turn it off would be nice.

Will the stats exposed in 1.13 from dockerd also be in JSON format? Any PR or Issue to have a look at? Looking forward to potentially also add these metrics to metricbeat :-)

@cpuguy83
Copy link
Member

@ruflin it's a combination of the way runc collects stats and how docker collects stats from runc (polling).. the format probably doesn't help (goes through multiple transitions), but is probably the least of things to worry about.
It's being worked on in runc.

The internal docker daemon metrics (these are not container stats) are exposed specifically for prometheus.

@ruflin
Copy link
Author

ruflin commented Dec 15, 2016

@cpuguy83 Specifically for prometheus means in this format here? https://prometheus.io/docs/instrumenting/exposition_formats/ Any issue / pr I could have a look at?

@garnier-quentin
Copy link

I'm proposing to add a flag/param to the docker stats API request that skips average calculations and return only raw values instead. This will speed up the API request and will potentially also reduce the load on the docker service as the request is open for a much shorter time.

I'm 100% agreed. I develop a plugin for centreon/icinga/shinken/... systems. And i only need raw values. It's my plugin which manage counters (like for SNMP counters) to have average.
Moreover, it will also be great to not wait for the newest value. It will be great to give the current daemon value have. It's like SNMP or other systems.

@ruflin
Copy link
Author

ruflin commented Jul 4, 2017

As an update, #25361 got closed without merging which makes this even more relevant. I didn't follow closely what happened on the stats side since then. Does anyone know more details?

@cpuguy83
Copy link
Member

cpuguy83 commented Jul 4, 2017

@ruflin Stats are fully integrated into containerd 1.0 and are much faster. Once containerd 1.0 is ready, this should be drastically improved.

@ruflin
Copy link
Author

ruflin commented Jul 6, 2017

@cpuguy83 Thanks for the heads up. Will this mean the API will change? If yes, will there be 2 API's for some time?

@garnier-quentin
Copy link

An option could be enough no ? Like: url?stream=false&type=raw

@cpuguy83
Copy link
Member

cpuguy83 commented Jul 6, 2017

@ruflin There would not be an API change immediately.

@thaJeztah
Copy link
Member

@cpuguy83 did we switch to the containerd-based stats already? (i.e., can this be closed?)

@cpuguy83
Copy link
Member

Yes it is using containerd to collect stats, which should be more performant now (or at least that's my understanding of certain changes), but I have not done any testing on this.

@ruflin
Copy link
Author

ruflin commented Apr 2, 2019

Any chance someone could point to the PR's where this was changed. What I'm most curious about is if the issue around frequent collection interval was solved and if we can access now raw values.

@thaJeztah
Copy link
Member

This would be part of the move to containerd 1.x, which started with #34895 (but there's been follow-ups)

@raags
Copy link

raags commented Dec 26, 2020

Seems like one-shot from here #40478 fixes this issue.

@cpuguy83
Copy link
Member

Yep, seems so. The stats api now supports a query param ("one-shot=true") to only do 1 collection.
We also did a a couple other things that sped up collection in 20.10 which were backported to 19.03. Later versions of 19.03 take just over 1s instead of just about 2s to collect.

20.10 with this bee param I've seen about 100ms to do a collection.

Thank!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.
Projects
None yet
Development

No branches or pull requests

8 participants