Docker stats without CPU precentage calculation #28881

ruflin · 2016-11-28T14:02:28Z

When using the docker stats api currently each request to the API will take up to 2 seconds. As explained in #23188 the docker API needs two measurement points to calculate an average.

For stats collectors like Metricbeat it would be useful to get the raw cpu values and either do the calculations on the collector side or directly when viewing the data on the storage system (in our case Elasticsearch).

I'm proposing to add a flag/param to the docker stats API request that skips average calculations and return only raw values instead. This will speed up the API request and will potentially also reduce the load on the docker service as the request is open for a much shorter time.

Collecting stats for a large number of containers has currently to be done one by one which leads to lots of open http requests and is quite slow because of the above issue. This will potentially be solved with #25361 by @WeiZhang555 but also as part of this bulk request it would be nice to have the flag to disable calculations.

thaJeztah · 2016-11-28T14:45:04Z

I wonder if this would be more suitable for the metrics roadmap; #27307

ruflin · 2016-11-28T15:02:42Z

@thaJeztah I was looking at #27307 and the reason I did not add it there is because of the Prometheus specific discussions. But looking at the current implementations it looks like they are quite generic and not necessarily Prometheus specific (which I like). So this would also fit on the Roadmap, but then probably the title should be adjusted to make it a generic metrics roadmap discussion.

WeiZhang555 · 2016-11-29T02:48:03Z

In #23188, I think this is already answered. The daemon is doing the collecting periodly(every 1 second), even you disabled the CPU Percentage calculation (means don't wait the PreCPUStat), every API call still needs about 1s.

For many containers and many API calls, I think #25361 should help a lot with a "stating all containers in one http API call".

ruflin · 2016-11-29T20:53:32Z

@WeiZhang555 Instead of waiting 1s, just the current one in memory could be returned. I haven't checked the code yet for the daemon, but I would assume this could also be fetched on request time instead of fetching it every second which would also eliminate the wait time.

I see docker stats and fetching data from the api as two different use cases. For docker stats some prefetching and calculations are needed, as this is what the end user will consume directly. The API is consumed by stats collectors which can do additional processing with the data.

brian-brazil · 2016-12-12T10:01:05Z

Just a quick note to add that Prometheus would also be looking for the raw data, and there are parsers available to read our data format.

ruflin · 2016-12-12T11:37:37Z

@brian-brazil Can you elaborate on that? Do you mean that docker would by default expose the "raw" metric data in "prometheus" format?

cpuguy83 · 2016-12-12T13:33:22Z

@ruflin If we can reduce the load caused by stats collection I could definitely see that as an option at least. Unfortunately right now performance for stats collection is unacceptable for more than just a few containers.... maybe it could be exposed as an option regardless and leave it up to the admin to decide if they want it on or off.

In 1.13 dockerd will start exposing a bunch of it's own internal runtime metrics (not container related) for prometheus.

ruflin · 2016-12-13T07:57:26Z

@cpuguy83 I assume the load caused by stats collection is more the amount of stats than can be collected from each container and less about the format. Do the stats also cause load if no collection happens from an external tool? If yes, agree an option to turn it off would be nice.

Will the stats exposed in 1.13 from dockerd also be in JSON format? Any PR or Issue to have a look at? Looking forward to potentially also add these metrics to metricbeat :-)

cpuguy83 · 2016-12-13T15:21:38Z

@ruflin it's a combination of the way runc collects stats and how docker collects stats from runc (polling).. the format probably doesn't help (goes through multiple transitions), but is probably the least of things to worry about.
It's being worked on in runc.

The internal docker daemon metrics (these are not container stats) are exposed specifically for prometheus.

ruflin · 2016-12-15T06:25:36Z

@cpuguy83 Specifically for prometheus means in this format here? https://prometheus.io/docs/instrumenting/exposition_formats/ Any issue / pr I could have a look at?

garnier-quentin · 2017-06-30T14:39:11Z

I'm proposing to add a flag/param to the docker stats API request that skips average calculations and return only raw values instead. This will speed up the API request and will potentially also reduce the load on the docker service as the request is open for a much shorter time.

I'm 100% agreed. I develop a plugin for centreon/icinga/shinken/... systems. And i only need raw values. It's my plugin which manage counters (like for SNMP counters) to have average.
Moreover, it will also be great to not wait for the newest value. It will be great to give the current daemon value have. It's like SNMP or other systems.

ruflin · 2017-07-04T07:20:21Z

As an update, #25361 got closed without merging which makes this even more relevant. I didn't follow closely what happened on the stats side since then. Does anyone know more details?

cpuguy83 · 2017-07-04T11:18:10Z

@ruflin Stats are fully integrated into containerd 1.0 and are much faster. Once containerd 1.0 is ready, this should be drastically improved.

ruflin · 2017-07-06T07:18:31Z

@cpuguy83 Thanks for the heads up. Will this mean the API will change? If yes, will there be 2 API's for some time?

garnier-quentin · 2017-07-06T07:30:20Z

An option could be enough no ? Like: url?stream=false&type=raw

cpuguy83 · 2017-07-06T19:31:45Z

@ruflin There would not be an API change immediately.

thaJeztah · 2019-03-29T11:36:35Z

@cpuguy83 did we switch to the containerd-based stats already? (i.e., can this be closed?)

cpuguy83 · 2019-03-29T19:30:18Z

Yes it is using containerd to collect stats, which should be more performant now (or at least that's my understanding of certain changes), but I have not done any testing on this.

ruflin · 2019-04-02T12:30:44Z

Any chance someone could point to the PR's where this was changed. What I'm most curious about is if the issue around frequent collection interval was solved and if we can access now raw values.

thaJeztah · 2019-04-02T12:42:50Z

This would be part of the move to containerd 1.x, which started with #34895 (but there's been follow-ups)

raags · 2020-12-26T14:35:45Z

Seems like one-shot from here #40478 fixes this issue.

cpuguy83 · 2020-12-26T15:28:12Z

Yep, seems so. The stats api now supports a query param ("one-shot=true") to only do 1 collection.
We also did a a couple other things that sped up collection in 20.10 which were backported to 19.03. Later versions of 19.03 take just over 1s instead of just about 2s to collect.

20.10 with this bee param I've seen about 100ms to do a collection.

Thank!

GordonTheTurtle added the area/swarm label Nov 28, 2016

thaJeztah added area/api kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. and removed area/swarm labels Nov 28, 2016

garnier-quentin mentioned this issue Dec 16, 2018

[cloud::docker::restapi::plugin]CPU load average question centreon/centreon-plugins#1120

Closed

cpuguy83 closed this as completed Dec 26, 2020

jsoriano mentioned this issue Jan 4, 2021

Leverage one-shot container metrics in Docker metricsets elastic/beats#23345

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Docker stats without CPU precentage calculation #28881

Docker stats without CPU precentage calculation #28881

ruflin commented Nov 28, 2016

thaJeztah commented Nov 28, 2016

ruflin commented Nov 28, 2016

WeiZhang555 commented Nov 29, 2016

ruflin commented Nov 29, 2016

brian-brazil commented Dec 12, 2016

ruflin commented Dec 12, 2016

cpuguy83 commented Dec 12, 2016

ruflin commented Dec 13, 2016

cpuguy83 commented Dec 13, 2016

ruflin commented Dec 15, 2016

garnier-quentin commented Jun 30, 2017

ruflin commented Jul 4, 2017

cpuguy83 commented Jul 4, 2017

ruflin commented Jul 6, 2017

garnier-quentin commented Jul 6, 2017

cpuguy83 commented Jul 6, 2017

thaJeztah commented Mar 29, 2019

cpuguy83 commented Mar 29, 2019

ruflin commented Apr 2, 2019

thaJeztah commented Apr 2, 2019

raags commented Dec 26, 2020

cpuguy83 commented Dec 26, 2020

Docker stats without CPU precentage calculation #28881

Docker stats without CPU precentage calculation #28881

Comments

ruflin commented Nov 28, 2016

thaJeztah commented Nov 28, 2016

ruflin commented Nov 28, 2016

WeiZhang555 commented Nov 29, 2016

ruflin commented Nov 29, 2016

brian-brazil commented Dec 12, 2016

ruflin commented Dec 12, 2016

cpuguy83 commented Dec 12, 2016

ruflin commented Dec 13, 2016

cpuguy83 commented Dec 13, 2016

ruflin commented Dec 15, 2016

garnier-quentin commented Jun 30, 2017

ruflin commented Jul 4, 2017

cpuguy83 commented Jul 4, 2017

ruflin commented Jul 6, 2017

garnier-quentin commented Jul 6, 2017

cpuguy83 commented Jul 6, 2017

thaJeztah commented Mar 29, 2019

cpuguy83 commented Mar 29, 2019

ruflin commented Apr 2, 2019

thaJeztah commented Apr 2, 2019

raags commented Dec 26, 2020

cpuguy83 commented Dec 26, 2020