Add CPU Entitlement gauge metric & Deprecate CPU Entitlement counter metric #897
Comments
@mkocher I think this change is a must and a great enhancement. It doesn't make sense to show the app's CPU usage as a percentage of the CPU available to the whole VM. It would be much better to show how much of the CPU the app is entitled to is being used at the moment. Here is an example from an app running in one of our foundations, in which the difference can be clearly seen:
cf app cf-app-monitoring
Showing health and status for app cf-app-monitoring in org <redacted> / space <redacted> as <redacted>...
name: cf-app-monitoring
requested state: started
routes: cf-app-monitoring.<redacted>
last uploaded: Thu 23 Mar 13:47:59 UTC 2023
stack: cflinuxfs4
buildpacks:
     name                   version   detect output   buildpack name
     staticfile_buildpack   1.6.0     staticfile      staticfile
type: web
sidecars:
instances: 2/2
memory usage: 64M
     state     since                  cpu    memory         disk         logging            details
#0   running   2024-01-12T17:01:09Z   1.1%   15.6M of 64M   5.2M of 1G   0/s of unlimited
#1   running   2024-01-12T17:43:15Z   1.1%   15.7M of 64M   5.2M of 1G   0/s of unlimited
cf cpu-entitlement cf-app-monitoring
Note: This plugin is experimental.
Showing CPU usage against entitlement for app cf-app-monitoring in org <redacted> / space <redacted> as <redacted>...
     avg usage   curr usage
#0   55.98%      54.78%
#1   58.97%      57.16%
WARNING: Instance #0 was over entitlement from 2024-01-12 17:01:11 to 2024-01-12 17:01:26
WARNING: Instance #1 was over entitlement from 2024-01-12 17:43:23 to 2024-01-12 17:44:23
We should be careful when rolling out this change, as it would be a breaking change if we stop emitting the current metric by default. We should be loud when announcing this and provide ops files in cf-deployment for activating the new metric and switching the configuration.
👍 Glad to hear you're in favor. Agreed, we need to make this backwards compatible, though I'd prefer to turn off the old metrics by default sooner rather than later. I don't think many people look at them, and container metrics generate a ton of individual time series, which can put a burden on some metric stores.
We searched App Autoscaler Release for absolute_entitlement and absolute_usage and got no results, so we think this is safe from that perspective.
Dear @mkocher, @chombium, as far as I remember, the 'cf app' metric shows the percentage the container is currently using of a single CPU core, not of the entire host VM's CPU. That is, in @chombium's example above, the app is currently consuming 1.1% of a single host CPU core. On our CF deployments we allow CPU bursting; when an application uses more CPU, we have seen this metric spike to several hundred percent. At 300%, for example, the container is consuming 3 of the CPU cores available on the host. In general, the maximum value this metric can reach is (100*N)%, where N is the number of CPU cores on the host VM. This metric is an easy way to see whether the application is currently bursting when debugging. The CPUEntitlement metric is a really good one, but it has different semantics: it shows where the container's average/current CPU consumption stands relative to what it is entitled to. Also keep in mind that the first metric comes for free, while for 'cf cpu' you need to install a cf CLI plugin.
Yep, the current CPU metric is out of 100*NumberOfCores, not 100. I'm not sure why we do that as an industry, but it is the convention. Apps, however, aren't allocated cores; they're allocated shares. So using more than 100% doesn't indicate one way or the other whether the app is bursting.
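As a rough illustration of the two scales discussed above, the sketch below contrasts the per-core percentage (bounded by 100*N) with a percentage of entitlement. The function names and the `entitled_percent_of_core` parameter are hypothetical; how an entitlement is derived from CPU shares is not shown in this thread, so it is simply passed in as an assumed input.

```python
def max_cpu_percent(core_count: int) -> int:
    """Upper bound of the per-core CPU metric on a host with `core_count` cores."""
    return 100 * core_count


def percent_of_entitlement(cpu_percent: float, entitled_percent_of_core: float) -> float:
    """Express a per-core CPU percentage relative to an assumed entitlement.

    `entitled_percent_of_core` is a hypothetical input: the container's
    entitlement expressed on the same per-core scale as `cpu_percent`.
    """
    if entitled_percent_of_core <= 0:
        raise ValueError("entitlement must be positive")
    return 100.0 * cpu_percent / entitled_percent_of_core
```

Under these assumptions, a container at 150% of a core with an entitlement worth 400% of a core is at 37.5% of its entitlement, so it is not bursting even though the per-core metric is above 100%. Conversely, a reading as low as 1.1% of a core can be more than half of a small entitlement.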
@mkocher CI is failing with rep-spec windows diversion. Should this be applied to rep-windows too?
Oops. It has been 0️⃣ days since we forgot about Windows. As far as we can tell this should be applied verbatim to Windows as well. We'll take a look.
#901 fixes the Windows issue. It also makes Diego releasable again, as it makes the change non-breaking.
Will the official container metrics documentation still be updated?
Summary
Since the beginning of time, Cloud Foundry has had a CPU metric representing the percentage of the entire host VM's CPU that a container is using. This number does not reflect the fact that the container shares the host with a bunch of other containers.
A while ago, the AbsoluteCPUEntitlement and AbsoluteCPUUsage metrics were added. These allow astute users to calculate the percentage of their entitlement being used. A CPU Entitlement plugin was produced so that users could see this metric.
Upon seeing this metric, users found it valuable and wanted to see it in more places and do more with it. When evaluating how to expose it in the Cloud Controller API, we realized that while the counter has some advantages, it requires substantial calculations within Cloud Controller and at least two metric envelopes to calculate the delta.
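The delta calculation described above can be sketched roughly as follows. This is a minimal illustration, not Cloud Controller's or the plugin's actual code: the `CounterSample` type is hypothetical, its field names are borrowed from the absolute_usage and absolute_entitlement names mentioned in the comments, and the nanosecond unit is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Cumulative counters taken from one metric envelope (hypothetical shape)."""
    absolute_usage: int        # cumulative CPU time used (assumed nanoseconds)
    absolute_entitlement: int  # cumulative CPU time entitled (assumed nanoseconds)


def entitlement_usage_percent(previous: CounterSample, current: CounterSample) -> float:
    """Percentage of entitlement used between two counter samples."""
    usage_delta = current.absolute_usage - previous.absolute_usage
    entitlement_delta = current.absolute_entitlement - previous.absolute_entitlement
    if entitlement_delta <= 0:
        # Without a positive entitlement delta there is nothing to divide by,
        # e.g. the same envelope was passed twice or a counter was reset.
        raise ValueError("entitlement counter did not advance between samples")
    return 100.0 * usage_delta / entitlement_delta
```

The consumer has to hold on to the previous envelope before it can report anything, which is the cost the counter approach imposes. A gauge that already carries the percentage could be read from a single envelope.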
Having validated the value and found problems with the approach, we'd like to:
Diego repo
Describe alternatives you've considered (optional)