processes plugin: Add support for Linux Delay Accounting. #2598
It seems the STRERROR macro isn't used properly; this will error out on centos7, and I don't see where STRERRNO is defined. [edit: won't compile on a debian stretch for the same reasons.] OK, scratch that: it won't compile if the environment is 5.8.0, but works flawlessly in master.
Yeah, the |
This allows us to print helpful error messages to the user if something goes wrong.
src/processes.c
Outdated
"for the \"CollectDelayAccounting\" option."); | ||
#endif
} else {
ERROR("processes plugin: Option `%s' not allowed heeere.", c->key);
Please fix a typo here ;-)
We are watching for changes! ;-)
Good eye, thanks!
src/collectd.conf.pod
Outdated
Delay Accounting provides the time processes wait for the CPU to become
available, for I/O operations to finish, for pages to be swapped in and for
freed pages to be reclaimed. The metrics are reported as a percentage, e.g.
C<percent-delay-cpu>. Disabled by default.
Hi!
IMHO the documented type instance does not match the implementation: there will be no 'delay' word in it.
Can you please re-check this?
That is an amazing feature! It is much more useful than my attempt to add metrics of process/thread states (similar to the 'ps_state' metrics reported for the system-wide process list, but only for selected processes and their threads).
Also, I want to note: on my own systems I will report these metrics with a 'ps_delay' type, not 'percent'. Plugins may report different sets of metrics whose unit is percent but which relate to different datasets ("delays", "cpu usage", other ratios). I dislike 'percent' as a type in such cases.
This fixes a regression introduced in 17b81d4.
Agreed. What blows my mind is that
Please feel free to merge when you're happy – I don't have any pending changes on this.
Best regards,
cc @tokkee Hi! I have a minor note for you, related to this change.
I think |
Ok, if we leave reported type as a |
@rpv-tomsk Regarding the percent type: it has one serious shortcoming, namely that its maximum value is (a little over) 100. A process with five threads can be blocked for five seconds every second, i.e. 500%. There are two ways to fix this:
I suggest to introduce
P.S.: A third option would be to re-use another existing type, but there are no good choices.
Of course, I like |
But those fields report entirely different metrics …?
I'm unsure which one we should use. My thought was the following: suppose the process woke up and ran for 50 units of time before it went to sleep again. In the real world 100 units of time passed, so the process used 50% of the CPU. During the time it was awake it might have used all 50 units of CPU, or it might have been delayed for a while. For example, it could have spent 5 units of time waiting for I/O. In that case we would report 10% as 'io delay'. Without such a normalisation we would report only 5% as 'io delay'. I hoped that we could get the "50 units" value from
I think both variants might be acceptable, with the delay time normalized to the process's CPU usage or not...
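To make the two variants in the comment above concrete, here is a minimal sketch in C. The function names are hypothetical and do not come from the patch; the numbers in the usage note are the ones from the comment (100 units of wall clock time, 50 units awake, 5 units waiting for I/O).

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of the plugin. */

/* Variant 1: delay relative to the elapsed wall clock time. */
static double delay_pct_wall(double delay, double wall) {
  assert(wall > 0.0);
  return 100.0 * delay / wall;
}

/* Variant 2: delay relative to the time the process was awake. */
static double delay_pct_awake(double delay, double awake) {
  assert(awake > 0.0);
  return 100.0 * delay / awake;
}
```

With the example values, `delay_pct_wall(5.0, 100.0)` yields 5 and `delay_pct_awake(5.0, 50.0)` yields 10, which matches the 5% versus 10% 'io delay' described above.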
Yeah, I also see 140% as an IO wait value. :) I also missed that fact in the chart I posted as an example; it is present there too.
Each of these variants has its own advantages:
Agreed, let's go with the wall clock time for now. We can fiddle with the metrics some more in the future. |
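Linux Delay Accounting reports the time a task was delayed by:
- waiting for the CPU to become available,
- waiting for I/O operations to finish,
- waiting for pages to be swapped in, and
- waiting for freed pages to be reclaimed.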
This patch adds four metrics per configured process, one for each of the bullet points. Metrics are reported in percent rather than, for example, nanoseconds per second.
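As an illustration of what "percent rather than nanoseconds per second" means, the following sketch converts the growth of a cumulative delay counter into a percentage of wall clock time. It assumes the counter is a monotonically increasing nanosecond total sampled at a known interval; the function name and signature are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: converts the growth of a
 * cumulative delay counter (in nanoseconds) between two samples into a
 * percentage of the wall clock time that elapsed between those samples. */
static double delay_percent(uint64_t prev_delay_ns, uint64_t curr_delay_ns,
                            uint64_t interval_ns) {
  assert(interval_ns > 0);
  if (curr_delay_ns < prev_delay_ns) /* counter went backwards, e.g. restart */
    return 0.0;
  uint64_t delta_ns = curr_delay_ns - prev_delay_ns;
  return 100.0 * (double)delta_ns / (double)interval_ns;
}
```

A process delayed for 250 ms during a one-second interval would be reported as 25%. Because delays of all threads add up, a process with five threads that are each blocked for the full second accumulates 5 s of delay and comes out at 500%, which is the case discussed in the conversation about the `percent` type.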