[collectd 6] cpu plugin: Align metrics with OpenTelemetry recommendations. #4216

octo · 2023-12-29T09:06:22Z

Add metric description and unit.
Update label names (e.g. "cpu" → "system.cpu.logical_number")
Divide rates by the number of CPUs, so that the sum of all rates equals 1. (Previously the sum of all rates was equal to the number of logical CPUs.)
Remove the cpu=total label when aggregating CPUs.
Re-implement the aggregation logic, because the old aggregation logic was not testable and the required changes were quite complex. The new aggregation logic encapsulates all state keeping in a new type, usage_t, and unit tests ensure that it functions as expected.
Scale the "usage" counter so that it counts microseconds spent in each state. This differs from what OpenTelemetry wants, which is a floating point counter type. collectd doesn't have that. Scaling counters requires converting the count to a rate and then back to a count, making this transformation quite hard to implement (see previous point).
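
To illustrate the rate normalization described above, here is a minimal sketch. The helper names `normalize_rate` and `sum_of_ratios` are hypothetical and are not taken from the plugin; the sketch only assumes that each per-state rate has already been summed over all logical CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the plugin's API: turns a per-state rate that was
 * summed over all logical CPUs into a fraction of the total CPU capacity. */
static double normalize_rate(double summed_rate, size_t cpu_num) {
  assert(cpu_num > 0);
  return summed_rate / (double)cpu_num;
}

/* Adds up the normalized ratios of all states. With complete data the
 * result is 1 instead of the number of logical CPUs. */
static double sum_of_ratios(const double *summed_rates, size_t states_num,
                            size_t cpu_num) {
  double sum = 0.0;
  for (size_t i = 0; i < states_num; i++) {
    sum += normalize_rate(summed_rates[i], cpu_num);
  }
  return sum;
}
```

For example, on a machine with 4 logical CPUs and summed rates of 3.0 (idle), 0.5 (user) and 0.5 (system), the normalized ratios are 0.75, 0.125 and 0.125.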

ChangeLog: CPU plugin: The metric schema has been aligned with OpenTelemetry semantic conventions.

eero-t · 2024-01-08T18:16:44Z

Solaris builds show quite a few warnings (not completely overlapping):

And one of them fails to build:

  CC       src/cpu_test.o
"src/cpu.c", line 110: cannot find include file: <statgrab.h>

octo · 2024-01-10T19:06:19Z

>   CC       src/cpu_test.o
> "src/cpu.c", line 110: cannot find include file: <statgrab.h>

Thanks for pointing this out. Fixed.

eero-t

Bug in config_keys[] needs to be fixed.

Also, does the CPU plugin need to handle run-time CPU hot-plugging? If not, why is states a dynamically resized 1D [cpu*states] array [1] instead of a 2D [cpu][state] array allocated at init?

[1] I'd rather avoid things like `cpu_num = u->states_num / STATE_MAX` and `active_index = (cpu * STATE_MAX) + STATE_ACTIVE` if hot-plug support is not needed, as they complicate the code significantly.
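
The index arithmetic in question can be sketched as follows. The names `STATE_MAX`, `STATE_ACTIVE` and `states_num` come from the comment above; the shortened state list and the helpers `flat_index` and `flat_cpu_num` are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, shortened state list; the plugin defines more states. */
enum { STATE_USER, STATE_SYSTEM, STATE_IDLE, STATE_ACTIVE, STATE_MAX };

/* Flattened layout: one growable array with STATE_MAX slots per CPU.
 * Every access has to compute the slot from the (cpu, state) pair. */
static size_t flat_index(size_t cpu, size_t state) {
  assert(state < STATE_MAX);
  return (cpu * STATE_MAX) + state;
}

/* With the flattened layout the CPU count is derived from the array length
 * rather than stored explicitly. */
static size_t flat_cpu_num(size_t states_num) {
  return states_num / STATE_MAX;
}
```

With a 2D `[cpu][state]` array the same slot would simply be addressed as `rate[cpu][STATE_ACTIVE]`, at the cost of having to know the number of CPUs when the array is allocated.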

src/cpu_test.c

src/collectd.conf.in

src/collectd.conf.pod

src/cpu.c

octo · 2024-01-19T20:24:12Z

Thanks for the review @eero-t. I think I have addressed all your points.

eero-t

The review is fairly superficial, but as there are tests, I can approve once the invalid comment is fixed (not setting this as approved yet, because I do not trust the automerge bot).

src/cpu.c

eero-t

Approved, but apparently the updated comment needs clang-formatting.

* Add metric description and unit.
* Update label names (e.g. "cpu" → "system.cpu.logical_number")
* Divide rates by number of CPUs, so that the sum of all rates equals to 1. (Previously the sum of all rates was equal to the number of logical CPUs)
* Remove the "cpu=total" label when aggregating CPUs.

* The options `ReportUsage`, `ReportUtilization`, and `ReportNumCpu` control which metrics are emitted on a high level. Other options no longer influence *what* is being collected. This also allows reporting usage and utilization metrics simultaneously.
* The documentation has been updated to reflect that the plugin no longer emits a percentage, but a ratio for utilization metrics.
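
As a rough sketch of how three independent switches allow usage and utilization to be reported at the same time: only the option names are taken from the commit message; the `report_flags_t` type and the `report_flags_set` function are hypothetical and do not reflect the plugin's actual config callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical container for the three high-level switches. */
typedef struct {
  bool report_usage;
  bool report_utilization;
  bool report_num_cpu;
} report_flags_t;

/* Sets the flag that matches `key` (compared case-insensitively).
 * Returns 0 on success and -1 for NULL arguments or an unknown key. */
static int report_flags_set(report_flags_t *flags, const char *key,
                            bool value) {
  if (flags == NULL || key == NULL)
    return -1;

  if (strcasecmp(key, "ReportUsage") == 0)
    flags->report_usage = value;
  else if (strcasecmp(key, "ReportUtilization") == 0)
    flags->report_utilization = value;
  else if (strcasecmp(key, "ReportNumCpu") == 0)
    flags->report_num_cpu = value;
  else
    return -1;

  return 0;
}
```

Because each option maps to its own flag, enabling `ReportUsage` does not disable `ReportUtilization`, and vice versa.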

…ount`. The functionality is tested in the test cases for `usage_ratio` and `usage_count` and there is no need to test these separately. The opposite: the rest of the CPU plugin only uses `usage_ratio` and `usage_count`, so testing the global variants leaks abstraction.

octo · 2024-01-22T15:08:22Z

> Approved, but apparently the updated comment needs clang-formatting.

Fixed. Thanks @eero-t!

octo requested a review from a team as a code owner December 29, 2023 09:06

octo mentioned this pull request Dec 29, 2023

[collectd 6] Populate the unit field in several important plugins. #4211

Closed

collectd-bot added this to the 6.0 milestone Dec 29, 2023

octo added this to In progress in collectd 6 via automation Dec 29, 2023

octo modified the milestones: 6.0, 6 MVP Jan 3, 2024

octo force-pushed the 6/cpu branch 2 times, most recently from 000d934 to 0d3380e Compare January 4, 2024 05:59

octo mentioned this pull request Jan 4, 2024

[collectd 6] RFC: Automatic release process #4222

Open

octo requested a review from a team as a code owner January 8, 2024 16:36

octo force-pushed the 6/cpu branch from 80c0f35 to 5cd1bc2 Compare January 10, 2024 19:06

octo added the Feature and Automerge labels Jan 11, 2024

octo force-pushed the 6/cpu branch from 5cd1bc2 to f5e4147 Compare January 15, 2024 19:16

eero-t requested changes Jan 19, 2024

eero-t reviewed Jan 22, 2024

eero-t approved these changes Jan 22, 2024

octo force-pushed the 6/cpu branch from 92a51c2 to d2d05f0 Compare January 22, 2024 15:07

octo added 9 commits January 22, 2024 16:07

cpu plugin: Use constants for label names.

8f12d2c

cpu plugin: Remove overly verbose prefix from "state" constants.

875ee28

cpu plugin: Add a very simple usage_t type for aggregation.

f91837f

cpu plugin: Ensure unpopulated states return NAN.

6597f39

cpu plugin: Aggregate all non-idle states into the "active" state.

04cb862

cpu plugin: Add usage_global_rate.

e92b2af

cpu plugin: Implement usage_ratio.

23a1dd5

octo and others added 25 commits January 22, 2024 16:07

cpu plugin: Implement usage_global_ratio().

ed7b830

cpu plugin: Move aggregation into a central finalize() function.

65decca

cpu plugin: Implement usage_count().

d9643d9

This one is done in a second aggregation loop, because we require a CPU-level aggregate rate to be available to properly scale the counter.
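
One possible reading of this scaling step, as a hedged sketch rather than the code from this commit: the time a CPU spent in a state during an interval is the state's share of the CPU-level aggregate rate, multiplied by the interval length. The `usec_counter_t` type, the function name and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator for microseconds spent in one state. */
typedef struct {
  uint64_t usec;
} usec_counter_t;

/* Adds the time spent in a state during one interval.
 * state_rate and cpu_total_rate are in the platform's native unit per second
 * (e.g. ticks/s); only their quotient matters. The CPU-level total must be
 * known first, which is why a second pass over the data is needed. */
static void usec_counter_add(usec_counter_t *c, double state_rate,
                             double cpu_total_rate, double interval_sec) {
  assert(c != NULL);
  if (state_rate < 0.0 || cpu_total_rate <= 0.0 || interval_sec <= 0.0)
    return; /* no usable data for this interval */

  double ratio = state_rate / cpu_total_rate;
  /* Round to the nearest microsecond before accumulating. */
  c->usec += (uint64_t)(ratio * interval_sec * 1e6 + 0.5);
}
```

For instance, a state that accounts for 25 of 100 ticks per second over a 10 second interval adds 2,500,000 microseconds to the counter.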

cpu plugin: Implement usage_global_count().

97c4b87

cpu plugin: Move type definitions close together.

fa64140

cpu plugin: Update code to use the usage_t type.

d45c99f

cpu plugin: Add a cpu_num field to usage_t.

92ccbac

cpu plugin: Use the cpu_num field in usage_t.

080d1f9

cpu plugin: Skip states that don't have any data.

0d1c86b

cpu plugin: Link the unit test with libstatgrab, if so configured.

5d377ab

cpu plugin: Remove unused variable.

a7a03ca

cpu plugin: Fold testing for CPU_ALL into the tests for `usage_rati…

efc37d4

…o` and `usage_count`.

cpu plugin: Rename has_value to has_rate.

51253c8

This way the use of the field is much easier to understand when reading the code.

cpu plugin: Fold the usage_global_count function into usage_count.

e4efd4c

cpu plugin: Remove the special "global" case from usage_ratio.

83a7070

This will just work transparently.

cpu plugin: Fold usage_global_rate into usage_rate.

b04f6af

cpu plugin: Use a const instead of repeating a literal multiple times.

2913513

cpu plugin: Apply code review suggestions.

bb04104

Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>

cpu plugin: Add "ReportUsage" to config_keys.

7ca9e89

Also sort `config_keys`.

cpu plugin: Sort CPU states alphabetically.

41081be

cpu plugi: Consistently set the finalized field at the end of the f…

98981a5

…unction.

cpu plugin: Improve the description of the system.cpu.logical.count…

f3fe572

… metric.

cpu plugin: Improve error checking; define and use USAGE_UNAVAILABLE.

b30deea

cpu plugin: Fix function comment.

d591c43

Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>

octo force-pushed the 6/cpu branch from d2d05f0 to d591c43 Compare January 22, 2024 15:08

octo merged commit 2f47e54 into collectd:collectd-6.0 Jan 22, 2024
27 checks passed

collectd 6 automation moved this from In progress to Done Jan 22, 2024

octo deleted the 6/cpu branch January 22, 2024 17:29
