Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[collectd 6] cpu plugin: Align metrics with OpenTelemetry recommendations. #4216

Merged
merged 35 commits into from
Jan 22, 2024

Conversation

octo
Copy link
Member

@octo octo commented Dec 29, 2023

  • Add metric description and unit.
  • Update label names (e.g. "cpu" → "system.cpu.logical_number")
  • Divide rates by number of CPUs, so that the sum of all rates equals to 1. (Previously the sum of all rates was equal to the number of logical CPUs)
  • Remove the cpu=total label when aggregating CPUs.
  • Re-implement the aggregation logic because the old aggregation logic was not testable and the required changes quite complex. The new aggregation logic encapsulated all state keeping in a new type, usage_t and unit tests ensure that it functions as expected.
  • Scale the "usage" counter so that it counts microseconds spent in each state. This differs from what OpenTelemetry wants, which is a floating point counter type. collectd doesn't have that. Scaling counters requires converting the count to a rate and then back to a count, making this transformation quite hard to implement (see previous point).

ChangeLog: CPU plugin: The metric schema has been aligned with OpenTelemetry semantic conventions.

@octo octo requested a review from a team as a code owner December 29, 2023 09:06
@collectd-bot collectd-bot added this to the 6.0 milestone Dec 29, 2023
@octo octo added this to In progress in collectd 6 via automation Dec 29, 2023
@octo octo modified the milestones: 6.0, 6 MVP Jan 3, 2024
@octo octo force-pushed the 6/cpu branch 2 times, most recently from 000d934 to 0d3380e Compare January 4, 2024 05:59
@octo octo requested a review from a team as a code owner January 8, 2024 16:36
@eero-t
Copy link
Contributor

eero-t commented Jan 8, 2024

Solaris builds show quite a few warnings (not completely overlapping):

And the other of them fails to:

  CC       src/cpu_test.o
"src/cpu.c", line 110: cannot find include file: <statgrab.h>

@octo
Copy link
Member Author

octo commented Jan 10, 2024

  CC       src/cpu_test.o
"src/cpu.c", line 110: cannot find include file: <statgrab.h>

Thanks for pointing this out. Fixed.

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in config_keys[] needs to be fixed.

Also, does CPU plugin need to handle run-time CPU hot-plugging, or why states is 1D dynamically resized [cpu*states] array [1], instead init-allocated 2D [cpu][state] array?

[1] I'd rather avoid things like cpu_num = u->states_num / STATE_MAX and active_index = (cpu * STATE_MAX) + STATE_ACTIVE if hot-plug support is not needed, as they complicate code significantly.

src/cpu_test.c Outdated Show resolved Hide resolved
src/collectd.conf.in Show resolved Hide resolved
src/collectd.conf.pod Outdated Show resolved Hide resolved
src/cpu.c Show resolved Hide resolved
src/cpu.c Show resolved Hide resolved
src/cpu.c Outdated Show resolved Hide resolved
src/cpu.c Outdated Show resolved Hide resolved
src/cpu.c Show resolved Hide resolved
src/cpu.c Outdated Show resolved Hide resolved
src/cpu.c Show resolved Hide resolved
@octo
Copy link
Member Author

octo commented Jan 19, 2024

Thanks for the review @eero-t. I think I have addressed all your points.

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review is fairly superficial, but as there are tests, I can approve once invalid comment is fixed (not setting this as approved yet, because I do not trust automerge bot).

src/cpu.c Outdated Show resolved Hide resolved
Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but apparently the updated comment needs clang-formatting.

* Add metric description and unit.
* Update label names (e.g. "cpu" → "system.cpu.logical_number")
* Divide rates by number of CPUs, so that the sum of all rates equals to 1.
  (Previously the sum of all rates was equal to the number of logical CPUs)
* Remove the "cpu=total" label when aggregating CPUs.
 *  The options `ReportUsage`, `ReportUtilization`, and `ReportNumCpu` control
    which metrics are emitted on a high level. Other options no longer influence
    *what* is being collected. This also allows to report usage and utilization
    metrics simultaneously.
 *  The documentation has been updated to reflect that the plugin no longer
    emits a percentage, but a ratio for utilization metrics.
octo and others added 25 commits January 22, 2024 16:07
This one is done in a second aggregation loop, because we require a CPU-level
aggregate rate to be available to properly scale the counter.
…ount`.

The functionality is tested in the test cases for `usage_ratio` and
`usage_count` and there is no need to test these separately. The opposite: the
rest of the CPU plugin only uses `usage_ratio` and `usage_count`, so testing
the global variants leaks abstraction.
This way the use of the field is much easier to understand when reading the
code.
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
@octo
Copy link
Member Author

octo commented Jan 22, 2024

Approved, but apparently the updated comment needs clang-formatting.

Fixed. Thanks @eero-t!

@octo octo merged commit 2f47e54 into collectd:collectd-6.0 Jan 22, 2024
27 checks passed
collectd 6 automation moved this from In progress to Done Jan 22, 2024
@octo octo deleted the 6/cpu branch January 22, 2024 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Automerge Labels PRs to be merged by a bot once approved Feature
Projects
collectd 6
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants