[collectd 6] disk plugin: Align metrics with OpenTelemetry recommendations. #4217

octo · 2023-12-29T09:08:11Z

Rename labels to system.device and disk.io.direction.
Rename system.disk.time to system.disk.operation_time.
Add descriptions and units to all metric families.
Add the "utilization" metric to FreeBSD.

ChangeLog: Disk plugin: The metric schema has been aligned with OpenTelemetry semantic conventions.

eero-t

In general code changes look OK, but:

I think it would be better to handle disk: add /proc/diskstats fields 15 to 20 in KERNEL_LINUX #4242 in main, and merge related main changes to disk.c before this PR.
Reporting (some) time units as seconds (s) is IMHO not good while collectd counters are integers i.e. do not support subsecond timings. Meaning that one does not see from counters very small usage spikes i.e. is disk completely idle or not.
Time manipulation for values appended to fam_ops_time is confusing as they can be multiplied or divided by different values, and variable names do not indicate their time units (like they IMHO should).

src/disk.c

octo · 2024-01-19T21:38:18Z

The v6 code is so different that I don't think resolving the resulting merge conflict is any easier than reimplementing the feature. Writing the metric families is more effort than plumbing the counters through.

Ok, fair enough.

I have changed all platforms to report microseconds, and changed the metric family unit to reflect that.
...
I have added appropriate suffixes to all variable and field names.

Thanks, it's much more readable now!

(I.e. only remaining item is odd fam_disk_io_time -> fam_ops_time change.)

src/disk.c

eero-t

Value wrap-up calculations in disk.c are broken, see: grep UINT_MAX disk.c

They deduct 64-bit (signed) derive_t value from 32-bit (unsigned) UINT_MAX value.

Maybe collectd should have DERIVE_T_MAX define for this (both in v5 & v6 branches)?

eero-t

Approved. I'll file separate bug about the UINT_MAX vs derive_t issue, as it's already in the v5 version of disk plugin, and not introduced by these changes.

octo · 2024-01-22T17:27:11Z

Value wrap-up calculations in disk.c are broken, see: grep UINT_MAX disk.c

They deduct 64-bit (signed) derive_t value from 32-bit (unsigned) UINT_MAX value.

Maybe collectd should have DERIVE_T_MAX define for this (both in v5 & v6 branches)?

That's an excellent callout, thank you! We actually have a function that is doing this right, counter_diff(). I ended up refactoring a lot of the Linux code; being one of the oldest pieces of code in collectd it was quite in need of a facelift.

After a lot of fiddling I figured out that we calculated the average time per operation … just for that variable never to be read. I think the existing metrics are much better at conveying useful information about the disk, so I remove the calculcation and made the plugin a good deal simpler.

We also unnecessarily stored a bunch of stuff in global state, which I have now removed.

eero-t · 2024-01-22T17:36:16Z

Gah, read Linux diskstats documentation, and counters use native unsigned long (see the new bug).

I.e. they are indeed 64-bit, but only on 64-bit host. For 32-bit hosts, UINT_MAX was actually the correct maximum value.

eero-t · 2024-01-22T17:54:21Z

Nice, counter_diff() already handles transparently unsigned 32-bit vs 64-bit wrap up => it will fix #4245 for v6.

The very best changes are ones that remove lots of code, and still manage retain the (relevant) functionality. :-)

On (very) quick check of the changes, they seemed OK, i.e. from my side this still seems fine for merging.

eero-t · 2024-01-22T18:06:44Z

As to adding tests for disk.c (in another PR)...

I think that could be done for the Linux functionality by extracting that from disk_read() to another function, one that takes path to /proc/diskstats as an argument. Test code can then ask it to read pre-prepared regular file(s) instead, to verify that reported metric updates are as expected, and to verify that invalid file content is handled gracefully.

(CPU plugin test coverage could be similarly extended to parsing of /proc/stats file. At that point some helper functions for dealing with temporary test files could be useful.)

octo · 2024-01-22T18:28:36Z

As to adding tests for disk.c (in another PR)...

I think we need to separate I/O and parsing. Maybe a schema along the lines of:

int parse(char const *data, metric_family_t families[]);

int read_callback(void) {
  char *buffer = read_text_file_contents();
  parse(buffer);
}

This will allow us to test parse() without using I/O.

We should also split plugins into multiple files, one for each operating system we support. E.g. disk.c, disk_linux.c, disk_freebsd.c etc. The current approach is hard to read and hard to maintain.

* Rename labels to `system.device` and `disk.io.direction`. * Rename `system.disk.time` to `system.disk.operation_time`. * Add descriptions and units to all metric families. * Add the "utilization" metric to FreeBSD.

Also detone the time scale in the variable names.

The fields do not mean what we thought they meant. "wtime" means "wait time", "rtime" means "run time". Fixes: collectd#3875 (for collectd 6)

Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>

…oseconds.

…`, and `system.disk.pending_operations` for Solaris.

* Use `counter_diff` to calculate counter differences. * Use `counter_t` as we actually want the counter overflow behavior for these metrics. * Remove the `has_...` fields from the global data struct. * Use `value_to_rate()` to calculate disk busyness. * Use `strtoull()` to parse counter values.

The result was unused.

octo requested a review from a team as a code owner December 29, 2023 09:08

octo mentioned this pull request Dec 29, 2023

[collectd 6] Populate the unit field in several important plugins. #4211

Closed

collectd-bot added this to the 6.0 milestone Dec 29, 2023

octo added this to In progress in collectd 6 via automation Dec 29, 2023

octo modified the milestones: 6.0, 6 MVP Jan 3, 2024

octo added Feature Automerge Labels PRs to be merged by a bot once approved labels Jan 11, 2024

octo force-pushed the 6/disk branch 2 times, most recently from f78bdb7 to 96a53a6 Compare January 15, 2024 19:25

eero-t reviewed Jan 19, 2024

View reviewed changes

src/disk.c Outdated Show resolved Hide resolved

src/disk.c Outdated Show resolved Hide resolved

src/disk.c Outdated Show resolved Hide resolved

src/disk.c Outdated Show resolved Hide resolved

eero-t reviewed Jan 22, 2024

View reviewed changes

src/disk.c Outdated Show resolved Hide resolved

eero-t requested changes Jan 22, 2024

View reviewed changes

eero-t approved these changes Jan 22, 2024

View reviewed changes

eero-t mentioned this pull request Jan 22, 2024

Incorrect diskstat value wrap up calculations for 64-bit platforms in disk plugin #4245

Open

octo and others added 9 commits January 22, 2024 19:28

disk plugin: Align metrics with OpenTelemetry recommendations.

aee7e30

* Rename labels to `system.device` and `disk.io.direction`. * Rename `system.disk.time` to `system.disk.operation_time`. * Add descriptions and units to all metric families. * Add the "utilization" metric to FreeBSD.

disk plugin: Fix incorrect type.

d143691

disk plugin: Use constants for the "direction" label values.

2a7ee39

disk plugin: Scale all time counters to microseconds.

273433d

Also detone the time scale in the variable names.

disk plugin: Remove ops_time metric for Solaris.

94fe89a

The fields do not mean what we thought they meant. "wtime" means "wait time", "rtime" means "run time". Fixes: collectd#3875 (for collectd 6)

disk plugin: Improve metric descriptions.

46ebb7f

Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>

disk plugin: Consistently scale the fam_disk_io_time metric to micr…

ee98604

…oseconds.

disk plugin: Add system.disk.io_time, `system.disk.weighted_io_time…

df5df58

…`, and `system.disk.pending_operations` for Solaris.

octo added 2 commits January 22, 2024 19:28

disk plugin: Remove the "avg time per op" calculation.

911f772

The result was unused.

disk plugin: Do submit metrics in the first iteration.

a02cb8f

octo force-pushed the 6/disk branch from de64d6d to a02cb8f Compare January 22, 2024 18:28

octo merged commit 008ef61 into collectd:collectd-6.0 Jan 22, 2024
27 checks passed

collectd 6 automation moved this from In progress to Done Jan 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[collectd 6] disk plugin: Align metrics with OpenTelemetry recommendations. #4217

[collectd 6] disk plugin: Align metrics with OpenTelemetry recommendations. #4217

octo commented Dec 29, 2023

eero-t left a comment •

edited

octo commented Jan 19, 2024 •

edited by eero-t

eero-t left a comment

eero-t left a comment

octo commented Jan 22, 2024

eero-t commented Jan 22, 2024

eero-t commented Jan 22, 2024

eero-t commented Jan 22, 2024

octo commented Jan 22, 2024

[collectd 6] disk plugin: Align metrics with OpenTelemetry recommendations. #4217

[collectd 6] disk plugin: Align metrics with OpenTelemetry recommendations. #4217

Conversation

octo commented Dec 29, 2023

eero-t left a comment • edited

Choose a reason for hiding this comment

octo commented Jan 19, 2024 • edited by eero-t

eero-t left a comment

Choose a reason for hiding this comment

eero-t left a comment

Choose a reason for hiding this comment

octo commented Jan 22, 2024

eero-t commented Jan 22, 2024

eero-t commented Jan 22, 2024

eero-t commented Jan 22, 2024

octo commented Jan 22, 2024

eero-t left a comment •

edited

octo commented Jan 19, 2024 •

edited by eero-t