[collectd 6] Sysman plugin improvements #4109

Merged: 6 commits, May 5, 2023

eero-t

@eero-t eero-t commented Apr 20, 2023

ChangeLog: gpu_sysman plugin: Misc improvements

Improvements:

  • Much improved logging of enabled/disabled metrics [1]
  • Log device (memory) ECC state at start
  • Add memory types from L0 v1.3 spec
  • Fix error value for power limit failure log message

[1] The plugin supports a large number of Sysman-provided metrics. It tries each metric, logs a failure for each missing one, and disables further queries for it. However, it can be hard to determine which metrics actually end up enabled, especially on older integrated GPUs, which provide data for only a couple of metrics.

After this change, the plugin logs a list of the enabled (and disabled) metrics at the end of a query round if the set of enabled metrics changed (which should happen only a few times at startup).
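
The "report only when the enabled set changed" behavior described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from src/gpu_sysman.c: the metric flags, `list_metrics()`, and `report_if_changed()` are hypothetical names, and the report is written to a buffer instead of collectd's logging macros so that it is easy to inspect.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical metric flags; the real plugin's names and metric set differ. */
enum {
  METRIC_FREQ = 1u << 0,
  METRIC_MEM = 1u << 1,
  METRIC_POWER = 1u << 2,
  METRIC_TEMP = 1u << 3,
  METRIC_ALL = (1u << 4) - 1,
};

/* Names indexed by bit position of the flags above. */
static const char *const metric_names[] = {"frequency", "memory", "power",
                                           "temperature"};
#define METRIC_COUNT (sizeof(metric_names) / sizeof(metric_names[0]))

/* Write one "- name" line per metric selected by 'mask' into 'buf',
 * or "- none" if the mask selects nothing. Returns the number listed. */
static unsigned list_metrics(unsigned mask, char *buf, size_t size) {
  unsigned count = 0;
  size_t used = 0;

  if (size == 0)
    return 0;
  buf[0] = '\0';
  for (size_t i = 0; i < METRIC_COUNT; i++) {
    if (!(mask & (1u << i)))
      continue;
    int n = snprintf(buf + used, size - used, "- %s\n", metric_names[i]);
    if (n < 0 || (size_t)n >= size - used)
      break; /* out of space: stop, the list is truncated */
    used += (size_t)n;
    count++;
  }
  if (count == 0)
    snprintf(buf, size, "- none\n");
  return count;
}

/* Build a report only if 'enabled' differs from what was reported last.
 * Returns true when a new report was written to 'out'. */
static bool report_if_changed(unsigned enabled, unsigned *last_reported,
                              char *out, size_t size) {
  char disabled_list[128], enabled_list[128];

  if (enabled == *last_reported)
    return false; /* nothing changed since the previous round */
  *last_reported = enabled;

  list_metrics(~enabled & METRIC_ALL, disabled_list, sizeof(disabled_list));
  list_metrics(enabled & METRIC_ALL, enabled_list, sizeof(enabled_list));
  /* Both headers are emitted unconditionally; only the lists vary. */
  snprintf(out, size, "Disabled metrics:\n%sEnabled metrics:\n%s",
           disabled_list, enabled_list);
  return true;
}
```

In this sketch a caller would initialize `last_reported` to a value no valid mask can take (for example `~0u`), so the first query round always produces a report and later rounds stay silent until a metric gets disabled.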

Added in L0 spec v1.3.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

Added in L0 spec v1.4.

Requires loader 1.8.0 version released in May 2022.

(With minor cleanup comments from Alexey applied.)

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

- Move enabled/disabled metric reporting to a separate function
- Report metrics enabling and metric details enabling separately
- Error if all metrics are disabled, regardless of detail options
- Explicitly log what metrics are still being reported if any of
  them were disabled at run-time

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

Fixes: 55a9296

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

@eero-t eero-t requested a review from a team as a code owner April 20, 2023 11:24
@collectd-bot collectd-bot added this to the 6.0 milestone Apr 20, 2023

if (!disabled) {
INFO("- none");
}
INFO("Enabled metrics:");
Member

Please move this out of this if block. The enabled metrics are printed unconditionally, and so should this line.

Contributor Author

All metrics printing is conditional on config.gpuinfo, but I moved it out anyway, as the code is clearer that way.

"none" could be logged erroneously when "DisableEngine" is set.

Add test code for that (it does not validate the log output, but
runs that and the "no metrics + logging" cases so that the output
can be checked manually).

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
@mrunge mrunge merged commit c740407 into collectd:collectd-6.0 May 5, 2023
20 of 26 checks passed
@eero-t eero-t deleted the sysman-improvements branch May 26, 2023 08:32
@eero-t eero-t added the Feature label Jan 15, 2024