New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RDT monitoring plugin #1970
RDT monitoring plugin #1970
Conversation
The rdtmon plugin collects information provided by monitoring features of Intel Resource Director Technology (Intel(R) RDT) like Cache Monitoring Technology (CMT), Memory Bandwidth Monitoring (MBM). Change-Id: Ie45344c1035c522fcd918a1dd2427a2da2e173bb Signed-off-by: Serhiy Pshyk <serhiyx.pshyk@intel.com>
Change-Id: I94d5e7c877bd99a5da8e725efef0bd700f339016 Signed-off-by: Serhiy Pshyk <serhiyx.pshyk@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @serhiyx, thank you very much for your PR! Overall the code looks good, the review comments are mostly simple coding style issues that are easily fixable.
My biggest concern is the mbm type with its six data sources. We need to break this up into multiple "value lists" with a single value each. I've left a more detailed comment inline.
Also I'm unsure about the llc type. What is it representing? Number of bytes in the cache? Could the bytes type serve this purpose?
Best regards,
—octo
|
|
||
| struct rdtmon_core_group_s { | ||
| char *desc; | ||
| int num_cores; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: array indexes and sizes should be of type size_t.
| struct rdtmon_ctx_s { | ||
| rdtmon_core_group_t cgroups[RDTMON_MAX_CORES]; | ||
| struct pqos_mon_data *pgroups[RDTMON_MAX_CORES]; | ||
| int num_groups; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: array indexes and sizes should be of type size_t.
|
|
||
| if (ret != PQOS_RETVAL_OK) { | ||
| ERROR(RDTMON_PLUGIN ": Error starting monitoring (pqos status=%d)", ret); | ||
| return (-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning here means that only groups up to a failing group are collected. I think it would be more intuitive if following groups would be collected, too, i.e. if this was a continue instead of a return.
| DEBUG(RDTMON_PLUGIN ": rdtmon_shutdown."); | ||
|
|
||
| if (g_rdtmon == NULL) { | ||
| ERROR(RDTMON_PLUGIN ": rdtmon_shutdown: plugin not initialized."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please simply return zero without an error in this case. An error should already have been printed during initialization.
| plugin_dispatch_values(&vl); | ||
| } | ||
|
|
||
| static int rdtmon_read(user_data_t *ud) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the "unused" attribute to ud: __attribute_((unused)) user_data_t *ud
| cg->desc = desc; | ||
|
|
||
| for (int i = 0; i < num_cores; i++) | ||
| cg->cores[i] = (unsigned)cores[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(rdtmon_core_group_t*)->cores is of type unsigned, all other places appear to use uint64_t. Please change either the callers of this function or the struct to unify this.
| assert(item != NULL); | ||
|
|
||
| for (int j = 0; j < item->values_num; j++) { | ||
| if (item->values[j].value.string != NULL && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please invert this logic to make the loop more readable:
if ((item->values[j].value.string == NULL) || (strlen(item->values[j].value.string) == 0))
continue;
/* rest of for-loop */|
|
||
| static rdtmon_ctx_t *g_rdtmon = NULL; | ||
|
|
||
| static int isdup(const uint64_t *nums, unsigned size, uint64_t val) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use size_t for array sizes and indexes.
| */ | ||
| static unsigned strlisttonums(char *s, uint64_t *nums, unsigned max) { | ||
| int ret; | ||
| unsigned index = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use size_t for array sizes and indexes.
| ret = strtouint64(p + 1, &end); | ||
| if (ret < 0) | ||
| return (0); | ||
| if (start > end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 8-4 really a syntax you want to support? Please consider returning an error instead.
1. use size_t type for all arrays and indexes 2. change malloc()/memset() to calloc() 3. fix minor code style issues 4. add range validation of core id values 5. use 'bytes' type for LLC value 6. add 'memory_bandwidth' type for MBM values Change-Id: I5e577dcda19bc9799e7b79f9d0334c6f21b60f0d Signed-off-by: Serhiy Pshyk <serhiyx.pshyk@intel.com>
|
hi @octo, thanks for your feedback. I've updated PR with fixes for all review comments. |
|
Hi @serhiyx, thank you very much for your update! The code looks good to me. One suggestion: the "mon" in "rdtmon" doesn't add much information. Is this named after something that users of this technology are already familiar with, such as a command line tool exposing the same metrics? If not, what do you think about renaming this plugin to "intel_rdt"? Best regards, |
Change-Id: Id23eb96fd37e6d4fc5fdf7e7ed58d9e74a33cca0 Signed-off-by: Serhiy Pshyk <serhiyx.pshyk@intel.com>
|
Hi @octo , "intel_rdt" is good name for this plugin. Will implement the changes and update the review. Regards, |
|
Thanks @serhiyx! |
The rdtmon plugin collects information provided by monitoring features of Intel Resource Director Technology (Intel(R) RDT) like Cache Monitoring Technology (CMT), Memory Bandwidth Monitoring (MBM). CMT monitors last level cache occupancy (LLC). MBM supports two types of events reporting local and remote memory bandwidth. Local memory bandwidth (MBL) reports the bandwidth of accessing memory associated with the local socket. Remote memory bandwidth (MBR) reports the bandwidth of accessing the remote socket. Also this technology allows to monitor instructions per clock (IPC). All events are reported on a per core basis. Monitoring of the events can be configured for group of cores. Plugin uses PQoS library for Intel(R) RDT. This is collectd/collectd#1970 Change-Id: I3d126e3325c82e7f2a9a95c9152d899f17225f87
The rdtmon plugin collects information provided by monitoring features of Intel Resource Director Technology (Intel(R) RDT) like Cache Monitoring Technology (CMT), Memory Bandwidth Monitoring (MBM). CMT monitors last level cache occupancy (LLC). MBM supports two types of events reporting local and remote memory bandwidth. Local memory bandwidth (MBL) reports the bandwidth of accessing memory associated with the local socket. Remote memory bandwidth (MBR) reports the bandwidth of accessing the remote socket. Also this technology allows to monitor instructions per clock (IPC).
All events are reported on a per core basis. Monitoring of the events can be configured for group of cores.
Plugin uses PQoS library for Intel(R) RDT. https://github.com/01org/intel-cmt-cat