Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDT monitoring plugin #1970

Merged
merged 4 commits into from Oct 6, 2016
Merged

RDT monitoring plugin #1970

merged 4 commits into from Oct 6, 2016

Conversation

serhiyx
Copy link
Contributor

@serhiyx serhiyx commented Sep 30, 2016

The rdtmon plugin collects information provided by monitoring features of Intel Resource Director Technology (Intel(R) RDT) like Cache Monitoring Technology (CMT), Memory Bandwidth Monitoring (MBM). CMT monitors last level cache occupancy (LLC). MBM supports two types of events reporting local and remote memory bandwidth. Local memory bandwidth (MBL) reports the bandwidth of accessing memory associated with the local socket. Remote memory bandwidth (MBR) reports the bandwidth of accessing the remote socket. Also this technology allows to monitor instructions per clock (IPC).
All events are reported on a per core basis. Monitoring of the events can be configured for group of cores.
Plugin uses PQoS library for Intel(R) RDT. https://github.com/01org/intel-cmt-cat

The rdtmon plugin collects information provided by monitoring features of
Intel Resource Director Technology (Intel(R) RDT) like Cache Monitoring
Technology (CMT), Memory Bandwidth Monitoring (MBM).

Change-Id: Ie45344c1035c522fcd918a1dd2427a2da2e173bb
Signed-off-by: Serhiy Pshyk <serhiyx.pshyk@intel.com>
Change-Id: I94d5e7c877bd99a5da8e725efef0bd700f339016
Signed-off-by: Serhiy Pshyk <serhiyx.pshyk@intel.com>
Copy link
Member

@octo octo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @serhiyx, thank you very much for your PR! Overall the code looks good, the review comments are mostly simple coding style issues that are easily fixable.

My biggest concern is the mbm type with its six data sources. We need to break this up into multiple "value lists" with a single value each. I've left a more detailed comment inline.

Also I'm unsure about the llc type. What is it representing? Number of bytes in the cache? Could the bytes type serve this purpose?

Best regards,
—octo


struct rdtmon_core_group_s {
char *desc;
int num_cores;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: array indexes and sizes should be of type size_t.

struct rdtmon_ctx_s {
rdtmon_core_group_t cgroups[RDTMON_MAX_CORES];
struct pqos_mon_data *pgroups[RDTMON_MAX_CORES];
int num_groups;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: array indexes and sizes should be of type size_t.


if (ret != PQOS_RETVAL_OK) {
ERROR(RDTMON_PLUGIN ": Error starting monitoring (pqos status=%d)", ret);
return (-1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning here means that only groups up to a failing group are collected. I think it would be more intuitive if following groups would be collected, too, i.e. if this was a continue instead of a return.

DEBUG(RDTMON_PLUGIN ": rdtmon_shutdown.");

if (g_rdtmon == NULL) {
ERROR(RDTMON_PLUGIN ": rdtmon_shutdown: plugin not initialized.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please simply return zero without an error in this case. An error should already have been printed during initialization.

plugin_dispatch_values(&vl);
}

static int rdtmon_read(user_data_t *ud) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the "unused" attribute to ud: __attribute_((unused)) user_data_t *ud

cg->desc = desc;

for (int i = 0; i < num_cores; i++)
cg->cores[i] = (unsigned)cores[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(rdtmon_core_group_t*)->cores is of type unsigned, all other places appear to use uint64_t. Please change either the callers of this function or the struct to unify this.

assert(item != NULL);

for (int j = 0; j < item->values_num; j++) {
if (item->values[j].value.string != NULL &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please invert this logic to make the loop more readable:

if ((item->values[j].value.string == NULL) || (strlen(item->values[j].value.string) == 0))
  continue;
/* rest of for-loop */


static rdtmon_ctx_t *g_rdtmon = NULL;

static int isdup(const uint64_t *nums, unsigned size, uint64_t val) {
Copy link
Member

@octo octo Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use size_t for array sizes and indexes.

*/
static unsigned strlisttonums(char *s, uint64_t *nums, unsigned max) {
int ret;
unsigned index = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use size_t for array sizes and indexes.

ret = strtouint64(p + 1, &end);
if (ret < 0)
return (0);
if (start > end) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 8-4 really a syntax you want to support? Please consider returning an error instead.

1. use size_t type for all arrays and indexes
2. change malloc()/memset() to calloc()
3. fix minor code style issues
4. add range validation of core id values
5. use 'bytes' type for LLC value
6. add 'memory_bandwidth' type for MBM values

Change-Id: I5e577dcda19bc9799e7b79f9d0334c6f21b60f0d
Signed-off-by: Serhiy Pshyk <serhiyx.pshyk@intel.com>
@serhiyx
Copy link
Contributor Author

serhiyx commented Oct 5, 2016

hi @octo, thanks for your feedback. I've updated PR with fixes for all review comments.

@octo
Copy link
Member

octo commented Oct 5, 2016

Hi @serhiyx, thank you very much for your update! The code looks good to me.

One suggestion: the "mon" in "rdtmon" doesn't add much information. Is this named after something that users of this technology are already familiar with, such as a command line tool exposing the same metrics? If not, what do you think about renaming this plugin to "intel_rdt"?

Best regards,
—octo

Change-Id: Id23eb96fd37e6d4fc5fdf7e7ed58d9e74a33cca0
Signed-off-by: Serhiy Pshyk <serhiyx.pshyk@intel.com>
@serhiyx
Copy link
Contributor Author

serhiyx commented Oct 6, 2016

Hi @octo ,

"intel_rdt" is good name for this plugin. Will implement the changes and update the review.

Regards,
Serhiy

@octo octo merged commit 141fd30 into collectd:master Oct 6, 2016
@octo
Copy link
Member

octo commented Oct 6, 2016

Thanks @serhiyx!

@maryamtahhan maryamtahhan deleted the feat_rdtmon branch October 25, 2016 08:58
rdoproject pushed a commit to centos-opstools/collectd that referenced this pull request Aug 4, 2017
The rdtmon plugin collects information provided by monitoring features of
Intel Resource Director Technology (Intel(R) RDT) like Cache Monitoring
Technology (CMT), Memory Bandwidth Monitoring (MBM). CMT monitors last level
cache occupancy (LLC). MBM supports two types of events reporting local and
remote memory bandwidth. Local memory bandwidth (MBL) reports the bandwidth
of accessing memory associated with the local socket. Remote memory
bandwidth (MBR) reports the bandwidth of accessing the remote socket. Also
this technology allows to monitor instructions per clock (IPC).
All events are reported on a per core basis. Monitoring of the events can
be configured for group of cores.
Plugin uses PQoS library for Intel(R) RDT.

This is collectd/collectd#1970

Change-Id: I3d126e3325c82e7f2a9a95c9152d899f17225f87
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants