Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

virt plugin: support virDomainBlockStatsFlags #2103

Merged
merged 5 commits into from
Feb 14, 2017

Conversation

ghost
Copy link

@ghost ghost commented Dec 13, 2016

We want to export more disk stats, like disk_time.
To do so, we must use the virDomainBlockStatsFlags API, but it
could not be available on all the platforms collectd supports.
To cope with that, we add a minimal layer around the API, and
we use virDomainBlockStatsFlags only if available.

This patch adds the code to use the new API and the fallback code
in the case the platform' libvirt is too old.

Signed-off-by: Francesco Romani fromani@redhat.com

src/virt.c Outdated
if ((binfo->bi.rd_bytes != -1) && (binfo->bi.wr_bytes != -1))
submit_derive2("disk_octets", (derive_t)binfo->bi.rd_bytes,
(derive_t)binfo->bi.wr_bytes, dom, type_instance);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but it looks like you're not sending the new statistics you fetch from libvirt?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't! here I'm new adding support for the new API, and I plan to use it in another PR. I can add the patch consuming the data in this PR if you prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be nice.

@ghost
Copy link
Author

ghost commented Dec 13, 2016

@rubenk, @octo

Next, I'd like to add the block device "flush" stats:

  • fl_req: total flush requests

  • fl_total_times: total time (ns) spent on cache flushing

But I'm not sure how to map them. I was thinking to extend the existing types like this:

diff --git a/src/types.db b/src/types.db
index 6855187..bc2985c 100644
--- a/src/types.db
+++ b/src/types.db
@@ -52,8 +52,10 @@ disk_latency read:GAUGE:0:U, write:GAUGE:0:U
disk_merged read:DERIVE:0:U, write:DERIVE:0:U
disk_octets read:DERIVE:0:U, write:DERIVE:0:U
disk_ops read:DERIVE:0:U, write:DERIVE:0:U
+disk_ops_flush read:DERIVE:0:U, write:DERIVE:0:U, flush:DERIVE:0:U
disk_ops_complex value:DERIVE:0:U
disk_time read:DERIVE:0:U, write:DERIVE:0:U
+disk_time_flush read:DERIVE:0:U, write:DERIVE:0:U, flush:DERIVE:0:U
dns_answer value:DERIVE:0:U
dns_notify value:DERIVE:0:U
dns_octets queries:DERIVE:0:U, responses:DERIVE:0:U

Or perhaps I can add a new type for disk_flush. What do you think?

src/virt.c Outdated
@@ -380,6 +380,10 @@ static void disk_submit(struct lv_block_info *binfo, virDomainPtr dom,
if ((binfo->bi.rd_bytes != -1) && (binfo->bi.wr_bytes != -1))
submit_derive2("disk_octets", (derive_t)binfo->bi.rd_bytes,
(derive_t)binfo->bi.wr_bytes, dom, type_instance);

if ((binfo->rd_total_times != -1) && (binfo->wr_total_times))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if wr_total_times is 0? Isn't that a valid reading?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly mistake, fixing.

@ghost ghost force-pushed the pr-block-stats-flags branch from efe76d0 to a69524b Compare December 14, 2016 10:12
@rubenk
Copy link
Contributor

rubenk commented Dec 14, 2016

We tend to be somewhat hesitant to adding new types and map stats to existing types as much as possible. I'm not sure what the best type would be in this case. cc'ing @octo who might have an idea.

@ghost
Copy link
Author

ghost commented Dec 15, 2016

@rubenk @octo perhaps we can avoid to extend the default types.db. We can ship our own amended, or add a snippet if you accept #2068 .

Is it ok to submit some stats not declared in the default types.db? e.g. something like

submit_derive2("disk_flush", (derive_t)binfo->fl_req, (derive_t)binfo->fl_total_times, dom, type_instance);

@octo
Copy link
Member

octo commented Dec 22, 2016

@mojaves you get the best types when you think of them as "units". For the most part, I think you answer the question youself:

fl_req: total flush requests

→ type total_requests, type instance flush

fl_total_times: total time (ns) spent on cache flushing

Add a new entry to types.db:

total_time_in_ns        value:DERIVE:0:U

Best regards,
—octo

@ghost
Copy link
Author

ghost commented Jan 1, 2017

@octo thanks! will update my patch soon.

@ghost
Copy link
Author

ghost commented Jan 9, 2017

I see we have already total_time_in_ms in types.db. I will investigate if ms resolution is good for us, so we can avoid adding a new type.

@ghost ghost force-pushed the pr-block-stats-flags branch from 7cadb79 to a04ab87 Compare January 10, 2017 10:37
@ghost
Copy link
Author

ghost commented Jan 10, 2017

Looks like the ms resolution is sufficient for use for the time being. We'll discuss in the future a finer-grained resolution (ns) should the need arise.
I had to update my last commit because we need to report per-disk flush stats, so I added the disk name in the type instance. It's not the most elegant solution out there, but should be good enough for a start. Suggestions of course welcome =)

Other than that the patches are under testing on my side. So far, so good, no issues spotted.

@ghost
Copy link
Author

ghost commented Jan 17, 2017

Other than that the patches are under testing on my side. So far, so good, no issues spotted.
tests completed on my side, no issues.

@ghost
Copy link
Author

ghost commented Jan 25, 2017

ping?

1 similar comment
@ghost
Copy link
Author

ghost commented Feb 2, 2017

ping?

@ghost ghost closed this Feb 2, 2017
@ghost ghost reopened this Feb 2, 2017
@ghost
Copy link
Author

ghost commented Feb 2, 2017

hit wrong button, reponened

src/virt.c Outdated
@@ -640,27 +770,22 @@ static int lv_read(user_data_t *ud) {

/* Get block device stats for each domain. */
for (int i = 0; i < state->nr_block_devices; ++i) {
struct _virDomainBlockStats stats;
struct block_device *bdev = &(state->block_devices[i]);
int ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you declare this on line 778?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will move.

@rubenk
Copy link
Contributor

rubenk commented Feb 10, 2017

Hi @mojaves, I'm sorry for the late response. Life got in the way..

In general this looks good, I left one minor comment.
One thing I've been thinking about, is if it would be possible to make gathering the new statistics optional. My guess is not everyone is interested in flush requests for example, I know I have no use for it in my environment but they will still be gathered and stored for thousands of vms. It would be great if we could make it configurable which stats to gather. I'd love to hear your thoughts on this.

@ghost
Copy link
Author

ghost commented Feb 13, 2017

Hi @rubenk , no problem, I totally understand.

I'm totally fine with configurable stats, I also think it is a good idea. Let me refresh my patches.

We want to export more disk stats, like disk_time.
To do so, we must use the virDomainBlockStatsFlags API, but it
could not be available on all the platforms collectd supports.
To cope with that, we add a minimal layer around the API, and
we use virDomainBlockStatsFlags only if available.

This patch adds the code to use the new API and the fallback code
in the case the platform' libvirt is too old.

Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Now that we switched to virDomainBLockStatsFlags, we can
report the `flush` stats.

Signed-off-by: Francesco Romani <fromani@redhat.com>
We are adding more metrics to the virt plugin, but not everyone
may be interested in those.
This patch add one more option to the virt plugin to enable
or disable the new stats.
For backward compatibility, the default is disabled.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani
Copy link
Contributor

sorry about the mess, I'm transitioning from my personal account (@mojaves) to my work account (@fromanirh). Should be OK with my last push. If it isn't, I'm fine opening a new cleaner pull request.

@ghost
Copy link
Author

ghost commented Feb 13, 2017

This is the confirmation from the other side that I'm transitioning my collectd contributions from @mojaves (personal account) to @fromanirh (work account)

src/virt.c Outdated
char *exstats[EX_STATS_MAX_FIELDS];
int numexstats;

numexstats = strsplit(localvalue, exstats, STATIC_ARRAY_SIZE(exstats));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you combine line 610 and 612?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/virt.c Outdated
int rc = -1;
int ret;

ret = virDomainBlockStatsFlags(dom, path, NULL, &nparams, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please combine this with line 657

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/virt.c Outdated
continue;

char *type_instance = NULL;
if (blockdevice_format_basename && blockdevice_format == source)
type_instance = strdup(basename(state->block_devices[i].path));
type_instance = strdup(basename(bdev->path));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check for failure of strdup()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but rearranging the code in a slightly and hopefully better way

src/virt.c Outdated
if ((stats.rd_req != -1) && (stats.wr_req != -1))
submit_derive2("disk_ops", (derive_t)stats.rd_req, (derive_t)stats.wr_req,
state->block_devices[i].dom, type_instance);
type_instance = strdup(bdev->path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


Enable extra statistics. This allows the plugin to reported more detailed
statistics about the behaviour of Virtual Machines. The argument is a
space-separated list of selectors. Currently only B<disk> is supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accidentally interpreted this as "this enables disk statistics in general". Might be nice to add a note that setting "disk" enables extra dist statistics like flush requests, with the emphasis on extra.

@@ -8076,6 +8076,12 @@ How many read instances you want to use for this plugin. The default is one,
and the sensible setting is a multiple of the B<ReadThreads> value.
If you are not sure, just use the default setting.

=item B<ExtraStats> B<string>

Enable extra statistics. This allows the plugin to reported more detailed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

report

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@rubenk
Copy link
Contributor

rubenk commented Feb 13, 2017

Thanks @fromanirh. I left just a few minor nits and then this is good to go in.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@rubenk rubenk merged commit a23eb15 into collectd:master Feb 14, 2017
@octo octo added this to the 5.8 milestone Oct 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants