New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr/iostat: implement 'ceph iostat' as a mgr plugin #20100
Conversation
IO statistics given by
|
src/ceph.in
Outdated
if len(childargs) > 0 and childargs[0] == 'iostat': | ||
def call_iostat(): | ||
while 1: | ||
subprocess.call([sys.argv[0], 'mgr', 'iostat']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, it doesn't seem like there's a way to keep an open connection between a mgr plugin and the client. The alternative, which is implemented here, is having the client invoke the iostat
plugin periodically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liuchang0812 Thanks for the comments. The issue is that (to my knowleddge) there's no way for the plugin to continuously communicate with the client. Adding a blocking mode for the client wouldn't solve that.
src/mon/PGMap.cc
Outdated
@@ -1367,6 +1367,8 @@ void PGMap::dump(Formatter *f) const | |||
dump_pg_stats(f, false); | |||
dump_pool_stats(f); | |||
dump_osd_stats(f); | |||
pg_sum_delta.dump(f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have PGMapDigest::overall_client_io_rate_summary
function, we could achieve iostats
command via adding a python_get('client_io`) interface. likes 6272c26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, pg_sum_delta
(dumped here) is what is indeed used by PGMapDigest::overall_client_io_rate_summary
. I added 'io_rate' as a way to fetch it from a mgr plugin.
I see three reasonable-ish options here:
I think my instinct is for option 3, anyone else? |
+1 |
@jcsp In option 3, there won't really be any polling though, correct? There will still be repeated calls from the client, just like in this PR. In any case, I agree with you for option 3 - I'll update this PR accordingly. |
@mogeb right, the basic behaviour (sending an MCommand repeatedly) would be the same, it would just be a more generic mechanism so that the CLI doesn't have to be hardcoded with the "magic" commands. I guess you'll probably want a convention for passing a "first=true" or similar argument, so that the server can generate the table header on the first call, then just output table lines later. Perhaps also have the CLI send the terminal width as a parameter of the command too. |
@jcsp I was hoping to keep the formatting in the client. Maybe the plugin can expose a What do you think about formatting the returned values in JSON, and having the client unpack it and print it however it wants? This adds more flexibility too, so one can easily fetch the current throughput/iops. Or is this extra conversion something we want to avoid? |
The trouble is that the client doesn't know anything about the data it's printing, so it generally can't do anything but pass through strings (and if we're just passing through strings, why not just send the whole line?). For example, some plugins might want to selectively drop columns (like we do with "daemonperf") depending on the width of the terminal, and the client doesn't know the rules for that. Colour coding is also simpler if the plugin can just output its colourized lines rather than having to invent a scheme for the plugin to hint to the client about colours. |
@jcsp Ah, I understand - you want to get completely rid of all command-specific code on the client. That part went over my head. Ok, I'll update this PR then. |
@jcsp Ok, here area a few points regarding option 3:
If that makes sense, I'll go ahead and update the output of iostat (table header, etc.) as we discussed. |
Fortunately, the commands are definitely not hardcoded in ceph.in (although a few special ones are). The command descriptions are present in the COMMANDS member of a python module, or in mon/MonCommands.h for the C++ ones. These are passed to clients (such as the CLI) by the special "get_command_descriptions" command, and interpreted by parse_json_funcsigs and similar code in src/pybind/ceph_argparse.py |
7ee547f
to
2fa6386
Compare
@jcsp I've updated (and rebased) the PR. I've added a |
src/ceph.in
Outdated
print(outs, file=sys.stderr) | ||
sleep(1) | ||
else: | ||
ret, outbuf, outs = json_command(cluster_handle, target=target, argdict=valid_dict, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be something like
while True:
// run the command
if 'poll' in valid_dict and valid_dict['poll']:
break
no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's clearer. Added.
src/pybind/ceph_argparse.py
Outdated
@@ -24,7 +24,7 @@ | |||
|
|||
|
|||
FLAG_MGR = 8 # command is intended for mgr | |||
|
|||
FLAG_POLL = 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind adding a brief comment telling us where the magic number comes from (i.e., mon/MonCommands.h's command flag)? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right :) Done.
src/ceph.in
Outdated
print(outbuf) | ||
if outs: | ||
print(outs, file=sys.stderr) | ||
sleep(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice if the polling interval was configurable, defaulting to some value (may it be 1 or something else). Maybe an option passed to the cli tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I was thinking of having it in the command description itself, but I think a flag to the cli tool is just as valid and definitely easier. Done.
1110767
to
e1c2976
Compare
I like this. Since we're extending the command definition language, let's take the opportunity to make it futureproof and add a way for polling commands to supply a title and have some knowledge of the terminal size to do neat formatting (I'm thinking of commands that have a more table-like format). Here's what I think we should do for futureproofing:
We can do that in a followup PR as long as we land it before Mimic, or do it in this PR: what do you think @mogeb ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018-04-11T03:23:37.356 INFO:tasks.workunit.client.0.mira101.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1873: test_mon_pg: ceph pg 1.0 query
2018-04-11T03:23:37.665 INFO:tasks.workunit.client.0.mira101.stderr:no valid command found; 10 closest matches:
2018-04-11T03:23:37.665 INFO:tasks.workunit.client.0.mira101.stderr:pg scrub <pgid>
2018-04-11T03:23:37.665 INFO:tasks.workunit.client.0.mira101.stderr:pg debug unfound_objects_exist|degraded_pgs_exist
2018-04-11T03:23:37.665 INFO:tasks.workunit.client.0.mira101.stderr:pg ls {<int>} {<states> [<states>...]}
2018-04-11T03:23:37.665 INFO:tasks.workunit.client.0.mira101.stderr:pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {<int>
}
2018-04-11T03:23:37.666 INFO:tasks.workunit.client.0.mira101.stderr:pg ls-by-primary <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
2018-04-11T03:23:37.666 INFO:tasks.workunit.client.0.mira101.stderr:pg ls-by-osd <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
2018-04-11T03:23:37.666 INFO:tasks.workunit.client.0.mira101.stderr:pg dump_pools_json
2018-04-11T03:23:37.666 INFO:tasks.workunit.client.0.mira101.stderr:pg ls-by-pool <poolstr> {<states> [<states>...]}
2018-04-11T03:23:37.666 INFO:tasks.workunit.client.0.mira101.stderr:pg dump_json {all|summary|sum|pools|osds|pgs [all|summary|sum|pools|osds|pgs...]}
2018-04-11T03:23:37.666 INFO:tasks.workunit.client.0.mira101.stderr:pg map <pgid>
2018-04-11T03:23:37.666 INFO:tasks.workunit.client.0.mira101.stderr:Error EINVAL: invalid command
this change breaks "ceph pg 1.0 query".
8d09e5e
to
d729c9d
Compare
@tchaikov fixed |
retest this please |
src/pybind/mgr/iostat/__init__.py
Outdated
@@ -0,0 +1 @@ | |||
from module import Module |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mogeb should be from .module import Module
@mogeb could you highlight the changes you made in the latest version? for instance, how did you fix "this change breaks "ceph pg 1.0 query"? |
@@ -999,6 +1001,9 @@ def validate(args, signature, flags=0, partial=False): | |||
if flags & FLAG_MGR: | |||
d['target'] = ('mgr','') | |||
|
|||
if flags & FLAG_POLL: | |||
d['poll'] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov The problem was that ceph.in
looks at len(valid_dict)
(ie the number of parameters) to decide whether it's a pg
command or not. See here. Setting valid_dict['poll'] = False
(previously done here) broke that logic. The fix is to keep valid_dict['poll']
unset, unless the command server-side expects it. This is OK because the length of valid_dict
isn't used for the plugin. There's already a check that 'poll' in valid_dict
whenever we want to access it, so having it unset is not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, thanks for the explanation! actually, i was debugging this yesterday.
@tchaikov sorry, I hesitated to do that :) I added a comment that explains the problem. |
src/ceph.in
Outdated
file=sys.stderr) | ||
break | ||
if outbuf: | ||
print(outbuf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if outbuf
is human-readable content, we should print(outbuf.decode('utf-8'))
, this is important in py3.
src/ceph.in
Outdated
if valid_dict: | ||
if parsed_args.output_format: | ||
valid_dict['format'] = parsed_args.output_format | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please drop this empty line.
src/ceph.in
Outdated
while True: | ||
ret, outbuf, outs = json_command(cluster_handle, target=target, argdict=valid_dict, | ||
inbuf=inbuf) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please drop this empty line.
src/ceph.in
Outdated
if 'poll' not in valid_dict or not valid_dict['poll']: | ||
# Don't print here if it's not a polling command | ||
break | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please drop this empty line.
item.polling = false; | ||
PyObject *pPoll = PyDict_GetItemString(command, "poll"); | ||
if (pPoll) { | ||
std::string polling = PyString_AsString(pPoll); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could just use boost::iequals(polling, "true")
src/pybind/ceph_argparse.py
Outdated
@@ -787,6 +788,7 @@ def parse_json_funcsigs(s, consumer): | |||
cmd['sig'] = parse_funcsig(cmd['sig']) | |||
# just take everything else as given | |||
sigdict[cmdtag] = cmd | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop this empty line.
Allow a mgr module to fetch 'io_rate' to access pg_sum_delta, which holds the IO activity recently completed by the OSDs. Signed-off-by: Mohamad Gebai <mgebai@suse.com>
Signed-off-by: Mohamad Gebai <mgebai@suse.com>
Signed-off-by: Mohamad Gebai <mgebai@suse.com>
Signed-off-by: Mohamad Gebai <mgebai@suse.com>
Signed-off-by: Mohamad Gebai <mgebai@suse.com>
Signed-off-by: Mohamad Gebai <mgebai@suse.com>
Signed-off-by: Mohamad Gebai <mgebai@suse.com>
Signed-off-by: Mohamad Gebai <mgebai@suse.com>
@tchaikov sorry for the added iteration. Should be all done. |
no worries. thanks for your persistence! |
Requires pg_dump to include the PGMap::Incremental object, which holds
the IO activity recently completed by the OSDs.
Signed-off-by: Mohamad Gebai mgebai@suse.com