mgr, rbd: report rbd images perf stats to mgr #16071

Closed
wants to merge 3 commits

Conversation

Yan-waller
Contributor

As we know, a perf counter is created in the ImageCtx when an rbd image is opened, but this metric data is scattered across the various clients; furthermore, an rbd image can be opened simultaneously by more than one client. Reporting this information (especially ops, bytes, and latency) to the mgr may be useful.

With this change we can get perf information about each rbd image:

[root@ceph192-10-10-86 ~]# ceph mgr dump imgs_perf
dumping: all
IMAGE_ID          IOPS IOPS_RD IOPS_WR | THROUGHPUT THRU_RD  THRU_WR | LATENCY LAT_RD LAT_WR | POOL.IMAGE         
1 0.457923d1b58ba    0       0       0 |          0        0       0 |       0      0      0 | pool1.snap1_clone  
2 0.5a2fc2b9d8e1     0       0       0 |          0        0       0 |       0      0      0 | pool1.lun0_migrate 
3 0.5cde7238e1f29    0       0       0 |          0        0       0 |       0      0      0 | pool1.lun3         
4 0.ccae238e1f29  3296    3296       0 |   27000832 27000832       0 |      11     11      0 | pool1.lun1  
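
For reference, the per-image data above already exists on the client side: librbd registers a perf counter section for each open image, and it can be dumped from the client's admin socket. Below is a minimal sketch of reading it, assuming the client is configured with an admin socket; the socket path is only an example, and the exact counter names (e.g. rd/wr) may differ by release.

import json
import subprocess

# Example path only; depends on the client's "admin socket" config option.
ASOK = "/var/run/ceph/ceph-client.admin.asok"

def perf_dump(asok):
    # "ceph --admin-daemon <socket> perf dump" returns the counters as JSON.
    out = subprocess.check_output(["ceph", "--admin-daemon", asok, "perf", "dump"])
    return json.loads(out)

if __name__ == "__main__":
    counters = perf_dump(ASOK)
    # librbd registers one counter section per open image (named after the
    # image), alongside the client's own objecter/throttle sections.
    for section, values in counters.items():
        if section.startswith("librbd"):
            print(section, values.get("rd"), values.get("wr"))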

Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
@Yan-waller
Contributor Author

@dillaman Hello Jason, could you take a look?

@jcsp
Contributor

jcsp commented Jul 3, 2017

It is problematic to send performance reports to ceph-mgr from every image of every RBD client -- that is potentially a very large number of reports.

My preferred solution is to do all this server-side with sampling; this has been discussed in CDM calls a couple of times:
http://pad.ceph.com/p/ceph-top (https://youtu.be/IgpVOOVNJc0)

I would prefer to avoid hard-coding RBD-specific stuff into the C++ bits of ceph-mgr. Also, sending the performance reports as commands is not ideal -- we may want to throttle report messages and prioritise commands (commands should not be used for "bulk" things).

@dillaman

dillaman commented Jul 3, 2017

The existing librbd perf counters should be enough to provide stats without the need to build a highly specific message structure.

I agree with John that it would not be a good idea to firehose this data to the mgr -- but I also don't believe that the OSD-based statistical sampling is the end solution, since it can never provide the latencies that the client is actually experiencing. Perhaps something where librbd would only send the perf counters to the mgr when some SLA was violated?
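
A rough sketch of that SLA-gating idea follows: sample an image's perf counters periodically and forward them only when the average write latency crosses a configured threshold, so a healthy client generates no report traffic. Nothing here is an existing librbd or mgr API; all names and the threshold are illustrative.

SLA_WR_LATENCY_MS = 10.0  # example threshold, purely illustrative

def avg_latency_ms(counter):
    # Latency counters dump as {'avgcount': N, 'sum': seconds}.
    if counter["avgcount"] == 0:
        return 0.0
    return counter["sum"] / counter["avgcount"] * 1000.0

def send_report_to_mgr(image_id, counters):
    # Placeholder for whatever transport would actually be used
    # (an MMgrReport-style message, an admin socket hook, etc.).
    print("would report %s: %s" % (image_id, counters))

def maybe_report(image_id, counters):
    # Only talk to the mgr when the observed latency violates the SLA.
    if avg_latency_ms(counters["wr_latency"]) > SLA_WR_LATENCY_MS:
        send_report_to_mgr(image_id, counters)

# Example: a synthetic counter dump whose 15 ms average write latency
# violates the 10 ms threshold, so a report would be sent.
maybe_report("pool1/lun1", {"wr_latency": {"avgcount": 100, "sum": 1.5}})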

@Yan-waller
Contributor Author

Yan-waller commented Jul 4, 2017

@jcsp Hmm, I was also worried about the large number of RBD clients; thanks for your reply. I tried to access http://pad.ceph.com/p/ceph-top for more information, but got the following error:

An error occured
The error was reported with the following id: 'PLJDqm0DJRP92o0sl7yb'

Please send this error message to us: 
'ErrorId: PLJDqm0DJRP92o0sl7yb
URL: http://pad.ceph.com/p/ceph-top
UserAgent: Mozilla/5.0 (Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; HRTS; InfoPath.2; rv:11.0) like Gecko

@Yan-waller
Contributor Author

Yan-waller commented Jul 4, 2017

@dillaman Thanks for your reply. Customers require things like this; it seems hard to find an ideal solution.

@jcsp
Contributor

jcsp commented Jul 4, 2017

@Yan-waller Weird that pad.ceph.com isn't working for you. The pad is not that detailed anyway; this link is more verbose (http://tracker.ceph.com/projects/ceph/wiki/Live_Performance_Probes), although that's just something I wrote myself rather than notes on the discussion.

@dillaman You certainly have a point about the server-side statistics not telling the full story. I wonder if there's a risk of overestimating the scope of what Ceph can reasonably be expected to self-monitor, though -- if we were building an NFS filer, nobody would expect our monitoring to include stats from the clients that mounted the NFS shares. The same goes for the HTTP clients sending requests to RGW instances, or for iSCSI initiators. The norm is usually that monitoring the client side is more the user's or application's responsibility than the storage system's.

I am probably a bit biased and mostly just trying to rationalize my desire to avoid clients talking to the mgr. We do also have precedent for RADOS clients being considered "server-side" when they are RGW gateways or NFS gateways.

The SLA violation point is interesting. This has probably already been discussed, but I wonder if that should be a KVM/qemu thing rather than a librbd thing? If the goal is to monitor what the VM guest really sees performance-wise, then the instrumentation should be as close to the top as possible. I would tend to assert that our clusters don't exist in isolation: if someone has storage SLAs then they probably have network SLAs too -- what monitors those, and could the same thing monitor the storage piece? Ultimately, I still expect that the operational indicators we send up to the mgr will mostly get reported onwards to something else (nagios, zabbix, snmp, etc.), so for O(clients) monitoring jobs we should consider skipping the middleman.

@dillaman

dillaman commented Jul 5, 2017

@jcsp

if we were building an NFS filer, nobody would expect our monitoring to include stats from the clients that NFS mounted the NFS shares. Same for the HTTP clients sending requests to RGW instances, or iSCSI initiators. The norm is usually that monitoring the client side is more of the user/application's responsibility, than the storage system's.

... and yet that is the constant ask from users.

@jcsp
Contributor

jcsp commented Jul 5, 2017

I think that, from where we are today, if you ask people what they want they'll say they want it all, including the client's-eye view in addition to per-client server-side stats.

However, if we look at it from a future position where we already have excellent (server-side) monitoring that gives them per-client throughput, latency and op-type breakdown, I'm not sure how strong the push would be to implement a separate monitoring path just to get the client's view of the same set of stats.

None of which is to say they can't monitor it, but I'm not sure how strong the motivation would be to take care of that monitoring inside Ceph. I think it's completely credible to tell a story where the Ceph end-game is that we provide per-client/per-image monitoring of what's hitting the Ceph cluster, and if they want to know what that looks like from the client side then they should go monitor their client or their application.

@Yan-waller
Contributor Author

I have no better idea about this for now, so I'll close this PR. Thank you all.

Yan-waller closed this Jul 11, 2017