mgr/dashboard: added iSCSI IOPS/throughput metrics #18653

Merged
merged 6 commits into from Nov 3, 2017

dillaman

No description provided.

Jason Dillaman added 4 commits October 29, 2017 08:13
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
@dillaman
Author

[Image: screenshot from 2017-10-31 11-05-17]

Jason Dillaman added 2 commits October 31, 2017 11:11
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Fixes: http://tracker.ceph.com/issues/21391
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
image['stats'][s] = self._module.get_rate(
    'tcmu-runner', service_id, perf_key)
image['stats_history'][s] = self._module.get_counter(
    'tcmu-runner', service_id, perf_key)[perf_key]
Contributor

@dillaman Just a question to understand how ceph-mgr works with perf counters. Does _module.get_counter query data that ceph-mgr has collected in memory (and if so, how is it collected?), or does it query the service (tcmu-runner) directly?

Author

@trociny tcmu-runner will now periodically send perf counters to ceph-mgr if the priority of the counter is high enough. Therefore, this get_counter call reads the latest perf stats held in memory by ceph-mgr; it does not query tcmu-runner.
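
To make the in-memory read concrete, here is a minimal Python sketch. It assumes that get_counter returns a dict mapping the counter path to a list of (timestamp, value) samples, and that a rate is the difference between the two most recent samples divided by their time delta. FakeMgr, its report method, the service id, and the counter path are all hypothetical stand-ins for illustration, not the actual ceph-mgr classes or names.

```python
class FakeMgr:
    """Hypothetical stand-in for the in-memory counter store held by ceph-mgr."""

    def __init__(self):
        # (service type, service id, counter path) -> list of (timestamp, value)
        self._samples = {}

    def report(self, svc_type, svc_id, path, timestamp, value):
        """Record one sample, as if a daemon had just sent a perf report."""
        key = (svc_type, svc_id, path)
        self._samples.setdefault(key, []).append((timestamp, value))

    def get_counter(self, svc_type, svc_id, path):
        """Return the stored history; nothing is fetched from the daemon here."""
        return {path: list(self._samples.get((svc_type, svc_id, path), []))}

    def get_rate(self, svc_type, svc_id, path):
        """Derive a per-second rate from the two most recent samples."""
        data = self.get_counter(svc_type, svc_id, path)[path]
        if len(data) < 2:
            return 0.0
        (t0, v0), (t1, v1) = data[-2], data[-1]
        if t1 == t0:
            return 0.0
        return (v1 - v0) / float(t1 - t0)


mgr = FakeMgr()
# Hypothetical service id and counter path, for illustration only.
mgr.report('tcmu-runner', 'gw1:rbd/img', 'librbd.rd_bytes', 100, 4096)
mgr.report('tcmu-runner', 'gw1:rbd/img', 'librbd.rd_bytes', 105, 24576)

rate = mgr.get_rate('tcmu-runner', 'gw1:rbd/img', 'librbd.rd_bytes')
history = mgr.get_counter(
    'tcmu-runner', 'gw1:rbd/img', 'librbd.rd_bytes')['librbd.rd_bytes']
```

In this sketch the dashboard side only ever touches the samples already stored by the manager, which matches the behaviour described in the reply: the daemon pushes, the module reads.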

Contributor

@dillaman Could you please point me to the tcmu-runner code that does this? I'd like to look at it as an example.

Thinking about this further: don't we want librbd to send these stats instead (perhaps only if some configuration option is enabled)? I was under the impression we were going to do this anyway, in order to have reports like a "top" for rbd clients.

Author

@trociny It actually is librbd that is sending the stats. See commit 7a9d10a in this PR. The top-most image in a clone hierarchy (i.e. the one you open) will send read/write throughput and op stats at priority level PRIO_USEFUL.

The MgrClient will by default send any perf counters at level PRIO_USEFUL or above to ceph-mgr (see .set_default((int64_t)PerfCountersBuilder::PRIO_USEFUL)) for any service daemons.

The tcmu-runner is registering itself as a service daemon so those stats will automatically be exported (https://github.com/open-iscsi/tcmu-runner/blob/8777084029c7708d8fcdbb79e23f086e730dd2e5/rbd.c#L156)
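
The priority cutoff described in this reply can be sketched in a few lines of Python. The numeric priority values are recalled from the PerfCountersBuilder enum and should be treated as assumptions; the counter names and the counters_to_send helper are hypothetical and only illustrate the "at or above the threshold" rule.

```python
# Assumed priority levels mirroring PerfCountersBuilder (values are an assumption).
PRIO_CRITICAL = 10
PRIO_INTERESTING = 8
PRIO_USEFUL = 5
PRIO_UNINTERESTING = 2
PRIO_DEBUGONLY = 0


def counters_to_send(counters, threshold=PRIO_USEFUL):
    """Keep only counters whose priority is at or above the threshold."""
    return {name: value
            for name, (priority, value) in counters.items()
            if priority >= threshold}


# Hypothetical counters: name -> (priority, current value).
counters = {
    'rd_bytes': (PRIO_USEFUL, 24576),
    'wr_bytes': (PRIO_USEFUL, 8192),
    'internal_debug_counter': (PRIO_DEBUGONLY, 3),
}

sent = counters_to_send(counters)
```

With the default threshold, only the two counters tagged PRIO_USEFUL would be reported; the debug-only counter stays local to the daemon.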

Contributor

@trociny trociny Nov 2, 2017

@dillaman Thank you for the explanation! I saw the commit that set PRIO_USEFUL but this did not help me to understand how the magic actually happened.

So after 7a9d10a is merged, could we tweak the "rbd_ls" dashboard to add IO metrics, similar to the rbd_iscsi dashboard?

Author

@trociny The ceph-mgr perf gathering design won't really scale well to potentially tens of thousands of images sending it perf stats. I think in the future there will be two approaches for generic RBD image metric gathering: (1) an OSD-based statistical approach to provide an "rbd top"-like tool for the whole cluster and (2) a per-image opt-in system where an admin enables perf gathering for one (or a few) select RBD images which can be sent to the ceph-mgr / dashboard.

Contributor

@trociny trociny left a comment

lgtm

@trociny
Contributor

trociny commented Nov 2, 2017

@ceph-jenkins retest this please

@trociny trociny merged commit 15a9796 into ceph:master Nov 3, 2017
@dillaman dillaman deleted the wip-21391 branch November 3, 2017 11:13