mgr/dashboard: added iSCSI IOPS/throughput metrics #18653
Conversation
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Fixes: http://tracker.ceph.com/issues/21391 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
image['stats'][s] = self._module.get_rate(
    'tcmu-runner', service_id, perf_key)
image['stats_history'][s] = self._module.get_counter(
    'tcmu-runner', service_id, perf_key)[perf_key]
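For context on what these two calls return, here is a minimal self-contained sketch of the semantics (this is not the real ceph-mgr implementation; the `CounterStore` class and sample data are illustrative): daemons periodically report counter samples, `get_counter` yields the stored history of `(timestamp, value)` samples, and `get_rate` derives a rate from the two most recent samples.

```python
# Illustrative model of ceph-mgr's in-memory perf counter store.
# NOT the real MgrModule API implementation, just a sketch of the
# semantics: daemons periodically report counter samples, and a module
# reads them back by (service type, service id, counter key).

class CounterStore:
    def __init__(self):
        # {(svc_type, svc_id): {counter_key: [(timestamp, value), ...]}}
        self._data = {}

    def add_sample(self, svc_type, svc_id, key, timestamp, value):
        svc = self._data.setdefault((svc_type, svc_id), {})
        svc.setdefault(key, []).append((timestamp, value))

    def get_counter(self, svc_type, svc_id, key):
        # Mirrors the shape used above: a dict mapping the counter key
        # to its list of (timestamp, value) samples.
        samples = self._data.get((svc_type, svc_id), {}).get(key, [])
        return {key: samples}

    def get_rate(self, svc_type, svc_id, key):
        # Rate derived from the two most recent samples.
        samples = self._data.get((svc_type, svc_id), {}).get(key, [])
        if len(samples) < 2:
            return 0.0
        (t0, v0), (t1, v1) = samples[-2], samples[-1]
        return (v1 - v0) / (t1 - t0) if t1 > t0 else 0.0

store = CounterStore()
store.add_sample('tcmu-runner', 'gw0', 'rd_bytes', 100.0, 0)
store.add_sample('tcmu-runner', 'gw0', 'rd_bytes', 105.0, 4096)
print(store.get_rate('tcmu-runner', 'gw0', 'rd_bytes'))  # 819.2 bytes/s
```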
@dillaman Just a question to understand how ceph-mgr works with perf counters. Does _module.get_counter query data that ceph-mgr has already collected in memory (and if so, how is it collected?), or does it query the service (tcmu-runner) directly?
@trociny tcmu-runner will now periodically send perf counters to ceph-mgr if the priority of the counter is high enough. Therefore, this get_counter is reading the latest perf stats held in memory within the ceph-mgr.
@dillaman Could you please point me to the tcmu-runner code that does this? I'd like to look at it as an example.
Also, thinking about this: don't we want librbd to send this instead (probably only when some configuration option is enabled)? I had the impression we were going to do this anyway, to support reports like a top rbd client.
@trociny It actually is librbd that is sending the stats. See commit 7a9d10a in this PR. The top-most image in a clone hierarchy (i.e. the one you open) will send read/write throughput and op stats at priority level PRIO_USEFUL.
The MgrClient will by default send any perf counters at level PRIO_USEFUL or above to ceph-mgr (see line 4135 in 7a23097: `.set_default((int64_t)PerfCountersBuilder::PRIO_USEFUL)`).
The tcmu-runner is registering itself as a service daemon so those stats will automatically be exported (https://github.com/open-iscsi/tcmu-runner/blob/8777084029c7708d8fcdbb79e23f086e730dd2e5/rbd.c#L156)
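A hedged sketch of the priority filtering described above: the priority constants mirror Ceph's PerfCountersBuilder levels, but the filtering function itself is a simplified illustrative model, not the actual MgrClient code.

```python
# Illustrative sketch of how MgrClient decides which perf counters to
# report to ceph-mgr. The priority constants mirror Ceph's
# PerfCountersBuilder levels; the filtering logic here is a simplified
# model, not the real implementation.

PRIO_CRITICAL = 10
PRIO_INTERESTING = 8
PRIO_USEFUL = 5        # default threshold: counters at this level or
PRIO_UNINTERESTING = 2 # above are reported to ceph-mgr
PRIO_DEBUGONLY = 0

def counters_to_report(counters, threshold=PRIO_USEFUL):
    """Return the names of counters whose priority meets the threshold."""
    return [name for name, prio in counters.items() if prio >= threshold]

# e.g. an RBD image's read/write throughput counters at PRIO_USEFUL get
# reported, while a hypothetical debug-only counter does not:
counters = {'rd_bytes': PRIO_USEFUL, 'wr_bytes': PRIO_USEFUL,
            'dbg_internal': PRIO_DEBUGONLY}
print(counters_to_report(counters))  # ['rd_bytes', 'wr_bytes']
```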
@dillaman Thank you for the explanation! I saw the commit that set PRIO_USEFUL, but it did not help me understand how the magic actually happens.
So after 7a9d10a is merged, we could tweak the "rbd_ls" dashboard to add IO metrics, similar to the rbd_iscsi dashboard?
@trociny The ceph-mgr perf gathering design won't really scale well to potentially tens of thousands of images sending it perf stats. I think in the future there will be two approaches for generic RBD image metric gathering: (1) an OSD-based statistical approach to provide an "rbd top"-like tool for the whole cluster and (2) a per-image opt-in system where an admin enables perf gathering for one (or a few) select RBD images which can be sent to the ceph-mgr / dashboard.
lgtm
@ceph-jenkins retest this please