New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr: perf schema fns/change notification and Prometheus plugin #16406

Merged
merged 7 commits into from Jul 21, 2017

Conversation

Projects
None yet
4 participants
@dmick
Member

dmick commented Jul 19, 2017

No description provided.

dmick added some commits Jul 12, 2017

mgr/, pybind/mgr: add get_perf_schema() for mgr Python modules
Signed-off-by: Dan Mick <dan.mick@redhat.com>
pybind/mgr/dashboard: expose get_counter and get_perf_schema
Useful for experimentation/development; not required

Signed-off-by: Dan Mick <dan.mick@redhat.com>
mgr/DaemonServer: add perf_schema_update notification for Py modules
MMgrReport contains info about changed schema, so create a notify
type for it so that Python modules interested in perf counters can
update schema only when it changes

Signed-off-by: Dan Mick <dan.mick@redhat.com>

@dmick dmick requested review from jcsp and liewegas Jul 19, 2017

@dmick

This comment has been minimized.

Member

dmick commented Jul 19, 2017

Possible discussion points:

  1. disabled by default, must set port
  2. does not currently set timestamps; easy to do but perhaps inadvisable
  3. does minimal translation on stat names; there may be better mappings
  4. does not attempt to label stats with any dimension other than 'daemon.id', but could easily do so
@dmick

This comment has been minimized.

Member

dmick commented Jul 19, 2017

retest this please

f.open_object_section("perf_schema");
// FIXME: this is unsafe, I need to either be inside DaemonStateIndex's
// lock or put a lock on individual DaemonStates

This comment has been minimized.

@liewegas

liewegas Jul 19, 2017

Member

Hmm, should fix this before merge. John, any preference how to approach this? Maybe a DaemonState dump method that takes the DaemonStateCollection and Formatter as an arg and locks appropriately?

This comment has been minimized.

@dmick

dmick Jul 19, 2017

Member

Yeah, I confess I just boilerplated and did not understand the issue

This comment has been minimized.

@liewegas

liewegas Jul 19, 2017

Member

I have a separate PR that adds a per-DaemonState lock, so we can ignore that issue here.

@liewegas

This comment has been minimized.

Member

liewegas commented Jul 19, 2017

  1. I like the pattern of having a default port and binding to :: by default so that a simple 'ceph mgr module enable prometheus' is sufficient to turn it on.
@dmick

This comment has been minimized.

Member

dmick commented Jul 19, 2017

...and now that I'm using cherrypy I could do that, I suppose. The Prometheus client library didn't accept that (needed 0.0.0.0).

@l-mb

This comment has been minimized.

l-mb commented Jul 19, 2017

Very interesting! Is the goal to make this feature equivalent (or better) to ceph_exporter, or is it meant to be used alongside it?

Will it perhaps even expose the same metrics as a drop-in replacement? (Or should we take this discussion to ceph-devel?)

@dmick

This comment has been minimized.

Member

dmick commented Jul 19, 2017

@l-mb: the goal is to get something in for enhancement :D I do know about ceph-exporter, and realized I'd need to read the code, which means reading Go, to really understand what it advertises and how. It would probably make more sense to try to be as compatible as possible in terms of metric names, types, and labels, I agree. I think the intent is to make a "low-barrier" collection point for small-to-medium clusters. But yes, happy to discuss on ceph-devel too

@liewegas liewegas added this to the luminous milestone Jul 19, 2017

@dmick

This comment has been minimized.

Member

dmick commented Jul 19, 2017

repushed with default listen addr set to '::'

@liewegas liewegas added the needs-qa label Jul 19, 2017

@dmick

This comment has been minimized.

Member

dmick commented Jul 19, 2017

Hmm, maybe should pick a port like these: https://github.com/prometheus/prometheus/wiki/Default-port-allocations I suspect we don't want to share 9128 in case in future we have both mon/osd and mgr on the same host and want to use per-daemon instrumentation as in ceph-exporter

@liewegas

This comment has been minimized.

Member

liewegas commented Jul 19, 2017

pybind/mgr/prometheus: add Prometheus metric collector
This implements a Prometheus 'text' endpoint with cherrypy.
All paths return the metrics (as the Prometheus project's
Python client server does).

1) the newest stat is used for all counters/gauges
2) histograms are not handled
3) the timestamp assigned by Prometheus is the time of query, not
   the time collected by Ceph (long story), but this is now
   easily changed if necessary (by adding the timestamp in ms to
   the end of each reporting line, in Go float format).

Signed-off-by: Dan Mick <dan.mick@redhat.com>
@dmick

This comment has been minimized.

Member

dmick commented Jul 19, 2017

Added 9283 there, updated default port here

@jan--f

This comment has been minimized.

Member

jan--f commented Jul 20, 2017

@dmick I was going to suggest to use port 9128 actually. If this code exports what the ceph_exporter produces (and more) then the two are compatible from the scrapers perspective, i.e. the mgr prometheus plugin could be a drop-in replacement for the ceph_exporter.

@jan--f

This comment has been minimized.

Member

jan--f commented Jul 20, 2017

See here for an example ceph_exporter output.

@liewegas

This comment has been minimized.

Member

liewegas commented Jul 20, 2017

@dmick

This comment has been minimized.

Member

dmick commented Jul 20, 2017

@jan--f: but if you want both on one host, that's a problem. (i.e. if the ceph-mgr version does only cluster-wide stats, and the ceph_exporter version does the daemons.)

@dmick

This comment has been minimized.

Member

dmick commented Jul 20, 2017

@liewegas Docs?!? what are those???

um....er, yeah. @dmick digs to see where and what

dmick added some commits Jul 21, 2017

doc/mgr/zabbix.rst: fix subheadings, add subsubheadings
(otherwise subheadings appear as top-level items in the TOC, because
they're marked just like the title)

Signed-off-by: Dan Mick <dan.mick@redhat.com>
mgr/index.rst: clarify which toplevel items are plugins
Signed-off-by: Dan Mick <dan.mick@redhat.com>
doc/mgr: add Prometheus plugin docs
Signed-off-by: Dan Mick <dan.mick@redhat.com>
@dmick

This comment has been minimized.

Member

dmick commented Jul 21, 2017

Added docs. There are also recommendations for Prometheus metric names (see https://prometheus.io/docs/practices/naming/), which I think ceph_exporter uses, that would require a translation table of some kind here. I don't know how important it is. Perhaps this is a "put this out there and refine it going forward" type question.

@jan--f

This comment has been minimized.

Member

jan--f commented Jul 21, 2017

@dmick I would think the mgr prometheus code will eventually export much more then the ceph_exporter can. The ceph_exporter can only export what it finds in the return of ceph health and in the osdmap and pgmap. But as you say this is probably also a "put this out there and refine it going forward" type question.
One minor thing: Its common practice for prometheus exporters to expose their metrics under <host>:<port>/metrics. If I understand your code correctly, the plugin exposes its metrics under /. https://prometheus.io/docs/instrumenting/writing_exporters/#landing-page

@liewegas

This comment has been minimized.

Member

liewegas commented Jul 21, 2017

I think there is a lot of cleanup we can do in the metric names. I like that doc because it's prescriptive about it, but adopting probably means going in and renaming a bunch of our internal perfcounters (vs having some ugly remapping table at this layer). So, longer term effort...

@liewegas liewegas merged commit 410ab9a into ceph:master Jul 21, 2017

2 of 4 checks passed

make check make check failed
Details
make check (arm64) make check failed
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
@dmick

This comment has been minimized.

Member

dmick commented Jul 21, 2017

@jan--f: ceph_exporter doesn't use 'daemon perf dump'? I'd just assumed it would. I'm sure that's an easy-ish mod.

I mentioned in the doc, but it actually just ignores path, so /metrics works, as does /; it's the same technique the Python Prometheus client library used, so I didn't stress. Easy enough to add path handling if/when we need it (@jcsp mentioned that federation might be a thing, so /federate might be a future).

@liewegas: yeah, mapping in this app was what I was thinking.

@dmick dmick deleted the dmick:wip-mgr-counters branch Jul 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment