pybind/mgr/prometheus: add StandbyModule and handle failed MON cluster #19744

jan--f · 2018-01-02T17:33:51Z

No description provided.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

jcsp · 2018-01-03T18:30:33Z

src/pybind/mgr/prometheus/module.py

        cherrypy.engine.start()
+        cherrypy.engine.start()


double call to start()

jcsp · 2018-01-03T18:31:28Z

src/pybind/mgr/prometheus/module.py

        cherrypy.engine.block()
+
+    def shutdown(self):
+        self.serving = False


looks like self.serving was never really being used anywhere, so can remove it here too

jcsp · 2018-01-10T15:14:08Z

Could you add a test similar to TestDashboard.test_standby in qa/tasks/mgr?

I can't remember if you've worked on those tests before, but they're generally convenient to run in a vstart cluster with something like:

LD_LIBRARY_PATH=lib/ PYTHONPATH=lib/cython_modules/lib.2 python ../qa/tasks/vstart_runner.py --create tasks.mgr.test_module_selftest.TestModuleSelftest.test_zabbix

(just copied from my bash history)

jan--f · 2018-01-13T16:31:50Z

Tidied up the module as per your comments, thanks for that.
I also added a test_prometheus.py in qa/tasks/mgr, taking test_dashboard.py as a template. I wasn't able to test it locally though. Your command needs teuthology in the PYTHONPATH I assume.

jcsp · 2018-01-15T07:56:17Z

That's true about teuthology, but you don't need anything running, just the python source: just clone it, run ./bootstrap in the clone dir, and then source ./virtualenv/bin/activate

Calling cherrypy.engine.block() in the standby module results in a failing mgr failover. Signed-off-by: Jan Fajerski <jfajerski@suse.com>

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
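
For readers following the start/stop discussion, here is a minimal sketch of one common way to avoid blocking forever inside serve(): start the server once, wait on an event, and let shutdown() set that event. This is an illustration under assumptions, not necessarily what the commit does. StubServer, Module and demo are hypothetical names, and StubServer merely stands in for the cherrypy engine so the example runs with the standard library alone.

```python
import threading


class StubServer:
    """Hypothetical stand-in for a web server engine with non-blocking start/stop."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class Module:
    """Hypothetical module: serve() parks on an event instead of blocking in the engine."""

    def __init__(self, server):
        self.server = server
        self.shutdown_event = threading.Event()

    def serve(self):
        self.server.start()           # start exactly once
        self.shutdown_event.wait()    # park here until shutdown() is called
        self.server.stop()            # release the listening port on the way out

    def shutdown(self):
        self.shutdown_event.set()     # wakes serve() so it can return


def demo():
    module = Module(StubServer())
    worker = threading.Thread(target=module.serve)
    worker.start()
    module.shutdown()
    worker.join(timeout=5)
    # Returns (is the serve thread still alive, is the server still running).
    return worker.is_alive(), module.server.running
```

With this shape, serve() returns promptly once shutdown() is called, which is the property a failover depends on.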

jan--f · 2018-01-22T12:28:06Z

@jcsp ok this works for me now.

2018-01-22 13:23:32,905.905 INFO:__main__:Stopped test: test_urls (tasks.mgr.test_prometheus.TestPrometheus) in 21.410567s
2018-01-22 13:23:32,905.905 INFO:__main__:
2018-01-22 13:23:32,905.905 INFO:__main__:----------------------------------------------------------------------
2018-01-22 13:23:32,905.905 INFO:__main__:Ran 2 tests in 55.564s
2018-01-22 13:23:32,906.906 INFO:__main__:
2018-01-22 13:23:32,906.906 INFO:__main__:OK

Added in ceph#19744 Signed-off-by: John Spray <john.spray@redhat.com>

jcsp · 2018-01-22T13:33:51Z

Added the .yaml snippet to get the new prometheus test running in the lab environment here: #20047

This was throwing IOError("Port 9283 not free on '::'",) when trying to serve, since merging ceph#19744. It's because the standbys (on the same node as the active) are now trying to listen too. Signed-off-by: John Spray <john.spray@redhat.com>


This was throwing IOError("Port 9283 not free on '::'",) when trying to serve, since merging ceph#19744. It's because the standbys (on the same node as the active) are now trying to listen too. Fixes: https://tracker.ceph.com/issues/22755 Signed-off-by: John Spray <john.spray@redhat.com>


This was throwing IOError("Port 9283 not free on '::'",) when trying to serve, since merging ceph#19744. It's because the standbys (on the same node as the active) are now trying to listen too. Fixes: https://tracker.ceph.com/issues/22755 Signed-off-by: John Spray <john.spray@redhat.com> (cherry picked from commit e2c68d5)
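
The port conflict described above can be reproduced in isolation. The sketch below is only an illustration with the standard library: port_is_free and demo are hypothetical helpers, and it uses IPv4 loopback with an OS-assigned port rather than '::' and 9283 so that it runs anywhere. It shows that while one socket is listening on a port, a second bind to the same address and port on the same host fails, which is the situation a standby lands in when it shares a node with the active mgr.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": someone else holds the port.
            return False
    return True


def demo():
    # The "active" listener takes an OS-assigned port (port 0 means "pick one").
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as active:
        active.bind(("127.0.0.1", 0))
        active.listen(1)
        port = active.getsockname()[1]
        # A second listener on the same host and port cannot bind while
        # the first one is still open.
        return port_is_free("127.0.0.1", port)
```

Giving each daemon on a shared node its own port avoids the clash.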

Jan Fajerski added 2 commits January 2, 2018 18:18

pybind/mgr/prometheus: add StandbyModule; return empty answer

18ffd75

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

pybind/mgr/prometheus: return 503 if MON cluster is down

eda9f15

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
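
As a rough illustration of the behaviour named in these two commit titles, the sketch below models the decision as a pure function. It is an assumption-laden sketch, not the module's code: metrics_response, its parameters and the error text are hypothetical, and the real module serves responses through cherrypy rather than returning tuples.

```python
from http import HTTPStatus


def metrics_response(is_active, mons_reachable, collect):
    """Return (status, body) for a metrics request.

    is_active      -- False when this mgr is a standby
    mons_reachable -- False when the MON cluster cannot be reached
    collect        -- callable producing the metrics text
    """
    if not is_active:
        # Standby: answer successfully, but with an empty body.
        return HTTPStatus.OK, ""
    if not mons_reachable:
        # Active but cut off from the MONs: tell the scraper to retry later.
        return HTTPStatus.SERVICE_UNAVAILABLE, "No MON connection"  # placeholder text
    return HTTPStatus.OK, collect()
```

Returning 503 rather than stale or empty data lets a scraper distinguish "no metrics right now" from "the cluster reports nothing".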

jcsp self-requested a review January 2, 2018 17:36

jcsp added the mgr label Jan 2, 2018

jan--f mentioned this pull request Jan 3, 2018

Enable Prometheus metrics on Ceph mgr rook/rook#1154

Merged

2 tasks

jcsp reviewed Jan 10, 2018


jan--f force-pushed the mgr-prometheus-standby-mondown branch 2 times, most recently from c5a1c1a to d3c2f5f Compare January 13, 2018 16:29

Jan Fajerski added 2 commits January 22, 2018 13:21

pybind/mgr/prometheus: tidy up cherrypy engine start and stop

ff471d4

Calling cherrypy.engine.block() in the standby module results in a failing mgr failover. Signed-off-by: Jan Fajerski <jfajerski@suse.com>

qa/tasks/mgr: add test_prometheus; smoke tests for prometheus module

4a45b02

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

jan--f force-pushed the mgr-prometheus-standby-mondown branch from d3c2f5f to 4a45b02 Compare January 22, 2018 12:26

jcsp approved these changes Jan 22, 2018


jcsp merged commit c05d963 into ceph:master Jan 22, 2018

jcsp pushed a commit to jcsp/ceph that referenced this pull request Jan 22, 2018

qa: add new prometheus test to rados/mgr suite

d5b2fd9

Added in ceph#19744 Signed-off-by: John Spray <john.spray@redhat.com>

jcsp mentioned this pull request Jan 22, 2018

qa: add new prometheus test to rados/mgr suite #20047

Merged

jcsp pushed a commit to jcsp/ceph that referenced this pull request Jan 23, 2018

qa: add new prometheus test to rados/mgr suite

dd4f322

Added in ceph#19744 Signed-off-by: John Spray <john.spray@redhat.com>

cache-nez pushed a commit to cache-nez/ceph that referenced this pull request Feb 6, 2018

qa: add new prometheus test to rados/mgr suite

0ddd7ef

Added in ceph#19744 Signed-off-by: John Spray <john.spray@redhat.com>
