luminous: bulk backport of ceph-mgr improvements #18675

Merged
merged 107 commits into ceph:luminous from jcsp:wip-luminous-mgr on Nov 2, 2017

8 participants
@jcsp
Contributor

jcsp commented Nov 1, 2017

The motivation here is to provide the latest ceph-mgr additions to luminous users:

  • fleshed out prometheus module
  • influxdb module
  • balancer module (latest balancer module bits to be backported separately or added here once #17983 merges in master)
  • standby mode for modules
  • "mgr services" command to see the addresses of module-provided services

This PR includes the relevant commits along with the things they depend on and some bug fixes that are coming along for the ride. This will overlap with some backport tracker tickets, but we can clean those up afterwards. It's essential to do this in a big PR because of the subtle merge conflicts that otherwise exist between the changes included here: the order in which things are cherry-picked into this PR is somewhat magic.

@jcsp jcsp requested a review from liewegas Nov 1, 2017

@liewegas liewegas added this to the luminous milestone Nov 1, 2017

liewegas and others added some commits Jul 11, 2017

mgr: add trivial OSDMap wrapper class
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2ef0051)
mgr/PyModules: add 'pg_status' dump
This is summary info, same as what's in 'ceph status'.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 85b5b80)
mgr/PyModules: add 'pg_dump' get
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit bfb9286)
pybind/mgr/mgr_module: add default arg to get_config
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 39c42dd)
pybind/mgr/balancer: add balancer module
- wake up every minute
- back off when unknown, inactive, degraded
- throttle against misplaced ratio
- apply some optimization step
  - initially implement 'upmap' only

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 0d9685c)
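
The back-off and throttle behaviour listed in this commit message can be pictured with a small sketch. This is not the module's code: the `PGSummary` type, the `should_optimize` helper, and the 0.05 default for the misplaced ratio are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PGSummary:
    """Hypothetical snapshot of PG health, each field a fraction in [0, 1]."""
    unknown: float
    inactive: float
    degraded: float
    misplaced: float


def should_optimize(pgs: PGSummary, max_misplaced: float = 0.05) -> bool:
    """Decide whether one optimization step may run on this wake-up.

    max_misplaced is an assumed threshold, not a value taken from the PR.
    """
    # Back off while any PGs are unknown, inactive, or degraded.
    if pgs.unknown > 0 or pgs.inactive > 0 or pgs.degraded > 0:
        return False
    # Throttle: do not trigger more data movement while the misplaced
    # ratio is already at or above the allowed maximum.
    return pgs.misplaced < max_misplaced
```

A periodic loop would call something like `should_optimize` once per wake-up and only then attempt an 'upmap' step.
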
pybind/mgr/balancer: do upmap by pool, in random order
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 028a66d)
crush/CrushWrapper: refactor get_rule_weight_osd_map to work with roots too

Allow us to specify a root node in the hierarchy instead of a rule.
This way we can use it in conjunction with find_takes().

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 69454e0)
mgr/PyOSDMap: get_crush, find_takes, get_take_weight_osd_map
These let us identify distinct CRUSH hierarchies that rules distribute
data over, and create relative weight maps for the OSDs they map to.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 3b8a276)
crush/CrushWrapper: rule_has_take
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit ef140de)
mgr/PyOSDMap: OSDMap.map_pool_pgs_up, CRUSHMap.get_item_name
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit a928bf6)
mgr: apply a threshold to perf counter prios
...so that we can control the level of load
we're putting on ceph-mgr with perf counters.  Don't collect
anything below PRIO_USEFUL by default.

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit bdc775f)
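
As a rough illustration of the threshold idea, the sketch below filters a counter schema by priority. The schema layout, the `counters_to_collect` helper, and the numeric value given to `PRIO_USEFUL` are assumptions made for the example, not details confirmed by this PR.

```python
# Assumed numeric priority, for illustration only.
PRIO_USEFUL = 5


def counters_to_collect(schema, threshold=PRIO_USEFUL):
    """Keep only counters whose priority is at or above the threshold.

    `schema` is a hypothetical mapping of counter name -> {"priority": int}.
    """
    return {
        name: info
        for name, info in schema.items()
        if info["priority"] >= threshold
    }
```

Raising the threshold reduces the number of counters reported and therefore the load on ceph-mgr; lowering it does the opposite.
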
pybind/mgr/balancer: rough framework
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit d5e5c68)
pybind/mgr/balancer: make 'crush-compat' sort of work
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 7a00e02)
src/pybind/mgr/balancer/module.py: improve scoring method
* score lies in [0, 1), 0 being perfect distribution
* use shifted and scaled cdf of normal distribution
  to prioritize highly over-weighted device.
* consider only over-weighted devices to calculate score

Signed-off-by: Spandan Kumar Sahu <spandankumarsahu@gmail.com>
(cherry picked from commit c09308c)
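
One way to read the scoring description is sketched below. It is an interpretation under stated assumptions, not the code from `module.py`: the `score` function, its arguments, and the choice to average over all devices are hypothetical. The normal CDF is shifted and scaled so that zero excess maps to 0 and very large excess approaches 1.

```python
import math


def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def score(actual: dict, target: dict) -> float:
    """Score a distribution; 0 means every device is at or below target.

    `actual` and `target` map device name -> share of data (hypothetical
    inputs). Only over-weighted devices contribute to the score.
    """
    if not target:
        return 0.0
    total = 0.0
    for dev, want in target.items():
        have = actual.get(dev, 0.0)
        if want <= 0 or have <= want:
            continue  # ignore under-weighted or unused devices
        excess = (have - want) / want  # relative over-weight, > 0
        # Shift and scale: cdf(0) = 0.5 maps to 0, cdf(inf) = 1 maps to 1.
        total += 2.0 * normal_cdf(excess) - 1.0
    return total / len(target)
```

With this shape, a heavily over-weighted device contributes much more than a slightly over-weighted one, which matches the stated goal of prioritizing highly over-weighted devices.
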
pybind/mgr/balancer: make auto mode work
(with upmap at least)

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit ef1a3be)
mgr: clean up DaemonStateIndex locking
Various things here were dangerously operating
outside locks.

Additionally switch to a RWLock because this lock
will be relatively read-hot when it's taken every time
a MMgrReport is handled, to look up the DaemonState
for the sender.

Fixes: http://tracker.ceph.com/issues/21158
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 806f108)
mgr: runtime adjustment of perf counter threshold
ceph-mgr has missed out on the `config set` command
that the other daemons got recently: add it here
and hook it all up to the stats period and threshold
settings.

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 057b73d)
mgr/influx: added influx plugin
Signed-off-by: My Do <mhdo@umich.edu>
(cherry picked from commit 68ae26c)
pybind/mgr/prometheus: prefix metrics with 'ceph'; replace :: with _
Both follow prometheus best practices. While : is a legal metric
character, "Exposed metrics should not contain colons, these are for
users to use when aggregating."

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 177afcc)
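
The naming rule in this commit message amounts to a small string transformation. The helper name `prometheus_name` and the sample counter path are hypothetical; the sketch only shows the two steps the message describes.

```python
def prometheus_name(counter_path: str) -> str:
    """Prefix a counter path with 'ceph' and replace '::' with '_'."""
    return "ceph_" + counter_path.replace("::", "_")
```

For a hypothetical path `osd::op_w`, this gives `ceph_osd_op_w`, which contains no colons and so leaves them free for user-defined aggregation rules.
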
pybind/mgr/prometheus: add cluster wide metrics; no perf counters for now

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 49b3ff8)
pybind/mgr/prometheus: add device_class label to osd metrics
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 76d1918)
pybind/mgr/prometheus: no need to convert perf_schema to ordered_dict
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 5e4b4b5)
pybind/mgr/prometheus: no need to wait for notify event
If stats or perf counters are not available they won't be emitted.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit ead0973)
pybind/mgr/prometheus: actually emit reported pg counts
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit c288624)

jcsp added some commits Nov 1, 2017

doc: describe using `mgr module ...` commands
...including the new "mgr services" command.

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit d220e1a)
mon: include disabled modules in `mgr module ls`
Otherwise, when someone wants to see what's possible
to do with `mgr module enable` they have to trawl
through the whole mgr map dump.

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 5861c4c)
mgr: fix up make_unique usage for backport
This was getting the definition some other way in master,
but in luminous we need to include the backport14 header.

Signed-off-by: John Spray <john.spray@redhat.com>
qa: fix mgr _load_module helper
I inadvertently broke this with the latest change
to the module ls output.

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 4fb3025)
@jcsp

Contributor

jcsp commented Nov 2, 2017

This inherited some master failures from #18685; I've pulled that fix into this branch.

@liewegas liewegas merged commit 240edcf into ceph:luminous Nov 2, 2017

4 checks passed

  • Docs: build check: OK - docs built
  • Signed-off-by: all commits in this PR are signed
  • Unmodified Submodules: submodules for project are unmodified
  • make check: make check succeeded
@smithfarm

Contributor

smithfarm commented Nov 6, 2017

@jcsp Thanks, John. This will reduce cherry-pick conflicts for sure, at least in the short term.

@jcsp jcsp deleted the jcsp:wip-luminous-mgr branch Nov 7, 2017
