mon: add crush type down health warnings #14914

Merged
merged 5 commits into ceph:master on May 18, 2017

Conversation

@neha-ojha
Member

neha-ojha commented May 2, 2017

This pull request aims to add additional health warnings for down crush types.

Signed-off-by: Neha Ojha <nojha@redhat.com>

mon: add crush type down health warnings
Signed-off-by: Neha Ojha <nojha@redhat.com>
@neha-ojha

Member

neha-ojha commented May 2, 2017

retest this please.

@neha-ojha neha-ojha requested a review from liewegas May 3, 2017

ss << num_down_in_subtrees << "/" << num_in_subtrees << " of CRUSH type " <<
g_conf->mon_osd_down_out_subtree_limit << " is down. ";
}
else {

@jdurgin

jdurgin May 4, 2017

Member

nit: else should go on one line with brackets, e.g. } else {

@jdurgin

jdurgin May 4, 2017

Member

no need to output this when no subtree is down

}
if (detail) {
ss << "CRUSH type " << g_conf->mon_osd_down_out_subtree_limit << " down list: [" <<
down_in_subtrees << "]";

@jdurgin

jdurgin May 4, 2017

Member

when printing out, we can get the name of the bucket or device via CrushWrapper's get_item_name(id) method - you can add a method to OSDMap that returns that and use it here
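A minimal sketch of the kind of helper being suggested here (the function name and the raw-id fallback are assumptions, not the merged code; CrushWrapper::get_item_name() and item_exists() are the existing accessors):

// Sketch only: look up the human-readable crush name for a bucket
// (negative id, e.g. "rack0") or device (non-negative id, e.g. "osd.3").
std::string get_crush_item_name(const OSDMap& osdmap, int id)
{
  if (osdmap.crush->item_exists(id))
    return osdmap.crush->get_item_name(id);
  std::ostringstream oss;
  oss << id;            // no name registered; fall back to the raw id
  return oss.str();
}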

if (num_down_in_osds > 0) {
ostringstream ss;
ss << num_down_in_osds << "/" << num_in_osds << " in osds are down";
summary.push_back(make_pair(HEALTH_WARN, ss.str()));
ss << num_down_in_osds << "/" << num_in_osds << " in osds are down. ";

@jdurgin

jdurgin May 4, 2017

Member

this can stay in its own summary entry - each one gets displayed on a separate line
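Concretely, that means keeping this block as it was, pushing its own pair onto summary (a sketch using the variables already in the diff; each pair is rendered on a separate line of ceph health):

if (num_down_in_osds > 0) {
  ostringstream ss;
  ss << num_down_in_osds << "/" << num_in_osds << " in osds are down";
  summary.push_back(make_pair(HEALTH_WARN, ss.str()));  // its own summary entry
}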

ss << num_down_in_osds << "/" << num_in_osds << " in osds are down. ";
if (num_down_in_subtrees == 1) {
ss << num_down_in_subtrees << "/" << num_in_subtrees << " of CRUSH type " <<
g_conf->mon_osd_down_out_subtree_limit << " is down. ";

@jdurgin

jdurgin May 4, 2017

Member

missing summary.push_back(...) here
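In other words, something along these lines, using the variables already present in the diff above (a sketch, not the final code):

if (num_down_in_subtrees == 1) {
  ostringstream ss;
  ss << num_down_in_subtrees << "/" << num_in_subtrees << " of CRUSH type "
     << g_conf->mon_osd_down_out_subtree_limit << " is down";
  // the step the diff is missing: record the message as its own summary entry
  summary.push_back(make_pair(HEALTH_WARN, ss.str()));
}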

if (!osdmap.is_up(i)) {
++num_down_in_osds;
if (detail) {
const osd_info_t& info = osdmap.get_info(i);
ostringstream ss;
ss << "osd." << i << " is down since epoch " << info.down_at
ss << "osd." << i << " belonging to " << g_conf->mon_osd_down_out_subtree_limit
<< " id " << subtree_id << " is down since epoch " << info.down_at

@jdurgin

jdurgin May 4, 2017

Member

can get the item name instead of the id for the user readable output here
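A sketch of that substitution, reusing the hypothetical name-lookup helper from the earlier comment (names are assumptions, not the merged code):

// detail line with the bucket's crush name instead of its numeric id
ss << "osd." << i << " belonging to "
   << get_crush_item_name(osdmap, subtree_id)   // e.g. "rack0" rather than a raw id
   << " is down since epoch " << info.down_at;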

mon: display subtree name in crush type down health warnings
Signed-off-by: Neha Ojha <nojha@redhat.com>
@neha-ojha

Member

neha-ojha commented May 4, 2017

The health warning for a cluster with 3 racks and 10 osds, where all osds are down, looks like this:

HEALTH_WARN 7 pgs peering; 22 pgs stale; 10/10 in osds are down.; 3/3 of CRUSH type rack are down.
pg 0.7 is stale+active+clean, acting [7,0,1]
pg 1.6 is stale+active+clean, acting [6,9,8]
pg 2.5 is stale+active+clean, acting [8,4,0]
pg 0.6 is stale+peering, acting [9,0,5]
pg 1.7 is stale+active+clean, acting [6,7,3]
pg 2.4 is stale+peering, acting [9,4,0]
pg 2.7 is stale+active+clean, acting [6,0,9]
pg 0.5 is stale+activating, acting [3,5,2]
pg 1.4 is stale+active+clean, acting [3,0,8]
pg 0.4 is stale+active+clean, acting [0,3,2]
pg 1.5 is stale+peering, acting [4,0,3]
pg 1.1 is stale+active+clean, acting [5,8,4]
pg 0.0 is stale+active+clean, acting [6,4,3]
pg 2.2 is stale+active+clean, acting [5,6,1]
pg 1.0 is stale+active+clean, acting [8,0,2]
pg 0.1 is stale+peering, acting [2,5,7]
pg 2.3 is stale+active+clean, acting [5,4,1]
pg 0.2 is stale+active+clean, acting [3,7,4]
pg 1.3 is creating+peering, acting []
pg 2.0 is stale+active+clean, acting [3,5,9]
pg 0.3 is stale+active+clean, acting [2,0,8]
pg 1.2 is stale+peering, acting [4,1,3]
pg 2.1 is stale+peering, acting [4,8,0]
osd.0 belonging to rack0 is down since epoch 27, last address 127.0.0.1:6800/15327
osd.1 belonging to rack0 is down since epoch 27, last address 127.0.0.1:6804/29925
osd.2 belonging to rack0 is down since epoch 27, last address 127.0.0.1:6808/10700
osd.3 belonging to rack0 is down since epoch 27, last address 127.0.0.1:6812/23171
osd.4 belonging to rack1 is down since epoch 27, last address 127.0.0.1:6816/3197
osd.5 belonging to rack1 is down since epoch 27, last address 127.0.0.1:6820/15718
osd.6 belonging to rack1 is down since epoch 27, last address 127.0.0.1:6824/27988
osd.7 belonging to rack1 is down since epoch 27, last address 127.0.0.1:6828/8788
osd.8 belonging to rack2 is down since epoch 27, last address 127.0.0.1:6832/21113
osd.9 belonging to rack2 is down since epoch 27, last address 127.0.0.1:6836/1293
10/10 in osds are down.
3/3 of CRUSH type rack are down.
CRUSH type rack down list: [rack0,rack1,rack2]

For the same setup, when 3 osds are down but no rack is entirely down:

HEALTH_WARN 6 pgs peering; 4 pgs stale; 3/10 in osds are down.; no active mgr
pg 2.5 is stale+active+clean, acting [8,4,0]
pg 0.6 is peering, acting [9,0,5]
pg 2.4 is peering, acting [9,4,0]
pg 1.5 is peering, acting [4,0,3]
pg 1.0 is stale+active+clean, acting [8,0,2]
pg 0.1 is stale+active+clean, acting [8,6,5]
pg 1.3 is creating+peering, acting [3,5]
pg 0.3 is stale+active+clean, acting [2,0,8]
pg 1.2 is peering, acting [4,1,3]
pg 2.1 is peering, acting [4,0]
osd.1 belonging to rack0 is down since epoch 27, last address 127.0.0.1:6804/10796
osd.2 belonging to rack0 is down since epoch 27, last address 127.0.0.1:6808/23528
osd.8 belonging to rack2 is down since epoch 27, last address 127.0.0.1:6832/32736
@jdurgin

jdurgin approved these changes May 4, 2017

@liewegas

Member

liewegas commented May 5, 2017

I would streamline these messages a bit:

  • instead of "3/3 of CRUSH type rack are down", say "3/3 racks are down".
  • we can include the list, too, instead of making a separate 'detail' item: "3/3 racks are down (rack0, rack1, rack2)"
  • for the 'belonging to' part, how about "osd.1 (root=default,rack=rack0,host=foo) is down since ..."? That has more information (and is probably easier to code... we can just cram the crush position map<> into the ostream).

Thanks!

mon: modify crush type down health warnings
Signed-off-by: Neha Ojha <nojha@redhat.com>
@neha-ojha

Member

neha-ojha commented May 8, 2017

I have modified the health warnings to look something like this:

HEALTH_WARN 3 pgs peering; 24 pgs stale; 10/10 in osds are down; 3/3 racks are down(rack0,rack1,rack2)
pg 0.7 is stale+active+clean, acting [7,0,1]
pg 1.6 is stale+active+clean, acting [6,9,8]
pg 2.5 is stale+active+clean, acting [8,4,0]
pg 0.6 is stale+active+clean, acting [4,1,0]
pg 1.7 is stale+active+clean, acting [6,7,3]
pg 2.4 is stale+active+clean, acting [1,9,2]
pg 2.7 is stale+active+clean, acting [6,0,9]
pg 0.5 is stale+active+clean, acting [9,3,5]
pg 1.4 is stale+active+clean, acting [3,0,8]
pg 2.6 is stale+active+clean, acting [9,0,4]
pg 0.4 is stale+active+clean, acting [0,3,2]
pg 1.5 is stale+peering, acting [4,0,3]
pg 1.1 is stale+active+clean, acting [5,8,4]
pg 0.0 is stale+active+clean, acting [6,4,3]
pg 2.2 is stale+active+clean, acting [5,6,1]
pg 1.0 is stale+active+clean, acting [8,0,2]
pg 0.1 is stale+active+clean, acting [8,6,5]
pg 2.3 is stale+active+clean, acting [5,4,1]
pg 0.2 is stale+active+clean, acting [3,7,4]
pg 1.3 is stale+active+clean, acting [4,1,2]
pg 2.0 is stale+active+clean, acting [3,5,9]
pg 0.3 is stale+active+clean, acting [2,0,8]
pg 1.2 is stale+peering, acting [4,1,3]
pg 2.1 is stale+peering, acting [4,8,0]
osd.0{default=default,host=host0,rack=rack0} is down since epoch 27, last address 127.0.0.1:6800/9068
osd.1{default=default,host=host0,rack=rack0} is down since epoch 27, last address 127.0.0.1:6804/23567
osd.2{default=default,host=host1,rack=rack0} is down since epoch 27, last address 127.0.0.1:6808/4324
osd.3{default=default,host=host1,rack=rack0} is down since epoch 27, last address 127.0.0.1:6812/16895
osd.4{default=default,host=host2,rack=rack1} is down since epoch 27, last address 127.0.0.1:6816/28276
osd.5{default=default,host=host2,rack=rack1} is down since epoch 27, last address 127.0.0.1:6820/8695
osd.6{default=default,host=host3,rack=rack1} is down since epoch 27, last address 127.0.0.1:6824/21306
osd.7{default=default,host=host3,rack=rack1} is down since epoch 27, last address 127.0.0.1:6828/2162
osd.8{default=default,host=host4,rack=rack2} is down since epoch 27, last address 127.0.0.1:6832/14903
osd.9{default=default,host=host4,rack=rack2} is down since epoch 27, last address 127.0.0.1:6836/26684

HEALTH_WARN 7 pgs stale; 3/10 in osds are down
pg 2.7 is stale+active+clean, acting [8,2,1]
pg 1.4 is stale+active+clean, acting [1,7,8]
pg 1.0 is stale+active+clean, acting [8,5,7]
pg 0.1 is stale+active+clean, acting [2,5,7]
pg 2.3 is stale+active+clean, acting [8,1,6]
pg 0.2 is stale+active+clean, acting [8,4,1]
pg 2.0 is stale+active+clean, acting [8,4,2]
osd.1{default=default,host=host0,rack=rack0} is down since epoch 37, last address 127.0.0.1:6804/23567
osd.2{default=default,host=host1,rack=rack0} is down since epoch 37, last address 127.0.0.1:6808/4324
osd.8{default=default,host=host4,rack=rack2} is down since epoch 37, last address 127.0.0.1:6832/14903

osdmap.get_addr(i);
map<string, string> loc;
loc = osdmap.crush->get_full_location(i);
ss << "osd." << i << loc << " is down since epoch " << info.down_at << ", last address "

@liewegas

liewegas May 9, 2017

Member

space between i and loc?

@liewegas

Member

liewegas commented May 9, 2017

Okay, I didn't look at the actual code closely before, so this is late feedback (sorry!). What I originally had in mind was that the subtree-based health warnings would simplify the report based on any of the subtree levels in use in the crush map, and not actually have anything to do with the mon_osd_down_out_subtree_limit config option. So any time there is a node in the hierarchy where everything beneath it is down, we would report that instead (unless its parent is also completely down). Does that make sense? I think it means reworking how the grouping algorithm is written so it is not tied at all to a particular type/level in the tree.

Probably something like starting with the first osd, checking whether its parent's subtree is entirely down, then whether its parent's parent is, etc., then moving on to the next osd and doing the same. With some set<>s (or unordered_set<>s) caching intermediate nodes that have already been classified as all down or not all down, it should be reasonably efficient?
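A rough sketch of that walk (this is not the code that was merged; subtree_is_all_down is an invented helper that answers whether every osd beneath a crush node is down, and the two sets are the caches mentioned above):

set<int> down_cache;   // ids whose whole subtree is known to be down
set<int> up_cache;     // ids known to contain at least one up osd

for (int osd = 0; osd < osdmap.get_max_osd(); osd++) {
  if (!osdmap.exists(osd) || osdmap.is_up(osd))
    continue;
  int report = osd;    // highest fully-down ancestor found so far
  int cur = osd;
  while (true) {
    int parent;
    if (osdmap.crush->get_immediate_parent_id(cur, &parent) < 0)
      break;                                    // reached the top of the tree
    if (up_cache.count(parent))
      break;                                    // already known not fully down
    if (!down_cache.count(parent)) {
      if (subtree_is_all_down(osdmap, parent))  // invented helper, see above
        down_cache.insert(parent);
      else {
        up_cache.insert(parent);
        break;
      }
    }
    report = cur = parent;                      // parent is fully down; keep climbing
  }
  // 'report' is the node to warn about for this osd: a whole rack or host is
  // reported once instead of each osd underneath it.
}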

@neha-ojha

Member

neha-ojha commented May 9, 2017

So, we would be reporting warnings at the highest level of the subtree that is down. What should the health warning messages look like, now that we might have different subtree levels down?
For example, if osd.5, host0 (with osd.0 & osd.1), and rack2 (with host4: osd.8 & osd.9) are down?

@liewegas

Member

liewegas commented May 9, 2017

@tchaikov tchaikov self-requested a review May 9, 2017

@neha-ojha

Member

neha-ojha commented May 9, 2017

Thanks for this example. I'll work on it and will let you know if I have further questions or suggestions.

@neha-ojha

Member

neha-ojha commented May 12, 2017

I have figured out the detail part of the health warnings. The summary needs some more work. Here is an example of what I have so far:

ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-9 10.00000 root root                                            
-6  4.00000     rack rack0                                       
-1  2.00000         host host0                                   
 0  1.00000             osd.0       up  1.00000          1.00000 
 1  1.00000             osd.1       up  1.00000          1.00000 
-2  2.00000         host host1                                   
 2  1.00000             osd.2       up  1.00000          1.00000 
 3  1.00000             osd.3       up  1.00000          1.00000 
-7  4.00000     rack rack1                                       
-3  2.00000         host host2                                   
 4  1.00000             osd.4       up  1.00000          1.00000 
 5  1.00000             osd.5       up  1.00000          1.00000 
-4  2.00000         host host3                                   
 6  1.00000             osd.6       up  1.00000          1.00000 
 7  1.00000             osd.7       up  1.00000          1.00000 
-8  2.00000     rack rack2                                       
-5  2.00000         host host4                                   
 8  1.00000             osd.8       up  1.00000          1.00000 
 9  1.00000             osd.9       up  1.00000          1.00000 
+ ceph osd down osd.0 osd.1 osd.2 osd.3 osd.5 osd.6 osd.7 osd.9
marked down osd.0. marked down osd.1. marked down osd.2. marked down osd.3. marked down osd.5. marked down osd.6. marked down osd.7. marked down osd.9. 
+ ceph health detail
HEALTH_WARN 6 pgs peering; 14 pgs stale; 
1 rack (4 osds) is down
3 hosts (6 osds) are down
8 osds are down
pg 0.7 is stale+active+clean, acting [7,0,1]
pg 1.6 is stale+active+clean, acting [6,9,8]
pg 1.7 is stale+active+clean, acting [6,7,3]
pg 2.4 is stale+active+clean, acting [1,9,2]
pg 2.7 is peering, acting [8,2,1]
pg 0.5 is stale+active+clean, acting [9,3,5]
pg 1.4 is stale+peering, acting [1,7,8]
pg 2.6 is stale+active+clean, acting [9,0,4]
pg 0.4 is stale+active+clean, acting [0,3,2]
pg 1.1 is stale+active+clean, acting [5,8,4]
pg 0.0 is stale+active+clean, acting [6,4,3]
pg 2.2 is stale+active+clean, acting [5,6,1]
pg 1.0 is peering, acting [8,5,7]
pg 2.3 is peering, acting [8,1,6]
pg 0.2 is peering, acting [8,4,1]
pg 2.0 is peering, acting [8,4,2]
pg 0.3 is stale+active+clean, acting [2,0,8]
pg 1.2 is stale+active+clean, acting [0,7,9]
pg 2.1 is stale+active+clean, acting [7,5,0]
rack rack0 (root=root) (4 osds) is down
host host0 (root=root,rack=rack0) (2 osds) is down
host host1 (root=root,rack=rack0) (2 osds) is down
host host3 (root=root,rack=rack1) (2 osds) is down
osd.0 (root=root,rack=rack0,host=host0) is down
osd.1 (root=root,rack=rack0,host=host0) is down
osd.2 (root=root,rack=rack0,host=host1) is down
osd.3 (root=root,rack=rack0,host=host1) is down
osd.5 (root=root,rack=rack1,host=host2) is down
osd.6 (root=root,rack=rack1,host=host3) is down
osd.7 (root=root,rack=rack1,host=host3) is down
osd.9 (root=root,rack=rack2,host=host4) is down

Please let me know if this looks good.

@liewegas

Member

liewegas commented May 12, 2017

mon: subtree-based crush type down health warnings
Signed-off-by: Neha Ojha <nojha@redhat.com>
@jdurgin

latest iteration looks good

- [mon.a, mgr.x, osd.0, osd.1, osd.2, osd.3, osd.4, osd.5, osd.6, osd.7, osd.8, osd.9, client.0]
tasks:
- install:
- workunit:

@jdurgin

jdurgin May 16, 2017

Member

will need - ceph: after install, to configure + run the cluster

@neha-ojha

neha-ojha May 16, 2017

Member

added it.

mon: add test for crush type down health warnings
Signed-off-by: Neha Ojha <nojha@redhat.com>

@yuriw yuriw merged commit ef1c024 into ceph:master May 18, 2017

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
default: Build finished.