Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crush, mon: simplify device class manipulation commands #16388

Merged
merged 18 commits into from Jul 27, 2017

Conversation

Projects
None yet
3 participants
@xiexingguo
Copy link
Member

xiexingguo commented Jul 18, 2017

No description provided.

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-class-misc-fixes branch from 29bf288 to e5e52ce Jul 18, 2017

@@ -189,6 +189,8 @@ TEST(CrushWrapper, swap_bucket) {
ASSERT_EQ(1, c->get_bucket_item(b, 1));
ASSERT_EQ(2, c->get_bucket_item(b, 2));
ASSERT_EQ(3, c->get_bucket_item(a, 0));

delete c;

This comment has been minimized.

Copy link
@tchaikov

tchaikov Jul 18, 2017

Contributor

please use unique_ptr<> (and make_unique<>() if you please), otherwise static analyzer will complain. as ASSERT_EQ will bail out without executing delete c.

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-class-misc-fixes branch from 63cd3d5 to f389e71 Jul 18, 2017

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-class-misc-fixes branch from f389e71 to b747ae6 Jul 18, 2017

@xiexingguo xiexingguo requested a review from liewegas Jul 18, 2017

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-class-misc-fixes branch from b747ae6 to e915a35 Jul 18, 2017

class_is_in_use(get_class_id(old_class_name), &ts)) {
*ss << "osd." << id << " has already bound to class '" << old_class_name
<< "', which is currently " << ts.str();
return -EBUSY;

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 19, 2017

Member

I'm not sure I understand the logic here. If the class is wrong, what is the user supposed to do?

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 19, 2017

Author Member

This works as a guard to prevents the problem below from happening:

ceph osd crush set-device-class hdd osd.0
ceph osd crush rule create-replicated foo-hdd1 default host hdd
ceph osd crush set-device-class ssd osd.0

Before this change, the above ops will be considered as legal and we'll silently change the class of osd.0 from hdd into ssd.
But actually it is not unsafe to do so as the old class of osd.0 might be in use by crush rule foo-hdd1.

If the class is wrong, what is the user supposed to do?

In case user want to (re)use "set-device-class" to reset the class of specific osds, make sure the old class is not currently in use by any crush rule. Otherwise we fail and user shall delete the crush rule that is still referencing the old device class first before he can proceed.

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 19, 2017

Member

There has to be something the user can do if the osd is misclassified as hdd vs ssd and there are rules for both.

Maybe we should leave class create and rm but also let the set implicitly create the class...

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 19, 2017

Author Member

There has to be something the user can do if the osd is misclassified as hdd vs ssd and there are rules for both.

How about dropping the constranits above and warning about potential data losing (and migrating) if any, and requiring user to pass "--yes-i-really-mean-it" to go on then?

Maybe we should leave class create and rm

But an empty class is meanless unless it is bounded to particular devices.

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 19, 2017

Member

It's not if a CRUSH rule points to it. I see that you're trying to prevent the situation where that happens (a crush rule references a class with no devices), but it might be simpler to just allow it. e.g., generate shadow hierarchies for any (class,root) combination referenced by a crush rule. If the users removes the last device in the class the rule will point to an empty shadow root and return no device, and those PGs will become remapped (probably) or some other semi-broken PG state until they fix their crush rule. That seems more friendly and intuitive than being prevented from changing a device's class until you delete the crush rule.

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 19, 2017

Member

At the same time it might also make things a bit safer if we prevent set-device-class from changing a class.. only setting it, and require users to rm-device-class and then set-device-class to change. That way a change in code/policy/whatever won't wholesale switch device classes in the cluster and move a ton of data and break things accidentally.

@liewegas liewegas added this to the luminous milestone Jul 19, 2017

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jul 19, 2017

Pretty sure this will conflict with #16326 ... do you mind reviewing that one?

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jul 19, 2017

do you mind reviewing that one?

with pleasure

@@ -135,19 +135,126 @@ function TEST_set_device_class() {
$ok || return 1
}

function expect_false()

This comment has been minimized.

Copy link
@tchaikov

tchaikov Jul 19, 2017

Contributor

can we re-use expect_failure instead?

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 19, 2017

Author Member

sure, will rebase once #16326 gets merged

@tchaikov

This comment has been minimized.

Copy link
Contributor

tchaikov commented Jul 24, 2017

@xiexingguo #16326 has been merged, could you update this PR and rebase it? thanks.

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jul 24, 2017

Thanks for the reminder, kefu. I am on it.

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-class-misc-fixes branch 3 times, most recently from 1ac2074 to 55be15e Jul 25, 2017

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jul 25, 2017

@liewegas Done rebasing.

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jul 25, 2017

sorry, one more rebase!

@liewegas
Copy link
Member

liewegas left a comment

Questions on several of these...

@@ -581,6 +581,10 @@ class CrushWrapper {
if (have_rmaps)
rule_name_rmap[name] = i;
}
bool shadow_item(int id) const {

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 25, 2017

Member

can we do 'bool is_shadow_item()'?

{
assert(parents);
parents->clear();
while (_search_item_exists(id)) {

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 25, 2017

Member

how does this loop terminate? we aren't modifying the tree inside the loop.

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 26, 2017

Author Member

we aren't modifying the tree inside the loop

id will be reset to its direct parent after every loop, see b80c530#diff-c1ea4e8ae4b5520080ef65c1bd8ed666R1857
So we'll quit(normally) if we finally reach the root.

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 26, 2017

Member

I see. This goes away if we use the simplified remove approach below, though, right?

}

int CrushWrapper::remove_device_class(CephContext *cct, int id, ostream *ss)
{

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 25, 2017

Member

I think it would be simpler to just

  • remove the class for this item
  • count how many other devices with this class are left. if 0, and no rule is referencing the class, remove the class too.
  • rebuild the shadow hierarchy

? Not terribly efficient, but I'm not sure it matters?

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 26, 2017

Author Member

This won't work for the following case(multiple roots share the same class):

./bin/ceph osd tree up --show-shadow


ID  CLASS WEIGHT  TYPE NAME                                            UP/DOWN REWEIGHT PRI-AFF 
-10   ssd 2.00000 root default~ssd                                                              
 -9   ssd 2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic~ssd                          
  1   ssd 1.00000         osd.1                                             up  1.00000 1.00000 
  2   ssd 1.00000         osd.2                                             up  1.00000 1.00000 
 -8   ssd 1.00000 root foo~ssd                                                                  
 -4   ssd 1.00000     rack foo-rack~ssd                                                         
 -3   ssd 1.00000         host foo-host~ssd                                                     
  0   ssd 1.00000             osd.0                                         up  1.00000 1.00000 
 -7       1.00000 root foo                                                                      
 -6       1.00000     rack foo-rack                                                             
 -5       1.00000         host foo-host                                                         
  0   ssd 1.00000             osd.0                                         up  1.00000 1.00000 
 -1       2.00000 root default                                                                  
 -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic                              
  1   ssd 1.00000         osd.1                                             up  1.00000 1.00000 
  2   ssd 1.00000         osd.2                                             up  1.00000 1.00000 

BTW, can we get #16577 into luminous too? It's really annoying if there is no obvious way to find out all the shadow trees, especially for debugging.

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 26, 2017

Member

Yeah sure.. but should we do --show-shadow on 'osd crush tree' instead of 'osd tree'? It seems more appropriate there (and won't mix with the up/down/in/out/destroyed stuff).

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 26, 2017

Member

You mean if osd.0 is removed, right? In that case, we would (1) unset osd.0's class, (2) see there are more ssds so do not delete the class, and then (3) rebuild the shadow trees.

Partly I like the rebuild every time approach because it means we have a consistent result regardless of which path (setting class, unsetting class, whatever) gets us to a particular point.

int CrushWrapper::rename_class(CephContext *cct,
const string& srcname,
const string& dstname,
ostream *ss) {

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 25, 2017

Member

Hmm, it kind of seems like the only use for this is when you are using the class. (If you aren't, you can just remove it and create the new class.) And the hard part about renaming a class that you are using is

  1. you'd like to preserve the bucket ids in the shadow tree across the rename so data doesn't move around, and
  2. you'd like to atomically update the device classes, the class name, shadow hiearrchy, and the rule referencing the class.

If we aren't doing 1+2 (we aren't) then this is no better than a for loop over rm-device-class + set-device-class, right?

(Doing 1+2 would be nice, but if we don't do them, I wonder if we should just leave the rename command disabled entirely? It won't do the hard parts the user wants it to.)

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 26, 2017

Author Member

Kill "class rename" would be the preferred option

xiexingguo added some commits Jul 26, 2017

mon/OSDMonitor: optional show-shadow for "crush tree" command
 ./bin/ceph osd crush tree --show-shadow
ID WEIGHT  TYPE NAME
-4 3.00000 root default~ssd
-3 3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic~ssd
 0 1.00000         osd.0
 1 1.00000         osd.1
 2 1.00000         osd.2
-1 3.00000 root default
-2 3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
 0 1.00000         osd.0
 1 1.00000         osd.1
 2 1.00000         osd.2

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: fix "crush create-or-move/move" would drop osd's class
    Was:
     ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -1       3.00000 root default
    -2       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     0   ssd 1.00000         osd.0                                         up  1.00000 1.00000
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    ./bin/ceph osd crush move osd.0 root=foo rack=foo-rack  host=foo-host
    moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

     ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -7       1.00000 root foo
    -6       1.00000     rack foo-rack
    -5       1.00000         host foo-host
     0       1.00000             osd.0                                     up  1.00000 1.00000
    -1       2.00000 root default
    -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    Now:
    ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -1       3.00000 root default
    -2       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     0   ssd 1.00000         osd.0                                         up  1.00000 1.00000
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    ./bin/ceph osd crush move osd.0 root=foo rack=foo-rack  host=foo-host
    moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

    ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -7       1.00000 root foo
    -6       1.00000     rack foo-rack
    -5       1.00000         host foo-host
     0   ssd 1.00000             osd.0                                     up  1.00000 1.00000
    -1       2.00000 root default
    -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo added some commits Jul 25, 2017

crush: rebuild shadow tree on "crush create-or-move/move"
This patch solves the problem below:

./bin/ceph osd crush move osd.0 root=foo rack=foo-rack host=foo-host
moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

 ./bin/ceph osd crush rule create-replicated foo-rule foo host ssd
Error EINVAL: root foo has no devices with class ssd

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: fix class_is_in_use()
A class can be considered as in-use only if it is referenced by
any of the existing crush rules.

The patch also makes the output more human readable. For example:

./bin/ceph osd crush rule create-replicated myrule default host ssd
./bin/ceph osd crush class rm ssd
Error EBUSY: class 'ssd' still referenced by crush_rule 'myrule'

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: guard set-device-class
If a device has already been bounded to a class,
do not allow to change its class silently.
Require user call rm-device-class first.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: reuse find_roots to implement find_nonshadow_roots
to reduce code redundance...

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: use is_shadow_item() wrapper if possible
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: implement find_shadow_roots
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: tidy up class_map on remove_root()
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: rm-device-class support
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: allow "crush class rm" to automatically recycle shadow tree(s)
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: kill "class create" command
The device class is now self and automatically managed.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
crush: kill 'class rename'
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
test/crush: kill dead class_is_in_use test
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-class-misc-fixes branch from 55be15e to 15085fc Jul 26, 2017

xiexingguo added some commits Jul 18, 2017

test/crush: fix memory leak
use unique_ptr<> instead, otherwise static analyzer will complain.
as ASSERT_EQ will bail out without executing delete c

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
doc/release-notes: update device-class cli family
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
mon/OSDMonitor: be more helpful for new weight-set CLI users
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
doc/rados/operations/crush-map: s/die/dice/
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
int CrushWrapper::get_immediate_parent_id(int id, int *parent) const
int CrushWrapper::get_immediate_parent_id(int id,
int *parent,
parent_type_t choice) const

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 26, 2017

Member

I think this change isn't needed now? Can leave it, I guess

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jul 26, 2017

I think this change isn't needed now? Can leave it, I guess

Yeah.
I see you are testing. Will do a follow-up PR to clean this up.

@liewegas liewegas merged commit c7430c5 into ceph:master Jul 27, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details

@xiexingguo xiexingguo deleted the xiexingguo:wip-class-misc-fixes branch Jul 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.