Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crush: weight_set and id remapping #14486

Merged
merged 7 commits into from Apr 19, 2017
Merged

crush: weight_set and id remapping #14486

merged 7 commits into from Apr 19, 2017

Conversation

ghost
Copy link

@ghost ghost commented Apr 12, 2017

No description provided.

@ghost ghost added feature core labels Apr 12, 2017
@ghost
Copy link
Author

ghost commented Apr 14, 2017

@liewegas here is a proposal for a minimalist integration of the weight_set in Ceph.

For buckets where the weight_set is not empty, the functions CrushWrapper::{insert_item,move_bucket,update_item,remove_item,remove_item_under} set the weight set and the target weight of new items to zero, and they do not adjust the weights of their parents. This allows the user to maintain the weights via the crushmap using the "choose_args" keyword::

// weight_set for pool 0
choose_args 0 {
  {
    bucket_id -5
    weight_set {
       position 0 weights 0x10000 0x20000 0x50000
       position 1 weights 0x30000 0x20000 0x50000
    }
    ids -20 -30 -25
  }

  {
    bucket_id -20
    weight_set {
       position 0 weights 0x10000 0x20000
       position 1 weights 0x30000 0x20000
    }
    ids -45 -23
  }
}

Whenever a choose_args is added to CrushWrapper (by compiling or decoding the crushmap), the buckets for which there exists at least one non empty weight_set are marked as such so that CrushWrapper::{insert_item,move_bucket,update_item,remove_item,remove_item_under} know to treat them differently.

The CrushWrapper::do_rule function takes an additional argument which is used as an index to find the
corresponding choose_args and OSDMap::_pg_to_raw_osds uses the pool number for this argument so that there can be a different choose_args per pool.

I'm moderately happy about this proposal but that's what I have so far ;-)

@liewegas
Copy link
Member

liewegas commented Apr 14, 2017 via email

@ghost
Copy link
Author

ghost commented Apr 14, 2017

Where do you want to store the choose_args/weight_set? As part of the map proper at the end of CrushWrapper::{encode,decode}?

yes

Will it be decompiled/compiled as part of the crush map (with a syntax like that below)?

yes

What bothers me is that I can't quite imagine how it will evolve beyond this minimalist proposal. But it's probably a problem for later :-)

@liewegas
Copy link
Member

liewegas commented Apr 14, 2017 via email

@ghost
Copy link
Author

ghost commented Apr 14, 2017

when adding/removing items in a bucket, each choose_args[-1-bucket_id] must also be modified to remove the corresponding item or add an item with weight zero. This ensure the items count for each choose_args[-1-bucket_id] matches the items count of the map->bucket[-1-bucket_id] in the crushmap. In other words they must be kept in sync at all times. This should not be too much of a problem since adding / removing items/buckets is done via the CrushWrapper API. Unless there is something I'm not seeing ?

@liewegas
Copy link
Member

liewegas commented Apr 14, 2017 via email

@ghost
Copy link
Author

ghost commented Apr 14, 2017

it will be effectively removed from the old location but not added to the new one (until someone recalculates the weight_set values).

Yes. From a practical point of view this is not a useful behavior. Maybe it would be better to just forbid adding/removing items/buckets via the CrushWrapper API when there are weight_sets. The user has to add/remove items/buckets by decompiling/modifying/compiling the crushmap and is responsible for maintaining the weight_set accordingly.

@tchaikov tchaikov self-requested a review April 14, 2017 14:46
@ghost
Copy link
Author

ghost commented Apr 15, 2017

The compiler part is implemented and the syntax is slightly simpler than what was originally proposed 8438556#diff-6c59ef5804a9ab71015e059395d84feeR107 . Please let me know if you see something odd ;-)

@ghost
Copy link
Author

ghost commented Apr 16, 2017

@liewegas updated the implementation with compile/decompile/encode/decode and a crushtool based test, moving on to disabling crush map updates via the CrushWrapper api when weight sets are present.

@ghost ghost changed the title DNM: crush multiweight crush: weight_set and id remapping Apr 16, 2017
@ghost ghost requested a review from liewegas April 16, 2017 15:37
@ghost
Copy link
Author

ghost commented Apr 16, 2017

@liewegas I think it is ready for review :-)

@ghost
Copy link
Author

ghost commented Apr 16, 2017

teuthology-suite -k distro --priority 1000 --suite rados --subset $(expr $RANDOM % 200)/200 --email loic@dachary.org --ceph wip-multiweight --machine-type smithi

Re-running the failed tests

They consistently fail but are unrelated to this pull request

@@ -1540,6 +1572,38 @@ void CrushWrapper::decode(bufferlist::iterator& blp)
class_rname[c.second] = c.first;
::decode(class_bucket, blp);
cleanup_classes();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a } if (!blp.end()) { here? i have the class code deployed on a test cluster.

[ 1.000 2.000 5.000 ]
[ 3.000 2.000 5.000 ]
]
ids -20 -30 -25
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weight_set uses [ ] for the lists.. maybe ids should too?

ids [ -20 -30 -25 ]

Copy link
Member

@liewegas liewegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit on syntax, but otherwise looks good!

ldachary and others added 7 commits April 18, 2017 09:38
It is bugous and unsupported.

Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
bucket_straw2_choose needs to use weights that may be different from
weight_items. For instance to compensate for an uneven distribution
caused by a low number of values. Or to fix the probability biais
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
the weight the second replica (second position). For instance the weight
matrix for a given bucket containing items 3, 7 and 13 could be as
follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), item 7,
3 are four times more likely to be choosen by bucket_straw2_choose than
item 13. When choosing the second replica (position 1), item 3 is ten
times more likely to be choosen than item 7, 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of ids can be provided for placement purposes and they
are used instead of the ids. If no replacement ids are provided, the
legacy behavior is preserved.

Signed-off-by: Loic Dachary <loic@dachary.org>
If there is no crush_choose_arg_map for a given pool (the default) a
NULL pointer is given instead and crush_do_rule behavior remains
unchanged.

Signed-off-by: Loic Dachary <loic@dachary.org>
A map of crush_choose_arg_map is added to the crushmap text syntax. The
key is an integer matching a pool number.

Signed-off-by: Loic Dachary <loic@dachary.org>
Adding, removing or move items / buckets via the CrushWrapper API when
choose_args is not empty is unlikely to produce the desired outcome. The
caller should instead add, remove or move items / buckets in a
decompiled crushmap, update the associated choose_arg and upload the new
crushmap.

Signed-off-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
@ghost
Copy link
Author

ghost commented Apr 18, 2017

@liewegas the ids syntax was changed here and tested here

The extra blp.end is here

Also added 57d4c8d and 18245ec fixing test leakage & valgrind complaints.

@liewegas
Copy link
Member

Looks good!

Maybe we should deprecate the tree buckets more explicitly somehow so they can be removed.. maybe a bool CrushWrapper::has_tree_buckets(), and a health warning in luminous if they're in use to warn that support will be removed post-luminous?

@ghost
Copy link
Author

ghost commented Apr 18, 2017

good idea created http://tracker.ceph.com/issues/19654 to not forget about it.

@ghost
Copy link
Author

ghost commented Apr 18, 2017

teuthology-suite -k distro --priority 1000 --suite rados --subset $(expr $RANDOM % 200)/200 --email loic@dachary.org --ceph wip-multiweight --machine-type smithi

Re-running the failed tests

@liewegas liewegas merged commit 4e8dd17 into ceph:master Apr 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants