New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crush: weight_set and id remapping #14486

Merged
merged 7 commits into from Apr 19, 2017

Conversation

Projects
None yet
2 participants
@ghost

ghost commented Apr 12, 2017

No description provided.

@ghost ghost added feature core labels Apr 12, 2017

@ghost

This comment has been minimized.

ghost commented Apr 14, 2017

@liewegas here is a proposal for a minimalist integration of the weight_set in Ceph.

For buckets where the weight_set is not empty, the functions CrushWrapper::{insert_item,move_bucket,update_item,remove_item,remove_item_under} set the weight set and the target weight of new items to zero, and they do not adjust the weights of their parents. This allows the user to maintain the weights via the crushmap using the "choose_args" keyword::

// weight_set for pool 0
choose_args 0 {
  {
    bucket_id -5
    weight_set {
       position 0 weights 0x10000 0x20000 0x50000
       position 1 weights 0x30000 0x20000 0x50000
    }
    ids -20 -30 -25
  }

  {
    bucket_id -20
    weight_set {
       position 0 weights 0x10000 0x20000
       position 1 weights 0x30000 0x20000
    }
    ids -45 -23
  }
}

Whenever a choose_args is added to CrushWrapper (by compiling or decoding the crushmap), the buckets for which there exists at least one non empty weight_set are marked as such so that CrushWrapper::{insert_item,move_bucket,update_item,remove_item,remove_item_under} know to treat them differently.

The CrushWrapper::do_rule function takes an additional argument which is used as an index to find the
corresponding choose_args and OSDMap::_pg_to_raw_osds uses the pool number for this argument so that there can be a different choose_args per pool.

I'm moderately happy about this proposal but that's what I have so far ;-)

@liewegas

This comment has been minimized.

Member

liewegas commented Apr 14, 2017

@ghost

This comment has been minimized.

ghost commented Apr 14, 2017

Where do you want to store the choose_args/weight_set? As part of the map proper at the end of CrushWrapper::{encode,decode}?

yes

Will it be decompiled/compiled as part of the crush map (with a syntax like that below)?

yes

What bothers me is that I can't quite imagine how it will evolve beyond this minimalist proposal. But it's probably a problem for later :-)

@liewegas

This comment has been minimized.

Member

liewegas commented Apr 14, 2017

@ghost

This comment has been minimized.

ghost commented Apr 14, 2017

when adding/removing items in a bucket, each choose_args[-1-bucket_id] must also be modified to remove the corresponding item or add an item with weight zero. This ensure the items count for each choose_args[-1-bucket_id] matches the items count of the map->bucket[-1-bucket_id] in the crushmap. In other words they must be kept in sync at all times. This should not be too much of a problem since adding / removing items/buckets is done via the CrushWrapper API. Unless there is something I'm not seeing ?

@liewegas

This comment has been minimized.

Member

liewegas commented Apr 14, 2017

@ghost

This comment has been minimized.

ghost commented Apr 14, 2017

it will be effectively removed from the old location but not added to the new one (until someone recalculates the weight_set values).

Yes. From a practical point of view this is not a useful behavior. Maybe it would be better to just forbid adding/removing items/buckets via the CrushWrapper API when there are weight_sets. The user has to add/remove items/buckets by decompiling/modifying/compiling the crushmap and is responsible for maintaining the weight_set accordingly.

@tchaikov tchaikov self-requested a review Apr 14, 2017

@ghost

This comment has been minimized.

ghost commented Apr 15, 2017

The compiler part is implemented and the syntax is slightly simpler than what was originally proposed 8438556#diff-6c59ef5804a9ab71015e059395d84feeR107 . Please let me know if you see something odd ;-)

@ghost

This comment has been minimized.

ghost commented Apr 16, 2017

@liewegas updated the implementation with compile/decompile/encode/decode and a crushtool based test, moving on to disabling crush map updates via the CrushWrapper api when weight sets are present.

@ghost ghost changed the title from DNM: crush multiweight to crush: weight_set and id remapping Apr 16, 2017

@ghost ghost requested a review from liewegas Apr 16, 2017

@ghost

This comment has been minimized.

ghost commented Apr 16, 2017

@liewegas I think it is ready for review :-)

@ghost

This comment has been minimized.

ghost commented Apr 16, 2017

teuthology-suite -k distro --priority 1000 --suite rados --subset $(expr $RANDOM % 200)/200 --email loic@dachary.org --ceph wip-multiweight --machine-type smithi

Re-running the failed tests

They consistently fail but are unrelated to this pull request

@@ -1540,6 +1572,38 @@ void CrushWrapper::decode(bufferlist::iterator& blp)
class_rname[c.second] = c.first;
::decode(class_bucket, blp);
cleanup_classes();

This comment has been minimized.

@liewegas

liewegas Apr 17, 2017

Member

can you add a } if (!blp.end()) { here? i have the class code deployed on a test cluster.

[ 1.000 2.000 5.000 ]
[ 3.000 2.000 5.000 ]
]
ids -20 -30 -25

This comment has been minimized.

@liewegas

liewegas Apr 17, 2017

Member

weight_set uses [ ] for the lists.. maybe ids should too?

ids [ -20 -30 -25 ]

@liewegas

Minor nit on syntax, but otherwise looks good!

ldachary and others added some commits Apr 18, 2017

crush: do not use TREE in tests
It is bugous and unsupported.

Signed-off-by: Loic Dachary <loic@dachary.org>
crush: cleanup test memory leaks
Signed-off-by: Loic Dachary <loic@dachary.org>
crush: implement weight and id overrides for straw2
bucket_straw2_choose needs to use weights that may be different from
weight_items. For instance to compensate for an uneven distribution
caused by a low number of values. Or to fix the probability biais
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
the weight the second replica (second position). For instance the weight
matrix for a given bucket containing items 3, 7 and 13 could be as
follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), item 7,
3 are four times more likely to be choosen by bucket_straw2_choose than
item 13. When choosing the second replica (position 1), item 3 is ten
times more likely to be choosen than item 7, 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of ids can be provided for placement purposes and they
are used instead of the ids. If no replacement ids are provided, the
legacy behavior is preserved.

Signed-off-by: Loic Dachary <loic@dachary.org>
crush: add per pool choose_args when calling do_rule
If there is no crush_choose_arg_map for a given pool (the default) a
NULL pointer is given instead and crush_do_rule behavior remains
unchanged.

Signed-off-by: Loic Dachary <loic@dachary.org>
crush: compile/decompile crush_choose_arg_map
A map of crush_choose_arg_map is added to the crushmap text syntax. The
key is an integer matching a pool number.

Signed-off-by: Loic Dachary <loic@dachary.org>
crush: disable modification API when choose_args is not empty
Adding, removing or move items / buckets via the CrushWrapper API when
choose_args is not empty is unlikely to produce the desired outcome. The
caller should instead add, remove or move items / buckets in a
decompiled crushmap, update the associated choose_arg and upload the new
crushmap.

Signed-off-by: Loic Dachary <loic@dachary.org>
crush: implement CrushWrapper::dump(choose_args)
Signed-off-by: Loic Dachary <loic@dachary.org>
@ghost

This comment has been minimized.

ghost commented Apr 18, 2017

@liewegas the ids syntax was changed here and tested here

The extra blp.end is here

Also added 57d4c8d and 18245ec fixing test leakage & valgrind complaints.

@liewegas

This comment has been minimized.

Member

liewegas commented Apr 18, 2017

Looks good!

Maybe we should deprecate the tree buckets more explicitly somehow so they can be removed.. maybe a bool CrushWrapper::has_tree_buckets(), and a health warning in luminous if they're in use to warn that support will be removed post-luminous?

@liewegas liewegas added the needs-qa label Apr 18, 2017

@ghost

This comment has been minimized.

ghost commented Apr 18, 2017

good idea created http://tracker.ceph.com/issues/19654 to not forget about it.

@ghost

This comment has been minimized.

ghost commented Apr 18, 2017

teuthology-suite -k distro --priority 1000 --suite rados --subset $(expr $RANDOM % 200)/200 --email loic@dachary.org --ceph wip-multiweight --machine-type smithi

Re-running the failed tests

@liewegas liewegas merged commit 4e8dd17 into ceph:master Apr 19, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment