Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common/hobject: compare two objects' key directly #21062

Merged
merged 1 commit into from Mar 29, 2018

Conversation

xiexingguo
Copy link
Member

The original implement does not appear to be a problem for two different
objects which are usually of different oids/names.
However, for a specific head object and its posterity snap objects, this
is not very effective since we are going to compare oids/names twice
(because they are definitely equivalent).

Signed-off-by: xie xingguo xie.xingguo@zte.com.cn

The original implement does not appear to be a problem for two different
objects which are usually of different oids/names.
However, for a specific head object and its posterity snap objects, this
is not very effective since we are going to compare oids/names twice
(because they are definitely equivalent).

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
@tchaikov tchaikov merged commit bff0d38 into ceph:master Mar 29, 2018
@xiexingguo
Copy link
Member Author

Thanks @tchaikov !

@xiexingguo xiexingguo deleted the wip-hobject-cmp branch March 29, 2018 01:12
@gregsfortytwo
Copy link
Member

Help me out, here. Why is it okay to skip using the “name” component for general comparisons? Is it replicated by use of the oid? Or is this an optimization for a specific case that happens not to break our test suite but is in fact wrong?
(Note that the key is mostly not used these days, but that the name is a required portion of the object...)

@tchaikov
Copy link
Contributor

tchaikov commented Mar 30, 2018

Why is it okay to skip using the “name” component for general comparisons?

@gregsfortytwo it's not okay to do so. we are not skipping using the name for comparisons.

Is it replicated by use of the oid?

yes. before this change, there is chance that we compare the "name" for 3 times at least, because oid 's "opereator<" compares "name":

  if (l.get_effective_key() < r.get_effective_key())
    return -1;
  if (l.get_effective_key() > r.get_effective_key())
    return 1;
  if (l.oid < r.oid)
    return -1;
  if (l.oid > r.oid)
    return 1;

if "key" is empty. the keys are compared for once at least, if they are not empty.

after this change

  if (l.get_key() < r.get_key())
    return -1;
  if (l.get_key() > r.get_key())
    return 1;
  if (l.oid < r.oid)
    return -1;
  if (l.oid > r.oid)
    return 1;

we compare the "name" for only a single time at least, even if "key" is empty. and the keys are compared for are compared for once at least, if they are not empty.

Or is this an optimization for a specific case that happens not to break our test suite but is in fact wrong?

i'd say it is an optimization for a more popular case.

@gregsfortytwo
Copy link
Member

Oh, get_effective_key is looking at oid.name not some direct name member. That makes way more sense, thanks!

@liewegas
Copy link
Member

liewegas commented Apr 2, 2018

The key thing here is that we cannot change the effective sort order with this change. IIRC that is why we were using get_effective_key()... because if we have an object a(name=foo key=) and b(name=foo key=food) then b needs to sort next to a with all of the foo's, not at the end of the hash value with all of the objects with non-empty keys. I think this change breaks that...

I suspect we need to pull (some of) the effective_key logic up into this comparator if we want to avoid the duplicate comparisons. A simple option is jsut checking whether both objects have an empty key (the common case) and if so do the get_key() comparison, otherwise do the full get_effective_key one.

@tchaikov
Copy link
Contributor

tchaikov commented Apr 2, 2018

@liewegas copy that. i will pull together a fix tomorrow morning.

@xiexingguo
Copy link
Member Author

The key thing here is that we cannot change the effective sort order with this change. IIRC that is why we were using get_effective_key()... because if we have an object a(name=foo key=) and b(name=foo key=food) then b needs to sort next to a with all of the foo's, not at the end of the hash value with all of the objects with non-empty keys. I think this change breaks that...

@liewegas

Well on a second thought I agree that this change slightly change the comparison behaviour. But I don't think your example is the right case:

a(name=foo key=)
b(name=foo key=food)

Since key always takes precedence when calculating hash, object a and b should basically have different hashes(assume there is no collision, which is the normal case). So the order of a and b is still undetermined?

@liewegas
Copy link
Member

liewegas commented Apr 3, 2018

Ah, right. In practice hash is always hash(key or name), although you can actually populate the hobject_t however you like. I guess the case is

before (correct):

c(hash=hash(food), key=food, name=fooc)
d(hash=hash(food), key=, name=food)
e(hash=hash(food), key=food, name=fooe)

now (bad):

d(hash=hash(food), key=, name=food)
c(hash=hash(food), key=food, name=fooc)
e(hash=hash(food), key=food, name=fooe)

@tchaikov
Copy link
Contributor

tchaikov commented Apr 3, 2018

@xiexingguo i think the order is determined before this change.

@liewegas
Copy link
Member

liewegas commented Apr 3, 2018

Yes--we definitely need to fix this. I think simply optimizing the comparison for the key.empty() case is all we need!

@tchaikov
Copy link
Contributor

tchaikov commented Apr 3, 2018

@xiexingguo @liewegas #21217

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants