Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance search rate: improve attrlist_find #4309

Closed
389-ds-bot opened this issue Sep 13, 2020 · 6 comments
Closed

performance search rate: improve attrlist_find #4309

389-ds-bot opened this issue Sep 13, 2020 · 6 comments
Labels
performance Issue impacts performance

Comments

@389-ds-bot
Copy link

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/51256


Issue Description

The routine is highly used. Its algo is basic, following attributes list and doing strcasecmp. This ticket is to evaluate the benefit of a hashtable that would reduce the number of strcasecmp.

Package Version and Platform

all version

Steps to reproduce

searchrate on indexed attribute. Likely the more attribute in the filter and the more matching entries the more helful would be the hashtable

Actual results

Expected results

@389-ds-bot
Copy link
Author

Comment from elkris at 2020-08-31 20:50:03

Isn't a hashtable a bit overhead especially if you have entries with few attributes.
Would an approach like in valuset work ? Instead of having a linked list of attrs use a sorted (directly or indirectly) array of attr structs and do a binary search for attrlist_find. It would reduce the comparisons, reduce allocs (in slapi_attr_new) and be more cpu cache friendly

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2020-09-01 01:44:09

Isn't a hashtable a bit overhead especially if you have entries with few attributes.

Correct. Especially if you have to rebuild the hashtable frequently too.

Would an approach like in valuset work ? Instead of having a linked list of attrs use a sorted (directly or indirectly) array of attr structs and do a binary search for attrlist_find. It would reduce the comparisons, reduce allocs (in slapi_attr_new) and be more cpu cache friendly

I think that a directly sorted version seems better here yes. Especially because attrs in a directory would tend to either single value in many cases OR a large number of values (ie memberof). So a BTreeSet would actually be perfect here as in the single value or low value case, it's effectively a sorted array. But given we don't have access to this easily in C, I think the sorted array + binary search is the best option.

@389-ds-bot
Copy link
Author

Comment from firstyear (@Firstyear) at 2020-09-01 01:44:09

Metadata Update from @Firstyear:

  • Custom field origin adjusted to None
  • Custom field reviewstatus adjusted to None

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2020-09-01 08:59:42

@Firstyear , @elkris, Thank you so much for your ideas and feedback. I agree that sorted array looks a better idea than hash table. A first step of the ticket is to do an evaluation of the expected benefit, will do that later today

@tbordaz tbordaz added the performance Issue impacts performance label Jan 12, 2022
@celestian
Copy link

We are closing this ticket as the BZ was closed by the team after heavy investigation.
https://bugzilla.redhat.com/show_bug.cgi?id=1897617

@tbordaz
Copy link
Contributor

tbordaz commented May 16, 2022

This improvement is expensive and low benefit => wont fix

@tbordaz tbordaz closed this as completed May 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issue impacts performance
Projects
None yet
Development

No branches or pull requests

3 participants