performance search rate: improve attrlist_find #4309

389-ds-bot · 2020-09-13T10:08:54Z

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/51256

Created at 2020-08-31 20:00:03 by tbordaz (@tbordaz)
Assigned to nobody

Issue Description

The routine is heavily used. Its algorithm is basic: it walks the attribute list and calls strcasecmp on each attribute. This ticket is to evaluate the benefit of a hashtable that would reduce the number of strcasecmp calls.
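For illustration, the linear lookup described above can be sketched roughly as follows. This is not the actual 389-ds source: `struct demo_attr` and `demo_attrlist_find` are hypothetical stand-ins, and plain POSIX `strcasecmp` is assumed for the comparison.

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical stand-in for the server's attribute struct. */
struct demo_attr {
    const char *type;       /* attribute type name, e.g. "cn" */
    struct demo_attr *next; /* next attribute in the entry's list */
};

/* Walk the list, doing one case-insensitive compare per node.
 * Cost is O(n) comparisons for an entry with n attributes. */
static struct demo_attr *
demo_attrlist_find(struct demo_attr *a, const char *type)
{
    for (; a != NULL; a = a->next) {
        if (strcasecmp(a->type, type) == 0) {
            return a;
        }
    }
    return NULL;
}
```

Every miss costs a full walk of the list, which is why the number of strcasecmp calls grows with the number of attributes per entry and the number of lookups per search.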

Package Version and Platform

All versions

Steps to reproduce

Run searchrate against an indexed attribute. The more attributes in the filter and the more matching entries, the more helpful the hashtable is likely to be.

Actual results

Expected results

389-ds-bot · 2020-09-13T10:08:55Z

Comment from elkris at 2020-08-31 20:50:03

Isn't a hashtable a bit of overhead, especially if you have entries with few attributes?
Would an approach like the one in valueset work? Instead of having a linked list of attrs, use a sorted (directly or indirectly) array of attr structs and do a binary search for attrlist_find. It would reduce the comparisons, reduce allocs (in slapi_attr_new) and be more CPU cache friendly.

389-ds-bot · 2020-09-13T10:08:56Z

Comment from firstyear (@Firstyear) at 2020-09-01 01:44:09

> Isn't a hashtable a bit of overhead, especially if you have entries with few attributes?

Correct. Especially if you have to rebuild the hashtable frequently too.

> Would an approach like the one in valueset work? Instead of having a linked list of attrs, use a sorted (directly or indirectly) array of attr structs and do a binary search for attrlist_find. It would reduce the comparisons, reduce allocs (in slapi_attr_new) and be more CPU cache friendly.

I think a directly sorted version seems better here, yes. Especially because attrs in a directory tend to be either single-valued in many cases OR hold a large number of values (i.e. memberof). So a BTreeSet would actually be perfect here, as in the single-value or low-value case it's effectively a sorted array. But given we don't have easy access to this in C, I think the sorted array + binary search is the best option.
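A minimal sketch of the sorted array + binary search idea, using only the standard `qsort` and `bsearch`. The struct and function names (`demo_attr_entry`, `demo_attrs_sort`, `demo_attrs_find`) are hypothetical and do not come from the 389-ds code base; it also assumes the array is re-sorted (or kept sorted on insert) whenever attributes change.

```c
#include <stddef.h>
#include <stdlib.h>  /* qsort, bsearch */
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical attribute record stored by value in a contiguous array. */
struct demo_attr_entry {
    const char *type; /* attribute type name; values would follow */
};

/* Case-insensitive ordering on the attribute type name. */
static int
demo_attr_cmp(const void *x, const void *y)
{
    const struct demo_attr_entry *a = x;
    const struct demo_attr_entry *b = y;
    return strcasecmp(a->type, b->type);
}

/* Sort once after building or modifying the array. */
static void
demo_attrs_sort(struct demo_attr_entry *attrs, size_t n)
{
    qsort(attrs, n, sizeof(attrs[0]), demo_attr_cmp);
}

/* Binary search: O(log n) comparisons instead of O(n). */
static struct demo_attr_entry *
demo_attrs_find(struct demo_attr_entry *attrs, size_t n, const char *type)
{
    struct demo_attr_entry key = { type };
    return bsearch(&key, attrs, n, sizeof(attrs[0]), demo_attr_cmp);
}
```

The trade-off is that inserts and deletes must preserve the ordering, so the gain depends on lookups dominating modifications and on entries having enough attributes for the reduced comparison count to matter.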

389-ds-bot · 2020-09-13T10:08:56Z

Comment from firstyear (@Firstyear) at 2020-09-01 01:44:09

Metadata Update from @Firstyear:

Custom field origin adjusted to None
Custom field reviewstatus adjusted to None

389-ds-bot · 2020-09-13T10:08:57Z

Comment from tbordaz (@tbordaz) at 2020-09-01 08:59:42

@Firstyear, @elkris, thank you so much for your ideas and feedback. I agree that a sorted array looks like a better idea than a hash table. The first step of this ticket is to evaluate the expected benefit; I will do that later today.

celestian · 2022-05-16T10:43:24Z

We are closing this ticket as the BZ was closed by the team after heavy investigation.
https://bugzilla.redhat.com/show_bug.cgi?id=1897617

tbordaz · 2022-05-16T10:44:49Z

This improvement is expensive and of low benefit => won't fix

tbordaz added the "performance" (Issue impacts performance) label Jan 12, 2022

tbordaz closed this as completed May 16, 2022
