
Raise idlistscanlimit by default #2435

Closed
389-ds-bot opened this issue Sep 13, 2020 · 19 comments · Fixed by #5639
Labels
priority_high need urgent fix / highly valuable / easy to fix
Milestone

Comments

@389-ds-bot

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/49376


Issue Description

With https://pagure.io/389-ds-base/issue/49372 we try to avoid hitting large IDLs by fixing the filters.

There are some cases left where we may access large IDLs, and in those cases we probably want to return an IDL anyway.

One case is if we hit the idlistscanlimit in an OR query. In this case we return ALLIDS to the parent filter, which likely in turn raises this back to its parent. This means that we do a very expensive id2entry scan. Had we returned an IDL, even one with say 100,000 items, that would still be better than doing a full id2entry scan on a db with 200,000 entries.
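A minimal Python sketch of this failure mode (ALLIDS, load_idl and the toy index are illustrative names, not 389-ds-base internals): one term exceeding the scan limit collapses the whole OR to ALLIDS, even when keeping the large IDL would be far cheaper than a full scan.

```python
# Toy model of the OR case: hitting idlistscanlimit on one term
# degrades the whole union to ALLIDS (i.e. a full id2entry scan).
# All names here are assumptions for illustration, not the real API.

ALLIDS = None  # sentinel meaning "every entry in the database"

def load_idl(index, key, scan_limit):
    ids = index.get(key, set())
    return ALLIDS if len(ids) > scan_limit else ids

def or_candidates(index, keys, scan_limit):
    result = set()
    for key in keys:
        ids = load_idl(index, key, scan_limit)
        if ids is ALLIDS:
            return ALLIDS  # one saturated term poisons the whole OR
        result |= ids
    return result

index = {
    "dept=eng": set(range(100_000)),          # large but indexed
    "dept=hr": set(range(100_000, 100_500)),  # small
}
# Low limit: the big term saturates, so a 200,000-entry DB is scanned in full.
assert or_candidates(index, list(index), 4000) is ALLIDS
# Raised limit: a 100,500-id IDL, still roughly half the full-scan cost.
assert len(or_candidates(index, list(index), 10**9)) == 100_500
```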

Another case is broad AND queries. If we do a subtree search for objectClass=group under, say, ou=Groups, and this returns ALLIDS, we have just wasted a lot of time again falling back to a full db search when we could have loaded the IDL and intersected it with parentid.

I think that we should examine the current idlistscanlimit and either raise it, or consider removing it altogether.

@389-ds-bot 389-ds-bot added this to the 1.4 backlog milestone Sep 13, 2020
@389-ds-bot

389-ds-bot commented Sep 13, 2020

Comment from firstyear (@Firstyear) at 2017-09-11 04:07:06

Here is an example:

idlistscanlimit: 4000 (default)
[11/Sep/2017:12:03:33.524447131 +1000] conn=2 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(&(objectClass=inetOrgPerson)(objectClass=person)(objectClass=posixAccount)(uid>=test0053715))" attrs="uid"
[11/Sep/2017:12:03:54.236090743 +1000] conn=2 op=1 RESULT err=0 tag=101 nentries=46286 etime=20.1288196523 notes=A

idlistscanlimit: 99999999999
[11/Sep/2017:12:01:34.797114055 +1000] conn=4 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(&(objectClass=inetOrgPerson)(objectClass=person)(objectClass=posixAccount)(uid>=test0053715))" attrs="uid"
[11/Sep/2017:12:01:45.866876775 +1000] conn=4 op=1 RESULT err=0 tag=101 nentries=46286 etime=11.0069939325

So despite being over 40,000 entries, using the indexes is still faster.
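Plain arithmetic on the two logged etime values above makes the difference concrete:

```python
# etime values copied from the access-log excerpts above.
etime_limited = 20.1288196523  # idlistscanlimit: 4000, notes=A (unindexed)
etime_raised = 11.0069939325   # idlistscanlimit: 99999999999, fully indexed

speedup = etime_limited / etime_raised
assert 1.82 < speedup < 1.84   # roughly 1.8x faster using the indexes
```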

In fact, it looks like the idlistscanlimit is an attempt to optimise the "(&(uid=x)(objectClass=y))" case to improve this search, but the recent changes to filter_optimise in #2431 solve this much more completely.

I think we should remove the idlistscanlimit, as no other database (e.g. PostgreSQL) has such an odd concept of bypassing indexes. We can gain more from improving filter optimisation than from trying to hand-tune these numbers for specific cases.

@389-ds-bot

Comment from firstyear (@Firstyear) at 2017-09-11 04:07:07

Metadata Update from @Firstyear:

  • Custom field component adjusted to None
  • Custom field origin adjusted to None
  • Custom field reviewstatus adjusted to None
  • Custom field type adjusted to None
  • Custom field version adjusted to None

@389-ds-bot

Comment from firstyear (@Firstyear) at 2017-09-11 04:24:28

Following this, this would actually be a case to raise the filter test threshold, because that would allow better shortcutting of genuinely small sets that want to avoid large indexes. I.e. if we set the test threshold to say 32, but the set is larger than that, it's probably better to actually use the index. We could do some testing of the numbers here to be sure, but 32-64 could be good guesses.
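A sketch of the shortcut being described (FILTER_TEST_THRESHOLD and the toy index are assumed names for illustration): once the running candidate set is at or below the threshold, stop consulting further indexes and filter-test those entries directly.

```python
# Hypothetical model of the "filter test threshold" shortcut, not the
# actual 389-ds-base implementation.
FILTER_TEST_THRESHOLD = 32  # low end of the 32-64 range suggested above

def and_candidates(index, keys):
    result = None  # None = no candidates gathered yet
    for key in keys:
        ids = index.get(key, set())
        result = ids if result is None else result & ids
        if len(result) <= FILTER_TEST_THRESHOLD:
            break  # small enough: cheaper to filter-test these directly
    return result

index = {
    "uid=william": {7},
    "objectclass=person": set(range(10_000)),
}
# The 1-entry uid IDL trips the threshold immediately, so the large
# objectClass index is never loaded.
assert and_candidates(index, ["uid=william", "objectclass=person"]) == {7}
```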

@389-ds-bot

Comment from firstyear (@Firstyear) at 2017-09-11 04:46:26

Another point is that, to make filter optimisation easier, perhaps we should change the filter structure from a linked list to something easier to sort / re-arrange?

@389-ds-bot

389-ds-bot commented Sep 13, 2020

Comment from firstyear (@Firstyear) at 2017-09-12 01:55:51

#811

@389-ds-bot

Comment from firstyear (@Firstyear) at 2017-09-12 01:57:20

Metadata Update from @Firstyear:

  • Issue assigned to Firstyear

@389-ds-bot

Comment from mreynolds (@mreynolds389) at 2017-09-14 17:35:25

Metadata Update from @mreynolds389:

  • Issue set to the milestone: 1.4 backlog

@389-ds-bot

389-ds-bot commented Sep 13, 2020

Comment from firstyear (@Firstyear) at 2017-10-03 06:18:10

0001-Ticket-49376-raise-idscanlimit.patch

Relies on #2431

@389-ds-bot

Comment from firstyear (@Firstyear) at 2017-10-03 06:18:12

Metadata Update from @Firstyear:

  • Custom field reviewstatus adjusted to review (was: None)

@389-ds-bot

Comment from lkrispen (@elkris) at 2017-10-04 11:02:22

I don't think we should change the default.

There are some scenarios where it might be better to always return an idl. It is always a trade-off as to which is more costly: handling and managing a very large idlist (which also requires a lot of database accesses) or handling all entries (which might already be in the entry cache).
The scanlimit is configurable and GSS knows how to suggest proper settings.
If we change index lookup and idl handling (in other tickets) we need to document the effects it might have on the use of scan limits.
If we are really confident that always returning an idl would be the right thing, then we should remove the scan limit altogether.

In general I think for changes like this we should have a set of reliable performance benchmarks first.

@389-ds-bot

Comment from firstyear (@Firstyear) at 2017-10-05 00:56:47

@elkris I have been performing a number of tests with the filter optimisation patch enabled, and in all cases the idlistscanlimit of 999999999 exceeded the performance of the current default. I will prepare a set of numbers for you.

As much as I love and respect GSS, they do NOT know how to suggest a proper setting here. This is a deeply confusing setting that relies on intimate knowledge of our internal search procedure that most users don't have.

I agree that we should remove the scan limit completely, but I would like to do this first to give us a clean backout strategy if it does fail on us.

@389-ds-bot

Comment from lkrispen (@elkris) at 2017-11-17 09:39:17

waiting for the set of numbers

@389-ds-bot

Comment from firstyear (@Firstyear) at 2017-11-20 10:31:21

Yep! I need to make the numbers for you! I've been quite busy as you might know, and I want to work on this with @vashirov to build our performance test framework. If you are going to do something, you may as well do it correctly.

@mreynolds389

I don't have any official numbers, but I also have not seen a performance hit when using a very large value for idscanlimit. I do recall that with DSEE it was worse, but in 389 I did not see the same behaviour. I'm not sure about LMDB, but increasing this with bdb has more pros than cons.

@mreynolds389 mreynolds389 added the priority_high need urgent fix / highly valuable / easy to fix label Feb 9, 2022
@mreynolds389 mreynolds389 modified the milestones: 1.4 backlog, 2.x backlog Feb 9, 2022
@Firstyear

@mreynolds389 This change is predicated on query optimisation existing, and I ran into waaayyyy too many issues with VLV to get it to "stick" in the past.

@Firstyear

@mreynolds389 @progier389 Can we review this again? We have the optimiser code now, and the IDL scan limit does nothing for us but actively harm performance of large queries. We should increase this value, but better yet, we should remove it entirely.

@mreynolds389

> @mreynolds389 @progier389 Can we review this again? We have the optimiser code now, and the IDL scan limit does nothing for us but actively harm performance of large queries. We should increase this value, but better yet, we should remove it entirely.

I would love to drop it entirely, but I know others on the team are not so sure. Need to prove that it won't hurt anything...

@progier389

My gut feeling is that @Firstyear is right and that the limit is obsolete (probably since the idl_new index format was added), but we have an easy way to mitigate the risks by doing the change in two phases:

  1. Set the default value to INT_MAX but keep the current code for now.
    In practice this disables the limit but allows easy reversal in case of trouble.
  2. And if nobody complains about it, we can remove the code entirely a few years later.

@Firstyear

Sounds like a reasonable plan to me :)

@progier389 It's not because of the idl_new format, but the query optimiser.

The summary is: if we have "(&(objectClass=person)(uid=william))" then we want to hit the "filter test threshold" as fast as possible to minimise work.

The idlscanlimit does this by loading the idl for objectClass=person, discarding it if it is too long, then "hoping" that whatever comes next is more precise.

The optimiser does this by swapping the terms to "(&(uid=william)(objectClass=person))" meaning the idl for uid=william is loaded, is seen to be length 1, and the shortcut out is taken and filter test applied.

The idlscanlimit has a single pathological bad case, which is where you are scanning large sets, for example "(&(objectClass=person)(country=australia))". In the idlscanlimit case, if both idls are > scanlimit, this becomes fully unindexed and we filter test the whole DB. By raising the idlscanlimit we avoid this: we still end up loading two large idls, but that is a fraction of the cost of a full table scan. Since the optimiser now does the work to ensure queries are in "shortcut likely" arrangements, the idlscanlimit now just causes a degradation, not an improvement.
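The pathological AND case can be sketched the same way (hypothetical names; sorting terms by estimated IDL length stands in for the real optimiser): with a low limit both terms saturate and the query goes fully unindexed, while a raised limit intersects the two large IDLs down to a small candidate set.

```python
# Toy model of the pathological AND case, not the real implementation.
ALLIDS = None  # sentinel: full id2entry scan required

def load_idl(index, key, scan_limit):
    ids = index.get(key, set())
    return ALLIDS if len(ids) > scan_limit else ids

def optimised_and(index, keys, scan_limit):
    # Stand-in for the query optimiser: shortest estimated IDL first,
    # so the "shortcut likely" arrangement is tried first.
    keys = sorted(keys, key=lambda k: len(index.get(k, ())))
    result = ALLIDS
    for key in keys:
        ids = load_idl(index, key, scan_limit)
        if ids is ALLIDS:
            continue  # a saturated term contributes no narrowing
        result = ids if result is ALLIDS else result & ids
    return result

index = {
    "objectclass=person": set(range(100_000)),
    "country=australia": set(range(60_000, 160_000)),
}
# Both IDLs exceed a 4000 limit: the whole AND becomes fully unindexed.
assert optimised_and(index, list(index), 4000) is ALLIDS
# Raised limit: two large IDLs intersect to 40,000 candidates, far
# cheaper than filter-testing a 200,000-entry database.
assert len(optimised_and(index, list(index), 10**9)) == 40_000
```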

Firstyear added a commit to Firstyear/389-ds-base that referenced this issue Feb 8, 2023
Bug Description: The IDL scan limit existed as a naive attempt
at query optimisation to force longer IDLs to be skipped in favour
of short IDLs. However, since we now have a true query optimiser
that can handle this correctly, the IDL scan limit is redundant and
not functional. The only condition the IDL scan limit can now impact
is making queries slower by rejecting the use of longer IDLs when
shortcut conditions are not possible.

Fix Description: Raise the IDL Scan Limit to INT_MAX to observe
the results. In the future we can remove the code entirely.

fixes: 389ds#2435

Author: William Brown <william@blackhats.net.au>

Review by: ???
Firstyear added a commit that referenced this issue Feb 9, 2023
mreynolds389 pushed a commit that referenced this issue Feb 10, 2023