Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter optimiser #5170

Closed
Firstyear opened this issue Feb 22, 2022 · 47 comments · Fixed by #5285, #5301, #5315 or #5604
Closed

Filter optimiser #5170

Firstyear opened this issue Feb 22, 2022 · 47 comments · Fixed by #5285, #5301, #5315 or #5604
Labels
performance Issue impacts performance RFE Request for Enhancement
Milestone

Comments

@Firstyear
Copy link
Contributor

LDAP queries are evaluated by our backend in the order of provided terms, which can cause surprising performance issues. These have tried to be mitigated with tools like the idl scan limit, but a proper query optimiser will help us with non optimal searches.

@Firstyear Firstyear added the needs triage The issue will be triaged during scrum label Feb 22, 2022
@mreynolds389
Copy link
Contributor

PR: #5171

@mreynolds389 mreynolds389 added performance Issue impacts performance RFE Request for Enhancement labels Feb 23, 2022
@mreynolds389 mreynolds389 added this to the 2.0.0 milestone Feb 23, 2022
@mreynolds389 mreynolds389 removed the needs triage The issue will be triaged during scrum label Feb 23, 2022
Firstyear added a commit to Firstyear/389-ds-base that referenced this issue May 5, 2022
Bug Description: This introduces a query optimiser to 389-ds.
A query optimiser helps our backend process queries in a manner
that is more efficient by reducing IO events, and returning
the correct results quicker. This is needed as most applications
are not aware of the internals of how we execute a query internally
 - nor should they need to know. As an example, queries commonly
produced by SSSD are suboptimial and cause high latency and IO, but
this optimiser can resolve the majority of these issues which
generally improves all server throughput.

Fix Description: The optimiser works by attempting to produce
"the smallest candidate set" first. Having a smaller candidate
set earlier means that we can be below the filter test threshold
which shortcuts and can evaluate the remaining filter assertions
on the partial candidate set, rather than loading more expensive
and larger indexes. This also allows us to "fail fast" by
promoting "unlikely" candidates earlier which may end up
having a zero-length so that again, we don't load larger indexes.

A number of other changes were made and resolved in this change:

* Issue: The onelevel filter (not related to the
    optimiser ...) works be injecting a parentid field check
    to the filter that is used in the search. If the user
    does not have access to read this field, then one level
    searches fail

* Fix: Split the filter by "intent" and "as
    executed". We can filter test on as executed without check
    of the ACI, but we also check the filter as executed to
    ensure that only valid entries matching the proper semantics
    are returned

* Fix: Duplicate the filter for optimisation instead
    and free it in the search path, leaving the pblock filter as
    the "query as intended".

* Issue: In some cases, the filter test shortcut was
    applied, but with a specially crafted query, this could trigger
    the return prematurely giving incorrect results as the filter
    test was NOT bubbled up correctly.

* Fix: Flag in the sr that the filter test MUST be
    applied in certain shortcut cases.

fixes: 389ds#5170

Author: William Brown <william@blackhats.net.au>

Review by: @tbordaz, @mreynolds389
@mreynolds389
Copy link
Contributor

@Firstyear can you cherry pick this down to 2.0?

@mreynolds389 mreynolds389 reopened this May 5, 2022
@Firstyear
Copy link
Contributor Author

Sure, if you'd like :)

Firstyear added a commit that referenced this issue May 5, 2022
Bug Description: This introduces a query optimiser to 389-ds.
A query optimiser helps our backend process queries in a manner
that is more efficient by reducing IO events, and returning
the correct results quicker. This is needed as most applications
are not aware of the internals of how we execute a query internally
 - nor should they need to know. As an example, queries commonly
produced by SSSD are suboptimial and cause high latency and IO, but
this optimiser can resolve the majority of these issues which
generally improves all server throughput.

Fix Description: The optimiser works by attempting to produce
"the smallest candidate set" first. Having a smaller candidate
set earlier means that we can be below the filter test threshold
which shortcuts and can evaluate the remaining filter assertions
on the partial candidate set, rather than loading more expensive
and larger indexes. This also allows us to "fail fast" by
promoting "unlikely" candidates earlier which may end up
having a zero-length so that again, we don't load larger indexes.

A number of other changes were made and resolved in this change:

* Issue: The onelevel filter (not related to the
    optimiser ...) works be injecting a parentid field check
    to the filter that is used in the search. If the user
    does not have access to read this field, then one level
    searches fail

* Fix: Split the filter by "intent" and "as
    executed". We can filter test on as executed without check
    of the ACI, but we also check the filter as executed to
    ensure that only valid entries matching the proper semantics
    are returned

* Fix: Duplicate the filter for optimisation instead
    and free it in the search path, leaving the pblock filter as
    the "query as intended".

* Issue: In some cases, the filter test shortcut was
    applied, but with a specially crafted query, this could trigger
    the return prematurely giving incorrect results as the filter
    test was NOT bubbled up correctly.

* Fix: Flag in the sr that the filter test MUST be
    applied in certain shortcut cases.

fixes: #5170

Author: William Brown <william@blackhats.net.au>

Review by: @tbordaz, @mreynolds389
Firstyear added a commit that referenced this issue May 5, 2022
Bug Description: This introduces a query optimiser to 389-ds.
A query optimiser helps our backend process queries in a manner
that is more efficient by reducing IO events, and returning
the correct results quicker. This is needed as most applications
are not aware of the internals of how we execute a query internally
 - nor should they need to know. As an example, queries commonly
produced by SSSD are suboptimial and cause high latency and IO, but
this optimiser can resolve the majority of these issues which
generally improves all server throughput.

Fix Description: The optimiser works by attempting to produce
"the smallest candidate set" first. Having a smaller candidate
set earlier means that we can be below the filter test threshold
which shortcuts and can evaluate the remaining filter assertions
on the partial candidate set, rather than loading more expensive
and larger indexes. This also allows us to "fail fast" by
promoting "unlikely" candidates earlier which may end up
having a zero-length so that again, we don't load larger indexes.

A number of other changes were made and resolved in this change:

* Issue: The onelevel filter (not related to the
    optimiser ...) works be injecting a parentid field check
    to the filter that is used in the search. If the user
    does not have access to read this field, then one level
    searches fail

* Fix: Split the filter by "intent" and "as
    executed". We can filter test on as executed without check
    of the ACI, but we also check the filter as executed to
    ensure that only valid entries matching the proper semantics
    are returned

* Fix: Duplicate the filter for optimisation instead
    and free it in the search path, leaving the pblock filter as
    the "query as intended".

* Issue: In some cases, the filter test shortcut was
    applied, but with a specially crafted query, this could trigger
    the return prematurely giving incorrect results as the filter
    test was NOT bubbled up correctly.

* Fix: Flag in the sr that the filter test MUST be
    applied in certain shortcut cases.

fixes: #5170

Author: William Brown <william@blackhats.net.au>

Review by: @tbordaz, @mreynolds389
@Firstyear
Copy link
Contributor Author

   31f6c444a..b6d2fa4e4  389-ds-base-2.1 -> 389-ds-base-2.1
   0a1dc2a86..a444d3454  389-ds-base-2.0 -> 389-ds-base-2.0

@Firstyear
Copy link
Contributor Author

It'll be worth testing after that cherry-pick on the two branches just to be sure - there was a conflict in ldbm_search.c that I was careful in how I resolved it, but it'd be good for someone else to double check.

@mreynolds389
Copy link
Contributor

mreynolds389 commented May 5, 2022

We have a regression, and it was picked up in the github CI but it was overlooked:

py.test dirsrvtests/tests/suites/filter/complex_filters_test.py

This patch now returns ldapsubentries when it should not:

dn: cn=389_ds_system,dc=example,dc=com
objectClass: top
objectClass: nscontainer
objectClass: ldapsubentry
cn: 389_ds_system

@mreynolds389 mreynolds389 reopened this May 5, 2022
@Firstyear
Copy link
Contributor Author

I did say there was a conflict and it was about subentry code, but I can't look today because I have a security issue to deal with :(

@mreynolds389
Copy link
Contributor

I did say there was a conflict and it was about subentry code, but I can't look today

This is on master branch, so the cherry pick issues you had are unrelated. If don't mind looking into it next week that would be great.

because I have a security issue to deal with :(

And are you dealing with a SQL injection attack? Fun fun...

@Firstyear
Copy link
Contributor Author

I did say there was a conflict and it was about subentry code, but I can't look today

This is on master branch, so the cherry pick issues you had are unrelated. If don't mind looking into it next week that would be great.

Yeah, I can look next week.

because I have a security issue to deal with :(

And are you dealing with a SQL injection attack? Fun fun...

That would be the one :(

@Firstyear
Copy link
Contributor Author

Firstyear commented May 30, 2022

I agree. With a OR, I see no reason to reject the full filter if a component is not searchable. I think we can not guess the value of an unsearchable attribute, like we can do with a AND.
So IMHO there is no need to have a compatibility toggle, we should revert the old behavior.

This is already fixed in the PR #5315

It's worth noting this corrupting behaviour already existed in the server before the optimiser. Reverting to the "old behaviour" actually means "do nothing" since it was already broken (although in a subtly different query execution path).

So saying "we should revert to the old behaviour" is meaningless, since it was already broken. It turns out our query backend had wildly different behaviours and was broken in many many many ways that are all horrible - including this access control bypass.

IMO we merge #5315 which leaves the optimiser in place, fixes the OR behaviour, and already has the CVE preventions.

Otherwise, there is #5316 which reverts the optimiser, fixes the OR behaviour, and disables shortcut queries due to the CVE. Note this may negatively affect server performance.

@Firstyear
Copy link
Contributor Author

Firstyear commented May 31, 2022

Just to summarise, since I think there is lots of noise.

Issue - OR cases may reject a valid query when a non-matching candidate was denied access.
existed before filter optimiser - YES
introduced by filter optimiser - NO
security issue - NO
fix provided - YES - #5315 OR #5316

Issue - OR cases may disclose sensitive data including userPasswords, private keys, kerberos master key etc.
existed before filter optimiser - YES
introduced by filter optimiser - NO
security issue - YES - HIGH
fix provided - YES - already merged as part of filter optimiser - alternate for older code streams is #5316
NOTE - this is the behaviour change that everyone is worried about, it MUST occur since it's a security issue.

Issue - Referrals are not returned when user doesn't have ACI access to them
existed before filter optimiser - YES
introduced by filter optimiser - NO
security issue - NO
fix provided - NO - due to the design of 389-ds ACI's it is IMPOSSIBLE to fix this, and the behaviour stays the same.

None of the issues you are seeing are being introduced by the filter optimiser, it's highlighting that they always and already existed because it's fixed the way we were executing queries especially in shortcut cases.

@Firstyear
Copy link
Contributor Author

For everyone, the access control bypass has been assigned a CVE.

CVE-2022-1949
CVSSv3: 7.4/CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:N
Redhat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2091781

Remember, the filter optimiser DOES fix the bypass. Codestreams without the optimiser will need the change per #5316

@mreynolds389
Copy link
Contributor

I'm sorry I'm not sure what is going on now, but I can not reproduce the CVE anymore :-( Old code, new code, doesn't matter I can no longer bypass aci's. Last week I did reproduce it with my CI script, but now that same script is not working. Not sure what to say at this point - I'm stumped why it's not working now. I've rebuild the server 10 times, etc, but I can no longer bypass the ACI. So maybe we don't have a CVE? sigh

@Firstyear
Copy link
Contributor Author

I'm sorry I'm not sure what is going on now, but I can not reproduce the CVE anymore :-( Old code, new code, doesn't matter I can no longer bypass aci's. Last week I did reproduce it with my CI script, but now that same script is not working. Not sure what to say at this point - I'm stumped why it's not working now. I've rebuild the server 10 times, etc, but I can no longer bypass the ACI. So maybe we don't have a CVE? sigh

FreeIPA was only working because of an access bypass. That's a CVE. You reproduced it before, but not now. Could be as simple as a change in what is or isn't indexed, or a shortcut case, or something else. There is strong evidence that the aci check was NOT enforced else how did FreeIPA ever work?

@mreynolds389
Copy link
Contributor

I'm sorry I'm not sure what is going on now, but I can not reproduce the CVE anymore :-( Old code, new code, doesn't matter I can no longer bypass aci's. Last week I did reproduce it with my CI script, but now that same script is not working. Not sure what to say at this point - I'm stumped why it's not working now. I've rebuild the server 10 times, etc, but I can no longer bypass the ACI. So maybe we don't have a CVE? sigh

FreeIPA was only working because of an access bypass.

That's not entirely accurate, there was another ACI that was correctly allowing read access, but a different ACI was incorrectly rejecting the Search with your optimizer code. So there was never a CVE/aci bypass in the IPA case.

So IPA had this aci which was allowing access:

aci: (targetfilter="(&(objectclass=nsContainer)(!(objectclass=krbPwdPolicy)))"
)(target!="ldap:///cn=masters,cn=ipa,cn=etc,dc=ipa,dc=test")(targetattr="obje
ctclass || cn")(version 3.0; acl "Anonymous read access to containers"; allow
(read, search, compare) userdn = "ldap:///anyone";)

But another ACI for the "admin group" was incorrectly overriding this ACI with the optimizer.

@mreynolds389
Copy link
Contributor

I'm sorry I'm not sure what is going on now, but I can not reproduce the CVE anymore :-( Old code, new code, doesn't matter I can no longer bypass aci's. Last week I did reproduce it with my CI script, but now that same script is not working. Not sure what to say at this point - I'm stumped why it's not working now. I've rebuild the server 10 times, etc, but I can no longer bypass the ACI. So maybe we don't have a CVE? sigh

FreeIPA was only working because of an access bypass.

That's not entirely accurate, there was another ACI that was correctly allowing read access, but a different ACI was incorrectly rejecting the Search with your optimizer code. So there was never a CVE/aci bypass in the IPA case.

Anyway all that being said I DID reproduce it last week with my own testcase (no IPA required), so it is real haha. I am just stumped why it's not working now. I'll put some more time into trying to reproduce it today

Firstyear added a commit to Firstyear/389-ds-base that referenced this issue Jun 3, 2022
Bug Description: In the filter test during access only
checks, OR conditions were not correctly evaluated. They
would have their access checked, but it was not confirmed
if this was the element that the entry matched. This mean
that queries could incorrectly reduce entries.

Fix Description: Remove the access check only mode from being
using in filter tests since it is broken, and requires the full
filter test to be evaluated along with it to work in complex cases.

fixes: 389ds#5170

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389
Firstyear added a commit that referenced this issue Jun 3, 2022
Bug Description: In the filter test during access only
checks, OR conditions were not correctly evaluated. They
would have their access checked, but it was not confirmed
if this was the element that the entry matched. This mean
that queries could incorrectly reduce entries.

Fix Description: Remove the access check only mode from being
using in filter tests since it is broken, and requires the full
filter test to be evaluated along with it to work in complex cases.

fixes: #5170

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389
Firstyear added a commit that referenced this issue Jun 3, 2022
Bug Description: In the filter test during access only
checks, OR conditions were not correctly evaluated. They
would have their access checked, but it was not confirmed
if this was the element that the entry matched. This mean
that queries could incorrectly reduce entries.

Fix Description: Remove the access check only mode from being
using in filter tests since it is broken, and requires the full
filter test to be evaluated along with it to work in complex cases.

fixes: #5170

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389
Firstyear added a commit that referenced this issue Jun 3, 2022
Bug Description: In the filter test during access only
checks, OR conditions were not correctly evaluated. They
would have their access checked, but it was not confirmed
if this was the element that the entry matched. This mean
that queries could incorrectly reduce entries.

Fix Description: Remove the access check only mode from being
using in filter tests since it is broken, and requires the full
filter test to be evaluated along with it to work in complex cases.

fixes: #5170

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389
tbordaz added a commit to tbordaz/389-ds-base that referenced this issue Feb 22, 2023
… of referral

Bug description:
	A part of the fix 389ds#5170 append '(objectclass=referral)' to
	the original filter (in subtree scope) in order to conform
        smart referral support https://www.ietf.org/rfc/rfc3296.txt
	This triggers a drop on SRCH throughput (10%).
	389ds#5598 limits the case when '(objectclass=referral)' is added
         - Most of the time a server does not contain smart referral.
           So most of the time it is useless to  add that subfilter
         - It should not be added for internal searches

Fix description:
	A mechanism periodically (each 30s) checks if there are
	smart referral entries (referral_check) under each backends.
        Note that if a smart referral is present in a subsuffix,
	the parent suffix inherits the referral flag.
	When a smart referral is detected or no more smart
	referral is detected it logs a information message.
	During a direct subtree search, 'objectclass=referral' is
	append at the condition it exists at least a referral
        under the backend.

relates:  389ds#5598

Reviewed by: Mark Reynolds, Pierre Rogier, William Brown (Thanks)
tbordaz added a commit that referenced this issue Feb 22, 2023
… of referral (#5604)

Bug description:
	A part of the fix #5170 append '(objectclass=referral)' to
	the original filter (in subtree scope) in order to conform
        smart referral support https://www.ietf.org/rfc/rfc3296.txt
	This triggers a drop on SRCH throughput (10%).
	#5598 limits the case when '(objectclass=referral)' is added
         - Most of the time a server does not contain smart referral.
           So most of the time it is useless to  add that subfilter
         - It should not be added for internal searches

Fix description:
	A mechanism periodically (each 30s) checks if there are
	smart referral entries (referral_check) under each backends.
        Note that if a smart referral is present in a subsuffix,
	the parent suffix inherits the referral flag.
	When a smart referral is detected or no more smart
	referral is detected it logs a information message.
	During a direct subtree search, 'objectclass=referral' is
	append at the condition it exists at least a referral
        under the backend.

relates:  #5598

Reviewed by: Mark Reynolds, Pierre Rogier, William Brown (Thanks)
tbordaz added a commit that referenced this issue Feb 22, 2023
… of referral (#5604)

Bug description:
	A part of the fix #5170 append '(objectclass=referral)' to
	the original filter (in subtree scope) in order to conform
        smart referral support https://www.ietf.org/rfc/rfc3296.txt
	This triggers a drop on SRCH throughput (10%).
	#5598 limits the case when '(objectclass=referral)' is added
         - Most of the time a server does not contain smart referral.
           So most of the time it is useless to  add that subfilter
         - It should not be added for internal searches

Fix description:
	A mechanism periodically (each 30s) checks if there are
	smart referral entries (referral_check) under each backends.
        Note that if a smart referral is present in a subsuffix,
	the parent suffix inherits the referral flag.
	When a smart referral is detected or no more smart
	referral is detected it logs a information message.
	During a direct subtree search, 'objectclass=referral' is
	append at the condition it exists at least a referral
        under the backend.

relates:  #5598

Reviewed by: Mark Reynolds, Pierre Rogier, William Brown (Thanks)
tbordaz added a commit that referenced this issue Feb 22, 2023
… of referral (#5604)

Bug description:
        A part of the fix #5170 append '(objectclass=referral)' to
        the original filter (in subtree scope) in order to conform
        smart referral support https://www.ietf.org/rfc/rfc3296.txt
        This triggers a drop on SRCH throughput (10%).
        #5598 limits the case when '(objectclass=referral)' is added
         - Most of the time a server does not contain smart referral.
           So most of the time it is useless to  add that subfilter
         - It should not be added for internal searches

Fix description:
        A mechanism periodically (each 30s) checks if there are
        smart referral entries (referral_check) under each backends.
        Note that if a smart referral is present in a subsuffix,
        the parent suffix inherits the referral flag.
        When a smart referral is detected or no more smart
        referral is detected it logs a information message.
        During a direct subtree search, 'objectclass=referral' is
        append at the condition it exists at least a referral
        under the backend.

relates:  #5598

Reviewed by: Mark Reynolds, Pierre Rogier, William Brown (Thanks)
tbordaz added a commit that referenced this issue Feb 23, 2023
… of referral (#5604)

Bug description:
        A part of the fix #5170 append '(objectclass=referral)' to
        the original filter (in subtree scope) in order to conform
        smart referral support https://www.ietf.org/rfc/rfc3296.txt
        This triggers a drop on SRCH throughput (10%).
        #5598 limits the case when '(objectclass=referral)' is added
         - Most of the time a server does not contain smart referral.
           So most of the time it is useless to  add that subfilter
         - It should not be added for internal searches

Fix description:
        A mechanism periodically (each 30s) checks if there are
        smart referral entries (referral_check) under each backends.
        Note that if a smart referral is present in a subsuffix,
        the parent suffix inherits the referral flag.
        When a smart referral is detected or no more smart
        referral is detected it logs a information message.
        During a direct subtree search, 'objectclass=referral' is
        append at the condition it exists at least a referral
        under the backend.

relates:  #5598

Reviewed by: Mark Reynolds, Pierre Rogier, William Brown (Thanks)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment