Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use FieldExistsQuery over REF that is arg of NON_NULLABLE function #15280

Merged
merged 1 commit into from Jan 10, 2024

Conversation

jeeminso
Copy link
Contributor

@jeeminso jeeminso commented Jan 3, 2024

Summary of the changes / Why this improves CrateDB

Fixes #15265 AND an undocumented bug:

cr> create table t (a int, b int);
CREATE OK, 1 row affected (0.210 sec)
cr> insert into t values (null, null), (null, 2), (2, null), (2, 2);
INSERT OK, 4 rows affected (0.032 sec)

cr> select * from t where a != (b||1);
+---+---+
| a | b |
+---+---+
| 2 | 2 |  --> missing a row, see below
+---+---+
SELECT 1 row in set (0.022 sec)

cr> select a, b, a != (b||1) from t;
+------+------+----------------------------+
|    a |    b | (NOT (a = concat(b, '1'))) |
+------+------+----------------------------+
| NULL |    2 | NULL                       |
| NULL | NULL | NULL                       |
|    2 | NULL | TRUE                       | --> this row is missing from the result of the query above
|    2 |    2 | TRUE                       |
+------+------+----------------------------+
SELECT 4 rows in set (0.023 sec)

the undocumented bug is not a regression from master, fails on 5.5.0 as well.

Current NullabilityVisitor will wrap all or none of the refs under a not operator with FieldsExistsQuery, and in this case, both a and b from a != (b||1) are wrapped with FieldsExistsQuery (considering both as nullable) causing extra rows to be filtered.

What is missing from the current NullabilityVisitor is a way to identify a ref(leaf) is under a NON_NULLABLE function. For instance, a != (b||1), b is implicitly NON_NULLABLE since it is under || but a is NULLABLE.

Similarly, NOT(NOT(c IS NULL)) from #15265 fails because NullabilityVisitor cannot identify the inner NOT(c IS NULL) is NON_NULLABLE.


To describe the change in terms of the commit message of 4428eb1,

This improves the three-valued-logic (3vl) processing by grouping all
scalar functions into the three categories:

  • Nullable means the function returns a null value for a null argument.

  • Non-nullable means the function returns never a null values for null
    arguments.

  • Conditional means it follows it's own logic. By default each scalar
    function is conditional.

Based on these groups the generation of the Lucene queries for the
NotPredicate can be optimized in the following way:

  • If all involved scalar functions are nullable, no 3vl is required
    and the query will be negated and null values can be excluded by
    using a field exist query.

  • If all involved scalar functions are non-nullable or nullable,
    no 3vl is required and the query can be simply negated.

  • If all involved scalar functions are non-nullable or nullable,
    no 3vl is required. The query will be negated and null values
    will be excluded from 'nullable' fields by field exist query. A field
    is nullable if the field is nullable and all its parent functions
    are nullable.

  • If any of the scalar function is conditional, 3vl is required,
    the query will be negated and further processed by converting
    the relevanting scalar function to generic function filters.

Checklist

  • Added an entry in the latest docs/appendices/release-notes/<x.y.0>.rst for user facing changes
  • Updated documentation & sql_features table for user facing changes
  • Touched code is covered by tests
  • CLA is signed
  • This does not contain breaking changes, or if it does:
    • It is released within a major release
    • It is recorded in the latest docs/appendices/release-notes/<x.y.0>.rst
    • It was marked as deprecated in an earlier release if possible
    • You've thought about the consequences and other components are adapted
      (E.g. AdminUI)

@jeeminso
Copy link
Contributor Author

jeeminso commented Jan 8, 2024

Hi @mkleen could you also have a quick look at this draft? Thank you

@jeeminso jeeminso requested a review from mkleen January 8, 2024 15:33
@jeeminso jeeminso force-pushed the jeeminso/fix-15229 branch 5 times, most recently from 11cbef4 to 4651e2f Compare January 10, 2024 01:43
@jeeminso jeeminso added bug Clear identification of incorrect behaviour and removed bug Clear identification of incorrect behaviour labels Jan 10, 2024
@jeeminso
Copy link
Contributor Author

the undocumented bug is not a regression from master, fails on 5.5.0 as well.

I think this cannot be easily back-ported since this depends on 4428eb1.

@jeeminso jeeminso marked this pull request as ready for review January 10, 2024 02:04
@jeeminso jeeminso requested a review from matriv January 10, 2024 10:03
return null;
}
}
for (Symbol arg : function.arguments()) {
arg.accept(this, context);
arg.accept(this, NullabilityContext.of(context));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we pass one context through? Why do we need to clone it here ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just pass the same context and it will be the same result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we pass one context through? Why do we need to clone it here ?

Thanks for reviewing. It prevents any context.isNullable = false from left arg to be carried over to right arg ultimately achieving A field is nullable if the field is nullable and all its parent functions are nullable.

references.add(symbol);
private HashSet<Reference> nullableReferences = new HashSet<>();
private boolean isNullable = true;
private AtomicBoolean enforceThreeValuedLogic = new AtomicBoolean(false);
Copy link
Contributor

@mkleen mkleen Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need an atomic boolean here, because this will never be called by multiple threads simultaneously. Also NullabilityContext is newly instantiated on each pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike isNullable, a single instance of enforceThreeValuedLogic is purposely shared across all instances of NullabilityContext. I will try to attach better comments.

Comment on lines +159 to +164
// saves and restores isNullable of the current context
// such that any non-nullables observed from the left arg is not transferred to the right arg.
boolean isNullable = context.isNullable;
for (Symbol arg : function.arguments()) {
arg.accept(this, context);
context.isNullable = isNullable;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing @mkleen . I think the static method of was the cause of the confusion. Please check again.

@jeeminso jeeminso requested a review from mkleen January 10, 2024 15:43
Copy link
Contributor

@mkleen mkleen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thank you!

@jeeminso jeeminso added the ready-to-merge Let Mergify merge the PR once approved and checks pass label Jan 10, 2024
…nctions

cr> create table t (a int, b int);
CREATE OK, 1 row affected (0.210 sec)
cr> insert into t values (null, null), (null, 2), (2, null), (2, 2);
INSERT OK, 4 rows affected (0.032 sec)

cr> select * from t where a != (b||1);
+---+---+
| a | b |
+---+---+
| 2 | 2 |  --> missing a row, see below
+---+---+
SELECT 1 row in set (0.022 sec)

cr> select a, b, a != (b||1) from t;
+------+------+----------------------------+
|    a |    b | (NOT (a = concat(b, '1'))) |
+------+------+----------------------------+
| NULL |    2 | NULL                       |
| NULL | NULL | NULL                       |
|    2 | NULL | TRUE                       | --> this row is missing from the result of the query above
|    2 |    2 | TRUE                       |     because of an unwanted FieldExistsQuery over field 'b'
+------+------+----------------------------+
SELECT 4 rows in set (0.023 sec)
@mergify mergify bot merged commit 5ff277c into master Jan 10, 2024
15 checks passed
@mergify mergify bot deleted the jeeminso/fix-15229 branch January 10, 2024 19:31
@jeeminso jeeminso restored the jeeminso/fix-15229 branch January 10, 2024 20:16
jeeminso added a commit that referenced this pull request Jan 16, 2024
mergify bot pushed a commit that referenced this pull request Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge Let Mergify merge the PR once approved and checks pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected Result when using IS NOT NULL
2 participants