Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Simpify IS NULL/IS NOT NULL evaluation #103099

Merged
merged 7 commits into from Dec 13, 2023
Merged

Conversation

costin
Copy link
Member

@costin costin commented Dec 7, 2023

The IS NOT NULL predicate cares only about the nullability not
the precise value of an expression. Given the expression x + 1 / 2, the actual
result does not matter only if it's null or not which can occur if there's an
overflow/underflow, the value is multi value or if the value is null.
We cannot detect the overflow in advanced, we cannot yet detect if a field
is multi value, however we can detect if a field is not null.
In practice this means x + 1 / 2 IS NOT NULL can be prefixed with
x IS NOT NULL check:
x IS NOT NULL AND x + 1 / 2 IS NOT NULL
which can be pushed down before the expression gets evaluated.

Fix #103097

@elasticsearchmachine
Copy link
Collaborator

Hi @costin, I've created a changelog YAML for you.

The IS NULL, IS NOT NULL predicates only care about the nullability not
 the value of an expression. Given the expression x + 1 / 2, the actual
 result does not matter only if it's null or not - that is, it only
 matters if x is null or not.
 So x + 1 / 2 IS NULL becomes x IS NULL - which can be opportunistically
 pushed down or evaluated.

Fix elastic#103097
Preserve the original expression to cope with under/overflow or mv
 fields
@costin costin marked this pull request as ready for review December 8, 2023 01:54
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Dec 8, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@elasticsearchmachine
Copy link
Collaborator

Hi @costin, I've created a changelog YAML for you.

@nik9000
Copy link
Member

nik9000 commented Dec 8, 2023

This seems to work, though I can't comment on the code itself.

If I run FROM tsdb | EVAL ml = LENGTH(message) | WHERE ml IS NOT NULL | KEEP ml with the profiler the important bit of the execution looks like:

          {
            "operator" : "ValuesSourceReaderOperator[field = message]",
            "status" : {
              "readers_built" : {
                "message:column_at_a_time:null" : 1,
                "message:row_stride:BlockSourceReader.Bytes" : 1,
                "stored_fields[requires_source:true, fields:0, sequential: true]" : 2
              },
              "pages_processed" : 2
            }
          },
          {
            "operator" : "FilterOperator[evaluator=IsNotNullEvaluator[field=Attribute[channel=1]]]",
            "status" : {
              "pages_processed" : 2
            }
          },
          {
            "operator" : "EvalOperator[evaluator=LengthEvaluator[val=Attribute[channel=1]]]",
            "status" : {
              "pages_processed" : 2
            }
          },
          {
            "operator" : "FilterOperator[evaluator=IsNotNullEvaluator[field=Attribute[channel=2]]]",
            "status" : {
              "pages_processed" : 2
            }
          },

@costin
Copy link
Member Author

costin commented Dec 9, 2023

@elasticsearchmachine update branch

@costin
Copy link
Member Author

costin commented Dec 9, 2023

@elasticmachine merge upstream

@costin costin added this to the 8.13 milestone Dec 9, 2023
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected some optimizer tests in the ESQL project.

* Implementation-wise this rule goes bottom-up, keeping an alias up to date to the current plan
* and then looks for replacing the target.
*/
public static class InferIsNotNull extends Rule<LogicalPlan, LogicalPlan> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a rule that could be used in other QL projects... but is it really going to be used there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the intent behind putting in there - there is nothing specific about it and both EQL and SQL can benefit from it since it's mv compatible.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested one query where I am not using an explicit is not null, but a variant that should be, in some scenarios, equivalent: where NOT(field1 IS NULL) it is actually not treated the same. I don't know in which order the rules apply or if we should deal with not (isnull) but I was a bit surprised by the behavior.

For ip1 is not null the plan result from this rule is

Project[[ip1{f}#9, ip1.keyword{f}#10, ip2{f}#11, ip2.keyword{f}#12]]
\_Limit[500[INTEGER]]
  \_Filter[ISNOTNULL(ip1{f}#9) AND ISNOTNULL(ip1{f}#9)]
    \_EsRelation[test][ip1{f}#9, ip1.keyword{f}#10, ip2{f}#11, ip2.keyword..]

First of all, why \_Filter[ISNOTNULL(ip1{f}#9) AND ISNOTNULL(ip1{f}#9)] (two identical conditions)?

But for not (ip1 is null) the plan is:

Project[[ip1{f}#15, ip1.keyword{f}#16, ip2{f}#17, ip2.keyword{f}#18]]
\_Limit[500[INTEGER]]
  \_Filter[NOT(ISNULL(ip1{f}#15))]
    \_EsRelation[test][ip1{f}#15, ip1.keyword{f}#16, ip2{f}#17, ip2.keywor..]

not (is null) seems to be not be equivalent to is not null. I don't think this has to do with specific PR, likely other rules are not doing the right thing, or we don't do this replacement on purpose (though, I don't remember why in that case).

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, but before merging it's worth investigating the problem identified by Andrei (\_Filter[ISNOTNULL(ip1{f}#9) AND ISNOTNULL(ip1{f}#9)]). The fact that NOT (IS NULL) is not optimized can probably be addressed in a follow-up PR, as it's an optimization by itself

* This handles the case of fields nested inside functions or expressions in order to avoid:
* - having to evaluate the whole expression
* - not pushing down the filter due to expression evaluation
* IS NULL cannot be simplified since it leads to a disjunction which cannot prevents the simplified IS NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* IS NULL cannot be simplified since it leads to a disjunction which cannot prevents the simplified IS NULL

should be

* IS NULL cannot be simplified since it leads to a disjunction which prevents the simplified IS NULL

@@ -229,7 +229,7 @@ protected final boolean enableWarningsCheck() {
}

public boolean logResults() {
return true;
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


@Override
protected boolean skipExpression(Expression e) {
return e instanceof Coalesce;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scares me a bit TBH, especially if we decide to add user-defined functions in the future

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed - null-aware conditions are special and we don't have a marker for them as of now.
Nor for user functions which likely have their own super / marker class.

@costin
Copy link
Member Author

costin commented Dec 12, 2023

First of all, why _Filter[ISNOTNULL(ip1{f}#9) AND ISNOTNULL(ip1{f}#9)] (two identical conditions)?

I've added a test a fix for this.

@costin costin requested a review from astefan December 12, 2023 04:51
@costin
Copy link
Member Author

costin commented Dec 12, 2023

Separately I've raised - #103301 since I was expecting the duplicate to be removed by a separate rule in the chain.
Turns out I was wrong - hence why I raised an issue for it considering past discussions.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add some tests specifically for ESQL in LocalLogicalPlanOptimizerTests.

@costin costin requested a review from astefan December 13, 2023 02:11
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM

@costin costin merged commit 2c8f6eb into elastic:main Dec 13, 2023
15 checks passed
@costin costin deleted the fix/103097 branch December 13, 2023 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:QL (Deprecated) Meta label for query languages team v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: Simplify IS NULL/IS NOT NULL evaluation
6 participants