-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix: incorrect nullability of InList expr
#6799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nicely written and tested @jonahgao 👏 Thank you very much
Improve code readability Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
|
|
||
| Expr::InList(InList { expr, list, .. }) => { | ||
| // Avoid inspecting too many expressions. | ||
| const MAX_INSPECT_LIMIT: usize = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to know why use this number? Is it the practice from other systems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spark handle it simplified.
override def nullable: Boolean = children. Exists(_.nullable)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to know why use this number? Is it the practice from other systems?
@jackwener No. The nullable function may be called multiple times during the optimization phase,
So I think adding a limitation would be preferable in order to prevent it from being excessively slow.
But I'm not quite sure what would be an appropriate number.
spark handle it simplified.
override def nullable: Boolean = children. Exists(_.nullable)
This seems to be a cache style.
We can implement this by precomputing nullable in the InList::new() function.
But the disadvantages are:
- We need a new field for the
InListstruct - The precomputed nullable may not be used.
- Calculating nullable requires
input_schema
@jackwener Which solution do you prefer?
Update: It seems challenging to accomplish. I need to take a closer look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @jonahgao hints at , trying to add some sort of cache / memoization is going to be challenging given Rust and how DataFusion is structured
I think the current PR solution is good:
- It is well tested
- It is conservative (it might say an InList is nullable that isn't, but I think that will not generate wrong results, only potentially less optimal plans)
Thus I think we should merge this PR as is and then as a follow on PR we can remove the limit or increase it, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* fix: incorrect nullability of InList expr * Update datafusion/expr/src/expr_schema.rs Improve code readability Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fix: incorrect nullability of InList expr * Update datafusion/expr/src/expr_schema.rs Improve code readability Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
None
Rationale for this change
Similar to #6786 .
We can safely assume that the
InListexpr is non-nullalbe only if all of its subexpressions are non-nullable.An example of a nullable expression is:
What changes are included in this PR?
InListexprAre these changes tested?
Yes
Are there any user-facing changes?
No