Make previous unchecked division kernel available #6970
alamb left a comment
Makes sense to me -- thank you for the contribution @kazuyukitanimura
cc @liukun4515 and @tustvold
This operator has actually been removed upstream... Perhaps this could be implemented by wrapping the divisor in a nullif against zero before the division? This would be consistent with Postgres. As written, this PR would reopen #6791.
Confirming you mean something like https://docs.rs/datafusion/latest/datafusion/logical_expr/fn.nullif.html, so like `let expr = nullif(expr, lit(0.0f64))`? What do you think @kazuyukitanimura?
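A dependency-free sketch of the suggested semantics ("nullif zero, then divide"). It models SQL NULL as Rust `Option` instead of using DataFusion's real `nullif` expression or Arrow arrays; `nullif_zero` and `divide` are hypothetical names used only for illustration:

```rust
/// Models SQL `nullif(x, 0)`: NULL (None) when `x` is zero, otherwise `x` unchanged.
fn nullif_zero(x: Option<i64>) -> Option<i64> {
    match x {
        Some(0) => None,
        other => other,
    }
}

/// Division with SQL NULL propagation: a NULL operand gives a NULL result.
/// After `nullif_zero`, the divisor can no longer be zero.
fn divide(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    match (a, b) {
        // checked_div returns None only on overflow (i64::MIN / -1) in this sketch.
        (Some(a), Some(b)) => a.checked_div(b),
        _ => None,
    }
}

fn main() {
    let dividends = [Some(10), Some(7), None, Some(5)];
    let divisors = [Some(2), Some(0), Some(3), None];
    let results: Vec<Option<i64>> = dividends
        .iter()
        .zip(divisors.iter())
        .map(|(&a, &b)| divide(a, nullif_zero(b)))
        .collect();
    println!("{:?}", results); // [Some(5), None, None, None]
}
```

Because the zero check happens on the divisor before the division, the division operator itself can keep its checked (error-raising) behavior.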
Let's make this preference (exception vs. special value as the default behavior) configurable. There are many use cases where one wants the output of division to be null for integers, or NaN for floats. IIRC, we follow IEEE 754 for floats, so there is no issue there. For integers, PostgreSQL goes the exception route, but others (e.g. Spark, IIRC) continue execution by default. Applications involving numerics (custom fixed-point data, etc.) may prefer the latter default behavior and handle such situations downstream. Streaming use cases are another example where exception behavior is problematic. If we make nullable division a configurable behavior for integer types and act according to the user's preferences, I think everyone will be happy.
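A minimal sketch of what a configurable integer divide-by-zero policy could look like, assuming a hypothetical `DivByZeroMode` setting (this is not an existing DataFusion configuration option). It also shows why floats need no such switch: IEEE 754 already defines `x / 0.0` as ±infinity and `0.0 / 0.0` as NaN.

```rust
/// Hypothetical setting (not a real DataFusion option) that chooses what
/// integer division does when the divisor is zero.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DivByZeroMode {
    /// Fail the query, as PostgreSQL does.
    Error,
    /// Return NULL and keep executing (the "special value" route).
    Null,
}

/// Stand-in for whatever error type the engine would actually return.
#[derive(Debug, PartialEq)]
struct DivideByZeroError;

/// Integer division whose zero-divisor behavior depends on `mode`.
fn int_divide(a: i64, b: i64, mode: DivByZeroMode) -> Result<Option<i64>, DivideByZeroError> {
    if b == 0 {
        return match mode {
            DivByZeroMode::Error => Err(DivideByZeroError),
            DivByZeroMode::Null => Ok(None),
        };
    }
    // wrapping_div only keeps the sketch panic-free for i64::MIN / -1;
    // overflow policy is a separate question not modeled here.
    Ok(Some(a.wrapping_div(b)))
}

fn main() {
    println!("{:?}", int_divide(9, 3, DivByZeroMode::Error)); // Ok(Some(3))
    println!("{:?}", int_divide(9, 0, DivByZeroMode::Error)); // Err(DivideByZeroError)
    println!("{:?}", int_divide(9, 0, DivByZeroMode::Null)); // Ok(None)

    // Floats need no mode: IEEE 754 defines the result of dividing by zero.
    println!("{} {}", 1.0_f64 / 0.0, (0.0_f64 / 0.0).is_nan()); // inf true
}
```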
I would be fine with making this configurable and using the nullif kernel as suggested. That being said, it does seem inconsistent to special-case integer division by zero and not all the other sources of errors in DF, from casting to arithmetic overflow. I wonder if there is a higher-level issue, if not producing errors is important? FWIW, in ANSI mode, which I think is what we should be aiming to replicate, Spark will return errors.
This is true. We will probably have a small family of flags to control these (adding them as we make progress in the project and needs arise), with an accompanying general (exception vs. special value) flag to handle them collectively. Most users will simply deal with that one flag and be happy. Advanced users will be able to set that one flag to establish their baseline, then fine-tune the behavior to their own requirements by toggling individual flags. This is in line with how compilers approach such options, so we would preserve the principle of least surprise (POLS) even for advanced users.
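The "one general flag plus optional per-category overrides" scheme can be sketched as a small lookup. Every name here (`OnError`, `Category`, `ErrorPolicy`, and its fields) is hypothetical and chosen only to illustrate the resolution rule: an individual flag wins if set, otherwise the general flag applies.

```rust
/// What to do when an operation hits an error condition.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OnError {
    Raise,
    SpecialValue,
}

/// Hypothetical error categories that could each get their own flag.
#[derive(Clone, Copy)]
enum Category {
    DivideByZero,
    IntegerOverflow,
    InvalidCast,
}

/// A general baseline plus optional per-category overrides.
struct ErrorPolicy {
    general: OnError,
    divide_by_zero: Option<OnError>,
    integer_overflow: Option<OnError>,
    invalid_cast: Option<OnError>,
}

impl ErrorPolicy {
    /// Most users only choose the general behavior.
    fn new(general: OnError) -> Self {
        ErrorPolicy { general, divide_by_zero: None, integer_overflow: None, invalid_cast: None }
    }

    /// The individual flag wins if set; otherwise fall back to the general flag.
    fn effective(&self, category: Category) -> OnError {
        let specific = match category {
            Category::DivideByZero => self.divide_by_zero,
            Category::IntegerOverflow => self.integer_overflow,
            Category::InvalidCast => self.invalid_cast,
        };
        specific.unwrap_or(self.general)
    }
}

fn main() {
    // Advanced use: strict baseline, with division by zero relaxed to NULL.
    let mut tuned = ErrorPolicy::new(OnError::Raise);
    tuned.divide_by_zero = Some(OnError::SpecialValue);
    println!(
        "{:?} {:?} {:?}",
        tuned.effective(Category::DivideByZero),
        tuned.effective(Category::IntegerOverflow),
        tuned.effective(Category::InvalidCast)
    ); // SpecialValue Raise Raise
}
```

Keeping the overrides as `Option` makes "not set" distinct from "set to the same value as the baseline", so changing the general flag later still affects every category the user did not pin explicitly.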
Thank you all for the reviews. I am currently testing the performance of the
Marking as a draft so I don't accidentally merge this PR |
There seems to be no performance degradation for the
Which issue does this PR close?
Closes #6967
Rationale for this change
With #6792, division now checks for division by zero.
That is an API change, and we (and possibly other users) would like to keep the old behavior available.
What changes are included in this PR?
This PR implements a `DivideUnchecked` operation that behaves like the previous Division operator (it returns null when dividing by zero).
Are these changes tested?
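For comparison, a sketch of the two behaviors described in this PR: the checked division introduced by #6792 fails on a zero divisor, while the unchecked variant yields null for that row. This uses `Option` slices rather than Arrow arrays, and `divide_checked`/`divide_unchecked` are hypothetical names, not the actual kernels; the error string is a placeholder.

```rust
/// Checked behavior: any non-null dividend over a zero divisor fails the whole batch.
fn divide_checked(lhs: &[Option<i32>], rhs: &[Option<i32>]) -> Result<Vec<Option<i32>>, String> {
    lhs.iter()
        .zip(rhs)
        .map(|(a, b)| match (a, b) {
            (Some(_), Some(0)) => Err("divide by zero".to_string()),
            // wrapping_div only avoids a panic on i32::MIN / -1; overflow is not modeled.
            (Some(a), Some(b)) => Ok(Some(a.wrapping_div(*b))),
            _ => Ok(None), // NULL in, NULL out
        })
        .collect() // stops at the first Err
}

/// Unchecked behavior (the pre-#6792 semantics): a zero divisor yields NULL for that row.
fn divide_unchecked(lhs: &[Option<i32>], rhs: &[Option<i32>]) -> Vec<Option<i32>> {
    lhs.iter()
        .zip(rhs)
        .map(|(a, b)| match (a, b) {
            (Some(_), Some(0)) => None,
            (Some(a), Some(b)) => Some(a.wrapping_div(*b)),
            _ => None,
        })
        .collect()
}

fn main() {
    let lhs: [Option<i32>; 3] = [Some(8), Some(5), None];
    let rhs: [Option<i32>; 3] = [Some(2), Some(0), Some(4)];
    println!("{:?}", divide_checked(&lhs, &rhs)); // Err("divide by zero")
    println!("{:?}", divide_unchecked(&lhs, &rhs)); // [Some(4), None, None]
}
```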
Yes
Are there any user-facing changes?
No, but users can now explicitly choose the previous division behavior.