Skip to content

Conversation

@kazuyukitanimura
Copy link
Contributor

Which issue does this PR close?

Closes #6967

Rationale for this change

With #6792, now divisions check dividing by zero.
That is an API change and we are (and some other users may be) looking for an old behavior.

What changes are included in this PR?

This PR implements DivideUnchecked operation that behaves like the previous Division operator (return null for dividing by zero)

Are these changes tested?

Yes

Are there any user-facing changes?

No, but users have a choice explicitly to use the previous division behavior.

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates substrait Changes to the substrait crate labels Jul 14, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you for the contribution @kazuyukitanimura

cc @liukun4515 and @tustvold

@tustvold
Copy link
Contributor

tustvold commented Jul 15, 2023

This operator has actually been removed upstream... Perhaps this could be implemented as a nullif zero prior to division? This would be consistent with postgres.

As written this PR would reopen #6791

@alamb
Copy link
Contributor

alamb commented Jul 15, 2023

This operator has actually been removed upstream... Perhaps this could be implemented as a nullif zero prior to division? This would be consistent with postgres.

Confirming you mean something like https://docs.rs/datafusion/latest/datafusion/logical_expr/fn.nullif.html so like

let expr = nullif(expr, lit(0.0f64))

What do you think @kazuyukitanimura ?

@ozankabak
Copy link
Contributor

ozankabak commented Jul 17, 2023

Let's make this preference (exception vs. special value as the default behavior) configurable. There are many use cases where one wants the output of division to be nullable for integers, or NaN for floats. IIRC, we follow IEEE 754 for floats, so there is no issue there.

For integers, PostgreSQL goes the exception route, but others (e.g. Spark IIRC) proceed with execution in terms of default behavior. Applications involving numerics (custom fixed point data etc.) may prefer the latter default behavior and handle such situations downstream. The entire family of streaming use cases is also another example where exception behavior is problematic.

If we make nullable divisions a configurable behavior for integer types and act according to user's preferences, I think everyone will be happy.

@tustvold
Copy link
Contributor

I would be fine with making this configurable and use the nullif kernel as suggested.

That being said it does seem inconsistent to special case integer division by zero and not all the other sources of errors in DF, from casting to arithmetic overflow. I wonder if there is a higher level issue if not producing errors is important? FWIW in ANSI mode, which I think is what we should be aiming to replicate, Spark will return errors.

@ozankabak
Copy link
Contributor

That being said it does seem inconsistent to special case integer division by zero and not all the other sources of errors in DF, from casting to arithmetic overflow.

This is true. We will probably have a small family of flags to control these (adding them as we make progress in the project and needs arise) and have an accompanying general (exception-vs-special-value) flag to collectively handle them. Most users will simply deal with that one flag and be happy. Advanced users will be able to set that one flag to establish their baseline and fine tune the behavior according to their own requirements by toggling individual flags. The experience will be in line with compilers' approach to these things so we would have POLS even for advanced users.

@kazuyukitanimura
Copy link
Contributor Author

Thank you all for the reviews. I am currently testing the performance of the nullif approach.

@alamb alamb marked this pull request as draft July 18, 2023 20:49
@alamb
Copy link
Contributor

alamb commented Jul 18, 2023

Marking as a draft so I don't accidentally merge this PR

@kazuyukitanimura
Copy link
Contributor Author

It seems no performance degradation for the nullif solution. Closing.
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make unchecked division kernel available

4 participants