Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Divide By Zero Panics in Unchecked Arithmetic Kernels #2647

Closed
tustvold opened this issue Sep 4, 2022 · 11 comments
Closed

Divide By Zero Panics in Unchecked Arithmetic Kernels #2647

tustvold opened this issue Sep 4, 2022 · 11 comments
Labels
question Further information is requested

Comments

@tustvold
Copy link
Contributor

tustvold commented Sep 4, 2022

Which part is this question about

In Rust division and modulus is always checked - you can see that here. We currently provide checked and unchecked division kernels, and a single modulus kernel which is always checked.

The result is that the checked kernels will return an error on divide by zero, whilst the unchecked kernel will panic. This feels a little counter-intuitive, I would expect the unchecked kernel to silently do "something defined but not necessarily meaningful".

Describe your question

The comparison kernels handle the null mask separately from the values data, and this massively improves performance. I would like to apply the same optimisation to the unchecked arithmetic kernels, but cannot do this whilst dividing or modulus by 0 results in a panic, as the null values may have arbitrary values which could trigger this.

I would like to propose the unchecked variant of these division kernels set the value to be 0 on division by 0 (or some other sentinel value). As we're checking regardless there is no performance impact of doing this, and it allows blindly processing the values without first consulting the bitmask.

An alternative option would be to only have checked variants, however, this would preclude handling the null mask separately for the non-scalar variants of the kernels. I think this would be a shame.

One further alternative I did consider was to only consult the bitmask on divide by zero, however, the reality is most null slots will have a value of 0, and consequently this cost of this conditional checking is likely to undermine any benefits from the optimisation.

Additional context

Thoughts @alamb @viirya @Dandandan @jhorstmann ?

@tustvold tustvold added the question Further information is requested label Sep 4, 2022
@viirya
Copy link
Member

viirya commented Sep 4, 2022

I am +1 for that the unchecked variant of these division kernels should do something else instead of panic.

For the idea to fill a special value, just wondering if NULL is more suitable/meaningful for this case (Spark does this). Not strong option and also from performance perspective from the proposed optimization, it may not a good choice.

For using 0 for dividing by zero slot, not sure if there is any other doing the same so this is my little concern. But I guess it is okay if we well define this behavior in the document.

Btw, C++ divide kernel returns an error for integer division by zero like divide_checked does. For floating point, it doesn't specially handle it so it gets an infinity value.

@liukun4515
Copy link
Contributor

cc @viirya @tustvold
In the kernel cast, there is castoption to control the behavior when meet the error.

@tustvold
Copy link
Contributor Author

tustvold commented Sep 5, 2022

An interesting datapoint I was not aware of is there aren't actually SIMD instructions for integer division, and integer division is sufficiently slow that people do arcane tricks doing floating point division and then compensating for the precision loss instead 😅

Perhaps this means division will always be its own special snowflake and I should just optimize the other kernels as a first step 🤔

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

Ideally, I would expect the division kernel to have the same option as the cast kernel:

  1. if CastOptions::safe is true, then any indexes that divide by 0 would have their validity slot set to NULL and can have an arbitrary value in the values Buffer
  2. If CastOptions::safe is false, then if any indexes divide by zero, an Err is returned

I believe this is consistent with what @liukun4515 and @viirya are saying

I also think this is consistent with @tustvold 's proposal, but I am 100% sure.

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

Oh, I see -- there is no CastOption for division -- maybe that is the trick: allow users to choose if they want an error or NULL if there is a divide by zero?

lol -- it appears @viirya is already implementing this idea: #2647 (I was just too slow :) )

@tustvold
Copy link
Contributor Author

tustvold commented Sep 5, 2022

It isn't what I was proposing but having thought about it a bit more I'm not sure having checked and unchecked variants of the division kernel really makes sense:

  • Integer division is incredibly slow outweighing benefits of handling the null mask separately
  • There doesn't appear to be a way to get LLVM to vectorise it correctly (as the branch is always present)

As such I would like to propose we only provide a single division and modulus kernels, which will return an error on division by zero.

there is no CastOption for division

There is no CastOption for any of the arithmetic kernels, and I'm somewhat inclined to keep it that way. If people want the "safe" semantic (a name I really dislike) a preceding nullif will likely perform similarly.

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

As such I would like to propose we only provide a single division and modulus kernels, which will return an error on division by zero.

This makes sense to me (and is consistent with what postgres division does):

alamb=# select * from foo;
 i |  c  
---+-----
 1 | foo
 2 | bar
 0 | baz
(3 rows)

alamb=# select i / 0 from foo;
ERROR:  division by zero

@jhorstmann
Copy link
Contributor

I might be misremembering this, but I think divide_unchecked is only implemented for floating point types (which have a defined value for division by zero: positive or negative infinity). What might be missing is a scalar version of that kernel, and a version for the modulo operation.

@tustvold
Copy link
Contributor Author

tustvold commented Oct 7, 2022

Further to this the divide kernel is currently using math_op which assumes no side-effects as it evaluates for null slots. This effectively makes the current kernel useless for integer arrays that may contain nulls

@tustvold
Copy link
Contributor Author

I have confirmed empirically that there is no discernible performance difference between a variant that returns an error only on divide by zero, and a variant that also returns an error on overflow.

I therefore think it makes sense to only have checked division and modulus kernels, this is what I plan to do as part of #3999

@tustvold
Copy link
Contributor Author

Closed by #4594

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants