ARROW-9042: [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior #7341

kszucs · 2020-06-03T11:33:21Z

Currently the output type of the Add function is identical to the argument types, which makes it unsafe to add values near the numeric limits. Instead of the (int8, int8) -> int8 signature we should use (int8, int8) -> int16.

Avoid undefined behaviour caused by signed integer overflow by casting to the unsigned counterparts before the operation
Added subtract and multiply kernels (multiply required special handling for uint16 types to avoid promoting the operands to signed int32 types)
Test case parametrization supporting different argument and output types (not used for the current test cases, but will be useful for the upcoming arithmetic kernels)

follow-ups:

variants that raise on overflow
AssertArrayAlmostEqual for floating point comparisons

cpp/src/arrow/compute/kernels/codegen_internal.h

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

github-actions · 2020-06-03T12:01:19Z

https://issues.apache.org/jira/browse/ARROW-9022

bkietz · 2020-06-03T14:37:48Z

> we should use (int8, int8) -> int16

I'm not sure that's desirable. For one thing this leads to inconsistent handling of 64 bit integer types, which are currently allowed to overflow (NB: that means this kernel includes undefined behavior for int64).

There are a few other approaches we could take (ordered by personal preference):

define explicit overflow behavior for signed integer operands (for example if we declared that add(i8(a), i8(b)) will always be equivalent to i8(i16(a) + i16(b)) then we could instantiate only unsigned addition kernels)
raise an error on signed overflow
provide ArithmeticOptions::overflow_behavior and allow users to choose between these
require users to pass arguments which will not overflow
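The equivalence proposed in the first option can be checked exhaustively for int8. The following is a minimal sketch, not code from this PR: the helper names `AddViaWidening`, `AddViaUnsigned` and `WideningMatchesUnsigned` are hypothetical, and it assumes that narrowing conversions to signed types wrap modulo 2^N (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widen to int16, add, then truncate back to int8.
inline int8_t AddViaWidening(int8_t a, int8_t b) {
  return static_cast<int8_t>(static_cast<int16_t>(a) + static_cast<int16_t>(b));
}

// Hypothetical helper: add the operands as uint8, then reinterpret as int8.
inline int8_t AddViaUnsigned(int8_t a, int8_t b) {
  return static_cast<int8_t>(
      static_cast<uint8_t>(static_cast<uint8_t>(a) + static_cast<uint8_t>(b)));
}

// Compares both formulations over all 256 * 256 operand pairs.
inline bool WideningMatchesUnsigned() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      const int8_t x = static_cast<int8_t>(a);
      const int8_t y = static_cast<int8_t>(b);
      if (AddViaWidening(x, y) != AddViaUnsigned(x, y)) return false;
    }
  }
  return true;
}

// Usage example: 127 + 1 wraps around to -128 in both formulations.
inline void ExampleWideningUsage() {
  assert(AddViaWidening(127, 1) == -128);
  assert(AddViaUnsigned(127, 1) == -128);
}
```

If the two formulations agree for every pair, only unsigned addition kernels would need to be instantiated, which is the saving the first option describes.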

@wesm ?

This is probably nuanced enough to merit a mailing list discussion.

kszucs · 2020-06-03T15:00:06Z

> This is probably nuanced enough to merit a mailing list discussion.

Certainly.

What kind of overflow strategies could we have for other functions with a higher probability of overflowing, like product?

wesm · 2020-06-03T15:49:09Z

Per the mailing list discussion, I think we should apply the strategies used by open source analytic DBMS that we have access to and not think too hard about it:

Do signed arithmetic as unsigned to prevent UB
Do not promote small integers (except perhaps in multiplication, what do the DBMSes do?)
Do not check for overflows
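As a rough illustration of the first and third points, signed addition and subtraction can be routed through the unsigned counterpart type, where overflow is defined to wrap. This is a sketch under assumptions rather than the kernel code from this PR: `AddWrapped` and `SubtractWrapped` are hypothetical names, and the final conversion back to the signed type is assumed to be modular (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: add in the unsigned domain, where wrap-around is defined.
template <typename T>
T AddWrapped(T left, T right) {
  static_assert(std::is_integral<T>::value, "integer types only");
  using U = typename std::make_unsigned<T>::type;
  return static_cast<T>(
      static_cast<U>(static_cast<U>(left) + static_cast<U>(right)));
}

// Hypothetical helper: subtract in the unsigned domain.
template <typename T>
T SubtractWrapped(T left, T right) {
  static_assert(std::is_integral<T>::value, "integer types only");
  using U = typename std::make_unsigned<T>::type;
  return static_cast<T>(
      static_cast<U>(static_cast<U>(left) - static_cast<U>(right)));
}

// Usage example: results wrap around instead of invoking undefined behaviour.
inline void ExampleWrappedUsage() {
  assert(AddWrapped<int8_t>(100, 100) == -56);
  assert(AddWrapped<int32_t>(std::numeric_limits<int32_t>::max(), 1) ==
         std::numeric_limits<int32_t>::min());
}
```

No promotion of the output type and no overflow check are involved: the result keeps the operand type and simply wraps.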

kszucs · 2020-06-03T16:19:31Z

I'll update the PR as discussed, and defer the implicit promotions to a follow-up PR.

pitrou · 2020-06-03T19:33:06Z

As I said on the ML, I'm -1 on implicit promotion. We may discuss whether overflow should be detected or not. But trying to make the output type "big enough" is a can of worms.

wesm · 2020-06-03T19:38:51Z

Agreed. Implicit casts or type promotions are something that is typically negotiated by the orchestration/planning layer 1 to 2 levels above the kernels.

cpp/src/arrow/compute/api_scalar.h

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

cpp/src/arrow/compute/api_scalar.h

cpp/src/arrow/compute/kernels/codegen_internal.h

kszucs · 2020-06-04T21:22:26Z

The (uint16 * uint16) operation triggers the following ubsan error:

runtime error: signed integer overflow: 65535 * 65535 cannot be represented in type 'int'

According to this SO post we might need a special case for uint16 (or perhaps it depends on the arch) multiplication.
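The error comes from integer promotion: on platforms where `int` is 32 bits wide, both uint16 operands are promoted to signed `int` before the multiplication, and 65535 * 65535 does not fit in it. One possible special case, sketched here with the hypothetical name `MultiplyWrappedU16` (not necessarily the fix applied in this PR), widens the operands to uint32 explicitly so the product is computed in unsigned arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: multiply two uint16 values with wrap-around semantics.
// Widening to uint32_t first prevents the implicit promotion to signed int.
inline uint16_t MultiplyWrappedU16(uint16_t left, uint16_t right) {
  const uint32_t product =
      static_cast<uint32_t>(left) * static_cast<uint32_t>(right);
  // Truncate to the low 16 bits; unsigned narrowing is well defined.
  return static_cast<uint16_t>(product);
}

// Usage example: (2^16 - 1)^2 = 2^32 - 2^17 + 1, which is 1 modulo 2^16.
inline void ExampleMultiplyUsage() {
  assert(MultiplyWrappedU16(65535, 65535) == 1);
}
```

The product of two uint16 values always fits in 32 bits, so the widened multiplication itself never overflows.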

wesm · 2020-06-04T22:29:22Z

We could also simply not generate multiply kernels for 8- and 16-bit integer types.

kszucs · 2020-06-05T00:35:03Z

Just fixed it.

kszucs · 2020-06-05T00:37:46Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc


-  this->AssertAdd("[3, 2, 6]", "[1, 0, 2]", "[4, 2, 8]");
+  // this->AssertBinop(arrow::compute::Subtract,


The array equality check fails even though both sides print the same values. I'm not sure why; perhaps a floating-point precision problem?
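If floating-point precision is indeed the cause, a tolerance-based comparison would behave differently from exact equality. The sketch below uses plain `std::vector<double>` rather than Arrow arrays, and `AlmostEqual` with its `abs_tol` parameter is a hypothetical helper, not the `AssertArrayAlmostEqual` proposed as a follow-up.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise comparison with an absolute tolerance.
inline bool AlmostEqual(const std::vector<double>& a,
                        const std::vector<double>& b, double abs_tol = 1e-9) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::fabs(a[i] - b[i]) > abs_tol) return false;
  }
  return true;
}

// Usage example: 0.1 + 0.2 and 0.3 differ in their last bits, although both
// print as 0.3 at the default stream precision of 6 significant digits.
inline void ExampleAlmostEqualUsage() {
  const std::vector<double> computed{0.1 + 0.2};
  const std::vector<double> expected{0.3};
  assert(computed != expected);
  assert(AlmostEqual(computed, expected));
}
```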

github-actions · 2020-06-05T13:03:07Z

https://issues.apache.org/jira/browse/ARROW-9042

kszucs · 2020-06-05T14:48:03Z

@bkietz I think it is ready for review now; the build failures are also present on the master branch.

Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>

kszucs · 2020-06-08T22:02:51Z

Nice! Thanks for upgrading it!

wesm · 2020-06-09T01:12:35Z

cpp/src/arrow/compute/kernels/codegen_internal.h

-      *out_data++ = Op::template Call<OUT, ARG0, ARG1>(ctx, *arg0_data++, *arg1_data++);
-    }
-  }
-}


If you're going to remove this, you absolutely must write benchmarks to show that the more general version is not slower.

I'm writing a benchmark; the JIRA issue tracking it is https://issues.apache.org/jira/browse/ARROW-9079

@wesm

Quickly wanted to add a benchmark for the `Add` function to verify that no significant regressions were introduced by #7341

Before:

```
---------------------------------------------------------------------------------------
Benchmark                              Time        CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------
AddArrayArrayKernel/32768/10000       18 us      18 us        35892 null_percent=0.01 size=32.768k 1.67854GB/s
AddArrayArrayKernel/32768/100         19 us      19 us        37540 null_percent=1 size=32.768k 1.61941GB/s
AddArrayArrayKernel/32768/10          20 us      20 us        37049 null_percent=10 size=32.768k 1.55599GB/s
AddArrayArrayKernel/32768/2           20 us      20 us        35394 null_percent=50 size=32.768k 1.54512GB/s
AddArrayArrayKernel/32768/1           19 us      19 us        37901 null_percent=100 size=32.768k 1.63153GB/s
```

After:

```
---------------------------------------------------------------------------------------
Benchmark                              Time        CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------
AddArrayArrayKernel/32768/10000       19 us      19 us        36704 null_percent=0.01 size=32.768k 1.64619GB/s
AddArrayArrayKernel/32768/100         18 us      18 us        37194 null_percent=1 size=32.768k 1.67588GB/s
AddArrayArrayKernel/32768/10          18 us      18 us        36341 null_percent=10 size=32.768k 1.65205GB/s
AddArrayArrayKernel/32768/2           18 us      18 us        37502 null_percent=50 size=32.768k 1.662GB/s
AddArrayArrayKernel/32768/1           18 us      18 us        38622 null_percent=100 size=32.768k 1.66593GB/s
```

cc @wesm

Closes #7417 from kszucs/ARROW-9079

Lead-authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Wes McKinney <wesm@apache.org>
Signed-off-by: Wes McKinney <wesm@apache.org>

bkietz requested changes Jun 4, 2020

wesm reviewed Jun 4, 2020

kszucs force-pushed the add-kernel branch from 76e2bd4 to 5dfe608 Compare June 5, 2020 00:48

kszucs changed the title ~~ARROW-9022: [C++][Compute] Make Add function safe for numeric limits~~ ARROW-9042: [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior Jun 5, 2020

kszucs added 10 commits June 8, 2020 15:09

safe add

00bb29f

clang format

d2b37aa

remove flippedop

bd94196

deduce error

047d7d2

add error as a comment

f1fce32

left char behind

0a44c14

fix tests and use ScalarBinary exec function to support multiple inpu…

24835c2

…t shapes

specialize Add operation explicitly for smaller integer types, please…

b75327c

… ubsan

unchecked add, sub, mul

b5bb26f

format, remove comments

82b8dd0

kszucs and others added 12 commits June 8, 2020 15:09

fix op

af3e452

failing ubsan test for signed integer overflow

6d9924b

avoid undefined signed integer overflow

744c27d

more overflow tests

c12ca4f

Apply suggestions from Ben

72c33e6

Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>

renames

9f34b32

make uint16 an exceptional case for multiplication

85e98f9

fix arithmetics with scalar operands

83c442b

ubsan

3d7e5eb

compare as string workaround until we have AssertArraysAlmostEqual

19ad8ca

scalar - scalar tests

d81d00f

cleanup tests

9c2921a

bkietz force-pushed the add-kernel branch from f829049 to 9c2921a Compare June 8, 2020 20:42

bkietz approved these changes Jun 8, 2020

silence conversion warning error for MSVC

870bcc1

bkietz closed this in 66bc8f0 Jun 9, 2020

wesm reviewed Jun 9, 2020

kszucs mentioned this pull request Jun 12, 2020

ARROW-9079: [C++] Write benchmark for arithmetic kernels #7417

Closed

This was referenced Jul 9, 2020

[C++] Add/Sub/Mul arithmetic kernels with overflow check #25139

Closed

[C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior #25158

Closed

[C++] Write benchmark for arithmetic kernels #25194

Closed
