
ARROW-13136: [C++] Add coalesce function #10608

Closed

Conversation

@lidavidm (Member) commented Jun 28, 2021

No description provided.


@lidavidm lidavidm force-pushed the arrow-13136 branch 2 times, most recently from 392e979 to 59bdbf7 Compare June 30, 2021 15:13
@lidavidm lidavidm marked this pull request as draft July 1, 2021 18:12
@lidavidm lidavidm marked this pull request as ready for review July 1, 2021 20:25
@lidavidm (Member, Author) commented Jul 2, 2021

Hmm, this has some Valgrind errors - taking a look.

@lidavidm (Member, Author) commented Jul 2, 2021

@github-actions crossbow submit conda-cpp-valgrind

@lidavidm (Member, Author) commented Jul 2, 2021

@github-actions crossbow submit test-conda-cpp-valgrind

@github-actions (bot) commented Jul 2, 2021

Revision: 08c6375

Submitted crossbow builds: ursacomputing/crossbow @ actions-560

Task status: test-conda-cpp-valgrind (GitHub Actions)

@lidavidm (Member, Author):

@bkietz do you have time to review this? Do we want to add a benchmark here?

@bkietz bkietz requested review from pitrou and bkietz and removed request for pitrou July 12, 2021 15:07
@bkietz (Member) left a comment

Thanks for doing this!

Yes, we definitely want a benchmark here. A few other comments:

cpp/src/arrow/compute/kernels/codegen_internal.cc (resolved)
cpp/src/arrow/util/bit_block_counter.h (resolved)
Comment on lines 981 to 1639
kernel.null_handling = NullHandling::COMPUTED_NO_PREALLOCATE;
kernel.mem_allocation = MemAllocation::PREALLOCATE;
@bkietz (Member):

I think we should always be able to preallocate the validity bitmap in addition to the data/offsets buffer, which will enable the preallocate_contiguous_ optimization for fixed width types.

Suggested change
kernel.null_handling = NullHandling::COMPUTED_NO_PREALLOCATE;
kernel.mem_allocation = MemAllocation::PREALLOCATE;
kernel.null_handling = NullHandling::COMPUTED_PREALLOCATE;
kernel.mem_allocation = MemAllocation::PREALLOCATE;
if (var width type) {
kernel.can_write_into_slices = false;
}
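
The if (var width type) guard above is pseudocode; one concrete spelling, assuming the registration code has the output type id in hand (ConfigureKernel is a hypothetical helper; is_base_binary_like is the real trait from arrow/type_traits.h), might be:

#include "arrow/compute/kernel.h"
#include "arrow/type_traits.h"

void ConfigureKernel(arrow::compute::ScalarKernel* kernel, arrow::Type::type type_id) {
  kernel->null_handling = arrow::compute::NullHandling::COMPUTED_PREALLOCATE;
  kernel->mem_allocation = arrow::compute::MemAllocation::PREALLOCATE;
  // Variable-width outputs (binary/string) cannot be written into preallocated
  // slices, since their offsets/data buffers grow as values are appended.
  if (arrow::is_base_binary_like(type_id)) {
    kernel->can_write_into_slices = false;
  }
}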

cpp/src/arrow/compute/kernels/scalar_if_else.cc (resolved)
Comment on lines 967 to 968
std::shared_ptr<Array> temp_output;
RETURN_NOT_OK(builder.Finish(&temp_output));
@bkietz (Member):

Suggested change
std::shared_ptr<Array> temp_output;
RETURN_NOT_OK(builder.Finish(&temp_output));
ARROW_ASSIGN_OR_RAISE(auto temp_output, builder.Finish());

@lidavidm (Member, Author):

Performance is fairly meh: perf shows a decent amount of time spent in CopyValues/CopyOneValue just re-examining the Datum variant, calling Buffer::data(), etc.

-------------------------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------
CoalesceBench64/1048576/0    19254992 ns     19255104 ns           37 bytes_per_second=1.62295G/s
CoalesceBench64/1048576/99   19281752 ns     19281409 ns           37 bytes_per_second=1.62058G/s

Trying an approach based on VisitSetBitRunsVoid may be beneficial, and/or manually hoisting the scalar-vs-array detection and having separate CopyScalarValues and CopyArrayValues.
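
For context, a rough sketch of the VisitSetBitRunsVoid idea for a fixed-width type like int64 (CopyValidRuns is a hypothetical name; a real kernel would also have to merge in the output's validity bitmap):

#include <cstring>
#include "arrow/array/data.h"
#include "arrow/util/bit_run_reader.h"

void CopyValidRuns(const arrow::ArrayData& input, int64_t* out_values) {
  const auto* in_values = input.GetValues<int64_t>(1);  // offset already applied
  if (input.buffers[0] == nullptr) {
    // No validity bitmap: a single block copy suffices.
    std::memcpy(out_values, in_values, input.length * sizeof(int64_t));
    return;
  }
  // Visit contiguous runs of set validity bits so each run becomes one
  // memcpy instead of a per-bit loop.
  arrow::internal::VisitSetBitRunsVoid(
      input.buffers[0]->data(), input.offset, input.length,
      [&](int64_t position, int64_t run_length) {
        std::memcpy(out_values + position, in_values + position,
                    run_length * sizeof(int64_t));
      });
}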

@lidavidm (Member, Author):

Alright, this buys us ~50% more performance by specializing the common case:

--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
CoalesceBench64/1048576/0           13335778 ns     13335766 ns           52 bytes_per_second=2.34332G/s
CoalesceBench64/1048576/99          13562011 ns     13561826 ns           51 bytes_per_second=2.30404G/s
CoalesceNonNullBench64/1048576/0    13552482 ns     13552397 ns           48 bytes_per_second=2.30587G/s
CoalesceNonNullBench64/1048576/99   13572235 ns     13572002 ns           51 bytes_per_second=2.30232G/s
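
Roughly, the specialization amounts to unboxing the Datum once per run instead of once per element; a minimal sketch with a hypothetical name (FillFrom), for int64 and ignoring validity:

#include <algorithm>
#include <cstring>
#include "arrow/datum.h"
#include "arrow/scalar.h"
#include "arrow/util/checked_cast.h"

void FillFrom(const arrow::Datum& datum, int64_t offset, int64_t length,
              int64_t* out_values) {
  if (datum.is_scalar()) {
    // Unbox the scalar once, then a tight fill loop.
    const auto& scalar =
        arrow::internal::checked_cast<const arrow::Int64Scalar&>(*datum.scalar());
    std::fill(out_values + offset, out_values + offset + length, scalar.value);
  } else {
    // Unbox the array once, then a single block copy.
    const auto* in_values = datum.array()->GetValues<int64_t>(1);
    std::memcpy(out_values + offset, in_values + offset, length * sizeof(int64_t));
  }
}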

@bkietz (Member) commented Jul 13, 2021

> Trying an approach based on VisitSetBitRunsVoid

IIUC this would require a varargs version of OptionalBitBlockCounter or Bitmap::VisitWords, which would probably be generally useful as varargs compute functions continue to proliferate.
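
For reference, the existing Bitmap::VisitWords (arrow/util/bitmap.h) already visits a compile-time-fixed set of bitmaps a 64-bit word at a time; the varargs version would lift that count to runtime. A sketch of the fixed-N form, assuming the visitor receives one std::array<uint64_t, N> per step (CountFillableWords is a hypothetical name):

#include <array>
#include <cstdint>
#include "arrow/util/bitmap.h"

int64_t CountFillableWords(const arrow::internal::Bitmap& out_valid,
                           const arrow::internal::Bitmap& in_valid) {
  int64_t words = 0;
  arrow::internal::Bitmap bitmaps[] = {out_valid, in_valid};
  arrow::internal::Bitmap::VisitWords(bitmaps, [&](std::array<uint64_t, 2> w) {
    // Count words holding bits that are valid in the input but not yet
    // set in the output validity bitmap.
    if ((~w[0] & w[1]) != 0) ++words;
  });
  return words;
}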

@lidavidm (Member, Author):

Basically, there was a lot of overhead in the fallback loop of "for each offset in the block, if the bit is set, copy one element", because 1) the 'copy one element' function used CopyBitmap, which has a ton of overhead when copying a single bit, and 2) unboxing the array on every iteration was costly (the profiler showed that even Buffer::data()'s check for whether the buffer is on-CPU was hot). I've now specialized things to avoid most of that overhead.

The reason I wanted something like VisitSetBitRunsVoid was to go a step further and always perform block copies instead of falling back to one-element-at-a-time copies. But yes, then it needs to be able to combine two bitmaps with AndNot (we want runs of bits where !output_valid & input_valid).
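
One way to get those runs today, at the cost of a temporary bitmap, is to materialize input_valid & ~output_valid with BitmapAndNot (alongside BitmapAnd/BitmapOr in arrow/util/bitmap_ops.h) and then walk its set-bit runs; a fused, allocation-free version is what the varargs visitor above would enable. A sketch (VisitFillableRuns is a hypothetical name):

#include <cstdint>
#include <memory>
#include "arrow/buffer.h"
#include "arrow/memory_pool.h"
#include "arrow/result.h"
#include "arrow/status.h"
#include "arrow/util/bit_run_reader.h"
#include "arrow/util/bitmap_ops.h"

arrow::Status VisitFillableRuns(const uint8_t* out_valid, const uint8_t* in_valid,
                                int64_t length) {
  // input_valid & ~output_valid: the slots this input could still fill.
  ARROW_ASSIGN_OR_RAISE(
      std::shared_ptr<arrow::Buffer> fillable,
      arrow::internal::BitmapAndNot(arrow::default_memory_pool(),
                                    in_valid, /*left_offset=*/0,
                                    out_valid, /*right_offset=*/0,
                                    length, /*out_offset=*/0));
  arrow::internal::VisitSetBitRunsVoid(
      fillable->data(), /*offset=*/0, length,
      [](int64_t position, int64_t run_length) {
        // Block-copy input[position, position + run_length) into the output here.
      });
  return arrow::Status::OK();
}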

@lidavidm (Member, Author):

I would kind of prefer to get all these kernels merged and consolidated before trying to micro-optimize them, though, given that they've been around for a while and all use similar helper code (which is now starting to diverge slightly as I look at optimizing).

@lidavidm (Member, Author):

@ursabot please benchmark lang=C++

@ursabot (bot) commented Jul 14, 2021

Benchmark runs are scheduled for baseline = 9c6d417 and contender = e32cf48. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
[Skipped ⚠️ Only ['Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
[Finished ⬇️0.53% ⬆️0.05%] ursa-thinkcentre-m75q (mimalloc)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@bkietz (Member) left a comment

This LGTM, but could you rewrite a bit for readability?

Comment on lines 835 to 1489
if ((datum.is_scalar() && datum.scalar()->is_valid) ||
(datum.is_array() && !datum.array()->MayHaveNulls())) {
BitBlockCounter counter(out_valid, out_offset, batch.length);
int64_t offset = 0;
while (offset < batch.length) {
const auto block = counter.NextWord();
if (block.NoneSet()) {
CopyValues<Type>(datum, offset, block.length, out_valid, out_values,
out_offset + offset);
} else if (!block.AllSet()) {
for (int64_t j = 0; j < block.length; ++j) {
if (!BitUtil::GetBit(out_valid, out_offset + offset + j)) {
CopyValues<Type>(datum, offset + j, 1, out_valid, out_values,
out_offset + offset + j);
}
}
}
offset += block.length;
}
break;
} else if (datum.is_array()) {
@bkietz (Member):

Could you de-nest some of this branching by extracting some functions and interspersing some whitespace and comments? This is a little difficult to read. Something like:

Suggested change
if ((datum.is_scalar() && datum.scalar()->is_valid) ||
(datum.is_array() && !datum.array()->MayHaveNulls())) {
BitBlockCounter counter(out_valid, out_offset, batch.length);
int64_t offset = 0;
while (offset < batch.length) {
const auto block = counter.NextWord();
if (block.NoneSet()) {
CopyValues<Type>(datum, offset, block.length, out_valid, out_values,
out_offset + offset);
} else if (!block.AllSet()) {
for (int64_t j = 0; j < block.length; ++j) {
if (!BitUtil::GetBit(out_valid, out_offset + offset + j)) {
CopyValues<Type>(datum, offset + j, 1, out_valid, out_values,
out_offset + offset + j);
}
}
}
offset += block.length;
}
break;
} else if (datum.is_array()) {
if ((datum.is_scalar() && datum.scalar()->is_valid) ||
(datum.is_array() && !datum.array()->MayHaveNulls())) {
// all-valid scalar or array
CopyValuesAllValid<Type>(datum, batch.length, out_valid, out_values, out_offset);
break;
}
// null scalar; skip
if (datum.is_scalar()) continue;

@lidavidm (Member, Author):

Broke up the main function a bit.

@lidavidm (Member, Author):

Rebased (wow, that was more painful than I wanted it to be).

@lidavidm (Member, Author):

Rebased again to fix the conflict with the make_struct change.

@bkietz (Member) left a comment

LGTM, thanks!
