GH-35749: [C++] Handle run-end encoded filters in compute kernels #35750

felipecrv · 2023-05-24T21:40:49Z

Rationale for this change

Boolean arrays (bitmaps) used to represent filters in Arrow take 1 bit per boolean value. If the filter contains long runs, it can be run-end encoded to save even more memory.

Using POPCNT, a bitmap can be scanned efficiently for runs of fewer than 64 logical values, but a run-end encoded array gives the length of each run directly, and a single run can extend beyond the word size.

These two observations make the case that, for the right dataset, REE filters can be more efficiently processed in compute kernels.
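To make the comparison concrete, here is a minimal sketch of both representations. It is not the code from this PR: `CountSelectedInBitmap`, `ReeFilterSketch`, `CountSelectedInRee`, and `LogicalIndicesFromRee` are hypothetical names, the REE filter is modeled as two plain vectors rather than Arrow arrays, and nulls are ignored. `std::bitset<64>::count()` stands in for POPCNT, which compilers commonly emit for it when the target supports the instruction.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain filter: one bit per logical value, packed in 64-bit words.
// Each iteration handles at most 64 logical values.
int64_t CountSelectedInBitmap(const std::vector<uint64_t>& words) {
  int64_t count = 0;
  for (uint64_t word : words) {
    count += static_cast<int64_t>(std::bitset<64>(word).count());
  }
  return count;
}

// Hypothetical REE filter: run_ends[i] is the exclusive logical end of run i,
// values[i] is the boolean shared by every position in that run.
struct ReeFilterSketch {
  std::vector<int32_t> run_ends;
  std::vector<bool> values;
};

// Each iteration handles a whole run, however long it is.
int64_t CountSelectedInRee(const ReeFilterSketch& filter) {
  assert(filter.run_ends.size() == filter.values.size());
  int64_t count = 0;
  int64_t prev_end = 0;
  for (std::size_t i = 0; i < filter.run_ends.size(); ++i) {
    const int64_t run_length = filter.run_ends[i] - prev_end;
    if (filter.values[i]) count += run_length;
    prev_end = filter.run_ends[i];
  }
  return count;
}

// The same run boundaries also give the logical indices of selected values.
std::vector<int64_t> LogicalIndicesFromRee(const ReeFilterSketch& filter) {
  assert(filter.run_ends.size() == filter.values.size());
  std::vector<int64_t> indices;
  int64_t prev_end = 0;
  for (std::size_t i = 0; i < filter.run_ends.size(); ++i) {
    if (filter.values[i]) {
      for (int64_t pos = prev_end; pos < filter.run_ends[i]; ++pos) {
        indices.push_back(pos);
      }
    }
    prev_end = filter.run_ends[i];
  }
  return indices;
}
```

For a filter with three runs ending at 2, 5, and 7 and values true, false, true, the REE loops run three times regardless of how long each run is, whereas the bitmap loop scales with the logical length of the filter.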

What changes are included in this PR?

- GetFilterOutputSize can count the number of emitted values from a REE filter
- GetTakeIndices can produce an array of logical indices from a REE filter
- "array_filter" can handle REE filters

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

Closes: [C++] Handle run-end encoded filters in compute kernels #35749

github-actions · 2023-05-24T21:41:11Z

Closes: [C++] Handle run-end encoded filters in compute kernels #35749

felipecrv · 2023-05-24T21:48:29Z

The code-moving commits in this PR are better reviewed by themselves in this separate PR: #35751

pitrou

Very neat PR @felipecrv !

pitrou · 2023-06-05T08:56:41Z

cpp/src/arrow/compute/kernels/ree_util_internal.h

+  /// Pre-conditions guaranteed by the callers:
+  /// - i and j are valid indices into the values buffer
+  /// - the values in i and j are valid
+  bool CompareValuesAt(int64_t i, int64_t j) const {


I don't understand what this is doing in REE utils? This is essentially representing value access in primitive arrays.

Comparing values is commonly used when run-end encoding kernels. I would happily move it out of here if you have a suggestion.

pitrou · 2023-06-05T09:06:50Z

cpp/src/arrow/compute/kernels/vector_selection_internal.cc

+        const bool valid = bit_util::GetBit(filter_is_valid, i);
+        const bool emit = !valid || bit_util::GetBit(filter_selection, i);
+        if (emit) {
+          emit_segment(it.logical_position(), it.run_length(), valid);


I'm not sure whether you tried to time these new kernels, but emit_segment being a std::function will come with its own overhead (I'm not sure by how much).

Another approach would be to give emit_segment a batch of ranges:

struct REEFilterSegment {
  int64_t position;
  int64_t segment_length;
  bool filter_valid;  // false means emit none
};

using EmitREEFilterSegment =
    std::function<void(const REEFilterSegment*, int32_t num_segments)>;

Of course, ideally a REE-encoded array has long enough runs to make REE encoding worthwhile...

I developed the REExREE kernels first and measured the binary-size impact of using template-param lambdas: my compilation unit was the biggest in the whole project, so I decided to migrate to std::function to save multiple MBs in binary size.

The PlainxREE kernel is simpler, so maybe the impact wouldn't be so bad.

> Of course, ideally a REE-encoded array has long enough runs to make REE encoding worthwhile...

Exactly. So I think it's better to start with std::function to avoid inflating the library size. If this gains adoption, we can revisit the kernels later.
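For readers following the thread, the batching idea suggested above could look roughly like the sketch below. `REEFilterSegment` and `EmitREEFilterSegment` are taken from the review comment; everything else is an assumption made for illustration: the driver name `VisitSegmentsBatched`, the batch size, and the modeling of the REE filter as two plain vectors with no nulls. The point is only that the `std::function` is invoked once per batch instead of once per run.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct REEFilterSegment {
  int64_t position;
  int64_t segment_length;
  bool filter_valid;  // false means emit none
};

using EmitREEFilterSegment =
    std::function<void(const REEFilterSegment*, int32_t num_segments)>;

// Hypothetical driver: buffers selected runs and hands them to the callback
// in batches, so the type-erased call cost is paid once per batch.
void VisitSegmentsBatched(const std::vector<int32_t>& run_ends,
                          const std::vector<bool>& values,
                          const EmitREEFilterSegment& emit_segments) {
  assert(run_ends.size() == values.size());
  constexpr int32_t kBatchSize = 4;  // deliberately small for illustration
  std::array<REEFilterSegment, kBatchSize> batch{};
  int32_t num_buffered = 0;
  int64_t prev_end = 0;
  for (std::size_t i = 0; i < run_ends.size(); ++i) {
    if (values[i]) {
      batch[num_buffered++] =
          REEFilterSegment{prev_end, run_ends[i] - prev_end, true};
      if (num_buffered == kBatchSize) {
        emit_segments(batch.data(), num_buffered);
        num_buffered = 0;
      }
    }
    prev_end = run_ends[i];
  }
  // Flush whatever is left in the last, partially filled batch.
  if (num_buffered > 0) emit_segments(batch.data(), num_buffered);
}
```

Whether this pays off depends on the data: with long runs there are few calls to amortize, which is the trade-off both reviewers acknowledge.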

pitrou · 2023-06-05T09:13:34Z

cpp/src/arrow/compute/kernels/vector_selection_internal.cc

@@ -239,6 +309,43 @@ struct Selection {
      }
    };

+    if (is_ree_filter) {
+      Status status;


Why not let emit_segment return a Status instead?

When I started with the numeric kernels there was no way for emit_segment to fail. Making it return Status will create overhead in VisitPlainxREEFilterOutputSegments when it's dealing with primitives.

Should I worry about the overhead of checking for the status returned by std::function in the context of primitive kernels?

That's a good question. Ideally there should be no overhead returning a successful Status, but in practice there is (we try to measure it in type_benchmark.cc). I'll let you choose what is best here.

I added a bool return type that I check in the VisitPlainxREEFilterOutputSegments loop.
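The pattern settled on in this thread, a `bool`-returning callback with the `Status` kept outside the visitor, can be sketched as follows. This is an illustration under assumptions, not the PR's code: `MiniStatus` is a hypothetical stand-in for `arrow::Status`, `VisitSegments` and `CopySelected` are made-up names, and the filter is two plain vectors with no nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Status, just enough for the sketch.
struct MiniStatus {
  bool is_ok = true;
  std::string message;
  bool ok() const { return is_ok; }
  static MiniStatus Invalid(std::string msg) { return {false, std::move(msg)}; }
};

// The callback returns false to stop the visit early.
using EmitSegment =
    std::function<bool(int64_t position, int64_t segment_length, bool filter_valid)>;

// Hypothetical visitor: only a cheap bool is checked inside the loop.
void VisitSegments(const std::vector<int32_t>& run_ends,
                   const std::vector<bool>& values,
                   const EmitSegment& emit_segment) {
  assert(run_ends.size() == values.size());
  int64_t prev_end = 0;
  for (std::size_t i = 0; i < run_ends.size(); ++i) {
    if (values[i] && !emit_segment(prev_end, run_ends[i] - prev_end, true)) {
      return;
    }
    prev_end = run_ends[i];
  }
}

// A caller that can fail records the failure in a captured status.
MiniStatus CopySelected(const std::vector<int>& input,
                        const std::vector<int32_t>& run_ends,
                        const std::vector<bool>& values,
                        std::size_t capacity, std::vector<int>* out) {
  MiniStatus status;
  VisitSegments(run_ends, values,
                [&](int64_t position, int64_t segment_length, bool) {
                  if (out->size() + static_cast<std::size_t>(segment_length) >
                      capacity) {
                    status = MiniStatus::Invalid("output capacity exceeded");
                    return false;
                  }
                  out->insert(out->end(), input.begin() + position,
                              input.begin() + position + segment_length);
                  return true;
                });
  return status;
}
```

Callbacks that cannot fail simply return `true`, so the primitive kernels pay for a boolean test per segment rather than for constructing and inspecting a status object.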

pitrou

Looks mostly good to me now, just a number of minor comments.

pitrou · 2023-06-14T14:34:05Z

cpp/src/arrow/compute/kernels/vector_selection_filter_internal.cc

+    };
+    Status status;
+    VisitPlainxREEFilterOutputSegments(
+        filter, true, null_selection,


Also include parameter name here.

I reviewed all calls to VisitPlainxREEFilterOutputSegments and added the label.

pitrou · 2023-06-14T14:34:29Z

cpp/src/arrow/compute/kernels/vector_selection_filter_internal.cc

+          status = emit_segment(position, segment_length, filter_valid);
+          return status.ok();
+        });
+    RETURN_NOT_OK(std::move(status));


The std::move is a bit pedantic here IMHO...

GenericToStatus takes an r-value and that avoids code-bloat -- no need to generate memory allocation code to copy the Status string. This is multiplied by the number of template instances.

#define ARROW_RETURN_NOT_OK(status)                                   \
  do {                                                                \
    ::arrow::Status __s = ::arrow::internal::GenericToStatus(status); \
    ARROW_RETURN_IF_(!__s.ok(), __s, ARROW_STRINGIFY(status));        \
  } while (false)

Status::Status(const Status& s) : state_((s.state_ == NULLPTR) ? NULLPTR : new State(*s.state_)) {}

vs

Status::Status(Status&& s) noexcept : state_(s.state_) { s.state_ = NULLPTR; }

being inlined.

Which yes, is a pedantic std::move, but should I remove it? :D

No need to remove it. I'm merely pointing out that it's not necessary to micro-optimize this particular end of function :-)
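The copy-versus-move difference discussed here can be reproduced with a small self-contained sketch. `MiniStatus` below is a hypothetical class that mirrors the two constructors quoted above, and `ToStatus` is an assumed pair of overloads standing in for `GenericToStatus`; the global counter exists only so the deep copies can be observed.

```cpp
#include <cassert>
#include <string>
#include <utility>

int g_deep_copies = 0;  // counts heap-allocating copies, for illustration only

// Hypothetical status type: a null state means OK, otherwise it owns a message.
class MiniStatus {
 public:
  MiniStatus() noexcept : state_(nullptr) {}
  explicit MiniStatus(std::string msg) : state_(new State{std::move(msg)}) {}

  // Copy: allocates and duplicates the message when there is an error.
  MiniStatus(const MiniStatus& s)
      : state_(s.state_ == nullptr ? nullptr : new State(*s.state_)) {
    if (state_ != nullptr) ++g_deep_copies;
  }
  // Move: steals the pointer, no allocation.
  MiniStatus(MiniStatus&& s) noexcept : state_(s.state_) { s.state_ = nullptr; }

  MiniStatus& operator=(const MiniStatus&) = delete;
  ~MiniStatus() { delete state_; }

  bool ok() const { return state_ == nullptr; }

 private:
  struct State {
    std::string msg;
  };
  State* state_;
};

// Assumed stand-ins for the lvalue and rvalue paths of a GenericToStatus-like helper.
inline const MiniStatus& ToStatus(const MiniStatus& s) { return s; }
inline MiniStatus ToStatus(MiniStatus&& s) { return std::move(s); }
```

Initializing a new status from `ToStatus(err)` goes through the copy constructor, while `ToStatus(std::move(err))` goes through the move constructor and leaves `err` in the OK state. As noted in the reply above, the saving only matters on the error path at the end of the function.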

felipecrv · 2023-06-14T23:38:12Z

I rebased and force-pushed (instead of merging) to see if the macOS script that failed in CI now works.

pitrou

Thanks a lot @felipecrv !

conbench-apache-arrow · 2023-06-17T06:35:16Z

Conbench analyzed the 6 benchmark runs on commit 475b5b94.

There were 30 benchmark results indicating a performance regression:

Commit Run on arm64-m6g-linux-compute at 2023-06-15 20:50:30Z
- params=1048576/6, source=cpp-micro, suite=arrow-compute-vector-selection-benchmark
- params=1048576/1, source=cpp-micro, suite=arrow-compute-vector-selection-benchmark
and 28 more (see the report linked below)

The full Conbench report has more details.

felipecrv requested a review from westonpace as a code owner May 24, 2023 21:40

felipecrv marked this pull request as draft May 24, 2023 21:40

github-actions bot added the Component: C++ and awaiting review labels May 24, 2023

felipecrv mentioned this pull request May 25, 2023

GH-35765: [C++] Split vector_selection.cc into more compilation units #35751

Merged

1 task

felipecrv force-pushed the plain_x_ree_filter branch 2 times, most recently from 4d04729 to ffa119d Compare June 1, 2023 03:33

felipecrv marked this pull request as ready for review June 1, 2023 03:43

felipecrv requested a review from pitrou June 1, 2023 14:13

felipecrv force-pushed the plain_x_ree_filter branch from ffa119d to a0bb513 Compare June 2, 2023 16:48

pitrou requested changes Jun 5, 2023

github-actions bot added the awaiting committer review label and removed the awaiting review label Jun 5, 2023

felipecrv force-pushed the plain_x_ree_filter branch from ece8e62 to 3f52581 Compare June 5, 2023 23:20

felipecrv requested a review from pitrou June 9, 2023 18:09

pitrou reviewed Jun 14, 2023

felipecrv requested a review from pitrou June 14, 2023 20:04

felipecrv and others added 11 commits June 14, 2023 20:37

Make GetFilterOutputSize handle REE filters

bce5be4

Make GetTakeIndices handle REE filters

7e256d4

ree_util: Add CompareValuesAt to ReadWriteValue

7a2f2ac

Add another convenient factory method for RunEndEncoded matcher

6cd4875

Make 'array_filter' handle REE filters

3057ef0

Simplification: Remove out_offset_ variable as it's always 0

48fd629

Apply suggestions from code review

1473ea7

Co-authored-by: Antoine Pitrou <pitrou@free.fr>

Fix fast path comment

fa4e060

Extract CheckTakeCase test utility

cf29d2a

Remove cur_offset variable

7a1f4bd

Revert "Simplification: Remove out_offset_ variable as it's always 0"

3b88445

This reverts commit a0bb513.

felipecrv added 5 commits June 14, 2023 20:37

Review all writes and make the account for out_offset

188aff0

Return bool from emit_segment calls

a27b20d

fix linter error

d76883b

Json -> JSON, CheckTakeCase -> CheckTakeIndicesCase

429fe5e

Review all filter_may_have_nulls passed to VisitPlainxREEFilterOutputSegments

ad05570

felipecrv force-pushed the plain_x_ree_filter branch from bfcb695 to ad05570 Compare June 14, 2023 23:37

pitrou approved these changes Jun 15, 2023

pitrou merged commit 475b5b9 into apache:main Jun 15, 2023
35 checks passed

felipecrv deleted the plain_x_ree_filter branch June 15, 2023 14:16
