GH-35749: [C++] Handle run-end encoded filters in compute kernels #35750

Merged: 16 commits merged into apache:main from the plain_x_ree_filter branch on Jun 15, 2023

Conversation

felipecrv
Contributor

@felipecrv felipecrv commented May 24, 2023

Rationale for this change

Boolean arrays (bitmaps) used to represent filters in Arrow take 1 bit per boolean value. If the filter contains long runs, it can be run-end encoded to save even more memory.

Using POPCNT, a bitmap can be scanned efficiently one machine word at a time, so each step covers at most 64 logical values. A run-end encoded array gives the length of each run directly, and a run can be longer than a machine word.

These two observations make the case that, for the right dataset, REE filters can be more efficiently processed in compute kernels.
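
A minimal sketch of the two counting strategies contrasted above, written against plain standard-library types rather than Arrow's own. `SimpleREEFilter`, `CountSetBitsPlain`, and `CountSetBitsREE` are hypothetical names introduced for illustration, and nulls and array offsets are ignored.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count selected values in a plain bitmap filter, one 64-bit word at a time.
// Assumes any unused trailing bits in the last word are zero.
int64_t CountSetBitsPlain(const std::vector<uint64_t>& words) {
  int64_t count = 0;
  for (uint64_t word : words) {
    // Compilers can lower this to a POPCNT instruction when the target
    // supports it; each iteration still covers at most 64 logical values.
    count += static_cast<int64_t>(std::bitset<64>(word).count());
  }
  return count;
}

// Hypothetical, simplified REE filter: run_ends[i] is the exclusive logical
// end of run i, and values[i] is the boolean shared by the whole run.
struct SimpleREEFilter {
  std::vector<int64_t> run_ends;
  std::vector<bool> values;
};

// Count selected values in a REE filter: one addition per run, no matter
// how long the run is.
int64_t CountSetBitsREE(const SimpleREEFilter& filter) {
  int64_t count = 0;
  int64_t run_start = 0;
  for (size_t i = 0; i < filter.run_ends.size(); ++i) {
    const int64_t run_length = filter.run_ends[i] - run_start;
    if (filter.values[i]) count += run_length;
    run_start = filter.run_ends[i];
  }
  return count;
}
```

With a filter made of a few very long runs, the REE loop does a handful of iterations while the bitmap loop still visits every word, which is the "right dataset" case the rationale refers to.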

What changes are included in this PR?

  • GetFilterOutputSize can count the number of values emitted by a REE filter
  • GetTakeIndices can produce an array of logical indices from a REE filter
  • "array_filter" can handle REE filters
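
To make the second item concrete, here is a rough sketch of turning a REE boolean filter into logical take indices. `TakeIndicesFromREE` is a hypothetical helper, not Arrow's `GetTakeIndices`; it assumes no nulls and no array offset, and represents the filter as two plain vectors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expands a REE boolean filter into the logical indices
// it selects. run_ends[i] is the exclusive logical end of run i and values[i]
// is the boolean shared by the whole run.
std::vector<int64_t> TakeIndicesFromREE(const std::vector<int64_t>& run_ends,
                                        const std::vector<bool>& values) {
  std::vector<int64_t> indices;
  int64_t run_start = 0;
  for (size_t i = 0; i < run_ends.size(); ++i) {
    if (values[i]) {
      // Every logical position in a selected run becomes a take index.
      for (int64_t pos = run_start; pos < run_ends[i]; ++pos) {
        indices.push_back(pos);
      }
    }
    run_start = run_ends[i];
  }
  return indices;
}
```

The number of indices produced equals the sum of the lengths of the selected runs, which is the quantity an output-size function would compute up front.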

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

@felipecrv felipecrv marked this pull request as draft May 24, 2023 21:40
@felipecrv
Contributor Author

The code-moving commits in this PR are better reviewed by themselves in this separate PR: #35751

@felipecrv felipecrv force-pushed the plain_x_ree_filter branch 2 times, most recently from 4d04729 to ffa119d Compare June 1, 2023 03:33
@felipecrv felipecrv marked this pull request as ready for review June 1, 2023 03:43
@felipecrv felipecrv requested a review from pitrou June 1, 2023 14:13
Member

@pitrou pitrou left a comment

Very neat PR @felipecrv !

/// Pre-conditions guaranteed by the callers:
/// - i and j are valid indices into the values buffer
/// - the values in i and j are valid
bool CompareValuesAt(int64_t i, int64_t j) const {
Member

I don't understand what this is doing in REE utils? This is essentially representing value access in primitive arrays.

Contributor Author

Comparing values is a common operation in run-end encoding kernels. I would happily move it out of here if you have a suggestion.

const bool valid = bit_util::GetBit(filter_is_valid, i);
const bool emit = !valid || bit_util::GetBit(filter_selection, i);
if (emit) {
emit_segment(it.logical_position(), it.run_length(), valid);
Member

I'm not sure whether you tried to time these new kernels, but emit_segment being a std::function will come with its own overhead (I'm not sure by how much).

Another approach would be to give emit_segment a batch of ranges:

struct REEFilterSegment {
  int64_t position;
  int64_t segment_length;
  bool filter_valid; // false means emit none
};

using EmitREEFilterSegment =
    std::function<void(const REEFilterSegment*, int32_t num_segments)>;
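
One way the batching idea could look in practice is sketched below. It reuses the `REEFilterSegment` and `EmitREEFilterSegment` declarations from the suggestion; `VisitSelectedRunsBatched` and its parameters are hypothetical, and the filter is reduced to two plain vectors with no nulls.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct REEFilterSegment {
  int64_t position;
  int64_t segment_length;
  bool filter_valid;  // false means emit none
};

using EmitREEFilterSegment =
    std::function<void(const REEFilterSegment*, int32_t num_segments)>;

// Hypothetical driver: buffers selected runs and hands them to the callback
// in batches, so the std::function call cost is paid once per batch instead
// of once per run. Assumes batch_size > 0.
void VisitSelectedRunsBatched(const std::vector<int64_t>& run_ends,
                              const std::vector<bool>& values,
                              int32_t batch_size,
                              const EmitREEFilterSegment& emit_segments) {
  std::vector<REEFilterSegment> batch;
  batch.reserve(static_cast<size_t>(batch_size));
  int64_t run_start = 0;
  for (size_t i = 0; i < run_ends.size(); ++i) {
    if (values[i]) {
      batch.push_back({run_start, run_ends[i] - run_start, true});
      if (static_cast<int32_t>(batch.size()) == batch_size) {
        emit_segments(batch.data(), batch_size);
        batch.clear();
      }
    }
    run_start = run_ends[i];
  }
  // Flush whatever is left in a final, possibly smaller, batch.
  if (!batch.empty()) {
    emit_segments(batch.data(), static_cast<int32_t>(batch.size()));
  }
}
```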

Member

Of course, ideally a REE-encoded array has long enough runs to make REE encoding worthwhile...

Contributor Author

I developed the REExREE kernels first and measured the binary-size impact of using template-param lambdas: my compilation unit was the biggest in the whole project, so I decided to migrate to std::function to save multiple MBs in binary size.

image

The PlainxREE kernel is simpler, so maybe the impact wouldn't be so bad.

Of course, ideally a REE-encoded array has long enough runs to make REE encoding worthwhile...

Exactly. So I think it's better to start with std::function to avoid inflating the library size. If this gains adoption, we can revisit the kernels later.
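
The trade-off discussed here can be sketched with two versions of the same run-visiting loop. Both functions and the `EmitSegmentFn` alias are hypothetical and much simpler than the kernels in this PR; they only illustrate where the binary-size and call-overhead costs come from.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Variant A: the callback is a template parameter. Calls can be inlined, but
// the loop is instantiated once per distinct callable type, which grows the
// binary when many kernels pass different lambdas.
template <typename EmitSegment>
void VisitRunsTemplated(const std::vector<int64_t>& run_ends,
                        EmitSegment&& emit_segment) {
  int64_t run_start = 0;
  for (int64_t run_end : run_ends) {
    emit_segment(run_start, run_end - run_start);
    run_start = run_end;
  }
}

// Variant B: the callback is type-erased. Only one copy of the loop exists,
// but every call goes through std::function's indirection.
using EmitSegmentFn = std::function<void(int64_t position, int64_t length)>;

void VisitRunsErased(const std::vector<int64_t>& run_ends,
                     const EmitSegmentFn& emit_segment) {
  int64_t run_start = 0;
  for (int64_t run_end : run_ends) {
    emit_segment(run_start, run_end - run_start);
    run_start = run_end;
  }
}
```

When runs are long, the callback is invoked rarely relative to the number of values processed, so the per-call cost of variant B matters less.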

@@ -239,6 +309,43 @@ struct Selection {
}
};

if (is_ree_filter) {
Status status;
Member

Why not let emit_segment return a Status instead?

Contributor Author

When I started with the numeric kernels there was no way for emit_segment to fail. Making it return Status will create overhead in VisitPlainxREEFilterOutputSegments when it's dealing with primitives.

Should I worry about the overhead of checking for the status returned by std::function in the context of primitive kernels?

Member

That's a good question. Ideally there should be no overhead returning a successful Status, but in practice there is (we try to measure it in type_benchmark.cc). I'll let you choose what is best here.

Contributor Author

I added a bool return type that I check in the VisitPlainxREEFilterOutputSegments loop.
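
A small sketch of that pattern, under stated assumptions: `SimpleStatus` is a stand-in for `arrow::Status`, and `VisitRuns` and `AppendRuns` are hypothetical simplifications of the visitor and its caller. The visitor only checks a `bool`, while a caller that can fail keeps the detailed status in a captured variable.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Minimal stand-in for arrow::Status, for illustration only.
struct SimpleStatus {
  bool is_ok = true;
  std::string message;
  bool ok() const { return is_ok; }
};

// The callback returns false to ask the visitor to stop early.
using EmitSegmentFn = std::function<bool(int64_t position, int64_t length)>;

// Hypothetical visitor: stops at the first callback that returns false.
void VisitRuns(const std::vector<int64_t>& run_ends,
               const EmitSegmentFn& emit_segment) {
  int64_t run_start = 0;
  for (int64_t run_end : run_ends) {
    if (!emit_segment(run_start, run_end - run_start)) return;
    run_start = run_end;
  }
}

// A caller that can fail: the error is stored in a captured status and only
// its success flag travels back through the visitor.
SimpleStatus AppendRuns(const std::vector<int64_t>& run_ends, int64_t max_length,
                        std::vector<int64_t>* out_lengths) {
  SimpleStatus status;
  VisitRuns(run_ends, [&](int64_t /*position*/, int64_t length) {
    if (length > max_length) {
      status.is_ok = false;
      status.message = "run too long";
    } else {
      out_lengths->push_back(length);
    }
    return status.ok();
  });
  return status;
}
```

Callbacks that cannot fail simply return `true`, so the primitive kernels pay for a boolean check rather than for constructing and inspecting a status object on every segment.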

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 5, 2023
@felipecrv felipecrv requested a review from pitrou June 9, 2023 18:09
Member

@pitrou pitrou left a comment

Looks mostly good to me now, just a number of minor comments.

};
Status status;
VisitPlainxREEFilterOutputSegments(
filter, true, null_selection,
Member

Also include parameter name here.

Contributor Author

I reviewed all calls to VisitPlainxREEFilterOutputSegments and added the label.

status = emit_segment(position, segment_length, filter_valid);
return status.ok();
});
RETURN_NOT_OK(std::move(status));
Member

The std::move is a bit pedantic here IMHO...

Contributor Author

Passing an r-value to GenericToStatus avoids code bloat: the compiler does not need to generate memory allocation code to copy the Status string. The saving is multiplied by the number of template instances.

  #define ARROW_RETURN_NOT_OK(status)                                   \
    do {                                                                \
      ::arrow::Status __s = ::arrow::internal::GenericToStatus(status); \
      ARROW_RETURN_IF_(!__s.ok(), __s, ARROW_STRINGIFY(status));        \
    } while (false)

Contributor Author

Status::Status(const Status& s)
      : state_((s.state_ == NULLPTR) ? NULLPTR : new State(*s.state_)) {}

vs

  Status::Status(Status&& s) noexcept : state_(s.state_) { s.state_ = NULLPTR; }

being inlined.
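
The difference between the two constructors can be seen in a toy class with the same shape. `ToyStatus` is a hypothetical illustration, not Arrow's `Status`: an OK status is a null pointer, an error owns heap-allocated state, copying allocates, and moving only transfers the pointer.

```cpp
#include <string>
#include <utility>

// Toy Status-like class: OK is a null pointer, errors carry heap state.
class ToyStatus {
 public:
  ToyStatus() noexcept : state_(nullptr) {}
  explicit ToyStatus(std::string msg) : state_(new State{std::move(msg)}) {}

  // Copy: allocates a new State and copies the message when there is an error.
  ToyStatus(const ToyStatus& s)
      : state_(s.state_ == nullptr ? nullptr : new State(*s.state_)) {}

  // Move: takes over the pointer; no allocation, trivially inlinable.
  ToyStatus(ToyStatus&& s) noexcept : state_(s.state_) { s.state_ = nullptr; }

  ToyStatus& operator=(const ToyStatus&) = delete;
  ~ToyStatus() { delete state_; }

  bool ok() const { return state_ == nullptr; }
  // Exposed only so the example can show whether the state was reallocated.
  const void* state_address() const { return state_; }

 private:
  struct State {
    std::string msg;
  };
  State* state_;
};
```

Copying an error yields a second allocation at a different address, while moving leaves the source in the OK state and keeps the original allocation.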

Contributor Author

Which yes, is a pedantic std::move, but should I remove it? :D

Member

No need to remove it. I'm merely pointing out that it's not necessary to micro-optimize this particular end of function :-)

@felipecrv felipecrv requested a review from pitrou June 14, 2023 20:04
@felipecrv
Contributor Author

I rebased and force-pushed (instead of merging) to see if the macOS script that failed in CI now works.

Member

@pitrou pitrou left a comment

Thanks a lot @felipecrv !

@pitrou pitrou merged commit 475b5b9 into apache:main Jun 15, 2023
35 checks passed
@felipecrv felipecrv deleted the plain_x_ree_filter branch June 15, 2023 14:16
@conbench-apache-arrow

Conbench analyzed the 6 benchmark runs on commit 475b5b94.

There were 30 benchmark results indicating a performance regression. The full Conbench report has more details.


Successfully merging this pull request may close these issues.

[C++] Handle run-end encoded filters in compute kernels
2 participants