[C++] Investigate potential performance improvements for the filter node

Early runs with Arrowbench and the OT PR (https://github.com/apache/arrow/pull/12100) show that we spend a fair amount of time in filter nodes when running TPC-H queries.  There are a few improvements we know could be made to our current filtering approach.  I'm creating this parent issue to help categorize and track them:

- ~~We can use a selection vector in our filters to reduce the amount of materialization needed.  Long term we may want to support a selection vector throughout the exec plan, but a good start would be to use it when we encounter a chain of filters (e.g. x < 10 && x > 5 && y < 20) to avoid excess materialization.~~
- If a filter is very selective then we may end up outputting a lot of very small batches.  We could probably hold onto the data at the filter node until we've accumulated enough rows for a decent-sized batch.
- The filter node is currently creating new thread tasks instead of appending its work onto an existing thread task.
- If we have a chain of filters we could potentially use runtime selectivity statistics / estimates to reorder our filters so that the most selective filters are evaluated first.
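
To make the batching and reordering ideas above more concrete, here is a rough sketch. It does not use Arrow's actual `ExecNode` or `ExecBatch` APIs: `Batch`, `BatchCoalescer`, `FilterStats`, and `OrderBySelectivity` are hypothetical names invented for illustration, and a batch is modelled as a plain vector holding one integer column. The first part buffers filtered rows until a minimum row count is reached; the second orders filters by their observed pass rate so that the most selective one runs first.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a record batch: a single int64 column.
using Batch = std::vector<int64_t>;

// Buffers the (possibly tiny) output of a selective filter and only emits
// a batch once at least `min_rows` rows have accumulated.
class BatchCoalescer {
 public:
  explicit BatchCoalescer(std::size_t min_rows) : min_rows_(min_rows) {}

  // Appends filtered rows. Returns a batch once the threshold is reached,
  // otherwise std::nullopt.
  std::optional<Batch> Push(const Batch& filtered) {
    buffer_.insert(buffer_.end(), filtered.begin(), filtered.end());
    if (buffer_.size() < min_rows_) return std::nullopt;
    return Take();
  }

  // Flushes whatever is left at end of input; this last batch may be
  // smaller than `min_rows`.
  std::optional<Batch> Finish() {
    if (buffer_.empty()) return std::nullopt;
    return Take();
  }

 private:
  // Hands the buffered rows to the caller and leaves the buffer empty.
  Batch Take() {
    Batch out;
    out.swap(buffer_);
    return out;
  }

  std::size_t min_rows_;
  Batch buffer_;
};

// Runtime counters for one filter in a chain (hypothetical structure).
struct FilterStats {
  int id;
  std::size_t rows_in = 0;
  std::size_t rows_out = 0;

  // Fraction of rows that pass the filter; lower means more selective.
  // A filter with no observations is treated as passing everything.
  double PassRate() const {
    return rows_in == 0 ? 1.0 : static_cast<double>(rows_out) / rows_in;
  }
};

// Reorders the chain so that the most selective filter is evaluated first.
inline void OrderBySelectivity(std::vector<FilterStats>& filters) {
  std::stable_sort(filters.begin(), filters.end(),
                   [](const FilterStats& a, const FilterStats& b) {
                     return a.PassRate() < b.PassRate();
                   });
}
```

A real implementation would also have to weigh the extra latency and memory of holding rows back, and the per-row cost of each filter rather than its pass rate alone; the sketch ignores both.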

**Reporter**: [Weston Pace](https://issues.apache.org/jira/browse/ARROW-15519) / @westonpace
#### Subtasks:
- [ ] [[C++] Investigate batching filter node output](https://github.com/apache/arrow/issues/30994)
- [ ] [[C++] Investigate reporting filter selectivity for filter order optimization](https://github.com/apache/arrow/issues/30995)
#### Related issues:
- [[C++][Compute] Replace ExecNode::InputReceived with ::MakeTask (Part 2)](https://github.com/apache/arrow/issues/30492) (is related to)

<sub>**Note**: *This issue was originally created as [ARROW-15519](https://issues.apache.org/jira/browse/ARROW-15519). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

[C++] Investigate potential performance improvements for the filter node #30992