[Python] Extend RecordBatch.filter to take an Expression in addition to a boolean mask Array #39220

Closed
nph opened this issue Dec 13, 2023 · 2 comments

Comments

nph commented Dec 13, 2023

Describe the enhancement requested

Currently RecordBatches can only be filtered using a boolean mask Array, unlike Tables which can be filtered using either a mask or an Expression. It would be useful to allow RecordBatch.filter to also accept an Expression to make it consistent with Table.filter.

See also discussion here

Component(s)

Python

Fokko commented Jun 21, 2024

Looks like this is supported in Rust as well: https://arrow.apache.org/rust/arrow_select/filter/fn.filter_record_batch.html

jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Jun 25, 2024
wjones127 pushed a commit that referenced this issue Jun 26, 2024
… in addition to mask array (#43043)

### Rationale for this change

`Table.filter()` already accepted either a boolean mask array or a boolean expression. But the equivalent method on `RecordBatch` only accepted the array. This makes both methods consistent in accepting both types of mask.

### What changes are included in this PR?

Consolidate the `Table.filter` and `RecordBatch.filter` methods into a single shared method on the base class, and expand the `_filter_table` Acero helper to work with RecordBatch in addition to Table (ensuring that a batch is returned when the input is a batch).

### Are these changes tested?
Yes
* GitHub Issue: #39220

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
@wjones127

Issue resolved by pull request 43043
#43043

@wjones127 wjones127 added this to the 17.0.0 milestone Jun 26, 2024
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Jul 9, 2024
…ession in addition to mask array (apache#43043)
