
feat(functions-nested): add array_filter higher-order function #21895

Open
ologlogn wants to merge 1 commit into apache:main from ologlogn:array-filter-lambda

ologlogn commented Apr 28, 2026

Which issue does this PR close?

Partially addresses #14509 — implements array_filter / list_filter.

Rationale for this change

array_transform (#21679) added the first HigherOrderUDF. array_filter is the natural companion: filter array elements with a boolean lambda, matching Spark filter / DuckDB list_filter semantics.

What changes are included in this PR?

  • New HigherOrderUDF ArrayFilter (array_filter / list_filter alias)
    • Boolean lambda per element; true keeps, false/null drops (matches Spark semantics)
    • Handles List, LargeList, sliced arrays, null sublists
    • Scalar predicate short-circuit (x -> true / x -> false)
    • No-copy fast path when nothing is filtered (skips arrow::compute::filter)
  • lambda_utils.rs: shared HOF helpers extracted from array_transform (value_lambda_pair, coerce_single_list_arg, single_list_lambda_parameters, extract_list_values)
  • test_utils.rs: shared unit test helpers (create_i32_list, eval_hof_on_i32_list)
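The filtering behaviour listed above can be sketched in plain Rust. This is a simplified, std-only model written for illustration, not the PR's code: `ListColumn`, `Predicate`, and this `array_filter` are hypothetical names, and plain `Vec`s stand in for Arrow's offset, value, and null buffers. The PR is described as using `arrow::compute::filter`; the sketch copies kept values by hand only to show how offsets are rebuilt, how a sliced column (offsets not starting at 0) is handled, and where the scalar short-circuit and no-copy fast path fit.

```rust
use std::borrow::Cow;

/// Hypothetical std-only stand-in for an Arrow list column (NOT arrow-rs's `ListArray`):
/// row `i` spans `values[offsets[i]..offsets[i + 1]]`, and `validity[i] == false`
/// marks a null sublist. Offsets need not start at 0, which models a sliced array.
#[derive(Debug, Clone, PartialEq)]
pub struct ListColumn {
    pub offsets: Vec<usize>,
    pub values: Vec<i32>,
    pub validity: Vec<bool>,
}

/// Hypothetical predicate representation.
pub enum Predicate {
    /// Lambda that folds to a constant (`x -> true`, `x -> false`, `x -> NULL`).
    Scalar(Option<bool>),
    /// Per-element lambda; `None` models a SQL NULL result.
    PerElement(Box<dyn Fn(i32) -> Option<bool>>),
}

/// Keeps elements whose predicate is `Some(true)`; `Some(false)` and `None` drop them.
/// Null sublists stay null. Returns `Cow::Borrowed` (no copy) when nothing is dropped.
pub fn array_filter<'a>(list: &'a ListColumn, pred: &Predicate) -> Cow<'a, ListColumn> {
    let rows = list.validity.len();
    let (start, end) = (list.offsets[0], list.offsets[rows]);

    // Keep-mask over the value range referenced by this (possibly sliced) column.
    let mask: Vec<bool> = match pred {
        // Scalar short-circuit: a constant `true` keeps everything.
        Predicate::Scalar(Some(true)) => return Cow::Borrowed(list),
        // A constant `false` or NULL drops every element.
        Predicate::Scalar(_) => vec![false; end - start],
        Predicate::PerElement(f) => list.values[start..end]
            .iter()
            .map(|&v| f(v) == Some(true))
            .collect(),
    };

    // Fast path: nothing was filtered, so hand back the input unchanged.
    if mask.iter().all(|&keep| keep) {
        return Cow::Borrowed(list);
    }

    // Slow path: rebuild offsets from 0 and copy only the kept values.
    let mut offsets = Vec::with_capacity(rows + 1);
    offsets.push(0);
    let mut values = Vec::new();
    for row in 0..rows {
        if list.validity[row] {
            for i in list.offsets[row]..list.offsets[row + 1] {
                if mask[i - start] {
                    values.push(list.values[i]);
                }
            }
        }
        // Null sublists get an empty range; validity is carried over unchanged.
        offsets.push(values.len());
    }
    Cow::Owned(ListColumn { offsets, values, validity: list.validity.clone() })
}
```

Returning `Cow` makes the no-copy fast path visible in the type: callers get `Cow::Borrowed` when the predicate is a constant `true` or keeps every element, and only pay for a rebuild otherwise. For example, on a sliced column with offsets `[1, 4, 4, 6]` over values `[100, 1, 2, 3, 4, 5, 200]` and validity `[true, false, true]`, an even-number predicate yields offsets `[0, 1, 1, 2]` and values `[2, 4]`, with the middle sublist still null.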

Are these changes tested?

  • Unit tests: basic filter, multiple sublists, sliced arrays, null sublists, all-filtered-out, nothing-filtered (fast path), scalar true/false predicates
  • SQL logic tests in higher_order.slt: filter variants, array_filter + array_transform combinations, error cases

Are there any user-facing changes?

Yes — array_filter(array, lambda) and alias list_filter(array, lambda) are now available as SQL functions.

github-actions bot added the functions label Apr 28, 2026
ologlogn force-pushed the array-filter-lambda branch from 6ff8773 to 07e4548 (April 28, 2026 16:39)
ologlogn (Author)

Hi @gabotechs, could you please trigger CI? Thanks!

ologlogn force-pushed the array-filter-lambda branch from 07e4548 to 44715ac (April 28, 2026 18:24)
github-actions bot added the sqllogictest and documentation labels Apr 28, 2026
ologlogn force-pushed the array-filter-lambda branch from cbf076a to 36c8f36 (April 29, 2026 12:06)
ologlogn closed this Apr 29, 2026
ologlogn force-pushed the array-filter-lambda branch from cb94b16 to ec92925 (April 29, 2026 13:00)
ologlogn reopened this Apr 29, 2026
ologlogn force-pushed the array-filter-lambda branch from 36c8f36 to 406f85b (April 29, 2026 13:03)
LiaCastaneda mentioned this pull request Apr 29, 2026
Contributor

Forgive me if I missed something, but how does this function behave if the array argument is null or the lambda itself is null? Could we have some tests for those as well?

Author

  • A null array returns null.
  • If the lambda returns null for some elements, those elements are filtered out (null is treated as false).
  • If the lambda returns null for every element, the output is an empty list.

I will try to add SQL tests for this.
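The null-handling rules in the reply above can be captured in a row-level sketch. `filter_row` is a hypothetical std-only helper, not part of DataFusion: a `None` array stands for a SQL NULL array, elements are `Option<i32>`, and the lambda returns `Option<bool>` with `None` meaning NULL. It does not model the reviewer's other case (a lambda that is itself NULL), which the reply does not address.

```rust
/// Hypothetical row-level model of the null semantics described in the reply.
/// `array == None` is a SQL NULL array; elements and lambda results may be NULL (`None`).
fn filter_row<F>(array: Option<&[Option<i32>]>, lambda: F) -> Option<Vec<Option<i32>>>
where
    F: Fn(Option<i32>) -> Option<bool>,
{
    // A NULL array yields NULL; the lambda is never invoked.
    let elems = array?;
    Some(
        elems
            .iter()
            .copied()
            // Keep only elements where the lambda is TRUE; FALSE and NULL both drop.
            .filter(|&e| lambda(e) == Some(true))
            .collect(),
    )
}
```

With a lambda such as `x > 1` that propagates NULL inputs, `[1, 2, NULL, 4]` filters to `[2, 4]`, and a lambda that always returns NULL produces an empty list rather than NULL.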

ologlogn force-pushed the array-filter-lambda branch from 406f85b to 4e0caa6
impl ArrayFilter {
pub fn new() -> Self {
Self {
signature: HigherOrderSignature::user_defined(Volatility::Immutable),

Contributor

I plan to open a PR soon that lets us declare the lambda signature more precisely (e.g. exact types), so that all of the validation can move into the planner (and value_lambda_pair could potentially be removed).


Labels

documentation, functions, sqllogictest

Projects

None yet


3 participants