Skip to content

Centralize Aggregate FILTER Some(true) Semantics for Consistency and Correctness Hardening #22665

@kosiew

Description

@kosiew

Summary

Aggregate FILTER row-validity handling is currently fragmented across helper and grouped accumulator paths. We should centralize the invariant that a row passes aggregate FILTER if and only if the predicate is Some(true).

Current State

  • filter_to_validity was introduced and grouped-accumulator helper paths were updated (including accumulate_multiple and filter_to_nulls plumbing) in fix: Incorrect behavior for FILTER on NULLs #22068
  • first_last.rs still uses local filter checks in grouped paths.
  • accumulate_indices still repeats 64-bit validity iteration logic across multiple branches.

Problem

Duplicated nullable-filter logic increases the risk of semantic drift and regressions (especially around NULL predicate handling), and makes validity bitmap code harder to maintain.

Because grouped and non-grouped paths can evolve independently, this drift can become user-visible as incorrect aggregate results under nullable predicates.

Proposal

  1. Add one shared helper/iterator in functions-aggregate-common that encodes:
    • row passes FILTER iff predicate is Some(true)
  2. Migrate grouped first_value / last_value filter paths to this shared utility.
  3. Refactor accumulate_indices to reuse shared validity iteration and remove duplicated loops.
  4. Migrate remaining grouped accumulators incrementally.

Scope

In scope:

  • Shared FILTER validity/predicate utility.
  • Targeted migration of first_last grouped paths.
  • Targeted cleanup in accumulate_indices.
  • Focused test coverage for nullable FILTER predicates.

Out of scope:

  • Broad performance rewrites not tied to FILTER semantics.
  • Large cross-workspace refactors.

Acceptance Criteria

  1. A single reusable utility encodes FILTER pass criteria as Some(true).
  2. Grouped first_value / last_value no longer use bespoke nullable filter checks.
  3. accumulate_indices no longer carries duplicated validity loop logic that can drift semantically.
  4. Existing aggregate FILTER tests pass.
  5. New tests cover nullable FILTER predicates for grouped first/last behavior.
  6. No regression for non-null filter predicates.

Testing

  • Add SQLLogicTests for grouped first_value and last_value with nullable FILTER predicates.
  • Include cases where value bits may be true while validity is NULL.
  • Validate result is NULL when no rows satisfy Some(true).
  • Run relevant aggregate crate tests and sqllogictest aggregate coverage.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions