@uros7251brick
Contributor

Currently, Spark has two bitmap aggregation functions, bitmap_construct_agg and bitmap_or_agg, which construct a bitmap from a set of integers and compute the union of sets represented by bitmaps, respectively. However, an efficient intersection operation (bitwise AND) is missing.

What changes were proposed in this pull request?

  • Implemented bitmap_and_agg expression: New aggregation function that performs bitwise AND operations on binary column inputs.

Design Decisions

  • Result on empty input is the identity element for the operation: Empty input groups return an all-ones bitmap (the AND identity).
  • Missing bytes handling: For AND operations, missing bytes in input are treated as zeros to maintain intersection semantics.
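The two design decisions above can be illustrated with a small pure-Python model. This is only a sketch of the described semantics, not the Spark implementation: the function name `bitmap_and_agg_sketch` and the 4-byte width `NUM_BYTES` are hypothetical and chosen for readability, and input bytes beyond the fixed width are simply ignored here.

```python
from typing import Iterable

# Hypothetical bitmap width for illustration only; not Spark's actual size.
NUM_BYTES = 4


def bitmap_and_agg_sketch(bitmaps: Iterable[bytes], num_bytes: int = NUM_BYTES) -> bytes:
    """Model an AND aggregate over fixed-width bitmaps (assumed semantics)."""
    # Start from the AND identity: every bit set.
    acc = bytearray(b"\xff" * num_bytes)
    for bitmap in bitmaps:
        for i in range(num_bytes):
            # A byte missing from a short input counts as 0x00, so it
            # clears the corresponding byte of the accumulator.
            byte = bitmap[i] if i < len(bitmap) else 0x00
            acc[i] &= byte
    return bytes(acc)


# Empty group: the result is the all-ones identity.
empty_result = bitmap_and_agg_sketch([])

# 0x0f & 0x3c == 0x0c in the first byte; every other position is missing
# from at least one input, so it is zeroed.
mixed_result = bitmap_and_agg_sketch([b"\x0f\xff", b"\x3c"])
```

Starting from all ones means that the first real input passes through the AND unchanged, which is why an empty group yields the identity rather than an all-zeros bitmap.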

How was this patch tested?

Added new test cases to cover bitmap_and_agg functionality:

  • BitmapExpressionsQuerySuite: Added test cases for basic AND operations, edge cases, empty group handling, and integration with other bitmap functions.

Does this PR introduce any user-facing change?

No.

Was this patch authored or co-authored using generative AI tooling?

No.

@uros7251brick marked this pull request as ready for review October 13, 2025 14:05
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan closed this in 56f8b3b Oct 13, 2025
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
Closes apache#52586 from uros7251brick/add-bitmap-and-agg.

Authored-by: Uros Stojkovic <uros.stojkovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>