[Feature] Provide built-in Flink SQL functions for RoaringBitmap construction and cardinality query (rbm32_build, rbm64_build, rbm32_cardinality, rbm64_cardinality) #2848
Open
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Fluss 0.9 introduced the Aggregation Merge Engine with rbm32 and rbm64 aggregate functions,
which enables storage-level precise UV counting via RoaringBitmap.
However, the current usage requires manual client-side serialization before inserting into
rbm32/rbm64 columns. This is because FieldRoaringBitmap64Agg.agg() expects both accumulator
and inputField to be pre-serialized byte[] of a Roaring64Bitmap.
As a result, users must either:
- Write Java application code to serialize each `userId` into a single-element bitmap `byte[]`, or
- Implement and register a custom Flink `ScalarFunction` UDF for every project.
This creates a significant barrier to adoption, especially for:
- Users writing Flink SQL jobs who cannot easily embed custom Java code
- Users connecting from non-Java clients (Python, Go)
- The Dictionary Table + RoaringBitmap UV counting pattern recommended in the 0.9 release notes,
which requires converting an `INT`/`BIGINT` auto-increment ID into a serialized bitmap
Comparison with peer systems:
| System | Built-in bitmap functions |
|---|---|
| ClickHouse | bitmapBuild(), bitmapCardinality(), bitmapOr() |
| Apache Doris | bitmap_from_array(), bitmap_count(), bitmap_union() |
| Apache Paimon | Requires manual serialization (same limitation) |
| Apache Fluss | Requires manual serialization (current state) |
Proposed Solution
Add the following built-in functions to the Flink connector (fluss-flink-common module):
Construction functions
-- Create a single-element 32-bit bitmap from an INT value
rbm32_build(value INT) → BYTES
-- Create a single-element 64-bit bitmap from a BIGINT value
rbm64_build(value BIGINT) → BYTES
Cardinality functions
-- Get the number of distinct elements from a serialized 32-bit bitmap
rbm32_cardinality(bitmap BYTES) → BIGINT
-- Get the number of distinct elements from a serialized 64-bit bitmap
rbm64_cardinality(bitmap BYTES) → BIGINT
Solution
No response
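As a rough illustration of how the four functions proposed above might be used from a pure Flink SQL job, here is a minimal sketch. The functions do not exist yet, so this is not runnable today. The table names `events` and `uv_agg` and the columns `page_id`, `user_id`, and `uv_bitmap` are hypothetical, and `uv_agg` is assumed to be a Fluss table whose `uv_bitmap` BYTES column is already configured with the `rbm64` aggregate function.

```sql
-- Hypothetical source table `events` with columns (page_id, user_id BIGINT).
-- Hypothetical Fluss table `uv_agg` with columns (page_id, uv_bitmap BYTES),
-- where uv_bitmap is assumed to be merged by the rbm64 aggregate function.

-- Write path: wrap each user_id in a single-element 64-bit bitmap so the
-- merge engine can union the bitmaps per key, with no custom UDF or client code.
INSERT INTO uv_agg
SELECT page_id, rbm64_build(user_id) AS uv_bitmap
FROM events;

-- Read path: turn the merged bitmap back into a distinct-user count.
SELECT page_id, rbm64_cardinality(uv_bitmap) AS uv
FROM uv_agg;
```

The 32-bit variants would follow the same pattern, with `rbm32_build` taking an `INT` value and `rbm32_cardinality` reading a column aggregated by `rbm32`.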
Anything else?
No response
Willingness to contribute
- I'm willing to submit a PR!