[Feature] Provide built-in Flink SQL functions for RoaringBitmap construction and cardinality query (rbm32_build, rbm64_build, rbm32_cardinality, rbm64_cardinality) #2848

@matrixsparse

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Fluss 0.9 introduced the Aggregation Merge Engine with the rbm32 and rbm64 aggregate functions,
which enable exact unique-visitor (UV) counting at the storage layer via RoaringBitmap.

However, rbm32/rbm64 columns currently accept only values that the client has already
serialized. The reason is that FieldRoaringBitmap64Agg.agg() expects both accumulator
and inputField to be the serialized byte[] of a Roaring64Bitmap.

As a result, users must either:

  1. Write Java application code to serialize each userId into a single-element bitmap byte[], or
  2. Implement and register a custom Flink ScalarFunction UDF for every project.

This creates a significant barrier to adoption, especially for:

  • Users writing Flink SQL jobs who cannot easily embed custom Java code
  • Users connecting from non-Java clients (Python, Go)
  • The Dictionary Table + RoaringBitmap UV counting pattern recommended in the 0.9 release notes,
    which requires converting an INT / BIGINT auto-increment ID into a serialized bitmap

Comparison with peer systems:

System          Built-in bitmap functions
ClickHouse      bitmapBuild(), bitmapCardinality(), bitmapOr()
Apache Doris    bitmap_from_array(), bitmap_count(), bitmap_union()
Apache Paimon   Requires manual serialization (same limitation)
Apache Fluss    Requires manual serialization (current state)

Proposed Solution

Add the following built-in functions to the Flink connector (fluss-flink-common module):

Construction functions

-- Create a single-element 32-bit bitmap from an INT value
rbm32_build(value INT) → BYTES

-- Create a single-element 64-bit bitmap from a BIGINT value  
rbm64_build(value BIGINT) → BYTES

Cardinality functions

-- Get the number of distinct elements from a serialized 32-bit bitmap
rbm32_cardinality(bitmap BYTES) → BIGINT

-- Get the number of distinct elements from a serialized 64-bit bitmap
rbm64_cardinality(bitmap BYTES) → BIGINT
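
To make the intended semantics concrete, here is a minimal Java sketch of how the build, merge, and cardinality steps fit together. It is not the proposed implementation: java.util.BitSet stands in for RoaringBitmap so the example has no external dependencies, and the class and method names (BitmapSketch, build, merge, cardinality, demo) are hypothetical. The real functions would presumably use the RoaringBitmap serialization format that the rbm32/rbm64 aggregators expect.

```java
import java.util.BitSet;

// Stand-in sketch: BitSet replaces RoaringBitmap, so the byte[] layout here
// is NOT the format Fluss expects. All names are hypothetical.
final class BitmapSketch {

    // Analogue of rbm32_build: a serialized bitmap holding only `value`.
    // BitSet rejects negative indexes, so this sketch does the same.
    static byte[] build(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        BitSet bits = new BitSet();
        bits.set(value);
        return bits.toByteArray();
    }

    // Analogue of the storage-side aggregation: both the accumulator and
    // the input arrive serialized, are OR-ed, and are serialized again.
    static byte[] merge(byte[] accumulator, byte[] input) {
        BitSet merged = BitSet.valueOf(accumulator);
        merged.or(BitSet.valueOf(input));
        return merged.toByteArray();
    }

    // Analogue of rbm32_cardinality: number of distinct values in the bitmap.
    static long cardinality(byte[] bitmap) {
        return BitSet.valueOf(bitmap).cardinality();
    }

    // Three inserts for two distinct user IDs yield a cardinality of 2.
    static long demo() {
        byte[] acc = build(42);
        acc = merge(acc, build(7));
        acc = merge(acc, build(42)); // duplicate ID, does not change the count
        return cardinality(acc);
    }
}
```

With built-in functions, the build step would run inside the Flink SQL job instead of in client code, and the merge step would remain in the storage layer as it is today.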

Solution

No response

Anything else?

No response

Willingness to contribute

  • I'm willing to submit a PR!
