[FLINK-39051][Table SQL/Planner] Support APPROX_COUNT_DISTINCT aggregate function in streaming mode with Window TVF #27555

Myracle · 2026-02-09T08:22:24Z

What is the purpose of the change

This pull request extends the APPROX_COUNT_DISTINCT aggregate function to support streaming mode, including Window TVF (TUMBLE, HOP, CUMULATE). Previously, this function was only available in batch mode. The implementation uses the HyperLogLog++ algorithm to provide approximate distinct counting with approximately 1% relative standard error while using constant memory (approximately 16KB). This enables efficient cardinality estimation in streaming scenarios where exact counts are not required.

Brief change log

Added a new unified ApproxCountDistinctAggFunctions class that supports both batch and streaming modes
Deprecated the existing BatchApproxCountDistinctAggFunctions class for backward compatibility
Updated AggFunctionFactory to use the new implementation and added validation to reject retraction scenarios in non-windowed streaming aggregations
Implemented merge() method required for Window TVF aggregation
Optimized timestamp hashing to support high-precision TIMESTAMP values (nanosecond precision)
Added SQL function documentation in both English and Chinese
Added release notes for Flink 2.2

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

This change added tests and can be verified as follows:

Added ApproxCountDistinctAggFunctionTest with unit tests covering all supported data types (TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATE, TIME, TIMESTAMP, TIMESTAMP_LTZ, VARCHAR/CHAR) and merge functionality
Added StreamApproxCountDistinctITCase with integration tests for:
- TUMBLE window aggregation
- HOP window aggregation
- CUMULATE window aggregation
- Different data types in streaming mode
- NULL value handling
- Precision validation with 10,000 records (error rate < 2%)
Added BatchApproxCountDistinctITCase with integration tests for:
- All supported data types in batch mode
- GROUP BY aggregation
- NULL value handling
- Empty table handling
- Precision validation with 10,000 records

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): no
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? yes
If yes, how is the feature documented? docs / JavaDocs
- Added function documentation in docs/data/sql_functions.yml and docs/data/sql_functions_zh.yml
- Added comprehensive JavaDocs in ApproxCountDistinctAggFunctions class

flinkbot · 2026-02-09T08:30:09Z

CI report:

7ea8bd5 Azure: FAILURE

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

…ate function in streaming mode with Window TVF

[FLINK-39051][Table SQL/Planner] Support APPROX_COUNT_DISTINCT aggreg…

7ea8bd5

…ate function in streaming mode with Window TVF

Myracle force-pushed the FLINK-39051-streaming-APPROX_COUNT_DISTINCT branch from cce7639 to 7ea8bd5 Compare February 9, 2026 08:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FLINK-39051][Table SQL/Planner] Support APPROX_COUNT_DISTINCT aggregate function in streaming mode with Window TVF #27555

[FLINK-39051][Table SQL/Planner] Support APPROX_COUNT_DISTINCT aggregate function in streaming mode with Window TVF #27555

Myracle commented Feb 9, 2026

Uh oh!

flinkbot commented Feb 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[FLINK-39051][Table SQL/Planner] Support APPROX_COUNT_DISTINCT aggregate function in streaming mode with Window TVF #27555

Are you sure you want to change the base?

[FLINK-39051][Table SQL/Planner] Support APPROX_COUNT_DISTINCT aggregate function in streaming mode with Window TVF #27555

Conversation

Myracle commented Feb 9, 2026

What is the purpose of the change

Brief change log

Verifying this change

Does this pull request potentially affect one of the following parts:

Documentation

Uh oh!

flinkbot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI report:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

flinkbot commented Feb 9, 2026 •

edited

Loading