[FLINK-39051][Table SQL/Planner] Support APPROX_COUNT_DISTINCT aggregate function in streaming mode with Window TVF #27555
+1,796
−4
What is the purpose of the change
This pull request extends the `APPROX_COUNT_DISTINCT` aggregate function to support streaming mode, including Window TVF (TUMBLE, HOP, CUMULATE). Previously, this function was only available in batch mode. The implementation uses the HyperLogLog++ algorithm to provide approximate distinct counting with approximately 1% relative standard error while using constant memory (approximately 16 KB). This enables efficient cardinality estimation in streaming scenarios where exact counts are not required.

Brief change log
- `ApproxCountDistinctAggFunctions` class that supports both batch and streaming modes
- `BatchApproxCountDistinctAggFunctions` class for backward compatibility
- `AggFunctionFactory` to use the new implementation and added validation to reject retraction scenarios in non-windowed streaming aggregations
- `merge()` method required for Window TVF aggregation

Verifying this change
Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.
This change added tests and can be verified as follows:
- `ApproxCountDistinctAggFunctionTest` with unit tests covering all supported data types (TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATE, TIME, TIMESTAMP, TIMESTAMP_LTZ, VARCHAR/CHAR) and merge functionality
- `StreamApproxCountDistinctITCase` with integration tests for:
- `BatchApproxCountDistinctITCase` with integration tests for:

Does this pull request potentially affect one of the following parts:
- `@Public(Evolving)`: no

Documentation
- `docs/data/sql_functions.yml` and `docs/data/sql_functions_zh.yml`
- `ApproxCountDistinctAggFunctions` class
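
For reviewers who want some intuition for why `merge()` matters for Window TVF aggregation and why memory stays constant, here is a minimal sketch. It is not the code in this PR: it uses plain HyperLogLog (without the HyperLogLog++ bias correction or sparse encoding), and the class name `HllSketch`, the precision constant, and the hash mixer are all assumptions made for illustration. Two partial sketches are combined by taking the register-wise maximum, which gives the same state as a single sketch that saw every input.

```java
/** Hypothetical, simplified HyperLogLog sketch; NOT the implementation in this PR. */
final class HllSketch {
    private static final int P = 14;       // assumed precision: 2^14 registers
    private static final int M = 1 << P;   // 16384 one-byte registers, fixed size
    private final byte[] registers = new byte[M];

    /** MurmurHash3 64-bit finalizer, used here only as a simple bit mixer. */
    private static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** Records one value; duplicates never change the state. */
    void add(long value) {
        long h = mix(value);
        int idx = (int) (h >>> (64 - P));   // top P bits select a register
        long rest = h << P;                 // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - P) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    /** Merges another sketch into this one by taking the register-wise maximum. */
    void merge(HllSketch other) {
        for (int i = 0; i < M; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    /** Raw HyperLogLog estimate with the linear-counting fallback for small counts. */
    long estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);     // 2^(-r)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / M);
        double raw = alpha * M * (double) M / sum;
        if (raw <= 2.5 * M && zeros > 0) {
            return Math.round(M * Math.log((double) M / zeros));
        }
        return Math.round(raw);
    }

    public static void main(String[] args) {
        // Two overlapping partial aggregates, e.g. two slices of one window.
        HllSketch sliceA = new HllSketch();
        HllSketch sliceB = new HllSketch();
        for (long i = 0; i < 60_000; i++) sliceA.add(i);
        for (long i = 40_000; i < 100_000; i++) sliceB.add(i);
        sliceA.merge(sliceB);
        System.out.println("estimated distinct count: " + sliceA.estimate());
    }
}
```

The merged estimate should land close to the 100,000 distinct inputs. Because registers only ever grow, an input cannot be removed from the sketch once added, which is consistent with rejecting retraction scenarios in non-windowed streaming aggregations.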