
Feat: Add APPROX_MOST_FREQUENT Aggregation Function #15570

Merged
JackieTien97 merged 8 commits into apache:master from FearfulTomcat27:feat/mostfrequent on May 28, 2025


@FearfulTomcat27
Contributor

This pull request adds an approximate most-frequent-values aggregation, approx_most_frequent, to Apache IoTDB. It covers query engine support, integration tests, and a dependency update. The key changes are:

Feature Implementation: Approximate Most Frequent Aggregation

  • New Abstract Class for Accumulators: Added AbstractApproxMostFrequentAccumulator to hold the logic shared by the approximate most frequent accumulators: intermediate and final evaluation, reset, and handling of statistics-based input, which is not supported. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/AbstractApproxMostFrequentAccumulator.java)

  • Specific Accumulators: Introduced type-specific accumulator classes, such as BinaryApproxMostFrequentAccumulator and BlobApproxMostFrequentAccumulator, for binary and blob data. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/BinaryApproxMostFrequentAccumulator.java, iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/BlobApproxMostFrequentAccumulator.java)

  • Accumulator Factory Enhancements: Updated AccumulatorFactory to support the APPROX_MOST_FREQUENT aggregation type, with methods that create grouped and table accumulators for each supported data type. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/AccumulatorFactory.java)
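The bullets above describe a common split: an abstract accumulator owns the lifecycle (per-row input, a mergeable intermediate state, final top-k evaluation, reset), and type-specific subclasses only decide how a raw value becomes a key. The Java sketch below illustrates that pattern. All names in it (`FrequencyAccumulatorSketch`, `BinaryFrequencyAccumulatorSketch`, `addInput`, `addIntermediate`, `toKey`) are hypothetical and do not reproduce IoTDB's accumulator interfaces, which operate on columnar data. For brevity, the state is an exact `HashMap` standing in for the bounded approximate summary.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lifecycle sketch; not IoTDB's accumulator interface.
// An exact HashMap stands in for the bounded approximate summary.
abstract class FrequencyAccumulatorSketch<T> {
  protected final Map<T, Long> counts = new HashMap<>();

  // Type-specific hook: turn one raw input value into a grouping key.
  protected abstract T toKey(Object raw);

  // Per-row input; null values are skipped.
  public void addInput(Object raw) {
    if (raw == null) {
      return;
    }
    counts.merge(toKey(raw), 1L, Long::sum);
  }

  // Partial state handed to a downstream operator (a defensive copy).
  public Map<T, Long> evaluateIntermediate() {
    return new HashMap<>(counts);
  }

  // Merge another operator's partial state into this one.
  public void addIntermediate(Map<T, Long> partial) {
    partial.forEach((key, c) -> counts.merge(key, c, Long::sum));
  }

  // Final result: up to k keys with their counts, highest count first.
  public List<Map.Entry<T, Long>> evaluateFinal(int k) {
    List<Map.Entry<T, Long>> entries = new ArrayList<>(counts.entrySet());
    entries.sort(Map.Entry.<T, Long>comparingByValue().reversed());
    List<Map.Entry<T, Long>> top = new ArrayList<>();
    for (Map.Entry<T, Long> e : entries.subList(0, Math.min(k, entries.size()))) {
      // Copy so the result does not alias the live map entries.
      top.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
    }
    return top;
  }

  // Clear state so the accumulator can be reused.
  public void reset() {
    counts.clear();
  }
}

// Hypothetical binary specialization: decode bytes as UTF-8 text keys.
class BinaryFrequencyAccumulatorSketch extends FrequencyAccumulatorSketch<String> {
  @Override
  protected String toKey(Object raw) {
    return new String((byte[]) raw, StandardCharsets.UTF_8);
  }
}
```

Because merging and final evaluation live in the base class, supporting another data type in this sketch only takes a new `toKey` override, which is roughly the kind of reuse an abstract base with Binary and Blob subclasses aims for.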

Dependency Updates

  • Added Dependency for Stream Library: Added com.clearspring.analytics:stream (version 2.9.8), which provides the Space-Saving data structure used by the new aggregation. (iotdb-core/datanode/pom.xml)
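For readers unfamiliar with the algorithm, the sketch below shows what Space-Saving (Metwally, Agrawal and El Abbadi) does: it tracks at most `capacity` items, and when a new item arrives while the table is full, it evicts the item with the smallest count and lets the newcomer inherit that count plus one, recording the inherited part as an error bound. This is a from-scratch illustration assuming a simple `HashMap`-backed table with a linear scan for the minimum; it is not the stream library's own class, and the class name `SpaceSaving` is hypothetical. The Stream-Summary structure described in the paper uses bucketed linked lists to make each update constant-time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal Space-Saving implementation for illustration only.
final class SpaceSaving<T> {
  private final int capacity; // maximum number of tracked items
  private final Map<T, long[]> counters = new HashMap<>(); // item -> {count, error}

  SpaceSaving(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be > 0");
    }
    this.capacity = capacity;
  }

  void offer(T item) {
    long[] c = counters.get(item);
    if (c != null) {
      c[0]++; // already tracked: plain increment
      return;
    }
    if (counters.size() < capacity) {
      counters.put(item, new long[] {1, 0}); // free slot: exact so far
      return;
    }
    // Table full: evict the minimum-count item; the newcomer inherits its count.
    T victim = null;
    long min = Long.MAX_VALUE;
    for (Map.Entry<T, long[]> e : counters.entrySet()) {
      if (e.getValue()[0] < min) {
        min = e.getValue()[0];
        victim = e.getKey();
      }
    }
    counters.remove(victim);
    counters.put(item, new long[] {min + 1, min});
  }

  // Estimated count, an upper bound on the true frequency; 0 if not tracked.
  long estimate(T item) {
    long[] c = counters.get(item);
    return c == null ? 0 : c[0];
  }

  // Maximum overestimation of a tracked item's count; 0 if not tracked.
  long error(T item) {
    long[] c = counters.get(item);
    return c == null ? 0 : c[1];
  }

  // Up to k tracked items, highest estimated count first.
  List<T> topK(int k) {
    List<T> items = new ArrayList<>(counters.keySet());
    items.sort((a, b) -> Long.compare(counters.get(b)[0], counters.get(a)[0]));
    return items.subList(0, Math.min(k, items.size()));
  }
}
```

For every tracked item, the true frequency lies between `estimate - error` and `estimate`, and in a stream of N items any item occurring more than N / capacity times is guaranteed to be tracked.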

Testing

  • Integration Tests: Added a new test case, approxMostFrequentTest, to verify the approx_most_frequent aggregation. It runs queries covering different scenarios and checks the results against expected outputs. (integration-test/src/test/java/org/apache/iotdb/relational/it/query/recent/IoTDBTableAggregationIT.java)

Refactoring and Adjustments

  • Refactored Approximation Logic: Adjusted imports and dependencies in ApproxCountDistinctAccumulator to align it with the new structure for approximate aggregations. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/ApproxCountDistinctAccumulator.java)

Comment on lines +395 to +399
<dependency>
    <groupId>com.clearspring.analytics</groupId>
    <artifactId>stream</artifactId>
    <version>2.9.8</version>
</dependency>
Contributor


Avoid using this dependency: it appears to be no longer maintained and has many known CVEs.

JackieTien97 merged commit 4342166 into apache:master on May 28, 2025
56 of 58 checks passed
FearfulTomcat27 deleted the feat/mostfrequent branch on October 9, 2025, 14:23
