Add statistical type aggregate functions, including autocorrelation, skewness, and linear regression #17292

Open
Cool6689 wants to merge 7 commits into apache:master from Cool6689:feat/Statistical-Aggregate-Functions

Cool6689 commented on Mar 12, 2026

This pull request adds support for several advanced statistical aggregation functions to the IoTDB query engine, including correlation, covariance, regression, skewness, and kurtosis. It introduces new accumulator classes to implement these functions and updates the relevant factory and enum classes to register and handle them appropriately.

New statistical aggregation functions:

  • Added new aggregation function types: CORR, COVAR_POP, COVAR_SAMP, REGR_SLOPE, REGR_INTERCEPT, SKEWNESS, and KURTOSIS to BuiltinAggregationFunctionEnum for recognition in the system.
  • Updated AccumulatorFactory to recognize these new functions as multi-input or single-input aggregations and to instantiate the appropriate new accumulator classes when requested.
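
The registration and dispatch described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration and not the PR's code: `StatFunctionSketch`, `inputCount()`, `isMultiInput()`, and `accumulatorClassName()` are names invented here. The function-to-accumulator mapping follows the class names given in this description; treating the correlation, covariance, and regression functions as two-input and skewness/kurtosis as single-input is an assumption based on what those statistics require mathematically.

```java
/** Hypothetical sketch of function registration; names are illustrative, not IoTDB's API. */
enum StatFunctionSketch {
  CORR(2),
  COVAR_POP(2),
  COVAR_SAMP(2),
  REGR_SLOPE(2),
  REGR_INTERCEPT(2),
  SKEWNESS(1),
  KURTOSIS(1);

  private final int inputCount;

  StatFunctionSketch(int inputCount) {
    this.inputCount = inputCount;
  }

  /** Number of input series the function consumes (assumed: 2 for paired statistics). */
  int inputCount() {
    return inputCount;
  }

  boolean isMultiInput() {
    return inputCount > 1;
  }

  /** Accumulator class that this description associates with the function. */
  String accumulatorClassName() {
    switch (this) {
      case CORR:
      case COVAR_POP:
      case COVAR_SAMP:
        return "CorrelationAccumulator";
      case REGR_SLOPE:
      case REGR_INTERCEPT:
        return "RegressionAccumulator";
      case SKEWNESS:
      case KURTOSIS:
        return "CentralMomentAccumulator";
      default:
        throw new IllegalArgumentException("Unsupported function: " + this);
    }
  }
}
```

A factory built along these lines would look up the enum constant by function name (for example `StatFunctionSketch.valueOf("CORR")`), check `isMultiInput()` to decide how many input columns to bind, and then construct the matching accumulator.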

Implementation of new accumulator classes:

  • Introduced CorrelationAccumulator for correlation and covariance calculations, supporting both population and sample variants.
  • Introduced RegressionAccumulator for regression slope and intercept calculations.
  • Introduced CentralMomentAccumulator for skewness and kurtosis calculations.
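
For readers unfamiliar with how such accumulators can work in a single pass, here is a minimal sketch of a streaming co-moment accumulator that yields covariance, correlation, and least-squares slope/intercept from the same running state. It is an assumption-laden illustration, not the PR's `CorrelationAccumulator` or `RegressionAccumulator`: the class name `CoMomentSketch`, its method names, the choice of a Welford-style update, treating `x` as the independent variable, and returning `NaN` for undefined results are all choices made here.

```java
/** Hypothetical single-pass accumulator over paired (x, y) samples; not the PR's class. */
final class CoMomentSketch {
  private long n;
  private double meanX;
  private double meanY;
  private double m2X; // sum of squared deviations of x
  private double m2Y; // sum of squared deviations of y
  private double cXY; // sum of products of deviations of x and y

  /** Adds one pair; x is treated as the independent variable (an assumption). */
  void add(double x, double y) {
    n++;
    double dx = x - meanX; // deviation from the old mean of x
    double dy = y - meanY; // deviation from the old mean of y
    meanX += dx / n;
    meanY += dy / n;
    m2X += dx * (x - meanX);
    m2Y += dy * (y - meanY);
    cXY += dx * (y - meanY);
  }

  double covarPop() {
    return n == 0 ? Double.NaN : cXY / n;
  }

  double covarSamp() {
    return n < 2 ? Double.NaN : cXY / (n - 1);
  }

  double corr() {
    double denom = Math.sqrt(m2X * m2Y);
    return denom == 0 ? Double.NaN : cXY / denom;
  }

  double regrSlope() {
    return m2X == 0 ? Double.NaN : cXY / m2X;
  }

  double regrIntercept() {
    return meanY - regrSlope() * meanX;
  }

  public static void main(String[] args) {
    CoMomentSketch acc = new CoMomentSketch();
    double[][] points = {{1, 2}, {2, 4}, {3, 6}, {4, 8}}; // y = 2x exactly
    for (double[] p : points) {
      acc.add(p[0], p[1]);
    }
    System.out.println("corr=" + acc.corr() + " slope=" + acc.regrSlope());
  }
}
```

Keeping means and centered sums rather than raw sums of squares avoids the cancellation error that the textbook formula suffers when values are large relative to their spread. The population and sample covariance variants differ only in the final divisor (`n` versus `n - 1`).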

Integration with sliding window aggregators:

  • Updated SlidingWindowAggregatorFactory to support the new statistical functions within sliding window queries.

This pull request introduces support for several advanced statistical aggregation functions in the IoTDB query engine, including correlation, covariance, regression, skewness, and kurtosis. It adds new accumulator implementations for these functions and integrates them into the aggregation and sliding window frameworks.

The most important changes are:

New statistical accumulator implementations:

  • Added CentralMomentAccumulator for computing skewness and kurtosis. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/aggregation/CentralMomentAccumulator.java)
  • Added CorrelationAccumulator for correlation and covariance calculations. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/aggregation/CorrelationAccumulator.java)
  • Added RegressionAccumulator for regression slope and intercept calculations. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/aggregation/RegressionAccumulator.java)
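
Skewness and kurtosis can both be derived from running central moments, which is presumably why one accumulator serves both functions. The sketch below shows one well-known single-pass update for the second, third, and fourth central moments. It is not taken from the PR: the class name `CentralMomentSketch` is hypothetical, and the estimator choices (population skewness, non-excess population kurtosis, `NaN` when the variance is zero) are assumptions, since this description does not say which definitions `CentralMomentAccumulator` uses.

```java
/** Hypothetical single-pass central-moment accumulator; not the PR's class. */
final class CentralMomentSketch {
  private long n;
  private double mean;
  private double m2; // sum of (x - mean)^2
  private double m3; // sum of (x - mean)^3
  private double m4; // sum of (x - mean)^4

  void add(double x) {
    long n1 = n; // count before this value
    n++;
    double delta = x - mean;
    double deltaN = delta / n;
    double deltaN2 = deltaN * deltaN;
    double term1 = delta * deltaN * n1;
    mean += deltaN;
    // Higher moments must be updated before lower ones because they use the old m2 and m3.
    m4 += term1 * deltaN2 * (n * n - 3 * n + 3) + 6 * deltaN2 * m2 - 4 * deltaN * m3;
    m3 += term1 * deltaN * (n - 2) - 3 * deltaN * m2;
    m2 += term1;
  }

  /** Population skewness: sqrt(n) * M3 / M2^1.5 (assumed definition). */
  double skewness() {
    return m2 == 0 ? Double.NaN : Math.sqrt((double) n) * m3 / Math.pow(m2, 1.5);
  }

  /** Population kurtosis, not excess: n * M4 / M2^2 (assumed definition). */
  double kurtosis() {
    return m2 == 0 ? Double.NaN : n * m4 / (m2 * m2);
  }

  public static void main(String[] args) {
    CentralMomentSketch acc = new CentralMomentSketch();
    for (double v : new double[] {1, 2, 3, 10}) {
      acc.add(v);
    }
    System.out.println("skewness=" + acc.skewness() + " kurtosis=" + acc.kurtosis());
  }
}
```

Subtracting 3 from `kurtosis()` would give excess kurtosis, and sample-corrected variants apply additional factors of `n`; which convention the PR follows would need to be checked against its code and tests.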

Integration with aggregation framework:

  • Updated AccumulatorFactory to support the new aggregation functions, including logic for multi-input and single-input accumulators, and to instantiate the new accumulator classes as appropriate. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/aggregation/AccumulatorFactory.java)

Sliding window support:

  • Modified SlidingWindowAggregatorFactory to support the new statistical functions within sliding window aggregations. (iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/aggregation/slidingwindow/SlidingWindowAggregatorFactory.java)
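
This description does not say how the sliding window path computes these statistics. One common approach when a window is assembled from pre-aggregated slices is to merge partial states instead of rescanning raw points. The sketch below illustrates that idea for covariance only. The class `CovarPartialSketch` and its methods are hypothetical, and the pairwise merge formula is a standard technique that may or may not match what `SlidingWindowAggregatorFactory` wires up in this PR.

```java
/** Hypothetical mergeable partial state for covariance; not the PR's class. */
final class CovarPartialSketch {
  final long n;
  final double meanX;
  final double meanY;
  final double cXY; // sum of products of deviations within this partial

  CovarPartialSketch(long n, double meanX, double meanY, double cXY) {
    this.n = n;
    this.meanX = meanX;
    this.meanY = meanY;
    this.cXY = cXY;
  }

  /** Builds a partial state from one slice of paired samples. */
  static CovarPartialSketch of(double[] xs, double[] ys) {
    long n = 0;
    double mx = 0;
    double my = 0;
    double c = 0;
    for (int i = 0; i < xs.length; i++) {
      n++;
      double dx = xs[i] - mx; // deviation from the old mean of x
      mx += dx / n;
      my += (ys[i] - my) / n;
      c += dx * (ys[i] - my); // uses the updated mean of y
    }
    return new CovarPartialSketch(n, mx, my, c);
  }

  /** Combines two disjoint partial states into the state of their union. */
  CovarPartialSketch merge(CovarPartialSketch other) {
    if (n == 0) {
      return other;
    }
    if (other.n == 0) {
      return this;
    }
    long total = n + other.n;
    double dx = other.meanX - meanX;
    double dy = other.meanY - meanY;
    double weight = (double) n * other.n / total;
    return new CovarPartialSketch(
        total,
        meanX + dx * other.n / total,
        meanY + dy * other.n / total,
        cXY + other.cXY + dx * dy * weight);
  }

  double covarPop() {
    return n == 0 ? Double.NaN : cXY / n;
  }

  public static void main(String[] args) {
    CovarPartialSketch a = of(new double[] {1, 2}, new double[] {2, 4});
    CovarPartialSketch b = of(new double[] {3, 4}, new double[] {6, 8});
    System.out.println("covar_pop=" + a.merge(b).covarPop());
  }
}
```

The same merge pattern extends to the variance terms needed for correlation and regression. Removing an expired slice from a window is harder to do stably than merging, so an implementation may simply recombine the slices that remain.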

Enumeration and function registration:

  • Extended BuiltinAggregationFunctionEnum to include the new statistical aggregation function names. (integration-test/src/main/java/org/apache/iotdb/itbase/constant/BuiltinAggregationFunctionEnum.java)

## Description

Content1 ...

Content2 ...

Content3 ...


This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious
    to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

Cool6689 changed the title from "Enhance aggregate functions with correlation, regression, and validation" to "Add statistical type aggregate functions, including autocorrelation, skewness, and linear regression" on Mar 12, 2026
…ity by removing unnecessary comments and whitespace
Cool6689 closed this on Mar 13, 2026
Cool6689 reopened this on Mar 13, 2026