-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supporting complex aggregation functions #5261
Comments
This is amazing, requirements for the AggregationFunction interface
|
|
mayankshriv
added a commit
that referenced
this issue
Apr 17, 2020
This PR starts to address the issue #5261. The current implementation assumes that all AggregationFunctions take one argument with the exception of DistinctAggregationFunction. This PR handles changes related to supporting AggregationFunctions with multiple arguments, as we anticipate new aggregation functions to be added that take multiple arguments. 1. Enhanced parser to allow multiple arguments for aggregation functions. 2. AggregationFunctionFactory provides the right set of arguments when instantiating individual aggregation functions. 3. AggregationFunctions now store their arguments, as opposed to assuming that the right BlockValSet is passed to the aggregate() api's. 4. AggregationFunction.aggregate() api's now take a Map<String, BlockValSet> where the key is the argument expression (columnName for simple case), as opposed to a variable array as that interface does not provide a way to associate BlockValSet with the argument. 5. Cleanup: Removed env variable to enable/disable Distinct, as there is no need for it to be disabled anymore.
mayankshriv
added a commit
to mayankshriv/pinot
that referenced
this issue
Apr 18, 2020
…td). This PR is a continuation of apache#5259 to address the issue apache#5261. 1. Added new field in request.thrift `aggregationFunctionArgs` as a list of String arguments for the aggregation funciton. - Could not use the existing `aggregationParams` as it is a Map, and functions with variable arguments may not provide a name for the arg (to be used as key in Map). - Maintain backward compatibility by first check for the new field, and fall back to the existing one if it does not exist. 2. Ensure that all calls to the old AggregationInfo.getAggregationParams() is replaced with backward compatible AgguregationFunctionUtils.getAggregationArgs(). 3. Since most aggregation functions today have just one argument, added a separate api AggregationFuncitonContext.getFirstArgument() as an optimization. 4. Cleaned up getColumnName() and getResultColumnName() api's in AggregationFunctionContext class to not require the column name argument, as this is already stored in the AggregationFunction. 5. Modified all tests to use aggregationFunctionArgs instead of aggregationParams. TODO: Remove the AggregationFunctionContext class as AggregationFunctions now store their arguments, and this class no longer provides any additional value.
mayankshriv
added a commit
to mayankshriv/pinot
that referenced
this issue
Apr 19, 2020
…td). This PR is a continuation of apache#5259 to address the issue apache#5261. 1. Added new field in request.thrift `aggregationFunctionArgs` as a list of String arguments for the aggregation funciton. - Could not use the existing `aggregationParams` as it is a Map, and functions with variable arguments may not provide a name for the arg (to be used as key in Map). - Maintain backward compatibility by first check for the new field, and fall back to the existing one if it does not exist. 2. Ensure that all calls to the old AggregationInfo.getAggregationParams() is replaced with backward compatible AgguregationFunctionUtils.getAggregationArgs(). 3. Since most aggregation functions today have just one argument, added a separate api AggregationFuncitonContext.getFirstArgument() as an optimization. 4. Cleaned up getColumnName() and getResultColumnName() api's in AggregationFunctionContext class to not require the column name argument, as this is already stored in the AggregationFunction. 5. Modified all tests to use aggregationFunctionArgs instead of aggregationParams. TODO: Remove the AggregationFunctionContext class as AggregationFunctions now store their arguments, and this class no longer provides any additional value.
mayankshriv
added a commit
that referenced
this issue
Apr 19, 2020
…td). (#5275) This PR is a continuation of #5259 to address the issue #5261. 1. Added new field in request.thrift `aggregationFunctionArgs` as a list of String arguments for the aggregation funciton. - Could not use the existing `aggregationParams` as it is a Map, and functions with variable arguments may not provide a name for the arg (to be used as key in Map). - Maintain backward compatibility by first check for the new field, and fall back to the existing one if it does not exist. 2. Ensure that all calls to the old AggregationInfo.getAggregationParams() is replaced with backward compatible AgguregationFunctionUtils.getAggregationArgs(). 3. Since most aggregation functions today have just one argument, added a separate api AggregationFuncitonContext.getFirstArgument() as an optimization. 4. Cleaned up getColumnName() and getResultColumnName() api's in AggregationFunctionContext class to not require the column name argument, as this is already stored in the AggregationFunction. 5. Modified all tests to use aggregationFunctionArgs instead of aggregationParams. TODO: Remove the AggregationFunctionContext class as AggregationFunctions now store their arguments, and this class no longer provides any additional value.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Today, almost all aggregation functions take a single argument (except Distinct). We need to be able to support more complex aggregation functions such as:
Aggregation functions with multiple arguments, for example:
ThetaSketch(expr1, expr2..., exprN): where exprN is the final set operation to evaluate, and expr1 through exprN-1 are terms in exprN
(more on this separately).Composite aggregation functions such as:
sum(colA) + sum(colB)
This requires us to rethink the AggregationFunction interface that is generic enough to support all such requirements.
The text was updated successfully, but these errors were encountered: