Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting complex aggregation functions #5261

Closed
mayankshriv opened this issue Apr 16, 2020 · 2 comments
Closed

Supporting complex aggregation functions #5261

mayankshriv opened this issue Apr 16, 2020 · 2 comments

Comments

@mayankshriv
Copy link
Contributor

Today, almost all aggregation functions take a single argument (except Distinct). We need to be able to support more complex aggregation functions such as:

  1. Aggregation functions with multiple arguments, for example:
    ThetaSketch(expr1, expr2..., exprN): where exprN is the final set operation to evaluate, and expr1 through exprN-1 are terms in exprN (more on this separately).

  2. Composite aggregation functions such as:
    sum(colA) + sum(colB)

This requires us to rethink the AggregationFunction interface that is generic enough to support all such requirements.

@kishoreg
Copy link
Member

This is amazing, requirements for the AggregationFunction interface

  • ability to take multiple arguments
  • The arguments can be a column, literal or Function (transform/aggregation)
  • Default Name of the output column
  • Annotations for static properties of the FunctionImpl e.g. Name (will help keep the interface simple)

@kishoreg
Copy link
Member

  • The ability for users to plugin the aggregation functions. We have a switch case today based on the name, this should go away.

mayankshriv added a commit that referenced this issue Apr 17, 2020
This PR starts to address the issue #5261. The current implementation assumes that all AggregationFunctions take one argument
with the exception of DistinctAggregationFunction. This PR handles changes related
to supporting AggregationFunctions with multiple arguments, as we anticipate new
aggregation functions to be added that take multiple arguments.

1. Enhanced parser to allow multiple arguments for aggregation functions.
2. AggregationFunctionFactory provides the right set of arguments when instantiating
   individual aggregation functions.
3. AggregationFunctions now store their arguments, as opposed to assuming that the right
   BlockValSet is passed to the aggregate() api's.
4. AggregationFunction.aggregate() api's now take a Map<String, BlockValSet> where the key
   is the argument expression (columnName for simple case), as opposed to a variable array
   as that interface does not provide a way to associate BlockValSet with the argument.
5. Cleanup: Removed env variable to enable/disable Distinct, as there is no need for it to be
   disabled anymore.
mayankshriv added a commit to mayankshriv/pinot that referenced this issue Apr 18, 2020
…td).

This PR is a continuation of apache#5259
to address the issue apache#5261.

1. Added new field in request.thrift `aggregationFunctionArgs` as a list of String
   arguments for the aggregation funciton.
   - Could not use the existing `aggregationParams` as it is a Map, and functions with
     variable arguments may not provide a name for the arg (to be used as key in Map).
   - Maintain backward compatibility by first check for the new field, and fall back to
     the existing one if it does not exist.

2. Ensure that all calls to the old AggregationInfo.getAggregationParams() is replaced
   with backward compatible AgguregationFunctionUtils.getAggregationArgs().

3. Since most aggregation functions today have just one argument, added a separate api
   AggregationFuncitonContext.getFirstArgument() as an optimization.

4. Cleaned up getColumnName() and getResultColumnName() api's in AggregationFunctionContext
   class to not require the column name argument, as this is already stored in the
   AggregationFunction.

5. Modified all tests to use aggregationFunctionArgs instead of aggregationParams.

TODO:
Remove the AggregationFunctionContext class as AggregationFunctions now store their arguments,
and this class no longer provides any additional value.
mayankshriv added a commit to mayankshriv/pinot that referenced this issue Apr 19, 2020
…td).

This PR is a continuation of apache#5259
to address the issue apache#5261.

1. Added new field in request.thrift `aggregationFunctionArgs` as a list of String
   arguments for the aggregation funciton.
   - Could not use the existing `aggregationParams` as it is a Map, and functions with
     variable arguments may not provide a name for the arg (to be used as key in Map).
   - Maintain backward compatibility by first check for the new field, and fall back to
     the existing one if it does not exist.

2. Ensure that all calls to the old AggregationInfo.getAggregationParams() is replaced
   with backward compatible AgguregationFunctionUtils.getAggregationArgs().

3. Since most aggregation functions today have just one argument, added a separate api
   AggregationFuncitonContext.getFirstArgument() as an optimization.

4. Cleaned up getColumnName() and getResultColumnName() api's in AggregationFunctionContext
   class to not require the column name argument, as this is already stored in the
   AggregationFunction.

5. Modified all tests to use aggregationFunctionArgs instead of aggregationParams.

TODO:
Remove the AggregationFunctionContext class as AggregationFunctions now store their arguments,
and this class no longer provides any additional value.
mayankshriv added a commit that referenced this issue Apr 19, 2020
…td). (#5275)

This PR is a continuation of #5259
to address the issue #5261.

1. Added new field in request.thrift `aggregationFunctionArgs` as a list of String
   arguments for the aggregation funciton.
   - Could not use the existing `aggregationParams` as it is a Map, and functions with
     variable arguments may not provide a name for the arg (to be used as key in Map).
   - Maintain backward compatibility by first check for the new field, and fall back to
     the existing one if it does not exist.

2. Ensure that all calls to the old AggregationInfo.getAggregationParams() is replaced
   with backward compatible AgguregationFunctionUtils.getAggregationArgs().

3. Since most aggregation functions today have just one argument, added a separate api
   AggregationFuncitonContext.getFirstArgument() as an optimization.

4. Cleaned up getColumnName() and getResultColumnName() api's in AggregationFunctionContext
   class to not require the column name argument, as this is already stored in the
   AggregationFunction.

5. Modified all tests to use aggregationFunctionArgs instead of aggregationParams.

TODO:
Remove the AggregationFunctionContext class as AggregationFunctions now store their arguments,
and this class no longer provides any additional value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants