Clean up AggregationFunctionContext and use TransformExpressionTree as the key in the blockValSetMap passed to the AggregationFunctions #5364

Jackie-Jiang · 2020-05-11T19:36:11Z

- Clean up all usages of AggregationFunctionContext and use AggregationFunction directly
- Construct the AggregationFunctions once at the planning phase and pass them to the Operator and Executor to avoid the extra expression compilation
- Use TransformExpressionTree as the key in the blockValSetMap passed to the AggregationFunctions
  - This saves the redundant string conversion and gives more efficient hashCode() and equals()
  - The keys of the blockValSetMap should be the same as AggregationFunction.getInputExpressions()
  - The only exception is CountAggregationFunction with Star-Tree, where there is a single entry in blockValSetMap (column "*")
- Enhance the Star-Tree Aggregation/Group-by Executor to handle the column name conversion so that AggregationFunctionColumnPair is transparent to the AggregationFunction

BACKWARD-INCOMPATIBLE CHANGE:
The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap):
void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
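
To illustrate the new contract, here is a minimal sketch of a function that looks up its input by expression object rather than by string. `SumSketch` and its nested `ColumnExpr` are hypothetical stand-ins (for an aggregation function and for TransformExpressionTree respectively), a `double[]` stands in for BlockValSet, and the method returns the sum instead of writing to an AggregationResultHolder; none of this is Pinot code.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Pinot code.
final class SumSketch {
  // Stand-in for TransformExpressionTree; only plain columns are modelled.
  static final class ColumnExpr {
    final String column;

    ColumnExpr(String column) {
      this.column = column;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ColumnExpr && ((ColumnExpr) o).column.equals(column);
    }

    @Override
    public int hashCode() {
      return column.hashCode();
    }
  }

  private final ColumnExpr input;

  SumSketch(ColumnExpr input) {
    this.input = input;
  }

  // The keys of the map passed to aggregate() are expected to match this list.
  List<ColumnExpr> getInputExpressions() {
    return Collections.singletonList(input);
  }

  // Simplified: double[] stands in for BlockValSet, and the sum is returned directly.
  double aggregate(int length, Map<ColumnExpr, double[]> blockValSetMap) {
    double[] values = blockValSetMap.get(input); // lookup by expression, no string conversion
    double sum = 0;
    for (int i = 0; i < length; i++) {
      sum += values[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    SumSketch sum = new SumSketch(new ColumnExpr("metric"));
    Map<ColumnExpr, double[]> blocks =
        Collections.singletonMap(new ColumnExpr("metric"), new double[] {1.0, 2.0, 3.5});
    System.out.println(sum.aggregate(3, blocks));
  }
}
```

Implementations that previously called `blockValSetMap.get("columnName")` would need an equivalent change to look up by the expression they were constructed with.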

mayankshriv · 2020-05-11T20:29:16Z

pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java

@@ -456,9 +457,9 @@ private void handleCaseSensitivity(BrokerRequest brokerRequest) {
      for (AggregationInfo info : brokerRequest.getAggregationsInfo()) {
        if (!info.getAggregationType().equalsIgnoreCase(AggregationFunctionType.COUNT.getName())) {
          // Always read from backward compatible api in AggregationFunctionUtils.
-          List<String> expressions = AggregationFunctionUtils.getAggregationExpressions(info);
+          String[] expressions = AggregationFunctionUtils.getArguments(info);


It seems AggregationFunctionUtils.getArguments first creates a List and then converts it to String[], while here we are doing the reverse, creating [] from List. Isn't that redundant? Or is it because more callers of getArguments() benefit if it returns String[]?

Changed it back to List. For the new format, no conversion is required. For the old backward-compatible format, there will be an array to list conversion, which is fine.

mayankshriv · 2020-05-11T20:53:02Z

...src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java

+        return new AggregationFunctionColumnPair(aggregationFunctionType, inputExpression.getValue());
+      }
+    }
+    return null;


Are there callers of this API other than star-tree? If not, it may be better to throw an exception.

This is for star-tree only, but we use it to check whether the query can be solved by star-tree. We use null to notify the caller that the function cannot be solved by star-tree.
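
A minimal sketch of the null-as-signal pattern described in this reply. Everything here is hypothetical: `Input` and `ColumnPair` stand in for TransformExpressionTree and AggregationFunctionColumnPair, and the rule that only a single plain-column input qualifies is an assumption made for illustration, not the actual condition in AggregationFunctionUtils.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Pinot code.
final class StarTreeCheckSketch {
  // Stand-in for an input expression: either a plain column or something else.
  static final class Input {
    final boolean isColumn;
    final String value;

    Input(boolean isColumn, String value) {
      this.isColumn = isColumn;
      this.value = value;
    }
  }

  // Stand-in for AggregationFunctionColumnPair.
  static final class ColumnPair {
    final String functionType;
    final String column;

    ColumnPair(String functionType, String column) {
      this.functionType = functionType;
      this.column = column;
    }
  }

  // Returns the pair, or null when the function cannot be served from the star-tree.
  // Assumption: only a single plain-column input qualifies.
  static ColumnPair toColumnPair(String functionType, List<Input> inputs) {
    if (inputs.size() == 1 && inputs.get(0).isColumn) {
      return new ColumnPair(functionType, inputs.get(0).value);
    }
    return null;
  }

  // The caller treats null as "not solvable by star-tree" rather than as an error.
  static boolean solvableByStarTree(String functionType, List<Input> inputs) {
    return toColumnPair(functionType, inputs) != null;
  }

  public static void main(String[] args) {
    List<Input> column = Collections.singletonList(new Input(true, "metric"));
    List<Input> function = Collections.singletonList(new Input(false, "add(a,b)"));
    System.out.println(solvableByStarTree("SUM", column));
    System.out.println(solvableByStarTree("SUM", function));
  }
}
```

Throwing an exception instead would force the planner to use exceptions for an expected outcome, which is the trade-off the reply argues against.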

mayankshriv · 2020-05-11T20:55:40Z

...rg/apache/pinot/core/query/aggregation/function/BaseSingleExpressionAggregationFunction.java

+/**
+ * Base implementation of {@link AggregationFunction} with single expression.
+ */
+public abstract class BaseSingleExpressionAggregationFunction<I, F extends Comparable> implements AggregationFunction<I, F> {


SingleInput instead?

mayankshriv · 2020-05-11T20:58:41Z

.../main/java/org/apache/pinot/core/query/aggregation/function/DistinctAggregationFunction.java

@@ -188,13 +186,13 @@ public GroupByResultHolder createGroupByResultHolder(int initialCapacity, int ma

  @Override
  public void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder,
-      Map<String, BlockValSet> blockValSetMap) {
+      Map<TransformExpressionTree, BlockValSet> blockValSetMap) {


Most aggregations today take a single input. Could converting them into expression trees penalize the common case?

We get the expressions from AggregationFunction.getInputExpressions(), which are already compiled. We only compile each expression once.

siddharthteotia · 2020-05-12T16:04:32Z

.../main/java/org/apache/pinot/core/query/aggregation/function/DistinctAggregationFunction.java

+
+    BlockValSet[] blockValSets = new BlockValSet[numExpressions];
+    for (int i = 0; i < numExpressions; i++) {
+      blockValSets[i] = blockValSetMap.get(_inputExpressions.get(i));


Do we agree that there will be some (maybe only marginal) overhead in computing equals() on TransformExpressionTree, which compares the expression type and the expression value (the column name in the general case)? Earlier this was done directly on the String identifier. The same goes for hashCode().

For a plain column (not a function expression), there might be minimal overhead, which should be much smaller than the cost of creating the map. For a function expression, IMO comparing TransformExpressionTree objects should be cheaper than comparing the expression strings.
Also, we save the overhead of converting the expression to a string here, so using the expression directly should give better performance.
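
The following sketch shows what structural equals() and hashCode() on an expression tree look like when the tree is used as a map key. `ExprKeySketch.Node` is a hypothetical stand-in with a type, a value and children; it does not reproduce TransformExpressionTree's real fields or methods.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not Pinot code.
final class ExprKeySketch {
  enum Type { IDENTIFIER, LITERAL, FUNCTION }

  static final class Node {
    final Type type;
    final String value;
    final List<Node> children;

    Node(Type type, String value, Node... children) {
      this.type = type;
      this.value = value;
      this.children = Collections.unmodifiableList(Arrays.asList(children.clone()));
    }

    // For a plain column this compares the type and the column name only;
    // for a function it recurses into the children, with no string form built.
    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Node)) {
        return false;
      }
      Node that = (Node) o;
      return type == that.type && value.equals(that.value) && children.equals(that.children);
    }

    @Override
    public int hashCode() {
      return Objects.hash(type, value, children);
    }
  }

  static Node column(String name) {
    return new Node(Type.IDENTIFIER, name);
  }

  static Node function(String name, Node... arguments) {
    return new Node(Type.FUNCTION, name, arguments);
  }

  public static void main(String[] args) {
    Map<Node, String> blocks = new HashMap<>();
    blocks.put(function("add", column("colA"), column("colB")), "block for add(colA,colB)");
    // A structurally equal tree built separately finds the same entry.
    System.out.println(blocks.get(function("add", column("colA"), column("colB"))));
  }
}
```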

siddharthteotia · 2020-05-12T16:11:40Z

...src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java

+    List<TransformExpressionTree> expressions = aggregationFunction.getInputExpressions();
+    int numExpressions = expressions.size();
+    if (numExpressions == 0) {
+      return Collections.emptyMap();


Why do we need these special checks for 0 and 1?

IIUC, instead of executing the for loop once for numExpressions == 1, you are using a branch. Can we just have the loop? It would be much cleaner, unless I am missing a performance benefit of doing it this way. The loop would take care of returning the empty map, a map with one KV pair, or a map with more.

We can use a loop, but that would always create a HashMap, whose operations are much more expensive than those of EmptyMap and SingletonMap. Because most functions take zero (COUNT(*)) or a single input expression, the branches should give better performance.

Got it. Thanks
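
The branching discussed in this thread can be sketched as follows. `InputMapSketch.buildInputMap` is a hypothetical generic helper: `K` stands in for TransformExpressionTree, `V` for BlockValSet, and `fetch` for reading the block value set from the TransformBlock. It is not the actual AggregationFunctionUtils method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Pinot code.
final class InputMapSketch {
  private InputMapSketch() {
  }

  static <K, V> Map<K, V> buildInputMap(List<K> expressions, Function<K, V> fetch) {
    int numExpressions = expressions.size();
    if (numExpressions == 0) {
      // e.g. COUNT(*): reuse the shared immutable empty map, nothing is allocated
      return Collections.emptyMap();
    }
    if (numExpressions == 1) {
      // Most functions: a single-entry immutable map, no hash table behind it
      K expression = expressions.get(0);
      return Collections.singletonMap(expression, fetch.apply(expression));
    }
    // Multiple inputs: fall back to a regular HashMap
    Map<K, V> map = new HashMap<>();
    for (K expression : expressions) {
      map.put(expression, fetch.apply(expression));
    }
    return map;
  }

  public static void main(String[] args) {
    System.out.println(buildInputMap(Collections.<String>emptyList(), String::length));
    System.out.println(buildInputMap(Collections.singletonList("colA"), String::length));
    System.out.println(buildInputMap(Arrays.asList("colA", "colBB"), String::length).size());
  }
}
```

Note that the maps returned by the first two branches are immutable, so this shape only works if callers treat the result as read-only.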

siddharthteotia · 2020-05-12T16:12:34Z

...src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java

-    return expressionTrees;
+  /**
+   * Creates a map from expression required by the {@link AggregationFunctionColumnPair} to {@link BlockValSet} fetched
+   * from the {@link TransformBlock} (for star-tree).


(nit) For better readability, consider putting "for star-tree" at the beginning, something like "This function is used in the star-tree code path only".

…s the key in the blockValSetMap passed to the AggregationFunctions

- Clean up all the usage of AggregationFunctionContext to directly use AggregationFunction
- Construct the AggregationFunctions and Group-by Expressions at planning phase and pass them to Operator and Executor to save the extra expression compilation
- Use TransformExpressionTree as the key in the blockValSetMap passed to the AggregationFunctions
  - The benefit of this is to save the redundant string conversion, and more efficient hashCode() and equals()
  - The keys of the blockValSetMap should be the same as AggregationFunction.getInputExpressions()
  - The only exception is CountAggregationFunction with Star-Tree where there is a single entry in blockValSetMap (column "*")
- Add base implementation of AggregationFunction: BaseSingleExpressionAggregationFunction for aggregation functions on single expressions
- For PERCENTILE group aggregation functions, support using the second arguments to pass in percentile (e.g. PERCENTILE(column, 99), PERCENTILETDIGEST(column, 90))
- Enhance Star-Tree Aggregation/Group-by Executor to handle the column name conversion so that AggregationFunctionColumnPair is transparent to the AggregationFunction

BACKWARD-INCOMPATIBLE CHANGE:
The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap):
void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
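
As a rough illustration of the PERCENTILE change in this commit message, the sketch below resolves the percentile either from a second argument, as in PERCENTILE(column, 99), or from digits appended to the function name, as in the older PERCENTILE99(column) form. `PercentileArgSketch.resolvePercentile` is a hypothetical helper written for this example; it does not reproduce how Pinot parses these functions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Pinot code.
final class PercentileArgSketch {
  private PercentileArgSketch() {
  }

  static int resolvePercentile(String functionName, List<String> arguments) {
    int percentile;
    if (arguments.size() == 2) {
      // New form: the percentile is passed as the second argument
      percentile = Integer.parseInt(arguments.get(1).trim());
    } else {
      // Older form: the percentile is the run of digits at the end of the name
      int start = functionName.length();
      while (start > 0 && Character.isDigit(functionName.charAt(start - 1))) {
        start--;
      }
      if (start == functionName.length()) {
        throw new IllegalArgumentException("No percentile found in: " + functionName);
      }
      percentile = Integer.parseInt(functionName.substring(start));
    }
    if (percentile < 0 || percentile > 100) {
      throw new IllegalArgumentException("Percentile out of range: " + percentile);
    }
    return percentile;
  }

  public static void main(String[] args) {
    System.out.println(resolvePercentile("PERCENTILE", Arrays.asList("column", "99")));
    System.out.println(resolvePercentile("PERCENTILETDIGEST90", Collections.singletonList("column")));
  }
}
```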

mayankshriv reviewed May 11, 2020

Jackie-Jiang force-pushed the aggregation_function_cleanup branch from e6d8aa5 to 27dbef6 Compare May 12, 2020 01:26

siddharthteotia reviewed May 12, 2020

Jackie-Jiang force-pushed the aggregation_function_cleanup branch from 27dbef6 to 309416f Compare May 12, 2020 19:42

Jackie-Jiang force-pushed the aggregation_function_cleanup branch from 309416f to e2b366d Compare May 12, 2020 19:51

siddharthteotia approved these changes May 12, 2020

mayankshriv approved these changes May 12, 2020

minor change

406f938

Jackie-Jiang merged commit 8b0089f into apache:master May 12, 2020

Jackie-Jiang deleted the aggregation_function_cleanup branch May 12, 2020 21:02
