New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up AggregationFunctionContext and use TransformExpressionTree as the key in the blockValSetMap passed to the AggregationFunctions #5364
Conversation
@@ -456,9 +457,9 @@ private void handleCaseSensitivity(BrokerRequest brokerRequest) { | |||
for (AggregationInfo info : brokerRequest.getAggregationsInfo()) { | |||
if (!info.getAggregationType().equalsIgnoreCase(AggregationFunctionType.COUNT.getName())) { | |||
// Always read from backward compatible api in AggregationFunctionUtils. | |||
List<String> expressions = AggregationFunctionUtils.getAggregationExpressions(info); | |||
String[] expressions = AggregationFunctionUtils.getArguments(info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like AggregationFunctionUtils.getArguments first creates List, then converts to String[], and here we are doing the reverse, creating [] from List. Is that not redundant? Or is it because of more callers of getArguments() benefit if it returns String[]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it back to List. For the new format, no conversion is required. For the old backward-compatible format, there will be an array to list conversion, which is fine.
return new AggregationFunctionColumnPair(aggregationFunctionType, inputExpression.getValue()); | ||
} | ||
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other callers of this api, other than Star tree? If not, then may be better to throw exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for star-tree only, but we use it to check whether the query can be solved by star-tree. We use null
to notify the caller that the function cannot be solved by star-tree.
/** | ||
* Base implementation of {@link AggregationFunction} with single expression. | ||
*/ | ||
public abstract class BaseSingleExpressionAggregationFunction<I, F extends Comparable> implements AggregationFunction<I, F> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SingleInput instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -188,13 +186,13 @@ public GroupByResultHolder createGroupByResultHolder(int initialCapacity, int ma | |||
|
|||
@Override | |||
public void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, | |||
Map<String, BlockValSet> blockValSetMap) { | |||
Map<TransformExpressionTree, BlockValSet> blockValSetMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most aggregations today take single input. Converting them into expression trees may penalize the common case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We get the expressions from AggregationFunction.getInputExpressions()
, which is already compiled. We only compile the expression once.
e6d8aa5
to
27dbef6
Compare
|
||
BlockValSet[] blockValSets = new BlockValSet[numExpressions]; | ||
for (int i = 0; i < numExpressions; i++) { | ||
blockValSets[i] = blockValSetMap.get(_inputExpressions.get(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we agree that there will be some (may be marginal only) overhead in terms of computing equals() on TransformExpressionTree (which will compare the expression type and the expression value which is column name for the general case)? Earlier it was being done directly on String identifier. Same goes for hashcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For column (not function expression), there might be minimal overhead (overhead should be much smaller comparing to creating the map). For function expression, IMO comparing TransformExpressionTree should be cheaper comparing to the expression string.
Also, here we saved the overhead of converting expression to string, so directly using expression should give better performance.
List<TransformExpressionTree> expressions = aggregationFunction.getInputExpressions(); | ||
int numExpressions = expressions.size(); | ||
if (numExpressions == 0) { | ||
return Collections.emptyMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these special checks for 0 and 1?
IIUC, instead of executing the for loop once for numExpressions == 1
, you are using a branch. Can we just have the loop? It will be much cleaner unless I am missing the performance benefit of doing it this way. Loop will take care of returning the empty map, map with 1 KV pair or more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use loop, but that will always create a HashMap, where all the operations are much more expensive than EmptyMap and SingletonMap. Because most of the functions are zero (COUNT(*)
) or single input expression, this should give better performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks
return expressionTrees; | ||
/** | ||
* Creates a map from expression required by the {@link AggregationFunctionColumnPair} to {@link BlockValSet} fetched | ||
* from the {@link TransformBlock} (for star-tree). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) - For better readability, consider putting "for star tree" at the beginning something like This function is used in start tree code path only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
27dbef6
to
309416f
Compare
…s the key in the blockValSetMap passed to the AggregationFunctions - Clean up all the usage of AggregationFunctionContext to directly use AggregationFunction - Construct the AggregationFunctions and Group-by Expressions at planning phase and pass them to Operator and Executor to save the extra expression compilation - Use TransformExpressionTree as the key in the blockValSetMap passed to the AggregationFunctions - The benefit of this is to save the redundant string conversion, and more efficient hashCode() and equals() - The keys of the blockValSetMap should be the same as AggregationFunction.getInputExpressions() - The only exception is CountAggregationFunction with Star-Tree where there is a single entry in blockValSetMap (column "*") - Add base implementation of AggregationFunction: BaseSingleExpressionAggregationFunction for aggregation functions on single expressions - For PERCENTILE group aggregation functions, support using the second arguments to pass in percentile (e.g. PERCENTILE(column, 99), PERCENTILETDIGEST(column, 90)) - Enhance Star-Tree Aggregation/Group-by Executor to handle the column name conversion so that AggregationFunctionColumnPair is transparent to the AggregationFunction BACKWARD-INCOMPATIBLE CHANGE: The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap): void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap); void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap); void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
309416f
to
e2b366d
Compare
BACKWARD-INCOMPATIBLE CHANGE:
The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap):
void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);