Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up AggregationFunctionContext and use TransformExpressionTree as the key in the blockValSetMap passed to the AggregationFunctions #5364

Merged
merged 2 commits into from May 12, 2020

Conversation

Jackie-Jiang
Copy link
Contributor

  • Clean up all the usage of AggregationFunctionContext to directly use AggregationFunction
  • Construct the AggregationFunctions at planning phase once and pass it to Operator and Executor to save the extra expression compilation
  • Use TransformExpressionTree as the key in the blockValSetMap passed to the AggregationFunctions
    • The benefit of this is to save the redundant string conversion, and more efficient hashCode() and equals()
    • The keys of the blockValSetMap should be the same as AggregationFunction.getInputExpressions()
    • The only exception is CountAggregationFunction with Star-Tree where there is a single entry in blockValSetMap (column "*")
  • Enhance Star-Tree Aggregation/Group-by Executor to handle the column name conversion so that AggregationFunctionColumnPair is transparent to the AggregationFunction

BACKWARD-INCOMPATIBLE CHANGE:
The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap):
void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);

@@ -456,9 +457,9 @@ private void handleCaseSensitivity(BrokerRequest brokerRequest) {
for (AggregationInfo info : brokerRequest.getAggregationsInfo()) {
if (!info.getAggregationType().equalsIgnoreCase(AggregationFunctionType.COUNT.getName())) {
// Always read from backward compatible api in AggregationFunctionUtils.
List<String> expressions = AggregationFunctionUtils.getAggregationExpressions(info);
String[] expressions = AggregationFunctionUtils.getArguments(info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like AggregationFunctionUtils.getArguments first creates List, then converts to String[], and here we are doing the reverse, creating [] from List. Is that not redundant? Or is it because of more callers of getArguments() benefit if it returns String[]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it back to List. For the new format, no conversion is required. For the old backward-compatible format, there will be an array to list conversion, which is fine.

return new AggregationFunctionColumnPair(aggregationFunctionType, inputExpression.getValue());
}
}
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other callers of this api, other than Star tree? If not, then may be better to throw exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for star-tree only, but we use it to check whether the query can be solved by star-tree. We use null to notify the caller that the function cannot be solved by star-tree.

/**
* Base implementation of {@link AggregationFunction} with single expression.
*/
public abstract class BaseSingleExpressionAggregationFunction<I, F extends Comparable> implements AggregationFunction<I, F> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SingleInput instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -188,13 +186,13 @@ public GroupByResultHolder createGroupByResultHolder(int initialCapacity, int ma

@Override
public void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder,
Map<String, BlockValSet> blockValSetMap) {
Map<TransformExpressionTree, BlockValSet> blockValSetMap) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most aggregations today take single input. Converting them into expression trees may penalize the common case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get the expressions from AggregationFunction.getInputExpressions(), which is already compiled. We only compile the expression once.


BlockValSet[] blockValSets = new BlockValSet[numExpressions];
for (int i = 0; i < numExpressions; i++) {
blockValSets[i] = blockValSetMap.get(_inputExpressions.get(i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we agree that there will be some (may be marginal only) overhead in terms of computing equals() on TransformExpressionTree (which will compare the expression type and the expression value which is column name for the general case)? Earlier it was being done directly on String identifier. Same goes for hashcode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For column (not function expression), there might be minimal overhead (overhead should be much smaller comparing to creating the map). For function expression, IMO comparing TransformExpressionTree should be cheaper comparing to the expression string.
Also, here we saved the overhead of converting expression to string, so directly using expression should give better performance.

List<TransformExpressionTree> expressions = aggregationFunction.getInputExpressions();
int numExpressions = expressions.size();
if (numExpressions == 0) {
return Collections.emptyMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these special checks for 0 and 1?

IIUC, instead of executing the for loop once for numExpressions == 1, you are using a branch. Can we just have the loop? It will be much cleaner unless I am missing the performance benefit of doing it this way. Loop will take care of returning the empty map, map with 1 KV pair or more

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use loop, but that will always create a HashMap, where all the operations are much more expensive than EmptyMap and SingletonMap. Because most of the functions are zero (COUNT(*)) or single input expression, this should give better performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks

return expressionTrees;
/**
* Creates a map from expression required by the {@link AggregationFunctionColumnPair} to {@link BlockValSet} fetched
* from the {@link TransformBlock} (for star-tree).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) - For better readability, consider putting "for star tree" at the beginning something like This function is used in start tree code path only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

…s the key in the blockValSetMap passed to the AggregationFunctions

- Clean up all the usage of AggregationFunctionContext to directly use AggregationFunction
- Construct the AggregationFunctions and Group-by Expressions at planning phase and pass them to Operator and Executor to save the extra expression compilation
- Use TransformExpressionTree as the key in the blockValSetMap passed to the AggregationFunctions
  - The benefit of this is to save the redundant string conversion, and more efficient hashCode() and equals()
  - The keys of the blockValSetMap should be the same as AggregationFunction.getInputExpressions()
  - The only exception is CountAggregationFunction with Star-Tree where there is a single entry in blockValSetMap (column "*")
- Add base implementation of AggregationFunction: BaseSingleExpressionAggregationFunction for aggregation functions on single expressions
- For PERCENTILE group aggregation functions, support using the second arguments to pass in percentile (e.g. PERCENTILE(column, 99), PERCENTILETDIGEST(column, 90))
- Enhance Star-Tree Aggregation/Group-by Executor to handle the column name conversion so that AggregationFunctionColumnPair is transparent to the AggregationFunction

BACKWARD-INCOMPATIBLE CHANGE:
The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap):
void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
@Jackie-Jiang Jackie-Jiang merged commit 8b0089f into apache:master May 12, 2020
@Jackie-Jiang Jackie-Jiang deleted the aggregation_function_cleanup branch May 12, 2020 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants