Add api in AggregationFunction to get its compiled input expressions. #5339

mayankshriv · 2020-05-06T05:27:33Z

With aggregation functions now taking multiple arguments, only the functions themselves
know how to interpret these arguments. This poses a problem for the planning phase, which must
determine which columns need to be projected and which expressions need to be computed.
With this change, AggregationFunctions are now responsible for declaring the inputs they need.

1. Added a new API to the AggregationFunction interface, getInputExpressions(), which returns a list
   of compiled TransformExpressionTrees that the aggregation function needs as input.
2. Cleaned up the chained data dependency during the planning phase. Before this PR, every plan node
   received the BrokerRequest (and passed it on to its child plan node) to extract all the information it needed.
   With this change:
   - Aggregation plan nodes only specify the expression trees they need from Transform plan nodes, and
     Transform plan nodes use those to specify the columns they need from Projection plan nodes.
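The new API can be illustrated with a minimal, self-contained sketch. It does not use Pinot's classes: `Expr` is a hypothetical stand-in for TransformExpressionTree, and the `Sum`/`Percentile`/`Count` implementations are illustrative assumptions, not the PR's code. The point it shows is that only the function knows which of its arguments are inputs to read and which are parameters.

```java
import java.util.List;

// Hypothetical, simplified stand-in for TransformExpressionTree: a column
// identifier, a literal, or a function call over child expressions.
record Expr(String type, String value, List<Expr> children) {
  static Expr column(String name) { return new Expr("IDENTIFIER", name, List.of()); }
  static Expr literal(String value) { return new Expr("LITERAL", value, List.of()); }
  static Expr function(String name, Expr... args) { return new Expr("FUNCTION", name, List.of(args)); }
}

// Sketch of the new API: each function reports the expressions it must read,
// because only the function knows how to interpret its own arguments.
interface AggregationFunction {
  String getName();

  List<Expr> getInputExpressions();
}

// SUM(expr): its single argument is the input expression.
final class SumAggregationFunction implements AggregationFunction {
  private final Expr _expression;

  SumAggregationFunction(List<Expr> arguments) {
    _expression = arguments.get(0);
  }

  @Override
  public String getName() { return "sum"; }

  @Override
  public List<Expr> getInputExpressions() { return List.of(_expression); }
}

// PERCENTILE(expr, 95): only the first argument is read from the segment. The
// second is a parameter; a planner projecting every argument would treat "95" as a column.
final class PercentileAggregationFunction implements AggregationFunction {
  private final Expr _expression;
  private final int _percentile;

  PercentileAggregationFunction(List<Expr> arguments) {
    _expression = arguments.get(0);
    _percentile = Integer.parseInt(arguments.get(1).value());
  }

  @Override
  public String getName() { return "percentile" + _percentile; }

  @Override
  public List<Expr> getInputExpressions() { return List.of(_expression); }
}

// COUNT(*): assumed in this sketch to need no input expressions at all.
final class CountAggregationFunction implements AggregationFunction {
  @Override
  public String getName() { return "count"; }

  @Override
  public List<Expr> getInputExpressions() { return List.of(); }
}
```

With this shape, the planner never parses function arguments itself; it simply asks each function for `getInputExpressions()`.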

TODO: Ideally we should eliminate passing the BrokerRequest through the chain of plan nodes entirely,
and pass only the minimal information each node needs instead. This change does so only for projection
columns; the remaining TODO is to extend it to FilterPlanNode and deeper.
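The part this change does cover, deriving projection columns from expression trees rather than from the BrokerRequest, might look roughly like the sketch below. `Expr`, `PlanningSketch`, and both helper methods are hypothetical names; including group-by expressions in the set is an assumption based on the group-by handling removed from TransformPlanNode in the diff.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal expression tree, standing in for TransformExpressionTree.
record Expr(Kind kind, String value, List<Expr> children) {
  enum Kind { IDENTIFIER, LITERAL, FUNCTION }

  static Expr column(String name) { return new Expr(Kind.IDENTIFIER, name, List.of()); }
  static Expr literal(String value) { return new Expr(Kind.LITERAL, value, List.of()); }
  static Expr function(String name, Expr... args) { return new Expr(Kind.FUNCTION, name, List.of(args)); }
}

final class PlanningSketch {
  // Aggregation level: the union of every function's input expressions, plus
  // group-by expressions (assumption), forms the set of expressions to plan.
  static Set<Expr> expressionsToPlan(List<List<Expr>> functionInputs, List<Expr> groupByExpressions) {
    Set<Expr> expressions = new LinkedHashSet<>();
    for (List<Expr> inputs : functionInputs) {
      expressions.addAll(inputs);
    }
    expressions.addAll(groupByExpressions);
    return expressions;
  }

  // Transform level: walk each tree and collect column identifiers; only these
  // columns need to be requested from the projection level.
  static Set<String> projectionColumns(Set<Expr> expressionsToPlan) {
    Set<String> columns = new LinkedHashSet<>();
    for (Expr expression : expressionsToPlan) {
      collectColumns(expression, columns);
    }
    return columns;
  }

  private static void collectColumns(Expr expression, Set<String> columns) {
    switch (expression.kind()) {
      case IDENTIFIER -> columns.add(expression.value());
      case FUNCTION -> expression.children().forEach(child -> collectColumns(child, columns));
      case LITERAL -> { } // literals need no projection
    }
  }
}
```

Each level only sees what its parent hands it (a set of expressions, then a set of column names), which is the narrowing of dependencies the TODO wants to extend to the filter side.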

Jackie-Jiang

LGTM otherwise

Jackie-Jiang · 2020-05-06T18:34:03Z

pinot-core/src/main/java/org/apache/pinot/core/plan/TransformPlanNode.java

-        columns.addAll(brokerRequest.getGroupBy().getExpressions());
-      }
-    } else {
+  private void setMaxDocsForSelection(BrokerRequest brokerRequest) {


This logic should also be handled in the upper level (SelectionPlanNode) and passed to this class.

Yeah, I thought so too. But the upper level can also be an aggregation plan node, which would then have this field leaked into it. For now I'd prefer to keep it here (as it was before) until the full cleanup happens.
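The reviewer's alternative, computing the per-call document cap in the parent and passing it down, can be sketched as follows. The rule used here (a selection query without ORDER BY can stop after LIMIT documents) and the default of 10,000 are assumptions for illustration only; `MaxDocsSketch` and `TransformNodeSketch` are hypothetical names, not Pinot classes.

```java
// Hypothetical helper a selection-level parent could call.
final class MaxDocsSketch {
  // Hypothetical default; stands in for DocIdSetPlanNode.MAX_DOC_PER_CALL.
  static final int MAX_DOC_PER_CALL = 10_000;

  // Assumed rule: without ORDER BY, a selection only needs 'limit' docs,
  // so there is no point fetching larger blocks than that.
  static int maxDocsForSelection(int limit, boolean hasOrderBy) {
    return hasOrderBy ? MAX_DOC_PER_CALL : Math.min(limit, MAX_DOC_PER_CALL);
  }
}

// Reviewer's suggested shape: the parent supplies the value, so the transform
// node no longer inspects the BrokerRequest for it.
final class TransformNodeSketch {
  private final int _maxDocsPerCall;

  TransformNodeSketch(int maxDocsPerCall) {
    _maxDocsPerCall = maxDocsPerCall;
  }

  int maxDocsPerCall() {
    return _maxDocsPerCall;
  }
}
```

The trade-off the author raises shows up here: an aggregation parent would also have to pass a value (just the default), so every caller becomes aware of a selection-only concern.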

Jackie-Jiang · 2020-05-06T18:36:13Z

pinot-core/src/main/java/org/apache/pinot/core/plan/TransformPlanNode.java

  private int _maxDocPerNextCall = DocIdSetPlanNode.MAX_DOC_PER_CALL;

-  public TransformPlanNode(IndexSegment indexSegment, BrokerRequest brokerRequest) {
+  public TransformPlanNode(IndexSegment indexSegment, BrokerRequest brokerRequest,
+      Set<TransformExpressionTree> expressionsToPlan) {


(nit) expressionsToPlan -> expressions?

Also pass maxDocsPerBlock from upper level?

`expressions` was too generic and is used in too many places for different purposes, so I named it expressionsToPlan.
Replied on maxDocsPerBlock above.

...core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunction.java

...src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java

...src/main/java/org/apache/pinot/core/query/aggregation/function/CountAggregationFunction.java


mayankshriv force-pushed the plan-cleanup branch 2 times, most recently from 37dc52c to f02f5f3, May 6, 2020 14:00

mayankshriv changed the title ~~Add api in AggregationFunction to return the its compiled input expressions.~~ Add api in AggregationFunction to get its compiled input expressions. May 6, 2020

Jackie-Jiang approved these changes May 6, 2020


mayankshriv force-pushed the plan-cleanup branch from f02f5f3 to f1cd04a, May 6, 2020 20:12

mayankshriv merged commit 347a97f into apache:master May 6, 2020

mayankshriv deleted the plan-cleanup branch May 6, 2020 21:28

