Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add api in AggregationFunction to get its compiled input expressions. #5339

Merged
merged 1 commit into from
May 6, 2020

Conversation

mayankshriv
Copy link
Contributor

With aggregation functions now taking multiple agruments, only the functions themselves
have the knowledge on how to interpret these arguments. This poses a problem for the planning
phase on what columns need to be projected and what expressions need to be computed.
With this change, AggregationFunction's are now responsible for providing what inputs they need.

  1. Added a new api in AggregationFunction interface getInputExpressions(), that returns a list
    of compiled TransformExpressionTrees that the aggregation function needs as input to compute.

  2. Cleaned up the chaining data dependency during planning phase. Before this PR, all planning nodes
    receive the BrokerRequest (and pass to their child plan node) to extract out all information needed.
    With this change:

    • Aggregation plan nodes only specify the expression trees they need from Transform plan nodes, and
      Transform plan nodes use that to specify what columns they need from projection plan nodes.

TODO: Ideally we should completely eliminate passing of BrokerRequest throughout the chain plan nodes,
and only pass minimal information instead. This change only does so for projection columns. A TODO here
is to extend it to FilterPlanNode and deeper.

@mayankshriv mayankshriv force-pushed the plan-cleanup branch 2 times, most recently from 37dc52c to f02f5f3 Compare May 6, 2020 14:00
@mayankshriv mayankshriv changed the title Add api in AggregationFunction to return the its compiled input expressions. Add api in AggregationFunction to get its compiled input expressions. May 6, 2020
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

columns.addAll(brokerRequest.getGroupBy().getExpressions());
}
} else {
private void setMaxDocsForSelection(BrokerRequest brokerRequest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should also be handle in the upper level (SelectionPlanNode) and passed to this class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I thought so too. But upper levels can be aggregation as well, which will now have this field leaked. For now I tend to keep it here (as was the case before) until the full cleanup happens.

private int _maxDocPerNextCall = DocIdSetPlanNode.MAX_DOC_PER_CALL;

public TransformPlanNode(IndexSegment indexSegment, BrokerRequest brokerRequest) {
public TransformPlanNode(IndexSegment indexSegment, BrokerRequest brokerRequest,
Set<TransformExpressionTree> expressionsToPlan) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) expressionsToPlan -> expressions?

Also pass maxDocsPerBlock from upper level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expressions was too generic and being used in too many places for different purposes, so I thought to call it expressionsToPlan.
Replied on maxDocsPerBlock above.

With aggregation functions now taking multiple agruments, only the functions themselves
have the knowledge on how to interpret these arguments. This poses a problem for the planning
phase on what columns need to be projected and what expressions need to be computed.
With this change, AggregationFunction's are now responsible for providing what inputs they need.

1. Added a new api in AggregationFunction interface `getInputExpressions()`, that returns a list
   of compiled TransformExpressionTrees that the aggregation function needs as input to compute.

2. Cleaned up the chaining data dependency during planning phase. Before this PR, all planning nodes
   receive the BrokerRequest (and pass to their child plan node) to extract out all information needed.
   With this change:
   - Aggregation plan nodes only specify the expression trees they need from Transform plan nodes, and
     Transform plan nodes use that to specify what columns they need from projection plan nodes.

TODO: Ideally we should completely eliminate passing of BrokerRequest throughout the chain plan nodes,
and only pass minimal information instead. This change only does so for projection columns. A TODO here
is to extend it to FilterPlanNode and deeper.
@mayankshriv mayankshriv merged commit 347a97f into apache:master May 6, 2020
@mayankshriv mayankshriv deleted the plan-cleanup branch May 6, 2020 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants