Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multistage] Fix Predicate Pushdown by Using Rule Collection #10409

Merged
merged 6 commits into from Mar 15, 2023

Conversation

ankitsultana
Copy link
Contributor

@ankitsultana ankitsultana commented Mar 10, 2023

At present the optimizer rules have the following issues:

  1. Predicate pushdown doesn't work in some cases leading to full table scans in the leaf stages. See: [multistage] Filter Clause Not Pushed Down In Some Join Queries #10392
  2. We often end up with multiple nested projects or stage nodes which can be pruned/merged but are not.

This PR leverages RuleCollection to fix these issues. A RuleCollection is a collection of rules that are run in a single HepInstruction which ensures that we don't need to rely on ordering of the rules in a given collection. E.g. say we have: Filter > Project > Join > Project > Join > Project > TableScan, if we use a different HepInstruction for pushing filter past project or joins then we won't be able to push the filter down all the way.

The UTs changed in this PR also demonstrate the expected improvements in the plans.

@ankitsultana ankitsultana changed the title [Draft] [multistage] Fix Predicate Pushdow by Using Rule Collection [Draft] [multistage] Fix Predicate Pushdown by Using Rule Collection Mar 10, 2023
@@ -115,4 +116,12 @@ private PinotQueryRuleSets() {
PinotAggregateExchangeNodeInsertRule.INSTANCE,
PinotWindowExchangeNodeInsertRule.INSTANCE
);

public static final Collection<RelOptRule> FILTER_PUSHDOWN_RULES = ImmutableList.of(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: by default Calcite will use depth first order for running these rules. Also it won't do a "fullRestartAfterTransformation" unless we use HepMatchOrder.TOP_DOWN or HepMatchOrder.BOTTOM_UP.

I think using depth first order without doing full restarts after transformation should be fine but would be good if someone else also chimes in. Note that the match order can be changed for only this collection (it's a HepInstruction) so it doesn't need to be a global setting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now we should be good. HEP planner is used here to avoid a lengthy volcano planner that adds latency to the planning phase. as long as the plan results are determinisitc we can always change the way we configure the planner IMO

@codecov-commenter
Copy link

codecov-commenter commented Mar 10, 2023

Codecov Report

Merging #10409 (986ea6c) into master (f8d4320) will increase coverage by 7.12%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master   #10409      +/-   ##
============================================
+ Coverage     63.24%   70.36%   +7.12%     
- Complexity     5069     6104    +1035     
============================================
  Files          2030     2055      +25     
  Lines        110629   111394     +765     
  Branches      16842    16940      +98     
============================================
+ Hits          69965    78384    +8419     
+ Misses        35538    27517    -8021     
- Partials       5126     5493     +367     
Flag Coverage Δ
integration1 24.54% <0.00%> (+0.14%) ⬆️
integration2 24.40% <0.00%> (+0.03%) ⬆️
unittests1 67.99% <100.00%> (-0.04%) ⬇️
unittests2 13.85% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../java/org/apache/pinot/query/QueryEnvironment.java 93.33% <100.00%> (+0.47%) ⬆️

... and 310 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ankitsultana ankitsultana changed the title [Draft] [multistage] Fix Predicate Pushdown by Using Rule Collection [multistage] Fix Predicate Pushdown by Using Rule Collection Mar 13, 2023
Comment on lines 113 to 114
// ---- rules apply before exchange insertion.
PinotFilterExpandSearchRule.INSTANCE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter expand search rule can be applied with filter rules or basic rules IMO.

Comment on lines 116 to 119
// First run the basic rules using 1 HepInstruction per rule.
for (RelOptRule relOptRule : PinotQueryRuleSets.BASIC_RULES) {
hepProgramBuilder.addRuleInstance(relOptRule);
}
Copy link
Contributor

@walterddr walterddr Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need 1 HelpInstruction per rule? might be good to add some explanation

"\n LogicalProject(col4=[$0], col2=[$1], col3=[$2])",
"\n LogicalFilter(condition=[AND(>=($2, 0), =($1, 'pink floyd'))])",
"\n LogicalTableScan(table=[[a]])",
"\nLogicalProject(avg=[/(CASE(=($1, 0), null:DECIMAL(1000, 0), $0), $1)])",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder what made this 2 project merge possible after the rule changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because I am now doing the pruning/merging of redundant operators in the end and using a RuleCollection. (if that's what you meant)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. gotcha. this is great!

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm mostly. please kindly take a look at the comments. thank you for the rule split. this is amazing

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. retriggering CI to make sure it is clean

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants