-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-55072][SQL]. Ensure IsNotNull inferred filter is not missed for left leg of Inner Join ( obtained via conversion of Left Outer) #53838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
JIRA Issue Information=== Improvement SPARK-55072 === This comment was automatically generated by GitHub Actions |
| InputAdapter | ||
| Scan parquet spark_catalog.default.web_returns [wr_item_sk,wr_order_number,wr_return_amt,wr_net_loss,wr_returned_date_sk] | ||
| ReusedSubquery [d_date_sk] #1 | ||
| Filter [wr_item_sk,wr_order_number] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the IsNotNull filter which is missed.
|
There are multiple failures.. especially like ThriftServerQueryTestSuite, which to me indicates other hidden bugs .. looking into it. |
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
|
The combined PR for SPARK-55185 and SPARK-55072 is |
|
The clean pass for this PR is contingent on the PR for SPARK-55185 ( PR_SPARK_55185) |
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
What changes were proposed in this pull request?
The change is to modify the batches of rules from
to
That is the rule InferFiltersFromConstraints is run as part of the Batch "Operator Optimization after Inferring Filters", which is a Fixed Point iteration.
This is to ensure that constraints generated after execution of "operatorOptimizationRuleSet" are able to infer new filters.
It is to be noted, that ideally rule of
Batch("Infer Filters", Once,
InferFiltersFromGenerate,
InferFiltersFromConstraints),
should be part of the ruleset operatorOptimizationRuleSet, itself , and there should be NO separate Batches of
"Operator Optimization before Inferring Filters" and "Operator Optimization after Inferring Filters", but because having a single Fixed Point batch, introduces lots of cosmetic plan changes, causing many more golden files to change, I have kept it the way in the PR.
Why are the changes needed?
The way current set of optimizer rules work is that
In the batch of rules "operatorOptimizationRuleSet", the conversion of Joins like "Left Outer" to Inner happens.{}
After that "InferFiltersFromConstraints" is called which is able to create new constraints like IsNotNull, to be pushed on either side of the Inner Join tables.{}
Notice that "operatorOptimizationRuleSet" is called twice, before and after inferring filters.{}
It so happens that in TPCDS Q5, atleast, the conversion of LeftOuter to Inner for one of the Join cases, happens in step3.
But since, there is no further call of InferFiltersFromConstraints, the IsNotNull constraints generation is missed, for the Left Leg of the Join.
Though please note that, ideally this change should not cause any impact , other than the intended fix, and if there is anything breaking, should in my view be looked as exposing some hidden problem.
This issue has exposed 2 other problems for which details will be added, as seen by the failure of correlated subqueries test of the Docker Suite
Does this PR introduce any user-facing change?
No
How was this patch tested?
Passing of existing tests..
will be adding a dedicated bug test.
Was this patch authored or co-authored using generative AI tooling?
No