Skip to content

Conversation

@ahshahid
Copy link
Contributor

@ahshahid ahshahid commented Jan 16, 2026

…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed

What changes were proposed in this pull request?

The change is to modify the batches of rules from

 val operatorOptimizationBatch: Seq[Batch] = Seq(
      Batch("Operator Optimization before Inferring Filters", fixedPoint,
        operatorOptimizationRuleSet: _*),
      Batch("Infer Filters", Once,
        InferFiltersFromGenerate,
        InferFiltersFromConstraints),
      Batch("Operator Optimization after Inferring Filters", fixedPoint,
        operatorOptimizationRuleSet: _*),
      Batch("Push extra predicate through join", fixedPoint,
        PushExtraPredicateThroughJoin,
        PushDownPredicates))

to

 val operatorOptimizationBatch: Seq[Batch] = Seq(
      Batch("Operator Optimization before Inferring Filters", fixedPoint,
        operatorOptimizationRuleSet: _*),
      Batch("Infer Filters", Once,
        InferFiltersFromGenerate,
        InferFiltersFromConstraints),
     **Batch("Operator Optimization after Inferring Filters", fixedPoint,
        operatorOptimizationRuleSet :+ InferFiltersFromConstraints: _*),**
      Batch("Push extra predicate through join", fixedPoint,
        PushExtraPredicateThroughJoin,
        PushDownPredicates))

That is the rule InferFiltersFromConstraints is run as part of the Batch "Operator Optimization after Inferring Filters", which is a Fixed Point iteration.
This is to ensure that constraints generated after execution of "operatorOptimizationRuleSet" are able to infer new filters.

It is to be noted, that ideally rule of
Batch("Infer Filters", Once,
InferFiltersFromGenerate,
InferFiltersFromConstraints),
should be part of the ruleset operatorOptimizationRuleSet, itself , and there should be NO separate Batches of
"Operator Optimization before Inferring Filters" and "Operator Optimization after Inferring Filters", but because having a single Fixed Point batch, introduces lots of cosmetic plan changes, causing many more golden files to change, I have kept it the way in the PR.

Why are the changes needed?

The way current set of optimizer rules work is that

// **step1**
Batch("Operator Optimization before Inferring Filters", fixedPoint,
operatorOptimizationRuleSet: _*),
 
// **step2**
Batch("Infer Filters", Once,
InferFiltersFromGenerate,
InferFiltersFromConstraints),
 
// **step3**
Batch("Operator Optimization after Inferring Filters", fixedPoint,
operatorOptimizationRuleSet: _*)

In the batch of rules "operatorOptimizationRuleSet", the conversion of Joins like "Left Outer" to Inner happens.{}
After that "InferFiltersFromConstraints" is called which is able to create new constraints like IsNotNull, to be pushed on either side of the Inner Join tables.{}

Notice that "operatorOptimizationRuleSet" is called twice, before and after inferring filters.{}

It so happens that in TPCDS Q5, atleast, the conversion of LeftOuter to Inner for one of the Join cases, happens in step3.

But since, there is no further call of InferFiltersFromConstraints, the IsNotNull constraints generation is missed, for the Left Leg of the Join.

Though please note that, ideally this change should not cause any impact , other than the intended fix, and if there is anything breaking, should in my view be looked as exposing some hidden problem.
This issue has exposed 2 other problems for which details will be added, as seen by the failure of correlated subqueries test of the Docker Suite

Does this PR introduce any user-facing change?

No

How was this patch tested?

Passing of existing tests..
will be adding a dedicated bug test.

Was this patch authored or co-authored using generative AI tooling?

No

…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
@github-actions
Copy link

JIRA Issue Information

=== Improvement SPARK-55072 ===
Summary: Inferring new Constraint misses IsNotNull on Left Leg, when an Outer Join gets converted into inner Join
Assignee: None
Status: Open
Affected: ["4.2.0","4.1.1"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions bot added the SQL label Jan 16, 2026
InputAdapter
Scan parquet spark_catalog.default.web_returns [wr_item_sk,wr_order_number,wr_return_amt,wr_net_loss,wr_returned_date_sk]
ReusedSubquery [d_date_sk] #1
Filter [wr_item_sk,wr_order_number]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the IsNotNull filter which is missed.

@ahshahid
Copy link
Contributor Author

There are multiple failures.. especially like ThriftServerQueryTestSuite, which to me indicates other hidden bugs .. looking into it.

…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
…romConstraints rule are run as part of the operatorOptimizationRuleSet, so that generated filter on Left leg of Join, in some specific situation, is not missed
@ahshahid
Copy link
Contributor Author

The combined PR for SPARK-55185 and SPARK-55072 is
PR-SPARK-55072_SPARK-55185

@ahshahid ahshahid changed the title [WIP][SPARK-55072][SQL]. Ensure IsNotNull inferred filter is not missed for left leg of Inner Join ( obtained via conversion of Left Outer) [SPARK-55072][SQL]. Ensure IsNotNull inferred filter is not missed for left leg of Inner Join ( obtained via conversion of Left Outer) Jan 26, 2026
@ahshahid
Copy link
Contributor Author

ahshahid commented Jan 26, 2026

The clean pass for this PR is contingent on the PR for SPARK-55185 ( PR_SPARK_55185)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant