[SPARK-56840][SQL] Avoid unresolved NullIf type lookup#55838
Closed
sunchao wants to merge 3 commits into
Closed
Conversation
dongjoon-hyun
approved these changes
May 13, 2026
Member
dongjoon-hyun
left a comment
There was a problem hiding this comment.
+1, LGTM. Thank you, @sunchao .
cc @peter-toth .
peter-toth
approved these changes
May 13, 2026
peter-toth
pushed a commit
that referenced
this pull request
May 14, 2026
### Why are the changes needed? `NULLIF` builds its replacement expression before analysis has resolved all child expressions. For nested field references, the existing implementation can read the left operand's data type too early while constructing the null branch, which can fail analysis even though the SQL shape is valid. SPARK-56840 tracks this analyzer failure. ### What changes were proposed in this PR? - Build the `NULLIF` null branch with a lazy typed-null placeholder so construction does not eagerly read the unresolved left operand type, while `NullIf.replacement.dataType` remains valid once the operand type is available. - Make that placeholder `RuntimeReplaceable`, so `ReplaceExpressions` restores an ordinary typed `Literal(null, ...)` before later optimizer rules run and existing null-literal simplifications continue to apply. - Add focused regressions for: - nested struct-field `nullif(c.provider, lower(...))` analysis in both `ALWAYS_INLINE_COMMON_EXPR` modes; - `NullIf` replacement type reporting before type coercion; - optimizer replacement back to a normal null literal; - explain output avoiding exposure of the internal helper name. ### Does this PR introduce _any_ user-facing change? Yes. Valid `NULLIF` expressions over unresolved nested field references that could fail during analysis now resolve and execute successfully. ### How was this patch tested? - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.NullExpressionsSuite -- -z "NullIf replacement preserves its data type before type coercion"'` - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.OptimizerSuite -- -z "NullIf typed null branch is replaced with a null literal"'` - `build/sbt 'sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z "nullif function"'` - `build/sbt 'sql/testOnly org.apache.spark.sql.ExplainSuite -- -z "explain for these functions; use range to avoid constant folding"'` ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Codex (GPT-5.5) Closes #55838 from sunchao/dev/chao/codex/oss-nullif-unresolved. Authored-by: Chao Sun <sunchao@apache.org> Signed-off-by: Peter Toth <peter.toth@gmail.com> (cherry picked from commit 5949ab3) Signed-off-by: Peter Toth <peter.toth@gmail.com>
peter-toth
pushed a commit
that referenced
this pull request
May 14, 2026
### Why are the changes needed? `NULLIF` builds its replacement expression before analysis has resolved all child expressions. For nested field references, the existing implementation can read the left operand's data type too early while constructing the null branch, which can fail analysis even though the SQL shape is valid. SPARK-56840 tracks this analyzer failure. ### What changes were proposed in this PR? - Build the `NULLIF` null branch with a lazy typed-null placeholder so construction does not eagerly read the unresolved left operand type, while `NullIf.replacement.dataType` remains valid once the operand type is available. - Make that placeholder `RuntimeReplaceable`, so `ReplaceExpressions` restores an ordinary typed `Literal(null, ...)` before later optimizer rules run and existing null-literal simplifications continue to apply. - Add focused regressions for: - nested struct-field `nullif(c.provider, lower(...))` analysis in both `ALWAYS_INLINE_COMMON_EXPR` modes; - `NullIf` replacement type reporting before type coercion; - optimizer replacement back to a normal null literal; - explain output avoiding exposure of the internal helper name. ### Does this PR introduce _any_ user-facing change? Yes. Valid `NULLIF` expressions over unresolved nested field references that could fail during analysis now resolve and execute successfully. ### How was this patch tested? - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.NullExpressionsSuite -- -z "NullIf replacement preserves its data type before type coercion"'` - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.OptimizerSuite -- -z "NullIf typed null branch is replaced with a null literal"'` - `build/sbt 'sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z "nullif function"'` - `build/sbt 'sql/testOnly org.apache.spark.sql.ExplainSuite -- -z "explain for these functions; use range to avoid constant folding"'` ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Codex (GPT-5.5) Closes #55838 from sunchao/dev/chao/codex/oss-nullif-unresolved. Authored-by: Chao Sun <sunchao@apache.org> Signed-off-by: Peter Toth <peter.toth@gmail.com> (cherry picked from commit 5949ab3) Signed-off-by: Peter Toth <peter.toth@gmail.com>
peter-toth
pushed a commit
that referenced
this pull request
May 14, 2026
### Why are the changes needed? `NULLIF` builds its replacement expression before analysis has resolved all child expressions. For nested field references, the existing implementation can read the left operand's data type too early while constructing the null branch, which can fail analysis even though the SQL shape is valid. SPARK-56840 tracks this analyzer failure. ### What changes were proposed in this PR? - Build the `NULLIF` null branch with a lazy typed-null placeholder so construction does not eagerly read the unresolved left operand type, while `NullIf.replacement.dataType` remains valid once the operand type is available. - Make that placeholder `RuntimeReplaceable`, so `ReplaceExpressions` restores an ordinary typed `Literal(null, ...)` before later optimizer rules run and existing null-literal simplifications continue to apply. - Add focused regressions for: - nested struct-field `nullif(c.provider, lower(...))` analysis in both `ALWAYS_INLINE_COMMON_EXPR` modes; - `NullIf` replacement type reporting before type coercion; - optimizer replacement back to a normal null literal; - explain output avoiding exposure of the internal helper name. ### Does this PR introduce _any_ user-facing change? Yes. Valid `NULLIF` expressions over unresolved nested field references that could fail during analysis now resolve and execute successfully. ### How was this patch tested? - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.NullExpressionsSuite -- -z "NullIf replacement preserves its data type before type coercion"'` - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.OptimizerSuite -- -z "NullIf typed null branch is replaced with a null literal"'` - `build/sbt 'sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z "nullif function"'` - `build/sbt 'sql/testOnly org.apache.spark.sql.ExplainSuite -- -z "explain for these functions; use range to avoid constant folding"'` ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Codex (GPT-5.5) Closes #55838 from sunchao/dev/chao/codex/oss-nullif-unresolved. Authored-by: Chao Sun <sunchao@apache.org> Signed-off-by: Peter Toth <peter.toth@gmail.com> (cherry picked from commit 5949ab3) Signed-off-by: Peter Toth <peter.toth@gmail.com>
peter-toth
pushed a commit
that referenced
this pull request
May 14, 2026
### Why are the changes needed? `NULLIF` builds its replacement expression before analysis has resolved all child expressions. For nested field references, the existing implementation can read the left operand's data type too early while constructing the null branch, which can fail analysis even though the SQL shape is valid. SPARK-56840 tracks this analyzer failure. ### What changes were proposed in this PR? - Build the `NULLIF` null branch with a lazy typed-null placeholder so construction does not eagerly read the unresolved left operand type, while `NullIf.replacement.dataType` remains valid once the operand type is available. - Make that placeholder `RuntimeReplaceable`, so `ReplaceExpressions` restores an ordinary typed `Literal(null, ...)` before later optimizer rules run and existing null-literal simplifications continue to apply. - Add focused regressions for: - nested struct-field `nullif(c.provider, lower(...))` analysis in both `ALWAYS_INLINE_COMMON_EXPR` modes; - `NullIf` replacement type reporting before type coercion; - optimizer replacement back to a normal null literal; - explain output avoiding exposure of the internal helper name. ### Does this PR introduce _any_ user-facing change? Yes. Valid `NULLIF` expressions over unresolved nested field references that could fail during analysis now resolve and execute successfully. ### How was this patch tested? - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.NullExpressionsSuite -- -z "NullIf replacement preserves its data type before type coercion"'` - `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.OptimizerSuite -- -z "NullIf typed null branch is replaced with a null literal"'` - `build/sbt 'sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z "nullif function"'` - `build/sbt 'sql/testOnly org.apache.spark.sql.ExplainSuite -- -z "explain for these functions; use range to avoid constant folding"'` ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Codex (GPT-5.5) Closes #55838 from sunchao/dev/chao/codex/oss-nullif-unresolved. Authored-by: Chao Sun <sunchao@apache.org> Signed-off-by: Peter Toth <peter.toth@gmail.com> (cherry picked from commit 5949ab3) Signed-off-by: Peter Toth <peter.toth@gmail.com>
Contributor
|
@sunchao , sorry, I think I merged this PR a bit too early, can you please doublecheck the repro test as it seems to pass without the fix on I think the issue is real, but we probably need a proper repro in a follow-up PR. Also, I couldn't merge this to |
Member
Author
|
Thanks @peter-toth . Let me take a look on the repro test. And yes, I can open a PR against |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why are the changes needed?
NULLIFbuilds its replacement expression before analysis has resolved all child expressions.For nested field references, the existing implementation can read the left operand's data type
too early while constructing the null branch, which can fail analysis even though the SQL shape
is valid.
SPARK-56840 tracks this analyzer failure.
What changes were proposed in this PR?
NULLIFnull branch with a lazy typed-null placeholder so construction does not eagerlyread the unresolved left operand type, while
NullIf.replacement.dataTyperemains valid once theoperand type is available.
RuntimeReplaceable, soReplaceExpressionsrestores an ordinary typedLiteral(null, ...)before later optimizer rules run and existing null-literal simplificationscontinue to apply.
nullif(c.provider, lower(...))analysis in bothALWAYS_INLINE_COMMON_EXPRmodes;NullIfreplacement type reporting before type coercion;Does this PR introduce any user-facing change?
Yes. Valid
NULLIFexpressions over unresolved nested field references that could fail duringanalysis now resolve and execute successfully.
How was this patch tested?
build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.NullExpressionsSuite -- -z "NullIf replacement preserves its data type before type coercion"'build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.OptimizerSuite -- -z "NullIf typed null branch is replaced with a null literal"'build/sbt 'sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z "nullif function"'build/sbt 'sql/testOnly org.apache.spark.sql.ExplainSuite -- -z "explain for these functions; use range to avoid constant folding"'Was this patch authored or co-authored using generative AI tooling?
Generated-by: Codex (GPT-5.5)