[SPARK-40615][SQL] Check unsupported data types when decorrelating subqueries #38050
Conversation
```
},
"UNSUPPORTED_OUTER_REFERENCE_DATA_TYPE" : {
  "message" : [
    "Correlated column references do not support data type <dataType>: <expr>"
```
Nit: I found the error message instance a bit confusing to read: `Correlated column references do not support data type map<string,int>: v1.x`. Basically, `v1.x` as an expression was not clear in the error message. For example, to me this is clearer: `Correlated column references do not support expression v1.x due to unsupported data type map<string,int>`. Something like this.
Maybe even `Correlated column references do not support data type map<string,int> of expression v1.x` is better.
Maybe just say:
The data type of correlated column references cannot be/contain map type.
Thanks @amaliujia @cloud-fan I will update the error message.
cc @cloud-fan

Force-pushed from 078a1a2 to dacf544
```scala
throw QueryCompilationErrors.unsupportedCorrelatedReferenceDataTypeError(
  o, a.dataType, plan.origin)
} else {
  throw new IllegalStateException(s"Unable to decorrelate subquery: " +
```
If this should never happen, let's use `SparkException.internalError`.
```diff
@@ -3138,4 +3138,4 @@
       "<className> must override either <m1> or <m2>"
     ]
   }
-}
+}
```
Nit: revert this.
```diff
@@ -369,7 +370,7 @@ object DecorrelateInnerQuery extends PredicateHelper {
         throw QueryCompilationErrors.unsupportedCorrelatedReferenceDataTypeError(
           o, a.dataType, plan.origin)
       } else {
-        throw new IllegalStateException(s"Unable to decorrelate subquery: " +
+        throw SparkException.internalError(s"Unable to decorrelate subquery: " +
```
Why this change?
Oh, I saw Wenchen's comment above.
See #38050 (comment)
```diff
@@ -879,6 +879,11 @@
       "Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses<treeNode>"
     ]
   },
+  "UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE" : {
+    "message" : [
+      "Correlated column reference '<expr>' cannot be <dataType> type"
```
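Spark error classes like the one in the diff above are message templates whose `<expr>` and `<dataType>` placeholders are filled in when the error is raised. A rough, hand-rolled Python illustration of that substitution (the real mechanism lives in Spark's error-class framework; the `format_error` helper below is purely hypothetical):

```python
import re

# A hand-rolled stand-in for an error-class JSON file (illustrative only).
ERROR_CLASSES = {
    "UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE": {
        "message": ["Correlated column reference '<expr>' cannot be <dataType> type"],
    },
}

def format_error(error_class: str, params: dict) -> str:
    """Instantiate a message template by substituting each <name> placeholder."""
    template = " ".join(ERROR_CLASSES[error_class]["message"])
    return re.sub(r"<(\w+)>", lambda m: str(params[m.group(1)]), template)

print(format_error(
    "UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE",
    {"expr": "v1.x", "dataType": "map"},
))
```

Keeping the template in a JSON error-class file lets the wording be reviewed and revised, as in this thread, without touching the code that raises the error.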
+1 this is better!
LGTM

thanks, merging to master!
…bled

### What changes were proposed in this pull request?

This PR proposes to make the tests added in #38050 pass with ANSI mode enabled by avoiding string binary operations.

### Why are the changes needed?

To make the tests pass with ANSI mode enabled. Currently, it fails as below (https://github.com/apache/spark/actions/runs/3286184541/jobs/5414029918):

```
[info] - SPARK-40615: Check unsupported data type when decorrelating subqueries *** FAILED *** (118 milliseconds)
[info]   "[DATATYPE_MISMATCH.BINARY_OP_WRONG_TYPE] Cannot resolve "(a + a)" due to data type mismatch: the binary operator requires the input type ("NUMERIC" or "INTERVAL DAY TO SECOND" or "INTERVAL YEAR TO MONTH" or "INTERVAL"), not "STRING".; line 1 pos 15;
[info]   'Project [unresolvedalias(scalar-subquery#426412 [], None)]
[info]   :  +- 'Project [unresolvedalias((a#426411 + a#426411), None)]
[info]   :     +- SubqueryAlias __auto_generated_subquery_name
[info]   :        +- Project [upper(cast(outer(x#426413)[a] as string)) AS a#426411]
[info]   :           +- OneRowRelation
[info]   +- SubqueryAlias v1
[info]      +- View (`v1`, [x#426413])
[info]         +- Project [cast(x#426414 as map<string,int>) AS x#426413]
[info]            +- SubqueryAlias t
[info]               +- LocalRelation [x#426414]
[info]   " did not contain "Correlated column reference 'v1.x' cannot be map type" (SubquerySuite.scala:2480)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.sql.SubquerySuite.$anonfun$new$320(SubquerySuite.scala:2480)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTempView(SQLTestUtils.scala:276)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTempView$(SQLTestUtils.scala:274)
[info]   at org.apache.spark.sql.SubquerySuite.withTempView(SubquerySuite.scala:32)
[info]   at org.apache.spark.sql.SubquerySuite.$anonfun$new$319(SubquerySuite.scala:2459)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
```

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually ran the tests and verified that it passes.

Closes #38325 from HyukjinKwon/SPARK-40615-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR checks unsupported data types when decorrelating subqueries, and throws a more user-friendly error message if so.
Why are the changes needed?
There are certain data types (e.g. MapType) that do not support ordering. This leaves the join conditions added by `DecorrelateInnerQuery` unresolved. We want to improve the error messages in this case.

Does this PR introduce any user-facing change?
Yes. This PR introduces a new error message. For example:

```sql
-- Suppose x is a map type column
select (select a + a from (select upper(x['a']) as a)) from v1
```
Before this PR, this will throw an exception:

```
After applying rule org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates in batch Pullup Correlated Expressions, the structural integrity of the plan is broken.
```
After this PR, this will throw an exception with a better error message:

```
Correlated column reference 'v1.x' cannot be map type
```
How was this patch tested?
Unit tests
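For intuition about what the new check rejects: decorrelation adds equality join predicates on the correlated columns, and map types support neither ordering nor such comparisons, so any type that is or contains a map must be refused. The sketch below is a standalone Python stand-in for that containment walk; the class names mimic Catalyst's `DataType` hierarchy but are not Spark's actual API:

```python
from dataclasses import dataclass
from typing import Tuple

# Simplified stand-ins for Catalyst's DataType hierarchy (illustrative only).
@dataclass(frozen=True)
class DataType: ...

@dataclass(frozen=True)
class IntegerType(DataType): ...

@dataclass(frozen=True)
class StringType(DataType): ...

@dataclass(frozen=True)
class ArrayType(DataType):
    element_type: DataType

@dataclass(frozen=True)
class MapType(DataType):
    key_type: DataType
    value_type: DataType

@dataclass(frozen=True)
class StructType(DataType):
    fields: Tuple[Tuple[str, DataType], ...]  # (name, type) pairs

def contains_map_type(dt: DataType) -> bool:
    """True if dt is, or nests, a map type, which has no ordering and so
    cannot appear in the join predicates added during decorrelation."""
    if isinstance(dt, MapType):
        return True
    if isinstance(dt, ArrayType):
        return contains_map_type(dt.element_type)
    if isinstance(dt, StructType):
        return any(contains_map_type(t) for _, t in dt.fields)
    return False

# A column of type map<string,int> (like v1.x in the example) is rejected;
# so is a struct nesting a map, while a plain array of ints is fine.
assert contains_map_type(MapType(StringType(), IntegerType()))
assert contains_map_type(StructType((("m", MapType(StringType(), IntegerType())),)))
assert not contains_map_type(ArrayType(IntegerType()))
```

In the PR itself the analogous check runs inside `DecorrelateInnerQuery` over Catalyst `DataType` trees, raising the `UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE` error shown in the diffs above rather than a plain assertion.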