[SPARK-37490][SQL] Show extra hint if analyzer fails due to ANSI type coercion #34747

gengliangwang · 2021-11-29T16:45:31Z

What changes were proposed in this pull request?

Show an extra hint in the error message if analysis fails only under ANSI type coercion:

To fix the error, you might need to add explicit type casts. If necessary set spark.sql.ansi.enabled to false to bypass this error.

Why are the changes needed?

Improve error message

Does this PR introduce any user-facing change?

Yes. Spark will show an extra hint if the analyzer fails due to ANSI type coercion.

How was this patch tested?

Unit tests

gengliangwang · 2021-11-29T16:45:46Z

cc @cloud-fan @entong

entong · 2021-11-29T17:43:00Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+      try {
+        checkAnalysis(nonAnsiPlan)
+        "\nTo fix the error, you might need to add explicit type casts.\n" +
+          "To bypass the error with lenient type coercion rules, " +


Suggested wording: "If necessary set spark.sql.ansi.enabled to false to bypass this error.", to be consistent with other ANSI-related errors.

I feel that we need to provide more context here. There are data type mismatch errors in non-ANSI mode as well.
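The concern about mismatch errors that also occur in non-ANSI mode can be made concrete with a small sketch. This is not the PR's code: it is a Spark-free model of the idea in the diff fragment above, where the hint is returned only if the same check passes under the default (lenient) type coercion rules. The names `AnsiHintSketch`, `AnalysisError`, and `extraHint` are hypothetical; the hint text is the one quoted in the PR description.

```scala
import scala.util.Try

object AnsiHintSketch {
  // Hypothetical stand-in for an analysis failure; not a Spark class.
  final case class AnalysisError(message: String) extends Exception(message)

  // Hint text as quoted in the PR description.
  val Hint: String =
    "\nTo fix the error, you might need to add explicit type casts. " +
      "If necessary set spark.sql.ansi.enabled to false to bypass this error."

  // `checkWithDefaultCoercion` is assumed to throw when the plan is still
  // invalid under the lenient rules. The hint is returned only when that check
  // passes, i.e. when the failure is attributable to ANSI type coercion alone.
  def extraHint(checkWithDefaultCoercion: () => Unit): String =
    if (Try(checkWithDefaultCoercion()).isSuccess) Hint else ""

  def main(args: Array[String]): Unit = {
    // Lenient check passes: the hint is shown.
    println(extraHint(() => ()))
    // Lenient check also fails: the mismatch is not ANSI-specific, no hint.
    println(extraHint(() => throw AnalysisError("data type mismatch")).isEmpty)
  }
}
```

With this shape, a mismatch that fails in both modes produces an empty hint, so users are not told to disable ANSI mode when doing so would not help.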

SparkQA · 2021-11-29T18:05:26Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50198/

SparkQA · 2021-11-29T19:04:37Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50198/

SparkQA · 2021-11-29T19:32:00Z

Test build #145728 has finished for PR 34747 at commit 16763bb.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

allisonwang-db · 2021-11-30T02:12:42Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)


It might be expensive to run the analyzer again under ANSI mode just for the error message. Maybe we can just add this hint "To fix the error, you might need to add explicit type casts." to the existing error messages.

It won't be a perf issue. When the code reaches here, the query has already failed.
But from a user-experience perspective, I am thinking about adding only "To fix the error, you might need to add explicit type casts." and not showing the hint to set spark.sql.ansi.enabled to false.

In some cases, people are not able to edit the query. I think turning off ANSI mode is still a necessary workaround.

@gengliangwang To fix the error, you might need to add explicit type casts. If necessary set spark.sql.ansi.enabled to false to bypass this error.

cloud-fan · 2021-11-30T04:36:42Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+      }
+      try {
+        checkAnalysis(nonAnsiPlan)
+        "\nTo fix the error, you might need to add explicit type casts.\n" +


I'm a bit worried about the accuracy here. Re-running the entire analyzer includes more stuff, not just type coercion. Can we be more surgical and only rerun type coercion rules in CheckAnalysis when we hit input type mismatch error?

Another point is, the analyzer has "side effects", as it may send RPC requests to the remote catalog. I think it's better to not run the entire analyzer again, even if the query fails.

Can we be more surgical and only rerun type coercion rules in CheckAnalysis when we hit input type mismatch error?

We need to rerun some of the rules since they were skipped because the children weren't resolved.
If we have to do it, we should split the case match of checkAnalysis into two parts

IIUC the analysis is bottom-up, and CheckAnalysis should find the bottom-most expression whose children are all resolved and whose input types mismatch?

Take ResolveAliases as an example: it is ordered before the Type Coercion rules, but it won't fire in the first run since the children are not resolved. After applying the Type Coercion rules, we still have to run the other rules again.
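The ordering argument in this thread can be shown with a toy model. The sketch below is not Spark's analyzer: it is a Spark-free batch with two hypothetical rules (`resolveAliases`, `lenientCoercion`) applied bottom-up and repeated until the plan stops changing. The alias rule runs first in the batch but cannot fire until the coercion rule has inserted a cast, so a second pass is needed.

```scala
import scala.annotation.tailrec

object FixedPointSketch {
  sealed trait DataType
  case object IntT extends DataType
  case object StrT extends DataType

  sealed trait Expr {
    def resolved: Boolean
    def dataType: DataType
  }
  final case class Lit(dataType: DataType) extends Expr {
    def resolved: Boolean = true
  }
  final case class Cast(child: Expr, dataType: DataType) extends Expr {
    def resolved: Boolean = child.resolved
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    // Unresolved while the operand types differ.
    def resolved: Boolean =
      left.resolved && right.resolved && left.dataType == right.dataType
    def dataType: DataType = left.dataType
  }
  final case class UnresolvedAlias(child: Expr) extends Expr {
    def resolved: Boolean = false
    def dataType: DataType = child.dataType
  }
  final case class Alias(child: Expr, name: String) extends Expr {
    def resolved: Boolean = child.resolved
    def dataType: DataType = child.dataType
  }

  type Rule = PartialFunction[Expr, Expr]

  // Rewrites children first, then the node itself (bottom-up).
  def transformUp(e: Expr)(rule: Rule): Expr = {
    val rewritten: Expr = e match {
      case Cast(c, t)         => Cast(transformUp(c)(rule), t)
      case Add(l, r)          => Add(transformUp(l)(rule), transformUp(r)(rule))
      case UnresolvedAlias(c) => UnresolvedAlias(transformUp(c)(rule))
      case Alias(c, n)        => Alias(transformUp(c)(rule), n)
      case leaf: Lit          => leaf
    }
    rule.applyOrElse(rewritten, (x: Expr) => x)
  }

  // First in the batch, but only fires once its child is resolved.
  val resolveAliases: Rule = {
    case UnresolvedAlias(c) if c.resolved => Alias(c, "col")
  }
  // Toy lenient coercion: cast the right operand to the left operand's type.
  val lenientCoercion: Rule = {
    case Add(l, r) if l.resolved && r.resolved && l.dataType != r.dataType =>
      Add(l, Cast(r, l.dataType))
  }
  val rules: Seq[Rule] = Seq(resolveAliases, lenientCoercion)

  def runOnce(plan: Expr): Expr =
    rules.foldLeft(plan)((p, rule) => transformUp(p)(rule))

  // Repeats the batch until the plan stops changing; also counts the passes
  // that changed the plan.
  @tailrec
  def toFixedPoint(plan: Expr, passes: Int = 0): (Expr, Int) = {
    val next = runOnce(plan)
    if (next == plan) (plan, passes) else toFixedPoint(next, passes + 1)
  }

  val example: Expr = UnresolvedAlias(Add(Lit(IntT), Lit(StrT)))

  def main(args: Array[String]): Unit = {
    println(runOnce(example).resolved)  // still unresolved after one pass
    println(toFixedPoint(example))      // resolved plan and the pass count
  }
}
```

In this model a single pass inserts the cast but leaves the alias unresolved; only the second pass resolves it. That mirrors the point made above: rerunning type coercion alone is not enough, because rules that were skipped earlier have to run again.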

cloud-fan · 2021-11-30T09:01:27Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala

              // Check if the data types match.
              dataTypes(child).zip(ref).zipWithIndex.foreach { case ((dt1, dt2), ci) =>
                // SPARK-18058: we shall not care about the nullability of columns
                if (dataTypesAreCompatibleFn(dt1, dt2)) {
+                  operator.setTagValue(DATA_TYPE_MISMATCH_ERROR, true)


Do we need to set the tag here? It's always the root node, and it's very easy to find.
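For readers unfamiliar with node tags, here is a minimal, Spark-free sketch of the mechanism under discussion. It is loosely modelled on the `setTagValue` call in the diff; the `Tag` and `Node` types and the `findTagged` helper are hypothetical and only illustrate the trade-off between marking a node with a tag and locating it by inspecting the tree.

```scala
import scala.collection.mutable

object TagSketch {
  // Typed key for a tag (hypothetical, for illustration only).
  final case class Tag[T](name: String)

  class Node(val name: String, val children: Seq[Node] = Nil) {
    private val tags = mutable.Map.empty[Tag[_], Any]

    def setTagValue[T](tag: Tag[T], value: T): Unit = {
      tags(tag) = value
    }

    // The cast is safe as long as values are only stored via setTagValue.
    def getTagValue[T](tag: Tag[T]): Option[T] =
      tags.get(tag).map(_.asInstanceOf[T])
  }

  val DataTypeMismatchError: Tag[Boolean] = Tag[Boolean]("dataTypeMismatchError")

  // Depth-first search for the first node carrying the tag.
  def findTagged(root: Node): Option[Node] =
    if (root.getTagValue(DataTypeMismatchError).contains(true)) Some(root)
    else root.children.iterator.map(findTagged).collectFirst { case Some(n) => n }

  def main(args: Array[String]): Unit = {
    val union = new Node("Union", Seq(new Node("t1"), new Node("t2")))
    union.setTagValue(DataTypeMismatchError, true)
    println(findTagged(union).map(_.name))
  }
}
```

If the failing operator is always the root, as the comment suggests, the tag adds little: the caller can inspect the root directly instead of searching for a marked node.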

SparkQA · 2021-11-30T10:15:06Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50235/

SparkQA · 2021-11-30T11:07:53Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50237/

SparkQA · 2021-11-30T11:14:25Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50235/

SparkQA · 2021-11-30T12:09:33Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50237/

SparkQA · 2021-11-30T14:14:31Z

Test build #145763 has finished for PR 34747 at commit 8307e0a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-11-30T15:19:12Z

Test build #145765 has finished for PR 34747 at commit eb497b5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2021-12-01T04:44:36Z

Merging to master

…rrect name

What changes were proposed in this pull request?

#41850 uses `TYPE_CHECK_FAILURE_WITH_HINT`, it should be `DATATYPE_MISMATCH.TYPE_CHECK_FAILURE_WITH_HINT`. The first commit come from #34747.

Why are the changes needed?

Fix a bug.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

N/A

Closes #42084 from beliefer/SPARK-44292_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>


improve error message

16763bb

github-actions bot added the SQL label Nov 29, 2021

entong reviewed Nov 29, 2021


allisonwang-db reviewed Nov 30, 2021


cloud-fan reviewed Nov 30, 2021


gengliangwang added 4 commits November 30, 2021 14:36

revert changes on Analyzer.scala

78f7326

check type coercion in CheckAnalysis

bf8aa7f

new implementation

b4f85fb

update golden files

2855860

cloud-fan reviewed Nov 30, 2021


cloud-fan approved these changes Nov 30, 2021


gengliangwang added 3 commits November 30, 2021 17:17

refactor

97cd037

revise

8307e0a

revise

eb497b5

gengliangwang closed this in d61c2f4 Dec 1, 2021

beliefer mentioned this pull request Jul 20, 2023

[SPARK-44292][SQL][FOLLOWUP] Make TYPE_CHECK_FAILURE_WITH_HINT use correct name #42084

Closed
