Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37490][SQL] Show extra hint if analyzer fails due to ANSI type coercion #34747

Closed
wants to merge 8 commits into from

Conversation

gengliangwang
Copy link
Member

@gengliangwang gengliangwang commented Nov 29, 2021

What changes were proposed in this pull request?

Show extra hint in the error message if analysis failed only with ANSI type coercion:

To fix the error, you might need to add explicit type casts. If necessary set spark.sql.ansi.enabled to false to bypass this error.

Why are the changes needed?

Improve error message

Does this PR introduce any user-facing change?

Yes, Spark will show extra hint if analyzer fails due to ANSI type coercion

How was this patch tested?

Unit tests

@gengliangwang
Copy link
Member Author

cc @cloud-fan @entong

@github-actions github-actions bot added the SQL label Nov 29, 2021
try {
checkAnalysis(nonAnsiPlan)
"\nTo fix the error, you might need to add explicit type casts.\n" +
"To bypass the error with lenient type coercion rules, " +
Copy link

@entong entong Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If necessary set spark.sql.ansi.enabled to false to bypass this error. to be consistent with other ansi related errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that we need to provide more context here. There are data type mismatch errors in non-Ansi mode as well.

@SparkQA
Copy link

SparkQA commented Nov 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50198/

@SparkQA
Copy link

SparkQA commented Nov 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50198/

@SparkQA
Copy link

SparkQA commented Nov 29, 2021

Test build #145728 has finished for PR 34747 at commit 16763bb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

""
} else {
val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
executeSameContext(plan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be expensive to run the analyzer again under ANSI mode just for the error message. Maybe we can just add this hint "To fix the error, you might need to add explicit type casts." to the existing error messages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't be a perf issue. When the code reaches here, the query already fails.
But from user experience, I am thinking about just adding To fix the error, you might need to add explicit type casts. and don't show the hint set spark.sql.ansi.enabled to false

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases, people are not able to edit the query. I think turning off ansi mode is still a necessary workaround.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gengliangwang To fix the error, you might need to add explicit type casts. If necessary set spark.sql.ansi.enabled to false to bypass this error.

}
try {
checkAnalysis(nonAnsiPlan)
"\nTo fix the error, you might need to add explicit type casts.\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about the accuracy here. Re-running the entire analyzer includes more stuff, not just type coercion. Can we be more surgical and only rerun type coercion rules in CheckAnalysis when we hit input type mismatch error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point is, the analyzer has "side effects", as it may send RPC requests to the remote catalog. I think it's better to not run the entire analyzer again, even if the query fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more surgical and only rerun type coercion rules in CheckAnalysis when we hit input type mismatch error?

We need to rerun some of the rules since they were skipped because the children weren't resolved.
If we have to do it, we should split the case match of checkAnalysis into two parts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the analysis is bottom-up, and CheckAnalysis should find the bottom-most expression whose children are all resolved and input type mismatches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take ResolveAliases as an example, the order of it is in front of Type Coercion rules, but it won't happen in the first run since the children is not resolved. After apply the Type Coercion rules, we still have to run the other rules again:
image

// Check if the data types match.
dataTypes(child).zip(ref).zipWithIndex.foreach { case ((dt1, dt2), ci) =>
// SPARK-18058: we shall not care about the nullability of columns
if (dataTypesAreCompatibleFn(dt1, dt2)) {
operator.setTagValue(DATA_TYPE_MISMATCH_ERROR, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to set the tag here? it's always the root node and it's very easy to find it.

@SparkQA
Copy link

SparkQA commented Nov 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50235/

@SparkQA
Copy link

SparkQA commented Nov 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50237/

@SparkQA
Copy link

SparkQA commented Nov 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50235/

@SparkQA
Copy link

SparkQA commented Nov 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50237/

@SparkQA
Copy link

SparkQA commented Nov 30, 2021

Test build #145763 has finished for PR 34747 at commit 8307e0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 30, 2021

Test build #145765 has finished for PR 34747 at commit eb497b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

Merging to master

cloud-fan pushed a commit that referenced this pull request Jul 21, 2023
…rrect name

### What changes were proposed in this pull request?
#41850 uses `TYPE_CHECK_FAILURE_WITH_HINT`, it should be `DATATYPE_MISMATCH.TYPE_CHECK_FAILURE_WITH_HINT`.

The first commit come from #34747.

### Why are the changes needed?
Fix a bug.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
N/A

Closes #42084 from beliefer/SPARK-44292_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jul 21, 2023
…rrect name

### What changes were proposed in this pull request?
#41850 uses `TYPE_CHECK_FAILURE_WITH_HINT`, it should be `DATATYPE_MISMATCH.TYPE_CHECK_FAILURE_WITH_HINT`.

The first commit come from #34747.

### Why are the changes needed?
Fix a bug.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
N/A

Closes #42084 from beliefer/SPARK-44292_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 325888b)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
…rrect name

### What changes were proposed in this pull request?
apache#41850 uses `TYPE_CHECK_FAILURE_WITH_HINT`, it should be `DATATYPE_MISMATCH.TYPE_CHECK_FAILURE_WITH_HINT`.

The first commit come from apache#34747.

### Why are the changes needed?
Fix a bug.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
N/A

Closes apache#42084 from beliefer/SPARK-44292_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants