
[SPARK-25216][SQL] Improve error message when a column containing dot cannot be resolved #22208

Closed

Conversation

@icexelloss (Contributor) commented Aug 23, 2018

What changes were proposed in this pull request?

The current error message is often confusing: it does not tell a new Spark user that a column name containing "." needs to be quoted with backticks.

For example, consider the following code:

spark.range(0, 1).toDF('a.b')['a.b']

the current message is:

Cannot resolve column name "a.b" among (a.b)

This PR improves the error message to:

Cannot resolve column name "a.b" among (a.b). Try adding backticks to the column name, i.e., `a.b`;
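For background, here is a simplified sketch (plain Python, not Spark's actual parsing code) of why the backtick hint helps: Spark treats an unquoted dot as a nested-field separator, so a.b is read as field b inside a column a, while the backtick-quoted form names the whole column.

```python
def parse_column_name(name):
    """Split a column reference into name parts: an unquoted dot is a
    nested-field separator, while a backtick-quoted run is one literal
    part (a simplified mimic of Spark's quoting rule)."""
    if name.startswith("`") and name.endswith("`") and len(name) > 1:
        return [name[1:-1]]   # `a.b` -> one part: the literal column a.b
    return name.split(".")    # a.b   -> two parts: field b of column a

print(parse_column_name("a.b"))    # ['a', 'b']
print(parse_column_name("`a.b`"))  # ['a.b']
```

This is why a DataFrame with a literal column named a.b cannot be resolved from the unquoted reference: the two parse to different name parts.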

How was this patch tested?

Manual test in shell

@icexelloss icexelloss changed the title Improve error message when a column containing dot cannot be resolved [SPARK-25216][SQL] Improve error message when a column containing dot cannot be resolved Aug 23, 2018
@SparkQA commented Aug 23, 2018

Test build #95178 has finished for PR 22208 at commit 21a3732.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (schema.fieldNames.contains(colName)) {
  throw new AnalysisException(
    s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}).
       | Try adding backticks to the column name, i.e., `$colName`"""
Member

I would explain it, for instance, as: if the name parts in the column should be kept as part of its column name, try quoting them with backticks.

Contributor Author

@HyukjinKwon Thanks for the review!

Sorry, I don't quite understand your sentence here:

if the name parts in the column should be kept as the part of its column name

Would you mind elaborating on what you mean?

Member

Ah, I mean: whether the name parts of the column a.b.c should be considered the name of the whole column itself, like `a.b.c`.

Contributor Author

I see, how about:

Try adding backticks to the column name, i.e., `$colName`, if $colName is the name of the whole column

I am fine with either one

Member

Yup, please go ahead.

      .stripMargin.replaceAll("\n", ""))
} else {
  throw new AnalysisException(
    s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}"""
Member

At the end of the message, the closing ) is missing.

@@ -216,8 +216,16 @@ class Dataset[T] private[sql](
private[sql] def resolve(colName: String): NamedExpression = {
queryExecution.analyzed.resolveQuoted(colName, sparkSession.sessionState.analyzer.resolver)
.getOrElse {
throw new AnalysisException(
s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})""")
if (schema.fieldNames.contains(colName)) {
Member

@icexelloss, this cannot handle mixed cases like the following, which should be handled for the purposes of this PR. Please use sparkSession.sessionState.analyzer.resolver.

spark.range(0, 1).toDF('A.b')['a.B']
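To illustrate the reviewer's point with a standalone sketch (plain Python, no Spark; the resolver function here is an illustrative stand-in for Spark's analyzer resolver when spark.sql.caseSensitive=false): an exact membership check misses a.B against the schema field A.b, while a resolver-based check catches it.

```python
field_names = ["A.b"]  # schema fields (illustrative)

def case_insensitive_resolver(a, b):
    # Stand-in for Spark's resolver under spark.sql.caseSensitive=false.
    return a.lower() == b.lower()

col_name = "a.B"

# Exact membership check misses the mixed-case name:
print(col_name in field_names)  # False

# Resolver-based check matches it:
print(any(case_insensitive_resolver(f, col_name) for f in field_names))  # True
```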

@dongjoon-hyun (Member)

Could you add some unit tests for this? At the least, we should check the error message under both spark.sql.caseSensitive=true and spark.sql.caseSensitive=false.

@SparkQA commented Aug 27, 2018

Test build #95301 has finished for PR 22208 at commit 01f9cd5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 28, 2018

Test build #95315 has finished for PR 22208 at commit a8a5976.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@icexelloss (Contributor Author)

@dongjoon-hyun Could you please take another look? I changed the code to use the resolver to try to resolve the column with backticks, and added unit tests as well.

@dongjoon-hyun (Member)

Retest this please.

@dongjoon-hyun (Member) commented Aug 29, 2018

Thank you for updating and adding a test case, @icexelloss .

  • First of all, my previous comment about using the resolver means the following: instead of queryExecution.analyzed.resolveQuoted(xxx, resolver).isDefined, the following is sufficient and faster.

    - if (schema.fieldNames.contains(colName)) {
    + if (schema.fieldNames.exists(resolver(_, colName))) {
  • Given that this is about appending an additional note at the end of the error message, the third commit looks like too aggressive a change. Could you roll it back in order to minimize the number of touched lines?
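The suggested exists/resolver check could be wired into the message-building logic roughly like this (a plain-Python sketch under illustrative names, not the PR's final Scala code; the hint text follows the wording proposed earlier in this thread):

```python
def error_message(col_name, field_names, resolver):
    """Build the 'cannot resolve' message, adding the backtick hint only
    when the literal name exists under the active resolver."""
    base = 'Cannot resolve column name "{}" among ({})'.format(
        col_name, ", ".join(field_names))
    if any(resolver(f, col_name) for f in field_names):
        return base + "; Try adding backticks to the column name, i.e., `{}`".format(col_name)
    return base

exact = lambda a, b: a == b                    # spark.sql.caseSensitive=true
insensitive = lambda a, b: a.lower() == b.lower()  # spark.sql.caseSensitive=false

print(error_message("a.B", ["A.b"], exact))        # no backtick hint
print(error_message("a.B", ["A.b"], insensitive))  # includes the hint
```

The design point mirrors the review: the hint should appear exactly when quoting the whole name would have resolved it, which depends on the active resolver rather than an exact contains check.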

@icexelloss (Contributor Author)

@dongjoon-hyun SGTM. I misunderstood your suggestion about the resolver. Keeping it simple was my preference too.

@SparkQA commented Aug 29, 2018

Test build #95427 has finished for PR 22208 at commit a8a5976.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 30, 2018

Test build #95439 has finished for PR 22208 at commit 2b00e92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2018

Test build #97716 has finished for PR 22208 at commit 2b00e92.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2018

Test build #97728 has finished for PR 22208 at commit 2b00e92.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2018

Test build #97783 has started for PR 22208 at commit 2b00e92.

@AmplabJenkins

Build finished. Test FAILed.

@HyukjinKwon (Member)

I'm closing this due to inactivity.
