
[SPARK-25216][SQL] Improve error message when a column containing dot cannot be resolved #22208

Closed

Conversation

@icexelloss (Contributor) commented Aug 23, 2018

What changes were proposed in this pull request?

The current error message is often confusing: it does not tell a new Spark user that a column name containing "." needs to be quoted with backticks.

For example, consider the following code:

spark.range(0, 1).toDF('a.b')['a.b']

the current message is:

Cannot resolve column name "a.b" among (a.b)

This PR improves the error message to:

Cannot resolve column name "a.b" among (a.b). Try adding backticks to the column name, i.e., `a.b`;
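For background, here is a simplified sketch (plain Python, not Spark's actual parsing code) of why the backtick hint helps: Spark treats an unquoted dot as a nested-field separator, so a.b is read as field b inside a column a, while the backtick-quoted form names the whole column.

```python
def parse_column_name(name):
    """Split a column reference into name parts: an unquoted dot is a
    nested-field separator, while a backtick-quoted run is one literal
    part (a simplified mimic of Spark's quoting rule)."""
    if name.startswith("`") and name.endswith("`") and len(name) > 1:
        return [name[1:-1]]   # `a.b` -> one part: the literal column a.b
    return name.split(".")    # a.b   -> two parts: field b of column a

print(parse_column_name("a.b"))    # ['a', 'b']
print(parse_column_name("`a.b`"))  # ['a.b']
```

This is why a DataFrame with a literal column named a.b cannot be resolved from the unquoted reference: the two parse to different name parts.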

How was this patch tested?

Manual test in shell

@icexelloss icexelloss changed the title Improve error message when a column containing dot cannot be resolved [SPARK-25216][SQL] Improve error message when a column containing dot cannot be resolved Aug 23, 2018
@SparkQA commented Aug 23, 2018

Test build #95178 has finished for PR 22208 at commit 21a3732.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (schema.fieldNames.contains(colName)) {
  throw new AnalysisException(
    s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}).
       | Try adding backticks to the column name, i.e., `$colName`"""
Member

I would explain it, for instance, as: if the name parts in the column should be kept as part of its column name, try quoting them with backticks.

Contributor Author

@HyukjinKwon Thanks for the review!

Sorry, I don't quite understand your sentence here:

if the name parts in the column should be kept as the part of its column name

Would you mind elaborating on what you mean?

Member

Ah, I mean: whether the name parts of the column a.b.c should be considered the name of the whole column itself, like `a.b.c`.

Contributor Author

I see, how about:

Try adding backticks to the column name, i.e., `$colName`, if $colName is the name of the whole column

I am fine with either one

Member

Yup, please go ahead.

      .stripMargin.replaceAll("\n", ""))
} else {
  throw new AnalysisException(
    s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}"""
Member

At the end of the message, the closing ) is missing.

@@ -216,8 +216,16 @@ class Dataset[T] private[sql](
private[sql] def resolve(colName: String): NamedExpression = {
queryExecution.analyzed.resolveQuoted(colName, sparkSession.sessionState.analyzer.resolver)
.getOrElse {
throw new AnalysisException(
s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})""")
if (schema.fieldNames.contains(colName)) {
Member

@icexelloss, this cannot handle mixed cases like the following, which should be handled for the purposes of this PR. Please use sparkSession.sessionState.analyzer.resolver.

spark.range(0, 1).toDF('A.b')['a.B']
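To illustrate the reviewer's point with a standalone sketch (plain Python, no Spark; the resolver function here is an illustrative stand-in for Spark's analyzer resolver when spark.sql.caseSensitive=false): an exact membership check misses a.B against the schema field A.b, while a resolver-based check catches it.

```python
field_names = ["A.b"]  # schema fields (illustrative)

def case_insensitive_resolver(a, b):
    # Stand-in for Spark's resolver under spark.sql.caseSensitive=false.
    return a.lower() == b.lower()

col_name = "a.B"

# Exact membership check misses the mixed-case name:
print(col_name in field_names)  # False

# Resolver-based check matches it:
print(any(case_insensitive_resolver(f, col_name) for f in field_names))  # True
```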

@dongjoon-hyun (Member)

Could you add some unit tests for this? At the least, we should check the error message under both spark.sql.caseSensitive=true and spark.sql.caseSensitive=false.

@SparkQA commented Aug 27, 2018

Test build #95301 has finished for PR 22208 at commit 01f9cd5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 28, 2018

Test build #95315 has finished for PR 22208 at commit a8a5976.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@icexelloss (Contributor Author)

@dongjoon-hyun Could you please take another look? I changed the code to use the resolver to try to resolve the column with backticks, and added unit tests as well.

@dongjoon-hyun (Member)

Retest this please.

@dongjoon-hyun (Member) commented Aug 29, 2018

Thank you for updating and adding a test case, @icexelloss .

  • First of all, my previous comment about using the resolver means the following: instead of queryExecution.analyzed.resolveQuoted(xxx, resolver).isDefined, the following is sufficient and faster.

    - if (schema.fieldNames.contains(colName)) {
    + if (schema.fieldNames.exists(resolver(_, colName))) {
  • Given that this is about appending an additional note at the end of the error message, the third commit looks like too aggressive a change. Could you roll it back in order to minimize the number of touched lines?
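The suggested exists/resolver check could be wired into the message-building logic roughly like this (a plain-Python sketch under illustrative names, not the PR's final Scala code; the hint text follows the wording proposed earlier in this thread):

```python
def error_message(col_name, field_names, resolver):
    """Build the 'cannot resolve' message, adding the backtick hint only
    when the literal name exists under the active resolver."""
    base = 'Cannot resolve column name "{}" among ({})'.format(
        col_name, ", ".join(field_names))
    if any(resolver(f, col_name) for f in field_names):
        return base + "; Try adding backticks to the column name, i.e., `{}`".format(col_name)
    return base

exact = lambda a, b: a == b                    # spark.sql.caseSensitive=true
insensitive = lambda a, b: a.lower() == b.lower()  # spark.sql.caseSensitive=false

print(error_message("a.B", ["A.b"], exact))        # no backtick hint
print(error_message("a.B", ["A.b"], insensitive))  # includes the hint
```

The design point mirrors the review: the hint should appear exactly when quoting the whole name would have resolved it, which depends on the active resolver rather than an exact contains check.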

@icexelloss (Contributor Author)

@dongjoon-hyun SGTM. I misunderstood your suggestion about the resolver. Keeping it simple was my preference too.

@SparkQA commented Aug 29, 2018

Test build #95427 has finished for PR 22208 at commit a8a5976.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 30, 2018

Test build #95439 has finished for PR 22208 at commit 2b00e92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2018

Test build #97716 has finished for PR 22208 at commit 2b00e92.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2018

Test build #97728 has finished for PR 22208 at commit 2b00e92.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2018

Test build #97783 has started for PR 22208 at commit 2b00e92.

@AmplabJenkins

Build finished. Test FAILed.

@HyukjinKwon (Member)

I'm closing this due to inactivity.
