[SPARK-25216][SQL] Improve error message when a column containing dot cannot be resolved #22208
Test build #95178 has finished for PR 22208 at commit
```scala
if (schema.fieldNames.contains(colName)) {
  throw new AnalysisException(
    s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}).
       | Try adding backticks to the column name, i.e., `$colName`"""
```
I would explain the condition, for instance: if the name parts in the column should be kept as the part of its column name, try to quote them with backticks.
@HyukjinKwon Thanks for the review!
Sorry, I don't quite understand this sentence: "if the name parts in the column should be kept as the part of its column name".
Would you mind elaborating on what you mean?
Ah, I mean if the name parts of the column a.b.c should be considered as the name of the whole column itself, like `a.b.c`.
I see. How about:
Try adding backticks to the column name, i.e., `$colName`, if $colName is the name of the whole column
I am fine with either one.
Yup, please go ahead.
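The distinction discussed in this thread, between a dotted name made of parts (a.b.c) and a single column whose name contains dots (`a.b.c`), can be modeled in a few lines of Python. This is a simplified, hypothetical sketch: `parse_name_parts` is not a Spark API, and Spark's own parser (as far as I know, `UnresolvedAttribute.parseAttributeName`, which `resolveQuoted` relies on) applies stricter validation than this model.

```python
def parse_name_parts(name: str) -> list[str]:
    """Split a column reference into name parts (simplified, hypothetical model).

    Unquoted dots separate parts; a backtick-quoted segment keeps its dots;
    a doubled backtick inside a quoted segment stands for a literal backtick.
    """
    parts: list[str] = []
    current: list[str] = []
    in_backticks = False
    i = 0
    while i < len(name):
        ch = name[i]
        if in_backticks:
            if ch == "`":
                if i + 1 < len(name) and name[i + 1] == "`":
                    current.append("`")  # escaped backtick inside quotes
                    i += 1
                else:
                    in_backticks = False  # closing backtick
            else:
                current.append(ch)  # dots are kept while quoted
        elif ch == "`":
            in_backticks = True  # opening backtick
        elif ch == ".":
            parts.append("".join(current))  # unquoted dot ends a part
            current = []
        else:
            current.append(ch)
        i += 1
    if in_backticks:
        raise ValueError(f"unclosed backtick in {name!r}")
    parts.append("".join(current))
    return parts


print(parse_name_parts("a.b.c"))    # ['a', 'b', 'c']
print(parse_name_parts("`a.b.c`"))  # ['a.b.c']
```

In this model, a.b.c yields three parts, so resolution treats them separately (for example, a struct column `a` with nested fields), while `a.b.c` in backticks yields one part that can only match a column literally named a.b.c.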
```scala
      .stripMargin.replaceAll("\n", ""))
} else {
  throw new AnalysisException(
    s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}"""
```
At the end of the message, `)` is missing.
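The two branches under review, including the closing `)` requested above, can be modeled as a pure function. This is a hypothetical Python sketch (the name `resolution_error_message` is illustrative, not part of Spark); the message text is copied from the Scala diff with the `stripMargin`/`replaceAll` step already applied, and the membership test is an exact comparison like `contains`.

```python
def resolution_error_message(col_name: str, field_names: list[str]) -> str:
    """Build the error text for a column name that failed to resolve."""
    available = ", ".join(field_names)
    # Shared prefix for both branches, ending with the closing ")".
    base = f'Cannot resolve column name "{col_name}" among ({available})'
    if col_name in field_names:
        # The whole dotted string is itself a column name, so the user most
        # likely needs to quote it with backticks.
        return f"{base}. Try adding backticks to the column name, i.e., `{col_name}`"
    return base


print(resolution_error_message("a.b", ["a.b", "c"]))
# Cannot resolve column name "a.b" among (a.b, c). Try adding backticks to the column name, i.e., `a.b`
print(resolution_error_message("x", ["a.b", "c"]))
# Cannot resolve column name "x" among (a.b, c)
```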
```diff
@@ -216,8 +216,16 @@ class Dataset[T] private[sql](
   private[sql] def resolve(colName: String): NamedExpression = {
     queryExecution.analyzed.resolveQuoted(colName, sparkSession.sessionState.analyzer.resolver)
       .getOrElse {
-        throw new AnalysisException(
-          s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})""")
+        if (schema.fieldNames.contains(colName)) {
```
@icexelloss, this cannot handle mixed-case names like the following, which should be handled for the purpose of this PR. Please use `sparkSession.sessionState.analyzer.resolver`.
`spark.range(0, 1).toDF('A.b')['a.B']`
Could you add some unit tests for this? At least, we should check the error message for both
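The reviewer's point is that `schema.fieldNames.contains(colName)` is an exact, case-sensitive comparison, while Spark's analyzer resolves names case-insensitively by default (`spark.sql.caseSensitive` is `false`). Below is a minimal Python model of checking field names through a resolver function instead of `contains`; `case_sensitive`, `case_insensitive`, and `matches_whole_field_name` are hypothetical stand-ins for Spark's Scala `Resolver` functions, which have the shape `(String, String) => Boolean`.

```python
from typing import Callable

# Hypothetical stand-in for Spark's Scala Resolver type, (String, String) => Boolean.
Resolver = Callable[[str, str], bool]


def case_sensitive(a: str, b: str) -> bool:
    return a == b


def case_insensitive(a: str, b: str) -> bool:
    # Approximates Java's String.equalsIgnoreCase; equivalent for ASCII names.
    return a.lower() == b.lower()


def matches_whole_field_name(col_name: str, field_names: list[str], resolver: Resolver) -> bool:
    """Return True if any top-level field name matches col_name under resolver."""
    return any(resolver(field, col_name) for field in field_names)


fields = ["A.b"]
print("a.B" in fields)                                            # False (exact match, like contains)
print(matches_whole_field_name("a.B", fields, case_insensitive))  # True
print(matches_whole_field_name("a.B", fields, case_sensitive))    # False
```

In Scala, the corresponding check would presumably look like `schema.fieldNames.exists(resolver(_, colName))`; that shape is an assumption for illustration, not code taken from this PR.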
Test build #95301 has finished for PR 22208 at commit
Test build #95315 has finished for PR 22208 at commit
@dongjoon-hyun Could you please take another look? I changed the code to use the resolver and to try resolving the column with backticks, and I added unit tests as well.
Retest this please.
Thank you for updating and adding a test case, @icexelloss.
@dongjoon-hyun SGTM. I misunderstood your suggestion about the resolver. Keeping it simple was my preference too.
Test build #95427 has finished for PR 22208 at commit
Test build #95439 has finished for PR 22208 at commit
Test build #97716 has finished for PR 22208 at commit
Test build #97728 has finished for PR 22208 at commit
Test build #97783 has started for PR 22208 at commit
Build finished. Test FAILed.
I'm leaving this closed for inactivity.
What changes were proposed in this pull request?
The current error message often confuses new Spark users, who may not realize that a column name containing "." must be quoted with backticks.
For example, consider the following code:
The current message looks like this and is confusing:
This PR improves the error message to:
How was this patch tested?
Manually tested in the shell.