-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22442][SQL] ScalaReflection should produce correct field names for special characters #19664
Conversation
Test build #83478 has finished for PR 19664 at commit
|
Test build #83486 has finished for PR 19664 at commit
|
cc @cloud-fan for review. |
@@ -214,11 +215,13 @@ case class Invoke( | |||
override def eval(input: InternalRow): Any = | |||
throw new UnsupportedOperationException("Only code-generated evaluation is supported.") | |||
|
|||
private lazy val encodedFunctionName = TermName(functionName).encodedName.toString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does StaticInvoke
have some issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, although I didn't have concrete case causing the issue for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we use Invoke
to access field(s) in object, this is an issue. I didn't found StaticInvoke
used similarly. So it should be fine.
val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect { | ||
case UpCast(u: UnresolvedAttribute, _, _) => u.name | ||
}} | ||
assert(argumentsFields(0) == "`field.1`") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it has backticks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to deliberately wrap backticks around a field name such as field.1
because of the dot character. Otherwise UnresolvedAttribute
will parse it as two name parts Seq("field", "1")
and then fail resolving later.
good catch! |
assert(serializer.dataType(1).name == "field 2") | ||
|
||
val argumentsFields = deserializer.asInstanceOf[NewInstance].arguments.flatMap { _.collect { | ||
case UpCast(u: UnresolvedAttribute, _, _) => u.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we should collect u.nameParts
here. u.name
adds backticks if the nameParts
contains dot, which makes the test result a little confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Reasonable.
Test build #83622 has finished for PR 19664 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
For a class with field name of special characters, e.g.:
Although we can manipulate DataFrame/Dataset, the field names are encoded:
It causes resolving problem when we try to convert the data with non-encoded field names:
We should use decoded field name in Dataset schema.
How was this patch tested?
Added tests.