[SPARK-32624][SQL][FOLLOWUP] Fix regression in CodegenContext.addReferenceObj on nested Scala types #29602
Conversation
SPARK-32624 followup: it'll get a "null" for the class name of some nested Scala types
cc original author and potential reviewers: @wangyum @cloud-fan @maropu @kiszk @viirya
Test build #128114 has finished for PR 29602 at commit
Nice catch, @rednaxelafx !
Good catch and nice description to explain the issue! Thanks!
LGTM
Merged to master and branch-3.0.
[SPARK-32624][SQL][FOLLOWUP] Fix regression in CodegenContext.addReferenceObj on nested Scala types

### What changes were proposed in this pull request?

Use `CodeGenerator.typeName()` instead of `Class.getCanonicalName()` in `CodegenContext.addReferenceObj()` for getting the runtime class name of an object.

### Why are the changes needed?

#29439 fixed a bug in `CodegenContext.addReferenceObj()` for `Array[Byte]` (i.e. Spark SQL's `BinaryType`) objects, but it unfortunately introduced a regression for some nested Scala types.

For example, for `implicitly[Ordering[UTF8String]]`, after that PR `CodegenContext.addReferenceObj()` would return `((null) references[0] /* ... */)`. The actual type of `implicitly[Ordering[UTF8String]]` is `scala.math.LowPriorityOrderingImplicits$$anon$3` in Scala 2.12.10, and `Class.getCanonicalName()` returns `null` for that class.

On the other hand, `Class.getName()` is safe to use for all non-array types, and Janino happily accepts the type name returned by `Class.getName()` for nested types. `CodeGenerator.typeName()` does the right thing: it handles arrays correctly and otherwise uses `Class.getName()`, so it is a better choice than `Class.getCanonicalName()`.

Side note: a rule of thumb for using Java reflection in Spark: it may be tempting to use `Class.getCanonicalName()`, but for functions that may need to handle Scala types, avoid it due to potential issues with nested Scala types. Instead, use `Class.getName()` or the utility functions in `org.apache.spark.util.Utils` (e.g. `Utils.getSimpleName()`, `Utils.getFormattedClassName()`, etc.).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a new unit test case for the regression in `CodeGenerationSuite`.

Closes #29602 from rednaxelafx/spark-32624-followup.

Authored-by: Kris Mok <kris.mok@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 6e5bc39)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
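The `getCanonicalName()` pitfall described above is easy to reproduce outside of Spark. The following standalone Java sketch (illustrative only, not code from this PR; class and variable names are made up) shows that `Class.getCanonicalName()` returns `null` for an anonymous class, while `Class.getName()` always yields a usable binary name of the kind Janino accepts. An anonymous Java class is analogous to the Scala compiler's synthetic classes such as `scala.math.LowPriorityOrderingImplicits$$anon$3`:

```java
public class CanonicalNameDemo {
    public static void main(String[] args) {
        // An anonymous class, similar in spirit to the synthetic
        // classes the Scala compiler emits for anonymous instances.
        Runnable anon = new Runnable() {
            @Override public void run() {}
        };
        Class<?> cls = anon.getClass();

        // getCanonicalName() is defined to return null for anonymous
        // (and local) classes, which is exactly the regression source.
        System.out.println(cls.getCanonicalName()); // prints "null"

        // getName() returns the binary name, e.g. "CanonicalNameDemo$1",
        // which is always non-null and usable in generated code.
        System.out.println(cls.getName());
    }
}
```

This is why the fix routes class-name lookup through a helper that falls back to `Class.getName()` for non-array types instead of calling `Class.getCanonicalName()` directly.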
Good catch, late LGTM