[SPARK-32624][SQL][FOLLOWUP] Fix regression in CodegenContext.addReferenceObj on nested Scala types #29602

rednaxelafx · 2020-08-31T19:34:14Z

What changes were proposed in this pull request?

Use CodeGenerator.typeName() instead of Class.getCanonicalName() in CodegenContext.addReferenceObj() for getting the runtime class name for an object.

Why are the changes needed?

#29439 fixed a bug in CodegenContext.addReferenceObj() for Array[Byte] (i.e. Spark SQL's BinaryType) objects, but unfortunately it introduced a regression for some nested Scala types.

For example, for implicitly[Ordering[UTF8String]], after that PR CodegenContext.addReferenceObj() would return ((null) references[0] /* ... */). The actual type for implicitly[Ordering[UTF8String]] is scala.math.LowPriorityOrderingImplicits$$anon$3 in Scala 2.12.10, and Class.getCanonicalName() returns null for that class.

On the other hand, Class.getName() is safe to use for all non-array types, and Janino happily accepts the type name returned from Class.getName() for nested types. CodeGenerator.typeName() happens to do the right thing: it handles arrays correctly and otherwise uses Class.getName(). So it's a better alternative to Class.getCanonicalName().
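To illustrate the difference in plain Java reflection, here is a minimal sketch. It is not Spark code: the anonymous Comparator is only a stand-in for an anonymous Scala class such as the Ordering above, and typeNameSketch is a hypothetical helper that imitates the behavior described here (arrays handled specially, Class.getName() otherwise). The real CodeGenerator.typeName() may differ in its details.

```java
import java.util.Comparator;

class ClassNameDemo {
    // Hypothetical helper (an assumption, not Spark's implementation):
    // render arrays as "<component>[]" and use getName() for everything else.
    static String typeNameSketch(Class<?> cls) {
        if (cls.isArray()) {
            return typeNameSketch(cls.getComponentType()) + "[]";
        }
        return cls.getName();
    }

    // Returns an instance of an anonymous class, standing in for a
    // compiler-generated anonymous Scala class.
    static Comparator<String> anonymousOrdering() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        };
    }

    public static void main(String[] args) {
        Class<?> anon = anonymousOrdering().getClass();
        // Anonymous classes have no canonical name, so this prints "null".
        System.out.println(anon.getCanonicalName());
        // getName() still returns a usable binary name containing '$'.
        System.out.println(anon.getName());
        // For arrays, getName() returns a JVM descriptor such as "[B".
        System.out.println(byte[].class.getName());
        // The sketch renders the same array type as "byte[]".
        System.out.println(typeNameSketch(byte[].class));
    }
}
```

The array case shows why the earlier fix reached for getCanonicalName() for Array[Byte], and the anonymous-class case shows why that choice breaks for nested types.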

Side note, as a rule of thumb for using Java reflection in Spark: it may be tempting to use Class.getCanonicalName(), but please avoid it in functions that may need to handle Scala types, because it can return null for nested Scala types.
Instead, use Class.getName() or the utility functions in org.apache.spark.util.Utils (e.g. Utils.getSimpleName() or Utils.getFormattedClassName()).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a new unit test case for the regression in CodeGenerationSuite.

rednaxelafx · 2020-08-31T19:35:48Z

cc original author and potential reviewers: @wangyum @cloud-fan @maropu @kiszk @viirya

SparkQA · 2020-09-01T00:05:08Z

Test build #128114 has finished for PR 29602 at commit 2ecd30b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu

Nice catch, @rednaxelafx!

viirya

Good catch and nice description to explain the issue! Thanks!

wangyum

LGTM

HyukjinKwon · 2020-09-01T06:14:35Z

Merged to master and branch-3.0.

…renceObj on nested Scala types

Closes #29602 from rednaxelafx/spark-32624-followup.

Authored-by: Kris Mok <kris.mok@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 6e5bc39)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

kiszk · 2020-09-02T07:56:46Z

Good catch, late LGTM

Fix a regression in CodegenContext.addReferenceObj caused by SPARK-32624: it'll get a "null" for class name of some nested Scala types

2ecd30b

probot-autolabeler bot added the SQL label Aug 31, 2020

maropu approved these changes Sep 1, 2020

viirya approved these changes Sep 1, 2020

wangyum approved these changes Sep 1, 2020

HyukjinKwon approved these changes Sep 1, 2020

HyukjinKwon closed this in 6e5bc39 Sep 1, 2020
