New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31190][SQL] ScalaReflection should not erasure user defined AnyVal type #27959
Conversation
if (isSubtype(tpe, localTypeOf[AnyVal]) && !tpe.toString.startsWith("scala")) { | ||
tpe | ||
} else { | ||
tpe.erasure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does erasure do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, it convert a Scala type to Java type. e.g. scala.Any
-> java.lang.Object
.
Test build #120053 has finished for PR 27959 at commit
|
* or nested classes in package objects, it uses the dollar sign ($) to create | ||
* synthetic classes, emulating behaviour in Java bytecode. | ||
*/ | ||
def getClassNameFromType(tpe: `Type`): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to quote Type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know it clearly...just copied from the original one.
* synthetic classes, emulating behaviour in Java bytecode. | ||
*/ | ||
def getClassNameFromType(tpe: `Type`): String = { | ||
tpe.dealias.erasure.typeSymbol.asClass.fullName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it's a problem only for AnyVal? Here we blindly erasure everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a AnVal class, e.g. case class Foo(i: Int) extends AnyVal
, erasure on it will return type of Int
instead of Foo
. So, this actually cause a problem when we add the class name to the walkedTypePath
.
@@ -611,10 +611,33 @@ object ScalaReflection extends ScalaReflection { | |||
} | |||
} | |||
|
|||
private def erasure(tpe: Type): Type = { | |||
if (isSubtype(tpe, localTypeOf[AnyVal]) && !tpe.toString.startsWith("scala")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add some comments to explain why we special case AnyVal
?
Test build #120097 has finished for PR 27959 at commit
|
Jenkins, retest this please. |
Test build #120172 has finished for PR 27959 at commit
|
thanks, merging to master/3.0! |
…yVal type ### What changes were proposed in this pull request? Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits: 1. Give better encode error message for some unsupported types, e.g. `Any` 2. Won't miss the walk path for the `AnyVal` type ### Why are the changes needed? Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`. Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types. ### Does this PR introduce any user-facing change? Yes. For the test below: ``` case class Bar(i: Any) case class Foo(i: Bar) extends AnyVal test() { implicitly[ExpressionEncoder[Foo]] } ``` Before: ``` java.lang.ClassNotFoundException: scala.Any at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ... ```` After: ``` java.lang.UnsupportedOperationException: No Encoder found for Any - field (class: "java.lang.Object", name: "i") - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i") - root class: "org.apache.spark.sql.catalyst.encoders.Foo" at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561) ``` ### How was this patch tested? Added unit test and test manually. Closes #27959 from Ngone51/impr_anyval. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 5c4d44b) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks all! |
…yVal type ### What changes were proposed in this pull request? Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits: 1. Give better encode error message for some unsupported types, e.g. `Any` 2. Won't miss the walk path for the `AnyVal` type ### Why are the changes needed? Firstly, PR apache#15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`. Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types. ### Does this PR introduce any user-facing change? Yes. For the test below: ``` case class Bar(i: Any) case class Foo(i: Bar) extends AnyVal test() { implicitly[ExpressionEncoder[Foo]] } ``` Before: ``` java.lang.ClassNotFoundException: scala.Any at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ... ```` After: ``` java.lang.UnsupportedOperationException: No Encoder found for Any - field (class: "java.lang.Object", name: "i") - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i") - root class: "org.apache.spark.sql.catalyst.encoders.Foo" at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561) ``` ### How was this patch tested? Added unit test and test manually. Closes apache#27959 from Ngone51/impr_anyval. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Improve
ScalaReflection
to only don't erasure non user definedAnyVal
type, but still erasure other types, e.g.Any
. And this brings two benefits:Give better encode error message for some unsupported types, e.g.
Any
Won't miss the walk path for the
AnyVal
typeWhy are the changes needed?
Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends
AnyVal
, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g.Any
, it will fail onjava.lang.ClassNotFoundException: scala.Any
due to the reason thatscala.Any
doesn't erasure tojava.lang.Object
.Also, in current
getClassNameFromType()
, it always erasure types which could missing walked path for user definedAnyVal
types.Does this PR introduce any user-facing change?
Yes. For the test below:
Before:
After:
How was this patch tested?
Added unit test and test manually.