Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31190][SQL] ScalaReflection should not erasure user defined AnyVal type #27959

Closed
wants to merge 3 commits into from

Conversation

Ngone51
Copy link
Member

@Ngone51 Ngone51 commented Mar 19, 2020

What changes were proposed in this pull request?

Improve ScalaReflection to only don't erasure non user defined AnyVal type, but still erasure other types, e.g. Any. And this brings two benefits:

  1. Give better encode error message for some unsupported types, e.g. Any

  2. Won't miss the walk path for the AnyVal type

Why are the changes needed?

Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends AnyVal, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. Any, it will fail on java.lang.ClassNotFoundException: scala.Any due to the reason that scala.Any doesn't erasure to java.lang.Object.

Also, in current getClassNameFromType(), it always erasure types which could missing walked path for user defined AnyVal types.

Does this PR introduce any user-facing change?

Yes. For the test below:

case class Bar(i: Any)
case class Foo(i: Bar) extends AnyVal

test() {
  implicitly[ExpressionEncoder[Foo]]
}

Before:

java.lang.ClassNotFoundException: scala.Any
 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
 ...

After:

java.lang.UnsupportedOperationException: No Encoder found for Any
 - field (class: "java.lang.Object", name: "i")
 - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i")
 - root class: "org.apache.spark.sql.catalyst.encoders.Foo"
 at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561)

How was this patch tested?

Added unit test and test manually.

if (isSubtype(tpe, localTypeOf[AnyVal]) && !tpe.toString.startsWith("scala")) {
tpe
} else {
tpe.erasure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does erasure do here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, it convert a Scala type to Java type. e.g. scala.Any -> java.lang.Object.

@SparkQA
Copy link

SparkQA commented Mar 19, 2020

Test build #120053 has finished for PR 27959 at commit 9e6a006.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Bar(i: Any)
  • case class Foo(i: Bar) extends AnyVal
  • test(\"AnyVal class with Any fields\")

* or nested classes in package objects, it uses the dollar sign ($) to create
* synthetic classes, emulating behaviour in Java bytecode.
*/
def getClassNameFromType(tpe: `Type`): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to quote Type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know it clearly...just copied from the original one.

* synthetic classes, emulating behaviour in Java bytecode.
*/
def getClassNameFromType(tpe: `Type`): String = {
tpe.dealias.erasure.typeSymbol.asClass.fullName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it's a problem only for AnyVal? Here we blindly erasure everything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a AnVal class, e.g. case class Foo(i: Int) extends AnyVal, erasure on it will return type of Int instead of Foo. So, this actually cause a problem when we add the class name to the walkedTypePath.

@@ -611,10 +611,33 @@ object ScalaReflection extends ScalaReflection {
}
}

private def erasure(tpe: Type): Type = {
if (isSubtype(tpe, localTypeOf[AnyVal]) && !tpe.toString.startsWith("scala")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some comments to explain why we special case AnyVal?

@SparkQA
Copy link

SparkQA commented Mar 20, 2020

Test build #120097 has finished for PR 27959 at commit ceb2ce6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • //case class Foo(i: Int) extends AnyValwill return typeIntinstead ofFoo.

@Ngone51
Copy link
Member Author

Ngone51 commented Mar 23, 2020

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Mar 23, 2020

Test build #120172 has finished for PR 27959 at commit ceb2ce6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • //case class Foo(i: Int) extends AnyValwill return typeIntinstead ofFoo.

@cloud-fan cloud-fan changed the title [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type [SPARK-31190][SQL] ScalaReflection should not erasure non user defined AnyVal type Mar 23, 2020
@cloud-fan cloud-fan changed the title [SPARK-31190][SQL] ScalaReflection should not erasure non user defined AnyVal type [SPARK-31190][SQL] ScalaReflection should not erasure user defined AnyVal type Mar 23, 2020
@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 5c4d44b Mar 23, 2020
cloud-fan pushed a commit that referenced this pull request Mar 23, 2020
…yVal type

### What changes were proposed in this pull request?

Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits:

1. Give better encode error message for some unsupported types, e.g. `Any`

2. Won't miss the walk path for the `AnyVal` type

### Why are the changes needed?

Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`.

Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types.

### Does this PR introduce any user-facing change?

Yes. For the test below:

```
case class Bar(i: Any)
case class Foo(i: Bar) extends AnyVal

test() {
  implicitly[ExpressionEncoder[Foo]]
}
```

Before:

```
java.lang.ClassNotFoundException: scala.Any
 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
 ...
````

After:
```
java.lang.UnsupportedOperationException: No Encoder found for Any
 - field (class: "java.lang.Object", name: "i")
 - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i")
 - root class: "org.apache.spark.sql.catalyst.encoders.Foo"
 at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561)
```

### How was this patch tested?

Added unit test and test manually.

Closes #27959 from Ngone51/impr_anyval.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 5c4d44b)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@Ngone51
Copy link
Member Author

Ngone51 commented Mar 23, 2020

thanks all!

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…yVal type

### What changes were proposed in this pull request?

Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits:

1. Give better encode error message for some unsupported types, e.g. `Any`

2. Won't miss the walk path for the `AnyVal` type

### Why are the changes needed?

Firstly, PR apache#15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`.

Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types.

### Does this PR introduce any user-facing change?

Yes. For the test below:

```
case class Bar(i: Any)
case class Foo(i: Bar) extends AnyVal

test() {
  implicitly[ExpressionEncoder[Foo]]
}
```

Before:

```
java.lang.ClassNotFoundException: scala.Any
 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
 ...
````

After:
```
java.lang.UnsupportedOperationException: No Encoder found for Any
 - field (class: "java.lang.Object", name: "i")
 - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i")
 - root class: "org.apache.spark.sql.catalyst.encoders.Foo"
 at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561)
```

### How was this patch tested?

Added unit test and test manually.

Closes apache#27959 from Ngone51/impr_anyval.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants