Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17368] [SQL] Add support for value class serialization and deserialization #15284

Closed
wants to merge 1 commit into from
Closed

Conversation

jodersky
Copy link
Member

@jodersky jodersky commented Sep 28, 2016

What changes were proposed in this pull request?

Value classes were unsupported because catalyst data types were
obtained through reflection on erased types, which would resolve to a
value class' wrapped type and hence lead to unavailable methods during
code generation.

E.g. the following class

case class Foo(x: Int) extends AnyVal

would be seen as an int in catalyst and will cause instance cast failures when generated java code tries to treat it as a Foo.

This patch simply removes the erasure step when getting data types for
catalyst.

How was this patch tested?

Additional tests in ExpressionEncoderSuite.

Value classes were unsupported because catalyst data types were
obtained through reflection on erased types, which would resolve to a
value class' wrapped type and hence lead to unavailable methods during
code generation.

This patch simply removes the erasure step when getting data types for
catalyst.
@@ -66,8 +66,6 @@ case class RepeatedData(
mapFieldNull: scala.collection.Map[Int, java.lang.Long],
structField: PrimitiveData)

case class SpecificCollection(l: List[Int])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup: found this unused class whilst going over the code

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66060 has finished for PR 15284 at commit a3dce9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -628,7 +628,7 @@ object ScalaReflection extends ScalaReflection {
/*
* Retrieves the runtime class corresponding to the provided type.
*/
def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.erasure.typeSymbol.asClass)
def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.typeSymbol.asClass)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main change

@jodersky
Copy link
Member Author

jodersky commented Oct 13, 2016

cc @marmbrus for the changes to scala reflection

@marmbrus
Copy link
Contributor

marmbrus commented Oct 13, 2016

LGTM, merging to master.

@asfgit asfgit closed this in 9dc0ca0 Oct 14, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
…rialization

## What changes were proposed in this pull request?
Value classes were unsupported because catalyst data types were
obtained through reflection on erased types, which would resolve to a
value class' wrapped type and hence lead to unavailable methods during
code generation.

E.g. the following class
```scala
case class Foo(x: Int) extends AnyVal
```
would be seen as an `int` in catalyst and will cause instance cast failures when generated java code tries to treat it as a `Foo`.

This patch simply removes the erasure step when getting data types for
catalyst.

## How was this patch tested?
Additional tests in `ExpressionEncoderSuite`.

Author: Jakob Odersky <jakob@odersky.com>

Closes apache#15284 from jodersky/value-classes.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…rialization

## What changes were proposed in this pull request?
Value classes were unsupported because catalyst data types were
obtained through reflection on erased types, which would resolve to a
value class' wrapped type and hence lead to unavailable methods during
code generation.

E.g. the following class
```scala
case class Foo(x: Int) extends AnyVal
```
would be seen as an `int` in catalyst and will cause instance cast failures when generated java code tries to treat it as a `Foo`.

This patch simply removes the erasure step when getting data types for
catalyst.

## How was this patch tested?
Additional tests in `ExpressionEncoderSuite`.

Author: Jakob Odersky <jakob@odersky.com>

Closes apache#15284 from jodersky/value-classes.
cloud-fan pushed a commit that referenced this pull request Mar 23, 2020
…yVal type

### What changes were proposed in this pull request?

Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits:

1. Give better encode error message for some unsupported types, e.g. `Any`

2. Won't miss the walk path for the `AnyVal` type

### Why are the changes needed?

Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`.

Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types.

### Does this PR introduce any user-facing change?

Yes. For the test below:

```
case class Bar(i: Any)
case class Foo(i: Bar) extends AnyVal

test() {
  implicitly[ExpressionEncoder[Foo]]
}
```

Before:

```
java.lang.ClassNotFoundException: scala.Any
 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
 ...
````

After:
```
java.lang.UnsupportedOperationException: No Encoder found for Any
 - field (class: "java.lang.Object", name: "i")
 - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i")
 - root class: "org.apache.spark.sql.catalyst.encoders.Foo"
 at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561)
```

### How was this patch tested?

Added unit test and test manually.

Closes #27959 from Ngone51/impr_anyval.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Mar 23, 2020
…yVal type

### What changes were proposed in this pull request?

Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits:

1. Give better encode error message for some unsupported types, e.g. `Any`

2. Won't miss the walk path for the `AnyVal` type

### Why are the changes needed?

Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`.

Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types.

### Does this PR introduce any user-facing change?

Yes. For the test below:

```
case class Bar(i: Any)
case class Foo(i: Bar) extends AnyVal

test() {
  implicitly[ExpressionEncoder[Foo]]
}
```

Before:

```
java.lang.ClassNotFoundException: scala.Any
 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
 ...
````

After:
```
java.lang.UnsupportedOperationException: No Encoder found for Any
 - field (class: "java.lang.Object", name: "i")
 - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i")
 - root class: "org.apache.spark.sql.catalyst.encoders.Foo"
 at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561)
```

### How was this patch tested?

Added unit test and test manually.

Closes #27959 from Ngone51/impr_anyval.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 5c4d44b)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…yVal type

### What changes were proposed in this pull request?

Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits:

1. Give better encode error message for some unsupported types, e.g. `Any`

2. Won't miss the walk path for the `AnyVal` type

### Why are the changes needed?

Firstly, PR apache#15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`.

Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types.

### Does this PR introduce any user-facing change?

Yes. For the test below:

```
case class Bar(i: Any)
case class Foo(i: Bar) extends AnyVal

test() {
  implicitly[ExpressionEncoder[Foo]]
}
```

Before:

```
java.lang.ClassNotFoundException: scala.Any
 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
 ...
````

After:
```
java.lang.UnsupportedOperationException: No Encoder found for Any
 - field (class: "java.lang.Object", name: "i")
 - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i")
 - root class: "org.apache.spark.sql.catalyst.encoders.Foo"
 at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561)
```

### How was this patch tested?

Added unit test and test manually.

Closes apache#27959 from Ngone51/impr_anyval.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants