
[SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported encoder for class's subfield #15918

Closed
wants to merge 8 commits

Conversation

windpiger
Contributor

What changes were proposed in this pull request?

An UnsupportedOperationException is thrown when a class has a subfield whose type has no supported Encoder.

This PR falls back to Kryo serialization/deserialization for such subfields.

Before the fix:

case class MyClass(a: String, b: Option[Set[Int]])
val ds = Seq(MyClass("a", Some(Set(1))), MyClass("b", Some(Set(2)))).toDS

java.lang.UnsupportedOperationException: No Encoder found for Set[scala.Int]
- option value class: "scala.collection.immutable.Set"
- field (class: "scala.Option", name: "b")
- root class: "lineacce9ce6ecc049489e2e74fa620679d927.$read.$iw.$iw.$iw.$iw.MyClass"
	at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:598)

After the fix:

case class MyClass(a: String, b: Option[Set[Int]])
val ds = Seq(MyClass("a", Some(Set(1))), MyClass("b", Some(Set(2)))).toDS
ds: org.apache.spark.sql.Dataset[MyClass] = [a: string, b: binary]
ds.foreach{
| r => val x = r.b
| x.get.foreach(println)
| }
1
2

How was this patch tested?

Unit tests added.

@windpiger windpiger changed the title [SPARK-18122][SQL]Fallback to Kryo for unsupported encoder for class's subfield [SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported encoder for class's subfield Nov 17, 2016
@SparkQA

SparkQA commented Nov 17, 2016

Test build #68764 has finished for PR 15918 at commit bb11c93.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Nov 17, 2016

I think this introduces a big behavior change, right? Now objects are serialized with a hybrid of two serializers? I am not sure this is a good idea.

@windpiger
Contributor Author

Yes, and there is one aspect to be concerned about:
if the Set[Int] is serialized to binary and then val ds1 = ds.select("b") is run, the column type of ds1 is BinaryType, and ds1 cannot be deserialized back to Set[Int].
This is WIP; I will continue to work on it.
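
A minimal sketch of that concern, continuing the MyClass example from the description:

val ds1 = ds.select("b")
ds1.printSchema()
// root
//  |-- b: binary (nullable = true)
// The column is plain BinaryType; nothing in the schema records that the
// bytes are a Kryo-serialized Set[Int], so ds1 cannot be deserialized back
// by default.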

@srowen
Member

srowen commented Nov 25, 2016

There's no obvious way to implement this, IMHO. I am not sure mixing two serializations would make sense. I would close these for now.

@rxin
Contributor

rxin commented Nov 27, 2016

FWIW I think there is some value here, but I agree that changing the default behavior can be surprising and bad.

@koertkuipers
Contributor

@srowen and @rxin, what is the default behavior that is changed here? I see a current situation where an implicit encoder is provided that simply cannot handle the task at hand, and this leads to failure.

Either the implicits for ExpressionEncoder need to be more narrow, so that they do not claim types they cannot handle (and then other implicit encoders can be used), or they need to be able to handle these types, for example by falling back to Kryo as is suggested in this JIRA.

Currently implicitly[Encoder[Option[Set[Int]]]] gives you an ExpressionEncoder that cannot handle it. That is undesirable and makes it difficult for the user to provide an alternative implicit.

I proposed making the ExpressionEncoders more narrow (that seemed the easier fix to me at first), but @marmbrus preferred the approach of falling back to Kryo and broadening them. See:
http://apache-spark-developers-list.1001551.n3.nabble.com/getting-encoder-implicits-to-be-more-accurate-td19561.html
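
For illustration, the kind of workaround available today is to bypass implicit resolution entirely and pass a Kryo-backed encoder explicitly (a minimal sketch, assuming a SparkSession named spark):

import org.apache.spark.sql.{Encoder, Encoders}

// A Kryo-backed encoder for the type the ExpressionEncoder cannot handle.
val kryoEnc: Encoder[Option[Set[Int]]] = Encoders.kryo[Option[Set[Int]]]

// Passing the encoder explicitly means the implicit ExpressionEncoder that
// wrongly claims the type is never consulted:
val ds = spark.createDataset(Seq(Option(Set(1)), Option(Set(2))))(kryoEnc)
// ds: Dataset[Option[Set[Int]]], stored as a single binary column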

@marmbrus
Contributor

marmbrus commented Nov 30, 2016

I agree with @koertkuipers that the only change in behavior is that cases that used to throw an error will now not throw an error. If done right (I haven't looked deeply at the PR itself yet), no case that is currently working should change.

It is maybe slightly odd to mix serialization types, but that's kind of already happening today if you use the Kryo serializer: you are taking Kryo-encoded data and putting it as a binary value into a Tungsten row. The change here makes it possible to do the same in cases where the incompatible object is nested within a compatible object. Currently you are forced into all or nothing (i.e. even if only a single field is incompatible, you must treat the whole object as an opaque binary blob).

The one possible compatibility concern I can see is that if, in the future, we add support for a previously unsupported type, the schema will change from BinaryType to something else. However, given that there are very few operations you can do on binary, and this format is not persisted or guaranteed to be compatible across Spark versions, this actually seems okay.

Thoughts?
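
A minimal sketch of the all-or-nothing situation described above (assuming a SparkSession named spark; MyClass is the case class from the PR description):

import org.apache.spark.sql.Encoders

case class MyClass(a: String, b: Option[Set[Int]])

// Today, because the single field `b` is unsupported, the entire object has
// to be encoded with Kryo and lands in one opaque binary column:
val ds = spark.createDataset(Seq(MyClass("a", Some(Set(1)))))(Encoders.kryo[MyClass])
ds.printSchema()
// root
//  |-- value: binary (nullable = true)

The fallback in this PR would instead keep a as a regular string column and confine the binary blob to b, as shown in the "after" output in the description.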

@marmbrus
Contributor

marmbrus commented Dec 1, 2016

We should probably add a flag (maybe even off by default). The error message can tell you to turn on the flag if you are okay with the fallback.
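
For concreteness, such a flag could look something like the following (the config key below is invented for illustration; the PR as reviewed here does not define one):

// Hypothetical key, off by default:
spark.conf.set("spark.sql.encoders.kryoFallback.enabled", "true")

// With the flag off, the existing error could mention it, e.g.:
// UnsupportedOperationException: No Encoder found for Set[scala.Int];
// set spark.sql.encoders.kryoFallback.enabled=true to fall back to Kryo.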

@koertkuipers
Contributor

If we do a flag, I would also prefer that the current implicits be more narrow when the flag is not set, if possible.

@marmbrus
Contributor

marmbrus commented Dec 2, 2016

I don't think you can limit the implicit. What type would pick up case classes, but not case classes that contain invalid things? I think you would need a macro for this kind of introspection. (I'd be happy to be proven wrong with a PR.)

I'd recommend you only import the implicits you need rather than using the wildcard.
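
A sketch of that selective-import approach (localSeqToDatasetHolder and Encoders.kryo are Spark 2.x API names; verify against your version):

// Instead of the wildcard `import spark.implicits._`, import only the
// conversion you need, leaving Encoder resolution to your own implicit:
import spark.implicits.localSeqToDatasetHolder

implicit val setEnc: org.apache.spark.sql.Encoder[Option[Set[Int]]] =
  org.apache.spark.sql.Encoders.kryo[Option[Set[Int]]]

val ds = Seq(Option(Set(1)), Option(Set(2))).toDS()  // resolves to setEnc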

@koertkuipers
Contributor

koertkuipers commented Dec 3, 2016 via email

@marmbrus
Contributor

@windpiger, were you still working on this? I think it would be a useful feature if we can get the tests to pass.

@windpiger
Contributor Author

Oh sorry, recently I was busy with other work. I will continue working on this soon and finish it.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Feb 14, 2017

Test build #72873 has finished for PR 15918 at commit bb11c93.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 15, 2017

Test build #72922 has started for PR 15918 at commit 18d4ba5.

@SparkQA

SparkQA commented Feb 15, 2017

Test build #72921 has finished for PR 15918 at commit 4aac7dd.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 15, 2017

Test build #72924 has started for PR 15918 at commit cbe91bf.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Feb 15, 2017

Test build #72927 has finished for PR 15918 at commit cbe91bf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 15, 2017

Test build #72936 has finished for PR 15918 at commit adf31b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger windpiger changed the title [SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported encoder for class's subfield [SPARK-18122][SQL]Fallback to Kryo for unsupported encoder for class's subfield Feb 16, 2017
@windpiger windpiger changed the title [SPARK-18122][SQL]Fallback to Kryo for unsupported encoder for class's subfield [SPARK-18122][SQL][WIP]Fallback to Kryo for unsupported encoder for class's subfield Feb 16, 2017
@JasonMWhite
Contributor

@windpiger I see you flipped it back to WIP. What else needs to be done?

@windpiger
Contributor Author

This change to schemaFor affects the default behavior.
For example, consider udf in functions.scala:
if an input type is not supported, inputTypes used to be Nil, which passed the subsequent expression type check; after the fallback to Kryo with BinaryType, the type check fails (BinaryType not being equal to the expected type). So I kept the logic that throws an exception in schemaFor for non-nested unsupported types, while complex nested types (including Map etc.) fall back to Kryo.

Even though I use schemaForDefaultBinaryType rather than schemaFor in deserializerFor, which I think is fine for the serde situation itself, I am not sure whether the change in schemaFor will affect other logic, like the udf case described above.
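
For context, the udf path in functions.scala that this refers to looks roughly like the following in Spark 2.x (a paraphrased sketch, not the exact source; imports omitted):

// Paraphrased sketch of the udf builder in functions.scala:
def udf[RT: TypeTag, A1: TypeTag](f: A1 => RT): UserDefinedFunction = {
  // If schemaFor throws for an unsupported input type, inputTypes is None
  // and the analyzer skips the input-type check entirely.
  val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: Nil).toOption
  UserDefinedFunction(f, ScalaReflection.schemaFor[RT].dataType, inputTypes)
}
// If schemaFor instead silently returned BinaryType via the Kryo fallback,
// the check would run and could fail, which is why the exception is kept
// for non-nested unsupported types.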

@HyukjinKwon
Member

Hi @windpiger, is this still WIP?
