
[SPARK-3447][SQL] Remove explicit conversion with JListWrapper to avoid NPE #2323

Closed
wants to merge 4 commits

Conversation

@marmbrus (Contributor) commented Sep 8, 2014

No description provided.
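For context: the NPE in the title is Kryo failing to round-trip Scala's Java-collection wrappers such as scala.collection.convert.Wrappers$JListWrapper. A hypothetical repro sketch, assuming Objenesis-based instantiation so the wrapper can be rebuilt without a no-arg constructor (exact behavior varies with the Kryo version; the object and value names below are illustrative):

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}
import org.objenesis.strategy.StdInstantiatorStrategy
import scala.collection.JavaConverters._

object JListWrapperRepro {
  def main(args: Array[String]): Unit = {
    val kryo = new Kryo()
    kryo.setInstantiatorStrategy(new StdInstantiatorStrategy())

    // asScala does not copy; it wraps the Java list in Wrappers$JListWrapper.
    val wrapped: Seq[Int] = java.util.Arrays.asList(1, 2, 3).asScala

    val out = new Output(4096)
    kryo.writeClassAndObject(out, wrapped)
    out.close()

    // Round-tripping the wrapper is what reportedly triggers
    // KryoException: java.lang.NullPointerException.
    println(kryo.readClassAndObject(new Input(out.toBytes)))
  }
}

The pattern of the fix is to materialize such wrappers into a concrete Scala collection before they reach a serialization boundary.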

@marmbrus (Contributor, Author) commented Sep 8, 2014

/cc @yhuai

@SparkQA commented Sep 8, 2014

QA tests have started for PR 2323 at commit 59065bc.

  • This patch merges cleanly.

@SparkQA commented Sep 9, 2014

QA tests have finished for PR 2323 at commit 59065bc.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor) commented Sep 9, 2014

JsonRDD and the Java API of Row also use wrappers. Should we check whether those places can trigger the NPE as well?

@marmbrus (Contributor, Author)

I've updated the usage in JsonRDD. Java Row wrapping should never happen before Kryo serialization, AFAICT.

@marmbrus (Contributor, Author)

Jenkins, test this please

@SparkQA commented Sep 10, 2014

QA tests have started for PR 2323 at commit 646976b.

  • This patch merges cleanly.

@SparkQA commented Sep 10, 2014

Tests timed out after a configured wait of 120m.

@marmbrus (Contributor, Author)

Jenkins, test this please.

1 similar comment
@marmbrus (Contributor, Author)

Jenkins, test this please.

@SparkQA commented Sep 10, 2014

QA tests have started for PR 2323 at commit 646976b.

  • This patch merges cleanly.

@SparkQA commented Sep 10, 2014

QA tests have finished for PR 2323 at commit 646976b.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -253,7 +254,7 @@ private[sql] object JsonRDD extends Logging {
         // This issue is documented at https://issues.scala-lang.org/browse/SI-7005
         JMapWrapper(map).mapValues(scalafy).map(identity)
       case list: java.util.List[_] =>
-        JListWrapper(list).map(scalafy)
+        (list: Seq[_]).map(scalafy)
Inline review comment (Contributor):
Oh, after checking the code again, I think .map(scalafy) here will convert the JListWrapper to an ArrayBuffer (JListWrapper is a Buffer, and Buffer's newBuilder returns an ArrayBuffer), so we will not have the Kryo issue. I tried

from pyspark.sql import SQLContext; SQLContext(sc).jsonRDD(sc.parallelize(['{"a":[3]}']))._jschema_rdd.collect()

and it's fine.
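That builder behavior is easy to confirm in a plain Scala shell; a quick illustrative check (not part of the PR):

import scala.collection.JavaConverters._

val wrapped = java.util.Arrays.asList(1, 2, 3).asScala
// The wrapper itself is the problematic class...
println(wrapped.getClass.getName)            // scala.collection.convert.Wrappers$JListWrapper
// ...but mapping it builds through Buffer's newBuilder, i.e. an ArrayBuffer:
println(wrapped.map(_ + 1).getClass.getName) // scala.collection.mutable.ArrayBuffer

So both JListWrapper(list).map(scalafy) and (list: Seq[_]).map(scalafy) produce an ArrayBuffer, which is presumably why the JsonRDD change was safe to revert.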

@marmbrus (Contributor, Author)

Okay, I reverted the JsonRDD change and merged this to master. Thanks!

@asfgit closed this in f92cde2 on Sep 11, 2014
@mohangadm
I have experienced the same kind of problem when using Avro with the Spark Streaming API.
If the Avro message is simple, it's fine, but if the message contains unions/arrays it fails with the exception below:
ERROR scheduler.JobScheduler: Error running job streaming job 1411043845000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
value (xyz.Datum)
data (xyz.ResourceMessage)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The above exception shows up when output operations are used.

Below is the Avro message:
{"version": "01", "sequence": "00001", "resource": "sensor-001", "controller": "002", "controllerTimestamp": "1411038710358", "data": {"value": [{"name": "Temperature", "value": "30"}, {"name": "Speed", "value": "60"}, {"name": "Location", "value": ["+401213.1", "-0750015.1"]}, {"name": "Timestamp", "value": "2014-09-09T08:15:25-05:00"}]}}

The message is successfully decoded in the decoder, but the exception is thrown during the output operation.
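For what it's worth, a common mitigation for Kryo trouble with Avro-generated classes is to register them explicitly with a KryoRegistrator. A sketch only: xyz.ResourceMessage and xyz.Datum are the classes named in the serialization trace above, and whether plain registration resolves this union/array case is not verified here:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Registers the Avro-generated classes named in the trace above.
class AvroRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[xyz.ResourceMessage])
    kryo.register(classOf[xyz.Datum])
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[AvroRegistrator].getName)

Failing that, leaving spark.serializer at its Java-serialization default sidesteps Kryo entirely, at some performance cost.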

@marmbrus deleted the kryoJListNPE branch on September 22, 2014