Hello,
I'm trying to serialize a DataTable in Scala (https://github.com/martincooper/scala-datatable) so that I can distribute instances of it over a network using Apache Spark.
I'm running into an error after registering the DataTable class with the Spark context's Kryo serializer:
conf.registerKryoClasses(Array(classOf[com.github.martincooper.datatable.DataTable]))
Here's the serialization trace from the Scala interpreter:
Serialization trace:
referent (java.lang.ref.WeakReference)
scala$reflect$runtime$TwoWayCaches$TwoWayCache$$toJavaMap (scala.reflect.runtime.TwoWayCaches$TwoWayCache)
classCache (scala.reflect.runtime.JavaMirrors$JavaMirror)
$outer (scala.reflect.runtime.JavaMirrors$JavaMirror$$anon$1)
currentOwner (scala.reflect.internal.Trees$TreeTypeSubstituter)
EmptyTreeTypeSubstituter (scala.reflect.runtime.JavaUniverse)
$outer (scala.reflect.internal.Types$ClassNoArgsTypeRef)
columnType (com.github.martincooper.datatable.DataColumn)
columnIndexMapper (com.github.martincooper.datatable.DataColumnCollection)
columns (com.github.martincooper.datatable.DataTable)
value (scala.util.Success)
It seems that the void type can't be serialized. Is that correct? I've tried registering the void class with the serializer (i.e. classOf[java.lang.Void]), and that doesn't make a difference.
Thanks for your help in this matter.
EDIT:
I've attached a test case (Scala 2.11.6, Apache Spark 1.4, scala-datatable 0.7.0) below that reproduces the error:
import com.github.martincooper.datatable._
import org.apache.spark._

// build a small DataTable with one Int column and one Double column
val numRows = 4
val newDataTab = DataTable("test", Seq())
val col1 = new DataColumn[Int]("1", Seq.fill(numRows)(0))
val updDataTab = newDataTab.get.columns.add(col1)
val col2 = new DataColumn[Double]("2", Seq.fill(numRows)(0.0))
val finalDataTab = updDataTab.get.columns.add(col2)

// set up the Spark context with the Kryo serializer
val conf = new SparkConf().setMaster("local[2]").setAppName("serialization test")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.registerKryoClasses(Array(
  classOf[com.github.martincooper.datatable.DataTable],
  classOf[com.github.martincooper.datatable.DataColumnCollection]))
val sc = new SparkContext(conf)

// fails with the serialization trace above
val dataTabBroadcast = sc.broadcast(finalDataTab)
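For what it's worth, the trace suggests the problem is the reflection-backed columnType held inside each DataColumn (mirrors and WeakReference-based caches that Kryo can't walk), rather than the column data itself. One workaround I've been considering is to broadcast a plain snapshot of the column names and values and rebuild the DataTable on the executors. This is only a sketch: PlainColumn, PlainTable and roundTrip are names I've made up for illustration, not part of scala-datatable, and the round-trip below just checks that the snapshot itself carries no unserializable references.

```scala
import java.io._

// Plain, serialization-friendly snapshot of a table: just column names
// and raw values, none of the reflection machinery that trips up Kryo.
case class PlainColumn(name: String, values: Vector[Any])
case class PlainTable(name: String, columns: Vector[PlainColumn])

// Round-trip through Java serialization to confirm the snapshot
// contains only serializable references.
def roundTrip[A <: Serializable](a: A): A = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(a)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  in.readObject().asInstanceOf[A]
}

val snapshot = PlainTable("test", Vector(
  PlainColumn("1", Vector.fill(4)(0)),
  PlainColumn("2", Vector.fill(4)(0.0))))
val restored = roundTrip(snapshot)
```

On the driver one would then broadcast the PlainTable instead of the DataTable, and reconstruct the DataTable inside the worker closure from the plain data using the same DataTable(...)/columns.add(...) calls as in the test case above.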