Closed
Hi,
Environment:
Spark: 1.3.0
Elasticsearch: 1.4.4
Elasticsearch-Hadoop: 2.1.0.Beta4
Oracle: Express 11g XE
I am trying to join two DataFrames, one loaded from Elasticsearch and the other from Oracle, and the join fails with the exception below.
The issue only happens when the Elasticsearch data is loaded through the Spark SQL "load" function; it does not happen when loading with the esDF function. However, with esDF, the mapping of values to column names is broken (issue #451).
com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.elasticsearch.spark.sql.ScalaEsRow
at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1050)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1062)
at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:138)
at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.sql.execution.joins.HashJoin$$anon$1.hasNext(HashJoin.scala:66)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:798)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:798)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1497)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
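For context, the failure appears to be purely reflective: Kryo's default FieldSerializer instantiates objects through a no-arg constructor, and org.elasticsearch.spark.sql.ScalaEsRow does not declare one. A minimal, self-contained sketch of that constraint (NoArglessCtor and KryoCtorDemo are hypothetical stand-ins, not elasticsearch-hadoop code):

```scala
// Stand-in for ScalaEsRow: a class whose only constructor takes arguments.
class NoArglessCtor(val values: Seq[Any])

object KryoCtorDemo {
  // Kryo's FieldSerializer creates instances reflectively through a no-arg
  // constructor; this mirrors that check with plain JVM reflection.
  def hasNoArgConstructor(c: Class[_]): Boolean =
    try { c.getDeclaredConstructor(); true }
    catch { case _: NoSuchMethodException => false }

  def main(args: Array[String]): Unit = {
    println(hasNoArgConstructor(classOf[java.util.ArrayList[_]])) // true
    println(hasNoArgConstructor(classOf[NoArglessCtor]))          // false
  }
}
```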
Code:
//=======================================================================
// Preparing Elasticsearch DataFrame
//=======================================================================
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc: existing SparkContext
val hours = sqlContext.load("summary/hours", "org.elasticsearch.spark.sql")
hours.foreach(x => println(x))
//=======================================================================
// Preparing Oracle DataFrame
//=======================================================================
val users = sqlContext.load("jdbc", Map(
  "url" -> dbConnectionString,
  "dbtable" -> "sats.users",
  "driver" -> "oracle.jdbc.driver.OracleDriver"))
users.foreach(x => println(x))
//=======================================================================
// Joining ES and Oracle DataFrames
//=======================================================================
val hoursAug = hours.join(users, hours("User") === users("USERNAME"))
hoursAug.foreach(x => println(x))
The elements of both the Elasticsearch and Oracle DataFrames are printed successfully (the first two foreach calls); the exception occurs only when the join is executed.
Can you please advise if this is a bug and, if so, whether there is a workaround?
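In case it helps: a possible (untested) workaround sketch, assuming the failure comes from Kryo serializing ScalaEsRow instances during the join's shuffle, is to copy the Elasticsearch rows into generic Row objects first, so that ScalaEsRow never needs to be serialized. hoursRows and hoursPlain are hypothetical names; this uses the same Spark 1.3 API as the code above:

```scala
import org.apache.spark.sql.Row

// Copy each ScalaEsRow into a generic Row, then rebuild the DataFrame with
// the original schema; the shuffle then only serializes plain Rows.
val hoursRows  = hours.map(r => Row.fromSeq(r.toSeq))
val hoursPlain = sqlContext.createDataFrame(hoursRows, hours.schema)
val hoursAug   = hoursPlain.join(users, hoursPlain("User") === users("USERNAME"))
```

This is only a sketch against a live cluster, not something I could verify without the affected environment.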
Thanks,
Dmitriy Fingerman