Description
I'm integrating elasticsearch-hadoop with Spark:

- Spark version: 1.1.1
- hadoop-client version: 1.0.4
- elasticsearch-hadoop version: 2.0.2

When I count a resource that has a date field (stored as a long value), an exception occurs.
Code as below:
```java
JavaPairRDD<Text, MapWritable> esRDD =
    jsc.hadoopRDD(conf, EsInputFormat.class, Text.class, MapWritable.class);
long docCount = esRDD.count();
```
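As a possible workaround, the date could be returned in its raw form instead of as a rich `java.util.Date`. This sketch assumes the `es.mapping.date.rich` setting, which may only exist in elasticsearch-hadoop releases newer than 2.0.2:

```java
// Hypothetical workaround (assumes es.mapping.date.rich is supported by
// the installed elasticsearch-hadoop version): return date fields in
// their raw stored form rather than as java.util.Date objects.
conf.set("es.mapping.date.rich", "false");
```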
The error message is as below:

```
java.lang.ClassCastException: java.util.Date cannot be cast to org.apache.hadoop.io.Writable
at org.elasticsearch.hadoop.mr.LinkedMapWritable.put(LinkedMapWritable.java:39)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.addToMap(JdkValueReader.java:88)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:216)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:139)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:113)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:99)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:78)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:268)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:321)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1218)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
```
In my opinion, JdkValueReader (around line 340) may have a bug: for a date field it simply returns a new java.util.Date(), which does not implement org.apache.hadoop.io.Writable and therefore cannot be stored in a LinkedMapWritable.
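A minimal sketch of the fix direction, using only the JDK (no Hadoop classes; the method name is hypothetical, not part of es-hadoop): instead of passing the rich java.util.Date through, the reader could convert it to its epoch-millis long, which the Writable layer can then wrap in a LongWritable:

```java
import java.util.Date;

public class DateConversionSketch {
    // Hypothetical conversion step: turn a rich Date value into a plain
    // long (epoch millis) so it can be wrapped in a LongWritable later,
    // instead of failing the cast to Writable.
    static Object toWritableFriendly(Object value) {
        if (value instanceof Date) {
            return ((Date) value).getTime();
        }
        return value;
    }

    public static void main(String[] args) {
        Date d = new Date(1419000000000L);
        Object converted = toWritableFriendly(d);
        // A plain Long survives where the rich Date did not.
        System.out.println(converted.getClass().getName()); // java.lang.Long
        System.out.println(converted);                      // 1419000000000
    }
}
```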