java.util.Date cannot be cast to org.apache.hadoop.io.Writable #340

@yujianok

Description


I'm integrating Elasticsearch-Hadoop with Spark.
Spark version is 1.1.1
hadoop-client version is 1.0.4
elasticsearch-hadoop version is 2.0.2

When I count a resource that has a Date field (stored as a long value), an exception occurs.

Code as below:
JavaPairRDD<Text, MapWritable> esRDD = jsc.hadoopRDD(conf, EsInputFormat.class, Text.class, MapWritable.class);
long docCount = esRDD.count();

Error message as below:
java.lang.ClassCastException: java.util.Date cannot be cast to org.apache.hadoop.io.Writable
at org.elasticsearch.hadoop.mr.LinkedMapWritable.put(LinkedMapWritable.java:39)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.addToMap(JdkValueReader.java:88)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:216)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:139)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:113)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:99)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:78)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:268)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:321)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1218)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

In my opinion, JdkValueReader:340 may have a bug: it should not simply return a new java.util.Date(), since LinkedMapWritable.put expects a Writable.
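The stack trace suggests the Writable-aware reader would need to convert the parsed java.util.Date into something a Writable can wrap (e.g. epoch millis for a LongWritable, or an ISO-8601 string for a Text) before it reaches LinkedMapWritable.put. A minimal, library-free sketch of that conversion (the class and method names here are illustrative, not elasticsearch-hadoop's actual API):

```java
import java.time.Instant;
import java.util.Date;

public class DateConversionSketch {

    // Epoch-millis form, suitable for wrapping in a Hadoop LongWritable.
    static long toEpochMillis(Date date) {
        return date.getTime();
    }

    // ISO-8601 UTC form, suitable for wrapping in a Hadoop Text.
    static String toIso8601(Date date) {
        return Instant.ofEpochMilli(date.getTime()).toString();
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // the Unix epoch
        System.out.println(toEpochMillis(epoch)); // 0
        System.out.println(toIso8601(epoch));     // 1970-01-01T00:00:00Z
    }
}
```

Either representation would keep the map's values inside the Writable hierarchy; which one is appropriate would depend on how downstream consumers expect to read the date field back.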
