java.lang.IllegalAccessError: tried to access method org.apache.spark.TaskContext.interrupted()Z from class org.elasticsearch.spark.rdd.AbstractEsRDDIterator #292

Closed
arnaudsj opened this issue Oct 7, 2014 · 3 comments

arnaudsj commented Oct 7, 2014

I am trying to use elasticsearch-hadoop with Spark via the following Clojure code:

(ns test-spark-es.core
  (:require  [flambo.api :as f]
             [flambo.conf :as fconf])
  (:import [org.elasticsearch.spark.api.java JavaEsSpark])
  (:gen-class))

(def master "local[*]")
(def conf {"es.nodes" "127.0.0.1"
           "es.resource" "my_twitter_river/status"})
(def env {"spark.executor.memory" "4G",
          "spark.files.overwrite" "true"})

(defn new-spark-context []
  (let [c (-> (fconf/spark-conf)
              (fconf/master master)
              (fconf/app-name "")
              (fconf/set "spark.akka.timeout" "300")
              (fconf/set conf)
              (fconf/set-executor-env env))]
    (f/spark-context c)))

(defonce sc (new-spark-context))

(defonce esRDD (JavaEsSpark/esRDD sc "my_twitter_river/status" "*"))

(f/take esRDD 10)

I can tell that Elasticsearch is being contacted and that results are retrieved, but any attempt to iterate through the resulting RDD fails with the following error:

14/10/06 20:21:56 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IllegalAccessError: tried to access method org.apache.spark.TaskContext.interrupted()Z from class org.elasticsearch.spark.rdd.AbstractEsRDDIterator
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:1080)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:1080)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:744)

Any idea of what I might be missing?

costin (Member) commented Oct 7, 2014

Which versions of Spark and es-hadoop are you using? Spark 1.1 broke backwards compatibility; this has been fixed in master, but unfortunately Spark 1.1 with Beta1 will not work.
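For reference, a rough project.clj along these lines should line the versions up once you pick up a build newer than Beta1 (the coordinates and versions below are only an example, not a combination verified in this thread):

;; project.clj sketch -- artifact versions are illustrative, not a tested set.
;; The key point: the es-hadoop jar must be a build made after the Spark 1.1
;; compatibility fix (Beta1 predates the Spark 1.1 change to TaskContext,
;; hence the IllegalAccessError in the stack trace above).
(defproject test-spark-es "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [yieldbot/flambo "0.4.0"]
                 [org.apache.spark/spark-core_2.10 "1.1.0"]
                 [org.elasticsearch/elasticsearch-hadoop "2.1.0.Beta2"]])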

arnaudsj (Author) commented Oct 8, 2014

@costin thank you, that did the trick! BTW, is the default RDD type returned to Spark a PairRDD?

costin (Member) commented Oct 8, 2014

Yes. Beta2 changed the simple RDD[Map] to a PairRDD; it carries essentially the same data but is easier to manipulate. You can easily transform the PairRDD back to a plain RDD if you need to.
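For example, from the Clojure side a rough sketch via plain Java interop would look like this (JavaPairRDD exposes .keys and .values; the id/document split in the comments is an assumption about how the pairs are laid out, not something stated in this thread):

;; esRDD from the snippet above is a JavaPairRDD (assumed: document id -> document map).
;; JavaPairRDD has standard .keys and .values methods, so dropping the ids is a one-liner.
(def docsRDD (.values esRDD))   ; JavaRDD of the document maps only
(def idsRDD  (.keys esRDD))     ; JavaRDD of the document ids only

(f/take docsRDD 10)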

costin closed this as completed Oct 8, 2014