
incompatible with CDH4 YARN/Hadoop 2 #8

Closed
github4venkat opened this issue Mar 19, 2013 · 18 comments

Comments

@github4venkat

Will you be releasing a version of elasticsearch-hadoop that compiles against Hadoop 2.x,
or could someone explain how to make it work with that version?

Thanks,
Venkat

Pig Stack Trace

ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.elasticsearch.hadoop.mr.ESOutputFormat.checkOutputSpecs(ESOutputFormat.java:104)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:80)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)

@Downchuck

At first glance, I don't see any interface changes between CDH2 and CDH4 - I was able to get CDH2 running with this package. checkOutputSpecs doesn't do much, so you can certainly download this project and just have the method do nothing (a no-op); a sketch of that patch follows. Given how new this project is, I strongly recommend downloading and patching for the next week or two - the source code is quite simple.
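
A minimal sketch of that patch, assuming it is applied inside the project's ESOutputFormat and the project is then recompiled against the CDH4 jars (the comment is illustrative, not from the project):

    @Override
    public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
        // no-op: skip the validation that trips over the Hadoop 1.x/2.x
        // JobContext class-vs-interface change
    }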

@costin
Member

costin commented Mar 19, 2013

Hi,

CDH is just one of the distros we will be testing against to make sure we're compatible. Currently we're aiming just for vanilla Apache Hadoop, since we're in the early days and things are still in flux.
I suspect the error comes from the Hadoop 2.x/YARN changes ported back to CDH4 - I know it's not a lot but it's a start.
We'll look into it as soon as possible but it will definitely not be this week (am currently travelling).

Cheers,

@costin
Member

costin commented Mar 20, 2013

@github4venkat Btw, what version of CDH are you using - 4.1.x, 4.5.x? Is upgrading to the latest one an option?

@github4venkat
Author

@ALL Thanks for the replies.

@costin I am using 4.1.x.
To make it work, I had to ignore (comment out or replace) the code that only compiled against the old Hadoop core - for example, TaskAttemptContext used as a class was not accepted by Hadoop 2.x.
But the project works very well: I was able to bulk index documents into my ES cluster.
Thanks,
Venkat

@costin
Member

costin commented Mar 20, 2013

@github4venkat Great to hear it works for you - this is just an initial drop and we have a lot more goodies in store.

The class incompatibilities in Hadoop 2.x/CDH4 are quite unfortunate (they could have used a different package) and we'll probably have to create a separate branch/artifact/package for it, since it's not backwards compatible. Will keep you posted.

@github4venkat
Author

@costin Yes, I understand this community is pretty new - but helpful. Sure, I will look forward to updates in this space. This came just in time, as I was looking to write my own loader to bulk index data from HDFS into Elasticsearch.

@costin
Member

costin commented Mar 26, 2013

@github4venkat Hi, I've tried to replicate the issue, with little success. I assume you're using the YARN/Hadoop 2.x version instead of MR1/Hadoop 1.x?
Since JobContext has been changed from a class to an interface (so much for backwards compatibility - why not add another interface?), one needs to recompile; a small illustration follows. However, you mentioned you also had to comment out TaskAttemptContext, and I don't see why that would be necessary - could you comment on this?
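
For illustration, a hypothetical snippet (JobContextProbe is made up, not part of this project) showing why a recompile is enough:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.JobContext;

    // A call through JobContext, like the one below, is compiled to
    // invokevirtual against Hadoop 1.x (where JobContext is a class) but
    // must be invokeinterface on Hadoop 2.x (where it is an interface).
    // Bytecode built against 1.x therefore fails at runtime on 2.x with
    // IncompatibleClassChangeError; the same source recompiled against
    // the 2.x jars works fine.
    public class JobContextProbe {
        public static String outputDir(JobContext context) {
            Configuration conf = context.getConfiguration();
            return conf.get("mapred.output.dir");
        }
    }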

Note that support for Hadoop 2 (as opposed to CDH4 MR1) is problematic since Hadoop 2 is not yet stable. I would not recommend using it: it's simply work in progress, and many (if not all) of the projects in the Hadoop ecosystem, such as Pig or Hive, do not support it.

@costin
Member

costin commented Mar 26, 2013

@github4venkat A quick update - I've added a new branch, called cdh, that allows the project to be compiled against CDH4 YARN/Hadoop 2.x (you can change the version to MRv1 as well). The project compiles cleanly on both versions - please confirm whether you still experience any issues.
Note that master has been updated as well - the branch, however, contains the CDH4 dependencies to make things easier.
By the way, instead of listing the dependencies by hand, I've used the Cloudera-recommended approach listed here.

P.S. I've updated the issue title as well to better reflect the problem.

@github4venkat
Author

@costin Thanks for creating the branch. Meanwhile I had written custom MR jobs to load data into Elasticsearch, but I will try your branch to compare performance and use it if it's better.
And yes, I did comment out the 'interface but class was expected' parts, because that was the simplest thing to do at the time - all I needed from them was the ES config details. (I did that to quickly test whether the functionality suits my use case.)

I will post here once I have tested your branch on my Hadoop 2.x environment. Thanks.

@costin
Member

costin commented Mar 28, 2013

@github4venkat I'd be interested to know what the difference is between your custom MR jobs and what we provide - what do we lack?

@github4venkat
Author

@costin I just ran a test loading a 54 MB file: with no additional settings in my Pig script, it takes 1.5 hrs. I'm not sure if the code hits REST for each event - if so, is there a setting that lets me use bulk loading into Elasticsearch?

In my custom MapReduce job, I write so that the reducers bulk load into Elasticsearch using a BulkProcessor, roughly as sketched below.
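
A rough sketch of that reducer shape, assuming the Elasticsearch Java client's BulkProcessor is on the classpath (the EsBulkReducer class and index/type names are illustrative, not from this project):

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.Client;

    public class EsBulkReducer extends Reducer<Text, Text, NullWritable, NullWritable> {

        private Client client;      // assumed to be built in setup(), e.g. a TransportClient
        private BulkProcessor bulk;

        @Override
        protected void setup(Context context) {
            bulk = BulkProcessor.builder(client, new BulkProcessor.Listener() {
                public void beforeBulk(long id, BulkRequest request) {}
                public void afterBulk(long id, BulkRequest request, BulkResponse response) {}
                public void afterBulk(long id, BulkRequest request, Throwable failure) {}
            }).setBulkActions(1000).build();   // flush every 1000 documents
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) {
            for (Text json : values) {
                // each value is assumed to already be a JSON document
                bulk.add(new IndexRequest("myindex", "mytype").source(json.toString()));
            }
        }

        @Override
        protected void cleanup(Context context) {
            bulk.close();   // flush whatever is still buffered
        }
    }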

Let me know if any external settings will improve my loading performance.

@costin
Member

costin commented Mar 28, 2013

@github4venkat There are two things to consider here:

  1. The code is currently still in its early days: the REST interface doesn't use the bulk loading endpoint yet, but it will shortly. Moreover, the load will be done in parallel, which should give similar if not better performance. (A sketch of the bulk endpoint follows this list.)
  2. Pig/Hive add significant overhead over a custom MR job. We do support dedicated input/output formats (they are what Hive and Pig use underneath).
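
For reference, a minimal sketch of what a call to the _bulk endpoint looks like, assuming an Elasticsearch node on localhost:9200 (the index/type names and documents are made up):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class BulkPost {
        public static void main(String[] args) throws Exception {
            // _bulk takes newline-delimited JSON: an action/metadata line
            // followed by the document source; the payload must end in \n
            String body =
                "{\"index\":{\"_index\":\"radio\",\"_type\":\"artists\"}}\n" +
                "{\"name\":\"doc one\"}\n" +
                "{\"index\":{\"_index\":\"radio\",\"_type\":\"artists\"}}\n" +
                "{\"name\":\"doc two\"}\n";

            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:9200/_bulk").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes("UTF-8"));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }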

Note that the main advantage of REST, and why that route was chosen, is having a small, stand-alone, no-dependency jar. That's why the Elasticsearch jar is not pulled in - we don't want any extra dependencies, since these tend to be tedious once deployed across multi-node clusters.

To conclude, stay tuned as the functionality matures over the next couple of weeks. Performance is a key element that I'm aiming for, but currently I'm focused on drafting the integration points (Hive/Pig in place, with Cascading following suit).

@github4venkat
Author

@costin Sure, I will look forward to bulk loading support in this space. Thanks!

@tzolov

tzolov commented Apr 22, 2013

@costin Do you plan to merge the cdh branch into master?
It would be quite handy: one could build against different Hadoop distributions/versions without manually tweaking the Gradle configuration.
Or perhaps there is a better way to do this with Gradle?

@costin
Member

costin commented Apr 24, 2013

At some point, yes. For a local build one could specify a prefix/profile, while in the build system we'll probably produce and upload different artifacts.

@costin
Member

costin commented Oct 23, 2013

Closing this, as we now publish a yarn binary (and have since before 1.3 M1).

@costin costin closed this as completed Oct 23, 2013
@nfx

nfx commented Jan 10, 2014

It's still an issue with CDH 4.5.0:

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.elasticsearch.hadoop.mr.ESOutputFormat.checkOutputSpecs(ESOutputFormat.java:163)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:987)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)

@costin
Member

costin commented Jan 10, 2014

Make sure you are using the 'yarn' artifact:
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/download.html

