incompatible with CDH4 YARN/Hadoop 2 #8
Comments
At first glance, I don't see any interface changes between CDH2 and CDH4; I was able to get CDH2 running with this package. checkOutputSpecs doesn't do much, so you can certainly download this project and make the method a no-op. Given how new this project is, I strongly recommend downloading and patching it yourself for the next week or two. The source code is quite simple.
Hi, CDH is just one of the distros we will be testing against to make sure we're compatible. Currently we're aiming just for vanilla Apache, since we're in the early days and things are still in flux. Cheers,
@github4venkat Btw, what version of CDH are you using: 4.1.x, 4.5.x? Is upgrading to the latest one an option?
@ALL Thanks for the replies. @costin I am using 4.1.x.
@github4venkat Great to hear it works for you. This is just an initial drop and we have a lot more goodies in store. The class incompatibilities in Hadoop 2.x/CDH 4 are quite unfortunate (they could have used a different package), and we'll probably have to create a separate branch and artifact/package for it since it's not backwards compatible. Will keep you posted.
@costin Yes, I understand this community is pretty new, but helpful! Sure, I will look forward to updates in this space. This came at a good time, as I was looking to write my own loader to bulk index data from HDFS to Elasticsearch.
@github4venkat Hi, I've tried to replicate the issue with little success. I assume you're using the YARN/Hadoop 2.x version instead of MR1/Hadoop 1.x? Note that support for Hadoop 2 (not CDH4 MR1) is problematic since it's not yet stable. I would not recommend using it, since it's simply a work in progress and many (if not all) of the projects within the Hadoop ecosystem, such as Pig or Hive, do not support it.
@github4venkat A quick update: I've added a new branch, called cdh, that allows the project to be compiled against CDH4 YARN/Hadoop 2.x (you can change the version to that of MRv1 as well). The project compiles cleanly on both versions; please confirm whether you still experience any issues. P.S. I've updated the issue title as well to better reflect the problem.
@costin Thanks for creating the branch. Meanwhile, I had written custom MR jobs to load data into Elasticsearch. I will post here once I have tested your branch on my Hadoop 2.x environment. Thanks.
@github4venkat I'd be interested to know what the difference is between your custom MR jobs and what we provide. What do we lack?
@costin I just ran a test loading a 54 MB file. With no additional settings in my Pig script, it takes 1.5 hours. I'm not sure whether the code hits REST for each event; if so, is there a setting that enables bulk loading into Elasticsearch? In my custom MapReduce job, the reducers bulk load using a bulk processor for Elasticsearch. Let me know if any external settings would improve my loading performance.
@github4venkat There are two things to consider here:
Note that the main advantage of REST, and the reason that route was chosen, is that it gives us a small, stand-alone, no-dependency jar. That's why the Elasticsearch jar is not pulled in: we don't want any extra dependencies, since these tend to be tedious to manage once deployed across multi-node clusters. To conclude, stay tuned as the functionality matures in the next couple of weeks. Performance is a key element that I'm aiming for, but currently I'm focused on drafting the integration points (Hive/Pig are in place, with Cascading following suit).
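For reference, a minimal sketch of what batching over REST could look like without pulling in the Elasticsearch jar, assuming the standard newline-delimited body that the Elasticsearch _bulk endpoint accepts. BulkBodySketch and build are hypothetical names, not part of this project; the sketch only builds the request body and does not send it, and depending on the Elasticsearch version a type may also need to be supplied in the action line or the URL.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of elasticsearch-hadoop.
final class BulkBodySketch {
    private BulkBodySketch() {}

    /**
     * Builds a newline-delimited _bulk request body: one "index" action line
     * followed by one source line per document, each terminated by '\n'.
     * Assumes every document is already a single-line JSON string and that
     * the index name needs no JSON escaping.
     */
    static String build(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line telling Elasticsearch to index the next line
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Source line: the document itself
            body.append(doc).append('\n');
        }
        return body.toString();
    }
}
```

Sending one such body per batch of events, rather than one request per event, is the general idea behind the bulk loading asked about above.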
@costin Sure, I will look forward to bulk loading support in this space. Thanks.
@costin do you plan to merge the |
At some point, yes. When building from source, one could specify a prefix/profile, while in the build system we'll probably produce and upload different artifacts.
Closing this as we publish a |
This is still an issue with CDH 4.5.0: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Make sure you are using the 'yarn' artifact: On Fri, Jan 10, 2014 at 5:58 PM, Serge Smertin notifications@github.com wrote:
Costin
Will you be releasing a version of elasticsearch-hadoop that compiles against Hadoop 2.x?
Or could someone help me make it work with that version?
Thanks,
Venkat
Pig Stack Trace
ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.elasticsearch.hadoop.mr.ESOutputFormat.checkOutputSpecs(ESOutputFormat.java:104)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:80)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
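The error above arises because org.apache.hadoop.mapreduce.JobContext is a class in Hadoop 1.x but an interface in Hadoop 2.x, so a jar compiled against one cannot link against the other. As a rough diagnostic, the sketch below uses plain reflection to report which flavor is on the classpath. JobContextCheck, isInterface and describe are hypothetical names introduced here for illustration; they are not part of Hadoop or elasticsearch-hadoop.

```java
import java.util.Optional;

// Hypothetical diagnostic helper; not part of Hadoop or elasticsearch-hadoop.
final class JobContextCheck {
    static final String JOB_CONTEXT = "org.apache.hadoop.mapreduce.JobContext";

    private JobContextCheck() {}

    /** Returns whether the named type is an interface, or empty if it cannot be loaded. */
    static Optional<Boolean> isInterface(String className) {
        try {
            // initialize=false: inspect the type without running its static initializers
            Class<?> type = Class.forName(className, false, JobContextCheck.class.getClassLoader());
            return Optional.of(type.isInterface());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    /** Describes which JobContext flavor, if any, is visible on the classpath. */
    static String describe() {
        return isInterface(JOB_CONTEXT)
                .map(i -> i ? "interface (Hadoop 2.x style API)" : "class (Hadoop 1.x style API)")
                .orElse("not on classpath");
    }
}
```

Calling JobContextCheck.describe() from a job's driver, with the same classpath the job uses, would show whether the Hadoop jars match the artifact that was built.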