Hive dependency required when not needed #165

Closed
drewinin opened this issue Mar 11, 2014 · 3 comments

@drewinin

We're on Hadoop 2.2.0 and have a project with many Hadoop jobs. As soon as I bring in the M2 build of the elasticsearch-hadoop dependency, existing unrelated jobs blow up.

Exception in thread "main" java.lang.VerifyError: class org.apache.hadoop.yarn.proto.YarnProtos$URLProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

This is because of a dependency convergence issue. The optional Hive dependency in elasticsearch-hadoop tries to bring in protobuf-java 2.4.1, whereas Hadoop 2.2.0 brings in 2.5.0. We're not using Hive at all, so I'd just as soon exclude the unneeded dependency like so:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>1.3.0.M2</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-service</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Yay, that unbreaks our existing jobs. However, jobs that use ESOutputFormat now break.

8:31:20.739 [Thread-12] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local272885569_0001
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/io/ShortWritable
at org.elasticsearch.hadoop.mr.WritableValueWriter.write(WritableValueWriter.java:71) ~[elasticsearch-hadoop-1.3.0.M2.jar:1.3.0.M2]
at org.elasticsearch.hadoop.mr.WritableValueWriter.write(WritableValueWriter.java:45) ~[elasticsearch-hadoop-1.3.0.M2.jar:1.3.0.M2]
at org.elasticsearch.hadoop.serialization.builder.ContentBuilder.value(ContentBuilder.java:258) ~[elasticsearch-hadoop-1.3.0.M2.jar:1.3.0.M2]
...
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.io.ShortWritable
at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_40]
at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_40]
at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_40]
at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_40]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.7.0_40]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_40]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.7.0_40]
... 13 common frames omitted

What's happening is that WritableValueWriter checks "instanceof ShortWritable" before "instanceof AbstractMapWritable". Class loading fails because the Hive artifact has been excluded. Since you have your own HiveValueWriter and almost all the Hive code is contained within org.elasticsearch.hadoop.hive, it looks out of place to have a Hive class loaded in WritableValueWriter.
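To illustrate the failure mode: a direct "instanceof" against a class from an optional dependency forces the JVM to load that class, which throws NoClassDefFoundError when the jar is absent. A minimal sketch of one way to guard such a check follows, resolving the class by name and treating a missing class as "not that type". The OptionalTypeCheck class and its isInstanceOf method are hypothetical names made up for this example; this is not the code used in elasticsearch-hadoop or the fix that was committed.

```java
import java.util.HashMap;

// Hypothetical helper for illustration only; not part of elasticsearch-hadoop.
class OptionalTypeCheck {

    // Returns true only if the named class can be loaded and value is an instance of it.
    static boolean isInstanceOf(Object value, String className) {
        if (value == null) {
            return false;
        }
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> type = Class.forName(className, false, OptionalTypeCheck.class.getClassLoader());
            return type.isInstance(value);
        } catch (ClassNotFoundException | LinkageError e) {
            // The optional dependency is missing or cannot be linked: treat as "not that type".
            return false;
        }
    }

    public static void main(String[] args) {
        Object value = new HashMap<String, String>();
        // A class that is always present resolves normally.
        System.out.println(isInstanceOf(value, "java.util.Map")); // prints "true"
        // An optional class: no error is thrown, whether or not the Hive jar is present.
        System.out.println(isInstanceOf(value, "org.apache.hadoop.hive.serde2.io.ShortWritable")); // prints "false"
    }
}
```

The trade-off is a reflective lookup on each call; a real implementation would likely cache the resolved Class (or the fact that it is missing) rather than calling Class.forName every time.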

@costin costin closed this as completed in d940e19 Mar 11, 2014
@costin
Member

costin commented Mar 11, 2014

Fixed in master. The Hive dependency issue is unfortunate - it might be resolved if we move to dedicated artifacts (one for MR, one for Cascading, one for Hive, etc.).
The check for ShortWritable was a mistake - ShortWritable is available in Hadoop 2 but not in Hadoop 1, and this import sneaked in.

The fix in master includes proper fallbacks for Hadoop 1 - please try it out and let us know whether it works for you or not.

Cheers!

@drewinin
Author

I pulled down the nightly and that did the trick. The optional dependencies have made resolving conflicts in my pom much trickier, so +1 for multiple artifacts.

Thanks a bunch Costin!

@costin
Member

costin commented Apr 1, 2014

For what it's worth, in the upcoming M3 (and already in master), in addition to the 'big' single jar, we also ship one jar per module. You can already download elasticsearch-hadoop-cascading-1.3.0.BUILD-SNAPSHOT with its own javadoc and sources jar.
As with the big jar, each module jar is usable stand-alone and doesn't require any other dependencies (everything, including the MR functionality, is nested in).

See #182
