java.lang.NoClassDefFoundError: org/apache/commons/httpclient/URIException #586

Closed
michaelbironneau opened this issue Oct 30, 2015 · 8 comments

Comments

@michaelbironneau

I can't get Hive to run an INSERT query into an external table stored in Elasticsearch. I'm using build elasticsearch-hadoop-2.2.0.BUILD-20151030.024731-132 with Elasticsearch 2.0 and Hive 1.2.1. The query is something like this:

add jar hdfs:///tmp/elasticsearch-hadoop.jar;
INSERT OVERWRITE TABLE [external table stored in Elasticsearch]
    SELECT
      [some columns]
    FROM [some table]

When I perform the INSERT query, I get the following exception:

Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:247)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
    ... 14 more
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/URIException
    at org.elasticsearch.hadoop.hive.HiveUtils.structObjectInspector(HiveUtils.java:57)
    at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:82)
    at org.elasticsearch.hadoop.hive.EsSerDe.initialize(EsSerDe.java:97)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:356)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:363)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
    ... 15 more

Is there a compatibility problem with this version of Hive, or do I need to include a different JAR?

Thanks

@costin
Member

costin commented Oct 30, 2015

First off, you can try out the latest 2.2.0 beta1. As for Hive, it looks like your runtime is missing some jars, in particular commons-httpclient, which should be present in both Hive and Hadoop.
For example, the lib directory of Hive 1.2.1 contains commons-httpclient-3.0.1. I'm not sure whether the Tez processor has a different runtime (it shouldn't).

@sasauz

sasauz commented Feb 17, 2016

@michaelbironneau Did you solve your problem?

@michaelbironneau
Author

@sasauz No, but I didn't spend much time on it, so it was probably something wrong in my setup. We ended up using Logstash.

@costin
Member

costin commented Mar 3, 2016

@sasauz Have you tried the solution indicated in the comment above?

@sasauz

sasauz commented Mar 4, 2016

@costin Thank you for your question. I have commons-httpclient-3.0.1.jar in hive/lib and commons-httpclient-3.1 in hadoop/lib, but it still doesn't work. I manually add commons-httpclient-3.1.jar (from hdfs://user/oozie/share/lib) and elasticsearch-hive-2.2.0.jar every time I run a query.

I now have 3 servers in my Hadoop cluster (Master1, Secondary Master2, and Slave1). Should I put commons-httpclient-3.1.jar in hadoop/lib and hive/lib on all 3 servers, or only on Slave1?

@costin
Member

costin commented Mar 4, 2016

The classpath needs to be the same on all nodes where a job is running. Unless instructed to add a jar from HDFS, Hive & co will not add it to the classpath.
But what distro are you using? commons-httpclient has been included with Hadoop from 0.20.x through the latest 2.7.x.

@sasauz

sasauz commented Mar 4, 2016

@costin Thank you for your answer. I will try it next week and give you feedback.
I use Hortonworks (HDP 2.3 deployed with Ambari) and run my queries with Cloudera Hue 3.9.

@ArcTheMaster

Hi @costin and @michaelbironneau,
I hit the same error when using Hive on Tez last week. I'm using Hive 2.0.1 and ES-Hadoop 2.3.

I think I found how to solve it. Before working with the connector, add the following jars in Hive:
hive (shfs3453)> add jar /opt/application/Hive/current/lib/elasticsearch-hadoop.jar;
add jar /opt/application/Hive/current/lib/elasticsearch-hadoop.jar
Added [/opt/application/Hive/current/lib/elasticsearch-hadoop.jar] to class path
Added resources: [/opt/application/Hive/current/lib/elasticsearch-hadoop.jar]
hive (shfs3453)> add jar /opt/application/Hadoop/current/share/hadoop/common/lib/commons-httpclient-3.1.jar;
add jar /opt/application/Hadoop/current/share/hadoop/common/lib/commons-httpclient-3.1.jar
Added [/opt/application/Hadoop/current/share/hadoop/common/lib/commons-httpclient-3.1.jar] to class path
Added resources: [/opt/application/Hadoop/current/share/hadoop/common/lib/commons-httpclient-3.1.jar]

You must adapt the path to the Hadoop lib directory to match your environment. Moreover, Hadoop must be installed on each datanode/worker node of your cluster; otherwise it will not work.

Anyway, it's strange to require this version of commons-httpclient instead of the Hive built-in version.
Both have the class:
[root@uabigspark01 ~]# strings -f /opt/application/Hadoop/current/share/hadoop/tools/lib/* |grep "org/apache/commons/httpclient/URIException"
/opt/application/Hadoop/current/share/hadoop/tools/lib/commons-httpclient-3.1.jar: org/apache/commons/httpclient/URIException.class
/opt/application/Hadoop/current/share/hadoop/tools/lib/commons-httpclient-3.1.jar: org/apache/commons/httpclient/URIException.classPK
[root@uabigspark01 ~]# strings -f /opt/application/Hive/current/lib/* |grep "org/apache/commons/httpclient/URIException"
/opt/application/Hive/current/lib/commons-httpclient-3.0.1.jar: org/apache/commons/httpclient/URIException.class
/opt/application/Hive/current/lib/commons-httpclient-3.0.1.jar: org/apache/commons/httpclient/URIException.classPK

Maybe this can be modified in ESHadoop? Or maybe a few lines can be added in the documentation to help people with Hive on Tez?
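
For anyone repeating the jar check on their own nodes, here is a minimal sketch that reads each jar's zip index instead of grepping raw bytes with strings. The helper name jars_containing is hypothetical (it is not part of ES-Hadoop, Hive, or Hadoop), and it assumes Python 3 is available on the node. It only reports which jars in a directory hold the class; it does not tell you whether that jar actually ends up on the Tez task classpath.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_dir):
    """Return the sorted paths of *.jar files in jar_dir that contain class_name.

    Hypothetical helper for illustration only.
    """
    # A class such as a.b.C is stored inside a jar as the zip entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are named .jar but are not valid archives.
            continue
    return matches
```

Calling jars_containing("org.apache.commons.httpclient.URIException", "/opt/application/Hive/current/lib") on each node should list the same jar everywhere if the lib directories are in sync; adapt the directory to your environment.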
