
Not able to insert data into ES using elasticsearch-hadoop-2.1.0.Beta4.jar #473

Closed
xiaoyi78 opened this issue Jun 12, 2015 · 7 comments

@xiaoyi78

I would appreciate it if you could look into this issue.

I'm trying to insert data from Hive into ES using elasticsearch-hadoop-2.1.0.Beta4.jar, but it isn't working. I was previously using elasticsearch-hadoop-2.0.2.jar, which didn't work either, so I downloaded elasticsearch-hadoop-2.1.0.Beta4.jar to try. I can create external tables and read data from ES without any issues, so I think the jar is in the right directory (in my case, I put the jar file in the /hive/lib folder).

My environment is:

Hadoop 2.0.0-cdh4.5.0
hive-common-0.10.0

My script is:

DROP TABLE IF EXISTS es_edge_v_security_price_rolling;

CREATE EXTERNAL TABLE es_edge_v_security_price_rolling (
    currency    STRING,
    data_date   STRING,
    price       DOUBLE,
    sls_sec_sec_num STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'system/service', 'es.nodes' = 'localhost:9200');

INSERT OVERWRITE TABLE dod.es_edge_v_security_price_rolling 
select e.currency, e.data_date, e.price, e.sls_sec_sec_num 
from dod.edge_v_security_price_rolling e limit 10;

Error message from the failed reduce task log:

java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:469)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:157)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:373)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:436)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:392)
at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.LimitOperator.initializeOp(LimitOperator.java:41)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:436)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:392)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150)
... 14 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:315)
... 25 more
2015-06-12 05:18:10,301 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

Please let me know if you need more information. Thanks.

@costin
Member

costin commented Jun 12, 2015

I edited your post to add some minor formatting; please do so yourself in the future.

@xiaoyi78 It looks like there's a problem with your Hadoop install.
The NPE stack trace is triggered by the Hadoop classes, in particular ReflectionUtils.setJobConf, which performs the following reflective check:

    //If JobConf and JobConfigurable are in classpath, AND
    //theObject is of type JobConfigurable AND
    //conf is of type JobConf then
    //invoke configure on theObject
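For readers less familiar with this pattern, the check described in those comments can be sketched in plain Java. This is a simplified, hypothetical illustration, not Hadoop's ReflectionUtils source: DemoConf, DemoConfigurable, DemoTask and DemoFailingTask are made-up stand-ins for JobConf, JobConfigurable and a reducer so that the sketch runs without Hadoop on the classpath (Java 7 or later assumed).

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins for JobConf, JobConfigurable and a reducer, so the sketch runs without Hadoop.
class DemoConf {}

interface DemoConfigurable {
    void configure(DemoConf conf);
}

class DemoTask implements DemoConfigurable {
    boolean configured;

    public void configure(DemoConf conf) {
        configured = true;
    }
}

class DemoFailingTask implements DemoConfigurable {
    public void configure(DemoConf conf) {
        throw new NullPointerException("simulated failure inside configure()");
    }
}

class ReflectiveConfigure {
    /**
     * Calls target.configure(conf) reflectively, but only if both types can be loaded,
     * target implements the configurable type and conf is an instance of the conf type.
     * Returns true if configure() was invoked, false if the check was skipped.
     */
    static boolean configureIfPossible(Object target, Object conf,
                                       String confClassName, String configurableClassName) {
        ClassLoader loader = ReflectiveConfigure.class.getClassLoader();
        Class<?> confClass;
        Class<?> configurableClass;
        try {
            // initialize=false: only look the classes up, do not run static initializers.
            confClass = Class.forName(confClassName, false, loader);
            configurableClass = Class.forName(configurableClassName, false, loader);
        } catch (ClassNotFoundException e) {
            return false; // one of the types is not on the classpath: skip silently
        }
        if (!configurableClass.isInstance(target) || !confClass.isInstance(conf)) {
            return false;
        }
        try {
            Method configure = configurableClass.getMethod("configure", confClass);
            configure.invoke(target, conf);
            return true;
        } catch (ReflectiveOperationException e) {
            // An exception thrown inside configure() arrives as an InvocationTargetException;
            // wrapping it reproduces the "Error in configuring object" chain seen in the log.
            throw new RuntimeException("Error in configuring object", e);
        }
    }

    public static void main(String[] args) {
        DemoTask task = new DemoTask();
        boolean invoked = configureIfPossible(task, new DemoConf(),
                DemoConf.class.getName(), DemoConfigurable.class.getName());
        System.out.println("invoked=" + invoked + ", configured=" + task.configured);

        try {
            configureIfPossible(new DemoFailingTask(), new DemoConf(),
                    DemoConf.class.getName(), DemoConfigurable.class.getName());
        } catch (RuntimeException e) {
            // e -> InvocationTargetException -> the original NullPointerException
            System.out.println(e.getMessage() + ", root cause: " + e.getCause().getCause());
        }
    }
}
```

The second call in main mirrors the shape of the reduce-task log: whatever fails inside configure() (here a simulated NullPointerException) surfaces as an InvocationTargetException wrapped in a RuntimeException, so the real cause is always at the bottom of the "Caused by" chain.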

This happens well before the connector kicks in and suggests that you might have an incomplete Hadoop/Hive configuration or classpath, or that you are using a mixture of Hadoop 1 and 2. Try a different Hadoop install just for testing, or upgrade your CDH install to the latest stable release (even if you stay on 4.5.x).

@xiaoyi78
Author

Thanks for your quick response and for pointing out that this is not related to the connector. We will investigate our configuration further. Cheers.

@xiaoyi78
Author

Sorry Costin, the error message I provided in my first post is from the 'syslog' tab, but when I looked at the 'stderr' tab, it shows an error message related to Elasticsearch. Please see below. Sorry, I still need to learn how to format the code block; please bear with me one more time. It seems to me I have a classpath issue, but I still don't understand what is wrong with my classpath: I can create an external table pointing to ES and read data from it, which would fail if my classpath were not set up properly. I'd appreciate it if you could shed some light...

The log block is below:

Continuing ...
java.lang.ClassNotFoundException: org.elasticsearch.hadoop.hive.EsHiveOutputFormat
Continuing ...
java.lang.NullPointerException
Continuing ...
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:315)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:436)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:392)
at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.LimitOperator.initializeOp(LimitOperator.java:41)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:436)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:392)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:469)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)

@xiaoyi78
Author

Sorry, I've reformatted the log block from my previous post.


@costin
Member

costin commented Jun 16, 2015

@xiaoyi78 The es-hadoop jar is not found on the classpath, hence the CNFE (ClassNotFoundException). Hive is notoriously hard to set up classpath-wise; it's best to double-check its documentation (here is one link).
One reliable configuration for the latest versions of Hive is setting HIVE_AUX_JARS_PATH, since the corresponding setting in hive.xml is apparently ignored by HiveServer2.

Note that this is not an es-hadoop bug: its classes are simply not found and thus cannot be loaded by Hive.
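To confirm that a CNFE like the one above really is a classpath problem, one can try to load the failing class from inside the JVM in question. A minimal sketch, assuming Java 7 or later: ClasspathProbe is a hypothetical helper, and its default class name is the one from the stderr log. The result only describes the JVM it runs in; the Hive client that creates and reads the table and the MapReduce task JVM that runs the INSERT can have different classpaths, so a positive result in one does not guarantee the other.

```java
// Hypothetical diagnostic helper; not part of es-hadoop, Hive or Hadoop.
class ClasspathProbe {
    /**
     * Returns true if the named class can be located and loaded by the given loader.
     * initialize=false avoids running the class's static initializers.
     */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is missing (the CNFE in the stderr log).
            // LinkageError (e.g. NoClassDefFoundError): found, but its superclass or an
            // interface it implements is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.elasticsearch.hadoop.hive.EsHiveOutputFormat";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathProbe.class.getClassLoader();
        }
        System.out.println(name + (isLoadable(name, loader) ? " is" : " is NOT") + " loadable from this JVM");
    }
}
```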

@costin
Member

costin commented Jun 16, 2015

@xiaoyi78 By the way, make sure you use only one version of es-hadoop. If you have multiple versions (1.3, 2.0 and 2.1) on the classpath, the classes are likely to trip over each other, as the runtime will pick classes from each jar and these are not compatible.
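One way to spot the mixed-version situation is to ask the class loader for every copy of a given class file. The sketch below relies on the standard ClassLoader.getResources call and assumes Java 8 or later; DuplicateClassFinder is a hypothetical helper, its default class name (EsStorageHandler, taken from the script in this issue) is just an illustrative probe, and it only inspects the classpath of the JVM it runs in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; not part of es-hadoop, Hive or Hadoop.
class DuplicateClassFinder {
    /** Converts a binary class name such as a.b.C into the resource path a/b/C.class. */
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every classpath location (jar or directory) that provides the class file. */
    static List<URL> locations(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.elasticsearch.hadoop.hive.EsStorageHandler";
        List<URL> found = locations(name, DuplicateClassFinder.class.getClassLoader());
        System.out.println(name + " found in " + found.size() + " location(s)");
        for (URL url : found) {
            System.out.println("  " + url);
        }
        if (found.size() > 1) {
            // Different classes of the library may then be resolved from different jars.
            System.out.println("WARNING: more than one copy on the classpath; keep a single es-hadoop jar");
        }
    }
}
```

Any class from the library works as the probe. Two or more URLs in the output (for example, jar: URLs pointing at two different elasticsearch-hadoop jars) indicate the mixed-version case described above.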

@xiaoyi78
Author

Thanks Costin. I finally figured out the issue. As you mentioned in another post, most issues of this type are caused by the classpath not being set up correctly. In my case, I had to add the es-hadoop jar file to another location, "CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop-0.20-mapreduce/lib", which is used by MapReduce jobs. This makes sense because the exception was thrown during the reduce task, which could not find the es-hadoop jar file.

So for anyone experiencing a similar issue: in addition to setting up the Hive classpath correctly, make sure the MapReduce job classpath includes the es-hadoop jar as well.

Thanks again for the support.
