elasticsearch and Hive integration on Yarn #393

Closed
abhisheksoni16 opened this issue Mar 12, 2015 · 7 comments

Comments

@abhisheksoni16

Hi,
I am using Apache Hadoop 2.6.0 and Hive 0.13. I have downloaded elasticsearch-hadoop-2.1.0.Beta3.zip and elasticsearch-1.4.4. While executing an INSERT query in Hive to insert data into Elasticsearch, I get the error below:

ERROR - Caused by: org.xml.sax.SAXParseException; systemId: file:///tmp/hadoop-root/.../job.xml; lineNumber: 665; columnNumber: 51; Character reference "&#0" is an invalid XML character.

Please help me solve the problem. Am I using the wrong version of a library?

I also went through issue #359 but didn't find a solution there.

Please help, thanks in advance.
Abhishek
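
Editor's sketch for the error above: XML 1.0 does not allow the character U+0000, so a character reference such as `&#0;` makes a document ill-formed and a conforming parser rejects it. The snippet below only illustrates that parser behavior using Python's standard library; it is not the Hadoop or es-hadoop code path, and the property name `example.key` and both documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses_ok(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical configuration snippets, shaped loosely like a job.xml entry.
good = (
    "<configuration><property><name>example.key</name>"
    "<value>abc</value></property></configuration>"
)
bad = (
    "<configuration><property><name>example.key</name>"
    "<value>&#0;</value></property></configuration>"
)

print(parses_ok(good))
print(parses_ok(bad))
```

If a job configuration value ends up containing a NUL character, the serialized job.xml would fail in a similar way when it is read back.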

@abhisheksoni16
Author

Hi, I went through #359 and changed my Hive query to insert only a single column. It works, but at the end I get the error below:

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"172.27.36.42"}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"172.27.36.42"}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]]
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:517)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:615)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]]
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:277)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:562)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:506)
... 13 more
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:303)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:287)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:291)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:118)
at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:367)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:84)
at org.elasticsearch.hadoop.hive.HiveUtils.init(HiveUtils.java:142)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat.getHiveRecordWriter(EsHiveOutputFormat.java:93)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat.getHiveRecordWriter(EsHiveOutputFormat.java:42)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:287)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:274)
... 15 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: Map: 7 Reduce: 1 Cumulative CPU: 9.22 sec HDFS Read: 1706351 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 9 seconds 220 msec

Please help me solve the problem.

@Vinaypandit

I would check a couple of things.

  1. Make sure that you are using elasticsearch-hadoop-2.1.0.BUILD-20150314.023416-331.jar. There was a bug that was fixed in master.
  2. Check your CREATE TABLE statement and make sure that 'es.nodes' refers to the host name on which Elasticsearch is running.
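
Editor's sketch for the second point: the helper below renders a Hive `CREATE EXTERNAL TABLE` statement with `es.nodes` set explicitly, so that the connector does not fall back to `localhost`. The storage handler class and the `es.resource`, `es.nodes`, and `es.port` settings come from elasticsearch-hadoop; the function name, table name, column, index/type, and host name are hypothetical and should be replaced with your own.

```python
def es_table_ddl(table, columns, resource, es_nodes, es_port=9200):
    """Render a Hive DDL statement for an Elasticsearch-backed table.

    columns is a list of (name, hive_type) pairs.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
        "TBLPROPERTIES(\n"
        f"  'es.resource' = '{resource}',\n"
        f"  'es.nodes' = '{es_nodes}',\n"
        f"  'es.port' = '{es_port}'\n"
        ");"
    )


# Hypothetical values, for illustration only.
ddl = es_table_ddl(
    table="access_logs_es",
    columns=[("client_ip", "STRING")],
    resource="logs/access",
    es_nodes="es-host.example.com",
)
print(ddl)
```

The host given in `es.nodes` has to be resolvable and reachable from the cluster nodes that run the Hive tasks, not only from the machine where the query is submitted.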

@abhisheksoni16
Author

Hi Vinaypandit,
Thank you very much for the reply. I am not able to find the jar you mentioned. It would be a great help if you could share a link to download it.

Thanks & Regards
Abhishek Soni

@mshirley

@nataliaking

I am wondering if anyone has resolved the above error. I am using Elasticsearch v1.1.2 and Hive 0.12.0-cdh5.1.3, with the most recent build, elasticsearch-hadoop-2.1.0.BUILD-20150426.023656-380.jar.

2015-04-26 05:39:02,081 ERROR [main] org.elasticsearch.hadoop.rest.NetworkClient: Node [Connection timed out] failed (HERE_GOES_ES_HOSTNAME:9200); no other nodes left - aborting...

@costin
Member

costin commented Apr 26, 2015

@nataliaking That indicates that the nodes where the Hive script is running cannot access Elasticsearch on port 9200. Make sure the Elasticsearch nodes have the REST interface enabled and the IP used is accessible from the Hive nodes.
Note the default is localhost:9200 which is likely not going to work when deployed in a cluster.
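
Editor's sketch: one way to check the reachability described above is to attempt a plain TCP connection to the Elasticsearch REST port from each node that runs Hive tasks. This is a minimal check, not part of elasticsearch-hadoop; the function name and the host in the usage comment are hypothetical. It only shows that the port accepts connections, not that the REST interface answers correctly.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hypothetical host; run this on a Hive/YARN worker node):
#   print(can_connect("es-host.example.com", 9200))
```

A `False` result from a worker node while the same check passes elsewhere points to a firewall, routing, or bind-address issue rather than to the Hive query.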

P.S. For questions, please use the mailing list, not the issue tracker. And please don't hijack existing reports.

Cheers,

@costin
Member

costin commented Apr 26, 2015

@abhisheksoni16 @Vinaypandit This issue was fixed in master some time ago; see issues #409 and #359. Note that unless the classpath is cleared, older versions without the fix might be loaded instead.
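
Editor's sketch: to spot stale connector jars like the ones mentioned above, a small script can list every `elasticsearch-hadoop*.jar` in the directories that feed the Hive and Hadoop classpath. Which directories to scan depends on the installation, so the paths in the usage comment are assumptions, and the function name is hypothetical.

```python
from pathlib import Path


def find_es_hadoop_jars(directories):
    """Return all elasticsearch-hadoop jars found directly in the given directories."""
    found = []
    for directory in directories:
        path = Path(directory)
        if path.is_dir():
            # Sorted so the output is stable from run to run.
            found.extend(sorted(path.glob("elasticsearch-hadoop*.jar")))
    return found


# Example usage (hypothetical paths; adjust to your installation):
#   for jar in find_es_hadoop_jars(["/usr/lib/hive/lib", "/usr/lib/hive/auxlib"]):
#       print(jar)
```

If more than one version shows up, removing the older jars leaves only the build that contains the fix to be loaded.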

Cheers,

@costin costin closed this as completed Apr 26, 2015
5 participants