Elastic search Hive integration issues #359

Closed
Vinaypandit opened this issue Jan 18, 2015 · 24 comments

Comments

@Vinaypandit

I am using the HDP 2.2 Sandbox and am trying to get the Elasticsearch Hive integration to work. I am using
elasticsearch-hadoop-2.0.2.jar to connect Hadoop and Elasticsearch.

The following Elasticsearch RPM has been installed:
elasticsearch-1.4.2-1.noarch

These are the Hadoop and Hive packages installed on my sandbox:
hadoop_2_2_0_0_2041-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-yarn-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-client-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-mapreduce-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-hdfs-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-libhdfs-2.6.0.2.2.0.0-2041.el6.x86_64
hive_2_2_0_0_2041-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-metastore-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-server-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-jdbc-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-webhcat-server-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-server2-0.14.0.2.2.0.0-2041.el6.noarch

I am trying to use this example: https://github.com/hortonworks/hadoop-tutorials/blob/master/Community/T07_Elasticsearch_Hadoop_Integration.md. When I try to insert data into the table, I get the following exception:

2015-01-17 21:26:00,834 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1421441970386_0016_000001
2015-01-17 21:26:01,135 FATAL [main] org.apache.hadoop.conf.Configuration: error parsing conf job.xml
org.xml.sax.SAXParseException; systemId: file:///hadoop/yarn/local/usercache/root/appcache/application_1421441970386_0016/container_1421441970386_0016_01_000001/job.xml; lineNumber: 846; columnNumber: 51; Character reference "&#0" is an invalid XML character.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2354)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2423)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2376)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2283)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1110)
at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:51)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1421)
2015-01-17 21:26:01,138 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster

I have seen similar messages on the web, but the cause there seems to be a clash of jars in the classpath. I have tried all the relevant solutions posted on the web for this issue, but I am unable to make any headway. Any help on this issue would be greatly appreciated.

Regards
Vinay Pandit
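
For readers unfamiliar with the error above: XML 1.0 does not allow the NUL character, not even as the character reference `&#0;`, so a configuration file containing one cannot be parsed. The following is a minimal sketch of that behaviour using Python's standard XML parser rather than the Xerces parser from the stack trace; the property name and values are hypothetical and only mimic the layout of a Hadoop job.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical property, laid out like an entry in a Hadoop configuration file.
TEMPLATE = (
    "<configuration><property>"
    "<name>example.prop</name><value>{}</value>"
    "</property></configuration>"
)


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


ok = parses(TEMPLATE.format("a,b"))      # ordinary value
bad = parses(TEMPLATE.format("a&#0;b"))  # value with a NUL character reference
print(ok, bad)  # prints "True False"
```

This only illustrates why the parser rejects the file; it says nothing about which component wrote the NUL character into job.xml.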

@aritrachatterjee15

This problem is reproducible with the CDH5.3 distribution as well.

Interestingly, when the query is reduced to a single column, the problem doesn't show up; it only occurs with multiple columns.

HDP 2.1 works fine.

It seems this is a problem with the Hive/Hadoop versions in CDH 5.3 and HDP 2.2.

@angusws

angusws commented Jan 18, 2015

I was also hit with this today. Looks like HIVE-8307? Based on comments there and on MongoDB's HADOOP-176 this is something that'll need to be fixed in the elasticsearch-hadoop project. I haven't looked at this codebase yet but will be investigating further tomorrow.

@costin
Member

costin commented Jan 18, 2015

@aritratony @angusws What version of es-hadoop are you using? 2.0.x, 2.1.x or master? I recommend giving the master a try since Hive 0.14 introduced some breaking changes which have been addressed in master.

@angusws Thanks for the pointers - we already did some filtering (as the bug first occurs in Hive 0.13, if I'm not mistaken); however, we should probably be inclusive rather than exclusive in the algorithm. I'll try to take a stab at it tomorrow and keep you updated.

@angusws

angusws commented Jan 18, 2015

@costin I tested against 2.0.2 and 2.1.0.beta3 (same issue with both). I'll test with master tomorrow morning and reply here, thanks.

@aritrachatterjee15

@costin Just tested with CDH5.3 and last night's es-hadoop SNAPSHOT. The problem still exists.

costin added a commit that referenced this issue Jan 19, 2015
@costin
Member

costin commented Jan 19, 2015

@aritratony Thanks for the confirmation.
@aritratony @angusws I've pushed out a new nightly build that does a better job at filtering job properties. Can you please try it out and report back?
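
As an illustration of the "inclusive rather than exclusive" filtering idea discussed above, here is a minimal sketch. It is not the code from the referenced commit: the function names, the `es.` prefix used as the allow-list, and the sample properties are all assumptions made for the example. The idea is to copy only properties that match an allow-list and whose values contain nothing XML 1.0 forbids, instead of trying to enumerate every problematic property.

```python
def is_xml_safe(value):
    """Return True if every character of value is legal in an XML 1.0 document."""
    for ch in value:
        cp = ord(ch)
        if cp in (0x9, 0xA, 0xD):
            continue  # tab, line feed, carriage return are allowed
        if 0x20 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF:
            continue
        return False  # e.g. NUL (0x0) or another control character
    return True


def filter_properties(props, allowed_prefixes=("es.",)):
    """Keep only allow-listed properties whose values are XML-safe.

    The "es." default prefix is an assumption for this sketch.
    """
    prefixes = tuple(allowed_prefixes)
    return {
        key: value
        for key, value in props.items()
        if key.startswith(prefixes) and is_xml_safe(value)
    }


# Hypothetical table properties, one of which carries a NUL character.
sample = {
    "es.resource": "index/type",
    "some.other.property": "first\x00second",
}
print(filter_properties(sample))  # prints "{'es.resource': 'index/type'}"
```

An allow-list fails closed: a property nobody anticipated is dropped rather than copied into the job configuration.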

@angusws

angusws commented Jan 19, 2015

@costin Thanks for such a quick turnaround. Pleased to confirm that 37b4989 fixes the problem for me.

@costin
Member

costin commented Jan 19, 2015

@angusws Thanks for the fast response! Glad to hear it's working - cheers!

costin added a commit that referenced this issue Jan 19, 2015
relates to #359
(cherry picked from commit 37b4989)
(cherry picked from commit 61830e4)
(cherry picked from commit cc0883d)
costin added a commit that referenced this issue Jan 19, 2015
relates to #359
(cherry picked from commit 37b4989)
(cherry picked from commit 61830e4)
(cherry picked from commit cc0883d)
@aritrachatterjee15

Tested with CDH5.3 and HDP2.2. Works for both. @costin Thanks for the quick turnaround :)

@costin
Member

costin commented Jan 19, 2015

Thank you for the fast feedback - makes fixing issues so much easier.

costin added a commit that referenced this issue Jan 23, 2015
@logic4fun

Can you point me to the EsStorageHandler.jar that has this fix? I have tried 2.0.2 and 2.1.0.Beta3; neither of them has the fix.

@costin
Member

costin commented Feb 27, 2015

It's the master or dev or snapshot build, see this section of the docs.

@logic4fun

Sorry, just checking: is the 2.1.0.BUILD-SNAPSHOT available as a standalone zip?

@logic4fun

I think it is in the link you pointed to. Let me use it and see.

@abhisheksoni16

Hi, can anyone please provide a link to the elasticsearch-hadoop*.zip file that has the fix for the above error? I am facing the same issue with Apache Hive 0.14.

ERROR - Caused by: org.xml.sax.SAXParseException; systemId: file:///tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1426155298378_0002/container_1426155298378_0002_01_000001/job.xml; lineNumber: 665; columnNumber: 51; Character reference "&#0" is an invalid XML character.

@aromeyer

Hi,
The daily build snapshots are available at https://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch-hadoop
The bug is fixed for us when using, for example, elasticsearch-hadoop-2.0.3.BUILD-20150321.030143-147.zip with Hive 0.13.1.
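
Snapshot file names like the one above embed a build timestamp, which makes it easy to tell whether a given build is newer than the nightly in which the fix was first reported (Jan 19, 2015, per this thread). The following is a minimal sketch, assuming the `BUILD-YYYYMMDD.HHMMSS-N` naming pattern seen here holds in general; the function name and the regular expression are hypothetical, not part of any es-hadoop tooling.

```python
import re
from datetime import datetime

# Assumed pattern, inferred from the file names quoted in this thread.
SNAPSHOT_RE = re.compile(
    r"elasticsearch-hadoop-(?P<version>\d+\.\d+\.\d+)"
    r"\.BUILD-(?P<stamp>\d{8}\.\d{6})-(?P<build>\d+)\.(?:jar|zip)$"
)

# Date of the first nightly build reported in this thread to contain the fix.
FIX_NIGHTLY = datetime(2015, 1, 19)


def parse_snapshot_name(filename):
    """Return (version, build_time, build_number), or None if the name does not match."""
    match = SNAPSHOT_RE.search(filename)
    if match is None:
        return None
    build_time = datetime.strptime(match.group("stamp"), "%Y%m%d.%H%M%S")
    return match.group("version"), build_time, int(match.group("build"))


info = parse_snapshot_name("elasticsearch-hadoop-2.0.3.BUILD-20150321.030143-147.zip")
version, build_time, build_number = info
print(version, build_time.isoformat(), build_number)
print(build_time > FIX_NIGHTLY)  # prints "True"
```

A newer timestamp alone does not guarantee the fix is in effect: as later comments show, an older jar elsewhere on the classpath can still shadow the new one.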

@ManikandanV

Hi,
I am facing the same issue on Hive 1.0 with elasticsearch-hadoop-2.0.2.jar.

@costin
Member

costin commented Mar 30, 2015

@ManikandanV and everyone else on this thread, the issue has been fixed in the dev builds on both the 2.0.x and 2.1.x branches, which you can get from Maven.
This is described in the docs for 2.0.x and 2.1.x.

If you are not using a version that contains the fix, you will hit this issue; the solution is to upgrade.

@costin costin closed this as completed Mar 30, 2015
@ManikandanV

Hi @costin, thanks for your update. Let me clarify the versions here. I am using the configuration below.

Hadoop - 2.6.0
Hive - 0.14.0
elasticsearch-hadoop-2.0.3.BUILD-20150327.030156-153.jar

Even with the above configuration I am getting the invalid XML character error ("&#0"). Can you please tell me which version of elasticsearch-hadoop-*.jar has the fix for this error?

@costin
Member

costin commented Apr 1, 2015

@ManikandanV I'm afraid I can't reproduce your problem. The fix has been in master and 2.x branch for quite some time and it has been confirmed to work by others as you can see from this thread.

Can you double check that the same jar is available on all your nodes? Make sure it's the only version of es-hadoop and that no other version is available either within the Hadoop or Hive classpath; you can verify this by hand or by looking at the hadoop classpath within the console.
Can you try your script locally vs on a remote Hive cluster?
Additionally, can you please turn on logging all the way (see the chapter in the reference docs).
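
The classpath check suggested above can be partly automated. The following is a minimal sketch, assuming the classpath is available as a single string (for example, the output of the `hadoop classpath` command) and that es-hadoop jars have file names starting with `elasticsearch-hadoop`; the function name is hypothetical. It expands `dir/*` wildcard entries and reports every matching jar, so more than one hit points to conflicting versions.

```python
import glob
import os


def find_es_hadoop_jars(classpath, sep=os.pathsep):
    """Return the sorted paths of elasticsearch-hadoop jars reachable from a classpath string."""
    hits = set()
    for entry in filter(None, classpath.split(sep)):
        if entry.endswith("*"):
            # A Java-style wildcard entry "dir/*" pulls in the jars directly inside dir.
            candidates = glob.glob(entry + ".jar")
        elif os.path.isfile(entry):
            candidates = [entry]
        else:
            candidates = []  # plain directories contribute classes, not jars
        for path in candidates:
            if os.path.basename(path).startswith("elasticsearch-hadoop"):
                hits.add(os.path.abspath(path))
    return sorted(hits)
```

It would need to be run on every node, and separately against the Hive library directories, since a stray jar on a single machine is enough to reproduce the error.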

@costin
Member

costin commented Apr 1, 2015

@ManikandanV Please see issue #409. Most likely you have a different (older) version of es-hadoop somewhere in your classpath. Make sure you get rid of it (including from the Hadoop and Hive libraries); you can confirm it is gone by invoking the es-hadoop connector and getting a NoClassDefFound error. Once you are positive the classpath is clear, use the latest es-hadoop dev build.

Cheers,

@ManikandanV

Hi @costin, I followed the steps mentioned in #409, and it's working for me now. Thanks a lot for your help. I am now using the configuration below.

Hadoop - 2.6.0
Hive - 1.0
elasticsearch-hadoop-2.0.3.BUILD-20150327.030156-153.jar

@daksh-talwar

Hello,

Did these changes make it into any stable build of es-hadoop, i.e. one of those found at https://www.elastic.co/downloads/past-releases ?

@jbaiera
Member

jbaiera commented May 14, 2018

These changes are marked as being released in version 2.0.3.
