Elastic search Hive integration issues #359
Comments
This problem is also reproducible with the CDH 5.3 distribution. Interestingly, when the query is reduced to a single column the problem doesn't show up; selecting multiple columns triggers it. HDP 2.1 works fine, so this seems to be a problem with the Hive/Hadoop versions in CDH 5.3 and HDP 2.2.
I was also hit with this today. Looks like HIVE-8307? Based on comments there and on MongoDB's HADOOP-176 this is something that'll need to be fixed in the elasticsearch-hadoop project. I haven't looked at this codebase yet but will be investigating further tomorrow.
@aritratony @angusws What version of es-hadoop are you using? 2.0.x, 2.1.x or master? I recommend giving master a try since Hive 0.14 introduced some breaking changes which have been addressed in master. @angusws Thanks for the pointers - we already did some filtering (as the bug first occurs in Hive 0.13, if I'm not mistaken); however, we should probably be inclusive rather than exclusive in the algorithm. I'll try to take a stab at it tomorrow and keep you updated.
@costin I tested against 2.0.2 and 2.1.0.beta3 (same issue with both). I'll test with master tomorrow morning and reply here, thanks.
@costin Just tested with CDH5.3 and last night's es-hadoop SNAPSHOT. The problem still exists.
@aritratony Thanks for the confirmation.
@angusws Thanks for the fast response! Glad to hear it's working - cheers!
Tested with CDH5.3 and HDP2.2. Works for both. @costin Thanks for the quick turnaround :)
Thank you for the fast feedback - makes fixing issues so much easier.
Can you point me to the EsStorageHandler.jar that has this fix? I have tried 2.0.2 and 2.1.0.Beta3; neither of them has the fix.
It's the master or dev or snapshot build, see this section of the docs.
Sorry, just checking: is the 2.1.0.BUILD-SNAPSHOT available as a standalone zip?
I think it is in the link you pointed to. Let me use it and see.
Hi, can anyone please provide a link to the elasticsearch-hadoop*.zip file that has the fix for the above error? I am facing the same issue with Apache Hive 0.14. ERROR - Caused by: org.xml.sax.SAXParseException; systemId: file:///tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1426155298378_0002/container_1426155298378_0002_01_000001/job.xml; lineNumber: 665; columnNumber: 51; Character reference "�" is an invalid XML character.
@ManikandanV and everyone else on this thread, the issue has been fixed in the dev builds of both the 2.0.x and 2.1.x branches, which you can get from Maven. If you are not using a version that includes the fix, you will hit this issue; the solution is to upgrade.
Hi @costin, thanks for your update. Let me clarify the version here. I am using the configuration below. Hadoop - 2.6.0 Even with the above config I am getting the invalid XML character error ("�"). Can you please tell me which version of elasticsearch-hadoop-*.jar has the fix for this error?
@ManikandanV I'm afraid I can't reproduce your problem. The fix has been in master and 2.x branch for quite some time and it has been confirmed to work by others as you can see from this thread. Can you double check that the same jar is available on all your nodes? Make sure it's the only version of es-hadoop and that no other version is available either within the Hadoop or Hive classpath; you can verify this by hand or by looking at the hadoop classpath within the console.
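For anyone who would rather script the classpath check than do it by hand, here is a minimal Python sketch. It is not part of es-hadoop; the function name `find_es_hadoop_jars` is hypothetical, and the jar naming pattern (`elasticsearch-hadoop*.jar`) is assumed from the file names mentioned in this thread. It only looks at jars listed directly and at wildcard entries such as `/some/lib/*`, so jars picked up through other mechanisms would need a separate check.

```python
import glob
import os
import re

# Jar names such as elasticsearch-hadoop-2.0.2.jar (pattern assumed from this thread).
ES_HADOOP_JAR = re.compile(r"^elasticsearch-hadoop.*\.jar$")


def find_es_hadoop_jars(classpath):
    """Return the sorted es-hadoop jar paths found in a classpath string.

    Hypothetical helper: handles jars listed directly and wildcard
    entries ending in '*'; plain directory entries are skipped.
    """
    found = set()
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        if entry.endswith("*"):
            # Expand wildcard entries like /usr/lib/hive/lib/*
            candidates = glob.glob(entry)
        else:
            candidates = [entry]
        for path in candidates:
            if ES_HADOOP_JAR.match(os.path.basename(path)):
                found.add(os.path.normpath(path))
    return sorted(found)
```

One way to use it would be to pass in the output of `hadoop classpath` on each node: more than one result, or a result whose version is not the one you expect, points to a stale jar.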
@ManikandanV Please see issue #409. Most likely you have a different (older) version of es-hadoop somewhere in your classpath. Make sure you get rid of it (including the Hadoop and Hive libraries) - you can check this by trying to invoke the es-hadoop connector and getting a Cheers,
Hello, Did these changes make it to any stable build for ES - i.e. ones found here https://www.elastic.co/downloads/past-releases ? |
These changes are marked as being released in version 2.0.3 |
I am using the HDP 2.2 Sandbox and am trying to get the Elasticsearch Hive integration to work. I am using
elasticsearch-hadoop-2.0.2.jar to talk to Hadoop and Elasticsearch.
The following Elasticsearch RPM has been installed:
elasticsearch-1.4.2-1.noarch
These are the Hadoop and Hive packages installed on my sandbox:
hadoop_2_2_0_0_2041-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-yarn-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-client-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-mapreduce-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-hdfs-2.6.0.2.2.0.0-2041.el6.x86_64
hadoop_2_2_0_0_2041-libhdfs-2.6.0.2.2.0.0-2041.el6.x86_64
hive_2_2_0_0_2041-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-metastore-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-server-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-jdbc-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-webhcat-server-0.14.0.2.2.0.0-2041.el6.noarch
hive_2_2_0_0_2041-server2-0.14.0.2.2.0.0-2041.el6.noarch
I am following the example at https://github.com/hortonworks/hadoop-tutorials/blob/master/Community/T07_Elasticsearch_Hadoop_Integration.md and when I try to insert data into the table I get the following exception:
2015-01-17 21:26:00,834 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1421441970386_0016_000001
2015-01-17 21:26:01,135 FATAL [main] org.apache.hadoop.conf.Configuration: error parsing conf job.xml
org.xml.sax.SAXParseException; systemId: file:///hadoop/yarn/local/usercache/root/appcache/application_1421441970386_0016/container_1421441970386_0016_01_000001/job.xml; lineNumber: 846; columnNumber: 51; Character reference "�" is an invalid XML character.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2354)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2423)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2376)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2283)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1110)
at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:51)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1421)
2015-01-17 21:26:01,138 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
I have seen similar messages on the web, where the cause seems to be a clash of jars on the classpath. I have tried all the relevant solutions posted on the web for this issue, but am unable to make any headway. Any help on this issue would be greatly appreciated.
Regards
Vinay Pandit
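To see which property in job.xml trips the parser, one option is to scan the file for numeric character references that XML 1.0 does not allow. The Python sketch below is only an illustration under that assumption: the helper names are hypothetical, it does not assume which character was in the original error message, and the line and column numbers it reports (1-based, start of the reference) may not line up exactly with the ones Xerces prints.

```python
import re

# Numeric character references, decimal (&#NN;) or hexadecimal (&#xNN;).
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def is_valid_xml10_char(code_point):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def find_invalid_char_refs(xml_text):
    """Return (line, column, reference) for each disallowed character reference."""
    hits = []
    for line_number, line in enumerate(xml_text.split("\n"), start=1):
        for match in CHAR_REF.finditer(line):
            digits = match.group(1)
            if digits.startswith("x"):
                code_point = int(digits[1:], 16)
            else:
                code_point = int(digits)
            if not is_valid_xml10_char(code_point):
                hits.append((line_number, match.start() + 1, match.group(0)))
    return hits
```

Running `find_invalid_char_refs` on the text of the job.xml named in the exception and reading the surrounding `<property>` element should show which configuration value carries the offending character.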