Hive Column Comments causing Hive Query to fail #322
Comments
@sagarnarla I've tested the fix only on local setups; on remote ones it might not work properly. Note that this is a workaround for a bug in Hadoop, so ideally Hadoop would address this at its core. I've upgraded master to Hive 0.14 and pushed a new 2.1.0.BUILD-SNAPSHOT to the repo. Can you please try it out and see whether it fixes your issue? What distro are you using specifically?
Works! Thanks. FYI: I am using HDP 2.1.6.
@sagarnarla I'm glad to hear master fixes the issue. Are you by any chance using Windows, either as a client or to run the cluster?
All nodes are on Windows. When will we see the fix in a stable release, and in which version?
Most likely that is causing the problem; I haven't encountered this on *nix systems. The fix is currently scheduled for 2.1.0 only. Hive 0.14 has introduced a lot of internal incompatibilities (including snapshot dependencies which now cannot be found), so I'm reluctant to add Hive 0.14 support to the 2.0.x branch. Thanks
Hello,
@jbaptisteGH I'm afraid there's not much we can do about it. This is a bug in Hive and es-hadoop tries to work around it; apparently it doesn't succeed all the time. I've noticed the bug occurs if the client or the server is on Windows. If that's not the case, then it might be that Hive 0.14/HDP 2.2 made this worse.
I am trying to make the changes you made in the master branch in However the issue does not seem to go away. Is there any other change that I need to make?
I am using Hive 0.13.0.2.1.6.0-2103. I am creating a simple two-column table and inserting one record into it.
Query:
CREATE TABLE IF NOT EXISTS testinsertion (
id STRING,
name STRING
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.resource' = 'data/shouldwork',
'es.nodes' = '172.16.0.14,172.16.0.15,172.16.0.16',
'es.hive.disable.columns.comments.fix' = '0'
);
INSERT INTO TABLE testinsertion
SELECT "id1","myname" from hivesampletable limit 1;
Unfortunately, the job.xml that gets generated for the Hive query has this line in it, which contains an illegal XML character sequence:
<property><name>columns.comments</name><value>�</value><source>programatically</source></property>
This leads to invalid XML, which in turn causes the Hive query to fail.
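To illustrate why a single unreadable character is enough to break the whole file, here is a minimal sketch that checks a property value against the XML 1.0 character rules and then tries to parse the resulting element. It assumes the offending value is a control character such as NUL (the actual byte is not recoverable from the output above); `illegal_xml_chars`, `is_well_formed`, and `property_xml` are hypothetical helper names, not part of Hive, Hadoop, or es-hadoop.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production: most control characters,
# surrogates, and U+FFFE/U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def illegal_xml_chars(value):
    """Return the characters in value that XML 1.0 does not allow."""
    return _ILLEGAL_XML_CHARS.findall(value)


def property_xml(name, value):
    """Build a job.xml-style property element without any escaping (illustration only)."""
    return "<property><name>%s</name><value>%s</value></property>" % (name, value)


def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    suspect = "\x00"  # assumption: stands in for the unreadable character above
    print(illegal_xml_chars(suspect))
    print(is_well_formed(property_xml("columns.comments", suspect)))
```

Running the same check over each value in a generated job.xml is one way to confirm which property is responsible before changing any table settings.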
I noticed there is what I assume is a fix for this issue in the fixHive13InvalidComments method. This method should clear out or remove the columns.comments property altogether. Unfortunately, it does not seem to be applied. Setting es.hive.disable.columns.comments.fix does not seem to have any effect.
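For reference, the behaviour described above (dropping columns.comments when it cannot be serialized) can be sketched as follows. This is only an illustration of the idea over a plain dictionary, not the actual fixHive13InvalidComments implementation; `drop_invalid_comments` is a hypothetical name, and the NUL value is an assumption about what the property contains.

```python
import re

# Characters that XML 1.0 does not allow in document content.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def drop_invalid_comments(props, key="columns.comments"):
    """Return a copy of props without `key` if its value contains illegal XML characters."""
    cleaned = dict(props)
    value = cleaned.get(key)
    if value is not None and _ILLEGAL_XML_CHARS.search(value):
        del cleaned[key]
    return cleaned


if __name__ == "__main__":
    # Assumption: the table has two columns and the comments value is a lone NUL.
    table_props = {"columns": "id,name", "columns.comments": "\x00"}
    print(drop_invalid_comments(table_props))
```

If the property still shows up in job.xml after such a step, the cleanup is probably running on a different copy of the configuration than the one that gets written out, which would match the symptom reported here.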