Hive Column Comments causing Hive Query to fail #322

Closed
sagarnarla opened this issue Nov 13, 2014 · 8 comments

Comments

@sagarnarla

I am using Hive 0.13.0.2.1.6.0-2103. I am creating a simple two-column table and inserting one record into it.
Query:
CREATE TABLE IF NOT EXISTS testinsertion (
id STRING,
name STRING
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.resource' = 'data/shouldwork',
'es.nodes' = '172.16.0.14,172.16.0.15,172.16.0.16',
'es.hive.disable.columns.comments.fix' = '0'
);

INSERT INTO TABLE testinsertion
SELECT "id1","myname" from hivesampletable limit 1;

Unfortunately, the Job.XML generated for the Hive query contains this line, which includes an illegal XML character reference:
<property><name>columns.comments</name><value>&#0;</value><source>programatically</source></property>

This makes the XML invalid, which in turn causes the Hive query to fail.
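To illustrate why that line breaks the job file, here is a minimal sketch using only the JDK's built-in XML parser. The class and method names (`NulCharRefDemo`, `isWellFormed`) are made up for this example and are not part of Hive or es-hadoop. It assumes a standard JAXP parser, which should reject `&#0;` because XML 1.0 does not allow the NUL character, even as a character reference.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Hive or es-hadoop.
public class NulCharRefDemo {

    // Returns true if the given string parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document.
            return false;
        } catch (Exception e) {
            // Parser setup or I/O problems are not what this demo is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The property exactly as it appears in the generated job file.
        String broken = "<property><name>columns.comments</name>"
                + "<value>&#0;</value><source>programatically</source></property>";
        // The same property with an empty value.
        String cleared = "<property><name>columns.comments</name>"
                + "<value></value><source>programatically</source></property>";

        System.out.println("with &#0;:   " + isWellFormed(broken));
        System.out.println("empty value: " + isWellFormed(cleared));
    }
}
```

Any component that reads the job file with a conforming XML parser would hit the same error, regardless of what the rest of the file contains.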

I noticed what I assume is a fix for this issue in the fixHive13InvalidComments method. This method should clear or remove the columns.comments property altogether. Unfortunately, it does not seem to be applied. Setting es.hive.disable.columns.comments.fix does not seem to have any effect.
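For readers unfamiliar with the idea, this is a rough sketch of what "clearing the property" could look like. It is not the actual fixHive13InvalidComments code from es-hadoop: the class `ColumnCommentsSanitizer` and its methods are hypothetical, and `java.util.Properties` stands in for the real Hive/Hadoop configuration objects so the example stays self-contained. Only the property name `columns.comments` comes from the job file above.

```java
import java.util.Properties;

// Hypothetical helper, not the es-hadoop implementation.
public class ColumnCommentsSanitizer {

    static final String COLUMN_COMMENTS = "columns.comments";

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces columns.comments with an empty string if its value contains
    // any character that XML 1.0 forbids. Returns true if the value was changed.
    static boolean clearInvalidComments(Properties props) {
        String value = props.getProperty(COLUMN_COMMENTS);
        if (value == null) {
            return false;
        }
        boolean illegal = value.codePoints().anyMatch(cp -> !isLegalXmlChar(cp));
        if (illegal) {
            props.setProperty(COLUMN_COMMENTS, "");
        }
        return illegal;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // A lone NUL character, which serializes to &#0; in the job file.
        props.setProperty(COLUMN_COMMENTS, "\0");
        boolean changed = clearInvalidComments(props);
        System.out.println("changed: " + changed);
    }
}
```

A cleanup like this only helps if it runs before the configuration is serialized to XML and read back, so the point at which it is invoked matters as much as the check itself.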

@costin
Member

costin commented Dec 5, 2014

@sagarnarla I've tested the fix only on local setups - on remote ones it might not work properly. Note that this is a workaround for a bug in Hadoop, so ideally Hadoop would address this at its core. I've upgraded master to Hive 0.14 and pushed a new 2.1.0.BUILD-SNAPSHOT to the repo - can you please try it out and see whether it fixes your issue?

What distro are you using specifically?

@sagarnarla sagarnarla reopened this Dec 5, 2014
@sagarnarla
Author

Works! Thanks

FYI: I am using HDP 2.1.6

@costin
Member

costin commented Dec 9, 2014

@sagarnarla I'm glad to hear master fixes the issue. Are you by any chance using Windows, either as a client or to run the cluster?

@sagarnarla
Author

All nodes are on Windows.

When will we see the fix in a stable release, and in which version?

@costin
Member

costin commented Dec 9, 2014

Most likely that is causing the problem - I haven't encountered this on *nix systems. The fix is currently scheduled for 2.1.0 only - Hive 0.14 has introduced a lot of internal incompatibilities (including snapshot dependencies which now cannot be found), so I'm reluctant to add Hive 0.14 support to the 2.0.x branch.

Thanks

@jbaptisteGH

Hello,
This issue is back with HDP 2.2 / Hive 0.14 on Linux, even with the "es.hive.disable.columns.comments.fix" option.
(I tried with both the master and 2.1 Beta serdes)

@costin
Member

costin commented Jan 5, 2015

@jbaptisteGH I'm afraid there's not much we can do about it. This is a bug in Hive and es-hadoop tries to work around it - apparently it doesn't succeed all the time. I've noticed the bug occurs if the client or the server is on Windows - if that's not the case then it might be that Hive 0.14/HDP 2.2 made this worse.
I'd like to help but again, the Hive configuration shouldn't use an illegal XML character in the first place; es-hadoop tries to remove it, but if the configuration is read/parsed before es-hadoop kicks in, the error will appear. And that's outside es-hadoop's control: es-hadoop is a library, as opposed to Hive, which is a runtime.

@sagarnarla
Author

I am trying to port the changes you made to fixHive13InvalidComments in the master branch over to the 2.0 branch (56928e6#diff-f7e8fbf5f42badb596e0cdbbfbfdca08).

However, the issue does not seem to go away. Is there any other change that I need to make?
