Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue while joining two tables stored on Elasticsearch using HiveQL #180

Closed
skhuntia opened this issue Mar 28, 2014 · 13 comments
Closed

Issue while joining two tables stored on Elasticsearch using HiveQL #180

skhuntia opened this issue Mar 28, 2014 · 13 comments

Comments

@skhuntia
Copy link

I tried to join two tables but it doesnt work.
The query didnt fail as such, but in result I got "no data available".

I am using "elasticsearch-hadoop-1.3.0.M2.jar" and "elasticsearch-1.0.0".

This is how I created two hive external tables

CREATE EXTERNAL TABLE activity(description string,username string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '/test/activity/',
'es.host' = '10.309.500.163',
'es.port' = '9200');

CREATE EXTERNAL TABLE user(username string,ethnicity string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '/test/user/',
'es.host' = '10.309.500.163',
'es.port' = '9200');

and the query that I ran is:

SELECT activity.,user. FROM activity JOIN user ON (activity.username = user.username)

The result was "no data available"
That means somehow the join condition failed.
But The "username" column/field in both tables has a lot of common records.

I dont understand why I am not getting the desired result. Am I doing anything wrong?

Thanks

@costin
Copy link
Member

costin commented Mar 28, 2014

This has been fixed in master - please use that instead.

@costin costin closed this as completed Mar 28, 2014
@costin costin added doc and removed doc labels Mar 30, 2014
@skhuntia
Copy link
Author

skhuntia commented Apr 1, 2014

Thanks for the reply,
Like you said I used the master.
But still it doesn't work for me.
When I try to join the tables this is the error that I got.

Job Submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Field(s) [[block__offset__inside__file, input__file__name]] not found in the Elasticsearch mapping specified; did you mean []?)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

I have tried it with both ES1.0.0 and ES1.0.0RC2

@costin
Copy link
Member

costin commented Apr 1, 2014

This is a regression available in master which I'm investigating.
Can you please open up a separate issue next time?

Thanks!

@costin
Copy link
Member

costin commented Apr 1, 2014

@skhuntia scratch the 'open up a new issue' - you already did that, I confused the issue.

@costin costin reopened this Apr 1, 2014
@costin costin removed the duplicate label Apr 1, 2014
@skhuntia
Copy link
Author

skhuntia commented Apr 2, 2014

Thanks for the info.
I will be waiting for this issue to be fixed.

@costin costin closed this as completed in 9db0317 Apr 3, 2014
@skhuntia
Copy link
Author

skhuntia commented Apr 7, 2014

I tried it again, but this time I got the below exception.

Exception in thread "main" java.lang.NoSuchFieldError: VIRTUAL_COLUMNS
at org.elasticsearch.hadoop.hive.HiveUtils.columnToAlias(HiveUtils.java:80)
at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:106)
at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:51)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:388)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:516)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:496)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1474)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:270)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

@costin
Copy link
Member

costin commented Apr 7, 2014

What Hive version are you using?

@skhuntia
Copy link
Author

skhuntia commented Apr 7, 2014

I am using hive- 0.10.0
cloudera dist 4.4.0

@costin
Copy link
Member

costin commented Apr 7, 2014

I've pushed a fix for it and triggered a nightly build - the new jars should be ready in 10'. You can build the sources in the meantime if you want.
I'll ping you once the build completes so you can try it out.

Cheers,

@skhuntia
Copy link
Author

skhuntia commented Apr 7, 2014

Thanks a lot Costin.

@costin
Copy link
Member

costin commented Apr 7, 2014

Can you try now to the latest snapshot - 20140407.114803-90? Let me know how it goes.

@skhuntia
Copy link
Author

skhuntia commented Apr 7, 2014

This works!!!
I am able to join two tables and all other operations.

Thanks

@costin
Copy link
Member

costin commented Apr 7, 2014

Great - thanks for confirming!

costin added a commit that referenced this issue Apr 8, 2014
Filter out Hive 'virtual' columns
Filter out Elasticsearch built-in types
Change validation default to warn to cope with cases where the schema
is incomplete
Fix #180
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants