Issue while joining two tables stored on Elasticsearch using HiveQL #180
Comments
This has been fixed in master. Please use that instead.
Thanks for the reply. Job submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Field(s) [[block__offset__inside__file, input__file__name]] not found in the Elasticsearch mapping specified; did you mean []?)'. I have tried it with both ES1.0.0 and ES1.0.0RC2.
This is a regression in master which I'm investigating. Thanks!
@skhuntia scratch the 'open up a new issue' - you already did that, I confused the issue.
Thanks for the info.
I tried again, but this time I got the following exception: Exception in thread "main" java.lang.NoSuchFieldError: VIRTUAL_COLUMNS
What Hive version are you using?
I am using Hive 0.10.0.
I've pushed a fix for it and triggered a nightly build - the new jars should be ready in 10 minutes. You can build the sources in the meantime if you want. Cheers,
Thanks a lot, Costin.
Can you try again with the latest snapshot - 20140407.114803-90? Let me know how it goes.
This works!!! Thanks
Great - thanks for confirming!
Filter out Hive 'virtual' columns
Filter out Elasticsearch built-in types
Change validation default to warn to cope with cases where the schema is incomplete
Fix #180
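For readers following the thread, here is a rough sketch of the idea behind that commit: drop Hive's virtual columns before checking the requested fields against the Elasticsearch mapping, and only warn (rather than fail) when a field is missing. This is an illustration in Python, not the actual elasticsearch-hadoop code (which is Java); the function name `validate_fields`, its parameters, and the `strict` flag are hypothetical. The two column names are taken from the exception quoted above.

```python
import warnings

# Hive virtual columns, lowercased as they appear in the exception above.
HIVE_VIRTUAL_COLUMNS = {"block__offset__inside__file", "input__file__name"}


def validate_fields(requested, mapping_fields, strict=False):
    """Hypothetical helper: return the fields to request from Elasticsearch.

    Virtual columns are filtered out first. Any remaining field missing
    from the mapping raises an error when strict is True, and otherwise
    only triggers a warning (the 'warn' default described in the commit).
    """
    candidates = [f for f in requested if f.lower() not in HIVE_VIRTUAL_COLUMNS]
    missing = [f for f in candidates if f not in mapping_fields]
    if missing:
        message = f"Field(s) {missing} not found in the Elasticsearch mapping"
        if strict:
            raise ValueError(message)
        warnings.warn(message)
    return candidates
```

With this filtering in place, a projection that includes `block__offset__inside__file` and `input__file__name` no longer trips the mapping check, which matches the behaviour confirmed in the thread after the snapshot was rebuilt.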
I tried to join two tables, but it doesn't work.
The query didn't fail as such, but the result was "no data available".
I am using "elasticsearch-hadoop-1.3.0.M2.jar" and "elasticsearch-1.0.0".
This is how I created the two Hive external tables:
CREATE EXTERNAL TABLE activity(description string,username string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '/test/activity/',
'es.host' = '10.309.500.163',
'es.port' = '9200');
CREATE EXTERNAL TABLE user(username string,ethnicity string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '/test/user/',
'es.host' = '10.309.500.163',
'es.port' = '9200');
And the query that I ran is:
SELECT activity.*, user.* FROM activity JOIN user ON (activity.username = user.username);
The result was "no data available".
That means the join condition somehow failed.
But the "username" column/field in both tables has a lot of common records.
I don't understand why I am not getting the desired result. Am I doing anything wrong?
Thanks