Unable to insert data into ES from Hive - Cannot determine write shards #467
Edited your post to add formatting. The exception indicates that the cluster, for some reason, behaves as if it's read-only. Can you please turn on logging and report back the results as a gist? P.S. Please do add formatting to your posts - it's minimally intrusive and significantly increases readability, as you can see above. Thanks.
Apologies Costin, I'm new to this. I'm not sure how to turn on logging; I've tried adding log4j.category.org.elasticsearch.hadoop.hive=DEBUG to Hive's log4j.properties file, but the output in the log is the same as before. I don't think the cluster is read-only: if I send the data in over REST, it creates the index and inserts the data with no problem.
No worries. Hive tends to be painful to configure, especially since there are multiple config files. I'm not sure whether you are trying to run things across a cluster or in a pseudo-distributed config (everything on the same machine); have you looked at their wiki? The cluster might not be read-only, but something is clearly off, as the index is not created beforehand and it should be. By the way, can you drop the table and recreate it each time - just in case the table definition already exists with invalid/incorrect configuration...
Hi Costin, I've just dropped the table as suggested and created a new one:
Notice that I changed 'es.resource' = 'proxylog-carltest/event' to 'es.resource' = 'carltest/event', and this has worked. But ES index naming does support hyphens, as we have several indices whose names contain them?
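For context, a Hive table backed by ES-Hadoop is declared with the connector's storage handler. This is only a hypothetical sketch of the kind of table involved here; the column names and the 'es.nodes' value are assumptions, not taken from this thread:

```sql
-- Hypothetical sketch; column names and 'es.nodes' value are assumptions.
CREATE EXTERNAL TABLE carltest (
  eventdate  STRING,
  timetaken  BIGINT
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  -- index/type pair; the hyphenated 'proxylog-carltest/event' was the failing case
  'es.resource' = 'carltest/event',
  'es.nodes'    = 'localhost:9200'
);
```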
Seems like a bug - probably the HTTP encoding is messed up by the hyphen. By the way, you don't need to set
@pricecarl Hi, I've tried reproducing your error to no avail. I have a similar table definition. I'm not sure which actual Hive version is in CDH 5.3 (Hive 0.13 + patches doesn't say much); however, based on the logs, there's no escaping of the hyphen at all.
Hi Costin, I've just tried it with carl-test/event as the index name/type and it works fine. There are look-alike characters, such as ‐ (decimal 8208, hex 2010, Unicode name HYPHEN; taken from: ). So I could have picked up one of these somehow. I'm testing now and will report back.
It seems that none of these work; it only works if you actually type the hyphen.
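Look-alike hyphen characters like the one above can be detected programmatically. A small sketch (the set of look-alikes below is illustrative, not exhaustive):

```python
# Several Unicode characters look like the ASCII hyphen-minus (U+002D) but are
# distinct code points, so an index name pasted with one of them will not match
# the name typed with a real hyphen. This flags any look-alike in a string.
import unicodedata

# Illustrative, non-exhaustive set: U+2010 HYPHEN (decimal 8208),
# U+2011 NON-BREAKING HYPHEN, U+2012 FIGURE DASH, U+2013 EN DASH,
# U+2014 EM DASH, U+2212 MINUS SIGN.
LOOKALIKES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}

def find_fake_hyphens(name):
    """Return (position, codepoint, Unicode name) for each look-alike hyphen."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(name)
        if ch in LOOKALIKES
    ]

# A real ASCII hyphen passes; a pasted U+2010 is flagged.
print(find_fake_hyphens("proxylog-carltest"))       # []
print(find_fake_hyphens("proxylog\u2010carltest"))  # [(8, 'U+2010', 'HYPHEN')]
```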
Hi Costin, I've been investigating this further, and it seems that renaming 'proxylog-carltest/event' to 'carltest/event' only worked because the new name didn't have the ES template applied to it. I have a template:
Which would have been applied to 'proxylog-carltest/event'. So does this mean ES-Hadoop cannot work with templates, or can you see where I am going wrong?
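The template itself was not preserved in this thread; for context, an ES 1.x index template matching 'proxylog-*' indices has roughly this shape (the field names, shard count, and mapping below are assumptions, not the poster's actual template):

```json
{
  "template": "proxylog-*",
  "settings": { "number_of_shards": 5 },
  "mappings": {
    "event": {
      "properties": {
        "timetaken": { "type": "long" }
      }
    }
  }
}
```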
Thanks for the update. I don't see why it shouldn't work, since the template only takes care of the index creation.
I've tried replicating the test again this time using a template (Elasticsearch 1.5.2, latest master in es-hadoop though it shouldn't matter) and have the following logs (notice the test creates the template right before the index):
followed by:
I can't imagine why a template would make things different - it's the same as if the index was defined beforehand. Unless you perhaps have some type of routing that makes the index be allocated on a node that the connector cannot access? Cheers,
Can you try adding a mapping to your test template and running the test again? It works fine for me too if I don't include the mappings section.
That was it!
I need to look at what's wrong, but it seems there's an error from Elasticsearch when the index is touched (due to the mapping) that is swallowed by the connector, which then continues; however, as there's no index, there are no shards allocated for it. Thanks for your patience - I'll try to fix this as soon as possible and report back.
Awesome, thank you Costin.
@pricecarl It took a while to figure out why the exception occurred; it was due to the mapping, which was incorrect. The exception is not caused by the data but rather by the mapping definition, which is about to be applied but fails. Thanks,
Hi Costin, we have the latest build; we are currently testing it out and will get back to you ASAP.
Hi Costin, apologies I've taken a while to get back to you on this. Now I have a different error:
I thought this might be because I was only trying to insert the timetaken fields and it was expecting the dates as well, but I got the same error:
Any ideas?
Again, it all boils down to your mapping. If you look closely, you'll notice the root exception is actually triggered by Elasticsearch - your field doesn't provide a proper mapping. Template validation (including mappings) is scheduled for Elasticsearch 2.0; in the meantime, you can verify mappings directly by creating a dummy index and adding some sample data to it (from the command line or with whatever REST tools you want to use).
Sorry Costin, I don't understand. I have created a mapping which matches the incoming field names exactly, so I don't see how the fields can lack a mapping. You said "it tries to touch create the index" - is there a way I can get it to "touch" it with the mapping sent at the same time?
@pricecarl Maybe my answer was not clear enough. Forget about the connector and Hive for a moment; just use Elasticsearch directly, through curl, Marvel, or whatever tool you are comfortable with. Using the same configuration for the index as above, try to add data in JSON format as you expect it to be in Hive and see what happens. Then replicate it in Hive and see whether there's a difference. Without any logging in Hive (to understand what is going on behind the scenes and in your setup), and with the exception changing, isolating the problem at the Elasticsearch level is a sure way forward. P.S. This issue would be suitable for discussion on the forum, as it might help others bumping into the same problem in the future.
Hi Costin, I've tried as you suggested and it's the same error. While doing this, though, I noticed that in the template I have the word 'type' in the mapping part with a capital T. I have changed this now and it's working fine.
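A miscased key like 'Type' where Elasticsearch expects lowercase 'type' can be caught with a quick local check before uploading the template. A minimal, hypothetical sketch (the option list is deliberately partial):

```python
# Walk a mapping dict and flag keys that differ from a known mapping option
# only by letter case (e.g. "Type" where Elasticsearch expects "type").
KNOWN_OPTIONS = {"type", "index", "format", "analyzer", "store"}  # partial list

def find_miscased_keys(node, path=""):
    """Recursively collect paths to keys that match a known option except for case."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.lower() in KNOWN_OPTIONS and key != key.lower():
                problems.append(f"{path}/{key}")
            problems.extend(find_miscased_keys(value, f"{path}/{key}"))
    return problems

# Hypothetical mapping fragment with the capital-T bug from this thread.
mapping = {"event": {"properties": {"timetaken": {"Type": "long"}}}}
print(find_miscased_keys(mapping))  # ['/event/properties/timetaken/Type']
```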
So ... does this mean the error is gone? Can the issue be closed or not? Thanks, On Fri, Jun 19, 2015 at 1:01 PM, pricecarl notifications@github.com wrote:
Yes, it can be closed. Thank you very much for your help. :)
Glad to hear the situation has been resolved. Cheers! On Fri, Jun 19, 2015 at 4:33 PM, pricecarl notifications@github.com wrote:
Hello,
I'm having a problem inserting data from Hive into ES using ES-Hadoop, and I can't for the life of me figure out what is going wrong.
I've created a simple Hive table in a database called dev:
When I run a select query on this table, it pulls back the single value absolutely fine. I've also created another simple Hive table in a database called elasticsearchtables:
When I run the following Hive query:
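The query itself was not preserved in this thread; a Hive-to-ES insert typically has this shape (the table names below are assumptions, though the database names dev and elasticsearchtables come from the text above):

```sql
-- Hypothetical; the actual table names were not preserved in this thread.
INSERT OVERWRITE TABLE elasticsearchtables.carltest
SELECT * FROM dev.simpletable;
```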
I get a map reduce error and looking in the hive log it states:
Task with the most failures(4):
Diagnostic Messages for this Task:
My ES cluster is running 1.5.2, I'm using the latest beta release of ES-Hadoop, and I'm running CDH 5.1.3.
Any help with this would be greatly appreciated.