ElasticSearch, Hive, and HBase #439
Comments
I think it is due to the regexp_replace part, where I replace ''' with '"'. When I load data into HBase, the JSON is stored as a string with single quotes. When I then load the JSON message from HBase into ElasticSearch through Hive, I have to convert the single quotes to double quotes. When this runs in MapReduce, it seems that Hive fails to replace the single quotes with double quotes before sending the message to ElasticSearch. I might be wrong, but that is my guess. Please let me know what I can do. Thanks again.
@tugisuwon I've been meaning to reply. First off, I've added minor formatting to your post to make it readable; I hope you don't mind. By the way, if you need to replace quotes, your initial data is not proper JSON, so I would rather correct that. Doing so would simplify things not just in this case but for any other system that consumes JSON that you might want to use. Hope this helps.
@costin Thank you for the quick reply. It's my first time using GitHub, so please make any modifications if necessary :) Let me share a few more details about my setup. I loaded the data into HBase using Python, so the actual JSON message was converted to a string during this process. So you are right that the initial data is not proper JSON anymore, but a string. This is why I added regexp_replace(message, ''','"') in Hive before sending it to ES as JSON. example: Is there any way for me to load HBase with JSON instead of a string so that I don't have to worry about this? Or do you know of any way to convert the string to JSON correctly in Hive? Sorry about all the questions, but I really appreciate your help and feedback. Thanks.
Why not correct the JSON string directly? Instead of replacing quotes, use a JSON library when loading the data into HBase from Python. Blindly replacing the quotes will cause problems - for example consider this string Basically, what I'm suggesting is to address the problem at the source, when you are generating the string that is about to be loaded into HBase, since again, that is not JSON and there is too much information loss for a simple replace to work. Not to mention that it's inefficient: instead of streaming the data, you now have to transform it, which acts as a bottleneck that should not be needed.
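A minimal sketch of the suggestion above, assuming the loader is a Python script that builds each message as a dict before writing it to HBase. The `record` contents and the `is_valid_json` helper are hypothetical and made up for illustration; the HBase write itself is left out because the thread does not show which client was used. The sketch compares `str()` of a dict (Python's single-quoted repr, which matches the stored format described above) followed by a blind quote swap against `json.dumps` from the standard library.

```python
import json

# Hypothetical message; the real payload is not shown in the thread.
record = {"user": "O'Brien", "text": 'He said "hi"', "city": "Zürich"}


def is_valid_json(s):
    """Return True if s parses as JSON, False otherwise."""
    try:
        json.loads(s)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# str() on a dict yields Python's repr, not JSON: keys and most values
# come out single-quoted.
as_repr = str(record)

# The blind single-to-double quote swap (the regexp_replace approach)
# also rewrites the apostrophe inside "O'Brien" and leaves the inner
# double quotes around "hi" unescaped, so the result no longer parses.
naive = as_repr.replace("'", '"')

# A JSON library handles quoting and escaping at the source.
as_json = json.dumps(record)

print("repr is valid JSON:      ", is_valid_json(as_repr))
print("naive swap is valid JSON:", is_valid_json(naive))
print("json.dumps is valid JSON:", is_valid_json(as_json))
```

Storing `as_json` instead of `as_repr` would make the quote replacement in Hive unnecessary. By default `json.dumps` also escapes non-ASCII characters as `\uXXXX` sequences (`ensure_ascii=True`), so the stored string stays plain ASCII; the thread does not confirm whether that would have avoided the non-English-word failure reported further down.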
@costin Thanks again. I finally found out that there was a non-English word in the message that caused the problem. I am able to run everything now. Thanks again.
Glad to hear it; however, my advice still stands. With a JSON library, cases like this one that require quoting or escaping will no longer occur. Closing the issue.
Hello,
I have HDP 2.2 installed on my server and I am using ElasticSearch to do indexing as follows:
Now I am getting an error. After it pushes about 1,300 to 1,500 records to ElasticSearch, the command from step 5 fails with the following error:
Can anyone help me with this issue? I manually removed the JSON message from the error output for privacy reasons.
I have over 2 million rows in an HBase table that need to be pushed to ES.
Thank you in advance.