reading an index and writing into another one #156
Comments
In the above gist, if I change the number of shards for the hread index to 4, I get the following exception (it is still OK with 3 shards):
The issue is that the table properties are written to the same Hadoop configuration object, which means only one set of properties ends up being used.
I'm a bit disappointed, because using an intermediate Hive table would not be feasible in my use case: my indices are far too large. Anyway, I understand your explanation. I'll now try to use your Hadoop-ES bridge directly in Java to see if I can go further. Thanks for the quick answer!
Hi, This has been fixed in master - can you please try it out? The next nightly build (#336) should include it, but of course you can build it yourself until that build is published. Basically, there's no need for an intermediate table - you can have different input/output indices in the same job.
Hi, Great job! I have updated the gist https://gist.github.com/jpparis-orange/9319913#file-hivecopyesindextoanother with the appropriate commands. They do the right job now! I'm really pleased to see this issue closed!
Great! Could you also post some stats between using the intermediate table vs using ES directly? If it's easier we can just chat on IRC about this - I'm costin on #elasticsearch.
Costin
Improve the configuration to allow for dedicated read and write resources, as opposed to a single, unified resource used for both. This allows different ES indices to be used in the same job, one as a source and the other as a sink. 'es.resource' is still supported and used as a fallback. Higher-level abstractions, such as Cascading, Hive and Pig, set the proper property automatically. fix #156 fix #45 fix #26
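With the fix in place, the copy described in the gist can be expressed directly in Hive with no intermediate table, since each external table carries its own resource setting. A minimal sketch, assuming hypothetical index/type names `hread/doc` and `hwrite/doc` and an illustrative two-column schema; the storage handler class name and the exact TBLPROPERTIES keys may differ between elasticsearch-hadoop releases:

```sql
-- External table backed by the source ES index (names are illustrative)
CREATE EXTERNAL TABLE es_read (id BIGINT, body STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'hread/doc');

-- External table backed by the target ES index
CREATE EXTERNAL TABLE es_write (id BIGINT, body STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'hwrite/doc');

-- A single job: read from hread, write into hwrite
INSERT OVERWRITE TABLE es_write SELECT * FROM es_read;
```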
Hello,
I'm trying to read an ES index from Hive and write back into another one at the same time. The job runs fine without any errors, but no documents appear in the ES index connected to the eswrite table.
Is it possible to do such a copy?
Here is the version of the different components I use:
I have prepared a gist recreation here https://gist.github.com/jpparis-orange/9319913#file-hivecopyesindextoanother. At the end, you'll find the hive commands (after the shell exit) used to copy.
The COPY command gives the following output:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
...
Ended Job = job_1393406785607_0155
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.33 sec HDFS Read: 1683 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 330 msec
OK
After that, my hwrite index is still empty, but the version of the docs in hread is 2! It seems that I'm actually writing into the hread index.
When using a temporary hive table, I can copy my data with the following 2 commands:
INSERT OVERWRITE TABLE tmp SELECT * FROM es_read;
INSERT OVERWRITE TABLE es_write SELECT * FROM tmp;
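For completeness, the workaround above assumes a plain HDFS-backed staging table; because the copy runs as two separate jobs, each job gets its own Hadoop configuration, which sidesteps the shared-configuration problem. A minimal sketch, with an illustrative schema:

```sql
-- Plain Hive table used as intermediate storage (illustrative schema)
CREATE TABLE tmp (id BIGINT, body STRING);

-- Two separate jobs, so each carries its own ES resource setting
INSERT OVERWRITE TABLE tmp SELECT * FROM es_read;
INSERT OVERWRITE TABLE es_write SELECT * FROM tmp;
```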
This issue seems to be related to #125 and #70.
thanks
jp