Elasticsearch+Hadoop read and write #45

Closed
marcelopaesrech opened this issue May 13, 2013 · 3 comments

Comments

@marcelopaesrech

Hi, I want to use Elasticsearch+Hadoop MapReduce to read from Elasticsearch and write to Elasticsearch in the same MapReduce job. The sample shows only one direction (either read or write, because there is only one es.resource). What I want is something like the following:

Read from:
/radio/artists/_search?q=me*

Write to :
/radio/statistics

Best regards.

@tzolov

tzolov commented May 13, 2013

@marcelopaesrech this looks like the same issue as #26. IMO there is no technical reason prohibiting this case.

Furthermore, I've implemented a simple fix that adds ES_QUERY in addition to ES_RESOURCE, and it works fine.

@marcelopaesrech
Author

Yes, I want the input of the Map to be /radio/artists/_search?q=me* and the Reducers to write to /radio/statistics (or maybe another index). Any intermediate data that Hadoop generates can be stored in HDFS, I don't care. But I want to store the final result in an index on ES.

I read your issue and I think it is the same thing.

@costin
Member

costin commented Mar 10, 2014

@tzolov @marcelopaesrech hey, finally got around to fixing this. Cascading, Hive and Pig set the read/write resource automatically; in the case of MapReduce jobs, one can use the es.resource.read and es.resource.write properties. es.resource is still supported and used as a fall-back if the aforementioned properties are not defined.
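
For illustration, here is a minimal sketch of how the three properties relate, using the resources from the original question. It is not the connector's actual code: the class and method names (`EsResourceSettings`, `readResource`, `writeResource`, `exampleJob`) are hypothetical, and `java.util.Properties` stands in for the Hadoop `Configuration` so the snippet runs without Hadoop on the classpath. The resource strings are copied as-is from the question; the exact syntax accepted may depend on the connector version.

```java
import java.util.Properties;

// Illustrative only: mirrors the fall-back described above,
// not the connector's real implementation.
final class EsResourceSettings {

    // Resource to read from: es.resource.read if set, otherwise es.resource.
    static String readResource(Properties conf) {
        return conf.getProperty("es.resource.read", conf.getProperty("es.resource"));
    }

    // Resource to write to: es.resource.write if set, otherwise es.resource.
    static String writeResource(Properties conf) {
        return conf.getProperty("es.resource.write", conf.getProperty("es.resource"));
    }

    // Settings for the job described in this issue (values taken from the question).
    static Properties exampleJob() {
        Properties conf = new Properties();
        conf.setProperty("es.resource.read", "/radio/artists/_search?q=me*");
        conf.setProperty("es.resource.write", "/radio/statistics");
        return conf;
    }

    public static void main(String[] args) {
        Properties conf = exampleJob();
        System.out.println("read:  " + readResource(conf));
        System.out.println("write: " + writeResource(conf));
    }
}
```

In a real MapReduce job the same two keys would presumably be set on the job's Hadoop `Configuration` with `conf.set(key, value)`. A job that defines only es.resource resolves both directions to that single value, which matches the old behaviour.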

costin added a commit that referenced this issue Apr 8, 2014
Improve conf to allow for dedicated read and write resources as opposed to
a single, unified resource used for both. This allows for different ES
indices to be used in the same job, one as a source and the other as
a sink.

'es.resource' is still supported and used as a fall back.
Higher level abstractions, such as Cascading, Hive and Pig, set the
proper property automatically.

fix #156
fix #45
fix #26