Integration with Amazon Elastic Map Reduce #96

Closed
revendless-team opened this issue Oct 7, 2013 · 11 comments

Comments

@revendless-team

No description provided.

@costin
Member

costin commented Oct 7, 2013

AWS is just another Hadoop cluster. Take a look at the configuration
section in the reference documentation for more information:
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/index.html

On Mon, Oct 7, 2013 at 1:39 PM, revendless-team notifications@github.com wrote:

Is there any way to run the elasticsearch-hadoop connector on AWS Map
Reduce?

My plan is to use a scheduled nightly AWS Map Reduce job to index/sync all
the data in my database with the indexes of my separate Elasticsearch
cluster (also running on AWS EC2). Is that a possible use case? How do I
configure my Elasticsearch cluster hostnames and ports with the jar file,
so that the connector knows how to access the data?

Thanks a lot!



Costin
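For readers with the same question about hostnames and ports, here is a minimal sketch of the kind of settings the configuration section describes. It assumes the connector's `es.nodes`, `es.port`, and `es.resource` properties; the helper class, host names, and index name are hypothetical placeholders, and plain `java.util.Properties` stands in for the Hadoop `Configuration` so the snippet runs without Hadoop on the classpath.

```java
import java.util.Properties;

// Hypothetical helper, not part of es-hadoop: collects the connector
// settings that a job driver would copy into its Hadoop Configuration.
class EsConnectionSettings {

    static Properties forCluster(String nodes, int port, String resource) {
        Properties props = new Properties();
        // Comma-separated Elasticsearch hosts the connector contacts.
        props.setProperty("es.nodes", nodes);
        // HTTP/REST port used for hosts listed without an explicit port.
        props.setProperty("es.port", Integer.toString(port));
        // Target index/type for the job (placeholder value in main below).
        props.setProperty("es.resource", resource);
        return props;
    }

    public static void main(String[] args) {
        // Host names and the index name are made-up examples.
        Properties props = forCluster(
                "es-node-1.example.internal,es-node-2.example.internal",
                9200,
                "catalog/products");
        // In a real driver, each entry would be applied to the job's
        // Configuration with conf.set(key, value) before submitting.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

On EMR the same keys could presumably also be passed as `-D` options to the custom JAR step, provided the driver parses generic Hadoop options; that depends on how the job is written.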

@costin
Member

costin commented Oct 7, 2013

Just to add, it makes sense to have a dedicated section in the docs with specific instructions to make it easy for folks to get started. It's not there yet, but I hope it won't take long; of course, you can help.

Cheers.

@paulhredowl

I've gotten this library working pretty much out of the box with the Elastic MapReduce custom JAR feature.

@costin
Member

costin commented Oct 7, 2013

That's good to hear. Nevertheless I think it would help to have a pointer
even to show how easy it is to get started.


Costin

@costin
Member

costin commented Oct 15, 2013

Not yet; I will let you know once there's something in trunk.

@costin
Member

costin commented Oct 15, 2013

Just to be clear, es-hadoop works as it is with AWS Elastic MapReduce. As I mentioned before, this issue is about adding some getting-started documentation.

@bstempi

bstempi commented Mar 12, 2014

@paulhredowl How did you package your JAR? I used Eclipse to package a fat-JAR and I'm getting ClassNotFoundExceptions despite the class clearly being there. I couldn't tell if I did something wrong with my packaging or if I was bumping into an issue similar to #95.

@costin
Member

costin commented Mar 12, 2014

Can you paste the stacktrace in a separate issue? Are you using the proper binary (yarn vs non-yarn) from M2? You can switch to the nightly build jar which works transparently across both of them.

@bstempi

bstempi commented Mar 13, 2014

I think I have it resolved. For whatever reason, when using the "mapreduce" Hadoop API, I have to call Job.setJarByClass(); otherwise it can't find the formats. I'm not sure why... it's the same JAR that I declare on the CLI. I'm not sure whose issue this is.

@ghost

ghost commented Mar 18, 2014

Hey guys - from what I gather from the documentation and this thread, once I create an EMR cluster I can just point Elasticsearch at the EMR cluster and then try to "index" the data? Does this mean that Elasticsearch acts as a form of abstraction layer that would give me better insight into the data without having to write Pig/Hive jobs? Sorry if that sounds a little novice (I am a virgin at both - just a DBA trying to make a living).

@costin
Member

costin commented Oct 28, 2015

Closing this long-standing issue. Not only is there documentation in place describing how to configure ES within a cloud environment, but we also added WAN support to allow working with a cluster only through a restricted set of gateway nodes.

Cheers,
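As a rough illustration of the WAN support mentioned above, the sketch below assumes it refers to the connector's `es.nodes.wan.only` setting, which restricts the connector to the nodes declared in `es.nodes`. The helper class and the gateway address are hypothetical, and `java.util.Properties` again stands in for the Hadoop `Configuration`.

```java
import java.util.Properties;

// Hypothetical helper, not part of es-hadoop: settings for reaching a
// cluster only through a fixed set of publicly reachable gateway nodes.
class EsWanSettings {

    static Properties throughGateways(String gatewayNodes) {
        Properties props = new Properties();
        // Only these declared nodes are used for all requests.
        props.setProperty("es.nodes", gatewayNodes);
        // Assumed setting name: tells the connector to talk solely to the
        // declared nodes rather than addressing other cluster nodes directly.
        props.setProperty("es.nodes.wan.only", "true");
        return props;
    }

    public static void main(String[] args) {
        // The gateway address is a made-up example.
        Properties props = throughGateways("es-gateway.example.com:9200");
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

Routing everything through gateway nodes is likely to cost some throughput compared with talking to data nodes directly, so it is probably best kept for setups where the internal node addresses are not reachable from the Hadoop cluster.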
