
Deprecate Shared Gateway #2458

Closed
kimchy opened this issue Dec 3, 2012 · 19 comments
@kimchy
Member

kimchy commented Dec 3, 2012

Shared gateways (a shared FS or S3, for example) are problematic performance-wise, since they constantly need to snapshot the state of the index to a shared location and then use that as the system of record. The local gateway, on the other hand, doesn't need this and performs much better.

The main benefit of a shared gateway is that the data is actually stored in another persistent location (e.g. using ephemeral disks on AWS while still having the data on s3), but that is really abusing the shared gateway design by using it as a backup.

In the near future we will have a proper snapshot (backup)/restore API, which will be the proper way to do backups; relying on the shared gateway for that is problematic. Note that backups can still be made "manually" by rsync-ing the data location on each node.
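As an editorial illustration, the two gateway modes being contrasted here were selected in `elasticsearch.yml` roughly as follows in the 0.x era (the bucket name is hypothetical, and the s3 type required the cloud-aws plugin):

```
# Deprecated shared gateway: the index state is continuously
# snapshotted to S3, which becomes the system of record.
gateway:
  type: s3
  s3:
    bucket: my-es-gateway-bucket   # hypothetical bucket name

# Recommended local gateway: each node recovers from its own
# local data directory; no continuous snapshotting to a shared store.
# gateway:
#   type: local
```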

@kimchy kimchy closed this as completed in 677e6ce Dec 3, 2012
kimchy added a commit that referenced this issue Dec 3, 2012
@ejain
Contributor

ejain commented Dec 4, 2012

Is there an open issue for the snapshot/restore API yet?

@jgriswoldinfogroup

Why would you deprecate this feature prior to the availability of a backup/restore API?

@fatemehmd

Is there any tutorial on how to configure instances with EBS now that S3 is not an option?

@ejain
Contributor

ejain commented Dec 13, 2012

On Wed, Dec 12, 2012 at 10:27 PM, Fatemeh notifications@github.com wrote:

Is there any tutorial on how to configure instances with EBS now that S3 is not an option?

deprecated != removed

I do hope the backup/restore feature is implemented before support for the S3 gateway is removed.

@karmi
Contributor

karmi commented Dec 21, 2012

@fatemehmd The http://www.elasticsearch.org/tutorials/2012/03/21/deploying-elasticsearch-with-chef-solo.html tutorial now walks through exactly that scenario, via the support in the Chef cookbook.

@youurayy

youurayy commented Jan 6, 2013

I never noticed this in the logs, and now my S3-gateway-configured cluster has crashed after running out of JVM memory on one of the nodes.

It would be beneficial to other users to add this deprecation to the docs.

@kimchy
Member Author

kimchy commented Jan 6, 2013

Indeed, deprecated does not mean we are going to remove it. It will not be removed before we have the snapshot/restore API, but even before then I would suggest running in local gateway mode on EBS, for example, rather than using the s3 gateway, because of the overhead that comes with continuously snapshotting to it and treating it as the main source of truth.

@ypocat the OOM should not have been caused by the s3 gateway; it probably happened for other reasons (a common one is faceting on fields in a way that ends up abusing memory; we are working on that as well...)

@truthtrap

We are actually quite happy with the S3 gateway. We use it with all of the clusters we run. The main use is as a backup that we can restore from when the cluster hangs or dies. The advantage of this approach is that we are extremely flexible in how we work with nodes.

Working with EBS is not a solution. It would require a very complicated automated setup that manages instances with additional EBS volume(s). It is an approach we often use for things like Postgres and MongoDB, but part of the elasticsearch enthusiasm we feel comes from the ease of working with its cluster technology.

A snapshotting feature is a good replacement. It would be really great if we could have some sort of point-in-time restore with it, but that is not (yet) required. I would like to ask you to leave some overlap of features after you release snapshotting; we rely on S3 when we upgrade our clusters, for example.

So, please, at least one release with snapshotting and the (deprecated) S3 gateway.

@karmi
Contributor

karmi commented Mar 11, 2013

Working with EBS is not a solution. It would require a very complicated automated setup that manages instance with additional EBS volume(s).

I can understand why EBS volumes are not a good option in many scenarios, from either a technical or an economic standpoint. However, I'd say that the provisioning overhead is really low. Given how good an abstraction Fog (Ruby), jClouds (Java), and other libraries provide, I wouldn't describe it as "very complicated"...

@truthtrap

I don't want to discuss the complexity of AWS-related issues here. But if you want to build a cluster-wide 'snapshot mechanism' with EBS that keeps the flexibility of ElasticSearch (in combination with the AWS Cloud Plugin), you are in for quite a ride.

If you just want persistence of a node, 'plain EBS' is fine. Unfortunately, that is not enough for us. We want to scale a cluster (OUT or IN) within a couple of minutes. We need to be able to rotate all instances in a cluster very easily, without worrying about the data. We have to be able to replace a non-responsive ElasticSearch node by terminating the instance. Etc.

(If you are interested how we approach these things you can read Resilience & Reliability on AWS. It has a dedicated chapter on ElasticSearch, just to show how incredibly impressed we are with it. Most of the work was already done.)

@ejain
Contributor

ejain commented Mar 11, 2013

I'll second that setting up EBS complicates things in a setup where nodes are added and removed frequently, especially if performance is an issue.

@kimchy
Member Author

kimchy commented Mar 12, 2013

Snapshotting to s3 would combine the advantages of the local gateway with those of the s3 gateway. We won't remove the s3 gateway before snapshotting is in place, for at least one major version.

@youurayy

Just to add my 2 cents: the S3 shared gateway did not prevent my cluster from crashing into an irreparable state. I had to write a utility that went through the Lucene index files on disk and recovered/reindexed the data into a freshly initialized cluster. I believe I am much better off with the local gateway and daily snapshots of my EBS RAID5 arrays.

@kimchy
Member Author

kimchy commented Mar 13, 2013

The idea here is that snapshot/restore with the local gateway allows us to strike the right balance between up-to-date local recoverability and long-term recoverability from something like s3.

@truthtrap

@Shay, thanks for leaving some overlap between the current s3 gateway and the new snapshotting feature :)

For us, 'local recoverability' is at the shard level. We will always plan for the loss of an instance without losing the cluster; the cluster can recover itself. We choose to treat nodes as ephemeral. With a full cluster BREAKDOWN a little bit of lag is not a problem, and a full cluster SHUTDOWN we can manage properly ourselves.

We are extreme fans of EBS, actually, and there is another interesting application for EBS: performance. Ephemeral storage is a lot slower with most RDBMSs we tried, for example, so perhaps EBS is necessary in cases of severe disk access. AWS has SSD ephemeral disks, but that is still a bit above budget for most of our apps.

Another interesting feature of EBS is that you can easily have 20 smaller volumes for the same price as one big volume. Because of the nature of EBS, you increase your potential read/write throughput more or less linearly. This principle could be applied to individual indexes, or even shards, if they can be assigned to different parts of the filesystem. This would be more manageable than RAID, in cases where local (instance) recoverability is an issue.

groet,
jurg.

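As a side note, the many-small-volumes idea suggested above roughly corresponds to Elasticsearch's support for listing multiple data paths, which spreads shard data across the mounts. A sketch (mount points are hypothetical):

```
# elasticsearch.yml: spread data across several small EBS volumes
# by listing multiple data paths (mount points are hypothetical).
path:
  data:
    - /mnt/ebs0/es-data
    - /mnt/ebs1/es-data
    - /mnt/ebs2/es-data
```

Note that this is not RAID: the paths are used for shard allocation, not block-level striping, so throughput gains depend on how shards land on the volumes.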

@oravecz

oravecz commented Apr 1, 2013

We have been using the S3 shared gateway as a backup in production since 2010. Our use case is perhaps a bit different from some ES users', because we use ES to store smallish amounts of data. We also deploy to Elastic Beanstalk, so instances are created and destroyed by Amazon, and snapshotting and reuse of EBS is not appropriate. Sometimes we deploy a memory-only store with ES, which can only rely on the shared gateway for any kind of cluster recovery.

I am hopeful that the S3 gateway will not go away altogether, or that it will perhaps be replaced by the snapshot-to-s3 that Shay mentioned. My question, however, is: what is the difference between the S3 gateway now and the "snapshot to s3" feature, besides the frequency with which they sync (which is customizable for the shared gateway)?

@kimchy
Member Author

kimchy commented Apr 1, 2013

@oravecz effectively, a scheduled snapshot to s3 using the future snapshot API will work in a similar manner to the s3 gateway. Recovery will work a bit differently: if you lose all the cluster data (lose all instances with ephemeral drives), you will need to explicitly "call recover" on the new cluster to recover the data from s3.
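For readers following along later: the snapshot/restore API that eventually shipped follows roughly this shape (repository, bucket, and snapshot names below are hypothetical, and the s3 repository type requires the S3 repository plugin):

```
# Register an S3 snapshot repository (bucket name is hypothetical)
PUT _snapshot/my_s3_repo
{
  "type": "s3",
  "settings": { "bucket": "my-es-backups" }
}

# Take a snapshot
PUT _snapshot/my_s3_repo/snapshot_1

# On a fresh cluster after total data loss: register the same
# repository again, then explicitly restore from the snapshot
POST _snapshot/my_s3_repo/snapshot_1/_restore
```

This matches the explicit-recovery model described above: unlike the s3 gateway, a new cluster does not recover from s3 automatically; the restore must be requested.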

@thomaswitt

To be honest, we're not big fans of the more EBS-centric way of running ElasticSearch.

Please consider that nearly every major downtime at AWS had something to do with EBS (often in conjunction with the loss of data). EBS is, in my opinion, one of the most flawed services (just google "aws downtimes ebs"). This is also not AWS's fault: we have quite a few large customers who invested millions of dollars in their "unbreakable" or "fully redundant" SANs, and they ALL had downtimes ranging from a few hours to several days.

So we're relying heavily on running all our ElasticSearch workloads only on local instance storage and spreading the copies across multiple nodes in multiple availability zones. The S3 gateway always seemed to be a big help in avoiding long reindexing times during catastrophic events.

In my opinion, it would be a good idea to have an easy out-of-the-box solution for people who don't want to run ElasticSearch on a non-local, distributed filesystem.

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
@torquemad

@kimchy - I know this issue is quite old, but was there ever any migration plan or documentation for moving from the shared FS gateway to the local gateway without reindexing? I haven't been able to find anything concrete.
