Deprecate Shared Gateway #2458
Comments
Is there an open issue for the snapshot/restore API yet?
Why would you deprecate this feature prior to the availability of a backup/restore API?
Is there any tutorial on how to configure instances with EBS now that S3 is not an option?
deprecated != removed. I do hope the backup/restore feature is implemented before support for the shared gateway is dropped.
@fatemehmd The http://www.elasticsearch.org/tutorials/2012/03/21/deploying-elasticsearch-with-chef-solo.html tutorial now walks through exactly that scenario, via the support in the Chef cookbook.
I never noticed this in the logs; meanwhile my S3-gateway-configured cluster crashed after running out of JVM memory on one of the nodes. It would be beneficial for other users to add this deprecation to the docs.
Indeed, deprecated does not mean we are going to remove it. It will not be removed before we have the snapshot/restore API. Even before then, though, I would suggest running in local gateway mode on EBS, for example, rather than using the s3 gateway, because of the overhead that comes with continuously snapshotting to it and treating it as the main source of truth. @ypocat the OOM should not be caused by the s3 gateway; it probably happened for other reasons (a common one is faceting on fields that ends up abusing memory, which we are working on as well...)
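A minimal sketch of what the suggested setup might look like in elasticsearch.yml for a pre-1.x node, assuming the data path sits on an EBS volume; the mount point is hypothetical:

```yaml
# Use the local gateway instead of the s3 gateway (this became the
# default in later releases).
gateway.type: local

# Keep the index data on an EBS-backed mount so it survives instance loss.
# /mnt/es-data is a hypothetical mount point for the attached EBS volume.
path.data: /mnt/es-data
```

With this, each node recovers from its own local data rather than continuously snapshotting to a shared S3 location.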
We are actually quite happy with the S3 gateway. We use it with all of the clusters we run. The main use is as a backup that we can restore from when the cluster hangs or dies. The advantage of this approach is that we are extremely flexible in how we work with nodes.

Working with EBS is not a solution for us. It would require a very complicated automated setup that manages instances with additional EBS volume(s). It is an approach we often use for things like Postgres and MongoDB, but part of the elasticsearch enthusiasm we feel is the ease of working with cluster technology.

A snapshotting feature is a good replacement. It would be really great if we could have some sort of Point in Time Restore with it, but it is not (yet) required. I would like to ask you to leave some overlap of features after you release snapshotting. We rely on S3 when we upgrade our clusters, for example. So, please, keep at least one release with both snapshotting and the (deprecated) S3 gateway.
I can understand why EBS volumes are not a good option in many scenarios, from either a technical or an economical standpoint. However, I'd say that the provisioning overhead is really low. Given how good an abstraction the Fog (Ruby), jClouds (Java), and other libraries provide, I wouldn't describe it as "very complicated"...
I don't want to discuss the complexity of AWS-related issues here. But if you want to build a cluster-wide 'snapshot mechanism' with EBS that keeps the flexibility of ElasticSearch (in combination with the AWS Cloud Plugin), you are in for quite a ride. If you just want persistence of a node, 'plain EBS' is fine. Unfortunately, that is not enough for us. We want to scale a cluster (OUT or IN) within a couple of minutes. We need to be able to rotate all instances in a cluster very easily, without worrying about the data. We have to be able to replace a non-responsive ElasticSearch node by terminating the instance. Etc. (If you are interested in how we approach these things, you can read Resilience & Reliability on AWS. It has a dedicated chapter on ElasticSearch, just to show how incredibly impressed we are with it. Most of the work was already done.)
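The node-rotation flexibility described above typically leans on the cloud-aws plugin's EC2 discovery, so freshly launched instances can find the cluster without a static host list. A rough elasticsearch.yml sketch, assuming the cloud-aws plugin is installed; the security group and credentials are placeholders:

```yaml
# EC2-based discovery via the cloud-aws plugin: new instances in the
# given security group join the cluster automatically.
discovery.type: ec2
discovery.ec2.groups: my-es-security-group   # placeholder group name

# Placeholder credentials; prefer IAM instance roles where available.
cloud.aws.access_key: AKIA...
cloud.aws.secret_key: "<secret>"
```

Terminated nodes simply drop out of the cluster, and replicas on the remaining nodes keep the data available.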
I'll second that setting up EBS complicates things in a setup where nodes are added and removed frequently, especially if performance is a concern.
Snapshotting to s3 would bring the advantages of both the local gateway and the s3 gateway. We won't remove the s3 gateway before snapshotting is in place, for at least one major version.
Just to add my 2 cents, the S3 shared gateway did not prevent my cluster from crashing into an irreparable state. I had to code a utility which went through the Lucene index files on disk and recovered / reindexed the data into a freshly initialized cluster. I believe I am much better off with the local gateway and daily snapshots of my EBS RAID5 arrays.
the idea here is that snapshot/restore with the local gateway allows you to strike the right balance between keeping up-to-date local recoverability and long-term recoverability from something like s3
@Shay, thanks for leaving some overlap between the current s3 gateway and snapshotting. For us, 'local recoverability' is on shard level; we will always plan for... We are extreme fans of EBS, actually. Another interesting feature of EBS is that you can easily have 20 smaller...
We have been using the S3 shared gateway as a backup since 2010 in production. Our use case is perhaps a bit different from some ES users' because we use ES to store smallish amounts of data. We also deploy to Elastic Beanstalk, so instances are created and destroyed by Amazon, and snapshotting and reuse of EBS is not appropriate. Sometimes we deploy a memory-only store with ES, which can only rely on the shared gateway for any kind of cluster recovery. I am hopeful that the S3 gateway will not go away altogether, or that it will be replaced with the snapshot-to-s3 that Shay mentioned. My question, however, is: what is the difference between the S3 gateway now and the "Snapshot to S3" feature, besides the frequency with which they sync (which is customizable for the shared gateway)?
@oravecz effectively, a scheduled snapshot to s3 using the future snapshot API will work in a similar manner to the s3 gateway. Recovery will work a bit differently: if you lose all the cluster data (lose all instances with ephemeral drives), you will need to explicitly "call recover" on the new cluster to recover the data from s3.
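For reference, as the snapshot/restore API later shipped (in Elasticsearch 1.0, with S3 repositories provided by the cloud-aws plugin), the flow described here looks roughly like the sketch below; the repository name, snapshot name, and bucket are placeholders:

```shell
# Register an S3-backed snapshot repository (requires the cloud-aws plugin).
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo' -d '{
  "type": "s3",
  "settings": { "bucket": "my-es-backups" }
}'

# Take a snapshot of all indices.
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo/snap_1?wait_for_completion=true'

# On a fresh cluster, restore explicitly -- this is the "call recover" step.
curl -XPOST 'localhost:9200/_snapshot/my_s3_repo/snap_1/_restore'
```

Unlike the s3 gateway, the cluster keeps its system of record locally; S3 is touched only when a snapshot or restore is requested.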
To be honest, we're not big fans of the more EBS-centric way of running ElasticSearch. Please do consider that nearly every major downtime at AWS had something to do with EBS (often in conjunction with loss of data). EBS is, in my opinion, one of the most flawed services (just google "aws downtimes ebs"). Which is also not AWS's fault; we have quite a few large customers who invested millions of dollars in their "unbreakable" or "fully redundant" SANs, and they ALL had downtimes ranging from some hours to several days. So we're heavily relying on running all our elasticsearch stuff only on local instance storage and spreading the copies across multiple nodes in multiple availability zones. The S3 gateway always seemed to be a big help in avoiding long reindexing times in times of catastrophic events. In my opinion, it would be a good idea to have an easy out-of-the-box solution for people who don't want to run ElasticSearch on a non-local, distributed filesystem.
@kimchy - I know this issue is quite old, but was there ever any migration plan/documentation for moving from the shared FS gateway to the local gateway without re-indexing? I haven't been able to find anything concrete.
Shared gateways (shared FS storage or S3, for example) are problematic performance-wise, since they constantly need to snapshot the state of the index to a shared location and then use that as the system of record. The local gateway, on the other hand, doesn't need this, and performs much better.
The main benefit of a shared gateway is the fact that the data is actually stored in another persistent location (i.e. using ephemeral disks on AWS, but still having the data on s3), but that is actually abusing the shared gateway design (using it as a backup).
In the near future, we will have a proper snapshot (backup)/restore API, which will be the proper way to do backups; relying on the shared gateway for that is problematic. Note, backups can still be made by "rsync"-ing the data location for each node manually.