Deprecate Shared Gateway #2458
Comments
Is there an open issue for the snapshot/restore API yet?
Why would you deprecate this feature prior to the availability of a backup/restore API?
Is there any tutorial on how to configure instances with EBS now that S3 is not an option?
deprecated != removed. I do hope the backup/restore feature is implemented before support for the shared gateway is dropped.
@fatemehmd The http://www.elasticsearch.org/tutorials/2012/03/21/deploying-elasticsearch-with-chef-solo.html tutorial now walks through exactly that scenario, via the support in the Chef cookbook.
I never noticed this in the logs; meanwhile my S3-gateway-configured cluster crashed after running out of JVM memory on one of the nodes. It would be beneficial for other users to add this deprecation to the docs.
Indeed, deprecated does not mean we are going to remove it. It will not be removed before we have the snapshot/restore API. Even before then, though, I would suggest running in local gateway mode on EBS, for example, rather than using the s3 gateway, because of the overhead that comes with continuously snapshotting to it and treating it as the main source of truth. @ypocat the OOM should not be caused by the s3 gateway; it probably happened for other reasons (a common one is faceting on fields that ends up abusing memory, which we are working on as well...)
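A minimal sketch of what the suggested setup might look like in elasticsearch.yml for a pre-1.x node, assuming the data path sits on an EBS volume; the mount point is hypothetical:

```yaml
# Use the local gateway instead of the s3 gateway (this became the
# default in later releases).
gateway.type: local

# Keep the index data on an EBS-backed mount so it survives instance loss.
# /mnt/es-data is a hypothetical mount point for the attached EBS volume.
path.data: /mnt/es-data
```

With this, each node recovers from its own local data rather than continuously snapshotting to a shared S3 location.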
We are actually quite happy with the S3 gateway. We use it with all of the clusters we run. The main use is as a backup that we can restore from when the cluster hangs or dies. The advantage of this approach is that we are extremely flexible in how we work with nodes.

Working with EBS is not a solution for us. It would require a very complicated automated setup that manages instances with additional EBS volume(s). It is an approach we often use for things like Postgres and MongoDB, but part of the elasticsearch enthusiasm we feel is the ease of working with cluster technology.

A snapshotting feature is a good replacement. It would be really great if we could have some sort of Point in Time Restore with it, but it is not (yet) required. I would like to ask you to leave some overlap of features after you release snapshotting. We rely on S3 when we upgrade our clusters, for example. So, please, keep at least one release with both snapshotting and the (deprecated) S3 gateway.
I can understand why EBS volumes are not a good option in many scenarios, from either a technical or an economical standpoint. However, I'd say that the provisioning overhead is really low. Given how good an abstraction the Fog (Ruby), jClouds (Java), and other libraries provide, I wouldn't describe it as "very complicated"...
I don't want to discuss the complexity of AWS-related issues here. But if you want to build a cluster-wide 'snapshot mechanism' with EBS that keeps the flexibility of ElasticSearch (in combination with the AWS Cloud Plugin), you are in for quite a ride. If you just want persistence of a node, 'plain EBS' is fine. Unfortunately, that is not enough for us. We want to scale a cluster (OUT or IN) within a couple of minutes. We need to be able to rotate all instances in a cluster very easily, without worrying about the data. We have to be able to replace a non-responsive ElasticSearch node by terminating the instance. Etc. (If you are interested in how we approach these things, you can read Resilience & Reliability on AWS. It has a dedicated chapter on ElasticSearch, just to show how incredibly impressed we are with it. Most of the work was already done.)
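The node-rotation flexibility described above typically leans on the cloud-aws plugin's EC2 discovery, so freshly launched instances can find the cluster without a static host list. A rough elasticsearch.yml sketch, assuming the cloud-aws plugin is installed; the security group and credentials are placeholders:

```yaml
# EC2-based discovery via the cloud-aws plugin: new instances in the
# given security group join the cluster automatically.
discovery.type: ec2
discovery.ec2.groups: my-es-security-group   # placeholder group name

# Placeholder credentials; prefer IAM instance roles where available.
cloud.aws.access_key: AKIA...
cloud.aws.secret_key: "<secret>"
```

Terminated nodes simply drop out of the cluster, and replicas on the remaining nodes keep the data available.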
I'll second that setting up EBS complicates things in a setup where nodes are added and removed frequently, especially if performance is a concern.
Snapshotting to s3 would bring the advantages of both the local gateway and the s3 gateway. We won't remove the s3 gateway before snapshotting is in place, for at least one major version.
Just to add my 2 cents, the S3 shared gateway did not prevent my cluster from crashing into an irreparable state. I had to code a utility which went through the Lucene index files on disk and recovered / reindexed the data into a freshly initialized cluster. I believe I am much better off with the local gateway and daily snapshots of my EBS RAID5 arrays.
the idea here is that snapshot/restore with the local gateway allows you to strike the right balance between keeping up-to-date local recoverability and long-term recoverability from something like s3
@Shay, thanks for leaving some overlap between the current s3 gateway and snapshotting. For us, 'local recoverability' is on shard level; we will always plan for... We are extreme fans of EBS, actually. Another interesting feature of EBS is that you can easily have 20 smaller...
We have been using the S3 shared gateway as a backup since 2010 in production. Our use case is perhaps a bit different from some ES users' because we use ES to store smallish amounts of data. We also deploy to Elastic Beanstalk, so instances are created and destroyed by Amazon, and snapshotting and reuse of EBS is not appropriate. Sometimes we deploy a memory-only store with ES, which can only rely on the shared gateway for any kind of cluster recovery. I am hopeful that the S3 gateway will not go away altogether, or that it will be replaced with the snapshot-to-s3 that Shay mentioned. My question, however, is: what is the difference between the S3 gateway now and the "Snapshot to S3" feature, besides the frequency with which they sync (which is customizable for the shared gateway)?
@oravecz effectively, a scheduled snapshot to s3 using the future snapshot API will work in a similar manner to the s3 gateway. Recovery will work a bit differently: if you lose all the cluster data (lose all instances with ephemeral drives), you will need to explicitly "call recover" on the new cluster to recover the data from s3.
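For reference, as the snapshot/restore API later shipped (in Elasticsearch 1.0, with S3 repositories provided by the cloud-aws plugin), the flow described here looks roughly like the sketch below; the repository name, snapshot name, and bucket are placeholders:

```shell
# Register an S3-backed snapshot repository (requires the cloud-aws plugin).
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo' -d '{
  "type": "s3",
  "settings": { "bucket": "my-es-backups" }
}'

# Take a snapshot of all indices.
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo/snap_1?wait_for_completion=true'

# On a fresh cluster, restore explicitly -- this is the "call recover" step.
curl -XPOST 'localhost:9200/_snapshot/my_s3_repo/snap_1/_restore'
```

Unlike the s3 gateway, the cluster keeps its system of record locally; S3 is touched only when a snapshot or restore is requested.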
To be honest, we're not big fans of the more EBS-centric way of running ElasticSearch. Please do consider that nearly every major downtime at AWS had something to do with EBS (often in conjunction with loss of data). EBS is, in my opinion, one of the most flawed services (just google "aws downtimes ebs"). Which is also not AWS's fault; we have quite a few large customers who invested millions of dollars in their "unbreakable" or "fully redundant" SANs, and they ALL had downtimes ranging from some hours to several days. So we're heavily relying on running all our elasticsearch stuff only on local instance storage and spreading the copies across multiple nodes in multiple availability zones. The S3 gateway always seemed to be a big help in avoiding long reindexing times in times of catastrophic events. In my opinion, it would be a good idea to have an easy out-of-the-box solution for people who don't want to run ElasticSearch on a non-local, distributed filesystem.
@kimchy - I know this issue is quite old, but was there ever any migration plan/documentation for moving from the shared FS gateway to the local gateway without re-indexing? I haven't been able to find anything concrete.
Shared gateways (shared FS storage or S3, for example) are problematic performance-wise, since they constantly need to snapshot the state of the index to a shared location and then use that as the system of record. The local gateway, on the other hand, doesn't need this, and performs much better.
The main benefit of a shared gateway is the fact that the data is actually stored in another persistent location (i.e. using ephemeral disks on AWS, but still having the data on s3), but that is actually abusing the shared gateway design (using it as a backup).
In the near future, we will have a proper snapshot (backup)/restore API, which will be the proper way to do backups; relying on the shared gateway for that is problematic. Note, backups can still be made by "rsync"-ing the data location for each node manually.