
API feature for ES update #2659

Closed
gakhov opened this issue Feb 17, 2013 · 7 comments
@gakhov
Contributor

gakhov commented Feb 17, 2013

As we know, updating elasticsearch requires following several steps: flush the index, shut down the node, and so on.

This works fine if we have only a few indices. But imagine we have weekly-based (or even daily-based) indices ... the process then requires custom scripts to iterate through the indices and prepare them (e.g. flush each one) ... which sounds neither reasonable nor safe.

It would be nice to have an API in elasticsearch that, on a single call, prepares my whole cluster for the update procedure. In the future, this API could also gain an option (as a preparation step) to back up the cluster first, as soon as a Backup API is released.

What do you think?
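The kind of per-index custom script described above might look like this dry-run sketch. The `logs-YYYY.WW` index names and the `ES_HOST` default are hypothetical, and the commands are only printed, not executed:

```shell
#!/bin/sh
# Dry-run sketch: iterate over time-sliced indices and build a flush
# command for each one. Index names and host are illustrative assumptions.
ES_HOST="${ES_HOST:-localhost:9200}"

flush_cmd() {
    # Print (but do not run) the flush command for a single index.
    printf "curl -XPOST 'http://%s/%s/_flush'\n" "$ES_HOST" "$1"
}

for index in logs-2013.05 logs-2013.06 logs-2013.07; do
    flush_cmd "$index"
done
```

Maintaining a loop like this for every cluster operation is exactly the busywork a single preparation API call would remove.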

@s1monw
Contributor

s1monw commented Feb 18, 2013

hey @gakhov I wonder what else besides _flush would need to be called here. You can call flush without specifying an index to flush all indices in the cluster (http://www.elasticsearch.org/guide/reference/api/admin-indices-flush.html) using

$ curl -XPOST 'http://localhost:9200/_flush'

are you thinking of anything else?

@gakhov
Contributor Author

gakhov commented Feb 19, 2013

hi @s1monw! I'm not really sure ... I just wanted to make the upgrade process simpler, since I'm a bit lazy .. lol

Currently, to upgrade ES I need to:

  • prevent writing
  • flush indices
  • iterate through all nodes and do backup/stop/update/start (some downtime is acceptable for us, so we don't do the cluster-renaming trick to keep the frontend application running).
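Those steps could be driven by a small script. The sketch below only prints the commands (a dry run), and the `index.blocks.read_only` setting name is an assumption based on the 0.x-era API rather than anything confirmed in this thread:

```shell
#!/bin/sh
# Dry-run: print the cluster-preparation commands for the steps above.
# Setting names are assumptions about the 0.x-era API.
ES_HOST="${ES_HOST:-localhost:9200}"

# Step 1: prevent writing on all indices (assumed dynamic setting name).
step_block_writes() {
    printf "curl -XPUT 'http://%s/_settings' -d '{\"index.blocks.read_only\": true}'\n" "$ES_HOST"
}

# Step 2: flush all indices in one call.
step_flush_all() {
    printf "curl -XPOST 'http://%s/_flush'\n" "$ES_HOST"
}

step_block_writes
step_flush_all
echo "# Step 3 (per node, outside ES): backup data dir, stop, update, start."
```

Step 3 stays in the deployment tooling; only steps 1 and 2 are candidates for a single preparation API call.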

Of course, ES can't help me with the actual upgrade procedure, but I thought it could manage the preparation process.

Right now we update our ES nodes with a deployment script (Fabric and Puppet), so we just need to be sure that the ES cluster (and its indices) is ready for that from a data-state point of view.

In our case, we can't stop the indexing process, but if it receives a write exception, the failed items are rescheduled. So we just need to set index.blocks.read_only before starting the actual upgrade. We don't use automatic shard allocation, so there's no worry about ES starting to reallocate shards when nodes go offline for the upgrade.

Actually, I'm also thinking about the backup procedure (e.g. with rsync), which requires copying the data. Both procedures share some common steps: at least flush and read_only. So I thought it would be very useful to have a standard way to do that, e.g. shortcuts in the ES API.

So, it might be read_only + flush as the backup preparation procedure, and backup + (optionally) shutdown as the upgrade preparation procedure. In some cases, also optimize (e.g. to 1 segment).

Does it make sense for you?

@s1monw
Contributor

s1monw commented Feb 19, 2013

hey @gakhov

well, laziness is good, otherwise there wouldn't be any automation, I guess. I'm not really sure I get this right, so let's clarify here whether we are missing something API-wise. At the current stage, if you are doing a code upgrade of ElasticSearch, you don't necessarily need to prevent writing to ElasticSearch while you upgrade. You can do the entire upgrade procedure without downtime. There are a couple of things you might want to do before starting:

  • Flushing all your indices might be a good idea
  • Prevent shards from being relocated or allocated on other nodes while nodes are going down during the upgrade
  • now you can start upgrading nodes. At this point it might make sense to move all allocated shards away from the machine you are upgrading, using
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}' 
  • now you can simply shut down that node, move it over to the new version & bring it up again
  • continue like this for the rest of the cluster.
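For the second bullet, the era-appropriate way to stop shard movement was a transient cluster setting. The sketch below only prints the commands, and the setting name `cluster.routing.allocation.disable_allocation` is my recollection of the 0.20-era API, so treat it as an assumption:

```shell
#!/bin/sh
# Dry-run: print the commands that toggle cluster-wide shard allocation.
# The setting name is an assumption about the 0.20-era API and may differ.
ES_HOST="${ES_HOST:-localhost:9200}"

disable_allocation_cmd() {
    # $1 = true to disable allocation, false to re-enable it
    printf "curl -XPUT 'http://%s/_cluster/settings' -d '{\"transient\": {\"cluster.routing.allocation.disable_allocation\": %s}}'\n" "$ES_HOST" "$1"
}

disable_allocation_cmd true   # before the rolling upgrade starts
disable_allocation_cmd false  # once the cluster has stabilized again
```

Being transient, such a setting would reset on a full cluster restart, which suits a temporary maintenance window.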

you might also want to look at this gist from Clinton, https://gist.github.com/clintongormley/3888120, which shows a technique that can be used to upgrade.

In general I recommend practicing this on a staging system with more than 2 machines. Another thing I would recommend is to take your time doing this: don't rush, and wait for your cluster to stabilize after upgrading each node.

I can see the problem with backing things up, but then you can simply run a few commands:

# make all indices read-only
curl -XPUT 'localhost:9200/_settings' -d '
{
  "index.blocks.write" : true
}'

# flush data to disk and clean transaction logs
curl -XPOST 'http://localhost:9200/_flush'

# optimize to a single segment -- yet I would not recommend this!!
curl -XPOST 'http://localhost:9200/_optimize'

I think this would still work fine in a client script, so we don't need a dedicated API, right? If we get more sophisticated backup/restore APIs this might change.
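That client script could be as small as the following sketch. Commands are echoed rather than executed so the flow stays visible in tests; swapping the `run` helper makes it live:

```shell
#!/bin/sh
# Sketch of a backup-preparation client script wrapping the calls above.
# Commands are echoed, not executed; swap the run() body to execute for real.
ES_HOST="${ES_HOST:-localhost:9200}"
run() { echo "$@"; }   # live version would be: run() { "$@"; }

prepare_for_backup() {
    # 1. block writes on all indices
    run curl -XPUT "http://$ES_HOST/_settings" -d '{"index.blocks.write": true}'
    # 2. flush data to disk and clean transaction logs
    run curl -XPOST "http://$ES_HOST/_flush"
}

prepare_for_backup
```

The optional optimize step is deliberately left out here, matching the "I would not recommend this" caveat above.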

@gakhov
Contributor Author

gakhov commented Feb 19, 2013

Thank you, @s1monw
I agree that all of this can be done with the existing API. I just noticed that the algorithm contains some steps that need to be done every time, so they could be wrapped in a single call.

Concerning optimize, I remember that "Simon says: optimize is bad for you", but it could still be useful at some point for our time-sliced indices.

I think we can close this ticket for now. It can probably be revisited once more use cases come up (e.g. after ES has a nice backup API).

Anyway, thank you for the help and patience :)

@s1monw
Contributor

s1monw commented Feb 19, 2013

@gakhov I agree, if you have time-sliced indices that don't change anymore, at some point it might totally make sense to call optimize during a quiet period. Yeah, I think there are always use cases, but in general, if you have an incrementally changing index, optimize is not needed IMO.
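For a frozen time-sliced index, the call might look like this dry-run sketch; the index name and the `max_num_segments` parameter are illustrative assumptions, not something confirmed in this thread:

```shell
#!/bin/sh
# Dry-run: print an optimize call for one frozen weekly index.
# 'logs-2013.01' and the max_num_segments parameter are assumptions.
ES_HOST="${ES_HOST:-localhost:9200}"

optimize_cmd() {
    # Collapse the given index down to a single segment.
    printf "curl -XPOST 'http://%s/%s/_optimize?max_num_segments=1'\n" "$ES_HOST" "$1"
}

optimize_cmd logs-2013.01
```

Running this only on slices that will never be written to again sidesteps the "optimize is bad for you" concern for live indices.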

I will close this for now! Hope that helped you get some more clarity.

simon

@s1monw s1monw closed this as completed Feb 19, 2013
@synhershko
Contributor

@s1monw this is a good procedure for standard-case upgrades, but it still doesn't account for upgrading to an incompatible version (0.20 to 0.21, for example), where nodes will fail to communicate with one another and clients compiled against a previous version of ES.jar won't be able to access the cluster.

Is creating a new cluster the only option we have when upgrading to a new major version? If so, could this become an easier upgrade path in the future?

@s1monw
Contributor

s1monw commented Feb 19, 2013

Is creating a new cluster the only option we have when upgrading to a new major version? If so, could this become an easier upgrade path in the future?

yeah, I think at this point you have to gracefully bring your cluster over to a new cluster name, and once you have the majority of the nodes migrated, you bring your clients over and continue with the rest of the machines. I don't see how we can get around this with major API incompatibilities at this point, but there is lots of room for improvement here.
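The cluster-rename dance could be partially scripted. This sketch edits `cluster.name` in a config file, applied node by node (migrate a majority, repoint clients, then finish the rest); the config path and the flat YAML layout are assumptions about a typical install:

```shell
#!/bin/sh
# Sketch: switch one node's cluster.name in its elasticsearch.yml.
# The config location and single-line 'cluster.name:' layout are assumptions.
rename_cluster() {
    # $1 = path to elasticsearch.yml, $2 = new cluster name
    sed -i.bak "s/^cluster\.name:.*/cluster.name: $2/" "$1"
}

# Demo against a temp file so nothing real is touched.
cfg=$(mktemp)
printf 'cluster.name: old-cluster\n' > "$cfg"
rename_cluster "$cfg" new-cluster
cat "$cfg"
```

The `.bak` copy left by sed gives a trivial rollback per node if the migration has to be aborted midway.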
