
Deploying updates to production with zero downtime #734

Closed
evgenyneu opened this issue Dec 15, 2014 · 23 comments

Comments

@evgenyneu

Hi, thanks for a very useful tool. How can I deploy updates to my Rails app in production with zero downtime?

Currently I run the following on production, but it causes about 10 seconds of downtime.

sudo fig pull web
sudo fig up -d web

My production fig.yml:

db:
  image: postgres:9.3
  volumes_from:
    - db-data          # mount volumes from the db-data container (Postgres data)
  ports:
    - 5432
web:
  image: myaccount/my_private_repo
  command: bundle exec unicorn -p 3000 -c ./config/unicorn.rb
  volumes_from:
    - gems-2.1         # mount volumes from the gems-2.1 container (installed gems)
  ports:
    - "80:3000"        # host port 80 -> Unicorn on container port 3000
  links:
    - db

Thanks

@gregtap

gregtap commented Dec 16, 2014

Zero downtime cannot be achieved while rebooting a container. You need to either load-balance to another container while updating this one, or update the code in your container and run a kill -HUP on Unicorn.
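
To illustrate the kill -HUP route, a minimal sketch (the container name web_1 and the pid file path are assumptions; note that with preload_app true, Unicorn needs the USR2 re-exec dance instead of HUP to pick up new code):

# gracefully restart Unicorn workers in place
docker exec web_1 sh -c 'kill -HUP $(cat /app/tmp/pids/unicorn.pid)'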

@aanand

aanand commented Dec 19, 2014

Fig doesn't provide a solution for zero-downtime restarts. We shouldn't rule it out, but it's a complicated task and needs a thorough discussion of what functionality would best serve the majority of production cases.

@evgenyneu
Author

Thank you, @coulix and @aanand. I assumed Fig was suitable for production out of the box without extra work (which was silly of me, I admit). Do you think a paragraph or two about this could be useful in the readme? Maybe with links to tutorials and blog posts where people describe their production setups, just to make it clear that Fig cannot do zero-downtime restarts, at least at the moment.

@aanand
Copy link

aanand commented Jan 6, 2015

Since Fig is explicitly a development tool at the moment, I don't feel it's urgent that we address zero-downtime restarts in the docs. As Compose shapes up, it's definitely going to become a concern (see the roadmap), and at that point we'll start to talk about production use more officially.

@baptistedonaux

@evgenyneu Great post! +1

@zacheryph

A couple of separate questions:

  • If Fig is specifically for development, what are the best recommendations for production?

This is possibly a stupid question; yes, I know there are a lot of options out there. I'm pretty new to Docker and still learning the ropes. It'd be nice to be able to do development work and production deployments with configuration files that are as similar as possible, or a common configuration file with overriding files for each environment.

  • I have a load balancer set up for the project; can I recycle web/workers one at a time?

This (at least for me) would solve the very simple zero-downtime case when using Fig in production on a single machine. I noticed there is no way to fig kill web_1, and fig restart web does not recreate the container even after fig build web has rebuilt the image. An example of what I'd like to see happen is below.

# as example: i have 3 web containers running
fig recycle web
# - fig build web - rebuild our image
# - tear down/kill web_1, start new web_1 fresh from new image
# - tear down/kill web_2, start new web_2 fresh from new image
# - tear down/kill web_3, start new web_3 fresh from new image

You could drop the automatic fig build web and have users run it manually beforehand if you wanted, i.e. fig build web && fig recycle web. Maybe also add a -p, --pause option to set a pause between restarts, to give services time to start if you are using Rails or Java or similar.

@fullofcaffeine

+1. There doesn't seem to be a "best-practice" way of doing this; it looks as if not many people are using Docker with Rails yet, at least not in production. The plethora of information out there is hard to digest. It'd be nice to have zero-downtime deployments out of the box with Fig or Compose.

I guess a suitable approach for simple setups is to keep the container running and restart, say, Puma inside it (provided it's not being run in the foreground)? Capistrano could be used to orchestrate that, after updating a Git repo on the host that is shared with the container via a volume. Any thoughts?
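
A sketch of that idea, assuming Puma in cluster mode (the container name app_1 and the pid file path are made up; USR1 triggers Puma's rolling "phased restart", which replaces workers one at a time):

# workers are restarted one by one, so requests keep being served
docker exec app_1 sh -c 'kill -USR1 $(cat /app/tmp/pids/puma.pid)'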

@ndreckshage

@aanand any update on this, by any chance? @fullofcaffeine's solution seems to make sense when using a process manager within the container. It would be great if there were a few recommended strategies.

@neg3ntropy

We are moving our CI and Alpha environments to docker-compose.

Since we like to release any commit from any team there, we need to do dozens of updates in a single work day, most of them affecting only one service.

Restarting one service causes a quick partial downtime, while restarting the whole thing takes about 5-10 minutes.

Since dependency information and update checking (via image pull) are already managed by docker-compose, it seems only logical that some kind of selective restart of services with updated images should be handled by Compose itself.

Is there any way to implement it? (or a better issue to track?)

@dnephin

dnephin commented Nov 12, 2015

To perform a real zero-downtime deployment you need a load balancer, and a tool to add/remove backends from the load balancer as nodes are stopped and started. If the load balancer is restarted as part of the deploy you won't have zero downtime, so it has to live outside the Compose file and be part of the infrastructure.

Since Compose doesn't manage any of that infrastructure for you, a real zero-downtime deployment isn't really possible without some other tooling.

I think for some cases (like dev and staging environments) what you're looking for is very-little-downtime deployments, which is something we can aim for with compose.

In 1.4.x we made "smart recreate" the default. This means that a container is only recreated if it changes, or one of its dependencies changes. In 1.5.0 we added experimental support for the new Docker networks, which removes the need to recreate containers when only their dependencies have changed.

In 1.6.0 we should make the new networking the default, and we can look at doing parallel restarts of all containers, which should make for relatively short downtime.

With the current release I would expect only a few seconds of downtime to recreate containers. Can you tell me more about why it takes 5 to 10 minutes?

Some related issues: #1663, #1264, #1035
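
To illustrate the smart-recreate behavior described above, a minimal sketch (only services whose image or configuration changed are recreated):

docker-compose pull    # fetch newer images while the old containers keep serving
docker-compose up -d   # recreates only changed services; unchanged ones report "... is up-to-date"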

@neg3ntropy

@dnephin I think compose is fast enough in starting everything from scratch.

The wasted time for us is in restarting ALL the applications inside the containers after just ONE was updated. Each app takes about 10-30 seconds (with high CPU) to initialize.

What I would like to have is a selective restart/rebuild of only those containers that have changes (image, dependencies). This results in a short partial downtime rather than a full, longer one.
Something like:
docker-compose pull                    # check for updates to all images
docker-compose up -d --smart-restart   # restart only containers with newer images or changed dependencies

@dnephin

dnephin commented Nov 25, 2015

What I would like to have is a selective restart/rebuild of only those containers that have changes

That logic already exists at the service level. If all the containers for a service have the latest image and config, the service won't be restarted (unless one of its dependencies changes); it will just say "... is up-to-date". As I mentioned, if you use --x-networking it removes the need to recreate when the links change as well.

If you're looking for support at the container level, that was recently requested in #2451
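
For reference, a sketch of the experimental networking mentioned above (the flag shipped with Compose 1.5 and may change in later releases):

docker-compose --x-networking up -d   # containers join a Docker network instead of relying on links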

@alexw23

alexw23 commented Dec 26, 2015

I found that running docker-compose pull first saves on reload time and brings the downtime down to a minimal number of seconds (depending on how long your container takes to boot).

@alexw23

alexw23 commented Dec 27, 2015

I just worked out another flow to achieve zero downtime (mainly for web apps).

Proxy

We use jwilder/nginx-proxy to handle routing to app servers; it will assist us in dynamically routing requests to services.

First Deploy

For the first deploy run docker-compose --project-name=app-0001 up -d.

Rolling Update

We edit the docker-compose.yml with the new image id and run docker-compose --project-name=app-0002 up -d. We now have the new version of the app up and running. The load balancer will already have begun routing requests to it, and since we are using an nginx load balancer we get zero downtime.

If you need to reach a desired scale, you can now run a scale command to bring up resources before you shut down the older version.

Now we can run docker-compose --project-name=app-0001 stop to shut down the previous deploy. (Optionally we can run rm to remove the data, but it might be a good idea to only remove it on the next deploy, i.e. deploy app-0003 up, app-0002 stop, app-0001 rm.)
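
A condensed sketch of the flow above (it assumes the jwilder/nginx-proxy container runs separately and discovers backends automatically, and that docker-compose.yml has already been edited to point at the new image):

docker-compose --project-name=app-0002 up -d         # bring up the new stack; the proxy starts routing to it
docker-compose --project-name=app-0002 scale web=3   # optional: match the old stack's scale
docker-compose --project-name=app-0001 stop          # retire the previous deploy
# keep app-0001 around for instant rollback; rm it on the next deploy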

Truly rolling update

If you have a reason to limit the number of resources running at a given time, you could simply stagger the scale, i.e.:

app-0001 scale web=10
...
app-0001 scale web=8   app-0002 scale web=2
app-0001 scale web=6   app-0002 scale web=4
app-0001 scale web=4   app-0002 scale web=6
app-0001 scale web=2   app-0002 scale web=8
app-0001 scale web=0   app-0002 scale web=10
...
app-0001 stop

Rollback

A rollback is also quite simple: run up on 0001 and stop on 0002.

80/20 deploy

This would be as simple as running scale web=2 on app-0001 and scale web=8 on app-0002.
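
Spelled out with the same hypothetical project names (10 web containers in total):

docker-compose --project-name=app-0001 scale web=2
docker-compose --project-name=app-0002 scale web=8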

Automate Everything

This is also reminiscent of Capistrano's deploy strategy; I might even look at using a similar tool to wrap docker-compose and save rewriting the deploy logic.

@zacheryph

That looks like an awesome way of handling it, @alexw23. I did not know about the --project-name param, but that looks like it cures this problem. It's also simple enough to write a wrapper around for your own deployments.

@ccll

ccll commented Jan 13, 2016

@alexw23 Awesome on the "--project-name"!

@koliyo

koliyo commented Jan 27, 2016

@alexw23 Thanks, very informative!

@flaccid

flaccid commented Jan 31, 2016

cough rancher

@dnephin

dnephin commented Feb 3, 2016

Closing this as a duplicate of #1786.

@gingerlime

Thanks for sharing the process, @alexw23. I have a couple of questions I hope you don't mind clarifying (apologies if I'm missing something obvious, I'm not too familiar with docker/compose yet):

  • Would this approach mean you need a docker-compose file just for the web components (i.e. not including nginx-proxy or any data stores)? Otherwise you end up creating extra copies of those other components when they are not required (for nginx-proxy in particular this might cause a problem, since it publishes port 80, doesn't it?). How do you keep things organized/linked in this case?
  • Do you know if/how this works if the container takes time to warm up? Would nginx-proxy start sending requests even if the container isn't ready to serve them? (e.g. Rails takes some time to load, but as far as the container is concerned, it's up and running.)

If you are able to share some examples or scripts it would be great, and thanks again for sharing your solution!

@zahaim

zahaim commented Sep 28, 2016

Hi guys,

In case someone still needs a rolling-upgrade example, I came up with the following solution for my (rocketchat) application:

for i in $(docker ps -q -f "name=<container_name>_"); do
  docker-compose scale <service_name>=5   # top the service back up to 5 running containers
  docker stop "$i"                        # take one old container out of rotation
  sleep 10                                # give the replacement time to warm up
  docker rm -f "$i"                       # remove it so the next scale starts a fresh one
done

@steobrien

We faced this requirement recently and came up with a simple solution requiring nothing more than:

  • docker-compose
  • nginx
  • a sprinkling of bash

Here's the writeup, hope this can help others: https://engineering.tines.com/blog/simple-zero-downtime-deploys

@game-fuse

If you want zero-downtime rolling updates without Kubernetes or anything extra, just Docker, check out my write-up:

https://medium.com/@mitchmeyer1/zero-downtime-rolling-updates-with-docker-nginx-ruby-on-rails-f854a96310b2

It essentially uses bash scripts to spin up new app containers, switch which containers the web (nginx) container points to, and then remove the old containers after a delay.
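
A hedged sketch of that switch step (the container names, config path, and upstream layout are assumptions, not taken from the write-up):

docker run -d --name app_green my/app:new     # spin up the new app container
# repoint the upstream in the nginx config (assumed bind-mounted into the web container):
sed -i 's/app_blue/app_green/' ./nginx/upstream.conf
docker exec web nginx -s reload               # graceful reload; in-flight requests finish
sleep 30 && docker rm -f app_blue             # remove the old container after a delay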
