
Deploying updates to production with zero downtime #734

Closed
evgenyneu opened this issue Dec 15, 2014 · 23 comments

Comments

@evgenyneu

Hi, thanks for a very useful tool. How can I deploy updates to my Rails app in production with zero downtime?

Currently I run the following on production, but it causes about 10 seconds of downtime.

sudo fig pull web
sudo fig up -d web

My production fig.yml:

db:
  image: postgres:9.3
  volumes_from:
    - db-data          # mount volumes from the db-data container (Postgres data)
  ports:
    - 5432
web:
  image: myaccount/my_private_repo
  command: bundle exec unicorn -p 3000 -c ./config/unicorn.rb
  volumes_from:
    - gems-2.1         # mount volumes from the gems-2.1 container (installed gems)
  ports:
    - "80:3000"        # host port 80 -> Unicorn on container port 3000
  links:
    - db

Thanks

@gregtap

gregtap commented Dec 16, 2014

Zero downtime cannot be achieved while rebooting a container. You need to either load-balance to another container while updating this one, or update the code in your container and run a kill -HUP on Unicorn.
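
To illustrate the kill -HUP route, a minimal sketch (the container name web_1 and the pid file path are assumptions; note that with preload_app true, Unicorn needs the USR2 re-exec dance instead of HUP to pick up new code):

# gracefully restart Unicorn workers in place
docker exec web_1 sh -c 'kill -HUP $(cat /app/tmp/pids/unicorn.pid)'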

@aanand

aanand commented Dec 19, 2014

Fig doesn't provide a solution for zero-downtime restarts. We shouldn't rule it out, but it's a complicated task and needs a thorough discussion of what functionality would best serve the majority of production cases.

@evgenyneu
Author

Thank you, @coulix and @aanand. I assumed Fig was suitable for production out of the box without extra work (which was silly of me, I admit). Do you think a paragraph or two about this could be useful in the readme? Maybe with links to tutorials and blog posts where people describe their production setups, just to make it clear that Fig cannot do zero-downtime restarts, at least at the moment.

@aanand
Copy link

aanand commented Jan 6, 2015

Since Fig is explicitly a development tool at the moment, I don't feel it's urgent that we address zero-downtime restarts in the docs. As Compose shapes up, it's definitely going to become a concern (see the roadmap), and at that point we'll start to talk about production use more officially.

@baptistedonaux

@evgenyneu Great post! +1

@zacheryph

A couple of separate questions:

  • If Fig is specifically for development, what are the best recommendations for production?

This is possibly a stupid question; yes, I know there are a lot of options out there. I'm pretty new to Docker and still learning the ropes. It'd be nice to be able to do development work and production deployments with configuration files that are as similar as possible, or a common configuration file with overriding files for each environment.

  • I have a load balancer set up for the project; can I recycle web/workers one at a time?

This (at least for me) would solve the very simple zero-downtime case when using Fig in production on a single machine. I noticed there is no way to fig kill web_1, and fig restart web does not recreate the container even after fig build web has rebuilt the image. An example of what I'd like to see happen is below.

# as example: i have 3 web containers running
fig recycle web
# - fig build web - rebuild our image
# - tear down/kill web_1, start new web_1 fresh from new image
# - tear down/kill web_2, start new web_2 fresh from new image
# - tear down/kill web_3, start new web_3 fresh from new image

You could drop the automatic fig build web and have users run it manually beforehand if you wanted, i.e. fig build web && fig recycle web. Maybe also add a -p, --pause option to set a pause between restarts, to give services time to start if you are using Rails or Java or similar.

@fullofcaffeine

+1. There doesn't seem to be a "best-practice" way of doing this; it looks as if not many people are using Docker with Rails yet, at least not in production. The plethora of information out there is hard to digest. It'd be nice to have zero-downtime deployments out of the box with Fig or Compose.

I guess a suitable approach for simple setups is to keep the container running and restart, say, Puma inside it (provided it's not being run in the foreground)? Capistrano could be used to orchestrate that, after updating a Git repo on the host that is shared with the container via a volume. Any thoughts?
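
A sketch of that idea, assuming Puma in cluster mode (the container name app_1 and the pid file path are made up; USR1 triggers Puma's rolling "phased restart", which replaces workers one at a time):

# workers are restarted one by one, so requests keep being served
docker exec app_1 sh -c 'kill -USR1 $(cat /app/tmp/pids/puma.pid)'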

@ndreckshage

@aanand any update on this, by any chance? @fullofcaffeine's solution seems to make sense when using a process manager within the container. It would be great if there were a few recommended strategies.

@neg3ntropy

We are moving our CI and Alpha environments to docker-compose.

Since we like to release any commit from any team there, we need to do dozens of updates in a single work day, most of them affecting only one service.

Restarting one service causes a quick partial downtime, while restarting the whole thing takes about 5-10 minutes.

Since dependency information and update checking (via image pull) are already managed by docker-compose, it seems only logical that some kind of selective restart of services with updated images should be handled by Compose itself.

Is there any way to implement it? (or a better issue to track?)

@dnephin

dnephin commented Nov 12, 2015

To perform a real zero-downtime deployment you need a load balancer, and a tool to add/remove backends from the load balancer as nodes are stopped and started. If the load balancer is restarted as part of the deploy you won't have zero downtime, so it has to live outside the Compose file and be part of the infrastructure.

Since Compose doesn't manage any of that infrastructure for you, a real zero-downtime deployment isn't really possible without some other tooling.

I think for some cases (like dev and staging environments) what you're looking for is very-little-downtime deployments, which is something we can aim for with compose.

In 1.4.x we made "smart recreate" the default. This means that a container is only recreated if it changes, or one of its dependencies changes. In 1.5.0 we added experimental support for the new Docker networks, which removes the need to recreate containers when only their dependencies have changed.

In 1.6.0 we should make the new networking the default, and we can look at doing parallel restarts of all containers, which should make for relatively short downtime.

With the current release I would expect only a few seconds of downtime to recreate containers. Can you tell me more about why it takes 5 to 10 minutes?

Some related issues: #1663, #1264, #1035
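
To illustrate the smart-recreate behavior described above, a minimal sketch (only services whose image or configuration changed are recreated):

docker-compose pull    # fetch newer images while the old containers keep serving
docker-compose up -d   # recreates only changed services; unchanged ones report "... is up-to-date"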

@neg3ntropy

@dnephin I think compose is fast enough in starting everything from scratch.

The wasted time for us is in restarting ALL the applications inside the containers after just ONE was updated. Each app takes about 10-30 seconds (with high CPU) to initialize.

What I would like to have is a selective restart/rebuild of only those containers that have changes (image, dependencies). This results in a short partial downtime rather than a full, longer one.
Something like:
docker-compose pull                    # check for updates to all images
docker-compose up -d --smart-restart   # restart only containers with newer images or changed dependencies

@dnephin

dnephin commented Nov 25, 2015

What I would like to have is a selective restart/rebuild of only those containers that have changes

That logic already exists at the service level. If all the containers for a service have the latest image and config, the service won't be restarted (unless one of its dependencies changes); it will just say "... is up-to-date". As I mentioned, if you use --x-networking it removes the need to recreate when the links change as well.

If you're looking for support at the container level, that was recently requested in #2451
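
For reference, a sketch of the experimental networking mentioned above (the flag shipped with Compose 1.5 and may change in later releases):

docker-compose --x-networking up -d   # containers join a Docker network instead of relying on links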

@alexw23

alexw23 commented Dec 26, 2015

I found that running docker-compose pull first saves on reload time and brings the downtime down to a minimal number of seconds (depending on how long your container takes to boot).

@alexw23

alexw23 commented Dec 27, 2015

I just worked out another flow to achieve zero downtime (mainly for web apps).

Proxy

We use jwilder/nginx-proxy to handle routing to app servers; it will assist us in dynamically routing requests to services.

First Deploy

For the first deploy run docker-compose --project-name=app-0001 up -d.

Rolling Update

We edit the docker-compose.yml with the new image id and run docker-compose --project-name=app-0002 up -d. We now have the new version of the app up and running. The load balancer will already have begun routing requests to it, and since we are using an nginx load balancer we get zero downtime.

If you need to reach a desired scale, you can now run a scale command to bring up resources before you shut down the older version.

Now we can run docker-compose --project-name=app-0001 stop to shut down the previous deploy. (Optionally we can run rm to remove the data, but it might be a good idea to only remove it on the next deploy, i.e. deploy app-0003 up, app-0002 stop, app-0001 rm.)
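
A condensed sketch of the flow above (it assumes the jwilder/nginx-proxy container runs separately and discovers backends automatically, and that docker-compose.yml has already been edited to point at the new image):

docker-compose --project-name=app-0002 up -d         # bring up the new stack; the proxy starts routing to it
docker-compose --project-name=app-0002 scale web=3   # optional: match the old stack's scale
docker-compose --project-name=app-0001 stop          # retire the previous deploy
# keep app-0001 around for instant rollback; rm it on the next deploy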

Truly rolling update

If you have a reason to limit the number of resources running at a given time, you could simply stagger the scale, i.e.:

app-0001 scale web=10
...
app-0001 scale web=8   app-0002 scale web=2
app-0001 scale web=6   app-0002 scale web=4
app-0001 scale web=4   app-0002 scale web=6
app-0001 scale web=2   app-0002 scale web=8
app-0001 scale web=0   app-0002 scale web=10
...
app-0001 stop

Rollback

A rollback is also quite simple: run up on 0001 and stop on 0002.

80/20 deploy

This would be as simple as running scale web=2 on app-0001 and scale web=8 on app-0002.
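
Spelled out with the same hypothetical project names (10 web containers in total):

docker-compose --project-name=app-0001 scale web=2
docker-compose --project-name=app-0002 scale web=8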

Automate Everything

This is also reminiscent of Capistrano's deploy strategy; I might even look at using a similar tool to wrap docker-compose and save rewriting the deploy logic.

@zacheryph

That looks like an awesome way of handling it, @alexw23. I did not know about the --project-name param, but that looks like it cures this problem. It's also simple enough to write a wrapper around for your own deployments.

@ccll

ccll commented Jan 13, 2016

@alexw23 Awesome on the "--project-name"!

@koliyo

koliyo commented Jan 27, 2016

@alexw23 Thanks, very informative!

@flaccid

flaccid commented Jan 31, 2016

cough rancher

@dnephin

dnephin commented Feb 3, 2016

Closing this as a duplicate of #1786.

@gingerlime

Thanks for sharing the process, @alexw23. I have a couple of questions I hope you don't mind clarifying (apologies if I'm missing something obvious, I'm not too familiar with docker/compose yet):

  • Would this approach mean you need a docker-compose file just for the web components (i.e. not including nginx-proxy or any data stores)? Otherwise you end up creating extra copies of those other components when they are not required (for nginx-proxy in particular this might cause a problem, since it publishes port 80, doesn't it?). How do you keep things organized/linked in this case?
  • Do you know if/how this works if the container takes time to warm up? Would nginx-proxy start sending requests even if the container isn't ready to serve them? (e.g. Rails takes some time to load, but as far as the container is concerned, it's up and running.)

If you are able to share some examples or scripts it would be great, and thanks again for sharing your solution!

@zahaim

zahaim commented Sep 28, 2016

Hi guys,

In case someone still needs a rolling-upgrade example, I came up with the following solution for my (rocketchat) application:

for i in $(docker ps -q -f "name=<container_name>_"); do
  docker-compose scale <service_name>=5   # top the service back up to 5 running containers
  docker stop "$i"                        # take one old container out of rotation
  sleep 10                                # give the replacement time to warm up
  docker rm -f "$i"                       # remove it so the next scale starts a fresh one
done

@steobrien

We faced this requirement recently and came up with a simple solution requiring nothing more than:

  • docker-compose
  • nginx
  • a sprinkling of bash

Here's the writeup, hope this can help others: https://engineering.tines.com/blog/simple-zero-downtime-deploys

@game-fuse

If you want zero-downtime rolling updates without Kubernetes or anything extra, just Docker, check out my write-up:

https://medium.com/@mitchmeyer1/zero-downtime-rolling-updates-with-docker-nginx-ruby-on-rails-f854a96310b2

It essentially uses bash scripts to spin up new app containers, switch which containers the web (nginx) container points to, and then remove the old containers after a delay.
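
A hedged sketch of that switch step (the container names, config path, and upstream layout are assumptions, not taken from the write-up):

docker run -d --name app_green my/app:new     # spin up the new app container
# repoint the upstream in the nginx config (assumed bind-mounted into the web container):
sed -i 's/app_blue/app_green/' ./nginx/upstream.conf
docker exec web nginx -s reload               # graceful reload; in-flight requests finish
sleep 30 && docker rm -f app_blue             # remove the old container after a delay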
