
Document best practices for how to manage releases in production #1786

Closed
bfirsh opened this issue Jul 29, 2015 · 30 comments

@bfirsh

bfirsh commented Jul 29, 2015

We currently have some lightweight documentation about how to use Compose in production, but this could do with improvements:

  • How to manage releases, particularly when deploying through the Docker Hub (e.g. in development build images, in production use image from Hub); see the sketch after this list
  • A step-by-step guide to deploying a Compose app from dev through to production
  • Some examples of how to deploy apps using init scripts
  • Some examples of how to deploy apps using Swarm
  • How to deploy apps on a single server (particularly useful for internal tools, etc)
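
For the first point, a minimal sketch of the dev-vs-production split; the file names, image tag and port below are illustrative assumptions, not an official recommendation:

# Hypothetical sketch: build locally in development, pull a tagged image from
# Docker Hub in production. File names, image tag and port are illustrative.
cat > docker-compose.yml <<'EOF'
web:
  build: .
  ports:
    - "8000:8000"
EOF

cat > docker-compose.prod.yml <<'EOF'
web:
  image: myorg/myapp:1.2.3
  ports:
    - "8000:8000"
EOF

docker-compose up -d                            # development: build and run locally
docker-compose -f docker-compose.prod.yml pull  # production: fetch the tagged release from the Hub
docker-compose -f docker-compose.prod.yml up -d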

Resources

@funkyfuture

init scripts

and service configs for systemd, since that is what will take over. I've thought of a command to generate these configs.

@amylindburg amylindburg added this to the 1.5.0 milestone Aug 12, 2015
@dnephin

dnephin commented Aug 31, 2015

There's some previous discussion and requests in #93, #1264, #1730, #1035. I've closed those tickets in favour of this one, but linking to them for reference.

@neg3ntropy

I would like some recommendations on how to deploy config files through docker/compose/swarm.
We have a setup (that was recommended by consultants) that makes images with config files and declares volumes to export them. This looks good in principle, but it does not work as expected in a number of cases.

@dnephin

dnephin commented Feb 4, 2016

Putting the configs into a "config" image that just exposes a volume seems like a reasonable way to do it. It'd be great to hear more about the cases where it doesn't work, either here or in a new issue.
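
A minimal sketch of that config-image approach; the image names and paths below are made up:

# Hypothetical sketch of a "config" image that only carries config files in a volume.
cat > Dockerfile.config <<'EOF'
FROM alpine
COPY ./config /etc/myapp
VOLUME /etc/myapp
EOF

docker build -f Dockerfile.config -t myapp-config .

# A data-only container holds the volume; it never needs to run.
docker create --name myapp_config myapp-config

# Application containers mount the config read-only via --volumes-from.
docker run -d --name myapp --volumes-from myapp_config:ro myorg/myapp:latest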

@neg3ntropy

@dnephin If you go down the road of one single configuration container and use --volumes-from, you sacrifice some security (every container sees all the configs), but it looks easy to set up and nice: it's immutable and does not use any host fs path.

Once you operate outside localhost and do not recreate everything at each run, you start learning the subtleties of --volumes-from and compose recreation policies: the config image is updated and its container restarted, but client containers still mount their current volumes unless they are recreated as well for independent reasons. This took a while to notice and left us with a workaround of deleting the old config container whenever the config changes.

Another solution would seem to be avoiding immutability and changing data inside the same volumes, running a cp or similar. At this point it would just be easier to pull the configs from a git repo and skip the image build altogether... which was the stateful solution I originally had in mind. If you want no fixed host path, you need a config data-only container and a config copier.

I am not 100% happy with any of the solutions. Either I am not seeing a better way or maybe some feature is still missing (like some crazy image multiple inheritance or some smarter detection of dependencies when using --volumes-from that I can't figure out).
What this needs, essentially, is to add a build step, an environment layer, without really building a new image.
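
For reference, the workaround described above amounts to something like this (a sketch only; the service names config and app are assumptions):

# Hypothetical sketch: whenever the config image changes, drop the old config
# container and recreate its clients so --volumes-from points at the fresh volume.
docker-compose build config
docker-compose rm -f config
docker-compose up -d --force-recreate config app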

@prcorcoran

I have a suggested solution for supporting the zero-downtime deployment a lot of us want.

Why not simply add a new option to docker-compose.yml like "zero_downtime:" that would work as follows:

web:
  image: sbgc    # (rails)
  restart: always
  links:
    - postgres
    - proxy
    - cache
  zero_downtime: 50    # delay 50 milliseconds before stopping the old container; default would be 0

I run separate containers for nginx, web (rails), postgres and cache (memcached). However, it's the application code in the web container that changes, and that's the only one I need zero downtime on.

$ docker-compose up -d web

During "up" processing that creates the new "web" container, if the zero_downtime option is specified, start up the new container first exactly like scale web=2 would. Then stop and remove sbgc_web_1 like it currently does. Then rename sbgc_web_2 to sbgc_web_1. If a delay was specified (as in the 50 milliseconds example above) it would delay 50 milliseconds to give the new container time to come up before stopping the old one.

If there were 10 web containers already running it would start from the end and work backwards.

This is how I do zero downtime deploys today. Clunky but works: [updated]
$ docker-compose scale web=2 (start new container running as sbgc_web_2)
$ docker stop sbgc_web_1 (stop old container)
$ docker rm sbgc_web_1 (remove old container)

Update: we need a way to rename the sbgc_web_2 container to sbgc_web_1. Thought we could just use 'docker rename sbgc_web_2 sbgc_web_1' which works but then running 'docker-compose scale web=2' will produce sbgc_web_3 instead of sbgc_web_2 as expected.

@davibe

davibe commented Mar 9, 2016

What happens to links if you do that? I guess you need a load balancer container linked to the ones you launch and remove, and you can't restart it (?)

@prcorcoran

The links between containers are fine in the scenario above. Adding a load balancer in front would work, but seems like overkill if we just need to replace a running web container with a new version. I can accomplish that manually by scaling up and stopping the old container, but it leaves the new container numbered at 2. If the internals of docker-compose were changed to accommodate starting the new one first, stopping the old one, and renumbering the new one, I think this would be a pretty good solution.

@davibe

davibe commented Mar 12, 2016

In a real use case you want to wait for the second (newer) service to be ready before considering it healthy. This may include connecting to dbs, performing stuff. It's very application specific. Then you want to wait for connection draining on the older copy before closing it. Again, connection draining and timeouts are application specific too. It could be a bit overkill to add support for all of that to docker-compose.

@prcorcoran

Right, the 2nd container would need time to start up, which could take some time depending on the application. That is why I proposed adding a delay:
zero_downtime: 50 (delay 50 milliseconds before stopping the old container; default would be 0)
As far as stopping the original goes, it wouldn't be any different than what docker-compose stop does currently.

Basically my proposal is just to start the new container first, give it time to come up if needed and then stop and remove the old container. This can be accomplished manually with the docker command line today. The only remaining piece would be to rename the new container. Also possible to do manually today except that docker compose doesn't change the internal number of the container.

@vincetse

Hey folks, I was facing the need for a zero-downtime deployment for a web service today. I tried the scaling approach, which didn't work well for me, before realizing I could do it by extending my app into 2 identical services (named service_a and service_b in my sample repo) and restarting them one at a time. Hope some of you will find this pattern useful.

https://github.com/vincetse/docker-compose-zero-downtime-deployment
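
Roughly, the pattern is two identical services behind a proxy, updated one half at a time. A simplified sketch (not the repo's actual files; image and service names are illustrative):

# Hypothetical sketch of the "two identical services" pattern.
cat > docker-compose.yml <<'EOF'
version: '2'
services:
  service_a:
    image: myorg/myapp:latest
  service_b:
    image: myorg/myapp:latest
  proxy:
    image: nginx:latest
    ports:
      - "80:80"
    depends_on:
      - service_a
      - service_b
EOF

# Roll the release one half at a time so the proxy always has a live backend.
# (Depending on the proxy, a reload may be needed so it re-resolves the recreated container.)
docker-compose pull service_a && docker-compose up -d --no-deps service_a
docker-compose pull service_b && docker-compose up -d --no-deps service_b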

@davibe

davibe commented Mar 29, 2016

It does not work for me. I have added an issue on your repo.

@iantanwx

I just came across this ticket while deciding whether to use compose with flocker and docker swarm, or ECS for scaling/deployment jobs, using the docker cli only for certain ad-hoc cluster management tasks.

I've decided to go with compose to keep things native. I'm not fond of the AWS API, and I think most developers, like me, would rather not mess about with ridiculously nested JSON objects and so on.

I then came across DevOps Toolkit by Viktor Farcic, and he uses a pretty elegant solution to implement blue-green deployments with compose and Jenkins (if you guys use Jenkins). Having tested it in staging, I can say it's pretty effective. Otherwise, it would seem @vincetse has a pretty good solution that doesn't involve much complexity.

@sebglon

sebglon commented May 17, 2016

A very good implementation of rolling upgrades already exists in Rancher:
http://docs.rancher.com/rancher/latest/en/rancher-compose/upgrading/

@zh99998

zh99998 commented Jul 21, 2016

Now that docker swarm is native, there's no need for haproxy/nginx for load balancing, and there are native health check arguments. Is there a more simplified solution?
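
For reference, with swarm mode the rolling update is handled by the engine itself. A minimal sketch (the service and image names below are made up):

# Hypothetical sketch: swarm mode rolling update, no external load balancer needed.
docker service create --name web --replicas 2 --publish 80:8000 myorg/myapp:1.0

# Later, roll out a new release one task at a time, with a delay between tasks.
docker service update --image myorg/myapp:1.1 \
  --update-parallelism 1 --update-delay 10s web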

@oelmekki

oelmekki commented Apr 16, 2017

Update: we need a way to rename the sbgc_web_2 container to sbgc_web_1. Thought we could just use 'docker rename sbgc_web_2 sbgc_web_1' which works but then running 'docker-compose scale web=2' will produce sbgc_web_3 instead of sbgc_web_2 as expected.

If anyone wonders why, that's because of a label that docker-compose adds to the container:

           "Labels": {
                // ...
                "com.docker.compose.container-number": "3",
                // ...
            }

Sadly, it's not yet possible to update labels on running containers.

(also, to save people a bit of time: trying to break docker-compose by using its labels: section to force the value of that label does not work :P )

@oelmekki

oelmekki commented Apr 17, 2017

Ok, I managed to automate zero downtime deploy, thanks @prcorcoran for the guidelines.

I'll describe here in more detail how to perform it when using nginx.

  1. scale your service up, using eg docker-compose scale web=2
  2. wait for the container to be ready, either with a blind timeout or by pinging a specific URL on the container
  3. update nginx upstream list for your domain to only list new container IP
  4. perform an nginx configuration reload (e.g. sudo nginx -s reload; do not do a restart, or it will close active connections)
  5. wait for old container to be done with its running requests (a timeout is fine)
  6. stop the previous container using docker, not docker-compose
  7. remove the previous container using docker, not docker-compose (or not, if you want to be able to rollback)
  8. scale the service down, using eg docker-compose scale web=1

useful commands

To find container ids after scaling up, I use:

docker-compose ps -q <service>

This can be used to find new container IP and to stop and remove old container.

The first id is the old container, the second id is the new one. (Update: the order is not guaranteed; containers have to be inspected to know which one is the oldest.)

To find container creation date:

docker inspect -f '{{.Created}}' <container id>

To find new container IP:

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <container id>
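
Putting the steps and commands above together, a rough sketch might look like this. It assumes a service called web, containers listening on port 3000, and an nginx upstream file at /etc/nginx/conf.d/upstream.conf; all of those are assumptions, and this is not a production-ready script:

# Rough sketch of steps 1-8 above, under the assumptions stated in the lead-in.
OLD_ID=$(docker-compose ps -q web)
docker-compose scale web=2                                            # step 1
NEW_ID=$(docker-compose ps -q web | grep -v "$OLD_ID")
sleep 15                                                              # step 2: blind timeout (or poll a health URL)
NEW_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$NEW_ID")
echo "upstream app { server $NEW_IP:3000; }" | sudo tee /etc/nginx/conf.d/upstream.conf   # step 3
sudo nginx -s reload                                                  # step 4: reload, never restart
sleep 15                                                              # step 5: let the old container drain
docker stop "$OLD_ID" && docker rm "$OLD_ID"                          # steps 6-7
docker-compose scale web=1                                            # step 8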

a few more considerations

As mentioned in previous comments, the number in the container name will keep incrementing. It will be e.g. app_web_1, then app_web_2, then app_web_3, etc. I didn't find that to be a problem (if there's ever a hard limit on this number, a cold restart of the app resets it). I also didn't have to rename containers manually to keep the newest container up; we just have to manually stop the old container.

You can't specify port mapping in your docker-compose file, because then you can't have two containers running at the same time (they would try to bind to the same port). Instead, you need to specify the port in nginx upstream configuration, which means you have to decide about it outside of docker-compose configuration.

The described method works when you only want a single container per service. That being said, it shouldn't be too hard to look at how many containers are running, scale to double that number, then stop/rm that many old containers.
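
A sketch of that generalisation (the service name web is again an assumption):

# Hypothetical sketch of rotating N containers instead of 1: double the scale,
# wait for the new half, point nginx at the new IPs, then retire the old half.
OLD_IDS=$(docker-compose ps -q web)
COUNT=$(echo "$OLD_IDS" | wc -l)
docker-compose scale web=$((COUNT * 2))
# ... wait for the new containers, rewrite the nginx upstream block with their IPs, reload nginx ...
for id in $OLD_IDS; do
  docker stop "$id" && docker rm "$id"
done
docker-compose scale web="$COUNT"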

Obviously, the more services you have to rotate, the more complicated it gets.

@jonesnc

jonesnc commented Aug 22, 2017

@oelmekki The scale command has been deprecated. The recommended way to scale is now:

docker-compose up --scale web=2

@jonesnc

jonesnc commented Aug 22, 2017

@oelmekki also, if the web container has port bindings to the host, won't running scale create a port conflict?

Bind for 0.0.0.0:9010 failed: port is already allocated is the message I get for a container that has the following ports:

ports:
  - 9000:9000
  - 9010:9010

If you have a setup that utilizes nginx, for instance, this probably won't be an issue since the service you're scaling is not the service that has port bindings to the host.

@oelmekki

@jonesnc

also, if the web container has port bindings to the host, won't running scale create a port conflict?

That's why I explicitly mention not to do it :)

In my previous comment:

You can't specify port mapping in your docker-compose file, because then you can't have two containers running at the same time (they would try to bind to the same port). Instead, you need to specify the port in nginx upstream configuration, which means you have to decide about it outside of docker-compose configuration.

--

You can't bind those ports on the host, but you can bind those ports on containers, which each have their own IP. So the job is to find the IP of the new container and replace the old container's IP with it in the nginx upstream configuration. If you don't mind reading golang code, you can see an implementation example here.

@jonesnc

jonesnc commented Aug 23, 2017

@oelmekki oops! That part of your post didn't register in my brain, I guess.

@stale

stale bot commented Oct 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 10, 2019
@stale

stale bot commented Oct 13, 2019

This issue has been automatically marked as not stale anymore due to the recent activity.

@stale stale bot removed the stale label Oct 13, 2019
@ndeloof
Contributor

ndeloof commented Oct 14, 2019

How to manage releases, particularly when deploying through the Docker Hub (e.g. in development build images, in production use image from Hub)

This is one of the main focuses of https://github.com/docker/app, and considering how long this issue has been open without any concrete answer, I think it's better to just close it.

@ndeloof ndeloof closed this as completed Oct 14, 2019
@string-areeb

string-areeb commented Oct 14, 2019

@ndeloof That may be the main focus of docker/app, but there is still no way to do zero downtime or rolling updates, so I think only 1 portion of this bug has been solved by app

In fact, the above-listed methods don't work now. If the image is updated and docker-compose scale web=2 is run, then both containers are recreated, instead of one new container of the new image being started.
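
For what it's worth, the script further down this thread works around that by scaling with --no-recreate, which leaves the existing container untouched so only one new container of the updated image is started. A sketch (the service name web is an assumption):

# Scale up without touching the already-running container.
docker-compose pull web
docker-compose up -d --no-deps --no-recreate --scale web=2 web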

@ndeloof
Contributor

ndeloof commented Oct 14, 2019

You're perfectly right, but the purpose of this issue has been to document those deployment practices, and obviously there's no standard way to achieve this, especially considering the various platforms compose can be used for (single engine, swarm, kubernetes, ecs ...)

@augnustin

augnustin commented Jan 29, 2020

Thanks @oelmekki for your insights. They have been very useful and encouraging, given how little info there is on rolling updates with docker-compose.

I ended up writing the following script, docker_update.sh <service_name>, which seems to work very decently. It relies on the healthcheck command, which is not mandatory (change -f "health=healthy" accordingly) but IMHO cleaner than simply waiting for the container to be up, when it takes a little time to boot (which will be the case if you run e.g. npm install && npm start as a command).

#!/bin/bash
# Zero-downtime update of a single docker-compose service.
# Usage: docker_update.sh <service_name>

cd "$(dirname "$0")/.."

SERVICE_NAME=${1?"Usage: docker_update <SERVICE_NAME>"}

echo "[INIT] Updating docker service $SERVICE_NAME"

# Grab the id and name of the currently running container for this service.
OLD_CONTAINER_ID=$(docker ps --format "table {{.ID}}  {{.Names}}  {{.CreatedAt}}" | grep $SERVICE_NAME | tail -n 1 | awk -F  "  " '{print $1}')
OLD_CONTAINER_NAME=$(docker ps --format "table {{.ID}}  {{.Names}}  {{.CreatedAt}}" | grep $SERVICE_NAME | tail -n 1 | awk -F  "  " '{print $2}')

echo "[INIT] Scaling $SERVICE_NAME up"
docker-compose up -d --no-deps --scale $SERVICE_NAME=2 --no-recreate $SERVICE_NAME

# The new container is the one created after the old one (the "since=" filter).
NEW_CONTAINER_ID=$(docker ps --filter="since=$OLD_CONTAINER_NAME" --format "table {{.ID}}  {{.Names}}  {{.CreatedAt}}" | grep $SERVICE_NAME | tail -n 1 | awk -F  "  " '{print $1}')
NEW_CONTAINER_NAME=$(docker ps --filter="since=$OLD_CONTAINER_NAME" --format "table {{.ID}}  {{.Names}}  {{.CreatedAt}}" | grep $SERVICE_NAME | tail -n 1 | awk -F  "  " '{print $2}')

# Wait until the healthcheck reports the new container as healthy.
until [[ $(docker ps -a -f "id=$NEW_CONTAINER_ID" -f "health=healthy" -q) ]]; do
  echo -ne "\r[WAIT] New instance $NEW_CONTAINER_NAME is not healthy yet ...";
  sleep 1
done
echo ""
echo "[DONE] $NEW_CONTAINER_NAME is ready!"

# Restart nginx so it re-resolves the service name and routes to the new container.
echo "[DONE] Restarting nginx..."
docker-compose restart nginx

# Stop the old container and wait until it has actually exited.
echo -n "[INIT] Killing $OLD_CONTAINER_NAME: "
docker stop $OLD_CONTAINER_ID
until [[ $(docker ps -a -f "id=$OLD_CONTAINER_ID" -f "status=exited" -q) ]]; do
  echo -ne "\r[WAIT] $OLD_CONTAINER_NAME is getting killed ..."
  sleep 1
done
echo ""
echo "[DONE] $OLD_CONTAINER_NAME was stopped."

# Remove the old container and bring the declared scale back to 1.
echo -n "[DONE] Removing $OLD_CONTAINER_NAME: "
docker rm -f $OLD_CONTAINER_ID
echo "[DONE] Scaling down"
docker-compose up -d --no-deps --scale $SERVICE_NAME=1 --no-recreate $SERVICE_NAME

And here's my docker-compose.yml:

  app:
    build: .
    command: /app/server.sh
    healthcheck:
      test: curl -sS http://127.0.0.1:4000 || exit 1
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s
    volumes:
      - ..:/app
    working_dir: /app
  nginx:
    depends_on:
      - app
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - "../nginx:/etc/nginx/conf.d"
      - "/var/log/nginx:/var/log/nginx"

And my nginx.conf:

upstream project_app {
  server app:4000;
}

server {
  listen 80;
  server_name example.com;

  location / {
    proxy_pass http://project_app;
  }
}

Hope it can be useful to some. I wish it would be integrated into docker-compose by default.

@iamareebjamal

Thanks a lot, but since you have bound the app port to a static 4000, only one container, new or old, can bind to it, so new and old containers can't run simultaneously. I need to test it once, since I may be wrong.

@decentral1se

How to manage releases, particularly when deploying through the Docker Hub (e.g. in development build images, in production use image from Hub)

This is one of the main focuses of https://github.com/docker/app, and considering how long this issue has been open without any concrete answer, I think it's better to just close it.

App is gone (docker/roadmap#209). Still wondering if we'll ever have zero-downtime deploys in compose, especially since the future of swarm is still very unclear (docker/roadmap#175).

@nilansaha

When the old container is killed, there is considerable downtime before nginx is reloaded again. Not sure if this has changed now, but after trying many times it was always the case: the app is unreachable when the old container is killed, and comes back up after a while.
