
Document best practices for how to manage releases in production #1786

Closed
bfirsh opened this issue Jul 29, 2015 · 30 comments

@bfirsh

bfirsh commented Jul 29, 2015

We currently have some lightweight documentation about how to use Compose in production, but this could do with improvements:

  • How to manage releases, particularly when deploying through the Docker Hub (e.g. in development build images, in production use image from Hub); see the sketch after this list
  • A step-by-step guide to deploying a Compose app from dev through to production
  • Some examples of how to deploy apps using init scripts
  • Some examples of how to deploy apps using Swarm
  • How to deploy apps on a single server (particularly useful for internal tools, etc)
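
For the first point, a minimal sketch of the dev-vs-production split; the file names, image tag and port below are illustrative assumptions, not an official recommendation:

# Hypothetical sketch: build locally in development, pull a tagged image from
# Docker Hub in production. File names, image tag and port are illustrative.
cat > docker-compose.yml <<'EOF'
web:
  build: .
  ports:
    - "8000:8000"
EOF

cat > docker-compose.prod.yml <<'EOF'
web:
  image: myorg/myapp:1.2.3
  ports:
    - "8000:8000"
EOF

docker-compose up -d                            # development: build and run locally
docker-compose -f docker-compose.prod.yml pull  # production: fetch the tagged release from the Hub
docker-compose -f docker-compose.prod.yml up -d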

Resources

@funkyfuture

init scripts

and service configs for systemd, since that is what will take over. I've thought of a command to generate these configs.

@amylindburg amylindburg added this to the 1.5.0 milestone Aug 12, 2015
@dnephin

dnephin commented Aug 31, 2015

There's some previous discussion and requests in #93, #1264, #1730, #1035. I've closed those tickets in favour of this one, but linking to them for reference.

@neg3ntropy

I would like some recommendations on how to deploy config files through docker/compose/swarm.
We have a setup (that was recommended by consultants) that makes images with config files and declares volumes to export them. This looks good in principle, but it does not work as expected in a number of cases.

@dnephin

dnephin commented Feb 4, 2016

Putting the configs into a "config" image that just exposes a volume seems like a reasonable way to do it. It'd be great to hear more about the cases where it doesn't work, either here or in a new issue.
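
A minimal sketch of that config-image approach; the image names and paths below are made up:

# Hypothetical sketch of a "config" image that only carries config files in a volume.
cat > Dockerfile.config <<'EOF'
FROM alpine
COPY ./config /etc/myapp
VOLUME /etc/myapp
EOF

docker build -f Dockerfile.config -t myapp-config .

# A data-only container holds the volume; it never needs to run.
docker create --name myapp_config myapp-config

# Application containers mount the config read-only via --volumes-from.
docker run -d --name myapp --volumes-from myapp_config:ro myorg/myapp:latest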

@neg3ntropy

@dnephin If you go down the road of one single configuration container and use --volumes-from, you sacrifice some security (every container sees all the configs), but it looks easy to set up and nice: it's immutable and does not use any host fs path.

Once you operate outside localhost and do not recreate everything at each run, you start learning the subtleties of --volumes-from and compose recreation policies: the config image is updated and its container restarted, but client containers still mount their current volumes unless they are recreated as well for independent reasons. This took a while to notice and left us with a workaround of deleting the old config container whenever the config changes.

Another solution would seem to be avoiding immutability and changing data inside the same volumes, running a cp or similar. At this point it would just be easier to pull the configs from a git repo and skip the image build altogether... which was the stateful solution I originally had in mind. If you want no fixed host path, you need a config data-only container and a config copier.

I am not 100% happy with any of the solutions. Either I am not seeing a better way or maybe some feature is still missing (like some crazy image multiple inheritance or some smarter detection of dependencies when using --volumes-from that I can't figure out).
What this needs, essentially, is to add a build step, an environment layer, without really building a new image.
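
For reference, the workaround described above amounts to something like this (a sketch only; the service names config and app are assumptions):

# Hypothetical sketch: whenever the config image changes, drop the old config
# container and recreate its clients so --volumes-from points at the fresh volume.
docker-compose build config
docker-compose rm -f config
docker-compose up -d --force-recreate config app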

@prcorcoran

I have a suggested solution for supporting the zero-downtime deployment a lot of us want.

Why not simply add a new option to docker-compose.yml like "zero_downtime:" that would work as follows:

web:
  image: sbgc    # (rails)
  restart: always
  links:
    - postgres
    - proxy
    - cache
  zero_downtime: 50    # delay 50 milliseconds before stopping the old container; default would be 0

I run separate containers for nginx, web (rails), postgres and cache (memcached). However, it's the application code in the web container that changes, and that's the only one I need zero downtime on.

$ docker-compose up -d web

During "up" processing that creates the new "web" container, if the zero_downtime option is specified, start up the new container first exactly like scale web=2 would. Then stop and remove sbgc_web_1 like it currently does. Then rename sbgc_web_2 to sbgc_web_1. If a delay was specified (as in the 50 milliseconds example above) it would delay 50 milliseconds to give the new container time to come up before stopping the old one.

If there were 10 web containers already running it would start from the end and work backwards.

This is how I do zero downtime deploys today. Clunky but works: [updated]
$ docker-compose scale web=2 (start new container running as sbgc_web_2)
$ docker stop sbgc_web_1 (stop old container)
$ docker rm sbgc_web_1 (remove old container)

Update: we need a way to rename the sbgc_web_2 container to sbgc_web_1. Thought we could just use 'docker rename sbgc_web_2 sbgc_web_1' which works but then running 'docker-compose scale web=2' will produce sbgc_web_3 instead of sbgc_web_2 as expected.

@davibe

davibe commented Mar 9, 2016

What happens to links if you do that? I guess you need a load balancer container linked to the ones you launch and remove, and you can't restart it (?)

@prcorcoran

The links between containers are fine in the scenario above. Adding a load balancer in front would work, but seems like overkill if we just need to replace a running web container with a new version. I can accomplish that manually by scaling up and stopping the old container, but it leaves the new container numbered at 2. If the internals of docker-compose were changed to accommodate starting the new one first, stopping the old one, and renumbering the new one, I think this would be a pretty good solution.

@davibe

davibe commented Mar 12, 2016

In a real use case you want to wait for the second (newer) service to be ready before considering it healthy. This may include connecting to dbs, performing stuff. It's very application specific. Then you want to wait for connection draining on the older copy before closing it. Again, connection draining and timeouts are application specific too. It could be a bit overkill to add support for all of that to docker-compose.

@prcorcoran

Right, the 2nd container would need time to start up, which could take some time depending on the application. That is why I proposed adding a delay:
zero_downtime: 50 (delay 50 milliseconds before stopping the old container; default would be 0)
As far as stopping the original goes, it wouldn't be any different than what docker-compose stop does currently.

Basically my proposal is just to start the new container first, give it time to come up if needed and then stop and remove the old container. This can be accomplished manually with the docker command line today. The only remaining piece would be to rename the new container. Also possible to do manually today except that docker compose doesn't change the internal number of the container.

@vincetse

Hey folks, I was facing the need for a zero-downtime deployment for a web service today. I tried the scaling approach, which didn't work well for me, before realizing I could do it by extending my app into 2 identical services (named service_a and service_b in my sample repo) and restarting them one at a time. Hope some of you will find this pattern useful.

https://github.com/vincetse/docker-compose-zero-downtime-deployment
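
Roughly, the pattern is two identical services behind a proxy, updated one half at a time. A simplified sketch (not the repo's actual files; image and service names are illustrative):

# Hypothetical sketch of the "two identical services" pattern.
cat > docker-compose.yml <<'EOF'
version: '2'
services:
  service_a:
    image: myorg/myapp:latest
  service_b:
    image: myorg/myapp:latest
  proxy:
    image: nginx:latest
    ports:
      - "80:80"
    depends_on:
      - service_a
      - service_b
EOF

# Roll the release one half at a time so the proxy always has a live backend.
# (Depending on the proxy, a reload may be needed so it re-resolves the recreated container.)
docker-compose pull service_a && docker-compose up -d --no-deps service_a
docker-compose pull service_b && docker-compose up -d --no-deps service_b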

@davibe

davibe commented Mar 29, 2016

It does not work for me. I have added an issue on your repo.

@iantanwx

I just came across this ticket while deciding whether to use compose with flocker and docker swarm, or ECS for scaling/deployment jobs, using the docker cli only for certain ad-hoc cluster management tasks.

I've decided to go with compose to keep things native. I'm not fond of the AWS API, and I think most developers, like me, would rather not mess about with ridiculously nested JSON objects and so on.

I then came across DevOps Toolkit by Viktor Farcic, and he uses a pretty elegant solution to implement blue-green deployments with compose and Jenkins (if you guys use Jenkins). Having tested it in staging, I can say it's pretty effective. Otherwise, it would seem @vincetse has a pretty good solution that doesn't involve much complexity.

@sebglon

sebglon commented May 17, 2016

A very good implementation of rolling upgrades already exists in Rancher:
http://docs.rancher.com/rancher/latest/en/rancher-compose/upgrading/

@zh99998

zh99998 commented Jul 21, 2016

Now that docker swarm is native, there's no need for haproxy/nginx for load balancing, and there are native health check arguments. Is there a more simplified solution?
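
For reference, with swarm mode the rolling update is handled by the engine itself. A minimal sketch (the service and image names below are made up):

# Hypothetical sketch: swarm mode rolling update, no external load balancer needed.
docker service create --name web --replicas 2 --publish 80:8000 myorg/myapp:1.0

# Later, roll out a new release one task at a time, with a delay between tasks.
docker service update --image myorg/myapp:1.1 \
  --update-parallelism 1 --update-delay 10s web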

@oelmekki

oelmekki commented Apr 16, 2017

Update: we need a way to rename the sbgc_web_2 container to sbgc_web_1. Thought we could just use 'docker rename sbgc_web_2 sbgc_web_1' which works but then running 'docker-compose scale web=2' will produce sbgc_web_3 instead of sbgc_web_2 as expected.

If anyone wonders why, that's because of a label that docker-compose adds to the container:

           "Labels": {
                // ...
                "com.docker.compose.container-number": "3",
                // ...
            }

Sadly, it's not yet possible to update labels on running containers.

(also, to save people a bit of time: trying to break docker-compose by using its labels: section to force the value of that label does not work :P )

@oelmekki

oelmekki commented Apr 17, 2017

Ok, I managed to automate zero downtime deploy, thanks @prcorcoran for the guidelines.

I'll describe here in more detail how to perform it when using nginx.

  1. scale your service up, using eg docker-compose scale web=2
  2. wait for the container to be ready, either with a blind timeout or by pinging a specific URL on the container
  3. update nginx upstream list for your domain to only list new container IP
  4. perform an nginx configuration reload (e.g. sudo nginx -s reload; do not do a restart, or it will close active connections)
  5. wait for old container to be done with its running requests (a timeout is fine)
  6. stop the previous container using docker, not docker-compose
  7. remove the previous container using docker, not docker-compose (or not, if you want to be able to rollback)
  8. scale the service down, using eg docker-compose scale web=1

useful commands

To find container ids after scaling up, I use:

docker-compose ps -q <service>

This can be used to find new container IP and to stop and remove old container.

The first id is the old container, the second id is the new one. (Update: the order is not guaranteed; containers have to be inspected to know which one is the oldest.)

To find container creation date:

docker inspect -f '{{.Created}}' <container id>

To find new container IP:

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <container id>
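
Putting the steps and commands above together, a rough sketch might look like this. It assumes a service called web, containers listening on port 3000, and an nginx upstream file at /etc/nginx/conf.d/upstream.conf; all of those are assumptions, and this is not a production-ready script:

# Rough sketch of steps 1-8 above, under the assumptions stated in the lead-in.
OLD_ID=$(docker-compose ps -q web)
docker-compose scale web=2                                            # step 1
NEW_ID=$(docker-compose ps -q web | grep -v "$OLD_ID")
sleep 15                                                              # step 2: blind timeout (or poll a health URL)
NEW_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$NEW_ID")
echo "upstream app { server $NEW_IP:3000; }" | sudo tee /etc/nginx/conf.d/upstream.conf   # step 3
sudo nginx -s reload                                                  # step 4: reload, never restart
sleep 15                                                              # step 5: let the old container drain
docker stop "$OLD_ID" && docker rm "$OLD_ID"                          # steps 6-7
docker-compose scale web=1                                            # step 8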

a few more considerations

As mentioned in previous comments, the number in the container name will keep incrementing. It will be e.g. app_web_1, then app_web_2, then app_web_3, etc. I didn't find that to be a problem (if there's ever a hard limit on this number, a cold restart of the app resets it). I also didn't have to rename containers manually to keep the newest container up; we just have to manually stop the old container.

You can't specify port mapping in your docker-compose file, because then you can't have two containers running at the same time (they would try to bind to the same port). Instead, you need to specify the port in nginx upstream configuration, which means you have to decide about it outside of docker-compose configuration.

The described method works when you only want a single container per service. That being said, it shouldn't be too hard to look at how many containers are running, scale to double that number, then stop/rm that many old containers.
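
A sketch of that generalisation (the service name web is again an assumption):

# Hypothetical sketch of rotating N containers instead of 1: double the scale,
# wait for the new half, point nginx at the new IPs, then retire the old half.
OLD_IDS=$(docker-compose ps -q web)
COUNT=$(echo "$OLD_IDS" | wc -l)
docker-compose scale web=$((COUNT * 2))
# ... wait for the new containers, rewrite the nginx upstream block with their IPs, reload nginx ...
for id in $OLD_IDS; do
  docker stop "$id" && docker rm "$id"
done
docker-compose scale web="$COUNT"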

Obviously, the more services you have to rotate, the more complicated it gets.

@jonesnc

jonesnc commented Aug 22, 2017

@oelmekki The scale command has been deprecated. The recommended way to scale is now:

docker-compose up --scale web=2

@jonesnc

jonesnc commented Aug 22, 2017

@oelmekki also, if the web container has port bindings to the host, won't running scale create a port conflict?

Bind for 0.0.0.0:9010 failed: port is already allocated is the message I get for a container that has the following ports:

ports:
  - 9000:9000
  - 9010:9010

If you have a setup that utilizes nginx, for instance, this probably won't be an issue since the service you're scaling is not the service that has port bindings to the host.

@oelmekki

@jonesnc

also, if the web container has port bindings to the host, won't running scale create a port conflict?

That's why I explicitly mention not to do it :)

In my previous comment:

You can't specify port mapping in your docker-compose file, because then you can't have two containers running at the same time (they would try to bind to the same port). Instead, you need to specify the port in nginx upstream configuration, which means you have to decide about it outside of docker-compose configuration.

--

You can't bind those ports on the host, but you can bind those ports on containers, which each have their own IP. So the job is to find the IP of the new container and replace the old container's IP with it in the nginx upstream configuration. If you don't mind reading golang code, you can see an implementation example here.

@jonesnc

jonesnc commented Aug 23, 2017

@oelmekki oops! That part of your post didn't register in my brain, I guess.

@stale

stale bot commented Oct 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 10, 2019
@stale

stale bot commented Oct 13, 2019

This issue has been automatically marked as not stale anymore due to the recent activity.

@stale stale bot removed the stale label Oct 13, 2019
@ndeloof
Contributor

ndeloof commented Oct 14, 2019

How to manage releases, particularly when deploying through the Docker Hub (e.g. in development build images, in production use image from Hub)

This is one of the main focuses of https://github.com/docker/app, and considering how long this issue has been open without any concrete answer, I think it's better to just close it.

@ndeloof ndeloof closed this as completed Oct 14, 2019
@string-areeb

string-areeb commented Oct 14, 2019

@ndeloof That may be the main focus of docker/app, but there is still no way to do zero downtime or rolling updates, so I think only 1 portion of this bug has been solved by app

In fact, the above-listed methods don't work now. If the image is updated and docker-compose scale web=2 is run, then both containers are recreated, instead of one new container of the new image being started.
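
For what it's worth, the script further down this thread works around that by scaling with --no-recreate, which leaves the existing container untouched so only one new container of the updated image is started. A sketch (the service name web is an assumption):

# Scale up without touching the already-running container.
docker-compose pull web
docker-compose up -d --no-deps --no-recreate --scale web=2 web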

@ndeloof
Contributor

ndeloof commented Oct 14, 2019

You're perfectly right, but the purpose of this issue has been to document those deployment practices, and obviously there's no standard way to achieve this, especially considering the various platforms compose can be used for (single engine, swarm, kubernetes, ecs ...)

@augnustin

augnustin commented Jan 29, 2020

Thanks @oelmekki for your insights. They have been very useful and encouraging, given how little info there is on rolling updates with docker-compose.

I ended up writing the following script, docker_update.sh <service_name>, which seems to work very decently. It relies on the healthcheck command, which is not mandatory (change -f "health=healthy" accordingly) but IMHO cleaner than simply waiting for the container to be up, when it takes a little time to boot (which will be the case if you run e.g. npm install && npm start as a command).

#!/bin/bash
# Zero-downtime update of a single docker-compose service.
# Usage: docker_update.sh <service_name>

cd "$(dirname "$0")/.."

SERVICE_NAME=${1?"Usage: docker_update <SERVICE_NAME>"}

echo "[INIT] Updating docker service $SERVICE_NAME"

# Grab the id and name of the currently running container for this service.
OLD_CONTAINER_ID=$(docker ps --format "table {{.ID}}  {{.Names}}  {{.CreatedAt}}" | grep $SERVICE_NAME | tail -n 1 | awk -F  "  " '{print $1}')
OLD_CONTAINER_NAME=$(docker ps --format "table {{.ID}}  {{.Names}}  {{.CreatedAt}}" | grep $SERVICE_NAME | tail -n 1 | awk -F  "  " '{print $2}')

echo "[INIT] Scaling $SERVICE_NAME up"
docker-compose up -d --no-deps --scale $SERVICE_NAME=2 --no-recreate $SERVICE_NAME

# The new container is the one created after the old one (the "since=" filter).
NEW_CONTAINER_ID=$(docker ps --filter="since=$OLD_CONTAINER_NAME" --format "table {{.ID}}  {{.Names}}  {{.CreatedAt}}" | grep $SERVICE_NAME | tail -n 1 | awk -F  "  " '{print $1}')
NEW_CONTAINER_NAME=$(docker ps --filter="since=$OLD_CONTAINER_NAME" --format "table {{.ID}}  {{.Names}}  {{.CreatedAt}}" | grep $SERVICE_NAME | tail -n 1 | awk -F  "  " '{print $2}')

# Wait until the healthcheck reports the new container as healthy.
until [[ $(docker ps -a -f "id=$NEW_CONTAINER_ID" -f "health=healthy" -q) ]]; do
  echo -ne "\r[WAIT] New instance $NEW_CONTAINER_NAME is not healthy yet ...";
  sleep 1
done
echo ""
echo "[DONE] $NEW_CONTAINER_NAME is ready!"

# Restart nginx so it re-resolves the service name and routes to the new container.
echo "[DONE] Restarting nginx..."
docker-compose restart nginx

# Stop the old container and wait until it has actually exited.
echo -n "[INIT] Killing $OLD_CONTAINER_NAME: "
docker stop $OLD_CONTAINER_ID
until [[ $(docker ps -a -f "id=$OLD_CONTAINER_ID" -f "status=exited" -q) ]]; do
  echo -ne "\r[WAIT] $OLD_CONTAINER_NAME is getting killed ..."
  sleep 1
done
echo ""
echo "[DONE] $OLD_CONTAINER_NAME was stopped."

# Remove the old container and bring the declared scale back to 1.
echo -n "[DONE] Removing $OLD_CONTAINER_NAME: "
docker rm -f $OLD_CONTAINER_ID
echo "[DONE] Scaling down"
docker-compose up -d --no-deps --scale $SERVICE_NAME=1 --no-recreate $SERVICE_NAME

And here's my docker-compose.yml:

  app:
    build: .
    command: /app/server.sh
    healthcheck:
      test: curl -sS http://127.0.0.1:4000 || exit 1
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s
    volumes:
      - ..:/app
    working_dir: /app
  nginx:
    depends_on:
      - app
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - "../nginx:/etc/nginx/conf.d"
      - "/var/log/nginx:/var/log/nginx"

And my nginx.conf:

upstream project_app {
  server app:4000;
}

server {
  listen 80;
  server_name example.com;

  location / {
    proxy_pass http://project_app;
  }
}

Hope it can be useful to some. I wish it would be integrated into docker-compose by default.

@iamareebjamal

Thanks a lot, but since you have bound the app port to a static 4000, only one container, new or old, can bind to it, so new and old containers can't run simultaneously. I need to test it once, since I may be wrong.

@decentral1se

How to manage releases, particularly when deploying through the Docker Hub (e.g. in development build images, in production use image from Hub)

This is one of the main focuses of https://github.com/docker/app, and considering how long this issue has been open without any concrete answer, I think it's better to just close it.

App is gone (docker/roadmap#209). Still wondering if we'll ever have zero-downtime deploys in compose, especially since the future of swarm is still very unclear (docker/roadmap#175).

@nilansaha

When the old container is killed, there is considerable downtime before nginx is reloaded again. Not sure if this has changed now, but after trying many times it was always the case: the app is unreachable when the old container is killed, and comes back up after a while.
