[Improvement] Blue/Green deployment #914

Open
samber opened this Issue Nov 29, 2016 · 16 comments

@samber (Contributor) commented Nov 29, 2016

Hey guys!

What about an API route to update the constraints of a local Traefik instance?

Example:

# traefik.toml
[consulCatalog]
  endpoint = "127.0.0.1:8500"
  constraints = []
# replace all constraints
$ curl -X PUT '/api/providers/consulCatalog/constraints' -d '["tag==api-us-east", "tag!=api-blue"]'
# set/erase one constraint
$ curl -X PATCH '/api/providers/consulCatalog/constraints' -d 'tag!=api-blue'
# remove one constraint
$ curl -X DELETE '/api/providers/consulCatalog/constraints/tag!=api-blue'
# flush constraints
$ curl -X DELETE '/api/providers/consulCatalog/constraints'
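
The semantics of those routes could be sketched as an in-memory constraint set (a hypothetical model of the proposed API, not existing Traefik behavior):

```python
class ConstraintStore:
    """In-memory sketch of the proposed constraints API semantics."""

    def __init__(self, constraints=None):
        self.constraints = list(constraints or [])

    def put(self, constraints):
        # PUT /constraints: replace the whole constraint list
        self.constraints = list(constraints)

    def patch(self, constraint):
        # PATCH /constraints: set a single constraint if not already present
        if constraint not in self.constraints:
            self.constraints.append(constraint)

    def delete(self, constraint=None):
        # DELETE /constraints/<c>: remove one; DELETE /constraints: flush all
        if constraint is None:
            self.constraints = []
        elif constraint in self.constraints:
            self.constraints.remove(constraint)
```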
@samber (Contributor) commented Nov 29, 2016

It would be even better to have a custom flag in the provider configuration, dedicated to blue/green deployment:

  • Backend configuration from kv store:
    • container 1 /traefik/backends/etcd/my-service/servers/my-container-1/bluegreen = v4.2
    • container 2 /traefik/backends/etcd/my-service/servers/my-container-2/bluegreen = v5.0
  • Frontend configuration of a service from Traefik api or backends:
    • /traefik/frontends/my-service/bluegreen = v4.2
    • Then /traefik/frontends/my-service/bluegreen = v5.0
    • To rollback: /traefik/frontends/my-service/bluegreen = v4.2
  • UI: <select> button to switch bluegreen mode.
@WTFKr0 (Contributor) commented Nov 29, 2016

I like that

@jsierles commented Nov 30, 2016

This looks promising. Is there a way to do blue/green deployment today, without this API?

@emilevauge (Member) commented Dec 2, 2016

I like this idea of a constraints API. @samber, by any chance, would you work on this?

@JavaJeffG commented Jun 12, 2017

Support for Blue/Green deployments would be great. We have to support many stateful legacy applications, meaning we also need Traefik's support of sticky sessions. It would be great if there was a way with Traefik to (assuming old version is 'blue'):

  • Manage a new 'green' version of a service.

  • Only allow special traffic (maybe with a certain URL parameter) to hit the new 'green' version for testing purposes.

  • Support a request so that all new traffic goes to the new 'green' version of the service, while continuing to send existing 'sticky' traffic to the old 'blue' version of the service.

  • Support a subsequent request so that all traffic is sent to the new 'green' service.
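
A minimal sketch of that phased routing decision (hypothetical names; the `canary` request marker and the sticky backend value are assumptions for illustration, not Traefik features):

```python
def choose_backend(phase, sticky_backend=None, canary_request=False):
    """Pick 'blue' or 'green' for one request.

    phase 1: green is only reachable via the special canary marker
             (e.g. a URL parameter), everyone else stays on blue
    phase 2: new traffic goes to green, existing sticky sessions
             stay on blue
    phase 3: all traffic goes to green
    """
    if phase == 1:
        return "green" if canary_request else "blue"
    if phase == 2:
        return "blue" if sticky_backend == "blue" else "green"
    return "green"  # phase 3
```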

I would not be surprised if there are lots of teams who have to support stateful applications, who would really like to be using blue/green deployments.

Thanks.

@bintut commented Jul 10, 2017

I am requesting a feature to support blue/green deployment when using Docker swarm mode as its backend configuration.

The idea is to support a label for source IP address(es), an IP address range or subnet (or perhaps geolocation), and forward those users' traffic to a backend server intended for the green deployment. Say serviceA is running version 1.0.0 in the blue environment (active) on overlay network A; another instance of serviceA is also running, but on version 1.0.1 in the green environment, attached to both overlay networks A and B, and only for selected users whose IP addresses match the label.

Thank you in advance.

@bintut commented Jul 22, 2017

In Docker swarm mode backend, ServiceA and ServiceB are basically the same applications with different Docker image tags/versions but ServiceB comes maybe with additional feature and/or minor fixes.

  • Blue: ServiceA on v1.0.0 (active as catch all)
  • Green: ServiceB on v1.0.1 (active selectively if traefik.backend.allow.ipaddr, traefik.backend.allow.network, and/or traefik.backend.allow.country labels is/are present)
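
A sketch of how such a label-driven source check could work, using Python's stdlib ipaddress module; the traefik.backend.allow.* label names are the ones proposed above, not existing Traefik options:

```python
import ipaddress

def allowed_for_green(client_ip, allow_ipaddrs=(), allow_networks=()):
    """Return True if this client should be routed to the green service.

    allow_ipaddrs and allow_networks model the values of the proposed
    traefik.backend.allow.ipaddr / traefik.backend.allow.network labels.
    """
    ip = ipaddress.ip_address(client_ip)
    # Exact source-address match
    if any(ip == ipaddress.ip_address(a) for a in allow_ipaddrs):
        return True
    # Source network / subnet match
    return any(ip in ipaddress.ip_network(n) for n in allow_networks)
```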
@bintut commented Jul 23, 2017

The Nomad project has support for blue/green + canary deployments. However, it is not stated there how they decide whose traffic will be forwarded to the canary version, or maybe I'm not yet aware of it. I hope that in Traefik, using the Docker swarm mode backend, the source IP address(es), source network and/or source geolocation will be supported to decide whose traffic will be forwarded to the canary version.

@mustafayildirim commented Sep 19, 2017

It would be great 👍

@BastienM commented Sep 20, 2017

I managed to come up with a partial workaround while the feature is being worked on.

So far I am using a combination of the Consul and File endpoints, but it lacks auto-reloading (watch = true is not working for me) and cross-endpoint communication.

Target workflow

  1. Deploy containers
  2. Update config.toml with correct backend to use for production/preproduction
  3. traefik reloads the configuration
  4. Static frontends now point to the correct backends

(step 2 could be automated via a CI job)

Current problem(s)

  • Cannot use Consul + Consul KV (KV entries are being ignored)
  • Cannot use Consul + Etcd (traefik fails to choose an endpoint)
  • Cannot reference a backend from another endpoint (traefik does not recognize the backend)
  • watch = true does not reload traefik's configuration on changes

Suggestion(s)

  • Allow frontends to reference backends across endpoints
  • Allow usage of Consul's catalog and a KV store (KV entries can override settings or at least create frontends/backends aliases)

docker-compose.yml

version: '2'
services:
  consul:
    image: consul:latest
    container_name: consul
    networks:
      - discovery
    ports:
      - "8400:8400"
      - "8500:8500"
      - "172.17.0.1:53:8600/udp"
    labels:
      SERVICE_IGNORE: "true"

  registrator:
    image: gliderlabs/registrator:latest
    container_name: registrator
    privileged: true
    networks:
      - discovery
    depends_on:
      - consul
    command: -internal=true consul://consul:8500
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    labels:
      SERVICE_IGNORE: "true"
  
  proxy:
    image: traefik
    container_name: proxy
    command: --web --file --consul --consul.endpoint=consul:8500 --consulcatalog.endpoint=consul:8500 --logLevel=DEBUG
    networks:
      - internet-gateway
      - discovery
      - proxy
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    depends_on:
      - consul
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - $PWD/config.toml:/traefik.toml
    labels:
      SERVICE_IGNORE: "true"

  app-blue:
    image: emilevauge/whoami
    networks:
      - proxy
    expose:
      - 80
    labels:
      SERVICE_NAME: "blue"
      SERVICE_TAGS: "traefik.frontend.rule=Host:blue.service.consul"

  app-green:
    image: emilevauge/whoami
    networks:
      - proxy
    expose:
      - 80
    labels:
      SERVICE_NAME: "green"
      SERVICE_TAGS: "traefik.frontend.rule=Host:green.service.consul"

networks:
  discovery:
  internet-gateway:
  proxy:

config.toml

[file]
watch = true

[backends]
  [backends.production]
    [backends.production.servers.server1]
    url = "http://<app-blue_ip>:80"
  [backends.preproduction]
    [backends.preproduction.servers.server1]
    url = "http://<app-green_ip>:80"

[frontends]
  [frontends.production]
  backend = "production"
    [frontends.production.routes.route-host-production]
    rule = "Host:production.myapp.local"
  [frontends.preproduction]
  backend = "preproduction"
    [frontends.preproduction.routes.route-host-preproduction]
    rule = "Host:preproduction.myapp.local"
@aantono (Contributor) commented Sep 20, 2017

I've run into the same issue with watch = true not watching the files and not reloading the configuration upon change. Could this be a regression in the 1.4 line? cc @ldez @timoreimann

@timoreimann (Member) commented Sep 20, 2017

Are we talking about file watches happening inside containers? AFAIK fsnotify does not work with Docker, so file change events won't propagate to processes watching for them.

@aantono (Contributor) commented Sep 20, 2017

I've tried that on my Mac (macOS Sierra 10.12.6), but the prod deployment does run inside the container (Docker) with the files embedded inside, not mounted from the outside host.

@pascalandy (Contributor) commented Sep 25, 2017

Hey folks,

I implemented Blue/Green deployment in my setup. It's almost perfect.

Before, I had a downtime of 90 seconds. With my Blue/Green deployment setup, I get two short downtimes of 5 seconds each.

Here is how I do it:

  • ServiceBlue runs Nginx image 1.13.1
  • I want to update this service, so I start a new service:
  • ServiceGreen runs Nginx image 1.13.2
  • I wait for ServiceGreen to be online
  • Now Traefik load-balances between both.
  • Life is good and there was no downtime.
  • Time to update ServiceBlue.
  • Shut down ServiceBlue.

Here Traefik is NOT aware that ServiceBlue is down. I get a 404. Fortunately, the next request goes to ServiceGreen. I have a 5-second downtime.

  • Wait for ServiceBlue to be online
  • Now Traefik load-balances between both.
  • Time to shut down ServiceGreen, as I only want to run ServiceBlue in prod.

Here Traefik is NOT aware that ServiceGreen is down. I get a 404. Fortunately, the next request goes to ServiceBlue. I have a 5-second downtime.

Question

How much time does Traefik need to learn that a service has disappeared?

IMHO, a possible solution is that we need to be able to force Traefik to re-assess the available services for --label traefik.backend=nginx.

Many cheers!

@Vanuan (Contributor) commented Aug 17, 2018

I've reviewed all the blue-green implementations in docker swarm. The crucial difference is how the switch between blue and green happens. It all seems to fall into 3 categories:

  1. Load balancer is configured statically. Configuration is stored in a file, points to either blue or green. The switch happens by redeploying load balancer with either blue or green configuration.
  2. Load balancer is configured statically. Configuration is stored in a file, points to both stacks and never changes. The switch happens by stopping/starting either blue or green stack.
  3. Load balancer is configured dynamically. Configuration is stored in memory, points to either blue or green. The switch happens by loading a new configuration into memory (using API, config file + SIGUSR1 or some other mechanism)

Here are my comments:

  1. Good if you're fine with short downtime.
  2. I don't think you'd want to remove the old stack before you're convinced that the current stack works. You can't quickly switch back if something goes wrong, so it's not suitable for my use case (may be fine for others).
  3. Looks the most suitable.

So let's focus on 3rd approach.
For nginx it's as easy as changing the conf file and running nginx -s reload.
For traefik, it's a bit trickier.
In one implementation, in swarm mode you change which stack is live by attaching service labels with information about its HTTP virtual hostname. Since traefik (to my knowledge) doesn't yet support TCP load balancing, this approach is fine.

The problem is that you have to run service update --label-rm hostname=live --label-add hostname=idle for every exposed service. It would be much nicer to do this:

docker stack deploy -c base.yml -c live.yml blue_stack
docker stack deploy -c base.yml -c idle.yml green_stack

Where live.yml contains all the labels for the new (future live) stack and idle.yml contains all the labels for the current (future idle) stack.

Unfortunately, it doesn't work since base.yml will change, so both stacks would be of the same version. I filed a ticket for supporting incremental stack deploys, but I don't think it will be implemented.

So here's an idea for improvement: dump all the options into base.yml. Or alternatively, generate service update commands out of live.yml and idle.yml.

Alternatively, if we don't want to go down that rabbit hole, we might use the Traefik API to change the configuration instead of changing services and waiting for Traefik to pick up those changes.
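
The "generate service update commands" idea could be sketched like this (a hypothetical helper; the label names follow the live/idle example above, and the label dicts are assumed to have been parsed out of the compose files already):

```python
def label_swap_command(service, current_labels, target_labels):
    """Build one `docker service update` command line that moves a
    service's labels from the current set to the target set.

    A sketch only: real usage would still need to run the command and
    handle per-service errors.
    """
    parts = ["docker", "service", "update"]
    # Drop labels that disappear in the target set
    for key in current_labels:
        if key not in target_labels:
            parts += ["--label-rm", key]
    # Add or change labels that differ in the target set
    for key, value in target_labels.items():
        if current_labels.get(key) != value:
            parts += ["--label-add", f"{key}={value}"]
    parts.append(service)
    return " ".join(parts)
```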

@mildred commented Sep 12, 2018

Blue/green is supported by Nomad and the Consul catalog using tags. You define different tags for the blue servers and the green servers. Unfortunately, Traefik does not make this distinction, and blue and green are grouped together on a single backend; see #3882.

#914 (comment)
The Nomad project has support for blue/green + canary deployments. However, it is not stated there how they are going to decide whose traffic will be forwarded to the canary version or maybe I'm not yet aware of it. I hope that in Traefik using Docker swarm mode backend, the source IP address(es), source network and/or source geolocation will be supported to decide whose traffic will be forwarded to the canary version.

@bintut Nomad allows you to define different tags in the Consul catalog for canary and non-canary.
