Introduce steps into blueGreen strategy #19

Closed
jessesuen opened this issue Feb 6, 2019 · 4 comments
Labels
blue-green Blue-Green related issue

Comments

@jessesuen
Member

jessesuen commented Feb 6, 2019

The blueGreen deploy strategy should have a steps field to control the rollout. Here is the current proposal:

  # manual gate
  strategy:
    blueGreen: 
      activeService: active-service
      previewService: preview-service
      steps:
      - setPreview: true
      - pause: true

  # manual teardown
  strategy:
    blueGreen: 
      activeService: active-service
      steps:
      - switchActive: true
      - pause: true

  # manual gate and manual teardown
  strategy:
    blueGreen: 
      activeService: active-service
      steps:
      - setPreview: true
      - pause: true
      - switchActive: true
      - pause: true

  # fully automated
  strategy:
    blueGreen: 
      activeService: active-service

  # fully automated with delayed teardown
  strategy:
    blueGreen: 
      activeService: active-service
      steps:
      - switchActive: true
      - wait: 600

dthomson25 added this to the v0.3.0 milestone Feb 15, 2019
@dthomson25
Member

dthomson25 commented Mar 19, 2019

After discussing the above feature internally and with users, we realized that adding steps would overcomplicate the BlueGreen strategy for only a little added benefit. Steps were considered as a way to give users flexibility in defining how their BlueGreen deployment proceeds, but we found that two new fields cover all the relevant use cases. The pauseForSecondsBeforeSwitchActive field would add the ability to pause for a set duration before switching the active service, and the waitBeforeScalingDown field would add the ability to hold off on scaling down the old active ReplicaSet. This issue will now implement these fields.
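A minimal sketch of where the two proposed fields might sit in the spec, reusing the service names from the proposal above. The numeric values, and the assumption that both fields take a number of seconds, are illustrative only; the thread does not pin down the value types.

```yaml
# Illustrative sketch only: field names as proposed in this comment,
# value types (seconds) and values are assumptions
strategy:
  blueGreen:
    activeService: active-service
    previewService: preview-service
    # pause for this long before switching the active service
    pauseForSecondsBeforeSwitchActive: 600
    # keep the old active ReplicaSet running before scaling it down
    waitBeforeScalingDown: 30
```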

@dthomson25
Member

Another reason for a pauseBeforeScalingDown field:

Changing the selector on a service can be a non-trivial operation on a cluster. In the case of a NodePort service, the Kubernetes cluster needs to update the iptables rules on every node to reflect the new selector. On a large enough cluster, the rollout could scale down the old ReplicaSet before the selector change has propagated to every node. Requests still routed to the old pods during that window will fail, because those pods no longer exist.

Unfortunately, the Argo Rollouts controller has a hard time detecting when this propagation is complete, because different CNIs handle networking differently and detecting the changes would require Argo Rollouts to have significantly more permissions. As a solution, Argo Rollouts should wait a configurable amount of time before scaling down the old ReplicaSet, so that traffic is not sent to deleted pods.

Related Articles:

dthomson25 added the blue-green (Blue-Green related issue) label Apr 2, 2019
@dthomson25
Member

This issue was implemented by #57 and #59. In the implementations, the pauseBeforeScalingDown field was renamed to scaleDownDelaySeconds and the pauseForSecondsBeforeSwitchActive field was renamed to autoPromoteActiveService.
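For illustration, a blueGreen strategy using the renamed fields might look like the sketch below. The value 30 is an arbitrary example, and the autoPromoteActiveService line assumes a boolean type, which this thread does not state; check the released spec for the actual type and default before relying on either field.

```yaml
# Illustrative sketch: field names from this thread, values are examples
strategy:
  blueGreen:
    activeService: active-service
    previewService: preview-service
    # formerly pauseBeforeScalingDown: seconds to keep the old ReplicaSet
    # running after the active service selector has switched
    scaleDownDelaySeconds: 30
    # formerly pauseForSecondsBeforeSwitchActive; boolean type is an assumption
    autoPromoteActiveService: false
```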

@dthomson25
Member

dthomson25 commented Apr 30, 2019

Closing, as both PRs are merged.
