add parameter to configure max restarts for canary pause (#51)
* add new ExtendedDaemonSetSpecStrategyCanaryAutoPause struct with Enabled and MaxRestarts, update README, add test for IsPodRestarting
celenechang committed Nov 2, 2020
1 parent c951eae commit 738e5f9
Showing 16 changed files with 412 additions and 44 deletions.
72 changes: 64 additions & 8 deletions README.md
@@ -32,7 +32,7 @@ $ make deploy
By default, the controller only watches the ExtendedDaemonSet resources that are present in its own namespace. If you want to deploy the controller cluster wide, add a Kustomization to the `config/manager`

```yaml
env:
  - name: WATCH_NAMESPACE
    value: ""
```
@@ -49,12 +49,12 @@ This creates a three node cluster with one master and two worker nodes:
$ kind create cluster --config examples/kind-cluster-configuration.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.15.3) 🖼
✓ Preparing nodes 📦📦📦
✓ Creating kubeadm config 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Cluster creation complete. You can now use the cluster with:

$ export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
@@ -230,6 +230,30 @@ Then add the label `extendeddaemonset.datadoghq.com/exclude=foo` to the node in

`kubectl label nodes <your-node-name> extendeddaemonset.datadoghq.com/exclude=foo`


#### Canary settings

The Canary deployment can be customized in a few ways.

- `replicas`: The number of replica pods to participate in the Canary deployment
- `duration`: The duration of the Canary deployment, after which the Canary deployment will end and the active ExtendedReplicaSet will update
- `autoPause.enabled`: Activation of the Canary deployment auto pausing feature (default is `true`)
- `autoPause.maxRestarts`: The maximum number of restarts tolerable before the Canary deployment is automatically paused (default is `2`)

Example configuration of the spec canary strategy:

```
spec:
  strategy:
    canary:
      replicas: 1
      duration: 5m
      autoPause:
        enabled: true
        maxRestarts: 5
```
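
A minimal sketch of how the two `autoPause` settings could drive the pause decision. The `autoPause` struct, the `shouldPause` function, and the reading that a pod is tolerated up to and including `maxRestarts` restarts are assumptions made for illustration; the controller's actual check may differ.

```go
package main

import "fmt"

// autoPause mirrors the two README settings. The type name is hypothetical.
type autoPause struct {
	Enabled     bool
	MaxRestarts int32
}

// shouldPause reports whether any canary pod has restarted more often than
// the configured limit. Assumption: restart counts up to and including
// MaxRestarts are tolerated, and only a higher count triggers the pause.
func shouldPause(cfg autoPause, restartCounts []int32) bool {
	if !cfg.Enabled {
		return false
	}
	for _, n := range restartCounts {
		if n > cfg.MaxRestarts {
			return true
		}
	}
	return false
}

func main() {
	cfg := autoPause{Enabled: true, MaxRestarts: 5}
	fmt.Println(shouldPause(cfg, []int32{0, 5})) // false
	fmt.Println(shouldPause(cfg, []int32{0, 6})) // true
}
```

With `enabled: false`, the sketch never pauses, whatever the restart counts are.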


### Kubectl plugin

To build the kubectl ExtendedDaemonSet plugin, run `make build-plugin`. This creates the `kubectl-eds` Go binary for your local OS and architecture.
@@ -241,12 +265,44 @@ Usage:
ExtendedDaemonset [command]

Available Commands:
+  canary      control ExtendedDaemonset canary deployment
   get         get ExtendedDaemonSet deployment(s)
   get-ers     get-ers ExtendedDaemonSetReplicaset deployment(s)
   help        Help about any command
-  validate    validate canary replicaset
```

#### Validate Canary deployment

As an alternative to waiting for the Canary duration to end, the deployment can be manually validated.

`kubectl-eds canary validate <ExtendedDaemonSet name>`

#### Pause Canary deployment

The Canary deployment can be paused to investigate an issue.

`kubectl-eds canary pause <ExtendedDaemonSet name>`

#### Unpause Canary deployment

The Canary deployment can be unpaused, and the Canary duration will continue.

`kubectl-eds canary unpause <ExtendedDaemonSet name>`

#### Fail Canary deployment

The Canary deployment can be manually failed. This command will restore the currently active ExtendedReplicaSet on the Canary pods.

`kubectl-eds canary fail <ExtendedDaemonSet name>`

#### Reset Canary deployment

Following failure of the Canary deployment, the `fail` annotation should be reset with this command.

`kubectl-eds canary reset <ExtendedDaemonSet name>`



### How to migrate from a DaemonSet

If you already have an application running in your cluster with a DaemonSet, it is possible to migrate to an ExtendedDaemonSet with a `smooth` migration path.
36 changes: 29 additions & 7 deletions api/v1alpha1/extendeddaemonset_default.go
@@ -13,11 +13,13 @@ import (
)

const (
-    defaultCanaryReplica             = 1
-    defaultCanaryDuration            = 10
-    defaultSlowStartIntervalDuration = 1
-    defaultMaxParallelPodCreation    = 250
-    defaultReconcileFrequency        = 10 * time.Second
+    defaultCanaryReplica              = 1
+    defaultCanaryDuration             = 10
+    defaultCanaryAutoPauseEnabled     = true
+    defaultCanaryAutoPauseMaxRestarts = 2
+    defaultSlowStartIntervalDuration  = 1
+    defaultMaxParallelPodCreation     = 250
+    defaultReconcileFrequency         = 10 * time.Second
)

// IsDefaultedExtendedDaemonSet used to know if a ExtendedDaemonSet is already defaulted
@@ -28,7 +30,7 @@ func IsDefaultedExtendedDaemonSet(dd *ExtendedDaemonSet) bool {
}

if dd.Spec.Strategy.Canary != nil {
-        if defauled := IsDefaultedExtendedDaemonSetSpecStrategyCanary(dd.Spec.Strategy.Canary); !defauled {
+        if defaulted := IsDefaultedExtendedDaemonSetSpecStrategyCanary(dd.Spec.Strategy.Canary); !defaulted {
return false
}
}
@@ -85,11 +87,14 @@ func IsDefaultedExtendedDaemonSetSpecStrategyCanary(canary *ExtendedDaemonSetSpe
if canary.NodeSelector == nil {
return false
}
if canary.AutoPause == nil || canary.AutoPause.Enabled == nil || canary.AutoPause.MaxRestarts == nil {
return false
}
return true
}

// DefaultExtendedDaemonSet used to default an ExtendedDaemonSet
-// return a list of errors in case of unvalid fields.
+// return a list of errors in case of invalid fields.
func DefaultExtendedDaemonSet(dd *ExtendedDaemonSet) *ExtendedDaemonSet {
defaultedDD := dd.DeepCopy()
DefaultExtendedDaemonSetSpec(&defaultedDD.Spec)
@@ -130,9 +135,26 @@ func DefaultExtendedDaemonSetSpecStrategyCanary(c *ExtendedDaemonSetSpecStrategy
MatchLabels: map[string]string{},
}
}
if c.AutoPause == nil {
c.AutoPause = &ExtendedDaemonSetSpecStrategyCanaryAutoPause{}
}
DefaultExtendedDaemonSetSpecStrategyCanaryAutoPause(c.AutoPause)
return c
}

// DefaultExtendedDaemonSetSpecStrategyCanaryAutoPause used to default an ExtendedDaemonSetSpecStrategyCanaryAutoPause
func DefaultExtendedDaemonSetSpecStrategyCanaryAutoPause(a *ExtendedDaemonSetSpecStrategyCanaryAutoPause) *ExtendedDaemonSetSpecStrategyCanaryAutoPause {
if a.Enabled == nil {
enabled := defaultCanaryAutoPauseEnabled
a.Enabled = &enabled
}

if a.MaxRestarts == nil {
a.MaxRestarts = NewInt32(defaultCanaryAutoPauseMaxRestarts)
}
return a
}
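
The defaulting above relies on pointer fields so that an unset value can be told apart from an explicit `false` or `0`. A self-contained sketch of that pattern follows, using a simplified stand-in type (`autoPause`) and a local `newInt32` helper that is only assumed to behave like the `NewInt32` called in the diff:

```go
package main

import "fmt"

// autoPause is a simplified stand-in for the API type in the diff.
type autoPause struct {
	Enabled     *bool
	MaxRestarts *int32
}

const (
	defaultEnabled     = true
	defaultMaxRestarts = 2
)

// newInt32 returns a pointer to v (assumed equivalent of NewInt32).
func newInt32(v int32) *int32 { return &v }

// defaultAutoPause fills in only the fields the user left unset (nil).
// Explicit values, including false and 0, are kept as they are.
func defaultAutoPause(a *autoPause) *autoPause {
	if a.Enabled == nil {
		enabled := defaultEnabled
		a.Enabled = &enabled
	}
	if a.MaxRestarts == nil {
		a.MaxRestarts = newInt32(defaultMaxRestarts)
	}
	return a
}

func main() {
	disabled := false
	a := defaultAutoPause(&autoPause{Enabled: &disabled})
	fmt.Println(*a.Enabled, *a.MaxRestarts) // false 2
}
```

An explicit `enabled: false` survives defaulting, which would not be possible with a plain `bool` field whose zero value is also `false`.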

// DefaultExtendedDaemonSetSpecStrategyRollingUpdate used to default an ExtendedDaemonSetSpecStrategyRollingUpdate
func DefaultExtendedDaemonSetSpecStrategyRollingUpdate(rollingupdate *ExtendedDaemonSetSpecStrategyRollingUpdate) *ExtendedDaemonSetSpecStrategyRollingUpdate {
rollingupdate.MaxUnavailable = intstr.ValueOrDefault(rollingupdate.MaxUnavailable, intstr.FromInt(1))
11 changes: 10 additions & 1 deletion api/v1alpha1/extendeddaemonset_types.go
@@ -77,7 +77,16 @@ type ExtendedDaemonSetSpecStrategyCanary struct {
Duration *metav1.Duration `json:"duration,omitempty"`
NodeSelector *metav1.LabelSelector `json:"nodeSelector,omitempty"`
// +listType=set
-    NodeAntiAffinityKeys []string `json:"nodeAntiAffinityKeys,omitempty"`
+    NodeAntiAffinityKeys []string                                      `json:"nodeAntiAffinityKeys,omitempty"`
+    AutoPause            *ExtendedDaemonSetSpecStrategyCanaryAutoPause `json:"autoPause,omitempty"`
}

// ExtendedDaemonSetSpecStrategyCanaryAutoPause defines the canary deployment AutoPause parameters of the ExtendedDaemonSet
// +k8s:openapi-gen=true
type ExtendedDaemonSetSpecStrategyCanaryAutoPause struct {
Enabled *bool `json:"enabled,omitempty"`
// MaxRestarts defines the number of tolerable Canary pod restarts after which the Canary deployment is autopaused
MaxRestarts *int32 `json:"maxRestarts,omitempty"`
}

// ExtendedDaemonSetStatusState type representing the ExtendedDaemonSet state
30 changes: 30 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

64 changes: 48 additions & 16 deletions api/v1alpha1/zz_generated.openapi.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 14 additions & 0 deletions config/crd/bases/v1/datadoghq.com_extendeddaemonsets.yaml
@@ -122,6 +122,20 @@ spec:
canary:
  description: Canary deployment configuration
  properties:
    autoPause:
      description: ExtendedDaemonSetSpecStrategyCanaryAutoPause
        defines the canary deployment AutoPause parameters of the
        ExtendedDaemonSet
      properties:
        enabled:
          type: boolean
        maxRestarts:
          description: MaxRestarts defines the number of tolerable
            Canary pod restarts after which the Canary deployment
            is autopaused
          format: int32
          type: integer
      type: object
    duration:
      type: string
    nodeAntiAffinityKeys:
13 changes: 13 additions & 0 deletions config/crd/bases/v1beta1/datadoghq.com_extendeddaemonsets.yaml
@@ -121,6 +121,19 @@ spec:
canary:
  description: Canary deployment configuration
  properties:
    autoPause:
      description: ExtendedDaemonSetSpecStrategyCanaryAutoPause defines
        the canary deployment AutoPause parameters of the ExtendedDaemonSet
      properties:
        enabled:
          type: boolean
        maxRestarts:
          description: MaxRestarts defines the number of tolerable
            Canary pod restarts after which the Canary deployment
            is autopaused
          format: int32
          type: integer
      type: object
    duration:
      type: string
    nodeAntiAffinityKeys:
13 changes: 11 additions & 2 deletions controllers/extendeddaemonset/controller.go
@@ -263,9 +263,18 @@ func (r *Reconciler) updateInstanceWithCurrentRS(logger logr.Logger, daemonset *
// Check if newDaemonset differs from existing daemonset, and update if so
if !apiequality.Semantic.DeepEqual(daemonset, newDaemonset) {
if updateDaemonsetSpec {
-    if err := r.client.Update(context.TODO(), newDaemonset); err != nil {
-        return newDaemonset, reconcile.Result{}, err
+    // Make and use a copy because undesired behaviors occur when making two update calls
+    newDaemonsetCopy := newDaemonset.DeepCopy()
+    if err := r.client.Update(context.TODO(), newDaemonsetCopy); err != nil {
+        return newDaemonsetCopy, reconcile.Result{}, err
     }
+
+    // This ensures that the first client update respects the desired new status
+    newDaemonsetCopy.Status = newDaemonset.Status
+    if err := r.client.Status().Update(context.TODO(), newDaemonsetCopy); err != nil {
+        return newDaemonsetCopy, reconcile.Result{}, err
+    }
+    return newDaemonsetCopy, reconcile.Result{}, nil
}

if err := r.client.Status().Update(context.TODO(), newDaemonset); err != nil {
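
The copy-then-restore pattern in this hunk can be illustrated with a toy model. The sketch below assumes that a spec update overwrites the in-memory object with the server's stored state, so a status change held only in memory is lost before the status update runs. That is one plausible reading of the "undesired behaviors" mentioned in the code comment, not something the commit states. `fakeServer`, `object`, and `updateBoth` are hypothetical names and do not use the real client API.

```go
package main

import "fmt"

// object is a toy resource with separate spec and status parts.
type object struct {
	Spec   string
	Status string
}

// fakeServer is a hypothetical stand-in for an API server that stores
// spec and status through separate calls.
type fakeServer struct{ stored object }

// Update persists only Spec, then overwrites obj with the stored state.
// Any unsaved Status held in obj is discarded (assumed behavior).
func (s *fakeServer) Update(obj *object) {
	s.stored.Spec = obj.Spec
	*obj = s.stored
}

// UpdateStatus persists only Status, then refreshes obj the same way.
func (s *fakeServer) UpdateStatus(obj *object) {
	s.stored.Status = obj.Status
	*obj = s.stored
}

// updateBoth sends the spec using a copy, restores the desired status on
// that copy, and only then sends the status.
func updateBoth(s *fakeServer, desired object) object {
	c := desired // value copy; desired keeps the wanted status
	s.Update(&c)
	c.Status = desired.Status
	s.UpdateStatus(&c)
	return c
}

func main() {
	// Naive path: reuse one object for both calls and lose the new status.
	s1 := &fakeServer{stored: object{Spec: "v1", Status: "old"}}
	naive := object{Spec: "v2", Status: "canary"}
	s1.Update(&naive)
	s1.UpdateStatus(&naive)
	fmt.Println(naive.Spec, naive.Status) // v2 old

	// Copy path: both the new spec and the new status are stored.
	s2 := &fakeServer{stored: object{Spec: "v1", Status: "old"}}
	got := updateBoth(s2, object{Spec: "v2", Status: "canary"})
	fmt.Println(got.Spec, got.Status) // v2 canary
}
```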