Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist manual unpause until end of a canary deployment, and then reset #77

Merged
merged 2 commits into from
Jan 21, 2021

Conversation

celenechang
Copy link
Contributor

@celenechang celenechang commented Jan 20, 2021

What does this PR do?

Previously, the ability to manually unpause a Canary deployment was added. However, if AutoPause is enabled and the Canary deployment exceeds the number of maxRestarts (for instance), then the Canary will be automatically paused again following an unpause action.

This PR persists a manual unpause action by storing it in an annotation, and inhibits the AutoPause feature until the Canary deployment ends and the annotation is removed.

Motivation

Better user experience

Additional Notes

Note that a user cannot unpause to effectively disable AutoPause, and then change their mind and re-enable AutoPause during the ongoing Canary deployment using the plugin. (They can manually edit the unpause annotation as a workaround.) Manual pause/unpause actions can be executed with the plugin at will.

Describe your test plan

  • make build (this also builds the plugin)
  • Start a kind cluster
  • Edit config/manager/manager.yaml and config/manager/kustomization.yaml to use image test/extendeddaemonset:test and image pull policy Never
  • make install
  • make IMG=test/extendeddaemonset:test docker-build
  • kind load docker-image test/extendeddaemonset:test
  • make deploy
  • Then apply the ExtendedDaemonSets in examples/ to start a canary deployment
  • bin/kubectl-eds canary pause foo; verify it is paused (with kubectl-eds get and by checking the annotation)
  • bin/kubectl-eds canary unpause foo; verify it is unpaused
  • When canary deployment finishes, verify the canary-unpaused annotation is cleared
  • Try a test with a canary deployment that is in CLB (can use bash:latest image) and AutoPause is enabled. Make sure that, after the canary is automatically paused, bin/kubectl-eds canary unpause foo allows the canary deployment to finish despite the restarts.
  • Other behaviors to try: canary fail foo, canary pause foo -> canary unpause foo

@celenechang celenechang added the enhancement New feature or request label Jan 20, 2021
@celenechang celenechang added this to the v0.5 milestone Jan 20, 2021
@celenechang celenechang requested a review from a team as a code owner January 20, 2021 04:10
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request contains a valid label.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request contains a valid label.

Copy link
Collaborator

@clamoriniere clamoriniere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, thank you for fixing it.
juste a small nit.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request contains a valid label.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request contains a valid label.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request contains a valid label.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request contains a valid label.

@codecov-io
Copy link

codecov-io commented Jan 21, 2021

Codecov Report

Merging #77 (c20d8c0) into master (dd12c2a) will increase coverage by 0.16%.
The diff coverage is 75.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #77      +/-   ##
==========================================
+ Coverage   33.51%   33.67%   +0.16%     
==========================================
  Files          41       41              
  Lines        2987     3005      +18     
==========================================
+ Hits         1001     1012      +11     
- Misses       1899     1906       +7     
  Partials       87       87              
Flag Coverage Δ
unittests 33.67% <75.00%> (+0.16%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...llers/extendeddaemonsetreplicaset/strategy/type.go 0.00% <ø> (ø)
controllers/extendeddaemonset/utils.go 86.95% <33.33%> (-8.29%) ⬇️
...ers/extendeddaemonsetreplicaset/strategy/canary.go 78.75% <57.14%> (-1.04%) ⬇️
controllers/extendeddaemonset/controller.go 75.87% <100.00%> (+0.78%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd12c2a...c20d8c0. Read the comment docs.

// we use the `extendedDaemonsetCopy` instance to have last version. that contains the latest metadata info (resource version)
// Copy the spec part into the extendedDaemonsetCopy.
extendedDaemonsetCopy.Spec = *newDaemonset.Spec.DeepCopy()
extendedDaemonsetCopy.Annotations = newDaemonset.Annotations
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clamoriniere this is a key change since your review; this seemed to fix an issue of annotations not being deleted (and thus the state staying in failed, for instance)

@@ -194,7 +196,11 @@ func manageCanaryPodFailures(pods []*v1.Pod, params *Parameters, result *Result,
"MaxRestarts", autoFailMaxRestarts,
"Reason", highRestartReason,
)
} else if !result.IsPaused && autoPauseEnabled {
case result.IsUnpaused:
Copy link
Contributor Author

@celenechang celenechang Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clamoriniere this is also a key change made since your review. afaict we didn't have a way of updating the ERS paused condition to false, (once paused: true we were in a positive cycle) so making it explicit here. i believe for failed it doesn't matter since the ERS should drop out

Comment on lines +572 to +586
keysToDelete := []string{
datadoghqv1alpha1.ExtendedDaemonSetCanaryPausedAnnotationKey,
datadoghqv1alpha1.ExtendedDaemonSetCanaryPausedReasonAnnotationKey,
datadoghqv1alpha1.ExtendedDaemonSetCanaryUnpausedAnnotationKey,
datadoghqv1alpha1.ExtendedDaemonSetCanaryFailedAnnotationKey,
datadoghqv1alpha1.ExtendedDaemonSetCanaryFailedReasonAnnotationKey,
}
var updated bool
for _, key := range keysToDelete {
if _, ok := eds.Annotations[key]; ok {
delete(eds.Annotations, key)
updated = true
}
}
return updated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xornivore will be happy about this change :)

Comment on lines 63 to 67
// Unpaused annotation takes precedence
isUnpaused, found := dsAnnotations[datadoghqv1alpha1.ExtendedDaemonSetCanaryUnpausedAnnotationKey]
if found && isUnpaused == v1alpha1.ValueStringTrue {
return false, ""
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need the IsCanaryDeploymentUnpaused() if we consider unpause annotations here?

or maybe:

Suggested change
// Unpaused annotation takes precedence
isUnpaused, found := dsAnnotations[datadoghqv1alpha1.ExtendedDaemonSetCanaryUnpausedAnnotationKey]
if found && isUnpaused == v1alpha1.ValueStringTrue {
return false, ""
}
// Unpaused annotation takes precedence
if IsCanaryDeploymentUnpaused(dsAnnotations) {
return false, ""
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, indeed we may not. will give it a quick test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed! thanks for catching

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request contains a valid label.

Copy link
Collaborator

@clamoriniere clamoriniere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thank you for this fix

@celenechang celenechang merged commit 2570080 into master Jan 21, 2021
@celenechang celenechang deleted the celene/improve_unpause branch January 21, 2021 18:11
clamoriniere pushed a commit that referenced this pull request Jan 21, 2021
…et (#77)

* persist manual unpause until end of a canary deployment, and then reset

* fix clearCanaryAnnotations, update from feedback
clamoriniere pushed a commit that referenced this pull request Jan 21, 2021
…et (#77)

* persist manual unpause until end of a canary deployment, and then reset

* fix clearCanaryAnnotations, update from feedback
clamoriniere pushed a commit that referenced this pull request Jan 21, 2021
…et (#77)

* persist manual unpause until end of a canary deployment, and then reset

* fix clearCanaryAnnotations, update from feedback
clamoriniere pushed a commit that referenced this pull request Jan 21, 2021
…et (#77)

* persist manual unpause until end of a canary deployment, and then reset

* fix clearCanaryAnnotations, update from feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants