Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for successful deploys of StatefulSets with an OnDelete update strategy #926

Merged
merged 3 commits into from
Aug 8, 2023

Conversation

ayana-s
Copy link
Contributor

@ayana-s ayana-s commented Jun 2, 2023

What are you trying to accomplish with this PR?

🏗️ ISSUE: #912

When checking for a successful deploy, if a StatefulSet's updateStrategy is set to OnDelete (default is RollingUpdate), krane does not check if the rollout is successful and instead prints a warning message ("can't monitor the update since it doesn't occur until deletion") and moves on.

We want to enable krane to accurately detect successful deploys for StatefulSets with an updateStrategy of OnDelete.

How is this accomplished?

Update krane's deploy_succeeded? check to include StatefulSets with updateStrategy: OnDelete.

  • Currently, the method works for updateStrategy: RollingUpdate and checks that the following holds true:
observed_generation == current_generation
replicas == readyReplicas == currentReplicas
currentRevision == updateRevision
  • Because of a bug in Kubernetes, onDelete's currentRevision value is not updated reliably. Also, currentReplicas relies on currentRevision, so neither of these values can be used.
  • So, our added check will use updatedReplicas instead. If updatedReplicas doesn't match replicas, then pods have not been updated to the latest revision. readyReplicas can be used to check for their actual health.

As confirmed over Slack, we're going with Katrina's suggestion of putting this behaviour behind an annotation to allow control on a per-resource basis using a style Krane users are already accustomed to. We're extending the krane.shopify.io/required-rollout annotation to be compatible with StatefulSet resources.

Currently, for Deployments with annotations, the state of the replicaSet is checked to confirm a successful deploy. Because StatefulSet resources don't use replicaSets to create pods, we suggest only using full as an option for this annotation, with this definition:

  • full: deployment is successful when all pods are ready

The logic we propose in this PR is as follows:

# For every successful deploy, check: 
observed_generation == current_generation
status_data['updateRevision'] == pod_data["metadata"]["labels"]["controller-revision-hash"]

# For updateStrategy: RollingUpdate, the rest of the checks remain the same: 
replicas == readyReplicas == currentReplicas && currentRevision == updateRevision

# For updateStrategy: OnDelete && required-rollout: full, check:
replicas == readyReplicas == updatedReplicas

# For updateStrategy: OnDelete with no specified required-rollout value, no additional checks are conducted. The code runs as it did previously.

What should reviewers focus on?

  1. Is this tested thoroughly enough?

Testing

  • Added unit tests. Existing CI & integration tests pass. (to be confirmed)
  • Checked our clusters to ensure updateRevision (using k -n [namespace] --context [context] get sts [namespace] -o yaml | tail -n 10) and controller-revision-hash (using k -n [namespace] --context [context] get pod [pod name] -o yaml | grep -i revision) match after an update to our StatefulSet config (before the pod is marked as ready)
  • Manually deployed a test runtime, nothing broken

@ayana-s ayana-s self-assigned this Jun 2, 2023
@ayana-s ayana-s force-pushed the enable-ondelete-update-strategy branch 5 times, most recently from 0ba7a26 to 7920e4d Compare June 6, 2023 19:13
@ayana-s ayana-s force-pushed the enable-ondelete-update-strategy branch from 03a7722 to c93e811 Compare June 14, 2023 19:08
@@ -10,6 +10,7 @@ class Pod < KubernetesResource
)

attr_accessor :stream_logs
attr_reader :definition
Copy link
Contributor Author

@ayana-s ayana-s Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows StatefulSet resource to access each Pod's definition["metadata"]["labels"]["controller-revision-hash"] value, a new addition to how we check for successful deploys. Is there any issue with making this value accessible?

@ayana-s ayana-s force-pushed the enable-ondelete-update-strategy branch 2 times, most recently from d6dd4f9 to 1a325e6 Compare June 15, 2023 13:33
@ayana-s
Copy link
Contributor Author

ayana-s commented Jun 15, 2023

One of krane's integration tests (KraneDeployTest#test_pvc) fails both on my branch and on master - is this something that would block our deploy?

EDIT: Might just be a flakey test - after rerunning a couple times, this no longer fails on my branch ✅

@ayana-s ayana-s changed the title Enable ondelete update strategy Check for successful deploys of StatefulSets with an OnDelete update strategy Jun 16, 2023
@ayana-s ayana-s marked this pull request as ready for review June 16, 2023 16:07
@ayana-s ayana-s requested a review from a team as a code owner June 16, 2023 16:07
@ayana-s ayana-s requested review from RRethy, timothysmith0609 and null-sleep and removed request for a team June 16, 2023 16:07
lib/krane/kubernetes_resource/stateful_set.rb Show resolved Hide resolved
lib/krane/kubernetes_resource/stateful_set.rb Outdated Show resolved Hide resolved
lib/krane/kubernetes_resource/stateful_set.rb Outdated Show resolved Hide resolved
lib/krane/kubernetes_resource/stateful_set.rb Outdated Show resolved Hide resolved
@null-sleep
Copy link

null-sleep commented Jun 19, 2023

As a configuration the default value full for krane.shopify.io/required-rollout annotation makes sense to me. Is there an example use case for none? Or are we adding to maintain symmetry with existing configs having such an option?

Overall the PR looks good to me. Glad to see some good testing 👏

@ayana-s
Copy link
Contributor Author

ayana-s commented Jun 19, 2023

As a configuration the default value full for krane.shopify.io/required-rollout annotation makes sense to me. Is there an example use case for none? Or are we adding to maintain symmetry with existing configs having such an option?

Good call-out @null-sleep - thinking about this again, I'm going to remove none as a required-rollout option for StatefulSets.

Context for PR reviewers: rollout: none means that the deployment is successful as soon as the new pods are created (instead of when the pods are fully ready, like when rollout: full). I originally included this for symmetry with existing configs, like Dhruv mentioned. However, StatefulSets will wait for each pod to be ready before even creating the next one, so technically, none is not much different from the full option (it just skips the steps to make sure pods are ready and desired & current replicas match).

The reason it made sense for Deployments to have none as an additional option is because the logic there checks for the replicaSet to be created as opposed to the pods themselves - but this doesn't translate as well to StatefulSets.

EDIT: Asked for additional thoughts on Slack and will make this update when we receive a go-ahead for this change.

@Shopify Shopify deleted a comment from spy-v2 bot Jun 19, 2023
@ayana-s ayana-s force-pushed the enable-ondelete-update-strategy branch 2 times, most recently from dc02c33 to 11ef9c2 Compare August 3, 2023 12:27
@ayana-s
Copy link
Contributor Author

ayana-s commented Aug 3, 2023

Based on the above comment, this PR now only supports full (and not none) as a required-rollout option for StatefulSets.

@ayana-s ayana-s force-pushed the enable-ondelete-update-strategy branch from 29e055c to f7dad22 Compare August 3, 2023 13:23
@vvuibert

This comment was marked as resolved.

@ayana-s ayana-s force-pushed the enable-ondelete-update-strategy branch from f7dad22 to 0d9a51e Compare August 4, 2023 20:50
@ayana-s ayana-s merged commit da0abf2 into master Aug 8, 2023
14 checks passed
@shopify-shipit shopify-shipit bot temporarily deployed to rubygems August 8, 2023 17:27 Inactive
@shopify-shipit shopify-shipit bot temporarily deployed to rubygems September 26, 2023 12:56 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants