Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for successful deploys of StatefulSets with an OnDelete update strategy #926

Merged
merged 3 commits into from
Aug 8, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 7 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -155,14 +155,13 @@ If you want dynamic templates, you may render ERB with `krane render` and then p
- `krane.shopify.io/required-rollout`: Modifies how much of the rollout needs to finish
before the deployment is considered successful.
- _Compatibility_: Deployment
- `full`: The deployment is successful when all pods in the new `replicaSet` are ready.
- `none`: The deployment is successful as soon as the new `replicaSet` is created for the deployment.
- `maxUnavailable`: The deploy is successful when minimum availability is reached in the new `replicaSet`.
In other words, the number of new pods that must be ready is equal to `spec.replicas` - `strategy.RollingUpdate.maxUnavailable`
(converted from percentages by rounding up, if applicable). This option is only valid for deployments
that use the `RollingUpdate` strategy.
- Percent (e.g. 90%): The deploy is successful when the number of new pods that are ready is equal to
`spec.replicas` * Percent.
- `full`: The deployment is successful when all pods in the new `replicaSet` are ready.
- `none`: The deployment is successful as soon as the new `replicaSet` is created for the deployment.
- `maxUnavailable`: The deploy is successful when minimum availability is reached in the new `replicaSet`.
In other words, the number of new pods that must be ready is equal to `spec.replicas` - `strategy.RollingUpdate.maxUnavailable` (converted from percentages by rounding up, if applicable). This option is only valid for deployments that use the `RollingUpdate` strategy.
- Percent (e.g. 90%): The deploy is successful when the number of new pods that are ready is equal to `spec.replicas` * Percent.
- _Compatibility_: StatefulSet
- `full`: The deployment is successful when all pods are ready.
- `krane.shopify.io/predeployed`: Causes a Custom Resource to be deployed in the pre-deploy phase.
- _Compatibility_: Custom Resource Definition
- _Default_: `true`
Expand Down
1 change: 1 addition & 0 deletions lib/krane/kubernetes_resource/pod.rb
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ class Pod < KubernetesResource
)

attr_accessor :stream_logs
attr_reader :definition
Copy link
Contributor Author

@ayana-s ayana-s Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows StatefulSet resource to access each Pod's definition["metadata"]["labels"]["controller-revision-hash"] value, a new addition to how we check for successful deploys. Is there any issue with making this value accessible?


def initialize(namespace:, context:, definition:, logger:,
statsd_tags: nil, parent: nil, deploy_started_at: nil, stream_logs: false)
Expand Down
35 changes: 25 additions & 10 deletions lib/krane/kubernetes_resource/stateful_set.rb
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ class StatefulSet < PodSetBase
TIMEOUT = 10.minutes
ONDELETE = 'OnDelete'
SYNC_DEPENDENCIES = %w(Pod)
REQUIRED_ROLLOUT_TYPES = %w(full).freeze
attr_reader :pods

def sync(cache)
Expand All @@ -19,25 +20,35 @@ def status
end

def deploy_succeeded?
success = observed_generation == current_generation &&
desired_replicas == status_data['readyReplicas'].to_i &&
status_data['currentRevision'] == status_data['updateRevision']
if update_strategy == ONDELETE
# Gem cannot monitor update since it doesn't occur until delete
success = observed_generation == current_generation

@pods.each do |pod|
success &= pod.definition["metadata"]["labels"]["controller-revision-hash"] == status_data['updateRevision']
ayana-s marked this conversation as resolved.
Show resolved Hide resolved
end

if update_strategy == 'RollingUpdate'
success &= status_data['currentRevision'] == status_data['updateRevision']
success &= desired_replicas == status_data['readyReplicas'].to_i
success &= desired_replicas == status_data['currentReplicas'].to_i

elsif update_strategy == ONDELETE
unless @success_assumption_warning_shown
@logger.warn("WARNING: Your StatefulSet's updateStrategy is set to OnDelete, "\
"which means updates will not be applied until its pods are deleted. "\
"Consider switching to rollingUpdate.")
"which means the deployment won't succeed until all pods are updated by deletion.")
@success_assumption_warning_shown = true
end
else
success &= desired_replicas == status_data['currentReplicas'].to_i

if required_rollout == 'full'
ayana-s marked this conversation as resolved.
Show resolved Hide resolved
success &= desired_replicas == status_data['readyReplicas'].to_i
success &= desired_replicas == status_data['updatedReplicas'].to_i
end
end

success
end

def deploy_failed?
return false if update_strategy == ONDELETE
return false if update_strategy == ONDELETE && required_rollout != 'full'
pods.present? && pods.any?(&:deploy_failed?) &&
observed_generation == current_generation
end
Expand Down Expand Up @@ -67,5 +78,9 @@ def parent_of_pod?(pod_data)
pod_data["metadata"]["ownerReferences"].any? { |ref| ref["uid"] == @instance_data["metadata"]["uid"] } &&
@instance_data["status"]["currentRevision"] == pod_data["metadata"]["labels"]["controller-revision-hash"]
end

def required_rollout
krane_annotation_value("required-rollout") || nil
end
end
end
98 changes: 98 additions & 0 deletions test/fixtures/for_unit_tests/stateful_set_pod_test.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2019-10-07T16:05:37Z"
generateName: test-ss-
ayana-s marked this conversation as resolved.
Show resolved Hide resolved
labels:
name: test-ss
app: hello-cloud
controller-revision-hash: 2
statefulset.kubernetes.io/pod-name: test-ss-pod
name: test-ss
namespace: default
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: test-ss
uid: c31a9b4e-e6dd-11e9-8f47-e6322f98393a
resourceVersion: "31010"
uid: 4cf14557-e91c-11e9-8f47-e6322f98393a
spec:
containers:
- command:
- tail
- -f
- /dev/null
image: busybox
imagePullPolicy: IfNotPresent
name: app
ports:
- containerPort: 80
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-bwg9f
readOnly: true
dnsPolicy: ClusterFirst
nodeName: minikube
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 600
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-bwg9f
secret:
defaultMode: 420
secretName: default-token-bwg9f
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2019-10-07T16:05:37Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2019-10-07T16:05:38Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: null
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2019-10-07T16:05:37Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://949e6b37ad1e85dfeca958bb5a54c459305ef3d87e12d03e1ba90e121701b572
image: busybox:latest
imageID: docker-pullable://busybox@sha256:fe301db49df08c384001ed752dff6d52b4305a73a7f608f21528048e8a08b51e
lastState: {}
name: app
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2019-10-07T16:05:38Z"
hostIP: 192.168.64.3
phase: Running
podIP: 172.17.0.4
qosClass: BestEffort #Burstable?
startTime: "2019-10-07T16:05:37Z"
4 changes: 4 additions & 0 deletions test/fixtures/for_unit_tests/stateful_set_test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,12 @@ kind: StatefulSet
metadata:
name: test-ss
generation: 2
annotations:
krane.shopify.io/required-rollout: full
labels:
app: hello-cloud
name: test-ss
uid: c31a9b4e-e6dd-11e9-8f47-e6322f98393a
spec:
selector:
matchLabels:
Expand All @@ -30,6 +33,7 @@ status:
replicas: 2
readyReplicas: 2
currentReplicas: 2
updatedReplicas: 2
observedGeneration: 2
currentRevision: 2
updateRevision: 2
87 changes: 72 additions & 15 deletions test/unit/krane/kubernetes_resource/stateful_set_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -4,39 +4,90 @@
class StatefulSetTest < Krane::TestCase
include ResourceCacheTestHelper

def test_deploy_succeeded_is_true_when_revision_and_replica_counts_match
template = build_ss_template(status: { "observedGeneration": 2 })
ss = build_synced_ss(template: template)
def test_deploy_succeeded_when_revision_matches_for_rollingupdate_strategy
ss_template = build_ss_template(updateStrategy: "RollingUpdate", rollout: nil)
ss = build_synced_ss(ss_template: ss_template)
assert_predicate(ss, :deploy_succeeded?)
end

def test_deploy_failed_ensures_controller_has_observed_deploy
template = build_ss_template(status: { "observedGeneration": 1 })
ss = build_synced_ss(template: template)
def test_deploy_succeeded_when_revision_matches_for_ondelete_strategy_without_annotation
ss_template = build_ss_template(updateStrategy: "OnDelete", rollout: nil)
ss = build_synced_ss(ss_template: ss_template)
assert_predicate(ss, :deploy_succeeded?)
end

def test_deploy_does_not_succeed_when_revision_does_not_match_without_annotation
ss_template = build_ss_template(status: { "updateRevision": 1 }, rollout: nil)
ss = build_synced_ss(ss_template: ss_template)
refute_predicate(ss, :deploy_succeeded?)
end

def test_deploy_succeeded_when_replica_counts_match_for_ondelete_strategy_with_full_annotation
ss_template = build_ss_template(updateStrategy: "OnDelete", rollout: "full")
ss = build_synced_ss(ss_template: ss_template)
assert_predicate(ss, :deploy_succeeded?)
end

def test_deploy_does_not_succeed_when_replica_counts_do_not_match_for_ondelete_strategy_with_full_annotation
ss_template = build_ss_template(status: { "readyReplicas": 1 }, updateStrategy: "OnDelete", rollout: "full")
ss = build_synced_ss(ss_template: ss_template)
refute_predicate(ss, :deploy_succeeded?)
end

def test_deploy_succeeded_when_replica_counts_match_for_rollingupdate_strategy
ss_template = build_ss_template(updateStrategy: "RollingUpdate", rollout: nil)
ss = build_synced_ss(ss_template: ss_template)
assert_predicate(ss, :deploy_succeeded?)
end

def test_deploy_does_not_succeed_when_replica_counts_do_not_match_for_rollingupdate_strategy
ss_template = build_ss_template(status: { "currentReplicas": 1 }, updateStrategy: "RollingUpdate", rollout: nil)
ss = build_synced_ss(ss_template: ss_template)
refute_predicate(ss, :deploy_succeeded?)
end

def test_deploy_fails_when_current_and_observed_generations_do_not_match
ss_template = build_ss_template(status: { "observedGeneration": 1 })
ss = build_synced_ss(ss_template: ss_template)
refute_predicate(ss, :deploy_succeeded?)
end

def test_deploy_failed_not_fooled_by_stale_status
def test_deploy_failed_not_fooled_by_stale_status_for_rollingupdate_strategy
ayana-s marked this conversation as resolved.
Show resolved Hide resolved
status = {
"observedGeneration": 1,
"readyReplicas": 0,
}
ss_template = build_ss_template(status: status, updateStrategy: "RollingUpdate")
ss = build_synced_ss(ss_template: ss_template)
ss.stubs(:pods).returns([stub(deploy_failed?: true)])
refute_predicate(ss, :deploy_failed?)
end

def test_deploy_failed_not_fooled_by_stale_status_for_ondelete_strategy
status = {
"observedGeneration": 1,
"readyReplicas": 0,
}
template = build_ss_template(status: status)
ss = build_synced_ss(template: template)
ss_template = build_ss_template(status: status, updateStrategy: "OnDelete")
ss = build_synced_ss(ss_template: ss_template)
ss.stubs(:pods).returns([stub(deploy_failed?: true)])
refute_predicate(ss, :deploy_failed?)
end

private

def build_ss_template(status: {})
ss_fixture.dup.deep_merge("status" => status)
def build_ss_template(status: {}, updateStrategy: "RollingUpdate", rollout: "full")
ss_fixture.dup.deep_merge(
"status" => status,
"spec" => {"updateStrategy" => {"type" => updateStrategy}},
"metadata" => {"annotations" => {"krane.shopify.io/rollout" => rollout}}
)
end

def build_synced_ss(template:)
ss = Krane::StatefulSet.new(namespace: "test", context: "nope", logger: logger, definition: template)
stub_kind_get("StatefulSet", items: [template])
stub_kind_get("Pod", items: [])
def build_synced_ss(ss_template:, pod_template: pod_fixture)
ss = Krane::StatefulSet.new(namespace: "test", context: "nope", logger: logger, definition: ss_template)
stub_kind_get("StatefulSet", items: [ss_template])
stub_kind_get("Pod", items: [pod_template])
ss.sync(build_resource_cache)
ss
end
Expand All @@ -46,4 +97,10 @@ def ss_fixture
File.read(File.join(fixture_path('for_unit_tests'), 'stateful_set_test.yml'))
).find { |fixture| fixture["kind"] == "StatefulSet" }
end

def pod_fixture
@pod_fixture ||= YAML.load_stream(
File.read(File.join(fixture_path('for_unit_tests'), 'stateful_set_pod_test.yml'))
).find { |fixture| fixture["kind"] == "Pod" }
end
end