Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support restart task for StatefulSet with updateStrategy: OnDelete #840

Merged
merged 7 commits into from
Feb 2, 2022

Conversation

timothysmith0609
Copy link
Contributor

Builds on #836 by handling the special case of a StatefulSet having updatStrategy: OnDelete. In this case, we need to find all pods with an owner reference to the targeted StatefulSet: this is accomplished using the UID of the StatefulSet.

More info on owner references can be found here.

@timothysmith0609 timothysmith0609 requested a review from a team as a code owner November 5, 2021 16:39
@timothysmith0609 timothysmith0609 changed the title support restart task for StatefulSet with updateStrategy: OnDelete Support restart task for StatefulSet with updateStrategy: OnDelete Nov 5, 2021
Copy link
Contributor

@peiranliushop peiranliushop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to bump krane version as as well.

def delete_statefulset_pods(record)
pods = kubeclient.get_pods(namespace: record.metadata.namespace)
pods.select! do |pod|
pod.metadata.ownerReferences.find { |ref| ref.uid == record.metadata.uid }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is record.metadata.uid something that identifies individual statefulsets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uniquely identifies the resource throughout the lifetime of the cluster. That is, no 2 resources in a cluster will ever have the same UID

@timothysmith0609
Copy link
Contributor Author

CI is showing a flaky error that I want to get to the bottom of before merging this. It appears it can treat the restart as successful even if multiple replicas of the statefulset have not yet reached the desired level

@kgalieva
Copy link
Member

Hey @timothysmith0609! Any updates on this PR?

@timothysmith0609 timothysmith0609 force-pushed the support-ondelete-statefulset-restart branch from abd2e52 to 2719707 Compare January 26, 2022 14:46
@timothysmith0609 timothysmith0609 requested a review from a team as a code owner January 26, 2022 14:46
@timothysmith0609 timothysmith0609 requested review from JamesOwenHall and removed request for a team January 26, 2022 14:46
@timothysmith0609
Copy link
Contributor Author

I completely forgot this PR existed. I just rebased on latest master and will see what the state of that CI flakiness is

@timothysmith0609
Copy link
Contributor Author

This is flaky due to transparently returning true for StatefulSet#deploy_succeeded?. Per c8cea8d, it seems reasonable to check the normal success criteria (that used for rollingUpdate strategy) even when deploying a stateful set with onDelete. We would still like to know if the underlying resources are healthy and that there are enough of them running, no?

@timothysmith0609
Copy link
Contributor Author

@peiranliushop are you ok with the change I made in c8cea8d?

@johnjmartin
Copy link

We would still like to know if the underlying resources are healthy and that there are enough of them running, no?

Definitely 👍. We expect pods to come up healthy shortly after they're deleted in this particular deploy strategy

@timothysmith0609
Copy link
Contributor Author

We expect pods to come up healthy shortly after they're deleted in this particular deploy strategy

This is a little nuanced, actually. When deploying, OnDelete means the underlying pods will not be deleted (this must be done manually by the user). In contrast, when we want to restart the pods (the feature being implemented, here), we need to wait for the underlying pods to restart, even if OnDelete is the rollout strategy.

@johnjmartin
Copy link

When deploying, OnDelete means the underlying pods will not be deleted (this must be done manually by the user). In contrast, when we want to restart the pods (the feature being implemented, here), we need to wait for the underlying pods to restart, even if OnDelete is the rollout strategy.

Ah yes. In our case, the way we plan to use the feature is to run a production-platform-next apply (which with OnDelete will not trigger a restart of our pods), then follow that with a production-platform-next restart. This way we can have the affect of restarting all of our StatefulSet pods in one smooth go on each deploy.

@timothysmith0609
Copy link
Contributor Author

🤦 I forgot the pods would be named the same even after being deleted. I'll fix this tomorrow and get the tests passing before merging

@JamesOwenHall
Copy link
Contributor

I forgot the pods would be named the same even after being deleted.

Ah true. You can probably just swap name for UID; that ought to fix it.

@timothysmith0609 timothysmith0609 merged commit 5749728 into master Feb 2, 2022
@timothysmith0609 timothysmith0609 deleted the support-ondelete-statefulset-restart branch February 2, 2022 15:25
@JamesHageman JamesHageman mentioned this pull request Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants