Deployment rollout status not reliable when using status.conditions
#124264
Comments
/sig apps
I am trying to fix some unit tests in my local environment to test a fix for this. This is what I am trying to do: Modified
And this is how I am doing it:
/triage accepted
I can only reproduce this if the ReplicaSet with the new image existed before. If the new ReplicaSet does not exist, the progress is indicated immediately. @sicavz I think #124558 might fix it. Can you check if it helps? @harshap804 FYI, it is not a good idea to mutate the shared cache in the event handlers.
What happened?
Currently I am working on a project where we implement a Kubernetes operator, and for some flows we need to wait for certain Deployments to complete their rollout before advancing to other steps in our process.
So, for an existing Deployment we only update an image in the pod template and then wait for the updated Deployment to finish the rollout.
The way the status of the Deployment is built/implemented is error prone and not reliable if we only take `status.conditions` into account (as suggested in the documentation).

What did you expect to happen?
After updating the spec of a Deployment, when the `status.observedGeneration` is changed to the latest `metadata.generation`, the `status.conditions` should also be updated.

How can we reproduce it (as minimally and precisely as possible)?
We found out that if there is a `status.condition` of type `Progressing` with the reason `ProgressDeadlineExceeded`, then, under some circumstances (so, not always; it is a concurrency/timing issue), the first change of `status.observedGeneration` toward the new `metadata.generation` value comes with no change at all in `status.conditions` (the old conditions are still there, completely unchanged).

According to the Kubernetes documentation (and also the way the Operator SDK checks the rollout status), interpreting these fields would lead to the conclusion that the rollout failed, which is not accurate, because the actual rollout only starts a few seconds later...
This is an example from a log describing the issue (please observe that the two blocks are logged a few milliseconds apart, and the `status.conditions` are the same even though the `status.observedGeneration` changed):

Anything else we need to know?
Please advise on how to reliably interpret the rollout status of a Deployment.
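For comparison, this is a sketch of the checks that kubectl's `rollout status` performs for Deployments: gate everything on `observedGeneration`, treat a timed-out Progressing condition as a failure, then compare the replica counters. The types are minimal stand-ins for the real `k8s.io/api/apps/v1` structs, and note that even this logic is subject to the stale-conditions window reported here, since the conditions can lag behind `observedGeneration`:

```go
package main

import "fmt"

// Minimal stand-in types for the Deployment fields used below.
type Condition struct {
	Type   string
	Reason string
}

type Deployment struct {
	Generation         int64 // metadata.generation
	Replicas           int32 // spec.replicas
	ObservedGeneration int64
	UpdatedReplicas    int32
	AvailableReplicas  int32
	TotalReplicas      int32 // status.replicas
	Conditions         []Condition
}

// rolloutStatus reports whether the rollout is complete, or an error if the
// Deployment exceeded its progress deadline.
func rolloutStatus(d Deployment) (done bool, err error) {
	if d.ObservedGeneration < d.Generation {
		return false, nil // spec change not observed yet; keep waiting
	}
	for _, c := range d.Conditions {
		if c.Type == "Progressing" && c.Reason == "ProgressDeadlineExceeded" {
			return false, fmt.Errorf("deployment exceeded its progress deadline")
		}
	}
	if d.UpdatedReplicas < d.Replicas {
		return false, nil // not all replicas have been updated
	}
	if d.TotalReplicas > d.UpdatedReplicas {
		return false, nil // old replicas are still terminating
	}
	if d.AvailableReplicas < d.UpdatedReplicas {
		return false, nil // updated replicas are not all available yet
	}
	return true, nil
}

func main() {
	done, err := rolloutStatus(Deployment{
		Generation: 3, ObservedGeneration: 3,
		Replicas: 2, UpdatedReplicas: 2, TotalReplicas: 2, AvailableReplicas: 2,
	})
	fmt.Println(done, err) // a fully rolled-out deployment: true <nil>
}
```

A possible workaround (an assumption, not an official recommendation) is to additionally wait until the Progressing condition itself reflects the new rollout before treating `ProgressDeadlineExceeded` as final.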
Kubernetes version
Cloud provider
OS version