-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support timeout override for deployments #414
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The STANDARD_TIMEOUT_MESSAGE should be fine. I believe we need to update the readme.
test/unit/kubernetes-deploy/kubernetes_resource/deployment_test.rb
Outdated
Show resolved
Hide resolved
replica_sets: [build_rs_template(status: { "replica" => 1 })] | ||
) | ||
refute deploy.deploy_timed_out?, "Deploy not started shouldn't have timed out" | ||
deploy.deploy_started_at = Time.now.utc - 3.minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't necessarily show that the annotation took precedence over the default and the progress deadline. IMO we should add an expired progress condition, an assertion as to what the default is, and an assertion on each edge of the time that the annotation specifies to prove it is the one causing the timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added assertions on each edge and changed pretty_timeout_type
to check for timeout override
KubernetesDeploy::KubernetesResource::TIMEOUT_OVERRIDE_ANNOTATION => "10S", | ||
} | ||
container = deployment['spec']['template']['spec']['containers'].first | ||
container['readinessProbe'] = { "exec" => { "command" => ['- ls'] } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this patch? Doesn't this change mean the pod could actually come up successfully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so, since - ls
will always fail and the container will remain unready. Is there another way to ensure a timeout that I should be using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I misread this as being an array of commands one of which is ls
. The bad-probe template uses ["test", "0", "-eq", "1"]
but it doesn't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fleshed out the tests a little more + a few small changes.
It seems that Deployment#timeout
doesn't behave the way one might expect. If spec.progressDeadlineSeconds
is set, Deployment#timeout
will just return the default KubernetesDeploy timeout (7 minutes). This doesn't seem like the correct behaviour. I would have assumed that Deployment#timeout
would return progressDeadlineSeconds
if it is present on the definition (overridden by timeout_override
as necessary), but that doesn't appear to be the case. In fact, deployment check for progress deadlines seems more hidden than maybe it should be. While I'm making changes to the class, do you feel I should add a timeout
method to the Deploy class so that we surface that progressDeadlineSeconds
does indeed specify a timeout on the resource?
KubernetesDeploy::KubernetesResource::TIMEOUT_OVERRIDE_ANNOTATION => "10S", | ||
} | ||
container = deployment['spec']['template']['spec']['containers'].first | ||
container['readinessProbe'] = { "exec" => { "command" => ['- ls'] } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so, since - ls
will always fail and the container will remain unready. Is there another way to ensure a timeout that I should be using?
replica_sets: [build_rs_template(status: { "replica" => 1 })] | ||
) | ||
refute deploy.deploy_timed_out?, "Deploy not started shouldn't have timed out" | ||
deploy.deploy_started_at = Time.now.utc - 3.minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added assertions on each edge and changed pretty_timeout_type
to check for timeout override
c9d037c
to
9410951
Compare
I think this behavior is ok. We probably could have given this a more specific name since this is the wall clock timeout which may or may not be relevant. But returning the PDS value from #timeout doesn't seem any more accurate as does about anything else I can think of... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that Deployment#timeout doesn't behave the way one might expect. If spec.progressDeadlineSeconds is set, Deployment#timeout will just return the default KubernetesDeploy timeout (7 minutes). This doesn't seem like the correct behaviour.
I agree that this is a bit unintuitive, but in I'm not sure having timeout
return the PDS value would make sense either. In every other case, timeout
is a hard deadline; arguably when there's a PDS value, that hard deadline is non-existent/infinite, so using the PDS (which tends to be fairly small) as its value would actually be more misleading.
While I'm making changes to the class, do you feel I should add a timeout method to the Deploy class so that we surface that progressDeadlineSeconds does indeed specify a timeout on the resource?
I agree there is something we could improve here, but it isn't obvious to me what is best, so I'd suggest making a separate PR with a proposal instead of making this one wait on it.
What are you trying to accomplish with this PR?
Closes #315
Required for #229
Support timeout override annotation for deployments
How is this accomplished?
Have the timeout override annotation override considerations of
progressDeadlineSeconds
and the hard timeout.What could go wrong?
Do we want to use the standard timeout message when using the override, or surface the fact that the timeout being used is in fact from the override annotation?
cc @Shopify/cloudx