Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support timeout override for deployments #414

Merged
merged 7 commits into from
Jan 24, 2019

Conversation

timothysmith0609
Copy link
Contributor

@timothysmith0609 timothysmith0609 commented Jan 15, 2019

What are you trying to accomplish with this PR?
Closes #315
Required for #229

Support timeout override annotation for deployments

How is this accomplished?
Have the timeout override annotation override considerations of progressDeadlineSeconds and the hard timeout.

What could go wrong?
Do we want to use the standard timeout message when using the override, or surface the fact that the timeout being used is in fact from the override annotation?

cc @Shopify/cloudx

@timothysmith0609 timothysmith0609 changed the title timeout_override for deployment Support timeout override for deployments Jan 15, 2019
Copy link
Contributor

@dturn dturn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The STANDARD_TIMEOUT_MESSAGE should be fine. I believe we need to update the readme.

test/integration/kubernetes_deploy_test.rb Outdated Show resolved Hide resolved
@timothysmith0609 timothysmith0609 self-assigned this Jan 17, 2019
replica_sets: [build_rs_template(status: { "replica" => 1 })]
)
refute deploy.deploy_timed_out?, "Deploy not started shouldn't have timed out"
deploy.deploy_started_at = Time.now.utc - 3.minutes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't necessarily show that the annotation took precedence over the default and the progress deadline. IMO we should add an expired progress condition, an assertion as to what the default is, and an assertion on each edge of the time that the annotation specifies to prove it is the one causing the timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added assertions on each edge and changed pretty_timeout_type to check for timeout override

KubernetesDeploy::KubernetesResource::TIMEOUT_OVERRIDE_ANNOTATION => "10S",
}
container = deployment['spec']['template']['spec']['containers'].first
container['readinessProbe'] = { "exec" => { "command" => ['- ls'] } }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this patch? Doesn't this change mean the pod could actually come up successfully?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so, since - ls will always fail and the container will remain unready. Is there another way to ensure a timeout that I should be using?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I misread this as being an array of commands one of which is ls. The bad-probe template uses ["test", "0", "-eq", "1"] but it doesn't really matter.

Copy link
Contributor Author

@timothysmith0609 timothysmith0609 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fleshed out the tests a little more + a few small changes.

It seems that Deployment#timeout doesn't behave the way one might expect. If spec.progressDeadlineSeconds is set, Deployment#timeout will just return the default KubernetesDeploy timeout (7 minutes). This doesn't seem like the correct behaviour. I would have assumed that Deployment#timeout would return progressDeadlineSeconds if it is present on the definition (overridden by timeout_override as necessary), but that doesn't appear to be the case. In fact, deployment check for progress deadlines seems more hidden than maybe it should be. While I'm making changes to the class, do you feel I should add a timeout method to the Deploy class so that we surface that progressDeadlineSeconds does indeed specify a timeout on the resource?

KubernetesDeploy::KubernetesResource::TIMEOUT_OVERRIDE_ANNOTATION => "10S",
}
container = deployment['spec']['template']['spec']['containers'].first
container['readinessProbe'] = { "exec" => { "command" => ['- ls'] } }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so, since - ls will always fail and the container will remain unready. Is there another way to ensure a timeout that I should be using?

replica_sets: [build_rs_template(status: { "replica" => 1 })]
)
refute deploy.deploy_timed_out?, "Deploy not started shouldn't have timed out"
deploy.deploy_started_at = Time.now.utc - 3.minutes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added assertions on each edge and changed pretty_timeout_type to check for timeout override

@dturn
Copy link
Contributor

dturn commented Jan 23, 2019

It seems that Deployment#timeout doesn't behave the way one might expect. If spec.progressDeadlineSeconds is set, Deployment#timeout will just return the default KubernetesDeploy timeout (7 minutes). This doesn't seem like the correct behaviour. I would have assumed that Deployment#timeout would return progressDeadlineSeconds if it is present on the definition (overridden by timeout_override as necessary), but that doesn't appear to be the case. In fact, deployment check for progress deadlines seems more hidden than maybe it should be. While I'm making changes to the class, do you feel I should add a timeout method to the Deploy class so that we surface that progressDeadlineSeconds does indeed specify a timeout on the resource?

I think this behavior is ok. We probably could have given this a more specific name since this is the wall clock timeout which may or may not be relevant. But returning the PDS value from #timeout doesn't seem any more accurate as does about anything else I can think of...

Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that Deployment#timeout doesn't behave the way one might expect. If spec.progressDeadlineSeconds is set, Deployment#timeout will just return the default KubernetesDeploy timeout (7 minutes). This doesn't seem like the correct behaviour.

I agree that this is a bit unintuitive, but in I'm not sure having timeout return the PDS value would make sense either. In every other case, timeout is a hard deadline; arguably when there's a PDS value, that hard deadline is non-existent/infinite, so using the PDS (which tends to be fairly small) as its value would actually be more misleading.

While I'm making changes to the class, do you feel I should add a timeout method to the Deploy class so that we surface that progressDeadlineSeconds does indeed specify a timeout on the resource?

I agree there is something we could improve here, but it isn't obvious to me what is best, so I'd suggest making a separate PR with a proposal instead of making this one wait on it.

@timothysmith0609 timothysmith0609 merged commit b8c1f3d into master Jan 24, 2019
@timothysmith0609 timothysmith0609 deleted the timeout_override_deploys branch January 24, 2019 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants