Timeout on stateful sets #225

Closed
Teots opened this issue Dec 15, 2017 · 3 comments

Comments

Teots commented Dec 15, 2017

Hi,

I tried to deploy a stateful set with a high number of replicas and each of them takes a few minutes to start. Eventually, I ran into this error: StatefulSet/<my_service_name>: TIMED OUT (limit: 600s).

Is there any way to override this timeout? I've seen this PR, but the progressDeadlineSeconds is only available for the Deployment type and not for StatefulSet as far as I know.

Contributor

KnVerey commented Dec 15, 2017

Hi @Teots! This is something we've run into internally at Shopify as well, and we've been discussing possible solutions. It's true that progressDeadlineSeconds / the "progressing" status condition is not currently available on StatefulSet.

What would be the ideal behaviour, in your experience?
A - The gem should keep watching the SS as long as it is progressing, even if the rollout might take hours. This could be done by supporting a progressDeadlineSeconds-like annotation and watching progress in the gem. We'd deprecate that annotation if/when SS get a proper progress condition.
B - The gem should ignore SS that take a long time to roll out, as indicated by a special annotation. This supposes that these slow SS are better monitored by other means, and that it isn't useful for the deploy to hang for minutes/hours watching them.
C - The gem should declare success after a configurable (via annotation) portion of the rollout has completed successfully. This view also supposes some external monitoring of the rollout, since partial success is less of a sure signal of eventual completion for differentiated pods than undifferentiated ones (like deployment pods).
D - The gem should let you change the hard timeout for a specific SS via an annotation. Like A, this could result in extremely long watches.

A would be hairiest to implement/maintain. B or D could be implemented generically for any resource type. C is similar to something we're already working on for deployments, except that SS don't have an existing concept of minimum availability for us to leverage (whereas for deployments, that is already part of the rollingUpdate configuration).
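To make the difference between A and D concrete, here is a minimal Python sketch (not the gem's code, which is Ruby) that classifies a rollout from a series of samples. The function name `rollout_outcome`, its parameters, and the outcome strings are all hypothetical and exist only for illustration.

```python
from typing import Iterable, Optional, Tuple


def rollout_outcome(
    observations: Iterable[Tuple[float, int]],
    desired_replicas: int,
    hard_timeout: Optional[float] = None,
    progress_deadline: Optional[float] = None,
) -> str:
    """Classify a rollout from (elapsed_seconds, ready_replicas) samples.

    hard_timeout models option D: a fixed limit on total watch time.
    progress_deadline models option A: a limit on time without progress.
    """
    last_ready = 0
    last_progress_at = 0.0
    for elapsed, ready in observations:
        # Option D: give up once total elapsed time exceeds the hard limit,
        # no matter how well the rollout is going.
        if hard_timeout is not None and elapsed > hard_timeout:
            return "timed out"
        # Record the moment the ready count last increased.
        if ready > last_ready:
            last_ready, last_progress_at = ready, elapsed
        if ready >= desired_replicas:
            return "succeeded"
        # Option A: give up only if nothing has progressed for too long.
        if (
            progress_deadline is not None
            and elapsed - last_progress_at > progress_deadline
        ):
            return "stalled"
    return "still progressing"


# Three replicas that each take five minutes to become ready.
slow_but_healthy = [(0, 0), (300, 1), (600, 2), (900, 3)]
print(rollout_outcome(slow_but_healthy, 3, hard_timeout=600))
print(rollout_outcome(slow_but_healthy, 3, progress_deadline=600))
```

With a 600s hard limit the slow but healthy rollout is reported as timed out, while a 600s progress deadline lets it finish because the ready count keeps increasing. Raising the hard limit (option D) also lets it finish, at the cost of a longer wait when a rollout is genuinely stuck.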

cc @psycotica0-shopify @dturn

Author

Teots commented Dec 18, 2017

Hi @KnVerey,

Thanks for your response. In our current setup, option D would be entirely sufficient. Deployments are executed by our CD pipeline, so a long watch isn't an issue as long as we can still specify a hard timeout, just to avoid getting stuck forever.

Contributor

KnVerey commented Jan 13, 2018

Timeout customization is now available for all resources as of version 0.15.0. Documentation is at https://github.com/Shopify/kubernetes-deploy#customizing-behaviour-with-annotations.
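As a rough illustration of setting a per-resource override, the Python sketch below adds a timeout annotation to a StatefulSet manifest held as a dict. The annotation key `kubernetes-deploy.shopify.io/timeout-override` and the `30m` value format are assumptions here, so please check the linked documentation for the exact key and accepted formats in your version. The helper `with_timeout_override` and the manifest contents are hypothetical.

```python
import copy

# Assumption: annotation key as described in the linked README.
# Verify it against the documentation for the version you run.
TIMEOUT_ANNOTATION = "kubernetes-deploy.shopify.io/timeout-override"


def with_timeout_override(manifest: dict, timeout: str) -> dict:
    """Return a copy of the manifest with the timeout annotation set."""
    result = copy.deepcopy(manifest)
    # Create metadata.annotations if the manifest does not have it yet.
    annotations = result.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[TIMEOUT_ANNOTATION] = timeout
    return result


# Hypothetical StatefulSet with many slow-starting replicas.
statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "my-service"},
    "spec": {"replicas": 20},
}

# Assumption: "30m" is an accepted duration format for the override.
patched = with_timeout_override(statefulset, "30m")
print(patched["metadata"]["annotations"])
```

In a template file the same annotation would simply go under `metadata.annotations` of the StatefulSet.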

KnVerey closed this as completed Jan 13, 2018