Update container start error so it respects MaxSlowStartDuration
#169
Codecov Report
@@ Coverage Diff @@
## main #169 +/- ##
==========================================
+ Coverage 63.05% 63.10% +0.04%
==========================================
Files 41 41
Lines 3094 3098 +4
==========================================
+ Hits 1951 1955 +4
Misses 1023 1023
Partials 120 120
Thanks @jennchenn, this looks good.
I left 2 small questions.
This pull request contains a valid label.
What does this PR do?
During canary, if an error occurs during pod creation within the `MaxSlowStartDuration`, the deployment is no longer auto-paused. If `MaxSlowStartDuration` is not defined, the behavior is unchanged.

Motivation
Many recent deployments have been paused due to a `CreateContainerConfigError` - `Error: failed to sync secret cache: timed out waiting for the condition`. This error often resolves itself within a few seconds or minutes of first being raised, but the deployment remains paused, which slows down releases. This change aims to avoid cases where the canary stays paused even though the underlying issue no longer exists.

Additional Notes
Should a default value for `maxSlowStartDuration` be set for our clusters?

Describe your test plan
Unit tests and an E2E test were included with the change. I also deployed these changes to a few staging clusters and did not see the paused-canary issue arise in those clusters.
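
The rule described under "What does this PR do?" can be sketched in Go roughly as follows. This is a minimal illustration, not the code in this PR: `shouldAutoPause`, `exampleLimit`, and the parameter names are hypothetical. It also assumes that "within" means the pod's age is strictly less than the configured duration, and it models an undefined `MaxSlowStartDuration` as a nil pointer.

```go
package main

import (
	"fmt"
	"time"
)

// exampleLimit is a hypothetical MaxSlowStartDuration value, for illustration only.
var exampleLimit = 5 * time.Minute

// shouldAutoPause is a hypothetical helper (not the project's actual function).
// It reports whether a container start error on a canary pod should pause the canary.
//   - podAge:        time elapsed since the pod was created
//   - maxSlowStart:  the configured MaxSlowStartDuration, or nil if not defined
//   - hasStartError: whether the pod currently reports a container start error
func shouldAutoPause(podAge time.Duration, maxSlowStart *time.Duration, hasStartError bool) bool {
	if !hasStartError {
		// No error, nothing to pause on.
		return false
	}
	if maxSlowStart == nil {
		// Duration not defined: behavior is unchanged, so the error pauses the canary.
		return true
	}
	// Assumption: errors raised while the pod is younger than the limit are tolerated.
	return podAge >= *maxSlowStart
}

func main() {
	fmt.Println(shouldAutoPause(30*time.Second, &exampleLimit, true)) // false: still within the slow-start window
	fmt.Println(shouldAutoPause(10*time.Minute, &exampleLimit, true)) // true: window has elapsed
	fmt.Println(shouldAutoPause(30*time.Second, nil, true))           // true: no duration defined
}
```

With this shape, a transient `CreateContainerConfigError` that clears up before the window elapses never triggers a pause, while a persistent error still does once the pod is older than the limit.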