-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do container restart on exit #4188
Conversation
f030d6c
to
aabf834
Compare
5ebc494
to
3dc103c
Compare
d5e547d
to
49be456
Compare
field.TaskID: mtask.GetID(), | ||
field.Container: container.Name, | ||
field.RuntimeID: runtimeID, | ||
"changeEventStatus": event.Status.String(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a rule-of-thumb for when logged fields get a constant in the field
package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular rules here, I guess just whenever there is a field being used across multiple files/packages we should probably add it to the constants. In this case I don't think this field or the ones below will ever be used outside this particular function so I didn't bother adding constants.
agent/engine/task_manager.go
Outdated
exitCode := event.DockerContainerMetadata.ExitCode | ||
shouldRestart, reason := container.RestartTracker.ShouldRestart(exitCode, container.GetStartedAt(), container.GetDesiredStatus()) | ||
if shouldRestart { | ||
defer container.RestartTracker.RecordRestart() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what happens when a container restart fails? Since LastRestartAt
is updated even though the container didn't restart, is the next restart attempt delayed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the "StartContainer" call itself fails, then the container will never be attempted to be restarted again. The restart policy is only triggered on containers that have moved past the "StartContainer" phase into RUNNING. The container also needs to be in the RUNNING state for a minimum time for it to be eligible to be restarted (this is the RestartAttemptPeriod field)
agent/engine/task_manager.go
Outdated
shouldRestart, reason := container.RestartTracker.ShouldRestart(exitCode, container.GetStartedAt(), container.GetDesiredStatus()) | ||
if shouldRestart { | ||
defer container.RestartTracker.RecordRestart() | ||
logger.Info("Restarting container", eventLogFields, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we mean to not include this current restart attempt in the restartCount
or lastRestartAt
? It feels a bit odd to have a log that reads "Restarting container, restartCount=0"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that was intended, if you log before restarting then you havent restarted yet, and then there is another log line after the restart happens below.
I should maybe change it to increment before logging the "Restarted" log line below though 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the actual container hasn't been restarted yet, but we've begun another "restart attempt" is probably how I feel
Maybe we can rename to previousRestartCount
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we will log after doing the restart no matter what, I will actually just delete this log line as it's a bit redundant anyways
Summary
In the function that handles container change events, we will evaluate if an exiting container has a restart policy defined.
If the container does have a restart policy, we will evaluate if we should restart the container using the
RestartTracker.ShouldRestart
function from the common library.If we do do a container restart, we record it using the
RestartTracker.RecordRestart
function.If the restart fails, we proceed with the regular container stop workflow.
Testing
New tests cover the changes: yes
Description for the changelog
NA
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.