Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do container restart on exit #4188

Merged
merged 1 commit into from
May 30, 2024
Merged

Do container restart on exit #4188

merged 1 commit into from
May 30, 2024

Conversation

sparrc
Copy link
Contributor

@sparrc sparrc commented May 22, 2024

Summary

In the function that handles container change events, we will evaluate if an exiting container has a restart policy defined.

If the container does have a restart policy, we will evaluate if we should restart the container using the RestartTracker.ShouldRestart function from the common library.

If we do do a container restart, we record it using the RestartTracker.RecordRestart function.

If the restart fails, we proceed with the regular container stop workflow.

Testing

New tests cover the changes: yes

Description for the changelog

NA

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sparrc sparrc requested a review from a team as a code owner May 22, 2024 21:01
@sparrc sparrc force-pushed the crp-do-restart branch 29 times, most recently from f030d6c to aabf834 Compare May 28, 2024 18:52
@sparrc sparrc force-pushed the crp-do-restart branch 2 times, most recently from 5ebc494 to 3dc103c Compare May 28, 2024 19:55
@sparrc sparrc changed the title [wip] Perform container restart if enabled Do container restart on exit May 28, 2024
@sparrc sparrc force-pushed the crp-do-restart branch 6 times, most recently from d5e547d to 49be456 Compare May 28, 2024 21:20
field.TaskID: mtask.GetID(),
field.Container: container.Name,
field.RuntimeID: runtimeID,
"changeEventStatus": event.Status.String(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a rule-of-thumb for when logged fields get a constant in the field package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular rules here, I guess just whenever there is a field being used across multiple files/packages we should probably add it to the constants. In this case I don't think this field or the ones below will ever be used outside this particular function so I didn't bother adding constants.

exitCode := event.DockerContainerMetadata.ExitCode
shouldRestart, reason := container.RestartTracker.ShouldRestart(exitCode, container.GetStartedAt(), container.GetDesiredStatus())
if shouldRestart {
defer container.RestartTracker.RecordRestart()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what happens when a container restart fails? Since LastRestartAt is updated even though the container didn't restart, is the next restart attempt delayed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the "StartContainer" call itself fails, then the container will never be attempted to be restarted again. The restart policy is only triggered on containers that have moved past the "StartContainer" phase into RUNNING. The container also needs to be in the RUNNING state for a minimum time for it to be eligible to be restarted (this is the RestartAttemptPeriod field)

shouldRestart, reason := container.RestartTracker.ShouldRestart(exitCode, container.GetStartedAt(), container.GetDesiredStatus())
if shouldRestart {
defer container.RestartTracker.RecordRestart()
logger.Info("Restarting container", eventLogFields,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we mean to not include this current restart attempt in the restartCount or lastRestartAt? It feels a bit odd to have a log that reads "Restarting container, restartCount=0"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was intended, if you log before restarting then you havent restarted yet, and then there is another log line after the restart happens below.

I should maybe change it to increment before logging the "Restarted" log line below though 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the actual container hasn't been restarted yet, but we've begun another "restart attempt" is probably how I feel
Maybe we can rename to previousRestartCount?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we will log after doing the restart no matter what, I will actually just delete this log line as it's a bit redundant anyways

tinnywang
tinnywang previously approved these changes May 29, 2024
@sparrc sparrc merged commit 650b19b into aws:feature/crp May 30, 2024
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants