simplify container start logic, taking advantage of idempotency #99
Conversation
The start endpoint should be idempotent when trying to start a container that is already started (see [Daemon.containerStart][1]), so we can take advantage of that. Once we have either created the container or received a "conflict" (name already in use), we can try starting the container. As there's a time window between the name being reserved and the container actually existing (which may produce a 404 during that window), we ignore "not found" errors and retry the start.

[1]: https://github.com/moby/moby/blob/5318877858c9fc94b5cccc3cf1a75d4ec46951b8/daemon/start.go#L76-L93

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
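For illustration, the flow described above might look roughly like the sketch below, assuming a recent Docker Go SDK (github.com/docker/docker/client). The function name, retry count, and sleep interval are made up for the example; this is not the PR's actual code.

```go
import (
	"context"
	"fmt"
	"time"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
	"github.com/docker/docker/errdefs"
)

// ensureStarted creates the container, tolerates "name already in use", and
// then starts it, retrying on "not found" while the name reservation settles.
func ensureStarted(ctx context.Context, cli *client.Client, name string,
	cfg *container.Config, hostCfg *container.HostConfig) error {
	if _, err := cli.ContainerCreate(ctx, cfg, hostCfg, nil, nil, name); err != nil {
		if !errdefs.IsConflict(err) {
			return err // anything other than "name already in use" is fatal
		}
	}
	for attempt := 0; attempt < 10; attempt++ {
		err := cli.ContainerStart(ctx, name, container.StartOptions{})
		if err == nil || !errdefs.IsNotFound(err) {
			// Start is idempotent for an already-running container, so nil
			// covers both "started now" and "was already started".
			return err
		}
		// Name reserved but container not visible yet; retry shortly.
		time.Sleep(100 * time.Millisecond)
	}
	return fmt.Errorf("container %q not found after retrying start", name)
}
```

The key point is that ContainerStart is safe to call on an already-running container, so neither the "conflict" nor the "already started" case needs special handling.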
```go
// Unfortunately the Docker API's /containers/{id}/wait API (and the
// corresponding Client.ContainerWait method) don't allow waiting for
// container startup, so instead we'll take a polling approach.
```
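For reference, a polling approach along those lines could look something like this (a sketch only, assuming the Docker Go SDK's ContainerInspect; the helper name and interval are made up):

```go
import (
	"context"
	"fmt"
	"time"

	"github.com/docker/docker/client"
)

// waitForRunning is a hypothetical polling helper (not the PR's code): it
// inspects the container until it reports Running, or gives up at the timeout.
func waitForRunning(ctx context.Context, cli *client.Client, id string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		info, err := cli.ContainerInspect(ctx, id)
		if err != nil {
			return err
		}
		if info.State != nil && info.State.Running {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(500 * time.Millisecond):
		}
	}
	return fmt.Errorf("container %q did not reach running state within %s", id, timeout)
}
```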
I actually forgot that this was not supported, and wondered "why?", because it could definitely be useful (probably?). I expected it would be a "quick fix" (just adding additional conditions, or perhaps even accepting "any" container state), but it looks like each of the supported conditions has a dedicated channel for "waiters": https://github.com/moby/moby/blob/89aa33001e635c20707f6d4ed046590c22a24c4d/container/state.go#L193-L200
```go
// Removal wakes up both removeOnlyWaiters and stopWaiters
// Container could be removed while still in "created" state
// in which case it is never actually stopped
if condition == container.WaitConditionRemoved {
	s.removeOnlyWaiters = append(s.removeOnlyWaiters, waitC)
} else {
	s.stopWaiters = append(s.stopWaiters, waitC)
}
```
I don't think that should rule out the option to add a "Wait running" condition though, so perhaps it's worth opening a ticket in moby/moby.
Perhaps rewriting the logic to support "any" state (which could be a map of channels per state?) could even be an option after that, but I'd not be terribly upset if we added one more property there 😅
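Purely as a sketch of that "map of channels per state" idea (this is not moby's actual State implementation, and the names are made up):

```go
import "sync"

// stateWaiters is an illustrative alternative: one waiter list per state
// instead of dedicated removeOnlyWaiters/stopWaiters fields.
type stateWaiters struct {
	mu      sync.Mutex
	current string
	waiters map[string][]chan struct{} // state name -> channels closed on entry
}

// Wait returns a channel that is closed once the given state is reached.
func (s *stateWaiters) Wait(state string) <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan struct{})
	if s.current == state {
		close(ch) // already in the requested state
		return ch
	}
	if s.waiters == nil {
		s.waiters = make(map[string][]chan struct{})
	}
	s.waiters[state] = append(s.waiters[state], ch)
	return ch
}

// setState records the new state and wakes everyone waiting on it.
func (s *stateWaiters) setState(state string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.current = state
	for _, ch := range s.waiters[state] {
		close(ch)
	}
	delete(s.waiters, state)
}
```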
LGTM! I think it's fine to remove the container status check as we're already using waitForStandaloneRunnerAfterInstall.
I like this idea, but I'm having some trouble getting it working. I have a branch here where I've combined #98 and this PR: main...container-start-cleanup
I had to add a small fixup and clean up some rebase issues, but the main problem I'm having is that I'm receiving an EOF from ContainerStart:
error during connect: Post "http://<omitted>.sock/v1.48/containers/docker-model-runner/start": EOF
I'll play with it more on Monday, but let me know if that sparks any thoughts.
I've added another commit onto the tip of main...container-start-cleanup to ignore the EOF. In any case, if that branch looks okay to you, I'll replace #98 with it and close this PR.
LGTM
Yes, feel free to push to your PR branch; this PR was opened against that branch, so merging it would make this commit show up there as well.

The EOF is interesting though; if you're able to reproduce it, it might be worth capturing daemon logs to see if there's a bug somewhere, as that looks ... unexpected?
Closing in favor of #107, which incorporates these commits. I couldn't figure out the source of the EOF.