simplify container start logic, taking advantage of idempotency #99

Closed

Conversation

thaJeztah
Member

The start endpoint should be idempotent when trying to start a container
that is already started (see [Daemon.containerStart][1]), so we can take
advantage of that.

Once we either created the container, or received a "conflict" (name
already in use), we can try starting the container. As there's a time
window between a name being reserved and the container actually existing
(which may produce a 404 during that time window), we ignore "not found"
errors and retry the start.

[1]: https://github.com/moby/moby/blob/5318877858c9fc94b5cccc3cf1a75d4ec46951b8/daemon/start.go#L76-L93

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
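
As a rough illustration of the flow described above, here is a minimal Go sketch (not the PR's actual code); the package and function names, the image, and the retry interval are placeholders:

	package startutil

	import (
		"context"
		"time"

		"github.com/docker/docker/api/types/container"
		"github.com/docker/docker/client"
		"github.com/docker/docker/errdefs"
	)

	// ensureStarted creates the container if needed and starts it, relying on
	// the start endpoint being idempotent for already-running containers.
	func ensureStarted(ctx context.Context, cli *client.Client, name string) error {
		// A "conflict" means the name is already reserved (the container exists
		// or is being created), which is fine for our purposes.
		_, err := cli.ContainerCreate(ctx, &container.Config{Image: "example/image"}, nil, nil, nil, name)
		if err != nil && !errdefs.IsConflict(err) {
			return err
		}

		// Start can return a 404 in the window between the name being reserved
		// and the container actually existing, so ignore "not found" and retry.
		for {
			err := cli.ContainerStart(ctx, name, container.StartOptions{})
			if err == nil || !errdefs.IsNotFound(err) {
				return err
			}
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(100 * time.Millisecond):
			}
		}
	}
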
Comment on lines -74 to -76
// Unfortunately the Docker API's /containers/{id}/wait API (and the
// corresponding Client.ContainerWait method) don't allow waiting for
// container startup, so instead we'll take a polling approach.
Member Author

I actually forgot that this was not supported, and wondered "why?", because this could definitely be useful (probably?). I expected it would be a "quick fix" (just add additional conditions, or perhaps even just accept "any" container state), but it looks like each of the supported conditions has a dedicated channel for "waiters": https://github.com/moby/moby/blob/89aa33001e635c20707f6d4ed046590c22a24c4d/container/state.go#L193-L200

	// Removal wakes up both removeOnlyWaiters and stopWaiters
	// Container could be removed while still in "created" state
	// in which case it is never actually stopped
	if condition == container.WaitConditionRemoved {
		s.removeOnlyWaiters = append(s.removeOnlyWaiters, waitC)
	} else {
		s.stopWaiters = append(s.stopWaiters, waitC)
	}

I don't think that should rule out the option to add a "Wait running" condition though, so perhaps it's worth opening a ticket in moby/moby.

Perhaps rewriting the logic to support "any" state (which could be a map of channels per state?) could even be an option after that, but I'd not be terribly upset if we'd add one more property there 😅
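
For illustration only, a hypothetical sketch of that "map of channels per state" idea (simplified to chan struct{}; the real moby/moby waiters carry a StateStatus and support more conditions):

	package containerstate

	import "sync"

	// waitCondition stands in for container.WaitCondition in this sketch.
	type waitCondition int

	const (
		waitRunning waitCondition = iota
		waitStopped
		waitRemoved
	)

	type state struct {
		mu      sync.Mutex
		waiters map[waitCondition][]chan struct{}
	}

	// addWaiter registers a channel that is closed when cond is reached.
	func (s *state) addWaiter(cond waitCondition) <-chan struct{} {
		s.mu.Lock()
		defer s.mu.Unlock()
		if s.waiters == nil {
			s.waiters = make(map[waitCondition][]chan struct{})
		}
		c := make(chan struct{})
		s.waiters[cond] = append(s.waiters[cond], c)
		return c
	}

	// notify wakes every waiter registered for cond.
	func (s *state) notify(cond waitCondition) {
		s.mu.Lock()
		defer s.mu.Unlock()
		for _, c := range s.waiters[cond] {
			close(c)
		}
		s.waiters[cond] = nil
	}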

Collaborator

@doringeman doringeman left a comment

LGTM! I think it's fine to remove the container status check as we're already using waitForStandaloneRunnerAfterInstall.

Contributor

@xenoscopic xenoscopic left a comment

I like this idea, but I'm having some trouble getting it working. I have a branch here where I've combined #98 and this PR: main...container-start-cleanup

I had to add a small fixup and clean up some rebase issues, but the main problem that I'm having is that I'm receiving an EOF from ContainerStart:

error during connect: Post "http://<omitted>.sock/v1.48/containers/docker-model-runner/start": EOF

I'll play with it more on Monday, but let me know if that sparks any thoughts.

@xenoscopic
Contributor

I've added another commit onto the tip of main...container-start-cleanup to ignore io.EOF errors from ContainerStart, and that seems to work fine. I don't see any obvious codepaths in the Moby server that could lead to an io.EOF being returned from the call; maybe it's something specific to the Cloud forwarding.

In any case if that branch looks okay to you, I'll replace #98 with it and close this PR.

Member

@p1-0tr p1-0tr left a comment

LGTM

@thaJeztah
Member Author

> I've added another commit onto the tip of main...container-start-cleanup to ignore io.EOF errors from ContainerStart, and that seems to work fine. I don't see any obvious codepaths in the Moby server that could lead to an io.EOF being returned from the call; maybe it's something specific to the Cloud forwarding.
>
> In any case if that branch looks okay to you, I'll replace #98 with it and close this PR.

Yes, feel free to push to your PR branch. This PR was opened against that branch, so merging it would make this commit show up there as well.

This one is interesting though; if you're able to reproduce it, it might be worth capturing daemon logs to see if there's a bug somewhere, as that looks ... unexpected?


		// For some reason, this error case can also manifest as an EOF on the
		// request (I'm not sure where this arises in the Moby server), so we'll
		// let that pass silently too.
		if !(errdefs.IsNotFound(err) || errors.Is(err, io.EOF)) {
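
For reference, a self-contained sketch of how that check wraps the start call (hypothetical helper and package names; not the branch's actual code):

	package startutil

	import (
		"context"
		"errors"
		"io"

		"github.com/docker/docker/api/types/container"
		"github.com/docker/docker/client"
		"github.com/docker/docker/errdefs"
	)

	// startTolerant treats "not found" (the container may not exist yet) and a
	// bare io.EOF on the request as non-fatal when starting the container.
	func startTolerant(ctx context.Context, cli *client.Client, name string) error {
		err := cli.ContainerStart(ctx, name, container.StartOptions{})
		if err != nil && !(errdefs.IsNotFound(err) || errors.Is(err, io.EOF)) {
			return err
		}
		return nil
	}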

@xenoscopic
Contributor

Closing in favor of #107, which incorporates these commits.

I couldn't figure out the source of the io.EOF after a significant amount of digging, but I assume it's probably just a deconfliction/idempotency measure taking effect. It doesn't seem to affect the functionality.

@xenoscopic xenoscopic closed this Jun 26, 2025