fix(scheduling): release lock during runner init to unblock operations by doringeman · Pull Request #760 · docker/model-runner

doringeman · 2026-03-17T13:54:41Z

On top of #759.
Release the loader lock during long-running runner initialization (up to 5 minutes) so that other operations like docker model ps, model releases, and concurrent loads are not blocked, and expose in-flight loads in the ps output with a "Loading..." status.

With this change you can now run docker model ps while a model is loading and see that it's loading.
Terminal 1:

make -C cmd/cli/ install && MODEL_RUNNER_PORT=8080 make run

Terminal 2:

MODEL_RUNNER_HOST=http://localhost:8080 docker model run gpt-oss hi

Terminal 3:

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model ps
MODEL NAME  BACKEND    MODE        UNTIL
gpt-oss     llama.cpp  completion  Loading...
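
The behaviour shown above can be sketched in Go roughly as follows. This is a minimal illustration and not the code in loader.go: the loader type, its fields, and the "ready" status string are hypothetical, and the slow initialization is simulated with channels. The point is only that the lock is held for bookkeeping and released for the slow part, so ps can observe the in-flight load.

```go
package main

import (
	"fmt"
	"sync"
)

// loader is a hypothetical stand-in for the scheduler's loader; the field
// and method names are illustrative, not the ones used in loader.go.
type loader struct {
	mu      sync.Mutex
	loading map[string]bool // models whose runner is still initializing
	ready   map[string]bool // models whose runner finished initializing
}

func newLoader() *loader {
	return &loader{loading: map[string]bool{}, ready: map[string]bool{}}
}

// load records the in-flight load under the lock, releases the lock for the
// slow initialization, then reacquires it to publish the result.
func (l *loader) load(model string, initRunner func() error) error {
	l.mu.Lock()
	l.loading[model] = true
	l.mu.Unlock() // explicit unlock: the lock is not held during init

	err := initRunner()

	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.loading, model)
	if err != nil {
		return err
	}
	l.ready[model] = true
	return nil
}

// ps reports a status per model; it only needs the lock briefly.
func (l *loader) ps() map[string]string {
	l.mu.Lock()
	defer l.mu.Unlock()
	out := map[string]string{}
	for m := range l.loading {
		out[m] = "Loading..."
	}
	for m := range l.ready {
		out[m] = "ready" // placeholder status, assumed for this sketch
	}
	return out
}

func main() {
	l := newLoader()
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan error, 1)
	go func() {
		done <- l.load("gpt-oss", func() error {
			close(started) // init has begun, lock already released
			<-release      // simulate a long-running init
			return nil
		})
	}()
	<-started
	fmt.Println(l.ps()["gpt-oss"]) // Loading...
	close(release)
	<-done
	fmt.Println(l.ps()["gpt-oss"]) // ready
}
```

If the lock were held across initRunner, the first ps call in main would block until the load finished instead of returning the "Loading..." status.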

sourcery-ai

Hey - I've left some high-level feedback:

In load, you still defer l.unlock() after acquiring the lock but then call l.unlock() explicitly before run(...); with the current guard-based lock this will double-unlock and can deadlock the goroutine—remove the defer and manage lock/unlock explicitly around the different phases instead.
When reacquiring the loader lock after run/wait, you use l.lock(context.Background()), which ignores the original call’s cancellation and could hang indefinitely; consider reusing the original ctx (or propagating a derived one) and handling lock acquisition failure so load respects caller cancellation throughout.

gemini-code-assist

Code Review

This pull request refactors the loader logic to release the lock during the potentially long-running runner initialization. This is a good improvement to prevent blocking other operations. The implementation introduces a 'loading' state for runners and exposes this via the ps command. My review found one critical issue with how the lock is re-acquired after being released, which could lead to the service hanging on shutdown or potential race conditions.

doringeman · 2026-03-17T14:01:30Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant improvement by releasing the loader lock during long-running runner initializations, which unblocks other operations. The changes to expose the "Loading..." status in docker model ps are also well-implemented.

However, I've identified a critical race condition in pkg/inference/scheduling/loader.go. Releasing the lock allows multiple goroutines to attempt to load the same model concurrently, as there's no check for in-progress loads. I've provided detailed comments and suggestions to address this issue by tracking in-progress loads and preventing duplicate initializations. Addressing this is crucial for the stability and efficiency of the loader.

doringeman · 2026-03-17T14:09:55Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors the runner loading logic to release the loader lock during the potentially long-running runner initialization process. This is a significant improvement as it prevents blocking other operations like docker model ps. The changes introduce a new loading state to track in-flight initializations, ensuring that slots are reserved and that duplicate loading attempts for the same model are handled gracefully by waiting for the existing one to complete. The implementation, while complex due to manual lock management, appears to be robust and correct. The new Loading... status in the ps output is a great addition for user experience. Overall, this is a well-executed and important fix for the model runner's scheduling.

doringeman mentioned this pull request Mar 17, 2026

fix(scheduling): acquire lock before reading runnerConfigs in load #759

Merged

sourcery-ai Bot reviewed Mar 17, 2026

gemini-code-assist Bot reviewed Mar 17, 2026

Comment thread pkg/inference/scheduling/loader.go

doringeman force-pushed the scheduling-loading branch 2 times, most recently from 24bdd4a to ec84ed8 Compare March 17, 2026 14:00

gemini-code-assist Bot reviewed Mar 17, 2026

Comment thread pkg/inference/scheduling/loader.go

Comment thread pkg/inference/scheduling/loader.go

Comment thread pkg/inference/scheduling/loader.go

fix(scheduling): release lock during runner init and show loading sta…

6e3a208

…te in ps Signed-off-by: Dorin Geman <dorin.geman@docker.com>

doringeman force-pushed the scheduling-loading branch from ec84ed8 to 6e3a208 Compare March 17, 2026 14:09

gemini-code-assist Bot reviewed Mar 17, 2026

ilopezluna approved these changes Mar 17, 2026

doringeman merged commit d62da68 into docker:main Mar 17, 2026
9 checks passed
