
fix(scheduling): release lock during runner init to unblock operations #760

Merged: doringeman merged 1 commit into docker:main from doringeman:scheduling-loading on Mar 17, 2026
Conversation

@doringeman

On top of #759.
Release the loader lock during long-running runner initialization (up to 5 minutes) so that other operations like docker model ps, model releases, and concurrent loads are not blocked, and expose in-flight loads in the ps output with a "Loading..." status.

With this change you can now run docker model ps while a model is loading and see that it's loading.
Terminal 1:

make -C cmd/cli/ install && MODEL_RUNNER_PORT=8080 make run

Terminal 2:

MODEL_RUNNER_HOST=http://localhost:8080 docker model run gpt-oss hi

Terminal 3:

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model ps
MODEL NAME  BACKEND    MODE        UNTIL
gpt-oss     llama.cpp  completion  Loading...
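
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `loader` type, its `loading` and `ready` maps, and the `status` helper are all hypothetical names, and a plain `sync.Mutex` stands in for whatever lock the real loader uses. The point is only that the lock is dropped across the slow initialization, so a `ps`-style query can observe the in-flight load.

```go
package main

import (
	"fmt"
	"sync"
)

// loader is a hypothetical stand-in for the scheduler's loader.
type loader struct {
	mu      sync.Mutex
	loading map[string]bool // models whose runner is still initializing
	ready   map[string]bool // models with a usable runner
}

func newLoader() *loader {
	return &loader{loading: map[string]bool{}, ready: map[string]bool{}}
}

// load records the model as loading, releases the lock for the slow
// initialization, then reacquires it to publish the result.
func (l *loader) load(model string, initRunner func() error) error {
	l.mu.Lock()
	l.loading[model] = true
	l.mu.Unlock() // do not hold the lock across the slow call

	err := initRunner()

	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.loading, model)
	if err == nil {
		l.ready[model] = true
	}
	return err
}

// status is what a ps-like listing could report for a model.
// "Ready" and "Not loaded" are placeholder strings for this sketch.
func (l *loader) status(model string) string {
	l.mu.Lock()
	defer l.mu.Unlock()
	switch {
	case l.loading[model]:
		return "Loading..."
	case l.ready[model]:
		return "Ready"
	default:
		return "Not loaded"
	}
}

func main() {
	l := newLoader()
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan error, 1)
	go func() {
		done <- l.load("gpt-oss", func() error {
			close(started)
			<-release // simulate a long-running runner init
			return nil
		})
	}()
	<-started
	fmt.Println(l.status("gpt-oss")) // queried while the load is in flight
	close(release)
	<-done
	fmt.Println(l.status("gpt-oss"))
}
```

Because `status` only needs the lock briefly, it returns promptly even while `initRunner` is still blocked.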


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • In load, you still defer l.unlock() after acquiring the lock but then call l.unlock() explicitly before run(...); with the current guard-based lock this will double-unlock and can deadlock the goroutine—remove the defer and manage lock/unlock explicitly around the different phases instead.
  • When reacquiring the loader lock after run/wait, you use l.lock(context.Background()), which ignores the original call’s cancellation and could hang indefinitely; consider reusing the original ctx (or propagating a derived one) and handling lock acquisition failure so load respects caller cancellation throughout.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the loader logic to release the lock during the potentially long-running runner initialization. This is a good improvement to prevent blocking other operations. The implementation introduces a 'loading' state for runners and exposes this via the ps command. My review found one critical issue with how the lock is re-acquired after being released, which could lead to the service hanging on shutdown or potential race conditions.

Comment thread pkg/inference/scheduling/loader.go
@doringeman force-pushed the scheduling-loading branch 2 times, most recently from 24bdd4a to ec84ed8, on March 17, 2026 14:00
@doringeman

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant improvement by releasing the loader lock during long-running runner initializations, which unblocks other operations. The changes to expose the "Loading..." status in docker model ps are also well-implemented.

However, I've identified a critical race condition in pkg/inference/scheduling/loader.go. Releasing the lock allows multiple goroutines to attempt to load the same model concurrently, as there's no check for in-progress loads. I've provided detailed comments and suggestions to address this issue by tracking in-progress loads and preventing duplicate initializations. Addressing this is crucial for the stability and efficiency of the loader.

Comment thread pkg/inference/scheduling/loader.go
Comment thread pkg/inference/scheduling/loader.go
Comment thread pkg/inference/scheduling/loader.go
…te in ps

Signed-off-by: Dorin Geman <dorin.geman@docker.com>
@doringeman

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the runner loading logic to release the loader lock during the potentially long-running runner initialization process. This is a significant improvement as it prevents blocking other operations like docker model ps. The changes introduce a new loading state to track in-flight initializations, ensuring that slots are reserved and that duplicate loading attempts for the same model are handled gracefully by waiting for the existing one to complete. The implementation, while complex due to manual lock management, appears to be robust and correct. The new Loading... status in the ps output is a great addition for user experience. Overall, this is a well-executed and important fix for the model runner's scheduling.

@doringeman doringeman merged commit d62da68 into docker:main Mar 17, 2026
9 checks passed