
Configure inference backend via compose up #79


Merged
merged 1 commit into docker:main from compose-llama-args on Jun 13, 2025

Conversation

doringeman
Collaborator

Added 3 new options: context-size, runtime-flags, and backend (which defaults to llama.cpp).
Also bumped docker/model-runner to docker/model-runner@6cf3f98.

This PR is for docker/model-runner#76.
You can test this by running this PR ⬆️ in one terminal with:

$ MODEL_RUNNER_PORT=8080 make run

And in a second terminal, build and test this PR with the following compose file:

services:
  model1:
    provider:
      type: model
      options:
        model: ai/smollm2
        context-size: 8192
        runtime-flags: "--no-prefill-assistant"
  model2:
    provider:
      type: model
      options:
        model: ai/llama3.2
        context-size: 1024
$ make install
$ MODEL_RUNNER_HOST=http://localhost:8080 docker compose --progress plain up
model1  Creating
model2  Creating
model2  Initializing model runner...
model2  Setting context size to 1024
model1  Initializing model runner...
model1  Setting context size to 8192
model1  Setting raw runtime flags to --no-prefill-assistant
model2  Successfully configured backend for model ai/llama3.2
model2  Created
model1  Successfully configured backend for model ai/smollm2
model1  Created
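
Note that the compose file above only exercises context-size and runtime-flags; backend is left at its default (llama.cpp). A minimal sketch of setting it explicitly, assuming the option simply takes the backend name as a string:

services:
  model3:
    provider:
      type: model
      options:
        model: ai/smollm2
        backend: llama.cpp   # assumed value; llama.cpp is the stated default
        context-size: 4096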

@doringeman requested a review from p1-0tr on June 11, 2025, 14:56
Contributor

@xenoscopic left a comment


LGTM. Do you know what's driving the large vendor directory change? Is it avoidable?

@@ -542,6 +543,27 @@ func (c *Client) Unload(req UnloadRequest) (UnloadResponse, error) {
return unloadResp, nil
}

func (c *Client) ConfigureBackend(request scheduling.ConfigureRequest) error {
Contributor


Looks fine, just change paths and status if we decide to apply comments in docker/model-runner#76.
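
For illustration, a rough standalone sketch of the shape such a client call could take, assuming a JSON POST to a placeholder /engines/configure path with made-up request fields; the real method takes a scheduling.ConfigureRequest, and its paths and status handling may still change per docker/model-runner#76 (see the comment above).

// Hypothetical sketch only: the endpoint path, request fields, and wiring
// are placeholders, not the actual docker/model-runner client API.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// configureRequest mirrors the kind of options this PR adds (field names are assumptions).
type configureRequest struct {
	Model        string `json:"model"`
	ContextSize  int64  `json:"context-size,omitempty"`
	RuntimeFlags string `json:"runtime-flags,omitempty"`
}

// configureBackend marshals the request and POSTs it to the model runner,
// treating any non-200 status as an error.
func configureBackend(baseURL string, req configureRequest) error {
	body, err := json.Marshal(req)
	if err != nil {
		return fmt.Errorf("marshal configure request: %w", err)
	}
	resp, err := http.Post(baseURL+"/engines/configure", "application/json", bytes.NewReader(body))
	if err != nil {
		return fmt.Errorf("send configure request: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("configure backend: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	host := os.Getenv("MODEL_RUNNER_HOST")
	if host == "" {
		host = "http://localhost:8080"
	}
	if err := configureBackend(host, configureRequest{
		Model:        "ai/smollm2",
		ContextSize:  8192,
		RuntimeFlags: "--no-prefill-assistant",
	}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}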

@doringeman
Collaborator Author

Do you know what's driving the large vendor directory change? Is it avoidable?

It's due to the model-runner bump, which brought in containerd (v2).

@doringeman force-pushed the compose-llama-args branch 3 times, most recently from 2d1eb45 to 6195350, on June 12, 2025, 13:23
@doringeman force-pushed the compose-llama-args branch 2 times, most recently from 652c201 to 2bbda99, on June 13, 2025, 08:33
Added 3 new options: context-size, runtime-flags, and backend (which defaults to llama.cpp). Also bumped docker/model-runner to docker/model-runner@6cf3f98.

Signed-off-by: Dorin Geman <dorin.geman@docker.com>
@doringeman force-pushed the compose-llama-args branch from 2bbda99 to 5cfc9d3 on June 13, 2025, 08:38
@doringeman marked this pull request as ready for review on June 13, 2025, 08:39
@doringeman merged commit f7ab896 into docker:main on Jun 13, 2025
3 checks passed