
Configure inference backend via compose up #79


Merged
merged 1 commit into docker:main from compose-llama-args on Jun 13, 2025

Conversation

doringeman
Collaborator

Added 3 new options: context-size, runtime-flags, and backend (which defaults to llama.cpp).
Also bumped docker/model-runner to docker/model-runner@6cf3f98.

This PR is for docker/model-runner#76.
You can test this by running this PR ⬆️ in one terminal with:

$ MODEL_RUNNER_PORT=8080 make run

And in a second terminal, build and test this PR with the following compose file:

services:
  model1:
    provider:
      type: model
      options:
        model: ai/smollm2
        context-size: 8192
        runtime-flags: "--no-prefill-assistant"
  model2:
    provider:
      type: model
      options:
        model: ai/llama3.2
        context-size: 1024
$ make install
$ MODEL_RUNNER_HOST=http://localhost:8080 docker compose --progress plain up
model1  Creating
model2  Creating
model2  Initializing model runner...
model2  Setting context size to 1024
model1  Initializing model runner...
model1  Setting context size to 8192
model1  Setting raw runtime flags to --no-prefill-assistant
model2  Successfully configured backend for model ai/llama3.2
model2  Created
model1  Successfully configured backend for model ai/smollm2
model1  Created
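
Note that the compose file above only exercises context-size and runtime-flags; backend is left at its default (llama.cpp). A minimal sketch of setting it explicitly, assuming the option simply takes the backend name as a string:

services:
  model3:
    provider:
      type: model
      options:
        model: ai/smollm2
        backend: llama.cpp   # assumed value; llama.cpp is the stated default
        context-size: 4096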

@doringeman requested a review from p1-0tr on June 11, 2025, 14:56
Contributor

@xenoscopic left a comment


LGTM. Do you know what's driving the large vendor directory change? Is it avoidable?

@@ -542,6 +543,27 @@ func (c *Client) Unload(req UnloadRequest) (UnloadResponse, error) {
return unloadResp, nil
}

func (c *Client) ConfigureBackend(request scheduling.ConfigureRequest) error {
Contributor


Looks fine, just change paths and status if we decide to apply comments in docker/model-runner#76.
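
For illustration, a rough standalone sketch of the shape such a client call could take, assuming a JSON POST to a placeholder /engines/configure path with made-up request fields; the real method takes a scheduling.ConfigureRequest, and its paths and status handling may still change per docker/model-runner#76 (see the comment above).

// Hypothetical sketch only: the endpoint path, request fields, and wiring
// are placeholders, not the actual docker/model-runner client API.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// configureRequest mirrors the kind of options this PR adds (field names are assumptions).
type configureRequest struct {
	Model        string `json:"model"`
	ContextSize  int64  `json:"context-size,omitempty"`
	RuntimeFlags string `json:"runtime-flags,omitempty"`
}

// configureBackend marshals the request and POSTs it to the model runner,
// treating any non-200 status as an error.
func configureBackend(baseURL string, req configureRequest) error {
	body, err := json.Marshal(req)
	if err != nil {
		return fmt.Errorf("marshal configure request: %w", err)
	}
	resp, err := http.Post(baseURL+"/engines/configure", "application/json", bytes.NewReader(body))
	if err != nil {
		return fmt.Errorf("send configure request: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("configure backend: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	host := os.Getenv("MODEL_RUNNER_HOST")
	if host == "" {
		host = "http://localhost:8080"
	}
	if err := configureBackend(host, configureRequest{
		Model:        "ai/smollm2",
		ContextSize:  8192,
		RuntimeFlags: "--no-prefill-assistant",
	}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}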

@doringeman
Collaborator Author

Do you know what's driving the large vendor directory change? Is it avoidable?

It's due to the model-runner bump, which brought in containerd (v2).

@doringeman force-pushed the compose-llama-args branch 3 times, most recently from 2d1eb45 to 6195350, on June 12, 2025, 13:23
@doringeman force-pushed the compose-llama-args branch 2 times, most recently from 652c201 to 2bbda99, on June 13, 2025, 08:33
Added 3 new options: context-size, runtime-flags, and backend (which defaults to llama.cpp). Also bumped docker/model-runner to docker/model-runner@6cf3f98.

Signed-off-by: Dorin Geman <dorin.geman@docker.com>
@doringeman force-pushed the compose-llama-args branch from 2bbda99 to 5cfc9d3 on June 13, 2025, 08:38
@doringeman marked this pull request as ready for review on June 13, 2025, 08:39
@doringeman merged commit f7ab896 into docker:main on Jun 13, 2025
3 checks passed