feat(scheduler): add endpoint to retrieve model configurations #520

doringeman · 2025-12-17T09:06:19Z

Add a GET /engines/_configure endpoint that returns active model configurations, similar to Ollama's /api/show endpoint. It accepts an optional "model" query parameter to filter configs for a specific model and returns all configs if no model is specified.

E.g.,

Run:

```sh
DEBUG=1 MODEL_RUNNER_PORT=8080 make run LOCAL_LLAMA=1
```

Install the CLI:

```sh
make -C cmd/cli install
```

Test:

```console
$ curl http://localhost:8080/engines/_configure
[]

$ curl http://localhost:8080/engines/_configure\?model\=smollm2
[]

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model configure --context-size 2048 smollm2

$ curl http://localhost:8080/engines/_configure
[{"Backend":"llama.cpp","Model":"ai/smollm2:latest","ModelID":"sha256:354bf30d0aa3af413d2aa5ae4f23c66d78980072d1e07a5b0d776e9606a2f0b9","Mode":"completion","Config":{"context-size":2048}}]

$ curl http://localhost:8080/engines/_configure\?model\=smollm2
[{"Backend":"llama.cpp","Model":"ai/smollm2:latest","ModelID":"sha256:354bf30d0aa3af413d2aa5ae4f23c66d78980072d1e07a5b0d776e9606a2f0b9","Mode":"completion","Config":{"context-size":2048}}]

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model configure --context-size 1024 llama3.2

$ curl http://localhost:8080/engines/_configure
[{"Backend":"llama.cpp","Model":"ai/smollm2:latest","ModelID":"sha256:354bf30d0aa3af413d2aa5ae4f23c66d78980072d1e07a5b0d776e9606a2f0b9","Mode":"completion","Config":{"context-size":2048}},{"Backend":"llama.cpp","Model":"ai/llama3.2:latest","ModelID":"sha256:436bb282b41968a83638482999980267ca8d7e8b5574604460efa9efff11cf59","Mode":"completion","Config":{"context-size":1024}}]

$ curl http://localhost:8080/engines/_configure\?model\=llama3.2
[{"Backend":"llama.cpp","Model":"ai/llama3.2:latest","ModelID":"sha256:436bb282b41968a83638482999980267ca8d7e8b5574604460efa9efff11cf59","Mode":"completion","Config":{"context-size":1024}}]
```
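The curl calls above could be mirrored by a small Go client. This is a minimal sketch, not code from this PR: the `ModelConfig` struct is inferred from the sample responses, and `configureURL`, `parseConfigs`, and `fetchConfigs` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// ModelConfig mirrors the fields seen in the sample responses. It is an
// assumption for illustration, not the project's actual type.
type ModelConfig struct {
	Backend string
	Model   string
	ModelID string
	Mode    string
	Config  map[string]any
}

// configureURL builds the endpoint URL and adds ?model=... only when a
// model name is given.
func configureURL(base, model string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/engines/_configure"
	if model != "" {
		u.RawQuery = url.Values{"model": {model}}.Encode()
	}
	return u.String(), nil
}

// parseConfigs decodes a response body into a slice of ModelConfig.
func parseConfigs(body []byte) ([]ModelConfig, error) {
	var cfgs []ModelConfig
	if err := json.Unmarshal(body, &cfgs); err != nil {
		return nil, err
	}
	return cfgs, nil
}

// fetchConfigs performs the GET request against a running model runner.
func fetchConfigs(base, model string) ([]ModelConfig, error) {
	endpoint, err := configureURL(base, model)
	if err != nil {
		return nil, err
	}
	resp, err := http.Get(endpoint)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseConfigs(body)
}

func main() {
	// Offline demonstration: build the URL and decode one sample response.
	u, _ := configureURL("http://localhost:8080", "smollm2")
	fmt.Println(u)

	sample := []byte(`[{"Backend":"llama.cpp","Model":"ai/smollm2:latest","ModelID":"sha256:354bf30d0aa3af413d2aa5ae4f23c66d78980072d1e07a5b0d776e9606a2f0b9","Mode":"completion","Config":{"context-size":2048}}]`)
	cfgs, err := parseConfigs(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, c := range cfgs {
		fmt.Printf("%s (%s, %s): %v\n", c.Model, c.Backend, c.Mode, c.Config)
	}
}
```

With a runner listening locally, `fetchConfigs("http://localhost:8080", "")` would request all configurations, and passing a model name would add the `model` query parameter.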

```console
$ docker model config --help
Usage:  docker model configure [OPTIONS] COMMAND

Manage model runtime configurations

Aliases:
  docker model configure, docker model config

Options:
      --context-size int32                      context size (in tokens)
      --gpu-memory-utilization float            fraction of GPU memory to use for the model executor (0.0-1.0) - vLLM only
      --hf_overrides string                     HuggingFace model config overrides (JSON) - vLLM only
      --mode string                             backend operation mode (completion, embedding, reranking)
      --speculative-draft-model string          draft model for speculative decoding
      --speculative-min-acceptance-rate float   minimum acceptance rate for speculative decoding
      --speculative-num-tokens int              number of tokens to predict speculatively
      --think                                   enable reasoning mode for thinking models

Commands:
  show        Show model configurations

Run 'docker model configure COMMAND --help' for more information on a command.

$ docker model config show --help
Usage:  docker model configure show [MODEL]

Show model configurations

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model config show
[]

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model config show smollm2
[]

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model config --context-size 2048 smollm2

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model config show smollm2
[
  {
    "Backend": "llama.cpp",
    "Model": "ai/smollm2:latest",
    "ModelID": "sha256:354bf30d0aa3af413d2aa5ae4f23c66d78980072d1e07a5b0d776e9606a2f0b9",
    "Mode": "completion",
    "Config": {
      "context-size": 2048
    }
  }
]
```
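On the server side, the behavior described at the top (an optional `model` query parameter, all configs when it is absent, `[]` when nothing matches) could be sketched roughly as follows. This is an illustration under assumptions, not the PR's handler: `modelConfig`, `filterByModel`, and `configsHandler` are hypothetical names, and the sketch matches model names exactly, whereas the real endpoint evidently resolves short names such as `smollm2` to `ai/smollm2:latest`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// modelConfig is a simplified, hypothetical stand-in for one response item.
type modelConfig struct {
	Backend string
	Model   string
	Config  map[string]int
}

// filterByModel returns every config when model is empty, otherwise only
// exact matches. The result is never nil, so it encodes as [] rather than null.
func filterByModel(all []modelConfig, model string) []modelConfig {
	out := make([]modelConfig, 0, len(all))
	for _, c := range all {
		if model == "" || c.Model == model {
			out = append(out, c)
		}
	}
	return out
}

// configsHandler reads the optional "model" query parameter and serves the
// filtered list as JSON.
func configsHandler(all []modelConfig) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		model := r.URL.Query().Get("model")
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(filterByModel(all, model)); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	}
}

func main() {
	all := []modelConfig{
		{Backend: "llama.cpp", Model: "ai/smollm2:latest", Config: map[string]int{"context-size": 2048}},
		{Backend: "llama.cpp", Model: "ai/llama3.2:latest", Config: map[string]int{"context-size": 1024}},
	}
	h := configsHandler(all)

	// Exercise the handler in memory: once unfiltered, once filtered.
	targets := []string{
		"/engines/_configure",
		"/engines/_configure?model=ai/llama3.2:latest",
	}
	for _, target := range targets {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
		fmt.Print(rec.Body.String())
	}
}
```

Building a fresh, non-nil slice in `filterByModel` is a deliberate choice in this sketch so that an empty result serializes as `[]`, matching the empty responses shown in the session above.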

gemini-code-assist · 2025-12-17T09:06:33Z

Summary of Changes

Hello @doringeman, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new API endpoint, POST /engines/_configs, which allows users to query and retrieve details about currently active model configurations. This functionality is similar to Ollama's /api/show endpoint, providing a standardized way to inspect model settings, with an option to filter results by a specific model. The changes involve adding the HTTP handler, integrating with the scheduler's internal configuration loader, and enhancing the BackendMode enum for correct JSON handling.

Highlights

New API Endpoint: A new POST /engines/_configs endpoint has been added to allow retrieval of active model configurations. This endpoint can return all configurations or filter them by a specific model name provided in the request body.
Model Configuration Retrieval Logic: New internal logic has been implemented to gather and format active model configurations, including details like backend, model name, ID, mode, and specific configuration parameters.
JSON Serialization for BackendMode: The BackendMode enum now includes custom MarshalJSON and UnmarshalJSON implementations, ensuring proper serialization and deserialization to/from JSON strings.
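
The effect of custom JSON methods on an enum can be sketched as below. This is not the PR's code: `Mode`, `modeNames`, and `parseMode` are hypothetical stand-ins for `BackendMode` and its helpers, the integer-based representation is an assumption, and the three mode names are taken from the `--mode` help text in the PR description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Mode is a hypothetical stand-in for the project's BackendMode type.
type Mode int

const (
	ModeCompletion Mode = iota
	ModeEmbedding
	ModeReranking
)

// modeNames maps each Mode value to its JSON string form.
var modeNames = []string{"completion", "embedding", "reranking"}

// String returns the textual name, or "unknown" for out-of-range values.
func (m Mode) String() string {
	if m < 0 || int(m) >= len(modeNames) {
		return "unknown"
	}
	return modeNames[m]
}

// parseMode converts a name back to a Mode and reports whether it was known.
func parseMode(s string) (Mode, bool) {
	for i, name := range modeNames {
		if name == s {
			return Mode(i), true
		}
	}
	return 0, false
}

// MarshalJSON emits the mode as a JSON string instead of a bare integer.
func (m Mode) MarshalJSON() ([]byte, error) {
	return json.Marshal(m.String())
}

// UnmarshalJSON accepts only JSON strings that name a known mode.
func (m *Mode) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	parsed, ok := parseMode(s)
	if !ok {
		return fmt.Errorf("unknown mode: %q", s)
	}
	*m = parsed
	return nil
}

func main() {
	// Encoding: the field is written as a string name.
	out, err := json.Marshal(struct{ Mode Mode }{ModeEmbedding})
	fmt.Println(string(out), err)

	// Decoding: the string name is mapped back to the enum value.
	var in struct{ Mode Mode }
	err = json.Unmarshal([]byte(`{"Mode":"reranking"}`), &in)
	fmt.Println(in.Mode, err)
}
```

Without the two methods, an integer-based type like this one would be encoded by `encoding/json` as a number, which would not produce the `"Mode":"completion"` form seen in the endpoint's responses.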

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new endpoint, POST /engines/_configs, to retrieve active model configurations, which is a valuable addition for observability. The implementation is clear and follows existing patterns. My review includes a few suggestions to enhance robustness and maintainability. Specifically, I've pointed out a potential issue in JSON unmarshaling that could silently accept invalid data, a minor performance optimization for slice filtering, and a recommendation to add explicit JSON tags to a struct for better long-term maintenance. Overall, the changes are well-implemented.

pkg/inference/backend.go

pkg/inference/scheduling/http_handler.go

pkg/inference/scheduling/loader.go

pkg/inference/scheduling/scheduler.go

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

- The comment on `GetModelConfigs` mentions reading the model from the request body, but the implementation uses a `model` query parameter; updating the comment to match the actual API will avoid confusion for future maintainers.
- The custom JSON methods on `BackendMode` could be made more robust and conventional by using `json.Marshal(m.String())` and `strconv.Unquote` (or `json.Unmarshal` into a string) instead of manual string concatenation and `strings.Trim`, which will handle escaping and edge cases correctly.
- In `getAllRunnerConfigs`, entries are silently skipped when `GetLocal` fails; consider at least logging the error or otherwise surfacing it so that misconfigured or missing models are easier to diagnose when the new endpoint returns fewer configs than expected.
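
The difference between trimming quotes by hand and delegating to `encoding/json` can be seen on a few raw inputs. This is a small illustrative sketch: `trimUnquote` and `jsonUnquote` are hypothetical helpers written for this comparison, not functions from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// trimUnquote imitates the manual approach: strip surrounding double quotes
// and treat whatever remains as the string value.
func trimUnquote(data []byte) string {
	return strings.Trim(string(data), "\"")
}

// jsonUnquote delegates to encoding/json, which decodes escape sequences
// and returns an error when the value is not a JSON string.
func jsonUnquote(data []byte) (string, error) {
	var s string
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	inputs := []string{
		`"completion"`,      // plain JSON string
		`"\u0063ompletion"`, // same string, first letter written as a JSON escape
		`42`,                // valid JSON, but not a string
	}
	for _, in := range inputs {
		s, err := jsonUnquote([]byte(in))
		fmt.Printf("input=%s trim=%q json=%q err=%v\n", in, trimUnquote([]byte(in)), s, err)
	}
}
```

With `strings.Trim`, the escaped input keeps its literal backslash sequence and `42` passes through as the text `42`; with `json.Unmarshal`, the escape is decoded to `completion` and the number is rejected with an error. Whether a later parsing step would still catch the trimmed values depends on that step's own validation.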

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The comment on `GetModelConfigs` mentions reading the model from the request body, but the implementation uses a `model` query parameter; updating the comment to match the actual API will avoid confusion for future maintainers.
- The custom JSON methods on `BackendMode` could be made more robust and conventional by using `json.Marshal(m.String())` and `strconv.Unquote` (or `json.Unmarshal` into a string) instead of manual string concatenation and `strings.Trim`, which will handle escaping and edge cases correctly.
- In `getAllRunnerConfigs`, entries are silently skipped when `GetLocal` fails; consider at least logging the error or otherwise surfacing it so that misconfigured or missing models are easier to diagnose when the new endpoint returns fewer configs than expected.

## Individual Comments

### Comment 1
<location> `pkg/inference/scheduling/http_handler.go:354-356` </location>
<code_context>
 	w.WriteHeader(http.StatusAccepted)
 }

+// GetModelConfigs returns model configurations. If a model is specified in the request body,
+// returns only configs for that model; otherwise returns all configs.
+func (h *HTTPHandler) GetModelConfigs(w http.ResponseWriter, r *http.Request) {
+	model := r.URL.Query().Get("model")
+
</code_context>

<issue_to_address>
**nitpick:** Comment mentions request body while implementation uses a query parameter

The comment refers to the request body, but the code reads `model` from `r.URL.Query().Get("model")`. Please update the comment to match the actual behavior, e.g. `If a model is specified via the "model" query parameter`.
</issue_to_address>

### Comment 2
<location> `pkg/inference/backend.go:51-52` </location>
<code_context>
+}
+
+// UnmarshalJSON implements json.Unmarshaler for BackendMode.
+func (m *BackendMode) UnmarshalJSON(data []byte) error {
+	s := strings.Trim(string(data), "\"")
+	mode, ok := ParseBackendMode(s)
+	if !ok {
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Unmarshal logic should parse JSON properly instead of trimming quotes manually

Using `strings.Trim` on the raw JSON assumes a simple quoted string and will misbehave with escaping, invalid JSON, or non-string values. Prefer delegating parsing to the JSON package by unmarshaling into a temporary string first, then converting:

```go
func (m *BackendMode) UnmarshalJSON(data []byte) error {
    var s string
    if err := json.Unmarshal(data, &s); err != nil {
        return err
    }
    mode, ok := ParseBackendMode(s)
    if !ok {
        return fmt.Errorf("unknown backend mode: %q", s)
    }
    *m = mode
    return nil
}
```

Suggested implementation:

```golang
// UnmarshalJSON implements json.Unmarshaler for BackendMode.
func (m *BackendMode) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}

	mode, ok := ParseBackendMode(s)
	if !ok {
		return fmt.Errorf("unknown backend mode: %q", s)
	}
	*m = mode
	return nil
}

```

1. In `pkg/inference/backend.go`, ensure the imports include:
   ```go
   import (
       "encoding/json"
       "fmt"
       // ...
   )
   ```
   If `strings` is no longer used anywhere in this file, remove it from the imports to avoid an unused import error.
</issue_to_address>


pkg/inference/scheduling/http_handler.go

pkg/inference/backend.go

Add GET /engines/_configure endpoint that returns active model configurations, similar to Ollama's /api/show endpoint. Accepts optional "model" query parameter to filter configs for a specific model; returns all configs if no model specified.

Signed-off-by: Dorin Geman <dorin.geman@docker.com>

gemini-code-assist bot reviewed Dec 17, 2025

pkg/inference/backend.go (resolved)

pkg/inference/scheduling/http_handler.go (outdated, resolved)

pkg/inference/scheduling/loader.go (outdated, resolved)

doringeman force-pushed the list-configs branch 2 times, most recently from 87897ed to e19011d, December 17, 2025 09:35

ilopezluna approved these changes Dec 17, 2025

pkg/inference/scheduling/scheduler.go (outdated, resolved)

doringeman force-pushed the list-configs branch from e19011d to b0b8ad2, December 17, 2025 10:49

doringeman requested a review from ilopezluna December 17, 2025 10:49

doringeman marked this pull request as ready for review December 17, 2025 10:50

sourcery-ai bot reviewed Dec 17, 2025

pkg/inference/scheduling/http_handler.go (outdated, resolved)

pkg/inference/backend.go (outdated, resolved)

ilopezluna approved these changes Dec 17, 2025

doringeman force-pushed the list-configs branch from b0b8ad2 to e0a47ea, December 17, 2025 10:59

feat(cli): add configure show subcommand

4830663

Add `docker model configure show [MODEL]` to display active model configurations. Also alias `configure` to `config` for convenience.

Signed-off-by: Dorin Geman <dorin.geman@docker.com>

doringeman force-pushed the list-configs branch from e0a47ea to 4830663, December 17, 2025 11:00

doringeman merged commit 74d69c8 into docker:main Dec 17, 2025
6 of 9 checks passed
