Add model-cli gateway #812
Conversation
Force-pushed 5369ad7 to cb7f667
Hey - I've found 3 issues and left some high-level feedback:
- The retry and fallback logic in `Router::chat_completion`, `chat_completion_stream`, and `embedding` is nearly identical; consider extracting a shared helper (e.g. a generic `run_with_retries_and_fallbacks`) to reduce duplication and keep behavior consistent across request types.
- In `build_app`, the master key is inserted into request extensions only to be immediately re-read in `auth::auth_middleware`; you can simplify the auth middleware by capturing the `master_key` directly in the closure and passing it to the auth function instead of going through extensions.
- The streaming branch in `chat_completion_handler` re-wraps the provider `ByteStream` into an unbounded MPSC channel and `UnboundedReceiverStream`; you can simplify and reduce overhead by mapping the `ByteStream` directly into an `axum::body::Body::from_stream` without spawning an extra task and channel.
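Two of these points lend themselves to quick sketches. First, a minimal shape for the shared retry/fallback helper (hypothetical names: `Deployment` and an `AppError::NoDeployments` variant; assumes `Deployment: Clone` so the closure can own its argument):

```rust
async fn run_with_retries_and_fallbacks<T, F, Fut>(
    deployments: &[Deployment],
    max_attempts: usize,
    mut call: F,
) -> Result<T, AppError>
where
    F: FnMut(Deployment) -> Fut,
    Fut: std::future::Future<Output = Result<T, AppError>>,
{
    if deployments.is_empty() {
        return Err(AppError::NoDeployments); // hypothetical error variant
    }
    let mut last_err = None;
    for attempt in 0..max_attempts {
        // Round-robin across deployments so each retry falls back to the next one.
        let dep = deployments[attempt % deployments.len()].clone();
        match call(dep).await {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.unwrap_or(AppError::NoDeployments))
}
```

Second, for the streaming branch, axum 0.7's `Body::from_stream` can wrap the provider stream directly, with no extra task or channel (assuming the `ByteStream` items are `Result<Bytes, E>` with `E: Into<BoxError>`):

```rust
// `byte_stream` stands in for the provider `ByteStream`.
let body = axum::body::Body::from_stream(byte_stream);
let response = axum::response::Response::builder()
    .header("content-type", "text/event-stream")
    .body(body)
    .expect("static headers are valid");
```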
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The retry and fallback logic in `Router::chat_completion`, `chat_completion_stream`, and `embedding` is nearly identical; consider extracting a shared helper (e.g. a generic `run_with_retries_and_fallbacks`) to reduce duplication and keep behavior consistent across request types.
- In `build_app`, the master key is inserted into request extensions only to be immediately re-read in `auth::auth_middleware`; you can simplify the auth middleware by capturing the `master_key` directly in the closure and passing it to the auth function instead of going through extensions.
- The streaming branch in `chat_completion_handler` re-wraps the provider `ByteStream` into an unbounded MPSC channel and `UnboundedReceiverStream`; you can simplify and reduce overhead by mapping the `ByteStream` directly into an `axum::body::Body::from_stream` without spawning an extra task and channel.
## Individual Comments
### Comment 1
<location path="model-cli/src/providers/anthropic.rs" line_range="362-363" />
<code_context>
+ buffer = buffer[event_end + 2..].to_string();
+
+ let mut event_data = String::new();
+ for line in event_text.lines() {
+ if let Some(data) = line.strip_prefix("data: ") {
+ event_data = data.to_string();
+ }
</code_context>
<issue_to_address>
**issue:** SSE parsing only keeps the last `data:` line per event, which may drop data for multi-line events.
Per the SSE spec, multiple `data:` lines in one event should be concatenated with `\n`. Here, `event_data` is overwritten on each `data:` line, so any earlier lines in a multi-line event would be lost. Please accumulate all `data:` lines (e.g., into a `Vec<String>` and `join('\n')`) so multi-line events are handled correctly.
</issue_to_address>
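Per that suggestion, the accumulation could look like this (a minimal sketch; `event_text` and `event_data` mirror the variables in the snippet above):

```rust
let mut data_lines: Vec<&str> = Vec::new();
for line in event_text.lines() {
    if let Some(rest) = line.strip_prefix("data:") {
        // The SSE spec strips at most one leading space after the colon.
        data_lines.push(rest.strip_prefix(' ').unwrap_or(rest));
    }
}
// Multi-line events are joined with '\n', as the spec requires.
let event_data = data_lines.join("\n");
```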
### Comment 2
<location path="model-cli/src/router.rs" line_range="130" />
<code_context>
+
+ let dep_count = self.deployments.get(model).map_or(0, |v| v.len());
+
+ for attempt in 0..max_attempts.max(dep_count) {
+ if let Some(idx) = self.next_index(model) {
+ let dep = &self.deployments[model][idx];
</code_context>
<issue_to_address>
**question (bug_risk):** The attempt loop bounds may exceed the configured retry count when multiple deployments exist.
Because the loop uses `max_attempts.max(dep_count)` with `max_attempts = num_retries + 1`, multiple deployments can cause more total attempts than `num_retries` suggests (e.g., 3 deployments, `num_retries = 0` → 3 attempts). If `num_retries` is intended as a hard cap on attempts, consider `min(max_attempts, dep_count)` instead, or clarify in docs that each deployment is tried at least once even if that exceeds `num_retries`.
</issue_to_address>
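For reference, the hard-cap variant named above is a one-line change (sketch only; `max_attempts` and `dep_count` are the variables from the snippet):

```rust
// With num_retries = 0 (max_attempts = 1) and 3 deployments, this makes
// exactly 1 attempt instead of 3 under `max_attempts.max(dep_count)`.
for attempt in 0..max_attempts.min(dep_count) {
    // ... same body as the existing loop ...
}
```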
### Comment 3
<location path="model-cli/src/handlers.rs" line_range="99-103" />
<code_context>
+
+// ── Embeddings ──
+
+pub async fn embeddings_handler(
+ State(state): State<Arc<AppState>>,
+ Json(request): Json<EmbeddingRequest>,
+) -> Result<impl IntoResponse, AppError> {
+ let response = state.router.embedding(&request).await?;
+ Ok(Json(response))
+}
</code_context>
<issue_to_address>
**nitpick:** Embeddings responses don’t include the `x-model-cli-version` header like chat completions do.
Here the JSON is returned directly, so the version header isn’t added. For consistency with non-streaming chat completions and to support client-side introspection, consider wrapping the embeddings result in a `Response` and setting the same `x-model-cli-version` header.
</issue_to_address>
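A sketch of the suggested change (assumes the chat handler sets the same `x-model-cli-version` header; deriving the value from `CARGO_PKG_VERSION` is illustrative, not the PR's actual source):

```rust
pub async fn embeddings_handler(
    State(state): State<Arc<AppState>>,
    Json(request): Json<EmbeddingRequest>,
) -> Result<impl IntoResponse, AppError> {
    let response = state.router.embedding(&request).await?;
    // Wrap the JSON in a full Response so the version header can be attached.
    let mut resp = Json(response).into_response();
    resp.headers_mut().insert(
        "x-model-cli-version",
        axum::http::HeaderValue::from_static(env!("CARGO_PKG_VERSION")),
    );
    Ok(resp)
}
```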
doringeman left a comment
LGTM! WDYT of adding some Makefile rules for it?
Warning: Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting `/gemini review`.
Code Review
This pull request introduces model-cli, a Rust-based LLM gateway featuring provider routing, load balancing, and fallback mechanisms. Feedback highlights critical security concerns regarding timing attacks in API key validation and permissive CORS configurations. Other recommendations focus on improving performance and maintainability through optimized dependency features, simplified stream processing, and more idiomatic Axum middleware patterns.
```rust
fn build_app(state: Arc<AppState>) -> AxumRouter {
    let master_key = state.master_key.clone();

    // Middleware that injects the master key into request extensions for auth checking.
    let auth_layer = axum::middleware::from_fn(
        move |mut request: axum::extract::Request, next: axum::middleware::Next| {
            let key = master_key.clone();
            async move {
                request.extensions_mut().insert(key);
                auth::auth_middleware(request, next).await
            }
        },
    );
```
The method used to pass the `master_key` to the authentication middleware is a bit of a workaround. A more idiomatic axum approach is to use `middleware::from_fn_with_state`, which passes state directly to the middleware function. This would make the `auth_middleware` function cleaner, as it wouldn't need to extract the key from request extensions.

This would require changing the signature of `auth_middleware` in `src/auth.rs` to:

```rust
pub async fn auth_middleware(
    axum::extract::State(master_key): axum::extract::State<Option<String>>,
    request: axum::extract::Request,
    next: axum::middleware::Next,
) -> Result<Response, AppError>
```

You can then use `master_key` directly instead of extracting it from extensions.
Suggested change:

```diff
 fn build_app(state: Arc<AppState>) -> AxumRouter {
-    let master_key = state.master_key.clone();
-
-    // Middleware that injects the master key into request extensions for auth checking.
-    let auth_layer = axum::middleware::from_fn(
-        move |mut request: axum::extract::Request, next: axum::middleware::Next| {
-            let key = master_key.clone();
-            async move {
-                request.extensions_mut().insert(key);
-                auth::auth_middleware(request, next).await
-            }
-        },
-    );
+    let auth_layer =
+        axum::middleware::from_fn_with_state(state.master_key.clone(), auth::auth_middleware);
```
Force-pushed cb7f667 to 03779c5
Force-pushed ab207ad to 92736d4
Force-pushed ff616b3 to 3f8328d
Force-pushed 3f8328d to bd0003b
OpenAI-compatible LLM gateway, invokable via `model-cli gateway --config <file>`. Supports Docker Model Runner as a first-class provider alongside Ollama, OpenAI, Anthropic, and 15+ others.

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Force-pushed bd0003b to 0d9fc40