
Add model-cli gateway#812

Merged
ericcurtin merged 1 commit into main from docker-model-gateway
Mar 31, 2026

Conversation

@ericcurtin
Contributor

OpenAI-compatible LLM gateway, invokable via 'model-cli gateway --config <file>'. Supports Docker Model Runner as a first-class provider alongside Ollama, OpenAI, Anthropic, and 15+ others.

@ericcurtin ericcurtin force-pushed the docker-model-gateway branch from 5369ad7 to cb7f667 Compare March 30, 2026 13:42
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 3 issues and left some high-level feedback:

  • The retry and fallback logic in `Router::chat_completion`, `chat_completion_stream`, and `embedding` is nearly identical; consider extracting a shared helper (e.g. a generic `run_with_retries_and_fallbacks`) to reduce duplication and keep behavior consistent across request types.
  • In `build_app`, the master key is inserted into request extensions only to be immediately re-read in `auth::auth_middleware`; you can simplify the auth middleware by capturing the `master_key` directly in the closure and passing it to the auth function instead of going through extensions.
  • The streaming branch in `chat_completion_handler` re-wraps the provider `ByteStream` into an unbounded MPSC channel and `UnboundedReceiverStream`; you can simplify and reduce overhead by mapping the `ByteStream` directly into an `axum::body::Body::from_stream` without spawning an extra task and channel.
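The shared helper suggested in the first bullet could be sketched roughly as follows. This is a minimal synchronous sketch, not the PR's code: only the name `run_with_retries_and_fallbacks` comes from the review; the signature and everything else are assumptions (the real router methods are async and provider-aware).

```rust
// Hypothetical sketch of a shared retry/fallback helper. Each attempt gets
// its index, so the caller can pick the next deployment per attempt.
fn run_with_retries_and_fallbacks<T, E>(
    max_attempts: usize, // must be > 0
    mut attempt: impl FnMut(usize) -> Result<T, E>,
) -> Result<T, E> {
    let mut last_err = None;
    for i in 0..max_attempts {
        match attempt(i) {
            Ok(value) => return Ok(value),
            Err(err) => last_err = Some(err),
        }
    }
    // Safe because max_attempts > 0 guarantees at least one attempt ran.
    Err(last_err.expect("max_attempts must be > 0"))
}
```

The chat, streaming, and embedding paths would then each pass their own request closure, keeping retry semantics identical across request types.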
## Individual Comments

### Comment 1
<location path="model-cli/src/providers/anthropic.rs" line_range="362-363" />
<code_context>
+                    buffer = buffer[event_end + 2..].to_string();
+
+                    let mut event_data = String::new();
+                    for line in event_text.lines() {
+                        if let Some(data) = line.strip_prefix("data: ") {
+                            event_data = data.to_string();
+                        }
</code_context>
<issue_to_address>
**issue:** SSE parsing only keeps the last `data:` line per event, which may drop data for multi-line events.

Per the SSE spec, multiple `data:` lines in one event should be concatenated with `\n`. Here, `event_data` is overwritten on each `data:` line, so any earlier lines in a multi-line event would be lost. Please accumulate all `data:` lines (e.g., into a `Vec<String>` and `join('\n')`) so multi-line events are handled correctly.
</issue_to_address>
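A spec-compliant parse along the lines the comment suggests might look like this (a hypothetical helper for illustration, not the PR's actual code):

```rust
// Hypothetical fix sketch: collect every `data:` line of an SSE event and
// join them with '\n', as the SSE spec requires, instead of overwriting
// so that only the last line survives.
fn parse_sse_event_data(event_text: &str) -> String {
    let mut data_lines: Vec<&str> = Vec::new();
    for line in event_text.lines() {
        if let Some(data) = line.strip_prefix("data: ") {
            data_lines.push(data);
        }
    }
    data_lines.join("\n")
}
```

(The full SSE spec also allows `data:` with no space after the colon; a complete parser would strip at most one leading space from the field value.)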

### Comment 2
<location path="model-cli/src/router.rs" line_range="130" />
<code_context>
+
+        let dep_count = self.deployments.get(model).map_or(0, |v| v.len());
+
+        for attempt in 0..max_attempts.max(dep_count) {
+            if let Some(idx) = self.next_index(model) {
+                let dep = &self.deployments[model][idx];
</code_context>
<issue_to_address>
**question (bug_risk):** The attempt loop bounds may exceed the configured retry count when multiple deployments exist.

Because the loop uses `max_attempts.max(dep_count)` with `max_attempts = num_retries + 1`, multiple deployments can cause more total attempts than `num_retries` suggests (e.g., 3 deployments, `num_retries = 0` → 3 attempts). If `num_retries` is intended as a hard cap on attempts, consider `min(max_attempts, dep_count)` instead, or clarify in docs that each deployment is tried at least once even if that exceeds `num_retries`.
</issue_to_address>
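The difference between the two bounds can be shown with a small numeric sketch (hypothetical helper names; they just mirror the loop bound under discussion):

```rust
// Current behavior: attempts = max(num_retries + 1, dep_count), so having
// more deployments than the retry budget increases total attempts.
fn attempts_current(num_retries: usize, dep_count: usize) -> usize {
    (num_retries + 1).max(dep_count)
}

// Suggested hard cap: attempts = min(num_retries + 1, dep_count), never
// exceeding the configured retry budget even with many deployments.
fn attempts_capped(num_retries: usize, dep_count: usize) -> usize {
    (num_retries + 1).min(dep_count)
}
```

With 3 deployments and `num_retries = 0`, the current bound yields 3 attempts while the capped bound yields 1 — which is the ambiguity the comment asks to resolve, either in code or in the docs.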

### Comment 3
<location path="model-cli/src/handlers.rs" line_range="99-103" />
<code_context>
+
+// ── Embeddings ──
+
+pub async fn embeddings_handler(
+    State(state): State<Arc<AppState>>,
+    Json(request): Json<EmbeddingRequest>,
+) -> Result<impl IntoResponse, AppError> {
+    let response = state.router.embedding(&request).await?;
+    Ok(Json(response))
+}
</code_context>
<issue_to_address>
**nitpick:** Embeddings responses don’t include the `x-model-cli-version` header like chat completions do.

Here the JSON is returned directly, so the version header isn’t added. For consistency with non-streaming chat completions and to support client-side introspection, consider wrapping the embeddings result in a `Response` and setting the same `x-model-cli-version` header.
</issue_to_address>


Comment thread model-cli/src/providers/anthropic.rs Outdated
Comment thread model-cli/src/router.rs Outdated
Comment thread model-cli/src/handlers.rs
Contributor

@doringeman doringeman left a comment


LGTM! WDYT of adding some Makefile rules for it?

@gemini-code-assist
Contributor

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

@doringeman
Contributor

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces model-cli, a Rust-based LLM gateway featuring provider routing, load balancing, and fallback mechanisms. Feedback highlights critical security concerns regarding timing attacks in API key validation and permissive CORS configurations. Other recommendations focus on improving performance and maintainability through optimized dependency features, simplified stream processing, and more idiomatic Axum middleware patterns.

Comment thread model-cli/src/auth.rs Outdated
Comment thread model-cli/src/main.rs Outdated
Comment thread model-cli/Cargo.toml Outdated
Comment thread model-cli/src/handlers.rs Outdated
Comment thread model-cli/src/handlers.rs Outdated
Comment thread model-cli/src/main.rs
Comment on lines +132 to +144
fn build_app(state: Arc<AppState>) -> AxumRouter {
    let master_key = state.master_key.clone();

    // Middleware that injects the master key into request extensions for auth checking.
    let auth_layer = axum::middleware::from_fn(
        move |mut request: axum::extract::Request, next: axum::middleware::Next| {
            let key = master_key.clone();
            async move {
                request.extensions_mut().insert(key);
                auth::auth_middleware(request, next).await
            }
        },
    );
Contributor


medium

The method used to pass the master_key to the authentication middleware is a bit of a workaround. A more idiomatic axum approach is to use middleware::from_fn_with_state, which passes state directly to the middleware function. This would make the auth_middleware function cleaner as it wouldn't need to extract the key from request extensions.

This would require changing the signature of auth_middleware in src/auth.rs to:

pub async fn auth_middleware(
    axum::extract::State(master_key): axum::extract::State<Option<String>>,
    request: axum::extract::Request,
    next: axum::middleware::Next,
) -> Result<Response, AppError>

And then you can use master_key directly instead of extracting it from extensions.

Suggested change

Before:

fn build_app(state: Arc<AppState>) -> AxumRouter {
    let master_key = state.master_key.clone();

    // Middleware that injects the master key into request extensions for auth checking.
    let auth_layer = axum::middleware::from_fn(
        move |mut request: axum::extract::Request, next: axum::middleware::Next| {
            let key = master_key.clone();
            async move {
                request.extensions_mut().insert(key);
                auth::auth_middleware(request, next).await
            }
        },
    );

After:

fn build_app(state: Arc<AppState>) -> AxumRouter {
    let auth_layer =
        axum::middleware::from_fn_with_state(state.master_key.clone(), auth::auth_middleware);
Comment thread model-cli/src/providers/openai.rs Outdated
@ericcurtin ericcurtin force-pushed the docker-model-gateway branch from cb7f667 to 03779c5 Compare March 31, 2026 10:18
Contributor

@sourcery-ai sourcery-ai Bot left a comment


New security issues found

Comment thread cmd/cli/commands/gateway.go Outdated
@ericcurtin ericcurtin force-pushed the docker-model-gateway branch 2 times, most recently from ab207ad to 92736d4 Compare March 31, 2026 10:37
Comment thread .github/workflows/e2e-test.yml Dismissed
@ericcurtin ericcurtin force-pushed the docker-model-gateway branch 2 times, most recently from ff616b3 to 3f8328d Compare March 31, 2026 10:50
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New security issues found

Comment thread cmd/cli/docs/reference/model_gateway.md Outdated
@ericcurtin ericcurtin force-pushed the docker-model-gateway branch from 3f8328d to bd0003b Compare March 31, 2026 10:54
OpenAI-compatible LLM gateway, invokable via
'model-cli gateway --config <file>'. Supports Docker Model Runner as a
first-class provider alongside Ollama, OpenAI, Anthropic, and 15+ others.

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
@ericcurtin ericcurtin force-pushed the docker-model-gateway branch from bd0003b to 0d9fc40 Compare March 31, 2026 10:59
@ericcurtin ericcurtin merged commit 1304519 into main Mar 31, 2026
16 checks passed
@ericcurtin ericcurtin deleted the docker-model-gateway branch March 31, 2026 11:12

@jganoff jganoff left a comment


Nice!! I'm excited about this.

