refactor(server): extract orchestration and lazy body capture #245

SantiagoDePolonia merged 11 commits into main from
Conversation
Move inference and native batch orchestration into gateway services, leaving server handlers as HTTP adapters. Reduce eager request body capture by keeping small-body fast paths and using bounded selector peeking for larger streams.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

Introduces a gateway layer that centralizes batch and inference orchestration (model resolution, authorization, fallback, request patching, policy application, usage logging), refactors the server to delegate to the gateway, adds streaming buffer and provider converter updates, changes request snapshot/peek semantics, and adds many helpers and tests.
Sequence Diagram(s)

sequenceDiagram
participant Client as Client
participant HTTP as HTTPHandler
participant BO as BatchOrchestrator
participant MR as ModelResolver
participant WP as WorkflowPolicyResolver
participant Provider as Provider
participant Store as BatchStore
Client->>HTTP: Create batch request
HTTP->>BO: Create(ctx, req, meta)
BO->>MR: DetermineBatchExecutionSelection(...)
MR-->>BO: Selection(provider/model)
BO->>WP: Match(selector)
WP-->>BO: Policy
BO->>Provider: Prepare/Submit batch
Provider-->>BO: BatchResponse
BO->>Store: Persist StoredBatch
Store-->>BO: OK
BO-->>HTTP: BatchCreateResult
HTTP-->>Client: 202 Created
sequenceDiagram
participant Client as Client
participant HTTP as HTTPHandler
participant IO as InferenceOrchestrator
participant MR as ModelResolver
participant MA as ModelAuthorizer
participant WP as WorkflowPolicyResolver
participant Provider as Provider
participant UL as UsageLogger
Client->>HTTP: Chat completion request
HTTP->>IO: PrepareChatRequest(ctx, req, meta)
IO->>MR: ResolveRequestModel(...)
MR-->>IO: ResolvedSelector
IO->>MA: ValidateModelAccess(ctx, selector)
MA-->>IO: Authorized
IO->>WP: Match(selector)
WP-->>IO: Policy
HTTP->>IO: ExecuteChatCompletion(...)
IO->>Provider: Call provider API
Provider-->>IO: ChatResponse
IO->>UL: LogUsage(...)
UL-->>IO: Written
IO-->>HTTP: ChatCompletionResult
HTTP-->>Client: 200 OK
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 15
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/gateway/batch_helpers.go`:
- Around line 85-103: The metadata-preservation logic incorrectly only protects
gateway keys when stored.Batch.Metadata already contains them and then performs
a redundant maps.Copy; to fix, treat the hardcoded gateway keys ("provider",
"provider_batch_id") as always-owned: initialize stored.Batch.Metadata if nil,
build a set (or map) of the hardcoded keys and, when iterating
upstream.Metadata, unconditionally skip any key in that set so upstream cannot
overwrite gateway-owned keys; remove the maps.Copy(stored.Batch.Metadata,
preservedGatewayMetadata) no-op and then call
SanitizePublicBatchMetadata(stored.Batch.Metadata) as before (refer to
stored.Batch.Metadata, upstream.Metadata, preservedGatewayMetadata, maps.Copy,
and SanitizePublicBatchMetadata).
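A minimal sketch of the suggested fix, with hypothetical names (`mergeBatchMetadata`, `gatewayOwnedKeys`) standing in for the actual helpers: gateway-owned keys are protected unconditionally, so upstream metadata can never overwrite them even when the stored map did not already contain them.

```go
package main

import "fmt"

// gatewayOwnedKeys is the set of hardcoded keys the gateway always owns.
var gatewayOwnedKeys = map[string]struct{}{
	"provider":          {},
	"provider_batch_id": {},
}

// mergeBatchMetadata copies upstream metadata into the stored map while
// unconditionally skipping gateway-owned keys, removing the need for the
// redundant maps.Copy of a preserved-keys map.
func mergeBatchMetadata(stored, upstream map[string]string) map[string]string {
	if stored == nil {
		stored = map[string]string{}
	}
	for k, v := range upstream {
		if _, owned := gatewayOwnedKeys[k]; owned {
			continue // upstream may never overwrite gateway-owned keys
		}
		stored[k] = v
	}
	return stored
}

func main() {
	stored := map[string]string{"provider": "openai"}
	merged := mergeBatchMetadata(stored, map[string]string{"provider": "spoofed", "team": "ml"})
	fmt.Println(merged["provider"], merged["team"]) // openai ml
}
```

The result would still be passed through SanitizePublicBatchMetadata as before.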
In `@internal/gateway/batch_selection.go`:
- Around line 27-38: DetermineBatchExecutionSelectionWithAuthorizer currently
dereferences req (e.g., req.InputFileID, req.Metadata, req.Requests,
req.Endpoint) without checking for nil, which can panic; add an early nil-guard
at the start of DetermineBatchExecutionSelectionWithAuthorizer that returns a
typed core.NewInvalidRequestError (or existing invalid-request pattern used
elsewhere) when req == nil, then proceed with existing validation logic so all
invalid inputs produce consistent errors instead of a panic.
In `@internal/gateway/batch_usage.go`:
- Around line 195-205: The function extractTokenTotals currently synthesizes
totalTokens when either hasInput or hasOutput exists, which can under-count;
change the logic so totalTokens is only synthesized when both hasInput and
hasOutput are true (i.e., require hasInput && hasOutput before setting
totalTokens = inputTokens + outputTokens and setting hasTotal = true), and
otherwise leave hasTotal false so callers know the total is not authoritative;
update references in extractTokenTotals to rely on the revised hasTotal flag.
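The revised rule can be sketched in isolation (function and parameter names here are hypothetical, extracted from the finding): a total is only synthesized when both sides were actually observed, and the boolean tells callers whether the total is authoritative.

```go
package main

import "fmt"

// totalTokens synthesizes a total only when both input and output counts
// were observed; otherwise it reports the total as non-authoritative.
func totalTokens(inputTokens, outputTokens int, hasInput, hasOutput bool) (int, bool) {
	if hasInput && hasOutput {
		return inputTokens + outputTokens, true
	}
	// Either side missing: do not fabricate a total from partial data.
	return 0, false
}

func main() {
	if total, ok := totalTokens(10, 5, true, true); ok {
		fmt.Println(total) // 15
	}
	_, ok := totalTokens(10, 0, true, false)
	fmt.Println(ok) // false
}
```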
- Around line 286-296: The float→int boundary check in intFromFloat64 uses
float64(^uint(0) >> 1) which loses precision; replace that runtime float-derived
bound with explicit integer constants computed from strconv.IntSize so the exact
integer limits are used before float comparison. Concretely, compute bits :=
strconv.IntSize, set maxIntVal := 1<<(bits-1)-1 and minIntVal := -1 << (bits-1)
(as integer constants), convert those to float64 for the v < min || v > max
comparisons, and keep the NaN/Inf check and final int(v) cast; update imports to
include strconv. Use the function name intFromFloat64 to locate and change the
bounds logic.
In `@internal/gateway/inference_execute.go`:
- Around line 60-65: The code currently dereferences req/resp in StreamResponses
and streamResponses (and other paths around the mentioned ranges) without nil
checks, which can panic; add input validation at the start of StreamResponses
and at entry/return points of streamResponses (and the other sequences around
lines 189-220 and 400-412) to guard against nil req or nil resp values and, when
invalid, return a typed core.GatewayError (using the project's
constructor/format for GatewayError) with an OpenAI-compatible error shape
instead of panicking; ensure all early-return error paths use core.GatewayError
so upstream clients always receive the required error type.
In `@internal/gateway/model_helpers.go`:
- Around line 104-107: The function MarshalRequestBody is just a thin wrapper
over json.Marshal which hides the encoder and prevents deterministic cache-key
bytes; either remove MarshalRequestBody and let callers call json.Marshal
directly (delete the MarshalRequestBody function and update call sites) or
replace its implementation to produce a canonical/deterministic JSON encoding
for cache-key computation (implement canonicalization inside MarshalRequestBody
so callers use a stable encoder rather than json.Marshal). Reference the
MarshalRequestBody symbol and json.Marshal when making the change.
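If the canonicalization route is chosen, one common sketch (an assumption, not the project's actual encoder) is to round-trip the payload through `map[string]any`, since `encoding/json` emits map keys in sorted order. This yields byte-identical output for semantically equal objects regardless of field order, at the cost of normalizing numbers to float64:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canonicalJSON re-encodes raw JSON so that object keys are emitted in
// sorted order, producing deterministic bytes for cache-key computation.
func canonicalJSON(raw []byte) ([]byte, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return json.Marshal(v) // encoding/json sorts map keys
}

func main() {
	a, _ := canonicalJSON([]byte(`{"model":"gpt","stream":true}`))
	b, _ := canonicalJSON([]byte(`{"stream":true,"model":"gpt"}`))
	fmt.Println(string(a) == string(b)) // true
}
```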
- Around line 61-72: The function resolvedModelPrefix currently reassigns the
named parameter providerName; instead, create a local variable (e.g., name :=
strings.TrimSpace(providerName)) and use that for checks so the original
parameter is not mutated. First if-check the trimmed local name and return it if
non-empty; then check workflow and workflow.Resolution for nil and, if present,
set a second local var to strings.TrimSpace(workflow.Resolution.ProviderName)
and return it if non-empty; otherwise return
strings.TrimSpace(workflow.Resolution.ResolvedSelector.Provider). Keep all
existing nil checks (workflow and workflow.Resolution) and preserve return
behavior.
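The non-mutating rewrite can be sketched with hypothetical stand-in types for the workflow structures (the real field layout may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the workflow resolution structures.
type Resolution struct {
	ProviderName     string
	ResolvedSelector struct{ Provider string }
}
type Workflow struct{ Resolution *Resolution }

// resolvedModelPrefix trims into locals instead of reassigning its parameter,
// preserving the existing nil checks and return behavior.
func resolvedModelPrefix(providerName string, workflow *Workflow) string {
	if name := strings.TrimSpace(providerName); name != "" {
		return name
	}
	if workflow == nil || workflow.Resolution == nil {
		return ""
	}
	if name := strings.TrimSpace(workflow.Resolution.ProviderName); name != "" {
		return name
	}
	return strings.TrimSpace(workflow.Resolution.ResolvedSelector.Provider)
}

func main() {
	res := &Resolution{ProviderName: " openai "}
	fmt.Println(resolvedModelPrefix("  ", &Workflow{Resolution: res})) // openai
}
```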
- Around line 23-51: The clone helpers CloneChatRequestForSelector and
CloneResponsesRequestForSelector currently do a shallow copy which leaves
slices/maps (e.g., ChatRequest.Messages, ChatRequest.Tools and
ResponsesRequest.Metadata) shared with the original; update both functions to
deep-copy those mutable fields after creating cloned := *req: for ChatRequest,
allocate and copy the Messages slice (make + copy) and allocate and copy the
Tools slice of maps (make + copy), and for ResponsesRequest allocate and copy
the Metadata map by creating a new map and copying each key/value; preserve the
existing StreamOptions deep-copy logic and return the cloned pointer.
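The deep-copy pattern the finding describes can be sketched with a simplified hypothetical request type (the real `ChatRequest` has richer message and tool types):

```go
package main

import "fmt"

// Hypothetical simplified stand-in for the request type named in the finding.
type ChatRequest struct {
	Model    string
	Messages []string
	Tools    []map[string]any
}

// CloneChatRequestForSelector copies the struct and then reallocates the
// mutable slice fields so the clone cannot alias the caller's data.
func CloneChatRequestForSelector(req *ChatRequest) *ChatRequest {
	if req == nil {
		return nil
	}
	cloned := *req
	if req.Messages != nil {
		cloned.Messages = make([]string, len(req.Messages))
		copy(cloned.Messages, req.Messages)
	}
	if req.Tools != nil {
		cloned.Tools = make([]map[string]any, len(req.Tools))
		copy(cloned.Tools, req.Tools)
	}
	return &cloned
}

func main() {
	orig := &ChatRequest{Messages: []string{"hi"}}
	clone := CloneChatRequestForSelector(orig)
	clone.Messages[0] = "changed"
	fmt.Println(orig.Messages[0]) // hi
}
```

For the metadata map on `ResponsesRequest`, the same idea applies: allocate a fresh map and copy each key/value pair.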
In `@internal/gateway/usage.go`:
- Around line 41-44: The ShouldEnforceReturningUsageData method currently only
checks o.usageLogger != nil and EnforceReturningUsageData; update it to also
respect the Config().Enabled flag so enforcement only happens when usage logging
is enabled—i.e., ensure ShouldEnforceReturningUsageData returns true only when
o.usageLogger != nil && o.usageLogger.Config().Enabled &&
o.usageLogger.Config().EnforceReturningUsageData so the check aligns with how
logUsage treats disabled logging.
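The three-way guard reads as a single conjunction; a sketch with hypothetical config/logger shapes:

```go
package main

import "fmt"

// Hypothetical usage-logger shapes matching the finding.
type UsageConfig struct {
	Enabled                   bool
	EnforceReturningUsageData bool
}
type usageLogger struct{ cfg UsageConfig }

func (l *usageLogger) Config() UsageConfig { return l.cfg }

type orchestrator struct{ usageLogger *usageLogger }

// ShouldEnforceReturningUsageData enforces only when usage logging is
// actually enabled, mirroring how logUsage treats disabled logging.
func (o *orchestrator) ShouldEnforceReturningUsageData() bool {
	return o.usageLogger != nil &&
		o.usageLogger.Config().Enabled &&
		o.usageLogger.Config().EnforceReturningUsageData
}

func main() {
	o := &orchestrator{usageLogger: &usageLogger{cfg: UsageConfig{Enabled: false, EnforceReturningUsageData: true}}}
	fmt.Println(o.ShouldEnforceReturningUsageData()) // false
}
```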
In `@internal/server/model_validation.go`:
- Around line 156-158: peekRequestBodySelectorHints currently returns a parsed
hint as soon as it finds a non-empty model which can miss a provider that
appears later; update the logic so the peek is only treated as authoritative
when provider presence is determined. Modify peekRequestBodySelectorHints (and
its return type) to include a completion/comprehensive flag (e.g.
hints.complete) or ensure it keeps scanning bounded top-level fields until
provider is observed or definitively absent, and then in model_validation.go
change the early-return condition to require hints.parsed && hints.complete (or
hints.provider != "") before returning model/provider; otherwise fall back to
the full-body parse. Ensure symbols referenced: peekRequestBodySelectorHints and
the hints fields (parsed, provider, add complete) are updated consistently.
In `@internal/server/request_selector_peek.go`:
- Around line 92-104: The code sets hints.parsed = true when only the "model"
field was read, causing ApplyBodySelectorHints to treat the body as fully parsed
and lose streaming info; instead, assign hints.model from
readOptionalJSONString(dec) and return without setting hints.parsed (or set a
dedicated flag like hints.modelOnly) so that JSONBodyParsed/StreamRequested are
only set after a full scan; update uses of
requestBodySelectorHints/ApplyBodySelectorHints accordingly to respect the new
distinction.
In `@internal/server/request_snapshot.go`:
- Around line 208-225: storeRequestBodySnapshot currently attaches the full
captured body via snapshot.WithOwnedCapturedBody but does not refresh the
snapshot's semantic hinting (WhiteBoxPrompt), leaving selectors that were
outside the initial peek window stale; after computing updated :=
snapshot.WithOwnedCapturedBody(capturedBody, bodyNotCaptured) you must recompute
the semantic prompt using the actual body (e.g. call requestBodyBytes or
equivalent to obtain the full text) and then attach it to the snapshot (e.g. via
a WithWhiteBoxPrompt/WithUpdatedWhiteBoxPrompt-style method) before storing it
back into the request context so downstream policy/audit/model resolution sees
the refreshed prompt.
In `@internal/server/translated_inference_service.go`:
- Around line 46-62: The lazy, unsynchronized initialization in
translatedInferenceService.inference() causes a data race on s.orchestrator when
called concurrently; fix by constructing the InferenceOrchestrator eagerly in
the translatedInferenceService constructor (or, if you must keep lazy init,
protect initialization with a sync.Once field) so that
NewInferenceOrchestrator(...) is called once during service wiring; update
translatedInferenceService to either set s.orchestrator in its constructor or
add a sync.Once (e.g., orchestratorOnce) and call orchestratorOnce.Do(...)
inside inference() to guarantee a single safe initialization of s.orchestrator.
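The sync.Once variant the finding describes can be sketched as follows (struct and constructor names follow the finding; internals are placeholders):

```go
package main

import (
	"fmt"
	"sync"
)

type InferenceOrchestrator struct{}

func NewInferenceOrchestrator() *InferenceOrchestrator { return &InferenceOrchestrator{} }

// translatedInferenceService guards lazy construction with sync.Once so
// concurrent callers of inference() can never race on s.orchestrator.
type translatedInferenceService struct {
	orchestratorOnce sync.Once
	orchestrator     *InferenceOrchestrator
}

func (s *translatedInferenceService) inference() *InferenceOrchestrator {
	s.orchestratorOnce.Do(func() {
		s.orchestrator = NewInferenceOrchestrator()
	})
	return s.orchestrator
}

func main() {
	s := &translatedInferenceService{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); _ = s.inference() }()
	}
	wg.Wait()
	fmt.Println(s.inference() != nil) // true
}
```

Eager construction in the service constructor avoids the extra field entirely and is the simpler choice when wiring allows it.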
In `@internal/server/usage_logger_test.go`:
- Around line 5-15: The test logger usageCaptureLogger is appending to entries
in Write without synchronization; add a sync.Mutex field to usageCaptureLogger
and lock/unlock it around mutations (in Write) and any reads of entries (if
tests read entries concurrently), or alternatively document that
usageCaptureLogger is not safe for concurrent use; update the struct
(usageCaptureLogger) and ensure Write uses the mutex to protect entries, and
consider protecting Config and Close if they will access shared state.
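The mutex-protected test logger can be sketched like this (the entry type is a placeholder; the real one carries more fields):

```go
package main

import (
	"fmt"
	"sync"
)

type usageEntry struct{ Tokens int }

// usageCaptureLogger guards entries with a mutex so concurrent Write calls
// from parallel test goroutines cannot race on the slice.
type usageCaptureLogger struct {
	mu      sync.Mutex
	entries []usageEntry
}

func (l *usageCaptureLogger) Write(e usageEntry) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.entries = append(l.entries, e)
	return nil
}

// Entries returns a copy so test assertions never read the slice unsynchronized.
func (l *usageCaptureLogger) Entries() []usageEntry {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]usageEntry(nil), l.entries...)
}

func main() {
	l := &usageCaptureLogger{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); _ = l.Write(usageEntry{Tokens: 1}) }()
	}
	wg.Wait()
	fmt.Println(len(l.Entries())) // 100
}
```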
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ab7669f7-4368-423c-89d0-3d5951ed119c
📒 Files selected for processing (37)
- internal/core/request_snapshot.go
- internal/core/semantic.go
- internal/gateway/batch_helpers.go
- internal/gateway/batch_orchestrator.go
- internal/gateway/batch_orchestrator_test.go
- internal/gateway/batch_selection.go
- internal/gateway/batch_usage.go
- internal/gateway/fallback.go
- internal/gateway/inference_execute.go
- internal/gateway/inference_orchestrator.go
- internal/gateway/inference_orchestrator_test.go
- internal/gateway/inference_prepare.go
- internal/gateway/interfaces.go
- internal/gateway/model_helpers.go
- internal/gateway/request_model_resolution.go
- internal/gateway/usage.go
- internal/gateway/workflow_policy.go
- internal/server/batch_request_preparer.go
- internal/server/handlers.go
- internal/server/internal_chat_completion_executor.go
- internal/server/model_access.go
- internal/server/model_validation.go
- internal/server/model_validation_test.go
- internal/server/native_batch_service.go
- internal/server/native_batch_support.go
- internal/server/request_model_resolution.go
- internal/server/request_selector_peek.go
- internal/server/request_snapshot.go
- internal/server/request_snapshot_test.go
- internal/server/request_support.go
- internal/server/translated_inference_service.go
- internal/server/translated_inference_service_test.go
- internal/server/translated_request_patcher.go
- internal/server/usage_logger_test.go
- internal/server/workflow_helpers.go
- internal/server/workflow_policy.go
- internal/server/workflow_policy_test.go
💤 Files with no reviewable changes (3)
- internal/server/request_support.go
- internal/server/workflow_policy_test.go
- internal/server/translated_inference_service_test.go
```go
resp := *upstream
resp.Provider = providerType
resp.ProviderBatchID = providerBatchID
resp.ID = "batch_" + uuid.NewString()
resp.Object = "batch"
resp.InputFileID = FirstNonEmpty(req.InputFileID, batchPreparation.OriginalInputFileID, resp.InputFileID)
if resp.Endpoint == "" {
	resp.Endpoint = core.NormalizeOperationPath(req.Endpoint)
}
if resp.CompletionWindow == "" {
	resp.CompletionWindow = req.CompletionWindow
}
if resp.CompletionWindow == "" {
	resp.CompletionWindow = "24h"
}
if resp.Metadata == nil {
	resp.Metadata = map[string]string{}
}
resp.Metadata["provider"] = providerType
resp.Metadata["provider_batch_id"] = providerBatchID
resp.Metadata = SanitizePublicBatchMetadata(resp.Metadata)

if o.batchStore != nil {
	mergedHints := batchrewrite.MergeEndpointHints(preparedHints, hints)
	stored := &batchstore.StoredBatch{
		Batch:                     &resp,
		RequestEndpointByCustomID: mergedHints,
		OriginalInputFileID:       batchPreparation.OriginalInputFileID,
		RewrittenInputFileID:      batchPreparation.RewrittenInputFileID,
		RequestID:                 strings.TrimSpace(meta.RequestID),
		UserPath:                  core.UserPathFromContext(ctx),
		WorkflowVersionID:         workflowVersionID(workflow),
		UsageEnabled:              boolPtr(workflow == nil || workflow.UsageEnabled()),
	}
	if err := o.batchStore.Create(ctx, stored); err != nil {
		o.rollbackPreparedBatch(ctx, providerType, batchPreparation, providerBatchID)
		return nil, core.NewProviderError("batch_store", http.StatusInternalServerError, "failed to persist batch", err)
	}
	if hintedRouter, ok := o.provider.(core.NativeBatchHintRoutableProvider); ok && len(mergedHints) > 0 {
		hintedRouter.ClearBatchResultHints(providerType, providerBatchID)
	}
}
```
Reject native batch creation when the batch store is unavailable.
Line 165 replaces the provider ID with a gateway-generated ID, but Lines 184-203 only persist that mapping when o.batchStore is non-nil. If the store is nil, the client receives an ID that Get, Cancel, and Results cannot resolve, leaving the upstream batch orphaned.
Suggested fix
```diff
 func (o *BatchOrchestrator) Create(ctx context.Context, req *core.BatchRequest, meta BatchMeta) (*BatchCreateResult, error) {
 	if req == nil {
 		return nil, core.NewInvalidRequestError("batch request is required", nil)
 	}
+	if o.batchStore == nil {
+		return nil, core.NewProviderError("batch_store", http.StatusInternalServerError, "batch store is unavailable", nil)
+	}
 	ctx = contextWithRequestID(ctx, meta.RequestID)
```
@coderabbitai review

1 similar comment

@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/server/internal_chat_completion_executor.go (1)
130-142: ⚠️ Potential issue | 🟡 Minor — Silent json.Marshal failure and duplicated dispatch fallback.

If json.Marshal(req) fails at line 135, the error is dropped and the code re-runs the same no-cache dispatch path that already exists at lines 131-133, just inlined. The analogous path in internal/server/translated_inference_service.go (lines 201-204) at least logs the marshal failure with slog.Debug("marshalRequestBody failed", "err", marshalErr) before falling back, which aids debugging of a cache bypass that will otherwise be invisible. Consider deduplicating the fallback and adding an observability hook:
♻️ Proposed refactor
```diff
-	if e.responseCache == nil || (workflow != nil && !workflow.CacheEnabled()) {
+	dispatch := func() (*core.ChatResponse, string, string, string, bool, string, error) {
 		resp, providerType, providerName, failoverModel, usedFallback, err := e.orchestrator.DispatchChatCompletion(ctx, workflow, req)
+		return resp, providerType, providerName, failoverModel, usedFallback, "", err
+	}
+	if e.responseCache == nil || (workflow != nil && !workflow.CacheEnabled()) {
+		return dispatch()
 	}
 	body, err := json.Marshal(req)
 	if err != nil {
-		resp, providerType, providerName, failoverModel, usedFallback, execErr := e.orchestrator.DispatchChatCompletion(ctx, workflow, req)
-		if execErr != nil {
-			return nil, "", "", "", false, "", execErr
-		}
-		return resp, providerType, providerName, failoverModel, usedFallback, "", nil
+		slog.Debug("json.Marshal chat request failed; bypassing response cache", "err", err)
+		return dispatch()
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/server/internal_chat_completion_executor.go` around lines 130 - 142, json.Marshal(req) failure is currently swallowed and duplicates the no-cache dispatch path; change internal_chat_completion_executor.go to deduplicate by calling e.orchestrator.DispatchChatCompletion only from the common no-cache branch (used when e.responseCache==nil || (workflow!=nil && !workflow.CacheEnabled())) and remove the inlined retry after a marshal error, and when json.Marshal(req) fails log the marshal error (e.g., with slog.Debug("marshalRequestBody failed", "err", err) or the project's logger) before falling back to calling e.orchestrator.DispatchChatCompletion(ctx, workflow, req) so you return the dispatch results (resp, providerType, providerName, failoverModel, usedFallback, "", err) consistently and do not drop the marshal error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/gateway/batch_usage.go`:
- Around line 289-306: intFromFloat64 currently truncates fractional floats
(e.g., 1.9 -> 1) which undercounts tokens; add a check to reject non-integer
floats by returning (0,false) when v is not equal to its integer value (use
math.Trunc or equivalent) before doing the final int conversion. Preserve the
existing NaN/Inf and range checks (math.IsNaN, math.IsInf, maxInt/minInt logic
and the intBits==64 special case) and insert the fractional check after the
range checks (or after NaN/Inf) so any non-whole float values cause the function
to return false instead of truncating in the intFromFloat64 function.
In `@internal/gateway/inference_execute.go`:
- Around line 371-386: The primary call currently returns early when err==nil
but stream==nil, which bypasses tryFallbackStream; change the logic in the
call/response block so that only a non-nil stream returns immediately (i.e.,
require err==nil && stream!=nil to return), and if stream==nil with err==nil set
err = emptyProviderStreamError(providerType) (or otherwise treat the nil stream
as an error) and then call tryFallbackStream with that error; update the
downstream closure that calls call(ctx, cloneForSelector(req, selector)) to
follow the same rule so cloneForSelector/emptyProviderStreamError are used
consistently during fallback resolution.
In `@internal/streaming/stream_buffer.go`:
- Around line 30-34: NewStreamBuffer currently discards the original pooled
*[]byte (pooled) when it replaces data with an oversized allocation, so
Release's capacity-based check (cap(b.data) > maxPooledStreamBufferSize) ends up
dropping the pooled slice instead of returning the original small slice to
streamBufferPool; fix by storing the original pooled handle on the stream buffer
(e.g., a pooled *[]byte or pooledPtr field) inside NewStreamBuffer and ensure
Release always returns that original pooled slice to streamBufferPool (if
non-nil) regardless of subsequent b.data reallocation or growth, and clear the
pooled pointer after Put; update logic in both NewStreamBuffer and Release (and
any append-growth branch) to preserve and return pooled rather than relying
solely on cap(b.data).
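The pooled-handle fix can be sketched with a simplified buffer (a hypothetical shape based on the finding, not the actual implementation): the buffer remembers the original pooled slice separately from `data`, so Release can always return it even after an oversized reallocation.

```go
package main

import (
	"fmt"
	"sync"
)

const maxPooledStreamBufferSize = 4096

var streamBufferPool = sync.Pool{
	New: func() any { b := make([]byte, 0, maxPooledStreamBufferSize); return &b },
}

// StreamBuffer keeps the original pooled handle so Release can always return
// it, regardless of how b.data later grows or gets replaced.
type StreamBuffer struct {
	data   []byte
	pooled *[]byte // original pool slice; returned on Release
}

func NewStreamBuffer(size int) *StreamBuffer {
	pooled := streamBufferPool.Get().(*[]byte)
	b := &StreamBuffer{data: (*pooled)[:0], pooled: pooled}
	if size > maxPooledStreamBufferSize {
		b.data = make([]byte, 0, size) // oversized: allocate, but keep pooled handle
	}
	return b
}

func (b *StreamBuffer) Release() {
	if b.pooled != nil {
		*b.pooled = (*b.pooled)[:0]
		streamBufferPool.Put(b.pooled)
		b.pooled = nil // never double-Put
	}
	b.data = nil
}

func main() {
	b := NewStreamBuffer(1 << 20) // larger than the pooled capacity
	b.Release()                   // small pooled slice still returns to the pool
	fmt.Println(b.pooled == nil)  // true
}
```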
---
Outside diff comments:
In `@internal/server/internal_chat_completion_executor.go`:
- Around line 130-142: json.Marshal(req) failure is currently swallowed and
duplicates the no-cache dispatch path; change
internal_chat_completion_executor.go to deduplicate by calling
e.orchestrator.DispatchChatCompletion only from the common no-cache branch (used
when e.responseCache==nil || (workflow!=nil && !workflow.CacheEnabled())) and
remove the inlined retry after a marshal error, and when json.Marshal(req) fails
log the marshal error (e.g., with slog.Debug("marshalRequestBody failed", "err",
err) or the project's logger) before falling back to calling
e.orchestrator.DispatchChatCompletion(ctx, workflow, req) so you return the
dispatch results (resp, providerType, providerName, failoverModel, usedFallback,
"", err) consistently and do not drop the marshal error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ab2d3233-fce1-4532-9d3d-6635e0b39fe2
📒 Files selected for processing (22)
- internal/gateway/batch_helpers.go
- internal/gateway/batch_selection.go
- internal/gateway/batch_usage.go
- internal/gateway/inference_execute.go
- internal/gateway/model_helpers.go
- internal/gateway/refactor_findings_test.go
- internal/gateway/usage.go
- internal/providers/anthropic/anthropic.go
- internal/providers/responses_converter.go
- internal/server/internal_chat_completion_executor.go
- internal/server/model_validation.go
- internal/server/model_validation_test.go
- internal/server/native_batch_support_test.go
- internal/server/request_selector_peek.go
- internal/server/request_selector_peek_test.go
- internal/server/request_snapshot.go
- internal/server/request_snapshot_test.go
- internal/server/semantic_requests.go
- internal/server/translated_inference_service.go
- internal/server/usage_logger_test.go
- internal/streaming/stream_buffer.go
- internal/streaming/stream_buffer_test.go
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/server/internal_chat_completion_executor.go (1)
131-170: ⚠️ Potential issue | 🟡 Minor — Latent nil-workflow deref on the cache-hit return.

The guard at line 131 only short-circuits to no-cache dispatch when workflow != nil && !workflow.CacheEnabled(). If workflow is nil (e.g., a future code path where translatedWorkflow returns nil, nil), the code proceeds into responseCache.HandleInternalRequest, and on a cache hit line 167 dereferences workflow.ProviderType, panicking. Current tests pass because translatedWorkflow always returns a non-nil workflow, but it's cheap to harden.

🛡️ Suggested hardening

```diff
-	if e.responseCache == nil || (workflow != nil && !workflow.CacheEnabled()) {
+	if e.responseCache == nil || workflow == nil || !workflow.CacheEnabled() {
 		return e.dispatchChatCompletionNoCache(ctx, workflow, req)
 	}
```

This also makes the intent (cache requires an enabled workflow) explicit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/server/internal_chat_completion_executor.go` around lines 131 - 170, The cache-hit return can nil-deref workflow.ProviderType/gateway.ProviderNameFromWorkflow when workflow is nil; update the branch after responseCache.HandleInternalRequest (the result != nil && result.CacheType != "" case) to guard for workflow == nil and return safe zero-values (e.g., empty providerType/providerName and appropriate cache type) instead of accessing workflow.ProviderType or calling gateway.ProviderNameFromWorkflow; you can mirror the no-cache short-circuit logic around workflow.CacheEnabled and ensure responseCache.HandleInternalRequest and the cached-return path never assume workflow is non-nil.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/gateway/batch_usage.go`:
- Around line 137-141: The code uses hasAnyCost to decide setting all three
usage cost pointers (stored.Batch.Usage.InputCost, OutputCost, TotalCost) which
can incorrectly record zeroes for missing components; introduce separate flags
(e.g., hasInputCost, hasOutputCost, hasTotalCost) that are set when an entry
provides that specific component, and then assign each pointer only if its
corresponding flag is true so you only write observed cost components rather
than all three whenever any cost exists.
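The per-component flags the finding asks for can be sketched in isolation (type and function names here are hypothetical): each cost pointer is assigned only when its own component was observed.

```go
package main

import "fmt"

// usageCosts mirrors the three optional cost components on the stored batch.
type usageCosts struct {
	InputCost, OutputCost, TotalCost *float64
}

// assignCosts sets each pointer only when that specific component was
// observed, so a missing component is never recorded as an explicit zero.
func assignCosts(input, output, total float64, hasInput, hasOutput, hasTotal bool) usageCosts {
	var c usageCosts
	if hasInput {
		c.InputCost = &input
	}
	if hasOutput {
		c.OutputCost = &output
	}
	if hasTotal {
		c.TotalCost = &total
	}
	return c
}

func main() {
	c := assignCosts(0.01, 0, 0, true, false, false)
	fmt.Println(c.InputCost != nil, c.OutputCost == nil, c.TotalCost == nil) // true true true
}
```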
---
Outside diff comments:
In `@internal/server/internal_chat_completion_executor.go`:
- Around line 131-170: The cache-hit return can nil-deref
workflow.ProviderType/gateway.ProviderNameFromWorkflow when workflow is nil;
update the branch after responseCache.HandleInternalRequest (the result != nil
&& result.CacheType != "" case) to guard for workflow == nil and return safe
zero-values (e.g., empty providerType/providerName and appropriate cache type)
instead of accessing workflow.ProviderType or calling
gateway.ProviderNameFromWorkflow; you can mirror the no-cache short-circuit
logic around workflow.CacheEnabled and ensure
responseCache.HandleInternalRequest and the cached-return path never assume
workflow is non-nil.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: de98fff3-76f2-438f-9a2f-a1ef0e513038
📒 Files selected for processing (10)
- internal/gateway/batch_usage.go
- internal/gateway/inference_execute.go
- internal/gateway/refactor_findings_test.go
- internal/server/handlers.go
- internal/server/internal_chat_completion_executor.go
- internal/server/internal_chat_completion_executor_test.go
- internal/server/translated_inference_service.go
- internal/streaming/stream_buffer.go
- internal/streaming/stream_buffer_test.go
- tests/e2e/release-e2e-scenarios.md
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/admin/dashboard/static/js/modules/audit-list.js`:
- Around line 219-223: The auditEntryErrorMessage function is currently scanning
data.response_body unconditionally which can yield false positives; update
auditEntryErrorMessage to only call findNestedAuditErrorMessage when the entry
represents an error status (e.g., check entry.status / data.status / HTTP
error-range) or when response_body already has a top-level provider-error shape
(e.g., top-level keys like "error" or "message" or known provider-specific error
fields); keep the existing behavior of returning
auditErrorMessageFromField(data.error_message) first, and only fallback to
inspecting data.response_body after the new gate checks pass (use the existing
symbols auditEntryErrorMessage, data.error_message, data.response_body,
findNestedAuditErrorMessage to locate and change the logic).
In `@internal/admin/dashboard/templates/index.html`:
- Around line 775-781: Remove the redundant aria-label on the input with id
"auth-key-user-path" so the visible <label> ("User Path (optional)") is used as
the control name; keep the existing aria-describedby
"auth-key-user-path-help-copy" for help text. Update any layout/test assertion
that expects the aria-label to instead assert the presence of the <label
for="auth-key-user-path"> and that the input retains
aria-describedby="auth-key-user-path-help-copy". Ensure no other attributes are
changed.
In `@internal/gateway/batch_usage.go`:
- Around line 216-224: readFirstInt currently returns negative values (e.g. -1)
from various inputs which allows malformed token counts to decrement aggregates;
modify readFirstInt so after calling intFromAny(value) it rejects any num < 0 by
continuing the loop (i.e., treat negatives as non-existent/invalid) and only
return when num >= 0, referencing the readFirstInt function and the intFromAny
helper so the validation sits in readFirstInt before any aggregation or storage.
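A sketch of the revised loop, with a simplified `intFromAny` standing in for the real helper (an assumption for illustration): negative or unparseable values are skipped rather than returned, so malformed counts can never decrement aggregates.

```go
package main

import "fmt"

// intFromAny is a simplified stand-in for the helper named in the finding.
func intFromAny(v any) (int, bool) {
	switch n := v.(type) {
	case int:
		return n, true
	case float64:
		if n != float64(int(n)) {
			return 0, false
		}
		return int(n), true
	default:
		return 0, false
	}
}

// readFirstInt returns the first usable non-negative integer, treating
// negatives and unparseable values as if they did not exist.
func readFirstInt(values ...any) (int, bool) {
	for _, value := range values {
		num, ok := intFromAny(value)
		if !ok || num < 0 {
			continue // negatives are invalid, not usable counts
		}
		return num, true
	}
	return 0, false
}

func main() {
	n, ok := readFirstInt(-1, "x", float64(12))
	fmt.Println(n, ok) // 12 true
}
```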
In `@internal/server/internal_chat_completion_executor.go`:
- Around line 135-139: Update the stale slog.Debug message after
json.Marshal(req) to accurately reflect the failing operation (e.g.,
"json.Marshal(req) failed" or "marshal request failed") and include the error
value (err) in the log; also consider raising the level from Debug to Warn since
bypassing cache is notable. Make the change in the block that calls
json.Marshal(req) and currently falls back to
e.dispatchChatCompletionNoCache(ctx, workflow, req) so the log clearly
identifies the marshal failure and associated error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 16616bc9-3431-4aa3-96a9-7ed424caabf9
📒 Files selected for processing (16)
- internal/admin/dashboard/static/css/dashboard.css
- internal/admin/dashboard/static/js/modules/audit-list.js
- internal/admin/dashboard/static/js/modules/audit-list.test.js
- internal/admin/dashboard/static/js/modules/dashboard-layout.test.js
- internal/admin/dashboard/templates/audit-pane.html
- internal/admin/dashboard/templates/helper-disclosure.html
- internal/admin/dashboard/templates/index.html
- internal/gateway/batch_usage.go
- internal/gateway/batch_usage_test.go
- internal/llmclient/client.go
- internal/llmclient/client_test.go
- internal/responsecache/handle_request_test.go
- internal/responsecache/semantic.go
- internal/responsecache/simple.go
- internal/server/internal_chat_completion_executor.go
- internal/server/internal_chat_completion_executor_test.go
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/server/internal_chat_completion_executor.go (1)
148-173: ⚠️ Potential issue | 🟠 Major: Preserve provider/fallback metadata for cached hits.
On a cache hit, the dispatch callback does not run, so `providerType`, `providerName`, `failoverModel`, and `usedFallback` are unavailable. Lines 167-173 then attribute the cached response to the workflow provider and clear fallback state, which can make audit records for cached fallback responses point at the wrong route. Persist this metadata with the cache entry or return it through `InternalHandleResult` instead of reconstructing it from `workflow`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/server/internal_chat_completion_executor.go` around lines 148 - 173, The cache hit handling currently reconstructs provider/fallback info from workflow (in internal_chat_completion_executor.go) which is incorrect because the dispatch callback doesn't run on a cache hit; change the cache plumbing so responseCache.HandleInternalRequest / InternalHandleResult carries providerType, providerName, failoverModel and usedFallback alongside Body/CacheType, and when result.CacheType != "" use those fields instead of deriving from workflow (i.e., replace the reconstructed cachedProviderType/cachedProviderName and usedFallback handling with the values returned on the InternalHandleResult), or alternatively persist those metadata into the cached entry at write time and read them out here before json.Unmarshal into cached. Ensure variable names providerType/providerName/failoverModel/usedFallback are used consistently.
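One way to carry the routing metadata through the cache, as the prompt proposes, is to widen the result type; everything beyond `Body` and `CacheType` below is an assumption about field names, not the repository's actual API:

```go
package main

import "fmt"

// InternalHandleResult is sketched with the extra routing metadata the review
// proposes; the ProviderType/ProviderName/FailoverModel/UsedFallback fields
// are hypothetical additions alongside the existing Body and CacheType.
type InternalHandleResult struct {
	Body          []byte
	CacheType     string
	ProviderType  string
	ProviderName  string
	FailoverModel string
	UsedFallback  bool
}

func main() {
	// On a cache hit (CacheType != ""), audit fields come from the stored
	// result rather than being reconstructed from the workflow config.
	result := InternalHandleResult{
		CacheType:    "semantic",
		ProviderType: "openai",
		ProviderName: "fallback-eu",
		UsedFallback: true,
	}
	if result.CacheType != "" {
		fmt.Println(result.ProviderName, result.UsedFallback)
	}
}
```

Storing these fields at cache-write time keeps the audit trail pointing at the route that actually produced the response.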
♻️ Duplicate comments (1)
internal/admin/dashboard/static/js/modules/audit-list.js (1)
88-115: ⚠️ Potential issue | 🟠 Major: Avoid treating successful `message`/`status` bodies as audit errors.
A 2xx entry with `response_body: { status: 'success', message: 'completed' }` still passes `hasTopLevelAuditErrorShape()` and is later surfaced as an error message. Require an error-like discriminator for message-only bodies.
🐛 Proposed fix
```diff
+function auditErrorDiscriminatorLooksLikeError(value) {
+  if (value == null) return false;
+  const numeric = Number(value);
+  if (Number.isFinite(numeric)) return numeric >= 400;
+  return /\b(error|fail|failed|failure|invalid|timeout|unavailable|denied|unauthorized|forbidden)\b/i
+    .test(String(value));
+}
+
 function hasTopLevelAuditErrorShape(value) {
   if (value == null) return false;
   let candidate = value;
   if (typeof candidate === 'string') {
@@
   if (Array.isArray(candidate) || typeof candidate !== 'object') return false;
   if (candidate.error !== undefined) return true;
-  if (typeof candidate.message === 'string' && candidate.message.trim()) return true;
+  if (typeof candidate.message === 'string' && candidate.message.trim()) {
+    return auditErrorDiscriminatorLooksLikeError(candidate.status_code) ||
+      auditErrorDiscriminatorLooksLikeError(candidate.status) ||
+      auditErrorDiscriminatorLooksLikeError(candidate.code) ||
+      auditErrorDiscriminatorLooksLikeError(candidate.type);
+  }
   const topLevelErrorFields = ['detail', 'error_message', 'error_msg', 'title'];
   for (let i = 0; i < topLevelErrorFields.length; i++) {
     const field = topLevelErrorFields[i];
     if (typeof candidate[field] === 'string' && candidate[field].trim()) return true;
```
Verify each finding against the current code and only fix it if needed. In `@internal/admin/dashboard/static/js/modules/audit-list.js` around lines 88 - 115, The code currently calls hasTopLevelAuditErrorShape even for entries with a parsed status code (so 2xx responses like {status:'success',message:'completed'} get treated as errors); update shouldInspectAuditResponseBody to short-circuit when auditEntryStatusCode(entry, data) returns a non-null status: if statusCode !== null return statusCode >= 400, else fall back to hasTopLevelAuditErrorShape(data && data.response_body). This keeps hasTopLevelAuditErrorShape as-is and prevents successful 2xx bodies from being flagged as errors.
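The short-circuit the prompt describes can be sketched as below; `auditEntryStatusCode` and the simplified `hasTopLevelAuditErrorShape` are illustrative stand-ins for the module's real helpers:

```javascript
// Hypothetical sketch: prefer a parsed HTTP status over shape heuristics.
function auditEntryStatusCode(entry, data) {
  const raw = (entry && entry.status_code) ?? (data && data.status_code);
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

// Simplified stand-in for the real shape heuristic.
function hasTopLevelAuditErrorShape(body) {
  if (body == null) return false;
  if (body.error !== undefined) return true;
  return typeof body.message === 'string' && body.message.trim() !== '';
}

function shouldInspectAuditResponseBody(entry, data) {
  const statusCode = auditEntryStatusCode(entry, data);
  if (statusCode !== null) return statusCode >= 400; // 2xx is never an error
  return hasTopLevelAuditErrorShape(data && data.response_body);
}

// A 2xx success body is no longer flagged as an error:
console.log(shouldInspectAuditResponseBody(
  { status_code: 200 },
  { response_body: { status: 'success', message: 'completed' } },
)); // false
```

The heuristic only runs when no status code is available, so it stays a last resort rather than the primary signal.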
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/admin/dashboard/static/js/modules/dashboard-layout.test.js`:
- Around line 301-324: The current assertion uses
assert.doesNotMatch(authKeyForm, /id="auth-key-user-path"[^>]*aria-label=/)
which fails when aria-label appears before id; change it to first extract the
input tag for the user path from authKeyForm (use a regex that matches the
element containing id="auth-key-user-path", e.g. match
/<[^>]*id="auth-key-user-path"[^>]*>/) into a variable (e.g. userPathInput or
similar) and then run assert.doesNotMatch on that extracted tag to check it does
not contain aria-label, updating the test around authKeyForm and
userPathInputIndex to reference the new variable.
In `@internal/gateway/batch_usage.go`:
- Around line 312-314: The guard in the MaxInt64 rounding check (intBits, v,
maxInt) contains a redundant conjunct `v > 0` because when `v == maxInt` (where
maxInt == float64(math.MaxInt64)) it's always positive; remove the `&& v > 0`
portion so the line becomes `if intBits == 64 && v == maxInt { return 0, false
}`, or if you intended to also reject the negative boundary, replace the
condition with an explicit check (e.g., check for `v == minInt` as well) and
document that intent.
---
Outside diff comments:
In `@internal/server/internal_chat_completion_executor.go`:
- Around line 148-173: The cache hit handling currently reconstructs
provider/fallback info from workflow (in internal_chat_completion_executor.go)
which is incorrect because the dispatch callback doesn't run on a cache hit;
change the cache plumbing so responseCache.HandleInternalRequest /
InternalHandleResult carries providerType, providerName, failoverModel and
usedFallback alongside Body/CacheType, and when result.CacheType != "" use those
fields instead of deriving from workflow (i.e., replace the reconstructed
cachedProviderType/cachedProviderName and usedFallback handling with the values
returned on the InternalHandleResult), or alternatively persist those metadata
into the cached entry at write time and read them out here before json.Unmarshal
into cached. Ensure variable names
providerType/providerName/failoverModel/usedFallback are used consistently.
---
Duplicate comments:
In `@internal/admin/dashboard/static/js/modules/audit-list.js`:
- Around line 88-115: The code currently calls hasTopLevelAuditErrorShape even
for entries with a parsed status code (so 2xx responses like
{status:'success',message:'completed'} get treated as errors); update
shouldInspectAuditResponseBody to short-circuit when auditEntryStatusCode(entry,
data) returns a non-null status: if statusCode !== null return statusCode >=
400, else fall back to hasTopLevelAuditErrorShape(data && data.response_body).
This keeps hasTopLevelAuditErrorShape as-is and prevents successful 2xx bodies
from being flagged as errors.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ebed7a93-b3f8-45f9-9b9a-af83a1ba814c
📒 Files selected for processing (7)
- internal/admin/dashboard/static/css/dashboard.css
- internal/admin/dashboard/static/js/modules/audit-list.js
- internal/admin/dashboard/static/js/modules/audit-list.test.js
- internal/admin/dashboard/static/js/modules/dashboard-layout.test.js
- internal/admin/dashboard/templates/index.html
- internal/gateway/batch_usage.go
- internal/server/internal_chat_completion_executor.go
```js
const userPathIndex = authKeyForm.indexOf('<label for="auth-key-user-path">User Path (optional)</label>');
const helperIndex = authKeyForm.indexOf("{{template \"helper-disclosure\" \"{ heading: ''");
const userPathInputIndex = authKeyForm.indexOf('id="auth-key-user-path"');
const descriptionIndex = authKeyForm.indexOf('<span>Description (optional)</span>');

assert.match(indexTemplate, /x-text="key\.expires_at \? formatDateUTC\(key\.expires_at\) : '\\u2014'"/);
assert.match(indexTemplate, /:title="key\.expires_at \? formatTimestampUTC\(key\.expires_at\) : ''"/);
assert.match(indexTemplate, /x-model="authKeyForm\.user_path"[^>]*aria-label="API key user path"/);
assert.match(indexTemplate, /id="auth-key-user-path"[^>]*x-model="authKeyForm\.user_path"[^>]*aria-describedby="auth-key-user-path-help-copy"/);
assert.match(indexTemplate, /class="model-alias-editor auth-key-editor"/);
assert.match(plusIconTemplate, /{{define "plus-icon"}}[\s\S]*<path d="M12 5v14"><\/path>[\s\S]*<path d="M5 12h14"><\/path>[\s\S]*{{end}}/);
assert.match(indexTemplate, /class="pagination-btn pagination-btn-primary pagination-btn-with-icon"[\s\S]*{{template "plus-icon"}}[\s\S]*<span>Create API Key<\/span>/);
assert.match(authKeyForm, /class="pagination-btn pagination-btn-primary pagination-btn-with-icon"[\s\S]*x-show="!authKeyFormSubmitting"[\s\S]*{{template "plus-icon"}}[\s\S]*x-text="authKeyFormSubmitting \? 'Creating\.\.\.' : 'Create API Key'"/);
assert.notEqual(userPathIndex, -1);
assert.notEqual(helperIndex, -1);
assert.notEqual(userPathInputIndex, -1);
assert.notEqual(descriptionIndex, -1);
assert.ok(userPathIndex < helperIndex);
assert.ok(helperIndex < descriptionIndex);
assert.ok(helperIndex < userPathInputIndex);
assert.ok(userPathInputIndex < descriptionIndex);
assert.match(authKeyForm, /class="alias-form-label-with-help"[\s\S]*<label for="auth-key-user-path">User Path \(optional\)<\/label>[\s\S]*class="auth-key-inline-help"[\s\S]*copyId: 'auth-key-user-path-help-copy'/);
assert.match(authKeyForm, /placeholder="ex\. \/department1\/team-a"/);
assert.match(authKeyForm, /copyId: 'auth-key-user-path-help-copy'/);
assert.match(authKeyForm, /When set, this key overrides X-GoModel-User-Path for audit logging and downstream request context\./);
assert.doesNotMatch(authKeyForm, /id="auth-key-user-path"[^>]*aria-label=/);
```
Make the aria-label regression assertion attribute-order independent.
Line 324 only fails when aria-label appears after id. Extract the input tag first, then assert against that tag so an accessibility regression is caught regardless of attribute order.
🧪 Proposed test hardening
```diff
 const authKeyForm = authKeyFormMatch[0];
 const userPathIndex = authKeyForm.indexOf('<label for="auth-key-user-path">User Path (optional)</label>');
 const helperIndex = authKeyForm.indexOf("{{template \"helper-disclosure\" \"{ heading: ''");
-const userPathInputIndex = authKeyForm.indexOf('id="auth-key-user-path"');
+const userPathInputMatch = authKeyForm.match(/<input\b(?=[^>]*\bid="auth-key-user-path")[^>]*>/);
+assert.ok(userPathInputMatch, 'Expected auth key user path input');
+const userPathInput = userPathInputMatch[0];
+const userPathInputIndex = userPathInputMatch.index;
 const descriptionIndex = authKeyForm.indexOf('<span>Description (optional)</span>');
@@
-assert.match(indexTemplate, /id="auth-key-user-path"[^>]*x-model="authKeyForm\.user_path"[^>]*aria-describedby="auth-key-user-path-help-copy"/);
+assert.match(userPathInput, /\bx-model="authKeyForm\.user_path"/);
+assert.match(userPathInput, /\baria-describedby="auth-key-user-path-help-copy"/);
@@
-assert.doesNotMatch(authKeyForm, /id="auth-key-user-path"[^>]*aria-label=/);
+assert.doesNotMatch(userPathInput, /\baria-label=/);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/admin/dashboard/static/js/modules/dashboard-layout.test.js` around
lines 301 - 324, The current assertion uses assert.doesNotMatch(authKeyForm,
/id="auth-key-user-path"[^>]*aria-label=/) which fails when aria-label appears
before id; change it to first extract the input tag for the user path from
authKeyForm (use a regex that matches the element containing
id="auth-key-user-path", e.g. match /<[^>]*id="auth-key-user-path"[^>]*>/) into
a variable (e.g. userPathInput or similar) and then run assert.doesNotMatch on
that extracted tag to check it does not contain aria-label, updating the test
around authKeyForm and userPathInputIndex to reference the new variable.
Summary
- Move inference and native batch orchestration into `internal/gateway` so HTTP handlers only adapt Echo requests/responses.
- Route batch handling through `gateway.BatchOrchestrator` and remove server-side business-logic duplication.

Validation
- `go test ./internal/core ./internal/server`
- `go test ./internal/gateway ./internal/server`
- `make perf-check`
- `make test-race`
- `go test -v -tags=e2e -timeout=5m ./tests/e2e/...`
- `go test -v -tags=contract -timeout=5m ./tests/contract/...`
- `make lint`

Notes
Summary by CodeRabbit
New Features
Improvements
Tests