feat(hook): move hooks to direct func call (#35)
📝 Walkthrough

Replaces the legacy HookContext/HookManager hook framework with a new async RequestContext-based system; handlers now call standalone async hooks (authorization, rate_limit, observability), request-scoped storage moves into RequestContext extensions, and the old hooks2/metric modules are removed.
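The request-scoped storage mentioned above can be pictured as a type-keyed map. The sketch below is a minimal std-only illustration of that idea, not the project's actual `RequestContext` (which guards its extensions behind an async lock); all names are illustrative:

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Minimal type-keyed extensions map, similar in spirit to what a
// RequestContext carries for request-scoped values.
#[derive(Default)]
struct Extensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    fn insert<T: Any>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }
    fn get<T: Any>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

// Example request-scoped values a hook might store (illustrative names).
struct RequestModel(String);
struct TokenUsage {
    total_tokens: u64,
}

fn main() {
    let mut ext = Extensions::default();
    ext.insert(RequestModel("gpt-x".to_string()));
    ext.insert(TokenUsage { total_tokens: 128 });

    assert_eq!(ext.get::<RequestModel>().unwrap().0, "gpt-x");
    assert_eq!(ext.get::<TokenUsage>().unwrap().total_tokens, 128);
    assert!(ext.get::<u32>().is_none());
}
```

Each hook reads what a previous stage inserted (authorization inserts the model, streaming inserts per-chunk usage), which is why the diagrams below pass `&mut ReqCtx` through every step.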
Sequence Diagrams

```mermaid
sequenceDiagram
    participant Client as Client
    participant Handler as Handler (chat_completions / embeddings)
    participant ReqCtx as RequestContext
    participant Auth as hooks::authorization
    participant RLPre as hooks::rate_limit::pre_check
    participant Provider as Model Provider
    participant RLPost as hooks::rate_limit::post_check
    participant Obs as hooks::observability
    Client->>Handler: HTTP request
    Handler->>ReqCtx: construct/obtain RequestContext
    Handler->>Obs: record_start_time(&mut ReqCtx)
    Handler->>Auth: authorization::check(&mut ReqCtx)
    Auth-->>Handler: ok
    Handler->>RLPre: rate_limit::pre_check(&mut ReqCtx)
    RLPre-->>Handler: ok
    Handler->>Provider: invoke model/provider
    Provider-->>Handler: response
    Handler->>RLPost: rate_limit::post_check(&mut ReqCtx, response)
    Handler->>Obs: record_usage(&mut ReqCtx, response)
    Handler-->>Client: HTTP response
```
```mermaid
sequenceDiagram
    participant Client as Client
    participant Handler as Handler (streaming)
    participant ReqCtx as RequestContext (cloned)
    participant Auth as hooks::authorization
    participant RLPre as hooks::rate_limit::pre_check
    participant Provider as Provider (stream)
    participant RLStream as hooks::rate_limit::post_check_streaming
    participant Obs as hooks::observability
    Client->>Handler: streaming request
    Handler->>ReqCtx: construct/clone RequestContext
    Handler->>Auth: authorization::check(&mut ReqCtx)
    Handler->>RLPre: rate_limit::pre_check(&mut ReqCtx)
    Handler->>Provider: start stream
    Provider-->>Handler: first chunk
    Handler->>Obs: record_first_token_latency(&mut ReqCtx)
    Note right of Handler: per-chunk TokenUsage inserted into ReqCtx extensions when present
    Provider-->>Handler: stream completes / [DONE]
    Handler->>RLStream: rate_limit::post_check_streaming(&mut ReqCtx)
    Handler->>Obs: record_streaming_usage(&mut ReqCtx)
    Handler-->>Client: streaming end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/proxy/handlers/chat_completions/mod.rs (1)
157-160: ⚠️ Potential issue | 🟠 Major — Finalize stream accounting before aborting on provider errors.

When the upstream stream yields `Some(Err(_))`, this branch closes the SSE immediately. Any usage already accumulated in `request_ctx` is then lost, so failed streams are neither rate-accounted nor measured. Run the same finalization used by the `done` branch before returning `None`.

Suggested fix:

```diff
 Some(Err(err)) => {
     error!("Stream error: {}", err);
+    if let Err(err) =
+        hooks::rate_limit::post_check_streaming(&mut request_ctx).await
+    {
+        error!("Rate limit post_check_streaming error: {}", err);
+    }
+    hooks::observability::record_streaming_usage(&mut request_ctx).await;
     drop(span);
     None
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/handlers/chat_completions/mod.rs` around lines 157 - 160, When handling Some(Err(err)) you currently log and return None without finalizing request usage; replicate the same finalization performed in the `done` branch before returning to ensure usage/rate accounting and metrics are recorded. Modify the Some(Err(err)) branch (the block that currently does error!("Stream error: {}", err); drop(span); None) to invoke the same finalization logic used by the `done` branch for `request_ctx` (or extract that finalization into a helper and call it here), then drop the span and return None.
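The shared-finalization idea in the prompt above can be sketched with plain std types. `RequestCtx`, `StreamEvent`, `finalize`, and `drive_stream` below are illustrative stand-ins for the real handler and hooks, assuming only that both exit paths must run the same accounting:

```rust
// Std-only sketch: the normal-completion branch and the error branch both
// funnel through one `finalize` helper, so usage accounting is never skipped.
#[derive(Default, Debug)]
struct RequestCtx {
    total_tokens: u64,
    finalized: bool,
}

enum StreamEvent {
    Chunk { tokens: u64 },
    Done,
    Err(String),
}

fn finalize(ctx: &mut RequestCtx) {
    // Stand-in for rate_limit::post_check_streaming +
    // observability::record_streaming_usage.
    ctx.finalized = true;
}

fn drive_stream(events: Vec<StreamEvent>) -> RequestCtx {
    let mut ctx = RequestCtx::default();
    for ev in events {
        match ev {
            StreamEvent::Chunk { tokens } => ctx.total_tokens += tokens,
            StreamEvent::Done => {
                finalize(&mut ctx); // normal completion path
                break;
            }
            StreamEvent::Err(e) => {
                eprintln!("Stream error: {e}");
                finalize(&mut ctx); // the fix: same finalization on error
                break;
            }
        }
    }
    ctx
}

fn main() {
    let ctx = drive_stream(vec![
        StreamEvent::Chunk { tokens: 7 },
        StreamEvent::Err("upstream reset".into()),
    ]);
    assert!(ctx.finalized);
    assert_eq!(ctx.total_tokens, 7);
}
```

Extracting the helper also keeps the two branches from drifting apart as more finalization steps are added.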
🧹 Nitpick comments (7)
src/proxy/hooks/authorization/mod.rs (1)
87-88: Consider combining consecutive lock acquisitions.

Two separate `extensions_mut().await` calls acquire the write lock twice in succession. This could be combined into a single lock acquisition for slightly better performance.

♻️ Proposed optimization:

```diff
-ctx.extensions_mut().await.insert(model);
-ctx.extensions_mut().await.insert(RequestModel(model_name));
+{
+    let mut ext = ctx.extensions_mut().await;
+    ext.insert(model);
+    ext.insert(RequestModel(model_name));
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/authorization/mod.rs` around lines 87 - 88, The two consecutive write-lock acquisitions on ctx.extensions_mut().await should be merged: obtain the mutable extensions once (e.g., let mut extensions = ctx.extensions_mut().await) and then call extensions.insert(model) followed by extensions.insert(RequestModel(model_name)); update the code around the ctx.extensions_mut().await calls so both insertions use the single mutable reference to avoid duplicate await/lock overhead.

src/proxy/handlers/embeddings/mod.rs (2)
23-30: Missing `#[fastrace::trace]` decorator.

Per coding guidelines, public functions in `src/**/*.rs` should use `#[fastrace::trace]` for distributed tracing. The `embeddings` handler is missing this decorator.

♻️ Proposed fix:

```diff
+#[fastrace::trace]
 pub async fn embeddings(
     State(_state): State<AppState>,
     mut request_ctx: RequestContext,
```

As per coding guidelines: "Use #[fastrace::trace] decorator for distributed tracing on public functions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/handlers/embeddings/mod.rs` around lines 23 - 30, The public handler function embeddings is missing the required #[fastrace::trace] attribute; add the #[fastrace::trace] decorator immediately above the pub async fn embeddings(...) declaration so the function is instrumented for distributed tracing, ensuring the attribute appears before the function signature (no other code changes needed inside hooks::observability or the function body).
33-38: Consider handling the `unwrap()` more gracefully.

While the TODO acknowledges this, the `unwrap()` could panic if `authorization::check` fails to insert the `Model` for any reason. An explicit error return would be more defensive.

♻️ Proposed defensive approach:

```diff
-    //TODO: safe unwrap
-    let model = request_ctx
+    let model = request_ctx
         .extensions()
         .await
         .get::<ResourceEntry<Model>>()
         .cloned()
-        .unwrap();
+        .ok_or_else(|| {
+            error!("Model not found in context after authorization");
+            EmbeddingError::AuthorizationError(
+                crate::proxy::hooks::authorization::AuthorizationError::MissingApiKeyInContext
+            )
+        })?;
```

Alternatively, you could add a specific error variant for this case.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/handlers/embeddings/mod.rs` around lines 33 - 38, The code currently calls unwrap on request_ctx.extensions().await.get::<ResourceEntry<Model>>().cloned(), which can panic if the Model entry is missing; change this to a safe check that returns an explicit error instead of panicking — e.g., replace the unwrap with a match or ok_or/ok_or_else and return a typed error/Response (create or use an existing error variant) from the handler when the ResourceEntry<Model> is absent; reference the same symbols (request_ctx, extensions().await, ResourceEntry<Model>, and the surrounding handler function) and ensure the error path conveys that authorization::check failed to insert the Model.

src/proxy/handlers/models.rs (1)
52-57: The `expect()` here is redundant.

The `RequestContext::from_request_parts` implementation already uses `expect()` when extracting the `ApiKey` from request extensions (see `src/proxy/hooks/context.rs:30-32`). If the ApiKey were missing, extraction would have panicked before reaching this handler. Consider using `unwrap()` with a comment referencing the invariant, or simply trust the prior extraction.

♻️ Suggested simplification:

```diff
 let api_key = request_ctx
     .extensions()
     .await
     .get::<ResourceEntry<ApiKey>>()
     .cloned()
-    .expect("apikey should exist in context");
+    .expect("ApiKey guaranteed by RequestContext extraction");
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/handlers/models.rs` around lines 52 - 57, The call using expect() when retrieving api_key in models.rs is redundant because RequestContext::from_request_parts already guarantees an ApiKey is present; replace the expect("apikey should exist in context") with a simple unwrap() (or remove the explicit panic and call .cloned().unwrap()) and add a short comment referencing that RequestContext::from_request_parts enforces this invariant (see RequestContext::from_request_parts) so future readers know the absence cannot occur.

src/proxy/hooks/context.rs (1)
22-40: Consider a more informative rejection type.

Using `type Rejection = ()` means extraction failures produce an opaque 500 error. While the `expect()` guarantees ApiKey presence (assuming auth middleware runs), a custom rejection with `IntoResponse` would provide clearer error handling if the invariant is ever violated.

This is acceptable for now since the auth middleware guarantees ApiKey presence, but if the middleware chain changes in the future, the empty rejection provides no diagnostic information.
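A std-only sketch of the shape such a rejection type could take; in the real fix the `status`/`body` pair would instead be an `impl axum::response::IntoResponse`, and `RequestContextRejection` is an assumed name:

```rust
// Illustrative rejection type: each variant carries enough context to
// produce a diagnostic HTTP response instead of an opaque 500.
#[derive(Debug)]
enum RequestContextRejection {
    MissingApiKey,
}

impl RequestContextRejection {
    // Stand-in for IntoResponse: map the variant to a status code.
    fn status(&self) -> u16 {
        match self {
            RequestContextRejection::MissingApiKey => 500,
        }
    }
    // Stand-in for the response body.
    fn body(&self) -> &'static str {
        match self {
            RequestContextRejection::MissingApiKey => {
                "ApiKey missing from request extensions; auth middleware did not run?"
            }
        }
    }
}

fn main() {
    let rej = RequestContextRejection::MissingApiKey;
    assert_eq!(rej.status(), 500);
    assert!(rej.body().contains("ApiKey"));
}
```

The payoff is purely diagnostic: the status stays a 500, but the body now says which invariant broke.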
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/context.rs` around lines 22 - 40, The extractor impl for FromRequestParts<AppState> for RequestContext uses type Rejection = () and an expect() on parts.extensions.remove::<ResourceEntry<ApiKey>>(), which yields opaque 500s on failure; replace the unit rejection with a small custom rejection type (e.g., RequestContextRejection) that implements IntoResponse and conveys a clear error message/status, change from_request_parts to return Err(RequestContextRejection::MissingApiKey) when the ResourceEntry<ApiKey> is absent (avoid expect), and reference the FromRequestParts<AppState> impl, the from_request_parts method, and ResourceEntry<ApiKey> when making the change so the extractor produces informative responses on extraction failure.

src/proxy/hooks/rate_limit/mod.rs (1)
24-28: Let `RateLimitHookError` own its Axum response conversion.

The handlers currently have to unpack `RateLimitHookError::Raw(resp)` manually just to return the response. Implementing `IntoResponse` here keeps the HTTP mapping at the error-type boundary and matches the repo's proxy error guideline.

As per coding guidelines, `src/{proxy,admin}/**/*.rs`: Implement `IntoResponse` for error types to support Axum HTTP handlers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/rate_limit/mod.rs` around lines 24 - 28, Implement IntoResponse for RateLimitHookError so handlers can return the error directly; match the Raw(Response<Body>) variant and return its inner Response as the HTTP response. Concretely, add an impl axum::response::IntoResponse for RateLimitHookError that matches self { RateLimitHookError::Raw(resp) => resp.into_response() } (ensure you import axum::response::IntoResponse and use the existing Response<Body> type).

src/proxy/hooks/observability/mod.rs (1)
8-9: Tighten this module's public API surface.

`StartTime` looks like an internal extension marker, so making it `pub` unnecessarily widens the hooks API. While touching these exports, the public helpers should also get `///` docs and `#[fastrace::trace]` so the new observability path stays documented and visible in traces.

As per coding guidelines, `src/**/*.rs`: Use `#[fastrace::trace]` decorator for distributed tracing on public functions, and `**/*.rs`: Use `///` for doc comments on public items in Rust.

Also applies to: 59-79
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/observability/mod.rs` around lines 8 - 9, StartTime is an internal marker and should not be exported: remove the pub from the StartTime tuple struct (change `pub struct StartTime(Instant);` to `struct StartTime(Instant)`), and for the module's public helper functions (the functions currently exported around the 59–79 region) add triple-slash doc comments describing their purpose and annotate each with `#[fastrace::trace]` to enable distributed tracing; ensure signatures remain the same and only visibility, docs, and the `#[fastrace::trace]` attribute are changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/proxy/handlers/chat_completions/mod.rs`:
- Around line 73-83: The success (non-streaming) completion path currently never
calls hooks::observability::record_usage, so add a call to
hooks::observability::record_usage(request_ctx,
&ResponseData::ChatCompletion(response.clone()), start_time). Locate the
non-streaming branch that builds resp = Json(response).into_response() and,
after injecting headers with hooks::rate_limit::inject_response_headers(...) and
before returning the response, await hooks::observability::record_usage with the
same request_ctx, the cloned ResponseData::ChatCompletion(response.clone()), and
the previously saved start_time (the same value used for chat_completions) so
latency/token metrics are emitted like the streaming path.
In `@src/proxy/hooks/rate_limit/mod.rs`:
- Around line 43-52: The code holds the extensions guard returned by
ctx.extensions_mut().await across .await points (e.g., in run_post_check calling
apply_post_check and similarly in the pre/concurrency check blocks), which can
deadlock request-local state; fix by reading needed resources (api_key, model)
and releasing the guard before calling async check functions, then reacquire
ctx.extensions_mut().await only to mutate RateLimitState or ConcurrencyState
with the results; specifically change run_post_check (and the blocks around
apply_pre_check and apply_concurrency_check) to compute check results outside of
the guard, drop the guard, await the check futures, then re-lock to update
RateLimitState / ConcurrencyState.
- Around line 174-180: The post_check_streaming function must not treat missing
TokenUsage as zero; update the TokenUsage retrieval in post_check_streaming to
detect when ctx.extensions().await.get::<TokenUsage>() is None, log an error via
the request context logger (e.g. ctx.logger or similar) explaining that
TokenUsage is missing for the streaming request, and return an Err (or early
return) instead of using 0 so token-based post-checks still run or upstream
logic can handle the unknown usage; modify the branch around TokenUsage lookup
in post_check_streaming (and any downstream uses of total_tokens) to avoid
silently defaulting to 0.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 936007f2-ab77-4963-9a72-ae9c602416c5
📒 Files selected for processing (14)
- src/proxy/handlers/chat_completions/mod.rs
- src/proxy/handlers/chat_completions/types.rs
- src/proxy/handlers/embeddings/mod.rs
- src/proxy/handlers/embeddings/types.rs
- src/proxy/handlers/models.rs
- src/proxy/hooks/authorization/mod.rs
- src/proxy/hooks/context.rs
- src/proxy/hooks/metric.rs
- src/proxy/hooks/mod.rs
- src/proxy/hooks/observability/mod.rs
- src/proxy/hooks/rate_limit/mod.rs
- src/proxy/hooks2/context.rs
- src/proxy/hooks2/mod.rs
- src/proxy/mod.rs
💤 Files with no reviewable changes (3)
- src/proxy/hooks/metric.rs
- src/proxy/hooks2/context.rs
- src/proxy/hooks2/mod.rs
♻️ Duplicate comments (1)
src/proxy/hooks/rate_limit/mod.rs (1)
176-189: ⚠️ Potential issue | 🟠 Major — Silently defaulting to zero tokens when `TokenUsage` is missing.

When `TokenUsage` is absent (common for providers that don't emit usage in streaming chunks), defaulting to `0` bypasses token-based post-checks entirely. This could allow requests to exceed rate limits without detection. Consider logging a warning or returning an error when usage data is unavailable for streaming requests.

🛡️ Proposed fix to log a warning:

```diff
 pub async fn post_check_streaming(ctx: &mut RequestContext) -> Result<()> {
     let total_tokens = {
         match ctx.extensions().await.get::<TokenUsage>() {
             Some(usage) => usage.total_tokens,
-            None => 0,
+            None => {
+                error!("TokenUsage missing for streaming request; token-based rate limiting skipped");
+                0
+            }
         }
     };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/rate_limit/mod.rs` around lines 176 - 189, post_check_streaming currently treats missing TokenUsage as zero which can bypass rate checks; update post_check_streaming to detect when ctx.extensions().await.get::<TokenUsage>() is None and either log a warning via the request logger on RequestContext (include request id/ctx info) or return an Err to surface missing usage, then only call run_post_check(ctx, total_tokens) when usage is present; reference TokenUsage, post_check_streaming, run_post_check and RequestContext when making the change.
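A minimal sketch of surfacing missing usage instead of silently defaulting to zero; `TokenUsage` mirrors the real extension type, while `post_check_streaming` here is a simplified stand-in that returns the token count it would check:

```rust
// Illustrative: absence of usage data becomes an explicit error the caller
// must handle, rather than a silent 0 that skips token accounting.
#[derive(Clone, Copy)]
struct TokenUsage {
    total_tokens: u64,
}

fn post_check_streaming(usage: Option<TokenUsage>) -> Result<u64, String> {
    match usage {
        Some(u) => Ok(u.total_tokens),
        None => Err("TokenUsage missing for streaming request".to_string()),
    }
}

fn main() {
    assert_eq!(
        post_check_streaming(Some(TokenUsage { total_tokens: 42 })),
        Ok(42)
    );
    assert!(post_check_streaming(None).is_err());
}
```

Whether the caller treats the error as fatal or just logs it is a policy choice, but either way the gap in accounting is now visible.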
🧹 Nitpick comments (5)
src/proxy/hooks/rate_limit/mod.rs (4)
191-205: Missing `#[fastrace::trace]` decorator on `inject_response_headers`.

Per coding guidelines, this public function should have the tracing decorator.

♻️ Proposed fix:

```diff
+#[fastrace::trace]
 pub async fn inject_response_headers(
```

As per coding guidelines: "Use #[fastrace::trace] decorator for distributed tracing on public functions"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/rate_limit/mod.rs` around lines 191 - 205, Add the #[fastrace::trace] attribute above the public async function inject_response_headers to enable distributed tracing; locate the inject_response_headers definition and insert the decorator on the line immediately preceding the "pub async fn inject_response_headers(...)" declaration (ensure the crate feature/imports for fastrace are available in the module if not already).
167-174: Missing `#[fastrace::trace]` decorator on `post_check`.

Per coding guidelines, this public function should have the tracing decorator.

♻️ Proposed fix:

```diff
+#[fastrace::trace]
 pub async fn post_check(ctx: &mut RequestContext, response: &ResponseData) -> Result<()> {
```

As per coding guidelines: "Use #[fastrace::trace] decorator for distributed tracing on public functions"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/rate_limit/mod.rs` around lines 167 - 174, Add the #[fastrace::trace] attribute to the public async function post_check so it participates in distributed tracing; locate the pub async fn post_check(ctx: &mut RequestContext, response: &ResponseData) -> Result<()> declaration and add the decorator directly above it (leaving the function body and the call to run_post_check(ctx, usage.total_tokens).await unchanged) to comply with the tracing guideline.
79-165: Missing `#[fastrace::trace]` decorator on public function.

Per coding guidelines, public functions should have the `#[fastrace::trace]` decorator for distributed tracing.

♻️ Proposed fix:

```diff
+#[fastrace::trace]
 pub async fn pre_check(ctx: &mut RequestContext) -> Result<(), RateLimitError> {
```

As per coding guidelines: "Use #[fastrace::trace] decorator for distributed tracing on public functions"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/rate_limit/mod.rs` around lines 79 - 165, Add the #[fastrace::trace] attribute directly above the public function declaration for pre_check to enable distributed tracing (i.e., annotate pub async fn pre_check(...) with #[fastrace::trace]); ensure the fastrace crate is available in scope (add the dependency/import if missing) and then run format/lint to confirm no attribute ordering or import issues.
39-50: Consider returning `Result` instead of panicking on missing resources.

Using `.expect()` will panic if resources are missing. While this may be by design (assuming prior middleware guarantees their presence), returning a `Result` would make the contract explicit and avoid runtime panics if upstream logic changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/hooks/rate_limit/mod.rs` around lines 39 - 50, The get_resources function should not panic with expect; change its signature from async fn get_resources(ctx: &RequestContext) -> (ResourceEntry<ApiKey>, ResourceEntry<Model>) to return a Result: async fn get_resources(ctx: &RequestContext) -> Result<(ResourceEntry<ApiKey>, ResourceEntry<Model>), E> (use an existing error type or define a small error like RateLimitError::MissingResource or anyhow::Error), then replace the .expect() calls on guard.get::<ResourceEntry<ApiKey>>() and guard.get::<ResourceEntry<Model>>() with ok_or/ok_or_else to return Err with a clear message when a resource is missing; finally update callers of get_resources to propagate the error with ? or handle it accordingly.

src/proxy/handlers/embeddings/mod.rs (1)
33-39: Address the TODO: `.unwrap()` on missing `Model` can panic.

The `unwrap()` will panic if `ResourceEntry<Model>` is missing from extensions. Consider returning an error instead of panicking, which would provide clearer failure semantics and avoid runtime panics.

♻️ Proposed fix:

```diff
 let model = request_ctx
     .extensions()
     .await
     .get::<ResourceEntry<Model>>()
     .cloned()
-    .unwrap();
+    .ok_or_else(|| EmbeddingError::Internal("Model not found in request context".into()))?;
```

This assumes `EmbeddingError` has an `Internal` variant; adjust based on the actual error enum definition in `types.rs`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/proxy/handlers/embeddings/mod.rs` around lines 33 - 39, The code currently unwraps the model from request_ctx.extensions().await.get::<ResourceEntry<Model>>().cloned().unwrap(), which can panic if the extension is missing; change this to handle the missing case by returning a proper error (e.g., map the None into an Err(EmbeddingError::Internal("missing Model in extensions") or another appropriate EmbeddingError variant) and propagate it out of the surrounding function instead of panicking, ensuring any subsequent uses of model operate on the safely extracted value.
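The `ok_or_else` pattern proposed for this handler, reduced to a bare `Option`; the `EmbeddingError::ModelMissing` variant below is an assumption, as the comment itself notes the real variant must be checked against `types.rs`:

```rust
// Illustrative: a missing extension becomes a typed error the handler
// can propagate with `?` instead of a panic.
#[derive(Debug, PartialEq)]
enum EmbeddingError {
    ModelMissing,
}

fn get_model(ext: Option<&str>) -> Result<String, EmbeddingError> {
    // Stand-in for extensions().await.get::<ResourceEntry<Model>>().cloned()
    ext.map(str::to_string).ok_or(EmbeddingError::ModelMissing)
}

fn main() {
    assert_eq!(get_model(Some("text-embed")), Ok("text-embed".to_string()));
    assert_eq!(get_model(None), Err(EmbeddingError::ModelMissing));
}
```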
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ce8bb26a-5bc9-4b1b-ab37-125b41a441f3
📒 Files selected for processing (7)
- src/proxy/handlers/chat_completions/mod.rs
- src/proxy/handlers/chat_completions/types.rs
- src/proxy/handlers/embeddings/mod.rs
- src/proxy/handlers/embeddings/types.rs
- src/proxy/hooks/authorization/mod.rs
- src/proxy/hooks/observability/mod.rs
- src/proxy/hooks/rate_limit/mod.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- src/proxy/hooks/authorization/mod.rs
- src/proxy/handlers/embeddings/types.rs
- src/proxy/handlers/chat_completions/types.rs
- src/proxy/hooks/observability/mod.rs
- src/proxy/handlers/chat_completions/mod.rs
TODO: The rate limit module has been slightly refactored from its previous implementation, but it remains tightly coupled. This work will continue in future pull requests. |
Second part of #34