feat(guardrail): integrate into chat handler#104

Merged
bzp2010 merged 2 commits into main from bzp/feat-guardrail-in-chat on May 12, 2026

Conversation

bzp2010 (Collaborator) commented May 12, 2026

This PR includes only non-streaming output; patches for streaming output will be added in a subsequent PR.

Summary by CodeRabbit

  • New Features
    • Added guardrails across chat endpoints (/chat, /messages, /responses) for input/output validation, automated rewriting, and blocking of unsafe outputs; model-aware enforcement and rewrites preserve request/response shape.
  • Improvements
    • Rewrote message handling flows so request and response messages can be round-tripped through guardrail checks with consistent replay handling.
  • Tests
    • Added unit tests covering conversions, stage mismatches, and runtime check behavior.

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b41e7d9a-bc70-4d3e-994f-669453c3b432

📥 Commits

Reviewing files that changed from the base of the PR and between eb99781 and 5b3bbfe.

📒 Files selected for processing (1)
  • tests/package.json
✅ Files skipped from review due to trivial changes (1)
  • tests/package.json

📝 Walkthrough

This PR adds guardrail request/response validation to the proxy's chat-completions, messages, and responses handlers. It introduces a bridge layer for message format conversion, extends the format handler with guardrail lifecycle hooks, and implements those hooks across all three adapters with format-specific conversion logic.
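
To make the lifecycle described in the walkthrough concrete, here is a minimal sketch of the input side of the flow: build a payload, run each guardrail, then either apply a rewrite or block. Every name in it (`Message`, `Verdict`, `Guardrail`, `AdapterSketch`, `run_input_guardrails`, and the toy adapter and guardrails) is a hypothetical stand-in rather than the PR's actual API, and it is synchronous while the PR's helpers are described as async.

```rust
// Hypothetical, simplified stand-ins for the PR's types. None of these
// names or signatures are taken from the actual code.

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Assumed outcome of one guardrail check.
pub enum Verdict {
    Allow,
    Rewrite(Vec<Message>),
    Block(String),
}

/// A guardrail modelled as a synchronous check (an assumption for brevity).
pub trait Guardrail {
    fn check(&self, messages: &[Message]) -> Verdict;
}

/// Input-side hooks only; the output-side pair would mirror these.
pub trait AdapterSketch {
    type Request;
    /// Convert the format-specific request into guardrail messages.
    fn input_payload(req: &Self::Request) -> Vec<Message>;
    /// Write rewritten messages back, preserving the request shape.
    fn apply_input_rewrite(req: &mut Self::Request, messages: Vec<Message>) -> Result<(), String>;
}

/// Iterate guardrails: build the payload, check it, then rewrite or block.
pub fn run_input_guardrails<A: AdapterSketch>(
    req: &mut A::Request,
    guardrails: &[Box<dyn Guardrail>],
) -> Result<(), String> {
    for guardrail in guardrails {
        // Recompute the payload so later guardrails see earlier rewrites.
        let payload = A::input_payload(&*req);
        match guardrail.check(&payload) {
            Verdict::Allow => {}
            Verdict::Rewrite(messages) => A::apply_input_rewrite(&mut *req, messages)?,
            Verdict::Block(reason) => return Err(reason),
        }
    }
    Ok(())
}

// --- Toy adapter and guardrails used to exercise the flow ---

pub struct ChatRequest {
    pub messages: Vec<Message>,
}

pub struct ChatAdapter;

impl AdapterSketch for ChatAdapter {
    type Request = ChatRequest;

    fn input_payload(req: &ChatRequest) -> Vec<Message> {
        req.messages.clone()
    }

    fn apply_input_rewrite(req: &mut ChatRequest, messages: Vec<Message>) -> Result<(), String> {
        // Reject rewrites that would change the number of messages.
        if messages.len() != req.messages.len() {
            return Err("rewrite changed message count".to_string());
        }
        req.messages = messages;
        Ok(())
    }
}

/// Rewrites every occurrence of "secret" to "[redacted]".
pub struct Redact;

impl Guardrail for Redact {
    fn check(&self, messages: &[Message]) -> Verdict {
        if !messages.iter().any(|m| m.content.contains("secret")) {
            return Verdict::Allow;
        }
        let rewritten = messages
            .iter()
            .map(|m| Message {
                role: m.role.clone(),
                content: m.content.replace("secret", "[redacted]"),
            })
            .collect();
        Verdict::Rewrite(rewritten)
    }
}

/// Blocks any request mentioning "forbidden".
pub struct DenyForbidden;

impl Guardrail for DenyForbidden {
    fn check(&self, messages: &[Message]) -> Verdict {
        if messages.iter().any(|m| m.content.contains("forbidden")) {
            Verdict::Block("forbidden content".to_string())
        } else {
            Verdict::Allow
        }
    }
}
```

Calling `run_input_guardrails::<ChatAdapter>(&mut req, &rails)` with both toy guardrails redacts the request in place, or returns the block reason as an `Err` before any upstream call would be made.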

Changes

Guardrail Bridge and Handler Integration

| Layer | File(s) | Summary |
|---|---|---|
| Guardrail bridge types and message conversions | src/proxy/guardrails.rs | Error enums (GuardrailBridgeError, GuardrailExecutionError), ConfiguredGuardrailRuntime trait with GuardrailRuntimeHandle wrapper, bidirectional converters between ChatMessage and GuardrailMessage with tool-call and multipart-content support, payload builders for input/output guardrail payloads, model-to-guardrail resolution, and comprehensive unit tests. |
| Format handler guardrail lifecycle integration | src/proxy/handlers/format_handler.rs | FormatHandlerAdapter trait extended with four guardrail hooks (payload generation and rewrite application for input/output), request and response handler logic that resolves guardrails, applies input checks/rewrites before gateway call, and applies output checks/rewrites before response completion, with internal async helpers that iterate guardrails, compute payloads, run checks, dispatch rewrites, and block on validation errors. |
| Chat completions guardrail integration | src/proxy/handlers/chat_completions/mod.rs | ChatCompletionsAdapter implements four guardrail methods: converting request messages to input payload, rewriting request from guardrail results, converting response choice messages to output payload, rewriting response choices with message-count validation, and bridge_error helper for error conversion. |
| Messages adapter guardrail integration with format conversion | src/proxy/handlers/messages/mod.rs | MessagesAdapter implements guardrail hooks with full OpenAI-to-Anthropic message conversion: split system prompts from user messages, map ChatMessage to Anthropic SystemPrompt/AnthropicMessage structures with tool-call and multipart-content handling, convert response content blocks back to ChatMessage, validate roles, parse tool-call JSON arguments, handle image URLs, and include text-extraction and error-conversion helpers. |
| Responses adapter guardrail integration with message replay and rewriting | src/proxy/handlers/responses/mod.rs, src/proxy/handlers/responses/runtime.rs | ResponsesAdapter and runtime helpers implement guardrail hooks with message replay tracking: add replay_messages_len to lifecycle state, expose message conversion functions (request_input_messages, response_output_to_chat_messages), implement rewrite_request_from_messages (validates length, splits replay/current, derives new instructions/input) and rewrite_response_from_messages (rebuilds response output), with private conversion helpers for tool messages, role validation, and content transformation. |
| Module structure and visibility adjustments | src/proxy/handlers/mod.rs, src/proxy/mod.rs, tests/package.json | Declare new guardrails submodule, change format_handler from public to private with crate-level re-exports of FormatHandlerAdapter and format_handler, re-wire router endpoints (/v1/chat/completions, /v1/messages, /v1/responses) to use handlers::format_handler::<Adapter> directly, and update test package manager metadata. |
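
The Responses adapter summary says rewrite_request_from_messages validates the rewritten length and then splits replayed history from the current turn. A minimal sketch of that step could look like the following; the function name `split_rewritten`, its parameters, and the `Message` type are illustrative assumptions, and only the idea of a `replay_messages_len` boundary comes from the summary.

```rust
// Hypothetical stand-in for the chat message type used by the bridge.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Split guardrail-rewritten messages into (replayed history, current turn).
///
/// `original_len` is the number of messages sent to the guardrail and
/// `replay_messages_len` is how many of those came from replayed history.
pub fn split_rewritten(
    rewritten: Vec<Message>,
    replay_messages_len: usize,
    original_len: usize,
) -> Result<(Vec<Message>, Vec<Message>), String> {
    // A rewrite must not add or drop messages, otherwise the replay
    // boundary would no longer line up.
    if rewritten.len() != original_len {
        return Err(format!(
            "guardrail returned {} messages, expected {}",
            rewritten.len(),
            original_len
        ));
    }
    // Guard the boundary so `split_off` cannot panic.
    if replay_messages_len > rewritten.len() {
        return Err("replay length exceeds message count".to_string());
    }
    let mut replay = rewritten;
    // `split_off` keeps [0, at) in `replay` and returns [at, len).
    let current = replay.split_off(replay_messages_len);
    Ok((replay, current))
}
```

Checking the length before splitting keeps the failure mode an explicit error rather than a silently shifted boundary between replayed and current messages.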

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • api7/aisix#100: Implements guardrail-type abstractions and traits that this PR's bridge layer wraps and converts between.
  • api7/aisix#92: Introduces the FormatHandlerAdapter and chat/messages/responses handler structure that this PR extends with guardrail hooks.
  • api7/aisix#89: Establishes the Responses API handler and runtime helpers that this PR wires into the guardrail message-rewriting pipeline.

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Security Check | ❌ Error | Hardcoded AWS credentials in test code (AKIA123 access key, "secret" secret). Found in src/proxy/guardrails.rs line 634-635 and src/config/entities/guardrails.rs lines 182-183. | Replace hardcoded test credentials with environment variables or safe test fixtures. Also sanitize runtime error messages in guardrails.rs line 94-96 to avoid leaking sensitive error details in GatewayError::Internal. |
| E2e Test Quality Review | ⚠️ Warning | No E2E tests. Only 10 unit tests in guardrails.rs. Missing integration test coverage for block/rewrite paths and handler implementations. | Add E2E tests covering guardrail block responses, rewrite validation, and handler integrations. Add unit tests for handler guardrail payload and rewrite methods. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: integrating guardrail functionality into the chat handler as evidenced by modifications across format_handler.rs, chat_completions/mod.rs, and related adapter implementations. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
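
One possible way to act on the Security Check resolution is sketched below, using only the standard library. The struct, function names, and environment variable names (`TEST_AWS_ACCESS_KEY_ID`, `TEST_AWS_SECRET_ACCESS_KEY`) are assumptions for illustration and do not appear in the PR. The fallbacks are deliberately not shaped like real AWS keys, and the sanitizer keeps the detailed cause in a local log line while returning a generic message to the caller.

```rust
use std::env;

/// Hypothetical test fixture; field and env var names are assumptions.
#[derive(Debug, Clone)]
pub struct TestAwsCredentials {
    pub access_key_id: String,
    pub secret_access_key: String,
}

/// Read test credentials from the environment, falling back to
/// placeholders that cannot be mistaken for real AWS keys.
pub fn test_aws_credentials() -> TestAwsCredentials {
    TestAwsCredentials {
        access_key_id: env::var("TEST_AWS_ACCESS_KEY_ID")
            .unwrap_or_else(|_| "test-access-key-id".to_string()),
        secret_access_key: env::var("TEST_AWS_SECRET_ACCESS_KEY")
            .unwrap_or_else(|_| "test-secret-access-key".to_string()),
    }
}

/// Log the detailed cause locally and return a generic message that is
/// safe to surface in a client-facing error.
pub fn sanitize_runtime_error(detail: &str) -> String {
    eprintln!("guardrail runtime error: {}", detail);
    "guardrail check failed".to_string()
}
```

In a real service the `eprintln!` call would presumably be replaced by the project's own logging facility; it is used here only to keep the sketch dependency-free.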

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/proxy/handlers/format_handler.rs (1)

274-284: 💤 Low value

Note: Streaming responses bypass output guardrails.

Output guardrails are only applied to complete (non-streaming) responses. Streaming responses are forwarded without output guardrail checks. If this is intentional (e.g., due to complexity of buffering streams), consider adding a comment documenting this limitation.
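
As an illustration of the documentation-only option, the sketch below shows a match arm that carries an explicit note about the bypass. The variant names `ChatResponse::Complete` and `ChatResponse::Stream` come from the review comment, but their payload types, the `finish_response` function, and the `deny_unsafe` check are simplified assumptions, not the code in format_handler.rs.

```rust
/// Hypothetical stand-in for the proxy's response type; the payload
/// shapes here are assumptions.
pub enum ChatResponse {
    Complete(String),
    Stream(Vec<String>),
}

/// Toy output guardrail: blocks any body containing "unsafe".
pub fn deny_unsafe(body: &str) -> Result<String, String> {
    if body.contains("unsafe") {
        Err("blocked by output guardrail".to_string())
    } else {
        Ok(body.to_string())
    }
}

pub fn finish_response(
    response: ChatResponse,
    output_guardrail: impl Fn(&str) -> Result<String, String>,
) -> Result<Vec<String>, String> {
    match response {
        // Complete responses are checked (and possibly rewritten or
        // blocked) before they are returned.
        ChatResponse::Complete(body) => Ok(vec![output_guardrail(&body)?]),
        // NOTE: streaming responses intentionally bypass output
        // guardrails in this sketch. Validating them would require
        // buffering the stream before emission. Output guardrails run
        // only in the `ChatResponse::Complete` arm above.
        ChatResponse::Stream(chunks) => Ok(chunks),
    }
}
```

With this shape, a streamed chunk containing "unsafe" is forwarded unchanged while the same text in a complete response is rejected, which is exactly the gap the comment is meant to make visible to future readers.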

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/proxy/handlers/format_handler.rs` around lines 274 - 284, The current
match arm handling ChatResponse::Stream forwards streaming responses directly to
handle_stream_response::<A> (symbols: ChatResponse::Stream,
handle_stream_response) which skips output guardrail checks; either implement
buffering & post-processing so streams are validated before emission, or if
skipping is intentional, add an explicit comment above this arm documenting that
streaming responses bypass output guardrails and why (e.g.,
complexity/performance/real-time constraints) and reference where non-streaming
guardrails run (e.g., the code path handling ChatResponse::Complete) so future
readers know this is deliberate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2cc01c22-1645-4432-a246-32d827f11755

📥 Commits

Reviewing files that changed from the base of the PR and between 8da3346 and eb99781.

📒 Files selected for processing (8)
  • src/proxy/guardrails.rs
  • src/proxy/handlers/chat_completions/mod.rs
  • src/proxy/handlers/format_handler.rs
  • src/proxy/handlers/messages/mod.rs
  • src/proxy/handlers/mod.rs
  • src/proxy/handlers/responses/mod.rs
  • src/proxy/handlers/responses/runtime.rs
  • src/proxy/mod.rs

@bzp2010 bzp2010 merged commit 1ad76e4 into main May 12, 2026
3 checks passed
@bzp2010 bzp2010 deleted the bzp/feat-guardrail-in-chat branch May 12, 2026 16:20