Skip to content

feat(ai-protocols): extend extract_request_content to cover Anthropic top-level system: and tools[] across all protocols #13352

@janiussyafiq

Description

@janiussyafiq

Context

apisix/plugins/ai-aliyun-content-moderation.lua, apisix/plugins/ai-aws-content-moderation.lua, and the upcoming ai-lakera-guard plugin (tracking issue: #13291) all use protocols.get(ctx.ai_client_protocol).extract_request_content(body) to assemble the text passed to their vendor moderation API.

Each protocol's extract_request_content (openai-chat, openai-responses, anthropic-messages, bedrock-converse) currently walks only body.messages[].content. Two coverage gaps are inherited by every moderation plugin that uses it.

Gap 1 — Anthropic top-level system:

Anthropic places system prompts outside messages[] (top-level body.system string). anthropic-messages.lua's extract_request_content (lines 177-198) does not surface it. The same module's get_messages (lines 203-228) does include it, so there's a clear precedent for treating system as scannable content — extract_request_content just doesn't.

The textbook prompt-injection attack — "ignore your previous instructions and..." — most commonly targets the system role. Anthropic-format requests bypass scanning entirely for this attack class. OpenAI Chat / Responses and Bedrock Converse put system content inside messages[] and are unaffected.

Gap 2 — body.tools[] definitions

Function-call schemas — tools[].function.description, tools[].function.parameters, the equivalents in Anthropic and Bedrock — are not scanned by any current extract_request_content implementation. A maliciously-crafted tool description (instructions to exfiltrate, jailbreak via tool name, etc.) bypasses scanning across all four protocols.

Proposed fix

Extend extract_request_content in each protocol module:

  • anthropic-messages.lua: include body.system string content; walk body.tools[] (name, description, input_schema text content).
  • openai-chat.lua and openai-responses.lua: walk body.tools[] (function.name, function.description, function.parameters schema text content).
  • bedrock-converse.lua: include body.system block content; walk body.toolConfig.tools[] (toolSpec.name, toolSpec.description, toolSpec.inputSchema text content).

Side effects

This is a behavior change. Existing ai-aliyun-content-moderation and ai-aws-content-moderation users may see traffic that previously passed now flag, because more content is being scanned. Test plans for both plugins should add coverage for the new fields. Worth flagging in release notes.

Discovered

While drafting the design for ai-lakera-guard (tracking issue: #13291). The inherited helper limitations affect any moderation plugin built on top of apisix/plugins/ai-protocols/, not just the new one.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    📋 Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions