Context
apisix/plugins/ai-aliyun-content-moderation.lua, apisix/plugins/ai-aws-content-moderation.lua, and the upcoming ai-lakera-guard plugin (tracking issue: #13291) all use protocols.get(ctx.ai_client_protocol).extract_request_content(body) to assemble the text passed to their vendor moderation API.
Each protocol's extract_request_content (openai-chat, openai-responses, anthropic-messages, bedrock-converse) currently walks only body.messages[].content. Two coverage gaps are inherited by every moderation plugin that uses it.
Gap 1 — Anthropic top-level system:
Anthropic places system prompts outside messages[] (top-level body.system string). anthropic-messages.lua's extract_request_content (lines 177-198) does not surface it. The same module's get_messages (lines 203-228) does include it, so there's a clear precedent for treating system as scannable content — extract_request_content just doesn't.
The textbook prompt-injection attack — "ignore your previous instructions and..." — most commonly targets the system role. Anthropic-format requests bypass scanning entirely for this attack class. OpenAI Chat / Responses and Bedrock Converse put system content inside messages[] and are unaffected.
Gap 2 — body.tools[] definitions
Function-call schemas — tools[].function.description, tools[].function.parameters, the equivalents in Anthropic and Bedrock — are not scanned by any current extract_request_content implementation. A maliciously-crafted tool description (instructions to exfiltrate, jailbreak via tool name, etc.) bypasses scanning across all four protocols.
Proposed fix
Extend extract_request_content in each protocol module:
anthropic-messages.lua: include body.system string content; walk body.tools[] (name, description, input_schema text content).
openai-chat.lua and openai-responses.lua: walk body.tools[] (function.name, function.description, function.parameters schema text content).
bedrock-converse.lua: include body.system block content; walk body.toolConfig.tools[] (toolSpec.name, toolSpec.description, toolSpec.inputSchema text content).
Side effects
This is a behavior change. Existing ai-aliyun-content-moderation and ai-aws-content-moderation users may see traffic that previously passed now flag, because more content is being scanned. Test plans for both plugins should add coverage for the new fields. Worth flagging in release notes.
Discovered
While drafting the design for ai-lakera-guard (tracking issue: #13291). The inherited helper limitations affect any moderation plugin built on top of apisix/plugins/ai-protocols/, not just the new one.
Context
apisix/plugins/ai-aliyun-content-moderation.lua,apisix/plugins/ai-aws-content-moderation.lua, and the upcomingai-lakera-guardplugin (tracking issue: #13291) all useprotocols.get(ctx.ai_client_protocol).extract_request_content(body)to assemble the text passed to their vendor moderation API.Each protocol's
extract_request_content(openai-chat,openai-responses,anthropic-messages,bedrock-converse) currently walks onlybody.messages[].content. Two coverage gaps are inherited by every moderation plugin that uses it.Gap 1 — Anthropic top-level
system:Anthropic places system prompts outside
messages[](top-levelbody.systemstring).anthropic-messages.lua'sextract_request_content(lines 177-198) does not surface it. The same module'sget_messages(lines 203-228) does include it, so there's a clear precedent for treatingsystemas scannable content —extract_request_contentjust doesn't.The textbook prompt-injection attack — "ignore your previous instructions and..." — most commonly targets the system role. Anthropic-format requests bypass scanning entirely for this attack class. OpenAI Chat / Responses and Bedrock Converse put system content inside
messages[]and are unaffected.Gap 2 —
body.tools[]definitionsFunction-call schemas —
tools[].function.description,tools[].function.parameters, the equivalents in Anthropic and Bedrock — are not scanned by any currentextract_request_contentimplementation. A maliciously-crafted tool description (instructions to exfiltrate, jailbreak via tool name, etc.) bypasses scanning across all four protocols.Proposed fix
Extend
extract_request_contentin each protocol module:anthropic-messages.lua: includebody.systemstring content; walkbody.tools[](name,description,input_schematext content).openai-chat.luaandopenai-responses.lua: walkbody.tools[](function.name,function.description,function.parametersschema text content).bedrock-converse.lua: includebody.systemblock content; walkbody.toolConfig.tools[](toolSpec.name,toolSpec.description,toolSpec.inputSchematext content).Side effects
This is a behavior change. Existing
ai-aliyun-content-moderationandai-aws-content-moderationusers may see traffic that previously passed now flag, because more content is being scanned. Test plans for both plugins should add coverage for the new fields. Worth flagging in release notes.Discovered
While drafting the design for
ai-lakera-guard(tracking issue: #13291). The inherited helper limitations affect any moderation plugin built on top ofapisix/plugins/ai-protocols/, not just the new one.