feat(ai-moderation): shared ctx flag so stacked moderation plugins can short-circuit duplicate vendor calls #13353

@janiussyafiq

Context

APISIX currently ships moderation plugins that target the AI request/response path. They all run in the access and lua_body_filter phases.

Problem

When an operator configures multiple moderation plugins on the same route (defense-in-depth), each plugin independently calls its own vendor API for every request and every streaming body chunk. There is no shared signal that another moderation plugin has already decided "this content is OK" or "this content is flagged." Concrete consequences:

  1. Vendor calls are additive: with N stacked plugins, every request and every buffer flush costs N vendor calls (2× for a pair).
  2. The deny body shape is last-flagger-wins. Each lua_body_filter plugin reads the original upstream content from ctx.var.llm_response_text / ctx.llm_response_contents_in_chunk regardless of body rewrites made by earlier plugins, makes its own independent flag decision, and rewrites the body if it flags. When multiple plugins flag, the client sees the deny shape of whichever plugin ran last (Aliyun-shaped one moment, Lakera-shaped the next).
  3. Operators get neither pass nor block coordination: there is no "fast pass" (skip remaining vendors on clean) and no "first block wins" (skip remaining vendors on flag).
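
The first two consequences can be reproduced in a small standalone model. This is only a sketch: plain Lua tables stand in for the APISIX ctx, the plugin names plugin-a and plugin-b are hypothetical, and ctx.body is an invented field that represents the body the client would receive.

```lua
-- Illustrative model only: plain Lua tables stand in for the APISIX ctx,
-- and "plugin-a" / "plugin-b" are hypothetical plugin names.
local vendor_calls = 0

local function make_plugin(name, is_flagged)
    return function(ctx)
        -- Each plugin reads the original upstream text, not an earlier rewrite.
        local text = ctx.var.llm_response_text
        vendor_calls = vendor_calls + 1      -- every plugin pays for its own scan
        if is_flagged(text) then
            ctx.body = "denied by " .. name  -- overwrites any earlier deny body
        end
    end
end

local function always_flag(_text)
    return true
end

local ctx = {
    var = { llm_response_text = "upstream text" },
    body = "upstream text",
}

local chain = {
    make_plugin("plugin-a", always_flag),
    make_plugin("plugin-b", always_flag),
}
for _, run in ipairs(chain) do
    run(ctx)
end

print(vendor_calls) -- 2
print(ctx.body)     -- denied by plugin-b
```

Both plugins pay for a scan, and the body the client sees depends only on which plugin ran last.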

Proposed design

Introduce shared ctx.var signals consumed by all moderation plugins:

  • ctx.var.ai_moderation_decided (boolean) — set to true by the first plugin that produces a verdict, clean or flagged.
  • ctx.var.ai_moderation_flagged (boolean) — set to true by the first plugin that flags.
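
A minimal sketch of how a plugin might publish its verdict through these two signals. The helper name record_verdict is hypothetical, and ctx.var is modelled as a plain table rather than the real APISIX variable store. Both writes are sticky, so a later clean verdict never clears an earlier flag.

```lua
-- Hypothetical helper, not existing APISIX code.
-- ctx.var is modelled as a plain Lua table for illustration.
local function record_verdict(ctx, flagged)
    -- Any verdict, clean or flagged, marks the request as decided.
    ctx.var.ai_moderation_decided = true
    -- Only a flag sets ai_moderation_flagged; a clean verdict leaves it alone.
    if flagged then
        ctx.var.ai_moderation_flagged = true
    end
end

local ctx = { var = {} }
record_verdict(ctx, false)  -- first plugin: clean
record_verdict(ctx, true)   -- second plugin: flagged
record_verdict(ctx, false)  -- third plugin: clean, does not reset the flag

print(ctx.var.ai_moderation_decided, ctx.var.ai_moderation_flagged) -- true true
```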

Each moderation plugin gains a config option, coordinate_with_siblings (boolean, default false). The default preserves the current independent-scan behavior, so existing setups see no breaking change. When set to true:

  • If ai_moderation_decided is already true, the plugin skips its scan and inherits the prior decision.
  • If the prior decision was a flag (ai_moderation_flagged is true), the plugin lets the earlier plugin's deny shape stand (or returns its own; this is a design decision worth a sub-discussion).
  • If the prior decision was clean (ai_moderation_decided is true and ai_moderation_flagged is not), the plugin lets the request through without calling its vendor.
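
The rules above could be sketched as follows. Everything here is an assumption made for illustration: the function name moderate, the scan callback that stands in for a vendor call, the returned status strings, and the schema fragment are not part of any existing plugin. The sketch also assumes that plugins with coordinate_with_siblings set to false still publish their verdicts, which the proposal does not specify, and it leaves the deny-shape question open by only reporting whether a verdict was inherited.

```lua
-- Hypothetical schema fragment for the proposed option.
local schema_properties = {
    coordinate_with_siblings = { type = "boolean", default = false },
}

-- Hypothetical handler: `scan` stands in for the vendor call and returns
-- true when the content is flagged. ctx.var is modelled as a plain table.
local function moderate(conf, ctx, scan)
    local var = ctx.var
    if conf.coordinate_with_siblings and var.ai_moderation_decided then
        -- A sibling already decided: skip the vendor call, inherit the verdict.
        local verdict = var.ai_moderation_flagged and "flagged" or "clean"
        return verdict, "inherited"
    end

    local flagged = scan(ctx)
    var.ai_moderation_decided = true
    if flagged then
        var.ai_moderation_flagged = true
    end
    return flagged and "flagged" or "clean", "scanned"
end

-- Usage: count vendor calls across three stacked plugins.
local calls = 0
local function clean_scan()
    calls = calls + 1
    return false
end

local ctx = { var = {} }
local coordinated = { coordinate_with_siblings = true }
local independent = { coordinate_with_siblings = false }

local v1, s1 = moderate(coordinated, ctx, clean_scan)  -- clean, scanned
local v2, s2 = moderate(coordinated, ctx, clean_scan)  -- clean, inherited
local v3, s3 = moderate(independent, ctx, clean_scan)  -- clean, scanned
print(calls) -- 2
```

Only the second plugin skips its vendor call. The third keeps the default and scans independently, which matches today's behavior.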

Out of scope

Cross-vendor verdict reconciliation (e.g., Aliyun says clean but Lakera says flag: should the operator be alerted to the disagreement, or should the content be treated as flagged, or as clean?). That's a follow-up if anyone deploys this pattern at scale and reports concrete needs.

Discovered

While designing the composition behavior for ai-lakera-guard (tracking issue: #13291). The problem applies generally to any pair of moderation plugins stacked on the same route.
