feat(guardrails): Add Alice WonderFence guardrail integration #26901

Open
lior-k wants to merge 6 commits into BerriAI:litellm_internal_staging from lior-k:feat-guardrails-alice-wonderfence

Conversation

@lior-k

@lior-k lior-k commented Apr 30, 2026

Re-opened from #20956 (auto-closed when the upstream litellm_oss_staging base branch was deleted while editing the PR base). Same head branch, same content, single squashed commit.

Summary

Adds Alice WonderFence as a new guardrail integration for real-time content moderation, using the WonderFence V2 SDK. Resolves per-request api_key and app_id from request / API-key / team metadata (same multi-tenant pattern as Zscaler / Pangea).

Behavior

  • Pre-call (apply_guardrail input_type="request"): evaluates user prompt — supports BLOCK / MASK / DETECT / NO_ACTION.
  • Post-call (apply_guardrail input_type="response"): evaluates LLM response with the same action set.
  • During-call: routed through apply_guardrail via the framework's async_moderation_hook.
  • BLOCK is always enforced; fail_open only suppresses transport-level errors.
  • MASK rewrites inputs["texts"][-1] with the SDK-provided action_text.
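The action handling listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Verdict`, `BlockedError`, and `apply_verdict` are hypothetical names, and the real SDK result object may be shaped differently.

```python
from dataclasses import dataclass
from typing import List, Optional


class BlockedError(Exception):
    """Hypothetical stand-in for the error the guardrail raises on BLOCK."""


@dataclass
class Verdict:
    """Assumed shape of an evaluation result: an action plus optional masked text."""

    action: str  # "BLOCK", "MASK", "DETECT" or "NO_ACTION"
    action_text: Optional[str] = None


def apply_verdict(texts: List[str], verdict: Verdict) -> List[str]:
    """Apply one verdict to the text list and return the resulting list."""
    if verdict.action == "BLOCK":
        # BLOCK is always enforced, independent of any fail_open setting.
        raise BlockedError("content blocked by guardrail")
    if verdict.action == "MASK" and verdict.action_text is not None and texts:
        # MASK rewrites only the last text with the provided replacement.
        return texts[:-1] + [verdict.action_text]
    # DETECT and NO_ACTION leave the content unchanged.
    return list(texts)
```

For example, `apply_verdict(["hi", "my SSN is 123"], Verdict("MASK", "my SSN is ***"))` leaves the first text untouched and replaces only the last one.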

Per-request resolution

  • api_key: metadata.alice_wonderfence_api_key → user_api_key_metadata → user_api_key_team_metadata → configured default → ALICE_API_KEY env.
  • app_id (no default): same metadata chain — error if missing.
  • post_call resolves from synthesized request_data first, then falls back to a per-request stash on logging_obj.model_call_details (the framework drops the request body's metadata before post_call).
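A rough sketch of the resolution order as stated in this description (a later commit in this thread changes the precedence of request metadata). The helper names and `MissingSecrets` are hypothetical, and the assumption that key and team metadata sit under `request_data["metadata"]` is mine, not confirmed by the PR text.

```python
import os
from typing import Any, Dict, Optional


class MissingSecrets(Exception):
    """Hypothetical stand-in for the error raised when a credential is absent."""


def _from_metadata_chain(name: str, request_data: Dict[str, Any]) -> Optional[str]:
    """Walk request, API-key, then team metadata and return the first value found."""
    metadata = request_data.get("metadata") or {}
    sources = [
        metadata,  # per-request metadata
        metadata.get("user_api_key_metadata") or {},  # assumed location
        metadata.get("user_api_key_team_metadata") or {},  # assumed location
    ]
    for source in sources:
        value = source.get(name)
        if value:
            return value
    return None


def resolve_api_key(
    request_data: Dict[str, Any], configured_default: Optional[str] = None
) -> str:
    """Metadata chain, then configured default, then the ALICE_API_KEY env var."""
    value = (
        _from_metadata_chain("alice_wonderfence_api_key", request_data)
        or configured_default
        or os.environ.get("ALICE_API_KEY")
    )
    if not value:
        raise MissingSecrets("no api_key could be resolved")
    return value


def resolve_app_id(request_data: Dict[str, Any]) -> str:
    """app_id has no default: it must come from the metadata chain."""
    value = _from_metadata_chain("alice_wonderfence_app_id", request_data)
    if not value:
        raise MissingSecrets("no app_id could be resolved")
    return value
```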

Implementation

  • WonderFenceV2Client cached per api_key (LRU; max_cached_clients / ALICE_MAX_CACHED_CLIENTS).
  • BLOCK detections are model_dump()'d (with str() fallback) before being attached to HTTPException.detail to keep responses JSON-serializable.
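The two implementation notes above might look something like the sketch below. `ClientCache` and `serialize_detections` are hypothetical names, the default cache size is an arbitrary placeholder, and the factory stands in for constructing a `WonderFenceV2Client`. Evicted clients are simply dropped rather than closed, matching the review summary's note that eviction does not call `close()`.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ClientCache:
    """Small LRU cache of clients keyed by api_key (illustrative only)."""

    def __init__(self, factory: Callable[[str], Any], max_size: int = 10):
        self._factory = factory  # builds a client for a given api_key
        self._max_size = max_size  # placeholder default, not the PR's value
        self._clients: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, api_key: str) -> Any:
        if api_key in self._clients:
            # Mark as most recently used.
            self._clients.move_to_end(api_key)
            return self._clients[api_key]
        client = self._factory(api_key)
        self._clients[api_key] = client
        while len(self._clients) > self._max_size:
            # Drop the least recently used entry without closing it,
            # since another request may still be using that client.
            self._clients.popitem(last=False)
        return client


def serialize_detections(detections: List[Any]) -> List[Any]:
    """Convert detections to JSON-friendly values: model_dump() first, str() otherwise."""
    serialized = []
    for detection in detections:
        dump = getattr(detection, "model_dump", None)
        if callable(dump):
            try:
                serialized.append(dump())
                continue
            except Exception:
                pass  # fall through to the str() fallback
        serialized.append(str(detection))
    return serialized
```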

Files

  • litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/{__init__,alice_wonderfence,example_config}.{py,yaml}
  • litellm/types/proxy/guardrails/guardrail_hooks/alice_wonderfence.py
  • litellm/types/guardrails.py (register ALICE_WONDERFENCE in SupportedGuardrailIntegrations)
  • docs/my-website/docs/proxy/guardrails/alice_wonderfence.md
  • tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py (17 tests)
  • tests/local_testing/test_configs/test_alice_config.yaml

Test plan

  • make lint — Ruff, MyPy, Black all clean
  • 17 unit tests pass (all action types, fail-open / fail-closed, multi-text masking, structured messages, multipart content, BLOCK detection serialization)
  • Follows existing guardrail patterns (Zscaler, Pangea, Qualifire)
  • CLA signed

🤖 Generated with Claude Code

@greptile-apps
Contributor

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

Adds Alice WonderFence as a new guardrail integration using the WonderFence V2 SDK, following the same multi-tenant credential-resolution pattern as Zscaler and Pangea. The PR covers pre-call, during-call, and post-call evaluation with BLOCK/MASK/DETECT/NO_ACTION actions and a per-request api_key/app_id resolution chain.

  • Core logic (alice_wonderfence.py): LRU client cache keyed by resolved api_key, explicit WonderFenceMissingSecrets handler ensures missing credentials always fail closed regardless of fail_open, and a logging_obj stash bridges the post-call path where the framework drops the original request body metadata.
  • Tests: 17 mock-only unit tests cover all action types, fail-open/closed semantics, LRU eviction, multi-tenant resolution priority, and the stash/recovery bridge — all without real network calls.
  • Registration: ALICE_WONDERFENCE is added to SupportedGuardrailIntegrations, and both guardrail_initializer_registry and guardrail_class_registry are populated in the package __init__.

Confidence Score: 5/5

New guardrail integration adding files only; no changes to existing code paths, and the implementation correctly handles all error and edge cases.

The two issues flagged in prior review threads (LRU eviction closing in-flight clients, WonderFenceMissingSecrets being swallowed by the fail-open handler) are both fixed in the submitted code. The only remaining finding is a one-word docstring typo. The guardrail safety-critical path — missing credentials always fail closed, BLOCK actions always enforced — is confirmed by dedicated tests and correctly implemented in the exception handler ordering.
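The exception handler ordering mentioned above can be illustrated with a small sketch. It is written synchronously for brevity, and `MissingSecrets`, `Blocked`, and `run_guardrail` are hypothetical names rather than the PR's actual classes. The point is only that the specific handlers come before the generic one, so `fail_open` can never swallow them.

```python
from typing import Callable


class MissingSecrets(Exception):
    """Hypothetical: raised when api_key or app_id cannot be resolved."""


class Blocked(Exception):
    """Hypothetical: raised when the guardrail returns a BLOCK action."""


def run_guardrail(check: Callable[[], None], fail_open: bool) -> str:
    """Run a check, letting fail_open suppress only unexpected errors."""
    try:
        check()
    except MissingSecrets:
        # Missing credentials always fail closed, regardless of fail_open.
        raise
    except Blocked:
        # BLOCK is always enforced, regardless of fail_open.
        raise
    except Exception:
        # Any other error (e.g. a transport failure) is suppressed
        # only when fail_open is enabled.
        if fail_open:
            return "skipped"
        raise
    return "ok"
```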

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py | Core guardrail implementation; previous issues (LRU eviction, WonderFenceMissingSecrets fail-open bypass) are correctly addressed with explicit exception handlers and no close() call on eviction. |
| litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/__init__.py | Package initializer; correctly registers the guardrail in both guardrail_initializer_registry and guardrail_class_registry and adds it as a LiteLLM callback. |
| litellm/types/proxy/guardrails/guardrail_hooks/alice_wonderfence.py | Config model is well-defined; docstring has a typo: 'api_id' should be 'app_id' in the second sentence. |
| litellm/types/guardrails.py | Adds ALICE_WONDERFENCE = 'alice_wonderfence' to the SupportedGuardrailIntegrations enum; straightforward, no issues. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py | 17 mock-only unit tests covering all action types, fail-open/closed semantics, LRU cache behavior, multi-tenant resolution priority, and the logging_obj stash bridge; no real network calls. |
| docs/my-website/docs/proxy/guardrails/alice_wonderfence.md | Comprehensive integration docs; the ALLOW vs NO_ACTION action name discrepancy was flagged in a prior review thread. |

Reviews (2): Last reviewed commit: "fix(guardrails): propagate Alice WonderF..." | Re-trigger Greptile

@codspeed-hq
Contributor

codspeed-hq Bot commented Apr 30, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing lior-k:feat-guardrails-alice-wonderfence (5d8d70d) with main (6ff668c)


Comment thread litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py Outdated
Comment thread docs/my-website/docs/proxy/guardrails/alice_wonderfence.md Outdated
@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@lior-k lior-k force-pushed the feat-guardrails-alice-wonderfence branch 2 times, most recently from b5d20a6 to 5d8d70d Compare May 6, 2026 19:02
@lior-k lior-k changed the base branch from main to litellm_oss_staging May 6, 2026 19:12
@lior-k lior-k changed the base branch from litellm_oss_staging to litellm_internal_staging May 6, 2026 19:15
@lior-k lior-k requested a review from a team May 6, 2026 19:15
@lior-k lior-k force-pushed the feat-guardrails-alice-wonderfence branch from c9e422c to ec9b215 Compare May 6, 2026 19:27
Comment thread litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py Outdated
@veria-ai
Contributor

veria-ai Bot commented May 6, 2026

PR overview

Veria reviewed the latest changes in this pull request.

Security review

  • No new security issues were flagged in the latest review.
  • No review issues remain open on this pull request.

Risk: 0/10

OpenAI chat translation populates both `structured_messages` and `texts`
on guardrail input but reads back only `texts` after apply_guardrail
returns. MASK was writing only to `structured_messages` when that was
the analyzed source, so the unmasked `texts` slot won downstream and
the original prompt reached the LLM while the response header still
claimed the guardrail applied.

MASK now also overwrites `texts[-1]` whenever `texts` is populated,
keeping both slots consistent.
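A minimal sketch of the fix described in this commit message, assuming the analyzed content is the last structured message and that its `content` is a plain string. `apply_mask` is a hypothetical helper, not the function in the PR.

```python
from typing import Any, Dict


def apply_mask(inputs: Dict[str, Any], masked_text: str) -> Dict[str, Any]:
    """Write the masked text to every populated slot so the slots stay consistent."""
    out = dict(inputs)

    messages = out.get("structured_messages")
    if messages:
        # Copy the messages so the caller's list is not mutated.
        new_messages = [dict(message) for message in messages]
        new_messages[-1]["content"] = masked_text  # assumes string content
        out["structured_messages"] = new_messages

    texts = out.get("texts")
    if texts:
        # Also overwrite texts[-1]; the translation layer reads back only
        # `texts`, so leaving it unmasked would leak the original prompt.
        out["texts"] = list(texts[:-1]) + [masked_text]

    return out
```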
@lior-k
Author

lior-k commented May 6, 2026

Here's a short video showcasing the Alice WonderFence guardrail:
https://drive.google.com/file/d/1XtC4z-7R7tn1c-FnEJgYAOFGW0VlduTm/view?usp=drive_link

@lior-k
Author

lior-k commented May 12, 2026

@greptileai

@oss-pr-review-agent-shin
Contributor

🤖 litellm-agent: This PR is currently BLOCKED from merge.

Score: 3/5

Why blocked:

  • 1 PR-related CI failure (Size gate: 2 file(s) over 500 added LOC — split first (litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py (+627), tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py (+1011)). Add the oversized-ok label or a Big-PR-Approved: <handle> trailer if a maintainer has signed off.) (pr_related_failures, -2 pts)

Details: Score docked for: 1 PR-related CI failure (Size gate: 2 file(s) over 500 added LOC — split first (litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py (+627), tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py (+1011)). Add the oversized-ok label or a Big-PR-Approved: <handle> trailer if a maintainer has signed off.).

Fix the issues above and push an update — the bot will re-review automatically.

Note: This bot is still in beta and might not always work as expected. Please share any feedback via Slack.

Replace stray app_name="test-app" with comment noting app_id is per-request
via metadata.alice_wonderfence_app_id, matching example_config.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py Outdated
Comment thread litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py Outdated
@lior-k

This comment was marked as outdated.

… metadata

Caller-supplied metadata.alice_wonderfence_app_id / alice_wonderfence_api_key
no longer outrank admin-pinned key/team metadata. Adds
allow_request_metadata_override (default False) as an explicit opt-in for
trusted-gateway deployments — even when enabled, key/team metadata still
wins. Closes the high-severity precedence inversion flagged on PR BerriAI#26901
(review comment r3226452019).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
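The revised precedence from this commit message could be sketched as below. The commit message only states that key/team metadata outrank request metadata and that request metadata is consulted only when `allow_request_metadata_override` is enabled; the relative order of key versus team metadata, and placing request metadata ahead of the configured default, are assumptions here. `resolve_value` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def resolve_value(
    name: str,
    request_metadata: Dict[str, Any],
    key_metadata: Dict[str, Any],
    team_metadata: Dict[str, Any],
    allow_request_metadata_override: bool = False,
    configured_default: Optional[str] = None,
) -> Optional[str]:
    """Resolve a setting so admin-pinned values always beat caller-supplied ones."""
    # Admin-pinned sources come first (key before team is an assumption).
    sources = [key_metadata, team_metadata]
    if allow_request_metadata_override:
        # Caller-supplied metadata is consulted only when explicitly allowed,
        # and still ranks below key/team metadata.
        sources.append(request_metadata)
    for source in sources:
        value = source.get(name)
        if value:
            return value
    return configured_default
```

With the flag left at its default of `False`, a caller sending `metadata.alice_wonderfence_app_id` cannot influence the result at all.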
Comment thread docs/my-website/docs/proxy/guardrails/alice_wonderfence.md Outdated
lior-k and others added 2 commits May 20, 2026 14:51
…rn, drop in-repo doc

Addresses two PR BerriAI#26901 blockers:

1. **Size-gate CI**: `alice_wonderfence.py` (+627 LOC) and the monolithic test
   file (+1011 LOC) tripped the 500-added-LOC threshold. Both are split along
   separation-of-concerns boundaries — no behavioral changes, only relocation
   and import rewiring. Largest resulting file is 496 LOC.

   Production split:
   - exceptions.py — WonderFenceMissingSecrets, WonderFenceBlockedError
   - client_cache.py — SDK lazy import + LRU client cache helper
   - credentials.py — api_key/app_id resolution + request-scoped stash bridge
   - processing.py — analysis context build, text extract, action dispatch
   - alice_wonderfence.py — WonderFenceGuardrail class (orchestrator)

   Test split (under tests/.../alice_wonderfence/):
   - conftest.py — shared SDK-stub + guardrail-factory fixtures
   - test_credentials.py — resolver precedence + override-flag tests
   - test_client_cache.py — LRU cache + initialization + missing-SDK tests
   - test_apply_guardrail.py — BLOCK/MASK/DETECT/NO_ACTION + fail modes
   - test_post_call_bridge.py — logging_obj stash + sibling fallback

2. **Maintainer request**: drop docs/my-website/docs/proxy/guardrails/
   alice_wonderfence.md from this repo per CLAUDE.md (docs live in
   BerriAI/litellm-docs). The page has been ported to litellm-docs in
   BerriAI/litellm-docs#176.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@lior-k
Author

lior-k commented May 21, 2026

🤖 litellm-agent: This PR is currently BLOCKED from merge.

Score: 3/5

Why blocked:

  • 1 PR-related CI failure (Size gate: 2 file(s) over 500 added LOC — split first (litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py (+627), tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py (+1011)). Add the oversized-ok label or a Big-PR-Approved: <handle> trailer if a maintainer has signed off.) (pr_related_failures, -2 pts)

Details: Score docked for: 1 PR-related CI failure (Size gate: 2 file(s) over 500 added LOC — split first (litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py (+627), tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py (+1011)). Add the oversized-ok label or a Big-PR-Approved: <handle> trailer if a maintainer has signed off.).

Fix the issues above and push an update — the bot will re-review automatically.

Note: This bot is still in beta and might not always work as expected. Please share any feedback via Slack.

We've split the files to make them smaller.
@litellm-agent @oss-pr-review-agent-shin, please test again


4 participants