feat(guardrails): Add Alice WonderFence guardrail integration #26901
lior-k wants to merge 6 commits into
Greptile Summary
Adds Alice WonderFence as a new guardrail integration using the WonderFence V2 SDK, following the same multi-tenant credential-resolution pattern as Zscaler and Pangea. The PR covers pre-call, during-call, and post-call evaluation with
Confidence Score: 5/5
New guardrail integration adding files only; no changes to existing code paths, and the implementation correctly handles all error and edge cases. The two issues flagged in prior review threads (LRU eviction closing in-flight clients, WonderFenceMissingSecrets being swallowed by the fail-open handler) are both fixed in the submitted code. The only remaining finding is a one-word docstring typo. The guardrail safety-critical path (missing credentials always fail closed, BLOCK actions always enforced) is confirmed by dedicated tests and correctly implemented in the exception handler ordering. No files require special attention.
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py | Core guardrail implementation; previous issues (LRU eviction, WonderFenceMissingSecrets fail-open bypass) are correctly addressed with explicit exception handlers and no close() call on eviction. |
| litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/__init__.py | Package initializer; correctly registers the guardrail in both guardrail_initializer_registry and guardrail_class_registry and adds it as a LiteLLM callback. |
| litellm/types/proxy/guardrails/guardrail_hooks/alice_wonderfence.py | Config model is well-defined; docstring has a typo — 'api_id' should be 'app_id' in the second sentence. |
| litellm/types/guardrails.py | Adds ALICE_WONDERFENCE = 'alice_wonderfence' to the SupportedGuardrailIntegrations enum; straightforward, no issues. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py | 17 mock-only unit tests covering all action types, fail-open/closed semantics, LRU cache behavior, multi-tenant resolution priority, and the logging_obj stash bridge; no real network calls. |
| docs/my-website/docs/proxy/guardrails/alice_wonderfence.md | Comprehensive integration docs; the ALLOW vs NO_ACTION action name discrepancy was flagged in a prior review thread. |
Reviews (2): Last reviewed commit: "fix(guardrails): propagate Alice WonderF..."
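The exception-handler ordering mentioned in the summary above can be illustrated with a minimal sketch. This is not the PR's code: `guarded_evaluate` and the `evaluate` callable are hypothetical, and the two exception classes are local stand-ins that only borrow the names used in the PR. The point is that the specific handlers come before the generic one, so `fail_open` can only ever swallow unexpected (transport-level) errors.

```python
import asyncio


class WonderFenceMissingSecrets(Exception):
    """Stand-in: raised when api_key or app_id cannot be resolved."""


class WonderFenceBlockedError(Exception):
    """Stand-in: raised when the evaluation returns a BLOCK action."""


async def guarded_evaluate(evaluate, text: str, fail_open: bool) -> str:
    """Run `evaluate(text)`; apply fail-open only to unexpected errors.

    `evaluate` is a hypothetical async callable that returns the
    (possibly masked) text or raises one of the exceptions above.
    """
    try:
        return await evaluate(text)
    except WonderFenceMissingSecrets:
        # Missing credentials always fail closed, regardless of fail_open.
        raise
    except WonderFenceBlockedError:
        # A BLOCK verdict is a policy decision, not a transport error.
        raise
    except Exception:
        # Anything else is treated as a transport-level failure.
        if fail_open:
            return text  # let the original text through unchanged
        raise
```

If the generic `except Exception` came first, both specific exceptions would be caught by it and `fail_open=True` would silently bypass the guardrail.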
Codecov Report
✅ All modified and coverable lines are covered by tests.
b5d20a6 to 5d8d70d
c9e422c to ec9b215
PR overview
Veria reviewed the latest changes in this pull request.
Security review
Risk: 0/10
OpenAI chat translation populates both `structured_messages` and `texts` on guardrail input but reads back only `texts` after apply_guardrail returns. MASK was writing only to `structured_messages` when that was the analyzed source, so the unmasked `texts` slot won downstream and the original prompt reached the LLM while the response header still claimed the guardrail applied. MASK now also overwrites `texts[-1]` whenever `texts` is populated, keeping both slots consistent.
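A minimal sketch of the fix described above, assuming `inputs` is a plain dict, that the analyzed message is the last entry of `structured_messages`, and that its content is a string. `apply_mask` is a hypothetical helper name, not a function from the PR.

```python
from typing import Any, Dict


def apply_mask(inputs: Dict[str, Any], action_text: str) -> Dict[str, Any]:
    """Write the masked text to every slot downstream code might read."""
    messages = inputs.get("structured_messages") or []
    if messages:
        # Assumption: the last message is the one that was analyzed.
        messages[-1]["content"] = action_text
    texts = inputs.get("texts") or []
    if texts:
        # Keep `texts` in sync so the unmasked prompt cannot win downstream.
        texts[-1] = action_text
    return inputs
```

Writing to both slots means the result no longer depends on which one the caller reads back after `apply_guardrail` returns.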
Here's a short video showcasing Alice WonderFence guardrails:
🤖 litellm-agent: This PR is currently BLOCKED from merge.
Score: 3/5 ❌
Why blocked:
Details: Score docked for 1 PR-related CI failure. Size gate: 2 file(s) over 500 added LOC; split first:
- litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/alice_wonderfence.py (+627)
- tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py (+1011)
Add the
Fix the issues above and push an update; the bot will re-review automatically.
Replace stray app_name="test-app" with comment noting app_id is per-request via metadata.alice_wonderfence_app_id, matching example_config.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… metadata

Caller-supplied metadata.alice_wonderfence_app_id / alice_wonderfence_api_key no longer outrank admin-pinned key/team metadata. Adds allow_request_metadata_override (default False) as an explicit opt-in for trusted-gateway deployments; even when enabled, key/team metadata still wins. Closes the high-severity precedence inversion flagged on PR BerriAI#26901 (review comment r3226452019).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
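The precedence described in this commit might look roughly like the sketch below. `resolve_api_key` is a hypothetical name, and two details are assumptions rather than facts from the PR: that key and team metadata are nested under `request_data["metadata"]`, and that opted-in request metadata ranks just above the configured default.

```python
import os
from typing import Any, Dict, Optional


def resolve_api_key(
    request_data: Dict[str, Any],
    configured_default: Optional[str] = None,
    allow_request_metadata_override: bool = False,
) -> Optional[str]:
    """Return the first non-empty api_key in precedence order."""
    field = "alice_wonderfence_api_key"
    metadata = request_data.get("metadata") or {}
    # Assumption: admin-pinned metadata is nested under the request metadata.
    key_metadata = metadata.get("user_api_key_metadata") or {}
    team_metadata = metadata.get("user_api_key_team_metadata") or {}

    # Admin-pinned key/team values always come first.
    candidates = [key_metadata.get(field), team_metadata.get(field)]
    if allow_request_metadata_override:
        # Caller-supplied value is consulted only after explicit opt-in.
        candidates.append(metadata.get(field))
    candidates += [configured_default, os.environ.get("ALICE_API_KEY")]
    return next((value for value in candidates if value), None)
```

With the flag left at its default, a caller who sets `metadata.alice_wonderfence_api_key` has no effect on the result; with the flag enabled, the caller's value is used only when neither key nor team metadata pins one.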
…rn, drop in-repo doc

Addresses two PR BerriAI#26901 blockers:

1. **Size-gate CI**: `alice_wonderfence.py` (+627 LOC) and the monolithic test file (+1011 LOC) tripped the 500-added-LOC threshold. Both are split along separation-of-concerns boundaries: no behavioral changes, only relocation and import rewiring. Largest resulting file is 496 LOC.

   Production split:
   - exceptions.py: WonderFenceMissingSecrets, WonderFenceBlockedError
   - client_cache.py: SDK lazy import + LRU client cache helper
   - credentials.py: api_key/app_id resolution + request-scoped stash bridge
   - processing.py: analysis context build, text extract, action dispatch
   - alice_wonderfence.py: WonderFenceGuardrail class (orchestrator)

   Test split (under tests/.../alice_wonderfence/):
   - conftest.py: shared SDK-stub + guardrail-factory fixtures
   - test_credentials.py: resolver precedence + override-flag tests
   - test_client_cache.py: LRU cache + initialization + missing-SDK tests
   - test_apply_guardrail.py: BLOCK/MASK/DETECT/NO_ACTION + fail modes
   - test_post_call_bridge.py: logging_obj stash + sibling fallback

2. **Maintainer request**: drop docs/my-website/docs/proxy/guardrails/alice_wonderfence.md from this repo per CLAUDE.md (docs live in BerriAI/litellm-docs). The page has been ported to litellm-docs in BerriAI/litellm-docs#176.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
We've split the files to make them smaller.
Summary
Adds Alice WonderFence as a new guardrail integration for real-time content moderation, using the WonderFence V2 SDK. Resolves per-request `api_key` and `app_id` from request / API-key / team metadata (same multi-tenant pattern as Zscaler / Pangea).

Behavior
- Pre-call (`apply_guardrail`, `input_type="request"`): evaluates the user prompt; supports `BLOCK` / `MASK` / `DETECT` / `NO_ACTION`.
- Post-call (`apply_guardrail`, `input_type="response"`): evaluates the LLM response with the same action set.
- During-call: `apply_guardrail` via the framework's `async_moderation_hook`.
- `BLOCK` is always enforced; `fail_open` only suppresses transport-level errors.
- `MASK` rewrites `inputs["texts"][-1]` with the SDK-provided `action_text`.

Per-request resolution
- `api_key`: `metadata.alice_wonderfence_api_key` → `user_api_key_metadata` → `user_api_key_team_metadata` → configured default → `ALICE_API_KEY` env.
- `app_id` (no default): same metadata chain; error if missing.
- `post_call` resolves from synthesized `request_data` first, then falls back to a per-request stash on `logging_obj.model_call_details` (the framework drops the request body's `metadata` before post_call).

Implementation
- `WonderFenceV2Client` cached per `api_key` (LRU; `max_cached_clients` / `ALICE_MAX_CACHED_CLIENTS`).
- `model_dump()`'d (with `str()` fallback) before being attached to `HTTPException.detail` to keep responses JSON-serializable.

Files
- `litellm/proxy/guardrails/guardrail_hooks/alice_wonderfence/{__init__,alice_wonderfence,example_config}.{py,yaml}`
- `litellm/types/proxy/guardrails/guardrail_hooks/alice_wonderfence.py`
- `litellm/types/guardrails.py` (register `ALICE_WONDERFENCE` in `SupportedGuardrailIntegrations`)
- `docs/my-website/docs/proxy/guardrails/alice_wonderfence.md`
- `tests/test_litellm/proxy/guardrails/guardrail_hooks/test_alice_wonderfence.py` (17 tests)
- `tests/local_testing/test_configs/test_alice_config.yaml`

Test plan
- `make lint`: Ruff, MyPy, Black all clean

🤖 Generated with Claude Code
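For readers unfamiliar with the per-`api_key` client cache listed under Implementation, here is a minimal sketch of an LRU cache that evicts without closing the evicted client (the review summary notes that closing on eviction could break in-flight requests). `ClientCache`, its `factory` argument, and the default `max_size` of 10 are hypothetical; the real helper and the `WonderFenceV2Client` constructor may differ.

```python
from collections import OrderedDict
from typing import Any, Callable


class ClientCache:
    """Tiny LRU cache of SDK clients keyed by api_key."""

    def __init__(self, factory: Callable[[str], Any], max_size: int = 10) -> None:
        self._factory = factory  # hypothetical: builds a client for a key
        self._max_size = max_size
        self._clients: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, api_key: str) -> Any:
        if api_key in self._clients:
            self._clients.move_to_end(api_key)  # mark as most recently used
            return self._clients[api_key]
        client = self._factory(api_key)
        self._clients[api_key] = client
        if len(self._clients) > self._max_size:
            # Drop the least recently used entry without calling close():
            # an in-flight request may still hold a reference to it.
            self._clients.popitem(last=False)
        return client
```

Dropping the reference and letting garbage collection reclaim the client trades slightly later cleanup for never closing a client that is still in use.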