feat(guardrails): add PRE+POST tool scope support for PII and HarmfulContent middleware [AL-445]#868
Conversation
valentinabojan
left a comment
There was a problem hiding this comment.
Took a closer look at the new BuiltInGuardrailMiddlewareMixin._run_tool_guardrail. The refactor and test coverage are solid. Posting 5 inline comments on _base.py:
- Sync HTTP call from
asynccontext (now fires up to 2× per tool invocation under the new default) _extract_tool_output_dataonly inspectsmessages[0]of aCommand_tool_stage/_tool_namesare annotation-only on the mixin — AttributeError trap for future TOOL-scope subclassesinput_dataextracted unconditionally even on POST-only path (wasted work)- PRE/POST asymmetry on the
isinstance(..., dict)guard aroundmodified_*
No approval blocker — flagging as COMMENT.
| GuardrailExecutionStage.PRE_AND_POST, | ||
| ): | ||
| try: | ||
| result = self._evaluate_guardrail(input_data) |
There was a problem hiding this comment.
_evaluate_guardrail (line 57) is a synchronous HTTP call to the UiPath guardrails API, but _run_tool_guardrail is async. With the new default of PRE_AND_POST, every tool invocation now blocks the event loop twice — once here and again on line 157. Under any concurrent agent load this is a real throughput regression vs. the old PRE-only path.
Cheapest fix is to push the sync call into a worker thread (needs import asyncio at the top):
| result = self._evaluate_guardrail(input_data) | |
| result = await asyncio.to_thread(self._evaluate_guardrail, input_data) |
And the same swap on line 157. Long-term, an async evaluate_guardrail on the SDK side would be cleaner.
| update = result.update if hasattr(result, "update") else {} | ||
| messages = update.get("messages", []) if isinstance(update, dict) else [] | ||
| content = ( | ||
| messages[0].content |
There was a problem hiding this comment.
A Command(update={"messages": [...]}) can carry multiple messages, and the ToolMessage isn't guaranteed to be first. If a flow emits e.g. [AIMessage(...), ToolMessage(<real content>)], this branch reads messages[0], sees it isn't a ToolMessage, sets content = None, and POST silently gets no content to validate.
Iterate to find the first ToolMessage instead:
| messages[0].content | |
| tool_msg = next( | |
| (m for m in messages if isinstance(m, ToolMessage)), None | |
| ) | |
| content = tool_msg.content if tool_msg is not None else None |
| _name: str | ||
| action: GuardrailAction | ||
| _tool_names: list[str] | None | ||
| _tool_stage: GuardrailExecutionStage |
There was a problem hiding this comment.
_tool_stage and _tool_names are annotation-only here — no class-level binding. Today both subclasses (pii_detection.py, harmful_content.py) set them in __init__, so it works. But the moment a future TOOL-scope subclass forgets, _run_tool_guardrail will AttributeError at runtime — not at construction — and only when a specific tool is called. The mixin docstring also only mentions _guardrail / _name / action as required attrs, which makes this easy to miss.
| _tool_stage: GuardrailExecutionStage | |
| _tool_names: list[str] | None = None | |
| _tool_stage: GuardrailExecutionStage = GuardrailExecutionStage.PRE_AND_POST |
And extend the docstring (lines 38–41) to list these alongside the others.
| if self._tool_names is None or sanitized_tool_name not in self._tool_names: | ||
| return await handler(request) | ||
|
|
||
| input_data = self._extract_tool_input_data(request) |
There was a problem hiding this comment.
input_data is extracted on every invocation, even when stage=POST and the value is never read. Cheap call, but pointless work — move it inside the PRE block (right after try: on line 132). Then a POST-only middleware skips the dict copy / str() conversion entirely.
| try: | ||
| result = self._evaluate_guardrail(output_data) | ||
| modified_output = self._handle_validation_result(result, output_data) | ||
| if modified_output is not None: |
There was a problem hiding this comment.
Asymmetry with line 135: PRE only re-wraps the request when isinstance(modified_input, dict), but POST forwards any non-None to create_modified_tool_result. The helper actually handles both str and dict (_utils.py:71-82), so the difference is intentional — but it's load-bearing in a way that's not obvious from the call site, and the next refactor pass is likely to "tidy it up" and break things.
At minimum, add a one-line comment:
| if modified_output is not None: | |
| if modified_output is not None: | |
| # POST: ToolMessage.content accepts both str and dict | |
| tool_result = create_modified_tool_result(tool_result, modified_output) |
Or unify the contract by letting create_modified_tool_request accept str too, and gating both sites the same way.
| return final_seen | ||
|
|
||
|
|
||
| class UiPathChat(_UpstreamUiPathChat): |
There was a problem hiding this comment.
The original UiPathChat was a no-op stub (pass). The change was needed to fix a real bug in the joke agent output.
Root cause: When LangChain tracing callbacks are registered, ainvoke uses the streaming API path. Each SSE chunk from the UiPath
LLM gateway includes model_name, id, and created in its generation_info. LangChain accumulates chunks via AIMessageChunk.add →
merge_dicts, which string-concatenates values for these keys. With 14 SSE chunks, the final AI message had:
response_metadata.model_name = "gpt-4o-2024-08-06gpt-4o-2024-08-06gpt-4o-2024-08-06..." # × 14
response_metadata.finish_reason = "stopstop" # sent in 2 final chunks
The fix strips model_name, id, created from intermediate chunks (they only belong on the final chunk), and deduplicates when the
API sends finish_reason twice. The parent class _UpstreamUiPathChat includes these fields in every chunk by design (for standalone
streaming use), so the override is the correct layer to fix this without patching the upstream client.
The change is minimal: a module-level _strip_chunk_metadata helper + overrides of the two streaming methods that call super() and
filter each chunk through it. No behavior changes for non-streaming invocations.
…Content middleware UiPathPIIDetectionMiddleware and UiPathHarmfulContentMiddleware now evaluate the guardrail against both the tool input (PRE) and tool output (POST) when scopes=[GuardrailScope.TOOL] is set, matching the behavior of low-code agents. A new `stage` parameter (default PRE_AND_POST) controls which stages run. _extract_tool_output_data added to BuiltInGuardrailMiddlewareMixin for shared output parsing logic (ToolMessage | Command -> dict). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _extract_tool_input_data and the entire PRE+POST wrap_tool_call implementation into BuiltInGuardrailMiddlewareMixin._create_tool_wrap_hook, eliminating duplication between UiPathPIIDetectionMiddleware and UiPathHarmfulContentMiddleware. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract _run_tool_guardrail from closure in _create_tool_wrap_hook to make the PRE+POST evaluation logic directly testable. Add 27 unit tests covering all branches in _extract_tool_input_data, _extract_tool_output_data, and _run_tool_guardrail. Add 2 parity scenarios (× 2 flavors) verifying that PII and HarmfulContent middlewares correctly block on tool output (POST). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on 3.11 event loop compat - Override _uipath_stream/_uipath_astream in UiPathChat to strip model_name/id/created from intermediate SSE chunks and deduplicate finish_reason on duplicate final chunks, preventing string-concatenation via AIMessageChunk.__add__ - Add _utils/_compat.py: backport CPython 3.12 fix for BaseEventLoop.is_closed() using getattr(self, '_closed', True) to suppress Homebrew Python 3.11 GC AttributeError warnings - Push both sync _evaluate_guardrail calls in _run_tool_guardrail to asyncio.to_thread to avoid blocking the event loop on guardrail HTTP calls - Move input_data extraction inside the PRE block (skip for POST-only middlewares) - Unify PRE/POST action result gating: both now forward any non-None result; widen create_modified_tool_request to accept str | dict to match create_modified_tool_result - Fix joke-agent graph.py: use input_schema/output_schema (deprecated input=/output= kwargs) - Update test_tool_wrap.py: add Command-with-non-first-ToolMessage test; mypy fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eption - Extract _get_tool_message_content and _parse_str_content module-level helpers to reduce _extract_tool_output_data cognitive complexity from 27 to 3 - Replace logger.error(..., exc_info=True) with logger.exception() in all three exception handlers (SonarCloud fix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New tests/chat/test_models.py: covers _strip_chunk_metadata and both _uipath_stream/_uipath_astream overrides (chat/models.py → 100%) - Extend tests/guardrails/middlewares/test_tool_wrap.py: add TestCheckMessages (9 tests), TestCreateToolWrapHook (1 test), and non-str/non-dict content edge case (_base.py → 95.5%) - New tests/guardrails/middlewares/test_utils.py: covers create_modified_tool_request fallback, create_modified_tool_result all branches, and extract_text_from_messages (_utils.py → 100%) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
58096df to
435fed6
Compare
- _utils.py: suppress typeddict-item error on `args` assignment, which intentionally accepts str | dict[str, Any] after the earlier contract unification - test_utils.py: annotate message lists as list[BaseMessage] to satisfy invariant list variance; add same typeddict-item suppression in helper - test_models.py: pass model_name explicitly to model_construct to satisfy mypy's required-argument check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without __init__.py, pytest cannot form unique module paths when two directories contain a file with the same basename (e.g. test_utils.py exists in both tests/agent/tools/ and tests/guardrails/middlewares/). Adding __init__.py throughout makes every test directory a proper Python package, so each module gets a fully-qualified name and the collision disappears. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
… fix - Add UiPathPromptInjectionMiddleware to middleware table and imports (threshold float 0–1) - Clarify that PII and HarmfulContent support TOOL scope (fixed in PR #868 via _run_tool_guardrail) - Reformat middleware table with Supported scopes / Stage support / Extra parameters columns - Add ## Reference section with complete value tables for GuardrailScope, GuardrailExecutionStage, LoggingSeverityLevel, PIIDetectionEntityType (19 values), HarmfulContentEntityType, and IntellectualPropertyEntityType including threshold constraints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… fix - Add UiPathPromptInjectionMiddleware to middleware table and imports (threshold float 0–1) - Clarify that PII and HarmfulContent support TOOL scope (fixed in PR #868 via _run_tool_guardrail) - Reformat middleware table with Supported scopes / Stage support / Extra parameters columns - Add ## Reference section with complete value tables for GuardrailScope, GuardrailExecutionStage, LoggingSeverityLevel, PIIDetectionEntityType (19 values), HarmfulContentEntityType, and IntellectualPropertyEntityType including threshold constraints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>



What changed?
Added PRE+POST tool scope support for
UiPathPIIDetectionMiddlewareandUiPathHarmfulContentMiddleware, matching the behavior of UiPath low-code agents where guardrails are evaluated against both tool input (PRE) and tool output (POST).Key changes:
pii_detection.py/harmful_content.py: Addedstage: GuardrailExecutionStage = PRE_AND_POSTparameter. TOOL scope now evaluates the guardrail against tool args (PRE) and against the tool's return value (POST)._base.py: Extracted_extract_tool_output_data,_extract_tool_input_data, and_create_tool_wrap_hookintoBuiltInGuardrailMiddlewareMixinto eliminate copy-paste duplication between the two middlewares. The inner closure was further extracted into_run_tool_guardrail(request, handler)to make the PRE+POST logic directly testable.samples/joke-agent/graph.py: AddedUiPathHarmfulContentMiddlewareon TOOL scope and explicitstage=PRE_AND_POSTto both TOOL-scope middlewares.uipath-langchain0.11.3 → 0.11.4.How has this been tested?
tests/guardrails/middlewares/test_tool_wrap.py): Cover all branches of_extract_tool_input_data,_extract_tool_output_data, and_run_tool_guardrail— tool-name skipping, PRE-only, POST-only, PRE+POST stage routing,GuardrailBlockException→AgentRuntimeErrorconversion,AgentRuntimeErrorre-raise, generic exception logging, request/result modification, and empty-output skipping.test_guardrails_in_langgraph.py):test_tool_pii_post_blockandtest_tool_harmful_content_post_block, each running against both middleware and decorator flavors, verifying that violations in tool output triggerBlockActioncorrectly.samples/joke-agent— HTTP logs show two guardrail API calls per tool invocation (PRE + POST).ruff checkandmypypass on all changed files.Are there any breaking changes?