Skip to content

feat(guardrails): add PRE+POST tool scope support for PII and HarmfulContent middleware [AL-445]#868

Merged
apetraru-uipath merged 11 commits into
mainfrom
fix/guardrails_on_tools_scopes
May 21, 2026
Merged

feat(guardrails): add PRE+POST tool scope support for PII and HarmfulContent middleware [AL-445]#868
apetraru-uipath merged 11 commits into
mainfrom
fix/guardrails_on_tools_scopes

Conversation

@apetraru-uipath
Copy link
Copy Markdown
Contributor

@apetraru-uipath apetraru-uipath commented May 20, 2026

What changed?

Added PRE+POST tool scope support for UiPathPIIDetectionMiddleware and UiPathHarmfulContentMiddleware, matching the behavior of UiPath low-code agents where guardrails are evaluated against both tool input (PRE) and tool output (POST).

Key changes:

  • pii_detection.py / harmful_content.py: Added stage: GuardrailExecutionStage = PRE_AND_POST parameter. TOOL scope now evaluates the guardrail against tool args (PRE) and against the tool's return value (POST).
  • _base.py: Extracted _extract_tool_output_data, _extract_tool_input_data, and _create_tool_wrap_hook into BuiltInGuardrailMiddlewareMixin to eliminate copy-paste duplication between the two middlewares. The inner closure was further extracted into _run_tool_guardrail(request, handler) to make the PRE+POST logic directly testable.
  • samples/joke-agent/graph.py: Added UiPathHarmfulContentMiddleware on TOOL scope and explicit stage=PRE_AND_POST to both TOOL-scope middlewares.
  • Version bump: uipath-langchain 0.11.3 → 0.11.4.

How has this been tested?

  • 27 new unit tests (tests/guardrails/middlewares/test_tool_wrap.py): Cover all branches of _extract_tool_input_data, _extract_tool_output_data, and _run_tool_guardrail — tool-name skipping, PRE-only, POST-only, PRE+POST stage routing, GuardrailBlockExceptionAgentRuntimeError conversion, AgentRuntimeError re-raise, generic exception logging, request/result modification, and empty-output skipping.
  • 4 new e2e parity test runs (test_guardrails_in_langgraph.py): test_tool_pii_post_block and test_tool_harmful_content_post_block, each running against both middleware and decorator flavors, verifying that violations in tool output trigger BlockAction correctly.
  • Hook wiring tests extended with TOOL-scope assertions for both affected middlewares.
  • Validated end-to-end by running samples/joke-agent — HTTP logs show two guardrail API calls per tool invocation (PRE + POST).
  • ruff check and mypy pass on all changed files.

Are there any breaking changes?

  • Under Feature Flag
  • None
  • DB migrations required
  • API removals or changes

Copy link
Copy Markdown
Contributor

@valentinabojan valentinabojan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a closer look at the new BuiltInGuardrailMiddlewareMixin._run_tool_guardrail. The refactor and test coverage are solid. Posting 5 inline comments on _base.py:

  • Sync HTTP call from async context (now fires up to 2× per tool invocation under the new default)
  • _extract_tool_output_data only inspects messages[0] of a Command
  • _tool_stage / _tool_names are annotation-only on the mixin — AttributeError trap for future TOOL-scope subclasses
  • input_data extracted unconditionally even on POST-only path (wasted work)
  • PRE/POST asymmetry on the isinstance(..., dict) guard around modified_*

No approval blocker — flagging as COMMENT.

GuardrailExecutionStage.PRE_AND_POST,
):
try:
result = self._evaluate_guardrail(input_data)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_evaluate_guardrail (line 57) is a synchronous HTTP call to the UiPath guardrails API, but _run_tool_guardrail is async. With the new default of PRE_AND_POST, every tool invocation now blocks the event loop twice — once here and again on line 157. Under any concurrent agent load this is a real throughput regression vs. the old PRE-only path.

Cheapest fix is to push the sync call into a worker thread (needs import asyncio at the top):

Suggested change
result = self._evaluate_guardrail(input_data)
result = await asyncio.to_thread(self._evaluate_guardrail, input_data)

And the same swap on line 157. Long-term, an async evaluate_guardrail on the SDK side would be cleaner.

update = result.update if hasattr(result, "update") else {}
messages = update.get("messages", []) if isinstance(update, dict) else []
content = (
messages[0].content
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Command(update={"messages": [...]}) can carry multiple messages, and the ToolMessage isn't guaranteed to be first. If a flow emits e.g. [AIMessage(...), ToolMessage(<real content>)], this branch reads messages[0], sees it isn't a ToolMessage, sets content = None, and POST silently gets no content to validate.

Iterate to find the first ToolMessage instead:

Suggested change
messages[0].content
tool_msg = next(
(m for m in messages if isinstance(m, ToolMessage)), None
)
content = tool_msg.content if tool_msg is not None else None

_name: str
action: GuardrailAction
_tool_names: list[str] | None
_tool_stage: GuardrailExecutionStage
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_tool_stage and _tool_names are annotation-only here — no class-level binding. Today both subclasses (pii_detection.py, harmful_content.py) set them in __init__, so it works. But the moment a future TOOL-scope subclass forgets, _run_tool_guardrail will AttributeError at runtime — not at construction — and only when a specific tool is called. The mixin docstring also only mentions _guardrail / _name / action as required attrs, which makes this easy to miss.

Suggested change
_tool_stage: GuardrailExecutionStage
_tool_names: list[str] | None = None
_tool_stage: GuardrailExecutionStage = GuardrailExecutionStage.PRE_AND_POST

And extend the docstring (lines 38–41) to list these alongside the others.

if self._tool_names is None or sanitized_tool_name not in self._tool_names:
return await handler(request)

input_data = self._extract_tool_input_data(request)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input_data is extracted on every invocation, even when stage=POST and the value is never read. Cheap call, but pointless work — move it inside the PRE block (right after try: on line 132). Then a POST-only middleware skips the dict copy / str() conversion entirely.

try:
result = self._evaluate_guardrail(output_data)
modified_output = self._handle_validation_result(result, output_data)
if modified_output is not None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asymmetry with line 135: PRE only re-wraps the request when isinstance(modified_input, dict), but POST forwards any non-None to create_modified_tool_result. The helper actually handles both str and dict (_utils.py:71-82), so the difference is intentional — but it's load-bearing in a way that's not obvious from the call site, and the next refactor pass is likely to "tidy it up" and break things.

At minimum, add a one-line comment:

Suggested change
if modified_output is not None:
if modified_output is not None:
# POST: ToolMessage.content accepts both str and dict
tool_result = create_modified_tool_result(tool_result, modified_output)

Or unify the contract by letting create_modified_tool_request accept str too, and gating both sites the same way.

return final_seen


class UiPathChat(_UpstreamUiPathChat):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original UiPathChat was a no-op stub (pass). The change was needed to fix a real bug in the joke agent output.

Root cause: When LangChain tracing callbacks are registered, ainvoke uses the streaming API path. Each SSE chunk from the UiPath
LLM gateway includes model_name, id, and created in its generation_info. LangChain accumulates chunks via AIMessageChunk.add
merge_dicts, which string-concatenates values for these keys. With 14 SSE chunks, the final AI message had:

response_metadata.model_name = "gpt-4o-2024-08-06gpt-4o-2024-08-06gpt-4o-2024-08-06..." # × 14
response_metadata.finish_reason = "stopstop" # sent in 2 final chunks

The fix strips model_name, id, created from intermediate chunks (they only belong on the final chunk), and deduplicates when the
API sends finish_reason twice. The parent class _UpstreamUiPathChat includes these fields in every chunk by design (for standalone
streaming use), so the override is the correct layer to fix this without patching the upstream client.

The change is minimal: a module-level _strip_chunk_metadata helper + overrides of the two streaming methods that call super() and
filter each chunk through it. No behavior changes for non-streaming invocations.

apetraru-uipath and others added 8 commits May 21, 2026 16:02
…Content middleware

UiPathPIIDetectionMiddleware and UiPathHarmfulContentMiddleware now evaluate
the guardrail against both the tool input (PRE) and tool output (POST) when
scopes=[GuardrailScope.TOOL] is set, matching the behavior of low-code agents.

A new `stage` parameter (default PRE_AND_POST) controls which stages run.
_extract_tool_output_data added to BuiltInGuardrailMiddlewareMixin for shared
output parsing logic (ToolMessage | Command -> dict).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _extract_tool_input_data and the entire PRE+POST wrap_tool_call
implementation into BuiltInGuardrailMiddlewareMixin._create_tool_wrap_hook,
eliminating duplication between UiPathPIIDetectionMiddleware and
UiPathHarmfulContentMiddleware.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract _run_tool_guardrail from closure in _create_tool_wrap_hook to
make the PRE+POST evaluation logic directly testable. Add 27 unit tests
covering all branches in _extract_tool_input_data, _extract_tool_output_data,
and _run_tool_guardrail. Add 2 parity scenarios (× 2 flavors) verifying
that PII and HarmfulContent middlewares correctly block on tool output (POST).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on 3.11 event loop compat

- Override _uipath_stream/_uipath_astream in UiPathChat to strip model_name/id/created
  from intermediate SSE chunks and deduplicate finish_reason on duplicate final chunks,
  preventing string-concatenation via AIMessageChunk.__add__
- Add _utils/_compat.py: backport CPython 3.12 fix for BaseEventLoop.is_closed() using
  getattr(self, '_closed', True) to suppress Homebrew Python 3.11 GC AttributeError warnings
- Push both sync _evaluate_guardrail calls in _run_tool_guardrail to asyncio.to_thread
  to avoid blocking the event loop on guardrail HTTP calls
- Move input_data extraction inside the PRE block (skip for POST-only middlewares)
- Unify PRE/POST action result gating: both now forward any non-None result; widen
  create_modified_tool_request to accept str | dict to match create_modified_tool_result
- Fix joke-agent graph.py: use input_schema/output_schema (deprecated input=/output= kwargs)
- Update test_tool_wrap.py: add Command-with-non-first-ToolMessage test; mypy fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eption

- Extract _get_tool_message_content and _parse_str_content module-level helpers
  to reduce _extract_tool_output_data cognitive complexity from 27 to 3
- Replace logger.error(..., exc_info=True) with logger.exception() in all three
  exception handlers (SonarCloud fix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New tests/chat/test_models.py: covers _strip_chunk_metadata and both
  _uipath_stream/_uipath_astream overrides (chat/models.py → 100%)
- Extend tests/guardrails/middlewares/test_tool_wrap.py: add
  TestCheckMessages (9 tests), TestCreateToolWrapHook (1 test), and
  non-str/non-dict content edge case (_base.py → 95.5%)
- New tests/guardrails/middlewares/test_utils.py: covers
  create_modified_tool_request fallback, create_modified_tool_result
  all branches, and extract_text_from_messages (_utils.py → 100%)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@apetraru-uipath apetraru-uipath force-pushed the fix/guardrails_on_tools_scopes branch from 58096df to 435fed6 Compare May 21, 2026 13:02
apetraru-uipath and others added 3 commits May 21, 2026 16:06
- _utils.py: suppress typeddict-item error on `args` assignment, which
  intentionally accepts str | dict[str, Any] after the earlier contract
  unification
- test_utils.py: annotate message lists as list[BaseMessage] to satisfy
  invariant list variance; add same typeddict-item suppression in helper
- test_models.py: pass model_name explicitly to model_construct to
  satisfy mypy's required-argument check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without __init__.py, pytest cannot form unique module paths when two
directories contain a file with the same basename (e.g. test_utils.py
exists in both tests/agent/tools/ and tests/guardrails/middlewares/).
Adding __init__.py throughout makes every test directory a proper Python
package, so each module gets a fully-qualified name and the collision
disappears.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

apetraru-uipath added a commit that referenced this pull request May 21, 2026
… fix

- Add UiPathPromptInjectionMiddleware to middleware table and imports (threshold float 0–1)
- Clarify that PII and HarmfulContent support TOOL scope (fixed in PR #868 via _run_tool_guardrail)
- Reformat middleware table with Supported scopes / Stage support / Extra parameters columns
- Add ## Reference section with complete value tables for GuardrailScope, GuardrailExecutionStage,
  LoggingSeverityLevel, PIIDetectionEntityType (19 values), HarmfulContentEntityType,
  and IntellectualPropertyEntityType including threshold constraints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@apetraru-uipath apetraru-uipath enabled auto-merge (squash) May 21, 2026 13:49
@apetraru-uipath apetraru-uipath merged commit 20c34c4 into main May 21, 2026
49 checks passed
@apetraru-uipath apetraru-uipath deleted the fix/guardrails_on_tools_scopes branch May 21, 2026 14:42
apetraru-uipath added a commit that referenced this pull request May 21, 2026
… fix

- Add UiPathPromptInjectionMiddleware to middleware table and imports (threshold float 0–1)
- Clarify that PII and HarmfulContent support TOOL scope (fixed in PR #868 via _run_tool_guardrail)
- Reformat middleware table with Supported scopes / Stage support / Extra parameters columns
- Add ## Reference section with complete value tables for GuardrailScope, GuardrailExecutionStage,
  LoggingSeverityLevel, PIIDetectionEntityType (19 values), HarmfulContentEntityType,
  and IntellectualPropertyEntityType including threshold constraints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants