Conversation
Wraps with [UD-<id>]...[/UD-<id>] markers are now off by default. Callers that want them set annotateBoundary: true on createPromptDefense options. - Skips generateDataBoundary() entirely when disabled (no nanoid() call per tool result; no boundary_annotation entries in methodsByField) - Hard gate across all risk levels (old alwaysAnnotate only gated low) - Explicit methods: ["boundary_annotation"] in SanitizeOptions still wraps regardless of the flag (per-call escape hatch) Non-breaking in practice: generateBoundaryInstructions — the system-prompt template that teaches an LLM what [UD-*] means — has zero downstream consumers across defender/connect/connect-handler/unified-cloud-api, so the tags were inert scaffolding costing per-field metadata noise and output bloat. SFE and Tier 2 already strip boundary markers on input (v0.6.2) so self-flagging is unaffected. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
8edd268 to
9c2a23c
Compare
There was a problem hiding this comment.
Pull request overview
Makes boundary annotation ([UD-<id>]...[/UD-<id>]) opt-in across the sanitizer pipeline via a new annotateBoundary flag, defaulting to false, and updates tests/docs to reflect the new default behavior.
Changes:
- Replace risk-level/legacy gating with a hard opt-in
annotateBoundaryswitch inSanitizer, plus an explicit per-callmethods: ["boundary_annotation"]escape hatch. - Plumb
annotateBoundarythroughPromptDefenseOptions→ToolResultSanitizerConfig, and skipgenerateDataBoundary()entirely when wrapping is disabled. - Update specs and README to validate/document default-off behavior and opt-in wrapping.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/sanitizers/sanitizer.ts | Introduces annotateBoundary config, changes boundary wrapping to opt-in, keeps explicit method override behavior. |
| src/core/tool-result-sanitizer.ts | Adds annotateBoundary config and avoids boundary generation when disabled; passes flag into createSanitizer. |
| src/core/prompt-defense.ts | Adds annotateBoundary option to the public PromptDefenseOptions and forwards it to ToolResultSanitizer. |
| specs/sanitizers.spec.ts | Updates unit/integration expectations for default-off wrapping and adds opt-in/override coverage. |
| specs/integration.spec.ts | Adds PromptDefense-level assertions for default-off vs opt-in boundary wrapping; opts into wrapping in a scenario test. |
| README.md | Documents boundary annotation as opt-in and references system-prompt pairing guidance. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Callers that opt into annotateBoundary need the system-prompt template that tells an LLM how to handle [UD-*] markers. Previously only available via a deep import. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…caller SFE was replacing the output value with the filtered payload, permanently dropping metadata/identifier fields before Tier 1 sanitization and the final DefenseResult.sanitized. This meant the LLM received a truncated tool result whenever useSfe was enabled. Fix: scope sfeFilteredValue to Tier 2 string extraction only. Tier 1 sanitization and the returned sanitized payload always operate on the original value. fieldsDropped now documents paths excluded from classification, not paths absent from the returned data. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
7d590ea
There was a problem hiding this comment.
0 issues found across 1 file (changes from recent commits).
Requires human review: Modifies core sanitization logic and changes default behavior for boundary annotation. The SFE filtering fix also alters data flow in a core path, requiring human verification.
- Update annotateBoundary JSDoc to reference the exported package symbol instead of the internal utils/boundary path - Add sfe.spec.ts test asserting DefenseResult.sanitized retains fields that SFE drops from Tier 2 classification Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
0 issues found across 2 files (changes from recent commits).
Requires human review: Changes core sanitization logic and default output behavior in a security package. The data flow changes in the defense pipeline (SFE fix) warrant human verification.
Summary
Opt-in boundary annotation (
annotateBoundary)[UD-<id>]...[/UD-<id>]) is now opt-in viaannotateBoundaryonPromptDefenseOptions/ToolResultSanitizerConfig/SanitizerConfig. Default:false.alwaysAnnotateflag only gatedlowrisk; medium/high still wrapped. New flag is unconditional.methods: ['boundary_annotation']inSanitizeOptionsstill wraps regardless of the flag (per-call escape hatch).generateDataBoundary()is skipped entirely — nonanoid()call per tool result, no"boundary_annotation"entries inmetadata.methodsByField.generateBoundaryInstructions()andcontainsBoundaryPatterns()from the package entrypoint for callers that opt in.generateBoundaryInstructions()— the system-prompt template that teaches an LLM to treat[UD-*]content as untrusted data — had zero downstream consumers acrossdefender,connect,connect-handler, andunified-cloud-api. The tags were inert scaffolding, so making them opt-in removes per-field metadata noise and output bloat for all existing callers while preserving the feature for anyone who adds the system-prompt half of the contract.Fix: SFE filtering is classifier-only
SFE was replacing the output with the filtered payload, permanently dropping metadata/identifier fields from
DefenseResult.sanitized. This meant the LLM received a truncated tool result wheneveruseSfewas enabled — losing IDs, cursors, and other fields the agent might need for follow-up calls.Fix:
sfeFilteredValuenow feeds Tier 2 string extraction only. Tier 1 sanitization and the returnedsanitizedpayload always operate on the original value.fieldsDroppedis now accurately documented as "fields excluded from classification" not "fields absent from output".Non-breaking
[UD-*]tags appearing in output (confirmed by grepping all downstream repos).useSfeare unaffected.Downstream
connect-sdkDefenderSettingswill gainannotateBoundaryin a follow-up PR for exposure in unified-cloud-api project settings.Test plan
[UD-at any risk level), flag on (end-to-end viadefendToolResult), explicit-methods escape hatch.npm test— 206/207 passing (pre-existing ONNX batch classification flake on cleanmain).npx tsc --noEmit— clean.🤖 Generated with Claude Code