fix(ENG-12707, ENG-12708): make boundary annotation opt-in (annotateBoundary flag) by hiskudin · Pull Request #57 · StackOneHQ/defender

hiskudin · 2026-04-22T15:13:02Z

Summary

Opt-in boundary annotation (`annotateBoundary`)

Boundary wrapping ([UD-<id>]...[/UD-<id>]) is now opt-in via annotateBoundary on PromptDefenseOptions / ToolResultSanitizerConfig / SanitizerConfig. Default: false.
Hard gate across all risk levels — the old alwaysAnnotate flag only gated low risk; medium/high still wrapped. New flag is unconditional.
Explicit methods: ['boundary_annotation'] in SanitizeOptions still wraps regardless of the flag (per-call escape hatch).
When disabled, generateDataBoundary() is skipped entirely — no nanoid() call per tool result, no "boundary_annotation" entries in metadata.methodsByField.
Exported generateBoundaryInstructions() and containsBoundaryPatterns() from the package entrypoint for callers that opt in.

generateBoundaryInstructions() — the system-prompt template that teaches an LLM to treat [UD-*] content as untrusted data — had zero downstream consumers across defender, connect, connect-handler, and unified-cloud-api. The tags were inert scaffolding, so making them opt-in removes per-field metadata noise and output bloat for all existing callers while preserving the feature for anyone who adds the system-prompt half of the contract.

Fix: SFE filtering is classifier-only

SFE was replacing the output with the filtered payload, permanently dropping metadata/identifier fields from DefenseResult.sanitized. This meant the LLM received a truncated tool result whenever useSfe was enabled — losing IDs, cursors, and other fields the agent might need for follow-up calls.

Fix: sfeFilteredValue now feeds Tier 2 string extraction only. Tier 1 sanitization and the returned sanitized payload always operate on the original value. fieldsDropped is now accurately documented as "fields excluded from classification" not "fields absent from output".

Non-breaking

No known caller relies on [UD-*] tags appearing in output (confirmed by grepping all downstream repos).
Tier 2 still strips boundary markers on input (v0.6.2 fix(ENG-12702): strip boundary markers from input before classification #55), so the self-flagging feedback loop is unaffected.
SFE fix is purely additive correctness — callers not using useSfe are unaffected.

Downstream

connect-sdk DefenderSettings will gain annotateBoundary in a follow-up PR for exposure in unified-cloud-api project settings.

Test plan

Existing sanitizer + integration specs updated to opt-in or assert default-off behavior.
New tests: default off (no [UD- at any risk level), flag on (end-to-end via defendToolResult), explicit-methods escape hatch.
npm test — 206/207 passing (pre-existing ONNX batch classification flake on clean main).
npx tsc --noEmit — clean.

🤖 Generated with Claude Code

Wraps with [UD-<id>]...[/UD-<id>] markers are now off by default. Callers that want them set annotateBoundary: true on createPromptDefense options. - Skips generateDataBoundary() entirely when disabled (no nanoid() call per tool result; no boundary_annotation entries in methodsByField) - Hard gate across all risk levels (old alwaysAnnotate only gated low) - Explicit methods: ["boundary_annotation"] in SanitizeOptions still wraps regardless of the flag (per-call escape hatch) Non-breaking in practice: generateBoundaryInstructions — the system-prompt template that teaches an LLM what [UD-*] means — has zero downstream consumers across defender/connect/connect-handler/unified-cloud-api, so the tags were inert scaffolding costing per-field metadata noise and output bloat. SFE and Tier 2 already strip boundary markers on input (v0.6.2) so self-flagging is unaffected. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Makes boundary annotation ([UD-<id>]...[/UD-<id>]) opt-in across the sanitizer pipeline via a new annotateBoundary flag, defaulting to false, and updates tests/docs to reflect the new default behavior.

Changes:

Replace risk-level/legacy gating with a hard opt-in annotateBoundary switch in Sanitizer, plus an explicit per-call methods: ["boundary_annotation"] escape hatch.
Plumb annotateBoundary through PromptDefenseOptions → ToolResultSanitizerConfig, and skip generateDataBoundary() entirely when wrapping is disabled.
Update specs and README to validate/document default-off behavior and opt-in wrapping.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/sanitizers/sanitizer.ts	Introduces `annotateBoundary` config, changes boundary wrapping to opt-in, keeps explicit method override behavior.
src/core/tool-result-sanitizer.ts	Adds `annotateBoundary` config and avoids boundary generation when disabled; passes flag into `createSanitizer`.
src/core/prompt-defense.ts	Adds `annotateBoundary` option to the public `PromptDefenseOptions` and forwards it to `ToolResultSanitizer`.
specs/sanitizers.spec.ts	Updates unit/integration expectations for default-off wrapping and adds opt-in/override coverage.
specs/integration.spec.ts	Adds PromptDefense-level assertions for default-off vs opt-in boundary wrapping; opts into wrapping in a scenario test.
README.md	Documents boundary annotation as opt-in and references system-prompt pairing guidance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cubic-dev-ai

No issues found across 6 files

_{Requires human review: This change modifies the default output behavior of a core sanitization path in a security tool. Such changes to default behavior require human validation of downstream impact.}

Callers that opt into annotateBoundary need the system-prompt template that tells an LLM how to handle [UD-*] markers. Previously only available via a deep import. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

cubic-dev-ai

0 issues found across 1 file (changes from recent commits).

_{Auto-approved: Makes boundary annotation opt-in to reduce output noise. Includes comprehensive test updates and documentation. Low risk as the feature had no downstream consumers.}

glebedel

LGTM

…caller SFE was replacing the output value with the filtered payload, permanently dropping metadata/identifier fields before Tier 1 sanitization and the final DefenseResult.sanitized. This meant the LLM received a truncated tool result whenever useSfe was enabled. Fix: scope sfeFilteredValue to Tier 2 string extraction only. Tier 1 sanitization and the returned sanitized payload always operate on the original value. fieldsDropped now documents paths excluded from classification, not paths absent from the returned data. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

cubic-dev-ai

0 issues found across 1 file (changes from recent commits).

_{Requires human review: Modifies core sanitization logic and changes default behavior for boundary annotation. The SFE filtering fix also alters data flow in a core path, requiring human verification.}

- Update annotateBoundary JSDoc to reference the exported package symbol instead of the internal utils/boundary path - Add sfe.spec.ts test asserting DefenseResult.sanitized retains fields that SFE drops from Tier 2 classification Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cubic-dev-ai

0 issues found across 2 files (changes from recent commits).

_{Requires human review: Changes core sanitization logic and default output behavior in a security package. The data flow changes in the defense pipeline (SFE fix) warrant human verification.}

OMauriStkOne

LGTM

Copilot AI review requested due to automatic review settings April 22, 2026 15:13

hiskudin requested a review from a team as a code owner April 22, 2026 15:13

Copilot started reviewing on behalf of hiskudin April 22, 2026 15:13 View session

hiskudin changed the title ~~feat: make boundary annotation opt-in (annotateBoundary flag)~~ feat(ENG-12707): make boundary annotation opt-in (annotateBoundary flag) Apr 22, 2026

hiskudin changed the title ~~feat(ENG-12707): make boundary annotation opt-in (annotateBoundary flag)~~ fix(ENG-12707): make boundary annotation opt-in (annotateBoundary flag) Apr 22, 2026

hiskudin force-pushed the feat/opt-in-boundary-annotation branch from 8edd268 to 9c2a23c Compare April 22, 2026 15:15

Copilot AI reviewed Apr 22, 2026

View reviewed changes

Comment thread src/core/prompt-defense.ts Outdated

Comment thread README.md

cubic-dev-ai Bot reviewed Apr 22, 2026

View reviewed changes

cubic-dev-ai Bot previously approved these changes Apr 22, 2026

View reviewed changes

glebedel previously approved these changes Apr 22, 2026

View reviewed changes

hiskudin dismissed stale reviews from glebedel and cubic-dev-ai[bot] via 7d590ea April 23, 2026 08:04

cubic-dev-ai Bot reviewed Apr 23, 2026

View reviewed changes

hiskudin changed the title ~~fix(ENG-12707): make boundary annotation opt-in (annotateBoundary flag)~~ fix(ENG-12707, ENG-12708): make boundary annotation opt-in (annotateBoundary flag) Apr 23, 2026

hiskudin requested a review from Copilot April 23, 2026 08:13

Copilot started reviewing on behalf of hiskudin April 23, 2026 08:13 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

cubic-dev-ai Bot reviewed Apr 23, 2026

View reviewed changes

OMauriStkOne approved these changes Apr 23, 2026

View reviewed changes

hiskudin merged commit bf10849 into main Apr 23, 2026
4 checks passed

hiskudin deleted the feat/opt-in-boundary-annotation branch April 23, 2026 08:51

stackone-devops-service-account mentioned this pull request Apr 23, 2026

chore(main): release defender 0.6.3 #58

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ENG-12707, ENG-12708): make boundary annotation opt-in (annotateBoundary flag)#57

fix(ENG-12707, ENG-12708): make boundary annotation opt-in (annotateBoundary flag)#57
hiskudin merged 4 commits intomainfrom
feat/opt-in-boundary-annotation

hiskudin commented Apr 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

glebedel left a comment

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

OMauriStkOne left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

hiskudin commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Opt-in boundary annotation (annotateBoundary)

Fix: SFE filtering is classifier-only

Non-breaking

Downstream

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

glebedel left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

OMauriStkOne left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hiskudin commented Apr 22, 2026 •

edited

Loading

Opt-in boundary annotation (`annotateBoundary`)