feat(security): response verification layer for post-LLM injection detection (#1862) #2149

Merged
bug-ops merged 1 commit into main from feat-issue-1862-research-security-multi-layer
Mar 23, 2026

@bug-ops (Owner) commented Mar 23, 2026

Summary

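Implements the missing third layer of Zeph's defense-in-depth pipeline, following arXiv 2511.15759 (SecureAgent benchmark). The new layer verifies each LLM response after generation and before tool dispatch, so that injection patterns echoed by the model can be caught before they reach a tool. The goal is to reduce attack success.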

  • Layer 1 (existing): ContentSanitizer — scans untrusted input before context insertion
  • Layer 2 (existing): ExfiltrationGuard + PreExecutionVerifier — output filtering + pre-dispatch tool audit
  • Layer 3 (this PR): ResponseVerifier — scans LLM responses for echoed injection patterns

Changes

  • zeph-sanitizer: new ResponseVerifier struct with verify(ctx) -> ResponseVerificationResult
  • zeph-tools/patterns.rs: 7 curated RAW_RESPONSE_PATTERNS (separate from input-side patterns to avoid false positives)
  • zeph-config: ResponseVerificationConfig with enabled=true, block_on_detection=false defaults
  • zeph-core agent loop: called in native and legacy tool-execution paths after LLM response, before tool dispatch
  • TUI SEC panel: [rver] badge via new SecurityEventCategory::ResponseVerification variant
  • 10 unit tests covering all pattern categories and Clean/Flagged/Blocked result paths
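
As a rough illustration of how the pieces listed above could fit together, here is a minimal sketch. It is not the code from this PR. The names `ResponseVerifier`, `ResponseVerificationResult`, `ResponseVerificationConfig`, `enabled`, `block_on_detection`, and the `Clean`/`Flagged`/`Blocked` variants come from the description; everything else is an assumption made for the example. That includes the `EXAMPLE_PATTERNS` entries, the variant payloads, the `&str` argument (the PR's `verify` takes a `ctx`), and the use of case-insensitive substring matching in place of the real regex patterns.

```rust
/// Outcome of scanning one LLM response.
/// Variant names are from the PR; the payloads (matched pattern names) are assumed.
#[derive(Debug, PartialEq)]
pub enum ResponseVerificationResult {
    Clean,
    Flagged(Vec<&'static str>),
    Blocked(Vec<&'static str>),
}

/// Field names and defaults follow the PR description.
#[derive(Debug, Clone, Copy)]
pub struct ResponseVerificationConfig {
    pub enabled: bool,
    pub block_on_detection: bool,
}

impl Default for ResponseVerificationConfig {
    fn default() -> Self {
        Self { enabled: true, block_on_detection: false }
    }
}

/// Hypothetical stand-ins for the curated patterns: (name, lowercase needle).
/// The actual RAW_RESPONSE_PATTERNS are regexes and are not shown in this PR text.
const EXAMPLE_PATTERNS: [(&str, &str); 2] = [
    ("override_echo", "ignore all previous instructions"),
    ("role_echo", "new system prompt:"),
];

pub struct ResponseVerifier {
    config: ResponseVerificationConfig,
}

impl ResponseVerifier {
    pub fn new(config: ResponseVerificationConfig) -> Self {
        Self { config }
    }

    /// Scans a response and reports which example patterns it contains.
    /// Simplification: takes the raw text instead of a context object.
    pub fn verify(&self, response: &str) -> ResponseVerificationResult {
        if !self.config.enabled {
            return ResponseVerificationResult::Clean;
        }
        let lowered = response.to_lowercase();
        let mut hits = Vec::new();
        for (name, needle) in EXAMPLE_PATTERNS {
            if lowered.contains(needle) {
                hits.push(name);
            }
        }
        if hits.is_empty() {
            ResponseVerificationResult::Clean
        } else if self.config.block_on_detection {
            ResponseVerificationResult::Blocked(hits)
        } else {
            ResponseVerificationResult::Flagged(hits)
        }
    }
}
```

With the default config, `ResponseVerifier::new(ResponseVerificationConfig::default()).verify(text)` in this sketch only ever returns `Clean` or `Flagged`; `Blocked` requires `block_on_detection` to be set to `true`.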

Test plan

  • cargo +nightly fmt --check — clean
  • cargo clippy --workspace --features full -- -D warnings — clean
  • cargo nextest run --workspace --features full --lib --bins — 6407 passed (+10 vs baseline 6397)
  • All 10 response verifier tests pass
  • raw_response_patterns_all_compile test guards against regex regressions

Closes #1862

…n detection (#1862)

Implements the third layer of Zeph's defense-in-depth pipeline per arXiv 2511.15759.
ContentSanitizer covers untrusted input and ExfiltrationGuard covers output filtering; ResponseVerifier covers raw LLM responses before tool dispatch.

- ResponseVerifier in zeph-sanitizer scans LLM responses before tool dispatch
- 7 curated RAW_RESPONSE_PATTERNS (separate from input-side RAW_INJECTION_PATTERNS)
  to avoid false positives on normal LLM output
- Config: [security.response_verification] enabled=true, block_on_detection=false
- Integration: native and legacy tool-execution paths, post-LLM pre-dispatch
- TUI SEC panel: [rver] badge via SecurityEventCategory::ResponseVerification
- 10 unit tests covering all patterns and Clean/Flagged/Blocked result variants
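
The "post-LLM pre-dispatch" integration point can be pictured with the small sketch below. It is an assumption about how an agent loop might act on the three result variants, not the zeph-core code: `DispatchDecision`, `gate_tool_dispatch`, and `dispatch_if_allowed` are hypothetical names, and the behavior chosen for `Flagged` (record an event, still dispatch) and `Blocked` (skip dispatch) is inferred from the `block_on_detection` flag rather than stated in the commit.

```rust
/// Result variants named in the commit message; payload-free here for brevity.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ResponseVerificationResult {
    Clean,
    Flagged,
    Blocked,
}

/// Hypothetical decision type for the agent loop.
#[derive(Debug, PartialEq)]
pub enum DispatchDecision {
    /// Nothing detected: run the tool calls.
    Proceed,
    /// Detection in log-only mode: run the tool calls, but surface a security event.
    ProceedWithEvent,
    /// Detection in blocking mode: do not run any tool call from this response.
    Abort,
}

/// Maps a verification result to what the loop does next (assumed policy).
pub fn gate_tool_dispatch(result: ResponseVerificationResult) -> DispatchDecision {
    match result {
        ResponseVerificationResult::Clean => DispatchDecision::Proceed,
        ResponseVerificationResult::Flagged => DispatchDecision::ProceedWithEvent,
        ResponseVerificationResult::Blocked => DispatchDecision::Abort,
    }
}

/// Runs every tool call unless the gate says to abort, then returns the decision
/// so the caller can record an event (for example the [rver] badge) if needed.
pub fn dispatch_if_allowed<F>(
    result: ResponseVerificationResult,
    tool_calls: &[&str],
    mut run_tool: F,
) -> DispatchDecision
where
    F: FnMut(&str),
{
    let decision = gate_tool_dispatch(result);
    if decision != DispatchDecision::Abort {
        for call in tool_calls {
            run_tool(*call);
        }
    }
    decision
}
```

Placing the gate in one function that both the native and the legacy tool-execution paths call would keep the two paths from drifting apart, though the commit does not say whether the real integration is shared this way.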

@github-actions bot added labels Mar 23, 2026: documentation (Improvements or additions to documentation), rust (Rust code changes), core (zeph-core crate)
@bug-ops enabled auto-merge (squash) March 23, 2026 00:35
@github-actions bot added labels Mar 23, 2026: enhancement (New feature or request), size/L (Large PR, 201-500 lines)
@bug-ops merged commit 6f2ef15 into main Mar 23, 2026
25 checks passed
@bug-ops deleted the feat-issue-1862-research-security-multi-layer branch March 23, 2026 00:42
Labels

  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • rust: Rust code changes
  • size/L: Large PR (201-500 lines)

Development

Successfully merging this pull request may close these issues.

research(security): multi-layer prompt injection defense with response verification (SecureAgent)
