feat(security): response verification layer for post-LLM injection detection (#1862) by bug-ops · Pull Request #2149 · bug-ops/zeph

bug-ops · 2026-03-23T00:35:00Z

Summary

Implements the missing third layer of Zeph's defense-in-depth pipeline per arXiv 2511.15759 (SecureAgent benchmark). Reduces attack success by adding post-LLM response verification before tool dispatch.

Layer 1 (existing): ContentSanitizer — scans untrusted input before context insertion
Layer 2 (existing): ExfiltrationGuard + PreExecutionVerifier — output filtering + pre-dispatch tool audit
Layer 3 (this PR): ResponseVerifier — scans LLM responses for echoed injection patterns

Changes

zeph-sanitizer: new ResponseVerifier struct with verify(ctx) -> ResponseVerificationResult
zeph-tools/patterns.rs: 7 curated RAW_RESPONSE_PATTERNS (separate from input-side patterns to avoid false positives)
zeph-config: ResponseVerificationConfig with enabled=true, block_on_detection=false defaults
zeph-core agent loop: called in native and legacy tool-execution paths after LLM response, before tool dispatch
TUI SEC panel: [rver] badge via new SecurityEventCategory::ResponseVerification variant
10 unit tests covering all pattern categories and Clean/Flagged/Blocked result paths
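
A minimal sketch of how the Clean/Flagged/Blocked result paths could relate to the two config flags, for readers who want the control flow at a glance. Only the names `ResponseVerifier`, `ResponseVerificationResult`, `ResponseVerificationConfig`, `enabled`, and `block_on_detection` come from this PR. Everything else is an assumption made for illustration: the variant payloads, the `&str` argument (the PR's `verify` takes a `ctx`), the `SAMPLE_PATTERNS` list, and plain substring matching in place of the real regex-based `RAW_RESPONSE_PATTERNS`.

```rust
// Hypothetical sketch, not Zeph's actual implementation.

/// Outcome of scanning one LLM response (variant payloads are assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum ResponseVerificationResult {
    /// No pattern matched, or verification is disabled.
    Clean,
    /// Patterns matched, but the response is allowed through (log/badge only).
    Flagged(Vec<&'static str>),
    /// Patterns matched and tool dispatch should be stopped.
    Blocked(Vec<&'static str>),
}

/// Mirrors the two documented settings and their defaults.
#[derive(Debug, Clone)]
pub struct ResponseVerificationConfig {
    pub enabled: bool,
    pub block_on_detection: bool,
}

impl Default for ResponseVerificationConfig {
    fn default() -> Self {
        // Defaults as stated in the PR: detect, but do not block.
        Self { enabled: true, block_on_detection: false }
    }
}

/// Stand-in patterns (assumed); the real list holds 7 curated regexes.
const SAMPLE_PATTERNS: [&str; 2] = [
    "ignore previous instructions",
    "disregard the system prompt",
];

pub struct ResponseVerifier {
    config: ResponseVerificationConfig,
}

impl ResponseVerifier {
    pub fn new(config: ResponseVerificationConfig) -> Self {
        Self { config }
    }

    /// Scans a response after the LLM call and before tool dispatch.
    pub fn verify(&self, response: &str) -> ResponseVerificationResult {
        if !self.config.enabled {
            return ResponseVerificationResult::Clean;
        }
        // Case-insensitive substring match (a simplification of regex matching).
        let lowered = response.to_lowercase();
        let hits: Vec<&'static str> = SAMPLE_PATTERNS
            .iter()
            .copied()
            .filter(|p| lowered.contains(*p))
            .collect();

        if hits.is_empty() {
            ResponseVerificationResult::Clean
        } else if self.config.block_on_detection {
            ResponseVerificationResult::Blocked(hits)
        } else {
            ResponseVerificationResult::Flagged(hits)
        }
    }
}
```

Under these assumptions, the default config yields `Flagged` on a match, so a detection is reported without interrupting the agent loop; setting `block_on_detection` to true turns the same match into `Blocked`.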

Test plan

cargo +nightly fmt --check — clean
cargo clippy --workspace --features full -- -D warnings — clean
cargo nextest run --workspace --features full --lib --bins — 6407 passed (+10 vs baseline 6397)
All 10 response verifier tests pass
raw_response_patterns_all_compile test guards against regex regressions

Closes #1862

…n detection (#1862)

Implements the third layer of Zeph's defense-in-depth pipeline per arXiv 2511.15759. ContentSanitizer and ExfiltrationGuard cover input; ResponseVerifier covers LLM output.

- ResponseVerifier in zeph-sanitizer scans LLM responses before tool dispatch
- 7 curated RAW_RESPONSE_PATTERNS (separate from input-side RAW_INJECTION_PATTERNS) to avoid false positives on normal LLM output
- Config: [security.response_verification] enabled=true, block_on_detection=false
- Integration: native and legacy tool-execution paths, post-LLM pre-dispatch
- TUI SEC panel: [rver] badge via SecurityEventCategory::ResponseVerification
- 10 unit tests covering all patterns and Clean/Flagged/Blocked result variants

github-actions bot added the documentation, rust, and core labels Mar 23, 2026

bug-ops enabled auto-merge (squash) March 23, 2026 00:35

github-actions bot added the enhancement and size/L (Large PR, 201-500 lines) labels Mar 23, 2026

bug-ops merged commit 6f2ef15 into main Mar 23, 2026
25 checks passed

bug-ops deleted the feat-issue-1862-research-security-multi-layer branch March 23, 2026 00:42
