Fix #5988: Add memory poisoning protection via MemorySanitizer#5989
Closed
devin-ai-integration[bot] wants to merge 1 commit into
Closed
Fix #5988: Add memory poisoning protection via MemorySanitizer#5989devin-ai-integration[bot] wants to merge 1 commit into
devin-ai-integration[bot] wants to merge 1 commit into
Conversation
- Add MemorySanitizer class that detects and neutralizes prompt injection patterns (system overrides, instruction overrides, role hijacking, command injection, hidden instructions, jailbreak attempts) - Integrate sanitization into Memory.save() base class for write-time protection - Integrate sanitization into ContextualMemory.build_context_for_task() for defense-in-depth on retrieval - Sanitize LongTermMemory metadata (suggestions, expected_output) - Add sanitize_memory config option in memory_config (default: True) - Add 41 tests covering all injection pattern categories, integration with Memory.save(), ContextualMemory retrieval, config toggle, and edge cases Co-Authored-By: João <joao@crewai.com>
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
| @@ -0,0 +1,297 @@ | |||
| """Tests for memory sanitization / memory poisoning protection.""" | |||
|
|
|||
| from unittest.mock import MagicMock, patch | |||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses the memory poisoning vulnerability described in #5988 — adversarial inputs stored in agent memory can later be retrieved and injected into agent prompts, causing prompt injection attacks (instruction overrides, role hijacking, secret leakage).
Adds
MemorySanitizer— a regex-based detection layer that neutralizes six categories of prompt injection patterns:system_override"System prompt: ..."instruction_override"Ignore all previous instructions"role_hijack"You are now ...","Pretend to be ..."command_injection"Do not follow: ...","New instructions: ..."hidden_instruction"[INST] ...","[SYSTEM] ..."jailbreak_attempt"jailbreak","bypass safety","developer mode"Matched patterns are replaced with
[SANITIZED:<label>]markers and logged as warnings.Integration points (defense-in-depth):
Memory.save()— sanitizes content at write time (catches all memory types that callsuper().save())LongTermMemory.save()— sanitizestask_descriptionand metadata fields (suggestions,expected_output) which bypass the base classContextualMemory.build_context_for_task()— sanitizes the merged context string before it's injected into the agent prompt, protecting against pre-existing poisoned dataConfiguration:
Users can also pass a custom
MemorySanitizerinstance toMemory(storage=..., sanitizer=...)for fine-grained control (e.g. custommax_content_length).41 new tests cover all pattern categories,
Memory.save()integration,ContextualMemoryretrieval sanitization, config toggle, edge cases (empty/non-string/mixed content).Link to Devin session: https://app.devin.ai/sessions/e01c9dede6014eab83d5f3a730cba346