Skip to content

Fix #5988: Add memory poisoning protection via MemorySanitizer#5989

Closed
devin-ai-integration[bot] wants to merge 1 commit into
mainfrom
devin/1780244418-memory-poisoning-protection
Closed

Fix #5988: Add memory poisoning protection via MemorySanitizer#5989
devin-ai-integration[bot] wants to merge 1 commit into
mainfrom
devin/1780244418-memory-poisoning-protection

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Summary

Addresses the memory poisoning vulnerability described in #5988 — adversarial inputs stored in agent memory can later be retrieved and injected into agent prompts, causing prompt injection attacks (instruction overrides, role hijacking, secret leakage).

Adds MemorySanitizer — a regex-based detection layer that neutralizes six categories of prompt injection patterns:

Category Example trigger
system_override "System prompt: ..."
instruction_override "Ignore all previous instructions"
role_hijack "You are now ...", "Pretend to be ..."
command_injection "Do not follow: ...", "New instructions: ..."
hidden_instruction "[INST] ...", "[SYSTEM] ..."
jailbreak_attempt "jailbreak", "bypass safety", "developer mode"

Matched patterns are replaced with [SANITIZED:<label>] markers and logged as warnings.

Integration points (defense-in-depth):

  • Memory.save() — sanitizes content at write time (catches all memory types that call super().save())
  • LongTermMemory.save() — sanitizes task_description and metadata fields (suggestions, expected_output) which bypass the base class
  • ContextualMemory.build_context_for_task() — sanitizes the merged context string before it's injected into the agent prompt, protecting against pre-existing poisoned data

Configuration:

crew = Crew(
    memory=True,
    memory_config={"sanitize_memory": False},  # opt-out; default is True
)

Users can also pass a custom MemorySanitizer instance to Memory(storage=..., sanitizer=...) for fine-grained control (e.g. custom max_content_length).

41 new tests cover all pattern categories, Memory.save() integration, ContextualMemory retrieval sanitization, config toggle, edge cases (empty/non-string/mixed content).

Link to Devin session: https://app.devin.ai/sessions/e01c9dede6014eab83d5f3a730cba346

- Add MemorySanitizer class that detects and neutralizes prompt injection
  patterns (system overrides, instruction overrides, role hijacking,
  command injection, hidden instructions, jailbreak attempts)
- Integrate sanitization into Memory.save() base class for write-time protection
- Integrate sanitization into ContextualMemory.build_context_for_task() for
  defense-in-depth on retrieval
- Sanitize LongTermMemory metadata (suggestions, expected_output)
- Add sanitize_memory config option in memory_config (default: True)
- Add 41 tests covering all injection pattern categories, integration with
  Memory.save(), ContextualMemory retrieval, config toggle, and edge cases

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@@ -0,0 +1,297 @@
"""Tests for memory sanitization / memory poisoning protection."""

from unittest.mock import MagicMock, patch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant