
refactor(memory): streamline memory extraction #1335

Merged
MODSetter merged 3 commits into MODSetter:dev from AnishSarkar22:fix/memory-extraction
May 5, 2026


Conversation

AnishSarkar22 (Contributor) commented May 2, 2026

Description

  • Used the shared extract_text_content utility to fix memory extraction from LLM responses
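For readers unfamiliar with the helper, the sketch below shows the kind of normalisation extract_text_content performs. It is a hypothetical reconstruction, not the SurfSense implementation: it assumes LLM response content is either a plain string or a list of content blocks shaped like the ones in CodeRabbit's suggested test ({"type": "text", ...} and {"type": "thinking", ...}).

```python
from typing import Any


def extract_text_content(content: Any) -> str:
    """Return only the user-visible text from an LLM response's content.

    Hypothetical sketch: the real helper's signature and the exact block
    format it handles are assumptions here.
    """
    if isinstance(content, str):
        # Plain-string responses pass through unchanged.
        return content
    if isinstance(content, list):
        parts: list[str] = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and block.get("type") == "text":
                parts.append(str(block.get("text", "")))
            # Any other block type (e.g. "thinking") is dropped.
        return "".join(parts)
    # Unknown shapes yield no text rather than a stringified object.
    return ""
```

Note that a thinking-only response yields an empty string under this sketch, which is exactly the case the review below flags for _forced_rewrite.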

Motivation and Context

FIX #

Screenshots

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR refactors memory extraction by consolidating duplicated text-extraction logic into the shared extract_text_content utility. Inline conditionals that handled LLM response content (checking whether the content is already a string or needs conversion) in the memory extraction, memory update, and memory editing code paths are replaced with a single call to that utility. It also adds a guard to _save_memory that rejects non-string memory payloads early, plus unit tests verifying that text extraction handles thinking blocks, markdown text, and plain strings.
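The early type guard in _save_memory can be pictured roughly as follows. This is a hypothetical standalone version: the name check_memory_payload and the error-dict shape are assumptions (the shape mirrors the result["status"] == "error" check in CodeRabbit's suggested test), not the PR's actual code.

```python
from typing import Any, Optional


def check_memory_payload(updated_memory: Any) -> Optional[dict[str, str]]:
    """Hypothetical early guard: reject non-string memory payloads.

    Returns an error result for bad input, or None when the payload may
    proceed to the rest of the save logic.
    """
    if not isinstance(updated_memory, str):
        return {
            "status": "error",
            "message": (
                "Memory payload must be a string, got "
                f"{type(updated_memory).__name__}."
            ),
        }
    return None
```

Rejecting before any LLM call or database write means a raw list of content blocks can never be stringified and persisted as memory.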

⏱️ Estimated Review Time: 5-15 minutes

💡 Review Order Suggestion

1. surfsense_backend/tests/unit/agents/new_chat/test_memory_response_content.py
2. surfsense_backend/app/agents/new_chat/tools/update_memory.py
3. surfsense_backend/app/agents/new_chat/memory_extraction.py
4. surfsense_backend/app/routes/memory_routes.py
5. surfsense_backend/app/routes/search_spaces_routes.py


Summary by CodeRabbit

  • Bug Fixes

    • Improved text extraction from AI-generated responses across memory management operations
    • Added validation that rejects invalid (non-string) memory payloads with an error message
  • Tests

    • Added comprehensive unit tests for memory response content handling and validation logic

vercel Bot commented May 2, 2026

@AnishSarkar22 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai Bot commented May 2, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a286425-9b34-424a-a0b0-a5e781ab2c3c


AnishSarkar22 marked this pull request as ready for review May 2, 2026 17:27

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
surfsense_backend/tests/unit/agents/new_chat/test_memory_response_content.py (1)

1-10: ⚡ Quick win

Consider adding a test for the _forced_rewrite empty-text scenario.

The test suite covers _save_memory's type guard but has no coverage for the case where _forced_rewrite returns an empty string (thinking-only model response) and _save_memory consequently sets content = "". A targeted test would both document the expected behavior and confirm the fix suggested for update_memory.py.

💡 Suggested test skeleton
```python
import pytest

# Assumed import path, derived from surfsense_backend/app/agents/new_chat/tools/update_memory.py.
from app.agents.new_chat.tools.update_memory import _save_memory

# _Recorder is assumed to be an existing stub in this test module that records
# applied_content and exposes apply/commit/rollback callbacks.


@pytest.mark.asyncio
async def test_save_memory_skips_empty_forced_rewrite() -> None:
    """Memory must not be wiped when the forced-rewrite LLM returns no text."""
    large_content = "- (2026-01-01) [fact] x\n" * 1200  # > MEMORY_HARD_LIMIT

    class _EmptyRewriteLLM:
        async def ainvoke(self, *args, **kwargs):
            # Simulates a thinking-only response → extract_text_content returns ""
            class _Resp:
                content = [{"type": "thinking", "thinking": "..."}]

            return _Resp()

    recorder = _Recorder()
    result = await _save_memory(
        updated_memory=large_content,
        old_memory=None,
        llm=_EmptyRewriteLLM(),
        apply_fn=recorder.apply,
        commit_fn=recorder.commit,
        rollback_fn=recorder.rollback,
        label="memory",
        scope="user",
    )
    # With the fix, _forced_rewrite returns None → content stays large → hard-limit error
    assert result["status"] == "error"
    assert recorder.applied_content is None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@surfsense_backend/tests/unit/agents/new_chat/test_memory_response_content.py`
around lines 1 - 10, Add an async unit test to verify _save_memory does not wipe
memory when _forced_rewrite yields empty text: create a large_content string >
MEMORY_HARD_LIMIT, mock an LLM class (_EmptyRewriteLLM) whose ainvoke returns a
response whose content is [{"type":"thinking", ...}] so extract_text_content
would produce "" (simulating thinking-only output), implement a recorder stub
with apply/commit/rollback that records applied_content, call
_save_memory(updated_memory=large_content, old_memory=None,
llm=_EmptyRewriteLLM(), apply_fn=recorder.apply, commit_fn=recorder.commit,
rollback_fn=recorder.rollback, label="memory", scope="user") and assert the
returned result["status"] == "error" and recorder.applied_content is None to
confirm that when _forced_rewrite yields empty text, content remains unchanged
(not set to ""), exercising the update_memory._forced_rewrite path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@surfsense_backend/app/agents/new_chat/tools/update_memory.py`:
- Around lines 192-193: The issue is that the `_forced_rewrite` function can
return an empty string, which leads to silent data loss because it replaces
existing memory with an empty string. To fix this, modify `_forced_rewrite` so
that if `extract_text_content(response.content)` returns an empty string,
`_forced_rewrite` should return None instead, indicating a failed rewrite. This
ensures the caller in `_save_memory` treats empty results as failures and avoids
overwriting memory with empty content.
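On the caller side, the suggested contract is simply that None means "rewrite failed". The sketch below illustrates it under stated assumptions: resolve_oversized_memory is a hypothetical wrapper, forced_rewrite is injected as a coroutine, and MEMORY_HARD_LIMIT is given a made-up value. The real _save_memory takes more parameters (llm, apply_fn, commit_fn, rollback_fn, label, scope).

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical value for illustration; the real limit is defined in update_memory.py.
MEMORY_HARD_LIMIT = 20_000


async def resolve_oversized_memory(
    content: str,
    forced_rewrite: Callable[[str], Awaitable[Optional[str]]],
) -> dict[str, str]:
    """Shrink oversized memory via a rewrite, treating None as a failed rewrite."""
    if len(content) > MEMORY_HARD_LIMIT:
        rewritten = await forced_rewrite(content)
        if rewritten is not None:
            content = rewritten
        # On failure (None) `content` is left as-is, so the hard-limit check
        # below rejects it instead of silently persisting "".
    if len(content) > MEMORY_HARD_LIMIT:
        return {"status": "error", "message": "Memory exceeds the hard limit."}
    return {"status": "ok", "content": content}
```

The important property is that the caller never substitutes an empty string for existing memory; whether the rewrite produced nothing or the model only emitted thinking blocks, the oversized content falls through to the hard-limit error.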

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a2a01eb3-9b14-4327-bb05-13592992e184

📥 Commits

Reviewing files that changed from the base of the PR and between 451a989 and 9975e08.

📒 Files selected for processing (5)
  • surfsense_backend/app/agents/new_chat/memory_extraction.py
  • surfsense_backend/app/agents/new_chat/tools/update_memory.py
  • surfsense_backend/app/routes/memory_routes.py
  • surfsense_backend/app/routes/search_spaces_routes.py
  • surfsense_backend/tests/unit/agents/new_chat/test_memory_response_content.py

Comment thread surfsense_backend/app/agents/new_chat/tools/update_memory.py Outdated
- Updated the `_forced_rewrite` function to strip whitespace from the extracted text and added a warning log if the response is empty, preventing potential issues with empty rewrites.
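The follow-up commit's change to _forced_rewrite (strip whitespace, warn on empty output) might look like the tail below. finalize_rewrite is a hypothetical helper and the log message is illustrative; the commit message does not show the exact code, and returning None on empty output is assumed here from the review suggestion.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def finalize_rewrite(extracted_text: str, label: str = "memory") -> Optional[str]:
    """Hypothetical tail of _forced_rewrite: normalise and validate the rewrite.

    `extracted_text` is assumed to be the output of extract_text_content(...).
    """
    text = extracted_text.strip()
    if not text:
        # Thinking-only or whitespace-only responses carry no usable text;
        # signal failure instead of returning "" (which would wipe memory).
        logger.warning(
            "Forced rewrite of %s returned empty text; keeping original.", label
        )
        return None
    return text
```

Stripping before the emptiness check matters: a response consisting only of newlines would otherwise pass a plain truthiness test and still overwrite memory with blank content.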
MODSetter merged commit ce6d923 into MODSetter:dev May 5, 2026
4 of 10 checks passed