refactor(memory): streamline memory extraction by AnishSarkar22 · Pull Request #1335 · MODSetter/SurfSense

AnishSarkar22 · 2026-05-02T17:26:55Z

Description

Uses the extract_text_content utility to fix memory extraction.

Motivation and Context

FIX #

Screenshots

API Changes

This PR includes API changes

Change Type

Testing Performed

Tested locally
Manual/QA verification

Checklist

Follows project coding standards and conventions
Documentation updated as needed
Dependencies updated as needed
No lint/build errors or new warnings
All relevant tests are passing

High-level PR Summary

This PR refactors memory extraction code by consolidating duplicate text extraction logic into a centralized extract_text_content utility function. The changes replace multiple instances of inline conditional logic for handling LLM response content (checking if content is a string or needs conversion) across memory extraction, memory update, and memory editing endpoints with a single reusable utility. Additionally, a validation guard was added to _save_memory to reject non-string memory payloads early, and comprehensive unit tests were included to verify the text extraction behavior handles various response formats including thinking blocks, markdown text, and plain strings.
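The source of `extract_text_content` is not shown on this page. As a rough illustration of what "handling LLM response content" means here, the following is a minimal sketch. It assumes LangChain-style message content that is either a plain string or a list of blocks, where each block is a string or a dict tagged with a `type` key, like the `{"type": "thinking", ...}` block in the test skeleton further down. The join-with-empty-string behaviour and the `str()` fallback are assumptions, not the project's confirmed behaviour.

```python
from typing import Any


def extract_text_content(content: Any) -> str:
    """Sketch (not the project's code): flatten LLM message content to plain text.

    Accepts a plain string, or a list whose items are strings or dicts such as
    {"type": "text", "text": ...} or {"type": "thinking", "thinking": ...}.
    Only plain strings and "text" blocks are kept; thinking blocks are dropped.
    """
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts: list[str] = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and block.get("type") == "text":
                text = block.get("text")
                if isinstance(text, str):
                    parts.append(text)
            # Any other block type (e.g. "thinking") is ignored.
        return "".join(parts)
    # Unexpected shape: stringify instead of raising (an assumption).
    return str(content)
```

Under these assumptions, a thinking-only response produces an empty string. That is the case the review comments below flag for `_forced_rewrite`.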

⏱️ Estimated Review Time: 5-15 minutes

💡 Review Order Suggestion

Order	File Path
1	`surfsense_backend/tests/unit/agents/new_chat/test_memory_response_content.py`
2	`surfsense_backend/app/agents/new_chat/tools/update_memory.py`
3	`surfsense_backend/app/agents/new_chat/memory_extraction.py`
4	`surfsense_backend/app/routes/memory_routes.py`
5	`surfsense_backend/app/routes/search_spaces_routes.py`

Summary by CodeRabbit

Bug Fixes
- Improved text extraction from AI-generated responses across memory management operations
- Added validation that rejects invalid (non-string) memory payloads and returns an error message
Tests
- Added comprehensive unit tests for memory response content handling and validation logic
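The early type guard in `_save_memory` is described but not shown. Below is a minimal sketch of such a check. The helper name `check_memory_payload` and the `"message"` key are hypothetical. The `"status": "error"` shape matches what the test skeleton below asserts on.

```python
from typing import Any, Optional


def check_memory_payload(updated_memory: Any) -> Optional[dict[str, str]]:
    """Hypothetical guard: return an error result for non-string payloads, else None."""
    if not isinstance(updated_memory, str):
        return {
            "status": "error",
            # The "message" key and its wording are illustrative assumptions.
            "message": (
                "updated_memory must be a string, got "
                f"{type(updated_memory).__name__}"
            ),
        }
    return None
```

A caller would run this before touching the LLM or the database, for example `if (err := check_memory_payload(updated_memory)) is not None: return err`.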


vercel · 2026-05-02T17:26:59Z

@AnishSarkar22 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-05-02T17:27:01Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a286425-9b34-424a-a0b0-a5e781ab2c3c


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

surfsense_backend/tests/unit/agents/new_chat/test_memory_response_content.py (1)

1-10: ⚡ Quick win

Consider adding a test for the _forced_rewrite empty-text scenario.

The test suite covers _save_memory's type guard but has no coverage for the case where _forced_rewrite returns an empty string (thinking-only model response) and _save_memory consequently sets content = "". A targeted test would both document the expected behavior and confirm the fix suggested for update_memory.py.

💡 Suggested test skeleton

@pytest.mark.asyncio
async def test_save_memory_skips_empty_forced_rewrite() -> None:
    """Memory must not be wiped when forced-rewrite LLM returns no text."""
    large_content = "- (2026-01-01) [fact] x\n" * 1200  # > MEMORY_HARD_LIMIT

    class _EmptyRewriteLLM:
        async def ainvoke(self, *args, **kwargs):
            # Simulates a thinking-only response → extract_text_content returns ""
            class _Resp:
                content = [{"type": "thinking", "thinking": "..."}]
            return _Resp()

    recorder = _Recorder()  # stub with apply/commit/rollback that records applied_content
    result = await _save_memory(
        updated_memory=large_content,
        old_memory=None,
        llm=_EmptyRewriteLLM(),
        apply_fn=recorder.apply,
        commit_fn=recorder.commit,
        rollback_fn=recorder.rollback,
        label="memory",
        scope="user",
    )
    # With the fix, _forced_rewrite returns None → content stays large → hard-limit error
    assert result["status"] == "error"
    assert recorder.applied_content is None

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@surfsense_backend/tests/unit/agents/new_chat/test_memory_response_content.py`
around lines 1 - 10, Add an async unit test to verify _save_memory does not wipe
memory when _forced_rewrite yields empty text: create a large_content string >
MEMORY_HARD_LIMIT, mock an LLM class (_EmptyRewriteLLM) whose ainvoke returns a
response whose content is [{"type":"thinking", ...}] so extract_text_content
would produce "" (simulating thinking-only output), implement a recorder stub
with apply/commit/rollback that records applied_content, call
_save_memory(updated_memory=large_content, old_memory=None,
llm=_EmptyRewriteLLM(), apply_fn=recorder.apply, commit_fn=recorder.commit,
rollback_fn=recorder.rollback, label="memory", scope="user") and assert the
returned result["status"] == "error" and recorder.applied_content is None to
confirm that when _forced_rewrite yields empty text, content remains unchanged
(not set to ""), exercising the update_memory._forced_rewrite path.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@surfsense_backend/app/agents/new_chat/tools/update_memory.py`:
- Around lines 192-193: The issue is that the `_forced_rewrite` function can
return an empty string, which leads to silent data loss because it replaces
existing memory with an empty string. To fix this, modify `_forced_rewrite` so
that if `extract_text_content(response.content)` returns an empty string,
`_forced_rewrite` should return None instead, indicating a failed rewrite. This
ensures the caller in `_save_memory` treats empty results as failures and avoids
overwriting memory with empty content.
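The fix requested here, together with the later commit e38e20b (strip whitespace, log a warning on an empty response), can be sketched as follows. The sketch makes several assumptions. The LLM exposes an async `ainvoke` whose result has a `.content` attribute, as in the test skeleton. `_extract_text` is a simplified stand-in for `extract_text_content`. The names `forced_rewrite` and `choose_content`, the prompt text, and the `MEMORY_HARD_LIMIT` value are hypothetical, not taken from `update_memory.py`.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

# Hypothetical value; the real constant lives in update_memory.py.
MEMORY_HARD_LIMIT = 20_000


def _extract_text(content: Any) -> str:
    """Simplified stand-in for extract_text_content: keep strings and text blocks."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(
            b if isinstance(b, str) else str(b.get("text", ""))
            for b in content
            if isinstance(b, str) or (isinstance(b, dict) and b.get("type") == "text")
        )
    return ""


async def forced_rewrite(llm: Any, content: str) -> Optional[str]:
    """Ask the LLM to shrink `content`; return None when it yields no usable text."""
    # Prompt wording is illustrative only.
    response = await llm.ainvoke(
        f"Condense this memory to under {MEMORY_HARD_LIMIT} characters:\n{content}"
    )
    text = _extract_text(response.content).strip()
    if not text:
        logger.warning("Forced memory rewrite returned empty text; keeping original")
        return None
    return text


async def choose_content(llm: Any, updated_memory: str) -> dict[str, Any]:
    """Caller side: treat None as a failed rewrite and never store empty memory."""
    content = updated_memory
    if len(content) > MEMORY_HARD_LIMIT:
        rewritten = await forced_rewrite(llm, content)
        if rewritten is not None:
            content = rewritten
    if len(content) > MEMORY_HARD_LIMIT:
        # Still too large: report an error and apply nothing.
        return {"status": "error", "content": None}
    return {"status": "ok", "content": content}
```

Returning None rather than "" keeps the failure visible to the caller. An oversized memory then ends in the hard-limit error instead of being silently replaced, which is the behaviour the suggested test asserts.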



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a2a01eb3-9b14-4327-bb05-13592992e184

📥 Commits

Reviewing files that changed from the base of the PR and between 451a989 and 9975e08.

📒 Files selected for processing (5)

surfsense_backend/app/agents/new_chat/memory_extraction.py
surfsense_backend/app/agents/new_chat/tools/update_memory.py
surfsense_backend/app/routes/memory_routes.py
surfsense_backend/app/routes/search_spaces_routes.py
surfsense_backend/tests/unit/agents/new_chat/test_memory_response_content.py


refactor(memory): streamline memory extraction by utilizing extract_text_content utility

9975e08

AnishSarkar22 marked this pull request as ready for review May 2, 2026 17:27

coderabbitai Bot reviewed May 3, 2026


Comment thread surfsense_backend/app/agents/new_chat/tools/update_memory.py Outdated

AnishSarkar22 added 2 commits May 4, 2026 12:03

Merge remote-tracking branch 'upstream/dev' into fix/memory-extraction

b981b51

fix: handle empty response in forced rewrite function

e38e20b

- Updated the `_forced_rewrite` function to strip whitespace from the extracted text and added a warning log if the response is empty, preventing potential issues with empty rewrites.

MODSetter merged commit ce6d923 into MODSetter:dev May 5, 2026
4 of 10 checks passed

MODSetter merged 3 commits into MODSetter:dev from AnishSarkar22:fix/memory-extraction

2 participants

Uh oh!

Conversation

AnishSarkar22 commented May 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Screenshots

API Changes

Change Type

Testing Performed

Checklist

High-level PR Summary

Summary by CodeRabbit

Uh oh!

vercel Bot commented May 2, 2026

Uh oh!

coderabbitai Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AnishSarkar22 commented May 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 2, 2026 •

edited

Loading