fix: recent_activity dedup + pagination across MCP tools by phernandez · Pull Request #595 · basicmachines-co/basic-memory

phernandez · 2026-02-21T01:34:03Z

Summary

Entity-only default in recent_activity MCP tool — prevents a single well-connected entity from filling the page with its observations/relations when no type is specified
Expose page/page_size params on recent_activity (were hardcoded to 1/10)
Add has_more field to GraphContext and SearchResponse using the N+1 trick (fetch one row more than the page size and check whether the extra row exists), so no expensive COUNT queries are needed
Pagination guidance in text output: "Use page=2 to see more" when more results exist

The API layer stays unchanged — it still accepts all types. This is purely an MCP tool UX default so LLMs see distinct notes by default.

Fixes the issue where a user reported recent_activity returning only 1 distinct note even though 9 were modified.

Test plan

just fast-check — lint, format, typecheck, impacted tests all pass
just test-sqlite — full SQLite test suite (275 passed)
just test-unit-sqlite — all 1915 unit tests pass
New tests for entity-only default behavior (JSON output asserts types)
New tests for explicit type filtering (observations-only, relations-only)
New tests for has_more in search and memory router responses
New tests for pagination guidance text in _format_project_output
Updated tool contract test for new page/page_size params

🤖 Generated with Claude Code

A user reported that recent_activity returns only 1 distinct note even though 9 were modified. Root cause: the API defaults to returning ALL types (entity, observation, relation), so a single well-connected entity fills the entire page with its observations and relations.

Changes:
- 🎯 Default to entity-only in recent_activity MCP tool when no type specified
- 📄 Expose page/page_size params on recent_activity (were hardcoded)
- ✅ Add has_more field to GraphContext and SearchResponse (N+1 trick)
- 📝 Show pagination guidance ("Use page=2 to see more") in text output

The API layer stays unchanged; it still accepts all types. This is purely an MCP tool UX default so LLMs see distinct notes by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-02-21T01:34:14Z

Claude finished @phernandez's task in 2m 14s

PR Review: fix/recent-activity-dedup-pagination

Tasks

Gather context and read changed files
Review code quality & standards
Review testing & documentation
Review Basic Memory architecture
Review security & performance
Review compatibility
Post final review summary

Summary

Good, focused PR. The entity-only default, exposed pagination params, and N+1 has_more trick are all solid UX improvements. Tests are well-structured. A few things need attention before merge.

Code Quality & Standards

| Item | Status |
| --- | --- |
| Follows Basic Memory conventions | ✅ Mostly (see speculative `getattr` below) |
| Python 3.12+ type annotations & async | ⚠️ Inconsistent: `Union[str, List[str]]` at line 39 vs `str \| list[dict]` at line 48 |
| SQLAlchemy 2.0 best practices | ✅ No direct DB changes |
| FastAPI/Typer conventions | ✅ |
| 100-char line limit | ✅ |
| No commented-out code | ✅ |

Speculative getattr/hasattr — violates CLAUDE.md convention

_format_project_output (recent_activity.py:464–511) and _extract_recent_rows (line 354) use unnecessary hasattr/getattr guards on typed objects. After the type-based grouping loop already separates results into entities, observations, and relations lists, the code still does defensive field access like:

# recent_activity.py:464: entity.title is never None on EntitySummary
if hasattr(entity, "title") and entity.title else "Untitled"

# recent_activity.py:479 — obs.category always exists on ObservationSummary
category = getattr(obs, "category", "general") if hasattr(obs, "category") else "general"

# recent_activity.py:354 — all three summary types have created_at
"created_at": (primary.created_at.isoformat() if getattr(primary, "created_at", None) else None),

All three summary types (EntitySummary, RelationSummary, ObservationSummary) define these fields. CLAUDE.md says: "Never use getattr(obj, "attr", default) when unsure about attribute names. Check the class definition or source code first." After the type-filtering loop, the field is always present and can be accessed directly.

Compatibility — Windows path splitting ⚠️

_get_project_activity (recent_activity.py:327) and _format_project_output (line 468) split file_path using /:

# recent_activity.py:327 — breaks on Windows
folder = "/".join(result.primary_result.file_path.split("/")[:-1])

# recent_activity.py:468-470 — same issue
folder_path = "/".join(entity.file_path.split("/")[:-1])

CLAUDE.md checklist says "File path comparisons must be windows compatible." Use pathlib.Path(file_path).parent instead:

from pathlib import Path
folder = str(Path(result.primary_result.file_path).parent)

Testing & Documentation

| Item | Status |
| --- | --- |
| Unit tests for new functions | ✅ Good coverage of formatting helpers |
| Integration tests for MCP tools | ✅ Entity-only default, explicit type filtering, pagination |
| Edge cases | ✅ `has_more` True/False, no results, observation truncation |
| 100% test coverage | ✅ Appears maintained |
| Tool contract test updated | ✅ `page`/`page_size` added to `recent_activity` signature |
| Documentation updated | ✅ Docstring explains new params and entity-only default |

Architecture

| Item | Status |
| --- | --- |
| MCP tools atomic/composable | ✅ |
| No DB migrations needed | ✅ Only schema + service changes |
| Local-first principles preserved | ✅ |
| Knowledge graph consistency | ✅ |
| API layer unchanged (types still accepted) | ✅ Confirmed in PR description |

N+1 pagination trick is cleanly applied in both context_service.py:107 (fetch limit + 1) and search_router.py:52 (fetch page_size + 1). Both correctly trim before returning. This is the right approach to avoid COUNT queries.
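
The fetch-one-extra pattern described here can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the code in context_service.py or search_router.py: `fetch_page` is a hypothetical helper, and the list slice stands in for a query with `LIMIT page_size + 1`.

```python
def fetch_page(items: list[str], page: int, page_size: int) -> tuple[list[str], bool]:
    """Return one page plus a has_more flag without counting all rows."""
    offset = (page - 1) * page_size
    # Ask for one row more than the page needs (stand-in for LIMIT page_size + 1)
    fetched = items[offset : offset + page_size + 1]
    # The extra row only signals that another page exists
    has_more = len(fetched) > page_size
    # Trim the extra row before returning
    return fetched[:page_size], has_more


notes = [f"note-{i}" for i in range(4)]

page1, more1 = fetch_page(notes, page=1, page_size=3)
page2, more2 = fetch_page(notes, page=2, page_size=3)

print(page1, more1)  # three notes, has_more is True
print(page2, more2)  # one note, has_more is False
```

The cost is one extra row per request, which is usually far cheaper than a separate COUNT over the whole result set.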

has_more in discovery mode — when iterating multiple projects, per-project has_more values are silently discarded. This is acceptable for a summary view, but worth noting in a follow-up if users page through discovery results.

Security & Performance

| Item | Status |
| --- | --- |
| No hardcoded secrets | ✅ |
| Input validation | ✅ `SearchItemType` enum validation raises `ValueError` |
| Error handling | ✅ |
| Performance | ✅ N+1 trick avoids COUNT queries |
| No sensitive data in logs | ✅ |

Minor Observations

Type annotation style inconsistency — Union[str, List[str]] on line 39 vs str | list[dict] on line 48. Prefer the Python 3.10+ | style throughout since the project targets 3.12+.
SearchItemType string comparison — _extract_recent_rows compares primary.type == "entity" using string literals. Minor — using SearchItemType.ENTITY.value would be more defensible but is low priority.
Discovery mode has_more not surfaced — if a project returns has_more=True in discovery mode, it's not visible to the user. Low priority since discovery mode is a summary view.

Verdict

Two issues should be fixed before merge:

Windows path compatibility in _get_project_activity and _format_project_output
Speculative getattr/hasattr patterns on typed objects — replace with direct attribute access

Everything else looks good. The core changes (entity-only default, has_more, exposed pagination) are correct and well-tested.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14eb35e82e

chatgpt-codex-connector · 2026-02-21T01:38:36Z

src/basic_memory/mcp/tools/recent_activity.py

+        "page": page,
+        "page_size": page_size,


Validate pagination arguments before calling recent API

The new page/page_size inputs are forwarded as-is, but this tool does not enforce page >= 1 and page_size >= 1. In project mode these values are passed to /v2/projects/{id}/memory/recent, where offset is derived from page and can become negative; Postgres rejects negative OFFSET, so values like page=0 now surface as server/tool errors instead of a clear validation error. This regression appears only after exposing pagination params here, since the previous implementation always used safe constants.
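
A short arithmetic sketch shows how an unchecked page number turns into a negative offset. It assumes the conventional 1-based formula `(page - 1) * page_size`; the comment above does not quote the endpoint's exact expression, so treat `page_offset` as a hypothetical illustration.

```python
def page_offset(page: int, page_size: int) -> int:
    """Convert a 1-based page number to a row offset (assumed formula)."""
    return (page - 1) * page_size


for page in (1, 2, 0, -1):
    print(f"page={page} -> offset={page_offset(page, page_size=10)}")
```

Pages 1 and 2 map to offsets 0 and 10, while page 0 and page -1 map to -10 and -20, which is the kind of value a database rejects as an OFFSET.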

Prevents negative OFFSET errors from reaching the database when invalid pagination values (page=0, page_size=-1, etc.) are passed. Raises clear ValueError messages at the tool boundary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-02-21T02:12:11Z

Claude finished @phernandez's task in 2m 20s

PR Review Update: fix/recent-activity-dedup-pagination

Tasks

Gather context
Read changed files
Check if previous review issues were addressed
Post final review summary

New Commit: `e3f8c83` — validate page/page_size before API call

The new commit directly addresses the Codex review comment: page and page_size are now validated at the tool boundary before reaching the API layer, preventing negative OFFSET from hitting the database.

What was added (recent_activity.py:114–121):

if page < 1:
    raise ValueError(f"page must be >= 1, got {page}")
if page_size < 1:
    raise ValueError(f"page_size must be >= 1, got {page_size}")
if page_size > 100:
    raise ValueError(f"page_size must be <= 100, got {page_size}")

The upper bound (page_size > 100) is a sensible addition beyond what Codex flagged. Tests cover all boundary cases (page=0, page=-1, page_size=0, page_size=-5, page_size=999). This is a clean fix.
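
For readers who want to try the boundary cases themselves, the checks quoted above can be wrapped in a small standalone helper. `validate_pagination` and `is_rejected` are hypothetical names; in the PR the checks sit inline in the tool, and the real tests are not reproduced here.

```python
def validate_pagination(page: int, page_size: int) -> None:
    """Raise ValueError for out-of-range pagination arguments."""
    if page < 1:
        raise ValueError(f"page must be >= 1, got {page}")
    if page_size < 1:
        raise ValueError(f"page_size must be >= 1, got {page_size}")
    if page_size > 100:
        raise ValueError(f"page_size must be <= 100, got {page_size}")


def is_rejected(page: int, page_size: int) -> bool:
    """Return True when the arguments fail validation."""
    try:
        validate_pagination(page, page_size)
    except ValueError:
        return True
    return False


# Boundary cases named in the review, plus two valid edge values
invalid = [(0, 10), (-1, 10), (1, 0), (1, -5), (1, 999)]
valid = [(1, 1), (1, 100)]

print(all(is_rejected(p, s) for p, s in invalid))  # True
print(any(is_rejected(p, s) for p, s in valid))    # False
```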

Remaining Issues from Previous Review

The two items flagged in the first review are still unresolved:

1. Speculative `getattr`/`hasattr` — CLAUDE.md violation

All three summary types (EntitySummary, RelationSummary, ObservationSummary) define title, file_path, and created_at as non-optional fields. After the type-grouping loop in _format_project_output, the concrete type is known. The guards are unnecessary:

| Location | Guarded field | Actual schema |
| --- | --- | --- |
| `recent_activity.py:335` | `file_path` (hasattr) | All three types have it |
| `recent_activity.py:363` | `created_at` (getattr) | All three types have it |
| `recent_activity.py:388` | `title` (hasattr) | All three types have it |
| `recent_activity.py:417` | `title` (hasattr) | All three types have it |
| `recent_activity.py:473` | `title` (hasattr) | `EntitySummary.title: str` |
| `recent_activity.py:476` | `file_path` (hasattr) | `EntitySummary.file_path: str` |
| `recent_activity.py:489` | `category` (getattr) | `ObservationSummary.category: str` |
| `recent_activity.py:499` | `content` (getattr) | `ObservationSummary.content: str` |
| `recent_activity.py:513` | `relation_type` (getattr) | `RelationSummary.relation_type: str` |
| `recent_activity.py:518` | `from_entity` (getattr) | `RelationSummary.from_entity` |
| `recent_activity.py:520` | `to_entity` (getattr) | `RelationSummary.to_entity` |

2. Windows path splitting — still uses `/` string split

Two locations still split file_path with / instead of pathlib.Path:

# recent_activity.py:336
folder = "/".join(result.primary_result.file_path.split("/")[:-1])

# recent_activity.py:477-478
folder_path = "/".join(entity.file_path.split("/")[:-1])

Should be:

from pathlib import Path
folder = str(Path(result.primary_result.file_path).parent)

Verdict

The new commit cleanly resolves the Codex-flagged pagination validation issue. Two items from the first review remain open before merge: the speculative getattr/hasattr patterns (CLAUDE.md violation) and the Windows-incompatible / path splitting. Both are mechanical fixes.

…tting

All three summary types (EntitySummary, ObservationSummary, RelationSummary) define title, file_path, created_at as required fields. Replace defensive getattr/hasattr with direct attribute access per CLAUDE.md guidelines.

Replace string-based path splitting ("/".join(path.split("/")[:-1])) with PurePosixPath().parent for cross-platform correctness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-02-21T02:37:19Z

Claude finished @phernandez's task in 2m 9s

PR Review Update: fix/recent-activity-dedup-pagination

Tasks

Gather context
Read changed files from new commit (024bb04)
Verify speculative getattr/hasattr fixes
Verify pathlib path splitting fixes
Post final review summary

New Commit: `024bb04` — remove speculative getattr/hasattr and use pathlib for path splitting

This commit directly resolves both remaining issues from the previous reviews.

Issue 1: Speculative getattr/hasattr — RESOLVED ✅

All eleven guarded field accesses have been replaced with direct attribute access. Verified against the schemas in schemas/memory.py:

| Field | Schema definition | Fix |
| --- | --- | --- |
| `entity.title` | `EntitySummary.title: str` (required) | `entity.title or "Untitled"` |
| `entity.file_path` | `EntitySummary.file_path: str` (required) | `entity.file_path` directly |
| `obs.category` | `ObservationSummary.category: str` (required) | `obs.category` directly |
| `obs.content` | `ObservationSummary.content: str` (required) | `obs.content` directly |
| `rel.relation_type` | `RelationSummary.relation_type: str` (required) | `rel.relation_type` directly |
| `rel.from_entity` | `RelationSummary.from_entity: Optional[str]` | `rel.from_entity or "Unknown"` |
| `rel.to_entity` | `RelationSummary.to_entity: Optional[str]` | `rel.to_entity` directly (None handled downstream) |
| `primary.created_at` | All three types: `created_at: datetime` (required) | `primary.created_at.isoformat() if primary.created_at else None` |
| `latest.title` (discovery) | `EntitySummary.title: str` (required) | `latest.title or "Recent activity"` |

All guards removed correctly. The or "Unknown" / or "Untitled" / or "Recent activity" fallbacks on Optional or truly nullable fields are appropriate — these are value-level defaults, not attribute guards.
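
The difference between an attribute guard and a value-level default can be sketched as follows. The `RelationSummary` here is a simplified dataclass stand-in with only two fields, and `relation_source` is a hypothetical helper; the real schema in schemas/memory.py is not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelationSummary:
    """Simplified stand-in for the schema; only the fields used here."""

    relation_type: str
    from_entity: Optional[str] = None


def relation_source(rel: RelationSummary) -> str:
    # Direct access: the attribute always exists, only its value may be None
    return rel.from_entity or "Unknown"


rel = RelationSummary(relation_type="depends_on")
print(relation_source(rel))  # Unknown

# A misspelled attribute is silently masked by a getattr default...
masked = getattr(rel, "relation_typ", "general")
print(masked)  # general

# ...whereas direct access fails loudly and points at the typo
try:
    rel.relation_typ  # type: ignore[attr-defined]
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError
```

The `or "Unknown"` fallback handles a legitimately empty value, while the removed guards could only ever hide a naming mistake.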

Issue 2: Windows path splitting — RESOLVED ✅

Both locations replaced with PurePosixPath:

# Before
folder = "/".join(result.primary_result.file_path.split("/")[:-1])

# After
folder = str(PurePosixPath(result.primary_result.file_path).parent)
if folder and folder != ".":  # correctly filters root-level files
    active_folders.add(folder)

PurePosixPath is the right choice here. Basic Memory stores paths in the database using POSIX separators (confirmed by validate_memory_url_path which uses / as the canonical separator). PurePosixPath is OS-agnostic and always produces forward-slash output — using Path instead would emit backslashes on Windows for stored POSIX paths, which would be incorrect. The added folder != "." guard correctly handles root-level files where PurePosixPath("file.txt").parent returns ".".
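
The behavior described in this paragraph can be checked on any operating system, because the pure path classes do no filesystem access. The stored path below is a made-up example; `PureWindowsPath` is used to show what a plain `Path` would produce on Windows.

```python
from pathlib import PurePosixPath, PureWindowsPath

stored = "projects/alpha/notes.md"  # hypothetical POSIX-style stored path

posix_parent = str(PurePosixPath(stored).parent)
windows_parent = str(PureWindowsPath(stored).parent)
root_parent = str(PurePosixPath("file.txt").parent)

print(posix_parent)    # projects/alpha
print(windows_parent)  # projects\alpha
print(root_parent)     # .
```

The second result is why `Path` would be the wrong tool for stored POSIX paths on Windows, and the third is why the `folder != "."` guard is needed for root-level files.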

Minor Observations (non-blocking, unchanged from previous reviews)

Type annotation style — Union[str, List[str]] (line 40) alongside str | list[dict] (line 48) in the same function signature. Minor cosmetic inconsistency in a 3.12+ codebase.
String literal type comparison — primary.type == "entity" (line 416) instead of SearchItemType.ENTITY.value. Low priority.

Neither blocks merge.

Verdict

All blocking issues from both previous reviews are resolved. The three commits together deliver the entity-only default, exposed pagination, N+1 has_more, pagination guidance text, page/page_size validation, direct attribute access, and cross-platform path handling. Ready to merge.

5 new tests in test-int/mcp/test_pagination_integration.py:
- search_notes: page 1 and page 2 return disjoint result sets
- search_notes: has_more transitions from True to False on last page
- recent_activity: page 1 and page 2 return disjoint result sets
- recent_activity: has_more text guidance appears/disappears correctly
- recent_activity: entity-only default filters out observations/relations

All tests go through the full MCP → API → service → DB stack using write_note to seed real data, then verify pagination offsets produce correct non-overlapping windows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-02-21T03:02:07Z

Claude finished @phernandez's task in 2m 15s

PR Review Update: fix/recent-activity-dedup-pagination

Tasks

Gather context
Read new commit changes (5eb6db3)
Verify test quality and coverage
Post final review summary

New Commit: `5eb6db3` — integration tests for pagination through to the database

This commit adds 5 integration tests in test-int/mcp/test_pagination_integration.py that exercise the full MCP → API → service → DB stack using real data seeded by write_note.

Tests added:

| Test | What it verifies |
| --- | --- |
| `test_search_notes_pages_are_disjoint` | Pages 1 and 2 return non-overlapping result sets, union covers all 6 seeded notes |
| `test_search_notes_has_more_becomes_false` | `has_more` is `True` on page 1 (4 results, page_size=3), `False` on page 2 (1 result) |
| `test_recent_activity_pages_are_disjoint` | Same disjoint coverage check for `recent_activity` entity-only default |
| `test_recent_activity_has_more_becomes_false` | Text output includes "Use page=2 to see more" on page 1, absent on page 2 |
| `test_recent_activity_entity_only_default_with_relations` | Default call returns only `"entity"` type items; explicit `type=["entity","observation","relation"]` returns more items |

Test isolation is correct — engine_factory and test_project are function-scoped (default pytest_asyncio.fixture) and each creates a fresh database + temp directory per test, so the len(pageN) == N assertions are not at risk of pollution from other tests.

Text assertions are correct — "Use page=2 to see more" matches _format_project_output line 518 (f"Use page={page + 1} to see more."), and "items found" matches line 521 (f"{total} items found.").

JSON output assertions are correct — _extract_recent_rows returns list[dict] with a "title" key for every row, so {item["title"] for item in page1} is valid.
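
The shape of the disjointness check can be sketched without the MCP stack. This in-memory version is only an illustration: `paginate` is a hypothetical helper, the titles are invented, and a page size of 3 is assumed so that two pages cover six rows.

```python
def paginate(rows: list[dict], page: int, page_size: int) -> list[dict]:
    """Return the slice of rows for a 1-based page number."""
    start = (page - 1) * page_size
    return rows[start : start + page_size]


# Six seeded rows, each carrying a "title" key like the JSON output rows
rows = [{"title": f"Note {i}"} for i in range(6)]

page1 = paginate(rows, page=1, page_size=3)
page2 = paginate(rows, page=2, page_size=3)

titles1 = {item["title"] for item in page1}
titles2 = {item["title"] for item in page2}

# Pages must not overlap, and together they must cover every seeded row
print(titles1.isdisjoint(titles2))
print(titles1 | titles2 == {row["title"] for row in rows})
```

Comparing sets of titles keeps the check independent of row order within a page.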

Minor Observation (non-blocking)

test_recent_activity_entity_only_default_with_relations asserts that "entity" in types_returned and that len(all_payload) > len(payload), but does not explicitly assert "observation" in types_returned or "relation" in types_returned after requesting all three types. The count check is a reasonable proxy since a note with two observations and one relation was written, but an explicit type assertion would be slightly stronger. Low priority — does not block merge.

Summary

All four commits together deliver:

Entity-only default — prevents a well-connected entity from flooding results with its observations/relations
Exposed page/page_size params — were previously hardcoded
has_more field via N+1 trick — no COUNT queries
Pagination guidance text — "Use page=2 to see more"
Validation of page/page_size at the tool boundary (prevents negative DB offset)
Direct attribute access — speculative getattr/hasattr removed (CLAUDE.md compliance)
Cross-platform path handling — PurePosixPath replaces string splitting
End-to-end integration tests — full stack pagination verified

All blocking issues from the three prior reviews are resolved. Ready to merge.

chatgpt-codex-connector bot reviewed Feb 21, 2026

phernandez merged commit ee03975 into main Feb 21, 2026
26 checks passed

phernandez deleted the fix/recent-activity-dedup-pagination branch February 21, 2026 03:49

phernandez added this to the v0.19.0 milestone Feb 25, 2026
