fix: recent_activity dedup + pagination across MCP tools #595

Merged
phernandez merged 4 commits into main from fix/recent-activity-dedup-pagination on Feb 21, 2026

@phernandez
Member

Summary

  • Entity-only default in recent_activity MCP tool — prevents a single well-connected entity from filling the page with its observations/relations when no type is specified
  • Expose page/page_size params on recent_activity (were hardcoded to 1/10)
  • Add has_more field to GraphContext and SearchResponse using N+1 trick — no expensive COUNT queries
  • Pagination guidance in text output: "Use page=2 to see more" when more results exist

The API layer stays unchanged — it still accepts all types. This is purely an MCP tool UX default so LLMs see distinct notes by default.

Fixes the issue where a user reported recent_activity returning only 1 distinct note even though 9 were modified.

Test plan

  • just fast-check — lint, format, typecheck, impacted tests all pass
  • just test-sqlite — full SQLite test suite (275 passed)
  • just test-unit-sqlite — all 1915 unit tests pass
  • New tests for entity-only default behavior (JSON output asserts types)
  • New tests for explicit type filtering (observations-only, relations-only)
  • New tests for has_more in search and memory router responses
  • New tests for pagination guidance text in _format_project_output
  • Updated tool contract test for new page/page_size params

🤖 Generated with Claude Code

A user reported that recent_activity returns only 1 distinct note even
though 9 were modified. Root cause: the API defaults to returning ALL
types (entity, observation, relation), so a single well-connected entity
fills the entire page with its observations and relations.

Changes:
- 🎯 Default to entity-only in recent_activity MCP tool when no type specified
- 📄 Expose page/page_size params on recent_activity (were hardcoded)
- ✅ Add has_more field to GraphContext and SearchResponse (N+1 trick)
- 📝 Show pagination guidance ("Use page=2 to see more") in text output

The API layer stays unchanged — it still accepts all types. This is
purely an MCP tool UX default so LLMs see distinct notes by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Contributor

github-actions bot commented Feb 21, 2026

Claude finished @phernandez's task in 2m 14s.


PR Review: fix/recent-activity-dedup-pagination

Tasks

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review Basic Memory architecture
  • Review security & performance
  • Review compatibility
  • Post final review summary

Summary

Good, focused PR. The entity-only default, exposed pagination params, and N+1 has_more trick are all solid UX improvements. Tests are well-structured. A few things need attention before merge.


Code Quality & Standards

  • Follows Basic Memory conventions: ✅ Mostly; see speculative getattr below
  • Python 3.12+ type annotations & async: ⚠️ Inconsistent; Union[str, List[str]] at line 39 vs str | list[dict] at line 48
  • SQLAlchemy 2.0 best practices: ✅ No direct DB changes
  • FastAPI/Typer conventions
  • 100-char line limit
  • No commented-out code

Speculative getattr/hasattr — violates CLAUDE.md convention

_format_project_output (recent_activity.py:464–511) and _extract_recent_rows (line 354) use unnecessary hasattr/getattr guards on typed objects. After the type-based grouping loop already separates results into entities, observations, and relations lists, the code still does defensive field access like:

# recent_activity.py:464 - entity.title is never None on EntitySummary
entity.title if hasattr(entity, "title") and entity.title else "Untitled"

# recent_activity.py:479 — obs.category always exists on ObservationSummary
category = getattr(obs, "category", "general") if hasattr(obs, "category") else "general"

# recent_activity.py:354 — all three summary types have created_at
"created_at": (primary.created_at.isoformat() if getattr(primary, "created_at", None) else None),

All three summary types (EntitySummary, RelationSummary, ObservationSummary) define these fields. CLAUDE.md says: "Never use getattr(obj, "attr", default) when unsure about attribute names. Check the class definition or source code first." After the type-filtering loop, the field is always present and can be accessed directly.


Compatibility — Windows path splitting ⚠️

_get_project_activity (recent_activity.py:327) and _format_project_output (line 468) split file_path using /:

# recent_activity.py:327 — breaks on Windows
folder = "/".join(result.primary_result.file_path.split("/")[:-1])

# recent_activity.py:468-470 — same issue
folder_path = "/".join(entity.file_path.split("/")[:-1])

CLAUDE.md checklist says "File path comparisons must be windows compatible." Use pathlib.Path(file_path).parent instead:

from pathlib import Path
folder = str(Path(result.primary_result.file_path).parent)


Testing & Documentation

  • Unit tests for new functions: ✅ Good coverage of formatting helpers
  • Integration tests for MCP tools: ✅ Entity-only default, explicit type filtering, pagination
  • Edge cases: has_more True/False, no results, observation truncation
  • 100% test coverage: ✅ Appears maintained
  • Tool contract test updated: page/page_size added to recent_activity signature
  • Documentation updated: ✅ Docstring explains new params and entity-only default

Architecture

  • MCP tools atomic/composable
  • No DB migrations needed: ✅ Only schema + service changes
  • Local-first principles preserved
  • Knowledge graph consistency
  • API layer unchanged (types still accepted): ✅ Confirmed in PR description

N+1 pagination trick is cleanly applied in both context_service.py:107 (fetch limit + 1) and search_router.py:52 (fetch page_size + 1). Both correctly trim before returning. This is the right approach to avoid COUNT queries.
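The N+1 trick is easy to get subtly wrong at the boundaries, so here is a minimal Python sketch. List slicing stands in for a SQL query with `LIMIT page_size + 1 OFFSET (page - 1) * page_size`; `Page` and `fetch_page` are hypothetical names, not the code in context_service.py or search_router.py.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Page:
    items: list
    has_more: bool


def fetch_page(rows: Sequence, page: int, page_size: int) -> Page:
    """Return one page plus has_more without counting the full result set."""
    offset = (page - 1) * page_size
    # Fetch one extra row: if it comes back, at least one more page exists.
    window = list(rows[offset : offset + page_size + 1])
    has_more = len(window) > page_size
    # Trim the extra row before returning to the caller.
    return Page(items=window[:page_size], has_more=has_more)


rows = list(range(7))
print(fetch_page(rows, page=1, page_size=3))  # Page(items=[0, 1, 2], has_more=True)
print(fetch_page(rows, page=3, page_size=3))  # Page(items=[6], has_more=False)
```

When the row count is an exact multiple of page_size, the last full page correctly reports has_more=False because the extra row never arrives; a naive `len(items) == page_size` check would wrongly claim another page exists.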

has_more in discovery mode — when iterating multiple projects, per-project has_more values are silently discarded. This is acceptable for a summary view, but worth noting in a follow-up if users page through discovery results.


Security & Performance

  • No hardcoded secrets
  • Input validation: SearchItemType enum validation raises ValueError
  • Error handling
  • Performance: ✅ N+1 trick avoids COUNT queries
  • No sensitive data in logs

Minor Observations

  1. Type annotation style inconsistency: Union[str, List[str]] on line 39 vs str | list[dict] on line 48. Prefer the Python 3.10+ | style throughout since the project targets 3.12+.

  2. SearchItemType string comparison: _extract_recent_rows compares primary.type == "entity" using string literals. Minor; using SearchItemType.ENTITY.value would be more defensible but is low priority.

  3. Discovery mode has_more not surfaced — if a project returns has_more=True in discovery mode, it's not visible to the user. Low priority since discovery mode is a summary view.


Checklist Results

  • Follows Basic Memory coding conventions (mostly — speculative getattr is a violation)
  • Python 3.12+ type annotations and async patterns (inconsistent style, not blocking)
  • SQLAlchemy 2.0 best practices
  • FastAPI and Typer conventions followed
  • 100-character line length maintained
  • No commented-out code blocks
  • Unit tests for new functions/methods
  • Integration tests for new MCP tools
  • Test coverage for edge cases
  • 100% test coverage maintained
  • Documentation updated (docstrings)
  • CLAUDE.md conventions — speculative getattr/hasattr should be removed
  • MCP tools follow atomic, composable design
  • No database migrations needed
  • Preserves local-first architecture
  • Knowledge graph operations consistent
  • No hardcoded secrets
  • Input validation in MCP tools
  • Proper error handling
  • No sensitive data in logs
  • Windows compatibility — path splitting with / should use pathlib.Path
  • No emojis in console/log output (MCP text output is LLM-facing, not console)

Verdict

Two issues should be fixed before merge:

  1. Windows path compatibility in _get_project_activity and _format_project_output
  2. Speculative getattr/hasattr patterns on typed objects — replace with direct attribute access

Everything else looks good. The core changes (entity-only default, has_more, exposed pagination) are correct and well-tested.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14eb35e82e

Comment on lines +116 to +117
"page": page,
"page_size": page_size,

P2: Validate pagination arguments before calling recent API

The new page/page_size inputs are forwarded as-is, but this tool does not enforce page >= 1 and page_size >= 1. In project mode these values are passed to /v2/projects/{id}/memory/recent, where offset is derived from page and can become negative; Postgres rejects negative OFFSET, so values like page=0 now surface as server/tool errors instead of a clear validation error. This regression appears only after exposing pagination params here, since the previous implementation always used safe constants.
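To see why page=0 fails, assume the API derives the offset the usual way, (page - 1) * page_size; the PR does not show the exact expression. A minimal Python sketch of the failure and of a tool-boundary guard, with `page_offset` as a hypothetical name:

```python
def page_offset(page: int, page_size: int) -> int:
    """Validate pagination inputs at the tool boundary, then derive the SQL OFFSET."""
    # Without these checks, page=0 yields (0 - 1) * page_size, a negative OFFSET.
    if page < 1:
        raise ValueError(f"page must be >= 1, got {page}")
    if page_size < 1:
        raise ValueError(f"page_size must be >= 1, got {page_size}")
    return (page - 1) * page_size


print(page_offset(3, 10))  # 20
try:
    page_offset(0, 10)
except ValueError as exc:
    print(exc)  # page must be >= 1, got 0
```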

Prevents negative OFFSET errors from reaching the database when
invalid pagination values (page=0, page_size=-1, etc.) are passed.
Raises clear ValueError messages at the tool boundary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Contributor

github-actions bot commented Feb 21, 2026

Claude finished @phernandez's task in 2m 20s.


PR Review Update: fix/recent-activity-dedup-pagination

Tasks

  • Gather context
  • Read changed files
  • Check if previous review issues were addressed
  • Post final review summary

New Commit: e3f8c83 — validate page/page_size before API call

The new commit directly addresses the Codex review comment: page and page_size are now validated at the tool boundary before reaching the API layer, preventing negative OFFSET from hitting the database.

What was added (recent_activity.py:114–121):

if page < 1:
    raise ValueError(f"page must be >= 1, got {page}")
if page_size < 1:
    raise ValueError(f"page_size must be >= 1, got {page_size}")
if page_size > 100:
    raise ValueError(f"page_size must be <= 100, got {page_size}")

The upper bound (page_size > 100) is a sensible addition beyond what Codex flagged. Tests cover all boundary cases (page=0, page=-1, page_size=0, page_size=-5, page_size=999). This is a clean fix.


Remaining Issues from Previous Review

The two items flagged in the first review are still unresolved:

1. Speculative getattr/hasattr — CLAUDE.md violation

All three summary types (EntitySummary, RelationSummary, ObservationSummary) define title, file_path, and created_at as non-optional fields. After the type-grouping loop in _format_project_output, the concrete type is known. The guards are unnecessary:

  • recent_activity.py:335, file_path (hasattr): all three types have it
  • recent_activity.py:363, created_at (getattr): all three types have it
  • recent_activity.py:388, title (hasattr): all three types have it
  • recent_activity.py:417, title (hasattr): all three types have it
  • recent_activity.py:473, title (hasattr): EntitySummary.title: str
  • recent_activity.py:476, file_path (hasattr): EntitySummary.file_path: str
  • recent_activity.py:489, category (getattr): ObservationSummary.category: str
  • recent_activity.py:499, content (getattr): ObservationSummary.content: str
  • recent_activity.py:513, relation_type (getattr): RelationSummary.relation_type: str
  • recent_activity.py:518, from_entity (getattr): RelationSummary.from_entity
  • recent_activity.py:520, to_entity (getattr): RelationSummary.to_entity

2. Windows path splitting — still uses / string split

Two locations still split file_path with / instead of pathlib.Path:

# recent_activity.py:336
folder = "/".join(result.primary_result.file_path.split("/")[:-1])

# recent_activity.py:477-478
folder_path = "/".join(entity.file_path.split("/")[:-1])

Should be:

from pathlib import Path
folder = str(Path(result.primary_result.file_path).parent)


Checklist

  • Follows Basic Memory coding conventions — partial (speculative getattr still present)
  • Python 3.12+ type annotations and async patterns: minor style inconsistency (Union[str, List[str]] vs str | list[dict] in the same function; non-blocking)
  • SQLAlchemy 2.0 best practices
  • FastAPI and Typer conventions followed
  • 100-character line length maintained
  • No commented-out code blocks
  • Unit tests for new functions/methods
  • Integration tests for new MCP tools
  • Test coverage for edge cases — new pagination validation tests are thorough
  • 100% test coverage maintained
  • Documentation updated (docstrings)
  • CLAUDE.md conventions — speculative getattr/hasattr should be removed
  • MCP tools follow atomic, composable design
  • No database migrations needed
  • Preserves local-first architecture
  • Knowledge graph operations consistent
  • No hardcoded secrets
  • Input validation in MCP tools — improved (page/page_size validation added)
  • Proper error handling
  • No sensitive data in logs
  • Windows compatibility — path splitting with / should use pathlib.Path

Verdict

The new commit cleanly resolves the Codex-flagged pagination validation issue. Two items from the first review remain open before merge: the speculative getattr/hasattr patterns (CLAUDE.md violation) and the Windows-incompatible / path splitting. Both are mechanical fixes.

…tting

All three summary types (EntitySummary, ObservationSummary, RelationSummary)
define title, file_path, created_at as required fields. Replace defensive
getattr/hasattr with direct attribute access per CLAUDE.md guidelines.

Replace string-based path splitting ("/".join(path.split("/")[:-1])) with
PurePosixPath().parent for cross-platform correctness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Contributor

github-actions bot commented Feb 21, 2026

Claude finished @phernandez's task in 2m 9s.


PR Review Update: fix/recent-activity-dedup-pagination

Tasks

  • Gather context
  • Read changed files from new commit (024bb04)
  • Verify speculative getattr/hasattr fixes
  • Verify pathlib path splitting fixes
  • Post final review summary

New Commit: 024bb04 — remove speculative getattr/hasattr and use pathlib for path splitting

This commit directly resolves both remaining issues from the previous reviews.


Issue 1: Speculative getattr/hasattr — RESOLVED ✅

All eleven guarded field accesses have been replaced with direct attribute access. Verified against the schemas in schemas/memory.py:

Field (schema definition) → fix:

  • entity.title (EntitySummary.title: str, required) → entity.title or "Untitled"
  • entity.file_path (EntitySummary.file_path: str, required) → entity.file_path directly
  • obs.category (ObservationSummary.category: str, required) → obs.category directly
  • obs.content (ObservationSummary.content: str, required) → obs.content directly
  • rel.relation_type (RelationSummary.relation_type: str, required) → rel.relation_type directly
  • rel.from_entity (RelationSummary.from_entity: Optional[str]) → rel.from_entity or "Unknown"
  • rel.to_entity (RelationSummary.to_entity: Optional[str]) → rel.to_entity directly (None handled downstream)
  • primary.created_at (all three types: created_at: datetime, required) → primary.created_at.isoformat() if primary.created_at else None
  • latest.title, discovery mode (EntitySummary.title: str, required) → latest.title or "Recent activity"

All guards removed correctly. The or "Unknown" / or "Untitled" / or "Recent activity" fallbacks on Optional or truly nullable fields are appropriate — these are value-level defaults, not attribute guards.


Issue 2: Windows path splitting — RESOLVED ✅

Both locations replaced with PurePosixPath:

# Before
folder = "/".join(result.primary_result.file_path.split("/")[:-1])

# After
folder = str(PurePosixPath(result.primary_result.file_path).parent)
if folder and folder != ".":  # correctly filters root-level files
    active_folders.add(folder)

PurePosixPath is the right choice here. Basic Memory stores paths in the database using POSIX separators (confirmed by validate_memory_url_path which uses / as the canonical separator). PurePosixPath is OS-agnostic and always produces forward-slash output — using Path instead would emit backslashes on Windows for stored POSIX paths, which would be incorrect. The added folder != "." guard correctly handles root-level files where PurePosixPath("file.txt").parent returns ".".


Minor Observations (non-blocking, unchanged from previous reviews)

  1. Type annotation style: Union[str, List[str]] (line 40) alongside str | list[dict] (line 48) in the same function signature. Minor cosmetic inconsistency in a 3.12+ codebase.
  2. String literal type comparison: primary.type == "entity" (line 416) instead of SearchItemType.ENTITY.value. Low priority.

Neither blocks merge.


Checklist Results

  • Follows Basic Memory coding conventions — fixed: speculative getattr/hasattr removed
  • Python 3.12+ type annotations and async patterns — minor style inconsistency remains (non-blocking)
  • SQLAlchemy 2.0 best practices
  • FastAPI and Typer conventions followed
  • 100-character line length maintained
  • No commented-out code blocks
  • Unit tests for new functions/methods
  • Integration tests for new MCP tools
  • Test coverage for edge cases
  • 100% test coverage maintained
  • Documentation updated (docstrings)
  • CLAUDE.md conventions — speculative getattr/hasattr removed
  • MCP tools follow atomic, composable design
  • No database migrations needed
  • Preserves local-first architecture
  • Knowledge graph operations consistent
  • No hardcoded secrets
  • Input validation in MCP tools — page/page_size validation added (e3f8c83), entity-only default in place
  • Proper error handling
  • No sensitive data in logs
  • Windows compatibility — fixed: PurePosixPath replaces string splitting

Verdict

All blocking issues from both previous reviews are resolved. The three commits together deliver the entity-only default, exposed pagination, N+1 has_more, pagination guidance text, page/page_size validation, direct attribute access, and cross-platform path handling. Ready to merge.

5 new tests in test-int/mcp/test_pagination_integration.py:

- search_notes: page 1 and page 2 return disjoint result sets
- search_notes: has_more transitions from True to False on last page
- recent_activity: page 1 and page 2 return disjoint result sets
- recent_activity: has_more text guidance appears/disappears correctly
- recent_activity: entity-only default filters out observations/relations

All tests go through the full MCP → API → service → DB stack using
write_note to seed real data, then verify pagination offsets produce
correct non-overlapping windows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Contributor

github-actions bot commented Feb 21, 2026

Claude finished @phernandez's task in 2m 15s.


PR Review Update: fix/recent-activity-dedup-pagination

Tasks

  • Gather context
  • Read new commit changes (5eb6db3)
  • Verify test quality and coverage
  • Post final review summary

New Commit: 5eb6db3 — integration tests for pagination through to the database

This commit adds 5 integration tests in test-int/mcp/test_pagination_integration.py that exercise the full MCP → API → service → DB stack using real data seeded by write_note.

Tests added:

  • test_search_notes_pages_are_disjoint: pages 1 and 2 return non-overlapping result sets, and their union covers all 6 seeded notes
  • test_search_notes_has_more_becomes_false: has_more is True on page 1 (4 results, page_size=3) and False on page 2 (1 result)
  • test_recent_activity_pages_are_disjoint: the same disjoint-coverage check for the recent_activity entity-only default
  • test_recent_activity_has_more_becomes_false: text output includes "Use page=2 to see more" on page 1 and omits it on page 2
  • test_recent_activity_entity_only_default_with_relations: the default call returns only "entity" type items; explicit type=["entity","observation","relation"] returns more items
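The disjoint-and-covering check behind the first and third tests can be expressed in a few lines. This sketch replaces the real MCP → API → service → DB round trip with a hypothetical in-memory `page_of` function, so it only illustrates the assertion shape, not the integration setup.

```python
def page_of(titles: list[str], page: int, page_size: int) -> list[str]:
    """Hypothetical in-memory stand-in for one page of search results."""
    offset = (page - 1) * page_size
    return titles[offset : offset + page_size]


def test_pages_are_disjoint_and_cover_all_notes() -> None:
    titles = [f"Note {i}" for i in range(6)]  # six seeded notes
    page1 = page_of(titles, page=1, page_size=3)
    page2 = page_of(titles, page=2, page_size=3)
    # No item appears on both pages...
    assert set(page1).isdisjoint(page2)
    # ...and together the two pages cover every seeded note.
    assert set(page1) | set(page2) == set(titles)
```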

Test isolation is correct: engine_factory and test_project are function-scoped (the default for pytest_asyncio.fixture), and each creates a fresh database + temp directory per test, so the len(pageN) == N assertions are not at risk of pollution from other tests.

Text assertions are correct: "Use page=2 to see more" matches _format_project_output line 518 (f"Use page={page + 1} to see more."), and "items found" matches line 521 (f"{total} items found.").
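For reference, this is roughly how the two quoted strings combine into the guidance the test checks for. The function name, the line ordering, and treating the "items found" line as unconditional are assumptions; only the two f-strings come from the review.

```python
def pagination_footer(total: int, page: int, has_more: bool) -> str:
    """Hypothetical footer built from the two f-strings quoted in the review."""
    lines = [f"{total} items found."]
    # Point the caller at the next page only when the N+1 fetch saw an extra row.
    if has_more:
        lines.append(f"Use page={page + 1} to see more.")
    return "\n".join(lines)


print(pagination_footer(total=10, page=1, has_more=True))
# 10 items found.
# Use page=2 to see more.
```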

JSON output assertions are correct: _extract_recent_rows returns list[dict] with a "title" key for every row, so {item["title"] for item in page1} is valid.


Minor Observation (non-blocking)

test_recent_activity_entity_only_default_with_relations asserts that "entity" in types_returned and that len(all_payload) > len(payload), but does not explicitly assert "observation" in types_returned or "relation" in types_returned after requesting all three types. The count check is a reasonable proxy since a note with two observations and one relation was written, but an explicit type assertion would be slightly stronger. Low priority — does not block merge.


Summary

All four commits together deliver:

  1. Entity-only default — prevents a well-connected entity from flooding results with its observations/relations
  2. Exposed page/page_size params — were previously hardcoded
  3. has_more field via N+1 trick — no COUNT queries
  4. Pagination guidance text — "Use page=2 to see more"
  5. Validation of page/page_size at the tool boundary (prevents negative DB offset)
  6. Direct attribute access — speculative getattr/hasattr removed (CLAUDE.md compliance)
  7. Cross-platform path handling: PurePosixPath replaces string splitting
  8. End-to-end integration tests — full stack pagination verified

All blocking issues from the three prior reviews are resolved. Ready to merge.

phernandez merged commit ee03975 into main on Feb 21, 2026
26 checks passed
phernandez deleted the fix/recent-activity-dedup-pagination branch on February 21, 2026 at 03:49
phernandez added this to the v0.19.0 milestone on Feb 25, 2026