feat: add Logfire phased instrumentation #692

Merged
phernandez merged 10 commits into main from feat/logfire-phased-instrumentation
Mar 25, 2026

phernandez (Member) commented Mar 24, 2026

Summary

  • Add Logfire telemetry integration with config-gated bootstrap (logfire_enabled = true in config enables; false by default — zero overhead when off)
  • Instrument key paths: MCP tool execution, project routing, sync service, search, and API endpoints with structured spans and failure details
  • Improve MCP log clarity by binding loguru context to telemetry scopes
  • logfire is an optional dependency (pip install basic-memory[telemetry])

Details

Phased instrumentation strategy following docs/logfire-instrumentation-strategy.md:

  1. Bootstrap & config gating: telemetry.py module with configure_telemetry() and span helpers
  2. Root span boundaries — MCP server lifecycle, sync coordinator, API startup
  3. Routing & sync spans — project context routing, sync service phases (discover → parse → save → link)
  4. Failure-focused details — error attributes on spans, search scoring, tool input capture
  5. Loguru ↔ telemetry binding — structured log context flows into OpenTelemetry spans
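A rough sketch of how phases 3 and 4 could fit together: one parent span per sync run, one child span per phase, and an error attribute attached on failure. The `span` helper, the `RECORDED` list, and the `sync.*` span names are placeholders for illustration, not names taken from this PR.

```python
import time
from contextlib import contextmanager
from typing import Any, Iterator

# Hypothetical in-memory recorder; a real setup would hand spans to Logfire.
RECORDED: list[dict[str, Any]] = []


@contextmanager
def span(name: str, **attrs: Any) -> Iterator[None]:
    """Record a span's name, attributes, duration, and any failure."""
    record: dict[str, Any] = {"name": name, **attrs}
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        # Failure-focused detail: attach the error type to the span.
        record["error_type"] = type(exc).__name__
        raise
    finally:
        record["duration_s"] = time.perf_counter() - start
        RECORDED.append(record)


def sync_project(project: str, files: list[str]) -> None:
    """Run the four sync phases, each under its own child span."""
    with span("sync.project", project=project, file_count=len(files)):
        for phase in ("discover", "parse", "save", "link"):
            with span(f"sync.{phase}", project=project):
                pass  # the phase's real work would go here


sync_project("demo", ["a.md", "b.md"])
```

Spans are recorded when they close, so the child phases land in `RECORDED` before the parent `sync.project` span.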

Setup:

  1. pip install basic-memory[telemetry] (logfire is already included in the dev dependencies)
  2. Set logfire_enabled = true in ~/.basic-memory/config.json
  3. Optionally set LOGFIRE_TOKEN env var for backend export to Logfire

Includes comprehensive test coverage (telemetry tests for each instrumented layer) and a smoke workflow (just test-smoke-logfire).

Test plan

  • just fast-check passes
  • just test-smoke-logfire smoke test validates spans are emitted when enabled
  • Verify no performance impact when logfire_enabled is false (telemetry is fully no-op)
  • Verify graceful degradation when logfire package is not installed

🤖 Generated with Claude Code

Signed-off-by: phernandez <paul@basicmachines.co>

github-actions bot commented Mar 24, 2026

Claude finished @phernandez's task in 3m 31s


PR Review: feat/logfire-phased-instrumentation

Tasks:

  • Gather context and understand the PR
  • Read core telemetry module
  • Review instrumented files
  • Review test coverage
  • Check against team checklist
  • Post final review

Summary

This is a well-structured telemetry integration. The config-gated bootstrap, zero-overhead no-op path when disabled, and layered span / scope / operation abstractions are clean. The test coverage via fake context managers is thorough. A few issues need attention before merge.


Issues

1. # pragma: no cover on functions that ARE tested

init_cli_logging, init_mcp_logging, and init_api_logging are all marked # pragma: no cover in config.py (lines 992, 1003, 1014), but tests/test_telemetry.py::test_init_logging_functions_configure_telemetry_and_logging directly calls all three. This means the test provides coverage but it's silently excluded from the coverage report, undermining the 100% coverage requirement.

Since these functions are tested (via monkeypatched dependencies), the pragmas should be removed. If there's a reason they can't be tested fully (e.g., the log_to_file=True path causing real file I/O), that specific branch should carry the pragma — not the entire function.


2. PR description says "LOGFIRE_TOKEN enables" — code says logfire_enabled = true

The PR summary states "LOGFIRE_TOKEN enables, absent disables — zero overhead when off" but the actual gate is logfire_enabled: bool in BasicMemoryConfig (config.py line 146). A user who sets LOGFIRE_TOKEN in their environment but leaves logfire_enabled at its default (false) will see no telemetry, with no warning.

LOGFIRE_TOKEN is consumed by Logfire's own configure() for the backend connection — it doesn't auto-enable the integration. The docs/README and strategy doc should be updated to clarify the two-step setup:

  1. Set logfire_enabled = true in ~/.basic-memory/config.json
  2. Set LOGFIRE_TOKEN for backend export (optional if only local inspection)

3. Duplicate helper functions: _canonicalize_project_name vs _configured_project_name

project_context.py contains two nearly-identical helpers:

  • _canonicalize_project_name (lines 67–85): returns the configured name if found, else returns project_name unchanged
  • _configured_project_name (lines 88–101): returns the configured name if found, else returns None

The only difference is the fallback. Both iterate config.projects with the same permalink logic. The names are confusingly similar and their distinction ("canonicalize" vs "configured") isn't obvious. Consider consolidating into one with a clear name or at minimum adding a comment explaining why both exist.


4. telemetry.operation vs telemetry.scope are functionally identical

telemetry.py lines 178–180:

@contextmanager
def operation(name: str, **attrs: Any) -> Iterator[None]:
    with scope(name, **attrs):
        yield

operation is a pure pass-through to scope with no behavioral difference. The semantic distinction ("root operation" vs "nested scope") is documented but not enforced, which makes it easy to misuse (scope at root level, operation nested). Either add enforcement (e.g., track nesting depth via a ContextVar and warn), collapse them into one function with an is_root kwarg, or at minimum add a docstring note stating that the two are aliases and the distinction is convention only.


5. Sync context managers wrapping yield in @asynccontextmanager — span lifecycle covers full tool execution

In get_project_client (project_context.py):

with telemetry.scope("routing.resolve_client", ...):
    async with get_client(...) as client:
        active_project = await get_active_project(...)
        yield client, active_project   # span stays open until caller exits

The yield is inside the with telemetry.scope(...) block, so the span remains open for the entire duration of the caller's async with get_project_client(...) block — including all tool I/O. This means routing.resolve_client spans can last seconds rather than milliseconds. The span name implies it's measuring routing resolution time, not tool execution time. Consider closing the routing span before yielding, or renaming it to routing.tool_execution to accurately reflect what's being measured.
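
A sketch of the "close before yielding" option. It simplifies the real function: there is no client whose lifetime must span the yield, and `scope`, `resolve_project`, and the `EVENTS` list are stand-ins used only to make the ordering visible.

```python
import asyncio
from contextlib import asynccontextmanager, contextmanager
from typing import AsyncIterator, Iterator

EVENTS: list[str] = []


@contextmanager
def scope(name: str) -> Iterator[None]:
    """Stand-in span that records when it opens and closes."""
    EVENTS.append(f"open {name}")
    try:
        yield
    finally:
        EVENTS.append(f"close {name}")


async def resolve_project(name: str) -> str:
    """Placeholder for the routing work being timed."""
    await asyncio.sleep(0)
    return name.lower()


@asynccontextmanager
async def get_project_client(name: str) -> AsyncIterator[str]:
    with scope("routing.resolve_client"):
        active_project = await resolve_project(name)
    # The span is already closed here, so the caller's work is not attributed to routing.
    yield active_project


async def main() -> None:
    async with get_project_client("Demo") as project:
        EVENTS.append(f"tool work on {project}")


asyncio.run(main())
```

In `EVENTS`, the close entry for `routing.resolve_client` precedes the tool work entry, which is the ordering the span name implies.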


6. logfire added as a hard dependency in pyproject.toml

pyproject.toml adds "logfire>=4.19.0" to the core dependencies. Since telemetry is optional (config-gated), this adds a mandatory install dependency for all users. The _load_logfire() lazy import guard handles the missing-package case gracefully, but users who don't want telemetry still pay the install cost (several MB).

Consider moving logfire to an optional extras group (e.g., [telemetry]) and documenting the install step in the telemetry setup guide. This aligns with the "zero overhead when off" design goal.


Team Checklist

| Item | Status | Notes |
| --- | --- | --- |
| Follows Basic Memory coding conventions | | Literate comments, section headers, decision-point comments |
| Python 3.12+ type annotations and async patterns | | |
| SQLAlchemy 2.0 best practices | | No DB changes |
| FastAPI and Typer conventions | | ctx.with_resource() for CLI span is idiomatic |
| 100-character line length | | |
| No commented-out code | | |
| Unit tests for new functions | | Comprehensive fake-logfire coverage |
| Integration tests for MCP tools | | test_tool_telemetry.py covers all major tools |
| Test coverage edge cases | | Missing dependency, disabled, retry without send_to_logfire |
| 100% test coverage maintained | ⚠️ | # pragma: no cover on tested init functions (see Issue 1) |
| Documentation updated | | docs/logfire-instrumentation-strategy.md added |
| CLAUDE.md updated if conventions change | N/A | No convention changes |
| MCP tools follow atomic, composable design | | |
| Database changes include Alembic migrations | | No DB changes |
| Preserves local-first architecture | | Zero overhead path when disabled |
| No hardcoded secrets | | |
| Input validation for MCP tools | | |
| Proper error handling and logging | | pop_telemetry_warnings() surfaces errors after logger init |
| Performance when disabled | | telemetry_enabled() short-circuits all paths |
| No sensitive data in logs | | Only structural metadata in span attributes |
| Windows compatibility | | No path operations in telemetry code |
| No emojis/unicode in console/log output | | |

Minor Observations

  • telemetry.py lines 120–123 carry # pragma: no cover on three adjacent lines. Standard practice is a single pragma on the except Exception line only; the pragmas on the body lines are redundant but harmless.
  • _load_logfire() is called on every span() invocation (when enabled). Python's import cache makes this fast, but a module-level cache of the resolved logfire object after configure_telemetry would be marginally cleaner.
  • The with telemetry.scope(...) blocks in resolve_workspace_parameter and get_active_project are synchronous context managers used in async functions. This is correct with asyncio/contextvars and works as expected.

Overall this is a solid, well-tested implementation. Issues 1 (coverage pragma) and 6 (hard dependency) are the most important to address before merge.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23e0073ce6


# project. Resolving it through /v2/projects/resolve creates noisy false-negative
# logs before the tool recovers with the active project anyway.
# Outcome: skip project resolution and keep the path within the active project.
if project_prefix and (configured_prefix is not None or project is None):

P1: Resolve memory URL prefixes even when project is preselected

The new guard if project_prefix and (configured_prefix is not None or project is None) skips /v2/projects/resolve whenever a project is already fixed and the prefix is not present in local config. In cloud routing, local config may not contain all accessible projects, so valid inputs like memory://other-project/note can be misinterpreted as a path inside the active project (for example <active>/other-project/note) instead of routing to other-project. This regresses cross-project memory URL behavior in any flow that has already resolved project.

The telemetry PR introduced a guard that skipped /v2/projects/resolve
when a project was already fixed and the URL prefix wasn't in local
config. This broke cloud routing where remote projects aren't locally
configured — valid URLs like memory://other-project/note would be
misinterpreted as paths within the active project.

Revert to the original unconditional `if project_prefix:` guard and
remove the now-unused `_configured_project_name()` helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

github-actions bot commented Mar 24, 2026

Claude finished @phernandez's task in 1m 10s


PR Review Update: feat/logfire-phased-instrumentation

Tasks:

  • Identify new commits since last review
  • Review new changes (fe32ac5: fix cross-project memory URL resolution)
  • Check if previous review issues were addressed
  • Post final review update

What changed in fe32ac5

The new commit directly addresses the Codex P1 regression: cross-project memory URL resolution was broken when a project was already pre-selected. The fix is clean and correct:

  • Removes _configured_project_name helper (which was used only in the now-reverted guard)
  • Reverts the conditional in resolve_project_and_path from if project_prefix and (configured_prefix is not None or project is None): back to the simpler if project_prefix: — restoring unconditional resolution via /v2/projects/resolve
  • Removes the test that validated the reverted optimization (appropriate — it tested behavior that no longer exists)

This is the right call. The optimization was trading correctness for noise reduction. The "noisy false-negative logs" concern can be addressed at the logging level (e.g., log at DEBUG instead of WARNING on project-not-found), not by skipping resolution.

Partially addresses Issue 3 from prior review: _configured_project_name is gone, so the duplicate-helper concern is resolved. _canonicalize_project_name (lines 67–85) remains and is still used — that's correct.


Remaining open issues from prior review

These were not addressed in this push:

| # | Issue | Status |
| --- | --- | --- |
| 1 | # pragma: no cover on init_cli_logging, init_mcp_logging, init_api_logging in config.py; these functions are fully tested | Still open |
| 2 | PR description says "LOGFIRE_TOKEN enables" but actual gate is logfire_enabled: bool in config; docs/README need clarification | Still open |
| 4 | telemetry.operation is a pure pass-through to telemetry.scope; semantic distinction is convention only, not enforced | Still open (minor) |
| 5 | routing.resolve_client span in get_project_client wraps the yield, so it stays open for the entire tool execution rather than just routing resolution time | Still open (minor) |
| 6 | logfire is a hard dependency in pyproject.toml; should be an optional extras group given the "zero overhead when off" design goal | Still open |

Issues 1 and 6 remain the most important before merge. The fix in this commit is solid — the cross-project routing regression is the kind of subtle correctness bug that would be painful to track down in production.

- Remove `# pragma: no cover` from init_cli/mcp/api_logging since
  they are fully tested (issue 1)
- Move logfire to optional dependency `pip install basic-memory[telemetry]`
  while keeping it in dev deps for testing (issue 6)
- Collapse `telemetry.operation` into an alias for `telemetry.scope`
  with a comment explaining the convention (issue 4)
- Rename `routing.resolve_client` span to `routing.client_session`
  to accurately reflect that it covers the full tool execution
  lifetime, not just routing resolution (issue 5)
- Update PR description to clarify two-step setup:
  logfire_enabled config flag + optional LOGFIRE_TOKEN (issue 2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

github-actions bot commented Mar 25, 2026

Claude finished @phernandez's task in 2m 3s


PR Review Update: feat/logfire-phased-instrumentation

Tasks:

  • Gather context from PR and prior reviews
  • Read new commits since last review (02b42c6 refactor: address PR review feedback)
  • Check if prior issues were addressed
  • Review key files for outstanding issues
  • Post final review

What changed in 02b42c6

This commit directly addresses all 6 issues from the prior review. Here's what was resolved:

| # | Prior Issue | Status |
| --- | --- | --- |
| 1 | # pragma: no cover on init_cli/mcp/api_logging in config.py; these are fully tested | Fixed: pragmas removed |
| 2 | PR description said "LOGFIRE_TOKEN enables" but actual gate is logfire_enabled: bool | Fixed: PR description updated; logfire_enabled = true is now the documented first step |
| 3 | Duplicate _canonicalize_project_name / _configured_project_name helpers | Already fixed in fe32ac5 (prior review cycle) |
| 4 | telemetry.operation was a pure pass-through to telemetry.scope with no enforcement | Fixed: collapsed to a module-level alias with a comment documenting the convention-only distinction |
| 5 | routing.resolve_client span stayed open for the full tool execution lifetime, not just routing | Fixed: renamed to routing.client_session, accurately reflecting what it measures |
| 6 | logfire was a hard dependency for all users | Fixed: moved to [project.optional-dependencies] telemetry = ["logfire>=4.19.0"]; kept in [dependency-groups] dev for tests |
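
For reference, a module-level alias of the kind described for issue 4 can look like the sketch below. The body of `scope` is elided and the comment wording is illustrative, not copied from telemetry.py.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def scope(name: str, **attrs: Any) -> Iterator[None]:
    """Open a telemetry span (span creation elided in this sketch)."""
    yield


# Convention only: use `operation` at root boundaries and `scope` when nested.
# The two names refer to the same function; nothing enforces the distinction.
operation = scope
```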

Remaining # pragma: no cover (all appropriate)

The remaining pragmas in telemetry.py are on genuinely hard-to-test paths:

  • Lines 120–123: except Exception as exc inside logfire.configure() — requires failure injection into a third-party call
  • Lines 190–192: if logfire is None inside span() — only reachable if logfire is importable at configure_telemetry time but disappears afterward (can't happen in practice)

These are appropriate uses per the project's # pragma: no cover policy.


Team Checklist

| Item | Status | Notes |
| --- | --- | --- |
| Follows Basic Memory coding conventions | | Literate comments, section headers, decision-point comments throughout |
| Python 3.12+ type annotations and async patterns | | |
| SQLAlchemy 2.0 best practices | | No DB changes |
| FastAPI and Typer conventions | | |
| 100-character line length | | |
| No commented-out code | | |
| Unit tests for new functions | | Comprehensive fake-logfire coverage in tests/test_telemetry.py |
| Integration tests for MCP tools | | test_tool_telemetry.py covers all major tools |
| Test coverage edge cases | | Missing dependency, disabled path, retry without send_to_logfire |
| 100% test coverage maintained | | Pragmas removed from init_cli/mcp/api_logging (Issue 1 resolved) |
| Documentation updated | | docs/logfire-instrumentation-strategy.md added; span name updated in docs to match rename |
| CLAUDE.md updated if conventions change | N/A | No convention changes |
| MCP tools follow atomic, composable design | | |
| Database changes include Alembic migrations | N/A | No DB changes |
| Preserves local-first architecture | | Zero overhead path when disabled |
| No hardcoded secrets | | |
| Input validation for MCP tools | | |
| Proper error handling and logging | | pop_telemetry_warnings() surfaces errors after logger init |
| Performance when disabled | | telemetry_enabled() short-circuits all paths |
| No sensitive data in logs | | Only structural metadata in span attributes |
| Windows compatibility | | No path operations in telemetry code |
| No emojis/unicode in console/log output | | |
| logfire optional dependency | | Moved to [telemetry] extras; dev deps retain it for tests |

Summary

All prior issues are resolved. The PR is clean:

  • logfire is an optional install (pip install basic-memory[telemetry])
  • Telemetry is fully no-op when logfire_enabled = false (default)
  • Coverage pragmas are only on genuinely untestable paths
  • Span names accurately reflect their lifetime scope
  • telemetry.operation / telemetry.scope distinction is documented as convention-only with an alias

This is ready to merge.

@phernandez phernandez merged commit 4791e19 into main Mar 25, 2026
25 checks passed
@phernandez phernandez deleted the feat/logfire-phased-instrumentation branch March 25, 2026 01:39