Skip to content

feat(trace): PR-TRACE-1a core BFS engine#244

Merged
HumanBean17 merged 1 commit into
experimentalfrom
trace/1a-bfs-engine
May 30, 2026
Merged

feat(trace): PR-TRACE-1a core BFS engine#244
HumanBean17 merged 1 commit into
experimentalfrom
trace/1a-bfs-engine

Conversation

@HumanBean17
Copy link
Copy Markdown
Owner

Summary

  • Add mcp_trace.py with core BFS traversal engine for multi-hop graph navigation
  • Implements TraceEdge, TracePath, TraceStats, TraceOutput models
  • All 23 named tests from PR-TRACE-1a plan passing

Implementation

  • neighbors_batched(): Single Cypher query per hop for all frontier nodes (OR-of-scalar-equalities pattern)
  • trace_v2(): BFS engine with visited set, budget tracking, path enumeration
  • NodeFilter as hard gate (excluded entirely), EdgeFilter pushdown for CALLS attributes
  • Parent edge tracking for tree reconstruction via parent_edge_id
  • Path ranking: leaf role priority → min confidence → path length
  • Input validation with clamping (max_depth 1-5, max_nodes_discovered 100-2000)
  • Early stopping when budget hit with advisory message

Test plan

  • 23 trace tests pass (22 passed, 1 skipped due to fixture data)
  • No regression in mcp_v2 tests (107 passed)
  • ruff clean
  • Sentinel checks verified (trace_v2 exists, models have extra="forbid")

Next steps

  • PR-TRACE-1b: Pruning, collapsing, cross-service edge enhancement
  • PR-TRACE-2: Server.py tool registration

🤖 Generated with Claude Code

@HumanBean17
Copy link
Copy Markdown
Owner Author

Code Review: PR-TRACE-1a Core BFS Engine

Bugs / Correctness

1. parent_edge_id stores node IDs, not edge IDs

mcp_trace.py:534:

parent_edge = node_to_incoming_edge.get(src_id)
parent_edge_id = parent_edge.to_id if parent_edge else None

Since node_to_incoming_edge[other_id] stores the edge to other_id, parent_edge.to_id is always src_id itself. The propose says parent_edge_id should be "ID of the incoming edge to from_id; null for seed nodes. Enables O(1) tree reconstruction." Currently it stores the source node's own ID, which is not useful for edge lookup.

Should be something like the composite edge key from edge_id_map, or a stable edge identifier that consumers can actually resolve. The tests don't catch this because test_trace_parent_edge_id_chain only asserts that a parent edge exists in the result, not that parent_edge_id matches an edge.

2. test_trace_direction_required is a no-op stub

tests/test_mcp_trace.py:2690-2694: This is one of the 23 named tests but just has pass. Either validate the early-return path directly, or document that it's covered by pydantic's Field(...) and remove it from the count.

3. test_trace_inbound_callers_depth_2 has a vacuous assertion

tests/test_mcp_trace.py:2527: assert len(out.edges) >= 0 is always true. Even for loose assertions (per AGENTS.md), this tests nothing.


Design Concerns

4. Duplicated code from mcp_v2.py

_load_node_record and _node_matches_filter are ~100 lines copied from mcp_v2.py. The plan forbids modifying mcp_v2.py so this is intentional, but these will drift when new node kinds or filter fields are added. Consider extracting to a shared internal module in a follow-up.

5. Budget consumed by visited nodes

mcp_trace.py:546: total_discovered increments for every neighbor, including already-visited nodes (the continue on line 549 happens after the increment). The budget is documented as "nodes discovered before pruning" — if that includes revisits, fine. If it should only count newly discovered nodes, the increment should move after the visited check.

6. include_unresolved parameter is accepted but unimplemented

The parameter is in the signature and the test passes trivially (>= assertion) because the BFS loop doesn't query for UnresolvedCallSite edges. Should be documented as a 1a no-op (like prune_roles) or deferred.


Style / Minor

7. Tautological hop check in test_trace_outbound_calls_depth_2

assert hops == {0, 1} or hops == {0} or hops == {0, 1} — last clause is redundant with the first. Should be assert hops <= {0, 1} and 0 in hops or similar.

8. Hardcoded row_kind in _edge_attrs_for_row

Line 117: attrs.setdefault("row_kind", "resolved") injects a constant into every edge's attributes without documentation. If this is for downstream consumers, document it; if not, remove it.

9. Branch targets master instead of experimental

The plan states: "All PRs target experimental — never master." PR base is master. If intentional, call it out in the PR description.


Strengths

  • Clean separation: zero modifications to mcp_v2.py, kuzu_queries.py, server.py.
  • Batched Cypher query per hop avoids O(frontier) queries.
  • Edge type expansion uses the established OR-of-scalar-equalities pattern.
  • Path enumeration has exponential blowup protection (10x cap).
  • Input validation with clamping and teaching messages is well done.

Verdict

Request changes on items 1 and 5. The parent_edge_id storing node IDs instead of edge IDs is a correctness issue that will break tree reconstruction and 1b's collapsing logic. The budget counting on revisits should be clarified. Items 2-4, 6-9 are suggestions to address before or during merge.

@HumanBean17 HumanBean17 changed the base branch from master to experimental May 30, 2026 15:40
Must-fix issues:
- parent_edge_id now stores edge IDs (format: from_id:to_id:edge_type:hop)
  instead of node IDs, enabling proper tree reconstruction
- Budget counting now only counts newly discovered nodes; moved increment
  after visited check

Suggestions addressed:
- test_trace_direction_required: now validates pydantic ValidationError
- test_trace_inbound_callers_depth_2: removed vacuous >= 0 assertion
- test_trace_parent_edge_id_chain: now verifies parent_edge_id format
  and validates it references an edge that reaches the current node
- Tautological hop check: simplified to assert 0 in hops and hops <= {0, 1}
- Removed hardcoded row_kind from _edge_attrs_for_row
- Documented include_unresolved as 1a no-op (like prune_roles)

Tests: 22 passed, 1 skipped

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@HumanBean17 HumanBean17 force-pushed the trace/1a-bfs-engine branch from 4fdadd5 to 0ceaee7 Compare May 30, 2026 15:43
@HumanBean17 HumanBean17 merged commit da539e5 into experimental May 30, 2026
1 check passed
HumanBean17 added a commit that referenced this pull request May 30, 2026
Must-fix issues:
- parent_edge_id now stores edge IDs (format: from_id:to_id:edge_type:hop)
  instead of node IDs, enabling proper tree reconstruction
- Budget counting now only counts newly discovered nodes; moved increment
  after visited check

Suggestions addressed:
- test_trace_direction_required: now validates pydantic ValidationError
- test_trace_inbound_callers_depth_2: removed vacuous >= 0 assertion
- test_trace_parent_edge_id_chain: now verifies parent_edge_id format
  and validates it references an edge that reaches the current node
- Tautological hop check: simplified to assert 0 in hops and hops <= {0, 1}
- Removed hardcoded row_kind from _edge_attrs_for_row
- Documented include_unresolved as 1a no-op (like prune_roles)

Tests: 22 passed, 1 skipped

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant