CALLS-edge noise problem: neighbors(out, [CALLS]) returns ~80% noise on typical service methods #177

@HumanBean17

Problem

neighbors(out, [CALLS]) on a typical service method returns a wall of edges where most are noise. An agent doing hop-based exploration (neighbors(out, [CALLS]) → pick target → repeat) floods its context window with ~30 noise items per ~5 signal items at each hop.

Evidence

In the bank-chat-system fixture, neighbors(out, [CALLS]) on ChatManagementService#assign(AssignmentRequest) returns 35 outgoing CALLS edges:

| Category | Count | % | Examples | Signal |
|---|---|---|---|---|
| Business-logic delegation | ~3 | ~8% | SplitResolverService#resolveSplitName(String), DistributionTriggerPublisher#publishTrigger() | High |
| Repository / persistence | ~4 | ~12% | AssignChatRepository#findByConversationId(String), AssignQueueRepository#save(?) | Medium |
| Entity accessor noise | ~15 | ~43% | AssignChatEntity#setEpkId(String), AssignmentRequest#getConversationId() ×3 | Low |
| Phantom / chained / JDK | ~13 | ~37% | ?ResponseStatusException#<init>(2), ?c#setId(1), UUID#randomUUID(?), Instant#now(?), ?...#orElseGet(1) | Zero |

After 3 hops at that rate (~30 noise and ~5 signal items per hop): ~90 noise items versus ~15 signal items.
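The arithmetic behind these figures can be checked with a short sketch. The counts are the approximate values from the table above; the category keys and the `accumulated` helper are illustrative names, and the per-hop numbers assume one target is followed at each hop.

```python
# Approximate counts from the Evidence table (35 outgoing CALLS edges in total).
counts = {
    "business_logic": 3,
    "repository": 4,
    "entity_accessor": 15,
    "phantom_chained_jdk": 13,
}
# Rows rated Low or Zero signal in the table are treated as noise here.
NOISE = {"entity_accessor", "phantom_chained_jdk"}

total = sum(counts.values())
noise = sum(n for name, n in counts.items() if name in NOISE)
noise_share = noise / total


def accumulated(per_hop: int, hops: int) -> int:
    """Items collected when exactly one target is followed per hop."""
    return per_hop * hops


print(f"{noise}/{total} edges are noise ({noise_share:.0%})")
print(accumulated(30, 3), "noise items vs", accumulated(5, 3), "signal items after 3 hops")
```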

Why NodeFilter on neighbors can't help today

  • All targets (services, entities, phantoms) have role=OTHER — the role system doesn't distinguish "service I delegate to" from "entity I call a setter on."
  • There's no filter on edge attributes (confidence, strategy, resolved) — these exist in attrs on the response but can't be used as input predicates.
  • There's no exclude_phantom or min_confidence parameter.
  • Duplicate edges (same callee called N times from different call sites) aren't deduplicated.
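Since the edge attributes are present on the response, a caller can approximate the missing predicates after the fact. The following is a minimal sketch only: it works on plain dicts rather than the real response objects, `filter_edges` is a hypothetical helper, and `min_confidence` / `exclude_phantom` here are client-side keyword arguments, not existing neighbors parameters. It also assumes that an edge without a confidence value should be kept.

```python
from typing import Any, Iterable, Iterator, Mapping

# Strategy values that mark an unresolved call site (taken from the noise taxonomy).
UNRESOLVED_STRATEGIES = frozenset({"phantom", "chained_receiver"})


def filter_edges(
    edges: Iterable[Mapping[str, Any]],
    *,
    min_confidence: float = 0.0,
    exclude_phantom: bool = False,
) -> Iterator[Mapping[str, Any]]:
    """Yield only the edges whose attrs pass the given predicates."""
    for edge in edges:
        attrs = edge.get("attrs", {})
        if exclude_phantom and attrs.get("strategy") in UNRESOLVED_STRATEGIES:
            continue
        confidence = attrs.get("confidence")
        # Assumption: edges with no confidence attribute are kept, not dropped.
        if confidence is not None and confidence < min_confidence:
            continue
        yield edge


# Hypothetical sample edges shaped like {"fqn": ..., "attrs": {...}}.
sample = [
    {"fqn": "?c#setId(1)", "attrs": {"strategy": "phantom", "confidence": 0.1}},
    {"fqn": "SplitResolverService#resolveSplitName(String)", "attrs": {"confidence": 0.9}},
]
kept = list(filter_edges(sample, min_confidence=0.3, exclude_phantom=True))
```

This does nothing for context-window cost on the wire, since all edges are still returned before filtering; it only shows which predicates would be needed.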

Noise taxonomy

The noise breaks into recognizable categories that could be filtered either server-side or client-side:

  1. Phantom / chained-receiver edges: strategy in (phantom, chained_receiver) or confidence < 0.3. These are unresolved call sites — the graph builder couldn't determine the actual target.
  2. JDK / library utility calls: fqn starts with java., javax., org.slf4j., org.apache.logging., lombok.. Standard library calls that never carry business logic.
  3. Entity accessor calls: getter (get*), setter (set*), constructor (<init>) on DTO/Entity types. High volume, low information during hop traversal.
  4. Duplicates: same callee FQN from multiple call sites within the same method.
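Categories 1 to 3 can be expressed as a per-edge classifier. The sketch below is one possible reading of the rules, not an existing implementation: `CallEdge`, `method_name`, and `classify` are hypothetical names, and the `com.example` packages in the usage lines are placeholders. Two simplifications are worth noting: accessors are detected by method name alone, so a service method named get* would also be flagged (the taxonomy restricts this to DTO/Entity types, which would need type information), and category 4 is omitted because duplication is a property of the edge list, not of a single edge.

```python
import re
from dataclasses import dataclass
from typing import Optional

UNRESOLVED_STRATEGIES = frozenset({"phantom", "chained_receiver"})
LIBRARY_PREFIXES = ("java.", "javax.", "org.slf4j.", "org.apache.logging.", "lombok.")
# get*/set* followed by an uppercase letter, or a constructor.
ACCESSOR_NAME = re.compile(r"(?:get|set)[A-Z]\w*|<init>")


@dataclass(frozen=True)
class CallEdge:
    fqn: str
    strategy: Optional[str] = None
    confidence: Optional[float] = None


def method_name(fqn: str) -> str:
    """'pkg.Type#method(args)' -> 'method'; empty string if there is no '#'."""
    member = fqn.partition("#")[2]
    return member.partition("(")[0]


def classify(edge: CallEdge) -> str:
    """Return 'phantom', 'library', 'accessor', or 'signal' for one edge."""
    if edge.strategy in UNRESOLVED_STRATEGIES:
        return "phantom"
    if edge.confidence is not None and edge.confidence < 0.3:
        return "phantom"
    if edge.fqn.startswith(LIBRARY_PREFIXES):
        return "library"
    if ACCESSOR_NAME.fullmatch(method_name(edge.fqn)):
        return "accessor"
    return "signal"


print(classify(CallEdge("com.example.AssignChatEntity#setEpkId(String)", confidence=0.9)))
print(classify(CallEdge("?c#setId(1)", strategy="phantom", confidence=0.1)))
```

The rules are checked in the order the taxonomy lists them, so an unresolved setter is reported as phantom rather than accessor.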

Possible solution directions (not prescriptive)

  • Skill-layer heuristic filtering — a skill (e.g. /mini-map) calls neighbors, receives all edges, and applies classification rules client-side. No MCP surface change. See PR #59, "propose: agent skills and commands — Layer 3 over the 4-tool MCP" (/mini-map skill).
  • Server-side edge-attr filter — add min_confidence, exclude_phantom, or general edge-attr predicates to neighbors. Requires a proposal for an MCP surface change.
  • Smarter role assignment — classify entity accessor methods differently from business-logic methods so NodeFilter can distinguish them. Requires ontology work.
  • Deduplicate in response — collapse same-callee edges into one with a count. Small MCP change.

These are not mutually exclusive.
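As an illustration of the deduplication direction, the collapse step could look roughly like this. It is a sketch under assumptions: `dedupe_by_callee` is a hypothetical helper, the input is a plain list of callee FQNs, and the `{"fqn", "count"}` output shape is invented for the example rather than taken from the current response schema.

```python
from collections import Counter
from typing import Dict, Iterable, List, Union


def dedupe_by_callee(fqns: Iterable[str]) -> List[Dict[str, Union[str, int]]]:
    """Collapse repeated callee FQNs into one entry each, with a call-site count.

    Counter keeps first-insertion order, so callees stay in the order
    in which they were first seen.
    """
    counts = Counter(fqns)
    return [{"fqn": fqn, "count": n} for fqn, n in counts.items()]


calls = [
    "AssignmentRequest#getConversationId()",
    "AssignQueueRepository#save(?)",
    "AssignmentRequest#getConversationId()",
    "AssignmentRequest#getConversationId()",
]
for entry in dedupe_by_callee(calls):
    print(entry["fqn"], "x", entry["count"])
```

Whether per-call-site attrs (confidence, strategy) should be merged, or kept from the first occurrence, is left open here.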

Reproduction

rm -rf /tmp/noise-check && .venv/bin/python build_ast_graph.py \
  --source-root tests/bank-chat-system --kuzu-path /tmp/noise-check/code_graph.kuzu --verbose

.venv/bin/python -c "
from mcp_v2 import neighbors_v2, find_v2
from kuzu_queries import KuzuGraph

g = KuzuGraph('/tmp/noise-check/code_graph.kuzu')
KuzuGraph._instance = g

result = find_v2(kind='symbol', filter={'fqn_prefix': 'com.bank.chat.assign.service.ChatManagementService#assign'}, limit=1, graph=g)
sym_id = result.results[0].id

nb = neighbors_v2(ids=sym_id, direction='out', edge_types=['CALLS'], limit=50, graph=g)
for e in nb.results:
    attrs = e.attrs
    print(f'{e.other.fqn}  strategy={attrs.get(\"strategy\",\"?\")} conf={attrs.get(\"confidence\",\"?\")} resolved={attrs.get(\"resolved\",\"?\")}')
print(f'\nTotal: {len(nb.results)} edges')
"
