feat: tolerate JSON-encoded NodeFilter in MCP search/find/neighbors #55

Merged: HumanBean17 merged 2 commits into master from cursor/mcp-filter-json-string on May 7, 2026

@HumanBean17
Owner

Scope

Allows weak LLM / MCP clients to pass filter as a JSON-encoded string (serialized NodeFilter) without failing FastMCP JSON-RPC validation (params/filter must be object). Standalone robustness change (no linked plan file).

What Changed

  • Added _coerce_filter() in mcp_v2.py — strict json.loads, decoded value must be a JSON object; whitespace-only string treated as no filter.
  • Widened search_v2 / find_v2 / neighbors_v2 so filter may be str; after coercion, find_v2 maps empty filter to {}.
  • Widened filter parameter types on search / find / neighbors in server.py, updated Field descriptions, noted string filters in _INSTRUCTIONS.
  • Added five tests in tests/test_mcp_v2.py; updated MCP tool table and note in README.md.
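
To make the coercion rules above concrete, here is a minimal sketch reconstructed from this description only. It is not the code in mcp_v2.py: the name coerce_filter and the exact error messages are assumptions, and the NodeFilter passthrough of the real helper is left out so the example needs only the standard library.

```python
import json
from typing import Any, Dict, Optional


def coerce_filter(raw: Any) -> Optional[Dict[str, Any]]:
    """Normalize a filter argument that may arrive as a JSON-encoded string."""
    if raw is None or isinstance(raw, dict):
        return raw  # absent or already an object: pass through unchanged
    if isinstance(raw, str):
        if not raw.strip():
            return None  # empty or whitespace-only string means "no filter"
        try:
            decoded = json.loads(raw)  # strict JSON only, no Python literals
        except json.JSONDecodeError as exc:
            raise ValueError(f"filter is not valid JSON: {exc}") from exc
        if not isinstance(decoded, dict):
            raise ValueError(
                f"filter must decode to a JSON object, got {type(decoded).__name__}"
            )
        return decoded
    raise ValueError(f"unsupported filter type: {type(raw).__name__}")
```

A single-quoted Python literal such as "{'exclude_roles': ['CONTROLLER']}" is rejected by this sketch, which matches the JSON-only non-goal stated in the next section.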

Semantics / Non-Goals

  • No ontology bump, no Kuzu or LanceDB schema changes, no re-index requirement.
  • No Python-literal parsing (e.g. single-quoted dicts); JSON only.
  • No changes to other tool parameters (path_contains, edge_types, ids, etc.).

Validation

Lint

  • ruff check . — clean on branch before push; not re-run during PR open.

Tests

  • pytest tests — author confirms green; not re-run during PR open (last full run on branch: 330 passed, 7 skipped).

Sentinel checks

  • git diff master...HEAD --name-only → README.md, mcp_v2.py, server.py, tests/test_mcp_v2.py only.

Manual evidence

  • Weak MCP clients sending filter as a string (e.g. {"exclude_roles":["CONTROLLER"]}) should pass transport validation; decoding and NodeFilter validation are covered by unit tests.
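
As an illustration of the two argument shapes involved, the sketch below builds the tool arguments once with filter as an object and once with filter double-encoded as a string. The query argument and its value are assumptions made up for the example; only the exclude_roles filter comes from this PR.

```python
import json

node_filter = {"exclude_roles": ["CONTROLLER"]}

# Arguments as a well-behaved client sends them: filter is a JSON object.
object_args = {"query": "order service", "filter": node_filter}

# Arguments as a weak client may send them: filter serialized to a string first.
string_args = {"query": "order service", "filter": json.dumps(node_filter)}

# On the wire the second form carries a string, not an object ...
assert isinstance(string_args["filter"], str)
# ... but decoding that string recovers exactly the same filter.
assert json.loads(string_args["filter"]) == object_args["filter"]
```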

Out of Scope Confirmed

Did not implement:

  • Tier 2 incremental rebuild / related proposals.
  • Graph builder, ranking, java_ontology.py, or fixture-only special cases in production code.
  • CLI (user-rag) feature work beyond README tool docs.

Definition of Done

  • All listed deliverables for this PR are shipped.
  • Required lint/tests pass locally with recorded command output.
  • Sentinel checks produce expected results.
  • Only in-scope files are modified.
  • PR description includes scope, validation, and manual evidence.
  • PR targets master with agreed title and branch naming.

Co-authored-by: Cursor <cursoragent@cursor.com>
@HumanBean17
Owner Author

Review: PR #55 — tolerate JSON-encoded NodeFilter in MCP search/find/neighbors

Verdict: Approved ✅

Tightly scoped weak-model tolerance fix. _coerce_filter is a roughly 20-line helper in mcp_v2.py. The filter parameter on all three tools accepts dict | str | null at the FastMCP boundary, while the NodeFilter Python type stays internal, so the wire schema is now structurally minimal. Five regression tests lock both the happy paths and the failure paths. No ontology bump, no schema delta, and the MCP tool surface stays at 4. The implementation matches what we discussed with one well-judged refinement (whitespace-only string → no filter).

Scope discipline

Sentinel | Status
ONTOLOGY_VERSION / SCHEMA_VERSION | ✅ 0 hits
CREATE NODE TABLE / CREATE REL TABLE / DROP TABLE | ✅ 0 hits
@mcp.tool count in server.py | ✅ 4 (unchanged)
Files touched | ✅ exactly what the PR description claims: README.md, mcp_v2.py, server.py, tests/test_mcp_v2.py

What landed (verified)

# Deliverable Verified
1 _coerce_filter helper handles None, NodeFilter, dict (passthrough), str (json.loads → must-be-dict), invalid types ✅ mcp_v2.py:54-71 — strict json.loads, decoded value asserted to be dict, whitespace-only string short-circuits to None, raises ValueError with "JSON" in message on failure
2 search_v2 / find_v2 / neighbors_v2 accept filter: ... | str ✅ all 3 widened, all 3 route through _coerce_filter before NodeFilter.model_validate
3 find_v2 empty-filter mapping: _coerce_filter("") → None → {} (because find requires a filter) ✅ mcp_v2.py:399-401 — if raw_filter is None: raw_filter = {} after coercion
4 FastMCP-level Field types widened to dict[str, Any] | str | None (or ... | str for required find) — keeps NodeFilter out of the wire schema ✅ verified the actual emitted JSON Schema: search.filter is {"anyOf": [{"type":"object"}, {"type":"string"}, {"type":"null"}]}; find.filter is {"anyOf": [{"type":"object"}, {"type":"string"}]} (no null because required); neighbors.filter matches search. No $ref to NodeFilter — that's the structural fix that should reduce Qwen's double-encoding tendency on the optional cases
5 _INSTRUCTIONS mentions stringified filters ✅ server.py:23 ("NodeFilter filter arguments may be passed as JSON-encoded strings.")
6 Tests ✅ 5 tests, not the 2 promised — bonus coverage
7 README tool table updated ✅ filter: NodeFilter | str | None in all 3 rows + new bullet below the table
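
The wire schemas quoted in row 4 can be read as a simple acceptance rule. The sketch below copies the two anyOf shapes from that row and checks values against them with a hand-rolled helper. The accepts function is hypothetical and only understands the three primitive types shown here, so it is not a general JSON Schema validator.

```python
from typing import Any, Dict

# Shapes as reported in row 4; find has no null branch because its filter is required.
SEARCH_FILTER_SCHEMA = {"anyOf": [{"type": "object"}, {"type": "string"}, {"type": "null"}]}
FIND_FILTER_SCHEMA = {"anyOf": [{"type": "object"}, {"type": "string"}]}

# Python types corresponding to the JSON types used above.
_JSON_TYPES = {"object": dict, "string": str, "null": type(None)}


def accepts(schema: Dict[str, Any], value: Any) -> bool:
    """Return True if value matches at least one primitive branch of the anyOf."""
    return any(
        isinstance(value, _JSON_TYPES[branch["type"]]) for branch in schema["anyOf"]
    )


assert accepts(SEARCH_FILTER_SCHEMA, {"exclude_roles": ["CONTROLLER"]})
assert accepts(SEARCH_FILTER_SCHEMA, '{"exclude_roles": ["CONTROLLER"]}')
assert accepts(SEARCH_FILTER_SCHEMA, None)
assert not accepts(FIND_FILTER_SCHEMA, None)  # find rejects a missing filter
```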

Tests added (tests/test_mcp_v2.py:308-353)

Test | Asserts
test_search_filter_accepts_json_string | search_v2(filter=dict) and search_v2(filter='{"...":"..."}') produce identical .results
test_search_filter_empty_string_treated_as_none | filter="" and filter=" " both equivalent to filter=None (whitespace-only)
test_find_filter_accepts_json_string | dict-vs-string parity for find_v2
test_neighbors_filter_accepts_json_string | dict-vs-string parity for neighbors_v2
test_filter_invalid_json_returns_failure | malformed {not json → success=False, "JSON" in message

This is the right test matrix. Both arms of the truth table for each tool, plus the malformed-input path, plus the empty-string edge case.

Notes that earned my trust

  • Schema flattening done right. The mcp_v2.search_v2 Python signature still has NodeFilter | dict | str | None (so internal callers retain type help), but the FastMCP-exposed Field type at server.py is just dict | str | None. That's the clean separation: rich types internally, structurally minimal types on the wire. The model-facing schema dropped from 3-branch with $ref to 3-branch with primitives only — and crucially, no nested $ref, which was the actual structural cue that triggered Qwen's stringification heuristic.
  • _coerce_filter raises ValueError, not ValidationError. That's the right choice: the existing tool bodies have except Exception as exc: return ...Output(success=False, message=str(exc)) for non-validation errors. Malformed JSON should be a "user error" returning success=False with a helpful message, not a transport-layer rejection. The test_filter_invalid_json_returns_failure test confirms this contract.
  • Whitespace-only string → None (mcp_v2.py:60-62). Models occasionally emit " " or "" when they want "no filter" but feel obligated to send something for an anyOf parameter. Treating that as None is the gentle correct behaviour.
  • json.loads strictness preserved. No eval, no Python-literal parsing, no fallback to ast.literal_eval — the PR explicitly calls this out as a non-goal. That's correct: lax parsers create their own footguns (e.g. silently accepting '{...}' with single quotes when the model is one tier lower than Qwen).
  • find_v2 empty-filter promotion (mcp_v2.py:399-401). When _coerce_filter returns None (whitespace string), find_v2 promotes to {} because its contract is "filter is required." NodeFilter() with all-None fields is the "match anything" sentinel. Subtle but correct — would be wrong to raise here since the user did supply a filter argument, just an empty one.
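
The error contract and the find-specific promotion described in these notes can be sketched together. Everything below is a toy stand-in under stated assumptions: FindOutput, NODES, _coerce and find are hypothetical names, the filter only understands exclude_roles, and the real find_v2 validates through NodeFilter and queries Kuzu instead of a dict.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FindOutput:  # hypothetical stand-in for the real output model
    success: bool
    message: str = ""
    results: List[str] = field(default_factory=list)


NODES = {"OrderController": "CONTROLLER", "OrderService": "SERVICE"}  # toy graph


def _coerce(raw: Any) -> Optional[Dict[str, Any]]:
    if raw is None or isinstance(raw, dict):
        return raw
    if not isinstance(raw, str):
        raise ValueError(f"unsupported filter type: {type(raw).__name__}")
    if not raw.strip():
        return None  # whitespace-only string means "no filter"
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"filter is not valid JSON: {exc}") from exc
    if not isinstance(decoded, dict):
        raise ValueError("filter must decode to a JSON object")
    return decoded


def find(filter: Any) -> FindOutput:
    try:
        raw = _coerce(filter)
        if raw is None:
            raw = {}  # find requires a filter; an empty one matches anything
        excluded = set(raw.get("exclude_roles", []))
        names = sorted(n for n, role in NODES.items() if role not in excluded)
        return FindOutput(success=True, results=names)
    except Exception as exc:
        # Malformed input is reported as a user error, not raised to the transport.
        return FindOutput(success=False, message=str(exc))
```

With this shape, find("{not json") returns a FindOutput with success=False and a message containing "JSON", while find("  ") behaves like find({}).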

Observations (non-blocking, none are merge-blockers)

  1. _INSTRUCTIONS doesn't tell the model to prefer objects over strings (server.py:23). The current copy reads "NodeFilter filter arguments may be passed as JSON-encoded strings." — that's a permissive statement, but a weak model reading it may interpret it as a recommendation. Consider sharpening to: "NodeFilter filter is a JSON object (preferred); a JSON-encoded string is also accepted as a fallback." Tiny prompt-engineering nudge to keep Claude/GPT on the fast path while keeping the Qwen escape hatch open.

  2. Same nudge applies to the per-Field descriptions (server.py:296-300, 318-322, 362-366). They currently say "...; a JSON-encoded string is also accepted" — append "(prefer the object form)" or similar. Defer to a future copy pass.

  3. No test for filter as a JSON-encoded null (i.e. filter="null"). After json.loads("null") → None, _coerce_filter would hit the not isinstance(decoded, dict) branch and raise ValueError(filter must decode to a JSON object, got NoneType). The double-encoded form filter='"null"' decodes to the string "null" rather than None and is rejected by the same branch. That's defensible behaviour, but a 3-line test asserting it would lock the contract. (Alternative: special-case decoded is None to return None, which is also defensible. Either way, document it.)

  4. find_v2 empty-filter promotion is implicit (mcp_v2.py:399-401). The PR description doesn't mention this contract change ("find accepts whitespace string as {}"). Worth a one-line bullet in the README's new sentence — something like "For find, an empty/whitespace string is equivalent to filter={}." Otherwise users reading the README may not know the special case exists.

  5. _coerce_filter returns a union type that callers re-narrow (NodeFilter | dict | None). Each caller then runs if not isinstance(raw_filter, NodeFilter): NodeFilter.model_validate(raw_filter). Fine, but a slightly cleaner shape would be to have _coerce_filter return NodeFilter | None directly (i.e. always do the model_validate inside the helper). That eliminates the duplicate guard at each callsite. Refactor in a future cleanup pass — non-blocking.

  6. README filter: NodeFilter | str | None is accurate but the NodeFilter reference is documentation-shorthand — the actual wire type at the FastMCP boundary is dict | str | None. Most readers will understand, but a careful pedant could read NodeFilter as "you must construct a Pydantic model client-side". A footnote (NodeFilter = JSON object matching the NodeFilter schema) would be airtight. Trivial.

  7. Tests use monkeypatch.setattr("mcp_v2.run_search", ...) for search cases but rely on real kuzu_graph for find/neighbors. That's the right asymmetry (search needs LanceDB which isn't available in test env; find/neighbors only need Kuzu which is fixture-loaded). Worth noting because the malformed-JSON test (test_filter_invalid_json_returns_failure) could in principle target find or neighbors and exercise a non-mocked path — the monkeypatch isn't strictly needed there since the failure happens before any backend call. Not worth changing now.
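
Observations 3 and 5 can be explored together in a small sketch. It assumes a simplified NodeFilter dataclass with a single exclude_roles field in place of the real Pydantic model, and a hypothetical coerce_to_node_filter helper that validates internally so callers receive NodeFilter | None. The error wording is illustrative, not the shipped message.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class NodeFilter:  # stand-in; the real NodeFilter is a Pydantic model
    exclude_roles: Optional[List[str]] = None


def coerce_to_node_filter(raw: Any) -> Optional[NodeFilter]:
    """Variant from observation 5: validate inside the helper itself."""
    if raw is None or isinstance(raw, NodeFilter):
        return raw
    if isinstance(raw, str):
        if not raw.strip():
            return None  # whitespace-only string means "no filter"
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"filter is not valid JSON: {exc}") from exc
    if not isinstance(raw, dict):
        raise ValueError(f"filter must be a JSON object, got {type(raw).__name__}")
    # The real code would call NodeFilter.model_validate(raw) at this point.
    return NodeFilter(**raw)


def rejects(value: Any) -> str:
    """Return the error message if value is rejected, else an empty string."""
    try:
        coerce_to_node_filter(value)
    except ValueError as exc:
        return str(exc)
    return ""


# Observation 3: JSON null and a double-encoded "null" are both rejected.
assert "NoneType" in rejects("null")
assert "str" in rejects('"null"')
```

If the alternative from observation 3 were preferred, the only change in this sketch would be returning None when the decoded value is None instead of raising.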

Plan deltas needed

None — this PR has no linked plan. The PR-description "Definition of Done" items map cleanly to what shipped.

Bonus catches

  • The structural fix (dropping NodeFilter from the wire-level Field type) was implemented correctly even though it wasn't explicit in the PR description. This is the load-bearing change for actually fixing the Qwen behaviour — without it, the anyOf + $ref + null shape that triggers the stringification heuristic would still be present even with the runtime _coerce_filter workaround. Both layers needed; both shipped.
  • Test coverage went 2 → 5 (3 dict-vs-string parity + 1 empty-string + 1 malformed). The two extras (empty_string_treated_as_none and invalid_json_returns_failure) are exactly the right edge cases.
  • Expected suite count: 5 new tests, plus 1 from PR #54 "Sync MCP v2 docs, neighbors edge_types validation, venv cursor rule" (test_neighbors_empty_edge_types_rejected), with the PR-V2-4 PTY failure still red. That gives master 323 passed, 7 skipped, 1 failed → branch 328 passed, 7 skipped, 1 failed. The PR description claims 330 passed, 7 skipped from the author's environment, which is close to my expected math; the +2 difference is likely a LANCEDB_MCP_RUN_HEAVY=1 toggle on the author's box that unskips 2 of the 7. Either way, the math sanity-checks.

Ready to merge. After this lands, validate the original Qwen Code workflow that motivated the fix — re-run the search invocation that originally failed and confirm it now works whether Qwen sends filter as a string or as an object. If Qwen reverts to sending objects (because the schema simplification removed the structural cue that pushed it to strings), that's the strongest possible confirmation that #1 of my earlier recommendation (flatten the schema) is doing the heavy lifting and _coerce_filter is just the safety net.

Suggested follow-up backlog (still PR-E1 candidates, none blocking):

@HumanBean17 HumanBean17 merged commit bffefb9 into master May 7, 2026
@HumanBean17 HumanBean17 deleted the cursor/mcp-filter-json-string branch May 10, 2026 21:17