feat: tolerate JSON-encoded NodeFilter in MCP search/find/neighbors#55
Conversation
Co-authored-by: Cursor <cursoragent@cursor.com>
Review: PR #55 — tolerate JSON-encoded NodeFilter in MCP search/find/neighborsVerdict: Approved ✅ Tightly scoped weak-model tolerance fix. Scope discipline
What landed (verified)
Tests added (
|
| Test | Asserts |
|---|---|
test_search_filter_accepts_json_string |
search_v2(filter=dict) and search_v2(filter='{"...":"..."}') produce identical .results |
test_search_filter_empty_string_treated_as_none |
filter="" and filter=" " both equivalent to filter=None (whitespace-only) |
test_find_filter_accepts_json_string |
dict-vs-string parity for find_v2 |
test_neighbors_filter_accepts_json_string |
dict-vs-string parity for neighbors_v2 |
test_filter_invalid_json_returns_failure |
malformed {not json → success=False, "JSON" in message |
This is the right test matrix. Both arms of the truth table for each tool, plus the malformed-input path, plus the empty-string edge case.
Notes that earned my trust
- Schema flattening done right. The
mcp_v2.search_v2Python signature still hasNodeFilter | dict | str | None(so internal callers retain type help), but the FastMCP-exposedFieldtype atserver.pyis justdict | str | None. That's the clean separation: rich types internally, structurally minimal types on the wire. The model-facing schema dropped from 3-branch with$refto 3-branch with primitives only — and crucially, no nested$ref, which was the actual structural cue that triggered Qwen's stringification heuristic. _coerce_filterraisesValueError, notValidationError. That's the right choice: the existing tool bodies haveexcept Exception as exc: return ...Output(success=False, message=str(exc))for non-validation errors. Malformed JSON should be a "user error" returningsuccess=Falsewith a helpful message, not a transport-layer rejection. Thetest_filter_invalid_json_returns_failuretest confirms this contract.- Whitespace-only string → None (mcp_v2.py:60-62). Models occasionally emit
" "or""when they want "no filter" but feel obligated to send something for ananyOfparameter. Treating that as None is the gentle correct behaviour. json.loadsstrictness preserved. Noeval, no Python-literal parsing, no fallback toast.literal_eval— the PR explicitly calls this out as a non-goal. That's correct: lax parsers create their own footguns (e.g. silently accepting'{...}'with single quotes when the model is one tier lower than Qwen).find_v2empty-filter promotion (mcp_v2.py:399-401). When_coerce_filterreturnsNone(whitespace string),find_v2promotes to{}because its contract is "filter is required."NodeFilter()with all-None fields is the "match anything" sentinel. Subtle but correct — would be wrong to raise here since the user did supply a filter argument, just an empty one.
Observations (non-blocking, none are merge-blockers)
-
_INSTRUCTIONSdoesn't tell the model to prefer objects over strings (server.py:23). The current copy reads"NodeFilter filter arguments may be passed as JSON-encoded strings."— that's a permissive statement, but a weak model reading it may interpret it as a recommendation. Consider sharpening to:"NodeFilter filter is a JSON object (preferred); a JSON-encoded string is also accepted as a fallback."Tiny prompt-engineering nudge to keep Claude/GPT on the fast path while keeping the Qwen escape hatch open. -
Same nudge applies to the per-Field descriptions (server.py:296-300, 318-322, 362-366). They currently say
"...; a JSON-encoded string is also accepted"— append"(prefer the object form)"or similar. Defer to a future copy pass. -
No test for
filteras a JSON-encodednull(i.e.filter='"null"'orfilter="null"). Afterjson.loads("null")→None, the_coerce_filterwould hit thenot isinstance(decoded, dict)branch and raiseValueError(filter must decode to a JSON object, got NoneType). That's defensible behaviour, but a 3-line test asserting it would lock the contract. (Alternative: special-casedecoded is Noneto returnNone— also defensible. Either way, document it.) -
find_v2empty-filter promotion is implicit (mcp_v2.py:399-401). The PR description doesn't mention this contract change ("findaccepts whitespace string as{}"). Worth a one-line bullet in the README's new sentence — something like"For find, an empty/whitespace string is equivalent to filter={}."Otherwise users reading the README may not know the special case exists. -
_coerce_filterreturns a union type that callers re-narrow (NodeFilter | dict | None). Each caller then runsif not isinstance(raw_filter, NodeFilter): NodeFilter.model_validate(raw_filter). Fine, but a slightly cleaner shape would be to have_coerce_filterreturnNodeFilter | Nonedirectly (i.e. always do themodel_validateinside the helper). That eliminates the duplicate guard at each callsite. Refactor in a future cleanup pass — non-blocking. -
README
filter: NodeFilter | str | Noneis accurate but theNodeFilterreference is documentation-shorthand — the actual wire type at the FastMCP boundary isdict | str | None. Most readers will understand, but a careful pedant could readNodeFilteras "you must construct a Pydantic model client-side". A footnote(NodeFilter = JSON object matching the NodeFilter schema)would be airtight. Trivial. -
Tests use
monkeypatch.setattr("mcp_v2.run_search", ...)forsearchcases but rely on realkuzu_graphforfind/neighbors. That's the right asymmetry (search needs LanceDB which isn't available in test env; find/neighbors only need Kuzu which is fixture-loaded). Worth noting because the malformed-JSON test (test_filter_invalid_json_returns_failure) could in principle targetfindorneighborsand exercise a non-mocked path — the monkeypatch isn't strictly needed there since the failure happens before any backend call. Not worth changing now.
Plan deltas needed
None — this PR has no linked plan. The PR-description "Definition of Done" items map cleanly to what shipped.
Bonus catches
- The structural fix (dropping
NodeFilterfrom the wire-levelFieldtype) was implemented correctly even though it wasn't explicit in the PR description. This is the load-bearing change for actually fixing the Qwen behaviour — without it, theanyOf + $ref + nullshape that triggers the stringification heuristic would still be present even with the runtime_coerce_filterworkaround. Both layers needed; both shipped. - Test coverage went 2 → 5 (3 dict-vs-string parity + 1 empty-string + 1 malformed). The two extras (
empty_string_treated_as_noneandinvalid_json_returns_failure) are exactly the right edge cases. - 5 new tests + 1 from PR Sync MCP v2 docs, neighbors edge_types validation, venv cursor rule #54 (
test_neighbors_empty_edge_types_rejected) + 1 PR-V2-4 PTY failure still red = expected suite count: master323 passed, 7 skipped, 1 failed→ branch328 passed, 7 skipped, 1 failed. PR description claims330 passed, 7 skippedfrom author env — close to my expected math; the +2 difference is likely aLANCEDB_MCP_RUN_HEAVY=1toggle on author's box that unskips 2 of the 7. Either way, math sanity-checks.
Ready to merge. After this lands, validate the original Qwen Code workflow that motivated the fix — re-run the search invocation that originally failed and confirm it now works whether Qwen sends filter as a string or as an object. If Qwen reverts to sending objects (because the schema simplification removed the structural cue that pushed it to strings), that's the strongest possible confirmation that #1 of my earlier recommendation (flatten the schema) is doing the heavy lifting and _coerce_filter is just the safety net.
Suggested follow-up backlog (still PR-E1 candidates, none blocking):
- Sharpen
_INSTRUCTIONSandField(description=...)to prefer objects (Notes AST by Opus #1, Tier 1 completion plan + proposals (B2a + B4 + B5) #2) - Document
findwhitespace-string →{}semantics in README (Notes Add per-PR Cursor task prompts for Tier 1 completion #4) - One test for
filter="null"(Notes Add Cursor rules and agent settings for CLI agents #3) - PTY EIO handling (carried from PR-V2-4)
LANCEDB_MCP_ALLOW_REFRESHhint vs CLI behaviour mismatch (carried from PR Sync MCP v2 docs, neighbors edge_types validation, venv cursor rule #54)Literal[...]types ondirection/kind/table(carried from PR Sync MCP v2 docs, neighbors edge_types validation, venv cursor rule #54)
Co-authored-by: Cursor <cursoragent@cursor.com>
Scope
Allows weak LLM / MCP clients to pass
filteras a JSON-encoded string (serializedNodeFilter) without failing FastMCP JSON-RPC validation (params/filter must be object). Standalone robustness change (no linked plan file).What Changed
_coerce_filter()inmcp_v2.py— strictjson.loads, decoded value must be a JSON object; whitespace-only string treated as no filter.search_v2/find_v2/neighbors_v2sofiltermay bestr; after coercion,find_v2maps empty filter to{}.filterparameter types onsearch/find/neighborsinserver.py, updated Field descriptions, noted string filters in_INSTRUCTIONS.tests/test_mcp_v2.py; updated MCP tool table and note inREADME.md.Semantics / Non-Goals
path_contains,edge_types,ids, etc.).Validation
Lint
ruff check .— clean on branch before push; not re-run during PR open.Tests
pytest tests— author confirms green; not re-run during PR open (last full run on branch: 330 passed, 7 skipped).Sentinel checks
git diff master...HEAD --name-only→ README.md, mcp_v2.py, server.py, tests/test_mcp_v2.py only.Manual evidence
filteras a string (e.g.{"exclude_roles":["CONTROLLER"]}) should pass transport validation; decoding andNodeFiltervalidation are covered by unit tests.Out of Scope Confirmed
Did not implement:
java_ontology.py, or fixture-only special cases in production code.user-rag) feature work beyond README tool docs.Definition of Done
masterwith agreed title and branch naming.