propose: hints field as machine-readable road signs on MCP V2 outputs#120
Conversation
Pausing pending #117 and #118Dmitriy spotted that the hint catalog references Grilling that single observation surfaced that the hint catalog has rows whose text or condition depends on outcomes still open in #117 and #118, not just one bad row:
Two additional risks the current draft doesn't evaluate:
DecisionPause this PR pending #117 and #118 lock. Sequencing:
Estimated reshape cost post-lock: ~4 catalog rows + priority order + one section. Doing it now means re-doing it twice. Not closing the PR — branch Cross-refs
|
|
Re-grilled in the post-#117 / post- What aged well
Items to address(1) The "resolves #118" claim is no longer accurateOpen-links bullet says "locking hints here mostly resolves #118". After re-grilling #118 in the post- Fix: Replace the bullet with a precise cross-link: strings cover the documentation-grade consumer model from #118 option B; the mechanical/typed model is a separate decision that may follow once we see evidence of agents doing programmatic next-call construction. Pair with item (8) below. (2) The §3.3 catalog has a stale row that bypasses
|
a30984b to
3ea64e3
Compare
|
All 8 items applied + 3 drifts caught during the consistency pass. Force-pushed as 8 items — what changed
3 drifts caught during consistency pass
State after revision
Ready for another pass. |
HumanBean17
left a comment
There was a problem hiding this comment.
Review (propose/HINTS-ROAD-SIGNS-PROPOSE.md)
Overall the direction is strong: the §1 “road sign” frame, trigger-vs-emission (§2.9), output-level hints, caps, static templates, and find → resolve alignment fit the strict frame and existing MCP V2 docs. A few catalog/contract gaps should be fixed in the propose (or explicitly scoped as ontology/API work) so implementation does not ship invalid neighbors calls.
Blocking / high priority
1. OVERRIDES in hint emissions vs current EdgeType
Templates that say neighbors(..., edge_types=['OVERRIDES']) conflict with mcp_v2.py today: OVERRIDES is documented alongside rollup keys as not a valid neighbors(edge_types=…) literal, and EdgeType does not list OVERRIDES. The “overriders” / OVERRIDDEN_BY.* rows need either (a) a multi-hop emission using only real relationship types that match how the rollup is defined, or (b) an explicit proposal to extend EdgeType + neighbors for OVERRIDES (ontology bump / implementation scope), not a hints-only add.
2. find “page full” and observable state
FindOutput currently has no echoed limit, so len(results) >= limit compares against request parameters that are not part of pure “payload-only” hint generation unless you extend the contract. Consider: thread (output, request_kwargs) into hint generation, or echo limit/offset on FindOutput, and align §3.2 wording with that.
3. Search score < 0.5
Hybrid/RRF scores may not be calibrated so 0.5 means “low confidence.” Either tie the threshold to documented ranking behavior, make it configurable, or use a structural signal; otherwise the hint risks being arbitrary across corpora/settings.
Medium / clarity4. Kind gates for DECLARES.* rollups
The table already separates type vs method describe; worth requiring tests (or explicit pseudo-code) so method rows never fire type-only rollup triggers incorrectly.
5. §2.8 vs cap
“No coalescing” may duplicate the same next-call shape and burn the 5-hint budget; one sentence on whether duplicate normalized call-shapes dedupe under the cap would help implementers.
6. “Backwards-compatible”
Minor: repo policy allows breaking changes; “additive for clients that ignore new fields” may be enough for a design doc.
Answers to your review goals
-
§1 frame — Good anchor; does not block a future structured next-step surface.
-
§3.3 completeness — Fine for v1 once override-related emissions are fixed or scoped as API/ontology work. Optional later:
HTTP_CALLS/ASYNC_CALLS,IMPLEMENTS/EXTENDSfor types. -
§7.12 priority — Plausible; priority only matters once every emitted hint uses valid tool/edge literals.
-
§5 out-of-scope — Prefetch and structured records can stay out; request context for pagination hints is the main tweak worth allowing in-scope without expanding to Shape 2/3.
|
Real catches on 1 and 2 — both expose contract gaps the consistency pass missed because it only checked propose-internal cross-references, not propose ↔ Blocking1. Confirmed: This violates §2.9 (the principle this revision just added: triggers vs emissions, atomic EdgeType only in emissions). Two clean options:
Recommendation: (b). The Will add a §6 dependency note + a new locked decision: "v1 hints depend on 2. §2 principle says hints are pure-payload-server-side; §3.3 row says
Recommendation: (a). The contract should make “you got a full page” a property of the output, not a synthesis between request and output. It also makes the same hint computable by a client agent that wants to verify, which is in spirit with the documentation-grade-not-programmatic-dispatch lock. Will add: 3. Search Fair point that 0.5 is not a calibrated threshold across RRF / hybrid / pure-cosine modes. Three options:
Recommendation: drop the threshold from v1 entirely and replace with a structural signal: "all returned hits have score within 10% of each other AND Will lock: "Low-confidence search hint is structural (score spread + page state), not threshold-on-score. v1 emits no hint when uncertain rather than emitting a miscalibrated one." Medium4. Kind gates for The §3.3 table already says "describe (type Symbol)" vs "describe (method Symbol)", but the propose doesn't pin this to a test. Will add to §6: "Kind gate: method-Symbol describe with type-only rollup keys present → no hint emitted (impossible state, but test as a regression bumper)." 5. §2.8 vs cap — agree, dedupe normalized call-shape. This is a real semantic gap. Two rows can trigger the same emission shape (e.g., a class with Will lock as a new decision. 6. "Backwards-compatible" wording — disagree (minor). Repo policy does allow breaking changes, but the hints field on output types isn't a breaking change in any meaningful sense — adding an optional field to a Pydantic output that defaults to If you want the wording trimmed for tone, happy to soften to "additive: existing callers see no behavioral change." But I don't want to delete the compatibility note entirely. Review goals
What I'll change in the next push
That's 5 changes + 1 keep. Confirm |
3ea64e3 to
1e55cd5
Compare
|
Applied all 5 changes + the keep. Force-pushed as What changed1.
2.
3. Structural low-confidence search signal.
4. Kind-gate test scenario.
5. Dedupe-by-rendered-string, §2.8 rewritten.
6 (keep). "Additive for clients that ignore new fields" wording preserved.
Consistency pass (post-revision)
Ready for another pass when you have time. |
HumanBean17
left a comment
There was a problem hiding this comment.
Second review (updated propose + replies)
The revision fixes most of the first-pass contract gaps: the PR-A/PR-B split, FindOutput pagination echo + pure-payload hint generation, dedupe-then-cap (§2.8 / §7.20), relative score spread for search low-confidence (§7.19), and the §6 test matrix (kind-gate, cap, dedupe, char-cap) all read implementable.
A few doc issues remain—worth fixing before coding so implementers are not forced to guess.
1. Search hint vs “pagination echo only on find” (contradiction)
§5 explicitly defers pagination echo for search (and neighbors) to future proposes; §7.19, §3.3 / Appendix A, §6 search scenario, and UC10 all use len(results) == limit on SearchOutput. Today SearchOutput does not echo limit, and the propose does not add it.
Pick one arm and align §5 / §7.19 / catalog / tests: e.g. echo limit/offset on SearchOutput in PR-B (and narrow the §5 carve-out), or drop limit from the trigger (e.g. score-band + minimum len(results)), or (less ideal given §5) allow request-context for this single trigger.
2. OVERRIDDEN_BY triggers use the wrong edge_summary half
In current rollups (override_axis_rollup_for), composed override keys are emitted as {"in": 0, "out": n}. §3.3 rows that fire on edge_summary["OVERRIDDEN_BY"].in > 0 and ...OVERRIDDEN_BY.DECLARES_CLIENT...in > 0 would never fire. These should use .out > 0 (and UC2 / Appendix A should match).
3. PR-A scope: materialize edges, not only EdgeType
OVERRIDES / override-axis behavior is virtual Cypher today; there is no stored [:OVERRIDES] rel to traverse. PR-A should spell out graph builder + schema work (materialize edges, or an explicitly scoped neighbors special-case—different design). The §6 equivalence test (neighbors vs rollup prediction) is the right acceptance criterion once that storage story is stated.
4. §5 “no cross-tool hints” vs §7.16
The §5 table claims no v1 cross-tool hints, but find → resolve is cross-tool and is locked in §7.16. Narrow that §5 row to “no additional cross-tool templates beyond the locked find empty → resolve row.”
5. Minor: FindOutput errors and the route/client template
- If
limit/offsetare required fields with no default, definesuccess=Falsebehavior (defaults or optionals) so builders stay simple. - The route/client row uses
'EXPOSES' or 'DECLARES_CLIENT'as human-readable text; add a line that the real catalog entries are concrete emissions after substituting{inverse_edge}(two templates or one parameterized string).
Overall: ship the direction; resolve search limit vs §5, fix .in → .out, and expand PR-A to the full graph story so PR-B hints are not blocked by missing rels.
1e55cd5 to
b69dbd8
Compare
|
All 5 caught — agree on every one. Verified against source:
Force-pushed as What changed1. Search hint vs §5 carve-out — echo
2.
3. PR-A scope expanded to full graph work — schema + builder + traversal.
4. §5 row "no cross-tool hints" reconciled with §7.16.
5. Minor — error-path
Drift caught during consistency pass
Consistency pass post-revision
Ready for another pass. |
HumanBean17
left a comment
There was a problem hiding this comment.
Third review (post third revision)
Read the full updated propose/HINTS-ROAD-SIGNS-PROPOSE.md on propose/hints-road-signs plus the thread reply. The third pass closes the earlier contradictions; the doc is largely implementation-ready.
Strengths in this revision
- Search vs §5:
SearchOutput.limit/offsetecho, §7.18 / §7.19 alignment, and thelimit is None→ no hint scenario remove the internal inconsistency. - Override-axis triggers:
.out > 0matchesoverride_axis_rollup_for({"in": 0, "out": n}). - PR-A: Stored
OVERRIDES, schema, traversal, ontology bump, and explicit rejection of virtualneighborswithout a new propose. - Route vs client: Two concrete rows + UC6a/UC6b fix the valid single-shot call-shape issue.
- Cross-tool §5 row: Narrowed to “no additional cross-tool templates” — consistent with §7.16.
Remaining gaps / nits
1. PR-A builder: override_axis_rollup_for has two query shapes — §6 mostly narrates one
override_axis_rollup_for runs down (supertype method → implementing methods → OVERRIDDEN_BY) and up (concrete method → parent declaration methods → OVERRIDES). §6 describes (mover)-[:OVERRIDES]->(m) for m on t, which matches the down pattern when m is the supertype method.
The up query adds parent decl_m ids for a concrete m. For stored edges to satisfy the §6 equivalence story (“rollup vs neighbors in both directions”), materialization must cover both virtual patterns — or one explicit unified rule (e.g. whenever method A overrides method B in the rollup sense, emit (A)-[:OVERRIDES]->(B) so neighbors(B,'in',['OVERRIDES']) and neighbors(A,'out',['OVERRIDES']) line up with the right id sets).
As written, a reader might implement only the interface/t + impl loop and miss the concrete-m + parent-decl_m half.
2. Builder module placeholder
§6 cites graph_enrich.py (or wherever). In this repo shape, rel creation usually lives under build_ast_graph.py (with graph_enrich.py for enrichment). Pointing PR-A implementers at the actual graph write pass avoids starting in the wrong module.
3. §5 “Request-context plumbing” row
It still says “Once FindOutput echoes …” while §7.18 now covers both FindOutput and SearchOutput. One sentence update avoids confusion.
4. Appendix B consistency
An older changelog bullet still says the UC count is 14 while the doc now uses 15 UCs (UC6 split). Worth de-staling that historical line.
5. Optional: post-filtered search and “full page”
search_v2 can return fewer results than the requested limit after post-processing filters. len(results) == limit is fine if defined as “full page of returned hits after filters.” A single clarifying sentence in §7.19 or §6 removes ambiguity for implementers.
Verdict
Good to land as the design authority for PR-A/PR-B. The substantive follow-up is making PR-A’s materialization story explicitly cover both halves of override_axis_rollup_for (or the unified directed-edge rule), so equivalence tests and future hints do not half-implement the virtual rollup.
b69dbd8 to
84fc2a0
Compare
|
Items 1, 2, 3, 5 applied. Pushing back on item 4 (it's a historical changelog entry — see below). Force-pushed as Applied1. PR-A builder — unified directed-edge rule covering both rollup arms. Real catch. The old §6 wording ( Rewrote §6 PR-A builder section as a single rule:
Added builder pseudo-code: walk every method Equivalence test scenario rewritten to explicitly exercise both halves: (down) 2. Builder module pinned to Verified by grep — §6 schema and builder bullets both now pin to 3. §5 "Request-context plumbing" row updated to mention both Stale text from before §7.18 expanded to cover both outputs. One-sentence fix. 5. §7.19 page-full clarification for Implementer-relevant clarification. Pushed back on4. Appendix B §B5 "UC count is now 14" — keeping the historical bullet. The Appendix B changelog records what was true at each revision. §B5 was added in the first re-grilling pass when UC15 was moved to a test scenario, dropping the count from 15 to 14. UC6 splitting into UC6a + UC6b (which brought the count back up to 15) happened in the third pass and is recorded in §B23. Editing §B5 to retroactively read "UC count is 15" would muddy traceability — a reader following the timeline would see "UC15 dropped, count is now 14" → §B23 "UC6 split, count is 15", which is the actual history. Overwriting §B5 to claim "count is 15" makes §B23's contribution invisible. The doc's current state (the parts a reviewer evaluates the design from — TL;DR, §3.3, §4, §6, §7) all consistently say 15. Only the layered changelog records the in-between counts, which is what it's for. If you'd prefer a different traceability convention here (e.g., changelog entries always reflect final state and we lose the pass-by-pass diff), happy to switch — but I'd want that as a separate decision rather than retconning this one bullet. Consistency pass
Ready for landing if you're satisfied with the push-back on 4, or happy to revisit if you'd rather have a flat (current-state-only) changelog convention going forward. |
Draft propose for review. Tight scope: add
hints: list[str]to all four MCP V2 output models, with a strict catalog of road-sign-shaped templates generated server-side from observable output state.Frame (§1): a hint is a road sign attached to a tool output — it tells the agent the next reachable call, not what that call means or why. ≤120 chars per hint, ≤5 hints per output, no prose.
Sibling to issues #117 (filter contract) and #118 (rollup decomposition). Hints can ship under any frame decision in those issues; they reduce the cost of getting the frames wrong by making the next call machine-readable instead of inference-required.
Key locked decisions:
list[str], not structured records (Shape 2 deferred to a future propose)EdgeTypeliterals only; never dot-keys (aligned with PR propose: synthetic(via members)rollup keys indescribe.edge_summary(clients + routes) #89 decision plan: Tier 1B (B2b + B6) plan + per-PR Cursor prompts #11)Migration: 1 PR. Additive, backwards-compatible.
Review goals:
DECLARES.*>OVERRIDDEN_BY.*> leaf > meta)