propose: hints field as machine-readable road signs on MCP V2 outputs by HumanBean17 · Pull Request #120 · HumanBean17/java-codebase-rag

HumanBean17 · 2026-05-14T11:06:05Z

Draft propose for review. Tight scope: add hints: list[str] to all four MCP V2 output models, with a strict catalog of road-sign-shaped templates generated server-side from observable output state.

Frame (§1): a hint is a road sign attached to a tool output — it tells the agent the next reachable call, not what that call means or why. ≤120 chars per hint, ≤5 hints per output, no prose.

Sibling to issues #117 (filter contract) and #118 (rollup decomposition). Hints can ship under any frame decision in those issues; they reduce the cost of getting the frames wrong by making the next call machine-readable instead of inference-required.

Key locked decisions:

Field is list[str], not structured records (Shape 2 deferred to a future propose)
Generation is pure, server-side, no graph access (Shape 3 prefetch explicitly out of scope)
Templates are static; never LLM-generated
Hints reference real EdgeType literals only; never dot-keys (aligned with PR #89 decision #11)
Hard caps enforced by unit test, not convention
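
A minimal sketch of how the two hard caps could be pinned by a test rather than by convention. The constant names and `check_hint_caps` are hypothetical and do not come from the propose; only the 120-character and 5-hint limits are taken from the frame above.

```python
MAX_HINT_CHARS = 120  # per-hint cap from the frame (constant name is hypothetical)
MAX_HINTS = 5         # per-output cap from the frame (constant name is hypothetical)


def check_hint_caps(hints: list[str]) -> list[str]:
    """Return a list of cap violations; an empty list means the output passes."""
    problems = []
    if len(hints) > MAX_HINTS:
        problems.append(f"{len(hints)} hints exceeds cap of {MAX_HINTS}")
    for index, hint in enumerate(hints):
        if len(hint) > MAX_HINT_CHARS:
            problems.append(f"hint {index} is {len(hint)} chars, cap is {MAX_HINT_CHARS}")
    return problems
```

A unit test would then assert `check_hint_caps(output.hints) == []` for every fixture output.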

Migration: 1 PR. Additive, backwards-compatible.

Review goals:

Is the §1 frame the right anchor, or does "road sign" rule out something we actually want?
Is the §3.3 catalog complete enough for v1, or are there obvious templates missing?
Is the §7 decision #12 priority order right? (DECLARES.* > OVERRIDDEN_BY.* > leaf > meta)
Anything in §5 ("deliberately does NOT do") that should be in-scope after all?

HumanBean17 · 2026-05-14T16:22:58Z

Pausing pending #117 and #118

Dmitriy spotted that the hint catalog references OVERRIDDEN_BY and OVERRIDES as if they were navigable edges. They are not — both are virtual rollup keys synthesized by override_axis_rollup_for() in kuzu_queries.py:643–696, and the EdgeType Literal in mcp_v2.py:21–31 does not accept them. The propose's own pre-existing docstring at mcp_v2.py:135–137 already warns about this exact pitfall.

Grilling that single observation surfaced that the hint catalog has rows whose text or condition depends on outcomes still open in #117 and #118, not just one bad row:

| Hint family | Depends on #117? | Depends on #118? | Issue |
|---|---|---|---|
| `OVERRIDDEN_BY*` family (catalog rows at L90–91, L222–226; UC2 at L117; priority order at L170; UC14/UC15) | no | yes (hard) | No reachable next call exists; hint can't honor the road-sign frame until #118 decides whether rollups become navigable. |
| `DECLARES.DECLARES_CLIENT` / `DECLARES.EXPOSES` rollup hints | no | yes (soft) | Text prescribes "two neighbors calls". If #118 lands as a dot-notation primitive or a decomposition tool, the canonical hint text changes. |
| `search` → low-confidence describe-fallback | yes (soft) | no | If #117 commits to a `resolve` tool, the hint points at `resolve(query)`, not `describe`. |
| `find` page-full → "page=2" | no | no | Stable. |
| `neighbors` empty result | no | no | Stable. |
| Basic `describe` hints (type Symbol, route, client) | no | no | Stable. |

Two additional risks the current draft doesn't evaluate:

The priority cap rule (decision #12, L170) ranks OVERRIDDEN_BY.* rollups second. If those hints get deferred or restructured, the priority order needs revision.
The generation-discipline rules (≤120 chars, ≤5 per output, static templates, never LLM, no dot-keys, pure generation) are the frame of the propose, independent of #117/#118. Those survive any catalog reshape and are the durable contribution of this doc.

Decision

Pause this PR pending #117 and #118 lock. Sequencing:

Lock #117 (filter-contract / strict frame, plus resolve tool design if frame commits to it).
Lock #118 (rollup decomposition, anchored to #117's frame).
Reshape this propose: keep the frame and discipline rules, revise the catalog rows whose text or condition is now determined, add an explicit "deferred families" section if any remain.

Estimated reshape cost post-lock: ~4 catalog rows + priority order + one section. Doing it now means re-doing it twice.

Not closing the PR — branch propose/hints-road-signs at a30984b stays as-is so the frame work is preserved.

Cross-refs

#117: filter-contract frame (next: draft propose-doc).
#118: rollup decomposition (depends on #117).
The OVERRIDDEN_BY bug catch and the two-layer (frame vs catalog) finding are captured here rather than fixed-up so the reasoning survives the reshape.

HumanBean17 · 2026-05-15T20:14:39Z

Re-grilled in the post-#117 / post-resolve world. Frame, cap discipline, list[str] shape, static-template decision, priority order, and no-LLM-in-hints rule all hold up cleanly. Eight items to address before locking.

What aged well

§1 frame ("road sign, not tutorial") survives the strict-frame world cleanly: it's the same loud-machine-readable-contract principle applied to output instead of input.
§7.2 (list[str] not list[dict]): right call for an LLM consumer; structured records can come later if evidence warrants.
§7.5 / §7.6 hard caps enforced by unit tests: same spine that made #117 / #122 land cleanly.
The dot-key-in-trigger-but-not-in-hint pattern in §3.3: elegant, post-#89 strict-frame discipline expressed perfectly. (See item 5 below: it should be promoted to a §2 principle.)

Items to address

(1) The "resolves #118" claim is no longer accurate

Open-links bullet says "locking hints here mostly resolves #118". After re-grilling #118 in the post-resolve world, the conclusion was the opposite: strings cover the documentation-grade consumer model from #118 option B, but the mechanical/typed model (rollup_paths) is a separate decision. The propose currently dodges the consumer-model question with "lossy by design" and "advisory" framing.

Fix: Replace the bullet with a precise cross-link: strings cover the documentation-grade consumer model from #118 option B; the mechanical/typed model is a separate decision that may follow once we see evidence of agents doing programmatic next-call construction. Pair with item (8) below.

(2) The §3.3 catalog has a stale row that bypasses `resolve`

Row: find(kind=client, filter={fqn_prefix:...}) empty → "try search(...) for fuzzy fallback". This is the pre-resolve fallback wording that PR-RESOLVE-2 removed from all four tool descriptions per decision §7.9 of the resolve propose. Letting it live on as a hint re-introduces what we just removed.

Fix: Replace with a hint pointing at resolve(identifier, hint_kind="client"). Update §3.3 row, UC7, and hints_for_find in Appendix A consistently. This also resolves item (8) below: it makes the propose ship one cross-tool hint at v1 (the resolve redirect) instead of claiming "no cross-tool hints" while sneaking one in.

(3) Row-4 hint contradicts §7.8 (paraphrased emission, not concrete call)

§7.8 locks: hints reference real EdgeType literals only, never dot-keys, never paraphrases. But row 4 emits "clients in overriders: walk OVERRIDDEN_BY then DECLARES_CLIENT (two neighbors calls)". The neighbors() argument is paraphrased away: the hint doesn't tell the agent what to call. Compare row 1, which emits a concrete two-call template.

Fix: Reshape row 4 to match row 1's shape, e.g. "clients in overriders: neighbors(['{rid}'],'in',['OVERRIDES']) then neighbors(overrider_ids,'out',['DECLARES_CLIENT'])". If that exceeds the 120-char cap, drop the row from v1 and let it return in a future amendment with a cleaner template.

(4) §7.14 carve-out is solving a problem #117 already solved

§7.14: "No hints for find when no filter was passed. find() with no filter is a sentinel call." In the strict-frame world #117 locked, find() without a filter is a contract error that fails loud. The carve-out is treating it as a soft edge case but it's a hard error now.

Fix: Drop §7.14. Renumber subsequent decisions.

(5) The "triggers vs emissions" principle is implicit but not stated

§3.3 quietly does the right thing: hints are triggered by dot-keys (read-only signal) but emit atomic EdgeType calls (consumable contract). That's the post-#89 strict-frame principle in action. But it's not a stated principle anywhere, so future template authors won't know to follow it.

Fix: Add §2.9 (or equivalent) principle: triggers are signals (they may reference dot-keys, rollup state, score thresholds, etc.); emissions are calls (atomic EdgeType literals and concrete arguments only). The two never share vocabulary.
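
One way to picture the trigger/emission split is a template record whose trigger may read a dot-key while its emission text is checked to contain none. This is a sketch only: `HintTemplate`, `emission_is_clean`, the example emission wording, and the choice to read the `out` half of the summary entry are all assumptions made for illustration, not the catalog's actual rows.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Matches dot-key rollup names such as DECLARES.DECLARES_CLIENT.
DOT_KEY = re.compile(r"\b[A-Z_]+\.[A-Z_]+\b")


@dataclass(frozen=True)
class HintTemplate:
    # Trigger: a read-only predicate over edge_summary; it may look at dot-keys.
    trigger: Callable[[dict], bool]
    # Emission: the string shown to the agent; atomic edge names only.
    emission: str


def emission_is_clean(template: HintTemplate) -> bool:
    """True when the emission text contains no dot-key vocabulary."""
    return DOT_KEY.search(template.emission) is None


# Illustrative row: the trigger reads a dot-key, the emission names atomic edges.
example = HintTemplate(
    trigger=lambda summary: summary.get("DECLARES.DECLARES_CLIENT", {}).get("out", 0) > 0,
    emission="clients: neighbors(['{id}'],'out',['DECLARES']) then "
             "neighbors(member_ids,'out',['DECLARES_CLIENT'])",
)
```

A catalog-wide test could then loop over every template and assert `emission_is_clean(template)`.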

(6) UC15 is hypothetical and weakens the re-walk

UC15: "Agent describes a class with 8 rollup signals (hypothetical max)". The propose-doc-author skill explicitly calls out the UC re-walk as the validation move, so every row should be realistic. UC15 is testing the cap, not the design.

Fix: Move the cap test into §6 named test scenarios. Either replace UC15 with a realistic case, or drop to 14 UCs and say so. (UC count needs to match between the row count and any "N realistic cases" prose.)

(7) Appendix A has plan-level detail

The skill says: keep Appendix A to the one thing the implementer copies verbatim; if there's no such thing, omit. The mcp_hints.py skeleton with hints_for_describe / hints_for_find / etc. function signatures crosses the propose↔plan boundary. The hint templates are the artifact; the function decomposition is plan work.

Fix: Trim Appendix A to the template strings (the §3.3 table is already that). Drop the function-level skeleton; the file-existence claim (mcp_hints.py) can live in §7.15 unchanged, but the function bodies belong in plans/PLAN-HINTS.md.

(8) Lock the consumer model as a decision

The propose says "lossy by design, advisory" in §2.4 / §7.10 but doesn't lock what the agent does with hints. After (1), the consumer model needs to be explicit so future readers don't relitigate it.

Fix: Add a §7 decision: Hints are documentation-grade. The agent reads them as part of its prompt context; the surface does not commit to programmatic dispatch in v1. If a future workflow needs mechanical consumption, that's a separate propose for a typed surface (e.g., the rollup_paths shape sketched in #118).

What doesn't need to change

§1 frame, §2 principles 1–8, §3.1 field shape, §3.2 generation contract.
§7 decisions 1–7 and 9–13, 15 (only 8 and 14 named above).
§5 "deliberately does NOT do": holds up; the cross-tool hint row becomes consistent after item (2) lands.
§8 risks table.

Suggested order

Apply items 1, 2, 4, 8 first (decision-level shifts), then 3, 5, 6, 7 (text edits). Force-push the revision per the iterate-and-amend pattern; rerun the consistency pass (decision count, UC count, cardinal-number alignment) before requesting re-review.

HumanBean17 · 2026-05-15T20:20:43Z

All 8 items applied + 3 drifts caught during the consistency pass. Force-pushed as 3ea64e3.

8 items — what changed

Dropped "resolves #118" claim. Now cross-link only: "#118 is a related, paused issue; this propose does not subsume it. Consumer-model question for #118 stays open after this lands."
Replaced stale find→search fallback in §3.2 and in every UC row that referenced it. Hints now redirect to resolve(identifier, hint_kind=…) per the #117 strict frame. No search() fallback survives anywhere in the doc.
Fixed row-4 paraphrased chain. "Walk OVERRIDDEN_BY then DECLARES_CLIENT" → concrete two-call: neighbors(node, edge=OVERRIDDEN_BY, direction=out) followed by neighbors(<impl>, edge=DECLARES_CLIENT, direction=out). Caught one more during consistency pass (UC2 row had the same paraphrase shape) — fixed.
Dropped §7.14 carve-out. "No hints when find has no filter" is dead weight: the #117 strict frame already loud-fails this case. Removed the section; renumbered downstream.
Added §2.9 principle: "Triggers are signals; emissions are calls; vocabularies never share." Dot-keys may trigger a hint but never appear in emissions. Atomic EdgeType only on the emission side.
Moved UC15 → §6 test scenario. Hypothetical cap test doesn't belong in the use-case re-walk; it's a unit-test scenario. UC count is now 14, not 15.
Trimmed Appendix A. Was a mcp_hints.py skeleton with function bodies — that's plan-level detail. Now a template catalog only: trigger → emission rows, no Python.
Locked consumer model: hints are documentation-grade, not programmatic-dispatch. Added explicit decision; UC table column renamed from "agent action" to "agent reads".

3 drifts caught during consistency pass

UC count claim line still said "15 representative use cases" after UC15 moved to §6. Updated to 14.
UC2 row still had paraphrased "walk DECLARES_CLIENT then CALLS" left over from an earlier draft — replaced with the concrete two-call form, matching the fix in item 3.
§7.6 rendered-string cap had an honest tension between cap-on-template and cap-on-rendered-string. Locked: cap is on the rendered string with placeholders substituted, enforced by a unit test using realistic placeholder values. Templates that can't render within 120 chars are dropped from v1.
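
The difference between capping the template and capping the rendered string can be sketched as follows. `render_within_cap` and the example template are hypothetical; the only facts taken from the thread are that the cap is 120 characters and that it applies after placeholder substitution.

```python
from typing import Optional

MAX_HINT_CHARS = 120  # cap from the propose; constant name is hypothetical


def render_within_cap(template: str, **placeholders: str) -> Optional[str]:
    """Render a template; return None when the rendered string exceeds the cap."""
    rendered = template.format(**placeholders)
    return rendered if len(rendered) <= MAX_HINT_CHARS else None


# A short template can still overflow once a long identifier is substituted.
template = "members: neighbors(['{id}'],'out',['DECLARES'])"
short = render_within_cap(template, id="com.example.Foo")
too_long = render_within_cap(template, id="x" * 100)
```

Here `short` is a usable hint while `too_long` is `None`, which is why the test needs realistic placeholder values rather than the bare template length.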

State after revision

9 design principles (was 8; added §2.9)
16 locked decisions (was 14; added consumer-model lock + cap-on-rendered)
14 UCs (was 15; UC15 → test scenario)
No search() fallback anywhere
No paraphrased chain language anywhere
No §7.14 carve-out
No "resolves #118" claim
Appendix A is a template catalog, not a code skeleton

Ready for another pass.

HumanBean17

Review (`propose/HINTS-ROAD-SIGNS-PROPOSE.md`)

Overall the direction is strong: the §1 “road sign” frame, trigger-vs-emission (§2.9), output-level hints, caps, static templates, and find → resolve alignment fit the strict frame and existing MCP V2 docs. A few catalog/contract gaps should be fixed in the propose (or explicitly scoped as ontology/API work) so implementation does not ship invalid neighbors calls.

Blocking / high priority

1. OVERRIDES in hint emissions vs current EdgeType

Templates that say neighbors(..., edge_types=['OVERRIDES']) conflict with mcp_v2.py today: OVERRIDES is documented alongside rollup keys as not a valid neighbors(edge_types=…) literal, and EdgeType does not list OVERRIDES. The “overriders” / OVERRIDDEN_BY.* rows need either (a) a multi-hop emission using only real relationship types that match how the rollup is defined, or (b) an explicit proposal to extend EdgeType + neighbors for OVERRIDES (ontology bump / implementation scope), not a hints-only add.

2. find “page full” and observable state

FindOutput currently has no echoed limit, so len(results) >= limit compares against request parameters that are not part of pure “payload-only” hint generation unless you extend the contract. Consider: thread (output, request_kwargs) into hint generation, or echo limit/offset on FindOutput, and align §3.2 wording with that.

3. Search score < 0.5

Hybrid/RRF scores may not be calibrated so 0.5 means “low confidence.” Either tie the threshold to documented ranking behavior, make it configurable, or use a structural signal; otherwise the hint risks being arbitrary across corpora/settings.

Medium / clarity

4. Kind gates for `DECLARES.*` rollups

The table already separates type vs method describe; worth requiring tests (or explicit pseudo-code) so method rows never fire type-only rollup triggers incorrectly.

5. §2.8 vs cap

“No coalescing” may duplicate the same next-call shape and burn the 5-hint budget; one sentence on whether duplicate normalized call-shapes dedupe under the cap would help implementers.

6. “Backwards-compatible”

Minor: repo policy allows breaking changes; “additive for clients that ignore new fields” may be enough for a design doc.

Answers to your review goals

§1 frame — Good anchor; does not block a future structured next-step surface.
§3.3 completeness — Fine for v1 once override-related emissions are fixed or scoped as API/ontology work. Optional later: HTTP_CALLS / ASYNC_CALLS, IMPLEMENTS/EXTENDS for types.
§7.12 priority — Plausible; priority only matters once every emitted hint uses valid tool/edge literals.
§5 out-of-scope — Prefetch and structured records can stay out; request context for pagination hints is the main tweak worth allowing in-scope without expanding to Shape 2/3.

HumanBean17 · 2026-05-15T20:26:33Z

Real catches on 1 and 2 — both expose contract gaps the consistency pass missed because it only checked propose-internal cross-references, not propose ↔ mcp_v2.py reality. Going through one by one.

Blocking

1. OVERRIDES in hint emissions vs current EdgeType — agree.

Confirmed: EdgeType literal in mcp_v2.py:21 is EXTENDS, IMPLEMENTS, INJECTS, DECLARES, DECLARES_CLIENT, CALLS, EXPOSES, HTTP_CALLS, ASYNC_CALLS. OVERRIDES and OVERRIDDEN_BY are documented as virtual rollup keys on edge_summary, explicitly not valid for neighbors(edge_types=…). The propose currently emits neighbors(...,'in',['OVERRIDES']) in §3.3 rows 1–2, UC2, UC14, and Appendix A — every one of those would be a strict-frame failure at call time.

This violates §2.9 (the principle this revision just added: triggers vs emissions, atomic EdgeType only in emissions). OVERRIDES is no more atomic-EdgeType than DECLARES.DECLARES_CLIENT is.
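
A small sketch of the check that would have caught this before review. The literal values are the ones listed in this comment; `invalid_edge_types` is a hypothetical helper, not a function in mcp_v2.py.

```python
from typing import Literal, get_args

# Literal values as listed above for mcp_v2.py:21.
EdgeType = Literal[
    "EXTENDS", "IMPLEMENTS", "INJECTS", "DECLARES", "DECLARES_CLIENT",
    "CALLS", "EXPOSES", "HTTP_CALLS", "ASYNC_CALLS",
]
VALID_EDGE_TYPES = frozenset(get_args(EdgeType))


def invalid_edge_types(edge_types: list[str]) -> list[str]:
    """Return the names that neighbors(edge_types=...) would reject."""
    return [name for name in edge_types if name not in VALID_EDGE_TYPES]
```

Running every emitted `edge_types` argument through a check like this ties the catalog to the real literal instead of to the propose's internal cross-references.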

Two clean options:

(a) Scope the override emissions out of v1. Drop the four override-axis template rows; keep the triggers as “signal only, no emission” notes, deferred to whenever the ontology bump lands. Loses ~3 UC value but keeps v1 honest.
(b) Pre-require an OVERRIDES ontology bump as a dependency. Land a tiny prior PR adding "OVERRIDES" to EdgeType and the corresponding kuzu query path, bump ontology_version, then this propose stays as-is.

Recommendation: (b). The OVERRIDDEN_BY.in > 0 → “list overriders” redirect is one of the highest-value hints in the catalog (the whole point of the rollup in describe), and OVERRIDES is the natural inverse of OVERRIDDEN_BY rather than an ad-hoc addition. Adding it elevates a rollup-only relation to a first-class edge, which the override-axis rollups already imply exists in the graph.

Will add a §6 dependency note + a new locked decision: "v1 hints depend on OVERRIDES being added to EdgeType; ordered before the hints implementation PR." If you prefer (a), say the word and I'll cut those rows instead.

2. find page-full and observable state — agree.

§2 principle says hints are pure-payload-server-side; §3.3 row says len(results) >= limit; FindOutput doesn't echo limit. That's a contract contradiction the consistency pass missed. Two repairs available:

(a) Echo limit (and probably offset) on FindOutput. Smallest delta to the contract; trivially observable; aligns with how other tools could grow the same affordance. Slight bloat on every find response.
(b) Pass (output, request_kwargs) into the hint generator. Keeps FindOutput unchanged but breaks the "hints are computed from the output object alone" mental model and forces every call site to thread two arguments.

Recommendation: (a). The contract should make “you got a full page” a property of the output, not a synthesis between request and output. It also makes the same hint computable by a client agent that wants to verify, which is in spirit with the documentation-grade-not-programmatic-dispatch lock.

Will add: FindOutput.limit: int and FindOutput.offset: int echoes, with the same fields proposed (separately) for any future tool that grows pagination. New locked decision: "Pagination state is part of the response payload, not the call context."
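
A rough sketch of option (a), with a dataclass standing in for the real Pydantic model so the example has no dependencies. The default values, the hint wording, and `page_full_hint` are assumptions for illustration; only the echoed `limit`/`offset` fields and the `len(results) >= limit` trigger come from the discussion.

```python
from dataclasses import dataclass, field


@dataclass
class FindOutput:
    # Stand-in for the real output model; only the fields needed here.
    results: list = field(default_factory=list)
    limit: int = 20   # echoed from the request (default value is hypothetical)
    offset: int = 0   # echoed from the request
    hints: list = field(default_factory=list)


def page_full_hint(output: FindOutput) -> list[str]:
    """Emit a next-page hint when the page is full, using only the payload."""
    if output.limit > 0 and len(output.results) >= output.limit:
        next_offset = output.offset + output.limit
        return [f"page full: repeat find with offset={next_offset}"]
    return []
```

Because the function takes only `output`, the same check can be repeated by a client that wants to verify the hint.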

3. Search score < 0.5 — partially agree, weaker fix than 1/2.

Fair point that 0.5 is not a calibrated threshold across RRF / hybrid / pure-cosine modes. Three options:

(a) Tie to ranking config. Use search's known scoring path (current default is RRF per search.py) and pick a per-mode threshold documented in the search docs, not in this propose.
(b) Make it configurable. Add a search_hint_low_confidence_threshold knob; default to whatever calibration says.
(c) Use a structural signal. E.g., len(results) < 3 AND no result has score > p90_baseline — drop the absolute threshold entirely.

Recommendation: drop the threshold from v1 entirely and replace with a structural signal: "all returned hits have score within 10% of each other AND len(results) == limit" → emits "results look weak — try find(role=…) or narrow query". That's calibration-free, observable from payload alone, and matches the documentation-grade contract.

Will lock: "Low-confidence search hint is structural (score spread + page state), not threshold-on-score. v1 emits no hint when uncertain rather than emitting a miscalibrated one."

Medium

4. Kind gates for DECLARES.* rollups — agree, add test scenarios.

The §3.3 table already says "describe (type Symbol)" vs "describe (method Symbol)", but the propose doesn't pin this to a test. Will add to §6: "Kind gate: method-Symbol describe with type-only rollup keys present → no hint emitted (impossible state, but test as a regression bumper)."

5. §2.8 vs cap — agree, dedupe normalized call-shape.

This is a real semantic gap. Two rows can trigger the same emission shape (e.g., a class with DECLARES.DECLARES_CLIENT and also DECLARES.DECLARES_CLIENT from a different rollup path — contrived but possible after future ontology growth). Will add a sentence to §2.8: "Coalescing by trigger is forbidden; deduplication by normalized emission string after template rendering is required and runs before the 5-hint cap. The cap counts unique rendered strings."

Will lock as a new decision.

6. "Backwards-compatible" wording — disagree (minor).

Repo policy does allow breaking changes, but the hints field on output types isn't a breaking change in any meaningful sense — adding an optional field to a Pydantic output that defaults to [] doesn't break any current caller, agent or test. "Additive for clients that ignore new fields" is accurate as written and useful for the future case where someone resurrects a pre-hints client. I'd rather keep the phrasing and let the policy override apply only to actually-breaking changes.

If you want the wording trimmed for tone, happy to soften to "additive: existing callers see no behavioral change." But I don't want to delete the compatibility note entirely.

Review goals

§1 frame — agreed, no change.
§3.3 completeness — once 1 lands (override emissions resolved), the optional later additions (HTTP_CALLS, ASYNC_CALLS, IMPLEMENTS, EXTENDS rollups for types) make sense for a v2 hints expansion, not v1. Will add to §5 "out of scope, v2 candidate" rather than expand v1.
§7.12 priority — yes, priority is downstream of valid emissions; that's exactly why I want to lock 1 and 2 first.
§5 out-of-scope, request context for pagination hints — this is item 2(a) above; once limit/offset are echoed on FindOutput, pagination hint generation stays pure-payload and request context is still out of scope. Will reflect this in §5 as a clarifying line.

What I'll change in the next push

OVERRIDES ontology dependency (or row drop — pending your pick on 1a vs 1b)
FindOutput.limit/FindOutput.offset echo + locked decision
Search hint: structural signal, not score threshold
§6 test scenario for kind gates
§2.8 dedupe-by-rendered-string + locked decision
§5 lines for pagination request-context clarification + v2 hint catalog candidates
Keep "additive for clients that ignore new fields" wording

That's 5 changes + 1 keep. Confirm OVERRIDES direction (a vs b) and I'll push.

HumanBean17 · 2026-05-15T20:31:45Z

Applied all 5 changes + the keep. Force-pushed as 1e55cd5.

What changed

1. OVERRIDES ontology bump (option b, your call).

Migration reshaped from 1 PR to 2 PRs:
- PR-A (prereq): promote "OVERRIDES" to a first-class EdgeType in mcp_v2.py, extend the Kuzu query path, bump ontology_version, adjust the edge_summary description to drop the "not valid for neighbors()" carve-out for OVERRIDES only (keep it for OVERRIDDEN_BY and dot-key rollups).
- PR-B: hints field + pagination echo + v1 template catalog.
§3.3 rows 3 and 4 keep their neighbors([id],'in',['OVERRIDES']) emissions but now annotate "requires PR-A; see §6". Appendix A rows annotated the same way.
New locked decision §7.17 captures the rationale: OVERRIDES is the natural inverse of OVERRIDDEN_BY, already implied by the rollup; elevating it preserves the highest-value hints in the catalog rather than scoping them out.
OVERRIDDEN_BY stays rollup-only (no plan to promote it; it's a count-of-incoming-overrides view, not a forward edge).

2. FindOutput.limit / FindOutput.offset echoes.

§3.1 extended with the FindOutput shape showing the new echoed fields.
The page-full hint trigger now reads output.limit and len(output.results) — pure-payload, request kwargs not threaded into the hint generator. New named test scenario in §6 (PR-B): "Pagination-echo scenario: FindOutput round-trips limit/offset verbatim; the page-full hint fires iff len(results) >= limit".
New locked decision §7.18: pagination state is part of the response payload, not the call context. Same pattern applies if search/neighbors grow pagination in their own future proposes; v1 echoes only on find.
§5 extended with two clarifying out-of-scope rows: request-context plumbing for pagination, and pagination echo on tools other than find.

3. Structural low-confidence search signal.

Replaced top score < 0.5 (uncalibrated, mode-dependent) with a structural trigger: len(results) == limit AND (max_score - min_score) < 0.1 * max_score. Calibration-free, observable from payload alone, robust across RRF / hybrid / pure-cosine ranking modes.
§3.3 row, UC10, and the Appendix A row all updated.
Hint text changed from "low-confidence match — try a more specific query" to "results look weak — narrow the query or try find(role=…)". The new wording is concretely actionable (suggests the find(role=…) redirect, which is the natural fallback when text search is weak).
New locked decision §7.19: when the signal is uncertain v1 emits no hint rather than a miscalibrated one.
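
The structural trigger above could be sketched as a pure function over the scores and the echoed limit. `looks_weak` is a hypothetical name, and treating a missing limit or an empty result list as "uncertain, emit nothing" is an assumption of this sketch.

```python
from typing import Optional


def looks_weak(scores: list[float], limit: Optional[int]) -> bool:
    """Structural low-confidence check: full page and a tight score band."""
    if limit is None or not scores or len(scores) != limit:
        return False  # uncertain: emit no hint
    max_score, min_score = max(scores), min(scores)
    # Spread under 10% of the top score means no hit stands out.
    return (max_score - min_score) < 0.1 * max_score
```

For example, three hits scored 0.50, 0.49, 0.48 on a page of three count as weak, while 0.9, 0.5, 0.1 do not, regardless of how the ranking mode scales its scores.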

4. Kind-gate test scenario.

Added to §6 PR-B test scenarios: "Kind-gate scenario: a method-Symbol describe payload synthesized with type-only rollup keys present (impossible but defensible state) emits no type-rollup hints — a regression bumper for the §3.3 kind separator". The §3.3 table already split type vs method; this pins it to a test.
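
A sketch of what the kind gate might look like in code. The kind strings, the `TYPE_ONLY_ROLLUPS` set, the `out` half being the populated one, and `fired_type_rollups` are all illustrative assumptions; the two rollup key names are the ones mentioned earlier in this thread.

```python
# Rollup keys that only make sense on a type Symbol (set contents are illustrative).
TYPE_ONLY_ROLLUPS = {"DECLARES.DECLARES_CLIENT", "DECLARES.EXPOSES"}


def fired_type_rollups(kind: str, edge_summary: dict) -> list[str]:
    """Return the type-only rollup keys allowed to trigger hints for this kind."""
    if kind != "type":
        return []  # kind gate: method payloads never fire type-only triggers
    return sorted(
        key for key in TYPE_ONLY_ROLLUPS
        if edge_summary.get(key, {}).get("out", 0) > 0
    )
```

The regression bumper then feeds a method payload with type-only keys present and asserts the result is empty.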

5. Dedupe-by-rendered-string, §2.8 rewritten.

§2.8 used to say "we do not coalesce" — that was misleading. The right rule: coalesce only on character-identical rendered output, after template substitution. Distinct triggers that render to different strings stay separate even if they recommend the same conceptual next step.
The cap now counts unique rendered strings.
§7.12 priority lock updated to reference the dedupe step explicitly: "Priority applies after dedupe-by-rendered-string".
Old decision §7.5 ("Drop, do not coalesce") tightened to reference §7.20 and §7.12, since "do not coalesce" was the prior contradicting wording.
New named test scenario in §6 PR-B: dedupe runs before the cap.
New locked decision §7.20: dedupe-by-rendered-string is required.
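
The ordering of the three steps (dedupe, then priority, then cap) can be sketched as below. `select_hints` and the integer priority encoding are hypothetical; keeping the best priority when the same string arrives twice is an assumption the thread does not spell out.

```python
MAX_HINTS = 5  # per-output cap; constant name is hypothetical


def select_hints(candidates: list[tuple[int, str]]) -> list[str]:
    """candidates are (priority, rendered_hint); a lower number ranks higher."""
    best: dict[str, int] = {}
    for priority, rendered in candidates:
        # Character-identical rendered strings collapse to one entry.
        if rendered not in best or priority < best[rendered]:
            best[rendered] = priority
    # sorted() is stable, so ties keep first-seen order.
    ordered = sorted(best.items(), key=lambda item: item[1])
    return [rendered for rendered, _ in ordered[:MAX_HINTS]]
```

Because the cap counts unique rendered strings, six copies of one hint plus four others fit in the budget instead of crowding the others out.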

6 (keep). "Additive for clients that ignore new fields" wording preserved.

The compatibility note stays. Adding an optional Pydantic field that defaults to [] is behaviorally invisible.
New locked decision §7.21 makes the position explicit: the hints field is additive on the agent surface; the breaking-changes-allowed policy applies to PR-A's ontology bump (which IS breaking by intent), not to PR-B.
TL;DR line updated to make the split explicit: "PR-A bumps the ontology … PR-B adds the hints field … PR-B is additive on the agent-visible surface".

Consistency pass (post-revision)

21 locked decisions (was 16; added §7.17 OVERRIDES bump, §7.18 pagination echo, §7.19 structural search trigger, §7.20 dedupe-by-rendered, §7.21 additive-not-breaking)
9 design principles (unchanged; §2.8 rewritten in place)
14 UCs (unchanged; UC10 wording updated to match the new structural trigger)
2 PRs in TL;DR ↔ 2 PRs in §6 ↔ 2 PRs in the migration narrative — counts agree
No surviving "top score < 0.5" or "low-confidence match" outside the §B12 changelog reference
§7.5 reconciled with §2.8 / §7.20 (the prior "do not coalesce" wording was contradictory once dedupe-by-rendered was added)

Ready for another pass when you have time.

HumanBean17

Second review (updated propose + replies)

The revision fixes most of the first-pass contract gaps: the PR-A/PR-B split, FindOutput pagination echo + pure-payload hint generation, dedupe-then-cap (§2.8 / §7.20), relative score spread for search low-confidence (§7.19), and the §6 test matrix (kind-gate, cap, dedupe, char-cap) all read implementable.

A few doc issues remain—worth fixing before coding so implementers are not forced to guess.

1. Search hint vs “pagination echo only on `find`” (contradiction)

§5 explicitly defers pagination echo for search (and neighbors) to future proposes; §7.19, §3.3 / Appendix A, §6 search scenario, and UC10 all use len(results) == limit on SearchOutput. Today SearchOutput does not echo limit, and the propose does not add it.

Pick one arm and align §5 / §7.19 / catalog / tests: e.g. echo limit/offset on SearchOutput in PR-B (and narrow the §5 carve-out), or drop limit from the trigger (e.g. score-band + minimum len(results)), or (less ideal given §5) allow request-context for this single trigger.

2. `OVERRIDDEN_BY` triggers use the wrong `edge_summary` half

In current rollups (override_axis_rollup_for), composed override keys are emitted as {"in": 0, "out": n}. §3.3 rows that fire on edge_summary["OVERRIDDEN_BY"].in > 0 and ...OVERRIDDEN_BY.DECLARES_CLIENT...in > 0 would never fire. These should use .out > 0 (and UC2 / Appendix A should match).
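
A minimal illustration of why the `.in` trigger is dead code, assuming the `{"in": 0, "out": n}` shape described above. `has_overriders` is a hypothetical name for the corrected trigger.

```python
def has_overriders(edge_summary: dict) -> bool:
    """Trigger for the override-axis rows: read the 'out' half, not 'in'."""
    return edge_summary.get("OVERRIDDEN_BY", {}).get("out", 0) > 0


# Shape reported for override_axis_rollup_for: "in" is always 0.
summary = {"OVERRIDDEN_BY": {"in": 0, "out": 2}}
in_based_trigger = summary["OVERRIDDEN_BY"]["in"] > 0  # never true for this shape
```

With this summary, `has_overriders(summary)` is true while `in_based_trigger` is false, so a fixture built from real rollup output would expose the mistake immediately.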

3. PR-A scope: materialize edges, not only `EdgeType`

OVERRIDES / override-axis behavior is virtual Cypher today; there is no stored [:OVERRIDES] rel to traverse. PR-A should spell out graph builder + schema work (materialize edges, or an explicitly scoped neighbors special-case—different design). The §6 equivalence test (neighbors vs rollup prediction) is the right acceptance criterion once that storage story is stated.

4. §5 “no cross-tool hints” vs §7.16

The §5 table claims no v1 cross-tool hints, but find → resolve is cross-tool and is locked in §7.16. Narrow that §5 row to “no additional cross-tool templates beyond the locked find empty → resolve row.”

5. Minor: `FindOutput` errors and the route/client template

If limit / offset are required fields with no default, define success=False behavior (defaults or optionals) so builders stay simple.
The route/client row uses 'EXPOSES' or 'DECLARES_CLIENT' as human-readable text; add a line that the real catalog entries are concrete emissions after substituting {inverse_edge} (two templates or one parameterized string).

Overall: ship the direction; resolve search limit vs §5, fix .in → .out, and expand PR-A to the full graph story so PR-B hints are not blocked by missing rels.

HumanBean17 · 2026-05-15T20:42:09Z

All 5 caught — agree on every one. Verified against source:

kuzu_queries.override_axis_rollup_for writes {"in": 0, "out": n} for all four override-axis keys (lines 675–694). So .in > 0 would indeed have been dead code.
OVERRIDES is virtual Cypher in that same function — no stored [:OVERRIDES] rel exists; the rollup is computed from IMPLEMENTS|EXTENDS + signature match. neighbors(..., ['OVERRIDES']) over the existing graph would have returned empty.

Force-pushed as b69dbd8.

What changed

1. Search hint vs §5 carve-out — echo limit/offset on SearchOutput in PR-B.

§3.1 now shows both FindOutput and SearchOutput shapes with limit: int | None + offset: int | None echoes.
§5 carve-out row renamed to "Pagination echo on neighbors" — search is no longer carved out at v1, only neighbors is (it doesn't have a hint trigger that needs pagination yet).
§3.3 search row annotated "Requires SearchOutput.limit echo per §3.1 / §7.18". Appendix A row annotated the same way.
§7.18 rewritten to cover both FindOutput and SearchOutput.
New §6 PR-B test scenario: pagination echo round-trips on both find and search; structural search hint with limit == None emits nothing.
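
A minimal sketch of the echo and the `limit is None` behavior, using a stand-in dataclass rather than the real model in `mcp_v2.py`. The class body and the `page_full` helper are hypothetical; only the field names `results`, `limit`, `offset`, and `hints` come from the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchOutput:
    """Stand-in for the real output model; only the fields the trigger reads."""
    results: List[str] = field(default_factory=list)
    limit: Optional[int] = None   # echoed from the request; None when absent
    offset: Optional[int] = None  # echoed from the request; None when absent
    hints: List[str] = field(default_factory=list)


def page_full(output: SearchOutput) -> bool:
    """Pure-payload check: fires only when a limit was echoed and the page fills it."""
    if output.limit is None:
        return False
    return len(output.results) == output.limit


print(page_full(SearchOutput(results=["a", "b"], limit=2, offset=0)))  # True
print(page_full(SearchOutput(results=["a", "b"])))                     # False
```

Because the trigger reads only the output object, an error-path builder can construct the output with the defaults and no hint fires.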

2. .in → .out direction fix on override-axis triggers.

§3.3 rows 3 and 4: edge_summary["OVERRIDDEN_BY"].in > 0 → .out > 0, same for OVERRIDDEN_BY.DECLARES_CLIENT.
UC2 row updated to reference OVERRIDDEN_BY.DECLARES_CLIENT.out > 0.
Appendix A entries annotated: # requires PR-A; rollup stores counts on .out per override_axis_rollup_for.
New §B19 changelog entry captures the catch.

3. PR-A scope expanded to full graph work — schema + builder + traversal.

§6 PR-A rewritten as three-part work, followed by the ontology bump and EdgeType admission as before:
- Schema: extend Kuzu schema with OVERRIDES between method Symbols.
- Builder: write (mover)-[:OVERRIDES]->(m) during build, mirroring the rollup logic verbatim: for each method m on type t, find implementing types impl, find each impl's declared method mover with matching signature, write the edge.
- Query path: extend neighbors to traverse the stored edge.
Equivalence test scenario sharpened: the rollup is computed from the declaring side, the stored edge is read from the implementer side; the test exercises both directions and the two sets must match.
Added schema-roundtrip and builder-symmetry test scenarios.
§6 includes an explicit "if materialization cost is too high we'd consider a neighbors special-case dispatching to virtual Cypher — that's a different design and a different propose. This one commits to the stored-edge path."
§7.17 rewritten to reflect the three-part scope, not just the literal admission.

4. §5 row "no cross-tool hints" reconciled with §7.16.

Old row contradicted the locked find empty → resolve template. New row: "Additional cross-tool hint templates beyond the locked find empty → resolve row (§7.16)" — narrowing to "no additional cross-tool templates", which is what we actually want.

5. Minor — error-path limit/offset defaults + route/client template split.

limit / offset typed int | None with default None so error-path builders stay simple. Hint triggers treat None as absent (do not fire). New error-path pagination test scenario in §6 PR-B.
Route/client describe template split into two concrete rows (one per kind, each with the concrete inverse edge). Old row's 'EXPOSES' or 'DECLARES_CLIENT' was a human-readable placeholder that wouldn't have rendered to a valid call shape — that violated §2.7 (no alias magic). Now mechanical: kind == route → neighbors(..., ['EXPOSES']); kind == client → neighbors(..., ['DECLARES_CLIENT']). UC6 also split into UC6a/UC6b to match.

Drift caught during consistency pass

§3.1 had two SearchOutput blocks (one in the new pagination-echo block, one in the original hints-schema-description block) after the edit. Replaced the second block with a plain hints-description quote so the model definitions appear exactly once.
UC count adjusted to 15 (UC6 split into UC6a + UC6b). Count claim lines above and below the UC table both updated.

Consistency pass post-revision

21 locked decisions (unchanged count; §7.17 + §7.18 substantially rewritten)
9 design principles (unchanged)
15 UCs (was 14; UC6 → UC6a + UC6b to match the route/client split in §3.3)
2 PRs (PR-A now three-part graph work + ontology admission; PR-B unchanged in shape)
No surviving .in > 0 on override-axis triggers outside the §B19 changelog reference
No surviving 'EXPOSES' or 'DECLARES_CLIENT' alias-style template outside the §B23 changelog reference
§5 cross-tool row reconciled with §7.16
TL;DR ↔ §3.1 ↔ §6 ↔ §7.17 / §7.18 all reference the same PR-A three-part scope and the same FindOutput + SearchOutput pagination story

Ready for another pass.

HumanBean17

Third review (post third revision)

Read the full updated propose/HINTS-ROAD-SIGNS-PROPOSE.md on propose/hints-road-signs plus the thread reply. The third pass closes the earlier contradictions; the doc is largely implementation-ready.

Strengths in this revision

Search vs §5: SearchOutput.limit / offset echo, §7.18 / §7.19 alignment, and the limit is None → no hint scenario remove the internal inconsistency.
Override-axis triggers: .out > 0 matches override_axis_rollup_for ({"in": 0, "out": n}).
PR-A: Stored OVERRIDES, schema, traversal, ontology bump, and explicit rejection of virtual neighbors without a new propose.
Route vs client: Two concrete rows + UC6a/UC6b fix the problem that the old row did not render to a valid single-shot call shape.
Cross-tool §5 row: Narrowed to “no additional cross-tool templates” — consistent with §7.16.

Remaining gaps / nits

1. PR-A builder: override_axis_rollup_for has two query shapes — §6 mostly narrates one

override_axis_rollup_for runs down (supertype method → implementing methods → OVERRIDDEN_BY) and up (concrete method → parent declaration methods → OVERRIDES). §6 describes (mover)-[:OVERRIDES]->(m) for m on t, which matches the down pattern when m is the supertype method.

The up query adds parent decl_m ids for a concrete m. For stored edges to satisfy the §6 equivalence story (“rollup vs neighbors in both directions”), materialization must cover both virtual patterns — or one explicit unified rule (e.g. whenever method A overrides method B in the rollup sense, emit (A)-[:OVERRIDES]->(B) so neighbors(B,'in',['OVERRIDES']) and neighbors(A,'out',['OVERRIDES']) line up with the right id sets).

As written, a reader might implement only the interface/t + impl loop and miss the concrete-m + parent-decl_m half.

2. Builder module placeholder

§6 cites graph_enrich.py (or wherever). In this repo shape, rel creation usually lives under build_ast_graph.py (with graph_enrich.py for enrichment). Pointing PR-A implementers at the actual graph write pass avoids starting in the wrong module.

3. §5 “Request-context plumbing” row

It still says “Once FindOutput echoes …” while §7.18 now covers both FindOutput and SearchOutput. One sentence update avoids confusion.

4. Appendix B consistency

An older changelog bullet still says the UC count is 14 while the doc now uses 15 UCs (UC6 split). Worth de-staling that historical line.

5. Optional: post-filtered search and “full page”

search_v2 can return fewer results than the requested limit after post-processing filters. len(results) == limit is fine if defined as “full page of returned hits after filters.” A single clarifying sentence in §7.19 or §6 removes ambiguity for implementers.

Verdict

Good to land as the design authority for PR-A/PR-B. The substantive follow-up is making PR-A’s materialization story explicitly cover both halves of override_axis_rollup_for (or the unified directed-edge rule), so equivalence tests and future hints do not half-implement the virtual rollup.

HumanBean17 · 2026-05-15T20:51:37Z

Items 1, 2, 3, 5 applied. Pushing back on item 4 (it's a historical changelog entry — see below). Force-pushed as 84fc2a0.

Applied

1. PR-A builder — unified directed-edge rule covering both rollup arms.

Real catch. The old §6 wording ((mover)-[:OVERRIDES]->(m) for m on t) only narrated the down arm of override_axis_rollup_for. The up arm (concrete m → parent.decl_m) was implicit; a reader could have implemented half of the rollup and the equivalence test would still half-pass.

Rewrote §6 PR-A builder section as a single rule: (A)-[:OVERRIDES]->(B) whenever subtype-method A overrides supertype-method B (signature match). One walk, one edge per pair, both arms covered:

Down: m on supertype t → neighbors(m, 'in', ['OVERRIDES']) returns the impl_ids set (the movers).
Up: m on subtype → neighbors(m, 'out', ['OVERRIDES']) returns the decl_ids set (parent declarations).

Added builder pseudo-code: walk every method A, walk its declaring type's transitive IMPLEMENTS/EXTENDS ancestors, for each ancestor's method B with matching signature write (A)-[:OVERRIDES]->(B).

Equivalence test scenario rewritten to explicitly exercise both halves: (down) neighbors(supertype_method, 'in', ['OVERRIDES']) == impl_ids, (up) neighbors(subtype_method, 'out', ['OVERRIDES']) == decl_ids. Both must match; if either comes up short, the materialization is missing pairs.
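
The unified rule and the two-direction check can be sketched over a toy in-memory graph. Everything here is hypothetical illustration: the type and method names, the `SUPERTYPES` / `METHODS` tables, and the set-based `neighbors` helper stand in for the real Kuzu schema and query path, and signature matching is reduced to string equality.

```python
from collections import defaultdict

# Hypothetical toy graph: type -> direct supertypes (IMPLEMENTS / EXTENDS).
SUPERTYPES = {
    "Repo": [],
    "BaseRepo": ["Repo"],
    "SqlRepo": ["BaseRepo"],
}
# Hypothetical methods: method id -> (declaring type, signature).
METHODS = {
    "Repo.save": ("Repo", "save(Entity)"),
    "BaseRepo.save": ("BaseRepo", "save(Entity)"),
    "SqlRepo.save": ("SqlRepo", "save(Entity)"),
    "SqlRepo.flush": ("SqlRepo", "flush()"),
}


def ancestors(type_name):
    """All transitive supertypes of type_name, excluding itself."""
    seen, stack = set(), list(SUPERTYPES.get(type_name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERTYPES.get(current, []))
    return seen


def build_overrides_edges():
    """Emit one (A, B) pair whenever subtype-method A matches supertype-method B."""
    by_type = defaultdict(list)
    for method_id, (owner, signature) in METHODS.items():
        by_type[owner].append((method_id, signature))
    edges = set()
    for a, (owner, signature) in METHODS.items():
        for ancestor in ancestors(owner):
            for b, b_signature in by_type[ancestor]:
                if b_signature == signature:
                    edges.add((a, b))
    return edges


def neighbors(edges, method, direction):
    """'out' follows A->B starting at A; 'in' follows it backwards from B."""
    if direction == "out":
        return {b for a, b in edges if a == method}
    return {a for a, b in edges if b == method}


edges = build_overrides_edges()
print(sorted(neighbors(edges, "Repo.save", "in")))      # ['BaseRepo.save', 'SqlRepo.save']
print(sorted(neighbors(edges, "SqlRepo.save", "out")))  # ['BaseRepo.save', 'Repo.save']
```

One walk produces one edge per overriding pair, and both the down read ('in' from the supertype method) and the up read ('out' from the subtype method) come from the same edge set.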

2. Builder module pinned to build_ast_graph.py.

Verified by grep — build_ast_graph.py has the CREATE REL TABLE definitions (lines 2176–2208) and the per-edge CREATE (a)-[:EDGE]->(b) writes. graph_enrich.py has no CREATE REL or relationship-write statements; it's enrichment-only.

§6 schema and builder bullets both now pin to build_ast_graph.py explicitly, with an inline note that graph_enrich.py does not create relationships. The old "graph_enrich.py (or wherever)" phrasing is gone.

3. §5 "Request-context plumbing" row updated to mention both FindOutput and SearchOutput.

Stale text from before §7.18 expanded to cover both outputs. One-sentence fix.

5. §7.19 page-full clarification for search.

Implementer-relevant clarification. search_v2 may post-filter; the trigger needs to be defined against what the agent actually sees. §7.19 now says: len(results) == limit means "the returned result list, after any post-processing filters, fills the requested page." The trigger reads output.results, not a pre-filter count. This is the right level for a propose — it's a definitional decision about what the hint observes, not implementation detail.
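
A short sketch of that definition, assuming a hypothetical pipeline that takes a page of raw hits and then post-filters it. `search_page`, `page_full`, and the sample hits are illustrative only and are not the real `search_v2` code.

```python
def search_page(raw_hits, limit, keep):
    """Hypothetical pipeline: take a page of raw hits, then post-filter it."""
    page = raw_hits[:limit]
    return [hit for hit in page if keep(hit)]


def page_full(results, limit):
    """Trigger reads the returned list, never a pre-filter count."""
    return limit is not None and len(results) == limit


raw = ["a.java", "b.java", "c.xml", "d.java"]

filtered = search_page(raw, limit=3, keep=lambda hit: hit.endswith(".java"))
print(filtered, page_full(filtered, 3))      # ['a.java', 'b.java'] False

unfiltered = search_page(raw, limit=3, keep=lambda hit: True)
print(unfiltered, page_full(unfiltered, 3))  # ['a.java', 'b.java', 'c.xml'] True
```

In the first call three raw hits were fetched but only two survive the filter, so the page the agent sees is not full and the hint does not fire.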

Pushed back on

4. Appendix B §B5 "UC count is now 14" — keeping the historical bullet.

The Appendix B changelog records what was true at each revision. §B5 was added in the first re-grilling pass when UC15 was moved to a test scenario, dropping the count from 15 to 14. UC6 splitting into UC6a + UC6b (which brought the count back up to 15) happened in the third pass and is recorded in §B23.

Editing §B5 to retroactively read "UC count is 15" would muddy traceability. As it stands, a reader following the timeline sees §B5 "UC15 dropped, count is now 14" → §B23 "UC6 split, count is 15", which is the actual history. Overwriting §B5 to claim "count is 15" makes §B23's contribution invisible.

The doc's current-state sections (the parts a reviewer evaluates the design from: TL;DR, §3.3, §4, §6, §7) all consistently say 15. Only the layered changelog records the in-between counts, which is what it's for.

If you'd prefer a different traceability convention here (e.g., changelog entries always reflect final state and we lose the pass-by-pass diff), happy to switch — but I'd want that as a separate decision rather than retconning this one bullet.

Consistency pass

21 locked decisions (unchanged; §7.19 wording sharpened)
9 design principles (unchanged)
15 UCs (unchanged; current-state)
2 PRs; PR-A's builder section now states the unified rule and pins the module
10 changelog entries in Appendix B (4 new from this pass: §B25 unified-edge rule, §B26 builder module, §B27 §5 row update, §B28 §7.19 post-filter clarification)
Both graph_enrich.py mentions are now explicitly negative references ("does not create relationships") rather than implementation pointers
No surviving stale text on the items applied

Ready for landing if you're satisfied with the push-back on 4, or happy to revisit if you'd rather have a flat (current-state-only) changelog convention going forward.

This was referenced May 14, 2026

find filter contract: silent-drop bug surfaces the larger 'what is the filter contract per kind?' question #117

Closed

lossless-permissive frame: tactical no-regret fix for #117 silent-drop bug class #122

Closed

HumanBean17 marked this pull request as ready for review May 14, 2026 15:57

HumanBean17 mentioned this pull request May 14, 2026

propose: MCP filter frame — typed query language with one named carve-out #128

Merged

HumanBean17 force-pushed the propose/hints-road-signs branch from a30984b to 3ea64e3 Compare May 15, 2026 20:18

HumanBean17 force-pushed the propose/hints-road-signs branch from 3ea64e3 to 1e55cd5 Compare May 15, 2026 20:29

HumanBean17 force-pushed the propose/hints-road-signs branch from 1e55cd5 to b69dbd8 Compare May 15, 2026 20:40

propose: hints field as machine-readable road signs on MCP V2 outputs

84fc2a0

HumanBean17 force-pushed the propose/hints-road-signs branch from b69dbd8 to 84fc2a0 Compare May 15, 2026 20:47

HumanBean17 merged commit 2569af4 into master May 15, 2026
1 check passed

HumanBean17 mentioned this pull request May 16, 2026

propose: hints-v2 — extend hints to resolve and to fuzzy-strategy neighbors signals #146

Merged

HumanBean17 deleted the propose/hints-road-signs branch May 23, 2026 16:21
