
propose: hints field as machine-readable road signs on MCP V2 outputs#120

Merged
HumanBean17 merged 1 commit into master from propose/hints-road-signs
May 15, 2026

@HumanBean17
Owner

Draft propose for review. Tight scope: add hints: list[str] to all four MCP V2 output models, with a strict catalog of road-sign-shaped templates generated server-side from observable output state.

Frame (§1): a hint is a road sign attached to a tool output — it tells the agent the next reachable call, not what that call means or why. ≤120 chars per hint, ≤5 hints per output, no prose.
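A minimal sketch of how the two caps in this frame might be enforced server-side, written in Python because the MCP V2 models live in mcp_v2.py. The constant and function names are hypothetical; the sketch assumes an over-length hint is dropped rather than truncated, and that the input list is already in priority order.

```python
MAX_HINT_CHARS = 120  # per-hint cap from the §1 frame
MAX_HINTS = 5  # per-output cap from the §1 frame


def apply_caps(rendered: list[str]) -> list[str]:
    """Drop over-length hints (never truncate them), then keep the first MAX_HINTS.

    Assumes `rendered` is already in priority order.
    """
    within_length = [hint for hint in rendered if len(hint) <= MAX_HINT_CHARS]
    return within_length[:MAX_HINTS]
```

Dropping rather than truncating keeps every surviving hint a complete, callable instruction. A truncated hint could cut off the argument list the agent actually needs.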

Sibling to issues #117 (filter contract) and #118 (rollup decomposition). Hints can ship under any frame decision in those issues; they reduce the cost of getting the frames wrong by making the next call machine-readable instead of inference-required.

Key locked decisions:

Migration: 1 PR. Additive, backwards-compatible.

Review goals:

  1. Is the §1 frame the right anchor, or does "road sign" rule out something we actually want?
  2. Is the §3.3 catalog complete enough for v1, or are there obvious templates missing?
  3. Is the §7 decision #12 priority order right? (DECLARES.* > OVERRIDDEN_BY.* > leaf > meta)
  4. Anything in §5 ("deliberately does NOT do") that should be in-scope after all?

@HumanBean17
Owner Author

Pausing pending #117 and #118

Dmitriy spotted that the hint catalog references OVERRIDDEN_BY and OVERRIDES as if they were navigable edges. They are not — both are virtual rollup keys synthesized by override_axis_rollup_for() in kuzu_queries.py:643–696, and the EdgeType Literal in mcp_v2.py:21–31 does not accept them. The propose's own pre-existing docstring at mcp_v2.py:135–137 already warns about this exact pitfall.

Grilling that single observation surfaced that the hint catalog has rows whose text or condition depends on outcomes still open in #117 and #118, not just one bad row:

| Hint family | Depends on #117? | Depends on #118? | Issue |
|---|---|---|---|
| OVERRIDDEN_BY* family (catalog rows at L90–91, L222–226; UC2 at L117; priority order at L170; UC14/UC15) | no | yes (hard) | No reachable next call exists; the hint can't honor the road-sign frame until #118 decides whether rollups become navigable. |
| DECLARES.DECLARES_CLIENT / DECLARES.EXPOSES rollup hints | no | yes (soft) | Text prescribes "two neighbors calls". If #118 lands as a dot-notation primitive or a decomposition tool, the canonical hint text changes. |
| search → low-confidence describe-fallback | yes (soft) | no | If #117 commits to a resolve tool, the hint points at resolve(query), not describe. |
| find page-full → "page=2" | no | no | Stable. |
| neighbors empty result | no | no | Stable. |
| Basic describe hints (type Symbol, route, client) | no | no | Stable. |

Two additional risks the current draft doesn't evaluate:

  1. The priority cap rule (decision #12, L170) ranks OVERRIDDEN_BY.* rollups second. If those hints get deferred or restructured, the priority order needs revision.
  2. The generation-discipline rules (≤120 chars, ≤5 per output, static templates, never LLM, no dot-keys, pure generation) are the frame of the propose and are independent of #117 and #118. Those survive any catalog reshape and are the durable contribution of this doc.

Decision

Pause this PR pending #117 and #118 lock. Sequencing:

  1. Lock #117 (filter contract / strict frame, plus resolve tool design if the frame commits to it).
  2. Lock #118 (rollup decomposition, anchored to #117's frame).
  3. Reshape this propose: keep the frame and discipline rules, revise the catalog rows whose text or condition is now determined, add an explicit "deferred families" section if any remain.

Estimated reshape cost post-lock: ~4 catalog rows + priority order + one section. Doing it now means re-doing it twice.

Not closing the PR — branch propose/hints-road-signs at a30984b stays as-is so the frame work is preserved.

Cross-refs

@HumanBean17
Owner Author

Re-grilled in the post-#117 / post-resolve world. Frame, cap discipline, list[str] shape, static-template decision, priority order, and no-LLM-in-hints rule all hold up cleanly. Eight items to address before locking.

What aged well

Items to address

(1) The "resolves #118" claim is no longer accurate

Open-links bullet says "locking hints here mostly resolves #118". After re-grilling #118 in the post-resolve world, the conclusion was the opposite: strings cover the documentation-grade consumer model from #118 option B, but the mechanical/typed model (rollup_paths) is a separate decision. The propose currently dodges the consumer-model question with "lossy by design" and "advisory" framing.

Fix: Replace the bullet with a precise cross-link: strings cover the documentation-grade consumer model from #118 option B; the mechanical/typed model is a separate decision that may follow once we see evidence of agents doing programmatic next-call construction. Pair with item (8) below.

(2) The §3.3 catalog has a stale row that bypasses resolve

Row: find(kind=client, filter={fqn_prefix:...}) empty → "try search(...) for fuzzy fallback". This is the pre-resolve fallback wording that PR-RESOLVE-2 removed from all four tool descriptions per decision §7.9 of the resolve propose. Letting it live on as a hint re-introduces what we just removed.

Fix: Replace with a hint pointing at resolve(identifier, hint_kind="client"). Update §3.3 row, UC7, and hints_for_find in Appendix A consistently. This also resolves item (8) below: it makes the propose ship one cross-tool hint at v1 (the resolve redirect) instead of claiming "no cross-tool hints" while sneaking one in.

(3) Row-4 hint contradicts §7.8 (paraphrased emission, not concrete call)

§7.8 locks: hints reference real EdgeType literals only, never dot-keys, never paraphrases. But row 4 emits "clients in overriders: walk OVERRIDDEN_BY then DECLARES_CLIENT (two neighbors calls)". The neighbors() argument is paraphrased away: the hint doesn't tell the agent what to call. Compare row 1, which emits a concrete two-call template.

Fix: Reshape row 4 to match row 1's shape, e.g. "clients in overriders: neighbors(['{rid}'],'in',['OVERRIDES']) then neighbors(overrider_ids,'out',['DECLARES_CLIENT'])". If that exceeds the 120-char cap, drop the row from v1 and let it return in a future amendment with a cleaner template.

(4) §7.14 carve-out is solving a problem #117 already solved

§7.14: "No hints for find when no filter was passed. find() with no filter is a sentinel call." In the strict-frame world #117 locked, find() without a filter is a contract error that fails loud. The carve-out treats it as a soft edge case, but it is a hard error now.

Fix: Drop §7.14. Renumber subsequent decisions.

(5) The "triggers vs emissions" principle is implicit but not stated

§3.3 quietly does the right thing: hints are triggered by dot-keys (read-only signal) but emit atomic EdgeType calls (consumable contract). That's the post-#89 strict-frame principle in action. But it's not a stated principle anywhere, so future template authors won't know to follow it.

Fix: Add §2.9 (or equivalent) principle: triggers are signals (they may reference dot-keys, rollup state, score thresholds, etc.); emissions are calls (atomic EdgeType literals and concrete arguments only). The two never share vocabulary.
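To make the split concrete, here is a hypothetical rule shape in Python. The trigger is a predicate that may read a dot-key rollup from edge_summary; the emission template names only atomic EdgeType literals. `HintRule`, the template text, and the assumption that the rollup count sits on the `out` half are all illustrative, not taken from the propose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# edge_summary shape assumed here: key -> {"in": count, "out": count}
EdgeSummary = Dict[str, Dict[str, int]]


@dataclass(frozen=True)
class HintRule:
    # Trigger: a signal over observable output state. May read dot-keys.
    trigger: Callable[[EdgeSummary], bool]
    # Emission: a call template. Atomic EdgeType literals and concrete args only.
    emission: str

    def render(self, summary: EdgeSummary, rid: str) -> Optional[str]:
        """Return the rendered hint if the trigger fires, else None."""
        if not self.trigger(summary):
            return None
        return self.emission.format(rid=rid)


# Hypothetical rule: the dot-key "DECLARES.DECLARES_CLIENT" appears only in the
# trigger; the emission spells out two neighbors() calls with atomic edges.
DECLARED_CLIENTS = HintRule(
    trigger=lambda s: s.get("DECLARES.DECLARES_CLIENT", {}).get("out", 0) > 0,
    emission=(
        "clients: neighbors(['{rid}'],'out',['DECLARES']) "
        "then neighbors(method_ids,'out',['DECLARES_CLIENT'])"
    ),
)
```

Keeping the two halves as separate fields makes the principle reviewable: a template author can see at a glance that no dot-key leaks into `emission`.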

(6) UC15 is hypothetical and weakens the re-walk

UC15: "Agent describes a class with 8 rollup signals (hypothetical max)". The propose-doc-author skill explicitly calls out the UC re-walk as the validation move: every row should be realistic. UC15 is testing the cap, not the design.

Fix: Move the cap test into §6 named test scenarios. Either replace UC15 with a realistic case, or drop to 14 UCs and say so. (UC count needs to match between the row count and any "N realistic cases" prose.)

(7) Appendix A has plan-level detail

The skill says: keep Appendix A to the one thing the implementer copies verbatim; if there's no such thing, omit it. The mcp_hints.py skeleton with hints_for_describe / hints_for_find / etc. function signatures crosses the propose↔plan boundary. The hint templates are the artifact; the function decomposition is plan work.

Fix: Trim Appendix A to the template strings (the §3.3 table is already that). Drop the function-level skeleton; the file-existence claim (mcp_hints.py) can live in §7.15 unchanged, but the function bodies belong in plans/PLAN-HINTS.md.

(8) Lock the consumer model as a decision

The propose says "lossy by design, advisory" in §2.4 / §7.10 but doesn't lock what the agent does with hints. After (1), the consumer model needs to be explicit so future readers don't relitigate it.

Fix: Add a §7 decision: Hints are documentation-grade. The agent reads them as part of its prompt context; the surface does not commit to programmatic dispatch in v1. If a future workflow needs mechanical consumption, that's a separate propose for a typed surface (e.g., the rollup_paths shape sketched in #118).

What doesn't need to change

  • §1 frame, §2 principles 1–8, §3.1 field shape, §3.2 generation contract.
  • §7 decisions 1–7 and 9–13, 15 (only 8 and 14 named above).
  • §5 "deliberately does NOT do" holds up; the cross-tool hint row becomes consistent after item (2) lands.
  • §8 risks table.

Suggested order

Apply items 1, 2, 4, 8 first (decision-level shifts), then 3, 5, 6, 7 (text edits). Force-push the revision per the iterate-and-amend pattern; rerun the consistency pass (decision count, UC count, cardinal-number alignment) before requesting re-review.

@HumanBean17 HumanBean17 force-pushed the propose/hints-road-signs branch from a30984b to 3ea64e3 May 15, 2026 20:18
@HumanBean17
Owner Author

All 8 items applied + 3 drifts caught during the consistency pass. Force-pushed as 3ea64e3.

8 items — what changed

  1. Dropped the "resolves #118" claim. Now cross-link only: "#118 is a related, paused issue; this propose does not subsume it. Consumer-model question for #118 stays open after this lands."

  2. Replaced the stale find→search fallback in §3.2 and in every UC row that referenced it. Hints now redirect to resolve(identifier, hint_kind=…) per the #117 strict frame. No search() fallback survives anywhere in the doc.

  3. Fixed row-4 paraphrased chain. "Walk OVERRIDDEN_BY then DECLARES_CLIENT" → concrete two-call: neighbors(node, edge=OVERRIDDEN_BY, direction=out) followed by neighbors(<impl>, edge=DECLARES_CLIENT, direction=out). Caught one more during consistency pass (UC2 row had the same paraphrase shape) — fixed.

  4. Dropped the §7.14 carve-out. "No hints when find has no filter" is dead weight: the #117 strict frame already loud-fails this case. Removed the section; renumbered downstream.

  5. Added §2.9 principle: "Triggers are signals; emissions are calls; vocabularies never share." Dot-keys may trigger a hint but never appear in emissions. Atomic EdgeType only on the emission side.

  6. Moved UC15 → §6 test scenario. Hypothetical cap test doesn't belong in the use-case re-walk; it's a unit-test scenario. UC count is now 14, not 15.

  7. Trimmed Appendix A. Was a mcp_hints.py skeleton with function bodies — that's plan-level detail. Now a template catalog only: trigger → emission rows, no Python.

  8. Locked consumer model: hints are documentation-grade, not programmatic-dispatch. Added explicit decision; UC table column renamed from "agent action" to "agent reads".

3 drifts caught during consistency pass

  • UC count claim line still said "15 representative use cases" after UC15 moved to §6. Updated to 14.
  • UC2 row still had paraphrased "walk DECLARES_CLIENT then CALLS" left over from an earlier draft — replaced with the concrete two-call form, matching the fix in item 3.
  • §7.6 rendered-string cap had an honest tension between cap-on-template and cap-on-rendered-string. Locked: cap is on the rendered string with placeholders substituted, enforced by a unit test using realistic placeholder values. Templates that can't render within 120 chars are dropped from v1.
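A sketch of such a unit test, assuming templates are plain `str.format` strings: render every template with realistic (long) placeholder values and report any that exceed 120 characters. The catalog excerpt and placeholder values are hypothetical; the second template is the row-4 two-call form suggested earlier in the thread.

```python
MAX_HINT_CHARS = 120

# Hypothetical catalog excerpt: template name -> str.format template.
TEMPLATES = {
    "find_empty_resolve": "empty: try resolve('{identifier}', hint_kind='client')",
    "row4_overriders": (
        "clients in overriders: neighbors(['{rid}'],'in',['OVERRIDES']) "
        "then neighbors(overrider_ids,'out',['DECLARES_CLIENT'])"
    ),
}

# Hypothetical realistic placeholder values: long, fully qualified ids rather
# than toy ids like "x", so the check reflects real rendered lengths.
REALISTIC_VALUES = {
    "identifier": "com.example.billing.InvoiceClient",
    "rid": "com.example.billing.InvoiceService#recalculate",
}


def templates_over_cap(templates: dict[str, str], values: dict[str, str]) -> list[str]:
    """Names of templates whose rendered string (placeholders substituted) exceeds the cap."""
    return [
        name
        for name, template in templates.items()
        # str.format ignores keyword arguments a template does not use.
        if len(template.format(**values)) > MAX_HINT_CHARS
    ]
```

With these values only the row-4 template renders past the cap, which is the case the thread anticipated when it said such a row would be dropped from v1. In a real suite the assertion would be that the returned list is empty.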

State after revision

Ready for another pass.

Owner Author

@HumanBean17 HumanBean17 left a comment


Review (propose/HINTS-ROAD-SIGNS-PROPOSE.md)

Overall the direction is strong: the §1 “road sign” frame, trigger-vs-emission (§2.9), output-level hints, caps, static templates, and find→resolve alignment fit the strict frame and existing MCP V2 docs. A few catalog/contract gaps should be fixed in the propose (or explicitly scoped as ontology/API work) so implementation does not ship invalid neighbors calls.

Blocking / high priority

1. OVERRIDES in hint emissions vs current EdgeType

Templates that say neighbors(..., edge_types=['OVERRIDES']) conflict with mcp_v2.py today: OVERRIDES is documented alongside rollup keys as not a valid neighbors(edge_types=…) literal, and EdgeType does not list OVERRIDES. The “overriders” / OVERRIDDEN_BY.* rows need either (a) a multi-hop emission using only real relationship types that match how the rollup is defined, or (b) an explicit proposal to extend EdgeType + neighbors for OVERRIDES (ontology bump / implementation scope), not a hints-only add.

2. find “page full” and observable state

FindOutput currently has no echoed limit, so len(results) >= limit compares against request parameters that are not part of pure “payload-only” hint generation unless you extend the contract. Consider: thread (output, request_kwargs) into hint generation, or echo limit/offset on FindOutput, and align §3.2 wording with that.

3. Search score < 0.5

Hybrid/RRF scores may not be calibrated so 0.5 means “low confidence.” Either tie the threshold to documented ranking behavior, make it configurable, or use a structural signal; otherwise the hint risks being arbitrary across corpora/settings.

Medium / clarity

4. Kind gates for DECLARES.* rollups

The table already separates type vs method describe; worth requiring tests (or explicit pseudo-code) so method rows never fire type-only rollup triggers incorrectly.

5. §2.8 vs cap

“No coalescing” may duplicate the same next-call shape and burn the 5-hint budget; one sentence on whether duplicate normalized call-shapes dedupe under the cap would help implementers.

6. “Backwards-compatible”

Minor: repo policy allows breaking changes; “additive for clients that ignore new fields” may be enough for a design doc.

Answers to your review goals

  1. §1 frame — Good anchor; does not block a future structured next-step surface.

  2. §3.3 completeness — Fine for v1 once override-related emissions are fixed or scoped as API/ontology work. Optional later: HTTP_CALLS / ASYNC_CALLS, IMPLEMENTS/EXTENDS for types.

  3. §7.12 priority — Plausible; priority only matters once every emitted hint uses valid tool/edge literals.

  4. §5 out-of-scope — Prefetch and structured records can stay out; request context for pagination hints is the main tweak worth allowing in-scope without expanding to Shape 2/3.

@HumanBean17
Owner Author

Real catches on 1 and 2 — both expose contract gaps the consistency pass missed because it only checked propose-internal cross-references, not propose ↔ mcp_v2.py reality. Going through one by one.

Blocking

1. OVERRIDES in hint emissions vs current EdgeType — agree.

Confirmed: EdgeType literal in mcp_v2.py:21 is EXTENDS, IMPLEMENTS, INJECTS, DECLARES, DECLARES_CLIENT, CALLS, EXPOSES, HTTP_CALLS, ASYNC_CALLS. OVERRIDES and OVERRIDDEN_BY are documented as virtual rollup keys on edge_summary, explicitly not valid for neighbors(edge_types=…). The propose currently emits neighbors(...,'in',['OVERRIDES']) in §3.3 rows 1–2, UC2, UC14, and Appendix A — every one of those would be a strict-frame failure at call time.

This violates §2.9 (the principle this revision just added: triggers vs emissions, atomic EdgeType only in emissions). OVERRIDES is no more atomic-EdgeType than DECLARES.DECLARES_CLIENT is.
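A cheap guard against this class of bug is a lint over every rendered emission. The sketch below treats quoted upper-case tokens and `edge=` arguments as edge literals and checks them against the EdgeType members listed above. The token heuristic and the function name are assumptions, not existing code in mcp_v2.py.

```python
import re

# EdgeType members as listed above (mcp_v2.py:21).
EDGE_TYPES = frozenset({
    "EXTENDS", "IMPLEMENTS", "INJECTS", "DECLARES", "DECLARES_CLIENT",
    "CALLS", "EXPOSES", "HTTP_CALLS", "ASYNC_CALLS",
})

# Heuristic: an edge literal is an all-upper-case token (dots allowed, so
# dot-key rollups are caught too) that follows a quote, as in ['OVERRIDES'],
# or an edge= keyword, as in edge=OVERRIDDEN_BY, and ends at a delimiter.
_EDGE_TOKEN = re.compile(r"(?:'|edge=)([A-Z][A-Z0-9_.]*)(?=['\s,)\]])")


def invalid_edge_literals(emission: str) -> list[str]:
    """Edge-like tokens in a rendered emission that are not EdgeType members."""
    return [tok for tok in _EDGE_TOKEN.findall(emission) if tok not in EDGE_TYPES]
```

Run over every catalog row, this would have flagged each `['OVERRIDES']` emission before call time, and it keeps flagging them until PR-A adds the literal to the set.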

Two clean options:

  • (a) Scope the override emissions out of v1. Drop the four override-axis template rows; keep the triggers as “signal only, no emission” notes, deferred to whenever the ontology bump lands. Loses ~3 UC value but keeps v1 honest.
  • (b) Pre-require an OVERRIDES ontology bump as a dependency. Land a tiny prior PR adding "OVERRIDES" to EdgeType and the corresponding kuzu query path, bump ontology_version, then this propose stays as-is.

Recommendation: (b). The OVERRIDDEN_BY.in > 0 → “list overriders” redirect is one of the highest-value hints in the catalog (the whole point of the rollup in describe), and OVERRIDES is the natural inverse of OVERRIDDEN_BY rather than an ad-hoc addition. Adding it elevates a rollup-only relation to a first-class edge, which the override-axis rollups already imply exists in the graph.

Will add a §6 dependency note + a new locked decision: "v1 hints depend on OVERRIDES being added to EdgeType; ordered before the hints implementation PR." If you prefer (a), say the word and I'll cut those rows instead.

2. find page-full and observable state — agree.

§2 principle says hints are pure-payload-server-side; §3.3 row says len(results) >= limit; FindOutput doesn't echo limit. That's a contract contradiction the consistency pass missed. Two repairs available:

  • (a) Echo limit (and probably offset) on FindOutput. Smallest delta to the contract; trivially observable; aligns with how other tools could grow the same affordance. Slight bloat on every find response.
  • (b) Pass (output, request_kwargs) into the hint generator. Keeps FindOutput unchanged but breaks the "hints are computed from the output object alone" mental model and forces every call site to thread two arguments.

Recommendation: (a). The contract should make “you got a full page” a property of the output, not a synthesis between request and output. It also makes the same hint computable by a client agent that wants to verify, which is in spirit with the documentation-grade-not-programmatic-dispatch lock.

Will add: FindOutput.limit: int and FindOutput.offset: int echoes, with the same fields proposed (separately) for any future tool that grows pagination. New locked decision: "Pagination state is part of the response payload, not the call context."
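A minimal sketch of the echoed fields and a page-full trigger that reads only the output object. It uses a dataclass stand-in rather than the real Pydantic model, and it makes `limit`/`offset` optional so an error response can omit them (a choice the thread revisits later). The hint wording is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FindOutput:
    """Dataclass stand-in for the Pydantic FindOutput (only the fields used here)."""
    results: List[str] = field(default_factory=list)
    limit: Optional[int] = None  # echoed from the request
    offset: Optional[int] = None  # echoed from the request
    hints: List[str] = field(default_factory=list)


def page_full_hint(out: FindOutput) -> Optional[str]:
    """Page-full trigger computed from the output object alone (no request kwargs)."""
    if out.limit is None or out.offset is None:
        return None  # no echoed pagination state: emit nothing
    if len(out.results) >= out.limit:
        # Hypothetical wording; it names the concrete next offset.
        return f"page full: rerun this find with offset={out.offset + out.limit}"
    return None
```

Because the function only takes the output, a client agent can recompute the same hint from the payload it received, which is the property the recommendation is after.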

3. Search score < 0.5 — partially agree, weaker fix than 1/2.

Fair point that 0.5 is not a calibrated threshold across RRF / hybrid / pure-cosine modes. Three options:

  • (a) Tie to ranking config. Use search's known scoring path (current default is RRF per search.py) and pick a per-mode threshold documented in the search docs, not in this propose.
  • (b) Make it configurable. Add a search_hint_low_confidence_threshold knob; default to whatever calibration says.
  • (c) Use a structural signal. E.g., len(results) < 3 AND no result has score > p90_baseline — drop the absolute threshold entirely.

Recommendation: drop the threshold from v1 entirely and replace with a structural signal: "all returned hits have score within 10% of each other AND len(results) == limit" → emits "results look weak — try find(role=…) or narrow query". That's calibration-free, observable from payload alone, and matches the documentation-grade contract.

Will lock: "Low-confidence search hint is structural (score spread + page state), not threshold-on-score. v1 emits no hint when uncertain rather than emitting a miscalibrated one."

Medium

4. Kind gates for DECLARES.* rollups — agree, add test scenarios.

The §3.3 table already says "describe (type Symbol)" vs "describe (method Symbol)", but the propose doesn't pin this to a test. Will add to §6: "Kind gate: method-Symbol describe with type-only rollup keys present → no hint emitted (impossible state, but test as a regression bumper)."

5. §2.8 vs cap — agree, dedupe normalized call-shape.

This is a real semantic gap. Two rows can trigger the same emission shape (e.g., a class with DECLARES.DECLARES_CLIENT and also DECLARES.DECLARES_CLIENT from a different rollup path — contrived but possible after future ontology growth). Will add a sentence to §2.8: "Coalescing by trigger is forbidden; deduplication by normalized emission string after template rendering is required and runs before the 5-hint cap. The cap counts unique rendered strings."

Will lock as a new decision.

6. "Backwards-compatible" wording — disagree (minor).

Repo policy does allow breaking changes, but the hints field on output types isn't a breaking change in any meaningful sense — adding an optional field to a Pydantic output that defaults to [] doesn't break any current caller, agent or test. "Additive for clients that ignore new fields" is accurate as written and useful for the future case where someone resurrects a pre-hints client. I'd rather keep the phrasing and let the policy override apply only to actually-breaking changes.

If you want the wording trimmed for tone, happy to soften to "additive: existing callers see no behavioral change." But I don't want to delete the compatibility note entirely.

Review goals

  1. §1 frame — agreed, no change.
  2. §3.3 completeness — once 1 lands (override emissions resolved), the optional later additions (HTTP_CALLS, ASYNC_CALLS, IMPLEMENTS, EXTENDS rollups for types) make sense for a v2 hints expansion, not v1. Will add to §5 "out of scope, v2 candidate" rather than expand v1.
  3. §7.12 priority — yes, priority is downstream of valid emissions; that's exactly why I want to lock 1 and 2 first.
  4. §5 out-of-scope, request context for pagination hints — this is item 2(a) above; once limit/offset are echoed on FindOutput, pagination hint generation stays pure-payload and request context is still out of scope. Will reflect this in §5 as a clarifying line.

What I'll change in the next push

  • OVERRIDES ontology dependency (or row drop — pending your pick on 1a vs 1b)
  • FindOutput.limit/FindOutput.offset echo + locked decision
  • Search hint: structural signal, not score threshold
  • §6 test scenario for kind gates
  • §2.8 dedupe-by-rendered-string + locked decision
  • §5 lines for pagination request-context clarification + v2 hint catalog candidates
  • Keep "additive for clients that ignore new fields" wording

That's 5 changes + 1 keep. Confirm OVERRIDES direction (a vs b) and I'll push.

@HumanBean17 HumanBean17 force-pushed the propose/hints-road-signs branch from 3ea64e3 to 1e55cd5 May 15, 2026 20:29
@HumanBean17
Owner Author

Applied all 5 changes + the keep. Force-pushed as 1e55cd5.

What changed

1. OVERRIDES ontology bump (option b, your call).

  • Migration reshaped from 1 PR to 2 PRs:
    • PR-A (prereq): promote "OVERRIDES" to a first-class EdgeType in mcp_v2.py, extend the Kuzu query path, bump ontology_version, adjust the edge_summary description to drop the "not valid for neighbors()" carve-out for OVERRIDES only (keep it for OVERRIDDEN_BY and dot-key rollups).
    • PR-B: hints field + pagination echo + v1 template catalog.
  • §3.3 rows 3 and 4 keep their neighbors([id],'in',['OVERRIDES']) emissions but now annotate "requires PR-A; see §6". Appendix A rows annotated the same way.
  • New locked decision §7.17 captures the rationale: OVERRIDES is the natural inverse of OVERRIDDEN_BY, already implied by the rollup; elevating it preserves the highest-value hints in the catalog rather than scoping them out.
  • OVERRIDDEN_BY stays rollup-only (no plan to promote it; it's a count-of-incoming-overrides view, not a forward edge).

2. FindOutput.limit / FindOutput.offset echoes.

  • §3.1 extended with the FindOutput shape showing the new echoed fields.
  • The page-full hint trigger now reads output.limit and len(output.results) — pure-payload, request kwargs not threaded into the hint generator. New named test scenario in §6 (PR-B): "Pagination-echo scenario: FindOutput round-trips limit/offset verbatim; the page-full hint fires iff len(results) >= limit".
  • New locked decision §7.18: pagination state is part of the response payload, not the call context. Same pattern applies if search/neighbors grow pagination in their own future proposes; v1 echoes only on find.
  • §5 extended with two clarifying out-of-scope rows: request-context plumbing for pagination, and pagination echo on tools other than find.

3. Structural low-confidence search signal.

  • Replaced top score < 0.5 (uncalibrated, mode-dependent) with a structural trigger: len(results) == limit AND (max_score - min_score) < 0.1 * max_score. Calibration-free, observable from payload alone, robust across RRF / hybrid / pure-cosine ranking modes.
  • §3.3 row, UC10, and the Appendix A row all updated.
  • Hint text changed from "low-confidence match — try a more specific query" to "results look weak — narrow the query or try find(role=…)". The new wording is concretely actionable (suggests the find(role=…) redirect, which is the natural fallback when text search is weak).
  • New locked decision §7.19: when the signal is uncertain v1 emits no hint rather than a miscalibrated one.

4. Kind-gate test scenario.

  • Added to §6 PR-B test scenarios: "Kind-gate scenario: a method-Symbol describe payload synthesized with type-only rollup keys present (impossible but defensible state) emits no type-rollup hints — a regression bumper for the §3.3 kind separator". The §3.3 table already split type vs method; this pins it to a test.
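The kind gate can be pinned with a small predicate plus a regression check. This sketch assumes the describe payload exposes a symbol-kind string (`"type"` or `"method"`, hypothetical values) and that type rollup counts sit on the `out` half. It returns the names of type-rollup triggers that fire rather than rendered hint text.

```python
from typing import Dict, List

# Rollup keys that only make sense on a type Symbol (named in the thread).
TYPE_ONLY_ROLLUPS = ("DECLARES.DECLARES_CLIENT", "DECLARES.EXPOSES")


def fired_type_rollup_triggers(symbol_kind: str, edge_summary: Dict[str, Dict[str, int]]) -> List[str]:
    """Type-rollup triggers that fire for this describe payload.

    The kind gate runs before any trigger: a method Symbol never fires them,
    even if a malformed payload carries type-only rollup keys.
    """
    if symbol_kind != "type":
        return []
    return [
        key
        for key in TYPE_ONLY_ROLLUPS
        if edge_summary.get(key, {}).get("out", 0) > 0
    ]
```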

5. Dedupe-by-rendered-string, §2.8 rewritten.

  • §2.8 used to say "we do not coalesce" — that was misleading. The right rule: coalesce only on character-identical rendered output, after template substitution. Distinct triggers that render to different strings stay separate even if they recommend the same conceptual next step.
  • The cap now counts unique rendered strings.
  • §7.12 priority lock updated to reference the dedupe step explicitly: "Priority applies after dedupe-by-rendered-string".
  • Old decision §7.5 ("Drop, do not coalesce") tightened to reference §7.20 and §7.12, since "do not coalesce" was the prior contradicting wording.
  • New named test scenario in §6 PR-B: dedupe runs before the cap.
  • New locked decision §7.20: dedupe-by-rendered-string is required.
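The ordering described above (dedupe on exact rendered text, then priority, then the cap) fits in one function. This sketch assumes each candidate arrives as a `(priority, hint)` pair where a lower number means higher priority; that encoding is illustrative.

```python
from typing import Dict, List, Tuple

MAX_HINTS = 5


def finalize_hints(candidates: List[Tuple[int, str]]) -> List[str]:
    """Dedupe by exact rendered string, order by priority, then apply the cap.

    Each candidate is (priority, rendered_hint); a lower number ranks higher.
    """
    best: Dict[str, int] = {}
    for priority, hint in candidates:
        # Only character-identical strings coalesce; keep the best priority seen.
        if hint not in best or priority < best[hint]:
            best[hint] = priority
    # sorted() is stable, so equal priorities keep first-seen order.
    ordered = sorted(best, key=lambda hint: best[hint])
    return ordered[:MAX_HINTS]  # the cap counts unique rendered strings
```

Deduping first matters. If the cap ran first, a duplicated string could consume one of the five slots and the output would end up with fewer distinct hints than the budget allows.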

6 (keep). "Additive for clients that ignore new fields" wording preserved.

  • The compatibility note stays. Adding an optional Pydantic field that defaults to [] is behaviorally invisible.
  • New locked decision §7.21 makes the position explicit: the hints field is additive on the agent surface; the breaking-changes-allowed policy applies to PR-A's ontology bump (which IS breaking by intent), not to PR-B.
  • TL;DR line updated to make the split explicit: "PR-A bumps the ontology … PR-B adds the hints field … PR-B is additive on the agent-visible surface".

Consistency pass (post-revision)

  • 21 locked decisions (was 16; added §7.17 OVERRIDES bump, §7.18 pagination echo, §7.19 structural search trigger, §7.20 dedupe-by-rendered, §7.21 additive-not-breaking)
  • 9 design principles (unchanged; §2.8 rewritten in place)
  • 14 UCs (unchanged; UC10 wording updated to match the new structural trigger)
  • 2 PRs in TL;DR ↔ 2 PRs in §6 ↔ 2 PRs in the migration narrative — counts agree
  • No surviving "top score < 0.5" or "low-confidence match" outside the §B12 changelog reference
  • §7.5 reconciled with §2.8 / §7.20 (the prior "do not coalesce" wording was contradictory once dedupe-by-rendered was added)

Ready for another pass when you have time.

Copy link
Copy Markdown
Owner Author

@HumanBean17 HumanBean17 left a comment


Second review (updated propose + replies)

The revision fixes most of the first-pass contract gaps: the PR-A/PR-B split, FindOutput pagination echo + pure-payload hint generation, dedupe-then-cap (§2.8 / §7.20), relative score spread for search low-confidence (§7.19), and the §6 test matrix (kind-gate, cap, dedupe, char-cap) all read implementable.

A few doc issues remain—worth fixing before coding so implementers are not forced to guess.

1. Search hint vs “pagination echo only on find” (contradiction)

§5 explicitly defers pagination echo for search (and neighbors) to future proposes; §7.19, §3.3 / Appendix A, §6 search scenario, and UC10 all use len(results) == limit on SearchOutput. Today SearchOutput does not echo limit, and the propose does not add it.

Pick one arm and align §5 / §7.19 / catalog / tests: e.g. echo limit/offset on SearchOutput in PR-B (and narrow the §5 carve-out), or drop limit from the trigger (e.g. score-band + minimum len(results)), or (less ideal given §5) allow request-context for this single trigger.

2. OVERRIDDEN_BY triggers use the wrong edge_summary half

In current rollups (override_axis_rollup_for), composed override keys are emitted as {"in": 0, "out": n}. §3.3 rows that fire on edge_summary["OVERRIDDEN_BY"].in > 0 and ...OVERRIDDEN_BY.DECLARES_CLIENT...in > 0 would never fire. These should use .out > 0 (and UC2 / Appendix A should match).
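In code, the fix is only a matter of which half of the rollup entry the trigger reads. The sketch assumes edge_summary is a plain dict of `{"in": ..., "out": ...}` entries, as described above for override_axis_rollup_for. The function name is hypothetical.

```python
from typing import Dict

# Entry shape per the comment above: override-axis keys are written as {"in": 0, "out": n}.
EdgeSummary = Dict[str, Dict[str, int]]


def override_axis_fires(edge_summary: EdgeSummary, key: str) -> bool:
    """Trigger for an override-axis hint: the count lives on the "out" half.

    An "in" > 0 check would never be true for these keys, since "in" is always 0.
    """
    return edge_summary.get(key, {}).get("out", 0) > 0
```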

3. PR-A scope: materialize edges, not only EdgeType

OVERRIDES / override-axis behavior is virtual Cypher today; there is no stored [:OVERRIDES] rel to traverse. PR-A should spell out graph builder + schema work (materialize edges, or an explicitly scoped neighbors special-case—different design). The §6 equivalence test (neighbors vs rollup prediction) is the right acceptance criterion once that storage story is stated.

4. §5 “no cross-tool hints” vs §7.16

The §5 table claims no v1 cross-tool hints, but find→resolve is cross-tool and is locked in §7.16. Narrow that §5 row to “no additional cross-tool templates beyond the locked find empty → resolve row.”

5. Minor: FindOutput errors and the route/client template

  • If limit / offset are required fields with no default, define success=False behavior (defaults or optionals) so builders stay simple.
  • The route/client row uses 'EXPOSES' or 'DECLARES_CLIENT' as human-readable text; add a line that the real catalog entries are concrete emissions after substituting {inverse_edge} (two templates or one parameterized string).
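One way to satisfy the second bullet with a single parameterized string is to map the node kind to a concrete inverse edge before rendering, so the emitted hint names exactly one EdgeType. The template text, the `'in'` direction, and the kind-to-edge mapping are assumptions for illustration.

```python
# Hypothetical kind -> inverse-edge mapping for the route/client describe row.
INVERSE_EDGE = {"route": "EXPOSES", "client": "DECLARES_CLIENT"}

# One parameterized template; after substitution the hint names exactly one
# concrete EdgeType rather than the "'EXPOSES' or 'DECLARES_CLIENT'" prose.
OWNER_TEMPLATE = "owner: neighbors(['{rid}'],'in',['{inverse_edge}'])"


def render_owner_hint(kind: str, rid: str) -> str:
    """Render the route/client hint for a node of the given kind ("route" or "client")."""
    return OWNER_TEMPLATE.format(rid=rid, inverse_edge=INVERSE_EDGE[kind])
```

Two fully literal templates would work equally well. The parameterized form keeps a single row in the catalog at the cost of one lookup table.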

Overall: ship the direction; resolve search limit vs §5, fix .in → .out, and expand PR-A to the full graph story so PR-B hints are not blocked by missing rels.

@HumanBean17 HumanBean17 force-pushed the propose/hints-road-signs branch from 1e55cd5 to b69dbd8 May 15, 2026 20:40
@HumanBean17
Owner Author

All 5 caught — agree on every one. Verified against source:

  • kuzu_queries.override_axis_rollup_for writes {"in": 0, "out": n} for all four override-axis keys (lines 675–694). So .in > 0 would indeed have been dead code.
  • OVERRIDES is virtual Cypher in that same function — no stored [:OVERRIDES] rel exists; the rollup is computed from IMPLEMENTS|EXTENDS + signature match. neighbors(..., ['OVERRIDES']) over the existing graph would have returned empty.

Force-pushed as b69dbd8.

What changed

1. Search hint vs §5 carve-out — echo limit/offset on SearchOutput in PR-B.

  • §3.1 now shows both FindOutput and SearchOutput shapes with limit: int | None + offset: int | None echoes.
  • §5 carve-out row renamed to "Pagination echo on neighbors" — search is no longer carved out at v1, only neighbors is (it doesn't have a hint trigger that needs pagination yet).
  • §3.3 search row annotated "Requires SearchOutput.limit echo per §3.1 / §7.18". Appendix A row annotated the same way.
  • §7.18 rewritten to cover both FindOutput and SearchOutput.
  • New §6 PR-B test scenario: pagination echo round-trips on both find and search; structural search hint with limit == None emits nothing.
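A minimal sketch of how the echoed fields let the search hint fire from output state alone. `SearchOutput` here is a hypothetical dataclass stand-in for the real output model, and the rendered hint string is a placeholder (the query argument is elided as `...`), not the §3.3 catalog text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchOutput:
    """Hypothetical stand-in for the MCP V2 SearchOutput model."""

    results: list[str] = field(default_factory=list)
    # Pagination echo: None on error paths or when no limit was requested.
    limit: Optional[int] = None
    offset: Optional[int] = None
    hints: list[str] = field(default_factory=list)


def next_page_hint(out: SearchOutput) -> Optional[str]:
    """Emit a next-page hint using only fields present on the output."""
    if out.limit is None or out.limit <= 0:
        return None  # limit absent: treat as "do not fire"
    if len(out.results) != out.limit:
        return None  # short page: nothing further to page through
    next_offset = (out.offset or 0) + out.limit
    # Placeholder template; the real wording is defined by the catalog.
    return f"search(..., limit={out.limit}, offset={next_offset})"


print(next_page_hint(SearchOutput(results=["a", "b"], limit=2, offset=0)))
# search(..., limit=2, offset=2)
print(next_page_hint(SearchOutput(results=["a"])))  # None: limit is None
```

Keeping `limit` and `offset` optional with a `None` default is what lets error-path builders construct the output without inventing pagination values.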

2. .in → .out direction fix on override-axis triggers.

  • §3.3 rows 3 and 4: edge_summary["OVERRIDDEN_BY"].in > 0 → .out > 0, same for OVERRIDDEN_BY.DECLARES_CLIENT.
  • UC2 row updated to reference OVERRIDDEN_BY.DECLARES_CLIENT.out > 0.
  • Appendix A entries annotated: # requires PR-A; rollup stores counts on .out per override_axis_rollup_for.
  • New §B19 changelog entry captures the catch.

3. PR-A scope expanded to full graph work — schema + builder + traversal.

  • §6 PR-A rewritten as three-part work:
    • Schema: extend Kuzu schema with OVERRIDES between method Symbols.
    • Builder: write (mover)-[:OVERRIDES]->(m) during build, mirroring the rollup logic verbatim — for each method m on type t, find implementing types impl, find each impl's declared method mover with matching signature, write the edge.
    • Query path: extend neighbors to traverse the stored edge.
    • Then the ontology bump and EdgeType admission as before.
  • Equivalence test scenario sharpened: the rollup is computed from the declaring side, the stored edge is read from the implementer side; the test exercises both directions and the two sets must match.
  • Added schema-roundtrip and builder-symmetry test scenarios.
  • §6 includes an explicit "if materialization cost is too high we'd consider a neighbors special-case dispatching to virtual Cypher — that's a different design and a different propose. This one commits to the stored-edge path."
  • §7.17 rewritten to reflect the three-part scope, not just the literal admission.

4. §5 row "no cross-tool hints" reconciled with §7.16.

  • Old row contradicted the locked find empty → resolve template. New row: "Additional cross-tool hint templates beyond the locked find empty → resolve row (§7.16)" — narrowing to "no additional cross-tool templates", which is what we actually want.

5. Minor — error-path limit/offset defaults + route/client template split.

  • limit / offset typed int | None with default None so error-path builders stay simple. Hint triggers treat None as absent (do not fire). New error-path pagination test scenario in §6 PR-B.
  • Route/client describe template split into two concrete rows (one per kind, each with the concrete inverse edge). Old row's 'EXPOSES' or 'DECLARES_CLIENT' was a human-readable placeholder that wouldn't have rendered to a valid call shape — that violated §2.7 (no alias magic). Now mechanical: kind == route → neighbors(..., ['EXPOSES']); kind == client → neighbors(..., ['DECLARES_CLIENT']). UC6 also split into UC6a/UC6b to match.
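The split can be sketched as a plain lookup from kind to one concrete inverse edge, so no rendered hint ever contains an "A or B" alias. The helper name, the argument order, and the 'in' direction are assumptions for illustration; the thread elides those arguments as `...`.

```python
from typing import Optional

# One concrete inverse edge per kind, replacing the old
# "'EXPOSES' or 'DECLARES_CLIENT'" placeholder row.
INVERSE_EDGE_BY_KIND = {
    "route": "EXPOSES",
    "client": "DECLARES_CLIENT",
}

# Assumption: direction is not stated in the thread.
ASSUMED_DIRECTION = "in"


def describe_inverse_hint(symbol_id: str, kind: str) -> Optional[str]:
    """Hypothetical builder for the two route/client describe rows."""
    edge = INVERSE_EDGE_BY_KIND.get(kind)
    if edge is None:
        return None  # other kinds have no row in this part of the catalog
    return f"neighbors('{symbol_id}', '{ASSUMED_DIRECTION}', ['{edge}'])"


print(describe_inverse_hint("r1", "route"))   # neighbors('r1', 'in', ['EXPOSES'])
print(describe_inverse_hint("c1", "client"))  # neighbors('c1', 'in', ['DECLARES_CLIENT'])
```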

Drift caught during consistency pass

  • §3.1 had two SearchOutput blocks (one in the new pagination-echo block, one in the original hints-schema-description block) after the edit. Replaced the second block with a plain hints-description quote so the model definitions appear exactly once.
  • UC count adjusted to 15 (UC6 split into UC6a + UC6b). Count claim lines above and below the UC table both updated.

Consistency pass post-revision

  • 21 locked decisions (unchanged count; §7.17 + §7.18 substantially rewritten)
  • 9 design principles (unchanged)
  • 15 UCs (was 14; UC6 → UC6a + UC6b to match the route/client split in §3.3)
  • 2 PRs (PR-A now three-part graph work + ontology admission; PR-B unchanged in shape)
  • No surviving .in > 0 on override-axis triggers outside the §B19 changelog reference
  • No surviving 'EXPOSES' or 'DECLARES_CLIENT' alias-style template outside the §B23 changelog reference
  • §5 cross-tool row reconciled with §7.16
  • TL;DR ↔ §3.1 ↔ §6 ↔ §7.17 / §7.18 all reference the same PR-A three-part scope and the same FindOutput + SearchOutput pagination story

Ready for another pass.


@HumanBean17 HumanBean17 left a comment


Third review (post third revision)

Read the full updated propose/HINTS-ROAD-SIGNS-PROPOSE.md on propose/hints-road-signs plus the thread reply. The third pass closes the earlier contradictions; the doc is largely implementation-ready.

Strengths in this revision

  • Search vs §5: SearchOutput.limit / offset echo, §7.18 / §7.19 alignment, and the limit is None → no hint scenario remove the internal inconsistency.
  • Override-axis triggers: .out > 0 matches override_axis_rollup_for ({"in": 0, "out": n}).
  • PR-A: Stored OVERRIDES, schema, traversal, ontology bump, and explicit rejection of virtual neighbors without a new propose.
  • Route vs client: Two concrete rows + UC6a/UC6b fix the call-shape issue; each row now renders one valid, single-shot call.
  • Cross-tool §5 row: Narrowed to “no additional cross-tool templates” — consistent with §7.16.

Remaining gaps / nits

1. PR-A builder: override_axis_rollup_for has two query shapes — §6 mostly narrates one

override_axis_rollup_for runs down (supertype method → implementing methods → OVERRIDDEN_BY) and up (concrete method → parent declaration methods → OVERRIDES). §6 describes (mover)-[:OVERRIDES]->(m) for m on t, which matches the down pattern when m is the supertype method.

The up query adds parent decl_m ids for a concrete m. For stored edges to satisfy the §6 equivalence story (“rollup vs neighbors in both directions”), materialization must cover both virtual patterns — or one explicit unified rule (e.g. whenever method A overrides method B in the rollup sense, emit (A)-[:OVERRIDES]->(B) so neighbors(B,'in',['OVERRIDES']) and neighbors(A,'out',['OVERRIDES']) line up with the right id sets).

As written, a reader might implement only the interface/t + impl loop and miss the concrete-m + parent-decl_m half.

2. Builder module placeholder

§6 cites graph_enrich.py (or wherever). In this repo shape, rel creation usually lives under build_ast_graph.py (with graph_enrich.py for enrichment). Pointing PR-A implementers at the actual graph write pass avoids starting in the wrong module.

3. §5 “Request-context plumbing” row

It still says “Once FindOutput echoes …” while §7.18 now covers both FindOutput and SearchOutput. One sentence update avoids confusion.

4. Appendix B consistency

An older changelog bullet still says the UC count is 14 while the doc now uses 15 UCs (UC6 split). Worth de-staling that historical line.

5. Optional: post-filtered search and “full page”

search_v2 can return fewer results than the requested limit after post-processing filters. len(results) == limit is fine if defined as “full page of returned hits after filters.” A single clarifying sentence in §7.19 or §6 removes ambiguity for implementers.

Verdict

Good to land as the design authority for PR-A/PR-B. The substantive follow-up is making PR-A’s materialization story explicitly cover both halves of override_axis_rollup_for (or the unified directed-edge rule), so equivalence tests and future hints do not half-implement the virtual rollup.

@HumanBean17 HumanBean17 force-pushed the propose/hints-road-signs branch from b69dbd8 to 84fc2a0 May 15, 2026 20:47
@HumanBean17
Owner Author

Items 1, 2, 3, 5 applied. Pushing back on item 4 (it's a historical changelog entry — see below). Force-pushed as 84fc2a0.

Applied

1. PR-A builder — unified directed-edge rule covering both rollup arms.

Real catch. The old §6 wording ((mover)-[:OVERRIDES]->(m) for m on t) only narrated the down arm of override_axis_rollup_for. The up arm (concrete m → parent decl_m) was implicit; a reader could have implemented half of the rollup and the equivalence test would still half-pass.

Rewrote §6 PR-A builder section as a single rule: (A)-[:OVERRIDES]->(B) whenever subtype-method A overrides supertype-method B (signature match). One walk, one edge per pair, both arms covered:

  • Down: m on supertype t → neighbors(m, 'in', ['OVERRIDES']) returns the impl_ids set (the movers).
  • Up: m on subtype → neighbors(m, 'out', ['OVERRIDES']) returns the decl_ids set (parent declarations).

Added builder pseudo-code: walk every method A, walk its declaring type's transitive IMPLEMENTS/EXTENDS ancestors, for each ancestor's method B with matching signature write (A)-[:OVERRIDES]->(B).
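An in-memory Python sketch of that walk, for readers who want to see both arms fall out of one edge set. It is not the build_ast_graph.py code: the dict-based `methods` / `supertypes` inputs are assumptions, signature matching is simplified to exact key equality, and `neighbors` here is a local stand-in for the MCP tool.

```python
def ancestors(type_id, supertypes):
    """Transitive IMPLEMENTS/EXTENDS ancestors of type_id, excluding type_id itself."""
    seen = set()
    stack = list(supertypes.get(type_id, ()))
    while stack:
        t = stack.pop()
        if t in seen:
            continue  # also guards against cyclic hierarchies
        seen.add(t)
        stack.extend(supertypes.get(t, ()))
    seen.discard(type_id)
    return seen


def build_overrides(methods, supertypes):
    """Unified rule: emit (A, B), meaning (A)-[:OVERRIDES]->(B), whenever
    subtype-method A and ancestor-method B share a signature.

    methods:    type_id -> {signature: method_id}
    supertypes: type_id -> direct IMPLEMENTS/EXTENDS targets
    """
    edges = set()  # a set gives one edge per (A, B) pair
    for type_id, by_sig in methods.items():
        for anc in ancestors(type_id, supertypes):
            anc_methods = methods.get(anc, {})
            for sig, a in by_sig.items():
                b = anc_methods.get(sig)
                if b is not None:
                    edges.add((a, b))
    return edges


def neighbors(edges, node, direction):
    """Local stand-in for neighbors(node, direction, ['OVERRIDES'])."""
    if direction == "out":
        return {b for a, b in edges if a == node}
    return {a for a, b in edges if b == node}


# Interface Repo <- abstract BaseRepo <- concrete SqlRepo, all declaring save(Order).
methods = {
    "Repo": {"save(Order)": "Repo.save"},
    "BaseRepo": {"save(Order)": "BaseRepo.save"},
    "SqlRepo": {"save(Order)": "SqlRepo.save"},
}
supertypes = {"BaseRepo": ["Repo"], "SqlRepo": ["BaseRepo"]}
edges = build_overrides(methods, supertypes)

print(sorted(neighbors(edges, "Repo.save", "in")))      # down: ['BaseRepo.save', 'SqlRepo.save']
print(sorted(neighbors(edges, "SqlRepo.save", "out")))  # up: ['BaseRepo.save', 'Repo.save']
```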

Equivalence test scenario rewritten to explicitly exercise both halves: (down) neighbors(supertype_method, 'in', ['OVERRIDES']) == impl_ids, (up) neighbors(subtype_method, 'out', ['OVERRIDES']) == decl_ids. Both must match; if either comes up short, the materialization is missing pairs.
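A sketch of that two-arm check as a plain function over a stored edge set and the rollup's reported id sets. The input shapes and the function name are hypothetical; the fixture simulates a builder that missed an EXTENDS parent, which both arms catch.

```python
def equivalence_gaps(edges, down_rollup, up_rollup):
    """Compare stored OVERRIDES pairs against the virtual rollup, both arms.

    edges:       set of (A, B) pairs, meaning (A)-[:OVERRIDES]->(B)
    down_rollup: supertype_method -> impl_ids reported by the rollup
    up_rollup:   subtype_method -> decl_ids reported by the rollup
    Returns (arm, method, expected, actual) for every mismatch.
    """
    gaps = []
    for m, impl_ids in down_rollup.items():
        actual = {a for a, b in edges if b == m}  # neighbors(m, 'in', ['OVERRIDES'])
        if actual != set(impl_ids):
            gaps.append(("down", m, set(impl_ids), actual))
    for m, decl_ids in up_rollup.items():
        actual = {b for a, b in edges if a == m}  # neighbors(m, 'out', ['OVERRIDES'])
        if actual != set(decl_ids):
            gaps.append(("up", m, set(decl_ids), actual))
    return gaps


# Materialization that only wrote the interface pair and missed the Base parent.
stored = {("Impl.run", "Iface.run")}
down = {"Iface.run": {"Impl.run"}, "Base.run": {"Impl.run"}}
up = {"Impl.run": {"Iface.run", "Base.run"}}

for gap in equivalence_gaps(stored, down, up):
    print(gap[0], gap[1])  # "down Base.run", then "up Impl.run"
```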

2. Builder module pinned to build_ast_graph.py.

Verified by grep — build_ast_graph.py has the CREATE REL TABLE definitions (lines 2176–2208) and the per-edge CREATE (a)-[:EDGE]->(b) writes. graph_enrich.py has no CREATE REL or relationship-write statements; it's enrichment-only.

§6 schema and builder bullets both now pin to build_ast_graph.py explicitly, with an inline note that graph_enrich.py does not create relationships. The old "graph_enrich.py (or wherever)" phrasing is gone.

3. §5 "Request-context plumbing" row updated to mention both FindOutput and SearchOutput.

Stale text from before §7.18 expanded to cover both outputs. One-sentence fix.

5. §7.19 page-full clarification for search.

Implementer-relevant clarification. search_v2 may post-filter; the trigger needs to be defined against what the agent actually sees. §7.19 now says: len(results) == limit means "the returned result list, after any post-processing filters, fills the requested page." The trigger reads output.results, not a pre-filter count. This is the right level for a propose — it's a definitional decision about what the hint observes, not implementation detail.
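A small sketch of that definition. `run_search` is a hypothetical pipeline that takes the requested page and then post-filters it; whether search_v2 over-fetches to refill a filtered page is not stated in the thread. The point is only that `page_is_full` looks at the returned list, never at the pre-filter count.

```python
def run_search(raw_hits, keep, limit):
    """Hypothetical pipeline: take the requested page, then post-filter it."""
    page = raw_hits[:limit]
    return [hit for hit in page if keep(hit)]


def page_is_full(results, limit):
    """§7.19: judged on the returned, post-filter list, not a pre-filter count."""
    return limit is not None and limit > 0 and len(results) == limit


raw = ["a", "b", "c", "d"]
results = run_search(raw, keep=lambda h: h != "b", limit=3)
print(results)                   # ['a', 'c']
print(page_is_full(results, 3))  # False: 3 hits fetched, only 2 returned
```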

Pushed back on

4. Appendix B §B5 "UC count is now 14" — keeping the historical bullet.

The Appendix B changelog records what was true at each revision. §B5 was added in the first re-grilling pass when UC15 was moved to a test scenario, dropping the count from 15 to 14. UC6 splitting into UC6a + UC6b (which brought the count back up to 15) happened in the third pass and is recorded in §B23.

Editing §B5 to retroactively read "UC count is 15" would muddy traceability — a reader following the timeline would see "UC15 dropped, count is now 14" → §B23 "UC6 split, count is 15", which is the actual history. Overwriting §B5 to claim "count is 15" makes §B23's contribution invisible.

The doc's current state (the parts a reviewer evaluates the design from — TL;DR, §3.3, §4, §6, §7) all consistently say 15. Only the layered changelog records the in-between counts, which is what it's for.

If you'd prefer a different traceability convention here (e.g., changelog entries always reflect final state and we lose the pass-by-pass diff), happy to switch — but I'd want that as a separate decision rather than retconning this one bullet.

Consistency pass

  • 21 locked decisions (unchanged; §7.19 wording sharpened)
  • 9 design principles (unchanged)
  • 15 UCs (unchanged; current-state)
  • 2 PRs; PR-A's builder section now states the unified rule and pins the module
  • 10 changelog entries in Appendix B (4 new from this pass: §B25 unified-edge rule, §B26 builder module, §B27 §5 row update, §B28 §7.19 post-filter clarification)
  • Both graph_enrich.py mentions are now explicitly negative references ("does not create relationships") rather than implementation pointers
  • No surviving stale text on the items applied

Ready for landing if you're satisfied with the push-back on 4, or happy to revisit if you'd rather have a flat (current-state-only) changelog convention going forward.

@HumanBean17 HumanBean17 merged commit 2569af4 into master May 15, 2026
1 check passed
@HumanBean17 HumanBean17 deleted the propose/hints-road-signs branch May 23, 2026 16:21