Skip to content

fix(retrieval): always return top hit + skip-and-continue on token budget overflow (B-MCP-8)#33

Merged
h2devx merged 1 commit intodevelopfrom
fix/b-mcp-8-recall-empty-hits
May 2, 2026
Merged

fix(retrieval): always return top hit + skip-and-continue on token budget overflow (B-MCP-8)#33
h2devx merged 1 commit intodevelopfrom
fix/b-mcp-8-recall-empty-hits

Conversation

@h2devx
Copy link
Copy Markdown
Contributor

@h2devx h2devx commented May 2, 2026

Summary

Closes #31. Fixes B-MCP-8 (medium, discovered in Phase-14 smoke against the dogfood DB): mem.recall returned total_candidates>0 but hits=0 whenever the top-ranked candidate alone exceeded max_tokens.

The old loop in RecallMemoryUseCase.rankAndSlice did if (runningTokens + tokens > max) break from the first iteration, leaving out=[]. The dogfood DB tripped this because learning rows store full content un-truncated (decisions and entities truncate at 600 chars; learnings and turns do not), so a single long learning could cost more tokens than the 4000-token default budget.

Why it surfaced now (beta.4) and not before (beta.3)

Pre-B-MCP-7 (beta.3), the embedding worker never drained the queue, so vector hits were always empty and recall fell back to BM25-only. BM25 prioritized entries with exact lexical matches — typically short titles. With B-MCP-7 fixed in beta.4, the worker now populates embeddings end-to-end, and the hybrid scorer began ranking long-form learnings higher (because their semantic vectors matched). Those long learnings then triggered the budget-overflow break.

Fix

Three changes:

  1. Always include the top-ranked hit, even if it solo exceeds the budget. Returning zero hits when there ARE candidates surprises callers and degrades the semantic-recall promise; one slightly-oversized result is strictly more useful than no result.
  2. Use continue (not break) on overflow for subsequent hits, so a mid-loop oversized candidate doesn't suppress smaller hits behind it in the ranking.
  3. Bump RecallMemoryFacadeAdapter.DEFAULT_MAX_TOKENS from 4000 → 8000 for consistency with GetContextFacadeAdapter (recall returns full ranked entries with previews, so a tighter budget than the bundle made no sense).

Tests (VALUES not SHAPE — Phase-9 rule)

The pre-existing unit test ("trims the tail when cumulative token cost would exceed maxTokens") used expect(getEntries().length).toBeLessThanOrEqual(1) — that loose assertion silently passed even at length=0, exactly the bug. It's now toBe(1) plus an exact id assertion.

  • Tighten "trims the tail" + "respects the filters.limit slice" to assert exact lengths.
  • New unit test: top hit is always returned when it solo exceeds budget.
  • New unit test: a mid-ranking oversized hit is skipped and smaller hits behind it still surface (continue-vs-break semantics).
  • New integration test: reproduces the dogfood scenario by recording a 1.8 KiB learning and asserting mem.recall("GitFlow", max_tokens: 50) returns hits.length >= 1.
  • New integration test: the new 8000-token default lets multiple hits surface for a literal "hexagonal" query against the seeded corpus.

5+1 EXIT=0

Check Result
npm run typecheck EXIT=0
npm run lint EXIT=0
npm run lint:tests EXIT=0
npm run validate:modules PASS — no module violations (retrieval×46 + curator×10 cross-imports as ADR-001)
npm run build EXIT=0
npm test 2557 passing in 211 files (+4 vs 2553 baseline)

Test plan

  • Pre-fix unit test reproducing the bug (always includes the top-ranked hit even when it solo exceeds maxTokens (B-MCP-8)) — fails on develop, passes on this branch.
  • Pre-fix integration test reproducing the dogfood scenario (returns the top hit even when its preview alone exceeds maxTokens (B-MCP-8)) — fails on develop, passes on this branch.
  • All 2557 existing tests still pass (no regressions).
  • CI green (typecheck + lint + lint:tests + validate:modules + build + test:coverage + SonarQube quality gate strict).
  • Post-merge: cut release/0.1.2-beta.5 and validate fix end-to-end against the real dogfood DB.

🤖 Generated with Claude Code

…dget overflow (B-MCP-8)

Closes #31. `mem.recall` returned `total_candidates>0` but `hits=0`
whenever the top-ranked candidate alone exceeded `max_tokens`. The
old loop in `RecallMemoryUseCase.rankAndSlice` did `if (runningTokens
+ tokens > max) break` from the first iteration, leaving `out=[]`.
The dogfood DB tripped this because learning rows store the full
content un-truncated (decisions and entities truncate at 600 chars;
learnings and turns do not), so a single long learning could cost
more tokens than the 4000-token default budget.

Fix:
1. Always include the top-ranked hit, even if it solo exceeds the
   budget. Returning zero hits when there ARE candidates surprises
   callers and degrades the semantic-recall promise; one slightly
   oversized result is strictly more useful than no result.
2. Use `continue` (not `break`) on overflow for subsequent hits, so a
   mid-loop oversized candidate doesn't suppress smaller hits behind
   it in the ranking.
3. Bump `RecallMemoryFacadeAdapter.DEFAULT_MAX_TOKENS` from 4000 to
   8000 for consistency with `GetContextFacadeAdapter` (recall returns
   full ranked entries with previews, so a tighter budget than the
   bundle made no sense).

Tests (VALUES not SHAPE — Phase-9 rule):
- Tighten the existing "trims the tail" unit test from
  `toBeLessThanOrEqual(1)` to `toBe(1)` plus exact id assertion. The
  loose assertion silently passed at length=0, exactly the bug.
- Tighten "respects the filters.limit slice" the same way.
- New unit test: top hit is always returned when it solo exceeds
  budget.
- New unit test: a mid-ranking oversized hit is skipped and smaller
  hits behind it still surface (continue-vs-break semantics).
- New integration test: reproduces the dogfood scenario by recording
  a 1.8 KiB learning and asserting `mem.recall("GitFlow",
  max_tokens: 50)` returns hits.length >= 1.
- New integration test: the new 8000-token default lets multiple
  hits surface for a literal "hexagonal" query against the seeded
  corpus.

5+1 EXIT=0: typecheck, lint, lint:tests, validate:modules, build,
test (2557 passing in 211 files, +4 vs baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@h2devx h2devx merged commit ee74d36 into develop May 2, 2026
1 check passed
@h2devx h2devx deleted the fix/b-mcp-8-recall-empty-hits branch May 2, 2026 19:41
h2devx added a commit that referenced this pull request May 2, 2026
…>0) (#34)

## Summary

Cuts `v0.1.2-beta.5` from `develop` after [PR
#33](#33) closed
[B-MCP-8](#31): `mem.recall`
returned `total_candidates>0` but `hits=0` whenever the top-ranked
candidate alone exceeded `max_tokens`. The dogfood DB tripped this
because learning rows store the full content un-truncated and the hybrid
scorer (now active end-to-end thanks to B-MCP-7) ranks long learnings
highly.

This release ships **only the B-MCP-8 fix + version/docs sync**. No new
features.

| Item | Value |
|---|---|
| Issues closed | [#31
B-MCP-8](#31) (medium) |
| PRs merged into develop since beta.4 |
[#33](#33) |
| Tests | **2557 passing** in 211 files (+4 vs 2553 at beta.4) |
| Coverage on new code | 100% (PR #33) |
| Coverage overall | 96.4% |

## Highlights of the fix (PR #33)

1. **`RecallMemoryUseCase.rankAndSlice` always includes the top-ranked
hit** — even if it solo exceeds `max_tokens`. Returning zero hits when
there ARE candidates surprises callers and degrades the semantic-recall
promise; one slightly-oversized result is strictly more useful than no
result.
2. **`continue` (not `break`) on overflow** for subsequent hits — a
mid-loop oversized candidate doesn't suppress smaller hits behind it in
the ranking.
3. **`RecallMemoryFacadeAdapter.DEFAULT_MAX_TOKENS` bumped 4000 → 8000**
for consistency with `GetContextFacadeAdapter`.

## Tests (VALUES not SHAPE — Phase-9 rule)

The pre-existing unit test ("trims the tail when cumulative token cost
would exceed maxTokens") used
`expect(getEntries().length).toBeLessThanOrEqual(1)` — that loose
assertion silently passed even at length=0, exactly the bug. Now
`toBe(1)` plus an exact id assertion.

- 2 pre-existing assertions tightened from `toBeLessThanOrEqual(N)` →
`toBe(N)`.
- 2 new unit tests in `recall-memory.use-case.test.ts` (top-hit always
returned; mid-ranking overflow uses continue).
- 2 new integration tests in `D-mem-recall.test.ts` (B-MCP-8 dogfood
reproducer + new 8000-token default).

## Files in this release branch

| Capa | Archivos |
|---|---|
| Code (already in develop via PR #33) |
`code/src/modules/retrieval/application/use-cases/recall-memory.use-case.ts`;
`code/src/composition/facades/mcp-server-facades.ts`; 2 test files |
| Release tooling (this PR) | `code/package.json` +
`code/sonar-project.properties` (bump 0.1.2-beta.4 → 0.1.2-beta.5) |
| Release notes (this PR) | `docs/RELEASE-NOTES-v0.1.2-beta.5.md` (NEW)
|
| Public docs (this PR) | `README.md`, `code/README.md`, `SECURITY.md`
(banner / install / version table) |
| HANDOFF (this PR) | `HANDOFF.md` (§0 + new §6.20 Phase-15 + footer) |

## Validation

| Check | Result |
|---|---|
| `npm run typecheck` | EXIT=0 |
| `npm run lint` (max-warnings 0) | EXIT=0 |
| `npm run lint:tests` (max-warnings 0) | EXIT=0 |
| `npm run validate:modules` | PASS — no module violations |
| `npm run build` (tsup) | EXIT=0 |
| `npm test` | **2557 passing** in 211 files |
| SonarQube quality gate | already PASSED on PR #33 (coverage new 100%,
ratings A) |

## Test plan

- [x] 5+1 EXIT=0 over the release branch.
- [x] PR #33 merged into develop with CI green + SonarQube quality gate
PASSED.
- [ ] CI green on this PR.
- [ ] Squash-merge to main.
- [ ] Tag `v0.1.2-beta.5` annotated to merge commit + push (with `git
switch --detach v0.1.2-beta.5` because of the protected-branch hook).
- [ ] `gh release create v0.1.2-beta.5 --prerelease --notes-file
docs/RELEASE-NOTES-v0.1.2-beta.5.md --target main`.
- [ ] `cd code && npm publish --tag beta --auth-type=web` (WebAuthn
passkey).
- [ ] Smoke against the dogfood DB: `mem.recall` with paraphrased
queries should return `hits.length >= 1` even when the top-ranked hit is
a long learning.
- [ ] Merge-back develop ← main via `chore/sync-develop-after-beta-5`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
h2devx added a commit that referenced this pull request May 2, 2026
…ated B-MCP-8 fix end-to-end (#36)

## Summary

Standard phase-close docs PR (pattern:
[#25](#25),
[#28](#28),
[#32](#32)). Updates HANDOFF.md
§0 + §6.20 + footer to reflect Phase-15 fully closed end-to-end.

## What Phase-15 delivered (all already shipped)

- [PR #33](#33) — B-MCP-8 fix
(always-include-top-hit + continue-not-break + default max_tokens
4000→8000).
- [PR #34](#34) — release
v0.1.2-beta.5 to main with conflict resolution (--ours).
- Tag `v0.1.2-beta.5` → `4a281f0` + GitHub pre-release.
- `npm publish --tag beta` (user, WebAuthn passkey).
- Smoke against dogfood DB: **2/2 PASS, 0 FAIL** —
`mem.recall("GitFlow")` returns `hits=2` (was 0 in beta.4);
`mem.recall("embedding worker async")` returns `hits=1` (was 0).
- [PR #35](#35) — merge-back
develop ← main with conflict resolution (--theirs).

## What this PR adds

Only HANDOFF.md changes (33 insertions, 30 deletions):

- §0: 10 rows updated to reflect post-publish reality (Fecha, Fase
actual, Paquete npm, Estado del release, Memoria propia, Proximo paso,
etc.).
- §6.20: sub-fases 5-9 marked complete with concrete outcomes (commit
SHAs, conflict resolution patterns, smoke results); new lecciones
durables (#5 smoke script reusability, #6 wire output shape parsing, #7
conflict resolution symmetry); estado del repo post-Phase-15 with actual
SHAs and dist-tags; Siguiente accion concreta refocused.
- Footer "Ultima actualizacion" updated to reflect end-to-end closure.

## Caveat tracked for next phase

`serverInfo.version` reported by the JSON-RPC handshake reads
`0.1.2-beta.3` even though the installed binary is beta.5. Confirmed in
smoke. Cosmetic only — needs `grep -rn "0.1.2-beta" code/src` to locate
hardcoded value before promoting to `release/0.1.2` stable.

## Test plan

- [x] No code changes — pure docs PR (HANDOFF.md only).
- [x] Hooks pre-commit no-op (no `code/src/` changes → typecheck not
triggered).
- [ ] CI green on this PR (typecheck + lint + lint:tests +
validate:modules + build + test:coverage + Sonar all pass over unchanged
source).
- [ ] Squash-merge to develop.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
h2devx added a commit that referenced this pull request May 3, 2026
## Summary

**First stable release of `@netzi/recall`.** Channel promotion from
`0.1.2-beta.6` (commit `f3aca46`) to the `latest` dist-tag. **No code
changes** — same binary bit-for-bit modulo the version string.

Decision: skip the soak 24-48h post-beta.6 (justified by fresh smoke
10/10 PASS + 0 issues + cycle of 7 betas that already closed 8 bugs
vinculated to real use). The cycle's bug discovery rate has been "1 bug
per beta surfaced by dogfood" — staying on beta.6 longer would not
surface new information.

| Item | Value |
|---|---|
| Tag | `v0.1.2` (no suffix) |
| Channel | `latest` (was `beta` for the entire `0.1.2-beta.*` cycle) |
| Tests | **2560 passing** in 212 files (no change vs beta.6) |
| Coverage SonarQube | overall 96.4%, ratings A/A/A |
| Issues at release | **0** open |

## What 0.1.2 stable consolidates

The `0.1.2-beta.*` cycle ran from 2026-04-28 (beta.0) to 2026-05-03
(beta.6). Each release surfaced exactly one bug from real-world dogfood
that the previous release exposed:

| Beta | Date | Bugs closed | PR |
|---|---|---|---|
| `0.1.2-beta.0` | 2026-04-28 | preventive cut after Phase-9 dogfood
discovery | n/a |
| `0.1.2-beta.3` | 2026-05-01 | **B-MCP-2** + **B-MCP-3** + **B-MCP-4**
+ **B-MCP-5** | [#17](#17),
[#18](#18),
[#19](#19),
[#20](#20) |
| `0.1.2-beta.4` | 2026-05-02 | **B-MCP-7** |
[#27](#27) |
| `0.1.2-beta.5` | 2026-05-02 | **B-MCP-8** |
[#33](#33) |
| `0.1.2-beta.6` | 2026-05-03 | **carryover** `serverInfo.version` |
[#37](#37) |
| **`0.1.2` (this)** | 2026-05-03 | channel promotion + npm deprecation
of 0.1.0/0.1.1 | this PR |

Plus B-MCP-1 closed in v0.1.1 (Phase-8 same-day patch). Total: 8 bugs
closed end-to-end via dogfood + smoke loop.

## Files in this release branch

| Capa | Archivos |
|---|---|
| Code | (cero cambios — channel promotion sin cambios de logica) |
| Release tooling | `code/package.json` +
`code/sonar-project.properties` (bump 0.1.2-beta.6 → 0.1.2) |
| Release notes | `docs/RELEASE-NOTES-v0.1.2.md` (NEW, ~250 LOC
consolidating the full cycle + migration guide) |
| Public docs | `README.md`, `code/README.md`, `SECURITY.md` (banner
"stable", install command without @beta, version table updated) |
| HANDOFF | `HANDOFF.md` (§0 + new §6.21 Phase-16 + footer) |

## Validation

| Check | Result |
|---|---|
| `npm run typecheck` | EXIT=0 |
| `npm run lint` (max-warnings 0) | EXIT=0 |
| `npm run lint:tests` (max-warnings 0) | EXIT=0 |
| `npm run validate:modules` | PASS — no module violations |
| `npm run build` (tsup) | EXIT=0 |
| `npm test` | **2560 passing** in 212 files |

## Test plan

- [x] 5+1 EXIT=0 over the release branch.
- [x] PR #38 (release v0.1.2-beta.6) merged to main.
- [x] Fresh smoke 10/10 PASS against npx-installed beta.6 in fresh
workspace (this is the same binary).
- [ ] CI green on this PR.
- [ ] Squash-merge to main.
- [ ] Tag `v0.1.2` annotated to merge commit + push (`git switch
--detach v0.1.2` because of protected-branch hook).
- [ ] `gh release create v0.1.2 --target main --notes-file
docs/RELEASE-NOTES-v0.1.2.md` — **NO `--prerelease`**.
- [ ] `cd code && npm publish --auth-type=web` — **NO `--tag beta`** →
publishes directly to `latest`.
- [ ] `npm deprecate @netzi/recall@0.1.0 "deprecated due to bugs
B-MCP-1..8 (closed in 0.1.2). Use @netzi/recall@latest"`.
- [ ] `npm deprecate @netzi/recall@0.1.1 "deprecated due to bugs
B-MCP-2..8 (closed in 0.1.2). Use @netzi/recall@latest"`.
- [ ] `npm view @netzi/recall dist-tags` should return `{ latest:
'0.1.2', beta: '0.1.2-beta.6' }`.
- [ ] `npx --yes @netzi/recall@latest --help` should execute 0.1.2 (not
0.1.1).
- [ ] Merge-back develop ← main via `chore/sync-develop-after-0.1.2`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[B-MCP-8] mem.recall returns total_candidates>0 but hits=0 after beta.4 smoke

1 participant