fix(retrieval): wire AsyncEmbeddingWorker into mcp-server bootstrap (B-MCP-3) #17

Merged
h2devx merged 1 commit into develop from feature/b-mcp-3-wire-embedding-worker
May 1, 2026

@h2devx h2devx commented May 1, 2026

What changes

Wire the AsyncEmbeddingWorker into the mcp-server bootstrap so the
embedding_queue actually drains in production. The changes are
purely wiring plus one regression test; neither the worker code nor
the use-case code was touched.

Why

Fixes #2: B-MCP-3 (critical). The worker existed and was tested at
100%, but no bootstrap instantiated it. The queue grew without
bound and mem.recall silently fell back to BM25-only
("fallback_reason: embedder_unavailable"), breaking the product's
core promise. The Phase-9 dogfood left 64 rows in embedding_queue
with attempts=0, which is empirical evidence of the bug.

Type of change

  • fix: bug fix
  • test: adds tests

Checklist

  • npm run typecheck EXIT=0
  • npm run lint and npm run lint:tests EXIT=0
  • npm run validate:modules EXIT=0 (zero ADR-001 violations)
  • npm run build EXIT=0
  • npm run test EXIT=0: 2504 tests passing in 206 files (was 2501 in 205)
  • Zero any, zero as any, zero // @ts-ignore
  • New tests cover the change
  • N/A: MCP wire/protocol does not change
  • N/A: does not introduce an ADR
  • HANDOFF.md is updated at the close of the v0.1.2-beta.1 phase, not in this PR

E2E tests that validate VALUES

  • The new tests assert real values (not just shape)

The test tests/integration/L-embedding-worker-drains.test.ts follows
the methodology codified in Phase-9:

  1. Known initial state: embedding_queue=0, embedding_metadata=0.
  2. Insert 1 decision + 1 learning + 1 entity → the queue grows to 3
    rows (target_kind in [decision, entity, learning]).
  3. Start the worker and poll until the queue drops to 0.
  4. Assert values: queue=0, metadata=3 with dimension=384 and
    non-empty embedded_text, stub embedder invoked >=3 times.
  5. stop() is idempotent.

A second case validates that the worker survives a transient embedder
failure (failNext=true) without losing the queue row. This guards
against the "fail-and-forget" anti-pattern, which would reintroduce a
variant of B-MCP-3.
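
The methodology above can be sketched with in-memory stand-ins. This is a minimal illustration, not the project's test: QueueRow, MetadataRow, StubEmbedder, SketchWorker and runDrainScenario are hypothetical names, the queue and metadata tables are plain arrays instead of SQLite, and the retry policy (leave the row in place and bump attempts) is an assumption about what "without losing the queue row" implies.

```typescript
// Hypothetical stand-ins: none of these types are the project's real API.
type TargetKind = "decision" | "learning" | "entity";
interface QueueRow { id: number; targetKind: TargetKind; text: string; attempts: number }
interface MetadataRow { targetId: number; dimension: number; embeddedText: string }

class StubEmbedder {
  calls = 0;
  failNext = false;
  async embed(text: string): Promise<number[]> {
    this.calls += 1;
    if (this.failNext) {
      this.failNext = false; // fail exactly once, then recover
      throw new Error("transient embedder failure");
    }
    return new Array<number>(384).fill(text.length);
  }
}

class SketchWorker {
  private timer: ReturnType<typeof setInterval> | undefined;
  private inFlight: Promise<void> = Promise.resolve();
  private readonly queue: QueueRow[];
  private readonly metadata: MetadataRow[];
  private readonly embedder: StubEmbedder;

  constructor(queue: QueueRow[], metadata: MetadataRow[], embedder: StubEmbedder) {
    this.queue = queue;
    this.metadata = metadata;
    this.embedder = embedder;
  }

  start(intervalMs = 5): void {
    if (this.timer !== undefined) return;
    // Chain ticks so only one drain step runs at a time.
    this.timer = setInterval(() => {
      this.inFlight = this.inFlight.then(() => this.drainOnce());
    }, intervalMs);
  }

  async stop(): Promise<void> {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    await this.inFlight; // let any in-flight step settle; safe to call twice
  }

  private async drainOnce(): Promise<void> {
    const row = this.queue[0];
    if (row === undefined) return;
    try {
      const vector = await this.embedder.embed(row.text);
      this.metadata.push({ targetId: row.id, dimension: vector.length, embeddedText: row.text });
      this.queue.shift(); // dequeue only after the embedding was stored
    } catch {
      row.attempts += 1; // keep the row so the next tick retries it
    }
  }
}

async function runDrainScenario(failFirst: boolean) {
  const queue: QueueRow[] = [];
  const metadata: MetadataRow[] = [];
  const embedder = new StubEmbedder();
  embedder.failNext = failFirst;
  const kinds: TargetKind[] = ["decision", "learning", "entity"];
  kinds.forEach((kind, i) => queue.push({ id: i + 1, targetKind: kind, text: `sample ${kind}`, attempts: 0 }));
  const enqueued = queue.length;

  const worker = new SketchWorker(queue, metadata, embedder);
  worker.start();
  const deadline = Date.now() + 2000;
  while (queue.length > 0 && Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
  }
  await worker.stop();
  await worker.stop(); // second call must be a no-op
  return { enqueued, queueLength: queue.length, metadata, embedderCalls: embedder.calls };
}
```

Asserting on the returned values (queue length, metadata count, dimension, call count) rather than on the shape of the result is the point of the methodology: a worker that is never started would still produce a well-shaped but empty result.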

Notes for the reviewer

Wiring decisions:

  • AsyncEmbeddingWorker is constructed in buildRetrievalWiring, not
    in the bootstrap entrypoint. The reason: the wiring helper is where
    the retrieval use cases are assembled; the entrypoint only controls
    the process lifecycle. This leaves RetrievalWiring.embeddingWorker
    available to tests and to any future driver (long-running CLI,
    daemon mode).
  • Container now exposes workspaceId publicly. Previously it lived
    as a local in buildContainer. This is useful for the MCP server
    bootstrap (not exercised in this PR, but available for
    upcoming drivers).
  • On the SIGINT/SIGTERM path, worker.stop() is awaited before
    shutdown() so that an in-flight drain is not left adrift
    when the SQLite handle is closed.
  • cli-entrypoint.ts is untouched. The HANDOFF mentions it "for
    long-running commands", but today the CLI commands are one-shots.
    That is scope creep; I am leaving it for a separate iteration if
    the need arises.
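
The ownership split and shutdown ordering described in this list can be sketched as follows. This is an illustration under assumptions, not the contents of mcp-server-entrypoint.ts: EmbeddingWorkerLike, RetrievalWiringLike, buildWiringSketch and runEntrypointSketch are hypothetical names, the worker is a stub that only records calls, and signal registration is passed in as a callback so the sketch runs without real process signals (in Node that callback would typically wrap process.once("SIGINT", ...) and process.once("SIGTERM", ...)).

```typescript
// Hypothetical shapes for illustration; the real RetrievalWiring and worker
// interfaces are not shown in this PR description.
interface EmbeddingWorkerLike {
  start(): void;
  stop(): Promise<void>;
}

interface RetrievalWiringLike {
  embeddingWorker: EmbeddingWorkerLike;
}

// Stand-in for the wiring helper: it constructs the worker but never starts it.
function buildWiringSketch(workspaceId: string, log: string[]): RetrievalWiringLike {
  let running = false;
  return {
    embeddingWorker: {
      start(): void {
        running = true;
        log.push(`worker.start(${workspaceId})`);
      },
      async stop(): Promise<void> {
        if (!running) return; // idempotent: a second stop() does nothing
        running = false;
        // Simulate waiting for an in-flight drain to settle.
        await new Promise<void>((resolve) => setTimeout(resolve, 10));
        log.push("worker.stop()");
      },
    },
  };
}

// Stand-in for the entrypoint: it owns start/stop and the shutdown order.
async function runEntrypointSketch(
  log: string[],
  serve: () => Promise<void>,
  onSignal: (handler: () => void) => void,
): Promise<void> {
  const wiring = buildWiringSketch("ws-demo", log);
  const shutdown = async (): Promise<void> => {
    log.push("shutdown()"); // e.g. closing the database handle
  };

  let closing: Promise<void> | undefined;
  const close = (): Promise<void> => {
    if (closing === undefined) {
      // Stop the worker first, then release shared resources.
      closing = wiring.embeddingWorker.stop().then(shutdown);
    }
    return closing;
  };

  onSignal(() => {
    void close();
  });
  wiring.embeddingWorker.start();
  try {
    await serve();
  } finally {
    await close(); // the normal-exit path uses the same ordering
  }
}
```

Memoizing the close promise means a signal that arrives while the normal-exit path is already closing does not run shutdown() twice.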

What this PR does NOT validate:

  • That the real path with FastembedEmbedder works end-to-end. That
    requires an E2E test with a real model download from GCS, which is
    possible but costly (30+s on a cold cache). The integration test
    with a stub validates the contract; the real-fastembed run is left
    for an additional release validation.

…B-MCP-3)

The worker class was implemented and unit-tested at 100% but no
production code path instantiated it. The embedding_queue grew
unbounded and mem.recall silently fell back to BM25-only — breaking
the semantic-recall guarantee of the product (issue #2).

Changes:
- buildRetrievalWiring now constructs the worker bound to the workspace
  id and exposes it on RetrievalWiring. Lifecycle (start/stop) is owned
  by the bootstrap entrypoint, not the wiring helper.
- Container exposes workspaceId on its public surface so bootstrap
  drivers can reuse the canonical id without re-resolving config.json.
- mcp-server-entrypoint.ts starts the worker after bootstrapComposition
  and awaits worker.stop() before shutdown() in both the SIGINT/SIGTERM
  path and the normal-exit finally block.

Regression test (tests/integration/L-embedding-worker-drains.test.ts)
validates VALUES not SHAPE — the methodology Phase-9 codified after
B-MCP-1/2/3 escaped the MVP suite. The test asserts: starting state
queue=0, three records enqueue 3 rows, worker drains to 0 and
embedding_metadata grows by 3 with the expected dimension. A second
case asserts the worker survives a transient embed failure without
dropping the queue row.

5/5 EXIT=0:
- typecheck, lint, lint:tests, validate:modules, build, test
- 2504 tests passing in 206 files (was 2501 in 205)

Closes #2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@h2devx h2devx merged commit 229e7cd into develop May 1, 2026
1 check passed
@h2devx h2devx deleted the feature/b-mcp-3-wire-embedding-worker branch May 1, 2026 20:34
@h2devx h2devx mentioned this pull request May 1, 2026
10 tasks
h2devx added a commit that referenced this pull request May 2, 2026
## What changes

Cuts `v0.1.2-beta.3`, consolidating the 4 Phase-11 fixes that close
the 4 dogfood bugs reported in Phase-9 (B-MCP-2/3/4/5).

## Why

`@netzi/recall@0.1.2-beta.0` (on the npm beta channel) had 4 open
issues that broke the product's core promise (semantic
recall, mem.health diagnostics, decision content storage). Phase-11
closed all of them via PRs squash-merged to `develop`. This release
branch consolidates the version bumps, release notes, and HANDOFF
update needed to promote beta.3 to `main` and publish to npm.

## Type of change

- [x] chore: release (no code change beyond version bumps)

## Changes included (commits already on develop, not part of this PR)

| Issue | Tag | PR | Commit on develop |
|---|---|---|---|
| [#2](#2) | B-MCP-3 critical | [#17](#17) | `229e7cd` |
| [#1](#1) | B-MCP-2 high | [#18](#18) | `05b6731` |
| [#4](#4) | B-MCP-5 low | [#19](#19) | `c4a2d1d` |
| [#3](#3) | B-MCP-4 critical (data loss) | [#20](#20) | `52fbfd9` |

## Changes in this PR

- `code/package.json`: `0.1.2-beta.0` → `0.1.2-beta.3`
- `code/src/bootstrap/composition-root.ts`: default `serverInfo.version`
updated
- `code/sonar-project.properties`: `projectVersion` updated
- `docs/RELEASE-NOTES-v0.1.2-beta.3.md`: NEW, documenting the 4 fixes,
migration safety, engineering metrics, outstanding caveats, and the path
to v0.1.2 stable
- `HANDOFF.md`: §0 updated to the post-Phase-11 state; new §6.16 with
the full chronology, 10 orchestrator decisions (D-1101..D-1110),
and 6 durable findings

## Checklist

- [x] `npm run typecheck` EXIT=0
- [x] `npm run lint` and `npm run lint:tests` EXIT=0
- [x] `npm run validate:modules` EXIT=0
- [x] `npm run build` EXIT=0
- [x] `npm run test` EXIT=0: 2519 tests passing in 208 files
- [x] Zero `any`, zero `as any`, zero `// @ts-ignore`
- [x] HANDOFF.md updated (§0 + new §6.16)
- [x] Release notes consolidated in
`docs/RELEASE-NOTES-v0.1.2-beta.3.md`

## E2E tests that validate VALUES

- [x] N/A: release branch with no new code changes. The 3
value-validation test files (L/M/N) are already on develop via PRs #17/#18/#20.

## Post-merge plan

```bash
git checkout main && git pull
git tag -a v0.1.2-beta.3 -m "v0.1.2-beta.3"
git push origin v0.1.2-beta.3
gh release create v0.1.2-beta.3 --prerelease --notes-file docs/RELEASE-NOTES-v0.1.2-beta.3.md

# The user runs the publish:
cd code && npm publish --tag beta --auth-type=web

# Merge back to develop:
git checkout develop && git merge --no-ff main && git push origin develop

# Real dogfood:
npm install -g @netzi/recall@beta && recall mcp serve  # or the equivalent command
```

If the post-publish dogfood surfaces new defects → cut `beta.4`.
If it passes cleanly → cut `release/0.1.2` (stable), promote `latest`
from 0.1.1 to 0.1.2, and hard-deprecate 0.1.1.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
h2devx added a commit that referenced this pull request May 2, 2026
#23)

## What changes

Hotfix that updates the public docs (root README.md, code/README.md
shipped in the npm tarball, SECURITY.md, CONTRIBUTING.md) to reflect
that the 4 bugs B-MCP-2..5 were closed in `v0.1.2-beta.3`.

## Why

PR #21 closed the v0.1.2-beta.3 release but only updated:
- version bumps (package.json, composition-root, sonar)
- HANDOFF.md (§0 + §6.16)
- docs/RELEASE-NOTES-v0.1.2-beta.3.md (new)

**We forgot the public-facing docs**. When the user took the
code to publish, the root README.md and code/README.md (shipped in the
npm tarball as the package README) still said:
- "active work on 0.1.2-beta.x"
- "4 issues open today (B-MCP-2..5)"
- "npm install -g @netzi/recall" (defaults to latest=0.1.1, which is deprecated)

Without this hotfix, the package page on npmjs.com would show
incorrect information about the state of the product right after
the publish.

## Type of change

- [x] docs: public documentation only
- [x] hotfix: fix on main (we missed this in the release; without it
      the npm publish ships with a correct package.json but a stale
      README)

## Updates

- **README.md** (root): banner reflects "v0.1.2-beta.3 closes the 4
  bugs"; quick start notes the beta channel; "Issues" reads "0 open
  issues at the close of Phase-11 (closed via PRs #17/#18/#19/#20)".
- **code/README.md** (shipped in the npm tarball as the package README):
  install command recommends `@beta` with a note explaining why; CTA
  points to beta.3 instead of the deprecated latest=0.1.1.
- **SECURITY.md**: supported-versions table includes
  `0.1.2-beta.3` (active), marks `0.1.2-beta.0` as superseded, and
  notes that latest=0.1.1 stays deprecated until 0.1.2 stable is
  promoted.
- **CONTRIBUTING.md**: "Issues y bugs" reads "0 open issues at the
  close of Phase-11" with pointers to §6.16 + release notes.

## Historical artifacts intentionally left untouched

- `docs/RELEASE-NOTES-v0.1.2-beta.0.md`: snapshot of an earlier
  release; release notes are immutable by convention.
- `CONTRIBUTING.md` line 139 commit-format example mentioning
  beta.0: illustrative, not a truth claim.

## Checklist

- [x] `npm run typecheck` EXIT=0
- [x] `npm run lint` and `npm run lint:tests` EXIT=0
- [x] `npm run validate:modules` EXIT=0
- [x] `npm run build` EXIT=0
- [x] `npm run test` EXIT=0 (no source files touched, the previous suite
run remains valid)
- [ ] N/A: MCP wire/protocol does not change
- [ ] N/A: does not introduce an ADR

## Post-merge plan

Per the CONTRIBUTING.md hotfix flow: after merging to main, merge
back to develop (separate PR). Additionally, **update the
v0.1.2-beta.3 tag** so that it points to the commit that includes the
doc fixes before the npm publish:

```bash
git checkout main && git pull
git tag -d v0.1.2-beta.3
git push origin :refs/tags/v0.1.2-beta.3
git tag -a v0.1.2-beta.3 -m "v0.1.2-beta.3"
git push origin v0.1.2-beta.3
gh release edit v0.1.2-beta.3 --target main # point the release at the new SHA
cd code && npm publish --tag beta --auth-type=web
```

(The v0.1.2-beta.3 tag currently points to `a826ef0`, without the
doc fixes. Re-tagging is safe because the GitHub release for
beta.3 has no downloads yet and the npm publish has not completed.)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
h2devx added a commit that referenced this pull request May 3, 2026
## Summary

**First stable release of `@netzi/recall`.** Channel promotion from
`0.1.2-beta.6` (commit `f3aca46`) to the `latest` dist-tag. **No code
changes** — same binary bit-for-bit modulo the version string.

Decision: skip the 24-48h soak after beta.6 (justified by a fresh smoke
run at 10/10 PASS, 0 open issues, and a cycle of 7 betas that already
closed 8 bugs tied to real-world use). The cycle's bug discovery rate
has been "1 bug per beta surfaced by dogfood", so staying on beta.6
longer would not surface new information.

| Item | Value |
|---|---|
| Tag | `v0.1.2` (no suffix) |
| Channel | `latest` (was `beta` for the entire `0.1.2-beta.*` cycle) |
| Tests | **2560 passing** in 212 files (no change vs beta.6) |
| Coverage SonarQube | overall 96.4%, ratings A/A/A |
| Issues at release | **0** open |

## What 0.1.2 stable consolidates

The `0.1.2-beta.*` cycle ran from 2026-04-28 (beta.0) to 2026-05-03
(beta.6). Each release surfaced exactly one bug from real-world dogfood
that the previous release exposed:

| Beta | Date | Bugs closed | PR |
|---|---|---|---|
| `0.1.2-beta.0` | 2026-04-28 | preventive cut after Phase-9 dogfood discovery | n/a |
| `0.1.2-beta.3` | 2026-05-01 | **B-MCP-2** + **B-MCP-3** + **B-MCP-4** + **B-MCP-5** | [#17](#17), [#18](#18), [#19](#19), [#20](#20) |
| `0.1.2-beta.4` | 2026-05-02 | **B-MCP-7** | [#27](#27) |
| `0.1.2-beta.5` | 2026-05-02 | **B-MCP-8** | [#33](#33) |
| `0.1.2-beta.6` | 2026-05-03 | **carryover** `serverInfo.version` | [#37](#37) |
| **`0.1.2` (this)** | 2026-05-03 | channel promotion + npm deprecation of 0.1.0/0.1.1 | this PR |

Plus B-MCP-1 closed in v0.1.1 (Phase-8 same-day patch). Total: 8 bugs
closed end-to-end via dogfood + smoke loop.

## Files in this release branch

| Layer | Files |
|---|---|
| Code | (zero changes: channel promotion with no logic changes) |
| Release tooling | `code/package.json` + `code/sonar-project.properties` (bump 0.1.2-beta.6 → 0.1.2) |
| Release notes | `docs/RELEASE-NOTES-v0.1.2.md` (NEW, ~250 LOC consolidating the full cycle + migration guide) |
| Public docs | `README.md`, `code/README.md`, `SECURITY.md` (banner "stable", install command without @beta, version table updated) |
| HANDOFF | `HANDOFF.md` (§0 + new §6.21 Phase-16 + footer) |

## Validation

| Check | Result |
|---|---|
| `npm run typecheck` | EXIT=0 |
| `npm run lint` (max-warnings 0) | EXIT=0 |
| `npm run lint:tests` (max-warnings 0) | EXIT=0 |
| `npm run validate:modules` | PASS — no module violations |
| `npm run build` (tsup) | EXIT=0 |
| `npm test` | **2560 passing** in 212 files |

## Test plan

- [x] 5+1 EXIT=0 over the release branch.
- [x] PR #38 (release v0.1.2-beta.6) merged to main.
- [x] Fresh smoke 10/10 PASS against npx-installed beta.6 in fresh
workspace (this is the same binary).
- [ ] CI green on this PR.
- [ ] Squash-merge to main.
- [ ] Tag `v0.1.2` annotated to merge commit + push (`git switch
--detach v0.1.2` because of protected-branch hook).
- [ ] `gh release create v0.1.2 --target main --notes-file
docs/RELEASE-NOTES-v0.1.2.md` — **NO `--prerelease`**.
- [ ] `cd code && npm publish --auth-type=web` — **NO `--tag beta`** →
publishes directly to `latest`.
- [ ] `npm deprecate @netzi/recall@0.1.0 "deprecated due to bugs
B-MCP-1..8 (closed in 0.1.2). Use @netzi/recall@latest"`.
- [ ] `npm deprecate @netzi/recall@0.1.1 "deprecated due to bugs
B-MCP-2..8 (closed in 0.1.2). Use @netzi/recall@latest"`.
- [ ] `npm view @netzi/recall dist-tags` should return `{ latest:
'0.1.2', beta: '0.1.2-beta.6' }`.
- [ ] `npx --yes @netzi/recall@latest --help` should execute 0.1.2 (not
0.1.1).
- [ ] Merge-back develop ← main via `chore/sync-develop-after-0.1.2`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

[B-MCP-3] AsyncEmbeddingWorker never instantiated in production — embedding queue grows unbounded
