PR - M10 — Knowledge Base (RAG) #14

Merged
ByteStreams-AI merged 2 commits into main from feat/m10-rag on May 4, 2026

ByteStreams-AI commented May 4, 2026

feat(rag): implement rag knowledge base

Closes #13

Greptile Summary

This PR ships the complete M10 RAG Knowledge Base pipeline: two SECURITY DEFINER SQL RPCs (knowledge_replace_chunks, knowledge_lookup), three Edge Functions (knowledge_reindex, vapi_lookup_knowledge, knowledge_preview), shared chunking/embedding/threshold modules, the admin /settings/knowledge CRUD page, seed data, and 60 new tests. The implementation is well-structured, with per-candidate threshold filtering, a deterministic content-aware mock embedding for CI, and cross-tenant scoping enforced in SQL.

Confidence Score: 5/5

Safe to merge — all findings are P2 style/UX suggestions; no logic errors or security issues found

No P0 or P1 issues found. The single new inline comment is a UX polish suggestion (silent poll timeout). Previously-flagged P2s (silent empty list on query error, category_hint not forwarded to SQL, topSimilarity fallback category mismatch) don't affect correctness on the happy path. Tests are comprehensive and the security model is consistent with the established codebase pattern.

Files worth a second look: apps/admin/src/pages/settings/knowledge-page.tsx (poll timeout UX), supabase/functions/vapi_lookup_knowledge/index.ts (p_category forwarding, previously flagged)
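
The silent poll timeout mentioned above (the admin page stops polling after 60 seconds even if indexing is still running) could be surfaced by giving the timeout its own outcome. The following is a minimal sketch under assumptions, not the page's actual code: `classifyPoll`, `PollOutcome`, and the message text are hypothetical; only the `ready` status and the 60-second cap come from this PR.

```typescript
// Hypothetical outcome type for one polling tick of the reindex status.
type PollOutcome =
  | { kind: 'ready' }
  | { kind: 'continue' }
  | { kind: 'timed_out'; message: string };

// The 60-second cap quoted in the review.
const POLL_CAP_MS = 60000;

// Decide what the UI should do after reading the latest index_status.
// Any status other than 'ready' is treated as "still working" in this sketch.
function classifyPoll(
  status: string,
  elapsedMs: number,
  capMs: number = POLL_CAP_MS,
): PollOutcome {
  if (status === 'ready') return { kind: 'ready' };
  if (elapsedMs >= capMs) {
    // Instead of exiting silently, return something the page can render.
    return {
      kind: 'timed_out',
      message: 'Indexing is still in progress. Refresh to check the latest status.',
    };
  }
  return { kind: 'continue' };
}
```

A caller would stop its timer on `ready` or `timed_out` and show `message` in the existing error banner in the second case.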

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/admin/src/pages/settings/knowledge-page.tsx | New admin Knowledge Base page: CRUD, fire-and-poll reindex, and Q&A preview; error handling is mostly solid, but the 60-second polling cap exits silently when indexing is still in progress |
| supabase/functions/knowledge_reindex/index.ts | JWT-gated reindex pipeline: chunks body, embeds via OpenAI (with mock fallback), atomically replaces chunks via RPC; error handling and auth checks are thorough |
| supabase/functions/vapi_lookup_knowledge/index.ts | RAG retrieval Edge Function: p_category is not forwarded to knowledge_lookup SQL even when category_hint is set (already flagged), so hinted-category chunks ranking outside the top-5 globally are invisible to the filter |
| packages/shared/src/voice/knowledge.ts | Per-category threshold policy and applyThresholds: solid per-candidate filtering; when categoryHint is set but no candidates match, topSimilarity falls back to the best score from a different category (already flagged) |
| packages/shared/src/voice/embedding.ts | OpenAI embeddings wrapper with deterministic sparse-activation mock: content-aware cosine signal for tests, OPENAI_MOCK_MODE=never production guard, correct L2-normalization |
| supabase/migrations/0012_m10_knowledge_rpcs.sql | Adds knowledge_replace_chunks (atomic delete+insert, flips status to ready) and knowledge_lookup (kNN cosine via pgvector, scoped to restaurant+active+ready); both correctly restricted to service_role |
| packages/shared/src/voice/chunk.ts | Paragraph-first / sentence-fallback chunker with overlap: approximate token counting, clean boundary logic, well-tested |
| packages/shared/src/voice/tools.ts | Adds lookup_knowledge tool schema with clear verbatim-question instruction; gated on knowledge_base_enabled in selectTools, ordered correctly before end_call |
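
To make the chunk.ts description concrete, here is a rough sketch of a paragraph-first chunker with sentence fallback and overlap. It is an illustration under stated assumptions, not the code in this PR: the 4-characters-per-token estimate, the default limits, and the names `approxTokens`, `splitUnits`, and `chunkText` are all hypothetical.

```typescript
// Approximate token count: roughly 4 characters per token (an assumption).
const approxTokens = (text: string): number => Math.ceil(text.length / 4);
const joinUnits = (units: string[]): string => units.join('\n\n');

// Paragraph-first: keep a paragraph whole when it fits the budget,
// otherwise fall back to splitting it into sentences.
function splitUnits(body: string, maxTokens: number): string[] {
  const units: string[] = [];
  for (const para of body.split(/\n\s*\n/)) {
    const p = para.trim();
    if (p.length === 0) continue;
    if (approxTokens(p) <= maxTokens) {
      units.push(p);
      continue;
    }
    const sentences = p.match(/[^.!?]+[.!?]*/g) ?? [p];
    for (const s of sentences) {
      const t = s.trim();
      if (t.length > 0) units.push(t);
    }
  }
  return units;
}

// Greedily pack units into chunks. When a chunk is flushed, its last
// `overlapUnits` units are carried into the next chunk if they still fit.
// A single unit larger than the budget is emitted as its own chunk.
function chunkText(body: string, maxTokens = 200, overlapUnits = 1): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const unit of splitUnits(body, maxTokens)) {
    if (current.length > 0 && approxTokens(joinUnits([...current, unit])) > maxTokens) {
      chunks.push(joinUnits(current));
      const carry = overlapUnits > 0 ? current.slice(-overlapUnits) : [];
      current = approxTokens(joinUnits([...carry, unit])) <= maxTokens ? carry : [];
    }
    current.push(unit);
  }
  if (current.length > 0) chunks.push(joinUnits(current));
  return chunks;
}
```

With a small budget, three short paragraphs come out as two chunks that share the middle paragraph, which is the overlap behaviour the overview refers to.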

Comments Outside Diff (3)

  1. packages/shared/src/voice/knowledge.ts, lines 1503-1517

    P2 topSimilarity can report a non-hint-category score when hint filters to empty

    When categoryHint is set but no candidates in the kNN result match that category, filtered is empty and topSimilarity falls back to candidates[0]?.similarity — the best score from a different category. The debug panel then shows, for example, top_similarity=0.43, threshold=0.55 (dietary), making it look as if a dietary chunk nearly cleared the threshold when no dietary chunk was even in the top-K. A more accurate fallback would be 0 (or omitted) when the hinted-category result set is empty.

    const topSimilarity = filtered.length > 0
      ? filtered[0]!.similarity
      : (categoryHint ? 0 : (candidates[0]?.similarity ?? 0));
  2. supabase/functions/vapi_lookup_knowledge/index.ts, lines 2870-2875

    P2 category_hint not pushed to SQL — hint-category chunks can be missed

    When args.category_hint is set, applyThresholds narrows the kNN result to only candidates in the hinted category. But the SQL knowledge_lookup is called without p_category, so it fetches the top RAW_FETCH_K (5) rows across all categories. If those 5 rows are dominated by non-hint categories, relevant chunks in the hinted category that rank below position 5 globally will never be seen by the filter. The SQL function already accepts p_category precisely for this case — passing it here guarantees the top-K is drawn from the right category pool.

    const { data: rawData, error: rawErr } = await client.rpc('knowledge_lookup', {
      p_restaurant_id: state.restaurant_id,
      p_query_embedding: embeddingStr,
      p_category: args.category_hint ?? undefined,
      p_limit: RAW_FETCH_K,
    });

    The same pattern applies to knowledge_preview/index.ts.

  3. supabase/seed.sql, lines 3143-3150

    P2 Catering source seeded as 'pending' but comment says 'ready'

    The block comment directly above these inserts states "Sources are seeded with index_status='ready'", but the catering source is seeded as 'pending'. All other Sui's Sushi sources are 'ready', so knowledge_base_enabled is unaffected in practice — but the inconsistency between the comment and the actual value could confuse future maintainers who rely on the comment to understand the seed state.
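
The second comment above (category_hint not pushed to SQL) comes down to the order of two operations: cutting to the top K and filtering by category. The sketch below imitates that in memory with made-up similarity scores; `lookup`, `Candidate`, and the corpus are hypothetical stand-ins, and only RAW_FETCH_K = 5 is taken from the review.

```typescript
interface Candidate {
  id: string;
  category: string;
  similarity: number;
}

const RAW_FETCH_K = 5; // value quoted in the review comment

// Stand-in for a kNN lookup: when `category` is given, it is applied
// BEFORE the top-K cut, which is what passing p_category would achieve.
function lookup(all: Candidate[], limit: number, category?: string): Candidate[] {
  return all
    .filter((c) => category === undefined || c.category === category)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}

// Illustrative scores only: five non-dietary chunks outrank the one dietary chunk.
const corpus: Candidate[] = [
  { id: 'p1', category: 'parking', similarity: 0.62 },
  { id: 'p2', category: 'parking', similarity: 0.61 },
  { id: 'h1', category: 'hours', similarity: 0.6 },
  { id: 'h2', category: 'hours', similarity: 0.59 },
  { id: 'c1', category: 'catering', similarity: 0.58 },
  { id: 'd1', category: 'dietary', similarity: 0.56 },
];

// Current behaviour: global top-K first, category filter afterwards.
const filteredAfter = lookup(corpus, RAW_FETCH_K).filter((c) => c.category === 'dietary');

// Suggested behaviour: category filter inside the lookup.
const filteredInLookup = lookup(corpus, RAW_FETCH_K, 'dietary');

console.log(filteredAfter.length, filteredInLookup.length);
```

In this toy corpus the dietary chunk ranks sixth globally, so filtering after the cut leaves nothing while filtering inside the lookup returns it.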

Reviews (2): Last reviewed commit: "fix(rag): sparse mock embedding so CI in..."

Comment on lines +199 to +228

    <label className="flex items-center gap-2 text-xs text-stone-400">
      <input
        type="checkbox"
        checked={showInactive}
        onChange={(e) => setShowInactive(e.target.checked)}
        className="accent-emerald-500"
      />
      Show inactive
    </label>

    {error && (
      <div className="rounded-lg border border-red-500/30 bg-red-500/10 px-4 py-2 text-sm text-red-300">
        {error}
      </div>
    )}

    {visible.length === 0 ? (
      <div className="rounded-xl border border-dashed border-white/10 bg-white/5 px-6 py-12 text-center text-sm text-stone-400">
        No knowledge sources yet. Add a dietary policy, parking notes, or FAQ entry to enable the
        voice agent's knowledge lookup tool.
      </div>
    ) : (
      <div className="overflow-hidden rounded-xl border border-white/10 bg-white/5">
        <table className="w-full text-sm">
          <thead className="bg-stone-950/60 text-left text-xs uppercase tracking-wide text-stone-500">
            <tr>
              <th className="px-5 py-3">Title</th>
              <th className="px-5 py-3">Category</th>
              <th className="px-5 py-3">Status</th>

P2 Supabase query failures in fetchRows produce a silent empty list

Neither the knowledge_sources select nor the knowledge_chunks select checks for an error return value. If either query fails (network blip, RLS violation), sources and chunks are null, setRows([]) is called, and the UI renders an empty "No knowledge sources yet" state with no indication of the problem. Setting an error state on failure here would give operators a clear signal rather than a misleading empty view.
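
A sketch of the suggested fix, assuming the usual Supabase JS result shape of `{ data, error }`: check both results before building rows, and return a failure the page can put into its error state. `combineResults`, `FetchOutcome`, and the column names `title` and `source_id` are hypothetical here; the real fetchRows may be organized differently.

```typescript
// Reduced shape of a query result: either data or an error with a message.
interface QueryResult<T> {
  data: T[] | null;
  error: { message: string } | null;
}

interface SourceRow { id: string; title: string; }
interface ChunkRow { source_id: string; }

type FetchOutcome =
  | { ok: true; rows: Array<SourceRow & { chunkCount: number }> }
  | { ok: false; message: string };

// Turn the two query results into rows, or into an explicit failure
// instead of an empty list that looks like "no sources yet".
function combineResults(
  sources: QueryResult<SourceRow>,
  chunks: QueryResult<ChunkRow>,
): FetchOutcome {
  const failure = sources.error ?? chunks.error;
  if (failure) {
    return { ok: false, message: `Could not load knowledge sources: ${failure.message}` };
  }
  // Count chunks per source so the table can show an index size.
  const counts: Record<string, number> = {};
  for (const c of chunks.data ?? []) {
    counts[c.source_id] = (counts[c.source_id] ?? 0) + 1;
  }
  const rows = (sources.data ?? []).map((s) => ({ ...s, chunkCount: counts[s.id] ?? 0 }));
  return { ok: true, rows };
}
```

On `ok: false` the page would call its error setter with `message` and leave the previous rows untouched, so the empty-state copy only appears when the query truly returned nothing.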

…OPENAI_API_KEY

The dense hash-spread mock from PR #13 produced cosine values around
0.05 across all queries — hash noise drowning shared-keyword signal.
CI runs without OPENAI_API_KEY (mock mode), so two integration tests
failed: parking lookup and large-party catering routing.

Switched the mock to sparse per-word activation:
- Each unique stemmed token activates ONE specific dimension
  (hash(stem) mod 1536), set to 1 (additive on duplicates).
- Crude prefix-truncation stemmer ("park"/"parking" → "park",
  "parties"/"party" → "part") so caller queries bridge to owner
  content that uses different word forms.
- L2-normalize at the end.
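
The three steps above can be sketched as follows. This is an approximation for illustration, not the code in embedding.ts: the commit does not name the hash or the exact stemming rule, so the FNV-1a hash, the 4-character truncation, the tokenizer regex, and the names `stem`, `hashString`, `mockEmbed`, and `cosine` are all assumptions.

```typescript
const DIMS = 1536;

// Crude prefix-truncation stemmer (assumed: keep the first 4 characters),
// so "parking" and "park" collapse to "park", "parties" and "party" to "part".
function stem(word: string): string {
  return word.toLowerCase().slice(0, 4);
}

// FNV-1a 32-bit hash; any stable string hash would serve the same purpose.
function hashString(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Sparse activation: each stemmed token adds 1 to exactly one dimension,
// then the vector is L2-normalized (an all-zero vector is returned as is).
function mockEmbed(text: string): number[] {
  const vec: number[] = new Array<number>(DIMS).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    const dim = hashString(stem(token)) % DIMS;
    vec[dim] = (vec[dim] ?? 0) + 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

// For unit-length vectors the dot product equals the cosine similarity.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += (a[i] ?? 0) * (b[i] ?? 0);
  return dot;
}
```

Because every entry is non-negative, two texts that share a stem always get a cosine above zero. Hash collisions between unrelated stems remain possible, though, so an unrelated pair is not guaranteed to score exactly zero.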

Cosine now reflects actual shared-vocabulary count instead of being
swamped by per-dimension noise. Locally:
  - "is the salmon gluten free" → dietary 0.54
  - "where do I park" → parking 0.26
  - "kids menu" → kids menu 0.32
  - "large party 30 people" → catering 0.37
  - off-topic gibberish → 0.00

All 109 integration tests pass in both mock mode (CI) and live mode
(local dev with OPENAI_API_KEY set). Unit tests in embedding.test.ts
already covered the relative-similarity properties (shared keywords
score higher than disjoint vocabulary); they pass against the new
sparse implementation unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ByteStreams-AI merged commit 0b1e491 into main May 4, 2026
3 of 4 checks passed
ByteStreams-AI deleted the feat/m10-rag branch May 4, 2026 02:04
ByteStreams-AI added a commit that referenced this pull request May 4, 2026
Closes the gap between the M10 PR (#14) updating supabase/seed.sql
with knowledge sources and the cloud DB never running `db reset`
(which would wipe live customer data). The cloud has migration 0012
applied (RPCs exist) but no knowledge_sources rows for Sui's Sushi —
so `lookup_knowledge` is absent from the tool list and scenarios 2,
11, 14 from developer/dialtone_full_call_scenarios.html fail.

- scripts/cloud-seed-kb.sql — idempotent INSERT for the six Sui's KB
  sources (dietary, parking, kids menu, catering, payments, hours).
  Uses fixed UUIDs from supabase/seed.sql + ON CONFLICT (id) DO
  NOTHING so re-running is a no-op. Inserts with index_status=
  'pending' so knowledge_base_enabled stays false until the reindex
  script populates chunks — avoids a window where the agent sees
  lookup_knowledge in tools but retrieval returns empty.
- scripts/cloud-reindex-kb.sh — signs in via STAFF_EMAIL/PASSWORD,
  loops the six source IDs through the knowledge_reindex Edge
  Function. Each call chunks + embeds + atomically replaces, flips
  the row to index_status='ready'. Reports per-source result with
  embedding mode (live/mock). Exits non-zero on any failure with
  three common-cause hints (401 auth / 403 staff / 500 OpenAI key
  missing). Made executable.
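
The failure reporting described for cloud-reindex-kb.sh can be pictured as a status-to-hint mapping plus an exit code. The real script is shell; the TypeScript below is only an illustrative sketch, and `hintForStatus`, `summarize`, `ReindexResult`, and the exact hint wording are hypothetical. Only the three status codes and their causes come from the commit message.

```typescript
interface ReindexResult {
  sourceId: string;
  status: number; // HTTP status returned by the knowledge_reindex call
}

// Map a failing status to the common-cause hint named in the commit message.
// Returns null for any 2xx status.
function hintForStatus(status: number): string | null {
  if (status >= 200 && status < 300) return null;
  switch (status) {
    case 401:
      return 'auth: sign-in failed or the token was rejected';
    case 403:
      return 'staff: the signed-in user is not staff for this restaurant';
    case 500:
      return 'server: the OpenAI key may be missing on the Edge Function';
    default:
      return 'unexpected status';
  }
}

// Build one report line per source and a non-zero exit code on any failure.
function summarize(results: ReindexResult[]): { exitCode: number; lines: string[] } {
  let exitCode = 0;
  const lines = results.map((r) => {
    const hint = hintForStatus(r.status);
    if (hint === null) return `${r.sourceId}: ok`;
    exitCode = 1;
    return `${r.sourceId}: HTTP ${r.status} (${hint})`;
  });
  return { exitCode, lines };
}
```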

After running both, the next inbound call to +16296001047 exposes
lookup_knowledge in the tool list and the agent can answer dietary /
parking / FAQ / catering questions from the KB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ByteStreams-AI added a commit that referenced this pull request May 4, 2026
The deploy workflow ships admin + kitchen + Edge Functions on every
push to main, but does NOT run `pnpm supabase db push --linked`. M10
(PR #14) merged with migration 0012_m10_knowledge_rpcs.sql; the
Edge Function code deployed automatically but the migration never
landed in cloud. Discovered when `knowledge_reindex` returned a
generic 500 — OpenAI embed had been hit successfully (visible in
OpenAI dashboard) but the subsequent `knowledge_replace_chunks` RPC
call hit "function does not exist".

Two doc updates:

- developer/m8-runbook.md — adds a callout at the top of the
  Deployment + rollback section. Operators see this when they look
  up deploy procedure. Includes the verification SQL.
- AGENTS.md — adds Open follow-up #5 documenting the gap and
  proposing a fix path for M12/M13 (a separate `migrate.yml`
  workflow with a manual confirmation gate, not auto-push, since
  destructive migrations need human review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>