PR - M10 — Knowledge Base (RAG) by ByteStreams-AI · Pull Request #14 · ByteStreams-AI/dialtone

ByteStreams-AI · 2026-05-04T01:07:28Z

feat(rag): implement rag knowledge base

Closes #13

Greptile Summary

This PR ships the complete M10 RAG Knowledge Base pipeline: two SECURITY DEFINER SQL RPCs (knowledge_replace_chunks, knowledge_lookup), three Edge Functions (knowledge_reindex, vapi_lookup_knowledge, knowledge_preview), shared chunking/embedding/threshold modules, the admin /settings/knowledge CRUD page, seed data, and 60 new tests. The implementation is well-structured, with per-candidate threshold filtering, a deterministic content-aware mock embedding for CI, and cross-tenant scoping enforced in SQL.

Confidence Score: 5/5

Safe to merge — all findings are P2 style/UX suggestions; no logic errors or security issues found

No P0 or P1 issues found. The single new inline comment is a UX polish suggestion (silent poll timeout). Previously-flagged P2s (silent empty list on query error, category_hint not forwarded to SQL, topSimilarity fallback category mismatch) don't affect correctness on the happy path. Tests are comprehensive and the security model is consistent with the established codebase pattern.

apps/admin/src/pages/settings/knowledge-page.tsx (poll timeout UX), supabase/functions/vapi_lookup_knowledge/index.ts (p_category forwarding, previously flagged)

Important Files Changed

Filename	Overview
apps/admin/src/pages/settings/knowledge-page.tsx	New admin Knowledge Base page — CRUD, fire-and-poll reindex, and Q&A preview; error handling is mostly solid but the 60-second polling cap exits silently when indexing is still in-progress
supabase/functions/knowledge_reindex/index.ts	JWT-gated reindex pipeline — chunks body, embeds via OpenAI (with mock fallback), atomically replaces chunks via RPC; error handling and auth checks are thorough
supabase/functions/vapi_lookup_knowledge/index.ts	RAG retrieval Edge Function — `p_category` is not forwarded to `knowledge_lookup` SQL even when `category_hint` is set (already flagged), so hinted-category chunks ranking outside the top-5 globally are invisible to the filter
packages/shared/src/voice/knowledge.ts	Per-category threshold policy and `applyThresholds` — solid per-candidate filtering; when `categoryHint` is set but no candidates match, `topSimilarity` falls back to the best score from a different category (already flagged)
packages/shared/src/voice/embedding.ts	OpenAI embeddings wrapper with deterministic sparse-activation mock — content-aware cosine signal for tests, `OPENAI_MOCK_MODE=never` production guard, correct L2-normalization
supabase/migrations/0012_m10_knowledge_rpcs.sql	Adds `knowledge_replace_chunks` (atomic delete+insert, flips status to `ready`) and `knowledge_lookup` (kNN cosine via pgvector, scoped to restaurant+active+ready); both correctly restricted to `service_role`
packages/shared/src/voice/chunk.ts	Paragraph-first / sentence-fallback chunker with overlap — approx token counting, clean boundary logic, well-tested
packages/shared/src/voice/tools.ts	Adds `lookup_knowledge` tool schema with clear verbatim-question instruction; gated on `knowledge_base_enabled` in `selectTools`, ordered correctly before `end_call`
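
The silent poll timeout noted for knowledge-page.tsx could be avoided by having the polling helper return a distinct outcome when the cap is reached. The following is a minimal sketch, not the page's actual code: `pollIndexStatus`, `PollResult`, and the two-value `IndexStatus` type are hypothetical names, and the real page may track more states than `pending` and `ready`.

```typescript
// Hypothetical status type: the PR text only confirms 'pending' and 'ready'.
type IndexStatus = "pending" | "ready";

// A timeout is reported as its own outcome instead of being dropped silently.
type PollResult =
  | { kind: "done"; status: IndexStatus }
  | { kind: "timed_out"; lastStatus: IndexStatus };

async function pollIndexStatus(
  fetchStatus: () => Promise<IndexStatus>,
  opts: { intervalMs: number; timeoutMs: number },
): Promise<PollResult> {
  const deadline = Date.now() + opts.timeoutMs;
  let lastStatus: IndexStatus = "pending";
  while (Date.now() < deadline) {
    lastStatus = await fetchStatus();
    if (lastStatus !== "pending") {
      return { kind: "done", status: lastStatus };
    }
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  // The cap was reached while indexing was still in progress.
  return { kind: "timed_out", lastStatus };
}
```

A caller can then branch on `kind === "timed_out"` and show a message such as "Indexing is still running; refresh to check status" rather than leaving the row in an unexplained state.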

Comments Outside Diff (3)

packages/shared/src/voice/knowledge.ts, line 1503-1517 (link)

topSimilarity can report a non-hint-category score when hint filters to empty

When categoryHint is set but no candidates in the kNN result match that category, filtered is empty and topSimilarity falls back to candidates[0]?.similarity — the best score from a different category. The debug panel then shows, for example, top_similarity=0.43, threshold=0.55 (dietary), making it look as if a dietary chunk nearly cleared the threshold when no dietary chunk was even in the top-K. A more accurate fallback would be 0 (or omitted) when the hinted-category result set is empty.
```
const topSimilarity = filtered.length > 0
  ? filtered[0]!.similarity
  : (categoryHint ? 0 : (candidates[0]?.similarity ?? 0));
```
supabase/functions/vapi_lookup_knowledge/index.ts, line 2870-2875 (link)

category_hint not pushed to SQL — hint-category chunks can be missed

When args.category_hint is set, applyThresholds narrows the kNN result to only candidates in the hinted category. But the SQL knowledge_lookup is called without p_category, so it fetches the top RAW_FETCH_K (5) rows across all categories. If those 5 rows are dominated by non-hint categories, relevant chunks in the hinted category that rank below position 5 globally will never be seen by the filter. The SQL function already accepts p_category precisely for this case — passing it here guarantees the top-K is drawn from the right category pool.
```
const { data: rawData, error: rawErr } = await client.rpc('knowledge_lookup', {
  p_restaurant_id: state.restaurant_id,
  p_query_embedding: embeddingStr,
  p_category: args.category_hint ?? undefined,
  p_limit: RAW_FETCH_K,
});
```
The same pattern applies to knowledge_preview/index.ts.
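
To illustrate why the order of "filter by category" and "take the top K" matters, here is a small in-memory simulation. It is a sketch only: the `Chunk` shape, the `topK` helper, and every similarity value are made up for illustration and do not come from the repository or from real retrieval results.

```typescript
interface Chunk {
  id: string;
  category: string;
  similarity: number;
}

// Hypothetical stand-in for the kNN query: optionally restrict to one category
// first, then return the k most similar chunks.
function topK(chunks: Chunk[], k: number, category?: string): Chunk[] {
  return chunks
    .filter((c) => category === undefined || c.category === category)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}

// Illustrative data: five non-dietary chunks outrank the only dietary chunk.
const chunks: Chunk[] = [
  { id: "p1", category: "parking", similarity: 0.9 },
  { id: "p2", category: "parking", similarity: 0.85 },
  { id: "h1", category: "hours", similarity: 0.8 },
  { id: "h2", category: "hours", similarity: 0.75 },
  { id: "c1", category: "catering", similarity: 0.7 },
  { id: "d1", category: "dietary", similarity: 0.58 },
];

// Filtering after the global top-5 loses the dietary chunk entirely.
const filteredAfter = topK(chunks, 5).filter((c) => c.category === "dietary");

// Pushing the category into the query keeps it.
const pushedDown = topK(chunks, 5, "dietary");

console.log(filteredAfter.length, pushedDown.map((c) => c.id));
```

In this toy data set the post-filter result is empty while the pushed-down query returns `d1`, which is the situation the comment describes for `p_category`.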
supabase/seed.sql, line 3143-3150 (link)

Catering source seeded as 'pending' but comment says 'ready'

The block comment directly above these inserts states "Sources are seeded with index_status='ready'", but the catering source is seeded as 'pending'. All other Sui's Sushi sources are 'ready', so knowledge_base_enabled is unaffected in practice — but the inconsistency between the comment and the actual value could confuse future maintainers who rely on the comment to understand the seed state.

Reviews (2): Last reviewed commit: "fix(rag): sparse mock embedding so CI in..."

greptile-apps · 2026-05-04T01:12:27Z

+
+      <label className="flex items-center gap-2 text-xs text-stone-400">
+        <input
+          type="checkbox"
+          checked={showInactive}
+          onChange={(e) => setShowInactive(e.target.checked)}
+          className="accent-emerald-500"
+        />
+        Show inactive
+      </label>
+
+      {error && (
+        <div className="rounded-lg border border-red-500/30 bg-red-500/10 px-4 py-2 text-sm text-red-300">
+          {error}
+        </div>
+      )}
+
+      {visible.length === 0 ? (
+        <div className="rounded-xl border border-dashed border-white/10 bg-white/5 px-6 py-12 text-center text-sm text-stone-400">
+          No knowledge sources yet. Add a dietary policy, parking notes, or FAQ entry to enable the
+          voice agent's knowledge lookup tool.
+        </div>
+      ) : (
+        <div className="overflow-hidden rounded-xl border border-white/10 bg-white/5">
+          <table className="w-full text-sm">
+            <thead className="bg-stone-950/60 text-left text-xs uppercase tracking-wide text-stone-500">
+              <tr>
+                <th className="px-5 py-3">Title</th>
+                <th className="px-5 py-3">Category</th>
+                <th className="px-5 py-3">Status</th>


Supabase query failures in fetchRows produce a silent empty list

Neither the knowledge_sources select nor the knowledge_chunks select checks for an error return value. If either query fails (network blip, RLS violation), sources and chunks are null, setRows([]) is called, and the UI renders an empty "No knowledge sources yet" state with no indication of the problem. Setting an error state on failure here would give operators a clear signal rather than a misleading empty view.
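
One way to surface these failures is to inspect the `error` field of each query result before building the row list. The sketch below assumes the usual supabase-js `{ data, error }` result shape; `combineResults`, `FetchOutcome`, and the row fields (`title`, `source_id`) are hypothetical names, not the page's actual code.

```typescript
// Minimal model of a supabase-js query result.
interface QueryResult<T> {
  data: T[] | null;
  error: { message: string } | null;
}

// Hypothetical row shapes for illustration.
interface SourceRow { id: string; title: string }
interface ChunkRow { source_id: string }

type FetchOutcome =
  | { ok: true; rows: Array<SourceRow & { chunkCount: number }> }
  | { ok: false; message: string };

function combineResults(
  sources: QueryResult<SourceRow>,
  chunks: QueryResult<ChunkRow>,
): FetchOutcome {
  // Report a failed query instead of falling through to an empty list.
  if (sources.error) {
    return { ok: false, message: `Could not load knowledge sources: ${sources.error.message}` };
  }
  if (chunks.error) {
    return { ok: false, message: `Could not load knowledge chunks: ${chunks.error.message}` };
  }
  // Count chunks per source so each row can show how much content is indexed.
  const counts = new Map<string, number>();
  for (const c of chunks.data ?? []) {
    counts.set(c.source_id, (counts.get(c.source_id) ?? 0) + 1);
  }
  const rows = (sources.data ?? []).map((s) => ({ ...s, chunkCount: counts.get(s.id) ?? 0 }));
  return { ok: true, rows };
}
```

In the page, the `ok: false` branch would feed the existing `error` state so the red banner renders, and only the `ok: true` branch would call `setRows`.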

…OPENAI_API_KEY

The dense hash-spread mock from PR #13 produced cosine values around 0.05 across all queries: hash noise drowning shared-keyword signal. CI runs without OPENAI_API_KEY (mock mode), so two integration tests failed: parking lookup and large-party catering routing.

Switched the mock to sparse per-word activation:
- Each unique stemmed token activates ONE specific dimension (hash(stem) mod 1536), set to 1 (additive on duplicates).
- Crude prefix-truncation stemmer ("park"/"parking" → "park", "parties"/"party" → "part") so caller queries bridge to owner content that uses different word forms.
- L2-normalize at the end.

Cosine now reflects actual shared-vocabulary count instead of being swamped by per-dimension noise. Locally:
- "is the salmon gluten free" → dietary 0.54
- "where do I park" → parking 0.26
- "kids menu" → kids menu 0.32
- "large party 30 people" → catering 0.37
- off-topic gibberish → 0.00

All 109 integration tests pass in both mock mode (CI) and live mode (local dev with OPENAI_API_KEY set).

Unit tests in embedding.test.ts already covered the relative-similarity properties (shared keywords score higher than disjoint vocabulary); they pass against the new sparse implementation unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
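
The sparse per-word activation described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code in embedding.ts: the four-character truncation length, the FNV-1a hash, the tokenizer regex, and the names `stem`, `hashString`, `mockEmbed`, and `cosine` are all hypothetical choices that merely agree with the examples in the message.

```typescript
const DIMS = 1536;

// Assumed stemmer: truncate to the first four characters, which maps
// "parking" -> "park" and "parties"/"party" -> "part".
function stem(token: string): string {
  return token.slice(0, 4);
}

// FNV-1a 32-bit hash (an assumed choice; any stable string hash would do).
function hashString(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function mockEmbed(text: string): number[] {
  const vec: number[] = new Array<number>(DIMS).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    // Each stem activates exactly one dimension; repeated stems add up.
    const idx = hashString(stem(token)) % DIMS;
    vec[idx] = (vec[idx] ?? 0) + 1;
  }
  // L2-normalize so the dot product of two embeddings is their cosine.
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

// Cosine similarity for already-normalized vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) {
    dot += (a[i] ?? 0) * (b[i] ?? 0);
  }
  return dot;
}
```

With this construction, two texts score above zero whenever they share at least one stem, and text with no tokens embeds to the zero vector and scores 0 against everything. The specific similarity values quoted in the commit message depend on the real implementation and seed content, so this sketch does not attempt to reproduce them.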

greptile-apps · 2026-05-04T01:39:51Z


Closes the gap between the M10 PR (#14) updating supabase/seed.sql with knowledge sources and the cloud DB never running `db reset` (which would wipe live customer data). The cloud has migration 0012 applied (RPCs exist) but no knowledge_sources rows for Sui's Sushi, so `lookup_knowledge` is absent from the tool list and scenarios 2, 11, 14 from developer/dialtone_full_call_scenarios.html fail.

- scripts/cloud-seed-kb.sql: idempotent INSERT for the six Sui's KB sources (dietary, parking, kids menu, catering, payments, hours). Uses fixed UUIDs from supabase/seed.sql + ON CONFLICT (id) DO NOTHING so re-running is a no-op. Inserts with index_status='pending' so knowledge_base_enabled stays false until the reindex script populates chunks, which avoids a window where the agent sees lookup_knowledge in tools but retrieval returns empty.
- scripts/cloud-reindex-kb.sh: signs in via STAFF_EMAIL/PASSWORD, loops the six source IDs through the knowledge_reindex Edge Function. Each call chunks + embeds + atomically replaces, flips the row to index_status='ready'. Reports per-source result with embedding mode (live/mock). Exits non-zero on any failure with three common-cause hints (401 auth / 403 staff / 500 OpenAI key missing). Made executable.

After running both, the next inbound call to +16296001047 exposes lookup_knowledge in the tool list and the agent can answer dietary / parking / FAQ / catering questions from the KB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The deploy workflow ships admin + kitchen + Edge Functions on every push to main, but does NOT run `pnpm supabase db push --linked`. M10 (PR #14) merged with migration 0012_m10_knowledge_rpcs.sql; the Edge Function code deployed automatically but the migration never landed in cloud. Discovered when `knowledge_reindex` returned a generic 500: OpenAI embed had been hit successfully (visible in OpenAI dashboard) but the subsequent `knowledge_replace_chunks` RPC call hit "function does not exist".

Two doc updates:
- developer/m8-runbook.md: adds a callout at the top of the Deployment + rollback section. Operators see this when they look up deploy procedure. Includes the verification SQL.
- AGENTS.md: adds Open follow-up #5 documenting the gap and proposing a fix path for M12/M13 (a separate `migrate.yml` workflow with a manual confirmation gate, not auto-push, since destructive migrations need human review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(rag): implement rag knowledge base

2e87369

greptile-apps Bot reviewed May 4, 2026


ByteStreams-AI merged commit 0b1e491 into main May 4, 2026
3 of 4 checks passed

ByteStreams-AI deleted the feat/m10-rag branch May 4, 2026 02:04

ByteStreams-AI mentioned this pull request May 4, 2026

docs: M11 test plan + M8 ops runbook + check-keys script #15

Merged

4 tasks

ByteStreams-AI mentioned this pull request May 4, 2026

docs(agents): three deploy gaps observed during M10/M11 ramp #16

Closed

2 tasks

ByteStreams-AI merged 2 commits into main from feat/m10-rag
