refactor(schema): lazy heal for all Deeplake tables (memory, sessions, skills) #177

Merged
efenocchi merged 11 commits into main from feat/lazy-schema-heal-all-tables
May 19, 2026

@efenocchi efenocchi commented May 18, 2026

Why

Schema healing was inconsistent across the three Deeplake-managed tables:

  • memory / sessions (ensureTable / ensureSessionsTable): one SELECT info_schema per column plus a local marker file, with ad-hoc ensureColumn(table, "agent", …) / ensureColumn(table, "plugin_version", …) calls. Every new column required a matching call site or pre-existing tables would silently drop INSERTs after upgrade.
  • skills (worker insertSkillRow): retry only matched the literal column name contributors. Any future schema addition would fall through the retry and fail INSERTs forever on stale tables.

Underlying reason for the per-column / marker design: a historical Deeplake "post-ALTER bug" (a ~30s window of failing INSERTs after every ALTER). Re-probed against api.deeplake.ai in the test_plugin org on 2026-05-18: bug no longer reproduces (71/71 INSERTs OK, first success 2ms after ALTER — repro script in PR description footer).

What changes

New module src/deeplake-schema.ts

Single source of truth as {name, sql} arrays:

  • MEMORY_COLUMNS (14), SESSIONS_COLUMNS (14), SKILLS_COLUMNS (17)
  • buildCreateTableSql(table, cols) — derives CREATE TABLE from the list.
  • healMissingColumns({ query, tableName, workspaceId, columns }): one SELECT against information_schema.columns, diff in memory, then ALTER TABLE ADD COLUMN only the genuinely missing columns. A race ("already exists" from a concurrent writer) is caught and re-verified via a second SELECT; that is the only tolerated ALTER failure.
  • isMissingTableError, isMissingColumnError — generic error classifiers. Both exclude permission denied so auth problems aren't masked as heal-able.
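
For readers who want the heal flow in code form, here is a minimal sketch of the steps listed above. It is not the module's actual implementation: the `ColumnDef` and `QueryFn` shapes, the `ident` guard (standing in for the project's `sqlIdent`), and the exact catalog filter on workspace are assumptions made for illustration. The `{ missing, altered }` result shape follows the sequence diagram later in this thread.

```typescript
// Illustrative sketch only. ColumnDef/QueryFn shapes and the catalog SQL are assumptions.
type ColumnDef = { name: string; sql: string };
type QueryFn = (sql: string) => Promise<Array<Record<string, unknown>>>;

interface HealArgs {
  query: QueryFn;
  tableName: string;
  workspaceId: string;
  columns: readonly ColumnDef[];
}

// Hypothetical identifier guard standing in for the project's sqlIdent.
function ident(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) throw new Error(`bad identifier: ${name}`);
  return name;
}

// One catalog SELECT, returned as a lower-cased set for case-insensitive matching.
async function presentColumns(args: HealArgs): Promise<Set<string>> {
  // Assumed filter: the real workspace-scoping predicate may differ.
  const sql =
    `SELECT column_name FROM information_schema.columns ` +
    `WHERE table_schema = '${ident(args.workspaceId)}' AND table_name = '${ident(args.tableName)}'`;
  const rows = await args.query(sql);
  return new Set(rows.map(r => String(r["column_name"]).toLowerCase()));
}

async function healMissingColumnsSketch(
  args: HealArgs,
): Promise<{ missing: string[]; altered: string[] }> {
  const table = ident(args.tableName);
  const present = await presentColumns(args);
  // Diff in memory, preserving schema order.
  const missingCols = args.columns.filter(c => !present.has(c.name.toLowerCase()));
  const altered: string[] = [];
  for (const col of missingCols) {
    try {
      // No IF NOT EXISTS, so genuine failures surface.
      await args.query(`ALTER TABLE "${table}" ADD COLUMN ${ident(col.name)} ${col.sql}`);
      altered.push(col.name);
    } catch (err) {
      const msg = err instanceof Error ? err.message : String(err);
      if (!/already exists/i.test(msg)) throw err;
      // Possible race with a concurrent writer: confirm via a second SELECT.
      const recheck = await presentColumns(args);
      if (!recheck.has(col.name.toLowerCase())) throw err;
    }
  }
  return { missing: missingCols.map(c => c.name), altered };
}
```

In this sketch a raced column appears in `missing` but not in `altered`, which lets a caller distinguish "we added it" from "someone else added it first".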

Module-load lint rejects any NOT NULL column lacking DEFAULT (ALTER on a populated table would otherwise fail).
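
A rough illustration of what such a module-load lint could look like follows. The function name `validateSchemaSketch`, the regexes, and the example column list are hypothetical; only the rule itself (NOT NULL requires DEFAULT, identifiers must be valid) comes from this PR.

```typescript
// Illustrative sketch only: names and regexes are assumptions, not the module's code.
type ColumnDef = { name: string; sql: string };

function validateSchemaSketch(label: string, cols: readonly ColumnDef[]): void {
  for (const col of cols) {
    // Column names must be plain SQL identifiers.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(col.name)) {
      throw new Error(`${label}: invalid column name "${col.name}"`);
    }
    const notNull = /\bNOT\s+NULL\b/i.test(col.sql);
    const hasDefault = /\bDEFAULT\b/i.test(col.sql);
    // ALTER TABLE ADD COLUMN on a populated table needs a value to backfill with.
    if (notNull && !hasDefault) {
      throw new Error(`${label}.${col.name}: NOT NULL column requires a DEFAULT`);
    }
  }
}

// Hypothetical column list, validated once when the module is loaded.
const EXAMPLE_COLUMNS: readonly ColumnDef[] = Object.freeze([
  { name: "id", sql: "TEXT NOT NULL DEFAULT ''" },
  { name: "agent", sql: "TEXT NOT NULL DEFAULT ''" },
  { name: "note", sql: "TEXT" },
]);
validateSchemaSketch("EXAMPLE_COLUMNS", EXAMPLE_COLUMNS);
```

Failing at import time means a bad column definition breaks the build and tests immediately instead of surfacing as a failed ALTER on a user's populated table.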

src/deeplake-api.ts

ensureTable / ensureSessionsTable / ensureSkillsTable now:

  1. listTables (cached) → CREATE with buildCreateTableSql(..., XXX_COLUMNS) if missing.
  2. On pre-existing table: call healMissingColumns once → one SELECT info_schema, ALTER only the truly missing.
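
The two steps above can be sketched as follows. This is a simplified model with injected dependencies, not the `DeeplakeApi` code: `EnsureDeps` and `ensureTableSketch` are hypothetical names, and caching, retries, and identifier validation are left out.

```typescript
// Illustrative sketch only: dependency names are assumptions for this example.
type ColumnDef = { name: string; sql: string };

interface EnsureDeps {
  listTables: () => Promise<string[]>; // cached in the real API
  createTable: (table: string, cols: readonly ColumnDef[]) => Promise<void>;
  heal: (table: string, cols: readonly ColumnDef[]) => Promise<void>;
}

async function ensureTableSketch(
  deps: EnsureDeps,
  table: string,
  cols: readonly ColumnDef[],
): Promise<"created" | "healed"> {
  const tables = await deps.listTables();
  if (tables.indexOf(table) === -1) {
    // Step 1: table missing, so CREATE from the canonical column list.
    await deps.createTable(table, cols);
    return "created";
  }
  // Step 2: table already exists, so run exactly one heal pass.
  await deps.heal(table, cols);
  return "healed";
}
```

The early return after CREATE mirrors the behaviour described in this PR (a fresh CREATE skips the heal pass). The review comments further down in this thread question that choice for the case where a concurrent writer created the table first, so treat this as a description of the flow rather than a recommendation.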

Gone: ensureColumn, ensureEmbeddingColumn, the per-column on-disk marker, the ad-hoc ensureColumn(tbl, "agent", …) calls. ensureLookupIndex and its marker stay (CREATE INDEX is different — rare failure, permanent success, cheap marker).

src/skillify/skills-table.ts (worker INSERT retry)

Drops the contributors-only branch. On isMissingColumnError, runs healMissingColumns over SKILLS_COLUMNS and retries the INSERT once. If the diff reports missing: [] (the failing column isn't in our schema), the original error propagates rather than looping.
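
As a sketch of the retry semantics just described (heal once, retry once, propagate the original error when nothing in our schema was missing), assuming hypothetical `insert`, `heal`, and `isMissingColumnError` callbacks rather than the real `insertSkillRow` internals:

```typescript
// Illustrative sketch only: RetryDeps and insertWithHealSketch are hypothetical names.
type HealResult = { missing: string[]; altered: string[] };

interface RetryDeps {
  insert: () => Promise<void>;
  heal: () => Promise<HealResult>;
  isMissingColumnError: (err: unknown) => boolean;
}

async function insertWithHealSketch(deps: RetryDeps): Promise<void> {
  try {
    await deps.insert();
    return;
  } catch (err) {
    // Anything other than a missing-column error propagates untouched.
    if (!deps.isMissingColumnError(err)) throw err;
    const result = await deps.heal();
    // The failing column is not part of our schema: rethrow instead of looping.
    if (result.missing.length === 0) throw err;
  }
  // Single retry; a second failure propagates to the caller.
  await deps.insert();
}
```

Keeping the retry outside the `catch` block makes it structurally impossible to heal or retry more than once per call.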

InsertSkillRowArgs.workspaceId added (mandatory) so the catalog SELECT scopes to this workspace — prevents a false PRESENT from a same-named table elsewhere.

Behaviour summary

| Trigger | Before | After |
| --- | --- | --- |
| Session start, table up-to-date | listTables + 3 SELECTs info_schema (one per migrated column) | listTables + 1 SELECT info_schema, 0 ALTER |
| Session start, table missing 1 col | listTables + 3 SELECTs + 1 ALTER | listTables + 1 SELECT + 1 ALTER |
| Worker INSERT, table missing column X | If X != contributors → INSERT fails forever | 1 SELECT + 1 ALTER + retry, regardless of X |
| Future schema addition | New ad-hoc ensureColumn call required in 2 places + worker retry update | Add one line to XXX_COLUMNS array |

Tests

  • New tests/claude-code/deeplake-schema.test.ts (28 tests): schema lint, builder injection guard, heal flow (no-op / one missing / multiple in schema order / race tolerance / non-race propagation / case-insensitive catalog match), classifiers.
  • tests/claude-code/deeplake-api.test.ts — rewrote the ensureXxxTable describes for the new flow (CREATE-only / SELECT+ALTER / all-present / info_schema error / non-race ALTER error / race-tolerated ALTER / custom name / cross-method listTables cache).
  • tests/claude-code/schema-scenarios.test.ts — all 7 schema/upgrade scenarios + cross-cutting invariants rewritten against the new SQL shape.
  • tests/claude-code/skillify-skills-table.test.ts — worker retry covers any missing column (regression guard for future schema additions), multi-column heal, race tolerance, rethrow on unknown column, rethrow on permission-denied.
  • tests/claude-code/embeddings-schema.test.ts — points at src/deeplake-schema.ts; regex scoped to single object literals so a later entry's SQL can't satisfy a regression match.
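
To illustrate the "regex scoped to single object literals" idea from the last bullet, here is a small sketch. The helper name, the sample source text, and the `FLOAT4[]` type are made up for the example; the real test may be structured differently.

```typescript
// Illustrative sketch only: helper name and sample schema text are assumptions.
// Returns true only if ONE object literal contains both the column name and the expected SQL.
// Pass sqlPattern without the "g" flag so repeated .test() calls stay stateless.
function entryMatchesSketch(source: string, columnName: string, sqlPattern: RegExp): boolean {
  // Each match is a brace-delimited literal with no nested braces.
  const literals: string[] = source.match(/\{[^{}]*\}/g) || [];
  const namePattern = new RegExp(`name:\\s*["']${columnName}["']`);
  return literals.some(lit => namePattern.test(lit) && sqlPattern.test(lit));
}

// Hypothetical schema source text; the column types are invented for this example.
const exampleSource = `
  { name: "summary", sql: "TEXT NOT NULL DEFAULT ''" },
  { name: "summary_embedding", sql: "FLOAT4[]" },
`;

// A whole-file regex would wrongly pair "summary" with the later entry's type.
const leaked = entryMatchesSketch(exampleSource, "summary", /FLOAT4\[\]/);
const scoped = entryMatchesSketch(exampleSource, "summary_embedding", /FLOAT4\[\]/);
```

Here `leaked` is false and `scoped` is true, because each check is confined to the literal that declares the column.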

Full suite: 2599 / 2599 passing. (Two pre-existing unhandled rejections in session-notifications-hook.test.ts are unrelated to this change — missing /tmp/hivemind-notif-hook-*/ paths from another test fixture.)

Test plan

  • npm run typecheck
  • npm run build
  • npx vitest run
  • End-to-end smoke against the live test_plugin org: created a skills table with the legacy schema (no contributors), called insertSkillRow against it, observed the exact sequence:
    [0] INSERT INTO "skills_smoke_…" (…)                          → fail
    [1] SELECT column_name FROM information_schema.columns …      → 16 rows, no contributors
    [2] ALTER TABLE "skills_smoke_…" ADD COLUMN contributors …    → ok (no IF NOT EXISTS)
    [3] INSERT INTO "skills_smoke_…" (…)                          → ok
    
    Second call against the now-current schema: 1 SQL call (the INSERT itself).
  • CI green.
  • Manual sanity on at least one agent's bundle (claude-code) before merging.

Re-probe script for the post-ALTER bug

For posterity / future readers wondering why the marker is gone:

TOKEN=$(jq -r .token ~/.deeplake/credentials.json)
ORG=$(jq -r .orgId ~/.deeplake/credentials.json)
WS=$(jq -r .workspaceId ~/.deeplake/credentials.json)
API=$(jq -r .apiUrl ~/.deeplake/credentials.json)

# CREATE → INSERT baseline → ALTER → loop INSERTs for 60s, log every offset.
# 2026-05-18 result on test_plugin org: 71/71 OK, first success at +2ms.

If a future Deeplake regression re-introduces the window, add the marker back at the healMissingColumns boundary — the rest of the surface doesn't need to change.
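
The probe body itself is elided above. As a rough idea of the loop the comment describes (ALTER, then repeated INSERTs with each offset logged), here is a sketch with injected callbacks. It is not the original script: `probeAfterAlterSketch`, its options, and the injectable clock are all hypothetical, and the CREATE and baseline INSERT steps are assumed to have happened before it is called.

```typescript
// Illustrative sketch only: not the original re-probe script.
interface ProbeResult { offsetMs: number; ok: boolean }

interface ProbeOptions {
  alter: () => Promise<void>;
  insert: () => Promise<void>;
  durationMs: number;
  intervalMs: number;
  now?: () => number;                    // injectable clock, defaults to Date.now
  sleep?: (ms: number) => Promise<void>; // injectable delay, defaults to setTimeout
}

async function probeAfterAlterSketch(opts: ProbeOptions): Promise<ProbeResult[]> {
  const now = opts.now ?? (() => Date.now());
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>(r => setTimeout(r, ms)));
  await opts.alter();
  const start = now();
  const results: ProbeResult[] = [];
  while (now() - start < opts.durationMs) {
    const offsetMs = now() - start; // offset of this attempt relative to the ALTER
    try {
      await opts.insert();
      results.push({ offsetMs, ok: true });
    } catch {
      results.push({ offsetMs, ok: false });
    }
    await sleep(opts.intervalMs);
  }
  return results;
}
```

A healthy backend should yield only `ok: true` entries; a reintroduced post-ALTER window would show up as a run of `ok: false` entries at the smallest offsets.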

Summary by CodeRabbit

  • Bug Fixes

    • Improved database table schema initialization and column migration reliability through centralized schema definitions and intelligent column detection.
  • Refactor

    • Consolidated table schema definitions and column management logic across memory, sessions, and skills tables for consistent, predictable behavior.

efenocchi added 4 commits May 18, 2026 08:51
…helper

Introduce a single source of truth for the three Deeplake-managed tables
(memory, sessions, skills) as `{name, sql}` arrays in a new
`src/deeplake-schema.ts`. Both `CREATE TABLE` and the heal path are
derived from the same list, so adding a column means one edit.

Public surface:
- `MEMORY_COLUMNS`, `SESSIONS_COLUMNS`, `SKILLS_COLUMNS` — frozen arrays.
- `buildCreateTableSql(table, cols)` — renders the canonical CREATE.
- `healMissingColumns({ query, tableName, workspaceId, columns })` — one
  SELECT against `information_schema.columns`, diff in memory, then
  `ALTER TABLE ADD COLUMN` only the truly missing columns. ALTER without
  `IF NOT EXISTS` so real failures surface; the single tolerated race
  ("already exists" from a concurrent writer) is caught and re-verified
  via a second SELECT.
- `isMissingTableError` / `isMissingColumnError` — generic classifiers
  used by the worker's INSERT retry path (next commit). Both exclude
  `permission denied` to avoid masking auth problems.
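
A minimal sketch of classifiers with this shape follows. The message patterns are assumptions (Postgres-style wording); the actual strings matched by `isMissingTableError` / `isMissingColumnError` are not shown in this PR text.

```typescript
// Illustrative sketch only: the error-message patterns are assumed, not taken from the module.
function messageOf(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

const PERMISSION_DENIED = /permission denied/i;

function isMissingColumnErrorSketch(err: unknown): boolean {
  const msg = messageOf(err);
  // Auth problems must never look heal-able.
  if (PERMISSION_DENIED.test(msg)) return false;
  return /\bcolumn\b.*does not exist/i.test(msg);
}

function isMissingTableErrorSketch(err: unknown): boolean {
  const msg = messageOf(err);
  if (PERMISSION_DENIED.test(msg)) return false;
  // A message about a missing column of an existing table is not a missing table.
  if (/\bcolumn\b/i.test(msg)) return false;
  return /\b(relation|table)\b.*does not exist/i.test(msg);
}
```

Checking `permission denied` first keeps the exclusion independent of how the rest of the message is worded.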

Module-load lint rejects any `NOT NULL` column lacking `DEFAULT` —
ALTER on a populated table needs something to backfill with.

Background on the design: an earlier path used a local marker file
(`col_<name>` under the index-marker dir) to skip even the SELECT after
the first confirmation, motivated by a Deeplake post-ALTER bug (~30s
window of failing INSERTs after every ALTER). Re-probed against
`api.deeplake.ai` on 2026-05-18 in the `test_plugin` org: bug no longer
reproduces (71/71 INSERTs OK, first success 2ms after ALTER). Marker
removed; the SELECT-first pattern survives because each ALTER still
costs ~800ms and a targeted diff produces clearer logs.

Tests cover the lint, builder identifier injection, the heal flow
(no-op, single missing, multiple missing in schema order, race
tolerance with re-SELECT, non-race failure propagation,
case-insensitive catalog match), and both error classifiers.
…t contributors

Drop the hard-coded `contributors`-only retry. When the worker's INSERT
fails with a missing-column error (any column name), run a single
heal pass over the full SKILLS_COLUMNS schema via the helper introduced
in the previous commit: SELECT info_schema, diff, ALTER only the
genuinely missing columns, retry the INSERT once.

Why this matters: every time a new column lands in the skills schema
(contributors was an early example, more will follow), pre-existing
tables on workspaces that haven't seen a fresh INSERT yet would
otherwise drop INSERTs forever. With this change, the first worker
that lands on a stale table heals it surgically.

Semantics:
- `isMissingTableError` → CREATE TABLE with canonical schema, retry INSERT.
- `isMissingColumnError` → heal pass, retry INSERT. If the heal pass
  reports `missing: []` (the failing column isn't in our schema), the
  original error propagates rather than looping.
- A race where another writer healed the column between our SELECT and
  our ALTER is tolerated transparently (`healMissingColumns` handles
  it via re-SELECT).
- No local marker — the worker is short-lived and the INSERT itself is
  the cheapest possible "is the schema current?" probe.

`InsertSkillRowArgs.workspaceId` added (mandatory) so the
`information_schema` SELECT scopes to this workspace's catalog entry —
prevents a false PRESENT from a same-named table in another workspace.
Plumbed in `skillify-worker.ts` via `cfg.workspaceId`.

Removed unused exports (`createSkillsTableSql`,
`addContributorsColumnSql`, the local
`isMissingContributorsColumnError`). Tests rewritten against the new
flow: heal of `contributors`, heal of any other column (regression
guard for future schema additions), multiple missing columns in one
pass, race tolerance, rethrow on unknown column, rethrow on
permission-denied.
… ALTER

Replace the per-column `ensureColumn` / `ensureEmbeddingColumn` calls
(one SELECT info_schema per column, with on-disk marker) with a single
`healSchema(table, XXX_COLUMNS)` that delegates to the helper added
two commits back: one SELECT against `information_schema.columns` per
table, then ALTER only the genuinely missing columns.

What goes away:
- `private ensureColumn(table, column, sqlType)` and its on-disk marker.
- `private ensureEmbeddingColumn(table, column)` shim.
- Ad-hoc `ensureColumn(tbl, "agent", …)` /
  `ensureColumn(tbl, "plugin_version", …)` calls duplicated across
  `ensureTable` and `ensureSessionsTable`. Future columns no longer
  need a matching call site.

What stays:
- `createTableWithRetry` — outer network-resilience loop on CREATE.
- `ensureLookupIndex` and its marker (CREATE INDEX is a different
  beast — failure is rare, success is permanent, marker is cheap).
- Strict identifier validation via `sqlIdent` on every table name.

What stays correct by construction:
- CREATE TABLE on a missing table uses `buildCreateTableSql(safe,
  XXX_COLUMNS)`, so the canonical schema is whatever
  `deeplake-schema.ts` says. No second copy to keep in sync.
- Fresh CREATE skips the heal pass — the column set is exactly the
  one we just declared, so a SELECT would just confirm it for no work.
- A drift assert at the bottom of `ensureTable` rejects a future
  refactor that removes `summary_embedding` from MEMORY_COLUMNS
  without updating `embeddings/columns.ts`.
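
A sketch of what such a drift assert could look like, based on the `MEMORY_COLUMNS.some(c => c.name === SUMMARY_EMBEDDING_COL)` check quoted in the review comments. The function name and error text are hypothetical.

```typescript
// Illustrative sketch only: function name and error message are assumptions.
type ColumnDef = { name: string; sql: string };

// Assumed constant value; the commit message names the column summary_embedding.
const SUMMARY_EMBEDDING_COL = "summary_embedding";

function assertNoEmbeddingDrift(cols: readonly ColumnDef[]): void {
  if (!cols.some(c => c.name === SUMMARY_EMBEDDING_COL)) {
    throw new Error(`schema drift: ${SUMMARY_EMBEDDING_COL} is missing from MEMORY_COLUMNS`);
  }
}
```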

Test rewrite covers the full matrix:
- `deeplake-api.test.ts` — fetch-level mocks of every code path
  (CREATE-only, SELECT+ALTER, all-present, info_schema error,
  non-race ALTER error, race-tolerated ALTER, custom name,
  cross-method listTables cache).
- `schema-scenarios.test.ts` — the 7 schema/upgrade scenarios
  (greenfield, full legacy, half legacy memory/sessions, fully
  migrated, mixed mem-emb, mixed sess-emb) plus the cross-cutting
  invariants (race tolerance, non-race propagation, INSERT vector::at
  bubbling, legacy-`agent` heal).
- `embeddings-schema.test.ts` — source-of-truth checks now look in
  `src/deeplake-schema.ts` and scope each regex to a single object
  literal so a later entry's SQL can't satisfy a regression match.

Smoke-tested end-to-end against the live `test_plugin` org on
2026-05-18: a `skills` table missing `contributors` triggered exactly
[INSERT-fail, SELECT info_schema (16 cols, no contributors), targeted
ALTER, INSERT-retry-OK]; a second call against the now-current schema
emitted exactly one SQL call (the INSERT).
Regenerated bundles for every agent (claude-code, codex, cursor,
hermes, pi) plus the MCP server and the CLI: they inline
`deeplake-schema.ts` and the new `healMissingColumns` flow, drop the
old per-column ensureColumn / contributors-only retry helpers, and
embed the canonical column lists.

Produced by `npm run build` (tsc + esbuild). No source-level changes
in this commit; the previous three commits are the actual refactor.

github-actions Bot commented May 18, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🟢 | Lines | 98.53% (🎯 90%) | 268 / 272 |
| 🟢 | Statements | 98.13% (🎯 90%) | 314 / 320 |
| 🟢 | Functions | 100.00% (🎯 90%) | 56 / 56 |
| 🔴 | Branches | 88.83% (🎯 90%) | 159 / 179 |

File Coverage — 4 files changed

| File | Stmts | Branches | Functions | Lines |
| --- | --- | --- | --- | --- |
| src/deeplake-api.ts | 🟢 98.7% | 🟢 90.3% | 🟢 100.0% | 🟢 99.5% |
| src/deeplake-schema.ts | 🟢 95.4% | 🔴 84.8% | 🟢 100.0% | 🟢 94.3% |
| src/skillify/skillify-worker.ts | | | | |
| src/skillify/skills-table.ts | 🟢 100.0% | 🔴 83.3% | 🟢 100.0% | 🟢 100.0% |

Generated for commit 749d6f6.


coderabbitai Bot commented May 18, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 770f7a4f-89bd-455f-8d4f-12f7d92b5cb2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This PR refactors Deeplake table schema management from ad-hoc, marker-gated per-column migrations to a centralized schema-driven approach. A new deeplake-schema.ts module defines canonical column specifications and introduces healMissingColumns for targeted, introspection-based column healing. DeeplakeApi table provisioning is updated to use schema-generated CREATE TABLE SQL and healSchema instead of per-column ensure helpers. The pattern is applied consistently across source TypeScript, all bundled deployment variants, and comprehensive test coverage.

Changes

Core Schema Refactoring

| Layer / File(s) | Summary |
| --- | --- |
| Schema definition module: src/deeplake-schema.ts, tests/claude-code/deeplake-schema.test.ts | New module exports frozen column definition constants for MEMORY_COLUMNS, SESSIONS_COLUMNS, SKILLS_COLUMNS with module-load validation ensuring valid SQL identifiers and NOT NULL/DEFAULT pairing. Provides buildCreateTableSql and healMissingColumns which introspects information_schema.columns and issues targeted ALTER TABLE ADD COLUMN only for missing columns, with race-condition tolerance and error classification helpers. Comprehensive unit tests validate column rules, SQL generation, healing scenarios, and error handling. |
| DeeplakeApi refactor: src/deeplake-api.ts, tests/claude-code/deeplake-api.test.ts | Imports schema utilities and removes marker-gated per-column migration methods. ensureTable, ensureSessionsTable, ensureSkillsTable now create tables using buildCreateTableSql with schema column definitions and call healSchema on existing tables. Memory table adds runtime drift check ensuring summary_embedding is present. API integration tests rewritten to validate CREATE with canonical schema on new tables, single information_schema SELECT with targeted ALTER on existing tables, and race handling. |
| Skills table migration: src/skillify/skills-table.ts, tests/claude-code/skillify-skills-table.test.ts | Imports schema utilities and removes inline SQL helpers (createSkillsTableSql, addContributorsColumnSql). InsertSkillRowArgs gains required workspaceId field. insertSkillRow error handling changes to create missing tables via buildCreateTableSql(SKILLS_COLUMNS) and heal missing columns via healMissingColumns, replacing contributor-column-specific recovery. Tests validate SQL escaping, missing-table recovery, missing-column healing with targeted ALTERs, race tolerance, and injection resistance. |
| Worker integration: src/skillify/skillify-worker.ts | Passes workspaceId: cfg.workspaceId into insertSkillRow calls, enabling schema introspection/healing to target correct workspace catalog. |

Test Coverage

| Layer / File(s) | Summary |
| --- | --- |
| Schema scenarios: tests/claude-code/schema-scenarios.test.ts, tests/claude-code/embeddings-schema.test.ts | Schema upgrade scenarios updated to expect one information_schema.columns SELECT per table followed by targeted ALTERs only for missing columns. Greenfield scenarios expect only CREATE; legacy/mixed expect SELECT + missing-column ALTERs; fully-migrated expect SELECT without ALTERs. Embedding column tests validate definitions by reading canonical schema source instead of API strings. |

Bundled Deployments

| Layer / File(s) | Summary |
| --- | --- |
| CLI, Claude-code, Codex, Cursor, Hermes, MCP, and PI bundles: bundle/cli.js, **/bundle/capture.js, **/bundle/commands/auth-login.js, **/bundle/pre-tool-use.js, **/bundle/session-start-setup.js, **/bundle/session-start.js, **/bundle/shell/deeplake-shell.js, **/bundle/skillify-worker.js, **/bundle/stop.js, mcp/bundle/server.js | All bundles contain embedded deeplake-schema sections defining column specs, validation, SQL builders, and healMissingColumns logic. Each bundle introduces DeeplakeApi.healSchema wrapper and updates ensureTable, ensureSessionsTable, ensureSkillsTable to create via buildCreateTableSql and heal existing tables via healSchema. Skillify workers pass workspaceId to insertSkillRow. Constants updated (e.g., MESSAGE_EMBEDDING_COL, SUMMARY_EMBEDDING_COL in shell bundles). |

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant healMissingColumns
  participant informationSchema as information_schema
  participant alterTable as ALTER TABLE
  Caller->>healMissingColumns: query, tableName, workspaceId, columns
  healMissingColumns->>informationSchema: SELECT existing columns
  informationSchema-->>healMissingColumns: column_name list
  healMissingColumns->>healMissingColumns: diff against expected
  alt all columns present
    healMissingColumns-->>Caller: {missing:[], altered:[]}
  else missing columns found
    loop for each missing column
      healMissingColumns->>alterTable: ALTER TABLE ADD COLUMN
      alt success
        alterTable-->>healMissingColumns: ✓
      else already exists error
        healMissingColumns->>informationSchema: re-check presence
        alt confirmed present
          informationSchema-->>healMissingColumns: found
        else still missing
          healMissingColumns-->>Caller: ✗ error (re-throw)
        end
      end
    end
    healMissingColumns-->>Caller: {missing:[...], altered:[...]}
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The refactor is substantial and repetitive across many bundled variants (homogeneous pattern), but involves intricate schema-healing logic with race conditions, multiple test suite rewrites, and subtle behavioral changes to table provisioning that demand careful verification of correctness across each component.

Possibly related PRs

  • activeloopai/hivemind#120: Adds plugin_version column via per-column ensureColumn; this PR refactors that same initialization/healing code path into centralized deeplake-schema + healSchema, directly overlapping on schema/migration logic.
  • activeloopai/hivemind#98: Introduces ensureSkillsTable for skills table provisioning; this PR refactors its schema-healing behavior to use the new centralized SKILLS_COLUMNS + healSchema approach.
  • activeloopai/hivemind#119: Adjusts skill worker Deeplake insert payload (createdAt/updatedAt threading); this PR refactors the same worker's schema/healing logic and workspace-scoped introspection.

Suggested reviewers

  • kaghni

🐰 Schemas now heal with targeted precision,
No more per-column migrations—just introspection,
One SELECT finds gaps, one ALTER fills the slots,
And races bow before a humble re-check.
The bundled variants march in unified rhythm.


@coderabbitai coderabbitai Bot requested a review from kaghni May 18, 2026 08:59

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cursor/bundle/session-start.js (1)

1548-1566: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Auto-pull still bypasses the new skills-table heal path.

Line 1552 creates a DeeplakeApi, but this path goes straight to runPull() without calling ensureSkillsTable(). On a drifted legacy skills table, read-only clients will still fail on any missing column other than contributors, and autoPullSkills() then just swallows that as reason: "error" until some writer happens to repair the schema.

Suggested fix
 async function autoPullSkills(deps = {}) {
   if (process.env.HIVEMIND_AUTOPULL_DISABLED === "1") {
     log4("disabled via HIVEMIND_AUTOPULL_DISABLED=1");
     return { pulled: 0, skipped: true, reason: "disabled" };
   }
   const loadFn = deps.loadConfigFn ?? loadConfig;
   const config = loadFn();
   if (!config) {
     log4("skipped: not logged in");
     return { pulled: 0, skipped: true, reason: "not-logged-in" };
   }
   let query;
+  let ensureSkillsTable;
   if (deps.queryFn) {
     query = deps.queryFn;
   } else {
     const api = new DeeplakeApi(config.token, config.apiUrl, config.orgId, config.workspaceId, config.skillsTableName);
     query = (sql) => api.query(sql);
+    ensureSkillsTable = () => api.ensureSkillsTable(config.skillsTableName);
   }
   const install = deps.install ?? "global";
   const timeoutMs = deps.timeoutMs ?? DEFAULT_TIMEOUT_MS;
   try {
-    const summary = await withTimeout(runPull({
-      query,
-      tableName: config.skillsTableName,
-      install,
-      cwd: install === "project" ? deps.cwd ?? process.cwd() : void 0,
-      users: [],
-      dryRun: false,
-      force: false
-    }), timeoutMs);
+    const summary = await withTimeout((async () => {
+      await ensureSkillsTable?.();
+      return runPull({
+        query,
+        tableName: config.skillsTableName,
+        install,
+        cwd: install === "project" ? deps.cwd ?? process.cwd() : void 0,
+        users: [],
+        dryRun: false,
+        force: false
+      });
+    })(), timeoutMs);
     log4(`pulled scanned=${summary.scanned} wrote=${summary.wrote} skipped=${summary.skipped}`);
     return { pulled: summary.wrote, skipped: false };
   } catch (e) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cursor/bundle/session-start.js` around lines 1548 - 1566, The auto-pull path
constructs a DeeplakeApi and then calls runPull without first invoking the
skills-table migration/validation, so legacy-read-only clients can fail on
missing columns; before calling runPull in the branch that creates DeeplakeApi,
call the existing ensureSkillsTable (or the migration helper used elsewhere) for
config.skillsTableName using that DeeplakeApi instance (or call
autoPullSkills()/ensureSkillsTable-like helper) and only proceed to runPull
after ensureSkillsTable completes successfully, keeping the same timeout
handling and passing the same install/cwd/users/dryRun/force parameters.
🧹 Nitpick comments (2)
src/deeplake-schema.ts (1)

134-137: ⚡ Quick win

Sanitize ColumnDef.name where the SQL is emitted.

validateSchema() only protects the three built-in arrays at module load. These helpers are exported and accept arbitrary columns, so a future caller can bypass identifier validation for column names even though tableName is already sanitized.

🔒 Proposed fix
 export function buildCreateTableSql(tableName: string, cols: readonly ColumnDef[]): string {
   const safe = sqlIdent(tableName);
-  const colSql = cols.map(c => `${c.name} ${c.sql}`).join(", ");
+  const colSql = cols.map(c => `${sqlIdent(c.name)} ${c.sql}`).join(", ");
   return `CREATE TABLE IF NOT EXISTS "${safe}" (${colSql}) USING deeplake`;
 }
@@
   const altered: string[] = [];
   for (const col of missingCols) {
     try {
-      await args.query(`ALTER TABLE "${safeTable}" ADD COLUMN ${col.name} ${col.sql}`);
+      const safeColumn = sqlIdent(col.name);
+      await args.query(`ALTER TABLE "${safeTable}" ADD COLUMN ${safeColumn} ${col.sql}`);
       altered.push(col.name);

Also applies to: 196-216

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/deeplake-schema.ts` around lines 134 - 137, The code emits raw
ColumnDef.name into SQL; update buildCreateTableSql (and the related helper
around lines 196-216, e.g., the alter/create helpers) to sanitize column
identifiers by passing c.name through sqlIdent (or the module's
identifier-quoting helper) instead of interpolating c.name directly; build each
column fragment as `${sqlIdent(c.name)} ${c.sql}` (or equivalent) so column
names are properly escaped/quoted while preserving c.sql for the
type/constraints.
mcp/bundle/server.js (1)

23806-23821: 💤 Low value

Drift check only runs for existing tables.

The SUMMARY_EMBEDDING_COL validation at lines 23819-23821 only executes when the table already exists. If the schema constant drifts, a freshly created table would silently have the wrong schema. Consider moving this guard before the table-existence branch or validating it once at module load alongside the validateSchema calls.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp/bundle/server.js` around lines 23806 - 23821, The SUMMARY_EMBEDDING_COL
presence check currently runs only for existing tables inside ensureTable; move
or add that guard so it runs before the create-or-heal branch (or validate once
at module load) to prevent creating a new table with a drifted MEMORY_COLUMNS
schema. Specifically, in ensureTable (or during module init), assert
MEMORY_COLUMNS.some(c => c.name === SUMMARY_EMBEDDING_COL) and throw the same
Error if missing; this check should occur prior to calling
listTables/createTableWithRetry/healSchema and should use the same
SUMMARY_EMBEDDING_COL, MEMORY_COLUMNS, and createTableWithRetry/healSchema
symbols so the code fails fast on schema drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bundle/cli.js`:
- Around line 4509-4518: The ensure* table functions currently return
immediately after creating a table when listTables() didn't include it, which
skips the subsequent healSchema call; change the flow so the post-create path
falls through to call healSchema(tbl, MEMORY_COLUMNS) (i.e., remove the early
return in the create branch, ensure _tablesCache is updated to include tbl if
needed, then call healSchema), and apply the same fallthrough fix to
ensureSessionsTable and ensureSkillsTable so the schema heal runs even when
CREATE TABLE IF NOT EXISTS was a no-op.

In `@claude-code/bundle/session-start.js`:
- Around line 603-615: The early-return after the CREATE attempt in ensureTable
can skip healSchema when another worker created the table concurrently; change
the flow so that after calling createTableWithRetry(buildCreateTableSql(...),
tbl) you do NOT return immediately but refresh/compute the current table name
list and always call healSchema(tbl, MEMORY_COLUMNS) (and only then update
this._tablesCache if needed); apply the same pattern to ensureSessionsTable and
ensureSkillsTable so healSchema always runs even when CREATE TABLE IF NOT EXISTS
was a no-op.

In `@claude-code/bundle/shell/deeplake-shell.js`:
- Around line 67283-67298: In ensureTable, the invariant that MEMORY_COLUMNS
includes SUMMARY_EMBEDDING_COL must run before the early-return create branch;
move the drift guard that checks SUMMARY_EMBEDDING_COL up so it executes before
the "if (!tables.includes(tbl)) { ... return; }" block in ensureTable, and throw
the same Error if the column is missing; this ensures createTableWithRetry
cannot run for a fresh table when MEMORY_COLUMNS has drifted.

In `@codex/bundle/shell/deeplake-shell.js`:
- Around line 67287-67295: The create-branch currently returns after creating
the table (in the block that calls createTableWithRetry(buildCreateTableSql(tbl,
MEMORY_COLUMNS), tbl)), which skips the subsequent healSchema(tbl,
MEMORY_COLUMNS) and leaves concurrently-created older schemas unhealed; remove
the early return so the flow falls through and always calls healSchema(tbl,
MEMORY_COLUMNS) after the CREATE path. Make the identical change in the other
two variants referenced (the sessions and skills variants around the same
pattern), ensuring that after calling createTableWithRetry(...) you update
_tablesCache as needed but do not return—allow execution to proceed to
healSchema(...) in createTableWithRetry, sessions, and skills code paths.

In `@cursor/bundle/shell/deeplake-shell.js`:
- Around line 67287-67295: The current branch skips healSchema(tbl,
MEMORY_COLUMNS) when createTableWithRetry is called but another process created
the table first; always run the schema healing pass after the CREATE attempt.
Change the flow in the block that checks tables/includes tbl so that after
awaiting createTableWithRetry(buildCreateTableSql(tbl, MEMORY_COLUMNS), tbl) you
always call await this.healSchema(tbl, MEMORY_COLUMNS) (and then update
this._tablesCache = [...tables, tbl] as needed), ensuring the same adjustment is
applied to the other identical blocks (the ones around
createTableWithRetry/buildCreateTableSql/MEMORY_COLUMNS at the referenced
locations) so healSchema runs regardless of whether the CREATE was a no-op.

In `@hermes/bundle/commands/auth-login.js`:
- Around line 780-791: The ensureTable implementation currently returns
immediately after creating a missing table and thus skips calling healSchema,
which can leave an older schema if CREATE TABLE was a no-op; update ensureTable
so that after awaiting createTableWithRetry(buildCreateTableSql(tbl,
MEMORY_COLUMNS), tbl) you still update this._tablesCache as needed and then call
await this.healSchema(tbl, MEMORY_COLUMNS) before returning; apply the same
change in the other create-branch locations (the blocks around the other
ensureTable-like branches mentioned) so createTableWithRetry, listTables,
_tablesCache and healSchema are used consistently.

In `@hermes/bundle/session-start.js`:
- Around line 594-608: ensureTable currently skips healSchema and the
SUMMARY_EMBEDDING_COL check when the cached listTables() shows the table missing
and createTableWithRetry becomes a no-op; change the flow so that after
attempting creation (via createTableWithRetry) you always call healSchema(tbl,
MEMORY_COLUMNS) and perform the SUMMARY_EMBEDDING_COL check regardless of
whether the table was believed to exist beforehand. Apply the same pattern to
ensureSessionsTable() and ensureSkillsTable(): always run healSchema(...) and
validate MEMORY_COLUMNS (including SUMMARY_EMBEDDING_COL) after the
create-or-exists attempt, and update the _tablesCache only for cache maintenance
but not to gate schema healing.

In `@hermes/bundle/shell/deeplake-shell.js`:
- Around line 67287-67298: The drift check for SUMMARY_EMBEDDING_COL is skipped
for the create-path return; move or duplicate the assertion so that after
creating a table via createTableWithRetry(buildCreateTableSql(tbl,
MEMORY_COLUMNS), tbl) (and before returning) you run the same schema drift check
that follows healSchema: verify MEMORY_COLUMNS.some(c => c.name ===
SUMMARY_EMBEDDING_COL) and throw the Error if missing; ensure the check still
runs for the existing-table path that calls healSchema(tbl, MEMORY_COLUMNS) so
both new and existing tables validate the embedding column (references:
SUMMARY_EMBEDDING_COL, MEMORY_COLUMNS, createTableWithRetry,
buildCreateTableSql, healSchema, _tablesCache).

In `@pi/bundle/skillify-worker.js`:
- Around line 474-477: The current branch that handles isMissingTableError(msg)
immediately runs CREATE TABLE IF NOT EXISTS via
buildCreateTableSql(args.tableName, SKILLS_COLUMNS) then retries the INSERT
(args.query(sql)), which can still fail if another worker created an older table
missing columns; change the flow so that after running the CREATE, you do not
blindly retry the INSERT — instead re-run the INSERT and if it fails with a
missing-column error delegate to the existing missing-column recovery logic (the
same handler used elsewhere), or explicitly check the table schema for missing
SKILLS_COLUMNS and apply the missing-column migration before retrying; locate
isMissingTableError, buildCreateTableSql, SKILLS_COLUMNS and the args.query(sql)
retry to implement this handoff.

In `@src/deeplake-api.ts`:
- Around line 402-416: The code currently bails out after creating a table when
listTables() (which is cached) didn't include tbl, so a stale cache plus CREATE
TABLE IF NOT EXISTS can leave legacy columns unhealed; modify
ensureMemoryTable/ensureSessionsTable/ensureSkillsTable so that after calling
createTableWithRetry(buildCreateTableSql(tbl, ...), tbl) you do NOT return
immediately but instead refresh/update the tables cache (update
this._tablesCache to include tbl) and call await this.healSchema(tbl,
<appropriate_COLUMNS>) unconditionally (or re-check existence and then call
healSchema) so the schema healing always runs after the create attempt;
reference methods: listTables, createTableWithRetry, buildCreateTableSql,
healSchema, and the _tablesCache field.

In `@src/skillify/skills-table.ts`:
- Around line 96-100: The lazy-create flow uses buildCreateTableSql and then
immediately retries the INSERT, which can still fail if another worker created
an older schema; after awaiting args.query(buildCreateTableSql(args.tableName,
SKILLS_COLUMNS)) invoke the schema-heal step to ensure the table has all columns
defined in SKILLS_COLUMNS (e.g., run the existing schema migration/alter logic
or an ensureColumns function for args.tableName) before retrying args.query(sql)
once; keep using isMissingTableError to gate this path and still only retry the
insert a single time.

---

Outside diff comments:
In `@cursor/bundle/session-start.js`:
- Around line 1548-1566: The auto-pull path constructs a DeeplakeApi and then
calls runPull without first invoking the skills-table migration/validation, so
legacy-read-only clients can fail on missing columns; before calling runPull in
the branch that creates DeeplakeApi, call the existing ensureSkillsTable (or the
migration helper used elsewhere) for config.skillsTableName using that
DeeplakeApi instance (or call autoPullSkills()/ensureSkillsTable-like helper)
and only proceed to runPull after ensureSkillsTable completes successfully,
keeping the same timeout handling and passing the same
install/cwd/users/dryRun/force parameters.

---

Nitpick comments:
In `@mcp/bundle/server.js`:
- Around line 23806-23821: The SUMMARY_EMBEDDING_COL presence check currently
runs only for existing tables inside ensureTable; move or add that guard so it
runs before the create-or-heal branch (or validate once at module load) to
prevent creating a new table with a drifted MEMORY_COLUMNS schema. Specifically,
in ensureTable (or during module init), assert MEMORY_COLUMNS.some(c => c.name
=== SUMMARY_EMBEDDING_COL) and throw the same Error if missing; this check
should occur prior to calling listTables/createTableWithRetry/healSchema and
should use the same SUMMARY_EMBEDDING_COL, MEMORY_COLUMNS, and
createTableWithRetry/healSchema symbols so the code fails fast on schema drift.

In `@src/deeplake-schema.ts`:
- Around line 134-137: The code emits raw ColumnDef.name into SQL; update
buildCreateTableSql (and the related helper around lines 196-216, e.g., the
alter/create helpers) to sanitize column identifiers by passing c.name through
sqlIdent (or the module's identifier-quoting helper) instead of interpolating
c.name directly; build each column fragment as `${sqlIdent(c.name)} ${c.sql}`
(or equivalent) so column names are properly escaped/quoted while preserving
c.sql for the type/constraints.
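
To make the suggestion above concrete, here is a minimal sketch of building each column fragment through an identifier check. `safeIdent` is a hypothetical stand-in for the repo's `sqlIdent` helper (whose exact rules are not shown in this thread), and `buildCreateTableSqlSketch` is an illustrative name, not the actual `buildCreateTableSql`.

```typescript
// Assumed shape of a column definition: a name plus its type/constraint SQL.
type ColumnDef = { name: string; sql: string };

// Hypothetical stand-in for sqlIdent: accept only plain identifiers and
// reject everything else instead of trying to escape it.
function safeIdent(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Invalid SQL identifier: ${JSON.stringify(name)}`);
  }
  return name;
}

// Every identifier (table and columns) passes through safeIdent;
// c.sql is kept verbatim for the type and constraints.
function buildCreateTableSqlSketch(table: string, cols: ColumnDef[]): string {
  const body = cols.map((c) => `${safeIdent(c.name)} ${c.sql}`).join(", ");
  return `CREATE TABLE IF NOT EXISTS "${safeIdent(table)}" (${body})`;
}
```

Rejecting unexpected names outright is one design choice among several; a quoting-based helper would be an equally valid reading of the review comment.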

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0860dccc-2902-4979-a980-51c5a8e6cb52

📥 Commits

Reviewing files that changed from the base of the PR and between 8ae7da6 and c5bc413.

📒 Files selected for processing (40)
  • bundle/cli.js
  • claude-code/bundle/capture.js
  • claude-code/bundle/commands/auth-login.js
  • claude-code/bundle/pre-tool-use.js
  • claude-code/bundle/session-start-setup.js
  • claude-code/bundle/session-start.js
  • claude-code/bundle/shell/deeplake-shell.js
  • claude-code/bundle/skillify-worker.js
  • codex/bundle/capture.js
  • codex/bundle/commands/auth-login.js
  • codex/bundle/pre-tool-use.js
  • codex/bundle/session-start-setup.js
  • codex/bundle/session-start.js
  • codex/bundle/shell/deeplake-shell.js
  • codex/bundle/skillify-worker.js
  • codex/bundle/stop.js
  • cursor/bundle/capture.js
  • cursor/bundle/commands/auth-login.js
  • cursor/bundle/pre-tool-use.js
  • cursor/bundle/session-start.js
  • cursor/bundle/shell/deeplake-shell.js
  • cursor/bundle/skillify-worker.js
  • hermes/bundle/capture.js
  • hermes/bundle/commands/auth-login.js
  • hermes/bundle/pre-tool-use.js
  • hermes/bundle/session-start.js
  • hermes/bundle/shell/deeplake-shell.js
  • hermes/bundle/skillify-worker.js
  • mcp/bundle/server.js
  • pi/bundle/autopull-worker.js
  • pi/bundle/skillify-worker.js
  • src/deeplake-api.ts
  • src/deeplake-schema.ts
  • src/skillify/skillify-worker.ts
  • src/skillify/skills-table.ts
  • tests/claude-code/deeplake-api.test.ts
  • tests/claude-code/deeplake-schema.test.ts
  • tests/claude-code/embeddings-schema.test.ts
  • tests/claude-code/schema-scenarios.test.ts
  • tests/claude-code/skillify-skills-table.test.ts

efenocchi added 3 commits May 18, 2026 17:33
Address CodeRabbit review on PR #177. Three coupled fixes, one root
cause: `listTables()` is cached, so `CREATE TABLE IF NOT EXISTS` can
silently no-op against a concurrently-created legacy table, and the
previous code skipped `healSchema()` on that path. The next read/write
then failed on missing columns.

Changes:

- `src/deeplake-api.ts` — `ensureTable`, `ensureSessionsTable`,
  `ensureSkillsTable` now run `healSchema(...)` unconditionally after
  the create/exists decision. Removed the early `return` on the CREATE
  branch in `ensureTable` and the `if-else` split in the other two.
  On a genuinely fresh CREATE the heal pass adds one SELECT
  info_schema (~250ms) and zero ALTERs; on a race-detected legacy
  table it ALTERs only the missing columns.

- `src/deeplake-api.ts` — moved the `SUMMARY_EMBEDDING_COL` drift
  guard to the very top of `ensureTable`, before any SQL. Previously
  it ran after `healSchema` and was skipped on the create-and-return
  path, so a fresh CREATE could materialise an incomplete schema if
  `MEMORY_COLUMNS` ever drifted from `embeddings/columns.ts`.

- `src/skillify/skills-table.ts` — `insertSkillRow`'s
  `isMissingTableError` branch now calls `healMissingColumns(...)`
  between the CREATE and the retry INSERT. Same race: a concurrent
  worker could land a legacy table between our INSERT failure and our
  CREATE no-op; without the heal pass, the retry INSERT would fail
  with the same missing-column error.
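
The create-then-always-heal flow described in the first bullet can be sketched as follows. This is a simplified illustration under stated assumptions: `TableStore`, `missingColumns`, and `ensureTableSketch` are hypothetical names, and the real `ensureTable` / `healSchema` in `src/deeplake-api.ts` issue SQL rather than calling an interface like this.

```typescript
// Assumed shape of a column definition (name plus type/constraint SQL).
type ColumnDef = { name: string; sql: string };

// Hypothetical abstraction over the SQL calls, so the flow can be shown in isolation.
interface TableStore {
  listTables(): Promise<string[]>; // may be served from a stale cache
  createIfNotExists(table: string, cols: ColumnDef[]): Promise<void>;
  existingColumns(table: string): Promise<string[]>;
  addColumn(table: string, col: ColumnDef): Promise<void>;
}

// Diff the canonical column list against what the table actually has.
function missingColumns(existing: string[], canonical: ColumnDef[]): ColumnDef[] {
  const have = new Set(existing);
  return canonical.filter((c) => !have.has(c.name));
}

// Create when the (possibly stale) listing says the table is absent, then heal
// unconditionally: there is deliberately no early return on the create branch.
async function ensureTableSketch(
  store: TableStore,
  table: string,
  canonical: ColumnDef[],
): Promise<string[]> {
  const tables = await store.listTables();
  if (!tables.includes(table)) {
    // May silently no-op if another writer already created a legacy table.
    await store.createIfNotExists(table, canonical);
  }
  const missing = missingColumns(await store.existingColumns(table), canonical);
  for (const col of missing) {
    await store.addColumn(table, col);
  }
  return missing.map((c) => c.name); // names that were added, for inspection
}
```

On a genuinely fresh table `missingColumns` returns an empty list, so the heal pass costs one column lookup and no ALTERs, which matches the cost described above.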

Tests updated to match the new sequence:
- `tests/claude-code/deeplake-api.test.ts` — fresh-CREATE paths now
  expect `listTables + CREATE + post-CREATE heal SELECT`. New
  regression: `heals after CREATE: race-detected legacy table gets
  ALTERed before returning`.
- `tests/claude-code/schema-scenarios.test.ts` — scenarios 1, 3, 4
  updated for the unconditional post-CREATE heal SELECT.
- `tests/claude-code/skillify-skills-table.test.ts` — the
  missing-table path now asserts `INSERT-fail → CREATE → SELECT
  info_schema → INSERT-retry`. New regression:
  `CREATE-then-INSERT race: lazy-create no-ops vs legacy table → heal
  pass adds the missing column`.
- `tests/claude-code/deeplake-api-retry.test.ts` — fix Response
  body-reuse error after the extra heal SELECT was added.

Smoke re-run against the live `test_plugin` org with the same legacy
schema repro: the worker still emits exactly INSERT-fail → SELECT
info_schema → ALTER → INSERT-retry, and a follow-up call on the
healed table emits a single INSERT.

Full test suite: 2601 / 2601.
Regenerated bundles for every agent (claude-code, codex, cursor,
hermes, pi) plus MCP server and CLI. Bundles now inline the
unconditional post-CREATE heal pass in `ensureTable` /
`ensureSessionsTable` / `ensureSkillsTable` and the worker's
CREATE-then-heal-then-retry path in `insertSkillRow`.

Produced by `npm run build`. No further source changes in this commit.
…bles

Mirror the existing race regression test from `ensureTable` to
`ensureSessionsTable` and `ensureSkillsTable`. The source already calls
`healSchema` unconditionally on all three paths (commit a2700c8), but
only the memory table had an explicit test proving the heal pass repairs
a race-detected legacy table; sessions and skills were validated only by
code-shape parity.

Each new test feeds the mock fetch sequence:
  listTables(empty) → CREATE(no-op vs legacy) → heal SELECT(legacy
  schema missing one column) → ALTER → CREATE INDEX

and asserts the targeted ALTER fires before `ensureLookupIndex` returns.
Closes the [P2] coverage gap codex flagged on PR #177.
// otherwise break CREATE TABLE / CREATE INDEX startup).
const safe = sqlIdent(tableName);
return (
`CREATE TABLE IF NOT EXISTS "${safe}" (` +
Collaborator

@efenocchi was table creation moved to somewhere else? did you cover the case of creating the first skill? so the case when there is no skills table

Collaborator Author

Table creation is still in skills-table.ts, inline in insertSkillRow (the lazy-create branch). First-skill flow:

  1. INSERT fires (line 92)
  2. catches isMissingTableError (line 96) — table truly does not exist yet
  3. CREATE TABLE IF NOT EXISTS (line 104)
  4. healMissingColumns heals the schema in case another worker raced us with a legacy CREATE between our INSERT and our own CREATE (lines 105–110)
  5. retry the INSERT (line 111)

Covered in tests/claude-code/skillify-skills-table.test.ts: on first INSERT failing because the table is missing: CREATE → heal pass → retry INSERT (line 99) plus the race regression CREATE-then-INSERT race: lazy-create no-ops vs legacy table → heal pass adds the missing column (line 121).
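
The five steps above can be sketched as a small helper. This is an illustration only: `insertWithLazyCreate` and `looksLikeMissingTable` are hypothetical names, the error-message pattern is an assumption, and the real `insertSkillRow` / `isMissingTableError` are not reproduced here.

```typescript
type Query = (sql: string) => Promise<unknown>;

// Hypothetical classifier with an assumed message format. Like the real
// isMissingTableError, it must not treat permission errors as heal-able.
function looksLikeMissingTable(message: string): boolean {
  return /table .* does not exist/i.test(message) && !/permission denied/i.test(message);
}

async function insertWithLazyCreate(args: {
  query: Query;
  insertSql: string;
  createSql: string; // expected to be a CREATE TABLE IF NOT EXISTS statement
  heal: () => Promise<void>; // adds any columns a legacy table is missing
}): Promise<void> {
  try {
    await args.query(args.insertSql); // 1. optimistic INSERT
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (!looksLikeMissingTable(message)) throw err; // 2. only the missing-table case
    await args.query(args.createSql); // 3. may no-op if another worker raced us
    await args.heal(); // 4. repair a legacy schema before retrying
    await args.query(args.insertSql); // 5. single retry; a second failure propagates
  }
}
```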

// retry would otherwise fail with the same missing-column error.
// healMissingColumns is cheap on a freshly-created table (1
// SELECT info_schema, 0 ALTERs).
await args.query(buildCreateTableSql(args.tableName, SKILLS_COLUMNS));
Collaborator

@efenocchi ok you moved it here seems like, but pls double check and confirm

Collaborator Author

Confirmed. Table creation is still inline in insertSkillRow, at this exact line. Flow: INSERT → catch isMissingTableError → CREATE TABLE IF NOT EXISTS here (104) → healMissingColumns (105–110) → retry INSERT (111). Nothing was moved out of this file; the diff just inserts the explicit heal pass between CREATE and the retry, to close the listTables()-stale race CodeRabbit flagged on the parallel call sites in deeplake-api.ts.

Collaborator

All tests are in claude code folder

Collaborator Author

This file is in the right folder per the current convention. PR #133 (chore(tests): relocate agent-specific tests to tests/<agent>/) carved out tests/<agent>/ only for tests that exercise src/hooks/<agent>/*. Shared infrastructure tests (deeplake-api*, deeplake-schema*, skillify-*, embeddings-*, etc.) intentionally stayed in tests/claude-code/, and PR #133's own body flagged the further move of those ~70 shared tests to a neutral location as an out-of-scope follow-up. Happy to file it as a separate cleanup PR if you want, though it's out of scope here.

Collaborator

No, it should be moved, since they are not only for claude code but for other agents too. @coderabbitai do we have a folder for it, or should a new generic one be created?


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Show the top-level tests directory structure
tree tests/ -L 2 2>/dev/null || find tests/ -maxdepth 2 -type d | sort

echo "---"
# List all test files in the root of tests/ (non-agent-specific)
ls tests/ 2>/dev/null

echo "---"
# Show all subdirectories under tests/
find tests/ -mindepth 1 -maxdepth 1 -type d | sort

Length of output: 5025


@kaghni There's no existing generic folder yet — the current subdirectories are all agent-specific (claude-code, codex, cursor, hermes, pi, openclaw) plus cli. The shared infra tests would need a new home.

A common convention for this pattern is tests/shared/ (clear, low-surprise) or tests/core/ (common in monorepos). Given the existing naming style, I'd suggest tests/shared/. The candidates to move there would be at minimum:

| File pattern | Reason it's shared |
| --- | --- |
| deeplake-api*.test.ts | Tests src/deeplake-api.ts (used by all agents) |
| deeplake-schema.test.ts | Tests src/deeplake-schema.ts (used by all agents) |
| embeddings-schema.test.ts, embeddings-*.test.ts | Shared embeddings infra |
| schema-scenarios.test.ts | Cross-agent schema migration scenarios |
| skillify-skills-table.test.ts | Tests src/skillify/skills-table.ts |
| sql.test.ts, index-marker-store.test.ts, etc. | Shared utilities |

Since this is out of scope for the current PR, would you like me to open a GitHub issue to track the move to tests/shared/?


🧬 Code Graph Analysis Results

[src/deeplake-api.ts] createTableWithRetry (private async createTableWithRetry(sql: string, label: string): Promise)

  • Summary:
    • Purpose: Perform a CREATE TABLE with an additional outer retry budget on transient failures.
    • Parameters:
      • sql: string — the CREATE TABLE SQL statement to execute.
      • label: string — a human-friendly label used in logging.
    • Returns: Promise (resolves on successful CREATE; rejects with the last error after all retries).
    • Error handling:
      • Retries up to OUTER_BACKOFFS_MS.length + 1 total attempts with backoffs [2000, 5000, 10000] ms.
      • On each attempt failure, logs the attempt and error message, then sleeps before the next attempt (except after the final attempt).
      • If all attempts fail, throws the last encountered error (lastErr).
    • Important implementation details:
      • Uses this.query(sql) to perform the CREATE TABLE.
      • Maintains lastErr to propagate the final failure after retries.
      • Logging includes the attempt number and error message for easier debugging.
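
As a rough illustration of the retry budget summarized above, the sketch below retries an operation with the listed backoffs and rethrows the last error. `createWithRetry` and its injectable `sleep` / `log` parameters are hypothetical; the actual `createTableWithRetry` is a private method that calls `this.query(sql)`.

```typescript
async function createWithRetry(
  run: () => Promise<void>, // stands in for this.query(createSql)
  label: string,
  backoffsMs: number[] = [2000, 5000, 10000],
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  log: (msg: string) => void = (msg) => console.error(msg),
): Promise<void> {
  let lastErr: unknown;
  const attempts = backoffsMs.length + 1; // one initial try plus one per backoff
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      await run();
      return;
    } catch (err) {
      lastErr = err;
      const message = err instanceof Error ? err.message : String(err);
      log(`CREATE ${label} attempt ${attempt + 1}/${attempts} failed: ${message}`);
      const delay = backoffsMs[attempt];
      if (delay !== undefined) await sleep(delay); // no sleep after the final attempt
    }
  }
  throw lastErr; // propagate the last failure once the budget is exhausted
}
```

Injecting `sleep` keeps the sketch testable without real delays; whether the real method does the same is not stated in the summary.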

[src/deeplake-api.ts] ensureTable(name?: string)

  • Summary:
    • Purpose: Ensure the specified memory table exists; if absent, create it with a retry mechanism and then heal the schema.
    • Parameters:
      • name?: string — optional table name; if omitted, uses the instance’s tableName.
    • Returns: Promise (resolves after ensuring creation and healing).
    • Error handling:
      • Validates drift protection up front, derives a safe table name via sqlIdent, and checks existing tables via listTables.
      • If the table is missing, calls createTableWithRetry(buildCreateTableSql(tbl, MEMORY_COLUMNS), tbl) to create it, then updates an internal cache if needed.
      • Regardless of creation, runs healSchema(tbl, MEMORY_COLUMNS) to align the schema.
    • Important implementation details:
      • Uses MEMORY_COLUMNS to define the table schema during creation.
      • Uses listTables to avoid races where another writer creates the table concurrently.
      • The final heal step ensures the table’s columns match the canonical schema, ensuring consistent behavior across writers.
      • Note: There is a comment indicating that BM25 indexing is currently disabled for fresh tables and not re-enabled here.

efenocchi added 2 commits May 19, 2026 17:44
…-all-tables

# Conflicts:
#	bundle/cli.js
#	claude-code/bundle/capture.js
#	claude-code/bundle/commands/auth-login.js
#	claude-code/bundle/pre-tool-use.js
#	claude-code/bundle/session-start-setup.js
#	claude-code/bundle/session-start.js
#	claude-code/bundle/shell/deeplake-shell.js
#	codex/bundle/capture.js
#	codex/bundle/commands/auth-login.js
#	codex/bundle/pre-tool-use.js
#	codex/bundle/session-start-setup.js
#	codex/bundle/session-start.js
#	codex/bundle/shell/deeplake-shell.js
#	codex/bundle/stop.js
#	cursor/bundle/capture.js
#	cursor/bundle/commands/auth-login.js
#	cursor/bundle/pre-tool-use.js
#	cursor/bundle/session-start.js
#	cursor/bundle/shell/deeplake-shell.js
#	hermes/bundle/capture.js
#	hermes/bundle/commands/auth-login.js
#	hermes/bundle/pre-tool-use.js
#	hermes/bundle/session-start.js
#	hermes/bundle/shell/deeplake-shell.js
#	mcp/bundle/server.js
#	pi/bundle/autopull-worker.js
#	src/deeplake-api.ts
The 90% per-file branch threshold for `src/deeplake-api.ts` broke after
the merge of `main`: `maybeSignalBalanceExhausted` enqueues a banner with
`enqueueNotification({...}).catch(e => log(...))`, and the resolved-path
was the only one exercised. The `.catch` branch was uncovered, taking
the file from 91.0% to 89.55% — below the gate.

Add one regression in `tests/claude-code/deeplake-api-balance-exhausted.test.ts`
that overrides the default `mockResolvedValue(undefined)` with
`mockRejectedValueOnce(...)`, drives the same 402 path, and asserts:
  * the original `Query failed: 402` still propagates to the caller
  * the enqueue rejection is silently logged (not re-thrown, not surfaced
    as an unhandled rejection)

Brings branch coverage back to 90.29% (over the 90% threshold), and pins
the fire-and-forget semantics so a future refactor that drops the
`.catch` would fail this test instead of letting the rejection escape.
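
The fire-and-forget semantics this test pins can be sketched in isolation. `signalThenFail` is a hypothetical reduction of the 402 path, not the real `maybeSignalBalanceExhausted`; the point is only that the enqueue promise is not awaited and its rejection is logged rather than re-thrown.

```typescript
// Hypothetical reduction: enqueue a notification without awaiting it,
// then surface the original query failure to the caller.
function signalThenFail(
  enqueue: () => Promise<void>,
  log: (msg: string) => void,
): never {
  // Not awaited: a rejection is swallowed into the log, so it can neither
  // replace the original error nor become an unhandled rejection.
  enqueue().catch((e) =>
    log(`enqueue failed: ${e instanceof Error ? e.message : String(e)}`),
  );
  throw new Error("Query failed: 402");
}
```

Dropping the `.catch` in this sketch would let the rejection escape as an unhandled rejection, which is the regression the commit message describes guarding against.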
@efenocchi
Collaborator Author

@kaghni replied inline on all three of your review comments. Quick recap of where things stand:

  • Table creation hasn't moved out of skills-table.ts; it's still inline in insertSkillRow at line 104, inside the isMissingTableError catch. The diff only inserts healMissingColumns between CREATE and the retry INSERT, to close the listTables()-stale race CodeRabbit flagged.
  • First-skill case is covered: tests/claude-code/skillify-skills-table.test.ts exercises both on first INSERT failing because the table is missing: CREATE → heal pass → retry INSERT (line 99) and the race regression (line 121).
  • On tests/claude-code/ vs per-agent folders: this file lives there by the convention PR #133 (chore(tests): relocate agent-specific tests to tests/<agent>/) set: tests/<agent>/ is reserved for src/hooks/<agent>/* tests, and shared infra tests stay under tests/claude-code/ pending the ~70-file shared-tests relocation that PR #133's body flagged as an out-of-scope follow-up.

Also: CI was failing after the main merge because the coverage gate on src/deeplake-api.ts slipped from 91.0% → 89.55% (under the 90% per-file branches threshold). Pushed 6e9b1ca with a test that covers the .catch of enqueueNotification — back to 90.29%, all checks green now.

Let me know if anything else needs another pass.

efenocchi added 2 commits May 19, 2026 21:13
Addresses kaghni's review comment on PR #177
(`All tests are in claude code folder`). PR #183 carved out
`tests/shared/` as the new home for tests that exercise non-agent-
specific `src/` modules; this PR's tests all fit that bucket and were
sitting under `tests/claude-code/` by the old convention.

Moves the 7 tests this PR touches:

  tests/claude-code/deeplake-api-balance-exhausted.test.ts
  tests/claude-code/deeplake-api-retry.test.ts
  tests/claude-code/deeplake-api.test.ts
  tests/claude-code/deeplake-schema.test.ts
  tests/claude-code/embeddings-schema.test.ts
  tests/claude-code/schema-scenarios.test.ts
  tests/claude-code/skillify-skills-table.test.ts

→ tests/shared/<same names>

Mechanical move only:
  * Every import in every file is either `vitest`, `node:*`, or
    `../../src/...`. The relative depth from `tests/shared/` matches
    the prior depth from `tests/claude-code/`, so no path edits are
    needed.
  * No `__dirname` / `import.meta.url` / `process.cwd()` usage that
    would react to the new path.
  * The bundle-scan in `embeddings-schema.test.ts` reads files
    relative to CWD (`claude-code/bundle/...`), which vitest sets to
    the repo root regardless of which tests/ subdirectory the file
    lives in.

`tests/shared/` is already wired into `vitest.config.ts` test.include
(added by PR #183), and `vitest.config.ts` per-file coverage
thresholds key on source paths, not test paths.

Full suite (env-clean) post-move: 137/137 files, 2796/2796 tests.
@efenocchi efenocchi merged commit 9e06d52 into main May 19, 2026
4 checks passed