refactor(schema): lazy heal for all Deeplake tables (memory, sessions, skills)#177
Conversation
…helper
Introduce a single source of truth for the three Deeplake-managed tables
(memory, sessions, skills) as `{name, sql}` arrays in a new
`src/deeplake-schema.ts`. Both `CREATE TABLE` and the heal path are
derived from the same list, so adding a column means one edit.
Public surface:
- `MEMORY_COLUMNS`, `SESSIONS_COLUMNS`, `SKILLS_COLUMNS` — frozen arrays.
- `buildCreateTableSql(table, cols)` — renders the canonical CREATE.
- `healMissingColumns({ query, tableName, workspaceId, columns })` — one
SELECT against `information_schema.columns`, diff in memory, then
`ALTER TABLE ADD COLUMN` only the truly missing columns. ALTER without
`IF NOT EXISTS` so real failures surface; the single tolerated race
("already exists" from a concurrent writer) is caught and re-verified
via a second SELECT.
- `isMissingTableError` / `isMissingColumnError` — generic classifiers
used by the worker's INSERT retry path (next commit). Both exclude
`permission denied` to avoid masking auth problems.
Module-load lint rejects any `NOT NULL` column lacking `DEFAULT` —
ALTER on a populated table needs something to backfill with.
Background on the design: an earlier path used a local marker file
(`col_<name>` under the index-marker dir) to skip even the SELECT after
the first confirmation, motivated by a Deeplake post-ALTER bug (~30s
window of failing INSERTs after every ALTER). Re-probed against
`api.deeplake.ai` on 2026-05-18 in the `test_plugin` org: bug no longer
reproduces (71/71 INSERTs OK, first success 2ms after ALTER). Marker
removed; the SELECT-first pattern survives because each ALTER still
costs ~800ms and a targeted diff produces clearer logs.
Tests cover the lint, builder identifier injection, the heal flow
(no-op, single missing, multiple missing in schema order, race
tolerance with re-SELECT, non-race failure propagation,
case-insensitive catalog match), and both error classifiers.
…t contributors Drop the hard-coded `contributors`-only retry. When the worker's INSERT fails with a missing-column error (any column name), run a single heal pass over the full SKILLS_COLUMNS schema via the helper introduced in the previous commit: SELECT info_schema, diff, ALTER only the genuinely missing columns, retry the INSERT once. Why this matters: every time a new column lands in the skills schema (contributors was an early example, more will follow), pre-existing tables on workspaces that haven't seen a fresh INSERT yet would otherwise drop INSERTs forever. With this change, the first worker that lands on a stale table heals it surgically. Semantics: - `isMissingTableError` → CREATE TABLE with canonical schema, retry INSERT. - `isMissingColumnError` → heal pass, retry INSERT. If the heal pass reports `missing: []` (the failing column isn't in our schema), the original error propagates rather than looping. - A race where another writer healed the column between our SELECT and our ALTER is tolerated transparently (`healMissingColumns` handles it via re-SELECT). - No local marker — the worker is short-lived and the INSERT itself is the cheapest possible "is the schema current?" probe. `InsertSkillRowArgs.workspaceId` added (mandatory) so the `information_schema` SELECT scopes to this workspace's catalog entry — prevents a false PRESENT from a same-named table in another workspace. Plumbed in `skillify-worker.ts` via `cfg.workspaceId`. Removed unused exports (`createSkillsTableSql`, `addContributorsColumnSql`, the local `isMissingContributorsColumnError`). Tests rewritten against the new flow: heal of `contributors`, heal of any other column (regression guard for future schema additions), multiple missing columns in one pass, race tolerance, rethrow on unknown column, rethrow on permission-denied.
… ALTER Replace the per-column `ensureColumn` / `ensureEmbeddingColumn` calls (one SELECT info_schema per column, with on-disk marker) with a single `healSchema(table, XXX_COLUMNS)` that delegates to the helper added two commits back: one SELECT against `information_schema.columns` per table, then ALTER only the genuinely missing columns. What goes away: - `private ensureColumn(table, column, sqlType)` and its on-disk marker. - `private ensureEmbeddingColumn(table, column)` shim. - Ad-hoc `ensureColumn(tbl, "agent", …)` / `ensureColumn(tbl, "plugin_version", …)` calls duplicated across `ensureTable` and `ensureSessionsTable`. Future columns no longer need a matching call site. What stays: - `createTableWithRetry` — outer network-resilience loop on CREATE. - `ensureLookupIndex` and its marker (CREATE INDEX is a different beast — failure is rare, success is permanent, marker is cheap). - Strict identifier validation via `sqlIdent` on every table name. What stays correct by construction: - CREATE TABLE on a missing table uses `buildCreateTableSql(safe, XXX_COLUMNS)`, so the canonical schema is whatever `deeplake-schema.ts` says. No second copy to keep in sync. - Fresh CREATE skips the heal pass — the column set is exactly the one we just declared, so a SELECT would just confirm it for no work. - A drift assert at the bottom of `ensureTable` rejects a future refactor that removes `summary_embedding` from MEMORY_COLUMNS without updating `embeddings/columns.ts`. Test rewrite covers the full matrix: - `deeplake-api.test.ts` — fetch-level mocks of every code path (CREATE-only, SELECT+ALTER, all-present, info_schema error, non-race ALTER error, race-tolerated ALTER, custom name, cross-method listTables cache). - `schema-scenarios.test.ts` — the 7 schema/upgrade scenarios (greenfield, full legacy, half legacy memory/sessions, fully migrated, mixed mem-emb, mixed sess-emb) plus the cross-cutting invariants (race tolerance, non-race propagation, INSERT vector::at bubbling, legacy-`agent` heal). - `embeddings-schema.test.ts` — source-of-truth checks now look in `src/deeplake-schema.ts` and scope each regex to a single object literal so a later entry's SQL can't satisfy a regression match. Smoke-tested end-to-end against the live `test_plugin` org on 2026-05-18: a `skills` table missing `contributors` triggered exactly [INSERT-fail, SELECT info_schema (16 cols, no contributors), targeted ALTER, INSERT-retry-OK]; a second call against the now-current schema emitted exactly one SQL call (the INSERT).
Regenerated bundles for every agent (claude-code, codex, cursor, hermes, pi) plus the MCP server and the CLI: they inline `deeplake-schema.ts` and the new `healMissingColumns` flow, drop the old per-column ensureColumn / contributors-only retry helpers, and embed the canonical column lists. Produced by `npm run build` (tsc + esbuild). No source-level changes in this commit; the previous three commits are the actual refactor.
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 4 files changed
Generated for commit 749d6f6. |
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR refactors Deeplake table schema management from ad-hoc, marker-gated per-column migrations to a centralized schema-driven approach. A new ChangesCore Schema Refactoring
Test Coverage
Bundled Deployments
Sequence Diagram(s)sequenceDiagram
participant Caller
participant healMissingColumns
participant informationSchema as information_schema
participant alterTable as ALTER TABLE
Caller->>healMissingColumns: query, tableName, workspaceId, columns
healMissingColumns->>informationSchema: SELECT existing columns
informationSchema-->>healMissingColumns: column_name list
healMissingColumns->>healMissingColumns: diff against expected
alt all columns present
healMissingColumns-->>Caller: {missing:[], altered:[]}
else missing columns found
loop for each missing column
healMissingColumns->>alterTable: ALTER TABLE ADD COLUMN
alt success
alterTable-->>healMissingColumns: ✓
else already exists error
healMissingColumns->>informationSchema: re-check presence
alt confirmed present
informationSchema-->>healMissingColumns: found
else still missing
healMissingColumns-->>Caller: ✗ error (re-throw)
end
end
end
healMissingColumns-->>Caller: {missing:[...], altered:[...]}
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes The refactor is substantial and repetitive across many bundled variants (homogeneous pattern), but involves intricate schema-healing logic with race conditions, multiple test suite rewrites, and subtle behavioral changes to table provisioning that demand careful verification of correctness across each component. Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cursor/bundle/session-start.js (1)
1548-1566:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAuto-pull still bypasses the new skills-table heal path.
Line 1552 creates a
DeeplakeApi, but this path goes straight torunPull()without callingensureSkillsTable(). On a drifted legacy skills table, read-only clients will still fail on any missing column other thancontributors, andautoPullSkills()then just swallows that asreason: "error"until some writer happens to repair the schema.Suggested fix
async function autoPullSkills(deps = {}) { if (process.env.HIVEMIND_AUTOPULL_DISABLED === "1") { log4("disabled via HIVEMIND_AUTOPULL_DISABLED=1"); return { pulled: 0, skipped: true, reason: "disabled" }; } const loadFn = deps.loadConfigFn ?? loadConfig; const config = loadFn(); if (!config) { log4("skipped: not logged in"); return { pulled: 0, skipped: true, reason: "not-logged-in" }; } let query; + let ensureSkillsTable; if (deps.queryFn) { query = deps.queryFn; } else { const api = new DeeplakeApi(config.token, config.apiUrl, config.orgId, config.workspaceId, config.skillsTableName); query = (sql) => api.query(sql); + ensureSkillsTable = () => api.ensureSkillsTable(config.skillsTableName); } const install = deps.install ?? "global"; const timeoutMs = deps.timeoutMs ?? DEFAULT_TIMEOUT_MS; try { - const summary = await withTimeout(runPull({ - query, - tableName: config.skillsTableName, - install, - cwd: install === "project" ? deps.cwd ?? process.cwd() : void 0, - users: [], - dryRun: false, - force: false - }), timeoutMs); + const summary = await withTimeout((async () => { + await ensureSkillsTable?.(); + return runPull({ + query, + tableName: config.skillsTableName, + install, + cwd: install === "project" ? deps.cwd ?? process.cwd() : void 0, + users: [], + dryRun: false, + force: false + }); + })(), timeoutMs); log4(`pulled scanned=${summary.scanned} wrote=${summary.wrote} skipped=${summary.skipped}`); return { pulled: summary.wrote, skipped: false }; } catch (e) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cursor/bundle/session-start.js` around lines 1548 - 1566, The auto-pull path constructs a DeeplakeApi and then calls runPull without first invoking the skills-table migration/validation, so legacy-read-only clients can fail on missing columns; before calling runPull in the branch that creates DeeplakeApi, call the existing ensureSkillsTable (or the migration helper used elsewhere) for config.skillsTableName using that DeeplakeApi instance (or call autoPullSkills()/ensureSkillsTable-like helper) and only proceed to runPull after ensureSkillsTable completes successfully, keeping the same timeout handling and passing the same install/cwd/users/dryRun/force parameters.
🧹 Nitpick comments (2)
src/deeplake-schema.ts (1)
134-137: ⚡ Quick winSanitize
ColumnDef.namewhere the SQL is emitted.
validateSchema()only protects the three built-in arrays at module load. These helpers are exported and accept arbitrarycolumns, so a future caller can bypass identifier validation for column names even thoughtableNameis already sanitized.🔒 Proposed fix
export function buildCreateTableSql(tableName: string, cols: readonly ColumnDef[]): string { const safe = sqlIdent(tableName); - const colSql = cols.map(c => `${c.name} ${c.sql}`).join(", "); + const colSql = cols.map(c => `${sqlIdent(c.name)} ${c.sql}`).join(", "); return `CREATE TABLE IF NOT EXISTS "${safe}" (${colSql}) USING deeplake`; } @@ const altered: string[] = []; for (const col of missingCols) { try { - await args.query(`ALTER TABLE "${safeTable}" ADD COLUMN ${col.name} ${col.sql}`); + const safeColumn = sqlIdent(col.name); + await args.query(`ALTER TABLE "${safeTable}" ADD COLUMN ${safeColumn} ${col.sql}`); altered.push(col.name);Also applies to: 196-216
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/deeplake-schema.ts` around lines 134 - 137, The code emits raw ColumnDef.name into SQL; update buildCreateTableSql (and the related helper around lines 196-216, e.g., the alter/create helpers) to sanitize column identifiers by passing c.name through sqlIdent (or the module's identifier-quoting helper) instead of interpolating c.name directly; build each column fragment as `${sqlIdent(c.name)} ${c.sql}` (or equivalent) so column names are properly escaped/quoted while preserving c.sql for the type/constraints.mcp/bundle/server.js (1)
23806-23821: 💤 Low valueDrift check only runs for existing tables.
The
SUMMARY_EMBEDDING_COLvalidation at lines 23819-23821 only executes when the table already exists. If the schema constant drifts, a freshly created table would silently have the wrong schema. Consider moving this guard before the table-existence branch or validating it once at module load alongside thevalidateSchemacalls.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mcp/bundle/server.js` around lines 23806 - 23821, The SUMMARY_EMBEDDING_COL presence check currently runs only for existing tables inside ensureTable; move or add that guard so it runs before the create-or-heal branch (or validate once at module load) to prevent creating a new table with a drifted MEMORY_COLUMNS schema. Specifically, in ensureTable (or during module init), assert MEMORY_COLUMNS.some(c => c.name === SUMMARY_EMBEDDING_COL) and throw the same Error if missing; this check should occur prior to calling listTables/createTableWithRetry/healSchema and should use the same SUMMARY_EMBEDDING_COL, MEMORY_COLUMNS, and createTableWithRetry/healSchema symbols so the code fails fast on schema drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bundle/cli.js`:
- Around line 4509-4518: The ensure* table functions currently return
immediately after creating a table when listTables() didn't include it, which
skips the subsequent healSchema call; change the flow so the post-create path
falls through to call healSchema(tbl, MEMORY_COLUMNS) (i.e., remove the early
return in the create branch, ensure _tablesCache is updated to include tbl if
needed, then call healSchema), and apply the same fallthrough fix to
ensureSessionsTable and ensureSkillsTable so the schema heal runs even when
CREATE TABLE IF NOT EXISTS was a no-op.
In `@claude-code/bundle/session-start.js`:
- Around line 603-615: The early-return after the CREATE attempt in ensureTable
can skip healSchema when another worker created the table concurrently; change
the flow so that after calling createTableWithRetry(buildCreateTableSql(...),
tbl) you do NOT return immediately but refresh/compute the current table name
list and always call healSchema(tbl, MEMORY_COLUMNS) (and only then update
this._tablesCache if needed); apply the same pattern to ensureSessionsTable and
ensureSkillsTable so healSchema always runs even when CREATE TABLE IF NOT EXISTS
was a no-op.
In `@claude-code/bundle/shell/deeplake-shell.js`:
- Around line 67283-67298: In ensureTable, the invariant that MEMORY_COLUMNS
includes SUMMARY_EMBEDDING_COL must run before the early-return create branch;
move the drift guard that checks SUMMARY_EMBEDDING_COL up so it executes before
the "if (!tables.includes(tbl)) { ... return; }" block in ensureTable, and throw
the same Error if the column is missing; this ensures createTableWithRetry
cannot run for a fresh table when MEMORY_COLUMNS has drifted.
In `@codex/bundle/shell/deeplake-shell.js`:
- Around line 67287-67295: The create-branch currently returns after creating
the table (in the block that calls createTableWithRetry(buildCreateTableSql(tbl,
MEMORY_COLUMNS), tbl)), which skips the subsequent healSchema(tbl,
MEMORY_COLUMNS) and leaves concurrently-created older schemas unhealed; remove
the early return so the flow falls through and always calls healSchema(tbl,
MEMORY_COLUMNS) after the CREATE path. Make the identical change in the other
two variants referenced (the sessions and skills variants around the same
pattern), ensuring that after calling createTableWithRetry(...) you update
_tablesCache as needed but do not return—allow execution to proceed to
healSchema(...) in createTableWithRetry, sessions, and skills code paths.
In `@cursor/bundle/shell/deeplake-shell.js`:
- Around line 67287-67295: The current branch skips healSchema(tbl,
MEMORY_COLUMNS) when createTableWithRetry is called but another process created
the table first; always run the schema healing pass after the CREATE attempt.
Change the flow in the block that checks tables/includes tbl so that after
awaiting createTableWithRetry(buildCreateTableSql(tbl, MEMORY_COLUMNS), tbl) you
always call await this.healSchema(tbl, MEMORY_COLUMNS) (and then update
this._tablesCache = [...tables, tbl] as needed), ensuring the same adjustment is
applied to the other identical blocks (the ones around
createTableWithRetry/buildCreateTableSql/MEMORY_COLUMNS at the referenced
locations) so healSchema runs regardless of whether the CREATE was a no-op.
In `@hermes/bundle/commands/auth-login.js`:
- Around line 780-791: The ensureTable implementation currently returns
immediately after creating a missing table and thus skips calling healSchema,
which can leave an older schema if CREATE TABLE was a no-op; update ensureTable
so that after awaiting createTableWithRetry(buildCreateTableSql(tbl,
MEMORY_COLUMNS), tbl) you still update this._tablesCache as needed and then call
await this.healSchema(tbl, MEMORY_COLUMNS) before returning; apply the same
change in the other create-branch locations (the blocks around the other
ensureTable-like branches mentioned) so createTableWithRetry, listTables,
_tablesCache and healSchema are used consistently.
In `@hermes/bundle/session-start.js`:
- Around line 594-608: ensureTable currently skips healSchema and the
SUMMARY_EMBEDDING_COL check when the cached listTables() shows the table missing
and createTableWithRetry becomes a no-op; change the flow so that after
attempting creation (via createTableWithRetry) you always call healSchema(tbl,
MEMORY_COLUMNS) and perform the SUMMARY_EMBEDDING_COL check regardless of
whether the table was believed to exist beforehand. Apply the same pattern to
ensureSessionsTable() and ensureSkillsTable(): always run healSchema(...) and
validate MEMORY_COLUMNS (including SUMMARY_EMBEDDING_COL) after the
create-or-exists attempt, and update the _tablesCache only for cache maintenance
but not to gate schema healing.
In `@hermes/bundle/shell/deeplake-shell.js`:
- Around line 67287-67298: The drift check for SUMMARY_EMBEDDING_COL is skipped
for the create-path return; move or duplicate the assertion so that after
creating a table via createTableWithRetry(buildCreateTableSql(tbl,
MEMORY_COLUMNS), tbl) (and before returning) you run the same schema drift check
that follows healSchema: verify MEMORY_COLUMNS.some(c => c.name ===
SUMMARY_EMBEDDING_COL) and throw the Error if missing; ensure the check still
runs for the existing-table path that calls healSchema(tbl, MEMORY_COLUMNS) so
both new and existing tables validate the embedding column (references:
SUMMARY_EMBEDDING_COL, MEMORY_COLUMNS, createTableWithRetry,
buildCreateTableSql, healSchema, _tablesCache).
In `@pi/bundle/skillify-worker.js`:
- Around line 474-477: The current branch that handles isMissingTableError(msg)
immediately runs CREATE TABLE IF NOT EXISTS via
buildCreateTableSql(args.tableName, SKILLS_COLUMNS) then retries the INSERT
(args.query(sql)), which can still fail if another worker created an older table
missing columns; change the flow so that after running the CREATE, you do not
blindly retry the INSERT — instead re-run the INSERT and if it fails with a
missing-column error delegate to the existing missing-column recovery logic (the
same handler used elsewhere), or explicitly check the table schema for missing
SKILLS_COLUMNS and apply the missing-column migration before retrying; locate
isMissingTableError, buildCreateTableSql, SKILLS_COLUMNS and the args.query(sql)
retry to implement this handoff.
In `@src/deeplake-api.ts`:
- Around line 402-416: The code currently bails out after creating a table when
listTables() (which is cached) didn't include tbl, so a stale cache plus CREATE
TABLE IF NOT EXISTS can leave legacy columns unhealed; modify
ensureMemoryTable/ensureSessionsTable/ensureSkillsTable so that after calling
createTableWithRetry(buildCreateTableSql(tbl, ...), tbl) you do NOT return
immediately but instead refresh/update the tables cache (update
this._tablesCache to include tbl) and call await this.healSchema(tbl,
<appropriate_COLUMNS>) unconditionally (or re-check existence and then call
healSchema) so the schema healing always runs after the create attempt;
reference methods: listTables, createTableWithRetry, buildCreateTableSql,
healSchema, and the _tablesCache field.
In `@src/skillify/skills-table.ts`:
- Around line 96-100: The lazy-create flow uses buildCreateTableSql and then
immediately retries the INSERT, which can still fail if another worker created
an older schema; after awaiting args.query(buildCreateTableSql(args.tableName,
SKILLS_COLUMNS)) invoke the schema-heal step to ensure the table has all columns
defined in SKILLS_COLUMNS (e.g., run the existing schema migration/alter logic
or an ensureColumns function for args.tableName) before retrying args.query(sql)
once; keep using isMissingTableError to gate this path and still only retry the
insert a single time.
---
Outside diff comments:
In `@cursor/bundle/session-start.js`:
- Around line 1548-1566: The auto-pull path constructs a DeeplakeApi and then
calls runPull without first invoking the skills-table migration/validation, so
legacy-read-only clients can fail on missing columns; before calling runPull in
the branch that creates DeeplakeApi, call the existing ensureSkillsTable (or the
migration helper used elsewhere) for config.skillsTableName using that
DeeplakeApi instance (or call autoPullSkills()/ensureSkillsTable-like helper)
and only proceed to runPull after ensureSkillsTable completes successfully,
keeping the same timeout handling and passing the same
install/cwd/users/dryRun/force parameters.
---
Nitpick comments:
In `@mcp/bundle/server.js`:
- Around line 23806-23821: The SUMMARY_EMBEDDING_COL presence check currently
runs only for existing tables inside ensureTable; move or add that guard so it
runs before the create-or-heal branch (or validate once at module load) to
prevent creating a new table with a drifted MEMORY_COLUMNS schema. Specifically,
in ensureTable (or during module init), assert MEMORY_COLUMNS.some(c => c.name
=== SUMMARY_EMBEDDING_COL) and throw the same Error if missing; this check
should occur prior to calling listTables/createTableWithRetry/healSchema and
should use the same SUMMARY_EMBEDDING_COL, MEMORY_COLUMNS, and
createTableWithRetry/healSchema symbols so the code fails fast on schema drift.
In `@src/deeplake-schema.ts`:
- Around line 134-137: The code emits raw ColumnDef.name into SQL; update
buildCreateTableSql (and the related helper around lines 196-216, e.g., the
alter/create helpers) to sanitize column identifiers by passing c.name through
sqlIdent (or the module's identifier-quoting helper) instead of interpolating
c.name directly; build each column fragment as `${sqlIdent(c.name)} ${c.sql}`
(or equivalent) so column names are properly escaped/quoted while preserving
c.sql for the type/constraints.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 0860dccc-2902-4979-a980-51c5a8e6cb52
📒 Files selected for processing (40)
bundle/cli.jsclaude-code/bundle/capture.jsclaude-code/bundle/commands/auth-login.jsclaude-code/bundle/pre-tool-use.jsclaude-code/bundle/session-start-setup.jsclaude-code/bundle/session-start.jsclaude-code/bundle/shell/deeplake-shell.jsclaude-code/bundle/skillify-worker.jscodex/bundle/capture.jscodex/bundle/commands/auth-login.jscodex/bundle/pre-tool-use.jscodex/bundle/session-start-setup.jscodex/bundle/session-start.jscodex/bundle/shell/deeplake-shell.jscodex/bundle/skillify-worker.jscodex/bundle/stop.jscursor/bundle/capture.jscursor/bundle/commands/auth-login.jscursor/bundle/pre-tool-use.jscursor/bundle/session-start.jscursor/bundle/shell/deeplake-shell.jscursor/bundle/skillify-worker.jshermes/bundle/capture.jshermes/bundle/commands/auth-login.jshermes/bundle/pre-tool-use.jshermes/bundle/session-start.jshermes/bundle/shell/deeplake-shell.jshermes/bundle/skillify-worker.jsmcp/bundle/server.jspi/bundle/autopull-worker.jspi/bundle/skillify-worker.jssrc/deeplake-api.tssrc/deeplake-schema.tssrc/skillify/skillify-worker.tssrc/skillify/skills-table.tstests/claude-code/deeplake-api.test.tstests/claude-code/deeplake-schema.test.tstests/claude-code/embeddings-schema.test.tstests/claude-code/schema-scenarios.test.tstests/claude-code/skillify-skills-table.test.ts
Address CodeRabbit review on PR #177. Three coupled fixes, one root cause: `listTables()` is cached, so `CREATE TABLE IF NOT EXISTS` can silently no-op against a concurrently-created legacy table, and the previous code skipped `healSchema()` on that path. The next read/write then failed on missing columns. Changes: - `src/deeplake-api.ts` — `ensureTable`, `ensureSessionsTable`, `ensureSkillsTable` now run `healSchema(...)` unconditionally after the create/exists decision. Removed the early `return` on the CREATE branch in `ensureTable` and the `if-else` split in the other two. On a genuinely fresh CREATE the heal pass adds one SELECT info_schema (~250ms) and zero ALTERs; on a race-detected legacy table it ALTERs only the missing columns. - `src/deeplake-api.ts` — moved the `SUMMARY_EMBEDDING_COL` drift guard to the very top of `ensureTable`, before any SQL. Previously it ran after `healSchema` and was skipped on the create-and-return path, so a fresh CREATE could materialise an incomplete schema if `MEMORY_COLUMNS` ever drifted from `embeddings/columns.ts`. - `src/skillify/skills-table.ts` — `insertSkillRow`'s `isMissingTableError` branch now calls `healMissingColumns(...)` between the CREATE and the retry INSERT. Same race: a concurrent worker could land a legacy table between our INSERT failure and our CREATE no-op; without the heal pass, the retry INSERT would fail with the same missing-column error. Tests updated to match the new sequence: - `tests/claude-code/deeplake-api.test.ts` — fresh-CREATE paths now expect `listTables + CREATE + post-CREATE heal SELECT`. New regression: `heals after CREATE: race-detected legacy table gets ALTERed before returning`. - `tests/claude-code/schema-scenarios.test.ts` — scenarios 1, 3, 4 updated for the unconditional post-CREATE heal SELECT. - `tests/claude-code/skillify-skills-table.test.ts` — the missing-table path now asserts `INSERT-fail → CREATE → SELECT info_schema → INSERT-retry`. New regression: `CREATE-then-INSERT race: lazy-create no-ops vs legacy table → heal pass adds the missing column`. - `tests/claude-code/deeplake-api-retry.test.ts` — fix Response body-reuse error after the extra heal SELECT was added. Smoke re-run against the live `test_plugin` org with the same legacy schema repro: the worker still emits exactly INSERT-fail → SELECT info_schema → ALTER → INSERT-retry, and a follow-up call on the healed table emits a single INSERT. Full test suite: 2601 / 2601.
Regenerated bundles for every agent (claude-code, codex, cursor, hermes, pi) plus MCP server and CLI. Bundles now inline the unconditional post-CREATE heal pass in `ensureTable` / `ensureSessionsTable` / `ensureSkillsTable` and the worker's CREATE-then-heal-then-retry path in `insertSkillRow`. Produced by `npm run build`. No further source changes in this commit.
…bles Mirror the existing race regression test from `ensureTable` to `ensureSessionsTable` and `ensureSkillsTable`. The source already calls `healSchema` unconditionally on all three paths (commit a2700c8), but only the memory table had an explicit test proving the heal pass repairs a race-detected legacy table; sessions and skills were validated only by code-shape parity. Each new test feeds the mock fetch sequence: listTables(empty) → CREATE(no-op vs legacy) → heal SELECT(legacy schema missing one column) → ALTER → CREATE INDEX and asserts the targeted ALTER fires before `ensureLookupIndex` returns. Closes the [P2] coverage gap codex flagged on PR #177.
| // otherwise break CREATE TABLE / CREATE INDEX startup). | ||
| const safe = sqlIdent(tableName); | ||
| return ( | ||
| `CREATE TABLE IF NOT EXISTS "${safe}" (` + |
There was a problem hiding this comment.
@efenocchi was table creation moved to somewhere else? did you cover the case of creating the first skill? so the case when there is no skills table
There was a problem hiding this comment.
Table creation is still in skills-table.ts, inline in insertSkillRow (the lazy-create branch). First-skill flow:
- INSERT fires (line 92)
- catches
isMissingTableError(line 96) — table truly does not exist yet CREATE TABLE IF NOT EXISTS(line 104)healMissingColumnsheals the schema in case another worker raced us with a legacy CREATE between our INSERT and our own CREATE (lines 105–110)- retry the INSERT (line 111)
Covered in tests/claude-code/skillify-skills-table.test.ts: on first INSERT failing because the table is missing: CREATE → heal pass → retry INSERT (line 99) plus the race regression CREATE-then-INSERT race: lazy-create no-ops vs legacy table → heal pass adds the missing column (line 121).
| // retry would otherwise fail with the same missing-column error. | ||
| // healMissingColumns is cheap on a freshly-created table (1 | ||
| // SELECT info_schema, 0 ALTERs). | ||
| await args.query(buildCreateTableSql(args.tableName, SKILLS_COLUMNS)); |
There was a problem hiding this comment.
@efenocchi ok you moved it here seems like, but pls double check and confirm
There was a problem hiding this comment.
Confirmed. Table creation is still inline in insertSkillRow, at this exact line. Flow: INSERT → catch isMissingTableError → CREATE TABLE IF NOT EXISTS here (104) → healMissingColumns (105–110) → retry INSERT (111). Nothing was moved out of this file; the diff just inserts the explicit heal pass between CREATE and the retry, to close the listTables()-stale race CodeRabbit flagged on the parallel call sites in deeplake-api.ts.
There was a problem hiding this comment.
All tests are in claude code folder
There was a problem hiding this comment.
This file is in the right folder per the current convention. PR #133 (chore(tests): relocate agent-specific tests to tests/<agent>/) carved out tests/<agent>/ only for tests that exercise src/hooks/<agent>/*. Shared infrastructure tests (deeplake-api*, deeplake-schema*, skillify-*, embeddings-*, etc.) intentionally stayed in tests/claude-code/, and PR #133s own body flagged the further move of those ~70 shared tests to a neutral location as out-of-scope follow-up. Happy to file it as a separate cleanup PR if you want — out of scope here though.
There was a problem hiding this comment.
No, it should be moved, since they are not only for claude code but for other agents to @coderabbitai do we have folder for it, or new generic one should be created?
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Show the top-level tests directory structure
tree tests/ -L 2 2>/dev/null || find tests/ -maxdepth 2 -type d | sort
echo "---"
# List all test files in the root of tests/ (non-agent-specific)
ls tests/ 2>/dev/null
echo "---"
# Show all subdirectories under tests/
find tests/ -mindepth 1 -maxdepth 1 -type d | sort
Length of output: 5025
@kaghni There's no existing generic folder yet — the current subdirectories are all agent-specific (claude-code, codex, cursor, hermes, pi, openclaw) plus cli. The shared infra tests would need a new home.
A common convention for this pattern is tests/shared/ (clear, low-surprise) or tests/core/ (common in monorepos). Given the existing naming style, I'd suggest tests/shared/. The candidates to move there would be at minimum:
| File pattern | Reason it's shared |
|---|---|
deeplake-api*.test.ts |
Tests src/deeplake-api.ts — used by all agents |
deeplake-schema.test.ts |
Tests src/deeplake-schema.ts — used by all agents |
embeddings-schema.test.ts, embeddings-*.test.ts |
Shared embeddings infra |
schema-scenarios.test.ts |
Cross-agent schema migration scenarios |
skillify-skills-table.test.ts |
Tests src/skillify/skills-table.ts |
sql.test.ts, index-marker-store.test.ts, etc. |
Shared utilities |
Since this is out of scope for the current PR, would you like me to open a GitHub issue to track the move to tests/shared/?
🧬 Code Graph Analysis Results
[src/deeplake-api.ts] createTableWithRetry (private async createTableWithRetry(sql: string, label: string): Promise)
- Summary:
- Purpose: Perform a CREATE TABLE with an additional outer retry budget on transient failures.
- Parameters:
- sql: string — the CREATE TABLE SQL statement to execute.
- label: string — a human-friendly label used in logging.
- Returns: Promise (resolves on successful CREATE; rejects with the last error after all retries).
- Error handling:
- Retries up to OUTER_BACKOFFS_MS.length + 1 total attempts with backoffs [2000, 5000, 10000] ms.
- On each attempt failure, logs the attempt and error message, then sleeps before the next attempt (except after the final attempt).
- If all attempts fail, throws the last encountered error (lastErr).
- Important implementation details:
- Uses this.query(sql) to perform the CREATE TABLE.
- Maintains lastErr to propagate the final failure after retries.
- Logging includes the attempt number and error message for easier debugging.
[src/deeplake-api.ts] ensureTable(name?: string)
- Summary:
- Purpose: Ensure the specified memory table exists; if absent, create it with a retry mechanism and then heal the schema.
- Parameters:
- name?: string — optional table name; if omitted, uses the instance’s tableName.
- Returns: Promise (resolves after ensuring creation and healing).
- Error handling:
- Validates drift protection up front, derives a safe table name via sqlIdent, and checks existing tables via listTables.
- If the table is missing, calls createTableWithRetry(buildCreateTableSql(tbl, MEMORY_COLUMNS), tbl) to create it, then updates an internal cache if needed.
- Regardless of creation, runs healSchema(tbl, MEMORY_COLUMNS) to align the schema.
- Important implementation details:
- Uses MEMORY_COLUMNS to define the table schema during creation.
- Uses listTables to avoid races where another writer creates the table concurrently.
- The final heal step ensures the table’s columns match the canonical schema, ensuring consistent behavior across writers.
- Note: There is a comment indicating that BM25 indexing is currently disabled for fresh tables and not re-enabled here.
…-all-tables # Conflicts: # bundle/cli.js # claude-code/bundle/capture.js # claude-code/bundle/commands/auth-login.js # claude-code/bundle/pre-tool-use.js # claude-code/bundle/session-start-setup.js # claude-code/bundle/session-start.js # claude-code/bundle/shell/deeplake-shell.js # codex/bundle/capture.js # codex/bundle/commands/auth-login.js # codex/bundle/pre-tool-use.js # codex/bundle/session-start-setup.js # codex/bundle/session-start.js # codex/bundle/shell/deeplake-shell.js # codex/bundle/stop.js # cursor/bundle/capture.js # cursor/bundle/commands/auth-login.js # cursor/bundle/pre-tool-use.js # cursor/bundle/session-start.js # cursor/bundle/shell/deeplake-shell.js # hermes/bundle/capture.js # hermes/bundle/commands/auth-login.js # hermes/bundle/pre-tool-use.js # hermes/bundle/session-start.js # hermes/bundle/shell/deeplake-shell.js # mcp/bundle/server.js # pi/bundle/autopull-worker.js # src/deeplake-api.ts
The 90% per-file branch threshold for `src/deeplake-api.ts` broke after
the merge of `main`: `maybeSignalBalanceExhausted` enqueues a banner with
`enqueueNotification({...}).catch(e => log(...))`, and the resolved-path
was the only one exercised. The `.catch` branch was uncovered, taking
the file from 91.0% to 89.55% — below the gate.
Add one regression in `tests/claude-code/deeplake-api-balance-exhausted.test.ts`
that overrides the default `mockResolvedValue(undefined)` with
`mockRejectedValueOnce(...)`, drives the same 402 path, and asserts:
* the original `Query failed: 402` still propagates to the caller
* the enqueue rejection is silently logged (not re-thrown, not surfaced
as an unhandled rejection)
Brings branch coverage back to 90.29% (over the 90% threshold), and pins
the fire-and-forget semantics so a future refactor that drops the
`.catch` would fail this test instead of letting the rejection escape.
|
@kaghni replied inline on all three of your review comments. Quick recap of where things stand:
Also: CI was failing after the main merge because the coverage gate on Let me know if anything else needs another pass. |
Addresses kaghni's review comment on PR #177 (`All tests are in claude code folder`). PR #183 carved out `tests/shared/` as the new home for tests that exercise non-agent- specific `src/` modules; this PR's tests all fit that bucket and were sitting under `tests/claude-code/` by the old convention. Moves the 7 tests this PR touches: tests/claude-code/deeplake-api-balance-exhausted.test.ts tests/claude-code/deeplake-api-retry.test.ts tests/claude-code/deeplake-api.test.ts tests/claude-code/deeplake-schema.test.ts tests/claude-code/embeddings-schema.test.ts tests/claude-code/schema-scenarios.test.ts tests/claude-code/skillify-skills-table.test.ts → tests/shared/<same names> Mechanical move only: * Every import in every file is either `vitest`, `node:*`, or `../../src/...`. The relative depth from `tests/shared/` matches the prior depth from `tests/claude-code/`, so no path edits are needed. * No `__dirname` / `import.meta.url` / `process.cwd()` usage that would react to the new path. * The bundle-scan in `embeddings-schema.test.ts` reads files relative to CWD (`claude-code/bundle/...`), which vitest sets to the repo root regardless of which tests/ subdirectory the file lives in. `tests/shared/` is already wired into `vitest.config.ts` test.include (added by PR #183), and `vitest.config.ts` per-file coverage thresholds key on source paths, not test paths. Full suite (env-clean) post-move: 137/137 files, 2796/2796 tests.
Why
Schema healing was inconsistent across the three Deeplake-managed tables:
memory/sessions(ensureTable/ensureSessionsTable): one SELECT info_schema per column plus a local marker file, with ad-hocensureColumn(table, "agent", …)/ensureColumn(table, "plugin_version", …)calls. Every new column required a matching call site or pre-existing tables would silently drop INSERTs after upgrade.skills(workerinsertSkillRow): retry only matched the literal column namecontributors. Any future schema addition would fall through the retry and fail INSERTs forever on stale tables.Underlying reason for the per-column / marker design: a historical Deeplake "post-ALTER bug" (a ~30s window of failing INSERTs after every ALTER). Re-probed against
api.deeplake.aiin thetest_pluginorg on 2026-05-18: bug no longer reproduces (71/71 INSERTs OK, first success 2ms after ALTER — repro script in PR description footer).What changes
New module
src/deeplake-schema.tsSingle source of truth as
{name, sql}arrays:MEMORY_COLUMNS(14),SESSIONS_COLUMNS(14),SKILLS_COLUMNS(17)buildCreateTableSql(table, cols)— derivesCREATE TABLEfrom the list.healMissingColumns({ query, tableName, workspaceId, columns })— one SELECT againstinformation_schema.columns, diff in memory, thenALTER TABLE ADD COLUMNonly the genuinely missing columns. Race ("already exists" from a concurrent writer) caught and re-verified via a second SELECT — that's the only tolerated ALTER failure.isMissingTableError,isMissingColumnError— generic error classifiers. Both excludepermission deniedso auth problems aren't masked as heal-able.Module-load lint rejects any
NOT NULLcolumn lackingDEFAULT(ALTER on a populated table would otherwise fail).src/deeplake-api.tsensureTable/ensureSessionsTable/ensureSkillsTablenow:listTables(cached) → CREATE withbuildCreateTableSql(..., XXX_COLUMNS)if missing.healMissingColumnsonce → one SELECT info_schema, ALTER only the truly missing.Gone:
ensureColumn,ensureEmbeddingColumn, the per-column on-disk marker, the ad-hocensureColumn(tbl, "agent", …)calls.ensureLookupIndexand its marker stay (CREATE INDEX is different — rare failure, permanent success, cheap marker).src/skillify/skills-table.ts(worker INSERT retry)Drops the
contributors-only branch. OnisMissingColumnError, runshealMissingColumnsoverSKILLS_COLUMNSand retries the INSERT once. If the diff reportsmissing: [](the failing column isn't in our schema), the original error propagates rather than looping.InsertSkillRowArgs.workspaceIdadded (mandatory) so the catalog SELECT scopes to this workspace — prevents a false PRESENT from a same-named table elsewhere.Behaviour summary
contributors→ INSERT fails foreverensureColumncall required in 2 places + worker retry updateXXX_COLUMNSarrayTests
tests/claude-code/deeplake-schema.test.ts(28 tests): schema lint, builder injection guard, heal flow (no-op / one missing / multiple in schema order / race tolerance / non-race propagation / case-insensitive catalog match), classifiers.tests/claude-code/deeplake-api.test.ts— rewrote theensureXxxTabledescribes for the new flow (CREATE-only / SELECT+ALTER / all-present / info_schema error / non-race ALTER error / race-tolerated ALTER / custom name / cross-method listTables cache).tests/claude-code/schema-scenarios.test.ts— all 7 schema/upgrade scenarios + cross-cutting invariants rewritten against the new SQL shape.tests/claude-code/skillify-skills-table.test.ts— worker retry covers any missing column (regression guard for future schema additions), multi-column heal, race tolerance, rethrow on unknown column, rethrow on permission-denied.tests/claude-code/embeddings-schema.test.ts— points atsrc/deeplake-schema.ts; regex scoped to single object literals so a later entry's SQL can't satisfy a regression match.Full suite: 2599 / 2599 passing. (Two pre-existing unhandled rejections in
session-notifications-hook.test.tsare unrelated to this change — missing/tmp/hivemind-notif-hook-*/paths from another test fixture.)Test plan
npm run typechecknpm run buildnpx vitest runtest_pluginorg: created askillstable with the legacy schema (nocontributors), calledinsertSkillRowagainst it, observed the exact sequence:Re-probe script for the post-ALTER bug
For posterity / future readers wondering why the marker is gone:
If a future Deeplake regression re-introduces the window, add the marker back at the
healMissingColumnsboundary — the rest of the surface doesn't need to change.Summary by CodeRabbit
Bug Fixes
Refactor