feat(sql): cross-DB join key inference via prefix/suffix overlap by sahrizvi · Pull Request #761 · AltimateAI/altimate-code

sahrizvi · 2026-04-28T10:39:51Z

Closes #758

Motivation

When altimate-code is connected to multiple warehouses, columns that point at the same entity often have mismatched naming or formatting — e.g., one DB stores cust_42 while another stores 42, or one warehouse uses businessid_X and another uses businessref_X. There's currently no built-in way to surface candidate joins across DBs; the agent has to discover them per-pair, which costs tokens and is brittle.

Common situations where this lands:

Post-acquisition system integration (acquired company's ID format ≠ parent's)
Multi-vendor analytics (Salesforce AccountId ↔ Hubspot company_id ↔ internal account_external_id)
Migrations leaving prefix differences (v1_user_123 vs user_123)
Per-source ingestion with system-specific ID schemes

What's added

A new tool altimate_core.detect_join_candidates that:

Pulls ~50 sample values per string-typed column from each configured warehouse via the existing connector.execute plumbing.
Computes the longest common prefix per column, walking back to the last _/-/: separator (so partial-name matches don't count).
Compares prefix-stripped suffixes pairwise across DBs; emits a candidate when both columns have non-empty distinct prefixes AND share at least one suffix.
Returns ranked candidates: {left_db, left_table, left_col, right_db, right_table, right_col, prefix_rule, suffix_overlap, confidence} where confidence = overlap / min(|left_suffixes|, |right_suffixes|) (cheap and monotonic, deliberately not a probability).

Errors are scoped: per-table sample failures don't kill the scan; per-connection failures are returned in connection_errors. Identifier quoting is ANSI-safe (double quotes). The STRING_TYPE_PATTERN filter restricts sampling to text-like columns (varchar/text/char/string/uuid/json) — numeric/temporal columns are skipped.

Tests

21 new tests in test/altimate/tools/altimate-core-detect-join-candidates.test.ts:

commonPrefix LCP logic + separator handling (_/-/:) + no-separator rejection + non-string defensive paths
detectJoinCandidatesFromBags: canonical pattern, same-DB rejection, identical-prefix rejection, zero-overlap rejection, no-separator rejection, ranking by overlap/confidence, full N*(N-1)/2 fan-out
Integration: two real bun:sqlite :memory: DBs holding 10 businessid_X rows + 8 businessref_X rows (with intentional non-overlapping outliers), driven through the actual native handler with Registry.get stubbed
Tool surface: title/output formatting, empty-list rendering, connection-error rendering, dispatcher-throws envelope

Results:

This file: 21 pass / 0 fail
Sibling tests under test/altimate/tools/: 192 pass / 0 fail (the 9 pre-existing failures in sql-classify.test.ts were verified unrelated by stashing this branch's changes — same 9 fail on main)
Full opencode suite when merged with [feat] Headless mode: commit best-guess answer at max-turns instead of meta-prose #759 + [feat] Collapse entity-per-table warehouses in schema_index (composite digest) #760: 7421 pass / 0 fail / 494 skip (~88s)
bun run typecheck: clean (0 errors after bun install in the worktree)

The existing altimate-core-native.test.ts count assertion was updated (34 → 35) to reflect the new method, with altimate_core.detect_join_candidates added to the canonical ALL_METHODS list.

Backwards compatibility

Pure addition. No existing behaviour changes; no existing tests modified beyond the count update.

Files

packages/opencode/src/altimate/native/connections/detect-join-candidates.ts (new — 317 LOC)
packages/opencode/src/altimate/native/connections/register.ts (handler registration)
packages/opencode/src/altimate/native/types.ts (params + result types)
packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts (new tool wrapper)
packages/opencode/src/altimate/index.ts (barrel export)
packages/opencode/src/tool/registry.ts (tool registration)
packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts (new — 385 LOC)
packages/opencode/test/altimate/altimate-core-native.test.ts (count assertion update)

Summary by cubic

Adds cross-DB join key inference that finds candidate joins by stripping distinct prefixes and matching suffixes across warehouses. Now uses dialect-aware sampling SQL, stricter permission/error handling, and safer input limits.

New Features
- Exposes altimate_core.detect_join_candidates and the altimate_core_detect_join_candidates tool; registered and exported.
- Builds sampling SQL with dialect-aware quoting via quoteIdentForDialect; no inline LIMITs (caps passed to drivers).
- Prefix logic trims to _/-/:; prefixes must differ; emits on suffix overlap; ranks by suffix_overlap, then match_score = overlap / min(left, right).
- Samples ~50 string-like values/column; scans up to 50 tables/connection; optional schema_name; cross-DB only; skips non-string columns.
- Robustness: gated by sql_execute_read permission; parallel per-connection scans; returns connection_errors and bounded partial_errors; input caps (connections ≤ 16, sample_size ≤ 1000, max_tables_per_connection ≤ 500).

^{Written for commit 80758f5. Summary will update on new commits. Review in cubic}

Summary by CodeRabbit

New Features
- Added an Altimate tool to detect and rank cross-database join candidates with sampling, dialect-aware SQL handling, and detailed error reporting.
Chores
- Exposed the new detection capability through the native bridge and tool registry for discovery and use.
Tests
- Added comprehensive unit and integration tests covering detection logic, sampling, dialect quoting, and tool execution paths.

Adds the `altimate_core_detect_join_candidates` tool: given two or more warehouse connection names, pull a small bag of string sample values per (table, column), then for every cross-DB pair compute the longest common value prefix on each side. When both prefixes end in `_`, `-`, or `:`, differ from each other, and leave at least one matching suffix after stripping, emit a ranked join candidate. This targets the canonical pattern where one DB stores `businessid_42` and another stores `businessref_42` — the inference is purely value-based so it survives schemas that disagree on naming conventions. - Algorithm in `native/connections/detect-join-candidates.ts` (port of dab_bench's `_detect_join_candidates` / `_common_prefix`). - Native handler registered as `altimate_core.detect_join_candidates`, ranked by suffix overlap then confidence. - Tool wrapper, registry entry, and barrel export. - Tests: 21 cases covering the prefix walk-back, suffix overlap, ranking, and an integration test against two `bun:sqlite` `:memory:` DBs holding the canonical `businessid_X` ↔ `businessref_X` pattern.

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

coderabbitai · 2026-04-28T10:40:05Z

📝 Walkthrough

Walkthrough

Adds an end-to-end cross-database join-candidate detection feature: sampling string-like column values from multiple connections, inferring separator-aware prefix/suffix join patterns, exposing a native handler, a tool wrapper, types, registry export, and tests.

Changes

Cohort / File(s)	Summary
Core Detection Logic `packages/opencode/src/altimate/native/connections/detect-join-candidates.ts`	New native module implementing sampling, separator-aware prefix/suffix inference, candidate ranking, and the `detectJoinCandidates` entrypoint.
Types & Dispatcher Registration `packages/opencode/src/altimate/native/types.ts`, `packages/opencode/src/altimate/native/connections/register.ts`	Adds `AltimateCoreDetectJoinCandidatesParams` and registers `altimate_core.detect_join_candidates` handler.
Tool Layer `packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts`	New tool wrapper `altimate_core_detect_join_candidates` that requests permission, calls dispatcher, formats results, and exposes internals for tests.
Barrel Export & Registry `packages/opencode/src/altimate/index.ts`, `packages/opencode/src/tool/registry.ts`	Re-exports the new tool and adds it to the tool registry list.
Identifier Quoting Helper `packages/opencode/src/altimate/native/connections/data-diff.ts`	Made `quoteIdentForDialect` exported so other native modules reuse dialect-aware identifier quoting.
Tests `packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts`, `packages/opencode/test/altimate/altimate-core-native.test.ts`	Adds comprehensive unit/integration tests for detection logic, sampling SQL, end-to-end sampling with in-memory DBs, tool execution envelopes, and updates native-method count assertions.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Tool as AltimateCoreDetectJoinCandidatesTool
    participant Dispatcher
    participant Detection as detectJoinCandidates
    participant Sampler as collectSampleBags
    participant DB1 as Connection1
    participant DB2 as Connection2
    participant Inference as detectJoinCandidatesFromBags

    Client->>Tool: execute(params)
    Tool->>Dispatcher: call("altimate_core.detect_join_candidates", params)
    Dispatcher->>Detection: detectJoinCandidates(params)
    Detection->>Sampler: collectSampleBags(params)
    par Parallel Sampling
        Sampler->>DB1: sample non-null string columns
        DB1-->>Sampler: sample values
        Sampler->>DB2: sample non-null string columns
        DB2-->>Sampler: sample values
    end
    Sampler-->>Detection: { bags, partialErrors, connectionErrors }
    Detection->>Inference: detectJoinCandidatesFromBags(bags)
    Inference-->>Detection: sorted JoinCandidate[]
    Detection-->>Dispatcher: { success, candidates, errors, partialErrors }
    Dispatcher-->>Tool: AltimateCoreResult
    Tool->>Client: formatted output + metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

[feat] Cross-DB join key inference (prefix/suffix overlap) #758 — Implements the cross-DB join-key inference feature described in the issue (native handler, types, tool, and tests).

Possibly related PRs

feat: add MSSQL/Fabric support to data-parity skill #705 — Both PRs expose/reuse the dialect-aware identifier-quoting logic (quoteIdentForDialect) used by the new detector.

Suggested labels

contributor

Poem

🐰 Hops and hops, I sample strings by night,
Prefixes trimmed, suffixes taking flight.
Candidates lined up, ranked in a row,
Cross-DB joins now wink and glow.
A little rabbit cheers: "Let queries go!"

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is missing the required 'PINEAPPLE' identifier at the top, as mandated by the template for AI-generated contributions.	Add 'PINEAPPLE' as the very first line of the PR description, before any other content, per the template requirement for AI-generated content.
Docstring Coverage	⚠️ Warning	Docstring coverage is 64.71% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'feat(sql): cross-DB join key inference via prefix/suffix overlap' is concise, specific, and accurately summarizes the primary change—introducing cross-database join candidate detection.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/cross-db-join-inference

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-28T10:40:40Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

github-actions · 2026-04-28T10:41:14Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

packages/opencode/src/altimate/index.ts (1)
19-19: Avoid re-exporting the tool’s test-only internals.

export * here also exposes _altimateCoreDetectJoinCandidatesInternal from the tool module through the public Altimate barrel. Please export just AltimateCoreDetectJoinCandidatesTool here, or keep the test helper out of the module’s public exports.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/index.ts` at line 19, The barrel currently
re-exports everything from "./tools/altimate-core-detect-join-candidates", which
unintentionally exposes the test-only symbol
_altimateCoreDetectJoinCandidatesInternal; change the wildcard export to a named
export that only exports the public tool (AltimateCoreDetectJoinCandidatesTool)
so the internal helper is not leaked, i.e., replace the export * with an
explicit export of AltimateCoreDetectJoinCandidatesTool (or alternatively move
_altimateCoreDetectJoinCandidatesInternal out of the module's public exports).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opencode/src/altimate/native/connections/detect-join-candidates.ts`:
- Around line 256-261: The helper safeListSchemas currently swallows errors from
Connector.listSchemas and returns ["public"], hiding real failures; change
safeListSchemas to not silently fall back — remove the catch that returns
["public"] and instead let the original error propagate (or throw a new Error
with context) so upstream code (e.g., detect-join-candidates logic that calls
safeListSchemas) can record the failure in connection_errors and surface a
failed run rather than returning an empty candidate set.
- Around line 183-203: fetchColumnSamples currently constructs SQL with
hard-coded double-quoted identifiers and a trailing LIMIT which breaks on
several dialects and then swallows errors; update it to use the existing
dialect-aware quoting (e.g., reuse quoteIdentForDialect or the same logic from
data-diff.ts) for schema/table/column identifiers, replace the hard-coded LIMIT
with dialect-appropriate paging (TOP/OFFSET/FETCH or FETCH FIRST ... ROWS ONLY
depending on connector.dialect), and stop silently swallowing exceptions —
surface or log the connector error (throw or return a propagated error) so
callers know sampling failed; locate fetchColumnSamples and the helper
quoteIdentForDialect/quoteIdent usage to apply the changes.

In
`@packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts`:
- Around line 59-75: The current return always prints "Join candidates: X found"
even when detectJoinCandidates returned a failure; update the logic in the
function that builds the response (the block using result, error, candidates,
connectionErrors and calling formatCandidates) to first check result.success
and/or error and, if the call failed, return a failure response instead of
formatting candidates: set metadata.success = false, include the raw error
string in title and metadata.error, include connection_errors if present, and
set output to either an error-aware message or the original result.error rather
than formatCandidates(candidates, ...); otherwise (success) keep the existing
title/count and call formatCandidates as before. Ensure you reference result,
error, candidates, connectionErrors and formatCandidates in your change.
- Around line 51-58: The execute method in altimate-core-detect-join-candidates
currently calls Dispatcher.call("altimate_core.detect_join_candidates", ...)
without requesting user approval; add an explicit approval step using ctx.ask({
permission: "sql_execute_read", title: "...", description: "..." }) (matching
the other read-only warehouse tools) before invoking Dispatcher.call; if ctx.ask
is denied or returns falsy, abort/throw or return an appropriate error,
otherwise proceed to call Dispatcher.call with the same args; ensure this
approval lives inside the async execute(args, _ctx) function and surrounds the
call to Dispatcher.call so no SELECTs are issued before approval.

---

Nitpick comments:
In `@packages/opencode/src/altimate/index.ts`:
- Line 19: The barrel currently re-exports everything from
"./tools/altimate-core-detect-join-candidates", which unintentionally exposes
the test-only symbol _altimateCoreDetectJoinCandidatesInternal; change the
wildcard export to a named export that only exports the public tool
(AltimateCoreDetectJoinCandidatesTool) so the internal helper is not leaked,
i.e., replace the export * with an explicit export of
AltimateCoreDetectJoinCandidatesTool (or alternatively move
_altimateCoreDetectJoinCandidatesInternal out of the module's public exports).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d2afa6cf-1811-4eaa-98f4-31961064966e

📥 Commits

Reviewing files that changed from the base of the PR and between 941c8ca and d16a92f.

📒 Files selected for processing (8)

packages/opencode/src/altimate/index.ts
packages/opencode/src/altimate/native/connections/detect-join-candidates.ts
packages/opencode/src/altimate/native/connections/register.ts
packages/opencode/src/altimate/native/types.ts
packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts
packages/opencode/src/tool/registry.ts
packages/opencode/test/altimate/altimate-core-native.test.ts
packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts

cubic-dev-ai

4 issues found across 8 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/detect-join-candidates.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/detect-join-candidates.ts:176">
P1: Dialect-specific identifier quoting is required; always using ANSI double quotes breaks MySQL/MariaDB when `ANSI_QUOTES` is not enabled.</violation>

<violation number="2" location="packages/opencode/src/altimate/native/connections/detect-join-candidates.ts:192">
P1: Do not hardcode `LIMIT` in the sampling SQL; rely on `connector.execute(..., sampleSize)` to apply dialect-specific limiting.</violation>

<violation number="3" location="packages/opencode/src/altimate/native/connections/detect-join-candidates.ts:260">
P2: Falling back to `["public"]` when `listSchemas()` fails silently produces zero candidates on databases where the default schema is not `public` (e.g., SQL Server uses `dbo`, Oracle uses the username). The error isn't surfaced in `connection_errors`, so the run appears successful with an empty result. Either propagate the error or use a dialect-aware default schema.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts:66">
P2: When the native call returns `{ success: false, error }` without throwing, this code path is still reached and renders `Join candidates: 0 found` / `No cross-DB join candidates detected`, hiding the actual error from the user. Add an early return for `!result.success` that surfaces the error message (similar to the catch block below that returns `title: "Join candidates: ERROR"`).</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

sahrizvi · 2026-04-28T12:22:01Z

Multi-Model Consensus Code Review — PR #761

Title: feat(sql): cross-DB join key inference via prefix/suffix overlap
Branch: feat/cross-db-join-inference
Net change: +875 / -2 across 8 files
New code: detect-join-candidates.ts (317 LOC) + altimate-core-detect-join-candidates.ts (125 LOC) + 385 LOC test file

Verdict: REQUEST CHANGES

Three independent reviewers (Claude, GPT, Gemini) flagged the SQL portability problem as a blocker; GPT additionally flagged a missing permission-gating step. Kimi gave a "conditional yes" but its concerns overlap with the major issues raised elsewhere. The pure algorithm and test architecture are unanimously praised; the warehouse-facing implementation is unanimously called out for not aligning with the repo's dialect-aware patterns.

The fixes are small and bounded — likely ~30 LOC plus a couple of tests — and worth doing before merge so the tool doesn't ship as silently-broken on MySQL/SQL Server/Oracle/BigQuery.

CRITICAL

C1 — `LIMIT` + ANSI double-quote identifier quoting are not portable across the supported drivers

Reviewers: Claude, GPT, Gemini, Kimi (4/4 active reviewers — Consensus)
Category: Bug / Portability
Location: packages/opencode/src/altimate/native/connections/detect-join-candidates.ts:175–177 (quoteIdent), lines 190–194 (fetchColumnSamples)
Why it matters: The handler builds:

const sql = `SELECT ${col} FROM ${target} WHERE ${col} IS NOT NULL LIMIT ${sampleSize}`

Both ingredients break on warehouses this repo ships drivers for:

LIMIT N is invalid on SQL Server (uses TOP/OFFSET ... FETCH NEXT) and pre-12c Oracle (uses ROWNUM/FETCH FIRST).
Double-quoted identifiers are parsed as string literals by MySQL/MariaDB in default mode (without ANSI_QUOTES); Databricks/Spark and BigQuery use backticks; SQL Server uses brackets.

Combined with the bare catch {} in fetchColumnSamples (line 201), the failure mode is silent: the user gets success: true, candidates: [], bags_scanned: 0 and no signal that every sample query threw.

Suggested fix: Drop the inline LIMIT and let the existing driver contract apply row-capping (Connector.execute(sql, sampleSize) already does this — see packages/drivers/src/sqlite.ts:25–47). For identifier quoting, either delegate to a dialect-aware helper (the repo has dialect-aware quoting in data-diff) or branch on Registry.getConfig(name)?.type. GPT supplied a concrete patch:

const warehouseType = Registry.getConfig(name)?.type ?? "unknown"
const quotedTarget = [schema, table].filter(Boolean)
  .map(p => quoteIdentForDialect(p, warehouseTypeToDialect(warehouseType)))
  .join(".")
const quotedCol = quoteIdentForDialect(column, warehouseTypeToDialect(warehouseType))
const sql = `SELECT ${quotedCol} FROM ${quotedTarget} WHERE ${quotedCol} IS NOT NULL`
const result = await connector.execute(sql, sampleSize)

MAJOR

M1 — No permission gating on warehouse reads (policy bypass)

Reviewers: GPT (Unique, but high-impact)
Category: Security / Design
Location: packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts:51
Why it matters: This tool issues many SELECT reads across every string column of every scanned table on every configured connection — but it never goes through the sql_execute_read permission flow that data_diff uses for less-invasive operations. Even though raw values aren't echoed back to the user, they are pulled from the warehouse and held in memory. This is a policy-consistency gap that the security review will catch downstream if not now.
Suggested fix: Gate execution with ctx.ask({ permission: "sql_execute_read", ... }) before the dispatcher call, and add a tool test asserting the permission request occurs.

M2 — `safeListSchemas` silent fallback to `["public"]` masks real failures

Reviewers: Claude, GPT, Gemini (3/4 — Consensus)
Category: Logic Error / Observability
Location: detect-join-candidates.ts:256–262
Why it matters: If listSchemas() fails, the helper returns ["public"]. That is wrong for Snowflake (PUBLIC uppercase, per-database scoping), wrong for BigQuery (no public concept), wrong for Oracle, ClickHouse, MySQL. The user gets an empty result with no entry in connection_errors — the silent-failure mode again.
Suggested fix: If listSchemas() fails and the caller didn't pass schema_name, record the connection-level error and skip the connection:

let schemas: string[]
try {
  schemas = params.schema_name ? [params.schema_name] : await connector.listSchemas()
} catch (e) {
  errors[name] = `Failed to list schemas: ${String(e)}`
  continue
}

M3 — Per-table sample failures swallowed without any reporting

Reviewers: Claude, GPT (2/4 — Consensus, also implicit in Kimi's commentary)
Category: Design / Observability
Location: detect-join-candidates.ts:201–203 (fetchColumnSamples), lines 256–286 (other safe* helpers)
Why it matters: The PR description claims "per-table failures don't kill the scan; per-connection failures are returned in connection_errors" — but there is no surface for per-table or per-column errors. A warehouse user who can list tables but lacks SELECT on one of them gets a silently-skipped table. Combined with C1+M2, the user has no signal that anything went wrong.
Suggested fix: Add a counter (or a small bounded list) of swept errors to the result envelope, even if just swept_errors: number.

M4 — Tool wrapper mishandles non-throwing native failures

Reviewers: GPT (Unique)
Category: Bug
Location: altimate-core-detect-join-candidates.ts:59–75
Why it matters: If the dispatcher returns { success: false, error: ... } (which happens when the connection-count guard fires at line 295–301 of the native handler), the tool wrapper still renders Join candidates: 0 found / No cross-DB join candidates detected. That presents an operational failure as a clean negative result. Existing tests cover the thrown-error path but not the success: false envelope.
Suggested fix:

if (!result.success || error) {
  return {
    title: "Join candidates: FAILED",
    metadata: { success: false, candidate_count: 0, ...(error && { error }) },
    output: `Failed to detect join candidates: ${error ?? "Unknown error"}`,
  }
}

M5 — `confidence` is a misleading field name

Reviewers: Claude (Unique)
Category: Design / Documentation
Location: detect-join-candidates.ts:88–93, line 124
Why it matters: confidence = overlap / min(|lsuf|, |rsuf|). With sample_size: 50, two columns whose 50 sampled values happen to share a suffix yield confidence = 1.0 — even if the underlying tables are mostly disjoint. The doc-comment notes "not a probability" but the field name in the JSON output is confidence, which an LLM consumer will read as "very likely correct."
Suggested fix: Rename to sample_overlap_ratio (consistent with suffix_overlap naming) and propagate to the formatter.

M6 — Sample collection is fully serial across connections, schemas, tables, columns

Reviewers: Gemini (Unique)
Category: Performance
Location: detect-join-candidates.ts:220–251
Why it matters: Every await in collectSampleBags is sequential. With defaults (50 tables × N string columns × M connections), total latency = sum of all per-column queries. For a real two-warehouse scan, this is many seconds to minutes.
Suggested fix: Parallelize at the connection level with Promise.all; consider chunked parallelism within a connection (e.g., p-limit-style) to avoid hammering a single warehouse.

M7 — No bound on column count or sample-size, allowing memory/time blowups

Reviewers: Kimi (Unique, valid)
Category: Code Quality / Performance
Location: altimate-core-detect-join-candidates.ts:38–49 (Zod schema)
Why it matters: sample_size and max_tables_per_connection accept any positive integer. sample_size: 1000000 ⇒ allocate sets of 1M strings × N columns × M connections. The Promise.all parallelization (M6) makes this worse.
Suggested fix:

sample_size: z.number().int().positive().max(1000).optional()
max_tables_per_connection: z.number().int().positive().max(500).optional()

MINOR

MI1 — `register.ts:634` registers handler with implicit `any` params

Reviewers: Claude (Unique)
Location: register.ts:634
Every other register() call in the file types its params. Add the import + type annotation:

register("altimate_core.detect_join_candidates",
  async (params: AltimateCoreDetectJoinCandidatesParams) => detectJoinCandidates(params))

MI2 — `STRING_TYPE_PATTERN` may miss/include real types

Reviewers: Claude, Kimi
Location: detect-join-candidates.ts:163
ClickHouse FixedString(8) is not anchored against string (the ^ anchor will fail). Snowflake STRING covered. Postgres xml not in pattern (good — it's not text-y in this sense). Worth a unit test for the type filter to pin behavior.

MI3 — No exclusion of obviously-non-key string columns

Reviewers: Kimi (Unique)
Location: detect-join-candidates.ts:234–244
Wide tables expose description, notes, email, address etc. — all string-typed but not realistic join keys. Suggested skip pattern:

const SKIP_COLUMN_PATTERNS = /^(description|notes?|comments?|email|phone|address|created_at|updated_at|deleted_at)$/i

MI4 — No deduplication of overlapping candidates within a connection pair

Reviewers: Kimi (Unique)
Location: detect-join-candidates.ts:100–145
If both DBs have id1 and id2 columns, you get N×M candidates with identical or near-identical suffix sets. Either dedupe by suffix-set hash or document the behavior.

MI5 — `table` field encodes `schema.table` as a single string

Reviewers: Kimi (Unique)
Location: detect-join-candidates.ts:244
bag.table = ${schema}.${t.name}`` forces consumers to parse. Add an explicit schema field to `ColumnSampleBag` and `JoinCandidate`. Low-impact but improves the public type contract.

MI6 — `formatCandidates` may produce thousands of lines on real workloads

Reviewers: Gemini (Unique)
Location: altimate-core-detect-join-candidates.ts:105–115
Truncate human-readable output to top ~20–30 candidates; keep the full count in metadata.candidate_count.

MI7 — Test stub references `ALTIMATE_TELEMETRY_DISABLED` but handler doesn't emit telemetry

Reviewers: Claude (Unique)
Location: test file lines 30–36
The beforeEach sets ALTIMATE_TELEMETRY_DISABLED=true but detectJoinCandidates doesn't call Telemetry.track. Either wire telemetry in (precedent: register.ts:393–406 for sql.execute) or remove the dead env-var setup.

MI8 — `tablesScanned` cap doesn't surface truncation

Reviewers: Claude (Unique)
Location: detect-join-candidates.ts:226–247
When max_tables_per_connection fires mid-scan, the response has no field telling the caller "we hit the cap." Add truncated: boolean or tables_skipped: number to bags_scanned-adjacent metadata.

NIT

N1 (Claude, Kimi): The doc comment at detect-join-candidates.ts:17 references "dab_bench's preindexer" — internal codename leaks into a public-looking source file. Either inline the algorithm summary or remove the breadcrumb.
N2 (Claude): Unicode ↔ in the formatter output may render badly in some CI logs. Consider <->.
N3 (Kimi): Double-wrapped error handling — both the native handler and the tool wrapper catch and stringify exceptions; can produce nested Failed: ... prefixes.

Positive observations (Consensus)

Pure-vs-IO split is excellent. commonPrefix / stripPrefixSet / detectJoinCandidatesFromBags are side-effect-free and exported. Praised by every reviewer.
Test architecture is genuinely good. Three layers (pure unit, SQLite-backed integration, tool surface) plus dispatcher-throws path. ~21 tests for ~317 LOC of new logic.
altimate_change markers correctly applied in every modified upstream-shared file.
Zod validation at the tool boundary plus a defensive re-check in the native handler — belt-and-braces, appropriate for an LLM-callable surface.
Type contract is registered in BridgeMethods and reuses the generic AltimateCoreResult envelope. No schema sprawl.
Graceful degradation on per-connection failures (when they actually surface) is the right shape for a multi-warehouse tool.

Missing tests / edge cases (Consensus)

Non-ANSI dialect path. No test issues SQL the way an MSSQL/MySQL/Oracle driver would receive it. Add at least one test that proves the SQL is portable, or accept the C1 fix and prove driver-side limit handling works.
max_tables_per_connection cap actually applied. The cap can be removed without any test failing.
schema_name filter. Branch at lines 223–225 is uncovered.
Permission-gating (if M1 is accepted).
success: false envelope at the tool layer (M4).
String-type filter. Insert an INTEGER amount column (the SQLite fixture already has one at line 239) and assert it's not in the bags.
Empty-suffix / whitespace-only values (Kimi).
Case sensitivity — e.g. BusinessID_ vs businessid_ (Kimi).

Attribution

Finding	Severity	Reviewers
C1 SQL portability (LIMIT + quoting)	CRITICAL	Claude, GPT, Gemini, Kimi
M1 No permission gating	MAJOR	GPT
M2 `["public"]` schema fallback	MAJOR	Claude, GPT, Gemini
M3 Silent per-table errors	MAJOR	Claude, GPT
M4 `success: false` mishandled	MAJOR	GPT
M5 `confidence` misleading	MAJOR	Claude
M6 Serial sample collection	MAJOR	Gemini
M7 No upper bounds on params	MAJOR	Kimi
MI1 Implicit `any` register	MINOR	Claude
MI2 String-type pattern coverage	MINOR	Claude, Kimi
MI3 No column exclusion list	MINOR	Kimi
MI4 No candidate dedup	MINOR	Kimi
MI5 schema/table compound key	MINOR	Kimi
MI6 Output truncation	MINOR	Gemini
MI7 Dead telemetry env-var	MINOR	Claude
MI8 Cap-truncation not signaled	MINOR	Claude
N1 dab_bench codename leak	NIT	Claude, Kimi
N2 Unicode separator	NIT	Claude
N3 Double-wrapped errors	NIT	Kimi

Disagreements

Kimi's overall verdict was "conditional yes — fix chore(deps): Bump minimatch from 10.0.3 to 10.2.3 in /packages/altimate-code #1, chore(deps): Bump @ai-sdk/xai from 2.0.51 to 3.0.60 #4, chore(deps): Bump glob from 13.0.5 to 13.0.6 #8 before merge" while Claude / GPT / Gemini lean "request changes" (with C1 as blocker). The disagreement reduces to whether SQL portability is a release blocker vs. a follow-up: GPT and Claude argue it ships visibly broken on multiple supported warehouses (silent zero-result), so blocker. Kimi assumed dialect parity would be sorted post-merge.
Performance framing differs: Gemini emphasizes the serial-I/O latency (M6); Kimi emphasizes algorithmic worst-case in commonPrefix (long values); Claude emphasizes the unbounded bag count (M7-related). All are valid; the bounded-fix is M7's Zod cap, the latency fix is M6.
commonPrefix long-string defensiveness (Kimi suggested a 100-char early-exit) vs. leave as-is (Claude's view): the values are sample-bounded by sample_size <= 50 by default, so the worst case is small. Apply only if M7 doesn't add the upper bound.

Footer

Reviewed by 8 participants: Claude + GPT 5.4 Codex + Gemini 3.1 Pro + Kimi K2.5 + MiniMax M2.7 + GLM-5.1 + Qwen 3.6 + MiMo V2 Pro.

Active responses: 4/8 — Claude, GPT (full), Gemini (partial — Gemini API quota exhaustion mid-run, but produced a coherent review), Kimi (full).

Failed/timed-out: 4/8 — GLM-5.1, Qwen 3.6, MiniMax M2.7, MiMo V2 Pro all blocked by a kilo SQLite database lock (a separate consensus run on PR #762 held the DB lock; running these models sequentially still hit the contention plus per-process startup races). The synthesis above represents the consensus of the 4 active reviewers.

This synthesis is single-pass (no convergence rounds) per the user's instruction.

Fixes the CRITICAL and MAJOR issues raised in the multi-model consensus review on PR #761. CRITICAL — SQL portability: - Reuse `quoteIdentForDialect` from `data-diff.ts` (now exported) so identifier quoting matches the per-dialect convention: backticks on MySQL/MariaDB/ClickHouse, square brackets on T-SQL/Fabric, ANSI double-quotes elsewhere. - Drop the hardcoded `LIMIT N` clause and pass the cap through `connector.execute(sql, sampleSize)` so each driver applies its native limit syntax (`LIMIT`, `TOP`, `FETCH FIRST`, ...). - Extract `buildSampleSql` as a pure helper so tests can snapshot the emitted SQL per dialect without going through I/O. MAJOR fixes: - Permission gating: tool wrapper now requests `sql_execute_read` via `ctx.ask()` before issuing any SELECT, matching `data_diff` / `sql_execute`. - Drop the unsafe `["public"]` schema fallback. When `listSchemas()` fails and no `schema_name` is provided, record the failure as a connection-level error and skip the connection rather than silently scanning the wrong schema. - Surface per-table / per-column sampling failures via a bounded `partial_errors: Record<string, string[]>` field instead of swallowing them. Format includes errors in the human-readable output. - Tool wrapper now returns a `FAILED` envelope when the dispatcher returns `{ success: false, error }` (e.g. on the connection-count guard) instead of rendering "0 found". - Rename `confidence` → `match_score` with an updated JSDoc that calls out it's a heuristic ranking signal (`overlap / min(|left|, |right|)`), not a probability. - Parallelize `collectSampleBags` at the connection level via `Promise.all` — connections are independent and the per-connection cap keeps blast radius bounded. - Add Zod upper bounds: `connections` <= 16, `sample_size` <= 1000, `max_tables_per_connection` <= 500. Prevents an oversized LLM call from blowing memory or holding warehouse connections indefinitely. Minor fix: - Type the dispatcher registration with `AltimateCoreDetectJoinCandidatesParams` / `AltimateCoreResult` so the handler matches the rest of the file. Tests: - 14 new tests covering dialect-aware SQL emission (MySQL, T-SQL, Fabric, Postgres, Snowflake, generic), absence of inlined LIMIT, delimiter escaping, the `success: false` envelope path, the permission-gating call, the Zod upper-bound rejections, the `partial_errors` collection, and the no-fallback `listSchemas` behaviour. Updates existing tests for the `confidence` → `match_score` rename. Final tally: 35 passing tests, 0 failing. Defers (per consensus-review tags): all MINOR/NIT findings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-28T15:14:45Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

github-actions · 2026-04-28T15:15:33Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

github-actions · 2026-04-28T15:16:31Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

sahrizvi · 2026-04-28T15:16:31Z

Update: addressed-vs-pending from the consensus review

Pushed 80758f5c2 on top of the original branch.

Addressed (this commit)

ID	Severity	Issue	Resolution
C1	CRITICAL	SQL portability — `LIMIT` + ANSI quoting break MySQL/SQL Server/Oracle/Databricks/BigQuery	Reuse exported `quoteIdentForDialect` from `data-diff.ts` (backticks/brackets/ANSI per dialect); drop inlined `LIMIT N`; delegate row cap to `connector.execute(sql, sampleSize)` so each driver applies its native limit syntax. Extracted `buildSampleSql` as a pure helper.
M1	MAJOR	Permission gating bypass	Tool wrapper now calls `ctx.ask({permission:"sql_execute_read", ...})` before any SELECT, mirroring `data_diff` / `sql_execute`.
M2	MAJOR	`["public"]` schema fallback is unsafe	Removed `safeListSchemas → ["public"]` fallback. `listSchemas` failure now surfaces as a connection-level error and the connection is skipped.
M3	MAJOR	Silent per-table errors	Per-table / per-column sampling failures collected in bounded `partial_errors: Record<string, string[]>` (cap 20/connection); surfaced in tool output and metadata.
M4	MAJOR	`success: false` mishandling	Tool wrapper returns `title: "Join candidates: FAILED"` envelope when dispatcher returns `{success:false, error}`.
M5	MAJOR	Misleading `confidence` field name	Renamed to `match_score`, JSDoc updated to call out it's a heuristic ranking signal, not a probability. Note this is a pre-merge rename — no aliased back-compat field.
M6	MAJOR	Serial I/O across connections	`collectSampleBags` parallelizes at the connection level via `Promise.all`. Per-connection cap still bounds blast radius.
M7	MAJOR	No Zod upper bounds	Added `.max()` caps: `connections ≤ 16`, `sample_size ≤ 1000`, `max_tables_per_connection ≤ 500`.
MI1	MINOR	(drive-by) Untyped dispatcher registration	Typed with `AltimateCoreDetectJoinCandidatesParams` / `AltimateCoreResult`.

Tests: 35 pass / 0 fail in the targeted file (was 21; +14 new for dialect SQL emission across MySQL/T-SQL/Fabric/Postgres/Snowflake/generic, no-LIMIT inlining, delimiter escaping, permission gating, success: false envelope, partial_errors collection, no-public fallback, Zod upper-bound rejection). Sibling regression: 2,886 pass. Typecheck clean.

Deferred (per consensus tags)

All MINOR / NIT findings — MI2 (string-type pattern coverage), MI3 (column-name skip list), MI4 (candidate dedup), MI5 (compound schema.table field), MI6 (output truncation), MI7 (dead telemetry env-var), MI8 (cap-truncation signal), N1 (codename leak), N2 (unicode), N3 (double-wrap). Also deferred CodeRabbit's index.ts barrel cleanup since it wasn't in the consensus list.

Demoted as invalid: none from this PR (no Kimi-style misreads).

Heads-up

quoteIdentForDialect is now exported from data-diff.ts. Inside the package it's only consumed by detect-join-candidates.ts today. Tool API changes from this round:

confidence → match_score (rename, not aliased — pre-merge so no installed users yet)
New partial_errors field on the result envelope
First invocation triggers a sql_execute_read permission prompt; non-interactive harnesses must auto-approve

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

packages/opencode/src/altimate/native/connections/data-diff.ts (1)
42-56: ⚠️ Potential issue | 🟠 Major

Add an explicit BigQuery branch here.

detect-join-candidates now reuses this helper for sampling SQL, but the fallback branch still emits ANSI-style "identifier" quoting. BigQuery does not use that quoting mode for identifiers, so the new feature will still generate invalid SQL there whenever a table/column name needs quoting.
In BigQuery Standard SQL, what quoting syntax is required for quoted identifiers, and are double quotes valid identifier quotes?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts` around lines
42 - 56, The fallback ANSI double-quote branch emits invalid identifier quoting
for BigQuery; update quoteIdentForDialect to add an explicit "bigquery" case
that quotes identifiers with backticks (same style as the
mysql/mariadb/clickhouse branch) and properly escapes any backticks inside the
identifier (e.g., identifier.replace(/`/g, "``")), so detect-join-candidates
generates valid BigQuery SQL; modify the switch in quoteIdentForDialect to
include case "bigquery": return `\`${identifier.replace(/`/g, "``")}\``.

🧹 Nitpick comments (1)

packages/opencode/src/altimate/native/connections/detect-join-candidates.ts (1)
279-293: Surface when the table cap truncates a scan.

Once tablesScanned hits max_tables_per_connection, sampling just stops and the response still looks complete. Returning per-connection metadata like truncated, tables_scanned, or tables_skipped would make zero/low-result scans much easier to interpret.

Also applies to: 388-395
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/connections/detect-join-candidates.ts`
around lines 279 - 293, The scanning loop that uses tablesScanned and maxTables
(in detect-join-candidates.ts around the for (const schema of schemas) loop and
the similar loop at the later block) currently stops silently when tablesScanned
>= maxTables; update the per-connection result object returned by this code to
include metadata fields such as truncated (boolean), tables_scanned (number),
and tables_skipped (number) and set truncated = true whenever you break out due
to the cap; increment tables_scanned as you already do, compute tables_skipped
as the remaining tables not scanned in this connector/schema, and ensure the
same changes are applied to the second loop (the block around lines 388-395) and
any place that returns connector scan results so callers can see when scans were
truncated.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opencode/src/altimate/native/connections/detect-join-candidates.ts`:
- Around line 150-156: stripPrefixSet currently adds suffixes that are only
whitespace because it checks suf.length > 0; update stripPrefixSet(values,
prefix) to trim whitespace before deciding and storing: compute suf =
v.slice(prefix.length), then let trimmed = suf.trim() and only add trimmed to
the Set if trimmed.length > 0 (store trimmed, not the raw suf). This ensures
whitespace-only suffixes like " " or "\t" are ignored when building the returned
Set.

---

Duplicate comments:
In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 42-56: The fallback ANSI double-quote branch emits invalid
identifier quoting for BigQuery; update quoteIdentForDialect to add an explicit
"bigquery" case that quotes identifiers with backticks (same style as the
mysql/mariadb/clickhouse branch) and properly escapes any backticks inside the
identifier (e.g., identifier.replace(/`/g, "``")), so detect-join-candidates
generates valid BigQuery SQL; modify the switch in quoteIdentForDialect to
include case "bigquery": return `\`${identifier.replace(/`/g, "``")}\``.

---

Nitpick comments:
In `@packages/opencode/src/altimate/native/connections/detect-join-candidates.ts`:
- Around line 279-293: The scanning loop that uses tablesScanned and maxTables
(in detect-join-candidates.ts around the for (const schema of schemas) loop and
the similar loop at the later block) currently stops silently when tablesScanned
>= maxTables; update the per-connection result object returned by this code to
include metadata fields such as truncated (boolean), tables_scanned (number),
and tables_skipped (number) and set truncated = true whenever you break out due
to the cap; increment tables_scanned as you already do, compute tables_skipped
as the remaining tables not scanned in this connector/schema, and ensure the
same changes are applied to the second loop (the block around lines 388-395) and
any place that returns connector scan results so callers can see when scans were
truncated.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 245f5ad4-089b-4383-b8e8-5300dc34628e

📥 Commits

Reviewing files that changed from the base of the PR and between d16a92f and 80758f5.

📒 Files selected for processing (5)

packages/opencode/src/altimate/native/connections/data-diff.ts
packages/opencode/src/altimate/native/connections/detect-join-candidates.ts
packages/opencode/src/altimate/native/connections/register.ts
packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts
packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts

🚧 Files skipped from review as they are similar to previous changes (2)

packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts
packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts

coderabbitai · 2026-04-28T15:22:30Z

+function stripPrefixSet(values: readonly string[], prefix: string): Set<string> {
+  const out = new Set<string>()
+  for (const v of values) {
+    if (typeof v === "string" && v.startsWith(prefix)) {
+      const suf = v.slice(prefix.length)
+      if (suf.length > 0) out.add(suf)
+    }


⚠️ Potential issue | 🟡 Minor

Ignore whitespace-only suffixes before scoring.

stripPrefixSet() currently treats " " / "\t" as real suffixes because it only checks length > 0. That can produce suffix_overlap > 0 for columns that don't actually contain a usable join token.

Suggested fix

function stripPrefixSet(values: readonly string[], prefix: string): Set<string> { const out = new Set<string>() for (const v of values) { if (typeof v === "string" && v.startsWith(prefix)) { const suf = v.slice(prefix.length) - if (suf.length > 0) out.add(suf) + if (suf.trim().length > 0) out.add(suf) } } return out }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

function stripPrefixSet(values: readonly string[], prefix: string): Set<string> {

const out = new Set<string>()

for (const v of values) {

if (typeof v === "string" && v.startsWith(prefix)) {

const suf = v.slice(prefix.length)

if (suf.length > 0) out.add(suf)

}

function stripPrefixSet(values: readonly string[], prefix: string): Set<string> {

const out = new Set<string>()

for (const v of values) {

if (typeof v === "string" && v.startsWith(prefix)) {

const suf = v.slice(prefix.length)

if (suf.trim().length > 0) out.add(suf)

}

}

return out

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/connections/detect-join-candidates.ts` around lines 150 - 156, stripPrefixSet currently adds suffixes that are only whitespace because it checks suf.length > 0; update stripPrefixSet(values, prefix) to trim whitespace before deciding and storing: compute suf = v.slice(prefix.length), then let trimmed = suf.trim() and only add trimmed to the Set if trimmed.length > 0 (store trimmed, not the raw suf). This ensures whitespace-only suffixes like " " or "\t" are ignored when building the returned Set.

dev-punia-altimate · 2026-04-30T16:46:52Z

✅ Tests — All Passed

TypeScript — passed

cc @sahrizvi

Haider added 2 commits April 28, 2026 13:32

test: bump altimate_core method count to 35 for detect_join_candidates

d16a92f

claude Bot reviewed Apr 28, 2026

View reviewed changes

github-actions Bot added the needs-review:blocked label Apr 28, 2026

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

cubic-dev-ai Bot reviewed Apr 28, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

Conversation

sahrizvi commented Apr 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

What's added

Tests

Backwards compatibility

Files

Summary by cubic

Summary by CodeRabbit

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

coderabbitai Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (2 warnings)

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sahrizvi commented Apr 28, 2026

Multi-Model Consensus Code Review — PR #761

Verdict: REQUEST CHANGES

CRITICAL

C1 — LIMIT + ANSI double-quote identifier quoting are not portable across the supported drivers

MAJOR

M1 — No permission gating on warehouse reads (policy bypass)

M2 — safeListSchemas silent fallback to ["public"] masks real failures

M3 — Per-table sample failures swallowed without any reporting

M4 — Tool wrapper mishandles non-throwing native failures

M5 — confidence is a misleading field name

M6 — Sample collection is fully serial across connections, schemas, tables, columns

M7 — No bound on column count or sample-size, allowing memory/time blowups

MINOR

MI1 — register.ts:634 registers handler with implicit any params

MI2 — STRING_TYPE_PATTERN may miss/include real types

MI3 — No exclusion of obviously-non-key string columns

MI4 — No deduplication of overlapping candidates within a connection pair

MI5 — table field encodes schema.table as a single string

MI6 — formatCandidates may produce thousands of lines on real workloads

MI7 — Test stub references ALTIMATE_TELEMETRY_DISABLED but handler doesn't emit telemetry

MI8 — tablesScanned cap doesn't surface truncation

NIT

Positive observations (Consensus)

Missing tests / edge cases (Consensus)

Attribution

Disagreements

Footer

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

sahrizvi commented Apr 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 28, 2026 •

edited

Loading

C1 — `LIMIT` + ANSI double-quote identifier quoting are not portable across the supported drivers

M2 — `safeListSchemas` silent fallback to `["public"]` masks real failures

M5 — `confidence` is a misleading field name

MI1 — `register.ts:634` registers handler with implicit `any` params

MI2 — `STRING_TYPE_PATTERN` may miss/include real types

MI5 — `table` field encodes `schema.table` as a single string

MI6 — `formatCandidates` may produce thousands of lines on real workloads

MI7 — Test stub references `ALTIMATE_TELEMETRY_DISABLED` but handler doesn't emit telemetry

MI8 — `tablesScanned` cap doesn't surface truncation