
feat(sql): cross-DB join key inference via prefix/suffix overlap #761

Open
sahrizvi wants to merge 3 commits into main from
feat/cross-db-join-inference

Conversation


@sahrizvi sahrizvi commented Apr 28, 2026

Closes #758

Motivation

When altimate-code is connected to multiple warehouses, columns that point at the same entity often have mismatched naming or formatting — e.g., one DB stores cust_42 while another stores 42, or one warehouse uses businessid_X and another uses businessref_X. There's currently no built-in way to surface candidate joins across DBs; the agent has to discover them per-pair, which costs tokens and is brittle.

Common situations where this lands:

  • Post-acquisition system integration (acquired company's ID format ≠ parent's)
  • Multi-vendor analytics (Salesforce AccountId ↔ Hubspot company_id ↔ internal account_external_id)
  • Migrations leaving prefix differences (v1_user_123 vs user_123)
  • Per-source ingestion with system-specific ID schemes

What's added

A new tool altimate_core.detect_join_candidates that:

  1. Pulls ~50 sample values per string-typed column from each configured warehouse via the existing connector.execute plumbing.
  2. Computes the longest common prefix per column, walking back to the last _/-/: separator (so partial-name matches don't count).
  3. Compares prefix-stripped suffixes pairwise across DBs; emits a candidate when both columns have non-empty distinct prefixes AND share at least one suffix.
  4. Returns ranked candidates: {left_db, left_table, left_col, right_db, right_table, right_col, prefix_rule, suffix_overlap, confidence} where confidence = overlap / min(|left_suffixes|, |right_suffixes|) (cheap and monotonic, deliberately not a probability).
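Steps 2–3 can be sketched in a few lines. This is a hedged illustration only: the helper names `commonPrefix` and `suffixOverlap` are hypothetical stand-ins, not the actual exports of `detect-join-candidates.ts`.

```typescript
// Illustrative sketch of the prefix walk-back and suffix-overlap steps.
// Helper names are hypothetical, not the module's real exports.
const SEPARATORS = new Set(["_", "-", ":"]);

// Longest common prefix across sampled values, walked back to the last
// separator so partial-name matches ("busi" of "businessid_") don't count.
function commonPrefix(values: string[]): string {
  if (values.length === 0) return "";
  let prefix = values[0];
  for (const v of values.slice(1)) {
    let i = 0;
    while (i < prefix.length && i < v.length && prefix[i] === v[i]) i++;
    prefix = prefix.slice(0, i);
  }
  // Walk back to the last _/-/: separator; reject if none remains.
  for (let i = prefix.length - 1; i >= 0; i--) {
    if (SEPARATORS.has(prefix[i])) return prefix.slice(0, i + 1);
  }
  return "";
}

// Step 3: require non-empty, distinct prefixes, strip them, count shared suffixes.
function suffixOverlap(left: string[], right: string[]): number {
  const lp = commonPrefix(left);
  const rp = commonPrefix(right);
  if (!lp || !rp || lp === rp) return 0;
  const ls = new Set(left.map((v) => v.slice(lp.length)));
  const rs = new Set(right.map((v) => v.slice(rp.length)));
  let overlap = 0;
  for (const s of ls) if (rs.has(s)) overlap++;
  return overlap;
}
```

On the canonical example, `commonPrefix(["businessid_1", "businessid_2"])` walks back to `"businessid_"`, and suffix sets `{1, 2}` vs `{1, 3}` overlap on one value.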

Errors are scoped: per-table sample failures don't kill the scan; per-connection failures are returned in connection_errors. Identifier quoting is ANSI-safe (double quotes). The STRING_TYPE_PATTERN filter restricts sampling to text-like columns (varchar/text/char/string/uuid/json) — numeric/temporal columns are skipped.

Tests

21 new tests in test/altimate/tools/altimate-core-detect-join-candidates.test.ts:

  • commonPrefix LCP logic + separator handling (_/-/:) + no-separator rejection + non-string defensive paths
  • detectJoinCandidatesFromBags: canonical pattern, same-DB rejection, identical-prefix rejection, zero-overlap rejection, no-separator rejection, ranking by overlap/confidence, full N*(N-1)/2 fan-out
  • Integration: two real bun:sqlite :memory: DBs holding 10 businessid_X rows + 8 businessref_X rows (with intentional non-overlapping outliers), driven through the actual native handler with Registry.get stubbed
  • Tool surface: title/output formatting, empty-list rendering, connection-error rendering, dispatcher-throws envelope

Results:

The existing altimate-core-native.test.ts count assertion was updated (34 → 35) to reflect the new method, with altimate_core.detect_join_candidates added to the canonical ALL_METHODS list.

Backwards compatibility

Pure addition. No existing behaviour changes; no existing tests modified beyond the count update.

Files

  • packages/opencode/src/altimate/native/connections/detect-join-candidates.ts (new — 317 LOC)
  • packages/opencode/src/altimate/native/connections/register.ts (handler registration)
  • packages/opencode/src/altimate/native/types.ts (params + result types)
  • packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts (new tool wrapper)
  • packages/opencode/src/altimate/index.ts (barrel export)
  • packages/opencode/src/tool/registry.ts (tool registration)
  • packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts (new — 385 LOC)
  • packages/opencode/test/altimate/altimate-core-native.test.ts (count assertion update)

Summary by cubic

Adds cross-DB join key inference that finds candidate joins by stripping distinct prefixes and matching suffixes across warehouses. Now uses dialect-aware sampling SQL, stricter permission/error handling, and safer input limits.

  • New Features
    • Exposes altimate_core.detect_join_candidates and the altimate_core_detect_join_candidates tool; registered and exported.
    • Builds sampling SQL with dialect-aware quoting via quoteIdentForDialect; no inline LIMITs (caps passed to drivers).
    • Prefix logic trims to _/-/:; prefixes must differ; emits on suffix overlap; ranks by suffix_overlap, then match_score = overlap / min(left, right).
    • Samples ~50 string-like values/column; scans up to 50 tables/connection; optional schema_name; cross-DB only; skips non-string columns.
    • Robustness: gated by sql_execute_read permission; parallel per-connection scans; returns connection_errors and bounded partial_errors; input caps (connections ≤ 16, sample_size ≤ 1000, max_tables_per_connection ≤ 500).

Written for commit 80758f5. Summary will update on new commits. Review in cubic

Summary by CodeRabbit

  • New Features
    • Added an Altimate tool to detect and rank cross-database join candidates with sampling, dialect-aware SQL handling, and detailed error reporting.
  • Chores
    • Exposed the new detection capability through the native bridge and tool registry for discovery and use.
  • Tests
    • Added comprehensive unit and integration tests covering detection logic, sampling, dialect quoting, and tool execution paths.

Haider added 2 commits April 28, 2026 13:32
Adds the `altimate_core_detect_join_candidates` tool: given two or more
warehouse connection names, pull a small bag of string sample values per
(table, column), then for every cross-DB pair compute the longest common
value prefix on each side. When both prefixes end in `_`, `-`, or `:`,
differ from each other, and leave at least one matching suffix after
stripping, emit a ranked join candidate.

This targets the canonical pattern where one DB stores `businessid_42`
and another stores `businessref_42` — the inference is purely value-based
so it survives schemas that disagree on naming conventions.

- Algorithm in `native/connections/detect-join-candidates.ts` (port of
  dab_bench's `_detect_join_candidates` / `_common_prefix`).
- Native handler registered as `altimate_core.detect_join_candidates`,
  ranked by suffix overlap then confidence.
- Tool wrapper, registry entry, and barrel export.
- Tests: 21 cases covering the prefix walk-back, suffix overlap, ranking,
  and an integration test against two `bun:sqlite` `:memory:` DBs holding
  the canonical `businessid_X` ↔ `businessref_X` pattern.

@claude claude Bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.


coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Adds an end-to-end cross-database join-candidate detection feature: sampling string-like column values from multiple connections, inferring separator-aware prefix/suffix join patterns, exposing a native handler, a tool wrapper, types, registry export, and tests.

Changes

Cohort / File(s): Summary

  • Core Detection Logic: packages/opencode/src/altimate/native/connections/detect-join-candidates.ts
    New native module implementing sampling, separator-aware prefix/suffix inference, candidate ranking, and the detectJoinCandidates entrypoint.
  • Types & Dispatcher Registration: packages/opencode/src/altimate/native/types.ts, packages/opencode/src/altimate/native/connections/register.ts
    Adds AltimateCoreDetectJoinCandidatesParams and registers the altimate_core.detect_join_candidates handler.
  • Tool Layer: packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts
    New tool wrapper altimate_core_detect_join_candidates that requests permission, calls the dispatcher, formats results, and exposes internals for tests.
  • Barrel Export & Registry: packages/opencode/src/altimate/index.ts, packages/opencode/src/tool/registry.ts
    Re-exports the new tool and adds it to the tool registry list.
  • Identifier Quoting Helper: packages/opencode/src/altimate/native/connections/data-diff.ts
    Exports quoteIdentForDialect so other native modules can reuse dialect-aware identifier quoting.
  • Tests: packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts, packages/opencode/test/altimate/altimate-core-native.test.ts
    Adds comprehensive unit/integration tests for detection logic, sampling SQL, end-to-end sampling with in-memory DBs, and tool execution envelopes; updates the native-method count assertion.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Tool as AltimateCoreDetectJoinCandidatesTool
    participant Dispatcher
    participant Detection as detectJoinCandidates
    participant Sampler as collectSampleBags
    participant DB1 as Connection1
    participant DB2 as Connection2
    participant Inference as detectJoinCandidatesFromBags

    Client->>Tool: execute(params)
    Tool->>Dispatcher: call("altimate_core.detect_join_candidates", params)
    Dispatcher->>Detection: detectJoinCandidates(params)
    Detection->>Sampler: collectSampleBags(params)
    par Parallel Sampling
        Sampler->>DB1: sample non-null string columns
        DB1-->>Sampler: sample values
        Sampler->>DB2: sample non-null string columns
        DB2-->>Sampler: sample values
    end
    Sampler-->>Detection: { bags, partialErrors, connectionErrors }
    Detection->>Inference: detectJoinCandidatesFromBags(bags)
    Inference-->>Detection: sorted JoinCandidate[]
    Detection-->>Dispatcher: { success, candidates, errors, partialErrors }
    Dispatcher-->>Tool: AltimateCoreResult
    Tool->>Client: formatted output + metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested labels

contributor

Poem

🐰 Hops and hops, I sample strings by night,
Prefixes trimmed, suffixes taking flight.
Candidates lined up, ranked in a row,
Cross-DB joins now wink and glow.
A little rabbit cheers: "Let queries go!"

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check (⚠️ Warning): the PR description is missing the required 'PINEAPPLE' identifier at the top, as mandated by the template for AI-generated contributions. Resolution: add 'PINEAPPLE' as the very first line of the description, before any other content.
  • Docstring Coverage (⚠️ Warning): docstring coverage is 64.71%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Title check (✅ Passed): the title 'feat(sql): cross-DB join key inference via prefix/suffix overlap' is concise, specific, and accurately summarizes the primary change: introducing cross-database join candidate detection.
  • Linked Issues check (✅ Passed): skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): skipped because no linked issues were found for this pull request.


@github-actions

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

1 similar comment


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
packages/opencode/src/altimate/index.ts (1)

19-19: Avoid re-exporting the tool’s test-only internals.

export * here also exposes _altimateCoreDetectJoinCandidatesInternal from the tool module through the public Altimate barrel. Please export just AltimateCoreDetectJoinCandidatesTool here, or keep the test helper out of the module’s public exports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/index.ts` at line 19, The barrel currently
re-exports everything from "./tools/altimate-core-detect-join-candidates", which
unintentionally exposes the test-only symbol
_altimateCoreDetectJoinCandidatesInternal; change the wildcard export to a named
export that only exports the public tool (AltimateCoreDetectJoinCandidatesTool)
so the internal helper is not leaked, i.e., replace the export * with an
explicit export of AltimateCoreDetectJoinCandidatesTool (or alternatively move
_altimateCoreDetectJoinCandidatesInternal out of the module's public exports).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opencode/src/altimate/native/connections/detect-join-candidates.ts`:
- Around line 256-261: The helper safeListSchemas currently swallows errors from
Connector.listSchemas and returns ["public"], hiding real failures; change
safeListSchemas to not silently fall back — remove the catch that returns
["public"] and instead let the original error propagate (or throw a new Error
with context) so upstream code (e.g., detect-join-candidates logic that calls
safeListSchemas) can record the failure in connection_errors and surface a
failed run rather than returning an empty candidate set.
- Around line 183-203: fetchColumnSamples currently constructs SQL with
hard-coded double-quoted identifiers and a trailing LIMIT which breaks on
several dialects and then swallows errors; update it to use the existing
dialect-aware quoting (e.g., reuse quoteIdentForDialect or the same logic from
data-diff.ts) for schema/table/column identifiers, replace the hard-coded LIMIT
with dialect-appropriate paging (TOP/OFFSET/FETCH or FETCH FIRST ... ROWS ONLY
depending on connector.dialect), and stop silently swallowing exceptions —
surface or log the connector error (throw or return a propagated error) so
callers know sampling failed; locate fetchColumnSamples and the helper
quoteIdentForDialect/quoteIdent usage to apply the changes.

In
`@packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts`:
- Around line 59-75: The current return always prints "Join candidates: X found"
even when detectJoinCandidates returned a failure; update the logic in the
function that builds the response (the block using result, error, candidates,
connectionErrors and calling formatCandidates) to first check result.success
and/or error and, if the call failed, return a failure response instead of
formatting candidates: set metadata.success = false, include the raw error
string in title and metadata.error, include connection_errors if present, and
set output to either an error-aware message or the original result.error rather
than formatCandidates(candidates, ...); otherwise (success) keep the existing
title/count and call formatCandidates as before. Ensure you reference result,
error, candidates, connectionErrors and formatCandidates in your change.
- Around line 51-58: The execute method in altimate-core-detect-join-candidates
currently calls Dispatcher.call("altimate_core.detect_join_candidates", ...)
without requesting user approval; add an explicit approval step using ctx.ask({
permission: "sql_execute_read", title: "...", description: "..." }) (matching
the other read-only warehouse tools) before invoking Dispatcher.call; if ctx.ask
is denied or returns falsy, abort/throw or return an appropriate error,
otherwise proceed to call Dispatcher.call with the same args; ensure this
approval lives inside the async execute(args, _ctx) function and surrounds the
call to Dispatcher.call so no SELECTs are issued before approval.

---

Nitpick comments:
In `@packages/opencode/src/altimate/index.ts`:
- Line 19: The barrel currently re-exports everything from
"./tools/altimate-core-detect-join-candidates", which unintentionally exposes
the test-only symbol _altimateCoreDetectJoinCandidatesInternal; change the
wildcard export to a named export that only exports the public tool
(AltimateCoreDetectJoinCandidatesTool) so the internal helper is not leaked,
i.e., replace the export * with an explicit export of
AltimateCoreDetectJoinCandidatesTool (or alternatively move
_altimateCoreDetectJoinCandidatesInternal out of the module's public exports).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d2afa6cf-1811-4eaa-98f4-31961064966e

📥 Commits

Reviewing files that changed from the base of the PR and between 941c8ca and d16a92f.

📒 Files selected for processing (8)
  • packages/opencode/src/altimate/index.ts
  • packages/opencode/src/altimate/native/connections/detect-join-candidates.ts
  • packages/opencode/src/altimate/native/connections/register.ts
  • packages/opencode/src/altimate/native/types.ts
  • packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts
  • packages/opencode/src/tool/registry.ts
  • packages/opencode/test/altimate/altimate-core-native.test.ts
  • packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts


@cubic-dev-ai cubic-dev-ai Bot left a comment


4 issues found across 8 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/detect-join-candidates.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/detect-join-candidates.ts:176">
P1: Dialect-specific identifier quoting is required; always using ANSI double quotes breaks MySQL/MariaDB when `ANSI_QUOTES` is not enabled.</violation>

<violation number="2" location="packages/opencode/src/altimate/native/connections/detect-join-candidates.ts:192">
P1: Do not hardcode `LIMIT` in the sampling SQL; rely on `connector.execute(..., sampleSize)` to apply dialect-specific limiting.</violation>

<violation number="3" location="packages/opencode/src/altimate/native/connections/detect-join-candidates.ts:260">
P2: Falling back to `["public"]` when `listSchemas()` fails silently produces zero candidates on databases where the default schema is not `public` (e.g., SQL Server uses `dbo`, Oracle uses the username). The error isn't surfaced in `connection_errors`, so the run appears successful with an empty result. Either propagate the error or use a dialect-aware default schema.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts:66">
P2: When the native call returns `{ success: false, error }` without throwing, this code path is still reached and renders `Join candidates: 0 found` / `No cross-DB join candidates detected`, hiding the actual error from the user. Add an early return for `!result.success` that surfaces the error message (similar to the catch block below that returns `title: "Join candidates: ERROR"`).</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@sahrizvi
Author

Multi-Model Consensus Code Review — PR #761

Title: feat(sql): cross-DB join key inference via prefix/suffix overlap
Branch: feat/cross-db-join-inference
Net change: +875 / -2 across 8 files
New code: detect-join-candidates.ts (317 LOC) + altimate-core-detect-join-candidates.ts (125 LOC) + 385 LOC test file


Verdict: REQUEST CHANGES

Three independent reviewers (Claude, GPT, Gemini) flagged the SQL portability problem as a blocker; GPT additionally flagged a missing permission-gating step. Kimi gave a "conditional yes" but its concerns overlap with the major issues raised elsewhere. The pure algorithm and test architecture are unanimously praised; the warehouse-facing implementation is unanimously called out for not aligning with the repo's dialect-aware patterns.

The fixes are small and bounded — likely ~30 LOC plus a couple of tests — and worth doing before merge so the tool doesn't ship as silently-broken on MySQL/SQL Server/Oracle/BigQuery.


CRITICAL

C1 — LIMIT + ANSI double-quote identifier quoting are not portable across the supported drivers

Reviewers: Claude, GPT, Gemini, Kimi (4/4 active reviewers — Consensus)
Category: Bug / Portability
Location: packages/opencode/src/altimate/native/connections/detect-join-candidates.ts:175–177 (quoteIdent), lines 190–194 (fetchColumnSamples)
Why it matters: The handler builds:

const sql = `SELECT ${col} FROM ${target} WHERE ${col} IS NOT NULL LIMIT ${sampleSize}`

Both ingredients break on warehouses this repo ships drivers for:

  • LIMIT N is invalid on SQL Server (uses TOP/OFFSET ... FETCH NEXT) and pre-12c Oracle (uses ROWNUM/FETCH FIRST).
  • Double-quoted identifiers are parsed as string literals by MySQL/MariaDB in default mode (without ANSI_QUOTES); Databricks/Spark and BigQuery use backticks; SQL Server uses brackets.

Combined with the bare catch {} in fetchColumnSamples (line 201), the failure mode is silent: the user gets success: true, candidates: [], bags_scanned: 0 and no signal that every sample query threw.

Suggested fix: Drop the inline LIMIT and let the existing driver contract apply row-capping (Connector.execute(sql, sampleSize) already does this — see packages/drivers/src/sqlite.ts:25–47). For identifier quoting, either delegate to a dialect-aware helper (the repo has dialect-aware quoting in data-diff) or branch on Registry.getConfig(name)?.type. GPT supplied a concrete patch:

const warehouseType = Registry.getConfig(name)?.type ?? "unknown"
const quotedTarget = [schema, table].filter(Boolean)
  .map(p => quoteIdentForDialect(p, warehouseTypeToDialect(warehouseType)))
  .join(".")
const quotedCol = quoteIdentForDialect(column, warehouseTypeToDialect(warehouseType))
const sql = `SELECT ${quotedCol} FROM ${quotedTarget} WHERE ${quotedCol} IS NOT NULL`
const result = await connector.execute(sql, sampleSize)
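For reference, a minimal dialect-branching quoter could look like the sketch below. This is hedged: the repo's actual `quoteIdentForDialect` in `data-diff.ts` may branch differently and cover more dialects; the `Dialect` union here is an assumption for illustration.

```typescript
// Illustrative only — the repo's real quoteIdentForDialect may differ.
type Dialect = "ansi" | "mysql" | "mssql" | "bigquery";

function quoteIdentForDialect(ident: string, dialect: Dialect): string {
  switch (dialect) {
    case "mysql":
    case "bigquery":
      // Backtick quoting; escape embedded backticks by doubling them.
      return "`" + ident.replace(/`/g, "``") + "`";
    case "mssql":
      // Bracket quoting; escape closing brackets by doubling them.
      return "[" + ident.replace(/]/g, "]]") + "]";
    default:
      // ANSI double quotes (Postgres, Snowflake, SQLite, ...).
      return '"' + ident.replace(/"/g, '""') + '"';
  }
}
```

Row-capping then stays out of the SQL string entirely, handled by the driver via the second argument to connector.execute.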

MAJOR

M1 — No permission gating on warehouse reads (policy bypass)

Reviewers: GPT (Unique, but high-impact)
Category: Security / Design
Location: packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts:51
Why it matters: This tool issues many SELECT reads across every string column of every scanned table on every configured connection — but it never goes through the sql_execute_read permission flow that data_diff uses for less-invasive operations. Even though raw values aren't echoed back to the user, they are pulled from the warehouse and held in memory. This is a policy-consistency gap that the security review will catch downstream if not now.
Suggested fix: Gate execution with ctx.ask({ permission: "sql_execute_read", ... }) before the dispatcher call, and add a tool test asserting the permission request occurs.

M2 — safeListSchemas silent fallback to ["public"] masks real failures

Reviewers: Claude, GPT, Gemini (3/4 — Consensus)
Category: Logic Error / Observability
Location: detect-join-candidates.ts:256–262
Why it matters: If listSchemas() fails, the helper returns ["public"]. That is wrong for Snowflake (PUBLIC uppercase, per-database scoping), wrong for BigQuery (no public concept), wrong for Oracle, ClickHouse, MySQL. The user gets an empty result with no entry in connection_errors — the silent-failure mode again.
Suggested fix: If listSchemas() fails and the caller didn't pass schema_name, record the connection-level error and skip the connection:

let schemas: string[]
try {
  schemas = params.schema_name ? [params.schema_name] : await connector.listSchemas()
} catch (e) {
  errors[name] = `Failed to list schemas: ${String(e)}`
  continue
}

M3 — Per-table sample failures swallowed without any reporting

Reviewers: Claude, GPT (2/4 — Consensus, also implicit in Kimi's commentary)
Category: Design / Observability
Location: detect-join-candidates.ts:201–203 (fetchColumnSamples), lines 256–286 (other safe* helpers)
Why it matters: The PR description claims "per-table failures don't kill the scan; per-connection failures are returned in connection_errors" — but there is no surface for per-table or per-column errors. A warehouse user who can list tables but lacks SELECT on one of them gets a silently-skipped table. Combined with C1+M2, the user has no signal that anything went wrong.
Suggested fix: Add a counter (or a small bounded list) of swept errors to the result envelope, even if just swept_errors: number.

M4 — Tool wrapper mishandles non-throwing native failures

Reviewers: GPT (Unique)
Category: Bug
Location: altimate-core-detect-join-candidates.ts:59–75
Why it matters: If the dispatcher returns { success: false, error: ... } (which happens when the connection-count guard fires at line 295–301 of the native handler), the tool wrapper still renders Join candidates: 0 found / No cross-DB join candidates detected. That presents an operational failure as a clean negative result. Existing tests cover the thrown-error path but not the success: false envelope.
Suggested fix:

if (!result.success || error) {
  return {
    title: "Join candidates: FAILED",
    metadata: { success: false, candidate_count: 0, ...(error && { error }) },
    output: `Failed to detect join candidates: ${error ?? "Unknown error"}`,
  }
}

M5 — confidence is a misleading field name

Reviewers: Claude (Unique)
Category: Design / Documentation
Location: detect-join-candidates.ts:88–93, line 124
Why it matters: confidence = overlap / min(|lsuf|, |rsuf|). With sample_size: 50, two columns whose 50 sampled values happen to share a suffix yield confidence = 1.0 — even if the underlying tables are mostly disjoint. The doc-comment notes "not a probability" but the field name in the JSON output is confidence, which an LLM consumer will read as "very likely correct."
Suggested fix: Rename to sample_overlap_ratio (consistent with suffix_overlap naming) and propagate to the formatter.
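A two-line illustration of why the name over-promises (helper name hypothetical, mirroring the formula from the PR description):

```typescript
// Hypothetical helper mirroring confidence = overlap / min(|lsuf|, |rsuf|).
const sampleOverlapRatio = (overlap: number, left: number, right: number): number =>
  overlap / Math.min(left, right);

// Two 50-value samples that happen to fully overlap read as 1.0,
// even when the underlying tables are mostly disjoint.
sampleOverlapRatio(50, 50, 50); // 1.0
sampleOverlapRatio(5, 50, 10);  // 0.5 — dominated by the smaller sample
```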

M6 — Sample collection is fully serial across connections, schemas, tables, columns

Reviewers: Gemini (Unique)
Category: Performance
Location: detect-join-candidates.ts:220–251
Why it matters: Every await in collectSampleBags is sequential. With defaults (50 tables × N string columns × M connections), total latency = sum of all per-column queries. For a real two-warehouse scan, this is many seconds to minutes.
Suggested fix: Parallelize at the connection level with Promise.all; consider chunked parallelism within a connection (e.g., p-limit-style) to avoid hammering a single warehouse.
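A bounded-concurrency mapper of the p-limit style could look like this (illustrative stand-in, not an existing repo helper):

```typescript
// Run fn over items with at most `limit` promises in flight at once.
async function mapLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index before awaiting
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Connections would then fan out via Promise.all, while within one connection the column sampling runs through something like mapLimit(columns, 4, sampleColumn) so a single warehouse isn't hammered.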

M7 — No bound on column count or sample-size, allowing memory/time blowups

Reviewers: Kimi (Unique, valid)
Category: Code Quality / Performance
Location: altimate-core-detect-join-candidates.ts:38–49 (Zod schema)
Why it matters: sample_size and max_tables_per_connection accept any positive integer. sample_size: 1000000 ⇒ allocate sets of 1M strings × N columns × M connections. The Promise.all parallelization (M6) makes this worse.
Suggested fix:

sample_size: z.number().int().positive().max(1000).optional()
max_tables_per_connection: z.number().int().positive().max(500).optional()

MINOR

MI1 — register.ts:634 registers handler with implicit any params

Reviewers: Claude (Unique)
Location: register.ts:634
Every other register() call in the file types its params. Add the import + type annotation:

register("altimate_core.detect_join_candidates",
  async (params: AltimateCoreDetectJoinCandidatesParams) => detectJoinCandidates(params))

MI2 — STRING_TYPE_PATTERN may miss/include real types

Reviewers: Claude, Kimi
Location: detect-join-candidates.ts:163
ClickHouse `FixedString(8)` won't match: with the `^` anchor, the `string` alternative never fires mid-name. Snowflake `STRING` is covered. Postgres `xml` is not in the pattern (good — it's not text-like in this sense). Worth a unit test for the type filter to pin behavior.
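A sketch of the pitfall. The pattern below is reconstructed from the type list in the PR description (varchar/text/char/string/uuid/json); the actual STRING_TYPE_PATTERN may differ.

```typescript
// Reconstructed from the PR description — the real STRING_TYPE_PATTERN may differ.
const STRING_TYPE_PATTERN = /^(varchar|text|char|string|uuid|json)/i;

STRING_TYPE_PATTERN.test("VARCHAR(255)");   // true — column gets sampled
STRING_TYPE_PATTERN.test("FixedString(8)"); // false — skipped despite being text-like
```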

MI3 — No exclusion of obviously-non-key string columns

Reviewers: Kimi (Unique)
Location: detect-join-candidates.ts:234–244
Wide tables expose description, notes, email, address etc. — all string-typed but not realistic join keys. Suggested skip pattern:

const SKIP_COLUMN_PATTERNS = /^(description|notes?|comments?|email|phone|address|created_at|updated_at|deleted_at)$/i

MI4 — No deduplication of overlapping candidates within a connection pair

Reviewers: Kimi (Unique)
Location: detect-join-candidates.ts:100–145
If both DBs have id1 and id2 columns, you get N×M candidates with identical or near-identical suffix sets. Either dedupe by suffix-set hash or document the behavior.

MI5 — table field encodes schema.table as a single string

Reviewers: Kimi (Unique)
Location: detect-join-candidates.ts:244
Building `bag.table` as `${schema}.${t.name}` forces consumers to parse. Add an explicit `schema` field to `ColumnSampleBag` and `JoinCandidate`. Low-impact but improves the public type contract.

MI6 — formatCandidates may produce thousands of lines on real workloads

Reviewers: Gemini (Unique)
Location: altimate-core-detect-join-candidates.ts:105–115
Truncate human-readable output to top ~20–30 candidates; keep the full count in metadata.candidate_count.

MI7 — Test stub references ALTIMATE_TELEMETRY_DISABLED but handler doesn't emit telemetry

Reviewers: Claude (Unique)
Location: test file lines 30–36
The beforeEach sets ALTIMATE_TELEMETRY_DISABLED=true but detectJoinCandidates doesn't call Telemetry.track. Either wire telemetry in (precedent: register.ts:393–406 for sql.execute) or remove the dead env-var setup.

MI8 — tablesScanned cap doesn't surface truncation

Reviewers: Claude (Unique)
Location: detect-join-candidates.ts:226–247
When max_tables_per_connection fires mid-scan, the response has no field telling the caller "we hit the cap." Add truncated: boolean or tables_skipped: number to bags_scanned-adjacent metadata.
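A possible shape for the missing signal (field names are suggestions, not the current API):

```typescript
interface ScanMeta {
  tables_scanned: number
  tables_skipped: number
  truncated: boolean
}

// Derive per-connection truncation metadata from the table count and the cap.
function scanMeta(totalTables: number, maxTablesPerConnection: number): ScanMeta {
  const scanned = Math.min(totalTables, maxTablesPerConnection)
  return {
    tables_scanned: scanned,
    tables_skipped: totalTables - scanned,
    truncated: totalTables > maxTablesPerConnection,
  }
}
```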


NIT

  • N1 (Claude, Kimi): The doc comment at detect-join-candidates.ts:17 references "dab_bench's preindexer" — internal codename leaks into a public-looking source file. Either inline the algorithm summary or remove the breadcrumb.
  • N2 (Claude): The Unicode separator (↔) in the formatter output may render badly in some CI logs. Consider ASCII <->.
  • N3 (Kimi): Double-wrapped error handling — both the native handler and the tool wrapper catch and stringify exceptions; can produce nested Failed: ... prefixes.

Positive observations (Consensus)

  • Pure-vs-IO split is excellent. commonPrefix / stripPrefixSet / detectJoinCandidatesFromBags are side-effect-free and exported. Praised by every reviewer.
  • Test architecture is genuinely good. Three layers (pure unit, SQLite-backed integration, tool surface) plus dispatcher-throws path. ~21 tests for ~317 LOC of new logic.
  • altimate_change markers correctly applied in every modified upstream-shared file.
  • Zod validation at the tool boundary plus a defensive re-check in the native handler — belt-and-braces, appropriate for an LLM-callable surface.
  • Type contract is registered in BridgeMethods and reuses the generic AltimateCoreResult envelope. No schema sprawl.
  • Graceful degradation on per-connection failures (when they actually surface) is the right shape for a multi-warehouse tool.

Missing tests / edge cases (Consensus)

  1. Non-ANSI dialect path. No test issues SQL the way an MSSQL/MySQL/Oracle driver would receive it. Add at least one test that proves the SQL is portable, or accept the C1 fix and prove driver-side limit handling works.
  2. max_tables_per_connection cap actually applied. The cap can be removed without any test failing.
  3. schema_name filter. Branch at lines 223–225 is uncovered.
  4. Permission-gating (if M1 is accepted).
  5. success: false envelope at the tool layer (M4).
  6. String-type filter. Insert an INTEGER amount column (the SQLite fixture already has one at line 239) and assert it's not in the bags.
  7. Empty-suffix / whitespace-only values (Kimi).
  8. Case sensitivity — e.g. BusinessID_ vs businessid_ (Kimi).
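Edge cases 7 and 8 can be pinned directly against the `stripPrefixSet` shown in the diff later in this thread (reproduced below from detect-join-candidates.ts:150–156):

```typescript
// Reproduced from the PR diff; behavior as currently merged.
function stripPrefixSet(values: readonly string[], prefix: string): Set<string> {
  const out = new Set<string>()
  for (const v of values) {
    if (typeof v === "string" && v.startsWith(prefix)) {
      const suf = v.slice(prefix.length)
      if (suf.length > 0) out.add(suf)
    }
  }
  return out
}

// Edge case 7: a whitespace-only suffix currently counts as real,
// so "cust_ " contributes " " to the set.
stripPrefixSet(["cust_ ", "cust_42"], "cust_") // Set { " ", "42" }

// Edge case 8: String.startsWith is case-sensitive, so BusinessID_ values
// never match a lowercased businessid_ prefix.
stripPrefixSet(["BusinessID_7"], "businessid_") // Set {}
```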

Attribution

| Finding | Severity | Reviewers |
| --- | --- | --- |
| C1 SQL portability (LIMIT + quoting) | CRITICAL | Claude, GPT, Gemini, Kimi |
| M1 No permission gating | MAJOR | GPT |
| M2 `["public"]` schema fallback | MAJOR | Claude, GPT, Gemini |
| M3 Silent per-table errors | MAJOR | Claude, GPT |
| M4 `success: false` mishandled | MAJOR | GPT |
| M5 `confidence` misleading | MAJOR | Claude |
| M6 Serial sample collection | MAJOR | Gemini |
| M7 No upper bounds on params | MAJOR | Kimi |
| MI1 Implicit `any` register | MINOR | Claude |
| MI2 String-type pattern coverage | MINOR | Claude, Kimi |
| MI3 No column exclusion list | MINOR | Kimi |
| MI4 No candidate dedup | MINOR | Kimi |
| MI5 schema/table compound key | MINOR | Kimi |
| MI6 Output truncation | MINOR | Gemini |
| MI7 Dead telemetry env-var | MINOR | Claude |
| MI8 Cap-truncation not signaled | MINOR | Claude |
| N1 dab_bench codename leak | NIT | Claude, Kimi |
| N2 Unicode separator | NIT | Claude |
| N3 Double-wrapped errors | NIT | Kimi |

Disagreements

  • Kimi's overall verdict was "conditional yes — fix #1, #4, #8 before merge" while Claude / GPT / Gemini lean "request changes" (with C1 as blocker). The disagreement reduces to whether SQL portability is a release blocker vs. a follow-up: GPT and Claude argue it ships visibly broken on multiple supported warehouses (silent zero-result), so blocker. Kimi assumed dialect parity would be sorted post-merge.
  • Performance framing differs: Gemini emphasizes the serial-I/O latency (M6); Kimi emphasizes algorithmic worst-case in commonPrefix (long values); Claude emphasizes the unbounded bag count (M7-related). All are valid; the bounded-fix is M7's Zod cap, the latency fix is M6.
  • commonPrefix long-string defensiveness (Kimi suggested a 100-char early-exit) vs. leave as-is (Claude's view): the values are sample-bounded by sample_size <= 50 by default, so the worst case is small. Apply only if M7 doesn't add the upper bound.

Footer

Reviewed by 8 participants: Claude + GPT 5.4 Codex + Gemini 3.1 Pro + Kimi K2.5 + MiniMax M2.7 + GLM-5.1 + Qwen 3.6 + MiMo V2 Pro.

Active responses: 4/8 — Claude, GPT (full), Gemini (partial — Gemini API quota exhaustion mid-run, but produced a coherent review), Kimi (full).

Failed/timed-out: 4/8 — GLM-5.1, Qwen 3.6, MiniMax M2.7, MiMo V2 Pro all blocked by a kilo SQLite database lock (a separate consensus run on PR #762 held the DB lock; running these models sequentially still hit the contention plus per-process startup races). The synthesis above represents the consensus of the 4 active reviewers.

This synthesis is single-pass (no convergence rounds) per the user's instruction.

Fixes the CRITICAL and MAJOR issues raised in the multi-model consensus
review on PR #761.

CRITICAL — SQL portability:
  - Reuse `quoteIdentForDialect` from `data-diff.ts` (now exported) so
    identifier quoting matches the per-dialect convention: backticks on
    MySQL/MariaDB/ClickHouse, square brackets on T-SQL/Fabric, ANSI
    double-quotes elsewhere.
  - Drop the hardcoded `LIMIT N` clause and pass the cap through
    `connector.execute(sql, sampleSize)` so each driver applies its
    native limit syntax (`LIMIT`, `TOP`, `FETCH FIRST`, ...).
  - Extract `buildSampleSql` as a pure helper so tests can snapshot the
    emitted SQL per dialect without going through I/O.
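A sketch of the shape the bullets above describe — the real `quoteIdentForDialect` lives in `data-diff.ts` and its dialect keys may differ; `buildSampleSql` here is reconstructed from the commit message, not copied from the branch:

```typescript
type Dialect = "mysql" | "mariadb" | "clickhouse" | "mssql" | "fabric" | "postgres" | string

// Backticks on MySQL/MariaDB/ClickHouse, square brackets on T-SQL/Fabric,
// ANSI double quotes elsewhere; each escapes its own delimiter by doubling.
function quoteIdentForDialect(ident: string, dialect: Dialect): string {
  switch (dialect) {
    case "mysql":
    case "mariadb":
    case "clickhouse":
      return "`" + ident.replace(/`/g, "``") + "`"
    case "mssql":
    case "fabric":
      return "[" + ident.replace(/]/g, "]]") + "]"
    default:
      return '"' + ident.replace(/"/g, '""') + '"'
  }
}

// No inlined LIMIT: the row cap travels separately to connector.execute,
// which applies the driver-native syntax (LIMIT / TOP / FETCH FIRST, ...).
function buildSampleSql(table: string, column: string, dialect: Dialect): string {
  const t = quoteIdentForDialect(table, dialect)
  const c = quoteIdentForDialect(column, dialect)
  return `SELECT DISTINCT ${c} FROM ${t} WHERE ${c} IS NOT NULL`
}
```

Keeping both functions pure is what makes the per-dialect snapshot tests possible without any warehouse I/O.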

MAJOR fixes:
  - Permission gating: tool wrapper now requests `sql_execute_read` via
    `ctx.ask()` before issuing any SELECT, matching `data_diff` /
    `sql_execute`.
  - Drop the unsafe `["public"]` schema fallback. When `listSchemas()`
    fails and no `schema_name` is provided, record the failure as a
    connection-level error and skip the connection rather than silently
    scanning the wrong schema.
  - Surface per-table / per-column sampling failures via a bounded
    `partial_errors: Record<string, string[]>` field instead of
    swallowing them. Format includes errors in the human-readable output.
  - Tool wrapper now returns a `FAILED` envelope when the dispatcher
    returns `{ success: false, error }` (e.g. on the connection-count
    guard) instead of rendering "0 found".
  - Rename `confidence` → `match_score` with an updated JSDoc that calls
    out it's a heuristic ranking signal (`overlap / min(|left|, |right|)`),
    not a probability.
  - Parallelize `collectSampleBags` at the connection level via
    `Promise.all` — connections are independent and the per-connection
    cap keeps blast radius bounded.
  - Add Zod upper bounds: `connections` <= 16, `sample_size` <= 1000,
    `max_tables_per_connection` <= 500. Prevents an oversized LLM call
    from blowing memory or holding warehouse connections indefinitely.
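The branch uses Zod `.max()` for these caps; a dependency-free sketch of the same bounds (the `sample_size` default of 50 comes from the PR description; the `max_tables_per_connection` default here is a placeholder):

```typescript
interface DetectJoinCandidatesParams {
  connections: string[]
  sample_size?: number
  max_tables_per_connection?: number
}

// Mirrors the Zod caps: connections <= 16, sample_size <= 1000,
// max_tables_per_connection <= 500.
function checkBounds(p: DetectJoinCandidatesParams) {
  if (p.connections.length < 1 || p.connections.length > 16)
    throw new Error("connections must contain 1-16 entries")
  const sample = p.sample_size ?? 50
  if (sample < 1 || sample > 1000) throw new Error("sample_size must be in [1, 1000]")
  const maxTables = p.max_tables_per_connection ?? 500 // placeholder default
  if (maxTables < 1 || maxTables > 500)
    throw new Error("max_tables_per_connection must be in [1, 500]")
  return { connections: p.connections, sample_size: sample, max_tables_per_connection: maxTables }
}
```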

Minor fix:
  - Type the dispatcher registration with
    `AltimateCoreDetectJoinCandidatesParams` / `AltimateCoreResult` so
    the handler matches the rest of the file.

Tests:
  - 14 new tests covering dialect-aware SQL emission (MySQL, T-SQL,
    Fabric, Postgres, Snowflake, generic), absence of inlined LIMIT,
    delimiter escaping, the `success: false` envelope path, the
    permission-gating call, the Zod upper-bound rejections, the
    `partial_errors` collection, and the no-fallback `listSchemas`
    behaviour. Updates existing tests for the `confidence` → `match_score`
    rename. Final tally: 35 passing tests, 0 failing.

Defers (per consensus-review tags): all MINOR/NIT findings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.


@sahrizvi
Author

Update: addressed-vs-pending from the consensus review

Pushed 80758f5c2 on top of the original branch.

Addressed (this commit)

| ID | Severity | Issue | Resolution |
| --- | --- | --- | --- |
| C1 | CRITICAL | SQL portability — LIMIT + ANSI quoting break MySQL/SQL Server/Oracle/Databricks/BigQuery | Reuse exported `quoteIdentForDialect` from `data-diff.ts` (backticks/brackets/ANSI per dialect); drop inlined `LIMIT N`; delegate the row cap to `connector.execute(sql, sampleSize)` so each driver applies its native limit syntax. Extracted `buildSampleSql` as a pure helper. |
| M1 | MAJOR | Permission gating bypass | Tool wrapper now calls `ctx.ask({permission: "sql_execute_read", ...})` before any SELECT, mirroring `data_diff` / `sql_execute`. |
| M2 | MAJOR | `["public"]` schema fallback is unsafe | Removed `safeListSchemas` → `["public"]` fallback. `listSchemas` failure now surfaces as a connection-level error and the connection is skipped. |
| M3 | MAJOR | Silent per-table errors | Per-table / per-column sampling failures collected in bounded `partial_errors: Record<string, string[]>` (cap 20/connection); surfaced in tool output and metadata. |
| M4 | MAJOR | `success: false` mishandling | Tool wrapper returns a `title: "Join candidates: FAILED"` envelope when the dispatcher returns `{success: false, error}`. |
| M5 | MAJOR | Misleading `confidence` field name | Renamed to `match_score`; JSDoc updated to call out it's a heuristic ranking signal, not a probability. Note this is a pre-merge rename — no aliased back-compat field. |
| M6 | MAJOR | Serial I/O across connections | `collectSampleBags` parallelizes at the connection level via `Promise.all`. Per-connection cap still bounds blast radius. |
| M7 | MAJOR | No Zod upper bounds | Added `.max()` caps: `connections` ≤ 16, `sample_size` ≤ 1000, `max_tables_per_connection` ≤ 500. |
| MI1 | MINOR (drive-by) | Untyped dispatcher registration | Typed with `AltimateCoreDetectJoinCandidatesParams` / `AltimateCoreResult`. |

Tests: 35 pass / 0 fail in the targeted file (was 21; +14 new for dialect SQL emission across MySQL/T-SQL/Fabric/Postgres/Snowflake/generic, no-LIMIT inlining, delimiter escaping, permission gating, success: false envelope, partial_errors collection, no-public fallback, Zod upper-bound rejection). Sibling regression: 2,886 pass. Typecheck clean.

Deferred (per consensus tags)

All MINOR / NIT findings — MI2 (string-type pattern coverage), MI3 (column-name skip list), MI4 (candidate dedup), MI5 (compound schema.table field), MI6 (output truncation), MI7 (dead telemetry env-var), MI8 (cap-truncation signal), N1 (codename leak), N2 (unicode), N3 (double-wrap). Also deferred CodeRabbit's index.ts barrel cleanup since it wasn't in the consensus list.

Demoted as invalid: none from this PR (no Kimi-style misreads).

Heads-up

quoteIdentForDialect is now exported from data-diff.ts. Inside the package it's only consumed by detect-join-candidates.ts today. Tool API changes from this round:

  • `confidence` → `match_score` (rename, not aliased — pre-merge so no installed users yet)
  • New partial_errors field on the result envelope
  • First invocation triggers a sql_execute_read permission prompt; non-interactive harnesses must auto-approve


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opencode/src/altimate/native/connections/data-diff.ts (1)

42-56: ⚠️ Potential issue | 🟠 Major

Add an explicit BigQuery branch here.

detect-join-candidates now reuses this helper for sampling SQL, but the fallback branch still emits ANSI-style "identifier" quoting. BigQuery does not use that quoting mode for identifiers, so the new feature will still generate invalid SQL there whenever a table/column name needs quoting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts` around lines
42 - 56, The fallback ANSI double-quote branch emits invalid identifier quoting
for BigQuery; update quoteIdentForDialect to add an explicit "bigquery" case
that quotes identifiers with backticks (same style as the
mysql/mariadb/clickhouse branch) and properly escapes any backticks inside the
identifier (e.g., identifier.replace(/`/g, "``")), so detect-join-candidates
generates valid BigQuery SQL; modify the switch in quoteIdentForDialect to
include case "bigquery": return `\`${identifier.replace(/`/g, "``")}\``.
🧹 Nitpick comments (1)
packages/opencode/src/altimate/native/connections/detect-join-candidates.ts (1)

279-293: Surface when the table cap truncates a scan.

Once tablesScanned hits max_tables_per_connection, sampling just stops and the response still looks complete. Returning per-connection metadata like truncated, tables_scanned, or tables_skipped would make zero/low-result scans much easier to interpret.

Also applies to: 388-395

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/connections/detect-join-candidates.ts`
around lines 279 - 293, The scanning loop that uses tablesScanned and maxTables
(in detect-join-candidates.ts around the for (const schema of schemas) loop and
the similar loop at the later block) currently stops silently when tablesScanned
>= maxTables; update the per-connection result object returned by this code to
include metadata fields such as truncated (boolean), tables_scanned (number),
and tables_skipped (number) and set truncated = true whenever you break out due
to the cap; increment tables_scanned as you already do, compute tables_skipped
as the remaining tables not scanned in this connector/schema, and ensure the
same changes are applied to the second loop (the block around lines 388-395) and
any place that returns connector scan results so callers can see when scans were
truncated.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 245f5ad4-089b-4383-b8e8-5300dc34628e

📥 Commits

Reviewing files that changed from the base of the PR and between d16a92f and 80758f5.

📒 Files selected for processing (5)
  • packages/opencode/src/altimate/native/connections/data-diff.ts
  • packages/opencode/src/altimate/native/connections/detect-join-candidates.ts
  • packages/opencode/src/altimate/native/connections/register.ts
  • packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts
  • packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/opencode/src/altimate/tools/altimate-core-detect-join-candidates.ts
  • packages/opencode/test/altimate/tools/altimate-core-detect-join-candidates.test.ts

Comment on lines +150 to +156
function stripPrefixSet(values: readonly string[], prefix: string): Set<string> {
const out = new Set<string>()
for (const v of values) {
if (typeof v === "string" && v.startsWith(prefix)) {
const suf = v.slice(prefix.length)
if (suf.length > 0) out.add(suf)
}

⚠️ Potential issue | 🟡 Minor

Ignore whitespace-only suffixes before scoring.

stripPrefixSet() currently treats " " / "\t" as real suffixes because it only checks length > 0. That can produce suffix_overlap > 0 for columns that don't actually contain a usable join token.

Suggested fix
 function stripPrefixSet(values: readonly string[], prefix: string): Set<string> {
   const out = new Set<string>()
   for (const v of values) {
     if (typeof v === "string" && v.startsWith(prefix)) {
       const suf = v.slice(prefix.length)
-      if (suf.length > 0) out.add(suf)
+      if (suf.trim().length > 0) out.add(suf)
     }
   }
   return out
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/connections/detect-join-candidates.ts`
around lines 150 - 156, stripPrefixSet currently adds suffixes that are only
whitespace because it checks suf.length > 0; update stripPrefixSet(values,
prefix) to trim whitespace before deciding and storing: compute suf =
v.slice(prefix.length), then let trimmed = suf.trim() and only add trimmed to
the Set if trimmed.length > 0 (store trimmed, not the raw suf). This ensures
whitespace-only suffixes like " " or "\t" are ignored when building the returned
Set.

@dev-punia-altimate

✅ Tests — All Passed

TypeScript — passed

cc @sahrizvi


Successfully merging this pull request may close these issues.

[feat] Cross-DB join key inference (prefix/suffix overlap)
