
Support Oracle-style SQL block splitting #325

Merged
debba merged 3 commits into TabularisDB:main from haos666:haos666/oracle-dm-sql-splitter on Jun 29, 2026



@haos666 haos666 commented Jun 16, 2026

Contributor

Summary

Follow-up to #316 for the Oracle/DM splitter work discussed with @ymadd.

This keeps the behavior additive and dialect-gated behind the existing oracle SQL dialect:

  • enable line-leading / as an Oracle-style statement terminator
  • fold PL/SQL-style source units until the next slash-terminated segment instead of splitting on internal semicolons
  • support Oracle q-quoting / nq-quoting so ; and / inside quoted literals do not split statements
  • include DM-specific block openers: CREATE CLASS, CREATE CLASS BODY, and CREATE JAVA CLASS
  • keep other dialect presets structurally unchanged by leaving the new flags disabled outside oracle
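To make the line-leading / rule concrete, here is a minimal TypeScript sketch of the check a tokenizer could run when it meets a / outside strings and comments. It assumes a hypothetical SplitterFlags shape (only the slashDelimiter name comes from this PR) and a conservative reading of "line-leading": the / must be the only non-blank character on its line, with LF or CRLF line endings. The real tokenizer.ts rule may be looser or stricter.

```typescript
// Hypothetical flag shape for this sketch; only the `slashDelimiter`
// name comes from the PR.
interface SplitterFlags {
  readonly slashDelimiter: boolean;
}

// True when the "/" at `pos` is the only non-blank character on its line.
// Spaces/tabs around it are tolerated, and the line may end with LF, CRLF,
// or end of input.
function isLineLeadingSlash(sql: string, pos: number, flags: SplitterFlags): boolean {
  if (!flags.slashDelimiter || sql.charAt(pos) !== "/") return false;

  // Look back to the start of the line: only spaces/tabs may precede "/".
  for (let i = pos - 1; i >= 0; i--) {
    const ch = sql.charAt(i);
    if (ch === "\n" || ch === "\r") break;
    if (ch !== " " && ch !== "\t") return false;
  }

  // Look ahead to the end of the line: only spaces/tabs may follow "/".
  for (let i = pos + 1; i < sql.length; i++) {
    const ch = sql.charAt(i);
    if (ch === "\n" || ch === "\r") break;
    if (ch !== " " && ch !== "\t") return false;
  }
  return true;
}
```

Under this rule the / at index 6 of "END;\r\n/\r\nSELECT 1" is a delimiter, while the division operator in SELECT 4 / 2 is not, and nothing is a delimiter when the flag is off (as for the non-oracle presets).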

This PR does not include trigger RPC forwarding (#321), batch forwarding, SQL import/export, or BLOB hooks.

Validation

Automated checks:

  • pnpm exec vitest run tests/utils/sqlSplitter
  • pnpm exec vitest run tests/utils/sqlSplitter/dialects.test.ts tests/utils/sqlSplitter/splitter.test.ts tests/utils/sqlSplitter/tokenizer.test.ts
  • pnpm exec vitest run (130 files / 2588 tests)
  • pnpm run typecheck
  • pnpm run lint
  • pnpm run build
  • pnpm run test:rust (671 Rust tests)
  • git diff --check

Local DM/UI smoke on the DM external plugin:

  • CREATE OR REPLACE TRIGGER ... BEGIN ... END; / plus a following SELECT is detected as 2 statements, not split at internal semicolons.
  • q-quoted text containing both ; and a line-leading / stays inside one statement.
  • simple slash-separated SQL statements split as expected under the oracle dialect.


kilo-code-bot Bot commented Jun 16, 2026

Contributor

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Previous Issues Resolved

All 7 issues from the previous review have been addressed:

| # | File | Issue | Resolution |
|---|------|-------|------------|
| 1 | src/utils/sqlSplitter/splitter.ts | CREATE TYPE (spec) and CREATE LIBRARY incorrectly treated as block openers | TYPE now only opens on BODY; LIBRARY returns false |
| 2 | src/utils/sqlSplitter/splitter.ts | Missing NOFORCE in Oracle CREATE JAVA grammar | Added NOFORCE consumption step |
| 3 | src/utils/sqlSplitter/splitter.ts | sql.slice(segment.start, segment.end) copied the entire segment into a temporary string | leadingSignificantTokens now takes bounds params, avoiding the slice |
| 4 | src/utils/sqlSplitter/splitter.ts | EOF-folded span ended at endSegment.end, dropping the final ; of END; and any trailing whitespace | foldedToEof flag now uses sql.length when no slash delimiter is found |
| 5 | src/utils/sqlSplitter/tokenizer.ts | Q-quote delimiter allowed whitespace (invalid per Oracle) | Added isQQuoteWhitespaceDelimiter guard |
| 6 | src/utils/sqlSplitter/index.ts | Option flag slashTerminator mismatched token kind slashDelimiter | Renamed to slashDelimiter throughout |
| 7 | tests/utils/sqlSplitter/splitter.test.ts | EOF test did not assert that the trailing END; was preserved | Added expect(result[0]).toContain('END;') |

Files Reviewed (5 files)
  • src/utils/sqlSplitter/index.ts: renamed slashTerminator → slashDelimiter in the interface and all 6 dialect presets
  • src/utils/sqlSplitter/splitter.ts — Fixed EOF folding bounds; added NOFORCE parsing; corrected TYPE/LIBRARY block-opener logic; refactored leadingSignificantTokens to avoid string slicing; extracted MAX_LEADING_TOKENS and WORD_RE constants
  • src/utils/sqlSplitter/tokenizer.ts — Replaced slashTerminator usage; added whitespace rejection for q-quote delimiters
  • tests/utils/sqlSplitter/splitter.test.ts — Added tests for CRLF slash, NOFORCE JAVA SOURCE, TYPE spec/LIBRARY exclusion, TYPE BODY, WITH PROCEDURE, and END; EOF assertion
  • tests/utils/sqlSplitter/tokenizer.test.ts — Added tests for whitespace q-quote rejection and single-quote q-quote delimiter
Previous Review Summary (commit 6ee4d24)

Current summary above is authoritative. Previous snapshots are kept for context only.


Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)
  • src/utils/sqlSplitter/index.ts — Adds Oracle dialect flags (slashTerminator, plsqlBlocks, qQuoting) to DialectOptions
  • src/utils/sqlSplitter/splitter.ts — Implements PL/SQL block folding until slash terminators; robust opener detection for CREATE, BEGIN, DECLARE, WITH FUNCTION/PROCEDURE
  • src/utils/sqlSplitter/tokenizer.ts — Adds slashDelimiter and Q-quoted string scanning (q'…', nq'…')
  • tests/utils/sqlSplitter/splitter.test.ts — Comprehensive tests covering block folding, q-quoting, labeled blocks, inline functions, EOF handling, and SQL*Plus tradeoffs
  • tests/utils/sqlSplitter/tokenizer.test.ts — Tokenizer tests for slash delimiter line-leading logic and q-quoting boundaries

Reviewed by kimi-k2.6-20260420 · Input: 57K · Output: 17.8K · Cached: 189.4K


ymadd commented Jun 17, 2026

Contributor

Thanks @haos666 for the follow-up — happy to take a look!

Comment thread src/utils/sqlSplitter/splitter.ts Outdated
case 'PACKAGE':
case 'TYPE':
case 'CLASS':
return next === 'BODY' || next !== undefined;

Contributor


[should-fix] CREATE TYPE (spec) and CREATE LIBRARY are mis-classified as PL/SQL block openers, so foldBlocks swallows the following statement when no / is present.

return next === 'BODY' || next !== undefined reduces to next !== undefined (the first clause is dead), so any CREATE TYPE <name> is an opener; LIBRARY returns true unconditionally at L201. Object-type specs and CREATE LIBRARY carry no internal ; — the ; is their terminator — so folding them has no upside and merges the next statement. Tabularis sends statements to a driver one-by-one (no /, no trailing ;), so a merged CREATE TYPE ...; SELECT ... fails (ORA-00911) and the SELECT is lost.

The right criterion is "does it contain internal ;": PACKAGE/CLASS specs do, TYPE spec / LIBRARY don't.

case 'FUNCTION': case 'PROCEDURE': case 'TRIGGER':
  return true;
case 'LIBRARY':
  return false;                 // one-line DDL, terminated by ;
case 'PACKAGE': case 'CLASS':
  return next !== undefined;    // specs carry internal ;
case 'TYPE':
  return next === 'BODY';       // only TYPE BODY carries internal ;

A blanket next === 'BODY' for all three breaks the existing DM CREATE CLASS <name> spec test, so keep PACKAGE/CLASS as next !== undefined. Please add regressions: CREATE TYPE ...; SELECT ... -> 2 statements, CREATE LIBRARY ...; SELECT ... -> 2.
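The suggested switch can be exercised as a standalone predicate. The sketch below is an illustration, not the splitter.ts signature: it takes the upper-cased leading tokens of a segment (a hypothetical input shape), skips CREATE [OR REPLACE] [EDITIONABLE | NONEDITIONABLE], and then applies the "does it contain internal ;" rule from this comment.

```typescript
// Hypothetical helper: `tokens` are the upper-cased leading words of one
// segment (e.g. from a bounded leading-token scan).
function isCreateBlockOpener(tokens: readonly string[]): boolean {
  let i = 0;
  if (tokens[i] !== "CREATE") return false;
  i++;
  if (tokens[i] === "OR" && tokens[i + 1] === "REPLACE") i += 2;
  if (tokens[i] === "EDITIONABLE" || tokens[i] === "NONEDITIONABLE") i++;

  const kind = tokens[i];
  const next = tokens[i + 1];
  switch (kind) {
    case "FUNCTION":
    case "PROCEDURE":
    case "TRIGGER":
      return true; // bodies always contain internal ;
    case "LIBRARY":
      return false; // one-line DDL, terminated by ;
    case "PACKAGE":
    case "CLASS":
      return next !== undefined; // specs carry internal ; too
    case "TYPE":
      return next === "BODY"; // only TYPE BODY carries internal ;
    default:
      return false;
  }
}
```

With this rule CREATE TYPE point_t AS OBJECT (...) and CREATE LIBRARY ext_lib AS ... are left to the ordinary ; split, while CREATE TYPE BODY and DM's CREATE CLASS <name> still fold.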

) {
index++;
}
if (tokens[index] === 'AND' && isJavaCompileOption(tokens[index + 1])) {

Contributor


Oracle grammar is CREATE [OR REPLACE] [AND {RESOLVE|COMPILE}] [NOFORCE] JAVA {SOURCE|CLASS|RESOURCE} .... Only AND RESOLVE|COMPILE is skipped here; the optional NOFORCE that follows is not, so CREATE OR REPLACE AND RESOLVE NOFORCE JAVA SOURCE ... falls through to default and the Java body splits on its internal ;. Add if (tokens[index] === 'NOFORCE') index++; after the AND-skip. Low frequency, but a real grammar gap.
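A minimal sketch of consuming the full optional prefix, assuming the same hypothetical upper-cased token array (matchesCreateJavaPrefix is an illustrative name, not a function in the PR). It only recognizes the grammar prefix quoted above; which JAVA variants the splitter then folds is a separate decision.

```typescript
// Sketch: recognize
//   CREATE [OR REPLACE] [AND {RESOLVE|COMPILE}] [NOFORCE] JAVA {SOURCE|CLASS|RESOURCE}
// from upper-cased leading tokens.
function matchesCreateJavaPrefix(tokens: readonly string[]): boolean {
  let i = 0;
  if (tokens[i] !== "CREATE") return false;
  i++;
  if (tokens[i] === "OR" && tokens[i + 1] === "REPLACE") i += 2;
  if (tokens[i] === "AND" && (tokens[i + 1] === "RESOLVE" || tokens[i + 1] === "COMPILE")) i += 2;
  if (tokens[i] === "NOFORCE") i++; // the optional step this comment flags as missing
  if (tokens[i] !== "JAVA") return false;
  const what = tokens[i + 1];
  return what === "SOURCE" || what === "CLASS" || what === "RESOURCE";
}
```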

Comment thread src/utils/sqlSplitter/splitter.ts Outdated
}

function isPlsqlBlockOpener(sql: string, segment: RawSegment): boolean {
const tokens = leadingSignificantTokens(sql.slice(segment.start, segment.end));

Contributor


sql.slice(segment.start, segment.end) copies the whole segment (a package body can be tens of KB) even though leadingSignificantTokens only needs the opener prefix. This is avoidable GC pressure on the parse path.

I would not fix this by capping the slice to a fixed byte/char count, because a segment can start with a long license/comment block and still have a valid opener later. A safer shape is to let leadingSignificantTokens scan the original sql with start/end bounds, stopping once it has collected enough tokens.

Same function, minor: the 12 cap deserves a named const/comment, and source.slice(position) in the word loop allocates per token. A sticky /[A-Za-z_][A-Za-z0-9_$#]*/y (matching the existing DOLLAR_TAG_RE/GO_RE/SLASH_RE convention) avoids that allocation.
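One possible shape for the bounded scan, sketched under assumptions: the (sql, start, end) signature, the one-character treatment of punctuation, and the comment handling are illustrative rather than the PR's code, while MAX_LEADING_TOKENS and WORD_RE reuse the constant names the bot summary reports. The sticky flag makes exec() match only at lastIndex, so the scan reads the original string in place and allocates only the matched words.

```typescript
// Named cap instead of a bare 12 (the value mentioned in this comment).
const MAX_LEADING_TOKENS = 12;
// Sticky (y) regex: exec() only matches exactly at lastIndex, so no
// source.slice(position) is needed to anchor the match.
const WORD_RE = /[A-Za-z_][A-Za-z0-9_$#]*/y;

// Sketch: collect up to MAX_LEADING_TOKENS tokens from sql[start, end)
// without copying the segment. Whitespace, -- comments and /* */ comments
// are skipped; words are upper-cased; any other character becomes a
// one-character token (a simplification for this sketch).
function leadingSignificantTokens(sql: string, start: number, end: number): string[] {
  const tokens: string[] = [];
  let pos = start;
  while (pos < end && tokens.length < MAX_LEADING_TOKENS) {
    const ch = sql.charAt(pos);
    if (ch === " " || ch === "\t" || ch === "\r" || ch === "\n") {
      pos++;
    } else if (ch === "-" && sql.charAt(pos + 1) === "-") {
      const newline = sql.indexOf("\n", pos);
      pos = newline === -1 || newline >= end ? end : newline + 1;
    } else if (ch === "/" && sql.charAt(pos + 1) === "*") {
      const close = sql.indexOf("*/", pos + 2);
      pos = close === -1 || close + 2 > end ? end : close + 2;
    } else {
      WORD_RE.lastIndex = pos;
      const match = WORD_RE.exec(sql);
      if (match === null) {
        tokens.push(ch);
        pos++;
      } else {
        // Clip a word that runs past the segment bound.
        const word = match[0].length <= end - pos ? match[0] : match[0].slice(0, end - pos);
        tokens.push(word.toUpperCase());
        pos += word.length;
      }
    }
  }
  return tokens;
}
```

A long leading license comment costs only a scan, not a copy, and the opener after it is still found, which is the concern raised above about capping the slice by length.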

Comment thread src/utils/sqlSplitter/splitter.ts Outdated
const endSegment = segments[endIndex];
output.push({
start: segment.start,
end: endSegment.end,

Contributor


When no / is found, the merged span ends at endSegment.end, which is before the segment terminator. For a block ending exactly at EOF, END; becomes END in both Statement.text and Statement.range.

The current backend sanitization (sanitize_user_query -> trim_end_matches(';'), src-tauri/src/commands.rs:89) masks the driver impact for now because it strips that same trailing semicolon before execution anyway. The splitter should still preserve the original statement text/range; otherwise this becomes an execution bug as soon as semicolon stripping is made statement-aware.

let eofFold = false;
if (endIndex >= segments.length) {
  endIndex = segments.length - 1;
  eofFold = true;
}
// ...
end: eofFold ? sql.length : endSegment.end,
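For context, here is a self-contained sketch of the folding loop that the suggestion above plugs into. RawSegment, Span, the terminator labels and the isOpener callback are hypothetical stand-ins for the splitter.ts types; the point is the eofFold branch, which extends an unterminated block to sql.length so the closing END; survives in the statement text and range.

```typescript
// Hypothetical shapes for this sketch; splitter.ts has its own types.
type SegmentTerminator = "semicolon" | "slashDelimiter" | "eof";
interface RawSegment {
  start: number; // first character of the segment
  end: number; // exclusive; the terminator itself starts at `end`
  terminator: SegmentTerminator;
}
interface Span {
  start: number;
  end: number;
}

// Folds an opener segment together with every following segment up to the
// next slash delimiter. When no slash follows, the span runs to sql.length
// so the final "END;" (and anything after it) is preserved.
function foldBlocks(
  sql: string,
  segments: readonly RawSegment[],
  isOpener: (segment: RawSegment) => boolean,
): Span[] {
  const output: Span[] = [];
  let index = 0;
  while (index < segments.length) {
    const segment = segments[index];
    if (segment === undefined) break;
    if (!isOpener(segment)) {
      output.push({ start: segment.start, end: segment.end });
      index++;
      continue;
    }
    let endIndex = index;
    while (endIndex < segments.length && segments[endIndex]?.terminator !== "slashDelimiter") {
      endIndex++;
    }
    let eofFold = false;
    if (endIndex >= segments.length) {
      endIndex = segments.length - 1;
      eofFold = true;
    }
    const endSegment = segments[endIndex];
    // Slash-terminated: stop before the "/". EOF: keep everything.
    const end = eofFold || endSegment === undefined ? sql.length : endSegment.end;
    output.push({ start: segment.start, end });
    index = endIndex + 1;
  }
  return output;
}
```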

const openCodePoint = source.codePointAt(delimiterPosition);
if (openCodePoint === undefined) return null;

const openDelimiter = String.fromCodePoint(openCodePoint);

Contributor


Per Oracle, a q-quote delimiter cannot be a space, tab, or newline; these are accepted here, so q' ...;... ' would be mis-scanned as a quoted literal. Low impact (only over-shields malformed input), but a guard rejecting whitespace delimiters would match the spec — note a single quote is a valid delimiter, so don't reject '. The UTF-16/codePoint handling and the identifier-boundary guard here look correct.
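A minimal sketch of the q-quote body scan with the whitespace guard, assuming the caller has already matched the q/Q or nq/NQ prefix and its identifier boundary (scanQQuote and Q_QUOTE_CLOSERS are illustrative names). Bracket delimiters close with their partner, any other delimiter closes with itself, and a single quote is accepted as a delimiter, as noted above.

```typescript
// Closing delimiters for Oracle's bracket-style q-quote delimiters; any other
// delimiter character closes with itself.
const Q_QUOTE_CLOSERS = new Map<string, string>([
  ["[", "]"],
  ["{", "}"],
  ["<", ">"],
  ["(", ")"],
]);

// Sketch: scan a q-quoted literal whose opening quote (the ' after q/Q or
// nq/NQ) is at `quotePos`. Returns the index just past the closing quote, or
// null if this is not a complete, valid q-quote. The identifier-boundary
// check on the q/nq prefix is left to the caller.
function scanQQuote(source: string, quotePos: number): number | null {
  if (source.charAt(quotePos) !== "'") return null;

  const delimiterPosition = quotePos + 1;
  const openCodePoint = source.codePointAt(delimiterPosition);
  if (openCodePoint === undefined) return null;
  // May be two UTF-16 code units (a surrogate pair), e.g. an emoji.
  const openDelimiter = String.fromCodePoint(openCodePoint);

  // Oracle forbids space, tab and line breaks as the delimiter; ' is allowed.
  if (openDelimiter === " " || openDelimiter === "\t" || openDelimiter === "\n" || openDelimiter === "\r") {
    return null;
  }

  const closing = (Q_QUOTE_CLOSERS.get(openDelimiter) ?? openDelimiter) + "'";
  const bodyStart = delimiterPosition + openDelimiter.length;
  const closeAt = source.indexOf(closing, bodyStart);
  return closeAt === -1 ? null : closeAt + closing.length;
}
```

For example, scanQQuote("q'[a;\n/ b]' x", 1) returns 11, so both the ; and the line-leading / stay inside the literal, while "q' a;b '" is rejected.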

Comment thread src/utils/sqlSplitter/index.ts Outdated
readonly dollarQuoting: boolean;
readonly customDelimiter: boolean;
readonly goDelimiter: boolean;
readonly slashTerminator: boolean;

Contributor


Nit: the option flag is slashTerminator but the token kind / SegmentTerminator value is slashDelimiter. The pre-existing goDelimiter uses one word for both the flag and the token. Aligning to slashDelimiter (flag + the tokenizer.ts guard) keeps the grep trail consistent.

);
const result = splitQueries(sql, 'oracle');
expect(result).toHaveLength(1);
expect(result[0]).toContain('NULL;');

Contributor


This asserts NULL; but not the trailing END;, so the EOF semicolon loss (see the foldBlocks comment) passes unnoticed. Add expect(result[0]).toContain('END;'). Also worth adding: CREATE TYPE ...; SELECT ... and CREATE LIBRARY ...; SELECT ... -> length 2 (opener fix), CREATE TYPE BODY ... / -> length 1, WITH PROCEDURE ..., and a CRLF / separator case.


ymadd commented Jun 22, 2026

Contributor

Thanks @haos666 — clean, well-tested change, and the dialect gating looks solid: the new slashTerminator/plsqlBlocks/qQuoting flags are all false for the other presets, so this should keep non-oracle splitting behavior isolated from the new Oracle path. The q-quote scanner (UTF-16/codePoint delimiter, identifier-boundary guard, N'...' not mis-parsed as q-quoting, nq'...'), the WITH FUNCTION-vs-CTE disambiguation, and label handling all check out.

I went through this carefully and checked the Oracle semantics against the Oracle SQL reference and the node-oracledb docs. The one correctness item I'd like resolved before merge is inline: isCreateBlockOpener treats CREATE TYPE specs and CREATE LIBRARY as block openers, so they swallow the next statement when no / follows. Heads-up that the fix revises our #316 opener list: CREATE TYPE/LIBRARY were included there based on SQL*Plus / semantics, but Tabularis sends statements to a driver one by one, where / is not part of the statement and statement-delimiter semicolons are not sent. For this splitter path, "does the construct contain an internal ;?" is the safer test. Happy to discuss.

Out of scope of this PR's files, but they block PL/SQL execution end-to-end through the UI (NOT regressions from this PR — flagging to track separately, probably as their own issues):

  1. sanitize_user_query (src-tauri/src/commands.rs:89) does trim_end_matches(';') on every statement before it reaches the driver (execute_query / execute_query_batch). That strips the structural ; in END;, so BEGIN ... END; arrives as BEGIN ... END and can hit PLS-00103. This also masks the EOF ; preservation issue today, and means even a correctly-split trigger/procedure won't execute from the editor until semicolon stripping is made statement/dialect-aware.
  2. runQuery / runMultipleQueries run extractQueryParams on each statement (src/pages/Editor.tsx; regex in src/utils/queryParameters.ts:21). It matches :OLD / :NEW, so a trigger body (exactly the fixture added here) is read as having bind parameters — either a missing-params modal appears, or the pseudorecords can be interpolated and corrupt the DDL.

The MCP path (src-tauri/src/mcp/mod.rs:1090) calls the driver directly without sanitize_user_query, so it behaves differently from the editor path. I couldn't test against a live Oracle/DM instance, so these two out-of-scope notes are from reading the code plus the Oracle/node-oracledb docs — would be good to confirm on DM8.
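For the second out-of-scope item, a hypothetical sketch of parameter extraction that tolerates trigger pseudorecords. It does not reproduce the regex in src/utils/queryParameters.ts; extractBindNames, the trigger-detection regex and the set of skipped constructs are all assumptions for illustration. It ignores :OLD, :NEW and :PARENT only inside CREATE TRIGGER statements, so a plain query can still use :new as an ordinary parameter.

```typescript
// Hypothetical sketch, not the regex in src/utils/queryParameters.ts.
// Trigger pseudorecords that must not be treated as bind parameters.
const PSEUDORECORDS = new Set(["OLD", "NEW", "PARENT"]);
const TRIGGER_RE = /^\s*CREATE\s+(?:OR\s+REPLACE\s+)?(?:(?:NON)?EDITIONABLE\s+)?TRIGGER\b/i;
const BIND_NAME_RE = /[A-Za-z_][A-Za-z0-9_]*/y;

// Returns distinct :name placeholders, skipping '...' literals (with ''
// escapes), -- comments, "::" casts and := assignments. Inside CREATE
// TRIGGER it also skips :OLD/:NEW/:PARENT. REFERENCING aliases are not
// handled.
function extractBindNames(sql: string): string[] {
  const inTrigger = TRIGGER_RE.test(sql);
  const names: string[] = [];
  let i = 0;
  while (i < sql.length) {
    const ch = sql.charAt(i);
    if (ch === "'") {
      // Skip a string literal; '' inside it is an escaped quote.
      i++;
      while (i < sql.length && !(sql.charAt(i) === "'" && sql.charAt(i + 1) !== "'")) {
        i += sql.charAt(i) === "'" ? 2 : 1;
      }
      i++; // past the closing quote
    } else if (ch === "-" && sql.charAt(i + 1) === "-") {
      const newline = sql.indexOf("\n", i);
      i = newline === -1 ? sql.length : newline + 1;
    } else if (ch === ":" && sql.charAt(i - 1) !== ":") {
      BIND_NAME_RE.lastIndex = i + 1;
      const match = BIND_NAME_RE.exec(sql);
      if (match === null) {
        i++; // ":=", "::" and similar
        continue;
      }
      const name = match[0];
      const isPseudorecord = inTrigger && PSEUDORECORDS.has(name.toUpperCase());
      if (!isPseudorecord && !names.includes(name)) names.push(name);
      i += 1 + name.length;
    } else {
      i++;
    }
  }
  return names;
}
```

A trigger that renames its pseudorecords (REFERENCING NEW AS n) would still report :n here, so a real fix would need to read that clause as well.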


haos666 commented Jun 23, 2026

Contributor Author

Thanks for the detailed review. I pushed an update addressing the requested splitter fixes:

  • CREATE TYPE specs and CREATE LIBRARY now split as normal SQL statements; only CREATE TYPE BODY is folded as a block.
  • CREATE OR REPLACE AND RESOLVE NOFORCE JAVA SOURCE is recognized as a Java block opener.
  • isPlsqlBlockOpener now scans bounded source text directly instead of slicing per segment, with a named leading-token cap.
  • EOF PL/SQL blocks preserve the final END;.
  • q-quote whitespace delimiters are rejected, while single-quote delimiters remain supported.
  • Added regression coverage for TYPE spec, LIBRARY, TYPE BODY, NOFORCE Java, CRLF slash delimiters, WITH PROCEDURE, EOF END;, and q-quote delimiter behavior.

Validation run locally:

  • pnpm exec vitest run
  • pnpm run typecheck
  • pnpm run lint
  • pnpm run build
  • pnpm run test:rust

I also verified the updated dev app UI against a local DM connection: the query chooser now splits the reviewed cases as expected, including TYPE/LIBRARY as two statements and trigger/Java blocks plus following SELECT as two statements.

@debba debba requested a review from ymadd June 24, 2026 06:50

@ymadd ymadd left a comment

Contributor


LGTM. This addresses all my earlier review points — rename to slashDelimiter, leadingSignificantTokens substring removal + sticky WORD_RE, EOF fold preserving the trailing ;, TYPE/PACKAGE/CLASS split, NOFORCE + JAVA RESOURCE, q-quote whitespace-delimiter guard, and CRLF separator coverage. @debba good to merge from my side.

One non-blocking edge case I'll note for the record (no action needed here): a type spec whose name tokenizes to BODY (CREATE TYPE "BODY" AS OBJECT (...), or an unquoted CREATE TYPE body ..., since BODY isn't a SQL reserved word) is mis-folded as a TYPE BODY opener, so a following statement gets merged into it. The trigger is pathological, so it's not worth holding this PR. I'll fold the fix (a lookahead so a BODY-named spec doesn't satisfy the body pattern, plus a regression test) into a later splitter change.
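The lookahead described above could look roughly like this. It is a sketch under assumptions: upper-cased leading tokens as input, a hypothetical isTypeBodyOpener(tokens, typeIndex) signature, and a deliberately non-exhaustive list of keywords that may follow a type spec's name. Because CREATE TYPE BODY always names the type after BODY, a following AS, IS, FORCE, ;, or end of input means BODY was the spec's own name.

```typescript
// Sketch only; the keyword list is an assumption and not exhaustive.
// Words that can directly follow the *name* in a CREATE TYPE spec.
const TYPE_SPEC_FOLLOWERS = new Set(["AS", "IS", "FORCE", "UNDER", "AUTHID", "OID"]);
// An identifier token: upper-cased word or a double-quoted name.
const NAME_TOKEN_RE = /^(?:[A-Z_][A-Z0-9_$#]*|"[^"]+")$/;

// `tokens` are upper-cased leading tokens; `typeIndex` points at TYPE.
// TYPE BODY is a body opener only when BODY is followed by a type name,
// as in CREATE TYPE BODY point_t AS ...; in CREATE TYPE body AS OBJECT (...)
// or CREATE TYPE body; the word BODY is the spec's own name.
function isTypeBodyOpener(tokens: readonly string[], typeIndex: number): boolean {
  if (tokens[typeIndex] !== "TYPE" || tokens[typeIndex + 1] !== "BODY") return false;
  const afterBody = tokens[typeIndex + 2];
  return afterBody !== undefined && NAME_TOKEN_RE.test(afterBody) && !TYPE_SPEC_FOLLOWERS.has(afterBody);
}
```

A body for a type that is itself named BODY (CREATE TYPE BODY body IS ...) still folds, because the second BODY is read as the type name.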


debba commented Jun 29, 2026

Collaborator

That's great.
I'll merge it, and I look forward to your follow-up PR. Thanks a lot, @ymadd!

@debba debba merged commit 1513125 into TabularisDB:main Jun 29, 2026
1 check passed
