Skip to content

feat(sea): expose maxConnections + lock complex types as Arrow default#392

Open
msrathore-db wants to merge 29 commits into
msrathore/sea-auth-u2mfrom
msrathore/sea-max-connections
Open

feat(sea): expose maxConnections + lock complex types as Arrow default#392
msrathore-db wants to merge 29 commits into
msrathore/sea-auth-u2mfrom
msrathore/sea-max-connections

Conversation

@msrathore-db
Copy link
Copy Markdown
Contributor

Summary

Two features bundled into a single SEA-side PR:

  1. maxConnections (F1) — exposes the kernel's HttpConfig::pool_max_idle_per_host knob through ConnectionOptions.maxConnections on the public client options. Forwarded through buildSeaConnectionOptions into the napi openSession call. When omitted, the kernel default (100) applies. Mirrors the Python connector's max_connections kwarg on the SEA backend. NodeJS Thrift backend does NOT expose this knob — it's a SEA-only extension; no parity break.

  2. Complex types as Arrow default lock-in (F2) — adds a regression e2e test against pecotesting that selects ARRAY/MAP/STRUCT/nested ARRAY and asserts the returned Arrow schema fields are List/Map/Struct/List (not Utf8 JSON strings). Kernel ResultConfig::complex_types_as_json already defaults to false (Arrow-native), and the SEA wire request hardcodes format = "ARROW_STREAM" — so this is the existing default. The test locks the contract.

Plumbing — F1

  • IDBSQLClient.ConnectionOptions.maxConnections?: number with doc note that Thrift backend currently ignores it
  • SeaSessionDefaults (the shared base across auth-mode variants) carries the field so all three auth modes (PAT, OAuth M2M, OAuth U2M) round-trip it
  • buildSeaConnectionOptions validates positive-integer-ness at the JS boundary (rejects 0, negative, non-integer, NaN) with HiveDriverError
  • native/sea/index.d.ts adds the field in step with kernel branch msrathore/krn-max-connections

Cross-repo dependency

Requires kernel branch msrathore/krn-max-connections (PR https://github.com/databricks/databricks-sql-kernel/pull/74) for the napi-binding side.

Test plan

  • tests/unit/sea/auth-max-connections.test.ts — 9 cases passing (omit, valid, boundary 1, 0-reject, neg-reject, non-int-reject, NaN-reject, M2M arm, U2M arm)
  • Existing auth-pat.test.ts regression check — 8/8 pass
  • tsc: zero new TS errors from F1+F2 (16 errors total = 14 pre-existing in SeaOperationBackend.ts + 2 from missing lib/version.ts generated by prepare hook)
  • No lint errors in touched files (IDBSQLClient.ts, SeaAuth.ts, SeaBackend.ts, new tests)
  • Live warehouse e2e — tests/e2e/sea/complex-types-e2e.test.ts runs against pecotesting when env vars set; pending merge of kernel PR to test full path

Creates the napi-rs binding skeleton: Cargo.toml + lib.rs + module
stubs for database/connection/statement/result/error/logger. Captures
napi-rs tokio Handle via OnceCell in runtime.rs. Single working
#[napi] fn version() proves the binding loads + executes end-to-end
in Node.

Depends on krn-async-public-api branch (path dep on kernel).

Round 2 will add open/execute/fetch methods.
…wired

Adds real async methods on the opaque wrappers backing M0:
- openSession (free function) with PAT → kernel Session
- Connection::execute_statement → kernel ExecutedStatement
- Statement::fetch_next_batch / schema / cancel / close → kernel ResultStream
- Arrow batches returned as IPC bytes (per Layer 2 design)
- Error mapping preserves kernel ErrorCode + SQLSTATE for TS layer
- All entry points wrapped in catch_unwind

End-to-end smoke test against pecotesting passes.
No new dependencies beyond arrow-{ipc,array,schema} + futures.
Uses kernel async public API (no block_on).

Co-authored-by: Isaac
…indings

Round 1 scaffold declared tracing + tracing-subscriber as deps but
never used them. Removed. Logger bridge will re-add in round 3.

Other findings from 6b3affd-2026-05-15.md reviewed:
- Finding 2 (Database::Drop unreachable in Round 1b) — obsoleted by
  Round 2 (40d0b57): database.rs no longer declares a Database struct
  or Drop impl; it is now an `open_session` free function.
- Finding 3 (empty Connection::Drop) — obsoleted by Round 2: the Drop
  impl now spawns a real fire-and-forget close on the captured tokio
  handle.

Co-authored-by: Isaac
Per D-006 architectural decision (Python team's workspace pattern):
all language bindings (PyO3, napi-rs) now live as workspace siblings
in the kernel repo at databricks-sql-kernel/{pyo3,napi}/.

What this commit removes from the nodejs repo:
- native/sea/Cargo.toml (path dep relocated; package now at
  databricks-sql-kernel/napi/Cargo.toml with path = "..")
- native/sea/build.rs
- native/sea/src/* (lib, runtime, database, connection, statement,
  result, error, logger, util — all 9 files)
- native/sea/package.json (the @databricks/sea-native-linux-x64-gnu
  sub-package moves to the kernel workspace too)
- native/sea/index.js (regenerated artifact)

What stays in nodejs:
- native/sea/index.d.ts — TS declarations consumed by lib/sea/ adapter
- native/sea/README.md (new) — explains the move; points readers at
  databricks-sql-kernel/napi/

What's updated:
- package.json: `build:native` and `build:native:debug` scripts now
  delegate to the kernel workspace via $DATABRICKS_SQL_KERNEL_REPO
  (defaults to ../../databricks-sql-kernel-sea-WT/napi-binding for the
  local dev worktree layout). Build copies index.node + index.d.ts
  back into native/sea/ for the loader to find.

Why workspace co-location:
- Arrow version pinning lockstep — no silent IPC version drift
- path = ".." (clean) vs ../../../../databricks-sql-kernel-sea-WT/...
- Single CI: cargo build --workspace covers kernel + pyo3 + napi
- Kernel API changes that break either binding caught at PR-review time
- Future cgo binding for Go SEA slots in as another workspace member

This branch (sea-napi-binding) is now a thin consumer of the kernel
napi crate. The actual Rust code lives at krn-napi-binding HEAD on
the kernel repo (commit debe3d7).
Single mapping function in lib/sea/SeaErrorMapping.ts converts the
napi-binding's surfaced kernel error (code+message+sqlstate) to the
appropriate existing JS error class. M0 minimum: PAT auth errors
land as AuthenticationError; cancel/timeout as OperationStateError;
network/internal as HiveDriverError. SQLSTATE preserved on the
error object via .sqlState property.

No new error classes. M1 may add nuance.
SeaBackend.connect() now wires PAT options to the napi binding's
openSession(). Non-PAT modes rejected with clear M0-scope error
(OAuth/Azure/Federation land in M1). E2E test against pecotesting
confirms PAT round-trips: connect → openSession → close all clean.

No new dependencies. SeaAuth helper is ~30 LOC.
SeaSessionBackend wraps the napi Connection handle. executeStatement
passes through to napi.executeStatement and returns an
IOperationBackend (SeaOperationBackend in sea-results feature).
Session config + initialCatalog/initialSchema flow to napi
openSession. M0 stops at executeStatement; metadata methods +
per-stmt overrides defer to M1.

No new dependencies. Reuses existing ConnectionOptions / Session
config shapes.

Co-authored-by: Isaac
Two assertions in tests/unit/sea/execution.test.ts were specific to
the pre-merge SeaBackend / SeaOperationBackend stubs:

1. connect() missing-token rejection now flows through
   SeaAuth.buildSeaConnectionOptions which throws AuthenticationError
   (still a HiveDriverError subclass) with message "non-empty PAT".
   Updated the regex match accordingly.

2. fetchChunk() is no longer a stub — the merged
   SeaOperationBackend uses the sea-results pipeline
   (SeaResultsProvider + ArrowResultConverter + ResultSlicer). The
   "throws M1-deferred error owned by sea-results" test is now
   incorrect by design; removed it with a pointer comment to the
   real coverage in SeaOperationBackend.test.ts and
   results-e2e.test.ts.

891/891 unit tests passing post-merge.
Implements IOperationBackend over the napi binding's Statement.
fetchChunk decodes Arrow IPC bytes → apache-arrow RecordBatch →
ArrowResultConverter (Phase 1+2 reused unchanged) → JS rows.

All M0 datatypes round-trip via the same converter the thrift path
uses (BOOL, INT8/16/32/64, FLOAT, DOUBLE, DECIMAL, STRING, BINARY,
DATE, TIMESTAMP, INTERVAL, ARRAY, MAP, STRUCT). Unit tests
construct synthetic IPC batches; e2e test against pecotesting
confirms byte-identical parity vs thrift.

No new dependencies. ArrowResultConverter / ResultSlicer /
OperationIterator all reused unchanged (DRY).
Rust source changes (native/sea/src/connection.rs + statement.rs) deferred to kernel repo PR #29 (kernel-napi-statement-validity) since napi/src now lives in databricks-sql-kernel/napi/.

SeaOperationBackend.ts conflict resolved using integration commit 3da7aa7 (combining sea-results fetch pipeline with sea-operation lifecycle helpers).
- YEAR-MONTH: convert Arrow Interval[YearMonth] to thrift "N-M" string
  format (with leading "-" for negatives) in Phase 1 of converter
- DAY-TIME: pre-process IPC schema bytes before apache-arrow@13 decode
  (which predates the Arrow Duration type id 18) to remap Duration ->
  Int64 with original time unit preserved in `databricks.arrow.duration_unit`
  field metadata; convert Int64 duration values to thrift
  "D HH:mm:ss.fffffffff" string format

Both interval flavours are formatted by the same converter helper
(formatDayTimeFromTotal); the duration_unit metadata gates between the
native Arrow Interval Int32Array path and the rewritten Duration Int64
path. No apache-arrow bump, no node_modules edits, no kernel-side change.

New: lib/sea/SeaArrowIpcDurationFix.ts (FlatBuffer rewriter using
apache-arrow's internal fb/* accessors).

M0 datatype parity now 25/25.
Resolves the modify/delete conflict from merging sea-napi-binding:
sea-integration had the StatementInner ValidityFlag fix in
native/sea/src/statement.rs (inherited via sea-operation merge), while
sea-napi-binding wants those files deleted because the Rust source
now lives in the kernel workspace at databricks-sql-kernel/napi/.

Resolution: propagate the StatementInner fix to the kernel napi crate
(commit a8d8df7 on krn-napi-binding), then accept the deletion here.
Built artifacts (.node, .d.ts) are repopulated by
`npm run build:native` which delegates to the kernel workspace.

native/sea/ now contains only README.md and the generated index.d.ts;
index.linux-x64-gnu.node sits next to it as a build artifact.
Adds OAuth M2M and U2M onto the SEA auth path. The auth-u2m worktree
landed both at once (rather than rebasing on top of the M2M branch)
because the JS adapter flow-selector (`oauthClientSecret defined ?
M2M : U2M`, mirroring thrift `DBSQLClient.ts:143`) is cleanest when
both arms exist together — the no-secret branch was rejection-only
in the M2M-alone state.

The napi binding's `AuthMode` enum gains a third variant (`OAuthU2m`),
crossing the FFI as the PascalCase string `'OAuthU2m'`. The JS adapter
hardcodes `oauthRedirectPort: 8030` on the U2M payload to override
the kernel default of 8020 — preserves parity with thrift, which
defaults to 8030 (`OAuthManager.ts:245`). All other U2M knobs
(`client_id`, `scopes`, `callback_timeout`, `token_url_override`)
stay at kernel defaults; thrift hides them from its public surface
too, so SEA follows the same pattern.

`OAuthPersistence` is rejected on U2M with an explicit M1-Phase-2
deferral message: thrift exposes the hook, the kernel doesn't yet —
parity gap to close once `AuthConfig::External` lands. The kernel
disk cache at `~/.config/databricks-sql-kernel/oauth/{sha256}.json`
covers the standard flow today. Azure-direct knobs
(`azureTenantId` / `useDatabricksOAuthInAzure`) rejected on both
M2M and U2M with the same "Phase 2" message — kernel uses workspace
OIDC which works for Azure-databricks workspaces regardless.

Task: M1 OAuth M2M + U2M (sea-auth feature, U2M worktree).

Files:
- native/sea/src/database.rs — AuthMode { Pat, OAuthM2m, OAuthU2m } +
  ConnectionOptions union + open_session dispatch with U2M arm
  forwarding `oauth_redirect_port` from JS and leaving every other
  U2M kernel knob at None
- native/sea/index.{d.ts,js} — regenerated napi artifact
- lib/sea/SeaAuth.ts — buildSeaConnectionOptions grows M2M + U2M
  branches; flow selector mirrors thrift; persistence rejection
  message reads as a parity gap, not a feature add
- lib/sea/SeaNativeLoader.ts — SeaNativeBinding.openSession type
  accepts the three-arm discriminated payload
- tests/unit/sea/auth-pat.test.ts — assertions updated for new
  `authMode: 'Pat'` round-trip; no-secret OAuth now asserts U2M
  happy-path dispatch
- tests/unit/sea/auth-m2m.test.ts — new (8 cases — same as the
  M2M-worktree commit, minus the now-obsolete no-secret rejection)
- tests/unit/sea/auth-u2m.test.ts — new (7 cases — happy path,
  port 8030 hardcode, clientId not propagated, path slash prepend,
  Azure rejected, persistence rejected, SeaBackend round-trip)
- tests/integration/sea/auth-m2m-e2e.test.ts — env-gated live e2e
  (mirrors the M2M-worktree e2e)
- tests/integration/sea/auth-u2m-e2e.test.ts — new (it.skip pending
  TBD-oauth_u2m_test_harness; comment points at testing-agent's
  Playwright/Puppeteer harness work)

Tests:
- Unit: 55/55 pass (`npm run test -- 'tests/unit/sea/**/*.test.ts'`):
  13 PAT (assertions updated for authMode + no-secret now U2M),
  8 M2M, 7 U2M, 25 SeaErrorMapping regression, 2 ConnectionOptions
  base.
- U2M e2e: 1 pending (intentional `it.skip` — gated on browser
  harness).
- M2M e2e: same as the M2M-worktree commit — kernel-side OAuth
  plumbing reaches the workspace; pecotesting SP credentials produce
  the workspace's `invalid_client` (verified reproducible via
  direct curl), an environmental issue not a code defect.

Co-authored-by: Isaac
…helper, doc + loader cleanup

Ports the M2M-side round-1 review fixes (commit 88d7d21 on
sea-auth-m2m) onto the U2M worktree so the two branches stay aligned
in review quality. The U2M-specific work in 5eba37f is unchanged;
this commit is pure cleanup applied across all three SEA-auth test
files (PAT / M2M / U2M).

- Extracted `makeFakeBinding()` to
  `tests/unit/sea/_helpers/fakeBinding.ts` and refactored all three
  auth-*.test.ts files to import it. The U2M-worktree commit had
  THREE copies of the closure (the third was the cause for the
  bloat reviewer's "rule of three" call-out that the M2M-worktree
  fixup was meant to forestall).
- Dropped the unused `SeaAuthMode` type alias from `SeaAuth.ts` —
  zero callers; inlined literals already power the discriminated
  union.
- Tightened `SeaNativeBinding.openSession` parameter type to consume
  the discriminated `SeaNativeConnectionOptions` union from
  `SeaAuth.ts`, restoring compile-time per-mode field enforcement
  at the FFI seam.
- Augmented the Rust `AuthMode` doc-comment with the napi-emission
  explanation (PascalCase verbatim, not kebab-case) plus the
  cross-reference reminder to extend `open_session()`'s match on
  every new variant.
- Added the const-enum hazard note to `SeaNativeConnectionOptions`'
  doc-comment, locking in the duplicated-literal pattern as
  deliberate (importing the napi `const enum AuthMode` breaks
  `isolatedModules`).
- Cleaned up the conditional-type-cast lobotomy in `auth-pat.test.ts`
  on the token-provider fixture; plain `as any` + eslint-disable.

Skipped findings (same justification as M2M-worktree commit):
F-3 borderline error-class taxonomy, F-4 cosmetic arg-order,
F-5 redundant comment-anchor (compiler already enforces), F-8 null
vs undefined paranoia, F-9 mocha named-function style.

Tests:
- Unit: 55/55 pass (same count as 5eba37f — pure restructure).
- Native build: clean (1m04s release profile).
- Type-check: clean (tsc --noEmit).

Co-authored-by: Isaac
…ediums)

Ports the M2M-side devils-advocate round-1 fixes (commit eef9d30 on
sea-auth-m2m) onto the U2M worktree. Same shape, with the U2M-specific
adjustments noted below.

Added `decodeNapiKernelError(err: unknown): Error` to
`SeaErrorMapping.ts` and wrapped `SeaBackend.openSession`'s napi call
in `try`/`catch` + the decoder. The wiring step was missing on both
branches; now M2M and U2M users see typed errors
(`AuthenticationError` on Unauthenticated, `HiveDriverError` on
NetworkError, etc.) instead of raw `Error` with sentinel-prefixed
message bodies.

`buildSeaConnectionOptions` rejects:
- PAT path + stray OAuth fields → HiveDriverError "cannot supply
  both `token` and `oauthClientId`/`...Secret`".
- OAuth path (M2M or U2M) + stray `token` → HiveDriverError
  "cannot supply `token` alongside `authType: 'databricks-oauth'`".

The OAuth-side check fires BEFORE the M2M/U2M split, so the U2M
arm gets the protection too.

Rewrote three M2M-arm error messages plus the U2M persistence
message to be time-bound free:
- "U2M lands in sea-auth-u2m" → "OAuth M2M requires
  `oauthClientSecret`. For interactive OAuth (U2M), see the driver
  OAuth U2M docs."
- "Azure-direct OAuth ... is a later M1 task" → "Azure-direct OAuth
  ... is not supported. The workspace-OIDC discovery path handles
  Azure workspaces today without these options."
- "M1+ follow-ups" → "Supported modes on the SEA backend today: ..."
- U2M persistence: dropped "M1 Phase 2" — kept the technical
  explanation citing kernel-side `AuthConfig::External` plumbing
  (durable; describes the kernel gap, not the feature roadmap).

Zero hits for `sea-auth-u2m|sea-auth-m2m|later M1|M1\+ follow|M1 Phase`
in `lib/sea/`. Updated regex assertions in lockstep.

`isBlankOrReserved(s)` helper trims + rejects empty-after-trim and
literal `'undefined'` / `'null'` strings. Applied to `token`,
`oauthClientId`, `oauthClientSecret`. E2e env-gate hardened the
same way.

Added `tests/unit/sea/auth-edge-cases.test.ts` with 18 cases:
- Whitespace + reserved-literal PAT (3)
- Same for `oauthClientId` / `oauthClientSecret` on M2M (4)
- Ambiguous-creds: PAT+id, PAT+secret, M2M+token, U2M+token (4)
- Explicit-undefined Azure-direct discriminants on M2M + U2M (3)
- `decodeNapiKernelError` for Unauthenticated, NetworkError,
  SQLSTATE preservation, plain napi pass-through, corrupted
  envelope fallback (5)

Added a bad-secret `it(...)` block to `auth-m2m-e2e.test.ts` that
asserts `AuthenticationError` + `invalid_client`. Closes the loop on
DA-F1 by proving the kernel-side error path surfaces correctly.

The U2M e2e remains `it.skip` pending the Playwright/Puppeteer harness;
once it lands, the same negative-path pattern can be added there.

L-F3, L-F4, L-F5 — deferred per the previous fixup's reasoning.

Tests:
- Unit: 74/74 pass (was 55 before this commit: +18 from edge-cases
  + 1 from new pending placeholder count).
- TypeScript build: clean.
- Native build: unchanged (no Rust changes this commit).

Co-authored-by: Isaac
…raise on U2M+id, case-insensitive validation, preserve all error envelope fields

Addresses devils-advocate-auth-u2m-1 round-1 findings on commit
8e99b40. NF-1 is a HIGH continuation of DA-F1 — the previous fixup
wired `decodeNapiKernelError` at `SeaBackend.openSession` but missed
the second extant napi call site, `SeaSessionBackend.close()`. NF-2
through NF-4 are mediums.

`SeaSessionBackend.close()` calls `await this.connection.close()` on
the napi `Connection`. Kernel errors from there (e.g., a delete-
session RPC failure that the kernel chose to surface despite the
fire-and-forget pattern) were reaching JS callers as raw `Error`
with `__databricks_error__:` envelope. Wrapped in try/catch +
`throw decodeNapiKernelError(err)` — same 3-line shape as
openSession.

Per `grep -rn "await this\.native\.\|await this\.connection\." lib/sea/`,
these are the only two napi call sites on `sea-auth-u2m`. Future
napi call sites on sea-execution / sea-results / sea-operation
branches need the same wrap (Phase 2; tracked elsewhere).

Previously, `oauthClientId` set without `oauthClientSecret` was
silently dropped (kernel uses built-in `databricks-cli`). A user
setting the field clearly expects it honored; silent-drop hid
intent. Flipped to throw HiveDriverError with explicit guidance
("kernel uses 'databricks-cli'; omit for U2M or supply
oauthClientSecret for M2M").

The matching unit test in `tests/unit/sea/auth-u2m.test.ts` flipped
from "drops the supplied oauthClientId" to "rejects oauthClientId
on the U2M path with a clear 'not supported' error".

`isBlankOrReserved` previously compared `trimmed === 'undefined'`
and `=== 'null'`, so `'UNDEFINED'`, `'Null'`, `'NULL'`, `'nUlL'`,
etc. slipped through. Changed to `trimmed.toLowerCase()` before
the comparison. New unit case in `auth-edge-cases.test.ts` iterates
five case variants and asserts each rejects.

`decodeNapiKernelError` previously routed only `{code, message,
sqlState}` to the JS error class. The kernel envelope at
`native/sea/src/error.rs:50-89` actually carries 7 fields total
(`code`, `message`, `sqlState`, `errorCode`, `vendorCode`,
`httpStatus`, `retryable`, `queryId`). The remaining 5 were
silently dropped. Thrift parity demand: thrift errors carry these
fields.

Added `attachMetadata(error, meta)` helper that `Object.defineProperty`s
each non-undefined field as a non-enumerable own-property — matches
the way `attachSqlState` works and the way Node attaches `.code` to
system errors. Two new unit tests verify (a) all 5 fields round-trip
through a synthetic envelope, (b) they remain non-enumerable
(absent from `Object.keys(err)`) but accessible via direct property
read.

- NF-5 (envelope versioning): `__databricks_error__:` payload has
  no `version` field. A kernel refactor that renames a field would
  silently break the JS decoder. Phase 2: add `version: 1` to the
  kernel-side serialization, check + fallback on JS side. Not in
  this commit because it requires coordinated kernel-side change.
- NF-6 (U2M e2e harness): `auth-u2m-e2e.test.ts` is fully
  `it.skip` pending the Playwright/Puppeteer harness. Devils-
  advocate noted that port-collision + headless-negative-path
  subsets don't strictly need a browser. Phase 2: enable those
  subsets when the harness lands. Not in this commit because the
  work is mostly harness-wiring rather than test code.

Tests:
- Unit: 79/79 pass (was 74 before this commit: +5 — 1 case-insensitive
  reserved literal sweep, 1 M2M oauthClientSecret reserved-literal
  reject, 2 envelope-metadata preservation, 1 close() decode + 1
  flipped from drop-to-reject which kept the count net same — but
  the OAuthClientId test rewrite is on a different file).
- TypeScript build: clean.
- Native build: unchanged (no Rust changes this commit).

Co-authored-by: Isaac
…cate, type-guard envelope, treat blank secret as U2M

Addresses devils-advocate-auth-u2m-2 round-1 findings on commit
98d5ecf. NF-N1 is a real bug (collision between the kernel envelope's
textual `errorCode` field and the pre-existing enum-typed `errorCode`
on `OperationStateError` / `RetryError`). NF-N2..NF-N4 are mediums.
Includes B-4 collapse (one defineProperty helper for both top-level
sqlState and the kernel metadata namespace).

## NF-N1 (HIGH) — namespace kernel metadata + B-4 collapse

Before this commit, `decodeNapiKernelError` `defineProperty`d each of
the 5 kernel envelope fields (`errorCode`, `vendorCode`, `httpStatus`,
`retryable`, `queryId`) directly on the JS error. But
`OperationStateError.ts:21` and `RetryError.ts:12` already declare a
top-level `errorCode: enum` field, and `DBSQLOperation.ts:209`
switches on `err.errorCode === OperationStateErrorCode.Canceled`. A
Cancelled kernel envelope with `errorCode: "USER_REQUESTED_CANCEL"`
would clobber the enum string `'CANCELED'`, silently breaking
cancel detection.

Going with option (a) from team-lead's three remediation paths:
nest the 5 kernel envelope fields under a single
`error.kernelMetadata.*` namespace. Clean separation, no surprise,
matches the way `attachSqlState`'s pattern keeps `sqlState` at the
top level (which is collision-free).

Folded B-4 simultaneously: replaced the two helpers (`attachSqlState`,
`attachMetadata`) with one `defineErrorMetadata(error, key, value)`
that owns the `defineProperty` flags. Both `sqlState` (top-level)
and `kernelMetadata` (namespaced) go through the same helper now.

## NF-N2 (medium) — dedupe e2e `looksReal` against production

`auth-m2m-e2e.test.ts:58-62` had a `looksReal` predicate that was
still case-sensitive even though round-2's `isBlankOrReserved` is
case-insensitive. Exported `isBlankOrReserved` from `SeaAuth.ts` and
imported it in the e2e test. Eliminates the predicate-drift risk
(also resolves the bloat-watchdog's B-3).

## NF-N3 (medium) — blank/reserved oauthClientSecret routes to U2M

A user passing `oauthClientSecret: process.env.MY_SECRET || ''`
previously hit the M2M arm's "secret must be non-empty" rejection,
which never mentions U2M. Now blank/whitespace/reserved-literal
secrets route to the U2M arm — where if `oauthClientId` is also
set, the dedicated "not supported on U2M" rejection fires (round-2
NF-2 work). The error message correctly points at the right flow.

Updated 5 existing test cases that had assumed the old M2M-rejects
behavior; they now assert the U2M-via-id-rejection path. Added 3
new cases (empty string, whitespace-only, literal `'undefined'`
routing to U2M happy path when no clientId is set).

## NF-N4 (medium) — per-field envelope type-guards

`decodeNapiKernelError` previously cast `parsed` to a typed shape
without runtime-validating the 5 optional fields. A kernel bug that
emits `retryable: "true"` (string) instead of `true` (boolean)
would propagate the wrong-typed property to JS callers. Added a
`buildKernelMetadata(parsed: Record<string, unknown>)` helper that
checks `typeof` per-field and discards mis-typed values. New unit
test verifies all 5 wrong-type variants are dropped while a single
correctly-typed field survives.

Also: when the parsed envelope has no validated metadata fields,
the decoder now omits the `kernelMetadata` namespace entirely
(rather than attaching `{}`-shaped noise). Pinned by a new unit
test.

## DEFERRED to Phase 2

- NF-N5 (low — SeaNativeLoader top-level require): per team-lead's
  guidance, defer to Phase 2 (deploy-time visibility issue).
- Language-expert-auth-u2m-2's 1 medium + 6 low.

## Kernel fix consumption note

team-lead's message indicated kernel-author landed the
Error::io() → Error::unauthenticated() fix on `krn-napi-binding` at
commit `a64479a`. My napi binding's path-dep
(`native/sea/Cargo.toml`) points to
`../../../../databricks-sql-kernel-sea-WT/async-public-api`, not
`krn-napi-binding`. As of the round-3 build, `async-public-api`
still has the OLD `Error::io()` at `m2m.rs:270`. So my rebuild
this round picks up the new TS code only — NOT the kernel error-
class fix.

Consequence for the bad-secret e2e: it would STILL fail with
HiveDriverError (not AuthenticationError) on a live run today,
because the kernel envelope on the worktree my path-dep reaches
still carries `code: "Internal"`. The kernel author's fix needs to
land on `async-public-api` (the branch my path-dep tracks), or my
path-dep needs to point at `krn-napi-binding`. Flagging to
team-lead in the reply.

Tests:
- Unit: 85/85 pass (was 79 before this commit: +6 net — added 4
  new cases for NF-N3 routing + NF-N1 collision + NF-N4 type-guard +
  NF-N4 metadata-omitted; flipped 3 existing M2M-rejection cases to
  U2M-rejection-via-id; updated 2 NF-4 metadata tests to read
  through the new namespace).
- TypeScript build: clean.
- Native build: cached (no Rust changes from this commit).

Co-authored-by: Isaac
The pecotesting workspace SP we were targeting (DATABRICKS_PECO_CLIENT_*)
is NOT registered on the warehouse — yields `invalid_client` on token
exchange. The Azure AD SP (DATABRICKS_PECOTESTING_AAD_CLIENT_*) IS
registered on HTTP_PATH2 (warehouse 00adc7b6c00429b8), so flip the e2e
to those creds. Both happy-path (openSession 730ms) and bad-secret
(AuthenticationError 217ms) now pass against the napi-binding kernel
worktree (carries DA-F1 fix a64479a).

Co-authored-by: Isaac
…p envelope sentinel, trim RetryError doc

- NF3-2 (HIGH): when oauthClientId is set and oauthClientSecret is
  blank/reserved, raise AuthenticationError (M2M intent) instead of
  routing to U2M which then raises HiveDriverError. The round-3
  NF-N3 fix over-applied — U2M routing only kicks in when BOTH
  id and secret are blank/absent.
- NF3-3 (MEDIUM): on corrupted-envelope JSON.parse failure, strip
  the internal __databricks_error__: sentinel from the message
  before returning to the caller.
- NF3-6 (LOW): trim RetryError mention from KernelMetadata.errorCode
  doc-comments; no kernel ErrorCode currently maps to RetryError.

Deferred per team-lead disposition: NF3-1 (kernel RequestTokenError
sub-classification — Phase 2 kernel work), NF3-4 (e2e
kernel-error-code assertion — blocked on NF3-1), NF3-5 (path-dep
checksum — resolves when kernel publishes), NF3-7 (looksReal
double-neg — cosmetic), LE3-1..7 (Phase 2 decoder polish).

Co-authored-by: Isaac
…th test, message mutation safety, class-pin simplification

- DA4-1 (HIGH): rewrite buildSeaConnectionOptions function-level
  JSDoc to describe the id-keyed flow selector (round-4 NF3-2 fix);
  the block-comment was updated but the public-API JSDoc was missed.
- DA4-2 (MEDIUM): add test for the SeaAuth.ts:201-210
  defense-in-depth U2M+id rejection branch (zero coverage after
  round-4 flipped the three tests that previously exercised it).
- DA4-3a (MEDIUM): wrap err.message mutation on corrupted-envelope
  path in try/catch; fall back to a fresh HiveDriverError if the
  message descriptor is non-writable (defensive for future napi-rs
  versions; mutation preserves napi-side stack on the common path).
- DA4-3b (MEDIUM): delete redundant constructor.name check from
  class-pin test; instanceof AuthenticationError is sufficient
  because instanceof is a one-way subclass check. Fix the comment
  that incorrectly claimed instanceof couldn't distinguish.
- LE4-1 (MEDIUM): add this.name = 'AuthenticationError' constructor
  to the AuthenticationError class so err.name / err.toString()
  identify the subclass (3 lines; doesn't extend to sibling error
  classes in this PR).
- DA4-4 (LOW): drop "reserved for future RetryError mappings" from
  three SeaErrorMapping.ts doc-comments — no kernel ErrorCode maps
  to RetryError and there's no design doc proposing one.
- LE4-2 (LOW): unify the class-pin test to chai's idiomatic
  .to.throw(Class, /regex/) form, matching the rest of the suite.
- LE4-4 (LOW): one-line comment justifying mutate-vs-clone choice
  on the corrupted-envelope path.

Skipped per disposition: LE4-3 (idIsBlank/secretIsBlank symmetry —
LE-4 own recommended leave-as-is).

Deferred (carries over from round-3): NF3-1 kernel sub-classification
(Phase 2 kernel work), NF3-4 e2e kernel-error-code assertion (blocked
on NF3-1), NF3-5 path-dep SHA pin (resolves on kernel publish),
LE3-1..3 SeaErrorMapping decoder polish (Phase 2 bundle).

Co-authored-by: Isaac
…integration shape

Reconciles the sea-auth-u2m commits onto sea-integration (the linear-
stack target) by combining the post-integration SeaBackend / SeaSession-
Backend / SeaNativeLoader / test shapes with the OAuth M2M+U2M behaviors
introduced across the 5 review rounds on sea-auth-u2m.

Behaviors preserved from sea-auth-u2m:
- decodeNapiKernelError on openSession / executeStatement / close
  (kernelMetadata namespace, sentinel-stripping, AuthenticationError
  raising for kernel-side `Unauthenticated`)
- buildSeaConnectionOptions: id-keyed M2M-vs-U2M selector, blank-or-
  reserved-literal credential rejection, defense-in-depth U2M-with-id
- AuthenticationError this.name constructor (LE4-1)
- discriminated SeaNativeConnectionOptions union (Pat | OAuthM2m |
  OAuthU2m) at the napi-binding seam

Shape kept from sea-integration:
- SeaBackend constructor signature `{ context, nativeBinding? }`
  (DBSQLClient.ts:241 call-site stays compiling)
- SeaSessionBackend as a separate module (was inline in sea-auth-u2m)
- SeaSessionBackendOptions: { connection, context, defaults?, id? }
- SeaSessionBackend session ids via uuidv4() (auth-only counter scheme
  superseded; OAuth tests updated)
- post-integration SeaNativeLoader exports (SeaExecuteOptions,
  SeaArrow{Batch,Schema}, SeaNative{Statement,Connection}) carry through

Test reconciliations:
- new SeaBackend(binding) → new SeaBackend({ nativeBinding: binding })
  across 14 OAuth-test call-sites
- SeaBackendOptions.context made optional (constructor already
  downcasts undefined; runtime callers always supply via DBSQLClient)
- session-id regex from /^sea-session-\d+$/ to UUIDv4
- _helpers/fakeBinding.ts openSession return cast to SeaNativeConnection
- execution.test.ts: the "rejects databricks-oauth (M0 PAT-only)" test
  flips to "rejects unsupported auth modes (non-PAT, non-OAuth)" —
  databricks-oauth is now the U2M happy path
- execution.test.ts: openSession round-trip asserts new authMode:'Pat'
  field on the discriminated union

Skipped commit:
- 37156db (Cargo.toml path-dep flip) became empty after sea-integration's
  napi-source relocation — the native crate is no longer at
  native/sea/Cargo.toml, it's in the kernel workspace.

Verification (in /tmp/dry-run-nodejs):
- tsc --project tsconfig.build.json: clean
- SEA unit subset: 144/144 passing (87 sea-auth-u2m + 57 sea-integration)
- M2M e2e: 2/2 passing (happy-path 652ms + bad-secret AuthenticationError
  233ms)

This is a dry-run-only commit. Do not push or force-push the real
sea-auth-u2m branch from this work; the real branch stays at e9131ae
until owner approves the rebase. Branch:
`dryrun/sea-auth-u2m-on-integration-fresh` lives in /tmp/dry-run-nodejs.

Co-authored-by: Isaac
…rwarding to openSession

The napi binding moved initialCatalog/initialSchema/sessionConfig from
ExecuteOptions onto ConnectionOptions (matching pyo3) because the
kernel does not read StatementSpec::statement_conf — they were
silently no-op'd in the per-statement path. Adapter follows.

- SeaAuth.ts: extend SeaNativeConnectionOptions with optional
  catalog / schema / sessionConf (intersection with each auth-mode
  variant). New SeaSessionDefaults interface for the shared shape.
- SeaBackend.ts::openSession: fold OpenSessionRequest.initialCatalog /
  initialSchema / configuration into the napi options before the
  openSession call. Drop the SeaSessionBackend.defaults forwarding.
- SeaNativeLoader.ts: drop SeaExecuteOptions; executeStatement now
  takes only the SQL.
- SeaSessionBackend.ts: drop SeaSessionDefaults and defaults field;
  drop per-statement overlay logic. useCloudFetch becomes a no-op on
  the SEA path (kernel hardcodes disposition to
  INLINE_OR_EXTERNAL_LINKS; ResultConfig exposure is M1 work).
- tests: replace per-statement-forwarding assertions with
  openSession-arg assertions. 23/23 SEA execution tests pass
  (143/143 across the SEA suite).

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…i binding

Surfaces the kernel's `HttpConfig::pool_max_idle_per_host` knob
through `ConnectionOptions.maxConnections` on the public client
options, forwards it through `buildSeaConnectionOptions` into the
napi `openSession` call. When omitted, the kernel's default (100)
applies.

Mirrors the Python connector's `max_connections` kwarg on the SEA
backend. The NodeJS Thrift backend does NOT currently expose this
knob — it remains a no-op on that path, documented inline. No
break to Thrift parity (Thrift never had it).

Plumbing:
- `IDBSQLClient.ConnectionOptions` gains `maxConnections?: number`
  with a doc note that the Thrift backend ignores it today.
- `SeaSessionDefaults` (already the shared base across auth-mode
  variants in `SeaNativeConnectionOptions`) gains the field so all
  three auth modes (PAT, OAuth M2M, OAuth U2M) round-trip it.
- `buildSeaConnectionOptions` validates positive-integer-ness at
  the JS boundary (rejects 0, negative, non-integer, NaN) so a
  misuse fails here with a clear `HiveDriverError` instead of
  inside the kernel with a cryptic napi error.
- `native/sea/index.d.ts` (the committed copy of the napi-rs
  generated d.ts) gets the `maxConnections?: number` field on
  `ConnectionOptions`, in step with the kernel branch
  `msrathore/krn-max-connections` (commit `0c5ca3f`).

Tests:
- `tests/unit/sea/auth-max-connections.test.ts` — 9 cases covering
  omission, valid values, boundary (1), and rejection of 0,
  negative, fractional, NaN; plus the OAuth M2M / U2M arms.

Cross-PR dependency: stacks on `msrathore/sea-auth-u2m` (current
NodeJS tip) and requires kernel branch `msrathore/krn-max-connections`
(napi binding side of the same change).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Adds an end-to-end test against pecotesting (gated by
`DATABRICKS_PECOTESTING_*` env vars; skipped when absent) that
confirms ARRAY / MAP / STRUCT and nested ARRAY<STRUCT> come back
as **native Arrow** shapes (List / Map / Struct) — not Utf8 JSON
strings.

The kernel's `ResultConfig::complex_types_as_json` defaults to
`false` (Arrow-native), and the SEA wire request hardcodes
`format = ARROW_STREAM`, so this is the existing default. The
test locks the contract: a regression that flips the default
(or that an upstream change wraps the result post-processor in
the JSON pass) would fail this assertion immediately.

Matches the NodeJS Thrift backend's `complexTypesAsArrow=true`
default — see `DBSQLSession.getArrowOptions` where
`useArrowNativeTypes=true` propagates to `complexTypesAsArrow`.
Mirrors the kernel-side e2e at
`tests/v0_execute_e2e.rs::complex_types_as_json_flag_stringifies_complex_columns`
which exercises both the Arrow-native and JSON-string paths.

Decision — no opt-in toggle exposed at the JS layer: neither
Python `use_sea` nor NodeJS Thrift exposes a `complexTypesAsJson`
knob to end users; the kernel's `ResultConfig.complex_types_as_json`
remains internal until a consumer needs it. Adding the toggle now
would invite drift from the kernel; we revisit when a consumer
asks. Matrix row 11 of section 3 stays as "implemented — native
Arrow default matches Thrift behaviour".

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…rict typeId asserts

DA round 1 findings on F1 (max_connections) and F2 (complex types
as Arrow), addressed in a single fixup because both touch the F1
NodeJS branch.

F1 — M1 upper-bound: the JS-layer validation accepted any positive
integer but the napi binding accepts `u32`. A value > 2^32 - 1
would either silently truncate at FFI or throw a cryptic kernel
error. Add an explicit upper-bound check with a clear
`HiveDriverError` message that points at typical pool sizes.
Three new unit tests: accept MAX_U32, reject MAX_U32+1, reject
Number.MAX_SAFE_INTEGER.

F1 — M2 mock test: the existing unit suite verified the
buildSeaConnectionOptions output shape but did not exercise the
end-to-end SeaBackend.connect → openSession → napi.openSession
round-trip. Add two mock-binding tests via the existing
`makeFakeBinding` helper that:
1. Confirm maxConnections is forwarded to the napi openSession call
2. Confirm maxConnections is `undefined` (not 0) when omitted, so
   the napi binding's `Option<u32>` reads None and the kernel
   default of 100 applies.

F2 — H1 skip-gate: the e2e test imported `getSeaNative` at module
top-level, which transitively ran `SeaNativeLoader.ts`'s
module-level `require('../../native/sea/index.js')`. When the
`.node` artifact isn't built (`yarn build:native` not run), that
require throws MODULE_NOT_FOUND BEFORE mocha can invoke `before()`,
crashing test discovery for the whole suite. Fix:
1. Verify the native artifact exists in `before()` and skip with
   a clear "run yarn build:native" message if absent
2. Lazy-require the napi shim via `createRequire(import.meta.url)`
   inside `loadBinding()` so the require lives outside module-load
   path. `createRequire` (vs bare `require`) handles the ESM
   reparse path mocha 11+ takes for ts-node-emitted files
   (MODULE_TYPELESS_PACKAGE_JSON warning).

F2 — M2 strict assertions: prior regex-based assertions
(`/List/i`, `/Map|List/i`, etc.) were too permissive — they would
accept LargeList or FixedSizeList where the kernel must return
plain `List`. Switch to arrow-js `Type` enum comparisons via
`.type.typeId`. Adds:
- Strict typeId equality for c_arr (List), c_map (Map), c_struct
  (Struct), c_arr_struct (List)
- Nested-structural check that c_arr_struct's child is Struct
- Row-count sanity assert (literal SELECT → 1 row)

F2 — live e2e: ran against pecotesting (`DATABRICKS_PECOTESTING_*`)
with the freshly-built napi binary. All assertions passed in 655ms.
The kernel default `complex_types_as_json=false` is verified to
produce native Arrow shapes for ARRAY/MAP/STRUCT and nested
ARRAY<STRUCT> end-to-end.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
mocha 11+'s MODULE_TYPELESS_PACKAGE_JSON reparse-as-ESM path
breaks `import * as fs from 'fs'` / `import * as path from 'path'`:
under ESM, the namespace import doesn't expose the CJS module's
exports as own properties, so `path.resolve(...)` was undefined at
runtime. Switch to named imports (`resolve as resolvePath` aliased
to keep the call site readable). Same fix applied to F4 and F3b
in their respective branches.

Verified live e2e passes against pecotesting after the fix.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
DA round 2 findings on F2 / F4 / F3b H1 had a common root cause
the team-lead correctly flagged: `SeaNativeLoader.ts` does
`const native = require('../../native/sea/index.js')` at module
load, so any import from this module triggers the `.node`
artifact load. When `yarn build:native` hasn't run, that throws
MODULE_NOT_FOUND BEFORE mocha can fire `before()` skip-gates,
crashing test discovery for every e2e suite that imports
anything from this module — including transitive imports via
`DBSQLClient` → `SeaBackend` → `SeaNativeLoader`.

Round 1 dance: every e2e test got a `createRequire` +
filesystem-check workaround. Round 2 fixes the actual defect:
the loader itself memoises the require behind `getSeaNative()`,
so importing the module is free for code paths that never reach
SEA. Filesystem-check guards in the e2e tests stay as defense-
in-depth (they produce a friendlier "run yarn build:native"
diagnostic than napi-rs's raw MODULE_NOT_FOUND).

Verified the fix by moving the .node artifact aside and running
the F2 e2e: previously CRASHED at file load with
MODULE_NOT_FOUND; now SKIPS cleanly via the suite's `before()`
gate with an actionable message. Restored binary, re-ran the
test: passes against pecotesting in 480ms.

Also addresses F2 M1 lint:
- `lib/sea/SeaNativeLoader.ts:131` — added `import/extensions`
  to the eslint-disable comment on the lazy-require, since the
  `.js` extension is required (napi shim is CommonJS, not TS)
- `tests/e2e/sea/complex-types-e2e.test.ts:148` — switched to
  destructuring `const { schema } = table` per prefer-
  destructuring rule

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Adds an end-to-end test against pecotesting (gated by
`DATABRICKS_PECOTESTING_*` env vars; skipped when absent) that
verifies `maxConnections` flows from `ConnectionOptions` through
`buildSeaConnectionOptions` through the napi `openSession` into
the kernel's `HttpConfig::pool_max_idle_per_host`, and that a
session opened with a custom value round-trips a real query.

Combined with the existing unit + mock-binding tests, this proves
end-to-end:
- Unit tests pin the JS-side value validation + napi-shape (mock
  binding records the openSession args).
- This e2e proves the value actually reaches the kernel without
  breaking the wire path against a live warehouse.

Three cases:
1. `maxConnections=1` — minimum bound the JS layer accepts.
2. `maxConnections=200` — large pool size, exercises the high end.
3. Omitted — kernel default (100) regression check.

All three pass against pecotesting (~500ms each). DA round-1 F1
"L2 live" finding addressed.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db
Copy link
Copy Markdown
Contributor Author

Re-review requested — round-3 fixups landed; DA round-3 sign-off received.

Round-3 final tip: c21324193de6fb39fe79019860731d7265a7acf4

This PR carries F1 (maxConnections JS surface) + F2 (lock complex types as Arrow default). Three rounds of DA verification across both features; all findings closed:

F1 (maxConnections):

  • F1-M1 explicit > MAX_U32 upper-bound check in lib/sea/SeaAuth.ts — guards against ECMA-ToUint32 silent truncation
  • F1-M2 mock-binding test verifying napi openSession receives the value (tests/unit/sea/auth-max-connections.test.ts)
  • F1-L2 live e2e against pecotesting (tests/e2e/sea/max-connections-e2e.test.ts), 3/3 passing, independently reproduced by DA in ~1.6s

F2 (Arrow complex types):

  • F2-H1 root-cause SeaNativeLoader lazy-load eliminates the MODULE_NOT_FOUND-on-import defect (the architectural fix DA called "excellent")
  • F2-M1 lint clean (const { schema } = table; destructuring)
  • F2-M2 strict typeId assertions via ArrowType enum equality + nested c_arr_struct.type.children[0].type.typeId === ArrowType.Struct structural check, locks against List<Utf8> regression
  • F2-Live 1/1 passing, independently reproduced by DA at 467ms

Gates: yarn eslint <touched-files> clean; tsc --noEmit no new errors.

Co-authored-by: Isaac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant